Private speech improves cognitive performance in young adults

The current study investigated the relationship between private speech usage and cognitive performance in young adults. Participants (n = 103, mean age = 20.21 years) were instructed to complete a visual-spatial working memory task while talking out loud to themselves as much as possible (Private Speech condition). We found that participants performed better on trials for which they produced a greater amount of private speech. To establish causality, we further showed that participants performed better in the Private Speech condition than in a condition in which they were instructed to remain silent (Quiet condition). These beneficial effects of private speech were not moderated by task difficulty, which was manipulated by varying image labelability. However, participants who used more private speech during the task, as well as those who reported greater use of self-management private speech in everyday life, showed the greatest benefits. These findings have implications for real-world educational/instructional settings.


Introduction
Humans possess the unique ability to talk to themselves, and this self-talk can be in the form of "inner speech" (i.e., thinking inside the head) or "private speech" (i.e., talking out loud to oneself). Given the general pervasiveness of self-talk (at least within the context of inner speech, see Hurlburt et al., 2013), a natural question arises as to whether it has beneficial effects on everyday functioning. In the laboratory, this question has been studied mostly in the domain of cognitive functioning, with different methodological approaches used to study inner vs. private speech. To investigate the benefits of inner speech, "articulatory suppression" studies have compared cognitive performance between conditions in which inner speech is vs. is not diminished/suppressed, with results demonstrating that inner speech facilitates performance on (some) cognitive tasks (see Nedergaard et al. (2022) for a review). One obvious limitation of studying the effects of inner speech, however, is that its quantification relies on self-reports, which can be unreliable (see McCarthy-Jones & Fernyhough, 2011; Uttl et al., 2011 for discussion). By contrast, private speech can be quantified objectively (via audio recording), thereby allowing a rigorous investigation of its relationship with cognitive performance. Thus, while the results from articulatory suppression studies suggest an important role of inner speech in cognitive performance, much knowledge can be gained by studying the role of private speech in cognitive performance.
To date, the vast majority of studies investigating the impact of private speech on cognitive performance have been restricted to children, likely owing to the fact that the spontaneous use of private speech is known to be prominent in children, but not adults (see Berk, 1986; Winsler et al., 2003, for empirical evidence and Vygotsky, 1987 for theory). The results of these studies show that children perform better on tasks when they are instructed to use private speech while performing those tasks (Tower of London: Fernyhough & Fradley, 2005; Tangram puzzles: Lee, 1999; speech-action coordination tasks: Winsler et al., 2007, and see Alderson-Day et al., 2015; Winsler, 2009 for a review). Although there is a dearth of analogous studies investigating the impact of private speech on cognitive performance in adults, there is reason to believe that, like children, adults might benefit from private speech, for at least two reasons. First, adults do use private speech in their everyday lives (which Fernyhough, 2004 refers to as a "re-expansion" process), with studies showing the highest frequencies of spontaneous private speech usage during challenging and/or complex cognitive tasks (Alarcón-Rubio et al., 2013; Duncan & Cheyne, 2001; Mulvihill et al., 2021), when learning new manual tasks like crafting lanyards (Soskin & John, 1963), and in embarrassing social situations (Duncan & Tarulli, 2009). Although correlational in nature, these findings suggest that adults likely do use private speech to enhance performance and/or to self-regulate. Second, there exists a substantial literature directly demonstrating the beneficial effects of private speech on sports performance, for example, when first learning to golf (see Hatzigeorgiadis et al., 2011; Thibodeaux & Winsler, 2018, 2022 for reviews and perspectives, noting that some of these studies involved instructing learners to use inner, not private, speech).
Motivated by the previous literature, and to fill a gap within it, the main goal of the current study was to ask whether instructing adults to use private speech improves their cognitive performance, specifically, on a card-matching game that relies on visual-spatial working memory. To this end, we employed an experimental approach that compared performance on this task between conditions in which adults were instructed to talk out loud (the "Private Speech" condition) vs. instructed to not talk out loud (the "Quiet" condition), counterbalancing the order of the two conditions across participants. The current study serves as a follow-up to our previous study in adults (Guo & Dobkins, 2023), in which participants were instructed to "talk out loud as much as possible" on two trials of the card-matching game, and both performance and amount of private speech were measured. Using a within-person correlational analysis, we found that adults performed significantly better on the trial for which they produced a greater amount of private speech. Because the vast majority of the content of their private speech was found to be strategic in nature (for example, using words related to what and where the hidden images might be), rather than in response to performance (for example, using words with positive affect after having found a hidden image), we argued that our correlational findings were likely to reflect a causal relationship, whereby increasing one's amount of private speech results in improvements in performance (rather than vice versa). Still, because the study was correlational in nature, the results could not provide conclusive evidence that private speech benefits performance. In sum, the current study used the same visual-spatial memory task as in our previous study, this time taking an experimental approach in order to establish a causal link between private speech usage and cognitive performance. In addition, because the current study collected data from two back-to-back trials of the Private Speech condition, this allowed us to conduct a direct replication of the correlational analysis in our previous study.
A second goal of the current study was to investigate whether the impact of private speech on cognitive performance varies as a function of task difficulty. This question was inspired by studies in the sports psychology literature (mentioned above), which report that talking out loud can hinder golf performance once people become experts (Beilock & Carr, 2001; Marshall et al., 2016). That is, private speech might help beginner golfers who, because they are novices, find golfing to be difficult, yet hurt experts who find golfing to be relatively easy. A commonplace example is learning to tie one's shoes, which is a type of procedural memory. At first, using self-talk (with either inner or private speech) to explain the procedure ("make one loop, tie the other end around the loop, etc.") is helpful, but once one has become an expert in shoe-tying, self-talk gets in the way. In our previous study, we tried to test this idea by obtaining a baseline measure of performance (i.e., under a condition in which participants did not use private speech), which we used as a proxy for how easy/difficult a participant perceived the card-matching game to be. Our study found no moderating effects of perceived difficulty on the relationship between amount of private speech and performance, although this might have resulted from there not being enough variation across participants to show an effect.
In the current study, we took a different approach to test the moderating effects of task difficulty, by creating two versions of the card-matching task that differ in difficulty. To achieve this, we tested participants with both easy-to-label images (as in our previous study), as well as hard-to-label images. The notion that variations in labelability ought to produce differences in task difficulty comes from previous literature showing that visual working memory performance tends to be better when images are easy-to-label or meaningful, as opposed to hard-to-label or abstract (Asp et al., 2021; Brady et al., 2016; Brady & Störmer, 2021; Souza et al., 2021). In the current study, we confirmed that our easy-to-label images produced better performance than the hard-to-label images, and therefore refer to these conditions as having two different difficulty levels: "Easy" vs. "Hard".
A third goal of the current study was to investigate whether the effects of private speech vary as a function of an individual's natural tendency to use private speech in their everyday life. To test this, the current study employed the Self-Talk Scale (Brinthaupt et al., 2009), which asks participants to self-report their self-talk usage in everyday life (noting that we modified it slightly to ask specifically about private speech usage). In previous studies, it has been shown that responses on this scale do, in fact, predict performance. For example, Shi et al. (2017) showed that participants who report using the self-management type of self-talk more frequently also perform better when assigned to give a persuasive public speech. Likewise, the current study predicted that the benefits of talking out loud when instructed to do so (in an experimental setting) might be greatest for people who have a natural "fluency" in talking out loud to themselves.
In sum, the current study had three main objectives. First, to determine if instructing adults to use private speech improves their performance on a cognitive (visual-spatial working memory) task. Second, to determine if the impact of private speech on performance is moderated by task difficulty. Third, to determine if the impact of private speech on performance is moderated by one's tendency to use private speech in their everyday life.
X. Guo and K. Dobkins

Method
The hypothesis, study design, exclusion criteria, and analysis plan were pre-registered: https://osf.io/uz9kf. The three main objectives of the study (see Introduction) were all confirmatory hypotheses in our pre-registration. Throughout the Methods, we note any other analyses that were either exploratory in our pre-registration or were not pre-registered.

Participants
Participants were undergraduate students recruited through a participant pool run by the Department of Psychology at the University of California San Diego between November 2022 and January 2023. Eligibility was restricted to participants who reported being at least 18 years old. All participants gave their informed consent before participating and were compensated with course credits. The study was approved by the Institutional Review Board at our university.
The collected sample consisted of 113 participants, a sample size that was determined by a priori power simulation (see pre-registration). Ten participants in total were excluded for the following reasons: not agreeing for their audio to be analyzed anonymously for research purposes (n = 6), not following the proper procedure (n = 1), failing an effort check (see pre-registration for details, n = 1), and their private speech not being recorded due to experimenter error (n = 2). The demographics of the remaining 103 participants whose data were analyzed were as follows: age ranged from 18 to 33 years (M = 20.14, SD = 2.15); the gender identities were 69.1 % women, 25.8 % men, 2.1 % non-binary, and 1 % "prefer not to say"; and the ethnicities were 50.5 % Asian, 21.6 % Hispanic, 11.8 % White, 4.1 % Middle Eastern or North African, 2.1 % Black/African American, 8.2 % mixed, and 2.1 % "prefer not to say".

General study design
In this study, there were two main manipulations. The first manipulation was "Speech Condition", with trials being either Quiet (participants were instructed to not talk out loud while performing a card-matching task) or Private Speech (participants were instructed to talk out loud as much as possible during that task). The second manipulation was "Labelability", with the images being either easy-to-label ("Easy" condition) or hard-to-label ("Hard" condition). This 2 × 2 design resulted in four total conditions, and participants were tested with two trials per condition, resulting in eight total trials per participant (see Fig. 2). The data from these eight trials allowed us to conduct two different types of analyses, separately for the Easy and Hard conditions. The Correlational analyses served as a replication of Guo and Dobkins (2023), in which we showed that participants performed significantly better on the Private Speech trial for which they produced a greater amount of private speech. The Experimental analyses were conducted to establish causality, comparing performance between the Quiet and Private Speech conditions, whose order was counterbalanced across participants.

Card-Matching task
The study used a card-matching game, called "Concentration Cat" (iOS App), wherein players are tasked with finding hidden pairs of matching images within an array by tapping/revealing two cards at a time. If a match is made, those cards disappear. If instead there is a mismatch, those cards are automatically hidden again. This task relies on visual-spatial working memory, with the player needing to remember where in the array of cards they last saw an image. To play the game efficiently, the player aims to use as few "turns" as possible, with a turn defined as a pair of taps.
In the current study, we used the card-matching game in a 5 × 5 card array, which required 12 unique images, noting that each image is hidden under two cards, resulting in 24 total cards. Because a 5 × 5 array has 25 spots, one of those spots (i.e., the bottom-right spot of the array) was intentionally left empty. In the current study, each participant was tested on eight trials, and thus we needed 96 unique images (i.e., 12 per trial).

Creating stimuli for the easy and hard labelability conditions
In our original study (Guo & Dobkins, 2023), we employed images that were easy-to-label. In the current study, our goal was to replicate this previous study using easy-to-label images and, in addition, add a condition in which the images were hard-to-label, as a way of varying the difficulty of the task (see Introduction). Rather than use our old set of easy-to-label images, we created a new bank of images, wherein both the Easy and Hard images share the same low-level visual features. To this end, we used Tangram images (from an online source, Tangram Channel, 2015), each of which consists of seven geometric pieces, including triangles and squares, that can be rearranged to form various images. Example images are shown in Fig. 1. The methods used for choosing Tangram images that were easy- vs. hard-to-label can be found in Section A of Supplementary Materials.

Trait-level private speech questionnaires
We obtained a trait-like measure of how frequently participants use private speech in their everyday life by administering the 16-item Self-Talk Scale ("STS", Brinthaupt et al., 2009), which was developed, and has been validated, in undergraduate student populations. The STS assesses the tendency to use self-talk (without specifying whether the self-talk is private speech or inner speech) under various situations. Because the current study investigated private (and not inner) speech, we modified the STS so that the leading statement was specific to private speech, as follows: "I talk to myself out loud when…" followed by a situation of interest. Participants provided their responses using a 5-point Likert scale with options ranging from "Never" to "Very Often". The STS has four subscales (each comprising four items) to capture different aspects of everyday usage of private speech:
1-Social Assessment measures the tendency to engage in private speech to replay conversations to oneself and envision the reactions of others. For example, "I talk to myself out loud when I'm imagining how other people respond to things I've said".
2-Self-Reinforcement measures the tendency to engage in private speech when experiencing a sense of accomplishment or when a positive event has occurred. For example, "I talk to myself out loud when I want to reinforce myself for doing well".
3-Self-Criticism measures the tendency to engage in private speech when criticizing oneself for things said or done and showing discouragement. For example, "I talk to myself out loud when I'm really upset with myself".
4-Self-Management measures the tendency to engage in private speech when self-directing and deciding on the appropriate actions or words to say. For example, "I talk to myself out loud when I'm mentally exploring a possible course of action".
The STS has been used to ask whether certain types of self-talk predict trait characteristics. For example, Brinthaupt et al. (2009) reported that (higher) responses on the Self-Criticism subscale predict (lower) self-esteem.
Participants also filled out three in-house measures, as follows:
Frequency: "In general, how often do you talk out loud to yourself?" (10-point scale, ranging from "Never" to "Very Often").
Comfort: "In general, how comfortable are you talking out loud to yourself?" (10-point scale, ranging from "Not at All" to "Completely/Entirely").
Tendency: Because we found a high correlation between "frequency" and "comfort" (r = 0.70, p < 0.001), in line with our pre-registration, we created an average score of the two, which we refer to as "Tendency".
Attitudes: "Do you think talking to oneself out loud has a negative societal taboo attached to it?" (10-point scale, ranging from "Not At All" to "Very Much"). The mean score on this metric (normalized by subtracting 1 and dividing by 9, so that 1.0 was the maximum and 0 was the minimum) was M = 0.47 (SD = 0.26), indicating that participants felt, on average, that talking out loud to themselves was moderately taboo. We did not use this metric in any of our analyses, but include it only to present a descriptive statistic regarding participants' attitudes about talking out loud to oneself.
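As an illustration, the normalization described above amounts to a simple linear rescaling (the function name below is ours, not from the study):

```python
def normalize_10pt(score):
    """Rescale a 1-10 response onto [0, 1]: subtract 1, then divide by 9."""
    return (score - 1) / 9
```

For example, a response of 1 maps to 0.0 and a response of 10 maps to 1.0.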
In the current study, we explored whether scores on any of the four subscales of the STS or the one in-house construct (Tendency) moderated any effects that were observed in either the Correlational or Experimental analyses.This meant that we tested for potential effects of five different "Trait-Private Speech (PS)" metrics.

In-lab procedure
Upon arrival at the lab, participants were asked to complete the Trait-PS Questionnaires (filled out online over Qualtrics) in a waiting area of the lab. They were then guided into the testing room, where they were informed that they would be playing a card-matching game, which was explained to them through a pre-recorded video demonstration on a laptop computer. The video demonstration featured a 2 × 3 array of face-down cards with patterns different from those used in the actual trials. During the video, the experimenter paused periodically to elaborate on the rules and goals of the game. Next, the experimenter set up participants to play eight actual trials of the game on an iPad, through an iOS app called "Concentration Cat", which was specifically developed for our study by a professional. The experimenter stepped outside the testing room during all eight trials, so as to not make the participant uncomfortable, and only came back in between the trials to deliver instructions for the next trial. The eight trials were presented in a pre-designed order, as follows.
Labelability of the images (Easy vs. Hard) was blocked and counterbalanced across participants, with half of the participants starting with four Easy trials, and the other half starting with four Hard trials. Within each block of four trials, the order of the Speech condition (Quiet vs. Private Speech, described further below) was counterbalanced across participants, with half of the participants starting with the two Quiet trials first, and the other half starting with the two Private Speech trials first; this trial order was maintained across the two labelability blocks. This counterbalancing ensured that any observed effects could not be accounted for by trial or block order. (Still, although not pre-registered, we were curious about the potential effects of trial/block, should they exist, which we present in Section C of Supplementary Materials.) In sum, the counterbalancing resulted in four different participant groups, tested with four different orders of the eight trials, shown in Fig. 2.
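To make the counterbalancing concrete, the four trial orders can be sketched as follows (a minimal Python illustration; the condition labels and function are ours, not part of the study materials):

```python
from itertools import product

def trial_order(labelability_first, speech_first):
    """Build one participant group's eight-trial order: labelability is
    blocked (four Easy then four Hard, or vice versa), and within each
    block two trials of one speech condition precede two of the other,
    with the speech order repeated across both blocks."""
    blocks = ["Easy", "Hard"] if labelability_first == "Easy" else ["Hard", "Easy"]
    speech = ["Quiet", "PS"] if speech_first == "Quiet" else ["PS", "Quiet"]
    return [(lab, s) for lab in blocks for s in speech for _ in range(2)]

# The 2 x 2 counterbalancing yields the four participant groups.
orders = {(l, s): trial_order(l, s)
          for l, s in product(["Easy", "Hard"], ["Quiet", "PS"])}
```

For example, the group starting with Easy and Quiet runs two Quiet-Easy trials, then two Private Speech-Easy trials, then the same speech order within the Hard block.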
On Quiet trials, participants were instructed to finish the game in as few turns as possible and to not talk out loud. Specifically, they were told: "Please finish the game quietly and use as few taps as you can. You will see the time and taps you used after each trial. But the only goal is to use as few taps to finish the game, and we do not care about the time taken to finish the game in this study. You can finish the game at your own pace. I (the experimenter) will be outside, and the door will be closed." Note that only rarely did a participant spontaneously talk out loud in this condition (see Results).
On the Private Speech trials, participants were instructed to finish the game in as few turns as possible, and told to talk out loud as much as they could during the game. Specifically, they were told: "Please finish the game using as few taps as you can. You will see the time and taps you used after each trial. But the only goal is to use as few taps to finish the game, and we do not care about the time taken to finish the game in this study. You can finish the game at your own pace. Talk to yourself audibly or externally throughout the game as much as you can. You can use the language you're comfortable with. We do not have instructions on the content of your self-talk. The volume of your self-talk can be comparable to the volume of your social conversations. I (the experimenter) will be outside, and the door will be closed. I won't be able to hear you during the game."
Unbeknownst to the participants, we recorded their speech output through an iPad microphone, so as to calculate an objective measurement of the amount and content of their private speech (see below). Also unbeknownst to the participants, we used a screen capture function on the iPad to collect two pieces of information: (1) the number of turns, and (2) the time to complete the trial. Both (1) and (2) were automatically shown by the iOS App after each trial. (1) was used as our main performance measure, and (2) was used to compute the utterance rate of private speech (see Measures, below). The screen and audio recordings were collected for all eight trials (the Quiet and Private Speech trials). After finishing all eight trials, the experimenter came back to the testing room with a laptop, where the Qualtrics questionnaire was loaded, and instructed the participant to answer two questions about their effort level in the experiment (see "failed effort check", above) as well as demographic questions. In an email after the experiment, the participants were debriefed about being secretly audio-recorded during the experiment, and they were given an audio consent form to indicate whether they agreed for their audio to be analyzed anonymously for research purposes.

Performance measure
As in our previous study (Guo & Dobkins, 2023), the main measure of performance for each trial was the "number of turns" (i.e., pairs of taps) to finish the card-matching game. This measure is regarded as a straightforward and holistic evaluation of efficiency in the card-matching game (Krøjgaard et al., 2019), and is in line with many previous studies that used the same game (Eskritt & Lee, 2002; Washburn & Gulledge, 2002). The "number of turns" was z-scored (within the relevant grouping 1) and converted into its additive inverse, so that higher numbers represent better performance, which facilitates understanding of figures depicting performance as a function of other variables. For the Correlational analyses, the two Quiet trials were averaged and used as a measure of Baseline on the task, to explore whether it moderated the relationship between amount of private speech and performance (see Introduction).
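The transformation of the raw turn counts can be sketched as follows (a minimal Python illustration assuming a single grouping; the study applied the z-scoring within different groupings depending on the analysis, see footnote 1):

```python
import statistics

def performance_scores(turns):
    """z-score turn counts within one grouping, then take the additive
    inverse so that higher scores represent better (fewer-turn) play."""
    mean = statistics.mean(turns)
    sd = statistics.stdev(turns)  # sample standard deviation
    return [-(t - mean) / sd for t in turns]

# Fewer turns -> higher performance score after the sign flip.
scores = performance_scores([20, 30, 40])
```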

Amount of private speech (PS) measure
The current study used utterances/minute as the objective measure of amount of private speech. The choice of this metric (as opposed to total utterances or time) is justified in our previous study (see Guo & Dobkins, 2023), noting that it has also been used in previous private speech studies (Duncan & Cheyne, 2001; Fernyhough & Fradley, 2005; Kronk, 1994; Mulvihill et al., 2021). As a first step, the audio recordings of participants' private speech were analyzed offline by the first author and her research assistants. For Private Speech trials in English (86.5 % of all trials), an automatic speech recognition tool named Whisper (Radford et al., 2022) was used to generate the initial transcription. The Whisper-transcribed utterances were then reviewed and edited by the first author to ensure accuracy. The initial transcriptions of non-English languages were performed by the first author or her research assistants who know the language (Mandarin: 9.8 % of trials, Korean: 1.0 %, Turkish: 1.0 %, Spanish: 1.7 %). Note that these percentages are out of the total number of trials, as some participants switched languages between their first and second Private Speech trials.
Next, data were entered into a spreadsheet in units of "Utterances", defined as an audible verbal unit separated by differences in semantic meaning or at least one second of temporal distance (Frausel et al., 2020; Rowe, 2012; Rowe & Goldin-Meadow, 2009). For example, "Dog at the top right corner" would be considered one utterance, whereas "Is the dog here? Nope." would be considered two utterances. We tested inter-rater reliability for a random subset of 10 participants (40 trials) by having a second transcriber, in addition to the first author. The data from these 10 participants showed very high inter-rater reliability in quantifying the number of utterances (Easy: ICC = 0.962; Hard: ICC = 0.956). As a final step, utterances/minute was calculated as the number of utterances divided by the time to finish the trial.
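The final step reduces to a single ratio; a minimal sketch (function name ours):

```python
def utterance_rate(n_utterances, trial_seconds):
    """Utterances per minute: the number of utterances divided by the
    time (converted to minutes) taken to finish the trial."""
    return n_utterances / (trial_seconds / 60.0)

# e.g., 45 utterances in a 2-minute trial
rate = utterance_rate(45, 120)
```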
Note that for our Correlational analyses, which used a within-person approach that replicates Guo and Dobkins (2023), we needed to transform amount of private speech. As we explain in that previous study, our use of two trials for the Private Speech condition allowed us to investigate within-person relationships between amount of PS and performance, i.e., asking whether an individual performed better on the trial for which they produced a greater amount of private speech. In order to conduct a within-person analysis within our multilevel models, we first person-mean centered the amount of PS. For example, if a participant's utterances/min was 40 on one trial and 20 on the other (with a mean of 30), this resulted in the amount of PS in their two Private Speech trials being encoded as +10 and −10, respectively. Note that in 3.6 % of the Private Speech trials, the number of utterances was 0 (i.e., the participant did not follow the instructions to talk out loud), but values of 0 are permissible in the analyses. This person-centered transformation, sometimes referred to as "centering-within-cluster", reveals Level 1 (i.e., within-person) effects while eliminating Level 2 (i.e., between-person) effects in a multilevel model (Enders & Tofighi, 2007). For our Experimental analyses, where we asked whether amount of PS moderates any of the observed effects, we z-scored the average amount of private speech across the two Private Speech trials (separately for Easy vs. Hard), and used this as a Level 2 variable. Note that we added this analysis after the pre-registration, acknowledging that it was an oversight to not include it in the original pre-registration.
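Person-mean centering as described above can be sketched as follows (illustrative Python; the study's actual analyses were run within a multilevel-modeling framework):

```python
def person_mean_center(values_by_person):
    """Centering-within-cluster: subtract each participant's own mean
    from their trial values, isolating within-person (Level 1) variation
    and removing between-person (Level 2) differences."""
    centered = {}
    for pid, vals in values_by_person.items():
        m = sum(vals) / len(vals)
        centered[pid] = [v - m for v in vals]
    return centered

# The paper's example: utterance rates of 40 and 20 become +10 and -10.
out = person_mean_center({"p1": [40, 20]})
```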
As part of an exploratory analysis (see pre-registration), we investigated the content of private speech, as such findings might steer future studies investigating the effects of different types of private speech on performance.We present the private speech content distribution in Section B of Supplementary Materials.
1 For the Correlational analyses (see below), this meant that the transformation was done separately for the Quiet trials (to obtain a measure of baseline performance) vs. the two Private Speech trials. For the Experimental analyses (see below), which included both the Easy and Hard conditions in the same model, this meant that the transformation was done using data from all eight trials. For the Experimental analyses performed separately on the Easy vs. Hard conditions, this meant that the transformation was done separately on the four Easy vs. the four Hard trials.

Data exclusion
At the participant level, we applied separate exclusion criteria for the Correlational and Experimental analyses, since participants who were disqualified for one analysis could qualify for the other, and we wanted to retain as much data as possible for each analysis. As described above, for the Correlational analyses, the average performance across the two Quiet trials was used as a measure of participants' baseline performance, separately for the Easy vs. Hard labelability conditions. As in our previous study (Guo & Dobkins, 2023), a participant was excluded from these analyses if their baseline performance was three standard deviations worse than the average across participants, computed separately for the Easy vs. Hard conditions. This exclusion criterion resulted in one participant excluded from the Easy condition (leaving 102) and three participants excluded from the Hard condition (leaving 100). For the Experimental analyses, the two Quiet trials were treated as individual trials rather than being averaged to calculate a baseline measure. Thus, no participant was excluded from the Experimental analyses, and data from all 103 participants were retained.
At the trial level, exclusion criteria were applied separately for each of the four conditions (i.e., Quiet-Easy, Private Speech (PS)-Easy, Quiet-Hard, PS-Hard). A trial was excluded if performance on that trial was three standard deviations worse than the trial-wise average performance of the condition. Accordingly, three trials were excluded for Quiet-Easy, four trials were excluded for PS-Easy, four trials were excluded for Quiet-Hard, and three trials were excluded for PS-Hard. Note that this exclusion criterion was part of our pre-registration and that missing data points of this sort are permissible in multilevel models (Huta, 2014).
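The trial-level rule can be sketched as follows (illustrative Python; "worse" here means a higher turn count, and the authors' exact implementation is not specified):

```python
import statistics

def exclude_outlier_trials(turns):
    """Split one condition's turn counts into kept trials and trials
    that are more than 3 SDs worse (higher) than the condition mean."""
    m = statistics.mean(turns)
    sd = statistics.stdev(turns)
    cutoff = m + 3 * sd
    keep = [t for t in turns if t <= cutoff]
    excluded = [t for t in turns if t > cutoff]
    return keep, excluded
```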

Descriptive analyses
Descriptive data of means and distributions of study variables are presented from 810 trials (8 trials × 103 participants, after excluding fourteen outlier trials; see Methods). Note that there is some interdependence in these means, as they are created from two trials per participant. Still, they provide a reasonable estimate of the raw means. The model-estimated means, which account for this interdependency, are presented later in the Results (in Fig. 3). With regard to performance, the mean number of turns for the four different conditions was as follows: Quiet-Easy trials = 28.16 (SD = 6.26), Quiet-Hard trials = 32.82 (SD = 7.67), PS-Easy trials = 26.29 (SD = 5.42), PS-Hard trials = 30.69 (SD = 7.47). Formal statistical analyses that compare across the four conditions are presented in the Experimental analyses section (below), but we point out up front that participants performed worse when images were hard-to-label. Specifically, in the Quiet condition, the mean number of turns to complete the task was 17 % higher in the Hard vs. Easy condition, confirming that the Hard condition was, in fact, more difficult, as we intended it to be.
With regard to Amount of Private Speech (PS), we had hoped that the Quiet condition would yield no private speech, since participants were explicitly told not to talk out loud. However, 3.2 % of the Quiet trials (Easy: 2.2 %, Hard: 1.0 %) nonetheless contained some spontaneous utterances, and we chose to retain these rare trials in our analysis.2 For the Private Speech trials, the mean number of utterances/minute for the Easy condition was 22.40 (SD = 10.18), which was similar to the values observed in our previous study that also used easy-to-label images (i.e., M = 27.56, SD = 11.26). For the Hard condition, the mean number of utterances/minute was 14.95 (SD = 8.70), significantly lower than that observed in the Easy condition (p < 0.001; see Section D of Supplementary Materials for an explanation of this analysis).

Correlational analyses: testing the relationship between amount of private speech and performance
These analyses served as a replication of Guo and Dobkins (2023), in which we showed that participants perform significantly better on trials for which they produce more private speech. Using a Type III sum of squares multilevel regression model, separately for the Easy vs. Hard condition, the dependent variable was performance on the Private Speech trials and the predictor terms were: 1) Level 1 Amount of PS (person-centered and entered as a fixed effect), and 2) Baseline performance (the average of the two Quiet trials, entered as a fixed effect), with Participant included as a random intercept effect. As we did in our previous study, we also added an interaction term between (1) and (2), to see whether participants with lower baseline performance show a stronger relationship between amount of private speech and performance (see Introduction).
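A multilevel model of this form can be sketched in Python with statsmodels (the original analyses may well have used different software; all variable names and simulated values below are our own illustrative assumptions, and the synthetic data use more trials per participant than the actual two, purely so the fit is stable):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n, n_trials = 100, 6  # 100 simulated participants, 6 simulated PS trials each

pid = np.repeat(np.arange(n), n_trials)
baseline = np.repeat(rng.normal(size=n), n_trials)   # avg Quiet performance (z-scored)
amount_ps = rng.normal(size=n * n_trials)            # Level 1 Amount of PS (treated as already person-centered)
rand_int = np.repeat(rng.normal(scale=0.3, size=n), n_trials)
perf = 0.5 * baseline + 0.3 * amount_ps + rand_int + rng.normal(scale=0.2, size=n * n_trials)

df = pd.DataFrame({"pid": pid, "perf": perf,
                   "amount_ps": amount_ps, "baseline": baseline})

# Fixed effects for Level 1 Amount of PS, Baseline, and their interaction;
# random intercept per participant, as in the model described above.
fit = smf.mixedlm("perf ~ amount_ps * baseline", df, groups=df["pid"]).fit()
```

On synthetic data with known effects, the fitted fixed-effect coefficients recover the simulated slopes, which is the pattern the Table 1 coefficients express for the real data.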
The results of these analyses are shown in Table 1 (Left panel: Easy condition, Right panel: Hard condition). Since the direction and effect size of the predictors' coefficients were largely consistent between the Easy and Hard conditions, we present a single narrative for both as follows. Replicating Guo and Dobkins (2023),3 these analyses revealed three main findings. First, there was a main effect of Level 1 Amount of PS on performance (Easy: β = 0.12, 95 % CI = [0.03, 0.21], p = 0.011; Hard: β = 0.18, 95 % CI = [0.07, 0.28], p = 0.001), with higher amounts of private speech being associated with better performance. In other words, participants performed better on trials for which they produced a greater amount of private speech. Second, as might be expected, Baseline performance predicted performance in the Private Speech condition, i.e., people who did better in the Quiet condition also did better in the Private Speech condition (Easy: β = 0.48, 95 % CI = [0.33, 0.63], p < 0.001; Hard: β = 0.35, 95 % CI = [0.19, 0.50], p < 0.001). Third, there was no significant interaction between Level 1 Amount of PS and Baseline (Easy: p = 0.501; Hard: p = 0.251), meaning that the relationship between Level 1 Amount of PS and performance in the Private Speech trials was invariant across participants with different baseline performances.4 To address whether the effect of Level 1 Amount of PS differed between the Easy and Hard conditions, we included both Easy and Hard trials in the same model (including Labelability as a random effect). We found no interaction between Level 1 Amount of PS and Labelability (p = 0.162), suggesting that the relationship between Level 1 Amount of PS and performance in the Private Speech trials is comparable across the two levels of task difficulty.

Does trait-PS moderate the effect of level 1 amount of PS on performance?
Using Type III sum of squares multilevel regression models, separately for the Easy vs. Hard condition, we asked whether trait-level private speech usage (Trait-PS), obtained from participants' self-reports, moderates the positive relationship between Level 1 Amount of PS and performance observed in the prior analysis. Because there are different types of habitual private speech usage, we tested separate models for each of the five different metrics of Trait-PS (see Methods). For each of the five models, the dependent variable was performance in the Private Speech trials and the predictor terms were: 1) Level 1 Amount of PS (entered as a fixed effect), 2) Trait-PS (entered as a fixed effect), and 3) the interaction between (1) and (2), with Participant included as a random intercept effect. The results of these analyses showed that for both the Easy and Hard conditions, there were no moderating effects of Trait-PS for any of the five measures (Easy: all ps > 0.097; Hard: all ps > 0.134), meaning that the positive relationship between Level 1 Amount of PS and performance did not vary across participants with different levels of Trait-PS. There was also no main effect of Trait-PS on performance (Easy: all ps > 0.256; Hard: all ps > 0.120), meaning that participants who were high vs. low in Trait-PS performed equally well on Private Speech trials. Finally, the main effect of Level 1 Amount of PS on performance remained significant (Easy: ps ranged from 0.005 to 0.021; Hard: ps ranged from < 0.001 to 0.004).
In sum, the results of our Correlational analyses replicate the key finding from Guo and Dobkins (2023), showing that within-person private speech amount positively predicts visual-spatial working memory performance, and that this effect generalizes to a version of the task made more difficult by virtue of using a set of harder-to-label images. As we discussed in our previous study, finding a positive correlation between amount of private speech and performance still leaves open the question of causality and its direction. This is addressed in our Experimental analyses (next section), which compare performance between the Quiet and Private Speech conditions.

Experimental analyses: testing the causal relationship between private speech and performance
As a starting point, we looked at the effects of both Speech condition (Quiet vs. Private Speech) and Labelability condition (Easy vs. Hard) within the same model, without the inclusion of amount of PS. Using a Type III sum of squares multilevel regression model, the dependent variable was performance and the predictor terms were: 1) Speech condition (Private Speech vs. Quiet, entered as a fixed effect, with levels dummy coded as +0.5 and −0.5), 2) Labelability condition (Easy vs. Hard, entered as a fixed effect, with levels dummy coded as +0.5 and −0.5), and 3) the interaction between (1) and (2), with Participant included as a random intercept. [A note on the baseline measure used in the Correlational analyses: we consider it valid for two reasons. First, as we note in the text, Baseline performance was a highly significant predictor of performance in the Private Speech trials, indicating high predictive validity (in both the Easy and Hard conditions). Second, the correlation between the two Quiet trials (that were averaged to create the baseline measure) was significant (Easy: r(97) = 0.525, p < 0.001; Hard: r(95) = 0.399, p < 0.001), indicating that the baseline measure has moderate reliability.]
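The ±0.5 dummy (effect) coding described above can be made concrete as follows; with this coding, each main-effect coefficient is evaluated at the average of the other factor's levels, and the interaction regressor is the product of the two codes (column names are our own, for illustration):

```python
import pandas as pd

# The four cells of the 2 (Speech) x 2 (Labelability) design.
cells = pd.DataFrame({"speech":       ["PS", "PS", "Quiet", "Quiet"],
                      "labelability": ["Easy", "Hard", "Easy", "Hard"]})
cells["speech_c"] = cells["speech"].map({"PS": 0.5, "Quiet": -0.5})
cells["label_c"] = cells["labelability"].map({"Easy": 0.5, "Hard": -0.5})
cells["interaction"] = cells["speech_c"] * cells["label_c"]
```

Because each code sums to zero across cells, the main effects and interaction are orthogonal in a balanced design, which is what makes the Type III tests straightforward to interpret.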
In order to get a sense of the spread of the impact of private speech on performance, in Fig. 4 we present a frequency distribution of "difference scores", calculated for each participant as the average z-score performance in the two Private Speech trials minus the average z-score performance in the two Quiet trials (i.e., Private Speech performance − Quiet performance), separately for the Easy (Top panel) and Hard (Bottom panel) condition. As can be seen in the distributions, although the majority of participants had positive-going difference scores (indicative of benefiting from using private speech), a substantial proportion of participants had negative-going difference scores (indicative of being impaired by using private speech). The variation in the difference scores, both in magnitude and direction, is likely due to a combination of true individual differences regarding whether private speech improves vs. hinders performance, noise in the measurement, and the amount of private speech used by a participant (which is explored in the following section). In the Discussion, we delve deeper into potential reasons why private speech might genuinely hinder performance for some participants.
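The difference scores in Fig. 4 reduce to a simple per-participant computation (a sketch; it assumes performance values have already been z-scored so that higher means better, and the function name is our own):

```python
import numpy as np

def difference_score(ps_trials, quiet_trials):
    """Mean z-scored performance across the (two) Private Speech trials
    minus the mean across the (two) Quiet trials; positive values
    indicate a private-speech benefit, negative values an impairment."""
    return float(np.mean(ps_trials) - np.mean(quiet_trials))
```

Plotting these scores across participants yields the frequency distributions shown in Fig. 4, where the sign of each score indicates benefit vs. impairment.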

Does level 2 amount of PS moderate the benefit of private speech on performance?
Using a Type III sum of squares multilevel regression model, we asked whether between-person (i.e., Level 2) Amount of PS moderates the degree of benefit of private speech on performance observed in the previous analysis. That is, we asked whether the benefit of private speech was greater for participants who used more private speech, an outcome that, given the results of the Correlational analyses, seemed intuitively likely. As a first step, for each participant, Level 2 Amount of PS was calculated by averaging the utterances/min across their two Private Speech trials (followed by z-scoring), separately for the Easy vs. Hard conditions. The dependent variable was performance and the predictor terms were: 1) Speech condition (Private Speech vs. Quiet, entered as a fixed effect, with levels dummy coded as +0.5 and −0.5), 2) Level 2 Amount of PS (entered as a fixed effect), and 3) the interaction between (1) and (2), with Participant included as a random intercept, and Speech condition included as random slopes.
The results are shown in Table 3, separately for the Easy condition (Left panel) and Hard condition (Right panel). Since the direction and effect size of the predictors' coefficients were largely consistent between the Easy and Hard conditions, we present a single narrative for both as follows. The results revealed a significant interaction between Level 2 Amount of PS and Speech condition (Easy: β = 0.28, 95 % CI = [0.12, 0.43], p = 0.001; Hard: β = 0.28, 95 % CI = [0.12, 0.45], p = 0.005), which, unsurprisingly, was driven by participants who talked out loud more (i.e., higher Level 2 Amount of PS) showing the biggest benefits. To further investigate which pair-wise effects drove these interactions, we conducted post-hoc comparisons, and a visual depiction of the resulting model-estimated performance means is presented in Fig. 5, for the Easy (Left panel) and Hard (Right panel) condition. Post-hoc analyses revealed that Level 2 Amount of PS positively predicted performance in the Private Speech condition (Easy: β = 0.21, 95 % CI = [0.06, 0.37], p = 0.008; Hard: β = 0.21, 95 % CI = [0.04, 0.37], p = 0.014), but not in the Quiet condition (Easy: p = 0.301; Hard: p = 0.738). That is, participants who talked out loud the most outperformed those who talked less, but only in the Private Speech, not the Quiet, condition.

Does trait-PS moderate the benefit of private speech on performance over and beyond the effect of level 2 amount of PS?
In addition to the above analysis, which asked whether participants with higher amounts of private speech show more benefit, it was our intention to ask a similar question regarding Trait-PS. Mirroring the spirit of the question asked within the Correlational analyses (above), we wondered whether the benefit of private speech might be greater for people who report more usage of private speech in their everyday lives. Before proceeding with this question, we wanted to determine whether Trait-PS shares variance with Level 2 Amount of PS, since the two constructs could be redundant (i.e., people who report more usage of private speech in their everyday lives are likely to be the same people who produce greater amounts of private speech when instructed to "talk out loud as much as possible" in a laboratory study). To this end, we conducted correlational analyses between Level 2 Amount of PS and all five measures of Trait-PS. The correlations were low (Easy: rs = −0.05 to 0.16, ps = 0.168 to 0.966; Hard: rs = −0.053 to 0.189, ps = 0.065 to 0.921), suggesting the two constructs are not redundant; still, we deemed it safer to keep Level 2 Amount of PS (and its interaction with Speech condition) in our models testing the moderating effects of Trait-PS, so that the results would reveal effects "over and beyond" those explained by Level 2 Amount of PS (seen in Table 3 and Fig. 5). We then conducted five multilevel regression models, separately for each of the five Trait-PS measures, with the dependent variable being performance and the predictor terms being: 1) Speech condition (Private Speech vs.
Quiet, entered as a fixed effect with levels dummy coded as +0.5 and −0.5), 2) Level 2 Amount of PS (entered as a fixed effect), 3) the interaction between (1) and (2), 4) Trait-PS (entered as a fixed effect), and 5) the interaction between (1) and (4), with Participant included as a random intercept, and Speech condition included as random slopes. The findings are shown in Table 4 (Left panel: Easy, Right panel: Hard). The results show that for one of the Trait-PS metrics, specifically Self-Management, there was a moderating effect wherein participants who reported higher levels of self-management private speech in their everyday lives showed the biggest benefits. This effect was only significant in the Easy condition (β = 0.17, 95 % CI = [0.02, 0.32], p = 0.031), but a similar trend was seen in the Hard condition. To further investigate which pair-wise effects drove this interaction, we conducted post-hoc comparisons, and a visual depiction of the resulting model-estimated performance means is presented in Fig. 6. These analyses revealed that Self-Management negatively predicted performance in the Quiet condition (Easy: β = −0.25, 95 % CI = [−0.39, −0.10], p = 0.001; Hard: β = −0.21, 95 % CI = [−0.37, −0.05], p = 0.011), but not in the Private Speech condition (Easy: p = 0.199; Hard: p = 0.155). That is, participants who reported using more Self-Management self-talk underperformed those who reported less usage, but only in the Quiet, not the Private Speech, condition. This suggests that participants who habitually use private speech in everyday life may have felt hindered in the Quiet condition, where they were explicitly told to keep quiet. This finding should be interpreted with caution, however, since our investigation of the effects of Trait-PS consisted of testing 10 different models (five measures of Trait-PS for each of the Easy and Hard conditions), and thus the one significant effect observed might not hold up after correcting for multiple comparisons. Still, the effect is intriguing and deserves further study.
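The multiple-comparisons concern raised above can be quantified with, for instance, a simple Bonferroni adjustment (one standard choice among several; the study itself does not commit to a particular correction method):

```python
# Ten Trait-PS models were tested (five measures x two labelability
# conditions). Under a Bonferroni correction, the per-test alpha shrinks
# from 0.05 to 0.05 / 10 = 0.005, and the one significant interaction
# (p = 0.031, Easy condition) would not survive.
n_tests = 10
adjusted_alpha = 0.05 / n_tests
p_observed = 0.031
survives_correction = p_observed < adjusted_alpha
```

Less conservative procedures (e.g., Holm or false-discovery-rate control) would raise the threshold somewhat, but p = 0.031 sits well above 0.005, which is why the text urges caution.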

Discussion
The results of the current study, conducted in young adults, show that the degree to which one uses private speech, when instructed to do so, is positively associated with performance on a cognitive task, specifically a visual-spatial working memory task. In our Correlational analyses, within-person performance was compared between two Private Speech trials, in which participants were instructed to finish the game in as few turns as possible while talking out loud to themselves as much as they could. The results of these analyses show that individuals perform significantly better on trials for which they produce a greater amount of private speech, providing a direct replication of our previous study (Guo & Dobkins, 2023). These correlational findings were bolstered by our Experimental analyses, which compared within-person performance between the Private Speech condition and a Quiet condition, in which participants were explicitly instructed not to talk out loud. The results of these analyses provided direct evidence that the use of private speech significantly improves performance. In addition, we found that the effects of private speech (in both our Correlational and Experimental analyses) were comparable between conditions that made the task Easy (with easy-to-label images) vs. Hard (with hard-to-label images), suggesting that, at least within the confines of the current study, the benefits of private speech on performance are invariant across different levels of difficulty.
As might be expected given the finding from our Correlational analyses showing that participants perform better on Private Speech trials in which they produce greater (vs. lesser) amounts of private speech, the results from our Experimental analyses showed that the benefit of private speech on performance is highest for people who produce greater (vs. lesser) amounts of private speech during the task. Post-hoc analyses revealed that this effect was driven by superior performance among people who talked out loud the most in the Private Speech condition, with no such difference in the Quiet condition (see Fig. 5). In addition, the results from our Experimental analyses showed that the benefit of private speech on performance is highest for people who report greater (vs. lesser) usage of self-management private speech in their everyday lives. Here, post-hoc analyses revealed that this effect was driven by inferior performance among people who habitually talk out loud to themselves, but only in the Quiet condition, with no such difference in the Private Speech condition (see Fig. 6). [Although our Correlational analysis did not reveal a moderating effect of private speech usage in everyday life, this should not be considered contradictory to the positive findings from our Experimental analysis, as the Correlational analysis targeted the relationship between the two Private Speech trials, and the effect of daily private speech usage seen in the Experimental analysis is restricted to the Quiet trials.] In sum, these findings suggest that people who habitually use private speech for self-management purposes might be relatively hindered in visual-spatial memory tasks when they are instructed to "keep quiet", since this goes against their natural tendency.
Mechanisms. With respect to what might underlie the beneficial effects of private speech on visual-spatial working memory, there are several potential mechanisms. First, the use of private speech might serve to increase attention to the task at hand, thus enhancing performance, a notion that has been discussed in the sports psychology literature (see Hatzigeorgiadis & Galanis, 2017). In the working memory literature, the idea that attention plays a role in performance is supported by empirical studies showing that attention-directing cues can influence which items get encoded into working memory (e.g., Sperling, 1960; Schmidt et al., 2002), as well as by theories proposing that attention and working memory are tightly intertwined constructs (see Fougnie, 2008; Oberauer, 2019 for reviews). At an even more basic level of processing, attention has been shown to influence visual discrimination (e.g., Carrasco et al., 2000; Huang & Dobkins, 2005; Lee et al., 1997), and this could contribute to performance on higher-level working memory tasks.
A second possibility is that private speech enhances visual-spatial working memory performance by adding an auditory cue (e.g., the sound of one's voice saying "cat") on top of an already present visual cue (i.e., the image of a cat). This idea is in line with a large body of literature showing that working memory is enhanced when stimuli are presented bimodally (auditory + visual) as compared to unimodally (see Mastroberardino et al., 2008 for review). In addition, this "bimodal" effect is one of the potential explanations of a well-studied phenomenon referred to as the "production effect," in which reading a word out loud makes it more memorable than reading it silently (MacLeod et al., 2010; and see MacLeod & Bodner, 2017 for review). Specifically, the bimodal aspect of the production effect has been demonstrated in a study showing that simply listening to another person's auditory production of a word while viewing that word enhances memory performance (compared to reading it silently while viewing), although not as much as producing the word oneself (MacLeod, 2011).
A final potential underlying mechanism comes from a related area of research known as "verbalization" studies, which show that people's memory for visual images is enhanced when they are instructed to label them out loud (see Introduction of Souza & Skora, 2017 for review). One idea regarding why verbalizing may enhance memory performance comes from another line of work, which shows that (even under articulatory suppression that disallows verbalization) visual working memory is significantly better for meaningful (easily recognizable) images than for unmeaningful (e.g., scrambled or unrecognizable) images. Although the exact mechanism underlying these effects is not fully understood, converging evidence suggests that meaningful images better activate prior knowledge in long-term memory, which serves to expand the capacity of working memory (Asp et al., 2021; Brady et al., 2016; Brady & Störmer, 2021). In a similar vein, we propose that the act of labeling out loud, perhaps through increasing the meaningfulness of an image, could likewise activate long-term memories and, in turn, enhance working memory. Although the current study was not a verbalization study, in that we did not instruct participants regarding the content of their private speech, much of their output was in the form of labeling the images (see Section B of Supplementary Materials), and therefore the mechanisms underlying enhancement of performance on our task may, in fact, be similar to those proposed for verbalization studies.
On a final note, we point out that it is difficult to know whether the cognitive performance benefits of using private speech, as well as the underlying mechanisms, are unique to private speech or reflect a more general phenomenon of self-talk (private speech or inner speech). Because inner speech cannot be quantified objectively (see Introduction), attempts to compare/contrast the impacts (and mechanisms) of inner vs. private speech will be quite challenging. Even in the current study, we do not know the extent to which participants used inner speech in the Quiet condition. Had the current study found no overall benefit of talking out loud, we might have proposed that participants simply switched between using private speech (when instructed to do so) and inner speech (when instructed to keep quiet), and that the two types of self-talk are equally effective. In fact, the apparent impairment from using private speech seen in some participants (i.e., the negative-going difference scores in Fig. 4), if real, could have resulted from those participants relying heavily on inner speech when told to keep quiet, resulting in better performance in the Quiet, as opposed to the Private Speech, condition. Before drawing any conclusions, however, further research is needed to determine the reliability of the impact of private speech in individual participants (see Results).

Implications and future directions
The findings of the current study show that talking out loud, when instructed to do so, improves cognitive performance in adults. The results also suggest that "the more private speech the better" and that, for people who habitually use private speech for self-management in everyday life, keeping quiet during a task might actually impair performance. Still, there is much left to discover. For example, there may be a non-linear effect of private speech on performance, i.e., a "sweet spot" regarding how much private speech is optimal for performance. And, as mentioned above, using private speech may not be beneficial for all people. For example, for some people, being asked to talk out loud might feel like an additional task placed on top of performing a cognitive (e.g., card-matching) task, which might then negatively affect their performance on the cognitive task (see Jackson et al., 2023; Rhodes et al., 2019 for evidence that dual tasks impair memory performance). These possibilities should be explored further.
In addition, we suggest the exploration of other variables that might moderate or enhance the benefits of using private speech. First, although the current study found that the benefits of private speech did not differ between two levels of task difficulty (manipulated via labelability), it may be that we did not create enough variation in task difficulty to witness its effects. Future studies might try to create more pronounced differences in difficulty, e.g., by scrambling images of real-life objects to make them completely meaningless and impossible to label, and/or by using a greater number of images. Second, future studies could investigate the effects of different types of private speech on performance, by explicitly instructing participants to use different types (e.g., as in the verbalization literature, see Souza et al., 2021). Finally, age is another variable that might be fruitful to investigate. The card-matching game of the current study was deliberately chosen because it can easily be administered in children (Krøjgaard et al., 2019). As such, future studies might map out the developmental trajectory, from young children to aging adults, of the effects observed in the current study. In sum, determining the "who, when, and why" of private speech's benefits for performance is likely to have important implications for real-world educational/instructional settings.

Fig. 2 .
Fig. 2. Order of the eight trials, counterbalanced across four participant groups. Note. Each participant provided data for two Speech conditions (Quiet vs. Private Speech) crossed with two Labelability conditions (Easy vs. Hard). Instructions for the Quiet and Private Speech (PS) conditions are shown here. The number 2 in parentheses denotes two back-to-back trials in each condition.

Fig. 3 .
Fig. 3. Model-Estimated Mean Performance as a Function of Speech Condition (Quiet vs. Private Speech, PS) and Labelability Condition (Easy vs. Hard). Note. Performance values are z-scored (see Methods), and error bars represent 95 % confidence intervals. The raw means and SDs of performance for the four conditions are presented in the Descriptive analyses section, above.

Fig. 4 .
Fig. 4. Frequency Distributions of Difference Scores (Private Speech performance − Quiet performance), separately for the Easy (Top panel) and Hard (Bottom panel) condition. See text for details. The dashed red line represents the mean of the distribution. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Fig. 5 .
Fig. 5. Model-Estimated Performance as a Function of Speech Condition (Quiet vs. Private Speech, PS) and Level 2 Amount of PS, separately for the Easy (Left panel) and Hard (Right panel) condition. Note. The values "−1", "0", and "1" represent participants who are one standard deviation below, equal to, and one standard deviation above the mean Level 2 Amount of PS for the group. Performance values are z-scored (see Methods), and error bars represent 95 % confidence intervals.

Fig. 6 .
Fig. 6. Model-Estimated Performance as a Function of Speech Condition (Quiet vs. Private Speech, PS) and Daily Usage of Self-Management Private Speech (Self-Manage), separately for the Easy (Left panel) and Hard (Right panel) condition. Note. The values "−1", "0", and "1" represent participants who are one standard deviation below, equal to, and one standard deviation above the mean self-reported usage of self-management private speech for the group. Performance values are z-scored (see Methods), and error bars represent 95 % confidence intervals.

Table 1
The Results of Type III Multilevel Models for Testing the Effects of Level 1 Amount of PS on Performance in the Private Speech condition, when the images were Easy (left panel) and Hard (right panel) to label.

Table 2
The Results of a Type III Multilevel Model Testing the Effects of Speech Condition (Quiet vs. Private Speech, PS) and Labelability Condition (Easy vs. Hard) on Performance.

Table 3
The Results of a Type III Multilevel Model Investigating the Moderating Effects of Level 2 Amount of PS on the Impact of Private Speech on Performance, in the Easy (left panel) and Hard (right panel) conditions.

Table 4
The Results of a Type III Multilevel Model Investigating the Moderating Effects of Daily Usage of Self-Management Private Speech (Self-Manage) on the Impact of Private Speech on Performance, in the Easy (left panel) and Hard (right panel) conditions.