Working memory for movement rhythms given spatial relevance: Effects of sequence length and maintenance delay

ABSTRACT Temporal information is an essential component of human movements. However, it is still unclear how the temporal information is extracted from complex whole-body movements through observation and how it is encoded and retained in working memory. In the current study, we investigated how the sequence length and maintenance delay influence working memory for movement rhythms (i.e., temporal structures of movement sequences) after considering the task-relevance of the corresponding spatial information and the sensitivity difference between spatial and temporal processing in visual perception. We found that the sequence length – in the sense of information load more than temporal duration – may act as the first bottleneck in the processing of movement rhythms, deciding whether temporal information can be encoded as individual units in high precision or it might be encoded as an ensemble “whole” in relatively low precision. In addition, the maintenance delay may act as the second bottleneck, determining to what extent the encoded information can be retained in memory.

Perceiving and extracting information from biological movements is an essential ability of human beings. For example, it has been shown that humans can efficiently determine the emotional state (Dittrich et al., 1996; Pollick et al., 2001), identity (Troje et al., 2005), gender (Mather & Murdoch, 1994), and intention (Kilner, 2011; Rizzolatti et al., 2001) of other individuals by their movements. However, it has also been observed that humans are less sensitive to temporal information (e.g., speed, rhythm) than to spatial information (e.g., joint trajectory, action pattern) of movements.

Sensitivity difference between spatial and temporal processing
The concept of "sensitivity" is twofold, defined by both subjective perceptual abilities of an observer and objective properties of a stimulus. Therefore, the sensitivity difference between spatial and temporal processing in visual perception can be attributed to at least two reasons. First, according to the modality appropriateness hypothesis, vision is more specialized in processing spatial information (just as audition is more specialized in processing temporal information) due to the modality advantage of vision (Freides, 1974; O'Connor & Hermelin, 1972; Welch, 1999; Welch & Warren, 1980) (but see Bell et al., 2019 for a review on the cross-modal effects of sensory deprivation). As a result, visual perception of temporal information is normally less accurate than that of its spatial counterpart (e.g., Collier & Logan, 2000; Glenberg et al., 1989; Glenberg & Jona, 1991; Repp & Penel, 2002). Second, spatial information is in general more behaviourally relevant in natural environments and thus is more likely to gain priority in processing. For example, to achieve the robustness of action perception, the perceptual system tends to categorize movements by spatial structures, while generalizing across timing differences (Giese et al., 2008). Moreover, when temporal information is defined as the change of spatial information over time (i.e., second-order feature), such as speed (defined as the distance travelled along a trajectory divided by elapsed time), the subordinate position of temporal information also implies an intrinsic dependence of temporal processing on spatial information. Based on this twofold perspective, the sensitivity difference we refer to in this paper is a phenomenon or a "compound effect" resulting from both the modality advantage of vision and the behavioural relevance or physical salience of the stimuli.
Although temporal information is less prioritized in visual perception, it is not trivial to ask how temporal information of movements is perceived. For example, temporal features can still influence the accuracy of biological motion perception (e.g., Barclay et al., 1978; Hill & Pollick, 2000; Pollick et al., 2001). Furthermore, the performance of a variety of motor skills, such as dance, also requires performing a movement (or a sequence of movements) with a precise timing (or rhythm), which implies the necessity of perceiving the embedded temporal structure through vision when observation is used as a way of learning (i.e., observational learning) (Bandura, 1986; Bandura & Walters, 1977). Previous research has shown that observers can perceive temporal regularities in complex biological movements (e.g., vertical trunk bounces plus lateral limb movements) and extract different metrical periodicities based on a mechanism similar to that of hearing music (Su, 2016). However, it is unclear how a more complex rhythm, such as a metric complex rhythm (i.e., integer-ratio rhythms without regular temporal accents aligned with the beat, see Grahn, 2012), might be extracted from human movements.

Working memory for observed movements
In addition, the perception of temporal structures requires an integration of temporal durations or temporal characteristics of movements (e.g., speed) over time and thus the involvement of working memory. It has been shown that observed movements, albeit comprising ostensibly visual and spatial components, are retained independently from colours, shapes, objects, or spatial locations in working memory (Shen et al., 2014; Smyth et al., 1988; Smyth & Pendleton, 1989, 1990; Wood, 2007, 2011). Evidence from neuroimaging studies also supports a distinct neural mechanism for maintaining body-related images and biological motions (e.g., Cai et al., 2018; Lu et al., 2016) (see Galvez-Pol et al., 2020 for a review).
Concerning features of actions (or body movements), it was shown that features that are inherent to actions (e.g., type, duration) are stored as integrated representations (Wood, 2007, 2011), while actions and other non-action features (e.g., agent identity, colour) are stored separately (Ding et al., 2015; Wood, 2008). It is worth noting that, although movement perception entails an integration of spatial information over time or an integration of "form" and "motion" in dissociable neural pathways (Giese & Poggio, 2003), biological motions are not regarded as a type of common binding (e.g., colour-shape binding), as they tend to be stored independently from bound representations in working memory (Liu et al., 2019). Memorizing static body postures and memorizing dynamic body movements were also shown to rely on different mechanisms.
Moreover, working memory for movement patterns differs from working memory for movements to spatial locations (Smyth et al., 1988; Smyth & Pendleton, 1989). The finding resonates with the distinction between movements with an external target in space as the goal and movements with their movement pattern or body configuration as the goal (i.e., movement-based goal) (Schachner & Carey, 2013). While the former can be represented as a position in space irrespective of how the movement is carried out, the latter requires a body-centered or a trajectory-based representation. As the present research focused on the processing of movement rhythms conveyed through complex movement trajectories (see below), the "movements" we refer to in this paper are movements of the latter type.
Previous research has also examined working memory capacity for movements. It was shown that depending on the way movements are designed (e.g., with or without action semantics), displayed (e.g., sequentially or simultaneously, performed by humans, computer-generated avatars, or point-light figures), and encoded (e.g., with or without concurrent interference), storage capacity for movements can vary between 2 and 5 units (e.g., Shen et al., 2014; Smyth et al., 1988; Smyth & Pendleton, 1989, 1990; Wang et al., 2022; Wood, 2007, 2011). For example, Smyth et al. (1988) used a movement span task, in which participants watched a sequence of body movements (e.g., head turn, arm raise, leg raise, etc.) performed by experimenters face-to-face and then reproduced those movements themselves. The authors found that participants can remember "just over four" movements when encoded without a suppression task and 3-4 movements when encoded with a concurrent verbal (i.e., counting) or motor (i.e., body tapping) task (see also Smyth & Pendleton, 1989, 1990). More recently, Wood (2007) measured the storage capacity in a more elegant way by using a change detection task adapted from Luck and Vogel (1997). On each trial, participants observed a sample sequence of movements, containing movements similar to those used in Smyth et al. (1988) but performed by a computer-generated avatar and displayed on a computer screen. After a short delay, they observed a test sequence and judged whether the two sequences were the same or different. The author found that participants can only retain 2-3 movements or properties of movements in working memory (see also Wood, 2011). Different from the aforementioned studies, Shen et al. (2014) used meaningful movements (i.e., movements with action semantics, e.g., cycling, jumping, waving, etc.) performed by point-light figures as visual stimuli and presented them simultaneously (rather than sequentially) in a change detection task. They found that participants can retain 3-4 movements in working memory when sufficient processing time was given.
Furthermore, the complexity of movements may also influence working memory capacity. For example, Wang et al. (2022) decomposed biological motions from the perspective of systematic anatomy and found that complex biological motions, i.e., motions containing more joints (elbow, shoulder, and wrist) and planes (horizontal, sagittal, frontal) of the body, required more cognitive resources for verbal encoding and thus are more difficult to remember. The finding is consistent with previous research showing that the higher the stimulus complexity (i.e., information load per item), the fewer items one can hold in memory (Alvarez & Cavanagh, 2004).
Although the previous research has provided valuable insights into how movement information is represented and processed in working memory, most of the studies used isolated simple movements as visual stimuli. It is therefore unclear how the information conveyed through complex whole-body movements might be encoded and retained in working memory.
In addition, a large amount of work has focused on the capacity limitation of working memory, with an aim to determine the number of movement units that can be retained at once. Yet less work has addressed the relation between the units and the hierarchical structure of working memory representations, namely the possibility that the information can be encoded as a summation of individual units or an ensemble "whole" that integrates information across all units, such as a perceptual group or ensemble statistics (e.g., Brady et al., 2011; Brady & Alvarez, 2015a; Brady & Tenenbaum, 2013; Liesefeld et al., 2019).

The present research
To address the unsolved issues raised above, we used a change detection task similar to those used in previous studies (e.g., Wood, 2007) to investigate how metric complex rhythms conveyed through complex whole-body movement sequences are perceived, encoded, and retained in working memory and how the sequence length and maintenance delay modulate memory performance. Importantly, we examined working memory for movement rhythms after considering the task-relevance of the corresponding movement trajectories and the sensitivity difference between spatial and temporal processing in visual perception. As discussed previously, humans are in general more sensitive to spatial than to temporal information of movements partly due to behavioural relevance of spatial information in natural observation. Moreover, to explore how a movement sequence might be represented in memory when "a summation of parts" and "an integrated whole" are both possible, we defined the "change" in the change detection task in terms of the whole sequence rather than a single unit of the sequence in both spatial and temporal domains. As a result, change detection in the current study may be supported by high-precision representations of individual units and/or low-precision "gist" representations of the whole sequence.
We started from designing a stimulus pool that contained whole-body movement sequences resembling real-world materials in dance or martial arts, namely sequences composed of movements that were seen as an intended outcome rather than a means to an end (Schachner & Carey, 2013). All movements were thus without interpretable external goals and action semantics. We then applied metric complex rhythms to the sequences. Rhythms were designed and paced to deliver high temporal contrast both within and between the sequences (see "Method" below for more details).
In the pilot experiment, we tested whether participants are more sensitive to spatial (trajectory) than to temporal (rhythm) changes of movements when observing movement sequences from the stimulus pool we created. Next, given the high discriminability of the spatial information and the sensitivity difference between spatial and temporal processing (i.e., spatial bias) observed in the pilot experiment, we investigated in the main experiment how the sequence length and maintenance delay influence working memory for movement rhythms when the perceptually salient spatial information is also task-relevant. In general, we predicted that longer movement sequences would result in worse memory performance ("set-size effect") (e.g., Cowan, 2001) and that longer maintenance delay would lead to more severe performance decline ("forgetting") (Brown, 1958; see Ricker et al., 2016 for a review).

Participants
Twenty-eight participants (19 female; aged 19-38 years, M = 27.3, SD = 4.6) were recruited for the experiment. A minimum sample size of 19 was determined to provide a power of .90 at an alpha of .05 to detect a large within-subjects effect (d = 0.8) for information type (spatial vs. temporal) on change-detection performance. Based on our literature review and practical observations, we expected a large sensitivity difference between spatial and temporal processing. Nevertheless, since the current stimulus set was newly created, we intentionally adopted a slightly larger sample size that was around 30 to gain as well a representative description of the stimulus set. The final sample size of 28 would provide a power of .98 for large effects (d = 0.8) or a power of .91 for medium-to-large effects (d = 0.65). Power analyses were conducted by using G*Power software (Faul et al., 2007).
Participants' experiences in dance, music, and sport were evaluated by a questionnaire and reported here as expertise indexes (0: No experience, 1: Beginner, 2: Intermediate amateur, 3: Advanced amateur, 4: Professional) of 0.5 (SD = 0.6), 0.6 (SD = 0.8), and 1.3 (SD = 1.0), respectively, defined by both the self-evaluated skill level and the training length (see Footnote 1). No professionals were recruited in the present research. Participants signed informed consent prior to the experiment and received €8 per hour for their participation. The two experiments presented in this paper were conducted in accordance with the ethical principles stated within the Declaration of Helsinki (1964) and were approved by the Ethics Committee of Bielefeld University.

Stimuli and apparatus
A stimulus pool of 44 whole-body movement sequences (each performed in four different rhythms) was originally designed and recorded for this research. Half of the sequences were performed by a female dancer and the other half were performed by a male dancer; both were professionally trained in classical and contemporary dance. As illustrated in Figure 1, each movement sequence was composed of four linked movement units. A movement unit was defined as a coordinated whole-body movement that can be performed with a single bell-shaped velocity profile (i.e., accelerating till the midpoint of the movement and then decelerating) (Abend et al., 1982; see Footnote 2) and thus had a clear starting point and ending point where the velocity was zero. To better resemble the continuous nature of real-world movement sequences, the ending pose of the first unit was linked with the starting pose of the second one, and so on, to create a relatively continuous trajectory. Movements were all without interpretable external goals and action semantics to diminish the influence from long-term semantic memory.
In addition, each sequence was performed in four metric complex rhythms: 3212, 2132, 1223, and 2321 (see below for the rationale behind the design of rhythms). Each rhythm was eight beats long in a 4/4 musical metre and paced at a tempo of 90 beats per minute (bpm), yielding a sequence length of around 6 s after including one additional beat for preparation. Each rhythm was composed of four temporal durations (one 1-beat duration, two 2-beat durations, and one 3-beat duration), corresponding to four movement units of a sequence. When the same movement unit was performed, a shorter duration also implied a higher speed. Note that the tempo of 90 bpm was chosen as it was the highest speed at which most of the 1-beat units can be precisely performed. The tempo of 100 bpm had also been tested by the dancers before the recording, but it was not adopted due to high imprecision in movement execution.
To ensure that each rhythm had its own salient feature to be recognized and that each pair of the rhythms, when different, had sufficient temporal contrast to be detected, the four rhythms were designed to have one single salient component, i.e., the 1-beat unit, equally distributed across the four serial positions of a sequence. Since the 1-beat unit yielded a relatively high speed of movement (i.e., at a speed over which the movement might not be precisely performed by the human body), it was assumed to be salient enough to be detected within a sequence. This design also ensured the discriminability between any two rhythms, as the 1-beat unit in the sample sequence would always be compared with either a 2-beat or a 3-beat unit (i.e., a unit that was performed two or three times as slowly) in the test sequence. Moreover, to equalize the level of discriminability across trials, the overall temporal contrast, defined as the sum of absolute duration differences (by unit) between the two rhythms of a pair, was also controlled. For example, the overall contrast between the rhythms 3212 and 2132 was |3 − 2| + |2 − 1| + |1 − 3| + |2 − 2| = 4, which was equal to the overall contrast between 1223 and 2321 or between 2132 and 1223.
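The timing and contrast arithmetic described above can be checked with a short sketch. It assumes that each digit of a rhythm code is the duration of one movement unit in beats; the names `RHYTHMS`, `sequence_duration_s`, and `temporal_contrast` are illustrative and are not taken from the authors' materials.

```python
from itertools import combinations

# Rhythm codes as listed in the text; each digit is assumed to be
# the duration of one movement unit in beats.
RHYTHMS = ["3212", "2132", "1223", "2321"]
TEMPO_BPM = 90          # tempo reported in the text
PREPARATION_BEATS = 1   # one additional beat for preparation


def sequence_duration_s(rhythm, tempo_bpm=TEMPO_BPM, prep_beats=PREPARATION_BEATS):
    """Duration of a sequence in seconds, including the preparation beat."""
    beats = sum(int(digit) for digit in rhythm) + prep_beats
    return beats * 60.0 / tempo_bpm


def temporal_contrast(rhythm_a, rhythm_b):
    """Sum of absolute unit-by-unit duration differences, in beats."""
    return sum(abs(int(a) - int(b)) for a, b in zip(rhythm_a, rhythm_b))


durations = {r: sequence_duration_s(r) for r in RHYTHMS}
contrasts = {(a, b): temporal_contrast(a, b) for a, b in combinations(RHYTHMS, 2)}
# Serial position (1-based) of the 1-beat unit in each rhythm.
one_beat_positions = sorted(r.index("1") + 1 for r in RHYTHMS)
```

Under these assumptions every rhythm lasts 6 s, all six rhythm pairs have an overall contrast of 4, and the 1-beat unit occupies each of the four serial positions exactly once.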
All movement recordings were first scrutinized by the experimenters to ensure the consistency of trajectory and the precision of rhythm. Eight sequences were not used due to the imprecise execution of the rhythm, leaving a total of 36 sequences in the stimulus pool that were deemed to have sufficient and comparable temporal contrast to be detected, i.e., relatively homogeneous in this sense. Four sequences (2 female, 2 male) were used in the practice trials and 24 sequences (12 female, 12 male) were used in the experimental trials. It would have been ideal to use all remaining 32 sequences in the experimental trials to gain a representative description of the stimulus pool. However, considering the large number of trials required by a balanced design and the resulting length of the experiment (approaching 3 hours), we randomly chose 24 sequences from the stimulus pool for the current experiment.
Video clips were displayed silently to participants at 1600 × 900 pixels on a 24-inch LCD screen (Dell U2412M) with a viewing distance of approximately 50 cm. The experimental flow and data analysis were programmed in Python; stimulus presentation was implemented with the PsychoPy software package (Peirce, 2007, 2009). More information about the stimulus design, recording, and post-editing as well as three sample video clips that illustrate movement sequences with different trajectories and/or rhythms were published in another study of our group (see Chiou, 2022a and Footnote 3).

Design and procedure
Participants performed a change detection task on whole-body movement sequences that were either the same or different in temporal (rhythm) and/or spatial (trajectory) domain(s). As mentioned previously, the "change" we referred to in the current study was the change of the whole sequence (i.e., movement trajectory or movement rhythm of the sequence) rather than the change of a single unit of the sequence. The design yielded four trial conditions: (1) same condition, in which the sample and the test sequences had the same trajectory and the same rhythm, (2) temporal condition, in which the sample and the test sequences had the same trajectory but different rhythms, (3) spatial condition, in which the sample and the test sequences had the same rhythm but different trajectories, and (4) both condition, in which the sample and the test sequences had different trajectories and different rhythms. Each condition accounted for one fourth of the total trials.
One may suspect that an uneven a priori probability of a same vs. different response (1/4 vs. 3/4) may lead to a response bias. However, as we expected that participants were likely to use the spatial cue as the first decision cue (i.e., to judge first whether the sample and the test sequences have the same trajectory) due to the sensitivity difference between spatial and temporal processing (see "Data analysis" below for more discussion), the response probability in each domain-specific decision space should be more relevant to participants' decision behaviour than the response probability at the global level. Specifically, decisions in the spatial domain, based on the current design, would be relatively unbiased since the "same" trials (including trials in the same and temporal conditions wherein movement sequences had the same trajectory) accounted for 50% of the total trials; decisions in the temporal domain (given the same trajectory) would also be unbiased as the "same" trials (trials in the same condition) accounted for 50% of the total trials (including trials in the same and temporal conditions) as well.
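The base rates in this argument can be made explicit with a minimal sketch. It only encodes what the text states (four conditions, each one fourth of the trials); the variable names are illustrative.

```python
from fractions import Fraction

# Each condition as (same trajectory?, same rhythm?); each accounts for
# one fourth of the trials according to the text.
CONDITIONS = {
    "same": (True, True),
    "temporal": (True, False),
    "spatial": (False, True),
    "both": (False, False),
}
P_CONDITION = Fraction(1, 4)

# Global probability that "same" is the correct response.
p_same_global = sum(
    P_CONDITION for same_traj, same_rhy in CONDITIONS.values() if same_traj and same_rhy
)
# Spatial decision: probability that the trajectory is the same.
p_same_trajectory = sum(
    P_CONDITION for same_traj, _ in CONDITIONS.values() if same_traj
)
# Temporal decision, given that the trajectory is the same.
p_same_rhythm_given_same_trajectory = p_same_global / p_same_trajectory
```

The global probability of a "same" trial is 1/4, whereas each domain-specific decision, taken in the order spatial then temporal, faces a 1/2 probability of "same".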
To probe potential effects of forgetting and to provide pilot results for the design of the main experiment, the retention interval was also manipulated. Previous research on working memory for visual materials has shown that performance deteriorated when the retention interval increased from 1 s to 6 s and to 12 s. Considering the extended length of the current stimuli (6 s), we decided to use retention intervals of 0.5 s and 3 s (not including the 1-s fixation cross before the display of the test sequence, see below and Figure 2) for this pilot experiment.
At the beginning of the experiment, spatial and temporal changes illustrated by simplified animations on a computer screen (i.e., a square moving along a straight line or a curved line to illustrate a difference in "path", and a square moving along a straight line with a low or a high speed to illustrate a difference in "speed") were explained verbally to participants. The words path and speed (instead of trajectory and rhythm) were used to make the concepts more understandable to participants. In addition, participants were clearly informed that they should attend to both possible changes and make a "yes" (i.e., "same") judgment only when both the path and the speed were the same.
Each trial began with a 1-s fixation cross (+) followed by a sample sequence and a retention interval of 0.5 s or 3 s, masked by a black-white chessboard pattern. Then, another 1-s fixation cross was presented followed by a test sequence and a mask for 0.5 s. After the mask, a question "Are video 1 and video 2 the same?" was presented, and participants were required to make a yes/no judgment by keystroke on a standard computer keyboard ("F" key and "J" key, respectively, marked in red). They were instructed to respond as accurately as possible without a strict time constraint (Figure 2).
Before the formal experiment, participants completed 6 practice trials with movement sequences selected from the stimulus pool but not used in the formal experiment. Feedback, as shown by the word correct or incorrect on the computer screen, was provided on a trial-by-trial basis, followed by an automatic replay of the trial if an incorrect response had been made. The goal of this arrangement was to ensure participants' understanding of the judgement rules, e.g., not to generalize across movements with different speeds. No feedback or trial replay was provided in the formal experiment. In total, 288 trials were completed in six blocks. Trials were fully randomized across conditions within participants. The entire experiment lasted about 100 minutes including short breaks between blocks.

Data analysis
Participants' performance in the change detection task was measured by the proportion of correct responses, defined as the mean of hit rate (correctly responding "same" on same trials) and correct-rejection rate (correctly responding "different" on different trials). Considering potential influence from response bias, we also reported the sensitivity measures d' and A', which are relatively unaffected by response bias according to signal detection theory (SDT) (Green & Swets, 1966; Macmillan & Creelman, 1991). A' is a nonparametric measure of sensitivity and is commonly used when d' assumptions (i.e., the signal and noise distributions are Gaussian and equal in variance) are violated (Pastore et al., 2003; Stanislaw & Todorov, 1999; see Footnote 4). In addition, to better understand participants' decision process, the decision criterion c was also reported and discussed.
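The four measures named above can be sketched from a hit rate and a false alarm rate as follows. This is a minimal illustration using the standard SDT definitions and the A' formula given by Stanislaw and Todorov (1999), not the authors' analysis code; the function names are illustrative. Rates of exactly 0 or 1 cannot be z-transformed and would need a correction that the text does not specify here, so the sketch leaves it out.

```python
from statistics import NormalDist

# Inverse of the standard normal cumulative distribution function.
_z = NormalDist().inv_cdf


def proportion_correct(hit, fa):
    """Mean of hit rate and correct-rejection rate (1 - false alarm rate)."""
    return (hit + (1.0 - fa)) / 2.0


def d_prime(hit, fa):
    """Parametric sensitivity: z(hit) - z(fa)."""
    return _z(hit) - _z(fa)


def criterion_c(hit, fa):
    """Decision criterion; negative values indicate a bias toward responding 'same'."""
    return -0.5 * (_z(hit) + _z(fa))


def a_prime(hit, fa):
    """Nonparametric sensitivity (formula as in Stanislaw & Todorov, 1999)."""
    if hit == fa:
        return 0.5
    sign = 1.0 if hit > fa else -1.0
    numerator = (hit - fa) ** 2 + abs(hit - fa)
    denominator = 4.0 * max(hit, fa) - 4.0 * hit * fa
    return 0.5 + sign * numerator / denominator
```

For instance, `criterion_c(0.9, 0.3)` is negative because both the hit rate and the false alarm rate are shifted toward "same" responses.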
As we expected that participants' sensitivity to spatial changes might be much higher than that to temporal changes, this sensitivity difference should be considered when calculating domain-specific performance measures. Specifically, the question "Are video 1 and video 2 the same?" implied two independent questions: "Do they have the same trajectory?" and "Do they have the same rhythm?" In other words, there were two feature dimensions (spatial, temporal) involved in the decision space. As an overall hit rate obtained from the same trials can only reflect an evaluation of evidence distributed across spatial and temporal dimensions rather than an evaluation of evidence in each dimension, respectively, the overall hit rate may highly underestimate participants' performance in the spatial domain if a spatial bias exists. To solve this problem, we used an adapted approach inspired by Luan et al. (2011), combining the SDT analysis with a two-cue fast-and-frugal tree (FFT) to calculate domain-specific hit rates and false alarm rates (incorrectly responding "same" on different trials, i.e., complementary to correct-rejection rates) (see Figure 3 and Chiou, 2022a for more details).
Figure 2. Trial structure of the change detection task in the pilot experiment. A sample sequence and a test sequence (both in the same length of 4 units) were displayed sequentially with a retention interval of 0.5 s or 3 s in between (not including the 1-s fixation cross before the display of the test sequence). The two sequences were either the same or different in spatial (denoted as A or B) and/or temporal (denoted as T1 or T2) domain(s). Participants were required to attend to both possible changes and to make judgments on whether the two sequences were the same ("yes") or different ("no").
The overall hit rate calculated from the same trials ($P(\mathrm{Hit})_{\mathrm{same}}$) and the false alarm rates calculated from the three types of different trials, i.e., temporal ($P(\mathrm{FA})_{\mathrm{temporal}}$), spatial ($P(\mathrm{FA})_{\mathrm{spatial}}$), and both ($P(\mathrm{FA})_{\mathrm{both}}$), can be expressed as the following:
$$P(\mathrm{Hit})_{\mathrm{same}} = P(x_{ss} > x_{cs}) \times P(x_{st} > x_{ct}) = \mathrm{Hit}_s \times \mathrm{Hit}_t \tag{1}$$
$$P(\mathrm{FA})_{\mathrm{temporal}} = P(x_{ss} > x_{cs}) \times P(x_{nt} > x_{ct}) = \mathrm{Hit}_s \times \mathrm{FA}_t \tag{2}$$
$$P(\mathrm{FA})_{\mathrm{spatial}} = P(x_{ns} > x_{cs}) \times P(x_{st} > x_{ct}) = \mathrm{FA}_s \times \mathrm{Hit}_t \tag{3}$$
$$P(\mathrm{FA})_{\mathrm{both}} = P(x_{ns} > x_{cs}) \times P(x_{nt} > x_{ct}) = \mathrm{FA}_s \times \mathrm{FA}_t \tag{4}$$
where $x_{ss}$ and $x_{ns}$ are decision variables in the spatial domain when drawn from the signal or noise category; $x_{st}$ and $x_{nt}$ are decision variables in the temporal domain when drawn from the signal or noise category; $x_{cs}$ and $x_{ct}$ are decision criteria in the spatial and temporal domains, respectively. The hit rates and false alarm rates of the two decision cues are denoted as $\mathrm{Hit}_s$, $\mathrm{FA}_s$ (spatial domain), and $\mathrm{Hit}_t$, $\mathrm{FA}_t$ (temporal domain). Strictly speaking, $\mathrm{Hit}_s$, $\mathrm{FA}_s$, $\mathrm{Hit}_t$, and $\mathrm{FA}_t$ cannot be solved directly from equations (1)−(4). We therefore added one additional assumption that participants' miss rates (incorrectly responding "different" on same trials, i.e., complementary to hit rates), denoted as "Miss", were proportional to their false alarm rates in the respective domains, since the incorrect judgments in both same and different trials mainly resulted from the same inaccurate working memory representation of the sample sequence:
$$\mathrm{Miss}_s : \mathrm{Miss}_t = (1 - \mathrm{Hit}_s) : (1 - \mathrm{Hit}_t) = \mathrm{FA}_s : \mathrm{FA}_t \tag{5}$$
If $\mathrm{FA}_s = \mathrm{FA}_t = 0$, $\mathrm{Hit}_s$ and $\mathrm{Hit}_t$ would be unsolvable. Under this condition, if $P(\mathrm{Hit})_{\mathrm{same}} = 1$, it was assumed that $\mathrm{Hit}_s = \mathrm{Hit}_t = 1$ (i.e., perfect discrimination); if $P(\mathrm{Hit})_{\mathrm{same}} < 1$, false alarm rates of a similar within-subjects category would be used as an approximation. For example, the ratio of $\mathrm{FA}_s$ to $\mathrm{FA}_t$ in block 1 would be used as an approximation for that in block 2. If no within-subjects data were available, a group average would be used instead. By solving equations (1), (2), (3), and (5), domain-specific hit rates and false alarm rates (i.e., $\mathrm{Hit}_s$, $\mathrm{FA}_s$, $\mathrm{Hit}_t$, $\mathrm{FA}_t$) can be obtained and then used to calculate the proportion correct, d', A', and c for spatial and temporal information, respectively.
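One way to solve for the domain-specific rates is sketched below. It assumes that the sequential two-cue tree makes each observed rate a product of the cue-level rates (same trials: Hit_s × Hit_t; temporal trials: Hit_s × FA_t; spatial trials: FA_s × Hit_t), in line with the product FA_s × FA_t that the text gives for the both condition, and it combines these with the proportionality assumption (5). Substituting the products into (5) yields a quadratic in Hit_s; this closed form is a derivation made for illustration, not necessarily the procedure the authors implemented, and `solve_domain_rates` is a hypothetical name. The fallback rules for FA_s = FA_t = 0 described above are not implemented.

```python
import math


def solve_domain_rates(p_hit_same, p_fa_temporal, p_fa_spatial):
    """Return (hit_s, fa_s, hit_t, fa_t) from three observed response rates.

    Assumed model (an illustrative reconstruction):
        p_hit_same    = hit_s * hit_t
        p_fa_temporal = hit_s * fa_t
        p_fa_spatial  = fa_s  * hit_t
        (1 - hit_s) : (1 - hit_t) = fa_s : fa_t
    """
    a, b, c = p_hit_same, p_fa_temporal, p_fa_spatial
    if a <= 0.0:
        raise ValueError("overall hit rate must be positive")
    if b == 0.0 and c == 0.0:
        raise ValueError("FA_s = FA_t = 0: the system is not solvable")
    if c == 0.0:
        # With c = 0 the quadratic below reduces to a linear equation
        # whose solution is hit_s = 1 (no spatial misses).
        hit_s = 1.0
    else:
        # Substitution gives c*h^2 + a*(b - c)*h - a*b = 0 for h = hit_s;
        # the positive root lies between a and 1.
        discriminant = (a * (b - c)) ** 2 + 4.0 * a * b * c
        hit_s = (-a * (b - c) + math.sqrt(discriminant)) / (2.0 * c)
    hit_t = a / hit_s
    fa_s = c / hit_t
    fa_t = b / hit_s
    return hit_s, fa_s, hit_t, fa_t


# Hypothetical rates, chosen only to illustrate the calculation.
hit_s, fa_s, hit_t, fa_t = solve_domain_rates(0.76, 0.38, 0.08)
predicted_fa_both = fa_s * fa_t
```

The product `fa_s * fa_t` gives the false alarm rate that the model predicts for the both condition, which can be compared with the observed rate as a check on the approximation.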
Note that the both condition was a redundant condition, as participants required only the change information in one single dimension (either spatial or temporal) to make a "different" judgment. Therefore, we did not use the data from the both condition to calculate the performance measures. As discussed previously, only equations (1), (2), (3), and (5) and the data from the same ($P(\mathrm{Hit})_{\mathrm{same}}$), temporal ($P(\mathrm{FA})_{\mathrm{temporal}}$), and spatial ($P(\mathrm{FA})_{\mathrm{spatial}}$) conditions were relevant to the calculation. The data from the both condition were used here to verify the approximated domain-specific measures. Specifically, we compared $P(\mathrm{FA})_{\mathrm{both}}$ obtained from the data points of the both condition with the results calculated from the approximated $\mathrm{FA}_s \times \mathrm{FA}_t$ (see equation 4). The results showed that the deviation measured by the mean of absolute difference was less than 2% (M = 1.48%, SD = 1.98%).
We set the statistical threshold of Type I error at α = .05 and reported Cohen's d and partial eta squared ($\eta_p^2$) to indicate effect size. Post hoc analyses were conducted using Bonferroni correction, and Greenhouse-Geisser correction was applied when the assumption of sphericity had been violated in an analysis of variance (ANOVA).

Results
We first conducted a 2 (information type: temporal, spatial) × 2 (retention interval: 0.5, 3 s) repeated-measures ANOVA on proportion correct. The analysis yielded a significant main effect for information type, [...] indicating that participants had a bias, albeit small, toward responding "yes" (or "same") in the change detection task (negative values of c lie to the left of the neutral point, where neither response is favoured). In addition, this tendency was larger in temporal than in spatial judgments, t(27) = -3.56, p = .001, d = -0.67. Note that, if participants' decision process were affected by the global-level response probability (1/4 same trials, 3/4 different trials), the values of c should have been positive, indicating a bias toward responding "no" (or "different"), as the different trials were three times as many as the same trials. The results therefore support the rationale behind our experimental design, namely that domain-specific response probability was more relevant to decision behaviour than global-level response probability. Moreover, due to the sensitivity difference between spatial and temporal processing, movement sequences in the temporal condition, albeit different, looked more similar than those in the spatial condition. This impression of perceptual similarity might result in a stronger bias toward responding "same" in temporal judgments.
Figure 3. Participants' sensitivity to spatial (trajectory) changes was assumed to be higher than that to temporal (rhythm) changes, and thus the spatial cue is more likely to be used as the first decision cue. Decision criteria are placed, for simplicity, where the noise and signal distributions intersect, although each cue may have its own decision criterion, which is not necessarily unbiased.

Discussion
The pilot experiment showed that participants were more sensitive to spatial (trajectory) than to temporal (rhythm) changes when observing whole-body movement sequences from the stimulus pool we created, consistent with previous findings that spatial information is more dominant in the visual processing of movements (e.g., Giese et al., 2008). Although the aim of the pilot experiment was to test whether there is a spatial bias when processing the stimulus set we created rather than to test whether there is a spatial bias in the perception of human movements in general, it should be noted that we did not purposely create a stimulus set in which the spatial bias existed. Instead, we created a stimulus set that resembled real-world movements and was performed with a realistic range of speeds, from a relatively high speed (for the human body to perform) to a speed that was two or three times as slow. Therefore, the finding that a sensitivity difference exists between spatial and temporal processing should generalize, to some extent, to real-world conditions. Interestingly, performance was not affected by the length of the retention interval, contradicting our prediction that a longer maintenance delay would lead to a more severe performance decline. Since change-detection performance in both the spatial and temporal domains was above the chance level of 50%, indicating that participants could still differentiate between the sequences, the finding suggests that change detection in the current experiment may rely on memory representations that last over 3 s. In other words, a retention interval of 3 s may be too short to induce any observable forgetting.
In the following main experiment, we investigated how the sequence length and maintenance delay influence working memory for movement rhythms given the high discriminability of the spatial information and the presence of the spatial bias observed in the pilot experiment. We also increased the longest retention interval from 3 s to 6 s to capture potential performance decline due to temporal forgetting.

Participants
Twenty-seven new participants were recruited for the experiment. One was excluded from analyses due to below-chance performance (more than 3 standard deviations from the group mean), leaving a final sample of 26 participants (16 female; aged 18-38 years, M = 25.2, SD = 5.4). The original sample size of 27 was determined by power analysis based on medium-to-large effects (d = 0.65) of sequence length (set-size effect) and retention interval (temporal forgetting) (see Oberauer et al., 2018 for a review) to provide a power of .90 at an alpha of .05. Participants' expertise indexes in dance, music, and sport were 0.5 (SD = 0.7), 1.0 (SD = 1.0), and 0.9 (SD = 0.9), indicating limited expertise in the respective fields.
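The sample-size reasoning can be illustrated with a rough calculation. The sketch below assumes a two-sided, within-participant (one-sample or paired) comparison and uses the normal approximation with Guenther's small-sample correction; the text does not state which test or software the power analysis was based on, so the function name and these choices are assumptions rather than the authors' procedure.

```python
from math import ceil
from statistics import NormalDist


def approx_sample_size(effect_size_d: float, alpha: float = 0.05, power: float = 0.90) -> int:
    """Approximate n for a two-sided one-sample/paired t test.

    Assumption: normal approximation plus Guenther's correction term
    (z_alpha^2 / 2); an exact noncentral-t calculation may differ slightly.
    """
    if effect_size_d <= 0:
        raise ValueError("effect size must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = NormalDist().inv_cdf(power)          # quantile for the target power
    n_normal = ((z_alpha + z_power) / effect_size_d) ** 2
    return ceil(n_normal + z_alpha ** 2 / 2)


if __name__ == "__main__":
    print(approx_sample_size(0.65))
```

Under these assumptions, d = 0.65 with α = .05 and power = .90 gives a sample size of 27, in line with the number reported above.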

Stimuli
Twenty-four movement sequences (12 female, 12 male) were used in this experiment; two-thirds were the same as in the pilot experiment, while one-third were replaced by new sequences from the stimulus pool (i.e., the remaining 8 "unused" sequences of the pre-selected 36 sequences, see "Stimuli and apparatus" in the pilot experiment). Each movement sequence was performed in four metric complex rhythms (i.e., 3212, 2132, 1223, and 2321), as in the pilot experiment.

Design and procedure
Participants performed the change detection task on whole-body movement sequences with a sequence length of 1, 2, 3, or 4 unit(s) and a retention interval of 0.5, 2, 4, or 6 s (Figure 5). To control for the starting pose, movement sequences, albeit of different lengths, were all displayed from movement unit 1 (i.e., the first unit of the sequence, see Figure 1). For instance, a sequence with a length of 2 units was defined as a sequence composed of unit 1 and unit 2, and a sequence with a length of 3 units was defined as a sequence composed of unit 1, unit 2, and unit 3. The sample and the test sequences (both of the same length) were either the same or different in the temporal (rhythm) or spatial (trajectory) domain, yielding three trial conditions, i.e., same, temporal, and spatial, containing 96, 120, and 72 trials, respectively. The "both" condition was not included in the main experiment, as it was redundant and irrelevant to the calculation of the domain-specific performance measures. In other words, albeit slightly different in design, data of the pilot and the main experiments were analyzed in the same way.
Figure 5. Trial structure of the change detection task in the main experiment. A sample sequence and a test sequence (both of the same length of 1, 2, 3, or 4 units) were displayed sequentially with a retention interval of 0.5, 2, 4, or 6 s in between (not including the 1-s fixation cross before the display of the test sequence). The two sequences could be the same or different in either the spatial (denoted as A or B) or temporal (denoted as T1 or T2) domain. Participants were required to attend to both possible changes and to make judgments on whether the two sequences were the same ("yes") or different ("no").
Since the current experiment aimed to investigate how temporal information is processed when the perceptually salient spatial information is also task-relevant, the spatial trials (i.e., trials with spatial changes) were used here as "fillers" to ensure that participants did process spatial information as requested and thus accounted for a smaller portion of the total trials. As the spatial performance was close to ceiling (i.e., the spatial discriminability was high as shown in the pilot experiment), the uneven trial distribution (216:72) and the resulting response bias, if any, should have limited influence on the change detection performance in the spatial domain. For temporal judgments, however, as the performance was far from optimal, the uneven trial distribution (96:120) may bias participants' decisions. It would be ideal if the same trials could have accounted for 50% of the total trials, but in order to control the total number of trials performed in the experiment (i.e., 288) and to maintain a balanced design given the number of movement sequences (i.e., 24), we adjusted the trial distribution in a minimal way. In response to this adjustment and the potential influence from response bias, we reported the sensitivity measure A' together with the proportion of correct responses to provide a relatively fair evaluation of the performance (d' was not calculated due to the insufficient number of trials per condition). The decision criterion c is also reported and discussed.
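For readers unfamiliar with these measures, the sketch below shows one common way to compute them: the nonparametric sensitivity index A' in the form given by Grier (1971) and the signal-detection criterion c = -0.5 × [z(H) + z(F)]. The function names are hypothetical, and the section does not state which formula variant or which mapping of "same"/"different" responses onto hits and false alarms was used, so this is an illustration rather than the authors' computation.

```python
from statistics import NormalDist


def a_prime(hit_rate: float, fa_rate: float) -> float:
    """Nonparametric sensitivity A' (Grier's formula); 0.5 means chance."""
    h, f = hit_rate, fa_rate
    if h == f:
        return 0.5
    if h > f:
        return 0.5 + ((h - f) * (1 + h - f)) / (4 * h * (1 - f))
    # Below-chance case: mirror the formula around 0.5.
    return 0.5 - ((f - h) * (1 + f - h)) / (4 * f * (1 - h))


def criterion_c(hit_rate: float, fa_rate: float) -> float:
    """Decision criterion c; negative values indicate a bias toward "yes".

    Rates of exactly 0 or 1 have no finite z score and must be adjusted
    (for example, with a small correction) before calling this function.
    """
    z = NormalDist().inv_cdf
    return -0.5 * (z(hit_rate) + z(fa_rate))


if __name__ == "__main__":
    # Made-up rates for illustration, not data from the experiment.
    print(a_prime(0.8, 0.2), criterion_c(0.9, 0.4))
```

With symmetric rates such as H = .80 and F = .20, A' is .875 and c is zero (no bias); raising both rates shifts c below zero, which corresponds to a more liberal tendency to respond "yes".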
Moreover, temporal changes with less salience (i.e., more difficult to discriminate due to a lack of 1-beat unit in either the sample or test sequence), namely 2 (beats) vs. 3 (beats) for 1-unit sequences and 23 vs. 32 for 2-unit sequences, were not included in the experimental trials. Due to a larger variety of trial types (16 possible combinations resulting from 4 sequence lengths and 4 retention intervals), participants performed 14 practice trials before the formal experiment, which contained 288 trials in six blocks. Trials were fully randomized across conditions within participants. Other procedures were the same as in the pilot experiment. The experiment took about 70 minutes to complete.
Further analyses indicated that temporal performance was modulated by the retention interval when the sequence length was relatively short (i.e., 1 unit or 2 units), proportion correct: F(3, 75) = 4.01, p = .010, ηp² = .14; A': F(2.23, 55.8) = 3.57, p = .030, ηp² = .13 for 1-unit sequences, and proportion correct: F(3, 75) = 5.45, p = .002, ηp² = .18; A': F(2.24, 56.0) = 5.54, p = .005, ηp² = .18 for 2-unit sequences. Specifically, performance declined when the retention interval increased to 6 s from 0.5 s or 2 s, the smallest p = .022 on proportion correct and the smallest p = .014 on A'. However, when the sequence length was longer (i.e., 3 units or 4 units), performance remained unchanged irrespective of the length of retention interval, the smallest p = .277 on proportion correct and the smallest p = .521 on A'. The finding is consistent with the pilot experiment in which no performance deterioration was observed on 4-unit sequences when the retention interval increased from 0.5 s to 3 s. Moreover, sequence length also influenced working memory for temporal information, especially when the retention interval was short (i.e., 0.5 s or 2 s), proportion correct: F(3, 75) = 11.6, p < .001, ηp² = .32; [...] was 2 s. Further analyses showed that performance for short sequences (1 unit or 2 units) was better than that for long sequences (3 units or 4 units), the smallest p < .001 on proportion correct and the smallest p = .001 on A'. However, when the retention interval was long (i.e., 4 s or 6 s), there were only 1-unit sequences that showed a superior performance when measured by proportion correct, the smallest p = .023.
Figure 6. Performance (measured by proportion of correct responses) of the main experiment in (a) temporal and (b) spatial domains, respectively, when the retention interval was 0.5, 2, 4, or 6 s and the sequence length was 1, 2, 3, or 4 unit(s). Error bars represent one standard deviation of the mean.
As in the pilot experiment, the values of decision criterion (c) in spatial (M = -0.14, SD = 0.14) and temporal (M = -0.31, SD = 0.35) domains were lower than zero, t(25) = -5.15, p < .001, d = -1.01, for spatial judgments; t(25) = -4.52, p < .001, d = -0.89, for temporal judgments, indicating that participants had a bias, albeit small, toward responding "yes" (or "same") in the change detection task. Since the global-level response probability (1/3 same trials, 2/3 different trials) would predict a bias toward responding "no" (or "different"), the finding indicates that participants' decision process was not significantly affected by the global-level response probability. Instead, a bias toward a "same" response, which was larger in the temporal than in the spatial domain, t(25) = -3.29, p = .003, d = -0.64, might reflect an influence from perceptual similarity (i.e., movement sequences in the temporal condition looked more similar than those in the spatial condition). It is worth noting that, although the a priori response probabilities of the pilot and the main experiments were not equivalent, the values of c in the two experiments did not differ significantly, t(52) = 1.48, p = .144, d = 0.40, for spatial judgments; t(52) = -0.74, p = .462, d = -0.20, for temporal judgments. Therefore, the results of the two experiments should be fairly comparable in this regard.
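The effect sizes in this paragraph can be related to the t statistics through two standard conversions: d = t / √n for a one-sample or paired test, and d = t × √(1/n1 + 1/n2) for an independent-samples test. The sketch below assumes n = 26 for the main experiment and n = 28 for the pilot experiment, as implied by the reported degrees of freedom; the function names are hypothetical, and the text does not state which variant of Cohen's d was computed.

```python
from math import sqrt


def d_from_one_sample_t(t_value: float, n: int) -> float:
    """Cohen's d for a one-sample or paired t test (d = t / sqrt(n))."""
    return t_value / sqrt(n)


def d_from_independent_t(t_value: float, n1: int, n2: int) -> float:
    """Cohen's d for an independent-samples t test with pooled variance."""
    return t_value * sqrt(1 / n1 + 1 / n2)


if __name__ == "__main__":
    # Sample sizes are assumptions inferred from t(25), t(27), and t(52).
    print(round(d_from_one_sample_t(-5.15, 26), 2))      # spatial c vs. zero
    print(round(d_from_one_sample_t(-4.52, 26), 2))      # temporal c vs. zero
    print(round(d_from_independent_t(1.48, 28, 26), 2))  # pilot vs. main, spatial
```

Under these assumptions the conversions agree with the reported values to within rounding of the t statistics.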

The effect of sequence length: information load more than temporal duration
As the sequence length we manipulated here could refer to both the "display length" (i.e., temporal duration of the video display) and the "information load" (defined as the number of movement units in a sequence, i.e., set size), it was unclear whether the performance decline observed on longer sequences should be attributed to higher retention demand due to longer display length or due to an increase of the information load. While the current study was not designed for answering this question, it can be partially addressed by conducting analyses after taking the display length into consideration.
Specifically, if we defined the retention interval as the temporal duration between the offset of the sample unit (i.e., any one of the units in the sample sequence) and the onset of the test unit (i.e., the corresponding unit in the test sequence), the actual retention interval should include the display length of the remaining units in the sample sequence and the display length of the preceding units in the test sequence. Since the sample and the test sequences had the same length, the actual retention interval (RI) can be expressed as follows: RI actual = RI original + display length of the sequence × (n - 1)/n, where n is the number of movement units in a sequence. For example, RI actual of a 2-unit sequence was RI original plus 1.5 s (1/2 of the display length of a 2-unit sequence, i.e., 3 s), and RI actual of a 3-unit sequence was RI original plus 3 s (2/3 of the display length of a 3-unit sequence, i.e., 4.5 s). Table 1 summarizes the actual retention interval of each trial condition.
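The calculation can be sketched as follows. The code assumes a display length of 1.5 s per movement unit, which is inferred from the worked examples above (3 s for a 2-unit sequence and 4.5 s for a 3-unit sequence) and is not stated as a general parameter in this section; the constant and function names are hypothetical.

```python
UNIT_DURATION_S = 1.5  # assumed display time per unit, inferred from the examples
SEQUENCE_LENGTHS = (1, 2, 3, 4)
RETENTION_INTERVALS_S = (0.5, 2.0, 4.0, 6.0)


def actual_retention_interval(ri_original_s: float, n_units: int,
                              unit_duration_s: float = UNIT_DURATION_S) -> float:
    """RI actual = RI original + display length of the sequence * (n - 1) / n."""
    if n_units < 1:
        raise ValueError("a sequence needs at least one unit")
    display_length_s = unit_duration_s * n_units
    return ri_original_s + display_length_s * (n_units - 1) / n_units


if __name__ == "__main__":
    # One row per sequence length, one column per original retention interval.
    for n in SEQUENCE_LENGTHS:
        row = [actual_retention_interval(ri, n) for ri in RETENTION_INTERVALS_S]
        print(f"{n}-unit:", row)
```

For example, a 2-unit sequence with a 4-s retention interval yields 5.5 s and a 4-unit sequence with a 0.5-s retention interval yields 5.0 s, so conditions with different nominal delays can have comparable actual retention intervals.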
To examine the effect of sequence length after taking the display length into consideration, we conducted one-way repeated-measures ANOVAs on trial conditions whose RI actual were relatively comparable. The analysis was first conducted on the following four conditions: (1) 1-unit sequence + 6-s retention (RI actual = 6.0 s), (2) 2-unit sequence + 4-s retention (RI actual = 5.5 s), (3) 3-unit sequence + 2-s retention (RI actual = 5.0 s), and (4) 4-unit sequence + 0.5-s retention (RI actual = 5.0 s). The results yielded a significant main effect for sequence length on proportion correct, F(3, 75) = 3.14, p = .030, ηp² = .11, but not on A': F(2.12, 52.9) = 2.69, p = .074, ηp² = .10. Post hoc analyses showed a better performance for 1-unit sequences, albeit with the longest RI actual, the smallest p = .041. The analysis was then conducted on the following three conditions: (1) 1-unit sequence + 4-s retention (RI actual = 4.0 s), (2) 2-unit sequence + 2-s retention (RI actual = 3.5 s), and (3) 3-unit sequence + 0.5-s retention (RI actual = 3.5 s). The results showed a significant main effect for sequence length on proportion correct, F(2, 50) = 8.74, p = .001, ηp² = .26, and A': F(2, 50) = 6.16, p = .004, ηp² = .20; performance for 1-unit or 2-unit sequences was better than that for 3-unit sequences, the smallest p < .001 on proportion correct and the smallest p = .005 on A'. If the superior performance for short sequences was simply due to a short display length, conditions with comparable RI actual should have led to similar levels of performance. The results therefore suggest that the information load might have a greater influence on memory performance than the display length.
Given the finding that the effect of sequence length, if not including 1-unit sequences, was only significant when the retention interval was short (see the previous section), the analysis was not conducted on the following trial conditions: (1) 2-unit sequence + 6-s retention (RI actual = 7.5 s), (2) 3-unit sequence + 4-s retention (RI actual = 7.0 s), and (3) 4-unit sequence + 2-s retention (RI actual = 6.5 s), whose RI actual were also comparable. It is worth noting that, although the retention interval was lengthened to 6 s as compared with 3 s in the pilot experiment, no significant main effect for retention interval was shown in the 4 retention interval × 4 sequence length repeated-measures ANOVA for spatial information, proportion correct: F(1.49, 37.3) = 2.48, p = .110, ηp² = .09; A': F(1.41, 35.1) = 2.37, p = .123, ηp² = .09, indicating that there was no performance decline (i.e., forgetting) during the 6-s retention interval. However, because Figure 6(b) shows a tendency toward performance decline as the retention interval increased, we would expect a significant effect if the retention interval were extended further. Furthermore, there was no main effect for block in the 2 (information type: temporal, spatial) × 6 (block: 1, 2, 3, 4, 5, 6) repeated-measures ANOVA, proportion correct: F(5, 125) = 0.49, p = .781, ηp² = .02; A': F(5, 125) = 0.52, p = .759, ηp² = .02, indicating no learning effects, either.

Discussion
The main experiment replicated the finding of the pilot experiment with one-third of new sequences and a group of new participants, illustrating the consistency of the performance measures and the homogeneity of the movement sequences in the stimulus pool. Crucially, the results showed that working memory for movement rhythms was jointly influenced by the sequence length and maintenance delay. The expected forgetting occurred when the sequence length was short (i.e., 1 unit or 2 units), indicating that temporal information was encoded and gradually forgotten over a retention interval of 0.5 s to 6 s; however, when the sequence length was long (i.e., 3 units or 4 units), performance remained at a lower level across all retention intervals. As in the pilot experiment, change-detection performance on long sequences was above the chance level of 50%, indicating that participants could still differentiate between the rhythms, albeit with a lower accuracy. Nevertheless, the absence of forgetting suggests that change detection on long sequences may rely on a different type of memory representation that is relatively nondecaying (i.e., lasting over 6 s) as compared with that involved in change detection on short sequences. As we will discuss further in General discussion, this memory behaviour sheds light on how temporal information might be represented in working memory.

General discussion
In the present research, we investigated how the sequence length and maintenance delay influence working memory for movement rhythms after considering the spatial relevance in natural observation and the sensitivity difference between spatial and temporal processing in visual perception. In the pilot experiment, we tested whether participants had a spatial bias when observing whole-body movement sequences created for this research. Consistent with our prediction, we found that participants were more sensitive to spatial (trajectory) than to temporal (rhythm) changes of movements, corroborating previous findings that spatial information is normally processed with a higher priority due to its essential role in visual cognition of movements (e.g., Giese et al., 2008) and that spatial information tends to be more perceptually salient owing to the modality advantage of vision (Freides, 1974; O'Connor & Hermelin, 1972; Welch, 1999; Welch & Warren, 1980). Note, however, that we do not claim that there is always a spatial bias in the perception of human movements or that the spatial information will always gain the processing advantage over temporal information. Instead, we acknowledge that the strength of the spatial bias as well as the level of sensitivity difference between spatial and temporal processing would still depend on, for instance, the characteristics of movements being observed and the intention of observation. Therefore, the aim of the pilot experiment was not to measure participants' sensitivity to spatial and temporal information, respectively, but to examine whether participants' sensitivity to spatial information was higher than that to temporal information for the stimulus set we used, which was designed to resemble real-world movements.
We ensured that the deviants in both spatial and temporal domains were salient enough to be detected, i.e., far from the just-noticeable difference, while detecting a temporal change might still be more challenging than detecting a spatial change, as has been demonstrated in many previous studies (e.g., Collier & Logan, 2000; Glenberg et al., 1989; Glenberg & Jona, 1991; Repp & Penel, 2002).
Given the high discriminability of the spatial information and the sensitivity difference between spatial and temporal processing observed in the pilot experiment, we investigated in the main experiment how the sequence length and maintenance delay influence working memory for movement rhythms when the perceptually salient spatial information is also task-relevant. We found that working memory for movement rhythms was jointly influenced by the sequence length and maintenance delay. When the sequence length was short (1 unit or 2 units), temporal information was encoded and gradually forgotten over time, consistent with the prediction that performance declines as the retention interval increases. However, when the sequence length was long (3 units or 4 units), performance remained at a lower level across all retention intervals (i.e., no forgetting), suggesting that change detection on long sequences may be supported by a memory representation that is relatively nondecaying.

Forgetting as a characteristic of memory
Although most information retained in working memory is eventually forgotten as time goes by, not all information is forgotten at the same speed. For example, Poom (2012) showed that participants can retain gait direction of biological motions with the same precision for up to 9 s, but only 1 s for gender-stereotypical gait patterns. Therefore, the pattern of forgetting may be seen as a characteristic of memory and an indicator of the underlying process.
In the present research, temporal information of 1-unit sequences can be encoded as a perceptual representation (i.e., the exact length of temporal duration or the speed of movement) or a categorical representation (i.e., "quick" vs. "slow", "strong" vs. "soft", etc.). When the second movement unit was appended, a higher-order representation that integrates information from individual units, such as a perceptual group or an ensemble, might also be formed, supporting the information to be encoded in a complementary way. Nevertheless, a similar pattern of forgetting observed on 1-unit and 2-unit sequences (i.e., performance declined substantially beyond a 2-s delay) suggests that change detection on short sequences may rely on similar processes and memory representations. However, when the sequence length was 3 units or longer, performance remained at a lower level, close to that for 2-unit sequences after forgetting, irrespective of the length of retention interval. A higher-than-chance performance indicates that participants can still differentiate between the rhythms, while the absence of forgetting suggests that change detection on long sequences may rely on a different type of memory representation that is less precise, less sensitive to information load (i.e., similar results were obtained when the sequence length was 3 and 4 units), and relatively nondecaying as compared with that involved in change detection on short sequences.

How is temporal information represented in memory?
As discussed previously, change detection may be supported by unit-based perceptual or categorical representations or ensemble-based representations that extract and combine information from multiple units (see also Brady & Alvarez, 2015a; Brady & Tenenbaum, 2013; Liesefeld et al., 2019). It was shown that representing sets of objects or features as a group or an ensemble rather than as individuals can enhance visual cognition (Alvarez, 2011; Whitney & Yamanashi Leib, 2018) and working memory (Brady et al., 2009, 2011; Brady & Alvarez, 2015a, 2015b; Brady & Tenenbaum, 2013; Liesefeld et al., 2019; Nassar et al., 2018; Orhan & Jacobs, 2013), especially when information load exceeds processing capacity.
As longer sequences contained more information in the present research, it is possible that the information load of a sequence containing 3 or 4 units was approaching or might have exceeded the working memory capacity for movements, that is, the number of high-precision movement units that can be retained at once in memory. Previous research has shown that working memory capacity for observed movements is about 2-3 units when sequentially presented, simple discrete movements were used as visual stimuli (e.g., Wood, 2007). With increased complexity of movements, a follow-up study from our group demonstrated similar results, namely that participants can retain 2-3 whole-body movements from a spatiotemporally discontinuous sequence or slightly more (i.e., "just over three" or 3.01) when contextual effects of spatiotemporal dependency contributed to the performance (Chiou, 2022b). As a result, change detection on long sequences may rely more on ensemble-based representations, which do not require identification or localization of the changing units (e.g., Ball & Busch, 2015; Haberman & Whitney, 2011), than high-precision representations of individual units. However, was it not possible to differentiate between the rhythms by single-unit information across all sequence lengths? Specifically, since the four metric complex rhythms used in the current study (3212, 2132, 1223, and 2321) were designed to have one single salient component, i.e., the 1-beat unit, equally distributed across the four serial positions of a sequence, rhythm discrimination by single-unit information should be theoretically possible irrespective of the length of movement sequences, for example, by comparing the "quick unit 2" (of the rhythm 2132) with the "not-so-quick unit 2" (of the rhythm 1223) or the "slow unit 2" (of the rhythm 2321).
Yet the results of the main experiment, which showed that rhythm discriminations were supported by different representations or processes when the sequence length differed, did not support this account.
There might be two explanations. First, memory traces of perceptual information are susceptible to interference, which makes it difficult to retain single-unit information due to the retroactive interference from the ongoing movement display. Second, although temporal information of individual units can also be represented as categorical information, e.g., a "quick" or a "slow" unit, which is less affected by interference, successful discrimination of movement rhythms might still require a position code to track when in the timeline or where in the sequence this quick (or slow) unit occurred. As a result, simply retaining single-unit categorical information without a sequence-based context would not be sufficient to perform the change detection task. Otherwise, temporal properties would need to be encoded with the corresponding spatial patterns to form a "hybrid" representation, such as a "quick kick" or a "slow turn", to be recognized independently. However, feature integration is resource-demanding and vulnerable to interference as well (Wheeler & Treisman, 2002).

Why is spatial relevance relevant?
It is generally observed that participants are more sensitive to spatial than to temporal information when observing human movements. This is likely due to the modality advantage of vision on spatial processing, the crucial role of spatial information in movement recognition, and the intrinsic dependence of temporal processing on spatial information. Although the strength of the spatial bias as well as the level of sensitivity difference between spatial and temporal processing still depend on other factors (e.g., characteristics of movements, action goals, etc.), the prevalence of this phenomenon nevertheless emphasizes the need to understand how perceptually salient and task-relevant spatial information might interact with the temporal information and how temporal information might be processed given the presence and relevance of the spatial information.
One may wonder whether the task relevance of the spatial information was indeed relevant to temporal processing given its high discriminability, i.e., low processing demand. The answer might be yes or no. By adding processing demand on spatial information, it is reasonable to predict that cognitive resources that can be allocated to temporal processing would decrease, leading to a performance decline in the temporal domain; moreover, the task relevance of the spatial information might further strengthen the existing spatial bias and the sensitivity difference between spatial and temporal processing. Nevertheless, one should also remember that there is an intrinsic dependence of temporal processing on spatial information, namely that temporal processing may rely on a certain level of spatial processing irrespective of whether the spatial information is task-relevant or not. In addition, the spatial bias due to the modality advantage of vision as well as the salience of spatial information in general also suggests that it might be difficult to suppress task-irrelevant spatial processing, even intentionally. The current study did not provide an answer to this question, but we will discuss it further in our next study (see Chiou, 2022a).

The consideration of ecological validity
By taking the spatial relevance and the sensitivity difference between spatial and temporal processing into consideration, the present research aimed to provide insights into how temporal information of movements is processed under a more ecologically valid condition. Moreover, to better resemble real-world scenarios, we not only used complex whole-body movements performed by human dancers as visual stimuli, but also displayed them in a spatiotemporal-continuous manner. As movements unfold over time and are continuous in nature, it is essential to consider potential influence from spatiotemporal continuity when investigating temporal processing of movements. For example, spatiotemporal continuity may facilitate perceptual integration and enhance memory performance by optimizing the encoding process and reducing the overall load on memory (Brady et al., 2009, 2011; Cowan, 2001; Cowan et al., 2004; Xu & Chun, 2007). However, spatiotemporal continuity may also diminish memory benefits from isolation or distinctiveness, which has been shown to improve memory performance as it precisely specifies items within events (Hunt & Worthen, 2006). In the current study, we did not look into how spatiotemporal continuity might influence working memory for movements; instead, we took it as a given and examined working memory performance under this more realistic condition. Nevertheless, this important aspect will be discussed in more detail in our follow-up study (see Chiou, 2022b).

Conclusion
The present research provides insights into how movement rhythms conveyed through complex whole-body movement sequences are perceived, encoded, and retained in working memory under a more ecologically valid condition, in which the corresponding spatial information is perceptually salient and task-relevant. The finding suggests that the sequence length (in the sense of information load more than temporal duration) may act as the first bottleneck in the processing of movement rhythms, deciding whether temporal information can be encoded as individual units in high precision or it might be encoded as an ensemble "whole" in relatively low precision. In addition, the maintenance delay may act as the second bottleneck, determining to what extent the encoded information can be retained in memory.

Notes
1. The expertise index (0-4) is defined as the following: 0: No experience, 1: Beginner (skill level = 1 or 2 in a five-point scale, or skill level > 2 but training length < 3-5 years), 2: Intermediate amateur (skill level = 3 and training length ≥ 3-5 years, or skill level > 3 but training length < 6 years), 3: Advanced amateur (skill level = 4 and training length ≥ 6 years), 4: Professionals (skill level = 5 and training length ≥ 6 years). The expertise index is an arbitrary indicator mainly based on the self-evaluated skill level. To avoid the effect of overconfidence in self-evaluation (e.g., Moore & Healy, 2008), the training length is also considered. Namely, a participant will be classified as an "intermediate amateur" only when having at least 3-year training and as an "advanced amateur" only when having at least 6-year training.
2. There were a few exceptions, such as a jump or a circular movement, that were not performed with a bell-shaped velocity profile. Those movements were included in the stimulus set only if a clear starting point and ending point could be identified and the manipulation of movement speed could be precisely implemented.
3. Three sample video clips (A_2132, A_3212, B_3212) that illustrate movement sequences with different trajectories (A, B) and/or rhythms (2132, 3212) were published with the article "Attention modulates incidental memory encoding of human movements" by S.-C. Chiou, 2022, Cognitive Processing, and can be seen online: https://doi.org/10.1007/s10339-022-01078-1.
4. For the pilot experiment, given a large number of trials per condition, d' would be a better sensitivity measure, while in the main experiment, due to a small number of trials per condition, the nonparametric sensitivity measure A' would provide a better evaluation of performance. We reported both d' and A' for the pilot experiment to facilitate the comparison of results in both experiments, but reported only A' for the main experiment, as d' calculated from the signal and noise distributions where the Gaussian assumption might not be met could be misleading.