Entropy, Uncertainty, and the Depth of Implicit Knowledge on Musical Creativity: Computational Study of Improvisation in Melody and Rhythm

Recent neurophysiological and computational studies have proposed the hypothesis that our brain automatically codes the nth-order transitional probabilities (TPs) embedded in sequential phenomena such as music and language (i.e., local statistics at the nth-order level), grasps the entropy of the TP distribution (i.e., global statistics), and predicts the future state based on the internalized nth-order statistical model. This mechanism is called statistical learning (SL). SL is also believed to contribute to the creativity involved in musical improvisation. The present study examines how local statistics, global statistics, and different levels of orders (mutual information) interact in musical improvisation. Interactions among local statistics, global statistics, and hierarchy were detected in higher-order SL models of pitches, but not in lower-order SL models of pitches or in SL models of rhythms. These results suggest that the information-theoretical phenomena of local and global statistics at each order may be reflected in improvisational music. The present study proposes a novel methodology for evaluating musical creativity associated with SL based on information theory.


INTRODUCTION

Statistical Learning in the Brain: Local and Global Statistics
The notion of statistical learning (SL) (Saffran et al., 1996), which spans both informatics and neurophysiology (Harrison et al., 2006; Pearce and Wiggins, 2012), involves the hypothesis that our brain automatically codes the nth-order transitional probabilities (TPs) embedded in sequential phenomena such as music and language (i.e., local statistics at nth-order levels) (Daikoku et al., 2017b; Daikoku and Yumoto, 2017), grasps the entropy/uncertainty of the TP distribution (i.e., global statistics) (Hasson, 2017), predicts the future state based on the internalized nth-order statistical model (Daikoku et al., 2014; Yumoto and Daikoku, 2016), and continually updates the model to adapt to the variable external environment (Daikoku et al., 2012, 2017d). The concept of nth-order SL in the brain is underpinned by information theory (Shannon, 1951) involving n-gram or Markov models. TP (local statistics) and entropy (global statistics) are used to estimate the statistical structure of environmental information. The nth-order Markov model is a mathematical system based on the conditional probability of a sequence in which the probability of the forthcoming state is statistically defined by the most recent n states (i.e., nth-order TP). A recent neurophysiological study suggested that sequences with higher entropy are learned based on higher-order TPs, whereas those with lower entropy are learned based on lower-order TPs (Daikoku et al., 2017a). Another study suggested that certain regions or networks perform specific computations of global statistics (i.e., entropy) that are independent of local statistics (i.e., TP) (Hasson, 2017). Few studies, however, have investigated how the perceptual systems for local and global statistics interact. It is important to examine the entire process of SL in the brain in both computational and neurophysiological terms (Daikoku, 2018b).

Statistical Learning and Information Theory
Local Statistics: Nth-Order Transitional Probability

Research suggests that there are two types of coding systems involved in SL in the brain (see Figure 1): nth-order TPs (local statistics at various order levels) (Daikoku et al., 2017a; Daikoku, 2018a) and uncertainty/entropy (global statistics) (Hasson, 2017). The TP is the conditional probability of an event B, given that the most recent event A has occurred; this is written as P(B|A). The nth-order TP distributions sampled from sequential information such as music and language can be expressed by nth-order Markov models (Markov, 1971). The nth-order Markov model is based on the conditional probability of an event e_{n+1}, given the preceding n events, based on Bayes' theorem [P(e_{n+1}|e_n)]. From a psychological viewpoint, the formula can be interpreted as positing that the brain predicts a subsequent event e_{n+1} based on the preceding events e_n in a sequence. In other words, learners expect the event with the highest TP based on the latest n states, and are likely to be surprised by an event with a lower TP. Furthermore, TPs are often translated into information contents [ICs; IC = log2(1/P(e_{n+1}|e_n)) = −log2 P(e_{n+1}|e_n)], which can be regarded as degrees of surprise and predictability (Pearce and Wiggins, 2006). A lower IC (i.e., higher TP) means higher predictability and smaller surprise, whereas a higher IC (i.e., lower TP) means lower predictability and larger surprise. In the end, a tone with lower IC may be one that a composer is more likely to predict and choose as the next tone compared with tones with higher IC. IC can thus be used in computational studies of music to discuss the psychological phenomena involved in prediction and SL.
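The relationship between TP and IC described above can be illustrated concretely. The following Python sketch is hypothetical (the function name and the example probabilities are ours, not taken from the study's materials) and simply applies the definition IC = −log2 P:

```python
import math

def information_content(tp: float) -> float:
    """Information content in bits of a transition with probability tp."""
    return -math.log2(tp)

# A highly expected transition (TP = 0.8) carries little surprise...
print(information_content(0.8))   # ≈ 0.32 bits
# ...whereas a rare transition (TP = 0.05) carries much more.
print(information_content(0.05))  # ≈ 4.32 bits
```

The example shows the inverse relationship the text describes: as TP falls, IC (surprise) rises.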

Global Statistics: Entropy and Uncertainty
Entropy (i.e., global statistics, Figure 1) is also used to understand the general predictability of a sequence (Manzara et al., 1992; Reis, 1999; Cox, 2010). It is calculated from a probability distribution, interpreted as uncertainty (Friston, 2010), and used to evaluate the neurophysiological effects of global SL (Harrison et al., 2006) as well as decision making (Summerfield and de Lange, 2014), anxiety (Hirsh et al., 2012), and curiosity (Loewenstein, 1994). A previous study reported that the neural systems of global SL were partially independent of those of local SL (Hasson, 2017). Furthermore, reorganization of learned local statistics requires more time than the acquisition of new local statistics, even if the new and previously acquired information sets have equivalent entropy levels (Daikoku et al., 2017d). Some articles, however, suggest that the global statistics of a sequence modulate local SL (Daikoku et al., 2017a). Furthermore, uncertainty of auditory and visual statistics is coded by modality-general, as well as modality-specific, neural systems (Strange et al.; Nastase et al., 2014). This suggests that the neural basis that codes global statistics, as well as local statistics, is a domain-general system. Although domain-general and domain-specific learning systems in the brain are under debate (Hauser et al., 2002; Jackendoff and Lerdahl, 2006), there seem to be neural and psychological interactions in the perception of local and global statistics.

FIGURE 1 | Relationship between the order of transitional probabilities, entropy, conditional entropy, and MI, illustrated using a Venn diagram. The degree of dependence of X_{i+1} on X_i is measured by MI: I(X;Y) = H(X_{i+1}) − H(X_{i+1}|X_i), i.e., entropy minus conditional entropy. The MI of the sequences in this figure is greater than 0; thus, each event X_{i+1} in the sequence is dependent on the preceding event X_i.
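The uncertainty of a TP distribution discussed above is the Shannon entropy, H = −Σ p log2 p. The following Python sketch is illustrative only (the distributions are invented, not drawn from the study's data):

```python
import math

def entropy(probs):
    """Shannon entropy in bits of a probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A peaked TP distribution (one transition dominates) is highly predictable...
print(entropy([0.9, 0.05, 0.05]))  # ≈ 0.57 bits
# ...whereas a uniform distribution is maximally uncertain.
print(entropy([1/3, 1/3, 1/3]))    # ≈ 1.58 bits
```

A low-entropy distribution corresponds to the "predictable" sequences discussed in the text; a uniform distribution reaches the maximum, log2 of the number of alternatives.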

Depth: Mutual Information
Mutual information (MI) and pointwise MI (PMI) are measures of the mutual dependence between two variables. PMI refers to each event in a sequence (local dependence), and MI refers to the average over all events in the sequence (global dependence). In the framework of SL based on TPs [P(e_{n+1}|e_n)], MI explains how an event e_{n+1} depends on the preceding events e_n. Thus, MI is key to understanding the order of SL. For example, a typical oddball sequence, consisting of a frequent stimulus with a high probability of appearance and a deviant stimulus with a low probability of appearance, has weak dependence between two adjacent events (e_n, e_{n+1}) and shows low MI, because an event e_{n+1} appears independently of the preceding events e_n. In contrast, an SL sequence based on TPs, rather than on probabilities of appearance, has strong dependence between the two adjacent events and shows larger MI. For example, a typical SL paradigm consisting of a concatenation of pseudo-words of three stimuli has large MI up to second-order Markov or tri-gram models [i.e., P(C|AB)], whereas it has low MI from third-order Markov or four-gram models [i.e., P(D|ABC)] onward. Thus, MI is sometimes used to evaluate the levels of SL in both neurophysiological (Harrison et al., 2006) and computational studies (Pearce et al., 2010). In sum, the three types of information-theoretical evaluations of SL models (i.e., IC, entropy, and MI) can be explained in terms of psychological aspects. (1) IC reflects local statistics: a tone with lower IC (i.e., higher TPs) may be one that a composer is more likely to predict and choose as the next tone compared with tones with higher IC. (2) Entropy reflects global statistics and is interpreted as the uncertainty of whole sequences. (3) MI reflects the levels of orders in statistics and is interpreted as the dependence on preceding sequential events in SL.
Using these three measures, the present study investigated how local statistics, global statistics, and the levels of orders interact in musical improvisation.
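The contrast between an oddball sequence and a pseudo-word SL sequence can be demonstrated numerically. The Python sketch below is hypothetical (the two-symbol oddball stream and the two pseudo-words "ABC"/"DEF" are invented stimuli, not the study's materials) and estimates the MI between adjacent events from relative frequencies:

```python
import math
import random
from collections import Counter

def mutual_information(seq):
    """MI (in bits) between adjacent events X_i and X_{i+1} of a sequence."""
    pairs = list(zip(seq, seq[1:]))
    n = len(pairs)
    p_xy = {k: v / n for k, v in Counter(pairs).items()}
    c_x = Counter(x for x, _ in pairs)   # marginal counts of X_i
    c_y = Counter(y for _, y in pairs)   # marginal counts of X_{i+1}
    return sum(p * math.log2(p / ((c_x[x] / n) * (c_y[y] / n)))
               for (x, y), p in p_xy.items())

random.seed(0)

# Oddball-like stream: each event is drawn independently, so X_{i+1}
# carries almost no information about X_i -> MI near 0.
oddball = random.choices("AB", weights=[9, 1], k=5000)

# SL stream: concatenated pseudo-words; within a word the next event is
# fully determined by the preceding one -> MI is large.
structured = list("".join(random.choices(["ABC", "DEF"], k=2000)))

print(mutual_information(oddball))     # close to 0 bits
print(mutual_information(structured))  # well above 1 bit
```

The structured stream's MI is large because most transitions are deterministic; only at word boundaries does uncertainty remain, mirroring the tri-gram argument in the text.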

Musical Improvisation
Implicit statistical knowledge is considered to contribute to the creativity involved in musical composition and improvisation (Pearce and Wiggins, 2012; Norgaard, 2014; Wiggins, 2018). Additionally, it is widely accepted that implicit knowledge underlies a sense of intuition, spontaneous behavior, skill acquisition based on procedural learning, and creativity, and is also closely tied to musical expression such as composition, playing, and intuitive creativity. In musical improvisation in particular, musicians are forced to express intuitive creativity and immediately play their own music based on long-term training associated with procedural and implicit learning (Clark and Squire, 1998; Ullman, 2001; Paradis, 2004; De Jong, 2005; Ellis, 2009; Müller et al., 2016). Thus, compared with other types of musical composition, in which a composer deliberates on and refines a compositional scheme over a long time based on musical theory, the performance of musical improvisation is intimately bound to implicit knowledge because of the necessity of intuitive decision making (Berry and Dienes, 1993; Reber, 1993; Perkovic and Orquin, 2017) and auditory-motor planning based on procedural knowledge (Pearce et al., 2010; Norgaard, 2014). This suggests that the stochastic distributions calculated from musical improvisation may represent the musicians' implicit knowledge and the creativity in music that has been developed via implicit learning. Few studies have investigated the relationship between musical improvisation and implicit statistical knowledge. The present study, using real-world improvisational music, first proposed a computational model of musical creativity in improvisation based on TP distributions, and examined how local statistics, global statistics, and hierarchy in music interact.

Pieces of improvisational music by three musicians (Dimensions, 1984) were used in the present study.
The highest pitches with their lengths were extracted based on the following definitions: the highest pitches that can be played at a given point in time were used; pitches connected by slurs were counted as one; and grace notes were excluded. In addition, the rests related to the highest-pitch sequences were also extracted. This spectral and temporal information was divided into four types of sequences: [1] a pitch sequence without length and rest information (i.e., pitch sequence without temporal information); [2] a temporal sequence without pitch information (i.e., temporal sequence without pitches); [3] a pitch sequence with length and rest information (i.e., pitch sequence with temporal information); and [4] a temporal sequence with pitch information (i.e., temporal sequence with pitches).

Pitch Sequence Without Temporal Information
For each pitch sequence, all intervals were numbered such that an increase of a semitone was +1 and a decrease was −1, relative to the first pitch. Representative examples are shown in Figure 2. This encodes relative pitch-interval patterns rather than absolute pitch patterns. This procedure was used to eliminate the effects of key changes on transitional patterns: interpretation of a key change depends on the musician and is difficult to define objectively. Thus, the results of the present study represent variation in the statistics associated with relative, rather than absolute, pitch.
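Under one reading of the interval-numbering step above (each pitch expressed as a signed semitone offset from the first pitch), the conversion can be sketched as follows. This is an illustrative sketch only; the use of MIDI note numbers as input is our assumption, not part of the study's materials:

```python
def relative_intervals(midi_pitches):
    """Semitone offsets relative to the first pitch:
    +1 per semitone up, -1 per semitone down."""
    first = midi_pitches[0]
    return [p - first for p in midi_pitches]

# C4, D4, B3, C4 as MIDI numbers -> offsets from the first pitch
print(relative_intervals([60, 62, 59, 60]))  # [0, 2, -1, 0]
```

Transposing the whole melody (e.g., after a key change) leaves the output unchanged, which is exactly the invariance the procedure is meant to provide.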

Temporal Sequence Without Pitches
The onset times of each note were used for the analyses. Although note onsets ignore the lengths of notes and rests, this methodology can capture the most essential rhythmic features of the music (Povel, 1984; Norgaard, 2014). To extract the temporal interval between adjacent notes, the onset time of the preceding note was subtracted from each onset time. Then, for each temporal sequence, each temporal interval from the second to the last was divided by the first temporal interval. Representative examples are shown in Figure 2. This encodes relative rather than absolute rhythm patterns and is independent of the tempo of each piece of music.
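The two steps above (inter-onset intervals, then ratios to the first interval) can be sketched in Python. This is an illustrative sketch under our own assumptions about the input (onset times in seconds), not the study's actual pipeline:

```python
def relative_rhythm(onsets):
    """Inter-onset intervals from the second onward, each expressed as a
    ratio to the first interval, making the pattern tempo-independent."""
    iois = [b - a for a, b in zip(onsets, onsets[1:])]  # inter-onset intervals
    return [ioi / iois[0] for ioi in iois[1:]]

# Onsets of quarter, quarter, eighth, eighth notes at 60 bpm (seconds)
print(relative_rhythm([0.0, 1.0, 2.0, 2.5, 3.0]))  # [1.0, 0.5, 0.5]
```

Playing the same rhythm twice as fast (onsets [0.0, 0.5, 1.0, 1.25, 1.5]) yields the same output, demonstrating the tempo independence claimed in the text.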

Pitch Sequence With Temporal Information
The two methodologies for pitch and temporal sequences were combined. For each sequence, all intervals were numbered such that an increase of a semitone was +1 and a decrease was −1, relative to the first pitch. Additionally, for each pitch sequence, the onset time of the preceding note was subtracted from each onset time, and each temporal interval from the second to the last was divided by the first temporal interval. Representative examples are shown in Figure 2. In the first-order model, however, a temporal interval was calculated as a ratio to the crotchet (i.e., quarter note), because each sequence includes only one temporal interval, so the note length cannot be expressed as a relative temporal interval. Thus, the patterns of a pitch sequence (p) with temporal information (t) were represented as [p] with [t].

Temporal Sequence With Pitches
The methodologies of sequence extraction were the same as those of the pitch sequence with rhythm (see Figure 2), except that the TPs of the rhythm, rather than pitch, sequences were calculated as statistics based on multi-order Markov chains. The probability of a forthcoming temporal interval with pitch was statistically defined by the most recent one to six successive temporal intervals with pitch (i.e., first- to sixth-order Markov chains). Thus, the relative patterns of a temporal sequence (t) with pitches (p) were represented as [t] with [p].

Modeling and Analysis
The TPs of the sequential patterns were calculated based on 0th−5th-order Markov chains. The nth-order Markov chain is the conditional probability of an event e_{n+1}, given the preceding n events, based on Bayes' theorem:

P(e_{n+1}|e_n) = P(e_{n+1} ∩ e_n) / P(e_n)   (1)

The ICs [I(e_{n+1}|e_n)] and the conditional entropy [H(B|A)] of the nth-order TP distribution (hereafter, Markov entropy) were calculated from the TPs in the framework of information theory:

I(e_{n+1}|e_n) = log2 [1 / P(e_{n+1}|e_n)] (bits)   (2)

H(B|A) = −Σ_i Σ_j P(a_i) P(b_j|a_i) log2 P(b_j|a_i) (bits)   (3)

where P(b_j|a_i) is the conditional probability of the sequence "a_i b_j." Then, MI [I(X;Y)] was calculated for the 1st-, 2nd-, and 3rd-order Markov models. MI is an information-theoretic measure of the dependency between two variables (Cover and Thomas, 1991). The MI of two discrete variables X and Y can be defined as

I(X;Y) = Σ_x Σ_y p(x,y) log2 [p(x,y) / (p(x) p(y))]   (4)

where p(x,y) is the joint probability function of X and Y, and p(x) and p(y) are the marginal probability distribution functions of X and Y, respectively. From entropy values, the MI can also be expressed as

I(X;Y) = H(X) − H(X|Y) = H(Y) − H(Y|X) = H(X) + H(Y) − H(X,Y)   (5)

where H(X) and H(Y) are the marginal entropies, H(X|Y) and H(Y|X) are the conditional entropies, and H(X,Y) is the joint entropy of X and Y (Figure 1). In psychological and information-theoretical terms, Equation (5) states that MI is the amount of entropy (uncertainty) about Y that is removed once X is known; that is, MI corresponds to a reduction in entropy (uncertainty). The transitional patterns with the 1st−20th highest TPs across all musicians, which show higher predictability for each musician, were then used as the local statistics of familiar phrases. The familiar phrases used and their TPs are shown in the Supplementary Material. The TPs of familiar phrases were averaged. Repeated-measures analyses of variance (ANOVAs) with factors of order and type of sequence were conducted for each of IC, entropy, and MI. Furthermore, the global statistics and MI at each order were compared with the local statistics of familiar phrases using Pearson's correlation analysis. The statistical significance level was set at p = 0.05 for all analyses.
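Equations (1)−(3) can be combined into a single estimation step: counting n-grams in a sequence yields the nth-order TPs, from which the Markov (conditional) entropy follows. The Python sketch below is a minimal illustration on an invented toy sequence, not the study's analysis code:

```python
import math
from collections import Counter

def markov_entropy(seq, n):
    """Conditional entropy H(e_{n+1} | preceding n events), in bits,
    of an nth-order Markov model estimated from one sequence."""
    ctx = Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n))
    gram = Counter(tuple(seq[i:i + n + 1]) for i in range(len(seq) - n))
    total = sum(gram.values())
    h = 0.0
    for g, c in gram.items():
        p_joint = c / total        # P(context, next event)
        tp = c / ctx[g[:n]]        # P(next event | context), Eq. (1)
        h -= p_joint * math.log2(tp)
    return h

seq = list("ABCABCABDABCABD")  # toy "melody" over four symbols
print(markov_entropy(seq, 0))  # entropy of the event distribution itself
print(markov_entropy(seq, 1))  # uncertainty remaining after one event
```

For n = 0 the function reduces to the plain Shannon entropy of the event distribution; for n ≥ 1 it gives the Markov entropy of Equation (3) with contexts of length n.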

Local vs. Global Statistics
The means of IC, conditional entropy, and mutual information are shown in Figure 3.

Local vs. Global Statistics
All of the results of the correlation analyses are shown in Figure 4.

Psychological Notions of Information Theory
The present study investigated how local statistics (TP and IC), global statistics (conditional entropy), and levels of orders (MI) interact in musical improvisation. TP, IC, conditional entropy, and MI can be calculated based on Markov models, which have also been applied in psychological and neurophysiological studies on SL (Harrison et al., 2006; Furl et al., 2011; Daikoku, 2018b). Based on psychological and neurophysiological studies on SL (Harrison et al., 2006; Pearce et al., 2010; de Zubicaray et al., 2013; Daikoku et al., 2015; Monroy et al., 2017), these three types of information can be translated into psychological indices: a tone with lower IC (i.e., higher TPs) may be one that a composer is more likely to predict and choose as the next tone compared with tones with higher IC, whereas entropy and MI are interpreted as the global predictability of the sequences and the levels of order for the prediction, respectively. Previous studies also suggest that musical creativity depends in part on SL (Pearce, 2005; Pearce et al., 2010; Omigie et al., 2012, 2013; Pearce and Wiggins, 2012; Hansen and Pearce, 2014; Norgaard, 2014), and that musical training and experience are associated with a cognitive model of the probabilistic structure of music involved in SL (Pearce, 2005; Pearce and Wiggins, 2006, 2012; Pearce et al., 2010; Omigie et al., 2012, 2013; Hansen and Pearce, 2014; Norgaard, 2014). The present study, using improvisational music by three musicians, examined how the local and global statistics embedded in music interact, and discussed them from the interdisciplinary viewpoint of SL.

FIGURE 3 | The means of information content (IC), conditional entropy, and mutual information (MI). Error bars represent standard errors of the means. P, pitch sequence; R, rhythm sequence; PwR, pitch sequence with rhythms; RwP, rhythm sequence with pitches.

Local vs. Global Statistics
In pitch sequences with and without temporal information, the higher-order (1st−5th order) models detected positive correlations between global statistics (conditional entropy) and local statistics (IC) in musical improvisation, whereas no significant correlation was detected in the lower-order (0th order) model. To capture the local statistics of familiar phrases, the present study used only the transitional patterns with the 1st−20th highest TPs for all musicians, which can be interpreted as the most predictable patterns for each musician. The results therefore suggest that, when the TPs of familiar phrases decrease, the conditional entropy (uncertainty) of the entire TP distribution increases. This finding is mathematically and psychologically reasonable: when improvisers attempt to use various types of phrases, the variability of the sequential patterns increases. Consequently, the ICs (degree of surprise) of familiar phrases are positively correlated with the conditional entropy (uncertainty) of the entire sequential distribution. It is of note that this correlation could not be detected in the lower-order (0th order) model, and that no correlation was detected in the temporal sequence without pitches. This suggests that the interaction between local and global statistics may be stronger in the SL of spectral sequences than in that of temporal sequences. Furthermore, these correlations may be detectable only in higher-order models, which may suggest that higher-order SL is linked to grasping entropy. In sum, the skills of musical improvisation and intuition may depend more strongly on the SL of pitch than on that of rhythm, and this phenomenon may occur at higher-, but not lower-, order levels of SL. The higher-order SL model of pitches may therefore be important to grasp the entire process of hierarchical SL in musical improvisation.

FIGURE 4 | Correlation analyses between conditional entropy (global statistics) and the ICs of familiar phrases (local statistics) based on zeroth- to fifth-order Markov models of pitch and temporal (rhythm) sequences.

FIGURE 5 | Correlation analyses between MI and the ICs of familiar phrases (local statistics) based on zeroth- to fifth-order Markov models of pitch and temporal (rhythm) sequences.

Local Statistics vs. Hierarchy
In pitch sequences without temporal information, the higher-order (3rd−5th order) models showed negative correlations between the dependence on previous events (MI) and local statistics (IC), whereas no significant correlation was detected in the lower-order (0th−2nd order) models. This finding is also mathematically and psychologically reasonable: when players depend strongly on previous sequential information to improvise music, they tend to use familiar phrases, because familiar phrases with higher TPs P(X_{i+1}|X_i) tend to depend strongly on the previous sequential information (X_i).
Consequently, the ICs (degree of surprise) of familiar phrases decrease when improvisers depend on previous sequential information, which is detected as larger MIs. Interestingly, this correlation could not be detected in the lower-order (0th order) model, and no correlation was detected in the temporal sequence without pitches. As in the correlation between local and global statistics, the interaction between local statistics and levels of orders may be stronger in the SL of spectral sequences than in that of temporal sequences. Furthermore, these correlations may be detectable only in higher-order models.
In contrast, the fourth- and fifth-order models of the pitch sequence with temporal information did not show correlations. Thus, rhythms may modulate the levels of orders in the SL of pitches in improvisational music (Daikoku, 2018c). This hypothesis may be supported by the models of the temporal sequence with pitches.
No correlation was detected in the temporal sequence with pitches (Daikoku et al., 2018). Future studies are needed to investigate how rhythms affect improvisational music and how the SL of rhythms interacts with that of pitches. It is of note that the present study did not directly investigate the improvisers' statistical knowledge of music, as only the statistics of the music were analyzed. Moreover, transition probabilities shape only a small part of each musician's style. Future studies should investigate the SL of music from many improvisers using interdisciplinary approaches that combine neurophysiology and informatics. The methodologies in this study also omit important information that constructs music, such as beat, stress, and ornamental notes, which shape rhythm and intonation. Furthermore, the present study analyzed only three improvisers. To discuss universal phenomena in SL associated with improvisation, future studies will need to examine a larger body of music.

CONCLUSION
The present study investigated how local statistics (TP and IC), global statistics (entropy), and levels of orders (MI) interact in musical improvisation. Generally, interactions between local and global statistics were detected in the higher-order SL models of pitches, but not in the lower-order SL models of spectral sequences or in the SL models of temporal sequences. These results suggest that the information-theoretical phenomena of local and global statistics at each level of the hierarchy can be reflected in improvisational music, and they support a novel methodology for evaluating musical creativity associated with SL based on information theory.

AUTHOR CONTRIBUTIONS
The author confirms being the sole contributor of this work and has approved it for publication.