Implementation of melodic morphing based on generative theory of tonal music

This paper proposes a new morphing method to generate a melody, given two melodies represented by time-span trees obtained from the generative theory of tonal music (GTTM). In this paper, we define a feature structure based on a time-span tree and the distance between two structures. Next, we construct a lattice of subsumption relations among these structures and define three algebraic operations: reduction, meet, and join. First, we conduct these operations in the pure algebraic domain, and a new melody is obtained as an interpolation in the lattice; thereafter, we render the obtained tree into a music melody, assigning a priority order among candidate branches. We confirmed that our morphing algorithm generated new variations with satisfactory quality.


Introduction
We propose a new methodology of melodic morphing as a tenable and trustworthy music manipulation based on the generative theory of tonal music (GTTM) (Lerdahl, 2001, 2019; Lerdahl & Jackendoff, 1983). Imagine a scenario where a composer intends to give a different feeling to a given melody A and that he/she is aware that another melody B possesses such a feeling. In such a situation, melodic morphing can reflect the user's intention; that is, the composer can use melody C with the nuances of both melody A and melody B (Hamanaka et al., 2008, 2009).
GTTM consists of four analysis modules: grouping, metrical, time-span, and prolongational, each of which assigns a separate structural description to a listener's understanding of a piece of music. This paper uses the time-span tree module, a hierarchical binary tree representing the relative structural importance of notes, to reduce the number of notes and distinguish the essential parts of a melody from its ornamentation.
The compositional process is very different between art music composers and commercial arrangers, the latter of whom need to modify existing melodies for movies and games. In this work, we aim at satisfying the needs of the latter by developing a compositional process for melody morphing that is transparent and understandable and whose outcome is always predictable. If this type of melody generation can be realised at high speed by a computer, it will be possible to reduce the time and effort required for composing music.
CONTACT Masatoshi Hamanaka masatoshi.hamanaka@riken.jp
To formalise the melodic morphing concept, we employ an algebraic method that is mathematically and cognitively well-grounded. First, we define the maximum time span as the longest temporal interval for each pitch event to become most salient on the basis of time-span analysis using GTTM (Lerdahl, 2001, 2019; Lerdahl & Jackendoff, 1983). We acknowledge that GTTM is not a widely accepted explanatory model; however, we still believe that it is a useful tool for treating music mathematically and computationally.
Then, we define the idea of reducible branches in a tree and the distance between trees by using the maximum time span. Since the set of time-span trees is a lattice because of the subsumption relations among them, we can define three algebraic operations on time-span trees: reduction, meet, and join. The melodic morphing method is composed of these operations (Hirata & Aoyagi, 2003).
As an application of the method, we demonstrate composition of new variations from two existing variations, by combining the two time-span trees of the variations in an algebraic manner with join and meet operations. Here, the meet operation simply reduces uncommon pitch events and is rather naturally defined as the intersection of two music pieces. Thus, if we restrict our interest to the calculation of distance, meet may serve as an edit distance such as the earth mover's distance (EMD) or Rizo-Valero's edit distance (Rizo, 2010). In contrast, the join operation is problematic; since it may add pitch events of one melody to another, there could be incompatible pitch events in the two scores. Here, we propose the notion of a virtual set of time-span trees apart from the music scores in the real world, where we regard that the rendering from a tree to a score is another independent process.
This paper is organised as follows. In Section 2, we briefly describe the related research. In Section 3, we provide the basic algebraic operations for time-span trees and the notion of distance as theoretical background. In Section 4, we clarify the morphing algorithm and its problems. We discuss the problems and propose solutions in Sections 5.2 and 5.3. In Section 6, we describe two experiments and their results. In Section 7, we describe our method's limitations and discuss possibilities for further development.

Related work
This section surveys the preceding methods on arranging an existing melody to a certain intended melody.
Many music generation systems using deep neural networks have been proposed. DeepBach (Hadjeres et al., 2017) combines a feed-forward network and long short-term memory (LSTM) (Hochreiter & Schmidhuber, 2017) to output an accompaniment like Bach Chorales for an input melody. Deepjazz (Ji-Sung, 2016) generates pieces by learning jazz music using two layers of LSTM. Although these methods (Hadjeres et al., 2017; Ji-Sung, 2016) can be used to generate new melodies, they cannot be used to manipulate existing melodies.
MidiMe (Dinculescu et al., 2019) makes it possible to generate a melody by using a personalised variational auto-encoder (VAE) (Kingma & Welling, 2014). Inpaint-Net (Pati et al., 2019) predicts an inpainted melody from a past melody by using VAE. Since these methods (Dinculescu et al., 2019;Pati et al., 2019) employ the latent space, it is possible for a user to obtain a melody that is slightly different from the given melody by adjusting the values in the space. However, it is difficult to predict what melody will be output as a result of the user's operation.
FLOW Composer (Pachet & Roy, 2014; Papadopoulos et al., 2016) creates a lead sheet with a music style transferred from a partially specified incomplete lead sheet by using a model that combines two Markov chains. FLOW Composer has a mechanism for users to obtain more control over the output by specifying partial information on the lead sheet and selecting styles. However, since melodies are obtained from Markov chain probabilities, it is difficult for the user to predict the output melody.
Cope (2005) developed a computer system to compose music pieces with a method called recombination, which is to splice segments of known pieces. The system utilises parts of the existing music structure, and in this respect, it resembles our system; however, morphing and splicing are two different methods. Ponce de León et al. (2016) proposed a genetic algorithm for creating music with a multi-dimensional fitness determined by machine learning. They mentioned the issue with such a Frankenstein-style optimisation. The Lindenmayer system (L-system) has been used for making plant designs in computer games, and Fridenfalk (2015) attempted to use it to create music and simulated the interactive composition process in the style of tree growth.
In general, a trade-off exists between descriptive power and simplicity in the formal representation. Examples of highly descriptive systems include commercially available music sequencing software that allows detailed setting of the surface structure of the melody, that is, the pitch of each note, the timing of note-on, and so on. Depending on the time and effort spent by the composer, such software can fully control the compositional process. On the other hand, by using Mozart's Musical Dice Games (Langston, 1989), for example, a composer can generate agreeable melodies simply by configuring the parameters, but he/she must give up a certain amount of control over the output.
To resolve the trade-off between tractability and user control, algorithms (Drewes & Högberg, 2007;Högberg, 2005) have been proposed that consist of algebraic operations upon input melodies. These are close to our idea in that they use combinations of basic operations. We decided to pursue this direction and make the target of operations time-span trees acquired by GTTM; in so doing, we wanted to enable users to easily predict the results of the morphed melody by offering them a transparent process.
Thus far, GTTM rule sets have been rather naïvely built into a time-span tree analyser (Hamanaka et al., 2007a, 2007b), and rules have been learned on the basis of probabilistic context-free grammar (Hamanaka et al., 2015; Nakamura et al., 2016). However, these analysers have low accuracy. We have been developing a GTTM analyser using deep learning and have used deep learning to automate the grouping structure analysis and metrical structure analysis (Hamanaka et al., 2017). Recently, we developed a deep-learning-based time-span tree analyser using time-span trees leveled in accordance with the duration of the time span (Hamanaka et al., 2021).
Now, let us turn our attention to music similarity. Some studies on similarity are motivated by engineering demands such as music retrieval (Downie et al., 2009; Hewlett & Selfridge-Field, 1998), classification, and recommendation (Grachten et al., 2005; Pampalk, 2006; Schedl et al., 2011), while others model the cognitive processes of musical similarity (ESCOM, 2005a, 2005b). In this paper, we also seek a notion of stable and consistent similarity, postponing context-dependency and subjectivity for later studies. That is, we assess the similarity only in terms of the score of the music, disregarding such context-dependent factors as timbre, artist, subject matter of lyrics, and cultural factors. Also, we regard the similarity assessment to be consistent in the sense that most experienced listeners can deliver the same results within western tonal music.

Calculus in melody lattice using time-span tree of GTTM
Our melodic morphing algorithm operates upon time-span trees (Figure 1) acquired by performing a time-span analysis in accordance with GTTM. In this section, we describe the notion of time span and then extend it to the maximum time span. Thereafter, we formalise the tree in feature structures in order to treat musical objects algebraically.
Figure 2 shows an example of abstracting a melody by using a time-span tree. The figure includes a time-span tree from melody A that embodies the results of the GTTM analysis. In the tree, structurally important notes are connected to branches nearer the root of the tree, whereas unimportant notes are leaves. We can obtain an abstracted melody B by pruning the branches in the middle (line B) and then omitting notes whose branch connections are below line B. In the same manner, if we prune the branches up to line C, we can obtain an even more abstracted melody, C. We can regard this abstraction as a kind of melodic morphing because melody B is an intermediate one between melodies A and C.

Maximum time span
The head in a time-span tree is the top-most pitch event, that is, the most salient in the tree. When two adjacent subtrees are combined, one of the two heads of the subtrees becomes the head of the whole. This implies that the head of a tree is most salient in the time interval the tree occupies. Since a tree is a hierarchical combination of subtrees, each event in the tree has its longest interval to be most salient as the head of some subtree. Accordingly, in the base case when a subtree consists of a single pitch event, we define the maximum time span to be the duration of the event.
Maximum time span: we call the longest temporal interval in which a given pitch event is most salient the maximum time span for the event. In other words, the maximum time span of a pitch event coincides with the temporal duration of the subtree of which the event becomes the head, as a result of the time-span analysis.
Figure 3 illustrates four contiguous pitch events, $e_1$, $e_2$, $e_3$, and $e_4$, each of which has its own temporal duration of $s_1$, $s_2$, $s_3$, and $s_4$, respectively. They are merged into larger time spans until the maximum time spans are reached, denoted as thick gray lines. At the lowest level in the hierarchy, the length of a maximum time span ($mt_i$) is equal to that of a pitch duration: i.e. $mt_2 = s_2$, $mt_3 = s_3$, and so on. A salient pitch event absorbs the adjacent time spans of less salient events in the comparison in the time-span tree and thus results in $mt_1 = s_1 + mt_2$ and $mt_4 = mt_1 + mt_3 + s_4 = s_1 + s_2 + s_3 + s_4$.
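The definition of the maximum time span can be illustrated with a minimal Python sketch. The `Node` class, its field names, and the numeric durations are assumptions made only for this illustration; the tree mirrors the four-event situation of Figure 3, where $e_1$ absorbs $e_2$ and $e_4$ heads the whole.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Node:
    """A time-span (sub)tree; a leaf has no daughters and carries a duration."""
    head: str                       # name of the most salient pitch event
    duration: float = 0.0           # used only when the node is a leaf
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def span(node: Node) -> float:
    """Temporal length covered by the subtree rooted at `node`."""
    if node.left is None or node.right is None:
        return node.duration
    return span(node.left) + span(node.right)


def max_time_spans(node: Node, out: Optional[Dict[str, float]] = None) -> Dict[str, float]:
    """Map each event to the span of the largest subtree of which it is the head."""
    if out is None:
        out = {}
    out[node.head] = max(out.get(node.head, 0.0), span(node))
    for child in (node.left, node.right):
        if child is not None:
            max_time_spans(child, out)
    return out


# Hypothetical durations s1..s4 for the four events e1..e4.
e1, e2, e3, e4 = (Node(f"e{i}", d) for i, d in enumerate((2.0, 1.0, 1.0, 4.0), start=1))
left = Node("e1", left=e1, right=e2)        # e1 absorbs e2
right = Node("e4", left=e3, right=e4)       # e4 absorbs e3
root = Node("e4", left=left, right=right)   # e4 is the head of the whole
mts = max_time_spans(root)
```

With these durations the sketch yields $mt_1 = s_1 + mt_2$ and $mt_4 = s_1 + s_2 + s_3 + s_4$, in line with the relations stated above.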

Feature structure
Representing a time-span tree as a feature structure makes it possible to mathematically express operations on branches, such as reduction, meet, and join. This makes it easy to implement branch operations on a computer. Carpenter (1992) introduced the notion of feature structure (f-structure, hereafter) in the form of an acyclic directed graph that consists of a list of feature-value pairs. A value can be recursively replaced by another feature structure. According to Carpenter's notation, we represent an f-structure by a vertical column parenthesised by square brackets; in each row, the feature name appears on the left and its value appears on the right. The type of f-structure is identified by placing a '~' (tilde) in front of it, aside from the feature-value pairs.
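An f-structure for a ~tree is shown in (1).
$${\sim}\mathrm{tree}\begin{bmatrix} \mathrm{head} & {\sim}\mathrm{event} \\ \mathrm{dtrs} & \left\{ \begin{bmatrix} \mathrm{left} & {\sim}\mathrm{tree} \\ \mathrm{right} & {\sim}\mathrm{tree} \end{bmatrix},\ \bot \right\} \end{bmatrix} \tag{1}$$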
A binary tree has left and right branches on which subtrees recursively appear; we call these branches daughters (dtrs). Here, a subtree consists of a single pitch event when the value of dtrs becomes '⊥'. The brace notation {x, y} means a choice of either x or y.
The values of a feature are referred to by a sequence of feature names connected by '.' (dot) when recursive f-structures are embedded. Let $\sigma$ be an f-structure; then, the left daughter and right daughter are other ~tree type structures, referred to by $\sigma.\mathrm{dtrs.left}$ and $\sigma.\mathrm{dtrs.right}$, respectively. The head of a ~tree must coincide with either the head of its left daughter or that of its right daughter, if any.$^{1}$ If $\sigma.\mathrm{head} = \sigma.\mathrm{dtrs.left.head}$, the node is right branching, and if $\sigma.\mathrm{head} = \sigma.\mathrm{dtrs.right.head}$, left branching.
At a leaf of a tree, a daughter subtree is replaced by a pitch event, the type of which is ~event and which consists of such features as pitch, duration, and position in terms of bar, meter, and so on.
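As a rough illustration of how such an f-structure might be held in a program, the following Python sketch encodes the ~tree and ~event types as dataclasses. The class names, the `position` encoding, and the use of `None` for '⊥' are assumptions of this sketch, not part of the formalism itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    """~event: a pitch event (field names are illustrative)."""
    pitch: str
    duration: float
    position: float


@dataclass(frozen=True)
class Dtrs:
    """The pair of daughters of a branching node."""
    left: "Tree"
    right: "Tree"


@dataclass(frozen=True)
class Tree:
    """~tree: `dtrs is None` plays the role of the bottom value."""
    head: Event
    dtrs: Optional[Dtrs] = None

    def __post_init__(self) -> None:
        # The head must coincide with the head of one of the daughters, if any.
        if self.dtrs is not None and self.head not in (
            self.dtrs.left.head,
            self.dtrs.right.head,
        ):
            raise ValueError("head must coincide with the head of one daughter")


def branching(tree: Tree) -> Optional[str]:
    """Return 'right' or 'left' branching as defined in the text; None for a leaf."""
    if tree.dtrs is None:
        return None
    return "right" if tree.head == tree.dtrs.left.head else "left"


c4 = Event("C4", 1.0, 0.0)
e4 = Event("E4", 1.0, 1.0)
sigma = Tree(head=c4, dtrs=Dtrs(left=Tree(c4), right=Tree(e4)))
```

Here `sigma.dtrs.left.head` corresponds to the dotted path $\sigma.\mathrm{dtrs.left.head}$, and `branching(sigma)` reports a right-branching node because the head is inherited from the left daughter.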

Subsumption relation, reduction, join, and meet
We use the primitive operations of the subsumption relation (written as $\sqsubseteq$), meet (written as $\sqcap$), and join (written as $\sqcup$). The subsumption relation represents the relation: 'an instantiated object' $\sqsupseteq$ 'an abstract object' (Figure 4(a)). For example, the relationship among $\sigma_A$, $\sigma_B$, and $\sigma_C$, which are the time-span trees (or reduced time-span trees) of melodies A, B, and C in Figure 2, can be represented as follows:
$$\sigma_C \sqsubseteq \sigma_B \sqsubseteq \sigma_A$$
The meet operator extracts the maximally common part of two time-span trees of two melodies in a top-down manner (Figure 4(b)). The join operator joins two time-span trees in a top-down manner as long as the structures of the two time-span trees are consistent (Figure 4(c)).
Thanks to the f-structure, we can define the subsumption relation between two trees.
Subsumption: Let $\sigma_1$ and $\sigma_2$ be two f-structures. When the two structures have a common type and for any feature-value pair in $\sigma_1$ there exists the same pair in $\sigma_2$, we say that $\sigma_2$ subsumes $\sigma_1$ and denote this by $\sigma_1 \sqsubseteq \sigma_2$. Note that a feature name may consist of the dotted notation when the f-structures are embedded recursively.
Let $\sigma_1$ and $\sigma_2$ be two ~tree type structures. From now on, we will refer to these $\sigma_i$'s directly as trees as long as there is no confusion. When $\sigma_1 \sqsubseteq \sigma_2$, we regard $\sigma_1$ to be a reduced tree of $\sigma_2$, as in abstraction in Figure 2. In $\sigma_1$, some branches have disappeared from $\sigma_2$.
Reduction: when two time-span trees are in the subsumption relation $\sigma_1 \sqsubseteq \sigma_2$, we can regard $\sigma_1$ to be a reduction of $\sigma_2$. Accordingly, we can define the reduction operation as the process of removing a branch from a given tree. Such a branch is called reducible. Also, consecutive reductions of reducible branches form a sequence of '$\sqsubseteq$' relations from $\sigma_2$ to $\sigma_1$; if there is a path from $\sigma_2$ to $\sigma_1$ in a partially ordered set (meet-semilattice) to be explained below, we call such a sequence a reduction path.
The reduction process is shown in Figure 5 using maximum time spans. A subsumption relation holds between the maximum time spans before and after the reduction.
At this point, we introduce the hypothesis that if we reduce (eliminate) a branch including one pitch event, we lose the information corresponding to that maximum time span. This is called the maximum time-span hypothesis.
In general, a set of f-structures is put in a partial order, defined by the subsumption relation. An f-structure is subsumed by an upper (larger) f-structure, while it subsumes a lower (smaller) one.
Two f-structures can be unified when there is no contradiction in the feature-value pairs, and the unification results in the least upper bound of the two structures. We define a unification as a join and the intersection of unifiable f-structures as a meet. Let $\sigma_A$ and $\sigma_B$ be tree structures for two music pieces, A and B, respectively.
Join: If we can fix the least upper bound of $\sigma_A$ and $\sigma_B$, that is, if the least $y$ such that $\sigma_A \sqsubseteq y$ and $\sigma_B \sqsubseteq y$ is unique, we call such $y$ the join of $\sigma_A$ and $\sigma_B$ and denote it as $\sigma_A \sqcup \sigma_B$.
Meet: If we can fix the greatest lower bound of $\sigma_A$ and $\sigma_B$, that is, if the greatest $x$ such that $x \sqsubseteq \sigma_A$ and $x \sqsubseteq \sigma_B$ is unique, we call such $x$ the meet of $\sigma_A$ and $\sigma_B$, denoted as $\sigma_A \sqcap \sigma_B$.
We may not always be so fortunate that a set of possible reduction trees composes a Boolean lattice where meet and join exist uniquely. In such a case, we need to incorporate compensating conditions, such as the absorption law$^{2}$ or the distributive law$^{3}$ in the lattice; however, the treatment of such conditions exceeds the current scope of our work.
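The three relations can be sketched on generic f-structures encoded as nested Python dictionaries. This is a simplified illustration under stated assumptions: the dictionary encoding, the function names, and the use of `None` to signal a contradiction are hypothetical, and the sketch does not enforce the head constraint or the top-down pruning of time-span trees.

```python
from typing import Any, Optional

FS = Any  # a nested dict of features, or an atomic value at the leaves


def subsumed_by(s1: FS, s2: FS) -> bool:
    """True when every feature-value pair of s1 also appears in s2 (s1 ⊑ s2)."""
    if not isinstance(s1, dict):
        return s1 == s2
    if not isinstance(s2, dict):
        return False
    return all(k in s2 and subsumed_by(v, s2[k]) for k, v in s1.items())


def meet(a: FS, b: FS) -> Optional[FS]:
    """Common part of a and b; None when two atomic values disagree."""
    if isinstance(a, dict) and isinstance(b, dict):
        out = {}
        for key in a.keys() & b.keys():
            common = meet(a[key], b[key])
            if common is not None and common != {}:
                out[key] = common
        return out
    return a if a == b else None


def join(a: FS, b: FS) -> Optional[FS]:
    """Unification of a and b; None signals a contradiction."""
    if isinstance(a, dict) and isinstance(b, dict):
        out = dict(a)
        for key, value in b.items():
            if key in out:
                unified = join(out[key], value)
                if unified is None:
                    return None
                out[key] = unified
            else:
                out[key] = value
        return out
    return a if a == b else None


sigma_a = {"head": "C4", "dtrs": {"left": {"head": "C4"}, "right": {"head": "G4"}}}
sigma_b = {
    "head": "C4",
    "dtrs": {
        "left": {
            "head": "C4",
            "dtrs": {"left": {"head": "C4"}, "right": {"head": "D4"}},
        }
    },
}
lower = meet(sigma_a, sigma_b)   # greatest common part
upper = join(sigma_a, sigma_b)   # unification
```

In this example `lower` is subsumed by both inputs and both inputs are subsumed by `upper`, while joining `sigma_a` with a structure whose head is a different pitch returns `None`.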

Definition of unique distance
Reducing a time-span tree means eliminating one time span contained within that tree.$^{4}$ A series of time-span trees obtained by eliminating time spans one by one is called a reduction path. Since there are generally multiple branches for which reduction is possible, there are also multiple reduction paths. In accordance with the maximum time-span hypothesis introduced in the previous section, we take the amount of information lost to be equal to the maximum time spans eliminated by reduction and define the distance of the time-span trees on the reduction path as being equal to the sum of the eliminated maximum time spans (we call this the reduction distance of the path).
Next, we generalise the concept of the distance between time-span trees on the reduction path as the concept of the distance between time-span trees on the lattice. Considering the distance calculation, we determine that the reduction of a branch is carried out one branch at a time. Branches are always reduced in a bottom-up manner; in other words, we say that a branch is reducible when that branch has no sub-branches below it. However, we will make an exception for a tree made of one pitch event, i.e. a tree comprising one branch.
First, let us denote $\varsigma(\sigma)$ as the set of pitch events included in time-span tree $\sigma$, $|\varsigma(\sigma)|$ as the cardinality of the set, and $s_e$ as the maximum time span of pitch event $e$. The distance $d_{\sqsubseteq}$ between two time-span trees such that $\sigma_A \sqsubseteq \sigma_B$ in a reduction path is defined by
$$d_{\sqsubseteq}(\sigma_A, \sigma_B) = \sum_{e \in \varsigma(\sigma_B) \setminus \varsigma(\sigma_A)} s_e.$$
For example, in Figure 5, the distance between $\sigma_1$ and $\sigma_4$ is $mt_1 + mt_2 + mt_3$. Note that if $e_3$ is first reduced and $e_2$ is subsequently reduced, the distance is the same.
Although the distance appears at a glance to be a simple summation of maximum time spans, there is an additional latent order, because the reducible branches are different in each reduction step. To give a constructive procedure for this summation, we introduce the notion of the total sum of maximum time spans:
$$tmts(\sigma) = \sum_{e \in \varsigma(\sigma)} s_e.$$
When $\sigma_A \sqsubseteq \sigma_B$,
$$d_{\sqsubseteq}(\sigma_A, \sigma_B) = tmts(\sigma_B) - tmts(\sigma_A).$$
As a special case of the above, $d_{\sqsubseteq}(\bot, \sigma) = tmts(\sigma)$. As there is a reduction path between $\sigma_A \sqcap \sigma_B$ and $\sigma_A \sqcup \sigma_B$, and $\sigma_A \sqcap \sigma_B \sqsubseteq \sigma_A \sqcup \sigma_B$, the reduction distance between them is also defined. Here, let us define two distance metrics,
$$d_{\sqcap}(\sigma_A, \sigma_B) = d_{\sqsubseteq}(\sigma_A \sqcap \sigma_B, \sigma_A) + d_{\sqsubseteq}(\sigma_A \sqcap \sigma_B, \sigma_B),$$
$$d_{\sqcup}(\sigma_A, \sigma_B) = d_{\sqsubseteq}(\sigma_A, \sigma_A \sqcup \sigma_B) + d_{\sqsubseteq}(\sigma_B, \sigma_A \sqcup \sigma_B),$$
which coincide on the basis of the uniqueness of the reduction distance.$^{5}$ Hereafter, we will omit $\sqcap, \sqcup$ from $d_{\sqcap, \sqcup}$, simply expressing it as 'd' instead. Here, $d(\sigma_A, \sigma_B)$ is unique among the shortest paths between $\sigma_A$ and $\sigma_B$. Finally, we obtain
$$d(\sigma_A, \sigma_B) + d(\sigma_B, \sigma_C) \geq d(\sigma_A, \sigma_C),$$
which is a triangle inequality.
We show an example in which join and meet are calculated from two given pieces (Figure 6). The two pieces are taken from Mozart's variations K.265/300e 'Ah, vous dirai-je, maman', that is, variations No. 2 and No. 5. Actually, in the process of calculating the join and meet operations of these two time-span trees, the join and meet of the unmatched branching nodes occurred nine times for each operation, and the distances derived via join and meet, $d_{\sqcup}$ and $d_{\sqcap}$, were the same. The values in parentheses show the total maximum time span of each time-span tree; according to the definition of distance, we obtained $d_{\sqcup} = (822 - 744) + (822 - 654) = 246$ and $d_{\sqcap} = (744 - 576) + (654 - 576) = 246$. Notice that the four time-span trees form a parallelogram because the lengths of the opposite sides are equal. Thus, we have proved a lemma on the uniqueness of reduction distance in the proposed framework.
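The distance computation can be sketched in Python under a strong simplification: each tree is modelled only as a mapping from event names to maximum time spans, branching is ignored, and an event is assumed to keep the same maximum time span in every tree. The event names and values in the toy example are hypothetical; the four totals at the end are the values quoted for the Mozart example.

```python
from typing import Dict

Tree = Dict[str, int]  # event name -> maximum time span (branching ignored)


def tmts(tree: Tree) -> int:
    """Total sum of maximum time spans."""
    return sum(tree.values())


def d_reduction(small: Tree, big: Tree) -> int:
    """Sum of the maximum time spans eliminated when reducing `big` to `small`."""
    if not small.items() <= big.items():
        raise ValueError("small must be a reduction of big")
    return sum(mts for event, mts in big.items() if event not in small)


def meet(a: Tree, b: Tree) -> Tree:
    """Events common to both trees (simplified meet)."""
    return {event: mts for event, mts in a.items() if b.get(event) == mts}


def join(a: Tree, b: Tree) -> Tree:
    """Union of the events of both trees (simplified join)."""
    merged = dict(a)
    for event, mts in b.items():
        if merged.setdefault(event, mts) != mts:
            raise ValueError("incompatible maximum time spans")
    return merged


def d_meet(a: Tree, b: Tree) -> int:
    low = meet(a, b)
    return d_reduction(low, a) + d_reduction(low, b)


def d_join(a: Tree, b: Tree) -> int:
    up = join(a, b)
    return d_reduction(a, up) + d_reduction(b, up)


# Toy trees with hypothetical events.
tree_a = {"e1": 8, "e2": 2, "e3": 1}
tree_b = {"e1": 8, "e2": 2, "e4": 4}

# Totals quoted in the text for the join, the two variations, and the meet.
tmts_join, tmts_x, tmts_y, tmts_meet = 822, 744, 654, 576
dist_via_join = (tmts_join - tmts_x) + (tmts_join - tmts_y)
dist_via_meet = (tmts_x - tmts_meet) + (tmts_y - tmts_meet)
```

In this set-like model the two distances agree for any pair of compatible trees, which mirrors the parallelogram observation; the real operations on branching structures are more involved.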
We strictly distinguish the domain of time-span trees from that of music represented in scores (Figure 7). The right-hand side of Figure 7 refers to the algebraic domain that we mentioned in the preceding subsections, while the left-hand side refers to the domain of actual music. To obtain a time-span tree from a given score, we need to perform a time-span analysis, and as long as we stay in the algebraic domain, we do not need to concern ourselves with the actual musical composition. On the other hand, this implies that we must consider another process of rendering to obtain real music from a given abstract tree.

Basic algorithm of melodic morphing
The initial melody A, target nuance melody B, resulting morphed melody C, and the morphing method must meet the following conditions. Conditions 1 and 2 are for melody C, and 3 and 4 are for the method. We represent the f-structures of melodies A, B, and C as $\sigma_A$, $\sigma_B$, and $\sigma_C$, respectively.
(3) The output of multiple melodies C depends on the parameters that represent influential features of melodies A and B.
(4) C is a monophony if A and B are monophonies.

Ideas of melodic morphing
The meaning of morphing is to change something, such as an image, into another through a seamless transition. For example, a method of morphing one face picture into another creates intermediate pictures through the following operations.
(a1) Link characteristic points such as eyes, nose, etc., in the two pictures (Figure 8(a)).
Analogously, melodic morphing proceeds through the following operations.
(b1) Link the common pitch events of the time-span trees of two melodies (Figure 8(b)).
(b2) Remove those notes which do not reside in the common part by using the melody partial reduction, as explained below.
(b3) Combine both melodies.
By using the time-span trees $\sigma_A$ and $\sigma_B$ from melodies A and B, we can calculate the common events $\sigma_A \sqcap \sigma_B$, which include not only the essential parts of melody A but also those of melody B (Figure 9 (b1)). The meet $\sigma_A \sqcap \sigma_B$ is abstracted from $\sigma_A$ and $\sigma_B$, and those abstracted notes which are not included in $\sigma_A \sqcap \sigma_B$ are regarded to be the difference between $\sigma_A$ and $\sigma_B$.

Partial melody reduction
Music features contained in $\sigma_A$ and $\sigma_B$ should exist even in what is not included in the common part. In order to retrieve these characteristics, we need a method to smoothly increase or decrease the number of features. The method described below, called partial melody reduction, abstracts the notes of a melody by using the reduction in Section 3.1.
First, we acquire melodies $\alpha_i$ $(i = 1, 2, \ldots, n)$ from $\sigma_A$ and $\sigma_A \sqcap \sigma_B$ with the following algorithm. The subscript $m$ of $\alpha_m$ indicates the number of notes that are included in $\sigma_A$ but not in $\sigma_A \sqcap \sigma_B$.
Step 1: Decide the level of abstraction. The user decides the parameter L that determines the level of abstraction of a melody. L is from 1 to the number of notes that are included in $\sigma_A$ but not included in $\sigma_A \sqcap \sigma_B$.
Step 2: Abstraction of notes. This step selects and abstracts a note that has the fewest dots, obtained by the metrical analysis (Figure 1), in the difference of $\sigma_A$ and $\sigma_B$. The numbers of dots can be acquired from the results of the analysis. If two or more notes have the fewest dots, we select the first one.
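Subsumption relations hold as follows for the time-span trees $\sigma_{\alpha_m}$ constructed with the above algorithm: every $\sigma_{\alpha_m}$ lies between the common part and the original tree, that is, $\sigma_A \sqcap \sigma_B \sqsubseteq \sigma_{\alpha_m} \sqsubseteq \sigma_A$.
In Figure 9 (b2), there are nine notes included in $\sigma_A$ but not included in $\sigma_A \sqcap \sigma_B$. Therefore, the value of $n$ is 8, and we can acquire eight kinds of melody $\alpha_i$ $(i = 1, 2, \ldots, n)$ between $\sigma_A$ and $\sigma_A \sqcap \sigma_B$. Hence, melody $\alpha_i$ attenuates the features that exist only in melody A. In the same way, we can acquire melody $\beta_j$ from $\sigma_B$ and $\sigma_A \sqcap \sigma_B$, for which $\sigma_A \sqcap \sigma_B \sqsubseteq \sigma_{\beta_j} \sqsubseteq \sigma_B$.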
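The selection rule of partial melody reduction can be sketched as follows. This is a minimal illustration, assuming that the notes outside the common part are given as a list of (name, number of dots) pairs and that the selection is simply repeated L times; the encoding and the function name are hypothetical, and reducibility of the corresponding branches is not checked.

```python
from typing import List, Tuple

Note = Tuple[str, int]  # (note name, number of metrical dots); hypothetical encoding


def partial_reduction(difference: List[Note], level: int) -> List[List[Note]]:
    """Return the successive note lists obtained by abstracting `level` notes."""
    if not 1 <= level <= len(difference):
        raise ValueError("level must lie between 1 and the number of differing notes")
    remaining = list(difference)
    stages = []
    for _ in range(level):
        # min() returns the first minimal element, which implements the tie rule.
        weakest = min(remaining, key=lambda note: note[1])
        remaining.remove(weakest)
        stages.append(list(remaining))
    return stages


# Notes of melody A that are not in the common part, in temporal order.
difference = [("n1", 3), ("n2", 1), ("n3", 2), ("n4", 1)]
stages = partial_reduction(difference, level=3)
```

Each element of `stages` is the set of remaining non-common notes after one more abstraction, so the list traces one path from melody A toward the common part.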

Combining two melodies
We use the join operator to combine melodies $\sigma_{\alpha_i}$ and $\sigma_{\beta_j}$, which are the results of the partial reduction done using the time-span trees of melodies $\sigma_A$ and $\sigma_B$ (Figure 9 (b3)).
The simple join operator is not sufficient for combining $\sigma_{\alpha_i}$ and $\sigma_{\beta_j}$, because $\sigma_{\alpha_i} \sqcup \sigma_{\beta_j}$ is not always a monophony, even though $\sigma_{\alpha_i}$ and $\sigma_{\beta_j}$ are monophonies. In other words, the result of the operation may become a polyphony (chords) when the time-span structures overlap each other and the pitches of the notes are different; therefore, the result violates condition 4 in Section 4.
To solve this problem, we introduce a special notation, $[n_1, n_2]$, which indicates note $n_1$ or note $n_2$, as a result of $n_1 \sqcup n_2$. Accordingly, the result of $\sigma_\alpha \sqcup \sigma_\beta$ is all possible monophony combinations.
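The expansion of the $[n_1, n_2]$ notation into all monophonic combinations can be sketched with a Cartesian product. The list encoding of a joined melody and the function name are assumptions of this sketch; a slot is either one note name or a list of alternatives.

```python
from itertools import product
from typing import List, Sequence, Union

Slot = Union[str, Sequence[str]]  # a single note, or alternatives [n1, n2]


def monophonic_realisations(joined: List[Slot]) -> List[List[str]]:
    """Expand every [n1, n2] choice into all single-voice note sequences."""
    options = [[slot] if isinstance(slot, str) else list(slot) for slot in joined]
    return [list(choice) for choice in product(*options)]


# Hypothetical join result with two overlapping positions.
joined = ["C4", ["E4", "G4"], "D4", ["F4", "A4"]]
melodies = monophonic_realisations(joined)
```

With k overlapping positions the number of candidate melodies is $2^k$, which is why a selection rule is needed before the process can be automated.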

Implementation of melodic morphing algorithm
Although we have given priority to automating the morphing process, the melodic morphing algorithm described in Section 4 above has the following two problems.
Problem 1: No order of abstract notes. This problem has to do with the order of abstract notes in partial melody reduction. In Step 2 of Section 4.2, it was said that an abstraction is made from the notes with the fewest dots, but this is not always the case, for example, in a time-span tree where there is a structurally salient note on a weak beat. In addition, we have to consider whether it is appropriate to uniquely determine the partial reduction path, as in Equation (9) in Step 3. If there are multiple paths for partial reduction, there is a possibility that more diverse melodies can be output.
Problem 2: Notes with overlapping times occur. This problem happens when two temporally overlapping notes occur in the join of two time-span trees. In such cases, it is necessary to manually select one melody from among multiple generated melodies, and it is difficult to completely automate the morphing method. Further, the user remains in the dark as to the morphing process. In particular, it is difficult for the user to understand that the number of melodies output by melodic morphing varies. Even if the user understands the outline of the morphing method in Section 4, the outputs of multiple melodies may not match his or her expectations.

Two solutions to the two problems
We tackled the two aforementioned problems in two ways. The first way (Sections 5.2, 6.1, Appendix 1) is to provide logically consistent melodic morphing. The definitions of join and meet are only applicable to unifiable pairs of trees in the sense of the branch configuration. In Section 5.2, we amend their definitions and achieve logically consistent melodic morphing based on a ternary branching representation (described in Appendix 1). In Section 6.1, we generate a morphed melody based on the ternary branching tree and evaluate it through a subject experiment. The second way (Appendix 2, Sections 5.3, 6.2) is to define the order of notes abstracted by partial reduction and the order of notes selected by join. That is, when the time-span trees $\sigma_A$ and $\sigma_B$ of melodies A and B and the number of notes to be abstracted for each are determined, a unique melody, C, is obtained. In the experiment reported in Appendix 2, a musicologist who was familiar with GTTM and the melodic morphing method manually generated morphing melodies. The musicologist selected the notes to be abstracted by partial reduction and notes with overlapping times with the join operation. The results of the first method were different from the musicologist's expectations in that the two input melodies were monophonic but the output was homophonic. Section 5.3 describes automated morphing that defines the priority of branches, and Section 6.2 shows the experimental results obtained using automated morphing.

Melodic morphing based on ternary-branching tree
So far, we have proposed a framework in which a time-span tree is distinguished from a written score. Here, disregarding the join of two melodies in a score, we introduce a ternary-branching tree that represents the superimposition of left-branching and right-branching binary trees. Appendix 1 details the introduction of the ternary branching representation into the time-span tree.
Let $\sigma_A$ and $\sigma_B$ be two pieces of music and $\sigma_C$ be the expected result of morphing; we require that $\sigma_C$ should reside at an internal dividing point of $\sigma_A$ and $\sigma_B$ identified on the basis of N : M. This ratio is measured in terms of the total sum of maximum time spans (denoted as tmts in Section 3.1). Notice that there are infinitely many $\sigma_C$'s such that the ratio of the distance between $\sigma_A$ and $\sigma_C$ to that between $\sigma_C$ and $\sigma_B$ is M : N because $\sigma_C$ resides on the so-called Apollonian circles. Thus, we should restrict $\sigma_C$ to the one that resides at the shortest distance from $\sigma_A$ and from $\sigma_B$.
Our morphing algorithm is shown in Figure 10. It consists of the following steps:
(1) Find a reduction $\sigma_\alpha$ of $\sigma_A$ that divides $\sigma_A$ and meet($\sigma_A$, $\sigma_B$) with the ratio of N : M in terms of the given distance.
(2) Find a reduction $\sigma_\beta$ of $\sigma_B$ that divides $\sigma_B$ and meet($\sigma_A$, $\sigma_B$) with the ratio of M : N.
We see that four time-span trees $\sigma_\alpha$, $\sigma_\beta$, meet($\sigma_A$, $\sigma_B$), and join($\sigma_\alpha$, $\sigma_\beta$) also form a parallelogram as in Figure 6. As well, in terms of the distance between … In the current implementation, a fully instantiated node is simply rendered as a chord of two notes, that is, both sounding at the same time. Otherwise, for instance, it could be rendered as a transformation of the superimposed time spans.$^{6}$

Automating melodic morphing by prioritisation of branches
As shown in the previous subsection, the morphing algorithm designed through theoretical considerations has several desirable features. The distance between time-span trees satisfies the four distance axioms.$^{7}$ Given the interpolating ratio M to N, the algorithm can precisely generate the intended time-span tree from the input time-span trees. On the other hand, a ternary branching time-span tree may occur if we calculate the join of differently branching nodes of two time-span trees. In this case, we regard that the ternary branching time-span tree is a superimposition of the left-branching and right-branching time-span trees.
$^{7}$ The four distance axioms:
• Non-negativity: d(x, y) ≥ 0
• Identity of indiscernibles: d(x, x) = 0
• Symmetry: d(x, y) = d(y, x)
• Triangle inequality: d(x, y) + d(y, z) ≥ d(x, z)
In this subsection, to cope with the problem of a ternary branching time-span tree, we reconstruct the morphing algorithm to make it more practical. The reconstruction involves abstracting the reduction distance, relaxing the requirement that a morphed melody stay at the shortest distance from both input melodies, and selecting the arguments of the join operation. The reconstructed algorithm no longer generates a chord of two notes raised by a ternary branching node, and it enhances musical expressibility.
The morphing process described in Section 4 is difficult to automate because a musicologist may arbitrarily determine the order of reduction. Therefore, we attempted to automate the process by defining the order in which branches are reduced. The priority of the branches of the time-span tree is determined by a breadth-first search with the maximum time span. Since the height of the branching points in a time-span tree is sometimes arbitrary (Marsden et al., 2018), we need to give another rigorous order in priority.
The priority of each branch is determined on the time-span tree drawn with the maximum time spans used in the time-span segmentation, which is performed as the first step of the time-span reduction analysis. The branch priority is determined in accordance with the following rules.
• Priorities are assigned level by level from the top of the time-span tree drawn with the duration of the time span.
• At the top level, the main branches take precedence.
• At the second and subsequent levels, the higher the priority of a branch X is, the higher the priority of the branches off X becomes.
Figure 11 shows a time-span tree drawn with the duration of the time span. The branch priority is determined in order from the top in accordance with the first rule. Then, in accordance with the second rule, branch 1 in the figure has the highest priority in this time-span tree, and branch 2 has the second-highest priority. In this time-span tree, the second level is the double-note level. In accordance with the third rule, the branch off from 1 becomes 3, and that from 2 becomes 4. In the same way, the priority is determined down to the 16th-note level.
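The three rules can be sketched as a level-by-level traversal. The following Python fragment is only an illustration under stated assumptions: the `Branch` class, its `children` field, and the function `assign_priorities` are hypothetical names that do not come from the SWI-Prolog implementation, the top-level branches are assumed to be supplied in precedence order already, and each branch is assumed to list the branches that split off from it exactly one level down.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Branch:
    """Hypothetical branch record; `children` are the branches off this one, one level down."""
    name: str
    children: List["Branch"] = field(default_factory=list)


def assign_priorities(top_level: List[Branch]) -> Dict[str, int]:
    """Number branches level by level; 1 is the highest priority.

    `top_level` is assumed to be in precedence order (main branches first),
    which stands in for the second rule.
    """
    priorities: Dict[str, int] = {}
    current = list(top_level)
    rank = 1
    while current:
        next_level: List[Branch] = []
        for branch in current:  # visited in priority order
            priorities[branch.name] = rank
            rank += 1
            # Third rule: branches off X inherit the relative order of X.
            next_level.extend(branch.children)
        current = next_level
    return priorities


# Toy tree: two top-level branches, each with one branch off it.
tree = [Branch("b1", [Branch("off_b1")]), Branch("b2", [Branch("off_b2")])]
priorities = assign_priorities(tree)
```

In this toy tree, the branch off the first top-level branch receives priority 3 and the branch off the second receives priority 4, mirroring the numbering described for Figure 11.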
For automatic partial reduction, we decide how much each melody is to be reduced and reduce the branches in the non-common part of the two melodies. If the non-common part of melody A is reduced by 30%, the reduction ratio of melody B is set to 70%, so that the total is 100%. Then, in the non-common part of each melody, the branches are reduced in order, starting from the branch with the lowest priority. Because the number of notes is finite, reducing them exactly in accordance with a set reduction ratio is often impossible. In such cases, the branches are reduced so that the result is as close as possible to the reduction ratio.
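The ratio-driven reduction can be illustrated with a short Python sketch. It assumes that the non-common branches of a melody are already available as a list ordered from highest to lowest priority; the function name `partial_reduction`, the branch labels, and the tie-breaking choice (prefer removing fewer branches) are assumptions made for illustration, not details given in the text.

```python
from typing import List


def partial_reduction(branches_by_priority: List[str], ratio: float) -> List[str]:
    """Return the branches that survive reduction.

    `branches_by_priority` lists the non-common branches, highest priority first.
    `ratio` is the target fraction of branches to remove (0.0 to 1.0).
    """
    n = len(branches_by_priority)
    if n == 0:
        return []
    # Pick the removal count whose achieved ratio is closest to the target.
    # Ties go to the smaller count (an assumption; the text does not say).
    k = min(range(n + 1), key=lambda count: abs(count / n - ratio))
    # Remove the k lowest-priority branches, i.e. the tail of the list.
    return branches_by_priority[: n - k]


ratio_a = 0.3
ratio_b = 1.0 - ratio_a  # the two ratios sum to 100%
kept_a = partial_reduction(["a1", "a2", "a3", "a4"], ratio_a)
kept_b = partial_reduction(["b1", "b2", "b3", "b4"], ratio_b)
```

With four branches, a 30% target cannot be met exactly, so one branch (25%) is removed from A and three branches (75%) from B, the nearest achievable ratios.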
As described in Section 4.3, when a melody is synthesised by a join operation, branches of the time-span trees may overlap in time. If branches and notes overlap in time as a result of the join of melodies A and B, the note from the melody with the lower reduction ratio is kept. If both reduction ratios are 50%, the note of A is kept.
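The overlap rule can be sketched as follows. This is a simplified illustration: an overlap is modelled as two notes sharing the same onset, each melody is a hypothetical onset-to-pitch dictionary, and `resolve_overlaps` is a name introduced here rather than part of the original system.

```python
from typing import Dict


def resolve_overlaps(notes_a: Dict[float, str], notes_b: Dict[float, str],
                     ratio_a: float, ratio_b: float) -> Dict[float, str]:
    """Merge two onset->pitch maps; on a clash keep the note of the less reduced melody."""
    if ratio_a <= ratio_b:  # includes the 50%/50% tie, where A is kept
        merged = {**notes_b, **notes_a}  # entries of A overwrite clashes
    else:
        merged = {**notes_a, **notes_b}  # entries of B overwrite clashes
    return dict(sorted(merged.items()))


notes_a = {0.0: "C4", 1.0: "E4"}
notes_b = {1.0: "G4", 2.0: "A4"}
tie = resolve_overlaps(notes_a, notes_b, 0.5, 0.5)
b_less_reduced = resolve_overlaps(notes_a, notes_b, 0.7, 0.3)
```

At onset 1.0 the two melodies clash: with equal ratios the note of A survives, whereas with A reduced by 70% and B by 30% the note of B survives.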

Experimental results
Section 6.1 describes the variations generated by the method described in Section 5.2 and evaluates the pieces from a psychological viewpoint. Section 6.2 describes the results of morphing with the method described in Section 5.3.

Subject evaluation with melodic morphing based on ternary branching tree
The morphing algorithm was implemented in SWI-Prolog (SWI-Prolog, 2020). The set piece was Mozart's variations K.265/300e 'Ah, vous dirai-je, maman', which consists of a famous theme and twelve variations on it. In our experiment, we took variations No. 1, 2, and 5 as sources for morphing and excerpted the first eight bars (Figure 12). We chose these three variations because, for every pair among them, we can calculate the result of join; that is, the joined maximum time spans are all concatenated. To make comparison easy, the morphed melodies generated by the improved algorithm are shown between the variations. For example, in the figure, 'No. 2 & No. 5' means the morphed melody at the midpoint of variations No. 2 and No. 5.
For the similarity assessment of the morphed melodies, we recruited six university students (two females and four males), four of whom had played musical instruments for five years or more. Each participant listened to all pairs (m1, m2) in random order without duplication, where m1 and m2 each denote one of variations No. 1, No. 2, or No. 5 or a morphed melody such as No. 1 & No. 2. Each time a participant listened to a pair, he/she was asked 'How similar is m1 to m2?' and rated the pair on a five-point scale: quite similar = 2, similar = 1, neutral = 0, not similar = −1, and quite different = −2. At the very beginning, to eliminate cold-start bias, every participant listened to the theme and twelve variations (eight bars long) without rating them. In addition, after participants had listened to and rated a pair (m1, m2), they were made to listen to the same pair again later to avoid the order effect. Finally, the average rating for each participant was calculated, and the average over all participants was then determined.
The experimental results were first obtained as a matrix of distances between variations No. 1, No. 2, and No. 5 and the morphed melodies between them. Since it was difficult to examine the results in this form, we employed multidimensional scaling (MDS) to visualise them (Figure 13). Briefly, MDS plots items on a coordinate plane such that the closer two items are, the more similar they are.
For the pairs No. 1 and No. 2 and No. 1 and No. 5, the morphed melodies were plotted at the midpoint of their source variations, almost as expected. In contrast, the position of No. 2 & No. 5 was problematic. As can be   No. 2 and No. 5. However, No. 2 & No. 5 was almost entirely made of eighth notes, and as a result of the join, many of the notes had the same pitch or sounded at the same time. Consequently, the impression conveyed by No. 2 & No. 5 was closer to that of No. 5.

Automating melodic morphing by prioritisation of branches
We conducted an experiment to see whether the two melodies used in Appendix 2 could be morphed. The results showed that, once the time-span trees had been acquired, there was no arbitrariness in the prioritisation of the branches, the partial reduction, or the combination of melodies. Therefore, once the reduction ratio was determined, the morphed melody could be obtained deterministically. In Figure 14, the notes included in melody A are displayed with stems up, and the notes included in melody B are displayed with stems down. In Figure 14, syncopated rhythms occur not only because the meet and join operations are applied to the branches but also because they are applied to the time spans. Figure 15 shows an example of the meet and join operations for time spans τ A and τ B . If τ A and τ B are separate from each other (that is, no parts overlap chronologically and they cannot be connected), a join does not exist and the meet becomes empty (notated as ⊥).
In the manual morphing performed in Appendix 2, the melody gained a rich variation of notes because passing notes, appoggiatura notes, and auxiliary notes were used, but the note selection was arbitrary. In contrast, in the automatic morphing, the melody changed gradually while the notes used were restricted.

Conclusion
We proposed a new method of melodic morphing, based upon tree structures of music. Our method is summarised as follows.
• Time-span tree: GTTM analysis results in a hierarchical musical structure where the most salient pitch event in each local time interval is externalised. Here, we obtain these structures, called time-span trees, from given music pieces.
• Reduction: A time-span tree possesses reducible leaves at the ends of branches. By removing such reducible branches one by one from the original tree, we obtain multiple barer, or reduced, trees, and such trees form a reduction path.
• Meet and join: We calculate the meet tree from two arbitrarily given trees in order to find the common structure that resides on the two reduction paths of the given trees. Our target morphed tree is the join of two certain reduced trees.
• Interpolation: The ratio of morphing is arbitrary; the two original melodies can be mixed by a ratio of N : M, for example, when the reduction rates are N versus M from the originals to the meet tree. Then, the morphed tree resides as an interpolation with this ratio.
• Algebraic domain: We faced the difficulty of join, since we could not always find the corresponding pitch events in two trees in a strict way. Therefore, we calculated in the algebraic domain of abstract feature structures equivalent to trees, but independent of the music score. This avoided the problem of locating the exact join points and allowed us to employ virtual join nodes with ternary junctions reflecting the ambiguity of left-/right-branching.
• Rendering: Our methodology, however, requires an independent rendering process to assign concrete music notes to the leaves of an abstract tree structure, where we need to devise a strategy for choosing branches.
We applied our method to the variations K.265/300e 'Ah, vous dirai-je, maman' by Wolfgang Amadeus Mozart. We composed new artificial variations, plotted them using multidimensional scaling (MDS), and compared the psychological distance with the computational distance among trees.
Our method has some obvious disadvantages; one is that it is only applicable to short pieces, short enough to identify the tree structures. However, our experimentation on eight-bar pieces showed that the method is practical enough for our purpose. The other issue is the subjectivity or arbitrariness of the choice of branches in the rendering process. We need to somehow assign a priority to the branches, and for this, we would need some sort of customisation.
The musical quality of the melody output by morphing is not necessarily high. Also, a plurality of melodies generated by morphing may have limited variation. We plan to improve the quality of the output melody by adding pre-processing to select the two melodies to be morphed and post-processing to select the most appropriate melodies generated by morphing. For example, in applications where the harmonic structure must not change, pre-processing or post-processing is required so that the output does not change the harmonic structure.
Our morphing method has been incorporated in two smartphone applications, Shake Guitar (Hamanaka et al., 2011) and Melody Slot Machine (Hamanaka, 2019), which have been downloaded 275,708 times as of December 18, 2022. We plan to develop systems, using time-span trees and the results of the music analyser, for other musical tasks, such as searching, harmonising, voicing, and ad-libbing. Such systems will help to evaluate the effectiveness of implementing GTTM as a way to provide musical knowledge.

Disclosure statement
No potential conflict of interest was reported by the author(s).

Funding
This work was supported by JSPS KAKENHI Grants, numbers 17H01847 and 16H01744.

Appendix 1. Ternary branching representation
A new representation for a time-span tree is introduced, shown in Backus-Naur Form, as follows.
The symbol p denotes a pitch event as a terminal symbol, and ⊥ (bottom) denotes the identity element for the join operation. p contains information on pitch, maximum time span, and the corresponding note in a score. n and t each stand for a time-span tree; t can be ⊥, while n cannot. ⊥ may occur only at the second or third place of the representation, not the first. The term c(n, t, t) represents a node of a time-span tree; the first place of the term (n) represents a primary branch, the second place (first t) a secondary left branch, and the third place (second t) a secondary right branch (Figure A1). The idea here is that a node c(n, t, t) may be synthesised by joining unmatched-branching trees and by joining to a fully instantiated tree c(n, t, t). The new tree representation enables the join operation to yield a proper result for cases that have thus far not been unifiable. Here, joining unmatched-branching trees comprises cases such as join(c(n, t, ⊥), c(n, ⊥, t)) (upper part of Figure A2) and join(c(n, t, t), c(n, ⊥, ⊥)); joining to a fully instantiated tree c(n, t, t) comprises cases such as join(c(n, t, t), c(n, t, t)) and join(c(n, t, t), c(n, ⊥, t)). Simply put, the join operation recursively computes an argument-wise join. The ternary-branching representation can be regarded as a superposition of binary trees that abstracts away the distinction between left- and right-branching, not as a node having three branches.
Moreover, the lower part of Figure A2 shows the calculation of meet in one of the formerly nonviable cases. Similarly, the meet operation recursively computes the argument-wise meet. Thus, in this case, the meet operation takes into account only the primary branches, ignoring secondary branches, which is equivalent to the treatment in the previous research (Hirata et al., 2013).
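The argument-wise join and meet on the c(n, t, t) representation can be sketched in Python. This is a minimal illustration, not the authors' implementation: the grammar is assumed to be n ::= p | c(n, t, t) and t ::= ⊥ | n, as the prose description suggests; pitch events are modelled as plain strings; ⊥ is modelled as `None`; a bare pitch event p is treated as c(p, ⊥, ⊥); and the behaviour for two different pitch events (an error for join, ⊥ for meet) is an assumption.

```python
BOTTOM = None  # stands for ⊥


def c(n, left=BOTTOM, right=BOTTOM):
    """Build a node c(n, t, t): primary branch, secondary left, secondary right."""
    return ("c", n, left, right)


def is_node(tree):
    return isinstance(tree, tuple) and len(tree) == 4 and tree[0] == "c"


def as_node(tree):
    """Treat a bare pitch event p as c(p, ⊥, ⊥) (assumption)."""
    return tree if is_node(tree) else c(tree)


def join(a, b):
    """Argument-wise join; ⊥ is the identity element."""
    if a is BOTTOM:
        return b
    if b is BOTTOM:
        return a
    if not is_node(a) and not is_node(b):
        if a != b:
            raise ValueError("join of different pitch events is not modelled here")
        return a
    _, n1, l1, r1 = as_node(a)
    _, n2, l2, r2 = as_node(b)
    return c(join(n1, n2), join(l1, l2), join(r1, r2))


def meet(a, b):
    """Argument-wise meet; anything met with ⊥ is ⊥."""
    if a is BOTTOM or b is BOTTOM:
        return BOTTOM
    if not is_node(a) and not is_node(b):
        return a if a == b else BOTTOM
    _, n1, l1, r1 = as_node(a)
    _, n2, l2, r2 = as_node(b)
    primary = meet(n1, n2)
    if primary is BOTTOM:  # the first place may not be ⊥
        return BOTTOM
    return c(primary, meet(l1, l2), meet(r1, r2))


left_tree = c("n", "t_left", BOTTOM)    # c(n, t, ⊥)
right_tree = c("n", BOTTOM, "t_right")  # c(n, ⊥, t)
joined = join(left_tree, right_tree)
met = meet(left_tree, right_tree)
```

Joining the left-branching and right-branching trees fills both secondary places, giving a ternary node, while their meet retains only the primary branch.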
Note that the ternary-branching tree representation introduced here is distinct from a ternary-branching time-span tree that may occur in ternary meter. 8 The ternary branching appears only when we tentatively calculate the join operation. A necessary condition for calculating the join still applies: the joined maximum time spans must be concatenated; otherwise, the result is undefined. Let [b, e] be a time span beginning at b and ending at e; we may assume that the join of [1, 3] and [2, 4] is the connected interval [1, 4], while [1, 2] and [3, 4] remain two separate intervals. Incidentally, the meet of [1, 3] and [2, 4] is [2, 3], and that of [1, 2] and [3, 4] is undefined rather than ⊥.
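The interval examples above can be reproduced with a small Python sketch. The function names `join_span` and `meet_span` are hypothetical, time spans are modelled as (begin, end) tuples, and `None` simply marks the cases for which no single resulting interval exists. How spans that merely touch at an endpoint are handled (joinable, but with no meet) is an assumption of this sketch.

```python
from typing import Optional, Tuple

Span = Tuple[float, float]  # (begin, end)


def join_span(a: Span, b: Span) -> Optional[Span]:
    """Connected union of two time spans, or None if they are separated."""
    (b1, e1), (b2, e2) = a, b
    if max(b1, b2) > min(e1, e2):  # a gap lies between the two spans
        return None
    return (min(b1, b2), max(e1, e2))


def meet_span(a: Span, b: Span) -> Optional[Span]:
    """Common part of two time spans, or None if they share no duration."""
    (b1, e1), (b2, e2) = a, b
    lo, hi = max(b1, b2), min(e1, e2)
    if lo >= hi:  # no overlap of positive length (assumption for touching spans)
        return None
    return (lo, hi)


overlapping_join = join_span((1, 3), (2, 4))
overlapping_meet = meet_span((1, 3), (2, 4))
separated_join = join_span((1, 2), (3, 4))
separated_meet = meet_span((1, 2), (3, 4))
```

For [1, 3] and [2, 4] the sketch yields the join [1, 4] and the meet [2, 3]; for [1, 2] and [3, 4] both functions return `None`.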
To introduce a proper join, we impose the following useful rules on the time-span tree.
in the lower part of Figure A2, let t be a tree; then, c(t, ⊥, ⊥) cannot be rewritten to t if t is not atomic. Suppose p i means a pitch event; then, c(c(p 1 , ⊥, ⊥), ⊥, p 2 ) can be rewritten as c(p 1 , ⊥, p 2 ).
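The rewriting illustrated by this example can be sketched in Python. The sketch assumes that the rule in question rewrites c(p, ⊥, ⊥) to p only when p is an atomic pitch event, as the example suggests, and that it may be applied at any position in a tree; the tuple encoding and the names `c`, `is_node`, and `simplify` are hypothetical.

```python
BOTTOM = None  # stands for ⊥


def c(n, left=BOTTOM, right=BOTTOM):
    """Build a node c(n, t, t)."""
    return ("c", n, left, right)


def is_node(tree):
    return isinstance(tree, tuple) and len(tree) == 4 and tree[0] == "c"


def simplify(tree):
    """Rewrite c(p, ⊥, ⊥) to p wherever p is an atomic pitch event."""
    if not is_node(tree):
        return tree  # a pitch event or ⊥: nothing to rewrite
    _, n, left, right = tree
    n, left, right = simplify(n), simplify(left), simplify(right)
    if left is BOTTOM and right is BOTTOM and not is_node(n):
        return n  # c(p, ⊥, ⊥) becomes p
    return c(n, left, right)


rewritten = simplify(c(c("p1"), BOTTOM, "p2"))  # c(c(p1, ⊥, ⊥), ⊥, p2)
kept = simplify(c(c("p1", "p2")))               # c(t, ⊥, ⊥) with non-atomic t
```

Here `rewritten` corresponds to c(p1, ⊥, p2), whereas `kept` is left unchanged because its first place holds a non-atomic tree.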
As we have proposed the new representation of the time-span tree with a ternary-branching node c and a structural equivalence rule, we can similarly extend the definitions of the reduction path, reduction distance, and total maximum time span, as well as the lemma on the uniqueness of the reduction distance, that we developed in Sections 3.3, 3.1, and 3.4. Finally, we can prove the theorem on the triangle inequality of distance with the new representation of the time-span tree, although we omit the details of the definitions and the proofs of the lemma and the theorem.

Appendix 2. Manual morphing by musicologist
A musicologist created morphed melodies by manually performing time-span analysis and partial reduction on two original melodies (Figure A3). The two melodies were Mozart's Horn Concerto No. 1 and Ponchielli's Dance of the Hours from La Gioconda. In manual morphing, first, the common and non-common parts of the melodies were obtained from the two time-span trees, and the number of nodes in each time-span tree was also acquired. The number of nodes was obtained by counting the nodes passed on the way from a note to the root of the time-span tree.
Next, partial reduction was performed in order from the branch with the largest number of nodes in the non-common part. Although a morphed melody could be generated with this process, it became unnatural where the number of notes decreased rapidly. In those places, the musicologist adjusted the unnatural parts so that they would form a natural melody.
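The node counting and the resulting reduction order can be illustrated with a small Python sketch. It is a simplification under stated assumptions: the tree is encoded as nested tuples with uniquely named notes at the leaves (a hypothetical encoding that ignores head selection and the restriction to the non-common part), and the count for a note is taken to be the number of branching nodes between it and the root, including the root.

```python
from typing import Dict, List, Optional, Union

Tree = Union[str, tuple]  # a note name, or a tuple of subtrees (hypothetical encoding)


def node_counts(tree: Tree, depth: int = 0,
                out: Optional[Dict[str, int]] = None) -> Dict[str, int]:
    """Map each note to the number of branching nodes between it and the root."""
    if out is None:
        out = {}
    if isinstance(tree, str):
        out[tree] = depth
    else:
        for subtree in tree:
            node_counts(subtree, depth + 1, out)
    return out


def reduction_order(tree: Tree) -> List[str]:
    """Notes ordered for reduction: largest node count first."""
    counts = node_counts(tree)
    # sorted() is stable, so notes with equal counts keep their score order.
    return sorted(counts, key=lambda note: -counts[note])


toy_tree = ((("C4", "D4"), "E4"), "G4")
counts = node_counts(toy_tree)
order = reduction_order(toy_tree)
```

In the toy tree, the most deeply embedded notes are reduced first and the note attached directly to the root is reduced last.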