Automatic Myotendinous Junction Identification in Ultrasound Images Based on Junction-Based Template Measurements

Tracking the myotendinous junction (MTJ) motion in consecutive ultrasound images is essential to assess muscle and tendon interaction and understand the mechanics’ muscle-tendon unit and its pathological conditions during motion. However, the inherent speckle noises and ambiguous boundaries deter the reliable identification of MTJ, thus restricting their usage in human motion analysis. This study advances a fully automatic displacement measurement method for MTJ using prior shape knowledge on the Y-shape MTJ, precluding the influence of irregular and complicated hyperechoic structures in muscular ultrasound images. Our proposed method first adopts the junction candidate points using a combined measure of Hessian matrix and phase congruency, followed by a hierarchical clustering technique to refine the candidates approximating the position of the MTJ. Then, based on the prior knowledge of Y-shape MTJ, we finally identify the best matching junction points according to intensity distributions and directions of their branches using multiscale Gaussian templates and a Kalman filter. We evaluated our proposed method using the ultrasound scans of the gastrocnemius from 8 young, healthy volunteers. Our results present more consistent with the manual method in the MTJ tracking method than existing optical flow tracking methods, suggesting its potential in facilitating muscle and tendon function examinations with in vivo ultrasound imaging.


I. INTRODUCTION
T HE Muscle-tendon unit (MTU), composed of muscle and tendon, is crucial for the body's force production and movement [1]. Recently, sonography has been used to measure some structural parameters of MTU, such as tendon length and myotendinous junction (MTJ) displacement, to study sports mechanics and reveal the roles of muscular and tendinous tissues in human movements [2], [3]. Some studies have applied ultrasound imaging to monitor MTU for exploring kinetic mechanisms in both diagnosis and rehabilitation, including investigating the muscle and tendon properties in patients with spastic cerebral palsy [4], monitoring the prognosis of knee surgery [5], or estimating the muscle force output [6]. In this field, ultrasound imaging has become an encouraging method to identify the structural changes and interactions of muscles and tendons during voluntary muscle contraction, facilitating the evaluation of physiological functions and pathological conditions in vivo and formulating effective treatment strategies [4], [5], [6], [7], [8], [9], [10], [11], [12], [13], [14], [15].
The MTJ is a specialized region where contractile force is transmitted across the sarcolemma to the tendon [14]. As shown in Fig. 1, the MTJ commonly appears clearly as a Y-type junction where the tendon and aponeurosis intersect at the distal end of the gastrocnemius in muscular ultrasound images [10]. As a result, MTJ has become a reliable landmark for monitoring MTU and tendon motion [13], which has been widely applied to understand muscle and tendon interaction and explore various biomechanical quantities, such as strain and stiffness in the tendon [11]. For instance, previous studies quantified the modulation of the interaction between muscle and tendon by tracking the MTJ displacement, revealing that tendon tissue acts as a spring during human movement, storing and recovering the elastic potential energy generated by muscles, thus preventing musculotendon injuries [3], [9], [13], [16], [17]. However, the subjective and time-consuming nature of manual segmentation of MTJ impedes its application, especially in human motion analysis. Furthermore, the inherent speckle essence in ultrasound images also obscures the MTJ boundaries and increases the difficulty of locating or labeling the MTJ landmarks for radiologists.
Some studies have been devoted to developing automatic approaches to identify MTJ in subsequent ultrasound images. Most assume homogeneous deformation in the motion region of interest (ROI). Several reported methods explore the motion context to measure the MTJ by observing the continuous changes between sequential images [18], [19], [20], [21]. These methods mainly obtain the global affine transformation parameters using the Lucas-Kanade (LK) optical flow method [22], deriving the motions in muscular and tendinous structures. However, the MTJ is located at the proximal end of the tendon, a specialized area connecting muscles and tendon tissues, which is prone to exhibit heterogeneous affine transformation nearby the MTJ region during motion due to the antagonistic and synergistic muscle action [23]. Kharazi et al. [2] employed a normalized cross-correlation method to track MTJ displacement with a predefined template, avoiding the homogenous assumption of optical flow-based methods, despite relying on manual initialization and minor appearance variations. On the other hand, the aponeurosis and the intramuscular tendon are usually nonuniformly distributed as hyperechoic linear structures in ultrasound images. Some line enhancement techniques have been adopted to directly identify the MTJ location by exploring the prior information on structures [24]. Englmair et al. [24] attempted to use a stick filter bank to enhance the Y-shape structure and used a region-growing algorithm to segment the MTJ of the gastrocnemius medialis (GM). Nonetheless, the high muscle contraction levels tend to make the muscular and tendinous tissues slightly curved, which might cause inaccurate MTJ estimation under the hypothesis of assuming it as linear structures.
Deep neural networks (DNN) have significantly advanced various medical image analysis tasks, including anatomical landmark localization [25], [26], [27]. MTJ detection has also seen great promise in recent years with the availability of deeplearning models [28], [29], [30], [31]. Some DNN methods regress the MTJ positions by learning the map from the input appearance to the image coordinates [30], [31]. However, these methods require the network to learn a complicated mapping between a landmark's appearance and the corresponding image coordinates, making them usually challenging to train [27], [32]. Zhou et al. [29] proposed a proposal-based DNN method to integrate an MTJ landmark regression branch into the Faster Region-based Convolutional Neural Network (Faster R-CNN) [33] architecture for adaptively segmenting the MTJ region more accurately. However, these DNN methods require a large training dataset that is expensive to annotate and acquire, especially in ultrasound videos. Also, ultrasound images tend to shift image appearance due to different Time Gain Compensation (TGC) settings and tissue intensity in different depths. This appearance shift between the model training and testing phases might degrade the MTJ detection performance using DNN.
In this study, we propose a fully automatic method to detect MTJ in the ultrasound image with prior shape knowledge of the Y-shape MTJ. The algorithm can deal with irregular and complicated hyperechoic structures in muscular ultrasound images, achieving accurate and efficient MTJ detection. This new model first proposes to scrutinize the junction candidate points based on a combination of Hessian matrix and phase congruency, followed by a hierarchical clustering technique to screen a few amounts of candidates approximating the position of the MTJ. Finally, based on the prior knowledge of Y-shape MTJ, we scrutinize these candidate junction points to identify the best matching points according to intensity distributions and directions of their branches using multiscale Gaussian templates. Also, we apply the Kalman filter to explore the motion context to refine further the localization of the MTJ in the sequential images. We evaluated the proposed model using real muscular ultrasound image sequences acquired during the passive ankle joint rotation.

II. METHOD
We define the MTJ junction as where three hyperechoic branches meet and intersect as a Y shape. Fig. 2 shows the flow diagram of the proposed MTJ detection method, which consists of Hessian-phase-based initialization, hierarchical clustering, template-based MTJ localization, and Kalmanfilter-based refinement. Inspired by the junction measurements based on the Hessian matrix [34] and considering the inherent speckle essence of the ultrasound image, we first reduce the irrelevant points from images using a combination of intensitybased and phase-based measurements to identify the initial candidate points. Then, we employ a hierarchical clustering method to obtain the representative junction classes with the corresponding cluster centers and regions, followed by a Gaussian-like template matching to detect MTJ based on their prior geometric structure. Finally, we employ a Kalman filter to refine the position of junction points, contributing to the MTJ detection of junction points among consecutive frames. We implemented our proposed method as a custom-designed program using Matlab R2020a (The MathWorks, Natick, MA, USA) to measure MTJ displacements in ultrasound images.

A. Hessian-Phase-Based Candidate Initialization
In musculoskeletal ultrasound images, the aponeurosis and the intramuscular tendon forming the MTJ exhibit linear hyperechoic bands [28], suggesting the center of the Y-type hyperechoic junction as the landmark to identify the MTJ. Nonetheless, it is challenging to accurately localize the junction center from these hyperechoic bands because of the inherent speckle essence and the shape deformation during motion. We crop about half of the input image to avoid the interference of hyperechoic bands in the lower part of the image, followed by Gaussian smoothing to reduce noises. Then, we distinguish the hyperechoic structures from the background by an adaptive threshold, suppressing the influence of speckle noises. This pixel-wise threshold is determined based on the average intensity in their neighborhood. Moreover, as the watershed line at ridge-like structures can be differentiated using the zero-crossings of at least one-direction first-order derivatives [35], we combine the first-order partial derivatives in the neighborhood to further preprocess the image to reduce the redundant weak foreground pixels. Hence, we formulate the identified hyperechoic structures, dubbed H (x, y) H (x, y), with the following equation: where I x , I x are the first-order derivatives vectors along the x and y directions of the image I (x, y), and u x , u y are the corresponding unit directional vectors.Ī (x, y) is the average intensity of pixel blocks of size w × w (3 × 3, in this study), determining the threshold to separate the hyperechoic structures from the image. An empirically predefined threshold, α = 1, was used to remove the most non-junction pixels but to retain the possible MTJ junction point set {M 0 }. The second-order information of the image is another intuitive indicator to screen the geometric structures with the multiple local principal directions. Therefore, we employed the eigenvalue analysis of the Hessian matrix, a widely used tool to measure the local geometrical properties [36], for further scrutinizing the candidate MTJ points. Assuming that λ 1 and λ 2 (|λ 1 | > |λ 2 |) are the two eigenvalues of the Hessian matrix computed at each pixel, a trivial second-order derivative suggests a much larger absolute value of λ 1 than λ 2 , which can characterize the direction along the linear hyperechoic structure with λ 1 < 0. In contrast, the junctions or corners should have two close eigenvalues due to multiple principal directions, resulting in a smaller absolute ratio of λ 1 to λ 2 than linear structures. So this study adopts a measurement metric, Y = |λ 2 | |λ 1 |, to measure the junction points.
However, the gradient vector is unavoidably sensitive to the noise in most images, especially speckle-noise ultrasound images, leading to many false-positive points when only using the Hessian matrix to measure local structures. Phase congruency is a contrast invariant measure of feature significance, which postulates that the Fourier series is either minima or maxima in phase at the significant features based on the local-energy model [37]. The psychophysical study revealed that the human visual system much more easily perceives visual features with high phase congruency [38]. Given their significant advantages, we can utilize the phase congruencybased measure to determine the junction and edge strength, restricting the false-positive points. The logarithmic Gabor filters are the commonly used bandpass filters to obtain local frequency information at each point because of their excellent noise compensation and feature localization [38]. The phase congruency at each orientation is then defined as below: where A s (x, y) is the wavelet amplitude response with a given scale at point (x, y), and e s (x, y) and o s (x, y) are the even and odd symmetric metrics from the filter response, respectively. The absolute value of o s and e s will approach 0 and 1 at a point of symmetry and vice versa. W (x, y) is the frequency spread weighting factor to stress the significance of the congruency over the more frequencies component. The T r is the noise threshold, and the small number ε is to avoid division by zero. The operator ⌊·⌋ denotes: ⌊z⌋ = z if z > 0; otherwise, ⌊z⌋ = 0. On the other hand, the maximum moment of the phase congruency, the second-order phase information, corresponds to the axis perpendicular to the principal axis, which can indicate the feature's orientation for screening the local geometric structures [39]. Therefore, instead of directly using phase congruency, we use their maximum moment magnitude as an auxiliary second-order metric to measure the significance of the local geometric structures, refining the possible candidate points from the point set {M 0 }. The maximum moments can be calculated as the expression in (3), shown as below: where PC (θ) is the phase congruency value at orientation θ, and the sum is performed over all orientations. All parameter values used in deriving local phase are the same as those found in [19]. Finally, this study adopts a combination of Hessian and local phase measurements to determine junction points using the following equations: where a is the predefined constant. The expected MTJ junction point should have a large M and Y , leading to a more significant R-value for a junction point. This study empirically set a to be 5. Fig. 3 shows an example of the effect of Hessian and phase and candidate MTJ junction points on the GM ultrasound image, suggesting a large amount of junctions output with this measurement. As a result, we sort all image points in descending R-value order, followed by taking the first N (100 in this study) points as the initial candidate MTJ junction points {M I }.

B. Localization Refinement and MTJ Detection
The speckle noises often disguise linear hyperechoic bands composing the MTJ in musculoskeletal ultrasound images, making the candidate points not exactly at the center of the junctions. Instead, as shown in Fig. 3, these candidate points lie nearby the aponeuroses where the weak and the strong Hessian-based signal intersect. Also, these noises result in superfluous candidate points with Hessian-based detection. Therefore, a refinement process is required to reduce the false-positive points and generate a self-organizing MTJ candidate using its Y-type shape prior.
1) Hierarchical Clustering-Based Refinement: Their positional relations are the first decisive information to identify candidates from those initial Hessian-based points {M I }.
An unsupervised cluster mechanism is suitable to cluster the points with close positions, thus reducing the computation burden of scrutinizing redundant candidate points. However, the state-of-art unsupervised partitional clustering methods, such as K-means, require predefining the number of categories for clustering. By contrast, hierarchical clustering methods can progressively merge nodes in the graph in a hierarchical structure [40], [41]. Thereupon, hierarchical clustering methods can identify subgroups from the medical image without the expected number of categories [42], which is essential to keep anatomical information in this study. In this study, we employ agglomerative hierarchical clustering to form the dendrogram of those initial Hessian-based points {M I }, which can naturally unveil anatomical similarities within the points. The commonly used Euclidean distance is first calculated to define the similarity metric between the pair subjects. Moreover, it is necessary to use the distance metric to group the binary clusters and iteratively associate those into higher-level clusters until all points are linked. We utilize the cluster average as the centroid to calculate the distance between two groups or the group to point for the dendrogram construction. The agglomerative hierarchical clustering algorithm is summarized as follows [43]: 1) Calculate Euclidean distances of every pair within the input point set {M I } to obtain a metric of pairwise distance similarity (treating each point as the lowestlevel cluster). 2) Assemble the closest pair points or clusters into higher-level clusters using the Euclidean distance metric. 3) Recalculate paired Euclidean distances from newly assembled clusters and remaining points or clusters. 4) Return to Step 2 until all points in {M I } are merged into one large cluster, a dendrogram with multi-level subclusters. 5) Cut off dendrogram branches at a specified level of the hierarchy to assign points below the dividing line to the specific clusters, thus generating group partitions of the point data. Fig. 4a shows an example of a dendrogram obtained on the ultrasound image. As the mean thickness of the gastrocnemius tendon of young adults was around 4.2 mm [44], we cut the dendrogram branches horizontally at a distance of 39 pixels in images with the 0.11 mm/pixel resolution, thus partitioning the data into subgroups {G H }. We assume that two clusters larger than this distance threshold belong to different hyperechoic bands and should not be merged into the same subgroup. Furthermore, the observation of the Y-type MTJ in ultrasound images ( Fig. 1) suggests that this structure can be used as a guide to accentuate the cluster points much closer to the Y-shape center. As a result, we adopt their centroid (the weighted average of the group, the weight is the score of each point which will be elaborated on in the next section) c G i to represent the rough candidate for the subgroup G i . Fig. 4b and Fig. 4c illustrate the subgroup and corresponding centroid derived from the ultrasound image.
2) MTJ Localization and Detection: The Y-type shapes are composed of three linear hyperechoic branches, suggesting the average intensity over the stick region should be the largest at aligning with a junction branch when rotatably sweeping with a stick-shaped area around the candidate point [35]. Moreover, a Gaussian curve can approximate the corresponding cross-section intensity profile of the linear hyperechoic structure [45]. Therefore, we can obtain the intensity distribution consisting of three Gaussian-like curves whose peaks correspond to the localization of each junction branch when rotatably sweeping the stick. As shown in Fig. 5a, we use a stick to rotate one turn from the horizon around the centroid point c G i of the subgroup G i and calculate the average intensity at every angle o = 0, ρ, . . . 360/ρ − 1 as below: where I (x i , y i ) is the image pixels covered by the stick at the orientation o, and k is the corresponding number of pixels.
In this study, we employ a straight stick with a length of 120 pixels and a width of 1 pixel to rotate with an interval of 1 • . In addition, we use a moving average filter to smooth the intensity profile, reducing false peak values in all rotated directions.
On the other hand, the intensity profile around each peak in f c G i (o) should best match a multiscale Gaussian-shaped template corresponding to the cross-section of aponeuroses and tendon structures in ultrasound images. Therefore, we can locate branch orientation from the intensity profile peak of f p i (o) using a similarity measurement between the Gaussianlike curves and a multiscale Gaussian-shaped template. This study adopts the ten degrees from both sides of each peak to fit the Gaussian template and discards those with a correlation coefficient of less than 0.6. In detail, the template is defined from the first frame image using the Gaussian function g (x) = ae x 2 2σ 2 , where a is the peak value and σ is calculated from the estimated value of the Gaussian function. In this study, the average value of the leftmost and rightmost extent of the extracted intensity distribution is used as the estimated value g (10). Moreover, the angles among the three branches composing MTJ are roughly consistent during the motion for each subject. We can then specify the branch orientation in three specified intervals determined in the first frame of the same subject. For example, Fig. 5b illustrates that the three branches correspond to the three intervals of 0-50 • , 135 • ∼225 • , and 310 • ∼360 • , respectively. Finally, we select those points with three Gaussian-like peaks from c G i for further MTJ detection. The result after this operation is denoted as {M c }.
For each candidate point p i in {M c }, we search points p it in the local neighborhood (12 × 5) around p i and select the best matching one among these points with the consideration of slight localization shifts due to the tendon width. Moreover, we apply phase congruency with the illumination invariance to extract the ridge-like tendon and aponeuroses in ultrasound images, thus reducing the searching points in the local neighborhood. Also, it has been reported that the orientation of tendinous tissue ranges from about −10 • to 10 • [46], motivating us to enhance these hyperechoic structures using phase congruency with specified central orientation [47], [48]. We then binarize the symmetrically oriented phase congruency measure as I M T J (x, y) using the OTSU [49] to identify the effective searching region M T J , which is composed of the points p it (x, y) with I M T J (x, y) = 0 and I M T J (x − 1, y) = 1.
We finally identify the MTJ points with the largest average peak intensity from the effective search region M T J . Also, as the angles among the three branches composing MTJ almost remain unchanged during the motion, we impose a constraint to make the intersection angle between branch 1 and 3 approximate the one in the first frame. Moreover, the MTJ seems to be a roughly symmetric structure consisting of two branches with an acute angle (Fig. 5), which can also be used as a constraint to identify the MTJ. Therefore, we choose the MTJ with the largest constrained average peak intensity calculated as below: where f 1 , f 2 and f 3 are the average intensity of three peaks located at θ 1 , θ 2 and θ 3 in the intensity curve f p i (o). θ 0 is the angle of the candidate point detected in the first frame, making α approaching 1 with the consistent angle between frames. Also, β is the calibration factor to prefer the symmetric structure with an acute angle. In this study, both T 1 and T 2 were empirically determined to be 100 • . Meanwhile, the Gaussian-shaped template is also applied for p i . Fig. 5 show an example of the MTJ localization on the GM ultrasound images, as shown in Fig. 3a.

C. Consecutive Images Processing
Given a sequence of ultrasound images, we could further utilize the context between sequential frames to regulate the MTJ measurements, better tackling too noisy ultrasound images. Kalman filter is a well-established algorithm for navigation and guidance problems by estimating a joint probability distribution over observed measurements with possible statistical noise and inaccuracies. Kalman-filtered measurements can then incorporate the dynamic model with the variables for each timeframe as additional prior information, achieving more accurate object tracking than those based on single measurements. Therefore, we employ the Kalman filter to refine the MTJ localization from consecutive ultrasound images in this study.
The system model to track MTJ motion consists of the linear equations evolving state and measurement with minimum mean square errors: where x k = x 1,k , x 2,k T is the state vector where x 1,k and x 2,k are the MTJ locations (e.g., x-axis and y-axis, respectively) at time k. The state transition matrix F k = 1 0 0 1 conducts the prediction of MTJ state with its previous state at time k − 1.
By contrast, H k = 1 0 0 1 is the observation matrix, mapping the state vector into the measurement vector z k = z 1,k , z 2,k T whose value is the MTJ position detected using the approach mentioned in the above section. And w k = w 1,k , w 2,k T and v k = v 1,k , v 2,k T respectively denote the random system and observation perturbations influencing MTJ locations. We follow the assumption in the conventional Kalman-filtered methods that w k and v k are Gaussian distributed with zero mean and known covariance, resulting in Gaussian distributed state vector that can be recursively updated as follows: Prediction where u k is the estimation of the state vector x k , K k is the Kalman Gain, P k|k−1 is the prior error covariance, Q k and R k are the nominal state and measurement noise covariance, assumed to be known. After Kalman filtering, false locations may still exist. Therefore, we impose a constraint to limit the range of movements, which excludes the cluster centers with a distance of more than 80 pixels from the MTJ coordinates of the previous frame in the update process. Moreover, we take N = 400 points as the initial candidate MTJ junction points {M I } in the first frame, ensuring at least one cluster group near the target point. An example of MTJ localization using different methods: (a) the measurement of MTJ at the 23 frame; (b) the measurement of MTJ at frame 135; the red, green, yellow, blue, cyan marker respectively represents the prediction from the manual method, the LK method [22], the multi-template LK method [20], the phase-based LK method [19] and our proposed method. Fig. 7. The MTJ displacement acquired by the proposed method, Lucas-Kanade method [22], multi-template LK method [20], phasebased LK method [19], and the manual method when tested the passive rotation of the ankle joint for a typical subject.
Then we use the MTJ position detected in the first frame as the reference to identify the initial candidate MTJ junction points {M I } with N = 100 from half of the image in the following frames to save the computation cost and improve the performance.

D. Experiments
Eight healthy adults (Age: 25.50 ± 2.56 years; Four male and four female; Body Mass Index (BMI): 22.23 ± 2.03 kg/m 2 ) with no musculoskeletal injury history were involved in the study to evaluate the proposed method. We asked participants to lie prone with the dominant knee The student paired t-test results between the proposed method, Lucas-Kanade method [22], multi-template LK method [20] and phase-based LK method [19]: a) the RMSE results; b) the CMC results. Bland-Altman plot of the MTJ displacement obtained with the manual and proposed methods. The vertical coordinate denotes the Euclidean distance between the manual measurement and the automatic measurement in measuring the same MTJ. bent at 20 • and fixed their feet on the pedals attached to a dynamometer (IsoMed 2000, D. & R. Ferstl GmbH, Hemau, Germany). Tests were only performed on the dominant foot at the convenience of the experimental setup. The ankle joint was carefully aligned with the dynamometer axis of rotation using a laser device. During the experiment, the ankle joints of all subjects were passively rotated several times within the range of motion (ROM) of −19.26 ± 0.08 plantarflexion to 11.67 ± 0.05 dorsiflexion while all subjects were instructed to remain relaxed. The regional ethics commission, Stockholm, Sweden, approved this study, and each subject was required to sign informed consent before experiments. The MTJ excursion was recorded using an ultrasonic scanner (Vivid Q, GE Healthcare, Milwaukee, WI, USA) at a sample rate of 40 frames per second with a linear probe of a frequency of 3.5-10 MHz and a field of view of 53mm. We optimally positioned the ultrasound transducer to parallel the tendon in the sagittal plane, making the ultrasound image with a resolution of 0.11 mm/pixel line up with the longitudinal axis of the tendon.

E. Data Analysis
The proposed approach, as well as the LK-based method [22], the multi-template LK method proposed by Lee [20], and the phase-based LK method proposed by Zhou [19], were applied to estimate MTJ displacement, respectively. We re-implemented these LK-based methods using Matlab R2020a (The MathWorks, Natick, MA, USA) based on the literature. Moreover, according to the definition of MTJ, manual labeling of MTJ was conducted three times in each image by a single expert experienced in muscular ultrasound images for more than ten years and blinded to other measurement results.
For comparison with the automatic measurements, the mean value of the manual labeling was used as a reference. We used the intra-class correlation coefficient (ICC) to assess the repeatability of intra-observer manual measurements [50]. Unless otherwise stated, values were reported using mean and standard deviation (Mean (± S.D.)). The root-meansquare error (RMSE) between the automatic and the manual measurements was first calculated to evaluate the performance of proposed method. The overall similarity with the coefficient of multiple correlations (CMC) [51] was also calculated to assess the waveform similarity between the automatic and manual measurements. Higher dissimilarity in waveforms has lower CMC values, whereas CMC of highly similar waveforms approaches 1. Moreover, the difference between the CMC and RMSE values among different methods was examined using a Student's paired t-test. In addition, the Bland-Altman plot of differences [52], usually used to identify any systematic difference between the fixed bias or potential outliers, was applied to assess the agreement between the manual and automatic measurements. The mean measurement difference between the two methods represents the estimated bias, while the corresponding standard deviation indicates the random variations around the mean value. Furthermore, the relationship between the rotation of the ankle angle and MTJ displacement was described by the Pearson product-moment correlation (R 2 ) coefficient. The level of significance was accepted at p < 0.05.

III. RESULTS
The intra-observer MTJ manual measurement shows excellent repeatability, with an average ICC of 0.95 ± 0.03. Table I shows the RMSE between the manual measurements and results obtained from automatic methods. Compared to the conventional LK (3.58 ± 0.72 mm), multi-template LK (3.03±0.92mm), and phase-based LK methods (3.17 ± 0.96 mm), our proposed method has the best performance in the MTJ localization with the smallest RMSE of 1.94 ± 0.63 mm. In addition, the proposed method had CMC values ranging from 0.90 to 0.98, which is also higher than other automatic methods (Table II). For instance, the CMC value (0.79 ± 0.11) of the Lucas-Kanade method only ranged from 0.63 and 0.86. As illustrated in Fig. 6, the MTJ determined by the LK-based method in comparison with the proposed approach exhibited slight deviation from the actual positions in some frames, which can be ascribed to heterogeneous motion patterns nearby the MTJ region. A typical example of measuring MTJ displacement acquired by the proposed approach in consecutive images is shown in Fig. 7, for which the LK-based methods were unable to keep up with the movement of MTJ, resulting in a larger RMSE and smaller CMC. These findings implied that the blurred images with high speckle noise and inhomogeneous deformation across the ROI violated the assumption of a uniform affine transformation. Thereupon, the performance of LK-based methods is susceptible to manual ROI initialization. On the other hand, our proposed method directly detects the MTJ junction from the whole image using Hessian-based metrics and Gaussian-like templates, independently from the ROI choice in consecutive images. As shown in Fig. 8, our proposed method performs significantly better in terms of CMC and RMSE than the other approaches (p < 0.05). Fig. 9 presents the Bland-Altman plot between the manual and automatic measurements in one typical subject, which revealed a low mean difference (0.62mm), and the 95% consistency limit was also within the acceptable range (±1.96SD = 1.44mm). According to these results, the automatic measurement with our proposed method exhibited more consistency with the manual measurement than the LK-based method, which was reliable for estimating MTJ displacement without manual initialization. Moreover, when the ankle angle was rotated from −19.26 ± 0.08 • to 11.67 ± 0.05 • , the MTJ displacement measured by the proposed method ranged from −9.90 mm (proximally) to +5.94 mm (distally) and was correlated with the ankle joint angle (r = 0.98 ± 0.01, all p < 0.001). Fig. 10 exhibits a typical example of the MTJ displacement with the ankle joint angle. On the other hand, Fig. 11 demonstrates the effect of using the Kalman filter in which the influence of anomalous detection points was restricted in the MTJ identification of consecutive images. As a result, the estimated MTJ displacement becomes smoother with fewer outliers. Also, the average computation time for detecting MTJ using our proposed method in each frame with 482 * 390 pixels was about 0.85 sec, using a computer with an Intel Core i5-8500 3.00GHz processor and 16GB of memory.

IV. DISCUSSION AND CONCLUSION
We have developed a fully automatic method to identify MTJ in ultrasound images by exploring the prior knowledge of Y-shape junction structures. Combining Hessian-based metrics with the maximum moment of phase congruency obviates the influence of inherent speckle essence and non-junction features from automatically selecting candidate junction points. To refine the MTJ location, we subsequently employ a hierarchical clustering method to determine the possible junction center, followed by a Gaussian template fitting with the prior Y-shape to determine the optimized MTJ junction point. Moreover, a Kalman filter is employed to regulate the MTJ measurements, better tackling noisy ultrasound images. The validation with in vivo experiment shows that MTJ could be reliably identified from continuous ultrasound images, consistent with the manual measurements, and highly correlated well with ankle angle during plantar flexion motion.
Our novel automatic method takes advantage of the prior knowledge of the structures of MTJ by examining the intensity-based and phased-based information simultaneously. The image intensity is the intuitive indicator to identify local structures, such as junctions where the local autocorrelation function has a distinct peak. However, this information is prone to be disfigured by the speckle noises in images, which increases the difficulty in accurately locating the corners using the local autocorrelation function of intensity-based metrics. On the other hand, phase congruency is an illumination and contrast invariant measure of feature significance, which is beneficial for detecting ridge-like structures in ultrasound images. Therefore, we exploit the second-order information of both image intensity and contrast invariant phase congruency to scrutinize the feature's importance of local geometric structures, restricting the false-positive candidates from noisy ultrasound images. Moreover, a morpho-Gaussian template with the prior branch number is advanced to determine the optimized junction point for the Y-shape MTJ structures. Fig. 6 and Table I illustrate that our method can detect MTJ with good localization accuracy, even with very short branches in the images due to motion. It indicates that our approach can achieve easier MTJ detection with prior knowledge of MTJ structures from the musculoskeletal ultrasound images. Also, the intersection between slightly curved muscular and tendinous tissues can still be characterized by this secondorder information, suggesting the robustness of our proposed method even during the relatively high muscle contraction levels.
By contrast, the semi-automatic LK-based methods highly depend on the manual ROI initialization and the assumption of homogeneous affine transformations. It is, therefore, necessary to tackle the accumulated errors due to inhomogeneous deformation across the region of interest between consecutive images. Lee et al. [20] proposed to employ the LK method to the multiple regions nearby MTJ and adopt their average as MTJ displacement, which was considered more reliable than using a single region to track MTJ movement. Zhou et al. [20] extracted the skeleton of the tendon based on phase congruency and Radon transform and then calculated the optical flow of the points on the skeleton instead of all points in the ROI. Table I shows that the multiple regions tracking improve the tracking accuracy from 3.58 ± 0.72 mm to 3.03±0.92 mm, and the phase-based tracking raises to 3.17±0.96 mm. Nonetheless, the inhomogeneous affine transformations and inappropriate ROI choices still result in underestimated MTJ displacement. As a result, the CMC and RMSE metrics of both methods are worse than those of our proposed method. In addition, we also found that the performance of LK-based methods tends to degrade significantly when non-linear displacement happens. On the other hand, our method explores the local structure directly from images, avoiding the accumulation of tracking errors in continuous images. Similarly, to prevent the homogenous assumption of optical flow-based methods, Kharazi et al. [2] scrutinized the local structure for tracking MTJ using dozens of predefined templates and manual outliers identification. In [24], a sticks filter bank was used to enhance the Y-shape structure, followed by the region-growing algorithm with a manual seed to segment MTJ. Nevertheless, unlike our proposed method, these methods still highly rely on manual initializations, such as predefined templates and seeds for region growth. Also, the normalized cross-correlation method and a stick filter bank used in these methods are computationally expensive. In contrast, the processing time of our proposed method is currently below 1 sec per frame, making real-time application possible.
Hierarchical clustering is another critical element to the success of our proposed method of handling noisy ultrasound images. Although we can effectively reduce the false-positive candidates using the maximum moment of phase congruency, there are still many candidate points due to the appearance of speckle patterns. As MTJ is the intersection of two aponeuroses nearby the distal end of the muscle, most candidate points are clustered around the tendinous tissues. Also, the extent of the cluster should be ruled by the tendon dimensionality. However, the state-of-art unsupervised K-means clustering methods require predefining the number of categories, which also lacks anatomical similarities exploration among the points within the group. Therefore, this study employs hierarchical clustering with a distance threshold less than the mean thickness of the gastrocnemius tendon of young adults [18] to refine the candidate points, which can progressively merge nodes in the graph in a hierarchical structure without the expected number of categories. Moreover, we determine the cluster centroid using the weighted average of all cluster points instead of the mean value, among which the weight of each point reflects the similarity coefficient with the Y-shape template. This weighted average operation can make the cluster centroid as possible as close to an MTJ-like structure with the assumptions of the prior anatomical information, thus enhancing the accurate localization through the morpho-Gaussian template matching and reducing the computational burden. Furthermore, as shown in Fig. 11, the Kalman filter is beneficial for MTJ measurement regulation in the context between sequential frames, which can smooth the prediction and reduce the computational burden.
In summary, this study advances a fully automatic method to identify MTJ in ultrasound images with prior knowledge of Y-shape junction structures. The proposed method obviated the influence of non-junction features using the combination of Hessian matrix and phase congruency, and hierarchical clustering. We finally identify the MTJ using a Gaussian template fitting with a prior Y-shape junction. The results showed a good agreement between the automatic and manual measurements. Also, there was a strong correlation between MTJ displacement and the ankle angle during plantar flexion. This approach avoids subjective measurements, such as initialization, thus reducing the changes in MTJ tracking. We anticipated that our proposed method would offer an efficient means of using ultrasound imaging to assess the functionality of the muscle-tendon units. However, the imaging quality of the MTJ structures is still a constraint of the method suggested in this study. Future studies with more subjects and patients are expected to improve the generalization ability of the proposed method and demonstrate the novel method's potential for a thorough understanding of muscle and tendon mechanisms. Moreover, the frame rate of ultrasound imaging might influence the performance of the proposed method. In future studies, we can incorporate the Bayesian Kalman Filter to enhance the tracking performance.