Deep fiber clustering: Anatomically informed fiber clustering with self-supervised deep learning for fast and effective tractography parcellation

White matter fiber clustering is an important strategy for white matter parcellation, which enables quantitative analysis of brain connections in health and disease. In combination with expert neuroanatomical labeling, data-driven white matter fiber clustering is a powerful tool for creating atlases that can model white matter anatomy across individuals. While widely used fiber clustering approaches have shown good performance using classical unsupervised machine learning techniques, recent advances in deep learning reveal a promising direction toward fast and effective fiber clustering. In this work, we propose a novel deep learning framework for white matter fiber clustering, Deep Fiber Clustering (DFC), which solves the unsupervised clustering problem as a self-supervised learning task with a domain-specific pretext task to predict pairwise fiber distances. This process learns a high-dimensional embedding feature representation for each fiber, regardless of the order of fiber points reconstructed during tractography. We design a novel network architecture that represents input fibers as point clouds and allows the incorporation of additional sources of input information from gray matter parcellation. Thus, DFC makes use of combined information about white matter fiber geometry and gray matter anatomy to improve the anatomical coherence of fiber clusters. In addition, DFC conducts outlier removal naturally by rejecting fibers with low cluster assignment probability. We evaluate DFC on three independently acquired cohorts, including data from 220 individuals across genders, ages (young and elderly adults), and different health conditions (healthy control and multiple neuropsychiatric disorders). We compare DFC to several state-of-the-art white matter fiber clustering algorithms. Experimental results demonstrate superior performance of DFC in terms of cluster compactness, generalization ability, anatomical coherence, and computational efficiency.


INTRODUCTION
Diffusion magnetic resonance imaging (dMRI) tractography is an advanced imaging technique that uniquely enables in vivo mapping of the brain's white matter connections at macro scale (Basser et al., 2000; Mori et al., 1999). Tractography enables quantitative analysis of the brain's structural connectivity in many applications, such as studies of neurological development, aging, and brain disease (Ciccarelli et al., 2008; Essayed et al., 2017; Piper et al., 2014; Yamada et al., 2009; Zhang et al., 2022b). However, whole-brain tractography generates hundreds of thousands to millions of fibers (or streamlines)¹, which are too numerous to be directly useful to clinicians or researchers. Therefore, to enable fiber tract quantification and visualization, it is essential to perform tractography parcellation, in which the massive number of tractography fibers is divided into multiple subdivisions (Zhang et al., 2022b).

Tractography parcellation methods
Two popular categories of tractography parcellation methods (O'Donnell et al., 2013; Zhang et al., 2022b) are cortical-parcellation-based methods, which group fibers according to their endpoints in gray matter regions (Gong et al., 2009), and white matter fiber clustering methods, which group fibers with similar geometric trajectories (Brun et al., 2004; Chekir et al., 2014; Garyfallidis et al., 2018; Guevara et al., 2012; Li et al., 2010; O'Donnell et al., 2013; Román et al., 2017; Siless et al., 2018; St-Onge et al., 2021; Tunç et al., 2014; Vázquez et al., 2020a; Wu et al., 2020; Yoo et al., 2015). Compared to cortical-parcellation-based methods, white matter fiber clustering methods can obtain more consistent parcellations across subjects (Sydnor et al., 2018; Zhang et al., 2018c) and demonstrate higher test-retest reproducibility (Zhang et al., 2019b). White matter fiber clustering enables studies of the brain's white matter across the lifespan in health and disease (Cousineau et al., 2017; Ji et al., 2019; Maier-Hein et al., 2017; O'Donnell et al., 2017; Prasad et al., 2014; Tunç et al., 2016; Zekelman et al., 2022; Zhang et al., 2018a). White matter fiber clustering also enables the creation of tractography atlases and the study of white matter anatomy (Battocchio et al., 2022; Guevara et al., 2022, 2020; Levitt et al., 2021; O'Donnell et al., 2017; O'Donnell and Westin, 2007; Román et al., 2022, 2021, 2017; Tunç et al., 2016, 2014; Vázquez et al., 2020a; Yeh et al., 2018; Zhang et al., 2018c).

¹ We note that the term "streamlines" is more technically correct to describe the digital reconstruction of biological white matter fibers obtained in tractography data, while the term "fibers" is also commonly used, in particular in the white matter fiber clustering literature; therefore, to be consistent with the literature, we use "fibers" to refer to reconstructed fiber trajectories in the tractography data in this paper.
A popular strategy for creating white matter tractography atlases uses machine learning fiber clustering methods to automatically group fibers, followed by expert neuroanatomical annotation of the fiber clusters to define anatomical structures (Yeh et al., 2018) and to identify false positive connections (Zhang et al., 2018c). Improved fiber clustering algorithms can enhance the depiction of understudied regions, such as the superficial white matter (Román et al., 2022; Xue et al., 2023) or the cerebellum (Zhang et al., 2018c), and can enable the automated study of very large datasets (Zhang et al., 2022a).
Many methods have been proposed for white matter fiber clustering (see Zhang et al., 2022b, for a review). Generally, fiber clustering methods compute distances between fibers and then group fibers into clusters using clustering algorithms. Several methods have been designed for rapid clustering of tractography from an individual subject, e.g., to create a compact representation of whole-brain tractography for further processing (Garyfallidis et al., 2016, 2012; Guevara et al., 2011; Vázquez et al., 2020b). For example, QuickBundles employs the minimum average direct-flip fiber distance with a fast linear-time clustering algorithm (Garyfallidis et al., 2012), while FFClust first clusters fiber points and then groups fibers into compact clusters with high efficiency (Vázquez et al., 2020b). Other fiber clustering methods cluster tractography from multiple subjects in a groupwise fashion (O'Donnell and Westin, 2005) to create population-based tractography atlases (O'Donnell and Westin, 2007; Tunç et al., 2016, 2014; Zhang et al., 2018c). For example, WhiteMatterAnalysis uses the mean distance between pairs of closest fiber points to enable groupwise spectral clustering (Zhang et al., 2018c). Finally, other white matter fiber clustering methods use information from an anatomical parcellation of the brain. In an early approach, anatomical information from a white matter parcellation was used to guide the clustering of fiber tracts (Maddah et al., 2008). More recently, "connectivity-driven" fiber clustering has been based on the connectivity of the voxels through which fibers pass (Tunç et al., 2016, 2014, 2013), and AnatomiCuts clusters fibers based on their position relative to anatomical regions (Siless et al., 2020, 2018). Though existing white matter fiber clustering methods have shown good performance, several key challenges remain.
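To make the idea of a closest-point fiber distance concrete, the sketch below computes a mean closest-point distance of the kind attributed to WhiteMatterAnalysis above. It is an illustration only: averaging the two directed means to symmetrize the distance, and the name `mean_closest_point_distance`, are our assumptions and not the WhiteMatterAnalysis API.

```python
import math

def _directed_mean_closest(a, b):
    """Average, over the points of fiber a, of the distance to the closest point of fiber b."""
    return sum(min(math.dist(p, q) for q in b) for p in a) / len(a)

def mean_closest_point_distance(a, b):
    """Symmetric mean closest-point distance between two fibers (lists of 3D points).

    Assumption: the two directed means are averaged; other symmetrizations
    (e.g. taking their maximum) also appear in the literature.
    """
    return 0.5 * (_directed_mean_closest(a, b) + _directed_mean_closest(b, a))

fiber_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
fiber_b = [(0.0, 1.0, 0.0), (1.0, 1.0, 0.0), (2.0, 1.0, 0.0)]
print(mean_closest_point_distance(fiber_a, fiber_b))  # 1.0
```

Because every point is matched to its nearest neighbor on the other fiber, the result does not depend on point order, at the cost of comparing all point pairs for every fiber pair.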
First, it is computationally expensive to calculate all pairwise fiber similarities, given the large number of fibers in whole-brain tractography. Second, the computed fiber similarities can be sensitive to the order of points along the fibers. This is a problem because a fiber can equivalently start from either end (Garyfallidis et al., 2012; Zhang et al., 2020). Third, false positive fibers are prevalent in tractography, and outliers may exist in the obtained fiber clusters (Legarreta et al., 2021; Maier-Hein et al., 2017). Therefore, outlier removal methods are needed to remove undesired fibers from clustering results. Fourth, current methods mostly use descriptions of either white matter fiber geometry (i.e., spatial coordinates of fiber points; Brun et al., 2004; Chen et al., 2021; Corouge et al., 2004a; Garyfallidis et al., 2012; Ngattai Lam et al., 2018; Vázquez et al., 2020b; Zhang et al., 2018c) or gray matter anatomical parcellation (i.e., cortical and subcortical segmentations; Siless et al., 2018) for fiber clustering.
It is challenging to combine white matter fiber geometry and gray matter anatomical parcellation information to improve the clustering results. Finally, it is important to identify cluster correspondences across subjects for group-wise analysis. To achieve this goal, some studies perform fiber clustering across subjects to form an atlas and predict clusters of new subjects with correspondence to the atlas (O'Donnell and Westin, 2007; Tunç et al., 2014; Zhang et al., 2018c), while other approaches first perform within-subject fiber clustering and then match the fiber clusters across subjects (Garyfallidis et al., 2012; Guevara et al., 2012; Huerta et al., 2020; Siless et al., 2020, 2018).

Unsupervised feature learning and clustering
In recent years, deep learning has demonstrated superior performance in computer vision tasks such as object classification, detection, and segmentation (Ronneberger et al., 2015; Simonyan and Zisserman, 2014). Deep-learning-based clustering has also been extensively studied as an unsupervised learning task (Károly et al., 2018). An intuitive way to perform unsupervised deep clustering is to extract feature embeddings with neural networks and then cluster these embeddings. The learned embeddings are high-level representations of the input data and have been shown to be informative for downstream tasks (Song et al., 2016) such as clustering (Tian et al., 2014; Xie et al., 2016). Auto-encoder networks are widely used to learn unsupervised feature embeddings because they do not require ground truth labels (Xie et al., 2016). A representative work is the Deep Embedded Clustering framework, which performs simultaneous embedding of input data and cluster assignment in an end-to-end way (Xie et al., 2016). Deep Convolutional Embedded Clustering (DCEC) extends Deep Embedded Clustering from 1D feature vector clustering to 2D image clustering.
Another promising approach for learning feature embeddings is self-supervised learning, a subclass of unsupervised learning that has shown strong performance in many applications (Kolesnikov et al., 2019; van den Oord et al., 2018). Deep embeddings are obtained by designing a pretext task, such as predicting context (Doersch et al., 2015) or image rotation (Komodakis and Gidaris, 2018), and generating pseudo labels from the input data to guide network training without any manual annotation. The learned feature embeddings (usually referred to as high-level feature representations) can then be transferred to downstream tasks such as clustering.
Recently, attempts have been made to apply supervised deep learning approaches to tractography segmentation (T. V. Gupta et al., 2017; Liu et al., 2019, 2022; Ngattai Lam et al., 2018; Wasserthal et al., 2018; Xue et al., 2022; Xu et al., 2019; Zhang et al., 2020). In these studies, fibers from the whole brain are classified into anatomically meaningful fiber tracts based on labeled training datasets. To alleviate the requirement for ground truth labels, one recently proposed method (Xu et al., 2021) has shown the potential of unsupervised deep learning for fiber clustering; however, it requires complex feature extraction procedures to generate the inputs of the neural network. In our MICCAI work (Chen et al., 2021), we proposed a novel unsupervised deep learning framework that adopted self-supervised learning to achieve fast and effective white matter fiber clustering. However, it also requires an extra step to generate the network inputs (FiberMaps; Zhang et al., 2020) from the fiber points.
In tractography data, each fiber is encoded as a set of points along its trajectory. Therefore, it is natural and efficient to represent and process fibers as point clouds, which are an important geometric data format. In addition, each fiber can naturally be represented as a graph whose nodes are its points. With these representations, original fiber point coordinates can be processed directly with point-based neural networks or Graph Neural Networks, which have demonstrated successful applications in geometric data processing (Qi et al., 2017; Welling and Kipf, 2016). Another benefit of these representations for tractography data is that the point cloud or graph representation of a fiber is not sensitive to the point ordering along the fiber. In recent studies, fibers have been represented as point clouds for tractography-related supervised learning tasks (Astolfi et al., 2020; Chen et al., 2022; Logiraj et al., 2021; Xue et al., 2023), contributing to superior performance and high efficiency. In the computer vision community, unsupervised point cloud and graph clustering have been achieved in several studies by first learning representations of the inputs and then performing traditional clustering on the learned embeddings (Hassani and Haley, 2019; Tian et al., 2014). However, to our knowledge, point clouds or graphs have not yet been used for unsupervised white matter fiber clustering.
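The order-insensitivity argument can be illustrated in a few lines. Point-based networks such as PointNet and DGCNN aggregate per-point features with a symmetric function such as a global maximum; the sketch contrasts that with flattening coordinates into a fixed-order vector. Raw coordinates stand in for learned per-point features here, which is a simplification, and both helper names are hypothetical.

```python
def flatten(fiber):
    """Concatenate point coordinates into one vector: the result depends on point order."""
    return [c for point in fiber for c in point]

def global_max_pool(point_features):
    """Per-channel maximum over points: a symmetric function, so point order is ignored."""
    return [max(channel) for channel in zip(*point_features)]

fiber = [(0.0, 1.0, 2.0), (3.0, -1.0, 5.0), (1.0, 1.0, 1.0)]
flipped = fiber[::-1]
print(flatten(fiber) == flatten(flipped))                  # False
print(global_max_pool(fiber) == global_max_pool(flipped))  # True
```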

Contributions
In this study, we propose a novel deep learning framework for fast and effective white matter fiber clustering. The whole framework is trained end-to-end with fiber point coordinates as inputs and cluster assignments of fibers as outputs. Using a point cloud representation of input fibers, our framework learns deep embeddings by pretraining the neural network in a self-supervised manner and then fine-tunes the network in a self-training manner (Xie et al., 2016) with the task of updating cluster assignments. At the inference stage, the trained fiber clustering pipeline can be applied to parcellate independently acquired datasets. This paper has five contributions. First, input fibers are represented as point clouds, a compact representation that improves efficiency by enabling the use of point-based neural networks. Second, self-supervised learning is adopted in our pipeline with a designed pretext task to obtain feature embeddings that are insensitive to fiber point ordering, enabling subsequent clustering. Third, white matter fiber geometric information and gray matter anatomical parcellation information are combined in the proposed framework to obtain spatially compact and anatomically coherent clusters. Fourth, outliers are removed after cluster assignment by rejecting fibers with low soft label assignment probabilities. Fifth, our approach automatically creates a multi-subject fiber cluster atlas that is applied for white matter parcellation of new subjects.
The preliminary version of this work, referred to as DFC_conf, was published in MICCAI 2021 (Chen et al., 2021). In this paper, we extend our previous work by: 1) adopting a new fiber representation (i.e., point cloud), with a comprehensive evaluation of different representations of tractography data including point clouds, graphs, and images; 2) adding cortical surface parcellation information in addition to anatomical region information to further improve cluster anatomical coherence; 3) introducing a new cluster-adaptive outlier removal process to filter anatomically implausible fibers while maintaining good generalization across subjects; and 4) demonstrating the robustness of our method on additional datasets with different acquisitions, ages, and health conditions.

METHODS
The overall pipeline of DFC is shown in Fig. 1. The training process includes two stages: pretraining and clustering. In the pretraining stage, neural networks are trained to perform a self-supervised pretext task and obtain feature embeddings of a pair of input fibers (point clouds), followed by k-means clustering (Likas et al., 2003) to obtain initial clusters. In the clustering stage, the neural network initialized in the pretraining stage is fine-tuned in a self-training manner (Xie et al., 2016) to refine the clustering results. This is done by adding a clustering layer (see details in Section 2.3) that computes cluster assignment probabilities from the distances between feature embeddings and cluster centroids. Thus, for each input fiber, the output is a probability vector whose dimension equals the number of clusters. During inference, for each fiber represented as a point cloud, an embedding is predicted by the trained neural network, and the fiber is assigned to the closest cluster by calculating the distances between its embedding and all cluster centroids. Because cluster assignments for every subject are made with the same trained neural network, our method automatically achieves cluster correspondence across subjects.

Fig. 1. Overview of our DFC framework. A self-supervised learning strategy is adopted with the pretext task of pairwise fiber distance prediction. In the pretraining stage, input fibers are encoded as embeddings with the Siamese Network. K-means clustering is then performed on the obtained embeddings to generate initial cluster centroids. In the clustering stage, based on the neural network of the pretraining stage, a clustering layer is connected to the embedding layer and generates soft label assignment probabilities q (as shown in the orange dashed box). During training, a prediction loss (L_p) and a KL divergence loss (L_c) are combined for network optimization. During inference, an input fiber is assigned to the cluster c with the maximum soft label assignment probability, which is calculated from the trained neural network. (n_p: number of points; n_e: dimension of embeddings; n_c: number of clusters)

Input fiber geometry and anatomical information
In this work, we adopt point clouds as representations of fibers. Considering that the neighborhood relationships among points along a fiber could provide contextual information for clustering, we adopt the Dynamic Graph Convolutional Neural Network (DGCNN) model (Wang et al., 2019). The DGCNN model contains an edge feature module, EdgeConv, which was proposed to capture the local geometric structure formed by points and their neighbors. As in DGCNN, a graph is constructed for each fiber, with nodes representing fiber points and edges built between nearby points along the fiber (as illustrated in Fig. 2). Because fiber points are distributed along a curve, we construct a graph with edges connecting each point to its k (k = 4 in this study) nearest points along the fiber, instead of to its k nearest points in Euclidean distance as in the original DGCNN method (Astolfi et al., 2020). We note that while the graph structure is the same for all fibers, the node features (spatial coordinates of fiber points) of each graph differ, so fibers belonging to different clusters can be distinguished. The inputs to the DGCNN model are point clouds with dimension n_p × 3, where n_p is the number of fiber points and 3 is the number of spatial coordinates of each point. To provide anatomical context that improves performance at the fiber clustering stage (Section 2.3), we augment the white matter fiber geometry information with gray matter anatomical parcellation information. This information includes anatomical regions and cortical parcellations obtained from FreeSurfer (Fischl, 2012) using the Desikan-Killiany Atlas (Desikan et al., 2006). To describe the anatomical regions through which each fiber passes, each point in a fiber is assigned the label of the anatomical region it intersects. Similarly, fiber endpoints are associated with the cortical parcellation label of the closest point on the cortical surface.
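The along-fiber graph can be sketched with plain Python lists. Neighbors are chosen by distance in point index rather than Euclidean distance, as described above; excluding each point from its own neighbor list, the tie-breaking rule, and the helper names are our assumptions, and DGCNN's learned edge function is not shown. The [x_i, x_j - x_i] edge input follows the EdgeConv formulation of Wang et al. (2019).

```python
def along_fiber_neighbors(n_points, k=4):
    """For each point index i, the k other points closest to i along the fiber (index distance)."""
    neighbors = []
    for i in range(n_points):
        others = [j for j in range(n_points) if j != i]
        # Sort by position along the fiber; break ties by index for determinism.
        others.sort(key=lambda j: (abs(j - i), j))
        neighbors.append(sorted(others[:k]))
    return neighbors

def edge_features(points, neighbors):
    """EdgeConv-style edge inputs: for edge (i, j), concatenate x_i with x_j - x_i."""
    features = []
    for i, nbrs in enumerate(neighbors):
        xi = points[i]
        features.append(
            [list(xi) + [xj_c - xi_c for xj_c, xi_c in zip(points[j], xi)] for j in nbrs]
        )
    return features

nbrs = along_fiber_neighbors(14, k=4)
print(nbrs[0])  # [1, 2, 3, 4]
print(nbrs[7])  # [5, 6, 8, 9]
```

Because neighbors are defined symmetrically along the fiber, reversing the point order only relabels the nodes; the graph structure itself is unchanged.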

Pretraining with self-supervised deep embedding
In the pretraining stage, we propose a novel self-supervised learning approach to obtain deep embeddings of fibers. A pretext task is designed to produce pairs of embeddings whose distances approximate the corresponding fiber distances, enabling subsequent clustering in embedding space. Specifically, the pretext task is to predict the distance between a pair of input fibers, where the self-supervised pseudo label is the fiber distance computed from their point coordinates. To calculate the fiber distance, we adopt the minimum average direct-flip distance, which is widely applied in white matter fiber clustering (Garyfallidis et al., 2012; Zhang et al., 2018c). This fiber distance considers the order of points along the fibers, yet it remains the same when a fiber is equivalently represented starting from either endpoint. With fiber distances as pseudo labels, the network is guided to produce similar embeddings for similar fibers, even when their point orderings are flipped.
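A minimal sketch of the minimum average direct-flip distance used as the pseudo label, assuming both fibers have already been resampled to the same number of points. It is written for clarity rather than speed; an optimized, vectorized routine would normally be used on millions of fiber pairs.

```python
import math

def _mean_pointwise_distance(a, b):
    """Mean Euclidean distance between corresponding points of two equal-length fibers."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)

def mdf_distance(a, b):
    """Minimum average direct-flip (MDF) distance between two fibers with the same number of points."""
    if len(a) != len(b):
        raise ValueError("MDF assumes both fibers are resampled to the same number of points")
    direct = _mean_pointwise_distance(a, b)
    flipped = _mean_pointwise_distance(a, b[::-1])
    return min(direct, flipped)

fiber_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
fiber_b = [(2.0, 1.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]  # same shape, opposite order
print(mdf_distance(fiber_a, fiber_b))  # 1.0
```

The direct comparison alone would report about 1.82 for this pair; taking the minimum over both orderings is what makes the pseudo label independent of which endpoint a fiber starts from.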
To perform the pretext task of fiber distance prediction, we adopt a Siamese Network (Chopra et al., 2005), which has two subnetworks with shared weights. Generally, each input of a pair is fed into one of the subnetworks, and each subnetwork outputs a deep embedding. In this work, a pair of fibers (point clouds) is used as the input to the point-cloud-based neural network. We employ DGCNNs as the subnetworks of the Siamese Network to obtain feature embeddings. Each DGCNN subnetwork is composed of 5 EdgeConv layers followed by 3 fully connected layers. The subnetworks output a pair of deep embeddings corresponding to the input pair.
In a typical Siamese Network, a fully connected layer follows the subnetworks and outputs a similarity score. In our work, we replace this last fully connected layer with a direct calculation of the Euclidean distance between the pair of learned deep embeddings. The mean squared error between the predicted distance and the fiber distance (pseudo label) is used as the distance prediction loss L_p.
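The loss computation can be sketched without a deep learning framework. `toy_encoder` is a hypothetical placeholder for the DGCNN subnetwork (the 5 EdgeConv and 3 fully connected layers are not reproduced); what the sketch does show is the weight sharing of the Siamese Network and the replacement of the final similarity layer by a Euclidean distance between embeddings.

```python
import math

def toy_encoder(fiber):
    """Hypothetical stand-in for the shared DGCNN subnetwork.

    Returns per-coordinate max and mean over points; both are order-invariant.
    """
    coords = list(zip(*fiber))  # x values, y values, z values
    return [max(c) for c in coords] + [sum(c) / len(c) for c in coords]

def distance_prediction_loss(pairs, pseudo_labels, encoder=toy_encoder):
    """L_p: mean squared error between embedding distances and pairwise fiber distances."""
    squared_errors = []
    for (fiber_a, fiber_b), target in zip(pairs, pseudo_labels):
        # Weight sharing: the same encoder embeds both fibers of the pair.
        z_a, z_b = encoder(fiber_a), encoder(fiber_b)
        predicted = math.dist(z_a, z_b)  # replaces the usual similarity head
        squared_errors.append((predicted - target) ** 2)
    return sum(squared_errors) / len(squared_errors)

pair = ([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [(0.0, 1.0, 0.0), (1.0, 1.0, 0.0)])
print(distance_prediction_loss([pair], [1.0]))  # (sqrt(2) - 1)^2, about 0.1716
```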

Clustering integrating anatomical information
After the pretraining stage, the weights of the Siamese Network are initialized with the pretrained weights, and initial clusters are obtained by performing k-means clustering (Likas et al., 2003) on the generated embeddings. The clustering stage of our method is developed from the Deep Convolutional Embedded Clustering model. Following the DGCNN subnetwork, a clustering layer is designed to encapsulate the cluster centroids as its trainable weights and to compute soft label assignment probabilities q_ij using Student's t-distribution (Maaten L. V. D, 2008):

$$q_{ij} = \frac{\left(1 + \lVert z_i - \mu_j \rVert^2\right)^{-1}}{\sum_{j'} \left(1 + \lVert z_i - \mu_{j'} \rVert^2\right)^{-1}} \qquad (1)$$

where z_i is the embedding of fiber i and µ_j is the centroid of cluster j (note that z_i and µ_j have the same dimensionality), so that q_ij is the probability of assigning fiber i to cluster j. The network is trained in a self-training manner, and its clustering loss L_c is defined as a KL divergence loss (Xie et al., 2016). The distance prediction loss is retained in the clustering stage, and the total loss is L = L_p + λL_c, where λ is the weight of L_c. During inference, a fiber i is assigned to the cluster with the maximum q_ij, referred to as q_m.
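Eq. (1) and the self-training loss can be sketched in plain Python. The text specifies only that L_c is a KL divergence loss following Xie et al. (2016); the sketch assumes the Deep Embedded Clustering auxiliary target distribution P (squared assignments normalized by cluster frequency) and sums the divergence over fibers. It omits the anatomical terms of Eq. (2) and all gradient computation.

```python
import math

def soft_assignments(embeddings, centroids):
    """Eq. (1): Student's t kernel with one degree of freedom, normalized over clusters."""
    q = []
    for z in embeddings:
        kernel = [1.0 / (1.0 + math.dist(z, mu) ** 2) for mu in centroids]
        total = sum(kernel)
        q.append([k / total for k in kernel])
    return q

def target_distribution(q):
    """Assumed DEC-style target: square q, divide by cluster frequency f_j, renormalize per fiber."""
    n_clusters = len(q[0])
    f = [sum(row[j] for row in q) for j in range(n_clusters)]
    p = []
    for row in q:
        weights = [row[j] ** 2 / f[j] for j in range(n_clusters)]
        total = sum(weights)
        p.append([w / total for w in weights])
    return p

def kl_clustering_loss(p, q):
    """L_c = KL(P || Q), summed over fibers and clusters (P is held fixed as the target)."""
    return sum(
        p_ij * math.log(p_ij / q_ij)
        for p_row, q_row in zip(p, q)
        for p_ij, q_ij in zip(p_row, q_row)
        if p_ij > 0.0
    )

def total_loss(l_p, l_c, lam=0.1):
    """L = L_p + lambda * L_c, with lambda = 0.1 as in the implementation details."""
    return l_p + lam * l_c

embeddings = [[0.0, 0.0], [3.0, 0.0], [0.5, 0.0]]
centroids = [[0.0, 0.0], [3.0, 0.0]]
q = soft_assignments(embeddings, centroids)
p = target_distribution(q)
assignments = [max(range(len(row)), key=row.__getitem__) for row in q]  # cluster with q_m
print(assignments)  # [0, 1, 0]
```

Squaring q before renormalizing sharpens confident assignments, which is what drives the self-training refinement in Deep Embedded Clustering.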
We improve the clustering stage described above by incorporating gray matter anatomical parcellation information into the neural network. We design a new soft label assignment probability definition (2), which extends (1) with two anatomical agreement terms to encourage grouping of fibers that pass through the same anatomical regions and connect to the same cortical parcels. The first term is the Dice score between the set of anatomical regions passed through by fiber i and those passed through by cluster j. To define the set of regions of a cluster, we use the tract anatomical profile method, which includes regions intersected by over 40% of the cluster's fibers, as in (Garyfallidis et al., 2012; Zhang et al., 2018c). Similarly, the second term quantifies the agreement between the set of cortical regions intersected by the endpoints of fiber i and those intersected by the endpoints of cluster j; it is defined as the percentage of endpoints in cluster j that lie within the cortical regions intersected by the endpoints of fiber i. Analogous to the tract anatomical profile, we propose to call the percentage of endpoints within each intersected cortical region the tract surface profile. During training, the tract anatomical profile and tract surface profile are initially calculated from the clusters generated by k-means and are updated iteratively with new predictions during the clustering stage. During inference, soft label assignments are calculated using (2).
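The two anatomical agreement terms can be sketched as below, with region and parcel labels given as hypothetical strings. The 40% rule and the endpoint-percentage definition follow the text; how the two terms enter Eq. (2) is not reproduced here, and returning 0.0 for the Dice score of two empty sets is our convention.

```python
from collections import Counter

def tract_anatomical_profile(cluster_fiber_regions, min_fraction=0.4):
    """Regions intersected by more than `min_fraction` of the fibers in a cluster."""
    n_fibers = len(cluster_fiber_regions)
    counts = Counter(region for regions in cluster_fiber_regions for region in set(regions))
    return {region for region, count in counts.items() if count / n_fibers > min_fraction}

def dice(set_a, set_b):
    """Dice score 2 * |A & B| / (|A| + |B|); 0.0 when both sets are empty (our convention)."""
    set_a, set_b = set(set_a), set(set_b)
    denominator = len(set_a) + len(set_b)
    return 2.0 * len(set_a & set_b) / denominator if denominator else 0.0

def surface_agreement(fiber_endpoint_labels, cluster_endpoint_labels):
    """Fraction of the cluster's endpoints lying in cortical parcels hit by the fiber's endpoints."""
    fiber_labels = set(fiber_endpoint_labels)
    hits = sum(1 for label in cluster_endpoint_labels if label in fiber_labels)
    return hits / len(cluster_endpoint_labels)

# Hypothetical region labels for a five-fiber cluster.
cluster = [{"A", "B"}, {"A", "C"}, {"A", "B"}, {"D"}, {"A"}]
profile = tract_anatomical_profile(cluster)
print(profile)                    # {'A'}
print(dice({"A", "B"}, profile))  # 0.666...
print(surface_agreement(["ctx-1", "ctx-2"], ["ctx-1", "ctx-1", "ctx-3", "ctx-2"]))  # 0.75
```

Region "B" is intersected by exactly 40% of the fibers and is therefore excluded, since the rule requires more than 40%.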

Cluster-adaptive outlier removal
After clustering, some fibers may differ distinctly in position and shape from most fibers in their assigned cluster, and we empirically found that such outliers often exist in the obtained clusters. Therefore, outlier removal is an essential step to filter anatomically implausible fibers (Astolfi et al., 2020; Guevara et al., 2011; Legarreta et al., 2021; Mendoza et al., 2021; Zhang et al., 2018c). In our previous work (Chen et al., 2021), we removed outliers by directly rejecting fibers with a label assignment probability q_m lower than an absolute threshold.
This method could potentially remove plausible fibers, as it ignores the variability of q m across clusters with different anatomy.
Therefore, we propose a novel cluster-adaptive outlier removal method. It is also based on the maximum label assignment probability q_m, because fibers with higher q_m tend to belong to their clusters with higher confidence and are thus less likely to be outliers. In the proposed method, fibers with low soft label assignment probabilities are removed based on a cluster-specific threshold rather than a single absolute threshold shared by all clusters.
Specifically, for each cluster c, we calculate the mean (m_c) and the standard deviation (s_c) of the label assignment probabilities of all fibers assigned to this cluster. Then, a threshold is computed as T_c = m_c - n × s_c, and any fiber with a label assignment probability lower than T_c is removed, where n is a hyperparameter that controls the number of removed outlier fibers.
The above threshold computation is commonly used for outlier detection (Dave and Varma, 2014), and a similar approach was effective for fiber clustering in previous work (Zhang et al., 2018c).

Implementation details
In the pretraining and clustering stages, our model is trained for 50k iterations with a learning rate of 1e-4, followed by another 1k iterations with a learning rate of 1e-5. The training batch size is 1024, and Adam (Kingma and Ba, 2014) is used for optimization. All methods were tested on a computer equipped with a 2.1 GHz Intel Xeon E5 CPU (8 DIMMs; 32 GB memory) and an NVIDIA RTX 2080Ti GPU. Deep learning methods were implemented with PyTorch (v1.7.1) (Paszke et al., 2019). The weight of the clustering loss λ was set to 0.1, as suggested for Deep Convolutional Embedded Clustering. The source code and the trained model will be made available at https://github.com/SlicerDMRI/DFC.

Experimental datasets and preprocessing
We used dMRI data from three datasets (HCP, PPMI, and CNP) that were independently acquired from different populations using different imaging protocols and scanners, as summarized in the dataset table. For each subject, whole-brain tractography was performed using a two-tensor unscented Kalman filter method (Malcolm et al., 2010; Reddy and Rathi, 2016). Fibers shorter than 40 mm were removed to avoid bias toward implausible short fibers (Guevara et al., 2012; Jin et al., 2014). The average number of fibers per subject obtained with whole-brain tractography was approximately 490,000 for the HCP dataset, 950,000 for the PPMI dataset, and 880,000 for the CNP dataset. All tractography data were co-registered using a tractography-based registration method (O'Donnell et al., 2012). To obtain gray matter anatomical parcellation information (i.e., the anatomical regions each fiber passed through and the cortical regions each fiber connected to), we performed FreeSurfer parcellation (Fischl, 2012) on the T1w data, which was then registered to the dMRI data. (Note: for HCP data, we used the provided FreeSurfer parcellation that had been co-registered with the dMRI data; for the CNP and PPMI data, we performed a non-linear registration using ANTs (Avants et al., 2009).) During model training, 10,000 fibers were randomly selected from each of the 100 training subjects, yielding a training dataset of 1 million fiber samples. During the pretraining stage, each fiber sample was paired with another randomly selected sample (never itself), generating 1 million fiber pairs to learn the embedding features. During training, the training dataset was parcellated into 800 clusters, resulting in an average of 1,250 fibers per cluster. The trained model was then applied to the whole-brain tractography of each testing subject for subject-specific white matter fiber clustering.
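The sampling scheme can be sketched as follows, representing fibers by integer IDs. The seed, uniform sampling without replacement within each subject, and the function name are assumptions; with 100 subjects and the default of 10,000 fibers each, it yields the 1 million samples and 1 million pairs described above.

```python
import random

def sample_training_pairs(fibers_per_subject, n_per_subject=10_000, seed=0):
    """Draw fibers from each subject, then pair every sample with a different random sample."""
    rng = random.Random(seed)
    samples = []
    for subject_fibers in fibers_per_subject:
        k = min(n_per_subject, len(subject_fibers))
        samples.extend(rng.sample(subject_fibers, k))
    if len(samples) < 2:
        raise ValueError("need at least two samples to form pairs")
    pairs = []
    for i in range(len(samples)):
        j = rng.randrange(len(samples) - 1)
        if j >= i:
            j += 1  # shift past i so that no sample is paired with itself
        pairs.append((samples[i], samples[j]))
    return samples, pairs

subjects = [list(range(0, 50)), list(range(100, 150))]  # two toy subjects
samples, pairs = sample_training_pairs(subjects, n_per_subject=10)
print(len(samples), len(pairs))  # 20 20
```

Drawing j from one fewer index and shifting past i gives a uniform choice among all other samples without a rejection loop.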
For efficient processing of the large number of fiber samples during model training and inference, fibers were downsampled to n_p points before being input into the network. In this study, we set n_p to 14 because this number yields good performance with relatively low computational cost in terms of inference time and memory usage (for details, see Supplementary Material 1). All anatomical region labels (from all fiber points) were preserved for input into the network without any downsampling.
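The text does not state how fibers are downsampled to n_p points. A common choice, assumed in the sketch below, is to place n_p points at equal arc-length intervals along the fiber using linear interpolation, which keeps both endpoints; `resample_fiber` is a hypothetical helper name.

```python
import math

def resample_fiber(points, n_points=14):
    """Resample a polyline to n_points spaced equally along its arc length."""
    if len(points) < 2 or n_points < 2:
        raise ValueError("need at least two input points and two output points")
    seg_lengths = [math.dist(points[i], points[i + 1]) for i in range(len(points) - 1)]
    cumulative = [0.0]
    for length in seg_lengths:
        cumulative.append(cumulative[-1] + length)
    total = cumulative[-1]

    resampled = []
    k = 0  # index of the current segment
    for m in range(n_points):
        target = total * m / (n_points - 1)
        # Advance to the segment containing the target arc length.
        while k < len(seg_lengths) - 1 and cumulative[k + 1] < target:
            k += 1
        t = (target - cumulative[k]) / seg_lengths[k] if seg_lengths[k] > 0 else 0.0
        t = min(max(t, 0.0), 1.0)  # guard against floating-point overshoot
        p, q = points[k], points[k + 1]
        resampled.append(tuple(a + t * (b - a) for a, b in zip(p, q)))
    return resampled

fiber = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
print(resample_fiber(fiber, n_points=3))  # [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
```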

Experimental metrics
We adopted four metrics to quantitatively evaluate white matter fiber clustering results.
These metrics enable evaluation of the quality of a white matter tractography parcellation from several perspectives.

Davies-Bouldin (DB) index. The DB index is a commonly used metric in unsupervised clustering tasks (Xu and Tian, 2015), and it has recently been adopted for fiber clustering evaluation (Vázquez et al., 2020b). It simultaneously measures within-cluster scatter and between-cluster separation as the ratio of intra- and inter-cluster fiber distances:

$$\mathrm{DB} = \frac{1}{n}\sum_{i=1}^{n}\max_{j \neq i}\frac{\alpha_i + \alpha_j}{d(c_i, c_j)}$$

where n is the number of clusters, α_i and α_j are the mean intra-cluster distances of clusters i and j, and d(c_i, c_j) is the inter-cluster distance (the minimum average direct-flip distance between the centroids c_i and c_j of clusters i and j, where the centroid is defined as the fiber with the minimum average distance to all other fibers in the cluster). A smaller DB index indicates more compact and better-separated clusters.
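The DB index can be computed for any fiber distance; the sketch below takes the distance as a function argument, so an MDF distance function can be passed in for real fibers, while the usage example uses 1D numbers for brevity. Taking α_i as the mean distance of a cluster's members to its medoid centroid is our reading of "mean intra-cluster distance".

```python
def medoid(cluster, dist):
    """Member with minimum total (equivalently, average) distance to the other members."""
    return min(cluster, key=lambda f: sum(dist(f, g) for g in cluster))

def davies_bouldin(clusters, dist):
    """DB = (1/n) * sum_i max_{j != i} (alpha_i + alpha_j) / d(c_i, c_j)."""
    centroids = [medoid(c, dist) for c in clusters]
    # Assumption: alpha_i is the mean distance of cluster members to the centroid.
    alphas = [sum(dist(f, cen) for f in c) / len(c) for c, cen in zip(clusters, centroids)]
    n = len(clusters)
    total = 0.0
    for i in range(n):
        total += max(
            (alphas[i] + alphas[j]) / dist(centroids[i], centroids[j])
            for j in range(n)
            if j != i
        )
    return total / n

print(davies_bouldin([[0, 1, 2], [10, 11, 12]], lambda a, b: abs(a - b)))  # 0.1333... (= 2/15)
```

Moving the two toy clusters farther apart leaves α unchanged and increases d, so the index decreases, matching the interpretation that smaller is better.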

White Matter Parcellation Generalization (WMPG). WMPG measures the percentage of
successfully detected clusters in an individual subject (Zhang et al., 2018c). Clusters with over 20 fibers are considered to be successfully detected (Zhang et al., 2018c).
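For a single subject, WMPG reduces to counting fibers per cluster. The sketch assumes clusters are indexed 0 to n_clusters - 1 (for example, 0 to 799 for an 800-cluster atlas) and reads "over 20 fibers" as strictly more than 20; the function name is hypothetical.

```python
from collections import Counter

def wmpg(cluster_labels, n_clusters, min_fibers=20):
    """Percentage of the n_clusters atlas clusters detected with more than min_fibers fibers."""
    counts = Counter(cluster_labels)
    detected = sum(1 for c in range(n_clusters) if counts[c] > min_fibers)
    return 100.0 * detected / n_clusters

labels = [0] * 21 + [1] * 20 + [2] * 5  # cluster 3 receives no fibers
print(wmpg(labels, n_clusters=4))       # 25.0
```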

Tract Anatomical Profile Coherence (TAPC).
This metric measures whether fibers within the same cluster pass through the same anatomical regions (Zhang et al., 2018c). It is calculated as the Dice score between each fiber's intersected anatomical regions and its assigned cluster's anatomical regions (i.e., the tract anatomical profile of the cluster (Section 2.3)), where a high value indicates high anatomical region coherence of the cluster. The TAPC of a cluster is calculated as the mean of the Dice scores across all fibers within the cluster, and the TAPC score of a subject is computed as the mean TAPC of all clusters.
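TAPC aggregates per-fiber Dice scores in two stages. In the sketch, each cluster's tract anatomical profile is assumed to be precomputed (Section 2.3), region labels are hypothetical strings, and clusters with no fibers are assumed to have been excluded beforehand.

```python
def dice(set_a, set_b):
    """Dice score between two label sets (0.0 when both are empty, our convention)."""
    denominator = len(set_a) + len(set_b)
    return 2.0 * len(set_a & set_b) / denominator if denominator else 0.0

def subject_tapc(clusters):
    """clusters: list of (tract_anatomical_profile, [regions of each fiber]) pairs.

    Cluster TAPC = mean Dice over its fibers; subject TAPC = mean over clusters.
    """
    cluster_scores = []
    for profile, fiber_regions in clusters:
        scores = [dice(set(regions), set(profile)) for regions in fiber_regions]
        cluster_scores.append(sum(scores) / len(scores))
    return sum(cluster_scores) / len(cluster_scores)

clusters = [
    ({"A"}, [{"A"}, {"A", "B"}]),  # fiber Dice scores 1 and 2/3
    ({"C", "D"}, [{"C", "D"}]),    # fiber Dice score 1
]
print(subject_tapc(clusters))  # 0.9166... (= 11/12)
```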
Tract Surface Profile Coherence (TSPC). We propose a new metric, TSPC, to evaluate the coherence of cortical terminations of fibers within a cluster. The TSPC is defined as the average tract surface profile (Section 2.3) across the cortical regions intersected by fiber endpoints within the cluster. A higher TSPC indicates that fibers within a cluster terminate in a smaller set of cortical parcels. The TSPC of a subject is computed as the mean TSPC of all clusters.

Experiments and results
We performed five experimental evaluations: 1) comparison to state-of-the-art methods, 2) comparison to the Deep Convolutional Embedded Clustering (DCEC) baseline, 3) an ablation study, 4) evaluation of input representations and network architectures, and 5) evaluation of outlier fiber removal. Results of experiments 1) and 2) are reported on the three testing datasets, and those of 3), 4) and 5) on the HCP testing dataset.

Comparison to state-of-the-art methods
We compared the proposed DFC with three state-of-the-art methods: WhiteMatterAnalysis (Zhang et al., 2018c), QuickBundles (Garyfallidis et al., 2012) and DFC_conf (Chen et al., 2021). WhiteMatterAnalysis is an atlas-based white matter fiber clustering method that shows good performance and strong correspondence across subjects. QuickBundles is a widely used white matter fiber clustering method that clusters fibers within each subject and achieves group correspondence through post-processing steps. We used the open-source software packages WhiteMatterAnalysis v0.3.0 (github.com/SlicerDMRI/whitematteranalysis) and Dipy v1.3.0 (dipy.org) with default settings to run WhiteMatterAnalysis and QuickBundles, respectively. DFC_conf is the preliminary version of this work; it represents input fibers with FiberMap (Zhang et al., 2020), a 2D multi-channel feature descriptor that encodes the spatial coordinates of points along each fiber. DFC and DFC_conf generate cluster correspondence across subjects automatically. For each method, we performed white matter fiber clustering to output 800 clusters, which has been suggested as a good whole-brain tractography parcellation scale (Wu et al., 2021; Zhang et al., 2018c; Guevara et al., 2022; Zhang et al., 2018b). In general, DFC, DFC_conf and WhiteMatterAnalysis obtain visually similar clusters, while the clusters from DFC appear more compact and anatomically reasonable than those from the other methods. QuickBundles tends to include some apparent outlier fibers. Fig. 3 visualizes three example clusters and their connected FreeSurfer regions. The clusters from DFC are more anatomically coherent, with their fibers connecting to the same cortical regions. In addition to the visualization of clusters from individual subjects, we also provide a visual comparison of population-wise clusters to demonstrate each method's performance for tractography atlas creation.
To do so, we compare DFC and WhiteMatterAnalysis, which are explicitly designed to perform groupwise clustering to create tractography atlases. For DFC, the population-wise atlas is derived from the training process, in which fiber clusters are formed from the training subjects. For WhiteMatterAnalysis, we use the anatomically curated white matter atlas created with WhiteMatterAnalysis (Zhang et al., 2018c). Fig. 4 gives a visual comparison of results from DFC and WhiteMatterAnalysis, with example clusters in the regions of the arcuate fasciculus, corpus callosum and superficial fronto-parietal tracts. DFC obtains population-wise clusters that are more separated and compact, and its cluster subdivisions better respect the terminating anatomical regions.

We also compared the execution time and memory usage of each method during inference for various data sizes. This experiment was performed on a computer equipped with a 2.1 GHz Intel Xeon E5 CPU (8 DIMMs; 264 GB memory) and an NVIDIA GTX 1080 Ti GPU. Test datasets of 250,000, 500,000, 750,000 and 1,000,000 fibers (streamlines) were obtained by downsampling densely seeded tractography from one example HCP subject. As shown in Table 3, both execution time and memory usage increase with data size. For all data sizes, DFC and DFC_conf are the most efficient owing to GPU computation, and QuickBundles is also computationally efficient. DFC, DFC_conf and QuickBundles show comparably low memory usage. WhiteMatterAnalysis requires a much longer execution time and more memory than the other methods (e.g., 55 GB for 1,000,000 fibers) because of the expensive pairwise fiber similarity computation between the subject and atlas tractography data. Overall, these results demonstrate the high efficiency and low computational cost of the proposed DFC method.
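
QuickBundles owes much of its efficiency to a single pass over the fibers in which each fiber is compared only against the current cluster centroids rather than against every other fiber. The sketch below is a simplified pure-Python illustration of that idea, assuming all fibers have the same number of points and using a hypothetical `quickbundles` function; it is not the Dipy implementation used in the experiments (Dipy exposes the method as `dipy.segment.clustering.QuickBundles`).

```python
import math


def mean_point_distance(a, b):
    """Mean Euclidean distance between corresponding points of two equal-length fibers."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)


def quickbundles(fibers, threshold):
    """Simplified QuickBundles-style single pass (illustrative, not Dipy's code).

    Each fiber joins the cluster whose centroid is nearest under the MDF distance
    if that distance is below threshold; otherwise it starts a new cluster.
    Centroids are running means of member fibers, each added in its better-aligned
    point order. Returns a list of clusters, each a list of fiber indices.
    """
    sums, members = [], []  # per-cluster point-wise coordinate sums and member indices
    for idx, fiber in enumerate(fibers):
        best_k, best_d, best_fiber = None, math.inf, fiber
        for k, point_sums in enumerate(sums):
            n = len(members[k])
            centroid = [tuple(c / n for c in p) for p in point_sums]
            for oriented in (fiber, fiber[::-1]):  # direct and flipped point order
                d = mean_point_distance(centroid, oriented)
                if d < best_d:
                    best_k, best_d, best_fiber = k, d, oriented
        if best_k is not None and best_d < threshold:
            sums[best_k] = [tuple(s + c for s, c in zip(p, q))
                            for p, q in zip(sums[best_k], best_fiber)]
            members[best_k].append(idx)
        else:
            sums.append([tuple(p) for p in fiber])
            members.append([idx])
    return members
```

For a fixed number of clusters, the work grows linearly with the number of fibers, which contrasts with approaches that compute full pairwise fiber similarities.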

Comparison to Deep Convolutional Embedded Clustering baseline
We compare the proposed DFC method with the DCEC baseline, a widely used auto-encoder model for unsupervised clustering in computer vision. DCEC expects images as input, so we used FiberMap (Zhang et al., 2020) to represent input fibers as images, as in Chen et al. (2021). Hyperparameters of DCEC were optimized to obtain its best performance.
As shown in Table 3, DFC clearly improves performance in terms of DB index, TAPC and TSPC, while DCEC has a slightly higher WMPG score (attributable to the lack of outlier removal in DCEC). Notably, DCEC obtained an exceptionally large DB index because of its sensitivity to the order of points along fibers. Fig. 5
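
To illustrate the point-order issue, the sketch below compares a fiber with the same fiber stored in reverse order. A flattened coordinate vector, used here only as a simplified stand-in for order-dependent inputs (it is not FiberMap itself), places the two copies far apart, whereas the MDF distance treats them as identical.

```python
import math


def flatten(fiber):
    """Concatenate point coordinates in stored order (an order-dependent representation)."""
    return [c for point in fiber for c in point]


def mean_point_distance(a, b):
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)


def mdf(a, b):
    """Minimum average direct-flip distance (invariant to reversing either fiber)."""
    return min(mean_point_distance(a, b), mean_point_distance(a, b[::-1]))


fiber = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
reversed_fiber = fiber[::-1]  # same streamline, opposite tracking direction

vector_gap = math.dist(flatten(fiber), flatten(reversed_fiber))  # sqrt(8)
fiber_gap = mdf(fiber, reversed_fiber)
print(f"flattened-vector distance: {vector_gap:.2f}, MDF distance: {fiber_gap:.2f}")
# prints: flattened-vector distance: 2.83, MDF distance: 0.00
```
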

Ablation study
We performed an ablation study to investigate how the different modules of the proposed DFC method influence white matter fiber clustering performance. We evaluated four models: DFC_no-roi&cor&ro (DFC without anatomical regions, cortical parcellation or outlier removal), DFC_no-cor&ro (DFC with anatomical regions but without cortical parcellation or outlier removal), DFC_no-ro (DFC with anatomical regions and cortical parcellation but without outlier removal), and the full proposed DFC method.
As shown in

Comparison of input representations
We compared three representations of tractography fibers: FiberMap, graph and point cloud. For each representation, a neural network suited to that input was used: Convolutional Neural Networks (CNNs) for FiberMap, Graph Convolutional Networks (GCNs) for graphs, and DGCNNs (as in the proposed method) for point clouds. The FiberMap input was introduced in Section 3.3.1, and more details can be found in Zhang et al. (2019a). For the graph input, each fiber (streamline) was treated as a graph, with points as nodes and edges between adjacent points, analogous to traditional graph construction for meshes in computer vision (Pfaff et al., 2021). The point cloud input was described in Section 2.1. For each input representation and its network, the proposed self-supervised learning pipeline was applied to generate clusters, followed by the proposed outlier removal process. For a fair comparison, we adjusted the outlier removal threshold for each method so that all methods removed approximately the same number of fibers.
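
The two graph constructions can be sketched as follows. `chain_edges` builds the fiber graph described above (nodes are points, edges join adjacent points), which depends on the stored point order; `knn_edges` connects each point to its k nearest neighbors based on point positions alone, which is how a DGCNN-style EdgeConv layer builds its graph. Both helpers are hypothetical; the sketch assumes raw 3D coordinates and a fixed k, whereas a DGCNN recomputes neighborhoods in feature space at every layer, which is not shown.

```python
import math


def chain_edges(n_points):
    """Fiber-as-graph input: undirected edges between consecutive points along the fiber."""
    return [(i, i + 1) for i in range(n_points - 1)]


def knn_edges(points, k):
    """Directed k-nearest-neighbor edges over an unordered point set
    (first-layer graph of a DGCNN-style model, on raw coordinates only)."""
    edges = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        nearest = sorted(others, key=lambda j: math.dist(p, points[j]))[:k]
        edges.extend((i, j) for j in nearest)
    return edges
```
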
As shown in Table 5

Comparison of outlier removal methods
We provide a visual comparison between two outlier removal strategies: RO_absolute, which applies one absolute removal threshold to all clusters (proposed in the conference version of this paper), and RO_adaptive, which applies a cluster-adaptive threshold (proposed in the present work). For RO_absolute, the threshold was set to 0.045 so that it removed a similar proportion of fibers to RO_adaptive (0.2626 and 0.2571, respectively).
As shown in Fig. 6, the results of RO_adaptive are more anatomically plausible, whereas RO_absolute tends either to be overly strict (Fig. 6a) or to fail to reject apparent outlier fibers (Fig. 6b).
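
The sketch below contrasts a global threshold, as in RO_absolute with the 0.045 value given above, against a cluster-dependent one. The cluster-adaptive rule shown here (a fixed fraction of each cluster's mean assignment probability) is a hypothetical illustration and not the RO_adaptive rule proposed in this work; the input is assumed to be each fiber's probability for its assigned cluster.

```python
from collections import defaultdict


def keep_absolute(probs, threshold=0.045):
    """RO_absolute-style rule: one global probability threshold shared by all clusters."""
    return [p >= threshold for p in probs]


def keep_relative(probs, labels, factor=0.5):
    """Hypothetical cluster-adaptive rule (illustration only, not RO_adaptive):
    keep a fiber if its probability is at least `factor` times the mean
    probability of the fibers assigned to the same cluster."""
    per_cluster = defaultdict(list)
    for p, c in zip(probs, labels):
        per_cluster[c].append(p)
    mean = {c: sum(ps) / len(ps) for c, ps in per_cluster.items()}
    return [p >= factor * mean[c] for p, c in zip(probs, labels)]


def removed_fraction(keep):
    """Share of fibers rejected as outliers."""
    return 1 - sum(keep) / len(keep)
```

With probabilities [0.05, 0.06, 0.9, 0.2] over clusters [0, 0, 1, 1], the global threshold keeps every fiber, while the relative rule rejects the 0.2 fiber because it is weak compared with the other member of its cluster.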

DISCUSSION
In this work, we proposed a novel end-to-end unsupervised deep learning framework, DFC, for fast and effective white matter fiber clustering. Our clustering method leverages not only white matter fiber geometry but also gray matter anatomical parcellation information. Our pipeline adopts a self-supervised learning strategy to learn deep embeddings for unsupervised fiber clustering. Many pretext tasks, such as predicting context (Doersch et al., 2015) or image rotation (Komodakis and Gidaris, 2018), have been proposed in the computer vision community (Chen et al., 2020; Liu et al., 2021; Zhang et al., 2016). For medical image computing tasks, novel pretext tasks have been designed by harnessing knowledge from the medical domain instead of directly adopting pretext tasks designed for computer vision (Matzkin et al., 2020; Shurrab and Duwairi, 2022; Spitzer et al., 2018). In our DFC framework, we designed the pretext task of fiber distance prediction to obtain embeddings for subsequent clustering. The minimum average direct-flip distance adopted in our study can easily be replaced with other fiber distance measures of interest, such as the mean closest point distance (L. J.) or the Hausdorff distance (Corouge et al., 2004b). Leveraging domain-specific knowledge of fiber distance in the pretext task provides two advantages. First, the general idea of white matter fiber clustering is to group fibers with low pairwise distances into the same cluster. By solving the pretext task of fiber distance prediction, our pipeline obtains embeddings whose pairwise distances are consistent with the distances between the corresponding fibers, which benefits white matter fiber clustering. Second, the proposed self-supervised learning strategy encourages the network to learn similar embeddings for spatially close fibers regardless of their point ordering, enabling them to be grouped into the same cluster. This gives our method an advantage over widely used auto-encoder-based models (Xie et al., 2016), which are sensitive to fiber point ordering because they learn embeddings by reconstructing the input itself.
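
One natural way to set up a distance-prediction pretext task is to regress the distance between two embeddings onto the distance between the corresponding fibers. The sketch below assumes a mean squared error on MDF distances over sampled fiber pairs; the loss used in DFC may be formulated differently, and the embeddings here are plain vectors standing in for network outputs. `distance_prediction_loss` is a hypothetical name.

```python
import math


def mean_point_distance(a, b):
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)


def mdf(a, b):
    """Minimum average direct-flip distance between two equal-length fibers."""
    return min(mean_point_distance(a, b), mean_point_distance(a, b[::-1]))


def distance_prediction_loss(fibers, embeddings, pairs):
    """Mean squared error between embedding distances and fiber (MDF) distances.

    fibers: list of fibers (lists of 3D points); embeddings: one vector per fiber,
    e.g. produced by a network; pairs: (i, j) index pairs sampled for training.
    """
    errors = [(math.dist(embeddings[i], embeddings[j]) - mdf(fibers[i], fibers[j])) ** 2
              for i, j in pairs]
    return sum(errors) / len(errors)
```

Because the MDF target already takes the minimum over both point orders, a reversed copy of a fiber has a target distance of zero to the original, so this objective pushes the two toward the same embedding.
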
We proposed a novel framework that enables combined use of white matter fiber geometry and gray matter anatomical parcellation information in white matter fiber clustering.
Most current white matter fiber clustering methods group fibers into bundles by calculating fiber similarity based on their coordinates in Euclidean space (Garyfallidis et al., 2012; Vázquez et al., 2020b). In contrast, a recent study performed white matter fiber clustering based on the brain anatomical structures each fiber passes through instead of fiber spatial coordinates (Siless et al., 2018). Each source of information can therefore contribute to white matter fiber clustering. Our method leverages both, i.e., the spatial coordinates of fibers and gray matter anatomical parcellation information, to help identify anatomically meaningful clusters.
The results show that integrating gray matter anatomical parcellation information clearly improved the anatomical coherence within clusters. Therefore, anatomical parcellation information provides useful complementary information to fiber geometric information.
However, we only investigated the performance of the Desikan-Killiany parcellation (Desikan et al., 2006). A finer parcellation that provides more detailed information, such as that defined in more recently proposed atlases (Destrieux et al., 2010;Glasser et al., 2016), may be more beneficial to clustering performance. Our method shows the potential of combining multiple sources of information to improve white matter fiber clustering.
The representation of tractography data for deep learning is an open challenge for tractography-related tasks. Previous studies performed tractography segmentation on 3D volumes instead of the tractography data itself (Liu et al., 2022; Lu et al., 2020; Wasserthal et al., 2018), but this neglects subject-specific fiber tractography information. More recently, FiberMap was proposed to represent a fiber as a 2D image (Zhang et al., 2020, 2019a); however, it is a sparse representation of fibers that requires an extra step to generate. In our work, we used point clouds to represent fibers. Point clouds are compact representations of the original fiber points and enable end-to-end learning of the neural network. In addition, point-based models are invariant to permutations of the input points and are thus insensitive to the ordering of points along fibers. By representing fibers as point clouds, we adopted point-based neural networks, which show good clustering performance as well as efficiency.
In this study, we proposed a simple but effective outlier removal strategy to filter anatomically implausible fibers and improve white matter fiber clustering performance. Our strategy is fast, as it simply rejects outlier fibers with low cluster assignment probabilities, without the added computational burden of fiber distance computations (Zhang et al., 2018c) or convex optimization (Daducci et al., 2015). However, our simple strategy can only remove fibers that do not correspond well to a cluster. We expect that a combination of outlier removal methods may perform best for reducing the well-known impact of outliers on fiber tractography (Drakesmith et al., 2015).
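
The permutation invariance of point-based models comes from aggregating per-point features with a symmetric function. The sketch below shows the simplest version, a PointNet-style shared layer followed by max pooling; DGCNN layers add neighborhood (EdgeConv) features but likewise rely on symmetric aggregation such as max. The weights and helper names are arbitrary illustrative choices.

```python
def shared_layer(point, weights, bias):
    """Same linear map followed by ReLU, applied to every point independently."""
    return [max(0.0, sum(w * x for w, x in zip(row, point)) + b)
            for row, b in zip(weights, bias)]


def global_feature(points, weights, bias):
    """Max-pool per-point features over the set: a symmetric, order-free aggregation."""
    per_point = [shared_layer(p, weights, bias) for p in points]
    return [max(column) for column in zip(*per_point)]


weights = [[0.5, -1.0, 0.2], [1.0, 0.3, -0.4]]  # hypothetical 3 -> 2 feature layer
bias = [0.1, 0.0]
fiber = [(0.0, 0.0, 0.0), (1.0, 0.5, 0.0), (2.0, 1.5, 0.5)]
print(global_feature(fiber, weights, bias))
```

Calling `global_feature` on `fiber`, on `fiber[::-1]`, or on any reordering of its points returns the same vector, so the tracking direction of a streamline cannot change its representation.
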
Limitations and potential future directions of the current work are as follows. First, our proposed pipeline combines only two sources of information, i.e., white matter fiber geometry and gray matter anatomical parcellation, to achieve white matter fiber clustering. Incorporating additional sources of information, such as functional MRI, to obtain functionally meaningful clusters is worth investigating. Second, future work could investigate more advanced neural networks and other self-supervised learning strategies, such as contrastive learning (Chen et al., 2020), to potentially obtain better clustering results.

CONCLUSION
In this paper, we present a novel end-to-end unsupervised deep learning framework for white matter fiber clustering. We adopt a self-supervised learning strategy to enable joint deep embedding and cluster assignment. Our method addresses several key challenges in white matter fiber clustering, including computational efficiency, flipped point order along fibers, the combination of fiber geometric and anatomical information, the filtering of anatomically implausible fibers, and inter-subject correspondence of fiber clusters. Experimental results show that our proposed method achieves fast and effective white matter fiber clustering and demonstrates advantages over state-of-the-art algorithms in terms of both clustering performance and efficiency.

Data and code availability
The data used in this project is from three datasets, Human Connectome Project (HCP),

Declaration of competing interest
None.