CoverBLIP: accelerated and scalable iterative matched-filtering for Magnetic Resonance Fingerprint reconstruction

Current popular methods for Magnetic Resonance Fingerprint (MRF) recovery are bottlenecked by the heavy computations of a matched-filtering step due to the growing size and complexity of the fingerprint dictionaries in multi-parametric quantitative MRI applications. We address this shortcoming by arranging dictionary atoms in the form of cover tree structures and adopt the corresponding fast approximate nearest neighbour searches to accelerate matched-filtering. For datasets belonging to smooth low-dimensional manifolds cover trees offer search complexities logarithmic in terms of data population. With this motivation we propose an iterative reconstruction algorithm, named CoverBLIP, to address large-size MRF problems where the fingerprint dictionary i.e. discrete manifold of Bloch responses, encodes several intrinsic NMR parameters. We study different forms of convergence for this algorithm and we show that provided with a notion of embedding, the inexact and non-convex iterations of CoverBLIP linearly convergence toward a near-global solution with the same order of accuracy as using exact brute-force searches. Our further examinations on both synthetic and real-world datasets and using different sampling strategies, indicates between 2 to 3 orders of magnitude reduction in total search computations. Cover trees are robust against the curse-of-dimensionality and therefore CoverBLIP provides a notion of scalability -- a consistent gain in time-accuracy performance-- for searching high-dimensional atoms which may not be easily preprocessed (i.e. for dimensionality reduction) due to the increasing degrees of non-linearities appearing in the emerging multi-parametric MRF dictionaries.

1. Introduction. Quantitative Magnetic Resonance Imaging (Q-MRI) provides a powerful tool for measuring various intrinsic NMR properties of tissues such as the T 1, T 2 and T 2 * relaxation times, field inhomogeneity, diffusion and perfusion [70]. As opposed to mainstream qualitative assessments these absolute physical quantities can be used for tissue or pathology identification independent of the scanner or scanning sequences. Despite being the long-standing goal of the MRI community, current quantitative approaches are extremely time-inefficient and for this reason not clinically applicable. Standard Q-MRI approaches (e.g. [53,44,45,27]) acquire a large sequence of images in different times and use curve-fitting tools for parameter estimation in each voxel. This procedure runs separately to estimate each parameter. The long process of acquiring multiple fully sampled images brings serious limitation to the conventional Q-MRI approaches to apply within a reasonable time and with an acceptable signal-to-noise ratio (SNR) and resolution.
Recently Magnetic Resonance Fingerprinting (MRF) has emerged to address this shortcoming and significantly accelerate the acquisition time of Q-MRI e.g. within a couple of seconds [54]. Three key principles are behind this new paradigm: i) applying one excitation sequence (i.e. in one acquisition run) that simultaneously encodes many quantitative parameters of interest, ii) incorporating more complicated (and sometimes random) but shorter excitation patterns than those used in conventional Q-MRI schemes, and finally iii) significant under-sampling of the k-space data at each temporal frame. Since the seminal work of Ma et al. [54] for joint quantification of T 1, T 2 relaxation times and the off-resonance frequency B0, several follow-up studies have successfully extended the MRF framework to measure multitude of additional quantitative parameters such as T 2 * , perfusion, diffusion and microvascular properties [74,64,66,75,76,52]. The aggressively short acquisition times used in this framework, on the other hand, introduce several algorithmic challenges at the parameter estimation stage of the MRF reconstruction problem. Common approaches adopt a physical model to disambiguate the lack of sufficient spatio-temporal measurements in such a highly ill-posed inverse problem. However, for complicated excitation patterns used in the MRF acquisitions, the (temporal) magnetic responses encoding the quantitative information are no longer following an analytic (e.g. complex exponential) model and they rather require solving Bloch differential equations [48]. The MRF framework proposes to discretize the parameter space and exhaustively simulate a large dictionary of magnetic responses (fingerprints) for all combinations of the quantized NMR parameters. This dictionary is then used for matched-filtering in many model-based reconstruction routines (see e.g. [54,49,56,26,78,4]). As occurs to any multi-parametric manifold enumeration, the main drawback of such approach is the size of this dictionary which grows exponentially in terms of the number of parameters and their quantization resolution. This brings a serious (scalability) limitation to the current popular schemes to be applicable in the emerging multi-parametric MRF problems, as the computational complexity of exact matched-filtering using brute-force searches grows linearly with the dictionary size.
To address this shortcoming we propose an iterative reconstruction method with inexact updates dubbed as Cover BLoch response Iterative Projection (CoverBLIP). Our algorithm accelerates matched-filtering steps by replacing iterative brute-force searches with fast Approximate Nearest Neighbour Searches (ANNS) based on cover tree structures constructed off-line for a given MRF dictionary. For datasets living on smooth manifolds with low intrinsic dimension (e.g. a constant number of NMR characteristics) cover tree approximate searches are shown to have logarithmic complexity in terms of data populations [12]. Under an embedding assumption similar to the restricted isometry property in compressed sensing theory [29,18,8,7,17,58,13,38], we show that CoverBLIP iterations are able to correct the inexact updates and achieve a linear global convergence i.e. stable signal recovery. We also introduce an adaptive step-size scheme that guarantees (local) monotone convergence of CoverBLIP in general cases e.g. when the embedding assumption does not hold. The results provided in this part apply beyond the customized MRF problem considered in this paper. We examine the reconstruction time-accuracy of the proposed method on both synthetic and in-vivo MRF datasets with different excitation sequences and k-space sampling patterns. Our experimental results indicate superiority of CoverBLIP compared to other tested baselines. Notably, CoverBLIP achieves 2-3 orders of magnitude acceleration in conducting matched-filtering while maintaining a similar accuracy as compared to using exact iterations with brute-force searches. Unlike non-scalable fast search algorithms such as KDtrees, we show that CoverBLIP maintains this superior performance when no dimensionalityreduction preprocessing is used. This feature of robustness against the high-dimensionality of search spaces makes CoverBLIP a well-suited candidate to tackle multi-parametric MRF applications with increased non-linear dynamic complexity, where applying common subspace compression preprocessing becomes prohibitive for their unfavourable compromise in the final estimation accuracy.
The rest of this paper is organized as follows: in Section 2 we briefly review current popular methods for MRF reconstruction. Section 3 formulates the MRF inverse problem and discuss an iterative model-based reconstruction framework for solving this problem. In Section 4 we provide an introduction to cover trees and their corresponding approximate fast search algorithm. We highlight a few complexity results useful for this work and finally we present our proposed CoverBLIP algorithm for accelerated MRF reconstruction. Section 5 is dedicated to the proof of convergence and stability of CoverBLIP. In Section 6 we present our experimental results comparing the performance of CoverBLIP against other baselines, and finally in Section 7 we conclude this paper and discuss possible future directions.
2. Related works. In this part we review a few algorithmic solutions proposed for solving the MRF problem. In their original paper Ma et al. [54] proposed a non-iterative reconstruction scheme which consists of Fourier back-projections for all temporal slices followed by a dictionary matching step where each high-dimensional temporal voxel is compared against the atoms of a MRF dictionary. The NMR parameters corresponding to the fingerprint with highest correlation is reported for that voxel. After Fourier back-projections (of the undersampled k-space data), one obtains highly corrupted images suffering from aliasing artefacts. The main drawback of the non-iterative Template Matching (TM) approach is to ignore this specific (aliasing) noise structure which as shown in [26] can lead to poor estimation accuracies for short acquisition sequences. On the other hand researches in the area of compressed sensing have demonstrated the efficiency of Iterative Projected Gradient (IPG) schemes for solving model-based inverse problems [25,9,14]. Adopted from compressed sensing literature Davies et al. [26] proposed an IPG type algorithm for the MRF problem and show that iterations (which require repeated application of the TM at each iteration for projection) indeed improve parameter estimation in low data/SNR regimes.
The large size of the MRF dictionary is however a big challenge for the runtime of matchedfiltering step(s) based on brute-force searches in both approaches. This issue has been addressed in two ways. First, it has been proposed to reduce the (temporal) ambient search dimension using a few SVD bases of the MRF dictionary [56]. The main drawback of this approach is that the MRF dictionaries contain highly non-linear structures (a low-dimensional manifold of solutions of the Bloch equations) and therefore applying a linear subspace compression trades-off the computation time against the final accuracy of the reconstructed parameters. For instance, the Steady State Precession (FISP) sequence [49] which encodes two NMR parameters (i.e. T 1, T 2 relaxation times) required 20 principal components for representing the corresponding search space to a reasonable accuracy, whereas an Inversion Recovery Balanced SSFP (IR-BSSFP) dictionary encoding an additional parameter (i.e. off-resonance frequency B0) cannot be accurately compressed with less than 200 components [56]. One can imagine with the rise in applications encoding a larger number of parameters associated with the non-linear dynamics such as T 2 * , perfusion, diffusion and microvascular properties, etc [74,64,66,75,76,52] this issue will get worse i.e. an exponential growth in the dictionary size without a good low-dimensional subspace representation.
A second approach incorporates hierarchical clustering for implementing fast searches over the dictionary [19] however it suffers from the limited accuracy of using a single step (non-iterative) matched-filtering. KD-tree searches has been proposed to accelerate matchedfiltering steps within an iterative reconstruction scheme [21]. However KD-trees are known to be non-scalable and crucially dependent on a dimensionality-reduction preprocessing step. This preprocessing might be expensive in general cases but more particular to the MRF problem, the use of the SVD subspace compression scheme (as proposed in [21]) introduces an unfavourable compromise between the accuracy and gain in acceleration, as discussed above. Besides, the suggested inexact updates -based on a small fixed number of arithmetic operations (referred to as KD-tree checks), but unbounded search precision, per iteration of IPG -are heuristic and not known to provide a convergence guarantee (see discussions in e.g. [38] for approximation-tolerant iterations).
We propose an accelerated model-based reconstruction algorithm, CoverBLIP, with consistent performance in high-dimensional search spaces. Cover tree structures provide an important feature of robustness against the curse-of-dimensionality [51,12]. For low (intrinsic) dimensional manifold data they have provable sub-linear search complexities and in addition we show that using such approximations within an iterative scheme can still result in monotone convergence (in general) and stable global reconstruction, under an embedding assumption. While we experimentally see that the KD-tree and cover tree searches are comparably efficient in small dimensions, the KD-tree performance scales considerably less effective to larger ambient dimensions compared to the achievable time-accuracy of the cover tree search.
Finally, we would like to mention a number of schemes which propose to incorporate additional low rank priors motivated by the high spatio-temporal correlations of the MRF data [77,78,55,28,4]. Most of these schemes are however validated on sequences encoding two (or three in [55]) NMR parameters. As discussed above a linear subspace i.e. low rank model will not scale to multi-parametric MRF problems with increased degree of non-linearities. Moreover schemes based on singular values thresholding [77,55,28] require intensive data factorization computations at each iteration. Zhao et al. [78] proposed to reconstruct images in the pre-calculated SVD subspace of the MRF dictionary and cascade the results to a TM step for parameter estimation. CoverBLIP with a temporal compression option does the same low rank subspace reconstruction however with the benefit of faster approximate searches whose total (iterative) cost is less than a single iterated brute-force search in TM (see e.g. Figure 4). More recently a deep learning approach has been adopted for the MRF problem (see e.g. [22,72,35]), the crux of which is to approximate the match-filtering step by a compact neural network during reconstruction. The MRF dictionary is only used for training the network and not in reconstruction. In our numerical comparisons we exclude these approaches and focus on purely dictionary-based reconstruction baselines.
3. MRF imaging model. MRF acquisitions follow a linear spatio-temporal model: where Y ∈ C cm×L is the k-space measurements collected by c coils at t = 1, . . . , L temporal frames and corrupted by some noise ξ. The MRF image (to be recovered) is represented by a complex-valued matrix X of spatio-temporal resolution n × L i.e. n spatial voxels and L temporal frames 1 . The multi-coil sensitivity operator S : C n×L → C cn×L maps each temporal frame of X to c weighted copies according to the sensitivity maps of c head-coils used in a scanner. The sensitivity maps are identical for all temporal frames and are calculated off-line either through a separate calibration process or directly from the MRF measurements [73]. Throughout whenever we consider a single coil setup c = 1, we assume S to be an identity operator (i.e. S(X) = X) and thus the true sensitivities are absorbed by X. Moreover, F corresponds to a Fourier operator that maps spatial images (at each temporal frame and for each coil) to the corresponding k-space measurements. This operator might correspond to the FFT transform if a Cartesian grid is used for k-space sampling e.g. in [10,64], or it might correspond to a Non-Uniform Fourier (NUFFT) transform [31] for non-Cartesian sampling patterns such as the variable density spirals used in [54,49]. Finally, P Ω : C cn×L → C cm×L is the sub-sampling operator with respect to a set of temporally-varying patterns Ω = L t=1 Ω t , where Ω t stores m < n k-space locations to be sampled at the time frame t. This pattern is identical for all coils at that given time frame.
The linear system (1) is under-determined due to lack of sufficient measurements (i.e. m < n) which means without further assumptions it admits infinitely many solutions and therefore, in order to hope for a stable MRF reconstruction one needs to incorporate efficient and restrictive priors for this type of images.
3.1. Bloch dynamic model. The main source of measurements in Q-MRI are the per-voxel net magnetization of proton dipoles obtained from dynamic rotations of the external magnetic field induced by a radio frequency (RF) coil. These excitations are in the form of a sequence of Flip Angles (FA) {α t } L t=1 applied at certain time intervals known as the repetition times (TR) which could be a constant or varying across different time-frames t = 1, . . . , L. Tissues with different NMR characteristics respond distinctively to these excitations. A qualitative MRI approach studies the contrasts between different tissues in a single time frame which is often-times dependent on the sequence type and the scanner. A Q-MRI approach rather fits a physical model to all spatio-temporal measurements and obtains the absolute NMR characteristics of the underlying tissues, however, at the cost of significantly longer acquisition times. Standard Q-MRI approaches such as DESPOT run separate sequences to measure one parameter at a time [53,44,45,27]. They use parameter-specific sequences that usually result in analytical time-trajectories such as 1 − 2 exp − tT R T 1 or exp − tT R T 2 to be fitted and recover the underlying parameter (here T 1 or T 2) per voxel. The long process of acquiring separately multiple fully sampled images brings serious limitations to standard Q-MRI approaches to apply within a reasonable time and with an acceptable SNR and resolution.
The MRF framework relies on a similar principle, however, it adopts more complicated and sometimes random excitation patterns that are able to simultaneously encode different NMR parameters and produce more distinctive dynamic signatures in shorter acquisition times. The resulting temporal trajectories no longer follow simple analytic e.g. exponential forms and they require methods for approximating the solutions of the Bloch differential equations which capture the overall macroscopic dynamics of per-voxel magnetizations [48]. We denote by B(Θ; T R, T E, α) ∈ C L the discrete-time Bloch response of a molecular structure with a set of intrinsic NMR parameters Θ to a specific excitation sequence of length L with a given FA pattern α, repetition TR and read-out TE times. The real and imaginary parts of B correspond to the amount of magnetizations across two transverse-plane components perpendicular to the external static magnetic field. For instance the IR-BSSFP sequence originally proposed for the MRF framework produces distinct magnetic responses for three parameters Θ = {T 1, T 2, B0} i.e. two relaxation times T 1, T 2 and the off-resonance frequency B 0 . Recent emerging MRF applications are designing sequences encoding a larger number of NMR characteristics such as T 2 * , diffusion, perfusion and vascular properties (see e.g. [74,64,66,75,76,52] and d = Card(Θ) is the total number of generated fingerprints (atoms). Under the voxel purity assumption each spatial voxel of the MRF image corresponds to a specific tissue with a unique NMR parameter and would approximately match to a temporal trajectory in the fingerprint dictionary. 2 By incorporating a notion of signal intensity in this model the rows of the MRF image belong to a cone associated with the fingerprints (2). Denoting by X v the v-th row of X i.e., a multi-dimensional spatial voxel, we have where the discrete cone of fingerprints is defined as follows: Here γ corresponds to the proton density which is generally non-uniformly distributed across spatial voxels.
3.2. Model-based MRF reconstruction. An important source of acceleration in the MRF acquisition process comes from the significant amount of k-space under-sampling. As a result one has to deal with solving a highly ill-posed problem to disambiguate the lack of sufficient measurements. The discrete Bloch model in (3) plays a critical role in regularizing the inverse problem (1) and enabling stable MRF image reconstruction and parameter estimation. Following the model-based compressed sensing approaches such as [7,8,65,38], the reconstruction problem can be cast as minimizing the measurement discrepancy -though the forward model (1)-constrained by the per-voxel Bloch cone model: The recovered image sequence (solution) at each spatial voxel corresponds to a fingerprint representing uniquely the underlying NMR characterizations. As appeared in compressed sensing literature [30,39,40,63,34,42], it might be natural to think of incorporating additional priors to promote certain spatial regularities and/or low-rank structures (i.e. accounting for the correlations between neighbouring voxels or image patches) in order to improve reconstruction, see e.g. [26,77,21] in the MRF context. However care must be taken here, since solving a multi-constrained problem combined with the non-convex fingerprints cone (3) is often intractable and therefore despite possible empirical improvements -perhaps under good initializations-the results are likely to lack global convergence guarantees. In this paper we focus on problem (5) constrained by the cone of fingerprints. A popular approach for solving compressed sensing problems is the Iterative Projected Gradient (IPG) algorithm [25,9,14]. IPG is a first-order algorithm suitable for big data applications and importantly it can also apply to globally solve problems with certain non-convex constraints [14,7,65]. Davies et al. [26] adopted this routine for the MRF reconstruction problem and named it Bloch Response Iterative Projection (BLIP). The BLIP algorithm iterates between a gradient descent update and a (voxel-wise) model projection step: where A(.) := P Ω F S(.) is the shorthand we use for the forward operator, A H := S H F H P H Ω (.) is the adjoint operator, {µ k } is the sequence of step-sizes and P C (.) is the Euclidean projection operator onto the set C i.e.
Note that throughout we use the shorthand ||.|| to refer to the Euclidean norm i.e. the 2 norm of a vector or the Frobenius norm a matrix. For the MRF problem and the constraint set C defined by (3) this projection is also called matched-filtering. After the gradient update decouples into separate cone projections for each spatial voxel v = 1, . . . n and is computed as follows: The non-iterative TM approach originally proposed in [54] corresponds to the first iteration of BLIP with zero initialization 4 . However the iterative approach has shown to be more robust against shorter excitation sequences and acquisition times, where the atoms of the fingerprint dictionary become more coherent and difficult to be distinguished [26].
3.3. Dimension-reduced subspace matched-filtering. The BLIP algorithm breaks down the computations involved in solving the MRF problem (5) into two local updates namely, the gradient and projection steps for which an exact matched-filtering step e.g. by using brute-force nearest neighbour searches, has the complexity O(nLd) in computation time. Discretization of the multi-parameter space often results in very large size MRF dictionaries where the number of fingerprints d has an exponential relationship with the number of NMR characteristics and their quantization resolutions. Therefore, search strategies with linear complexity in d are a serious bottleneck to the exact matched-filtering steps at the heart of model-based approaches for solving (5).
Current proposed solutions for the high dimensionality of the MRF problem rely on a (low rank) subspace compression step to reduce the matching computations [56,21,4]. Let V ∈ C L×L be the eigen-basis spanning the space of the fingerprint dictionary through the singular value decomposition (SVD) i.e. d j=1 D j (D j ) H = V ΣV H , and V s ∈ C L×s denotes the matrix of s-dominant eigenvectors. By assuming high (linear) correlations between fingerprints, there exists a reasonably small number s L for which one would have are the low-dimensional proxies for the original fingerprint dictionary. With this assumption one can solve the following problem instead of (5) in lower dimensions: Note that if D is low-rank and fully spanned by V s then D = V s D, cone(D) = V s cone( D) and by a change of variable we have X = XV H , and therefore both problems (5) and (10) become equivalent. Following the IPG routine for solving this problem, the gradient updates read and the corresponding searches are performed in the compressed temporal domain, directly reducing the complexity of pairwise distance calculations. Such a compression scheme can also reduce the gradient step computations. One can write The last line follows from expanding A and it holds since both the multi-slice Fourier transform F and the coil sensitivity operator S act identically across all time-frames and thus they commute with the temporal compression operators V, V H . As a result, the main computations for conducting the gradient updates (12) i.e., the middle term, comes from the forward-backward Fourier operations across a smaller number s < L of (compressed) temporal frames plus the cost of applying compression-decompression operations V, V H . Depending on how well a low rank model can approximate the dictionary i.e. how small would s be, the overall gradient computations can drop by using such subspace compression, particularly when F corresponds to expensive NUFFT transforms in non-Cartesian acquisition schemes. We empirically observe that V, V H operations would not bring a major overhead in total computations.
The idea of using subspace compressions has been applied to accelerate the brute-force searches in the single-stage TM method where the complexity of searches in compressed domain decreases to O(nsd) [56]. It has been also proposed to use subspace compressions within an iterative algorithm to boost the performance of fast but non-scalable searches based on KD-trees [21]. The applicability of this approach is totally reliant on such a compression preprocessing since it is well understood that KD-trees are inefficient in high-dimensional (ambient) search spaces. Beside these advantages, we would like to remind the reader about our discussion in sections 2 (see also our numerical experiments in Section 6), that methodologies purely relying on subspace dimensionality-reduction are prone to an unfavourable compromise in their estimation accuracies when applied to multi-parametric MRF dictionaries with increased non-linear complexities and growth in data population.
4. Accelerated MRF reconstruction with scalable tree searches. Accelerating the Nearest Neighbour Search (NNS) is a fundamental problem in computer science and it has a long historical literature. Successful proposed approaches are based on building tree structures which hierarchically partition large datasets and then use branch-and-bound algorithms for fast NNS (see e.g. [32,11,33,41,59,57,12,60]). KD-trees -which are the multi-dimensional generalization of binary searches -are perhaps the most widely-known classical structure for fast searches [11]. They consist of partitioning datasets across ambient coordinate axes and therefore do not efficiently adapt to complicated low-dimensional structures of datasets embedded into high (ambient) dimensions. A dimensionality reduction step is inevitably necessary when using KD-trees since they are non-scalable and their search complexity rapidly grows in high-dimensional problems [46]. Modern search algorithms circumvent the curse-ofdimensionality by using i) tree structures that could efficiently benefit from the low intrinsic dimensionality of natural datasets, which is a key assumption in machine learning, and ii) low-complexity algorithms for performing the search approximately i.e. Approximate Nearest Neighbour Search (ANNS).
In the following we briefly introduce a recent data structure known as a Cover tree [12] and highlight certain key properties making this structure ideal for accelerated and scalable searches within iterative MRF reconstruction. Notably for datasets with low intrinsic dimensions cover trees can achieve a logarithmic search complexity in terms of data population without needing an explicit a-priori knowledge of the data structure nor a dimensionality reduction preprocessing.

Cover trees.
A cover tree is a levelled tree whose nodes are associated with points in a dataset D = {D j } d j=1 and at different scales they form covering nets for data at multiple resolutions [12,51]. Denote by S i ⊆ D the set of nodes appearing at scale i = 1, . . . , i max , S 0 = {D j 0 } the tree's root and by σ := max D j ∈D ||D j 0 − D j || the maximal tree coverage from the root. A cover tree structure must satisfy the following three properties: 1. Nesting: S i ⊆ S i+1 , once a point p appears as a node in S i , then every lower level in the tree has that node. 2. Covering: every node q ∈ S i+1 has a parent node p ∈ S i , where ||p − q|| σ2 −i . As a result, covering becomes finer at higher scales in a dyadic fashion. 3. Separation: nodes belonging to the same scale are separated by a minimal distance which dyadically shrinks at higher scales i.e. ∀q, q ∈ S i we have ||q − q || > σ2 −i . Depth of the implicit cover tree constructed with respect to the constraints above might grow very large for arbitrary datasets. Indeed we can easily verify that i max log(∆(D)), where is the aspect ratio of D. In practice we however only keep one copy of the nodes which do not have either parent or a child other than themselves. This explicit representation efficiently reduces the required storage space to scale O(d) linearly with data population, regardless of any (intrinsic) dimensionality assumption [12]. As suggested in [47], each node q could optionally save the maximum distance to its descendants denoted by which provides a useful information for further acceleration of the branch-and-bound algorithm used for the search step. Note that any node q ∈ S i appearing at scale i satisfies as a result of the covering property and therefore, one might avoid saving maxdist(.) values and use this upper bound instead.
Definition 1. Given a dataset D, a query point p (which might not belong to D) and ε 0, then a point q ∈ D from dataset is a (1 + ε)-approximate nearest neighbour of p if it holds: Algorithm 1 details the branch-and-bound procedure for (1 + ε)-ANNS for a given cover tree structure. The proof of correctness of this algorithm is available in [12]. In short, we iteratively traverse down the cover tree and at each scale we populate the set of candidates Q i with nodes in S i which could be the ancestors of the nearest neighbour solution and discard others. This refinement uses the triangular inequality and a lower bound on the distance between the grandchildren of Q to the query p which is calculated based on maxdist(q), ∀q ∈ Q. Note Algorithm 1 Cover tree's (1 + ε)-ANNS (p, T , q c ) approximate search [12] 1: Inputs: query point p, cover tree structure T for dataset D, current estimate q c ∈ D, search inaccuracy ε 0 2: that the maxdist information is either previously stored during construction of the tree or is bounded by (13). Violating the refinement criteria at line 13 in Algorithm 1 implies that ∀q ∈ descendant(q) we would have and therefore, q cannot be an ancestor of the nearest neighbour point -because the current estimate q c would anyway provide a smaller distance to the query. At the finest scale (before stopping) we search the whole set of final candidates and report a (1 + ε)-ANNS point. Note that at each scale we only compute distances for non self-parent nodes i.e. we pass without any computation distance information of the self-parent children to finer scales.
The case ε = 0 refers to the exact tree NNS where one has to continue Algorithm 1 until the finest level of the tree. One should distinguish between this strategy and performing a brute-force search. Although they both perform an exact NNS, the complexity of Algorithm 1 is empirically shown to be way less in practical datasets. Noteworthy, although in this paper we focus on the Euclidean distance metric, cover trees are flexible to use a general notion of distance with respect to other metric spaces.

4.2.
Complexity of the cover tree search. In the construction stage, a cover tree does not need to explicitly know the low-dimensional structure of data. However through building multi-resolution nets, several key growth properties such as the tree's explicit depth, the number of children per node, and importantly the overall search complexity are characterized by the intrinsic dimension of data [51], a notion that is referred to as the doubling dimension  Table 1: Complexity of cover trees in terms of dataset population d regarding the construction time and memory, online insertion and removal operations, and approximate query time when the dataset is doubling and when no assumption on its doubling dimensionality is made (i.e worst-case complexity) [12].
The doubling dimension has several appealing properties, for instance we have [43,51]: Following these properties, a spare point has zero dimension and a discrete set D of d unstructured points has dim D (D) = O(log(d)), independent from the ambient dimension. This dimension could be further decreased by assuming certain regular structures in practical datasets e.g. they could belong to low-dimensional manifolds (embedded in higher ambient dimensions). It has been shown that a low-dimensional manifold M ∈ R n with certain smoothness and regularity assumptions has dim D (M) = O(K) where K L denotes the topological dimension of the manifold [24]. The following theorem [51,12] bounds the complexity of cover trees approximate searches using the (1 + ε)-ANNS Algorithm 1: Theorem 1. Given a query which might not belong to dataset D, cover tree's (1 + ε)-ANNS approximate search takes at most computations in time.
In most applications e.g. uniformly distributed datasets we have log(∆) = O(log(d)) [51]. Therefore for datasets with low dimensional structures i.e. dim D = O(1) and by using approximations one achieves search complexities logarithmic in data population d, as opposed to the linear complexity of a brute-force search. Table 1 outlines the construction time, memory requirement and the search time complexities of cover trees.
There is a great motivation behind using cover trees searches for the MRF reconstruction problem. The manifold M := B(Θ) ∈ C L corresponding to the solutions of Bloch equations is parametrized by a small number Card(Θ) L of parameters; an observation which implies the resulting fingerprint dictionary will have a low-dimensional intrinsic structure. We are Algorithm 2 CoverBLIP(Y, T (D), A, µ) 1: Inputs: k-space measurements Y , cover tree structure T (D) constructed for the normalized fingerprint dictionary D, forward operator A := P Ω F S and its corresponding adjoint operator A H , initial step-size µ. 2: Initialization: k = 0, X 0 = 0, µ k = µ ∀k = 1, 2, . . . 3: while stopping criterion = false do 4: for v = 1, . . . , n do #(per-voxel approximate model projection) 6: ||A(X k+1 −X k )|| 2 then #(adaptive step-size shrinkage) 10: ,v , ∀v 14: return reconstructed MRF image X k+1 , parameter maps Θ, proton density γ currently unable to provide a complete support for claiming that the Bloch response manifolds satisfy the regularity assumptions in [24] and that their doubling dimensions would be a constant i.e. dim D (B(Θ)) = O(Card(Θ)) = O(1) 5 , however, we keep this motivation to adopt cover trees searches to short-cut the heavy computations of the matched-filtering step in the MRF reconstruction. Our numerical experiments in Section 6 demonstrate that by using this structure one indeed achieves significant accelerations and great scalability i.e. a consistent performance in high ambient dimensions unrestricted by the use of a dimensionality reduction preprocessing step.

CoverBLIP algorithm.
Approximation plays a key role in accelerating the nearest neighbour searches and breaking the curse-of-dimensionality [46]. Motivated by the lowdimensional (manifold) structures present in the MRF dictionary, we propose to accelerate iterative matched-filtering steps within the BLIP algorithm by using cover tree's (1 + ε)-ANNS approximate searches. Algorithm 2 outlines the proposed Cover tree BLoch Iterative Projection (CoverBLIP) procedure for accelerated MRF reconstruction. We replace the exact NNS step (8) in the cone projection with the following approximation: which uses Algorithm 1 for a given inaccuracy level ε 0. We denote by T (D) the cover tree structure built for the normalized fingerprint dictionary. At each iteration CoverBLIP uses previously selected fingerprints (i.e. D j * k = X k v /γ v for each voxel) to initialize the ANNS searches. This has two positive impacts: i) the search achieves further acceleration especially, close to the converging point of the algorithm, because with an initialization close to the ANNS solution the branch-and-bound procedure can effectively rule out many branches at higher levels of the tree and thus keep the candidates set very small, and ii) the (1 + ε)-ANNS algorithm would produce non-expansive outputs i.e. ∀v we have which as will be discussed in the next part it is a key property to guarantee the monotone convergence of CoverBLIP. Note that we feed the search algorithm with the normalized gradient updates Z v /||Z v ||. Since dictionary atoms are normalized the search outcome is invariant with respect to the query rescaling, however from the complexity perspective one would gain in computation time by searching a query within a closer range to datasets' hypersphere. We also observed in our experiments that this trick leads to better accelerations.
Convergence is tied to a proper choice of the step-size sequence. We follow the adaptive scheme proposed in [15] which starts from a large initial step size and shrinks this choice by a division factor ζ > 1 e.g., half of the previous step size by setting ζ = 2 , until meeting the following criteria at each iteration k: where again, A(.) := P Ω F S(.) is the shorthand for the forward operator. This condition is another important ingredient to guarantee the convergence of CoverBLIP iterations, which supported by some extra assumptions will also imply a robust reconstruction i.e. near global convergence. We will discuss this point in further details in the next section. After the first iteration we can also use the following energy ratio between measurements and our first estimation i.e. κ = ||Y ||/||A(X 1 )|| in order to rescale the first iteration X 1 ← κX 1 and set an appropriate range (e.g. large enough) for the initial step size µ ← κµ.
When applicable -and with a possible compromise in the accuracy -a temporal subspace compression similar as explained in Section 3.3 can be optionally included to further shrink dimensions of Z v , X v , D j across the dominant SVD components V s ∈ C L×s of the MRF dictionary. In this case one has to build a cover tree structure for the normalized dimensionreduced dictionary D, update the gradient step in Algorithm 2 by the expression (11), and for the step-size expression (21) would change to (22) µ The updated gradient step might also introduce a compromise between cheaper distance evaluations during the search steps (i.e. in C s rather than C L ) and a computation overhead due to applying iteratively compression and decompression, as previously highlighted in Section 3.3. The approximate projection step presented in Algorithm 2 (i.e. lines 6 and 7) assumes that proton densities are real and positive valued quantities. A phase-alignment heuristic similar to [21] can be used to extend this framework to complex-valued proton densities. This approach approximates dictionary atoms with fingerprints having constant complex phases across temporal domain. Complex angles corresponding to the first principal component i.e. D s=1 = V H s=1 D, are then used to align dictionary atoms. Similarly, at each iteration in line 6 we align phases of the gradient update used for the search step; In our experiments we use the complex angles of the dominant compressed image i.e. angle( X s=1 ) for temporal phasealignment. Empirical results applying this approach are demonstrated for our volunteer data experiments in Section 6.2.

5.
Convergence of CoverBLIP. The analysis in this part covers the behaviour of a wide class of inexact IPG algorithms for solving linear inverse problems where the forward operator A and the set C of signal model could be regarded in general forms and not necessary customized for the MRF recovery problem.
A previous work [38] studied the stability of the inexact IPG algorithms with respect to several forms of approximations on gradient and projection updates. Here we focus on iterative algorithms that use the following notion of relative approximate projection step i.e. for an ε 0 we define Example 1. Following Definition 1, the (1 + ε)-ANNS search algorithm is an approximate projection of type (23) onto a discrete set of points C := D in a dataset e.g. a signal model which is used for data-driven inverse problems [38].
Example 2. Notably for projection onto C := cone(D), if we replace the exact search step in (8) with an approximate (1 + ε)-ANNS search, we obtain an approximate cone projection P ε cone(D) (.) satisfying definition (23). Steps 6 to 8 in CoverBLIP Algorithm 2 are indeed implementing such an approximate projection onto the cone associated with the MR fingerprints using fast cover tree searches.
The corresponding inexact IPG iterations, including the CoverBLIP algorithm as a particular case, are as follows: We now follow this section by discussing two types of guarantees for the inexact IPG. The first type makes an embedding assumption on (A, C) and provides a robust signal recovery result which in turn implies an interesting near global convergence guarantee for arbitrary signal models C including the non-convex conic constraints in the MRF problem. The second form of our analysis does not make an embedding assumption and only relies on an adaptive step-size scheme to ensure criteria (21) holds and guarantees local convergence of the inexact IPG algorithm.
The following embedding assumption plays a critical role in our stable signal recovery result [8,13,38]: Definition 3. A forward operator A is bi-Lipschitz with respect to a set C, if ∀x, x ∈ C there exists constants 0 < α β such that (25) α||x Equipped with this notion the following result states that when we have a good measurement consistency i.e. when min X∈C ||Y − A(X)|| is small, then a near global convergence could be achieved using inexact iterations [38,36]: Theorem 2. Assume (A, C) is bi-Lipschitz and that for a given ε 0 and some constant δ ∈ [0, 1) it holds ε + ε 2 δ √ α/|||A||| and β < (2 − 2δ + δ 2 )α, where |||A||| denotes the spectral norm of A. Set the step size µ k = µ, ∀k such that The sequence generated by Algorithm (24) obeys the following bound: Remark 1. Theorem 2 guarantees a linear convergence behaviour for inexact iterations. As a result after a finite K = O(log(τ −1 )) number of iterations Algorithm (24) achieves the solution accuracy ||X K − X 0 || = O(w) + τ for any τ > 0.
Remark 2. Under a properly conditioned bi-Lipschitz embedding as assumed in Theorem 2 the inexact algorithm achieves a solution accuracy comparable to that of the exact IPG algorithm. By increasing ε > 0 we require better embedding conditions as compared to the exact iterations (i.e. the case ε, δ = 0). Although, increasing ε slows down the rate ρ of linear convergence, it could facilitate significantly cheaper computations per iteration. In other words, approximation trades-off against the embedding conditions, rate of convergence and computation time, but not against the order of the solution accuracy.
The following proposition (see the proof in Section A.1 of the supplementary materials) says that by using the adaptive shrinkage scheme described in the previous part we can find a good step size in a finite (logarithmic) number of sub-iterations: Proposition 1. Following the iterative step-size shrinkage scheme with the initial size µ and division factor ζ > 1, the chosen step size µ k , ∀k meets the criteria (21) and satisfies the following bound: in a finite number log ζ (βµ) + 1 of iterations.
Following the adaptive step-size scheme with shrinkage factor ζ, the sequence generated by Algorithm (24) obeys the error bound (26) where, The proof architecture is similar to the proof of [38, Theorem 2], however, this result does not a priori assume µ k 1/β as there or in Theorem 2 of this paper. For the sake of completeness we provide detailed proof of Theorem 3 in the supplementary materials Section A.2 Remark 3. Without an a-priori knowledge of the embedding constants, the inexact IPG algorithm with adaptive step-size exhibits a similar linear convergence behaviour towards the global minima as in Theorem 2. The closer ζ is chosen to one, the embedding condition and the rate of convergence become more comparable to Theorem 2, however at the increased cost of more shrinkage sub-iterations.
Remark 4. Theorem 2 generalizes results in [15] in two ways: i) the set C of constraints are general and not restricted to sparse signals, and ii) results here establish robustness against inexact projection updates. Notably when no approximation is used ε, δ = 0, Theorem 2 relaxes the embedding conditions in [15, Theorem 3] which required ζβ < 8/7α. Finally, we consider a general convergence result which holds even in the absence of the bi-Lipschitz embedding assumption. We additionally assume that the approximate projection produces non-expansive updates with respect to the previous iterations i.e. ∀k and gradient updates Z k := X k − µ k A H A(X k ) − Y it holds : Example 3. The (1 + ε)-ANNS update (19) in CoverBLIP and the associated approximate cone projection satisfy the non-expansiveness property (28), thanks to initializing the search algorithm with previous iteration. 6 The following result guarantees monotone convergence of Algorithm (24) since the cost function ||Y − A(X)|| 0 is lower bounded. The proof is provided in the supplementary materials Section A.3: Theorem 4. Assume the approximate projections are non-expansive and the step-size satisfies (21). Algorithm (24) produces a non-increasing and convergent sequence ||Y − A(X k )||.
Algorithm Description BLIP Iterative reconstruction (6) using exact brute-force searches for the matched-filtering [26] Template Matching Non iterative matched-filtering reconstruction using exact brute-force searches [54] (TM) (i.e. the first iteration of BLIP)

KDBLIP
Iterative reconstruction similar to [21] using KD-tree's ANNS for the matched filtering (Approximation level is controlled by the number of checks which specifies the maximum leafs to visit during the search. Higher checks values give better search precision, but also take more time)

CoverBLIP
Iterative reconstruction Algorithm 2 using cover tree's (1 + ε)-ANNS for the matched filtering (Approximation level is controlled by ε 0 which bounds the search precision according to Definition 1. Smaller ε would give better search precision, but also take more time). Note that determining the bi-Lipschitz conditioning i.e. constants α and β and hence an admissible interval for choosing the step-size (as suggested in Theorem 2) is a combinatorial problem in general. For a certain class of random sampling schemes used in compressed sensing theory however it is possible to derive those constants with high probability, see e.g. [71,62,6]. Applied to the MRF problem, it has been shown in [26, Theorem 1] that if sampling patterns Ω t sub-select uniformly at random large enough number of rows (or columns) of the k-space -a sampling protocol referred to as the random Echo Planar Imaging (EPI) -then the resulting forward model A is bi-Lipschitz, and a fixed choice of step-size equal to the compression factor µ k = n/m, ∀k guarantees stable reconstruction. Randomized acquisition schemes are however not currently popular in practical MRF setups, leading to pronounce more the importance of theorems 3 and 4. In Theorem 3 one does not need to explicitly obtain the bi-Lipschitz constants, however if the forward model happens to satisfy a proper embedding condition then the adaptive step-size scheme is determined to make a proper choice (within the corresponding admissible interval) which guarantees near global convergence. Otherwise in the absence of any embedding assumption Theorem 4 ensures monotone convergence of the non-convex projected iterations i.e. the stability of the algorithm.
6. Numerical experiments. In this section we evaluate and compare the performance of the proposed CoverBLIP algorithm against dictionary-based MRF reconstruction baselines listed in Table 2. Experiments are conducted using MATLAB on a moderate desktop with 8 CPU-Cores and 32 GB RAM. For BLIP and TM algorithms the exact NNS is calculated using MATLAB's matrix product. KDBLIP iterations use randomized KD-tree searches implemented by the FLANN package [1]. Our CoverBLIP algorithm uses a parallel MATLAB interface to an existing implementation of the cover tree's (1 + ε)-ANNS in [2]. 7 We do not believe this implementation is as optimized as that of the FLANN package for KD-tree searches and thus any reconstruction time comparisons (if not unfair) must take this point into account.
Temporal subspace compression option. As discussed in Section 3.3 all considered methods here can use a temporal compression option where the corresponding subspaces are the s L  dominant SVD components of the fingerprint dictionary. This option has the advantage of reconstructing smaller objects i.e. MRF images, accelerated gradient updates i.e. forward and adjoint Fourier operations, and performing searches in low dimensional (ambient) space. The later is particularly beneficial for non-scalable search schemes such as KD-trees.
Datasets. Two sets of experiments are conducted: one using the synthetic Brainweb digital phantom with available Ground Truth (GT) maps [3], and the other using in-vivo scans of a healthy volunteer's brain which appeared in the original work of Ma et al. [54]. Both experiments use the Inversion Recovery (IR) Balanced SSFP acquisition sequence of length L = 1000, however with different flip angles and repetition times TR. The resulting temporal signals from the IR-BSSFP sequence encode three NMR parameters Θ = {T 1, T 2, B0} i.e. the relaxation times and the off-resonance frequency. For each experiment and given FA and TR patterns a fingerprint dictionary is created as in [54] by solving discrete-time Bloch equations for combinations of the NMR parameters.
Evaluation metrics. The normalized solution MSE (NMSE) is measured as where X 0 ,X are the ground truth and the reconstructed MRF images, respectively. The NMR parameter estimation accuracies are measured e.g. for the T 1 case, as follows: where v is the number of voxels within a masked region N defining the object of interest. The mask is obtained by contouring the output proton density (PD) map of the brain and to remove empty voxels where the quantitative values are undefined. Here T 1(v) represents the ground truth T 1 value for the v-th voxel andT 1(v) is the corresponding estimated value. Computational cost. To have a fair comparison between computational complexities of the considered methods -and independent from how optimally they are implemented -we measure total search costs in addition to the reconstruction times. The cost measures the total number of computed pairwise distances (multiplied by the search dimension i.e. either L or s L when using subspace compression) for performing the NNS or ANNS steps within (iterative) matched-filtering until the algorithm converges. For all iterative methods the maximum number of iterations is set to 50 and the algorithm is stopped earlier if the relative progress in minimizing the objective function of (5) (or (10) when using subspace compression) is less than 10 −6 .
6.1. Numerical Brainweb phantom with multi-shot EPI acquisition. In this part we compare the performance of different MRF recovery methods on a synthetic dataset. We use a 256 × 256 slice from the numerical Brainweb phantom with segments corresponding to the background and five different tissues each associated to a set of Θ = {T 1, T 2, B0}   Table 3: Comparisons between iterative methods with/without using temporal subspace compression in terms of reconstruction NMSE, parameter estimation accuracy, search cost/time, number of iterations and total reconstruction time.
parameters (Figure 1(a)). After an inversion pulse, a pseudo-random FA sequence 8 with fixed T R = 10 (msec) and T E = T R/2 is used for excitations. Temporal Bloch responses of the phantom segments are amplified by a given PD map (Figure 1(b)) and synthesize the whole ground truth MRF image  Figure 2 illustrates the FA pattern and the resulting dictionary fingerprints used in this experiment.   For k-space sampling we simulate a similar protocol to the recently proposed multi-shot Echo Planar Imaging (EPI) for MRF acquisition [10]. This protocol is based on a Cartesian grid Fourier sub-sampling where at each repetition time 16 out of 256 lines (with uniform spacing) from the k-space are simultaneously measured. In the next time frame the sixteenshot sampling pattern Ω t will be shifted by one line and so on. As a result we are dealing with reconstructing a 16x-fold undersampled data. We consider a single coil acquisition S(X) = X and white Gaussian noise of 50 dB SNR added to the k-space measurements. Figure 3 illustrates the ground truth MRF images X 0 and the highly aliased back-projected images (BPI) i.e. X = A H (Y ) using this sampling protocol.
6.1.1. Results. We report reconstruction times, total search (projection) costs, image reconstruction errors and parameter estimation accuracies in Figure 4 and Table 5. We also show the reconstructed parameter maps in Figure 5. For the KDBLIP algorithm we vary the KD-tree's search accuracy level by choosing checks = {1024, 256, 64, 16}. For the CoverBLIP algorithm we also test different (1 + ε)-ANNS search approximations by choosing ε = {0.1, 0.2, 0.4, 0.8, 1.6}. We initialize the step-size of the iterative schemes by the compression factor µ = n/m which empirically turns out to satisfy criteria (21) in most iterations and requires one shrinkage sub-iteration for the rest (see discussions in Section 5).
Temporal SVD compression (s = 20) accelerates the runtimes of all tested methods within 1-2 orders of magnitude (Table 3), however such an aggressive compression leads to poor parameter reconstructions (Figure 5(a)). Focusing on the non-compressed regime, we can see that Template Matching (TM) cannot achieve a good accuracy compared to the iterative methods ( Figures 4 and 5(b)). The BLIP algorithm addresses this issue however at a high computational cost of iterating exact brute-force searches. Note that since the multi-shot EPI acquisition uses a Cartesian sampling, F in the forward model (5) corresponds to a FFT operator with fast gradient updates. As a result, and as can be observed in Table 3, projections (i.e. searches) dominate the runtimes of the iterative methods and thus accelerating this step would directly improve the total reconstruction time. CoverBLIP does so by using inexact cover tree searches (e.g. ε = 0.4) and achieves the best reconstruction time-accuracy (also search cost-accuracy) in all cases. Remarkably, CoverBLIP reports a similar accuracy to BLIP iterations however with 3 orders of magnitude less search cost and 27x-fold acceleration in the reconstruction time. Notably, the total cost of CoverBLIP inexact searches does not exceed that of a single stage brute-force search in TM (Figure 4). When using temporal compression-a favourable case for the KD-tree searches-KDBLIP with number of checks = 256 performs comparable to the CoverBLIP algorithm. However, for improving the overall estimation accuracy if we wish to not use subspace compression, then KDBLIP's time-accuracy performance fails to catch up with that of CoverBLIP. Figure 4 shows the gap between performances of these two algorithms caused by the non-scalability of the KD-tree searches. For instance CoverBLIP with ε = 0.4 outputs more accurate parameter maps ( Figure 5(b)) whilst reporting 6x less total search cost.
6.2. In-vivo data with variable-density spiral acquisition. In this part we evaluate reconstruction methods in Table 2 on in-vivo MRF data acquired from a healthy volunteer using the IR-BSSFP sequence and the 1.5 T whole body Espree Siemens Healthcare scanner with 32-channel head receiver coil. This dataset was used in the seminal paper of Ma et al. [54], where FAs have a pseudo-randomized (Perlin noise) pattern of length L = 1000 and TRs are uniformly selected at random between 10.5 and 14 msec. At each time frame (repetition time) one interleaf of the variable-density spiral readout samples the k-space (see [54, Figure  1] for the FA,TR and spiral readout patterns used in this experiment). The spiral trajectory Ω t rotates by 7.5 • in the next time frame to sample different k-space locations and so on. The overall k-space undersampling factor is 48x folds and since a non-Cartesian readout pattern has been used, the operator F in the forward model (1) is implemented using the non-uniform Fourier transform (NUFFT) [31]. Sensitivity maps (i.e. S operator in (1)) are computed off-line from the acquired multi-coil data [73]. . As a common practice used to precondition non-Cartesian MRF problems, we incorporate a density compensation scheme within the reconstruction pipeline to enable faster convergence (see more details in Section B of the supplementary materials). With this update, we initialize the step-size by the compression factor µ = n/m similar to the Cartesian sampling. We empirically observe that this choice satisfies the criteria (21) for most of the iterations and for the rest one or two shrinkage sub-iterations suffices.
6.2.1. Missing high-resolution information and high-frequency artefacts. As can be observed in Figure 6(b), using iterative methods for spiral readouts may cause high-frequency  artefacts in the estimated maps. We would like to emphasize that this issue does not arise because of the deficiency of iterations. Indeed, the monotone decay of the measurement fidelity error implies that iterations improve data consistency as compared to the non-iterative TM scheme ( Figure 6(c)). As also highlighted in [21], after an initial rapid decay in the fidelity error a long epoch of slowly-decaying iterations will follow to recover high-resolution image features. However, since spiral readouts do not sufficiently sample high-frequency k-space locations, solving (5) may admit undesirable solutions with high-frequency artefacts which appear in the second epoch of iterations until convergence. These artefacts can be removed by either using a spatial-smoothing regularization 9 or by reconstructing images in a lower spatial resolution. Here we take the latter approach and reconstruct volunteer images in n = 100 × 100 resolution for the rest of our experiments -instead of the 128 × 128 resolution maps shown in [54, Figure 3] using the non-iterative TM. We also observe that with this update we require less iterations to converge.
6.2.2. Results. The non-iterative TM algorithm performs reasonably well when all 32 coil/channel data are used (Figure 7(A-D)), supporting the fact that in data-rich regimes we may not need sophisticated inference algorithms [20]. To better highlight the advantage of iterations we select measurements from 6 coils that maximally cover the k-space. As can be observed in Figure 7(H,L), the recovered PD maps from the 6-coil data demonstrate weaker signal intensity in central and certain border regions as compared to the one obtained from the 32-coil data in Figure 7(D). Comparing Figure 7(B,C) to Figure 7(F,G) shows that TM reconstruction for the reduced 6-coil data introduces artefacts in both T 2 and B0 maps around the Cerebrospinal Fluid (CSF) regions where the signal is weak. The iterative BLIP algorithm however corrects for this issue and works stably in low-data regime. The rest of our experiments focuses on the 6-coil k-space data. We next compare the performance of exact BLIP iterations using the subspace compression option. As we can see in Figure 8(F,G) a significant SVD compression (s = 20) causes overestimated T 2 values and distorted B0 map (e.g. see fat/muscle regions) whereas, using a moderate compression (s = 200) results in reconstruction quality comparable to that of using no subspace compression (compare the reconstructed maps in Figure 8(A-D) to Figure 7(I-L)). The reason is that the subspace of s = 200 principal components can well represent the IR-BSSFP dictionary used in this experiment (see also related discussions in [56]). Table 4 compares the reconstruction performance of iterative methods BLIP, KDBLIP and CoverBLIP for different search dimension regimes with/without using subspace compression. The corresponding reconstructed maps can be also visually compared in Figure 9. The BLIP algorithm using exact brute-force searches achieves the lowest fidelity error but it requires the longest reconstruction time and highest search complexity. CoverBLIP with ε = 0.2 reports the best reconstruction time-accuracy (also total search cost) among all tested methods. CoverBLIP saves between 2 to 3 orders of magnitude in total search cost of BLIP with a comparable reconstruction accuracy (see the corresponding normalized fidelity errors in Table 4 and the recovered maps in Figure 9(A-D, I-L)). Importantly and unlike KDBLIP, this computational advantage is consistent for all three search-dimension regimes. We can observe in Figure 9(E-H) that by using subspace compression s = 200, the KDBLIP algorithm with 256 checks outputs comparable parameter maps to that of CoverBLIP, however with 2-4 times more search cost. Runtimes reported for both methods in this case (Table 4) are however similar because as previously pointed out we do not claim an optimal implementation of the cover tree searches used here. KDBLIP uses non-scalable tree searches and therefore without a dimensionality reduction -even with a large number of checks = 4096, a longer runtime and 80x higher search cost than CoverBLIP-this algorithm fails to output artefact-free parameter maps (Figure 9(M-T)). More artefacts occur using smaller checks e.g. 512 or 256. In this experiment a moderate subspace compression turns out to be advantageous for all tested algorithms, but then it is a crucial step for using KD-tree searches. We empirically observed that KDBLIP starts reporting poor reconstruction time-accuracies when more than 350 principal components are used.  Table 4: Comparison between iterative methods with/without using temporal subspace compression in terms of search cost, total runtime and normalized fidelity error.
Comparing the overall runtimes in Table 4, we note that CoverBLIP (ε = 0.2) achieves 2.5-6x fold acceleration compared to the BLIP algorithm which is less than what was reported for our previous synthetic data experiment in Section 6.1. The reason is that here we use multicoil data and non-Cartesian k-space sampling where both make the gradient updates become a non-trivial computational overhead for the iterations. Note that reconstructions from a non-Cartesian acquisition protocol requires computing slow NUFFT operations in each iteration. As a result, despite a significant reduction in the total search cost (i.e. projection steps) this advantage will be less pronounced in the overall runtime of CoverBLIP. We believe addressing this issue i.e. breaking down the cost of heavy gradient updates, merits an independent line of future investigation beyond the scope of this work.
7. Conclusions and future directions. We considered accelerating the iterative scheme for model-based MRF reconstruction and for this purpose we approximated the matched-filtering step in each iteration using cover tree's (1 + ε)-ANNS search scheme. For low-dimensional manifold datasets cover trees offer appealing construction times, memory requirements and remarkably low search complexities scaling logarithmic in terms of data population. With this motivation, we proposed the CoverBLIP algorithm which adopts such tree structures for fast iterative searches over large-size MRF dictionaries i.e. discrete manifold of Bloch responses parametrized by few NMR characteristics. Provided with a notion of (model-restricted) embedding we showed that the inexact iterations of CoverBLIP linearly convergence toward a solution with the same order of accuracy as when using BLIP with exact brute-force searches. We also introduced an adaptive step-size scheme that guarantees local monotone convergence of CoverBLIP in the absence of bi-Lipschitz embedding. We evaluated the performance of our proposed method on both synthetic and real-world MRF datasets using different sampling strategies, and we demonstrated that CoverBLIP is capable of achieving orders of magni-tude acceleration in conducting the projection steps as compared to the exact iterations of BLIP. We also showed that CoverBLIP is a scalable algorithm able to maintain the gain in its time-accuracy performance in high-dimensional search spaces.
Future works include application of CoverBLIP to the emerging multi-parametric MRF problems with more complex dynamic responses encoding a larger number of NMR characteristics. In such cases and due to the inherent non-linearity of Bloch responses a low-dimensional subspace model of the dictionary would be prohibitively inaccurate, and one would rather need to resort on fast search schemes such as cover trees that are robust against the curseof-dimensionality. Our current search implementation does not benefit from the considerable amount of inter-voxel correlations present in a query batch. As shown in e.g. [23] faster searches are possible by additionally building a dual (cover) tree on the query batches. An interesting line of future work would adopt this idea to further accelerate CoverBLIP, however with more restricted choice of dual trees or batch sizes whose construction times would not bring a computational overhead throughout multiple iterations. Further extension to the present work could also focus on reducing the computational cost of the gradient updates for non-Cartesian and multi-coil acquisition schemes. In this regard, a possible line of investigation would be the application of randomized iterative projected gradient algorithms (see e.g. [16,50,68,67]), where iterations adopt cheap, unbiased and variance-reduced stochastic approximations of the true but computationally-intensive gradient updates.
Supplementary materials. The supplementary materials include the proof of our convergence results for the inexact IPG iterations (24) -including the CoverBLIP algorithm -in Section A. Related to our numerical experiments in Section 6.2 of the main article and for further clarity to readers who might not be MRI/MRF experts, we provide a short note in Section B on using a density compensation scheme for preconditioning MRF reconstruction problems with variable-density spiral (or other non-Cartesian sampling) readouts.
Appendix A. Proof of the convergence results in Section 5. In this part and in Section A.1 we provide the proof of Proposition 1 on the feasibility of using the adaptive shrinkage scheme to find a good step-size in a finite number of sub-iterations. Section A.2 includes the proof of Theorem 3 which establishes a near global convergence result i.e. a reconstruction guarantee, for the inexact IPG algorithm by using the adaptive step-size shrinkage scheme. In Section A.3 we provide the proof of Theorem 4 which guarantees monotone convergence of the inexact IPG in the absence of any embedding assumption.
A.1. Proof of Proposition 1. Following the approximate projection definition (23), each iteration of the inexact IPG algorithm (24) produces X k+1 , X k ∈ C. Therefore, according to the bi-Lipschitz property (Definition 3) we have the following bounds on the values used in the step-size criteria (21) ∀k: As a result starting from any finite large step-size µ and after a finite number of divisions by a factor ζ > 1, µ k reaches the lower bound and satisfies criteria (21). The smallest possible step-size before stopping the shrinkage thus ranges in the interval (ζβ) −1 , β −1 which implies the lower bound in (27). The largest possible µ k is upper bounded by (21) which is always less than α −1 .
On the other hand the following lower bound holds: DCF-BPI slice t =5 BPI slice t =100 t =200 t =400 t =800 (a) Back-projected images with (top row) and without (bottom row) DCF weighting (b) DCF profile across one spiral arm which (together with the fact that the cost function ||Y − A(X)|| 0 is lower bounded) completes the proof.
Appendix B. A note on using density compensation for preconditioning the MRF reconstruction with spiral readouts. Spiral readouts acquire much denser collection of samples from the centre of k-space than the outer regions. As a result the forward operator A becomes more ill-conditioned as compared to e.g. a Cartesian acquisition. In this case if we perform an iterative reconstruction such as (6) the progress in each iteration will be very small. As can be observed in Figure 10(a), applying the adjoint operator on the k-space data X = A H (Y ) results in heavily blurred images in the first iteration. Such slow convergence combined with costly NUFFT updates at each iteration makes the reconstruction extremely time-consuming. One fix to this issue -which we adopt in our experiments in Section 6.2 of the main article -is to use a weighted least squares loss for the fidelity term with smaller weights for the central k-space locations. The objective of (5) updates to L t=1 ||Y t − P Ωt F S(X t )|| 2 W , where the weighted norm is defined as ||a|| 2 W := i w i |a i | 2 , and the weights w i > 0 are derived from calculating the Density Compensation Function (DCF) for a given spiral trajectory [61]. Figure 10(b) illustrates DCF weights for the sampling trajectory used in this experiment. Following the update in the objective function and the corresponding gradient expression, the iterations of BLIP in (6) take the following form: Similar update applies to CoverBLIP (24), KDBLIP and their SVD dimension-reduced variants when using spiral readouts. Figure 10(a) illustrates the weighted back projected images (BPI) i.e. X = A H (diag(w)Y ) in the first iteration, where we can observe the blurring has been removed (the under-sampling artefacts however remain). DCF reweighing significantly improves the conditioning of Hessian A H diag(w)A compared to that of A H A and results in much larger chosen step-size and faster progress during the iterations i.e. faster convergence. Similar to the Carteian sampling we initialize the step-size by the compression factor µ = n/m and we empirically observe that this choice satisfies the criteria (21) for most of the iterations and for the rest one or two shrinkage sub-iterations suffices.