Randomized algorithms for fast computation of low rank tensor ring model

Randomized algorithms are efficient techniques for big data tensor analysis. In this tutorial paper, we review and extend a variety of randomized algorithms for decomposing large-scale data tensors in the Tensor Ring (TR) format. We discuss both adaptive and nonadaptive randomized algorithms for this task. Our main focus is on the random projection technique as an efficient randomized framework and on how it can be used to decompose large-scale data tensors in the TR format. Simulations are provided to support the presentation, and the efficiency and performance of the presented algorithms are compared.


Introduction
Tensor decompositions have found numerous applications in signal processing, machine learning and scientific computing [1][2][3][4][5][6][7][8][9]. Handling very large-scale data tensors is a challenging task due to the high computational complexity and memory requirements involved. The Tucker decomposition [10][11][12] can alleviate this problem by compressing a given data tensor into a smaller core tensor and factor matrices. However, the core tensor suffers from the phenomenon known as the curse of dimensionality, meaning that the number of parameters needed for its representation grows exponentially with its order [13,14]. To tackle this difficulty, alternative tensor representations have been introduced. Tensor networks are effective tools for coping with this difficulty, approximating a given data tensor by a series of inter-connected smaller low order core tensors [2,3,15]. For example, the Tensor Train/Tensor Ring (TT/TR) decompositions [16][17][18][19][20] are special cases of the Hierarchical Tucker decomposition [21,22]; they are two simple but powerful tensor networks representing the original tensor as a train and a ring (chain) of inter-connected third order tensors, respectively. The TT decomposition is known as Matrix Product States (MPS) in quantum physics [23][24][25][26]. The TT and TR decompositions have found many applications in the scientific computing and machine learning communities, such as computing extreme singular values and pseudoinverses of very large matrices [27,28], reducing the number of parameters in deep neural networks (DNNs) [29][30][31][32][33], tensor completion [34][35][36][37][38][39], machine learning [40][41][42][43][44], quantum chemistry [45], solving high-dimensional PDEs [46], and low rank approximation of large sparse matrices [47].
Other related tensor networks are the Projected Entangled Pair States (PEPS) [48], the Multi-scale Entanglement Renormalization Ansatz (MERA) [49] and the Tree Tensor Network (TTN) [50]. While the Tucker decomposition suffers from the curse of dimensionality, an efficient algorithm has recently been proposed in [51] to compute a Tucker decomposition whose core tensor is stored in the TT format to avoid this difficulty. The idea is based on decomposing a tensor into the TT format, followed by a conversion into the Tucker format whose core tensor is kept in the TT format. Deterministic algorithms for computing the TR decomposition involve the computation of a series of SVDs of unfolding matrices. Clearly, this is computationally prohibitive for large-scale data tensors, with large memory requirements and computational complexity. Randomized algorithms are effective techniques for coping with this problem: they reduce the computational complexity of the deterministic algorithms and also the communication among different levels of the memory hierarchy. In this paper, we review a variety of randomized algorithms for computing the TR decomposition of large-scale data tensors.
Our main contributions in this paper are as follows:
• Extending the random projection technique for fast TR decomposition (Algorithms 7 and 10),
• Extending randomized algorithms for fast TR decomposition with permutation of core tensors (Algorithms 13 and 14),
• Extending a randomized rank-revealing algorithm for fast TR decomposition (Algorithm 11),
• Applying randomized algorithms to the fast TR completion task (Example 3 in section 5).
Before starting the next section, we introduce some concepts and notations used throughout the paper. Tensors, matrices and vectors are denoted by underlined bold upper case, bold upper case and bold lower case letters, as X, X and x, respectively. The notations 'T' and 'Tr' denote the transpose and the trace of a matrix. The Frobenius norm of a tensor is denoted by ∥.∥F. Slices are matrices taken from tensors, produced by fixing all but two indices. The slices X(:, i2, :) of a 3rd-order tensor X ∈ R^{I1×I2×I3} are called lateral slices. We can multiply tensors and matrices. For example, the tensor-matrix multiplication of a tensor X ∈ R^{I1×I2×···×IN} and a matrix Q ∈ R^{K×In} along mode n is denoted by X ×n Q ∈ R^{I1×···×In−1×K×In+1×···×IN} and defined as follows

(X ×n Q)(i1, …, in−1, k, in+1, …, iN) = Σ_{in=1}^{In} x_{i1 i2 ··· iN} q_{k in},   k = 1, 2, …, K.   (1)

The same definition can be presented for tensor-vector multiplication. Based on the definition of the tensor-matrix multiplication, when a tensor is multiplied by a vector, the resulting tensor has a mode of size 1. In order to remove the mentioned mode and reduce the order of the resulting tensor, we use the special notation ×̄. For example, for X ∈ R^{I1×I2×···×IN} and y ∈ R^{In}, we have X ×̄n y ∈ R^{I1×···×In−1×In+1×···×IN}. A tensor can be reshaped to a matrix and vice versa. These procedures are called matricization (unfolding or flattening) and tensorization, respectively. For a tensor X ∈ R^{I1×I2×···×IN}, the n-unfolding of the tensor X is denoted by X⟨n⟩ ∈ R^{I1···In×In+1···IN}, and its components are obtained by grouping the first n indices into a row multi-index and the remaining indices into a column multi-index, i.e. X⟨n⟩(i1···in, in+1···iN) = X(i1, i2, …, iN). A special case of the n-unfolding, with only one index for the first coordinate, is called the mode-n unfolding and is denoted by X(n) ∈ R^{In×I1···In−1 In+1···IN}. If a tensor is multiplied by a matrix along a specific mode, then the mode-n unfolding of the product can be computed as (X ×n Q)(n) = Q X(n). For a given data matrix X, the operator SVDδ(X) denotes the truncated SVD of X, i.e. X ≅ U S V^T with ∥X − U S V^T∥F ≤ δ,
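As a concrete illustration, the mode-n product and the two unfoldings can be checked numerically; the following NumPy sketch uses toy sizes of our choosing (note that NumPy's row-major reshape fixes one particular ordering of the grouped indices):

```python
import numpy as np

# Small illustration (our toy sizes) of the mode-n product and unfoldings.
rng = np.random.default_rng(0)
I1, I2, I3 = 3, 4, 5
X = rng.standard_normal((I1, I2, I3))
Q = rng.standard_normal((6, I2))               # K x I2, applied along mode 2

# Mode-2 (tensor-matrix) product: Y(i1,k,i3) = sum_{i2} X(i1,i2,i3) Q(k,i2)
Y = np.einsum('ijk,lj->ilk', X, Q)             # shape (I1, 6, I3)

# n-unfolding X_<2>: first two modes index the rows, the rest the columns.
X_2 = X.reshape(I1 * I2, I3)

# mode-2 unfolding X_(2): mode 2 indexes the rows, all other modes the columns.
X_mode2 = np.moveaxis(X, 1, 0).reshape(I2, I1 * I3)

# Property used in the text: mode-n unfolding of (X x_n Q) equals Q X_(n).
Y_mode2 = np.moveaxis(Y, 1, 0).reshape(6, I1 * I3)
assert np.allclose(Y_mode2, Q @ X_mode2)
```

The final assertion verifies the unfolding identity used repeatedly later when projecting unfolding matrices.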
and the corresponding minimal rank is denoted by rank δ (X).
The N-tuple (K 1 , K 2 , . . . , K N ) is called multilinear or Tucker rank. Higher order SVD (HOSVD) [52] is a special Tucker decomposition in which the factor matrices are orthogonal.
The paper is organized as follows. In section 2, basic randomized algorithms for low rank matrix approximation are introduced. The TR model, its properties and basic algorithms are described in section 3. Adaptive and non-adaptive randomized variants of the algorithms of section 3 are elaborated in section 4. Simulations are provided in section 5 to support the presentation, and the conclusion is given in section 6.

Basic randomized algorithms for large-scale matrices
Randomized algorithms are efficient techniques for computing low rank matrix approximations. The principal idea behind this framework is capturing the range (column space) of a matrix and making a reduction of the data matrix in this manner. Note that if the number of rows of a matrix exceeds the number of its columns, then we should capture its row space instead. The randomized approaches can be categorized into the following three groups:
• Random projection. In the random projection approach, a matrix is multiplied by a random matrix, such as a Gaussian, uniform or Bernoulli matrix, on the right-hand side [53]. This procedure may be expensive. In order to accelerate the computation, we can use structured random matrices such as sparse random matrices [54,55], the subsampled random Fourier transform [56,57], subsampled Hadamard transforms, or sequences of Givens rotations [57,58], which are established techniques and have been used extensively in the literature [59].
• Sampling. In the sampling techniques [60], a subset of the columns of the original matrix is selected and the reduction of the data matrix is made in this manner.
• Count-Sketch. The count-sketch approach [61][62][63] first labels the columns of the data matrix uniformly at random; this is also called hashing. Then, the columns with the same labels are grouped together. Afterwards, the columns of each group are multiplied by ±1 uniformly at random and summed to form representative columns.
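The three sketching families can be illustrated on a small low-rank matrix; the sketch below (our toy sizes and seeds) builds a Gaussian projection, a random column sample and a count-sketch, and verifies that each one captures the range of the matrix:

```python
import numpy as np

rng = np.random.default_rng(0)

# A tall, exactly rank-R matrix to sketch (sizes are illustrative).
I, J, R = 1000, 200, 10
X = rng.standard_normal((I, R)) @ rng.standard_normal((R, J))

# 1) Random projection: multiply by a Gaussian test matrix on the right.
Omega = rng.standard_normal((J, R))
Y_proj = X @ Omega                      # captures the column space of X

# 2) Sampling: keep a random subset of the columns.
cols = rng.choice(J, size=R, replace=False)
Y_samp = X[:, cols]

# 3) Count-sketch: hash each column to one of R buckets with a random sign
#    and sum the columns landing in the same bucket.
h = rng.integers(0, R, size=J)          # uniform column labels (hashing)
s = rng.choice([-1.0, 1.0], size=J)     # random signs
Y_cs = np.zeros((I, R))
for j in range(J):
    Y_cs[:, h[j]] += s[j] * X[:, j]

# All three sketches span the range of X (exactly here, since rank(X) = R).
for Y in (Y_proj, Y_samp, Y_cs):
    Q, _ = np.linalg.qr(Y)
    err = np.linalg.norm(X - Q @ (Q.T @ X)) / np.linalg.norm(X)
    assert err < 1e-6
```

For noisy or numerically low-rank data the recovery is only approximate, which is where oversampling and power iteration (discussed next) come in.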
In this paper, we only consider the random projection technique although the sampling and the count-sketch techniques are also applicable. We mainly focus on standard and distributed data tensors which means that they can be stored in Random Access Memory (RAM) on a single workstation or distributed among several disks, respectively. We only make a few comments on streaming data sets.

Fast SVD
After making enough reduction in the data matrix via one of the above-mentioned randomized dimensionality reduction techniques, the original data matrix is projected onto the low rank approximation obtained in the first step. To be more precise, let X ∈ R^{I×J} be a given data matrix with rank(X) = R ≪ min(I, J). In the first step, the matrix X is multiplied by a random matrix Ω ∈ R^{J×R} as Y = X Ω ∈ R^{I×R}. In the next step, an orthogonal projector onto the column space of the matrix X, i.e. Q Q^T, is computed, where Q is an orthonormal basis for the range of Y. The orthonormal basis Q can be computed through the QR decomposition Y = QR, where Q ∈ R^{I×R} and R ∈ R^{R×R}. Here, we have X ≅ Q Q^T X, and the compressed matrix B = Q^T X ∈ R^{R×J} is of smaller size than X. The SVD of the original matrix X is then recovered from the SVD of B = U S V^T as X ≅ (Q U) S V^T. This procedure is summarized as follows:
• Reduction. Replacing an extremely large-scale data matrix with a new one of smaller size, capturing either the column or the row space of the original data matrix as much as possible.
• SVD computation. Applying deterministic algorithms, e.g. the truncated SVD, to the reduced data matrix and finding its low rank approximation.
• Recovery. Recovering the SVD of the original data matrix from the SVD of the compressed one.

Remark 1. (Oversampling technique) The oversampling technique can be used to improve the solution accuracy of the randomized algorithms. In the oversampling technique, we use additional random vectors (for example, R + P random vectors instead of R random vectors) in the first step, i.e. the dimensionality reduction step. In practice, typically P = 5 or P = 10 is enough to achieve reasonable solutions [53].

Remark 2. (Power iteration technique) The power iteration technique is used when the singular values of a data matrix do not decay very fast. Here, we exploit the matrix Z = (X X^T)^q X (q is a non-negative integer) instead of the matrix X, and the randomized algorithms are applied to this new matrix. Considering the SVD X = U S V^T, we have (X X^T)^q X = U S^{2q+1} V^T, and it is seen that the left and right singular vectors of the new matrix Z are the same as those of the matrix X, but the singular values of Z decay faster. This can improve the solution accuracy obtained by the randomized algorithms. One should avoid constructing the matrix Z explicitly because of instability issues; it should be computed iteratively using the QR decomposition [53]. It has been experimentally confirmed that q = 1 or q = 2 power iterations are often enough in practice for achieving accurate solutions [53].

Algorithm 1: Randomized SVD algorithm with oversampling and power iteration [53]
Input: A data matrix X ∈ R^{I×J}, target rank R, oversampling P and power iteration q
Output: SVD factor matrices U ∈ R^{I×R}, S ∈ R^{R×R} and V ∈ R^{R×J}
1 Generate a random matrix Ω ∈ R^{J×(R+P)} with a prescribed probability distribution
2 Form Y = (X X^T)^q X Ω ∈ R^{I×(R+P)}
3 Compute the QR decomposition Y = Q R
4 Compute B = Q^T X ∈ R^{(R+P)×J}
5 Compute an SVD B = Û S V^T and set U = Q Û

The basic form of the randomized SVD algorithm, equipped with the oversampling and power iteration strategies, is outlined in Algorithm 1.
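A minimal NumPy sketch of Algorithm 1 (function name and signature are ours) might look as follows; the power iteration is applied with QR re-orthonormalization at each step, as the remark above recommends:

```python
import numpy as np

def randomized_svd(X, R, P=5, q=1, rng=None):
    """Sketch of Algorithm 1 (our naming/signature): randomized SVD with
    oversampling P and q power iterations."""
    rng = np.random.default_rng(rng)
    I, J = X.shape
    Omega = rng.standard_normal((J, R + P))    # Step 1: Gaussian test matrix
    Y = X @ Omega                              # Step 2: Y = (X X^T)^q X Omega,
    for _ in range(q):                         # computed iteratively with QR
        Q, _ = np.linalg.qr(Y)                 # re-orthonormalization for
        Y = X @ (X.T @ Q)                      # numerical stability
    Q, _ = np.linalg.qr(Y)                     # Step 3: orthonormal basis
    B = Q.T @ X                                # Step 4: small (R+P) x J matrix
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)  # Step 5: SVD of B
    return (Q @ Ub)[:, :R], s[:R], Vt[:R, :]   # recover U = Q * Ub, truncate
```

For an exactly rank-R matrix this recovers the SVD to machine precision; for noisy data the accuracy is governed by the decay of the singular values, P and q.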

Two-sided randomized algorithm
It is also possible to make a reduction in both dimensions of a data matrix when both of them are large. This can be done by simultaneously multiplying a given data matrix with two random matrices from the left- and right-hand sides. Algorithm 2 outlines the structure of such a randomized algorithm. Both Algorithms 1 and 2 are randomized multi-pass SVD algorithms, because both of them need to access (pass over) the original data matrix X two times, in Lines 2 and 4. These algorithms can be modified to become single-pass algorithms. To this end, Line 4 in both algorithms can be replaced by alternative representations, derived for Algorithm 1 and for Algorithm 2 in [66]. The benefit of these approaches is that they avoid the computation of the terms Q^T X and Q^(1)T X Q^(2), which may be computationally expensive, especially when the data matrix is stored out-of-core and the cost of communication may exceed that of the main computations. Instead, in formulations (3) and (4), the original data matrix X is sketched using the random projection technique and the corresponding matrix B is obtained by solving well-conditioned overdetermined linear least-squares problems [66]. The multiplication by a random matrix can be performed relatively fast by employing structured random matrices. We should note that this strategy passes over the original data matrix only once, because all sketching procedures can be done in the first pass over the raw data. Other types of single-pass techniques can be found in [53,[67][68][69][70]].
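A two-sided variant in the spirit of Algorithm 2 can be sketched as follows (naming is ours; the single-pass refinements of [66] are omitted, so this version still touches X twice):

```python
import numpy as np

def two_sided_randomized_svd(X, R, P=5, rng=None):
    """Sketch of a two-sided randomized SVD in the spirit of Algorithm 2
    (function name and details are ours): both dimensions are compressed
    with independent Gaussian test matrices."""
    rng = np.random.default_rng(rng)
    I, J = X.shape
    Omega1 = rng.standard_normal((J, R + P))
    Omega2 = rng.standard_normal((I, R + P))
    Q1, _ = np.linalg.qr(X @ Omega1)      # orthonormal basis, column space
    Q2, _ = np.linalg.qr(X.T @ Omega2)    # orthonormal basis, row space
    B = Q1.T @ (X @ Q2)                   # small (R+P) x (R+P) core matrix
    Ub, s, Vbt = np.linalg.svd(B)
    # X ~= Q1 B Q2^T = (Q1 Ub) S (Vbt Q2^T); truncate to rank R.
    return (Q1 @ Ub)[:, :R], s[:R], Vbt[:R, :] @ Q2.T
```

The small core B is all that must be factorized by a deterministic SVD, which is the point of compressing both sides.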

Randomized matrix rank-revealing (RR) algorithms
In randomized Algorithms 1 and 2, we need an estimation of the matrix rank in advance, which may be a difficult task. Randomized rank-revealing (RR) or, equivalently, randomized fixed-precision algorithms are able to retrieve the rank of a given data matrix, and the corresponding low-rank matrix approximation, automatically. In practice, we use the randomized RR Algorithm 3 proposed in [71], which is a modification of the RR algorithm developed in [72]. The operator 'orth' in Lines 6, 8, 9 and 11 computes an orthonormal basis for the range of a given data matrix. Also, in the first step, the matrices Q and B are empty, and they are updated sequentially. The algorithm requires an approximation error bound ε, a block size b and the power iteration q. For more details on the theoretical results of this algorithm, we refer to [71,72].

Algorithm 2: Two-Sided Matrix Randomized SVD [53, 65]
Input: A data matrix X ∈ R^{I×J} and target rank R
Output: SVD factor matrices U ∈ R^{I×R}, S ∈ R^{R×R} and V ∈ R^{R×J}
1 Draw random matrices of prescribed sizes

Algorithm 3: Randomized Matrix Rank-Revealing Algorithm [71, 72]
Input: A data matrix X ∈ R^{I×J}, approximation error bound ε, block size b and power iteration q
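A simplified blocked fixed-precision range finder in the spirit of Algorithm 3 (omitting the power iteration and the cheaper residual updates of [71]; names are ours) can be sketched as:

```python
import numpy as np

def adaptive_rangefinder(X, eps, b=8, rng=None):
    """Simplified sketch of a blocked fixed-precision (rank-revealing)
    range finder: grow the orthonormal basis Q block by block until
    ||X - Q Q^T X||_F <= eps * ||X||_F. The cited Algorithm 3 additionally
    uses power iteration and avoids recomputing the residual from scratch."""
    rng = np.random.default_rng(rng)
    I, J = X.shape
    Q = np.zeros((I, 0))               # empty basis, updated sequentially
    E = X.copy()                       # current residual
    tol = eps * np.linalg.norm(X)
    while np.linalg.norm(E) > tol and Q.shape[1] < min(I, J):
        Y = E @ rng.standard_normal((J, b))     # sketch the residual
        Q, _ = np.linalg.qr(np.hstack([Q, Y]))  # extend orthonormal basis
        E = X - Q @ (Q.T @ X)                   # update the residual
    return Q
```

The number of columns of the returned Q is the (slightly overestimated, up to the block size b) numerical rank, which is exactly the quantity the TR algorithms of section 4 need but do not know in advance.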

Basic tensor ring (TR) decomposition
Tensor Chain (TC) or Ring (TR) decomposition [17][18][19][20] is a tensor network representing a tensor as a ring (chain) of 3rd-order tensors (see figure 1). A special case of the TR decomposition with condition R 0 = R N = 1 is called the Tensor Train (TT) decomposition [16] because it represents a tensor as a train of 3rd-order tensors. For simplicity of presentation, throughout the paper, we only focus on the TR decomposition and all materials naturally hold true for the TT decomposition.
Let X ∈ R^{I1×I2×···×IN}; then the TR decomposition of the tensor X admits the following model

X(i1, i2, …, iN) = Σ_{r0=1}^{R0} Σ_{r1=1}^{R1} ⋯ Σ_{rN−1=1}^{RN−1} X^(1)(r0, i1, r1) X^(2)(r1, i2, r2) ⋯ X^(N)(rN−1, iN, r0),   (5)

where X^(n) ∈ R^{Rn−1×In×Rn}, n = 1, 2, …, N, are called core tensors and the N-tuple (R0, R1, …, RN−1) is called the TR-ranks. Note that in the TR decomposition we have R0 = RN, and it is also shown in [20] that the TR-ranks satisfy rank(X⟨n⟩) ≤ R0 Rn for n = 1, 2, …, N. Equation (5) is called the component-wise TR representation, and an equivalent slice-wise representation is

X(i1, i2, …, iN) = Tr( X^(1)(i1) X^(2)(i2) ⋯ X^(N)(iN) ).   (6)

Here, X^(n)(in) are the Rn−1 × Rn lateral slices of the core tensors X^(n), for n = 1, 2, …, N. In view of (5), introducing the auxiliary index r0 makes it possible to consider the TR decomposition as a linear combination of R0 terms of the TT decomposition with partially shared cores. Generally, the topological structure of tensor networks can be changed; for example, the TT and TR decompositions can be converted to each other [73,74]. There are several efficient algorithms for computing the TR decomposition, such as the sequential SVDs algorithm, the Alternating Least-Squares (ALS) algorithm, ALS with adaptive ranks and the block-wise ALS algorithm [20,73]. Note that in ALS-type algorithms, at each iteration all core tensors are kept fixed except one, and that core tensor is updated; the core tensor being updated changes alternately, which justifies the name ALS. A closely related algorithm is the modified ALS (MALS) or Density Matrix Renormalization Group (DMRG) algorithm [18,73,[75][76][77]], where consecutive core tensors are updated simultaneously.
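The equivalence of the component-wise form (5) and the slice-wise form (6) can be verified numerically on random cores (toy sizes and ranks are ours):

```python
import numpy as np

# Check of the TR model: the slice-wise representation
#   X(i1, i2, i3) = Tr( X1(i1) X2(i2) X3(i3) )
# agrees with the dense tensor assembled from the component-wise form (5).
rng = np.random.default_rng(0)
dims = (3, 4, 5)              # I1, I2, I3
ranks = (2, 3, 2)             # R0, R1, R2 (R3 = R0 closes the ring)
cores = [rng.standard_normal((ranks[n], dims[n], ranks[(n + 1) % 3]))
         for n in range(3)]

# Dense tensor: contract the ring of cores; repeating index 'a' at both
# ends implements the trace that closes the ring.
X = np.einsum('aib,bjc,cka->ijk', *cores)

# Slice-wise evaluation of a single entry (i1, i2, i3) = (1, 2, 3):
# multiply the lateral slices and take the trace.
M = cores[0][:, 1, :] @ cores[1][:, 2, :] @ cores[2][:, 3, :]
assert np.isclose(np.trace(M), X[1, 2, 3])
```

The same contraction pattern extends to any order N by chaining one more core per mode.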
Here, we introduce the TR-SVD algorithm [20] for computing the TR decomposition; it is summarized in Algorithm 4. This algorithm is robust because it relies on the SVD: at each iteration of the algorithm, the truncated SVD of an unfolding matrix is computed. It is worth mentioning that the TR-SVD algorithm with initial rank R0 = 1 is equivalent to the TT-SVD algorithm [16]. The idea of the cross/skeleton, or equivalently CUR, decomposition [78][79][80][81] has been used for the TT decomposition [82][83][84] and can naturally be used for the TR decomposition [19]. For ALS and MALS-type algorithms, see [18,20,85,86]. The TT decomposition can also be computed via the Tucker-2 model, which is also applicable to the TR decomposition (see [2,85]).

Remark 3. The TR-ranks obtained by Algorithm 4 may not be optimal, and often rounding algorithms [16,87] are used to find a decomposition with lower TR-ranks. Unlike the TT format, the rounding algorithms for mathematical operations in the TR format are more complicated [88]. The TR decomposition, as a tensor network, contains a loop; hence it is in general not closed in the Zariski topology ([73], section 3; [14,89,90]). This means that a sequence of tensors in the TR format may not necessarily converge to a tensor in the TR format; for a detailed theoretical justification see [14,89,90]. This may cause instability issues when one wants to find the best TR approximation [18,73,91,92]. That is, the best TR approximation of a given tensor with predefined TR-ranks may not exist, and the tensor can be approximated arbitrarily well by TR decompositions with lower TR-ranks. In contrast, the best low rank TT approximation is a well-posed problem [16].

Algorithm 4: TR-SVD algorithm [20]
Input: A data tensor X ∈ R^{I1×I2×···×IN}, a prescribed approximation error bound ε, and an initial rank R0 as a divisor of rankδ(X⟨1⟩)
Output: Approximative representation of the tensor X in the TR format X̂ = ≪X^(1), X^(2), …, X^(N)≫, such that ∥X − X̂∥ ≤ ε∥X∥
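A compact sketch of the TR-SVD recursion for the special case R0 = 1 (which, as noted above, reduces to the TT-SVD algorithm [16]; names and the per-step truncation rule are ours) might read:

```python
import numpy as np

def tr_svd(X, eps):
    """Sketch of TR-SVD (Algorithm 4) for the special case R0 = 1, which
    coincides with TT-SVD: sequentially SVD-truncate reshaped unfoldings."""
    dims, N = X.shape, X.ndim
    delta = eps * np.linalg.norm(X) / np.sqrt(N - 1)  # per-step tolerance
    cores, r = [], 1
    C = X.reshape(dims[0], -1)
    for n in range(N - 1):
        U, s, Vt = np.linalg.svd(C, full_matrices=False)
        R = s.size                      # smallest rank whose truncation
        while R > 1 and np.sum(s[R - 1:] ** 2) <= delta ** 2:
            R -= 1                      # error sum_{j>=R} s_j^2 <= delta^2
        cores.append(U[:, :R].reshape(r, dims[n], R))
        C = (s[:R, None] * Vt[:R]).reshape(R * dims[n + 1], -1)
        r = R
    cores.append(C.reshape(r, dims[N - 1], 1))
    return cores

def tr_to_full(cores):
    """Contract TR cores back to a dense tensor; the trace closes the ring
    (trivial when the boundary ranks are 1)."""
    G = cores[0]
    for c in cores[1:]:
        G = np.einsum('a...b,bic->a...ic', G, c)
    return np.trace(G, axis1=0, axis2=-1)
```

For an exactly low-rank tensor this reproduces the data to machine precision while automatically finding the minimal ranks at each step.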
Let τ be a cyclic permutation of the dimensions of a data tensor X ∈ R^{I1×I2×···×IN}, producing a new reshaped data tensor X_τ ∈ R^{Iτ(1)×Iτ(2)×···×Iτ(N)}. Assume that the tensor X_τ has a TR representation with core tensors indexed by the permuted modes; then a TR representation of X is recovered by applying τ^{-1}, the inverse of the cyclic permutation τ, to the cores. Since there is no boundary condition on the corner core tensors, the decomposition is invariant under cyclic permutation. However, in practice, the TR-ranks of the permuted tensor X_τ may be different from the TR-ranks of the tensor X, and each cyclic permutation of the indices may provide different TR-ranks. It turns out that the two main issues underlying the compression performance of the TR decomposition are:
• the cyclic shift, a poor choice of which leads to a suboptimal model,
• the initial rank R0 of the first core tensor.
More precisely, choosing different cyclic shifts and initial ranks R0 may lead to different TR decompositions with different numbers of parameters. In particular, it is shown that for different initial ranks it is impossible to find a common minimal rank ([87], Proposition 2.2). These facts imply that, to find a TR decomposition with suboptimal TR-ranks, it is necessary to check all cyclic shifts and possible initial ranks. This procedure, called Reduced storage TR-SVD [87], is summarized in Algorithm 5. Clearly, this algorithm is computationally expensive for higher order tensors, and because of this a heuristic algorithm called Heuristic TR-SVD was developed in [87], in which the initial rank and the cyclic permutation are selected heuristically. This procedure is summarized in Algorithm 6, and it essentially consists of two parts:
• a preprocessing part to find a suboptimal cyclic shift and an initial rank R0 (in a heuristic manner),
• applying the TR-SVD with the parameters obtained in the first step.

Algorithm 5: TR-SVD with all possible cyclic permutations (see also [87])
Input: A data tensor X ∈ R^{I1×I2×···×IN} and a prescribed approximation error bound ε
Output: Approximative representation of the tensor X in the TR format X̂ = ≪X^(1), X^(2), …, X^(N)≫, such that ∥X − X̂∥ ≤ ε∥X∥
6 Set R0 = r
7 Apply Algorithm 4 to the permuted tensor X_γn with initial rank R0 and obtain core tensors X^(γn^{-1}(k)), k = 1, 2, …, N
8 end
9 end
10 Find the optimal TR decomposition with the least number of parameters, corresponding to the cyclic permutation γn* and the initial rank R0
In the heuristic algorithm, a cyclic permutation γn* is first chosen by solving a minimization problem over the admissible cyclic shifts. Afterwards, the initial rank R0 corresponding to the cyclic permutation γn* selected in the first step is found by solving a second minimization problem [87]. The main algorithms proposed so far are deterministic, relying on the truncated or economic SVD, which is quite expensive for big data matrices. In the following, we present randomized variants of these algorithms.

Randomized tensor ring (TR) decomposition
The main computationally expensive part of Algorithms 4-6 is the computation of the truncated SVD of the unfolding matrices. Exploiting randomized algorithms can speed up these computations. Following this strategy, in this section randomized algorithms for the TR decomposition are introduced.
As mentioned, we only focus on the random projection technique; the sampling and count-sketch strategies can be applied straightforwardly. For example, in [19], a cross decomposition was used instead of the SVD, with columns and rows sampled in heuristic ways. Also, in [93], the sampling technique is used within the ALS algorithm, which scales linearly in the tensor order. The problem is treated as a tensor with missing components (only O(N) known components), and the ALS-TR algorithm is applied to this incomplete data tensor to simultaneously recover the data tensor and decompose it into the TR format.
Recently, in order to reduce the high computational cost of the standard TT-SVD algorithm, two types of randomized algorithms have been proposed in [94][95][96]:
• the random projection TT algorithm,
• the adaptive randomized TT algorithm.

Algorithm 6: Simplified TR-SVD (see also [87])
Input: A tensor X ∈ R^{I1×I2×···×IN} and a prescribed approximation error bound ε
Output: Approximative representation of the tensor X in the TR format X̂ = ≪X^(1), X^(2), …, X^(N)≫, such that ∥X − X̂∥ ≤ ε∥X∥
7 Compute R0* by solving optimization problem (9)
8 Apply Algorithm 4 to the permuted tensor X_γn* with the initial rank R0* and obtain the core tensors
The random projection TT algorithm is a variant of the TT-SVD algorithm where, at each iteration, the random projection technique is used to find low rank approximations of the unfolding matrices. This procedure is outlined in Algorithm 7, and we discuss this idea in the general setting of the TR decomposition. To be more clear, we explain one iteration of Algorithm 7. In the first iteration, C(1) ∈ R^{I1×I2I3···IN} is the mode-1 unfolding matrix of the Nth-order tensor C = X ∈ R^{I1×I2×···×IN}. Considering a random Gaussian matrix Ω of conforming dimensions and taking into account the oversampling P, we compute the sketch C(1) Ω. In order to find an orthonormal basis for the range of the matrix C(1) Ω, the QR decomposition of this matrix is computed as C(1) Ω = Q R. The first R0 R1 columns of the matrix Q are taken as an orthonormal basis Q^(1) for the range of C(1). Since Q^(1) Q^(1)T is an orthogonal projection onto the range of C(1), we have C(1) ≅ Q^(1) Q^(1)T C(1). The two terms Q^(1) and Q^(1)T C(1) are reshaped into tensors of conforming dimensions (see figure 2 for a graphical illustration): the matrix Q^(1) ∈ R^{I1×R0R1} is tensorized into the first core tensor X^(1) ∈ R^{R0×I1×R1}. In the next step, the tensor C ×1 Q^(1)T is reshaped to a 3rd order tensor, and the procedure is continued with this new tensor C. In general, in the nth iteration of Algorithm 7, a reshaped 3rd order tensor C is considered. Similar to the procedure discussed above, first its mode-1 unfolding C(1) ∈ R^{Rn−1In×In+1···IN R0} is computed. Then, from the randomized low rank matrix approximation C(1) ≅ Q^(n) Q^(n)T C(1), the nth core tensor X^(n) is obtained by tensorizing Q^(n) ∈ R^{Rn−1In×Rn} into X^(n) ∈ R^{Rn−1×In×Rn}.
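The projection step just described can be sketched for the simple case R0 = 1: at each iteration the truncated SVD is replaced by a Gaussian sketch followed by a QR decomposition (function name and the target-rank interface are ours):

```python
import numpy as np

def randomized_tr(X, ranks, P=5, rng=None):
    """Sketch of the random projection idea behind Algorithm 7, shown for
    the simple case R0 = 1; `ranks` holds target TR-ranks (R1,...,R_{N-1})."""
    rng = np.random.default_rng(rng)
    dims, N = X.shape, X.ndim
    cores, r = [], 1
    C = X.reshape(dims[0], -1)
    for n in range(N - 1):
        R = ranks[n]
        Y = C @ rng.standard_normal((C.shape[1], R + P))   # sketch C^(1)
        Q, _ = np.linalg.qr(Y)
        Q = Q[:, :R]                  # first R columns: basis for range(C)
        cores.append(Q.reshape(r, dims[n], R))             # tensorize Q
        C = (Q.T @ C).reshape(R * dims[n + 1], -1)         # project, reshape
        r = R
    cores.append(C.reshape(r, dims[N - 1], 1))
    return cores
```

Compared with the deterministic recursion, only the rank-revealing SVD is replaced; the reshaping logic is identical, which is why the randomized variant is a drop-in acceleration.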

Remark 4. Algorithm 7 can be equipped with the power iteration technique when the data tensor is corrupted by noise or, equivalently, when the singular values of the unfolding matrices do not decay very fast and the oversampling technique may not provide satisfactory approximations. To that end, in Lines 3 and 11 of Algorithm 7 we replace Z = C Ω by Z = (C C^T)^q C Ω.
Algorithm 7 needs an estimation of the TR-ranks in advance, because estimations of the ranks of the unfolding matrices are necessary for the projection steps in Lines 2 and 11. This imposes a restriction, because we may not have any information about the TR-ranks. It is therefore of interest to choose the TR-ranks adaptively while running the algorithm. Estimating the TR-rank Rn in (11) is equivalent to finding an orthogonal matrix Q^(n) ∈ R^{Rn−1In×Rn} satisfying a prescribed projection accuracy. This can be reformulated as the following problem:

Problem 1. Suppose that C ∈ R^{InRn−1×In+1×···×IN} and ε is a prescribed approximation error bound. The objective is to find a columnwise orthogonal matrix Q^(n) ∈ R^{InRn−1×Rn}, with Rn ≤ InRn−1, such that

∥(I − Q^(n) Q^(n)T) C⟨1⟩∥F ≤ ε,

where I is the identity matrix of size Rn−1In × Rn−1In. Problem 1 can be solved by Algorithm 8 or 9; for a detailed study of these algorithms, see [53,94]. Note that in Algorithm 8 the stopping criterion can be either a predefined maximum number of iterations or a predefined approximation error bound. An adaptive randomized algorithm based on this idea is summarized in Algorithm 10. At each iteration of Algorithm 10, Problem 1 is solved (via Algorithm 8 or 9) to obtain the next columnwise orthogonal matrix Q^(n) and hence the TR-rank Rn.
It is also possible to use the randomized RR Algorithm 3 (see also Algorithm 5 in [97]) within the TR-SVD. This procedure is summarized in Algorithm 11, and we refer to it as the Randomized RRTR-SVD algorithm. Please note that in Lines 3 and 9 of Algorithm 11, by a rank-revealing algorithm we mean Algorithm 3. Recently, this idea has been used for the Tucker decomposition [98], and here we utilize it for the TR decomposition.
Following the idea of computing the CANDECOMP/PARAFAC decomposition (CPD) [99][100][101] with a prior fast randomized HOSVD compression [102], a randomized algorithm was proposed in [103] for computing the TR decomposition based on a prior Tucker compression. The idea is to utilize a randomized Tucker decomposition in the first step, as a preprocessing step, after which deterministic algorithms such as Algorithm 4, Algorithm 5, Algorithm 6 or the TR-ALS algorithm [20] can be applied to the smaller Tucker core tensor.
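The two-stage idea can be sketched as follows: a randomized Tucker compression via mode-wise Gaussian sketches, after which any TR algorithm could be applied to the small core S (all helper names are ours; for brevity we verify only that the compression step is lossless for a tensor of exact low multilinear rank):

```python
import numpy as np

rng = np.random.default_rng(0)

def mode_unfold(X, n):
    return np.moveaxis(X, n, 0).reshape(X.shape[n], -1)

def mode_mult(X, M, n):                      # X x_n M
    return np.moveaxis(np.tensordot(M, X, axes=(1, n)), 0, n)

def randomized_tucker(X, mlrank, P=3):
    """One Gaussian sketch per mode-n unfolding gives the factor Q^(n);
    the compressed core is S = X x_1 Q1^T x_2 Q2^T ... x_N QN^T."""
    Qs = []
    for n, K in enumerate(mlrank):
        Xn = mode_unfold(X, n)
        Q, _ = np.linalg.qr(Xn @ rng.standard_normal((Xn.shape[1], K + P)))
        Qs.append(Q[:, :K])
    S = X
    for n, Q in enumerate(Qs):
        S = mode_mult(S, Q.T, n)
    return S, Qs

# A tensor with exact multilinear rank (2, 3, 2), built from a random core.
core = rng.standard_normal((2, 3, 2))
X = core
for n, I in enumerate((10, 12, 14)):
    X = mode_mult(X, rng.standard_normal((I, core.shape[n])), n)

S, Qs = randomized_tucker(X, (2, 3, 2))
Xh = S
for n, Q in enumerate(Qs):
    Xh = mode_mult(Xh, Q, n)                 # expand back: X ~= S x_n Q^(n)
assert np.linalg.norm(X - Xh) / np.linalg.norm(X) < 1e-8
```

Decomposing the small core S in the TR format and recovering X^(n) = S^(n) ×2 Q^(n) then completes the scheme described in the text.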

Computing the multilinear rank in advance may be difficult. However, an adaptive algorithm, e.g. Algorithm 3, can be used in Algorithm 12 to find the factor matrices and their corresponding multilinear rank automatically [98]. This procedure is summarized in Algorithm 14.
It has been shown in [105] that the computational complexity of the TT-SVD algorithm for decomposing an Nth-order tensor of size I × I × ··· × I with TT-ranks (R, R, …, R) is O(I^N R^2), due to the computation of N SVDs of the unfolding matrices ([105], Theorem 2.1, page 2136). Since the computational complexity of the TR-SVD algorithm is the same as that of the TT-SVD algorithm, the TR-SVD algorithm has the same complexity. The idea of decomposing tensors in the TT format with a prior Tucker compression was first proposed in [16]. The computational complexity of the TR-SVD (with a prior Tucker decomposition) of an Nth-order tensor of size I × I × ··· × I, with Tucker rank (R̄, R̄, …, R̄) and the given TR-ranks, consists of two terms: the first is the cost of multiplying the unfolding matrices with random matrices, which is the most expensive operation, and the second is the cost of the TR decomposition of the Tucker core tensor. So, if R̄ < R^2, then the approach of the TR decomposition with a prior Tucker compression is cheaper.
We should emphasize that the TR decomposition with a prior Tucker compression is applicable when • The underlying data tensor is of small order (up to 5); otherwise, the curse of dimensionality occurs.
• The Tucker core tensor admits a low multilinear rank.
Concerning Algorithms 5 and 6, the complexity analysis is more involved because they require several permutations of the modes. They are not directly applicable to very large-scale tensors, and a prior Tucker compression can somewhat reduce the computational complexity. Please note that all algorithms discussed in this paper achieve a suboptimal compression ratio, and developing algorithms for finding the optimal model is a challenging topic that remains to be investigated; for example, in [85], novel algorithms are developed for the TT decomposition.

Remark 5. For decomposing a streaming data tensor in the TT format, specialized algorithms should be used. Recently, a streaming TT decomposition was developed in [106] with applications in cyber-physical big data. Further applications of the streaming TT decomposition in DNNs can be found in [107].

Remark 6. Recently, an efficient technique called TT-HSVD was proposed in [108,109] for decomposing tensors in the TT format, in which the core tensors can be computed in parallel. This is in contrast to the TT-SVD, in which at each step just one core tensor is computed. This technique can be generalized to the TR decomposition, employing randomization for further acceleration of the computations.

Remark 7. Consider Algorithm 1, which is the basic form of randomized algorithms for low rank approximation of matrices [53]. It has recently been shown that, for large sparse matrices, Algorithm 1 and its variants can be substantially improved by a prior transformation of the given data matrix into the TT matrix format (equivalently, the Matrix Product Operator (MPO) format) [105], performing all computations in the TT matrix format. This algorithm is called the tensor train randomized SVD (TTrSVD) algorithm, and simulations have shown that, for some experiments, it can achieve more than 10 times speed up for the TT decomposition compared with the tensor-based alternating least squares SVD (ALS-SVD) and modified alternating least squares SVD (MALS-SVD) matrix approximation methods [27], even with better accuracy.

Algorithm 10: Adaptive Random projection TR-SVD algorithm (see also [94])
Input: A data tensor X ∈ R^{I1×I2×···×IN}, a prescribed approximation error bound ε, a positive number P and an initial rank R0 as a divisor of rankδ(X⟨1⟩)
Output: Approximative representation of the tensor X in the TR format X̂ = ≪X^(1), X^(2), …, X^(N)≫, such that ∥X − X̂∥ ≤ ε∥X∥
Apply Algorithm 3 to C⟨1⟩ and generate the columnwise orthogonal matrix Q^(1)
Apply Algorithm 8 or 9 to the tensor C for solving Problem 1 and generate the columnwise orthogonal matrix Q^(n) ∈ R^{InRn−1×Rn}

Algorithm 11: Randomized RRTR-SVD Algorithm
Input: A data tensor X ∈ R^{I1×I2×···×IN}, a prescribed approximation error bound ε, a power iteration q and an initial rank R0 as a divisor of rankδ(X⟨1⟩)
Output: Approximative representation of the tensor X in the TR format X̂ = ≪X^(1), X^(2), …, X^(N)≫, such that ∥X − X̂∥ ≤ ε∥X∥

Algorithm 13: Randomized TR decomposition with a prior Tucker compression (see also [103])
Output: Approximative representation of the tensor X in the TR format X̂ = ≪X^(1), X^(2), …, X^(N)≫, such that ∥X − X̂∥ ≤ ε∥X∥
1 Apply Algorithm 12 to the data tensor X to compress it in the Tucker format with multilinear rank (K1, K2, …, KN) and obtain S, Q^(1), Q^(2), …, Q^(N)
2 Apply Algorithm 4, Algorithm 5, Algorithm 6 or the TR-ALS algorithm [20] to the compressed data tensor S and obtain ≪S^(1), S^(2), …, S^(N)≫
3 Recover the TR cores of the original data tensor from the TR cores of the compressed data tensor: X^(n) = S^(n) ×2 Q^(n), n = 1, 2, …, N
4 Return ≪X^(1), X^(2), …, X^(N)≫

Simulations
In this section, we evaluate the presented randomized algorithms for the computation of the TR decomposition of synthetic and real data tensors. All numerical simulations were conducted on a laptop computer with a 2.60 GHz Intel(R) Core(TM) i7-5600U processor and 8 GB of memory. The evaluation measures are the compression ratio and the relative error. The compression ratio is defined as

Compression Ratio = C1 / C2,

where C1 and C2 are the numbers of components of the original data tensor and of its approximation in the TR format, respectively. The relative error is defined as

Relative Error = ∥X − X̂∥_F / ∥X∥_F,

where X̂ and X are the approximate and exact data tensors, respectively.

Algorithm 14: Adaptive randomized TR decomposition with a prior Tucker compression (see also [103])
Input: A data tensor X ∈ R^(I1×I2×···×IN) and a prescribed approximation error bound ε
Output: An approximate representation of X in the TR format, X̂ = ≪X(1), X(2), ..., X(N)≫, such that ∥X − X̂∥ ≤ ε∥X∥, and the TR-ranks {R0, R1, ..., R(N−1)}
1 Apply the r-STHOSVD algorithm [98] (see also Algorithm 5 in [97]) to compress the data tensor in the Tucker format and obtain S, Q(1), Q(2), ..., Q(N)
2 Apply Algorithm 4, Algorithm 5, Algorithm 6, or the TR-ALS algorithm [20] to the compressed data tensor and obtain S represented as ≪S(1), S(2), ..., S(N)≫
3 Recover the TR cores of the original data tensor from the TR cores of the compressed data tensor and return ≪X(1), X(2), ..., X(N)≫

Example 1. In the first experiment, the performance and accuracy of the randomized TR algorithms are compared on synthetic data. Within the randomized algorithms, we set the power iteration to q = 1 and the oversampling to P = 5. Consider a 4th-order random tensor X ∈ R^(70×70×70×70) with exact TR-ranks (5, 3, 5, 7). We applied the deterministic and randomized TR algorithms (Algorithms 5-7, 10, 13, and 14) to the tensor X. The candidate initial ranks for the TR decomposition were R0 = 1, 3, 5, 15, and the compression ratios of the TR-SVD achieved with these initial ranks are reported in table 1; the best compression ratio was achieved with R0 = 15. The same compression was achieved by the randomized TR-SVD but with a better running time, reported in table 2.
Algorithms 5 and 6 and their variants with a prior Tucker compression (Algorithm 13 or 14) were able to find the TR-ranks {15, 1, 15, 21}, but at higher computational cost. The running times and relative errors of the algorithms are reported in table 2. For the prior reduction in the Tucker format, the tensor was compressed to a tensor of size (15, 15, 35, 35) using the r-STHOSVD algorithm. From table 2, it is seen that the randomized algorithms have better running times than their deterministic counterparts. Note that the last two algorithms in table 2 are non-adaptive and were given the true TR-ranks, which is why they achieved a better compression ratio.
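The recovery step used by the algorithms with a prior Tucker compression, X(n) = S(n) ×2 Q(n), is simply a mode-2 tensor-matrix product applied to each third-order core. The following NumPy sketch illustrates it; the helper name and the core sizes are ours, and Q is shown as a generic factor matrix rather than the columnwise orthogonal matrices produced by the compression step.

```python
import numpy as np

def mode2_product(S, Q):
    """Mode-2 product S x_2 Q of a third-order core S of shape (R_prev, K, R_next)
    with a factor matrix Q of shape (I, K); returns a core of shape (R_prev, I, R_next)."""
    return np.einsum('rks,ik->ris', S, Q)

rng = np.random.default_rng(0)
S_core = rng.standard_normal((4, 6, 5))  # TR core of the Tucker-compressed tensor
Q = rng.standard_normal((20, 6))         # factor matrix from the compression step
X_core = mode2_product(S_core, Q)        # recovered TR core of the original tensor
```

The compressed mode size K_n (here 6) is inflated back to the original mode size I_n (here 20), while the TR-ranks of the core are unchanged.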
In a second simulation, we generated a 5th-order tensor of size 30 × 30 × 30 × 30 × 30 with TR-ranks (5, 3, 5, 3, 5). Here, we wanted to highlight the performance of the TR decomposition with a prior Tucker compression when it is combined with Algorithms 5 and 6. For the Tucker compression step, we used the multilinear rank (15, 15, 15, 15, 25). The results of this experiment are reported in table 3 and show a significant speed-up of the deterministic algorithms when the randomization technique is used.
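The two evaluation measures above, together with the TR contraction needed to form the approximation, can be sketched in a few lines of NumPy; the contraction routine and the tiny core sizes are our own illustrative choices.

```python
import numpy as np

def tr_to_full(cores):
    """Contract third-order TR cores G_n of shape (R_{n-1}, I_n, R_n) into the
    full tensor: chain the cores, then close the ring by tracing the end ranks."""
    full = cores[0]
    for G in cores[1:]:
        full = np.tensordot(full, G, axes=(-1, 0))
    return np.trace(full, axis1=0, axis2=-1)

def compression_ratio(shape, cores):
    """C1 / C2: entries of the original tensor over parameters of the TR model."""
    return np.prod(shape) / sum(G.size for G in cores)

def relative_error(X, X_hat):
    """Frobenius-norm relative error of an approximation X_hat."""
    return np.linalg.norm(X - X_hat) / np.linalg.norm(X)

# A 10 x 10 x 10 tensor with TR-ranks (2, 3, 2).
rng = np.random.default_rng(0)
cores = [rng.standard_normal(s) for s in [(2, 10, 3), (3, 10, 2), (2, 10, 2)]]
X = tr_to_full(cores)
ratio = compression_ratio(X.shape, cores)  # 1000 entries vs 160 parameters
```

A higher compression ratio means a more economical TR model; here the 1000-entry tensor is represented by 160 core parameters.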
Example 2. In this example, we consider the highly oscillating function f(x) = (x + 1) sin(100 (x + 1)^2) depicted in figure 4. It was shown in [110] and [111] that the discretized forms of such functions, after a tensorization, can be compressed in the TR format. Discretization of this function over the range [−1, 1] with step-size 1/2^24, followed by a tensorization, yields a tensor with low TR-ranks (3, 7, 21, 21) [110,111], which can be computed via the MATLAB function 'mlrank.m' included in the tensorlab toolbox [112]. The MATLAB function 'mlrankest.m' gave the numerical multilinear rank (2, 4, 16, 16), but we used the former in our compression step. We again used Algorithm 12 for the Tucker compression step with the multilinear rank (3, 7, 21, 21). In this experiment, we highlighted the importance of a prior Tucker compression of the data tensor in reducing the running time of TR algorithms with cyclic permutations of the cores. The results of this simulation are reported in table 4 and show the superiority of the randomized algorithms over their deterministic counterparts; all algorithms achieved the same compression ratio.

Example 3. In this simulation, we deal with the tensor completion problem, i.e., the problem of recovering a given data tensor from its partially observed components [113,114]. The TR model has recently been used for recovering the missing elements of incomplete higher-order data tensors in [34,115,116,117]. Here, we used the TR low rank factors (TRLRF) algorithm proposed in [116,117] because, as reported there, it was the most stable and efficient algorithm compared with others in terms of sensitivity to the TR-ranks and reconstruction error. The algorithm is based on a nuclear norm minimization formulation and uses the ADMM algorithm [118] to solve the underlying optimization problem.
The TRLRF implementation requires computing the SVD of some unfolding matrices (lines 86-88 of the MATLAB code 'TRLRF.m'). We modified it into a randomized algorithm by replacing the SVD with Algorithm 3, with block size b = 100 and power iteration q = 1. We call the result the randomized TRLRF (R-TRLRF) algorithm.
Consider the three benchmark images 'Giant', 'Kate', and 'Peppers' depicted in the first column of figure 7. The 'Giant' image was of size 256 × 256 × 3, while 'Kate' and 'Peppers' were of size 512 × 512 × 3. We reshaped 'Giant', 'Kate', and 'Peppers' into 7th-order tensors of sizes 8 × 8 × 4 × 8 × 8 × 4 × 3, 16 × 8 × 4 × 16 × 8 × 4 × 3, and 16 × 8 × 4 × 16 × 8 × 4 × 3, respectively. We randomly removed 80% of the pixels of these images, and the TRLRF and R-TRLRF algorithms were applied to the images with missing pixels with varying TR-ranks R1 = R2 = · · · = R7 = R, for R = 12, 13, ..., 20. The running times of the TRLRF and R-TRLRF algorithms for different TR-ranks were almost the same for all images, so we report them only for the 'Giant' image and TR-rank 20 in figure 5. From figure 5, the better scalability of the R-TRLRF algorithm compared with the TRLRF algorithm is visible. Our simulations confirmed that the R-TRLRF algorithm achieves roughly the same accuracy as the TRLRF algorithm while being computationally much cheaper. The PSNR and RSE of the recovered images obtained by the TRLRF and R-TRLRF algorithms are reported in figure 6, where RSE = ∥X − X̂∥_F / ∥X∥_F, PSNR = 10 log10(255^2 / MSE), and MSE = ∥X − X̂∥_F^2 / num(X), with 'num' denoting the number of entries of a given data tensor. Here again, it is seen that the R-TRLRF algorithm outperforms the TRLRF algorithm. The images recovered by the TRLRF and R-TRLRF algorithms for TR-rank 20 are visualized in figure 7.
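The error measures used in figure 6 can be computed directly from their definitions; the following is a small sketch in which the function names and the toy data are ours.

```python
import numpy as np

def rse(X, X_hat):
    """RSE = ||X - X_hat||_F / ||X||_F."""
    return np.linalg.norm(X - X_hat) / np.linalg.norm(X)

def psnr(X, X_hat):
    """PSNR for 8-bit images: 10 log10(255^2 / MSE),
    with MSE = ||X - X_hat||_F^2 / num(X)."""
    mse = np.sum((X - X_hat) ** 2) / X.size
    return 10.0 * np.log10(255.0 ** 2 / mse)

# A reconstruction off by 1.0 at every pixel has MSE = 1,
# i.e. PSNR = 10 log10(255^2), about 48.1 dB.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 255.0, size=(16, 16, 3))
X_hat = X + 1.0
```

Higher PSNR and lower RSE both indicate a better reconstruction, which is how the completion results in figure 6 are read.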

Conclusion
In this paper, we reviewed and extended a variety of randomized algorithms for the computation of the Tensor Ring (TR) decomposition, which to some extent can also be applied to the TT decomposition by setting R0 = 1. We discussed both adaptive and non-adaptive randomized algorithms for this task. Our main focus was on the random projection technique, which we used to speed up the decomposition of a tensor in the TT/TR format. Simulations were performed on synthetic and real data-sets to support the presentation.

Data Availability
Data sharing is not applicable to this article as no new data were created or analysed in this study. The analyzed data are available online.
The synthetic data were generated using functions of the TR toolbox https://github.com/oscarmickelin/tensor-ring-decomposition.