Tensor networks for interpretable and efficient quantum-inspired machine learning

It is a critical challenge to simultaneously attain high interpretability and high efficiency with the current schemes of deep machine learning (ML). Tensor network (TN), a well-established mathematical tool originating from quantum mechanics, has shown unique advantages in developing efficient ``white-box'' ML schemes. Here we give a brief review of the inspiring progress made in TN-based ML. On the one hand, the interpretability of TN ML rests on a solid theoretical foundation in quantum information and many-body physics. On the other hand, high efficiency can be obtained from the powerful TN representations and the advanced computational techniques developed in quantum many-body physics. With the fast development of quantum computers, TN is expected to give rise to novel schemes runnable on quantum hardware, heading towards ``quantum artificial intelligence'' in the near future.


I. INTRODUCTION
Deep machine learning (ML), such as that based on deep neural networks (NNs), has achieved tremendous success in, e.g., computer vision and natural language processing. However, the dilemma between interpretability and efficiency, a long-standing concern [1-4], has raised several severe challenges. Generally speaking, interpretability is defined as the degree to which a human can understand the cause of a decision, which is critical for questioning, understanding, and trusting deep ML methods [4].
Though universal characterizations of interpretability remain controversial, it is widely recognized that the powerful deep NN models are non-interpretable. Due to their high non-linearity, a rigorous understanding of the underlying mathematics of such models is largely out of reach. Interpretations in these cases are usually "post hoc" (see, e.g., a recent work in Ref. [5]), in contrast to the "intrinsic" interpretation obtained with "white-box" ML models or by knowing how the models work [3,6]. Consequently, investigations are generally conducted in an inefficient trial-and-error manner that consumes significant human and computational resources. Another serious issue brought by non-interpretability concerns robustness. For instance, a well-trained deep NN model might be severely disturbed by noise or intentional attacks (see, e.g., Ref. [7]). Currently, there exist no general theories to quantitatively characterize how far deep NN models can be trusted or how strongly different disturbances affect their predictions. From the perspective of applications, interpretability also concerns several vital issues such as fairness and privacy [1,8].
Among the potential ways of opening the black boxes in deep ML, the (classical) probabilistic theories have drawn wide attention. From a "revisionist" perspective, deep NNs have been combined with, e.g., mutual information, relative entropy, or the physics-inspired renormalization group [9-13], to show how the models process information. These interpretations are mostly post hoc, and help to understand the learning and decision-making processes to a certain extent. But the results or conclusions might strongly depend on the data or the specifics of the models.
The probabilistic ML models [14], such as Bayesian networks [15] and Boltzmann machines [16] (a type of Markov random field), are regarded as "white-box" and intrinsically interpretable. These models can be interpreted in statistical terms that human minds can follow, where we have, e.g., probabilistic reasoning to unveil hidden causal relations [17-19]. Unfortunately, the performance gap between these probabilistic models and state-of-the-art deep NNs is substantial. It seems that high efficiency and interpretability cannot be reached simultaneously with the ML models at hand [20].

II. TENSOR NETWORK: A POWERFUL "WHITE-BOX" MATHEMATICAL TOOL FROM QUANTUM PHYSICS
With the fast development of both classical and quantum computation, tensor network (TN) sheds new light on escaping the dilemma between interpretability and efficiency.

FIG. 2. (Color online) MPS [21] (or the TT form [22]) can efficiently represent or formulate a large group of mathematical objects. For quantum states, it can represent the cat states [32], the AKLT state [33,34], the W-state [35], and the states that can be efficiently simulated by DMRG [36,37] and TEBD [38,39]. Equipped with the scaling theories [40-43], MPS can access the properties of critical states obeying the 1D area law with logarithmic corrections. For ML, MPS can represent restricted Boltzmann machines [16] and quantum Born machines [44,45]. It has been used to parameterize the classification boundaries for supervised learning [46] and the agents for reinforcement learning (RL) [47]. The high-order tensors storing the parameters of NNs can also be formulated as MPSs (or MPOs) for model compression [48-55].

A TN is defined as the contraction of multiple tensors. Its network structure determines how the tensors are contracted. Fig. 1 gives the diagrammatic representations of three kinds of TNs, namely the matrix product state [21] (MPS, also known as the tensor-train form [22] with open boundary condition), the tree TN [23], and the projected entangled pair state [24]. Taking an MPS formed by M tensors as an example, contracting the virtual indexes [the black bonds in Fig. 1(a)] results in an M-th order tensor T satisfying

T_{s_1 s_2 ... s_M} = Σ_{α_1 ... α_{M-1}} A^{(1)}_{s_1 α_1} A^{(2)}_{α_1 s_2 α_2} ⋯ A^{(M)}_{α_{M-1} s_M}. (1)

TN has achieved significant successes in quantum mechanics as an efficient representation of the states of large-scale quantum systems [25-29]. From the perspective of classically computing quantum problems, the dimension of the Hilbert space [30] and the parameter complexity of quantum states scale exponentially with the size of the quantum system, which is known as the "curse of dimensionality" or the "exponential wall". This makes large quantum systems inaccessible to conventional methods such as exact diagonalization [31]. TN reduces the parameter complexity of representing a quantum state to merely polynomial, making efficient simulation of a large class of quantum systems feasible. Taking MPS as an example again, the number of parameters is reduced from #(T) ∼ O(d^M) (the number of elements in the M-th order tensor T) to #(MPS) ∼ O(M d χ²) (the total number of parameters in the tensors {A^{(m)}}, m = 1, ..., M), with d = dim(s_m) and χ = dim(α_m).
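As a minimal numerical illustration of the parameter counting above (the shapes, the random tensors, and the use of NumPy are our own choices, not from the review), the following sketch contracts a random MPS into the full M-th order tensor and compares #(T) ∼ O(d^M) with #(MPS) ∼ O(M d χ²):

```python
import numpy as np

# Sketch: an MPS with M tensors A[m] of shape (chi_left, d, chi_right),
# open boundary conditions (bond dimension 1 at the two ends). Contracting
# the virtual (bond) indices yields the full M-th order tensor T.

def random_mps(M, d, chi, seed=0):
    rng = np.random.default_rng(seed)
    dims = [1] + [chi] * (M - 1) + [1]
    return [rng.standard_normal((dims[m], d, dims[m + 1])) for m in range(M)]

def contract_mps(mps):
    # Running contraction: after absorbing k tensors the result has
    # shape (1, d**k, chi_k); at the end it is reshaped to (d,) * M.
    T = mps[0]                                     # shape (1, d, chi)
    for A in mps[1:]:
        T = np.einsum('lpa,aqr->lpqr', T, A)       # contract the shared bond
        T = T.reshape(T.shape[0], -1, T.shape[-1])
    return T.reshape([mps[0].shape[1]] * len(mps))

M, d, chi = 6, 2, 4
mps = random_mps(M, d, chi)
T = contract_mps(mps)
n_full = T.size                     # O(d**M): exponential in M
n_mps = sum(A.size for A in mps)    # O(M * d * chi**2): linear in M
```

For M = 6, d = 2, χ = 4 this gives 64 elements in T against 144 parameters in the MPS; the crossover in favor of the MPS grows rapidly with M because only the left-hand count is exponential.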
Among the TN theories, it has been revealed that the states satisfying the "area laws" [56] of entanglement entropy [57] can be efficiently approximated by TN representations with finite bond dimensions [24,29,58,59]. Entanglement entropy [60] is a fundamental concept in quantum sciences [61].
From a statistical perspective, the entanglement entropy between two subparts of a system characterizes the amount of information gained about one subpart by knowing the other. Luckily, most of the states that we care about satisfy such area laws, meaning that the entanglement entropy scales not with the volume of the subpart but with the length of its boundary. For instance, MPS obeys the one-dimensional area law, where the boundary consists of two zero-dimensional points and the entanglement entropy is a constant. Such an area law is satisfied by the low-lying eigenstates of many one- and quasi-one-dimensional quantum lattice models [62-68], including those with non-trivial topological properties [33,34]. Therefore, the MPS-based algorithms, including the density matrix renormalization group (DMRG) [36,37] and time-evolving block decimation (TEBD) [38,39], exhibit remarkable efficiency for simulating such systems.
Moreover, a large class of artificially constructed states widely used in quantum information processing and computing can also be represented by MPS, such as the Greenberger-Horne-Zeilinger (GHZ) states (also known as cat states) [32] and the W-state [35] (see the orange circles in Fig. 2). The multi-scale entanglement renormalization ansatz (MERA) is designed to exhibit a logarithmic scaling of the entanglement entropy, and thus efficiently represents critical states [69-71]. The projected entangled pair state (PEPS) is proposed to obey the area laws in two and higher dimensions, and has achieved tremendous success in studying higher-dimensional quantum systems [24,72-74]. In short, the area laws of entanglement entropy provide intrinsic interpretations of the representational and computational power of TN for simulating quantum systems. Such interpretations also apply to TN ML. Furthermore, the TNs representing quantum states can be interpreted through Born's quantum-probabilistic interpretation (also known as the Born rule) [75]. Thus, TN is regarded as a "white-box" numerical tool (called a Born machine [44]), akin to the (classical) probabilistic models for ML. We will focus on this point in the next section.
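To make the MPS form of such artificially constructed states concrete, here is a small self-contained sketch (our own construction, using the standard bond-dimension-2 cores) of the M-qubit GHZ/cat state:

```python
import numpy as np

# A bond-dimension-2 MPS for the GHZ (cat) state (|0...0> + |1...1>)/sqrt(2).
# The bulk core simply copies the physical index along the virtual bond, so
# the contraction is nonzero only when all physical indices agree.
M = 5
bulk = np.zeros((2, 2, 2))
bulk[0, 0, 0] = bulk[1, 1, 1] = 1.0
cores = ([np.eye(2).reshape(1, 2, 2)]        # left boundary core
         + [bulk] * (M - 2)
         + [np.eye(2).reshape(2, 2, 1)])     # right boundary core

# Contract all virtual bonds to recover the 2**M amplitudes.
T = cores[0]
for A in cores[1:]:
    T = np.einsum('l...a,asb->l...sb', T, A)
psi = T.reshape(-1) / np.sqrt(2)             # normalized amplitudes
```

Only the all-zeros and all-ones amplitudes are nonzero, while the MPS stores O(M) numbers instead of 2^M dense amplitudes.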

III. TENSOR NETWORK FOR QUANTUM-INSPIRED MACHINE LEARNING
Equipped with well-established theories and efficient methods, TN illuminates a new avenue toward tackling the dilemma of interpretability and efficiency in ML. To this end, two intertwined lines of research are being actively pursued:
1. How do quantum theories serve as the mathematical foundation for the interpretability of TN ML?
2. How do the quantum-mechanical TN methods and quantum computing techniques give rise to TN ML schemes with high efficiency?
Focusing on these two questions, below we introduce the recent inspiring progress on TN for quantum-inspired ML from three aspects: feature mapping, modeling, and ML by quantum computation. These are closely related to the advantages of TN for ML in gaining both efficiency and interpretability. As the theories, models, or methods are taken from or inspired by those in quantum physics, these ML schemes are often called "quantum-inspired" (see, e.g., a recent work in Ref. [76]). But note that significantly more effort is required to reach a systematic framework of interpretability based on quantum physics. The main methods in TN ML mentioned below, together with their relations to efficiency and interpretability, are summarized in Table I.
A. Quantum feature mappings and kernel functions

Previous works have provided us with stimulating hints on quantum-inspired ML based on TN. A fundamental step for a quantum treatment of ML is to map the data to the Hilbert space, i.e., to encode them in quantum states [77]. A sample in machine learning can be an image, a sentence, a piece of time series, etc., which can normally be regarded as a vector. The vector elements are called the features of the sample. The encoding of samples into quantum states is flexible [78-82]. A straightforward way is to treat the features as the amplitudes of a quantum state. For instance, a sample with M features (x_1, ..., x_M) can be encoded as |ψ⟩ = (1/Z) Σ_{m=1}^{M} x_m |ϕ_m⟩, with Z the normalization factor and {|ϕ_m⟩} a set of complete basis states [83]. The Born rule says that the probability of obtaining the state |ϕ_m⟩ from |ψ⟩ equals the squared modulus of the corresponding amplitude, satisfying P(|ϕ_m⟩) = |x_m/Z|². It was also proposed to encode the features into the amplitudes of a state of ⌈log₂ M⌉ qubits [82], to avoid the usage of high-level qudits [84].
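A minimal sketch of this amplitude encoding (the zero-padding convention and the function names are our own, not from the review): M features become the amplitudes of a state on ⌈log₂ M⌉ qubits, with Born-rule probabilities |x_m/Z|²:

```python
import numpy as np

# Sketch of amplitude encoding: the M features of a sample become the
# amplitudes of a state on ceil(log2 M) qubits (zero-padded), with
# Born-rule probabilities P(m) = |x_m / Z|**2.
def amplitude_encode(x):
    x = np.asarray(x, dtype=float)
    n_qubits = int(np.ceil(np.log2(len(x))))
    padded = np.zeros(2 ** n_qubits)
    padded[: len(x)] = x
    Z = np.linalg.norm(padded)        # normalization factor
    return padded / Z, n_qubits

x = [3.0, 4.0, 0.0, 0.0, 0.0]         # M = 5 features -> 3 qubits
psi, n = amplitude_encode(x)
probs = np.abs(psi) ** 2              # Born-rule probabilities, sum to 1
```

Here Z = 5, so the first two features carry probabilities 0.36 and 0.64.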
A different way is to encode samples into the Hilbert space of quantum many-body states, which is dubbed quantum many-body feature mapping (QMFM). Each feature (a scalar) is mapped onto a two-component vector, which can be treated as the state of one qubit [80], as

|x_m⟩ = cos(π x_m / 2)|0⟩ + sin(π x_m / 2)|1⟩, (2)

where {|s⟩} (s = 0, 1) can be chosen as the two eigenstates of the Pauli operator σ_z [85], and we normally assume the features to be rescaled to 0 ≤ x_m ≤ 1. In other words, x_m is mapped to a normalized two-component complex vector. The probability of obtaining the state |s⟩ from |x_m⟩ follows the Born rule. One may also choose an equivalent mapping x_m → |x_m⟩ = R(x_m)|0⟩, with R a rotation operator and x_m the rotation angle, for the purpose of quantum computation. By means of QMFM, a sample (vector) x is mapped to an M-qubit product state

|x⟩ = ⊗_{m=1}^{M} |x_m⟩, (3)

with M = dim(x) the number of features (or qubits in the state). One can see that a non-linear feature-mapping function will definitely introduce non-linearity in processing the sample x, even though the subsequent treatments of the encoded state in the Hilbert space might be linear. This non-linearity would in principle not harm the interpretability, but might endow the quantum-inspired ML schemes with an accuracy competitive with that of non-linear ML models when non-linearity is key to gaining high accuracy. Developing new quantum feature mappings for high accuracy is currently an open issue.
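The qubit feature map and the resulting product state can be sketched as follows (we assume the common cos/sin form of the mapping; all names are illustrative):

```python
import numpy as np

# Sketch of the qubit feature map: each feature x_m in [0, 1] becomes a
# normalized two-component vector, and a sample becomes the M-qubit
# product state formed by their tensor (Kronecker) product.
def qubit_map(x_m):
    return np.array([np.cos(np.pi * x_m / 2), np.sin(np.pi * x_m / 2)])

def qmfm(x):
    state = np.array([1.0])
    for x_m in x:
        state = np.kron(state, qubit_map(x_m))   # product state, dim 2**M
    return state

x = [0.0, 0.5, 1.0]
psi = qmfm(x)        # 8-dimensional, normalized product state
```

Feature 0 maps to |0⟩, feature 1 maps to |1⟩, and feature 0.5 to an equal superposition, so only two amplitudes of psi are nonzero here.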
There are in general two ways to process the encoded data. One follows the ideas of kernel-based methods such as K-means [86], K-nearest neighbors [87,88], and the support vector machine (SVM) [89]. The kernel function in quantum-inspired ML is determined by both the feature-map function [e.g., Eq. (2)] and the chosen measure of similarity (or distance) in the Hilbert space. A frequently used measure of similarity between two quantum states is the fidelity, defined as the norm of the inner product of the two states [60]. With the QMFM in Eq. (2), the fidelity between two encoded states becomes the cosine similarity between the two samples, where the kernel function reads

K(x, x′) = Π_{m=1}^{M} cos[π(x_m − x′_m)/2]. (4)

The above measure of similarity decays exponentially with the number of features (or qubits) M, which is known as the "catastrophe of orthogonality". Modifications of the measure were proposed, such as the rescaled logarithmic fidelity [90], to avoid this catastrophe for better stability and performance. Exploring proper measures or kernel functions in the Hilbert space remains an open question in the field of quantum-inspired ML.
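Assuming the cos/sin qubit map, the fidelity kernel factorizes over features and shrinks exponentially with M; a short sketch of this "catastrophe of orthogonality" (all names are ours):

```python
import numpy as np

# Sketch: the fidelity kernel between two samples factorizes over features,
# K(x, y) = prod_m cos(pi (x_m - y_m) / 2). Each factor has modulus <= 1,
# so for generic samples the product decays exponentially with M.
def fidelity_kernel(x, y):
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return np.prod(np.cos(np.pi * (x - y) / 2))

rng = np.random.default_rng(1)
decay = [abs(fidelity_kernel(rng.random(M), rng.random(M)))
         for M in (10, 100, 1000)]   # shrinks rapidly as M grows
```

Identical samples give kernel 1, while for random samples the kernel value becomes numerically indistinguishable from zero already at a few hundred features, which motivates rescaled measures such as the logarithmic fidelity.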
Besides the catastrophe of orthogonality, the dimension of the Hilbert space increases exponentially with the number of qubits. TN algorithms can be employed to efficiently represent the encoded data and evaluate similarities, including the fidelity [91]. For instance, developing valid methods to distinguish quantum phases [92] is a long-standing issue, to which quantum-inspired ML has brought new clues. Ref. [93] considered the unsupervised recognition of quantum phases and phase transitions. The ground states of quantum spin models with different physical parameters (such as external magnetic fields), which can be regarded as "quantum data" (obtained by DMRG [36,37]), are represented efficiently by MPSs. The phase transitions are clearly recognized without supervision by visualizing the distribution of these MPSs with non-linear dimensionality reduction [94], where the ground states in the same quantum phase tend to cluster in the same sub-region of the Hilbert space. In this way, prior knowledge of the quantum phases, such as order parameters [95], is not required for identifying them.
Similar observations were reported for the ML of images and texts, where the encoded states (by QMFM) of samples in the same category tend to cluster in the Hilbert space [90]. Clustering and the exponential vastness of the Hilbert space make the classification boundary easier to locate [46,90,96,97]. This shares a similar spirit with SVMs [98]. Another work showed that the classification boundary can be efficiently parameterized by the generative MPS [99], which we will discuss later in this paper. These works raise an open question of how different quantum kernels [78,100] affect ML efficiency for different kinds of data (such as images, texts, multimodal data, and quantum data) and for different ML tasks (supervised learning, unsupervised learning, reinforcement learning, etc.).

TABLE I. The main methods in TN ML with their relations to efficiency and interpretability. The green shadows indicate the existence of established schemes or solid evidence on the corresponding subjects. The yellow shadows indicate that the subjects are promising but with only preliminary results or with disputes.

B. Parameterized modeling and quantum probabilistic interpretation with tensor network for machine learning
We now turn our focus to training the quantum-inspired ML models parameterized by TN. This provides another pathway for processing quantum data (including the ground states for quantum phase recognition and the encoded states mapped from ML samples). Some relatively early progress was made on data clustering based on the quantum dynamics satisfying the Schrödinger equation [101] with a data-determined potential [102,103]. This amounts to inversely solving the differential equations, which can be done efficiently with a small number of variables [104-106]. For dealing with quantum many-body states, whose dimension scales exponentially, TN has been utilized to develop ML models with polynomial complexity [45,46,80,99,107-109], thanks to its high efficiency in representing quantum many-body states and operators.
By taking advantage of its connections to the Born rule and quantum information theories, intrinsically interpretable ML based on TN was developed, which is referred to as quantum-inspired TN ML. The key here is to build a probabilistic ML framework from quantum states, which can be efficiently represented and simulated by TN. In this sense of probabilities, the intrinsic interpretability of such TN ML is akin to, or possibly beyond, the interpretability of classical probabilistic ML. In Ref. [99], MPS was suggested to formulate the joint probability distribution of features provided by a dataset, and is used to implement generative tasks. Provided with a trained MPS |Ψ⟩, the probability of a given sample x is determined by the quantum probability of obtaining |x⟩ [Eq. (3)] by measuring |Ψ⟩, satisfying

P(x) = |⟨x|Ψ⟩|². (5)

Following Born's quantum probabilistic rule, the marginal and conditional probabilities can be naturally defined. For instance, the marginal probability distribution of the m-th feature x_m satisfies

P(x_m) = ⟨x_m|ρ^(m)|x_m⟩, (6)

where |x_m⟩ is defined by Eq. (2) and ρ^(m) = Tr_{/m} |Ψ⟩⟨Ψ| is the reduced density operator of the m-th qubit, obtained by tracing over the degrees of freedom of all qubits except the m-th. The marginal probability distributions can be used for, e.g., generation [99] and simulating onsite entanglement [110,111] (which we will introduce below). The conditional probabilities are useful for, e.g., classification [112] and data fixing [99,113]. The conditional probability distribution of x_m given the values of the rest of the features satisfies

P(x_m | {x_{m′}}_{m′≠m}) = |⟨x_m|ϕ^(m)⟩|², (7)

where |ϕ^(m)⟩ = ⟨Ψ| ⊗_{m′≠m} |x_{m′}⟩ / Z is the quantum state obtained by collapsing all the qubits but the m-th to ⊗_{m′≠m} |x_{m′}⟩ according to the known values.
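A dense-vector sketch of these Born-rule probabilities (an MPS would replace the dense |Ψ⟩ at realistic sizes; all shapes and names here are illustrative): the sample probability |⟨x|Ψ⟩|² and the marginal of the m-th feature via the reduced density matrix ρ^(m):

```python
import numpy as np

# Sketch with a dense state vector Psi on M qubits; the cos/sin qubit
# feature map is assumed. An MPS replaces the dense vector at scale.
def qubit_map(x_m):
    return np.array([np.cos(np.pi * x_m / 2), np.sin(np.pi * x_m / 2)])

def encode(x):                          # product state |x> of the sample
    ket = np.array([1.0])
    for x_m in x:
        ket = np.kron(ket, qubit_map(x_m))
    return ket

M = 3
rng = np.random.default_rng(2)
Psi = rng.standard_normal(2 ** M)
Psi /= np.linalg.norm(Psi)              # a normalized "trained" state

def born_probability(x):
    return abs(encode(x) @ Psi) ** 2    # P(x) = |<x|Psi>|**2

def reduced_density(m):
    # Reshape to one leg per qubit, split qubit m from the rest, and
    # trace out everything except qubit m.
    A = np.moveaxis(Psi.reshape([2] * M), m, 0).reshape(2, -1)
    return A @ A.conj().T

rho = reduced_density(1)
def marginal(x_m):
    v = qubit_map(x_m)
    return v @ rho @ v                  # P(x_m) = <x_m| rho^(m) |x_m>
```

The reduced density matrix has unit trace, so the marginal is a valid probability for every feature value.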
The quantum probabilistic interpretation gave birth to various quantum-inspired ML schemes for, e.g., model optimization [80,99,108,114], data generation [45,99], anomaly detection [115], compressed sampling [113], solving differential equations [116], and constrained combinatorial optimization [117-119]. Some important ML proposals based on MPS are illustrated in Fig. 2 (see the green circles). These schemes are based on probabilities obeying the Born rule, where the probability distributions can be efficiently represented and calculated with TN.
The calculation of the above probability distributions involves all coefficients of the quantum state |Ψ⟩, whose number increases exponentially with the number of features M. Note that we normally have M ∼ 10² or more (say M = 784 for the MNIST dataset of images of handwritten digits [120]), meaning 2^M ≃ 10^236 coefficients in the state |Ψ⟩. The advantage of TN in efficiency here is the same as that of using TN for quantum simulations. Taking MPS as an example again, its parameter complexity for representing the probability distributions scales just linearly with M [one may refer to Eq. (1) and the text below it]. With the MNIST dataset, accurate generation can be achieved by an MPS with about 10^8 parameters [99,113]. Similar advantages in efficiency can be gained with other kinds of TN, such as MERA and PEPS, using the corresponding TN contraction algorithms [27]. We shall stress that the high efficiency of TN we refer to here is for representing or simulating the quantum states in the quantum-inspired ML. Another aspect of efficiency comes from the power of quantum computation, which we will discuss later in Sec. III C.
In the construction of generative models for ML, a quantum state |Ψ⟩ satisfying P(x^(n)) ≃ 1/N could be the equal superposition of the encoded product states {|x^(n)⟩} [Eq. (3)],

|Ψ⟩ = (1/√N) Σ_{n=1}^{N} e^{iθ_n} |x^(n)⟩, (8)

with N the number of samples. The equal probability distribution can be deduced from the above equation with the "catastrophe" of orthogonality, |⟨x^(n)|x^(n′)⟩| ≃ δ_{nn′} for large M [one may refer to Eq. (4)]. Taking the phase factors e^{iθ_n} = 1, such a state is called the lazy-learning state [46], since it contains no variational parameters and is directly constructed from the training samples when used for classification.
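The near-equal Born probabilities of the lazy-learning superposition can be checked numerically without ever building the 2^M-dimensional vectors, by working with overlaps only (a sketch assuming the cos/sin map; all names are ours):

```python
import numpy as np

# Sketch: the "lazy-learning" state, an equal superposition of the N encoded
# training samples with phase factors set to 1. We only ever use the overlaps
# <x^(n)|x^(n')> = prod_m cos(pi (x^(n)_m - x^(n')_m) / 2), never dense
# 2**M vectors. For large M the encoded samples are nearly orthogonal,
# so each training sample carries probability close to 1/N.
rng = np.random.default_rng(3)
N, M = 4, 50
X = rng.random((N, M))                  # N samples with M features each

# Gram matrix of pairwise overlaps (diagonal is exactly 1).
G = np.prod(np.cos(np.pi * (X[:, None, :] - X[None, :, :]) / 2), axis=2)

amp = G.sum(axis=1) / np.sqrt(N)        # <x^(n)|Psi> before normalization
norm2 = G.sum() / N                     # <Psi|Psi>
probs = amp ** 2 / norm2                # Born probabilities of the samples
```

With M = 50 the off-diagonal overlaps are already negligible, so the probabilities land on 1/N as the text argues.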
The generative MPS proposed in Ref. [99] is essentially an approximation of the lazy-learning state with variationally determined phase factors and bounded dimensions of the virtual indexes [{α_m} in Eq. (1)]. The approximation restricts the upper bound of the entanglement in the TN state by discarding the small elements in the entanglement spectrum, which is a widely recognized technique in the TN approaches to quantum physics. The reported results on supervised learning for classifying images showed that the generative MPS exhibits lower accuracy on the training set but, remarkably, higher accuracy on the testing set than the lazy-learning state [46]. In other words, the approximation made in the generative MPS suppresses over-fitting. This implies possible connections of the quantum superposition rule and quantum entanglement to generalization ability and over-fitting in ML [121,122], which is worth exploring in the future.
The quantum probabilistic nature allows one to introduce the physical concepts and statistical properties of TN models into the investigations of ML. For instance, the entanglement entropy has been used for feature selection [110,111]. The importance of a feature can be characterized by how strongly the corresponding qubit is entangled with the others, using the onsite entanglement entropy [110]

S^(m) = −Tr[ρ^(m) ln ρ^(m)], (9)

with ρ^(m) the reduced density matrix of |Ψ⟩ for the m-th qubit. Note that the onsite entanglement entropy is the von Neumann entropy associated with the marginal probabilities in Eq. (6).
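The onsite entropy used for feature selection reduces to the von Neumann entropy of the single-qubit reduced density matrix; a small self-contained check on two extreme cases (the example states and names are ours):

```python
import numpy as np

# Onsite entanglement entropy S^(m) = -Tr[rho^(m) ln rho^(m)] of the m-th
# qubit of a state Psi on M qubits, usable as a feature-importance score.
def onsite_entropy(Psi, m, M):
    # Split qubit m from the rest and trace the rest out.
    A = np.moveaxis(np.asarray(Psi).reshape([2] * M), m, 0).reshape(2, -1)
    rho = A @ A.conj().T                  # reduced density matrix rho^(m)
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]                      # drop numerical zeros
    return float(-(w * np.log(w)).sum())

bell = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)   # maximally entangled
product = np.kron([1.0, 0.0], [0.6, 0.8])            # unentangled pair
```

The Bell pair gives the maximal single-qubit entropy ln 2 (the qubit is maximally important to the rest), while any product state gives zero.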
Other quantities and theories in the quantum sciences, such as quantum mutual information [123], quantum correlations [124], decoherence [125], and the controllability of quantum systems [126], also help to enhance ML interpretability. These issues are still under active debate, illuminating a promising path to characterizing ML-relevant capabilities such as learnability and generalization power. Entanglement scaling was applied to unveil the properties of quantum ML models such as quantum NNs [127,128]. The controllability of quantum-mechanical systems based on the dynamical Lie algebra was applied to understand and handle over-parameterization [129] and gradient vanishing [130] (also known as barren plateaus [131]), which are critical issues for ML based on classical or quantum methods.
To give a brief summary of this subsection, we introduced the probabilistic framework for quantum-inspired TN ML. This framework allows us to borrow theories and techniques from both the classical and quantum information sciences, so that quantum-inspired ML can possess interpretability equal to or beyond that of classical probabilistic ML. But many issues have not been sufficiently explored, and more effort needs to be made in this newly emerging area.

C. Quantum-inspired tensor network machine learning with quantum computation
In this subsection, we concentrate on the combination of quantum-inspired TN ML with quantum computational methods and techniques, mainly for the purpose of high efficiency. The key here is the ability of TN to efficiently represent quantum operations, which is essential for simulating physical processes in quantum mechanics such as dynamics and thermodynamics [132-138]. For TN ML with quantum computation, TN can serve as a mathematical representation of quantum circuit models [139]. As the quantum counterpart of classical logic circuits, a quantum circuit is composed of multiple quantum gates [140], which are usually unitary operators executable on quantum computers. Efficient ways of deriving the quantum circuits that implement quantum algorithms are key to quantum computation and to ML by quantum computers.
However, the quantum algorithms whose circuits can be analytically derived (e.g., Shor's algorithm for factoring [141], Grover's algorithm for searching [142], and the disentangling circuits recently proposed for preparing MPSs [143]) are extremely rare. This places strong limitations on quantum computation, including its applications to ML.
Variational quantum algorithms (VQAs) [144] were proposed to significantly extend the scope of problems that quantum computation can handle. Among others, quantum circuits containing variational parameters, dubbed variational quantum circuits (VQCs), have been used for, e.g., state preparation and tomography [145,146]. Variational quantum eigensolvers [147] were developed and applied to a wide range of areas such as quantum chemistry and quantum materials (one may refer to a recent review in Ref. [148]).
VQAs concern hybrid classical-quantum optimizations [149], which also suffer from the "curse of dimensionality". Naturally, TN has been employed as an efficient mathematical tool for the classical part of the computation, allowing one to stably access qubit numbers equal to or even much larger than what current quantum computers can handle [146,150-156].
The relevant works provided valuable information on the races between the powers of classical and quantum computations in the noisy intermediate-scale quantum (NISQ) era [157], such as the efficiency of random-circuit sampling by the quantum computer of Google [158] and by the TN simulations [153,155,[159][160][161][162].
The previous works on TN and quantum computation have unveiled the underlying connections among quantum mechanics, quantum computing, and ML, which make TN a uniquely suitable tool for exploring ML incorporated with quantum computation [163]. As an example, let us compare state preparation by a VQC with supervised ML by an NN. The task of state preparation is to obtain a given target state |ψ⟩ on a quantum computer with high fidelity, say |ψ⟩ ≃ Û|0⟩ with Û a unitary transformation. The VQC is used to represent the mapping Û, similar to the feed-forward mapping defined by an NN. As the building blocks of the VQC, the quantum gates with variational parameters are analogous to the neurons in an NN. The preparation error can be evaluated by the infidelity f_in = 1 − |⟨ψ|Û|0⟩| (a measure of distance in the Hilbert space) [60] between the target state and the one prepared by the VQC, analogous to the loss function (or error) in ML. The gradients and the gradient-descent process for optimizing the parameters in the VQC can be implemented by TN methods or automatic differentiation techniques [143,146,164-167], analogous to the backward propagation in ML. The optimized circuit, after necessary compilation [168], can be distributed to quantum computers for further use.
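The analogy between VQC training and ML optimization can be illustrated with a deliberately tiny example (the one-parameter rotation circuit, the target angle, and the learning rate are all our own toy choices): minimize the infidelity f_in = 1 − |⟨ψ|Û(θ)|0⟩| by gradient descent:

```python
import numpy as np

# Toy sketch of variational state preparation: a one-parameter "circuit"
# U(theta) = Ry(theta) acting on |0>, trained by gradient descent on the
# infidelity, in direct analogy to minimizing a loss function in ML.
def Ry(theta):
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

target = Ry(1.2) @ np.array([1.0, 0.0])        # the target state |psi>

def infidelity(theta):
    prepared = Ry(theta) @ np.array([1.0, 0.0])
    return 1.0 - abs(target @ prepared)        # f_in = 1 - |<psi|U|0>|

theta, lr, eps = 0.0, 0.5, 1e-6
for _ in range(200):
    # Central finite-difference gradient (a stand-in for parameter-shift
    # rules or automatic differentiation).
    grad = (infidelity(theta + eps) - infidelity(theta - eps)) / (2 * eps)
    theta -= lr * grad
```

The loop drives θ to the target angle 1.2, after which the "circuit" prepares the target state with essentially zero infidelity.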
TN has been extensively used in quantum-inspired ML combined with quantum computation methods, including VQCs [108,112,169-174]. The utilizations of TN in this sense can be summarized in three distinct but related ways. First, TNs with unitary constraints are used as parameterized ML models runnable on quantum platforms [108,112,139,170,173]. Here, the equivalence between unitary TNs and quantum circuits is utilized. It is expected that the computational power (with, e.g., parallel quantum computing [175]) would bring high efficiency to the ML schemes running on quantum platforms. Second, TN methods are employed to classically optimize or simulate the VQCs for machine learning [47,172,173]. Such classical TN simulations have already been widely applied to the races between the powers of classical and quantum computers [153,155,159-162], and are expected to provide useful information to guide future investigations of ML running on quantum platforms. Third, TNs are used for data preprocessing before entering the quantum computational procedures. For example, data compression by TN was suggested to lower the requirements on quantum hardware [169]. TN is also used as an efficient representation of quantum data (such as the states of quantum systems and the quantum states encoded from classical data) for, e.g., classification and ML-based quantum control [47,81]. These studies are particularly useful for the quantum computation of ML in the NISQ era, when the power of quantum hardware (number of qubits, stability, etc.) is for the moment limited.

IV. TENSOR NETWORK ENHANCING CLASSICAL MACHINE LEARNING
As a fundamental mathematical tool, the applications of TN to ML are not limited to those obeying the quantum probabilistic interpretation. Given that TN can be used to efficiently represent and simulate the partition functions of classical stochastic systems, such as the Ising and Potts models (see, e.g., Refs. [176-178]), the relations between TN and Boltzmann machines [16] have been extensively studied [179-182]. The relevant works also promoted the investigation of quantum many-body physics and ML from the perspective of, e.g., the area laws of entanglement entropy and the representational power of TN as a quantum state ansatz [127,182-184].
TNs were also used to enhance NNs or to develop novel ML models [185], setting aside any probabilistic interpretation. The tensor-train [22] and tensor-ring [186] forms (which correspond to MPSs with open and periodic boundary conditions, respectively) have been applied to develop novel support vector machines [46,187,188], dimensionality reduction schemes [189], and parameterized ML models [190-195]. Significant reductions of parameter complexity were reported, thanks to the efficiency of TN in representing higher-order tensors, which is essentially the same reason for the high efficiency of TN in describing quantum many-body systems.
On the same grounds, model compression methods were proposed to decompose the variational parameters of NNs into TNs, or to directly represent the variational parameters as TNs [48-55,196-199]. Explicit decomposition procedures might not be necessary for the latter, where the parameters of NNs are not stored as tensors but directly as tensor-train (tensor-ring) forms [196], matrix product operators [51,52], or deep TNs [55]. Non-linear activation functions were incorporated with TN to improve the ML performance [54,55,194], generalizing TN from a class of multi-linear models to non-linear ones.
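The decomposition of a high-order weight tensor into a tensor train by successive truncated SVDs can be sketched as follows (the shapes, ranks, and function names are illustrative, not from any of the cited schemes):

```python
import numpy as np

# Sketch of tensor-train (TT) compression: split a high-order tensor into
# a chain of 3-index cores by successive truncated SVDs, then compare the
# parameter counts of the dense tensor and of the TT cores.
def tt_decompose(T, max_rank):
    dims, cores, r = T.shape, [], 1
    A = T.reshape(r * dims[0], -1)
    for k in range(len(dims) - 1):
        U, S, Vt = np.linalg.svd(A, full_matrices=False)
        rk = min(max_rank, len(S))
        cores.append(U[:, :rk].reshape(r, dims[k], rk))
        r = rk
        A = (np.diag(S[:rk]) @ Vt[:rk]).reshape(r * dims[k + 1], -1)
    cores.append(A.reshape(r, dims[-1], 1))
    return cores

def tt_contract(cores):
    T = cores[0]
    for C in cores[1:]:
        T = np.einsum('...a,asb->...sb', T, C)   # absorb the next core
    return T.reshape(T.shape[1:-1])

# Build a tensor that genuinely has TT rank 2, then recover it exactly.
rng = np.random.default_rng(4)
shapes = [(1, 3, 2), (2, 3, 2), (2, 3, 2), (2, 3, 1)]
true_cores = [rng.standard_normal(s) for s in shapes]
T = tt_contract(true_cores)

cores = tt_decompose(T, max_rank=2)
T2 = tt_contract(cores)
n_full = T.size                       # 3**4 = 81 dense entries
n_tt = sum(c.size for c in cores)     # 36 TT parameters
```

Because the input tensor has TT rank 2 at every cut, the truncated SVDs lose nothing and the 81-entry tensor is reproduced exactly from 36 parameters; for NN weight tensors the truncation instead trades a controlled error for compression.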

V. DISCUSSION
Methodologies to resolve the dilemma between efficiency and interpretability in artificial intelligence (AI), and particularly in deep ML, have long been a concern. We here review the inspiring progress made in TN for interpretable and efficient quantum-inspired ML. The advantages of TN for ML are listed in the "TN-ML butterfly" in Fig. 3. For quantum-inspired ML, the advantages of TN can be summarized into two critical points: quantum theories for interpretability, and quantum methods for efficiency. On the one hand, TN enables us to apply quantum theories and statistics, such as the entanglement theories, to build a probabilistic framework for interpretability that might go beyond the descriptions of classical information and statistical theories. On the other hand, the powerful quantum-mechanical TN algorithms and the rapidly advancing quantum computing technologies will empower the quantum-inspired TN-ML methods with high efficiency on both classical and quantum computational platforms.
Particularly with the striking progress recently made in the generative pre-trained transformer (GPT) [200], unprecedented surges in model complexity and computational power have occurred, which bring new opportunities and challenges to TN ML. Interpretability will become increasingly valuable when facing the emergent AI of GPT, not just for investigations with higher efficiency but also for their better use and safer control. In the current NISQ era and the forthcoming era of genuine quantum computing, TN is rapidly growing into a featured mathematical tool for exploring quantum AI from the perspectives of theories, models, algorithms, software, hardware, and applications.

[56] One may use an everyday example to understand the area law. If every person in a land can only communicate with the persons in a short range nearby, the people who can communicate with a different village must live near the borders. Therefore, the amount of exchanged information between a village and the rest should scale with the length of its borders. In this case, the area law is obeyed. If, someday, phones are introduced to this land and people become able to communicate with anyone, the amount of exchanged information of a village should scale with its population, or approximately the size of its territory. This is named the volume law, where the amount of exchanged information obviously increases much faster than in the area-law case as the village expands its territory.

[61] Entanglement can be understood as a description of quantum correlations. Strong entanglement between two quantum particles means that operating on one particle significantly affects the state of the other. Entanglement entropy is a measure of the strength of entanglement. For instance, zero entanglement entropy between two particles means that their state should be a product state.

FIG. 3. (Color online) The "TN-ML butterfly" summarizing the two unique advantages: interpretability based on quantum theories (left wing) and efficiency based on quantum methods (right wing).