Quantum discriminant analysis for dimensionality reduction and classification

We present quantum algorithms to efficiently perform discriminant analysis for dimensionality reduction and classification over an exponentially large input data set. Compared with the best-known classical algorithms, the quantum algorithms show an exponential speedup in both the number of training vectors M and the feature space dimension N. We generalize the previous quantum algorithm for solving systems of linear equations (2009 Phys. Rev. Lett. 103 150502) to efficiently implement a Hermitian chain product of k trace-normalized N × N Hermitian positive-semidefinite matrices with time complexity O(log N). Using this result, we perform linear as well as nonlinear Fisher discriminant analysis for dimensionality reduction over M vectors, each in an N-dimensional feature space, in time O(p polylog(MN)/ϵ³), where ϵ denotes the tolerance error, and p is the number of principal projection directions desired. We also present a quantum discriminant analysis algorithm for data classification with time complexity O(log(MN)/ϵ³).


I. INTRODUCTION
With the rise of big data analysis and machine learning in the modern era, techniques such as dimensionality reduction and classification have gained significant importance in the information sciences. In machine learning and statistical analysis problems, when input vectors are given in an extremely large feature space, it is often necessary to reduce the data to a more manageable dimension/size before manipulation or classification. One classical example is the problem of face recognition [8,9], where the size of the feature space is determined by the huge number of pixels representing each face. More recent applications are also seen in medical imaging. For instance, [11] shows the necessity of dimensionality reduction in diagnosing cases of liver cirrhosis, and [12] shows the importance of dimensionality reduction for early Alzheimer's disease detection.
One widely used technique for dimensionality reduction is principal components analysis (PCA), where the data is projected onto the directions of maximal variance. However, a significant disadvantage of PCA is that it looks only at the overall data variance and does not consider the class labels. The extreme example would occur if the overall data variance lies in exactly the same direction as the maximal within-class data variance, but orthogonal to the direction of maximal between-class data variance. In such a case, it is possible for a PCA projection to completely overlap the data from different classes, making it impossible to use the projected data to perform future discriminations. Fisher's linear discriminant analysis (LDA) is a technique developed to overcome this problem by instead projecting the data onto directions that maximize the between-class variance while minimizing the within-class variance of the training data. It is hence not surprising that LDA has been shown to be more effective than PCA in machine learning problems involving dimensionality reduction before classification [8,9].
Another common application of discriminant analysis is to use it as a classifier itself, where labeled training vectors are presented as input and new cases must be efficiently assigned to their respective classes. The discriminant analysis classifier has recently been used in medical analysis, such as in analyzing electromyography (EMG) signals [13,14], lung cancer classification [15], and breast cancer diagnosis [16]. While other algorithms such as the support vector machine (SVM) reach similar accuracy rates as the discriminant analysis classifier [17,18], studies [18] show that discriminant analysis is a significantly more stable model. This is because the separating hyperplane chosen by the SVM can depend on only a few support vectors, subjecting it to high variance in the case of training vector errors. On the other hand, since discriminant analysis bases its classification on the entire class means and variances, it tends to show less fluctuation and is potentially more robust in the presence of error.
A significant drawback of discriminant analysis in both dimensionality reduction and classification is the time complexity. Even the best existing classical algorithms for LDA dimensionality reduction require time O(Ms) [7], where M is the number of training vectors given, and s is the sparseness (maximum number of nonzero components) of the training vectors.

II. DISCRIMINANT ANALYSIS

Dimensionality Reduction

The classical LDA dimensionality reduction algorithm is designed to return the directions of projection that maximize the between-class variance (for class discrimination) while minimizing the within-class variance. With this result, in big data problems such as those listed in Section I, vectors in a high-dimensional feature space may be projected onto a lower-dimensional subspace (spanned by the returned optimal unit vectors), so that fewer resources may be used to store the same amount of information. Suppose we are given M (real-valued) input data vectors {x_i ∈ R^N : 1 ≤ i ≤ M}, each belonging to one of k classes. Let µ_c denote the within-class mean (centroid) of class c, and let x̄ denote the mean/centroid of all data points. Furthermore, let

S_B = Σ_{c=1}^{k} (µ_c − x̄)(µ_c − x̄)^T   (1)

denote the between-class scatter matrix of the dataset, and let

S_W = Σ_{c=1}^{k} Σ_{x_j ∈ c} (x_j − µ_c)(x_j − µ_c)^T   (2)

denote the within-class scatter matrix. The goal is then to find a direction of projection w ∈ R^N that maximizes the between-class variance w^T S_B w relative to the within-class variance w^T S_W w. Mathematically, assuming that the classes have approximately multivariate Gaussian distributions with similar covariances, this is the problem of maximizing the objective function (commonly known as Fisher's discriminant)

J(w) = (w^T S_B w) / (w^T S_W w).   (3)

Since the expression for J(w) is invariant under constant rescaling of w, the maximization problem given in (3) is equivalent to the optimization problem

maximize w^T S_B w  subject to  w^T S_W w = 1.   (4)
We are thus minimizing the Lagrangian [1]

L = w^T S_B w − λ (w^T S_W w − 1),   (5)

where λ is the desired Lagrange multiplier. By the Karush-Kuhn-Tucker conditions [27], this means that

S_B w = λ S_W w,   (6)

S_W^{-1} S_B w = λ w.   (7)

It follows that w is an eigenvector of S_W^{-1} S_B. By plugging (7) back into the objective function J(w), we get J(w) = λ. Hence, we choose w to be the principal eigenvector.
The above procedure generalizes easily to higher-dimensional projection subspaces. In this case, we seek p vectors which form a basis for our projection subspace; this corresponds to maximizing the discriminant

J(W) = det(W^T S_B W) / det(W^T S_W W),

where W is the N × p matrix whose columns are the basis vectors. Using the same analysis as above, one can show that the columns of W will be the eigenvectors corresponding to the p largest eigenvalues of S_W^{-1} S_B, as in the case of PCA.
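As a concrete illustration, the construction above can be sketched classically. The following toy example (hypothetical random data; all names are illustrative) builds S_B and S_W and extracts the principal eigenvectors of S_W^{-1} S_B:

```python
# Classical LDA sketch on hypothetical toy data: build the between-class
# scatter S_B and within-class scatter S_W, then take the principal
# eigenvectors of S_W^{-1} S_B as the projection directions.
import numpy as np

rng = np.random.default_rng(0)
N, k = 4, 3                                       # feature dim, number of classes
X = [rng.normal(loc=3.0 * c, size=(20, N)) for c in range(k)]

mu = [x.mean(axis=0) for x in X]                  # class centroids
xbar = np.vstack(X).mean(axis=0)                  # overall centroid

S_B = sum(np.outer(m - xbar, m - xbar) for m in mu)
S_W = sum((x - m).T @ (x - m) for x, m in zip(X, mu))

# Principal eigenvectors of S_W^{-1} S_B span the discriminant subspace.
evals, evecs = np.linalg.eig(np.linalg.solve(S_W, S_B))
order = np.argsort(evals.real)[::-1]
W = evecs[:, order[: k - 1]].real                 # at most k-1 useful directions
print(W.shape)                                    # (4, 2)
```

Note that S_B here has rank at most k − 1, so at most k − 1 projection directions carry discriminant information.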

Classification
Although its most widely used application is probably in dimensionality reduction, discriminant analysis is also commonly used to directly perform data classification. For classification, one constructs the discriminant functions

δ_c(x) = x^T Σ_c^{-1} µ_c − (1/2) µ_c^T Σ_c^{-1} µ_c + π_c   (8)

for each class c, where Σ_c is the covariance matrix for class c, µ_c is the class mean for c as before, and π_c is the prior probability for classifying into class c [19]. Given a vector x, it is then classified into the class c* = argmax_c δ_c(x). From the training vector data, if M_c is the number of training vectors belonging to class c, we can approximate π_c = M_c/M for simplicity, i.e., the probability of classifying into a certain class c is directly proportional to the fraction of training vectors belonging to c [19]. Assuming multivariate Gaussian distributions for each class, we also estimate

Σ_c ≈ (1/M_c) Σ_{j=1}^{M_c} (x_{c,j} − µ_c)(x_{c,j} − µ_c)^T,   (9)

where x_{c,j} denotes the j-th training vector of class c. Note that in the special case where the covariance matrices are all approximately equal (i.e., Σ_c ≈ Σ for all c), Σ is proportional to the scatter matrix S_W given by Eq. (2). In this special case, the functions δ_c are known as linear discriminant functions. In our paper, we present a quantum algorithm to solve the more general case, known as quadratic discriminant analysis (QDA), in time logarithmic in both the number of input vectors M and their dimension N. Our algorithm is easily applicable to the special case of LDA classification. This provides exponential speedup over the fastest existing algorithms, since the classical construction/inversion of Σ_c to evaluate the discriminant functions must require time polynomial in both M and N.
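For intuition, the classification rule can be sketched classically. The snippet below uses hypothetical data and the textbook linear discriminant with a shared pooled covariance and a log-prior term (a common variant of the per-class form above); it evaluates δ_c and picks the argmax:

```python
# Linear discriminant classification sketch (textbook form with ln(pi_c);
# data and parameters here are purely illustrative).
import numpy as np

rng = np.random.default_rng(1)
classes = {0: rng.normal(0.0, 1.0, (50, 3)), 1: rng.normal(2.0, 1.0, (50, 3))}
M = sum(len(v) for v in classes.values())

pooled = np.vstack([v - v.mean(axis=0) for v in classes.values()])
Sigma = pooled.T @ pooled / (M - len(classes))    # pooled covariance estimate

def delta(x, c):
    mu_c = classes[c].mean(axis=0)
    a = np.linalg.solve(Sigma, mu_c)              # Sigma^{-1} mu_c
    return x @ a - 0.5 * mu_c @ a + np.log(len(classes[c]) / M)

x_new = np.array([1.9, 2.1, 2.0])                 # lies in class 1's region
label = max(classes, key=lambda c: delta(x_new, c))
print(label)                                      # expect 1
```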
III. QUANTUM LDA ALGORITHM: DIMENSIONALITY REDUCTION

Assumptions and Initialization
The quantum principal components analysis algorithm of [5] presents methods for processing input vector data if the covariance matrix of the data is efficiently obtainable as a density matrix, under specific assumptions about the vectors given in quantum mechanical form. While our major contributions are also in the processing mechanisms of the within- and between-class covariance matrices, we will describe how to obtain this density matrix under certain assumptions about the input data, as in Refs. [2,4,5].
In our algorithm, similar to the assumptions made in Refs. [2,4,5], we will assume we have quantum access to the training vector data in a quantum RAM (as described in [10]). We will assume that each training vector is stored in the quantum RAM in terms of its difference from the class means. That is, if a training vector x_j belongs to class c_j with centroid µ_{c_j}, we have the Euclidean norm and complex-valued components of the difference vector d_j = x_j − µ_{c_j} stored as floating-point numbers in quantum RAM in polar form (alternatively, if the input is presented directly as the training vectors x_j and the class means µ_c, we may first perform a component-wise subtraction of the given floating-point numbers, by [26]). Following the methodology of [3-5], we will assume the following oracle:

O_1 : |j⟩ → |j⟩ |c_j⟩ |x_j − µ_{c_j}⟩,   (10)

to get the j-th training vector and its class c_j, where |x_j − µ_{c_j}⟩ has already been normalized to one. Similarly, we also assume that we are given the quantum representations of the class centroids |µ_c⟩, in terms of their differences from the overall training vector mean |x̄⟩. That is, if D_c = µ_c − x̄, we assume the oracle

O_2 : |c⟩ → |c⟩ |µ_c − x̄⟩,   (11)

where we similarly assume that |µ_c − x̄⟩ has been normalized to one. These oracles could, as an example, be realizable if the input data is presented in this form as the output of a preceding quantum system, or if the vector components are presented as floating-point numbers in the quantum RAM and the sub-norms of the vectors can be estimated efficiently [4,28-30]. The oracles O_1 and O_2 allow us to construct density matrices proportional to S_W and S_B as follows. By [4,28-30], if the norms of the vectors form an efficiently integrable distribution, we can obtain the states

|ψ⟩ = (1/√N_W) Σ_{j=1}^{M} ‖x_j − µ_{c_j}‖ |j⟩ |x_j − µ_{c_j}⟩,   (12)

|σ⟩ = (1/√N_B) Σ_{c=1}^{k} ‖µ_c − x̄‖ |c⟩ |µ_c − x̄⟩,   (13)

where N_W = Σ_j ‖x_j − µ_{c_j}‖² and N_B = Σ_c ‖µ_c − x̄‖². In both cases, we now take the partial trace over the first register. Then, for the state of Eq. (12), the density matrix of the second register [5] is given by

(1/N_W) Σ_{j=1}^{M} (x_j − µ_{c_j})(x_j − µ_{c_j})^T ∝ S_W,   (14)

and for the state of Eq. (13), the density matrix of the second register is given by

(1/N_B) Σ_{c=1}^{k} (µ_c − x̄)(µ_c − x̄)^T ∝ S_B.   (15)

Assuming our oracles O_1 and O_2, we can hence construct the Hermitian operators S_B and S_W in time O(log(MN)).

LDA Approach
Having initialized the means and the operators S_B and S_W, our main task will be to solve the eigenvector problem of (7). This problem would be simpler if S_W^{-1} S_B were Hermitian positive semidefinite. However, we can reduce it to an eigenvalue problem for a Hermitian matrix: specifically, since S_B is Hermitian positive semidefinite, letting w = S_B^{-1/2} v reduces (7) to the eigenvalue problem

S_B^{1/2} S_W^{-1} S_B^{1/2} v = λ v.   (16)

To apply the quantum phase estimation algorithm to solve (16), we must first be able to construct the density matrix proportional to the Hermitian operator S_B^{1/2} S_W^{-1} S_B^{1/2}. In the following section, we present a more general theorem that can be applied to construct this density matrix.
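The substitution can be checked numerically. The sketch below (hypothetical random PSD matrices) verifies that S_W^{-1} S_B and the Hermitian matrix S_B^{1/2} S_W^{-1} S_B^{1/2} share the same spectrum:

```python
# Numerical check of the Hermitian reduction w = S_B^{-1/2} v.
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(4, 4)); S_B = A @ A.T + 1e-2 * np.eye(4)   # PSD
B = rng.normal(size=(4, 4)); S_W = B @ B.T + 1e-2 * np.eye(4)   # PSD

lam, U = np.linalg.eigh(S_B)
root = U @ np.diag(np.sqrt(lam)) @ U.T        # S_B^{1/2} via spectral decomposition

H = root @ np.linalg.inv(S_W) @ root          # S_B^{1/2} S_W^{-1} S_B^{1/2}
ev_orig = np.sort(np.linalg.eigvals(np.linalg.solve(S_W, S_B)).real)
ev_herm = np.sort(np.linalg.eigvalsh(H))
assert np.allclose(ev_orig, ev_herm, atol=1e-6)   # identical spectra
```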

Implementing the Hermitian chain product
In this section, we state a theorem for constructing the density matrix corresponding to the Hermitian chain product

f_k(A_k) ⋯ f_1(A_1) ρ_0 f_1(A_1) ⋯ f_k(A_k),   ρ_0 ∝ I,   (17)

to error ϵ, for arbitrary normalized N × N Hermitian positive semidefinite matrices A_1, …, A_k and functions f_1, …, f_k with convergent Taylor series. The derivation of this theorem follows the method presented in [2] and is given in Appendix A.
Theorem 1: Let A_1, …, A_k be k normalized Hermitian positive semidefinite matrices whose quantum forms can be constructed in time O(log(N)) (e.g., by queries to a quantum RAM), and let f_1, …, f_k be k functions with convergent Taylor series. Let {λ_{jl}}_{l=1}^{N} denote the eigenvalues of matrix A_j. Then the Hermitian operator in Eq. (17) can be implemented in the time given by Eq. (18), where κ_j is the condition number of matrix A_j, i.e., the ratio of its largest to smallest eigenvalue. More generally, if construction of each matrix A_j takes time O(X), the operator can be implemented in the time given by Eq. (19). We note that this provides exponential speedup over classical algorithms, as the optimal classical algorithm for multiplication of non-sparse N × N matrices requires time O(N^2.3737) [31].
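Classically, each factor f_j(A_j) can be formed through a spectral decomposition, which is what the following sketch does (hypothetical random matrices; this is the O(poly N) cost that the quantum algorithm avoids):

```python
# Classical analogue of the chain product: apply f_j to Hermitian PSD A_j
# via its eigendecomposition, then multiply the factors.
import numpy as np

def matrix_func(A, f):
    """f(A) for Hermitian A, through the spectral decomposition."""
    lam, U = np.linalg.eigh(A)
    return U @ np.diag(f(lam)) @ U.conj().T

rng = np.random.default_rng(3)
def random_psd(n):
    B = rng.normal(size=(n, n))
    P = B @ B.T
    return P / np.trace(P)                    # trace-normalized PSD

A1, A2 = random_psd(4), random_psd(4)
# Sandwich chain sqrt(A2) inv(A1) sqrt(A2) -- Hermitian by construction.
chain = matrix_func(A2, np.sqrt) @ matrix_func(A1, lambda x: 1.0 / x) @ matrix_func(A2, np.sqrt)

assert np.allclose(matrix_func(A1, lambda x: x), A1)              # identity f recovers A
assert np.allclose(matrix_func(A1, lambda x: 1 / x), np.linalg.inv(A1))
assert np.allclose(chain, chain.T, atol=1e-6)                     # Hermitian result
```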

Finding the principal eigenvectors
For LDA, we apply the theorem presented in the previous section to obtain the matrix product S_B^{1/2} S_W^{-1} S_B^{1/2}, using f_1(X) = X^{-1/2} (applied to S_W) and f_2(X) = X^{1/2} (applied to S_B). To avoid exponential complexity in the case of exponentially small eigenvalues, we adopt a technique used in [3] by pre-defining an effective condition number κ_eff and taking into account only eigenvalues in the range [1/κ_eff, 1] for phase estimation. (Typically, one may take κ_eff = O(1/ϵ), because 1/κ_eff bounds the smallest eigenvalue considered in phase estimation, which should be proportional to the error tolerance.) By our initialization procedures, preparation of S_B and S_W takes time O(log(MN)), and by definition of f_1, f_2, and κ_eff, the relevant condition numbers are bounded by κ_eff. Hence, by (19), we can obtain S_B^{1/2} S_W^{-1} S_B^{1/2} in time O(log(MN) κ_eff^{3.5}/ϵ³). Using the matrix exponentiation technique presented in [5], we can then apply quantum phase estimation to obtain an approximation to the state

Σ_i λ_i |v_i⟩⟨v_i| ⊗ |λ_i⟩⟨λ_i|,   (20)

where λ_i and v_i are the eigenvalues and eigenvectors of S_B^{1/2} S_W^{-1} S_B^{1/2}. If the p principal (largest) eigenvalues are polynomially small, sampling from this mixture produces the corresponding p principal eigenvectors v_r [22]. (If all eigenvalues are indeed super-polynomially small, there are typically no suitable directions for discriminant analysis: all directions would be essentially equivalent in preserving between-class versus within-class variance.) Finally, having solved the eigenvalue problem of (16), we again use the technique of the previous section to obtain the eigenvectors of S_W^{-1} S_B. After obtaining these principal eigenvectors, the data can be projected onto the dimensions of maximal between-class variance and minimal within-class variance. At this stage, one has effectively performed dimensionality reduction, and can now easily manipulate the data with existing tools, e.g., a classifier (see [3] or Section IV below).
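The eigenvalue cutoff can be pictured in a few lines (a minimal sketch with hypothetical numbers): eigenvalues below 1/κ_eff are simply discarded, so any subsequent inversion stays bounded by κ_eff:

```python
# Effective-condition-number cutoff: keep only eigenvalues in [1/kappa_eff, 1].
import numpy as np

kappa_eff = 100.0
eigenvalues = np.array([0.5, 0.2, 1e-2, 1e-7])      # last one is "too small"
kept = eigenvalues[eigenvalues >= 1.0 / kappa_eff]  # within [1/kappa_eff, 1]
inverted = 1.0 / kept                               # bounded by kappa_eff
assert len(kept) == 3 and inverted.max() <= kappa_eff
```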

Algorithm 1: Quantum LDA Dimensionality Reduction
Step 1: Initialization. By querying the quantum RAM/oracles, construct the Hermitian positive semidefinite operators S_B and S_W as given by Equations (14) and (15), in time O(log(MN)).
Step 2: Since S_B and S_W are Hermitian positive semidefinite, use the method of [5] to exponentiate these operators. Apply the generalized matrix chain algorithm of Theorem 1 to implement S_B^{1/2} S_W^{-1} S_B^{1/2} in time O(log(MN) κ_eff^{3.5}/ϵ³).
Step 3: Since S_B^{1/2} S_W^{-1} S_B^{1/2} is Hermitian positive semidefinite, use the method of [5] to exponentiate this operator. Use quantum phase estimation methods and sample from the resulting probabilistic mixture to obtain the p principal eigenvalues/eigenvectors v_r, in time O(p polylog(MN)/ϵ³).
Step 4: Apply S_B^{-1/2} to the v_r's (by the algorithm of Theorem 1) to get the desired directions w_r = S_B^{-1/2} v_r, in time O(p log(MN) κ_eff³/ϵ³).

Step 5: Project the data onto the w_r's for dimensionality reduction, or otherwise work in the directions of maximal class discrimination.

Algorithmic Complexity for Dimensionality Reduction

Algorithm 1 above shows the pseudocode for our LDA algorithm. Step 1 (initialization) takes time O(log(MN)) with our quantum oracles. Implementing the operator S_B^{1/2} S_W^{-1} S_B^{1/2} takes time O(log(MN) κ_eff^{3.5}/ϵ³), and finding its principal eigenvectors then takes O(p polylog(MN)/ϵ³). Finally, Step 4 takes time O(p log(MN) κ_eff³/ϵ³) to apply S_B^{-1/2} to the v_r's and obtain the eigenvectors of S_W^{-1} S_B. Hence, we can add these to get the total runtime:

Theorem 2: The quantum LDA algorithm presented in this paper (with pseudocode given by Algorithm 1) can be implemented in time polylogarithmic in both the number of input vectors M and the input vector dimension N. Specifically, it has a runtime of

O(p polylog(MN) κ_eff^{3.5}/ϵ³),   (21)

where κ_eff is a pre-defined condition number restricting the range of eigenvalues considered for phase estimation, typically taken to be O(1/ϵ), and p is the number of principal eigenvectors.
Note that the complexity presented here in (21) is polylogarithmic in both the number of input vectors M and their dimension N, regardless of training vector sparseness.

Nonlinear/Kernel Fisher Discriminant Analysis
Certainly, in many real-world cases, a straightforward linear discriminant may not be sufficient. Classically, a simple generalization is known as kernel Fisher discriminant analysis (kernel FDA), where the input vectors are first mapped (nonlinearly) by a function φ : x_j → φ(x_j) into a higher-dimensional feature space F. The linear discriminant corresponding to J(w) in the feature space then becomes nonlinear in the original space. In the classical case, if the dimension of F is too large, it becomes computationally infeasible to perform operations such as matrix multiplication or exponentiation on the resulting large covariance matrices S_B^φ and S_W^φ. Instead, one must find workarounds such as kernel methods, and perform reductions so that the algorithm involves only dot products in the feature space [21], but this may seriously limit the potential choices of the mapping φ. In the quantum case, however, we can directly perform the LDA analysis in the higher-dimensional feature space. As long as the dimension of F scales polynomially with the original input dimension, our algorithmic performance will be affected only by a constant factor. This allows for a much wider range of mappings φ into the feature space.
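As a concrete (hypothetical) example of such a map, one can use an explicit degree-2 polynomial feature map and then run ordinary LDA in the image space; the dimension grows only quadratically in N, so by the argument above the quantum cost changes only by a constant factor:

```python
# Explicit polynomial feature map phi for nonlinear FDA (illustrative).
import numpy as np

def phi(x):
    """Degree-2 map: x -> (x, all products x_i * x_j with i <= j)."""
    x = np.asarray(x, dtype=float)
    quad = np.outer(x, x)[np.triu_indices(len(x))]
    return np.concatenate([x, quad])

print(phi([1.0, 2.0]))        # [1. 2. 1. 2. 4.]
```

For input dimension N, the image dimension is N + N(N + 1)/2, i.e. polynomial in N as required.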

IV. QUANTUM DISCRIMINANT ANALYSIS: CLASSIFICATION

Algorithm
We now present an efficient quantum algorithm for the classification of an exponentially large data set by quadratic discriminant analysis (QDA); our results can easily be applied to perform LDA classification as well. As before, we assume that the class means and training vector data are given with their norms and components stored as floating-point numbers in quantum RAM. We again assume the oracle O_2 to obtain the class means µ_c of the training vectors and their norms, and in this section, we further assume that we can obtain the j-th training vector of each class. As in the initialization of Section III and Refs. [3-5], our oracle gives the vectors x_{c,j} in terms of their differences from their class means (as in Section III, these may also be obtained from the stored floating-point numbers if necessary). Specifically, we assume the oracle

O_3 : |c⟩ |j⟩ → |c⟩ |j⟩ |x_{c,j} − µ_c⟩,   (22)

where, as in Section III, |x_{c,j} − µ_c⟩ has been normalized to one. Finally, we now assume that for each class c, we are given the number M_c of training vectors belonging to that class.
For QDA, we use the oracles to construct for each class c the Hermitian positive semidefinite operator

Σ_c = (1/A_c) Σ_{j=1}^{M_c} (x_{c,j} − µ_c)(x_{c,j} − µ_c)^T,

where A_c = Σ_{j=1}^{M_c} ‖x_{c,j} − µ_c‖². To do this, we first call O_3 on the state (1/√M_c) Σ_{j=1}^{M_c} |c⟩ |j⟩ to obtain (1/√M_c) Σ_{j=1}^{M_c} |c⟩ |j⟩ |x_{c,j} − µ_c⟩. As in the initialization procedure of Section III, we use the methods of [4,28-30] to obtain the state

|χ_c⟩ = (1/√A_c) Σ_{j=1}^{M_c} ‖x_{c,j} − µ_c‖ |c⟩ |j⟩ |x_{c,j} − µ_c⟩.

Tracing over |j⟩ in the outer product |χ_c⟩⟨χ_c| then yields the operator Σ_c in time O(log(MN)).

Given an input vector |x⟩ in quantum form, we now present a method to find the maximum among the k discriminant functions δ_c(x) given in Eq. (8). First, we apply Σ_c^{-1} to the class mean |µ_c⟩, using the matrix inversion algorithm of [2]. As before, to avoid exponential complexity with small eigenvalues, we introduce the pre-defined effective condition number κ_eff to limit the range of considered eigenvalues. By [2], we can thus construct the state

|Σ_c^{-1} µ_c⟩ ∝ Σ_c^{-1} |µ_c⟩   (23)

for each class c in time O(log(MN) κ_eff³/ϵ³). Next, recognizing the first two terms of the discriminant function of (8) as an inner product, we perform a SWAP test [32] on the states |Σ_c^{-1} µ_c⟩ and |x − (1/2) µ_c⟩ to obtain the value

⟨Σ_c^{-1} µ_c | x − (1/2) µ_c⟩.   (24)

This inner product evaluation requires time O(log(N)). Finally, for each class c, we add to the value in (24) the class prior π_c = M_c/M. We hence obtain the discriminant values δ_c(x) for all k classes in time O(k log(MN) κ_eff³/ϵ³). It is then straightforward to identify the class yielding the maximum discriminant, to which the input vector is then classified. Note that our algorithm can easily be applied to perform quantum LDA classification, by using an operator proportional to the scatter matrix S_W (see Section III, initialization) in place of Σ_c for each class.
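The algebra behind the SWAP-test step can be sanity-checked classically: the first two terms of the discriminant collapse into a single inner product of Σ_c^{-1}µ_c with x − µ_c/2 (hypothetical random data below):

```python
# Check the inner-product identity used by the SWAP test:
# x.(Sigma^-1 mu) - 0.5 mu.(Sigma^-1 mu) == <Sigma^-1 mu, x - 0.5 mu>.
import numpy as np

rng = np.random.default_rng(4)
B = rng.normal(size=(3, 3)); Sigma = B @ B.T + 0.1 * np.eye(3)
mu, x = rng.normal(size=3), rng.normal(size=3)

a = np.linalg.solve(Sigma, mu)            # Sigma^{-1} mu
lhs = x @ a - 0.5 * (mu @ a)              # two terms of the discriminant
rhs = a @ (x - 0.5 * mu)                  # single inner product
assert np.isclose(lhs, rhs)
```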

Algorithm 2: Quantum Discriminant Analysis Classifier
Step 1: Initialization. By querying the quantum RAM or oracles, construct the Hermitian positive semidefinite operators Σ_c for all classes c in time O(k log(MN)).

Step 2: Since Σ_c is Hermitian positive semidefinite, use the method of [5] to exponentiate this operator. Apply the inversion algorithm of [2] on the state |µ_c⟩ for each class to construct the states given by Eq. (23) in time O(k log(MN) κ_eff³/ϵ³).

Step 3: Take the inner product of Σ_c^{-1} µ_c with x − (1/2) µ_c using the SWAP test, yielding the value in Eq. (24). This step requires time O(log N).

Step 4: For each class c, add the class prior π_c = M_c/M to the value in (24) to obtain the final discriminant value δ_c(x).

Step 5: Select the class c yielding the maximum discriminant value, and classify x accordingly in time O(k).

Algorithmic Complexity for Classification
Algorithm 2 above shows the pseudocode for our quantum QDA classifier. Step 1 (initialization) takes time O(k log(MN)/ϵ). Applying the inversion algorithm of [2] for each class then takes time O(k log(MN) κ_eff³/ϵ³). Computing all of the inner products given by (24) requires time O(k log N), and adding the class priors requires time O(k). Finally, selecting the class with the maximum discriminant takes time O(k). We add these to give the overall complexity below.

Theorem 3: The quantum QDA classifier algorithm presented in this paper (with pseudocode given by Algorithm 2) can be implemented in time logarithmic in both the number of input vectors M and the input vector dimension N. Specifically, it has a runtime of

O(k log(MN) κ_eff³/ϵ³),   (25)

where k is the number of classes for classification, and κ_eff is a pre-defined condition number restricting the range of eigenvalues considered for phase estimation, typically taken to be O(1/ϵ).
Note that the complexity presented in Eq. (25) is logarithmic in both M and N, regardless of training vector sparseness.

V. DISCUSSION
In this paper, we have presented a generalization of the algorithm of [2] for implementing Hermitian matrix chain operators, and applied it to implement a quantum LDA algorithm running in polylogarithmic time. As demonstrated by classical works such as [8,9], LDA is a powerful tool for dimensionality reduction in fields such as machine learning and big data analysis. Although our performance in terms of the error ϵ is poorer than that of classical algorithms (polynomial instead of logarithmic in 1/ϵ), we believe that this is acceptable, since it is unlikely that someone desiring extreme levels of precision will wish to perform significant dimensionality reduction like that provided by LDA. Rather, we believe that the exponential speedup in the parameters M and N should be more significant in reducing the overall algorithmic runtime.
Our work has also presented a quantum algorithm providing exponential speedup for the LDA and QDA classifiers. As classical studies [17,18] have shown, these classifiers typically perform just as well in terms of accuracy as the SVM (for which a quantum algorithm has been developed [3]). However, they tend to have much better model stability [18], which can make them more robust in the face of training data errors. Finally, discriminant analysis methods are much simpler when generalizing to multi-class classification, whereas the SVM is better suited for binary classification [23]. In conclusion, this work has provided efficient exponential speedup for two important algorithms for dimensionality reduction and classification in big data analysis.

APPENDIX A: PROOF OF THEOREM 1

In this appendix, we present the derivation of Eq. (19) from Theorem 1 in Section III. The proof of the theorem closely follows the matrix inversion algorithm presented in [2] (referred to in the following as the HHL algorithm). The HHL algorithm begins with the initial state

|ψ_0⟩ = √(2/T) Σ_{τ=0}^{T−1} sin(π(τ + 1/2)/T) |τ⟩   (26)

for large T (see [2] for more details on the original algorithm; we give a sketch below). Since the original algorithm was designed to apply the inverse of a matrix A to a specific vector |b⟩, HHL consider the state |ψ_0⟩ ⊗ |b⟩. Here, we are interested instead in obtaining an operator for (17), so we use the density operator ρ_0 = (1/N) Σ_{i=1}^{N} |i⟩⟨i| (proportional to the identity) in place of |b⟩, and we use |ψ_0⟩⟨ψ_0| in place of |ψ_0⟩.
Following HHL, we decompose ρ_0 in the eigenvector basis using phase estimation. Denote the eigenvectors of A_1 by {|u_{1l}⟩}_{l=1}^{N}, and let {λ_{1l}}_{l=1}^{N} be the corresponding eigenvalues. Then, we write

ρ_0 = Σ_{l,l′} ρ_{ll′} |u_{1l}⟩⟨u_{1l′}|,   (27)

where, for ρ_0 proportional to the identity, ρ_{ll′} = δ_{ll′}/N. Quantum phase estimation is then applied to ρ_0 for time t_0 = O(κ_1/ϵ) to obtain the state

Σ_{l,l′} ρ_{ll′} |u_{1l}⟩⟨u_{1l′}| ⊗ |λ_{1l}⟩⟨λ_{1l′}|   (28)

up to a tolerance error ϵ (κ_1 is the condition number of the matrix A_1, i.e., the ratio of its largest to smallest eigenvalue). In this step, the exponentiation of the Hermitian operator A_1 is performed using the trick presented in [5]. By [5], n = O(κ_1²/ϵ³) copies of A_1 are required to perform the phase estimation to error ϵ, so if A_1 can be constructed in time O(X), this step requires time O(nX) = O(X κ_1²/ϵ³). HHL then add an ancilla and perform a unitary controlled on the eigenvalue register. Here, we generalize this step to a controlled rotation from |0⟩⟨0| to the state |ψ_{λ_{1l}}⟩⟨ψ_{λ_{1l′}}|, where

|ψ_{λ_{1l}}⟩ = √(1 − C² f_1(λ_{1l})²) |0⟩ + C f_1(λ_{1l}) |1⟩,   (29)

and C is a constant of order O(min_l {|f_1(λ_{1l})|^{-1}}) for normalization. Assuming f_1 has a convergent Taylor series, it is possible to efficiently perform this rotation using controlled gates (see Appendix B). This results in the overall state

Σ_{l,l′} ρ_{ll′} |u_{1l}⟩⟨u_{1l′}| ⊗ |λ_{1l}⟩⟨λ_{1l′}| ⊗ |ψ_{λ_{1l}}⟩⟨ψ_{λ_{1l′}}|.   (30)

Next, following HHL, we undo phase estimation to uncompute the eigenvalue register, resulting in the state

Σ_{l,l′} ρ_{ll′} |u_{1l}⟩⟨u_{1l′}| ⊗ |ψ_{λ_{1l}}⟩⟨ψ_{λ_{1l′}}|.   (31)

Finally, as in HHL, we measure the ancilla to be |1⟩⟨1|. By the choice of C, the success probability is easily seen to be [33]

P = C² Σ_l ρ_{ll} f_1(λ_{1l})².   (32)

This produces the density operator proportional to

ρ_1 ∝ Σ_{l,l′} ρ_{ll′} f_1(λ_{1l}) f_1(λ_{1l′}) |u_{1l}⟩⟨u_{1l′}| = f_1(A_1) ρ_0 f_1(A_1).   (33)

To generalize this to the entire matrix chain, we simply repeat this algorithm with A_2 as the matrix, f_2 as the function, and the state ρ_1 in place of ρ_0. More generally, at each iteration j, we use A_j and f_j on ρ_{j−1}. At each step, n = O(κ_j²/ϵ³) copies of A_j are required, and the probability of success in measuring the ancilla is given by the expression analogous to Eq. (32). Hence, for k matrices, the Hermitian operator in Eq. (17) can be implemented in the time given by Eq. (19). By [2], amplitude amplification can be used on the first matrix A_1 only to increase the measurement success probability from Ω(1/κ_1²) to Ω(1/κ_1). In the case where A_1, …, A_k are density matrices presented in a quantum RAM, X = O(log(N)), and we obtain Eq. (18).
APPENDIX B: PERFORMING THE CONTROLLED ROTATION OF (29)

In this appendix, we show how to perform a controlled rotation of the form of (29) for an arbitrary function f with convergent Taylor series. Specifically, we are to rotate the ancilla by the angle θ(λ) = sin^{-1}(Cf(λ)). This is not trivial, even for simple cases such as f(x) = 1/x in the original HHL algorithm, and HHL do not provide a decomposition of this rotation in terms of controlled gates. Here, we present such a method for arbitrary f. The idea behind our method is to approximate θ(λ) by its Taylor series. We will first construct the register |f(λ)⟩ from |λ⟩, and then construct |θ(λ)⟩ from |f(λ)⟩. This procedure is outlined below and presented in detail in Algorithm 3.
Since f has a convergent Taylor series, we approximate

f(λ) ≈ Σ_{k=0}^{n} (f^{(k)}(x_0)/k!) (λ − x_0)^k   (35)

for some constant x_0 near λ, to order n. Now, by the choice of C, |Cf(λ)| < 1 lies within the radius of convergence of the Maclaurin series for sin^{-1}(x). Hence, we can substitute Cf(λ) into this Maclaurin series and approximate

θ(λ) = sin^{-1}(Cf(λ)) ≈ Cf(λ) + (Cf(λ))³/6 + 3(Cf(λ))⁵/40 + ⋯.   (36)

The multiplications in Steps 2 and 3 of Algorithm 3 can be performed on the quantum registers very similarly to the modular exponentiation a^n in Shor's algorithm [25]. Suppose the binary strings |A⟩ and |B⟩ are given in quantum form. Then, the classical grade-school multiplication algorithm can be carried out by performing addition operations controlled on the qubits representing |B⟩: at the k-th iteration, if the k-th qubit of |B⟩ from the right is |1⟩, add |2^k A⟩ (obtained by left-shifting qubits) to the result. Repeating for all k between 0 and the number of qubits in |B⟩ minus one gives the desired product.
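The shift-and-add multiplication described above has a direct classical counterpart; the sketch below mirrors the controlled-addition circuit bit by bit:

```python
# Grade-school binary multiplication by controlled additions: for each set
# bit k of b, add (a << k), i.e. 2^k * a, to the running total.
def shift_add_mul(a: int, b: int) -> int:
    total, k = 0, 0
    while b >> k:
        if (b >> k) & 1:          # k-th bit of b controls the addition
            total += a << k
        k += 1
    return total

print(shift_add_mul(13, 11))      # 143
```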
Finally, once we have the binary representation of the rotation angle in the register |θ(λ)⟩, we can implement the controlled rotation of the ancilla. Specifically, for each term 2^r appearing in this binary expansion, we add a unitary controlled on the qubit coefficient of this term to rotate the ancilla by 2^r. Since any two-qubit controlled-U operation can be implemented with two controlled-NOT gates and single-qubit unitaries [24], the desired rotation of (29) can be implemented efficiently.

Algorithm 3: Constructing |θ(λ)⟩ from |λ⟩
Step 1: Initialization. Prepare an auxiliary register to hold the value λ − x_0. Prepare three more working registers: the first (initialized to 1) will hold the current power of λ − x_0; the second (initialized to 1) will hold the value of the first register multiplied by the Taylor coefficient f^{(k)}(x_0)/k!; and the third (initialized to 0) will be a running total for the right-hand side of Eq. (35).

Step 2: Multiply the first working register by λ − x_0 from the auxiliary.

Step 3: Multiply the value in the first register by the k-th Taylor coefficient f^{(k)}(x_0)/k! and store the result in the second working register.

Step 4: Add the value in the second working register to the third working register.

Step 5: Repeat Steps 2-4 for k = 0, 1, …, n. By the Taylor approximation of (35), the third working register now holds |f(λ)⟩.

Step 6: Repeat Steps 1-5, now with a register containing |Cf(λ)⟩ in place of |λ⟩, and with the function sin^{-1} in place of f. It suffices to expand around x_0 = 0. This step yields the register |θ(λ)⟩, by the Maclaurin series approximation of (36).
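Numerically, the final stage of this construction can be sanity-checked in a few lines (hypothetical C and f(λ); only the arcsin stage of Step 6 is shown, expanded around x_0 = 0):

```python
# Truncated Maclaurin series for arcsin, as used to build theta(lambda).
import math

def arcsin_series(z: float, terms: int = 8) -> float:
    """sin^{-1}(z) from its Maclaurin series; valid for |z| < 1."""
    return sum(
        math.comb(2 * n, n) / (4 ** n * (2 * n + 1)) * z ** (2 * n + 1)
        for n in range(terms)
    )

C, f_lam = 0.5, 0.6               # hypothetical C and f(lambda); |C f| < 1
theta = arcsin_series(C * f_lam)
assert abs(theta - math.asin(C * f_lam)) < 1e-6
```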
