A Manifold-Based Dimension Reduction Algorithm Framework for Noisy Data Using Graph Sampling and Spectral Graph

This paper proposes a new manifold-based dimension reduction algorithm framework. It can deal with the dimension reduction problem of data with noise and give the dimension reduction results with the deviation values caused by noise interference. Commonly used manifold learning methods are sensitive to noise in the data. Mean computation, a denoising method, is an important step in data preprocessing but leads to a loss of local structural information. In addition, it is difficult to measure the accuracy of the dimension reduction of noisy data. Thus, manifold learning methods often transform the data into an approximately smooth manifold structure; however, practical data from the physical world may not meet the requirements. The proposed framework follows the idea of the localization of manifolds and uses graph sampling to determine some local anchor points from the given data. Subsequently, the specific range of localities is determined using graph spectral analysis, and the density within each local range is estimated to obtain the distribution parameters. Then, manifold-based dimension reduction with distribution parameters is established, and the deviation values in each local range are measured and further extended to all data. Thus, our proposed framework gives a measurement method for deviation caused by noise.


Introduction
Manifold learning is used for nonlinear dimension reduction of natural and general data. These data are often assumed to lie on nonlinear low-dimensional manifolds embedded in a high-dimensional space [1]. When the data are ideally located on a smooth manifold, manifold-based methods such as ISOMAP [2], locally linear embedding (LLE) [3], Laplacian eigenmaps (LE) [4], and local tangent space alignment (LTSA) [5] can give effective and accurate low-dimensional structures. However, the performance of these methods degrades when noise interferes with the data and prevents them from being accurately embedded on the manifold. In practice, noise from the natural environment, sensors, and human intervention interferes with most data collected from general backgrounds and objects. Thus, the dimension reduction of noisy data is imperative and highly significant for data mining applications in various fields.
Manifold-based dimension reduction of noisy data is challenging. Traditionally, Hein and Maier [6] and Gong et al. [7] reduce the noise impact through an error penalty factor, which results in excessive smoothing and even deformation of the data. Current works mainly focus on combining denoising preprocessing with manifold dimension reduction. Zhao et al. [8] propose a semisupervised local multimanifold framework based on linear embedding, which incorporates the neighborhood reconstruction error to preserve local topology structures between labeled and unlabeled samples. Kim and Lee [9] construct a metric function to calculate the distance between the data points and the expected smooth manifold and then project the data onto this smooth manifold. Rajagopal et al. [10] take a similar approach. In Jin and Bachmann's work [11], the data are split into multiple regions, and the dimension in each region is reduced to obtain principal components, which are then connected to reduce the noise interference. Zhang et al. [12] use sparse-constrained manifold mapping to weaken the noise. Zhang et al. [13] propose a marginal Isomap for manifold learning, which computes the shortest path on graphs to better separate interclass samples, and the authors in [14] further provide a semisupervised marginal manifold visualization. Hao et al. [15] project the data onto a local tangent plane and achieve denoising by averaging the projected coordinates, but the original local structure information is not retained. Zhang et al. [16] propose a robust unsupervised nonnegative adaptive feature extraction algorithm, which preserves the manifold structures by joint weights sharing in low-dimensional representations and error proximation method.
Although the above manifold learning methods can suppress noise to a certain extent, the original data may be overprocessed or artificially altered, and the effectiveness of the dimension reduction becomes difficult to evaluate. Yan et al. [17] build on graph spectral theory and on the observation that noise mostly occupies high-frequency bands; their method screens noise points using the characteristics of the data itself and does not affect the distribution of the original data. However, it is still a combination of denoising and dimension reduction and does not provide the possible low-dimensional space of the original data. Graph theory is also used in uncertain nonlinear systems with environmental noise, where an edge-based adaptive algorithm is proposed to solve the distributed optimal consensus problem [18], although the form of the noise is not the focus there. Li and Lian [19] consider the frequency of persistent dwell-time switching in switching Markov jump systems to determine the control rule, but their work is not specific to manifold-based methods. The low-dimensional space of noisy data shows a certain randomness under interference; that is, noise changes the existing distribution characteristics because the low-dimensional coordinates change, and these characteristics reflect the existence of certain deviation values in the dimension reduction results. Some works address the robust feature extraction problem: Zhang et al. [20] propose a joint label prediction method whose data representations preserve manifold structures explicitly and adaptively, Ye et al. [21] provide a new linear discriminant analysis that addresses insufficient robustness to outliers, and Zhang et al. [22] propose a reliable and robust two-dimensional neighborhood preserving projection method. The aims of these papers differ from those of this paper.
This paper, aiming at manifold-based dimension reduction of noisy data, solves the problem of measuring the deviation values of low-dimensional data caused by noise and determines the local range of the data.
Localization is an important process in most manifold methods; in determining a proper localization, the key is to change the low-dimensional structure to an approximately nonlinear, high-dimensional one. This paper uses graph sampling [23] and spectral graph theory [24] to provide a solution for noisy data. Graph sampling is used in the first step of our framework to select useful nodes, and spectral graph theory is the theoretical foundation for the graph wavelet analysis in our framework. This paper proposes to determine proper local ranges with graph sampling and spectral graph theory; such local ranges are considered relatively natural because the structure of the original data, including noise interference, is retained. The proposed algorithm framework needs all of this structural information, obtained by combining graph sampling and spectral graph theory, to measure the deviation values of low-dimensional samples under noise interference.
To measure the deviation values of the low-dimensional samples, this study uses maximum likelihood estimation in each local range to construct a normal distribution function adapted to the current locality. This function can be used to obtain the maximum deviation of the noisy data in each locality, and the dimension reduction result is affected by the fluctuation of noise interference in one or more localities. In our algorithm framework, distance weights are used to calculate the noise-induced deviations of the dimension reduction results.
The contributions of this paper are as follows: (1) This paper proposes a manifold dimension reduction framework for noisy data. (2) A method for determining local ranges of complex noisy data is provided using graph sampling and graph spectral theory. (3) This paper proposes a weighted sum method to compute the noise-induced deviation values of the low-dimensional data from different local ranges.
Furthermore, our framework fits many kinds of manifold dimension reduction methods and thus can potentially be applied to different research fields. The remainder of this paper is organized as follows. The next section introduces the preliminary works. The section after that details the proposed algorithm framework, and the following section provides the experimental results on simulated and practical data. Finally, a summary is given in the final section.

Preliminary Works
The noisy data are denoted as $X = \{x_i \mid i = 1, 2, \ldots, N\}$, $x_i \in \mathbb{R}^D$. Assume that the noise obeys the Gaussian distribution $\delta \sim N(0, \sigma^2)$ and that the data $\tilde{x}$ sampled from the unknown smooth manifold $M$ satisfy $x = \tilde{x} + \delta$. Based on the data $X$, an undirected weighted graph $G = (V, E)$ is constructed, in which $V$ represents the set of nodes in the graph, and the nodes have a one-to-one correspondence to the original data samples, that is, $v_i \sim x_i$, $\forall i$. $E$ represents the set of edges connecting two nodes in the graph. A weight matrix $W$ is calculated, and a nonnegative weight is assigned to each edge. In this study, the weights are calculated using a Gaussian kernel function, referred to as (1).
In order to analyze the spectral properties of the graph $G$, it is necessary to assume that there is a global signal function $f \in \mathbb{R}^N$, which is applied to the node set $V$. The Laplacian operator of the graph acts on $f$ through the Laplacian matrix $L$. We perform the eigenvalue decomposition of $L$ to obtain its eigenvalues and eigenvectors. The eigenvalues represent a graph-based frequency domain spectrum. Let $L = U \Lambda U^{*}$, where $U = [u_1, u_2, \ldots, u_N]^T$ represents the eigenvectors, and the diagonal elements $[\lambda_1, \lambda_2, \ldots, \lambda_N]$ of $\Lambda$ represent the eigenvalues, arranged in ascending order $\lambda_1 \le \lambda_2 \le \cdots \le \lambda_N$. Using the eigenvalues and eigenvectors, a graph-based Fourier transform can be performed on the signal function $f$, that is,
$$\hat{f}(\ell) = \langle u_\ell, f \rangle = \sum_{n=1}^{N} u_\ell^{*}(n) f(n),$$
with the inverse Fourier transform being
$$f(n) = \sum_{\ell=1}^{N} \hat{f}(\ell) u_\ell(n).$$
We introduce the spectral graph wavelet theory [7] to determine the local low-frequency ranges in the graph. Let $g$ be a nonnegative real-valued filter function satisfying $g(0) = 0$ and $\lim_{\lambda \to \infty} g(\lambda) = 0$.
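The graph construction and graph Fourier transform described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the paper's implementation: the kernel bandwidth `sigma`, the dense (fully connected) graph, and the combinatorial Laplacian (degree matrix minus `W`) are choices made here for the example, and the helper names `gaussian_weights` and `graph_laplacian` are hypothetical.

```python
import numpy as np

def gaussian_weights(X, sigma=1.0):
    """Dense Gaussian-kernel weight matrix with a zero diagonal (no self-loops).

    The bandwidth sigma is an assumption of this sketch.
    """
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    W = np.exp(-d2 / (2.0 * sigma**2))
    np.fill_diagonal(W, 0.0)
    return W

def graph_laplacian(W):
    """Combinatorial Laplacian: degree matrix minus weight matrix (assumed form)."""
    return np.diag(W.sum(axis=1)) - W

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))           # toy data: N = 50 samples in R^3
L = graph_laplacian(gaussian_weights(X))

# eigh returns eigenvalues in ascending order; the columns of U are eigenvectors
lam, U = np.linalg.eigh(L)

f = rng.normal(size=50)                # a signal defined on the N nodes
f_hat = U.T @ f                        # graph Fourier transform
f_back = U @ f_hat                     # inverse graph Fourier transform
```

Because `L` is real and symmetric, `U` is orthogonal, so applying the forward and inverse transforms in sequence recovers `f` up to rounding error.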
For a Laplacian matrix of finite size, the wavelet operator can be expressed as $T_g = g(L)$, with $T_g: \mathbb{R}^N \to \mathbb{R}^N$. The operator $T_g$ acts on the Fourier transform result $\hat{f}$ of the signal function $f$, resulting in a modulating function
$$\widehat{T_g f}(\ell) = g(\lambda_\ell) \hat{f}(\ell),$$
and with the inverse transform law, we have
$$(T_g f)(n) = \sum_{\ell=1}^{N} g(\lambda_\ell) \hat{f}(\ell) u_\ell(n).$$
Here, $T_g f(n)$ corresponds to the node $n$ in the graph $G$, and $\psi_f(n)$ is the wavelet coefficient of the wavelet operator $T_g$ acting on a given signal function $f$, that is, $\psi_f(n) = T_g f(n)$. The related graph spectral wavelet work [25] further introduces a scale factor $s$ to construct a multiscale operator, that is, $T_g^s = g(sL)$. Another work [26] studies the use of low-order polynomials to approximate the wavelet mother function, for constructing a fast computation method for the spectral wavelets.
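As an illustration of the wavelet operator, the sketch below builds $g(sL)$ directly from the eigendecomposition of a small graph. The path graph, the kernel $g(\lambda) = \lambda e^{-\lambda}$, and the scale `s=2.0` are all assumptions chosen for the example; the kernel is picked only because it satisfies $g(0) = 0$ and decays at infinity, and the function name `wavelet_operator` is hypothetical.

```python
import numpy as np

def wavelet_operator(L, g, s=1.0):
    """Return the matrix g(sL), built from the eigendecomposition of a symmetric L."""
    lam, U = np.linalg.eigh(L)
    return U @ np.diag(g(s * lam)) @ U.T

# Toy path graph on 6 nodes with unit edge weights
N = 6
W = np.zeros((N, N))
idx = np.arange(N - 1)
W[idx, idx + 1] = W[idx + 1, idx] = 1.0
L = np.diag(W.sum(axis=1)) - W         # combinatorial Laplacian (assumed form)

g = lambda lam: lam * np.exp(-lam)     # illustrative kernel: g(0) = 0, decays at infinity
T = wavelet_operator(L, g, s=2.0)

f = np.zeros(N)
f[2] = 1.0                             # signal concentrated on node 2
psi = T @ f                            # wavelet coefficients psi_f(n) = (T_g f)(n)
```

Since $g(0) = 0$, the operator removes the constant (zero-frequency) component, so the coefficients in `psi` sum to zero up to rounding error. For large graphs the dense eigendecomposition used here becomes expensive, which is the motivation for the polynomial approximation mentioned in the text.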

Proposed Algorithm Framework
The algorithm framework in this paper is divided into three parts: the determination of local anchor vectors, local range and distribution estimation, and manifold dimension reduction with distribution parameters.

Local Anchor Vectors.
The localization analysis is the first step for reducing the dimension of the manifold. The determination of the local anchor vectors is important for identifying the local position. Graph sampling is similar to image downsampling. Graph sampling can select some data samples to form a reduced data graph based on the polarity of the eigenvectors of the Laplacian matrix, retaining a part of the data samples of interest, so as to determine some key samples for computing the local ranges. These key samples are collectively referred to as "local anchor vectors" in this paper.
Let the local anchor vector set be $V_1$; then, $V_1 \subset V$, implying that the anchor vector set is a part of the graph nodes, and each local anchor vector and its local range represent its nearest neighbors.
This paper adopts a graph sampling method based on the polarity of the components of the largest eigenvector, that is, the eigenvector associated with the largest eigenvalue when the eigenvalues are arranged in ascending order. According to Shuman et al. [27], graph sampling should meet the following conditions: (1) the sampled node set should be approximately half of the total number of nodes in the original graph, that is, $|V_1| \approx |V|/2$; (2) the removed nodes are not connected to edges with high weights; and (3) the computation is efficient and feasible. The polarity-of-the-largest-eigenvector sampling method satisfies these three conditions, and the local anchor vector set can be obtained for the local range analysis.
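The polarity-based sampling described above can be sketched as follows. This is a minimal sketch, assuming the combinatorial Laplacian and the convention that nodes with a non-negative component are kept; the function name `polarity_sampling` and the toy path graph are hypothetical.

```python
import numpy as np

def polarity_sampling(L):
    """Keep the nodes whose component in the largest-eigenvalue eigenvector is non-negative.

    The overall sign of an eigenvector is arbitrary, so the complementary
    set of nodes is an equally valid sample.
    """
    lam, U = np.linalg.eigh(L)         # eigenvalues in ascending order
    u_max = U[:, -1]                   # eigenvector of the largest eigenvalue
    return np.flatnonzero(u_max >= 0)

# Toy example: a path graph on 8 nodes with unit edge weights
N = 8
W = np.zeros((N, N))
idx = np.arange(N - 1)
W[idx, idx + 1] = W[idx + 1, idx] = 1.0
L = np.diag(W.sum(axis=1)) - W

anchors = polarity_sampling(L)         # indices of the local anchor vectors
```

On this path graph the largest eigenvector alternates in sign from node to node, so every second node is kept: half of the nodes are retained and no two retained nodes are adjacent, which matches conditions (1) and (2).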

Local Range and Distribution Estimation.
After each local anchor vector is determined, it is necessary to further determine its local range and then calculate information such as the deviation values caused by noise interference. Spectral graph theory is used to further analyze the local ranges. Spectral graph analysis is a special type of spectral analysis. Spectral analysis itself is based on the frequency domain, where a signal is characterized by its spectral coefficients or spectral energy. According to [28], a stable data distribution often exhibits low-frequency characteristics, while noise may appear in the full frequency band and mainly covers high frequencies. Therefore, the local range of a local anchor vector should be located in the low-frequency band of a signal on the anchor vector.
This study uses the spectral graph wavelet method to obtain the local range. This method is more flexible and has fewer parameters than the common k-nearest neighbor algorithm. If a signal is applied to an anchor vector, it propagates and extends to the surrounding area. Owing to distance and noise, the signal strength is attenuated. When the local range is determined using the spectral graph wavelet, it is taken as the range within which the signal has attenuated to a certain degree; the bandpass characteristic of the wavelet function also affects the speed of attenuation.

Let a signal function $\delta_n \in \mathbb{R}^N$ act on each node of the graph with $\delta_n(n) = 1$ and $\delta_n(m) = 0$, $n \ne m$. For a certain node $n$, its wavelet coefficients are $\psi_{\delta,n}(n) = T_g \delta_n(n)$. Suppose the filter function is $g$. Then, from the definitions of the graph Fourier transform and the wavelet operator,
$$\psi_{\delta,n}(m) = \sum_{\ell=1}^{N} g(\lambda_\ell) u_\ell^{*}(n) u_\ell(m).$$
The filter function controls the span of signal spreading and can pick out the area where the local range is likely to be. Given a certain anchor vector $v_p \in V_1$ and a signal $\delta_p$, the corresponding wavelet coefficient vector $\psi_{\delta,p} \in \mathbb{R}^N$ can be obtained according to the spectral graph wavelet transform. Based on the knowledge of the power spectrum [29], the square of the absolute value of the vector $\psi_{\delta,p}$ represents the spectral energy. The stronger the spectral energy is, the better the signal spreads in this area. Considering this trait, this study presents a local range determination method, using an anchor vector $v_p$ as an example: select the nodes whose spectral energy accounts for 90% of the total signal spectral energy. Suppose there are $q$ nodes belonging to the local range of $v_p$, denoted $\{v_p^1, v_p^2, \ldots, v_p^q\}$. We then traverse the anchor vector set and determine the local range of each anchor vector through the spectral energy calculation given above. The noise leads to a certain distribution over the local range of each anchor vector. On the Gaussian noise premise, this distribution can also be approximately Gaussian. To perform the manifold dimension reduction with distribution parameters, it is necessary to estimate the distribution parameters.
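The 90% spectral-energy rule for the local range can be sketched as below. The sketch assumes one particular reading of the rule: nodes are ranked by spectral energy and the smallest top-ranked set that reaches the energy ratio is kept. The path graph, the value of `tau`, and the function name `local_range` are hypothetical choices for the example.

```python
import numpy as np

def local_range(L, p, g, ratio=0.9):
    """Nodes carrying `ratio` of the spectral energy of a delta signal placed on node p."""
    lam, U = np.linalg.eigh(L)
    # T_g delta_p = U diag(g(lam)) U^T e_p, and U^T e_p is row p of U
    psi = U @ (g(lam) * U[p, :])
    energy = psi**2                                # spectral energy per node
    order = np.argsort(energy)[::-1]               # strongest nodes first
    cum = np.cumsum(energy[order])
    q = int(np.searchsorted(cum, ratio * cum[-1])) + 1
    return np.sort(order[:q]), energy

# Toy path graph on 30 nodes with unit edge weights
N = 30
W = np.zeros((N, N))
idx = np.arange(N - 1)
W[idx, idx + 1] = W[idx + 1, idx] = 1.0
L = np.diag(W.sum(axis=1)) - W

tau = 30.0                                         # illustrative filter parameter
g = lambda lam: np.exp(-tau * lam / lam.max())     # low-pass filter on the eigenvalues
nodes, energy = local_range(L, p=15, g=g)
```

The returned `nodes` form a contiguous neighbourhood around the anchor on this toy graph; on real data the shape of the range follows the graph weights rather than a fixed neighbour count, which is the flexibility the text contrasts with k-nearest neighbours.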
Our proposed framework uses maximum likelihood estimation. We consider the anchor vector $v_p$ and its local range $\{v_p^1, v_p^2, \ldots, v_p^q\}$ as an example. Each node represents a data sample, that is, $v_p^i \sim x_p^i$, $i = 1, \ldots, q$, and $x_p^i \in \mathbb{R}^D$. According to maximum likelihood and the expression of the Gaussian density function, the estimated value of the mean and the estimated value of the covariance are, respectively, as follows:
$$\mu_p = \frac{1}{q} \sum_{i=1}^{q} x_p^i, \qquad \Sigma_p = \frac{1}{q} \sum_{i=1}^{q} (x_p^i - \mu_p)(x_p^i - \mu_p)^T.$$
The estimated distribution of the anchor vector $v_p$ with its local range can be obtained as $N(\mu_p, \Sigma_p)$.
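The maximum likelihood estimates for one local range can be computed directly, as in this short sketch. The array `X_local` stands for the $q$ samples of a single local range and is generated randomly here purely for illustration; the function name `gaussian_mle` is hypothetical.

```python
import numpy as np

def gaussian_mle(X_local):
    """Maximum likelihood mean and covariance of the samples in one local range.

    Note the 1/q normalisation (not 1/(q-1)), which is what maximum
    likelihood gives for a Gaussian.
    """
    mu = X_local.mean(axis=0)
    centered = X_local - mu
    cov = centered.T @ centered / X_local.shape[0]
    return mu, cov

rng = np.random.default_rng(1)
X_local = rng.normal(loc=2.0, scale=0.5, size=(200, 3))   # q = 200 samples in R^3
mu_p, cov_p = gaussian_mle(X_local)
```

The result agrees with `np.cov(X_local, rowvar=False, bias=True)`, which uses the same biased normalisation.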

Manifold Learning with Distribution Parameters.
The manifold-based dimension reduction with distribution parameters in this section solves the problem of measuring the deviation values of low-dimensional manifolds under noise interference. Maximum likelihood estimation is used in each local range to construct a normal distribution function adapted to the current locality. In this study, manifold mapping functions are established for the local mean and deviation results to obtain the manifold dimension reduction results with distribution parameters, and the deviations between the deviation dimension reduction result and the mean dimension reduction result are then calculated. Similarly, a manifold mapping function is established from the original data; because each data sample belongs to the range of one or more local anchor vectors, its dimension reduction result is also affected by the fluctuation of noise interference in one or more localities. In this study, distance weights are used to calculate the weighted sum of the deviation results of the dimension reduction results of each data sample.
For each anchor vector and its local range, the mean and covariance are estimated. In effect, the means of all anchor vectors constitute a type of denoising result. This mean result forms an approximately smooth manifold. Therefore, the manifold dimension reduction of the mean values is used as the benchmark of the deviation measure.
The framework proposed in this study can use any manifold dimension reduction method to establish a mapping function. Assume that the selected manifold dimension reduction method is expressed as $F$; then, $F(x): \mathbb{R}^D \to \mathbb{R}^d$ with $d < D$ represents reduction from dimension $D$ to dimension $d$. When the mapping function is obtained from the mean values, it is denoted as $F_\mu$. In addition, it is necessary to establish the dimension reduction results of the deviation values caused by the covariance. Taking the anchor vector $v_p$ and its local range as an example, since $\mu_p \in \mathbb{R}^D$ and $\Sigma_p \in \mathbb{R}^{D \times D}$, for a certain dimension $m$ we can calculate the deviation values due to the variance parameter in dimension $m$ according to (10), where $\mu_p^m$ is the $m$-th element in the mean vector $\mu_p$ and $\sigma_p^m$ is the $m$-th diagonal element in the covariance $\Sigma_p$. By traversing all the $D$ dimensions of the data, the deviation values can be obtained.
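A possible form of the per-dimension deviation values is sketched below. It is an assumption of this sketch, not a statement of the paper's formula, that the deviation points are the local mean shifted by one standard deviation in each dimension, where the standard deviation is the square root of the corresponding diagonal element of the covariance; the function name `deviation_points` is hypothetical.

```python
import numpy as np

def deviation_points(mu, cov):
    """Local mean shifted by one standard deviation per dimension, in both directions.

    Assumed form: b_minus = mu - sqrt(diag(cov)), b_plus = mu + sqrt(diag(cov)).
    """
    std = np.sqrt(np.diag(cov))
    return mu - std, mu + std

mu = np.array([1.0, -2.0, 0.5])        # local mean of one anchor vector
cov = np.diag([0.25, 1.0, 4.0])        # local covariance (diagonal for readability)
b_minus, b_plus = deviation_points(mu, cov)
```

Only the diagonal of the covariance is used, so correlations between dimensions inside a locality do not enter the deviation points in this form.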
The deviation values are divided into positive and negative values because of the direction of the variance and are denoted as $b_p^-$ and $b_p^+$. We iterate through all anchor vectors to obtain all local deviation value sets $[b^-, b^+]$ and use the negative and positive deviations to obtain the dimension reduction mapping functions $F_{b^-}$ and $F_{b^+}$, respectively. We thus obtain the manifold dimension reduction with distribution parameters, that is, $\{F_\mu, F_{b^-}, F_{b^+}\}$. In order to measure the low-dimensional deviation values caused by noise, the framework in this study requires the following two calculations:
(1) Deviation calculation of each dimension of the low-dimensional space, using the dimension reduction with distribution parameters $\{F_\mu, F_{b^-}, F_{b^+}\}$. For any $j$-th dimension, the deviation is calculated according to (11). This formula represents the deviation calculation of the $j$-th dimension within the local range of the anchor vector $v_p$.
(2) Distance calculation of low-dimensional space data. The distance is computed according to (12). The Euclidean distance calculation in this formula represents the deviation calculation of the data distance within the local range of the anchor vector $v_p$.
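The two calculations can be illustrated with a small sketch. It assumes that the per-dimension deviation is the difference between a deviation embedding and the mean embedding of the same anchor vector, and that the distance is the Euclidean norm of that difference; the arrays below are made-up two-dimensional embeddings, and the function name `low_dim_deviation` is hypothetical.

```python
import numpy as np

def low_dim_deviation(F_mu, F_b):
    """Per-dimension deviation and Euclidean distance between two embeddings.

    F_mu and F_b are (number of anchors) x d arrays holding the embedded
    local means and the embedded deviation points, row-aligned by anchor.
    """
    per_dim = F_b - F_mu                       # deviation in each low dimension j
    dist = np.linalg.norm(per_dim, axis=1)     # Euclidean distance per anchor
    return per_dim, dist

F_mu = np.array([[0.0, 0.0], [1.0, 1.0]])      # embedded local means (toy values)
F_b_plus = np.array([[3.0, 4.0], [1.0, 2.0]])  # embedded positive deviations (toy values)
per_dim, dist = low_dim_deviation(F_mu, F_b_plus)
```

The same function applies unchanged to the negative-direction embedding, giving one deviation per direction for every anchor vector.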
In order to further achieve the dimension reduction of the overall noisy data and measure the interference effect of noise, based on the manifold dimension reduction with distribution parameters of the anchor vectors and their local deviation values, the framework in this study provides a distance-weighted method for the dimension reduction values of the original noisy data. The reason for using the distance-weighted method is that any data sample $x_i$, $\forall i$, may be located in the local range of one or more anchor vectors, and the noise interference represented by each anchor vector and its locality affects the current sample $x_i$. In this study, the ratio of the distance between the current sample $x_i$ and the local mean of each anchor vector is used as a weight. The deviation value of $x_i$ comes from the weighted sum of the deviation results of the dimensions of each anchor vector. Specifically, suppose the data sample $x_i$ is in the localities of $R$ anchor vectors, and these anchor vectors are expressed as the set $V_R$. Then, we calculate the distances from $x_i$ to the local mean values of these anchor vectors, as in (13). The deviation values of each dimension of the low-dimensional localities of the anchor vectors in $V_R$ are collected into a vector. For the entire data $X$, the dimension reduction result is $F_X$, and the deviation values in each dimension from (11) are introduced to obtain the deviation values of the entire low-dimensional data with noise interference. For a certain dimension $j$, this is expressed through the distance weights, as in (14) and (15). We iterate over all $d$ low dimensions and obtain $F_{X,b^+}$ and $F_{X,b^-}$. Thus, the deviation values of the entire data distance in the low-dimensional space can be obtained from the Euclidean distance calculation in (12). The above deviation value calculation for each dimension in the low-dimensional space and for the overall data in the low-dimensional space is shown in Figure 1.
In Figure 1, the weights $w_1$ and $w_2$ are computed according to (14). Through the weights, the different local ranges can simultaneously affect the final deviation values.
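The distance weighting can be sketched as follows for a sample that lies in two local ranges. The sketch assumes that each weight is the distance to a local mean divided by the sum of those distances, which is one reading of "the ratio of the distance"; an inverse-distance variant would only change the line that computes `w`. All numbers and the function name `weighted_deviation` are hypothetical.

```python
import numpy as np

def weighted_deviation(x, local_means, local_devs):
    """Distance-weighted sum of the low-dimensional deviations of R localities.

    x            : sample in R^D
    local_means  : R x D array of local means of the anchor vectors containing x
    local_devs   : R x d array of per-dimension deviations of those anchors
    Assumed weight: w_r = d_r / sum(d), with d_r the distance from x to mean r.
    """
    d = np.linalg.norm(local_means - x, axis=1)
    total = d.sum()
    # if x coincides with every local mean, fall back to equal weights
    w = d / total if total > 0 else np.full(len(d), 1.0 / len(d))
    return w, w @ local_devs

x = np.array([0.0, 0.0, 0.0])
local_means = np.array([[1.0, 0.0, 0.0], [0.0, 3.0, 0.0]])   # R = 2 localities
local_devs = np.array([[0.2, 0.4], [0.6, 0.8]])              # deviations in d = 2 dimensions
w, dev = weighted_deviation(x, local_means, local_devs)
```

Here the distances are 1 and 3, so the weights are 0.25 and 0.75 and the combined deviation is (0.5, 0.7). The weights always sum to one, so the result stays within the range spanned by the local deviations.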
Summarizing the above three parts of the algorithm gives the schematic illustration in Figure 2. The computing process of the proposed framework is as follows.

Procedure.
Step 1. Construct a graph $G = (V, E)$ according to the data $X$, where the weight matrix $W$ is calculated by (1).
Step 2. Calculate the Laplacian matrix L of the graph G and obtain the eigenvector matrix U and eigenvalue matrix Λ by matrix decomposition.
Step 3. Determine the anchor vector set $V_1$ according to the polarity of the largest eigenvector.
Step 4. For each $v_i \in V_1$, do the following: Apply a signal function $\delta_i$ on $v_i$. Calculate the spectral graph wavelet coefficient of the signal function according to (7). Determine the local range of the anchor vector $v_i$ according to (8). Obtain the distribution parameters of the anchor vector locality according to (9).
Step 5. Learn the manifold dimension reduction result $F_\mu$ using all mean vectors of the anchor vector set $V_1$.
Step 6. Use all covariances of the anchor vector set $V_1$ and (10) to obtain all deviation value vectors and learn the manifold dimension reduction results $F_{b^-}$, $F_{b^+}$.
Step 7. Learn the manifold dimension reduction result $F_X$ for all data $X$.
Step 8. Obtain the deviation value of each dimension of the low-dimensional space according to (11).
Step 9. For each $x_i \in X$, do the following: According to (13), calculate the distance ratio between $x_i$ and the local mean values of the anchor vectors whose local ranges contain $x_i$. Calculate the deviation values of $F_{x_i}$ according to (14) and (15).

Experiments
For the proposed algorithm framework, two simulated datasets and two image datasets are selected for algorithm implementation and simulation.
Figure 2: A schematic illustration of the whole proposed framework of manifold dimension reduction of noisy data.

Swiss Roll Dataset.
This three-dimensional dataset has a total of 1000 data samples. Gaussian noise with 0 mean and a standard deviation of 0.5 is added to each sample point, as shown in Figure 3.
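A noisy Swiss roll of this size can be generated as in the sketch below. The parametrisation of the roll (angle range and height of 21) is a commonly used one and is an assumption here, since the text does not specify how the clean data were generated; only the sample count and the noise standard deviation of 0.5 come from the experiment description. The function name `noisy_swiss_roll` is hypothetical.

```python
import numpy as np

def noisy_swiss_roll(n=1000, noise_std=0.5, seed=0):
    """Sample n points on a Swiss roll and add zero-mean Gaussian noise to each coordinate."""
    rng = np.random.default_rng(seed)
    t = 1.5 * np.pi * (1.0 + 2.0 * rng.random(n))     # angle along the roll (assumed range)
    height = 21.0 * rng.random(n)                     # position across the roll (assumed range)
    clean = np.column_stack([t * np.cos(t), height, t * np.sin(t)])
    noisy = clean + rng.normal(0.0, noise_std, size=clean.shape)
    return clean, noisy

clean, noisy = noisy_swiss_roll()
```

Keeping the clean points alongside the noisy ones is convenient for checking how far the estimated local deviations are from the injected noise level.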
We construct the graph $G$ and calculate the Laplacian matrix $L$ and its eigenvalue decomposition result $L = U \Lambda U^{*}$. The polarity of the largest eigenvector is used to determine the set of anchor vectors for local analysis, as shown in Figure 4.
Next, the graph spectral wavelet method determines the local range of each anchor vector. The filtering function selected in this study is $g(x) = \exp(-\tau x / \lambda_{\max})$, where $\lambda_{\max}$ represents the largest eigenvalue of the Laplacian matrix. This function has low-pass characteristics. The parameter $\tau$ controls the width of the low-pass frequency band and therefore changes the extent of the local ranges. In practice, candidate filter functions and parameter values should be tested to choose suitable ones. For the current simulation, $\tau = 30$ is selected. A signal function $\delta$ is applied to each anchor vector, and the wavelet coefficients of the nodes in the graph are calculated so that the local range of each anchor vector is determined by the total energy ratio of the wavelet coefficients. Four anchor vectors are randomly selected as visualization examples, as shown in Figure 5.
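The effect of $\tau$ on the filter can be checked numerically. The sketch below evaluates the filter on a few eigenvalues for two values of $\tau$; the value of `lam_max` is hypothetical, and the function name `low_pass_filter` is introduced here for illustration.

```python
import numpy as np

def low_pass_filter(lam, tau, lam_max):
    """Filter response g(x) = exp(-tau * x / lam_max) from the experiment section."""
    return np.exp(-tau * lam / lam_max)

lam_max = 4.0                                  # hypothetical largest Laplacian eigenvalue
lam = np.linspace(0.0, lam_max, 5)             # a few points across the spectrum

for tau in (5.0, 30.0):
    response = low_pass_filter(lam, tau, lam_max)
    cutoff = lam_max / tau                     # eigenvalue at which the response equals 1/e
    print(f"tau={tau}: response falls to 1/e at lambda={cutoff:.3f}; samples={response.round(4)}")
```

The response equals 1 at zero frequency and falls to $1/e$ at $\lambda_{\max}/\tau$, so with $\tau = 30$ only roughly the lowest thirtieth of the spectrum passes with little attenuation. A larger $\tau$ therefore keeps fewer frequencies, which corresponds to a smoother and more widely spread signal around each anchor vector.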
For each local range, we estimate the distribution parameters, that is, the mean and covariance parameters. According to (9), the local mean vector and deviation values of each anchor vector local range in the three-dimensional space can be plotted, as shown in Figure 6. The mean vectors are used to learn the dimension reduction $F_\mu$, and the deviations are used to learn the dimension reductions $F_{b^-}$, $F_{b^+}$. In order to visualize the deviation values obtained by the algorithm framework, as expressed in (16), the positive Z-axis value in the three-dimensional space represents the deviation value due to the positive standard deviation of the distribution parameters, and the negative Z-axis value represents the deviation value due to the negative standard deviation of the distribution parameters. We can now obtain the dimension reduction results of the mean vectors and deviation values, as shown in Figure 7. This algorithm framework can incorporate any type of manifold dimension reduction method. In the experiment, four commonly used methods are selected: LE, ISOMAP, LLE, and LTSA.
As can be seen from Figure 7, the low-dimensional results obtained by different manifold learning methods and the deviation values caused by noise interference are different.
We use manifold learning to perform dimension reduction on the entire noisy data and the proposed distance weighting method to calculate the deviation values of the noise interference in the dimension reduction result. In order to visualize the deviation results, the positive and negative Z-axes are still selected to denote the deviation values in low-dimensional space, as shown in Figure 8.
As can be seen from Figure 8, the dimension reduction results of deviation values with the distribution parameters exhibit certain morphological characteristics, which are different for different dimension reduction methods.

S-Shaped Dataset.
This dataset has a total of 1000 data samples. Gaussian noise with 0 mean and a standard deviation of 0.5 is added to each sample point, as shown in Figure 9.
According to the anchor vector set and the local range determination method given in the framework, the anchor point set is shown in Figure 10 and the local ranges represented by the wavelet coefficients of six randomly selected anchor vectors are shown as examples in Figure 11.
After determining the anchor vectors and local ranges, the distribution parameters are estimated, and the deviations can be obtained based on the positive and negative directions of the standard deviation, as shown in Figure 12. Similar to the Swiss roll dataset experiment, the mean vectors and the deviation values in Figure 12 are used to calculate the deviation values caused by noise interference. The four commonly used manifold methods, LE, ISOMAP, LLE, and LTSA, reduce the data to two dimensions, and the positive and negative directions of the Z-axis indicate the calculated deviation values, as shown in Figure 13. Then, manifold dimension reduction is performed on the original noisy data, and deviation values are calculated for the noise interference in the dimension reduction results. Similarly, the positive and negative directions of the Z-axis are selected to indicate the deviation values in low-dimensional space. Figure 14 also shows that the interference of noise on the manifold dimension reduction results varies with the method and has different characteristics.

MNIST Dataset.
The MNIST dataset is a set of handwritten digit grayscale images of size 28 × 28 pixels. The purpose of the experiment in this section is to test the effectiveness of the proposed algorithm framework on datasets with classification labels. Therefore, 2000 images of "0" and "1" are arbitrarily selected as two-class samples, and 1500 images of "0," "1," and "2" are arbitrarily selected as three-class samples. For visualization, each image is vectorized into a sample vector of length 28 × 28 = 784, the manifold dimension reduction method is used for reduction to two dimensions, and the positive and negative Z-axes of the three-dimensional space are used to characterize the deviation values caused by noise interference in low-dimensional space.
[...] and LTSA, the commonly used manifold dimension reduction methods, and the parameter in the filter function $g$ is $\tau = 35$. Figure 15 is an example of the original image. Figure 16 shows the manifold dimension reduction results and the deviation values in the positive and negative standard deviation directions in low-dimensional space. As seen in Figure 16, when reducing to two dimensions, the distance between the different classes is obvious, the class labels can be well distinguished, and the deviation values of the dimension reduction results with distribution parameters also have obvious class discrimination. The difference can also indicate the change in the noise interference in different localities of the dataset.

Three-Class Samples.
Gaussian noise is added to the original 1500 vectors, with 0 mean and a standard deviation of 0.5. Other conditions are similar to the two-class experiment. Figure 17 is an example of the original image. Figure 18 shows the results of the dimension reductions and their deviations. The results in Figure 18 also have obvious class discrimination, and the noise interference difference of each sample point can be expressed by the difference between the manifold dimension reduction results and the distribution parameters.

Fashion-MNIST Dataset.
The Fashion-MNIST dataset is an extended version of the MNIST dataset and includes 28 × 28 grayscale image sets of ten category labels, with items such as "T-shirt," "jeans," and "sweater."

Two-Class Samples.
We arbitrarily select "T-shirts" and "boots" as two-class samples. The processing method is the same as in the MNIST dataset experiment. Gaussian noise with mean of 0 and standard deviation of 0.5 is introduced. Each class is randomly composed of 1,000 data samples. The proposed algorithm framework is used to process samples, and the parameter of the filter function $g$ is $\tau = 25$. Figure 19 is an example of the original image. Figure 20 shows the result of the manifold dimension reduction and deviation values of the dimension reduction results with the distribution parameters.
As can be seen in Figure 20, the dimension reduction results have obvious class discrimination, and the deviations by noise interference are nonlinear and have class discrimination.
Figure 19: Examples of original "T-shirt" and "boots" and images with added noise.

Three-Class Samples.
We arbitrarily select the three-class samples of "T-shirt," "pants," and "boots." The setting of the experiment is the same as that of the two-class process. Gaussian noise with mean of 0 and standard deviation of 0.5 is introduced. We randomly take 500 data samples from each class for dimension reduction processing. Figure 21 is an example of the original image. The experimental results are shown in Figure 22.
As can be seen in Figure 22, the dimension reduction result visually retains the class discrimination, and the deviation values from the noise interference are not only nonlinear but also have a visual class discrimination.

Conclusion
This paper presents a manifold-based dimension reduction algorithm framework capable of processing noisy data. Considering the manifold localization and the distribution characteristics brought about by noise interference, we propose a method for determining local anchor vectors using graph sampling and a method for determining the local ranges of anchor vectors based on spectral graph wavelets. In each local range of an anchor vector, maximum likelihood estimation is used to estimate the distribution parameters, and a distance-weighted deviation value calculation method for dimension reduction results with distribution parameters is proposed. For the dimension reduction step itself, any of the currently used manifold learning methods can be adopted.
As seen from the simulation of noisy data, the proposed framework can achieve the dimension reduction of noisy data and the measurement of deviation values caused by noise interference, and it provides dimension reduction results and deviation values with obvious class discrimination for data containing class labels. Moreover, the proposed framework can accommodate other kinds of dimension reduction methods, extending those methods to take noisy data as input so that they can be applied in more complex situations. Future work will investigate manifold dimension reduction calculations with other types of distribution parameters, the optimization of filter functions, and quantitative evaluation methods.
Data Availability
The paper used the public MNIST and Fashion-MNIST datasets.

Conflicts of Interest
The authors declare that they have no conflicts of interest.