Memristive devices based hardware for unlabeled data processing

Unlabeled data processing is of great significance for artificial intelligence (AI), since well-structured labeled data are scarce in most practical applications owing to the high cost of human annotation. Autonomous analysis of unlabeled datasets is therefore important, and algorithms for processing unlabeled data, such as k-means clustering, restricted Boltzmann machines, and locally competitive algorithms, play a critical role in the development of AI techniques. Memristive devices offer potential for power- and time-efficient implementation of unlabeled data processing due to their unique properties in neuromorphic and in-memory computing. This review provides an overview of the design principles and applications of memristive devices for various unlabeled data processing and cognitive AI tasks.


Introduction
The enormous success of artificial intelligence (AI) in recent years is inextricably linked to the explosive growth of data and to rapidly evolving computing hardware [1][2][3][4]. The main task of AI research is to find efficient ways to extract information and knowledge from the huge and continuously increasing amount of available data. When the data are labeled (i.e. one has a set of input variables and corresponding output variables), the algorithms make predictions by adjusting the model parameters according to the a priori labels in the training dataset. This supervised learning approach has achieved remarkable accuracy, sometimes even outperforming humans [5]. However, the scarcity of labeled data and the high cost of manual labeling mean that most common applications must deal with unlabeled data.
High-dimensional unlabeled data processing is a more general technology for handling the vast amounts of data encountered in practice [6][7][8]. It arises in various fields, such as medicine [9,10], bioinformatics [11][12][13], biology [14][15][16], computer science [17,18] and multiple engineering fields [19,20]. Aimed at different applications, many models for unlabeled data processing have been developed, including autoencoders and sparse coding for feature extraction [21,22], k-means for clustering [23,24], and Hopfield networks and restricted Boltzmann machines for optimization problems [25,26]. These models offer AI systems many attractive abilities, such as gaining insights from large volumes of raw data, discovering inherent unknown patterns and feature structures, and adapting to changes in the external world in a cognitive way. Some models even hold the potential to outperform supervised models in accuracy, generalization ability, and robustness [27,28].
Assessed from the perspective of computing, unlabeled data processing models are dominated by complex, computation-intensive operations such as vector matrix multiplications (VMMs) and locally competitive algorithms (LCAs) [29]. Traditional scalar von Neumann computing systems based on transistor electronics are not optimized for these tasks [30]. Existing unlabeled data processing models implemented on conventional hardware therefore face a high overhead in speed, power, and area [31,32]. It is thus vital to develop new nanoscale devices beyond CMOS and novel hardware architectures to achieve high-efficiency unlabeled data processing.
To achieve this, memristive devices are under intense investigation, including resistive random access memory (RRAM) [33][34][35], phase-change memory (PCM) [36][37][38], ferroelectric devices [39,40], ion-gating [41][42][43] and charge-trap transistors [44][45][46], as well as spintronic devices [47,48]. Their defining characteristic is that the internal resistance (or conductance) state is determined by the history of applied voltages and currents. These nanoscale devices feature good scalability, stackability, and other intriguing properties that exceed those of conventional integrated circuit technology [49][50][51][52][53][54]. In addition, the tunable internal resistance states of memristive devices closely resemble the variable synaptic strengths of biological synapses. Memristive crossbar arrays are thereby power-efficient and fast in the calculation of VMMs and LCAs. Memristive devices thus offer the possibility of in-memory and neuromorphic computing, bypassing the limits on computing speed and power consumption that the von Neumann architecture incurs from the physical separation of memory and processing units [55][56][57][58].
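The crossbar VMM mentioned above can be illustrated with a minimal sketch. It assumes an ideal array (no wire resistance, no device noise), in which Ohm's law gives the current through each device and Kirchhoff's current law sums the currents along each column. The function name `crossbar_vmm` and the numerical values are hypothetical and chosen only for illustration.

```python
import numpy as np

def crossbar_vmm(voltages, conductances):
    """Column currents of an ideal crossbar: I_j = sum_i V_i * G_ij."""
    voltages = np.asarray(voltages, dtype=float)
    conductances = np.asarray(conductances, dtype=float)
    # Ohm's law gives the current through each device (V_i * G_ij);
    # Kirchhoff's current law adds those currents along each column.
    return voltages @ conductances

# Hypothetical 3x2 array: row voltages in volts, conductances in siemens.
V = np.array([0.2, 0.1, 0.0])
G = np.array([[1e-4, 2e-4],
              [3e-4, 1e-4],
              [2e-4, 2e-4]])
I = crossbar_vmm(V, G)  # column currents in amperes
```

The whole product is obtained in one read step because every device contributes its current in parallel, which is the source of the speed and energy advantage discussed in the text.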
Therefore, as shown in figure 1(a), realizing unlabeled data processing models with memristive devices is a promising route to low-power computing systems. In this paper, we provide a high-level overview of the design principles and methods for building energy-efficient memristive device-based computing hardware. Such memristive AI hardware efficiently processes various kinds of input data, such as digital, analog, and spiking signals, and realizes cognitive AI tasks such as recognition and classification in computer vision [59,60], healthcare [61,62], and neuromorphic computing [63], as shown in figures 1(b) and (c). The memristive hardware is described according to the unlabeled data processing model it implements: principal component analysis, sparse coding, k-means clustering, Hopfield networks, restricted Boltzmann machines, autoencoders, and generative adversarial networks.

Principal component analysis
Principal component analysis (PCA) is an important traditional machine learning algorithm for unlabeled data processing. By projecting high-dimensional unlabeled data onto lower-dimensional principal components (PCs), PCA finds the best representation of the data using a finite number of PCs. As shown in figure 2(a), the first PC is chosen by minimizing the distance between the input data and their projections. Equivalently, the variance (σ²) of the projected points is maximized, as shown in figure 2(b). The second and subsequent PCs are extracted in the same way, with the additional requirement that each must be orthogonal to all previous PCs. PCA reduces data dimensionality while retaining patterns and trends, so it is widely applied in machine learning applications such as facial recognition, image denoising, genetic data analysis, and disease prediction [64].
However, the traditional way to obtain the PCs is to solve for the eigenvectors of the covariance matrix of the high-dimensional input data, which is precise but compute-intensive. A more hardware-friendly method with low computation cost has since been developed. This approach finds approximate PCs through unsupervised, online learning in a PCA neural network, which can be implemented very efficiently by a memristive crossbar. Specifically, Sanger's rule is often used in PCA neural networks, as shown in equation (1) [65]:

$$\Delta g_{ij} = \eta\, y_j \Big( x_i - \sum_{k=1}^{j} g_{ik}\, y_k \Big) \qquad (1)$$

where i is the row number, j and k are column numbers of the crossbar, x and y are the input and output, respectively, g_ij is the weight, and η is the learning rate. The input data are encoded into voltage pulses of different widths, and after training the weights of the columns correspond one-to-one to the PCs. The first memristive PCA neural network was demonstrated on a TaOx memristor crossbar array, as shown in figure 2(c). The crossbar array structure efficiently performs the matrix operation, and the tantalum-oxide memristor supports on-chip in situ learning with simple programming pulses [66]. The PCA neural network is therefore trained on chip, without supervision, under Sanger's learning rule to obtain the PCs. As shown in figure 2(d), the experiments show that even with non-ideal memristors, the memristive PCA network can linearly separate different classes in breast cancer data with an excellent success rate of 97.6%.
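A minimal software sketch of online PCA learning with Sanger's rule is given here to make the update concrete. It uses the standard form of the rule, Δg_ij = η y_j (x_i − Σ_{k≤j} g_ik y_k), with floating-point weights in place of device conductances. The synthetic data, the learning rate, and the function names `sanger_step` and `train_pca` are assumptions for illustration and are not taken from the cited hardware.

```python
import numpy as np

def sanger_step(g, x, eta):
    """One online Sanger update of the (inputs x outputs) weight matrix g."""
    y = x @ g  # outputs y_j = sum_i x_i * g_ij (the crossbar VMM)
    # The upper-triangular part of y y^T keeps only terms with k <= j, so
    # output j subtracts the reconstruction built from outputs 1..j.
    return g + eta * (np.outer(x, y) - g @ np.triu(np.outer(y, y)))

def train_pca(data, n_components, eta=0.005, seed=0):
    rng = np.random.default_rng(seed)
    g = rng.normal(scale=0.1, size=(data.shape[1], n_components))
    for x in data:  # single online pass, one sample at a time
        g = sanger_step(g, x, eta)
    return g

# Synthetic zero-mean 2D data stretched along a 30-degree direction.
rng = np.random.default_rng(1)
theta = np.deg2rad(30.0)
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
data = (rng.normal(size=(5000, 2)) * np.array([2.0, 0.5])) @ rot.T

g = train_pca(data, n_components=2)
first_pc = g[:, 0] / np.linalg.norm(g[:, 0])  # compare with rot[:, 0]
```

In this sketch the first column of `g` should align, up to sign, with the direction of largest variance (`rot[:, 0]`). In hardware, the weight increment would be applied as programming pulses rather than as an exact floating-point addition.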
However, the as-prepared TaOx-based memristor needs an initial forming process, in which a high voltage of ∼5 V (the typical set voltage of a TaOx-based memristor is ∼1 V) is applied to establish the ionic distribution that is vital for subsequent resistive switching. When the high forming voltage is applied to one device, already formed devices that share the same row can be damaged. To solve this problem, a forming-free TaOx-based memristor was developed using a Ta2O5 layer as the switching layer and Ta metal as the reactive top electrode, as shown in figure 2(e). The forming-free behavior results from a rapid thermal annealing step after device fabrication, which creates oxygen vacancies in the Ta2O5 switching layer. Similar results have also been reported for other forming-free metal-oxide memristors [67][68][69]. The device was used to implement a practical memristor crossbar-based PCA neural network computing system. As shown in figure 2(f), the computing system includes a CMOS chip as a microcontroller and a test board on which the wire-bonded memristor crossbar array is connected to the peripheral circuitry [70]. The memristive PCA neural network hardware reached 97.1% accuracy in the classification task on standard breast cancer data even with devices exhibiting large variations, close to the software result (97.6%).
The PCA neural network implementations discussed above need an external printed-circuit board to provide the interface and control circuitry for the memristor arrays. Integrating all these functions on chip would therefore be a big step toward practical memristive computing systems, because it allows the whole system to be scaled. The WOx memristor is a good choice for fabricating an integrated memristor/CMOS system because it is compatible with the 180 nm logic process and can be fabricated on top of the CMOS circuits using a back-end-of-line (BEOL) process, as shown in figure 2(g). In a single memristor/CMOS chip, a WOx memristor crossbar array can thus be integrated with peripheral circuits consisting of analog-to-digital and digital-to-analog converters (ADCs/DACs), digital buses, and a programmable processor, as shown in figures 2(h) and (i). With the integrated WOx memristor/CMOS system, a PCA neural network and the subsequent classification layer were implemented [71]. Compared with software PCA programs, this memristive hardware system achieves similar classification accuracy with a high power efficiency (1.37 TOPS/W), which could make PCA practical for large amounts of data.

Sparse coding
An interesting fact is that only a small subset of elements from a dictionary is needed to approximate a natural image well. The problem of selecting a proper subset of dictionary elements and their coefficients to reconstruct a signal is called sparse approximation. Sparse coding originally refers to the hypothesis that biological neural systems exhibit similar sparse representations in their population codes, encoding stimulus information in only a few active neurons. For example, the selectivity of receptive fields in primary visual cortex cells can be explained by sparse coding. Within the receptive field, a single neuron responds to stimuli such as edges, stripes, line segments, and other image features in a certain orientation. The cost function of sparse coding can be expressed mathematically as:

$$\min_{a}\big( |x - D a^{T}|_{2} + \lambda\, |a|_{0} \big) \qquad (2)$$

where x is the input signal, a is a vector of sparse coefficients with only a few non-zero elements, D is a dictionary of features, |·|_2 and |·|_0 are the L2- and L0-norm, respectively, and λ is the sparsity parameter. Sparse coding as described by equation (2) is a complex nonconvex optimization problem. Fortunately, a convenient unsupervised machine learning algorithm, the LCA, has been developed to solve it. The LCA consists of several nonlinear ordinary differential equations that describe the dynamics of the membrane potential and the firing rate of interacting neurons. These neurons continually compete with nearby neurons through lateral inhibition to compute the coefficients of the input data using a complete dictionary. The LCA significantly eases the hardware implementation of sparse coding because its computational primitives can be realized by analog circuit elements. Memristive implementations of LCAs can therefore also be fast and energy efficient.
A detailed description of how memristor arrays can be used to implement the LCA was given by Lu et al [72]. In this implementation, an analog WOx memristor crossbar array is employed, in which VMM and matrix transpose operations can be performed easily. The input x is an m-element column vector that is encoded into the widths of voltage pulses. These pulses are applied to the rows of the array simultaneously. The voltage amplitude is chosen to be high enough to guarantee an adequate read-out margin, but not so high that it switches the resistance states. The memristors in a given column serve as an m-element vector representing one feature. The n feature vectors form an m × n matrix, the feature dictionary D. A leaky integrate-and-fire (LIF) neuron is connected to each column of D as an output. The activity coefficients are represented by an n-element row vector a, in which the ith element is the sparse coefficient of the corresponding feature. Equation (3) shows the dynamics of this process:

$$\frac{\mathrm{d}u}{\mathrm{d}t} = \frac{1}{\tau}\Big( -u + x^{T} D - a\,\big(D^{T} D - I_{n}\big) \Big), \qquad a_i = \begin{cases} u_i, & u_i > \lambda \\ 0, & \text{otherwise} \end{cases} \qquad (3)$$

where u_i represents the membrane potential of the ith neuron, τ and λ represent the time constant and the threshold, respectively, and I_n is the n × n identity matrix. After the original input x is sent into the network, a reconstruction of x can be obtained as x̂ = Da^T. As shown in figure 3(a), the difference between the input and the reconstructed signal is the residual. Lateral neuron inhibition is realized iteratively by feeding the residual back to the network as the new input [65]. In this ping-pong fashion, the LCA network settles to a steady state in which the vector a representing neuron activity no longer changes, so that the cost function described by equation (2) is minimized. A sparse representation with minimum reconstruction error and high sparsity is thus found in the memristive computing system.
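The ping-pong iteration can be sketched in software as follows. This is a minimal illustration, assuming a forward-Euler discretization of the LCA dynamics in residual form, du/dt = (1/τ)(−u + (x − x̂)^T D + a), and a hard threshold. The dictionary, the input, and the parameter values are hypothetical.

```python
import numpy as np

def lca_sparse_code(x, D, lam=0.1, step=0.1, n_iter=500):
    """Sparse coefficients a such that D @ a approximates x."""
    u = np.zeros(D.shape[1])  # membrane potentials
    a = np.zeros(D.shape[1])  # neuron activities
    for _ in range(n_iter):
        # Backward pass: reconstruct the input and form the residual.
        residual = x - D @ a
        # Forward pass on the residual; adding a back cancels the
        # self-inhibition term, leaving only lateral inhibition.
        u += step * (-u + residual @ D + a)
        # Hard threshold (assumed form): neurons below lam stay silent.
        a = np.where(u > lam, u, 0.0)
    return a

# Hypothetical 2x3 dictionary with unit-norm columns.
D = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
D /= np.linalg.norm(D, axis=0)
x = np.array([1.0, 0.05])

a = lca_sparse_code(x, D)
x_hat = D @ a  # reconstruction from the active features only
```

For this input, the first dictionary element explains almost all of the signal, so the competing diagonal element is suppressed and a single neuron remains active. Here `step` plays the role of dt/τ.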
Compared with a CMOS implementation of sparse coding, the simulated memristor crossbar sparse coding architecture proves to be more efficient, offering savings of 154× in area and ∼2× in energy. At high sparsity, the distortion of the reconstructed signal is comparable with that obtained in software, even when memristors with large variations are used [72]. Following this idea, a prototype sparse coding system was implemented with a 32 × 32 WOx memristor crossbar array (figure 3(b)) [73]. Depending on the nature of the input images, various feature dictionaries can be learned on chip and stored in this crossbar. As shown in figures 3(c) and (d), data-intensive tasks such as natural image processing and real-time video analysis showcase the great prospects for practical unsupervised AI tasks [71]. Based on a threshold-adaptive memristor model, a sparse coding algorithm named memristive neural network-based soft-threshold adaptive sparse coding (MMN-SLCA) has been proposed. The tunable thresholds further improve the flexibility of intelligent sparse coding. The MMN-SLCA computing system reduces the dimensionality and complexity of the original unlabeled input signals and provides enhanced feature extraction, pattern recognition, and super-resolution reconstruction functions [74].

K-means clustering
Clustering analysis is a basic unsupervised learning method that groups a set of signals, such as data points, feature vectors, or observations, into meaningful clusters with no predefined labels. Among clustering methods, k-means clustering is the most prominent and is applied in numerous fields of science and engineering, such as gene expression analysis and document clustering [70]. The basic principle of k-means is quite simple. Before clustering, the k centroids are initialized randomly. First, each input data point is assigned to its nearest cluster by computing the distances between the data point and the k centroids. Second, the coordinates of the k centroids are updated from the new clusters. Third, the algorithm repeats these two steps until convergence is reached.
The k-means clustering algorithm can be implemented efficiently in memristive crossbars by representing weight vectors as conductances. As shown in figure 4(a), crossbar arrays of Ta2O5−x memristors with a low forming voltage and near-ideal analog programmable conductance are used for the implementation. Figure 4(b) depicts the implementation of the k-means algorithm with the Ta2O5−x memristor crossbar array. Before clustering, the coordinates of the centroids are mapped to the conductance matrix of the memristor crossbar. The coordinates of the input data points are encoded by pulse width modulation, and the resulting voltage pulses are applied to the rows of the crossbar. By adding a row to introduce the W² term and comparing the charges accumulated in the columns, the centroid closest to each input data point can be found. The experimental system reaches high classification accuracy (93.3%), close to the result obtained directly in software (95.3%), for the standard IRIS data set (figure 4(c)) [75].
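The role of the extra W² row can be seen from the expansion |x − W|² = |x|² − 2x·W + |W|²: since |x|² is the same for every centroid, the nearest centroid is the one that maximizes x·W − |W|²/2, which needs only a dot product per column. The sketch below illustrates this in software. The constant input of −0.5 on the extra row, the function names, and the synthetic data are assumptions for illustration. A physical array has non-negative conductances and pulse widths, so the actual encoding in [75] may differ.

```python
import numpy as np

def nearest_centroid(x, centroids):
    """Index of the closest centroid using only dot products."""
    w = np.asarray(centroids, dtype=float).T        # (d, k): one centroid per column
    w_ext = np.vstack([w, np.sum(w ** 2, axis=0)])  # extra row stores |W|^2
    x_ext = np.append(x, -0.5)                      # constant input on the extra row
    scores = x_ext @ w_ext                          # x.W - |W|^2 / 2 for each column
    return int(np.argmax(scores))

def kmeans(data, init_centroids, n_iter=20):
    centroids = np.array(init_centroids, dtype=float)
    labels = np.zeros(len(data), dtype=int)
    for _ in range(n_iter):
        # Step 1: assign every point to its nearest centroid.
        labels = np.array([nearest_centroid(x, centroids) for x in data])
        # Step 2: move each centroid to the mean of its cluster.
        for c in range(len(centroids)):
            if np.any(labels == c):                 # leave empty clusters unchanged
                centroids[c] = data[labels == c].mean(axis=0)
    return centroids, labels

# Two hypothetical, well-separated blobs.
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=0.0, scale=0.5, size=(30, 2))
blob_b = rng.normal(loc=10.0, scale=0.5, size=(30, 2))
data = np.vstack([blob_a, blob_b])
centroids, labels = kmeans(data, init_centroids=[[1.0, 1.0], [9.0, 9.0]])
```

A fixed iteration count stands in for a convergence check to keep the sketch short.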
Besides Ta2O5−x memristors, a memristive implementation of k-means has also been demonstrated with graphene field effect transistor (GFET) devices with programmable conductance, as shown in figure 4(d). VMM operations are executed by the memory architecture shown in figure 4(e). For a given back-gate voltage (V_BG), the product of the matrix represented by the GFET conductances and the vector encoded in the input voltage amplitudes is given directly by the output current. The experimental and theoretical output currents are very close, as shown in figure 4(f). Because GFETs allow reliable programming to specific conductance states, the computing accuracy of k-means clustering is improved [76].

Hopfield network
The Hopfield network is an asynchronous, recursive, and dynamic artificial neural network (ANN) proposed by Hopfield in the 1980s. It is regarded as a kind of constrained optimization network that minimizes a preprogrammed energy function. With a simple structure of several interconnected neurons, nonlinear Hopfield networks have proved extremely useful in unlabeled signal processing tasks such as pattern recognition, data compression, combinatorial optimization, and location allocation problems [77,78].
The dynamics and the energy function are crucial to understanding how Hopfield networks work. We take a binary discrete-time asynchronous model as an example because this model operates on a simple mechanism and is often adopted in practical applications. For a binary discrete-time asynchronous model consisting of N fully connected binary neurons, only one randomly selected neuron updates in each step, and the dynamics governing the system evolution are described by equation (4):

$$U_j(t+1) = f\Big( \sum_{i=1}^{N} \omega_{ij}(t)\, U_i(t) + T_j^{b} \Big) \qquad (4)$$

where U_j(t) represents the state of neuron j at iteration t, f(·) is the threshold function, and ω_ij(t) and T_j^b are the synaptic weight and the bias, respectively.
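Alternatively, the evolution of the state of the Hopfield network can be described by the energy function given in equation (5):

$$E = -\frac{1}{2} \sum_{i} \sum_{j} \omega_{ij}\, U_i\, U_j - \sum_{j} T_j^{b}\, U_j \qquad (5)$$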
The energy function decreases spontaneously during runtime. It can be proved mathematically that the evolution of the nonlinear system given by equation (4) is equivalent to the energy minimization process described by equation (5). The network therefore gradually converges to the attractors represented by the state variables of the network, including the neuron states and synaptic strengths [79]. By mapping a problem onto the state variables and energy function of the Hopfield network, the dynamics allow the network to solve optimization problems and to realize recurrent associative memory.
The update rule of equation (4) ensures that the energy function decreases as the network evolves in time. However, the network is easily trapped in local minima near the initial state, which are not the desired final solution. Steady states are the key mechanism for convergence, but trapping in local minima is undesirable when solving optimization problems. To avoid the local minima problem, simulated annealing has been developed, which escapes local minima by harnessing thermally controlled probabilistic jumps [80].
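The link between asynchronous updates and a decreasing energy can be illustrated with a short sketch. It makes several assumptions that are not specified in the text: bipolar states (±1) instead of binary ones, a sign function as the threshold, symmetric weights with zero diagonal set by a Hebbian outer product, and the energy E = −½ Σ ω_ij U_i U_j − Σ b_j U_j. No annealing is included, so this sketch can become trapped in local minima exactly as described above.

```python
import numpy as np

def energy(s, w, b):
    """Hopfield energy for state s, weights w and biases b."""
    return -0.5 * s @ w @ s - b @ s

def recall(s0, w, b, sweeps=5, seed=0):
    rng = np.random.default_rng(seed)
    s = np.array(s0, dtype=float)
    energies = [energy(s, w, b)]
    for _ in range(sweeps):
        for j in rng.permutation(len(s)):  # one neuron at a time, random order
            h = w[j] @ s + b[j]            # weighted input plus bias
            if h != 0:                     # keep the old state on a tie
                s[j] = 1.0 if h > 0 else -1.0
            energies.append(energy(s, w, b))
    return s, energies

# Store one hypothetical 8-neuron pattern with a Hebbian outer product.
pattern = np.array([1, -1, 1, 1, -1, -1, 1, -1], dtype=float)
w = np.outer(pattern, pattern)
np.fill_diagonal(w, 0.0)
b = np.zeros(len(pattern))

noisy = pattern.copy()
noisy[[0, 3]] *= -1                        # corrupt two neurons
recalled, energies = recall(noisy, w, b)
```

Each single-neuron update can only lower the energy or leave it unchanged, so the recorded `energies` form a non-increasing sequence and the corrupted state relaxes to the stored pattern.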
Conventionally, Hopfield networks are realized with CMOS synapses, which incur a large chip area and high power consumption. In contrast, memristive devices provide a more efficient way to realize Hopfield networks [81,82]. This is because the dominant computation in Hopfield networks, the VMM, can be performed efficiently by a high-density memristor crossbar [63]. Furthermore, by harnessing the inherent stochasticity and nonlinearity of the switching process, memristive devices can significantly reduce the implementation overhead of simulated annealing.
As shown in figure 5(a), a Hopfield neural network for solving max-cut problems has been implemented by programming binary weight matrices on 1-transistor-1-memristor (1T1R) crossbar arrays of TaOx memristors integrated with control circuits at the 180 nm technology node. The 1T1R crossbar arrays enable highly efficient VMM computations on input vectors (figure 5(b)). The intrinsic analog noise of the system includes conductance fluctuations in the nanoscale devices and non-ideal properties of the crossbar array, such as IR drop. These noise sources are exploited and shown to improve both solution quality and efficiency by enabling the system to escape local minima (figure 5(c)). Compared to quantum, optical, and fully digital methods, this memristor-based stochastic annealing system provides over four orders of magnitude higher throughput per unit power, i.e. power efficiency [83]. Chaos arising from nonlinearity in TaOx memristors has also been introduced into simulated annealing to develop effective memristive optimizer hardware for combinatorial and continuous function optimization [84].
Besides harnessing intrinsic noise and chaotic annealing, a weight annealing strategy has been proposed to overcome the local minima problem in memristive Hopfield neural networks. By initializing all synaptic weights to zero and then gradually recovering the weights, convergence to the global minimum can be achieved more quickly because the network stays near its ground state during the whole process. Weight annealing is implemented with a 20 × 20 crossbar array of TiOx memristors and mixed-signal circuits (figure 5(d)). The pre-synaptic drivers are used to scale the synaptic weights, which is the key to weight annealing, as shown in figure 5(e). Numerical simulations show that the weight annealing strategy leads to better results than chaotic or stochastic annealing for several classic combinatorial problems (figure 5(f)) [85].
In addition to solving combinatorial optimization problems, the Hopfield neural network provides associative memory, another cognitive function, which allows the system to reconstruct complete data from a partial presentation of those data. In contrast to optimization, trapping in local minima (attractors) is favorable for associative memory because the attractors represent the stored information. Single- and multi-associative memory has been experimentally prototyped with HfOx memristors [87]. The inputs of the computing system, encoded into voltage pulses, are also used as the outputs to train the analog memory states of the memristors. Different features are thus stored in the network as sets of parameter values used for recall, also known as the 'synaptic weights'. A power-efficient associative memory implementation has been demonstrated with a Y-flash memristive crossbar array. Each cell of the array consists of a floating-gate two-terminal n-channel metal-oxide-semiconductor transistor (figure 5(g)). As shown in figure 5(h), a 3 bit content-addressable memory (CAM) Hopfield network is implemented with 12 Y-flash arrays as memristive synapses integrated with three decision-making neurons. The resulting hardware performs single- and multi-associative memory (figure 5(i)), and minimal power dissipation is achieved in a winner-take-all network configuration [86]. Thanks to the power-efficient Y-flash memristive device, the memristive network consumes only 3.6 μW in a testing epoch.
In the memristive associative memory systems discussed above, the attractors of the Hopfield network are discrete. However, network attractors can be made continuous by introducing a translationally invariant, bell-shaped connectivity model. The continuous attractor is considered to be the mechanism of working memory, namely the ability of the brain to store and maintain pertinent information and to apply it in subsequent higher-level computations. By simulating a continuous attractor in a memristor-based Hopfield network, working memory has been achieved with memristive devices [88], demonstrating the great potential of memristive devices for highly efficient computing hardware for cognitive, unlabeled data processing AI tasks.

Restricted Boltzmann machine
As an unsupervised learning model, the restricted Boltzmann machine (RBM) is regarded as a generalized version of the Hopfield network. The RBM is a simple neural network consisting of a hidden layer and a visible layer. Each node of the hidden layer is connected to every node of the visible layer, and connections within a layer are forbidden. Just like the Hopfield network, the evolution of the RBM states is determined by an energy function, as described in equation (6):

$$E(v, h) = -\sum_{i} c_i\, v_i - \sum_{j} d_j\, h_j - \sum_{i} \sum_{j} v_i\, \omega_{ij}\, h_j \qquad (6)$$

where c_i and d_j are the bias weights and ω_ij represents the synaptic weight connecting the nodes v_i and h_j. Unlike the original Hopfield network, the RBM evolves stochastically. The distribution over the states of the RBM is given by a Boltzmann-Gibbs distribution:

$$P(v, h) = \frac{1}{Z}\, e^{-E(v, h)}$$

However, it is difficult to calculate the value of the normalizing factor Z exactly because the number of terms scales exponentially. Fortunately, Bayes' rule shows that the hidden nodes are conditionally independent given the values of the visible nodes (and vice versa). The conditional distributions of the RBM states can therefore be expressed as:

$$P(h_j = 1 \mid v) = \sigma\Big( d_j + \sum_{i} \omega_{ij}\, v_i \Big) \qquad (7)$$

$$P(v_i = 1 \mid h) = \sigma\Big( c_i + \sum_{j} \omega_{ij}\, h_j \Big) \qquad (8)$$

where σ(·) represents an activation function. RBMs are universal approximators of discrete distributions. After learning, an unknown data distribution can be represented, with the minima of the energy function corresponding to samples of the unknown data set. RBMs have thus found a variety of applications in statistics, information theory [89], machine learning [90], and even physics [91]. CMOS hardware implementations of RBMs become impractical when scaled up because of the storage and all-to-all communication required by the RBM computation [95,96]. An efficient RBM system is needed to train brain-scale neural network applications. Memristive crossbars provide a good solution by combining the storage of neuron outputs and the computation-intensive VMMs within the same stage cycle [97,98]. As shown in figure 6(a), an RBM is implemented using a 20 × 20 Pt/Al2O3/TiO2−x/Pt memristor crossbar array. By exploiting the intrinsic and extrinsic current fluctuations of the memristive devices, the stochastic dot-product computation given by equations (7) and (8) is performed efficiently. This is the most intensive computation during inference and training of the RBM, as shown in figure 6(b). With simulated annealing, the RBM becomes a fast and practical method for solving combinatorial optimization problems. As an example, shown in figure 6(c), the memristive RBM computing system quickly converges to thermal equilibrium and finds the solution in only 500 epochs [92].
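A software sketch of the stochastic dot-product step may help to make the sampling concrete. It assumes the standard RBM forms, with energy E(v, h) = −c·v − d·h − v^T ω h, a logistic sigmoid as the activation function, and a pseudo-random number generator in place of device noise. The array sizes and function names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def rbm_energy(v, h, w, c, d):
    """E(v, h) = -c.v - d.h - v^T w h, with w of shape (visible, hidden)."""
    return -(c @ v) - (d @ h) - v @ w @ h

def sample_hidden(v, w, d, rng):
    p = sigmoid(d + v @ w)  # P(h_j = 1 | v): one dot product per hidden node
    return (rng.random(p.shape) < p).astype(float), p

def sample_visible(h, w, c, rng):
    p = sigmoid(c + w @ h)  # P(v_i = 1 | h): dot product with the transpose
    return (rng.random(p.shape) < p).astype(float), p

def gibbs_step(v, w, c, d, rng):
    """One visible -> hidden -> visible sampling step."""
    h, _ = sample_hidden(v, w, d, rng)
    v_new, _ = sample_visible(h, w, c, rng)
    return v_new, h

rng = np.random.default_rng(0)
w = rng.normal(scale=0.1, size=(4, 3))  # 4 visible and 3 hidden nodes
c = np.zeros(4)
d = np.zeros(3)
v0 = np.array([1.0, 0.0, 1.0, 0.0])
v1, h1 = gibbs_step(v0, w, c, d, rng)
```

In the crossbar implementation described above, the comparison against a random number is replaced by the current fluctuations of the devices themselves, so the sampling comes essentially for free with the VMM.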
Besides optimization problems, memristive devices with intrinsic fluctuations and stochasticity can be used for pattern classification tasks by constructing a fuzzy RBM network. The weight states in a fuzzy RBM network are fuzzified following Gaussian distributions (figure 6(d)), which can be implemented naturally by stochastic Pt/TaOx/Ta memristors. The device stochasticity originates from the cycle-to-cycle and device-to-device variations of the internal resistance states (figure 6(e)). Compared with other RBM networks, the fuzzy RBM network achieves a lower error rate in handwritten digit recognition on the MNIST database, demonstrating its enhanced tolerance to device stochasticity (figure 6(f)) [93].
Using RBMs as building blocks for feature extraction makes it much easier to train deeper models, such as deep belief networks [89]. A discriminative RBM uses both the input pictures and the associated labels as visible neurons, which are learned jointly. This architecture is believed to be more powerful in feature extraction than the simple RBM [99]. By using the conductances of two memristive devices to represent each synaptic weight (figures 6(g) and (h)), a discriminative RBM and other RBM-based networks have been simulated and compared. The results reveal that the discriminative RBM outperforms other RBM-based structures when devices with typical parameters are used (figure 6(i)), suggesting the great potential of memristive RBM architectures for on-chip unlabeled data processing [94].
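Representing one weight with two devices is commonly done with a differential pair, in which the signed weight is proportional to the difference of two non-negative conductances. The sketch below shows one such mapping. The conductance window, the linear scaling, and the function names are assumptions for illustration and are not taken from [94].

```python
import numpy as np

def weights_to_conductance_pair(w, g_min=1e-6, g_max=1e-4):
    """Map signed weights (not all zero) to two conductances in [g_min, g_max]."""
    w = np.asarray(w, dtype=float)
    scale = (g_max - g_min) / np.max(np.abs(w))     # siemens per unit weight
    g_pos = g_min + scale * np.clip(w, 0.0, None)   # carries positive weights
    g_neg = g_min + scale * np.clip(-w, 0.0, None)  # carries negative weights
    return g_pos, g_neg, scale

def conductance_pair_to_weights(g_pos, g_neg, scale):
    """Recover the signed weights from the conductance difference."""
    return (g_pos - g_neg) / scale

w = np.array([[0.5, -1.0],
              [0.0, 0.25]])
g_pos, g_neg, scale = weights_to_conductance_pair(w)
w_back = conductance_pair_to_weights(g_pos, g_neg, scale)
```

Because both devices share the same offset `g_min`, the offset cancels in the difference, and a signed VMM can be obtained by subtracting the currents of the two columns.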

Autoencoder
The autoencoder is another generative model developed to produce representations. By using the input data as the teacher, autoencoders transform inputs into outputs with minimal distortion. They can therefore extract the information needed for unlabeled data processing problems, such as video recognition or anomalous data detection. The autoencoder is also a biologically plausible mechanism: adopting an autoencoder model has helped to explain the behavior of the sensory areas in the human visual cortex. Furthermore, autoencoders play a critical role in deep learning architectures, in which bottom-up stacked autoencoder layers are frequently followed by a supervised learning top layer [21].
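The idea of using the input as the teacher can be sketched with a small linear autoencoder trained by gradient descent, where the reconstruction error against the input itself is the only training signal. This is a generic, non-spiking software illustration, not the memristive or STDP-based implementations discussed in this section. The data, layer sizes, and learning rate are hypothetical.

```python
import numpy as np

def train_autoencoder(x, n_hidden, lr=0.01, n_steps=500, seed=0):
    rng = np.random.default_rng(seed)
    n, d = x.shape
    w_enc = rng.normal(scale=0.1, size=(d, n_hidden))  # encoder weights
    w_dec = rng.normal(scale=0.1, size=(n_hidden, d))  # decoder weights
    losses = []
    for _ in range(n_steps):
        code = x @ w_enc                 # compressed representation
        err = code @ w_dec - x           # the input itself is the target
        losses.append(np.mean(np.sum(err ** 2, axis=1)))
        # Gradients of the mean squared reconstruction error.
        grad_dec = 2.0 / n * code.T @ err
        grad_enc = 2.0 / n * x.T @ (err @ w_dec.T)
        w_enc -= lr * grad_enc
        w_dec -= lr * grad_dec
    return w_enc, w_dec, losses

# Hypothetical 4D data that actually lives on a 2D subspace.
rng = np.random.default_rng(1)
latent = rng.normal(size=(200, 2))
x = latent @ rng.normal(size=(2, 4))
w_enc, w_dec, losses = train_autoencoder(x, n_hidden=2)
```

No labels appear anywhere in the loop, which is what makes the model suitable for unlabeled data. Both matrix products in the forward pass are VMMs of the kind a crossbar can accelerate.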
Similar to other ANNs, autoencoders can be implemented by memristive device-based systems with significantly lower power consumption than conventional approaches. A memristive autoencoder computing system has been designed for real-time intrusion detection and anomaly detection. Benchmark results show that the system has an overall detection accuracy of 92.91% and a malicious packet detection accuracy of 98.89% [100]. Autoencoder hardware has also been implemented with Cu:ZnO/Nb:STO memristors to denoise sample images from the MNIST dataset [101]. By modulating the conductance of the memristive device, the spike-timing-dependent plasticity (STDP) learning rule is obtained. Employing this learning rule, a spike-based autoencoder learns the features of the original image and is tested with images to which different kinds of noise have been added. Since autoencoders have the ability to precisely encode the input features, the output for the tested images is denoised. The results show that the recovered images achieve high accuracy even with a considerable amount of noise, indicating that memristive devices can play an important role in autoencoder-based unlabeled image processing and facial recognition systems.

Figure 7. GAN structure. G generates fake samples from a uniform noise distribution. D is a discriminator that maximizes the probability of assigning the correct label to both fake samples and input data. © 2020 IEEE. Reprinted, with permission, from [102].

Generative adversarial network
A generative adversarial network (GAN) is a type of unsupervised learning model that can extract feature representations and generate new patterns from unlabeled data. In general, a GAN is a complex architecture consisting of two independent neural networks that learn competitively. In detail, as shown in figure 7, a generator (G) learns the real data distribution and generates new data from uniform random input noise to deceive a discriminator (D). The discriminator acts as a detective that distinguishes fake samples produced by G from real samples in the training data. G and D are trained together and influence each other. The backbone networks used to construct G and D are still convolutional neural networks (CNNs) [102].
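The competition between G and D is usually formalized as a two-player minimax game. The objective below is the standard formulation from the GAN literature, given here for orientation. The symbols p_data (distribution of the real data) and p_z (distribution of the input noise) are introduced for this illustration and are not defined elsewhere in this review.

```latex
\min_{G}\,\max_{D}\; V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_{z}(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

D is trained to increase V by assigning high probability to real samples and low probability to generated ones, while G is trained to decrease V by making D(G(z)) large.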
Training and testing a GAN involve large numbers of forward and backward passes, which are more sophisticated than those of deep neural networks. GANs are therefore computationally complex and require significant computational resources, and implementations on conventional hardware are slow and energy inefficient, even on GPUs and distributed computational networks. Realizing an efficient accelerator for GAN is thus necessary but challenging, especially for time-constrained object tracking applications and energy- and area-constrained IoT edge devices. Thanks to their superior properties, various memristors and their composite circuits are excellent candidates for implementing efficient GANs. For example, a novel circuit based on a spintronic memristor crossbar is proposed for the acceleration of CNN operations (figure 8(a)) [103]. Based on this memristor-based CNN unit, a GAN architecture is developed to solve the single image super-resolution problem (figure 8(b)). As mentioned above, WOx memristors are compatible with the 180 nm logic process, and integrated memristor/CMOS chips can be fabricated in a BEOL process by building the memristor crossbars on top of the CMOS circuits [69,104]. On this basis, hardware named analog memristive deep convolutional GAN (AM-DCGAN) is designed to accelerate the intensive neural computations of GAN [105]. By mapping weights to the conductance of WOx memristors, convolution, deconvolution, and mean-pooling filters are realized by memristive crossbars (figure 8(c)). System- and circuit-level simulation results show that the minimum average power consumption per neural computation is only 47 nW. They also show that cell-to-cell variability of the memristors adversely affects the performance of the generator, and image quality degrades once the standard deviation of the process variation exceeds 0.08 (figure 8(d)). Overall, the AM-DCGAN system efficiently achieves high-quality image generation on the MNIST database.
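The weight-to-conductance mapping and the effect of cell-to-cell variability can be illustrated with a minimal sketch. It assumes a differential pair of conductances per signed weight and a multiplicative Gaussian variation model; the conductance range, the function names, and the example values are illustrative assumptions rather than the AM-DCGAN design of [105]. The value 0.08 is borrowed from the degradation threshold quoted above.

```python
import random

def weights_to_conductances(weights, g_min=1e-6, g_max=1e-4):
    """Map each signed weight to a (G_plus, G_minus) pair in siemens (assumed range)."""
    w_max = max(abs(w) for row in weights for w in row) or 1.0
    scale = (g_max - g_min) / w_max
    g_plus = [[g_min + scale * max(w, 0.0) for w in row] for row in weights]
    g_minus = [[g_min + scale * max(-w, 0.0) for w in row] for row in weights]
    return g_plus, g_minus, scale

def add_variation(g, sigma, rng):
    """Multiply every conductance by a Gaussian factor with standard deviation sigma."""
    return [[c * (1.0 + rng.gauss(0.0, sigma)) for c in row] for row in g]

def crossbar_vmm(voltages, g_plus, g_minus, scale):
    """Column currents I_j = sum_i V_i * (G+_ij - G-_ij), rescaled to weight units."""
    n_cols = len(g_plus[0])
    out = []
    for j in range(n_cols):
        i_j = sum(v * (g_plus[i][j] - g_minus[i][j]) for i, v in enumerate(voltages))
        out.append(i_j / scale)
    return out

weights = [[0.5, -1.0], [0.25, 0.75], [-0.5, 0.0]]  # rows = inputs, columns = outputs
voltages = [0.2, 0.1, 0.3]

g_plus, g_minus, scale = weights_to_conductances(weights)
ideal = crossbar_vmm(voltages, g_plus, g_minus, scale)

rng = random.Random(0)
noisy = crossbar_vmm(voltages,
                     add_variation(g_plus, 0.08, rng),
                     add_variation(g_minus, 0.08, rng),
                     scale)
print(ideal)  # matches the software product of voltages and weights
print(noisy)  # deviates from it because of the simulated variability
```

In this toy model the ideal crossbar reproduces the software vector-matrix product, while the perturbed one shows how device variation turns directly into computation error.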
Software compression techniques for CNNs, such as quantization and pruning, can be used to accelerate memristive GAN architectures, since the G and D units are still conventional CNNs. By binarizing the weights of several CNN layers within both G and D of the traditional deep convolutional GAN (DCGAN), an approximate GAN (ApGAN) algorithm and its hardware implementation for accelerating GANs are proposed [102]. As shown in figure 8(e), the ApGAN accelerator consists of image and kernel sub-arrays, an external processing unit, and memristor-based computational sub-arrays. Each sub-array is composed of a row decoder, a column decoder, and sense circuitry, with 512 rows and 256 columns per memory array. Ag-Si memristor device parameters are used in the SPICE models for circuit-level simulation of the memristive sub-array. The SPICE models of the memristors and CMOS transistors are then combined for system-level evaluations. As shown in figure 8(f), the images generated by the ApGAN architectures look similar to those from the full-precision DCGAN. As shown in figure 8(g), the ApGAN architecture achieves better energy efficiency and throughput than traditional CMOS accelerators owing to its approximate quantization and the parallel, energy-efficient operations of the memristive sub-arrays [102].
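As a rough illustration of weight binarization, the sketch below replaces a kernel by its signs and a single scaling factor equal to the mean absolute weight. This particular scheme is an assumption chosen for simplicity; the exact binarization used in ApGAN [102] may differ, and the function names and kernel values are hypothetical.

```python
def binarize(weights):
    """Return (alpha, signs) with alpha = mean(|w|) and signs in {-1, +1}."""
    alpha = sum(abs(w) for w in weights) / len(weights)
    signs = [1 if w >= 0 else -1 for w in weights]
    return alpha, signs

def binary_dot(inputs, alpha, signs):
    """Dot product that needs only additions, subtractions, and one multiplication."""
    return alpha * sum(x if s > 0 else -x for x, s in zip(inputs, signs))

kernel = [0.4, -0.2, 0.1, -0.5]   # hypothetical full-precision kernel
x = [1.0, 2.0, 3.0, 4.0]          # hypothetical input patch

alpha, signs = binarize(kernel)
approx = binary_dot(x, alpha, signs)
exact = sum(w * xi for w, xi in zip(kernel, x))
print(alpha, signs)
print(approx, exact)
```

The binary result differs from the full-precision one, which is the accuracy cost that an approximate accelerator trades for cheaper two-state storage and simpler arithmetic.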

Spiking neural network
Biological synapses and neurons use discrete spikes to transmit and process information [106]. Inspired by biological neurons, spiking neural networks (SNNs) are developed as the third generation of artificial neuron models. Compared with second-generation DNNs in which neurons communicate using single, static, continuous-valued activations, SNNs bring superior accuracy because of the added temporal dimension and intrinsic sensitivity to the unstructured temporal data. This mechanism is prevalent in the biological neural system [107].
The lack of effective training methods is a challenge for SNNs because spikes are discrete in both the time and space domains. The spiking neuron transfer function is non-differentiable, which prevents the use of the widely used backpropagation algorithm. For now, only a few biologically plausible local learning rules have been developed to train SNNs. These unsupervised learning rules describe two types of synaptic plasticity: long-term plasticity and short-term plasticity [108][109][110]. Long-term plasticity includes long-term potentiation (LTP) and long-term depression (LTD), spike timing dependent plasticity (STDP), and spike rate dependent plasticity (SRDP), while short-term plasticity includes short-term potentiation (STP), paired-pulse facilitation (PPF), and paired-pulse depression (PPD). These mechanisms were originally discovered in neuroscience and are thought to underlie complex learning and memory functions in the human brain [111].

Long-term plasticity
STDP is the most commonly adopted learning mechanism for unsupervised training of SNNs. The interpretation of biological STDP is intuitive: when the pre-neuron fires shortly before the post-neuron, the synaptic weight is strengthened (LTP); when the pre-neuron fires shortly after the post-neuron, the synaptic weight is weakened (LTD) [59,108]. Although standard CMOS elements have been used to implement STDP, memristive devices are of high interest because an analog of the basic STDP mechanism is found in memristors [112][113][114][115][116][117][118].
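The timing dependence described above is often modeled with an exponential pair-based window. The following sketch uses that textbook form; the amplitudes, time constants, and weight bounds are illustrative assumptions, not parameters of any device cited here.

```python
import math

def stdp_delta_w(dt, a_plus=0.01, a_minus=0.012, tau_plus=20.0, tau_minus=20.0):
    """Weight change for dt = t_post - t_pre (ms).

    dt > 0 (pre before post) gives potentiation (LTP);
    dt < 0 (pre after post) gives depression (LTD).
    """
    if dt > 0:
        return a_plus * math.exp(-dt / tau_plus)
    if dt < 0:
        return -a_minus * math.exp(dt / tau_minus)
    return 0.0

def apply_stdp(w, pre_spikes, post_spikes, w_min=0.0, w_max=1.0):
    """Accumulate the pairwise updates and clip the weight to [w_min, w_max]."""
    for t_pre in pre_spikes:
        for t_post in post_spikes:
            w += stdp_delta_w(t_post - t_pre)
    return min(w_max, max(w_min, w))

print(apply_stdp(0.5, pre_spikes=[10.0], post_spikes=[15.0]))  # weight increases
print(apply_stdp(0.5, pre_spikes=[15.0], post_spikes=[10.0]))  # weight decreases
```

The update shrinks as the spikes move further apart in time, so only closely timed spike pairs change the weight appreciably.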
It is important to point out that hardware implementation strategies for STDP differ considerably depending on the conductance switching dynamics of the memristive devices. As shown in figure 9(a), a ferroelectric tunnel junction (FTJ) is fabricated using BiFeO3 (BFO) as the ferroelectric tunnel barrier, sandwiched between a Pt/Co top electrode and a (Ca,Ce)MnO3 bottom electrode [119]. Applying voltage pulses to the FTJ changes the area of the ferroelectric domains, which switches the junction conductance. As shown in figure 9(b), when pre- and post-neuron spikes are applied to the FTJ close together in time, the voltage across the device temporarily exceeds the threshold, and the FTJ conductance increases or decreases depending on the sign of the time delay. An SNN is simulated with a crossbar of 9 × 5 FTJs, and successful unsupervised pattern recognition is demonstrated, as shown in figure 9(c).
Unlike FTJs, in which continuous conductance tuning is achievable through polarization switching of ferroelectric domains under applied voltage pulses, PCM has asymmetric phase-change dynamics that make the implementation of STDP a challenge [120,121]. Usually, the crystallization and amorphization dynamics of Ge2Sb2Te5 (GST) are used to realize synaptic potentiation and depression, respectively. However, unlike crystallization, amorphization does not allow a gradual update of the cell state, and it requires high energy (figure 9(d)) [122]. To overcome these problems, a simplified, power-efficient asymmetric STDP learning rule is proposed that uses a differential configuration of two cells, in which a synaptic weight decrease is achieved by increasing the conductance of a negatively contributing second cell (figure 9(e)). Unsupervised learning and the detection of multiple temporal correlations in parallel input streams are demonstrated in an SNN architecture implemented with phase-change memristors (figure 9(f)) [123].
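The differential two-cell idea can be sketched as follows. The class name, the normalized conductance range, and the fixed step size are hypothetical; the sketch only shows that both potentiation and depression can be built from gradual conductance increases, and it ignores what happens once a cell saturates.

```python
class DifferentialPCMSynapse:
    """Synapse made of two cells; the weight is G_plus - G_minus (normalized units)."""

    def __init__(self, g_min=0.0, g_max=1.0, step=0.1):
        self.g_max = g_max
        self.step = step
        self.g_plus = g_min
        self.g_minus = g_min

    def _crystallize(self, g):
        # Gradual conductance increase, the only update used in this scheme.
        return min(self.g_max, g + self.step)

    def potentiate(self):
        self.g_plus = self._crystallize(self.g_plus)

    def depress(self):
        # A weight decrease is an increase of the negatively contributing cell.
        self.g_minus = self._crystallize(self.g_minus)

    @property
    def weight(self):
        return self.g_plus - self.g_minus

synapse = DifferentialPCMSynapse()
synapse.potentiate()
synapse.potentiate()
synapse.depress()
print(synapse.g_plus, synapse.g_minus, synapse.weight)
```

Neither cell is ever amorphized during learning in this sketch, which mirrors the motivation given above for avoiding the abrupt, energy-hungry amorphization step.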
In conductive bridging random access memory (CBRAM) devices, the resistance state changes through the formation and rupture of a conductive filament, a process determined by the polarity and amplitude of the applied voltage. Thus, to implement STDP, a 1T1R array structure is adopted to provide fine control of individual CBRAM devices. Subquantum CBRAM cells, whose filaments consist of a semiconductor or semimetal, offer low programming energy (about 0.2 pJ) and excellent filament stability and are fabricated in a standard 130 nm CMOS process (figure 9(g)). As shown in figure 9(h), the 1T1R CBRAM cells provide linearly increasing and decreasing device conductance, allowing fine synaptic weight updates (LTP/LTD) for SNN training, and high recognition accuracy is achieved in MNIST digit classification (figure 9(i)) [32].
Besides the standard STDP mechanism, other local training rules such as stochastic simplified STDP [124][125][126] and SRDP [127,128] can be implemented naturally with memristors. For example, the WO3−x device (figure 9(j)) is a second-order memristor with delicate dynamics originating from temporary effects such as the electric double-layer capacitance [129]. If a second spike reaches the device before the excitatory postsynaptic current (EPSC) of the first spike has completely disappeared, the diffusion of oxygen ions is suppressed. The oxygen ions then accumulate at the Pt/WO3−x interface and cause a greater resistance change. As shown in figure 9(k), higher spike rates lead to a much larger EPSC amplitude. This behavior of the WO3−x device is similar to SRDP in biological synapses. Furthermore, the Bienenstock-Cooper-Munro (BCM) learning rule, a mechanism for complex spatiotemporal pattern recognition in the human visual cortex such as rate-based orientation selectivity, can be implemented with second-order WO3−x devices [130]. A BCM rule-based SNN model proved to be very effective in unsupervised learning of spatiotemporal patterns of bars with different orientations (figure 9(l)) [131].
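For readers unfamiliar with the BCM rule, the sketch below shows its usual rate-based textbook form, in which the sign of the weight change depends on whether the postsynaptic activity exceeds a sliding threshold. The learning rate, threshold time constant, and example values are assumptions; this is not a model of the WO3−x device dynamics in [130,131].

```python
def bcm_step(weights, rates, theta, eta=0.01, tau_theta=10.0):
    """One BCM update.

    y = sum_i w_i * x_i is the postsynaptic activity.
    dw_i = eta * x_i * y * (y - theta): potentiation if y > theta, depression if y < theta.
    theta slides toward y**2, which stabilizes learning.
    """
    y = sum(w * x for w, x in zip(weights, rates))
    new_weights = [w + eta * x * y * (y - theta) for w, x in zip(weights, rates)]
    new_theta = theta + (y * y - theta) / tau_theta
    return new_weights, new_theta, y

w = [0.5, 0.5]
x = [1.0, 0.0]  # only the first input is active

w_up, theta_up, y = bcm_step(w, x, theta=0.2)   # y = 0.5 is above the threshold
w_down, _, _ = bcm_step(w, x, theta=0.8)        # y = 0.5 is below the threshold
print(w_up, theta_up)
print(w_down)
```

The same input potentiates or depresses the active synapse depending on the threshold, and the weight of the silent input is unchanged.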

Short-term plasticity
The other type of synaptic plasticity, short-term plasticity, is also essential for SNNs because it provides much richer dynamics. For example, PPF, a common form of short-term plasticity, was reported to be crucial for complex decoding and processing of temporal information in the human brain, such as memory and forgetting [142,143]. Thus, to develop more powerful memristive SNN hardware for unlabeled data processing, considerable effort has been devoted to implementing short-term plasticity in memristive devices.
An artificial synapse with short-term plasticity, low programming voltage (∼0.2 V), and high precision (1024 states) is realized by a three-terminal memristor with a LixWO3 channel and self-gate (figure 10(a)) [144]. As shown in figure 10(b), typical PPF behavior is observed, in which the conductance change in response to a pair of voltage pulses is determined by the time delay between the two pulses. Based on the three-terminal LixWO3 memristor, an SNN model was built that reached a 128× improvement in class separability over an SNN with no-STP synapses in a classification benchmark test (figure 10(c)). The LixWO3 memristor demonstrates only one kind of short-term plasticity. However, multiple forms of short-term plasticity can be realized in a single device. As shown in figure 10(d), the SiOxNy:Ag memristor is composed of a SiOxNy dielectric layer embedded with Ag nanoclusters and inert electrodes [145]. The Ag dynamics of the SiOxNy:Ag memristor are revealed by in situ high-resolution TEM observation and simulation. These dynamics closely resemble the Ca2+ behavior in biological synapses, which allows the realization of multiple short-term plasticity phenomena such as PPF, PPD, and PPD following PPF, as shown in figures 10(e) and (f). Recently, memristive devices with short-term plasticity have proved to be very efficient for spatio-temporal computing, such as recognition of spatio-temporal patterns with a reduced training cost [146] and real-time motion detection [147].
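The dependence of PPF on the pulse interval is frequently fitted with a double-exponential decay. The sketch below uses that phenomenological form with made-up coefficients; it is an assumption for illustration and does not reproduce the measured curves of the devices in [144,145].

```python
import math

def ppf_ratio(dt_ms, c1=0.6, tau1=20.0, c2=0.3, tau2=200.0):
    """Ratio A2/A1 of the second to the first response for a pulse interval dt_ms.

    A fast term (c1, tau1) and a slow term (c2, tau2) both decay with the interval,
    so facilitation is strongest for closely spaced pulses and vanishes for long gaps.
    """
    return 1.0 + c1 * math.exp(-dt_ms / tau1) + c2 * math.exp(-dt_ms / tau2)

for dt in (10.0, 100.0, 1000.0):
    print(dt, round(ppf_ratio(dt), 3))
```

A ratio close to 1 at long intervals means the device has relaxed back to its resting state, which is the volatility that distinguishes short-term from long-term plasticity.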
Long-term plasticity and short-term plasticity are closely related. As mentioned before, both are indispensable in the physiological processes of cognitive function [148,149]. Similarly, by combining long-term and short-term plasticity at the device level, a memristive computing system can gain new features that cannot be reached with only one type of synaptic plasticity. As shown in figure 10(g), a four-terminal PCM cell consists of Ge15Sb85 on Al2O3 and SiO2 dielectrics [150]. Voltage pulses applied to the source-drain terminals program the non-volatile conductance states of the device by Joule heating, while pulses applied to the gate-drain terminals induce a transient modulation of the conductance states. As shown in figure 10(h), long-term plasticity and short-term plasticity are represented by analog non-volatile conductance states and volatile conductance states, respectively. Device-level hardware integration of tunable long-term and short-term plasticity in a single artificial synapse is thereby realized. Moreover, sequential learning is demonstrated using this synapse with rich dynamics, whereas a synapse without short-term plasticity is unable to complete the sequential learning task.

Summary of the memristive computing hardware for unlabeled data processing
Various kinds of memristive devices can be used to realize unlabeled data processing systems, including classic machine learning algorithms, dynamic neural networks, generative models, and SNNs. Table 1 summarizes and compares the performance of these memristive unlabeled data processing systems. Although memristive computing systems have shown great success in simple unlabeled data processing, this promising technology is still in its infancy. Only a few devices, such as WOx memristors and PCM, have been implemented in full-custom integrated circuits, and the crossbar sizes are not yet large enough. Many of these systems are simulation results or simple prototypes. Therefore, the realization of practical memristive unlabeled data processing systems faces many obstacles, especially for data-intensive tasks.

Table 1 (partial; the column headers and some cells are missing from the extracted text).
(missing) | Analog | 60 × 60 | Y | NP-hard max-cut problems (not provided)
TiOx memristor [85] | Analog | 20 × 20 | Y | Graph partitioning problem and maximum-weight independent set problem (not provided)
Y-flash cell [86] | Analog | (missing)
(missing) | Analog | 5 × 9 | N | Simple pattern recognition (100%)
PCM [123] | Analog | 2 × 2 M cells | Y | Learning temporal correlations spatio-temporal patterns (not provided)
CBRAM [32] | Analog | 512 kbit | Y | Unsupervised learning for MNIST (82%)
Pt/WO3−x/W memristor [131] | Analog | 9 × 9 | N | Rate-based orientation selectivity (100%)

Firstly, the main engineering obstacle is non-ideal effects, including device defects, device-to-device variation, cycle-to-cycle variation, and parasitics [151,152]. Using non-ideal memristive devices has several serious consequences, such as large latency, a drop in resolution and accuracy, and non-convergence [153]. Compensating for these non-ideal effects is crucial to the implementation of unlabeled data processing, especially for realizing neural network-based models with memristor crossbars, and many strategies have been proposed. For example, device defects can be compensated by using a defect map in the training process [151]. Fine programming methods such as 'write-verify' can be used to suppress the variations [154]. Parasitics cause errors in currents and voltages as well as RC delay, so correction circuits are usually added to compensate for them [155]. Recently, a 'variation- and defect-aware training' strategy was proposed to compensate for variation and defects; it significantly reduces the drop in inference accuracy even under severe variations [156]. Nevertheless, some non-ideal effects within a proper range, such as the intrinsic resistance drift of PCM [157], the fluctuations and stochasticity of RRAM [93], and the intrinsic hardware noise of memristors [83], can be used to improve the accuracy of the computing system.
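The 'write-verify' idea mentioned above can be sketched as a closed loop that pulses a cell, reads it back, and repeats until the read value is close enough to the target. The toy device model (a fixed nominal step with Gaussian cycle-to-cycle variation), the class name, and all numerical values are assumptions for illustration, not the programming scheme of [154].

```python
import random

class SimulatedCell:
    """Toy memristor whose conductance moves by a noisy step per pulse (assumed model)."""

    def __init__(self, g=0.0, step=0.01, noise=0.3, seed=0):
        self.g = g
        self.step = step
        self.noise = noise
        self.rng = random.Random(seed)

    def pulse(self, direction):
        # direction = +1 raises and -1 lowers the conductance; the step size
        # varies from cycle to cycle but is kept positive.
        factor = max(0.1, 1.0 + self.rng.gauss(0.0, self.noise))
        self.g += direction * self.step * factor

    def read(self):
        return self.g

def write_verify(cell, target, tolerance=0.02, max_pulses=500):
    """Pulse, read back, and repeat until the read value is within tolerance.

    Returns the number of pulses applied (max_pulses if the target was not reached).
    """
    for n_pulses in range(max_pulses):
        error = target - cell.read()
        if abs(error) <= tolerance:
            return n_pulses
        cell.pulse(1 if error > 0 else -1)
    return max_pulses

cell = SimulatedCell(seed=1)
pulses_used = write_verify(cell, target=0.5)
print(pulses_used, cell.read())
```

Because every pulse is followed by a read, the final error is bounded by the tolerance rather than by the accumulated programming noise, at the cost of extra programming time.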
Secondly, IR drop is another critical challenge for memristor crossbars on the way to highly efficient unlabeled data processing systems. The IR drop problem, caused by the limited electrical conductivity of the metal wire connections, constrains the scalability of these systems and limits the degree of parallelism in computing [158,159]. IR drop can be suppressed, for instance through compensation techniques [160] and new CMOS-memristor hybrid cell designs [161]. Nevertheless, it still needs special attention when designing memristive unlabeled data processing systems, especially for data-intensive tasks.
Thirdly, in a memristor crossbar without selectors, the high voltage applied during the initial forming process can damage other devices in the same array. To fully exploit the benefits of memristive devices over traditional CMOS devices for advanced computing systems, electroforming-free memristive devices are urgently needed. The key parameters of the forming process are the electroforming voltage, the electroforming time, and the presence and quality of switching behavior after electroforming. These parameters depend strongly on the insulating material as well as the deposition method [140]. Electroforming-free operation has been achieved in memristive tunneling junctions [162] and TaOx-based memristors [67], but still needs to be developed for many other memristive devices.
Last but not least, not all as-prepared memristive devices can be used for all the proposed models and systems. To overcome this obstacle, considerable effort has been dedicated at both the device level and the circuit level. For example, many memristive devices do not have enough internal states to ensure gradual, continuous conductance modulation under successive voltage signals. These devices are therefore unfit for realizing complex synaptic plasticity, let alone SNNs [163]. This problem can be solved by introducing a buffer layer such as AlOx to reduce current overshoot [164] or an electro-thermal modulation layer to control the electric field and temperature of the switching layer [165]. In addition, memristors have nonlinear I-V characteristics, which are an obstacle when performing VMMs in the current/voltage domain. This restriction can be bypassed by developing computing systems that support charge-domain operation with pulse width modulation and custom ADCs [71,166].
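The charge-domain approach can be illustrated with a short sketch in which inputs are encoded as pulse widths at a fixed read voltage, so each device is always read at the same operating point and its I-V nonlinearity does not enter the result. The read voltage, maximum pulse width, function name, and conductance values are hypothetical, and the ADC stage is omitted.

```python
def charge_domain_vmm(inputs, conductances, v_read=0.2, t_max=1e-6):
    """Charge collected on each column: Q_j = sum_i V_read * G_ij * t_i.

    inputs are normalized to [0, 1] and encoded as pulse widths t_i = x_i * t_max (s);
    conductances are in siemens, so the returned charges are in coulombs.
    """
    n_cols = len(conductances[0])
    charges = [0.0] * n_cols
    for x, row in zip(inputs, conductances):
        width = x * t_max
        for j, g in enumerate(row):
            charges[j] += v_read * g * width
    return charges

g_matrix = [[1e-5, 2e-5],   # rows = inputs, columns = outputs
            [3e-5, 4e-5]]
q = charge_domain_vmm([0.5, 1.0], g_matrix)
print(q)
```

The accumulated charge is linear in both the input and the conductance even when the device current is not linear in voltage, because the voltage amplitude never changes.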

Conclusion and prospects
In a nutshell, the integration of memristive devices and computing systems for unlabeled data can revolutionize classification, optimization, and neuromorphic computing research thanks to the great improvement in energy efficiency and computation speed. To promote the development of this exciting research topic, researchers in different fields, such as AI, computer science, signal processing, circuits and microelectronics, and materials, should work together to make significant improvements and overcome the challenges that span the layers of devices, circuits, microarchitectures, systems, and software. Below we highlight a few points about the latest trends.
As mentioned before, memristive devices have large fluctuations in their parameters due to non-ideal effects. These fluctuations dramatically reduce yield, which is crucial for commercial application [167]. Many strategies have been proposed to improve the compatibility of memristive devices with the CMOS process [168][169][170]. Continued material and process optimizations are still required to address the remaining variation and yield issues before mass production for commercial applications.
While SNNs are the better choice for spatiotemporal unlabeled data, they still fall behind traditional ANNs in benchmark tests on some tasks, and highly effective training of SNNs is still an open question. These algorithmic issues must be resolved to develop efficient future SNN-based unlabeled data processing systems.
Hardware/software co-design is a key strategy for building efficient computing systems by combining the advances in memristive devices with algorithmic methods. For example, a highly efficient and accurate memristive computing system for pattern recognition is experimentally demonstrated by integrating low-energy subquantum CBRAM devices with a network pruning algorithm [32]. Similarly, an adaptive quantization method in software is used to compensate for the accuracy loss caused by the limited conductance levels of low-precision PCM hardware [171].
In summary, we envision innovations in hardware, software, and their co-design for low-power, efficient computing platforms. In the near term, computing platforms for unlabeled data processing will be built with mixed-signal architectures through the co-design of memristive devices and conventional CMOS technology. In the long run, breakthroughs are expected from advanced memristive computing systems with new hardware and software, such as efficient nanoscale memristive devices based on new operating principles and materials, as well as unlabeled data processing models with more complex human-like cognitive abilities.