Introduction

Accurate classification of massive samples, as a landmark accomplishment of deep learning, has achieved performance that matches or surpasses that of humans. However, this performance is premised on a huge amount of data with a balanced distribution, which makes deep learning highly susceptible to overfitting on small1,2,3,4 and imbalanced5,6,7 data. Some approaches try to mitigate the overfitting problem by data augmentation8,9,10. However, the supplemental data obtained by such methods are only predictions of the distribution of the existing data, and the accuracy of that prediction is itself highly dependent on the amount of data. Therefore, data-scarce fields still rely on traditional neural networks or non-neural machine learning methods.

Among these traditional methods, recurrent neural networks (RNNs)11,12,13,14 show promising performance, and the synergetic neural network (SNN)11 among them natively supports data distribution adaptation, making it particularly suitable for classification tasks with small amounts of imbalanced data. RNNs have the advantages of few parameters, fast training, a controllable working process, low data dependence, and high robustness15,16, which supports their wide application in small-data classification tasks17. Although SNN is a relatively old RNN model, it retains theoretical advantages over the latest models. All attractors of the SNN correspond to valid memories; its saddle points and local extrema are limited in number and easily escaped, which ensures the network's robustness18,19,20. In addition, its unique synergetics-based dynamics lead to zero error in convergence results and native adaptation to imbalanced data16,21,22. The overfitting problem can be effectively reduced by repeatedly applying varied attention parameters to different classes of data during the recurrences of the network.

However, these advantages are masked by the network's frequent association errors, and the optimization space of the parameters is consumed in overcoming this problem, preventing the network's adaptation capability from being fully exploited. Researchers have applied SNN to tasks including image retrieval23, face recognition24, and semantic role labeling25,26, yet found that increased task difficulty significantly raises the frequency of association errors. Hu first addressed the problem by adding more adjustable parameters27. Other studies optimize parameters with the immune clonal strategy28, the fuzzy integral29, and improved particle swarm optimization30. Applying the immune clonal strategy to enhance the orthogonality of memories also helps improve accuracy31. However, all of these approaches use learning methods to patch a defect in SNN's working procedure. From the dynamical-system perspective, the association error stems from the initial state being placed in the wrong basin of attraction during the working procedure. Because this cause was not revealed, parameter tuning became the dominant route, and due to the lack of explicit objectives, the optimization methods are often based on genetic algorithms. Such a research route substantially complicates SNN's application process and occupies the optimization space of the parameters, so task-oriented parameter training is difficult to introduce at the same time. These problems have brought related research to a standstill.

In this paper, we propose a modern synergetic neural network (MSNN) model to properly apply the advantages of SNN to practical problems. We first address the association error and release the parameter tuning space by defining and remodeling the state initialization method of SNN. Although SNN's first study and some subsequent studies suggest that its initial state characterizes the similarity between samples and memories11,23,32,33,34, we prove that the initial state does not conform to the principles of a similarity metric. Therefore, we distill the network's method of calculating the initial state and remodel it into a definitive solution. Since the new solution is decoupled from the parameter tuning process, the whole optimization space can be reserved for the task properties. We design an Error BackPropagation (EBP) based attention parameter training method that allows MSNN to be co-trained with other network layers for automatic adaptation to the data distribution. Experimental results on 75 imbalanced small UC Irvine Machine Learning (UCI) datasets show that these improvements make MSNN outperform 187 neural and non-neural methods.

Contribution of this work

(1) Revealing that the root of SNN's association error lies in its wrong state initialization method. (2) Updating SNN's working process to eliminate the association error and release the parameter tuning space. (3) Proposing an EBP-based training method that enables the built-in attention parameters to adapt to the data distribution.

Related work

In general, classification methods for imbalanced data include data preprocessing, training-target modification, and purpose-built methods35. Since most classification networks are not designed with data imbalance in mind, related research focuses on data preprocessing. Depending on the distributional characteristics of the data, preprocessing methods can be categorized into oversampling36,37, undersampling38,39, and hybrids of the two40,41. In recent years, with the growing data demands of classification networks and the advent of pattern generation methods based on generative adversarial networks, oversampling of minority classes has gradually become the mainstream approach42,43. Our solution, however, belongs to the category of purpose-built methods, and our network natively supports training on imbalanced data, which distinguishes it from the above studies.

SNN overview

SNN’s working procedure

SNN is a 3-layer RNN whose structure is shown in Fig. 1. The update formulas of SNN11 for its input, hidden, and output layers are

$${\varvec{\xi}}={V}^{+}{\varvec{x}}$$
(1)
$${{\varvec{\xi}}}^{{\varvec{n}}{\varvec{e}}{\varvec{w}}}=Syn\left({\varvec{\xi}}\right)=\gamma \left(\frac{b{{\varvec{\xi}}}^{3}+{\varvec{\lambda}}{\varvec{\xi}}}{(b+c){\Vert {\varvec{\xi}}\Vert }_{2}^{2}}+\left(\frac{1}{\gamma }-1\right){\varvec{\xi}}\right)$$
(2)
$${{\varvec{x}}}^{{\varvec{n}}{\varvec{e}}{\varvec{w}}}=V{{\varvec{\xi}}}^{{\varvec{n}}{\varvec{e}}{\varvec{w}}}$$
(3)

\({\varvec{x}}\) is the normalized query pattern. \(V=[{{\varvec{v}}}_{1},\dots ,{{\varvec{v}}}_{{\varvec{M}}}]\) is the matrix of normalized static prototypes representing memories. \({{\varvec{x}}}^{{\varvec{n}}{\varvec{e}}{\varvec{w}}}\) is the new input transmitted to Eq. (1). \({\varvec{\xi}}\) is the vector of order parameters. \({V}^{+}\) is \(V\)’s Moore–Penrose inverse44,45. \(Syn\) is the synergetic activation function. \(\gamma\) is the learning rate. The network parameters are \({\varvec{\lambda}}\), \(b\), and \(c\). \({\varvec{\lambda}}\) holds the attention parameters for the prototypes, with default value 1; higher attention brings a greater chance of association. \(b\) and \(c\) control the convergence speed, with default value 1. SNN requires that all prototypes are mutually independent and that their total number is less than the dimension, so that the product of \({V}^{+}\) and \(V\) is the identity matrix. Substituting Eq. (3) into Eq. (1) gives \({\varvec{\xi}}={V}^{+}{{\varvec{x}}}^{{\varvec{n}}{\varvec{e}}{\varvec{w}}}={{\varvec{\xi}}}^{{\varvec{n}}{\varvec{e}}{\varvec{w}}}\). Thus, the update formulas can be interpreted as a dynamical system over \({\varvec{\xi}}\): \({\varvec{\xi}}\) is the dynamical state, its initial value is the initial state, and its variation is reflected in \({\varvec{x}}\) through \(V\).
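For concreteness, the following minimal sketch implements the update of Eqs. (1)–(3), iterating directly on \({\varvec{\xi}}\) as justified above. It is an illustrative reconstruction under default parameters rather than the authors' reference implementation; the function name, step count, and learning rate are our assumptions.

```python
# A minimal sketch of the SNN update (Eqs. 1-3), assuming normalized prototypes
# stacked as columns of V and default parameters lambda = b = c = 1.
import numpy as np

def snn_iterate(x, V, lam=None, b=1.0, c=1.0, gamma=0.1, steps=200):
    """Run the SNN dynamics and return the final order parameters and output."""
    M = V.shape[1]
    lam = np.ones(M) if lam is None else lam
    V_plus = np.linalg.pinv(V)            # Moore-Penrose inverse V^+
    xi = V_plus @ x                       # Eq. (1): initial order parameters
    for _ in range(steps):
        norm2 = np.sum(xi ** 2)
        if norm2 == 0:                    # local maximum point: division by zero
            break
        xi = gamma * ((b * xi ** 3 + lam * xi) / ((b + c) * norm2)
                      + (1.0 / gamma - 1.0) * xi)   # Eq. (2): Syn activation
    return xi, V @ xi                     # Eq. (3): associated pattern x_new
```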

Figure 1
figure 1

SNN structure. The activation function of the hidden layer is \(Syn\). The input and output layers are linear mappings with weights of the adjoint matrix \({V}^{+}\) and the prototype matrix \(V\), respectively.

SNN converges to three kinds of stationary points: the target stable point, the saddle point, and the local maximum point. The convergence behaviors are shown in Fig. 2. Generally, SNN reaches the target stable point, which occurs when \({\varvec{\xi}}\) is a positive or negative one-hot encoding. The single nonzero order parameter is called the winner parameter. The network outputs \(\pm {\varvec{v}}\) at this point, reflecting the association from \({\varvec{x}}\) to \({\varvec{v}}\). The saddle point is reached when \({\varvec{\xi}}\) has more than one identical nonzero value, which stems from multiple identical extremes in the initial state. The local maximum point is reached when all elements of \({\varvec{\xi}}\) are 0; the resulting division-by-0 error in Eq. (2) blocks the network from working.

Figure 2
figure 2

The convergence to stationary points of SNN. Different curve colors distinguish order parameters. (a,b) show convergence to target stable points (positive or negative one-hot), i.e., one order parameter converges to ± 1 while the others converge to 0; (c) shows convergence to a saddle point (multiple identical values), which stems from multiple identical extreme values in the initial value of \({\varvec{\xi}}\); (d) shows convergence to the local maximum point (all values are zero), which stems from a zero-vector initialization of \({\varvec{\xi}}\). Note that chart (d) is only for illustration, as the divide-by-0 error terminates the network’s further iterations.

SNN’s basin of attraction

In describing the convergence to the target stable point, SNN proposes the “winner-takes-all” property, i.e., the order parameter with the biggest absolute initial value is the winner parameter, but without a detailed proof. We therefore prove this property by showing that \(\left|{\xi }_{m}^{new}\right|\) is the largest when \(\left|{\xi }_{m}\right|\) is the largest; the detailed proof is given in SI 1A. From the perspective of dynamical systems, the “winner-takes-all” property can be interpreted as extreme-based basin partitioning: the basin of an SNN attractor is the set of all initial states with the same sign and extreme-value index as the attractor itself. The attractors, basins, and trajectories of random initial states of SNN in 2D and 3D are shown in Fig. 3. Such a division allows the order parameter with the biggest absolute value to retain its winner position throughout the convergence.
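The property can also be checked numerically. The snippet below is a self-contained illustration; the orthonormal random prototypes, the random seed, and the default parameter values are our assumptions.

```python
# A small numerical check of the "winner-takes-all" property: the index of the
# largest |xi| in the initial state is kept while the dynamics converge.
import numpy as np

rng = np.random.default_rng(0)
V = np.linalg.qr(rng.normal(size=(8, 3)))[0]   # 3 orthonormal prototypes in R^8
x = rng.normal(size=8)
x /= np.linalg.norm(x)                         # normalized query

xi = np.linalg.pinv(V) @ x                     # Eq. (1): initial order parameters
winner_at_start = np.argmax(np.abs(xi))
for _ in range(500):                           # Eq. (2) with lambda = b = c = 1, gamma = 0.1
    xi = 0.1 * ((xi ** 3 + xi) / (2 * np.sum(xi ** 2)) + 9.0 * xi)
print(winner_at_start == np.argmax(np.abs(xi)))   # expected: True
print(np.round(xi, 3))                            # should be close to a +/- one-hot vector
```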

Figure 3
figure 3

SNN’s attractors, basins, and trajectories of random initial states in 2D and 3D space. One coordinate of each attractor is ± 1 and the others are 0. The trajectories and the basin corresponding to the same attractor are marked in the same color. The basin contains all initial states with the same sign and maximum-absolute-value index as its attractor.

MSNN

A nonlinear dynamical system is sensitive to its initial state, so the initialization should be designed carefully. However, SNN’s initialization method was proposed without in-depth analysis. Although the associative memory task requires the correct association to be the memory most similar to the input, we prove that the existing initialization method can designate an order parameter corresponding to a less similar memory as the winner. Due to the “winner-takes-all” property of SNN’s convergence, the selected winner converges to ± 1, so the network outputs the less similar memory as the association result, causing the association error. To address this problem, we redesign the state initialization method to correct the winner designation. The new approach ensures consistency between the selected winner and the association target, fundamentally solving SNN’s association error problem. In addition, the new initialization method makes EBP-based parameter learning feasible.

SNN’s erroneous state initialization method

The working target of SNN is to converge to the most similar memory. The initial state controls the convergence, so the initialization method should be based on a similarity metric. However, the similarity between the sample \({\varvec{x}}\) and the memory \({{\varvec{v}}}_{{\varvec{m}}}\) cannot be characterized by the metric used in SNN’s state initialization method

$$S\left({{\varvec{v}}}_{{\varvec{m}}},{\varvec{x}}\right)={{\varvec{v}}}_{{\varvec{m}}}^{+}\cdot {\varvec{x}}$$
(4)

Although there are at least 67 different metrics applied in various fields46, all similarity metrics shall satisfy the following three principles47:

  1. Commonality related. The more commonality two objects share, the more similar they are.

  2. Difference related. The more differences they have, the less similar they are.

  3. The maximum is reached when they are identical.

However, \(S\) actually characterizes a scaled cosine similarity between \({{\varvec{v}}}_{{\varvec{m}}}^{+}\) and \({\varvec{x}}\) and conforms to none of the above principles. For Principle (3), \(S=1\) when \({\varvec{x}}={{\varvec{v}}}_{{\varvec{m}}}\), while \(S={\Vert {{\varvec{v}}}_{{\varvec{m}}}^{+}\Vert }_{2}\) when \({\varvec{x}}={{\varvec{v}}}_{{\varvec{m}}}^{+}/{\Vert {{\varvec{v}}}_{{\varvec{m}}}^{+}\Vert }_{2}\) (i.e., when \({\varvec{x}}\) is the normalized adjoint vector). From "SNN’s working procedure", \({V}^{+}V\) is the identity matrix, so

$${{\varvec{v}}}_{{\varvec{m}}}^{+}\cdot {{\varvec{v}}}_{{\varvec{n}}}=\left\{\begin{array}{l}1,m=n\\ 0,otherwise\end{array}\right.$$
(5)

which means that \({{\varvec{v}}}_{{\varvec{m}}}^{+}\) is perpendicular to the hyperplane spanned by all prototypes except \({{\varvec{v}}}_{{\varvec{m}}}\). Since the inner product of \({{\varvec{v}}}_{{\varvec{m}}}^{+}\) and \({{\varvec{v}}}_{{\varvec{m}}}\) is 1, the angle between \({{\varvec{v}}}_{{\varvec{m}}}^{+}\) and \({{\varvec{v}}}_{{\varvec{m}}}\) lies in the range [0, 0.5π). Because SNN requires \({{\varvec{v}}}_{{\varvec{m}}}\) to be normalized, this implies \({\Vert {{\varvec{v}}}_{{\varvec{m}}}^{+}\Vert }_{2}\ge 1\). Hence \(S\) may reach a larger value when \({\varvec{x}}\) is not equal to \({{\varvec{v}}}_{{\varvec{m}}}\), so \(S\) does not satisfy Principle (3). For Principles (1) and (2), as \({\varvec{x}}\) gradually moves from \({{\varvec{v}}}_{{\varvec{m}}}\) toward \({{\varvec{v}}}_{{\varvec{m}}}^{+}\), its commonality with \({{\varvec{v}}}_{{\varvec{m}}}\) decreases and the difference increases, but \(S\) increases rather than decreases. Therefore, \(S\) does not satisfy Principles (1) and (2).

This conflict between \(S\) and a genuine similarity metric causes the association error. From the previous section, the order parameter with the largest absolute value in the initial state is the winner parameter. SNN picks the wrong winner whenever the largest order parameter under \(S\) corresponds to a less similar \({\varvec{v}}\), which leads to an association error.
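The following constructed counterexample (illustrative prototypes and query chosen by us, not taken from the paper) makes the conflict concrete: the cosine ranking and the ranking induced by \(S\) disagree, so the winner-takes-all convergence would output the wrong memory.

```python
# With non-orthogonal prototypes, the initialization S(v_m, x) = v_m^+ . x can
# rank a memory above another that is more similar to the query in the cosine
# sense, so SNN converges to the wrong prototype.
import numpy as np

def unit(v):
    return v / np.linalg.norm(v)

V = np.stack([unit(np.array([1.0, 0.0, 0.0])),
              unit(np.array([0.0, 1.0, 0.0])),
              unit(np.array([1.0, 1.0, 0.1]))], axis=1)   # three memories v1, v2, v3
x = unit(np.array([1.0, 0.0, -0.3]))                      # query, closest to v1

cosine = V.T @ x                    # true similarity ranking
xi0 = np.linalg.pinv(V) @ x         # SNN initial state S(v_m, x) = v_m^+ . x
print(np.argmax(cosine))            # 0 -> v1 is the most similar memory
print(np.argmax(np.abs(xi0)))       # 2 -> SNN's winner is v3: association error
```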

MSNN’s remodeling of the state initialization method

The association error originates from the wrong initial state, so MSNN redesigns the initialization method. Since SNN’s basin of attraction is determined by the parameters’ absolute values, simply using a similarity measure as the state initialization may allow the most negative order parameter to become the winner, making the network associate the least similar memory. To avoid this problem, we propose the new initialization method

$${\xi }_{m}=ReLU\left(S\left({\varvec{x}},{{\varvec{v}}}_{{\varvec{m}}}\right)\right)$$
(6)

\(S\) is the similarity measure between the query and the memory. \(ReLU\) sets negative values to zero, eliminating the possibility of a negative order parameter becoming the winner. In summary, the working process of MSNN is

$${\varvec{\xi}}=\left\{\begin{array}{ll}ReLU\left({\left(S\left({\varvec{x}},{{\varvec{v}}}_{1}\right),\dots ,S\left({\varvec{x}},{{\varvec{v}}}_{{\varvec{M}}}\right)\right)}^{\mathrm{T}}\right),&{\varvec{\xi}}\ \mathrm{not\ initialized}\\ {V}^{+}{\varvec{x}},&\mathrm{otherwise}\end{array}\right.$$
(7)
$${{\varvec{\xi}}}^{{\varvec{n}}{\varvec{e}}{\varvec{w}}}=Syn\left({\varvec{\xi}}\right)$$
(8)
$${{\varvec{x}}}^{{\varvec{n}}{\varvec{e}}{\varvec{w}}}=V{{\varvec{\xi}}}^{{\varvec{n}}{\varvec{e}}{\varvec{w}}}$$
(9)

MSNN’s network structure is shown in Fig. 4.

Figure 4
figure 4

The working process of MSNN after correction of the state initialization method. The network input \({\varvec{x}}\) is mapped to \({\varvec{\xi}}\left(0\right)\) through the corrected state initialization, \({\varvec{\xi}}\left(0\right)\) is activated by the ReLU function and fed to the hidden layer through the initial order parameter feedforward layer, and the network then works in iterations.

The new initialization method ensures the correct association while improving the running speed. Because only positive values are admitted as initial order parameters, the most similar memory must correspond to the largest order parameter. By the “winner-takes-all” property, this largest order parameter becomes the winner, and the most similar memory becomes the association result. The new initialization also sparsifies \({\varvec{\xi}}\) by setting negative order parameters to zero, which speeds up the hardware computation.
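A minimal sketch of Eqs. (6)–(9) is given below, assuming cosine similarity as \(S\) (the dot product of the normalized query and prototypes); the function name and iteration settings are our assumptions. Applied to the counterexample constructed earlier, this initialization starts the winner at the cosine-most-similar memory, so the dynamics converge toward the correct prototype.

```python
# A minimal sketch of MSNN's working process: ReLU-clipped similarities as the
# initial order parameters, followed by the synergetic dynamics of Eq. (8).
import numpy as np

def msnn_associate(x, V, lam=None, b=1.0, c=1.0, gamma=0.1, steps=200):
    """Initialize xi with ReLU(S(x, v_m)) and run the synergetic dynamics."""
    M = V.shape[1]
    lam = np.ones(M) if lam is None else lam
    xi = np.maximum(V.T @ x, 0.0)          # Eqs. (6)/(7): ReLU of similarities
    for _ in range(steps):
        norm2 = np.sum(xi ** 2)
        if norm2 == 0:
            break
        xi = gamma * ((b * xi ** 3 + lam * xi) / ((b + c) * norm2)
                      + (1.0 / gamma - 1.0) * xi)   # Eq. (8)
    return V @ xi                          # Eq. (9): associated memory
```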

MSNN’s attention parameter self-learning

SNN’s genetic algorithm-based parameter learning is hard to co-train with other modern network layers, so we design an EBP-based learning method. The new learning method adjusts the attention parameter \({\varvec{\lambda}}\) to assign greater attention to classes with smaller sizes, providing self-adaptation to imbalanced data. Before applying EBP, note that \(Syn\) repeatedly imposes a polynomial function on its input, which may cause exploding or vanishing gradients. The gradient problem is severe enough that conventional means such as gradient clipping can barely avoid non-convergence. To solve this problem, we first normalize \({\varvec{\xi}}\) and split \(Syn\) into two terms,

$$Syn\left({\varvec{\xi}}\right)=\frac{\gamma{\varvec{\lambda}}{\varvec{\xi}}}{b+c}+\left(\frac{\gamma b{{\varvec{\xi}}}^{3}}{b+c}+\left(1-\gamma \right){\varvec{\xi}}\right)$$
(10)

EBP is performed normally for the former term, while the latter term uses the gradient bypass technique48,49. This technique passes the gradient of certain network layer outputs directly to their inputs during backpropagation; it is used to circumvent exploding or vanishing gradients caused by unsuitable activation functions, and even the blocked gradient flow caused by discontinuities.
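As an illustration of the term-splitting in Eq. (10), the sketch below uses the common PyTorch detach idiom for the gradient bypass. This is our interpretation of the technique cited in refs. 48,49, not the authors' implementation; placing the normalization of \({\varvec{\xi}}\) inside the function is likewise an assumption.

```python
# A sketch of Eq. (10) with gradient bypass: the lambda term is backpropagated
# normally, while the gradient of the cubic term is replaced by the identity.
import torch

def syn_with_bypass(xi, lam, b=1.0, c=1.0, gamma=0.1):
    xi = xi / xi.norm().clamp_min(1e-12)            # normalize xi first
    linear_term = gamma * lam * xi / (b + c)        # EBP flows through lambda
    cubic_term = gamma * b * xi ** 3 / (b + c) + (1.0 - gamma) * xi
    bypassed = xi + (cubic_term - xi).detach()      # forward: cubic_term,
    return linear_term + bypassed                   # backward: identity in xi
```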

The parameter learning requirement can be satisfied by applying EBP directly to \({\varvec{\lambda}}\). Let \({\delta }_{i}\) be the error backpropagated to \({\xi }_{i}^{new}\). Since \({\xi }_{i}\ge 0\),

$$\frac{\partial {\xi }_{i}^{new}}{\partial {\lambda }_{i}}=\frac{\gamma {\xi }_{i}}{b+c}\ge 0$$
(11)

so the adjustment \(\Delta {\lambda }_{i}\propto -{\delta }_{i}\,\partial {\xi }_{i}^{new}/\partial {\lambda }_{i}\) has the opposite sign to \({\delta }_{i}\). \({\delta }_{i}>0\) means \({\xi }_{i}^{new}\) is too large, and \(\Delta {\lambda }_{i}\le 0\) means the network will not increase its attention to \({\xi }_{i}\), giving it a higher chance of converging to 0. \({\delta }_{i}<0\) means \({\xi }_{i}^{new}\) is too small, and \(\Delta {\lambda }_{i}\ge 0\) gives \({\xi }_{i}\) a higher chance of converging to 1. Therefore, EBP satisfies the parameter learning requirement of \({\varvec{\lambda}}\).

Experiments

Dataset and network configuration

We test MSNN on the small datasets of the UCI collection, a benchmark of 121 pattern classification datasets for both neural network and non-neural network machine learning algorithms. These datasets are divided into 75 small and 46 large datasets by a threshold of 1000 samples50. All of these datasets are imbalanced after the train–test split. We compare our network against 187 neural and non-neural machine learning algorithms; their configurations and performances are detailed in the literature16,17,50. See SI 1B.2 for dataset configuration details.

As for the network architecture, we use embedding layers consisting of {0, 1, 7} fully connected layers with ReLU activation functions and {32, 128, 1024} hidden units per layer. The embedding layers are followed by the SNN with {0} to {9} iterations and a mapping to an output vector whose dimension is the number of classes. The prototype matrix is obtained by intra-class K-means clustering, and the adjoint matrix is the M–P inverse of the prototype matrix. The network structure used for the experiments is shown in Fig. 5. On each dataset, we use EBP to train SNN’s attention parameter \({\varvec{\lambda}}\) and perform a grid search to determine the best hyperparameter setting for the embedding layers, the number of memories, and the number of SNN iterations. The hyperparameter search space of the grid search is listed in Table 1. All models are trained for 100 epochs with a mini-batch size of 4 samples using the softmax cross-entropy loss and the AdamW optimizer51. After each epoch, the model accuracy is computed on a separate validation set. Using the gradient direct transmission technique48,49, the gradient at MSNN’s output layer is passed directly to the state initialization layer during backpropagation, bypassing the polynomial-shaped activation function of the SNN to circumvent gradient explosion or vanishing. With early stopping, the model with the best validation accuracy averaged over 16 consecutive epochs is selected as the final model and then evaluated on the test set.
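As a concrete illustration of the prototype construction step, the sketch below clusters each class with K-means, normalizes the centroids into columns of \(V\), and takes the Moore–Penrose inverse as the adjoint matrix. The number of clusters per class and other details are assumptions, since only the overall procedure is stated above.

```python
# A sketch of the prototype construction: intra-class K-means centroids are
# normalized, stacked as columns of V, and the adjoint matrix is pinv(V).
import numpy as np
from sklearn.cluster import KMeans

def build_prototypes(features, labels, k_per_class=2, seed=0):
    prototypes = []
    for cls in np.unique(labels):
        X_cls = features[labels == cls]
        k = min(k_per_class, len(X_cls))                 # guard tiny classes
        centers = KMeans(n_clusters=k, n_init=10,
                         random_state=seed).fit(X_cls).cluster_centers_
        prototypes.extend(c / np.linalg.norm(c) for c in centers)
    V = np.stack(prototypes, axis=1)       # one normalized prototype per column
    return V, np.linalg.pinv(V)            # prototype matrix and adjoint V^+
```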

Figure 5
figure 5

Network structure for experiments. The network input \({\varvec{x}}\) is preprocessed by the embedding layer and transformed into the initial state of the SNN through the initialization layer, i.e., the initial value of the order parameter \({{\varvec{\xi}}}_{0}\). \({{\varvec{\xi}}}_{0}\) is passed into the hidden layer of the SNN, and the order parameters are updated and memory association is realized as the network iterates. The prototype pattern obtained by association is passed through the SNN’s output layer into the classifier to obtain the label \({\varvec{y}}\). The parameters are trained using error backpropagation with the loss function \(CrossEntropy\left({\varvec{y}},{\varvec{l}}{\varvec{a}}{\varvec{b}}{\varvec{e}}{\varvec{l}}\right)\), where \({\varvec{l}}{\varvec{a}}{\varvec{b}}{\varvec{e}}{\varvec{l}}\) is the true label of the data. Using the gradient direct transmission technique, the gradient \({\nabla }_{{{\varvec{\xi}}}^{{\varvec{n}}{\varvec{e}}{\varvec{w}}}}L\) of \({{\varvec{\xi}}}^{{\varvec{n}}{\varvec{e}}{\varvec{w}}}\) in the error backpropagation stage (red line in the figure) is passed directly to \({{\varvec{\xi}}}_{0}\), bypassing the polynomial-shaped activation function of the SNN to circumvent gradient explosion or vanishing.

Table 1 Hyperparameter search space for grid search on small UCI datasets.

Classification performances validation

The Friedman rankings of these methods across datasets are presented in Table 2. MSNN outperforms all other methods on small datasets, setting a new state of the art on 12 datasets (balance-scale, breast-cancer, congressional-voting, heart-cleveland, ionosphere, low-res-spect, monks-2, monks-3, planning, post-operative, soybean, and spect). See SI 1B for more details.

Table 2 Friedman ranking and average accuracy (%) for each classifier, ordered by increasing Friedman ranking.

Imbalanced data adaptation performance

We analyze the performance of MSNN on datasets with different levels of imbalance in terms of the percentage of the majority class, %Maj17. %Maj reflects the level of imbalance in the dataset: the higher the %Maj, the more imbalanced the dataset. A classifier applied to imbalanced data is prone to focus on the majority class, i.e., to label all samples as the majority class, which constitutes overfitting. The more severe the overfitting, the closer the classifier’s accuracy is to %Maj. Thus, the accuracy over %Maj, denoted by \(\sigma\), reflects the extent to which minority-class samples are correctly classified. See SI 1B for the %Maj of each UCI dataset. We rank the datasets in ascending order of %Maj and mark the accuracies of the top three methods by Friedman ranking in Fig. 6a. For better visualization, we merge adjacent datasets into groups of five and calculate their average \(\sigma\); the results are shown in Fig. 6b. MSNN outperforms the other methods in most cases, and the improvement in average \(\sigma\) is most pronounced in groups 2–12 (except group 10), whose %Maj interval is (30.93, 67.83), indicating that MSNN adapts well to both mildly and moderately imbalanced datasets. In groups 13–15, with %Maj greater than 73.53, the average \(\sigma\) of MSNN decreases compared to the other methods, which suggests that the linear classifier and the standard associative memory network are more stable on heavily imbalanced datasets.

Figure 6
figure 6

(a) Majority class percentage %Maj of the small UCI datasets and the accuracy of the top three methods by Friedman ranking. Datasets are ranked in ascending order of %Maj. (b) The average accuracy over %Maj (average \(\sigma\)) of the three methods, with adjacent datasets merged into groups of five. MSNN has the highest average \(\sigma\) in the first 12 groups except group 10, with a decrease in groups 13–15, indicating its better performance on mildly and moderately imbalanced datasets.

Order parameter initialization validation

We verify the effectiveness of MSNN’s order parameter initialization method for correcting association errors by comparing its accuracy with SNN’s. We use the balanced parameter configuration (all parameters at their default value of 1), so the association target is the memory most similar to the query. The performance of SNN and MSNN is shown in Fig. 7. MSNN achieves 100% accuracy on all datasets, while SNN achieves 100% accuracy on only 5 datasets (acute-inflammation, acute-nephritis, horse-colic, monks-3, and trains), with an average accuracy of 66.47%.

Figure 7
figure 7

The associative accuracy of SNN and MSNN. The datasets are ordered by name. MSNN reaches 100% accuracy in all datasets, while SNN’s performance fluctuates significantly and reaches 100% accuracy in only 5 datasets.

Attention parameter learning performance

MSNN mitigates overfitting through self-learning of the attention parameter \({\varvec{\lambda}}\), which provides greater attention to classes with small sample sizes. Ideally, the elements of \({\varvec{\lambda}}\) should correlate negatively with the class sample sizes. Due to the diversity of data sources, MSNN cannot guarantee good fits on all datasets, and the attention parameter’s learning performance can hardly be reflected on the underperforming datasets. In addition, the correlation between \({\varvec{\lambda}}\) and the class sample sizes is difficult to model in configurations where multiple attention parameters correspond to one class. Since the objective is to verify the ideal cases rather than all cases, we drop results from the 28 datasets that do not meet two criteria: (1) \({\varvec{\lambda}}\) self-learning has a positive effect on performance; (2) the number of elements of \({\varvec{\lambda}}\) equals the number of classes. We use Spearman correlation analysis to verify the correlation between \({\varvec{\lambda}}\) and the sample sizes. Spearman analysis requires at least four samples, yet a significant proportion of the UCI datasets are 2- or 3-class tasks. Therefore, we normalize each dataset’s \({\varvec{\lambda}}\) by its 1-norm and pool them. The pooled \({\varvec{\lambda}}\) contains 192 samples, with a correlation coefficient of about − 0.170 and a p value of about 0.019. Thus, \({\varvec{\lambda}}\) has a significant negative correlation with the sample sizes, indicating that EBP is applicable to the learning of \({\varvec{\lambda}}\).
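A sketch of the pooled correlation test is given below. Only the 1-norm scaling of \({\varvec{\lambda}}\) and the Spearman test follow the description above; the per-dataset normalization of class sizes, the function name, and the data layout are our assumptions.

```python
# Pool per-dataset attention parameters and class sizes, then test for a
# monotone (negative) relationship with Spearman's rank correlation.
import numpy as np
from scipy.stats import spearmanr

def pooled_spearman(lambdas, class_counts):
    """lambdas, class_counts: lists of per-dataset 1-D arrays of equal length."""
    lam_all, cnt_all = [], []
    for lam, cnt in zip(lambdas, class_counts):
        lam = np.asarray(lam, dtype=float)
        cnt = np.asarray(cnt, dtype=float)
        lam_all.append(lam / np.abs(lam).sum())   # 1-norm scaling per dataset
        cnt_all.append(cnt / cnt.sum())           # assumed: relative class sizes
    rho, p = spearmanr(np.concatenate(lam_all), np.concatenate(cnt_all))
    return rho, p                                 # expect rho < 0, small p
```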

Conclusion

In this paper, we propose the MSNN model to further improve RNN classification performance on imbalanced small data. MSNN first corrects the state initialization method in SNN’s working process, addressing the association errors that existing studies misattribute to under-optimized parameters and releasing the whole parameter optimization space for task requirements. MSNN then adjusts SNN’s built-in attention parameters through an EBP- and gradient bypass-based learning method, enabling self-adaptation to imbalanced data during joint training with other network layers. Experimental results on 75 small UCI datasets show that MSNN attains error-free associations on all datasets and that the attention parameters spontaneously establish a significant negative correlation with the imbalanced class sizes. These improvements make MSNN outperform 187 methods and achieve a new state of the art.

Our study allows the theoretical advantages of Synergetics to be successfully applied in artificial neural networks, and we plan to extend these advantages to other areas in future work, including optimization methods for attention mechanisms and self-learning methods for representative prototypes.