
Periodic Variable Star Classification with Deep Learning: Handling Data Imbalance in an Ensemble Augmentation Way


Published 2023 September 19 © 2023. The Astronomical Society of the Pacific. All rights reserved.
Citation: Zihan Kang et al. 2023, PASP, 135, 094501. DOI: 10.1088/1538-3873/acf15e


Abstract

Time-domain astronomy is progressing rapidly with the ongoing and upcoming large-scale photometric sky surveys led by the Vera C. Rubin Observatory project (LSST). Billions of variable sources call for better automatic classification algorithms for light curves. Among them, periodic variable stars are frequently studied. The different categories of periodic variable stars show a high degree of class imbalance, which poses a challenge to algorithms including deep learning methods. We design two kinds of neural network architectures for the classification of periodic variable stars in the Catalina Survey's Data Release 2: a multi-input recurrent neural network (RNN) and a compound network combining the RNN and a convolutional neural network (CNN). To deal with class imbalance, we apply Gaussian Process to generate synthetic light curves with artificial uncertainties for data augmentation. For better performance, we organize the augmentation and training process in a "bagging-like" ensemble learning scheme. The experimental results show that the better approach is the compound network combining RNN and CNN, which reaches the best result of 86.2% on the overall balanced accuracy and 0.75 on the macro F1 score. We develop the ensemble augmentation method to solve the data imbalance encountered when classifying variable stars and prove the effectiveness of combining different representations of light curves in a single model. The proposed methods would help build better classification algorithms of periodic time series data for future sky surveys (e.g., LSST).


1. Introduction

With the upcoming large-scale sky surveys represented by the Vera C. Rubin Observatory project (LSST; Ivezić et al. 2019), time-domain astronomy is now entering a golden age with overwhelming data. Besides, the ongoing surveys are still accumulating data to be analyzed, such as the Zwicky Transient Facility (ZTF; Bellm et al. 2019), the Optical Gravitational Lensing Experiment (OGLE; Udalski et al. 2015), and the Catalina Real-Time Transient Survey (CRTS; Drake et al. 2009). Billions of observed variable sources demand automatic classification for further research. Machine learning plays a prominent role in this task and is broadly divided into two types of algorithms: traditional approaches with artificial input features and deep learning methods based on various neural networks.

Among numerous kinds of variable sources, periodic variable stars are often studied as a particular category because of their importance and their distinct observable differences from other sources. For example, Cepheid stars and RR Lyrae stars can be used as standard candles for distance measurement (Alloin & Gieren 2003) and hence are crucial for studying galactic structure. However, the samples of different periodic variable star classes are highly imbalanced, meaning some classes dominate the known samples while others have few cases. This makes it challenging to train a satisfactory machine learning model for the classification task due to the scarcity of some samples and the bias toward the majority classes.

Techniques for dealing with class imbalance in machine learning can be grouped into three categories: data-level, algorithm-level, and hybrid approaches (Henning et al. 2023). Data-level methods focus on adjusting the training data by resampling and augmentation. Algorithm-level techniques generally modify algorithms with a weight or cost schema, with the assumption that the data are sufficient. Hybrid approaches combine both and are often implemented with ensemble learning methods. For traditional machine learning algorithms with artificial input features, plenty of research has been aimed at imbalanced learning, such as the Synthetic Minority Over-sampling Technique (SMOTE; Chawla et al. 2002) and the Self-paced Ensemble (SPE; Liu et al. 2020). Nevertheless, the subject of deep learning with class-imbalanced data is understudied (Henning et al. 2023). Although some techniques exist for deep learning to avoid bias toward majority classes on imbalanced data, the issue remains due to the lack of data. Deep learning relies much more on data sufficiency than traditional machine learning algorithms because of its high model complexity with many parameters. A practical solution would be to find an ideal data augmentation method, especially for astronomical light curves. Physical models and data-driven methods are often applied to generate synthetic light curves for data augmentation. As for variable stars, there are no proper physical models for now, so only data-driven approaches can be considered, including adding noise (Naul et al. 2018), the Gaussian Process (Faraway et al. 2016; Castro et al. 2017), and deep generative models (Martínez-Palomera et al. 2022).

The recurrent neural network (RNN) is a widely used deep learning algorithm for light curves, accepting both the light curve and its uncertainties as input (Naul et al. 2018). An RNN can also be combined with scalar contextual information, such as period and color, to form a multi-input model (Burhanudin et al. 2021). A synthetic light curve for an RNN model should therefore have a well-modeled uncertainty, which was ignored by previous studies. Another deep learning method for classifying variable stars uses the convolutional neural network (CNN), as presented by Szklenár et al. (2022). The CNN takes light curve images as input, which also calls for Gaussian Process augmentation.

Our work aims to find suitable neural network architecture for periodic variable star classification on the CRTS data while applying proper augmentation to handle data imbalance. We design a multi-input RNN-based neural network and apply Gaussian Process to generate artificial light curves with uncertainties. We also develop a compound neural network architecture by combining the RNN and CNN structures. To mitigate model overfitting on minority classes while improving classification performance, we organize the augmentation and training process in a "bagging-like" ensemble learning scheme.

This paper is organized as follows. Section 2 briefly introduces the CRTS variable star data and details the Gaussian Process augmentation. Section 3 describes the ensemble learning scheme we adopt. In Section 4, we characterize the two neural network architectures we design. Section 5 gives the classification results with a comprehensive evaluation on an imbalanced test data set. Section 6 presents some limitations and future work, and finally, the summary is provided in Section 7. We publish our source code on https://github.com/52Hzihan/mixnn4vs.

2. Data and Augmentation

The Catalina Real-Time Transient Survey (CRTS) surveyed 33,000 deg² of the sky and produced more than 500 million light curves for various sources. Among them, CRTS DR2 provided a catalog for variable stars (Drake et al. 2017). Similar to Hosenie et al. (2020), we take 11 classes into account for our analysis, as presented in Table 1. In addition, CRTS uses an unfiltered telescope, so the lack of color data highlights the importance of extracting information from light curves when implementing classification.

Table 1. The Number of Different Classes of CRTS Variable Stars

Classes of Variable Stars                  No.
RRab                                      4325
Blazhko                                    171
RRc                                       3752
RRd                                        502
Rot (Rotational)                          3636
Ecl (Contact and Semi-Detached Binary)   18803
EA (Detached Binary)                      4509
LPV (Long Period Variable)                1286
δ-Scuti                                    147
ACEP (Anomalous Cepheids)                  153
Cep-II (Type-II Cepheids)                  153


To clean the data set, we fit the phase-folded light curves with Friedman's SuperSmoother (Friedman 1984) and exclude data points deviating more than three standard deviations from the smoothed light curves. We also delete points whose errors are greater than twice the average error.
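As an illustration, the cleaning step can be sketched with the supersmoother Python package; the helper below is a simplified reading of the procedure (the 3σ cut is taken with respect to the residual scatter) rather than an exact reproduction of our pipeline.

```python
import numpy as np
from supersmoother import SuperSmoother

def clean_light_curve(phase, mag, err):
    """Simplified outlier cleaning for one phase-folded light curve (sketch)."""
    # Fit Friedman's SuperSmoother to the phase-folded magnitudes.
    model = SuperSmoother()
    model.fit(phase, mag, err)
    smooth = model.predict(phase)

    # Keep points within three standard deviations of the smoothed curve ...
    resid = mag - smooth
    keep = np.abs(resid) < 3.0 * np.std(resid)
    # ... and drop points whose errors exceed twice the average error.
    keep &= err < 2.0 * np.mean(err)
    return phase[keep], mag[keep], err[keep]
```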

2.1. Gaussian Process

In order to deal with the data imbalance, we need to generate simulated light curves for augmentation. Since no proper physical model is available for all types of variable stars, the only way is to create synthetic data from natural light curves. Because the deep learning approach we adopt takes uncertainties as part of its inputs, we ought to build models for both the natural light curves and their uncertainties. Therefore we turn to the Gaussian Process (GP; Rasmussen & Williams 2005), a stochastic process for modeling time series, which has been applied to light curve data augmentation in previous studies (Boone 2019).

A GP is a distribution over functions. In our simple form with scalar input, it can be viewed as an infinite-dimensional joint Gaussian distribution over time, which is fully described by its mean function and kernel function (i.e., covariance function):

$$m(t) \sim \mathcal{GP}\left(\mu(t),\, k(t, t^{\prime})\right), \qquad (1)$$

where μ(t) is the mean function and $k(t,t^{\prime})$ computes the covariance between two points t and $t^{\prime}$.

To fit a GP model to a light curve, we need to choose a prior kernel function and a prior mean function, then calculate the Bayesian posterior functions given the data. Here we adopt the Matern 5/2 kernel as the prior kernel, which is given by

$$k(\tau) = \alpha^{2}\left(1 + \frac{\sqrt{5}\,|\tau|}{\rho} + \frac{5\tau^{2}}{3\rho^{2}}\right)\exp\left(-\frac{\sqrt{5}\,|\tau|}{\rho}\right), \qquad (2)$$

where $\tau = t - t^{\prime}$, and α and ρ are the hyperparameters to be optimized. For the prior mean function, we can simply set μ(t) = 0.

Given a real light curve with uncertainties (t, m, σ), for any other set of time points t₊, the joint distribution of m and the predicted magnitudes m₊ is a multivariate Gaussian distribution:

$$\begin{bmatrix} \boldsymbol{m} \\ \boldsymbol{m}_{*} \end{bmatrix} \sim \mathcal{N}\left( \begin{bmatrix} \mu(\boldsymbol{t}) \\ \mu(\boldsymbol{t}_{*}) \end{bmatrix},\ \begin{bmatrix} k(\boldsymbol{t},\boldsymbol{t}) + \mathrm{diag}(\boldsymbol{\sigma})^{2} & k(\boldsymbol{t},\boldsymbol{t}_{*}) \\ k(\boldsymbol{t}_{*},\boldsymbol{t}) & k(\boldsymbol{t}_{*},\boldsymbol{t}_{*}) \end{bmatrix} \right). \qquad (3)$$

The posterior distribution of m₊ comes out as ${\boldsymbol{m}}_{*} \sim \mathcal{N}(\overline{\mu}({\boldsymbol{t}}_{*}),\, \overline{k}({\boldsymbol{t}}_{*},{\boldsymbol{t}}_{*}))$, where

$$\overline{\mu}(\boldsymbol{t}_{*}) = \mu(\boldsymbol{t}_{*}) + k(\boldsymbol{t}_{*},\boldsymbol{t})\left[k(\boldsymbol{t},\boldsymbol{t}) + \mathrm{diag}(\boldsymbol{\sigma})^{2}\right]^{-1}\left(\boldsymbol{m} - \mu(\boldsymbol{t})\right), \qquad (4)$$

$$\overline{k}(\boldsymbol{t}_{*},\boldsymbol{t}_{*}) = k(\boldsymbol{t}_{*},\boldsymbol{t}_{*}) - k(\boldsymbol{t}_{*},\boldsymbol{t})\left[k(\boldsymbol{t},\boldsymbol{t}) + \mathrm{diag}(\boldsymbol{\sigma})^{2}\right]^{-1}k(\boldsymbol{t},\boldsymbol{t}_{*}). \qquad (5)$$

The hyperparameters of the prior kernel function are optimized by minimizing the negative log-likelihood function

$$-\ln \mathcal{L} = \frac{1}{2}\,\boldsymbol{r}^{\mathsf{T}} K^{-1} \boldsymbol{r} + \frac{1}{2}\ln|K| + \frac{N}{2}\ln(2\pi), \qquad (6)$$

where $K = k(\boldsymbol{t},\boldsymbol{t}) + \mathrm{diag}(\boldsymbol{\sigma})^{2}$, r is the residual after subtracting the model prediction means from the observations, and N is the number of data points.

We perform the GP regression with George (Ambikasaran et al. 2015).
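For illustration, a minimal sketch of the GP fit with George is given below, using the Matern 5/2 kernel and SciPy's L-BFGS-B optimizer for the negative log-likelihood of Equation (6); the initial hyperparameter guesses are arbitrary placeholders rather than the values used in our pipeline.

```python
import numpy as np
import george
from george import kernels
from scipy.optimize import minimize

def fit_gp(t, m, sigma):
    """Fit a zero-mean GP with a Matern 5/2 kernel to one light curve (sketch)."""
    amp = np.var(m)                      # initial guess for alpha^2
    scale = 0.1 * (t.max() - t.min())    # initial guess for rho
    gp = george.GP(amp * kernels.Matern52Kernel(scale ** 2))
    gp.compute(t, yerr=sigma)            # builds K = k(t, t) + diag(sigma)^2

    def neg_log_like(p):
        gp.set_parameter_vector(p)
        return -gp.log_likelihood(m)

    def grad_neg_log_like(p):
        gp.set_parameter_vector(p)
        return -gp.grad_log_likelihood(m)

    result = minimize(neg_log_like, gp.get_parameter_vector(),
                      jac=grad_neg_log_like, method="L-BFGS-B")
    gp.set_parameter_vector(result.x)
    return gp
```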

2.2. Generate Synthetic Light Curves with Uncertainty

The GP allows us to generate synthetic light curves (t₊, m₊, σ₊) on randomly sampled time points t₊, where $\boldsymbol{\sigma}_{*} = \sqrt{\mathrm{diag}(\overline{k}(\boldsymbol{t}_{*},\boldsymbol{t}_{*}))}$. To make a synthetic light curve more "real", in the generation process we scale σ₊ so that its mean equals the mean error of the prototype. The time points t₊ are sampled to have the same size as t, and the magnitudes m₊ come from the corresponding $\overline{\mu}(\boldsymbol{t}_{*})$ with added Gaussian noise using the scaled σ₊ as standard deviations. In addition, we apply a random phase shift to each synthetic light curve to enhance diversity.
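A sketch of this generation step is shown below; it assumes the light curves are phase-folded to the interval [0, 1] and reuses the hypothetical fit_gp helper from the previous sketch, with George's predict returning the posterior mean and per-point variance.

```python
import numpy as np

def synthesize(gp, phase, mag, err, rng=np.random.default_rng()):
    """Draw one synthetic light curve (phase_*, mag_*, err_*) from a fitted GP (sketch)."""
    # Sample as many random phase points as the prototype light curve has.
    phase_star = np.sort(rng.uniform(0.0, 1.0, size=len(phase)))
    mu_star, var_star = gp.predict(mag, phase_star, return_var=True)

    # Scale the predicted uncertainties so their mean matches the prototype's mean error.
    err_star = np.sqrt(var_star)
    err_star *= np.mean(err) / np.mean(err_star)

    # Perturb the posterior mean with Gaussian noise, then apply a random phase shift.
    mag_star = mu_star + rng.normal(0.0, err_star)
    phase_star = (phase_star + rng.uniform()) % 1.0
    order = np.argsort(phase_star)
    return phase_star[order], mag_star[order], err_star[order]
```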

Figure 1 shows examples of GP regression and synthetic light curves for several classes. The light curves are folded with twice the period for better visualization. As a data-driven model, the GP regression gives significant uncertainty in sections with few observations.

Figure 1. Examples of GP regression and synthetic light curves for several classes. For each of these six variable stars, the upper panel shows the natural light curve with error bars and its GP model, and the lower panel shows a corresponding synthetic light curve with synthetic error bars. The mean functions of the GP models are presented as solid lines, and the modeled uncertainties are illustrated as filled blue regions.


3. Ensemble Learning Method

A general approach to data augmentation is to produce enough synthetic light curves that every class has an equal size in the training data. However, for the deep learning method, especially the recurrent neural network we adopt, this equal augmentation method runs into problems. Although an artificial light curve has different numerical values from its prototype, their high-level features in the neural networks may still resemble each other since their shapes look similar. For categories with few entries, equal augmentation means too many simulations from a single light curve, which may cause overfitting on the samples of small-size classes when training the neural networks. Employing fewer simulations and applying class weights may partly overcome the overfitting problem; however, this still involves a trade-off between large-size and small-size classes and makes a limited contribution to the overall performance.

Considering the above issues, we develop an ensemble learning method for neural networks to tackle data imbalance, as depicted in Figure 2. Like the classical ensemble method of "bagging" (Breiman 1996), our approach is to build several sub-data sets from the training data and then train a neural network on each sub-data set. The classification result is the average of all the networks' outputs. When setting up the sub-data sets, we apply random undersampling to large-size categories and Gaussian Process augmentation to small-size categories, ensuring that every category has an equal and moderate size. Note that the augmentation procedure generates different synthetic light curves for each sub-data set.
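The scheme can be summarized by the following sketch; augment (the GP-based simulation of Section 2.2) and build_model are hypothetical helpers, and the loop is schematic rather than our exact implementation.

```python
import numpy as np

def make_sub_dataset(samples_by_class, target_size, rng):
    """Build one balanced sub-data set: undersample large classes, augment small ones."""
    sub = []
    for label, samples in samples_by_class.items():
        if len(samples) >= target_size:
            chosen = rng.choice(len(samples), target_size, replace=False)
            sub += [(samples[i], label) for i in chosen]
        else:
            sub += [(s, label) for s in samples]
            # Fresh GP simulations are drawn for every sub-data set.
            sub += [(augment(samples[rng.integers(len(samples))]), label)
                    for _ in range(target_size - len(samples))]
    return sub

def ensemble_predict(members, x):
    """Average the softmax outputs of all member networks."""
    return np.mean([m.predict(x) for m in members], axis=0)

# Schematic training loop: one network per balanced sub-data set.
#   members = []
#   for k in range(10):
#       model_k = build_model()
#       model_k.fit(...)   # trained on make_sub_dataset(train_by_class, 1875, rng)
#       members.append(model_k)
```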

Figure 2. The augmentation-based ensemble learning process.


Besides handling data imbalance, ensemble learning has the inherent benefit of improving performance. Our "bagging-like" approach also exploits this advantage to reach higher classification accuracy at the expense of training multiple networks. However, there is no need to worry about the overall computation cost: we can train every network for only a few epochs by choosing a large learning rate and then let the ensemble procedure combine these "weak learners" into a strong model.

We split the whole data set into a training set, a validation set, and a test set in a ratio of 6:1:3 with no overlap. The training set is used to build 10 sub-data sets with 1875 light curves for each class. Note that these two hyperparameters are not optimized because of the high computational cost; they are simply reasonable values chosen to exhibit the effectiveness of the ensemble operation. The validation set is augmented so that every category reaches the size of the largest one. We do not augment the test set, in order to evaluate the classification at a realistic degree of data imbalance (actually the imbalance degree of the data set, since we do not know the natural distribution of classes). In practice, a neural network trained by this augmentation-based ensemble approach will give the same classification result for a natural light curve and its simulations.

4. Multi-input Neural Networks

Neural networks have become popular in astronomical data mining for their convenience and high performance without feature engineering, usually recurrent neural networks (RNNs) for sequential data and convolutional neural networks (CNNs) for image data. For light curves, RNNs became a popular method after Naul et al. (2018) demonstrated that they could easily handle irregularly sampled time series. Meanwhile, Szklenár et al. (2022) proved that CNNs could also perform well in classifying variable stars by plotting phase-folded light curves as images. These approaches provide different choices for data with different sequence lengths: RNNs usually deal with sequences of no more than several hundred data points, while CNNs need as many observations as possible to make the plots show distinct patterns. We try both kinds of neural networks to classify CRTS variable star light curves, finding that the RNN performs much better than the CNN approach because the typical sequence lengths are merely 100–300. Therefore we choose the RNN as the basic structure of our neural network model.

Apart from the light curve itself, generally, there is extra information that can be utilized as features in classification, such as period and variation amplitude (and colors in surveys other than CRTS). These numerical features demand a proper way to combine them with the RNN structure. We design a multi-input neural network to fully use these pieces of information, as shown in Figure 3. Hereafter we call this network an RNN-based multi-input neural network.

Figure 3. The architecture of the RNN-based multi-input neural network.


Within this multi-input architecture, we can easily combine the RNN and CNN structures to perform better, as exhibited in Figure 4. Although using the CNN alone is less effective than using the RNN alone on CRTS data, the participation of the CNN in the compound architecture may provide an additional perspective on high-level feature extraction. We apply a 2-stage training procedure to optimize the compound multi-input neural network; the details are given in Section 4.2.

Figure 4. The architecture of the compound multi-input neural network.


We deploy our neural network models with Keras, a Python deep learning API built on top of TensorFlow (Abadi et al. 2015).

4.1. RNN-based Multi-input Neural Network

As shown in Figure 3, the RNN structure takes as input a sequence of vectors (Δt, mag, error). To preprocess the light curves, we transform the sampling time points t to time intervals Δt and normalize the magnitudes of each light curve to zero mean and unit standard deviation. A masking layer is applied after the sequence input in order to cope with the different lengths of light curves, which requires padding the input sequences with zeros to a common length.

The central part of this RNN structure is two bidirectional gated recurrent unit (GRU) layers: the first returns a sequence of vectors, and the second returns an embedding vector. The GRU is a widely used architecture that helps retain information over a long sequence and makes it possible to extract morphological features from the entire light curve. We then employ two fully connected layers (also called dense layers) to reduce the dimensionality of the embedding vector.

We also apply two dense layers after the input layer for the numerical data, embedding them into a vector. This idea is motivated by the heuristic that a high-dimensional representation may give more weight to the numerical input when concatenated with the high-level features of the RNN part. After the concatenation, a dense layer with softmax output gives the probability of the input belonging to each class.

To mitigate overfitting, we adopt two kinds of regularization: Dropout layers and the so-called "label smoothing" method. Dropout layers are attached to the second GRU layer and the first dense layer. They take effect by randomly dropping neurons and their corresponding connections during training; we use a dropout rate of 0.4, which means dropping 40% of the neurons. Label smoothing replaces the one-hot labels of training samples with soft labels, assigning 1 − α to the true class and α to the other classes, where α is a small number equal to 0.1 in our case. This technique discourages the neural network from becoming over-confident in its classification results.
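The following Keras sketch mirrors this description; layer widths and activations not specified above (e.g., GRU units and dense sizes) are placeholders, not the exact values of our model.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_rnn_multi_input(max_len, n_numerical, n_classes):
    """RNN-based multi-input network: sequence branch plus numerical branch (sketch)."""
    # Sequence branch: zero-padded (delta_t, mag, error) triplets, masked after padding.
    seq_in = layers.Input(shape=(max_len, 3), name="sequence")
    x = layers.Masking(mask_value=0.0)(seq_in)
    x = layers.Bidirectional(layers.GRU(64, return_sequences=True))(x)
    x = layers.Bidirectional(layers.GRU(64))(x)          # embedding vector
    x = layers.Dropout(0.4)(x)
    x = layers.Dense(32, activation="relu")(x)
    x = layers.Dropout(0.4)(x)
    x = layers.Dense(16, activation="relu")(x)

    # Numerical branch: period, amplitude, etc., embedded into a vector.
    num_in = layers.Input(shape=(n_numerical,), name="numerical")
    y = layers.Dense(16, activation="relu")(num_in)
    y = layers.Dense(16, activation="relu")(y)

    z = layers.Concatenate(name="concat")([x, y])
    out = layers.Dense(n_classes, activation="softmax")(z)

    model = models.Model([seq_in, num_in], out)
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
                  loss=tf.keras.losses.CategoricalCrossentropy(label_smoothing=0.1),
                  metrics=["accuracy"])
    return model
```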

4.2. Compound Multi-input Neural Network

Figure 4 shows the architecture of our compound multi-input neural network, which represents one light curve with two different inputs: a sequence and an image. The RNN and numerical input parts are identical to those of the RNN-based multi-input neural network. The image input is a 128 × 128 pixel single-channel image on which we plot a phase-folded light curve. The structure of the CNN part is the same as in Szklenár et al. (2022); we only change the output of its last dense layer to a 32-dimensional vector for proper concatenation with the outputs of the other two parts of the network.

To train this network, we apply a 2-stage procedure. First, we ignore the image input, train an RNN-based multi-input neural network on the same data set, and save the weights of the best model. These weights are then loaded into the compound network layer by layer, and the weights of the RNN part are fixed before training, i.e., the RNN part is set to be untrainable. This 2-stage procedure aims to make the CNN part a supplement for high-level features rather than a redundant structure. The regularization approaches used when training the compound network are the same as those used for the RNN-based network.
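A hedged sketch of this 2-stage transfer is shown below, reusing the hypothetical build_rnn_multi_input constructor above (with its "concat" layer) after it has been trained; the CNN branch is abbreviated here, whereas the actual architecture follows Szklenár et al. (2022).

```python
from tensorflow.keras import layers, models

def build_compound(trained_rnn, n_classes):
    """Compound model: frozen RNN/numerical branches plus a 128x128 image branch (sketch)."""
    # Stage 2: the already-trained RNN part is fixed; only the CNN branch and the
    # new output head are optimized.
    for layer in trained_rnn.layers:
        layer.trainable = False
    features = trained_rnn.get_layer("concat").output   # concatenated RNN + numerical features

    # Abbreviated CNN branch ending in a 32-dimensional vector for concatenation.
    img_in = layers.Input(shape=(128, 128, 1), name="image")
    c = layers.Conv2D(16, 3, activation="relu")(img_in)
    c = layers.MaxPooling2D()(c)
    c = layers.Conv2D(32, 3, activation="relu")(c)
    c = layers.GlobalAveragePooling2D()(c)
    c = layers.Dense(32, activation="relu")(c)

    z = layers.Concatenate()([features, c])
    out = layers.Dense(n_classes, activation="softmax")(z)
    return models.Model(trained_rnn.inputs + [img_in], out)
```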

5. Implementation and Results

We implement our classification methods on a computer with an NVIDIA GeForce RTX 3090 GPU. Every training epoch takes about 1 minute, and the test process takes about 3 minutes. The computation cost of the Gaussian Process augmentation is comparable to that of the entire training and validation process.

5.1. Evaluation Metrics

We adopt the confusion matrix, the balanced accuracy, and the macro F1 score as metrics to evaluate the performance of the imbalanced data classification. To normalize the confusion matrix for better exhibition, we divide each row by the total number of objects per class, so the diagonal values become the Recall of every class. The confusion matrix can also be normalized by columns to show the Precision of each category. The balanced accuracy is defined as the average of the Recall obtained on each class, hence the average of the diagonal values of the normalized confusion matrix. The F1 score of a class is the harmonic mean of its Precision and Recall, and the macro F1 score is their unweighted average over all C classes:

$$F_{1}^{\mathrm{macro}} = \frac{1}{C}\sum_{i=1}^{C}\frac{2\,\mathrm{Precision}_{i}\cdot\mathrm{Recall}_{i}}{\mathrm{Precision}_{i}+\mathrm{Recall}_{i}}. \qquad (7)$$

The higher the macro F1 score, the better the classification result.
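For reference, these metrics and the Recall-view confusion matrix can be computed with scikit-learn, for example:

```python
import numpy as np
from sklearn.metrics import balanced_accuracy_score, confusion_matrix, f1_score

def evaluate(y_true, y_pred, n_classes):
    """Balanced accuracy, macro F1, and the row-normalized (Recall-view) confusion matrix."""
    bal_acc = balanced_accuracy_score(y_true, y_pred)      # mean of per-class Recall
    macro_f1 = f1_score(y_true, y_pred, average="macro")   # mean of per-class F1
    cm = confusion_matrix(y_true, y_pred, labels=range(n_classes)).astype(float)
    cm_recall = cm / cm.sum(axis=1, keepdims=True)         # each row sums to 1
    return bal_acc, macro_f1, cm_recall
```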

5.2. Hyperparameter Setting

Most of the hyperparameters are depicted in Figures 3 and 4, leaving the batch size and the learning rate of the Adam optimizer. These two hyperparameters are always tuned together. We choose a relatively small batch size of 32 to get better generalization performance for the neural network model and a relatively large learning rate of 0.001 to reduce the computation cost. We also try lower learning rates and careful scheduling strategies, finding that the improvement is smaller than the effect of different random initial weights of the neural networks. Considering the ability of the ensemble learning method to integrate weak learners into a strong learner, it is rational to make this trade-off between performance and computation cost.

5.3. Results of the RNN-based Multi-input Neural Network

The training step of the RNN-based model takes ten epochs on each sub-data set, therefore 100 epochs for the entire ensemble learning process. Figure 5 shows the confusion matrix of the classification results on the test data compared with two other methods: one is equal augmentation, i.e., augmenting each category of the training set to the size of the largest category; the other is to augment the small-size classes to a moderate size (1875 in our implementation) and employ class weights during training. The macro F1 scores of the ensemble technique, the equal augmentation, and the augmentation with class weights are 0.71, 0.66, and 0.67, respectively. The performance of the ensemble technique is thus clearly superior to that of equal augmentation and of augmentation with class weights.

Figure 5. The confusion matrices of the ensemble RNN-based model compared to the other two approaches. The matrices are normalized to the Recall view. (a): classification result of the ensemble RNN-based model. (b): result of the equal augmentation approach. (c): result of moderate augmentation with class weights.


5.4. Result of the Compound Multi-input Neural Network

Although training a single RNN-based or CNN-based multi-input neural network does not take much time, the compound neural network is harder to train, requiring 100 epochs on every sub-data set to find the optimal model. Table 2 shows the balanced accuracy of the results given by our two different networks trained on each sub-data set. Classification scores on both the validation and test sets are given. The final ensemble learning result of the compound network is displayed in Figure 6, where we mark the variation of each class's Recall compared to the RNN-based network.

Figure 6. The confusion matrix of the ensemble compound multi-input neural network, marked with the variation of each class's Recall compared to the RNN-based network.


Table 2. The Balanced Accuracy of Models Trained on Each of the Ten Sub-data Sets, Given by the RNN-based and Compound Networks

RNN Val (%)   Compound Val (%)   RNN Test (%)   Compound Test (%)
80.91         84.29              81.90          84.97
82.68         83.49              82.62          84.29
82.19         84.62              83.43          85.10
81.90         82.19              82.78          79.92
82.82         83.54              82.67          82.72
81.57         83.69              80.74          81.25
83.00         85.33              83.92          85.34
84.61         84.97              83.39          84.77
80.74         83.24              78.62          84.30
83.13         83.20              79.73          78.95

Note. Classification scores on both the validation and test sets are given.


The overall balanced accuracy of the compound model improves slightly from 85.7% to 86.2% compared with the RNN-based model, while the macro F1 score increases from 0.71 to 0.75. The relatively large improvement in the macro F1 score derives from the improvement in Precision, which is depicted in Figure 7 by the confusion matrix normalized to the Precision view. The compound model classifies the large-size categories more accurately at the cost of slightly decreasing the Recall of the small-size ones, which relieves the misclassification of the majority of the samples.

Figure 7. The confusion matrices normalized to the Precision view. (a): result of the RNN-based model. (b): result of the compound model. On the imbalanced test set, misclassified samples from the large-size classes can easily dominate the samples predicted as small-size categories.


5.5. 10-fold Cross-test

The performance of a classification algorithm on a data set usually varies with different partitions into training and test sets. To demonstrate the capability of our method reliably, we carry out a 10-fold cross-test. We split the original data set into ten equal-size distinct parts with no overlap. Each part is used as the test set in one training-and-evaluation process, while the others form the training and validation sets; the process is thus repeated ten times. The augmentation and ensemble learning operations are implemented separately in each of the ten runs.
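A schematic of the cross-test loop is given below, using scikit-learn's StratifiedKFold; train_ensemble and evaluate_confusion stand for the (hypothetical) augmentation-plus-ensemble training and evaluation steps described in Sections 3 and 5.1.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

def cross_test(samples, labels, n_folds=10, seed=0):
    """Repeat the augment-train-evaluate process with each fold held out as the test set."""
    skf = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=seed)
    matrices = []
    for train_idx, test_idx in skf.split(samples, labels):
        # Augmentation and ensemble training are re-run from scratch for every fold.
        members = train_ensemble(samples[train_idx], labels[train_idx])
        matrices.append(evaluate_confusion(members, samples[test_idx], labels[test_idx]))
    # Report the element-wise median over the ten Recall-view confusion matrices.
    return np.median(np.stack(matrices), axis=0)
```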

Here we only perform the cross-test for the RNN-based multi-input neural network, since the compound model only slightly improves Recall. Figure 8 depicts the result. Each matrix element is the median of the corresponding values in the ten confusion matrices. The superscript and the subscript indicate the deviations of the second-best and second-worst values, respectively. On the imbalanced test set, the relatively large deviations for small-size classes can derive from the misclassification of only one or two samples.

Figure 8. The 10-fold cross-test result for the RNN-based multi-input neural network. The matrix is normalized to the Recall view. Each matrix element is the median of the corresponding values in the ten confusion matrices. The superscript and the subscript indicate the deviations of the second-best and second-worst values, respectively.


6. Discussion

Considering all types of RR Lyrae stars as a whole, the compound neural network achieves a total Recall of 96.7% and a total Precision of 97.1%. Other periodic variable stars, except rotational stars, rarely contaminate the RR Lyrae star samples. However, some normal RRab stars are misclassified as RR Lyrae stars with the Blazhko effect or ACEP stars. Due to the significant data imbalance, these misclassified samples constitute a large proportion of the predicted samples for these two small-sized categories. Similarly, the misclassified samples from RRc stars greatly affect the purity of RRd stars.

For rotational stars, the compound model achieves a Recall of 66.9% and a Precision of 42.2%. Some of these stars can be confused with the contact and semi-detached binary (Ecl) stars. Moreover, the misclassified rotational stars markedly contaminate the small-size Cep-II class. The Ecl class gets a Recall of 80.5% and a Precision of 94.2%. The misclassified Ecl samples also contaminate the detached binaries (EA), because there is no clear division between these two types. The Recall and Precision of EA stars are 96.1% and 83.3%, respectively.

The long period variable stars (LPV) and the δ-Scuti stars are relatively distinct from the other types, with Recall of 99.5% and 97.7%, respectively. However, they can be polluted by misclassified samples from other large-size classes, which decreases the Precision to 90.1% and 89.6% for LPV and δ-Scuti, respectively.

For both types of Cepheids, the total Recall is 93.3% while the total Precision is 52.5%. The ACEP stars have a Recall of 97.8%, while Cep-II stars have 73.3%. They are mainly confused with each other, but are also dramatically contaminated by the misclassified samples of other large-size classes.

7. Conclusions

In this paper, we present an ensemble learning approach based on data augmentation for classifying periodic variable stars from light curves with deep learning on imbalanced data. We apply Gaussian Process to generate artificial light curves with uncertainties for small-size classes and apply undersampling to large-size classes, setting up balanced sub-data sets of the training set. Training models on these sub-data sets avoids overfitting on the small-size classes, and the ensemble result shows improved performance.

We design two kinds of neural network architectures for the task: the RNN-based multi-input model and the compound model combining RNN and CNN structures. These multi-input models take both the light curve and additional numerical features as input. On the CRTS variable star data, the macro F1 score on the imbalanced test set reaches 0.71 and 0.75 for the RNN-based and compound models, respectively.

Our ensemble learning approach can easily cooperate with different deep learning models since it is a data-level technique. Our attempt to combine CNN and RNN structures implies that using different representations of light curves together in one model can yield higher performance, and this kind of compound neural network architecture is flexible for time series sky surveys with different light curve lengths. The methods we put forward will contribute to better classification of variable sources with time series data in future projects (e.g., LSST), performing the multi-class classification in one step with high performance, and also shed light on imbalanced classification with multimodal data.

Acknowledgments

This paper is funded by the National Natural Science Foundation of China (grant Nos. 12273076, 12203077, 12133001, U1831126 and 11873066), the Science Research Grants from the China Manned Space Project (Nos. CMS-CSST-2021-A04 and CMS-CSST-2021-A06), and the Natural Science Foundation of Hebei Province (No. A2018106014).
