An Intelligent Gestational Diabetes Diagnosis Model Using Deep Stacked Autoencoder

Abstract: Gestational Diabetes Mellitus (GDM) is one of the commonly occurring diseases among women during pregnancy. The Oral Glucose Tolerance Test (OGTT), followed universally in the diagnosis of GDM during early pregnancy, is costly and often ineffective. So there is a need to design an effective and automated GDM diagnosis and classification model. Recent developments in the field of Deep Learning (DL) are useful in diagnosing different diseases. In this view, the current research article presents a new outlier detection with deep stacked autoencoder (OD-DSAE) model for GDM diagnosis and classification. The goal of the proposed OD-DSAE model is to identify mothers with high risk so that they can undergo earlier diagnosis, monitoring, and treatment compared to low-risk women. The presented OD-DSAE model involves three major processes, namely preprocessing, outlier detection, and classification. In the first step, i.e., data preprocessing, there exist three stages, namely format conversion, class labelling, and missing value replacement using the k-nearest neighbors (KNN) model. Outliers are extreme values which vary considerably from other data observations; they might represent variability in measurement, experimental errors, or novelty. A Hierarchical Clustering (HC)-based outlier detection technique is therefore incorporated into the OD-DSAE model, thereby improving classification performance. The proposed model was simulated using Python 3.6.5 on a dataset collected by the researchers themselves. A series of experiments was conducted and the results were investigated under different aspects. The experimental outcomes infer that the OD-DSAE model outperformed the compared methods and achieved a high precision of 96.17%, recall of 98.69%, specificity of 89.50%, accuracy of 96.18%, and F-score of 97.41%.


Introduction
Gestational Diabetes Mellitus (GDM) is characterized by abnormal blood glucose levels during pregnancy [1]. In early gestation, the fasting blood glucose level becomes low and insulin secretion is also reduced gradually. This is followed by a progressive rise in insulin resistance over the subsequent trimesters with a marginal increase in insulin generation, or hyperinsulinemia. Further, insulin resistance is also driven by placental hormones. These pathophysiologic mechanisms, along with pregnancy itself, change the metabolism of the human body and raise postprandial maternal glucose. Pregnancy is a hyperinsulinemic condition which creates an imbalance in glucose regulation when insulin secretion fails to compensate for pregnancy-induced insulin resistance. GDM can be described as a chronic low-grade subclinical inflammation characterized by irregular generation of cytokines and mediators, followed by the initiation of inflammatory signaling pathways. Even though GDM is considered to be a state of insulin resistance, the exact process behind it is still unexplored and challenging. The increased insulin resistance in pregnancy has traditionally been attributed to cortisol and gestational hormones; however, recent studies confirmed the involvement of cytokines as well [2]. Type 2 diabetes is diagnosed at a high rate during the postpartum period among females with previous GDM (pGDM). Further, researchers have found that females with pGDM have high chances of developing type 2 DM within a limited period after pregnancy. Diabetes is an irreversible disease which is mostly accompanied by cardiovascular disease (CVD). In addition, women with GDM are also prone to heart attacks, obesity, hypertension, dyslipidemia, and subclinical atherosclerosis.
Patients with metabolic anomalies and GDM are likely to develop type 2 diabetes as part of the natural course of the disease, which eventually leads to a high risk of CVD.
Early diagnosis and prediction tend to reduce the incidence of GDM and minimize adverse pregnancy outcomes [3]. However, earlier surveys mention that GDM cases are confirmed only within a limited window of time using the Oral Glucose Tolerance Test (OGTT). This leaves an optimal window of intervention for both fetal and placental development. OGTT is mostly recommended during early pregnancy across the globe. However, it is expensive and provides inaccurate results in many cases. GDM manifests during mid-to-late pregnancy [4], so earlier detection is essential to reduce further health risks. Establishing a simple prediction model, using previous medical data, for women at high risk of GDM helps in identifying mothers who require earlier diagnosis and treatment, thus obviating OGTTs. The currently available prediction models for GDM were built using traditional regression models. In this scenario, Machine Learning (ML), a data-analysis paradigm that develops methods to predict outcomes by 'learning' from data, has been emphasized as a competitive alternative to regression analysis. Further, ML is capable of performing better than regression in capturing nonlinearities as well as complicated interactions between predictor attributes. Therefore, the current study employed ML methods for GDM prediction rather than models related to Logistic Regression (LR).

Outlier Detection Problem
Outlier or anomaly prediction is one of the significant objectives in ML and data mining methods. In the literature [5], outlier prediction is considered to be the problem of identifying patterns in data that do not conform to expected behavior. Outlier prediction is also applied in intrusion detection, credit fraud investigation, video tracking, climate detection, identification of cybercrimes in electronic commerce, and so on [6]. Additionally, several types of outliers exist, such as point outliers, contextual outliers, and collective outliers. The prediction of point outliers has been employed in various other domains too. A dataset has multiple points, and a point is referred to as an outlier when it differs from a massive number of other points. In order to predict outliers, several effective models have been applied, namely classification modules, nearest-neighbor schemes, clustering, statistical approaches, distance-based approaches, and so forth. In classification-based outlier prediction, there are two classes of approach: multi-class and single-class anomaly prediction. Multi-class classifiers first adopt a training dataset with labeled normal-class points. Next, a learner applying supervised learning trains a method on the labeled data. The classifier is generally applied to differentiate the normal class from the remaining classes. Alternatively, in single-class outlier prediction, only a single normal class contributes to learning and predicting the boundary of the normal class. When a test point falls outside the boundary, it is considered an outlier.

Paper Contribution
The contribution of this research work is as follows. This research article presents a new outlier detection with deep stacked autoencoder (OD-DSAE) model for GDM diagnosis and classification. The presented OD-DSAE model aims at identifying high-risk mothers who need early diagnosis, monitoring, and treatment compared to low-risk women. The presented OD-DSAE model performs GDM diagnosis via three sub-processes: preprocessing, outlier detection, and classification. Primarily, data preprocessing occurs in three stages, namely format conversion, class labelling, and missing value replacement using the k-nearest neighbors (KNN) model. Next, a Hierarchical Clustering (HC)-based outlier detection technique is incorporated into the OD-DSAE model, due to which classification performance is improved. A detailed simulation analysis was conducted to verify the superior performance of the presented OD-DSAE model.

Related Works
This section reviews several diagnosis models used in the detection of GDM. Xiong et al. [7] aimed at developing a risk detection method for the first 19 weeks of gestation using numerous potential GDM predictors, and applied Support Vector Machine (SVM) as well as Light Gradient Boosting Machine (LightGBM). Zheng et al. [8] developed a simple method to predict GDM among Chinese women during their early pregnancy with the help of biochemical markers and an ML model. Shen et al. [9] examined the feasibility of the best AI method for GDM examination in a setting with limited clinical equipment and clinicians. The study also developed an app based on the AI scheme.
In the literature [10], the detection of GDM with diverse ML methods was implemented on the PIMA dataset. The accuracy of the diverse ML methods was verified with several measures. The importance of the ML techniques was depicted using the confusion matrix, Receiver Operating Characteristic (ROC) curve, and AUC values on the PIMA diabetes dataset. Srivastava et al. [11] proposed a statistical method for the evaluation of Gestational Diabetes Mellitus using Microsoft Azure AI services, an ML Studio that yields excellent performance while its algorithms work on a drag-and-drop principle. The classifier used in this process to detect the existence of GDM relied on aspects occurring during the earlier phases of pregnancy. A Cost-Sensitive Hybrid Model (CSHM) and five traditional ML models were applied in the development of prediction methods [12] to capture the upcoming threats of GDM in temporally-collected EHRs. After the completion of data cleaning, a subset of records was gathered into a dataset.
In an earlier study [13], a Radial Basis Function Network (RBFNetwork) was designed, evaluated for performance, and compared with an ANN model to find feasible cases of GDM that produce harmful effects on pregnant women and the fetus. In Ye et al. [14], parameters were trained in various ML and classic LR methods. In Du et al. [15], three different classifiers were used to predict the target in the case of future infection. The prediction accuracy guides the physician to make a better decision and pursue regular prevention. Finally, it was identified that the DenseNet method predicts the target as gestational diabetes with minimum flexibility.
The major limitation of predominant classification models is that these frameworks depend upon exact labels for the normal class, which are highly complicated to obtain in real-time scenarios. Nearest neighbor-related outlier prediction schemes assume that normal points belong to dense regions whereas outliers come under sparse regions. The Local Outlier Factor (LOF) mechanism is one of the most well-established models to date. The basic principle of LOF depends on local density estimation: every point is assigned a score equal to the ratio of the mean local density of the k-nearest neighbors (k-NN) of the point to the local density of the point itself. The major constraint of this model is its O(n²) complexity. Clustering-based outlier prediction models make use of clustering technologies to group the data into clusters; the points that do not fall into any cluster are named 'outliers'.
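As a concrete illustration of the nearest-neighbour idea, a point can be scored by its mean distance to its k nearest neighbours. This is a simplified sketch, not the full LOF (which additionally uses local reachability densities); the brute-force pairwise loop makes the O(n²) cost explicit. Function and variable names are our own:

```python
import math

def knn_outlier_scores(points, k=2):
    """Score each point by its mean distance to its k nearest neighbours.

    Points in dense regions receive small scores; isolated points (outliers)
    receive large ones. The nested loop over all pairs is what gives
    nearest-neighbour methods their O(n^2) complexity.
    """
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores
```

For example, on three tightly packed points and one far-away point, the far-away point receives the largest score and would be flagged as the outlier.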
The main aim of clustering is to identify the clusters; thus, the outliers are a by-product of the clustering operation and are not optimized for directly. A further drawback of this strategy is the O(n²) complexity of the clustering approaches. Statistical outlier prediction approaches assume that normal data points occur in high-probability regions of a stochastic model while anomalies exist in low-probability regions. Generally, the statistical models are suited to Gaussian distributions, using a mixture of parametric statistical distributions fitted to the data together with statistical inference tests to assess unknown samples. The major disadvantage of this approach is its sensitivity to the distribution of data points; hence, for high-dimensional data, the distributional hypothesis becomes false. In distance-based outlier prediction, a point is assumed to be an outlier when it has fewer neighboring points within a given distance than a defined threshold value [16].

The Proposed OD-DSAE Model
Fig. 1 shows the working principle of the OD-DSAE model. The presented OD-DSAE model involves three major processes, namely preprocessing, outlier detection, and classification. Initially, data preprocessing occurs in three stages, namely format conversion, class labeling, and missing value replacement. In addition, a Hierarchical Clustering (HC)-based outlier detection technique is employed to remove the unwanted instances present in the dataset. Outliers are extreme values that vary considerably from other data observations; they might represent variability in measurement, experimental errors, or novelty. Hence, the outlier detection technique is incorporated into the OD-DSAE model, whereby classification performance can be improved. Finally, DSAE is applied as a classification model to determine the appropriate class label for GDM.

Data Preprocessing
At this stage, the input medical data is preprocessed to improve data quality in three ways. Firstly, data conversion takes place, when the input data in .xls format is converted into .csv format. Secondly, class labeling is carried out, when the data instances are assigned their corresponding class labels. Thirdly, missing value replacement is performed using the KNN technique. KNN is a simple and efficient technique that stores all the existing cases while, at the same time, categorizing new cases based on a similarity measure. In this scenario, the KNN model is deployed as a tool for data imputation. The principle behind the KNN method is as follows [17]:

d(x, y) = sqrt( Σ_{j=1}^{s} (x_aj − y_bj)² )   (1)

where d(x, y) implies the Euclidean distance, j indexes the data attributes with j = 1, 2, 3, . . ., s, s refers to the data dimension, x_aj defines the value of attribute j in the record with missing data, and y_bj represents the value of attribute j in a record with no missing data. According to the distances obtained, the k records with the lowest Euclidean distance (with parameter k) supply the imputed value for the missing data. The value of imputation is determined using the Weighted Mean Estimation technique, as given in Eq. (2):

x̂_j = ( Σ_{k=1}^{K} w_k v_k ) / ( Σ_{k=1}^{K} w_k )   (2)

where x̂_j denotes the Weighted Mean Estimate, K implies the number of neighbours applied, with k = 5, w_k refers to the weight of the k-th nearest-neighbour observation, and v_k indicates the value of the attribute with missing data taken from the complete record of neighbour k. In this model, the weight w_k is measured using Eq. (3):

w_k = 1 / d(x, y)   (3)

where d(x, y) is the Euclidean distance to neighbour k.
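Eqs. (1)-(3) can be sketched in Python as follows. This is an illustrative helper (function and variable names are our own, not from the original implementation) that imputes one missing attribute of a record from its k nearest complete records using inverse-distance weights:

```python
import math

def knn_impute(rows, target_idx, missing_col, k=5):
    """Impute rows[target_idx][missing_col] via KNN weighted-mean estimation.

    Distance (Eq. 1) is computed over the attributes observed in the target
    row; weights follow Eq. 3 (w_k = 1/d); the imputed value is the weighted
    mean of the k nearest complete records (Eq. 2). Missing cells are None.
    """
    target = rows[target_idx]
    cols = [j for j, v in enumerate(target) if v is not None and j != missing_col]
    candidates = []
    for i, r in enumerate(rows):
        if i == target_idx or r[missing_col] is None:
            continue
        d = math.sqrt(sum((target[j] - r[j]) ** 2 for j in cols))  # Eq. (1)
        candidates.append((d, r[missing_col]))
    candidates.sort(key=lambda t: t[0])
    nearest = candidates[:k]
    weights = [1.0 / (d + 1e-9) for d, _ in nearest]  # Eq. (3); guard d == 0
    values = [v for _, v in nearest]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)  # Eq. (2)
```

For instance, a record whose two closest neighbours carry values 3.0 and 2.9 in the missing attribute receives an imputed value close to 2.95, while a distant record contributes little or nothing.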

HC-Based Outlier Detection Technique
Once the data is preprocessed, the HC-based outlier detection technique is executed to remove the outliers that exist in the data. The proposed HC algorithm operates in a bottom-up manner. Here, the clusters are merged until a single cluster is attained, and the entire procedure is iterated to determine the optimal cluster count that fits the data [18]. It begins by considering every instance as a single cluster and consequently necessitates |X| − 1 merging steps. To reduce the number of merging steps, the HC algorithm normally utilizes a low-complexity initial clustering to generate N clusters, N ≪ |X|. The initial clustering produces small-sized clusters and considerably reduces the computational complexity of the HC technique. Fig. 2 shows the processes involved in HC. In the HC model, the clusters are merged in such a way that the quadratic mutual information is maximized. At this point, two approaches are available for merging: agglomerative clustering and split-and-merge clustering.

Agglomerative Clustering
In this method, the change in quadratic mutual information after merging a pair of clusters is estimated in order to identify the optimal pair: the pair which most increases the quadratic mutual information is selected. The clusters that emerge have minimum distortion, which is applied in the optimization of the distortion-rate function. Consider that the clusters X_A and X_B are combined to generate X_C = X_A ∪ X_B, while the change in quadratic mutual information, ΔĨ_{A,B}, is determined as in Eq. (4), where Ĩ^(t)(X; X̄) implies the quadratic mutual information at step t. The closed-form Eq. (4) yields the optimal pair without actually merging every candidate pair and re-evaluating the quadratic mutual information. On the other hand, the maximum ΔĨ^(t+1)_{A,B} at every level of the hierarchy estimates the actual cluster count in the data. Algorithm 1 gives the pseudocode for the agglomerative clustering method.
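The bottom-up merging loop can be sketched as follows. Note one deliberate simplification: the paper's merge criterion is the change in quadratic mutual information (Eq. 4), which we replace here with a simple within-cluster spread so the sketch stays self-contained; names and the 1-D data are illustrative only:

```python
def agglomerate(points, n_clusters):
    """Bottom-up agglomerative clustering sketch (1-D points).

    Every point starts as its own cluster; at each step the pair whose union
    has the smallest spread (within-cluster sum of squared deviations) is
    merged -- a stand-in for the quadratic-mutual-information criterion of
    Eq. (4) -- until n_clusters remain.
    """
    clusters = [[p] for p in points]

    def spread(c):
        m = sum(c) / len(c)
        return sum((x - m) ** 2 for x in c)

    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                cost = spread(clusters[a] + clusters[b])
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge the best pair
        del clusters[b]
    return clusters
```

On well-separated data the procedure recovers the natural grouping, e.g. three values near 1 and two values near 10 end up in two clusters of sizes 3 and 2.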

Split and Merge Clustering
In contrast to agglomerative clustering, the split-and-merge clustering technique detects the cluster in a hierarchy that should be eliminated: the cluster which exerts the worst impact on the quadratic mutual information. This means that when such a cluster is eliminated, the mutual information is enhanced. Assume the cluster X_A has the worst impact on the mutual information; the corresponding change in quadratic mutual information, ΔĨ_A, is then determined. The instances of the removed cluster are allocated to the remaining clusters of the clustered space based on minimum Euclidean distance, where the closest samples are allocated first. This process is repeated until one remaining cluster is reached. Then, based on the drastic changes in quadratic mutual information across the hierarchies, ΔĨ^(t+1)_A, the actual number of clusters is estimated. The pseudocode for the split-and-merge clustering model is given in Algorithm 2.
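One elimination step of this procedure can be sketched as follows. Again a simplification is made deliberately: the "worst" cluster is chosen here by a trivial stand-in criterion (smallest cluster) rather than by its quadratic-mutual-information impact, and the data is 1-D; names are illustrative:

```python
def split_and_merge_step(clusters):
    """One split-and-merge elimination step (1-D sketch).

    Drops the cluster judged worst (here: the smallest, as a stand-in for the
    worst quadratic-mutual-information impact) and reassigns its instances to
    the nearest remaining cluster by Euclidean distance, closest samples first.
    """
    worst = min(range(len(clusters)), key=lambda i: len(clusters[i]))
    orphans = clusters.pop(worst)

    def centroid(c):
        return sum(c) / len(c)

    # reassign closest samples first, as described above
    for x in sorted(orphans, key=lambda x: min(abs(x - centroid(c)) for c in clusters)):
        target = min(clusters, key=lambda c: abs(x - centroid(c)))
        target.append(x)
    return clusters
```

Repeating this step until one cluster remains, and tracking the drop in the chosen criterion at each level, gives the hierarchy from which the cluster count is estimated.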

DSAE-Based Classification Technique
The DSAE model is executed to allocate the class labels of the input data, i.e., presence or absence of GDM. Basically, the autoencoder (AE) is a type of unsupervised learning model composed of three layers, namely input, hidden, and output layers, as depicted in Fig. 3. The AE training operation comprises two parts, namely the encoder and the decoder.

Figure 3: Structure of AE
Generally, the encoder is applied to map the input data to a hidden representation, while the decoder reconstructs the input data from the hidden representation. Given an unlabeled input dataset {x_n}, n = 1, . . ., N, where x_n ∈ R^{m×1}, h_n indicates the hidden encoder vector estimated from x_n, and x̂_n denotes the decoder vector of the output layer [19]. The encoding operation is given below:

h_n = f(W_1 x_n + b_1)

where f implies the encoding function, W_1 refers to the weight matrix of the encoder, and b_1 defines a bias vector. The decoder operation is described as follows:

x̂_n = g(W_2 h_n + b_2)

where g represents the decoding function, W_2 implies the weight matrix of the decoder, and b_2 indicates a bias vector. The parameter set of the AE is optimized to reduce the reconstruction error, i.e., to minimize Σ_n L(x_n, x̂_n), where L implies the loss function L(x, x̂) = ||x − x̂||². As depicted in Fig. 4 [20], the SAE architecture stacks n AEs into n hidden layers using an unsupervised layer-wise learning method and is fine-tuned with a supervised model.
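A single forward pass through the encoder and decoder, with the squared-error reconstruction loss, can be sketched in NumPy as follows (the sigmoid choice for f and g and all names are illustrative; the paper does not fix the activation here):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def autoencoder_forward(x, W1, b1, W2, b2):
    """Encoder h = f(W1 x + b1), decoder x_hat = g(W2 h + b2),
    reconstruction loss L(x, x_hat) = ||x - x_hat||^2."""
    h = sigmoid(W1 @ x + b1)       # hidden representation h_n
    x_hat = sigmoid(W2 @ h + b2)   # reconstruction of the input
    loss = float(np.sum((x - x_hat) ** 2))
    return h, x_hat, loss

m, d = 6, 3                        # input size and hidden size (illustrative)
W1, b1 = rng.normal(size=(d, m)), np.zeros(d)
W2, b2 = rng.normal(size=(m, d)), np.zeros(m)
x = rng.random(m)
h, x_hat, loss = autoencoder_forward(x, W1, b1, W2, b2)
```

Training would adjust W_1, b_1, W_2, b_2 by backpropagation to drive this loss down; here only the forward computation of the two equations is shown.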

Figure 4: Structure of SAE
Thus, SAE-based models can be trained using the three steps given herewith: • The first AE is trained with the help of the input data to obtain the learned feature vector. • The feature vector of the previous layer is applied as the input of the next layer, and this is repeated until the training process is completed. • Once the hidden layers are trained, the Backpropagation (BP) algorithm is applied to minimize the cost function and update the weights with the labelled training set, accomplishing the fine-tuning process.
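The first two steps, greedy layer-wise pretraining, can be sketched as follows. For brevity the sketch uses linear AEs trained by plain gradient descent (a simplified stand-in for the sigmoid AEs and for the supervised fine-tuning step, which is omitted); all names are our own:

```python
import numpy as np

def train_linear_ae(X, hidden, lr=0.01, epochs=200, seed=0):
    """Train one linear autoencoder on ||X - X_hat||^2 by gradient descent
    and return the learned encoder weights (a simplified stand-in for a
    sigmoid AE)."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W1 = rng.normal(scale=0.1, size=(m, hidden))
    W2 = rng.normal(scale=0.1, size=(hidden, m))
    for _ in range(epochs):
        H = X @ W1                          # encode
        X_hat = H @ W2                      # decode
        err = X_hat - X
        W2 -= lr * (H.T @ err) / n          # gradient of the squared error
        W1 -= lr * (X.T @ (err @ W2.T)) / n
    return W1

def stacked_features(X, hidden_sizes):
    """Greedy layer-wise pretraining: each AE is trained on the previous
    layer's features, as in the first two SAE steps above."""
    feats = X
    for h in hidden_sizes:
        W = train_linear_ae(feats, h)
        feats = feats @ W                   # this layer's output feeds the next
    return feats

rng = np.random.default_rng(1)
X = rng.random((50, 8))                     # 50 samples, 8 features (toy data)
Z = stacked_features(X, [5, 3])             # two stacked hidden layers
```

After this unsupervised stage, the stacked encoder weights would initialize the classifier, which BP then fine-tunes with the labelled data (step three).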

Dropout
Dropout is an efficient principle that limits overfitting during NN training. Most overfitting issues exist when the training set is small, which leads to low accuracy on the test set. Dropout randomly deactivates neurons within a hidden layer during the training process; however, the weights of the neurons are maintained. Moreover, dropout sets the output of the dropped hidden neurons to 0, so those neurons do not take part in forward propagation. Researchers have demonstrated the effect of dropout in reducing overfitting on small training sets. Here, it is employed to improve the feature extraction and classification accuracy of the SAE during GDM diagnosis.
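A minimal sketch of a dropout layer is given below. It uses the common "inverted dropout" convention (survivors are rescaled at training time so no change is needed at test time); this convention and all names are our assumptions, not taken from the paper:

```python
import numpy as np

def dropout(h, p, rng, training=True):
    """Inverted dropout: during training, zero each hidden activation with
    probability p (dropped neurons output 0 and take no part in forward
    propagation) and rescale the survivors by 1/(1-p); at test time the
    layer is the identity."""
    if not training or p == 0.0:
        return h
    mask = rng.random(h.shape) >= p
    return h * mask / (1.0 - p)

rng = np.random.default_rng(0)
h = np.ones(1000)
out = dropout(h, 0.5, rng)   # roughly half the activations become 0
```

Because the weights themselves are untouched, the same network is used at test time with dropout disabled.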

ReLU Function
In the case of classical activation functions, such as the sigmoid and hyperbolic tangent functions, the gradients shrink rapidly as the training error is propagated back to earlier layers. The Rectified Linear Unit (ReLU) has attracted much attention from developers in recent times since its gradient does not shrink with the independent variable. Thus, a system with ReLU is free from the vanishing-gradient problem. The ReLU function can be represented as:

ReLU(x) = max(0, x)
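The function is a one-liner in code; the key property is that its derivative is exactly 1 for every active (positive) unit, so gradients pass through unchanged:

```python
import numpy as np

def relu(x):
    """ReLU(x) = max(0, x): passes positive inputs unchanged and zeros the
    rest; the gradient is 1 for active units, so it does not shrink across
    layers as sigmoid/tanh gradients do."""
    return np.maximum(0, x)
```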

Performance Validation
The presented model was simulated in Python 3.6.5 with additional packages such as tensorflow-gpu==1.14.0, pyqt5==5.14, pandas, scikit-learn, matplotlib, prettytable, seaborn, tqdm, numpy==1.16.0, and h5py==2.7.0. Snapshots of the results yielded during the simulation process are shown in the appendix. The presented OD-DSAE model was experimentally validated using the GDM dataset constructed by the researchers themselves. It has a total of 3525 instances with 15 features. In addition, a set of two class labels exists in the dataset: 2153 instances belong to class 0 and 1372 instances belong to class 1. Tab. 1 shows the information related to the dataset. Fig. 5 shows the frequency distribution of the attributes that exist in the GDM dataset. The obtained values signify that the Voted Perceptron model accomplished the least classification performance with an accuracy of 65.1% and F-score of 78.86%. Afterward, the DT model showcased a slightly higher accuracy of 73.82% and F-score of 80.19%. Simultaneously, the Logit Boost model yielded an even better accuracy of 74.08% and F-score of 80.96%. Along with that, the NN model produced a moderate diagnostic outcome with an accuracy of 75.39% and F-score of 81.48%. Moreover, the LR model attained a higher accuracy of 77.21% and F-score of 83.41%. However, these models failed to outperform the presented DSAE and OD-DSAE models. At last, the presented DSAE model obtained an effective accuracy of 89.16% and F-score of 91.41%, whereas the OD-DSAE model accomplished an even higher accuracy of 96.18% and F-score of 97.41%. Fig. 10 illustrates the detailed results of the comparative analysis of the OD-DSAE model on the applied GDM dataset [21]. The resultant values demonstrate that the OD-DSAE model outperformed all other methods. The existing KNN model produced an inferior classifier outcome over the other methods by accomplishing the least accuracy of 67.6%.
At the same time, the NB, ELM, and SGD models obtained moderate accuracy values of 74.9%, 75.72%, and 76.6% respectively. Next, the MLP model exhibited a somewhat better accuracy of 81.9%. Eventually, the hybrid and J48 (pruned) models produced reasonable and close accuracy values of 86.6% and 89.93% respectively. Besides, the AMMLP and HPM models produced reasonable outcomes with accuracies of 89.93% and 92.38%. Concurrently, the K-means + LR model presented a near-optimal accuracy value of 95.42%. However, the presented OD-DSAE model showcased its supremacy and achieved the maximum accuracy of 96.18%. From the above-discussed tables and figures, it is evident that the presented OD-DSAE model yields effective diagnostic outcomes over the compared methods. Therefore, it can be employed as an appropriate tool for the diagnosis and classification of GDM.

Conclusion
The current research work presented a new DL-based GDM diagnosis and classification model, i.e., OD-DSAE. The goal of the proposed OD-DSAE model is to identify high-risk mothers who need early diagnosis, monitoring, and treatment compared to low-risk mothers. The presented OD-DSAE model has three major processes, namely preprocessing, outlier detection, and classification. At first, data preprocessing was performed in three stages, namely format conversion, class labeling, and missing value replacement. The presented model made use of the HC-based outlier detection technique to remove the unwanted instances from the dataset. Finally, DSAE was applied as a classification model to determine the appropriate class label for GDM. A detailed simulation analysis was performed on our dataset and the results were investigated under different aspects. The presented OD-DSAE model outperformed the other models and achieved a precision of 96.17%, recall of 98.69%, specificity of 89.50%, accuracy of 96.18%, and F-score of 97.41%. These experimental outcomes infer that the OD-DSAE model is a promising candidate for diagnostic applications. As part of future work, the classification performance can be further enhanced using feature selection models.

Funding Statement:
The authors received no specific funding for this study.

Conflicts of Interest:
The authors declare that they have no conflicts of interest to report regarding the present study.