Double decoupled network for imbalanced obstetric intelligent diagnosis

Abstract: Electronic Medical Records (EMRs) are the data basis of intelligent diagnosis. The diagnostic results of an EMR cover multiple diseases, including normal diagnoses, pathological diagnoses and complications, so intelligent diagnosis can be treated as a multi-label classification problem. The distribution of diagnostic results in EMRs is imbalanced, and the diagnostic results within one EMR are highly coupled. Traditional rebalancing methods do not function effectively on highly coupled imbalanced datasets. This paper proposes a Double Decoupled Network (DDN) based intelligent diagnosis model, which decouples representation learning and classifier learning. In the representation learning stage, a Convolutional Neural Network (CNN) is used to learn the original features of the data. In the classifier learning stage, a Decoupled and Rebalancing highly Imbalanced Labels (DRIL) algorithm is proposed to decouple the highly coupled diagnostic results and rebalance the dataset, and the balanced dataset is then used to train the classifier. This paper evaluates the proposed DDN on a Chinese Obstetric EMR (COEMR) dataset, and verifies the effectiveness and universality of the model on two benchmark multi-label text classification datasets, the Arxiv Academic Papers Dataset (AAPD) and Reuters Corpus Volume 1 (RCV1), demonstrating the effectiveness of the proposed method on imbalanced obstetric EMRs. The accuracy of the DDN model on the COEMR, AAPD and RCV1 datasets is 84.17, 86.35 and 93.87% respectively, which is higher than the current best experimental results.


Introduction
With the implementation of China's three-child policy, obstetric clinical research faces unprecedented challenges, and the number of elderly parturient women is increasing rapidly. The incidence rate of obstetric and gynecological diseases in elderly mothers is significantly higher than in pregnant women of optimal childbearing age [1]. Electronic Medical Records (EMRs) are the most detailed and direct records of clinical medical activities. The clinical diagnosis process of doctors can be regarded as judging the probability of suffering from a certain disease according to the clinical manifestations and examination results of patients. An obstetric EMR usually contains multiple diagnostic results; that is, a patient may be diagnosed with both "gestational diabetes mellitus" and "gestational hypertension", and the diagnostic results are strongly coupled. If an EMR is regarded as one sample, each sample can belong to multiple categories. Therefore, the intelligent diagnosis problem can be regarded as a multi-label classification problem in machine learning, in which the multiple diagnostic results in an EMR correspond to different labels [2].
However, the data distribution of EMRs is often imbalanced, and the number of samples of rare diseases is far smaller than that of common diseases [3]. The imbalanced distribution of a dataset degrades the performance of traditional classification algorithms [4], which tend to treat minority classes as noise or outliers and ignore them in classification [5]. For imbalanced EMRs, the cost of a false negative is much higher than that of a false positive. For example, suppose that among 100 EMRs, 99 results are normal and one indicates cancer. If a traditional classification algorithm is applied directly to such data, the diagnostic results of every EMR will be predicted as normal. Although the accuracy is as high as 99%, the most critical cancer case is missed.
In neural networks, the features of the input samples have a great influence on the classifier. Rebalancing methods work because they update the weights of the classifier and significantly improve the learning ability of deep network classifiers, but they damage the feature learning ability of the deep network [6,7]. In addition, multi-label rebalancing methods must consider the coupling of high-frequency and low-frequency diagnostic results within an EMR: removing an EMR containing high-frequency diagnostic results also means losing its low-frequency diagnostic results, and cloning an EMR containing low-frequency diagnostic results to add new instances also increases the frequency of the high-frequency diagnostic results it contains.
In recent years, intelligent diagnosis has become a research focus. Yin et al. [8] proposed a method to extract signal features from heart rate variability signals and classify patients' states using a long short-term memory network. Yan et al. [9] collected 1880 endoscopic images and developed a Gastric Intestinal Metaplasia (GIM) system from these images using a modified convolutional neural network algorithm. Wang et al. [10] proposed a Patch Shuffle stochastic pooling neural network, which improved the recognition of Corona Virus Disease 2019 (COVID-19) infection from chest CT (CCT) images and helps radiologists diagnose COVID-19 cases more quickly and accurately.
In addition to image-based intelligent diagnosis, text-based intelligent diagnosis is also a research focus. Rajkomar et al. [11] showed that, among methods using EMRs, deep learning methods outperform the most advanced statistical prediction models. Maxwell et al. [12] used physical examination data to predict possible chronic diseases such as diabetes, hypertension and fatty liver. Yang et al. [13] proposed a Convolutional Neural Network (CNN) based auxiliary diagnosis method, which learns high-level semantic understanding from EMRs through self-learning and outputs the prediction probability of common diseases including hypertension and diabetes. Liang et al. [14] proposed a system framework based on pediatric EMRs, which integrates the medical knowledge in pediatric EMRs for intelligent diagnosis.
In current research, the imbalanced distribution of datasets is an important factor limiting the performance of intelligent diagnosis. The precision of low-frequency disease diagnosis is too low, which reduces the practicability of diagnosis models. Existing research on imbalanced data can be divided into data-level methods and algorithm-level methods.
Data-level methods transform the original dataset into a relatively balanced dataset during the data preparation stage. Liu et al. [15] proposed an algorithm based on information granulation, which assembles the data of majority classes into granules to balance the class proportions and uses prostate cancer data to predict the survival rate of patients. Huang et al. [16] proposed a random balanced sampling algorithm based on association rule selection and verified its performance on a private diabetes EMR dataset.
Unlike data-level methods, algorithm-level methods do not change the distribution of the training data; instead, they increase the importance of minority classes in the learning and decision-making process [17]. Li et al. [18] proposed dice loss to increase the weight of difficult samples and reduce the weight of simple negative samples. For high-dimensional imbalanced text data, researchers have found that selecting features that help identify minority classes can effectively handle imbalance. Yang et al. [19] proposed a text feature selection method based on relation scores: by calculating the relation score of each feature and category, the relation scores of minority-class features are increased and the imbalance degree between categories in the dataset is reduced.
At present, intelligent diagnosis research based on EMRs mostly targets a single disease and does not consider multiple complications and other diagnostic results. In addition, the existence of rare diseases makes EMR datasets imbalanced, and the diagnostic performance of existing models needs to be further improved.
This paper proposes a Double Decoupled Network (DDN) to improve the performance of intelligent diagnosis based on imbalanced datasets. Our main contributions are summarized as follows: 1) DDN is proposed to decouple representation learning from classifier learning and to decouple highly coupled diagnostic results. 2) In the classifier learning stage, a Decoupled and Rebalancing highly Imbalanced Labels (DRIL) algorithm is proposed to decouple the highly coupled diagnostic results and rebalance the dataset. 3) Experiments on a real Chinese Obstetric EMR (COEMR) dataset and two public datasets show that the DDN method outperforms the comparison methods.

Materials and methods
Inspired by [6], we first analyze the performance of rebalancing strategies in neural networks. Second, we discuss rebalancing strategies on the highly coupled diagnostic results of the COEMR dataset. This paper proposes DDN to decouple representation learning from classifier learning and to decouple highly coupled diagnostic results.

Overall framework
In order to solve the problem of highly coupled diagnostic results and obtain better features of the input samples, an intelligent diagnosis model based on a double decoupled network is proposed in this section. The overall architecture of DDN is shown in Figure 1. First, DDN decouples representation learning from classifier learning. In the representation learning stage, DDN uses CNN to learn the original features of the COEMR dataset $D = \{x_1, x_2, \dots, x_n\}$ and then fixes the parameters of representation learning, where $x_i$ represents a sample in the dataset. The input text is mapped to a sequence of embedding vectors. The embedded word vectors pass through a convolutional layer and are transformed with a non-linear activation function to capture indicative information. Different types of information useful for prediction are selected in the pooling layer, which chooses the maximum value from the feature map of each filter. Finally, fully connected layers are used to integrate the disease-discriminative information from the convolutional and pooling layers. In the classifier learning stage, the DRIL algorithm is proposed to decouple the highly coupled diagnostic results and rebalance the dataset into $D' = \{x'_1, x'_2, \dots, x'_m\}$, which is then used to train the classifier. The classifier consists of a fully connected layer and a Softmax function. Both stages use the same CNN network structure and share all weights except for the last fully connected layer.
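The representation-learning pipeline just described (embedding lookup, multi-width convolutions, non-linear activation, max-over-time pooling, fully connected output) can be sketched as a minimal forward pass. The sketch below is an illustration, not the trained model: the weights are random, and the vocabulary size, embedding dimension and class count are placeholder assumptions; only the filter widths (2, 3, 4) and the 25 filters per width follow the experimental setup.

```python
import random

random.seed(0)  # reproducible toy weights

def text_cnn_forward(token_ids, vocab_size=100, embed_dim=8,
                     filter_widths=(2, 3, 4), filters_per_width=25,
                     num_classes=5):
    """Toy forward pass: embedding lookup -> 1-D convolutions of several
    widths -> ReLU -> max-over-time pooling -> fully connected layer."""
    # Randomly initialised parameters stand in for trained weights.
    embed = [[random.gauss(0, 0.1) for _ in range(embed_dim)]
             for _ in range(vocab_size)]
    x = [embed[t] for t in token_ids]                 # (seq_len, embed_dim)

    pooled = []                                       # one value per filter
    for w in filter_widths:
        for _ in range(filters_per_width):
            kernel = [[random.gauss(0, 0.1) for _ in range(embed_dim)]
                      for _ in range(w)]
            feats = []
            for i in range(len(x) - w + 1):           # slide over windows
                s = sum(kernel[j][d] * x[i + j][d]
                        for j in range(w) for d in range(embed_dim))
                feats.append(max(0.0, s))             # ReLU activation
            pooled.append(max(feats))                 # max-over-time pooling
    # Fully connected layer integrating the pooled feature vector.
    fc = [[random.gauss(0, 0.1) for _ in range(len(pooled))]
          for _ in range(num_classes)]
    return [sum(row[k] * pooled[k] for k in range(len(pooled))) for row in fc]

logits = text_cnn_forward(list(range(10)))
```

The pooled vector has 3 × 25 = 75 entries regardless of input length, which is what lets the fully connected classifier head be retrained separately in the second stage.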

Representation and classifier learning decoupling module
Two methods are commonly adopted to deal with imbalanced data: resampling the samples and reweighting the sample loss in mini-batches. In order to explore the working mechanism of rebalancing methods, we divide the training process of a neural network into two stages, namely representation learning and classifier learning. Specifically, in the first stage, we use the common method (Cross Entropy, CE) or rebalancing methods (Re-Sampling/Re-Weighting, RS/RW) to train the neural network and obtain the corresponding feature extractor. Then we fix the parameters of the feature extractor. In the second stage, the classifier is retrained with the common method or the rebalancing methods. In this section, representation learning and classifier learning are compared across the different training methods. Figure 2 shows the precision of different methods on the COEMR and Arxiv Academic Papers Dataset (AAPD) datasets. 1) CE: the traditional cross entropy loss is used to train the network on the original imbalanced data.
2) RS: in this section, a class-balanced resampling method is used to ensure that each class has the same probability of appearing in each batch. The sampling probability is calculated as Eq (1).

$$p_j = \frac{1}{C} \quad (1)$$

where $p_j$ is the probability of sampling an instance of class $j$ and C is the number of all labels in the training set.
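As a sketch of this class-balanced resampling: each batch slot first draws a class uniformly (probability 1/C per class, matching Eq (1)) and then draws an instance uniformly within that class, so rare classes appear as often as common ones in expectation. The data layout (a dict from class label to instance list) is an assumption for illustration.

```python
import random

def class_balanced_sample(samples_by_class, batch_size, rng=random.Random(0)):
    """Class-balanced resampling: every class is drawn with the same
    probability 1/C, then one instance is drawn uniformly from it."""
    classes = list(samples_by_class)      # C classes
    batch = []
    for _ in range(batch_size):
        c = rng.choice(classes)           # p_j = 1 / C
        batch.append(rng.choice(samples_by_class[c]))
    return batch

data = {"common": list(range(90)), "rare": [90, 91]}
batch = class_balanced_sample(data, batch_size=8)
```

In expectation the "rare" class fills half of each batch despite holding only 2 of the 92 instances, which is exactly the over-exposure of minority classes that the text argues helps the classifier but hurts representation learning.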
3) RW: all classes are reweighted according to the reciprocal of their sample sizes. For representation learning, when using the same classifier learning method (comparing the precision of the three blocks in the horizontal direction), the precision of the CE block is always higher than that of the RW/RS blocks. We find that CE obtains better classification results because it learns better features. The worse results of the RW/RS blocks show that the RW/RS methods have a poor ability to learn deep features, which damages representation learning.
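The RW scheme above can likewise be sketched. Here the reciprocal weights are normalised to sum to the number of classes so that the average loss scale stays comparable; the normalisation is our choice for illustration and is not specified in the text.

```python
def reweight_by_frequency(class_counts):
    """Re-weighting (RW): each class weight is proportional to the
    reciprocal of its sample count, normalised so the weights sum to
    the number of classes (keeping the overall loss scale unchanged)."""
    inv = {c: 1.0 / n for c, n in class_counts.items()}
    scale = len(class_counts) / sum(inv.values())
    return {c: w * scale for c, w in inv.items()}

# Toy counts loosely modelled on the COEMR imbalance described later.
w = reweight_by_frequency({"head position": 900, "gestational hypertension": 100})
```

The rare diagnosis receives a weight nine times that of the common one, so its per-sample losses contribute equally in aggregate to the classifier update.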
For classifier learning, when using the same representation learning method (comparing the precision of the three blocks in the vertical direction), it can be found that the RW/RS methods achieve higher precision than CE. The results show that the main reason rebalancing methods achieve balanced performance on imbalanced data is that they directly affect the updates of the deep network classifier weights, that is, they promote classifier learning. This section has discussed the influence of rebalancing methods on representation learning and classifier learning in neural networks: rebalancing methods significantly promote classifier learning, but also damage feature learning to a certain extent. To solve these problems, we propose to decouple representation learning from classifier learning. In the representation learning stage, the original features of the dataset are learned; in the classifier learning stage, the rebalanced dataset is used for training. This gives consideration to both representation learning and classifier learning, improving the generalization ability on low-frequency data and the classification performance on imbalanced data.

Label decoupling module
The rebalancing strategy is independent of the classifier, so it is applicable to a wider range of scenarios than adaptive classifiers. However, it is difficult for traditional rebalancing methods to achieve good performance on multi-label data. The main problems are the huge differences in imbalance degree between labels in multi-label datasets, and the high coupling of low-frequency and high-frequency labels in the same sample.
The imbalance ratio and average imbalance ratio between labels in a multi-label dataset can be determined according to the method proposed in reference [20]. Define a multi-label dataset $D = \{(x_i, Y_i) \mid 0 \le i \le n, Y_i \subseteq L\}$, where $x_i$ is a sample in the dataset, $Y_i$ is the label set of sample $x_i$, and $L$ is the label set of the dataset. The first measure is the Imbalance Ratio per label (IR), shown in Eq (2), which evaluates the imbalance ratio of a single label:

$$\mathrm{IR}(y) = \frac{\max_{y' \in L} \sum_{i=1}^{n} h(y', Y_i)}{\sum_{i=1}^{n} h(y, Y_i)}, \qquad h(y, Y_i) = \begin{cases} 1, & y \in Y_i \\ 0, & y \notin Y_i \end{cases} \quad (2)$$
The second measure is the Mean Imbalance Ratio (MeanIR), an overall estimate of the imbalance degree of a multi-label dataset, namely the average IR value over all labels, see Eq (3), where |L| is the number of labels in the dataset:

$$\mathrm{MeanIR} = \frac{1}{|L|} \sum_{y \in L} \mathrm{IR}(y) \quad (3)$$
According to the MeanIR and IR values, this section defines high-frequency and low-frequency labels. When the IR value of a label is higher than MeanIR, it is a low-frequency label; otherwise, it is a high-frequency label. For a label y, if $\mathrm{IR}(y) > \mathrm{MeanIR}$, it belongs to minBags; otherwise, it belongs to majBags.
To quantify the coupling degree of low-frequency and high-frequency labels in the same sample of a multi-label dataset, the SCUMBLE measure can be used [21]. As can be seen from Eq (4), SCUMBLE relies on the aforementioned IR metric. The coupling degree of each sample is first obtained:

$$\mathrm{SCUMBLE}_{ins}(i) = 1 - \frac{1}{\overline{\mathrm{IR}}_i}\left(\prod_{y \in Y_i} \mathrm{IR}(y)\right)^{1/|Y_i|} \quad (4)$$

where $\overline{\mathrm{IR}}_i$ is the average IR value of the labels in $Y_i$. Then the average coupling degree SCUMBLE(D) is calculated for the entire multi-label dataset, as shown in Eq (5):

$$\mathrm{SCUMBLE}(D) = \frac{1}{n}\sum_{i=1}^{n} \mathrm{SCUMBLE}_{ins}(i) \quad (5)$$

SCUMBLE values are normalized to the [0, 1] range; the larger the value, the higher the coupling between the imbalanced labels.
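The three measures follow directly from per-label counts. The sketch below computes IR, MeanIR and SCUMBLE for a dataset given as a list of label sets; the per-sample coupling is one minus the ratio of the geometric to the arithmetic mean of the IR values of the sample's labels, which is the standard reading of the SCUMBLE definition in [21]. The toy data are hypothetical.

```python
from collections import Counter

def label_counts(dataset):
    """dataset: list of label sets, one per sample (e.g. one per EMR)."""
    c = Counter()
    for labels in dataset:
        c.update(labels)
    return c

def irlbl(counts):
    """IR of each label = count of the most frequent label divided by
    that label's own count (>= 1; larger means rarer)."""
    top = max(counts.values())
    return {y: top / n for y, n in counts.items()}

def mean_ir(ir):
    """MeanIR: average IR over all labels."""
    return sum(ir.values()) / len(ir)

def scumble(dataset, ir):
    """Per-sample coupling = 1 - geometric mean / arithmetic mean of the
    IR values of the sample's labels; SCUMBLE(D) is their average."""
    per_sample = []
    for labels in dataset:
        vals = [ir[y] for y in labels]
        arith = sum(vals) / len(vals)
        geo = 1.0
        for v in vals:
            geo *= v
        geo **= 1.0 / len(vals)
        per_sample.append(1.0 - geo / arith)
    return sum(per_sample) / len(per_sample)

# Hypothetical toy data: a common label co-occurring once with a rare one.
data = [{"head position"}] * 9 + [{"head position", "gestational hypertension"}]
ir = irlbl(label_counts(data))
```

A single-label sample always contributes 0 coupling (geometric and arithmetic means coincide), so only samples mixing labels of different frequencies raise SCUMBLE(D).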
We calculate the average coupling degree SCUMBLE(D) = 0.3 for the COEMR dataset, and visualize the label coupling of the COEMR dataset using a chord diagram, as shown in Figure 3. It can be seen that there is a high coupling degree in the COEMR dataset, and the low-frequency labels are completely associated with some high-frequency labels.
The high coupling of imbalanced labels can be alleviated with a label decoupling strategy. Charte et al. [22] proposed the REMEDIAL algorithm, which is independent of the resampling algorithm and uses the SCUMBLE value as the condition for deciding whether to decouple labels; decoupling high-frequency labels from low-frequency labels reduces the coupling degree between them. Based on this, this paper proposes the DRIL algorithm, which decouples and clones samples with high SCUMBLE values, obtaining two instances, one associated with the high-frequency labels and the other with the low-frequency labels, so as to reduce the coupling degree; it then rebalances the dataset by combining oversampling and undersampling. This reduces both the loss of high-frequency disease sample information and the overfitting on low-frequency diseases.
When a sample's coupling degree exceeds the dataset-level SCUMBLE value and its label set contains both high-frequency and low-frequency labels, the labels are decoupled. Samples with highly coupled labels are identified in step 5. From step 5 to step 7, one sample is decoupled into two samples: the clone keeps the high-frequency labels and the original keeps the low-frequency labels, so that the low-frequency and high-frequency labels in the sample are separated and the decoupled dataset is obtained. MeanSamples is the number of samples required for all labels to reach the mean state of MeanIR; it is calculated by dividing the sample count of the most frequent label by MeanIR. From step 9 to step 21, the dataset is balanced by combining oversampling and undersampling. First, a random label y is generated and x samples of y are randomly selected. If y belongs to minBags, x = Random(0, MeanSamples − |y|), and the selected samples are added to the dataset D′. If y belongs to majBags, x = Random(0, |y| − MeanSamples), and the x samples are deleted from the dataset D′, where |y| denotes the number of samples labeled y. DRIL uses MeanSamples to bound x, so the number of added or removed samples never overshoots the balanced state. At the end of each rebalancing round, MeanIR and IR are recalculated, while MeanSamples always keeps its initial value, so the original distribution of the dataset is not greatly affected. Finally, the dataset D′, decoupled and balanced by the DRIL algorithm, is obtained.
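A simplified, self-contained sketch of the DRIL procedure as described above: decouple samples whose per-sample coupling exceeds a threshold into a high-frequency clone and a low-frequency clone, then oversample minBags and undersample majBags towards MeanSamples. Several details are condensed for illustration: the threshold value, the per-round recalculation of IR/MeanIR, and the random-label loop are folded into a single pass over labels, so this is not the exact algorithm listing.

```python
import random
from collections import Counter

def dril(dataset, scumble_threshold=0.1, rng=random.Random(0)):
    """Sketch of DRIL over a dataset given as a list of label sets."""
    def stats(ds):
        counts = Counter(y for labels in ds for y in labels)
        top = max(counts.values())
        ir = {y: top / n for y, n in counts.items()}
        return counts, ir, sum(ir.values()) / len(ir)   # counts, IR, MeanIR

    # Step 1: decouple highly coupled samples into two clones, one keeping
    # the high-frequency labels and one keeping the low-frequency labels.
    _, ir, m_ir = stats(dataset)
    decoupled = []
    for labels in dataset:
        vals = [ir[y] for y in labels]
        arith = sum(vals) / len(vals)
        geo = 1.0
        for v in vals:
            geo *= v
        geo **= 1.0 / len(vals)
        coupling = 1.0 - geo / arith                    # per-sample SCUMBLE
        high = {y for y in labels if ir[y] <= m_ir}     # high-frequency labels
        low = labels - high                             # low-frequency labels
        if coupling > scumble_threshold and high and low:
            decoupled += [high, low]
        else:
            decoupled.append(set(labels))

    # Step 2: rebalance towards MeanSamples = |most frequent label| / MeanIR
    # by oversampling minBags and undersampling majBags.
    counts, ir, m_ir = stats(decoupled)
    mean_samples = int(max(counts.values()) / m_ir)
    balanced = list(decoupled)
    for y, n in counts.items():
        bag = [s for s in decoupled if y in s]
        if n < mean_samples and ir[y] > m_ir:           # minBags: clone
            balanced += [set(rng.choice(bag)) for _ in range(mean_samples - n)]
        elif n > mean_samples and ir[y] <= m_ir:        # majBags: delete
            for s in rng.sample(bag, n - mean_samples):
                balanced.remove(s)
    return balanced
```

On a toy dataset of eight {"A"} samples and two {"A", "B"} samples, the two coupled samples are split into {"A"} and {"B"} clones, after which A is undersampled and B oversampled until both sit at MeanSamples.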

Experiments
We evaluate the proposed intelligent diagnosis model on the COEMR dataset, and verify the effectiveness and universality of the model on two benchmark multi-label text classification datasets: AAPD and RCV1. Table 2 shows the descriptive statistics of the datasets used in the experiments. In the representation learning stage, the filter widths of the CNN are set to (2, 3, 4) and the number of filters per width is 25. When training the classifier, the Xavier method [23] is used to randomly initialize the classifier parameters. The resampling rate P of DRIL is set to 0.1, which is the best resampling rate for multi-label data [24]. Adam [25] is employed as the optimizer, with the learning rate set to 0.001, the batch size to 32 and the dropout rate to 0.3.
COEMR: In this dataset, 24,339 EMRs were randomly selected from the inpatient departments of several hospitals. EMRs are mainly composed of structured and unstructured text data. Structured data includes basic patient information, such as age, ethnicity, and laboratory test data. Unstructured data mainly refers to patients' statements, hospitalization condition, objective examination, etc. In order to protect patient privacy, names, ID numbers and other private information were removed. Table 3 shows a detailed description of the obstetric COEMR dataset.
The distribution of diagnostic results in the COEMR dataset is visualized in Table 4. In the COEMR dataset, more than 90% of the diagnostic results contain "head position", while "gestational hypertension" accounts for less than 10%. Based on the diagnostic results of 73 highly coupled diseases, all 24,339 samples were divided into a training set (21,905) and a test set (2434) in a 9:1 ratio.
Arxiv Academic Papers Dataset (AAPD): AAPD is a large multi-label text classification (MLTC) dataset constructed by Yang et al. [26]. It includes 55,840 computer science abstracts from Arxiv.
Reuters Corpus Volume 1 (RCV1): RCV1 is provided by Gleviset et al. [27] and consists of manually annotated Reuters news stories from 1996 to 1997. Each news story can be assigned multiple topics, from a total of 103 topics.

Results
This paper compares the DDN model with common multi-label text classification methods: Binary Relevance (BR) [28], Label Powerset (LP) [29] and CNN. The BR algorithm trains a binary classifier for each label in the label set, which has the advantages of simplicity and efficiency. The LP algorithm regards each combination of labels as a new class, transforming the multi-label problem into a multi-class problem; its advantage is that it attends to the semantic correlation between labels. Chen [30] first applied CNN to the text classification task: key information in the sentence is extracted by multiple filters of different sizes in the CNN, which better captures the local correlations of the text.
In addition, in order to verify the effectiveness of the proposed decoupled modules, DDN is compared with BR + DRIL, LP + DRIL and RS + CNN. Here, "+ DRIL" means that the dataset balanced by the DRIL algorithm proposed in this section is used for classification, and RS + CNN means that the class-rebalanced data is directly input into CNN for classification.
From the MeanIR values in Table 2, it can be seen that all three datasets are imbalanced, so in theory all three can benefit from rebalancing methods. From the SCUMBLE values, the COEMR and RCV1 datasets have larger values, making them difficult multi-label datasets with high coupling between labels of different imbalance levels. Tables 5-7 show the experimental results on the COEMR, AAPD and RCV1 datasets respectively. In each column, the best results are shown in bold.

Discussion
It can be seen from the results in Tables 5 and 7 that after applying the proposed DRIL on the COEMR and RCV1 datasets, all measurements are improved. Compared with the traditional BR algorithm, the BR + DRIL algorithm improves P, R and F1 by 5.30, 2.55 and 3.84% respectively. Compared with the traditional LP algorithm, the LP + DRIL algorithm improves P, R and F1 by 4.59, 2.90 and 3.65% respectively. For the RCV1 dataset, due to the large number of labels and the complex hierarchical structure between labels, the improvement of each index is relatively small: compared with traditional BR, the F1 value of BR + DRIL increases by 1.21%, and that of LP + DRIL by 0.27%. This is because the DRIL algorithm decouples the highly coupled labels and combines oversampling and undersampling to balance the distribution of the dataset, which makes it easier for multi-label classification algorithms to process. In addition, the smaller SCUMBLE value of the AAPD dataset indicates that there is almost no imbalanced label concurrence in that dataset, so applying the DRIL algorithm has relatively little impact on its results. The DDN model is based on CNN, which is suited to extracting local features; using BERT and similar models can yield higher recall than the DDN model, but the F1 value may decrease. Neural network models can capture richer features and deeper semantic information, so CNN improves on most evaluation indexes compared with the traditional classification methods BR and LP. Because the CNN model is suited to extracting local features, it tends to select features favorable to high-frequency samples on imbalanced data, so the recall increment of CNN is relatively low.
For the RS + CNN model, the traditional class resampling method can increase the frequency of low-frequency class samples, but it does not consider the high coupling between labels, so the performance improves only slightly. Compared with the CNN model, the P value and F1 value of DDN are improved by 4.36 and 2.93% respectively, reaching 84.17 and 66.78%, and the Hamming loss is reduced by 9.40%. The DDN model solves the problem of highly coupled imbalanced labels by decoupling representation learning from classifier learning and decoupling the highly coupled labels, so that the model can learn high-quality text feature representations, further improving performance. DDN also performs well on AAPD and RCV1, which indicates that the DDN model can also be applied to other multi-label text classification tasks.
This paper further evaluates the COEMR dataset to explore the effect of the DDN model on different parts of the dataset distribution. As shown in Figure 4, the horizontal axis is the diagnostic results sorted in descending order by the number of corresponding samples, and the vertical axis is the accuracy increment of each category. It can be seen that DDN improves the diagnostic performance on low-frequency diseases. Taking the disease "Pregnancy with hypothyroidism" as an example, after label decoupling, the number of electronic medical records containing the disease increases, and the model becomes more sensitive to the characteristics of this type of disease, such as "Excessive serum TSH", "Edema" and other disease-related signs and symptoms. The decoupled low-frequency labels are no longer suppressed by the high-frequency labels. In addition, DDN does not improve the performance on low-frequency diseases by sacrificing the diagnostic accuracy of high-frequency diseases: after label decoupling, the performance of the model improves on most diseases. As mentioned above, resampling methods often cause overfitting on low-frequency data, while DDN decouples representation learning from classifier learning to learn good feature representations and improve the generalization ability on low-frequency data. In conclusion, the DDN model proposed in this paper, which includes representation/classifier learning decoupling and label decoupling, performs well for the diagnosis of low-frequency diseases.

Conclusions
This paper proposes a DDN model for intelligent diagnosis based on imbalanced EMRs. A two-stage training method is proposed to decouple representation learning and classifier learning: in the representation learning stage, a CNN model is used to learn the original features of the data; in the classifier learning stage, considering the highly coupled diagnostic results of EMRs, a DRIL algorithm is proposed to decouple the highly coupled diagnostic results and balance the data distribution. The experimental results on the COEMR dataset show that DDN can effectively improve the performance of intelligent diagnosis based on imbalanced EMRs, especially the precision of low-frequency disease diagnosis. In the future, we will try to apply DDN to intelligent diagnosis of diseases with more complications, such as diabetes.