Data selection. To ensure relevant and generalizable experimental results, this paper uses the thyroid disease dataset throid0387 from the UCI machine learning repository, which contains thyroid disease records from the Garavan Institute and J. Ross of the College of New South Wales [5–6]. This is a standard thyroid dataset, widely used as a research target by data analysis and machine learning scholars. It contains 20 subcategories and 6 major categories, with 9172 samples in total. In this paper, the thyroid disease types were classified into the following categories: hyperthyroid conditions, hypothyroid conditions, binding protein, general health, replacement therapy, and discordant results. To facilitate the analysis, these labels were transformed into the numerical labels 0, 1, 2, 3, 4, and 5 for subsequent analysis.
Data mining. Statistical analysis yields the number of samples in each of the six categories, as summarized in Table 1. The gap between class sizes is within an acceptable range, and no serious class imbalance occurs. A large number of unlabeled samples were also removed, leaving a final total of 2282 usable samples. Relevance between objective quantities can usually be described by functional or statistical relationships. The correlations between the features in this paper clearly cannot be captured by a simple functional relationship, so Pearson correlation analysis was used [7]. The Pearson coefficient is a statistical indicator of the degree of linear correlation between two variables, defined as the quotient of their covariance and the product of their standard deviations:
$$r = \frac{\sum_{i=1}^{n}(x_{i} - \bar{x})(y_{i} - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_{i} - \bar{x})^{2}}\sqrt{\sum_{i=1}^{n}(y_{i} - \bar{y})^{2}}} \left(1\right)$$
where \(r\) represents the Pearson correlation coefficient, \((x_{i}, y_{i})\) is the value pair of sample \(i\), and \(\bar{x}, \bar{y}\) are the corresponding sample means. The specific distribution of each category is shown in Fig. 1, and the statistical results of the data are shown in Table 1. The correlation analysis is visualized as a heat map in Fig. 2; the strongly correlated features are the several important hormonal indicators used as the main diagnostic basis, as described above.
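Eq. (1) can be computed directly from its definition. The following is a minimal sketch in NumPy (the function name `pearson_r` and the toy data are illustrative, not from the paper):

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient of Eq. (1): the covariance of x and y
    divided by the product of their standard deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    return (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# Perfectly linearly related variables give r = 1.
x = np.array([1.0, 2.0, 3.0, 4.0])
print(pearson_r(x, 2 * x + 1))  # -> 1.0
```

In practice a feature-by-feature matrix of such coefficients (e.g. via `pandas.DataFrame.corr`) is what gets rendered as the heat map in Fig. 2.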
Table 1
Statistics of data set categories
| class | hyperthyroid | hypothyroid | binding protein | general health | replacement therapy | discordant results |
|---|---|---|---|---|---|---|
| index | 0 | 1 | 2 | 3 | 4 | 5 |
| num | 182 | 593 | 412 | 562 | 336 | 197 |
Feature interpolation algorithm. The diagnosis of thyroid disease mainly relies on collected patient information such as TSH, but real data are likely to contain missing values. In this paper, the missing values of each feature are first counted; the missing data fall into two categories, missing continuous values and missing Boolean values. The missing continuous values are shown in Table 2, and the following strategy is used to deal with them: features in which missing values account for more than 50% are deleted. Accordingly, this paper removes TBG, together with the referral source, which is unrelated to the analysis of etiology, and obtains 27 valid feature columns.
The remaining missing values are mainly missing continuous hormone measurements. In current medicine, however, hormones differ in diagnostic validity and importance; for example, TSH can serve as a basis for the initial diagnosis of many thyroid diseases, and its diagnostic reliability is usually high. Accordingly, this paper proposes a weighted feature analysis method based on feature importance. Its main algorithmic steps are as follows. First, the data are trained with the random forest algorithm to obtain a relevance baseline. To further improve the trained model, a grid search with cross-validation is used to tune the specified training parameters, performing k-fold cross-validation to obtain the optimal hyperparameters. Then, the importance statistics are obtained by random forest-based importance analysis, as shown in Fig. 3, and the importance index is denoted as
$$I = RF\left(x, y\right) \left(2\right)$$
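A sketch of this tuning-and-scoring step using scikit-learn, assuming a synthetic stand-in for the thyroid feature matrix (the data, the parameter grid, and the variable names are illustrative only, not the paper's actual settings):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Toy stand-in for the thyroid features: labels driven mostly by feature 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

# Grid search with k-fold cross-validation to tune the forest (Eq. 2, I = RF(x, y)).
grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [3, None]},
    cv=5,
)
grid.fit(X, y)

# Impurity-based importance scores from the tuned forest.
I = grid.best_estimator_.feature_importances_

# Range normalization as in Eq. (3): divide by the importance range.
I_norm = I / (I.max() - I.min())
print(I_norm)
```

Note that `feature_importances_` is already normalized to sum to one; the extra division by the range reproduces the scaling of Eq. (3).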
The importance score \(I\) corresponding to each feature column with missing entries is thus obtained. Then, the importance score is normalized as:
$$I_{\mathrm{norm}} = \left[\frac{I_{1}}{I_{\max} - I_{\min}}, \dots, \frac{I_{n}}{I_{\max} - I_{\min}}\right] \left(3\right)$$
The coefficients of the corresponding fitting functions are obtained by iterating through the missing columns, analyzing the fitting relationship between each missing column and the other feature columns, and fitting by least squares:
$$a_{ij}, b_{ij}, c_{ij}, d_{ij} = F\left(\min\left(\left(\phi\left(y_{j}\right) - y_{i}\right)^{T}\left(\phi\left(y_{j}\right) - y_{i}\right)\right)\right) \left(4\right)$$
where \(a_{ij}, b_{ij}, c_{ij}, d_{ij}\) denote the coefficients of the objective function obtained by fitting the \(i\)-th column of missing values from the \(j\)-th column of features, and \(F\) represents the solution by the method of undetermined coefficients.
The result of fitting the i-th column using the j-th column is
$$\hat{y}_{ij} = a_{ij} + b_{ij} y_{j} + c_{ij} y_{j}^{2} + d_{ij} y_{j}^{3} \left(5\right)$$
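The least-squares solution of Eqs. (4)–(5) is a standard cubic regression. A minimal sketch with NumPy (the function names `fit_cubic`/`predict_cubic` and the test polynomial are illustrative):

```python
import numpy as np

def fit_cubic(y_j, y_i):
    """Return (a, b, c, d) of Eq. (5), minimizing the squared residual of
    Eq. (4) between the cubic in y_j and the target column y_i."""
    # Design matrix phi(y_j) = [1, y_j, y_j^2, y_j^3]
    Phi = np.vander(np.asarray(y_j, dtype=float), N=4, increasing=True)
    coeffs, *_ = np.linalg.lstsq(Phi, np.asarray(y_i, dtype=float), rcond=None)
    return coeffs  # a, b, c, d

def predict_cubic(coeffs, y_j):
    a, b, c, d = coeffs
    return a + b * y_j + c * y_j ** 2 + d * y_j ** 3

# Recover a known cubic from noiseless data.
y_j = np.linspace(-2, 2, 50)
y_i = 1.0 + 2.0 * y_j - 0.5 * y_j ** 2 + 0.1 * y_j ** 3
coeffs = fit_cubic(y_j, y_i)
print(np.round(coeffs, 3))  # ~ [1.0, 2.0, -0.5, 0.1]
```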
Feature interpolation filling can then be expressed as a process of introducing prior knowledge: the feature importance scores of 'TSH measured', 'TSH', 'T3 measured', 'T3', 'TT4 measured', 'TT4', 'T4U measured', 'T4U', 'FTI measured', and 'FTI' are used to weight the fitted results for a given missing column, and the weighted value is selected to fill in the missing entries:
$$y_{i} = \sum_{j=1}^{k} I_{\mathrm{norm},j} \hat{y}_{ij} \left(6\right)$$
The results after fitting are shown in Fig. 2(b). Compared with the static interpolation algorithm alone, the sequence after feature interpolation, which incorporates prior knowledge, expresses richer detail; the imputed values fluctuate around the static estimates and preserve the distribution trend of the original feature columns. After removing the feature columns containing a large number of missing values, the remaining feature columns with missing entries are processed by the proposed weighted interpolation algorithm. The specific interpolation process is as follows: first, all feature columns are initially completed using median interpolation and set aside; if several feature columns of the same sample are missing simultaneously, the means of the other columns are used as temporary data for interpolation to obtain the initially completed data. Then, using the prior data of the other associated features of the same sample, a weighted fit is performed according to the feature importance scores to complete the entries in the missing columns of the original data. In Chap. 5, ablation and comparison experiments show that the feature interpolation algorithm yields outstanding improvements in accuracy.
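The two-stage procedure above (median initialization, then importance-weighted cubic refitting per Eqs. (5)–(6)) can be sketched end to end as follows. This is a minimal NumPy illustration under stated assumptions: the function name `feature_interpolate` is hypothetical, and the weights are renormalized to sum to one, which Eq. (6) leaves implicit:

```python
import numpy as np

def feature_interpolate(X, I_norm):
    """Weighted feature interpolation sketch.
    Step 1: initialize every missing entry with its column median.
    Step 2: for each originally-missing entry, fit a cubic (Eq. 5) from each
    other column and fill with the importance-weighted combination (Eq. 6).
    X: 2-D float array with np.nan marking missing entries.
    I_norm: normalized importance score per column (Eq. 3)."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Step 1: median initialization per column.
    X_init = X.copy()
    for j in range(X.shape[1]):
        X_init[np.isnan(X_init[:, j]), j] = np.nanmedian(X[:, j])
    # Step 2: importance-weighted cubic refit for originally-missing cells.
    X_filled = X_init.copy()
    n_cols = X.shape[1]
    for i in range(n_cols):                      # target column i
        rows = np.where(missing[:, i])[0]
        if rows.size == 0:
            continue
        fit, weight = np.zeros(rows.size), 0.0
        for j in range(n_cols):                  # fitting column j
            if j == i:
                continue
            coeffs = np.polyfit(X_init[:, j], X_init[:, i], deg=3)
            fit += I_norm[j] * np.polyval(coeffs, X_init[rows, j])
            weight += I_norm[j]
        # Assumption: renormalize so the importance weights sum to one.
        X_filled[rows, i] = fit / weight
    return X_filled
```

On strongly correlated columns this pulls the imputed value from the column medians toward the values implied by the other features of the same sample, which is the behavior described for Fig. 2(b).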
Table 2
Missing values statistics table
| Feature | Missing value ratio |
|---|---|
| TBG | 0.994303 |
| T3 | 0.223488 |
| T4U | 0.042507 |
| FTI | 0.042068 |
| TT4 | 0.005259 |