Survival and grade of the glioma prediction using transfer learning

Glioblastoma is a highly malignant brain tumor with a life expectancy of only 3–6 months without treatment. Accurate prediction of patient survival and tumor grade is therefore crucial. This study introduces a novel approach based on transfer learning. Various pre-trained networks, including EfficientNet, ResNet, VGG16, and Inception, were tested through exhaustive optimization to identify the most suitable architecture. Transfer learning was applied to fine-tune these models on a glioblastoma image dataset with two objectives: survival prediction and tumor grade prediction. The experimental results show 65% accuracy in survival prediction, classifying patients into short, medium, or long survival categories. Tumor grade prediction achieved an accuracy of 97%, differentiating low-grade gliomas (LGG) from high-grade gliomas (HGG). The success of the approach is attributed to the effectiveness of transfer learning, which surpasses the current state-of-the-art methods. In conclusion, this study presents a promising method for predicting the survival and grade of glioblastoma. Transfer learning shows its potential for enhancing prediction models, particularly when large datasets are not available. These findings hold promise for improving diagnostic and treatment approaches for glioblastoma patients.

Both techniques are important in our study because the dataset used to train the different learning models reports whether each patient underwent partial surgery, total surgery or no surgery. Glioblastoma multiforme (GBM) is classified as high-grade glioma (HGG), while the remaining lower-grade gliomas are classified as low-grade glioma (LGG) (Menze et al., 2015). This is the classification used to train the models described in "Methodology".
Over the years, studies have been carried out to predict glioblastoma survival based on different parameters. In Wankhede & Selvarani (2022), the authors identify the significant features of the extracted images using a Gray Wolf Optimizer and propose a multilevel layer modelling architecture in the faster R-CNN approach, based on a feature weight factor and a relative description model, to build the selected features. With the same purpose, Fu et al. (2021) proposed an architecture composed of 27 convolutional layers forming an encoder (based on the VGG16 model) and a decoder, and Jajroudi et al. (2022) try to determine the qualitative and quantitative features affecting the survival of glioblastoma multiforme.
In order to make a comparison under equal conditions, the studies carried out on the same dataset used in this work are described below.
Previous studies attempting to predict the survival of patients with glioblastoma have combined deep learning techniques with classical learning techniques, as in the work by Chato & Latifi (2017). In their work, different methods were used to extract the image features and, once extracted, the samples were classified into two or three classes, differentiating between short-, medium-, and long-term survivors, using different machine learning techniques. In the case of three classes, the best results were obtained using "Complex and median tree", with an accuracy of 62.5%. In the two-class classification between short-term and long-term survivors, the best results were obtained with logistic regression, with an accuracy of 68.8%. Suter et al. (2018) obtained an accuracy of 51.5% in predicting patient survival using convolutional networks but, as in the previous case, the best results were obtained with classical techniques, specifically an SVC (support vector classifier), which obtained an accuracy of 72.2% in the training set, 57.1% in the validation set and 42.9% in the test set.
On the other hand, studies aimed at classifying the grade of glioblastoma have obtained promising results as it is a simpler task than determining the survival of the patient, which is affected by many more factors.
In Cho & Park (2017), 180 features were extracted and an accuracy of 89.81% was obtained using logistic regression. In the work developed by Pei et al. (2020), both predictions were made along with tumor segmentation. First, the tumor was segmented and a 3D convolutional network was used to classify the tumor between the different classes. Finally, as in the previous studies, they applied a hybrid technique combining deep learning and traditional learning to predict patient survival. In this study, an accuracy of 48.40% was obtained in the test set and 58.6% in the validation set when predicting survival, using convolutional networks to extract features from the images and linear regression on these features together with age to obtain the predictions. The best state-of-the-art test accuracy (Banerjee et al., 2019) was obtained with convolutional networks, which achieved 95% accuracy in the classification of LGG and HGG in MRI.
Analyzing the state of the art, it can be observed that the approach that has obtained the most promising results is the use of hybrid techniques (deep learning and classical techniques), and that there is great potential for improving the current models: the accuracies obtained are below 69% when classifying survival time into two classes, no higher than 62.5% in the case of three classes, and below 59% when predicting the estimated survival time. Better results have been obtained in tumor classification, although they do not exceed 95%.
In this article, transfer learning techniques are used and optimized for two objectives: on the one hand, to determine the survival time of people suffering from a glioma and, on the other hand, to determine the grade of the tumor in order to carry out the most effective treatment.
Our approach involves using transfer learning techniques with multiple pre-trained convolutional neural networks (CNNs) to extract features from medical images of glioblastoma patients. The same CNNs are then fine-tuned to improve their accuracy in predicting the survival and grade of the tumor. This approach represents a significant improvement over previous methods and has the potential to significantly improve the accuracy of predicting the survival and grade of glioblastoma.
The prediction of the survival and grade of glioblastoma is a highly complex and challenging task that has important implications for patient care and treatment. By improving the accuracy of these predictions, our approach has the potential to improve patient outcomes and reduce healthcare costs. Our article demonstrates the effectiveness of our approach and shows that it represents a significant improvement over previous methods. This has important implications for the field of medical imaging and for the prediction of the survival and grade of glioblastoma.
Our approach of using transfer learning to predict the survival and grade of glioblastoma is based on computer vision and deep learning. Specifically, we use pre-trained models and transfer learning techniques to improve the accuracy of predictions on a new task, a strategy that has been shown to be highly effective in a variety of applications, including medical image analysis. Furthermore, our article includes a detailed description of the dataset and the preprocessing of the data, as well as an explanation of the experiments carried out and the optimization process of the model. These aspects of our article demonstrate the thoroughness and logic of our approach.
The rest of the article is organized as follows. The dataset and the preprocessing of the data are explained in "Methodology", together with all the pre-trained models that have been used. In "Experiments and Results", the experiments carried out and the optimization process of the model are explained and, finally, we conclude in "Conclusions".

METHODOLOGY

Dataset
The dataset used in this article is obtained from BraTS 2020 (Menze et al., 2015; Bakas et al., 2017, 2018), which is a competition for glioma segmentation, grade classification and survival classification. The dataset consists of 31 GB of images and data from 369 patients. For each of these patients, their age, survival in days and whether they have undergone a GTR, STR or no resection are stored. Regarding medical images, the dataset contains five types of images for each of the 369 patients. These images are different 3D scans taken using different techniques. The techniques used were T1, T2, T1ce and T2-Flair.
The three-dimensional images have a size of 240 × 240 × 155, and four different types of images can be found in the dataset (see Fig. 1):
T1: Shows the normal anatomy of soft tissue and fat. It serves, for example, to confirm that a mass contains fat.
T1ce: Contrast-enhanced images that allow blood vessels or other soft tissues to be seen more clearly.
T2: Shows liquids and alterations such as tumors, inflammation or trauma.
T2-Flair: Uses contrast to detect a wide range of lesions.
Along with these four images, there is also the segmented tumor scan, but this is not used in this study. Not all patients have all the data, such as age or survival, so a preprocessing step is necessary.
The images are in the NIfTI format. This is a format for medical images that stores the image along with additional information about it. Each NIfTI image is made up of three components:
An N-D array containing the image data. In our case it is a three-dimensional matrix that contains a mapping of the patient's brain. Thanks to this, any region or section of the patient's brain can be obtained.
A 4 × 4 affine matrix with information about the position and orientation of the image in a given space.
A header with metadata and information about the image.
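To make the role of the affine matrix more concrete, the following minimal sketch maps voxel indices to spatial coordinates with plain Python. The matrix values are hypothetical (1 mm voxels and an arbitrary origin) and are not taken from the BraTS files; the function name `voxel_to_world` is also only illustrative.

```python
# Hypothetical 4 x 4 affine: 1 mm isotropic voxels and an arbitrary origin offset.
# These numbers are illustrative only and are not read from the dataset.
AFFINE = [
    [1.0, 0.0, 0.0, -120.0],
    [0.0, 1.0, 0.0, -120.0],
    [0.0, 0.0, 1.0, -77.0],
    [0.0, 0.0, 0.0, 1.0],
]


def voxel_to_world(affine, i, j, k):
    """Map voxel indices (i, j, k) to spatial coordinates in millimetres."""
    voxel = (i, j, k, 1.0)  # homogeneous coordinates
    world = [sum(row[c] * voxel[c] for c in range(4)) for row in affine]
    return tuple(world[:3])


corner = voxel_to_world(AFFINE, 0, 0, 0)
center = voxel_to_world(AFFINE, 120, 120, 77)
```

With this hypothetical matrix, the voxel in the middle of a 240 × 240 × 155 volume lands at the origin of the spatial coordinate system.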

Data preprocessing
The dataset used contains data from 369 patients. The classes are not balanced: 293 patients belong to the HGG class, while only 76 belong to the LGG class, a ratio of approximately 4:1. Such an imbalance can bias the model towards the majority class and lower the accuracy for the minority class. To balance both classes we have used subsampling, a common technique in machine learning to address class imbalance, so that both classes have the same number of patients, which allowed us to train the model more effectively. In this way, the number of elements of each class has been set at 76 patients. To increase the data available to train and validate the models, each of the four images of each patient has been treated as if it were an image of a different patient. Therefore, the number of images for training, validation and test is 608 (2 classes × 76 patients × 4 images). This has two benefits: on the one hand, the network is able to classify the grade and survival of the tumor in different image types and, on the other hand, the number of images for training, validation and testing is increased. Even with this number of images, models trained from scratch, both 3D and 2D, would not give good results, since they need a larger volume of data to carry out precise classifications, so transfer learning techniques with different pre-trained models will be used to perform the classification.
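The balancing and expansion steps can be sketched as follows. This is only an illustration under stated assumptions: the patient identifiers are made up, the random seed is arbitrary, and the helper `balance_and_expand` is a hypothetical name, not code from the study.

```python
import random

MODALITIES = ("T1", "T1ce", "T2", "T2-Flair")


def balance_and_expand(hgg_ids, lgg_ids, seed=0):
    """Subsample both classes to the minority size, then emit one sample per modality."""
    rng = random.Random(seed)
    n = min(len(hgg_ids), len(lgg_ids))
    kept = {1: rng.sample(hgg_ids, n), 0: rng.sample(lgg_ids, n)}  # 1 = HGG, 0 = LGG
    samples = []
    for label, ids in kept.items():
        for patient_id in ids:
            for modality in MODALITIES:
                samples.append((patient_id, modality, label))
    return samples


# Made-up identifiers that only reproduce the class sizes reported in the text.
hgg = [f"HGG_{i:03d}" for i in range(293)]
lgg = [f"LGG_{i:03d}" for i in range(76)]
samples = balance_and_expand(hgg, lgg)
n_samples = len(samples)
```

With 76 patients per class and four modalities per patient, the sketch yields the 608 samples mentioned in the text.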
Analyzing the data, it can be observed that none of the patients with an LGG-type tumor have information about their age, survival or type of resection. This is largely because these patients have a fairly favorable prognosis (Pardal Souto et al., 2015) and most do not undergo surgery. Their age is set taking into account the mean and the standard deviation of the remaining ages, so that the generated ages are at most the mean plus the standard deviation and at least the mean minus the standard deviation.
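A minimal sketch of this age imputation is given below. The text only fixes the bounds (mean ± one standard deviation); drawing uniformly within them, using the sample standard deviation, and the example ages are all assumptions made for illustration.

```python
import random
import statistics


def impute_ages(known_ages, n_missing, seed=0):
    """Draw ages within [mean - std, mean + std] of the known ages.

    Uniform sampling is an assumption; the text only specifies the bounds.
    """
    rng = random.Random(seed)
    mean = statistics.mean(known_ages)
    std = statistics.stdev(known_ages)
    low, high = mean - std, mean + std
    ages = [rng.uniform(low, high) for _ in range(n_missing)]
    return ages, (low, high)


# Hypothetical known ages, not values from the dataset.
known = [45, 52, 60, 63, 70]
imputed_ages, (age_low, age_high) = impute_ages(known, n_missing=76)
```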
To determine the survival time, we have relied on the study by Bush & Chang (2016), and we assume that 76% of these patients survive more than 5 years and 24% less. Survival time was therefore filled in with a 24% chance of surviving between 4 and 5 years and a 76% chance of surviving between 5 and 7 years. Once the survival period has been randomly chosen according to these probabilities, the number of days survived within that period is randomly generated and all the information is completed.
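The two-stage sampling described above could look like the following sketch. The 365-day year, the uniform draw of days within each period, and the choice to start the longer period one day after the 5-year mark are assumptions made for illustration.

```python
import random

DAYS_PER_YEAR = 365  # assumption: leap years are ignored


def impute_survival_days(rng):
    """Pick the survival period with the stated probabilities, then a day within it."""
    if rng.random() < 0.24:
        low, high = 4 * DAYS_PER_YEAR, 5 * DAYS_PER_YEAR      # between 4 and 5 years
    else:
        low, high = 5 * DAYS_PER_YEAR + 1, 7 * DAYS_PER_YEAR  # between 5 and 7 years
    return rng.randint(low, high)


rng = random.Random(0)
survival_days = [impute_survival_days(rng) for _ in range(10_000)]
short_fraction = sum(d <= 5 * DAYS_PER_YEAR for d in survival_days) / len(survival_days)
```

Over many draws, `short_fraction` should approach the 24% assumed for survival of at most 5 years.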
Once it was verified that there was no missing data, the age of the patients was normalized between the minimum and maximum ages and the data was transformed from text to numerical format so that the model could be trained. Tumor grades were coded as 0 for LGG and 1 for HGG, and patient survival was coded as 0 for less than 1 year, 1 for between 1 and 5 years, and 2 for more than 5 years.
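The normalization and coding steps can be illustrated with the short sketch below. The handling of the exact 1-year and 5-year boundaries and the 365-day year are assumptions, since the text does not specify them.

```python
GRADE_CODES = {"LGG": 0, "HGG": 1}


def min_max(values):
    """Scale values linearly so that the minimum maps to 0 and the maximum to 1."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


def survival_class(days, days_per_year=365):
    """0: less than 1 year, 1: between 1 and 5 years, 2: more than 5 years."""
    if days < days_per_year:
        return 0
    if days <= 5 * days_per_year:  # assumption: exactly 5 years counts as class 1
        return 1
    return 2


normalized_ages = min_max([20, 40, 60])
classes = [survival_class(d) for d in (100, 365, 1825, 1826)]
```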
Three-dimensional images have different orientations depending on the orientation of the subject at the time of scanning. The images are therefore reoriented to a common space so that all images passed to the model have the same orientation. The images are reoriented to the RAS orientation using the nibabel library (Brett et al., 2023).
After that, an image normalization step is carried out. Images are three-dimensional arrays. The contents of these arrays are not integers from 0 to 255 as in most images, but decimal numbers which represent Hounsfield units (HU) (Bell & Greenway, 2015). These units are used universally and in a standardized way in tomography and scanners. They are obtained by a linear transformation of the measured attenuation coefficients. The scale is based on the density of pure water, which corresponds to 0 HU, and of air, which corresponds to −1,000 HU. Scanner values are generally in the range from −1,000 HU (air) to +2,000 HU for denser bones. To prevent bones from appearing in the images and confusing the network, in this article values are limited to the range [−1,000, 800], so that bones with a measurement of about 1,000 HU are avoided (Han & Kamdar, 2018). Once the values have been limited, a normalization within this range was performed.
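The clipping and normalization can be sketched as follows for a flat list of intensities. Mapping the clipped range linearly onto [0, 1] is an assumption; the text only states that the values are normalized within the clipped range.

```python
HU_MIN, HU_MAX = -1000.0, 800.0


def clip_and_normalize(values, low=HU_MIN, high=HU_MAX):
    """Clip each value to [low, high], then rescale linearly to [0, 1]."""
    normalized = []
    for v in values:
        clipped = max(low, min(high, v))
        normalized.append((clipped - low) / (high - low))
    return normalized


# Values below -1,000 and above 800 saturate at 0 and 1 respectively.
example = clip_and_normalize([-2000, -1000, -100, 800, 2000])
```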
The last preprocessing step is the image segmentation, that is, cutting each volume into slices. The pre-trained models used have been trained with images of size 224 × 224 × 3, although the first two dimensions can vary by a certain margin. That is why we need to adjust the images to fit them into these models. Our images have a size of 240 × 240 × 155, so our target size will be 240 × 240 × 3. It is not necessary to modify the first two dimensions, but the third one must be reduced. The images are three-dimensional models of the brain, so to reduce the dimensionality, three segments of the brain are taken. These cuts have been made through three different areas of the brain separated by 30 mm. Fig. 2 shows how these cuts have been made, and Fig. 3 shows an example of how these three segments look in a T2 image. We can clearly differentiate different sizes of the tumor in them, as they are different regions within the complete 3D model. After all these steps, the segmented, normalized image with a fixed orientation is ready to be used in the model.
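The reduction from 155 slices to three channels can be sketched as below. Several details are assumptions made for illustration: the slices are taken along the third axis, they are centered on the middle slice, and the voxel spacing along that axis is 1 mm, so that 30 mm corresponds to 30 slices. The exact slice positions used in the study are those shown in Fig. 2.

```python
def slice_indices(depth, spacing_mm=30, voxel_mm=1.0):
    """Three slice indices separated by spacing_mm, centered on the middle slice."""
    step = round(spacing_mm / voxel_mm)
    center = depth // 2
    return [center - step, center, center + step]


def stack_slices(volume, indices):
    """volume[x][y][z] -> image[x][y][c], with one channel per selected slice."""
    return [[[column[k] for k in indices] for column in row] for row in volume]


indices = slice_indices(155)

# Tiny stand-in volume (2 x 2 x 155) whose voxel value equals its slice index.
toy_volume = [[list(range(155)) for _ in range(2)] for _ in range(2)]
toy_image = stack_slices(toy_volume, indices)
```

Applied to a 240 × 240 × 155 volume, the same two functions would give the 240 × 240 × 3 input described in the text.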
Table 1 in the study provides a comparison of the clinical characteristics of LGG and HGG patients, including age, survival time, and tumor grade. The table shows that LGG patients are generally younger than HGG patients, with a mean age of 38.5 years compared to 56.5 years for HGG patients. Additionally, LGG patients have a longer survival time than HGG patients, with a mean survival time of 5.5 years compared to 1.1 years for HGG patients.

Pre-trained models
The training process has been carried out using pre-trained models that facilitate the image feature extraction stage, so that only the layers responsible for classifying the images according to the classes defined in the experiment have to be trained. In recent years, many models have been trained on large image sets and made publicly available so that researchers can benefit from the weights learned during this process. In the next sections, the pre-trained networks evaluated are briefly described.

ResNet
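ResNet was published by He et al. (2015). These neural networks differ from traditional ones in that they have a shortcut connection between non-contiguous layers of the network. With this, it is possible to propagate the information better and avoid the fading of the gradient in the backpropagation phase. Numerous recent studies in the field of tumor detection have used ResNet, showcasing the remarkable performance and efficacy of this architectural approach (El-Feshawy et al., 2023; Shehab et al., 2021; Aggarwal et al., 2023).
An example of this shortcut is shown in Fig. 4. Two models with different numbers of hidden layers have been evaluated: ResNet50 and ResNet101.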
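The idea of the shortcut connection can be illustrated with a toy block in plain Python. This is a simplified sketch that uses small fully connected transformations instead of the convolutions of the real ResNet, and all names are illustrative.

```python
def relu(vector):
    return [max(0.0, v) for v in vector]


def dense(vector, weights):
    """Multiply a vector by a square weight matrix given as a list of rows."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


def residual_block(x, w1, w2):
    """Two transformations plus a shortcut that adds the input back to the output."""
    hidden = relu(dense(x, w1))
    transformed = dense(hidden, w2)
    return relu([t + s for t, s in zip(transformed, x)])


zeros = [[0.0, 0.0], [0.0, 0.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
passthrough = residual_block([1.0, 2.0], zeros, zeros)
doubled = residual_block([1.0, 2.0], identity, identity)
```

Even when the learned weights are all zero, the input still reaches the output through the shortcut, which is what eases the propagation of information and gradients in deep networks.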

EfficientNet
EfficientNet was proposed by Tan & Le (2019). This family of neural networks uniformly scales all dimensions of the network (depth, width and input resolution) at the same time using a coefficient called the "compound coefficient". With this approach, EfficientNet achieved high accuracy on classical datasets such as ImageNet while being 8.4× smaller and 6.1× faster on inference than previous convolutional neural networks. The EfficientNet architecture has shown great performance in some recent studies on brain tumors (Tripathy, Singh & Ray, 2023; Nayak et al., 2022). Several EfficientNet models were evaluated, but only the results of the best one, EfficientNetB4, are shown in this article.

VGG16
VGG16 (Simonyan & Zisserman, 2014) is a deep architecture consisting of convolutional layers with filters of dimension 3 × 3 using the ReLU activation function. Interspersed between the convolutional layers, max-pooling layers of size 2 × 2 are used to avoid overfitting and make the network generalize as much as possible. VGG16 has shown good performance in some recent brain tumor studies (Gayathri et al., 2023; Younis et al., 2022). Figure 5 shows the architecture of the network.

InceptionV3
The Inception architecture (Szegedy et al., 2016) tries to build wider networks instead of deeper ones. The main motivation for this change is the tendency of very deep networks to overfit, in addition to the difficulty of propagating the gradient to update the network. Inception has also been used for tumor detection and localization in the last few years (Rastogi, Johri & Tiwari, 2023; Taher et al., 2022).
Inception uses convolutional filters of different sizes at the same level, concatenating the results of all of them to define the input of the next layer of the network.
An example of this is shown in Fig. 6. In this article, Inception v3 has been used.

InceptionResNetV2
As a combination of two of the architectures we have seen, InceptionResNet was created.This neural network combines the ability to create wider networks with the ability of residual blocks to better propagate information across layers (Szegedy, Ioffe & Vanhoucke, 2016).

DenseNet
The last architecture evaluated is DenseNet (Huang, Liu & Weinberger, 2016). We have selected two variants, DenseNet121 and DenseNet201. The DenseNet architecture is shown in Fig. 7. As we can see, the input of each layer is created as a combination of the outputs of all the previous layers so, as with the Inception network, the propagation is done in a much more direct way, avoiding gradient fading when the depth of the network is very large. Several articles using DenseNet have demonstrated good performance in brain tumor tasks (Özkaraca et al., 2023; Alshammari, 2023; Zhu et al., 2022).

EXPERIMENTS AND RESULTS

Experimental setup
The model is designed to harness the synergy between pre-processed images and textual data during the training process. This fusion of multimedia inputs aims to enhance the accuracy and effectiveness of our classification task. The process commences with the preprocessed images, which are subjected to an initial phase within the pre-trained model. This phase is characterized by the utilization of a GlobalAveragePooling2D layer, a pivotal component in feature extraction from the images.
However, what sets our model apart is the subsequent stage, where the outcomes of the image convolution process are combined with textual data. This textual data includes crucial information such as the patient's age and the specific state of tumor resection. This amalgamation of image-based and text-based information forms the core foundation upon which our classification task is executed.
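The fusion step can be illustrated with a small sketch in plain Python: the feature maps are reduced to one value per channel by global average pooling and then concatenated with the patient data. The one-hot coding of the resection status and the helper names are assumptions made for illustration; this is not the Keras implementation used in the experiments.

```python
# Hypothetical one-hot coding of the resection status (GTR, STR, no resection).
RESECTION_CODES = {
    "GTR": [1.0, 0.0, 0.0],
    "STR": [0.0, 1.0, 0.0],
    "NA": [0.0, 0.0, 1.0],
}


def global_average_pooling(feature_maps):
    """feature_maps[h][w][c] -> one mean value per channel c."""
    height, width = len(feature_maps), len(feature_maps[0])
    totals = [0.0] * len(feature_maps[0][0])
    for row in feature_maps:
        for pixel in row:
            for c, value in enumerate(pixel):
                totals[c] += value
    return [t / (height * width) for t in totals]


def fuse(feature_maps, normalized_age, resection):
    """Concatenate pooled image features with the patient's age and resection status."""
    return global_average_pooling(feature_maps) + [normalized_age] + RESECTION_CODES[resection]


toy_maps = [[[1.0, 10.0], [2.0, 20.0]], [[3.0, 30.0], [4.0, 40.0]]]  # 2 x 2 x 2
fused = fuse(toy_maps, normalized_age=0.4, resection="STR")
```

The resulting vector is what a classification head of dense layers would receive as input.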
For a holistic understanding of the model's architecture, please refer to Fig. 8. This figure gives a detailed overview of the model's structure, complete with its parameters and the distinct layers that collectively perform the classification. Notably, these layers remain the same throughout our search for the optimal pre-trained model. However, it is essential to highlight that the manual optimization of these layers is a critical step in fine-tuning the model's performance, a process we undertake to ensure the best results.
Survival and glioma grade have been predicted using two different networks. This decision was made so that each network could be optimized independently, since otherwise there would be a certain dependency between them, for example when trying to avoid overfitting. The most important parameters initially chosen, common to every training run, are: a learning rate of 0.0002, the Adam optimizer, a batch size of 16 and 10 epochs. For the classification, the architecture discussed above has been used, with 256 and 512 neurons for the first and second dense layers respectively, BatchNormalization and a dropout layer with a rate of 0.5.

Results
All networks have been evaluated on the same test set, which is separate from the training and validation sets and has never been seen by the trained neural networks.
In Table 2 results obtained by the different networks can be observed.
Although the best results in predicting the grade were obtained by the InceptionV3 architecture, its results for survival were not very satisfactory. For that reason, the network chosen for further optimization is DenseNet121, since it obtained the most balanced results in both experiments.
Using the same data as in the previous training runs, different tests were performed to find the best hyperparameters and classification layer architecture with the DenseNet121 pre-trained network. As there are two independent experiments, the hyperparameter optimization has been done twice, once for each purpose. Table 3 shows the results obtained in each of the experiments, varying one parameter at a time and leaving all the other parameters at their default values. The best results, and therefore the option chosen for each parameter and experiment, are highlighted.
After determining the best network configuration parameters, we proceeded to evaluate which was the best division of the dataset. To do this, we carried out a Monte Carlo cross-validation process with ten iterations and kept the average value of the evaluated metrics. We performed tests with different train percentage settings. Table 4 shows the results obtained for each of the two trained models. As can be seen, the best results are obtained with the 80-20 configuration, which is therefore taken as the optimal one.
The final model has been trained using the pre-trained DenseNet121 model, with each parameter optimized for the best performance. Specifically, for the grade classification task, we found that a single BatchNormalization layer, 256 neurons in each dense layer, a dropout rate of 0.3, ReLU as the activation function, and a learning rate of 0.0005 produced the best results. For survival prediction, the best configuration had two BatchNormalization layers, 32 neurons in the first dense layer and 64 in the second one, a dropout rate of 0.2, ReLU as the activation function, and a learning rate of 0.0005. In tumor grade classification, which covers both HGG and LGG, our model achieved an accuracy of 97% on the test dataset, as shown in Table 5. These results underscore the robustness and reliability of our approach, positioning it as a valuable tool in the field of medical image analysis for brain tumor diagnosis and prognosis.
A confusion matrix for this classification can be seen in Fig. 9. As we can see, the results obtained are almost perfect, failing only in four images (three LGG images classified as HGG and one HGG image classified as LGG).
In the classification of survival as short, medium or long, an accuracy of 65% has been obtained. Results by class, with precision, recall and F-score, and the global accuracy are shown in Table 6.
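Per-class precision, recall and F-score, together with the global accuracy, can be derived from a confusion matrix as in the sketch below. The matrix used here is hypothetical and does not reproduce the values of Fig. 10 or Table 6.

```python
def per_class_scores(confusion):
    """confusion[i][j] = number of samples of true class i predicted as class j."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n))
    scores = []
    for c in range(n):
        true_positives = confusion[c][c]
        predicted = sum(confusion[i][c] for i in range(n))  # column sum
        actual = sum(confusion[c])                          # row sum
        precision = true_positives / predicted if predicted else 0.0
        recall = true_positives / actual if actual else 0.0
        denominator = precision + recall
        f_score = 2 * precision * recall / denominator if denominator else 0.0
        scores.append((precision, recall, f_score))
    return scores, correct / total


# Hypothetical three-class matrix (short, medium, long survivors).
toy_confusion = [
    [8, 2, 0],
    [1, 7, 2],
    [0, 1, 9],
]
class_scores, accuracy = per_class_scores(toy_confusion)
```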
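The confusion matrix of this multiclass classification can be seen in Fig. 10. This problem is much more complex than the previous one, so we can see several more failures in the classification. The worst results occur in the short survivor class, where 25 cases are incorrectly classified (23 as mid survivor). However, the long survivor cases are correctly classified in almost 96% of the data evaluated. Figs. 11 and 12 show a comparison between the state-of-the-art results and our results. Our models have obtained the best test accuracy in each task, outperforming the previous state-of-the-art results.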

CONCLUSIONS
In this study, we pursued the development of two neural networks with a dual objective: to assess the degree of progression and predict the probability of survival in patients with gliomas. Leveraging transfer learning techniques, we harnessed the power of pre-trained neural networks, fine-tuning them for our specific task. Our dataset comprised a comprehensive set of images drawn from the BraTS 2020 dataset, encompassing 369 unique patient cases. Our chosen neural architectures not only performed image description but also conducted classification tasks concurrently. This dual functionality allowed us to harness classification information for the precise extraction of salient features tailored to each case. To ensure the optimal performance of these neural networks, we conducted an exhaustive investigation, exploring multiple pre-trained models and refining their hyperparameters through an extensive grid search analysis.
The outcomes of our study have yielded compelling results that outperform existing state-of-the-art techniques evaluated on the same dataset. Specifically, we observed a notable improvement in the accuracy of disease grade classification, surpassing the existing benchmarks by more than 2.1%. Furthermore, our survival prediction model demonstrated a remarkable 4.0% enhancement compared to current approaches.
These findings not only underscore the efficacy of our proposed methodologies but also hold significant implications for the clinical field. Our research has the potential to refine the diagnosis and prognosis of glioma patients, ultimately contributing to improved patient care and outcomes. In conclusion, this study represents a significant advancement in the realm of medical image analysis and underscores the promising prospects of leveraging transfer learning and dual-purpose neural networks in the domain of glioma research.
In future research endeavors, we acknowledge the potential value of exploring zero-shot learning on unseen data in the context of brain tumor detection in medical imaging. While our current study has focused on the adaptation and performance of a pre-trained model on a specific dataset, we recognize that zero-shot learning can play a crucial role in assessing the model's ability to generalize to previously unseen cases. Evaluating the model's performance on such novel and heterogeneous datasets can provide valuable insights into its robustness and applicability to a broader range of clinical scenarios.

Figure 4 Example of the shortcut connection used in a residual network (ResNet). In this case, the output of layer 1 is merged directly into the output of layer 3. DOI: 10.7717/peerj-cs.1723/fig-4

Figure 8 Architecture of the model used to carry out the experiments. DOI: 10.7717/peerj-cs.1723/fig-8

Figure 9 Confusion matrix obtained for the prediction of the grade in the test data by the optimal grade model. DOI: 10.7717/peerj-cs.1723/fig-9

Figure 11 Comparison between our results and the state-of-the-art results for the grade classification. DOI: 10.7717/peerj-cs.1723/fig-11

Table 1
Clinical characteristics of LGG and HGG patients.
Valbuena Rubio et al. (2023), PeerJ Comput. Sci., DOI 10.7717/peerj-cs.1723

Table 2
Accuracy results obtained with different networks.

Table 4
Dataset division evaluation to determine the best configuration of train-test split.Bolded scores represent the best values for the two problems: Grade and Survival.

Table 5
Scores obtained for the prediction of the grade in the test data by the optimal grade model.

Table 6
Scores obtained for the prediction of the survival in the test data by the optimal survival model.