Glioblastoma Surgery Imaging—Reporting and Data System: Standardized Reporting of Tumor Volume, Location, and Resectability Based on Automated Segmentations

Simple Summary
Neurosurgical decisions for patients with glioblastoma depend on tumor characteristics in the preoperative MR scan. Currently, this is based on subjective estimates or manual tumor delineation in the absence of a standard for reporting. We compared tumor features of 1596 patients from 13 institutions extracted from manual segmentations by a human rater and from automated segmentations generated by a machine learning model. The automated segmentations were in excellent agreement with manual segmentations and are practically equivalent regarding tumor features that are potentially relevant for neurosurgical purposes. Standard reports can be generated by open access software, enabling comparison between surgical cohorts, multicenter trials, and patient registries.

Abstract
Treatment decisions for patients with presumed glioblastoma are based on tumor characteristics available from a preoperative MR scan. Tumor characteristics, including volume, location, and resectability, are often estimated or manually delineated. This process is time-consuming and subjective. Hence, comparisons across cohorts, trials, or registries are subject to assessment bias. In this study, we propose a standardized Glioblastoma Surgery Imaging Reporting and Data System (GSI-RADS) based on an automated method of tumor segmentation that provides standard reports on tumor features that are potentially relevant for glioblastoma surgery. As clinical validation, we determine the agreement in extracted tumor features between the automated method and the current standard of manual segmentations from routine clinical MR scans before treatment. In an observational consecutive cohort of 1596 adult patients with first-time surgery for a glioblastoma from 13 institutions, we segmented gadolinium-enhanced tumor parts both by a human rater and by an automated algorithm. Tumor features were extracted from segmentations of both methods and compared to assess differences, concordance, and equivalence. The laterality, contralateral infiltration, and the laterality indices were in excellent agreement. The native and normalized tumor volumes had excellent agreement, consistency, and equivalence. Multifocality, but not the number of foci, had good agreement and equivalence. The location profiles of cortical and subcortical structures were in excellent agreement. The expected residual tumor volumes and resectability indices had excellent agreement, consistency, and equivalence. Tumor probability maps were in good agreement. In conclusion, automated segmentations are in excellent agreement with manual segmentations and practically equivalent regarding tumor features that are potentially relevant for neurosurgical purposes. Standard GSI-RADS reports can be generated by open access software.


Introduction
The preoperative MR scan of a patient with a glioblastoma contains essential information that is interpreted by a neurosurgical team for a surgical strategy. Decisions on whether to perform a biopsy or a resection, estimations on how much tumor can be safely removed, the risks of complications and loss of brain functions, and judgements concerning the complexity of the surgery and ensuing pre- and intraoperative diagnostics are imperative for patient outcomes. In addition, the initial scan holds prognostic information, including tumor volume and location [1][2][3], which guides clinical decisions on radiotherapy and chemotherapy and supports patient counseling. In reports of surgical cohorts, multicenter trials, and registries, outcomes are customarily related to measurements of tumor characteristics on the initial scan and related to the outcomes and measurements of other teams [4][5][6][7][8][9][10][11][12][13][14][15]. Furthermore, these reports are pooled in meta-analyses enabling the identification of new patterns in the reported data to guide future clinical decisions [16,17]. Reliable measurements of tumor characteristics are therefore instrumental in patient care and in the development of glioblastoma treatment.
Whereas the response assessment of neuro-oncological treatment mainly focuses on changes in tumor volume over time [18,19] and radiotherapy planning on the clinical target volume on postoperative scans [20][21][22][23], pre-treatment tumor characteristics are of special interest for neurosurgical purposes. In addition to tumor volume, these include measurements of distance to and overlap with brain structures and expected resectability. The current standard is segmentation of the tumor in 3D, while qualitative description, measurement of tumor diameter, and bidimensional products are also in use [24]. These segmentations by human raters have disadvantages. Manual segmentations are time-consuming [25] and therefore expensive. It is common to have inexperienced students or junior investigators as raters for large numbers of segmentations. The level of experience of the rater is an important contributing factor to the accuracy of segmentations [26,27]. Certification of expert raters has not been established. The reproducibility of manual segmentations can be limited, probably due to human error, as attention may fluctuate in monotonous tasks [26][28][29][30][31][32][33]. In addition, segmentation updates or revisions take considerable time.
Automated segmentation algorithms have been developed and compared with manual segmentations as ground truth [34]. Convolutional neural networks [35], in particular employing U-Net [36], dominate the applications. Their performances have been benchmarked on a standardized image dataset (the Brain Tumor Image Segmentation, BraTS [32,34]), using a diagnostic accuracy approach with human rater segmentations as reference. In this approach, the spatial overlap of segmented voxels is typically reported as a Dice score, and the distance of segmentation surfaces as a Hausdorff metric. Nevertheless, this strictly determines the voxel-wise resemblance between an automated segmentation and the reference segmentation. It does not address the clinical utility of these segmentations, and the curated standardized image dataset is not representative of routine scans, which are often of suboptimal quality due to motion artefacts, missing sequences, and other image degradation. Furthermore, brains are not extracted in routine scans, whereas they are in the BraTS dataset.
Standard reporting and data systems (RADS) have been established for several solid tumors, including prostate cancer [37,38], hepatocellular carcinoma [39], head and neck squamous cell carcinoma [40], solitary bone tumors [41], bladder cancer [42], breast cancer [43], lymph node involvement by cancer [44], and lung cancer [45]. These RADS have enabled rules for imaging techniques, terminology for reports, definitions of tumor features, and treatment response, with less practice variation and reproducible tumor classification. Their broad implementation should facilitate collaborations and stimulate evaluation for development and improvement of RADS.
In this study, we determine the agreement in extracted tumor features between automated and manual segmentations from routine clinical MR scans before treatment and describe their discrepancies. We propose a standardized Glioblastoma Surgery Imaging Reporting and Data System (GSI-RADS) to automatically extract tumor features that are potentially relevant for glioblastoma surgery and demonstrate the use of a software module to create standard reports.

Patients and MR Images
Medical University Vienna, Austria (VIE); and Isala hospital, Zwolle, The Netherlands (ZWO), and between 2007 and 2018 from one hospital: St Olav's hospital, Trondheim University Hospital, Norway (STO). Patients gave their informed consent for scientific use of their data, as required for each participating hospital. The study was conducted in accordance with the Declaration of Helsinki, and the protocol was approved by the Medical Ethics Review Committee. Data and images were pseudonymized for analysis.
Patients were identified at each hospital by prospective electronic databases. Part of this cohort was reported earlier to address resectability and comparison of surgical decisions between institutes [46,47]. Descriptive information was collected from the electronic medical records, including age and gender.
Preoperative MR scans were acquired from the hospitals' archival systems and included a 3D heavily T1-weighted gradient-echo pulse sequence at 1 mm isotropic resolution, obtained before and after administration of intravenous gadolinium, and a T2/FLAIR-weighted gradient-echo pulse sequence. MR scan protocols were standardized in hospitals but not identical between hospitals. Scanners from several vendors were in use, including Siemens, model Sonata, Avanto, Skyra, Prisma and mMR; GE medical systems, model Signa HDxt or DISCOVERY MR750; Toshiba, model Titan3T; and Philips, model Panorama HFO or Ingenuity with field strength of 1.5T or 3T. Detailed scan protocols have been described elsewhere [25,48].

Manual Tumor Segmentations
Tumors were manually segmented in 3D by trained raters using an initiation by either a region growing algorithm [26] (Brainlab SmartBrush, BrainLAB AG, Munich, Germany) or a grow cut algorithm [49] (3D Slicer, http://www.slicer.org, accessed on 3 June 2021) and subsequent manual editing. Trained raters were supervised by neuroradiologists and neurosurgeons. The tumor was defined as gadolinium-enhancing tissue on T1-weighted scans, including nonenhancing enclosed necrosis or cysts.

Automated Tumor Segmentations
A segmentation model was trained following a leave-one-hospital-out cross-validation strategy over the 1596 MRI volumes featured in our dataset, using the AGUNet architecture [50]. The model was trained from scratch, using the Dice loss as cost function [51] and an Adam optimizer with an initial learning rate of 1e−3, and training was stopped after 30 epochs without validation loss improvement. Data augmentation, including random horizontal and vertical flips, rotations, and translations, was performed during training to improve generalization.
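As a rough illustration of the cost function named above, the soft Dice loss for a binary segmentation can be sketched as follows. This is a NumPy sketch under stated assumptions, not the training code of the study: the function name `soft_dice_loss` and the smoothing term `eps` are hypothetical and were chosen only for this example.

```python
import numpy as np


def soft_dice_loss(pred, target, eps=1e-6):
    """Return 1 - Dice overlap between predicted probabilities and a binary mask."""
    pred = np.asarray(pred, dtype=float).ravel()
    target = np.asarray(target, dtype=float).ravel()
    intersection = np.sum(pred * target)
    denominator = np.sum(pred) + np.sum(target)
    # eps is an assumption of this sketch; it avoids division by zero
    # when both the prediction and the reference are empty.
    dice = (2.0 * intersection + eps) / (denominator + eps)
    return 1.0 - dice
```

A loss of 0 corresponds to perfect overlap and a loss close to 1 to no overlap at all.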

Extracted Tumor Features
To correlate the tumor segmentations with standard anatomy, patient images were nonlinearly registered to a standard anatomical reference space, here the symmetric Montreal Neurological Institute ICBM2009a atlas (MNI) [52,53], using symmetric image normalization as previously described [54,55]. From both the manual and the automated segmentation of each patient, the following measurements were extracted.
The laterality was defined as the main part of the tumor coinciding with either the left or right hemisphere, or none in the case where a tumor volume was not detected. Contralateral infiltration was defined as binary variable, true if any tumor voxel involved the contralateral hemisphere. The laterality index was defined as an index of tumor distribution between hemispheres, where −1 represents a tumor entirely located in the right hemisphere, 0 represents equal distribution of tumor between both hemispheres, and 1 represents a tumor completely located in the left hemisphere.
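The three laterality measures can be sketched from voxel counts per hemisphere. The text gives only the end points of the laterality index (−1, 0, and 1), so the formula (left − right) / (left + right) used here is an assumption that is consistent with those end points. The function name, the tie-breaking rule, and the simplification that every voxel outside the left-hemisphere mask belongs to the right hemisphere are likewise hypothetical.

```python
import numpy as np


def laterality_features(tumor_mask, left_hemisphere_mask):
    """Laterality, contralateral infiltration, and laterality index of a tumor mask."""
    tumor = np.asarray(tumor_mask, dtype=bool)
    left = np.asarray(left_hemisphere_mask, dtype=bool)
    n_left = int(np.count_nonzero(tumor & left))
    # Simplification of this sketch: everything outside the left mask is "right".
    n_right = int(np.count_nonzero(tumor & ~left))
    total = n_left + n_right
    if total == 0:
        # No tumor volume detected, hence no laterality.
        return {"laterality": "none",
                "contralateral_infiltration": False,
                "laterality_index": None}
    return {
        # Ties are assigned to the left here; this choice is arbitrary.
        "laterality": "left" if n_left >= n_right else "right",
        # True if any tumor voxel lies in the non-dominant hemisphere.
        "contralateral_infiltration": min(n_left, n_right) > 0,
        # Assumed formula: -1 = entirely right, 0 = equal, 1 = entirely left.
        "laterality_index": (n_left - n_right) / total,
    }
```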
The native tumor volume in mL was defined as the number of tumor voxels in patient space times the volume of a tumor voxel in patient space. The normalized tumor volume in mL was defined as the number of tumor voxels in reference space times the volume of a tumor voxel in reference space.
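The volume definition amounts to a voxel count multiplied by the voxel volume. A minimal sketch, assuming the voxel dimensions are available in millimetres (for example from the image header); the function name `tumor_volume_ml` is hypothetical:

```python
import numpy as np


def tumor_volume_ml(mask, voxel_size_mm):
    """Tumor volume in mL: number of tumor voxels times the volume of one voxel."""
    n_voxels = int(np.count_nonzero(mask))
    voxel_volume_mm3 = float(np.prod(voxel_size_mm))
    return n_voxels * voxel_volume_mm3 / 1000.0  # 1 mL = 1000 mm^3
```

The same function applies in patient space (native volume) and in reference space (normalized volume); only the mask and the voxel size differ.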
Multifocality was defined as binary variable, true if more than one contrast-enhancing tumor component was observed and the second contrast-enhancing tumor component had a minimum volume of 0.1 mL and a minimum distance between the first and second largest tumor components of 5 mm. The number of foci was counted as the number of unconnected components.
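The multifocality rule can be sketched with a connected-component search. Several choices below are assumptions, because the text does not specify them: 6-connectivity between voxels, distance measured between voxel centres, and thresholds applied as "greater than or equal to". The brute-force distance computation is suitable only for small masks, and all function names are hypothetical.

```python
from collections import deque

import numpy as np


def connected_components(mask):
    """Return one array of voxel coordinates per 6-connected component."""
    mask = np.asarray(mask, dtype=bool)
    visited = np.zeros(mask.shape, dtype=bool)
    offsets = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    components = []
    for seed in zip(*np.nonzero(mask)):
        if visited[seed]:
            continue
        visited[seed] = True
        queue, voxels = deque([seed]), []
        while queue:  # breadth-first flood fill from the seed voxel
            voxel = queue.popleft()
            voxels.append(voxel)
            for offset in offsets:
                nb = tuple(v + o for v, o in zip(voxel, offset))
                inside = all(0 <= c < s for c, s in zip(nb, mask.shape))
                if inside and mask[nb] and not visited[nb]:
                    visited[nb] = True
                    queue.append(nb)
        components.append(np.array(voxels))
    return components


def multifocality(mask, voxel_size_mm=(1.0, 1.0, 1.0),
                  min_volume_ml=0.1, min_distance_mm=5.0):
    """Return (is_multifocal, number_of_foci) for a 3D binary tumor mask."""
    spacing = np.asarray(voxel_size_mm, dtype=float)
    comps = sorted(connected_components(mask), key=len, reverse=True)
    n_foci = len(comps)
    if n_foci < 2:
        return False, n_foci
    second_volume_ml = len(comps[1]) * spacing.prod() / 1000.0
    first, second = comps[0] * spacing, comps[1] * spacing
    # Minimum centre-to-centre distance between the two largest components.
    diffs = first[:, None, :] - second[None, :, :]
    distance_mm = np.sqrt((diffs ** 2).sum(axis=-1)).min()
    is_multifocal = second_volume_ml >= min_volume_ml and distance_mm >= min_distance_mm
    return bool(is_multifocal), n_foci
```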
The location profile of cortical structures is represented by the percentage of patients with a tumor per cortical parcel in a circular barplot [56]. We demonstrate the location profile of the cohort for two commonly used brain parcellations, Desikan's brain parcellation with 96 parcels based on anatomy [57] and Schaefer's brain parcellation with 17 network classes from 400 parcels based on functional connectivity using a resting state functional MRI [58,59]. Involvement of a patient's tumor with a parcel was defined as any tumor voxel from a patient overlapping with that parcel.
The location profile of subcortical structures is represented by the percentage of patients with a tumor per white matter structure in a circular barplot [56]. The subcortical white matter structures deemed potentially relevant for surgery comprise a selection of tracts in each hemisphere, consisting of the corticospinal tract with a paracentral and three hand segments; the superior longitudinal fasciculus with three divisions; the arcuate fasciculus with a long, anterior, and posterior segment; the frontal aslant tract; the frontal striatal tract; the inferior fronto-occipital fasciculus; the uncinate fasciculus; the inferior longitudinal fascicle; and the optic radiation. The white matter structure definitions from the Brain Connectivity and Behaviour group were used [60]. The involvement of a patient's tumor with a structure was defined as any tumor voxel overlapping with that white matter structure.
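The location profiles for cortical parcels and subcortical structures both reduce to counting, per structure, the patients with at least one overlapping tumor voxel. A minimal sketch, assuming all tumor masks and the labeled atlas are in the same reference space and that label 0 denotes background; the function name `location_profile` is hypothetical:

```python
import numpy as np


def location_profile(tumor_masks, parcellation):
    """Percentage of patients whose tumor overlaps each labeled structure."""
    if len(tumor_masks) == 0:
        raise ValueError("at least one tumor mask is required")
    parcellation = np.asarray(parcellation)
    # Label 0 is treated as background (an assumption of this sketch).
    counts = {int(label): 0 for label in np.unique(parcellation) if label != 0}
    for mask in tumor_masks:
        # Any tumor voxel inside a structure counts as involvement.
        involved = np.unique(parcellation[np.asarray(mask, dtype=bool)])
        for label in involved:
            if label != 0:
                counts[int(label)] += 1
    n_patients = len(tumor_masks)
    return {label: 100.0 * count / n_patients for label, count in counts.items()}
```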
The expected residual tumor volume and the expected resectability index were calculated with a resection probability map of 451 patients with glioblastoma surgery in the left hemisphere and 464 patients in the right hemisphere, as reference, consisting of a subset of the current study population [46]. To calculate the resectability, the tumor segmentation masked the resection probability map. The resection probabilities of the masked voxels were summed to obtain the expected resectable volume. The preoperative tumor volume minus the expected resectable volume resulted in the expected residual tumor volume in mL. A division of the expected resectable volume by the preoperative tumor volume resulted in the expected resectability index, ranging from 0.0 to 1.0. This method has been detailed and validated elsewhere [46].
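The resectability calculation described above can be sketched in a few lines. The sketch assumes that the tumor mask and the resection probability map share the same grid in reference space and that the default voxel volume is 1 mm³ (0.001 mL); the function name is hypothetical, and returning `None` for an empty mask is a choice of this example.

```python
import numpy as np


def expected_resectability(tumor_mask, resection_probability_map, voxel_volume_ml=0.001):
    """Return (expected residual tumor volume in mL, expected resectability index)."""
    tumor = np.asarray(tumor_mask, dtype=bool)
    prob = np.asarray(resection_probability_map, dtype=float)
    tumor_volume_ml = np.count_nonzero(tumor) * voxel_volume_ml
    if tumor_volume_ml == 0:
        return None, None  # undefined without a tumor segmentation
    # Mask the probability map with the tumor and sum the resection probabilities.
    resectable_ml = prob[tumor].sum() * voxel_volume_ml
    residual_ml = tumor_volume_ml - resectable_ml
    index = resectable_ml / tumor_volume_ml  # ranges from 0.0 to 1.0
    return residual_ml, index
```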
The tumor probability map was constructed for the whole population as a 3D volume in standard brain space at 1 mm resolution. For each voxel, the number of patients with tumor at that voxel was divided by the total number of patients.
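A voxel-wise tumor probability map is the mean of the binary masks across patients. A minimal sketch, assuming all masks have already been registered to the same reference grid; the function name is hypothetical:

```python
import numpy as np


def tumor_probability_map(masks):
    """Voxel-wise fraction of patients with tumor, from masks in a common space."""
    stack = np.stack([np.asarray(m, dtype=bool) for m in masks])
    # Averaging booleans over the patient axis gives the fraction per voxel.
    return stack.mean(axis=0)
```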

Software Module and Standard Report
The proposed GSI-RADS software (https://github.com/SINTEFMedtek/GSI-RADS, accessed on 3 June 2021) enables the extraction of the described tumor features from a patient's preoperative MR scan locally. The software has been developed in Python 3 and is compatible with Windows 10 (Microsoft Corp., Redmond, WA, USA), macOS (≥10.13; Apple Inc., Cupertino, CA, USA), and Ubuntu Linux 18.04 (Canonical Group Ltd., London, UK). A minimalistic GUI is provided to the user for specifying the required parameters and running the process. The input for the software consists of a 3D T1-weighted gadolinium-enhanced MRI volume provided as a DICOM sequence or in NIfTI format. A manual segmentation of the tumor can be provided by the user (e.g., in NIfTI format); if not, an automatic segmentation will be generated using the trained model. The output consists of a generated standard report in text (.txt) and CSV format, alongside multiple NIfTI files containing the tumor segmentation as a binary mask (in patient and MNI spaces), the registered MR scan in MNI space, and the anatomical region masks in patient space. The standard report summarizes the extracted tumor features for each patient. These include the tumor laterality, contralateral infiltration, the laterality index, the native and normalized tumor volumes, the presence of multifocality and the number of foci, the percentage of tumor overlap with cortical parcels and subcortical structures, the expected residual tumor volume and expected resectability, and binary maps of the tumor segmentation in patient space and standard brain space.

Statistical Analysis
Differences in laterality, contralateral infiltration, multifocality, number of foci, and cortical and subcortical profiles between automated and manual segmentations were evaluated in contingency tables and tested for significance of paired data using McNemar's test for two classes and Friedman's test for more than two classes. The concordance as a percentage was calculated by dividing the sum of concordant classes by the total number of patients. Differences in native and normalized tumor volumes and expected residual volumes and resectability indices were tested for significance using the Wilcoxon signed-rank test for paired data. Agreement in laterality index, native and normalized tumor volumes, expected residual tumor volumes, and resectability indices between automated and manual segmentations was displayed in histograms, scatter plots, and Bland-Altman plots and calculated as an intraclass-correlation coefficient using a one-way model based on agreement with 95% confidence interval [61][62][63][64]. Equivalence in laterality, contralateral infiltration, native and normalized tumor volumes, multifocality, number of foci, expected residual tumor volumes, and resectability indices was tested using two one-sided tests for the smallest effect size of interest [65]. The smallest effect size of interest for equivalence bounds was considered to be 10% for proportions, 2 mL for volumes, one focus for the number of foci, and 0.1 for expected resectability indices. The product moment correlation coefficient with 95% confidence interval was calculated for the laterality indices, the native and normalized tumor volumes, expected residual tumor volumes, and expected resectability indices between automated and manual segmentations. Voxel-wise agreement was evaluated in tumor probability maps based on automated and manual segmentations. False discovery rates were calculated for the voxel-wise differences using a permutation test, as previously detailed [47,66].
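Two of the simpler quantities in this analysis, the concordance percentage and the Bland-Altman bias with 95% limits of agreement, can be sketched as follows. The sketch assumes the difference is taken as automated minus manual and that the limits are the bias plus or minus 1.96 standard deviations of the paired differences; neither detail is stated in the text, and the function names are hypothetical.

```python
import numpy as np


def concordance_percent(automated_labels, manual_labels):
    """Percentage of patients assigned the same class by both methods."""
    a = np.asarray(automated_labels)
    m = np.asarray(manual_labels)
    return 100.0 * float(np.mean(a == m))


def bland_altman(automated, manual):
    """Return (bias, lower limit, upper limit) of agreement for paired measurements."""
    # Assumed direction of the difference: automated minus manual.
    diff = np.asarray(automated, dtype=float) - np.asarray(manual, dtype=float)
    bias = diff.mean()
    sd = diff.std(ddof=1)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd
```

A bias near zero with narrow limits of agreement indicates that the two methods can be used interchangeably for that feature.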

Patients
A total of 1596 patients were included in this analysis. No scans were excluded based on poor image quality or failed registration. A listing of the populations per hospital is provided in Table 1.

Laterality, Contralateral Infiltration, and the Laterality Index
The automated and the manual segmentations, respectively, identified 785 (49.2%) and 794 (49.7%) patients with left-sided tumors, 792 (49.6%) and 799 (50.1%) patients with right-sided tumors, and 19 (1.2%) and 3 (0.2%) patients in whom no tumor volume was identified and who hence were devoid of laterality, as listed in Table 2. Of the five discordant cases with opposing laterality, four were midline tumors with slightly dissimilar tumor voxel numbers in either hemisphere, and one scan was of poor quality with faint gadolinium enhancement of the tumor, which the automated method failed to detect while producing a false positive segmentation of the choroid plexus contralaterally. In 17 (1.1%) patients, the automated segmentation did not identify a tumor, whereas the human rater did, due to minute tumor size, faint gadolinium enhancement, or poor scan quality. The observed laterality difference was statistically not different from zero (odds ratio: 0.98, 95% CI: 0.89-1.09; p-value = 0.744) and statistically equivalent to zero (95% CI: −0.029 to 0.030; Z = −5.59, p-value < 0.0001). The concordance was 98.6%.
Contralateral infiltration was observed in 430 (26.9%) patients based on the automated segmentations and in 469 (29.4%) based on the manual segmentations, as listed in Table 3. The observed difference in contralateral infiltration was statistically not different from zero (Z = 1.54, p-value = 0.125) and statistically not equivalent to zero (95% CI: −0.007 to 0.056; Z = −1.61, p-value = 0.0541). The concordance was 95.4%.
The distribution of the laterality indices determined by automated and manual segmentations and their correlation are shown in Figure 1A and the Bland-Altman plot in Figure 1B. The correlation coefficient was 0.998 (95% CI: 0.998-0.998). No bias was observed (0.00039, 95% CI: −0.0024 to 0.0032). The lower and upper 95% limits of agreement were −0.11 and 0.11.
This indicates excellent agreement to detect laterality, contralateral infiltration, and the laterality index between the segmentation methods.

Tumor Volumes
The difference between the native and normalized tumor volumes was plotted in Figure 2A,B. The median (interquartile range) of this difference for automated segmentations was −2.6 (6.8) mL and for manual segmentations −3.2 (7.5) mL. Apparently, the standard brain is somewhat larger than the brains of many patients. Therefore, we assessed normalized tumor volume in addition to native tumor volume.
In Figure 2C,E, the native and normalized tumor volumes based on automated and manual segmentations are plotted, indicating excellent agreement. In the Bland-Altman plots in Figure 2D,F, a small negligible systematic bias was observed between the automated and manual segmentations for native (0.4 mL, 95% CI: 0.1-0.7) and normalized tumor volumes (1.2 mL, 95% CI: 0.9-1.5). The limits of agreement were between −11.0 and 11.3 mL for the native tumor volumes and between −11.8 and 14.2 mL for the normalized tumor volumes. This indicates excellent agreement, consistency, and equivalence in native and normalized tumor volume measurements between the automated and manual segmentations.

Location Profile of Cortical Parcels
The location profiles of the 96 cortical parcels from Desikan's brain parcellation for the patient population are shown in Figure 3A,B according to the manual and automated segmentations. The well-known preferred locations of glioblastoma are apparent, and the incidence profiles of cortical involvement are almost identical between the segmentation methods. The correlation of the number of patients with parcel involvement is displayed in Figure 3C. This indicates excellent agreement.
The location profiles of the 400 cortical parcels converging into 17 network classes from Schaefer's brain parcellation for the patient population are shown in Figure 4A,B for the manual and automated segmentations. The incidence profiles of cortical involvement are almost identical between the segmentation methods. The correlation coefficient of the number of patients with parcel involvement as displayed in Figure 4C was 0.998 (95% CI: 0.998-0.999). This indicates excellent agreement in cortical incidence profiles between the segmentation methods.

Location Profile of Subcortical Structures
The location profiles of 17 white matter tracts in either hemisphere for tumor overlap were compared for the whole population between the automated and manual segmentations in Figure 5A,B, respectively. The incidence profiles of subcortical involvement are almost identical between the segmentation methods. The correlation coefficient of the number of patients with tract involvement was 0.999 (95% CI: 0.999-1.000), as displayed in Figure 5C. This indicates excellent agreement between the segmentation methods.

Expected Residual Tumor Volume and Expected Resectability Index
The median (interquartile range) of the expected residual tumor volume was 4.5 (7.2) mL for automated segmentations and 4.7 (7.5) mL for manual segmentations. The difference was small and clinically negligible (0.2 mL, 95% CI: 0.2-0.3; p-value < 0.0001) and fell within the smallest effect size of interest of 2 mL (one-sided test for the upper bound t = −35.6, df = 1575, p-value < 0.0001 and for the lower bound t = 56.7, df = 1575, p-value < 0.0001).
In Figure 6A,C, the expected residual tumor volume and resectability index are plotted, indicating excellent correlation between the automated and manual segmentations. In the Bland-Altman plots in Figure 6B,D, a small negligible bias was observed between the automated and manual segmentations for the expected residual tumor volume (0.5 mL, 95% CI: 0.4-0.5) and for the expected resectability index (−0.005, 95% CI: −0.007 to −0.004). The limits of agreement were between −2.9 and 3.8 mL for the expected residual tumor volume and between −0.07 and 0.06 for the expected resectability index. This indicates excellent agreement, consistency, and equivalence in expected residual tumor volume and resectability index between the segmentation methods.

Tumor Probability Map
The tumor probability maps based on automated and manual segmentations are provided in Figure 7. The maps were almost identical. Of 1.9 million brain voxels, none had an incidence difference with a false discovery rate below 20%. This indicates excellent tumor probability map agreement between the segmentation methods.

Examples of Disagreement between Manual and Automated Segmentations
From inspection of the cases that showed lower agreement between automated and manual segmentations, four categories of disagreement emerged, as demonstrated in

GSI-RADS Software and Standard Report
An example of the generated output is shown in Figure 9. The numerical results are displayed as text in a window and can be exported in CSV file format.
Figure 9. Illustration of the GSI-RADS software and standard report. At the left, the standard report is displayed in text format. At the top right, the patient MRI scan and the patient MRI scan with overlaid automated tumor segmentation are displayed, and at the bottom right, the standard brain space and the registered patient MRI scan with overlaid automated tumor segmentation in standard brain space are demonstrated.

Discussion
The main finding of this study is that automated segmentations are in excellent agreement with manual segmentations regarding extracted tumor features, such as laterality, tumor volume, multifocality, location profiles of cortical parcels and subcortical structures, resectability, and tumor probability maps, which are potentially relevant for neurosurgical planning and reporting. This agreement supports at least equal validity of automated segmentations for these purposes. The generation of automated segmentations is more rapid and more reproducible than manual segmentations, as previously demonstrated [27]. We propose to substitute manual delineations with automated segmentation methods as standard in reports of patients with glioblastoma. To facilitate the distribution of these standard methods, we provide GSI-RADS as software to extract the most relevant tumor features from an MR scan, consisting of tumor laterality, volume, multifocality, location profiles of cortical and subcortical involvement, and resectability.
The use of a uniform method by the neurosurgical community to delineate a tumor and to extract tumor features would be an important step towards standardization across studies and between neurosurgical teams. A suitable segmentation method for neurosurgical use has several requirements: the method should be user friendly, rapid, scalable, accurate, reproducible, affordable, and valid [67]. The present software module is designed to minimize user interaction to import the DICOM scan. The processing duration of the automated method is a fraction of the manual method, which typically takes 30 min per patient [27], deterring to scale to cohorts larger than a few hundred patients. In absence of a ground truth for the exact tumor location, the accuracy of either method remains undetermined. Histopathological and molecular determination of tumor presence based on detailed multiregion sampling would theoretically be the ultimate ground truth [68]. This is infeasible for a patient cohort for obvious reasons. A second-best ground truth is postmortem investigation, although this would restrict a correlation to a recent last scan, and results may not extrapolate to the early stage of disease. An alternative ground truth could be an ensemble of segmentations by multiple expert raters, but this takes considerable time and expense restricted to a limited numbers of patients [69]. Therefore, we took a pragmatic approach and with equivalence between the segmentation methods, the question on the better method can remain unanswered. Automated segmentations are entirely Figure 9. Illustration of the GSI-RADS software and standard report. At the left, the standard report is displayed in text format. 
At the top right, the patient MRI scan and the patient MRI scan with overlaid automated tumor segmentation are displayed, and at the bottom right, the standard brain space and the registered patient MRI scan with overlaid automated tumor segmentation in standard brain space are shown.

Discussion
The main finding of this study is that automated segmentations are in excellent agreement with manual segmentations regarding extracted tumor features, such as laterality, tumor volume, multifocality, location profiles of cortical parcels and subcortical structures, resectability, and tumor probability maps, which are potentially relevant for neurosurgical planning and reporting. This agreement supports at least equal validity of automated segmentations for these purposes. The generation of automated segmentations is more rapid and more reproducible than manual segmentations, as previously demonstrated [27].
We propose to substitute manual delineations with automated segmentation methods as standard in reports of patients with glioblastoma. To facilitate the distribution of these standard methods, we provide GSI-RADS as software to extract the most relevant tumor features from an MR scan, consisting of tumor laterality, volume, multifocality, location profiles of cortical and subcortical involvement, and resectability.
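As an illustration of how such features can be derived from a binary segmentation, the following minimal sketch computes volume, a laterality split, and a simple focality count. It is not the GSI-RADS implementation: the function names, the representation of the mask as a set of voxel coordinates, the midline index, and the use of 6-connected components as a proxy for multifocality are all assumptions made for this example.

```python
from collections import deque


def tumor_volume_ml(voxels, spacing_mm):
    """Volume in milliliters: voxel count times voxel volume (1 ml = 1000 mm^3)."""
    sx, sy, sz = spacing_mm
    return len(voxels) * sx * sy * sz / 1000.0


def laterality_percent(voxels, midline_x):
    """Percentage of tumor voxels on the low-x and high-x side of a midline index.

    Which side is anatomically left or right depends on the image orientation,
    so this hypothetical helper only reports the two sides of the index.
    """
    if not voxels:
        return 0.0, 0.0
    low = sum(1 for x, _, _ in voxels if x < midline_x)
    low_pct = 100.0 * low / len(voxels)
    return low_pct, 100.0 - low_pct


def count_components(voxels):
    """Number of 6-connected components, found by breadth-first search.

    Assumption: a component count stands in for multifocality here; the
    definition used in the paper may differ (e.g., size or distance criteria).
    """
    remaining = set(voxels)
    steps = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    n_components = 0
    while remaining:
        n_components += 1
        queue = deque([remaining.pop()])
        while queue:
            x, y, z = queue.popleft()
            for dx, dy, dz in steps:
                neighbor = (x + dx, y + dy, z + dz)
                if neighbor in remaining:
                    remaining.remove(neighbor)
                    queue.append(neighbor)
    return n_components


# Toy mask: a 2 x 2 block of voxels plus one isolated voxel far away.
mask = {(x, y, 0) for x in range(2) for y in range(2)} | {(10, 10, 0)}

volume = tumor_volume_ml(mask, spacing_mm=(1.0, 1.0, 1.0))
sides = laterality_percent(mask, midline_x=5)
foci = count_components(mask)
print(volume, sides, foci)
```

Because every step is deterministic, running the same code on the same segmentation always yields the same feature values, which is the reproducibility argument made in the text.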
The use of a uniform method by the neurosurgical community to delineate a tumor and to extract tumor features would be an important step towards standardization across studies and between neurosurgical teams. A suitable segmentation method for neurosurgical use has several requirements: the method should be user friendly, rapid, scalable, accurate, reproducible, affordable, and valid [67]. The present software module is designed so that user interaction is limited to importing the DICOM scan. The processing duration of the automated method is a fraction of that of the manual method, which typically takes 30 min per patient [27] and therefore does not scale well to cohorts larger than a few hundred patients. In the absence of a ground truth for the exact tumor location, the accuracy of either method remains undetermined. Histopathological and molecular determination of tumor presence based on detailed multiregion sampling would theoretically be the ultimate ground truth [68]. This is infeasible for a patient cohort for obvious reasons. A second-best ground truth is postmortem investigation, although this would restrict the correlation to the most recent scan before death, and results may not extrapolate to the early stage of disease. An alternative ground truth could be an ensemble of segmentations by multiple expert raters, but this takes considerable time and expense and is therefore restricted to a limited number of patients [69]. Therefore, we took a pragmatic approach: given equivalence between the segmentation methods, the question of which method is better can remain unanswered. Automated segmentations are entirely reproducible and free, providing segmentations that can be updated through batch processing, whereas human raters are subject to disagreement between and within raters, yielding irreproducible data from a task that is not trivial in time and expense.
In this study, we demonstrate that automated segmentations are equivalent to manual segmentations regarding neurosurgical tumor characteristics; hence, they are equally valid. Either segmentation method may yield questionable results in a small subset of atypical tumors, characterized by faint contrast enhancement with large nonenhancing tumor portions, large cysts, or image artefacts. In the absence of a ground truth, we would argue that the reproducibility of an automated segmentation is preferable to arduous manual assessment, even in such less well-defined cases. Likewise, a pragmatic and reproducible standard for tumor volume, focality, location, and resectability based on automated segmentation is preferable to manual delineation.
Our finding that automated processing by a 'machine' can replace a tedious and error-prone task performed by a 'human' adds to an already long list of such examples [70][71][72][73][74]. From this perspective, our findings are unsurprising and fit in with the development of successful implementations of processes automated by deep learning.
Thus far, no other applications have been developed to extract tumor characteristics for use in glioblastoma surgery, although several applications were developed to segment the tumor in scans. The Brain Tumor Image Analysis tool (BraTumIA) has been developed to segment three brain tumor compartments using four scan sequences [33,75] and has been shown to have good agreement with manual tumor volumes on preoperative scans. The Pearson's correlation coefficient between manual and BraTumIA tumor volumes was 0.8 based on 19 patients [75] and 0.88 based on 58 patients [76], albeit with a systematic overestimation. In addition, the BraTS challenge, which aims to improve disease diagnosis, treatment planning, monitoring, and clinical trials by means of reliable tumor segmentation, has been held yearly since 2012. Participants have applied more than 200 models over the years. Many models were updated versions of previous submissions. As far as we are aware, none of these models has been used to generate tumor characteristics for neurosurgical practice. Therefore, the quest for the best performance in a common dataset by ranking of Dice score is not necessarily representative of clinical practice. In this study, we sought to address whether automation could replace manual labor without compromising validity in terms of tumor features and to make the software readily available for others to use and validate further, both clinically and technically. Future improvements of automated methods can be easily integrated in updated software.
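The two agreement measures mentioned above can be sketched as follows. This is a minimal illustration using only the Python standard library, not code from any of the cited tools; the function names and the toy volumes are assumptions for the example. It also shows why a high Pearson correlation can coexist with a systematic overestimation: a constant offset leaves the correlation unchanged.

```python
import math


def dice_score(mask_a, mask_b):
    """Dice score 2|A ∩ B| / (|A| + |B|) for two masks given as sets of voxels."""
    a, b = set(mask_a), set(mask_b)
    if not a and not b:
        return 1.0  # convention chosen here: two empty masks agree perfectly
    return 2.0 * len(a & b) / (len(a) + len(b))


def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Toy masks: three voxels each, two of them shared.
manual_mask = {(0, 0, 0), (1, 0, 0), (2, 0, 0)}
auto_mask = {(1, 0, 0), (2, 0, 0), (3, 0, 0)}
dice = dice_score(manual_mask, auto_mask)

# Hypothetical volumes in ml: the automated values are always 5 ml larger.
manual_volumes = [10.0, 20.0, 30.0, 40.0]
auto_volumes = [15.0, 25.0, 35.0, 45.0]
r = pearson_r(manual_volumes, auto_volumes)
mean_bias = sum(a - m for a, m in zip(auto_volumes, manual_volumes)) / len(manual_volumes)
print(dice, r, mean_bias)
```

In this toy case the correlation is perfect although every automated volume overestimates the manual one, so a correlation coefficient alone does not establish agreement in absolute terms.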
A strength of this study is good external validity given the mixture of institutions, scanners, scan protocols, and patients. Until standardized scan acquisition protocols are implemented in neuro-oncological care [77], automated segmentation methods should resolve this practice variation. Another strength is the relatively large dataset for training the automated method. A limitation is that we used manual segmentations from one trained rater per tumor, although this probably represents current practice in neurosurgical reports of tumor characteristics.
A practical implication is that standard reports for glioblastoma surgery can now be generated by GSI-RADS. Obviously, improved patient outcomes cannot be expected from better reporting in itself. Indirectly, improved outcomes may result from more accurate data-driven decisions on the use of preoperative techniques such as DTI-based tractography, functional MRI, transcranial stimulation, and intraoperative stimulation mapping. Another indirect effect may be the facilitation of consultation between neurosurgeons and teams, and possibly changes in referral patterns, through better recognition of complex surgical cases regarding tumor location and eloquence. An example would be the identification of a more complex tumor near the arcuate fascicle, for instance, by a lower expected resectability index and infiltration of this tract, indicating additional preoperative diagnostics to detail the relation between the tumor and the tract and the use of intraoperative stimulation mapping to safely maximize tumor removal. As such, the automated methods hold potential for development of a quantitative standard for eloquence. Reliable definitions of pretreatment tumor characteristics from MR scans may also facilitate less biased comparisons across institutions, studies, or quality registries. Furthermore, prognostic information, surgical treatment evaluation, and response assessment may indirectly improve the risk stratification of patient cohorts. Finally, the standardized reports could speed up the learning curve and serve in the education and training of neurosurgeons.
In future efforts, several directions are important to explore. The automated segmentations can be extended to other pathology, such as lower-grade nonenhancing glioma, brain metastasis, and meningioma. Alternative automated methods can be benchmarked against the current results. The presented automated method can be trained with data from additional patients and institutions. New tumor features will be added to the standard report, such as different aspects of multifocality and the infiltration and disconnection of white matter pathways. These new measures should be compared with patient outcomes for evaluation of their clinical use [46]. This may, for instance, result in a quantitative assessment of risk for surgical complications and risk for early tumor progression. Other tumor compartments can be included, such as the T2/FLAIR hyperintense region, necrotic or ischemic tissue, hemorrhage, cyst fluid, and ultimately molecular heterogeneity and metabolic activity. In addition, reliable tumor segmentations over time and at different stages of disease would be instrumental to provide standardized reports of postsurgical evaluation and treatment response assessment. Finally, distribution of the software should be available for multiple platforms and environments, such as a standalone web-based application.

Conclusions
Automated segmentations are in excellent agreement with manual segmentations and are practically equivalent regarding tumor features that are potentially relevant for neurosurgical purposes. A standard GSI-RADS report is proposed for these tumor features, including the laterality, volume, multifocality, location, and resectability (https://github.com/SINTEFMedtek/GSI-RADS, accessed on 3 June 2021).