Calibration of the EBT3 Gafchromic Film Using HNN Deep Learning

To achieve a dose distribution conformal to the target volume while sparing normal tissues, intensity modulation with steep dose gradients is used for treatment planning. To deliver such treatment successfully, high spatial and dosimetric accuracy are crucial and need to be verified. With high 2D dosimetry resolution and a self-development property, the Ashland Inc. product EBT3 Gafchromic film is a widely used quality assurance tool designed especially for this purpose. However, the film should be recalibrated each quarter due to the “aging effect,” and calibration uncertainties always exist between individual films even in the same lot. Artificial neural networks (ANN) have recently been applied to many fields. If physicists collect calibration data over time, the accumulated data can serve as a substantial ANN input for film calibration. We therefore use the Keras functional Application Program Interface to build a hierarchical neural network (HNN), with net optical densities, pixel values, and inverse transmittances as inputs, to reveal the delivered dose, and we train the neural network with deep learning. For comparison, the film dose was also calculated using the red-channel net optical density with power function fitting, which was taken as the conventional method. The results show that the percentage error of the film dose using the HNN method is less than 4% for the aging effect verification test and less than 4.5% for the intralot variation test; in contrast, the conventional method could yield errors higher than 10% and 7%, respectively. This HNN method to calibrate the EBT3 film could be further improved by adding training data or adjusting the HNN structure. The model could help physicists spend less calibration time and reduce film usage.


Introduction
In addition to dose painting, various strategies of radiation therapy with steep dose gradients are used to deliver a nonuniform dose to a clinical target with reduced toxicity to normal tissues [1,2]. To ensure both spatial and dosimetric accuracy, quality assurance (QA) is vital for treatment centers. Several two-dimensional dosimetry tools have been introduced to expedite this QA, including portal dosimetry devices [3,4], matrix detectors [5–9], and film dosimeters [4,10–12]. Of these, the Gafchromic EBT film is widely used, largely due to its self-development characteristic, near dose-to-water equivalence [13], high spatial resolution, rereadability, relatively uniform dose-response across a wide range of photon energies [2,11,14], and inexpensive techniques for read-out using commercially available flatbed document scanners [15].
Several generations of the Gafchromic film have been developed, but only the EBT2 and EBT3 film models are recommended by Ashland for verifying all beam-modulated techniques [16]. This is because spatial nonhomogeneity is corrected by their yellow marker dye [15,17–22], so they are less sensitive to the visible spectrum and are available for repeated scans [23,24]. With a matte polyester substrate to avoid the formation of Newton's rings [25–28], the EBT3 film has an active layer composition and dosimetric properties similar to those of EBT2 [26], with insignificant side dependence of the film [29]. Based on the Ashland report, the effective atomic numbers of EBT2 and EBT3 films are around 6.8 and 7.3, which are approximately water equivalent, increasing their suitability for patient dosimetry [24,30].
However, the calibration responses and the fitting parameters change due to sensitivity variations between film lots and the so-called film "aging effect" that changes film sensitivity with shelf life [31–35]. The film aging effect can be diminished if the background is subtracted using the net optical density, a conventional calibration method, but periodic recalibration (e.g., once per quarter) is still recommended [33,35].
Film calibration can be done by relating the net optical density [36], the pixel value [24], or the inverse transmittance [37] to the delivered dose with the appropriate equations. Recently, Zhuang et al. used the pytorch artificial neural network (ANN) platform (https://pytorch.org/) with inputs of optical densities for calibrating different EBT3 lots [38]. They subsequently ran a trial with 400 training inputs from 6 films, each from a different lot, and the mean square errors (MSE) of the test batches reached 18 cGy. In our study, a hierarchical neural network (HNN) was built using the Keras functional Application Program Interface (https://keras.io/guides/functional_api/). Hierarchical networks, based on a hierarchical organization, consist of several ANN subnets, each of which deals with a specific aspect of the input data. The subnet models with some input variables determine the overall training pattern [39]. HNN was previously used for survival analysis [40]. Here, it is used to find a solution for the film aging effect and intralot variation.

Materials and Methods
Gafchromic EBT3 films from different lots were scanned by an Epson 10000XL scanner in a fixed portrait orientation to create 127 dpi tiff images before and after calibration delivery, referred to as prescan and postscan, respectively. Just before calibration delivery, a 6 MV photon beam from an Elekta Synergy accelerator was quickly calibrated at a depth of 5 cm (SSD 95 cm, field size 10 × 10 cm², and 1 cGy/MU) according to AAPM TG reports [41–44]. Then, the film was tightly sandwiched in a 30 cm cubic RW3 polystyrene phantom, and the cubic phantom was placed on top of another 10 cm of backup plates. The film plane was parallel to the beam central axis, with its midline (the line that longitudinally separates the film into two equal parts) oriented to coincide with the central beam.
A dose in the daily treatment range was delivered to the film, and the film dose at the midline was calculated from the delivered MU and the verified percentage depth doses [24,36,37,45–47]. After 24 hours, each film was rescanned at the same 127 dpi, and all the tiff images were analyzed using Matlab and Keras. Lot No. 03211802 EBT3 films (lot C) were exposed 17 times on different dates within 20 months for film calibration. The time interval between the 16th and 17th calibrations was 4 months. The first 11 and the first 16 calibrations of lot C films were collected as the portion I and portion II training data, with 2394 and 3762 inputs, respectively. The portion II training data were used to manage the film aging effect, since this requires a longer collection period, and the 17th calibration film was used as the test data. Portion I was used for the intralot variation verification test with lot A (lot No. 07191602, 7 calibrations) and lot B (lot No. 03071603, 3 calibrations). For comparison with the developed HNN method, the conventional method is introduced below.
2.1. Conventional Method. The red-channel net optical density (R-NOD) of the calibration film can be written as

$$\text{R-NOD} = \log_{10}\left(\frac{RPV_{pre}}{RPV_{post}}\right), \tag{1}$$

where $RPV_{pre}$ and $RPV_{post}$ are the red-channel pixel values (PV) extracted from the prescan (background) image and the postscan image, respectively. The R-NOD extracted from the midline of the film is fitted to the delivered dose using the power function of equation (2), where $D_{fit}$ is the fitted dose and $a$, $b$, and $c$ are fitting parameters. The fitting process was performed twice, the first time with $a$ and $b$ not bound but $c$ bound between 1 and 3. After the fitted $c$ value was obtained, it was rounded to the nearest tenth. Then, the second fitting process was started with the same parameter values as the first. The percentage error between the calculated dose and the delivered dose, $E_{tr}$, using the conventional method can be written as

$$E_{tr} = \frac{D_c - D_d}{D_d} \times 100\%, \tag{3}$$

where $D_c$ is the dose calculated by equation (2) and $D_d$ is the delivered dose. The films of the first calibration of lots A and B and the 16th calibration of lot C were used to calculate the fitting parameters individually through the power function of equation (2). All the other films of lots A and B and the 17th calibration of lot C were used for the verification test.
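As a minimal sketch of the two quantities this subsection relies on, the snippet below computes an R-NOD from a pair of red-channel pixel values and a percentage error from a calculated and a delivered dose. The function names and the numbers are illustrative assumptions, not the authors' code or data, and the sign convention of the error (calculated minus delivered, relative to delivered) is also an assumption. The power-function fit itself is not reproduced here.

```python
import math


def red_net_optical_density(rpv_pre, rpv_post):
    """R-NOD = log10(RPV_pre / RPV_post) for one pixel or one ROI mean."""
    if rpv_pre <= 0 or rpv_post <= 0:
        raise ValueError("pixel values must be positive")
    return math.log10(rpv_pre / rpv_post)


def percentage_error(calculated_dose, delivered_dose):
    """Signed percentage error of a calculated dose (assumed convention)."""
    return (calculated_dose - delivered_dose) / delivered_dose * 100.0


# Illustrative values only; these are not measurements from the study.
r_nod = red_net_optical_density(rpv_pre=42000, rpv_post=30000)
error = percentage_error(calculated_dose=206.0, delivered_dose=200.0)
print(f"R-NOD = {r_nod:.4f}, percentage error = {error:.1f}%")
```

Because the film darkens with dose, the postscan pixel value is lower than the prescan value, so the R-NOD is positive for an exposed film.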
where W represents one of the R, G, and B channels. Some of the input parameters may depend on each other; however, all have been used for film calibration with different techniques [24,36,37] since each has its own advantages. The red-channel PV has the highest sensitivity to the dose range of daily treatment, while the green-channel PV and blue-channel PV have higher dynamic responses to higher delivered doses [37,48]. As the earliest used parameter, with many published papers, R-NOD was gradually replaced by the IT of the RGB used for the three-channel calibration technique [36,37]. The three-channel background PV was intended to manage the film aging effect. These ten kinds of inputs were reorganized as five input groups: (1) R-NOD,   Figure 2 illustrates the detailed structure.
"Selu," "elu," "relu," "softplus," and "linear" are the activation functions used. The initial weights were set by a uniform random number generator with seed 435. The "Adam" optimization algorithm, an extension of stochastic gradient descent, is used in place of classical stochastic gradient descent to update the network weights more efficiently and steadily. Since the training deals with a multiple-regression problem, a mean squared error (MSE) objective function is minimized by the "Adam" optimizer. MSE is also used as a metric to evaluate the performance of the model. The other two metrics used in this HNN are "mean absolute error" (MAE) and "accuracy." The fitting process was then executed with a batch size of 20 for 500 epochs. The validation split is 0.45; that is, 45% of the training data was held back for validation.
The number of hidden layers, the number of neurons, and the activation functions were systematically adjusted so that all the calculated doses converged to the delivered doses, which can be examined through the MSE and MAE values and a plot of the delivered dose against the calculated dose. The training results using portion I films are shown in Figure 3, where the red line is one calibration data set of portion I.
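The training setup described above can be sketched with the Keras functional API as follows. This is only an illustration under stated assumptions: the grouping of the ten inputs, the number of layers, and the layer widths are hypothetical (the actual structure is given in Figure 2, which is not reproduced here). Only the activation names, the seed 435, the "Adam" optimizer, the MSE loss, the metrics, the batch size, the epoch count, and the validation split are taken from the text.

```python
import keras
from keras import layers

SEED = 435  # seed quoted in the text

# Hypothetical grouping of the ten inputs; the study uses five groups whose
# exact composition is not recoverable from the text.
GROUP_SIZES = {"r_nod": 1, "pv_rgb": 3, "it_rgb": 3, "background_pv_rgb": 3}


def uniform_init():
    """Fresh uniform random initializer with the quoted seed."""
    return keras.initializers.RandomUniform(seed=SEED)


def build_hnn_sketch(group_sizes=GROUP_SIZES):
    """One small subnet per input group, merged into a single dose output."""
    inputs, branches = [], []
    for name, size in group_sizes.items():
        group_in = keras.Input(shape=(size,), name=name)
        # Layer widths below are placeholders, not the published values.
        branch = layers.Dense(8, activation="selu",
                              kernel_initializer=uniform_init())(group_in)
        branch = layers.Dense(4, activation="elu",
                              kernel_initializer=uniform_init())(branch)
        inputs.append(group_in)
        branches.append(branch)

    merged = layers.concatenate(branches)
    hidden = layers.Dense(16, activation="relu",
                          kernel_initializer=uniform_init())(merged)
    hidden = layers.Dense(8, activation="softplus",
                          kernel_initializer=uniform_init())(hidden)
    dose = layers.Dense(1, activation="linear", name="dose",
                        kernel_initializer=uniform_init())(hidden)

    model = keras.Model(inputs=inputs, outputs=dose, name="hnn_sketch")
    model.compile(optimizer="adam", loss="mse", metrics=["mae", "accuracy"])
    return model


model = build_hnn_sketch()

# Training call with the settings quoted in the text, where x is a dict of
# arrays keyed by input name and y holds the delivered doses:
# model.fit(x, y, batch_size=20, epochs=500, validation_split=0.45)
```

Giving each input group its own subnet before merging mirrors the hierarchical idea of the HNN: each subnet handles one aspect of the input data, and the merged layers combine them into one dose estimate.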

Intralot Verification Test
Due to sensitivity variations between lots, the dose calculated by the trained HNN model for the 1st calibration films of lots A and B lies clearly away from the line on which the calculated doses equal the delivered doses. To make the lot C training results work for lots A and B, the film doses of the 1st calibration films of lots A and B calculated by the trained HNN are refitted to the delivered dose $D_d$ using equation (5), where $D_{rfit}$ is the fitted dose; $e$, $f$, $g$, and $k$ are fitting parameters; and $H$ is the dose calculated by the trained HNN.
The fitting results are shown in Figure 4. Equation (5) is then used to calculate the film doses of lots A and B.

Aging Effect Verification Test
The 17th calibration film of lot C is used for the verification test of film aging. The dose of the 17th verification film is calculated by the HNN trained on the films of portion II, and the calculated dose is compared with the delivered dose.
Here, the refit (equation (5)) is not performed, since the test lot and training lot are the same lot.

BioMed Research International
Applying the deep learning HNN method, the $H$ values of the first calibration films of lots A and B were used to calculate the fitting parameters of equation (5), and all the other films of lots A and B and the 17th calibration film of lot C were used for the verification tests of the interlot and aging effects, respectively. The percentage error between the calculated dose and the delivered dose, $E_{hnn}$, can be written as

$$E_{hnn} = \frac{D_{rd} - D_d}{D_d} \times 100\%, \tag{6}$$

where $D_{rd}$ is the film dose calculated through equation (5) for lots A and B, and $D_{rd} = H$ for the lot C film of the 17th calibration. Without the refit, the $H$ values lie apart from the black line (Figure 5). After the refitting procedure (equation (5)), the calculated dose (red line) approaches the black line.

Results and Discussion
To calculate the fitting parameters of equations (2) and (5), 228 and 114 data points were extracted from the 1st calibration films of lots A and B, respectively. For the verification test, the complete 684 and 228 inputs of lots A and B, respectively, excluding the 1st calibration, were used to calculate $D_{rd}$ (equation (6)) and compared with the delivered doses (Figure 6). The percentage error, $E_{hnn}$, is within 4.5% using the deep learning HNN method. The percentage error, $E_{tr}$, using the conventional R-NOD method is generally also within 4.5% but exceeds 7% and 5% for delivered doses around 70 cGy for lot A and lot B, respectively.
The 17th calibration film of lot C is used for the verification test of the aging effect. The calculated dose through the

Figure 6: Percentage differences between the calculated dose (equation (2), equation (5)) and the delivered dose for the verification test of lot A and lot B films.

Compared with the study of Zhuang et al., which used pytorch with a 2-6-3-1 pretraining ANN and a 2-6-3-9-1 protraining ANN, 400 NOD training inputs, and 80 NOD test inputs [38], our study used more training and test data (thousands vs. hundreds). Our results also show a lower averaged verification test MSE, 6.4 cGy vs. 10.4 cGy.
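For readers who want to reproduce this kind of comparison on their own verification films, the MSE and MAE between delivered and calculated doses can be computed as in the sketch below. The dose values are made up for illustration and the function names are assumptions. Note that an MSE computed from doses in cGy carries units of cGy squared.

```python
def mean_squared_error(delivered, calculated):
    """Average of squared dose differences."""
    return sum((c - d) ** 2 for d, c in zip(delivered, calculated)) / len(delivered)


def mean_absolute_error(delivered, calculated):
    """Average of absolute dose differences."""
    return sum(abs(c - d) for d, c in zip(delivered, calculated)) / len(delivered)


# Illustrative doses in cGy; these are not data from the study.
delivered = [50.0, 100.0, 200.0, 300.0]
calculated = [52.0, 98.0, 203.0, 299.0]

mse = mean_squared_error(delivered, calculated)
mae = mean_absolute_error(delivered, calculated)
print(f"MSE = {mse:.2f}, MAE = {mae:.2f}")
```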
Examining the high-dose and low-dose ends of the training data shown in Figure 3, it can be seen that when one end converges well, the other diverges. The final chosen model is a compromise between the two. Several changes may improve the HNN modeling in future work: (1) modifying the activation functions, hidden layers, and neuron numbers of the HNN model; (2) using R-NOD to separate the delivered doses into several ranges, each with its own well-trained HNN model; and (3) adding new, appropriate parameters to the HNN model.
If the film used is not from the training lot, its $H$ value will generally depart substantially from the "perfect" line (Figure 5) due to intralot variability of the film sensitivity, which requires physicists to calibrate each new film lot at least once (by equation (5)). To apply our HNN model, equation (5) was used for the new lot refit, and it proved to be workable (Figure 6). However, equation (5) may not be feasible if the film sensitivity of a new lot varies so much that the $H$ values are far from the "perfect" line. Future research could consider resolving intralot variation by putting the new lot calibration parameters into the HNN training model and giving them higher weights.

Conclusions
A deep learning HNN method that calibrates the EBT3 film with better accuracy than the conventional R-NOD method is presented. For the aging effect, the percentage error of the HNN method is within 4% and proved to be unaffected, while the averaged percentage error of the conventional R-NOD method is about 6.8%. This new technique can be improved by adding new calibration data to the HNN training system whenever physicists perform a recalibration. By accumulating calibration data for the HNN method, physicists could spend less calibration time and reduce film usage.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that there is no conflict of interest regarding the publication of this paper.