Machine learning for real-time optical property recovery in interstitial photodynamic therapy: a simulation-based study

Abstract: With the continued development of non-toxic photosensitizer drugs, interstitial photodynamic therapy (iPDT) is showing more favorable outcomes in recent clinical trials. iPDT planning is crucial to further increase treatment efficacy. However, it remains a major challenge to generate a high-quality, patient-specific plan due to uncertainty in the tissue optical properties (OPs), µ_a and µ_s. These parameters govern how light propagates inside tissues, and any deviation from the planning-assumed values during treatment could significantly affect the treatment outcome. In this work, we increase the robustness of iPDT against OP variations by using machine learning models to recover the patient-specific OPs from light dosimetry measurements and then re-optimizing the diffusers' optical powers to adapt to these OPs in real time. Simulations on virtual brain tumor models show that re-optimizing the power allocation with the recovered OPs significantly reduces uncertainty in the predicted light dosimetry for all tissues involved.


Introduction
When ranking malignancies by years of potential life lost, central nervous system (CNS) and malignant brain tumors rank among the top three for both men and women [1]. Between 1990 and 2016, the age-standardized incidence rate of CNS cancers, including malignant brain tumors, increased globally by 15%, resulting in over 200,000 deaths worldwide. Incidence rates increase with increasing socio-demographic index and vary regionally, with East Asia presenting the highest rates [2]. Interstitial photodynamic therapy (iPDT) has shown promise in treating deep-seated tumors (>1 cm) [3], mainly due to the development of novel non-toxic and highly selective photosensitizer (PS) drugs [4], increased utilization of fiber-optic probes in tissue characterization and diagnostics [5,6], and the ability to keep tissue death mechanisms restricted to the intended target volume [3]. Moreover, PDT has been shown to induce an immune response, which can prevent or slow tumor re-growth as well as target micro-invasions beyond the treatment planning volume [7]. However, several challenges remain on the path to wider clinical translation of iPDT. At the time of submission of this manuscript, only 3 of 21 ongoing PDT clinical trials were for solid tumors focusing on interstitial light delivery for PDT: NCT03735095 targets locally advanced lung cancers, NCT03727061 locally advanced head and neck tumors, and NCT03897491 newly diagnosed supratentorial IDH wild-type glioblastoma.
One major reason for the lack of trials is the challenge of creating personalized treatment plans that can accurately predict the outcome of the treatment and enhance coverage of the target volume while minimizing damage to surrounding organs-at-risk (OAR). To attain this goal, a treatment plan determines a good placement of light diffusers, while minimizing the number of those diffusers and intelligently allocating the power across them [8][9][10]. However, patient specificity is hard to achieve. Not only are biological tissues optically heterogeneous (optical properties differ across tissues), but there are also patient-specific uncertainties in the optical properties (OPs) for each tissue [11]. Inter-tissue heterogeneity has been addressed in prior work through the use of Monte-Carlo simulations for light propagation that accurately model the physics [8]. However, patient-specific uncertainty is still an open problem and is the focus of this work.
Such uncertainties are caused by several factors [12,13]. The absorption coefficient of the tissues, µ_a, is controlled by the availability of absorbing chromophores (blood, water, fat, yellow pigments, etc.), so a change in blood flow during a PDT treatment alters the absorption coefficient and, in turn, the achieved PDT dose [13]. Moreover, the scattering probability is a function of cellular density, so edema and cell swelling that occur during treatment affect the number of scattering events taking place per unit length, and effectively the scattering coefficient, µ_s. This in turn affects the treatment outcome. For example, our simulation-based results on glioblastoma multiforme (GBM) iPDT showed that if the optical properties varied by a standard deviation of only 10% from the values assumed during treatment planning, the light dose distribution to the surrounding OAR can be underestimated by up to 75%. If there is enough PS and oxygen concentration in the surroundings to induce cell death, the organs-at-risk can be significantly damaged and the efficacy of the treatment is reduced [14].
Tackling the problem of optical property uncertainty during treatment planning has attracted limited attention in the literature. Prior work mainly relies on analytical models of light transport in biological tissues using finite-element methods (FEM) and diffusion approximations [15][16][17]. These approximations either assume uniform absorption and scattering coefficients across the tissues or lack the ability to accurately model the reflection and refraction of light at tissue boundaries with different optical properties. This problem becomes more evident with varying optical parameters within the tissue itself. Johansson et al. [18,19] introduced an individualized real-time feedback dosimetry software for prostate PDT to update the light source irradiation powers based on how the treatment parameters change. The authors use an FEM model to calculate the light propagation from point sources, and then monitor the light transmission signals between the implanted optical fibers during the treatment to evaluate the new effective attenuation coefficient using a linear regression model based on the diffusion approximation. The recovered attenuation coefficient value is then fed back to the light simulator and a Cimmino algorithm (a feasibility algorithm to find a good power allocation to sources) is used to update the irradiation intensities of all fibers. While the results of this real-time feedback system are encouraging, approximating the spectroscopy with a diffusion equation limits the quality of the generated treatment plan, as it does not separate the attenuation coefficient into the absorption and scattering coefficients needed for accurate light dosimetry prediction. In [20], we tackled the OP variation issue in the offline stage of the treatment, specifically in the planning phase.
We implemented several variance reduction techniques and propagated them through the optimization framework in order to minimize the risk of mispredicting the treatment outcome. However, the results showed that these approaches were computationally expensive (more than 24 hours needed to generate a treatment plan) when relying on Monte-Carlo (MC) simulations for light propagation modeling. Additionally, reducing the risk of mispredicting the damage to OAR led to a decrease in the target volume dose coverage.
Other works have attempted to use machine learning (ML) techniques such as generative adversarial networks [21], random forests [22] and support vector regression (SVR) [23] to recover optical properties in real time. However, those works either relied on reflectance spectroscopy and spatial frequency domain images (SFDI) as their training features (which work well with superficial tissues but lack the ability to accurately model the light propagation in deep-seated lesions), or on a specific fiber-optic probe instrument with pre-determined source-detector separations, which prevents generality in the pre-treatment plan placement configuration.
In this work, we investigate the use of ML methods including neural networks and gradient boosting regression trees (GBRT) to recover the optical properties in real time and update the powers allocated to the different diffusers in order to dynamically adapt to the changes in the treatment parameters, and effectively minimize the misprediction of the treatment light dosimetry. We leverage the ability of cylindrical diffusers (commonly used in iPDT nowadays) to be used as both light sources and detectors [17] in order to measure the light dosimetry at different locations of the tissue without adding to the complexity of the procedure or patient discomfort. The light dosimetry measurements are used as input features to train our models, which predict µ_a and µ_s of the tumor.
We build our models around a state-of-the-art iPDT planning tool, PDT-SPACE [8,24]. The objective of the generated solution by PDT-SPACE is to maximize the tumor light dose coverage, while minimizing the light penetration to OAR to see whether a PDT treatment for the tumor indication under study would be clinically feasible or not. We enhance this tool-set in two ways. First, we simulate several different treatment plans for GBM iPDT with different optical properties and placement configurations in order to train a general optical property prediction model (prevent overfitting) that can predict µ a and µ s for various light diffuser placements. Second, we use the predicted OPs to re-allocate the powers to the diffusers using the built-in engine in PDT-SPACE. We validate our work by running simulations on 3D virtual glioblastoma multiforme tumor models constructed from real MRI images taken from the cancer imaging archive (TCIA) [25]. These simulations show that we can recover tumor optical properties with good accuracy, and that rapidly re-allocating the powers with the recovered OPs improves the light dosimetry prediction.

PDT-SPACE background
PDT-SPACE is an open-source software (available at: https://gitlab.com/FullMonte/pdt-space) for automatic interstitial photodynamic therapy planning [8]. It includes optimized algorithms based on simulated annealing [26] with reinforcement learning [27] (SA-RL) to choose locations for a set of light diffusers [24], and to allocate powers to those diffusers such that the tumor treated volume is maximized, while the damage to OAR is minimized. The power allocation algorithm is based on a linear convex optimization, which guarantees an optimal solution (minimizes the objective function) and has a fast runtime. Below is a brief description of how the power allocation works.
Let n be the number of volume elements in the treatment planning volume. Let also T be the set of all tumor volume elements, and O be the set of all OAR volume elements. Assume we have a set of m sources (diffusers) with pre-determined locations, orientations and lengths, and let x be an m × 1 vector of the source powers to be allocated. Then the objective function of PDT-SPACE, f(x) : R^m → R, can be written as follows:

f(x) = Σ_{i=1}^{n} α_i p_i(x)    (1)

where:

p_i(x) = max(0, d_min,i − d_i(x)) if i ∈ T, and p_i(x) = max(0, d_i(x) − d_max,i) if i ∈ O.    (2)

PDT-SPACE employs the PDT threshold model [14,28,29]. This means there is a PDT dose threshold value for each tissue and photosensitizer combination, after which tissue death mechanisms (apoptosis, necrosis or autophagy) activate. Accordingly, the cost function in Eq. (1) penalizes the over/under-dosage at each volume element, i, by evaluating the difference between the overall light dose received at i, d_i(x), and the minimum, d_min,i, or maximum, d_max,i, threshold dose of that element, depending on whether it is a tumor or healthy tissue element. The tuning parameter α_i is an importance weight that is set by the user to reflect the importance or priority of each tissue type. Minimizing Eq. (1) results in a solution x* that minimizes the OAR's critical volume (CV: the volume that is considered optically unsafe in the treatment) while killing a predetermined volume fraction of the tumor. The light dose, d_i(x), and the dose thresholds, d_min and d_max, are measured as fluence [J/cm^2]. The fluence is based on the light dose absorbed in each volume element due to m sources with powers as specified in x and the PS tissue uptake [8]. We assume per-tissue uniform PS uptake and rely on published cell death/damage thresholds (in terms of photon absorption per unit volume) to calculate the PDT dose thresholds (refer to [8] for details). Based on this assumption, the PDT dose translates to a light dose.
While this is a fairly simplistic assumption in the PDT model, it enables a fair comparison of different treatment plans, especially since the focus of this manuscript is optical property variation, which directly affects the light propagation in the tissues.
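As a concrete illustration, the hinge-type tumor under-dosage and OAR over-dosage penalties above can be minimized as a linear program by introducing epigraph variables. The sketch below uses SciPy's `linprog` with a random stand-in dose matrix and illustrative thresholds and weights; PDT-SPACE itself solves its convex formulation with its own solver (MOSEK), so this is only a minimal re-creation of the formulation, not the actual implementation.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, m = 200, 4                      # volume elements, diffusers (toy sizes)
D = rng.uniform(0.0, 0.5, (n, m))  # stand-in dose matrix: D[i, j] = fluence at element i per unit power of source j
tumor = rng.random(n) < 0.3        # illustrative tumor/OAR labels
D_T, D_O = D[tumor], D[~tumor]
nT, nO = len(D_T), len(D_O)
d_min, d_max = 10.0, 8.0           # illustrative fluence thresholds [J/cm^2]
a_t, a_o = 1.0, 3.0                # illustrative importance weights

# Epigraph variables u, v turn the hinge penalties into a linear program:
#   minimize a_t*sum(u) + a_o*sum(v)
#   s.t. u >= d_min - D_T x,  v >= D_O x - d_max,  x, u, v >= 0
c = np.concatenate([np.zeros(m), a_t * np.ones(nT), a_o * np.ones(nO)])
A = np.block([
    [-D_T, -np.eye(nT), np.zeros((nT, nO))],
    [ D_O, np.zeros((nO, nT)), -np.eye(nO)],
])
b = np.concatenate([-d_min * np.ones(nT), d_max * np.ones(nO)])
res = linprog(c, A_ub=A, b_ub=b, bounds=(0, None), method="highs")
x_opt = res.x[:m]                  # optimal diffuser powers for this toy instance
```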
To model the light dose distribution inside the tissues, PDT-SPACE uses FullMonteSW (available at: https://gitlab.com/FullMonte/FullMonteSW) [30]. FullMonteSW is a fast, tetrahedral-mesh based Monte-Carlo simulator to model light propagation in optically heterogeneous biological tissues. PDT-SPACE starts by building an n × m dose matrix, D, each element D_ij of which is the light dose absorbed in element i due to a unit power applied to source j. This is done by running FullMonteSW m times, each time with a different source j with unit power and storing the results in column j of D. Due to the linearity of dose accumulation, taking the dot product of the i-th row of D with x results in d_i(x).
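The dose-matrix construction can be sketched as follows; `simulate_unit_source` is a hypothetical stand-in for one FullMonteSW run with a unit-power source, and the linearity of dose accumulation is what lets a single matrix-vector product evaluate d_i(x) for any power vector x.

```python
import numpy as np

def simulate_unit_source(j, n, seed=0):
    """Stand-in for one FullMonteSW run: returns the absorbed dose in each of
    the n volume elements due to source j at unit power (hypothetical values)."""
    rng = np.random.default_rng(seed + j)
    return rng.exponential(scale=1.0 / (j + 1), size=n)

n, m = 1000, 4                       # volume elements, diffusers
# Column j of D holds the dose per unit power of source j.
D = np.column_stack([simulate_unit_source(j, n) for j in range(m)])

x = np.array([0.5, 1.0, 0.8, 1.2])   # allocated powers [W] (illustrative)
d = D @ x                            # dose at every element, by linearity
```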

Overall system
The entire proposed system for the treatment procedure is illustrated in Fig. 1. The left part of the flow represents the pre-treatment phase, in which we aim to optimize the treatment plan to minimize the predicted OAR critical volume using PDT-SPACE and established population-average tissue optical properties. This phase is done in the absence of the patient, so any changes in the patient's optical properties are not considered yet. In step 1, we start with an initial placement of cylindrical diffusers, which can be random or based on a clinical heuristic. In this work, we start with a clinical heuristic based on a brachytherapy-derived template of holes such that the diffusers are inserted in parallel with at least 1 cm spacing between them [31]. Then, in step 2, we optimize the source placement using PDT-SPACE's optimized algorithm SA-RL. This algorithm is based on simulated annealing, and includes several smart source location perturbation strategies with a reinforcement learning (RL) agent that chooses which strategy (move) to attempt, in order to minimize the number of attempted moves needed while generating high-quality solutions. The RL agent learns, while the algorithm runs, which strategies are likely to perform better based on the rewards (improvements in quality) of previously attempted moves. Our earlier work [24] showed that a treatment plan that follows such placement reduces OAR critical volume by 46% on average compared to a plan based on the clinical heuristic. Then we allocate the powers in step 3 to the diffusers based on the convex formulation presented in Section 2. Finally, the physician guides the optical fibers to their positions in step 4. Notice that if the final source placement deviates slightly (a few millimeters) from the positions proposed by PDT-SPACE, the physician can execute a final power allocation run to take this deviation into account.
In the treatment phase (right part of the flow chart), the clinician/physician starts the treatment by sequentially illuminating the diffusers with their respective powers for a short period (typically ranging from seconds to <10 mins in total, based on the clinician's judgement of the observed OAR damage) while reading the measurements (step 5) from the rest of the diffusers (acting as detectors). Once this is done, the clinician checks if the target volume coverage is achieved. If yes, the treatment is ended; otherwise, the light dosimetry measurements are passed to the feedback system to recover the optical properties using pre-trained machine learning models (steps 6 and 7). After that, the PDT-SPACE power allocation method is invoked again (step 8) to re-optimize the powers and minimize the OAR CV while maintaining the target tumor coverage. Ideally, we would like the stoppage time of the treatment (including the re-optimization) not to exceed 5 mins in order to minimize the overall operation time, as lying still longer would cause extra discomfort to the patient and might require extra anesthesia. Additionally, bleeding caused by fiber insertions increases optical absorption over time. These temporal changes in OPs that can occur while the treatment is stopped might further deviate the optimized plan from the actual outcome of the treatment. To this end, the procedure can be repeated until the target coverage is achieved.
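The treatment-phase loop (steps 5 to 8) can be summarized in the following sketch. Every helper here is a hypothetical stub standing in for the real component (the dosimetry measurements, the LUT-based µ_eff recovery plus ML model, and the PDT-SPACE power re-optimization), so only the control flow is meaningful.

```python
from dataclasses import dataclass

@dataclass
class Plan:
    powers: list
    rounds_done: int = 0

# Hypothetical stand-ins for the real components described in the text.
def illuminate_and_measure(plan):
    plan.rounds_done += 1
    return [1.0] * len(plan.powers)          # dummy detector readings

def target_coverage_reached(measurements, plan):
    return plan.rounds_done >= 2             # pretend coverage is reached after 2 rounds

def recover_ops(measurements):
    return 0.05, 100.0                       # dummy recovered (mu_a, mu_s)

def reallocate_powers(plan, mu_a, mu_s):
    return [p * 1.1 for p in plan.powers]    # dummy re-optimization

def ipdt_feedback_loop(plan, max_rounds=5):
    """Steps 5-8 of the treatment phase: measure, check coverage,
    recover OPs, and re-optimize powers until coverage is achieved."""
    for _ in range(max_rounds):
        measurements = illuminate_and_measure(plan)      # step 5
        if target_coverage_reached(measurements, plan):  # coverage check
            break
        mu_a, mu_s = recover_ops(measurements)           # steps 6-7
        plan.powers = reallocate_powers(plan, mu_a, mu_s)  # step 8
    return plan
```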

Cylindrical detectors in Monte-Carlo simulations
As mentioned earlier, we need to emulate the online light dosimetry of PDT in this simulation-based work. To this end, we need the ability to accurately model the photon detection profile in a physical interstitial cylindrical detector (ICD) with our MC simulations, where a detection profile gives the probability of detecting a photon (or a photon packet) if it hits the diffusive region of the ICD. We do this by extending FullMonteSW to model the cylindrical sources as detectors [32]. First, we need to determine which photons hit the detecting cylindrical fiber, so we modify the MC engine to identify these photons. One also needs to determine which fraction of these photons will make it through the diffuser and via the fiber to the fiber core and the opto-electronic detector outside the patient's body. Recall that we are re-using the light source fibers as detectors; different fibers can have different scattering profiles, engineered via laser-induced refractive index changes, that make them more effective as distributed light sources. When guided non-laser light (that is propagating in the biological tissue) interacts with these laser-induced refractive index changes, it is scattered in all directions with varying power [33]. Therefore, we need to model these complex scattering profiles when the diffusers are used as detectors. Section 1 of the supplemental document explains how we model these profiles in FullMonteSW.

Recovering µ_eff
To recover the effective attenuation coefficient, we rely on the knowledge of the detected fluence at the different detectors. We use a look-up table (key-value pairs) based approach to extract the best matching µ_eff value. This look-up table is built in the pre-treatment phase by running the MC simulator with different randomly-generated optical properties. We then store the detected energies from these simulations as keys in the tables, and the µ_eff for each run as the value. The effective attenuation coefficient is calculated from the absorption and scattering coefficients as follows [34]:

µ_eff = sqrt(3 µ_a (µ_a + µ'_s))    (3)

where µ'_s = µ_s(1 − g) is the reduced scattering coefficient based on the anisotropy factor, g. Let s be the number of random tumor optical property samples drawn from a certain random distribution (typically between 20 and 50). In this work, we assume normal distributions for the absorption and scattering coefficients, with means equal to the population average (taken from the literature), and a standard deviation of 10% of the mean. Then the µ_eff recovery process is as follows:

1. Draw s samples of the tumor optical properties, (µ_a, µ_s)_l for 1 ≤ l ≤ s, from the assumed distributions.

2. Run the FullMonteSW MC simulator several times, with differing tumor optical properties, on the treatment volume being planned, using the source placement generated by PDT-SPACE. For each sample l, FullMonteSW is run m times, once for each source, with the tumor optical properties being (µ_a, µ_s)_l. In each simulation, there is 1 source and m − 1 detectors.
3. For each diffuser j, 1 ≤ j ≤ m, build a vector of size m − 1 containing the detected energy at each of the m − 1 detectors. We refer to this vector as q^(j).
4. For each diffuser (source) j, store its corresponding q^(j) vector as a key in an ordered look-up table (LUT) j, and the simulated µ_eff,l (as calculated in Eq. (3)) as the corresponding value. Each LUT is sorted in lexicographical order: a vector v precedes a vector u if, at the first index t at which they differ, v_t < u_t. For equality tests, we add a tolerance of ϵ = 10^−4 to account for inaccuracies inherent in MC simulations. In other words, v_t = u_t if and only if |v_t − u_t| ≤ ϵ.
In this way, we have m LUTs, each of size s. These LUTs are built once for each tumor anatomy in the pre-treatment phase after determining the diffuser placement. During the treatment, when we collect the detection measurements at each of the diffusers, we build a new vector q^(j) for each source j and compare it against LUT j. If q^(j) is found in the LUT, we take the corresponding µ_eff to be the recovered value at source j. Otherwise, we find two vectors q_lb and q_ub in LUT j such that q_lb ≤ q^(j) ≤ q_ub, and take the average of the corresponding µ_eff values as our recovered µ_eff. In this process, we obtain m different recovered µ_eff values, one for each source, which are then fed to the ML model as input features, as shown in the next section.
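A minimal sketch of the LUT build-and-lookup procedure is shown below, assuming each sample is a (q vector, µ_a, µ_s, g) tuple. The tolerance handling and the lower/upper-neighbour averaging follow the description above; the real implementation operates on full (m − 1)-dimensional measurement vectors per diffuser, whereas this toy version uses 2-element vectors.

```python
import bisect
import math

EPS = 1e-4  # tolerance for MC noise in equality tests

def mu_eff(mu_a, mu_s, g):
    """Effective attenuation coefficient from Eq. (3)."""
    mu_s_red = mu_s * (1.0 - g)
    return math.sqrt(3.0 * mu_a * (mu_a + mu_s_red))

def build_lut(samples):
    """samples: list of (q_vector, mu_a, mu_s, g). Returns a lexicographically
    ordered list of (q_vector, mu_eff) pairs for one diffuser."""
    lut = [(tuple(q), mu_eff(a, s, g)) for q, a, s, g in samples]
    lut.sort()
    return lut

def lookup(lut, q):
    """Return mu_eff for measurement vector q: exact match within EPS,
    otherwise the average of the two lexicographic neighbours (clamped at the ends)."""
    q = tuple(q)
    idx = bisect.bisect_left(lut, (q,))
    for i in (idx - 1, idx):
        if 0 <= i < len(lut) and all(abs(a - b) <= EPS for a, b in zip(lut[i][0], q)):
            return lut[i][1]
    lb = lut[max(idx - 1, 0)][1]
    ub = lut[min(idx, len(lut) - 1)][1]
    return 0.5 * (lb + ub)
```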

ML models for OP recovery
In this section, we utilize information on the tumor anatomy, the number of diffusers, their orientations (relative to one another) and the recovered µ_eff at each diffuser as input features to train ML models that separate the absorption and scattering coefficients. We compare two machine learning techniques, detailed below, based on a certain cost function (see Section 4). We also perform experiments that hide certain features (feature reduction) from the training data to see each one's effect on the cost. We further compare the two models to a naive baseline model to check whether they outperform such a simple approach; this baseline fixes one of µ_a and µ_s and calculates the other parameter via Eq. (3). Because different tumor models use different numbers of diffusers, the feature count varies across training samples. One way to obtain a fixed-size feature set is to reduce all samples to the minimum common feature count, for example via principal component analysis (PCA) [35]. However, this minimum number (27 in our simulations) led to significant loss of information that rendered the PCA model ineffective. Therefore, the results of the following sections all use the zero padding technique to create a full feature set (∼1500 features) for all tumors tested.
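The zero-padding step can be sketched as a small helper. The `total=1500` default mirrors the ∼1500-feature set mentioned above, but the function itself is illustrative rather than the actual pipeline code.

```python
import numpy as np

def pad_features(samples, total=1500):
    """Zero-pad variable-length feature vectors (from tumors with different
    diffuser counts) to a common length, as used here instead of PCA."""
    out = np.zeros((len(samples), total))
    for i, s in enumerate(samples):
        out[i, :len(s)] = s
    return out
```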

Gradient boosting decision trees
Gradient boosting decision tree (GBDT) is a machine learning technique for regression and classification based on learning an ensemble of decision tree models in sequence [36]. In each iteration, GBDT learns a new tree by fitting the negative gradients (or residual errors) from the previous iteration. The main advantage of GBDT that makes it suitable for our application is its ability to handle missing data (a variable feature count in our case) by grouping features that are rarely non-zero. Additionally, GBDTs often work well with categorical and numerical values as-is, i.e., there is no need for any data pre-processing. However, a drawback of GBDTs is that they do not support multiple outputs, and usually a different model is trained for each independent output (which is the case in our work, as we try to recover both the absorption and scattering coefficients). Furthermore, the model's performance is heavily dependent on the number of features and the data size [37]. In other words, the training time and memory footprint increase as the number of features and training samples increases. This is because the model has to compute the gradient for each data point (training sample) for each feature. Not only is this computationally expensive, but it also usually leads to overfitting, especially if the training data is skewed. To train the model, we use LightGBM [38], an open-source and highly optimized framework for GBDT training. To prevent overfitting, we use RandomizedSearchCV from Scikit-Learn [39] to perform hyper-parameter tuning over several parameters (including the number of estimators or trees, the tree depth and the learning rate for computing the gradients) and find the best model performance on the cross-validation set.
RandomizedSearchCV takes a LightGBM model initialized with some hyper-parameters, then iteratively picks a random point in the search space of these parameters and retrains the model with that setting. Note that RandomizedSearchCV accepts many common regression and classification models; it is not restricted to LightGBM.

Artificial neural networks
Artificial neural networks (ANNs) are the second ML model type we investigate for µ_a and µ_s prediction. ANNs constitute a vast area of machine learning. In this work, we use one type of these networks, multilayer perceptrons, which consist of multiple feed-forward layers, each consisting of nodes (often called artificial neurons). In general, ANNs excel at learning complex relationships among input features, and can also work with incomplete knowledge of the data. In other words, ANNs may produce meaningful outputs after training even with missing features [35]. This is helpful in case the suggested features in Table 1 are not enough; the model accuracy obviously depends on the importance of the missing features. However, for ANNs to work well, there is usually an overhead of pre-processing the data to remove any skewness and normalize the features and sometimes the output labels. For example, in our simulations the scattering coefficient, µ_s, is usually two orders of magnitude higher than the absorption coefficient. As a result, our preliminary training results showed that the ANN model was always fixing µ_a to zero while trying to predict µ_s, because the cost metric used, the mean squared error, is usually dominated by the larger values. Normalizing the output labels (µ_a and µ_s values) to the range [0, 1] significantly improved the accuracy of the model. Another common disadvantage of ANNs is the difficulty of determining the proper network structure, which is usually found by trial and error and can be computationally expensive. In our simulations, we perform a randomized grid search with hyper-parameter tuning to find the best model in terms of the number of hidden layers and neurons and the regularization rate, so as to achieve the best performance on the validation set (see Section 3.4.4) and reduce overfitting.
Figure 2 shows the final network used in our simulations after hyper-parameter tuning, where F is the total number of features used to train the model. The network consists of an input layer of F neurons, three hidden layers of 33 neurons each, and an output layer of 2 neurons corresponding to the two optical properties we aim to recover. Note that the input layer is passed to a normalization layer that normalizes all features to zero mean and unit standard deviation; this is a common practice to speed up training [35]. Additionally, the output layer is passed to a mapping layer that maps the outputs back to their original order of magnitude. Finally, all the hidden layers use tanh as their activation function, while the output layer uses a rectified linear unit (ReLU), as the optical properties cannot be negative.
To build and train our ANNs, we use an open-source Python-based framework called Keras (version 2.4.3) [40]. Keras is an easy-to-use deep learning API that runs on top of the well-known ML framework, TensorFlow [41].
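The network of Fig. 2 can be expressed in Keras along these lines. Here F and the output-scaling constants are placeholders (the real feature count is ∼1500 and the scaling maps the normalized labels back to physical µ_a and µ_s ranges), and the normalization layer is given fixed mean/variance instead of being adapted to the training data, so this is a structural sketch rather than the trained model.

```python
import numpy as np
import tensorflow as tf

F = 10                                  # feature count (placeholder; ~1500 in this work)
scale = tf.constant([0.1, 300.0])       # map [0,1] outputs back to physical ranges (illustrative)

inputs = tf.keras.Input(shape=(F,))
# Feature normalization (fixed mean/variance here; adapted to data in practice).
h = tf.keras.layers.Normalization(mean=0.0, variance=1.0)(inputs)
for _ in range(3):                      # three hidden layers of 33 tanh neurons
    h = tf.keras.layers.Dense(33, activation="tanh")(h)
h = tf.keras.layers.Dense(2, activation="relu")(h)   # mu_a, mu_s >= 0
outputs = tf.keras.layers.Lambda(lambda t: t * scale)(h)  # undo label normalization
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="mse")
```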

Training data
For the training data, we run FullMonteSW with randomly generated tumor optical properties sampled from a normal distribution as discussed in Section 3.3, and then use the algorithm described earlier to recover the µ_eff at each diffuser. For each case, we store the detected photon weights at each ICD due to illuminating one source at a time, along with the recovered effective attenuation coefficients and all the other features summarized in Table 1. We generated 6300 (9 tumors × 2 source placements × 350 OP samples) different samples from 9 different simulated GBM iPDT models (see Section 4) that follow either a heuristic placement of parallel diffusers or an SA-RL placement as described in Section 3.1. For each MC simulation, we launch 10^7 photon packets, as this value showed the best runtime-quality trade-off for recovering the attenuation coefficient (see Section 5.1). The total runtime to generate the 6300 test cases was five weeks. Finally, we split the data randomly into a 70% training set and a 30% validation set. We further split the validation set into a 20% cross-validation set used by the models during training and a 10% test set for the final evaluation.

Testing methodology
Our simulation geometry, PS assumed, tissue optical properties and light dose thresholds are based on what we have used in our earlier works [8,24]. Below we briefly describe these data. For more details, please refer to the supplementary material to this manuscript. All the methods and models used in this manuscript are available on our GitLab repository (Ref. [42]), along with instructions on how to use them and run the ML experiments presented here.

Simulation geometry
We simulated and trained our models on nine different brain tumor models that approximate clinical glioblastoma multiforme (GBM) images obtained from the cancer imaging archive [25]. These tumors were augmented to a segmented brain atlas taken from [43]. The mesh comprises 423,376 tetrahedra representing five different materials: skull, cerebrospinal fluid (CSF), grey matter (GM), white matter (WM) and tumor. The tumor is assumed to include part of the infiltration zone. Figure 3 shows three cross-sectional slices of the brain mesh used, along with one of the tumor cases modeled. Finally, the diffusers used in this work are line sources that emit light isotropically and vary between 0 cm and 4 cm in length with 1 cm increments (a 0 cm diffuser means a point source is used instead).

PS and optical properties
For the photosensitizer, 5-ALA (5-aminolevulinic acid) induced PpIX is assumed in the simulations. This PS is used in most clinical studies for brain-tumor iPDT [44]. The nominal (population average) optical properties of the relevant brain tissues and GBMs were extracted from the literature [11,[45][46][47] at a 635 nm PS activation wavelength.

Light dose thresholds
As mentioned in Section 2, PDT-SPACE follows the PDT threshold model [14,28,29]. The tissue specific fluence dose death thresholds, d min and d max , used in Eq. (2) were calculated based on the number of photons absorbed by the PS per unit volume and the tissue-specific population average PS uptake taken from [28].

Cost function and evaluation metrics
The cost function we use to minimize the error of our models is the mean squared error (MSE). MSE is useful as it incorporates both the variance and the bias of a regression model, and it penalizes larger errors more strongly than an absolute metric (the gradient of MSE grows with the size of the error). Given two matrices Y_true and Y_pred of size k × l, where k is the size of the training/validation/test set and l is the number of outputs, the MSE is calculated as follows:

MSE = (1 / (k·l)) Σ_{i=1}^{k} Σ_{j=1}^{l} (Y_true,ij − Y_pred,ij)^2    (4)

Note that the MSE can be dominated by larger output values if the outputs are of different magnitudes; therefore, best practice is to normalize the model outputs to the same range before training, as we did in Section 3.4.3. For the GBDT model, this is not a problem, as we train a separate model for each output.
To compare our models, we rely on two metrics. The first is the coefficient of determination, R^2, which is common for regression models and measures the fraction of the variance of the dependent variable that is predictable from the independent variables. The higher R^2 is, the more accurate the model is in predicting the output; ideally, R^2 is 1 for the best model. For each output j, R^2 is calculated as follows:

R^2_j = 1 − [Σ_{i=1}^{k} (Y_true,ij − Y_pred,ij)^2] / [Σ_{i=1}^{k} (Y_true,ij − Ȳ_true,j)^2]    (5)

where Ȳ_true,j is the mean true value of output j. Note that R^2 can be indirectly interpreted from the MSE: a higher MSE means the model has a lower R^2. The second metric we use to evaluate our models is the mean absolute percentage error (MAPE). This metric measures the error rate of the model and gives us the ability to directly assess its quality. MAPE is calculated as follows:

MAPE_j = (100% / k) Σ_{i=1}^{k} |(Y_true,ij − Y_pred,ij) / Y_true,ij|    (6)

Note that this metric fails for zero true label values, but these do not occur for the optical properties of biological tissues.
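These three quantities can be computed directly with NumPy. The helpers below assume Y_true and Y_pred are k × l arrays, with the MSE taken over all entries and R^2 and MAPE reported per output column, following the definitions above.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error over all k samples and l outputs."""
    return np.mean((y_true - y_pred) ** 2)

def r2(y_true, y_pred):
    """Coefficient of determination per output column."""
    ss_res = np.sum((y_true - y_pred) ** 2, axis=0)
    ss_tot = np.sum((y_true - y_true.mean(axis=0)) ** 2, axis=0)
    return 1.0 - ss_res / ss_tot

def mape(y_true, y_pred):
    """Mean absolute percentage error per output column; assumes no zero
    true labels, as holds for tissue optical properties."""
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true), axis=0)
```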

Computational platform
We implemented the algorithms in C++ and tested them on a Linux machine (Ubuntu 16.04) with a 12-core (24 hyper-thread) 2.2 GHz Xeon E5-2650v4 CPU and 128 GB of RAM. We used the MOSEK solver [48] for the convex power allocation optimization problem. Finally, for training and tuning both optical property prediction models (GBDT and ANN), we used an NVIDIA GeForce RTX 2080 Ti GPU.

Recovering µ eff
To evaluate the LUT-based method of Section 3.3 for recovering the tumor attenuation coefficient, we first implemented a simple linear regression model that uses the dosimetry measurements and the diffusion approximation to predict µ eff , as performed in [18]. We compare the error rate in recovering µ eff using our method to that of the regression model. We then vary several parameters and evaluate their effects on the quality of the LUT-based model. The parameters changed include the number of photon packets launched in the light simulator, the LUT size s (the number of randomly chosen optical property trials), the radius of the ICDs, the standard deviation σ of the distribution assumed during recovery, the tumor anisotropy factor g, and the surrounding OAR's optical properties. We assume a numerical aperture (NA) of 0.5 for the cylindrical detectors, which is a typical value for diffusers designed to work with diode lasers [17]. (Table 2, footnote a: the LUT is built using 1 mm radius detectors, but recovery uses 0.5 mm detectors with an empirical correction factor.)
Recall that the LUT is built using a normal distribution of OPs with a mean set to the population average and a σ of 10% of the mean. We also fix the optical properties of the surrounding healthy tissues to their nominal values during the LUT building procedure, but they can be varied during recovery/inference. Table 2 summarizes the results of the different tests by showing the arithmetic average of the error rate (recovered µ eff with respect to the actual value) across the nine tumor models. In some of the columns, a value F means the parameter is fixed to its nominal value, and a value V means the parameter is varied based on a random distribution. From the table, we can see in Test 1 that a simple linear regression model does not perform well; it recovers the attenuation coefficient with an average error of 15.9%, which is higher than the 10% standard deviation in this test case. This large error is likely due to the failure of the diffusion approximation to model tissue heterogeneity and the correct light transport at small distances from the source, where the diffusion requirements are not met. The MC simulation results therefore differ from what the diffusion equation predicts, and the regression model is not applicable. Test 2 shows that simulating the LUT-based method with 10 6 photon packets results in a 2.56% error in recovering µ eff , which is significantly better than the linear regression model. This test assumes a LUT size, s, of 50 key-value pairs in each table and a cylindrical detector radius of 0.5 mm, which is typical for manufacturable diffusers. If we increase the number of photon packets to 10 7 (Test 3), the error further decreases to 0.94% at the cost of increased runtime in building the lookup tables (3.7 hrs compared to 22 mins). Further increasing the packet count to 10 8 (Test 4) slightly improves the error to 0.38%, with a 10× increase in runtime (37 hrs on average).
This slight improvement in quality does not justify such a high runtime, and therefore, we use 10 7 photon packets for the rest of this study.
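As a minimal illustration of this table-based recovery (the actual PDT-SPACE procedure of Section 3.3 uses full MC light simulations per OP trial and Gaussian-distributed trial values), the sketch below builds a LUT of candidate µ eff values keyed by their simulated detector readings, then recovers µ eff by nearest-neighbor matching against an online measurement; `simulate` is a hypothetical stand-in for the MC light simulator:

```python
import math

def build_lut(simulate, op_samples):
    # Each LUT entry maps a candidate mu_eff to the vector of detected
    # energy weights at the in-tissue cylindrical detectors (ICDs),
    # as produced by a (here, stand-in) light simulation.
    return [(mu_eff, simulate(mu_eff)) for mu_eff in op_samples]

def recover_mu_eff(lut, measurement):
    # Return the LUT key whose simulated detector readings are closest
    # (Euclidean distance) to the online dosimetry measurement.
    def dist(sim):
        return math.sqrt(sum((s - m) ** 2 for s, m in zip(sim, measurement)))
    return min(lut, key=lambda entry: dist(entry[1]))[0]

# Toy stand-in simulator: simple exponential attenuation at two
# source-detector separations (5 and 10 mm); illustrative only.
simulate = lambda mu: [math.exp(-mu * d) for d in (5.0, 10.0)]
lut = build_lut(simulate, [0.10, 0.15, 0.20])
```

With this toy simulator, a measurement generated at µ eff = 0.15 mm −1 is recovered exactly, and one generated at 0.19 mm −1 snaps to the nearest table entry, 0.20 mm −1 ; LUT size s controls this quantization error, consistent with Test 6.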
If we build the lookup tables and recover µ eff with a 1 mm detector radius, we can see from Test 5 that the quality improves and the error decreases to 0.44%, which is close to the best result of Test 4. However, since 1 mm is not common for manufacturable cylindrical diffusers, we build the LUTs using a 1 mm radius in the simulations and use 0.5 mm diffusers during recovery (Tests 7 to 10 follow this scheme); this is a variance reduction technique to detect more photons of interest for a fixed number of simulated photons. We have empirically fit a correction factor to map the measurements of the online dosimetry to those of the LUTs. Table 2 shows (Test 7) that the recovery error is 0.83%, which is slightly better than the 0.5 mm case (Test 3). Figure 4 shows the data points of the nine tumor cases for this test. The LUT model of Test 7 will be used throughout the remainder of this study.
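The functional form of this empirical correction factor is not specified in the text; one plausible choice is a single multiplicative factor fit by least squares between paired 0.5 mm and 1 mm detector readings, as sketched below (all names are illustrative):

```python
def fit_correction_factor(readings_small, readings_large):
    # Least-squares scalar c minimizing sum((c * x - y)^2), mapping
    # 0.5 mm detector readings x onto the 1 mm readings y used in the LUT.
    num = sum(x * y for x, y in zip(readings_small, readings_large))
    den = sum(x * x for x in readings_small)
    return num / den

def correct(readings_small, c):
    # Apply the factor to an online dosimetry measurement before lookup.
    return [c * x for x in readings_small]
```

If the two detector radii differed only by a constant collection-efficiency ratio, this single factor would remove the mismatch exactly; any residual radius-dependent shape difference would appear as part of the 0.83% recovery error.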
If we vary the OAR optical properties during recovery (a more complex and realistic scenario than varying only the tumor OPs), we can see from Test 8 that the error rate barely increases, to 1.08%. Test 9 shows that if we draw OP samples from a wider distribution (σ = 20% of the mean), the error increases and the quality of the recovery degrades. This means that knowledge of the population variance of the OPs is important when building the lookup table. From Test 10, we see that if g differs from its assumed value during the LUT building procedure, the quality of recovery is degraded, with the average error increasing to 3.12%. Finally, Test 6 shows that if we want to save runtime in building the LUT, we can decrease the LUT size (the number of samples used in building the LUT); however, this comes at the cost of a modest quality degradation.

Table 3 summarizes the quality of the absorption and scattering coefficients recovered using the two different models presented in Sections 3.4.2 and 3.4.3, along with an analytical baseline model in which we fix one of the parameters and determine the other using Eq. (3). The results reported are based on the final model trained after performing feature reduction and hyper-parameter sweeps (best MAPE achieved on the validation set). We can see that the analytical baseline model is no better than a random model (R 2 is negative) at decomposing µ eff into µ a and µ s , while our ML models achieve better results when recovering the optical properties: a lower MAPE (5.11% to 6.71%) and a higher R 2 (0.33 to 0.59). Note that the anisotropy factor is fixed during the recovery of the optical properties. Figures 5 and 6 plot the recovered versus true values of the OPs for the ANN and GBDT models, respectively. It is evident that both models have better accuracy when predicting µ a than µ s , and that the accuracy of recovery degrades as the actual OPs deviate further from the mean value.
While this is undesirable, Section 5.3 demonstrates that the recovered OPs decrease the inaccuracy in predicting the OAR optically critical volumes. While both models achieve similar accuracy in recovering the optical properties, it is worth mentioning that the GBDT model is more likely to generalize to unseen test data given the features listed in Table 1: our results (not shown here) indicate that the best ANN model ignored all of the features related to the orientation of the sources (dot and cross products) and the detected energy weights at the different ICDs, relying mainly on the attenuation coefficients recovered from the look-up tables. On the other hand, the GBDT model only ignored the cross products, which is expected, as each pair of sources contributes three separate features (the x, y and z components of the cross-product vector), and it is difficult for a model to extract meaningful results from this large set of cross-product features. Figure 7 shows the top 5 features (for each of the µ a and µ s GBDT models) ranked by percentage of the total number of tree splits. A higher percentage means the corresponding feature contributes more to the regression of the output value. We can see that the model relies heavily on the detected energy weights at the different ICDs, along with the dot products, the recovered µ eff and the source-detector separations.
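The split-count importance of Fig. 7 amounts to tallying, across all trees in the ensemble, which feature each internal node splits on. A minimal sketch, assuming a hypothetical nested-dict tree representation (a real GBDT library exposes the same statistic through its own importance API):

```python
from collections import Counter

def count_splits(node, counter):
    # Recursively tally the feature index used at each internal node.
    # Leaves are {'value': ...}; internal nodes are
    # {'feature': idx, 'left': ..., 'right': ...} (a hypothetical layout).
    if 'feature' in node:
        counter[node['feature']] += 1
        count_splits(node['left'], counter)
        count_splits(node['right'], counter)

def split_percentages(trees):
    # Percentage of the total number of tree splits per feature,
    # i.e. the ranking statistic used in Fig. 7.
    counter = Counter()
    for tree in trees:
        count_splits(tree, counter)
    total = sum(counter.values())
    return {f: 100.0 * n / total for f, n in counter.items()}
```

Sorting the returned dictionary by value and keeping the top 5 entries reproduces the kind of ranking shown in Fig. 7.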

ML results
Based on these results, we use the GBDT model in the next section, where we re-allocate the source powers using the recovered OPs.

Effect of OP recovery on light dosimetry
Now that we have recovered the optical properties, we simulate Step 8 in our real-time feedback system, as shown in Fig. 1. We want to examine the effect of the recovered optical properties on the quality of the treatment plan after re-allocating the source powers. To this end, we compare the critical volume of the tumor and the organs at risk (white matter and grey matter), v 100 , the tissue volume receiving 100% of its light threshold dose (ideally zero for an OAR and the full volume for the tumor), under three different treatment evaluations, summarized in Table 4. The first is a Nominal OP simulation that evaluates the quality of the treatment with the true values of the optical properties (randomly chosen from a distribution as detailed below) on a plan optimized assuming the nominal (population average) OP values. This is the baseline plan over which we want to increase variation robustness. Second, we evaluate the true optical properties on a plan optimized with the optical properties recovered using our GBDT model; this evaluation is called Recovered OP. We compare these two to a Perfect OP plan, which is optimized assuming a priori knowledge of the actual optical properties.
Whenever we optimize a plan, we target a predicted tumor v 100 of 98% ± 0.2% for a fair comparison. We have simulated all these evaluations on the nine tumor models, each with 50 randomly chosen optical property samples. The samples were drawn from a bimodal distribution around the population average for each of the tissues involved (OAR and tumor) in order to focus on the over-estimated or under-estimated optical properties at the edges of Fig. 6. Figure 8 shows a histogram of the drawn µ a (left) and µ s (right) values, in which the bimodal distribution is clearly visible.
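Such a bimodal draw can be sketched as an equal-weight mixture of two normals offset from the population average; the offset `shift` is a hypothetical parameter, since the text states only that the samples cluster on both sides of the mean:

```python
import random

def sample_bimodal(rng, mean, sigma, shift=1.5):
    # Equal-weight mixture of two normals centred at mean -/+ shift*sigma,
    # emphasising over- and under-estimated OPs away from the mean.
    centre = mean - shift * sigma if rng.random() < 0.5 else mean + shift * sigma
    return rng.gauss(centre, sigma)

# Example: 50 tumor mu_a samples around a population average of
# 0.18 mm^-1 with sigma = 10% of the mean (the tumor mu_a range used
# in this study is [0.12, 0.24] mm^-1).
rng = random.Random(0)
mu_a_samples = [sample_bimodal(rng, 0.18, 0.018) for _ in range(50)]
```

The overall mixture mean stays at the population average, so the bimodality shows up as two clusters in the histogram (as in Fig. 8) rather than as a shift of the distribution.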

Table 4. Summary of the treatment plan evaluations.

Plan name      Description
Nominal OP     Evaluating the true (actual) OPs on a plan optimized with population average OPs (baseline).
Recovered OP   Evaluating the true OPs on a plan optimized with the recovered OPs (main result).
Perfect OP     Evaluating the true OPs on a plan optimized with the true OPs (ideal plan). This simulates the ideal scenario of knowing the exact values of the optical properties during the pre-treatment phase.
If v 100 from the Perfect OP plan is 0 while the Recovered OP v 100 is not, the deviation is set to ±100%. The v 100 deviation for the Nominal OP plan is defined in the same way. A negative v 100 deviation means that the treatment plan overestimates the OAR CV, while a positive value means the CV is underestimated. Ideally, we would like the range of the box plot to be zero, which would mean that, with 100% confidence, the predicted outcome of the treatment plan reflects the real outcome of the treatment.
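These definitions can be made concrete with a short sketch (illustrative only; the actual evaluation operates on full MC dose grids): v 100 counts the tissue volume at or above the threshold dose, and the deviation is capped at ±100% when the Perfect OP v 100 is zero:

```python
def v100(doses, threshold, voxel_volume=1.0):
    # Volume of tissue receiving at least 100% of its light threshold
    # dose, from a flat list of per-voxel doses.
    return voxel_volume * sum(1 for d in doses if d >= threshold)

def v100_deviation(v_plan, v_perfect):
    # Percentage deviation of a plan's critical volume from the
    # Perfect OP plan. When the Perfect OP v100 is zero but the plan's
    # is not, the deviation is set to 100% (the +/-100% cap above).
    if v_perfect == 0:
        return 0.0 if v_plan == 0 else 100.0
    return 100.0 * (v_plan - v_perfect) / v_perfect
```

Under this convention, a plan whose OAR v 100 exceeds the Perfect OP value produces a positive deviation (underestimated damage) and vice versa; the sign assignment here is one plausible reading of the definition in the text.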
From the figures on the left in Fig. 9, we can see that the tumor and OAR CVs are significantly mispredicted when optimizing with the nominal optical properties, while the deviation from the ideal plan is significantly decreased with our recovered µ a and µ s . This is also reflected in the box plots on the right, in which the misprediction range with the Recovered OP is significantly reduced. This means that our recovery of the optical properties and re-planning increase the treatment's robustness against optical property variation. While some of the OAR data points lie above the Perfect OP line (left figures), meaning the generated plan overestimated the CVs (which can be tolerated), it remains important to get as close as possible to the perfect line on average in order to trust the treatment plan's predicted outcome. Table 5 reports the arithmetic average and standard deviation (calculated per tumor case, then averaged across the nine cases) across all simulations for all evaluations. As in the figures, we can see that the recovered OPs decrease the v 100 deviation from the ideal plan. It is worth mentioning that the recovered critical volumes of the OAR are based on a 90% guardband on the light dose threshold (see Table S3 in the supplementary material). If we remove this guardband, the optimizer's reported volumes are much lower (e.g., the average grey matter v 100 decreases from 42.5 cm 3 to 8.4 cm 3 ).

Results on unseen data
So far, we have used the same nine tumor models introduced in Section 4 for training and evaluation. While we divide the training samples into training, validation and test sets, the models generated might be biased towards the nine benchmarks under study. To assess the generalizability of the ML models, we created three new tumor models following a procedure similar to that of Section 4 (see Table S1 in the supplemental document for more details); these tumors are used to test the system but not to train it. We generated 25 samples for each and evaluated the treatment plan efficacy after OP recovery. The MAPE of predicting µ a and µ s degraded to 13% and 16%, respectively, for the GBDT model. This is expected, as the reported R 2 in Table 3 is only around 0.5. However, looking at the treatment effect in the last 3 columns of Table 5, we can see that the deviation of the OAR CV from a Perfect OP plan is still reduced by the recovered OPs.

Results on wider OP distributions
The models presented here are highly dependent on the assumed population average and standard deviation (here taken as 10% of the mean) of the OPs. If the actual distributions differed in a real scenario, the models would have to be re-trained to reach the same recovery accuracy. However, we expect the conclusion of reduced variation in the predicted critical volumes to be the same. For example, the range of tumor µ a taken here was [0.12, 0.24] mm −1 and that of µ s was [6.8, 13.25] mm −1 . Several works have shown that the optical properties in GBMs can have a wider range (due, for example, to hemorrhage induced by fiber insertion) [11,45]. To this end, we have evaluated our trained models with a wider distribution of 30% standard deviation (the range of tumor µ a was [0.01, 0.36] mm −1 and of µ s was [0.45, 19.8] mm −1 ), and our simulation results on the treatment plans optimized with the recovered OPs still showed a reduction in the v 100 deviation, as evident in Table 6.

Table 6. Average and standard deviation of v 100 (across 9 tumor models with 25 samples each) for the different tissues with the plans summarized in Table 4, for an OP standard deviation of 30% of the mean.

Runtime of overall system
Table 7 summarizes the average runtime of the overall system across the nine tumor models. The results show that with our models, we only need to stop the treatment for around 1 min to recover the optical properties and re-optimize the source powers. The runtime of the system is dominated by the pre-treatment phase, where it takes approximately 3 hrs to determine the best placement for the diffusers, and another 3.7 hrs to build a look-up table for the individual tumor model so we can recover µ eff . However, this is a one-time cost for each patient, incurred prior to treatment, and hence it does not add any discomfort to the patient. It is worth mentioning that training the model with all hyper-parameter sweeps and feature selections took approximately 36 hrs. However, with enough patient data, the system only needs to be trained once (or periodically, to update the models with new patient data). Therefore, training time is not considered part of the overall system runtime.

Discussion
Interstitial PDT has shown promising results and has recently attracted more interest as a treatment alternative in oncology. With the continued development of new non-toxic photosensitizers, iPDT can now be tailored to various tumor types and locations, such as the brain, bladder, prostate and pancreas. However, controlling the light propagation in the tissue, and thereby the PDT dose and the damage to OAR, remains one of the major challenges preventing the treatment from being fully adopted in the clinic. Several software implementations have been proposed to model the light dose distribution and optimize iPDT planning in an attempt to confine the light propagation to the target volume [8,18,49-51]. However, most of these studies lacked the ability to generate patient-specific plans, as they disregard biological variations among patients. Specifically, the tissue optical properties that govern how light propagates inside biological tissues vary among patients and tissue types, and within the same tissue during the treatment. This variation significantly affects the outcome of the treatment and might lead to underestimating the damage to OAR or overestimating the tumor damage. This work proposes two machine learning models to recover the optical property values during treatment and feed them back to the planning software so it can re-optimize the power allocation to the light diffusers. The goal is to dynamically adapt to changes in the treatment parameters and tighten the variation in the prediction of OAR optically critical volumes. The models are trained with different features related to photon measurements taken during the treatment (using the cylindrical diffusers as detectors), the recovered attenuation coefficients, and the treatment geometry and source locations. Simulation results on virtual GBMs showed that the GBDT model can recover µ a with 5.11% error and µ s with 6.25% error on the validation set. Furthermore, Fig. 9 and Table 5 show that modifying the treatment plan in the real-time feedback system with the recovered optical properties significantly reduces the misprediction of the OAR's and tumor's critical volumes compared to the typical approach of optimizing a plan with nominal optical properties. From the box plots in Fig. 9, we see that, compared to the CV predicted by an ideal treatment plan optimized with perfect knowledge of the optical properties, 50% of the simulations on the baseline plan mispredicted the white matter v 100 by −89% to 44%. For the same simulations, the recovered optical properties helped increase robustness by decreasing the error range to [−22%, 14%]. Similar results are seen for the grey matter, where the deviation decreased from [−59%, 36%] to [−15%, 11%], and for the tumor ([−2%, 3%] to [−0.6%, 0.8%]). This is achieved while only needing to stop the treatment for an average of 1 min to re-optimize the plan, which suggests it is feasible to perform multiple iterations of this stop-recover-replan procedure to further improve the accuracy of OP recovery and the treatment robustness. Figure 7 shows that our best model relies heavily on first accurately recovering the effective attenuation coefficient in order to decompose it into µ a and µ s . To this end, we had to build a look-up table in the pre-treatment phase, which added approximately 3.7 hrs to the planning time. This relatively long time does not add any discomfort to the patient, as it is performed before treatment, without their presence. Additionally, it can be parallelized to reduce the time; using either four servers or a server with a GPU would reduce it to less than 1 hr. However, it would still be better to use this time to enhance the quality of the pre-treatment planning optimization algorithms instead. Therefore, our future work aims to train models that do not require µ eff , which will allow us to avoid building this LUT and save 3.7 hrs of CPU time.
This can be achieved by introducing additional input features and/or improving the models used, especially the ANN model, where we can try different optimizers and possibly deeper architectures [35].
Finally, this simulation-based work serves as a prospective methodology to investigate the effect of different treatment parameters and planning configurations without the technical difficulties of in-vivo measurements. However, for better clinical practicability, future work should address a limitation of this work: we focus only on the light dosimetry and the optically critical volumes of the tissues, whereas it would be better to directly predict the treatment outcome in terms of tissue damage. For this, one has to accurately model the PS and oxygen distributions throughout the tissues (here, we assume a per-tissue uniform PS concentration and an unlimited oxygen supply) and the effect of photobleaching. Nevertheless, we believe that the conclusion of this work (recovering the optical properties with ML models reduces variation in treatment outcome prediction) would still hold, and that this work should encourage the medical community to build a database of clinical trials that can be used as training data for a more generic optical property recovery model.

Conclusion
In this work, we have simulated a real-time feedback system to improve the robustness of iPDT against tissue optical property variation. We start by generating an optimized treatment plan using PDT-SPACE with population average optical properties; then, during treatment, we simulate light dosimetry measurements by modeling cylindrical detectors in MC simulations. From these measurements, a pre-trained machine learning model recovers the optical properties, and we re-optimize the powers allocated to the diffusers. Results show that the ML model can recover optical properties with less than 6% error on average, and that these recovered OPs increase the robustness of the predicted treatment outcome by significantly reducing the deviation from the predicted outcome of an ideal plan optimized with a priori knowledge of the tissue OPs.
This approach can be combined with any other method aimed at improving PDT efficacy, such as monitoring PpIX fluorescence in the target tissue, or with other photobiological effects, such as replacing the PDT death threshold with an immune response model, as long as these effects can be spatially quantified.