Patient-specific stopping power calibration for proton therapy planning based on single-detector proton radiography

A simple robust optimizer has been developed that can produce patient-specific calibration curves to convert x-ray computed tomography (CT) numbers to relative stopping powers (HU-RSPs) for proton therapy treatment planning. The difference between a digitally reconstructed radiograph water-equivalent path length (DRRWEPL) map through the x-ray CT dataset and a proton radiograph (set as the ground truth) is minimized by optimizing the HU-RSP calibration curve. The function of the optimizer is validated with synthetic datasets that contain no noise and its robustness is shown against CT noise. Application of the procedure is then demonstrated on a plastic and a real tissue phantom, with proton radiographs produced using a single detector. The mean errors using generic/optimized calibration curves between the DRRWEPL map and the proton radiograph were 1.8/0.4% for a plastic phantom and −2.1/ − 0.2% for a real tissue phantom. It was then demonstrated that these optimized calibration curves offer a better prediction of the water equivalent path length at a therapeutic depth. We believe that these promising results are suggestive that a single proton radiograph could be used to generate a patient-specific calibration curve as part of the current proton treatment planning workflow.

Keywords: proton radiography, proton therapy, treatment planning, stoichiometric, calibration (Some figures may appear in colour only in the online journal)

Introduction
Proton therapy has the potential to deliver better dose distributions than radiotherapy by photon/electron beams, but there are a large number of uncertainties that make predicting the beam range in the patient difficult. Currently, these uncertainties are accounted for with increased planning margins, resulting in the irradiation of more tissue than is necessary. One of the largest uncertainties comes from the conversion of the patient anatomical information from one of photon attenuation to one of proton energy loss, requiring x-ray computed tomography (CT) numbers to be converted into relative stopping powers (RSPs). This process is thought to contribute 0.5-1.8% of the total proton beam range uncertainty, dependent on whether the uncertainty in the mean ionization energy is considered (Schaffner andPedroni 1998, Paganetti 2012).
Currently most centres use a single function (referred to in the remainder of this document as the HU-RSP curve) for this conversion, although it is known that there is not a one-toone correspondence between CT numbers and RSPs (Yang et al 2012). The stoichiometric approach first proposed by Schneider et al (1996) is widely accepted as the most accurate method for producing the calibration curve for this conversion. The final step of this procedure involves calculation of the theoretical RSPs of human biological tissues, using composition data for average, healthy, adults (Woodard and White 1986, White et al 1987, ICRU 1989. As such, the stoichiometric calibration is not specific to the patient being imaged. Yang et al (2010) showed that neglecting typical patient-to-patient variations in density (4%), calcium content (2%) or hydrogen content (1%) causes inaccuracies in the RSP prediction of 2.2%, 2.0% and 1.3%, respectively.
Proton radiography offers a potential solution to this discrepancy as it produces maps in units of water-equivalent path length (WEPL). Reconstruction of individual proton radiographic projections allows a 3D map of the patient RSP to be generated, in what is referred to as 'proton CT'. Proton radiography is not used routinely, for this or any other purpose, however, largely because the most common approach requires high speed tracking of the protons before and after the patient and measurement of the residual energy of the protons exiting the patient (Schneider and Pedroni 1995, Pemler et al 1999, Schneider et al 2004, Schulte et al 2004, Shinoda et al 2006, Talamonti et al 2010, Sipala et al 2011. The majority of these prototype designs are bulky, expensive and difficult to incorporate into the clinical environment. Using only a single detector positioned beyond the patient offers a more applicable solution for proton radiography and as such there has been much research into this field recently (Gelover-Reyes et al 2011, Seco and Depauw 2011, Telsemeyer et al 2012, Testa et al 2013, Poludniowski et al 2014. The general concept is to measure a bulk dose. However, while convenient, not tracking protons means the individual paths cannot be reconstructed and the bulk dose becomes mixed with protons that have taken different paths through the patient. Methods therefore have to be introduced to detect and handle this effect, known as 'range mixing' (Bentefour et al 2012).
Optimizing the calibration curve with proton radiography offers the potential to combine the mapping of RSPs from radiography with the high spatial resolution of the x-ray CT. In a single previous effort to produce patient-specific calibration curves using proton tracking radiography, Schneider et al (2005) split the calibration curve in the soft tissue region (between −200 and 135 HU) into five parts and randomly varied the curve 20 000 times until the difference between the proton radiograph and a digitally reconstructed radiograph WEPL (DRR WEPL , generated from the x-ray CT dataset) is minimized. Only protons that travelled straight through the proton tracking modules were considered for analysis. Their work showed promising results, with an improvement in the mean range prediction from 3.6 to 0.4 mm for a dog patient. However, no further investigation of the technique and its introduction to routine clinical practice has since been reported by this or any other groups. In this work we present a method for optimizing the calibration curve using single-detector proton radiography. Specifically, we: (a) Devise and implement an optimization scheme that minimizes the difference between the measured proton radiograph (known WEPL map) and DRR WEPL map generated from the x-ray CT dataset, by varying the HU-RSP calibration curve. (b) Validate the optimizer against a set of synthetic datasets, in which the CT number and RSP are controlled and known. (c) Demonstrate the full approach on phantoms, using WEPL maps measured by singledetector time-resolved proton radiography. (d) Confirm that the estimate of the WEPL map at the treatment depth is improved using the optimized calibration curve, by comparing with a WEPL map measured by the method of dose extinction.

Optimization scheme
The structure of the optimization procedure is shown schematically in figure 1. The DRR WEPL map is created using the Siddon transform implemented in Plastimatch (plastimatch.org), and the length of intersection of each ray trace with every voxel, together with the voxel CT HU value, is stored in a matrix file for future computations. A single ray corresponds to a single pixel in the DRR WEPL map, so the density of rays matches the resolution of the DRR WEPL . The CT HU values are then converted into RSPs using the given HU-RSP calibration curve. For a single ray trace i, the individual intersection lengths l j multiplied by the RSP of the voxel, summed across all the voxels j the ray passes through, gives the calculated WEPL, w , i c of the ray trace: Combining all the ray traces together gives the DRR WEPL map. In equation (1) F(HU j ) is a function that describes the calibration conversion and it is typically a single piecewise linear function: The division of the linear sections is a choice of the user. In this work all results are compared to a 'generic' calibration curve used for general treatment planning at our institution, formed of three linear portions (−1000 to 40 HU with a gradient of 0.001, 40 to 2990 HU with a gradient of 0.0005, and a line with zero gradient between 2995 and 9960 for titanium). For optimization, this curve was split into 25 linear spline points, based on the material binning suggested for Monte Carlo simulations by Schneider et al (2000), with an additional point at water. The 26 points were [−1000; −950; −120; −83; −53; −23; 0; 7; 18; 80; 120; 200; 300; 400; 500; 600; 700; 800; 900; 1000; 1100; 1200; 1300; 1400; 1500; 2990] (the titanium correction was excluded from the optimization procedure as this would ordinarily be overridden as part of treatment planning).
For the optimization, the parameters a, b, a′, b′… in equation (2) were varied until the difference between the DRR WEPL and the proton radiographic WEPL reached a minimum. The cost function describing this difference Δ, which must be minimized, was thus defined: where w i m is the proton radiography (measured) value of the i th pixel and w i c is the WEPL (calculated) value of the i th pixel.
The optimization was performed using Matlab's built-in Nelder-Mead optimization function, 'fminsearch'. In addition to the convenience of a Matlab built-in function, Nelder-Mead optimizations are more robust to local minima and do not require an equation to be provided for the derivative of the cost function (Lagarias et al 1998). Details of the input variables used in optimization scheme can be found in table 1. When optimizing, 24 spline points (two points were fixed, see below) were allowed to vary in the y-direction only (HU stays fixed but RSP changes). The number of variables, n, therefore was equal to 24. We also imposed other restrictions, namely that: (i) the curve must be monotonic; and (ii) that the curve must pass through the points of air (HU = −1000, RSP = 0.001) and water (HU = 0, RSP = 1); by introducing a high cost penalty in the optimization if these constraints were not met.
For the synthetic datasets, for which the desired solution was known, repeats ran on a while loop until the solution converged. For the real measured datasets we ran two repeats, each of 2400 iterations (100 × n), using the output of the first iteration as the input for the second. This allows the optimizer to form another simplex, which has been shown ensures the Nelder-Mead algorithm can produce a locally optimal solution (Simon 2011).

Optimizer validation against simulated datasets
To validate the function of the optimizer we worked with synthetic data. Real CT datasets contain noise and real proton radiographs are subject to multiple Coulomb scattering (MCS), so it would be otherwise difficult to discern whether the optimizer had reduced the errors to the theoretical minimum. Using literature tissue compositions from Woodard and White (1986), White et al (1987) and ICRU (1989), theoretical CT numbers were calculated using the parameterization of our scanner and the standard procedure described by Yohannes et al (2012), while the RSPs were computed using the approximation of the Bethe-Bloch formula given by Schneider et al (1996). Simulated phantoms were constructed and perfect, known WEPL maps were generated based on the geometry. DRR WEPL maps were created with no divergence in the ray tracing to isolate the performance of the optimizer. The calibration curves were optimized against these known WEPL maps, for a set of synthetic datasets shown in figure 2.
2.2.1. Single materials, multiple materials and heterogeneous phantom. The simplest task for the optimizer is to alter the calibration curve at a single point only, which is the case for homogeneous objects composed of one material. To test this, 30 individual spherical phantoms were simulated, each with a radius of 15 cm and composed of a single homogeneous material (tissues from table 2 of Yohannes et al (2012)). An example for the breast tissue is shown in figure 2(a). To test the ability of the optimizer to simultaneously reduce the error for multiple materials at once, a synthetic phantom was created (figure 2(b)) of 5 × 5 × 5 cm 3 blocks of 28 different materials across the whole HU range (same tissues as for the homogeneous spheres, excluding 'blood' and 'cell nucleus'). To verify that the optimizer works through heterogeneous materials, a synthetic phantom was composed of two of the multiple material phantoms (figure 2(b)) in succession with the rows of the second phantom offset. In both the multiple material and heterogeneous phantoms the voxels were 5 × 5 × 5 mm 3 in size, so each material block contained 10 × 10 × 10 pixels.

Anthropomorphic phantom and real patient CT
data. An anthropomorphic pelvic phantom (custom made by The Phantom Laboratory, Greenwich, NY) was CT scanned (figures 2(c), (e) and (f)). To avoid discrepancies due to CT artifacts and to allow a known WEPL map to be constructed, the CT numbers of each organ were overridden to a single value and assigned a single RSP. Although currently not used clinically because of the potential dose to the rectum, better proton range prediction could allow for the use of anterior proton fields for prostate treatments, which would offer a significant dose sparing advantage over photon treatments (Tang et al 2011). As such, optimization was applied over an anterior field. As a final test, the CT numbers of each organ in a lung patient were overridden to a single value (figure 2(d)). For real patients we only need to know how the curve should be optimized for the field to be treated, so for speed the optimization was performed on a smaller cropped region, as shown in the inset of figure 2(d).

Impact of CT noise
In all previous validation tests, optimization involved minimizing the difference between two perfect images: (i) a DRR WEPL formed through a CT with no noise; and (ii) a simulated known WEPL map. In real patient cases the CT will contain noise, which is a stochastic phenomenon, usually assumed to have a Gaussian distribution (Chvetsov and Paige 2010). Therefore to investigate the impact on the optimization, the CT numbers of each organ in the pelvic phantom (figures 2(c), (e) and (f)) were overridden to a Gaussian distribution of values. Optimization was performed on datasets in which a range of noise levels (standard deviations of 1, 2, 3, 5 and 10%) was applied to the scaled CT numbers (+1000 HU compared to usual values) of each organ.

Proton radiographic and dose extinction measurements
All radiographic measurements were made in the proton gantry at Massachusetts General Hospital, using a spread-out-Bragg peak with a range of 20 cm and a modulation width of  18 cm (allowing us to capture all WEPLs within the range 2-20 cm). Proton radiographs were acquired using the time-resolved proton radiographic method, the full details of which can be found elsewhere (Lu 2008, Gottschalk et al 2011, Testa et al 2013. In summary, rotation of the range modulator (RM) wheel, at a typical well-defined 600 rpm, delivers a series of Bragg peaks spread out in depth. Using the RM wheel essentially as a precise clock, at each depth there is a unique time-varying dose pattern which can be converted to a water-equivalent depth based on a calibration in water.
Our calibration was based on the root-mean-square (RMS) deviation of each modulator cycle (Gottschalk et al 2011) and the detector used was an array of 249 semiconductor diodes arranged on an octagonal matrix (see figure 5(b)), with a diagonal pitch of 7.07 mm and a linear pitch of 10 mm separating diodes on the same row. The data acquisition module has a sampling time of 2 ms, which is sufficiently small to capture the 100 ms cycle of the RM wheel.
Optimization was conducted as detailed in section 2.1, with DRR WEPL maps replicating the geometry of the proton radiographic measurements. In our gantry protons are assumed to originate from a single point 227 cm upstream of the isocentre, so the beams are divergent in the ray tracing. The DRR WEPL map was produced at the same location as our diode array in the measurements, which was placed directly below the phantom (with no air gap). The set up for the proton radiographic measurements is shown in figure 3(c).
To determine if the optimization procedure results in a more accurate WEPL map at the therapeutic depth, we compared with the simple, robust method of dose extinction. In this method a controlled dose (10 MU) was delivered and increasing thicknesses of water-equivalent material were placed between the proton beam and the phantom until the dose to all of the detectors is zero. Depth dose curves for each diode were then formed, from which the thickness that stops 50% of our protons, also known as the R80, was computed. These R80 values were subtracted from the beam range (20 cm) to compute the WEPL as measured by dose extinction (DE WEPL ).

Plastic and real tissue phantoms
To test the optimization procedure we imaged the phantoms shown in figure 3 (and explained in the subsequent paragraphs). They are both composed of two halves. The reason for this is that radiographs are made through the entire patient, but the desired quantity for treatment is the WEPL to a therapeutic depth. Assuming the therapeutic depth is the distal edge of the top half of the phantom, the procedure consists of the following three steps: (i) acquire timeresolved proton radiograph through the entire phantom; (ii) optimize the calibration curve by minimizing the difference between the DRR WEPL and the proton radiographic WEPL; and (iii) predict the WEPL to the distal edge of the top half using the optimized calibration curve. To validate that the optimization improves the WEPL estimate, we split apart the phantoms and conducted dose extinction measurements (section 2.4) on the top halves only.
In the top half of the plastic phantom ( figure 3(a)), there are eight tightly packed inserts from the Gammex 467 phantom (Gammex Inc., Middleton, WI), arranged in a 12 cm PMMA box (1 cm thick walls) with a central divider. Water was used to fill in the gaps, up to the insert height (7 cm). The insert materials ranged from RSPs of 0.94 (Adipose) to 1.60 (Cortical bone), which together with the thickness of the PMMA box gave a range of WEPLs from 7.6 to 12.2 cm. The bottom half of the plastic phantom was a piece of homogeneous solid water, equivalent to 5.12 cm of water.
For the real tissue phantom, the top half (same 12 cm PMMA box as for the plastic phantom) was filled with a single piece of boneless beef chuck (fat left on), a non-smoothed pork bone and a strip of veal liver along the base ( figure 3(b)). The bottom half was composed of a simple commercial lunchbox filled with another single piece of boneless beef chuck and another non-smoothed pork bone. The bones in the top and bottom halves were oriented 90° to each other, with some overlap. Given that the real tissue was not frozen or fixed during the experiment, CT scans were acquired approximately 1 h before and 30 min after the proton radiographic measurements to allow for quantification of any changes in tissue shape. The latter volume was warped to the former volume using a b-spline deformable registration (with the software Plastimatch) and tissue changes were computed along each direction to assess the stability of the tissue during the experiment.

Handling range mixing
The inability to be able to reconstruct individual proton tracks is one of the major obstacles that needs to be addressed for the routine use of single detector proton radiography. Since protons undergo MCS, they take a curved trajectory through the body and thus there can be multiple paths to a given point. If all of the most probable paths pass through the same amount of water-equivalent material, then all the protons at the given point will have a well-defined energy distribution (assuming the protons entering are mono-energetic) and it is possible to define the WEPL. If, however, protons can get to the point of interest via two (or more) equally probable paths with different energy losses, then the energy distribution is not well defined and the final signal becomes mixed. Since single detector proton radiography does not allow us to reconstruct the proton paths, it is imperative that instances of range mixing are identified as the WEPL value is no longer reliable and the particular diode should not be used as part of the optimization.
For our purposes we applied a simple method to remove potentially range mixed diodes, by assuming that range mixing occurs at the boundaries of two materials. To identify the boundaries, we converted a 1 mm resolution DRR WEPL map into a map of local standard deviation by assessing the standard deviation of all surrounding pixels (on a 3 × 3 square). Any pixels having a standard deviation of >1% were defined to be close to a boundary and could thus be subject to range mixing. Any diode locations that overlapped with these pixels were then masked out during optimization.

Optimizer validation against simulated datasets
The results for all the synthetic dataset validation tests are detailed in table 2. Tests 1 and 2 were completed on a 32 bit 4 GB CPU architecture, while tests 3, 4 and 5 were completed on a 64 bit 8 GB CPU architecture. The optimizer was able to either completely eradicate or significantly reduce the root-mean-square-error (RMSE) across the WEPL map in all tests, with a wide range of optimization times (18 s to 4.1 h) dependent on the complexity of the phantom and the CPU architecture. Figure 4 shows an example result for the anthropomorphic pelvic phantom: (a) shows the known total WEPL map; (b) shows the calibration curves in the soft tissue region before and after optimization. Before optimization, the DRR WEPL difference map and histogram of differences compared to the known WEPL map are shown in (c) and (d) respectively. The same results using the optimized calibration curve are shown in (e) and (f).

Impact of CT noise
The optimizer was shown to handle noise in the CT number well, with results before and after optimization detailed in table 3.

Optimization against time-resolved proton radiographs
Example proton radiographs of the real tissue phantom, acquired with the time-resolved technique, are shown in figure 5: a 3D map is shown in (a), a 2D map (showing the diode locations) is shown in (b), and the 2D map after masking of the pixels potentially subject to range mixing is shown in (c). As explained in the methods, optimization was only conducted on the unmasked diodes. The results following 120 and 125 min of optimization for the plastic and real tissue phantoms, respectively, can be seen in figure 6. The relevant statistics before and after optimization are detailed in table 4. From the registration it was found that the tissue remained quite static during the course of the 3 h experiment. With reference to the directions in figure 3, it was found the mean/max deformations were: −0.53/ −1.48 mm in the LR direction; −0.46/ −1.99 mm in the AP direction; and 0.07/ −1.51 mm in the SI direction.

Estimation of the WEPL at the therapeutic depth
To validate that the optimized calibration curve offers an improvement in the WEPL estimate at a therapeutic depth, independent measurements of the top half of the phantom were made using dose extinction. Integrated doses were recorded by the imager behind the top halves of the phantom, for a number of thicknesses of solid water on top. A comparison of the DRR WEPL maps at the therapeutic depth (constructed using the generic and optimized calibration curves) with the DE WEPL maps are shown in figure 7, with the relevant statistics detailed in table 4.

Validation against synthetic datasets
Synthetic datasets were used to validate the optimizer. Based on the assigned RSPs for the HU values in the synthetic dataset, perfect WEPL maps could be created against which to optimize. In all cases the error defining the cost function was either completely eradicated or significantly reduced. The only cases where the optimizer failed to completely minimize the error were for the multiple material and heterogeneous phantoms. In these phantoms there were 28 different materials, but the optimized calibration curve was only split up into 25 linear sections. As such, it would have been very unlikely for the optimizer to completely eradicate the errors. The fact that there was a significant reduction demonstrates the optimizer's robustness to potentially conflicting requirements. The results also showed that the complexity of the dataset does not impact on the results, with complete eradication of the error for the anthropomorphic and patient geometries. The perfect datasets were critical in constructing the optimizer, but it was also important to validate that the optimizer was robust in more realistic cases. It was found to function as expected with varying CT noise levels, with an almost identical final error following optimization. As to be expected, there was an intrinsic spread in the errors that the optimizer could not remove due to the noise.

Optimization against time-resolved proton radiographs
The results in figure 6 demonstrate that the procedure works when optimizing against real proton radiographs. For both the plastic and real tissue phantoms all computed statistics were improved. Following optimization on the plastic phantom the RMSE difference was reduced from 3.4 to 2.6% and the mean was reduced from 1.8 to 0.4%. For the real tissue phantom the RMSE was reduced from 3.6 to 2.9% and the mean error was reduced from −2.1 to −0.2%. As expected the improvements are more modest than for the synthetic datasets that had a single solution, but they are sufficiently large to warrant the clinical value of the technique.
For the real tissue phantom the tissue shape changes were small during the course of the experiment. Maximum changes of −1.48/ − 1.99/ −1.51 mm in the LR/AP/SI directions were found between the pre-and post-radiographic CT images, which were acquired 4 h apart in time. The time difference between the pre-radiographic CT image, upon which the optimization was performed, and the proton radiographic measurement was around 1 h. Assuming a constant rate of shape change, differences in the real tissue between the CT image and the proton radiograph would be less than 0.5 mm in all directions. Given the pitch between adjacent diodes was never less than 7 mm, we assume that such changes did not impact on our results.

Estimation of the WEPL at a therapeutic depth
Splitting apart the phantoms, dose extinction measurements allowed for the generation of WEPL maps at therapeutic depths. Comparing the DE WEPL maps with DRR WEPL maps at the same depth, it was clear that for both phantoms the optimized curve matched the DE WEPL better than the generic curve. The RMSEs for the plastic phantom were 3.4/2.5% for generic/ optimized calibration curves. The same statistics for the real tissue phantom were 2.0/1.4%.  These improvements are similar to those of the previous section, suggesting that optimization of the calibration curve using proton radiography across the whole patient allows for a more accurate estimate of the WEPL at a therapeutic depth.

Technique limitations
The technique proposed here relies on the ability to produce an accurate proton radiographic image. Optimizing against an inaccurate proton radiographic image would result in an inaccurate calibration curve being produced and an inaccurate estimate of the RSPs in the patient. We are confident, however, that the single-detector time-resolved approach we utilized in this study was able to produce accurate WEPL values as in a previous study the same equipment was able to achieve millimeter accuracy in a pure water phantom (Testa et al 2013).
Another source of uncertainty is the applicability of straight-line DRR ray tracing to proton paths. It is well known that protons do not take straight line paths due to the effect of MCS, and thus a straight line approximation would underestimate the WEPL. One potential solution would be to use the optimized calibration curve as input for a Monte Carlo simulation (by binning into materials and assigning compositions and densities) so that the effects of MCS are accounted for. The optimized calibration curve could then be fine-tuned, using the results of the simulation. We believe this underestimation to be a second order effect in our results, however. The maximum WEPL was 17.32 cm for the plastic phantom and 13.31 cm for the real tissue phantom. The Highland (1975) formula to compute the root mean square projected angle of scattering σ s (radians) is given by, where S 2 = 14.1 MeV and ε = 1/9 are empirical factors; X is the thickness of the scatterer (cm); X 0 is the radiation length (36.08 cm for water in our case as we use WEPLs); p is the momentum (MeV/c); and β is the particle velocity as a fraction of the speed of light c. At the energy used (approximately 175 MeV) and at the largest thicknesses of our phantoms, σ s = 0.0292 rad for the plastic phantom and σ s = 0.0253 rad for the real tissue phantom. An angular deflection leads to a projected lateral deflection D as described by Leroy and Rancoita (2009) In our experiments the average lateral deflections were thus D = 0.29 cm for the plastic phantom and D = 0.19 cm for the real tissue phantom. Given the closest pitch between adjacent diodes (0.7 cm) was larger than both of these values, we believe that a straight line approximation was valid. If the protons travel through material with a composition different from water, such as bone, it is possible these deflections will increase. The deflections will also increase if the distance between the patient and the detector is increased.
It is likely more accurate results will be produced by assuming a cubic spline path (Li et al 2006) or a most likely path (Williams 2004), however this requires the tracking of individual particles (which is not possible with a single detector). Additionally, we are not aware of any software that can currently easily compute the length of intersection of such paths with the CT dataset. This is suggested as an area of future work.
In the current implementation the optimization times are probably too long. This time is dependent on three main factors. (i) The voxel size and number of voxels in the CT dataset determines the number of intersections stored in the ray tracing matrix file, which affects the time to calculate the DRR WEPL map. This has to be recomputed for each iteration. (ii) The number of pixels in the two images, between which the difference is trying to be minimized, affects the time to calculate the cost function. (iii) The stopping tolerance of the optimization directly affects how long the optimization takes to converge. The first of these, computation of the DRR WEPL map using the ray tracing details, was the major factor for slow optimization times in our implementation. This could be significantly reduced, however, by producing field-specific calibration curves over much smaller volumes. The proton radiograph would be acquired in the same direction as the intended treatment beam, so only those tissues in the field of view would need to be optimized. It is also possible that an alternative optimization scheme, such as gradient descent, could also speed up the minimization.
Another limitation of our specific set of measurements was the spatial resolution (pitch between diodes up to 1 cm) of the proton imager. This could be easily improved by using one of the various promising technologies currently available such as fluorescent screens coupled to charge-coupled device cameras (Ryu et al 2008, Muraishi et al 2009, Bentefour et al 2013; commercial flat-panel detectors (Telsemeyer et al 2012); or complementary metal oxide semiconductor (CMOS) active pixel sensors (CMOS APS) (Gelover-Reyes et al 2011, Seco andDepauw 2011, Poludniowski et al 2014).
Additionally, production of the DRR WEPL map is reliant on the accuracy of the CT. As such, removal or reduction of artifacts such as beam hardening or streaking (with improved reconstruction algorithms or dual-energy CT) must be considered before producing the DRR.

Clinical application
The optimization scheme developed in this work is insensitive to the method of production of the proton radiograph. Although we used the time-resolved single detector proton radiographic method, the optimization scheme could, in theory, be used for proton tracking radiographic devices. Provided the ray tracing of the DRR can be effectively modeled (i.e. the source-to-isocenter and source-to-imager distances are known), patient-specific curves could be generated.
We envisage the proton treatment planning workflow being updated as follows: (i) acquire an x-ray CT for treatment planning; (ii) acquire a proton radiograph in the intended treatment beam direction; (iii) generate a patient-specific HU-RSP calibration curve by optimization; and (iv) create a treatment plan using the patient-specific calibration curve.

Summary and conclusions
In this work we have shown that it is possible to construct a robust optimizer that can produce patient-specific calibration curves, using the information from simple singledetector proton radiographs. Following validation with a set of perfect (containing no noise) synthetic datasets, the robustness of the optimizer was confirmed using CT datasets containing noise.
We then demonstrated the technique with real proton radiographs produced using a simple diode array and with the time-resolved radiographic technique. For a phantom composed of real tissues, the optimization procedure was able to reduce the mean error between a DRR WEPL map and a proton radiograph from −2.1% using a generic calibration curve to 0.2% using an optimized calibration curve. Using this optimized calibration curve, produced using proton radiography through the entire patient, it was demonstrated that a more accurate estimate of the WEPL at a therapeutic depth could also be made. By splitting the phantom in half, the RMSE difference between the DRR WEPL map and an independent DE WEPL map decreased from 2.0 to 1.4% using the optimized calibration curve.
With further development, we believe that this procedure could be introduced as part of the proton treatment planning workflow. A proton radiograph would be acquired at the time of the planning x-ray CT and calibration curves would be generated on a patient-by-patient basis.