Semiautomated segmentation of head and neck cancers in 18F-FDG PET scans: A just-enough-interaction approach

Purpose: The purpose of this work was to develop, validate, and compare a highly computer-aided method for the segmentation of hot lesions in head and neck 18F-FDG PET scans. Methods: A semiautomated segmentation method was developed, which transforms the segmentation problem into a graph-based optimization problem. For this purpose, a graph structure around a user-provided approximate lesion centerpoint is constructed and a suitable cost function is derived based on local image statistics. To handle frequently occurring situations that are ambiguous (e.g., lesions adjacent to each other versus lesion with inhomogeneous uptake), several segmentation modes are introduced that adapt the behavior of the base algorithm accordingly. In addition, the authors present approaches for the efficient interactive local and global refinement of initial segmentations that are based on the “just-enough-interaction” principle. For method validation, 60 PET/CT scans from 59 different subjects with 230 head and neck lesions were utilized. All patients had squamous cell carcinoma of the head and neck. A detailed comparison with the current clinically relevant standard manual segmentation approach was performed based on 2760 segmentations produced by three experts. Results: Segmentation accuracy measured by the Dice coefficient of the proposed semiautomated and standard manual segmentation approach was 0.766 and 0.764, respectively. This difference was not statistically significant (p = 0.2145). However, the intra- and interoperator standard deviations were significantly lower for the semiautomated method. In addition, the proposed method was found to be significantly faster and resulted in significantly higher intra- and interoperator segmentation agreement when compared to the manual segmentation approach. Conclusions: Lack of consistency in tumor definition is a critical barrier for radiation treatment targeting as well as for response assessment in clinical trials and in clinical oncology decision-making. The properties of the authors approach make it well suited for applications in image-guided radiation oncology, response assessment, or treatment outcome prediction.


1.A. Background
FDG PET/CT has become an essential tool for clinical management of head and neck (H&N) squamous cell carcinoma (SCC). 1 This disease typically originates from the normal squamous mucosa that lines the open air spaces in the H&N region. Cancers are most often caused by irritants and carcinogens from cigarette smoke, alcohol, and chewing tobacco although more recent studies also show a role for human papilloma virus (HPV). 2,3 Once cancerous, these mucosal neoplasms have access to associated lymphatic drainage, leading to local/regional spread of the disease to neck lymph nodes. Staging for these cancers is defined by the size and invasion pattern of the primary site cancer (T stage) as well as the presence, size, and location of regional nodes (N stage). 4 More rarely these cancers spread to regions beyond the H&N region, which can be defined as metastases (M stage).
TNM staging is an important determinant of both prognosis and oncologic treatment decision-making for every patient. 4 Because of limitations of physical exam, imaging has rapidly become a critical part of staging. CT was the initial imaging modality of choice [5][6][7][8] and provided volumetric anatomic assessment based on contrast uptake into tumor versus normal mucosa. The size of local-regional lymph nodes could readily be determined and lymph nodes larger than 1.5 cm were more likely involved in cancer. Unfortunately, CT imaging was not very sensitive with limited involvement of the cancer. The advent of PET/CT provided a more sensitive imaging tool that detected cancer based on the metabolic differences in glucose metabolism with tumors showing greater FDG uptake than normal tissues. FDG PET/CT is an effective tool for diagnosis and is effective in accurately detecting sites of cancer involvement and thereby improves the accuracy of staging 1,9 and helps define areas for surgical excision and radiation therapy. FDG PET/CT has thus become the most important staging and therapy-planning tool for H&N cancers. Finally, the ultimate success of the treatment can be assessed with this more sensitive tool, although it requires a sufficiently long interval (generally 2-3 months) after radiation and/or surgery to allow the inflammation of treatment to abate. 10 FDG PET/CT is now a standard in many centers for staging, clinical decisionmaking, and response assessment. 11 Despite the advances associated with FDG PET/CT use, the previously described applications are mostly based on standard visual inspection of images with only limited use of quantitative indices. If quantification is employed, the most commonly used metrics to characterize lesions are SUVmax, SUVmean, metabolic tumor volume (MTV), and total lesion glycolysis (TLG). 12,13 PERCIST is an approach that has proposed another index, SUVpeak (i.e., the maximum average activity in a 1 cm sphere inside the tumor of interest) to assess response to therapy. 14 The work presented here focuses on improvements that can be achieved by employing more automated and therefore more consistent approaches to quantitative analysis of tumors and associated lymph nodes. We argue that highly automated segmentation algorithms can provide greater accuracy and consistency in defining radiotherapy treatment volumes and are important for quantifying responses to therapy.

1.B. Problem statement and requirements
Segmenting H&N lesions in PET images is demanding, as demonstrated by the examples shown in Fig. 1, which are part of our evaluation set (Sec. 3.A). Also, for applications like outcome prediction, it is not a priori known if segmenting and quantitatively describing the primary tumor only is sufficient or if all lesions need to be quantified individually. Thus, to answer this question, all lesions need to be segmented individually and can be later combined for comparison. Specifically, our requirements for a head and neck cancer (HNC) segmentation approach are as follows.
• The segmentation method must offer the ability to individually segment lesions with varying contrast that are in close proximity to each other or to normal structures with FDG uptake, which is especially important for HNC FDG PET imaging (Fig. 1). • The segmentation approach must allow physicians to segment lesions in a time-efficient manner, as required for clinical use. Note that due to the complexity of HNC FDG PET images, the (correct) interpretation of scans (i.e., locating lesions, differentiating between normal and abnormal structures, etc.) is already time intensive, and therefore, this point becomes even more important for this specific application. • The method must be intuitive to use. Also, in the case of semiautomated segmentation methods, it must allow physicians to make corrections to semiautomated segmentation results as deemed appropriate. • The approach must offer good agreement with manual segmentation, which is the de facto standard. • The method must offer better consistency to reduce intraand interoperator variability. This is especially important for applications like radiation treatment planning, where operator induced variability can compromise outcome.

1.C. Related work
The definition of the gross tumor volume (GTV) or calculation of FDG uptake metrics like average SUV, MTV, or total glycolytic volume (TGV) requires the segmentation of target structures like primary tumors and metabolically active lymph nodes in volumetric FDG PET images. For PET image segmentation, a number of approaches have been proposed, including adaptive thresholding, 15 "standard" region growing, 16 a combination of adaptive region-growing and dualfront active contours, 17 k-means-based partitioning into tumor and background regions, 18 a gradient-based method, 19 a combination of two segmentation methods to reduce inconsistencies, 18 and fuzzy locally adaptive Bayesian (FLAB) method. 20 For an overview of published methods, the reader is referred to the review by Foster et al. 21 Recently, a number of graph-based segmentation methods have been proposed for PET image segmentation, and a summary is given in Table I. Similarly as in other medical image analysis applications, 22 these methods show great promise and represent a quite powerful segmentation framework. However, formulating a suitable graph-based cost function for HNC segmentation ( Fig. 1) is not straightforward due to potentially contradicting/conflicting demands/requirements (e.g., lesion with inhomogeneous uptake vs. lesions in close proximity). Such an issue can be avoided by utilizing graphsegmentation algorithms in an interactive fashion, allowing the user to contribute expert knowledge, if needed. For example, in case of random walks and graph cuts, the user can add additional object and background seeds to alter the segmentation result. However, these changes can affect the segmenta-T I. Overview of graph-based segmentation methods utilized for PET image segmentation. Cosegmentation of single lesions in PET and CT images utilizing a Markov Random Field (MRF) approach; the optimization is solved using a graph-cuts-based method tion globally, which can lead to unexpected effects, making segmentation refinement unintuitive. In contrast, we have developed approaches that follow the just-enough-interaction (JEI) principle for segmentation of livers, 23 lymph nodes, 24 and lungs in 3D (Ref. 25) and 4D (3D inspiration and expiration scan) 26 MDCT scans as well as for segmentation of IVUS data sets. 27 The idea behind this principle is that the user can provide simple cues to guide the segmentation algorithm in an efficient and intuitive manner. Besides the belief that the general JEI principle can offer improved performance in PET image analysis, no such approach for PET image segmentation has been available so far. Specifically, this paper represents the first demonstration that JEI strategy is suitable for this application domain.

1.D. Contribution of this work
In this paper, we present a clinically relevant JEI-based approach for FDG PET image segmentation, which utilizes a graph-based segmentation method. For this purpose, we transform the segmentation task into a suitable graph-based optimization problem. In addition, we introduce several efficient interaction approaches to enable the user to modify the initial segmentation result. To demonstrate the applicability and performance of our approach for the task of HNC segmentation, we evaluate our approach on a large cohort of FDG PET scans and compare its results to manual segmentation, the current clinical de facto standard for HNC segmentation in FDG PET volumes.

METHODS
We utilize a highly automated optimal surface segmentation (OSS) approach, which is a variant of the LOGISMOS (layered optimal graph image segmentation of multiple objects and surfaces) segmentation framework, 33 for the segmentation of uptake in FDG PET volumes. OSS was introduced by Li et al. 34 The basic idea behind this approach is to formulate a segmentation problem as a graph-based optimization problem, which enables finding an optimal solution (i.e., segmentation surface) according to the utilized cost function. For more information on OSS, the reader is referred to Li et al. 34 In Subsections 2.A-2.E, the approach for lesion segmentation in FDG PET volumes is described in detail. First, we introduce the main segmentation approach (Secs. 2.A-2.C). Second, we outline several additional segmentation modes, which allow effective handling of frequently occurring situations like lesions in close proximity (Sec. 2.D). Third, we describe an approach that enables the user to efficiently refine segmentations (Sec. 2.E), if needed.

2.A. Graph construction
To construct a graph that represents the segmentation problem, the user first needs to indicate a lesion to be segmented. For this purpose, the user specifies a rough center point ce k inside of lesion k. This can be done in two ways: (a) the userselected point ce k user is directly utilized as ce k or (b) ce k user can be automatically recentered to the highest uptake voxel within a search radius of 7 mm from ce k user . Option (b) typically leads to more consistent results because a property of the PET image I is utilized to adjust the location of the center point. Therefore, it is used as the default setting, which can be changed by the user. However, recentering may cause problems, for example, when segmenting lesions with necrotic centers.
Based on ce k , a graph structure is generated (Fig. 2). For this purpose we assume that the lesion to be segmented is roughly spherical in shape, which is frequently the case for lesions in the H&N area. For more complex shaped lesions, additional center points can be used, and individual segmentation results will be automatically combined with a logical OR operation. A graph G k = (V,E), consisting of nodes V and edges E, is constructed by placing a spherical mesh with radius r and n column evenly spaced mesh vertices p i with i ∈ {0,1,...,n column − 1} centered around ce k . The volume (voxels) inside the spherical mesh will be denoted as M region . Then, columns with sample points (nodes) are introduced between ce k and mesh vertices p i . The spacing between nodes is gap, resulting in n node nodes that represent sample points of the physical volume [ Fig. 2(a)]. The node n i,0 is closest to the center, while node n i, n node −1 is furthest away from the center. All remaining nodes are placed in order between them. In addition to nodes, several edges are added to represent the segmentation problem. First, for all nodes on a column, the edges {n i, j ,n i ′ , j−1 } with infinite capacity are added to E if j 0 to ensure that only one node can be selected on a column 34 [ Fig. 2(b)]. Second, for all nodes j and j ′ , where j ′ = max( j − sc,0) on every pair of adjacent columns i and i ′ , edges {n i, j ,n i ′ , j ′} and {n i ′ , j ,n i, j ′} with infinite capacity are added to E to implement a hard smoothness constraint 34 [ Fig. 2(c)]. Thus, when solved, the nodes of two adjacent surface points on columns cannot be more than sc nodes apart. Third, a soft smoothness constraint is included by adding edges Consequently, for a possible solution, the cost is increased by sp times the difference in j between surface nodes on a neighboring column.
Based on experiments performed on a small set of PET volumes dedicated for algorithm development, we found that r = 60.0 mm, n column = 1026, gap = 1.0 mm (resulting in n node = 60), sc = 5, and sp = 0.005 work well for our application.

2.B. Cost function design
For graph-based segmentation, nodes n i, j that represent the object boundary should incur low costs, while nodes that represent the selected object (uptake region), adjacent objects, or background should incur high costs to make it less likely that the object surface (mesh) passes through these nodes. To generate node costs, uptake values up(n i, j ) at corresponding spatial node locations are sampled from the PET volume by means of linear interpolation and subsequently converted into node costs c base (n i, j ). This conversion is based on the analysis of local image properties around the center point ce k , which are robustly estimated. Our cost function design is based on an adaptive strategy to mimic the typical tracing preference of radiation oncologists [ Fig. 3 (a)]. The boundary drawn by the radiation oncologist [ Fig. 3 (a)] almost reaches voxels with background uptake, whereas a typical segmentation approach based on the isocontour corresponding to 50% of the maximum lesion uptake-to-background uptake ratio has boundary voxels with much higher uptake values [ Fig. 3 (b)]. In this context, we observed that there is no linear relation between the average uptake value at the boundary of the manual segmentation and the maximum uptake to background uptake ratio. Also, note that the segmentation boundary appears "noisy" in the sagittal cross section depicted in Fig. 3

2.B.1. Local image properties
Based on our assumption of roughly spherical object shape, we introduce the notion of a shell to robustly measure local uptake parameters. A shell Ω j is the set of all nodes with the same node level j: as a function of j is given in Fig. 4 (a). Specifically, we are interested in finding the "peak" pe and the "knee" kn uptake values [ Fig. 4 (a)], which will be utilized for determining a threshold T h that is used for calculating c base (n i, j ). The peak pe is given by pe = max j=0, ..., n node −1 (up Ω ( j)) and represents the maximum uptake in a shell. The knee kn is the approximate uptake value at which the object fades into the background, but is not intended to be the background itself. To find kn, the gradient up ′ Fig. 4(b)]. Then the index of the node representing the steepest descent in up Ω is found with j low = arg min j=1, ..., n node −2 {(n node −( j +1))/n node up ′ Ω ( j)} [ Fig. 4(c)]. The linear center bias term (n node − ( j + 1))/n node helps dealing with some rare, occasional cases of one large or several small outside objects with uptake appearing in the shell profile. To find the point where the shell uptake transitions to background values, we utilize a modified version of up ′ Ω , such that it never The node index where up Ω starts leveling off is j hi [ Fig. 4

2.B.2. Threshold calculation
Once the peak and knee are determined, the threshold T h can be calculated evaluating A plot of Eq. (2) as a function of the ratio γ = pe/kn is provided in Fig. 5, and the rationale behind this design is as follows. If γ = 2, roughly a 50% threshold is used. For larger γ-values (e.g., pe ≫ kn), the T h % will be lower than 50%, and for smaller γ-values, T h % will be increased moderately. With this empirical design, the typical tracing behavior of radiation oncologists is mimicked; higher uptake values above background are included to form a perceived safe tu-mor margin definition (Fig. 3). High acceptance (i.e., avoiding the urge of users to manually edit/postprocess segmentations that were generated by an algorithm) of computergenerated thresholds/segmentations is important to achieve lower inter-and intraobserver variability. Furthermore, Eq. (2) can be tailored to specific needs or applications. For example, variants based on some standard threshold methods can be used as an alternative to focus more on volume estimation than treatment. Some examples are T h % = 0.4 and T h % = 0.5, which are based on typical 40% and 50% values of maximum thresholds that are fairly common. Also, note that during calculation of T h, the most inner and outer shells are not considered because they represent extreme solutions that are not of interest.

2.B.3. Cost function
Once the threshold T h is found, the cost function c base (n i, j ) is constructed as follows (Fig. 6). Essentially, c base consists of the following three components: the likeliness of the uptake to be part of the background. For this purpose, the image volume inside M region is isotropically resampled, and a normalized image histogram H with max(H) = 1 is generated.H is then generated by taking the right-to-left monotonic increasing envelope function of H [ Fig. 6(a)]. If up(n i, j ) = T h: The cost is defined as c base (n i, j ) = 0. If up(n i, j ) > T h: The cost is a linear function given by which has the value of 0 at the threshold uptake and 1 at the center uptake.
The final cost function is given by c(n i, j ) = c base (n i, j ) + c reject (n i, j ). The term c reject (n i, j ) is added to reject trivial solutions or parts of adjacent, unrelated objects. It is defined as with r 1 (n i, j ) = j < j min , r 2 (n i, j ) = ( j > j min and min j ′ =0,1, ..., j (up(i, j ′ )) < median(M region )), and j min = 3. Therefore, the costs of nodes in close proximity to the center ( j < j min ) as well as nodes further away where the uptake has fallen below the median of spherical region M region are increased by rej = 6, far above the typical cost range between 0 and 1. An example of a complete cost function profile is given in Fig. 6(c).

2.C. Segmentation
Once the graph is constructed and all costs have been determined, a globally optimal solution can be found in low degree polynomial time, as described by Li et al. 34 The result is a set of nodes with exactly one node per column selected, which represents the segmentation boundary. To visually represent the segmentation result, the initial spherical triangle mesh can be utilized by moving mesh vertices to selected boundary nodes on the same column. Several examples of segmentations and their cost functions are shown in Fig. 7.
Once the user has produced a valid segmentation of lesions found in a PET volume, the meshes are converted to labeled volumes by means of voxelization, resulting in segmentations S k . Voxelization can lead to a result where some voxels are not within a 6-neighborhood of the main part of the lesion that includes ce k . Such voxels are removed. Also, the user has the option to close one-voxel-wide gaps between adjacent lesions (e.g., hot lymph nodes within a chain of nodes).

2.D. Segmentation options
Beyond the base algorithm (Secs. 2.A-2.C), there are alternative modes that can adapt the behavior of the algorithm to better handle frequently occurring situations that are ambiguous. Thus, the options enable the user to provide additional expert knowledge, which is utilized to modify the behavior of the algorithm accordingly. For our method, the following segmentation options were implemented, which are described in appendices. Label Avoidance prevents new lesion segmentation from overwriting existing segmentations (Appendix A).
Splitting simplifies individually segmenting lesion in close proximity with similar uptake (Appendix B). Necrotic Mode simplifies the segmentation of necrotic lesions (Appendix C).

2.E. JEI-Based refinement approach
Designing a cost function that is appropriate for all possible situations is difficult. Consequently, while the above-described base algorithm will perform flawlessly for the majority of lesions found in H&N FDG PET scans, suboptimal results can occur in some cases. For practical applicability in clinical trials and routine care, it is important that even difficult cases can be processed with the same segmentation tool without major additional effort. Thus, we utilize the JEI principle to effectively deal with segmentation errors of the base algorithm. The basic idea behind JEI is that user input is kept at a minimum during refinement of a segmentation. Thus, instead of manually correcting (local) errors (i.e., user driven boundary delineation), the user provides only high-level input to the algorithm to correct the segmentation boundary by utilizing refinement modes of the approach. Also, an undo function is provided that allows the user to reverse a refinement step. Two examples of JEI-based refinement are given in Fig. 8, and a detailed description of refinement modes is given in the sections below.

2.E.1. Global refinement
Global refinement changes the value of T h in order to modify the boundary of the segmentation. The user can place a point R T h at an image location where the lesion boundary should go through, and the corresponding uptake value is utilized as a new value for T h, which will change the cost function and thus the entire segmentation surface. In addition, the closest column to R T h is modified to force the result (surface) through the closest node to R T h , which will be called n i T h , j T h . This is accomplished by adding another cost change function to c base during the calculation of the final cost c. As a result of this cost change, the entire graph-based optimization needs to be rerun. If the user is not satisfied with the result, he/she can repeat the process, which will override the previous settings. An example for a "single click" global refinement action is provided in Figs. 8(a) and 8(c). As can be seen, a single user provided point [red dot in Fig. 8(c)] is sufficient to update the whole object surface.

2.E.2. Local refinement
Local refinement allows the user to correct cases where only a portion of the segmentation boundary needs to be corrected. This is accomplished as described below.
(a) User-interaction. To start the local refinement process, the user needs to inspect the segmentation and, if needed, specify a surface point R L p through which the correct surface should go [blue point in Fig. 8(d)].
Because the user can perform multiple local refinement interactions, the index p is used to keep track of them. The node closest to R L p will be denoted as n(i L p , j L p ). (b) Local search for similar columns. Subsequent to finding n(i L p , j L p ), neighboring columns of the surface segment to be altered are determined based on the similarity of the uptake pattern around n(i L p , j L p ) (Fig. 9). For this purpose, a breadth-first search (BFS) on columns with a hard constraint of ds(i,i L p ) ≤ 5 is performed. The function ds(i,i L p ) represents the number of edges on the shortest path on selected columns (mesh vertices) between the surface mesh vertices a and b (graph geodesic) that are associated with columns i and i L p , respectively. An uptake pattern on a column is considered similar if smc(n i, j ,n i L p , j L p ) =  3 k=−3 |up(n i, j+k ) − up(n i L p , j L p +k )| < th si m and | j − j L p | ≤ ds(i,i L p ) is fulfilled with th si m = 0.05  3 k=−3 |up(n i, j+k )|. Because the BFS can leave "holes" in the set of selected columns, left out columns are added in a "closing" step; columns that were skipped during BFS due to dissimilarity are included, if two thirds of its immediate neighbors were selected by smc. Finally, for all selected columns with i i L p , the node with the highest profile similarity is stored in the set Ψ p . (c) Modification of costs. To locally refine the segmentation result, the costs on column i L p itself and selected neighboring columns in set Ψ p are modified. On column i L p , the cost update function is given by For all nodes of columns that include a node nĩ p ,j p ∈ Ψ p , the function c L p (nĩ p , j ) = notch( j,j p ,3,ds(ĩ p ,i L p )) is utilized; for the definition of function notch see Eq. (A2) in Appendix A. Thus, the cost is decreased around the target. The decrease is narrow in close proximity to column i L p , but becomes much wider further away. For all other nodes, c L p is equal to zero. The F. 9. Examples illustrating local refinement. (a) In this example, the vector around R E l on the column marked C i+3 is compared to vectors on adjacent columns within range to find the best match for the refinement. Note that the vector is smaller than in practice and for simplicity, only columns in a plane are shown. (b) Illustration of BFS to find neighboring columns whose costs need to be adapted. Note that in this case, comparisons are based on column C i . motivation behind this design is as follows. First, it forces the boundary to pass trough n i L p , j . Second, it models uncertainty further away from the user specified refinement point. Multiple local refinement points can be utilized for a lesion, each resulting in a cost change, which are accumulated to update c. The only exceptions are columns directly associated with user-selected refinement points, which will not have their costs further changed.

3.A. Image data
For method validation, 60 PET/CT scans from 59 different subjects with H&N cancer were utilized, which were acquired with different scanners and reconstruction parameters (Table II) between 2004 and 2008. Out of this set, 59 scans were performed pretreatment, and for one patient, an additional post-treatment scan with uptake was included. All subjects were injected with 370 MBq ± 10% of [F-18]FDG with an uptake time of 90 min ± 10%. In all cases subjects were fasted for >4 h and had blood glucose <200 mg/dl. Because of the interest in the H&N region, patients were imaged with arms down, and CT-based attenuation correction was performed. All reconstructions were performed with 2D OSEM iterative algorithms (Table II) lymph nodes, and the average number of lesions was 3.83/scan. All scans were assessed in terms of complexity and classified into the following categories: low, med, and high.

3.B. Experimental setup
To assess the performance of our semiautomated algorithm, it was implemented as an extension for 3D Slicer, 37 a multiplatform, free, and open source software package for visualization and medical-image computing. In addition, the manual segmentation (2D drawing) tools offered by 3D Slicer were used for comparison.
Three physicians (experts) with different levels of radiation oncology experience (one faculty professor-JMB and two physician residents with instructions-KAP and TC) participated in our validation experiment. Before the start of the experiment, all experts were trained on both segmentation methods and tools, using a separate set of ten H&N PET scans.
For validation, the 60 PET scans described in Sec. 3.A were randomly divided into three sets of 20 scans each. The partitioning was stratified such that each set contains approximately the same range of case complexities. These three sets were processed sequentially in four processing steps. In each step, the complete set of 20 PET scans was segmented in a random sequence by alternating the use of semiautomated and manual segmentation methods such that after two steps, all 20 cases in a set were processed with both methods by each physician. To assess intraoperator variability, two additional steps were performed by all three experts. Thus, all 230 lesions present in the 60 PET scans partitioned into three sets were segmented fourtimes by three experts, resulting overall in 2760 segmentations. Half were done by using the semiautomated (JEI) method and half with the standard manual segmentation method. For both T II. Utilized PET-CT scanners and image reconstruction parameters. methods, PET scans were loaded into the segmentation software using a default display setting, consisting of a window of 6 SUV and level of 3 SUV. The experts were allowed to change this setting if deemed necessary for the segmentation task at hand (e.g., chain of active lymph nodes with similar uptake). For segmentation, a set of indicator images was provided to the experts that roughly identified the object to be segmented and the corresponding object label to be used (Fig. 11) for each lesion to avoid differences in the interpretation of the PET scans. The indicator images were generated by an experienced radiation oncologist (JMB) with access to medical patient records.

3.C. Independent reference standard and quantitative indices
(a) Segmentation accuracy. Due to the lack of a ground truth for lesions, the following approach was used to form an independent reference for comparison. For a given expert, the four manual segmentations performed by the two other experts were combined to produce a segmentation reference by utilizing the Simultaneous Truth and Performance Level Estimation (STA-PLE) algorithm proposed by Warfield et al. 38 To assess segmentation accuracy, the Dice coefficient was computed between each expert's observed and reference segmentations. Given two segmentations, A and B, the Dice coefficient is given by D = (2| A ∩ B|)/(| A| + |B|).
Coefficient values close to one indicate high agreement, and values close to zero indicate low agreement. (b) Segmentation agreement. For the assessment of interand intraoperator segmentation agreement, the following approach was used. To assess intraoperator agreement, the Dice coefficient between trial one and two with a given method was calculated. To measure interoperator agreement, Dice coefficients for a given method were calculated for all user pairs within each trial.
(c) Time and user effort. For each user, the time required to segment all lesions in a given PET scan was recorded for both methods. Additionally, for the semiautomated approach, the number of actions (specifying a center point for segmentation, refinement operations, etc.) including and excluding discarded actions was recorded and analyzed.

3.D. Statistical analysis of performance indices
Statistical analysis was performed on results from the tumor segmentations. Reported Dice coefficient means, 95% confidence intervals, and p-values were obtained with linear mixed effects regression models. Random effects were included in the models for experts and patients in order to account for repeated segmentation within the two factors. Estimated intraand interoperator variability in segmentation accuracy Dice coefficients are reported as standard deviations. Intraoperator variability measures the amount of variation in the trial one and two segmentations about their average. Interoperator variability measures the variation in trial one and two averages across experts.

4.A. Segmentation accuracy
For both methods, the average Dice coefficient is given in Table III. The Dice coefficients were not found to be statistically different (p = 0.2145). Also, Table III provides summaries of the intra-and interoperator standard deviation of segmentation accuracy assessed with the Dice coefficient. The provided 95% confidence intervals (CIs) show that intra-and interoperator standard deviations are significantly lower for the semiautomated method compared to the manual segmentation approach. Table IV summarizes results for intra-and interoperator segmentation agreement analysis. In both cases, the semiautomated method shows a significantly higher agreement compared to the manual method. Figure 12 depicts typical examples of intra-and interoperator variation in HNC segmentation. PET segmentations per HNC PET scan. The 95% confidence intervals in Table V show that the proposed semiautomated method is significantly faster than its manual counterpart. Note that while the average time for segmenting all lesions (average: 3.83 lesions) with our semiautomated method in a PET scan was 3.74 min, the segmentation algorithm itself runs at interactive speeds. Thus, almost all of the reported time is spent by the user on understanding the scene and inspecting produced segmentations. Table VI provides statistics about the number of actions required by experts to perform a semiautomated segmentation of all lesions in a PET scan. The plots in Fig. 14 show the accumulative percentage of cases that were completely segmented in dependence of the number of actions required. Differences between used (all) and actually necessary (final) actions in Table VI and Fig. 14 indicate that the user effectiveness can be further increased with additional training.

5.A. Performance
Our JEI graph-based segmentation approach offers a high degree of automation, requires little user interaction, and enables clinically practical and efficient computer-aided segmentation refinement. Compared to a manual segmentation T V. Estimated mean times in minutes with standard deviation (SD) and 95% confidence intervals (CIs) for manual and semiautomated segmentations per PET data set. approach, this combination results in a versatile segmentation tool that showed equivalent average segmentation error with significantly reduced standard deviation. The performed assessment of user agreement showed significantly higher intra-and interoperator consistency for the semiautomated approach compared to manual segmentation. This can be seen by the example provided in Fig. 12; the manual segmentation approach leads to considerable variation in segmentations produced by the same user as well as across users, whereas little variation is observed between the segmentations produced with the semiautomated method. Thus, the approach meets the requirements defined in Sec. 1.B. Despite the relatively short training phase, all three experts were able to produce valid segmentations of target lesions in all utilized PET scans. However, as Table VI and Fig. 14 show, training of users is important to achieve good efficiency. Based on the plots shown in Fig. 14, we estimate that more than 90% of the 230 lesions can be segmented with one or two user input actions (mouse clicks), which will likely reduce the required user interaction time (Table V and Fig. 13) even further.

5.B. Current limitations
As can be seen from Fig. 14, in some rare cases, more than ten user actions were necessary to produce a segmentation of a lesion. This issue can be addressed by adding suitable segmentation modes and/or refinement tools that enables the user to more efficiently handle such cases.
While the results presented in Sec. 4 are very promising in terms of segmentation error, segmentation agreement, and interaction time, it does not offer insight into the impact of different PET image reconstruction methods on segmentation results. Thus, we plan to study this aspect by utilizing PET phantoms in the near future. Harmonization of image acquisition and quality assurance is another critical component of assuring optimal clinical trials, target definitions, and response assessment that is also being addressed nationally. 39 In the current implementation, only the PET image information of PET-CT scans is utilized. However, the chosen graph-based LOGISMOS framework is well suited for expanding our approach to segmenting PET and CT volumes at the same time. In addition, the method can be adapted for segmenting dynamic PET (volume and time) scans. Also, by defining a suitable cost function, our segmentation approach can be adapted for segmenting PET scans with tracers other than FDG.

5.C. Impact
Target or lesion definition is a critical component for both radiation therapy treatment planning and delivery as well as for response assessment of tumors after treatment. Our current standard involves a process wherein the radiation oncologist T VI. Statistics of actions required by expert 1 to expert 3 for the semiautomated method. All actions denotes the number of actions actually performed by the expert, and final actions denotes the number of actions that would have been required to generate the segmentations (i.e., undone actions are not counted). or radiologist manually outlines a tumor or lesion. This is inherently both time consuming and relatively inconsistent because of the nature of both human perception and limits of human dexterity. Algorithmic tools have the capacity to speed the process and at the same time improve consistency. While concerns exist regarding automation of critical treatment-determining steps and decision-making metrics, the ability to improve consistency is unquestionably paramount, particularly when considering clinical trials in which multiple institutions and investigators will determine targets and hence outcomes. The ability to substantially improve consistency enables better comparisons of tumor control probabilities. Although substantial quality assurance efforts within the radiation therapy community have been extremely effective and robust in consistently defining dose delivery parameters across institutions using a variety of treatment planning and delivery systems for clinical trials, 40 the same level of robustness has not been possible for tumor definitions. Algorithmic tools are an essential method for achieving this improved robustness.

Number of actions
Our tool illustrates a 26.9% improvement in segmentation agreement across physicians combined with a 20.3% improvement of agreement within the same physician contouring the same target. In addition, the tool reduced the time required for segmentation by 57.9% compared to the standard manual method per PET scan. In considering radiation therapy trials, this can impact both the cost by diminishing the numbers of patients needed and decreasing the time for defining the tumor target for treatment. Additional tools to apply these algorithmic benefits for response assessment will be both useful for outcome determination as well as potentially for therapy adaptation.
F. 14. Plots showing the accumulative number of cases completely segmented in dependence of number of user actions required for the semiautomated method for experts one, two and three. "All actions" denotes the number of actions actually performed by the expert, and "final actions" denotes the user actions that were actually required to perform the segmentations (i.e., undone actions are not counted).

CONCLUSIONS
We have presented a novel approach for the highly computer-aided segmentation of lesions in H&N FDG PET volume data sets. By utilizing the JEI paradigm, a good tradeoff between automation and input required from human experts was achieved, enabling segmentation of simple and complex cases with the same tool in an efficient and consistent manner. Due to the higher intra-and interoperator segmentation agreement compared to manual segmentation without a loss in segmentation accuracy, the proposed method is well suited for applications like image-guided radiation treatment planning as well as image based assessment of treatment response or treatment outcome prediction, which all benefit from these traits. The application of these tools in multi-institutional clinical trials can improve their efficiency and cost-effectiveness. sp = 0.05. Consequently, parts of a surface that are further away from the rest of the shape are more likely to be cut off. Furthermore, the surface becomes more responsive to the local refinement presented in Sec. 2.E.2. Second, the cost function c is expanded by adding c split , introducing additional feature terms that focus on local and potentially weak evidence of the presence of an uptake boundary. For this purpose, the following features are considered.
Local minima: Nodes representing local uptake minima along columns are detected by using spc min (n i, j ) = (up(n i, j−1 ) > up(n i, j )) AND (up(n i, j ) ≤ up(n i, j+1 )). Watersheds: For watershed analysis, an inverted version of image I is utilized.
(i) Strong watersheds [ Fig. 17(c)] are identified by a watershed segmentation with a fill level of 20% of the maximum difference of the processed volume, where the level is the threshold for watershed unions. 41 The resulting watershed labels at node n i, j will be denoted as sw s(n i, j ). Then, if two adjacent nodes on a column are in different watersheds and are in the abovethreshold part of the column, the node is marked: spc sw s (n i, j ) = (sw s(n i, j−1 ) sw s(n i, j ) and ( j ′ | up(n i, j ′) < T h and 0 ≤ j ′ ≤ j)). (ii) Weak watersheds [ Fig. 17(d)] are found by a watershed segmentation with a fill level of 0% of the maxi-mum, which will likely result in oversegmentation. Nodes representing transitions between weak watersheds are marked by spc w w s (n i, j ) = (ww s(n i, j−1 ) ww s(n i, j ) and ( j ′ | up(n i, j ′) < T h and 0 ≤ j ′ ≤ j)). Note that weak watersheds will also respond to strong watersheds, but not vice versa.
Each node at which a splitting condition is met is a splitting node. Note that several markers described above can be set for a node on a given column. For each column, sets of nodes with the features snl min i , snl sw s i , and snl w w s i are combined into a superset snl i , containing all nodes that respond to corresponding features on column i.
At the location of a node with a condition met, a cost change is applied. For this purpose the notch function [Eq. (A2)] is utilized. This allows for the splitting feature to affect the surface solution, even if smoothness constraints prevent the exact splitting node from being part of the surface. The depth d is adjusted based on the type of splitting feature: c split min (i, j) =  n i, j ′∈snl min i notch( j, j ′ ,0.4,2), c split w w s (i, j) =  n i, j ′∈snlw w s i notch( j, j ′ ,0.5,2), and c split s w s (i, j) =  n i, j ′∈snls w s i notch( j, j ′ , 0.2,2). Note that d sw s < d w w s , but weak watershed features usually also occur at nodes with strong watersheds, and thus their effects are additive, resulting in a cost reduction (i.e., notch depth) of 0.7 at such locations.
The additive cost term can now be defined as

APPENDIX C: NECROTIC MODE
Necrotic lesions have a core with minimal or no uptake that is typically surrounded by active regions with uptake values above background. With the above presented framework, it will be difficult to segment such a necrotic lesion when the user places the center in the necrotic region. The main reason is that the approach presented above assumes that lesions have above background uptake values. This behavior can be changed by activating the necrotic mode option-instead of rejecting as soon as the uptake goes below the regional median, the uptake must first go above the threshold. Thus, the rejection for being too low will not happen immediately on necrotic regions, allowing the segmentation to succeed. For complex, very heterogeneous cases, a combination of normal and necrotic modes can be used to segment different dominant parts, which can be joined by assigning the same segmentation label (i.e., performing a logical OR operation).