A Geospatial Framework to Assess Fireline Effectiveness for Large Wildfires in the Western USA

Quantifying fireline effectiveness (FLE) is essential to evaluate the efficiency of large wildfire management strategies to foster institutional learning and improvement in fire management organizations. FLE performance metrics for incident-level evaluation have been developed and applied to a small set of wildfires, but there is a need to understand how widely they vary across incidents to progress towards targets or standards for performance evaluation. Recent efforts to archive spatially explicit fireline records from large wildfires facilitate the application of these metrics to a broad sample of wildfires in different environments. We evaluated fireline outcomes (burned over, held, not engaged) and analyzed incident-scale FLE for 33 large wildfires in the western USA from the 2017 and 2018 fire seasons. FLE performance metrics varied widely across wildfires and often aligned with factors that influence suppression strategy. We propose a performance evaluation framework based on both the held to engaged fireline ratio and the total fireline to perimeter ratio. These two metrics capture whether fireline was placed in locations with high probability of engaging with the wildfire and holding and the relative level of investment in containment compared to wildfire growth. We also identify future research directions to improve understanding of decision quality in a risk-based framework.


Introduction
Progress is being made in the USA towards risk-informed, safe, and efficient fire management, but limited suppression effectiveness data collection and analysis inhibit efforts to evaluate the quality of wildfire management decisions and the subsequent adaptive management of suppression practices [1][2][3][4][5]. Suppression effectiveness research covers a diverse set of management actions aimed at modifying physical fire processes across multiple spatial and temporal scales to contain fire and mitigate its negative effects [6,7]. Although many fine-scale fire processes and management interventions are best studied with controlled experiments, containment strategy evaluation necessarily relies on observational data from actual incidents. It is these data (capturing management decisions, suppression resource actions, fire behavior, and values threatened in space and time) that are lacking in quantity and quality within many fire organizations [6][7][8][9]. Despite these challenges, the increasing availability of spatially explicit fireline records provides the means to evaluate certain aspects of containment strategy performance using fireline effectiveness (FLE) metrics based on easily quantified measures of the length of final wildfire perimeter and how much fireline burned over, held, or did not engage with the wildfire [9,10]. FLE performance metrics are attractive because of their modest data and analysis requirements, but there is a need to expand their application beyond a small set of case studies to understand how they can be utilized to evaluate the effectiveness of incident-level strategies.
Early research decomposed suppression effectiveness into the study of fireline production rates and containment probability to support suppression models for determining resource needs and evaluating containment alternatives [11][12][13][14]. The bottom-up approach to determining resource needs for containment approximates reality on small wildfires but does not scale well to larger or longer duration incidents, especially during periods of extreme weather [15][16][17]. Initial attempts to evaluate incident-level performance for large wildfires were hampered by the lack of high-quality geospatial information on fire perimeters and containment actions; for example, previous studies relied on approximations of fire geometry to estimate perimeter lengths and used fireline production proxies in place of direct observations [15,16]. Thus, performance measurement was limited to coarse metrics, such as the Exposure Index (EI) [15], which is calculated as the ratio of fire perimeter length to the total productive fireline capacity of the available resources. Calkin et al. [15] found that EI averaged just under 11% for a large set of wildfires in the USA, which suggests that real-world resource efficiency is far lower than predicted by simple containment models (e.g., [13]). Unfortunately, they were unable to document the causes of this inefficiency due to the lack of precise information on containment actions; EI could be low because fireline may have burned over or not engaged with the fire, which relates more to strategic decisions on how resources are used than to the fireline productive capacity of the resources.
Evaluating the effectiveness of large wildfire suppression strategies is complicated by the multiple containment, protection, and safety objectives that vary through space and time on an incident [18]. Rapid containment is often the primary objective for wildfires burning near human values, but containment may be a lower priority for fires burning in remote locations, especially under conditions that allow fire to achieve land and resource management objectives [19]. Real or perceived potential for loss of high value resources and assets can motivate excessive risk taking by fire responders to engage in control tactics that are both ineffective and unsafe during periods of extreme fire weather and growth [20][21][22]. Fireline that is constructed hastily and/or situated in non-ideal locations for fire control can burn over, decreasing the realized containment efficiency of resources. When opportunities do not exist to safely engage in direct containment tactics, resources are often allocated to indirect tactics such as prepping containment lines far from the current wildfire perimeter. Indirect containment tactics may, in some circumstances, enhance safety and enable locating fireline where conditions increase control probability such as roads, non-timber fuel types, and mellow slopes [23][24][25][26][27][28]. However, fireline located too far from the current wildfire perimeter will not directly contribute to fire containment if it does not engage with the fire.
Geospatial analyses of incident-level FLE are now possible due to better recording and archiving of fireline locations and final wildfire perimeters. Fireline outcomes are easily assigned into categories of burned over, held, and not engaged using overlay analysis of fireline locations relative to the final wildfire extent and perimeter. Like EI, the total fireline length can be compared to the final wildfire perimeter length as a relative index of suppression intensity and strategy. Fireline production often far exceeds fire perimeter length [9], suggesting that resources are producing outputs on large wildfires that may not contribute directly to containment. Wildfires can also be rated on the proportion of total line that engaged with the fire and held to evaluate the quality of fireline placement decisions. High engagement is generally interpreted as an efficient outcome, as engaging with the fire is a necessary pre-condition for line to affect containment, but it may also be the result of excessive line production with low probability of success that eventually burns over. FLE performance metrics have been used at the incident level to evaluate the relative efficiency of containment in different fuel conditions and applied to a small number of fires in the USA to demonstrate the potential utility for rating incident-level performance [9,10]. While FLE metrics appear promising for rating incident-level performance, operationalizing them in an evaluative framework requires more understanding of how and why they vary across a range of incidents.
To increase understanding of FLE metric utility for incident-level performance evaluation, we compiled and analyzed fireline and perimeter records from 33 large wildfires in the western USA. We designed a set of geospatial analyses to address current data quality issues, assign fireline outcomes, and calculate FLE performance metrics by incident and fireline type. We first evaluated the utility of each FLE metric by contrasting environment and strategy characteristics of wildfires at the low and high ends of the distribution. Pairwise correlation analysis was then used to evaluate which performance metrics contain redundant or potentially complementary information on containment strategy effectiveness. We developed a performance evaluation framework based on both the held to engaged fireline ratio and the total fireline to perimeter ratio, which captures whether fireline was placed in locations with a high probability of success for the fire conditions and the relative level of investment in containment compared to fire growth. Aggregated measures of fireline effectiveness were also compared across fireline types to illustrate the potential future uses of these data to inform the efficient allocation of fire suppression resources.

Case Study Fires
We compiled spatial fireline and perimeter data for 33 large, western USA wildfires from the 2017 and 2018 fire seasons from the National Interagency Fire Center (NIFC) [29]. The fires were selected based primarily on the completeness and quality of fireline reporting rather than by random sampling of all large (>400 ha) fire incidents. Despite this filter, the wildfires included in the study cover a broad range of physical and social settings across nine states (Figure 1). The fires range in size from 13.3 to 1858 km². Wildfire environment and strategy information used in our qualitative evaluation came from Incident Status Summary Reports (ICS-209) [30] and spatial datasets of land ownership [31], land management designations [32], and population [33]. Most of the wildfires we investigated burned in forests. The primary fire behavior fuel models [34] assigned by incident managers were: 22 timber (litter and understory), four timber (grass and understory), two closed timber litter, two brush, one tall grass, one southern rough, and one chaparral. USDA Forest Service ownership by incident ranged from 9.1 to 100%. Only eight of the wildfires had more than 10% private ownership, but almost every fire (31 of 33) threatened more than ten human structures (range 0 to 15,300 structures threatened) and 19 fires evacuated more than ten people. Full suppression was the dominant strategy on most incidents (27 of 33), but 17 of the fires used some form of confinement, point/zone protection, or monitoring strategies on some days or portions of the fires. Given the range of ownerships, protection concerns, and suppression strategies, not all wildfires we examined are considered "extreme" [35].
The NIFC wildfire data used in this analysis represent the final fire extents as polygons, which were primarily mapped using aerial infrared imagery or mixed methods that include manual adjustment to perimeters derived from infrared imagery to reflect subsequent fire growth. For nine fires, we used the final fire extent as documented by the Geospatial Multi-Agency Coordination Center (GeoMAC) [36] because there was significant late-season growth that was not represented by NIFC. Both sources draw from the same primary incident-level data, so mapping resolution and accuracy are equivalent.
Fireline activities are represented by polyline features attributed with the fireline type in the feature category field ("FeatureCat") per standard operating procedures for wildfire geographic information [37]. Fireline type includes information related to construction method (e.g., "hand line," "dozer line," etc.), whether the fireline is based on a pre-existing feature (e.g., "road as completed line"), associated suppression tactics (e.g., "burnout"), and construction status ("completed," "planned," or "proposed"). Fireline data were collected in the field and managed by the incident Geographic Information System (GIS) analyst(s) using a variety of methods, leading to several data quality issues we addressed prior to analysis.

Data Cleaning
All wildfire extent and fireline data were re-projected to a common continental-scale projection (USGS Albers Equal Area Conic) and compiled by data type. Our goal with pre-analysis processing was to identify firelines that were constructed or enhanced in advance of fire arrival as opposed to mop up activities completed after fire growth slowed or ceased. This was accomplished with a combination of attribute query and spatial processing techniques detailed in the following paragraphs.
The first filtering step removed fireline types with construction status of "planned" or "proposed" or those attributed as "uncontrolled fire edge" to focus only on firelines that were implemented with actions on the ground. During initial data exploration, several erroneous fireline features with suspect geometry were identified with visual inspection and removed from analysis. Visual inspection of the data also revealed that the fireline type "completed line" was assigned to large portions of wildfire perimeters, often with highly irregular boundaries that do not correspond to logical locations to construct fireline. Our interpretation is that completed line was assigned primarily to locations where fire control was achieved with mop up instead of pre-arrival fireline construction. Mop up is a critical containment activity that is often time-consuming and expensive [7], but we excluded it from this analysis because the timeline and decision context differ considerably from proactive containment actions. We revisit the consequences of excluding completed line in the discussion.
Current fireline data collection and management often results in duplicate firelines, including segments with identical spatial and attribute data, multiple passes recorded along dozer or plow lines, overlapping ends of adjacent segments, and overlapping segments with different fireline types. We reduced the prevalence of duplicated fireline using two methods. First, duplicate firelines with identical spatial and attribute characteristics were reduced to single features using the delete identical tool in ArcGIS 10.3 [39] by fireline type, fire name, and spatial geometry. Second, we eliminated large segments of overlap by collapsing firelines with similar spatial location and geometry. This was accomplished by buffering the firelines by 50 m and intersecting these buffers to create polygons representing the potential zones of overlap. Based on visual inspection, we determined that overlap zones with less than 500 m perimeter usually represent shared line endpoints or intersections. We selected the overlap zones with greater than 500 m perimeter length for further analysis. We then clipped out all the firelines within a 25 m buffer around the large overlap zones. The ArcGIS 10.3 [39] integrate tool with a 30 m tolerance was used to reduce multiple firelines within these zones to common features with simplified geometry. We then attributed these integrated firelines with the fireline type of the combined features or "multiple methods" in the case where more than one fireline type was integrated. These methods reduce the prevalence of duplicates but do not eliminate them. The data cleaning methods used here are reasonable for broad-scale analysis of trends across the many fires included in the study, but a detailed analysis of a single fire would warrant manual fireline inspection and revision.
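The attribute filter and the first de-duplication method can be illustrated with a minimal Python sketch. This is not the ArcGIS workflow used in the study: the `Fireline` record, the exact attribute strings, and the function names are hypothetical, and the second method (buffering, intersecting, and integrating overlapping lines) is not shown because it requires a GIS library.

```python
from typing import NamedTuple


class Fireline(NamedTuple):
    """Hypothetical fireline record; field names are assumptions."""
    fire_name: str
    feature_cat: str   # fireline type, e.g. "Completed Dozer Line"
    coords: tuple      # polyline vertices ((x, y), ...) in projected metres


# Assumed attribute strings modelled on the types named in the text.
EXCLUDED_PREFIXES = ("planned", "proposed")
EXCLUDED_TYPES = {"uncontrolled fire edge", "completed line"}


def keep(line: Fireline) -> bool:
    """Keep only implemented firelines; drop planned/proposed line,
    uncontrolled fire edge, and generic 'completed line' (mop up)."""
    cat = line.feature_cat.strip().lower()
    return not cat.startswith(EXCLUDED_PREFIXES) and cat not in EXCLUDED_TYPES


def drop_exact_duplicates(lines):
    """Collapse records that share fire name, fireline type, and geometry."""
    seen = set()
    unique = []
    for line in lines:
        key = (line.fire_name, line.feature_cat, line.coords)
        if key not in seen:
            seen.add(key)
            unique.append(line)
    return unique
```

A typical call would be `drop_exact_duplicates([l for l in lines if keep(l)])`. Because the match is exact, near-duplicates such as repeated dozer passes survive this step, which is why the text describes a second, geometry-based method.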

Attributing Firelines with Outcomes
Fireline outcome was attributed as burned over, held, and not engaged with overlay analysis in ArcGIS 10.3 [39]. First, fire extent polygons were converted to polyline perimeters and then buffered by 50 m. Any portion of a fireline that intersected this buffer region was considered held. Portions of firelines that fell outside this buffer region but within the fire extent were considered burned over. Portions of firelines that fell outside this buffer region and outside the fire extent were considered not engaged. Figure 2 demonstrates the application of this framework to a single fire edge and three example incidents. The 50 m buffer around the fire perimeter was necessary to define held line due to the imprecision with both the perimeter and fireline locations. The 50 m buffer was chosen after testing distances of 25, 50, 75, and 100 m. For most fires, the choice of buffer distance did not substantially change the proportion of line classified as held, but the wider buffer distances did increase line held for fires with many firelines oriented perpendicular to the final perimeter, which is common as fire burns into developed areas.
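The outcome rules above can be approximated without GIS software by sampling points along each fireline and testing each sample against the fire extent. The sketch below is a simplified stand-in for the ArcGIS overlay, under stated assumptions: the fire extent is a single polygon ring without holes, coordinates are in projected metres, and the 10 m sampling step and all function names are illustrative choices rather than part of the study.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest distance from point p to segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_ring(p, ring):
    """Distance from p to the closed perimeter defined by ring vertices."""
    n = len(ring)
    return min(point_segment_distance(p, ring[i], ring[(i + 1) % n])
               for i in range(n))


def inside_ring(p, ring):
    """Ray-casting point-in-polygon test."""
    x, y = p
    inside = False
    n = len(ring)
    for i in range(n):
        (x1, y1), (x2, y2) = ring[i], ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def classify_point(p, ring, buffer_m=50.0):
    """Held if within the perimeter buffer; otherwise burned over
    (inside the extent) or not engaged (outside the extent)."""
    if distance_to_ring(p, ring) <= buffer_m:
        return "held"
    return "burned over" if inside_ring(p, ring) else "not engaged"


def outcome_lengths(fireline, ring, buffer_m=50.0, step_m=10.0):
    """Approximate length (m) of a polyline in each outcome class by
    classifying the midpoint of short pieces of each segment."""
    totals = {"held": 0.0, "burned over": 0.0, "not engaged": 0.0}
    for a, b in zip(fireline, fireline[1:]):
        seg_len = math.hypot(b[0] - a[0], b[1] - a[1])
        n = max(1, math.ceil(seg_len / step_m))
        piece = seg_len / n
        for k in range(n):
            t = (k + 0.5) / n
            mid = (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))
            totals[classify_point(mid, ring, buffer_m)] += piece
    return totals
```

For a line that crosses the perimeter at a right angle, only about twice the buffer distance is counted as held, whereas a line running along the perimeter is held over its full length.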

Performance Metrics of Fireline Effectiveness
We calculated fireline performance metrics from Thompson et al. [9,10] for each fire using the length of fire perimeter and the length of fireline that burned over, held, and did not engage. These metrics include:
• Tr: total fireline length to perimeter ratio. This ratio describes the relative suppression level on the fire. Lower Tr means less suppression effort devoted to building fireline, and higher Tr means more suppression. Tr of one means total fireline length matched the fire perimeter length, which would, in theory, lead to fire containment if perfectly planned and executed.
• Er: engaged to total fireline ratio. Engagement with fire is a pre-condition for fireline to affect fire containment. Engaged fireline in this study includes the burned over and held categories. Fireline that does not engage with fire may reflect a primary line that was never reached because fire growth was misjudged or an alternative/contingency line that is warranted to provide an extra layer of protection for high valued assets should primary line breach.
• HEr: held to engaged fireline ratio. This ratio describes the effectiveness of fireline that engaged with the fire. High HEr indicates fireline placement, construction specifications, and holding resources were sufficient to ensure a high probability of success, whereas low HEr indicates that much of the engaged fireline had a low probability of success for the conditions.
• HTr: held to total fireline ratio. This ratio describes the overall effectiveness of fireline considering both fireline engagement and whether it held. Using HTr as an effectiveness measure penalizes both line construction where it has little probability of success and excessive alternative/contingency line.
FLE performance metric calculations and data visualizations were performed with the R language for statistical computing and graphics [40].
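The four ratios follow directly from the perimeter length and the three outcome lengths. The study's calculations were done in R; the Python sketch below is only an illustration, with a hypothetical function name and made-up input lengths. It returns `nan` for a ratio whose denominator is zero, for example HEr on a fire with no engaged line.

```python
import math


def fle_metrics(perimeter_km, held_km, burned_over_km, not_engaged_km):
    """Fireline effectiveness ratios from outcome lengths (same units)."""
    engaged = held_km + burned_over_km
    total = engaged + not_engaged_km

    def ratio(num, den):
        return num / den if den > 0 else math.nan

    return {
        "Tr": ratio(total, perimeter_km),   # total fireline to perimeter
        "Er": ratio(engaged, total),        # engaged to total fireline
        "HEr": ratio(held_km, engaged),     # held to engaged fireline
        "HTr": ratio(held_km, total),       # held to total fireline
    }


# Hypothetical lengths, not data from the study.
example = fle_metrics(perimeter_km=100.0, held_km=30.0,
                      burned_over_km=20.0, not_engaged_km=50.0)
```

With these made-up lengths the total fireline equals the perimeter (Tr = 1), half of the line engaged (Er = 0.5), and 60% of the engaged line held (HEr = 0.6), so HTr = 0.3. This also shows that HTr equals the product of Er and HEr whenever some line engaged.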

Descriptive Statistics of Fire Characteristics and Performance Measures
The wildfires in this study cover a large range of sizes with varying levels of suppression and fireline outcomes. Wildfire size characteristics, fireline outcomes, and fireline performance metrics are presented for each fire in Table 1. Much of our qualitative analysis focuses on wildfire proximity to values at risk. As a coarse proxy for protection concerns, we divided the fires into frontcountry and backcountry groups based on having less than or greater than 50% area burned in USDA Forest Service wilderness and roadless areas. Fireline performance metric distributions are visualized for these two groups in Figure 3. In this section, we present a summary of each performance metric using example fires at the low and high ends of the distribution to provide context for how each metric relates to incident characteristics. It is important to recognize that the wildfire environment and strategy information we present is from coarse descriptions in the ICS-209 reports and post-hoc overlay analyses that describe only some of the factors driving fire suppression decisions.
The total fireline length to perimeter ratio, Tr, which we interpret as a relative index of suppression intensity, varied from 0.07 to 3.15. This means incident-level fireline production length varied from 7% to 315% of the respective fire perimeter length. Tr tended to be higher for wildfires in the frontcountry than those in the backcountry (mean 1.19 versus 0.52; Figure 3). The outliers on the high end include the 2017 Sunrise Fire in Montana and the 2018 Ferguson Fire in California, which both had substantial fireline length that burned over or did not engage with the fires. Both fires included long durations of active and extreme fire behavior that threatened structures and resulted in evacuations. The primary fuel type was timber for the Sunrise Fire and chaparral for the Ferguson Fire. The two fires with the lowest Tr were the 2017 Meyers Fire in Montana and the 2018 Maple Fire in Washington. Both fires were burning primarily in backcountry areas of public lands with few nearby values at risk. The suppression strategy on the Meyers Fire was classified as 100% full suppression, but the ICS-209 report mentioned employing a variety of direct, indirect, and point protection tactics that were concentrated in limited areas where the fire threatened structures. The suppression strategy on the Maple Fire was 20% full suppression and 80% monitor, commensurate with minimal and moderate fire behavior threatening few values.
The proportion of fireline that engaged with fire, Er, varied from 0 to 0.78 across incidents. Low Er suggests either a large investment in alternative and contingency lines or a lack of primary line engagement due to misjudged growth. The average Er was higher for fires in the frontcountry than fires in the backcountry (0.59 versus 0.40; Figure 3). The 2018 Monument Fire in Montana had zero fireline engagement in our analysis, but the western boundary of the fire was reported as "completed line" where roads and the transition from timber to grass fuels supported firing operations, mop up, and spot fire containment. The 13.7 km of fireline to control the eastern flank of the fire was not reached because weather moderated. Most of this fireline was road as completed line (70.8%) or dozer line (21.3%) because direct control was not practical due to both the limited values threatened and safety concerns in the heavy timber fuels and steep terrain. The 2017 Norse Peak Fire in Washington had the second-lowest Er at 0.06 and a similar context, with the fire primarily burning in wilderness and roadless areas of the National Forest with heavy timber fuels, and line construction focused on containing fire before reaching private lands. The suppression strategy started as confinement but transitioned to full suppression as the fire burned out of the wilderness. The five fires with Er above 0.7 burned in a variety of ecological and social settings across five states. The Lolo Peak, Milli, South Sugarloaf, and West Valley fires all threatened structures and/or caused evacuations and were managed with close to 100% full suppression strategies. The 2018 Buzzard Fire in New Mexico was unique for its 100% point protection strategy due to the limited assets threatened by the fire.
The proportion of engaged fireline that held, HEr, varied from 0.30 to 1.00 with a mean of 0.67 and a median of 0.70, indicating that most fireline in a position to engage with fire effectively contained it. HEr was slightly higher for backcountry than frontcountry fires (0.75 versus 0.62; Figure 3). Only seven fires had HEr less than 0.5. The 2017 Sunrise Fire in Montana, which had the highest Tr, also had the lowest HEr at 0.30. Fires with HEr less than 0.5 all had human populations in proximity to at least one front of the fire. It appears that burned over line was most common in these zones, possibly reflecting that firelines with low probability of holding were attempted because of the high value of avoided exposure of the adjacent populations on private lands. For example, this set of low performing fires includes the 2018 Mendocino Complex and Carr Fires in California that threatened a mix of rural and urban populations. Most low performing fires listed the primary fuel type as timber, but the primary fuel type was brush on the Carr Fire and the secondary fuel type on the Mendocino Complex Fire was chaparral. A common feature of these low performing fires was their high percentage of days with reported active or extreme fire behavior (45.9-68.6%) in the ICS-209 reports, which puts them all in the upper half of fires in this study. At the other end of the spectrum, perfect HEr was achieved on the 2018 West Valley Fire in Utah, which was burning almost entirely in wilderness and roadless areas and saw limited investment in hand line to check fire growth. The next three top performing fires were in similar settings in Utah, Montana, and Washington with low population and most area burned in remote public lands. The top four HEr fires reported fewer days of active or extreme fire behavior (16.1-46.7%) than most of the low performing fires.
HTr, which accounts for the held proportion of total line construction, varied from 0.00 to 0.75 with a mean of 0.33 and a median of 0.34. HTr was on average lower for backcountry than frontcountry fires (0.29 versus 0.37; Figure 3). Since HTr is the product of Er and HEr, incidents with low Er, low HEr, or both also had low HTr.
Tr is the total fireline length to perimeter ratio. Er is the engaged to total fireline ratio. HEr is the held to engaged fireline ratio. HTr is the held to total fireline ratio.

Combining Measures to Evaluate Incident-level Performance
As noted previously, some of the fireline performance metrics are related to each other due to a combination of their overlapping definitions (Er, HEr, and HTr) and their association with certain fire settings. Correlation analysis of the performance measures (Figure 4) confirms our qualitative evaluations that HTr is strongly related to Er (R = 0.76) and moderately related to HEr (R = 0.52). Most other pairwise comparisons indicate weak correlation of performance metrics. The one exception of note is a moderate negative correlation between Tr and HEr (R = -0.45), suggesting that high suppression fires are associated with lower fireline containment success. Tr was weakly correlated with both Er (R = 0.10) and HTr (R = -0.20), suggesting that the combination of Tr and one of these metrics would together describe unique characteristics of incident-level fireline effectiveness.
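A pairwise correlation screen of this kind can be sketched in a few lines of Python. The text does not state which correlation coefficient was used, so Pearson's r is an assumption here, and the function names and the layout of the input table are hypothetical.

```python
import math
from itertools import combinations


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def pairwise_correlations(metrics):
    """metrics maps a metric name (e.g. 'Tr') to one value per fire.
    Returns {(name_a, name_b): r} for every unordered pair."""
    return {(a, b): pearson_r(metrics[a], metrics[b])
            for a, b in combinations(metrics, 2)}
```

Calling `pairwise_correlations` with one list of per-fire values for each of Tr, Er, HEr, and HTr returns the six pairwise coefficients, from which weakly related pairs can be picked out as candidates for a combined rating.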
We propose a coarse rating system based on suppression level, as described by Tr, and total fireline effectiveness, as measured by HTr. Figure 5 illustrates how the 33 fires in this study are distributed in the two-dimensional space defined by these performance metrics. Tr was chosen as an indicator of suppression effort and cost. Our correlation analysis demonstrated that Er and HTr are least related to Tr and thus likely contain unique and complementary information. HTr was selected for this initial effort because it synthesizes both engagement and conditional effectiveness into one metric. Our qualitative evaluation of the incidents at the low and high ends of these metrics also indicated HTr grouped incidents with similar conditions, whereas there were few shared characteristics of fires with high Er.
The resulting scatterplot of incidents by Tr and HTr (Figure 5) shows that about half of all incidents have Tr ≤ 1 and HTr ≤ 0.5, which we define as the bounds of low and high suppression and low and high effectiveness, respectively. Ten fires fall within the zone Tr > 1 and HTr ≤ 0.5, which groups fires with high suppression and low effectiveness. Only five fires fall within the contrasting low suppression and high effectiveness zone defined by Tr ≤ 1 and HTr > 0.5. Although these thresholds are somewhat arbitrary, Tr of one is a logical breakpoint to identify fires with fireline production in excess of the fire perimeter length and HTr of 0.5 sets the expectation that at least half of the fireline constructed should engage with the fire and hold, thus contributing to containment. Note that none of the study fires fell into the high suppression (Tr > 1), high effectiveness (HTr > 0.5) category.
[...] Sugarloaf Fires were fast-growing, wind-driven fires in brush or tall grass that allowed for only limited containment with direct tactics until weather moderated. The ten fires in the high suppression, low effectiveness category included the previously discussed Ferguson, Mendocino Complex, and Carr Fires in California and the Sunrise Fire in Montana. Most other wildfires in this category were similar in size, burning conditions, and threat to structures that drove tactics towards moderate to high levels of fireline engagement. The exceptions were two smaller fires, Gold Hill in Montana and Whitewater in Oregon, which had substantial fireline built for their size but low engagement. Most of the unengaged fireline from Gold Hill was constructed dozer line, but fireline from Whitewater was primarily road as completed line prepped by masticating adjacent fuels. The incident reports for both these fires mention primarily confinement objectives, which appear to be met despite the low proportion of line that engaged with the fire and held. The remaining 18 fires in the low suppression, low effectiveness category were associated with a range of incident characteristics.
Fire 2020, 3, x FOR PEER REVIEW 12 of 19
Figure 5. Scatterplot of total fireline to perimeter ratio (Tr), as a measure of suppression level, and held to total fireline ratio (HTr), as a measure of effectiveness. The size of the circle corresponds to the perimeter length of the fire. Broad zones of suppression and effectiveness levels are delineated to categorize outcome quality. The low suppression, high effectiveness zone is desirable whereas the high suppression, low effectiveness zone is undesirable.
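The two-threshold rating described in this section can be written as a small classification function. The sketch below uses the breakpoints given in the text (Tr of 1 and HTr of 0.5); the function name, the label strings, and the example values are illustrative assumptions.

```python
def rate_incident(tr, htr, tr_break=1.0, htr_break=0.5):
    """Assign an incident to one of four suppression/effectiveness zones.
    Values equal to a breakpoint fall in the 'low' class, matching the
    Tr <= 1 and HTr <= 0.5 bounds used in the text."""
    suppression = "high" if tr > tr_break else "low"
    effectiveness = "high" if htr > htr_break else "low"
    return f"{suppression} suppression, {effectiveness} effectiveness"


# Hypothetical incidents, not values from Table 1.
print(rate_incident(tr=2.4, htr=0.25))  # high suppression, low effectiveness
print(rate_incident(tr=0.6, htr=0.70))  # low suppression, high effectiveness
```

Keeping the breakpoints as parameters makes it easy to test how sensitive the zone counts are to the thresholds, which the text acknowledges are somewhat arbitrary.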

The five fires in the low suppression, high effectiveness category were either smaller fires (<10,000 ha) burning in timber fuel types or larger fires in brush or tall grass fuel types. The three fires with timber fuel types (Maple, Trail Mountain, and West Valley) were primarily in backcountry areas of USDA Forest Service lands (49.1-100% wilderness and roadless). These incidents reported fewer days with high or extreme fire behavior (16.1-40% of days) than average. The Mesa and South Sugarloaf Fires were fast-growing, wind-driven fires in brush or tall grass that allowed for only limited containment with direct tactics until weather moderated. The ten fires in the high suppression, low effectiveness category included the previously discussed Ferguson, Mendocino Complex, and Carr Fires in California and the Sunrise Fire in Montana. Most other wildfires in this category were similar in size, burning conditions, and threat to structures, factors that drove tactics towards moderate to high levels of fireline engagement. The exceptions were two smaller fires (Gold Hill in Montana and Whitewater in Oregon), which had substantial fireline built for their size but low engagement. Most of the unengaged fireline from Gold Hill was constructed dozer line, but fireline from Whitewater was primarily road as completed line prepped by masticating adjacent fuels. The incident reports for both these fires mention primarily confinement objectives, which appear to have been met despite the low proportion of line that engaged with the fire and held. The remaining 18 fires in the low suppression, low effectiveness category were associated with a range of incident characteristics.

Performance Measures by Fireline Type
Our exploratory analysis of performance measures by fireline type focuses on measures of Er, HEr, and HTr aggregated across all fires (Table 2). It is important to recognize that some fireline types were rare in our data set and poorly distributed across incidents, and thus differences in effectiveness may be due to incident rather than fireline characteristics. A larger data set is needed for a multivariate analysis to control for incident-specific factors such as fire weather and fuels. Still, this initial exploration provides some insights on fireline type performance that can be gleaned from incident-level data. Dozer line accounted for most of the fireline length in this study (3441 km), followed by road as completed line (2056 km) and hand line (632 km). The remaining fireline types account for about seven percent of the total fireline length and were poorly distributed across fires.

Table 2. Fireline length, fireline outcomes, and fireline performance metrics aggregated across fires. The type "other" includes several rare and/or unattributed fireline types (unknown, explosive line, highlighted manmade feature, and ridge/geographic feature).

The proportion of fireline that engaged with fire, Er, varied from 0.15 to 0.89 across fireline types. Fireline types associated with defensive tactics such as burnouts and aerial retardant drops had the highest Er, followed by hand line. Only half of road as completed line engaged with fire, consistent with our observation that this fireline type was often employed far from the fire perimeters, likely as part of a contingency containment strategy. Dozer line had slightly higher Er than road as completed line at 0.55, suggesting similar uses.

HEr ranged from 0.16 to 0.85 across fireline types. As with Er, burnouts ranked most effective. Hand line held at higher proportions (0.76) than wider line types such as dozer line (0.53) and road as completed line (0.55); we suspect this is due to differences in fireline placement and fire behavior rather than fireline quality. Mapped aerial retardant drops held 64% of the time they engaged with fire, but we do not have much confidence in these results due to the small sample size. All the aerial retardant drop fireline came from the Sunrise Fire and further inspection revealed that much of the line that held was associated with the completed line category or roads, suggesting there were factors beyond the retardant drops that influenced containment.
Total fireline effectiveness, HTr, ranged from a low of 0.07 for the other category to 0.75 for burnouts. HTr was close to 0.5 for hand line, multiple methods, and aerial retardant drop. Dozer line and road as completed line had nearly identical HTr with less than 30% of the total length engaging with and holding fire.
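If HTr is the held share of all line and HEr is the held share of engaged line, then HTr should equal Er times HEr. The short check below applies that identity to by-type values quoted in the text (Er of 0.89 and 0.82 for burnouts and retardant drops, 0.55 for dozer line; HEr of 0.85, 0.64, and 0.53). Pairing the maximum HEr of 0.85 with burnouts is our reading of the text, not a value copied from Table 2, and the inputs are already rounded to two decimals.

```python
# (Er, HEr) pairs quoted in the text, rounded to two decimals.
reported = {
    "burnout": (0.89, 0.85),
    "aerial retardant drop": (0.82, 0.64),
    "dozer line": (0.55, 0.53),
}

# Implied HTr under the identity HTr = Er * HEr.
implied_htr = {name: er * her for name, (er, her) in reported.items()}

for name, value in implied_htr.items():
    print(f"{name}: implied HTr = {value:.2f}")
```

The implied values fall close to the HTr figures described above (about 0.75 for burnouts, close to 0.5 for aerial retardant drops, and below 0.30 for dozer line); small gaps are expected because the inputs are rounded.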

Overview and General Synthesis
We found that incident-level performance metrics varied widely across a diverse set of wildfires and generally aligned with situational factors that influence suppression strategy. The performance metrics each speak to different aspects of FLE, and thus all have merit to address specific effectiveness questions. However, we found that Er and HEr were highly or moderately correlated with HTr (Figure 4), suggesting these related metrics often contain duplicative information on fireline performance. The combination of Tr and HTr appears promising as a simple framework to stratify incidents by suppression effort and efficiency (Figure 5). Several incidents with low Tr and high HTr provide positive examples of managing wildfires with limited but highly efficient suppression actions. Incidents with high Tr and low HTr may be undesirable because of the substantial firefighting costs and risks to personnel and the inefficiency of their actions at achieving containment outcomes. These incidents could be reviewed in detail to diagnose the root cause(s) in order to adapt management of future wildfires. For example, managers could investigate what fireline types burned over in relation to fuel type, topography, weather, and fire behavior, or what information and strategy decisions led to excessive line production. These findings could be communicated internally with meetings or formal documents, and externally with after action reviews or briefing documents for incoming incident management teams.
Our results document that FLE performance metrics vary widely across large wildfires in relation to both characteristics of the fire environment and containment strategies. First, the study fires covered a wide gradient in Tr from 0.07 to 3.15, which suggests we captured incidents with a variety of suppression strategies ranging from limited perimeter control to extensive direct and indirect fireline construction. Most fires (69.7%) had Tr < 1, indicating less than full perimeter control, as many of the study fires were burning at least partially within remote areas of public lands with limited protection concerns and control opportunities. Fires with Tr ≥ 1 occurred in a variety of conditions, but most directly threatened communities; two of the fires with Tr > 2 were burning into densely populated areas of California and the fire with the highest Tr was threatening rural communities in Montana along a major highway. The average proportion of fireline that engaged with fire (Er) was 0.5 and the maximum was just under 0.80, which indicates that substantial resources on large fires are devoted to indirect control efforts that are likely part of alternative or contingency control plans.
An overly narrow interpretation of our Er results is that half of all fireline on large fires is ineffective. We caution against this interpretation without further analysis to understand if direct control efforts that would lead to higher fireline engagement were feasible, practical, safe, or cost-effective; we return to this point later in the discussion. Further, there is reason to believe that building redundancy into containment efforts may improve overall performance as part of broader risk management strategies [41][42][43]. A bright point was that most fireline that engaged with fire held (67%), suggesting that excessive production of ineffective fireline is rare. Incidents with HEr below 0.5 were most often burning under extreme weather conditions and into populated areas where the reward for containment was high. The proportion of total line that held (HTr) had a mean of only 0.33 and a maximum of 0.75. While the causes of low HTr are varied and should be interpreted in the context of the management objectives for each wildfire, it is a potential concern that an average of two thirds of fireline produced does not contribute directly to containment.
While it may seem optimal for all fireline to engage with fire, a variety of factors may justify investments in alternative or contingency lines that do not engage. In this study, we did not quantify the cost of fireline construction or improvement, but we suspect that many of the least proximal lines to the fire perimeters represent lower cost actions such as improving existing roads or constructing dozer line in ideal terrain and vegetation. In addition to potentially costing less than constructing similar barriers in non-ideal conditions closer to the fire, these indirect control efforts should lower operational risks to firefighters and allow greater flexibility to situate fireline in locations with higher probability of control such as along roads, streams, and fuel transitions [17,[25][26][27]. We did not evaluate weather conditions for these incidents in detail, but the 2017 and 2018 fire seasons were typified by extremes [44], and many of the incident reports mention dry fuels and high winds driving active and extreme fire behavior that limit the ability for suppression resources to engage in direct containment tactics [35,45]. Therefore, it is possible that indirect fireline construction may have been the only option to make use of the available resources on some fires. We also observed that many of the wildfires were burning primarily in remote public lands and only threatened human values along a single front of the fire. Prepping high probability control features only where the fire presents a risk to important values may facilitate a less aggressive approach to fire management that reduces other costs and operational risks even if these firelines do not engage with the fire. Given that, on average, only 50% of fireline engages with fire, future research should focus on understanding the motivations for indirect fireline construction and how it influences the decision space for other containment measures. An empirical statistical approach, similar to what others have done for suppression resource use and expenditures [46][47][48][49][50][51], may be a useful direction to help refine what levels of Er are appropriate for different social and ecological settings.
In this study, we excluded the fireline type completed line after observing that it was primarily applied to highly irregular boundaries of the final fire perimeter. Our interpretation is that this line type was most often assigned where post-arrival mop up facilitated control, but we also saw evidence that it was used where firing and holding operations were conducted along fuel transitions and roads. We believe it is warranted to remove these opportunistic activities in the context of evaluating the quality of planned containment actions, but it has implications for several of our performance measures. Including completed line would increase the value of all our performance metrics, which would be interpreted as improving Er, HEr, and HTr outcomes and worsening Tr outcomes. The average values of Er, HEr, and HTr increase to 0.65, 0.79, and 0.52, respectively, with completed line. Including completed line would increase the number of fires with Tr ≥ 1 from 10 to 17 and increase the maximum observed value from 3.1 to 3.6. The important caveat for performance improvements in Er, HEr, and HTr is that they are often conditioned on moderating fire behavior [20,21]. Including containment actions during these periods provides a better perspective of the full effort and outcomes of containment on the incident, but resulting performance measures may provide false confidence about probability of success during the period(s) of rapid growth that can account for most area burned by large wildfires.
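A small numerical sketch shows why adding completed line raises all four ratios at once. It assumes completed line would be counted as held (engaged and held), which is our reading of the text rather than a stated rule; the function name and all lengths are hypothetical.

```python
def metrics(held, burned_over, not_engaged, perimeter):
    """Return (Tr, Er, HEr, HTr) from fireline lengths in km."""
    total = held + burned_over + not_engaged
    engaged = held + burned_over
    return (total / perimeter, engaged / total, held / engaged, held / total)


# Hypothetical incident without completed line.
base = metrics(held=30.0, burned_over=20.0, not_engaged=50.0, perimeter=80.0)

# Assumption: completed line is added to the held class only.
completed_km = 40.0
with_completed = metrics(held=30.0 + completed_km, burned_over=20.0,
                         not_engaged=50.0, perimeter=80.0)

for name, before, after in zip(("Tr", "Er", "HEr", "HTr"), base, with_completed):
    print(f"{name}: {before:.2f} -> {after:.2f}")
```

Because the added length enters the numerator and the denominator of Er, HEr, and HTr together, each ratio moves towards one, while Tr rises because the perimeter is unchanged.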
Many of the previous discussion points highlight that the incident is not always the best spatial and temporal scale to evaluate the quality of containment actions. For example, viewing mop up as an end of incident task does not consider that it may also represent a planned containment action for the subsequent burn periods with forecasted harsher weather conditions. Similar limitations apply to line classified as burned over relative to the final fire perimeter. It is possible that burned over line held during the period it first engaged with the fire, but was breached as fire behavior intensified during later burn periods, or it held as intended, but was engulfed when fire spread around it. The incident-level FLE metrics presented here do not account for the substantial variation in the fire environment and containment objectives that can occur on large fires that burn over multiple days, weeks, or months. Geospatial analysis of FLE outcomes on a daily or sub-daily basis would help to constrain the timeline of fireline engagement to better understand the fire environment factors associated with outcomes by fireline type (Table 2). Our exploratory analysis suggests improvements are needed in both the frequency of fire extent mapping and fireline date and time attributes. For example, many fires are not mapped every day, and mapping sometimes focuses on only the most actively burning portion of the fire. Similarly, fireline date and time attributes can be incomplete, inaccurate, or in conflict with nearby features. We also know little about the type and intensity of tactics (e.g., burnout or holding operations) that influence probability of successful containment at these features [52]. Improved data collection and management of fire behavior on incidents [8], fire perimeters, fireline, and resource use could facilitate the type of rich analyses proposed by Plucinski [7]. Legislation in the USA has mandated improved collection of some operational data that could facilitate this research in the future [53].
The exploratory analysis we presented illustrates how FLE metrics by fireline type could increase understanding of how different resources contribute to containment and provide an empirical basis for estimating probability of success. Such efforts would benefit from a larger sample size and data on fire behavior and weather, fuel types, and tactics to control for the variation in these factors across burn periods or incidents. Results for Er largely confirm our subjective interpretations that dozer line and roads are often part of contingency strategies, as close to 50% of these line types did not engage with fires. In contrast, approximately two thirds of hand line engaged with fire and even higher engagement was achieved with burnouts (89%) and retardant drops (82%). Surprisingly, HEr values for dozer line and roads were just above 0.50, lower than for all other fireline types save for the other category. Dozer line and road as completed line accounted for the vast majority of line recorded on fires (83.2% of length), so it may be possible that the other line types were poorly represented, or these other line types were selectively applied in conditions where probability of success was high. Fireline width should increase containment probability [14,54], so there are likely confounding factors influencing the effectiveness of dozer line and roads such as slope position or exposure to extreme fire behavior with spotting. The substantial proportion of dozer line that does not engage with fire is concerning given the negative environmental impacts and associated costs to rehabilitate disturbed areas. The limited observations of burnouts suggest suppression firing is highly effective, but we suspect these activities were under-reported. Suppression firing will achieve high levels of fireline engagement, but there are sometimes negative impacts associated with the extra area burned [55] and not all potential control locations are conducive to suppression firing.

Data Quality and Analysis Limitations
Current fireline data collection and management processes are prone to duplicating fireline. The primary issues we encountered were multiple passes recorded on constructed line types, replicated identical features, and features with similar spatial geometry. Detailed review of fireline data may be possible for a study of a single fire [10], but it is infeasible for a large dataset due to time constraints and limited information available on some fires to evaluate data accuracy. Our data cleaning focused on automated rulesets to reduce the prevalence of duplicated line so as not to artificially inflate estimates of fireline production or outcomes. However, it is possible that overlapping firelines sometimes do represent unique containment actions that should be accounted for separately to accurately gauge containment effort; for example, it is possible that hand line could later be widened with a dozer. Duplicate firelines were most prevalent on high suppression incidents that were actively managed for long periods, likely due to a combination of transitioning data management between incident management teams and the greater opportunity afforded by time to inventory, assess, and reinforce firelines.
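The kind of automated rule described above can be illustrated with a simple duplicate filter. This is not the ruleset used in the study: it is a hypothetical sketch that treats two features as duplicates when they share a fireline type and their vertices match after rounding, regardless of digitizing direction. The feature layout (`type`, `coords`) and the rounding tolerance are assumptions.

```python
def canonical(coords, ndigits=0):
    """Round vertices and ignore digitizing direction."""
    pts = tuple((round(x, ndigits), round(y, ndigits)) for x, y in coords)
    return min(pts, pts[::-1])


def drop_duplicates(features, ndigits=0):
    """Keep the first feature for each (type, rounded geometry) key."""
    seen = set()
    kept = []
    for feat in features:
        # Type is part of the key, so hand line later widened by a dozer
        # along the same route is retained as a separate action.
        key = (feat["type"], canonical(feat["coords"], ndigits))
        if key not in seen:
            seen.add(key)
            kept.append(feat)
    return kept


# Hypothetical features with coordinates in metres.
features = [
    {"type": "dozer line", "coords": [(0.0, 0.0), (100.0, 0.0)]},
    {"type": "dozer line", "coords": [(100.2, 0.1), (0.3, 0.2)]},  # second pass, reversed
    {"type": "hand line", "coords": [(0.0, 0.0), (100.0, 0.0)]},
]
cleaned = drop_duplicates(features)
print([f["type"] for f in cleaned])
```

Rounding is a crude tolerance because nearby vertices can fall on either side of a rounding boundary; a production workflow would more likely compare geometries with a GIS library.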
As noted in the methods section, there is some uncertainty in the spatial location of firelines and fire perimeters that can influence fireline outcome assignments as burned over, held, or not engaged. Our sensitivity analysis revealed that the choice of buffer distance around the fire perimeter to define held line did not meaningfully affect FLE performance measures for most incidents. For a detailed review of fewer incidents, post-fire aerial imagery could be used to refine fire perimeter and fireline locations and to assess the spatial accuracy of the incident data. A potentially larger source of uncertainty is from undocumented fireline. Incident status summary reports could be queried to identify differences between reported activities and those recorded in the spatial fireline data. In some cases, it may be possible to identify and digitize missing firelines from post-fire aerial imagery. Post-fire imagery could also be used to validate fireline type and to collect additional attributes such as width, although evidence of suppression activities may not be well preserved in areas burned at moderate or high severity or if the imagery is not timely.
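A buffer sensitivity check of this kind can be sketched with a deliberately simplified geometry. The example below assumes a circular fire centred at the origin and classifies fireline sample points as held when they lie within the buffer distance of the perimeter, burned over when they lie farther inside, and not engaged when they lie farther outside. The classification rule, the circular perimeter, and all coordinates are illustrative assumptions, not the procedure from the methods section.

```python
import math


def classify_point(x, y, fire_radius, buffer_m):
    """Classify one fireline sample point against a circular perimeter."""
    signed = math.hypot(x, y) - fire_radius  # negative inside the burned area
    if abs(signed) <= buffer_m:
        return "held"
    return "burned over" if signed < 0 else "not engaged"


def outcome_shares(points, fire_radius, buffer_m):
    """Share of sample points in each outcome class."""
    counts = {"held": 0, "burned over": 0, "not engaged": 0}
    for x, y in points:
        counts[classify_point(x, y, fire_radius, buffer_m)] += 1
    return {k: v / len(points) for k, v in counts.items()}


# Hypothetical sample points (metres) around a fire of radius 1000 m.
points = [(1010, 0), (0, 1045), (900, 0), (0, -1300), (-995, 0), (700, 700)]
shares_by_buffer = {b: outcome_shares(points, 1000, b) for b in (10, 30, 60)}
for b, shares in shares_by_buffer.items():
    print(b, shares)
```

Repeating the classification over a range of buffer distances and comparing the resulting shares is one way to judge whether the metrics are stable with respect to positional uncertainty.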
Our focus on fireline outcomes in simple categories of burned over, held, and not engaged covers the core functions of firelines to engage with and hold wildfire, but this framework does not capture all fireline benefits. For example, fireline that ultimately burned over may have functioned to temporarily check fire growth. Slop overs near the final wildfire perimeter are classified as burned over in our framework, even though these firelines were likely important for staging resources for containment at nearby locations. Similarly, fireline that was near the final fire perimeter but did not engage with it may have played a similar role. Higher spatial and temporal resolution recording of fire extents and suppression tactics is needed to fully understand how fireline contributes to control in the busy divisions of fire perimeters. The difficulty of modeling the avoided impacts from containment also remains a barrier to progressing FLE evaluation towards a cost-benefit framework to judge the quality of decisions based on more than fireline outcomes [4].

Management Implications
FLE performance metrics are immediately relevant for coarse-scale filtering of incidents to identify fire conditions and suppression strategies associated with efficient or inefficient outcomes. Our proposed rating framework based on Tr and HTr is a simple, intuitive, and objective way to classify incident performance. Implementing it would require greater attention to defining the range of values that constitute acceptable or poor performance for each metric and if these standards should vary by incident characteristics (e.g., fire size and behavior, social and ecological settings, resource limitations, or safety concerns). After identifying low performing fires, Er and HEr can diagnose whether inefficiencies stem from overinvestment in contingency fireline or excessive production of fireline with low probability of holding. There are likely lessons to be learned from scrutinizing the fireline that did not engage or that burned over on these incidents. For example, managers can ask whether it was reasonable to expect the fire to reach contingency lines given the forecasted weather at the time of the decision and whether firelines were appropriately situated to protect high value assets. They should also examine what sources of information were used to estimate fire growth and if it is possible to reduce uncertainty in these predictions with additional data or models. Similarly, managers can examine characteristics of the burned over fireline (e.g., construction method, width, slope position, fuel types on the transmitting and receiving sides of the line) and, when possible, the associated weather conditions, fire behaviors, and suppression activities associated with the breach to identify conditions under which future firelines are unlikely to be successful. Ideally, future work will build from the FLE performance metrics described here to support monitoring, feedback, and learning within the fire management community. Better monitoring and data archiving of wildfire conditions and suppression operations are critical to advance in this direction. So too is aligning design of performance evaluation systems with advances in fire analytics that support operationally relevant planning [5,24,25,27,28,56,57].