Assessing the assumptions of classification agreement, accuracy, and predictable healing time of sea lamprey wounds on lake trout

Sea lamprey control in the Laurentian Great Lakes relies on records of sea lamprey wounds on lake trout to assess whether control efforts are supporting fisheries management targets. Wounding records have been maintained for 70 years under the assumption that they are a reliable and accurate reflection of sea lamprey damage inflicted on fish populations. However, two key assumptions underpinning the use of these data need thorough evaluation: sea lamprey wounds follow a predictable healing progression, and individuals classify wounds accurately and reliably. To assess these assumptions, we conducted a workshop where experienced professionals examined lake trout with known sea lamprey wounds. For most lake trout, pictures were taken at regular intervals during the healing process. Our evaluation of wound pictures found high variability in healing times and wound progressions that did not conform to the currently used classification system. Participants’ wound classification agreement and accuracy were low and misclassification rates were high for most wound types. Training provided during the workshops did not markedly improve these metrics. We assessed wound classification accuracy for the first time and found assumptions of high accuracy and agreement are not met. We recommend misclassification rates be incorporated into models using wound data, sensitivity analyses be conducted to assess the potential impact of wound misclassification on estimates of key metrics (such as sea lamprey-induced mortality for lake trout), and alternative biomarkers be developed to quantify wound status with greater accuracy and precision. © 2020 International Association for Great Lakes Research. Published by Elsevier B.V. All rights reserved.


Introduction
Records of wounds (commonly called marks) on lake trout (Salvelinus namaycush) resulting from sea lamprey (Petromyzon marinus) parasitism have been used to inform fisheries management and sea lamprey control in the Laurentian Great Lakes for over 70 years (Eshenroder and Koonce, 1984). Through a coordinated effort involving multiple agencies in the Great Lakes basin, sea lamprey wound data are collected annually and aggregated for many different uses. Wounding data have been used for estimating sea lamprey-induced mortality of target fish species (Lantry et al., 2015; Schneider et al., 1996; Sitar et al., 1999), evaluating the success of the sea lamprey control program (Adams et al., 2003; Rutter and Bence, 2003), allocating resources for sea lamprey control (Irwin et al., 2012; Koonce et al., 2004), and setting fish community targets (Horns et al., 2003). Given these applications, it is important to periodically assess the effectiveness of the standardized wound classification protocol to ensure wounds are classified accurately and reliably.
The procedures for collecting and aggregating sea lamprey wound data on lake trout have changed over time to meet shifting application and data quality needs. Initially, wound data were recorded as the total number of wounded lake trout and the average number of wounds per fish from sporadic netting efforts and creel censuses such as those in the South Bay of Lake Huron (Budd et al., 1969). Information about the size and latency of the wounds was sometimes included, but there was no standardized reporting procedure (Eshenroder and Koonce, 1984; Pycha and King, 1975). Concerns regarding the uniformity of wound data collection and the lack of clarity in descriptions of the character and age of wounds prompted the development of the King (1980) classification system with the goal of standardizing the assessment and recording of wound data. This system classified sea lamprey wounds as either type A or type B with four stages of wound healing (I-IV) (Fig. 1). A type-A wound is recorded when the skin is broken, exposing the underlying musculature, and a type-B wound is recorded when the wound site is abraded but there is no visible evidence of broken skin. The stage of a wound varies from a very recent wound (stage I) to a nearly fully healed wound (stage IV) (King, 1980). For example, the most severe wound would be classified as A-I, showing exposed musculature and recent sea lamprey detachment, and the least severe wound would be classified as B-IV, showing a completely healed wound with regenerated scales. Following the development of the classification system, Eshenroder and Koonce (1984) published a report suggesting only large A-I through A-III wounds be reported. Type-A wounds were thought to be more reflective of host mortality and easier to distinguish from type-B wounds. Completely healed stage IV wounds were assumed to be caused by a previous cohort of sea lamprey (Eshenroder and Koonce, 1984).
The most recent guide (Ebener et al., 2006) incorporated findings from a series of workshops to revise guidelines for reporting multiple wounds, sliding wounds, and wound size in a further effort to improve wound classification agreement among different agencies and field crews. Currently, sea lamprey wound records are used for a variety of applications, with the number of A-I through A-III wounds recorded during sampling efforts as the primary observation. The wounding data are then used to estimate lake-wide lake trout wounding rates (also known as marking rates) that guide sea lamprey control efforts, as well as sea lamprey-induced mortalities based on area-specific wounding rates. More recently, Great Lakes fisheries managers have become interested in wounding rates on other species (e.g., lake whitefish (Coregonus clupeaformis)) to quantify impacts on other populations and characterize how availability of other hosts affects lake trout wounding rates.
The use of wound data for fisheries management and research questions relies on two key assumptions. The first assumption is that sea lamprey wounds follow a predictable healing progression, transitioning sequentially from stage I to stage IV within the initial wound type. Most current applications that use sea lamprey wound data rely only on A-I through A-III wounds in an attempt to capture recent wounding by a single cohort of sea lamprey. For example, type-A wounds occurring in the late summer and fall are expected to remain as identifiable A-I through A-III wounds in the following spring surveys. If wound healing time is highly variable, some fast-healing wounds may progress to A-IV before spring surveys begin, while other slow-healing wounds from previous cohorts may still be present as A-I through A-III wounds. If a sizable proportion of wounds follow healing progressions that result in switching wound types (e.g., type-B to type-A), wound records may not be accurate reflections of the true state of sea lamprey wounding. The second assumption is that staff from a variety of different agencies are able to classify wounds accurately and reliably. Given that many different state, provincial, tribal, and federal agencies are responsible for collecting sea lamprey wound data, the methods used to assess and record this information, as well as the skill level of individual assessors, must be uniform. Inconsistent approaches to wound classification or variation in ability to assess wounds could result in over- or under-reporting of wound rates. Furthermore, discrepancies in wound healing progression, as highlighted in the first assumption, will make the accurate and reliable classification of sea lamprey wounds more difficult, as the key characteristics used to classify wounds may be obscured or difficult to identify.
The assumption that wounds follow a predictable healing progression and healing time lacks strong evidence. Studies that assessed wound healing times found considerable variation that could confound the ability to identify individual sea lamprey cohorts (Lantry et al., 2015; Nowicki, 2008; Schneider et al., 1996). At water temperatures that lake trout experience in the Great Lakes, a substantial proportion of wounds that occur in late summer and fall would heal to stage IV before spring surveys (Ebener et al., 2003). Nowicki (2008) observed several instances of wounds changing type (from type-B to type-A) or following unexpected progressions (from A-IV to A-III) during the healing process. In field studies that assessed seasonal trends in wound rates during trawl and gill net surveys, little-to-no correlation was found between early stage wounds in early months and later stage wounds in later months (e.g., A-I wounds in July did not correlate with A-II wounds in September), suggesting discrepancies in wound healing progression or seasonal changes in survival of wounded fish (Lantry et al., 2015; Schneider et al., 1996). The relationship between healing time and water temperature may also result in a greater number of recorded wounds for benthic-oriented lake trout that spend more time in cooler water where wounds heal more slowly. Finally, during our ongoing study assessing the sub-lethal effects of sea lamprey parasitism on lake trout, we noticed that many wounds did not appear to follow the healing progression outlined by the King (1980) classification system. Often, wounds classified as type-B immediately after sea lamprey detachment appeared to follow a healing progression that would likely lead to identification as type-A after 8-12 months of healing (Fig. 2). Despite known wound classification inconsistencies and the potential for non-conforming wound healing progression, the King (1980) system is currently the most frequently used classification scheme.
Several lines of evidence suggest that wound misclassification is occurring. Following a series of workshops, Ebener et al. (2003) found high variability among individuals and agencies when classifying sea lamprey wounds on lake trout. Counts of A-I through A-III wounds sometimes varied three-fold despite individuals classifying the same group of lake trout. Although training during the workshops somewhat improved overall wound classification agreement, it remained poor for most wound types, with some types having lower observer agreement following training. Observer agreement also varied considerably by wound type. Workshops conducted in the mid-2000s also observed poor wound classification agreement (Nowicki, 2008). The results from these workshops provide evidence that the assumption of high wound classification agreement among individuals and agencies may not be met. If wounds are consistently misclassified, estimates of sea lamprey damage and sea lamprey-induced mortality (used in fishery catch-at-age models) may not be accurate.
Fig. 1. Examples of lake trout with sea lamprey wounds for each wound type (A and B) and stage (I-IV) in the King (1980) classification system.
The last revision to the wound classification system guidelines was published over a decade ago (Ebener et al., 2006), and the extent of wound classification inaccuracy and disagreement since then is not well characterized. The key objectives of this study are to quantify observer accuracy and agreement as well as the error associated with wound misclassification rates and overall wound detection rates, evaluate the efficacy of a workshop at improving wound classification agreement and accuracy, highlight wound types and locations that are particularly challenging to identify and classify, estimate the healing time between wound stages, and assess the degree to which wound healing progression conforms to the assumptions of the current classification system. Although similar workshops and studies have been conducted in the past, our study was the first to use fish with known wound histories, thereby permitting estimation of classification accuracy.

Fish
During October and November 2018, 24 twelve-year-old siscowet and lean lake trout reared and held at the University of Wisconsin-Stevens Point Northern Aquaculture Demonstration Facility (UWSP NADF) were parasitized in a laboratory setting by juvenile sea lamprey collected from Lake Superior. Hatchery lake trout have been used in previous studies (Goetz et al., 2010, 2014; Smith et al., 2016) and display physiological and morphological characteristics similar to those of their wild lake trout parents (Goetz et al., 2010). All sea lamprey used were actively parasitic and collected from lake trout hosts in the summer and early autumn of 2018 by commercial fishing operations. Lake trout used in the study weighed from 2.19 to 5.14 kg. Lake trout were removed from their raceways and individually placed in separate 1000 L tanks (7-7.6°C), each containing one sea lamprey. Each tank was regularly monitored during the day for sea lamprey attachment. Once attached, sea lamprey were allowed to feed for four days, after which they were removed to prevent high lake trout mortality rates; preliminary observations suggest parasitism events lasting longer than four days have a high likelihood of killing the host. Following parasitism, wounds were immediately classified as A-I or B-I using the Ebener et al. (2006) classification guidelines, and pictures of the wound site on each lake trout were taken. The lake trout were returned to the raceways and allowed to heal at water temperatures ranging from 7.0 to 7.6°C. Wounds were classified, and pictures of the wound sites were taken, every week following parasitism until the start of the workshop to monitor wound healing progression. An additional group of 11 lake trout unexposed to sea lamprey were set aside for workshop participants to classify as well, marking the first time unwounded fish have been included in a wound classification workshop.
Fish were euthanized with an overdose of tricaine methane sulfonate (MS-222) following Michigan State University and University of Wisconsin-Stevens Point approved IACUC (Institutional Animal Care and Use Committee) protocols the day of the workshop to ensure good specimen quality. To supplement the lake trout with known wound history, 16 more freshly wounded lake trout collected during spring field surveys by the Red Cliff Fisheries Department and the Wisconsin Department of Natural Resources were also provided. Although the wound history of fish collected during field surveys was unknown, the wound type and stage for each was classified by two experts prior to use in the workshops to serve as a benchmark following guidelines from Engelhard (1996).

Wound healing time
The time required for a wound to heal to the next stage (e.g., time for an A-I wound to heal to an A-II wound) was assessed using pictures and records taken on a weekly basis for each fish following parasitism. For each fish, the number of days elapsed before transition to the next stage was recorded. A Weibull distribution was fitted to the healing time between each stage for both wound types using maximum likelihood estimation with the fitdistrplus package (Delignette-Muller and Dutang, 2015) in R 3.6.1 (R Core Team, 2019). The mean and standard deviation of the Weibull distribution for time spent in each stage were calculated for each wound type (type-A and type-B). As wounds were only examined on a weekly basis, healing times are approximate. Wounds that resulted in fish mortality (n = 5) were not included.
ical Survey, and Wisconsin Department of Natural Resources attended. Most attendees were part of field assessment crews or had previous experience with wound classification, but three reported having no prior field experience with sea lamprey wound classification.
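The Weibull step of the healing-time analysis can be illustrated with a short, dependency-free sketch. It is not the fitdistrplus routine used in the study: the bisection solver, the function names, and the example values below are all assumptions made for illustration, and the days are hypothetical rather than study data. The sketch fits the shape and scale by maximum likelihood and then converts them to the mean and standard deviation.

```python
import math

def fit_weibull_mle(times, lo=0.05, hi=50.0, iters=200):
    """Return (shape, scale) of a two-parameter Weibull fitted by maximum likelihood."""
    logs = [math.log(t) for t in times]
    mean_log = sum(logs) / len(logs)

    def score(k):
        # Profile-likelihood equation for the shape k; its root is the MLE.
        xk = [t ** k for t in times]
        weighted = sum(x * l for x, l in zip(xk, logs)) / sum(xk)
        return weighted - 1.0 / k - mean_log

    # Bisection works because score(k) increases monotonically in k.
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            hi = mid
        else:
            lo = mid
    shape = 0.5 * (lo + hi)
    scale = (sum(t ** shape for t in times) / len(times)) ** (1.0 / shape)
    return shape, scale

def weibull_mean_sd(shape, scale):
    """Mean and standard deviation implied by Weibull shape and scale."""
    g1 = math.gamma(1 + 1 / shape)
    g2 = math.gamma(1 + 2 / shape)
    return scale * g1, scale * math.sqrt(g2 - g1 ** 2)

# Hypothetical days-to-next-stage values on a weekly grid (NOT study data).
days = [7, 7, 14, 14, 14, 7, 21, 14, 7, 14]
shape, scale = fit_weibull_mle(days)
mean, sd = weibull_mean_sd(shape, scale)
print(f"shape={shape:.2f} scale={scale:.2f} mean={mean:.1f} sd={sd:.1f}")
```

Because observations fall on a weekly grid, a fit of this kind treats each recorded day as exact; an interval-censored likelihood would be a more faithful alternative, but the text above does not indicate that one was used.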

Workshop
The workshop was structured as two separate wound classification trials: one on the first day soon after the participants arrived and one on the second day following debriefing, performance assessment, and additional training. For each trial, participants were presented with a series of lake trout (25 for trial 1, 22 for trial 2) and asked to identify and classify the wounds present on the fish (if any). During the first trial, participants were asked to classify wounds using the procedures they were currently using in the field. During the second trial, participants were asked to incorporate what they had learned during the performance assessment and training when classifying fish. For both trials, participants were not informed whether each lake trout was wounded or not, and no discussion between participants was permitted. To more closely simulate field conditions, the participants were limited to 90 s per fish to identify any wounds (if present) and record their classification. Participants were also asked to record the location of each wound and indicate whether the wound would be recorded in their agency's wound survey data.
The performance assessment and training were both designed to refresh participants on the wound classification procedure, highlight wounds that are often difficult to classify, and allow participants to discuss potential causes of variation in classification. Following the first wound classification trial, participants were given presentations about the wound classification system, wounds that are difficult to identify or classify, and how sea lamprey wound data are used to inform fisheries management in the Great Lakes. Participants were also given a hands-on demonstration of how to classify wounds on several fish. Photos of sea lamprey wounds were presented and participants were asked to discuss with the group which classification they would give each fish and why. Before the second trial, participants were also shown the results of the first trial accompanied by pictures of the initial wound and the subsequent pictures of the wound as it healed (Electronic Supplementary Material (ESM) Appendix S1). Following the second wound classification trial, participants were split into three discussion groups. Each group was asked to discuss: 1) what aspects of sea lamprey wound identification most surprised them; 2) problems with wound classification; and 3) how wound classification and the system as a whole could be improved. After discussion within the groups, one person from each group was asked to present their findings to everyone. Key discussion points and findings were summarized. Results from the second wound classification trial were compiled and sent to participants via email after the workshop (ESM Appendix S2). Notes from the group discussions are included in ESM Appendix S3. Participants were also given a post-workshop survey where the usefulness of the workshop and general comments were recorded (ESM Appendix S4).

Agreement, accuracy, and misclassification statistics
Gwet's First-Order Agreement Coefficient (AC1) calculated with the R package ragree (Redd, 2019) was used to assess the chance-corrected agreement among participants classifying sea lamprey wounds (Gwet, 2008). Agreement was assessed overall for all wounds as well as broken down by wound type and wound stage (stage I-III and stage IV). AC1 values less than 0.20 were considered poor agreement, 0.21-0.40 were fair, 0.41-0.60 were moderate, 0.61-0.80 were substantial, and 0.81-1.0 were considered almost perfect agreement (Landis and Koch, 1977). For comparison purposes with previous studies, an AC1 value of 0.4 was considered the minimum satisfactory level of agreement.
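For readers unfamiliar with the coefficient, the following sketch computes a multi-rater AC1 from first principles and maps it onto the verbal categories listed above. It is an independent illustration of the commonly stated formula (observed pairwise agreement corrected by a chance term built from the average category shares), not the ragree implementation; the function names and the ratings are hypothetical.

```python
from collections import Counter

def gwet_ac1(ratings, categories):
    """Gwet's AC1 for several raters.

    ratings: one list per fish holding each rater's label for that fish.
    categories: every label a rater could have chosen.
    """
    q = len(categories)
    n = len(ratings)
    agreement = []                                # per-fish pairwise agreement
    mean_share = dict.fromkeys(categories, 0.0)   # average share of each label
    for labels in ratings:
        r = len(labels)
        counts = Counter(labels)
        for c in categories:
            mean_share[c] += counts[c] / r / n
        if r >= 2:
            pairs = sum(v * (v - 1) for v in counts.values())
            agreement.append(pairs / (r * (r - 1)))
    pa = sum(agreement) / len(agreement)          # observed agreement
    pe = sum(p * (1 - p) for p in mean_share.values()) / (q - 1)  # chance term
    return (pa - pe) / (1 - pe)

def agreement_label(ac1):
    """Verbal category using the cut-offs given in the text."""
    for upper, label in [(0.20, "poor"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if ac1 <= upper:
            return label
    return "almost perfect"

# Hypothetical classifications of three fish by four raters (NOT study data).
labels = ["none", "A-I", "A-II", "B-I"]
example = [["A-I", "A-I", "A-II", "B-I"],
           ["none", "none", "none", "none"],
           ["A-II", "A-I", "A-II", "A-II"]]
value = gwet_ac1(example, labels)
print(round(value, 2), agreement_label(value))
```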
Wound classification accuracy was assessed by comparing participant classifications to benchmarks obtained via discussion and consensus of two expert panelists following guidelines from Engelhard (1996). Accuracy was assessed as the percentage of participants who correctly classified both the wound type (type-A or type-B) and wound stage (I-IV) as indicated by the benchmark classifications. Because we were also interested in the ability of participants to distinguish between type-A and type-B wounds, the percentage of participants who correctly classified the wound type (regardless of stage) was also recorded. For fish with multiple wounds, the classification was only considered correct if the participant correctly classified all wounds present. Classifications from fish with multiple wounds were not included when reporting summarized accuracy as it was not possible to determine which wound(s) were incorrectly classified. Because counts of A-I through A-III wounds are often aggregated for lake-wide wounding rate estimations, the percentage of A-I through A-III wounds that were classified within the aggregated A-I through A-III category (even if not exactly correct) was also calculated.
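The three accuracy measures described above can be expressed compactly for a single-wound fish. The sketch below is illustrative only: the label format ("A-II", "none"), the function names, and the responses are assumptions, not the study's data or scripts.

```python
RECENT_A = {"A-I", "A-II", "A-III"}   # aggregated category used for wounding rates

def wound_type(label):
    """'A-II' -> 'A'; an unwounded label such as 'none' is returned unchanged."""
    return label.split("-")[0]

def accuracy_summary(benchmark, responses):
    """Percentages of participants matching the benchmark at three resolutions."""
    n = len(responses)
    exact = sum(r == benchmark for r in responses)
    same_type = sum(wound_type(r) == wound_type(benchmark) for r in responses)
    in_recent_a = sum(r in RECENT_A for r in responses)
    return {
        "type_and_stage_pct": 100 * exact / n,      # correct type and stage
        "type_only_pct": 100 * same_type / n,       # correct type, any stage
        "within_a1_a3_pct": 100 * in_recent_a / n,  # placed in A-I through A-III
    }

# Hypothetical responses from five participants for a benchmark A-II wound.
summary = accuracy_summary("A-II", ["A-II", "A-I", "A-III", "B-II", "none"])
print(summary)
```

In this hypothetical case only one participant in five is exactly correct, yet three in five place the wound in the aggregated A-I through A-III category, which is why the aggregated measure is reported separately.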
To quantify misclassification rates, wound classification data from both trials were combined. Accuracy and agreement were consistent between trials, so pooling among trials was justified. Wound classification data from fish with multiple wounds were removed from misclassification rate estimates as it was impossible to determine which individual wound was classified by the participant. For each wound type and stage, participants' responses were tabulated to display the percentage of correct and incorrect classifications. Incorrect classifications were further subdivided into the specific misclassified response.
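The tabulation described above amounts to a row-normalized confusion table. A minimal sketch, assuming each pooled observation is stored as a (benchmark, response) pair; the function name and the records are hypothetical.

```python
from collections import Counter, defaultdict

def misclassification_table(records):
    """Map each benchmark class to the percentage of responses in each class."""
    counts = defaultdict(Counter)
    for benchmark, response in records:
        counts[benchmark][response] += 1
    table = {}
    for benchmark, responses in counts.items():
        total = sum(responses.values())
        table[benchmark] = {resp: 100 * k / total for resp, k in responses.items()}
    return table

# Hypothetical pooled records from both trials (NOT study data).
records = [("A-II", "A-II"), ("A-II", "A-I"), ("A-II", "A-II"), ("A-II", "A-III"),
           ("B-IV", "none"), ("B-IV", "B-IV")]
table = misclassification_table(records)
for benchmark, row in table.items():
    print(benchmark, row)
```

Each row sums to 100%, so the diagonal entry is the accuracy for that class and the off-diagonal entries are the specific misclassification rates.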

Wound healing time
Healing time from stage I to stage II was similar for both type-A and type-B wounds (Fig. 3). The time for an A-I wound to heal to an A-II wound was 11 ± 3 days (mean ± standard deviation), and the mean healing time for B-I to B-II was 9 ± 3 days. Progression from stage II to stage III was considerably more variable than from stage I to stage II for both wound types (Fig. 3). Healing to stage III took approximately half as long on average for type-A wounds (32 ± 12 days) as for type-B wounds (68 ± 33 days). Similarly, healing time from stage III to stage IV was shorter for type-A wounds (45 ± 26 days) than for type-B wounds (64 ± 20 days). Overall healing time from stage I to stage IV ranged from 10 to 133 days for type-A wounds (mean 96 ± 15 days).
Although wounds that resulted in lake trout mortality were not included in wound healing time analysis, two lethal wounds followed uncharacteristic healing progressions. In one instance, a lake trout received multiple type-B wounds from a sea lamprey. The wounds initially appeared mild but during the healing process the wound sites became inflamed and necrotic, ultimately leading to the death of the fish after 21 days (Fig. 4). Three other instances of a type-B wound resulting in lake trout mortality were observed, but these followed expected type-B wound healing progressions.

Wound classification agreement
For the first wound classification trial, overall agreement among reviewers was "fair" (AC1 = 0.36) (Table 1). Agreement varied by wound type and stage. Unwounded fish had the highest classification agreement (AC1 = 0.79), and fish with multiple wounds had the lowest classification agreement (AC1 = 0.15). A-I through A-III wounds had only "slight" agreement (AC1 = 0.15). During trial 1, type-B wounds had greater classification agreement than type-A wounds, and earlier stage wounds (I-III) had lower classification agreement than late stage wounds (IV) (Table 1). With the exception of type-B wounds (Z-test, z = 1.59, p = 0.06), agreement was statistically greater than expected by chance (p < 0.05). Classification agreement was also below the 0.4 threshold for all categories except unwounded fish.
Overall classification agreement improved slightly for trial 2 (AC1 = 0.37), but agreement among reviewers remained "fair". Despite the slight improvement in overall agreement, the improvements were inconsistent across wound types. Although agreement was higher in trial 2 for type-A, stage I-III, and stage IV wounds, it was lower for type-B, unwounded, and fish with multiple wounds (Table 1). Agreement was also higher in trial 2 for A-I through A-III wounds (AC1 = 0.32), but was more variable than in trial 1. Agreement among observers was statistically greater than chance alone (p < 0.05) for all categories with the exception of fish with multiple wounds (Z-test, z = 1.24, p = 0.11) and A-I through A-III wounds (Z-test, z = 1.34, p = 0.09). Classification agreement remained below the 0.4 threshold for all categories except unwounded fish.
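The p-values reported with the Z-tests in both trials are consistent with an upper-tail (one-sided) test against chance agreement. That reading is an inference from the reported numbers rather than something stated in the text, and the function name below is hypothetical; the sketch simply converts each reported z statistic to its upper-tail probability under a standard normal distribution.

```python
import math

def upper_tail_p(z):
    """P(Z >= z) for a standard normal variable (one-sided p-value)."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# z statistics reported for type-B wounds (trial 1), multiple wounds (trial 2),
# and A-I through A-III wounds (trial 2).
for z in (1.59, 1.24, 1.34):
    print(z, round(upper_tail_p(z), 2))
```

A two-sided test would roughly double these probabilities, so the reported values of 0.06, 0.11, and 0.09 match the one-sided interpretation.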

Wound classification accuracy
In the first trial, lake trout wounds were correctly classified 28% of the time (Table 2). To break this down further, unwounded fish had the highest classification accuracy (89%), and fish with multiple wounds had the lowest classification accuracy (2%). Participants' classification accuracy did not vary by wound type, but was higher for early stage wounds (stage I-III) than late stage wounds (Table 2). On a coarser scale, participants identified the correct wound type (regardless of stage) 52% of the time. Type-A wounds were correctly classified as type-A 57% of the time, and type-B wounds were correctly classified as type-B 49% of the time. Stage I-III wounds were easier to classify to wound type than stage IV wounds (67% and 25%, respectively; Table 2). Fish with multiple wounds had all wounds correctly identified to wound type 5% of the time. Participants classified A-I through A-III wounded fish within the A-I through A-III category 67% of the time in trial 1, and non-A-I through A-III fish (unwounded, A-IV, and B-I through B-IV) were classified in the A-I through A-III category 5% of the time.
Fig. 4. A lake trout wound that was classified as B-I following sea lamprey detachment. Over 21 days, the wound became more severe, ultimately resulting in mortality.
Overall classification accuracy improved slightly in the second trial, with 29% of wounds being correctly classified to wound type and stage (Table 2). Unwounded fish continued to have the highest classification accuracy, but accuracy declined relative to the first trial (to 69%). Accuracy classifying fish with multiple wounds improved to 12%. Type-A and type-B classification accuracy remained similar to trial 1. Accuracy improved slightly over trial 1 for stage I-III wounds. A-I through A-III wounds were more accurately classified in trial 2 (53%). Despite slight improvements in classifying wounds to both type and stage, on a coarser scale, ability to classify the correct wound type (regardless of stage) was worse overall (47%). However, accuracy to wound type improved for all stage I-III wounds and for A-I through A-III wounds (80% and 83%, respectively) (Table 2). In trial 2, participants classified A-I through A-III wounded fish within the aggregated A-I through A-III category 81% of the time, and non-A-I through A-III fish (unwounded, A-IV, and B-I through B-IV) were classified in the A-I through A-III category 7% of the time.

Misclassification rates
For most wound types and stages, the majority of misclassifications were off by only one stage. For example, A-II wounds were correctly classified 44% of the time, but were misclassified as A-I 17% of the time and as A-III 12% of the time (Table 3). Although wounds going undetected were relatively infrequent for wounds in stage I-III, both A-IV and B-IV wounds were highly likely to be missed and classified as unwounded (64% and 49% respectively). Type-B wounds appeared to be frequently misclassified as type-A wounds at later stages of healing. For example, B-II wounds were classified as A-III or A-IV wounds 34% of the time, and B-III wounds were classified as A-III or A-IV wounds 30% of the time. Participants appeared to distinguish early stage A wounds (I-III) from late stage A wounds (IV) with reasonable success. A-I through A-III wounds were classified as A-IV wounds fewer than 10% of the time (Table 3). Note that sample sizes were small for some wound types.

Group discussion
When groups were asked to discuss which aspects of wound classification surprised them the most after seeing the results from trial 1, several themes were commonly expressed (ESM Appendix S3). Multiple groups mentioned having difficulty identifying wounds in unexpected locations such as on fin rays or the operculum. Wounds in unexpected locations were discussed in detail following trial 1, and some participants noted seeing wounds in these locations fairly frequently during field surveys. However, some participants mentioned that knowledge of wounds in unexpected locations may have led them to be more likely to classify a fish as wounded during trial 2, even if no wound was present. Each group also indicated that wound healing progressions where type-B wounds transition into type-A wounds after a skin sloughing event were surprising. The quick healing times of some wounds and the high level of disagreement between classifications of type-A and type-B wounds were also unexpected.
Groups were asked to identify any problems they were encountering with the current wound classification system. One concern mentioned was that most field crews do not have sufficient time to thoroughly assess a fish for sea lamprey wounds which may increase the proportion that are missed. The inherent subjectivity in the wound classification process was also identified as a potential problem for reliable and accurate wound records and was highlighted by the variability in wound classification. Often during field surveys, multiple people will examine a wound and come to a consensus which may reduce variability. Participants also mentioned that the perceived importance of A-I through A-III wounds may result in more attention being paid to those wounds when found in the field. As a result, fewer A-IV or B-I through B-IV wounds may be recorded, and misclassification rates may be higher as an artifact of less time being spent assessing these wounds.
Not all of the recommendations in the most recent guide (Ebener et al., 2006) are universally followed. For example, the reporting guidelines state that wound size should be recorded so sea lamprey cohorts can be separated (Ebener et al., 2006). Larval sea lamprey typically spend a number of years growing in stream sediment before metamorphosing and migrating to a Great Lake during the fall (Hanson and Swink, 1989; Manion and Smith, 1978), although outmigration has been observed throughout the year (Applegate and Brynildson, 1952). The parasitic juveniles then feed on fish for the next 12-18 months, after which they stop feeding and switch energy allocation to spawning, at which point they are considered adults. During April through June, adults are sexually mature and seek out a tributary in which to spawn, and subsequently die (Nowicki, 2008). Thus, two cohorts of sea lamprey are present in the lake at a given time. The intent of the classification guide was to omit smaller wounds (less than 20 mm) associated with recently out-migrated juveniles as they are unlikely to cause damage to fish stocks (Ebener et al., 2006). Agency adherence to this guidance is unknown; currently we are aware of only one agency (Ontario Ministry of Natural Resources and Forestry on Lake Huron) recording wound sizes.
Each group was also asked to discuss potential ways wound classification could be improved going forward. All groups identified improving and standardizing wound classification training. Ideas included requiring an online quiz each season prior to field work that must be passed before an individual is authorized to classify wounds, and holding regular "hands-on" workshops. Such approaches may reduce the likelihood of improper techniques or practices being passed down to newly hired staff. Our post-workshop survey indicated participants generally found value in this type of workshop (ESM Appendix S4).
Table 2. Percentage of wounds correctly classified by workshop participants before (trial 1) and after (trial 2) training. Classifications from fish with multiple wounds were not included in other categories as it was not possible to determine which wound(s) were incorrectly identified.

Discussion
Assumptions of consistent wound healing time and progression, high classification agreement among reviewers, and high reviewer accuracy are likely not being met. Wound healing times varied considerably and some wounds did not follow expected healing progressions. Classification agreement was below the minimum threshold for all wound types with the exception of unwounded fish. Reviewer accuracy was also generally low, though A-I through A-III wound classification accuracy did improve following training. The implications of and potential solutions to these issues vary and are discussed in further detail below.

Wound healing time
The wound classification system relies on the assumption that as a wound heals, it will follow a predictable healing pattern transitioning sequentially from stage I to stage IV within the initial wound type. Additionally, it is assumed that the time required to heal from one stage to the next is consistent enough that wounds can be attributed to different cohorts of sea lamprey based on the healing stage. However, studies that have assessed wound healing time have found considerable variation that could influence the ability to separate cohorts (Nowicki, 2008). Variation in healing time was high enough in wild-caught lake trout that Nowicki (2008) concluded wound classification schemes should not be used as an indicator of time since wounding or of the health of the host fish. Our results provide some support for these previous findings. Healing times in our study did vary, but the variation was stage dependent. Although healing times from stage I to stage II were fairly consistent for both type-A and type-B wounds, healing to later stages had high variation. Despite this variability, all healing times we observed were relatively rapid compared to the assumption that wounds occurring in autumn will remain as A-I to A-III wounds in spring. For the type-A wounds we monitored in our study, nearly all of them would have transitioned to A-IV wounds prior to spring surveys and therefore would not be included in A-I through A-III wounding statistics if they were classified in the field.
Many factors can influence wound healing times. Healing times are known to change with water temperature, which could contribute to the rapid healing we observed. Wounded lake trout in this study were allowed to heal at consistent temperatures of 7-7.6°C, and had similar healing times to those reported at 10°C. Wounds occurring on fins or the operculum also appeared to heal much more rapidly than other wounds (ESM Appendix S1 and S2), which may reduce the likelihood of detection for these wounds. Wounds in such locations have been observed leading to mortality and sub-lethal effects on lake trout (Firkus, unpublished results), so detection of these wounds is still important. Further work quantifying the healing times of wounds on different morphotypes of lake trout from different lakes and water temperatures would be beneficial to understand the implications of healing time for wound records. Regardless, our results add support to previous findings that observed healing times are likely problematic for the assumption that A-I through A-III wounds capture the activity of the most recent cohort of parasitic sea lamprey (Ebener et al., 2003; Nowicki, 2008).
In addition to healing time variation, observations of wounds following healing progressions that do not conform to the classification system may challenge wound data assumptions. Wounds similar to the ones shown in Figs. 2 and 3 show an increase in severity as they heal, either with the wound changing from type-B to type-A or with a type-B wound leading to mortality. Other studies have documented similar findings, either as wound classifications progressing from a later to earlier stage (Nowicki, 2008), or as "sloughing B-wounds" where tissue around a type-B wound will slough off exposing underlying musculature and taking on the appearance of a type-A wound (Ebener et al., 2006). Additionally, four type-B wounds resulted in lake trout mortality during this study. Current use of sea lamprey wound records only considers A-I through A-III wounds under the premise that type-B wounds do not contribute significantly to host mortality (Ebener et al., 2006; Eshenroder and Koonce, 1984). Although it is likely that type-A wounds result in lake trout mortality more frequently, the assumption that type-B wounds do not inflict mortality may not be valid. Adams et al. (This issue) explored this in simulations, and found that increasing the type-B lethality rate from 0 to 24% of the type-A lethality rate (the maximum observed by Swink 2003) did not significantly change the relation between observed wounding rates and underlying true attack rates.
Table 3 Comparison of known wound classification with the classifications of workshop participants. The percentage of participants correctly classifying each wound type and stage is presented in bold.
Participant                Known wound classification
classifications (%)   A-I   A-II   A-III   A-IV   B-I   B-II   B-III   B-IV   Unwounded
A-I                    36     17       2      1     -      0       0      0           0
A-II                   21     44       9      3     -      0       0      4           0
A-III                  14     12      62      3     -     26      10      2           1
A-IV                    0      3       9      7     -      8      20      7           2
B-I                     0      2       4      1     -     11       0      2           1
B-II                    0      7       2      1     -     34      10      3           0
B-III                  14      3

Classification agreement
High wound classification agreement amongst reviewers is an important assumption of the use of sea lamprey wound data. Unreliable and inconsistent classification by individual assessors and field crews could skew sea lamprey damage and fish population estimates. Wound records could also be spatially inconsistent if wound classification varies considerably among field crews covering different geographical areas of the Great Lakes. Likewise, wound records across years could be influenced by low classification consistency, especially if consistency changes over time due to new employees or adoption of new techniques and guidelines. Consistent with our findings, previous studies have found relatively poor classification consistency and agreement, both among agencies and individuals assessing the same lake trout (Nowicki, 2008). In a prior study, researchers found that even following training, there was a two-fold difference in wounding rate records among agencies, and a four-to-five-fold difference among individual observers assessing the same fish. Later workshops where participants classified pictures of wounds also found low observer agreement (Nowicki, 2008). Although overall agreement in our study was greater than expected by chance, it was only in the "fair" category both before and after training (Landis and Koch, 1977). Only unwounded fish had classification agreement that exceeded the 0.4 AC1 threshold. Thus, the assumption of high classification agreement among individuals was not met.
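To make the agreement statistic concrete, the following is a minimal sketch of how a multi-rater AC1 coefficient (the chance-corrected agreement measure usually attributed to Gwet) could be computed from raw classifications. It is an illustration under stated assumptions rather than the analysis code used in this study: the function name `gwet_ac1`, the example labels, and the example ratings are all hypothetical, and every fish is assumed to have been classified by at least one participant using only the listed categories.

```python
from collections import Counter


def gwet_ac1(ratings, categories):
    """AC1 agreement coefficient for several raters.

    `ratings` holds one list of labels per fish (one label per rater).
    `categories` lists every label a rater may assign.
    """
    n = len(ratings)
    q = len(categories)
    pi = {k: 0.0 for k in categories}  # mean share of raters choosing k
    agree_terms = []                   # per-fish pairwise agreement

    for labels in ratings:
        counts = Counter(labels)
        r_i = len(labels)
        for k in categories:
            pi[k] += counts[k] / r_i / n
        if r_i >= 2:  # agreement is only defined with two or more raters
            pairs_agreeing = sum(c * (c - 1) for c in counts.values())
            agree_terms.append(pairs_agreeing / (r_i * (r_i - 1)))

    p_a = sum(agree_terms) / len(agree_terms)             # observed agreement
    p_e = sum(p * (1 - p) for p in pi.values()) / (q - 1)  # chance agreement
    return (p_a - p_e) / (1 - p_e)


# Hypothetical ratings: four fish, three raters each.
example = [
    ["A-I", "A-I", "A-II"],
    ["Unwounded", "Unwounded", "Unwounded"],
    ["A-II", "B-II", "A-III"],
    ["A-III", "A-III", "A-III"],
]
labels = ["A-I", "A-II", "A-III", "B-II", "Unwounded"]
print(round(gwet_ac1(example, labels), 3))
```

A value computed this way could then be compared against the 0.4 threshold referred to above; computing it separately for each known wound type would give the per-type agreement values.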
In our study, classification agreement varied by wound type and stage. Classifications of unwounded fish had the highest observer agreement (moderate-to-substantial) suggesting there is little confusion between observers when no wounds are present. Unsurprisingly, agreement was lowest for fish with multiple wounds. Not only is it more difficult to find multiple wounds on a fish, but when they are found, there will be inherently more disagreement by virtue of having more than one wound to classify. Agreement was not consistently higher for type-A or type-B wounds, but early stage wounds (I-III) had consistently lower agreement than stage IV wounds. Part of the reason for higher agreement for stage IV wounds could be attributed to the high misclassification frequency of fish with stage IV wounds as unwounded fish (Table 3). If a large proportion of observers classify a stage IV wounded fish as unwounded, agreement would still be high despite poor accuracy.
Classification of sea lamprey wounds is an inherently subjective process, so some degree of inconsistency and disagreement among reviewers and agencies will always be present. However, it is likely that not all of the inconsistency is due to the inherent subjectivity of the classification system. During the group discussions, some participants mentioned that they did not feel training for new hires was sufficient. Currently, there is no coordinated training program available for new hires working on biological crews that assess sea lamprey wounds on fish. As a result, trainees may receive different information depending on the experience of their co-workers and the guidance materials provided. Additionally, wound classification guidelines have been updated several times since originally published, and therefore it may be difficult for fisheries managers and field crew leads to identify the most up-to-date wound classification guide. As a consequence, field crews may be basing their classification practices on different iterations of the wound classification guidelines which could contribute to low consistency and agreement.

Classification accuracy
Fish with known wound histories created a unique opportunity to assess the accuracy of sea lamprey wound classification. We were able to compare participants' classifications with pictures of the wound's healing progression and expert benchmarks to determine if their classifications accurately reflected the known wound type and stage of healing. Previous studies have recorded classification agreement, but classification accuracy has not been previously documented. Accurate wound classification is a critical assumption for the use of wound data. When estimating sea lamprey damage, managers require wound records from the current year's cohort of sea lamprey. To obtain these, generally only records of A-I through A-III wounds are used under the premise that type-B wounds do not contribute significantly to host mortality and stage IV wounds are the result of a previous cohort of sea lamprey no longer present in the lake (Ebener et al., 2006; Eshenroder and Koonce, 1984). The best practice is to record all wounds to allow for adjustments to be made if accuracy is low to the degree that A-I through A-III wounds cannot be distinguished from A-IV or B wounds, and to inform other applications that require consideration of all wound types.
Although the accuracy of specific wound classifications has not been investigated previously, findings of low classification agreement among individual assessors indicate a high rate of wound misclassification. The present study found that overall accuracy for all wound types was low both before and after training. Before training, only 28% of wounded fish were correctly classified to both wound type and stage. Following training, 29% were correctly classified. Such low accuracy rates may help explain the discrepancies in records of A-I through A-III wounds observed in other workshops. Accuracy for stage I-III wounds was generally higher than for stage IV wounds (Table 2) with a large proportion of participants misclassifying stage IV wounded fish as unwounded (Table 3). A-IV wounds were also more frequently classified as B-IV wounds than they were A-type wounds (Table 3). Although misclassifying stage IV wounds as the incorrect wound type or as unwounded fish would not have consequences for the current method of estimating sea lamprey damage, it should be accounted for in applications that require all wound stages. Currently, A-I through A-III wounds are aggregated when used to estimate sea lamprey damage, so it is not necessarily critical that wound classifications are correct for both wound type and stage. A-I through A-III wounds were correctly classified as A-I through A-III wounds 81% of the time following training, which suggests that these estimates may be reasonably reliable when assessors have been trained. However, pre-training accuracy within the A-I through A-III category, which likely better represents current accuracy rates, was only 67%. Furthermore, the finding that type-B wounds are commonly misclassified as type-A at all stages (Table 3) could have implications for sea lamprey damage estimates, as counts of A-I through A-III wounds would be inflated and sea lamprey-induced mortality would be overestimated. 
The degree of classification accuracy necessary for informing management decisions is unknown, but the assumption that wound classification accuracy is high may not be met, particularly when accuracy to wound type and stage is required.
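The distinction between accuracy to exact wound type and stage and accuracy within the pooled A-I through A-III group can be illustrated with a short sketch. The code below is a hypothetical example, not the procedure used in this study: the helper names `pool_target` and `accuracy` and the list of (known, assigned) classifications are invented for illustration only.

```python
TARGET = {"A-I", "A-II", "A-III"}


def pool_target(label):
    """Collapse A-I, A-II and A-III into one group; keep other labels as-is."""
    return "A-I to A-III" if label in TARGET else label


def accuracy(pairs, pool=lambda label: label):
    """Share of (known, assigned) pairs that match after applying `pool`."""
    hits = sum(pool(known) == pool(assigned) for known, assigned in pairs)
    return hits / len(pairs)


# Hypothetical classifications of fish whose known wound is A-I to A-III.
pairs = [
    ("A-I", "A-II"),     # wrong stage, still inside the pooled group
    ("A-II", "A-II"),    # exact match
    ("A-III", "A-I"),    # wrong stage, still inside the pooled group
    ("A-I", "B-I"),      # wrong type, outside the pooled group
    ("A-II", "A-III"),   # wrong stage, still inside the pooled group
]

exact = accuracy(pairs)                 # correct to wound type and stage
pooled = accuracy(pairs, pool_target)   # correct to the A-I to A-III group
print(f"exact: {exact:.2f}, pooled: {pooled:.2f}")
```

In this toy data set most errors are stage errors within the pooled group, so pooled accuracy is much higher than exact accuracy, which mirrors the pattern described above.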
The low accuracy and high rates of misclassification observed during this workshop have a number of potential causes. One factor that likely contributes to wounded fish being classified as unwounded fish is the presence of difficult-to-detect wounds. During the group discussions, wound location and visibility were identified as potential factors that could influence classification accuracy. Participants also mentioned that many wounds heal in a manner that makes them difficult to classify accurately. Type-B wounds in which damaged skin sloughed off, exposing the underlying musculature, were noted as being particularly problematic and may contribute to difficulties with accurately classifying wounds. Varying degrees of severity within each wound type and stage also likely contribute to low accuracy. Small type-A wounds may not leave obvious characteristics indicative of a type-A wound for assessors to identify after the wound has begun healing. Likewise, large type-A wounds may make identification of the stage of healing difficult due to inconsistent healing of the entire wound surface.

Potential solutions
Although the results of this study suggest low wound classification agreement and accuracy among observers, there are several steps that can be taken to improve these metrics. One suggestion that was mentioned and supported during group discussions was to increase and standardize wound classification training for field crews tasked with wound classification surveys. Despite low wound classification agreement and accuracy suggesting that further training is necessary, there is little evidence that single-event workshops improve these metrics. In this study, wound classification agreement and accuracy only improved marginally following training. We did see improvement in the classification of A-I through A-III wounds during our workshop, but variability in agreement and accuracy was high. Other workshops similarly observed little or inconsistent improvement in classification agreement following training (Nowicki, 2008). Although there is little evidence that wound classification workshops improve wound classification agreement and accuracy, it does not mean that holding regular standardized training would not be beneficial. The group discussions indicated that there were a variety of approaches to handling multiple wounds, wound size, and wound identification among participants, suggesting there is still room for standardization. A coordinated effort to develop a standardized training and data recording program may improve agreement and consistency by virtue of everyone receiving the same training. Additionally, it is possible that the training approaches taken in wound classification workshops, including this one, were not effectively designed to meet the goal of improving agreement and accuracy. If more targeted consideration were put into the development of training materials and methods, improvements may be achievable.
Another possibility to reduce the influence of low classification agreement and accuracy would be to incorporate misclassification rates into applications that use wound data. Wounding data are currently used to inform statistical catch-at-age models and provide insight into the binational sea lamprey control program. Misclassification rates for each wound type could inform priors in a Bayesian modeling approach, be used to modify wound records before use, or be incorporated in a sensitivity analysis to quantify the effects of wound misclassification on model estimates. Assessment of misclassification rates gives some insight into how wound data might be adjusted to reflect what we know about wound classification accuracy. However, our workshop was held once with a relatively small number of participants, and it is therefore likely that a repeat of this workshop in other locations would be necessary to obtain error estimates required for any type of correction factor. Alternatively, Adams et al. (This issue) suggest that statistical catch-at-age models should incorporate sea lamprey abundance estimates via a functional response model as a way of calibrating observed wounding rates.
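As a rough illustration of what such a sensitivity analysis might look like, the sketch below applies a standard two-class misclassification correction (often called the Rogan-Gladen estimator) to an observed proportion of fish recorded with an A-I through A-III wound. This technique is swapped in for illustration and is not a method proposed in this study. The two sensitivity values reuse the 67% and 81% within-group accuracies reported above; the observed proportion (0.10) and the specificity (0.98) are purely hypothetical, as is the function name `corrected_rate`.

```python
def corrected_rate(observed, sensitivity, specificity):
    """Correct an observed proportion for two-class misclassification.

    sensitivity: probability a fish with an A-I to A-III wound is recorded as such.
    specificity: probability a fish without one is recorded as not having one.
    """
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("sensitivity + specificity must exceed 1")
    estimate = (observed - (1.0 - specificity)) / denom
    return min(1.0, max(0.0, estimate))  # keep the result a valid proportion


# Hypothetical inputs; only the two sensitivity values come from the text.
observed = 0.10      # assumed share of fish recorded with an A-I to A-III wound
specificity = 0.98   # assumed; not estimated in this study

for label, sensitivity in [("before training", 0.67), ("after training", 0.81)]:
    rate = corrected_rate(observed, sensitivity, specificity)
    print(f"{label}: corrected proportion = {rate:.3f}")
```

Repeating the calculation over a plausible range of sensitivity and specificity values would show how strongly a downstream estimate depends on the assumed misclassification rates.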
Other biomarkers may be more reliable indicators of parasitism status than classification of sea lamprey wounds. If a protein biomarker expressed in parasitized individuals could be identified with a simple, non-invasive, and low-cost blood test, difficulties with the use of a subjective classification protocol may be avoided. Similar approaches have been used previously to identify biomarkers indicative of bitumen exposure in sockeye salmon (Oncorhynchus nerka) (Alderman et al., 2017), environmental estrogen exposure in Atlantic salmon (Salmo salar) (Arukwe et al., 1997), and for a wide array of contaminants in toxicology applications (Gupta, 2014). However, finding biomarkers that are cost-effective, reliable predictors of ecological effects can be challenging (reviewed in Forbes et al., 2006), and using biomarkers to estimate risk to populations is generally not advised (Hanson, 2009). Ideally, any biomarker developed would be a time-sensitive measure of parasitism, as most current management applications of wound data attempt to associate wounds with a given year in order to evaluate the success of the sea lamprey control program or direct influences on fish mortality. Despite these challenges, approaches to biomarker identification have become more sophisticated (Song et al., 2008), and such biomarkers, if developed, could play an important role in estimating parasitism intensity.

Conclusion
The Great Lakes Fishery Commission's sea lamprey control program assists managers in meeting fish community objectives (Gaden et al., 2008), with a goal toward restoration of native lake trout stocks (Treska et al., This issue; Stewart et al., 2003). Records of sea lamprey wounds on lake trout are the primary tool used to evaluate lake trout objectives and assess the effectiveness of the sea lamprey control program (Stewart et al., 2003). Given the importance of wound data for assessing and directing management plans, it is critical that the underlying assumptions behind their use are evaluated and the degree to which the assumptions are met is well characterized. The results of this workshop suggest that wound classification agreement and accuracy are low, and misclassification rates are high for most wound types, consistent with previous workshops assessing similar metrics (Nowicki, 2008). Because high classification agreement and accuracy are important assumptions of wound data use, the reliability of wound data as an indicator of the success of lake trout rehabilitation and sea lamprey control efforts may merit more critical evaluation.
Despite these concerns, several approaches may improve the reliability of wound data going forward. Although previous efforts, including this workshop, have not demonstrated the ability to markedly improve wound classification accuracy and agreement, a better designed training program adopted by all agencies doing field assessments may be able to improve the reliability of wound data. Additionally, more work characterizing misclassification rates may allow for inaccuracies in wound records to be accounted for in modeling efforts. More work is needed to understand the extent to which current inaccuracies in sea lamprey wound classification can influence the evaluation of fish community targets in the Great Lakes.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.