Clinical signs of trachoma are prevalent among Solomon Islanders who have no persistent markers of prior infection with Chlamydia trachomatis

Background: The low population prevalence of trachomatous trichiasis and high prevalence of trachomatous inflammation–follicular (TF) provide contradictory estimates of the magnitude of the public health threat from trachoma in the Solomon Islands. Improved characterisation of the biology of trachoma in the region may support policy makers as they decide what interventions are required. Here, age-specific profiles of anti-Pgp3 antibodies and conjunctival scarring were examined to determine whether there is evidence of ongoing transmission and pathology from ocular Chlamydia trachomatis (Ct) infection. Methods: A total of 1511 individuals aged ≥1 year were enrolled from randomly selected households in 13 villages in which >10% of children aged 1–9 years had TF prior to a single round of azithromycin mass drug administration undertaken six months previously. Blood was collected to be screened for antibodies to the Ct antigen Pgp3. Tarsal conjunctival photographs were collected for analysis of scarring severity. Results: Anti-Pgp3 seropositivity was 18% in 1–9 year olds, sharply increasing around the age of sexual debut to reach 69% in those over 25 years. Anti-Pgp3 seropositivity did not increase significantly between the ages of 1–9 years and was not associated with TF (p=0.581) or scarring in children (p=0.472). Conjunctival scars were visible in 13.1% of photographs. Mild (p<0.0001) but not severe (p=0.149) scars increased in prevalence with age. Conclusions: Neither conjunctival scars nor lymphoid follicles were associated with antibodies to Ct, suggesting that they are unlikely to be a direct result of ocular Ct infection. Clinical signs of trachoma were prevalent in this population but were not indicative of the underlying rates of Ct infection.
The current World Health Organization guidelines for trachoma elimination indicated that this population should receive intervention with mass distribution of antibiotics, but the data presented here suggest that this may not have been appropriate.


Introduction
Trachoma is responsible for approximately 1.9 million cases of visual impairment or blindness globally 1 . International partners have committed to elimination of trachoma as a public health problem by the year 2020 and the global elimination strategy is guided by the clinical signs trachomatous trichiasis (TT) and trachomatous inflammation-follicular (TF). The World Health Organization (WHO) recommends at least three years of mass drug administration (MDA) with azithromycin in districts with ≥10% TF prevalence in 1-9-year-olds to treat the causative agent Chlamydia trachomatis (Ct) 2 .
A 2013 population-based prevalence survey (PBPS) covering two provinces (Temotu and Rennell & Bellona) of the Solomon Islands showed that the proportion of 1-9-year-old children with TF was moderately high (26.1%) 3 . In response to this and in accordance with WHO guidelines, MDA took place throughout the Solomon Islands in 2014 and the national program administered approximately 24,000 doses of azithromycin (achieving coverage of approximately 80% in Rennell & Bellona, and 85% in Temotu). Data from the 2013 PBPS suggested that whilst TF was prevalent, TT (0.1% of those ≥15 years), trachomatous inflammation-intense (TI; 0.2% of 1-9-year-olds), and ocular infection with Ct (1.3% of 1-9-year-olds) were all rare 3 . Our recent survey of Kiritimati Island, Kiribati 4 , used the same tools and estimated more typically-matched values of TF and Ct infection prevalence (among children) at 28% and 24%, respectively. We therefore questioned the underlying biology of the TF signs that were observed in the Solomon Islands.
We hypothesised that the clinical signs in the Solomon Islands were not consistent with repeated Ct exposure. We set out to investigate this hypothesis by returning to the same two provinces of the Solomon Islands six months after MDA took place. This study used tests for two different persistent markers of previous Ct infection. The first was an enzyme-linked immunosorbent assay (ELISA) that measured antibodies against the Ct antigen Pgp3 5,6 . This tool has been used to assess transmission of both urogenital 7 and ocular 8 infections, including in the study on Kiritimati Island, where we showed that there were strong associations between Ct infection, TF and anti-Pgp3 antibody levels. We also observed a rapid increase in age-specific Pgp3 seroprevalence throughout childhood years in that population 4 .
This study also assessed trachomatous scarring. Scarring, caused by immuno-pathological responses to repeated cycles of infection, is an irreversible process that, like Ct seropositivity, is generally considered to be a persistent marker of previous ocular Ct infection. In trachoma, it is characterised by a gradual accumulation of scar tissue in the tarsal conjunctivae 9 , which typically begins to develop to the point of being visible in late childhood. Scarring is more commonly found in those who have experienced prolonged, severe inflammation and infection 9-11 . Very few young children in trachoma-endemic communities have signs of scarring, but as many as 10-30% of older children may do so 12,13 . Scarring progresses throughout a lifetime and, in severe cases, is the underlying cause of TT 11 . Assuming that trachoma was an endemic problem in this population, we would expect to observe an age-dependent accumulation of scarring, with an increasing proportional representation of severe scars with advancing age.

Ethics statement
The methods used in this study adhered to the tenets of the Declaration of Helsinki. Ethical approval for the study was granted by the London School of Hygiene & Tropical Medicine (LSHTM; 8402) and Solomon Islands National Health Research Ethics Committee (HRC15/03). Subjects aged 18 years or older gave written, informed consent to participate. A parent or guardian provided written, informed consent on behalf of those aged under 18 years.

Study design
To enable comparison to pre-MDA data, only villages in Temotu and Rennell & Bellona provinces where baseline mapping had been conducted were eligible for inclusion. Due to their small respective populations (in the 2009 census, the population of Temotu was 21,362 and of Rennell & Bellona was 3041), the two provinces were combined into one evaluation unit during baseline mapping. The survey took place in June-July 2015, six months after a single round of azithromycin MDA had been delivered by the Solomon Islands National Trachoma Elimination Program.
Thirteen villages were selected in which more than 10% of the community (all ages) had previously had signs of TF 3 . We included numbers of villages in each province to reflect the proportion of the total population of the two provinces combined (Temotu: 11 villages; Rennell & Bellona: 2 villages). The proportions of active trachoma and infection cases in study villages before MDA were extracted from the full baseline dataset and are presented here for comparison.

Amendments from Version 1
We have drafted a second version of our manuscript following feedback from two peer reviewers. The reviewers' main suggestions were to improve transparency around training and validation of clinical and photograph graders and to acknowledge limitations in that process. We feel these were valuable suggestions which would improve the manuscript and have therefore added three new sections to the manuscript: (1) In the Methods section, more detail has been added to the information on Trachoma Grading, both for clinical graders and photograph graders; (2) In the Results section, we have presented the outcome of the comparison between the photograph grade (according to the FPC system) and the field grade (according to the WHO simplified system) and have also clarified how many photographs had to be adjudicated; and (3) In the Discussion section, we have added a paragraph on the limitations of photograph and field grading, the potential for disagreement and the implications of that on our data. We have responded to the peer reviews specifically in the comments section, and also made some minor editorial changes to improve the clarity of the article.

This survey was powered to estimate the prevalence of anti-Ct antibody seropositivity in children aged 1-9 years. Based on the low prevalence of ocular Ct infection prior to MDA (1.3%), we expected the seroprevalence to be approximately 10%, in line with other communities with low Ct prevalence 14 . To estimate seroprevalence with ±5% precision at the 95% confidence level, assuming a design effect of 2.65 (as utilised in the baseline study), at least 367 children were required 15 . In our pre-MDA PBPS survey, we examined a mean of 1.1 children per household and therefore estimated that 25 households in each of 13 clusters were needed to reach our sample size. All residents aged 1 year or above living in households drawn at random from a list of all households in a study cluster were eligible to participate.
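The sample size quoted above can be checked with the usual formula for estimating a proportion in a cluster survey, n = deff × z² × p(1 − p) / d². The Python sketch below is illustrative only: it assumes this standard formula is the one behind the cited calculation, and the function name `required_sample_size` is ours rather than anything from the study. With the stated inputs (expected seroprevalence 10%, precision ±5%, design effect 2.65) it returns 367.

```python
import math
from statistics import NormalDist


def required_sample_size(expected_prevalence, precision, design_effect, confidence=0.95):
    """Children needed to estimate a proportion to within +/- precision.

    Assumes the standard simple-random-sampling formula inflated by a
    design effect; this is an assumption, not the authors' documented code.
    """
    # Two-sided critical value of the standard normal (about 1.96 for 95%)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n_simple_random = z ** 2 * expected_prevalence * (1 - expected_prevalence) / precision ** 2
    # Inflate for cluster sampling and round up to a whole child
    return math.ceil(n_simple_random * design_effect)


n_children = required_sample_size(expected_prevalence=0.10, precision=0.05, design_effect=2.65)
print(n_children)
```

The design effect dominates the result: without it, the same inputs would call for only about 139 children.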

Trachoma grading
Clinical TF, TI and TT grading was performed in the field by two Global Trachoma Mapping Project (GTMP)-certified graders wearing 2.5× binocular magnifying loupes. Graders were trained according to a training scheme developed under the GTMP which was designed to be as standardisable as possible between countries. To become certified, graders were required to achieve a kappa score of ≥0.7 compared to expert consensus on a set of 50 photographs, then a kappa of ≥0.7 compared to a highly experienced grader on 50 schoolchildren's eyes 16 . Clinical grading of TF, TI and TT is a routine part of trachoma surveys and prevalence estimates of TF and TT are the basis for programmatic decision making. These signs have, therefore, been the focus of programmatic scale-up of mapping activities whereas similar standardisation does not exist for scar grading. Clinical grading according to the WHO simplified system 17 was therefore used for TF, TI and TT, whilst photo-grading was used for scarring.
High-resolution digital photographs of the right tarsal conjunctivae were graded for scarring using the modified WHO trachoma grading system 18 . Photographs were graded by two photograders who had previously achieved kappa scores for intergrader agreement of >0.7 for F (follicles), >0.7 for P (papillae) and >0.7 for C (conjunctival scar) grades, compared to a highly experienced trachoma grader. Photographs were graded for F, P and C. The F grades were used to retrospectively check the accuracy of the TF field grading, although it should be noted the two grading systems are not entirely concordant. Photograph grading was undertaken masked to field grading, laboratory results and the other photograph grader's assessment. Discrepant grades were arbitrated by a third highly experienced grader. Grading was performed using "FPC_Grader", an open source software tool based on R.
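To make the kappa thresholds used for grader certification concrete, the following Python sketch computes Cohen's kappa for two raters: observed agreement corrected for the agreement expected by chance. This is a stand-in for illustration. The study reports Fleiss' kappa from the R 'irr' package, which is a related but not identical statistic, and the grades below are invented, not study data.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters who graded the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must grade the same non-empty set of items")
    n = len(ratings_a)
    # Proportion of items on which the two raters gave the same grade
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance, from each rater's marginal grade frequencies
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(counts_a[grade] * counts_b[grade] for grade in counts_a) / n ** 2
    if expected == 1:
        return 1.0  # both raters used one identical grade throughout
    return (observed - expected) / (1 - expected)


# Hypothetical TF calls (1 = TF present, 0 = absent) for ten eyes
trainee = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
expert = [1, 1, 0, 0, 1, 0, 0, 0, 0, 1]
kappa = cohens_kappa(trainee, expert)
print(f"kappa = {kappa:.2f}; meets 0.7 threshold: {kappa >= 0.7}")
```

In this made-up example the raters agree on 8 of 10 eyes, yet kappa falls below 0.7 because about half of that agreement would be expected by chance alone.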

Specimens
Dried blood spots were collected for assessment of anti-Pgp3 antibody level. Participants' fingers were cleaned and then pricked with new, sterile lancets, and blood was collected onto filter paper calibrated to absorb 10 µL (CellLabs, Sydney, Australia). Filter wheels were air-dried for 4-12 hours before being sealed in plastic bags with desiccant sachets. These were refrigerated for up to one week and then stored at -20°C before shipping at ambient temperature to LSHTM, London, UK, where they were again stored at -20°C.
Swabs were passed three times (with a 120°-turn between each pass) over the right conjunctiva of children aged 1-9 years. The examiner and specimen manager took precautions to avoid cross contamination between participants or swabs in the field.
In each village, one clean swab was passed within 20 cm of a seated participant and then processed identically to participant swabs to test whether cross contamination between swabs took place in the field. Swabs were refrigerated for up to one week and then stored at -20°C before shipping to LSHTM on dry ice for processing.

Serological and nucleic acid testing
Anti-Pgp3 antibody level was assessed using ELISA, as described elsewhere 4,19 . Optical density (OD) at 450 nm was measured using a SpectraMax M3 photometric plate reader (Molecular Devices, Sunnyvale, USA) and then normalised to a 20% dilution of high-titre (presumed positive) serum in low-titre (presumed negative) serum.
DNA was extracted from swabs with the QIAamp DNA mini kit (Qiagen, Manchester, UK). Samples were tested for Homo sapiens ribonuclease subunit (RPP30; endogenous control) and open reading frame 2 of the Ct plasmid (diagnostic target) using a previously evaluated droplet digital PCR assay 20 with minor modifications 21 .

Data analysis
All data analyses were conducted using R 3.2.3 22 . Pre- and post-MDA proportions were compared using Wilcoxon's rank sum test. Fleiss' Kappa scores were calculated using the 'irr' package in R. ddPCR tests for current ocular Ct infection were classified into negative and positive populations according to methods described previously 20 . Anti-Pgp3 antibody titre was divided into two populations using an expectation-maximisation finite mixture model 6 , with individuals classified seropositive if their normalised OD was more than three standard deviations above the mean of the presumed-negative population. Using this method, the threshold normalised OD value for positivity was 0.7997. Data from comparable studies in Bijagos Islands, Guinea-Bissau 23 , and Kiritimati Island, Kiribati 4 , are shown in Figure
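The seropositivity rule described in this section (fit a two-component mixture, then place the cut-off three standard deviations above the mean of the lower component) can be sketched as follows. This is a minimal Python illustration under stated assumptions: both components are taken to be Gaussian, the starting values are a crude split at the median, and the input is synthetic. The authors worked in R, and the function names here are ours. The sketch will not reproduce the study threshold of 0.7997, which depends on the real OD data.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_two_gaussians(values, iterations=200):
    """Fit a two-component 1-D Gaussian mixture by expectation-maximisation.

    Returns two [weight, mean, sd] lists ordered by mean.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    params = []
    for group in (ordered[:half], ordered[half:]):  # crude start: lower and upper half
        mean = sum(group) / len(group)
        var = sum((v - mean) ** 2 for v in group) / len(group)
        params.append([0.5, mean, max(math.sqrt(var), 1e-3)])
    n = len(values)
    for _ in range(iterations):
        # E-step: probability that each value belongs to each component
        resp = []
        for v in values:
            dens = [w * normal_pdf(v, m, s) for w, m, s in params]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean and SD from the weighted data
        for k in range(2):
            nk = sum(r[k] for r in resp)
            mean = sum(r[k] * v for r, v in zip(resp, values)) / nk
            var = sum(r[k] * (v - mean) ** 2 for r, v in zip(resp, values)) / nk
            params[k] = [nk / n, mean, max(math.sqrt(var), 1e-3)]
    return sorted(params, key=lambda p: p[1])


def seropositivity_threshold(values, n_sd=3.0):
    """Mean plus n_sd standard deviations of the lower (presumed-negative) component."""
    _, neg_mean, neg_sd = fit_two_gaussians(values)[0]
    return neg_mean + n_sd * neg_sd


# Synthetic, made-up normalised OD values (not study data)
rng = random.Random(0)
synthetic_od = [rng.gauss(0.2, 0.1) for _ in range(300)] + [rng.gauss(1.5, 0.4) for _ in range(200)]
threshold = seropositivity_threshold(synthetic_od)
seropositive = [od > threshold for od in synthetic_od]
print(f"threshold = {threshold:.3f}, seropositive = {sum(seropositive)}/{len(synthetic_od)}")
```

Anchoring the cut-off to the negative component alone keeps the threshold stable even when the positive population is small or broad, which is one reason this rule is attractive in low-prevalence settings.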

Results
Study demographics
1511 people (46.3% male; 466 1-9-year-olds) aged 1 year and over were examined in 382 households from the 13 selected study villages. By comparison, the pre-MDA survey of the same villages yielded 1534 people (490 1-9-year-olds) in 394 households. Data on non-participation were not collected in the June 2015 study, but the number enrolled was similar to that for the pre-MDA survey, suggesting a similar participation rate of around 90% on both occasions. In this study, there was a mean of 4 people per household aged 1 year and over, and a mean of 1.2 children per household aged 1-9 years. After accounting for non-participants, this is similar to the means in the 2009 No cases of TT were identified during this study.
Photographic assessment of trachoma
Of the right eye photographs that were collected, 1440/1511 (95.3%) were suitable for grading conjunctival scarring. 42% of photograph grades did not match on either F, P or C grade and so were adjudicated (the majority of these discrepancies were due to disagreements in P grade). 188/1440 (13.1%) photographs were graded as having visible scars (C>0), of which 127 were C1 (mild), 53 were C2 (moderate) and eight were C3 (severe). Four out of eight cases of C3 were found in children aged 1-9 years; these photographs are shown in Figure 1. The photo-graders noted that whilst some conjunctivae met the criteria for classification of C3 (i.e., there was clear scarring with distortion), these photographs also demonstrated the presence of features that are not typically associated with trachoma. In some cases, these were characterised by pronounced linear boundaries between heavily scarred conjunctiva and apparently healthy tissue (Figure 1C and 1D). Photo-graders noted that 4/53 (7.5%) C2 cases and 3/8 (37.5%) C3 cases looked atypical for trachomatous scarring. Of individuals with eyelid scarring considered typical for trachoma, 36/54 (67%) were seropositive, whereas 2/7 (29%) of those with atypical scarring were seropositive. This difference in proportions was not significant (chi-squared test p=0.123), presumably because of the small numbers with atypical scarring.
The age-specific prevalence of scarring in the Solomon Islands is shown in Figure 2A. Of 435 photographs graded from children aged 1-9 years, 25 (5.7%) were graded as C>0. In 311 adults aged >40 years who were examined, 74 (23.8%) had C>0 (65 cases of C1, 9 cases of C2, 0 cases of C3). We have reproduced published data from a comparable study in the Bijagos archipelago, Guinea-Bissau, where ocular Ct infections were common (22% of 1-9 year olds had detectable Ct infection) 23 . These data are included to demonstrate the contrast between photo-grading data sets from the Solomon Islands and Guinea-Bissau, the latter of which reflects the typical patterns of scar accumulation that would be expected in a setting where ocular Ct infection is hyperendemic (Figure 2B).
In the Solomon Islands, the proportion of people with C1 increased with age (logistic regression p<0.0001), but the proportion of people with more severe scarring (C2 or C3) did not (logistic regression p=0.149). There was also no significant association between having C>0 and gender (chi-squared test p=0.80). In Rennell & Bellona, 25/225 (11.1%) of photos were graded C>0, whereas in Temotu, 163/1215 (13.4%) of photos were graded C>0; the difference in scarring between provinces was not significant (chi-squared test p = 0.404).
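The two chi-squared comparisons above for which counts are given (serostatus by typical versus atypical scarring, and scarring by province) can be recomputed from the reported numbers. The Python sketch below assumes a Pearson chi-squared test with Yates' continuity correction on a 2×2 table. The text does not state whether a correction was applied; we assume it because the corrected test reproduces the reported p-values to three decimal places. The function name is ours.

```python
import math


def chi_squared_2x2_yates(a, b, c, d):
    """Chi-squared test for the 2x2 table [[a, b], [c, d]] with Yates' correction.

    Returns (statistic, p_value). The continuity correction is an assumption
    about the published analysis, not something the text confirms.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    observed = [a, b, c, d]
    expected = [row1 * col1 / n, row1 * col2 / n, row2 * col1 / n, row2 * col2 / n]
    # Shrink each |observed - expected| by 0.5 (never below zero) before squaring
    statistic = sum(max(abs(o - e) - 0.5, 0.0) ** 2 / e for o, e in zip(observed, expected))
    # Upper-tail probability of a chi-squared variable with 1 degree of freedom
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Typical scarring: 36 seropositive, 18 seronegative; atypical: 2 and 5
_, p_scar_type = chi_squared_2x2_yates(36, 18, 2, 5)
# Rennell & Bellona: 25 with C>0, 200 without; Temotu: 163 with, 1052 without
_, p_province = chi_squared_2x2_yates(25, 200, 163, 1052)
print(f"scar type: p = {p_scar_type:.3f}; province: p = {p_province:.3f}")
```

Without the correction, the scar-type comparison would give a p-value close to 0.05 rather than 0.123, which shows how sensitive the result is to the seven atypical cases.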

Anti-Pgp3 serology
Dried blood spots were collected from 1499/1511 (99.2%) people aged ≥1 year during the post-MDA survey; the other 12 people declined finger-prick. The distribution of normalised OD for all individuals, grouped into five-year age brackets, is shown in Figure 3A. This figure demonstrates the median normalised OD to be much higher in people aged >25 years than their younger counterparts. Overall, 633/1499 (42.2%) people were classified as seropositive. In children aged 1-9 years, the prevalence of anti-Pgp3 antibodies was 83/462 (18.0%). In 1-year-olds alone, it was 5/47 (10.6%). The mean seroprevalence in those aged 6-10 years was not significantly higher than in those aged 1-5 years (20.3% compared to 16.6%, chi-squared test p = 0.328) (Figure 3C). In Figure 3B, we have also included comparator data from Kiritimati Island, where the TF prevalence was similar but where the prevalence of Ct infection was much higher 4 . Among children aged 1-9 years, the rate and dynamics of accumulation of seropositivity differed substantially between the Solomon Islands and Kiritimati (Figure 3B).
specimens from children who were positive for Ct, the median load was high at 18,725 plasmid copies/swab. This suggests that these were much less likely to be false positive results than had they been low load infections. 6 (9.8%) of the 61 children with TF also had Ct infection. We previously showed that, of 462 swabs from the pre-MDA study which passed quality control, 5/462 (1.1%) had infection. All five infection cases came from children with active trachoma in the right eye (5/159, 3.1%). The median pre-MDA load of Ct infections in those villages was 14,260 plasmid copies/swab 3 . Neither the difference between the pre- and post-MDA Ct prevalence nor the pre- and post-MDA Ct load were statistically significant (Wilcoxon rank sum test p=0.259 and p=0.175, respectively).
The relationship between Ct infection, signs of trachoma and seropositivity was examined in children aged 1-9 years and is summarised in Table 2. 7/8 cases of infection were in seropositive individuals (Figure 3C). All study villages had at least one case of TF, but infections were limited to five of the 13 villages studied. Two

Discussion
Based on moderate estimates of province-level prevalence of TF, the Solomon Islands has (along with other Pacific Island states) been identified as having endemic trachoma. Whilst measures for trachoma elimination have already been deployed in Temotu and Rennell & Bellona, we have previously noted that TI, ocular Ct infection and late-stage disease (TT) are rare 3 . If the village-level findings of the current study were replicated throughout the district, then TF would still be sufficiently prevalent to warrant continued intervention. The conjunctival scarring and serological data presented here, combined with previous Ct infection data, suggest that ocular Ct is scarce and is not being widely transmitted. TF is not concurrent with an appreciable burden of infection, severe scarring or TT in this population. Our most significant finding, which is that 80% of TF cases occur in people who are seronegative for antibodies against Ct, questions whether further rounds of MDA are warranted in this population.
In Kiritimati Island, we found that just 20.3% (23/119) of children with TF were seronegative according to the same ELISA test that was used here. We were not surprised to find some individuals with TF were seronegative because (1) a proportion of individuals who have primary infections will not yet have seroconverted and (2) anti-Pgp3 antibody responses may not be the same in all people due to natural variability in host responses. However, the fact that 80% of TF cases in Solomon Islands were seronegative suggests that many cases are not caused by Ct infection. More important still is that in Kiritimati Island, children with TF were far more likely to be seropositive than those without TF. In the Solomon Islands, however, we not only found that most TF cases were seronegative, but that individuals with TF were no more likely to be seroreactive to Pgp3 than their peers without TF. We can rule out the possibility that Solomon Islanders are collectively non-responsive to Pgp3 (for genetic reasons, for example) because the majority of the adult population do have antibodies against Pgp3. The most parsimonious explanation of our findings is that TF in this population is caused by a factor other than Ct.
We found a small and non-significant increase in age-specific seroprevalence between young children (1-5 years) and older children (6-10 years), which suggests that children here occasionally do encounter Ct infections. This is concordant with our previous data, which suggested that although ocular Ct strains are present in the Solomon Islands, they are rare 3 . This contrasts with the data from Kiritimati Island, where we saw that there was a substantial year-on-year increase in age-specific seropositivity (Figure 3B). The increase in seropositivity with age in this group was also very modest compared with that seen in hyper-endemic villages in Tanzania, where seropositivity has been observed to increase from approximately 25% to 94% between the ages of 1 and 6 years 26 . In the current dataset, there was a rapid increase in age-specific seroprevalence around the age of 18 years, the self-reported median age of sexual debut in a nearby population 25 . The prevalence of urogenital Ct infection is known to be high in women attending antenatal clinics in the Solomon Islands 25 , which probably explains the high seroprevalence in adults as anti-Pgp3 antibodies do not distinguish between biovars. Exposure during parturition may also be a major contributor to the 10% of 1-year-olds in our study who had evidence of prior Pgp3 exposure 27 .
While seroreversion due to clearance of infection by MDA is a possible explanation for the low seroprevalence and absence of association of anti-Pgp3 antibodies with TF in the Solomon Islands, there is currently no evidence for complete seroreversion for Pgp3-specific antibodies 7,26 after clearance of infection. It is, in any case, hard to imagine a biologically plausible situation in which seroreversion would fully account for the contrasts between serological data from Kiritimati and data from the Solomon Islands.
Whilst the proportion of people with mild scars increased with age, the proportion of those with more extensive or eyelid-distorting scars did not increase with age. Contrary to what might be expected in a trachoma-endemic community 13 , no eyelid-distorting scars were found in 311 adults aged above 40 years. Some cases of severe scarring we observed in children were not typical of trachoma and were found in children who lacked Pgp3 reactivity (Figure 1, Table 2). There are other inflammatory conditions (e.g. adenoviral, acute haemorrhagic or membranous conjunctivitis) that may result in conjunctival scarring, although the pathology, incidence and prevalence of these are poorly understood and incompletely described 28 . It is also unclear whether the TF that we observed is directly linked to conjunctival scarring in this setting. In Temotu and Rennell & Bellona, the low prevalence of severe scars suggests that the proportion of the population at risk of developing TT is very low, although we cannot determine whether this might change in the future.
One limitation of all studies of tarsal conjunctival scarring is an element of diagnostic uncertainty when the scarring is very mild; determining whether it meets the C1 criteria is difficult. It is possible that this may be easier to judge from high quality digital photographs than during live examination using 2.5× loupes. In contrast, eyelid distortion, required to meet the criteria for C3, may be difficult to judge from a single, two-dimensional photograph. This can lead to disagreement between graders. However, subjectivity in determining the presence or absence of eyelid distortion is also present with field grading. Photography has the advantage that the images can be presented as empirical evidence (Figure 1) that can be scrutinised, appraised and regraded by third parties where field grades cannot. Photograph grading is the method of choice for studies of scarring severity or progression 12,13,29 .
The data collected on infection, antibody, scarring and trichiasis prevalence are consistent in their suggestion that trachoma is uncommon in this population despite the moderate TF prevalence. The field TF grades were therefore compared to photograph grades of F2 or F3, yielding a kappa agreement score of 0.52 (moderate agreement). This was considered to be acceptable as this is a post-treatment scenario where there are likely to be more borderline cases where disagreement may be more common. Also, the discrepancy between F2 and TF grades (for example, someone with five follicles in the central conjunctiva would qualify as having TF but not F2) may lead to further disagreement. The most common type of disagreement was field TF+ / photograph F≤1, as we might expect given the discrepancies between the FPC and the simplified system. There is also a possibility that the grader had overcalled some milder cases of follicular inflammation as TF. When a discrepancy occurs, we consider the field grade to be more likely to be accurate as the field grader can achieve a three-dimensional real-time view of the eyelid.
The 13 communities included here were the most highly endemic of those surveyed in Temotu and Rennell & Bellona during the GTMP, with at least 15% of children aged 1-9 years living in selected villages having TF before MDA. In this study, we showed that the burden of TF in many of these villages dropped significantly following a single round of MDA, but still remained above the threshold for continued intervention. The drop in clinical disease was not reflected by a simultaneous drop in ocular Ct in children with TF, which actually increased slightly (although this increase was not statistically significant). From interventions in other settings, we might expect TF prevalence to have approximately halved six months after a single round of MDA, given 80% population coverage 30,31 . Azithromycin has anti-inflammatory and broad-spectrum antibiotic effects, which may help explain the observed decrease in clinical disease, but would of course only be effective in controlling a subset of bacterial genera. Given the method of village selection, regression to the mean would be another potential explanation for the fall in prevalence.
We observed regional variation across the study villages. Compared to Temotu, we noted that MDA did not have as significant an impact on TF prevalence in Rennell & Bellona, that there were more children there who were seropositive and that there were more children with TF who also had infection. Our survey was not prospectively designed to assess these differences, and the subgroup size in Rennell & Bellona precluded more detailed analysis. Temotu is more similar to the rest of the Solomon Islands in terms of the geology and geography of the islands as well as in relation to the lifestyle and ethnicity of the majority of the inhabitants. Further studies on the localisation of trachoma in the islands are warranted.
The complex, multistage nature of trachoma makes it difficult to predict the outcome of any given intervention 32 . Data from cross-sectional surveillance tools used in isolation can be hard to interpret, especially given the prolonged persistence of TF after clearance of infection 33 . Some features of conjunctivitis in the Solomon Islands resemble trachoma, particularly the prevalent follicular inflammation and some of the severe conjunctival scarring. Crucially, we found that these clinical features were not co-endemic with TT at a prevalence that indicated an ongoing public health problem. In this setting, we believe that tests for infection gave a better indication of the public health threat from trachoma than TF. A combined approach in which various age-specific markers of trachoma are assessed together across the complete age range of the population may prove useful for prioritising areas for intervention where the prevalence of TF alone does not coherently reflect trachoma's public health importance.
Contrary to the WHO recommendation for treatment based solely on prevalence of TF, our data suggest that trachoma is not a public health problem in these villages.
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Open Peer Review
This manuscript describes trachoma prevalence in the Solomon Islands over an almost 2-year period. The data highlight the discrepancy between the presence of high TF rates, which would indicate that trachoma is a public health problem in this area, while the rates of other signs of trachoma (i.e., TI, TT) are very low to none. The study is well designed and the manuscript well written. The concern raised by the authors that the high TF rates seen may not actually reflect ongoing trachoma infection and thus MDA may not be warranted is well supported by the data, as is their suggestion that consideration be given to using multiple markers of trachoma to determine whether or not intervention is needed.

I have some comments for the authors which I list below:

TS
In the Methods section the authors state that grading of TF, TI, and TT was done clinically and that only scarring was graded by photographs. However, when they describe the inter-observer agreement for the grading they refer to the grading of F, P, and C (as per Dawson et al.). Could the authors please report the following: What was the inter-observer agreement for grading scarring alone in this paper? What percent of grades had to be adjudicated? Where else has this Dawson et al. scarring grading scheme, applied to photographs, been used or validated?
The authors followed the grading system of conjunctival scarring described by Dawson et al.; the grade descriptions do not have clear cut-offs. How did the authors modify this grading system for the photographic graders? How did they determine the cut-off between C1 and C2? How did they determine shortening/distortion for C3 from photographs alone? Did they include looking for signs of lid margin conjunctivalization as they state in their C3 example photograph? These are areas that could lead to a great deal of disagreement between graders...
In the comparison population whose scarring data is presented in Figure 2 B, was the grading done in the same way as in this paper, through photographs with two graders and an adjudicator? If so, what was the inter-observer agreement? This information was not readily available in the paper cited.

TF
Since the authors mention the kappa of F, P, and C, I wonder if they happened to grade follicles and papillae photographically? If so, how does this compare to the field grades? Given the discrepancy between TF and all other signs of trachoma, having a way to ensure accurate field grading of TF would be reassuring.
Do the authors have any hypotheses as to why the prevalence of TF might be so high in this community if it is not due to trachoma (e.g. a higher incidence of viral etiologies)?

PGP3 Antibody
In the discussion the authors state that there is no evidence for complete seroreversion. Although I agree with the overall conclusion that seroreversion cannot account for the differences in the data that were observed, the authors may want to consider looking at or adding the following reference, which does support the presence of seroreversion: West et al. (2018).

If applicable, is the statistical analysis and its interpretation appropriate? Yes
Are all the source data underlying the results available to ensure full reproducibility? Yes

Are the conclusions drawn adequately supported by the results? Yes
Competing Interests: No competing interests were disclosed.

I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

We are grateful to Dr Wolle for her feedback on our article. We have taken her comments into account as we prepared the second version of this manuscript, and believe they have enhanced the quality and transparency of the article. We hope the following additional information addresses some of her comments directly.
The photograph graders had to achieve a kappa score of >0.7 in C grading, a kappa score of >0.7 in F grading and a kappa score of >0.7 in P grading when compared to a highly experienced grader before they were considered to be 'validated' for photograding. It is hard to calculate a kappa score between graders in this particular manuscript because photographs on which the primary graders did not reach consensus were arbitrated. Where the first two grades agreed, the grade was accepted as the final grade. Where the first two grades disagreed on F, P or C, an independent expert adjudicated. There is, therefore, a subset of photographs where both primary graders have called the same C grade, and the arbitrator has also graded C. In these cases, the kappa agreement was 0.76 between primary and arbitrator grades.
For 42% of participants, the primary graders did not agree on the F, P or C score of the photograph, so these photographs were arbitrated. The majority of these were due to disagreement in P score.
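The agreement statistic and arbitration rule described above can be illustrated with a minimal sketch. It assumes unweighted Cohen's kappa (the response does not state which variant was used), and the function names and example grades are hypothetical rather than taken from the study.

```python
from collections import Counter


def cohen_kappa(grades_a, grades_b):
    """Unweighted Cohen's kappa for two graders scoring the same photographs.

    Assumption: unweighted kappa; the authors may have used another variant.
    """
    if len(grades_a) != len(grades_b) or not grades_a:
        raise ValueError("grade lists must be non-empty and of equal length")
    n = len(grades_a)
    # Proportion of photographs on which the two graders gave the same grade.
    observed = sum(a == b for a, b in zip(grades_a, grades_b)) / n
    # Agreement expected by chance, from each grader's marginal frequencies.
    count_a, count_b = Counter(grades_a), Counter(grades_b)
    expected = sum(count_a[g] * count_b[g] for g in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both graders used a single identical grade throughout
    return (observed - expected) / (1.0 - expected)


def needs_arbitration(first, second):
    """True if two primary grades differ on any of the F, P or C scores."""
    return any(first[key] != second[key] for key in ("F", "P", "C"))


# Hypothetical C grades from two graders for five photographs.
grader_1 = [1, 1, 0, 0, 0]
grader_2 = [1, 0, 0, 0, 0]
kappa = cohen_kappa(grader_1, grader_2)
```

Under the rule described in the response, a photograph would go to the independent adjudicator whenever `needs_arbitration` returns true for its two primary grades.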
We have presented data in the article from a comparable study in Guinea Bissau where the same grading process was used, and a starkly different picture emerges.

While we are aware of other grading systems developed specifically for grading scarring in photographs, we considered this grading system to be the best balance between the relatively simplistic tool of the simplified system and the highly complex tools described by others. We felt the experience our group had using the FPC system would lead to more reproducible grading, which would outweigh the benefit incurred by using more specialised methods. We acknowledge that the difference between borderline cases of C0 and C1 or between C1 and C2 may be difficult to arbitrate, although we felt that was a problem common to all trachoma grading, for example with regard to interpretation of what constitutes a follicle >0.5mm or the difference between different levels of inflammation. Regarding the question of distortion of the tarsus, we believe that in the photographs where C3 was called (examples of which are displayed in figure 1) the distortion of the tarsus is clear; however, it is possible that some cases of C3 were missed. We believe that, at the population level, this is unlikely to lead to gross shifts in results, especially in the context of this population where over 85% of people show no scarring at all. Finally, the comparator data we present in figure 2b demonstrate that the same grading scheme used in an area with hyperendemic infection does identify cases of moderate and severe scarring.
In the comparator population from Guinea Bissau, the photographs were reviewed by two experienced graders who met to discuss the outcome of discrepant results. Further information about this process was not recorded, so it is not possible to determine what proportion needed to be discussed.
The kappa agreement score between photograde of F2-3 and field grade of TF is 0.52 and additional information surrounding this comparison is included in response to Dr Nash's comment.
The two hypotheses that have emerged as a result of this set of work are (1) that an as-yet uncharacterised, non-bacterial cause of follicular conjunctivitis is inflating estimates of TF or (2) that the host response of residents of the Solomon Islands to circulating Ct infection is not the same as that in other countries, so does not result in the antibody and cicatrizing responses seen in other endemic areas. We consider the former to be more likely, and mention that in the final sentence of the second paragraph in the discussion.
Thank you for directing us to the paper from West et al. Our interpretation of the data in that paper is that the small proportion of people classed as sero-reverters at one year actually have a distinguishably lower antibody level at baseline than those who were positive at both timepoints, and a distinguishably higher antibody level at follow-up than those classed as negative at both timepoints. We therefore believe there is no evidence for 'complete' seroreversion, and the statement should stand.
No competing interests were disclosed.

The manuscript describes the results of a trachoma prevalence survey which measured clinical signs, chlamydial infection, as well as anti-Pgp3 seropositivity among a population from the Solomon Islands provinces of Temotu and Rennell & Bellona. This report is one of a number of recent reports which have examined this population in great detail, including through the use of conjunctival transcriptome profiling and measures of microbial diversity. The manuscript is well written and the laboratory and analytic techniques were appropriate for the main aims of the survey. Within this population, at both baseline and the survey time-point described in this manuscript, the clinical signs of trachoma were high enough to warrant interventions with antibiotics per WHO guidelines. However, other markers of trachoma, such as low infection and seropositivity, as well as the very low prevalence of trichiasis, suggested the possibility of alternative etiologies for the clinical signs (TF) of trachoma in these provinces. The findings of this study are important, as is the authors' call for the use of multiple age-specific markers of trachoma to help in prioritizing areas for intervention.

Major Comments
Given that a major finding here is that TF remained above 10%, while other trachoma indicators suggested that trachoma was not a public health problem in these provinces, it would strengthen the manuscript if the authors provided some more information on whether or not they took additional steps to ensure accurate grading of TF in the field. For example: The authors state that one of two GTMP-certified graders did the grading for TF, TI, and TT. Please state the month and year that this individual was certified for this particular survey, and what criteria were used for certification. The kappas achieved were reported for the photograph grading; please report them for the field grading.
Second, it appears that this study used only one grader to capture field grades. Is there a concern that the single chosen grader may have been a person who systematically overcalls TF (more likely to call borderline cases TF)?
Were there any quality control steps taken to ensure the quality of the field TF grading? Given that the discrepancy between TF and other markers of trachoma was discovered in these provinces at an earlier timepoint (Butcher et al 2016), and that this current survey for these provinces was already more thorough than standard trachoma surveys, the authors could have taken some steps to ensure the quality of field grading. For example, was it possible to recheck a certain percentage of TF cases while still in the field? How did the photo graders compare to the field graders for the TF grade within this particular survey?

Minor Comments
Under trachoma grading in the methods section, the authors discuss standardization of clinical signs but only cite Thylefors in the section. That paper shows comparable inter- and intra-observer variation for TF, TI and scarring (TS). Is there a different reference that can be used to support the assertion that TF, TI, and TT are more "standardized" than scar grading?

If applicable, is the statistical analysis and its interpretation appropriate? Yes
Are all the source data underlying the results available to ensure full reproducibility? Yes

We appreciate the time Dr Nash has taken to appraise this manuscript. We include a point-by-point response to his comments below. We believe his feedback has improved the article, and hope he agrees.

Graders were trained according to the GTMP training protocol as described elsewhere. The training was robust; trainees attended a taught course on trachoma, were required to achieve a kappa score of ≥0.7 when grading a set of 50 photographs in which the trachoma grade had been agreed by several experts and were required to achieve a kappa score of ≥0.7 against a highly experienced grader trainer in the assessment of 50 children aged 1-9 years in the field. Both graders working on this study had met those criteria on the first attempt during a GTMP training in the Solomon Islands in September 2013 (unfortunately we do not have their specific kappa scores available). The grading in Temotu (where 11 of the 13 study villages were located) was undertaken by a single grader who had also successfully completed a GTMP training course in Ethiopia prior to the Solomon Islands course. A second grader conducted the grading in Rennell & Bellona (2 of the 13 study villages were in this province). We believe by training in this manner, we have ensured our data is as comparable as possible to data generated by the GTMP, and by impact and pre-surveillance surveys in the majority of trachoma-endemic areas. Data generated by graders trained in this manner are used to guide treatment decisions around the world. However, to support our research programme alongside the Solomon Islands National Trachoma Control programme we have collected photographs to enable us to retrospectively check grading. In a sample of photographs from a previous survey, the kappa score between field grade of TF and photograde of TF was 0.88. In this survey the photographs were graded with the FPC system rather than the simplified system. The kappa agreement between a photograph grade of F2/F3 and field grade of TF was 0.52, indicating moderate agreement. 
This is lower than in previous studies in the Solomon Islands, potentially because of more borderline cases in a post-treatment community or because F2/F3 is not exactly comparable to TF. In version 2 we have expanded the methods section to give a fuller account of the training process, added the F2/F3 vs TF kappa score to the results, and added a section in the discussion regarding this process.
Two graders were involved in this study. Both had met the predefined criteria for successful grading and were therefore accredited to grade trachoma to the quality required to guide treatment decisions. The kappa agreement between field and photograph grades, along with caveats about expectations for that agreement, has been included in the manuscript in line with the previous comment. The most common type of disagreement was field+ / photograph-, therefore there is a possibility that the grader had overcalled some milder cases of follicular inflammation as TF. For the reasons described above, we expect a certain number of discrepancies. Also, when a discrepancy occurs, we believe the field grade to be more likely to be accurate, as the field grader can achieve a three-dimensional real-time view of the eyelid. It is worth noting that the key finding of this paper, that the majority of TF cases do not have antibodies to Pgp3, holds true if we define the clinical phenotype using the photograph grades (70% of children whose photographs were graded as F2/3 were Pgp3 seronegative, chi-squared test of association p = 0.49).
It wasn't possible to recheck grading in the field; however, we have presented the comparisons between field and photograph grade above and in the paper.
Because specific programmatic interventions are guided by TF, TI and TT grades but not TS grades, scale-up of mapping activities has focussed on those signs. Grading for TF, TI and TT has therefore been repeatedly replicated in thousands of districts around the world, whereas TS grading is relatively uncommon. So, while graders may be able to be trained to assess TS reliably, it is training for TF, TI and TT that is 'standardised'. We have amended the terminology in the methods to reflect this, and added a reference where this is also discussed.
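For readers unfamiliar with the test of association quoted above, the following is a minimal sketch of a Pearson chi-squared test on a 2x2 table. It assumes no continuity correction (the response does not say whether one was applied), and the function name and counts are hypothetical; they are not the study data.

```python
import math


def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared test of association for a 2x2 table.

    Hypothetical layout:  seropositive  seronegative
        sign present            a             b
        sign absent             c             d
    Assumption: no continuity correction is applied.
    Returns (chi-squared statistic, p-value) with 1 degree of freedom.
    """
    n = a + b + c + d
    row_1, row_2 = a + b, c + d
    col_1, col_2 = a + c, b + d
    if 0 in (row_1, row_2, col_1, col_2):
        raise ValueError("every row and column total must be non-zero")
    statistic = n * (a * d - b * c) ** 2 / (row_1 * row_2 * col_1 * col_2)
    # For 1 degree of freedom the upper tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical counts, for illustration only.
statistic, p_value = chi_squared_2x2(20, 10, 10, 20)
```

A large p-value from such a test, as reported in the response, would mean that the data give no evidence of an association between the clinical sign and seropositivity.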