Disagreement between Human Papillomavirus Assays: An Unexpected Challenge for the Choice of an Assay in Primary Cervical Screening

We aimed to determine the disagreement in primary cervical screening between four human papillomavirus assays: Hybrid Capture 2, cobas, CLART, and APTIMA. Material from 5,064 SurePath samples of women participating in routine cervical screening in Copenhagen, Denmark, was tested with the four assays. Positive agreement between the assays was measured as the conditional probability that the results of all compared assays were positive given that at least one assay returned a positive result. Of all 5,064 samples, 1,679 (33.2%) tested positive on at least one of the assays. Among these, 41% tested positive on all four. Agreement was lower in women aged ≥30 years (30%, vs. 49% at <30 years), in primary screening samples (29%, vs. 38% in follow-up samples), and in women with concurrent normal cytology (22%, vs. 68% with abnormal cytology). Among primary screening samples from women aged 30–65 years (n = 2,881), 23% tested positive on at least one assay, and 42 to 58% of these showed positive agreement on any compared pair of the assays. While 4% of primary screening samples showed abnormal cytology, 6 to 10% were discordant on any pair of assays. A literature review corroborated our findings of considerable disagreement between human papillomavirus assays. This suggested that the extent of disagreement in primary screening is neither population- nor storage media-specific, leaving assay design differences as the most probable cause. The substantially different selection of women testing positive on the various human papillomavirus assays represents an unexpected challenge for the choice of an assay in primary cervical screening, and for the follow-up of women who test human papillomavirus positive with normal cytology in particular.


Introduction
Screening for human papillomavirus has better sensitivity for high-grade cervical intraepithelial neoplasia and provides protection against cervical cancer for a longer time than cytology screening [1][2][3]. This was demonstrated in studies using predominantly the Digene Hybrid Capture 2® HPV Test (Qiagen, Gaithersburg, MD), and GP5+/6+ polymerase chain reaction assays. However, several human papillomavirus assays have since become commercially available, and well-documented comparative studies are needed for laboratories to select the most appropriate assay for primary screening.
The proposed management strategies for women testing positive for human papillomavirus have so far been based on evidence from a small number of assays. Until now, the newly commercially available assays have most often been compared against Hybrid Capture 2 in women with recent cytological abnormalities [4][5][6][7][8][9][10][11][12][13][14][15][16][17][18]. In routine screening, however, women with cytological abnormalities constitute a selected population, whereas a majority of positive human papillomavirus samples are from women without cytological abnormalities. Hence, studies of women with cytological abnormalities cannot capture the diversity of outcomes of human papillomavirus testing in primary screening. Furthermore, the few primary cervical screening studies comparing various assays used relatively crude outcome measures, e.g. kappa coefficients, and suggested good overall agreement [15][19][20][21][22][23][24]. However, to determine whether the management strategies for women with positive tests are applicable to other assays, more detailed analyses of outcomes from the various assays are needed. A first step is simply to know whether the same women test positive on different human papillomavirus assays.
The Horizon study was a population-based split-sample study comparing Hybrid Capture 2, cobas® HPV Test (Roche Diagnostics, Pleasanton, CA), CLART® HPV2 Assay (Genomica, Madrid, Spain), and APTIMA® HPV Test (Hologic/Gen-Probe, San Diego, CA; Table 1). It was undertaken on routine samples from a well screened population. The aim of this analysis was to determine the frequency of disagreement between the four human papillomavirus assays, particularly in primary screening.

Material and Methods

Setting
The Department of Pathology at Hvidovre University Hospital in Copenhagen, accredited by the Joint Commission International, handles all cervical cytology from central Copenhagen. Copenhagen has been covered by an organized cervical screening program since the 1960s. Currently, women aged 23 to 49 years are invited for screening every three years, and women aged 50 to 65 years are invited every five years; in recent years, 76% of women had cytology in the recommended interval [25].

Sample Collection
Horizon was a quality development study nested into routine laboratory practice, and utilized only residual material that would have otherwise been discarded. According to Danish regulations of biomedical research, quality development studies do not require ethical approval.
Upon arrival at the laboratory, consecutive samples were collected in racks of 48. They were collected from 10 June to 25 August 2011, equally from Monday to Friday. Approximately 2 ml of residual material were collected after completion of routine SurePath liquid based cytology and Hybrid Capture 2 triage of women aged ≥30 years with atypical squamous cells of undetermined significance. Samples were collected from the first four racks or fewer processed on the collection days. This method mimicked a collection of unselected consecutive samples, assuming that the time of sample arrival in the laboratory was not associated with its characteristics. Samples were diluted with 2 ml of SurePath to obtain enough volume for all four assays. Based on capacity and processing considerations, the target number of samples was set to 5,000.
From the 12,138 routine samples processed during the collection period, 6,258 (52%) were selected for Horizon. For 1,194 (19%) samples, complete human papillomavirus testing could not be undertaken: 1,165 samples were tested only with Hybrid Capture 2 owing to lack of residual material for the other three assays, whereas 29 samples could not be systematically tested on all four assays owing to human error. Consequently, 5,064 (81%) samples with cytology and complete results on all four human papillomavirus assays were included in the analysis (Table 2). A single sample was available from 5,005 (99%) women, whereas 59 samples (1%) were from the remaining 29 women.

Cytology
Routine cytological evaluation of SurePath samples was undertaken first by FocalPoint Slide Profiler (BD, Burlington, NC). Blinded to the outcomes of human papillomavirus testing in Horizon, samples were thereafter evaluated by cytoscreeners using FocalPoint GS Imaging System (BD), and abnormal findings were adjudicated by pathologists. Cytology was reported using the Bethesda 2001 system.

Hybrid Capture 2 Human Papillomavirus DNA Testing
On the post-quot material from the cytology procedure, DNA was either denatured prior to testing by pre-treating manually according to the manufacturer's CE-IVD protocol, or DNA was isolated and purified using the DSP AXpH DNA kit on QIASymphony SP (Qiagen, Hilden, Germany). As part of the cytology processing, post-quot material was diluted approximately 1:1 in SurePath. Testing was undertaken on automated Rapid Capture® System (Qiagen, Gaithersburg, MD, USA). A minority of samples used for routine Hybrid Capture 2 triage of women aged ≥30 years with atypical squamous cells of undetermined significance were denatured and tested manually.

Cobas Human Papillomavirus DNA Testing
1 ml of the diluted material was aliquoted into a 13 ml round bottom test tube (Sarstedt, cat. no NC9018280), stored at 2 to 8°C until testing. No pre-treatment of SurePath samples was required. Extraction of DNA was undertaken on cobas x480, and amplification and detection of high risk human papillomavirus DNA on cobas z480 analyzer. Fluorescent TaqMan® probes were used for detection of the amplicons during polymerase chain reaction cycles. Amplification and detection of the 330 bp β-globin was used as an internal control of the testing processes.

CLART Human Papillomavirus DNA Testing
1 ml of the diluted SurePath sample was spun down (five minutes, 14,000 revolutions per minute), with supernatant removed and cell pellet re-suspended in a mix of 180 µl phosphate buffered saline (10x conc. pH 7.4, Pharmacy product) and 20 µl Proteinase K (recombinant, PCR Grade, Roche Diagnostics, Rotkreuz, Switzerland). Samples were then vortexed and incubated for one hour at 56°C and one hour at 90°C. Human papillomavirus DNA was purified using MagNA Pure LC 96 and MagNA Pure LC 32 instruments (Roche Diagnostics) with MagNA Pure LC Total Nucleic Acid Isolation Kit (Roche Diagnostics). Polymerase chain reaction amplification was performed using CLART® HPV2 Amplification kit (Genomica). 5 µl of purified DNA were used for the polymerase chain reaction amplification. Prior to visualisation, the polymerase chain reaction products were denatured at 95°C for 10 minutes. Visualisation was performed using 10 µl of the denatured polymerase chain reaction products on the CLART microarray. Hybridisation between the amplicons and their specific probes on the microarray resulted in formation of an insoluble precipitate of peroxidase when adding a Streptavidin conjugate that binds to the biotin labelled polymerase chain reaction products. The precipitate was analyzed automatically on the Clinical Array Reader (Genomica).

APTIMA Human Papillomavirus mRNA Testing
1 ml of the diluted sample was aliquoted into an APTIMA Specimen Transfer Tube containing 2.9 ml of buffered solution (Hologic/Gen-Probe). Samples were treated with proteinase K prior to testing, using the Pace 2 Fast Expression Kit containing 1 ml diluent and lyophilized reagent (all from Hologic/Gen-Probe). 100 µl of the reconstituted proteinase K was added to each Specimen Transfer Tube and incubated at 65°C for two hours. The treated specimen tube was stored at 2 to 8°C until testing. Testing was performed on the PANTHER platform.

Processing of Samples and Assay Instrumentation
The study protocol, sample storage, and assay testing protocols were agreed upon with all manufacturers prior to the study. All instrumentation and software were used as supplied and maintained by the manufacturers.

Screening History
As described above, all women were previously screened with liquid-based cytology, and those with atypical squamous cells of undetermined significance at age 30 years or above were triaged using the Hybrid Capture 2 assay. The screening history of women from 1 January 2000 onwards was retrieved from the Danish Pathology Data Bank. Following Danish recommendations for follow-up of cervical abnormalities, Horizon samples were considered follow-up samples if the woman had an earlier diagnosis of cervical cancer, a diagnosis of cervical intraepithelial neoplasia up to three years earlier, atypical squamous cells of undetermined significance in the previous 15 months, or more severe cytological abnormalities or a positive human papillomavirus test in the past 12 months. Samples with no recent abnormality were considered primary samples; reflecting routine practice, these included screening samples and a small proportion of samples taken by indication.
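The follow-up rule above can be sketched as a small classifier. This is only an illustration: the event labels, the `HistoryEvent` structure, and the decision to treat each look-back boundary as inclusive are assumptions, not details from the study or from the Danish Pathology Data Bank.

```python
import calendar
from dataclasses import dataclass
from datetime import date
from typing import Iterable

# Look-back windows in months, taken from the text; None means "at any earlier time".
# The event labels are hypothetical, not codes used by the Danish Pathology Data Bank.
LOOKBACK_MONTHS = {
    "cervical_cancer": None,
    "cin": 36,
    "ascus": 15,
    "worse_than_ascus": 12,
    "hpv_positive": 12,
}


@dataclass
class HistoryEvent:
    when: date
    kind: str  # one of the keys of LOOKBACK_MONTHS


def minus_months(d: date, months: int) -> date:
    """Return the date `months` calendar months before `d`, clamping the day if needed."""
    total = d.year * 12 + (d.month - 1) - months
    year, month_index = divmod(total, 12)
    last_day = calendar.monthrange(year, month_index + 1)[1]
    return date(year, month_index + 1, min(d.day, last_day))


def classify_sample(sample_date: date, history: Iterable[HistoryEvent]) -> str:
    """Return "follow-up" if any earlier event lies inside its window, else "primary"."""
    for event in history:
        if event.kind not in LOOKBACK_MONTHS or event.when >= sample_date:
            continue  # unknown label, or not an earlier event
        window = LOOKBACK_MONTHS[event.kind]
        # Assumption: the boundary of each look-back window is inclusive.
        if window is None or event.when >= minus_months(sample_date, window):
            return "follow-up"
    return "primary"


sample_day = date(2011, 7, 1)
# ASCUS 14 months before the sample: inside the 15-month window.
print(classify_sample(sample_day, [HistoryEvent(date(2010, 5, 1), "ascus")]))  # follow-up
# ASCUS 16 months before the sample: outside the window.
print(classify_sample(sample_day, [HistoryEvent(date(2010, 3, 1), "ascus")]))  # primary
```

Keeping the windows in a single table makes it easy to see, and to change, how the rule depends on each type of earlier finding.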

Statistical Analysis
A positive human papillomavirus test was defined according to the manufacturers' recommendations (Hybrid Capture 2: relative light unit per cut off value ≥1; cobas channels 16, 18, and other high risk genotypes: critical threshold values ≤40.5, ≤40.0, and ≤40.0, respectively; APTIMA: signal to cut off value ≥0.5). CLART was considered positive if at least one of the 13 human papillomavirus genotypes classified as high risk by the International Agency for Research on Cancer, including genotype 68, was detected [26]. Kappa coefficients were calculated as a standard measure of agreement for each pair of assays; their 95% confidence intervals were calculated by analysing 1,000 bootstrap replications (IBM® SPSS® Statistics, Version 20). The frequencies of positive concordant (positive on assay A/positive on assay B), and of discordant (positive/negative, negative/positive) samples were calculated separately. The sum of the proportions of discordant samples equalled [100% − proportion of overall agreement]. Positive agreement was calculated as the conditional probability that all compared assays were positive (concordant positive samples) given that at least one assay returned a positive result (concordant positive + any discordant samples), and was reported as a proportion. Its 95% confidence interval was calculated assuming binomial distribution of the studied events.
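As a minimal sketch of the agreement measures described above, the following Python code computes overall agreement, positive agreement, Cohen's kappa, and a percentile bootstrap interval for kappa from a 2×2 table of two assays. The study used SPSS; this code is not the authors' implementation. The normal-approximation binomial interval is an assumption (the text states only that a binomial distribution was assumed), and the example counts are hypothetical.

```python
import random


def overall_agreement(both_pos, a_only, b_only, both_neg):
    """Proportion of samples with the same result on both assays."""
    total = both_pos + a_only + b_only + both_neg
    return (both_pos + both_neg) / total


def positive_agreement(both_pos, a_only, b_only):
    """P(both assays positive | at least one assay positive)."""
    return both_pos / (both_pos + a_only + b_only)


def binomial_ci(successes, trials, z=1.96):
    """95% CI for a proportion; the normal approximation is an assumption."""
    p = successes / trials
    half_width = z * (p * (1 - p) / trials) ** 0.5
    return max(0.0, p - half_width), min(1.0, p + half_width)


def cohen_kappa(both_pos, a_only, b_only, both_neg):
    """Cohen's kappa for two assays with binary outcomes."""
    total = both_pos + a_only + b_only + both_neg
    observed = (both_pos + both_neg) / total
    a_pos = (both_pos + a_only) / total
    b_pos = (both_pos + b_only) / total
    expected = a_pos * b_pos + (1 - a_pos) * (1 - b_pos)
    return (observed - expected) / (1 - expected)


def bootstrap_kappa_ci(both_pos, a_only, b_only, both_neg, replications=1000, seed=0):
    """Percentile 95% CI for kappa from resampled 2x2 tables."""
    rng = random.Random(seed)
    weights = [both_pos, a_only, b_only, both_neg]
    total = sum(weights)
    kappas = []
    for _ in range(replications):
        draw = rng.choices(range(4), weights=weights, k=total)
        counts = [draw.count(cell) for cell in range(4)]
        kappas.append(cohen_kappa(*counts))
    kappas.sort()
    return kappas[int(0.025 * replications)], kappas[int(0.975 * replications) - 1]


# Hypothetical table: 50 concordant positive, 25 + 25 discordant, 900 concordant negative.
table = (50, 25, 25, 900)
print(round(overall_agreement(*table), 2))         # 0.95
print(round(cohen_kappa(*table), 2))               # 0.64
print(round(positive_agreement(50, 25, 25), 2))    # 0.5
print(binomial_ci(50, 100))
print(bootstrap_kappa_ci(*table))
```

In this hypothetical table the large number of concordant negative samples keeps overall agreement and kappa high, although only half of the samples positive on at least one assay are positive on both.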

Results
Among the 5,064 samples included in the analysis, 4,790 (94.6%) were from women targeted by the Danish cervical screening program, aged 23 to 65 years (Table 2). Cytology was abnormal in 371 (7.3%) of the 5,064 samples, cobas was positive in 1,356 (26.8%), CLART in 1,273 (25.1%), Hybrid Capture 2 in 1,035 (20.4%), and APTIMA in 846 (16.7%) samples. These proportions were higher for follow-up than for primary samples for all four assays.
Overall, 1,679 (33.2%) out of 5,064 samples were positive on at least one of the four human papillomavirus assays (Table 3). Of these 1,679 samples, 681 (41%) were positive on all four, 260 (15%) on three, 268 (16%) on two, and 470 (28%) on a single human papillomavirus assay. Positive agreement between the assays was lower for women aged 30 to 65 compared to women aged 23 to 29 years. Among women aged 30 to 65 years, positive agreement was higher for follow-up than for primary samples; disagreement among primary samples in this age group is presented in more detail in Figure 1. Among the latter samples, positive agreement was substantially higher in women with atypical squamous cells of undetermined significance or worse compared to women with normal cytology. These patterns remained when the comparison was limited to the three DNA assays.
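The proportions in the paragraph above follow directly from the reported counts. The short check below uses only numbers stated in the text; the variable names are illustrative. The share of samples positive on all four assays is, by the definition used here, the positive agreement across all four.

```python
total_samples = 5064
# Number of samples by how many of the four assays returned a positive result.
samples_by_positive_assays = {4: 681, 3: 260, 2: 268, 1: 470}

positive_on_any = sum(samples_by_positive_assays.values())
print(positive_on_any)                             # 1679
print(f"{positive_on_any / total_samples:.1%}")    # 33.2%

for n_assays, count in samples_by_positive_assays.items():
    share = count / positive_on_any
    print(f"positive on {n_assays} assay(s): {share:.0%}")
# positive on 4 assay(s): 41%
# positive on 3 assay(s): 15%
# positive on 2 assay(s): 16%
# positive on 1 assay(s): 28%
```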
Virtually all kappa coefficients for pairwise agreement were ≥0.60, suggesting good overall agreement between the four assays (Table 4). Yet, only 52% of primary samples from women aged 30 to 65 years testing positive on either Hybrid Capture 2 or cobas were positive on both. When comparing Hybrid Capture 2 with CLART and APTIMA, these figures were 50% and 58%, respectively. In total, 8.7% of these primary samples were discordant on Hybrid Capture 2 and cobas, 9.2% on Hybrid Capture 2 and CLART, and 5.7% on Hybrid Capture 2 and APTIMA. Discordant samples between cobas and CLART constituted 8.5% of primary samples from women aged 30 to 65 years, 9.7% between cobas and APTIMA, and 10.3% between CLART and APTIMA. Cytology was abnormal in 4.4% of the same primary samples (Table 5).

Principal Findings
Horizon is among the largest studies to compare several human papillomavirus assays in primary screening. Although we found kappa coefficients suggesting a good level of agreement among pairs of the four commercially available human papillomavirus assays, our analysis of positive samples demonstrated substantial disagreement between the assays, particularly in primary screening samples from women aged 30 to 65 years. For all pairwise assay comparisons, there were roughly as many discordant as concordant positive samples. While 4% of these samples showed abnormal cytology, 6 to 10% were positive on one but negative on the other human papillomavirus assay.
Our analysis indicates that to fully elucidate the extent of disagreement between human papillomavirus assays, it is necessary to compare them on positive samples. The reason is that, even in a screening population with a high background risk of cervical cancer, a majority of samples test negative, and consequently discordant samples may have little impact on traditional measures such as the kappa coefficient. Similar limitations may apply to the relative sensitivity, relative specificity, and non-inferiority [19] of one assay against another. Our approach relies on the same principle as the calculation of the proportion of overall agreement, which is another commonly reported measure. Unfortunately, it has been rarely used on primary screening data [21,22,27], and in those studies attention has not been drawn to the implications of this type of analysis for the management of women with positive human papillomavirus tests. While APTIMA detects E6/E7 mRNA from human papillomavirus infections, the other three assays detect viral DNA. Some disagreement in comparisons between the three DNA assays and APTIMA is therefore not surprising, yet the DNA assays showed more inter-assay disagreement than expected. Possible explanations for this finding include, firstly, that cobas and CLART, but also APTIMA, were run on a fixed volume input from the residual sample material. In contrast, Hybrid Capture 2 was run after resuspension of the pelleted processed cytology material. Theoretically, the CE-IVD post-cytology processing protocol for Hybrid Capture 2 might have removed some free viral particles prior to human papillomavirus analysis and, consequently, the assay may have returned a lower proportion of samples with a positive human papillomavirus outcome. Whether the clinical performance of the assays was affected will be determined when histological outcomes become available. Secondly, the designs of the assays differ. 
While Hybrid Capture 2 relies on signal amplification from RNA probes to the entire human papillomavirus genome, cobas and CLART are DNA polymerase chain reaction amplification assays targeting L1 sequences of human papillomavirus genotypes. Thirdly, CLART (by our definition) and Hybrid Capture 2 were designed to detect 13 genotypes, and cobas to detect the same plus genotype 66. Samples positive only for genotype 66, though, explain few discordant samples: 11/190 (6%) of cobas positive/Hybrid Capture 2 negative and 20/127 (16%) of cobas positive/CLART negative primary samples from women aged 30 to 65 years showed infections with genotype 66 and none of the 13 high risk genotypes. Fourthly, assay cross reactivity to low risk genotypes, which could increase the positivity rate, might vary between the DNA assays. These cross reactivity profiles and their significance for discordant samples will be evaluated in a separate report. Fifthly, assay specific calibration of primers/probes for individual human papillomavirus genotypes might result in different analytical sensitivities for detection of infections. Consistent with this, the higher average human papillomavirus viral loads in younger women [28] and in women with dysplasia [29] might increase the positive agreement between assays, as indeed suggested by the patterns observed in our data (Table 3). A possible explanation is that human papillomavirus positive samples in young women or in women with cytological abnormalities have on average higher viral loads than the minimum detectable amount needed to return a positive result even by the comparably least analytically sensitive assays. By the same token, samples with a relatively low viral load might be those that are more susceptible to the set cut-off on any assay before these assays return a positive result.
Therefore, the relatively high frequency of agreement between assays in young women and in women with abnormalities observed in our study could be a consequence of the fact that in these women samples with infections but a relatively low viral load are limited in number. In unselected screening samples, on the other hand, viral loads will be much more heterogeneous, representing everything from recent transient infections to high level persistent infections, allowing samples with lower viral loads to play a more prominent role in determining the frequency of disagreement between the assays. Within the WHO proficiency testing panel, CLART was evaluated against known genome equivalents of genotype specific L1 plasmids. It detected the 13 high risk human papillomavirus genotype plasmids in copy number 50 to 500 genome equivalents per genotype [30]. For assays like Hybrid Capture 2 and cobas, a similar analysis would not be possible as they return a combined outcome for the targeted genotypes (cobas though also separately for genotypes 16 and 18). To increase the transparency of assay calibration for the targeted human papillomavirus genotypes, international standards of calibration could be called for.
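The viral load argument above can be illustrated with a toy model in which two assays differ only in their detection limit. Everything numeric in this sketch is hypothetical: the thresholds and viral loads are invented to show the mechanism and are not measurements from Horizon or from any of the four assays.

```python
def positive_agreement(loads, thresholds):
    """Share of samples positive on every assay among those positive on at least one."""
    results = [[load >= limit for limit in thresholds] for load in loads]
    positive_on_any = [r for r in results if any(r)]
    positive_on_all = [r for r in positive_on_any if all(r)]
    return len(positive_on_all) / len(positive_on_any)


# Hypothetical detection limits of two assays, in log10 copies per sample.
thresholds = [2.0, 3.0]

# Hypothetical viral loads (log10 copies), evenly spaced in steps of 0.1.
high_loads = [i / 10 for i in range(40, 71)]    # 4.0 to 7.0: every sample well above both limits
mixed_loads = [i / 10 for i in range(0, 71)]    # 0.0 to 7.0: includes low-level infections

print(positive_agreement(high_loads, thresholds))             # 1.0
print(round(positive_agreement(mixed_loads, thresholds), 2))  # 0.8
```

In this model all disagreement comes from samples whose load falls between the two detection limits, so agreement drops as soon as such samples are part of the tested population.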

Strengths and Weaknesses of the Study
All samples were evaluated in the same laboratory by the same staff, trained and certified by the assays' manufacturers, using testing protocols agreed upon prior to the study, and instrumentation and software as supplied by the manufacturers. Unlike in previous studies, samples were collected and stored in SurePath, and experts have called for an evaluation of new human papillomavirus assays using media other than PreservCyt [31].
Previously, 11,617 (primary) SurePath samples from the same area evaluated in the same laboratory were tested with Hybrid Capture 2 [32]. The median age of the women in that study was 36.4 years, and 6% had atypical squamous cells of undetermined significance or worse. The proportion of women aged 25 to 64 years testing positive on Hybrid Capture 2 was approximately 17%, similar to the 16% in Horizon in the same subset of samples. Horizon results are therefore in good agreement with the earlier results from the same population.
Although 19% of the collected samples had to be discarded, a selection bias is unlikely as excluded samples were similar to included samples. There were no significant differences between the 5,064 included and the 1,194 excluded samples in terms of women's age, cytology, and Hybrid Capture 2 outcomes, but follow-up samples were slightly more prevalent among the included than the excluded samples (Table 2).
Following the Danish routine recommendations, women with abnormal cytology were referred for colposcopy or for repeated testing. In addition, we are currently inviting women with positive human papillomavirus tests and normal cytology for repeated testing (about one year after the baseline test). False positive rates, clinical sensitivity, and clinical specificity of the assays will be reported upon completion of this histological follow-up. The lack of histological verification might appear as a weakness of the current report. However, this is not the case as screening programs have to implement follow-up procedures for all women with positive tests.

Comparison with Previous Studies and Implications for Clinical Practice
Compared to studies from other geographical areas, the Copenhagen population has a relatively high prevalence of human papillomavirus infections. This cannot, however, explain the reported inter-assay disagreement. This assertion is supported by the observation that samples with high prevalence rates of human papillomavirus infection, e.g. those with high grade abnormal cytology, showed better positive agreement than samples with lower prevalence rates. Furthermore, Horizon outcomes were similar to those from prior studies that reported concordant and discordant outcomes in populations with varying prevalence rates of human papillomavirus infection (Table 5). All prior studies used samples stored in media other than SurePath. For each study, we calculated the proportions of concordant and discordant samples, and the proportion of positive agreement (see Material and Methods).
Previous studies fell into two categories: a majority of studies based on samples from women followed up after prior cervical abnormalities, and a minority of studies based on primary screening samples. In all prior studies of women with abnormalities, about four out of five positive human papillomavirus samples were concordant between cobas, CLART or APTIMA, and Hybrid Capture 2, i.e. positive agreement was around 80%. Thus, as in the Horizon data for women with abnormal cytology, these prior studies showed relatively good agreement between the four human papillomavirus assays. In this respect, it seems to make little difference which human papillomavirus assay is chosen for use in triage. However, even in these women the assays did not always detect the same cervical intraepithelial neoplasia lesions. In one study, 101 (37%) of 273 cervical intraepithelial neoplasia grade two or worse lesions were missed by at least one of the seven compared assays, including Hybrid Capture 2, CLART, and APTIMA [11]. In a study including Hybrid Capture 2, cobas, APTIMA, and four other assays, 120 (33%) of 359 cervical intraepithelial neoplasia grade two or worse lesions were missed by at least one assay [33].
In studies of women attending primary screening, about one in two to two in three human papillomavirus positive samples were concordant between Hybrid Capture 2, and cobas, CLART or APTIMA (positive agreement range: 48 to 65%; Table 5). In these studies, the proportion of women with abnormal cytology varied from 1% to 6%, and the proportion of women testing positive on one human papillomavirus assay but negative on the other varied between 3% and 9%. The proportion of women with inter-assay disagreement was thus larger than the entire proportion of women with abnormal cytology. Both Horizon and prior studies thus clearly showed that for evaluation of human papillomavirus assay (dis)agreement, women with cervical abnormalities are not representative of those attending primary screening. The similarity of the results between the Horizon and prior studies suggests that the disagreement between the assays is neither population- nor storage media-specific, but might instead be related to variability in assay designs, and how the assays are calibrated to detect the targeted genotypes. This analysis shows that although different human papillomavirus assays may return a roughly similar proportion of positive samples, on which basis it could be assumed that they also end up having a similar specificity for detecting lesions, they do not identify the same women as positive. In two ways these findings challenge the perception that human papillomavirus assays can substitute for each other in primary screening. Firstly, cytology is a commonly recommended triage method in primary screening based on human papillomavirus testing. In our study, 127 out of the 2,881 primary screening samples from women aged 30 to 65 years had abnormal cytology, defined as atypical squamous cells of undetermined significance or worse.
Of these, 93 were positive on at least one human papillomavirus assay, but only 63 were positive on all four. With a positive human papillomavirus test combined with abnormal cytology as a basis for referral to colposcopy, 68% (= 63/93) of the women would be referred independently of which of the four assays had been used for the primary screening. For the other women, 32% in our study, the type of follow-up and the possibility of referral would depend on the choice of the assay. Secondly, by far the majority of the human papillomavirus-positive women are cytology normal. In this group, the disagreement between the four assays is even larger than in human papillomavirus-positive/cytology abnormal women. So, how should these women be managed? On the one hand, long-term follow-up studies have shown that human papillomavirus-positive/cytology normal women have a higher risk of high grade cervical intraepithelial lesions than human papillomavirus-negative women [1]. Hence, these women cannot be sent back to routine screening with the extended screening intervals proposed for human papillomavirus-negative women. On the other hand, a retesting interval of 6 or 12 months must be weighed against the burden of such an interim period for this large group of women, who, in absolute terms, still have a relatively low risk of high-grade cervical intraepithelial lesions, and who might well have tested negative had they been screened with another human papillomavirus assay.
The classic dilemma of how to manage human papillomavirus-positive/cytology normal women therefore becomes even more pertinent with the large degree of disagreement between different human papillomavirus assays when used in primary screening.
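The referral arithmetic in the triage example above can be written out using only the counts reported in the text (127 primary samples with abnormal cytology, 93 of them positive on at least one assay, 63 positive on all four):

```latex
\[
\frac{93}{127} \approx 73\%, \qquad
\frac{63}{93} \approx 68\%, \qquad
\frac{93 - 63}{93} = \frac{30}{93} \approx 32\%.
\]
```

The first ratio is the share of cytology abnormal samples that any of the four assays would flag, the second is the share of those flagged regardless of the assay chosen, and the third is the share whose referral would depend on the assay.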
In conclusion, considerable concordance between the four human papillomavirus assays was observed for triage indications in women with abnormal cytology. However, in primary screening of women above age 30 substantial differences in detecting human papillomavirus infections were observed for the same assays. Knowing that the use of primary human papillomavirus testing could provide a number of benefits for cervical screening, this finding is nonetheless an unexpected challenge that will need to be addressed.