Identification of Women for Referral to Colposcopy by Neural Networks: A Preliminary Study Based on LBC and Molecular Biomarkers

The objective of this study is to investigate the potential of the learning vector quantizer neural network (LVQ-NN) classifier on various diagnostic variables used in the modern cytopathology laboratory and to build an algorithm that may facilitate the classification of individual cases. From every woman included in the study, a liquid-based cytology sample was obtained and tested with an HPV DNA test, an E6/E7 HPV mRNA test, and p16 immunostaining. The data were classified by the LVQ-NN into two groups: CIN-2 or worse and CIN-1 or less. Half of the cases were used to train the LVQ-NN; the remaining cases (test set) were used for validation. Out of the 1258 cases, cytology correctly identified 72.90% of the CIN-2 or worse cases and 97.37% of the CIN-1 or less cases, with an overall accuracy of 94.36%. The application of the LVQ-NN on the test set allowed correct classification of 84.62% of the cases with CIN-2 or worse and 97.64% of the cases with CIN-1 or less, with an overall accuracy of 96.03%. The use of the LVQ-NN with cytology and the proposed biomarkers significantly improves the correct classification of cervical precancerous lesions and/or cancer and may facilitate diagnosis and patient management.


Introduction
Approximately 7-8% of the total population screened in the UK will have an abnormal smear [1,2]; of those, approximately 1.5-2% will present with high-grade and 5% with low-grade cytology. Only a small proportion of women with low-grade cytology have underlying high-grade histology. Even the cytological diagnosis of HSIL does not necessarily reflect the histological presence of CIN2+ lesions. The HPV DNA test [3,4] has proven its value in detecting women with ASCUS cytology who may have an underlying high-grade histology; however, its high positivity rates in women with low-grade abnormalities limit its use as a triage tool in that population. Accurate triage methods and tests for women with both low- and high-grade cytology are lacking. New emerging technologies and biomarkers such as HPV DNA genotyping, E6/E7 mRNA testing, and p16 immunostaining are continuously being investigated [5][6][7][8][9][10][11][12].
Various classification techniques such as neural networks [13][14][15][16][17][18], discriminant analysis [16,19,20], decision trees [21,22], or genetic algorithms [23] have been used in medicine and, particularly, in diagnostic cytology. The implementation of new diagnostic tools and molecular techniques that are increasingly used in the diagnostic cytology laboratory [24] may improve the accuracy of the final diagnosis in comparison to that of cytology alone. The application of neural networks on modern diagnostics might be helpful in that respect.
This study aims to investigate the potential role of learning vector quantizer neural networks (LVQ NN) on various diagnostic variables used in the modern cytopathology laboratory and build an algorithm that may facilitate the classification of individual cases.

Inclusion Criteria and Interventions.
This was a multicentric diagnostic study conducted at the Attikon University Hospital and the University Hospital of Ioannina from 2007 to 2010. The population included a consecutive sample of women with cytology taken as part of screening or during colposcopy. A liquid-based cytology (LBC) sample was obtained and routinely prepared for cytological assessment, and the remaining material was used for testing of specific biomarkers. The tests were the following: cytology using the revised Bethesda classification system (TBS2001 system) [25,26]; HPV arrays using the CLART HUMAN PAPILLOMAVIRUS 2 (GENOMICA), which allows simultaneous detection of 35 different HPV genotypes by PCR amplification of a fragment within the highly conserved L1 region of the virus [27]; NASBA assays [28] (NucliSENS EasyQ HPV v1.0), which are used for the identification of E6/E7 mRNA of HPV types 16, 18, 31, 33, and 45; the PermiFlow (Invirion Diagnostics, LLC, Oak Brook, IL), which allows the identification of E6/E7 mRNA expression of high-risk HPV using flow cytometry [29]; and finally the immunocytochemical expression of p16 using the CINtec Cytology Kit [30]. All these tests assess the whole cytologic sample and produce results that can be used in a classification process.
The histological diagnosis was the gold standard. All women had a histological diagnosis from colposcopically directed biopsies or LLETZ, apart from those with negative cytology, colposcopy, and HPV DNA test (clinically negative cases). These were considered as having negative histology; random biopsies were not taken.

Data Description and Preprocessing.
The LVQ NN was designed to classify the cases into two groups: GROUP 1 (clinically negative and CIN-1 or less at histology; these are considered negative) or GROUP 2 (CIN-2 or worse at histology; these are considered positive).
Before applying the LVQ NN, the data were processed as shown in Table 1. Prior to feeding the data to the NN, all variables were scaled to the same range (0 to 10) so that all of them had the same significance when processed by the NN. Additionally, the dataset was randomly divided into two sets: the training set, used to train the NN, and the test set, used for the NN evaluation. Stratified random sampling was used to select approximately 50% of the cases from each diagnostic category (GROUP 1 and GROUP 2) to form the training set, in order to preserve the structure of the diagnostic groups in the divided sets. The LVQ NN was developed using the training set data; the trained model was subsequently assessed by feeding it the test set data and evaluating its performance.

LVQ Neural Networks Basics.
LVQ [31] is a supervised neural network classifier. The available data are divided into two sets, namely the training and test sets, and the category to which these data belong should be known in advance.
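The scaling of all variables to the 0 to 10 range and the stratified division into training and test sets can be illustrated with a minimal Python sketch. It is not the authors' code: min-max scaling is assumed as the scaling method (the text states only the target range), and the function names `scale_to_range` and `stratified_split` are hypothetical.

```python
import random

def scale_to_range(column, low=0.0, high=10.0):
    """Min-max scale a list of numbers to [low, high] (assumed scaling method)."""
    lo, hi = min(column), max(column)
    if hi == lo:  # constant variable: map every value to the lower bound
        return [low for _ in column]
    return [low + (v - lo) * (high - low) / (hi - lo) for v in column]

def stratified_split(labels, train_fraction=0.5, seed=0):
    """Return (train_idx, test_idx), sampling train_fraction within each class."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    train, test = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        cut = round(len(idx) * train_fraction)
        train.extend(idx[:cut])
        test.extend(idx[cut:])
    return sorted(train), sorted(test)

# Toy labels with the group sizes reported in the study (1103 vs 155 cases)
labels = [1] * 1103 + [2] * 155
train_idx, test_idx = stratified_split(labels)
```

Sampling within each group separately keeps the proportion of GROUP 2 cases (about 12%) nearly identical in both sets, which a simple random split would not guarantee.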
During the training phase, the data are used along with their allocated class, and the classifier learns from this specific data set. The LVQ NN creates partitions of the feature space. Each partition is characterised by a vector at its center, called the codebook vector; the class of this vector is also the class of the complete partition. Only the training set is used during the training phase, in which the codebook vectors are modified to represent the complete feature space. A first pass of all the training vectors through the classifier initializes the codebook vectors, and the training algorithm is then applied during each subsequent pass. Approximately 50-200 passes over all the training-set data are required to train the classifier.
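For reference, the codebook update of the basic LVQ1 variant can be written as follows. The notation is an assumption introduced here and does not appear in the text: $x(t)$ is the training vector presented at step $t$, $m_i(t)$ are the codebook vectors, $m_c(t)$ is the codebook vector nearest to $x(t)$, and $\alpha(t)$ is a small learning rate that decreases over time.

```latex
c = \arg\min_i \lVert x(t) - m_i(t) \rVert

m_c(t+1) =
\begin{cases}
m_c(t) + \alpha(t)\,[x(t) - m_c(t)] & \text{if } x(t) \text{ and } m_c(t) \text{ belong to the same class} \\
m_c(t) - \alpha(t)\,[x(t) - m_c(t)] & \text{otherwise}
\end{cases}

m_i(t+1) = m_i(t) \quad \text{for } i \neq c
```

The nearest codebook vector is pulled towards correctly labelled training vectors and pushed away from incorrectly labelled ones, which gradually shapes the partitions of the feature space.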
During the test phase, the trained NN is evaluated. Each unknown case, represented by a data vector, is presented to the network. The class of the case is determined by the class of the partition in which the data vector resides. The partition is identified in two steps: first, the codebook vector nearest to the data vector of the unknown case is found; then, the data vector is assigned to the partition of this nearest codebook vector.
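The training and test phases described above can be sketched in a few lines of plain Python. This is an illustrative LVQ1 implementation under stated assumptions, not the LVQ PAK software: codebook vectors are initialised from randomly chosen training vectors of each class, the learning rate decays linearly, and the names `train_lvq1`, `predict`, `per_class`, and `alpha0` are hypothetical.

```python
import random

def nearest(codebooks, x):
    """Index of the codebook vector closest to x (squared Euclidean distance)."""
    return min(range(len(codebooks)),
               key=lambda j: sum((c - v) ** 2 for c, v in zip(codebooks[j][0], x)))

def train_lvq1(X, y, per_class=2, passes=100, alpha0=0.1, seed=0):
    rng = random.Random(seed)
    # Initialise codebook vectors from random training vectors of each class
    codebooks = []
    for cls in sorted(set(y)):
        members = [x for x, lab in zip(X, y) if lab == cls]
        for x in rng.sample(members, per_class):
            codebooks.append((list(x), cls))
    order = list(range(len(X)))
    for p in range(passes):
        alpha = alpha0 * (1 - p / passes)  # linearly decaying learning rate
        rng.shuffle(order)
        for i in order:
            vec, cls = codebooks[nearest(codebooks, X[i])]
            # Attract the winner if its class matches the case, repel it otherwise
            sign = 1.0 if cls == y[i] else -1.0
            for d in range(len(vec)):
                vec[d] += sign * alpha * (X[i][d] - vec[d])
    return codebooks

def predict(codebooks, x):
    """Test phase: the case takes the class of its nearest codebook vector."""
    return codebooks[nearest(codebooks, x)][1]

# Toy data on the 0-10 scale: two well-separated groups (illustrative only)
X = [[1, 1], [2, 1], [1, 2], [2, 2], [8, 8], [9, 8], [8, 9], [9, 9]]
y = [1, 1, 1, 1, 2, 2, 2, 2]
model = train_lvq1(X, y)
```

With this toy data, `predict(model, [1.5, 1.5])` returns class 1 and `predict(model, [8.5, 8.5])` returns class 2.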
LVQs can be expressed as NNs composed of two layers: a first, competitive layer that takes the feature vectors as inputs, followed by a second, linear layer that produces the NN output. The competitive layer learns and subsequently classifies input vectors into subclasses as described previously [31]. The linear layer maps the subclasses produced by the competitive layer onto the target classes required by the specific problem. The competitive layer thus represents several subclasses formed in the feature space that may belong to the same target class, whereas the linear layer groups these subclasses into a single target class. A schematic diagram of the LVQ NN structure is presented in Figure 1. The competitive layer can be trained with various versions of the LVQ algorithm, namely LVQ1, LVQ2.1, LVQ3, and optimized LVQ1 ("OLVQ1") [32,33]. The optimal number of codebook vectors, the number of data passes, and the optimal LVQ training algorithm variant are determined according to the classifier performance on the training set, that is, they are chosen when the results on the training set are satisfactory. Readers interested in the details of the LVQ algorithm may consult the online resources [32,34].
In order to produce comparable results for cytology and the LVQ NN, cases with cytology of ASC-H or worse and histology of CIN-2 or worse were considered true positive, while cases with cytology of ASC-US or less and histology of CIN-1 or less were considered true negative.
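The two-layer view of the LVQ NN can be illustrated with a small forward pass. In this hedged sketch the competitive layer outputs a one-hot vector marking the winning subclass neuron, and the linear layer is a fixed 0/1 matrix that assigns each subclass to a target class. The weights `W1` and `W2` are invented for illustration and are not taken from the study.

```python
def competitive_layer(codebook_vectors, x):
    """One-hot output: 1 for the winning (nearest) subclass neuron, 0 elsewhere."""
    dists = [sum((c - v) ** 2 for c, v in zip(w, x)) for w in codebook_vectors]
    winner = dists.index(min(dists))
    return [1 if j == winner else 0 for j in range(len(codebook_vectors))]

def linear_layer(subclass_to_class, a1):
    """Multiply the one-hot vector by a 0/1 matrix mapping subclasses to classes."""
    return [sum(row[j] * a1[j] for j in range(len(a1))) for row in subclass_to_class]

# Hypothetical network: three subclass neurons, two target classes
W1 = [[1.0, 1.0], [3.0, 2.0], [8.0, 9.0]]  # codebook vectors (illustrative values)
W2 = [[1, 1, 0],                           # GROUP 1 collects subclasses 0 and 1
      [0, 0, 1]]                           # GROUP 2 collects subclass 2
out = linear_layer(W2, competitive_layer(W1, [2.5, 2.0]))
```

Here the input is nearest to the second subclass neuron, so `out` is `[1, 0]`, meaning the case is assigned to GROUP 1.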
Cochran's Q test [35] was applied to the results of five new classifiers, trained and tested from scratch on different dataset splits, in order to assess the robustness of the proposed methodology. We used LVQ PAK [32,33], open-source software developed by Kohonen et al. [31,32] that implements four versions of the LVQ algorithm (LVQ1, LVQ2.1, LVQ3, and optimized LVQ1, "OLVQ1").
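As an illustration of the robustness test, the Q statistic can be computed from a table of binary outcomes with one row per case and one column per classifier. The sketch below uses the standard formula for Cochran's Q. The function name `cochran_q` and the small outcome table are hypothetical and are not data from the study.

```python
def cochran_q(outcomes):
    """Cochran's Q for outcomes[i][j] = 1 if classifier j is correct on case i, else 0.

    Returns (Q, degrees_of_freedom); Q is compared with a chi-square
    distribution with k - 1 degrees of freedom, where k is the number of classifiers.
    """
    k = len(outcomes[0])
    col_totals = [sum(row[j] for row in outcomes) for j in range(k)]
    row_totals = [sum(row) for row in outcomes]
    n = sum(row_totals)
    denom = k * n - sum(r * r for r in row_totals)
    if denom == 0:
        raise ValueError("all classifiers agree on every case; Q is undefined")
    q = (k - 1) * (k * sum(c * c for c in col_totals) - n * n) / denom
    return q, k - 1

# Hypothetical outcomes: 4 cases (rows) evaluated by 3 classifiers (columns)
toy = [[1, 1, 0],
       [1, 0, 0],
       [1, 1, 1],
       [1, 0, 0]]
q, df = cochran_q(toy)
```

With five classifiers, as in this study, the statistic has 4 degrees of freedom.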

Results and Discussion
A total of 1258 samples were analyzed. The correlation of the cytological diagnosis with the histological result is shown in Table 2.
A total of 155 of those (12.3%) had CIN-2 or worse histology, while 1103 (87.7%) had CIN-1 or less. The detailed distribution of the samples for the training and test sets appears in Table 3. The correlation of the cytology and histology is presented in Table 1. The ROC curve using the CIN-1 histological result as a cutoff point had an area under the curve (AUC) of 0.913 with a standard error (S.E.) of 0.014. The accuracy parameters for cytology and the results for the training and test sets from the LVQ NN, along with the combined results for both sets, appear in Table 4.
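The accuracy parameters can be reproduced from a 2x2 table. The sketch below is a worked example for cytology at the ASC-H or worse threshold. The counts are not given in the text: they are back-calculated here from the reported percentages and group sizes (155 and 1103 cases) and should be treated as an assumption.

```python
def accuracy_indices(tp, fn, tn, fp):
    """Return sensitivity, specificity, PPV and overall accuracy as percentages."""
    total = tp + fn + tn + fp
    return {
        "sensitivity": round(100 * tp / (tp + fn), 2),
        "specificity": round(100 * tn / (tn + fp), 2),
        "ppv": round(100 * tp / (tp + fp), 2),
        "accuracy": round(100 * (tp + tn) / total, 2),
    }

# Counts inferred from the reported rates (assumption, not quoted from the tables)
cytology = accuracy_indices(tp=113, fn=42, tn=1074, fp=29)
```

These counts give a sensitivity of 72.90%, a specificity of 97.37%, and an overall accuracy of 94.36%, matching the values reported for cytology. They also imply that 29 of 142 women with ASC-H or worse cytology (20.4%) had CIN-1 or less, which agrees with the figure quoted later in the discussion.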
The LVQ training was based on the LVQ1 algorithm. The number of neurons in the competitive layer started at 2 and was increased in steps of 1 up to 50. The best results for classification of the training set were [...] (Table 3). The comparison of the two ROC curves for cytology (AUC = 0.866, S.E. = 0.016) and the LVQ (AUC = 0.916, S.E. = 0.017), using histology as the gold standard, demonstrated that the LVQ NN classifier results are superior to the cytological diagnosis alone (z = −2.142, P < 0.05). The comparison of the overall accuracy of standalone cytology versus that of the LVQ classifiers also favored the LVQ (χ² = 5.6, P < 0.05).
In addition, the LVQ NN system may identify more accurately the women who require immediate referral to colposcopy and also reduce the number of women seen with clinically insignificant lesions. With the LVQ NN, 2% of women will undergo unnecessary colposcopy, whilst only 15% of those with a potentially significant lesion will fail to be referred appropriately for further colposcopic assessment. Using a cutoff of ASCUS+ cytology for referral to colposcopy, the rates are 23% and 2.6%, respectively, and using a cutoff of ASC-H+ they are 2.6% and 27%.
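The referral rates quoted above appear to be the complements of the specificity and sensitivity reported for the LVQ NN test set and for cytology at the ASC-H+ threshold. The following worked arithmetic makes that link explicit. It is an interpretation of the reported percentages, not a calculation stated by the authors.

```latex
\text{unnecessary colposcopy rate} = 100\% - \text{specificity}, \qquad
\text{missed lesion rate} = 100\% - \text{sensitivity}

\text{LVQ NN (test set):} \quad 100 - 97.64 = 2.36\% \approx 2\%, \qquad 100 - 84.62 = 15.38\% \approx 15\%

\text{Cytology (ASC-H+):} \quad 100 - 97.37 = 2.63\% \approx 2.6\%, \qquad 100 - 72.90 = 27.10\% \approx 27\%
```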
The stability of the method was subsequently assessed using the z statistic on the two ROC curves (classifier versus histology for each one) for the LVQ training set versus the test set. The lack of a significant difference in the classifier's performance between the training and test sets, both in the ROC curves (training set AUC = 0.927, S.E. = 0.023; test set AUC = 0.905, S.E. = 0.025; z = 0.648, P > 0.1) and in the overall accuracy (χ² = 0.072, P > 0.1), supports the system's stability.
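The z statistics reported for the ROC comparisons are consistent with the simple formula for two AUC estimates treated as independent, z = (AUC_a − AUC_b) / sqrt(SE_a² + SE_b²). The sketch below applies it to the published AUC and S.E. values. The function name `auc_z_test` is hypothetical, and the two-sided P value from the normal distribution is an assumption about how significance was assessed.

```python
import math

def auc_z_test(auc_a, se_a, auc_b, se_b):
    """z statistic and two-sided P value for two AUCs treated as independent."""
    z = (auc_a - auc_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return z, p_two_sided

# Cytology versus LVQ NN, using the reported AUC and S.E. values
z_cyto_lvq, p_cyto_lvq = auc_z_test(0.866, 0.016, 0.916, 0.017)

# LVQ NN training set versus test set
z_train_test, p_train_test = auc_z_test(0.927, 0.023, 0.905, 0.025)
```

Rounded to three decimals, the two statistics are −2.142 and 0.648, which match the reported values. This formula ignores any correlation between curves derived from the same cases, so it is best read as an approximation when both curves are based on the same women.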
The robustness of the method was further evaluated with five new experiments. The dataset was divided de novo into "new" training and test sets, and the stability in the performance of the "new" classifiers was evaluated after retraining de novo each time. The results are shown in Table 5. The probability that all five classifiers would provide similar outcomes for an individual case was high (Q = 3.51, DF = 4), as calculated by Cochran's Q test.
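To make the meaning of the reported statistic concrete, the corresponding P value can be worked out by hand, assuming Q is referred to a chi-square distribution with 4 degrees of freedom (the usual large-sample approximation). For an even number of degrees of freedom the upper tail has a closed form:

```latex
P\left(\chi^2_4 \ge Q\right) = e^{-Q/2}\left(1 + \frac{Q}{2}\right)
= e^{-1.755} \times 2.755 \approx 0.48 \quad \text{for } Q = 3.51
```

Since Q = 3.51 is well below the 5% critical value of 9.49 for 4 degrees of freedom, there is no evidence that the five classifiers differ in performance.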
This study shows that the LVQ NN performs better than cytology alone for the detection of high-grade lesions in a mixed population (women attending screening and colposcopy). It has higher sensitivity and specificity than cytology at the threshold of ASC-H+ and much higher specificity than cytology at the threshold of ASCUS, even though its sensitivity was lower. The differences between the training and test sets were nonsignificant, which suggests that the LVQ NN provides an accurate prediction of the histological outcome that could guide further management. It therefore seems that the LVQ NN is a reliable tool for the diagnosis of high-grade lesions. The challenge is finding an appropriate role for its use in cervical cancer prevention efforts. The high costs of the biomarkers and the reduced sensitivity compared with cytology at the commonly used threshold for screening (ASCUS+) make the LVQ NN probably unsuitable for screening purposes. However, its high specificity and its overall superiority over higher cytological thresholds suggest a possible role in the triage of minor or even major cytological abnormalities.
Despite the advances of the last decade, there is still a lack of consensus on the optimal management of women presenting with a low-grade cytological abnormality (LSIL) [36][37][38][39]. A substantial proportion of these women may actually harbor a high-grade lesion (HSIL). In our population, 14.6% of women with LSIL had underlying CIN2+ at histology. The currently available management options for low-grade abnormalities include either conservative surveillance with repeat cytology or immediate referral to colposcopy [40]. Surveillance has an inherent risk of noncompliance and default from further surveillance [41], which may put women with clinically significant lesions at potential risk of invasive cervical cancer. Immediate colposcopy can conversely lead to overloading of colposcopy clinics, with financial consequences for clinical and health resources, as well as overintervention and overtreatment with long-term adverse future pregnancy outcomes or even increased perinatal mortality in women of reproductive age [2,42,43]. A more accurate assessment of the underlying risk would undoubtedly be beneficial for patients as well as health economies. Similarly, current practice and guidelines advocate immediate referral and commonly histological confirmation and treatment for all high-grade lesions. A small, albeit significant, proportion of those cases (up to 30%) may, however, have a low-grade or even normal histology. In our study, 20.4% of women with ASC-H+ cytology had CIN-1 or less. Reliable triage methods are lacking. An attempt in that scientific direction is the development of the concept of a "scoring system" in order to identify individual risk for CIN2+ regardless of the cytologic and colposcopic findings.
A combination of the proposed HPV-related biomarkers as well as other epidemiological data, available from women's history, such as demographic characteristics, sexual behavior, and potential cofactor information (smoking, condom use, etc.), in addition to cytology and colposcopy, could identify individual risk estimation for CIN2+, CIN3+ or even invasive disease [40,44].
Although HPV DNA testing has been proposed as a reliable method of triage of ASCUS cytology [3,4], new biomarkers, such as HPV DNA status, E6/E7 mRNA expression of HPV types 16, 18, 31, 33, and 45, E6/E7 mRNA expression of high-risk HPV types, and p16 immunocytochemistry, may prove useful and improve the accuracy of diagnosis [36,38,39]. The incorporation of those new tests in neural networks may allow easier and more accurate interpretation of the results.
Our study has certain limitations. First, the population under study is mixed, that is, it includes women attending for screening and women referred for colposcopy because of abnormal smears. Second, a large number of variables were used; although all have been validated in the literature as relevant to the diagnosis of CIN2+, some of them might eventually prove to have minimal weight in the final diagnosis and could be excluded.
Therefore, a future research direction could be to study the performance of the LVQ NN in a strictly defined population consisting only of women with LSIL smears and to assess its accuracy indices in triage. Also, as the use of all the biomarkers in all situations would increase costs to an unacceptable level, research should be directed at finding combinations of a selection of those tests that achieve similar accuracy at reduced cost. Studies assessing the cost-effectiveness of such an approach in a screening program are also required.

Conclusions
These preliminary results suggest that the incorporation of new tests and combinations of biomarkers using artificial intelligence methods, and especially the LVQ NN, may significantly improve the accuracy of diagnosis. Such an approach may reduce the overload of colposcopy clinics and guide tailored management and intervention. The results should be further assessed in larger datasets in order to confirm the reproducibility of those findings and their applicability in situations of specific interest.