Computer Assisted Pap Smear Analyser for Cervical Cancer Screening using Quantitative Microscopy

Rajasekharan Usha Deepak¹, Ramakrishnan Rajesh Kumar¹, Neendoorthalackal Balakrishnan Byju¹, Pundluvalu Nataraju Sharathkumar¹, Chandran Pournami¹, Salam Sibi¹, Ewert Bengtsson² and Kunjuraman Sujathan³*
¹Health and Software Technology Group, Centre for Development of Advanced Computing, Vellayambalam, Thiruvananthapuram 695033, Kerala, India
²Centre of Image Analysis, Department of Information Technology, Uppsala University, Sweden
³Division of Cancer Research, Regional Cancer Centre, Medical College Campus, Post Bag No.2417, Thiruvananthapuram, Kerala, India


Introduction
Cervical cancer is one of the few cancers that can be diagnosed and fully cured if detected at onset. Even so, more than 525,000 women are diagnosed with cervical cancer and more than 265,000 die from the disease every year [1]. About 85% of these deaths occur in low- and middle-income countries, mainly because of poor access to screening and treatment services [2]. Globally there are 2 billion women [3] in the age group for which screening is relevant, and they need to be screened every three years [4]. The Pap smear test [5], invented by Dr. George Papanicolaou in 1940, is by far the most widely used screening technique; by examining cells that exfoliate naturally from the cervix, it can detect cervical cancer at an early and easily curable stage. Screening based on the Pap test (or Pap smear) has led to a dramatic reduction in mortality among regularly tested women in countries with an effective screening program [6][7][8][9][10][11]. Human Papilloma Virus (HPV), of which around 200 strains have been identified, is a causative factor for cervical cancer [12]. Prophylactic vaccination against HPV has been introduced very recently, but the vaccines protect against only a small subset of strains [2][13][14][15][16][17][18]. The American Cancer Society and UICC recommend Pap smear screening for vaccinated women as well [18,19]. Even though molecular tests for detecting HPV infection are available, the Pap test remains the most widely used screening method. Studies show that even a single screening in a lifetime substantially reduces the risk of cervical cancer [20]. However, competing health care priorities, insufficient financial resources, weak health systems and limited numbers of trained providers have made high screening coverage difficult to achieve in most low- and middle-income countries [2][20][21][22][23].
Visual screening of a Pap smear involves careful scrutiny of several thousand Fields of View (FOVs) under a microscope, which together contain a few hundred thousand cells, in order to identify a few abnormal cells. Screening is one of the most demanding tasks for the human eye-brain axis; it is exhausting and fatiguing [24]. According to the Clinical Laboratory Improvement Act (CLIA) of 1988, cytotechnologists, who screen the specimens, should not process more than 100 slides per day, because fatigue and habituation deteriorate the quality of screening and can result in high numbers of false positives and false negatives [25]. To be reasonably protected against developing undetected cervical cancer, eligible women need to be screened regularly. With 2 billion women in the relevant age groups, screening programs generate enormous numbers of samples to analyze. Training and financing sufficient numbers of human screeners creates great practical and economic problems, which has led to substantial interest in automating the task. Furthermore, the human eye-brain axis is not good at appreciating the early nuclear changes that are the first indications of neoplastic transformation. Quantitative microscopy is much better suited to detecting and objectively measuring early malignant changes [26].
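To make "enormous numbers of samples" concrete, the figures quoted above (2 billion eligible women, a three-year interval, the CLIA ceiling of 100 slides per day) can be combined in a back-of-envelope calculation. This is a sketch only; the 250 working days per screener per year is an assumption, not a figure from the text.

```python
# Back-of-envelope screening workload using the figures quoted in the text.
# ASSUMPTION (not from the source): 250 working days per screener per year.

ELIGIBLE_WOMEN = 2_000_000_000   # women in the screening age group [3]
INTERVAL_YEARS = 3               # screening interval [4]
MAX_SLIDES_PER_DAY = 100         # CLIA 1988 daily ceiling per cytotechnologist [25]
WORKING_DAYS_PER_YEAR = 250      # hypothetical


def slides_per_year(women: int, interval_years: int) -> float:
    """Smears generated per year if every eligible woman is screened on schedule."""
    return women / interval_years


def screeners_needed(slides: float, per_day: int, days_per_year: int) -> float:
    """Full-time cytotechnologists needed to read `slides` without exceeding the ceiling."""
    return slides / (per_day * days_per_year)


annual = slides_per_year(ELIGIBLE_WOMEN, INTERVAL_YEARS)
staff = screeners_needed(annual, MAX_SLIDES_PER_DAY, WORKING_DAYS_PER_YEAR)
print(f"{annual:,.0f} smears/year need about {staff:,.0f} full-time screeners")
```

Under these assumptions the program would produce roughly 667 million smears a year, requiring on the order of 27,000 full-time screeners working at the legal maximum.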
Ever since the first appearance of computers, significant development efforts have aimed at supplementing or replacing human visual inspection of Pap smears with computer analysis [27][28][29]. However, the problem turned out to be a lot harder than expected: after the first automated system in the 1950s, it took almost another half-century before the first commercially successful system appeared.
The Cytoanalyzer, built during the 1950s, was the first attempt to automate Pap smear screening. Although the system was able to distinguish the morphological differences between normal and malignant cells, it produced too many false alarms. Another early attempt was CYBEST, developed during the 1970s, which was able to detect malignancy based on morphological features but had problems with chromatin features, primarily because of poorly focused images. During the 1980s, a number of systems such as BioPEPR, FAZYTAN, LEYTAS and DIASCANNER were developed. Although some of these systems reported accuracy comparable to conventional visual screening, none succeeded, owing to a lack of cost-effectiveness [30]. More recently, research on Pap smear images has used assisted segmentation, in which free-lying cells without interference from inflammatory cells were hand-picked [31]. Such work may require significantly more effort before it can become a field-deployable screening system. Two automated machines approved by the United States Food and Drug Administration (FDA) were developed in the 1990s: the AutoPap 300 QC (NeoPath, Redmond, WA, USA) and the PapNet (Neuromedical Systems Inc., Suffern, NY, USA). Both systems were designed to work with conventional cytology slides. AutoCyte also developed a machine, the AutoCyte-Screen, which was able to read AutoCyte-Prep slides (now BD SurePath LBC). The experience gained from these early commercial efforts led to the merger of the companies into TriPath Imaging Inc. (Burlington, NC, USA), and the first-generation products were replaced by the AutoPap Primary Screening System, now known as the BD FocalPoint GS Imaging System (BD Diagnostics, Franklin Lakes, NJ, USA). Cytyc also developed an interactive system with a computer prescreen that selected the most abnormal-looking objects on each specimen for human inspection.
In 2003 Cytyc received FDA approval for its ThinPrep Imaging System, and in 2007 it became part of Hologic. The system is marketed as increasing the detection of abnormalities through improved specimen preparation and combined visual and machine screening [30,32]. Despite numerous attempts, automated screening is still not sufficiently cost-effective to completely replace visual screening, judging from the relatively limited penetration of automated screening systems in screening operations worldwide [30].

Materials and Methods
The basis for Pap smear screening is that cancerous or precancerous cells have larger nuclei and more irregular shape and chromatin structure than normal cells, as shown in Figures 1a and 1b. However, the task is not as simple as it sounds: cells in the specimen, even when prepared by a monolayering technique, are often folded, overlapping, covered by blood cells or other artifacts, or clustered, as in Figure 1c. Moreover, since the task involves analyzing a few hundred thousand cells in search of a few malignant ones, even a very low per-cell false positive rate would result in all specimens being classified as malignant.
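The last point follows from how per-cell errors accumulate over a whole smear. If each normal cell is independently misclassified with probability p, a smear of n normal cells is flagged with probability 1 - (1 - p)^n. A minimal sketch, assuming independence between cells and a hypothetical n of 200,000 cells:

```python
import math


def expected_false_alarms(p_cell: float, n_cells: int) -> float:
    """Mean number of normal cells wrongly flagged in one specimen."""
    return p_cell * n_cells


def prob_specimen_flagged(p_cell: float, n_cells: int) -> float:
    """P(at least one false alarm) = 1 - (1 - p)^n, assuming independent cells.

    Computed as -expm1(n * log1p(-p)) to stay accurate for very small p.
    """
    return -math.expm1(n_cells * math.log1p(-p_cell))


N_CELLS = 200_000  # hypothetical "few hundred thousand" cells per smear
for p in (1e-4, 1e-5, 1e-6):
    print(f"p={p:g}: {expected_false_alarms(p, N_CELLS):.1f} false alarms, "
          f"P(normal smear flagged)={prob_specimen_flagged(p, N_CELLS):.3f}")
```

Even at one error per million cells, nearly one normal smear in five would be flagged, which is why a specimen-level decision cannot simply be "any suspicious cell found".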
To address these problems, the automated screening system combines advanced image acquisition, processing and classification techniques with a novel monolayer slide preparation technique, detailed in the subsequent sections, to provide a solution that can be adopted for mass screening for cervical cancer.

Pap smear collection
Pap smears were obtained from women attending the early cancer detection clinic and cancer detection camps of RCC. Cervical scrapes were obtained using a cervicobrush, and the cells were preserved in the vials provided with the SurePath Liquid Based Cytology (LBC) system. From a selected group of women whose consent was taken in advance, a separate scrape of cells was obtained for the Mega-Funnel Technique (MFT). The samples were processed in the SurePath system according to the manufacturer's instructions and by MFT as described below.

Mega-funnel specimen preparation technique
Each cell sample, in 10 mL of preservative solution containing 50% alcohol, glacial acetic acid and a mucolytic agent, was homogenized in a vortex for 30 seconds and then centrifuged at 2000 RPM for 5 minutes. The cell pellet was mixed well with 1 mL of preservative solution, and 200-300 µL of the suspension was cytocentrifuged onto a coated slide using a mega-funnel. The smears were fixed in 95% alcohol for 15-30 minutes and stained using the classical Pap staining method, producing a specimen area of 22 mm×15 mm. A total of 60 MFT slides were prepared and compared against the commercial LBC system, yielding images of quality comparable to those of the commercial LBC system.
[Figure: Gross appearance of slides and magnified view of smears]
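The protocol gives centrifugation speed in RPM, which corresponds to a different g-force on each rotor. As a reproducibility aid, the standard conversion RCF = 1.118 × 10⁻⁵ × r × RPM² (r in cm) is sketched below, together with the fraction of the resuspended pellet that reaches the slide. The 10 cm rotor radius is a hypothetical value; the text does not state the radius.

```python
def rpm_to_rcf(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (multiples of g) from rotor speed and radius.

    Standard conversion: RCF = 1.118e-5 * r[cm] * rpm^2.
    """
    return 1.118e-5 * radius_cm * rpm ** 2


def fraction_deposited(aliquot_ul: float, resuspension_ml: float) -> float:
    """Share of the resuspended pellet that is cytocentrifuged onto the slide."""
    return aliquot_ul / (resuspension_ml * 1000.0)


# HYPOTHETICAL rotor radius: the protocol above states only 2000 RPM.
print(f"{rpm_to_rcf(2000, radius_cm=10.0):.0f} x g")
# 200-300 uL taken from the 1 mL resuspension.
print(fraction_deposited(200, 1.0), fraction_deposited(300, 1.0))
```

Reporting the RCF alongside RPM would let other laboratories reproduce the pellet preparation on different centrifuges.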

Field of view selection
Specimens prepared on glass slides were imaged through a 40X objective to accurately quantify the nuclear chromatin distribution, which required on average 2000 FOVs to cover the whole specimen. Data acquisition in this work was manual, and since it was impractical to cover the whole specimen with manual repositioning between FOVs, only FOVs of interest were selected from each specimen. The image data were acquired by a person skilled in operating a microscope, who scanned the entire specimen and then selected 40 FOVs based on the relative density of stain and nuclear enlargement. Each FOV was optimally focused manually before acquisition.
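Focusing here was done by eye. When focusing is automated (see Future work), a common approach is to acquire a small Z stack and keep the image with the highest sharpness score; the variance of the Laplacian is one widely used score. The sketch below illustrates that idea only. The metric, the synthetic checkerboard target and the box blur used to simulate defocus are assumptions, not the procedure used in this study.

```python
import numpy as np


def laplacian_variance(gray: np.ndarray) -> float:
    """Focus score: variance of a 4-neighbour discrete Laplacian (higher = sharper)."""
    g = gray.astype(np.float64)
    lap = (g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:]
           - 4.0 * g[1:-1, 1:-1])
    return float(lap.var())


def best_focus_index(z_stack: list) -> int:
    """Index of the sharpest image in a stack acquired at different Z positions."""
    return int(np.argmax([laplacian_variance(img) for img in z_stack]))


def box_blur(img: np.ndarray, k: int) -> np.ndarray:
    """Mean filter of size (2k+1)^2, used here only to simulate defocus."""
    p = np.pad(img.astype(np.float64), k, mode="edge")
    h, w = img.shape
    acc = sum(p[dy:dy + h, dx:dx + w]
              for dy in range(2 * k + 1) for dx in range(2 * k + 1))
    return acc / (2 * k + 1) ** 2


# Synthetic target: an 8-pixel checkerboard plus two simulated defocus levels.
sharp = ((np.indices((64, 64)) // 8).sum(axis=0) % 2) * 255.0
stack = [box_blur(sharp, 2), sharp, box_blur(sharp, 1)]
print(best_focus_index(stack))  # index of the unblurred image
```

On real Pap images the score would normally be computed on a nucleus-rich region, since empty background contributes little high-frequency content.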

Specimen digitization
Each manually selected and focused FOV was digitized using e-Smear, an image acquisition utility developed by our team (Government of India, Copyright Registration No. SW-6416/2013), which controls camera parameters, digitizes the FOV, logs patient clinical details, creates a systematically organized repository of Pap smears, records image annotations and generates statistical reports. The microscope was a Leica DM2500 with a plan apochromat objective of 40X magnification and numerical aperture 0.65. The camera was a Leica DFC495, producing RGB images of 3264×2448 pixels with a sensor pixel size of 2.7 μm; the whole CMOS sensor measures 8.81 mm×6.61 mm. To capture the maximum possible area in each FOV, a 0.63x demagnifier was also used, resulting in an effective pixel size of about 0.1 μm in the specimen plane. The workstations hosting e-Smear and the slide analysis software were quad-core Dell desktops with 4 GB of RAM running a 32-bit operating system.
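The effective pixel size follows from the optical chain: the objective and the demagnifier magnifications multiply, and the sensor pixel pitch is divided by the total magnification. The sketch below reproduces that arithmetic from the stated components. The 0.55 µm wavelength used for a rough Abbe-limit check is an assumption (green light), not a value given in the text.

```python
SENSOR_PIXEL_UM = 2.7            # DFC495 sensor pixel pitch
SENSOR_PIXELS = (3264, 2448)     # width, height
OBJECTIVE_MAG = 40.0
DEMAGNIFIER = 0.63               # camera adapter between microscope and sensor
NUMERICAL_APERTURE = 0.65
WAVELENGTH_UM = 0.55             # ASSUMPTION: green light

total_mag = OBJECTIVE_MAG * DEMAGNIFIER                     # magnification onto the sensor
sensor_mm = tuple(n * SENSOR_PIXEL_UM / 1000 for n in SENSOR_PIXELS)
pixel_um = SENSOR_PIXEL_UM / total_mag                      # pixel size in the specimen plane
fov_mm = tuple(n * pixel_um / 1000 for n in SENSOR_PIXELS)  # specimen area per FOV

# Rough resolution check: Abbe limit and the Nyquist sampling it implies.
abbe_um = WAVELENGTH_UM / (2 * NUMERICAL_APERTURE)

print(f"sensor {sensor_mm[0]:.2f} x {sensor_mm[1]:.2f} mm")
print(f"pixel {pixel_um:.3f} um, FOV {fov_mm[0]:.3f} x {fov_mm[1]:.3f} mm")
print(f"Abbe limit ~{abbe_um:.2f} um, Nyquist needs <= {abbe_um / 2:.2f} um/pixel")
```

The computed sensor size matches the quoted 8.81 mm×6.61 mm, and the resulting pixel size of about 0.107 µm is consistent with the ~0.1 µm figure and comfortably finer than the Nyquist requirement.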

Pap Image analysis
The images acquired from e-Smear were transferred to an image processing station where each image undergoes a series of processing and analysis steps to finally classify the specimen as either normal or suspicious. A flow chart of the Pap image analysis is shown in Figure 3.

Preprocessing and segmentation
A Laplacian of Gaussian (LoG) filter was used to detect objects in the Pap smear image. The Laplacian operator highlights regions of rapid intensity change and is therefore used for edge detection. To reduce its sensitivity to noise, the Laplacian was applied to an image that had first been smoothed by a Gaussian filter. Red blood cells (RBCs) were removed using color information from the true-color RGB input image [33].
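A minimal sketch of LoG-based object detection followed by color-based RBC suppression, using SciPy's `ndimage.gaussian_laplace` (smoothing and Laplacian in one filter). The sigma, the response threshold and the "red-dominant pixel" rule for RBCs are hypothetical choices for illustration; the actual color criterion is the one described in [33], not this one.

```python
import numpy as np
from scipy import ndimage


def log_response(gray: np.ndarray, sigma: float) -> np.ndarray:
    """Gaussian smoothing followed by the Laplacian, as one filter.

    Dark blobs (such as nuclei) on a bright background give a positive response.
    """
    return ndimage.gaussian_laplace(gray.astype(np.float64), sigma=sigma)


def rbc_mask(rgb: np.ndarray, red_ratio: float = 1.4, grow: int = 3) -> np.ndarray:
    """HYPOTHETICAL color rule: red-dominant pixels are treated as red blood cells.

    The mask is dilated by `grow` pixels so LoG halos at RBC borders are dropped too.
    """
    r = rgb[..., 0].astype(np.float64)
    g = rgb[..., 1].astype(np.float64)
    return ndimage.binary_dilation(r > red_ratio * g, iterations=grow)


def segment(rgb: np.ndarray, sigma: float = 3.0, threshold: float = 1.0):
    """Label candidate objects: positive LoG blobs minus RBC pixels."""
    gray = rgb.astype(np.float64).mean(axis=2)
    candidates = (log_response(gray, sigma) > threshold) & ~rbc_mask(rgb)
    labels, n_objects = ndimage.label(candidates)
    return labels, n_objects


def make_demo_image() -> np.ndarray:
    """Synthetic field: one bluish 'nucleus' and one reddish 'RBC' on white."""
    img = np.full((40, 80, 3), 255, dtype=np.uint8)
    yy, xx = np.mgrid[:40, :80]
    img[(yy - 20) ** 2 + (xx - 20) ** 2 <= 36] = (40, 40, 120)
    img[(yy - 20) ** 2 + (xx - 60) ** 2 <= 64] = (200, 60, 80)
    return img


labels, n = segment(make_demo_image())
print(n)  # 1: the RBC is suppressed, the nucleus survives
```

Both disks are dark in grayscale and both produce a LoG response; only the color step separates the red blood cell from the nucleus.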

Feature extraction and ranking
The heart of the quantification and automation task is to determine what should be measured and how it should be measured. Over the past 50 years of quantitative cytometry, a large set of features has been tried and tested for various applications [34]. Around 40 mathematical features that can accurately describe the morphology, texture and densitometry of cervical epithelial cells were identified heuristically. All identified features were ranked using histogram analysis and a Mahalanobis maximization function [35], defined as the ratio of the difference between the mean values of normal and abnormal cells to the sum of their standard deviations.
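The ranking criterion described above, |μ_abnormal − μ_normal| / (σ_normal + σ_abnormal) per feature, can be sketched as follows. This covers only the separation score, not the histogram analysis; the feature names and toy values are hypothetical, and population standard deviations are assumed.

```python
import numpy as np


def separation_score(normal: np.ndarray, abnormal: np.ndarray,
                     eps: float = 1e-12) -> np.ndarray:
    """Per-feature |mean_abnormal - mean_normal| / (std_normal + std_abnormal).

    Rows are cells, columns are features; eps guards against zero spread.
    """
    diff = np.abs(abnormal.mean(axis=0) - normal.mean(axis=0))
    spread = normal.std(axis=0) + abnormal.std(axis=0)
    return diff / (spread + eps)


def rank_features(normal, abnormal, names):
    """Feature names sorted from most to least discriminating."""
    scores = separation_score(np.asarray(normal, float), np.asarray(abnormal, float))
    order = np.argsort(scores)[::-1]
    return [(names[i], float(scores[i])) for i in order]


# Toy data (hypothetical): feature 0 separates the classes, feature 1 does not.
normal = [[1.0, 5.0], [2.0, 6.0], [3.0, 7.0]]
abnormal = [[7.0, 5.0], [8.0, 6.0], [9.0, 7.0]]
print(rank_features(normal, abnormal, ["nuclear_area", "texture_x"]))
```

A higher score means the class means are far apart relative to the within-class spread, so the top-ranked features are the natural candidates for the cell classifier.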

Classification
A hierarchical, multi-stage classification approach was followed to separate normal smears from suspicious smears. In the first stage, artifacts, microbes and other debris were separated from epithelial cells [36,37]. The epithelial cells were then analyzed using a set of mathematical features to separate suspicious cells from the rest. Apart from the cell-level classification, cell clusters were detected for careful scrutiny [38], and significant diagnostic information was gathered from the counts of neutrophils [39] and koilocytes [40]. Finally, the cell distribution of the whole specimen was analyzed for deviations from the normal cell distribution. The final decision was made by a specimen-level classifier taking input from the cell-level and slide-level classifiers. A flow chart is shown in Figure 4.
Each smear in the study was manually screened by cytologists with over 25 years of experience, and the ground truth was recorded. The smears were also analyzed in parallel by the automated system using image processing methods. Manual cytology was considered the gold standard for benchmarking the efficacy of the automated analysis. All abnormal smears were biopsy proven. Table 1 describes the distribution of slides used for validation.
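The hierarchical classification described in the Classification section can be illustrated with a simplified sketch. Here the stage-1 label of each detected object and the stage-2 abnormality score are taken as given, and the specimen-level rule is a hypothetical OR of three cues with made-up thresholds; the text does not specify how the cell-level and slide-level outputs are combined, and cluster and cell-distribution analysis are omitted.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class DetectedObject:
    kind: str           # stage-1 label: "epithelial", "artifact", "neutrophil", "koilocyte"
    score: float = 0.0  # stage-2 abnormality score in [0, 1] (epithelial cells only)


def classify_specimen(objects: Iterable[DetectedObject],
                      cell_threshold: float = 0.8,   # hypothetical
                      min_suspicious_cells: int = 2,  # hypothetical
                      max_neutrophils: int = 500,     # hypothetical
                      min_koilocytes: int = 1) -> str:  # hypothetical
    objs = list(objects)
    # Stage 1: discard artifacts, microbes and debris; keep epithelial cells.
    epithelial = [o for o in objs if o.kind == "epithelial"]
    # Stage 2: cell-level decision on each epithelial cell.
    n_suspicious = sum(o.score >= cell_threshold for o in epithelial)
    # Slide-level cues counted alongside the cell-level result.
    n_neutrophils = sum(o.kind == "neutrophil" for o in objs)
    n_koilocytes = sum(o.kind == "koilocyte" for o in objs)
    # Stage 3: specimen-level decision (hypothetical OR of the cues).
    if (n_suspicious >= min_suspicious_cells
            or n_koilocytes >= min_koilocytes
            or n_neutrophils > max_neutrophils):
        return "suspicious"
    return "normal"


smear = [DetectedObject("epithelial", 0.1)] * 50 + [DetectedObject("artifact", 0.99)] * 5
print(classify_specimen(smear))                                           # normal
print(classify_specimen(smear + [DetectedObject("epithelial", 0.95)] * 3))  # suspicious
```

The point of the structure is that a high score on an artifact never reaches the specimen decision, because stage 1 removes it before any cell is scored.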

Results
The numbers of correctly classified and misclassified smears are given in Table 2. True positives are abnormal smears that the automated analysis classified as suspicious and sent for a cytologist's review. True negatives are normal smears that were classified as normal and require no further human intervention. False positives and false negatives are misclassified smears. Not-processed smears are those rejected from automated analysis because of poor image quality or an insufficient number of image fields.
The system screened out 60% of the normal smears, which need no further human review, and classified 80% of the abnormal cases as suspicious, requiring further expert review (Table 3). A detailed analysis of accuracy for normal smears and for the different precursors of cervical cancer is given in Table 4.
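The two headline figures correspond to standard confusion-matrix rates: sensitivity on abnormal smears and the fraction of normal smears screened out. A minimal sketch, assuming that not-processed smears are excluded before the rates are computed; the counts are hypothetical, chosen only to reproduce the reported 80% and 60%, and are not the values in Table 2.

```python
def screening_rates(tp: int, fn: int, tn: int, fp: int) -> tuple:
    """Sensitivity on abnormal smears and fraction of normal smears screened out.

    Smears rejected as 'not processed' are assumed to be excluded beforehand.
    """
    sensitivity = tp / (tp + fn)   # abnormal smears sent for review
    screened_out = tn / (tn + fp)  # normal smears needing no further review
    return sensitivity, screened_out


# Hypothetical counts chosen to match the reported 80% / 60% rates.
print(screening_rates(tp=80, fn=20, tn=60, fp=40))  # (0.8, 0.6)

# Ratio of the screened-out fraction here (60%) to the No Further Review
# fraction (22%) reported for the commercial systems in the Kitchener trial [30].
print(round(0.60 / 0.22, 2))  # 2.73
```

The ratio of about 2.7 is the basis for describing the difference from the commercial systems as almost threefold.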

CellMarker is a GUI application used to generate ground truths, visualize segmentation results, extract features, create training sets and visualize classification results. A total of 15,708 malignant cells were hand-marked by cytotechnologists, and close to 300,000 normal cells from normal smears, verified by cytotechnologists, were automatically marked using CellMarker. A set of 3092 cells, comprising 2935 normal cells and 157 abnormal cells of all grades, was used to train the classification algorithm. The study protocol was approved by the Human Ethics Committee (HEC) of RCC, Thiruvananthapuram (HEC No. 22/2009). The evaluation protocol is shown in Figure 5.
Figure 5: Evaluation protocol (collect sample; prepare Pap slide; digitize Pap slide; automated analysis and manual analysis; benchmark the efficacy using manual reporting as the gold standard).
Table 1: Distribution of slides used for validation
[…]
Atypical squamous cells, cannot exclude HSIL (ASC-H): 9
High grade squamous intraepithelial lesion (HSIL): 60
Squamous cell carcinoma (SCC): 39
Total number of slides: 1107

Comparison with commercial systems
In a randomized controlled trial by Kitchener et al. [30], automation-assisted and manual cervical screening were extensively studied and compared. The automation-assisted systems used in the trial were the Becton Dickinson (BD) FocalPoint Slide Profiler (Becton Dickinson, Franklin Lakes, NJ, USA) and the ThinPrep Imaging System (Hologic, Bedford, MA, USA). The primary outcome of the trial was the sensitivity of automation-assisted reading relative to manual reading, which is detailed in Table 5. As a comparison of Tables 4 and 5 shows, the system described in this paper achieved sensitivity comparable to, and in the case of HSIL better than, the existing commercial systems. The commercial automated systems classified 22% of slides as No Further Review (NFR), almost three times fewer than our system. The automated image analyzer described in this article screens out 60% of normal cases (the equivalent of NFR in the commercial systems), which require no further human intervention. It can thus reduce the workload of cytologists by up to 60% and, when deployed for population screening, extend screening to many more women.
Furthermore, as Table 4 shows, high grade lesions such as HSIL and SCC are detected with higher accuracies of 93% and 95% respectively, which limits false negatives mainly to less severe cases such as LSIL. As the disease takes about a decade to progress to carcinoma in situ, systematic routine screening will considerably reduce the chance of the disease going undetected.

Future work
This work has demonstrated a system capable of detecting early pre-malignant changes in cervical smears with acceptable classification performance. However, the system needs to be made more time-efficient before large-scale deployment. We outline here some of the considerations that will be taken into account in that work.
Motorized microscope: The existing system was designed to be semi-automated: a semi-skilled person operates the microscope, positions the stage, focuses and acquires the images, while the analysis is handled by the image analysis platform. A more sophisticated approach is full automation of slide loading and scanning, in which a robotic arm transfers each slide from the slide tray to a scanning stage controlled by stepper or piezo motors, which move the slide in the XY direction from one FOV to the next. Focus on each FOV is controlled by moving either the stage or the objective in the Z direction. The throughput of the system depends largely on the speed of image acquisition, which requires motorized mechanical movement. However, as an automated platform can work 24/7, excluding time for routine maintenance, it can well exceed human throughput. In the field trial by Kitchener et al., adoption of an automation-assisted system increased productivity by 60%-80% [30].
Malignancy associated changes: An alternative to an exhaustive scan of the complete specimen is analysis of the field effect, or malignancy associated changes (MAC) [41,42], which refers to subtle changes in normal cells present in malignant smears. These findings were confirmed in early research on automated cervical screening [43,44]. If the MAC approach is adopted, only a small subset of cells from each smear needs to be analyzed instead of the complete smear. MAC analysis requires the chromatin pattern to be analyzed in great detail, so highly accurate artifact removal and perfect focus are prerequisites for convincingly demonstrating that MAC alone can detect early premalignant changes with sufficient sensitivity [30].
Field trials: The system needs to undergo an extensive independent evaluation on around 10,000 smears.
Image analysis throughput: Image analysis throughput can be improved by porting CPU-intensive operations to graphics processing units (GPUs), whose hundreds of dedicated, highly parallel cores make the system more efficient with only a marginal increase in cost.
Building a cost-effective system: The goal of our project is to demonstrate that a cost-effective system for Pap smear screening can be built. Based on the experience gained in this study, we have made a rough estimate of the component costs of a final automated system. It can be based on a standard microscope with minimal modification to integrate a digital camera, quality optics, a motorized stage and illumination. Furthermore, commercial motorized XY stages are now available with a travel range sufficient to hold multiple slides, in which case human intervention may be required only about every 2 hours, just to reload the slide tray. Such a cost-optimized system, which avoids expensive embellishments, can well be implemented for under $34,000; Table 6 lists the approximate component-wise costs. From an economic perspective, if such a system screens a moderate target of 20,000 women per year, the screening cost per smear can be reduced to under $2 from the current $7 for slide screening in India. This saving amounts to $100,000 per year, or alternatively the possibility of offering screening to many more women.
To better serve dispersed populations in low-resource settings, a model with a Centralized Smear Analysis Station (CSAS) and multiple Satellite Smear Collection Centres (SSCC) is suggested. The CSAS should have all the equipment listed in Table 6, while an SSCC needs only personnel collecting cervical smears in a buffer, which are then transferred to the CSAS. The numbers of CSAS and SSCC units can be decided based on the population to be screened and the available resources. The analysis station contains desktop-grade computers, which have evolved into stable products requiring very little maintenance, and support is readily available if needed. The same applies to the microscope and its accessories. The microscope XY stage will be the only component requiring occasional maintenance, owing to wear and tear caused by heavy-duty slide scanning, which is easily addressed at a centralized location.
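The cost figures in the cost-effectiveness discussion can be checked with simple arithmetic. The sketch below assumes the $34,000 hardware cost is written off within a single year and ignores staff, consumables and maintenance (assumptions, not statements from the text); it also derives the per-slide time budget implied by 24/7 operation and includes a hypothetical helper for sizing the number of analysis stations for a given population.

```python
import math

# Figures quoted in the cost discussion (US dollars).
SYSTEM_COST = 34_000
SMEARS_PER_YEAR = 20_000
CURRENT_COST_PER_SMEAR = 7.0
TARGET_COST_PER_SMEAR = 2.0


def capital_cost_per_smear(system_cost: float, smears_per_year: int,
                           years: float = 1.0) -> float:
    """Hardware cost per smear amortized over `years` (ASSUMPTION: 1 year).

    Staff, consumables and maintenance are deliberately ignored.
    """
    return system_cost / (smears_per_year * years)


def annual_saving(smears_per_year: int, current: float, target: float) -> float:
    """Yearly saving from lowering the per-smear screening cost."""
    return smears_per_year * (current - target)


def minutes_per_slide(smears_per_year: int, hours_per_day: float = 24.0) -> float:
    """Time budget per slide if the scanner runs every day of the year."""
    return 365 * hours_per_day * 60 / smears_per_year


def stations_needed(women: int, interval_years: int, capacity_per_station: int) -> int:
    """HYPOTHETICAL station sizing: annual smears divided by per-station capacity."""
    return math.ceil(women / interval_years / capacity_per_station)


print(capital_cost_per_smear(SYSTEM_COST, SMEARS_PER_YEAR))  # 1.7
print(annual_saving(SMEARS_PER_YEAR, CURRENT_COST_PER_SMEAR, TARGET_COST_PER_SMEAR))  # 100000.0
print(round(minutes_per_slide(SMEARS_PER_YEAR), 2))  # 26.28
print(stations_needed(1_000_000, 3, SMEARS_PER_YEAR))  # 17 for a hypothetical 1 million women
```

Under these assumptions, the hardware alone costs $1.70 per smear in the first year, consistent with the "under $2" estimate, and 20,000 slides a year leaves roughly 26 minutes of round-the-clock scanning time per slide.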