Automated analysis of vessel morphometry in retinal images from a Danish high street optician setting

Josefine Freiberg; Roshan A. Welikala; Jens Rovelt; Christopher G. Owen; Alicja R. Rudnicka; Miriam Kolko; Sarah A. Barman; on behalf of the FOREVER consortium

doi:10.1371/journal.pone.0290278

Abstract

Purpose

To evaluate the test performance of the QUARTZ (QUantitative Analysis of Retinal vessel Topology and siZe) software in detecting retinal features from retinal images captured by health care professionals in a Danish high street optician chain, compared with test performance from other large population studies (i.e., UK Biobank) where retinal images were captured by non-experts.

Method

The dataset FOREVERP (Finding Ophthalmic Risk and Evaluating the Value of Eye exams and their predictive Reliability, Pilot) contains retinal images obtained from a Danish high street optician chain. The QUARTZ algorithm utilizes both image processing and machine learning methods to determine retinal image quality, vessel segmentation, vessel width, vessel classification (arterioles or venules), and optic disc localization. Outcomes were evaluated by metrics including sensitivity, specificity, and accuracy and compared to human expert ground truths.

Results

QUARTZ’s performance was evaluated on a subset of 3,682 images from the FOREVERP database. Of the FOREVERP images, 80.55% were labelled as being of adequate quality, compared with 71.53% of UK Biobank images. Vessel segmentation had a sensitivity of 74.64% and a specificity of 98.41% on FOREVERP, compared with a sensitivity of 69.12% and a specificity of 98.88% on UK Biobank. The mean (± standard deviation) vessel width was 16.21 (4.73) pixels for the ground truth and 17.01 (4.49) pixels as predicted by QUARTZ, a difference of -0.8 (1.96) pixels. The differences were stable across a range of vessels. The detection rate for optic disc localisation was similar for the two datasets.

Conclusion

QUARTZ showed high performance when evaluated on the FOREVERP dataset, and demonstrated robustness across datasets, providing validity to direct comparisons and pooling of retinal feature measures across data sources.

Citation: Freiberg J, Welikala RA, Rovelt J, Owen CG, Rudnicka AR, Kolko M, et al. (2023) Automated analysis of vessel morphometry in retinal images from a Danish high street optician setting. PLoS ONE 18(8): e0290278. https://doi.org/10.1371/journal.pone.0290278

Editor: Thiago Gonçalves dos Santos Martins, Federal University of Rio de Janeiro: Universidade Federal do Rio de Janeiro, BRAZIL

Received: November 23, 2022; Accepted: June 29, 2023; Published: August 24, 2023

Copyright: © 2023 Freiberg et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: The data that support the findings of this study are available from Synoptik A/S, but restrictions apply to the availability of these data. Requests for the retinal image dataset can be sent to the FOREVER steering group at forever@sund.ku.dk. The FOREVER steering group is composed of a project lead, the CEO of Synoptik, the Head of Clinical Development of Synoptik, and a maximum of six experts. The six experts should represent the following fields: ophthalmology, epidemiology, big data/databases, genetics, omics, and questionnaire design. The FOREVER steering group has a formal procedure for handling data requests.

Funding: Project FOREVER is funded by the Synoptik Foundation. An additional grant has been obtained from Fonden til Lægevidenskabens Fremme. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction

The retina of the eye is considered part of the central nervous system (CNS) and is said to be a window to the brain and circulatory system [1,2]. Not only do the retina and brain share anatomical and embryological developmental characteristics, but the microvascular circulation and its regulation are also similar in the brain and retina [1]. Viewing the retinal vessels therefore provides a unique opportunity to study the circulatory system. While systemic blood circulation can be visualized using invasive procedures such as X-ray angiography, the retinal vessels can be captured non-invasively with fundus imaging. Changes in retinal vessel tortuosity and diameter have previously been linked to cardiovascular disease, diabetes, and glaucoma [3–5]. CNS and systemic diseases such as ischemic brain incidence, stroke, multiple sclerosis, and Alzheimer’s disease also have recognized ocular manifestations [2]. Thus, it is reasonable to believe that retinal vessels may serve as a biomarker for identifying early signs of both ocular and systemic diseases and as a predictor of disease development.

Retinal imaging is part of a routine eye examination when visiting an ophthalmologist. In recent years it has gained popularity in high street optician chains, as retinal imaging requires limited training and captures important signs of disease pathology such as changes in the optic nerve, macula and retinal blood vessels [6]. Interest in using artificial intelligence (AI) in healthcare as a supplement to routine eye examinations is growing, as its ability to help clinicians manage routine tasks and analyse large amounts of data efficiently has the potential to transform healthcare [7–10]. Image recognition/diagnostic classification and the search for new prognostic risk factors are of particular interest [11]. Automation and AI in ophthalmology span a multitude of approaches: traditional image processing (unsupervised techniques, e.g., edge detection or morphological operators), machine learning techniques that use hand-crafted features (e.g., supervised and unsupervised learning), and deep learning (a subfield of machine learning) that can learn features automatically. Current research in AI software for automatic analysis of retinal fundus images includes studies on cardiovascular disease, diabetic retinopathy, age-related macular degeneration, retinopathy of prematurity, neonatal fundus haemorrhages, glaucoma, and retinal breaks and detachments [11–21].

Vessel morphometry (also known as vasculometry) is an approach for studying biomarkers of disease. It requires retinal images to be converted into quantitative measurements, including measures of vessel width, area, and tortuosity. However, this is a time-consuming task for human observers and is not feasible for studies of vasculometric associations with disease, which require large sample sizes to achieve enough statistical power to detect small, yet meaningful, group differences [22]. Hence, several software programmes for automated vessel analysis have been developed [23,24], including QUARTZ (QUantitative Analysis of Retinal vessel Topology and siZe) [25].

QUARTZ converts retinal images into quantitative measures of vessel morphometry for use in epidemiological studies [26–28]. QUARTZ analyses the entire retina (not limited to concentric areas around the optic disc), and evaluates image quality, vessel segmentation, arteriole/venule (A/V) classification, width, area, and tortuosity measurements of retinal vessels, and localisation of the optic disc (Fig 1) [4].

Fig 1. QUARTZ interface.

https://doi.org/10.1371/journal.pone.0290278.g001

QUARTZ has previously been validated on the UK Biobank dataset [27]. The UK Biobank contains data from 502,656 UK participants (40–69 years of age) collected from 2006 to 2010 [29]. Of all the participants, 68,549 had retinal images (45-degree field of view and 2048 x 1536 pixels image size) taken at baseline [27,29]. Output data from QUARTZ have previously been combined with epidemiological data from the UK Biobank cohort in the search for potential new biomarkers of disease. Studies have investigated the associations of vessel morphometry with glaucoma and with cardiometabolic risk factors, including its ability to predict myocardial infarction and stroke [4,30–33]. Although QUARTZ was originally developed for use in UK Biobank, it is important to examine its performance on multiple datasets acquired with different image capture systems and with images taken by experts and non-experts, because future versions of QUARTZ may be targeted at the clinic and should therefore demonstrate robust, high performance with few limitations across datasets [7,34]. Thus, the aim of this paper was to validate QUARTZ on a further dataset with a different image acquisition protocol. The performance of QUARTZ was investigated on the FOREVERP (Finding Ophthalmic Risk and Evaluating the Value of Eye exams and their predictive Reliability, Pilot) dataset from a Danish optician chain, and QUARTZ’s generalizability across datasets was examined by comparing its performance with previously published results from the UK Biobank.

Methods

The methods applied in this paper have been detailed previously by Welikala et al. [27]. The same methods were used to ensure the comparability of performance parameters across datasets.

The FOREVER dataset

Project FOREVER has been approved by the National Committee on Health Research Ethics, Denmark (project id H-21026000). The design and methodology of project FOREVER have been described in detail by Freiberg et al. [35]. Informed written consent is collected from participants enrolling in project FOREVER. The FOREVER (Finding Ophthalmic Risk and Evaluating the Value of Eye exams and their predictive Reliability) dataset contains data from Danish citizens aged over 18 years visiting an optician shop in Denmark. The dataset includes eye examinations: visual acuity, refraction, corneal thickness, intraocular pressure, retinal images and perimetry. A subset of the FOREVER dataset contains additional data on blood pressure, saliva samples for genetic analysis and Optical Coherence Tomography (OCT) scans. As Danish citizens have a unique social security number, the FOREVER dataset can be linked to the national registries, enabling comprehensive linkage to disease risk and outcome data.

Enrolment of participants in the FOREVER cohort began in July 2022. The dataset used for validating QUARTZ consisted of images from the same Danish optician shops as the FOREVER cohort. It comprised a subset of 3,682 images from 1,139 anonymized customers visiting an optician shop between February 2018 and May 2021, and is referred to as “FOREVERP” (FOREVER, Pilot). The images were randomly selected for validation of QUARTZ and were not taken from the FOREVER cohort, as validation was performed before enrolment in the FOREVER cohort began. However, images from the FOREVERP dataset are comparable to images from the FOREVER dataset because the image acquisition protocol is the same. Images from FOREVERP may eventually become part of the FOREVER database if FOREVERP participants decide to enrol in the FOREVER cohort by giving written consent.

The macular-centred retinal fundus images in “FOREVERP” were captured without mydriasis using digital non-mydriatic retinal cameras (Canon CR-2 AF) incorporating Canon EOS 70D and Canon EOS 80D cameras. The retinal image photographers were trained personnel who either 1) had attended a two-day course in fundus imaging, tonometry and perimetry enabling them to recognize errors and artefacts, or 2) had been trained by an optometrist. The optometrists receive continuous training, with two mandatory and four optional training days per year focusing on specialized training in eye diseases such as glaucoma and diabetic retinopathy. The images were collected from multiple visits over several years, and the number of images varied per participant. All images had a 45-degree field of view. Images were in BMP format, with sizes ranging from 1824 x 1216 pixels to 3984 x 2656 pixels, and were resized to 3984 x 2656 pixels.

Performance parameters

The performance of each algorithm was compared with a reference standard, or ground truth (GT), derived from data annotation performed by human observers (JF and RAW) [36] using purpose-built software. Comparisons with the GT were made at the relevant level (e.g. labelled pixels, images, vessel segments, vessel widths), and most were assessed by calculating the performance parameters sensitivity, specificity and accuracy (Table 1) [37]. Sensitivity is the percentage of positives that are correctly classified as positive (TP). Specificity is the percentage of negatives that are correctly classified as negative (TN). Accuracy is the proportion of all outcomes correctly predicted as either positive (TP) or negative (TN) [37,38].

Table 1. Performance parameters used for evaluating the performance of algorithms: sensitivity, specificity, and accuracy. TP = True Positive, TN = True Negative, FP = False Positive, and FN = False Negative.

https://doi.org/10.1371/journal.pone.0290278.t001
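The three parameters in Table 1 follow from the four confusion-matrix counts: sensitivity = TP / (TP + FN), specificity = TN / (TN + FP), and accuracy = (TP + TN) / (TP + TN + FP + FN). The sketch below computes them from paired labels. It is a generic illustration, not QUARTZ code, and the names `Confusion` and `confusion_from_labels` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Counts from comparing algorithm output with the ground truth (GT)."""
    tp: int  # GT positives classified as positive
    tn: int  # GT negatives classified as negative
    fp: int  # GT negatives classified as positive
    fn: int  # GT positives classified as negative


def sensitivity(c: Confusion) -> float:
    """TP / (TP + FN): share of GT positives that are detected."""
    return c.tp / (c.tp + c.fn)


def specificity(c: Confusion) -> float:
    """TN / (TN + FP): share of GT negatives that are correctly rejected."""
    return c.tn / (c.tn + c.fp)


def accuracy(c: Confusion) -> float:
    """(TP + TN) / all outcomes: share of outcomes that agree with the GT."""
    return (c.tp + c.tn) / (c.tp + c.tn + c.fp + c.fn)


def confusion_from_labels(predicted, truth) -> Confusion:
    """Tally paired labels (truthy = positive) into a confusion matrix."""
    pairs = [(bool(p), bool(t)) for p, t in zip(predicted, truth)]
    return Confusion(
        tp=sum(p and t for p, t in pairs),
        tn=sum(not p and not t for p, t in pairs),
        fp=sum(p and not t for p, t in pairs),
        fn=sum(not p and t for p, t in pairs),
    )
```

For pixel-wise vessel segmentation, for example, `predicted` and `truth` would be the flattened automated and GT vessel masks. For image quality, they would be per-image "inadequate" flags.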

Automated image quality

Supervised learning (a support vector machine classifier with a radial basis function kernel), using global shape features (area, fragmentation, and complexity) of the segmented vessel map, was used to classify images as of either inadequate or adequate quality. This approach was designed for use in epidemiological studies, so an image can still be deemed adequate even if only a portion of the vasculature is visible [26] (Fig 2). A total of 1,000 images were randomly selected and manually labelled by one human observer (RAW): 826 were labelled as of adequate quality and 174 as inadequate. The supervised classifier was trained on 500 images (using 5-fold cross-validation for model selection) and evaluated on a test set of 500 images. A TP outcome was an image correctly classified as being of inadequate quality (Fig 2). The probability output from the classifier was normalized to a scale from 0 to 1 and flipped to generate an image quality score (1 = highest quality).

Fig 2. Automated image quality assessment of FOREVERP images performed by QUARTZ.

(A and B) Examples of images of inadequate quality. (C and D) Examples of images of adequate quality.

https://doi.org/10.1371/journal.pone.0290278.g002
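The flip from classifier probability to quality score, together with the cut-off of 0.48 reported in the Results, can be sketched as follows. This is a minimal sketch that assumes the normalised classifier output is the probability of the inadequate (positive) class. The support vector machine itself is not shown, and the function names and `QUALITY_THRESHOLD` constant are hypothetical.

```python
# Images scoring <= 0.48 are labelled inadequate (threshold from the Results).
QUALITY_THRESHOLD = 0.48


def quality_score(p_inadequate: float) -> float:
    """Flip a probability of 'inadequate' into a quality score (1 = best).

    Assumption: the classifier output has already been normalised to [0, 1]
    and is the probability of the inadequate (positive) class.
    """
    if not 0.0 <= p_inadequate <= 1.0:
        raise ValueError("p_inadequate must lie in [0, 1]")
    return 1.0 - p_inadequate


def is_adequate(score: float, threshold: float = QUALITY_THRESHOLD) -> bool:
    """An image is adequate only if its quality score is above the threshold."""
    return score > threshold
```

Keeping the threshold as a parameter makes it easy to trade sensitivity against specificity when the tool is applied to a new dataset.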

Vessel segmentation

An unsupervised approach based on a multi-scale line detector and morphological reconstruction using hysteresis thresholding was used for vessel segmentation (Fig 3) [26]. The test set consisted of 10 randomly selected images of adequate quality. Two human observers (JF, RAW) independently labelled the test set manually, creating two separate sets of 10 images. The vessel segmentation from the first human observer (RAW) constituted the GT. The segmentations made by the second human observer (JF) were considered the target performance level that the automated segmentation should aim to achieve. Performance was evaluated per pixel, with and without pre- and post-processing. Pre-processing refers to the removal of pixels of bright intensity, whereas post-processing refers to the removal of the fovea and of small objects falsely segmented as vessels [26,27].

Fig 3. Automatic vessel segmentation of FOREVERP images performed by QUARTZ.

(A) Retinal fundus images. (B) Segmentation of vessels (green). (C) Segmentation of vessels (white).

https://doi.org/10.1371/journal.pone.0290278.g003
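Hysteresis thresholding by morphological reconstruction keeps weak vessel responses only when they are connected to a strong response. The removal of small falsely segmented objects drops tiny connected components. The pure-Python sketch below shows both steps on a precomputed line-detector response map. It assumes 8-connectivity, and the thresholds `low`, `high` and `min_size` are hypothetical. The multi-scale line detector itself and QUARTZ's actual parameters are not shown.

```python
from collections import deque


def _neighbours(r, c, rows, cols):
    # 8-connected neighbourhood, clipped to the image borders.
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if (dr or dc) and 0 <= r + dr < rows and 0 <= c + dc < cols:
                yield r + dr, c + dc


def hysteresis(response, low, high):
    """Reconstruct the strong-pixel marker (>= high) inside the weak mask (>= low)."""
    rows, cols = len(response), len(response[0])
    mask = [[v >= low for v in row] for row in response]
    out = [[False] * cols for _ in range(rows)]
    queue = deque((r, c) for r in range(rows) for c in range(cols)
                  if response[r][c] >= high)
    for r, c in queue:
        out[r][c] = True  # strong seeds are always vessel
    while queue:  # grow seeds through connected weak pixels
        r, c = queue.popleft()
        for nr, nc in _neighbours(r, c, rows, cols):
            if mask[nr][nc] and not out[nr][nc]:
                out[nr][nc] = True
                queue.append((nr, nc))
    return out


def remove_small_objects(binary, min_size):
    """Drop 8-connected components with fewer than min_size pixels."""
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    out = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if not binary[r][c] or seen[r][c]:
                continue
            component, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:  # flood-fill one component
                cr, cc = queue.popleft()
                component.append((cr, cc))
                for nr, nc in _neighbours(cr, cc, rows, cols):
                    if binary[nr][nc] and not seen[nr][nc]:
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            if len(component) >= min_size:
                for cr, cc in component:
                    out[cr][cc] = True
    return out
```

An isolated weak response that never touches a strong seed is discarded. This is what lets hysteresis use a permissive `low` threshold without admitting much background noise.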

Vessel width measurements

An unsupervised approach was used for measuring vessel widths. Centrelines (the thinned segmentation) and edge points (zero-crossings of the second derivative) were created, and the width was measured as the distance between edge points orthogonal to the vessel centreline orientation (Fig 4) [27]. The test set consisted of 2,150 vessel profiles from 10 images of adequate quality: 961 profiles were from normal vessel segments without a strong central reflex and with even illumination, 552 profiles showed a central reflex, and 637 profiles had low contrast or uneven illumination. Two human observers (JF, RAW) manually labelled the test set independently (Fig 5), and the mean of the two observers' measurements was used as the GT. A Bland-Altman plot was used to evaluate the agreement between QUARTZ and GT measurements.

Fig 4. Automatic vessel width measurements performed by QUARTZ on FOREVERP images.

(A) Vessel centre lines (blue). (B) Vessel centre lines (blue), vessel edges (red), and marking of optic disc (black).

https://doi.org/10.1371/journal.pone.0290278.g004
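The edge-point idea can be illustrated on a single intensity profile sampled across a dark vessel, perpendicular to its centreline. The edges are taken as the zero-crossings of the discrete second derivative nearest the centre on each side, and the width is the distance between them. This is a sketch under simplifying assumptions: one sample per pixel, no smoothing, and linear interpolation of the crossing. It is not the QUARTZ implementation, and `vessel_width` is a hypothetical name.

```python
import math


def vessel_width(profile, centre):
    """Width (pixels) of a dark vessel from a 1-D cross-sectional profile.

    profile: intensities sampled orthogonally to the centreline, 1 per pixel.
    centre:  index of the centreline sample within the profile.
    """
    if not 1 <= centre <= len(profile) - 2:
        raise ValueError("centre must have a neighbour on each side")
    # d2[k] approximates the second derivative at profile index k + 1.
    d2 = [profile[k] - 2 * profile[k + 1] + profile[k + 2]
          for k in range(len(profile) - 2)]

    def edge(step):
        k = centre - 1  # d2 index corresponding to the centre sample
        while 0 <= k + step < len(d2):
            a, b = d2[k], d2[k + step]
            if (a > 0) != (b > 0):
                # Interpolated zero between samples k and k + step,
                # mapped back to profile indices (+1).
                return k + step * a / (a - b) + 1
            k += step
        raise ValueError("no zero-crossing found on this side of the centre")

    return edge(+1) - edge(-1)


if __name__ == "__main__":
    # Synthetic dark vessel: Gaussian dip (sigma = 3 px) centred at index 10.
    # Its inflection points lie at +/- sigma, so the width should be ~2 * sigma.
    demo = [1 - 0.5 * math.exp(-((x - 10) ** 2) / (2 * 3.0 ** 2))
            for x in range(21)]
    print(f"estimated width: {vessel_width(demo, 10):.2f} px")
```

For a Gaussian-shaped dip, the second-derivative zero-crossings sit at the inflection points, so this edge definition measures roughly 2σ. A strong central reflex adds extra sign changes near the centre, which is one reason such profiles were analysed as a separate group.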

Fig 5. Vessel width measurements performed by a human observer.

(A) FOREVERP retinal image with vessel marked for width measurements (green). (B) Vessel segment showing a vessel centreline (green line). (C) Vessel segment with vessel widths (yellow crosses) marked by human observer.

https://doi.org/10.1371/journal.pone.0290278.g005
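A Bland-Altman analysis plots the mean of each pair of measurements against their difference and summarises agreement as the mean difference (bias) with 95% limits of agreement, bias ± 1.96 × SD of the differences. A minimal standard-library sketch follows. The function name is hypothetical, and the GT values would be the mean of the two observers' widths for each profile.

```python
import statistics


def bland_altman(method_a, method_b):
    """Bias and 95% limits of agreement for paired measurements (a - b)."""
    if len(method_a) != len(method_b) or len(method_a) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [a - b for a, b in zip(method_a, method_b)]           # y-axis
    means = [(a + b) / 2 for a, b in zip(method_a, method_b)]     # x-axis
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample SD (n - 1 denominator)
    return {
        "means": means,
        "diffs": diffs,
        "bias": bias,
        "lower": bias - 1.96 * sd,
        "upper": bias + 1.96 * sd,
    }
```

Taking differences as GT minus QUARTZ matches the sign of the difference reported in the Abstract (16.21 − 17.01 = −0.8 pixels). If the bracketed 1.96 there is the SD of the differences, the limits of agreement would be roughly −0.8 ± 3.84, that is, about −4.64 to 3.04 pixels.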

Arteriole and venule classification

Supervised learning was used for classifying vessels into arterioles or venules. This included the use of deep learning, specifically a 6-layered convolutional neural network [28]. A total of 100 images of adequate quality were randomly selected and divided into a training set, a validation set, and a test set consisting of 50, 15 and 35 images, respectively. Two human observers (JF, RAW) manually labelled 50 images each. Classification of vessels was evaluated on both pixels and vessel segments. A vessel segment refers to the part of a vessel between bifurcations and crossover points. The human observers used the following criteria for distinguishing between arterioles and venules [28,39]:

Colour: Venules appear darker than arterioles.
Diameter: Arterioles are thinner than adjacent venules.
Central reflex: The central reflex is wider in arterioles compared with venules of comparable size [28,40].
Branching: When labelling small vessels without colour differences or visible central reflexes, vessel branching was followed.

The sensitivity, specificity, and accuracy of the classification of arterioles and venules were examined at different probability thresholds: >0.5, >0.6, >0.7, >0.8, and >0.9.
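The text does not state how a probability threshold above 0.5 is applied. One plausible reading, assumed here and not confirmed by the source, is that a vessel pixel or segment is labelled only when the winning class probability exceeds the threshold, and is otherwise left unclassified. The sketch also assumes that arterioles are the positive class and that metrics are computed over labelled items only. All names are hypothetical.

```python
import math


def classify_av(p_arteriole, threshold):
    """Assumed rule: label only if the winning class probability > threshold."""
    if p_arteriole > threshold:
        return "arteriole"
    if 1.0 - p_arteriole > threshold:
        return "venule"
    return None  # left unclassified at this threshold


def _ratio(num, den):
    return num / den if den else math.nan


def av_metrics(probs, truth, threshold):
    """Metrics over labelled items; arterioles are the positive class."""
    tp = tn = fp = fn = labelled = 0
    for p, t in zip(probs, truth):
        label = classify_av(p, threshold)
        if label is None:
            continue
        labelled += 1
        if label == "arteriole" and t == "arteriole":
            tp += 1
        elif label == "arteriole":
            fp += 1
        elif t == "venule":
            tn += 1
        else:
            fn += 1
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "accuracy": _ratio(tp + tn, labelled),
        "fraction_labelled": _ratio(labelled, len(probs)),
    }


if __name__ == "__main__":
    # Toy probabilities and labels, for illustration only.
    probs = [0.95, 0.85, 0.65, 0.3, 0.1, 0.55]
    truth = ["arteriole", "venule", "arteriole", "venule", "venule", "arteriole"]
    for t in (0.5, 0.6, 0.7, 0.8, 0.9):
        print(t, av_metrics(probs, truth, t))
```

Under this rule, raising the threshold typically trades coverage (`fraction_labelled`) for more confident labels.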

Optic disc localization

An unsupervised approach was used to localize the optic disc in the macula-centred fundus images. This involved shade correction followed by locating the maximum intensity within a constrained search region. The test set comprised 300 images of adequate quality. One human observer manually labelled the images by marking the location of the optic disc.
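Shade correction followed by a constrained maximum search can be sketched as follows. A large mean filter estimates the slowly varying background, the background is subtracted, and the brightest corrected pixel inside a rectangular search region is returned. The mean-filter background, the `radius` value and the rectangle bounds are assumptions for illustration. The source does not give QUARTZ's correction method or search constraints, and the function names are hypothetical.

```python
def box_mean(img, radius):
    """Mean filter with a (2*radius + 1)^2 window, clipped at the borders."""
    rows, cols = len(img), len(img[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            r0, r1 = max(0, r - radius), min(rows, r + radius + 1)
            c0, c1 = max(0, c - radius), min(cols, c + radius + 1)
            vals = [img[i][j] for i in range(r0, r1) for j in range(c0, c1)]
            out[r][c] = sum(vals) / len(vals)
    return out


def shade_correct(img, radius):
    """Subtract the estimated background to flatten uneven illumination."""
    bg = box_mean(img, radius)
    return [[v - b for v, b in zip(row, brow)] for row, brow in zip(img, bg)]


def locate_optic_disc(img, radius, row_range, col_range):
    """(row, col) of the brightest shade-corrected pixel in the search region.

    row_range / col_range are half-open (start, stop) bounds of a
    hypothetical rectangular search region.
    """
    corrected = shade_correct(img, radius)
    best = None
    for r in range(*row_range):
        for c in range(*col_range):
            if best is None or corrected[r][c] > corrected[best[0]][best[1]]:
                best = (r, c)
    return best
```

Without shade correction, an illumination gradient across the image could make a bright border area outrank the optic disc. Subtracting the background removes that gradient, and restricting the search region guards against remaining border effects.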

Results

Automated image quality assessment

The ability of QUARTZ to detect low-quality images, evaluated on the 500 test set images, had a sensitivity of 91.95% and a specificity of 95.64%. Overall, 80.40% of the images in the test set were labelled as of adequate quality (TN and FN), and of these, 98.26% were correctly labelled as of adequate quality (TN). When the automated algorithm was applied to the full subset of 3,682 images, 80.55% (2,966 images) were labelled as of adequate quality, and 95.17% of the participants had at least one image labelled as of adequate quality (Table 2). As these numbers include images from several years, they may overestimate the actual proportion of participants with an image of adequate quality. Evaluating images from a single year (2021) showed consistent image quality, with 93.94% of the participants having at least one image labelled as adequate. The performance figures above correspond to labelling an image as inadequate if its quality score was ≤ 0.48.
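The four percentages above are linked through one confusion matrix. The paper reports percentages only, but the counts TP = 80, FN = 7, TN = 395 and FP = 18 were back-calculated for this illustration and reproduce all four figures. The implied 87 inadequate test images (17.4%) match the 174 of 1,000 labelled images. Treat these counts as a consistency check derived from the reported values, not as reported data.

```python
# Counts back-calculated for illustration so that they reproduce the
# percentages reported for the 500-image test set (not reported data).
TP, FN = 80, 7     # inadequate images: flagged / missed
TN, FP = 395, 18   # adequate images: passed / wrongly flagged


def pct(num, den):
    """Percentage rounded to two decimals, as in the text."""
    return round(100 * num / den, 2)


summary = {
    "sensitivity": pct(TP, TP + FN),                      # 91.95 reported
    "specificity": pct(TN, TN + FP),                      # 95.64 reported
    "labelled_adequate": pct(TN + FN, TP + TN + FP + FN), # 80.40 reported
    "correct_among_adequate": pct(TN, TN + FN),           # 98.26 reported
}
print(summary)
```

Images predicted as adequate are TN + FN (402 of 500), and the share of those that are truly adequate is TN / (TN + FN). This explains why 98.26% is higher than the 95.64% specificity.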

Table 2. Assessment of image quality of the FOREVERP dataset.

Image quality evaluated as the number (N) and percentage of participants with 0, 1, 2, 3, ≥1 and ≥3 images of adequate quality per participant, in total and for one year (2021).

https://doi.org/10.1371/journal.pone.0290278.t002
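Table 2 aggregates image-level labels to participants. The sketch below performs that aggregation on toy data. It uses a hypothetical `(participant_id, year, quality_score)` tuple layout and applies the 0.48 cut-off. It also shows why pooling images from several years can raise the share of participants with at least one adequate image compared with a single year.

```python
from collections import defaultdict


def participant_summary(images, threshold=0.48, year=None):
    """Adequate-image counts per participant and % of participants with >= 1.

    images: iterable of (participant_id, year, quality_score) tuples.
    An image is adequate if its score is strictly above the threshold.
    If year is given, only images from that year are considered.
    """
    adequate = defaultdict(int)
    for pid, img_year, score in images:
        if year is not None and img_year != year:
            continue
        adequate[pid] += score > threshold  # bool adds 0/1; registers pid
    if not adequate:
        raise ValueError("no images to summarise")
    with_any = sum(1 for n in adequate.values() if n >= 1)
    return dict(adequate), 100 * with_any / len(adequate)


if __name__ == "__main__":
    # Toy data, for illustration only.
    imgs = [("a", 2019, 0.9), ("a", 2021, 0.3), ("b", 2021, 0.2),
            ("b", 2018, 0.6), ("c", 2021, 0.7)]
    print(participant_summary(imgs))             # pooled over all years
    print(participant_summary(imgs, year=2021))  # single year
```

In the toy data, every participant has an adequate image somewhere across the years, but only one of three does in 2021 alone. This mirrors, in exaggerated form, the caution in the Results about multi-year pooling.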