Introduction

With the availability of highly effective anti-VEGF therapies for the treatment of wet age-related macular degeneration (AMD),1, 2 early detection of the disease, as well as of changes in macular function during treatment and follow-up, has greatly gained in importance.3, 4

To date, the Amsler grid and preferential hyperacuity perimetry (PHP) are two frequently used tests in the diagnostic work-up in clinical practice.5, 6 In addition, several modifications and new tests, such as the shape discrimination hyperacuity test,7, 8 have been proposed, but their diagnostic value has not yet been systematically studied. In particular, their role as screening tools needs to be established.

Early targeted treatment is a key component of successful AMD management. A recent study by Lim and co-workers showed that delayed intervention leads to insufficient treatment, irreversible macular damage, and poorer visual outcomes.9 In their study, a delay of 14 weeks doubled the likelihood of vision worsening after treatment.

To clarify the diagnostic potential of the Amsler grid, the PHP, and other tests, we performed a systematic review and meta-analysis of diagnostic test accuracy studies investigating the concordance of index test results with the presence or absence of wet age-related macular degeneration.

Materials and methods

This review was conducted according to the PRISMA statement recommendations.10

Literature search

Electronic searches were performed without any language restriction in MEDLINE (PubMed interface), Scopus (from inception until August 28th, 2013), and Web of Science (by citation searching). The full search algorithm is available on request.

Eligibility criteria

The minimum requirement was the availability of original data and the possibility to construct a two-by-two table. We accepted the following reference tests for classifying the presence or absence of AMD: optical coherence tomography (OCT), color fundus photography, fluorescein angiography (FA), and scanning laser ophthalmoscopy (SLO).

Study selection, data extraction, and quality assessment

The methodological quality of all eligible papers was assessed based on published recommendations.11 In line with recommendations by Whiting et al,12 we refrained from rating or ranking the findings. Quality assessment involved scrutinizing the methods of data collection and patient selection, as well as the descriptions of the index test and the reference standard. Blinding was considered fulfilled if the person(s) classifying the presence or absence of AMD did not know the results of the index examination or of alternative reference standard investigations. Two reviewers independently assessed papers and extracted data using a standardized form (the data extraction form is available on request). Discrepancies were resolved by consensus between the two reviewers.

Statistical analysis

For each study, we constructed a two-by-two contingency table consisting of true-positive (TP), false-positive (FP), false-negative (FN), and true-negative (TN) results. For the analysis, we classified a result as a true positive if the Amsler grid or PHP finding was in agreement with the reference standard findings. We calculated sensitivity as TP/(TP+FN) and specificity as TN/(FP+TN). We estimated and plotted summary receiver operating characteristic (ROC) curves using a unified model for meta-analysis of diagnostic test accuracy studies.13 We also indicated the confidence and prediction regions on the ROC figures. This approach provides estimates of average sensitivity and specificity across studies, a 95 percent confidence region for this summary point, and prediction regions within which the sensitivity and specificity of 95 percent of future studies are expected to lie.
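As an illustration of these definitions (a minimal sketch only, not the authors' Stata analysis; the counts are hypothetical), sensitivity and specificity can be computed from a single two-by-two table as follows:

```python
# Illustrative sketch: sensitivity and specificity from one two-by-two table.
# The counts are hypothetical and do not correspond to any included study.
tp, fp, fn, tn = 40, 3, 10, 97  # true positives, false positives, false negatives, true negatives

sensitivity = tp / (tp + fn)   # proportion of eyes with wet AMD detected by the index test
specificity = tn / (fp + tn)   # proportion of eyes without wet AMD correctly classified as negative

print(f"sensitivity = {sensitivity:.2f}, specificity = {specificity:.2f}")
# -> sensitivity = 0.80, specificity = 0.97
```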

For methodological reasons, the minimum requirement for inclusion in the meta-analysis was at least four studies providing a two-by-two table. Thus, a meta-analysis was not possible for the M-charts (two studies) and the macular computerized psychophysical test (MCPT) (one study).

Following recent recommendations, we did not pool positive and negative likelihood ratios directly, because these are not sensible parameters to analyze statistically in a meta-analysis.14 Instead, we calculated the likelihood ratios from the estimated pooled sensitivities and specificities.
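The relationship used here is LR+ = sensitivity/(1−specificity) and LR− = (1−sensitivity)/specificity. The following sketch is illustrative only: it uses the rounded pooled Amsler grid estimates reported in the Results, whereas the published ratios were presumably derived from unrounded model estimates and therefore differ slightly.

```python
# Likelihood ratios derived from pooled sensitivity and specificity.
# Inputs are the rounded pooled Amsler grid estimates from the Results section;
# the published ratios were presumably computed from unrounded estimates and differ slightly.
pooled_sensitivity = 0.78
pooled_specificity = 0.97

lr_positive = pooled_sensitivity / (1.0 - pooled_specificity)   # ~26 with rounded inputs (reported: 23.1)
lr_negative = (1.0 - pooled_sensitivity) / pooled_specificity   # ~0.23 (reported: 0.23)

print(f"LR+ = {lr_positive:.1f}, LR- = {lr_negative:.2f}")
```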

All analyses were done using the Stata 11.2 statistical software package (StataCorp 2009, Stata Statistical Software: Release 11, StataCorp LP, College Station, TX, USA).

Results

Study selection

Electronic searches retrieved 1422 records. After excluding duplicates, 1319 records remained and were screened based on title and abstract. Of these, 1289 records were excluded because they did not investigate the diagnostic accuracy of tests, contained no primary data, or investigated other conditions. The remaining 30 articles were retrieved and read in full text for consideration for inclusion. Of these, 10 studies fulfilled our inclusion criteria. In addition, two studies were identified by screening of reference lists and the science citation index database. Thus, 12 studies were included in the quantitative analyses. The study selection process is outlined in Figure 1.

Figure 1. Study flowchart.

Patients’ characteristics, design features

The 12 studies enrolled 903 patients. Among studies reporting this information, on average 58 percent of participants were women. The included studies also involved patients with other diagnoses, ie, geographic atrophy and dry AMD, but only patients with wet AMD were included in the analysis. In nine studies, patients were enrolled consecutively. Patients’ characteristics are summarized in Table 1.

Table 1 Study characteristics, patient population

Index and reference tests

Seven studies investigated the diagnostic accuracy of the Amsler grid test or its modifications, and seven investigated the PHP, while two included studies evaluated the M-charts and one study investigated the Macular Computerized Psychophysical Test (MCPT).

Color fundus photography, fluorescein angiography, and OCT were the most commonly used reference tests, whereas SLO was used in only two studies and the Amsler grid served as the reference in only one study (against the M-chart). Table 2 shows the index and reference tests applied in each study.

Table 2 Summary of index and reference tests applied in the various included studies

Test performance

The 12 studies allowed the construction of 27 two-by-two tables. Twelve tables reported on the Amsler grid and its modifications, twelve tables reported on the PHP, one table assessed the MCPT, and two tables assessed the M-charts. Across the twelve Amsler grid tables, sensitivity ranged from 0.34 to 1.0 and specificity ranged from 0.85 to 1.0. Across the twelve PHP tables, sensitivity ranged from 0.68 to 1.0 and specificity ranged from 0.71 to 0.97. The reported sensitivity of the MCPT was 0.94 and its specificity was 0.94. The mean sensitivity of the two studies reporting on the M-chart was 0.81 and the specificity was 1.0. Detailed results are provided in Table 3.

Table 3 Test performance characteristics

Results from HSROC analysis

The pooled sensitivity of studies assessing the Amsler grid was 0.78 (95% confidence interval (CI): 0.64–0.87) and the pooled specificity was 0.97 (95% CI: 0.91–0.99). The corresponding positive and negative likelihood ratios were 23.1 (95% CI: 8.4–64.0) and 0.23 (95% CI: 0.14–0.39), respectively.

The pooled sensitivity of studies assessing the PHP was 0.85 (95% CI: 0.80–0.89) and the pooled specificity was 0.87 (95% CI: 0.82–0.91). The corresponding positive and negative likelihood ratios were 6.7 (95% CI: 4.6–9.8) and 0.17 (95% CI: 0.13–0.23), respectively (see Figures 2 and 3).

Figure 2. Hierarchical summary ROC curve of studies assessing the Amsler grid.

Figure 3. Hierarchical summary ROC curve of studies assessing the PHP.

Assuming a one percent prevalence of wet AMD in the screening setting and applying Bayes’ theorem (pre-test odds × likelihood ratio (LR) = post-test odds), a positive Amsler grid test would increase the probability of wet AMD being present to 18.9 percent (pre-test odds at a 1% prevalence = 1%/(100%−1%) = 0.0101; multiplied by the positive LR of 23.1, this gives post-test odds of 0.2333; the probability of wet AMD after a positive test result is therefore 0.2333/(1+0.2333) = 18.9%). That is, in a (mass) screening population with a 1 percent probability of wet AMD, approximately every fifth person with a positive Amsler grid test result would have wet AMD. Correspondingly, a negative test would decrease the probability to 0.23 percent. In the case of the PHP, the probability given a positive test would be 6.3 percent. Given a negative test result, the probability would decrease to 0.17 percent.
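The worked example above can be reproduced step by step (a minimal sketch, using the pooled likelihood ratios reported above and the assumed 1 percent screening prevalence):

```python
# Post-test probabilities via Bayes' theorem, reproducing the worked example above.
def post_test_probability(prevalence: float, likelihood_ratio: float) -> float:
    """Convert a pre-test probability and a likelihood ratio into a post-test probability."""
    pre_test_odds = prevalence / (1.0 - prevalence)      # e.g. 0.01 -> 0.0101
    post_test_odds = pre_test_odds * likelihood_ratio    # Bayes: post-test odds = pre-test odds * LR
    return post_test_odds / (1.0 + post_test_odds)       # odds back to probability

prevalence = 0.01  # assumed 1% prevalence of wet AMD in a (mass) screening setting

# Pooled likelihood ratios from the HSROC analysis reported above.
for test, lr_pos, lr_neg in [("Amsler grid", 23.1, 0.23), ("PHP", 6.7, 0.17)]:
    print(f"{test}: after positive test {post_test_probability(prevalence, lr_pos):.1%}, "
          f"after negative test {post_test_probability(prevalence, lr_neg):.2%}")
# -> Amsler grid: 18.9% / 0.23%; PHP: 6.3% / 0.17%
```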

Discussion

Main findings

A meta-analysis of the two commonly used screening tests for wet AMD, the Amsler grid and the PHP, assessed in small patient samples, showed both to be promising candidates for ruling out the disease. However, most of the studies were so-called diagnostic case–control studies, ie, test results of patients with diagnosed wet AMD were compared with test results of healthy subjects or another sampled group of patients. Although this design may be appropriate in the early, proof-of-concept phase of evaluation, it is prone to exaggerating test performance.11 For the MCPT and the M-chart, data were too scarce to perform a meta-analysis.

Results in light of existing literature

To our knowledge, this is the first comprehensive assessment of studies examining the diagnostic value of various screening tests for age-related macular degeneration. We are aware of one systematic review by the US Preventive Services Task Force examining the evidence on tools to screen for impaired visual acuity in elderly adults. That review did not quantify the various tests specifically in patients with age-related macular degeneration but across a broader spectrum, including patients with other ocular conditions leading to impaired vision, such as cataract, strabismus, and amblyopia.15 In 2007, Crossland and Rubin16 provided an unsystematic overview of the diagnostic value of the Amsler grid and the PHP. In their comprehensive paper, they found low sensitivity and specificity for the Amsler grid. For the PHP, they reported a higher sensitivity but a lower specificity than for the Amsler grid. Our findings partly disagree with theirs. Although we found large variability in both sensitivity and specificity, expressed by the large confidence regions in the meta-analysis, the average performance of these tests in preliminary studies was clearly higher than that reported by Crossland and Rubin. Whether these findings translate into real clinical practice still needs to be investigated. We envision that the ideal screening test has excellent test performance in the relevant clinical setting and is easy to handle, apply, and interpret. It might also be useful to adopt home blood pressure monitoring as a role model for AMD screening and therapeutic management. Given that disease progression may occur before visual distortion is noticed, the test should identify very early phases of AMD progression.

Strengths and limitations

Our study applied up-to-date systematic review methodology and used state-of-the-art statistical methods for the quantitative summaries.13 A stratified pooled analysis was not possible because of the limited number of studies overall and per clinical subgroup. We therefore also refrained from exploring factors explaining heterogeneity and did not formally test for heterogeneity. A further important limitation of our meta-analysis is that many studies used different, and arguably sub-optimal, reference tests. This limits not only the validity of these studies but also the validity of the meta-analysis, because an inappropriate reference test will lead to biased estimates of test performance.17 Most of the included studies used a so-called diagnostic case–control design. Again, we were unable to perform a stratified analysis based on this item. Arguably, mixing the effects found in prospective cohorts and case–control studies introduced bias into our results.11 However, given the ‘proof of concept’ purpose of this review, we believe that our decision is acceptable, although conclusions drawn from our analysis must be made very cautiously. We agree with other authors that the ideal test for AMD screening still needs to be found.16 Even for the Amsler grid, which has been around for over 60 years and is broadly used in clinical practice, there is as yet no compelling evidence supporting its usefulness. The PHP, which seems to be a valid alternative, likewise still needs to confirm its potential in daily practice. Finally, we excluded various papers describing new tests because of a lack of data to construct a two-by-two table. It might therefore be justified to repeat our analysis in a few years when additional data emerge.

Implications for practice

The Amsler grid test has been in use for over 60 years. From the time of Amsler’s publication in 1947 until the late 1990s, however, it played only a minor role as a screening test in patient management. This might be one reason for the relatively weak body of evidence assessing its diagnostic usefulness. Only recently, with the availability of various effective anti-VEGF treatments, has its possible role in patient management become apparent. The development of the PHP, and particularly of the home-testing ForeseeHome device (Notal Vision Ltd, Tel Aviv, Israel) based on this technology, also needs to be seen in this context.6 Very recently, the HOME study showed an advantage of PHP home monitoring in patients with choroidal neovascularisation (CNV), because regular home testing detected new CNV development at an earlier stage.18 If confirmed in cost-effectiveness analyses, this method could be promoted for clinical use in these patients. A second line of research focuses on shape discrimination hyperacuity testing.8, 19 Very recently, a mobile phone version of this test has become available, and its feasibility and usability are currently being assessed.7, 20 Whether the Amsler grid or the PHP should be used in a (mass) screening context still needs to be examined. From our preliminary analysis, we extrapolate that approximately one out of five patients with a positive Amsler grid test actually has wet AMD needing treatment. This figure might actually be too low. However, once this figure is validly established, it must be assessed whether one early, otherwise undetected case of wet AMD among every five patients referred to an ophthalmologist is a sufficient yield for use in a screening context. In the case of the PHP, our study showed a somewhat lower yield. Again, however, a valid number is still required before its usefulness in screening can be assessed.

Implication for further research

As stated above, our results need confirmation in carefully designed clinical studies. Moreover, issues of practicability need to be considered; for example, the ease of application and interpretation of a screening test is another important aspect. It has been argued that Amsler grid testing is difficult to perform correctly and thus often leads to uncertain test results. In 1986, for example, Fine et al21 raised awareness that screening with an Amsler grid is not fully self-explanatory: only about 10 percent of patients spontaneously complained about distorted vision when using it on their own, and this figure rose substantially with proper instruction and supervision. This review of test accuracy studies was unable to address this important aspect of testing. Arguably, test performance will drop in clinical practice if the Amsler grid test is performed without clear instructions and monitoring of patients. To assess this issue, further studies are needed, particularly ones examining the clinical impact of screening on patient-relevant outcomes.

Conclusion

Results from small preliminary studies show promising test performance characteristics for both the Amsler grid and the PHP in the diagnostic work-up of wet AMD. On the basis of test performance, the Amsler grid showed some advantages in ruling in wet AMD and could thus help in monitoring the disease, but the data were very heterogeneous. The PHP, in turn, had small advantages over the Amsler grid in ruling out wet AMD and could thus be useful in a screening context. However, to what extent our findings can be transferred to real clinical practice still needs to be established. Moreover, new promising technologies with theoretical advantages over the Amsler grid and the PHP are currently emerging and need careful clinical examination to confirm their usefulness in a screening and monitoring context. If their usefulness is confirmed, their impact on patient management needs to be quantified in further studies.