Systematic review of diagnostic tests for vaginal trichomoniasis.

OBJECTIVE
To review critically and to summarize the evidence of diagnostic tests and culture media for the diagnosis of Trichomonas vaginitis.


METHODS
We performed a systematic review of literature indexed in MEDLINE of studies that used Trichomonas culture as the reference standard (9,882 patients, 35 studies). Level I studies (5,047 patients, 13 studies) fulfilled at least two of three criteria: 1) consecutive patients were evaluated prospectively, 2) decision to culture was not influenced by test results, and 3) there was independent and blind comparison to culture.


RESULTS
The sensitivity of the polymerase chain reaction technique (PCR) was 95% (95% CI 91% to 99%), and the specificity was 98% (95% CI 96% to 100%). One study was classified as Level I evidence (52 patients). The sensitivity of the enzyme-linked immunosorbent assay was 82% (95% CI 74% to 90%), and the specificity was 73% (95% CI 35% to 100%). The sensitivity of the direct fluorescence antibody was 85% (95% CI 79% to 90%), and the specificity was 99% (95% CI 98% to 100%). Sensitivities of culture media were 95% for Diamond's, 96% for Hollander, and 95% for CPLM.


CONCLUSIONS
The sensitivity and specificity of tests to diagnose trichomoniasis vary widely.

Unfortunately, culture media are not widely available to the practicing physician, and they require 2-7 days to obtain results. The InPouch TV, 16 a technique that has recently received attention, is a commercially available medium consisting of a two-chambered bag that allows culture and microscopic examination of the specimen. Other tests to identify Trichomonas include polymerase chain reaction (PCR), enzyme-linked immunosorbent assay (ELISA), direct fluorescence antibody assay (DFA), enzyme immunoassay (EIA), dot-immunobinding assay (DIBA), indirect fluorescent antibody (IFA) assay, agglutination test (AT), and stained smear techniques (Pappenheim stain, Papanicolaou smear). 6 The latter tests were not mentioned in the 1998 Centers for Disease Control and Prevention (CDC) guidelines for the treatment of sexually transmitted diseases 17; currently, no guideline regarding the diagnosis of trichomoniasis is available.
Clinical investigations may result in inaccurate estimates of sensitivity and specificity if: 1) no reference standard is used, 2) patients are not evaluated prospectively, 3) the test result or reference standard influences the decision to perform the comparison test, or 4) the tests are not examined blindly and independently. 18-21 In a recent meta-analysis of the wet mount and Papanicolaou smear for the diagnosis of trichomoniasis, Wiese et al. 12 found that 74 of 104 studies did not use a reference standard. Although several investigations have suggested the utility of other diagnostic tests for trichomoniasis, their methodologic validity has not been critically examined. We conducted a systematic review of other diagnostic tests for Trichomonas vaginitis in order to obtain overall estimates of test sensitivity and specificity. In addition, we reviewed the accuracy of various culture media. Our systematic review of the evidence may help with development of guidelines for the diagnosis of trichomoniasis.
The key words to identify trichomoniasis were: explode (exp) Trichomonas, exp Trichomonas infections, Trichomonas vaginalis, and Trichomonas vaginitis. Any terms under each subheading were also retrieved. The text word Trichomonas and the wildcard word trichomon$ were also searched. The key words to identify diagnostic tests were: exp sensitivity and specificity, exp diagnostic errors, diagnostic tests routine, multiphasic screening, likelihood functions, diagnosis-differential, false-positive reactions, exp false-negative reactions, exp diagnosis, receiver operating curve, sensitivity (text word), and specificity (text word). 22,23 References listed in these published studies and in recent review articles were retrieved. 6,13,24 The search yielded 584 articles. We used Biblio-Link II and Procite software (Research Information Systems, Carlsbad, CA) to catalog references.

SUBJECTS AND METHODS
Two investigators reviewed each title and abstract independently to screen for eligible studies. When the two investigators disagreed on eligibility, the full article was reviewed and the disagreement was resolved by consensus among four investigators. We excluded 436 articles: 402 did not describe diagnostic tests, and 34 lacked a reference standard. The kappa regarding the appropriateness of exclusion was 0.71. Values of kappa reflect agreement that is slight (0 to <0.2), fair (0.2 to <0.4), moderate (0.4 to <0.6), substantial (0.6 to <0.8), or almost perfect (0.8 to 1). 25 Two investigators independently examined the remaining 148 articles. We excluded 113 articles without disagreement: 50 did not describe diagnostic tests, 12 could not be translated (written in Slovak, Polish, Russian, Korean, or Czech), 43 lacked a reference standard, and eight pertained to the wet mount or the Papanicolaou smear. Thus, we selected 35 articles for analysis. 11,6,26-58 We had no disagreement regarding the inclusion of articles. One publication used two study designs.
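
The agreement statistic and its interpretation bands can be illustrated with a short sketch. It assumes the statistic is Cohen's kappa for two raters making an include/exclude decision (the text says only "kappa"); the function names and the counts used in any call are hypothetical and are not data from this review.

```python
def cohen_kappa(both_yes, a_only, b_only, both_no):
    """Cohen's kappa for two raters making a yes/no call on the same items.

    Arguments are the four cells of the rater-by-rater agreement table
    (hypothetical layout, assumed for illustration).
    """
    n = both_yes + a_only + b_only + both_no
    observed = (both_yes + both_no) / n
    # Agreement expected by chance, from each rater's marginal "yes" rate.
    a_yes = (both_yes + a_only) / n
    b_yes = (both_yes + b_only) / n
    expected = a_yes * b_yes + (1 - a_yes) * (1 - b_yes)
    return (observed - expected) / (1 - expected)


def agreement_label(kappa):
    """Map a kappa value to the bands quoted in the text."""
    if kappa < 0 or kappa > 1:
        return "outside the quoted range"
    if kappa < 0.2:
        return "slight"
    if kappa < 0.4:
        return "fair"
    if kappa < 0.6:
        return "moderate"
    if kappa < 0.8:
        return "substantial"
    return "almost perfect"
```

Under these bands, the kappa of 0.71 reported for exclusion decisions falls in the "substantial" range.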

Quality Criteria for Validity of Studies
The reference standard was Trichomonas culture in one or more media with/without the wet mount; i.e., Trichomonas was said to be present when the organism was identified in one or more culture media or when the motile organism was seen in the wet mount. The wet mount alone was not considered a reference standard. We included only studies that sampled the vagina (Table 1). Culture media reviewed either were prepared in the individual laboratories or were commercially available (such as Diamond's medium from Carr Scarborough Microbiologicals, Stone Mountain, GA; InPouch TV from Bio Med Diagnostics Inc., Santa Clara, CA; Trichosel broth from Becton-Dickinson Microbiology Systems, Cockeysville, MD).
Studies were classified as Level I when they explicitly fulfilled at least two of three validity criteria: 1) consecutive patients were evaluated prospectively, 2) the test result did not influence the decision to perform the reference standard, and 3) the test of interest and reference standard were examined blindly and independently (Table 2). 18 Studies that fulfill these methodologic criteria are more likely to provide accurate estimates of sensitivity and specificity. 18 Studies were classified as Level II or III, respectively, when any one, or none, of the criteria was fulfilled.

Table 1 footnotes: 2, Specialty clinic, urology, obstetrics, gynecology, parasitology; STD, sexually transmitted diseases. 3, Same article, which used two study designs. 4, CPLM (cysteine-peptone-liver medium). 5, InPouch TV (proteose-peptone medium).

Table 2. Level I: reference standard and two or more other criteria. Level II: reference standard and one other criterion. Level III: reference standard.
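
The level-of-evidence rule described above reduces to counting how many of the three validity criteria a study explicitly meets. The following is a minimal sketch of that rule; the function and parameter names are hypothetical, and it assumes the study already used a reference standard, since that was required for inclusion.

```python
def evidence_level(consecutive_prospective, decision_not_influenced, blind_independent):
    """Return "I", "II", or "III" for a study that used a reference standard.

    Each argument is True when the study explicitly met that criterion:
      consecutive_prospective: consecutive patients evaluated prospectively
      decision_not_influenced: test result did not influence the decision
                               to perform the reference standard
      blind_independent:       test and reference standard examined blindly
                               and independently
    """
    criteria_met = sum(
        [bool(consecutive_prospective), bool(decision_not_influenced), bool(blind_independent)]
    )
    if criteria_met >= 2:
        return "I"
    if criteria_met == 1:
        return "II"
    return "III"
```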
The articles were randomly distributed among raters with expertise in evidence-based medicine.
Two raters independently abstracted validity criteria and data from 2 × 2 contingency tables. Disagreement was resolved by consensus among four raters examining the full article. The kappa values for interrater agreement on the three study validity criteria were 0.48 for consecutive patient evaluation, 0.17 for influence of the test result on the decision to perform the reference standard, and 0.61 for independent evaluation of the test and reference standard.
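
For readers who want to see how an abstracted contingency table translates into test characteristics, the sketch below computes the standard measures from the four cell counts, with culture as the reference standard. The function name and the cell labels are assumptions made for illustration, and it assumes every row and column of the table is non-empty.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Test characteristics from one 2 x 2 table (test result vs. reference standard).

    tp: test positive, culture positive    fp: test positive, culture negative
    fn: test negative, culture positive    tn: test negative, culture negative
    """
    total = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "prevalence": (tp + fn) / total,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        # With no false positives the positive likelihood ratio is unbounded.
        "lr_pos": sensitivity / (1 - specificity) if specificity < 1 else float("inf"),
        # With no true negatives the negative likelihood ratio is unbounded.
        "lr_neg": (1 - sensitivity) / specificity if specificity > 0 else float("inf"),
    }
```

A study with no false-positive results has a specificity of 100%, which is how a positive likelihood ratio of infinity can arise.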

Statistical Analysis
Prevalence, sensitivity, specificity, positive and negative predictive values, and likelihood ratios were calculated. Homogeneity of sensitivity and specificity between studies was explored with the χ² test. 59-61 Studies were considered homogeneous when the result of an individual study was mathematically compatible with the results of any of the others. We used a random-effects model to pool estimates of sensitivity and specificity. 59 Statistical methods are not available to pool likelihood ratios, so a weighted likelihood ratio could not be calculated. 18,6-6s We calculated an overall positive likelihood ratio (LR+) by using pooled estimates of sensitivity and specificity: LR+ = sensitivity/(1 − specificity). SPSS 8.0 software was used to perform statistical analyses (SPSS Inc., Chicago, IL).
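
The pooling and the LR+ calculation can be sketched as follows. The text does not state which random-effects estimator was used, so this example assumes the DerSimonian-Laird method applied to untransformed proportions, with Cochran's Q as the χ² homogeneity statistic; the function names and the 0.5 variance adjustment are illustrative choices, not the authors' SPSS procedure.

```python
import math


def pool_random_effects(events, totals):
    """Pool per-study proportions (e.g., sensitivities) with DerSimonian-Laird weights.

    events[i] / totals[i] is the proportion in study i, such as test-positive
    patients over culture-positive patients.
    """
    props = [e / n for e, n in zip(events, totals)]
    # Within-study variance p(1-p)/n. The proportion is nudged away from 0 and 1
    # for the variance only, so a 0% or 100% study does not get infinite weight.
    variances = []
    for e, n in zip(events, totals):
        p_adj = (e + 0.5) / (n + 1)
        variances.append(p_adj * (1 - p_adj) / n)

    w = [1 / v for v in variances]
    fixed = sum(wi * p for wi, p in zip(w, props)) / sum(w)

    # Cochran's Q: compare with a chi-square distribution on df degrees of freedom.
    q = sum(wi * (p - fixed) ** 2 for wi, p in zip(w, props))
    df = len(props) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if df > 0 and c > 0 else 0.0

    # Random-effects weights add the between-study variance tau2.
    w_re = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * p for wi, p in zip(w_re, props)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    ci95 = (max(0.0, pooled - 1.96 * se), min(1.0, pooled + 1.96 * se))
    return {"pooled": pooled, "ci95": ci95, "q": q, "df": df, "tau2": tau2}


def lr_positive(sensitivity, specificity):
    """LR+ = sensitivity / (1 - specificity), from pooled estimates."""
    if specificity >= 1:
        return math.inf
    return sensitivity / (1 - specificity)
```

As a check on the formula, the pooled PCR estimates from the abstract (sensitivity 0.95, specificity 0.98) give 0.95/0.02 = 47.5, in line with the LR of 48 reported for PCR.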

RESULTS
Overall, 31% of diagnostic test studies utilized a reference standard (35/112 studies; Table 1). Among the validity criteria, prospective evaluation of consecutive patients was reported in 33% of studies; the test result not influencing the decision to perform Trichomonas culture as a reference standard, in 78%; and independent and blind examination of the cultures, in 11%. Table 1 shows the characteristics of the 35 articles (9,882 patients); one publication used two study designs. Thirteen studies (36%) were classified as Level I (5,047 patients), 15 (42%) as Level II (3,970 patients), and eight (22%) as Level III (865 patients). No consistent details of patient information across studies were available from the original papers. Asymptomatic patients accounted for 11% of the reports, patients with/without symptoms 64% (no breakdown of estimates among groups provided), and the remainder of the reports did not specify whether patients were symptomatic or not.

PCR Technique
Six studies examined the test characteristics of the PCR (1,973 patients; Table 3). The pooled sensitivity was 95% (95% CI 91% to 99%), the pooled specificity was 98% (95% CI 96% to 100%), and the LR was 48. One study was classified as Level I (52 patients), five as Level II (1,921 patients), and none as Level III. The overall estimates of sensitivity were homogeneous. The overall estimates of specificity were heterogeneous.
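
ELISA Technique
For the enzyme-linked immunosorbent assay, the pooled sensitivity was 82% (95% CI 74% to 90%), and the pooled specificity was 73% (95% CI 35% to 100%). The overall estimates of sensitivity and specificity were heterogeneous.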

DFA Technique
Three studies examined the test characteristics of the DFA technique (809 patients; Table 3). The pooled sensitivity was 85% (95% CI 79% to 90%), the pooled specificity was 99% (95% CI 98% to 100%), and the LR was 85. Two studies were classified as Level I (704 patients), one as Level II (105 patients), and none as Level III. The overall esti-

Other Techniques
Six studies examined nine other techniques (Table 3). The sensitivities ranged from 4% to 93%, and the specificities ranged from 62% to 100%. The LR ranged from 2 to infinity. No studies were classified as Level I, three as Level II, and three as Level III.

Culture Media
Twenty studies examined the test characteristics of 11 culture media techniques (Table 4). The pooled sensitivity for all studies was 90% (95% CI 87% to 93%). The Diamond's culture medium was examined in 10 studies (3,568 patients; Table 4). The pooled sensitivity was 95% (95% CI 93% to 98%). Six studies were classified as Level I (2,571 patients), three as Level II (937 patients), and one as Level III (60 patients). The overall estimates of sensitivity were heterogeneous.

Table footnotes: Diamond-modified. 2, Cervix sampling. 3, Self-collected sample. 4, Collected by physician. *, Denotes heterogeneity of data (P < 0.05).

INFECTIOUS DISEASES IN OBSTETRICS AND GYNECOLOGY 253
The Hollander culture medium was examined in two studies (810 patients; Table 4). The pooled sensitivity was 96% (95% CI 93% to 100%). Both studies were classified as Level I. The overall estimates of sensitivity were homogeneous. The CPLM culture medium was examined in one study (667 patients; Table 4). The pooled sensitivity was 95% (95% CI 80% to 100%). Both studies were classified as Level I. The overall estimates of sensitivity were homogeneous.
The InPouch TV technique was examined in four studies (1,315 patients; Table 4). The pooled sensitivity was 84% (95% CI 79% to 89%). One study was classified as Level I (715 patients), two as Level II (332 patients), and one as Level III (268 patients). The overall estimates of sensitivity were homogeneous.

DISCUSSION
We performed a systematic review of tests compared with a reference standard to help with the development of guidelines for the diagnosis of trichomoniasis. Ideally, a test should have high sensitivity and specificity and be easily available, simple to perform, and inexpensive. Currently, for the diagnosis of trichomoniasis, the wet mount is the least costly to perform, yet its sensitivity is poor. In the latest guidelines for the treatment of sexually transmitted diseases, the CDC reports that "The motile T. vaginalis is identified easily in the saline specimen [and] culture for T. vaginalis is more sensitive than microscopic examination." 17 However, no guidance was provided regarding other diagnostic tests.
This systematic review shows that PCR for the diagnosis of trichomoniasis has high sensitivity, specificity, and LR+. The narrow confidence intervals indicate consistent results between studies. However, most of the data were derived from Level II studies. Self-collection of the specimen 66 and rapid results 4 are some of the advantages of this technique. Women with asymptomatic trichomoniasis serve as a reservoir for continuing disease transmission. Therefore, perhaps PCR would be most useful in mass screening for trichomoniasis, similar to its use in the detection of Chlamydia trachomatis. 67
waiting for results is not desirable. Trichomonas culture should be used when the wet mount is negative and the clinical suspicion is still present. Culture should also be obtained to confirm a positive Papanicolaou smear in settings of low to intermediate prevalence. 12 The estimates of sensitivity for culture should be interpreted cautiously. The reference standard in some studies was the culture medium itself with the wet mount, which may yield higher estimates of sensitivity for the culture, whereas in other studies the reference standard was multiple culture media with/without the wet mount, which may yield lower estimates of sensitivity. As an example, the sensitivity of the Diamond's medium ranged from 95% to 99% for the former scenario and 88% to 97% for the latter.
Our systematic review has several strengths. We used a systematic approach in the evidence-based framework, including studies that utilized a reference standard. Studies without a reference standard when one exists are uninterpretable. We also used explicit validity criteria to assess the level of the evidence. 18, 19 Finally, multiple raters abstracted data to avoid observation bias.
Our study has certain limitations. The methodologic quality criteria in the studies were not always explicitly described, resulting in less-than-ideal interrater agreements. We addressed this by discussing the criteria among four authors but acknowledge that other reviewers might reach different decisions. The study design was not uniform among reports, and not all estimates were homogeneous. We used a random-effects model to attempt to correct for heterogeneity among such studies, 59 but we caution the reader to examine the primary data instead of the pooled estimates.
In summary, PCR is a promising technique with sensitivity equal to or better than that of culture. However, more Level I studies are needed. The CDC should make a uniform recommendation with the appropriate reference standard for the diagnosis of trichomoniasis. In the meantime, it seems prudent to use only the culture media with the highest sensitivity as a reference standard (Diamond, Hollander, or CPLM).