Analytical performance and user-friendliness of five novel point-of-care D-dimer assays

Abstract
D-dimer testing combined with a clinical assessment has become a standard pathway for ruling out venous thromboembolism (VTE). Recently, novel point-of-care (POC) D-dimer assays have been introduced that enable low-volume blood sampling for rapid exclusion of VTE in a one-step procedure. We assessed the analytical validity and user-friendliness of five of these novel POC D-dimer assays and compared the results with a standard laboratory assay. Plasma samples were run on our reference assay (STA-Liatest D-Di PLUS®) and on five POC assays (Nano-Checker 710®, AFIAS-1®, iChroma-II®, Standard F200® and Hipro AFS/1®). After imprecision was evaluated, Pearson product-moment correlation coefficients were calculated, Passing-Bablok regression was performed and Bland-Altman plots were generated. User-friendliness was evaluated using the System Usability Scale (SUS). A set of 238 plasma samples from patients clinically suspected of VTE in general practice was available for analysis. Only one POC D-dimer assay (Nano-Checker 710) showed insufficient imprecision. Pearson correlation coefficients ranged from 0.68 to 0.93 and mean biases from −165 to −53 μg/L, and concordance with our reference assay varied from 71.8% to 89.5% at a 500 μg/L cut-off point. While we found considerable variation in overall user-friendliness, most devices were judged easy to use. In view of our findings on analytical performance and user-friendliness, we consider that most of the novel POC D-dimer assays can be used in settings outside the laboratory, such as general practice, combining the possibility of multi-testing with low-volume capillary blood sampling and processing times of less than 15 min.


Introduction
Venous thromboembolism (VTE), comprising deep vein thrombosis (DVT) and pulmonary embolism (PE), is the third most common cause of cardiovascular death, after myocardial infarction and stroke. The annual incidence is about 1 per 1000 person-years in the general population [1,2]. Diagnoses are confirmed by imaging (compression ultrasonography (CUS) for DVT and computed tomographic pulmonary angiography (CTPA) for PE), with a typical ratio of confirmed to negative diagnoses of 1:6 among suspected patients. The challenge of diagnosing VTE correctly is that its symptoms mimic and overlap with those of many everyday illnesses. Yet lowering the 'threshold' for performing CUS/CTPA would lead to unnecessary referrals [3][4][5][6]. This may increase health care costs and may also cause unnecessary burden and harm to patients, e.g. through radiation exposure or contrast nephropathy associated with CTPA [7]. This underlines the clinical dilemma of suspected VTE: many patients raise a suspicion, but referring all of them for imaging is burdensome, costly and ultimately unnecessary for the majority, whereas missed diagnoses are also still common [8,9].
This clinical problem is often first assessed by general practitioners (GPs), but definitive diagnosis is difficult without access to rapid, around-the-clock laboratory tests such as point-of-care (POC) testing. Guidelines for suspected DVT and PE help GPs combine clinical decision rules (CDRs) with, in selected cases, a D-dimer test [10][11][12][13]. A low CDR score (≤3 for DVT or ≤4 for PE) combined with a negative D-dimer result was shown to safely rule out both DVT and PE [14,15].
Rapid access to point-of-care D-dimer testing is therefore crucial for GPs and their patients. Qualitative D-dimer tests suited for capillary blood samples were previously incorporated in guidelines. These tests turned out to be prone to significant pre-analytical and interpretation errors in less-controlled, non-research environments, leading to avoidable false-negative results [16]. After their withdrawal from the market, no comparable alternatives were available for use in general practice. Recently, new quantitative POC D-dimer assays have been introduced that need only a small volume of blood obtained by finger-prick. We set out to evaluate the analytical performance of five of these POC D-dimer tests, with a laboratory D-dimer test as the reference standard. User-friendliness was assessed in a hands-on session in which GP assistants filled out questionnaires after using the test systems for the first time.

Patient samples
We collected blood samples from 242 patients in general practice suspected of having DVT (CDR ≤3) and/or PE (CDR ≤4), after they had provided informed consent. In this population we expected to find a wide variability of D-dimer values [12]. The study protocol was approved by the Medical Ethics Committee of the University Medical Centre of Utrecht, the Netherlands. The inclusion period ran from 26 September 2016 to 10 January 2019. A more detailed design of the study is shown in Figure 1.

Sample collection
Routine D-dimer tests were performed in a laboratory facility for general practice before any anticoagulant treatment was initiated, on blood obtained by venipuncture of the antecubital vein.
In addition, blood (4.0 mL) was drawn from the same venipuncture into a lithium heparin tube (LH PST™ II) and a citrate tube (9NC 0.105 M Buff. Na3 citrate), both from Becton Dickinson, New Jersey, USA. Within four hours, these additional tubes were centrifuged (2,000 g; 10 min; 21 °C) and aliquoted. The aliquots were then stored at −70 °C until analysis.

D-dimer measurements
After thawing, sample processing was performed in the Jeroen Bosch Hospital.
An immunoturbidimetric assay (STA-Liatest® D-Di PLUS) on a routine laboratory analyzer (STA-R Max®, Stago Diagnostica, Asnières-sur-Seine, France) was used as the analytical reference method for D-dimer testing.
This assay uses a suspension of latex microparticles covalently coated with monoclonal antibodies to measure D-dimer levels; it has undergone extensive clinical validation, and its cut-off value recommended by the manufacturer (500 μg/L) has been confirmed in clinical studies [17,18]. According to the package leaflet, precision studies performed in accordance with CLSI guideline EP5-A2 showed a within-day coefficient of variation (CV) of 4.6% (mean = 690 μg/L) and 1.2% (mean = 2,300 μg/L), and a between-day CV of 3.3% (mean = 690 μg/L) and 0.0% (mean = 2,300 μg/L), with linearity between 290 μg/L and 18,210 μg/L [19]. In external quality control surveys from the ECAT foundation, our laboratory assay achieved an absolute Z-score below 1.5 in the latest eight surveys, implying that our measurements were close to the laboratory group average. In the most recent survey, our D-dimer results were 350 μg/L versus an assay-specific group value of 430 μg/L (CV 14.6%) for level 1, and 980 μg/L versus an assay-specific group value of 1,040 μg/L (CV 6.6%) for level 2.
Citrated plasma was used on both the routine and the POC test systems, except for the Nano-Checker 710, for which lithium heparin plasma was used. POC D-dimer measurements were carried out according to the manufacturers' instructions by an experienced laboratory technician who was unaware of the routine D-dimer results and of the patients' clinical details, including the CDR score.
Quantitative test results were reported in μg/L fibrinogen equivalent units (FEU), based on the amount of purified fibrinogen used to prepare a cross-linked fibrin clot that was then degraded by plasmin and used as calibrator.

Imprecision
As an elementary quality check (verification), between-run and between-day CVs were calculated for each D-dimer assay at two different D-dimer levels. These CVs were compared with the respective manufacturer's criteria and with 50% of the within-subject biological variation CV, i.e. 0.5 × 23.3% = 11.65% [2,20], a commonly accepted desirable specification for imprecision [21,22].
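The imprecision check reduces to computing a CV (sample standard deviation divided by the mean) for each series of replicates and comparing it with two limits. The following is a minimal Python sketch, assuming five replicates per series as in the Table 1 footnotes; the replicate values and the manufacturer limit passed in are hypothetical, not data from this study.

```python
from statistics import mean, stdev

# Within-subject biological variation of D-dimer (CV_I) as quoted in the text;
# the desirable imprecision specification is half of it.
CV_I_PERCENT = 23.3
DESIRABLE_CV_PERCENT = 0.5 * CV_I_PERCENT  # 11.65 %


def cv_percent(replicates):
    """Coefficient of variation in percent: sample SD / mean * 100."""
    return 100.0 * stdev(replicates) / mean(replicates)


def imprecision_acceptable(replicates, manufacturer_cv_percent):
    """True if the CV is within both the manufacturer's claim and 0.5 * CV_I."""
    cv = cv_percent(replicates)
    return cv <= manufacturer_cv_percent and cv <= DESIRABLE_CV_PERCENT


# Hypothetical replicate results (μg/L FEU) for a low-level sample.
within_day = [480, 510, 495, 470, 505]   # 5 measurements on one day
between_day = [490, 520, 475, 500, 515]  # 1 measurement on each of 5 days

for label, series in [("within-day", within_day), ("between-day", between_day)]:
    # Hypothetical manufacturer limit of 10 % CV.
    print(label, round(cv_percent(series), 1), imprecision_acceptable(series, 10.0))
```

Requiring both limits mirrors the two criteria named above; an assay passes only if neither is exceeded.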

Comparability
For a method comparison, it is recommended that at least 40 samples covering a broad concentration range be tested with both methods (EP9 protocol) [23,24]. D-dimer values between 100 and 1,600 μg/L were considered of greatest clinical relevance. Values below 100 μg/L and above 1,600 μg/L were excluded, as they would influence the analyses disproportionately, while in clinical practice such values are unambiguously considered negative and positive, respectively. First, Pearson product-moment correlation coefficients (r) between routine and POC D-dimer assays were calculated. Next, Passing-Bablok linear regression analysis was performed; this non-parametric method estimates agreement between analytical methods, makes no assumptions about the distribution of errors and is less sensitive to a small number of outliers than other similar regression methods [23]. Regression equations (slope and intercept) between routine and POC D-dimer assays were calculated and presented with 95% confidence intervals (CI), in order to assess whether results differed systematically, i.e. whether the CI of the slope excluded 1 and/or that of the intercept excluded 0. Bland-Altman plots were generated to provide more detail about the nature of the differences between routine and POC D-dimer results [25,26]. Mean differences were calculated and presented with 95% CIs.
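The comparison steps can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the SPSS/Analyse-it procedures actually used: the range filter assumes the 100-1,600 μg/L exclusion is applied to the reference result, the Passing-Bablok function returns point estimates only (its confidence intervals need an additional rank-based step), and the Bland-Altman summary returns the mean difference with conventional ±1.96 SD limits of agreement. Differences are taken as POC minus reference, so a negative bias means lower POC results. The function names and the paired data are hypothetical.

```python
import math
from statistics import mean, median, stdev


def restrict_range(ref, poc, lo=100.0, hi=1600.0):
    """Keep pairs whose reference result lies in [lo, hi] μg/L (assumed rule)."""
    kept = [(r, p) for r, p in zip(ref, poc) if lo <= r <= hi]
    return [r for r, _ in kept], [p for _, p in kept]


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def passing_bablok(x, y):
    """Passing-Bablok point estimates (slope, intercept) of y = slope*x + intercept.

    x is the reference assay, y the POC assay. Confidence intervals are omitted.
    """
    slopes = []
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[j] - x[i], y[j] - y[i]
            if dx == 0:
                if dy != 0:  # vertical pair: slope of +/- infinity
                    slopes.append(math.copysign(math.inf, dy))
                continue  # identical points carry no slope information
            s = dy / dx
            if s != -1:  # slopes of exactly -1 are discarded
                slopes.append(s)
    slopes.sort()
    m = len(slopes)
    k = sum(1 for s in slopes if s < -1)  # offset for the shifted median
    if m % 2:
        slope = slopes[(m - 1) // 2 + k]
    else:
        slope = 0.5 * (slopes[m // 2 - 1 + k] + slopes[m // 2 + k])
    intercept = median([yi - slope * xi for xi, yi in zip(x, y)])
    return slope, intercept


def bland_altman(ref, poc):
    """Mean difference (POC - reference) and 95% limits of agreement."""
    diffs = [p - r for r, p in zip(ref, poc)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical paired results (μg/L FEU), not data from this study.
ref = [80, 150, 320, 480, 510, 760, 990, 1400, 2100]
poc = [70, 140, 290, 455, 470, 700, 930, 1310, 1800]
x, y = restrict_range(ref, poc)
print("r =", round(pearson_r(x, y), 3))
print("slope, intercept =", passing_bablok(x, y))
print("bias, lower LoA, upper LoA =", bland_altman(x, y))
```

Because the slope is a shifted median of all pairwise slopes, a single extreme pair barely moves it, which is the robustness property mentioned above.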
Finally, D-dimer results were dichotomized (positive/negative) to determine concordance and discordance percentages between routine and POC D-dimer assays. All results below 500 μg/L were considered negative, and higher results positive. This cut-off point (500 μg/L) was chosen because it is currently widely used for clinical decision making in general practice and is incorporated in clinical guidelines.
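The concordance analysis can be sketched as a simple cross-classification at the cut-off. This is a minimal sketch; the function name is hypothetical, and treating a result exactly equal to 500 μg/L as positive is an assumption, since the text only states that results below 500 μg/L were negative. Discordant pairs are split by direction, matching how Table 3 attributes discordance to lower or higher POC results.

```python
def concordance_table(ref, poc, cutoff=500.0):
    """Percentages of concordant and discordant dichotomized results.

    Assumption: a result equal to the cut-off counts as positive.
    """
    counts = {"concordant": 0, "poc_lower": 0, "poc_higher": 0}
    for r, p in zip(ref, poc):
        ref_pos, poc_pos = r >= cutoff, p >= cutoff
        if ref_pos == poc_pos:
            counts["concordant"] += 1
        elif ref_pos:  # reference positive, POC negative
            counts["poc_lower"] += 1
        else:  # reference negative, POC positive
            counts["poc_higher"] += 1
    n = len(ref)
    return {key: 100.0 * value / n for key, value in counts.items()}


# Hypothetical paired results (μg/L FEU).
print(concordance_table([400, 600, 700, 300], [450, 550, 400, 520]))
```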

User-friendliness
User-friendliness of the POC D-dimer test systems was assessed using a questionnaire based on the System Usability Scale (SUS; Supplementary Appendix 1). The SUS has been in use for decades and is a widespread, reliable and well-tested tool for quickly and easily assessing users' subjective perception of a system's usability; it is suited for comparing systems and can guide system improvement [27,28]. From a set of 10 questions, a score ranging from 0 to 100 is calculated, which can be interpreted as follows: ≤51.8, system should be fixed; 51.9-67.9, improvement is recommended; 68-73.9, usable but could improve; 74-80.2, usable; ≥80.3, excellent. Alternatively, a grade rating can be used [28]. A group of eleven GP assistants, unfamiliar with the devices but familiar with CRP POC testing, received a short instruction from a laboratory technician as well as a written instruction chart. They then immediately carried out one D-dimer test on each of the five POC D-dimer test systems in random order and completed a SUS questionnaire for each, along with a few additional questions about sample handling and the readability of displays and results. Next, for each device, one SUS score was calculated from the median scores of the 10 SUS questions, together with an interquartile range (IQR).
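The text does not spell out the item scoring, so the sketch below assumes the standard SUS rule: odd-numbered (positively worded) items contribute the response minus 1, even-numbered (negatively worded) items contribute 5 minus the response, and the sum is multiplied by 2.5. Applying that rule to the per-item medians is one reading of "one SUS score ... calculated from the median scores of the 10 SUS questions"; the function names and questionnaire data are hypothetical, and the bands are the ones quoted above.

```python
from statistics import median


def sus_score(responses):
    """Standard SUS scoring of ten 1-5 Likert responses (item 1 first).

    Odd items: response - 1; even items: 5 - response; sum (0-40) * 2.5.
    """
    if len(responses) != 10:
        raise ValueError("SUS needs exactly 10 responses")
    total = 0.0
    for item, r in enumerate(responses, start=1):
        total += (r - 1) if item % 2 == 1 else (5 - r)
    return 2.5 * total


def item_medians(questionnaires):
    """Median response per item across participants (each a 10-item list)."""
    return [median(item) for item in zip(*questionnaires)]


def sus_band(score):
    """Interpretation bands quoted in the text."""
    if score < 51.9:
        return "system should be fixed"
    if score < 68:
        return "improvement is recommended"
    if score < 74:
        return "usable but could improve"
    if score < 80.3:
        return "usable"
    return "excellent"


# Hypothetical questionnaires from three participants for one device.
qs = [
    [4, 2, 4, 2, 4, 2, 4, 2, 4, 2],
    [3, 3, 4, 2, 3, 3, 4, 2, 3, 3],
    [5, 1, 4, 2, 4, 2, 5, 1, 4, 2],
]
device_score = sus_score(item_medians(qs))
print(device_score, sus_band(device_score))
```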
Additionally, detailed characteristics of the POC D-dimer devices, collected from manufacturers' information sheets and our own experiments, were evaluated.
Statistical analyses were performed using the Statistical Package for the Social Sciences® software (PASW Version 22, SPSS, IBM, Somers, NY, USA). Bland-Altman plots were generated using Analyse-it® (Version 5.11, Kirkstall Rd, Leeds, UK).

Results
After exclusion of samples that contained insufficient blood for processing, 238 samples were included in the data analysis. This ensured that sufficient samples fell within the desired measuring range and exceeded the usual criterion of at least 40 samples.

Imprecision
Between-run and between-day variation of the POC assays at two different D-dimer levels is presented in Table 1. Four of the five assays showed acceptable imprecision, with CVs within the criteria set by the respective manufacturer and not exceeding 11.65%. Only the Nano-Checker 710 showed substantially higher CVs for the low-level sample, for both within-day variation (CV 26.1%) and between-day variation (CV 15.3%).

Comparability
Pearson product-moment correlation coefficients (r) and Passing-Bablok regression equations are presented in Table 2 (first and second columns). Correlation coefficients of the POC assays ranged from r = 0.68 (Hipro AFS/1) to r = 0.93 (AFIAS-1). All Passing-Bablok fits deviated significantly from y = ax + b (a = 1, b = 0), meaning that the 95% CI of the slope excluded 1 and/or that of the intercept excluded 0. Figure 2 shows the agreement of the POC assays with the laboratory test in Bland-Altman plots. For the AFIAS-1 and iChroma-II assays, higher D-dimer values showed a stronger negative bias (difference). For the remaining POC assays, biases around 500 μg/L appeared relatively constant and small relative to the entire D-dimer range. Mean biases and accompanying 95% CIs are shown in Table 2 (third column). The Standard F200 assay showed the lowest mean bias of all POC test systems (−53 μg/L), with the second narrowest 95% CI (−278 to +384 μg/L), while the Nano-Checker 710 assay showed the largest mean bias (−165 μg/L; 95% CI −675 to +345 μg/L). Table 3 presents concordance and discordance percentages of our laboratory assay versus the POC assays. Concordance varied from 71.8% (Nano-Checker 710) to 89.5% (Standard F200). In three of the five POC assays (Nano-Checker 710, AFIAS-1, iChroma-II), discordance was mainly due to POC results being lower than those of the STA-Liatest PLUS, and in the other two (Standard F200, Hipro AFS/1) mainly due to POC results being higher.

User-friendliness
The GP assistants, all first-time users, generally had no problems generating test results. All questionnaires were filled out completely. Median SUS scores of the five systems varied from 37.5 (iChroma-II) to 75.0 (Hipro AFS/1) and are presented in Table 4 with accompanying IQRs. Median scores on the third SUS question, 'I thought the system was easy to use', were ≥4 ('I agree') …

Table 1. Imprecision data of the Point-of-Care (POC) D-dimer assays used in this study. a For these data, D-dimer was measured 5 times within one day. b For these data, D-dimer was measured once a day on 5 consecutive days.
For the POC D-dimer test systems in general, no positive or negative outliers were found in the median scores of the individual questions. Participants mentioned the number of (extensive) actions to be performed and components to handle, the risk of spilling blood, and how easily on-screen instructions could be followed as noteworthy aspects of user-friendliness in these test systems. A more detailed report is presented in Appendix 2. We hypothesized that a learning effect from using multiple devices could have resulted in higher SUS scores. We therefore compared the SUS scores of each test system among participants who used it as their first test system with the SUS scores in Table 4 (based on all participants). No systematic difference in SUS scores was found, so a bias due to a learning effect could not be demonstrated.
Device characteristics differ considerably between the five POC D-dimer assays (Table 4). All test systems can handle both plasma and whole blood samples, although for the Hipro AFS/1 test system the ability to handle whole blood D-dimer tests was still under development at the time of writing. While a D-dimer test takes three minutes to process on the Hipro AFS/1 test system, processing times for the other devices varied between 7 and 15 min. Plasma volumes also vary considerably: most devices need 5-15 μL plasma, while the AFIAS-1 test system requires 100 μL. All devices can measure biomarkers other than D-dimer; details are provided in Table 4.

Discussion
We assessed the analytical performance and user-friendliness of five novel POC D-dimer test systems in a direct comparison with the STA-Liatest PLUS reference assay. Imprecision of all POC D-dimer assays was sufficiently low, except for the Nano-Checker 710 test system. Mean bias between the POC assays and the laboratory assay was small (ranging from −165 to −53 μg/L) and, importantly, bias was smallest around the most clinically relevant value (500 μg/L). With 500 μg/L as the cut-off point, concordance between the POC assays and the laboratory assay varied from 71.8% to 89.5%. In view of these results, we consider precise and reliable D-dimer measurements to be feasible in daily practice using these novel POC assays.
Using the SUS questionnaire, we found considerable variation in overall user-friendliness, as well as in objective test system characteristics such as size, processing time, sample volume and the number of testable biomarkers. Most devices were judged easy to use. Limited user-friendliness seemed largely related to the number of actions to be performed, the number of components to handle and the risk of spilling blood. We consider the user-friendliness of one device (iChroma-II) to be insufficient (SUS score = 37.5) and strongly advise modifying the device before it is (re)introduced into clinical practice.

Table 3. Concordance and discordance of all Point-of-Care assays versus the STA-Liatest PLUS assay. Data are based on dichotomous results (positive/negative) with a 500 μg/L cut-off value.

Notes to Table 4. a For sample processing, we used 100 μL blood samples according to the package leaflet. For evaluating user-friendliness, we used a new method for capillary blood sampling ("C-tip") requiring a 30 μL blood sample, because this tip had become available by then. b Examples of these steps: collecting blood, adding blood to buffer, mixing blood with buffer, adding the sample to the test system, loading and initializing the sample for processing. c In addition to D-dimer, the following tests can be processed on all test systems: (high-sensitivity) C-reactive protein, NT-proBNP, troponin I and CK-MB. d For many tests, a whole blood version is not (yet) available. e At the time of writing, a whole blood version was under development. f These microchips differ for each test and lot number, which is prone to error. g System Usability Scale; whole blood samples were used for evaluating user-friendliness. h Interquartile range.
To our knowledge, ours is the first study to evaluate these novel POC D-dimer test systems for analytical performance and user-friendliness. Unlike the devices in recent evaluations of POC D-dimer test systems, the devices in this study are portable, allow a whole blood sample to be drawn by finger-prick and, for this purpose, need only small amounts of blood (15 μL or less for most of them), compared with 75-150 μL for existing quantitative assays [9,21]. We also noted a reduction in processing time: 3 to 15 min, in contrast to 10 to >30 min for existing devices [9,21,29].
The Passing-Bablok regression lines show a substantial, assay-dependent, linear deviation of all POC assays from the laboratory assay. Agreement could be improved considerably by correcting such deviations through modification of the manufacturers' predefined (software) settings, which seems most beneficial for the AFIAS-1 and iChroma-II assays (Figure 2). Other possible causes of these linear deviations are calibration and/or hardware (i.e. dilution process) issues.
The impact of these linear deviations on our results is illustrated by the observation that the assay with the highest correlation coefficient (r), the AFIAS-1 test system, did not show the best performance in terms of mean bias and concordance; that proved to be the Standard F200 test system. Since both the slope and the intercept of the AFIAS-1 deviated significantly from y = x, the performance of this assay would be expected to improve considerably after adjustment of slope and intercept, as shown in Table 3.
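Correcting a linear deviation amounts to inverting the fitted Passing-Bablok line: if POC results follow POC ≈ b·reference + a, an adjusted result is (POC − a)/b. The sketch below is hypothetical; the function name and coefficients are illustrative assumptions, and in practice such a correction would be applied through the manufacturer's (software) settings rather than by the user. It shows how an adjustment can move a result across the 500 μg/L cut-off.

```python
def recalibrate(poc_value, slope, intercept):
    """Map a POC result onto the reference scale by inverting y = slope*x + intercept."""
    if slope == 0:
        raise ValueError("slope must be non-zero")
    return (poc_value - intercept) / slope


CUTOFF = 500.0  # μg/L FEU

# Hypothetical coefficients for an assay that reads low relative to the reference.
slope, intercept = 0.75, 20.0
raw = 395.0
adjusted = recalibrate(raw, slope, intercept)
print(raw >= CUTOFF, adjusted >= CUTOFF)  # prints: False True
```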
Performance is largely in accordance with studies of similar venous whole blood POC D-dimer devices carried out in recent years; most of these assays also showed good performance, with Cohen's kappa coefficients ranging from κ = 0.72 to 0.94 [21] and Pearson correlation coefficients from r = 0.73 to 0.99 [29][30][31][32][33]. It has been suggested that in some studies the documented performance might, at least partially, have been biased by the inclusion of very high values (largely exceeding twice the diagnostic cut-off level), which favourably affect correlation but are of little relevance for clinical decision making [29]. As indicated previously, such high values were not included in this study.
The strength of our study is that test characteristics, different measures of analytical performance and user-friendliness were combined, providing an overview of all relevant aspects of these novel portable POC D-dimer test systems. By including samples from patients in the general population who presented in general practice with suspected VTE, we had a large number of samples covering the entire clinically relevant measuring range with which to assess analytical validity in the target population. Also, the same samples were analysed on all POC assays, so that each POC-versus-laboratory comparison used the same collection of samples. We assessed user-friendliness with GP assistants instead of laboratory technicians. In addition, in contrast to similar studies, we used a widely used, general-purpose scale for the analysis, facilitating straightforward interpretation of results [34][35][36]. Nevertheless, for full appreciation of our findings, the following limitations need to be discussed.
Firstly, although a state-of-the-art D-dimer laboratory assay was used as reference, results might have differed had another laboratory assay been chosen. However, because D-dimer assays lack standardization, no actual reference standard is available to date; in this respect, it is unlikely that another laboratory assay would provide more reliable results [37]. In fact, one or more POC D-dimer assays might even provide better diagnostic performance than our laboratory assay. For these reasons, it is difficult to give a first impression of diagnostic accuracy based on the results of the current study.
Secondly, in this study all sample processing was performed by an experienced laboratory technician. We cannot foresee whether non-laboratory users, working in their own environment, will experience problems that may influence test results [38]. Verifying the results in a general practice, emergency room or outpatient department setting is therefore recommended to see how the devices function outside the laboratory. Using whole blood samples, preferably drawn by finger-prick, will bring the evaluation closer to daily general practice. Moreover, in addition to a laboratory assay, clinical outcome could be used as a reference standard to provide well-grounded diagnostic accuracy measures such as sensitivity, specificity and predictive values; a final diagnosis is generally based on (CUS/CTPA) imaging after a positive D-dimer result [39]. We are currently planning a study that combines these aspects. Nonetheless, in our opinion, the method and setting used in this study were sufficient for our goal of assessing analytical performance and general agreement between tests.
Thirdly, for assessing user-friendliness, participants used blood samples that were available from venipunctures. Thus, aspects of sample collection by finger-prick were not taken into account. The number of participants was limited, but the GP assistants' comments on the devices were consistent.
Although overall SUS scores are relatively low, they should be interpreted in the right context. The POC D-dimer test systems were only recently released and will most likely be developed further. Also, while a bias due to a learning effect during the session could not be demonstrated, we can reasonably assume that users who operate such devices frequently, after a thorough, personalized training program, would give higher SUS scores. Hence, the SUS scores should not be interpreted as static numbers but can instead serve as a benchmark for further development; the feedback from the GP assistants can be used as guidance.
In short, our study shows that the analytical performance of the quantitative POC D-dimer tests studied is comparable with that of a laboratory D-dimer test (STA-Liatest PLUS). This new generation of POC D-dimer assays is well suited for use in general practice and has the potential to be introduced in this setting for the exclusion of VTE. Currently, most D-dimer assays are performed in central or hospital-based laboratories, so the results of this study can contribute to the accessibility of rapid D-dimer testing for GPs [21].

Conclusion
Our results show that most POC D-dimer test systems perform adequately when compared with the STA-Liatest PLUS assay, especially when results in the clinically relevant range are taken into account. Only the Nano-Checker 710 test system did not fulfil the desired analytical specifications. Based on these results and the user-friendliness findings, these novel POC D-dimer assays have the potential to serve as an alternative to a laboratory assay, although further studies of diagnostic accuracy are needed, verified on capillary samples and in the specific clinical setting where the POC devices are to be used. The Standard F200 assay showed the best overall analytical performance and the Hipro AFS/1 assay proved to be the most user-friendly test system. All portable test systems evaluated possess the desired functionalities for out-of-hospital settings, combining multi-test analysis and low-volume capillary blood sampling with strongly reduced processing times.