Quantitation of phosphatidylethanol in dried blood after volumetric absorptive microsampling

Background: Stimulated by the increased recognition of phosphatidylethanol (PEth) as a sensitive direct marker of alcohol intake, the Laboratory of Toxicology of Ghent University and the National Institute of Criminalistics and Criminology combined their efforts to develop a quantitative method. To facilitate implementation, the focus was on a sampling technique that allows quick and easy blood collection at any place and time, without the need for dedicated personnel. At the same time, the cooperation of the two labs should help initiate a Belgian network of laboratories capable of quantifying PEth.
Methods: Dried blood microsamples were collected via volumetric absorptive microsampling (VAMS). PEth 16:0/18:1 was quantified after liquid-liquid extraction using two independent isotope dilution liquid chromatography-tandem mass spectrometry methods. A systematic review of the entire process at both sites was performed before the final method comparison using samples from 59 routine toxicology cases collected within a one-year time interval.
Results: Initial differences between both laboratories were solved by focusing on important methodological aspects: (i) trueness verification of the calibration protocol focusing on the primary material, preparation of the stock solutions and adequate equilibration of calibrators and QCs, and (ii) verification of comparability of results obtained with different m/z transitions. Several of these aspects could only be verified by critically assessing spiked and native samples. After a final validation, good average comparability of the two methods was observed. The average bias was -0.4%, with 85% of the differences within 20%. Moreover, the methods proved to be reproducible and robust within a one-year time interval.
Conclusion: This study is the first to develop a quantitative volumetric absorptive microsampling based method for PEth measurements. In addition, it is the first to perform a systematic comparison of PEth measurements between two laboratories. From the discussion of the encountered pitfalls it is clear that, also on a global scale, more efforts are needed to improve interlaboratory agreement.


Introduction
In the quest of finding markers for monitoring long term alcohol consumption/abstinence with a higher specificity and sensitivity than today's standards, the usefulness of phosphatidylethanol is increasingly being recognized [1][2][3][4]. Phosphatidylethanol is a group of phospholipids, formed via the action of the enzyme phospholipase D, but only when ethanol is present [5]. The choice of the phosphatidylethanol analogue for the research described in this manuscript was based on the fact that phosphatidylethanol 16:0/18:1 is the most abundant and most commonly measured species [1,2]. In what follows we refer to this particular species as "PEth". PEth has a slow elimination rate with a reported half-life of approximately 4-10 days [6,7]. A single drinking episode resulting in a blood alcohol concentration of 1 g/L was reported to be detectable up to 12 days afterwards [8]. Hence, PEth is a highly selective and sensitive marker for alcohol intake, with a sensitivity of 95% and a specificity of 100% to detect chronic and excessive alcohol consumption vs. respectively 77% and 88% for carbohydrate-deficient transferrin [9,10]. PEth has the potential to become a key biomarker for routine screening in different settings and is already used in forensic psychiatry and monitoring programs, and for judging driving ability, the identification of alcohol intake in specific risk groups and for neonatal screening of prenatal alcohol exposure [11].
With PEth being located at the surface of red blood cells, its measurement requires the collection of whole blood samples [12]. Quick and easy blood collection, without the need of dedicated personnel may further increase the potential use of this biomarker: samples could be drawn at any moment/place. In view of this, fingerprick sampling combined with the collection of dried blood spots (DBS), was shown to be suitable for the quantitative determination of PEth [9,13]. In addition, there is an increased interest in the use of devices to volumetrically collect dried blood microsamples [14,15]. Best known in this context are the Mitra® devices, capable of volumetric absorptive microsampling (VAMS), i.e. the volumetric collection of blood by an absorptive tip. These devices overcome some limitations related to variability in processed sample volume upon taking a sub-punch of a conventional DBS, often referred to as the hematocrit effect [16][17][18]. Moreover, these devices may also offer an advantage from a sampling point of view, by increasing the user friendliness, as suggested by feedback obtained from VAMS device users [18,19].
The Laboratory of Toxicology of Ghent University (UGent) and the National Institute of Criminalistics and Criminology (NICC) recognized the potential of PEth as a valuable marker for alcohol intake and set up different projects to further evaluate its use. These projects include high numbers of study participants and aim at evaluating the applicability of PEth, our current in-house decision limits (see Fig. 1) and the use of VAMS in different settings (from sampling by study nurses and general practitioners to home sampling). The limits are based on previous work and reports by international peers [2,9,20], taking into account a "grey zone" with respect to the cut-off suggestive of excessive alcohol use. The latter gives the benefit of the doubt to the study subject and compensates for (i) the measurement uncertainty (as is also foreseen in the Belgian legislation for measurements performed in the context of driving under the influence of drugs) and (ii) the inconsistency between current decision limits. Several variables contribute to this measurement uncertainty/inconsistency; part is related to the insufficient comparability of current PEth methods, as can be deduced from the results of the Equalis PEth external quality scheme (product code 295, rounds 2017-2020), in which reported results may differ up to 2-fold, while z-scores may still be 'acceptable' [21,22].
Here, we report on the development and application of PEth 16:0/18:1 measurement procedures in both laboratories, and on how the pitfalls encountered in making the results comparable were solved. Method comparability was validated using 59 individual donor samples.

Materials and methods
A detailed description of the chemicals, sample preparation and liquid chromatography-tandem mass spectrometry (LC-MS/MS) procedures applied in both laboratories can be found in supplementary data. In short, PEth was extracted from 10-μL VAMS after adding 250 μL of extraction solvent and 60 μL methanol containing 25 ng/mL (0.034 μM) PEth-D5 as internal standard (IS) and shaking (1400 rpm) for 60 min at room temperature (RT). The extraction solvent consisted of 2 mM ammonium acetate 0.01% formic acid in a 2/8/0.2 water/isopropanol/formic acid mixture. Subsequently, this extract was subjected to liquid-liquid extraction (LLE) using 1 mL n-hexane and shaking (1400 rpm) for 10 min at RT. The n-hexane fraction was collected, dried and reconstituted in 50 μL of injection solvent, of which 5 μL was used for the measurement with two completely different LC-MS/MS procedures. For quantification, both methods used m/z 701 → 255.

Preparation of calibrators and QCs
We refer to the supplementary data for a detailed description of the preparation of the calibrators and controls, and the verification experiments performed to establish the final protocol. Pipetting of whole blood is involved. Therefore, to keep precision/accuracy under control, only fresh blood was used and all pipetting steps were done with gravimetric control. From a master stock solution of 8 mg/mL (11 mM) PEth 16:0/18:1, working solutions in whole blank blood were prepared with PEth concentrations of 10, 20, 50, 100, 250, 500, 1000 and 2000 ng/mL (0.014, 0.028, 0.069, 0.138, 0.345, 0.670, 1.38 and 2.76 μM) and internal quality control (IQC) samples with PEth concentrations of 10, 30, 500 and 1500 ng/mL (0.014, 0.041, 1.38 and 2.07 μM). For the final protocol, spiked samples were equilibrated overnight at 4 °C before sampling on VAMS. The lower and upper limits of quantification (LLoQ and ULoQ), at 10 and 2000 ng/mL, respectively, were defined based on previous work [9].
External Quality Assessment (EQA) samples were obtained from Equalis (Uppsala, Sweden), rounds 17:03, 18:01 and 18:03 [21]. To avoid possible commutability issues (more specifically, the provided blood might behave differently from fresh blood when being absorbed by the VAMS tips), EQA samples were diluted 1:5 with whole blank blood and equilibrated overnight (4 °C) before application to VAMS. Four different dilution lots were prepared over time.

Sample collection
Blood sample collection was approved by the Ethics Committee of Ghent University Hospital (EC UZG 2018/0740, registration n° B670201836586). Venous blood was obtained from an alcohol abstinent healthy person in citrate tubes (BD vacutainer®, 2.7 mL). Mitra™ devices, obtained from Neoteryx® (Torrance, CA), were used to generate dried blood samples from the venous whole blood. Samples were prepared by allowing the absorptive tip of the devices to wick up blood, by touching the surface of the blood, thereby taking care to prevent overfilling. After the device was completely filled with blood, contact remained for 2 more seconds before the device was placed in the accompanying plastic clamshells and dried for 2 h at RT. For longer storage the clamshells were put in a zip closure plastic bag containing a 5 g MiniPax® absorbent packet from Sigma Aldrich (Diegem, Belgium) and stored at RT. Multiple series of calibrators and IQC samples were sampled at once and stored until further analysis.
For the method comparison, samples collected from persons in a driving license regranting program or samples from routine toxicology cases were used. Capillary blood was directly obtained through a fingerprick with a BD® Microtainer (contact-activated lancet). The first drop of blood was wiped off, while the second drop was used for sampling. Samples were collected over a one year time interval, with measurements at UGent being performed within a 2-3 weeks' time interval after collection. At the NICC, the samples were measured at the end of the sample collection period. Sample storage was as described above.

Prevalidation experiments
In order to investigate initial differences between the methods at UGent and NICC, several experiments were performed. These encompassed comparison of results for the same extracts (from calibrators, 4 IQC samples in duplicate, 7 EQA samples in duplicate and 9 native samples), analyzed at both sites, to investigate whether observed differences were due to the LC-MS/MS method itself. The UGent and NICC stock solutions were also directly compared, by generating 3 independent working solutions, which were further diluted with water to prepare 3 calibrators (2000 ng/mL (2.76 μM)). Ion ratios (area analyte/area IS) of the UGent and NICC calibrators were compared (details can be found in supplementary data).

Method validation
The method validation covered selectivity, calibration model and homoscedasticity, trueness, precision, carry-over, matrix effect, extraction efficiency and stability. Besides these parameters, which are based on U.S. Food and Drug Administration and European Medicines Agency guidelines for bioanalytical method validation [23][24][25], the influence of the hematocrit (hct) on the extraction was also investigated, in line with the IATDMCT guideline on validation of DBS-based procedures [26]. The LLoQ was not determined but was pre-fixed based on the cut-off value of alcohol abstinence described in literature (see Fig. 1). Details can be found in supplementary data.

Method comparison study
A set of 59 individual donor samples was measured in both laboratories. Comparability was evaluated using the criteria generally applied for incurred sample reanalysis, i.e. at least two-thirds of the results should not deviate more than 20% of their mean [24,25]. Furthermore, results were evaluated using Passing Bablok regression and a Bland-Altman plot. For the latter, the variation in the %differences (expressed as 1.96 times the standard deviation (SD) of the % differences), should not exceed 29.9% (total error: observed average absolute bias + 1.96 * observed average total precision; see Table 3 for the numerical values). To exclude that the difference in time between the initial measurement at UGent, and the incurred sample analysis at NICC would have an influence on the results, the % deviation was also evaluated against the difference in time.
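The acceptance checks described above can be sketched as follows. This is a minimal illustration, not the software used in the study (the analysis was done in Excel and MedCalc). The paired values are hypothetical, the function names are our own, and the direction of the difference (second lab minus first lab, relative to the pair mean) is an assumption.

```python
from statistics import mean, stdev


def percent_differences(lab_a, lab_b):
    """Percent difference of each pair, relative to the mean of the pair."""
    return [100.0 * (b - a) / ((a + b) / 2.0) for a, b in zip(lab_a, lab_b)]


def isr_pass_fraction(pct_diffs, limit=20.0):
    """Fraction of pairs that differ by no more than `limit` % from their mean."""
    return sum(abs(d) <= limit for d in pct_diffs) / len(pct_diffs)


def bland_altman_summary(pct_diffs):
    """Mean % difference and variation expressed as 1.96 * SD of the % differences."""
    return mean(pct_diffs), 1.96 * stdev(pct_diffs)


# Hypothetical paired results in ng/mL (illustrative only, not study data).
ugent = [25.0, 110.0, 300.0, 48.0, 900.0, 15.0]
nicc = [27.0, 104.0, 310.0, 61.0, 880.0, 14.0]

diffs = percent_differences(ugent, nicc)
fraction_ok = isr_pass_fraction(diffs)
bias, variation = bland_altman_summary(diffs)

meets_isr = fraction_ok >= 2 / 3            # incurred sample reanalysis criterion
meets_variation_limit = variation <= 29.9   # pre-set limit quoted in the text
```

Passing Bablok regression is not covered by this sketch.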

Measurement uncertainty
Measurement uncertainty was calculated following the "Handbook for calculation of measurement uncertainty in environmental laboratories" and was based on the total precision for the method and a bias component derived from the results within the EQA scheme (taking into account the bias and its standard deviation) [27,28]. The acceptance limit of 42.4% for the measurement uncertainty was propagated based on the individual limits for precision and bias (both 15%).
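One reading that reproduces the 42.4% limit is a combination of the two 15% limits in quadrature, expanded with a coverage factor of 2. The sketch below assumes that this is how the limit was propagated; the function name is our own.

```python
import math


def expanded_uncertainty_limit(precision_limit, bias_limit, k=2.0):
    """Combine two relative limits (in %) in quadrature and expand by factor k."""
    return k * math.sqrt(precision_limit ** 2 + bias_limit ** 2)


limit = expanded_uncertainty_limit(15.0, 15.0)
print(round(limit, 1))  # 42.4
```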

Data analysis
Data analysis was done using Excel and Medcalc statistical software, version 14.12.0 (Ostend, Belgium). Statistical tests were performed two-sided, with 95% confidence unless otherwise stated.

Preparation of calibrators and internal quality control samples
As scouting experiments had revealed that discrepant results for authentic VAMS samples were obtained, depending on whether VAMS calibrators were prepared from freshly spiked blood vs. blood that was allowed to equilibrate with spiked PEth for some time, we systematically evaluated this variable. Results obtained with liquid whole blood calibrators analyzed immediately after spiking whole blood, or after storage for 72 h at 4 °C, were compared to results obtained with VAMS-based calibrators, generated immediately after spiking whole blood, or after first equilibrating the blood at 4 °C for 1, 8, 24 or 72 h. These revealed no significant differences (95% confidence) for blood equilibrated for 1, 8 and 72 h. For the final protocol the most time-efficient option (i.e. overnight equilibration) was adopted. The supplementary data contain details for both experiments. An impact of incubation time on the extractability was also observed by others [29,30].

Method validation
The methods proved to be selective, as no unacceptable interfering peaks were detected in the blank samples (i.e., the blank blood used to prepare the calibrators, 4 different donors during the course of the validation). For calibration, at UGent, quadratic curves (weighting 1/x²) showed the best fit, while at the NICC a linear fit (weighting 1/x) was best. The mean back-calculated PEth concentrations were within ±15% of the nominal values (±20% for the LLoQ), except for 2 values observed in eight calibration curves at the NICC.
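The back-calculation check can be illustrated with a weighted linear calibration. The sketch below covers only the linear 1/x case; the quadratic 1/x² model is not shown. The response values are hypothetical analyte/IS area ratios and the function names are our own.

```python
def weighted_linear_fit(x, y, weights):
    """Weighted least squares for y = slope * x + intercept."""
    sw = sum(weights)
    mx = sum(w * xi for w, xi in zip(weights, x)) / sw
    my = sum(w * yi for w, yi in zip(weights, y)) / sw
    sxy = sum(w * (xi - mx) * (yi - my) for w, xi, yi in zip(weights, x, y))
    sxx = sum(w * (xi - mx) ** 2 for w, xi in zip(weights, x))
    slope = sxy / sxx
    return slope, my - slope * mx


def back_calculation_ok(nominal, responses, slope, intercept, lloq):
    """Flag each level: within +/-15% of nominal (+/-20% at the LLoQ)."""
    flags = []
    for conc, resp in zip(nominal, responses):
        back = (resp - intercept) / slope
        limit = 20.0 if conc == lloq else 15.0
        flags.append(abs(100.0 * (back - conc) / conc) <= limit)
    return flags


nominal = [10, 20, 50, 100, 250, 500, 1000, 2000]  # ng/mL, calibrator levels
# Hypothetical area ratios (illustrative only, not study data).
ratios = [0.021, 0.039, 0.102, 0.198, 0.51, 0.99, 2.03, 3.96]
weights = [1.0 / c for c in nominal]  # 1/x weighting

slope, intercept = weighted_linear_fit(nominal, ratios, weights)
flags = back_calculation_ok(nominal, ratios, slope, intercept, lloq=10)
```

Weighting by 1/x gives the low calibrators more influence on the fit, which matters when the range spans more than two orders of magnitude, as it does here.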
At UGent, from the signal:noise of the LLoQ (18.4 ± 4.7 (SD), n = 10*2), the LoD was estimated to be 1.7 ng/mL (0.002 μM). At the NICC, the LoD was not determined, as (quantitative) results are only reported from the LLoQ concentration level onwards.
There was no signal detected in blanks injected after samples with a concentration at the ULoQ.
The non-IS-compensated matrix effect was 79% and 114% at UGent and NICC, respectively, indicating some suppression and enhancement of ionization (Table 1). In both labs, an improvement was seen for the IS-compensated matrix effects, at 98% and 109% for UGent and NICC, respectively. Importantly, inclusion of the IS also led to a substantial improvement of the %RSD values, all being below 8%, which is well within the acceptance limit of 15%.
Recovery of the analyte over the entire procedure ranged from 44 to 64% and was consistent within each condition (%RSD < 15%) (Table 2). For the low QC, a slight hct dependence was observed, as the IS-compensated recovery for the low hct was not within 15% of the recovery for samples with a normal hct. For the subjects under study in the envisaged application, hct is expected to be within the normal range (36-50%) [31]. Hence, in this case there will be no impact. The efficiency of the LLE procedure was on average 75.2% (varying from 71 to 81%), with no noticeable concentration or hct dependence (Table 2).
PEth was stable in extracts after 72 h of storage in the autosampler at 4 °C (differences ≤ 5.1%; Supplementary Table S2). In VAMS samples, stability was demonstrated for at least one week at the three different evaluated temperatures (4 °C, RT, and 45 °C) and for one month at RT, as deviations did not significantly exceed ±15% (Supplementary Figure S3). Furthermore, results for the 4 EQA samples revealed that VAMS samples can be stored for at least 400 days at RT, as longitudinal analysis of EQA results revealed no discernible trend over time (slope not significantly different from 0 (P<0.0001)), the vast majority of the PEth results (87%) lying within ±15% of the normalized mean (Fig. 2). Note that, taking into account the mean total precision of the method (11%) and the bias criterion (15%), a mean result of 2 duplicates on a single day may deviate 35% (= maximum allowable bias + z (2.58) * precision/square root of n). None of the results is outside this range.
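How IS compensation can improve both the matrix effect and its %RSD is illustrated below. The sketch assumes the commonly used post-extraction spiking design (response in spiked blank extracts relative to response in neat solution); the exact design used here is described in the supplementary data and may differ. All peak areas are hypothetical.

```python
from statistics import mean, stdev


def matrix_effect(in_matrix, in_neat):
    """Mean response in spiked blank extracts as a % of the mean response in neat solution."""
    return 100.0 * mean(in_matrix) / mean(in_neat)


def rsd(values):
    """Relative standard deviation in %."""
    return 100.0 * stdev(values) / mean(values)


# Hypothetical peak areas (illustrative only, not study data).
analyte_matrix = [7900.0, 8300.0, 7600.0]
is_matrix = [8000.0, 8500.0, 7700.0]
analyte_neat = [10000.0, 10100.0, 9900.0]
is_neat = [10050.0, 10000.0, 9950.0]

# Without IS compensation: analyte areas only.
absolute_me = matrix_effect(analyte_matrix, analyte_neat)

# With IS compensation: analyte/IS area ratios.
ratios_matrix = [a / i for a, i in zip(analyte_matrix, is_matrix)]
ratios_neat = [a / i for a, i in zip(analyte_neat, is_neat)]
compensated_me = matrix_effect(ratios_matrix, ratios_neat)
```

In this made-up example the analyte and the IS are suppressed to a similar extent, so the area ratio stays close to 100% and varies less between extracts than the analyte area alone.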
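The 35% limit quoted above follows from the stated formula when n = 2 is used. That n = 2 is the intended value is our assumption, made because it reproduces the reported figure. A minimal sketch:

```python
import math


def max_allowed_deviation(bias_limit, precision, n, z=2.58):
    """Maximum allowable bias plus z * precision / sqrt(n), all in %."""
    return bias_limit + z * precision / math.sqrt(n)


limit = max_allowed_deviation(bias_limit=15.0, precision=11.0, n=2)
print(round(limit))  # 35
```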
Data for bias and precision are summarized in Table 3. Acceptance criteria were met at every concentration level, with a single exception (total precision at LLoQ for the NICC, at 20.3% narrowly exceeding the acceptance criterion of 20%). The expanded measurement uncertainty for results obtained by either of the two laboratories, calculated based on these results, was 38% (coverage factor k=2), which is below the preset specification of 42.4%.
Fig. 3 shows the results of the method comparison in a scatter plot with Passing Bablok regression analysis (A) and in a Bland-Altman plot (B). Both show the good average comparability of the two methods, as the slope and intercept are not significantly different from 1 and zero, respectively, both having relatively narrow confidence intervals in the Passing Bablok curve. On average there was essentially no bias (-0.4%) between both methods, the bias at the decision points being 0.9% (at 20 ng/mL (0.028 μM)) and 3.6% (at 270 ng/mL (0.37 μM)). Results were above LoD for 54 samples at UGent (> 1.7 ng/mL (0.002 μM)) and above LLoQ for 51 samples at NICC (LLoQ > 10 ng/mL (0.014 μM)). The pre-set acceptance limits (29.9%) for the variation of the %differences (see "Method comparison study") aligned reasonably well with the limits of agreement (expressed as 1.96*SD of the %differences) in the Bland-Altman plot, suggesting fitness for purpose. This was also underscored by the fact that only 15% of the samples had a deviation outside the 20% limit, the highest deviations being observed for samples with a concentration close to the decision limit (20 ng/mL (0.028 μM)). In our routine practice, most samples analyzed in the context of driving license regranting have (far) higher concentrations (median 181 ng/mL; n = 716). Furthermore, Fig. 3C, which gives an overview of the samples scored between 9 and 400 ng/mL (0.013-0.55 μM), shows that, overall, taking into account the measurement uncertainty and the decision limits mentioned in Fig. 1, a consistent scoring would be obtained by both labs. Last, no influence of the difference in time between both measurements was observed (Fig. 3D).

Discussion
Stimulated by the increased recognition of PEth as a useful marker for alcohol consumption and as a follow-up to our previous research, UGent and NICC decided to collaborate, both implementing their own method, to initiate a Belgian network of laboratories capable of quantitatively determining PEth [1][2][3][4]9,18]. Broad applicability of the method is guaranteed by utilizing as a starting point a simplified blood collection procedure, based on a finger prick. Because VAMS allows a fixed volume sampling of blood and because of previous positive experience, this approach was chosen to collect dried blood microsamples [18]. Besides the ease of sample collection, transport and storage, dried blood microsampling approaches hold other advantages as well, which are particularly relevant for PEth: PEth is very stable in dried blood (in contrast to liquid blood) and ex vivo formation of PEth, because of the presence of alcohol in blood, can be avoided [32]. This may be an issue when blood is not properly stored, giving rise to falsely elevated results [22].
Both laboratories used the same extraction procedure, which allowed an LLoQ of 10 ng/mL to be attained in a robust manner from merely 10 μL of blood. The actual measurements were done using two completely different LC-MS/MS methods (different LCs, columns and MS instrumentation). Initial trueness experiments using EQA and native samples revealed a 30% difference between the two laboratories, in contrast to comparable results for spiked samples. At first, this was attributed to the equilibration of spiked calibrators before their application on VAMS.
The equilibration experiments suggest that spiked PEth should be given time to become incorporated into the membrane of the red blood cells, as in native samples. If not, the extraction efficiency of PEth from the VAMS is higher for incompletely equilibrated spiked samples compared to native samples. This could lead to an underestimation of PEth concentrations in real samples, as well as a potential bias between labs. This hypothesis is supported by the findings of others [29,30] and by the fact that for blood samples (not dried to VAMS), the results obtained for native samples are comparable, whether or not PEth spiked to calibrator samples was allowed to equilibrate with the red blood cells. Finally, calibrators were equilibrated overnight at 4 °C before application to VAMS. Confidence in the applied protocol was found in the fact that, over one year and 4 different batches of calibrators, results for EQA samples were stable (see also Fig. 2).
Allowing the calibrators to equilibrate did not result in comparable data. [...] (6) and (ii) a different fragmentation efficiency of the side chains of the two isomers (the sn-2 position being the preferred fragmentation site (m/z 281 for PEth 16:0/18:1)) [33,34]. As in high-throughput methods both isomers typically are not chromatographically separated, native samples might have an ion ratio that is distinct from that observed in calibrators, depending on the calibrator chosen [33]. This is an issue that has been largely neglected in the literature and can only be solved by reference material manufacturers. When both labs used the transition 701 → 255, analysis of the same extracts (for both calibrators and samples) resulted in an average difference of 1% between both labs, with an SD on the differences of 10%. Yet, the implementation of this second measure still did not result in comparable data for samples that were analyzed using independently prepared calibration curves. Hence, we directly compared the two labs' stock solutions to exclude that these would result in a calibration bias. Three independent working solutions prepared from the stock solutions of the two labs were compared. Observed isotope ratios (%RSD) for the UGent solutions were 16.9 (2.4%) vs. 10.0 (3.1%) for the NICC solutions, indicating that, although both laboratories used the same reference standards, the stock solutions deviated. Rather than searching for the root cause of this difference, new and independent stock solutions were made, using the most accurate procedure possible. Using two new vials of reference standard (one in each lab), meticulously following the protocol described in supplementary data, resolved the issue. This protocol involved weighing the original vial before and after transfer of the PEth. The rationale behind this was that it is common practice for manufacturers to overfill the vials so that the customer receives at least the amount of analyte ordered [35]. Hence, taking the nominal weight instead of the exact amount could possibly lead to a calibration bias. After applying this protocol, the results obtained by both laboratories were finally comparable, and the actual method validation and formal method comparison could start.
In both laboratories, the method for quantification of PEth in VAMS samples was fully validated and the results showed that, overall, the preset quality specifications were met for all investigated aspects. Moreover, the obtained LLoQ, recovery and trueness are comparable to most recent published PEth 16:0/18:1 methods. Although precision seems to be a little higher, it should be noted that most published methods started from larger starting volumes of whole blood. Here, the starting volume was merely 10 μL of blood, in the format of a dried VAMS sample. While we and others have previously demonstrated that conventional DBS can be used for PEth analysis, the method reported here is the first to quantify PEth starting from 10 μL VAMS samples [29,36,37,38,39]. In a real-world setting, with non-supervised sampling by non-trained individuals, the %RSD after application of the entire procedure (from sampling to analysis) is somewhat higher (14%) than that observed here for lab-generated samples (11%), but still acceptable, as we reported elsewhere [19].
At UGent, the method has been implemented on a weekly basis for over a year. During this period, multiple sets of calibration curves and EQA lots were prepared and measured. This allowed us to validate the method's robustness as well as the stability of the dried VAMS samples. No lot-to-lot variation was observed when evaluating the EQA results, as both the precision and trueness proved to be stable over time.
Combining the data obtained from the evaluation of different calibrator lots and EQA lots, made at different time points, also proved the long-term stability of PEth in the VAMS devices (Fig. 2). At the same time, this also demonstrated the consistent extractability over time, an important parameter when dealing with VAMS [40,41]. The excellent stability and consistent extractability could also be derived from the results of the method comparison, which can also serve as an incurred sample reanalysis experiment, with a time difference between the two measurements ranging from 30 days up to almost 400 days. In addition, the results of the method comparison demonstrated the good comparability of the results, independently obtained by both laboratories, with an average difference of -0.4%, and 85% of the samples having a difference within ±20%, thereby fulfilling the requirement of incurred sample reanalysis (i.e. 2/3 of the samples should be within 20% of the average).
The efforts that were required to achieve comparable results between only two laboratories made us realize that one should be very careful when considering and interpreting published PEth results. Some of the observations that were described above would never have been made when only using spiked QC samples, as these will always behave in exactly the same way as spiked calibrators. This potential issue of non-commutability between spiked calibrators/QCs on the one hand and native samples on the other hand points to the importance of including native samples from the initial validation experiments onwards. In addition, as also pointed out by the recent publication of Luginbühl et al., the choice of the reference standard and the quantifier ion may have an important consequence on the numerical value that will be reported [33]. Given this discrepancy, and given the judicial framework in which we (like many labs that measure PEth) operate, we opted to use the transition 701 → 255 for quantitation purposes. By doing so, we give the benefit of the doubt to the people under investigation. However, the fact that different labs may opt to use different transitions, yielding different results, also demonstrates the urgent need to standardize PEth quantification in order to allow more reliable inter-method comparisons and to justify the use of common cut-off values. This is yet another reason to interpret the results of EQA schemes with utmost care, as we have no reference as to which calibrator and which quantifier was used by which participant.

Conclusion
This study is the first to implement PEth quantification based on dried blood samples after volumetric microsampling, a technique which is highly suitable in the main areas of interest for PEth determination, where sampling by untrained professionals, regardless of place and time, is of benefit. Moreover, it is the first to perform a systematic method comparison of PEth analysis between 2 different labs. The winding road to achieve comparable results points out the important methodological aspects that need to be tackled: from trueness verification of the calibration protocol, starting with the primary material and the preparation of the stock solutions, over adequate equilibration of the calibrators and QCs with spiked PEth, to verification of the comparability of results obtained with different m/z transitions. Several of these phenomena can only be verified by critically assessing both spiked and native samples. Up to now, to the best of our knowledge, only three other groups briefly mention one of the observed phenomena [29,30,34]. The final method comparison in this report underpins the suitability of both labs' methods for the intended use. The robustness of the methods and the stability of the samples make it possible to conduct large-scale epidemiological studies, with comparable results regardless of the time point of sample collection, the time point of measurement and even the laboratory.
Given the worldwide increased interest to use PEth as a primary marker for the follow-up of (abstinence from) alcohol use, it is essential that also on a global scale more method comparability is achieved. This will require a concerted effort, including an increased comparability between primary reference materials, and a consensus on which analyte should be measured: PEth 16:0/18:1 -with or without separation of PEth 18:1/16:0 -or other PEth-analogues, or even the sum of different PEth analogues. This will better allow future research to reach a consensus on decision limits, discern reliable PEth half-life, etc. In addition, further improvement of the reliability of PEth analysis will improve the consistency of the values obtained by different laboratories (and decrease their uncertainties) and will further strengthen the implementation of its use in a variety of contexts.

Funding
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.