Chemical profiling of fingerprints using mass spectrometry

sample sizes and the focus on analytical method development. In this paper, we analyzed the chemical composition of 1852 fingerprints, donated by 463 donors during the Dutch music festival Lowlands in 2016. In a targeted approach, we compared amino acid and lipid profiles obtained from different types of fingerprints. We found a large inter-donor variability in both amino acid and lipid content, and significant differences in L-(iso)leucine, L-phenylalanine and palmitoleic acid levels between male and female donors. In an untargeted approach, we used full-scan MS data to generate classification models to predict the gender (77.9% accuracy) and smoking habit (90.4% accuracy) of fingerprint donors. In the latter model, compounds putatively annotated as nicotine and cotinine are used as predictors.


Introduction
For over 100 years, fingermarks have been used for individualization purposes, and fingerprint evidence is still widely used in forensic science. A fingerprint originates from contact between a bare fingertip and a surface and mainly consists of eccrine and sebaceous excretion [1]. Because the excretory glands are only located on the friction ridges and not in the furrows, the excretion can be left behind on the surface in the distinctive fingerprint pattern. Comparison between a fingermark found at a crime scene and a reference fingerprint can lead to individualization of the perpetrator. However, not every fingermark found at a crime scene can be used for individualization purposes. The reference fingerprint may not be in the database, or the fingermark may be of poor quality in terms of ridge detail, for example due to distortion of the fingerprint pattern. Therefore, there is great interest among forensic investigators in techniques that can still retrieve basic donor information from such fingermarks.
Several studies have been carried out into the chemical composition of fingerprints. In particular, common metabolites such as amino acids and fatty acids have been investigated. Gas chromatography coupled to mass spectrometry (GC-MS) has been used in numerous studies to investigate both amino acid and fatty acid profiles of fingerprints [2][3][4]. Besides GC-MS, multiple other analytical techniques have been used to study the amino acid profiles of fingerprints. Capillary electrophoresis (CE) and (ultra) high performance liquid chromatography ((U)HPLC), coupled to a mass spectrometer as detector, are examples of techniques that have been successfully applied to separate and quantify amino acids retrieved from fingerprints [5][6][7]. Several mass spectrometry imaging approaches, often using matrix-assisted laser desorption/ionization (MALDI), have been used to investigate the composition of fingerprints as well [8][9][10][11]. A major drive behind the research into the chemical composition of fingerprints is that eccrine or sebaceous excretion is likely to be influenced by several donor factors (for example, the composition of sebaceous excretion changes before and after puberty), and thus donor information may be derived from fingerprint residue [12,13]. The chemical composition of a fingerprint has been reported to be influenced by a wide variety of donor traits [1,14]. Extensive research has shown the influence of key donor factors, such as age and gender [9,15,16,17,18]. The presence of exogenous compounds has been investigated as well, and is reported to provide information about an individual's lifestyle [19,20]. Additionally, illicit drugs and their metabolites have been detected in fingerprints using various surface mass spectrometry approaches [21,22].
Despite the numerous studies into the chemical composition of fingerprints, it is still unclear which questions the analysis of the chemical composition can answer in practice, owing to the limited sample sizes and the emphasis on method development. The aim of this study is to unravel what the analysis of the chemical composition of fingerprints can reveal about the donor, by analyzing the relation between fingerprint composition (natural, eccrine and sebaceous) and the characteristics of a large set of donors. To be able to categorize donors based on their fingerprint composition, the influence of common donor factors, such as gender, age, diet, smoking habit and (medicinal) drug use, needs to be analyzed.
In this study, we analyzed the chemical composition of 1852 fingerprints, donated by 463 donors (179 male, 280 female and 4 unknown, ranging from 18 to 63 years old, median age of 26) at the Dutch music festival Lowlands in 2016. In a targeted metabolomics approach, we analyzed the amino acid profiles of 926 fingerprints (463 natural and 463 eccrine) using LC-MS and quantified the fatty acid, squalene and cholesterol content in the other 926 fingerprints (463 natural and 463 sebaceous) using GC-MS. We used permutation tests to find significant differences in amino acid and lipid abundances. In an untargeted approach, we used the full-scan MS data from the analyzed fingerprints to generate conditional inference trees [23] as classification models to predict two forensically relevant donor traits: the gender and smoking habit of fingerprint donors.

Fingerprint collection and processing
Fingerprints were donated by volunteers at Lowlands Science 2016. The material was gathered in a non-invasive manner and did not infringe on the privacy of the donors. Fingerprints were purposely donated in a light smearing motion to minimize the number of individualizing characteristics. All experiments were carried out following institutional guidelines and relevant laws. Ethical approval was obtained before executing the experiment. All participants were 18 years or older, gave informed consent and received a debriefing form after participating in the experiment. All participants (463) donated two natural fingerprints, one eccrine fingerprint and one sebaceous fingerprint. Participants were asked to rub their hands together to create homogeneity and to donate two fingerprints with their index finger on 2.5 by 5 cm aluminum foil sheets. The participants then transferred the sheets to two 15 mL conical tubes. A dispenser was used to add 2 mL MeOH containing 5% (v/v) formic acid and 0.01 mg/L internal standard (IS) amino acids to one of the tubes. The same type of dispenser was used to add 2 mL MeOH containing 1.5 mg/L docosane to the other tube. Both samples were mixed using a vortex mixer for 30 seconds. The participants were then asked to wash their hands thoroughly with soap, dry them with paper and put on nitrile gloves. The participants then filled out a digital questionnaire consisting of 31 questions regarding general donor factors and habits such as age, gender, diet, smoking habit and (illicit) drug use (the questionnaire can be found in the supplementary material). After completing the questionnaire, the participants were asked to remove their gloves, rub their hands together to create homogeneity and donate one (eccrine) fingerprint with their index finger on a 2.5 by 5 cm aluminum foil sheet. Finally, participants were asked to groom their forehead before donating the last, sebaceous fingerprint.
The donors transferred the aluminum sheets to 15 mL conical tubes. A dispenser was used to add 2 mL MeOH containing 5% (v/v) formic acid and 0.01 mg/L IS amino acids to the eccrine fingerprint, and 2 mL MeOH containing 1.5 mg/L docosane to the sebaceous fingerprint. Both samples were mixed using a vortex mixer for 30 seconds. After collection, all samples were stored at 4°C for the first 72 hours and then at −20°C until analysis.

UPLC-MS
For each donor, one natural and one eccrine fingerprint were analyzed using UPLC-MS. The fingerprint samples were first brought to room temperature and subsequently mixed using a vortex mixer for 30 seconds. Then, the aluminum foil sheets were removed using clean tweezers. Samples were evaporated under nitrogen flow, reconstituted in 50 μL MeOH containing 5% (v/v) formic acid and transferred to an injection vial. UPLC-MS was executed as described previously [6]. Liquid chromatography was carried out using a 150 mm UPLC ethylene bridged hybrid (BEH) amide column (Waters, Milford, MA, USA) and an Acquity I-class UPLC autosampler and binary solvent pump (Waters, Milford, MA, USA). The flow rate was set at 0.500 mL/min. Column eluent was sprayed into the ion source of the time-of-flight MS by electrospray ionization (ESI). The solvents used for UPLC were (A) 0.4% formic acid in acetonitrile (ACN) and (B) 0.4% formic acid in ultrapure water (purified deionized water with a resistivity of 18 MΩ·cm at 25°C). The gradient used was 95% A for 3 min, followed by a linear gradient from 5 to 50% B in 19 min and then 1 min of 50% B. Finally, the column was reconditioned for 2 min with 95% A (total run time 25 min). For each analysis, 2.5 μL of sample solution was injected. Samples were analyzed on an accurate-mass TOF MS with a dual ESI source (Agilent 6220, Santa Clara, CA, USA). The system was operated in the positive ion mode. MS spectra from m/z 40-1200 were acquired at a resolution of 7500 at m/z 400 at a rate of 1 spectrum per second. The capillary voltage was set at 3.5 kV and the source gas temperature at 325°C. A drying gas flow of 5 L/min was used. The nebulizer pressure was 30 psig. The fragmentor, skimmer and octapole 1 RF voltages were set at 160, 65 and 250 V, respectively.
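The gradient program described above can be summarized as a piecewise-linear %B profile. The following Python sketch is only an illustration and not the acquisition method used in the study. It assumes that the switch back to 95% A at 23 min is an instantaneous step, whereas a real instrument method would specify a short ramp. The names `GRADIENT` and `percent_b` are hypothetical.

```python
# Hypothetical breakpoints (time in min, %B) for the gradient described above.
# Solvent B is 0.4% formic acid in water; %A = 100 - %B.
# Assumption: the return to 95% A at 23 min is an instantaneous step.
GRADIENT = [
    (0.0, 5.0),    # 95% A for 3 min
    (3.0, 5.0),
    (22.0, 50.0),  # linear 5 -> 50% B over 19 min
    (23.0, 50.0),  # 1 min at 50% B
    (23.0, 5.0),   # step back to 95% A
    (25.0, 5.0),   # 2 min reconditioning; total run time 25 min
]


def percent_b(t, program=GRADIENT):
    """Return %B at time t (min) by linear interpolation between breakpoints."""
    if not program[0][0] <= t <= program[-1][0]:
        raise ValueError(f"t={t} min is outside the run")
    for (t0, b0), (t1, b1) in zip(program, program[1:]):
        # Zero-length segments (the instantaneous step) are skipped.
        if t1 > t0 and t0 <= t <= t1:
            return b0 + (b1 - b0) * (t - t0) / (t1 - t0)
    raise ValueError("no segment covers t")  # not reached for a valid program


print(percent_b(12.5))  # 27.5
```

The durations of the segments (3 + 19 + 1 + 2 min) add up to the stated 25 min run time.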
MS full-scan data were acquired with Agilent MassHunter Data Acquisition software (version B.04.00). The data were processed using Agilent MassHunter Qualitative Analysis software (version B.05.00) and Quantitative Analysis software (version B.05.00). To quantify amino acid content, a calibration series was prepared ranging from 0.10 to 1.60 mg/L per amino acid. If amino acid quantification results were outside the calibration range, samples were diluted and reanalyzed. Amino acid quantification results were used for the targeted data analysis, while the full-scan LC-MS data were used for an untargeted approach.
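A minimal sketch of the calibration and dilution logic follows. It rests on several assumptions. First, it uses an ordinary least-squares linear calibration of the instrument response (for example, the analyte/internal standard peak-area ratio) against concentration. Second, the calibration points are hypothetical and only span the 0.10 to 1.60 mg/L range stated above. Third, the conversion to nanograms per fingerprint assumes that the concentration refers to the 50 μL reconstitution volume. Reporting below-range results separately, rather than diluting them, is also an assumption. The vendor quantification software used in the study is not reproduced here.

```python
CAL_LOW, CAL_HIGH = 0.10, 1.60  # calibration range in mg/L (from the text)
RECON_VOLUME_UL = 50.0          # reconstitution volume in µL (from the text)


def fit_line(conc, response):
    """Ordinary least-squares fit: response = slope * conc + intercept."""
    n = len(conc)
    mx, my = sum(conc) / n, sum(response) / n
    sxx = sum((x - mx) ** 2 for x in conc)
    sxy = sum((x - mx) * (y - my) for x, y in zip(conc, response))
    slope = sxy / sxx
    return slope, my - slope * mx


def quantify(response, slope, intercept, dilution=1.0):
    """Back-calculate a sample; returns (ng per fingerprint or None, status)."""
    conc = (response - intercept) / slope  # mg/L in the injected solution
    if conc > CAL_HIGH:
        return None, "above range: dilute and reanalyze"
    if conc < CAL_LOW:
        return None, "below range"
    # 1 mg/L equals 1 ng/µL, so mg/L x µL gives ng; undo any dilution.
    return conc * dilution * RECON_VOLUME_UL, "ok"


# Hypothetical calibration points and responses (not study data).
standards = [0.10, 0.20, 0.40, 0.80, 1.60]
responses = [2.0 * c + 0.01 for c in standards]
slope, intercept = fit_line(standards, responses)
print(quantify(1.61, slope, intercept))  # approximately (40.0, 'ok')
```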

GC-MS
Lipid components were extracted and quantified in a two-step method as proposed by Cadd et al. [4]. For each donor, the fatty acid, cholesterol and squalene profiles of one natural and one sebaceous fingerprint were determined. The fingerprint samples were first brought to room temperature and subsequently mixed using a vortex mixer for 30 seconds. The sample solution was then transferred to a 10 mL glass tube and 100 μL chlorotrimethylsilane was added. The aluminum foil sheet was transferred to a new 10 mL glass tube, 2 mL of chloroform was added and the tube was mixed using a vortex mixer for 60 seconds. Then, the chloroform solution was transferred to the tube already containing the methanol and chlorotrimethylsilane sample solution. The combined solution was evaporated under nitrogen flow. After reconstitution in 50 μL of chloroform, samples were transferred to injection vials (with 50 μL inserts). Analyses were carried out on an HP6890/5973 GC-MS system (Agilent, Santa Clara, CA, USA). An HP-5MS column was used for separation (30 m length, 0.25 mm internal diameter, film thickness 0.25 μm). Aliquots of 5 μL were injected onto the column in split mode (1/20) using an autosampler and a 4.5 min solvent delay. The temperature of the liner was held at 250°C and helium was used as a carrier gas. The column temperature was held at 80°C for 1.0 min and heated to 230°C at a rate of 10°C/min. Then, the column was heated to 310°C at a rate of 4°C/min and this temperature was held constant for 8 min. A calibration series was prepared using the lipid stock solution in concentrations of 0, 0.01, 0.02, 0.05, 0.1 and 0.2 g/L and was processed in the same way as the fingerprints deposited on aluminum foil. If lipid quantification results were outside the calibration range, samples were diluted and reanalyzed. Lipid quantification results were used for the targeted data analysis, while the full-scan GC-MS data were used for an untargeted approach.
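The oven program above consists of an initial hold, two linear ramps and a final hold. The Python sketch below expresses it as a list of segments, which makes the total program time easy to check. The segment representation and the function names (`oven_segments`, `oven_temp`) are hypothetical and are not part of any instrument software.

```python
INITIAL_TEMP_C, INITIAL_HOLD_MIN = 80.0, 1.0
# (ramp rate in °C/min, target temperature in °C, hold time at target in min)
RAMPS = [(10.0, 230.0, 0.0), (4.0, 310.0, 8.0)]


def oven_segments():
    """Return (start_min, end_min, start_temp, end_temp) for each program segment."""
    t, temp = 0.0, INITIAL_TEMP_C
    segments = [(t, INITIAL_HOLD_MIN, temp, temp)]
    t = INITIAL_HOLD_MIN
    for rate, target, hold in RAMPS:
        duration = (target - temp) / rate
        segments.append((t, t + duration, temp, target))
        t, temp = t + duration, target
        if hold > 0:
            segments.append((t, t + hold, temp, temp))
            t += hold
    return segments


def oven_temp(t):
    """Oven temperature (°C) at time t (min) after injection."""
    for t0, t1, temp0, temp1 in oven_segments():
        if t0 <= t <= t1:
            return temp0 + (temp1 - temp0) * (t - t0) / (t1 - t0)
    raise ValueError(f"t={t} min is outside the program")


print(oven_segments()[-1][1])  # total program time: 44.0 min
```

The 44 min total follows from 1 + (230 − 80)/10 + (310 − 230)/4 + 8 minutes.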

Data analysis
Data were analyzed with R (version 3.4.2) using RStudio (version 1.1.456). R was chosen because of the wide availability of packages for preprocessing, visualization and machine learning. Additionally, as open-source software, it is freely available to others in the field. In the data analysis, both a targeted metabolomics approach and an untargeted profiling approach were implemented. Both the quantitative data of the targeted metabolites and the raw LC-MS and GC-MS data will be made publicly available within 6 months after publication (reserved doi: https://doi.org/10.4121/uuid:0611ccbb-1e5a-4bf4-b6da-abc115ca0c98).

Targeted metabolomics
The quantified amino acids and lipids were analyzed in a targeted metabolomics approach. Since not all fingerprint components were quantified, the targeted metabolites were regarded as a subcomposition of a larger, unknown composition of the fingerprint residue. Therefore, a compositional data analysis approach was used, as described by Aitchison [24], which is based on the additive log-ratio (alr) transformation:
$$\mathrm{alr}(\mathbf{x}) = \left[\log\frac{x_1}{x_D}, \log\frac{x_2}{x_D}, \ldots, \log\frac{x_{D-1}}{x_D}\right] \qquad (1)$$
where each part $x_j$ of the composition is transformed to a log-ratio with common divisor $x_D$. L-threonine and palmitic acid yielded the lowest total variation (data not shown) and were chosen as reference parts ($x_D$) for the eccrine and sebaceous compositions, respectively. To deal with missing values in the targeted metabolites, a transformation denoted genlog (Eq. (2)) was used. In this transformation, m is the smallest non-zero value in a vector and int(x) is a function that drops all digits after the decimal point. The subtraction of the constant c from each element after log transformation ensures that the lowest value in a vector remains zero. Combining the alr transformation of Eq. (1) with Eq. (2) yields the function that was used to transform the targeted metabolite values. To reduce the number of missing values, the number of variables was reduced by applying a modified 80% rule [25]. Under this rule, a variable was included if at least 80% of its values are non-zero in the samples of any class. Permutation tests were used to find significant differences in the alr-transformed amino acid or lipid levels. The Agresti-Coull interval is used as an approximate binomial confidence interval [26].
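The alr transformation of Eq. (1) and the modified 80% rule can be sketched in a few lines of Python. This is an illustration only: the study performed these steps in R. The sketch assumes strictly positive amounts, since the genlog step that handles zeros is not reproduced. It also assumes that each sample is a dictionary mapping component name to amount. The function names and the example amounts are hypothetical.

```python
import math


def alr(parts, reference):
    """Additive log-ratio transform of one composition.

    parts: dict mapping component name -> strictly positive amount.
    Returns log(x_j / x_D) for every part j except the reference part D.
    """
    x_d = parts[reference]
    return {name: math.log(x / x_d) for name, x in parts.items() if name != reference}


def modified_80_rule(samples, labels, threshold=0.8):
    """Keep a variable if it is non-zero in >= threshold of the samples of any class."""
    kept = []
    for var in samples[0]:
        for cls in sorted(set(labels)):
            values = [s[var] for s, lab in zip(samples, labels) if lab == cls]
            if sum(v > 0 for v in values) / len(values) >= threshold:
                kept.append(var)
                break
    return kept


# Hypothetical amounts (ng), with L-threonine as the reference part.
print(alr({"L-threonine": 2.0, "L-serine": 6.0, "glycine": 2.0}, "L-threonine"))
```

Because every ratio shares the divisor $x_D$, the choice of reference part matters. This is presumably why the component with the lowest total variation was chosen.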

Untargeted analyte profiling
We analyzed the data in an untargeted profiling approach using the XCMS package [27][28][29]. LC-MS data files were converted to mzXML using MSconvert [30]. GC-MS data files were converted to CDF using OpenChrom [31]. Peak picking and retention time correction were optimized using the IPO package, using 10 randomly selected data files [32]. Analyte difference reports were subsequently generated for the following classes: type of fingerprint, gender and smoking (every day and in the past 24 hours). Peak areas were normalized using the total sum of peak areas per sample. Classification models were generated using the caret package [33]. Thirty features were preselected with minimum redundancy and maximum relevance using the mRMRe package, with Spearman's rho as the measure of correlation between features in the dataset [34]. The data were subsequently preprocessed using the Yeo-Johnson transformation. Conditional inference random forests (cforest) from the party package were used as classification models [35][36][37]. The data were randomly divided into training and test sets (75% and 25%, respectively). Classification accuracy was evaluated by generating confusion matrices using the caret package [33]. Variable importance was evaluated using the varImp function (caret package) to find the most important predictors of the generated classification models. P-values associated with fold changes were calculated using t-tests. Features were putatively annotated using the online METLIN mass spectral metabolite database [38].
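Three of the preprocessing steps listed above are simple enough to sketch directly: total-sum normalization, the Yeo-Johnson transformation and a random 75/25 split. The Python sketch below is not the caret pipeline used in the study. In caret, the Yeo-Johnson λ is estimated from the data for each feature; here it is passed in explicitly for clarity. The mRMR feature selection and the cforest models are not reproduced, and the seed value is hypothetical.

```python
import math
import random


def total_sum_normalize(areas):
    """Divide each peak area by the total peak area of the sample."""
    total = sum(areas)
    return [a / total for a in areas]


def yeo_johnson(y, lam):
    """Yeo-Johnson transform of a single value for a given (here fixed) lambda."""
    if y >= 0:
        if abs(lam) < 1e-12:
            return math.log1p(y)
        return ((y + 1) ** lam - 1) / lam
    if abs(lam - 2) < 1e-12:
        return -math.log1p(-y)
    return -((1 - y) ** (2 - lam) - 1) / (2 - lam)


def train_test_split(n_samples, train_fraction=0.75, seed=2016):
    """Randomly partition sample indices into training and test sets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    cut = round(train_fraction * n_samples)
    return sorted(indices[:cut]), sorted(indices[cut:])


train, test = train_test_split(463)
print(len(train), len(test))  # 347 116
```

Unlike a plain log transform, the Yeo-Johnson transformation is defined for zero and negative values, which can occur in sparse untargeted feature tables.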

Targeted metabolomics
In total, 463 participants donated 4 fingerprints each (2 natural, 1 eccrine and 1 sebaceous fingerprint), resulting in the analysis of 1852 fingerprints. Using LC-MS, the amino acid profiles of one of the natural fingerprints and of the eccrine fingerprint of each donor were determined. Using GC-MS, the lipid profiles of the remaining natural and sebaceous fingerprints were determined. Fig. 1 shows the distribution of the total amino acid and lipid content for the different types of collected fingerprints. Clearly, the variability in total amino acid content is large, ranging from below 100 ng to above 10 µg per fingerprint. The amino acid content was generally higher in the natural fingerprints than in the eccrine fingerprints (5.4-fold on average, p-value 3.48E-04). This might be explained by the fact that the participants washed their hands and wore gloves for only a limited time before donating the eccrine fingerprint. The total lipid content shows a similarly large variability, ranging from 100 ng up to 100 µg per natural fingerprint. The lipid content is higher in the sebaceous fingerprints (5.9-fold on average, p-value 2.32E-11), ranging from about 1 µg to over 100 µg per fingerprint.
Subsequently, multiple permutation tests (10,000 permutations per test and α = 0.05) were performed on the additive log-ratio transformed metabolite concentrations to identify potential metabolic markers (Fig. S1). Table 1 displays the significant differences between donor classes that were found in both types of fingerprint samples (natural and eccrine/sebaceous). The rationale behind this is that changes found in both types of fingerprints are more likely to be a result of metabolic changes. Additionally, as the variation in fingerprint composition is large, the data from the eccrine and sebaceous fingerprints can be used to confirm findings from the natural prints. We do, however, also report significant differences found only in natural fingerprints that might be of forensic relevance. It must be noted that the absence of significant findings in the targeted amino acid or lipid compounds for many donor traits is possibly related to unbalanced sample sizes. This was the case for many of the questions regarding medicinal drug use. Similarly, in case of donor age, over 90% of the participants were in the age range of 18 to 40 years.
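A two-sample permutation test with an Agresti-Coull interval on its p-value can be sketched as follows. The sketch makes several assumptions: it uses the absolute difference in group means as test statistic and applies it to hypothetical alr-transformed values. It also assumes one possible decision rule, which is to call a difference significant when the upper bound of the 95% interval on the permutation p-value lies below α = 0.05. The exact statistic and decision rule used in the study are not specified in the text, and this Python code is not the authors' R implementation.

```python
import math
import random
import statistics


def permutation_test(a, b, n_perm=10_000, seed=0):
    """Two-sided permutation test on the difference in group means.

    Returns (number of permutations at least as extreme, n_perm).
    """
    rng = random.Random(seed)
    observed = abs(statistics.fmean(a) - statistics.fmean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:]))
        if diff >= observed - 1e-12:  # small tolerance for floating-point ties
            extreme += 1
    return extreme, n_perm


def agresti_coull(successes, n, z=1.959963984540054):
    """Approximate 95% binomial confidence interval (Agresti-Coull)."""
    n_tilde = n + z ** 2
    p_tilde = (successes + z ** 2 / 2) / n_tilde
    half_width = z * math.sqrt(p_tilde * (1 - p_tilde) / n_tilde)
    return max(0.0, p_tilde - half_width), min(1.0, p_tilde + half_width)


# Hypothetical alr-transformed values for two donor classes.
group_1 = [0.9, 1.1, 1.3, 1.2, 1.0, 1.4]
group_2 = [0.2, 0.4, 0.3, 0.5, 0.1, 0.6]
hits, n = permutation_test(group_1, group_2)
low, high = agresti_coull(hits, n)
print(hits / n, (low, high), high < 0.05)
```

The interval reflects the Monte Carlo uncertainty of estimating a p-value from a finite number of permutations (here 10,000).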
Gender differences were found in L-(iso)leucine and L-phenylalanine concentrations, which were higher in men in both the natural and eccrine fingerprints. L-phenylalanine was 35.2% (natural) and 13.4% (eccrine) higher in men, and L-(iso)leucine was 13.3% and 7.9% higher, respectively. These findings are in line with previous studies on amino acid serum levels. Six out of six studies that included both men and women found significantly higher isoleucine levels in men, and five out of six did so for phenylalanine [39]. In contrast, Huynh et al. reported higher levels of L-phenylalanine in fingerprints of female compared to male donors [40]. Among the lipid compounds, palmitoleic acid was the only compound found to differ significantly between male and female donors in both the natural and sebaceous fingerprints. On average, palmitoleic acid was 21.6% higher in natural fingerprints and 13.5% higher in sebaceous fingerprints of male donors. Alanine concentrations were found to be 18.6% (natural) and 5.0% (eccrine) higher on average for donors who reported having used cannabis in the last 24 hours. Some donors reported having consumed > 15 units of alcohol in the 24 hours prior to donating their fingerprint. These donors showed higher proline concentrations than those who consumed no alcohol, 1-5 units or 6-10 units. Acute alcohol administration is known to cause decreased utilization of proline, which could explain an increased secretion in heavy drinkers (> 15 units) compared to the other classes [41].
Previous studies have shown that illicit drugs and/or their metabolites can be detected in fingerprints of drug users using DESI, LESA or MALDI-MS [21,22,42]. Moreover, the detection of certain drugs of abuse in fingerprints after contact has also been shown to be possible using techniques such as SIMS and Raman spectroscopy [43,44]. In this study, we investigated possible indirect effects of drug use on the chemical composition (i.e. changes in amino acid or lipid profile). As mentioned before, only in the case of cannabis and alcohol use was a small but significant change in certain metabolite levels detected in both the eccrine and the natural fingerprints. In the natural fingerprints of donors who indicated having used MDMA in the past 24 hours, tryptophan was 38.5% higher compared to donors who did not use MDMA. No significant difference was found in the eccrine fingerprints. Previous studies have shown that MDMA inhibits tryptophan hydroxylase activity [45][46][47], which could possibly explain the higher abundance of tryptophan in donors that used MDMA. Similarly, significant differences in L-asparagine levels were found only in the natural fingerprints. The difference was found between donors who indicated having consumed diet soda in the past 24 hours and those who consumed regular soda or no soda at all. L-asparagine levels were 27.2% and 26.0% higher in natural fingerprints from donors who drank diet soda, compared to those who did not drink soda or drank regular soda, respectively. After ingestion, the artificial sweetener aspartame is hydrolyzed into L-aspartic acid and L-phenylalanine, since it is the methyl ester of the dipeptide of these amino acids [48]. To our knowledge, no relation between increased levels of L-asparagine and aspartame has been reported previously. However, it is well known that L-asparagine can be readily synthesized from L-aspartate by asparagine synthetase [49].
Since data were acquired in full scan mode, we subsequently investigated potential metabolic markers based on the full scan data in an untargeted approach.

Untargeted profiling
We aimed to develop classification models for forensically relevant donor factors such as gender, age, diet, smoking habit and (medicinal) drug use. Based on our data and the corresponding sample sizes, we selected donor gender and smoking habit to develop classification models. Conditional inference random forests were used as models, as they allow for easy interpretation of variable importance. As a proof of principle, models were first generated to distinguish the different types of collected fingerprints based on the LC-MS and GC-MS data (i.e. natural vs eccrine for LC-MS, natural vs sebaceous for GC-MS). For these classification models, 30 features (normalized peak intensities) were preselected using the mRMRe package (Table S1). In case of the LC-MS data, the model was able to predict the fingerprint type (natural or eccrine) with 95.3% accuracy (CI: 91.7%−97.6%, Table 2).
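The confidence intervals reported with the model accuracies are, to our understanding, the exact binomial (Clopper-Pearson) intervals that caret's `confusionMatrix` reports, at the default 95% level. The Python sketch below computes such an interval by bisection on the binomial cumulative distribution function, using only the standard library. It is an illustration and not the code used in the study. It is applied to a hypothetical 5-out-of-10 example, because the underlying test-set counts are not given in the text.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))


def _bisect_increasing(f, tol=1e-10):
    """Root of an increasing function f on [0, 1] by bisection."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(correct, total, alpha=0.05):
    """Exact (Clopper-Pearson) confidence interval for an accuracy correct/total."""
    # Lower bound: P(X >= correct | p) = alpha/2, which increases with p.
    lower = 0.0 if correct == 0 else _bisect_increasing(
        lambda p: (1 - binom_cdf(correct - 1, total, p)) - alpha / 2)
    # Upper bound: P(X <= correct | p) = alpha/2, which decreases with p.
    upper = 1.0 if correct == total else _bisect_increasing(
        lambda p: alpha / 2 - binom_cdf(correct, total, p))
    return lower, upper


print(clopper_pearson(5, 10))  # approximately (0.187, 0.813)
```

Exact intervals are asymmetric around the observed accuracy near 0 or 1. This matches the asymmetric intervals reported here, for example 91.7% to 97.6% around 95.3%.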
The most important predictor turned out to be m/z 147.0760, putatively annotated as the amino acid L-glutamine. Table 3 presents the putative annotations of the other predictors used in this model (see Figs. S2 and S3 for the volcano plot and abundance data). Most of these compounds showed higher normalized intensity in the natural samples, in line with the data from the targeted approach. Among these putatively annotated compounds was urea, a well-known component of eccrine excretion [18,50,51]. The amino acids L-glutamine and L-arginine were found to be higher in the natural fingerprints. So were several putatively annotated amino acid degradation products, such as urocanic acid, pyroglutamic acid and 4-methylene-L-glutamine. The possibly increased abundance of amino acid degradation products in natural fingerprints may be explained by the fact that the eccrine fingerprints contained only fresh excretion and thus no amino acid degradation products yet.
The increased normalized intensity of L-glutamine and L-arginine in natural fingerprints might be caused by slower excretion of these amino acids. Interestingly, three m/z values putatively annotated as small peptides were included in the model, of which two were higher in the eccrine fingerprints. It could be hypothesized that these small peptides are excreted by eccrine glands but are readily hydrolyzed. The higher abundance of one of the peptides in the natural fingerprints, however, does not support this hypothesis. These peptides are likely to be hydrolysis products of larger peptides or proteins. Further analysis of the full-scan data revealed possible annotations of additional short peptides (< 5 amino acids, Table S2), which were not included in the classification model.
When the classification model for fingerprint type was constructed with the GC-MS data, the resulting accuracy was lower: 86.8% (CI: 81.8%−90.9%, confusion matrix in Table S3, 30 preselected features in Table S1). The model is mainly driven by its three most important predictors, m/z 96.10, 137.20 and 203.20. The volcano plot and the normalized peak intensities of these and the other predictors used in this model are displayed in Figs. S4 and S5. Table 4 summarizes the putative fragments and their sources. All m/z values used in the classification were found to be higher in normalized intensity in the sebaceous fingerprints compared to the natural fingerprints. Mong et al. previously found unsaturated fatty acids to decrease faster over time than saturated fatty acids [52]. Similarly, squalene is known to degrade relatively fast in fingerprints and has therefore been a compound of interest in fingerprint age estimation studies [2,53]. The importance of squalene and monounsaturated fatty acids in this classification model is consistent with these findings. Additionally, all lipid compounds in the sebaceous fingerprints are more abundant relative to the internal standard (docosane) than in the natural fingerprints, resulting in higher normalized peak intensities.
Then, we aimed to build a classification model to predict donor gender from the full-scan LC-MS data of the collected natural fingerprints, using only 30 preselected features (Table S1). The accuracy of this model on the test set was 77.9% (CI: 69.1%−85.1%, Table 5). Interestingly, the sensitivity for males is markedly lower than for females (65.9% compared to 85.5%, respectively).
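Because the gender classes are imbalanced (179 male versus 280 female donors), overall accuracy can hide a lower sensitivity for the minority class. The Python sketch below derives accuracy and per-class sensitivity (recall) from a confusion matrix stored as nested dictionaries. It also computes balanced accuracy, the mean of the per-class sensitivities, which is one common complementary metric and is not reported in this study. The counts are hypothetical and are not the values behind Table 5.

```python
def classification_summary(confusion):
    """Accuracy, per-class sensitivity and balanced accuracy.

    confusion[actual][predicted] holds counts; every class must appear as a key
    in both levels.
    """
    classes = list(confusion)
    total = sum(confusion[a][p] for a in classes for p in classes)
    correct = sum(confusion[c][c] for c in classes)
    sensitivity = {c: confusion[c][c] / sum(confusion[c].values()) for c in classes}
    balanced = sum(sensitivity.values()) / len(classes)
    return correct / total, sensitivity, balanced


# Hypothetical test-set counts (not the study's Table 5).
counts = {"male": {"male": 30, "female": 10}, "female": {"male": 10, "female": 50}}
accuracy, sens, balanced = classification_summary(counts)
print(accuracy, sens, balanced)
```

In this hypothetical example, the overall accuracy (0.80) is higher than the sensitivity for the smaller class (0.75), the same pattern as described for male donors above.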
Among the most important predictors were m/z 284.0988 and m/z 169.0361, putatively annotated as guanosine and uric acid, respectively (Table 6, volcano plot Fig. S6). Moreover, other m/z values incorporated in the classification model possibly correspond to guanine, uric acid and guanosine, all degradation products of guanosine monophosphate (GMP) (pathway depicted in Fig. S7). The normalized peak intensities of the used predictors are depicted in Fig. 2. Further analysis of the data reveals that a compound putatively annotated as xanthine also differs significantly between males and females, but this compound played no role in the classification model. Possibly, quantitative information from xanthine is redundant since guanine and uric acid are already incorporated. The compounds putatively annotated as products of GMP catabolism are included in the classification model for gender. However, to our knowledge, there is no previous record of concentrations of GMP or related products in fingerprints. Studies into the concentration of cGMP in nasal mucus and of guanosine in human brain tissue found significantly higher concentrations in females compared to males [54,55].
Table 1 Fold change in additive log-ratio transformed targeted amino acids (LC-MS) and lipids (GC-MS), if found significant in both natural and eccrine or sebaceous fingerprints. Significant differences were found in gender, cannabis usage and alcohol consumption classes. The classes are specified as follows. For gender: M = male, F = female. For cannabis usage: Yes = used cannabis in past 24 h, No = not used cannabis in past 24 h. For the response variable alcohol: the number of alcoholic drinks consumed in the past 24 h. P-values were calculated using the Agresti-Coull interval as an approximate binomial confidence interval.
The m/z 166.0865 (detected in 175 out of 179 males and in 271 out of 280 females) was not used in the classification model but is worth mentioning. It corresponds to the [M+H]+ of L-phenylalanine (theoretical [M+H]+ of 166.0863) and was found to be 32% higher in males than in females. This is in line with the fold change found in the targeted approach (35%, Table 1). No significant changes in L-(iso)leucine levels were found using this approach. Similarly, a classification model to predict gender was built using 30 predictors for the eccrine fingerprints. In this case, the accuracy decreased to 71.7% (CI: 62.4%−79.8%, confusion matrix in Table S4, 30 preselected features in Table S1). Putatively annotated products of GMP catabolism were not included in this model, although some changes in uric acid and guanine were seen (1.77- and 2.06-fold increase in males, respectively; data not shown).
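The putative annotations rely on comparing observed m/z values with theoretical [M+H]+ values, expressed as a relative mass error in ppm. A minimal Python sketch of that arithmetic follows. It computes the neutral monoisotopic mass from standard isotope masses, adds the proton mass and reports the ppm deviation. The METLIN lookup itself is not reproduced, and the function names are hypothetical. The elemental formulas used in the example and tests are the standard ones: L-phenylalanine C9H11NO2, nicotine C10H14N2, cotinine C10H12N2O and hydroxypyridine C5H5NO.

```python
# Monoisotopic masses (u) of the most abundant isotopes and the proton mass.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}
PROTON = 1.007276466812


def mh_plus(formula):
    """Theoretical m/z of the [M+H]+ ion for a neutral formula {element: count}."""
    return sum(MONOISOTOPIC[el] * n for el, n in formula.items()) + PROTON


def ppm_error(observed, theoretical):
    """Relative mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


phenylalanine = {"C": 9, "H": 11, "N": 1, "O": 2}
theo = mh_plus(phenylalanine)
print(round(theo, 4), round(ppm_error(166.0865, theo), 2))  # 166.0863 1.47
```

The same two functions reproduce the other theoretical values quoted in this section, for example 163.1230 for nicotine and 177.1022 for cotinine.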
Subsequently, we aimed to develop the corresponding classification models for gender based on the GC-MS data. The model based on the natural fingerprints had an overall accuracy of 68.1% (CI: 58.7%−76.6%, confusion matrix in Table S5, 30 preselected features in Table S1). When the model was built based on sebaceous fingerprints, the accuracy decreased even further to 64.6% (CI: 55.0%−73.4%, confusion matrix in Table S6, 30 preselected features in Table S1). The low accuracy might be due to the large variation in sebaceous fingerprint content, which was also seen in the targeted approach.
Next, models were generated in attempting to classify smokers versus non-smokers, using only 30 preselected features. Based on the LC-MS data from the natural fingerprints, the generated model was able to achieve a 90.4% accuracy (CI: 83.4%−95.1%, Table 7). The most important predictor was m/z 163.1225, putatively annotated as nicotine ([M+H] + of 163.1230, Table 8). The second and third most important predictors were m/z 177.1024 and m/z 96.0444, respectively. The m/z 177.1024 possibly corresponds to cotinine ([M+H] + of 177.1022), the main degradation product of nicotine, which has been detected in fingerprints in previous studies [56,57]. The m/z 96.0444 matches with hydroxypyridine ([M+H] + of 96.0444). Although hydroxypyridine itself is not a direct degradation product of nicotine in humans, the pyridine pathway is a well-known microbial degradation route of nicotine, forming 2,5-dihydroxypyridine [58][59][60]. Putative hydroxypyridine, nicotine and cotinine levels were, respectively, 2.76-, 11.74-and 5.70-fold higher on average in fingerprints from smokers compared to those of non-smokers (Fig. 3, volcano plot Fig. S8). Prediction of smokers versus non-smokers was also attempted with the eccrine fingerprints and resulted in similar accuracy (90.2%, CI: 83.1% − 95.0%, confusion matrix in Table S7, 30 preselected features in Table  S1, volcano plot Fig. S9). In this model, only the m/z annotated as nicotine is incorporated, which was 6.80-fold high in smokers compared to non-smokers (Fig. 3). The absence of cotinine might be a result of the lower concentrations in the eccrine fingerprints. Hydroxypyridine, might be absent because it possibly is a microbial degradation product, Table 3 Features used in the classification model for fingerprint type based on the LC-MS data, their putative annotation, relative mass error (in ppm) and fold change. Pvalues were calculated using t-tests.  
Table 4 Features used in the classification model for fingerprint type based on GC-MS data, their putative source and fold change. P-values were calculated using t-tests.   Table 6 Features used in the classification model for gender based on natural fingerprint LC-MS data, their putative annotation, relative mass error (in ppm) and fold change. P-values were calculated using t-tests. and thus is not excreted directly by the eccrine glands. We then generated the classification model to differentiate between donors who smoked in the past 24 hours and donors who did not smoke in the past 24 hours. Interestingly, the percentage of participants that indicated to have smoked in the past 24 hours was 36%, compared to 28% that indicated to smoke on an everyday basis. In case of the natural fingerprints, this resulted in an accuracy of 87.7% (CI: 80.3% − 93.1%, confusion matrix in Table S8, 30 preselected features in Table S1, abundance data and feature fold changes in Fig. S10 and Table S10)   hydroxypyridine are incorporated in this model, as was to be expected (respective 11.07-, 4.07-and 2.50-fold increase). This model has slightly lower accuracy as the model constructed for everyday smokers, mainly because of an increased false negative rate (i.e. predicting a smoker as non-smoker). This might indicate that it takes some time before the fingerprint composition is influenced by smoking, and putative levels of nicotine, cotinine and hydroxypyridine are not sufficiently high in fingerprints by occasionally smoking. This trend was also seen in case of eccrine fingerprints (accuracy of 87.6%, CI: 80.1% − 93.1%, confusion matrix in Table S9, 30 preselected features in Table  S1, abundance data and feature fold changes in Fig. S10 and Table S10), where only the m/z value likely corresponding to nicotine was included in the model (6.33-fold increase). 
Both of these models closely resemble the everyday-smoking models, since participants who smoke every day are likely to have smoked in the past 24 hours.
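The putative annotations above rest on the agreement between observed and theoretical [M+H]+ values. As a minimal sketch, assuming standard monoisotopic element masses, a proton mass of 1.00727646 u and the elemental formulas nicotine C10H14N2, cotinine C10H12N2O and hydroxypyridine C5H5NO, the theoretical m/z and the relative mass error in ppm can be computed as follows. The function names `mh_mz` and `ppm_error` are illustrative and not part of the authors' workflow.

```python
# Monoisotopic masses (u) of the most abundant isotopes, and the proton mass.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}
PROTON = 1.00727646


def mh_mz(formula: dict[str, int]) -> float:
    """Theoretical m/z of the singly charged [M+H]+ ion for an elemental formula."""
    neutral = sum(MONOISOTOPIC[element] * count for element, count in formula.items())
    return neutral + PROTON


def ppm_error(observed: float, theoretical: float) -> float:
    """Relative mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


# Observed m/z values from the smoking models and their putative formulas.
candidates = {
    "nicotine": (163.1225, {"C": 10, "H": 14, "N": 2}),
    "cotinine": (177.1024, {"C": 10, "H": 12, "N": 2, "O": 1}),
    "hydroxypyridine": (96.0444, {"C": 5, "H": 5, "N": 1, "O": 1}),
}

for name, (observed, formula) in candidates.items():
    theoretical = mh_mz(formula)
    error = ppm_error(observed, theoretical)
    print(f"{name}: theoretical {theoretical:.4f}, error {error:+.2f} ppm")
```

Computed against the unrounded theoretical values, all three errors fall within about 3 ppm; they differ slightly from errors computed against the four-decimal values quoted in the text.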
In the natural fingerprints of participants who indicated having used cosmetic or personal care products such as make-up, sunscreen or hair gel, several exogenous compounds were putatively annotated, although classification models to predict contact with such products could not be constructed successfully (Fig. S11 and Table S11). Among the putatively annotated compounds were ensulizole (a sunscreen agent; change not significant), panthenol (a moisturizer in cosmetic and personal care products), glycerol (a humectant and lubricant in pharmaceutical and personal care products) and dimethylethanolamine (DMEA, used in skin care products; change not significant). In addition to DMEA, the related compound choline was putatively annotated.

Discussion
From a forensic point of view, insight into potentially distinguishing properties of fingermarks is needed to increase the chances of individualizing or categorizing perpetrators. Previous research indicated differences in the chemical composition of fingermark residue, but a more detailed understanding of these differences was still lacking. The primary objective of this study was to identify differences in the composition of fingermark residue in relation to certain donor conditions. Various distinguishing compounds were found in eccrine, sebaceous and natural fingerprints. The inter-donor variability in both natural and eccrine or sebaceous fingerprint residue was found to be large. It must be noted that deposition pressure was not controlled in this study. Recent work by Dorakumbura et al. showed that, when deposition pressure is not controlled, the percentage difference in squalene between deposits from two hands varies between 4 and 100% [61]. This partly explains the large inter-donor variability found in our study. The intra-donor variability of many compounds remains largely unknown, but could clearly be significant as well if no consistent deposition force is used. Since we analyzed only a single fingerprint per donor per analysis method (LC-MS or GC-MS) and fingerprint type (natural/eccrine/sebaceous), intra-donor variability was not considered in this study, and its implications for the results remain uncertain. Moreover, since natural fingermarks can be subject to numerous external factors, we cannot verify that the markers found result from metabolic changes. Nevertheless, these samples mirror fingermarks as they are mainly found in practice, and are therefore of most interest to forensic professionals. It must also be noted that it remains uncertain whether the detected potential markers would survive prior enhancement of fingermarks, as fingerprint constituents were directly dissolved in this study.
Further research should focus on the compatibility of these analytical techniques with common fingermark detection techniques, such as cyanoacrylate fuming. Alternatively, the development of extraction techniques that leave the fingermark ridge pattern intact, such as the use of hydrogels [62], could overcome this problem as well.
In the targeted approach, we found compounds that could serve as potential metabolic markers for gender, cannabis use and alcohol consumption. We are aware that the proposed generalized logarithm that we used in this approach to circumvent the zero-value problem in log-ratios might also affect inference. Nevertheless, we assume this did not have any major implications, as it tends to preserve the original orders of magnitude, and it seems a more appropriate method than the commonly used 'half-minimum imputation'. Alternative approaches for censored data are described by Helsel [63]. Additionally, some conditions had highly unbalanced sample sizes. These included fairly uncommon traits, such as drug use, for which research ethics committees generally have strict rules. Because data on such traits are difficult to obtain, we took these conditions into consideration in spite of the unbalanced sample sizes. It remains uncertain whether this had implications for the permutation tests.
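To illustrate the zero-value problem and the permutation tests mentioned above, the sketch below applies a generalized logarithm, assumed here to take the common form glog(x) = ln((x + sqrt(x² + λ))/2), which is finite at zero and approaches ln(x) for large x, and contrasts it with half-minimum imputation. It then runs a two-sided label-permutation test on the difference in group means; shuffling labels within the observed group sizes means the test does not require equal group sizes. The exact glog parameterization, the value of λ, the test statistic and all intensities below are assumptions for illustration, not the settings used in this study.

```python
import math
import random
from statistics import fmean


def glog(x: float, lam: float = 1.0) -> float:
    """Generalized logarithm: finite at x = 0 and close to ln(x) when x >> sqrt(lam)."""
    return math.log((x + math.sqrt(x * x + lam)) / 2.0)


def half_min_impute(values: list[float]) -> list[float]:
    """Replace zeros by half of the smallest positive value (the common alternative)."""
    fill = min(v for v in values if v > 0) / 2.0
    return [v if v > 0 else fill for v in values]


def permutation_test(a: list[float], b: list[float], n_perm: int = 10_000, seed: int = 0) -> float:
    """Two-sided permutation p-value for the difference in means between groups a and b."""
    rng = random.Random(seed)
    observed = abs(fmean(a) - fmean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign group labels, keeping both group sizes fixed
        if abs(fmean(pooled[:n_a]) - fmean(pooled[n_a:])) >= observed:
            extreme += 1
    # Add-one correction so the p-value is never exactly zero.
    return (extreme + 1) / (n_perm + 1)


# Hypothetical peak areas: many non-users (including zeros) versus few users.
non_users = [0.0, 150.0, 420.0, 0.0, 310.0, 95.0, 260.0, 180.0, 0.0, 350.0]
users = [900.0, 1200.0, 640.0]
lam = 1e4  # hypothetical tuning constant
p = permutation_test([glog(x, lam) for x in non_users], [glog(x, lam) for x in users])
print(f"permutation p-value: {p:.4f}")
```

With half-minimum imputation, the imputed value depends on the smallest detected intensity in the data set, whereas glog maps every zero to the same finite value set only by λ.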
In the untargeted approach, we developed classification models based on the full-scan MS data to retrieve information about donor traits from fingerprint composition. Accurate prediction of donor gender and smoking habit would be valuable to forensic investigators, as it would enable a significant reduction of the suspect population. In the model predicting smoking habit, m/z values corresponding to nicotine and cotinine were the most important predictors. Classification accuracy might improve further by setting environmental cutoff levels, as proposed by Ismail et al. for cocaine and heroin [64]. Such cutoffs could potentially correct for detection of nicotine or cotinine resulting from environmental contamination, such as passive smoking. It must be noted that the suggested putative annotations of the m/z values incorporated in the classification models can only be considered level 2 or 3 metabolite identifications, as defined by the 2007 Metabolomics Standards Initiative [65]. They should serve as targets for further research, in which higher levels of annotation of these predictors should be achieved. Moreover, the quantitative analyses in this approach are based on peak intensities normalized to the total peak sum, which is debatable. The fact that the relative change in L-phenylalanine was similar to that found in the internal-standard-corrected targeted approach is an indication that this approach is valid.
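The total-sum normalization and the fold changes reported in the Results can be sketched as follows. This assumes that each sample is a mapping from feature (m/z) to peak intensity, and that a fold change is the ratio of group means of the normalized intensities; both are assumptions about the workflow rather than a description of it, and the feature labels and intensities are hypothetical. Because each normalized value depends on the total of all features, a large change in one abundant compound shifts every other feature, which is one reason this normalization is debatable.

```python
from statistics import fmean

Sample = dict[str, float]  # feature label -> peak intensity


def total_sum_normalize(sample: Sample) -> Sample:
    """Divide every peak intensity by the sum of all peak intensities in the sample."""
    total = sum(sample.values())
    return {feature: value / total for feature, value in sample.items()}


def fold_change(group_a: list[Sample], group_b: list[Sample], feature: str) -> float:
    """Ratio of the mean normalized intensity of `feature` in group A to that in group B."""
    mean_a = fmean(sample[feature] for sample in group_a)
    mean_b = fmean(sample[feature] for sample in group_b)
    return mean_a / mean_b


# Hypothetical raw intensities for two features in smokers and non-smokers.
smokers_raw = [{"mz_163.1225": 2.0, "other": 2.0}, {"mz_163.1225": 6.0, "other": 2.0}]
non_smokers_raw = [{"mz_163.1225": 1.0, "other": 3.0}]

smokers = [total_sum_normalize(s) for s in smokers_raw]
non_smokers = [total_sum_normalize(s) for s in non_smokers_raw]
print(fold_change(smokers, non_smokers, "mz_163.1225"))  # prints 2.5
```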
We found several putative exogenous compounds originating from cosmetic and personal care products, in line with previous findings from Bouslimani et al. and Hinners et al. [19,20]. We were, however, unsuccessful in constructing accurate classification models based on these compounds. This is likely a result of grouping different personal care and cosmetic products together in the questionnaire, while these compounds may be specific to one class of products. Analysis of these exogenous compounds on a more specific fingerprint-to-fingerprint basis, as performed in the aforementioned studies, might be more suitable.
In this study, many donor traits, such as (medicinal) drug use and donor age, did not yield significant changes in targeted metabolite levels, nor were they successfully predicted in the untargeted approach. In many of these cases, sample sizes were highly unbalanced. Future studies with larger, more balanced sample sizes should be carried out to investigate the feasibility of deriving information about these donor traits from fingerprint composition. The fact that smoking habit could be predicted with relatively high accuracy from this dataset is a promising lead for the development of classification models for similar stimulants based on fingerprint chemical composition.

Conclusion
We successfully collected a database of chemical profiles from 1852 fingerprints, donated by 463 donors. We found a large inter-donor variability in all analyzed fingerprint types. Total amino acid levels ranged from below 100 ng to 10 µg. Total lipid content ranged from 100 ng up to 100 µg in natural fingerprints, and from about 1 µg to over 100 µg in sebaceous fingerprints. In a targeted metabolomics approach, we found L-phenylalanine, L-(iso)leucine and palmitoleic acid levels to differ significantly between male and female donors. Moreover, L-alanine levels differed for donors who indicated having used cannabis, while L-proline levels differed for donors who consumed a large amount (> 15 units) of alcohol. The targeted amino acid and lipid compounds alone were, however, insufficient to successfully derive donor information from fingerprint composition.
In an untargeted approach, we constructed classification models for fingerprint type, gender and smoking habit. Based on the full-scan data, models could accurately discriminate between fingerprint types (95.3% and 86.8% accuracy for LC- and GC-MS, respectively). Gender could only be predicted with moderate accuracy, based on natural fingerprints analyzed by LC-MS (77.9%). Surprisingly, putatively annotated metabolites from the GMP degradation pathway serve as predictors in this model and are interesting targets for further research. Everyday smoking habit was accurately predicted in both natural and eccrine fingerprints (90.4% and 90.2% accuracy, respectively). Smoking in the past 24 hours could be predicted with slightly lower accuracy. In these models, m/z values corresponding to nicotine and cotinine were the most important predictors. The results presented in this paper are promising leads for further investigations into retrieving donor information from the chemical composition of fingerprints. Further analysis is needed to validate the potential metabolic markers found.
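The accuracies summarized above are reported in the Results with 95% confidence intervals. The interval method is not stated in this section; one common choice for a classification accuracy is the exact (Clopper-Pearson) binomial interval, which R's `binom.test` reports. The stdlib-only sketch below finds both bounds by bisection on the binomial CDF. The counts in the usage example are hypothetical, chosen only to give a 90.4% point estimate; the actual test-set size is not given here.

```python
import math


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _bisect_increasing(f, lo: float = 0.0, hi: float = 1.0, tol: float = 1e-10) -> float:
    """Root of a monotonically increasing function f on [lo, hi]."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Exact two-sided confidence interval for a binomial proportion k/n."""
    # Lower bound solves P(X >= k | p) = alpha/2; P(X >= k) increases with p.
    lower = 0.0 if k == 0 else _bisect_increasing(
        lambda p: (1 - binom_cdf(k - 1, n, p)) - alpha / 2)
    # Upper bound solves P(X <= k | p) = alpha/2; P(X <= k) decreases with p.
    upper = 1.0 if k == n else _bisect_increasing(
        lambda p: alpha / 2 - binom_cdf(k, n, p))
    return lower, upper


# Hypothetical counts: 113 correct predictions out of 125 test fingerprints.
low, high = clopper_pearson(113, 125)
print(f"accuracy {113 / 125:.1%}, 95% CI {low:.1%} to {high:.1%}")
```

The exact interval is asymmetric around the point estimate near 100% accuracy, which matches the shape of the intervals reported in the Results.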

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements
We thank all of the following persons, who helped with either the preparation of the experiment or the collection of fingerprints during the festival: Craig Appleby, Marcel van Beest, Nick van de Berg, Elise van Diejen, Arjan van Dijke, Sander Ernst, Frits de Haan, Eef Herregodts, Tim Horeman, Anneke Koster, Roxy van de Langkruis, Arjo Loeve, Mathilde Scheulderman, Beth Selway, Elmarije van Straalen and Roel Zaremba. We wish to thank Stijn Oonk for assisting with part of the data analysis. WvH acknowledges a RAAK-PRO research grant (No. 2014-01-124PRO).