Identification of Synthetic Drugs on Seized Blotter Papers Using ATR-FTIR and PLS-DA: Routine Application in a Forensic Laboratory

Blotter papers seizures containing synthetic drugs have intensified over the last decades. These drugs were originally conceived as “legal” alternatives to traditional illicit drugs, designed to mimic their effects and circumvent control agencies. Reference methods for determining these substances on blotter papers are based on chromatographic techniques using mass spectrometry detection. However, these procedures are destructive, expensive, and time consuming. Some compounds are also thermolabile and not suitable for regular gas chromatography analyses. In this paper, two multivariate models were presented and incorporated in the routine of a forensic laboratory as a screening method. They were developed and validated using a representative dataset of 158 seizures analyzed by attenuated total reflectance-Fourier transform infrared spectroscopy (ATR-FTIR) and a partial least squares-discriminant analysis (PLS-DA) model. The first model (model A) discriminates between samples with and without different types of drugs, and the second one (model B) discriminates between samples containing NBOMe and NBOH, two N-benzyl 2,5-dimethoxy substituted phenethylamine commonly incorporated into blotter papers. The proposed method is fast, non-destructive, and requires no sample preparation. Both models showed reliable results (misidentification errors < 10%), presented good results in a real forensic laboratory routine, and can be updated to include new drugs.


Introduction
New psychoactive substances (NPS), "legal highs" or "design drugs" comprise a group of modified drugs developed by changing molecular structures of known illegal substances such as cannabis, cocaine, lysergic acid diethylamide (LSD), and ecstasy. [1][2][3] As reported by United Nations Office on Drugs and Crime (UNODC), 4 a total of 950 NPS were reported up to 2019, making NPS an international issue, presenting threats to health, well-being, and social security.
Many intoxication cases and fatalities have been reported for NPS. [5][6][7][8][9][10] In addition, a great number of people who consume them is unaware of their presence and toxicological effects. Therefore, considering the unprecedent increase in both consumption and diversifying range of NPS, the investment in scientific research and international cooperation are imperative for improving law enforcement capacities to disassemble organized criminal groups and controlling drug trafficking. 3 P o l i c e d e p a r t m e n t s a r e h a n d l i n g w i t h a n impressive rise of blotter papers seizures containing N-benzylphenethylamines derivatives, such as NBOMe and NBOH. 25R-NBOMe (2-(4-R-2,5-dimethoxyphenyl)-N-[ (2-methoxyphenyl)methyl]ethanamine) and 25R-NBOH (2-((2-(4-R-2,5-dimethoxy phenyl)ethylamino)methyl) phenol), where R could be either Cl, Br, I or an organic substituent, are two synthetic drug families included in phenethylamines class. 11 Due to the relatively easy synthesis of new compounds by changing small substituents in the molecular structure of popular psychoactive substances, many "new drugs" have been "legally" commercialized. During some years many phenethylamines were not listed on Brazil's drug regulatory annexes. However, in the update that occurred on December 4 th , 2019, the RDC No. 325 by ANVISA 12 included a new item of structural classes for phenethylamines in the drug regulatory annexes. Nevertheless, different of the NBOMe, NBOH are still considered NPS, since they are not internationally controlled by the United Nations 1961 or 1971 conventions.
In 2014, 25I-NBOH, a substance belonging to the NBOH family, was identified for the first time in seized material in Brazil. 13 This compound is a thermolabile molecule that fragments into 2C-I (2-(4-iodo-2,5-dimethoxyphenyl) ethan-1-amine) and a smaller fragment, an ortho-hydroxy allyl phenyl ether, when analyzed by gas chromatographymass spectrometry (GC-MS). 14 Depending on the GC-MS method used, it could lead to misidentification of 25I-NBOH as 2C-I. 15 However, if 2C-I was actually present in these blotter papers its concentration would be considerably small to cause biological effects in the organism. This result motivated experts to carry out more studies with a modified GC-MS technique, liquid chromatography-quadrupole time-of-flight-mass spectrometry (LC-QTOF-MS), attenuated total reflectance-Fourier transform infrared (ATR-FTIR) and nuclear magnetic resonance (NMR) analyses, which enable the molecular characterization of the novel compound. 13,14 Routine analysis for seized blotter papers is commonly performed based on gas or liquid chromatography coupled to mass spectrometry detection (LC-MS and GC-MS). [13][14][15] However, as mentioned before, 25R-NBOH needs a customized GC-MS method or LC-QTOF-MS method to avoid its misidentification as 2C-R. 13,14 In this way, an alternative method that is fast, non-destructive, and convenient for screening discrimination of samples before the chromatographic analysis is required by forensic laboratories. In the forensic area, spectroscopy techniques are suitable for alternative and complementary analyses by their non-destructive characteristic, preserving the seized material for future analyses. Besides, middle infrared (MIR) spectroscopy is considered a "technique A" by the SWGDRUG recommendations 16 by its selectivity and high quality chemical structural information.
Previous studies proposed the discrimination of drugs on blotter papers using multivariate methods and MIR. Coelho Neto 17 and Pereira et al. 18 applied multivariate models and ATR-FTIR as a screening for synthetic drugs on blotter papers. Both authors have proposed models to discriminate some of the most seized synthetic drugs (NBOMe, 2C-H (2,5-dimethoxyphenethylamine), LSD, MAL (methallylescaline) and blank papers). Even though both models presented high general efficiency rates, the discrimination of LSD was not effective due to the lower concentration of this drug and the restrictions of the limit of detection of the method. In addition, the validation of most classes was restricted to a small number of samples, which might represent a serious problem considering the variation observed in seized drug samples.
Magalhães et al. 11 compared partial least squaresdiscriminant analysis (PLS-DA) and soft independent modeling for class analysis (SIMCA) models based on a handheld near infrared (NIR) spectrometer analysis for screening NBOMe and NBOH on seized blotter papers. The method was developed using a two-stage approach. The first one discriminated between samples containing drugs and those without, and the second stage discriminated between NBOMe and NBOH. PLS-DA models provided higher efficiency rates, while SIMCA models showed a lower efficiency, but good results on dealing with samples containing drugs that were not included in the training phase. However, it is important to point out that NIR spectra are usually less sensitive and characteristic than the MIR spectra due to overlap absorptions mainly regarding overtones and combinations of vibrational modes. 19,20 Moreover, MIR spectrometers are most commonly found in forensic laboratories than NIR spectrometers.
Considering the results already present in the literature, 17,18 indicating the high potential use of MIR for the identification of synthetic drugs in blotter papers, this work focused on the development, validation, and implementation of an easy-to-operate screening method based on ATR-FTIR analysis for identification of synthetic drugs and discrimination between NBOMe and NBOH in forensic laboratories. Further, results obtained with ATR-FTIR and chemometrics were compared to those obtained with reference chromatographic methods. Following the two-stage approach of our previous work, 11 the first model, model A, was dedicated to discriminate between samples containing drugs and not. Model B discriminated between samples containing NBOMe and NBOH. The method was developed and validated with a representative number of samples, seizures, replicates, and is based on a fast spectral acquisition using a technique present in most forensic laboratories in Brazil and abroad, a fact that enables the application and possible transference to other forensic laboratories throughout the world.

Experimental
Apparatus and software Spectra acquisition was performed using an ATR-FTIR spectrometer (Alpha II, Bruker) equipped with DTGS (deuterated triglycine sulfate) IR detector and a single reflection ATR accessory using a diamond crystal. Data analysis and multivariate models were developed by the software OPUS (version 7.5, Bruker), using chemometric tools available in the instrument software (i.e., Quant2 for model development and Quant2 Analysis for evaluation of independent tests in the models) and a spreadsheet (Excel, Microsoft Office, version 365). 21

Samples and spectra acquisition
The 158 samples used in this work were constituted of distinct blotter papers seizured from investigations coordinated by the Civil Police of the Federal District (PCDF) in Brazil between 2009 and 2018. 25B-NBOMe, 25C-NBOMe, 25I-NBOMe, DOC (2,5-dimethoxy-4-chloroamphetamine), LSD, 25B-NBOH, 25I-NBOH, and 25E-NBOH are detailed in Table 1. The complete characterization of those samples were previously carried out by GC-MS and LC-QTOF-MS, following published methods. 13,15 As mentioned before, currently the NBOMe family is no longer considered an NPS, since 25B-NBOMe, 25C-NBOMe and 25I-NBOMe were included in the Schedule 1 of the Convention on Psychotropic Substances of 1971 at the 10 th meeting of the Commission on Narcotic Drugs, on 13 March 2015. 4 On the other hand, the NBOH family is still considered an NPS. Therefore, to avoid any confusion, NBOMe, NBOH and other controlled substances analyzed in this paper will be denoted by the general term of synthetic drugs (SYD).
Perforated drug-free blotter papers and filter papers were acquired from online stores and local market shops. Seven sheets of blotter papers, five with printed artwork, and two blanks, plus two filter paper sheets (Unifil, 85 g m −2 ) were used to populate the drug-free class. The absence of drugs was properly verified by GC-MS analysis. Model A was designed to discriminate samples containing all types of synthetic drugs available for this work (described before) from the drug-free class. On the other hand, model B was intended to discriminate between synthetic drugs classified as NBOMe and NBOH. For this purpose, and due to their molecular similarity, all NBOMe compounds were modeled together as a class named NBOMe, and, likewise, all NBOH compounds were modeled together as NBOH. Moreover, following the chronological order of spectral acquisition, two-thirds of the available samples from each class were selected to the training set and the remaining one third to the validation set. Table 1 summarizes the number and content of the samples used in the training and validation phases.
Our previous work 11 has described the interference of pigments on the artwork side, therefore all spectra acquisition was done on the opposite side to ensure a better signal and discriminatory capability. Before sample analysis, a background scan was performed to minimize atmospheric interference and instrumental noise. 16 Measures were taken at 4 cm −1 resolution and 24 scans per spectrum. To obtain a representative sampling, five infrared spectra were collected from different spots of each sample containing SYD, resulting in a total of 790 spectra. Considering the limitation of the number of independent sheets in the blank class, 25 spectra were measured from different spots in each sheet (9 sheets × 25 spectra per sheet = 225 spectra). Besides, considering that the distribution of drugs and possible interferences might be different in each sample spot, all replicates were used to develop the models, instead of using the average spectra of the replicates in the same sample. This procedure was applied to increase the spectral variability of the dataset, which could, at first, result in a more robust method. 22 Model development PLS-DA is an important discriminant model that became popular over the years and proved to be able to provide very good discrimination results. Basically, this model performs a regression using all variables in relation to a binary vector that splits samples between a target class with a label 1 and all the other classes gathered in a single group with label 0. In the case of three or more classes, they are usually modeled as one group against all the others in a PLS2-DA approach. 23,24 PLS-DA is based on PLS multivariate calibration, which makes the simultaneous decomposition of matrix X (n,m) (instrumental signals), containing n samples and m variables, and a vector y, which contains the numerical binary identification of each sample according to its class. New axes called latent variables set a coordinate system that gathers the most relevant information present in original variables. 23,24 Equations 1 and 2 show the decomposition of X and y into this new coordinate system. (1) where t a are the scores, p a and q a are the loadings, E and f are the residual arrays for X and y, respectively, and A is the number of latent variables. The PLS-DA models were developed in OPUS software environment, which performs PLS regression and not discrimination. Values of 0 and 1 were attributed to the dependent variable (y) to specify the different classes before model development. First of all, the training dataset was loaded in Quant2 interface and several PLS-DA models were developed using the software optimization tool, which automatically searches for the best preprocessing, spectral region and number of latent variables. At the end of the optimization process models were ranked by crescent order of their root mean square errors of cross-validation (RMSECV). The best model, based on low values of RMSECV and the lower number of latent variables, were chosen for running outliers exclusion, validation phase, and determination of figures of merit.

Identification and exclusion of outliers
To implement a more user-friendly method for routine forensic applications, outlier identification was performed using the three diagnostics tests. Two of them (anomalous spectral residuals and Mahalanobis distance) are available in the Quant2/OPUS software, while the third test, used for detection of extreme values for the class value estimation, were calculated using a spreadsheet, as described below.
The test for samples with anomalous spectral residuals was performed by calculating an F value for each sample i (F cal,i ): 25 (3) where e is the residual vector of a sample i determined by equation 1, n is the number of spectra of X and j denote all samples different than i. The probability associated with F cal was determined considering a one-tailed test with 1 and n − 1 degrees of freedom. If a sample probability value is larger than 0.99, it is identified as an outlier. Note that the residual vectors used in equation 3 were determined directly by equation 1 for both training and validation samples without the application of cross-validation.
Outlier samples were characterized by the distance from the center of the distribution of the training set based on the Mahalanobis distance (D i ). The Mahalanobis distance limit (D limit ) is determined based on the distribution of all training samples. For this purpose, the mean value and the standard deviation of the Mahalanobis distance for the training samples are calculated. Assuming that there is a normal distribution, a one-sided D limit was defined, which covers a probability of 0.99. Samples presenting D i > D limit are also identified as outliers. 25 The third test considers the class values that exceed their confidence limits according to a t-Student distribution with 99% confidence. Details of this calculation can be found in our previous work. 11,26 After the development of an initial model, the three criteria described above were applied, the outliers were identified, excluded from the training set and the model was rebuilt. This procedure was performed up to two times for the training phase and once for the validation phase.

Validation
For assessement of the prediction capability of the method some figures of merit such as: false positive/ negative and efficiency rates were determined. Thereby, considering that five replicates are available for each sample, a sample was considered as a false positive/ negative error if more than half of the available replicates were wrongly predicted.
where FN and FP are the number of samples predicted as false negative and positive, respectively; TN and TP are the number of samples predicted as true negative and positive, respectively; FNR, FPR and EFR are the false negative, false positive, and efficiency rates, respectively.

Application in routine analysis
Nineteen recent seized samples were used for a blind test of the models A and B. Five spectra were acquired for each sample and these spectra were tested in the developed models.

Results and Discussion
The raw infrared spectra of all NBOMe and NBOH training samples used in model B are exhibited in Figure 1a. The baseline shift is a current and unwanted trend in spectroscopy, and it is also observed among the spectra, which justifies the application of a preprocessing method. Furthermore, Figure 1a also shows that there are 10 spectra from 2500 to 3000 cm −1 , which behave clearly distinct in relation to the other ones and thus, they were excluded from the training set. Posterior verification of these samples revealed contamination with an unidentified pink powder, which justifies their outlier behavior. Besides, the mean overlapped spectrum of each class is displayed in Figure 1b. Figure 1c highlights the main differences in each class by comparing their spectral signatures in the fingerprint region (wavenumber = 800 to 1800 cm −1 ). According to Figure 1c, a slightly intense band situated at 1250 cm −1 (blue line) is associated to asymmetric C−O−C vibrations of NBOMe compounds and with the substitution of OCH 3 by OH in the position 2 of the N-benzyl group, resulting in an NBOH molecule, it is possible to identify a decrease of this peak and the appearance of a double peak in NBOH spectrum (red line). 13 However, due to the small concentration of these drugs and the overlap with other compounds present in the samples, this spectral band was not visible in most seized samples. In fact, most of the spectral signals are probably due to the paper absorption. The high paper absorption, the signal overlapping and the low concentration of the SYD in the seized samples prevent the application of univariate approaches and justify the use of multivariate analysis.
In order to minimize the error rates, second derivative and mean centering were chosen as the most suitable preprocessing methods, according to the models proposed by Quant2/OPUS software automatic search. The second derivative was used to minimize additive and multiplicative shifts among the spectra and highlight slight differences in the signal. The spectral range for models A and B were also chosen based on the Quant2/OPUS automatic search. For model A, the spectral region was delimited from 2300 to 1280 cm −1 , while for model B the spectral region was composed of two intervals: from 4000 to 2640 cm −1 and from 2300 to 1620 cm −1 .
The first model, model A, was developed with 13 latent variables. In this model, the class value 1 was attributed to the SYD samples and the class value 0 to the drug-free samples. The heterogeneity of the dataset, comprising of samples from different seizures and years, containing different drug classes, in varying concentrations and variation in the paper aspect have contributed for the model to require a high number of latent variables. At this condition, the explained variance was higher than 95 and 85% for X and y, respectively. In the training phase, 32 spectra were classified as outliers after two exclusion rounds. These spectra refer to six different samples containing 25I-NBOH and LSD. Figure 2a shows Mahalanobis distance versus spectral residue probability after the exclusion of the training outliers. In the validation phase, 10 spectra were classified as outliers in an unique exclusion round. As presented in Figure 2 and Table 2, five spectra of the validation set were identified as outliers due to the presence of both Mahalanobis distance and spectral residue probability above their respective limits, while the other five presented estimated class values above the upper limit for the SYD class (class 1). Additionally, in the validation phase, two samples from the SYD class were wrongly classified as drug-free (false negative for this class), providing a false negative rate of 3.9%. Thus, model A resulted in an efficiency rate of 96.1%. Despite the good results and the high number of samples, the increase of the training set could probably decrease FNR and FPR due to the increase of the representativeness of models, which can be achieved with new seized samples and drug-free samples.
Model B was developed with 9 latent variables. In this case, value 1 was attributed to NBOH class and  value 0 to NBOMe class. Figure 3a shows Mahalanobis distance versus spectral residue probability for both training and validation sets, being that the outliers of the training set were excluded in model optimization. In this model, 31 spectra were classified as outliers in two exclusion rounds, and these spectra were related to five different samples containing 25I-NBOH. In most cases, for both models, all replicates of the sample were excluded as outliers, and not just a few replicates. An explanation for this behavior is that some samples are significantly different from the dataset regarding the spectra and physical aspects, such as paper thickness or pigment penetration in the blotter papers. Table 2 shows the number of outliers detected by each criterion. The sum of the samples identified in each criterion cannot be interpreted as the total number of outliers since the same spectrum can be classified as an outlier in more than one test. It can also be observed that some training spectra exceed the confidence limits for the class values even after two exclusion rounds. This phenomenon was observed in both models, A and B. The model optimization procedure was stopped after two exclusion rounds to prevent a high number of excluded samples, which could reduce the representativeness of the dataset. Figure 3b shows that, based on the estimated class values and the discrimination threshold, no false positive or false negative samples were observed in the training phase, leading to an efficiency rate of 100%. In the validation phase, only 1 spectrum belonging to the NBOMe class was identified as an outlier by the Mahalanobis distance and the limits for the class values. Figure 3b shows that seven spectra (an entire sample and just two replicates of another sample) presented class values lower than the discrimination threshold. Therefore, only one sample was classified as a false positive error, representing 3.6% of FPR for NBOMe class (or 3.6% of FNR for the NBOH class). Table 3 summarizes the estimated figures of merits for both models. Results show that the proposed method can be used as a screening method for SYD detection and identification in blotter papers in forensic laboratories.
After the validation of the method, the routine application was accomplished with the analysis of 19 seizures/samples for models A and B. This evaluation was performed with new seizures done by Civil Police of Federal District in Brazil. As shown in Table 4, it was obtained a correct identification of 94.7% of the samples with model A. One sample was not correctly classified, since model A considered it an outlier, while the correct classification should be at the drug-free class, according to GC-MS and LC-QTOF-MS results. The absence of false negative errors and the agreement between replicates suggest that the five spectra acquired per sample was enough to ensure a representative sampling of the seized blotter papers.
For model B, samples 18 and 19 were excluded from the test set once they were classified in the drugfree class or outlier, respectively, in model A. From the remaining 17 samples correct identification was achieved for 70.6% (12 samples). Two samples were deemed inconclusive (11.8%) and three were misclassified (17.6%), as shown in Table 5. As can be expected, most of the classification errors in model B (2 samples) were related to the presence of synthetic drugs not included in modeled classes. This behavior was also observed by Magalhães et al. 11 while using NIR and was reported by Oliveri et al. 27 as a limitation on PLS-DA models. It is expected that samples containing drugs not included in the modeled classes would present a Mahalanobis distance    would be 76.9, 15.4 and 7.7% for correct identification, inconclusive and misidentification results, respectively. Therefore, misclassification rate is lower than 10% for seizures containing drugs included in validation and training sets, which agrees with the results obtained during the validation phase.

Conclusions
The results show that the proposed method, developed with MIR and two-stage PLS-DA models, enables a fast and non-destructive analysis of blotter papers containing some of the common synthetic drugs seized by the Brazilian law enforcement. Most calculations for the PLS-DA models were performed in a friendly interface provided by the equipment's software, being that only one of the outlier tests and the discrimination threshold was necessary to be implemented in a simple spreadsheet.
The models were developed with a large and representative number of samples and seizures, which enhance the confidence in the results and the coverage of the method. Even so, in view of the continuous evolution of the drug scenario, where new drugs are added or substitute usual ones, the model (training dataset) should be periodically updated and confirmation analysis should be performed with reference methods, as required by the SWGDRUG recommendations.
The proposed method was developed by using a common spectroscopy technique available in most forensic laboratories. The simple protocol enables the training of an analyst in a matter of a few hours. Both models showed reliable results (misidentification errors < 10%), can be updated reflecting the ever-evolving drug scenario, and presented good results with real seized samples.