In silico and Experimental Assessments Applied to Preliminary Identification of New Illicit Substances Structures

This is an initial study using in silico infrared (IR) data as a standard database in a preliminary method for identifying new seized synthetic drugs. For this purpose, ten of the most common synthetic illicit substances seized on the Brazilian market were compared, and computational chemistry was used as a tool to build a theoretical standard database. Infrared data from a standard electronic library, experimental data from seized samples and data simulated by Density Functional Theory (DFT) were evaluated. The feasibility of the method was judged by the degree of correlation among the evaluated data. The results suggest that the computational methodology can be a viable way to analyze the structures of new synthetic drugs and obtain preliminary infrared profiles. The correlation data indicated a presumptive identification of the analyzed samples, and a preliminary identification of drugs in five classes was also possible. However, some synthetic cathinones and phenylethylamines were confused with amphetamines. Therefore, new studies must be developed to optimize the use of these data, because simulated IR spectra can be advantageous for evaluating the profile of substances that may be synthesized in the future.


Introduction
The search for methods that give the unequivocal identification of illicit substances is part of the routine of forensic laboratories. It is challenging work because new substances constantly emerge on the market. These new substances, with structural alterations but with effects similar to those of conventional drugs such as cocaine, marijuana, amphetamine and others, have been developed in clandestine laboratories and reach users more easily. 1,2 Synthetic drugs can be sold in their pure form or as mixtures with various other substances, excipients or not. Often several substances are associated: they are mixed with other narcotics or with substances added for various purposes that intensify the effects caused, and they may also contain adulterants that neutralize side effects. These mixtures make it difficult to identify the substances present, because some adulterants may have a similar chemical structure or a stronger detection signal, in addition to potentiating the damage caused to the health of users. [3][4][5][6][7] Examples of adulterants are lidocaine, a local anesthetic, which can be incorporated into mixtures due to its intoxicating effect; benzocaine, used for having an effect similar to that of cocaine; calcium carbonate, incorporated to neutralize stomach acid; hydroxyzine, incorporated for its sedative effect; diltiazem, which neutralizes the cardio-stimulating effect but is contraindicated in pregnancy; metoclopramide, which counteracts the side effects of nausea and vomiting but may induce extrapyramidal symptoms; oleamide, which behaves similarly to Cannabis; vitamin E, a masking agent; microcrystalline cellulose, a bulking agent; and vitamin E acetate, which is associated with lung lesions. 4,7
Synthetic substances can be classified by their chemical structure as synthetic cannabinoids, synthetic cathinones, phenylethylamines, piperazines, ketamine and phencyclidine-like substances, tryptamines, benzofurans, synthetic opioids (fentanyl analogs and compounds with a different chemical structure) and benzodiazepines. 8 Among the most seized globally are synthetic cannabinoids and cathinones, phenethylamines, and fentanyl and its derivatives. 1 Several characterization techniques and preliminary verification tests are already well established in the field of forensic chemistry for conventional drugs. The Scientific Working Group for the Analysis of Seized Drugs (SWGDRUG) presents a set of techniques that can be used for reliable and scientifically based identification. Techniques are grouped according to their highest potential level of selectivity. The group of techniques with the greatest potential for selectivity, Category A, comprises infrared spectroscopy, mass spectrometry, nuclear magnetic resonance spectroscopy, Raman spectroscopy and X-ray diffractometry. 9 Chemical identification of a drug is initially carried out by preliminary examination. In Brazil, Law No. 11,343/06 stipulates the preparation of the Preliminary Report. 10 The preliminary examination is composed of simpler and faster techniques to materialize the object of the crime and support the arrest notice in flagrante delicto; it is also a guide for the definitive examination. However, there is no specific presumptive methodology for new synthetic substances, which makes their rapid identification difficult. This can be attributed to the complexity of the structures and to the variety of matrices in which they are presented, such as powder, pills, blotter papers, hair, urine and blood, among others, which makes it difficult to develop a single reliable methodology. 1
The ease of structural modification in clandestine laboratories and the difficulty of accessing analytical standards of illicit substances also hinder drug identification. When analytical standards are available, they are very expensive and the delivery time is long because import authorization is required from drug control institutions. For example, reference standards of 25I-NBOMe and 25R-NBOH are difficult to access and are not marketed globally. 11 In response to the need for methods to identify new synthetic drugs, many studies propose the adaptation and combination of techniques for identifying these substances, which often appear as mixtures and as isomeric compounds, making their identification difficult by commonly used techniques such as gas chromatography-mass spectrometry (GC-MS) and Fourier transform infrared spectroscopy (FTIR). [12][13][14][15] Currently, there is interest in the use of in silico methods to assist in the identification of synthetic substances. 16,17 These methods offer advantages such as obtaining physicochemical, toxicological and spectral parameters of a large group of compounds in a short period and at low cost. Because they do not require standards, they can be used to predict the properties of compounds not yet synthesized, dispense with physical sampling and with authorization from government regulatory agencies, and contribute to the systematization of information. 18,19 This study proposes a computational methodology to evaluate hypothetical structural changes in the reference skeleton of known illicit drugs, without the need for experimental analysis, and then to create a standard database to match against the experimental data of new seized synthetic drugs. Ten compounds were chosen that belong to drug classes common in the illicit market worldwide, including Brazil.
From infrared spectral profiles obtained from an electronic literature library, we used the signals of these ten samples (see Figure 1) to characterize the base structures that are common in these illicit substances according to class, verifying how the different organic functions in the samples influence the characteristic signals of these base structures. Infrared spectral profiles of the same compounds were calculated in silico and compared with the profiles from the standard electronic library and with the experimental profiles of the corresponding seized samples, in order to verify the method and its correlation. Once the feasibility of this approach is verified, it may be used to obtain the spectral profile of substances that may be synthesized and commercialized in the future, facilitating forensic work.

Literature data (LTD)
The literature data were obtained from the Scientific Working Group for the Analysis of Seized Drugs (SWGDRUG) and RESPONSE electronic libraries and processed in Omnic software, version 7.3. The raw spectra were converted to csv format (range 4000-650 cm⁻¹, 2 cm⁻¹ resolution). SWGDRUG and RESPONSE are communities that compile free infrared libraries and make them available. The Drug Enforcement Administration's Special Testing and Research Laboratory generated the libraries using structurally confirmed reference materials. 20,21

Theoretical data (TD)
Theoretical data were obtained by Density Functional Theory (DFT) calculations. All structures in Figure 1 were analyzed in MarvinSketch 5.2, 22 and the predominant microstructures at neutral pH were selected because most of the seized samples are in salt form. From the neutral structures obtained, we used the Conformer-Rotamer Ensemble Sampling Tool (CREST) 23 to obtain the most stable conformation.
A pH of 7 in a methanol solvation medium was used for samples S7, S8 and S10. The CREST runs used the GFN2-xTB method at a temperature of 298.15 K, with an energy threshold of 6 kcal mol⁻¹. With these parameters, about 1000 conformations are generated for each sample, each geometry being optimized by the GFN2-xTB method. The conformations are then compared by bond lengths, angles, dihedrals and energies, and those within 6 kcal mol⁻¹ for the structure of the sample are retained. After energy ranking, the five lowest-energy conformations of each structure are used as initial geometries for the DFT calculations.
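The selection step described above (keep conformers inside the 6 kcal mol⁻¹ window, then take the five lowest in energy) can be sketched as follows. This is a minimal illustration, not CREST's own code: the function name `select_conformers`, its arguments and the toy energies are assumptions made for the example.

```python
# Hypothetical helper: the name, arguments and data layout are illustrative only.
def select_conformers(energies_kcal, window=6.0, n_keep=5):
    """Return indices of the n_keep lowest-energy conformers that lie within
    `window` kcal/mol of the lowest-energy conformer, ordered by energy."""
    e_min = min(energies_kcal)
    in_window = [
        (e - e_min, i) for i, e in enumerate(energies_kcal) if e - e_min <= window
    ]
    in_window.sort()  # sort by relative energy, then by index
    return [i for _, i in in_window[:n_keep]]


# Toy relative energies (kcal/mol) for eight conformers; not data from the study
energies = [2.1, 0.0, 7.5, 0.4, 5.9, 1.3, 3.3, 6.2]
selected = select_conformers(energies)
print(selected)
```

The returned indices would then identify the geometries passed on as starting points for the DFT optimizations.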
Starting from the structures obtained in the pH and conformational analyses, geometry optimizations and frequency calculations were performed using Gaussian 09. 24 These calculations used DFT 25,26 at the B3LYP/6-311+G(d,p) level of theory. 27,28 This model is commonly used in the literature and gives satisfactory results, in addition to being compatible with the limited computational capacity of the cluster used. 19,29,30 The optimized structures are shown in the Supplementary Information (SI) section, in Figure S1. The calculated frequencies were scaled by a factor of 0.967 31 to obtain the infrared profile, which was then compared with the experimental values and spectra. For sample S8, the optimization used the LANL2DZ basis set for the iodine atom 32 and 6-311+G(d,p) for the other atoms of the molecule.
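Applying the scale factor is a single multiplication of each calculated harmonic frequency. The sketch below assumes the frequencies have already been read from the calculation output into a list; the function name and the input values are illustrative and are not taken from the study.

```python
SCALE_FACTOR = 0.967  # scale factor quoted in the text


def scale_frequencies(harmonic_cm1, factor=SCALE_FACTOR):
    """Multiply harmonic wavenumbers (cm^-1) by an empirical scale factor."""
    return [nu * factor for nu in harmonic_cm1]


# Illustrative harmonic wavenumbers (cm^-1), not values from the paper
harmonic = [1000.0, 1700.0, 3000.0]
scaled = scale_frequencies(harmonic)
print([round(nu, 1) for nu in scaled])
```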

Data analysis
Qualitative and quantitative comparisons of the frequencies and infrared signals of the molecules were made among the experimental data (ED), literature data (LTD) and theoretical data (TD). Qualitative evaluations used specific bands that characterize the main functionalities of a molecular skeleton (base structure), which were compared with the bands obtained by the computational methodology. The presence of the characteristic bands was verified in ED, LTD and TD, since even in the experimental data these bands can be difficult to identify because seized samples can be mixtures.

Statistical data analysis
The theoretical spectra were imported into SYNSPEC software 33 to convert the "stick" spectra produced by computational methods into synthetic spectra with realistic linewidths. Gaussian 24 lineshapes with a linewidth of 20 cm⁻¹ were employed over the 4000-650 cm⁻¹ wavenumber range, with a point spacing of 1 cm⁻¹. The resolution of the LTD spectra was enhanced by Fourier back-transform interpolation, using the package 'spectral', version 2.0.
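The conversion of a stick spectrum into a broadened profile can be illustrated with a few lines of standard-library Python. This is a sketch of the general technique, not the SYNSPEC implementation; it assumes that the 20 cm⁻¹ linewidth is a full width at half maximum (FWHM), and the function name and the two sticks are made up for the example.

```python
import math


def broaden(sticks, fwhm=20.0, start=4000, stop=650, step=1):
    """Sum one Gaussian per (wavenumber, intensity) stick on a regular grid.

    Assumption: `fwhm` is the full width at half maximum in cm^-1.
    """
    sigma = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    grid = list(range(start, stop - 1, -step))  # 4000, 3999, ..., 650
    spectrum = [
        sum(inten * math.exp(-0.5 * ((x - nu) / sigma) ** 2) for nu, inten in sticks)
        for x in grid
    ]
    return grid, spectrum


# Two illustrative sticks (wavenumber in cm^-1, relative intensity)
grid, spectrum = broaden([(1680.0, 1.0), (750.0, 0.6)])
print(len(grid), round(spectrum[grid.index(1680)], 3))
```

With this definition each band falls to half its peak height 10 cm⁻¹ away from the stick position.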
All ED, LTD and TD spectra were imported into RStudio, version 1.4.1717, 34 for statistical analysis. A Savitzky-Golay smoothing filter, with a span of 11 cm⁻¹ and a fourth-order polynomial, was applied for baseline treatment before the determination of Pearson's correlations (R²). For the determination of R² between ED and LTD and between ED and TD, the first derivative was used; for R² between TD and LTD only smoothing was applied, because in these spectra the samples are in pure form. The calculation employed the RamanMP package. 35
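The preprocessing and correlation steps can be sketched in Python with SciPy and NumPy. This is an analogue for illustration and not the R workflow used in the study. It assumes a 1 cm⁻¹ point spacing, so that the 11 cm⁻¹ span corresponds to an 11-point window, and the synthetic spectra are toy data. The function returns Pearson's r; squaring it gives R².

```python
import numpy as np
from scipy.signal import savgol_filter


def preprocess(intensity, derivative=True):
    """Savitzky-Golay filter: 11-point window, 4th-order polynomial.

    Assumes 1 cm^-1 point spacing, so an 11 cm^-1 span equals 11 points.
    """
    return savgol_filter(intensity, window_length=11, polyorder=4,
                         deriv=1 if derivative else 0)


def pearson_r(a, b):
    """Pearson correlation coefficient between two equally sampled spectra."""
    return float(np.corrcoef(a, b)[0, 1])


def band(x, center, width):
    """Synthetic Gaussian band, used only to build toy spectra."""
    return np.exp(-0.5 * ((x - center) / width) ** 2)


x = np.arange(650.0, 4001.0)                    # wavenumber axis, 1 cm^-1 spacing
reference = band(x, 1680, 10) + 0.5 * band(x, 750, 10)
seized = reference + 0.3 * band(x, 1100, 40)    # broad extra band mimics an excipient
unrelated = band(x, 2900, 10)

r_same = pearson_r(preprocess(seized), preprocess(reference))
r_other = pearson_r(preprocess(seized), preprocess(unrelated))
print(f"same compound: r = {r_same:.3f}, unrelated: r = {r_other:.3f}")
```

In this toy case the first derivative suppresses the broad extra band relative to the sharp bands, which is one reason derivative spectra are useful when comparing impure samples with pure references.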

Results and Discussion
In the context of forensic analysis, the samples of Figure 1, object of this study, were evaluated considering 36 Therefore, to evaluate methods that may be developed and applied to digitize and optimize the detection of possible illicit substances, DFT calculations were used to compare the experimental IR bands with the simulated IR bands for these similar structures. This line of evaluation, comparing IR data with computational methodologies, has already been reported in the literature, 30,37,38 with the aim of characterizing reactive sites. Some samples have similar structures, and some characteristic points of these structures are identified in Table 2. Thus, it is possible to verify the effect of different substituents on these similar structures (structural skeletons). The bands identified in Table 2 were used to evaluate these effects on the wavenumbers given in Tables 3-6, which show LTD, ED and TD. To support the evaluation of the base structure of illicit substances, the tables were used to verify how some characteristic bands of the skeletal structures of different classes are influenced by neighboring groups. Because the unequivocal identification of these substances requires the complete structure, it is difficult to identify these compounds, since they have similar structures and could cause the same effects. The general structures of the synthetic cannabinoid, synthetic cathinone and phenylethylamine classes are shown in Figure 2.
The S1 sample is a synthetic cannabinoid, with B, C and D bands (see Table 3). It shows bands between 1600-1585 cm⁻¹ and 1500-1400 cm⁻¹ from C=C stretching in aromatic rings; strong bands that can be attributed to C-H bending at 746.4 cm⁻¹, related to the ortho-disubstituted ring, and at 824 cm⁻¹, corresponding to two adjacent hydrogen atoms in substituted naphthalene derivatives; 40 an intense band near 1515 cm⁻¹ attributed to the carbonyl; and a band near 1375 cm⁻¹ for the aromatic amine. Sample S1 has aromatic substituents on both sides of the carbonyl, which indicates that this type of substitution, aromatic groups vicinal to C=O, shifts the carbonyl signal to lower values. The resonance effect in S1 results in absorption at a lower wavenumber. The effect is twofold: it is caused by the conjugation of the carbonyl with the C=C bond and with the lone-pair electrons of the nitrogen atom, which reduces the double-bond character of the C=O group. In S1, the carbonyl is conjugated with the indole and naphthalene groups.
Synthetic cathinones present A, B, C, D and E bands (see Table 4). Band A, of aryl ethers, was observed only for S2, S3 and S5; these samples have the 1,3-benzodioxole group in common. Bands B, C, D and E are present in all samples of synthetic cathinones. S2, S3 and S5 present C bands between 882-735, 888-802 and 870-751 cm⁻¹, respectively, which can be attributed to C-H bending of the trisubstituted ring. S4 presented band C at 831 cm⁻¹, attributed to the out-of-plane C-H bending of the para-disubstituted ring, and S6 presented band C at 750 cm⁻¹, attributed to the out-of-plane C-H bending of the monosubstituted ring. 41 The difference between the values of the D band is related to the different chemical environments of the samples. The samples with the same value, S4 and S6, have a toluene group and a benzene ring in the aromatic part, respectively. In S6, the secondary amino group bears an ethyl group and the carbon beta to the aromatic ring bears a butane group. The difference is within the range of 1600 cm⁻¹ but using the full range. Thus, some of the changes related to the functional group and the displacement caused by it are: samples S2 and S5 have the same carbonyl side, with

Table 2. Band identification to evaluate the structures of the molecules in Figure 1

This difference is represented by the D band, with a very discrete difference in signal: 1680 cm⁻¹ for S2 and 1670 cm⁻¹ for S5. This indicates that the methyl position causes little variation in D. The main difference between S2 and S3 is a secondary versus a tertiary amine; both are in the beta position relative to the carbonyl. The shift is small in general, but within the analyzed range it is considerable: 1680 cm⁻¹ for S2 and 1668 cm⁻¹ for S3. In this case, it is difficult to infer which substituent is responsible, since the previous comparison already showed the influence of the alkane groups in the position between C=O and the amine.
However, S3 has a tertiary amine, and two of the substituents on the nitrogen are methyl groups. In S2, the secondary amine has an ethyl group. Given the small variation in the observed signals, it is more plausible that the influence comes from the alkane group between C=O and the amine. This is based on the previous comparison of S2 and S5.
The amphetamine sample, S7, has a structure similar to that of the synthetic cathinones. It shows bands A, B, C and E (see Table 4). S7 has the 1,3-benzodioxole group in common with S2, S3 and S5 but does not have the C=O group conjugated to the aromatic ring. This modification of the functional skeleton shifted the aryl ether absorption to lower wavenumbers. S7 also presents bands between 895-797 cm⁻¹, which can be attributed to C-H bending of the trisubstituted ring.
Phenylethylamine samples present A, B, C, E and F bands (see Table 5). Samples S8 and S9 have similar structures, showing symmetric and asymmetric C-O-C stretches of aryl alkyl ethers between 1075-1020 cm⁻¹ and between 1275 and 1200 cm⁻¹, besides bands near 750 and 864 cm⁻¹, which can be attributed to the out-of-plane C-H bending of ortho-disubstituted and tetrasubstituted rings. These compounds presented bands attributable to C-X (aromatic, X = Cl, I) in the benzene rings. In sample S8, the presence of iodine was characterized by the band at 1210 cm⁻¹, which corresponds to C-I axial deformation, in agreement with the literature. 41 On the other hand, sample S9 presents a band at 1059 cm⁻¹, which can be related to the C-Cl stretching of aryl chloride. 40,41 The band between 1600-1585 cm⁻¹ associated with C=C stretching in aromatic rings was not observed in the ED of S8, owing to noise in the region.
The psychedelic drug sample shows bands B, C, D and F (see Table 6). S10 shows bands between 1500-1400 cm⁻¹ from C=C stretching in aromatic rings; between 730-685 cm⁻¹, which can be attributed to the out-of-plane C-H bending of the 1,2,3-trisubstituted ring; near 1505 cm⁻¹, due to C-N stretching; and the C=O band of the tertiary amide. The difference observed in the absorption of the C=O group can be attributed to the formation of a hydrogen bond with the solvent used, methanol, which influences the position of the band, whereas the theoretical data are free of this interference.

Table 3. Bands defined in Table 2

Table 4. Bands defined in Table 2
Tables 3-6 show the comparative relationships of the wavenumber values for the bands obtained by computational, library and experimental means. The computational methodology, based on DFT, gives data similar to the LTD and ED. Figure 3 shows the LTD, TD and ED spectra. The spectra showed similar spectral profiles, and the differences observed are due, as expected, to the way in which the spectra were acquired. Those acquired experimentally present overlapping bands, noise, ATR crystal bands and characteristic H₂O and CO₂ bands. The computational spectra did not show these interferences.
Correlations between the sample data are presented in Tables 7, 8 and 9. The R² values are the main indicator of how close the data are. Table 7 summarizes the correlations for the LTD vs. ED analyses. In these analyses, the data for the same sample correlate better than the data for different samples. That is, it was possible to identify and discriminate the samples within the classes and individually. This result was expected, since the spectra available in the electronic libraries were acquired in the same way as those of the seized samples studied. However, the results differ somewhat because the seized samples contain excipients, whereas the library samples are purified. Therefore, the correlation factor moves away from 1 according to the degree of impurity of the seized samples. Table 8 shows the correlation between LTD and TD. The correlation values were lower than for LTD vs. ED, and only 50% of the samples showed the best correlation for data from the same sample. For example, the LTD of S3 correlated best with the TD of S7, so that sample S3 was misidentified as MDMA. However, identification by class was possible. The samples can be grouped into classes: synthetic cannabinoids (SC), synthetic cathinones (CT), amphetamines (A), phenylethylamines (Ph) and psychedelic drugs (PS). The only exceptions observed were samples S3, S8 and S9, which in Table 8 were classified as amphetamines rather than as a synthetic cathinone (S3) and phenylethylamines (S8 and S9). In the correlation between ED and TD, shown in Table 9, only three class identifications were correct and only two samples were correctly identified.

Table 5. Bands defined in Table 2

Table 6. Bands defined in Table 2
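The identification logic described here, assigning each query spectrum to the reference with the highest correlation and then checking whether the sample or at least its class matches, can be sketched as below. The R² values are invented for illustration and are not those of Tables 7-9; only the class labels of S2, S3 and S7 and the S3-to-S7 mismatch follow the text. The function and variable names are assumptions.

```python
# Class labels follow the text (CT = synthetic cathinone, A = amphetamine).
SAMPLE_CLASS = {"S2": "CT", "S3": "CT", "S7": "A"}


def best_matches(r2_table):
    """Map each query spectrum to the reference spectrum with the highest R2."""
    return {query: max(row, key=row.get) for query, row in r2_table.items()}


# Made-up R2 values for illustration only (rows: LTD queries, columns: TD references)
r2_table = {
    "S2": {"S2": 0.61, "S3": 0.48, "S7": 0.40},
    "S3": {"S2": 0.45, "S3": 0.50, "S7": 0.58},
    "S7": {"S2": 0.39, "S3": 0.44, "S7": 0.66},
}

matches = best_matches(r2_table)
for query, hit in matches.items():
    same_sample = query == hit
    same_class = SAMPLE_CLASS[query] == SAMPLE_CLASS[hit]
    print(query, "->", hit, "| sample match:", same_sample, "| class match:", same_class)
```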
Some synthetic cathinones and phenylethylamines were confused with amphetamines, as observed for S3, S8 and S9 in the LTD vs. TD correlation. This confusion of cathinones with amphetamines is expected, since they are structural analogues, and it has been reported in other studies. 16,19 Phenylethylamines share a common functional skeleton with amphetamines, which leads to a false preliminary identification. These results indicate that further studies are needed to find a methodology that improves the correlation between ED and TD, either by pretreating the seized samples before the test or by chemometric methods. Some factors may contribute to false positive results, such as interferences and noise. Synthetic drugs can be marketed as mixtures, which makes it difficult to identify the seized illicit substances, since adulterants present in the samples may have a similar structure or a stronger detection signal. [3][4][5][6][7] Differences between experimental and theoretical spectra can also be related to how they were obtained: ED were obtained in ATR, while TD were obtained in transmission. The main difference between ATR and transmission spectra is the relative intensity of the bands. Generally, ATR spectra show enhanced band intensities at longer wavelengths. In ATR measurements, the equivalent of the optical path is the effective depth of penetration of the radiation into the sample, which is proportional to the wavelength. The ATR correction scales the intensities by a factor inversely proportional to the wavelength, making them more similar to those in transmission spectra.
ATR correction can also compensate for imperfect contact between the sample and the ATR crystal. If the surface of the sample is not optically flat, there will be an air gap between the crystal and the sample in some places. The gap represents a greater proportion of the depth of penetration at shorter wavelengths, so its effect on band intensities is greater there. However, the reduction in intensity caused by the gap is not simply proportional to the wavelength. The Contact function attempts to correct for this effect. As the air gap is not uniform, this second correction term should be considered an empirical fit.
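The wavelength-dependent part of the ATR correction amounts to multiplying each intensity by a factor proportional to the wavenumber, since a factor inversely proportional to the wavelength is proportional to the wavenumber. The sketch below shows only this first-order scaling. It is not the algorithm of any particular instrument software, it ignores refractive indices, angle of incidence and the contact term, and the reference wavenumber used for normalization is a hypothetical parameter.

```python
def atr_correct(wavenumbers, absorbances, ref_wavenumber=1000.0):
    """Scale ATR absorbances by nu / ref_wavenumber (first-order correction).

    `ref_wavenumber` is an arbitrary normalization point chosen for this sketch.
    """
    return [a * (nu / ref_wavenumber) for nu, a in zip(wavenumbers, absorbances)]


# Three bands of equal ATR intensity at different wavenumbers (toy values)
corrected = atr_correct([2000.0, 1000.0, 500.0], [0.2, 0.2, 0.2])
print([round(a, 3) for a in corrected])
```

After correction the low-wavenumber (long-wavelength) band is reduced relative to the high-wavenumber band, which is the direction of the adjustment described in the text.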
Transmission spectra follow a multiplicative scale. If a film of thickness n of a material has a transmission of 90% at a given position, a film of thickness 2n will not have a transmission of 80%, but 90% of 90%, that is, 81%. On a transmittance scale, the noise level is independent of the energy level, so the influence of noise on the spectrum is greater at low transmittance than at high transmittance. Thus, transmission spectra tend to differ from one reading to the next because of differences in film homogeneity, in addition to suffering interference from the environment, such as CO₂ absorption.
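The multiplicative behavior in the 90% and 81% example follows from the Beer-Lambert law. A short worked form is given below; the symbols are assumptions of this sketch rather than notation from the study (T is transmittance, A is absorbance, d is film thickness, and k is a lumped absorption coefficient that includes concentration).

```latex
T(d) = 10^{-k d}, \qquad
T(2n) = 10^{-2 k n} = \bigl[T(n)\bigr]^{2} = 0.90^{2} = 0.81, \qquad
A(d) = -\log_{10} T(d) = k d \;\Rightarrow\; A(2n) = 2\,A(n).
```

Transmittance multiplies with thickness while absorbance adds, which is one reason an absorbance scale is convenient when band intensities are compared.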
Another factor that contributed to the differences between experimental and theoretical data is the way in which the TD were calculated. The experimental spectra correspond to compounds in salt form, whereas the theoretical spectra correspond to structures that are not in salt form (Figure S1 presents the structures considered for the calculation of TD). Ammonium salts show a broad and intense band due to axial deformation of the N-H bond between 3300-3030 cm⁻¹ and a combination band in the region 2000-1709 cm⁻¹. 42 False positives are expected in a preliminary method. For example, the Scott test, a colorimetric test for cocaine identification applied all over the world, has produced many reported false positives. 43 Therefore, this approach may be feasible for the preliminary identification of new synthetic drugs.

Conclusions
The spectral data showed how some functional groups affect the structural framework, shifting the wavenumbers to higher or lower values depending on the substitution in the molecule. Thus, it was possible to observe the differences among samples that share part of their molecular structure (skeleton). This study also indicated that the theoretical data obtained can be used to assist in the preliminary identification of drug samples. The LTD vs. ED correlations gave the correct identification for all samples, and in the LTD vs. TD correlations 50% of the samples showed the highest correlation for the same sample, indicating a presumptive identification of the analyzed sample. This suggests that these data can be used as a standard. A preliminary identification of the drug classes SC, CT, A, Ph and PS was also possible. The LTD vs. TD correlation showed only three errors, between structurally analogous compounds. However, ED vs. TD showed only three correct class identifications, and only two samples were correctly identified. Most of the errors were misidentifications of cathinones as amphetamines, because they are structurally analogous compounds. Phenylethylamines were also confused with amphetamines because they have common skeletons. Therefore, new studies must be developed to optimize the use of these data, because simulated IR spectra can be advantageous for evaluating the profile of substances that may be synthesized in the future. The use of TD as a standard is an advantage because access to standard materials is often unfeasible during the development of studies, owing to bureaucracy and high market prices.
Thus, the computational technique can be useful to create a database for new synthetic drugs in forensic laboratories, since theoretical data can be used for comparison with experimental data from seized samples helping to identify compounds preliminarily.

Supplementary Information
Supplementary information (experimental and characterization details) is available free of charge at http://jbcs.sbq.org.br as a PDF file.