UvA-DARE (Digital Academic Repository) On-site illicit-drug detection with an integrated near-infrared spectral sensor

drugs-of-abuse detection using a 1.8 × 2.2 mm² multipixel near-infrared (NIR) spectral sensor that potentially can be integrated into a smartphone. This integrated sensor, based on an InGaAs-on-silicon technology, exploits an array of resonant-cavity enhanced photodetectors without any moving parts. A 100% correct classification of 11 common illicit drugs, pharmaceuticals and adulterants was achieved by chemometric modelling of the response of 15 wavelength-specific pixels. The performance on actual forensic casework was investigated on 246 cocaine-suspected powders and 39 MDMA-suspected ecstasy tablets yielding an over 90% correct classification in both cases. These findings show that presumptive drug testing by miniaturized spectral sensors is a promising development ultimately paving the way for a fully integrated drug-sensor in mobile communication devices used by law enforcement.


Introduction
Drug abuse and drug production are on the rise, reaching an estimated 269 million drug users globally in 2018, a 30% increase compared to the 2009 figures. In addition, the estimated global manufacture of cocaine also reached an all-time high of 1723 metric tons in 2020 [1,2]. Consequently, a rising trend in seizures of cocaine, methamphetamine, MDMA and opioids is reported by the United Nations Office on Drugs and Crime (UNODC) [1]. This puts pressure on both police and border security officers, who are confronted with an increased workload. To effectively steer the forensic process, information on the identity of encountered suspected materials is needed rapidly and ideally directly on-site. For many years, investigation officers have used chemical spot tests for presumptive drug testing. In these so-called colorimetric tests, a color can be observed after reaction with a specific drug, such as a blue color for the cobalt(II) thiocyanate complex with cocaine in the Scott test. Unfortunately, color tests are only available for a small range of drugs, are prone to false positive reactions, require manual handling of the suspect material and require single-use consumables and chemicals, thus impacting the environment [3]. Gas chromatography-mass spectrometry (GC-MS) currently is the default technique for unambiguous identification of common drugs-of-abuse in forensic samples [4,5]. However, this reliable technique is expensive, requires experienced operators, is not mobile due to the required stable vacuum systems and is thus intended for use in dedicated laboratory facilities rather than on-site testing.
In recent years, various portable platforms for on-site forensic drug detection have been developed and evaluated. Handheld Raman devices are capable of detecting a wide range of substances, both pure and in mixtures, while being non-destructive and able to scan directly through packaging materials [6][7][8][9]. Forensic applications of on-site drug testing using portable Raman devices are reported for cocaine [10][11][12][13], cathinones [13,14] and 'legal high' substances [15]. A drawback of Raman devices is their relatively high unit price, which limits their use to dedicated forensic teams and illicit drug law enforcement experts. Portable mid-infrared (MIR) instruments are another promising category for on-site forensic drug detection due to the higher spectral specificity in the MIR wavelength range compared to near-IR (NIR) [16,17]. Recent forensic applications in MIR drug detection using portable devices include fentanyl detection in community drug testing [18] and cocaine detection [19]. Further miniaturization of MIR instruments by micro-electro-mechanical systems (MEMS) technology is a promising development towards pocket-size instrumentation [20,21]. However, the attenuated total reflection (ATR) sampling technique needed for MIR requires the compression of a small amount of sample on the ATR crystal. Touching and sampling of the material by the operators is thus required [17,21]. Electrochemical sensors can also detect a broad range of substances and do not face the limitations of spectroscopic techniques such as interferences for colored samples. They do however require single-use electrodes for analysis [22][23][24].
NIR-based spectrometers are very promising for fast and reliable on-scene drug detection. Coppey et al. introduced a drug-detection platform using a handheld NIR sensor operating in the 950-1650 nm range and demonstrated applicability for qualitative and quantitative cocaine and heroin analysis [25] as well as the measurement of the THC level in cannabis [26]. Risolutti et al. used the same MicroNIR hardware in a toxicological analytical platform for cocaine detection in saliva [27]. Hespanhol et al. demonstrated in-situ cocaine profiling based on NIR spectra of a 900-1700 nm NanoNIR spectrometer [28]. The Dutch Police very recently collaborated in a study on illicit-drug identification using a 1300-2600 nm portable NIR spectrometer by applying a calibration-friendly data-analysis approach for mixture detection [29].
Numerous attempts were undertaken to miniaturize sensors for on-scene forensic detection [6], such as paper-based analytical devices [30] and wearable electrochemical sensors including fingertip sensors, rings and gloves ('robotic skin') [22,31,32,33]. The availability of convenient small-size, broadly applicable and affordable sensors may leverage street-level drug testing by generic police officers. Since all police and investigative staff are equipped with a smartphone, integration of a spectroscopic sensor in the mobile phone may also aid rapid drug detection. However, this requires extensive miniaturization of a relatively low-cost sensor. The integration of a spectral sensor in a mobile communication platform with cloud data storage allows for a local measurement with central data storage, processing and modelling. This creates opportunities to develop rapid, miniaturized portable technology that combines ease-of-use with excellent selectivity, sensitivity and versatility. Various groups have presented the development of visible or short wave (SW)-NIR spectral sensors aimed for smartphone integration. Rissanen et al. demonstrated a hyperspectral imager operating in the 450-550 nm range [34]. This device uses a MEMS Fabry-Pérot interferometer. Levin et al. reported on a ~600-800 nm silicon hyperspectral Fabry-Pérot filter [35]. In 2020, a collaboration of Dutch governmental laboratories reported SW-NIR-based cocaine detection in seized casework using a 740-1050 nm sensor in combination with an advanced multi-stage machine learning model [36]. However, such sensors, which use silicon-based detectors, have a maximum operating wavelength of 1050 nm [37,38]. Spectral peaks in the SW-NIR (~700-1050 nm) primarily originate from the 3rd to 4th overtones of C-H, O-H and N-H groups, which have a collapsed band structure and contain limited information [38]. In comparison, spectral sensors operating in the NIR (800-2500 nm) region are more reliable and provide higher selectivity, sensitivity and penetration depth [38].
The common strategy to extend the NIR wavelength range up to ~1700 nm is by using Indium Gallium Arsenide (InGaAs) detectors.
Existing portable NIR sensor systems are mostly based on miniaturized dispersive optics typically using scaled-down diffraction gratings [37,39,40] and Fabry-Pérot filter-based [41][42][43] or Fourier transform-based [44,45] systems with mirror displacement using MEMS. In all these cases, the filter and detector components of the sensor are not integrated into one structure, thus resulting in high complexity and limiting the extent of miniaturization. Moreover, MEMS approaches are susceptible to mechanical vibrations and shocks common in on-site sensing scenarios.
Very recently, a novel fully integrated multi-pixel NIR sensor was introduced [46,47], which uses an array of 16 resonant cavity-enhanced photodetectors operating in the 850-1700 nm range. This sensor, with a 1.8 × 2.2 mm² footprint, contains no moving parts and can be produced in large volumes at low cost due to the wafer-scale fabrication process [47]. In this work, a first application of this multi-pixel NIR sensor is presented for the identification of drugs of abuse. As a proof-of-principle, the classification of eleven substances commonly encountered in police seizures is presented, showing 100% accuracy. In addition, real-world performance was assessed by analysis of 246 cocaine-suspected forensic casework samples and MDMA-suspected crushed ecstasy tablets, seized by the Amsterdam Police in 2020 and 2021.

Forensic casework material
Three sets of forensic casework material were used in this study:
Set A: The purpose of this set is to investigate sensor selectivity for various commonly encountered drugs-of-abuse substances as well as several licit pharmaceuticals that frequently occur in forensic casework. All samples originated from high-purity casework samples seized by the Dutch Police. Substances included in this set are: amphetamine sulphate, cocaine base, cocaine HCl, caffeine, brown heroin sample containing heroin base, ketamine, levamisole, lidocaine, 3,4-methylenedioxymethamphetamine HCl (MDMA), methamphetamine HCl and paracetamol.
Set B: The aim of this set is to explore the sensor performance for cocaine detection in a wide variety of 246 white, off-white or cream-colored powders seized in a drug-suspected setting. The composition of this set is as follows: 155x cocaine (either in HCl or base form, at various concentrations and degrees of adulteration); 18x amphetamine; 18x ketamine; 13x MDMA; and 42 other substances (uncommon illicit drugs, designer drugs and licit pharmaceuticals or household chemicals). The exact composition of the samples is published elsewhere: Table S6 in Ref. [29] and Table S2 in Ref. [36], completed with casework cocaine HCl samples from large volume seizures.

Instrument and settings
Scans were recorded on an integrated semiconductor spectral sensor based on a multi-pixel array of resonant-cavity-enhanced photodetectors. Each of the 16 pixels includes a Fabry-Pérot optical cavity with a different cavity length, generating a wavelength-selective response within the 850-1700 nm range [47]. The photodetector array was fabricated on an InGaAs/InP membrane bonded on a silicon wafer, using the fabrication method described by Van Klinken et al. [48] and Hakkel et al. [49], resulting in photoresponse peaks with a linewidth varying from 25 nm to 90 nm; its total size was 1.8 × 2.2 mm². For this study, the sensor chip combined with a halogen lamp light source, read-out electronics and USB-adapter were embedded in a reflectance 8 × 8 cm² casing, forming the "SpectraPod™" sensing module (Fig. 1). The module was operated via an in-house developed application created in MATLAB version 9.7 (R2019b) and executed via MATLAB Runtime for R2019b. This application provided instrument control settings (lamp power, integration time), acquisition settings (number of scan averages, recording of a dark reference spectrum) and the option to save and export raw spectral data.
For each measurement, the average of three acquisition cycles was obtained, resulting in 16 photocurrent values via the analog-to-digital converter (ADC). The integration time per pixel was ~0.5 s. One of the 16 channels was damaged due to faults during manual device assembly, as the sensor module used in this study was one of the first made. The assembly process is undergoing improvements to minimize risks of damage, and faulty devices will be rejected during future production. The damaged pixel was excluded in this study, and the remaining 15 photocurrent values were used in subsequent analyses. Since each pixel has a broad spectral response of multiple peaks in the NIR range [48], the entire range was still covered even though a single pixel was not working.
Individual samples were measured by placing a glass vial containing sample material directly on top of the sensor window. Each vial was measured 9-fold: the vial was removed from and placed back on the scanner for a set of three measurements, to compensate for variations introduced by sample placement, and this set of three was repeated three times after shaking the vial to redistribute the particles. For the set A samples, the 9-fold replicate measurement was repeated on 6 different days, leading to 594 scans in total. Individual measurements and data storage took ~20 s. Dark scans were recorded each day prior to analysis. Spectralon references were recorded before, after and every 2 h of analysis.
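The scan totals follow directly from the replicate scheme. The short sketch below only restates that arithmetic; the variable names are illustrative and not taken from the authors' software.

```python
# Scan-count bookkeeping for the replicate scheme described above.
PLACEMENTS_PER_ROUND = 3   # vial removed and replaced for a set of three scans
SHAKE_ROUNDS = 3           # the set of three is repeated after shaking the vial
replicates_per_vial = PLACEMENTS_PER_ROUND * SHAKE_ROUNDS

SET_A_SUBSTANCES = 11      # one vial per substance in Set A
SET_A_DAYS = 6             # 9-fold measurement repeated on six days
set_a_scans = SET_A_SUBSTANCES * replicates_per_vial * SET_A_DAYS

SET_B_SAMPLES = 246        # each casework powder measured 9-fold once
set_b_scans = SET_B_SAMPLES * replicates_per_vial

print(replicates_per_vial, set_a_scans, set_b_scans)
```

The Set A total reproduces the 594 scans stated in the text, and the Set B total matches the 2214 scans quoted in the data-analysis section.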

Data analysis
Firstly, the measurements of each sample were corrected for the dark current by subtraction of the dark current ADC value. Then, the dark-corrected sample measurements were converted to absorbance using reference measurements of the Spectralon: $A = \log_{10}(I_r / I_s)$, where $I_r$ and $I_s$ represent the dark-corrected ADC values for the reference and drug samples, respectively. Outliers were identified by plotting the Q residuals and Hotelling's $T^2$ of the samples' absorbance. Four out of 594 scans in set A, 10 out of 2214 scans in set B and three out of 640 scans from set C were identified as outliers and excluded. Subsequently, the absorbance values of the triplicate measurements taken after shaking the vial to redistribute the sample particles were averaged (i.e. each vial was measured 9-fold, and resulted in three sets of triplicate-averaged absorbance values).
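The dark correction, absorbance conversion and triplicate averaging can be sketched as follows. This is a minimal illustration in plain Python, not the authors' implementation (which used NumPy); the function names and the ADC counts are hypothetical.

```python
import math

def absorbance(sample_adc, reference_adc, dark_adc):
    """Per-pixel absorbance log10(I_r / I_s) from dark-corrected ADC values."""
    values = []
    for s, r, d in zip(sample_adc, reference_adc, dark_adc):
        i_s = s - d   # dark-corrected sample signal
        i_r = r - d   # dark-corrected Spectralon reference signal
        values.append(math.log10(i_r / i_s))
    return values

def average_scans(scans):
    """Element-wise mean of several absorbance vectors (e.g. one triplicate)."""
    n = len(scans)
    return [sum(pixel) / n for pixel in zip(*scans)]

# Hypothetical ADC counts for three pixels (not measured data).
dark = [100.0, 100.0, 100.0]
reference = [1100.0, 10100.0, 600.0]
sample = [200.0, 200.0, 600.0]
a = absorbance(sample, reference, dark)

# Averaging three hypothetical two-pixel absorbance vectors.
avg = average_scans([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
```

A pixel whose sample signal equals the reference signal gives zero absorbance, and every tenfold reduction in reflected signal adds one absorbance unit.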
Data were divided into training and test sets in the following way: Set A: all measurements from day 1, 2 and 4 were included in the training set, whereas those from day 3, 5 and 6 were included in the test set. Set B: A total of 53 out of the 246 samples were selected for the test set, such that this group consisted of 30x cocaine, 6x ketamine, 6x amphetamine, 6x MDMA and 5 samples identified as 'others'. For the measurements in each set, three preprocessing methods were applied to the triplicate-averaged absorbance values, and compared: mean-centering, sum normalization and standard normal variate (SNV). Six classifiers were compared for building the classification model: linear discriminant analysis (LDA), partial least squares-discriminant analysis (PLS-DA), support vector machine (SVM), principal component analysis (PCA)-LDA, random forest (RF) and PLS-RF. Fivefold groupwise cross-validation was used to optimize the parameters of the PLS, PCA and RF-based models, and groupwise randomized search cross-validation was used to optimize the SVM models that had a larger number of parameters to tune. In all cases of cross-validation, replicate measurements from each sample were kept together in one group. The combination of the preprocessing method and classifier that resulted in the best prediction performance for each experiment set is shown in this manuscript. The algorithms used in analysis and modeling were implemented in Python using packages from NumPy [50], Matplotlib [51], and Scikit-learn [52].
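The groupwise constraint, keeping all replicates of one sample in the same fold, can be illustrated with the sketch below. It is a plain-Python stand-in with a hypothetical helper name; the text does not state which utility the authors used, although Scikit-learn provides `GroupKFold` for this purpose.

```python
from collections import defaultdict

def group_k_fold(groups, n_splits=5):
    """Yield (train_idx, test_idx) pairs in which no group spans both sides."""
    by_group = defaultdict(list)
    for idx, g in enumerate(groups):
        by_group[g].append(idx)
    # Assign whole groups to folds round-robin, in order of first appearance.
    folds = [[] for _ in range(n_splits)]
    for k, members in enumerate(by_group.values()):
        folds[k % n_splits].extend(members)
    all_idx = set(range(len(groups)))
    for test in folds:
        test_set = set(test)
        yield sorted(all_idx - test_set), sorted(test_set)

# Ten hypothetical samples with three triplicate-averaged spectra each.
groups = [sample_id for sample_id in range(10) for _ in range(3)]
splits = list(group_k_fold(groups, n_splits=5))
```

Splitting on individual spectra instead would let replicates of one vial appear in both training and validation folds, which tends to inflate the cross-validation accuracy.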

Common drugs-of-abuse
Fig. 2 shows the Set A data from 15 pixels, both as raw (A) and following preprocessing by sum normalization (B). The preprocessed data already shows that the relative pixel responses are different for individual drug types. For example, amphetamine has a relatively high response on pixel #2 compared to other drugs. Direct input of the preprocessed data into the LDA classifier from Scikit-learn provided 100% accurate discrimination for all drug substances by yielding both a perfect cross-validation in the training set and a 100% correct prediction of the samples in the test set (Fig. 3). It must however be noted that only single samples per substance were used, and inter-sample variation was thus not taken into account in this experiment. The results nevertheless demonstrate the capability of the sensor to discriminate among relatively pure drug substances, even when measured on different days. The sensor selectivity is also reflected in the LDA score plots showing well-separated distributions for each individual compound (Fig. 4).
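The three preprocessing methods compared in this study can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' code: SNV is taken to use the sample standard deviation (n - 1), mean-centering is taken to be column-wise (per pixel, across spectra), and the example vectors are hypothetical.

```python
import statistics

def sum_normalize(spectrum):
    """Divide each pixel value by the sum over all pixels of that spectrum."""
    total = sum(spectrum)
    return [v / total for v in spectrum]

def snv(spectrum):
    """Standard normal variate: centre and scale one spectrum by its own statistics."""
    mean = statistics.fmean(spectrum)
    std = statistics.stdev(spectrum)  # sample std (n - 1); an assumption
    return [(v - mean) / std for v in spectrum]

def mean_center(spectra):
    """Subtract the per-pixel mean computed over all spectra (column-wise)."""
    n = len(spectra)
    col_means = [sum(col) / n for col in zip(*spectra)]
    return [[v - m for v, m in zip(row, col_means)] for row in spectra]

# Two hypothetical 4-pixel absorbance vectors; x2 is x1 scaled by a factor 2.
x1 = [0.1, 0.2, 0.3, 0.4]
x2 = [0.2, 0.4, 0.6, 0.8]
n1, n2 = sum_normalize(x1), sum_normalize(x2)
s1, s2 = snv(x1), snv(x2)
centered = mean_center([x1, x2])
```

Sum normalization and SNV both remove a multiplicative intensity difference between spectra, so the two example vectors become identical after either step; what remains is the relative pixel pattern that the classifier uses.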

White and off-white powdered forensic casework samples
The sensor performance was subsequently examined using the 246 casework samples included in Set B. This set was designed to be representative of actual forensic materials, as these samples were randomly selected from seized casework of the Amsterdam Police laboratory by only taking into account the physical properties of the material (i.e. material with a white, off-white or cream color and an appearance as powder, coarse powder or small chunks). The actual composition of the material, revealed by GC-MS, showed that the majority (i.e. 63%) of the material was cocaine-containing (in different compositions with various amounts of adulterants). Around 20% of the material consisted of the common drugs amphetamine, MDMA and ketamine. The remaining 17% (42 samples) of the 'others group' represented a very diverse range of substances such as adulterants, designer drugs and household chemicals. Many of them were unique in this sample set. These diverse 'other' substances complicated chemometric modelling by the large diversity of spectral signals within this group. Figure S1 shows the confusion matrices for the Set B samples when grouping the material in 4 drug categories (i.e. cocaine, amphetamine, ketamine and MDMA) and 1 generic 'other' category containing the diverse set of remaining substances. A 69% accuracy from cross-validation was obtained, in which the main errors could be attributed to misclassifications of the 'other' class. This broad class of samples was particularly challenging, showing a 34% false positive rate for cocaine in cross-validation (Figure S1). The wide range of diverse substances including the pure form of adulterants that were also present as part of diluted cocaine-containing samples is proposed to cause this phenomenon. The overall 82% validation prediction accuracy does however show that the model is well capable of characterizing new samples of illicit-drugs for classes that are included in model design. This is further exemplified by the results shown in Fig. 5, which were obtained using SNV preprocessing and the SVM classifier (optimized using a groupwise randomized search cross-validation with 50 iterations, to obtain the model parameters: linear kernel, tol = 1, C = 10). In this analysis, all samples belonging to the 'others' class were deliberately excluded to provide insight in the sensor performance to discriminate common drugs originating from various sources (seizures, degree of adulteration). In addition to the single sample results in 3.1, the 93% prediction accuracy showed that the sensor was also capable of detecting e.g. cocaine within this wide variety of casework materials, thus including inter-sample variations such as particle size, color and the presence of adulterants. It must however be noted that omitting the wide variety of diverse substances in the 'other' class makes this experiment less representative for real-world performance on forensic samples in which many diverse chemicals may be encountered. These Fig. 5 results are thus only intended to provide insight in performance on various batches of cocaine, amphetamine, MDMA and ketamine. The model on the full Set B (Figure S1) needs further study to draw more statistically convincing conclusions on the overall performance in a forensic setting.
Another approach to cope with the diverse range of substances encountered in the forensic setting is to only put emphasis on the most important substance; thus, develop a model for cocaine-detection only. All Set B samples were divided into a cocaine-containing class (155 samples) and a 'non-cocaine' class (91 samples). A model was implemented with SNV preprocessed data as input for the PLS-RF classification method optimized using groupwise fivefold cross-validation. Ten latent variables were retained from the PLS analysis, which were used as input to the RF classifier with 200 estimators and a maximum depth of 100 levels. The prediction of the test dataset obtained 92% accuracy; 84% sensitivity (true positive rate); 98% specificity (true negative rate) and 97% precision as depicted in Figure S2. With the low false negative rate, such a test could be envisioned in e.g. harbor bulk-testing for cocaine using the smartphone of harbor security personnel. The relatively high false positive rate for cocaine may be acceptable in this specific situation as a positive first suspicion on cocaine will always be followed by additional tests using more advanced laboratory-grade equipment.
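For reference, the performance figures reported for a binary detection test all derive from the four cells of a confusion matrix. The sketch below uses hypothetical counts, not the values behind Figure S2, and the function name is illustrative.

```python
def binary_metrics(tp, fn, fp, tn):
    """Performance measures for a binary test from confusion-matrix counts."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),          # true positive rate
        "specificity": tn / (tn + fp),          # true negative rate
        "precision": tp / (tp + fp),            # positive predictive value
        "false_negative_rate": fn / (tp + fn),  # equals 1 - sensitivity
        "false_positive_rate": fp / (fp + tn),  # equals 1 - specificity
    }

# Hypothetical counts for illustration only.
m = binary_metrics(tp=40, fn=10, fp=5, tn=45)
```

Because the false negative rate is the complement of the sensitivity and the false positive rate is the complement of the specificity, either pair can be read off from the other.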
Note that, for all experiments in this section, both cocaine HCl (i.e. snorting cocaine) and cocaine base (i.e. crack cocaine) were combined into a single group. The reason is that background knowledge on the salt form of a substance is typically not available from routine GC-MS casework data. In spectroscopic analysis, it is common that various salt forms yield different spectra and should thus be considered different substances [10,36,53]. This is also noticed in the Set A results of this study (3.1) in which the different cocaine types yielded distinct responses. Including separate groups for cocaine HCl and cocaine base in the training set may thus be a future development to increase model performance. It should also be remarked that supervised methods (i.e. LDA, PLS-RF, SVM) were required to exploit the differences in the spectroscopic profiles in all sets. The unsupervised approach PCA (results not shown) and PCA-based soft independent modelling by class analogy (SIMCA) were found unsuitable as differences were not substantial enough in relation to compound classes showing significant profile variation due to compositional variety. As a first proof of principle, nine replicate scans each followed by shaking of the material were taken into account. In forensic casework analysis it is highly unwanted to perform a large number of physical replicates because of time constraints. As a next step towards real-life performance, a follow-up study on single measurements or replicate scans without removing and shaking the sample is suggested.

MDMA-detection in ecstasy tablets
Another promising application for rapid on-site drug testing using a mobile phone integrated NIR-sensor is the detection of MDMA-containing tablets by police officers at e.g. dance parties. Ecstasy tablets are usually brightly colored due to the presence of colorants. In addition, an active ingredient and one or more excipients may be present. MDMA is the most commonly encountered substance in these tablets and is put under international control as an illicit substance. Besides MDMA, other synthetic drugs (both controlled and uncontrolled) including 2C-B, fluoroamphetamines or fluoromethamphetamines may also be encountered. In certain cases, such as a scam, no active ingredient may be present at all. Although mixtures of multiple drugs do exist [53][54][55], these are rarely encountered in tablets and are left outside the scope of this study. A total of 71 crushed tablets (39x MDMA-containing, 32x others) were scanned on the NIR sensor. A model for MDMA detection was developed utilizing the PLS-RF method (using 8 PLS latent variables and a RF classifier with 50 estimators and a maximum depth of 10 levels). The classification model achieved a cross-validation accuracy of 91%. In the prediction of the test sample set, a 91% accuracy; 94% sensitivity (true positive rate) and 88% specificity (true negative rate) were obtained. The confusion matrices are shown in Fig. 6. These results demonstrate the potential of such a sensor for on-scene MDMA detection. In line with section 3.2, the non-MDMA-containing tablets also consisted of a wide variety of at least 11 different substances [23]. Unlike the results for cocaine, a relatively low number of false positive and false negative results were observed. A possible explanation for this is a more selective (i.e. characteristic, diagnostic) NIR response of MDMA compared to other synthetic drugs in the wavelength range of this sensor [29]. Crushed tablets were used in this study because of availability of this sample set. It must be noted that crushing of tablets requires some manual sample handling that ideally should be avoided. Direct analysis of intact tablets was not included in the current study but may be a promising future outlook.

Conclusions
As a proof of concept, the performance of a miniaturized multipixel integrated spectral sensor for forensic illicit-drug detection was examined. The sensor provided 15 working pixels (of 16), each with individual spectral characteristics within the 850-1700 nm range. The spectral information was therefore encompassed in only 15 measured photocurrent values. This limited resolution was however adequate to distinguish cocaine, MDMA, amphetamine, methamphetamine and ketamine from each other and from various licit drug substances. An LDA-based model provided 100% accuracy both in the cross-validation of the training set and in the classification of a validation set. Real-world forensic casework materials not only include a vast number of different substances but also show major dissimilarities within a single type of drug. This relates to differences in degree and composition of adulteration, the chemical form of the active ingredient and to physical properties such as particle size, shape and color. The performance of the sensor on actual forensic casework was further explored on 246 seized powder samples. Using SVM and PLS-RF classifiers, an above 90% accurate prediction of cocaine was achieved by focusing on differentiation from other drugs. This demonstrates that the sensor is capable of handling the chemical diversity encountered in cocaine casework samples, although supervised models are required to exploit the spectral profile. Only 69% accuracy from cross-validation was obtained when all randomly selected casework samples were included in modelling. This is related to a high 34% false positive rate for cocaine originating from a broad residual class of samples representing a large diversity of uncommonly encountered substances (e.g. pharmaceuticals, household chemicals, designer drugs). A possible future development to overcome this limitation is expansion of the training set to include all substances that are likely to be encountered in a forensic setting. When spectral selectivity is then still found to be the limiting factor, technological advances such as an increase in the number of pixels or improvements in the pixel's NIR wavelength range and linewidth are possible solutions.
Unlike cocaine, detection of crushed MDMA-suspected ecstasy tablets yielded a remarkably good 94% true positive and 88% true negative rate. Possible explanations are the relatively high level of MDMA in seized ecstasy tablets in combination with more diagnostic spectral features for this substance in the NIR spectrum as reflected in the pixel responses.
Several limitations need to be addressed in future studies towards implementation of this technology in forensic practice. The current study is limited to scans on powdered samples collected in glass vials whereas actual forensic casework can have a myriad of physical appearances and packaging materials. Also, the current replicate scans per sample are unwanted in a time-efficient on-scene approach. Furthermore, future mechanical shock testing will be valuable to provide additional insight on the sensor's robustness and allow comparison with international standards.
In summary, these findings show that illicit-drug detection in real forensic casework samples using a 1.8 × 2.2 mm² miniaturized multipixel spectral sensor is possible. The option of integrating such devices in a smartphone provides interesting opportunities for drug detection by police and customs officers directly on-site.
All remaining 193 samples were included in the training set. Set C: From the total of 71 crushed tablets, 19 (11 MDMA-containing, 8 with other identity) were included in the test set. The remaining 52 samples (28x MDMA, 24x other) completed the training set. The test samples of sets B and C were manually selected in a randomized way before the measurement process, followed by an examination to ensure that 'unique' samples were placed in the training set. The second examination step was necessary as there were unique samples in the 'others group', e.g. there was only one methylenedioxypyrovalerone sample in the others group of set B and one pentylone sample in that of set C. An overview of all training and test sample sets can be found in Table S1 in the Supplemental Information. The training data of set B were balanced by up-sampling the classes with fewer samples, to reduce skewing the classification model toward the majority class (63% of set B samples are cocaine).
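Up-sampling of the minority classes can be sketched as follows. The text does not specify how the up-sampling was performed; random resampling with replacement up to the majority-class size is assumed here, and the helper name, seed and class counts are hypothetical.

```python
import random
from collections import Counter, defaultdict

def upsample(labels, seed=0):
    """Return indices in which each class is resampled (with replacement)
    up to the size of the largest class. All original indices are kept."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    target = max(len(members) for members in by_class.values())
    balanced = []
    for members in by_class.values():
        balanced.extend(members)  # keep every original sample once
        balanced.extend(rng.choices(members, k=target - len(members)))
    return balanced

# Hypothetical, deliberately small training labels (not the Set B counts).
labels = ["cocaine"] * 6 + ["ketamine"] * 2 + ["MDMA"] * 3
idx = upsample(labels)
counts = Counter(labels[i] for i in idx)
```

In line with the text, such balancing is applied to the training data only, so that the test set keeps the class proportions of the casework.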

Fig. 1. (A) Photograph of the spectral sensor placed on a Eurocent coin for size comparison. (B) Schematic diagram of the resonant-cavity-enhanced (RCE) multipixel detector array, where each pixel has a different wavelength response (indicated by the different colors). (C) The handheld SpectraPod™ sensing module used to acquire reflectance measurements through the bottom of glass sample vials. (For interpretation of the references to color in this figure legend, the reader is referred to the Web version of this article.)

Fig. 2. Raw (A) and sum normalized (B) sensor data of the Set A common drugs of abuse.

Fig. 3. Confusion matrices of the Set A data following sum normalization and LDA classification. Numbers are absolute frequency (samples with outliers removed).

Fig. 5. Confusion matrices of the training set (left, relative numbers) and test set (right, absolute numbers) of the 204 samples in Set B (excluding the 'others' class). Model performance following SNV pre-processing and SVM classification.

Fig. 6. Confusion matrices of the training set (left, relative numbers) and test set (right, both absolute and relative numbers) of the Set C crushed ecstasy tablets processed as an MDMA-detection test. Results following mean centering as pre-processing and PLS-RF classification.