Journal of Molecular Structure

Volume 1073, 5 September 2014, Pages 3-9
Substance identification based on transmission THz spectra using library search

This paper is in honour of Prof. Simion Simon on the occasion of his 65th birthday
https://doi.org/10.1016/j.molstruc.2013.12.065

Highlights

  • With renewed interest in the THz region, library search algorithms were tested.

  • Spectra from the publicly viewable online RIKEN THz database were used.

  • Spectral pre-processing was required for artefact removal prior to data comparison.

  • Successful removal of atmospheric water vapour rotational lines is demonstrated.

  • Fourier filtering for removal of channel spectra leads to efficient library search.

Abstract

In recent years terahertz spectroscopy has become a new tool for the characterization of solid materials, in particular for investigating polymorphism and crystallinity in pharmaceutics. Search strategies have been tested for THz spectra of various organic compounds, with the spectra taken from the RIKEN database (http://www.riken.jp), using the GRAMS spectroscopy software. A subset of the database was used, which had been pre-processed by removing atmospheric water vapour lines and by smoothing based on Savitzky-Golay convolution or time-domain filtering. The spectral range available for all library substances was restricted to an interval from 0.9 to 4.5 THz (30–150 cm−1). The number of vibrational bands within this spectral range is much smaller than in mid-infrared or Raman spectra. The appropriateness of spectral pre-treatment is demonstrated with regard to the reliability and robustness of the search methods. In particular, time-domain filters for smoothing and pre-treatment by the removal of water lines and etalon effects have been successfully tested in combination with least squares and correlation methods. With these insights, applications for substance identification, especially in the pharmaceutical industry, may be extended.

Introduction

There is an ever increasing demand for robust computer-based detection algorithms that identify unknown substances from digital reference library spectra when the composition of the samples is entirely unknown or can only be guessed. In the past, significant progress has been achieved in computer-based detection and identification based on Raman, IR, NMR and other spectral libraries. Under laboratory conditions a satisfactory, standardized sample preparation can usually be established, and the conditions can be adjusted to the measurement requirements. These steps lead to high-quality spectra, which are recommended for establishing spectral databases. In contrast, infrared spectra measured under non-ideal and more realistic conditions often suffer from spurious features such as atmospheric bands, etalon effects and other noise problems. The same applies to far-infrared (FIR) spectra, a region nowadays also known as the THz domain. These effects complicate automatic detection, even though modern mathematical techniques such as pattern recognition tools and elaborate library search algorithms are available, e.g. [1], [2], [3], [4]. Previously, molecular spectroscopy with IR and Raman played an important part owing to the availability of large and versatile spectral libraries. Exemplary methods employ neural networks for the identification of polymers based on NIR spectroscopy [5]; another recent application is the identification of food spoilage bacteria through Raman micro-spectroscopy using support vector machines [6].

Recently, there has been renewed interest in the far-infrared region, or synonymously in THz spectroscopy, owing to novel radiation and detector technologies, especially within the pharmaceutical industry and for material science applications and non-destructive testing [7]. Solid formulations are the most important pharmaceutical dosage forms today. While Raman and infrared spectroscopy probe the solid state predominantly on an intramolecular level, with advantages also for amorphous substances, THz spectroscopy can probe low-energy torsion and hydrogen bonding vibrations as well as the intermolecular state, i.e. long-range crystalline lattice vibrations (phonon modes). These properties make THz spectroscopy an ideal tool to investigate crystallinity and to discriminate between polymorphic forms [8]. Detection and quantification of solid phase transformations (appearance of new polymorphs) during processing is another application in pharmaceutics.

Further examples of applications of THz technology can be found in bio-engineering and biomedicine [9] as well as in the security field, where they relate to explosives and hazardous materials. Several textbooks have been published covering the different areas [10], [11]. Recently, a review has also been published on the measurement of THz spectra and on the chemometrics used for spectral pre-processing and for qualitative and quantitative analysis [12].

This work was dedicated to testing search strategies for THz spectra of various organic compounds using the GRAMS Spectroscopy Software (http://www.thermoscientific.com/grams). Similar software is also available from other companies, e.g., the IDENT program from Bruker Optics (Ettlingen, Germany) using spectral library search tools or cluster analysis for the identification of chemicals. The identification of drugs and explosives from scanning persons or postal items by THz radiation was our main interest, and in this context we tested the effectiveness and performance of library search algorithms for this spectral range.

Section snippets

THz spectroscopy and spectral databases

The THz frequency range has been broadly defined as 0.1 to 30 THz. The corresponding wavelengths range from 3 mm to 10 μm, and the wavenumbers from 3.33 cm−1 to 1000 cm−1. Terahertz spectroscopy shares optical properties with the neighbouring mid-infrared spectral region and, moreover, the same tools for spectral interpretation and band assignment can be exploited. Furthermore, THz radiation provides high penetration depths in special materials and low scattering combined
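The unit relations quoted above can be checked with a short calculation. The following is a minimal sketch, not part of the original work; the function names are hypothetical and only the vacuum speed of light is assumed.

```python
# Sketch: convert THz frequencies to wavenumbers and vacuum wavelengths.
# Function names are illustrative assumptions, not taken from the paper.

C_CM_PER_S = 2.99792458e10  # speed of light in vacuum, in cm/s


def thz_to_wavenumber(f_thz: float) -> float:
    """Return the wavenumber in cm^-1 for a frequency given in THz."""
    return f_thz * 1e12 / C_CM_PER_S


def thz_to_wavelength_um(f_thz: float) -> float:
    """Return the vacuum wavelength in micrometres for a frequency in THz."""
    wavelength_cm = C_CM_PER_S / (f_thz * 1e12)
    return wavelength_cm * 1e4  # 1 cm = 1e4 micrometres


if __name__ == "__main__":
    # Limits of the THz range and of the library interval mentioned in the abstract
    for f in (0.1, 0.9, 4.5, 30.0):
        print(f"{f:5.1f} THz = {thz_to_wavenumber(f):8.2f} cm^-1"
              f" = {thz_to_wavelength_um(f):8.1f} um")
```

One THz corresponds to roughly 33.36 cm−1, so the 0.9 to 4.5 THz library interval maps onto approximately 30–150 cm−1, in line with the abstract.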

Search strategies

The performance of the search algorithms was tested using the GRAMS AI Spectral ID toolbox [18]. In the first stage a library was created. For this, the spectral data format also had to be made consistent, i.e. all points were interpolated to the same wavenumber grid and cropped to the same frequency range. The following search algorithms were tested (for the different metrics, see for example [2], [3]; more details, also on the definition of the hit quality indices, are given in the
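The library preparation and search described above can be illustrated with a small, self-contained sketch. It is not the GRAMS implementation: the two similarity measures below are generic textbook forms of a correlation and a least-squares (Euclidean) comparison, and the function names, the synthetic Gaussian bands and the substance labels are all assumptions made for illustration.

```python
# Sketch: resample spectra onto a common grid and rank library entries.
# All names and the synthetic data are hypothetical; the metrics are generic
# forms and not the hit quality indices defined in the paper or in GRAMS.
import math
from bisect import bisect_right


def resample(x, y, grid):
    """Linearly interpolate (x, y) onto grid; x must be strictly ascending
    and every grid point must lie inside [x[0], x[-1]]."""
    out = []
    for g in grid:
        if not x[0] <= g <= x[-1]:
            raise ValueError("grid point outside the measured range")
        i = min(bisect_right(x, g), len(x) - 1)
        t = (g - x[i - 1]) / (x[i] - x[i - 1])
        out.append(y[i - 1] + t * (y[i] - y[i - 1]))
    return out


def correlation(a, b):
    """Pearson correlation; insensitive to offset and intensity scaling."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    da, db = [v - ma for v in a], [v - mb for v in b]
    num = sum(p * q for p, q in zip(da, db))
    return num / math.sqrt(sum(p * p for p in da) * sum(q * q for q in db))


def euclidean_distance(a, b):
    """Least-squares distance between the unit-norm versions of a and b."""
    na = math.sqrt(sum(v * v for v in a))
    nb = math.sqrt(sum(v * v for v in b))
    return math.sqrt(sum((p / na - q / nb) ** 2 for p, q in zip(a, b)))


def search(unknown, library):
    """Return (name, correlation, distance) tuples, best correlation first."""
    hits = [(name, correlation(unknown, ref), euclidean_distance(unknown, ref))
            for name, ref in library.items()]
    return sorted(hits, key=lambda h: h[1], reverse=True)


def gaussian_band(x, centre, width):
    """Synthetic absorption band used only as placeholder data."""
    return [math.exp(-((v - centre) / width) ** 2) for v in x]


# Common grid: 30 to 150 cm^-1 in 1 cm^-1 steps
grid = [30.0 + i for i in range(121)]
library = {
    "substance_A": gaussian_band(grid, 60.0, 5.0),
    "substance_B": gaussian_band(grid, 110.0, 8.0),
}

# "Unknown" recorded on a finer, wider grid with an offset and another scale
raw_x = [25.0 + 0.5 * i for i in range(271)]
raw_y = [0.1 + 2.0 * v for v in gaussian_band(raw_x, 60.0, 5.0)]
unknown = resample(raw_x, raw_y, grid)
hits = search(unknown, library)

if __name__ == "__main__":
    for name, corr, dist in hits:
        print(f"{name}: correlation={corr:.4f}, distance={dist:.4f}")
```

In this toy case the correlation measure ignores the baseline offset and intensity scaling of the unknown, whereas the distance on unit-norm spectra is still affected by the offset; this is one reason why baseline and artefact removal before searching matters.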

Results and discussion

Exemplary spectra from the RIKEN database representing some of the main substance classes are shown in Fig. 5. The available spectral range differs significantly between the individual substances in the library, as some spectra were recorded over the interval 30–300 cm−1 with quite different numbers of data points (for example, 790 for the glucose spectrum, 630 for ampicillin or 7500 for captan). For captan and ampicillin 150 data points were taken into account in the Fourier domain for the Fourier
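The idea behind Fourier filtering of channel spectra (etalon fringes) can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: a periodic fringe in the spectrum appears as an isolated component in the Fourier domain, and here it is suppressed by keeping only the lowest Fourier coefficients. The function names, the synthetic band, the fringe period and the cut-off of 15 coefficients are hypothetical and unrelated to the 150 points mentioned above.

```python
# Sketch: remove a sinusoidal channel spectrum by truncating the Fourier
# series of the spectrum. Plain O(n^2) DFT to stay within the standard
# library; names, data and the cut-off value are illustrative assumptions.
import cmath
import math


def dft(values):
    """Discrete Fourier transform of a real or complex sequence."""
    n = len(values)
    return [sum(v * cmath.exp(-2j * math.pi * k * m / n)
                for m, v in enumerate(values))
            for k in range(n)]


def idft_real(coeffs):
    """Inverse DFT, returning only the real part of each sample."""
    n = len(coeffs)
    return [sum(c * cmath.exp(2j * math.pi * k * m / n)
                for k, c in enumerate(coeffs)).real / n
            for m in range(n)]


def fourier_lowpass(spectrum, keep):
    """Keep coefficients 0..keep and their mirror images, zero the rest."""
    n = len(spectrum)
    coeffs = dft(spectrum)
    filtered = [c if (k <= keep or k >= n - keep) else 0.0
                for k, c in enumerate(coeffs)]
    return idft_real(filtered)


n_points = 128
# Broad synthetic absorption band (placeholder for a real THz band)
clean = [math.exp(-((m - 64) / 8.0) ** 2) for m in range(n_points)]
# Channel spectrum: 30 fringe periods across the recorded range
fringed = [c + 0.2 * math.sin(2 * math.pi * 30 * m / n_points)
           for m, c in enumerate(clean)]
restored = fourier_lowpass(fringed, keep=15)

if __name__ == "__main__":
    before = max(abs(f - c) for f, c in zip(fringed, clean))
    after = max(abs(r - c) for r, c in zip(restored, clean))
    print(f"max deviation before filtering: {before:.3f}")
    print(f"max deviation after filtering:  {after:.6f}")
```

The cut-off is a trade-off: too few retained coefficients broaden genuine bands, too many leave the fringe component in the spectrum, so in practice it has to be chosen per spectrum and checked visually.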

Conclusions

In this paper conventional library search algorithms were tested for THz spectra. THz or FIR spectroscopy provides several unique, attractive features that make this frequency range preferable to other optical spectroscopies for some measurement tasks. However, visual inspection is still recommended as a step for selecting the correct library entry from the hit list suggested by the library search. In the recent past, in particular so-called THz TDS

Acknowledgements

Part of the work was performed when both authors were with the Leibniz Institut für Analytische Wissenschaften – ISAS e.V. in Dortmund. We are most grateful to R. Kuckuk for continuous support. The RIKEN group is thanked for the use of their THz library data. Financial support from the Fraunhofer Institut für Physikalische Messtechnik, THz-Department, Kaiserslautern and the supply of TDS spectra are also gratefully acknowledged. Furthermore, financial support by the Ministerium für Innovation,

References (18)

  • S. Meisel et al., Food Microbiol. (2014)
  • H.H. Mantsch et al., J. Mol. Struct. (2010)
  • J. El Haddad et al., Trends Anal. Chem. (2013)
  • H. Wu et al., Int. J. Pharm. (2007)
  • M. Herrmann et al., Vib. Spectrosc. (2012)
  • L.S. Rothman et al., J. Quant. Spectrosc. Radiat. Transfer (2013)
  • F. van der Heijden et al., Classification, Parameter Estimation and State Estimation (2004)
  • P.R. Griffiths et al., Appl. Spectrosc. (2009)
  • H. Günzler et al., IR-Spektroskopie (1996)
There are more references available in the full text version of this article.
