Analysis of synchrotron radiation induced X-ray emission spectra with R environment
Introduction
The amount of data collected during experiments is growing very fast, so there is a high demand for tools that treat such large data sets efficiently and reliably. The solution proposed here is based on R, an open-source programming environment for statistical analysis, visualization and modelling. R has been accepted as the statistical lingua franca of today's scientists largely because of its ability to transform and evolve: as new statistical analysis techniques are developed, they quickly appear as R packages, often well before they are incorporated into commercial software.
The R environment is available free of charge for most contemporary operating systems, including Windows, Mac OS and a wide variety of UNIX-based platforms. At the time of writing (08.06.2012), 3860 packages were available from CRAN, the Comprehensive R Archive Network (CRAN, 2012). RStudio (RStudio, 2012), a free and open-source integrated development environment (IDE) for R, was used in this analysis. R is widely used in many areas of research, including econometrics, psychology, ecology, genetics, environmental studies, biology, chemistry and physics. Even government agencies and departments that deal with the growing volume and complexity of digital information stored for public purposes have started to use R to gain insight from data collected on big-data platforms. For example, the US National Weather Service uses R in the research and development of models that predict river flooding. Many pharmaceutical companies now use open-source R to analyse data from clinical trials, despite the mistaken perception that only commercial software may be used for this purpose.
This paper presents the application of the R platform to spectral data preprocessing as well as to univariate and multivariate statistical analysis. Initial treatment of experimental spectral data usually includes baseline correction, normalization, offset correction, removal of artefacts such as spikes or glitches and, sometimes, subtraction of a component common to every spectrum. All these procedures can be implemented in R; one package designed for this kind of analysis is hyperSpec (Beleites and Sergo, in preparation).
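To illustrate how such steps look in plain R, the sketch below implements offset correction, spike removal, area normalization and subtraction of a common component on a matrix holding one spectrum per row. It is a minimal sketch under stated assumptions, not the procedure used in this work: the function names are hypothetical, the spike filter (running median with a MAD-based threshold) and the use of the mean spectrum as the common component are generic choices, and only base R is used rather than hyperSpec.

```r
# Generic preprocessing steps in base R (illustrative sketch, hypothetical names).
# spc: numeric matrix, one spectrum per row, one energy channel per column.

# Offset correction: shift each spectrum so that its minimum becomes zero
offset_correct <- function(spc) {
  sweep(spc, 1, apply(spc, 1, min), "-")
}

# Spike removal: replace channels deviating from a running median by more than
# k robust standard deviations (MAD) with the running-median value.
# w (odd window width), k and min_sd are illustrative choices; min_sd stops
# flat regions (e.g. zero counts, where the MAD is 0) from being flagged wholesale.
despike <- function(spc, w = 5, k = 6, min_sd = 1) {
  t(apply(spc, 1, function(s) {
    med <- stats::runmed(s, w, endrule = "median")
    resid <- s - med
    spike <- abs(resid) > k * max(stats::mad(resid), min_sd)
    s[spike] <- med[spike]
    s
  }))
}

# Area normalization: scale each spectrum so that its channels sum to 1
normalize_area <- function(spc) {
  spc / rowSums(spc)
}

# Subtract a component common to all spectra, here taken as the mean spectrum
subtract_common <- function(spc) {
  sweep(spc, 2, colMeans(spc), "-")
}
```

For a counts matrix `spc`, a typical chain would be `normalize_area(offset_correct(despike(spc)))`. The order of the steps, and whether area normalization suits a given quantitative measurement, depends on the experiment.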
Multivariate approaches such as hierarchical cluster analysis (HCA) and principal component analysis (PCA) (Härdle and Simar, 2003) are extremely useful for large spectral data sets. They allow automatic classification of the spectra (HCA) or simplify the analysis by reducing the number of variables describing the system under investigation (PCA). By their nature, spectral data contain much redundant information, because the intensities in many channels are highly correlated, so the data matrices collected during experiments can be reduced substantially. The new, reduced variables are easier to understand and interpret, while noise and other disturbances are left in the residual matrix; in that sense PCA can serve as a noise-filtering technique. PCA is also known to be very sensitive to outliers, which makes it usable for outlier detection.
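A minimal base-R sketch of these ideas follows, assuming a matrix `spc` with one spectrum (scan point) per row and one energy channel per column. `prcomp` performs the PCA; the spectra are rebuilt from the first `k` components so that the remainder forms the residual matrix (the noise-filtering view); the per-spectrum sum of squared residuals, often called the Q statistic, is used to flag atypical spectra; and `hclust` with Ward linkage groups the spectra by their scores. The function names, the robust outlier threshold and the choice of Ward linkage are illustrative assumptions, not the settings used in this study.

```r
# PCA (prcomp) and HCA (hclust) on a spectra matrix, base R only (sketch).
# spc: numeric matrix, rows = spectra (scan points), columns = energy channels.

pca_reduce <- function(spc, k) {
  # Mean-centre the channels; no scaling, since all channels share one unit (counts)
  pca <- stats::prcomp(spc, center = TRUE, scale. = FALSE)
  scores   <- pca$x[, seq_len(k), drop = FALSE]          # reduced variables
  loadings <- pca$rotation[, seq_len(k), drop = FALSE]   # component "spectra"
  # Rebuild the spectra from k components; what is left is the residual matrix
  filtered <- sweep(scores %*% t(loadings), 2, pca$center, "+")
  residual <- spc - filtered
  list(pca = pca, scores = scores, filtered = filtered, residual = residual,
       # Q statistic: squared residual per spectrum, large for atypical spectra
       q = rowSums(residual^2))
}

# Flag spectra whose Q statistic lies far above the bulk
# (median + k * MAD; the factor k = 5 is an arbitrary illustrative choice)
flag_outliers <- function(q, k = 5) {
  q > stats::median(q) + k * stats::mad(q)
}

# Ward clustering of the PCA scores into n_groups classes
cluster_spectra <- function(scores, n_groups) {
  hc <- stats::hclust(stats::dist(scores), method = "ward.D2")
  stats::cutree(hc, k = n_groups)
}
```

Clustering on a few PCA scores rather than on all channels is a common design choice: the distance computation is cheaper and less dominated by channel noise. The number of retained components `k` still has to be chosen, for example from the explained variance reported by `summary(res$pca)`.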
In this paper, the results of univariate analysis, represented by concentration distribution maps calculated for selected elements, are compared with those of multivariate spectral data treatment.
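The univariate route can be sketched as summing the counts in an energy window around an element's emission line for every scan point and placing the sums on the scan grid. In this hypothetical helper, the window half-width and the absence of background subtraction are simplifying assumptions; the result is an intensity map, and converting it into a concentration map requires a quantification step (for example calibration against standards) that is not shown.

```r
# Hypothetical helper: univariate element map from a spectra matrix (sketch).
# spc: counts, one row per scan point; energy: channel energies (keV);
# x, y: scan coordinates of each row; centre, half_width: energy window (keV).
# No background under the peak is subtracted.
element_map <- function(spc, energy, x, y, centre, half_width) {
  win <- abs(energy - centre) <= half_width          # channels inside the window
  intensity <- rowSums(spc[, win, drop = FALSE])     # integrated counts per pixel
  xs <- sort(unique(x))
  ys <- sort(unique(y))
  # Place each pixel's intensity at its (x, y) position on the scan grid
  m <- matrix(NA_real_, nrow = length(xs), ncol = length(ys),
              dimnames = list(x = xs, y = ys))
  m[cbind(match(x, xs), match(y, ys))] <- intensity
  m
}
```

For iron, for instance, a window around the Fe Kα line at about 6.40 keV could be used, e.g. `element_map(spc, energy, coords$x, coords$y, centre = 6.40, half_width = 0.15)`, where `coords` and the half-width are illustrative.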
Samples
The analysis presented in this contribution was performed on elemental concentration measurements of mouse heart thin-section specimens obtained by Synchrotron Radiation Induced X-ray Emission (SRIXE) spectroscopy; it can easily be generalized to virtually any large spectral data set. The measured samples were thick mouse heart tissue sections mounted on Mylar foil, freeze-dried and unfixed. More details about the samples and the sample preparation procedure can be found in Gajda et al.
Preprocessing
To start spectral data analysis with the hyperSpec package, the experimental results must first be imported into the R environment and arranged into a hyperSpec object. This object stores all imported spectra, the x-axis descriptors (energy, frequency, wavenumber or wavelength) and supplementary data (spatial coordinates, time or concentration).
This can be achieved with the following code in the R environment:
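A minimal sketch of this import step, not necessarily the code used by the authors, assumes the spectra have already been read into a numeric matrix with one row per scan point (placeholder Poisson counts are generated here so that the code runs), an energy axis in keV and a data frame of stage coordinates. All object names, the grid size and the number of channels are hypothetical. The constructor call uses the `spc`, `wavelength`, `data` and `labels` arguments of hyperSpec's `new("hyperSpec", ...)` initializer and is executed only if the package is installed.

```r
# Sketch: arrange imported SRIXE data as a hyperSpec object.
# Placeholder inputs (hypothetical): in practice counts, energy and coords
# would be read from the experiment's data files.
set.seed(3)
energy <- seq(1, 20, length.out = 1024)          # channel energies in keV (assumed)
coords <- expand.grid(x = 1:10, y = 1:8)          # scan grid positions (assumed)
counts <- matrix(rpois(nrow(coords) * length(energy), lambda = 5),
                 nrow = nrow(coords))             # one spectrum per scan point

if (requireNamespace("hyperSpec", quietly = TRUE)) {
  library(hyperSpec)
  hs <- new("hyperSpec",
            spc        = counts,   # spectra matrix
            wavelength = energy,   # x-axis descriptor (here: energy)
            data       = coords,   # supplementary data, one row per spectrum
            labels     = list(spc = "I / counts", .wavelength = "E / keV"))
  print(hs)
}
```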
One of the first steps in preprocessing is baseline correction.
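A generic way to estimate the baseline is to fit a low-order polynomial repeatedly, each time lowering the spectrum to the fitted curve wherever it lies above it, so that the peaks stop pulling the fit upwards. The sketch below implements this iterative below-the-spectrum polynomial fit in base R; it is an illustrative technique, not necessarily the one used in this work, and the function name, default polynomial order and iteration limit are assumptions.

```r
# Iterative polynomial baseline fitted from below (illustrative sketch).
# s: one spectrum (numeric vector); energy: matching energy axis.
poly_baseline <- function(s, energy, order = 3, iter = 50) {
  # Design matrix: intercept plus orthogonal polynomial terms of the energy axis
  X <- cbind(1, stats::poly(energy, degree = order))
  qx <- qr(X)
  y <- s
  for (i in seq_len(iter)) {
    b <- qr.fitted(qx, y)                    # least-squares fit to the current curve
    y_new <- pmin(y, b)                      # lower every point lying above the fit
    if (isTRUE(all.equal(y_new, y))) break   # stop once the curve no longer changes
    y <- y_new
  }
  b                                          # estimated baseline
}
```

A whole matrix can be corrected row by row, e.g. `t(apply(spc, 1, function(s) s - poly_baseline(s, energy)))`. A low polynomial order keeps the baseline from following the peaks, at the cost of poorer fits to strongly curved backgrounds.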
Conclusions
In summary, multivariate data analysis techniques play an increasingly important role in evaluating the results of analytical techniques that provide information-rich data sets, such as SRIXE spectroscopy. Successfully discriminating between groups, for example healthy and altered tissues, or differentiating disease stages requires numerous samples; thus, thousands of spectra may have to be collected to establish a reliable database of control and diseased subjects.
Acknowledgements
This work was partially performed under NUS Core Support C-380-003-003-001.
References
- et al. Using micro-synchrotron radiation induced X-ray emission distribution maps to determine correlation between elements in prostate tissue. Spectrochim. Acta Part B At. Spectrosc. (2008)
- et al. Multivariate analysis of remote laser-induced breakdown spectroscopy spectra using partial least squares, principal component analysis, and related techniques. Spectrochim. Acta Part B At. Spectrosc. (2009)
- et al. Forensic application of total reflection X-ray fluorescence spectrometry for elemental characterization of ink samples. Spectrochim. Acta Part B At. Spectrosc. (2010)
- et al. Non-destructive analysis for the investigation of decomposition phenomena of historical manuscripts and prints. Spectrochim. Acta Part B At. Spectrosc. (2007)
- et al. An advanced multivariate approach for processing X-ray fluorescence spectral and hyperspectral data from non-invasive in situ analyses on painted surfaces. Anal. Chim. Acta (2012)
- Beleites, C., Sergo, V., 2012. hyperSpec: a package to handle hyperspectral data sets in R. R package version...
- CRAN, 2012. The Comprehensive R Archive Network...