Elsevier

Radiation Physics and Chemistry

Volume 93, December 2013, Pages 82-86
Analysis of synchrotron radiation induced X-ray emission spectra with R environment

https://doi.org/10.1016/j.radphyschem.2013.04.026

Author-Highlights

  • A solution for spectral data processing and analysis based on the open-source R environment.

  • Multivariate statistical analysis successfully implemented on spectral data.

  • Automatic clustering of the data based on spectral differences by means of hierarchical cluster analysis.

Abstract

The life sciences have seen a huge increase in the amount and complexity of data collected with every experiment. Scientists today face the increasingly difficult task of extracting vital information from vast amounts of numbers. Software used for this purpose should be sufficiently powerful and flexible to handle large and complex data sets. At the same time, it should allow the user to follow exactly what is being calculated; black-box software should be avoided. The R platform (R Development Core Team, 2011), an open-source environment for statistical analysis, fits these requirements well. With its rapidly expanding user community, it is quickly becoming the most important tool for statistical analysis of data in a broad range of applications. The most important feature of R is its package system, which allows users to address specific problems with dedicated packages and lets more advanced users contribute software for their own fields. This paper presents multivariate analysis and data treatment of spectral data using the R environment.

Introduction

The amount of data collected during experiments is growing very fast, so there is high demand for tools that treat such big data sets efficiently and reliably. The solution proposed here is based on R, an open-source programming environment for statistical analysis, visualization and modelling. The acceptance of R as the statistical lingua franca of scientists today rests on its ability to transform and evolve. As new statistical analysis techniques are developed, they often appear as R packages well before they are incorporated into commercial software.

The R environment is available free of charge for most contemporary operating systems, including Windows, MacOS and a wide variety of UNIX-based platforms. As of 08.06.2012 there were 3860 packages available at CRAN, the Comprehensive R Archive Network (CRAN, 2012). RStudio (RStudio, 2012), a free and open-source integrated development environment (IDE) for R, was used in this analysis. R is widely used in many areas of research, including econometrics, psychology, ecology, genetics, environmental studies, biology, chemistry and physics. Even governmental agencies and departments that deal with the steadily growing volume and complexity of digital information stored for public purposes have started to use R to gain insight from data collected in big-data platforms. For example, the US National Weather Service uses R for research and development of models to predict river flooding. Many pharmaceutical companies now use open-source R to analyse data from clinical trials, despite a false perception that only commercial software may be used for this purpose.

This paper presents the application of the R platform to spectral data preprocessing as well as to univariate and multivariate statistical analysis. Initial analysis of spectral data obtained from an experiment usually includes baseline correction, normalization, sometimes subtraction of a component common to every spectrum, removal of artefacts such as spikes or glitches, offset correction, etc. All these procedures can be implemented in R. One of the packages for this kind of analysis is hyperSpec (Beleites and Sergo, in preparation).
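Two of the preprocessing steps listed above, baseline correction and normalization, can be sketched in base R. This is a minimal illustration on simulated data, not the procedure used in the paper: the energy axis, the peak position, the choice of a straight-line baseline, the peak-free "support" channels and the helper name `subtract_linear_baseline` are all assumptions made for the example.

```r
# Hypothetical toy data: 5 spectra x 200 channels, each with a sloping
# background, one Gaussian peak and Poisson counting noise.
set.seed(1)
energy <- seq(2, 12, length.out = 200)            # assumed energy axis in keV

make_spectrum <- function(scale) {
  background <- 25 - 1.2 * energy                 # linear background (assumed)
  peak <- 100 * scale * exp(-(energy - 6.4)^2 / (2 * 0.15^2))
  rpois(length(energy), lambda = background + peak)
}
spectra <- t(sapply(c(0.5, 1, 1.5, 2, 2.5), make_spectrum))  # rows = spectra

# Baseline correction: fit a straight line to channels assumed to be
# peak-free (the "support") and subtract it from the whole spectrum.
subtract_linear_baseline <- function(y, x, support) {
  coefs <- coef(lm(y[support] ~ x[support]))
  y - (coefs[1] + coefs[2] * x)
}
support <- c(1:20, 181:200)                       # assumed peak-free channels
corrected <- t(apply(spectra, 1, subtract_linear_baseline,
                     x = energy, support = support))

# Normalization: scale every spectrum to unit total intensity.
normalized <- corrected / rowSums(corrected)
```

Real spectra usually need a more flexible baseline (for example a higher-order polynomial) and a normalization chosen to suit the experiment; the structure of the calculation stays the same.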

To reduce the number of variables in spectral data sets, multivariate approaches such as hierarchical cluster analysis (HCA) and principal component analysis (PCA) (Härdle and Simar, 2003) are extremely useful. They allow automatic classification of the spectra, or simplify the analysis by reducing the number of variables describing the system under investigation. Spectral data by their nature contain much redundant information because the variables are highly correlated. PCA is also known to be very sensitive to outliers and can therefore be used for outlier detection. Data matrices collected during experiments can thus be reduced substantially. The new, reduced variables are easier to understand and interpret, while noise and other disturbances are left in the residual matrix. In that sense PCA can serve as a noise-filtering technique.
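The ideas in this paragraph can be illustrated with functions from base R (`prcomp`, `dist`, `hclust`, `cutree`). The sketch below runs on simulated spectra and is not the authors' analysis: the two groups, the peak positions and heights, the noise level, the number of retained components and the choice of Ward linkage are all assumptions made for the example.

```r
set.seed(42)
energy <- seq(2, 12, length.out = 150)            # assumed energy axis in keV
gauss <- function(center, width = 0.15) {
  exp(-(energy - center)^2 / (2 * width^2))
}

# Two hypothetical groups of 20 spectra that differ in the ratio of two
# peaks (positions 6.40 and 8.64 keV are chosen for illustration only).
group <- rep(c("A", "B"), each = 20)
h1 <- ifelse(group == "A", 100, 60)
h2 <- ifelse(group == "A", 30, 80)
clean <- outer(h1, gauss(6.40)) + outer(h2, gauss(8.64))
spectra <- clean + matrix(rnorm(length(clean), sd = 3), nrow = nrow(clean))

# PCA on mean-centred spectra: 150 correlated channels -> a few scores.
pca <- prcomp(spectra, center = TRUE, scale. = FALSE)
explained <- pca$sdev^2 / sum(pca$sdev^2)         # variance fraction per PC

# Noise filtering: rebuild the spectra from the first k components only;
# what is left out stays in the residual matrix.
k <- 1
filtered <- pca$x[, 1:k, drop = FALSE] %*% t(pca$rotation[, 1:k, drop = FALSE])
filtered <- sweep(filtered, 2, pca$center, "+")
residual <- spectra - filtered

# HCA: Ward clustering on Euclidean distances between the PC scores,
# cut into two clusters and compared with the known groups.
tree <- hclust(dist(pca$x[, 1:2]), method = "ward.D2")
cluster <- cutree(tree, k = 2)
table(group, cluster)
```

With real data the number of components and clusters is not known in advance; inspecting `explained` and the dendrogram (`plot(tree)`) is a common way to choose them.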

In this paper, the results of univariate analysis (represented by the calculation of concentration distribution maps for selected elements) are compared with the multivariate treatment of the spectral data.
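As a rough illustration of the univariate approach, an intensity map for one element can be obtained by summing the counts in an energy window around its emission line for every pixel. The base R sketch below uses simulated data; the scan size, the 6.0 to 6.8 keV window and the variable names are assumptions, and converting such raw window counts into concentrations would additionally require background subtraction and calibration, which are not shown.

```r
set.seed(7)
nx <- 8                                           # assumed scan size in x
ny <- 6                                           # assumed scan size in y
energy <- seq(2, 12, length.out = 200)            # assumed energy axis in keV
pixels <- expand.grid(x = seq_len(nx), y = seq_len(ny))  # x varies fastest

# Hypothetical spectra: flat background plus a peak at 6.4 keV whose
# height grows with the x position of the pixel.
peak_height <- 20 * pixels$x
spectra <- t(sapply(peak_height, function(h) {
  rpois(length(energy),
        lambda = 5 + h * exp(-(energy - 6.4)^2 / (2 * 0.15^2)))
}))

# Region of interest around the line: sum the counts per pixel.
roi <- energy >= 6.0 & energy <= 6.8
roi_counts <- rowSums(spectra[, roi])

# Arrange the per-pixel values on the scan grid: intensity_map[x, y].
intensity_map <- matrix(roi_counts, nrow = nx, ncol = ny)
```

The resulting matrix can be displayed with, for example, `image(seq_len(nx), seq_len(ny), intensity_map)`.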

Samples

The analysis presented in this contribution was performed on elemental concentration measurements of thin-section specimens of mouse hearts using Synchrotron Radiation Induced X-ray Emission (SRIXE) spectroscopy. It can easily be generalized to virtually any large spectral data set. The samples measured were 10 μm thick mouse heart tissue sections mounted on 2.5 μm Mylar foil, freeze-dried and unfixed. More details about the samples and the sample preparation procedure can be found in Gajda et al.

Preprocessing

To start spectral data analysis with the hyperSpec package in R, the experimental results must be imported into the environment and arranged into a hyperSpec object. This object stores all imported spectra, the x-axis descriptors (energy, frequency, wavenumber or wavelength) and supplementary data (spatial coordinates, time or concentration).

This can be achieved with the following code in the R environment:
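The original listing is not included in this excerpt. As a stand-in, and not the authors' code, the following minimal sketch shows how a hyperSpec object might be assembled from data already held in memory. It assumes the hyperSpec package is installed; the simulated counts, the energy axis and the 3 × 2 pixel grid are hypothetical, and reading the actual SRIXE files is not shown.

```r
library(hyperSpec)   # assumed to be installed, e.g. from CRAN

set.seed(3)
energy <- seq(2, 12, length.out = 50)             # x-axis descriptor (keV)
pixels <- expand.grid(x = 1:3, y = 1:2)           # supplementary data
counts <- matrix(rpois(nrow(pixels) * length(energy), lambda = 10),
                 nrow = nrow(pixels))             # one spectrum per row

# Bundle spectra, x-axis and per-spectrum data into one object.
spectra <- new("hyperSpec",
               spc = counts,
               wavelength = energy,
               data = pixels)

nrow(spectra)        # number of spectra
nwl(spectra)         # number of points on the x-axis
spectra$x            # spatial coordinate stored alongside the spectra
```

Keeping the spectra matrix, the axis and the coordinates in a single object means that subsetting or reordering the spectra keeps all three in step.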

One of the first steps in preprocessing is baseline correction. It

Conclusions

In summary, multivariate data analysis techniques play an increasingly important role in evaluating the results of analytical techniques that provide information-rich data sets, such as SRIXE spectroscopy. Successfully discriminating between groups (for example, healthy from altered tissues) or differentiating disease stages requires numerous samples; thus, thousands of spectra may have to be collected to establish a reliable database of control and diseased subjects.

Acknowledgements

This work was partially performed under NUS Core Support C-380-003-003-001.
