New program with new approach for spectral data analysis

This article presents EasyDD, a high-throughput computer program for batch processing, analysis and visualization of spectral data, particularly data related to the new generation of synchrotron detectors and X-ray powder diffraction applications. This computing tool is designed to treat large volumes of data in reasonable time with affordable computational resources. A case study, in which the program was used to process and analyze powder diffraction data obtained at the ESRF synchrotron on an alumina-based nickel nanoparticle catalysis system, is also presented for demonstration. The development of this computing tool, with its associated protocols, is inspired by a novel approach to spectral data analysis.


Introduction
The last few decades have witnessed a revolution in detector and data acquisition technologies. This, together with the computing and communication revolution, has increased the demand for data processing power. Modern detectors coupled with high intensity radiation sources have led to a situation where data sets can be collected in ever shorter time scales and in ever larger numbers. Such large volumes of data pose a processing problem which grows with current and future instrument development.
EasyDD is based on a new approach for large-scale processing, visualization and analysis of massive volumes of spectral data. Such a utility greatly assists studies on various physical systems and enables far larger and more detailed data sets to be rapidly interrogated and analyzed. The EasyDD methodology is based on automation, batch processing, and encapsulation of various modules in a single entity where the user can sequentially follow computational protocols to apply multiple types of procedures to massive amounts of correlated data sets with general common features.
EasyDD can be described as high-throughput software to manage, process, analyze and visualize spectral data in general and synchrotron data in particular. It is a powerful tool for processing large quantities of data in a variety of formats with ease, using limited time and computing resources. The main features of EasyDD which were observed in its development are:
• User friendliness to minimize the time and effort required to learn and use the program. A graphical user interface is therefore adopted in favor of a command line interface, although the latter is more common in scientific computing and much easier to develop.
• Capability of handling several common data formats including a generic XY format so that it is possible to use in processing data produced by various sensors and detectors.
• Optimization for the commonly available computational resources; most importantly CPU time and memory. The program therefore tries to set the limits of its data processing capabilities to the limits of the available computational resources.
• Batch and multi-batch processing functionalities which most EasyDD procedures are based upon.
These features allow processing huge amounts of data (∼ terabytes) in reasonable time (∼ hours or days).
EasyDD uses a hybrid approach of procedural and object oriented programming methodologies. It combines Graphical User Interface (GUI) technology with standard scientific computing techniques. Its resources include the standard C++ library and a GUI library, along with numerous algorithms, functions and techniques. Several input data formats are supported, including, but not limited to: generic XY, MCA of the Diamond synchrotron [4], MCA of the ESRF [6], ERD of the Manchester University detector, and the HEXITEC detector (e.g. [9]).
One of the main functionalities of EasyDD is to read and map spectral data on a graphic interface. This can be done simply by depositing the data files of a particular format in a directory and invoking the relevant read function. On reading the files, the data are stored in memory and mapped on a 2D color-coded tab. Multiple tabs from different data sources can be created at the same time. The tabs can also be removed collectively or individually in any order. In the following section we outline the main components of EasyDD, with a brief account of their main functionalities, followed by a general description of its modules.
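As an illustration of the read-and-map step described above, the following minimal Python sketch reads generic XY files from a directory and arranges their total intensities on a 2D grid. It is not EasyDD code (EasyDD is written in C++). The `.xy` extension, the `#` comment convention, the row-by-row ordering of files and the function names `read_xy` and `map_total_intensity` are all assumptions made for this example.

```python
from pathlib import Path


def read_xy(path):
    """Parse a two-column text file into (x, y) lists.

    Assumed conventions: whitespace- or comma-separated columns,
    blank lines and lines starting with '#' are ignored.
    """
    xs, ys = [], []
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.replace(",", " ").split()
        xs.append(float(fields[0]))
        ys.append(float(fields[1]))
    return xs, ys


def map_total_intensity(directory, n_cols):
    """Read every *.xy file (sorted by name) and lay out the total
    intensity of each pattern row by row on a grid with n_cols columns."""
    totals = [sum(read_xy(p)[1]) for p in sorted(Path(directory).glob("*.xy"))]
    return [totals[i:i + n_cols] for i in range(0, len(totals), n_cols)]
```

Each cell of the returned grid could then be color-coded by its value, while the full pattern behind a cell remains available through `read_xy`.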

Components and Modules of EasyDD
The principal component of EasyDD is the main window which is a standard GUI widget with menus, toolbars, a status bar, context menus and so on. The basic functionality of this widget is to serve as a platform for accessing and managing other components with their specific functionalities.
Another component is the tab widget which can accommodate a number of 2D color-coded scalable tabs for tomographic mapping with graphic and text tooltips to show all essential data properties. The tabs can be used to launch a plotter for dynamic display of individual patterns of the mapped data. The tabs can also be used for imaging, 3D visualization, manipulation and format conversion of these data.
A third component is a numeric plotter to obtain a graph of the spectral pattern for any cell in the tabs. It is also used to create basis functions and forms for curve-fitting. The 2D plotter capabilities include creating and drawing fitting basis functions, namely Gauss, Lorentz and pseudo-Voigt profiles as well as polynomials of order ≤ 6 that pass through a number of selected points. The fitting basis functions can also be modified and removed from within the plotter. The plotter can be used to perform non-linear least-squares curve-fitting by the Levenberg-Marquardt algorithm on individual data in the tabs. Moreover, the plotter image can be saved in a number of different formats.
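For reference, the basis functions named above can be written down in a few lines. The Python sketch below uses one common parameterization (integrated area, center and full width at half maximum, with a mixing fraction `eta` for the pseudo-Voigt). This parameterization and the function names are assumptions for illustration, and the one used internally by EasyDD may differ.

```python
import math


def gauss(x, area, center, fwhm):
    """Area-normalized Gaussian profile."""
    sigma = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    norm = area / (sigma * math.sqrt(2.0 * math.pi))
    return norm * math.exp(-0.5 * ((x - center) / sigma) ** 2)


def lorentz(x, area, center, fwhm):
    """Area-normalized Lorentzian profile."""
    hw = fwhm / 2.0
    return area / math.pi * hw / ((x - center) ** 2 + hw ** 2)


def pseudo_voigt(x, area, center, fwhm, eta):
    """Linear mix of Lorentzian (weight eta) and Gaussian (weight 1 - eta)."""
    return (eta * lorentz(x, area, center, fwhm)
            + (1.0 - eta) * gauss(x, area, center, fwhm))


def polynomial(x, coeffs):
    """Polynomial background; coeffs are in ascending order, order <= 6."""
    if len(coeffs) > 7:
        raise ValueError("polynomial order must not exceed 6")
    result = 0.0
    for c in reversed(coeffs):  # Horner's scheme
        result = result * x + c
    return result
```

With this convention, all three peak profiles drop to half of their maximum at a distance of `fwhm / 2` from the center, so the width parameter has the same meaning in each of them.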
A fourth component is a spreadsheet form, mainly used for batch curve-fitting. The idea is to prepare a form in the plotter and save it to disk. It is then imported to batch-fit a number of cells or tabs in the tab widget, or used in a multi-batch curve-fitting operation. The form has a number of columns that contain the data required for curve-fitting, such as the data range, initial fitting parameters, upper and lower limits, and boolean flags and values for applying restrictions on the refined parameters when they exceed acceptable limits. The user can also add columns that contain counters and boolean flags for controlling the number of iteration cycles and the parameters to be refined in the least-squares fitting routine.
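The content of such a form can be pictured as a small data structure. The Python sketch below is hypothetical: the class names `FitParameter` and `FitForm`, their fields, and the rule of resetting an out-of-range parameter to a preset value are assumptions chosen to mirror the columns described above, not the actual EasyDD file layout.

```python
from dataclasses import dataclass, field


@dataclass
class FitParameter:
    name: str
    initial: float
    lower: float
    upper: float
    refine: bool = True       # may the least-squares routine vary it?
    restrict: bool = False    # apply a restriction when out of limits?
    reset_value: float = 0.0  # value imposed when the restriction triggers

    def apply_restriction(self, value):
        """Return value, or reset_value if a restriction is active
        and value lies outside [lower, upper]."""
        if self.restrict and not (self.lower <= value <= self.upper):
            return self.reset_value
        return value


@dataclass
class FitForm:
    x_min: float              # start of the data range to fit
    x_max: float              # end of the data range to fit
    max_cycles: int = 50      # cap on iteration cycles
    parameters: list = field(default_factory=list)

    def select_range(self, xs, ys):
        """Keep only the points that fall inside the fitting range."""
        pairs = [(x, y) for x, y in zip(xs, ys) if self.x_min <= x <= self.x_max]
        return [p[0] for p in pairs], [p[1] for p in pairs]

    def refined_names(self):
        """Names of the parameters flagged for refinement."""
        return [p.name for p in self.parameters if p.refine]
```

In a batch run, one such form would be applied unchanged to every pattern in the selected cells or tabs, which is what makes a single prepared form reusable across thousands of patterns.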
A fifth component is a 3D plotter for creating a 3D graph of the data in the tabs where the total intensity is displayed against the tomographic dimensions. The 3D plotter is very useful for close inspection of data as the graph can be rotated in all orientations and zoomed in and out.
EasyDD contains four main modules:
• Curve-fitting by least-squares minimization using the Levenberg-Marquardt algorithm, which is an iterative numerical technique for nonlinear least-squares optimization. Thanks to its efficiency and good convergence, the Levenberg-Marquardt algorithm is widely used by scientists and engineers in all disciplines, and hence it has become a standard for nonlinear least-squares minimization problems. In EasyDD, curve-fitting can be performed on a single pattern, as a single batch process over multiple patterns, or as a multi-batch process over multiple forms and data sets. Curve-fitting can be done on single or multiple peaks using a number of basis functions, with or without polynomial background modeling. The number of curve-fitting cycles can be fixed or can vary according to the convergence criteria. The parameters to be fitted can also be selected, with possible application of restrictions. The range of data to fit can be selected graphically or by using a prepared form. Some relevant statistical indicators for the fitting process are computed in the curve-fitting routine.
• [...] can run in single and multi-batch modes and can be applied on total or partial intensity as well as individual channels with possible application of Fourier transform and filtering.
• EDF processing to extract the information from EDF binary files obtained from CCD detectors [15]. Two forms of extraction are available: conversion to normal 2D images in one of three formats (png, jpg and bmp), and squeezing to 1D patterns in xy text format with the possibility of making tilt and missing-ring corrections.
• Graphic presentation which includes mapping, visualization and imaging to produce graphs in 2D and 3D spaces. These include creating tomographic and surface images in single, batch and multi-batch modes, as well as xy plots for spectral patterns. Some of these graphic techniques use a direct display on the computer monitor, while others save the results to the computer storage in the form of image files.
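To make the curve-fitting module in the list above more concrete, the following self-contained Python sketch fits a single Gaussian peak on a linear background with a basic Levenberg-Marquardt loop. It is a teaching sketch under stated assumptions, not the EasyDD implementation: the Jacobian is approximated by forward differences, the damping factor is scaled by a fixed factor of 10, the sum of squared residuals is the only statistical indicator returned, and all names are illustrative.

```python
import math


def model(x, p):
    """Gaussian peak (height, center, sigma) on a linear background b0 + b1*x."""
    height, center, sigma, b0, b1 = p
    return height * math.exp(-0.5 * ((x - center) / sigma) ** 2) + b0 + b1 * x


def solve(a, b):
    """Solve a small dense linear system by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def chi2(xs, ys, p):
    """Sum of squared residuals."""
    return sum((y - model(x, p)) ** 2 for x, y in zip(xs, ys))


def levenberg_marquardt(xs, ys, p0, max_cycles=200, tol=1e-12):
    """Return (fitted parameters, final sum of squared residuals)."""
    p, lam, n = list(p0), 1e-3, len(p0)
    cost = chi2(xs, ys, p)
    for _ in range(max_cycles):
        base = [model(x, p) for x in xs]
        # jac[k][i] approximates d model(x_i) / d p_k by a forward difference.
        jac = []
        for k in range(n):
            h = 1e-6 * max(1.0, abs(p[k]))
            q = p[:]
            q[k] += h
            jac.append([(model(x, q) - f0) / h for x, f0 in zip(xs, base)])
        res = [y - f0 for y, f0 in zip(ys, base)]
        jtj = [[sum(u * v for u, v in zip(jac[i], jac[j])) for j in range(n)]
               for i in range(n)]
        jtr = [sum(u * r for u, r in zip(jac[i], res)) for i in range(n)]
        # Marquardt damping: inflate the diagonal of J^T J by (1 + lam).
        damped = [row[:] for row in jtj]
        for i in range(n):
            damped[i][i] += lam * (jtj[i][i] + 1e-12)
        trial = [a + d for a, d in zip(p, solve(damped, jtr))]
        trial_cost = chi2(xs, ys, trial)
        if trial_cost < cost:
            converged = cost - trial_cost < tol * (1.0 + cost)
            p, cost, lam = trial, trial_cost, lam / 10.0  # accept, relax damping
            if converged:
                break
        else:
            lam *= 10.0  # reject the step and increase damping
            if lam > 1e12:
                break
    return p, cost
```

A batch run would simply call `levenberg_marquardt` once per pattern with the same starting values and data range. Fixing `max_cycles` or relying on `tol` corresponds to the fixed and convergence-driven cycle counts mentioned in the text.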

Case Study
EasyDD has been used in a number of key studies such as [5, 7, 10–14]. It has also been used by the High Energy X-Ray Imaging Technology (HEXITEC) project for the development of multi-pixel 2D X-ray detectors [3, 9, 16]. In this section, however, we present a brief account of a study of a nickel chloride catalysis system as a showcase for the use of EasyDD in processing and analyzing powder diffraction data. The nickel chloride data, which were collected by S. Jacques and coworkers (refer to [14] for details), are part of a larger data collection which consists of about 254 thousand EDF image files in 179 data sets (sinograms) with a total size of about 2.45 terabytes. The measurements were carried out at the ESRF [6] beamline ID15B, which is dedicated to applications using very high energy X-ray radiation up to several hundred keV. ID15B houses the angle dispersive diffraction setup using a large area detector and a high resolution Compton spectrometer. The wavelength of the monochromatic beam used in these measurements is λ = 0.14272 Å. The nickel chloride collection contains 23 data sets representing sequential time frames of a single lateral slice in a cylindrical object.
A computer aided tomography (CAT) technique in angle dispersive diffraction (ADD) mode was employed in these measurements to monitor the chemical and crystallographic developments in a slice through an impregnated extrudate sample undergoing heat treatment, so as to obtain 2D information at various points in time. A charge-coupled device (CCD) was used to record the diffraction patterns in 2D space. The CAT type ADD method has been suggested previously in the literature and has recently been demonstrated by Bleuet et al. [2]. The temporal aspect of the study makes it entirely novel and challenging, requiring the data acquisition to be sufficiently fast to make the process observable.
A major advantage of using the CAT type ADD technique is the rapid rate of data collection. Proportional area detectors, such as CCD devices, when used for recording angle dispersive diffraction patterns can support much higher count rates than energy dispersive solid state detectors. Since these devices offer fast reading, they can provide a more thorough insight into the temporal dimension of the dynamic processes under investigation. The area aspect of such detectors allows the entire powder diffraction rings of the whole diffraction pattern to be recorded simultaneously with good signal-to-noise ratio. The recorded intensity would be severely reduced if only a strip detector were employed. Such area detectors, when used in conjunction with very bright sources, allow for very fast data acquisition even from materials that give fairly poor scattering.
In the CAT technique, a pencil beam of monochromatic synchrotron X-rays is applied to a sample mounted on a translational-rotational stage and a time frame of the slice is collected for each translation-rotation cycle, as depicted schematically in Figure 1. In each frame, the sample is translated m times across the beam and a complete diffraction pattern is collected for each translation position. These m translations are then repeated at n angles between 0 and π in steps of π/(n − 1), and hence m×n diffraction patterns are collected for each time frame. The complete data of a frame represent a sinogram that can be reconstructed, using a back projection computational algorithm, to obtain a tomographic image of the slice in real space.
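The reconstruction step can be illustrated with a plain (unfiltered) back projection in Python. This is a minimal sketch under stated assumptions rather than the algorithm used in EasyDD: the sinogram holds one scalar per translation and angle (for example the total intensity, or the intensity of one channel), the rotation axis is assumed to sit at the middle translation position, linear interpolation is used between neighboring positions, and no Fourier filtering is applied. The function name `back_project` is illustrative.

```python
import math


def back_project(sinogram):
    """Unfiltered back projection of an n-by-m sinogram onto an m-by-m grid.

    sinogram[j][i] is the value measured at angle j * pi / (n - 1) and
    translation position i. Requires n >= 2 and m >= 2.
    """
    n, m = len(sinogram), len(sinogram[0])
    half = (m - 1) / 2.0  # assumed position of the rotation axis
    angles = [j * math.pi / (n - 1) for j in range(n)]
    image = [[0.0] * m for _ in range(m)]
    for row in range(m):
        y = row - half
        for col in range(m):
            x = col - half
            total = 0.0
            for theta, profile in zip(angles, sinogram):
                # Position of this pixel along the translation axis.
                t = x * math.cos(theta) + y * math.sin(theta) + half
                if -1e-9 <= t <= m - 1 + 1e-9:
                    t = min(max(t, 0.0), m - 1.0)
                    i = min(int(t), m - 2)
                    frac = t - i
                    total += (1.0 - frac) * profile[i] + frac * profile[i + 1]
            image[row][col] = total / n
    return image
```

Applying the same routine to every channel of the m×n patterns in turn would yield one back-projected pattern per pixel, that is m×m patterns per frame.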
A series of frames then give a complete picture of the dynamic transformation of the phases involved during the whole experiment. As area detectors are employed in these experiments, the 2D diffraction images should be transformed to 1D patterns by integrating the diffraction rings. Curve-fitting can then be used to identify the phases in each stage as the peaks in these patterns provide distinctive signatures of each phase.
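The ring integration mentioned above amounts to averaging the image intensity over concentric rings around the beam center. The Python sketch below shows the idea for an image held as a list of rows. It assumes a known beam center, an untilted detector and equal-width radial bins in pixel units, and it applies no tilt or missing-ring corrections; the function name `squeeze_to_1d` and its parameters are illustrative only.

```python
import math


def squeeze_to_1d(image, center_x, center_y, n_bins, max_radius):
    """Average pixel values in n_bins rings of equal width out to max_radius.

    Returns (radii, profile), where radii are the bin mid-points in pixels
    and profile holds the mean intensity of each ring (0.0 for empty rings).
    """
    bin_width = max_radius / n_bins
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for row, line in enumerate(image):
        for col, value in enumerate(line):
            r = math.hypot(col - center_x, row - center_y)
            b = int(r / bin_width)
            if b < n_bins:  # pixels beyond max_radius are ignored
                sums[b] += value
                counts[b] += 1
    radii = [(b + 0.5) * bin_width for b in range(n_bins)]
    profile = [s / c if c else 0.0 for s, c in zip(sums, counts)]
    return radii, profile
```

Converting the radial axis from pixels to scattering angle would additionally require the sample-to-detector distance and the pixel size, which are not given here.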
The sample used in this study is a cylindrical extrudate of γ-alumina (Al₂O₃) as a base impregnated with nickel nanoparticles as an active metallic catalyst. The impregnation was performed using an aqueous solution of nickel chloride ethylenediamine tetrahydrate, NiCl₂(en)(H₂O)₄, as a precursor. The sample and preparation process are similar to those described by Beale et al. [1]. During the experiment, the sample underwent a heat treatment consisting of a temperature ramp from 25 °C at the first frame to 500 °C at the 20th frame (i.e. a 25 °C increase per frame), followed by a steady-state temperature of 500 °C for the last three frames.
EasyDD was used in multi-batch mode to convert the binary EDF images to 1D spectral patterns in ASCII numeric format. It was also used to visualize, align, and back project the sinograms; and curve-fit the peaks of the back-projected patterns.
A Gaussian profile was used to model the peak shape while a linear polynomial was used to model the background. Most peaks were fitted as singlets while the remainder were fitted as doublets. Each collected data set (sinogram), which represents a temporal/thermal frame, consists of 33 rotational and 43 translational steps, while each back-projected data set consists of 1849 (43×43) patterns. An acquisition time of about 0.4 second per measurement (i.e. for particular translational position and rotational orientation) gives an overall collection time of approximately 10 minutes to record a single frame.
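The figures quoted in this section can be cross-checked with a few lines of arithmetic, written here in Python. Two points are assumptions made for the check only: that all 179 data sets of the larger collection share the same 33×43 geometry, and that each frame can be assigned a single nominal temperature on the ramp.

```python
# Figures quoted in the text.
ROTATIONS = 33
TRANSLATIONS = 43
SECONDS_PER_PATTERN = 0.4
DATA_SETS = 179

patterns_per_frame = ROTATIONS * TRANSLATIONS                   # 1419
frame_minutes = patterns_per_frame * SECONDS_PER_PATTERN / 60   # about 9.46
back_projected_patterns = TRANSLATIONS ** 2                     # 1849
total_images = DATA_SETS * patterns_per_frame                   # 254001


def frame_temperature(frame):
    """Nominal temperature in degrees Celsius for frames 1 to 23:
    25 at frame 1, rising by 25 per frame, held at 500 from frame 20 on."""
    if not 1 <= frame <= 23:
        raise ValueError("frame must be between 1 and 23")
    return min(25 + 25 * (frame - 1), 500)
```

The results agree with the stated values: roughly 10 minutes per frame, 1849 back-projected patterns per data set, and about 254 thousand images in the whole collection.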
The stack plot of the sums of back-projected diffraction spectra is shown in Figure 2. A number of phase distribution patterns (PDP) have been observed, with the main ones being displayed in Figure 3. The idea of the PDPs is to group the phases according to their spatial distribution in the frames as presented in the 2D color-coded tomographic images. By classifying each particular sequence of tomographic images into a particular PDP and matching the energies/wavelengths for which that PDP is obtained to a standard diffraction pattern, the phase can be identified, and hence its spatial and temporal evolution is exposed. The stack plot of the sums plays a supporting role in this process. The general assumption is that each PDP is a fingerprint of a single crystallographic entity. Standard diffraction patterns from a powder diffraction database [8] have been used for phase identification. The full details of this study can be found in [14].
It should be remarked that the whole operation of processing and analyzing the nickel chloride data (which involves converting tens of thousands of 2D binary images to 1D numeric diffraction patterns; obtaining the sums of the stack plot; aligning, visualizing and back projecting the sinograms; curve-fitting several million peaks; phase identification; and final visualization) was completed in just a few days. This stands in sharp contrast to the time and effort, estimated to be weeks or even months of hard work, required to perform such a task manually using conventional tools.

Conclusions
The EasyDD project is one step in the right direction for the future development of computational tools to deal with the growing demand on data processing and analysis capabilities. Due to the recent developments in the technology of radiation sources and data acquisition systems, this approach is an important endeavor in developing software that can cope with the massive and ever-increasing size of data collections that are generated in modern multi-tasking scientific experiments.
EasyDD has already proved to be a crucial component within a number of key studies. Without the efficiency and speed of EasyDD, and without its effective strategies, such as the multi-batch processing approach, some of these studies may not have happened.
During the development and use of EasyDD, a novel approach for processing and analyzing massive spectral data collections has emerged in which phase distribution pattern diagrams, combined with stack plots and standard diffraction patterns from powder diffraction databases, play a key role in summarizing and presenting huge data sets in manageable form as an essential step to identifying phases and tracking the evolution of complex physical and chemical systems. As such, the developmental studies represented by the nickel chloride showcase take diffraction imaging capabilities well beyond those of previous landmark studies, such as Bleuet et al. [2] and Espinosa-Alonso et al. [5], particularly for in situ and dynamic phase transformation studies. It is anticipated that these developments should have a significant future impact, at the scientific, technological and industrial level, within several fields of research such as catalysis, in operando studies, phase transformations, dynamic stress imaging of construction and biological materials, etc.
In the process of analyzing the nickel chloride system, as outlined in the case study, several chemical and crystallographic phases have been identified and mapped spatially and temporally, thereby leading to important scientific implications. This system emerges as an outstanding example of what can be achieved in following its evolution (precursor, intermediates and final active phase) in terms of time, temperature, crystallite size and spatial distribution. The high level of detail extracted enabled us to elucidate the overall evolving chemistry while also revealing new information on the physical state of the catalyst and providing evidence for, and suggesting a mechanism of, the dynamic development. This particular study stands out as a vivid example that demonstrates the capabilities and potential of these X-ray imaging and analysis techniques.