Fourier transform and near infrared dataset of dialdehyde celluloses used to determine the degree of oxidation with chemometric analysis

This dataset is related to the research article entitled "A fast method to measure the degree of oxidation of dialdehyde celluloses using multivariate calibration and infrared spectroscopy". In this article, 74 dialdehyde cellulose samples with different degrees of oxidation were prepared by periodate oxidation and analysed by Fourier-transform infrared (FTIR) and near-infrared (NIR) spectroscopy. The corresponding degrees of oxidation were determined indirectly by periodate consumption using UV spectroscopy at 222 nm and by the quantitative reaction with hydroxylamine hydrochloride followed by potentiometric titration. Partial least squares regression (PLSR) was used to correlate the infrared data with the corresponding degree of oxidation (DO). The developed NIR/PLSR and FTIR/PLSR models can easily be implemented in other laboratories to quickly and reliably predict the degree of oxidation of dialdehyde celluloses.

Subject: Chemistry and Chemometrics
Specific subject area: Pulp chemistry and carbohydrate polymers
Type of data: Tables, spectroscopic data and Opus files
How the data were acquired:
Data format: Raw (.csv, .o) and analysed Opus files (.q2)
Description of data collection: DAC samples with different degrees of oxidation were generated by periodate oxidation of softwood kraft pulp. The isolated samples were air-dried and analysed using NIR and FTIR spectroscopy. The infrared data were pre-processed using min-max normalisation, first derivative plus multiplicative scattering correction or first derivative plus vector normalisation. The DO of each sample was determined by the two most used methods, the UV/Vis method [1] and the titration or oxime method [2].

Value of the Data
• The data can be used to predict the degree of oxidation rapidly and reliably in dialdehyde celluloses.
• Determining the aldehyde content is crucial for tailoring the properties of dialdehyde cellulose, which is applied in areas such as drug delivery [4][5][6], medical applications [7][8][9], sensor technologies [10][11][12] and material science [13].
• This dataset allows researchers to implement this method in everyday research, saving money, time and resources.

Data Description
All data refer to the original research article "A fast method to measure the degree of oxidation of dialdehyde celluloses using multivariate calibration and infrared spectroscopy" [3]. Fig. 1 shows a schematic of the experimental design used to collect and analyse the dataset. Table 2 lists the isolated dialdehyde cellulose (DAC) samples with their file names (NIR and FTIR dataset) and their degrees of oxidation (DO), obtained from periodate consumption using UV/Vis spectroscopy (DO_UV/Vis) and from potentiometric titration after quantitative reaction with hydroxylamine hydrochloride (DO_Titration). The degrees of oxidation from periodate consumption were calculated using a calibration curve (Fig. 2). The corresponding raw spectral data are available in the Mendeley Data repository ("Dataset" > "raw_data": raw spectral data for each PLSR model, .csv files). The isolated DAC samples were used to construct four PLSR models that correlate the NIR and FTIR data with the corresponding DO. Table 1 summarizes the parameters of the partial least squares regression. OPUS QUANT2 was used to develop the NIR/PLSR models (1 and 2) and the FTIR/PLSR models (3 and 4), which are available in the Mendeley Data repository ("Dataset" > "processed_data": OPUS files for each model with the corresponding spectra, .q2 and .o files): https://data.mendeley.com/datasets/bncy3n34v7/draft?a=b69c69fa-86f3-4ce3-916d-87f4c9e90ef9
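The calibration curve behind the UV/Vis-based DO (Fig. 2) is not reproduced in this text. As a minimal sketch, assuming a Beer-Lambert relation A = b·c forced through the origin, the slope b could be estimated by least squares as shown below. The function name `calibration_slope` is hypothetical, and the standards and absorbance readings are synthetic values for illustration only, not data from the repository.

```python
def calibration_slope(concentrations, absorbances):
    """Least-squares slope b of A = b * c, with the line forced through the origin."""
    if len(concentrations) != len(absorbances) or not concentrations:
        raise ValueError("need two equally long, non-empty sequences")
    numerator = sum(c * a for c, a in zip(concentrations, absorbances))
    denominator = sum(c * c for c in concentrations)
    if denominator == 0:
        raise ValueError("at least one non-zero concentration is required")
    return numerator / denominator


# Synthetic periodate standards (mol/L) and absorbances at 222 nm; illustrative only.
standards = [2e-5, 4e-5, 6e-5, 8e-5, 1e-4]
readings = [0.21, 0.40, 0.61, 0.79, 1.00]

b = calibration_slope(standards, readings)
print(f"slope b = {b:.0f} L/mol (10 mm path length)")
```

If the published calibration includes a non-zero intercept, an ordinary linear fit with intercept would be the appropriate replacement for this through-origin estimate.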

Experimental Design, Materials and Methods
The dataset was generated by first oxidising pulp samples with sodium periodate. After that, the isolated samples were air-dried and analysed by NIR and FTIR spectroscopy. The corresponding degrees of oxidation were determined by UV/Vis spectroscopy and by potentiometric titration after hydroxylamine hydrochloride treatment. Finally, the IR datasets were correlated to the corresponding degrees of oxidation using the OPUS QUANT2 package (Bruker Optics, v. 8.2.28). The provided data can be used to reproduce the PLSR models with any chemometrics software package, or the analysed OPUS files can be used to predict the DO of any periodate-oxidized cellulose sample.
Table 2 Overview of dialdehyde cellulose samples (from softwood kraft pulp, SWKP) analysed with their degrees of oxidation (DO) obtained from the periodate consumption by UV/Vis spectroscopy (DO_UV/Vis) and potentiometric titration (DO_Titration).
The following sections expand on the description of the methods presented in our previous work [3].

Chemicals and reagents
UPM-Kymmene Oyj (Lappeenranta, Finland) provided softwood kraft pulp samples used as the starting material in periodate oxidation. Sodium periodate ( ≥99.8%; Sigma Aldrich; oxidant in the oxidation of pulp to dialdehyde celluloses) and hydroxylamine hydrochloride (99%; Sigma Aldrich) were used without further purification.

Periodate oxidation to prepare dialdehyde cellulose samples
DAC samples were prepared by periodate oxidation of softwood kraft pulp. Air-dried softwood kraft pulp was disintegrated in deionised water using a commercial kitchen blender (3 times for 10 s). It was then filtered and added to a sodium periodate solution. The flask was covered with aluminum foil to limit side reactions (i.e., degradation of sodium periodate). The temperature (room temperature to 50 °C), periodate concentration (0.8 eq to 2 eq) and reaction duration (up to 3 days) were varied to prepare DAC samples with degrees of oxidation between 0 and 80%. The isolated DAC samples were thoroughly washed with water (2-3x) and ethanol (1x) using vacuum filtration.
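The periodate charge for a given number of equivalents follows from simple stoichiometry. The sketch below assumes that "eq" means molar equivalents relative to anhydroglucose units (AGU) and that the pulp is dry and consists of cellulose only; the function name `periodate_mass` is hypothetical.

```python
M_AGU = 162.14     # g/mol, anhydroglucose unit (C6H10O5)
M_NAIO4 = 213.89   # g/mol, sodium periodate


def periodate_mass(pulp_mass_g, equivalents):
    """Mass of NaIO4 (g) for a molar equivalent relative to AGU.

    Assumes dry pulp made of cellulose only.
    """
    if pulp_mass_g <= 0 or equivalents < 0:
        raise ValueError("pulp mass must be positive and equivalents non-negative")
    n_agu = pulp_mass_g / M_AGU          # mol of AGU in the pulp
    return equivalents * n_agu * M_NAIO4


# Range of equivalents used for the DAC samples (0.8 eq to 2 eq).
for eq in (0.8, 1.0, 2.0):
    print(f"{eq:.1f} eq: {periodate_mass(1.0, eq):.3f} g NaIO4 per g dry pulp")
```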

Infrared measurements
Before recording the IR spectra, the DAC samples were air-dried for 2 to 14 days and conditioned in the measuring room. Other drying techniques (such as oven drying or freeze-drying) are not recommended since a controlled equilibrium between the free aldehyde and its masked forms is needed. All isolated DAC samples were measured three times with NIR and FTIR spectroscopy. The total number of spectra varies slightly since single measurements failed or the sample size was too small for NIR analysis (Table 1).
The NIR spectra were recorded using an MPA Multi-Purpose Analyzer (Bruker, Billerica, MA) with a fibre optic probe and a Te-InGaAs detector (10 kHz). The parameters for all analyses included a resolution of 8 cm⁻¹, a spectral range of 12,500-4000 cm⁻¹ and 32 scans per sample. All measurements were conducted at room temperature using aluminum foil as the background. The fibre optic probe was pressed onto three different (randomly chosen) positions of the DAC surface to account for inhomogeneity within the sample.
The FTIR spectra were recorded using a Frontier FTIR spectrophotometer (PerkinElmer, Waltham, MA, USA) in conjunction with the attenuated total reflection (ATR) technique. The parameters for all analyses included a resolution of 4 cm⁻¹, a spectral range of 4000-650 cm⁻¹ and 64 scans per sample. All infrared measurements were conducted at room temperature. The three-fold measurements were conducted at three different (randomly chosen) positions of the DAC sample to account for inhomogeneity within the sample.

Determining the degree of oxidation of isolated dialdehyde samples
The degrees of oxidation were determined from the periodate consumption through UV/Vis spectroscopy [1] and potentiometric titration after the quantitative reaction with hydroxylamine hydrochloride [2] .
For potentiometric analysis, 18-22 mg of the isolated dialdehyde celluloses were freeze-dried and shaken in 5 mL of 0.25 M hydroxylamine hydrochloride solution for 48 h. The hydroxylamine hydrochloride solution was adjusted to pH 4.6. Hydroxylamine hydrochloride reacts quantitatively with the carbonyl groups of DAC, releasing one mole of hydrochloric acid per aldehyde functionality. For each sample, 2.00 mL was diluted with 5 mL of deionized water to ensure sufficient contact with the pH electrode. Each sample was prepared in duplicate and titrated back to pH 4.6 with 0.01 M sodium hydroxide solution. The DO was then calculated from the volume of consumed sodium hydroxide, V_NaOH. In this calculation, [NaOH] is the NaOH concentration, M_AGU the molecular weight of the anhydroglucose unit, m_DAC,0 the weight of the freeze-dried DAC, V_0 the initial volume of the added hydroxylamine hydrochloride, V_1 the volume of the titrated oxime solution, and DO_blank the DO of the unreacted pulp as a blank.
The DO was also determined from the periodate consumption by UV/Vis spectroscopy at 222 nm (DO_UV/Vis). 100 μL of each filtrate was diluted with deionized water, and the remaining periodate concentration was calculated from the periodate absorbance at 222 nm. The dilution factor was varied depending on the equivalents of sodium periodate used, so that the measured absorbances fell in the range of 0.5 to 1.1. UV/Vis measurements were performed using a LAMBDA 35 UV/Vis spectrometer (PerkinElmer, Waltham, MA). The UV/Vis spectrometer was referenced to deionised water using a quartz cuvette with a 10 mm path length. Assuming no side reactions, the DO was calculated from the following quantities: m_NaIO4,0 and m_pulp,0 are the masses of sodium periodate and pulp, respectively; M_NaIO4 is the molecular weight of sodium periodate, A the arithmetic mean of the measured absorbance, b the calibration curve slope (Fig. 2), F_D the dilution factor, and V the solvent (deionized water) volume; M_AGU is the molecular weight of the anhydroglucose unit (AGU), simplified on the assumption that the pulp consists of cellulose only.
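The two DO expressions themselves are not preserved in this text. The sketch below is one plausible reading derived only from the symbol definitions above, and it may differ in detail from the expressions in [3]. It assumes two aldehyde groups per oxidised AGU, one mole of periodate consumed per oxidised AGU, a calibration slope b in L/mol, volumes in litres, and DO in percent. The function names are hypothetical, and the example values are illustrative rather than measured.

```python
M_AGU = 162.14     # g/mol, anhydroglucose unit
M_NAIO4 = 213.89   # g/mol, sodium periodate


def do_titration(v_naoh_l, c_naoh, m_dac_g, v0_l, v1_l, do_blank=0.0):
    """DO (%) from the oxime titration.

    Assumed form: 100 * [NaOH] * V_NaOH * (V_0 / V_1) * M_AGU / (2 * m_DAC,0) - DO_blank
    """
    n_aldehyde = c_naoh * v_naoh_l * (v0_l / v1_l)  # scaled from aliquot to full volume
    n_agu = m_dac_g / M_AGU
    return 100.0 * n_aldehyde / (2.0 * n_agu) - do_blank  # two aldehydes per oxidised AGU


def do_uv_vis(m_periodate_g, m_pulp_g, absorbance, slope, dilution, volume_l):
    """DO (%) from periodate consumption.

    Assumed form: 100 * (m_NaIO4,0 / M_NaIO4 - (A / b) * F_D * V) / (m_pulp,0 / M_AGU)
    """
    n_start = m_periodate_g / M_NAIO4
    n_remaining = (absorbance / slope) * dilution * volume_l
    n_agu = m_pulp_g / M_AGU
    return 100.0 * (n_start - n_remaining) / n_agu


# Illustrative inputs only: 20 mg DAC, 5 mL oxime solution, 2 mL aliquot, 4 mL of 0.01 M NaOH.
print(f"DO (titration): {do_titration(0.004, 0.01, 0.020, 0.005, 0.002):.1f} %")
# Illustrative inputs only: 1 eq periodate, hypothetical slope of 1e4 L/mol.
print(f"DO (UV/Vis):    {do_uv_vis(2.1389, 1.6214, 0.8, 1e4, 100, 1.0):.1f} %")
```

If the calibration curve is expressed in mass concentration rather than molar concentration, the remaining periodate would need to be divided by the molecular weight of sodium periodate as well.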

Partial least squares regression
The raw NIR and FTIR data were pre-processed using min-max normalisation, first derivative plus multiplicative scattering correction or first derivative plus vector normalisation. The PLSR models were calculated using the OPUS QUANT2 package (Bruker Optics, v. 8.2.28; parameters in Table 1). The PLSR algorithm automatically validated the obtained correlation model with a selected test set of the recorded IR spectra. In addition, the PLS1 algorithm in OPUS QUANT2 was used to determine the best pre-processing method and the optimal spectral range (Table 1). Leave-one-out cross-validation was used. Two sets of infrared data (NIR and FTIR) combined with two sets of degrees of oxidation (from the periodate consumption and from potentiometric titration) give four PLSR models, which are all available in the Mendeley Data repository.
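For readers who want to rebuild a model outside OPUS, the workflow above can be sketched in plain Python. This is a generic NIPALS PLS1 with leave-one-out cross-validation, not a reproduction of the OPUS QUANT2 implementation. The first derivative is a simple finite difference standing in for a smoothed derivative, only one of the three pre-processing options is shown, all function names are hypothetical, and the spectra are synthetic.

```python
from math import sqrt


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def first_derivative(spectrum):
    """Finite-difference first derivative (stand-in for a smoothed derivative)."""
    return [b - a for a, b in zip(spectrum, spectrum[1:])]


def vector_normalise(spectrum):
    """Subtract the mean, then scale the spectrum to unit Euclidean length."""
    mean = sum(spectrum) / len(spectrum)
    centred = [v - mean for v in spectrum]
    norm = sqrt(dot(centred, centred)) or 1.0
    return [v / norm for v in centred]


def pls1_fit(X, y, n_components):
    """Fit PLS1 by NIPALS; returns column means, mean of y and (w, p, q) per component."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(m)] for row in X]   # centred spectra
    f = [v - y_mean for v in y]                                  # centred response
    components = []
    for _ in range(n_components):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]
        w_norm = sqrt(dot(w, w))
        if w_norm < 1e-12:          # no covariance left to model
            break
        w = [v / w_norm for v in w]
        t = [dot(row, w) for row in E]                           # scores
        tt = dot(t, t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]
        q = dot(f, t) / tt
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]  # deflate X
        f = [f[i] - q * t[i] for i in range(n)]                            # deflate y
        components.append((w, p, q))
    return x_mean, y_mean, components


def pls1_predict(model, spectrum):
    """Predict the response for one spectrum by sequential deflation."""
    x_mean, y_mean, components = model
    e = [a - b for a, b in zip(spectrum, x_mean)]
    y_hat = y_mean
    for w, p, q in components:
        t = dot(e, w)
        y_hat += q * t
        e = [a - t * b for a, b in zip(e, p)]
    return y_hat


def rmsecv_leave_one_out(X, y, n_components):
    """Root mean square error of leave-one-out cross-validation."""
    squared_error = 0.0
    for i in range(len(X)):
        model = pls1_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:], n_components)
        squared_error += (pls1_predict(model, X[i]) - y[i]) ** 2
    return sqrt(squared_error / len(X))


def synthetic_spectrum(do_percent):
    """Synthetic spectrum: one band growing with DO plus one constant band."""
    analyte = [0, 1, 3, 1, 0, 0, 0, 0]
    reference = [0, 0, 0, 0, 1, 2, 1, 0]
    return [0.5 + 0.01 * do_percent * a + r for a, r in zip(analyte, reference)]


do_values = [0.0, 10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0]
spectra = [vector_normalise(first_derivative(synthetic_spectrum(d))) for d in do_values]
print(f"RMSECV with 2 components: {rmsecv_leave_one_out(spectra, do_values, 2):.2f} % DO")
```

With the repository's .csv files, the synthetic spectra would be replaced by the measured spectra and `do_values` by the reference DO values from Table 2; the number of components and the spectral range would then be chosen by comparing RMSECV values.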

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.