Dataset of serum proteomic spectra from tuberculosis patients detected by Raman spectroscopy and surface-enhanced Raman spectroscopy

In this data article, we present Raman spectroscopy (RS) and surface-enhanced Raman spectroscopy (SERS) data obtained using an InVia Reflex confocal Raman microscope (Renishaw; Wotton-under-Edge, UK) and processed using WiRE™ 4.2 software. The data include RS and SERS spectra detected, after removal of albumin, from the serum proteome of tuberculosis (TB) patient categories and controls (active tuberculosis, ATB; latent tuberculosis infection, LTBI; TB-exposed persons with undetected infection, EC; healthy controls, HC) using 532 nm and 785 nm laser wavelengths for RS and 785 nm for SERS. The RS and SERS data had high reproducibility (SERS, R² = 0.988; RS at 785 nm, R² = 0.972; RS at 532 nm, R² = 0.915). These data can be used for analysis of proteomic spectra based on RS and SERS for TB diagnosis and can also be compared with data from other populations. The spectral dataset based on normal, healthy control groups might be used as the control data for analysis of other diseases using RS and SERS approaches.

© 2019 The Author(s). Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

Data
In this report, we present data of serum proteomic spectra detected among TB-infection categories using both Raman spectroscopy (RS) and surface-enhanced Raman spectroscopy (SERS). The TB-infection categories included unexposed individuals without infection (healthy controls; HC), exposed individuals without infection (early clearance; EC), those with latent infection (LTBI) and those with active TB disease (ATB). The protocol to acquire the dataset was approved by the Khon Kaen University Ethics Committee in Human Research (Ethics number HE611116).
The presented data include raw data of protein fingerprints detected among TB categories (Supplementary Table S1). The high reproducibility of the data detected from both RS and SERS is shown in Fig. 1 and Table 1. The reproducibility of protein samples measured using SERS (R² = 0.988) was slightly higher than that of RS at 532 nm (R² = 0.915) and at 785 nm (R² = 0.972) (Fig. 1, Table 1).

Value of the Data
This dataset provides serum proteomic spectra from individuals with latent tuberculosis (TB) and those with active TB based on Raman spectroscopy (RS) and SERS. This is the only serum proteomic spectral dataset of latent TB in a public database. These data may be relevant for other researchers who (i) analyze the serum proteome based on RS and SERS or (ii) focus on TB diagnosis, especially on distinguishing between active and latent TB. The dataset might be used for TB diagnostic applications by distinguishing among TB disease categories, including active TB, latent TB, TB-exposed persons with undetected infection and healthy controls, based on RS and SERS spectra. The spectral dataset of the normal healthy control groups might be used as the control data for studies on other diseases based on Raman spectroscopy analysis.

[1] were used. A diagnosis of ATB was based on clinical symptoms and positive evidence from a molecular test (Xpert MTB/RIF, Cepheid, Sunnyvale, CA, USA), acid-fast bacilli staining or bacterial culture. LTBI cases were defined based on a positive result of the QuantiFERON-TB Gold test (QFT) in healthy TB-exposed persons (those having close contact with an ATB patient, such as working in TB wards for at least 6 months). The EC category was defined based on a negative result of the QFT in individuals having contact with ATB patients. Healthy controls (HC) were defined as apparently healthy persons with no evidence of TB exposure and having a negative result of the QFT.
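The category definitions above can be read as a small decision rule. The sketch below is only an illustration of that reading, not the study's enrolment procedure: the function name `tb_category` and its four boolean parameters are hypothetical, and the `None` branch marks a combination (QFT-positive with no known exposure) that the definitions in the text do not cover.

```python
from typing import Optional


def tb_category(clinical_symptoms: bool,
                lab_evidence: bool,
                tb_exposed: bool,
                qft_positive: bool) -> Optional[str]:
    """Map the stated criteria to a study category.

    lab_evidence stands for a positive Xpert MTB/RIF test, acid-fast
    bacilli stain or bacterial culture (any one of the three).
    Returns None when no definition in the text applies.
    """
    # ATB: clinical symptoms plus positive laboratory evidence.
    if clinical_symptoms and lab_evidence:
        return "ATB"
    # Exposed persons are split by the QFT result.
    if tb_exposed:
        return "LTBI" if qft_positive else "EC"
    # Unexposed and QFT-negative: healthy control.
    if not qft_positive:
        return "HC"
    # Unexposed but QFT-positive: not covered by the stated definitions.
    return None


print(tb_category(False, False, True, True))    # LTBI
print(tb_category(False, False, True, False))   # EC
print(tb_category(False, False, False, False))  # HC
```

Returning `None` instead of forcing a label keeps cases outside the four definitions visible rather than silently assigning them to a group.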

Sample preparation
Albumin was removed from each serum sample using protein filtration columns and centrifugation. Protein concentrations were measured using the Bradford protein assay, and 0.8 mg of each protein sample was dropped onto aluminum foil. The samples were left to air dry for 3 minutes and then measured using RS and SERS.

Protein fingerprint spectra collection and analysis
Raman spectroscopy readings were taken on an InVia Reflex confocal Raman microscope (Renishaw; Wotton-under-Edge, UK) in a range of 179–1926 cm⁻¹ for the 532 nm laser and 508–1632 cm⁻¹ for the 785 nm laser. Data were generated from RS detected at 532 and 785 nm laser wavelengths and SERS detected at the 785 nm laser wavelength, with 16 area points of detection in each sample. WiRE™ 4.2 software was used for data processing. To test the reproducibility of RS and SERS data, the average R² score of the 120 pairwise comparisons among the 16 replicates (area points 1 to 16; 16 × 15 / 2 = 120 pairs) was calculated from 1010 peak positions in a sample from a randomly selected ATB case using the corrgram package in the R programming language. All SERS chips were developed by our group (NECTEC, Thailand). However, due to a shortage of SERS chips, we pooled the serum protein samples to match the available number of SERS chips.
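The reproducibility score described above can be sketched as a mean over all pairs of replicate spectra. The example below is a minimal illustration in Python rather than the R corrgram workflow used for the dataset, and it assumes that R² means the squared Pearson correlation between two replicate spectra, which the text does not state explicitly. The helper names `pearson_r` and `mean_pairwise_r2` are hypothetical, and the spectra are synthetic random values, not measurements from this dataset.

```python
from itertools import combinations
from math import sqrt
import random


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("sequences must have equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant spectrum has no defined correlation")
    return sxy / sqrt(sxx * syy)


def mean_pairwise_r2(spectra):
    """Average squared Pearson correlation over all unordered pairs.

    Returns (mean R^2, number of pairs compared).
    """
    pairs = list(combinations(spectra, 2))
    r2_values = [pearson_r(a, b) ** 2 for a, b in pairs]
    return sum(r2_values) / len(r2_values), len(pairs)


# Synthetic stand-in data (assumption): 16 noisy copies of one random
# "spectrum" sampled at 1010 positions.
rng = random.Random(0)
base = [rng.random() for _ in range(1010)]
replicates = [[v + rng.gauss(0, 0.02) for v in base] for _ in range(16)]

score, n_pairs = mean_pairwise_r2(replicates)
print(n_pairs)  # 120, i.e. 16 * 15 / 2 unordered pairs
print(round(score, 3))
```

With real data, each inner list would hold the intensities of one area point at the shared peak positions, so that all 16 replicates are compared on the same wavenumber axis.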

Ethics approval and consent to participate
The specimens were obtained from the biobank of the Department of Microbiology, Faculty of Medicine, Khon Kaen University. The protocol to obtain the dataset was approved by the Khon Kaen University Ethics Committee in Human Research (Ethics number HE611116).