Electronic nose dataset for COPD detection from smokers and healthy people through exhaled breath analysis

This article presents a database of measurements acquired with a multisensory device called an Electronic Nose (E-nose), based on a matrix of metal oxide sensors, in order to discriminate and classify people affected by the respiratory disease Chronic Obstructive Pulmonary Disease (COPD), smokers, and healthy controls through exhaled breath analysis. The database consists of four groups of measurements acquired with the E-nose system: 10 healthy controls, 20 people with COPD, 4 smokers, and 10 air samples. Two samples of exhaled breath were acquired per person, giving a total of 78 samples (40 from COPD, 20 from control, 8 from smokers, and 10 from the air).


Specifications
Subject: Biomedical Engineering; Pulmonary and Respiratory Medicine; Computer Science Applications; Signal Processing
Specific subject area: COPD detection using electronic nose technology
Type of data: Text files
Data gathering: Data were acquired using a wireless acquisition card coupled to the E-nose measurement chamber. Exhaled breath measurements were detected by the following sensors: SP-3, MQ-3, TGS 822, MQ-138, MQ-137, TGS 813, TGS 800, and MQ-135. A user interface developed in LabVIEW V14 was used to set the data acquisition parameters.
Data format: Raw data
Parameters for data collection: A protocol was developed for the collection of the breath samples, under which samples were taken early in the day and while fasting to avoid errors or possible confounding factors. In addition, the device was fitted with a heating system inside the sensor chamber to maintain a constant temperature and relative humidity.
Description of data collection:

Value of the Data
• The database is a point of reference for the implementation of E-noses in medical applications [1,2].
• This database can be useful for evaluating different classification and pattern recognition algorithms and for selecting which variables are useful for the implementation of the system [3].
• The data will be important material for the scientific community and society in general to evaluate the capacity of an electronic nose for the detection of COPD, the influence of tobacco on human health, and the detection of other respiratory diseases through breath analysis [4].
• The impact of the data and results obtained with the E-nose could encourage society to investigate, develop, and implement this type of device more strongly in the health sector, since such devices are an alternative for obtaining rapid and non-invasive results for people.

Data Description
The data were acquired at a sampling rate of 500 samples/second with a 24-bit ADC, giving exhaled breath measurements of 4000 data points per sensor across the matrix of 8 sensors or variables (SP-3, MQ-3, TGS 822, MQ-138, MQ-137, TGS 813, TGS 800, MQ-135). In each of the three groups analyzed, two measurements were taken per person to ensure a correct acquisition and to correlate the exhaled breath measurements of the same person (repeatability). The experiment and evaluation of the developed e-nose system were proposed as a pilot test; therefore, no target number of samples was established, since participation was limited to geriatric centers where volunteers were located in different places, which made sample collection much more efficient. Despite the limited sample size, the geriatric centers cooperated strongly in finding COPD cases. Only people over 18 years of age were involved in this study.
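The acquisition figures quoted above imply a few derived quantities. The following Python sketch works them out. It assumes that the data points of one sensor are consecutive samples taken at the stated rate, which the text does not confirm, and the variable names are illustrative only.

```python
# Figures stated in the data description.
SAMPLING_RATE_HZ = 500      # samples per second
POINTS_PER_SENSOR = 4000    # data points recorded per sensor
N_SENSORS = 8               # sensors in the array
ADC_BITS = 24               # ADC resolution

# ASSUMPTION: the points are consecutive samples at the stated rate.
duration_s = POINTS_PER_SENSOR / SAMPLING_RATE_HZ

# Values in one measurement if every sensor contributes the same number of points.
values_per_measurement = POINTS_PER_SENSOR * N_SENSORS

# Number of distinct codes a 24-bit converter can output.
adc_levels = 2 ** ADC_BITS

print(duration_s)              # 8.0
print(values_per_measurement)  # 32000
print(adc_levels)              # 16777216
```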
In addition, inclusion and exclusion criteria were defined to prevent the samples from being affected by other conditions of the volunteers.

Inclusion criteria
• Volunteers over 18 years.
• Patients with a diagnosis of COPD.
• An informed consent form must be signed.

Exclusion criteria
• Patients with related diseases.
• Patients who had been given inhaled medications (nasal route) before the time of measurement.
In this way, a total of 78 samples were obtained (40 from COPD, 20 from control, 8 from smokers, and 10 from the air). Fig. 1 shows a set of measurements acquired with the E-nose: Fig. 1(a) illustrates a measurement of exhaled breath from a person diagnosed with COPD, corresponding to the COPD dataset file; Fig. 1(b) depicts a measurement of exhaled breath from a smoker, corresponding to the SMOKERS dataset file; and Fig. 1(c) illustrates a measurement of exhaled breath from a healthy person (control), corresponding to the CONTROL dataset file.
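The sample total follows directly from the group sizes and the two-samples-per-person rule. As a quick check, a minimal Python tally (the dictionary keys simply reuse the group names from the text):

```python
# People per group and exhaled-breath samples per person, as stated in the text.
PEOPLE = {"COPD": 20, "CONTROL": 10, "SMOKERS": 4}
SAMPLES_PER_PERSON = 2
AIR_SAMPLES = 10  # ambient air samples, not tied to a person

samples = {group: n * SAMPLES_PER_PERSON for group, n in PEOPLE.items()}
samples["AIR"] = AIR_SAMPLES
total = sum(samples.values())

print(samples)  # {'COPD': 40, 'CONTROL': 20, 'SMOKERS': 8, 'AIR': 10}
print(total)    # 78
```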
The database consists of four files: "COPD.csv" contains the measurements of the 20 patients diagnosed with COPD, "SMOKERS.csv" comprises the measurements of the 4 smokers, "CONTROL.csv" has the measurements of the 10 healthy persons, and "AIR.csv" contains the ambient air samples taken during the sampling of each group. The measurements are expressed in units of voltage and show the characteristic response of MOS-type sensors in the presence of gas. Additionally, a "General_data_from_the_dataset" file is attached, which comprises relevant data such as gender, age, and the (estimated) severity level of COPD. Each measurement was also assigned an identifier, as shown in Table 1.
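A reader who wants to work with these files might start from a small loader such as the Python sketch below. The internal layout of the CSV files is not described in this article, so the layout used here (no header row, one row per time step, eight numeric columns in the sensor order listed in the data description) is purely an assumption, and the function name read_sensor_columns is hypothetical. The files should be inspected before relying on it.

```python
import csv
import io

# Sensor order as listed in the data description.
SENSORS = ["SP-3", "MQ-3", "TGS 822", "MQ-138",
           "MQ-137", "TGS 813", "TGS 800", "MQ-135"]

def read_sensor_columns(text_stream):
    """Return {sensor name: list of voltages} read from a CSV text stream.

    ASSUMPTION (not confirmed by the article): no header row, one row per
    time step, and eight numeric columns in the order of SENSORS.
    """
    columns = {name: [] for name in SENSORS}
    for row in csv.reader(text_stream):
        if not row:
            continue  # skip blank lines
        if len(row) != len(SENSORS):
            raise ValueError(f"expected {len(SENSORS)} columns, got {len(row)}")
        for name, cell in zip(SENSORS, row):
            columns[name].append(float(cell))
    return columns

# Usage with a tiny in-memory stand-in (made-up numbers, not dataset values).
demo = io.StringIO("0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8\n"
                   "0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9\n")
data = read_sensor_columns(demo)
print(data["MQ-3"])  # [0.2, 0.3]
```

For a file on disk, the same function can be called inside `with open("COPD.csv", newline="") as f:`, provided the file really follows the assumed layout.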

Experimental setup
The E-nose system is made up of a MOS [5,6] sensor chamber with a capacity of 30 mL, an Arduino Nano board to control the electronics, a wireless DAQ card for data acquisition, and a PC that runs the data acquisition software and on which the information is processed and analyzed (see Fig. 2).
To control temperature and humidity, a mechanism was implemented in the E-nose sensor chamber to minimize the effect on the sensor responses of the random variation between measurements (exhaled breath) of different people. For this control, a silicone heater pad heats the sensor chamber to between 65 °C and 70 °C, which minimizes humidity and condensation inside the chamber and keeps the humidity between 0% RH and 10% RH [7]. Additionally, the device contains a chamber recovery and cleaning system that injects synthetic air or oxygen by controlling two pneumatic valves (Valve 1 and Valve 2): valve 1 controls the inlet of samples to the chamber, and valve 2 controls the intake of synthetic air or oxygen in the recovery stage. During this recovery stage, the sensors return to their initial state (baseline), and traces of residual VOCs are expelled from the sensor chamber.
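The chamber conditions and valve roles described above can be summarized in a few lines of Python. This is only an illustrative sketch: the names chamber_in_range and VALVE_OPEN are hypothetical, and the idea that the valve not needed for a stage stays closed is an assumption, since the text only states which valve serves which purpose.

```python
# Ranges stated in the text for the heated sensor chamber.
TEMP_RANGE_C = (65.0, 70.0)
RH_RANGE_PERCENT = (0.0, 10.0)

def chamber_in_range(temp_c, rh_percent):
    """True when both readings fall inside the stated ranges."""
    t_lo, t_hi = TEMP_RANGE_C
    rh_lo, rh_hi = RH_RANGE_PERCENT
    return t_lo <= temp_c <= t_hi and rh_lo <= rh_percent <= rh_hi

# Valve roles from the text: valve 1 admits the breath sample, valve 2 admits
# synthetic air or oxygen during recovery.
# ASSUMPTION: the valve not needed for a stage is kept closed.
VALVE_OPEN = {
    "sampling": {"valve_1": True, "valve_2": False},
    "recovery": {"valve_1": False, "valve_2": True},
}

print(chamber_in_range(67.5, 4.0))  # True
print(chamber_in_range(60.0, 4.0))  # False
print(VALVE_OPEN["recovery"])       # {'valve_1': False, 'valve_2': True}
```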

Measurement protocol
For the acquisition of breath samples with the e-nose system, a protocol was defined under which the participant must be fasting before the measurement, to minimize cross-interference from odors or ingested substances that could combine with the exhaled VOCs. For this reason, data acquisition was carried out during the first hours of the day (i.e., 7 am to 8 am). The acquisition time was about 3 min.
Once the e-nose system has powered the sensors and the sensor chamber has reached the set temperature, the patient places the mouth on the disposable mouthpiece installed at the inlet of the device; the flow path of the device does not demand extra effort to exhale. A nose clip was not necessary in this study; for this reason, participants were told to avoid taking a deep inhalation just before exhaling, in order to maintain a higher concentration of VOCs during exhalation.
The acquisition starts once the operator presses the acquisition button of the device, which in turn emits a beep that tells the patient when to start exhaling. During exhalation, the sensor array reads the VOCs introduced into the sensor chamber by the volunteer, while the data are recorded under the assigned identifier (Table 1) and saved to a CSV file. The participant is asked to exhale continuously and at a comfortable flow; the duration depends on the person, as vital capacity varies with age.
The acquisition and exhalation process finishes when the participant empties the lungs, making a slight effort by contracting the belly to expel the highest amount of VOCs stored in the lungs and so obtain the end-tidal breath.
Once the acquisition is finished, the device enters a purging or cleaning process in which synthetic air or oxygen is injected under pressure to clean the chamber and return the sensors to their initial state (baseline).
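The measurement cycle described in this section can be read as a small state machine. The Python sketch below is one way to write it down; the state and event names are hypothetical labels chosen for illustration and do not come from the device firmware or the acquisition software.

```python
# Allowed transitions: (current state, event) -> next state.
# State and event names are illustrative labels, not taken from the device.
TRANSITIONS = {
    ("WARM_UP", "set_temperature_reached"): "READY",
    ("READY", "button_pressed"): "ACQUIRING",         # device beeps, patient exhales
    ("ACQUIRING", "exhalation_finished"): "PURGING",  # data saved to a CSV file
    ("PURGING", "baseline_restored"): "READY",        # chamber cleaned, sensors recovered
}

def next_state(state, event):
    """Return the state that follows `state` when `event` occurs."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} is not valid in state {state!r}") from None

def run(events, state="WARM_UP"):
    """Apply a sequence of events and return the final state."""
    for event in events:
        state = next_state(state, event)
    return state

one_cycle = ["set_temperature_reached", "button_pressed",
             "exhalation_finished", "baseline_restored"]
print(run(one_cycle))  # READY
```

Writing the cycle this way makes it explicit that a new acquisition cannot begin until the purge has returned the sensors to baseline.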

Ethics Statement
In accordance with the basic principles of the Declaration of Helsinki and with the prior approval of the ethics committee of the University of Pamplona, the exhaled breath samples were acquired with the prior consent of the participants and the authorization of the doctor in charge. Each participant was given an informed consent document explaining the (non-invasive) procedure to be carried out and stating that there would be no subsequent side effects. In addition, personal information and medical history were protected and used only for the purposes of the study.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that have, or could be perceived to have, influenced the work reported in this article.