Dataset of PAHs determined in home-made honey samples collected in Central Italy by means of DLLME-GC-MS and cluster analysis for studying the source apportionment

This paper presents all the data from an intensive field campaign focused on characterizing the polyaromatic hydrocarbon (PAH) composition profile of 57 honey samples collected in Central Italy. The analytical data reported here form the basis of a study aimed at identifying the pollution sources in the region. Twenty-two PAHs were analyzed by an ultrasound-vortex-assisted dispersive liquid-liquid micro-extraction (DLLME) procedure followed by triple quadrupole gas chromatography/mass spectrometry (GC-MS). A chemometric approach was used to evaluate the data: in particular, principal component analysis and cluster analysis were used both to identify the main natural/anthropogenic pollutants affecting a site and to evaluate air quality.


Keywords:
Honey; PAH; Bioindicator; DLLME; GC-MS; Cluster analysis; PCA; Source apportionment

The honey samples were processed by an extraction protocol based on the ultrasound-vortex-assisted dispersive liquid-liquid micro-extraction (DLLME) procedure, followed by gas chromatography coupled with triple quadrupole mass spectrometry. The analyses were carried out using a standard solution of perdeuterated PAH compounds.

Value of the Data
• The analytical procedure reported allows PAHs to be investigated at trace levels using perdeuterated compounds and DLLME-GC-MS analysis.
• Honey samples can be considered a biomonitoring index in anthropogenic or natural areas, avoiding long and tedious sampling procedures.
• The data can be useful for the source apportionment of PAHs in relation to different emissions in air quality studies.
• The data can be used by other scientists for different chemometric analyses in food quality studies.

Data Description
The dataset reported here relates to the analytical procedure, adapted from Kazazic et al. [1], that was set up to analyze 22 polyaromatic hydrocarbons (PAHs) (Table 1) in honey samples.
The raw gas chromatography-mass spectrometry (GC-MS) files are available in a dedicated repository: all the chromatograms are deposited in the Mendeley repository [2]. Note that the repository contains 62 chromatograms: the 5 additional files arise from samples #2997 and #2998, whose chromatographic runs were repeated three times, and from a toluene chromatogram included to check column cleanliness.
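Reading "repeated three times" as three runs in total for each of the two samples (an interpretation, since the wording could also mean three extra runs), the file count reconciles as the 57 sample runs, plus two extra runs for each of #2997 and #2998, plus one toluene run:

$$
57 + 2 \times (3 - 1) + 1 = 62
$$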
Under these analytical conditions, 57 home-made honey samples were analyzed. As a preliminary analysis of the relations among PAHs, Pearson's correlation coefficients were calculated: Table 2 shows the main correlations between PAHs, with R above 0.6.
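As a minimal sketch of this preliminary step, assuming the data are held as a mapping from PAH acronym to its concentrations across samples (a hypothetical layout, not the authors' file format), the pairwise Pearson coefficients can be computed and sorted into the R classes used in Table 2 (0.6-0.7, 0.7-0.8, 0.8-0.9, > 0.9). The class boundaries below (lower bound exclusive, upper bound inclusive) are an assumption, since the table headings do not specify them, and the concentrations are invented for illustration.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r_class(r):
    """Map an R above 0.6 to a Table 2 class (assumed bounds: lower exclusive, upper inclusive)."""
    if r > 0.9:
        return "> 0.9"
    if r > 0.8:
        return "0.8-0.9"
    if r > 0.7:
        return "0.7-0.8"
    return "0.6-0.7"


def correlated_pairs(data, threshold=0.6):
    """Group every PAH pair with R above `threshold` by correlation class.

    `data` is a hypothetical dict: PAH acronym -> concentrations (same sample order).
    """
    table = {"0.6-0.7": [], "0.7-0.8": [], "0.8-0.9": [], "> 0.9": []}
    for a, b in combinations(sorted(data), 2):
        r = pearson_r(data[a], data[b])
        if r > threshold:
            table[r_class(r)].append(f"{a}-{b}")
    return table


# Hypothetical concentrations for five samples (not data from this study).
example = {
    "Ace": [1.0, 2.0, 3.0, 4.0, 5.0],
    "Acy": [1.1, 2.0, 2.9, 4.2, 5.0],
    "Per": [5.0, 1.0, 4.0, 2.0, 3.0],
}
print(correlated_pairs(example))
```

Only positive correlations are kept, matching the "R above 0.6" criterion in the text.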
With 22 PAHs measured in 57 samples, the data pose a multivariate analysis problem. Before applying the chemometric approach, an analysis of variance (ANOVA) was carried out with SPSS Statistics for Windows, version 25.0 (IBM Corp., Armonk, NY, USA).
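The ANOVA quantities discussed below (mean squares and significance) and the RSD can be illustrated with a sketch of a one-way ANOVA for a single compound. The grouping factor used in the ANOVA is not stated here, so the groups in the example are hypothetical. The p-value would come from the F distribution with (k − 1, n − k) degrees of freedom, for instance via `scipy.stats.f.sf(F, k - 1, n - k)`; it is left out so the block needs only the standard library.

```python
import statistics


def rsd_percent(values):
    """Relative standard deviation: sample standard deviation / mean, in %."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


def one_way_anova(groups):
    """One-way ANOVA for one compound.

    `groups` is a list of lists of concentrations, one list per level of a
    (hypothetical) grouping factor. Returns (MS_between, MS_within, F).
    """
    values = [v for g in groups for v in g]
    k, n = len(groups), len(values)
    grand_mean = statistics.mean(values)
    group_means = [statistics.mean(g) for g in groups]
    # Between-group sum of squares: spread of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group sum of squares: spread of each value around its own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between, ms_within, ms_between / ms_within


# Hypothetical concentrations of one PAH in three groups of samples.
groups = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
print(one_way_anova(groups))  # (27.0, 1.0, 27.0)
print(rsd_percent([2.0, 4.0, 6.0]))  # 50.0
```

A compound whose concentrations vary strongly between groups yields a large between-group mean square and hence a large F, which is the pattern the text associates with high-RSD compounds.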
The results show that compounds with high concentration variability (i.e., high relative standard deviation, RSD) have high mean square values and a significance value (p) of 0.000, i.e. p < 0.01: BaA, BeP, Bb + jF, BghiP, BkF, IPy, Chr, BaP, DahA, Phe, DalP, DaiP, DahP, DaeP and Per. The main consideration concerns the role of the highest-molecular-weight molecules, which are responsible for the distribution of the samples among different clusters. Cluster Analysis (CA), performed with SPSS using the non-hierarchical k-means technique, in which samples are grouped according to their Euclidean distance from the cluster centers, was applied to determine the possible grouping of the honey samples [4]. Four clusters were identified. Table 3 shows the distances between the final cluster centers: the greater the distance between the final centers of two clusters, the greater their dissimilarity. Table 4 reports the number of samples in each cluster: cluster 1 contains 1 sample (#41), cluster 2 contains 3 samples (#32, #49, #50) and cluster 4 contains 6 samples (#15, #19, #24, #25, #47, #53), whereas cluster 3 is the most populated, with 41 samples. Tables 5-7 show the statistics (mean, minimum, maximum, standard deviation, RSD and 95th percentile) of each cluster except cluster 1.

Table 2 Main correlations between PAHs, showing an R above 0.6. For acronyms: see Table 1. Table adapted from ref. [3].
Correlations between 0.6-0.7 | Correlations between 0.7-0.8 | Correlations between 0.8-0.9 | Correlations > 0.9
Ace-Acy | Fl-Acy | BeP-BkF |
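A minimal pure-Python sketch of the k-means idea described above (Lloyd's algorithm with Euclidean distance), together with the matrix of distances between final cluster centers of the kind reported in Table 3. This is not the SPSS implementation: SPSS applies its own choice of initial centers and stopping rules, and the random initialization, iteration limit, seed and two-dimensional example points here are assumptions for illustration only.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two samples (sequences of concentrations)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_means(points, k, max_iter=100, seed=0):
    """Lloyd's k-means with Euclidean distance; returns (centers, labels)."""
    rng = random.Random(seed)
    # Initial centers: k distinct samples drawn at random (an assumption; SPSS differs).
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each sample joins the cluster with the nearest center.
        new_labels = [min(range(k), key=lambda c: euclidean(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # memberships unchanged: converged
        labels = new_labels
        # Update step: each center moves to the mean of its member samples.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels


def center_distances(centers):
    """Matrix of Euclidean distances between final cluster centers (cf. Table 3)."""
    return [[euclidean(a, b) for b in centers] for a in centers]


# Hypothetical two-compound profiles for six samples (not data from this study).
samples = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = k_means(samples, k=2)
print(labels)
print(center_distances(centers))
```

In the study each sample would be a 22-dimensional PAH profile and k = 4; whether the concentrations were standardized before clustering is not stated in this section.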
Principal Component Analysis (PCA) was applied to identify similarities among the different datasets [5, 6]. The chemometric treatment was carried out with the open-access software Tanagra [7]; the only requirement was that all datasets contain the same compounds. Accordingly, 15 PAHs were considered for the chemometric treatment. The authors performed the PCA on the three datasets combined. In detail, Table 8 shows the PCA applied to all the samples (i.e., 135 samples: 51 from this study, 61 from the Serbia area [8] and 23 from the Belgrade area [9]). Fig. 2 shows the PCA biplot of all the samples investigated in the three studies, using the first two principal components.

Table 8 PCA of all the samples investigated in this study along with the data collected in other papers [8, 9].

Fig. 2. PCA score plot of the samples collected in Belgrade, in Serbia and in Central Italy (this study) ("sample": data from the Serbia area [8]; #"sample": this study; h"sample": data from the Belgrade area [9]). For acronyms: see Table 1.
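A sketch of the PCA step using NumPy rather than Tanagra: the compounds shared by all datasets are selected, the samples are stacked into one matrix, autoscaled (each column centered and divided by its standard deviation, i.e. a correlation-matrix PCA, which is an assumption here) and decomposed by singular value decomposition. The dict-of-lists layout, the function names and the concentrations are hypothetical.

```python
import numpy as np


def common_compounds(*datasets):
    """PAH acronyms present in every dataset (each dataset: dict acronym -> values)."""
    shared = set(datasets[0])
    for d in datasets[1:]:
        shared &= set(d)
    return sorted(shared)


def stack(datasets, compounds):
    """Samples x compounds matrix built from the shared compounds only."""
    rows = []
    for d in datasets:
        n_samples = len(d[compounds[0]])
        rows.extend([d[c][i] for c in compounds] for i in range(n_samples))
    return np.array(rows, dtype=float)


def pca(X, n_components=2):
    """PCA on autoscaled data via SVD; returns (scores, loadings, explained variance ratio)."""
    # Autoscaling (assumed): center each column and divide by its standard deviation.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]  # sample coordinates (score plot)
    loadings = Vt[:n_components].T                   # compound weights (biplot arrows)
    explained = s**2 / np.sum(s**2)
    return scores, loadings, explained[:n_components]


# Hypothetical datasets: "Per" is missing from the second one, so it is dropped.
italy = {"BaP": [1.0, 2.0, 3.0], "Chr": [2.0, 4.1, 5.9], "Per": [0.1, 0.3, 0.2]}
serbia = {"BaP": [4.0, 5.0], "Chr": [8.2, 9.9]}
compounds = common_compounds(italy, serbia)
X = stack([italy, serbia], compounds)
scores, loadings, explained = pca(X)
print(compounds, X.shape, explained)
```

A compound with zero variance across all samples would make the autoscaling divide by zero, so such columns would need to be removed first.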

Honey sample collection
The analysis involved 57 home-made honey samples from different geographical locations in Central Italy. Sampling was carried out in coastal, hilly and mountainous areas; the samples were collected directly at the apiaries by local experts from May to July. The sampling locations are shown in Fig. 3; at each sampling site, 5 samples were taken every 15 days. To better understand PAH behavior in terms of distribution and contamination, the data were compared with other data in the literature. In particular, two papers were considered, reporting datasets of PAHs determined in honey samples collected in the Serbia [8] and Belgrade [9] areas: these are the only papers showing a complete PAH profile.

DLLME procedure
The extraction was carried out by the DLLME procedure [10]. A 10 μL aliquot of the perdeuterated PAH extraction standard solution (10 ng μL−1, L 429 IS, Wellington Laboratories) was added to 2.5 g of honey sample in acetone solution, which was then vortexed for 40 seconds to favor dissolution of the sample. The microextraction was performed using 150 μL of chloroform; extraction was favored by first forming a macroemulsion by vortexing for 5 min and then a microemulsion in an ultrasonic bath for 6 min. Subsequently, to facilitate breaking of the emulsion and recovery of the solvent, NaCl (10 g L−1) was added and the mixture was centrifuged at 4000 rpm for 30 min [11]. Finally, 1 μL of the extract was injected.
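The quantities above fix the internal-standard spike level: 10 μL × 10 ng μL−1 = 100 ng in 2.5 g of honey, i.e. 40 ng g−1. The sketch below shows this arithmetic and a generic internal-standard quantification. It assumes that the 10 ng μL−1 concentration applies to each perdeuterated compound and that a relative response factor (RRF) is available from calibration; neither the RRF nor the quantification equation is given in this section, so both are assumptions rather than the authors' stated method.

```python
# Values taken from the procedure text.
SPIKE_VOLUME_UL = 10.0        # μL of perdeuterated extraction standard
SPIKE_CONC_NG_PER_UL = 10.0   # ng μL−1 (assumed to apply to each labeled PAH)
HONEY_MASS_G = 2.5            # g of honey per extraction


def spiked_amount_ng(volume_ul=SPIKE_VOLUME_UL, conc=SPIKE_CONC_NG_PER_UL):
    """Mass of each internal standard added to the sample, in ng."""
    return volume_ul * conc


def is_quantify_ng_per_g(area_analyte, area_is, rrf, is_ng=None, mass_g=HONEY_MASS_G):
    """Generic internal-standard quantification (assumption, not stated in the source).

    C = (A_analyte / A_IS) * m_IS / (RRF * m_sample), where the relative response
    factor RRF = (A_analyte / A_IS) / (m_analyte / m_IS) comes from calibration.
    """
    if is_ng is None:
        is_ng = spiked_amount_ng()
    return (area_analyte / area_is) * is_ng / (rrf * mass_g)


print(spiked_amount_ng())                 # 100.0 (ng per labeled compound)
print(spiked_amount_ng() / HONEY_MASS_G)  # 40.0 (ng per g of honey)
```

Because the labeled standard is added before extraction, losses during DLLME affect analyte and standard alike, which is why the area ratio rather than the raw area is used in the sketch.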

PAHs analysis by GC-MS
The instrumental analyses were performed on a triple quadrupole gas chromatograph/mass spectrometer (GC-MS) (Trace 1310 GC/TSQ 8000 Evo, Thermo Fisher Scientific, Waltham, MA, USA) operating in electron ionization (EI) mode. The chromatographic separation was performed on a DB-XLB column (60 m × 0.25 mm I.D., 0.25 μm film thickness) (Agilent Technologies, Santa Clara, CA, USA) with H2 as the carrier gas at 3.00 mL min−1. The PTV injector, operated in splitless mode, was maintained at a constant temperature of 250 °C; the MS transfer line and the ion source were held at 290 °C and 300 °C, respectively. The oven was held at 60 °C for 1 min, ramped at 20 °C min−1 to 200 °C (held for 0 min), then at 7.0 °C min−1 to 275 °C (held for 7 min), and finally at 18 °C min−1 to 325 °C (held for 13 min). The analysis was performed in combined selected ion monitoring (SIM) and full scan mode: SIM time 0.215 s, full scan time 0.083 s and total scan time 0.300 s.
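As a worked check of the oven program, the total run time is the initial hold plus, for each step, the ramp time (temperature difference divided by the ramp rate) and the hold: 1 + 7 + 0 + 75/7 + 7 + 50/18 + 13 ≈ 41.49 min. The tuple layout of the program below is a hypothetical representation chosen for illustration; cool-down and re-equilibration time are not included.

```python
# Oven program from the text: (target °C, ramp °C/min, hold min).
# The first entry is the initial temperature and its hold, so it has no ramp.
OVEN_PROGRAM = [
    (60.0, None, 1.0),
    (200.0, 20.0, 0.0),
    (275.0, 7.0, 7.0),
    (325.0, 18.0, 13.0),
]


def run_time_min(program):
    """Total GC run time: initial hold + sum of (ramp time + hold) for each step."""
    temp, _, total = program[0]
    for target, rate, hold in program[1:]:
        total += (target - temp) / rate + hold  # ramp time = temperature difference / rate
        temp = target
    return total


print(round(run_time_min(OVEN_PROGRAM), 2))  # 41.49
```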