Low cost air quality sensors “PurpleAir” calibration and inter-calibration dataset in the context of Beirut, Lebanon

The PurpleAir PA-II-SD is a low-cost, commercially available particulate matter (PM2.5 and PM10) sensor. It is one of many such sensors being adopted by individuals and researchers worldwide. With the growing use of these sensors, there is interest in better understanding their performance and characteristics. Data was collected from twelve PurpleAir PA-II-SD sensors and two high-fidelity Met One E-BAM PLUS instruments installed at a single location on the campus of the American University of Beirut, in Beirut, Lebanon, from June 28, 2020 to September 30, 2020. The data was collected with the aim of assessing the inter-sensor variability of the PurpleAir sensors and the accuracy of the PurpleAir sensor when compared to a high-fidelity Met One E-BAM PLUS instrument.


Specifications
Subject: Environmental science-pollution
Specific subject area: Air pollution sensor calibration
Type of data: Table, Graph
How data were acquired: Instruments:
• 12 PurpleAir PA-II-SD.
• 1 Met One E-BAM PLUS configured for measuring PM10.
Data format:
• Raw.
• Analysed.
Parameters for data collection:
• Data was logged by the PurpleAir instruments every 2 min.
• Data was logged by the E-BAM PLUS every 1 h.
• Data was retimed and averaged to an hourly reading to synchronize with the reference E-BAM PLUS data.
• Negative readings were excluded (2 data points out of 2163).
• Timestamps with a missing data point for any sensor were excluded.
Description of data collection:
• PurpleAir sensors are real-time optical air quality sensors measuring PM2.5 and PM10.

Value of the Data
• The data contains simultaneous measurements from twelve PurpleAir sensors [1] and two high-fidelity E-BAM PLUS instruments [2] measuring PM2.5 and PM10 levels in Beirut. It allows for the assessment of the inter-sensor variability between PurpleAir sensors and of their accuracy compared to a high-fidelity instrument.
• The data from the twelve PurpleAir sensors can enable PurpleAir users to quantify errors when comparing data from multiple sensors installed over a large area.
• The calibration coefficients reported here can enable PurpleAir sensor users to improve the accuracy of their PurpleAir measurements.
• The placement of the PurpleAir and Met One E-BAM PLUS sensors on the campus of the American University of Beirut provides measurements of the background PM levels within and around the city of Beirut over the period from July 1, 2020 to September 30, 2020.
• The data provided here is a resource that allows the more than 20,000 users [5] (individuals, researchers, and weather forecasting agencies reporting the air quality index [6]) to improve the accuracy of their reported data.

Data Description
The first part of the data summarizes the results of the linear regression of data from the PurpleAir sensor against the reference E-BAM PLUS instruments.
• Table 1 summarizes the root-mean-square errors for the calibration and validation scenarios for the two PM ranges. The smaller range is selected from the 90% quantile of the concentration measurements shown in Figs. 1 and 2.
• Table 2 relates to the linear calibration of the PurpleAir PM2.5 and PM10 measurements against the E-BAM PLUS reference.
The second part shows the data collected from 11 PurpleAir sensors placed at a single location to assess the precision of measurements between multiple sensors.
• Figs. 11 and 12 show the 95% confidence interval around the mean for PM2.5 and PM10 measurements from 11 PurpleAir sensors, with the linear best fit.
The full dataset, which is accessible at the repository, is divided into two CSV files.
The file 'MultiSensor_IntercalibrationData.csv' contains hourly PM2.5 and PM10 data from eleven PurpleAir sensors for the date range from June 28, 2020 to July 11, 2020.
The columns are divided as such:
The file 'SingleSensor_CalibData.csv' contains hourly PM2.5 and PM10 data from a single PurpleAir sensor and two Met One E-BAM PLUS instruments for the date range from July 1, 2020 to September 30, 2020.
The file contains five columns:
• (E) meanAB_10: PM10 reading (µg/m3) of the PurpleAir sensor. The value is the average of channels A and B of the sensor.

Experimental Design, Materials and Methods
Data for ambient air pollution (PM2.5 and PM10) was collected on the campus of the American University of Beirut at 33.9°N, 35.5°E.
Twelve PurpleAir PA-II-SD sensors and two E-BAM PLUS instruments were installed at a single location on the campus. The dataset generated was used for a twofold purpose:
1. Generate a linear calibration curve for each of the PM2.5 and PM10 measurements of the PurpleAir PA-II-SD sensors, using the Met One E-BAM PLUS instruments as reference (Table 2).
2. Report on the precision of measurements of the PurpleAir sensors and on the inter-sensor variability by comparing measurements of multiple sensors at a single location (Figs. 11 and 12).
The campus of the American University of Beirut, which is located within the capital, Beirut, was chosen as the location of the PurpleAir sensors for two reasons. It is representative of background PM levels within the city, and it places the sensors adjacent to the reference Met One E-BAM PLUS instruments, which are part of the American University of Beirut Air Pollution Observatory Project [3, 4].
The E-BAM PLUS reports data at an hourly interval, whereas the PurpleAir sensors report data every two minutes. The PurpleAir readings were therefore averaged to hourly values to synchronize the reporting interval of all sensors.
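The retiming step described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' processing code. The function names (hourly_means, aligned_hours), the synthetic readings, and the choice to bin each reading into the clock hour in which it was taken are assumptions made for the example. The exclusion of negative readings and of timestamps missing from any instrument follows the parameters listed in the Specifications.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import fmean


def hourly_means(readings):
    """Average (timestamp, value) readings into hourly bins.

    Assumption: each reading is assigned to the clock hour in which it
    was taken; the source does not say how the hourly bins are aligned.
    """
    bins = defaultdict(list)
    for ts, value in readings:
        hour = ts.replace(minute=0, second=0, microsecond=0)
        bins[hour].append(value)
    return {hour: fmean(values) for hour, values in sorted(bins.items())}


def aligned_hours(*series):
    """Return the hours present in every series with no negative value."""
    shared = set.intersection(*(set(s) for s in series))
    return sorted(h for h in shared if all(s[h] >= 0 for s in series))


# Synthetic 2-minute readings over two hours (illustrative values only).
start = datetime(2020, 7, 1, 0, 0)
purpleair_raw = [
    (start + timedelta(minutes=2 * i), 10.0 if i < 30 else 20.0)
    for i in range(60)
]
purpleair_hourly = hourly_means(purpleair_raw)

# Hypothetical hourly reference series; the second hour is negative.
ebam_hourly = {start: 12.0, start + timedelta(hours=1): -1.0}

kept = aligned_hours(purpleair_hourly, ebam_hourly)
```

In this toy example only the first hour survives, because the reference value for the second hour is negative.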
For the calibration of the PurpleAir sensor against the E-BAM PLUS instrument, hourly data from a single PurpleAir sensor covering a span of three months (from July 1, 2020 to September 30, 2020) was used, resulting in a total of 2163 data points.
The dataset was split into two groups: the first, comprising 90% of the data points, was used for the linear regression (curve fitting), and the second, comprising the remaining 10%, was used to validate the curve fit.
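The split-and-fit procedure can be illustrated with a short standard-library sketch. Several choices here are assumptions rather than details given in the text: the split is done by a seeded random shuffle, the PurpleAir reading is taken as the independent variable and the E-BAM PLUS reading as the dependent variable, and the data is synthetic. The helper names (fit_line, rmse, split_90_10) are hypothetical.

```python
import math
import random


def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rmse(x, y, slope, intercept):
    """Root-mean-square error of the fitted line on the given points."""
    sq = sum((slope * xi + intercept - yi) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(sq / len(x))


def split_90_10(pairs, seed=0):
    """Shuffle the pairs (assumed random split) and cut at 90%."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    cut = round(0.9 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


# Synthetic (PurpleAir, reference) pairs; slope 0.5 and offset 3 are made up.
rng = random.Random(1)
pairs = []
for _ in range(200):
    pa = rng.uniform(5.0, 60.0)
    pairs.append((pa, 0.5 * pa + 3.0 + rng.gauss(0.0, 1.0)))

train, valid = split_90_10(pairs)
slope, intercept = fit_line([p for p, _ in train], [r for _, r in train])
rmse_train = rmse([p for p, _ in train], [r for _, r in train], slope, intercept)
rmse_valid = rmse([p for p, _ in valid], [r for _, r in valid], slope, intercept)
```

Reporting the RMSE on both the fitting and the validation subsets mirrors the two scenarios summarized in Table 1.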
For each size range (PM2.5 and PM10), two linear regressions were performed:
1) Using the entire span of concentrations in the dataset (Figs. 3 and 4).
2) Using only the concentrations within the 90% quantile range.
The second regression (the 90% quantile range) excludes the outliers in order to obtain a better fit. The improvement is apparent in the lower RMSE of the regression for the 90% quantile range compared with the full range, as seen in Table 1.
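Restricting the data to the 90% quantile range before fitting could look like the sketch below. The text does not state which variable the quantile is computed on or which interpolation method is used, so the example assumes the cutoff is the 90th percentile of the PurpleAir readings, computed with the "inclusive" method of the standard-library statistics.quantiles function. The function name within_90pct_quantile and the data are hypothetical.

```python
from statistics import quantiles


def within_90pct_quantile(pairs):
    """Keep (purpleair, reference) pairs at or below the 90th percentile.

    Assumption: the cutoff is computed on the PurpleAir reading (first
    element of each pair); the source does not specify this.
    """
    readings = [pa for pa, _ in pairs]
    cutoff = quantiles(readings, n=10, method="inclusive")[-1]
    kept = [(pa, ref) for pa, ref in pairs if pa <= cutoff]
    return kept, cutoff


# Synthetic pairs with PurpleAir readings 1..100 (illustrative only).
pairs = [(float(v), 0.5 * v + 3.0) for v in range(1, 101)]
kept, cutoff = within_90pct_quantile(pairs)
```

The retained pairs would then be passed to the same linear regression used for the full range, so that the two RMSE values can be compared.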
To assess the inter-sensor variability, eleven PurpleAir sensors were used, all located at a single site on the campus of the American University of Beirut. The measurements span about twelve days (from June 28, 2020 to July 09, 2020), for a total of 276 hourly data points. The upper and lower bounds of the 95% confidence intervals were calculated across the span of measurements for this period.
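One way to compute such bounds is a Student-t confidence interval around the mean of the eleven sensor readings for a given hour. The sketch below assumes that construction; the text does not state how the intervals were computed. The critical value 2.228 is the two-sided 95% Student-t value for 10 degrees of freedom (eleven sensors), and the readings are made up.

```python
import math
from statistics import fmean, stdev

# Two-sided 95% Student-t critical value for 10 degrees of freedom.
T_CRIT_DF10 = 2.228


def mean_ci(values, t_crit=T_CRIT_DF10):
    """Return (lower, mean, upper) of a t-based interval around the mean.

    Assumption: the interval is mean +/- t * s / sqrt(n), with s the
    sample standard deviation across sensors for one timestamp.
    """
    n = len(values)
    mean = fmean(values)
    half_width = t_crit * stdev(values) / math.sqrt(n)
    return mean - half_width, mean, mean + half_width


# Hypothetical PM2.5 readings (ug/m3) from eleven sensors for one hour.
one_hour = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0, 20.0]
lower, mean, upper = mean_ci(one_hour)
```

Applying mean_ci to each of the 276 hourly timestamps would give one interval per hour. For a different number of sensors, the critical value passed as t_crit has to be changed to match the degrees of freedom.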

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships which have or could be perceived to have influenced the work reported in this article.