Air Temperature and Relative Humidity Datasets from an Urban Meteorological Network in the City Area of Novi Sad (Serbia)

This data article describes two groups of datasets: first, 10-minute air temperature (Ta) and relative humidity (RH) data from 27 urban and non-urban sites over a period of 3.5 years covering 2014–2018; and second, hourly Ta data from 12 urban sites over a period of 2 years covering 2016 and 2017. Both groups come from the urban meteorological network located in the city of Novi Sad (Serbia). The collection holds two types of information: one provides details about the monitoring sites at which the Ta and RH sensors are placed, and the other contains the Ta and RH data at all sensor locations. In all, the 10-minute dataset contains about 185,000 instances of Ta and RH data, and the hourly datasets contain 17,544 instances of Ta data. The 10-minute datasets were not quality controlled, but the hourly Ta data have been cleaned and gap-filled so that there are 24 measurements at each site for each day. The data have multiple potential uses. They can support intra-urban and inter-urban research, urban climate modeling on local or micro scales, heat-related public health investigations and urban environment inquiries. They can also be used in machine learning experiments, for example to test the accuracy of classification algorithms or to build and validate spatio-temporal machine learning functions, either for classification or for gap filling. These datasets are directly citable through their DOIs and are available for download from the Zenodo platform or from the Fair Micromet Portal.


© 2023 The Author(s). Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

Specifications Table

Subject
Environmental Science: Climatology

Specific subject area
Urban climatology: in situ measured 10-minute and hourly air temperature (Ta) and relative humidity (RH) variables for intra- and inter-urban assessments and spatio-temporal analyses.

Type of data
Comma separated text files (.csv): Ta and RH datasets. Excel files (.xlsx): metadata of station sites.

How the data were acquired
Raw Ta and RH data were obtained from the Novi Sad Urban Network (NSUNET) wireless system located in the urban area of Novi Sad and its surroundings [1]. Data were collected from in situ measurements at 27 measurement sites, each equipped with ChipCap 2 air temperature and relative humidity sensors developed by the General Electric Measurement & Control Company. Sensors were placed in ventilated radiation protection screens [1]. The datasets consist of 10-minute and hourly Ta and RH values obtained from 25 urban sites and 2 non-urban sites. The 10-minute datasets are raw and not quality controlled, but the hourly Ta datasets from 12 urban stations were subjected to quality control procedures (detection of outliers and gap filling) [2].

Data format
Raw (not quality controlled, with outliers and gaps): 10-minute Ta and RH datasets from 27 urban and non-urban sites. Raw (quality controlled, no outliers and no gaps): hourly Ta datasets from 12 urban sites.

Description of data collection
The outdoor Ta (in °C) and RH (in %) data were collected on a 10-minute basis for the period between 1st June 2014 and 8th February 2018. Sites in urban areas (25 stations) collected Ta and RH values from different built-up local climate zones (LCZs), and sites in non-urban areas collected Ta and RH values from different land cover LCZs [1].

Value of the Data
• The Ta and RH data, obtained from a high-density urban meteorological network, contribute to comprehensive urban thermal and climate analyses on seasonal or daily levels, and can be useful in detailed intra-urban and inter-urban research, urban climate modeling on local or micro scales, heat-related public health investigations and urban environment inquiries.
• The data can be used to evaluate fine-scale temperature and humidity models in urban areas, and as material for risk studies of extreme weather events (heat or cold waves). It can inform local or regional climate change adaptation strategies in cities, benefitting climate researchers, data scientists, health policy professionals and urban planners.
• Because the data is spatio-temporal in nature, it is well suited to both machine learning and time series research, for example gap-filling or classification models.
• Educators can use this data for machine learning (clustering, classification, time series analyses) in undergraduate and postgraduate training.
• The data can serve researchers in climatology, meteorology and public health who focus on the interactive effects of climate change and urbanization on population and environment in cities.
• The data can also be useful for stakeholders, especially urban planners, architects, demographers, environmentalists who investigate heat load effects on various social/urbanization activities in cities.

Objective
The datasets were obtained from the Novi Sad Urban Network (NSUNET) system, which was created as part of an international cross-border project [3], where each measuring site was equipped with multiple sensors and a variety of electronic and hardware devices. In creating this urban monitoring network, the project's objective was to provide conditions for continued urban climate research, i.e. to contribute to the understanding of thermal pattern differences through an in-depth investigation of the various urban designs and city surroundings [1, 4, 5]. Ultimately, the primary motivation for creating NSUNET was to enable further intra- and inter-urban climate research and thus widen the possibilities for cooperating with research groups that have similar goals.

Description of monitoring area
Novi Sad is the second largest city in the Republic of Serbia, with 102 km² of built-up and urban green/blue areas and a population of 330,000 people in 2017. The city is located on the Pannonian Plain in Central Europe (45°15′18″ N, 19°50′41″ E), and thus most of the urban area is flat, with an absolute elevation between 72 m and 80 m [6]. Novi Sad has a Cfb climate (temperate climate, fully humid, warm summers, with at least four months of average Ta above 10 °C) based on the Köppen-Geiger climate classification system [7].

Air temperature and relative humidity data
The NSUNET system generated two separate databases with different measurement frequencies (10-minutes and hourly) and different quality control protocols.
10-minute Ta (°C) and RH (%) data were obtained from the NSUNET system, which covered the urban and non-urban area of Novi Sad and its surroundings. These meteorological parameters were obtained from 27 stations, with data from all stations presented in a single .csv file [1, 8]. The .csv file is organized so that the first column represents the date (dd/mm/yy), the second the time (hour:minute), and the remaining columns the Ta and RH values from each of the 27 stations, respectively (e.g. 15.6 °C and 47.8% RH). The name (ID) of each station is defined by two digits: the first represents the number of the local climate zone (LCZ), and the second represents the number of the sensor within that LCZ [1, 9]. Each station provides 10-minute measurements covering the period from 1st July 2014 to 8th February 2018 in Coordinated Universal Time (UTC). The datasets are freely available on the Zenodo platform [8].
Hourly Ta data (°C) were obtained from the part of the NSUNET system that covered the urban area of Novi Sad. The hourly Ta data come from 12 temperature sensors located at 12 different urbanized sites (data from all sensors are presented in one .csv file) [1, 10]. The .csv file is organized so that the first column represents the date (dd/mm/yy), the second the time (hour:minute), and the remaining 12 columns the temperature values from each sensor, respectively (e.g. 15.6). The name (ID) of each sensor is defined by two digits: the first represents the number of the LCZ, and the second represents the number of the sensor within that LCZ [1, 9]. Each sensor provides hourly measurements covering the period from 1st January 2016 to 31st December 2017 in UTC. The datasets are freely available on the Zenodo and FMP platforms [10, 11], with WMO metadata descriptions on the Knowledge Sharing Platform constructed as part of the FAIRNESS COST Action, with FAIR testing protocols as described in [11, 12].
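The file layout described above can be read with a few lines of standard-library Python. The following is a minimal sketch, not the authors' tooling: the header names, the ID spelling ("2-1") and the sample values are assumptions made for illustration, and only the column order, the dd/mm/yy date format, the hour:minute time format and the use of UTC come from the description.

```python
import csv
import io
from datetime import datetime, timezone

# Hypothetical excerpt: header names, ID spelling and values are assumptions.
SAMPLE = (
    "date,time,2-1,5-1,8-1\n"
    "01/01/16,00:00,1.2,0.8,0.5\n"
    "01/01/16,01:00,1.0,,0.3\n"
)


def parse_station_id(station_id):
    """Return (LCZ number, sensor number) from an ID that contains two digits."""
    digits = [ch for ch in station_id if ch.isdigit()]
    if len(digits) != 2:
        raise ValueError(f"expected two digits in station ID: {station_id!r}")
    return int(digits[0]), int(digits[1])


def read_rows(handle):
    """Return a list of (UTC timestamp, {station_id: value or None}) tuples."""
    reader = csv.reader(handle)
    header = next(reader)
    station_ids = header[2:]  # first two columns are date and time
    rows = []
    for record in reader:
        stamp = datetime.strptime(
            f"{record[0]} {record[1]}", "%d/%m/%y %H:%M"
        ).replace(tzinfo=timezone.utc)
        # An empty cell is treated as a missing value (None).
        values = {
            sid: (float(cell) if cell.strip() else None)
            for sid, cell in zip(station_ids, record[2:])
        }
        rows.append((stamp, values))
    return rows


rows = read_rows(io.StringIO(SAMPLE))
```

Treating an empty cell as missing is a design choice that matters mainly for the 10-minute file, which contains gaps; how gaps are actually encoded there should be checked against the downloaded file.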
In the following sections, the quantity and quality details for the hourly Ta datasets are presented. Note that the same analysis of outliers and missing values is not provided for the 10-minute Ta and RH datasets. Table 1 displays the site metadata of the 12 Ta sensors (with hourly data), using 8 columns to describe the site locations of the temperature sensors. The station ID corresponds to the station ID column in the Ta metadata, and is accompanied by the address, longitude and latitude. LCZ values range from 2 to 8, while the description is taken from 5 possible values. Station height refers to how high the sensor is located above ground, while altitude refers to the specific site location.

Detailed summary description
For each sensor, 17,544 instances of hourly Ta have been recorded, with Table 3 showing a statistical description of the hourly Ta values on a site basis. The top row represents the station names, while the rows provide values for: mean, standard deviation (std), min, max, and the 25%, 50% and 75% quartiles. We provide a visual sample from Table 3 in Figs. 1-4, where winter (January) and summer (July) hourly Ta for both midnight and midday are shown for 2017. In these figures, the x-axis represents the day (31 values in each case) for the selected month and the y-axis represents the Ta. Each site has a selected color which can be used to identify those sites that are warmer or colder than others. These four figures are also useful to visualize the range of Ta across sites for a specific date in winter or summer.
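The count of 17,544 follows from the coverage period: 2016 is a leap year, so 1st January 2016 to 31st December 2017 spans 731 days of 24 hourly values each. The sketch below checks that arithmetic and computes the same kind of per-site summary as described for Table 3. The sample standard deviation and linearly interpolated quartiles are assumptions, since the text does not state which estimators were used, and the function names are illustrative.

```python
import statistics
from datetime import date


def expected_hourly_count(start, end_exclusive):
    """Number of hourly values between two dates (end date not included)."""
    return (end_exclusive - start).days * 24


def describe(values):
    """Summary statistics for one site's temperature series."""
    # "inclusive" interpolates linearly between the sorted data points.
    q25, q50, q75 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.fmean(values),
        "std": statistics.stdev(values),  # sample standard deviation (n - 1)
        "min": min(values),
        "25%": q25,
        "50%": q50,
        "75%": q75,
        "max": max(values),
    }


n_expected = expected_hourly_count(date(2016, 1, 1), date(2018, 1, 1))
summary = describe([1.0, 2.0, 3.0, 4.0, 5.0])  # toy values, not real data
```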

Raw data collection
Sites for each sensor were selected to represent a thermal/humidity pattern across different built-up areas and their surroundings, as shown in Fig. 5. These urban area types are known as LCZs, and were proposed by Stewart and Oke [9]. Using this method, the network was developed by selecting sensor locations based on two main criteria: firstly, that stations be evenly distributed based on the ratio of each LCZ within the urban area, while at the same time each station should be 100-200 m inside the defined LCZ; secondly, that the station be positioned in a street location to maximize protection from vandalism. The LCZ map with Ta and RH sensor sites was presented in the study of Šećerov et al. [1].

Fig. 5. NSUNET station sites in urban Novi Sad. Blue dots: hourly Ta data from 12 urban sites; blue + green dots: 10-minute Ta and RH data from 27 urban and non-urban sites. Map source: https://a3.geosrbija.rs/.
NSUNET continuously collected raw Ta and RH values from July 2014 to February 2018. Data were collected using ChipCap 2 sensors, fully calibrated and developed by the General Electric Measurement & Control Company, and located in a ventilated radiation protection screen with dimensions of 200 × 240 mm. Based on the calibration certificate provided by the manufacturer, further calibration of the sensors was deemed unnecessary during the period of network operation. The accuracy of the Ta sensor was ±0.3 °C and that of the RH sensor was ±2% (20-80% RH). The sensors were installed at least 4 m above ground (with exceptions of ±0.2 m), on arms (50 cm long) fixed to selected lamp posts. Each measurement site, near the sensors, was equipped with a station containing a central processor, an EPROM chip used for data storage, a GPRS/EDGE/3G modem, a backup battery and a charger. Sensors at city sites had a direct power supply, with batteries charged while the street lights were powered on. Sensors measured the Ta and RH values every minute and every 10 min, with measured data sent to the main server located at the University of Novi Sad, Faculty of Sciences. At each site, the internal memory for the cloud storage held approximately 700,000 measurements (15 months), which kept the datasets more consistent in the face of problems with the mobile Internet provider or the server [1, 13]. The network ceased to operate in February 2018 due to the lack of institutional financial support for operating the network (particularly the cost of data transfer) and the lack of support from government institutions to maintain the hardware.
The distance (km) between the 12 urban sensors is presented in Table 4 as a dissimilarity matrix, computed using the Haversine function. The Haversine distance is the distance between two points on the surface of a sphere, where the coordinates of each point are given as a (latitude, longitude) pair. The result in radians is then converted to km. This type of matrix is used by a number of clustering algorithms to determine how close two points are to each other. This is useful, for example, when using spatial data to fill gaps, as one can set a distance threshold beyond which sites are not considered reliable for gap-filling support.
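A dissimilarity matrix of this kind can be sketched as follows. This is an illustration under stated assumptions rather than the code used to produce Table 4: the mean Earth radius of 6371.0088 km, the function names and the example coordinates are all assumptions (the coordinates are arbitrary points near Novi Sad, not the station locations from Table 1).

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0088  # assumed mean Earth radius


def haversine_km(point_a, point_b):
    """Great-circle distance in km between two (latitude, longitude) pairs in degrees."""
    lat1, lon1 = map(radians, point_a)
    lat2, lon2 = map(radians, point_b)
    a = (
        sin((lat2 - lat1) / 2) ** 2
        + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    )
    central_angle = 2 * asin(sqrt(a))  # angular distance in radians
    return central_angle * EARTH_RADIUS_KM  # radians converted to km


def distance_matrix(points):
    """Symmetric matrix of pairwise Haversine distances in km."""
    return [[haversine_km(p, q) for q in points] for p in points]


def neighbours_within(matrix, index, max_km):
    """Indices of the other sites no further than max_km from site `index`."""
    return [
        j for j, d in enumerate(matrix[index]) if j != index and d <= max_km
    ]


# Hypothetical coordinates for illustration only.
sites = [(45.255, 19.845), (45.262, 19.833), (45.240, 19.810)]
matrix = distance_matrix(sites)
```

With such a matrix, a gap-filling routine could call `neighbours_within(matrix, i, max_km)` to restrict donor sites to those within the chosen distance threshold.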

Technical specifications for network development
In terms of implementation, NSUNET used the file transfer protocol (FTP) to send measured data to the servers using a predefined structure. FTP is a well-established and highly reliable protocol for file transfer. One of the main objectives was to develop a back-end system (defined as the Core Segment) that would be sustainable for an extended period and adaptable to any operating system changes. Variable names were defined with 2-3 capital letters, followed by the character ':' and the measured value. Each measurement is terminated by the Unix EOL character (0x0A). This data structure provided a plaintext view of the measured data while allowing fast file parsing. Two variable types were introduced in NSUNET's design: a) climatological and b) debug. The former stored urban-climate data, while the latter produced information on the entire workings of the wireless network of stations (defined as the Remote Segment). The file delivered to the system used the naming convention station-id_date_time.txt. The Core Segment contained configuration files for each station, with the same data structure design. Thus, stations could be re-configured remotely. Each station was defined with its own unique ID, while each measurement contained its measurement session (MS) together with the time of measurement (time stamp, TS). There were roughly 28 different variables used in the ASCII (plaintext) data structure. Data were subjected to different QC methods as part of the Core Segment modules. The Core Segment itself was developed using open-source technologies, more specifically BASH shell scripting. To ensure the highest level of reliability at each point of failure (from the sensor up to the database server instance), an additional study was performed [13]. A highly reliable server-client model was developed to support this form of experimental study. Here, the Core Segment was stress tested with roughly 20,000 parallel connections. Above this load, the reliability of station data degrades, with frequent and varied errors in the inter-process communications (IPC) of the Linux systems.
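The plaintext structure described above lends itself to a very small parser. The sketch below is only an illustration of the idea: the text does not say how variables are separated within one measurement, so whitespace separation is an assumption, as are the variable names TA and RH, the sample values and the example file name. Only MS, TS, the NAME:value form, the newline terminator and the station-id_date_time.txt pattern come from the description.

```python
import re

# Variable names are 2-3 capital letters, per the description.
NAME_PATTERN = re.compile(r"[A-Z]{2,3}")


def parse_measurements(text):
    """Return one {name: raw string value} dict per newline-terminated measurement."""
    measurements = []
    for line in text.split("\n"):
        record = {}
        for token in line.split():  # assumed: variables separated by whitespace
            name, sep, value = token.partition(":")
            if sep and NAME_PATTERN.fullmatch(name):
                record[name] = value
        if record:
            measurements.append(record)
    return measurements


def parse_filename(filename):
    """Split 'station-id_date_time.txt' into (station_id, date, time) strings."""
    stem = filename[: -len(".txt")] if filename.endswith(".txt") else filename
    station_id, date_part, time_part = stem.rsplit("_", 2)
    return station_id, date_part, time_part


# Hypothetical payload: only MS and TS are named in the text.
SAMPLE = "MS:17 TS:1451606400 TA:1.2 RH:87.5\nMS:18 TS:1451607000 TA:1.1 RH:88.0\n"
records = parse_measurements(SAMPLE)
```

Values are kept as strings because the encoding of each variable (numeric format, time stamp format) is not specified in the text.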

Outlier analysis and gap filling
From the raw Ta values, measured at a 10-minute frequency, 1-hour datasets were extracted for the period 1st January 2016 to 31st December 2017 and subjected to quality control (QC) measures. The QC methodology comprised two main steps: logical outliers were removed, and missing temperature values were interpolated. The outlier detection process excluded all values higher than 50 °C or lower than −30 °C. Gap filling comprised three sub-methods based on linear interpolation. The complete quality control methodology, i.e. outlier detection and gap filling, is explained in the guideline [2].
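The two QC steps can be illustrated with a short sketch. The thresholds (50 °C and −30 °C) are taken from the text; the gap filling shown here is a plain linear interpolation in time between the nearest valid neighbours and stands in for, rather than reproduces, the three sub-methods of the guideline [2]. Function names are illustrative, and missing values are represented as None.

```python
def remove_outliers(values, low=-30.0, high=50.0):
    """Replace values above `high` or below `low` with None (missing)."""
    return [
        v if v is not None and low <= v <= high else None
        for v in values
    ]


def fill_gaps_linear(values):
    """Linearly interpolate runs of None between two known values.

    Gaps at the start or end of the series have only one known
    neighbour and are left as None.
    """
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        span = right - left
        for k in range(left + 1, right):
            fraction = (k - left) / span
            filled[k] = filled[left] + fraction * (filled[right] - filled[left])
    return filled


# Toy hourly series with one spike, one gap and one implausibly low value.
raw = [10.0, 99.9, None, -35.0, 13.0]
cleaned = remove_outliers(raw)
complete = fill_gaps_linear(cleaned)
```

Running the outlier step first means that rejected values become gaps and are then filled in the same pass as values that were missing from the start.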

Ethics Statements
The work did not involve the use of human subjects, animal experiments or data collected from social media platforms.

Data Availability
Hourly Air Temperature Datasets from city of Novi Sad -NSUNET system (Original data) (Zenodo).
10-minutes Air Temperature and Relative Humidity Datasets from city of Novi Sad -NSUNET system (Original data) (Zenodo).