Dataset of seismic ambient vibrations from the Quaternary Norcia basin (central Italy)

Central Italy was affected by a long seismic sequence in 2016 and 2017, characterized by five mainshocks with Mw > 5.0. The Mw 6.5 mainshock occurred on 30 October 2016 close to the town of Norcia, located in the intra-Apennine Norcia basin. Different degrees of damage were observed during this seismic crisis, caused by variable seismic shaking. This variability was also due to strong 1D and 2D variations of the Quaternary fluvio-lacustrine sediments infilling the basin. Following such considerations, a new geophysical dataset of seismic vibration measurements was acquired in the study area during the period April 2017-November 2019. We collected mainly single-station seismic noise data to infer the distribution of the resonance frequency (f0) across the basin. A total of 60 sites were measured to cover the entire extent of the basin. We deployed seismometers along three transects with a total length of 21 km, mostly along the main structural directions of the basin (i.e. NNW-SSE and NE-SW). Two 2D arrays of seismic stations with a helical geometry and a set of active MASW data were also acquired in the northern sector of the basin, in order to better constrain the seismic velocity of the sedimentary infilling. These new records have been integrated with available geological information in order to reconstruct the deep structure of the basin, as discussed in the research paper [2]. The entire dataset used in [2] is provided here, together with 7 additional records recovered for the basin (i.e. N54-N60) and ancillary open-source geospatial data. The dataset can be used for different purposes: specific research on the Norcia basin, comparative studies on similar areas around the world, development of new data modeling and testing of new analysis software, and as a training dataset for machine learning applications.


© 2020 The Author(s). Published by Elsevier Inc. This is an open access article under the CC BY license.

Value of the data
• Ambient seismic vibrations (noise hereinafter) can be used to determine the properties of the noise wavefield, and to compute the resonance frequency (f0) using H/V spectral ratios.
• The dataset can help in the reconstruction of the complex stratigraphic architecture and buried substrate of the Norcia basin.
• Data can be cross-checked with numerical data, and used to model active and passive data in a basin environment.
• Researchers, professional geologists and private companies interested in the study of the basin (e.g. seismic response) and post-earthquake recovery of the Norcia area can benefit from these data.
• Future seismic data acquisition can integrate our dataset, to refine the knowledge of the buried geology of the Norcia basin, as well as to improve the understanding of similar basin environments.
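For readers unfamiliar with the H/V technique mentioned in the first point, one common convention is sketched below. It is only an illustration: the quadratic mean of the two horizontal components is an assumed choice (other averaging schemes exist), and the processing settings actually used in [2] are not restated here. N(f), E(f) and Z(f) denote the amplitude spectra of the NS, EW and vertical components, and f0 is taken, for simplicity, as the frequency of the dominant peak.

```latex
\[
  \mathrm{H/V}(f) \;=\;
  \frac{\sqrt{\tfrac{1}{2}\left( |N(f)|^{2} + |E(f)|^{2} \right)}}{|Z(f)|},
  \qquad
  f_{0} \;=\; \arg\max_{f}\; \mathrm{H/V}(f)
\]
```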

Data
The dataset reported in this work mainly consists of ambient vibration measurements carried out at a total of 60 sites covering the intra-mountain Norcia basin in central Italy (Fig. 1). The data were collected during the period 2017-2019, following the 2016-2017 seismic sequence that struck the area. The region is located in the Apennine chain, characterized by a Quaternary extensional tectonic regime that reactivates high-angle normal faults capable of generating earthquakes up to Mw = 6.5 [1]. The subsurface architecture of the study area is complex and poorly known, owing to the lack of geophysical data and the absence of deep well stratigraphy. This consideration motivated the acquisition of the new geophysical data presented here, encompassing the seismic records used in [2] and seven additional seismic recordings (Table 1). The seismic recordings are densely distributed across the whole Norcia basin and were collected using pairs of commercial seismic digitizers and velocimeters in similar atmospheric conditions (sunny days with no significant wind). The dataset includes passive single-station measurements, 2D arrays, active seismic profiles and georeferenced information, which have been carefully organized (see the section Data assembly) and stored in the Mendeley repository [3].

Experimental design
Seismic vibration measurements were carried out at a total of 53 sites during the fieldwork [2], whereas 7 additional sites have been added in the present paper. The measurement strategy was designed to cover the whole Norcia basin. In particular, three main transects were planned along the structural orientations of the tectonically controlled basin (Fig. 1). The longest transect (9 km long) was planned along the NNW-SSE direction, which is the elongation direction of the basin, parallel to the trend of the main fault (Norcia-Nottoria-Preci fault). Two other minor transects (ca. 4 km long) were designed along the NE-SW and W-E directions in the northern and southern sectors of the basin, respectively (Fig. 1). Additional acquisitions were carried out in the sectors not covered by the transects and along the borders of the basin. As an example, Fig. 2 shows the N09 measurement in the northern sector of the basin, close to the carbonate substrate.
The eleven stations N21-N31 and the ten stations N32-N41, installed along N-S and W-E oriented transects, respectively, in the southern part of the basin, recorded the seismic noise simultaneously for about 2 h. For this reason, such stations can be treated as two linear arrays in passive acquisition, whose data are potentially useful for obtaining information on the velocity model in the southern part of the Norcia basin (Fig. 3). A possible approach to derive the 1D model from these data is ambient noise cross-correlation analysis, as done in [2, 4].
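The core operation behind such an analysis can be illustrated with a minimal sketch, which is not the processing chain used in [2, 4]. It assumes two synchronized, equally sampled traces and only shows how a time-domain cross-correlation recovers the delay between two stations; real ambient-noise workflows add windowing, spectral whitening and stacking over long time spans. The function names and the synthetic traces are hypothetical.

```python
import math
import random


def cross_correlation(a, b, max_lag):
    """Normalised cross-correlation c[lag] = sum_i a[i] * b[i + lag], |lag| <= max_lag."""
    n = min(len(a), len(b))
    a, b = a[:n], b[:n]
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    cc = {}
    for lag in range(-max_lag, max_lag + 1):
        total = 0.0
        # Only sum over samples where both a[i] and b[i + lag] exist.
        for i in range(max(0, -lag), min(n, n - lag)):
            total += a[i] * b[i + lag]
        cc[lag] = total / norm if norm > 0 else 0.0
    return cc


def delay_seconds(a, b, max_lag, delta):
    """Lag (in seconds) at which b best matches a; positive means b is delayed."""
    cc = cross_correlation(a, b, max_lag)
    return max(cc, key=cc.get) * delta


# Synthetic check (assumed data): station B sees the same noise as A, 5 samples later.
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(600)]
trace_a = noise[5:]
trace_b = noise[:-5]
print(delay_seconds(trace_a, trace_b, max_lag=20, delta=0.004))
```

With a sampling step of 4e-3 s, the 5-sample shift corresponds to a delay of about 0.02 s. The delay measured between station pairs at known distances is what constrains the propagation velocity.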
Four 2D helical passive arrays were deployed at two sites in the northern part of the Norcia basin (Fig. 1): in the palustrine area named "Marcite" and in the old alluvial fan sector known as "Fontevena". For each array we deployed 11 seismological stations arranged in a helical geometry (Fig. 4). The seismic stations were synchronized using GPS antenna receivers. At both sites, we designed one minor array (about 150 m of maximum aperture) and a larger one [...]
Table 1. Information on the single-station (N) and array datasets. The table is provided in OpenDocument ".ods" and comma-separated values ".csv" formats in the supplementary material.
[...] [2] in the Marcite area was done using the mean HV curve computed with the recordings of the MS array. For the Fontevena site, this process was done using N04, very close to the FS09 station of the FS array, as a representative HV curve. This decision was taken because of a suspected low-frequency (< 1 Hz) bias that occurred during the recording of the FB and FS arrays.
Fig. 1. Location map of the study site (Norcia basin). The blue dots display the location of the single-station noise measurements across three main cross-sections (black lines), overlying a high-resolution Digital Elevation Model as basemap [11]. The red and blue circles give the position of the arrays in the northern sector, whilst the yellow dashed line is the velocity model boundary used in [2].
Active multichannel seismic records were also acquired at the Marcite site (Fig. 5), close to the arrays and to the single-station measurements N11, N59 and N60. The survey was carried out using a "Do.Re.Mi" seismograph (Sara Electronic Instruments s.r.l.) equipped with 12 channels linked to vertical 4.5 Hz geophones (Sara Electronic Instruments s.r.l.). The data were recorded using two different linear configurations of geophones named "M2" and "M4" (22 m and 44 m long, respectively).
In the first configuration (M2), we used a geophone spacing of 2 m and four energizations generated through vertical impacts on a metallic plate, using a 5 kg sledgehammer. For M2, two shots were made on the north side close to geophone 1 (G1), and two more on the south side close to geophone 12 (G12), with minimum offsets of 2 and 4 m, respectively (Fig. 5). In M4 we increased the geophone spacing to 4 m, using six source points with offsets of 4, 6 and 8 m on both the north and south sides (close to the G1 and G12 geophones). Table 2 summarizes all the operative parameters related to the offset, geophone spacing and filenames. Each common shot gather encompasses 12 seismic traces, and was collected using a time window of 2 s and a sampling frequency of 1000 Hz. The dataset can potentially be analyzed using different techniques. However, our field setup was designed to analyze the dispersive behavior of the shallow subsurface (e.g. Multichannel Analysis of Surface Waves, MASW [5]).
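The geometry described above can be checked with a short sketch. It is only illustrative: the coordinate convention (G1 at 0 m, positive towards G12) and all function names are assumptions, and the source position in the example is one arbitrary shot located 2 m beyond G1.

```python
SAMPLING_HZ = 1000   # sampling frequency stated in the text
WINDOW_S = 2         # recording window stated in the text
N_GEOPHONES = 12     # channels of the seismograph


def geophone_positions(spacing_m, n_geophones=N_GEOPHONES):
    """Positions along the line in metres, with G1 at 0 m (assumed convention)."""
    return [i * spacing_m for i in range(n_geophones)]


def source_receiver_offsets(source_x_m, positions):
    """Absolute source-receiver distance for every geophone."""
    return [abs(x - source_x_m) for x in positions]


m2 = geophone_positions(2)   # G12 at 22 m
m4 = geophone_positions(4)   # G12 at 44 m
samples_per_trace = SAMPLING_HZ * WINDOW_S

# Example shot: source 2 m beyond G1, i.e. at x = -2 m in this convention.
offsets = source_receiver_offsets(-2, m2)
print(m2[-1], m4[-1], samples_per_trace, offsets[0], offsets[-1])
```

Twelve geophones give eleven intervals, which is why the 2 m and 4 m spacings yield the 22 m and 44 m array lengths quoted for M2 and M4.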

Equipment
All the seismic vibration records (single stations and arrays) were measured using Reftek130 digitizers coupled to Le3d5s velocimeters. At some of these sites, we co-located (i.e. at the same place) a SARA Geobox (with 4.5 Hz and 0.5 Hz geophone triplets) (Fig. 2). At a few other points, measurements were repeated at different times and in slightly different positions, a few meters apart (Table 1). All measurements using the Reftek130 digitizer were provided with a GPS antenna and are therefore synchronized with the UTC reference time. The measurements using the SARA Geobox were recorded without a GPS antenna, and are therefore provided without UTC time synchronization. The amplitude of all files in the repository is a velocity in meters per second (m/s). The time series were originally in digital counts, but for homogeneity we store the entire dataset in m/s after applying the instrumental corrections.
Table 2. Main acquisition parameters of the active multichannel configurations used at the Marcite site (OpenDocument ".ods" and comma-separated values ".csv" formats are included in the supplementary material). Columns: Label (data filename); Geophone spacing (m); S-G1 offset (m).

Data assembly
The passive ambient vibration measurements are provided in binary SAC format [6]. In the repository [3], the SAC binary files are named, for example, 171090945_R.N01E (see Table 1). The first part of the name indicates the start time of the recording following the scheme YYJDHHMM, where YY, JD, HH and MM stand for the year (17 means 2017), the Julian day (109 in the example), and the starting hour (09, UTC time) and minute (45) of the acquisition, respectively. Because we used two types of equipment, a flag follows the time: _R indicates a noise recording performed with a Reftek130 coupled to a Le3d5s velocimeter, whilst _S indicates a recording performed with a Sara Geobox. The second part of the file name, after the dot, gives the code name of the temporary station (N01 in the previous example), and the last letter indicates the component of the ground motion (E, N and Z mean the EW, NS and UP components, respectively). Because our dataset is composed of three-component measurements, each site always has three files (following the previous example, 171090945_R.N01e, 171090945_R.N01n and 171090945_R.N01z). The SAC binary format is common in the seismological community, and it is used within the SAC software (Seismic Analysis Code, http://ds.iris.edu/files/sac-manual/; [6]), an interactive program designed for the study of seismic signals, especially time-series data. The software can be requested by following the instructions on the web page http://ds.iris.edu/ds/nodes/dmc/forms/sac/ (last accessed on 2020/04/04). The SAC binary format is more convenient than ASCII because the file size is smaller. Furthermore, the SAC binary format keeps other important information in its headers, for example NPTS (number of samples in the time series), DELTA (sampling step in seconds; e.g. 4e-3 corresponds to 250 Hz), KZTIME (begin time in the format hours, minutes, seconds and milliseconds; e.g. 09:45:42.000), STLA (latitude of the measurement point in decimal degrees; e.g. 4.281351e+01), STLO (longitude; e.g. 1.311644e+01), KSTNM (station name, set equal to the code in the file name; e.g. N01) and KCMPNM (component of ground motion; e.g. E for the EW component).
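The naming scheme described above can be decoded programmatically. The following is a minimal sketch rather than a tool shipped with the dataset: the function name and the returned dictionary keys are hypothetical, the century is assumed to be 2000 (the recordings span 2017-2019), and the component letter is accepted in either case because both appear in the examples above. For _S files the decoded time should be read as nominal, since those recordings are not UTC-synchronized.

```python
import re
from datetime import datetime, timedelta, timezone

# YYJDHHMM _ flag . station component, e.g. 171090945_R.N01E
_NAME = re.compile(r"^(\d{2})(\d{3})(\d{2})(\d{2})_([RS])\.(\w+)([ENZenz])$")

INSTRUMENTS = {"R": "Reftek130 + Le3d5s", "S": "Sara Geobox"}
COMPONENTS = {"E": "EW", "N": "NS", "Z": "UP"}


def parse_record_name(name):
    """Split a single-station file name into its documented fields."""
    match = _NAME.match(name)
    if match is None:
        raise ValueError("unexpected file name: %r" % name)
    yy, jd, hh, mm, flag, station, comp = match.groups()
    # Julian day 1 is 1 January, hence the "- 1" (assumes years 20YY).
    start = datetime(2000 + int(yy), 1, 1, tzinfo=timezone.utc) + timedelta(
        days=int(jd) - 1, hours=int(hh), minutes=int(mm)
    )
    return {
        "start": start,
        "instrument": INSTRUMENTS[flag],
        "station": station,
        "component": COMPONENTS[comp.upper()],
    }


info = parse_record_name("171090945_R.N01E")
print(info["start"].isoformat())  # 2017-04-19T09:45:00+00:00
```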
The SAC binary format is read automatically by other software commonly used for the analysis of seismic data, such as the open-source code geopsy (www.geopsy.org, last accessed on 04/04/2020). Geopsy is a widely used tool for analyzing passive data [7]. In any case, the SAC binary format can easily be converted to ASCII files using software such as geopsy itself (e.g. the command line "geopsy 171101313_R.N19E -export file_output.txt" converts a SAC file into a one-column ASCII file).
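For users who prefer not to install dedicated software, the header fields listed above can also be read with a few lines of standard-library code. The sketch below is an assumption-laden illustration, not part of the dataset: it relies on the standard SAC binary layout (a 632-byte header made of 70 floats, 40 integers and 192 bytes of text, followed by NPTS 4-byte float samples), the function names are hypothetical, and KZTIME is rebuilt from the integer time fields NZHOUR, NZMIN, NZSEC and NZMSEC.

```python
import struct

SAC_HEADER_BYTES = 632  # 70 floats + 40 ints + 192 bytes of text


def read_sac(path):
    """Return (header dict, list of samples) from an evenly sampled SAC binary file."""
    with open(path, "rb") as fh:
        raw = fh.read()
    if len(raw) < SAC_HEADER_BYTES:
        raise ValueError("file too short to be a SAC binary file")
    # NVHDR (header version, word 76) is a small integer: use it to detect byte order.
    for endian in ("<", ">"):
        nvhdr = struct.unpack_from(endian + "i", raw, 76 * 4)[0]
        if 1 <= nvhdr <= 7:
            break
    else:
        raise ValueError("unrecognised SAC header version")
    floats = struct.unpack_from(endian + "70f", raw, 0)
    ints = struct.unpack_from(endian + "40i", raw, 280)
    text = raw[440:SAC_HEADER_BYTES]
    header = {
        "DELTA": floats[0],
        "STLA": floats[31],
        "STLO": floats[32],
        "NPTS": ints[9],
        "KZTIME": "%02d:%02d:%02d.%03d" % (ints[2], ints[3], ints[4], ints[5]),
        "KSTNM": text[0:8].decode("ascii", "replace").strip(" \x00"),
        "KCMPNM": text[160:168].decode("ascii", "replace").strip(" \x00"),
    }
    samples = struct.unpack_from(endian + "%df" % header["NPTS"], raw, SAC_HEADER_BYTES)
    return header, list(samples)


def write_one_column(samples, path):
    """Write the samples as a one-column ASCII file (values in m/s for this dataset)."""
    with open(path, "w") as fh:
        for value in samples:
            fh.write("%.9e\n" % value)
```

A call such as `header, samples = read_sac("171101313_R.N19E")` followed by `write_one_column(samples, "file_output.txt")` would then produce a one-column text file comparable to the geopsy export mentioned above.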
The data of the 2D arrays (MB, MS, FB and FS; acronyms for Marcite Big, Marcite Small, Fontevena Big and Fontevena Small, Fig. 1), as described in the main text of [2], keep the same format as the single-station measurements, except that the time indicated in the name does not correspond to the starting time of the stored files. This is because all the 2D array data have already been synchronized and trimmed (setting the begin header in the SAC file equal to zero), and therefore the dataset of each array is ready to be processed for array analysis.
The active multichannel data are provided as SEG-Y files [8], obtained after conversion of the proprietary *.drm format through the GEOEXPLORER software (Sara Electronic Instruments S.R.L.). The filename in the dataset describes basic information: for example, in "M2-2" the first part indicates the linear array configuration (Marcite area, M) and the geophone spacing in meters (2), whilst "-2" indicates the source (S) to geophone (G1) minimum offset in meters (an underscore divides the filename for a positive offset along the array, e.g. "M2_26").
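As an illustration of this labelling, the sketch below decodes a shot label into the geophone spacing and the source position. The interpretation is an assumption based on the description above and should be checked against Table 2: the number after "-" is read as a source position on the G1 side (negative coordinate, with G1 at 0 m), the number after "_" as a positive coordinate measured from G1, and the line is assumed to hold 12 geophones. The function name and dictionary keys are hypothetical, and any file extension must be removed before parsing.

```python
import re

MINUS = "\u2212"  # typographic minus, tolerated alongside the ASCII hyphen
_LABEL = re.compile(r"^M(\d+)([-_" + MINUS + r"])(\d+)$")


def parse_shot_label(label, n_geophones=12):
    """Decode e.g. 'M2-2' or 'M2_26' (assumed sign convention, see lead-in)."""
    match = _LABEL.match(label)
    if match is None:
        raise ValueError("unexpected shot label: %r" % label)
    spacing = int(match.group(1))
    value = int(match.group(3))
    # Underscore: positive coordinate from G1; hyphen/minus: source before G1.
    source_x = value if match.group(2) == "_" else -value
    geophones = [i * spacing for i in range(n_geophones)]
    return {
        "spacing_m": spacing,
        "source_x_m": source_x,
        # Distance from the source to the nearest geophone.
        "min_offset_m": min(abs(source_x - g) for g in geophones),
    }


print(parse_shot_label("M2-2"))
print(parse_shot_label("M2_26"))
```

Under this reading, "M2_26" places the source 26 m from G1, that is 4 m beyond G12 at 22 m, which matches the 4 m minimum offset quoted for the M2 shots close to G12.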
Together with the seismic records, we provide ancillary information in the form of a geospatial dataset, distributed as an open-source GIS project (EPSG: 32633) created with the QGIS software (https://qgis.org/en/site/, last access April 2020). The project includes 18 vectors (EPSG: 4326) and one OpenStreetMap (OSM) basemap (EPSG: 3857). In addition, we provide each layer as separate GeoPackage (*.gpkg) and Google Keyhole Markup Language (*.kml) files. This geospatial dataset contains the location and geometry of the seismic surveys carried out in the Norcia basin, together with some layers related to the paper [2].
The layer NOI (cyan points) includes all the single-station measurement points recorded with the Reftek and Sara equipment. The layers FB, FS, MB and MS report the locations of the four helical arrays [2]. The orange points mark the two big arrays (Fontevena Big, FB, and Marcite Big, MB); the green points mark the two small arrays (Fontevena Small, FS, and Marcite Small, MS). The three vectors "Section_S1", "Section_S2" and "Section_S3" and the layer "Velocity_mod_boundary" are the cross-sections and the velocity model boundary of [2]. The layer groups M2 and M4 gather the information related to the active surveys. The vectors M2_G1-G12 and M4_G1G12 represent the shorter and longer linear seismic arrays, respectively (with geophone G1 to the north and G12 to the south). The start and end points of each array are displayed by vectors named "filename_p" (e.g. M2_G1-G12). The positions of the seismic sources are also reported as red point vectors (e.g. filename "S_M2_G1"), labelled with the minimum offset information. A Web Map Service (WMS) layer from OpenStreetMap is also provided as a basemap [9, 10]. The service is freely available from the website (www.openstreetmap.org, last access April 2020) and is integrated in the QGIS project through the OpenLayers plugin (//plugins.qgis.org/plugins/openlayers_plugin/, last access April 2020).

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships which have, or could be perceived to have, influenced the work reported in this article.
Norcia area is provided as Web Map Service (WMS) in the QGIS project using the OpenLayers plugin (Copyright (c) 2010-2017 Pirmin Kalberer & Mathias Walker, Sourcepole AG). This map is available under a Creative Commons CC BY-SA licence and is copyrighted by the OpenStreetMap contributors (www.openstreetmap.org). We sincerely thank all the authors who provide the open data and codes that allow us to work on such research projects. This work was written by the authors while working remotely during the COVID-19 emergency (Italy, March-April 2020).

Supplementary materials
Supplementary material associated with this article can be found online at https://www.sciencedirect.com/science/article/pii/S235234092030603X?via%3Dihub