Sampling data of macro-invertebrates collected in grasslands under restoration succession in a lowland stream-valley system

Abstract
Background
Publication of data from past field studies on invertebrate populations is of high importance, as such data have considerable added value as baselines for studying spatiotemporal population and community dynamics in these groups. Therefore, a dataset consisting of occurrence data on epigaeic invertebrates collected in 1996 was standardised into the Darwin Core format and cross-checked in order to make it publicly available following FAIR data principles. With publication, it can contribute to the biodiversity assessment of terrestrial invertebrates, thereby improving the availability and accessibility of much-needed historical datasets on macro-invertebrates. Here, we present sampling event data on invertebrates from four grasslands taken out of agricultural production over the span of several decades, effectively displaying a chronosequence on the effects of agricultural extensification. The data were collected by means of a standardised sampling design using pyramid traps, pitfall traps and soil samples.
New information
The raw data presented in this data paper have not been published before. They consist of 20,000+ records of nearly 70,000 specimens from 121 taxonomic groups. The data were collected using a standardised field study set-up and specimens were identified by taxonomic specialists. Most groups were identified up to family level, with eight groups identified up to species level. The occurrence data are complemented by information on plant composition, meteorological data and soil physical characteristics. The dataset has been registered in the Global Biodiversity Information Facility (GBIF): http://doi.org/10.15468/7n499e


Introduction
In 1996, a nine-month sampling programme was carried out in four semi-natural grasslands in the northern part of the Netherlands with the aim of studying the effects of restoration management in former agricultural grasslands on the composition of below- and above-ground macro-invertebrate communities (Hemerik and Brussaard 2002). The four grasslands had been taken out of agricultural production after a period of intensive land use with high levels of nutrient amendments. At the time of sampling, the grasslands formed a restoration succession chronosequence of 7, 11, 24 and 29 years post-agricultural management. The main aim of this extensification programme was to restore former botanical richness by reducing nutrient levels in the soil (Berg and Hemerik 2004). The 1996 field sampling programme was set up in order to study the effects of restoration management on soil and ground-dwelling insects and spiders. It was suspected that changes in plant biomass production and alterations in litter quantity and quality stemming from the cessation of nutrient amendments would affect macro-invertebrate populations (Hemerik and Brussaard 2002). In addition, changes in vegetation structure, affecting microclimate, might have consequences for soil and ground-dwelling animal groups. Therefore, the researchers wanted to explore whether they could find differences in species diversity and composition of macro-invertebrates in the chronosequence.
The sampling plots were situated in the valley of the brooklet Anlooer Diepje, a largely intact natural stream that forms part of the stream-valley system Drentsche Aa, which is situated in the north-eastern part of the Netherlands in the Province of Drenthe (Liu et al. 2021). The fields surrounding the brooklet mainly consist of grasslands on loamy sand with a slightly acidic pH. The field study set-up involved the use of photoeklektors, also known as pyramid emergence traps, to sample airborne insects emerging as imago from the soil. Inside the tent of the pyramid emergence trap, two pitfall traps were dug into the ground.
The soil that was dug out for placing the pitfalls was taken to the laboratory as soil samples to hand-sort for macro-invertebrates. The pyramid and pitfall traps were emptied weekly and moved randomly to a new location within the experimental plot every two weeks. Concurrently, a number of environmental variables were collected, including soil physical characteristics, soil temperature, as well as plant composition in the experimental plots (see Table 3 for a complete list). The sampling programme ran for nine months, from 11 March until 23 December 1996. This resulted in a dataset containing 21,282 records of 71,415 specimens. All invertebrate specimens were identified up to at least the taxonomic level of order. In total, 121 taxonomic groups were distinguished (see Table 3). Terrestrial isopods (Isopoda), millipedes (Diplopoda), centipedes (Chilopoda), spiders (Araneae), harvestmen (Opiliones), click beetles (Elateridae), ground beetles (Carabidae) and weevils (Curculionidae/Brentidae) were identified up to species level.
In order to make the data publicly accessible and (re)usable following FAIR data objectives, they have been reviewed and cross-checked, as well as reorganised and standardised following the Darwin Core Standard (Wieczorek et al. 2012). The data have subsequently been published on GBIF (Hemerik and Creuwels 2023): https://doi.org/10.15468/7n499e

Study area description:
The study area is located in the Drentsche Aa stream valley in the Province of Drenthe in the north-eastern part of the Netherlands (Fig. 1). The Drentsche Aa is a 30 km long stream valley that has had a long history of agricultural exploitation with high levels of nutrient amendments and large alterations in the hydrological functioning of the Drentsche Aa stream and its tributaries, amongst which the Anlooer Diepje brooklet (Liu et al. 2021).
The four field study plots, all located along the banks of the Anlooer Diepje (Fig. 1), consisted of grass-dominated vegetation, with Lolium perenne L., Holcus lanatus L., Festuca rubra L. and Anthoxanthum odoratum L., respectively, dominating in the chronosequence gradient. Management consisted of mowing once a year and the resulting hay was removed (Verschoor et al. 2001).

Funding:
The reorganisation and publication of the data were made possible with a grant from NLBIF, the Netherlands Biodiversity Information Facility (grant number nlbif2021.003).

Description:
The sampling programme was designed as a block field study set up within randomly selected experimental plots that were situated in the four selected grasslands of the chronosequence.

Sampling description:
Within each of the four grasslands taken out of agricultural production, a plot measuring 30 m × 15 m was selected for the sampling to take place. Each plot was divided into three subplots (10 m × 15 m). One pyramid trap was placed within each of the subplots (i.e. 3 pyramid traps per experimental plot) and moved to a random new location within the subplot every two weeks. Inside the pyramid trap, two pitfall traps were placed (i.e. 6 pitfall traps per experimental plot), flush with the soil surface. The soil samples that were dug out (i.e. 12 soil samples per experimental plot) to accommodate the pitfall traps were collected in plastic bags and taken to the laboratory in Wageningen, stored at a temperature of 4°C and hand-sorted for soil macrofauna. Both pyramid and pitfall catches were collected on a weekly basis and soil samples on a two-weekly basis. In total, 12 pyramid traps were placed in the four plots, together with 24 pitfalls and 48 soil samples. See Table 1 for details.
The pyramid traps consisted of an opaque grey plastic ring (diameter of 56 cm). Above this ring, a gauze net made of white net fabric formed an inverted funnel-shaped dome. On top of the funnel dome, a white plastic trapping container in the shape of a ring functioned as a lightfall, bringing the total height of the pyramid trap to 80 cm (Fig. 2). The lightfall at the top contained a 2% formaldehyde (CH2O) solution, which served as a killing preservative.
The pyramid traps were designed to catch flying insects that were either residing in the grass tussocks at the moment of placement or, more specifically, hatched from the soil during the time the trap was placed at each location. The pitfall traps consisted of white plastic cups (0.5-litre yoghurt cups) with a diameter of 9 cm. The pitfalls were also filled with a 2% formaldehyde solution.
Table 1. Details on sampling plots (named K, O, C and B), years since extensification (YSE) and trap and soil sample numbers per plot. Note that soil samples are divided into up-soil (top 10 cm) and low-soil (10-20 cm deep) samples, but numbers and plots are the same for both types.
Figure 2. Sampling was set up with pyramid traps and pitfall traps. The pyramid trap was 80 cm high and had a diameter of 56 cm. Light traps and pitfalls were emptied weekly. Soil samples were taken every two weeks and, at the same time, the light traps and pitfalls were moved to a new location within the experimental plot.
The specimens that were collected during sampling were stored in 70% ethanol, awaiting identification. The samples were destroyed after identification.
Table 1 provides an overview of the four plots and the trap numbers that were placed in each sampling plot.
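The trap and sample numbers follow directly from the nested design (plot, subplot, pyramid trap, pitfall trap, soil layer). The minimal sketch below recomputes them as a consistency check; the constant and function names are illustrative assumptions and do not come from the dataset, and the years since extensification are those given in the Figure 1 caption.

```python
# Years since extensification (YSE) per plot, as stated in the Figure 1 caption.
PLOT_YSE = {"O": 7, "B": 11, "C": 24, "K": 29}

# Design constants from the sampling description (names are illustrative).
SUBPLOTS_PER_PLOT = 3          # each 30 m x 15 m plot holds three 10 m x 15 m subplots
PYRAMIDS_PER_SUBPLOT = 1       # one pyramid trap per subplot
PITFALLS_PER_PYRAMID = 2       # two pitfall traps inside each pyramid trap
SOIL_LAYERS_PER_PITFALL = 2    # up-soil and low-soil sample per pitfall hole


def counts_per_plot():
    """Return the number of traps and soil samples in one experimental plot."""
    pyramids = SUBPLOTS_PER_PLOT * PYRAMIDS_PER_SUBPLOT
    pitfalls = pyramids * PITFALLS_PER_PYRAMID
    soil_samples = pitfalls * SOIL_LAYERS_PER_PITFALL
    return {
        "pyramid_traps": pyramids,
        "pitfall_traps": pitfalls,
        "soil_samples": soil_samples,
    }


def counts_total(n_plots=len(PLOT_YSE)):
    """Scale the per-plot counts to all plots in the chronosequence."""
    return {name: n * n_plots for name, n in counts_per_plot().items()}


if __name__ == "__main__":
    print("per plot:", counts_per_plot())
    print("all plots:", counts_total())
```

With four plots this gives 12 pyramid traps, 24 pitfall traps and 48 soil samples, matching the totals in the sampling description.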
Additional data were collected during the sampling programme on abiotic and botanical variables (see Table 2 for an overview). In addition, photographs of the research plots taken in 2021 are provided in Suppl. material 1, together with an overview of botanical composition in 1996 and in 2021.

Variable | Unit
Wet weight of upper 7.5 cm of the soil core (WWu) | g
Wet weight of the soil core 7.5-15 cm deep (WWl) | g

As part of a study on nematodes in the same plots, Verschoor et al. (2001) collected abiotic data, namely bulk density, clay-silt percentage, pH-H2O, C-pool, N-pool and C/N ratio (see their table 1).
Environmental variables (a total of 10) can be found in the GBIF Dataset "Macro-invertebrate survey Drentsche Aa 1996". The plant composition is in Suppl. material 1.

Geographic coverage
Description: The study area was located in the Province of Drenthe, in the north-eastern part of the Netherlands. The plots O, B, C and K each have their own pair of coordinates in GBIF. For the location of these plots on the map, we refer to Fig. 1.

Taxonomic coverage
Description: After collection, the specimens were identified by taxonomic specialists. All specimens were first sorted and identified to order level; true flies (Diptera) and beetles (Coleoptera) were further identified to family level by Lia Hemerik. Arnold Spee and Theo van Dijk assisted with field sampling and identification of the carabid beetles. Matty Berg identified terrestrial isopods (Isopoda), millipedes (Diplopoda) and centipedes (Chilopoda). Aart Noordam identified spiders (Araneae) and harvestmen (Opiliones). Michael Traugeot identified beetle larvae. Theodoor Heijerman identified weevils (Curculionidae/Brentidae).
Before publication of the dataset, all original taxonomic assignments were cross-checked with the Checklist Dutch Species Register - Nederlands Soortenregister (Creuwels and Pieterse 2023) and provided with a currently valid assignment when necessary. For a complete list of taxa included in the dataset, see Table 3.

Notes:
The field study started on 11 March 1996 and was finished on 23 December 1996. Sampling points were created each week when traps were collected.
Comments or notes about the taxon or name.

Usage licence
type (Event core) event.
verbatimIdentification (Occurrence extension) the taxonomic identification as it appeared in the original record.

Figure 1. Location of the four grasslands in the valley of Anlooërdiepje in the north-eastern part of the Netherlands (see inset below left). The four black arrows point at the four experimental plots, indicated in black. The numbers in circles indicate the number of years the grassland had been out of agricultural production at the time of sampling in 1996: plot O (7 years since last fertilisation), plot B (11 years since last fertilisation), plot C (24 years since last fertilisation) and plot K (29 years since last fertilisation). On each grassland, a randomly selected rectangle of 30 m by 15 m served as the sampling plot. See Suppl. material 1 for more information on plant composition.
Dry weight of upper 7.5 cm of the soil core (DWu) | g
Dry weight of the soil core 7.5-15 cm deep (DWl) | g
Moisture content of the soil core 0-7.5 cm deep (= 100 * (WWu-DWu)/WWu) | %
Moisture content of the soil core 7.5-15 cm deep (= 100 * (WWl-DWl)/WWl) | %

Wet and dry weight were measured from the soil cores collected for placement of the pitfall traps. Furthermore, mean air temperature and total weekly precipitation in the week preceding a collection event were calculated from data available from the KNMI Meteorological Station Eelde (KNMI 2023), which is situated approximately 11 km from the research site. The measurements are connected to the eventIDs and published as a MeasurementOrFact table.
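Table 2 defines soil moisture content as 100 * (wet weight - dry weight) / wet weight, calculated separately for the upper and lower part of the soil core. The following minimal sketch shows that calculation; the function name and the example weights are illustrative assumptions and are not values from the dataset.

```python
def moisture_content(wet_weight_g, dry_weight_g):
    """Return soil moisture content in % of wet weight.

    Implements 100 * (WW - DW) / WW, the definition used for both the
    upper (WWu, DWu) and lower (WWl, DWl) part of the soil core.
    """
    if wet_weight_g <= 0:
        raise ValueError("wet weight must be positive")
    if dry_weight_g < 0 or dry_weight_g > wet_weight_g:
        raise ValueError("dry weight must lie between 0 and the wet weight")
    return 100.0 * (wet_weight_g - dry_weight_g) / wet_weight_g


if __name__ == "__main__":
    # Hypothetical core: 250 g wet and 200 g dry for the upper layer.
    print(moisture_content(250.0, 200.0))
```

Because the result is expressed relative to wet weight rather than dry weight, it always lies between 0 and 100%.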

Table 3. Taxonomic list of all species caught during the 1996 sampling programme. The list is organised alphabetically from highest to lowest taxonomic level; first by phylum, then class, order and finally by taxon. Note that taxon contains three levels of identification (subfamily, genus and species), which are dealt with in this sequence within the groups.
The dataset is published in the Global Biodiversity Information Facility platform, GBIF (Hemerik and Creuwels 2023). It is set up as a sampling event dataset with a three-part structure: eventID, occurrenceID and MeasurementOrFact. The dataset is published as a Darwin Core Archive (DwCA). The core data file contains 2,898 events with 21,887 occurrences. The GBIF IPT (Integrated Publishing Toolkit, Version 2.5.6) serves as the data repository. The table below provides descriptions of the column labels used. Note that labels are entered in alphabetical order, not in the order they are provided in the DwCA; MoF stands for Measurement-or-Fact extension.
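In a sampling event dataset of this kind, occurrence and MoF records point back to the event core through a shared eventID. The sketch below illustrates that linkage with Python's standard csv module. The three inline tables are hypothetical stand-ins for the files in the archive: the rows, identifiers and measurement values are invented for illustration, and the column selection (standard Darwin Core terms such as individualCount and measurementType) is an assumption about the published files, not a description of them.

```python
import csv
import io

# Hypothetical miniature versions of the three DwCA tables (invented rows).
EVENT_CSV = """eventID,eventDate,samplingProtocol
EV1,1996-03-18,pitfall trap
EV2,1996-03-18,pyramid trap
"""
OCCURRENCE_CSV = """occurrenceID,eventID,scientificName,individualCount
OC1,EV1,Carabidae,3
OC2,EV1,Araneae,5
OC3,EV2,Diptera,12
"""
MOF_CSV = """eventID,measurementType,measurementValue,measurementUnit
EV1,mean air temperature,4.2,degrees Celsius
"""


def read_rows(text):
    """Parse CSV text into a list of dictionaries keyed by column label."""
    return list(csv.DictReader(io.StringIO(text)))


def link_by_event(events, occurrences, measurements):
    """Group occurrence and MoF rows under the event they belong to."""
    linked = {
        row["eventID"]: {"event": row, "occurrences": [], "measurements": []}
        for row in events
    }
    for row in occurrences:
        linked[row["eventID"]]["occurrences"].append(row)
    for row in measurements:
        linked[row["eventID"]]["measurements"].append(row)
    return linked


def specimens_per_event(linked):
    """Sum individualCount over all occurrences of each event."""
    return {
        event_id: sum(int(occ["individualCount"]) for occ in record["occurrences"])
        for event_id, record in linked.items()
    }


if __name__ == "__main__":
    linked = link_by_event(
        read_rows(EVENT_CSV), read_rows(OCCURRENCE_CSV), read_rows(MOF_CSV)
    )
    print(specimens_per_event(linked))
```

Keeping the event as the unit of grouping means that environmental measurements and catches from the same trap and week can be analysed together without repeating event-level fields in every occurrence row.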