Biodiversity of testate amoebae in Sphagnum bogs: the dataset from forest-steppe ecotone (Middle Volga Territory, Russia)

Abstract

Background
Testate amoebae are a polyphyletic group of unicellular eukaryotic organisms that are characterised by a rigid shell and inhabit mostly freshwater and terrestrial ecosystems. They are particularly abundant in peatlands, especially in Sphagnum-dominated biotopes. Peatland hydrology is the most important influence on testate amoeba communities. The good preservation of the shells in peat deposits and their response to changes in hydrological regime form the basis for palaeohydrological reconstructions. Any changes in the water balance of mires should be expected to have far-reaching effects on biogeochemical cycles, productivity, and carbon dioxide and methane exchange.

New information
This paper presents a dataset (Darwin Core Archive – DwC-A) on the distribution of Sphagnum-dwelling testate amoebae in nine mires located in the forest-steppe subzone of the East European Plain. The dataset includes information about 86 taxa belonging to 29 genera and contains 3,123 occurrences of 49,874 individuals. The following environmental variables are provided: microtopography, oxidising and reducing potential, total mineralisation, substrate temperature, acidity, substrate wetness and water table depth. These data might be used for biogeographical and palaeoecological studies, including quantitative reconstructions.


Introduction
Testate amoebae are eukaryotic unicellular organisms that are enclosed in a rigid cover called a shell or a test. The shell has one or two openings (pseudostome) through which filose or lobose pseudopodia protrude during locomotion and feeding (Cavalier-Smith 2004). They form a polyphyletic group with members in Supergroup Amoebozoa (Adl et al. 2019), related to Phylum Tubulinea (Smirnov et al. 2005), and in Supergroup Sar, related to Phylum Stramenopiles (Adl et al. 2005) and Rhizaria (Cavalier-Smith 2004). Testate amoebae are particularly abundant in peatlands, where they can constitute up to 50% of microbial biomass (Gilbert and Mitchell 2006). Thus, studying the biodiversity of these organisms makes an important contribution to understanding the structure and functional role of microbial communities.
The growing number of available resources on these microorganisms makes it possible to address large-scale biogeographical questions and problems concerning flagship, endemic and eurybiotic species (Foissner 2006). The dataset presented in this paper represents the results of five years of investigations on testate amoebae in Sphagnum-dominated bogs at the southern boundary of their distribution in the forest-steppe ecotone (Tsyganov and Mazei 2007, Mazei and Tsyganov 2007a, Mazei and Tsyganov 2007b, Mazei et al. 2007a, Mazei et al. 2007b, Mazei and Bubnova 2007, Mazei and Bubnova 2008, Mazei and Bubnova 2009, Mazei et al. 2009, Tsyganov et al. 2016).

Project description
Design description: The description of each observation in the dataset is based on terms used in the general Darwin Core vocabulary. In the dataset, each observation includes basic information on the location (latitude and longitude), date of observation, name of the observer and number of counted individuals. The coordinates were determined in situ using a GPS device. For mire ecosystems, sampling locations contain information on microtopography (hummocks, lawns, hollows or not available), oxidising and reducing potential (redox), total mineralisation (tds), substrate temperature, acidity (pH), substrate wetness and mire water table depth (WTD) (Table 3).

Table 1.
Species diversity of testate amoeba families in the dataset (table body not reproduced here; columns include the number of occurrences).

Sampling methods
Sampling description: Samples were generally collected in biotopes dominated by Sphagnum spp. mosses and less frequently by Polytrichum spp. The sampling strategy aimed to cover the full diversity of mire microtopography (hummocks, lawns and hollows). Mosses were carefully extracted from the moss carpet and cut into layers according to the vertical zonation of peat soils: first from 0 to 15 cm in 3 cm steps, and then the remaining lower part of the dead mosses as a single layer (Dobrovol'skii et al. 1998). The samples were then placed in plastic containers and fixed with a formaldehyde solution in situ to avoid major post-sampling changes in the community structure (Mazei et al. 2015). Additional samples were taken for moisture content measurements.
Water table depth (cm) was measured at each sampling point as the distance from the surface of the moss cover to the water level in a hole, at least 30 minutes after the hole was made. Oxidising and reducing potential, total mineralisation, substrate temperature and pH value were measured in situ using portable HANNA multiparameter meters.
In the laboratory, samples were thoroughly shaken and stirred for 10 minutes in distilled water to extract testate amoebae. The suspension without Sphagnum stems was poured into a Petri dish; live amoebae and empty tests were identified and counted separately in one-tenth of the Petri dish using a stereomicroscope at 65× magnification. If necessary, the shells were transferred to a slide with a thin pipette, placed in a drop of glycerol and examined at 150× or 300× magnification using a light microscope. At least 300 shells were counted in each sample. The taxonomic classification at the genus level is based on the revisions of Kosakyan et al. (2016) and … (2022).
Moisture content was determined from the additional samples taken in the field. Wet samples were weighed and placed in an oven at 105°C for eight hours. The samples were then cooled in a desiccator to room temperature and weighed again. Percentage moisture was calculated from the difference between the wet and dry sample weights.
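The moisture calculation described above can be sketched as follows. This is a minimal illustration, not the authors' code: the text states only that percentage moisture is based on the difference between wet and dry weights, so the choice of denominator is an assumption. The hypothetical function `moisture_percent` defaults to the wet-weight basis and offers the dry-weight basis as an alternative.

```python
def moisture_percent(wet_g: float, dry_g: float, basis: str = "wet") -> float:
    """Percentage moisture from wet and oven-dry sample weights (grams).

    basis="wet": water mass as a percentage of the wet weight (assumed default).
    basis="dry": water mass as a percentage of the dry weight.
    """
    if wet_g <= 0 or dry_g < 0 or dry_g > wet_g:
        raise ValueError("expected wet_g > 0 and 0 <= dry_g <= wet_g")
    water_g = wet_g - dry_g  # mass lost during oven drying
    if basis == "wet":
        return 100.0 * water_g / wet_g
    if basis == "dry":
        if dry_g == 0:
            raise ValueError("dry-weight basis is undefined for dry_g == 0")
        return 100.0 * water_g / dry_g
    raise ValueError("basis must be 'wet' or 'dry'")


# Illustrative weights only; these are not values from the dataset.
example_wet_basis = moisture_percent(10.0, 1.0)
example_dry_basis = moisture_percent(10.0, 2.0, basis="dry")
```

On the wet-weight basis the result is bounded between 0 and 100%, which is convenient for comparing samples; on the dry-weight basis, values for saturated Sphagnum can exceed 100%.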

Taxonomic coverage
Description: The dataset represents information on the distribution of 86 species of testate amoebae in Sphagnum-dominated bogs in the forest-steppe ecotone. There are a total of 29 genera, which belong to 16 families and three incertae sedis ranks (Table 1). In total, 49,238 individuals were identified with 3,123 occurrences (Kriuchkov et al. 2023). The greatest numbers of genera were in the family Hyalospheniidae (4) and among the incertae sedis taxa (3: Argynnia, Physochila and Trigonopyxis). The families Arcellidae, Assulinidae, Centropyxidae, Euglyphidae, Netzeliidae and Trinematidae include two genera each and all the others contain only one. The largest numbers of taxa were found in Arcellidae (18), Difflugiidae (11) and Euglyphidae (11).
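Per-family counts of genera and taxa like those above could be derived from occurrence records along the following lines. This is a hedged sketch: the function `summarise_by_family` and the sample rows are hypothetical, the genus is assumed to be the first word of `scientificName`, and the rows illustrate the record shape only, not the real data.

```python
from collections import defaultdict


def summarise_by_family(records):
    """Count distinct genera and distinct taxa per family.

    Each record is a dict with the Darwin Core terms 'family' and
    'scientificName'. The genus is taken as the first word of the name.
    """
    genera = defaultdict(set)
    taxa = defaultdict(set)
    for rec in records:
        name = rec["scientificName"]
        family = rec["family"]
        taxa[family].add(name)
        genera[family].add(name.split()[0])
    return {
        fam: {"genera": len(genera[fam]), "taxa": len(taxa[fam])}
        for fam in sorted(taxa)
    }


# Hypothetical rows for illustration; repeated names are counted once.
sample_records = [
    {"family": "Arcellidae", "scientificName": "Arcella vulgaris undulata"},
    {"family": "Arcellidae", "scientificName": "Arcella vulgaris penardi"},
    {"family": "Arcellidae", "scientificName": "Arcella vulgaris undulata"},
    {"family": "Difflugiidae", "scientificName": "Difflugia bacillifera"},
    {"family": "Difflugiidae", "scientificName": "Difflugia brevicolla"},
]
summary = summarise_by_family(sample_records)
```

Using sets keeps the counts independent of how many times a taxon occurs, which separates taxonomic richness from the number of occurrences.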
The most abundant species in the dataset (

Column label
Column description

eventID (Occurrence)
An identifier for the set of information associated with an Event.

parentEventID (Occurrence)
An identifier for the broad event of place and year.

samplingProtocol (Occurrence)
Descriptions of the methods and protocols used for material sampling.

samplingEffort (Occurrence)
The amount of effort expended during sampling procedure.

sampleSizeValue (Occurrence)
A numeric value for a measurement of the size (volume) of a sample.

sampleSizeUnit (Occurrence)
Cubic centimetre.

occurrenceID (Occurrence, eMoF)
An identifier for the occurrence (as opposed to a particular digital record of the occurrence).

eventDate (Occurrence)
The date when material was collected or sampling period.

basisOfRecord (Occurrence)
The specific nature of the data record.

kingdom (Occurrence)
The full scientific name of the Kingdom in which the taxon is classified.

scientificName (Occurrence)
The full scientific name, including the genus name and the lowest level of taxonomic rank with the authority.

family (Occurrence)
The full scientific name of the Family in which the taxon is classified.

class (Occurrence)
The full scientific name of the Class in which the taxon is classified.

taxonRank (Occurrence)
The taxonomic rank of the most specific name in the scientificName.

decimalLatitude (Occurrence)
The geographic latitude of location in decimal degrees.

decimalLongitude (Occurrence)
The geographic longitude of location in decimal degrees.

countryCode (Occurrence)
The standard code for the country in which the location is found, Russia (RU).

individualCount (Occurrence)
The number of individuals present at the time of the occurrence.

organismQuantity (Occurrence)
A number or enumeration value for the quantity of organisms.

organismQuantityType (Occurrence)
The type of quantification system used for the quantity of organisms (counted shells).

verbatimDepth (Occurrence)
The original description of the depth below the local surface (sampling depth from Sphagnum stems).
627 mm (Weather and climate 2023). Nine mire ecosystems are included in this dataset (Fig. 1). Bezymianoe mire (53.30463°N, 45.13816°E) was sampled once a month from 20 May to 26 September 2004. The bog is circular and about 300 m in diameter. The vegetation of lawns is dominated by Calamagrostis canescens (Weber) Roth., Eriophorum vaginatum L. and Menyanthes trifoliata (L.). The centre of the mire is overgrown with Betula pubescens Ehrh. and Pinus sylvestris L., together with the shrub Myrtus communis L. The moss cover is flat, with the predominant species Sphagnum palustre L., S. divinum Flatberg & K. Hassel and S. angustifolium (C.E.O. Jensen ex Russow) C.E.O. Jensen. The hummocks in the middle of the mire are formed by S. papillosum Lindb. and S. angustifolium, Polytrichum strictum Brid. and Drosera rotundifolia L. Due to peat excavation, there is a drain channel at the edges of the mire and several ditches with open water in the centre, where Utricularia vulgaris L. and Sparganium minimum Wallr. were common. The edge of the Sphagnum quagmire is formed by S. riparium Ångstr.

Table 3.
Environmental variables represented in the dataset.

Table 2
E. laevis (131), P. tenella (130), G. arenaria (127), H. sphagni (124), P. hemisphaerica (118), A. flavum (109) and C. aculeata (103). There are 12 species that were observed only once: Pseudodifflugia gracilis, Galeripora megastoma, Scutiglypha scutigera, Argynnia dentistoma, Euglypha strigosa glabra, Cyclopyxis aplanata microstoma, Difflugia bacillifera, Placocista lens, Difflugia brevicolla, Arcella vulgaris undulata, Arcella vulgaris penardi and Lesquereusia inequalis.
The description of each observation in the dataset is based on terms used in the general Darwin Core vocabulary (GBIF.org 2023). In the dataset, each observation includes basic information on the location (latitude and longitude), date of observation, name of the observer and number of counted individuals. The coordinates were determined in situ using a GPS device. The dataset is structured as an Occurrence core with the Extended Measurement or Fact (eMoF) extension. The Extended Measurement or Fact table contains the fields listed in the table below.
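Because the Occurrence and eMoF tables share `occurrenceID`, environmental measurements can be attached to each occurrence with a simple join. The sketch below uses only the Python standard library and small in-memory tables. The tab delimiter, the eMoF column names (`measurementType`, `measurementValue`, `measurementUnit`, taken from the standard Darwin Core extension) and all sample values are assumptions for illustration, not the contents of the published archive.

```python
import csv
import io
from collections import defaultdict

# Hypothetical stand-ins for the archive's occurrence and eMoF files.
OCCURRENCE_TXT = (
    "occurrenceID\tscientificName\tindividualCount\n"
    "occ-1\tArcella vulgaris undulata\t3\n"
    "occ-2\tDifflugia bacillifera\t1\n"
)
EMOF_TXT = (
    "occurrenceID\tmeasurementType\tmeasurementValue\tmeasurementUnit\n"
    "occ-1\tpH\t4.1\t\n"
    "occ-1\tWTD\t12\tcm\n"
)


def read_tsv(handle):
    """Read a tab-delimited table into a list of dicts keyed by column name."""
    return list(csv.DictReader(handle, delimiter="\t"))


def attach_measurements(occurrences, measurements):
    """Return occurrence rows with a 'measurements' dict joined on occurrenceID."""
    by_occurrence = defaultdict(dict)
    for m in measurements:
        by_occurrence[m["occurrenceID"]][m["measurementType"]] = m["measurementValue"]
    return [
        {**occ, "measurements": dict(by_occurrence.get(occ["occurrenceID"], {}))}
        for occ in occurrences
    ]


joined = attach_measurements(
    read_tsv(io.StringIO(OCCURRENCE_TXT)),
    read_tsv(io.StringIO(EMOF_TXT)),
)
```

Values are kept as strings, as read from the file; converting them to numbers is left to the caller, since some measurements (for example microtopography) are categorical.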