Data mobilisation at the Fund of Invertebrates of the State Museum of Natural History of the NAS of Ukraine

Abstract

Background
The described dataset contains occurrence records of invertebrate specimens deposited at the State Museum of Natural History of the NAS of Ukraine, Lviv, Ukraine (SMNH NASU). It combines diverse taxonomic groups, mostly belonging to the class Insecta of the phylum Arthropoda, that were prioritised for digitisation under war conditions. The selected specimens were assessed as the most vulnerable to hostilities and therefore in need of virtual preservation. Such virtual preservation is essential in wartime, as the collection can be lost or damaged at any moment, resulting in a significant retrospective biodiversity data gap. At the same time, virtualising the collection and depositing it online grants remote access to scientists who cannot visit it in person due to the war. Moreover, we believe that mobilising the data from Ukrainian collections and publishing them online are essential for the integration of Ukrainian research facilities into the global scientific biodiversity pool.

New information
A total of 3,660 occurrence records, mobilised in 2023-2024 from the collection of invertebrates of the SMNH NASU, were published. This dynamic dataset will be continually supplemented with new records during further digitisation work.

Although SMNH NASU is located in Lviv, far from the front line, it suffers from indirect impacts of the war, such as a lack of financial support, caused by the transfer of governmental budget to prioritised defence needs, and damage to the hosted collections as a result of blackouts and lack of heating.

The scientific fund of invertebrates of the SMNH NASU comprises several independent collections, each managed by its own custodians and curators. There are collections of extant and fossil insects, molluscs, microscopic slides of soil invertebrates (nematodes, flatworms, springtails, protura, oribatid and mesostigmatic mites etc.) and a few memorial collections. In general, the fund is subdivided into the principal and supplementary subfunds. The principal subfund of invertebrates includes ca. 188,000 storage units and the supplementary subfund has over 13,000 storage units. They include specimens collected since the end of the 19th and the beginning of the 20th centuries by many famous regional naturalists, including Kazimierz Smulikowski, Marian and Jarosław Łomnicki, Maksymilian Nowicki, Michał Świątkiewicz, August Stöckl and Stanisław Kapuściński (Starzyk 2004).
The digitisation of the collections at the SMNH NASU started in the early 2000s, when separate datasets were created locally by curators. In 2017, the Data Centre "Biodiversity of Ukraine" (DCBU) was launched (State Museum of Natural History of the NAS of Ukraine 2024). Since then, it has served as the main entry point for hosting and working with digitised materials (Rizun and Scherbachenko 2019, Rizun et al. 2020). By 2024, 20,358 records of invertebrate specimens hosted at SMNH NASU had been deposited at the DCBU, including 19,068 records of Arthropoda (Insecta: 15,359; Arachnida: 3,709), 978 records of Mollusca (Bivalvia: 372; Gastropoda: 606) and 312 records of Nematoda (Enoplea: 302; Chromadorea: 10). Thus, today, the DCBU contains data about 11% of the total number of invertebrate specimens hosted at SMNH NASU. Since late 2023, to provide wider access, the integration of the mobilised data into the GBIF platform has begun. At the moment, data for only about 2% of the total number of invertebrate specimens hosted at SMNH NASU have been published in GBIF and are represented in the current dataset. Only 5% of the hosted specimens have images, which vary in quality and were captured at different times. Currently, many of these images are archived on the internal SMNH NASU servers and their publication online through the DCBU and other platforms is still in progress (Fig. 1).

Design description: As stated in our previous publication (Novikov et al. 2024), the project aims to: (a) develop digitisation protocols for the most valuable and vulnerable natural history collections; (b) mobilise and publish the data about such collections deposited at SMNH; and (c) digitise prioritised specimens deposited at SMNH, including those belonging to the herbarium collection and the collection of invertebrates. However, the digitisation workflow for invertebrate specimens differs from that for the herbarium material described by Novikov et al. (2024).

In the case of the herbarium material, there are two stages of image capture: a preliminary stage, in which the herbarium labels are photographed, and a main stage, in which the entire herbarium sheets are photographed. The data from the herbarium labels are manually transferred to the dataset and later verified for taxonomic consistency and other issues. Only after such verification are the specimens that meet the preselection and quality criteria digitised using a high-resolution photo camera.

In the case of invertebrates, the traditional digitisation workflow (Flemons and Berents 2012, Nelson et al. 2015, Blagoderov et al. 2017, Harris and Marsico 2017, Dupont et al. 2020, Nieva de la Hidalga et al. 2020) was chosen: the specimens are preselected by curators, based on the previously outlined criteria, and digitised. The data are later mobilised from the original images. Such a protocol simplifies the digitisation procedure and is well suited to routine digitisation of the entire collection. However, it is weaker in specimen preselection, which sometimes results in the digitisation of less important or misidentified specimens, a mix of specimens from different regions and occasional records with low data quality (e.g. unknown collection date, uncertain locality etc.).

Funding:
The grant programme "Science for the Recovery of Ukraine in the War and Post-War Periods" (Nr 2022.01) of the National Research Foundation of Ukraine (NRFU).

Sampling methods
Sampling description: Similarly to the botanical fund (Novikov et al. 2024), three levels of priority for digitisation and data mobilisation were defined in the other funds of SMNH NASU (Fig. 2). The first, red, group comprises the most valuable specimens that are, at the same time, the most vulnerable (e.g. can be easily and heavily damaged by moisture, fire, mould etc.). The second, yellow, group combines valuable specimens that are relatively resistant to damage with specimens that can be easily damaged, but have moderate importance. The third, green, group includes specimens of common species and specimens from supporting (e.g. loan and educational) collections that are either resistant to damage or have limited scientific value.
Although the priority classification (Fig. 2) is not fixed, we believe that the general logic of designating priority groups by multiplying the specimen value by its vulnerability could be useful for other digitisation projects. In particular, such a priority classification can serve as a good starting point for focusing digitisation in emergency situations. At the same time, the number of value and vulnerability levels, as well as their order, can be determined individually, depending on the peculiarities of the collection (e.g. presence of type material and preservation technique) and the specific digitisation goals (e.g. in the case of long-term flow digitisation, such priority groups can be neglected or bulked).
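The scoring logic can be illustrated with a minimal Python sketch. It assumes the class ranges given in the step description (value from 1 to 10, vulnerability from 1 to 8) and the group thresholds from the caption of Fig. 2; the function names are hypothetical and are not part of the project workflow.

```python
def priority_score(value: int, vulnerability: int) -> int:
    """Cumulative points: specimen value multiplied by its vulnerability."""
    if not 1 <= value <= 10:
        raise ValueError("value class must be between 1 and 10")
    if not 1 <= vulnerability <= 8:
        raise ValueError("vulnerability class must be between 1 and 8")
    return value * vulnerability


def priority_group(score: int) -> str:
    """Map a score to a priority group using the thresholds of Fig. 2."""
    if score > 35:
        return "red"      # highest priority
    if score >= 20:
        return "yellow"   # moderate priority
    return "green"        # lowest priority


def specimen_priority(values: list[int], vulnerability: int) -> tuple[int, str]:
    """If a specimen fits several value classes, the highest score is kept."""
    best = max(priority_score(v, vulnerability) for v in values)
    return best, priority_group(best)


# A herbarium voucher (vulnerability 7) of an endemic taxon (value 7)
# that is also type material (value 10).
score, group = specimen_priority([7, 10], 7)
```

Other collections could adjust the class ranges and thresholds in the same structure without changing the multiplication step.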
Step description:
1. To designate the digitisation priority, a working table with classes of value (from 1 to 10) and vulnerability (from 1 to 8) of the specimens was created (Fig. 2). Each specimen was evaluated against these two principal criteria and received cumulative points. For example, if the specimen is a herbarium voucher (vulnerability level 7) representing an endemic taxon (value level 7), it received 49 (7 x 7) cumulative points. If the same herbarium voucher also represents type material (value level 10), then it received 70 (7 x 10) cumulative points. In such a case, the highest points (i.e. 70) were taken into account and this specimen was digitised with first priority, while other specimens with lower points were digitised later, in order of received points.
2. The prioritised specimens were checked for general preservation condition and the presence of readable labels. They were also preliminarily evaluated for compatibility with the digitisation protocols and the technical facilities available at SMNH NASU.

3. A still image of each specimen was captured using the different photosystems available at SMNH NASU. For microslides, a Canon EOS 800D (24 Mp) camera with a Canon EF-S 18-55mm f/3.5-5.6 IS STM lens, mounted on a horizontal tripod over a lightbox, was used. The following settings were applied: ISO 200, f/5.6, exposure time 1/250 s, automatic white balance. Additionally, an Olympus DP72 camera mounted on an Olympus BX51 trinocular microscope was used for microphotography. For the digitisation of pinned and fixed specimens, a Canon EOS 800D (24 Mp) camera with a Canon EF 100mm f/2.8L Macro IS USM lens, also mounted on a horizontal tripod over a lightbox, was used. The images were saved simultaneously in RAW (master file) and JPEG (distributive file) formats at the highest possible resolution. At the moment, the resulting images are stored on the internal SMNH NASU server and only a portion of them has been published online.
4. The data from the labels were transcribed manually from the images into Excel tables mapped to the Darwin Core standard (TDWG 2024) and separated by taxonomic group.
5. The first step of data quality control was carried out manually by assistants, who checked the initial datasets for typos, technical mistakes and errors.
6. The occurrence records were georeferenced using the data from the field "locality" and OpenStreetMap facilities (OpenStreetMap contributors 2024). OpenStreetMap was chosen over other similar web map services because it is well developed, openly provided under the ODbL licence and has extended functionality that allows the elevation to be checked. Many toponyms in OpenStreetMap are provided along with spelling variants and there is an option to add new or correct existing information on the map. This results in better identification of the locality described on the label. The coordinate uncertainty was evaluated in metres and entered in the respective field of the dataset.
7. The second step of data quality control was carried out by collection curators, who checked the consistency of the provided coordinates and locality descriptions.
8. Separate datasets were merged into the common dataset by the project PI (AN).
9. The third step of data quality control was carried out by the project PI (AN), who checked the consistency of the data in the merged dataset.
10. The dataset was published using the GBIF IPT (GBIF 2024b).
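Part of the consistency checking in steps 7 and 9 could, in principle, be supported by a simple automated screen. The following Python sketch is only an illustration under stated assumptions: the checks, messages and function name are hypothetical, the sample coordinates are invented, and the field names use the Darwin Core term spelling (e.g. coordinateUncertaintyInMeters). It does not describe the manual procedure actually applied by the curators.

```python
def coordinate_issues(record: dict[str, str]) -> list[str]:
    """Return a list of problems found in the coordinate fields of a record."""
    try:
        lat = float(record.get("decimalLatitude", ""))
        lon = float(record.get("decimalLongitude", ""))
    except ValueError:
        # Empty or non-numeric coordinates cannot be checked any further.
        return ["coordinates missing or not numeric"]

    issues = []
    if not -90.0 <= lat <= 90.0:
        issues.append("latitude out of range")
    if not -180.0 <= lon <= 180.0:
        issues.append("longitude out of range")
    if record.get("geodeticDatum") != "WGS84":
        issues.append("unexpected geodetic datum")
    try:
        uncertainty = float(record.get("coordinateUncertaintyInMeters", ""))
        if uncertainty <= 0:
            issues.append("uncertainty must be positive")
    except ValueError:
        issues.append("uncertainty missing or not numeric")
    return issues


# Invented example record, not taken from the dataset.
example = {
    "decimalLatitude": "49.84",
    "decimalLongitude": "24.03",
    "geodeticDatum": "WGS84",
    "coordinateUncertaintyInMeters": "500",
}
problems = coordinate_issues(example)
```

Such a screen only flags formally invalid values; whether the coordinates match the locality on the label still requires a curator's judgement.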

Taxonomic coverage
Description: The dataset contains occurrence records belonging to two phyla, Arthropoda and Nematoda (Fig. 4 and Table 1).

Table 1. The list of invertebrate species and the number of their occurrence records represented in the dataset.

Description:
The tab-delimited TSV-formatted dataset was created following the Darwin Core standard. It contains 3,660 occurrence records of digitised invertebrate specimens deposited in the SMNH NASU (Rizun et al. 2024). This dataset will be dynamically updated with new data as digitisation and data mobilisation progress.
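As a tab-delimited file, the dataset can be read with standard tools. The sketch below uses Python's csv module to parse tab-separated records and count them by phylum. The three sample rows and their identifiers are invented for illustration and stand in for the published file; only the column names are Darwin Core terms used in the dataset.

```python
import csv
import io
from collections import Counter

# Invented sample standing in for the published tab-delimited file.
SAMPLE_TSV = (
    "occurrenceID\tphylum\tclass\tcountryCode\n"
    "id-1\tArthropoda\tInsecta\tUA\n"
    "id-2\tNematoda\tEnoplea\tUA\n"
    "id-3\tArthropoda\tArachnida\tUA\n"
)


def read_occurrences(handle) -> list[dict[str, str]]:
    """Parse tab-delimited occurrence records into dictionaries."""
    return list(csv.DictReader(handle, delimiter="\t"))


def count_by(rows: list[dict[str, str]], field: str) -> Counter:
    """Count records per value of a field, skipping empty values."""
    return Counter(row[field] for row in rows if row.get(field))


rows = read_occurrences(io.StringIO(SAMPLE_TSV))
phylum_counts = count_by(rows, "phylum")
```

For a downloaded file, the same functions can be applied to a handle opened with `open(path, newline="", encoding="utf-8")`.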

Column label: Column description
institutionCode: The acronym in use by the institution having custody of the object(s) or information referred to in the record. In our case, it is SMNH NASU.
institutionID: An identifier for the institution having custody of the object(s) or information referred to in the record. In our case, it is the ROR identifier.
basisOfRecord: The specific nature of the data record, for example, preserved specimen or field observation.
occurrenceID: A unique identifier for the Occurrence. In our case, it is UUID ver. 4.
catalogNumber: An identifier for the record within the collection.
habitat: The description of the habitat where the specimen was collected or observed.
minimumElevationInMeters: The lower limit of the range of elevation (altitude, usually above sea level), in metres.
maximumElevationInMeters: The upper limit of the range of elevation (altitude, usually above sea level), in metres.
geodeticDatum: The ellipsoid, geodetic datum or spatial reference system (SRS), upon which the geographic coordinates given in decimalLatitude and decimalLongitude are based. In our case, it is WGS84.
decimalLatitude: The geographic latitude (in decimal degrees, using the spatial reference system given in geodeticDatum) of the geographic centre of a locality.
decimalLongitude: The geographic longitude (in decimal degrees, using the spatial reference system given in geodeticDatum) of the geographic centre of a locality.
coordinateUncertaintyInMeters: The horizontal distance (in metres) from the given decimalLatitude and decimalLongitude describing the smallest circle containing the whole of the locality.
identifiedBy: A list of names of people who assigned the taxon to the subject.
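To show how the fixed-value fields described above fit together, the following Python sketch assembles a record stub with a freshly generated version 4 UUID as its occurrenceID. The function name, the catalogue number and the ROR placeholder are hypothetical, and "PreservedSpecimen" is assumed here as the basisOfRecord value; none of these is taken from the dataset.

```python
import uuid


def new_occurrence(catalog_number: str, institution_id: str) -> dict[str, str]:
    """Build a record stub with the institution-level fields pre-filled."""
    return {
        "occurrenceID": str(uuid.uuid4()),     # random UUID, version 4
        "institutionCode": "SMNH NASU",
        "institutionID": institution_id,       # ROR identifier of the institution
        "basisOfRecord": "PreservedSpecimen",  # assumed value for museum specimens
        "catalogNumber": catalog_number,
        "geodeticDatum": "WGS84",
    }


# Placeholder arguments for illustration only.
record = new_occurrence("EXAMPLE-0001", "https://ror.org/EXAMPLE")
```

Generating the identifier once and storing it with the record keeps the occurrenceID stable across later dataset updates.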
Currently, digitisation of the SMNH NASU invertebrate collection is focused on the red priority group, limited to the fauna of Ukraine. This group is the most valuable and requires urgent digitisation, since its loss through hostilities would be irreversible. Specimens collected from other countries are outside the current digitisation plans and will be digitised only occasionally or on request. The next digitisation round (2025-2030) will involve the specimens belonging to the yellow priority group. The specimens of the green priority group and specimens from other countries will be digitised in the last digitisation round (2030-2035).
Besides these plans, we will gladly consider requests for prioritised digitisation from scientists worldwide. We believe that it is most important to digitise those materials that are urgently needed for research purposes. Therefore, please direct your requests to the collection curators, i.e. Coleoptera and other invertebrates - Volodymyr Rizun (novikoffav@gmail.com), Lepidoptera - Kateryna Hushtan (katrinantonyuk@gmail.com), Arachnida - Habriel Hushtan (habrielhushtan@gmail.com) and Nematoda - Andrii Susulovsky (susulovsky@gmail.com).

Title:
Digitisation of natural history collections damaged as a result of hostilities and related factors: development of protocols and implementation on the basis of the State Museum of Natural History of the National Academy of Sciences of Ukraine (Nr 2022.01/0013)

Personnel: Project PI: Andriy Novikov (Dr., Senior Research Scientist, SMNH, Department of Biosystematics and Evolution, ORCID https://orcid.org/0000-0002-0112-5070).

Figure 1. Digitisation progress of the invertebrates at the SMNH NASU.

Figure 2. Simplified priority evaluation applied for the digitisation of the natural history collections at the SMNH NASU. Scores are calculated by multiplying the specimen value by its vulnerability. Three priority groups are distinguished: red (highest) with scores over 35; yellow (moderate) with scores between 20 and 35; and green (lowest) with scores under 20.

Figure 3. Distribution of the occurrence records in the dataset by countries.

Figure 4. Taxonomic structure of invertebrates represented in the dataset.

Figure 5. The temporal coverage of the dataset by years.
verbatimScientificName: A string representing the taxonomic identification as it appeared in the original record.
scientificName: The full scientific name of the taxon including the genus name and specific epithet.
taxonRank: The taxonomic rank of the most specific name in the scientificName.
kingdom: The full scientific name of the kingdom in which the taxon is classified.
phylum: The full scientific name of the phylum in which the taxon is classified.
class: The full scientific name of the class in which the taxon is classified.
order: The full scientific name of the order in which the taxon is classified.
family: The full scientific name of the family in which the taxon is classified.
genus: The full scientific name of the genus in which the taxon is classified.
recordedBy: A person, group or organisation responsible for recording the original Occurrence.
verbatimEventDate: The date of record as it appears in the original publication or specimen label.
eventDate: The date during which an event (e.g. collection of the specimen, photographing of the plant or its registering in the field in any other way) occurred.
countryCode: The standard code (ISO 3166-1-alpha-2) for the country in which the locality occurs.
country: The name of the country in which the locality occurs.
language: The language of data representation for the occurrence record.
locality: The specific description of the place where the specimen was registered or collected.