Biases and distribution patterns in hard‐bodied microscopic animals (Acari: Halacaridae): Size does not matter, but generalism and sampling effort do

The interplay between distribution ranges, species traits and sampling and taxonomic biases remains elusive amongst microscopic animals. This ignorance obscures our understanding of the diversity patterns of a major component of biodiversity. Here, we used marine Halacaridae to explore whether differences between marine provinces can explain their distribution patterns or if differential sampling efforts across regions prevent any macroecological inference. Furthermore, we test if certain functional traits influence their distribution patterns.

vary amongst these two extremes. As a corollary, biological diversity is unevenly distributed throughout the Earth, defining patterns of species richness that change with spatial scale (Gaston, 2000; Hawkins et al., 2003; MacArthur & Wilson, 2016). These patterns reflect the interaction between environmental parameters and organisms' traits (Gotelli et al., 2009; Willig et al., 2003) but also the historical events that have affected the spatial units under consideration (Gaston, 2009; Schemske, 2002; Willig et al., 2003).
Concomitantly, our knowledge of species ranges is biased by the uneven attention paid to different organisms by researchers.
Vertebrates and large organisms have been all-time favourites, in contrast to small and microscopic living forms that have remained neglected (Appeltans et al., 2012; Lambshead & Boucher, 2003; Vitorino & Bessa, 2018). Indeed, many aspects of the biology of minute organisms remain unknown (Fonseca et al., 2018), hampering our understanding of their spatial distribution patterns and the mechanisms that drive them (Marrone et al., 2022; Martínez et al., 2019). This lack of knowledge seriously impacts our global understanding of biodiversity patterns because microscopic animals are a numerically important component of biodiversity in many regions (Curini-Galletti et al., 2012; Jörger et al., 2021; Martínez et al., 2019; Willems et al., 2009), offer essential services in many aquatic ecosystems (Schratzberger & Ingels, 2018) and represent tools to test general eco-evolutionary hypotheses avoiding the confounding factor of the non-independence of the observations due to shared evolutionary history (Fonseca et al., 2018).
Furthermore, the anatomical simplification of organ systems due to miniaturisation makes microscopic animals excellent model organisms for a wide range of studies, from morphological to evolutionary, ecological or physiological. Many investigations have suggested that microscopic organisms exhibit much broader distribution ranges than their macroscopic counterparts (Fontaneto, 2011; Hillebrand, 2004; Maraun et al., 2007). These broader distributions are easy to explain for those microorganisms that possess traits favouring long-distance dispersal, such as high abundances, dormancy stages, long-term viability and parthenogenesis (Curini-Galletti et al., 2012; Fontaneto, 2019). What is more striking, however, are the large-scale ranges recorded for species without evident mechanisms of long-distance dispersal (Giere, 2008). This so-called 'meiofauna paradox' has been attributed to our inability to identify the actual units of diversity across wide areas: genetically distinct species may exhibit highly conserved morphologies, masking geographical patterns of diversity. Additionally, species identifications are challenging for the non-expert, populating the literature with misidentifications that are seldom examined or rectified. Those wrongly attributed species names might then linger in the literature for years, causing additional inaccuracies in subsequent studies. Indeed, sampling bias, which affects all biodiversity inventories (Barbosa et al., 2010; Boakes et al., 2010), becomes particularly severe for meiofauna due to the problems inherent to the study of minute animals. Some microscopic animals, though, are more popular than others, as most conventional ecological studies on meiofauna have focused on hard-bodied taxa rather than on groups with fragile, soft bodies (Curini-Galletti et al., 2020). This has been well documented in the literature and attributed to the fact that soft-bodied taxa (e.g. flatworms, gastrotrichs) must be studied in vivo in the field, whereas hard-bodied animals (e.g. nematodes, arthropods) can be fixed and studied months after their collection. What remains to be addressed is what traits bias our knowledge of the distribution patterns within hard- and soft-bodied taxa in meiofauna.
In this study, we investigate the potential sources of bias affecting our knowledge of the distribution of microscopic animals using marine halacarids in Europe as a model group. Halacaridae is a family of microscopic mites that have been historically neglected, despite being relatively easy to find crawling in a variety of substrates, such as macroalgae, marine phanerogams, sandy and gravelly particles, or even associated with larger animals (Bartsch, 2006;Giere, 2008).
More than 1200 species are known to date, predominantly marine, although brackish and freshwater representatives have also been described (Bartsch, 2008). Halacarids are also common in a wide range of subterranean and semi-subterranean aquatic habitats, such as caves or the hyporheic zone associated with rivers (Husmann & Teschner, 1970; Schwoerbel, 1961, 1986). Halacarid mite species possess phytophagous, omnivorous and/or carnivorous diets, the latter including predators, parasites and probably scavengers. More striking is that many halacarid species, although small (150-1000 μm) and only capable of crawling, have been recorded in all geographical provinces, from polar to tropical regions, and at all depths, from the intertidal to abyssal zones (Bartsch, 2006; Newell, 1967). The degree of ecological specialisation, though, varies depending on the species, ranging from ecologically specialised forms to opportunistic groups recorded across a wide range of substrates. This versatility, along with their low dispersal capability and conserved morphology, makes halacarids a suitable model for addressing general questions in ecology and evolution (Pepato et al., 2019). Nevertheless, halacarid mites are understudied despite their widespread presence, and many species have yet to be described (Appeltans et al., 2012). This leads to a unique situation amongst meiofaunal groups: because halacarids are hard-bodied organisms that are easy to preserve as well as geographically and ecologically widespread, many experts on meiofauna have come across them. Yet, only a few researchers have ventured into their taxonomy and provided records at the species level.
Our main premise is that sampling effort will largely explain the distribution patterns of these halacarid species, masking the ecological factors driving their distribution. This premise leads to three hypotheses. (H1) The patterns of taxonomic richness of halacarids will be explained by the presence of marine biological stations. (H2) Regardless of these biases, we expect that differences in species composition between marine provinces will explain the overall differences in species replacement across European marine stations. Yet, given the uneven sampling effort (Rubio-López et al., 2022), the species known in stations with fewer published papers and records will represent a subsample of the species known in the stations that have been better sampled. (H3) Finally, larger and generalist species might be easier to find and capable of thriving in more localities, accumulating more records than smaller or more ecologically specialised species.

| Data set and rationale
We focused on Europe because its coasts along the northern Atlantic and adjacent basins are the best investigated areas in the world for Halacaridae, with ca. 230 recorded marine species (Bartsch, 2004b) and the first records dating back to the 18th century (Baster, 1758).
We used a previously assembled data set (Rubio-López et al., 2022) with all occurrences of marine halacarids recorded between 70°N-20°N and 50°W-50°E. We divided our study area according to marine provinces and countries following the Marine Ecoregions of the World (Spalding et al., 2007) and the Biodiversity Information Standards of the Taxonomic Database Working Group (www.tdwg.com). Accordingly, records from Svalbard and the deep sea (>200 m depth) were considered outside the study area (coastal and shelf areas). Records from Macaronesia (Açores-Canary Islands) were also removed because they are scarce and the region lacks an effective marine station. Here, a record refers to the report of a species at a physical location at a certain time (Isaac & Pocock, 2015).

| Hypothesis tests and statistical models
All analyses were performed using the statistical software R 4.1.2 (R Core Team, 2021). Before running each model, we excluded multicollinear explanatory variables (Pearson's r > 0.6) checking for correlation values using the package psych v. 2.2.5 (Revelle, 2022). We also examined the potential remaining effect of multicollinearity, together with the fit of each of our models, using the package performance v. 0.9.1 (Lüdecke et al., 2021).
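The collinearity screening described above can be illustrated with a minimal sketch. It is written in Python rather than R, it does not reproduce the psych or performance packages, and it assumes that the threshold applies to the absolute value of Pearson's r. The predictor names and values are hypothetical toy data.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def collinear_pairs(predictors, threshold=0.6):
    """Return (name_a, name_b, r) for every pair with |r| above the threshold.

    Using the absolute value is an assumption; the text only states r > 0.6.
    """
    flagged = []
    for a, b in combinations(sorted(predictors), 2):
        r = pearson_r(predictors[a], predictors[b])
        if abs(r) > threshold:
            flagged.append((a, b, round(r, 3)))
    return flagged


# Hypothetical predictors measured at six stations (toy values, not real data).
predictors = {
    "n_records": [12, 40, 7, 95, 23, 61],
    "n_papers": [2, 6, 1, 14, 4, 9],
    "area_km": [150, 90, 300, 120, 60, 210],
}
flagged = collinear_pairs(predictors)
print(flagged)
```

In a screening of this kind, one variable of each flagged pair would be dropped before fitting the model; which one to keep is a modelling decision that the sketch does not make.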

| Species richness of marine halacarids in Europe is largely explained by sampling effort (H1)
We explored whether the distribution of marine biological stations explains the overall patterns of species richness exhibited by halacarid mites along the European coastline (H1). This is because most of the studies on marine halacarids in Europe were performed in littoral areas by researchers associated with marine facilities (e.g. Bartsch, 1975, 1976a, 1976b, 1977, 1979, 1980). Therefore, we expect a decay in species richness as the distance from marine biological stations increases (H1a), and also that the number of records and papers associated with each marine station will explain the differences in species richness (H1b).
We first calculated the distance of each record to the nearest marine station using the function 'spDistsN1' of the package sp v. 1.4-6 (Pebesma & Bivand, 2005). We assembled the data set of marine biological stations using the Mars network (The European Network of Marine Stations, https://www.marinestations.org/) and CIESM (The Mediterranean Science Commission, https://www.ciesm.org/), completed by adding a few stations not included in those networks but historically relevant in halacarid research (Appendix S1). The coordinates for each station were extracted from Google Maps. We then grouped all species records according to the logarithm of the distance to the nearest marine station and modelled the effect of this distance on species richness using generalised linear models (GLM). We logarithmically transformed the distances because we expect that smaller distances near each station will have a stronger influence. In other words, we assume that scientists prefer sampling near a marine station, and that the importance of the distance decays rapidly as their sampling localities move away from it. Given the uncertainty in assigning coordinates for some records (Rubio-López et al., 2022), we did not use the exact coordinates, which could be biased, but alleviated the uncertainties by grouping the records in six distance bins, following the Sturges approximation, which determines the optimal number of classes in a frequency distribution. Each bin was included as a continuous integer variable in our analyses. Records that were more than 400 km away from a marine station were disregarded. As countries and marine provinces are collinear, we included provinces in models A and B, which is biogeographically meaningful to account for differences in species richness of marine halacarids, and countries in model C, in which we included the number of papers as a confounding factor, to account for differences in sampling effort amongst countries, since some of them belong to more than one marine province (e.g. France, Spain).
In addition, as a metric of sampling intensity, we incorporated the number of halacarid records and published articles for each marine biological station. We used a negative binomial GLM, with the package MASS v. 7.3-57 (Ripley et al., 2013), checking for overdispersion. The significance of each independent variable was summarised as a Type II ANOVA table, using the function 'Anova' in the package car v. 3.0.9 (Fox et al., 2013). We further checked the relative importance of each explanatory variable with the 'varImp' function in the package caret v.6.0-90 (Kuhn et al., 2020).
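The distance and binning steps described for H1a can be sketched as follows. This is an illustration in Python, not the sp-based R workflow. It assumes a spherical Earth (haversine formula), a log10(distance + 1) transform to avoid taking the logarithm of zero, and equal-width bins on the transformed scale. The station names, coordinates and records are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_station(lat, lon, stations):
    """Return (station_name, distance_km) for the station closest to a record."""
    return min(
        ((name, haversine_km(lat, lon, s_lat, s_lon)) for name, (s_lat, s_lon) in stations.items()),
        key=lambda pair: pair[1],
    )


def sturges_bins(n_observations):
    """Number of classes suggested by Sturges' rule: ceil(log2(n) + 1)."""
    return math.ceil(math.log2(n_observations) + 1)


def bin_index(value, lowest, highest, n_bins):
    """1-based index of the equal-width bin containing value."""
    if highest == lowest:
        return 1
    position = (value - lowest) / (highest - lowest)
    return min(int(position * n_bins) + 1, n_bins)


# Hypothetical stations and records (latitude, longitude); not real data.
stations = {"station_A": (60.0, 5.0), "station_B": (43.0, 3.0)}
records = [(60.1, 5.0), (60.0, 5.5), (43.5, 3.2), (44.0, 4.0), (50.0, 20.0)]

distances = [nearest_station(lat, lon, stations)[1] for lat, lon in records]
kept = [d for d in distances if d <= 400.0]        # discard records > 400 km away
log_dist = [math.log10(d + 1.0) for d in kept]     # assumed transform
n_bins = sturges_bins(len(log_dist))
bins = [bin_index(v, min(log_dist), max(log_dist), n_bins) for v in log_dist]
print(n_bins, bins)
```

Under Sturges' rule as written here, a sample of 32 observations yields six classes; the sketch does not claim that this is how the six bins of the study were obtained.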
Then, we investigated the factors explaining differences in species richness (H1b), after grouping the number of species according to their nearest station. We included the number of halacarid records and published papers associated with each marine biological station as a measure of sampling intensity. This association was inferred by geographical proximity. Based on that, each record could be unambiguously assigned to a single marine station. We also included the area of influence of each marine station, defined by the distance to its furthest record. We used generalised least squares models (GLS) to account for the potential effect of various spatial autocorrelation structures. The latter were calculated with the 'gls' function in the package nlme v. 3.1-153 (Pinheiro et al., 2007). We further checked the relative importance of each explanatory variable with the 'varImp' function in the package caret v.6.0-90 (Kuhn et al., 2020). We mapped the distribution of marine biological stations, as well as their number of records and number of species using the package ggplot2 v. 3.3.5 (Wickham, 2016).
2.2.2 | Differences in species composition depend on both marine provinces and sampling effort (H2)
Despite biases in sampling effort and distances from marine stations, we still expect to find differences in species composition (i.e. beta diversity) across marine provinces based on their historical and biogeographic contexts. We decomposed beta diversity into its two components: species replacement and differences in species richness (Podani & Schmera, 2011). The substitution of species in one site by different species in another site results in species replacement (Qian et al., 2005). Many mechanisms (e.g. colonisation, dispersal limitation) can produce species loss or gain, resulting in differences in richness between sites (Carvalho et al., 2012). Separating the contributions of species richness differences and replacement to the total beta diversity might reveal distinct underlying processes concealing similarity in composition amongst locations or habitats. For instance, although some halacarid species occur across the Mediterranean, the Black Sea and the Atlantic Ocean (Bartsch, 1989, 2004a, 2004b), others are restricted to one of those seas (André, 1939; Bartsch, 1983, 1998). Due to the differences in the distribution area and range amongst those species, we expect to find a strong effect of the marine province and country in which each station is located on the total beta taxonomic diversity and its replacement component (H2a), reflecting the substitution of species with increasing geographical distances. In contrast, we expect that the number of records and published papers will have higher relative importance in explaining richness differences (H2b), as more species should be known from thoroughly studied areas.
First, we calculated the overall differences in total beta diversity and its two components across marine provinces. Then, we calculated the relative importance of the countries, the marine provinces, the number of records and the number of papers in explaining differences in beta diversity across marine stations using permutational analysis of variance (PERMANOVA; Anderson, 2001).
To exclude stations with inadequate sampling, we selected only the stations with more than 10 records. Beta diversity was calculated with the Jaccard index using the 'beta' function in the package BAT v. 2.7.1 (Cardoso et al., 2021); PERMANOVA was computed with the 'adonis' function of the package vegan v. 2.5-7 (Oksanen et al., 2020). The relationships between the differences in species composition amongst the four investigated provinces were shown graphically using a dendrogram, calculated with the function 'hclust' in the stats package (R Core Team, 2021) (see next section).
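The decomposition of beta diversity into replacement and richness-difference components can be illustrated for a single pair of sites. The sketch below is written in Python and uses the Jaccard-family partition of Podani & Schmera (2011) and Carvalho et al. (2012), in which a is the number of shared species and b and c are the species unique to each site. It is an illustration under that assumption, not a reimplementation of the BAT package, and the species labels are hypothetical.

```python
def beta_partition(site_1, site_2):
    """Jaccard-family beta diversity partition for two species sets.

    Returns (beta_total, beta_replacement, beta_richness), where
    beta_total = (b + c) / (a + b + c)
    beta_replacement = 2 * min(b, c) / (a + b + c)
    beta_richness = |b - c| / (a + b + c)
    """
    shared = len(site_1 & site_2)       # a
    only_1 = len(site_1 - site_2)       # b
    only_2 = len(site_2 - site_1)       # c
    union = shared + only_1 + only_2
    if union == 0:
        # Two empty sites: treated as identical here (an assumption).
        return 0.0, 0.0, 0.0
    beta_total = (only_1 + only_2) / union
    beta_replacement = 2 * min(only_1, only_2) / union
    beta_richness = abs(only_1 - only_2) / union
    return beta_total, beta_replacement, beta_richness


# Hypothetical species lists for two stations (toy labels, not real taxa).
station_x = {"sp1", "sp2", "sp3", "sp4"}
station_y = {"sp3", "sp4", "sp5"}

total, replacement, richness = beta_partition(station_x, station_y)
print(total, replacement, richness)
```

By construction the two components add up to the total, so a pair of stations can differ mostly through species substitution, mostly through one station holding a subset of the other, or through a mixture of both.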

| Species traits influence the distribution patterns of halacarid species (H3)
We expect that larger and generalist species accumulate more records because they are easier to spot and capable of thriving in more localities. Hence, we expect that the most recorded species are larger and/or more able to inhabit different substrates, whereas small species and/or those inhabiting specific substrates (i.e. specialists) have fewer records (H3a). We further expect that this relationship will affect our knowledge of the co-occurrence of species across marine stations because, in each of them, large and generalist species are more likely to be found first, more often and together with more species (H3b).
We assembled a data set with the number of records, number of habitats (i.e. types of substrates), number of depth zonation categories, number of countries, number of stations, number of localities each species has been reported for, as well as the idiosomal length (in μm; Appendix S2) and the geographical range (in km) of each species.
We retrieved the body length by measuring the illustrations available in the literature, an approach used in other studies (Gonzalez et al., 2018). We only considered adults of each species, because illustrations of immature specimens are less frequent, and because different life stages show different ecological preferences and dispersal abilities, even within the same species (Bartsch, 2006). For species lacking illustrations, we used the size of the genus as an estimate. Measurements were made using the free software 'ImageJ', widely used by experts in different areas to measure functional traits (Birn-Jeffery et al., 2012; Cobb & Sellers, 2020; Fowler et al., 2009; Tinius & Patrick Russell, 2017). We defined the geographical range of a species as the geodesic distance between the furthest sampling localities where the species has been found.
The geographical range was calculated using the 'geodist' function in the package geodist v. 0.0.7 (Padgham et al., 2021). After checking for collinearity, we tested the effect of body size and the number of habitats against the number of records and the geographical range, only for species recorded more than once (H3a). We selected a Gaussian distribution for the geographical range, after confirming the normal distribution of the residuals, and a negative binomial distribution for the number of records because it represents count data with overdispersion. We further checked the relative importance of each explanatory variable with the 'varImp' function in the package caret v.6.0-90 (Kuhn et al., 2020).
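The definition of geographical range given above, the distance between the two furthest localities of a species, can be sketched as the maximum over all pairwise distances. The example is written in Python and uses the haversine formula on a sphere as a stand-in for a geodesic calculation on an ellipsoid, which is an assumption; it does not reproduce the geodist package. The coordinates are hypothetical.

```python
import math
from itertools import combinations

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def geographic_range_km(localities):
    """Largest pairwise distance among (lat, lon) localities; 0.0 for a single one."""
    return max(
        (haversine_km(*p, *q) for p, q in combinations(localities, 2)),
        default=0.0,
    )


# Hypothetical localities of one species along the equator (toy coordinates).
localities = [(0.0, 0.0), (0.0, 1.0), (0.0, 3.0)]
range_km = geographic_range_km(localities)
print(round(range_km, 1))
```

A species known from a single locality gets a range of zero under this definition, which is consistent with restricting the H3a models to species recorded more than once.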
To investigate the effect of those traits on the differences in species composition across marine provinces (H3b), we first calculated the Bray-Curtis dissimilarity index (BC) between species in each marine province using the 'vegdist' function in the package vegan v. 2.5-7 (Oksanen et al., 2020). To compute the Bray-Curtis index, we used the species occurrence matrix of each marine province, which is equivalent to the transposed species community matrix. We selected the Bray-Curtis dissimilarity index because we wanted to account for the number of times that each species was recorded in the same marine province. Then, we calculated the relative importance of body length, number of habitats, number of records and species geographical range over the Bray-Curtis dissimilarity matrix using PERMANOVA (Anderson, 2001) with the 'adonis' function of the package vegan v. 2.5-7 (Oksanen et al., 2020). We visualised the number of times each species has been recorded in each province using a heat map drawn with the 'heatmap' function in the package ComplexHeatmap v. 2.10 (Gu et al., 2016). The number of records of each species was log-transformed to ease the visualisation of the results with a colour scale. Then, we represented the co-occurrence pattern of each species across the entire data set using the 'complete linkage' hierarchical clustering method calculated with the function 'hclust' in the stats package version 4.1.2 (R Core Team, 2021).
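The Bray-Curtis dissimilarity between two species, computed from their numbers of records per marine province, can be sketched as below. The example is written in Python with the standard formula, the sum of absolute differences divided by the sum of all counts. It is an illustration and not the vegan implementation; returning 0.0 for two species without any record is an assumption of this sketch. The counts are hypothetical.

```python
def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two vectors of non-negative counts."""
    total = sum(x) + sum(y)
    if total == 0:
        # Both vectors empty: treated as identical here (an assumption).
        return 0.0
    return sum(abs(a - b) for a, b in zip(x, y)) / total


def dissimilarity_matrix(counts):
    """Pairwise Bray-Curtis dissimilarities for a dict of name -> count vector."""
    names = sorted(counts)
    return {(a, b): bray_curtis(counts[a], counts[b]) for a in names for b in names}


# Hypothetical records per province (four provinces) for three species.
records_per_province = {
    "sp1": [5, 0, 2, 1],
    "sp2": [3, 1, 0, 0],
    "sp3": [0, 0, 4, 4],
}
matrix = dissimilarity_matrix(records_per_province)
print(matrix[("sp1", "sp2")])
```

Because the index uses counts rather than presence or absence, two species recorded in the same provinces can still appear dissimilar if one of them has been recorded far more often than the other.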

| RESULTS
According to the first hypothesis (H1a), we expected a decay in species richness as the distance from marine biological stations increases. Our results supported this assumption, as we found a negative relationship between species richness and the distance to
Our results also supported our second hypothesis since, regardless of sampling bias, differences between marine provinces explained the highest percentage of the variance of the total beta diversity matrix and its replacement component (H2a; Table 3). In addition, as expected, the number of records had higher relative importance in explaining the variation in the richness component of the beta diversity (H2b; Table 3). The highest differences in total beta diversity across marine provinces (H2) ranged between 0.53 (Lusitania vs. Northern European Seas) and 0.8 (Black Sea vs. Northern European Seas; Table S1). The cluster calculated from this matrix recovered the Northern European Seas as the first province branching off, followed by Lusitania and a subcluster including the Mediterranean and the Black Sea (Figure 2). The dendrogram for the species matrix and the support values can be seen in Figure S1.

Generalist species of halacarids (i.e. species occurring in more types of substrates) accumulated more records and exhibited broader geographical ranges (H3a; Table 4). Conversely, species' body size did not show any relationship with the number of records, and only a marginally significant relationship with their geographical range (Table 4). The number of habitats and records accumulated per species were the predictors with the highest relative importance in explaining the species occurrence matrix (H3b; Table 5). We found congruent results for each marine province separately (Tables S2 and S3).

| DISCUSSION
Our analysis confirmed our main premise: sampling effort largely explains the differences in distribution patterns of marine mites in Europe. Specifically, we found a small yet significant decay of species richness with the distance from each marine station, whereas sampling effort explained differences in species richness across them (H1). Although most differences in total beta diversity and its replacement component were explained by countries and marine provinces, the number of papers and records had higher relative importance for the differences in beta richness (H2). Finally, generalist species (occurring in more habitats) had broader geographical ranges and accumulated more records, affecting the species occurrence known for each marine station (H3). However, sampling biases could not alone explain the distribution of records in our data set since, despite the bias, we found an effect of geography and species traits over those patterns. In other words, marine Halacaridae are not equally distributed across Europe, but their distribution patterns are affected by certain ecological and geographical processes.

TABLE 1 Effect of distance, number of records and number of papers on species richness (A), number of records (B) and number of papers (C) of marine halacarids in Europe (H1a) according to a type II ANOVA output of generalised linear models.
Note: The marine provinces and countries are collinear, so only one of them is included in each model. Models A and B include marine provinces as an explanatory variable, which is biogeographically meaningful to account for differences in species richness of marine halacarids; model C includes country to account for differences in sampling effort amongst countries, since some of them belong to more than one marine province (e.g. France, Spain). p values for significant predictors are marked in bold.
Abbreviations: df, degrees of freedom; LR Chisq, likelihood ratio chi-square values; VarImp, relative importance of each variable.

| Impact of sampling effort on halacarid richness: The effect of marine stations
Our results confirmed that researchers focus on the surrounding areas of marine biological stations. This is not surprising, since biological stations play a pivotal role in addressing today's most critical biological issues in marine habitats (e.g. climate change, biodiversity loss, biological invasions) and provide a global network for long-term environmental monitoring and research, education and scientific dissemination (Struminger et al., 2018;Wilson, 1982;Wyman et al., 2009). Furthermore, marine stations also provide key resources for research, such as vessels and wet lab facilities, which are necessary for sample collection and species documentation.
Consequently, biological stations might also skew our knowledge of biodiversity, leaving some areas unevaluated and potentially introducing non-random geographical biases. Nevertheless, our results indicate only a moderate decline in species richness with increasing distance from research stations, but this result might be affected by two features of our data set. First, we assigned every record to a marine biological station, even though not every record was necessarily collected from one. This forced assignment might have inflated the number of records associated with certain isolated stations (e.g. Bergen as the only station in Western Norway), affecting the relationship between distance and species richness associated with the furthest bins. The second feature is related to the geographical uncertainty associated with those records that are not georeferenced (Marcer et al., 2022). In those cases, we inferred the coordinates from the centroid of the geographical information provided, which is further from the actual sampling site. We prefer this more conservative approach, though, rather than assuming that collections were always performed next to the station, because we did not want to favour our working hypothesis a priori, but rather to evaluate to what extent the hypothesis is sustained despite the issues implicit in the available data. Additionally, the variation in species richness was explained by

TABLE 2 Effect of sampling efforts (i.e. number of records and number of papers) on species richness of marine halacarids in Europe (H1b) according to a type II ANOVA output of generalised least squares.
Note: The number of records and papers are included as a measure of sampling efforts of the halacarid diversity across marine biological stations. The area of influence of each marine station is defined by the distance to its furthest record. The relative importance of each variable has also been calculated (VarImp). p values for significant predictors are marked in bold.

is influenced by the number of researchers working in a given area (Rubio-López et al., 2022). Sampling intensity does not drive actual species richness, but it does condition our knowledge of species richness and distribution patterns, acting as a confounding factor when patterns of diversity are to be inferred.

| Occurrence of halacarid mites across different marine provinces: Ecology overrules sampling biases, to a certain extent
Macroecological variables explain differences in species composition of marine halacarids across the European seas, despite sampling bias. Accordingly, geography (i.e. marine province) accounts for most of the variance of the beta diversity matrix in our PERMANOVA analyses, particularly regarding differences in total beta diversity and its replacement component. We

| Generalist species accumulate more records: does this reflect the actual distribution patterns of mites or the biases produced by the sampling behaviour of researchers?
TABLE 3 Effect of sampling effort on species composition of marine halacarids (H2b).
Note: Results are reported from permutational multivariate analyses of variance (PERMANOVA) for the beta diversity (β total, β replacement, β richness) across marine provinces. Explanatory variables include marine provinces, countries, number of records and number of papers.
Abbreviations: df, degrees of freedom; R², coefficient of determination; SS, sum of squares.

According to the results of our PERMANOVA, generalist species with a larger number of records are more likely to be found together, independently of their body size. This indicates that species traits might shape the occurrence of species and ultimately the composition of mite communities across marine provinces.
However, these traits might also affect the detectability of different species of mites (Brown, 1984), which could confound our interpretations. Detectability refers to the probability of finding at least one individual of a given species in a sampling event, assuming that the species is present in the sampled region. We found a marginally significant negative relationship between geographic range size and body size, but this relationship might be affected by sampling bias and should be interpreted with caution.
Moreover, species' body size showed no correlation with the number of records in our analyses.
Conversely, generalist species collectively exhibit wider geographical ranges in our data set, being recorded more often than specialised taxa (i.e. occurring in fewer habitats). This may be because abundant species are more likely to appear in a given sample, increasing their apparent distribution range (Fridley et al., 2007; Gaston et al., 1997), and because scientists prefer to sample habitats where their target organisms are more common, inflating the rarity (sensu Brown, 1984) of specialised species.

FIGURE 2 Heatmap showing the frequency at which each species has been recorded in each marine province. Each row corresponds to a species and each column to a marine province (Northern European Seas, Lusitania, Mediterranean and Black Sea respectively). We built the hierarchical clustering dendrogram on the right side using the corresponding values of the Bray-Curtis dissimilarity index obtained for each species. The distance has been calculated using the species occurrence matrix of each marine province, which is equivalent to the transposed species community matrix. The colour gradient from white to black indicates the number of occurrences (from low-white to high-dark). Each type of substrate is coloured: animal origin in watusi (orange), hard bottom in frost (light green), soft bottom in cavern pink and vegetal origin in jet stream (sky blue) respectively. Abbreviations as in Figure 1.
[Figure 2 legend. Biogeography: log10(occurrences + 1). Habitat: animal, vegetal, hard-bottom, soft-bottom.]

In that regard, as researchers interested in biodiversity, we must be aware of two different cognitive biases that might affect the way we collect our data. One of these biases can be identified with the 'streetlight effect' or 'principle of the drunkard's search' (Kaplan, 1964), and it is represented in our analyses by the concentration of research on the areas surrounding marine biological stations. Researchers presumably resort to sites close to marine stations because they are easily available and convenient. Interestingly, the fact that a given geographical area has already been well sampled might encourage future researchers to continue sampling in it if, for example, they are interested in collecting a particular species that was originally described there for phylogenetic or physiological studies, in redescribing species with old, inaccurate descriptions, or in looking for new type material if the original has been damaged or lost (Phillips et al., 2009; Reddy & Dávalos, 2003). Over time, this might reinforce the tendency of areas known to have more species to attract more observers. Although these two biases might affect our knowledge of biodiversity across the tree of life, they might have a more accentuated impact on groups that require more specialised taxonomic training and equipment for their identification, such as meiofauna.
In other words, many people, including researchers but also non-

Note: Permutational multivariate analysis of variance (PERMANOVA) for the variables explaining differences in the distribution of marine mites. We used the following variables: idiosomal length as a proxy of the body size (length), number of habitats, number of records and the geographical range (max. distance) of each species.
Abbreviations: df, degrees of freedom; R², coefficient of determination; SS, sum of squares.

| CONCLUSIONS
The variation in species richness was largely explained by uneven sampling effort, as indicated by the concentration of known species in the areas surrounding marine biological stations and by a strong effect of the number of papers and records. However, our findings also supported that the species composition of marine halacarids varies across marine provinces, suggesting that ecological and historical factors explain the distribution of mites across large geographical scales. Nonetheless, sampling effort also explained the richness differences that emerge from the species sorting process. Finally, in our data set, generalist species have larger geographical ranges and are documented more frequently than ecologically specialised taxa. Our knowledge of the distribution patterns of marine mites largely depends on the patterns of sampling effort, which might in turn reflect the behaviour of the scientists studying these animals.

ACKNOWLEDGEMENTS
The MS has been written in the IRSA-CNR in Pallanza, supported by an Erasmus+ Mobility grant to IRL

CONFLICT OF INTEREST STATEMENT
All authors declare that they have no conflicts of interest.

PEER REVIEW
The peer review history for this article is available at https://publons.com/publon/10.1111/ddi.13679.

DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available in the Supplementary Information in Dryad (https://doi.org/10.5061/dryad.d7wm37q53). The data set is available to download in a previ-