Skeletal completeness of the non‐avian theropod dinosaur fossil record

Non‐avian theropods were a highly successful clade of bipedal, predominantly carnivorous, dinosaurs. Their diversity and macroevolutionary patterns have been the subject of many studies. Changes in fossil specimen completeness through time and space can bias our understanding of macroevolution. Here, we quantify the completeness of 455 non‐avian theropod species using the skeletal completeness metric (SCM), which calculates the proportion of a complete skeleton preserved for a specimen. Temporal patterns of theropod skeletal completeness show peaks in the Carnian, Oxfordian–Kimmeridgian and Barremian–Aptian, and lows in the Berriasian and Hauterivian. Lagerstätten primarily drive the peaks in completeness and observed taxonomic diversity in the Oxfordian–Kimmeridgian and the Barremian–Aptian. Theropods have a significantly lower distribution of completeness scores than contemporary sauropodomorph dinosaurs but change in completeness through time for the two groups shows a significant correlation when conservation Lagerstätten are excluded, possibly indicating that both records are primarily driven by geology and sampling availability. Our results reveal relatively weak temporal sampling biases acting on the theropod record but relatively strong spatial and environmental biases. Asia has a significantly more complete record than any other continent, the mid northern latitudes have the highest abundance of finds, and most complete theropod skeletons come from lacustrine and aeolian environments. We suggest that these patterns result from historical research focus, modern climate dynamics, and depositional transportation energy plus association with conservation Lagerstätten, respectively. Furthermore, we find possible ecological biases acting on different theropod subgroups, but body size does not influence theropod completeness on a global scale.

The fossil record has temporal, geographical, environmental and skeletal gaps (Newell 1959; Foote & Raup 1996; Kidwell & Holland 2002), and it is essential that these limitations are considered when making interpretations about the evolutionary patterns of a group. In recent decades much research has focused on the impact of this incompleteness on interpretations drawn from the fossil record (e.g. Dingus 1984; Foote & Sepkoski 1999; Benton et al. 2000, 2011; Smith 2001, 2007; Cooper et al. 2006). Many assessments have focused on the relative proportions of species or species ranges represented in the fossil record. This has been assessed by quantifying the extent to which fossil occurrence ranges represent 'true' temporal ranges of species (Benton & Storrs 1994, 1996; Foote & Raup 1996; Eiting & Gunnell 2009), and by the level of congruence, or percentage of gaps (ghost ranges), between the stratigraphical order of fossil occurrences and order of phylogenetic tree branching (Dingus 1984; Benton & Storrs 1994, 1996; Teeling et al. 2005; Upchurch & Barrett 2005; Dyke et al. 2009; O'Connor et al. 2011a).
Over the last two decades, many assessments of the quality of the fossil record have focused on the variation in information content provided by fossil specimens of a group (Benton et al. 2004; Fountaine et al. 2005; Smith 2007; Dyke et al. 2009; Benton 2010; Mannion & Upchurch 2010a; Brocklehurst et al. 2012; Walther & Fröbisch 2013; Brocklehurst & Fröbisch 2014; Cleary et al. 2015; Dean et al. 2016; Verrière et al. 2016; Davies et al. 2017; Driscoll et al. 2018; Brown et al. 2019). Using these approaches, a high-quality fossil record would be one that contains many highly complete specimens. Early methods for quantifying specimen completeness were relatively subjective, and scored the completeness of fossil specimens by separating preservation quality into four or five simple categories (Benton et al. 2004; Fountaine et al. 2005; Benton 2008), an approach that was later refined by examining different skeletal regions (Beardmore et al. 2012), following previous taphonomic studies (Sander 1992; Kemp & Unwin 1997; Hungerbühler 1998; Casey et al. 2007). Subsequently, Mannion & Upchurch (2010a) conceived two completeness metrics that quantify the completeness of individual specimens and species in more detail and with greater accuracy. These metrics are the skeletal completeness metric (SCM) and character completeness metric (CCM). SCM measures the absolute proportion of the skeleton that is preserved for a species, whereas CCM measures the proportion of phylogenetically informative characters preserved. Calculating such metrics enables meaningful comparisons to be drawn between various sampling biases that could influence the record of a group.
Environmental and geological parameters can theoretically influence the quality of fossil specimens (Dingus 1984; Retallack 1984). For example, a high number of localities from depositional settings with higher quality preservation could lead to increased specimen completeness within a time interval. Ecological and biological differences between groups could also influence fossil quality, as body size and robustness of skeletons (Cooper et al. 2006; Brown et al. 2013), and particular environmental preferences (Mannion & Upchurch 2010b) have been associated with differing qualities of fossil records. Variation in historical or geographical sampling by researchers can also potentially influence the level of specimen completeness known for a group, as more effort being allocated to a particular group or a set of localities is likely to yield more complete skeletons (Bernard et al. 2010). Incomplete skeletons may also be difficult to diagnose, resulting in either a reduction in diversity estimates for a group or time bin or, conversely, increasing diversity as a result of taxonomic oversplitting (Brocklehurst & Fröbisch 2014). Previous studies have found varying correlations between completeness metrics and changes in diversity and fossil record sampling metrics through time, as well as various geographical and environmental differences between the fossil records of different groups.
Dinosaurs have featured prominently in discussions of the quality of the fossil record (Butler & Upchurch 2007; Benton 2008, 2010; Lloyd et al. 2008; Barrett et al. 2009; Mannion & Upchurch 2010a; Tarver et al. 2011; Brocklehurst et al. 2012). Studies have demonstrated that: (1) highly incomplete taxa can still provide important information for our understanding of dinosaur phylogenetic relationships (e.g. Butler & Upchurch 2007); (2) there are differences in fossil completeness between continents and changing levels of completeness through historical time (Benton 2008); (3) sampling artefacts influence our interpretation of apparent dinosaur diversification events (Lloyd et al. 2008); (4) the validity of named dinosaurian taxa depends on the researcher (Benton 2010); (5) additional finds of new species significantly change dinosaur phylogenetic relationships and our understanding of their evolution (Tarver et al. 2011); (6) the sauropodomorph fossil record varies in completeness through geological and historical time, and may influence our understanding of the group's temporal diversity changes (Mannion & Upchurch 2010a); and (7) Mesozoic avian dinosaurs have a record that may be strongly influenced by diversity changes through time and preservation in Lagerstätten deposits (Brocklehurst et al. 2012).
Despite the aforementioned studies, the quality of the theropod fossil record has never been quantified using specimen completeness metrics. Theropods are an ideal group to assess using these approaches, as their broad geographical and temporal extent may provide insights into large scale biases acting upon the fossil record. Here, we quantitatively assess the fossil record of theropod dinosaurs using the skeletal completeness metric originally developed by Mannion & Upchurch (2010a). SCM was preferred over CCM because it is more directly connected to the natural taphonomic, environmental and weathering processes about which we sought to draw conclusions in this study. We also focus on non-avian theropods (here referred to simply as 'theropods'), from the earliest species to the immediate precursors of avians. Avian taxa are excluded because recent studies have already assessed the quality of the Mesozoic bird fossil record (Fountaine et al. 2005; Brocklehurst et al. 2012; Gardner et al. 2016) and additional assessment of Cenozoic birds would be beyond the scope of this study.
Our main aim was to ascertain whether theropod specimen completeness is influenced by spatial and temporal sampling biases. We statistically compared theropod completeness between different geographical regions, depositional environments, and taxonomic subgroups; and the relationship between completeness and changes in rock record, sampling effort, and taxonomic diversity through geological time. By doing so we tried to ascertain if there are particular patterns in the theropod fossil record that are indicative of larger scale ecological, geological, geographical or sampling biases, and to uncover controls acting on the records of the different theropod subgroups. We hope that the results of this study will highlight some of the modern and ancient spatial and temporal inconsistencies of the global fossil record which often go unconsidered when regarding the macroevolutionary understanding of a group. We further hope they can be used to guide future exploration of and research on the theropod fossil record.

Completeness metrics
The skeletal completeness metric (SCM) was proposed by Mannion & Upchurch (2010a) to more objectively estimate the proportion of the total, complete skeleton that is preserved for an individual species. They provided two different definitions for SCM: scored solely on the most complete specimen of a species (SCM1), or as the composite completeness of all known specimens of a species (SCM2). Strong correlations have been found between the two metrics (Mannion & Upchurch 2010a; Cleary et al. 2015; Tutin & Butler 2017), but we solely use the latter in this study, as it uses all the information at hand for each species and is more appropriate than arbitrarily nominating a most important specimen (Mannion & Upchurch 2010a; Brocklehurst et al. 2012; Brocklehurst & Fröbisch 2014).
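The difference between the two definitions can be illustrated with a minimal sketch. The bone weights, the specimens and the function names below are hypothetical, and the composite is assumed to take the best-preserved copy of each bone across specimens, which is one plausible reading of 'composite completeness' rather than the exact scoring protocol used here.

```python
# Hypothetical bone weights (percent of the whole skeleton); NOT the values used in this study.
BONE_WEIGHTS = {"skull": 10.0, "femur": 8.0, "tibia": 7.0, "vertebrae": 40.0, "other": 35.0}


def specimen_scm(preserved):
    """Completeness (%) of one specimen; `preserved` maps bone -> fraction preserved (0 to 1)."""
    return sum(BONE_WEIGHTS[bone] * fraction for bone, fraction in preserved.items())


def scm1(specimens):
    """SCM1: score of the single most complete specimen of a species."""
    return max(specimen_scm(s) for s in specimens)


def scm2(specimens):
    """SCM2: composite score (assumption: best-preserved copy of each bone across specimens)."""
    best = {}
    for specimen in specimens:
        for bone, fraction in specimen.items():
            best[bone] = max(best.get(bone, 0.0), fraction)
    return specimen_scm(best)


# Two hypothetical specimens of the same species.
specimens = [
    {"skull": 1.0, "femur": 0.5},   # complete skull, half a femur
    {"femur": 1.0, "tibia": 1.0},   # complete femur and tibia
]

print(scm1(specimens), scm2(specimens))
```

In this toy case the composite score exceeds the best single specimen because the two specimens preserve different elements, which is the situation in which SCM2 uses information that SCM1 discards.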
Mannion & Upchurch (2010a) used approximations of relative skeletal proportions (e.g. the percentage of the total skeleton made up by any individual bone or skeletal region) to assess specimen completeness for sauropodomorphs. Subsequently, the metric has been refined and altered multiple times. For example, Cleary et al. (2015) used different skeletal proportion percentages for ichthyosaur taxa of different geological ages because significant morphological change occurs through time within the group. In contrast to the approximate estimates provided by Mannion & Upchurch (2010a), Brocklehurst & Fröbisch (2014) more precisely estimated the skeletal body proportions of synapsids by modelling each bone as the volume of a cone, cylinder, or a prism, based on skeletal measurements of multiple representatives of morphologically and taxonomically distinct subgroups. The assigned body proportion percentage of each bone was then derived from the average of these representatives. This was further developed by Verrière et al. (2016), who modelled bone volumes using more precise natural shapes and mapping two-dimensional outlines, representing each cranial bone, onto the external surface area of the skull (truncated pyramid) to obtain percentage volumes for each.
Although these refinements have made SCM calculations increasingly precise, they are highly time consuming to implement, particularly for large and morphologically diverse taxonomic groups like Theropoda. Because we lacked physical access to specimens and multi-dimensional measurements of every bone (mostly due to varying completeness), we opted not to calculate skeletal proportions using three-dimensional volumes. Instead we used an alternative but efficient method, whereby we modelled the two-dimensional surface area of each bone for ten morphologically and taxonomically disparate theropod taxa, based on scientifically informed skeletal reconstructions produced by Scott Hartman (http://www.skeletaldrawing.com): Herrerasaurus ischigualastensis, Coelophysis bauri, Majungasaurus crenatissimus, Allosaurus fragilis (Fig. 1; fig. S1). Choice of the representative skeletal diagrams was based on the availability of distinct species that represent the major groups of Theropoda, as well as how completely known the remains of each species are (see Cashmore & Butler 2019). Each skeletal diagram and its constituent bones were traced in Adobe Illustrator (version CC) and the surface areas of individual bones and skeletal regions calculated using a free Illustrator plug-in, Patharea Filter (http://telegraphics.com.au/sw/product/patharea). This enabled us to have precise representative shapes on which to base our relative bone dimensions. All individual skull and mandibular bones were assigned the same proportional percentage of the total skull and mandible, regardless of the varying sizes of the bones.
The lack of the third dimension when estimating proportions is a potential limitation of our approach. To test whether skeletal proportions can be sufficiently well estimated by two-dimensional lateral views, a shape-volume proportioned skeleton of T. rex was calculated from the measurements available in the Brochu (2003) monograph of 'Sue' (FMNH PR2081), one of the most complete specimens of T. rex ever discovered (Cashmore & Butler 2019, table S1). As in Brocklehurst & Fröbisch (2014), cones, cylinders and prisms were used as the representative shapes for each bone, plus half pyramids, hollow cylinders and cuboids when necessary (see Cashmore & Butler 2019). The resulting proportions are highly similar (Pearson's R² = 0.96, p = 2.432 × 10⁻⁷) to those calculated from the two-dimensional skeletal reconstruction. Neither method is perfect, but a strong significant correlation between the results shows that they are converging on a relatively consistent set of skeletal proportions. Furthermore, Brown et al. (2019) found that there was no statistical difference between the completeness scores of bat taxa calculated using body proportions estimated via three-dimensional (CT scan of extant specimen) or two-dimensional approaches. As a result, we opted for the simpler two-dimensional method, which is easier to apply to a much greater taxonomic sample.
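The logic of this check can be sketched as follows: bone volumes are approximated with simple solids, converted to percentages of the total, and correlated with the corresponding area-based percentages. All measurements and area percentages below are invented for illustration, only two of the solids are shown, and the helper names are hypothetical; this is not the calculation performed on the 'Sue' measurements.

```python
import math


def cylinder_volume(radius, length):
    """Volume of a cylinder, used here as a stand-in for a long-bone shaft."""
    return math.pi * radius ** 2 * length


def cone_volume(radius, length):
    """Volume of a cone, used here as a stand-in for a tapering element."""
    return math.pi * radius ** 2 * length / 3.0


def to_percentages(values):
    """Rescale raw values so that they sum to 100."""
    total = sum(values)
    return [100.0 * v / total for v in values]


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical measurements (cm) for three bones of one skeleton.
volumes = [cylinder_volume(5.0, 100.0), cylinder_volume(4.0, 90.0), cone_volume(3.0, 30.0)]
volume_percent = to_percentages(volumes)

# Hypothetical area-based percentages for the same three bones.
area_percent = [50.0, 40.0, 10.0]

r = pearson_r(area_percent, volume_percent)
print(round(r ** 2, 3))
```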
After the proportions were calculated for each skeletal diagram, the percentage values for each individual bone from all ten exemplar taxa (e.g. ten differing values for the femora; see Cashmore & Butler 2019) were used to determine a mean value for each bone, which was applied to all theropods when computing completeness scores. Figure 1 shows the percentages used for individual regions of the theropod skeleton.
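The averaging step can be sketched as below. The surface areas, the two exemplar taxa and the three skeletal regions are hypothetical stand-ins (the study used ten exemplars and many more elements), and the function names are illustrative only.

```python
from statistics import mean

# Hypothetical traced surface areas (arbitrary units) for two exemplar taxa.
exemplar_areas = {
    "taxon_a": {"femur": 30.0, "tibia": 20.0, "rest": 150.0},
    "taxon_b": {"femur": 10.0, "tibia": 10.0, "rest": 80.0},
}


def area_percentages(areas):
    """Convert the bone areas of one skeleton to percentages of its total area."""
    total = sum(areas.values())
    return {bone: 100.0 * area / total for bone, area in areas.items()}


def mean_bone_percentages(exemplars):
    """Average each bone's percentage across all exemplar skeletons."""
    per_taxon = [area_percentages(areas) for areas in exemplars.values()]
    bones = per_taxon[0].keys()
    return {bone: mean(p[bone] for p in per_taxon) for bone in bones}


weights = mean_bone_percentages(exemplar_areas)
print(weights)
```

Because each exemplar's percentages sum to 100, the averaged weights also sum to 100 and can be applied directly when scoring any taxon.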

Dataset
We present a comprehensive dataset of 455 valid non-avian theropod species, including specimens that have not yet received formal taxonomic names but have been included as operational taxonomic units (OTUs) within phylogenetic analyses. Many of these OTUs represent isolated specimens of fairly low completeness but their inclusion is justified because they probably represent distinct, unnamed taxa, and can be of great value with regard to understanding phylogenetic relationships; their inclusion provides a better representation of the quality of the fossil record. We excluded all theropod species currently considered to be nomina dubia, Protoavis texensis because it is considered to be a chimera including non-theropod remains (Nesbitt et al. 2007), and Vitakridrinda sulaimani because the published information on this species is not adequate to score it (Malkani 2006). All published specimens of every taxon were included unless information was lacking for an individual specimen, or if a taxon's composite completeness was already 100% and any additional specimens made no difference to its completeness score. Completeness data were primarily gathered from figures and descriptive text in the literature, and when necessary from additional online sources, museum catalogues and via personal communication. The dataset includes detailed descriptions of the completeness of each specimen and scores completeness of individual bones from 0 to 100%, which was then transformed into overall skeletal proportions. See 'Scoring specimen completeness' in Cashmore & Butler (2019) for a detailed description of how individual bones were scored and how non-typical specimens were treated. Information regarding each taxon's geographical locality (modern and palaeocoordinates), geological age (stratigraphic stage), sedimentary setting (e.g. siliciclastic or carbonaceous facies) and depositional setting was also gathered from the Paleobiology Database (PBDB: http://www.paleodb.org) and the literature. Body size data were collected as mass estimates (179 taxa) from Benson et al. (2018), supplemented by a further 57 calculations of additional taxa from available femoral measurements based on methods described in the same paper (see Cashmore & Butler 2019). The dataset is up-to-date as of December 2018 (Cashmore & Butler 2019).

Theropod completeness subdivisions
Time bins. To examine completeness through time, SCM2 scores of each taxon were used to calculate a mean completeness value for each geological stage-level time bin from the Carnian to Maastrichtian. Stage-level time bins were chosen for ease of comparisons with sampling proxy data and with completeness data from the majority of previous studies. The standard deviation of completeness scores was calculated for each individual stage. Taxa that were present over multiple geological stages, or have an uncertain stratigraphic age, were included in each stage in which they were potentially present. The Triassic and Jurassic (T-J) SCM2 scores were also analysed separately from the Cretaceous (K) in some tests to assess changes in the theropod record through time.
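A minimal sketch of this binning is given below. The taxa, scores and stage assignments are hypothetical, and the use of the sample (n − 1) standard deviation is an assumption, since the text does not state which form was calculated.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical records: (taxon, SCM2 in %, stages in which the taxon may have been present).
taxa = [
    ("taxon_a", 80.0, ["Carnian"]),
    ("taxon_b", 20.0, ["Carnian", "Norian"]),  # uncertain age: counted in both stages
    ("taxon_c", 40.0, ["Norian"]),
    ("taxon_d", 10.0, ["Rhaetian"]),
]


def completeness_by_stage(records):
    """Return {stage: (mean SCM2, standard deviation or None if only one taxon)}."""
    scores = defaultdict(list)
    for _name, scm2, stages in records:
        for stage in stages:
            scores[stage].append(scm2)
    summary = {}
    for stage, values in scores.items():
        # Assumption: sample standard deviation; undefined for a single taxon.
        spread = stdev(values) if len(values) > 1 else None
        summary[stage] = (mean(values), spread)
    return summary


by_stage = completeness_by_stage(taxa)
print(by_stage)
```

Counting a taxon in every stage it may occupy means that long-ranging or poorly dated taxa contribute to several bins, so adjacent stage means are not fully independent.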
Geographical localities. To assess the varying quality of the theropod fossil record throughout the world, SCM2 scores were grouped by their hemisphere and between the major continental regions: Africa (30 taxa), Asia (191 taxa), Australasia (8 taxa), Europe (62 taxa), North America (95 taxa), and South America (68 taxa). Antarctica (1 taxon) was excluded from these analyses due to its very limited fossil record.
Depositional setting. SCM2 scores were also subdivided according to their inferred sedimentary setting and depositional environment to generally understand global taphonomic influences on the theropod fossil record. Taxa were classified as originating from either siliciclastic or carbonaceous settings, and from aeolian, fluvial channel, alluvial plain, or lacustrine terrestrial environments, or a coastal or open marine setting.
Lagerstätten. We further separated taxa derived from either conservation Lagerstätten, concentration Lagerstätten, or background (non-Lagerstätten) sedimentary regimes in order to measure the impact that sites of exceptional preservation have had on our understanding of the theropod record. For this study we define conservation Lagerstätten as deposits (and formations) which preserve soft tissues alongside skeletal remains (Eliason et al. 2017), and concentration Lagerstätten as unusually dense macro-bone accumulations from a single sedimentary stratum (Behrensmeyer 2007). Assignment of taxa as belonging to either type of Lagerstätte was primarily based on information gathered from the PBDB.

Temporal correlations
The temporal curve of theropod SCM2 completeness was statistically compared to a number of other time series with which it might potentially have a relationship. We first compared the complete theropod SCM2 time series with scores for its component preservational regimes: time series of concentration Lagerstätten, conservation Lagerstätten, non-conservation Lagerstätten, and background SCM2. Additionally, we tested the correlations between temporal changes in total SCM2 and changes in SCM2 curves for specific continental regions, subgroups and depositional environments to understand the different natural and sampling aspects that best explain the complete SCM2 curve. We tested the correlation between SCM2 and changes in non-avian theropod richness through time, derived from the number of taxa in our dataset, and performed separate correlations for various time intervals, with and without conservation and concentration Lagerstätten taxa. Geological stages lacking any data were removed from all correlations where necessary.
We compared theropod SCM2 with stage bin length to assess whether the uneven lengths of stages influenced completeness recovered for individual intervals. Changes in sea level through time were derived from Butler et al. (2010), and were compared to theropod SCM2 because it has been argued that sea level has a potential influence on the completeness of marine fossil groups (Cleary et al. 2015; Tutin & Butler 2017), although whether this relationship holds in the terrestrial realm is subject to debate (Fara 2002). The number of dinosaur-bearing formations (DBFs) and dinosaur-bearing collections (DBCs) for the Carnian to the Maastrichtian were collected from the PBDB. These have been argued to represent proxies for the amount of rock availability and the level of collection effort made on the respective fossil groups (Upchurch et al. 2011), which could have a strong influence on the theropod fossil record. However, the use of these as sampling proxies has been criticized (Benton et al. 2011; Dunhill et al. 2014, 2018; Benton 2015; Brocklehurst 2015), with formation counts in particular being regarded as information redundant when compared to raw diversity changes (Benton 2015; Dunhill et al. 2018). Results from comparisons between completeness and these proxies should therefore be taken with a level of caution. We consequently opted to calculate Good's u as an estimate of sampling coverage for each time bin. This estimates coverage for each geological stage based on the relative proportion of singleton (taxa sampled from one site only) to non-singleton (taxa sampled from two or more sites) taxon occurrences. If a geological stage has a majority of singleton taxa and a minority of non-singleton taxa, it will have low coverage and is therefore poorly sampled; but if there are higher proportions of non-singleton taxa, then the coverage for that stage is higher, suggesting that the fauna is more evenly sampled and better understood.
Species-level theropod taxon occurrences per stage were gathered from the PBDB and sampling coverage was calculated using an R function developed by Chao & Jost (2012) (see Cashmore & Butler 2019). We also used number of theropod PBDB occurrences and the number of specimens per taxon (from our dataset) as proxies for relative abundance of theropod fossils and compared the summed number of each per stage with the theropod SCM2 time series. We also tested each major individual time series for trends in the overall patterns through time and whether combinations of observed species richness, fossil record sampling and time bin length provided significant explanations of mean completeness through time.
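The idea behind the coverage estimate can be illustrated with the classic form of Good's u, u = 1 − n₁/N, where n₁ is the number of taxa known from a single site and N is the total number of taxon occurrences in the stage. This sketch is not the Chao & Jost (2012) estimator used for the reported values; the occurrence data and the function name are hypothetical.

```python
from collections import Counter


def goods_u(occurrences):
    """Classic Good's u for one time bin.

    `occurrences` is an iterable of (taxon, site) pairs; duplicate pairs are
    counted once, so each taxon contributes one occurrence per site.
    """
    sites_per_taxon = Counter(taxon for taxon, _site in set(occurrences))
    total_occurrences = sum(sites_per_taxon.values())
    if total_occurrences == 0:
        raise ValueError("no occurrences in this time bin")
    singletons = sum(1 for n in sites_per_taxon.values() if n == 1)
    return 1.0 - singletons / total_occurrences


# Hypothetical stage: taxon 'a' is known from two sites, 'b' and 'c' from one site each.
stage_occurrences = [("a", "site1"), ("a", "site2"), ("b", "site1"), ("c", "site3")]
print(goods_u(stage_occurrences))
```

A stage made up entirely of singleton taxa gives u = 0, whereas a stage in which every taxon is known from several sites approaches u = 1.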
Theropod completeness through time was also compared with the records of other Mesozoic tetrapod groups for which skeletal completeness studies have been performed, using the time series of plesiosaurs (Tutin & Butler 2017), ichthyosaurs (Cleary et al. 2015) and sauropodomorphs (Mannion & Upchurch 2010a). These comparisons aimed to identify shared or diverging completeness signals between the different groups of terrestrial and marine vertebrates.

Non-temporal comparisons
A variety of comparisons of median and distribution of completeness values were made between subsets of the data, including Triassic, Jurassic and Cretaceous data, the major theropod subgroups, geographical hemispheres and continents, and the preservational regimes, sedimentary settings and depositional environments of each taxon. If a taxon with multiple specimens is known from more than one of these subsets, the taxon's completeness score was replicated in each group when performing statistical comparisons. Some singleton taxa were assigned to multiple depositional settings when one specific setting was not known for certain. SCM2 values are currently also known for plesiosaurs (
SCM2 values for individual taxa were also compared with the number of known specimens, modern and palaeolatitudinal coordinates, and with their body mass estimates, if available. For taxa known from multiple localities, the modern and palaeolatitudes of the type specimen were used for analyses. The relationship between body mass and completeness was further tested by excluding conservation Lagerstätten taxa (which tend to preserve numerous relatively complete specimens of small-sized species), and concentration Lagerstätten taxa, to assess whether these unusually preserved taxa were obscuring any underlying relationship between completeness and body size.

Statistical tests
All statistical analyses were performed in R. Time series plots were produced using the package ggplot2 (Wickham et al. 2019) and non-temporal completeness distributions plots were produced using the package vioplot (Adler 2015).
For linear regressions testing the statistical trend in overall patterns of individual time series and correlations between different time series, generalized least-squares regressions (GLS) with a first order autoregressive model (corARMA) were applied to the data using the function gls() in the R package nlme v. 3.1-137 (Pinheiro et al. 2018), because GLS reduces the chance of overestimating the statistical significance of regression lines due to temporal autocorrelation. To ensure normality and homoskedasticity of residuals, time series were log-transformed prior to analysis. Likelihood-ratio based pseudo-R² values were calculated using the function r.squaredLR() of the R package MuMIn (Bartoń 2018).
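As an illustration of what a likelihood-ratio based pseudo-R² expresses, the sketch below assumes the standard definition R² = 1 − exp(−(2/n)(ℓ_model − ℓ_null)), where ℓ denotes a maximised log-likelihood and n the number of observations. The function name and the log-likelihood values are hypothetical, no rescaling of the maximum is applied, and the code is a Python stand-in rather than the R workflow described above.

```python
import math


def pseudo_r2_lr(loglik_model, loglik_null, n_obs):
    """Likelihood-ratio pseudo-R^2 from the fitted and null (intercept-only) log-likelihoods."""
    if n_obs <= 0:
        raise ValueError("n_obs must be positive")
    return 1.0 - math.exp(-2.0 / n_obs * (loglik_model - loglik_null))


# Hypothetical fits to a 20-stage time series.
print(pseudo_r2_lr(loglik_model=-10.0, loglik_null=-15.0, n_obs=20))
```

A model that fits no better than the null gives a value of zero, and the value increases towards one as the fitted log-likelihood improves relative to the null.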
The results of fitting GLS autoregressive models to multiple combinations of potential explanatory variables were compared using Akaike's information criterion (AICc), calculated using the function AICc() of the R package qpcR (Spiess 2018). To identify the best combination of variables from those analysed, Akaike weights were calculated using the aic.w() function of the R package phytools (Revell 2017).
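The model comparison can be sketched with the usual small-sample corrected criterion, AICc = −2ℓ + 2k + 2k(k + 1)/(n − k − 1), and Akaike weights w_i = exp(−Δ_i/2) / Σ_j exp(−Δ_j/2), where Δ_i is the difference between a model's AICc and the lowest AICc in the set. The log-likelihoods, parameter counts and function names below are hypothetical, and the code does not reproduce the R functions cited above.

```python
import math


def aicc(loglik, n_params, n_obs):
    """Small-sample corrected Akaike information criterion."""
    if n_obs - n_params - 1 <= 0:
        raise ValueError("AICc is undefined when n_obs <= n_params + 1")
    aic = -2.0 * loglik + 2.0 * n_params
    return aic + 2.0 * n_params * (n_params + 1) / (n_obs - n_params - 1)


def akaike_weights(scores):
    """Relative support for each model given a list of AICc scores."""
    best = min(scores)
    raw = [math.exp(-(score - best) / 2.0) for score in scores]
    total = sum(raw)
    return [value / total for value in raw]


# Hypothetical candidate models fitted to a 26-stage time series: (log-likelihood, parameters).
candidates = {"diversity": (-12.0, 3), "diversity + DBFs": (-11.5, 4)}
scores = [aicc(loglik, k, 26) for loglik, k in candidates.values()]
for name, weight in zip(candidates, akaike_weights(scores)):
    print(name, round(weight, 3))
```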
Pairwise comparisons of non-temporal range data were performed using non-parametric Mann-Whitney-Wilcoxon tests, which compare the distribution and median of two datasets. False discovery rate (FDR; Benjamini & Hochberg 1995) adjustments were used to reduce the likelihood of acquiring type I statistical errors over multiple comparisons. Kruskal-Wallis tests, which test whether the values of at least one group tend to dominate those of the others, were used for comparisons of more than two datasets (e.g. subgroups, continents, and depositional settings). GLS models were also used to compare the non-temporal relationship between log-transformed theropod SCM2 and specimen number, body size estimates, latitude and palaeolatitude. The Shapiro-Wilk normality test was used to assess whether theropod latitudinal occurrences have a normal distribution. Hartigans' Dip test was employed using the R package diptest (Maechler 2013) to test the level of bimodality/multimodality of the latitudinal distribution of theropod occurrences.
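The false discovery rate adjustment can be sketched as the Benjamini & Hochberg (1995) step-up procedure: p-values are ranked, each is multiplied by m/rank (m being the number of tests), and monotonicity is enforced from the largest p-value downwards. The p-values below are hypothetical and the function name is illustrative only.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices from smallest to largest p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value never exceeds the next one up.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical raw p-values from four pairwise comparisons.
raw = [0.01, 0.04, 0.03, 0.005]
print(bh_adjust(raw))
```

An adjusted value is then compared with the chosen significance threshold in the same way as an unadjusted p-value.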

Theropod completeness through time
Mean theropod skeletal completeness (Fig. 2) ranges between 10% and 48% through the Mesozoic, with notable peaks in the Carnian, Oxfordian-Kimmeridgian and Barremian-Aptian, and lows in the Berriasian and Hauterivian. All stages exhibit relatively wide standard deviations apart from the Bathonian and Berriasian. There is no significant trend in full theropod SCM2 (fig. S2). The models that best explain the theropod SCM2 time series are those including taxon diversity + sea level, taxon diversity + DBFs, and taxon diversity + DBFs + time bin length as explanatory variables, although all three of these models have weak R² values (0.16-0.27) and their coefficients are non-significant (Cashmore & Butler 2019, table S3).
Fig. 2. Changes in theropod skeletal completeness through time. Mean SCM2 (red line) with one standard deviation from the mean (shaded) and all taxon SCM2 scores per stage (grey circles).

Correlations with theropod taxonomic richness through time
The observed theropod species count gradually rises throughout the Mesozoic, with relative peaks in the Norian, Kimmeridgian and Aptian, and extreme outlying peaks in the Campanian and Maastrichtian (Fig. 3A). There is a strong significant trend toward increasing species counts through time (Cashmore & Butler 2019, table S2). There is no statistically significant correlation between mean theropod SCM2 and observed richness

Correlations with sampling proxies and sea level
There is no significant relationship between mean theropod SCM2 and time bin length (Table 2). DBFs and DBCs (Cashmore & Butler 2019, table S2) show significant trends through time and rise from the Late Triassic onwards, with similar relative peaks in the Late Jurassic, the Aptian-Albian and the latest Cretaceous (Cashmore & Butler 2019, fig. S4A-B). There is no significant correlation between theropod SCM2 and DBFs and DBCs through time (Table 2). Furthermore, theropod SCM2 does not show a significant correlation with either specimen numbers or PBDB occurrences per stage (Table 2; Cashmore & Butler 2019, table S2). However, there is a very weak but statistically significant correlation between non-temporal SCM2 score and specimen numbers per taxon (R² = 0.08, p < 0.0001). Good's u sampling coverage, which exhibits no significant trend through time (Cashmore & Butler 2019, table S2), and has troughs in the Rhaetian, Toarcian and Aalenian, and peaks in the earliest Jurassic, Late Jurassic, and middle and latest Cretaceous (Cashmore & Butler 2019, fig. S4E), also lacks a significant correlation with theropod SCM2 (Table 2). Sea level gradually rises in a stepwise manner throughout the time interval, reaching a high in the Late Cretaceous, and has no significant correlation with SCM2 through time (R² = 0.04, p = 0.33).
Table 1. Results of pairwise comparisons between theropod SCM2 and taxon richness time series using GLS.

Comparison to other tetrapod fossil records
Theropod completeness values range from just above 0 to 100%, with a median completeness of 17%, which is similar to the median and range of pelycosaur-grade synapsids and sauropodomorphs (Fig. 4). Mann-Whitney-Wilcoxon tests reveal theropod SCM2 distribution is statistically no different to pelycosaurs, but is significantly lower in comparison to the sauropodomorph distribution (Table 3). Theropods have a significantly less complete skeletal record than Parareptilia, and the marine ichthyosaurs and plesiosaurs (Fig. 4, Table 3). Time series comparisons show no significant correlation between theropod and sauropodomorph (Fig. 5A), ichthyosaur (Fig. 5B), or plesiosaur (Fig. 5C) SCM2 through time (Table 4). However, when removing taxa known from conservation Lagerstätten, a significant relationship is identified between the theropod and sauropodomorph curves (Table 4). A stronger and statistically significant result is found during just the Triassic-Jurassic, even though mean stage-level sauropodomorph completeness is consistently higher (Fig. 5A) and sauropodomorph median completeness is significantly higher than that of theropods during this interval (Table 3). In the Cretaceous, mean stage-level sauropodomorph completeness drops (there is also a significant drop in sauropodomorph median completeness: W = 5256, p = 0.0001) and the significant differences in median completeness and distribution of scores between them and theropods are lost (Table 3).

Theropod subgroups and body size
Compsognathidae have the highest median SCM2 (89%) of any subgroup by a substantial margin (Fig. 6), and, like non-deinonychosaurian Paraves, have a markedly different distribution to all other taxonomic groups. Compsognathids have the highest lower quartile and upper quartile completeness compared to any other subgroup. Following these strongly outlying group distributions, Oviraptorosauria (28%) and Ornithomimosauria (33%) have the next highest median SCM2. All remaining subgroups have median SCM2 of <25%. Basal Tetanurae, Megaraptora, basal Coelurosauria, Alvarezsauroidea and Therizinosauria are all notable for their relatively low completeness ranges and lack of completely known taxa (Fig. 6). Ceratosauria and Troodontidae also have particularly low median completeness values. Megaraptora has by far the least complete record of any subgroup, with the second lowest median (5.98%), lowest upper quartile, and a high of only 34%. Kruskal-Wallis tests suggest the variance of completeness distributions is dominated by one or more subgroups (H = 47.786, p = 5.132 × 10⁻⁵). Cashmore & Butler (2019, table S5) displays the results of pairwise Mann-Whitney-Wilcoxon tests between each subgroup. Compsognathidae is consistently found to have significantly higher SCM2 scores than almost all other subgroups.
No significant relationship is recovered between theropod SCM2 and body mass estimates (R² = 0.017, p = 0.144) for individual taxa from GLS modelling, even when conservation Lagerstätten taxa (R² = 0.015, p = 0.129) are removed, or when concentration Lagerstätten taxa are additionally removed (R² = 0.02, p = 0.09) (Fig. 7).
Figure 9 shows modern and palaeolatitudinal distributions of theropod taxon finds in relation to their SCM2 scores. Taxon occurrences are unevenly situated within the northern hemisphere, heavily concentrated at c. 20-55°N, but with only one taxon above c. 56°N. Here, higher completeness values generally become more frequent at higher latitudes. Towards the equator both occurrences and levels of completeness substantially drop, with only nine occurrences between 10°N and 10°S, and a peak SCM2 score of 38%. Between c. 20 and 50°S there are far fewer data but a similar peak in occurrences and completeness to the northern hemisphere. Statistically significant Shapiro-Wilk normality and Hartigans' Dip tests suggest the latitudinal density distribution is non-normal (W = 0.72, p < 2.2 × 10⁻¹⁶) and non-unimodal (D = 0.04, p = 9.666 × 10⁻⁶) respectively. Further, there is a weak statistically significant positive correlation between latitude and SCM2 value (R² = 0.04, p = 0.017). In contrast, palaeolatitudinal coordinates show a more even spread of theropod occurrences within an ancient context (Fig. 9B), but the palaeolatitudinal density distribution is still significantly non-normal (W = 0.82, p < 2.2 × 10⁻¹⁶) and non-unimodal (D = 0.04, p = 6.631 × 10⁻⁶). Higher and lower northern palaeolatitudes are better represented, but there is still poor equatorial, polar and general southern representation and completeness.

Sedimentary and depositional setting
There is no significant difference between the range of completeness values of taxa from either siliciclastic or carbonaceous sedimentary settings (W = 8295.5, p = 0.32; Cashmore & Butler 2019, fig. S8). On the other hand, a statistically significant difference is found between the completeness range of theropods from terrestrial and marine deposits, with taxa from the latter being less complete (W = 8995.5, p = 0.003; Cashmore & Butler 2019, fig. S9). Kruskal-Wallis tests suggest that one or more settings significantly dominate the distribution of depositional environments (H = 48.262, p = 3.141 × 10⁻⁹). Lacustrine deposits exhibit statistically higher SCM2 values than all other depositional settings, with the exception of aeolian deposits (Fig. 10; Cashmore & Butler 2019, table S9). The latter has the next highest range of values but a similar median value to taxa from alluvial plains. Fluvial channels, coastal and open marine settings are sequentially the depositional settings with the least complete specimens, and all exhibit statistically similar completeness ranges (Fig. 10).
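The Kruskal-Wallis comparison across several depositional settings can be sketched as follows. This is a minimal illustration that computes the tie-corrected H statistic and a chi-squared p-value using only the standard library; the function names and the three groups of scores are hypothetical and do not represent our data or the software used for the reported tests.

```python
from math import erfc, exp, gamma, sqrt


def midranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def chi2_sf(x, df):
    """Upper-tail probability of the chi-squared distribution (integer df >= 1)."""
    z = x / 2.0
    # Start from Q(1, z) for even df or Q(1/2, z) for odd df, then apply
    # the recurrence Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / gamma(a + 1).
    a, q = (1.0, exp(-z)) if df % 2 == 0 else (0.5, erfc(sqrt(z)))
    while a < df / 2.0:
        q += z ** a * exp(-z) / gamma(a + 1.0)
        a += 1.0
    return q


def kruskal_wallis(groups):
    """Return (H, degrees of freedom, p-value) for a list of samples."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = midranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3.0 * (n + 1)
    # Tie correction
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    h /= 1.0 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    df = len(groups) - 1
    return h, df, chi2_sf(h, df)


# Hypothetical scores for three depositional settings (illustrative only)
settings = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
h_stat, dof, p_value = kruskal_wallis(settings)
```

A significant result indicates only that at least one group differs from the others, which is why pairwise tests between settings are needed to identify the groups responsible.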
Cashmore & Butler (2019, fig. S10) shows mean temporal SCM2 based solely on taxa from the six depositional categories. Aeolian and open marine SCM2 curves are the only environmental time series that lack a statistically significant relationship with total SCM2 through time in GLS correlations (Cashmore & Butler 2019, table S10).

Comparative completeness
The range of skeletal completeness values observed indicates that the theropod fossil record is one of the poorest of previously assessed tetrapod groups (Fig. 4) of taxa are c. 5-10% complete, numbers of taxa sharply drop above 20% SCM2, with a very gradual but steady decline towards increasing completeness levels. This low level of skeletal completeness for such a well-known group can potentially be explained by the ability of palaeontologists to recognize synapomorphic characters of theropods based on very little fossil material. It could also be explained by a heightened scientific interest in theropods, producing more taxa named from material unlikely to be intensely studied in other tetrapod groups (Benton 2008, 2010). Verrière et al. (2016) examined only genus-level taxa of parareptiles, and this may potentially explain the higher completeness of parareptiles in relation to all other terrestrial groups.
When conservation Lagerstätten taxa are excluded from the theropod time series, a significant positive correlation between sauropodomorph and theropod completeness is recovered (Table 4). The lack of correlation when conservation Lagerstätten are included emphasizes how preservational or ecological exclusion of the large-bodied sauropodomorphs from such deposits could be limiting our interpretations of their fossil record. As there are almost no sauropodomorph taxa found in conservation Lagerstätten, their fossil record shows differences from other clades that are richly represented in such deposits. Thus, conservation Lagerstätten create a strong signal in the theropod data that obscures an underlying correlation with sauropodomorph completeness. This underlying correlation probably reflects the groups' cohabitation of generally similar palaeoenvironments (Butler & Barrett 2008) and the many overlaps in geographical localities, as well as likely subjection to similar sampling standards through historical time on a global scale (Upchurch et al. 2011; Starrfelt & Liow 2016), although it has been suggested that theropod fossil sampling on regional scales is potentially heightened in comparison to other dinosaurs (Farlow 1976, 1993; McGowan & Dyke 2009; Horner et al. 2011). The non-conservation Lagerstätten theropod and sauropodomorph time series have stronger statistical correlations with each other during the Triassic-Jurassic but diverge in the Cretaceous.
The non-temporal range of sauropodomorph completeness scores is significantly higher than that of theropods (Table 3). Cretaceous data considered alone lacks this significant difference (Table 3). However, removing theropod conservation Lagerstätten from this comparison reduces the median and upper quartile range enough to create a statistically significant difference between the Cretaceous records, like all other non-temporal comparisons between the groups. This is intriguing as it suggests that under similar preservation regimes, theropod specimens are significantly less complete than sauropodomorph specimens. Again, this illustrates how the theropod fossil record is positively influenced by the presence of conservation Lagerstätten.
Following this, the consistently higher levels of sauropodomorph completeness might be caused by ecological or preservational differences between them and theropods. It is likely that the higher population numbers of the herbivorous and often gregarious (Lockley et al. 1986; Upchurch et al. 2004; Myers & Fiorillo 2009) sauropodomorphs in Mesozoic ecosystems, as well as their generally more robust skeletons, enhanced their preservation potential relative to theropods. Large carnivorous theropods would also be expected to be less abundant than their herbivorous contemporaries (Farlow 1993; White et al. 1998 Though our results show that inland settings generally preserve more complete theropod specimens, there is no significant difference in the distribution of completeness scores of theropods from coastal settings in comparison to fluvial or alluvial settings (Cashmore & Butler 2019, table S9). Differences may be exacerbated in the sauropodomorph record. These reasons might explain the lack of correlation between the two time series in the Cretaceous, as well as the drop in sauropodomorph completeness to levels comparable to theropods.
If SCM and CCM generally depict similar completeness signals through time (Mannion & Upchurch 2010a; Tutin & Butler 2017), then comparisons can be drawn between the SCM of theropods and completeness estimates for other Mesozoic terrestrial taxa for which only CCM has been calculated. The non-avian theropod fossil record shows similarities to fluctuations in pterosaur and bird CCM through time. All have time series that begin with relatively high completeness levels, have dramatic reductions in completeness at the Jurassic-Cretaceous boundary, a reduction in completeness and diversity from the Aptian to the Albian that reflects the influence of Lagerstätten (see below), and a Maastrichtian fossil record that is taxonomically diverse but has relatively low completeness values (Brocklehurst et al. 2012; Dean et al. 2016). However, theropod (SCM2) and pterosaur (CCM2) time series reveal no significant correlation for all time bins (R² = 0.13, p = 0.08) or solely the Triassic-Jurassic (R² = 0.17, p = 0.99), and there is also no correlation between theropod (SCM2) and bird (CCM2) time series (R² = 0.05, p = 0.8). Differences between these time series may nonetheless have been exacerbated by the use of differing completeness metrics. On the other hand, similarly to the significant similarities in the sauropodomorph and theropod SCM2 records, the sauropodomorph and pterosaur CCM2 time series are significantly correlated during the Triassic-Jurassic (Dean et al. 2016), hinting at a potential common causal control of completeness for Triassic-Jurassic terrestrial taxa. Furthermore, like the non-avian theropod record, bird CCM is correlated with observed taxonomic richness through the Jurassic-Cretaceous.
FIG. 10. Distribution of theropod SCM2 scores between different depositional settings.
Non-avian theropods and birds also show a similar distribution of geographical occurrences and relative continental completeness, with northern landmasses yielding more taxa than southern; Asia has the richest and most complete (CCM) record, North and South America have relatively abundant but typically less complete records, and there are a few finds in Australia and Antarctica (see Brocklehurst et al. 2012). The similarities between the non-avian theropod and bird records are unsurprising given that the latter are direct descendants of the former, considering their similar life histories, ecologies and environmental preferences (Erickson et al. 2009; O'Connor et al. 2011b), as well as the overlapping geological occurrences. Dean et al. (2016) concluded that the similar flight-adapted body plans and fragility of bird and pterosaur skeletons explained their similar patterns of completeness. Likewise, many non-avian theropod groups (e.g. coelurosaurs) had comparable body plans to Mesozoic birds and so at least in part experienced similar preservation biases.
The global similarities highlighted in the theropod, sauropodomorph, avian and pterosaur fossil records could be explained by a large-scale common cause. Instead of preservational issues dependent on ecological or biological affinities, these temporal similarities could well represent time bins of genuinely higher and poorer quality for all terrestrial tetrapods regardless of taxonomic group, probably controlled by geological and taphonomic histories. Therefore, major components of the terrestrial tetrapod faunas may have generally similar fossil records governed by geological processes and sampling availability. This is somewhat supported, given that the completeness distributions of all terrestrial groups are fundamentally different to the marine Plesiosauria and Ichthyosauria records. As far as can be concluded from our study and previous discussion (Rook et al. 2013; Cleary et al. 2015; Tutin & Butler 2017), there are fundamental differences between the marine and terrestrial fossil records, and tetrapods have consistently higher SCM and CCM values in the marine realm.

Depositional biases
Our results suggest that the best preserved theropod skeletons are those from lacustrine and aeolian deposits, where lack of transport and rapid burial ensured skeletal material was protected from scavenging, weathering, disarticulation and decay. Lacustrine environments are associated with conservation Lagerstätten deposits in the Alluvial, fluvial, coastal and open marine depositional settings generally have incrementally fewer relative occurrences of high completeness, which can probably be attributed to the levels of transportation skeletons underwent before burial. A large quantity of concentration Lagerstätten deposits occur within alluvial plains, which seems to result in the higher numbers of taxa in the 30-40% completeness range for this preservation regime.
44% of taxa in our dataset are derived from fluvial channel deposits and there is a strong statistically significant correlation of fluvial channel SCM2 and total SCM2 (Cashmore & Butler 2019, table S10). This supports the unsurprising idea that a large component of our understanding of the theropod fossil record is derived from fluvial depositional settings. Although this is probably the case for most terrestrial fossils, as fluvial deposits are commonly preserved, it highlights our reliance on a regime that naturally transports and winnows its sedimentary load, leading to abrasion and disarticulation of skeletal material within it. White et al. For theropods, the Maastrichtian and the preceding Campanian are marked by taxon occurrences that are significantly higher in number than other geological stages but have fundamentally unremarkable levels of skeletal completeness. The Campanian and Maastrichtian alone contain 34% (156/455) of all theropod taxa in our data set, but many species from these intervals are named from relatively incomplete material. One potential driver of this could be the substantial corresponding rise in taxa derived from fluvial channels within the latest Cretaceous (88/156 Campanian and Maastrichtian taxa, 56%) (Cashmore & Butler 2019, fig. S10C), in comparison to all pre-Campanian stages (105/305 taxa, 34%). Increased preservation within these erosive regimes could at least partially explain the relatively poor levels of completeness. The increased number of occurrences within fluvial settings predominantly corresponds with a few formations in North America, such as the Dinosaur Park (14/15 fluvial channel taxa), Hell Creek (6/6 fluvial channel taxa), and Horseshoe Canyon In addition to the fluvial signal, the significant correlation between lacustrine, alluvial plain and coastal environment SCM2 and total SCM2 (Cashmore & Butler 2019, table S10) suggests that they all significantly impact our understanding of the theropod fossil record. 
This is, however, not the case for the aeolian and open marine settings; again a foreseeable outcome as these two environments are the most unlikely to consistently preserve theropod fossils.
In theory, large scale sea level fluctuations could control the amount of fossil material preserved within different time bins due to variation in continental flooding (Butler et al. 2010). The lack of any significant correlation between SCM2 and sea level changes suggests that sea level is poorly supported as a large scale control on the theropod fossil record. However, sea level does contribute to the model that best explains changes in SCM2 through time, along with raw diversity (Cashmore & Butler 2019, table S2). This could indicate some level of sea level influence on specimen completeness but has relatively low explanatory power.

Biological and ecological biases
The wide differences between the non-temporal SCM2 ranges of different theropod subgroups (Fig. 6) suggest that skeletal completeness may in some ways be influenced by the different abundances, ecologies, body sizes and environmental preferences of different groups of theropods.
Megaraptora has one of the lowest median completeness values of any group and no known taxa over 34% complete, which could be explained by the generally low number of specimens known for each taxon (75% of taxa known from single specimens) and their common recovery from fluvial channel deposits (67% of taxa) (Cashmore & Butler 2019, table S11). Its poor record probably also stems from its relatively recent recognition as a group (Benson et al. 2010a) and unclear phylogenetic relationships (Porfiri et al. 2014, 2018; Novas et al. 2016). Continued finds in relatively unexplored areas of South America and Australasia are likely to boost its currently poor skeletal record.
Ceratosaurians and troodontids are known from a wide range of completeness scores but comparatively low median SCM2 (Fig. 6) resulting in relatively poor records. 71% of ceratosaurians and 74% of troodontids in our dataset are known from singleton specimens (Cashmore & Butler 2019, table S11). Though there is some evidence of troodontid rarity within some palaeoecosystems (White  ) demonstrated that abelisaurid specimens only had a positive association with terrestrial regimes, meaning relatively few abelisaurid fossils were transported to coastal environments and may therefore have more commonly occupied a setting relatively far inland. In our dataset, 63% of ceratosaur taxa are found in fluvial channels and 21% are from alluvial plains.
Basal tetanurans, alvarezsauroids and therizinosaurians all have relatively poor and statistically similar completeness distributions that lack highly complete taxa. Their records may represent a genuine rarity in ancient ecosystems, potentially limited environmental preferences (Butler & Barrett 2008) or a scarcity of finds (Bell et al. 2012; Currie & Koppelhus 2015) as 50% of basal tetanurans, 71% of alvarezsauroids, and 63% of therizinosaurians are known from single specimens (Cashmore & Butler 2019, table S11).
Unlike almost all other theropod groups, the distinctive spinosaurid megalosauroids can be regarded, with some certainty, to have had at least partially piscivorous diets (Charig & Milner 1997; Rayfield et al. 2007; Cuff & Rayfield 2013; Sales & Schultz 2017) and relatively specific environmental preferences for fluvial and coastal settings (Amiot et al. 2010; Ibrahim et al. 2014; Sales et al. 2016). These environments produce numerous but generally poor quality theropod finds. The spinosaurid record reflects this in that there are only ten taxa in our dataset (only nine classified species) but abundant fossil occurrences are known from specific sites (Läng et al. 2013; Medeiros et al. 2014; Benyoucef et al. 2015), most of which preserve solely teeth. However, isolated from the other megalosauroids their non-temporal distribution of completeness scores is statistically no different to non-spinosaurid megalosauroids (W = 58, p = 0.3669), and is not significantly lower than any other subgroup except Compsognathidae (W = 12, p = 0.0029), Oviraptorosauria (W = 109, p = 0.0101), and non-deinonychosaurian Paraves (W = 24, p = 0.0036), all of which have relatively unique records in relation to other theropods (see below). The non-significant difference between the distribution of their completeness scores and most theropod subgroups may relate to their heightened association with deposition-friendly aquatic settings (Hone et al. Basal theropods, basal neotheropods, megalosauroids, allosauroids, basal coelurosaurians, tyrannosauroids and dromaeosaurids all have relatively unremarkable distributions of completeness values that largely resemble the overall theropod distribution.
The generality of their records probably derives from a mixture of specimen numbers per taxon (all groups have singleton specimen taxa close to or above 50%), broad depositional environments (except for basal Theropoda and basal Coelurosauria, no one depositional setting corresponds to more than 50% of a group's taxa), and similar preservational regimes (all but Allosauroidea have at least 20% of taxa from concentration deposits) (Cashmore & Butler 2019, table S11). Unlike the rest of these groups, tyrannosauroids have an unusual number of highly complete taxa. This may represent local taphonomic biases towards large-bodied animals (Brown et al. 2013); however, increased sampling effort in attempts to collect museum display specimens could also have aided their completeness. Species such as Tyrannosaurus rex are famed for their ability to fascinate and attract the public and are a highly prized commodity for museums and institutions.
Ornithomimosaurians and oviraptorosaurians have very similar distributions that contrast significantly with other subgroups. The fairly consistent number of taxa at all levels of completeness with relatively minor reduction at high levels (Fig. 6) suggests that the influences on their preservation differ from most other groups. Intriguingly, both groups have comparable morphological adaptations of the skull (the reduction or total loss of teeth and the development of beaked skulls) and it has been suggested that they were herbivorous and omnivorous (Barrett 2005, 2014). A further distinction between these subgroups and others is increased gregariousness, as suggested by monodominant bonebed assemblages (Kobayashi & Lü 2003; Varricchio et al. 2008; Cullen et al. 2013; Funston et al. 2016), potential communal nesting (Norell et al. 1995; Fanti et al. 2012; Xu et al. 2014) and possibly heightened abundance in comparison to other theropods (White et al. 1998). Gregarious behaviour and higher abundance within Mesozoic ecosystems is likely to enhance the chances of individuals being preserved, and the chances of preserving complete skeletons due to the heightened density of individuals within local areas.
In contrast to all other groups, the significantly higher completeness distributions of the compsognathid and non-deinonychosaurian paravian records are almost exclusively the result of preservation in exceptional depositional settings, mostly in lacustrine environments (50% and 87% respectively) (Cashmore & Butler 2019, table S11). Compsognathidae has the highest median completeness of any group and exhibits a bimodal distribution that derives from most taxa being preserved in conservation Lagerstätten (70% of taxa) and a few in normal sedimentary regimes (20% of taxa). They are also the most limited theropod subgroup, with only ten taxa in our dataset. By contrast, a striking 93% of non-deinonychosaurian Paraves (14/15 taxa) are solely known from conservation Lagerstätten (Cashmore & Butler 2019, table S11). Without the presence of exceptional Lagerstätten deposits it is highly unlikely that these groups would be as well understood as they currently are. However, differing levels of spatial sampling intensity influence the discovery of such exceptional deposits (Eliason et al. 2017), therefore limiting our evolutionary understanding of groups that seem to be dependent on Lagerstätten to consistently preserve in the fossil record (Sales et al. 2014).
The statistically significant correlations of mean SCM2 time series for basal Theropoda, Allosauroidea, Compsognathidae, Alvarezsauroidea, Oviraptorosauria and non-deinonychosaurian Paraves with total SCM2 suggest that their records are most representative of the overall temporal completeness signals for theropods. The most notable are the basal theropods, which explain the high completeness levels in the Late Triassic, and the compsognathids and non-deinonychosaurian Paraves, which strongly contribute to the mean temporal completeness signal in the Late Jurassic and Early Cretaceous (Cashmore & Butler 2019, table S6).
Body size has previously been argued to be a strong factor in fossil preservation, with larger, more robust skeletal elements preferentially surviving fossilization (Cooper et al. 2006; Brocklehurst et al. 2012; Brown et al. 2013) except when elements become too large for easy burial. In this scenario it is expected that very small and very large taxa are less frequently preserved in the fossil record making their skeletons more fragmentary (Cleary et al. 2015), thus potentially not reflecting their original abundance. Brown et al. (2013) concluded that there is significant bias towards high abundance and high completeness of large-bodied dinosaurs in Dinosaur Provincial Park in Alberta, Canada. Further, Zanno & Makovicky (2013) identified a significant relationship between body mass of closely related herbivorous Asian theropods and fossil localities, concluding that a taphonomic and/or ecological signal was obscuring evolutionary trends in body mass. Studies show that on a global scale the highest completeness scores arise from different size categories dependent on the tetrapod group in question (Cleary et al. 2015; Gardner et al. 2016; Driscoll et al. 2018). On the other hand, Orr et al. (2016) argued that because of the role of decay products and adhesion of downward facing bones to the sediment, completeness of a skeleton is not necessarily influenced by size or density of the skeletal elements. Our results of the global theropod record do not recover a relationship between body size and skeletal completeness. We initially thought that this might reflect the many highly complete but small taxa derived from conservation Lagerstätten (Gardner et al. 2016). Removal of these taxa, and the further removal of concentration Lagerstätten taxa from the correlation again resulted in no relationship in either analysis. Because of this we are not convinced that body size of theropods influences the completeness of their fossil record on a global scale.
A singular variable cannot adequately explain the differential completeness of all theropod skeletons, but size biases probably strongly influence the record on local scales. Biases that reduce the occurrence and completeness of small taxa under normal depositional regimes also act to limit the occurrence of larger taxa from preservation in conservation Lagerstätten (Zhou & Wang 2010; Gardner et al. 2016).

Sampling biases
Our analyses suggest that rock volume or outcrop availability (DBFs), collection effort (DBCs) and sampling coverage (Good's u) are not significant controls on specimen completeness within the theropod fossil record on a global scale. The number of theropod fossil occurrences (PBDB and specimen) through time also has no significant influence on the temporal completeness patterns, but increased specimen numbers do tend to lead to enhanced completeness for individual taxa. GLS model fitting results reveal different combinations of sampling proxy also offer little explanation for the changes in the SCM2 time series (Cashmore & Butler 2019, table S3). DBFs contribute to two of the best explanatory models but little can be concluded from these due to relatively low R 2 values and AIC weights.
Our results reveal strong spatial biases between different latitudes and continents. The high abundance of theropod remains from northern mid-latitudes and the relative scarcity of specimens at other latitudes strongly suggests a historical focus on Europe, North America, northern Africa and East Asia, and the comparative neglect of South America, southern Africa and Australia (Benton 2008; Tennant et al. 2018). This is supported by the significantly higher completeness distributions of theropods from Asia and North America (Fig. 8).
The geographical differences in the quality of the theropod fossil record cannot only be due to historical sampling intensity. The latitudinal distribution of theropod occurrences is relatively bimodal in nature, with the dominant occurrences not only coming from the northern but also the southern mid-latitudes within modern and ancient contexts (Fig. 9). This suggests that the most productive theropod fossil localities occur in particular latitudinal zones, probably governed by climate and local environment.
Though we have not quantified it here, modern environments and climate probably play an important role in the availability of theropod bearing localities and, therefore, the global understanding of the group. For example, western Europe, the birth place of modern palaeontology, probably has among the highest historical research levels of any continent, but the theropod fossil record is the worst of all studied in terms of quantity and relative quality (SCM2), barring the very limited Australasian and Antarctic records. Benton (2008) similarly found that recent dinosaur species described from European deposits were of the poorest quality in comparison to other continents, and attributed this to historical research efforts and an overfamiliarity with deposits, corroborated by high European theropod Good's u sampling coverage estimated by Tennant et al. (2018). This, however, cannot be solely driven by human sampling effort, but is more likely to reflect the lack of consistent availability of terrestrial Mesozoic horizons yielding fossiliferous material. This may be due to the generally temperate climate, vegetation cover and subsequent erosion in modern day localities. Because of this limited exposure, many of the terrestrial occurrences come from rapidly eroding coastal sections, where even if specimens were originally more complete, elements might be lost. Furthermore, large quantities of the European Jurassic and Cretaceous occurrences are marine, because Europe was an archipelago (possibly making it easier for taxa to end up in marine deposits) (Göhlich & Chiappe 2006; Csiki et al. 2010; Csiki-Sava et al. 2015), which we have found to be consistently less complete than terrestrial theropod specimens. However, Europe does still preserve many key theropod taxa.
Vast arid areas with little vegetation and high levels of rock exposure such as western North America, Patagonia, northern and southern Africa, and East Asia provide ideal conditions for the heightened availability of fossiliferous localities and are probably driving the completeness signals seen between different continents and latitudes (Raup 1972, 1976; Wall et al. 2009).
On the other hand, Australasia's poor record cannot simply be attributed to a significant lack of rock availability. Rich & Vickers-Rich (1997) argued that Australia's poor dinosaur record was the result of deep weathering of land profiles, aided by low topographic relief and by a lack of mountain building causing fossils to either be leached away or eroded through extended exposure. A number of sites with the potential to yield vast quantities of dinosaur remains have produced numerous isolated specimens but very few associated skeletons that can be confidently identified at low taxonomic levels (Rich & Vickers-Rich 1997; Hocknull et al. 2009; Agnolin et al. 2010).
An almost complete absence of occurrences at high latitudes (>60° north and south) and the scarcity and low completeness of theropod occurrences from equatorial regions emphasize the geographical limitations in our sampling of the theropod fossil record (Fig. 9). Reasons for this could be the comparatively limited exploration of fossil bearing localities in these regions, many of which represent challenging environments for fieldwork. The lack of rock exposure due to extensive vegetation overgrowth (e.g. Amazon, Congolese and Indonesian rainforests) and ice cover (Arctic and Antarctic) vastly reduces the sampling availability, and extreme weathering processes such as frost shattering aid erosion of preserved skeletons. There is, however, potential for further theropod findings in these regions, especially Antarctica, which has previously produced a number of new dinosaur species (Olivero et al. 1991; Hooker et al. 1991; Hammer & Hickerson 1994; Case et al. 2000, 2007; Salgado & Gasparini 2006; Smith & Pol 2007; Cerda et al. 2012; Coria et al. 2013). In the future, the use of predictive modelling of fossil bearing localities may potentially improve our ability to sample these challenging environments more efficiently (see Anemone et al. 2011; Conroy et al. 2012; Emerson et al. 2015; Wills et al. 2018).
Furthermore, the spatial spread of sampling is variable through time (Fig. 9), and potentially creates another bias on completeness scores. Triassic theropod localities are the most geographically limited, which probably reflects the restricted dispersal and diversity of the clade during the period. Jurassic and Cretaceous localities are much more latitudinally spread and far more consistently complete in the northern hemisphere, but both contain sporadic occurrences of low completeness in the southern hemisphere: only three Jurassic and four Cretaceous taxa exceed 50% completeness. Cretaceous occurrences cover the largest latitudinal distance of any period and are the most representative of more equatorial and higher latitudes. The Cretaceous northern hemisphere has produced 58% of all taxa, regardless of age or locality, the majority of which are relatively poorly preserved.
Through time, different continents display different patterns of theropod completeness. The significant correlations between changes in SCM2 for Asian and European taxa and the total SCM2 dataset (Cashmore & Butler 2019, fig. S7) suggest that these two records represent the current understanding of the quality of the global theropod fossil record better than those of other continents. However, both of these records also show a significant correlation between changes in SCM2 and taxon richness through time (Cashmore & Butler 2019, table S8), suggesting that changes in observed theropod diversity in these continents, unlike in all other continents, may be influenced by the preservation of specimens or vice versa (see below).

Lagerstätten influence
In comparison to total SCM2, background SCM2 shows more distinct drops in the Middle Jurassic, and the loss of the Oxfordian and Barremian-Aptian peaks (Fig. 3). Background taxon richness is very strongly correlated with total taxon richness throughout the entirety of the Mesozoic (Cashmore & Butler 2019, table S4).
The relatively high Callovian-Kimmeridgian total SCM2 seems to be mostly driven by the high completeness scores derived from conservation deposits, as the mean background and concentration SCM2 for these stages are relatively low. The high number of taxa derived from conservation Lagerstätten partially explains the richness peak in the Callovian, but a high abundance of concentration deposits seems to contribute the most to the total richness peaks in the Late Jurassic stages (Fig. 3C-D). The Barremian and Aptian peaks and subsequent Albian drop in total SCM2 and richness are almost totally derived from conservation Lagerstätten, as 25 and 33 conservation Lagerstätten taxa occur in the Barremian and Aptian, respectively. Our results also indicate that without Lagerstätten included, mean completeness slightly drops through time (Cashmore & Butler 2019, table S2), showcasing how significant these preservational regimes are for our interpretations of the theropod fossil record.
The influence of concentration and conservation Lagerstätten on theropod faunas is important because a large drop is observed in both total SCM2 and taxon richness across the Jurassic-Cretaceous boundary. This interval has previously been postulated as an extinction event for specific marine and terrestrial groups (Barrett et al. 2009; Benson et al. 2010b; Starrfelt & Liow 2016; Tennant et al. 2016a, b) due to observed drops in diversity. Our findings show that the Late Jurassic peak in theropod taxonomic richness is much reduced when Lagerstätten are excluded, resulting in more reasonably similar background richness in both the Tithonian and Berriasian. Though this is simply the theropod record, it may signify that the apparent observed falls in species richness for other groups may be an artefact of preservation, probably controlled by the loss of Lagerstätten taxa and genuinely poor preservation in the earliest Cretaceous.
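The distinction between total and background values used in this section can be illustrated with a minimal sketch. It assumes each taxon is recorded with a stage, an SCM2 score and a deposit type; the records, the field layout and the function name `stage_summary` are hypothetical and invented for illustration, and none of the values come from the dataset analysed here.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (taxon, stage, SCM2 %, deposit type).
# All values are invented for illustration only.
RECORDS = [
    ("taxon_a", "Tithonian", 82.0, "conservation"),
    ("taxon_b", "Tithonian", 35.0, "concentration"),
    ("taxon_c", "Tithonian", 12.0, "background"),
    ("taxon_d", "Berriasian", 9.0, "background"),
    ("taxon_e", "Berriasian", 15.0, "background"),
    ("taxon_f", "Barremian", 90.0, "conservation"),
    ("taxon_g", "Barremian", 20.0, "background"),
]

# Deposit types treated as Lagerstätten in this sketch.
LAGERSTATTEN = frozenset({"conservation", "concentration"})


def stage_summary(records, exclude=frozenset()):
    """Return {stage: (mean SCM2, taxon count)}, skipping deposit types in `exclude`."""
    scores = defaultdict(list)
    for _taxon, stage, scm2, deposit in records:
        if deposit not in exclude:
            scores[stage].append(scm2)
    return {stage: (mean(vals), len(vals)) for stage, vals in scores.items()}


# "Total" uses every taxon; "background" drops all Lagerstätten taxa.
total = stage_summary(RECORDS)
background = stage_summary(RECORDS, exclude=LAGERSTATTEN)

for stage in total:
    print(stage, "total:", total[stage], "background:", background.get(stage))
```

In this toy example the drop in richness between the Tithonian and Berriasian largely disappears once Lagerstätten taxa are excluded, which is the kind of pattern described above for the Jurassic-Cretaceous boundary.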

Impact on evolutionary understanding
The weak but significant correlation between observed taxon richness and specimen completeness throughout varying time intervals (Carnian-Albian, Hettangian-Albian, Jurassic-Cretaceous, Cretaceous) might suggest that changes in observed theropod diversity are influenced by the completeness of specimens, as time intervals with good preservation will yield high taxonomic abundance. This is important because it suggests that our understanding of theropod macroevolution may be influenced by temporal variation in the quality of the fossil record. However, the correlations are not very strong, and are lost depending on the inclusion of a few stages. Exclusion of Triassic stages and inclusion of Cretaceous stages seems to increase the strength of the correlation between richness and completeness (Table 1). The strongest correlation occurs in just the Cretaceous stages. There is also notable divergence between the taxonomic richness and mean completeness in the Carnian, Rhaetian, Campanian and Maastrichtian.
Alternative explanations for a positive correlation between diversity and completeness are: (1) genuine evolutionary events drive diversity change and alter the relative likelihood of preservation of taxa and therefore completeness within a stage (Brocklehurst et al. 2012), for example, times of high diversity provide more chance of taxon preservation and vice versa; and (2) more fossil specimens or occurrences increase both completeness of taxa and the number of identified taxa of a stage (Brocklehurst et al. 2012).
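The sensitivity of a richness-completeness correlation to the stages included can be sketched as follows. Spearman's rank correlation is used here purely as an illustrative choice and is not a statement of the test applied in this study; the stage values and the helper names (`ranks`, `spearman`, `window`) are hypothetical.

```python
import math


def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out


def pearson(x, y):
    """Pearson correlation; returns NaN if either series has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return float("nan")
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed series."""
    return pearson(ranks(x), ranks(y))


# Hypothetical (stage, taxon richness, mean SCM2 %) series in stratigraphic
# order; every number is invented for illustration.
STAGES = [
    ("Carnian", 5, 45.0),
    ("Norian", 20, 20.0),
    ("Hettangian", 4, 15.0),
    ("Kimmeridgian", 25, 40.0),
    ("Berriasian", 3, 8.0),
    ("Aptian", 40, 50.0),
    ("Albian", 15, 18.0),
    ("Campanian", 80, 25.0),
]


def window(series, first, last):
    """Slice the series from stage `first` to stage `last`, inclusive."""
    names = [s[0] for s in series]
    return series[names.index(first): names.index(last) + 1]


def richness_completeness_rho(series):
    """Spearman's rho between richness and mean completeness for a stage window."""
    return spearman([s[1] for s in series], [s[2] for s in series])


for first, last in [("Carnian", "Albian"), ("Hettangian", "Albian"),
                    ("Carnian", "Campanian")]:
    rho = richness_completeness_rho(window(STAGES, first, last))
    print(f"{first}-{last}: rho = {rho:.2f}")
```

Comparing the printed coefficients across windows shows how adding or dropping a few stages can change the apparent strength of the relationship, even with identical underlying data.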
The Carnian has relatively high mean specimen completeness even though raw diversity is low, which suggests that macroevolutionary understanding at the beginning of theropod evolution is not influenced by taxon completeness, specimen counts or abundance. The Carnian theropod signal is anomalous because it has one of the highest standard deviations of scores for any stage (33.2%) and most (60%) taxa are derived from the Ischigualasto Formation of Argentina, which tends to predominantly produce well-preserved skeletons. The subsequent Norian has much reduced completeness but vastly increased specimen count and raw diversity, reflecting the proliferation of neotheropods and an increased sampling pool in other formations with poorer preservation regimes. Other stages, such as the Toarcian, Aalenian and the Valanginian, which show relatively high mean completeness but low specimen number and taxon abundance, are likely to be the result of relatively poor sampling. Even though there is no negative correlation between skeletal completeness and taxon richness, the Campanian and Maastrichtian are good examples of how increased specimen number and observed diversity do not necessarily equate to higher levels of taxon completeness. These intervals have the highest specimen number (733 combined), highest raw taxon richness (156 combined), and some of the most varied completeness scores of any stage, but with relatively few concentration (24 taxa, 15%) and no conservation Lagerstätten taxa. It could be argued that this peak in richness is the result of numerous taxa being falsely identified from fragmentary, non-overlapping skeletal material (Brocklehurst & Fröbisch 2014) but this seems doubtful considering the derived and probably more diagnostic nature of differing theropod clades during the latest Cretaceous. We would postulate that the numerous fossil-rich localities from these stages in North America and East Asia, and the extensive sampling (Upchurch et al. 2011; Starrfelt & Liow 2016; Tennant et al. 2018) and heightened interest of these stages at the end of the dinosaur record probably explain their extensive outlying peaks in specimen number, raw diversity and the moderate completeness levels at which a majority of taxa are found and named.
Above, and in previous sections, we described a number of distinct temporal and spatial inconsistencies in the sampling and completeness of the theropod fossil record. Some geological stages contain more favourable preservational regimes due to geological changes and are therefore better sampled. The final stages of the Cretaceous provide an example of this (see Good's u coverage; Cashmore & Butler 2019, fig. S4). There are also clear spatial biases that suggest that sampling of the theropod fossil record has been geographically constrained to the mid-latitudes, possibly biased towards the re-sampling of previously known fossiliferous localities from countries with long histories of palaeontological research. Furthermore, because of the nature of the sedimentary record, theropods that had ecological preferences for fluvial environments are likely to be more consistently preserved than others. All of this potential unevenness could be hiding key information, and it is important to take these natural and human sampling biases into consideration when interpreting the evolutionary trends of theropod dinosaurs. For palaeontologists, these should be obvious prerequisites to studying the fossil record and deciphering true evolutionary patterns. However, in future we should be aiming to explore formations and depositional environments from time bins and localities that have not been strongly sampled.

CONCLUSIONS
1. Theropod completeness fluctuates through geological time, with notable peaks in the Carnian, Oxfordian-Kimmeridgian and Barremian-Aptian, and prominent lows in the Berriasian and Hettangian.
2. Peaks in theropod completeness and raw taxonomic diversity in the Callovian-Kimmeridgian and the Aptian-Albian are driven by the presence of concentration and conservation Lagerstätten. Lagerstätten taxa positively influence the appearance of the theropod fossil record in a significant manner.
3. Raw diversity changes through time may be influenced by completeness of theropod specimens for particular time intervals, but correlations are statistically weak.
4. There are no correlations between different sampling proxies and theropod completeness through geological time.
5. Theropods have one of the statistically poorest nontemporal distributions of completeness scores of any previously assessed tetrapod group, with many taxa known from low skeletal completeness.
6. Theropods have statistically poorer distribution of completeness scores than sauropodomorphs. When Lagerstätten taxa are removed, there is a significant positive correlation between theropod and sauropodomorph completeness time series suggesting a commonality to the preservational biases and sampling standards influencing our understanding of these groups. The poorer theropod fossil record could be due to generally less robust skeletons and predatory population dynamics in comparison to herbivorous and gregarious sauropodomorphs.
7. Megaraptora has the worst fossil record of any theropod subgroup. The gregarious behaviour of the omnivorous ornithomimosaurians and oviraptorosaurians potentially aids their significantly higher distribution of completeness scores in comparison to many other subgroups. Compsognathids and non-deinonychosaurian Paraves have the most complete records of any theropod subgroup because they are almost exclusively derived from conservation Lagerstätten.
8. We recover no significant relationship between the body size of theropod taxa and their skeletal completeness, even when Lagerstätten taxa are removed. This means that body size, at least on a global scale, is not a significant bias on the completeness of theropod taxa.
9. The consistently best preserved theropod skeletons come from lacustrine and aeolian deposits. However, the majority of theropod finds come from fluvial channel deposits, a regime that naturally downgrades the quality of fossils through transportation and abrasion. The heightened number of theropods derived from fluvial regimes in the Campanian and Maastrichtian could explain the generally poor quality of material from these time intervals.
10. There are strong spatial biases in the theropod fossil record. Historic research interest and sampling effort probably explain the high abundance and significantly higher completeness of theropod remains from the northern hemisphere, specifically the northern mid-latitudes. Asia has the statistically best theropod fossil record of any continent, while Australasia has the most limited, and Europe has a very poor record considering its historical scientific interest. Geographical differences in the quality of the fossil record may be more connected to modern climate, vegetation cover and rock outcrop availability, than to just human sampling.