Assessing sampling of the fossil record in a geographically and stratigraphically constrained dataset: the Chalk Group of Hampshire, southern UK

Taphonomic, geological and sampling processes have been cited as biasing richness measurements in the fossil record, and sampling proxies have been widely used to assess this. However, the link between sampling and taxonomic richness is poorly understood, and there has been much debate on the equivalence and relevance of proxies. We approach this question by combining both historical and novel data: a historical fossil occurrence dataset with uniquely high spatial resolution from the Upper Cretaceous Chalk Group of Hampshire, UK, and a newly compiled 3D geological model that maps subsurface extent. The geological model provides rock volumes, and these are compared with exposure and outcrop area, sampling proxies that have often been conflated in previous studies. The extent to which exposure area (true rock availability) has changed over research time is also tested. We find a trend of low Cenomanian to high Turonian to Campanian raw richness, which correlates with, and is possibly driven by, the number of specimens found. After sampling standardization, an unexpected mid-Turonian peak diversity is recovered, and sampling-standardized genus richness is best predicted by rock volume, suggesting a species–area (or ‘genus–area’) effect. Additionally, total exposure area has changed over time, but relative exposure remains the same. Supplementary materials: A locality list, abundance matrix and all correlation and modelling results are available at https://doi.org/10.6084/m9.figshare.c.3592208.

Evolutionary trends can be investigated by comparing counts of numbers of species or genera (taxonomic richness) in successive time bins. However, there are many analytical and sampling issues to be considered before interpreting changes in diversity curves as evolutionary events (Smith & McGowan 2011). Some of these problems, such as the 'pull of the Recent' (Raup 1979), may be corrected for by excluding Recent taxon records (Jablonski et al. 2003), or by using within-bin sampled occurrences rather than range-through data. However, variations in the sampling intensity of different time periods may influence apparent diversity patterns (Raup 1972; Smith & McGowan 2011). Sampling bias affecting fossil occurrence data can be split into four separate but interlinked categories: (1) facies heterogeneity (Holland 1995; Smith et al. 2001; Smith & Benson 2013); (2) rock volume, differential deposition, preservation and survival of sediments through time (Holland 1995; Peters & Foote 2001; Smith & McGowan 2007; Smith & Benson 2013); (3) rock exposure area and accessibility (Dunhill 2011, 2012); (4) palaeontological sampling effort (the 'bonanza' effect of Raup 1977; Dunhill et al. 2012, 2013, 2014b). A correlation is often found between taxonomic richness and various sampling proxies (Sheehan 1971; Raup 1972; Smith 2001; Smith & McGowan 2005; Lloyd et al. 2011). Three explanations have been suggested for this: (1) the four sampling biases previously listed are strongly distorting our estimates of past diversity, or the 'bias' model (Smith 2007; Benson & Mannion 2012; Lloyd 2012); (2) there is a third, unaccounted-for factor that is driving both the proxies used and palaeodiversity, or the 'common cause' model (Sepkoski 1976; Peters & Foote 2002; Peters 2005; Hannisdal & Peters 2011; Peters & Heim 2011); (3) some proxies are partially redundant with the fossil record (Benton et al. 2011; Dunhill et al. 2014a; Benton 2015).
These three hypotheses have been discussed in detail elsewhere (Benton et al. 2011; Smith & McGowan 2011). Researchers have generally used subsampling (shareholder quorum subsampling (SQS) of Alroy 2010a; Benson et al. 2016) or sampling proxy modelling (Smith & McGowan 2007; Benson & Mannion 2012; Lloyd 2012; Benson et al. 2016) to obtain 'sampling-corrected' relative diversity, although there has been some criticism of the use of both of these methods (Brocklehurst 2015; Hannisdal et al. 2016). In particular, the sampling proxy modelling approach, as it has previously been applied by Smith & McGowan (2007) and Lloyd (2012), is now rejected based on statistical errors in its formulation and implementation (Sakamoto et al. 2016).
Other parts of the sampling discussion have focused on the fidelity and equivalence of the sampling proxies on which these hypotheses have been based (e.g. Benton et al. 2011). This is especially important for sampling proxy modelling, when sampling proxies themselves are used as driving variables in models produced with the aim of correcting for sampling bias. It has been found that outcrop area, a proxy that has previously been used to approximate exposure area and rock volume (and hence the opportunity for palaeontologists to sample fossils), is a poor proxy, largely because it does not always correspond to exposure area, the more direct measure of rock availability (Dunhill 2011, 2012; Dunhill et al. 2012, 2013). Here, we pick up this baton, comparing outcrop and exposure area and rock volumes with taxonomic richness in a regional case study. There are advantages and disadvantages in using both global- and local-scale studies to address questions of sampling, redundancy and common cause. Primarily, global diversity is of interest from a macroevolutionary perspective when studying the whole Earth system (Badgley 2003). However, such studies may confound data from regions with strata of different ages and exposure, different levels of facies heterogeneity, different traditions in publication and perhaps comprising rock units that are hard to date and correlate. Because of the complexity of these many biasing factors, it is highly unlikely that there is one 'global' sampling signal (Benson et al. 2016). Regional-scale studies (e.g. Crampton et al. 2003; Dunhill et al. 2014b; Pereira et al. 2015) may provide a partial solution by restricting the number and complexity of the variables (McGowan & Smith 2008). A regional study may be considered a microcosm for studying the macrocosm of global diversity. Additionally, sampling needs to be understood at a local scale before it can be scaled up.
In the UK, the extensive field mapping and borehole investigation by the British Geological Survey (BGS) means that stratigraphic data have been well constrained and digitized. For the Chalk Group, these data have been used to construct a 3D geological model for the south of the UK (Woods et al. 2016). We harness this model to provide an estimate of rock volumes. Furthermore, the BGS hosts large, well-documented collections of fossils, including those found in the course of the field mapping carried out since the inception of the survey in 1835, part of which we use for this study.
Here, we choose the Chalk Group of the southern UK as a case study to explore the covariation between the rock and fossil records, and the fidelity of various sampling proxies. Previous studies have shown that facies heterogeneity is a strong control on diversity (Crampton et al. 2003; Rook et al. 2013). For example, terrestrial ecosystems are much more diverse than marine ecosystems, and have perhaps been so since the Late Cretaceous (modern data: May 1994; Benton 2009; fossil data: Vermeij & Grosberg 2010); if the rock record is skewed towards the preservation of terrestrial environments in any given time period for which this is true, diversity will appear to increase into this time bin, and vice versa. Ecological facies can be represented by lithofacies in the rock record and, at the formation scale, the Chalk Group has a relatively uniform lithology. The facies represented in the formations of the Chalk Group, deposited on a pelagic carbonate shelf with low sedimentation rate, are more similar to each other than the facies heterogeneity in most diversity studies we are aware of. This means that variations in depositional and ecological facies and preservation (although they do exist, and should be considered; see Table 1) are minimized, relative to other diversity studies conducted on larger temporal and geographical scales (e.g. Lloyd & Friedman 2013; Benson et al. 2016).
Our dataset uses specimens collected by a single geologist and therefore minimizes bias introduced from multi-collector databases such as studies that rely on museum collections or large synoptic fossil compilations (e.g. the Paleobiology Database, Alroy et al. 2001, http://paleobiodb.org). Many museum specimens are collected for the purpose of taxonomic or morphological description, so often only the best specimens from good outcrops are recorded. Ultimately, this bias in large fossil databases may be solved by the entry of more data in the future, and more methodological, even dogged, field sampling from the outset. However, because this is not the case at present, despite painstaking efforts from the palaeontological community, geological survey data most readily provide thoroughly sampled datasets. The use of an extensive collection made through a single facies, by a single collector, and over a relatively short sampling time span ought to remove some of the biases of facies heterogeneity and palaeontological sampling effort, leaving only rock volume and rock accessibility biases to be considered.

Geological background
Great thicknesses of chalk were deposited during the Late Cretaceous, in water depths of 100–500 m, when sea levels were higher than at any point in the Phanerozoic (Mortimore et al. 2001) and tropical sea surface temperatures were up to 7°C warmer than they are today (Norris et al. 2002). Chalk deposition dominated the Cenomanian to Maastrichtian (100.5–66.0 Ma) through much of Northern Europe and similar chalk deposits are found in North America and Australia.
In the UK, Upper Cretaceous chalk sediments comprise the Chalk Group, which can be split into Southern ('Tethyan'), Northern ('Boreal') and Transitional provinces based on faunal content (Hopson 2005). Hampshire sits entirely within the deposits of the Southern Province, and like other parts of the UK Chalk Group, shows a change from more clay-rich chalks in the Cenomanian to purer chalks in the higher part of the succession. Component formations of the recently established stratigraphic framework for the Chalk in the UK (Hopson 2005; also see Fig. 1) can be recognized on the basis of marker beds and sedimentary texture. The Chalk of Hampshire is a well-established regional succession, with formalized stratigraphical units (Hopson 2005; Fig. 1) recognized by recent BGS mapping that extend into adjacent areas of the Southern Province.
To begin with, some nomenclature: hereafter, 'chalk' in lower case refers to the lithology, and 'the Chalk Group' refers to the formal Southern Province Upper Cretaceous Chalk Group of the UK, as defined by Hopson (2005). The UK Southern Province Chalk Group can be subdivided into two subgroups and nine formations; text abbreviations and definitions for these are given in Table 1. We are considering all of the stratigraphical subdivisions of the Southern Province Chalk Group, and measuring richness and sampling in each of the constituent formations of the group. As well as the very fine-grained, high-purity coccolithic limestone that is characteristic of the Chalk Group, the succession also includes minor clay-rich units (marls) of both detrital and volcanic origin (Wray 1999; Wray & Jeans 2014), admixtures of chalk and clay (marly chalk) and flint-rich beds. The Chalk Group can be soft or lithified as a nodular chalk or hardground, and hardgrounds are distributed throughout the formations in the Chalk Group. Although previous studies have found evidence that more lithified beds contain less fossil diversity (Hendy 2009; Sessa et al. 2009), the lithified hardgrounds of the Chalk Group are both condensed and preserve originally aragonitic organisms better than the non-hardground beds in the Chalk (Mortimore et al. 2001, p. 23). Although single formations are characterized by the dominance of particular combinations of lithological features, the range of intraformational lithological variability is similar to that which occurs between formations. A lithological description of each formation is given in Table 1, and more detailed stratigraphical information with qualitative descriptions of the fauna of each formation has been given by Woods (2015, fig. 2).
In general, periodic patterns in the diversity of marine organisms have been linked with periodicity observed in sea-level curves (Melott & Bambach 2014). The Cretaceous Period corresponds to a single large-scale sea-level cycle, from lowstand to highstand (de Graciansky et al. 1998; Smith & Benson 2013), with maximum transgression attained in the early Turonian (e.g. Haq et al. 1987). The Chalk Group succession in the UK includes the thickest and most complete section spanning the Cenomanian–Turonian boundary in the Anglo-Paris Basin (Gale et al. 2000), corresponding to one of the largest positive δ¹³C excursions in the geological record of the Mesozoic (Jarvis et al. 2006), and indicating high levels of organic carbon burial and widespread anoxia. This has previously been identified as a mass extinction, during which 26% of genera died out (Sepkoski 1989; Harries & Little 1999).
The Late Cretaceous was clearly a time of major environmental perturbation in the marine system and this is likely to have affected diversity. The lower part of the Chalk Group contains high concentrations of shelly macrofossils at the base of fining-upwards cycles, which have been labelled 'pulse faunas' (Jeans 1968). It is unclear whether these pulse faunas represent range expansions or preferential depositional conditions, although evidence suggests the former as a result of either influxes of cold water or rises in sea level (see discussion by Mitchell & Carr 1998). Ammonites form the basis of biozonal schemes in the Grey Chalk Subgroup (= Lower Chalk of traditional usage; Hopson 2005; Fig. 1), but rarity of originally aragonite-shelled organisms above the Cenomanian limits their use at stratigraphically higher levels in UK successions. Consequently, abundance patterns of a range of other macrofossil groups, including brachiopods, echinoderms and bivalves, are used in the UK to define biozones in the White Chalk Subgroup (= Middle and Upper Chalk of traditional usage; Hopson 2005; Fig. 1).
It is important to note that there is an overprint of taphonomic sampling bias in the Upper Cretaceous succession, as a result of the poor preservation of aragonitic organisms, a reduced number of facies preserved in the sedimentary record, and erosion during marine transgressions. Some workers (e.g. Gale et al. 2000; Smith et al. 2001) identify the widely recognized fall in diversity at the Cenomanian–Turonian boundary as more of an artefact than a real extinction, caused by faunal replacement within the facies preserved and a change in the ratio of shallow and deep marine sediments resulting from sea-level change. Smith & Benson (2013) argued that during the Late Cretaceous (1) facies shifts and (2) subsequent erosion of sediments drove artefactual changes in fossil record diversity by restricting the variety of lithofacies captured by the rock record and removing sediments respectively. Therefore, it is possible that dramatic changes in diversity in the Late Cretaceous were a result of preservational biases, and it may be difficult to tease apart these biases from common cause effects.

Fossil dataset
Palaeontological datasets for the measurement of taxonomic richness have been collected over the past 200 years. Collections are made for a number of reasons: single, well-preserved specimens may be collected by both amateur collectors and professional palaeontologists as exemplars of a particular taxon, or a large amount of material may be collected to investigate the facies and biostratigraphy of a locality or area. For diversity studies, the latter collection method is desirable, as it is more thorough and should capture more of the rare taxa. Many richness studies make use of historically collected data, rather than new field data; a large amount of the literature makes use of the numerous collections that have been entered into the Paleobiology Database (PaleoDB).
For this study, a dataset of invertebrate macrofossil occurrences from the Chalk Group of Hampshire is used, based on the work of one collector, chalk researcher Reginald Marr Brydone (1873–1943). The collection (Brydone Collection herein) results from Brydone's dedicated work to produce a map of chalk biozones (Brydone 1912). It consists of 18 358 specimens from 1198 localities, of which 77% have been assigned to valid genera, and is largely held by the BGS at Keyworth, with some parts at the Natural History Museum, London and the Sedgwick Museum, Cambridge. Brydone collected from this area over six field seasons in Hampshire, visiting every 'Old Chalk Pit' and 'Chalk Pit' marked on the Ordnance Survey (OS) maps of the time (Brydone 1912, p. 3). He collected both surface fossils and smaller macrofossils from bulk sediment sampling. In areas where chalk pits were rare, Brydone 'traversed enough of the roads to know the apparent scarcity of exposures [was] very genuine' (Brydone 1912, p. 3). Lang (1944, p. lxvi) noted in his obituary, 'The mere distance walked must have been quite an achievement'. The vast Brydone Collection includes many unassignable specimens (23%), and some incomplete and damaged specimens, evidence that Brydone recorded even the worst-quality finds, so minimizing the risk of excessive bias from a focus on excellent specimens. Furthermore, Brydone was collecting to produce a biozone map of the Chalk in this area; this led him to employ a thorough and consistent collecting method, with a requirement to achieve a spatially uniform coverage of the area, rather than collecting large concentrations of fossils from particular exposures, as perhaps may be the case for compilations of well-preserved museum-held specimens, the 'bonanza effect' of Raup (1977).
All specimens in this dataset were collected for a single publication (Brydone 1912), whereas museum collections from classic areas in Europe have often been built up over at least the past two centuries, often combining chance finds by many different collectors, who may have been motivated to collect for many different reasons. Therefore, this local-scale study contributes novel data to the quality of the fossil record debate, by eliminating any inconsistencies between the collecting habits and effort of different palaeontologists operating at different times, and perhaps with variable methods and variable rock exposure.
On the basis of Brydone's wider published research, and details of his other collections of Chalk macrofossils in the BGS archives, we can be as confident as it is possible to be about the uniformity of the collection. In other words, Brydone collected specimens from a wide range of taxa. In fact, there is a strong argument for suggesting that it is possibly more representative than other collections from the Chalk precisely because Brydone was aware that as well as the conspicuous macrofossils, there is also a fauna of 'mesofossils' (material small enough to require a hand lens for observation, but not microscopic), which many historical collectors ignored. Additionally, Brydone's collections have been used by others for palaeontological investigation; his excellent collection of belemnites from Norfolk, UK, was used by Christensen (1991) for taxonomic study.
Although Brydone published extensively on Bryozoa, he also published widely on biostratigraphy and the broad range of fossils required to recognize the standard macrofossil subdivisions in the Chalk. Indeed, echinoids described by Brydone form part of the current standard macrofossil biozonation of the Chalk. Some of this literature was self-published or in obscure journals published by regional natural history societies, and is therefore not widely visible to outside researchers. This extensive catalogue of work illustrates Brydone's broader research interests in stratigraphy, describing the occurrence, distribution and morphological variability of a broad range of macrofossils, but particularly including data on the distribution of brachiopods, bivalves, crinoids, echinoids and belemnites.
Another important point that should be emphasized is that given the size of the project area for his 1912 work (i.e. the whole of Hampshire), it would have been a practical necessity for Brydone to have a well-distributed network of comprehensively sampled sites; he simply could not have achieved this result by focusing on only part of the available faunal evidence. Brydone's work overlapped with that of Arthur Rowe (1858–1926), a towering figure in Chalk biostratigraphy who profoundly shaped the way that the biostratigraphy of the Chalk was understood (Gale & Cleevely 1989). There is likely to have been significant pressure on Brydone to demonstrate that his recognition of biozones across Hampshire was not open to criticism by Rowe (who often criticized the Geological Survey). In this context, it also seems highly unlikely that Brydone would have focused on particular groups that were the basis of his work in East Anglia (e.g. bryozoans, which generally have restricted biostratigraphical value) at the expense of other fossil groups for defining his biozonal units. Material in BGS collections from glacial rafts of Chalk collected by Brydone at Trimingham, in East Anglia, for his work on bryozoans includes voluminous non-bryozoan macrofossils. If Brydone had been interested in only bryozoans, why would he have collected all this other material? We think the answer lies in the fact that he was interested in understanding the complete fauna of stratigraphical units and how this could be used for correlation, as evidenced by the considerable number of stratigraphical papers he wrote about the Chalk of Trimingham.
Overall, the impression from Brydone's collections for Hampshire and East Anglia is that he wanted to comprehensively understand the fauna and correlation of the units he collected from. It can be difficult to disentangle different aspects of sampling bias in empirical studies, but intrinsic features of the Brydone Collection for Hampshire serve to minimize the confounding factors of both facies heterogeneity and temporal and spatial variability of sampling effort. This leaves variation in the survival and outcrop of sediments and fossil preservation as the only potential biases, ignoring any taxonomic effects.

Exposure area
We have quantified exposure area in this study. It has been suggested that exposure area changes through time as a result of erosion and changes in land use (e.g. the loss of quarries as potential sampling sites in the past 150 years; Dunhill et al. 2013). All specimens in this dataset were collected around the start of the twentieth century for a publication produced in 1912, so any measure of exposure area needs to approximate exposure as it was at the time of collection. In some cases, exposure area has changed substantially through time; early finds were made in railway cuttings or functioning stone quarries in the mid- to late nineteenth century, and many of these have become overgrown or infilled, allowing collection for only a relatively short length of time. The fossils forming the Brydone Collection were collected from railway cuttings, road banks, fields and other transient exposures. In other cases, hand-operated quarries that provided rich fossil trove in Victorian times may have become mechanized and so yield almost nothing now. For this reason, an attempt is made here to quantify changes in exposure area through historical time.

Collecting and preparing data
For this study, the paper records of the Brydone Collection were digitized, including fossil registers, BGS publications and the fossil specimen labels. Material relating to specimens collected by Brydone for a later publication (Brydone 1942) was excluded from the final dataset. The result was a list of localities and the specimens that were found at each locality, all of them exclusively collected for his 1912 paper.
The formations that cropped out at each of the localities of Brydone (1912) (see Fig. 1) were identified. Because modern Chalk Group formations have only recently been formalized (Hopson 2005) and the stratigraphy exposed at each locality on a modern geological map may be different from when Brydone was collecting, Brydone's original biozonal assignments have been used to derive modern formational lithostratigraphy. This is achieved by understanding the relationship between the modern biozonal scheme and the scheme used by Brydone (1912), and the typically simple relationship between modern Chalk Group biostratigraphy and lithostratigraphy (Gale & Cleevely 1989; Gale & Kennedy 2002; Woods et al. 2016). The biozonal concept of Brydone (1912) for each locality was correlated with the modern biozonal equivalents. In some cases, Brydone's biozone designations ranged across modern formation boundaries. In the dataset, all potential formational interpretations were recorded. Where there was uncertainty around assigning a locality to a formation (e.g. if a locality is in the Terebratulina lata Zone, it may be in either the New Pit Chalk Formation or the Lewes Nodular Chalk Formation), a bootstrapping method was used, with the locality being randomly allocated to one of the possible formations 500 times. This was repeated for all localities, and richness was calculated for each of the 500 runs. The mean richness and error bars were calculated from these trials. This ambiguity affected only 104 of the 1198 localities. Diversity counts, specimen counts, rarefaction and all correlations were carried out in R v.3.2.2 (R Core Team 2015). Outcrop and exposure areas were measured in ArcMap 10.0 using BGS DiGMapGB-50 (BGS geological map, 1:50 000), Esri World Imagery (2013), and modern (2014, 1:25 000) and historical (c. 1910–1913, 1:10 560) OS maps.
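The random allocation of ambiguous localities described above can be sketched as follows. This is a minimal illustration in Python rather than the R code used for the study; the function name, the data layout, the toy localities and the use of the standard deviation as the 'error bar' are all assumptions made for the example.

```python
import random
from statistics import mean, stdev

def bootstrap_richness(localities, n_runs=500, seed=0):
    """Genus richness per formation under random allocation of ambiguous localities.

    localities: list of (candidate_formations, genera) pairs (hypothetical layout).
    Returns {formation: (mean richness, standard deviation)} over n_runs trials.
    """
    rng = random.Random(seed)
    formations = sorted({f for candidates, _ in localities for f in candidates})
    runs = {f: [] for f in formations}
    for _ in range(n_runs):
        genera_by_formation = {f: set() for f in formations}
        for candidates, genera in localities:
            # An unambiguous locality has one candidate, so the choice is forced.
            chosen = rng.choice(candidates)
            genera_by_formation[chosen].update(genera)
        for f in formations:
            runs[f].append(len(genera_by_formation[f]))
    return {f: (mean(v), stdev(v)) for f, v in runs.items()}

# Invented toy data: the second locality could belong to either formation.
toy_localities = [
    (["New Pit"], {"Terebratulina", "Inoceramus"}),
    (["New Pit", "Lewes Nodular"], {"Micraster", "Inoceramus"}),
    (["Lewes Nodular"], {"Micraster", "Echinocorys"}),
]

if __name__ == "__main__":
    for formation, (m, sd) in bootstrap_richness(toy_localities).items():
        print(f"{formation}: mean richness {m:.2f} (sd {sd:.2f})")
```

With only 104 of 1198 localities affected, most allocations in a real run would be forced, so the spread between trials would be expected to be small relative to the mean.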

Diversity.
To produce diversity counts, the 272 taxa listed in the fossil registers were updated by identifying and correcting/removing any misspellings, synonyms, invalid taxa and ichnogenera using Smith & Batten (2002) and PaleoDB. A list of 235 valid genera was produced, with their appearance in Smith & Batten (2002) or PaleoDB recorded. Genus richness counts for each locality and each Chalk Group formation were calculated.

Subsampling.
Two methods of subsampling were used: rarefaction (Sanders 1968; Hulbert 1971) and shareholder quorum subsampling (SQS; Alroy 2010a; R script for SQS version 3.3 obtainable from http://bio.mq.edu.au/~jalroy/SQS-3-3.R). SQS is a method that subsamples to an even coverage across samples, rather than to an even number of specimens (as rarefaction does). The coverage of a sample, in this case, is the proportion of specimens in the formation that is represented by a taxon included in the subsample taken. This method treats rare taxa more fairly than rarefaction does (Alroy 2010b). Rarefaction and SQS were carried out to multiple levels in each formation for comparison. For rarefaction, the number of specimens sampled ranged through 20, 30, … , 80, and for SQS, coverage values of 0.2, 0.3, … , 0.6 were used. The numbers of specimens varied substantially between the formation-based time bins, from means of 67 to 5956 in our bootstrapped run, meaning that the sample sizes for rarefaction and coverage values for SQS had to be low. However, even with these low sample sizes, the number of specimens sampled from the Zig Zag Chalk Formation was too small for both types of subsampling. Therefore, this formation is excluded from the subsampled time series.
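The difference between subsampling to an even number of specimens and subsampling to an even coverage can be illustrated with a short Python sketch. The rarefaction function uses the standard analytical expectation for a draw without replacement. The `simple_sqs` function is a deliberately simplified coverage-based subsampler, not a port of the SQS-3-3.R script: it omits the additional corrections of the published algorithm, and its name, parameters and toy abundance counts are assumptions made for the example.

```python
import random
from math import comb

def rarefied_richness(counts, n):
    """Expected number of taxa in a random draw of n specimens without replacement."""
    total = sum(counts)
    if n > total:
        raise ValueError("subsample is larger than the sample")
    # A taxon with c specimens is missed with probability C(total - c, n) / C(total, n).
    return sum(1 - comb(total - c, n) / comb(total, n) for c in counts)

def simple_sqs(counts, quorum, trials=1000, seed=0):
    """Mean number of taxa drawn before their summed frequencies reach the quorum."""
    rng = random.Random(seed)
    total = sum(counts)
    singletons = sum(1 for c in counts if c == 1)
    u = 1 - singletons / total  # Good's u, an estimate of sample coverage
    if quorum > u:
        raise ValueError("quorum exceeds the estimated coverage of the sample")
    freqs = [u * c / total for c in counts]
    pool = [taxon for taxon, c in enumerate(counts) for _ in range(c)]
    results = []
    for _ in range(trials):
        rng.shuffle(pool)
        seen, covered = set(), 0.0
        for taxon in pool:
            if taxon not in seen:
                seen.add(taxon)
                covered += freqs[taxon]
                if covered >= quorum:
                    break
        results.append(len(seen))
    return sum(results) / trials

# Invented specimen counts per genus for one formation.
toy_counts = [40, 25, 15, 8, 5, 3, 2, 1, 1]

if __name__ == "__main__":
    for n in (20, 50, 80):
        print(f"rarefaction, n={n}: {rarefied_richness(toy_counts, n):.2f}")
    for q in (0.2, 0.4, 0.6):
        print(f"simplified SQS, quorum={q}: {simple_sqs(toy_counts, q):.2f}")
```

A formation with very few specimens fails both checks quickly (the subsample size exceeds the sample, or the quorum exceeds the estimated coverage), which is the situation described for the Zig Zag Chalk Formation.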

Specimen count and maximum specimen count.
The total number of specimens included in the collection was counted. However, in the fossil registers, occasionally one specimen number was allocated to a group of specimens or specimen fragments, with no information on the field associations of the fossil fragments. Except where fragments could be reassembled, it was often impossible to determine the number of individuals represented. This was especially true for the specimen jars full of crinoid ossicles, which make up part of the collection. Therefore, two specimen counts were made: a minimum specimen count, which counted only the number of original allocated specimen numbers, and a maximum specimen count, where an estimate of the number of fragments was made and each fragment was assumed to represent a different individual. Both measures, the maximum specimen counts (the number of fragments) and the minimum specimen counts (the total number of specimen numbers allocated), were used in the analyses to test whether this uncertainty altered the results.
Both the minimum and the maximum specimen counts produce inexact results; both of the scenarios assessed by these measurements, with either one crinoid ossicle representing one organism or a group of catalogued ossicles coming from the same organism, are unlikely. An alternative way of dealing with difficulties in estimating specimen numbers comes from Holland & Patzkowsky (2004), where 1 cm of a bryozoan colony was counted as one individual. However, this would be difficult to implement in a diverse group of taxa as we have here, and there is no obvious way of applying this method to, say, crinoids.
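The two bounding counts can be expressed in a few lines of Python. The register layout (one specimen number paired with an estimated fragment count) and the toy values are assumptions made for the example, not the format of the BGS registers.

```python
def specimen_counts(register):
    """Return (minimum, maximum) specimen counts for a list of register entries.

    register: list of (specimen_number, estimated_fragments) pairs (hypothetical layout).
    """
    # Minimum: every allocated specimen number is treated as one individual.
    minimum = len({number for number, _ in register})
    # Maximum: every fragment is treated as a separate individual.
    maximum = sum(fragments for _, fragments in register)
    return minimum, maximum

# Invented entries; the third mimics a jar of crinoid ossicles under one number.
toy_register = [("A1", 1), ("A2", 1), ("A3", 37)]

if __name__ == "__main__":
    print(specimen_counts(toy_register))
```

Running an analysis once with each count brackets the true number of individuals, which lies somewhere between the two.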

Formation thickness.
The thickness ranges of each of the Chalk Group formations in the Hampshire area were obtained from Sheet Explanations for BGS 1:50 000 geological maps (Sheet 299: Booth 2002; Sheet 300: Farrant 2002; Sheet 314: Barton et al. 2003; Sheet 316/331: Hopson 2001). To produce average thickness estimates across the whole area, the means of these values were calculated for each formation.

Outcrop area.
The outcrop area of each formation in the study area was quantified in ArcMap. The study area boundary was defined as a polygon bounding the outermost localities with a buffer of 50 m, and digitized BGS bedrock maps (DiGMapGB-50, scale 1:50 000) were used to measure the map area covered by each formation in ArcGIS 10. Some of the Chalk outcrop area was undifferentiated even in this most up-to-date map, meaning that it could not be assigned to one formation, and this undifferentiated area could not be divided accurately between formations without any additional information. Therefore, two outcrop area measures were used: for the first, this undifferentiated outcrop area was excluded from the count; for the second, this area was split equally between all possible constituent formations (e.g. outcrop labelled as 'West Melbury Marly Chalk Formation and Zig Zag Chalk Formation (Undifferentiated)' was split equally between the West Melbury Marly Chalk Formation and the Zig Zag Chalk Formation). Undifferentiated strata accounted for just over one-sixth of the Chalk outcrop area, so any future revision of these maps may change this measured outcrop area.
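The two outcrop area measures can be sketched in Python as below. The function name, the data layout and the areas are invented for the example; in the study itself the areas were measured from the digitized map in GIS software.

```python
def outcrop_areas(mapped, undifferentiated):
    """Return the two outcrop area measures as a pair of dictionaries.

    mapped: {formation: area} for outcrop assigned to a single formation.
    undifferentiated: list of (formations, area) for outcrop spanning several formations.
    Areas are in any consistent unit (km2 in the toy data; all values hypothetical).
    """
    # Measure 1: undifferentiated outcrop is left out altogether.
    excluded = dict(mapped)
    # Measure 2: undifferentiated outcrop is shared equally between its formations.
    split = dict(mapped)
    for formations, area in undifferentiated:
        share = area / len(formations)
        for formation in formations:
            split[formation] = split.get(formation, 0.0) + share
    return excluded, split

toy_mapped = {
    "West Melbury Marly Chalk": 10.0,
    "Zig Zag Chalk": 20.0,
    "New Pit Chalk": 30.0,
}
toy_undifferentiated = [(("West Melbury Marly Chalk", "Zig Zag Chalk"), 8.0)]

if __name__ == "__main__":
    excluded, split = outcrop_areas(toy_mapped, toy_undifferentiated)
    print(excluded)
    print(split)
```

The equal split is a neutral assumption in the absence of other information; comparing results under both measures shows how sensitive any correlation is to the undifferentiated portion of the map.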

Rock volume.
Volumes within the study area for each formation were calculated using the new BGS 3D digital model of the Chalk (Fig. 2; Woods et al. 2016). This model has been constructed in GOCAD-SKUA™ to explore physical property variation in the Chalk. Outcrop, borehole and structural data have been used to construct formational surfaces and interpret physical property variation across the region and the wider southern England area. The 3D framework of formational boundaries, used here for rock volume calculation, functions to explore stratigraphical variation in physical property data within the model, interpolated between boreholes using geostatistical techniques (kriging and variograms).

Exposure area.
Rock must be exposed for fossils to be sampled by standard palaeontological field techniques. Thus exposure area may be one factor affecting measured diversity in the fossil record, and yet it has often been confounded with outcrop area (Dunhill 2011, 2012; Dunhill et al. 2012). Depending on land use, exposure may be transient and areas may be difficult to measure. This may be a problem with the Chalk Group in particular, as evidenced by 'older literature, when far more chalk pits were available for study' (Wray & Gale 2006). Likewise, Dunhill (2012) found that exposure and outcrop were more or less constant in deserts and semi-arid areas where there is little vegetation cover. However, in populated and more humid zones, much of the outcrop area can be concealed beneath soil and superficial deposits, as well as human developments. In fact, Dunhill (2011) found that older rocks that happen to form mountain belts in the UK are better exposed than younger rocks that occur in lowland agricultural areas with an often extensive cover of superficial deposits.
For this study, exposure area has been measured according to two protocols: (1) to assess exposure area at the time of fossil collection; (2) to assess changes in exposure area through time. Three map-based metrics were used: (1) scanned OS maps published between [...] Where these fossil localities corresponded to an 'exposure feature' on an image (e.g. a pit marked on one of the OS maps or white exposed chalk on the satellite images), a polygon was drawn around the likely exposure extent. The area of each polygon was measured, these areas were summed for each formation, and the number of observed exposures was counted. Figure 3 shows one locality on each of the three maps (Fig. 3a, historical OS map; Fig. 3b, modern OS map; Fig. 3c, satellite image) for comparison. Additionally, fieldwork was carried out to establish whether exposures identified using maps and satellite images could be observed on the ground. Given the large number of localities (1198) and the size of the study area (see Fig. 1), only a sample of localities was visited. Two transects were chosen, which included both managed farmland and urban areas and a range of formations and topography.
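The per-formation summing of polygon areas and counting of exposures can be sketched as follows. This is an illustrative Python example rather than the GIS procedure used in the study: it assumes planar coordinates in metres, uses the shoelace formula for area, and the pit outlines and function names are hypothetical.

```python
from collections import defaultdict

def polygon_area(vertices):
    """Planar area of a simple polygon from its (x, y) vertices (shoelace formula)."""
    twice_area = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap round to close the polygon
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0

def summarise_exposures(exposures):
    """Sum exposure polygon areas and count exposures for each formation."""
    area = defaultdict(float)
    count = defaultdict(int)
    for formation, vertices in exposures:
        area[formation] += polygon_area(vertices)
        count[formation] += 1
    return dict(area), dict(count)

# Hypothetical pit outlines (metres), for illustration only.
exposures = [
    ("Seaford Chalk", [(0, 0), (40, 0), (40, 30), (0, 30)]),
    ("Seaford Chalk", [(0, 0), (20, 0), (0, 20)]),
    ("Lewes Nodular Chalk", [(0, 0), (10, 0), (10, 10), (0, 10)]),
]
area, count = summarise_exposures(exposures)
```

Repeating the same summary for each map source (historical OS, modern OS, satellite) would give the per-formation series that are compared through sampling time.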

Locality distribution.
We explore the spatial distribution of Brydone's fossil localities (Fig. 1) using a nearest neighbour analysis performed in ArcMap (Average Nearest Neighbor Spatial Statistics Tool), inputting the coordinates for each locality point. Ideally, sampling should be random across the study area; that is, points should not be more clustered than would be expected under a random distribution.
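For readers without access to ArcMap, the statistic behind an average nearest neighbour analysis can be sketched in Python. This is the Clark and Evans style ratio with the usual normal approximation, written under the assumptions that coordinates are planar, that the study area is supplied in matching squared units, and that no edge correction is applied; the function name and the points are hypothetical.

```python
import math

def average_nearest_neighbour(points, study_area):
    """Return (ratio, z) for a nearest neighbour test of spatial randomness.

    ratio < 1 with a strongly negative z indicates clustering;
    ratio > 1 with a strongly positive z indicates dispersion.
    """
    n = len(points)
    if n < 2:
        raise ValueError("at least two points are needed")
    nearest = []
    for i, (xi, yi) in enumerate(points):
        nearest.append(min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points) if j != i
        ))
    observed = sum(nearest) / n                          # observed mean distance
    expected = 0.5 / math.sqrt(n / study_area)           # mean under randomness
    std_error = 0.26136 / math.sqrt(n * n / study_area)  # standard error of the mean
    return observed / expected, (observed - expected) / std_error

# Four hypothetical localities bunched in one corner of a 100 x 100 area.
ratio, z = average_nearest_neighbour([(0, 0), (1, 0), (0, 1), (1, 1)], 100 * 100)
```

The result depends strongly on how the study area is defined, so the value passed as `study_area` should match the bounding polygon used for the other measurements.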

Correlations.
Non-parametric Spearman's rho was used to test correlations between variables. Test statistics were calculated to assess the strength of the correlations between the subsets of data. When testing correlations between time series, it is important to consider long-term trend. If similar long-term trends are present, correlations between variables may be falsely interpreted as showing a causal relationship, or as a result of a common cause other than time. To test this, series of variables in this study were regressed against time (using function 'lm' from the R package 'nlme' version 3.1-128). No long-term trends were found, so it was not deemed necessary to calculate generalized differences (McKinney 1990), a method commonly used to detrend data. All P values were corrected for the increased likelihood of Type I error that comes with multiple comparisons, with a Benjamini-Hochberg correction (Benjamini & Hochberg 1995) applied using the R function 'p.adjust'.
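The Benjamini-Hochberg adjustment can be written out in a few lines. The Python sketch below is intended to mirror what R's 'p.adjust' does with method "BH" (each P value is scaled by m/rank, then made monotonic from the largest P value downwards); the function name and the P values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(p_values)
    # Visit P values from largest to smallest so a running minimum
    # enforces monotonicity of the adjusted values.
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, idx in enumerate(order):
        rank = m - position  # 1-based rank in ascending order
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical raw P values from four correlation tests.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

Here the adjusted values are approximately 0.02, 0.04, 0.04 and 0.02, so all four tests would remain significant at the 0.05 level.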

Model selection.
We test uni- and multivariate regression models, evaluating the best models for raw genus diversity and diversity after subsampling (SQS, q = 0.6). The variables input to the models are as follows: (1) the number of specimens; (2) exposure area as measured from historical maps; (3) formation rock volume as measured from the 3D geological model; (4) a binary vector denoting the presence or absence of air weathering. Regression models were calculated using the R package 'nlme' (v. 3.1-128), and model selection was carried out by calculating the small sample size unbiased Akaike information criterion (AICc) for each model (R package 'AICcmodavg' v. 2.0-4). AICc chooses the model with the best fit, whilst correcting for model complexity (Johnson & Omland 2004). We also tested the model residuals for normality and heteroscedasticity (R function 'jarque.bera.test' in package 'tseries' v. 0.10-35, and R function 'bptest' in package 'lmtest' v. 0.9-34, respectively). Here, we do not sort individual time series independently (see the criticism of Smith & McGowan 2007 by Sakamoto et al. 2016), and we do not use model residuals as relative diversity estimates.

Locality distribution.
Figure 1 is a map of the fossil localities included in this dataset. The nearest neighbour analysis indicates that there is statistically significant clustering in the geographical spread of the data (P < 0.001, z-score = −16.12). Plotting the localities on a map, it is clear that most of the fossil localities are close to roads and paths. This corresponds to the results of Dunhill et al. (2012), who found that fossil localities were often close to public houses and car parks (i.e. points of access close to areas of habitation). Evidently, accessibility (as indicated by roads or paths here) is an important factor for fossil sampling, even for a fossil collector as dedicated to thorough and even coverage as was Brydone.
Additionally, exposure is unlikely to be randomly distributed, with factors such as proximity to the coast increasing the likelihood of exposure.

Changes in exposure area
Total exposure.
Not all of the localities sampled by Brydone were marked as features on maps, or are visible in modern satellite images (Table 2). Approximately 68% of the localities could be linked with features on OS maps dating from 1910 to 1913; these features were largely chalk pits, or road or railway cuttings. Many of the remaining localities were temporary excavations or cuttings, and c. 5% of the localities were described as scrapings in fields or paths ('float'), so were not marked on the historical OS maps. However, on modern OS maps, only c. 34% of localities could be linked with currently mapped pits or cuttings, marking a decrease in the number of chalk localities available to sample today. Brydone's localities were even harder to identify in satellite images, with exposed rock being visible in only c. 8% of the study localities. The fieldwork sampled sites were harder to identify using the geographic information system (GIS) methods. Within this subset of data, more exposure features at Brydone's localities were marked on the historical OS maps (c. 49%) than on modern OS maps (c. 29%) and satellite images (20%). One more exposure was found during fieldwork than by inspection of satellite images.

Exposure per formation.
The relative number of exposures in each formation is similar through sampling time (over the past 100 years), with a reduction in the number of identifiable exposures in the modern OS and satellite image measures (Fig. 4a). A similar pattern is seen in the exposure area measurements (Fig. 4b). Exposure area for all formations is higher when measured on historical OS than on modern OS maps, and measurements from satellite images are again lower. All measures of exposure number and exposure area are strongly correlated, with all but one pairing being correlated with P < 0.01 (Table 3), or P > 0.05 after first differencing.

Comparing geological sampling proxies
As explained in the previous section, there was a strong link between all of the exposure measures, with all but one of the correlations having P < 0.01 (Table 3). However, formational rock volumes and thicknesses do not correlate with any of the exposure measurements, except for formation thicknesses with the number of satellite exposures (P < 0.05, r_s = 0.7670, Table 3), and none of these correlations is significant after first differencing. All of the exposure measurements do correlate with outcrop area with P < 0.05 (Table 3). Both measurements of specimen count (maximum and minimum) correlate, as do both outcrop area measurements (including and excluding undifferentiated formations).
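First differencing replaces each series by the change between successive formations, so that a correlation driven only by a shared long-term trend disappears. A minimal Python sketch, with a hypothetical function name and made-up values ordered stratigraphically:

```python
def first_differences(series):
    """Change between each pair of consecutive time bins."""
    return [later - earlier for earlier, later in zip(series, series[1:])]

# Two hypothetical proxies that both rise through time.
exposure = [1.0, 3.0, 6.0, 10.0]
outcrop = [2.0, 4.0, 8.0, 9.0]

d_exposure = first_differences(exposure)
d_outcrop = first_differences(outcrop)
```

The raw series both increase monotonically, so their rank correlation is perfect, whereas the differenced series (2, 3, 4 and 2, 4, 1) no longer move together. Note that differencing shortens each series by one bin, which matters when there are only a handful of formations.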

Diversity patterns
Raw diversity.
In the raw richness dataset, there is a trend of low Cenomanian (West Melbury Marly Chalk Formation to New Pit Chalk Formation) diversity to high Turonian to Campanian (Lewes Nodular Chalk Formation to Portsdown Chalk Formation) diversity (Fig. 5a). There is most uncertainty about the assignment of localities to formations in the oldest two (West Melbury Marly Chalk Formation, Zig Zag Chalk Formation) and youngest two formations (Culver Chalk Formation, Portsdown Chalk Formation).

Subsampled diversity.
After rarefaction (Fig. 5b), richness in all formations, with the exception of the New Pit Chalk Formation, appears similar. This rarefaction 'flattening out' of a richness curve is to be expected as sample size decreases (Alroy 2010a). However, SQS produces a similar trend, with New Pit Chalk Formation richness elevated above that of remaining formations (Fig. 5c).
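The expected richness under classical rarefaction can be computed analytically from the abundance of each genus. The Python sketch below uses the standard hypergeometric formula and also computes Good's u, the coverage estimate on which SQS is based; it does not implement SQS itself, and the function names and abundances are hypothetical.

```python
from math import comb

def rarefied_richness(abundances, sample_size):
    """Expected number of genera in a random draw of `sample_size` specimens."""
    total = sum(abundances)
    if sample_size > total:
        raise ValueError("sample_size exceeds the number of specimens")
    all_draws = comb(total, sample_size)
    # For each genus, 1 minus the probability that the draw misses it entirely.
    return sum(1 - comb(total - n_i, sample_size) / all_draws for n_i in abundances)

def goods_u(abundances):
    """Good's u: estimated coverage, 1 minus the proportion of singletons."""
    singletons = sum(1 for n_i in abundances if n_i == 1)
    return 1 - singletons / sum(abundances)

# Hypothetical specimen counts per genus in one formation.
counts = [2, 1, 1]
expected_at_two = rarefied_richness(counts, 2)
coverage = goods_u(counts)
```

With these counts a draw of two specimens is expected to contain 11/6 (about 1.83) genera, and half of the specimens belong to genera seen more than once (u = 0.5).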

Diversity and sampling proxies.
Here, raw diversity correlates only with the number of specimens (P < 0.05, r_s = 0.8667). Raw diversity is not correlated with any of the measured sampling proxies or with subsampled estimates of diversity (Table 4).
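Spearman's rho is the Pearson correlation of the ranked values, with tied observations given their average rank. A self-contained Python sketch follows; the function names and data are hypothetical, and significance testing is not included.

```python
import math

def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman rank correlation coefficient between two equal-length series."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    return cov / (sd_x * sd_y)

# Hypothetical specimen counts and genus counts for five time bins.
rho = spearman_rho([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
```

For this toy example rho is 0.8; because only ranks are used, the result is unaffected by any monotonic transformation of either variable.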

Model selection
Uni- and multivariate models for genus richness were tested (Table 5). For the raw genus diversity, the number of specimens found in each time bin was the best predictor of genus diversity (AICc weight = 0.52), with the number of specimens and exposure area combined being the second best model (AICc weight = 0.37).
For the subsampled (SQS) genus diversity, formation rock volume was the best predictor (AICc = 0.31), with the single-variable models being the best predictors of subsampled richness.
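For reference, AICc and Akaike weights can be derived from a model's log-likelihood as in the Python sketch below. It follows the standard definitions rather than the internals of 'AICcmodavg'; the log-likelihoods and model names are invented, and the sample size of nine assumes one observation per Chalk Group formation.

```python
import math

def aicc(log_likelihood, n_params, n_obs):
    """Small-sample corrected Akaike information criterion."""
    aic = 2 * n_params - 2 * log_likelihood
    correction = (2 * n_params * (n_params + 1)) / (n_obs - n_params - 1)
    return aic + correction

def akaike_weights(scores):
    """Relative support for each model, given a dict of AICc scores."""
    best = min(scores.values())
    relative = {name: math.exp(-0.5 * (s - best)) for name, s in scores.items()}
    total = sum(relative.values())
    return {name: r / total for name, r in relative.items()}

# Hypothetical log-likelihoods; n_params counts every estimated parameter.
n_obs = 9
scores = {
    "specimens": aicc(-10.0, 3, n_obs),
    "specimens + exposure": aicc(-9.5, 4, n_obs),
    "volume": aicc(-12.0, 3, n_obs),
}
weights = akaike_weights(scores)
```

With so few observations the correction term is large, so an extra predictor must improve the fit substantially before a multivariate model outranks a single-variable one.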

Changes in exposure area
There is observational evidence that exposure area (rocks visible at the surface) of the Chalk Group has decreased over time as a result of the loss of chalk pits (e.g. Wray & Gale 2006). We find that the number and total area of exposures, as measured by OS mappers, has certainly decreased. This is problematic for palaeontologists; many studies of richness through time use summed collections to measure this, and these collections have been sampled at different dates over past centuries. However, our ability to sample fossils has changed through history. In this study, for example, it has become harder to find fossil localities since Brydone's time. In other areas, as land use changes, rock may become more or less accessible over time. However, in this study, we find that the relative exposure of each formation has remained the same since Brydone's collection was made, and all measures of exposure correlate (Table 3). In other words, formations that were well exposed in the past remain the best exposed now. This may be due to land use; although the sample contains both urban and rural managed areas, many of the fossils were collected as loose specimens that had been ploughed up on farmland and this land use has changed little in the past 100 years. Another explanation might be the structure of the geology; on the large scale, all chalk formations in this area are constantly and shallowly dipping to the south. The interplay of this structure and the response of each formation to weathering and surface processes dictate the topography of this landscape (Aldiss et al. 2012). Perhaps this relationship dictates exposure, because topography is linked to exposure (Dunhill 2012). The importance of exposure area, in this case, is not diminished as a sampling proxy as a result of changing exposure through sampling time.
We have tested the success of different methods in finding and recovering exposure area. Satellite images are worse at capturing exposure than OS maps. This may result from vegetation obscuring any exposures, and exposures being too small to find in the satellite images. When fieldwork was carried out on a small sample of the localities, it was only slightly easier to find exposure than using satellite images (Table 2). This may also have been as a result of vegetation, with old chalk pits being obscured or filled in.

Links between sampling proxies
Unlike Dunhill (2011), we found that measures of exposure do correlate with outcrop area in this region, perhaps because of relative similarity of facies and hence response to erosion and weathering of all included formations. Neither outcrop area nor exposure area correlated with rock volumes or thickness. Not all surviving rock can be sampled, so rock volume measures are further divorced from the concept of sampling availability than is outcrop or exposure area. This lack of correlation reaffirms the importance of careful sampling proxy choice.
Within the Chalk Group, there is an inherent preservation bias because aragonite is not widely preserved (Mortimore et al. 2001; Smith & Batten 2002). The Chalk Group was deposited in a time of 'calcite seas', when the Mg/Ca ratio of seawater was low (Sandberg 1983; Hardie 1996; Stanley & Hardie 1998), which may have a significant impact on the preservation of aragonitic organism diversity (Cherns & Wright 2000). Originally aragonitic organisms that have been replaced by calcite (e.g. ammonites, nautiloids, some gastropods) are generally well preserved only in the clay-rich Grey Chalk Subgroup (West Melbury Marly Chalk Formation and Zig Zag Chalk Formation). In formations younger than this, evidence of aragonitic organisms is found at condensed horizons (hardgrounds), as moulds on the attachment scars of calcitic organisms, or when these aragonitic organisms had originally calcitic parts; for example, the aptychi of ammonites (Smith & Batten 2002). However, in this study there is a marked increase in diversity between the older (West Melbury Marly Chalk Formation, Zig Zag Chalk Formation, Holywell Nodular Chalk Formation, New Pit Chalk Formation) and younger (Lewes Nodular Chalk Formation, Seaford Chalk Formation, Newhaven Chalk Formation, Culver Chalk Formation, Portsdown Chalk Formation) Chalk Group formations, indicating that this failure to capture originally aragonitic organisms is not the only sampling bias in the Chalk Group. However, it is impossible to estimate what a diversity curve would have looked like without this preservation failure (Cherns & Wright 2009).

Richness through time.
The raw richness count shows a trend from low richness in the Cenomanian to Turonian to high richness in the upper Turonian to Campanian. This trend may be related to lithology; the Grey Chalk Subgroup is less likely to air-weather than the White Chalk Subgroup (A. Gale, pers. comm. 2015), and it is more difficult to find small fossils in fresh exposures than in weathered exposures. However, after subsampling using both rarefaction and SQS, the New Pit Chalk Formation (Turonian) stands out as the highest diversity formation in the study, when compared with the remaining subsampled formations, which all have similar levels of diversity (Fig. 5b and c). This is unexpected; the Lewes Nodular Chalk Formation contains hardgrounds, including the Chalk Rock, which are condensed and fossiliferous. For this reason the Lewes Nodular Chalk Formation often appears to contain a greater abundance of macrofossil remains, as indicated by the numerous stratigraphically important fossil ranges illustrated for the Lewes Nodular Chalk Formation compared with the New Pit Chalk Formation (Mortimore 1986). However, here, the Lewes Nodular Chalk Formation has the second-lowest genus richness when SQS is calculated with a quorum of 0.6. There are three possible explanations for this: (1) genus richness may not be a good proxy for species richness (Hendricks et al. 2014), and the stratigraphically important fossils listed by Mortimore (1986) are identified to species level, whereas this study is to the level of genera; (2) within-formation evenness may affect the results of subsampling (Olszewski 2004) and even SQS (Hannisdal et al. 2016); or (3) the fossiliferous hardgrounds of the Lewes Nodular Chalk Formation were less accessible to collectors in the past than they are to modern BGS mappers.
Generally, fossils are unevenly distributed in the geological record as a result of condensed beds and varying preservation potential, and this evidently holds true for a geological unit like the Chalk, with its relatively uniform lithology. The Chalk has evidence of 'pulse faunas', described as 'recurring assemblages, or abnormal abundances of calcitic macrofossils' (Paul et al. 1999). These faunas resemble Konzentrat-Lagerstätten, which are rock units with unusually high concentrations of fossils (Seilacher et al. 1985). However, pulse faunas are thought to arise as a result of real faunal immigration (Mitchell & Carr 1998) rather than elevated preservation potential, and fossils are also concentrated in condensed hardgrounds, such as the Chalk Rock. At this small scale, diversity changes from bed to bed are likely to be great. During sampling, if fossiliferous beds are not collected from, perhaps because of issues of accessibility or ease of extraction, then any diversity measurements will be unrepresentative.

Richness and sampling proxies.
Rock volume has been suggested as a driver of diversity (Crampton et al. 2003), and specifically within the UK a reduction in the number of lithofacies and subsequent rock package erosion has been suggested as a driver of diversity (Smith & Benson 2013), either as a common cause or through sampling bias. For raw genus diversity, the number of specimens was the best predictor of richness, and the second-best model was the number of specimens and exposure area combined. For subsampled (SQS) genus diversity, rock volume was the best predictor of sample-standardized richness. Subsampled richness, in theory, is a better indication of the real relative richness in each time bin, compared with raw richness, as sample size biases are reduced. The proxy of rock volume is a function of both sampling bias and original facies area; that is, the original accommodation space for sediment to accumulate. Because rock volume is linked with subsampled richness and not raw richness, it suggests that there may be a species-area (or 'genus-area') effect controlling sampling standardized diversity.

Conclusions
Using maps and satellite images to track changes in exposure through time, we find an overall reduction in exposures and exposure area over the past 100 years. However, relative exposure has remained the same; units that were well exposed in the past are, on the whole, well exposed now, and vice versa. We find that exposure area and outcrop area correlate, but they do not co-vary with rock volumes, suggesting that a volume proxy is more divorced from rock availability than either outcrop area or exposure area. This lack of correlation is very important to consider when choosing sampling proxies for comparison with the fossil record. Overall, raw diversity patterns in the Chalk Group are shaped by the number of specimens, with both the diversity and specimen count datasets showing a general increase through time. However, when sampling is standardized by the number of specimens, unexpectedly, the New Pit Chalk Formation diversity is elevated, and there is a possible link between subsampled richness and original sediment accommodation space. Raw richness in this geological section is probably driven by the number of samples and the existence of highly fossiliferous beds that yield a large proportion of the diversity measured.