Are all sediment traps created equal? An intercomparison study of carbon export methodologies at the PAP-SO site

Sinking particulate flux out of the upper ocean is a key observation of the ocean’s biological carbon cycle. Particle flux in the upper mesopelagic is often determined using sediment traps but there is no absolute standard for the measurement. Prior to this study, differing neutrally buoyant sediment trap designs had not been deployed simultaneously, which precluded meaningful comparisons between flux data collected using these designs. The aim of the study was to compare a suite of modern methods for measuring sinking carbon flux out of the surface ocean. This study compared samples from two neutrally buoyant drifting sediment trap designs and a surface tethered drifting sediment trap, which collected sinking particles alongside other methods for sampling particle properties, including in situ pumps and ²³⁴Th radionuclide measurements. Samples were collected at the Porcupine Abyssal Plain Sustained Observatory (PAP-SO) site in the Northeast Atlantic Ocean (49°N, 16.5°W). Neutrally buoyant conical traps appeared to collect lower absolute fluxes than neutrally buoyant or surface-tethered cylindrical traps, but compositional ratios of sinking particles indicated collection of similar material when comparing the conical and cylindrical traps. In situ pump POC:²³⁴Th ratios generally agreed with trap ratios but conical trap samples were somewhat depleted in ²³⁴Th, which, along with sinking particle size distribution data determined from gel traps, may imply under-sampling of small particles. Cylindrical trap POC fluxes were of similar magnitude to ²³⁴Th-derived POC fluxes while conical POC fluxes were lower. Further comparisons are needed to distinguish if differences in particle flux magnitude are due to conical versus cylindrical trap designs. Parallel analytical determinations, conducted by different laboratories, of replicate samples for elemental fluxes and gel trap particle size distributions were comparable.
This study highlights that the magnitude of particle fluxes and size spectra may be more sensitive than the chemical composition of particle fluxes to the instrumentation used. Only two deployments were possible during this study so caution should be taken when applying these findings to other regions and export regimes. We recommend that multiple methodologies to measure carbon export should be employed in field studies, to better account for each method’s merits and uncertainties. These discrepancies need further study to allow carbon export fluxes to be compared with confidence across laboratory, region and time and to achieve an improved global understanding of processes driving and controlling carbon export.


Introduction
Sinking particulate flux out of the upper ocean is a key observation for research of the ocean's biological carbon cycle. Sinking particulate material is considered to be the most important mechanism for the vertical downward transport of particulate organic carbon (POC; Boyd et al., 2019; Buesseler et al., 2007) and is important for carbon sequestration in the deep ocean. There is currently no absolute standard for the measurement of POC flux. The main approaches that have been employed to measure carbon export in the upper ocean are: (1) sediment traps to directly capture sinking material (neutrally buoyant drifting traps and tethered traps of varying designs; Bourne et al., 2019; Buesseler et al., 2000; Estapa et al., 2020), (2) radionuclide methods (Buesseler, 1991; Buesseler et al., 2006; Villa-Alfageme et al., 2014), (3) marine snow catchers for in situ snapshots (Baker et al., 2017; Riley et al., 2012), (4) optical sensors deployed on gliders and Lagrangian floats which can capture flux time series (Bishop et al., 2016; Bourne et al., 2019; Briggs et al., 2011; Dall'Olmo and Mork, 2014; Estapa et al., 2013, 2017, 2019), (5) camera systems and optical profilers, such as the Underwater Vision Profiler, to estimate particle properties (Stemmann et al., 2008) and (6) export estimates using upper ocean elemental balances (Emerson, 2014; Michaels et al., 1994; Quay et al., 2012). These approaches measure and integrate particle fluxes over varying timescales (instantaneous to several months) with different merits and shortcomings.
Accurate estimates of carbon export are crucial for balancing ocean carbon budgets (Sabine, 2004; Sanders et al., 2016) and for advancing understanding of the key biological carbon pump (BCP) processes. Without accurate quantifications of carbon export, it is not possible to evaluate how climate change has impacted and will impact the BCP (Bopp et al., 2001; Boyd, 2015). There are continued inconsistencies in carbon export measurements, such as an unbalanced mesopelagic carbon budget (Burd et al., 2010; Giering et al., 2014; Steinberg et al., 2008) and discrepancies between radionuclide methods and trap fluxes (Buesseler, 1991; Ceballos-Romero et al., 2016; Lampitt et al., 2008). Due to limited intercomparison studies, it is not yet possible to distinguish whether these inconsistencies are due to methodological issues, a lack of comprehensive measurements of the processes driving the BCP, or data misinterpretation; most likely they result from a combination of all three factors.
A standardised procedure for sediment trap deployment and processing was created and recommended by the Joint Global Ocean Flux Study (JGOFS; Gardner, 2000). However, laboratory groups have continued to improve upon the JGOFS protocols, often using different approaches, meaning that inconsistencies in measuring sinking particle flux still remain (Buesseler et al., 2007) and may preclude reliable comparisons. This is particularly pertinent as sustained observatories, such as the Bermuda Atlantic Time-series Study (BATS) and the Porcupine Abyssal Plain Sustained Observatory (PAP-SO), use different trap designs to measure upper ocean carbon export, and a lack of comparable time series data makes causal relationships difficult to identify for wider geographical areas (Buesseler et al., 2000; Estapa et al., 2019; Giering et al., 2014; Henson, 2014). Several different sediment trap designs have been deployed to measure upper ocean export and are classified as either neutrally buoyant sediment traps, i.e. autonomous Lagrangian floats with cylindrical or conical collectors (Estapa et al., 2020; Lampitt et al., 2008; Sherman et al., 2011; Valdes and Price, 2000), or drifting traps tethered to a surface float (Knauer et al., 1979; Peterson et al., 2009). Traps anchored to the seafloor are not discussed here, but have been used in some cases for sampling the upper 500 m, and generally are considered to undercollect when deployed at depths shallower than 1-2 km (Buesseler et al., 2010; Gardner, 2000; Scholten et al., 2001; Yu et al., 2001). Neutrally buoyant traps are generally favoured over tethered traps after evidence of possible hydrodynamic biases was observed for surface tethered traps (Baker et al., 1988; Buesseler, 1991). The major issues that may lead to inaccurate carbon fluxes when using sediment traps, as summarised in Buesseler et al. (2007), are hydrodynamic biases, swimmers, and solubilisation.
Hydrodynamic effects, which have been found to impact the collection efficiency of sediment traps, are amplified by larger Reynolds numbers (R_t), which parameterise the turbulence associated with the flow over and around the trap collector opening, and by increasing tilt of the collectors (Buesseler et al., 2007; Gardner, 1985). The shape and aspect ratio (the ratio of the collector height to the collector diameter) of conical traps make them more likely than cylindrical traps to be susceptible to the processes influencing collection efficiency (Baker et al., 1988; Buesseler et al., 2007; Gardner, 1985).
The motivation behind this study is to improve the scientific community's understanding of different sediment trap designs, and to begin working towards establishing a standard metric for the use of sediment traps to measure particle flux in the upper mesopelagic. This study aimed to: (1) compare different drifting sediment trap designs with non-trap methodologies for detecting sinking carbon flux out of the surface ocean and (2) undertake a laboratory intercomparison of sample processing methodologies across four international laboratories (Table 1). We measured the chemical composition and particle size spectra of sediment trap particle fluxes, in situ pump particle elemental ratios and ²³⁴Th water column flux. This was the first time that two different neutrally buoyant sediment trap designs had been deployed simultaneously, alongside other methods, and should offer insight into carbon export estimates.

Field site and deployment strategy
Particle flux was measured at the PAP-SO Site (49.0°N 16.5°W) in the Northeast Atlantic between the 16th and the 28th of April 2017 during an RRS Discovery (DY077) expedition (Fig. 1). Deployment 1 (D1) was from the 19th to the 21st of April and deployment 2 (D2) was from the 24th to the 27th of April (see Table 2 for further detail).

Water column sampling
CTD casts were undertaken for targeted sampling at the beginning and end of each sediment trap deployment (Fig. 1b and 1c). Chlorophyll concentration (µg L⁻¹) was derived from fluorometer (Wetlabs) data at 30 m for April and May 2017 at the PAP Site and from bottle samples at 30 m from CTD casts during the cruise (Fig. 3). The fluorometers were calibrated prior to each year-long deployment using instrument-specific factory calibrations and using chlorophyll concentrations derived from in situ water sample chlorophyll fluorescence measurements. The mixed layer depth (MLD) was calculated using an average of temperature boundaries of 0.2°C and 0.5°C (Hosoda et al., 2010) and the base of the primary production zone (PPZ), i.e. where particle production occurs, was derived from chlorophyll fluorescence data, following the method described by Owens et al. (2015), with the PPZ extending between 0 m and the depth where fluorescence reaches 10% of its maximum value.
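The PPZ definition above can be illustrated with a minimal Python sketch. It is not the processing code used in this study: the function name `ppz_base`, the linear interpolation between samples, the restriction of the search to depths below the fluorescence maximum, and the example profile are all assumptions made for illustration.

```python
def ppz_base(depths_m, fluorescence, fraction=0.10):
    """Return the depth (m) below the fluorescence maximum at which the
    signal first drops to `fraction` of its maximum value.

    Assumes depths are sorted shallow to deep. Linear interpolation
    between the two bracketing samples is an illustrative choice.
    """
    f_max = max(fluorescence)
    i_max = fluorescence.index(f_max)
    threshold = fraction * f_max
    for i in range(i_max + 1, len(depths_m)):
        f_up, f_dn = fluorescence[i - 1], fluorescence[i]
        if f_dn <= threshold:
            # Interpolate between the sample above and the sample below.
            frac = (f_up - threshold) / (f_up - f_dn)
            return depths_m[i - 1] + frac * (depths_m[i] - depths_m[i - 1])
    return None  # threshold not reached within the profile


# Hypothetical profile (not data from this study).
depths = [0, 10, 20, 30, 40, 50, 60, 70]
fluor = [1.0, 1.5, 2.0, 1.6, 1.0, 0.4, 0.1, 0.05]
print(ppz_base(depths, fluor))
```

Returning `None` when the threshold is never reached makes a too-shallow cast explicit rather than silently reporting the deepest sample.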

Neutrally buoyant sediment traps
The Group 1 neutrally buoyant sediment trap (NBST) consists of four "wide" cylindrical sediment trap tubes (collection area of each tube = 0.0113 m²) and a 0.25 m pathlength transmissometer (C-Rover 2000, WETLabs, Philomath, OR) arranged around a profiling float (Estapa et al., 2020; Valdes and Price, 2000). The traps were programmed to sink to a predetermined depth (200 m and 350 m; Table 2) and drift in a Lagrangian manner whilst collecting sinking particles. The wide tubes were programmed to close before the traps rose to the surface for recovery.

Surface tethered trap
The Group 1 surface tethered trap (STT) array was deployed here with both the "wide" NBST tubes and a "narrow" tube similar to the standard particle interceptor traps (PIT) used at the Hawaii Ocean Time-series (HOT) and BATS, which remain open during deployment and recovery (collection area = 0.00385 m²; Karl and Lukas, 1996; Knauer et al., 1979). The wide tubes are deployed with a honeycomb baffle with openings of diameter 0.95 cm and with an aspect ratio of 2.67. The narrow tubes are deployed with a similar baffle with openings of diameter 1.27 cm with an aspect ratio of 4.5. During D1, STTs were deployed with collection tubes at 200 m and 350 m and during D2, STT tubes were only deployed at 350 m. A burnwire controller programmed to close the four wide tube lids functioned correctly at 200 m but not at 350 m during D1. During D2 two wide tubes were programmed to close, and two were set to remain open along with a third pair of PIT tubes without lids.
The STT was deployed with a downward-looking Nortek current meter, approximately 2 m below the bottom of the 350 m trap, which measured current speeds every 30 seconds. During D1, the horizontal and vertical velocities were always less than 10 cm s⁻¹, with a median horizontal velocity of 6 cm s⁻¹. During D2 the velocities increased during the deployment to reach a maximum of 23 cm s⁻¹ (horizontal) and 20 cm s⁻¹ (vertical), with a median horizontal velocity of 13 cm s⁻¹. An hourly mean of the horizontal and vertical velocity was calculated to remove noise and reveal trends during the sampling periods (Fig. B1).
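The hourly averaging described above can be sketched as follows. This is an illustrative Python example only; the function name `hourly_means`, the use of elapsed seconds as the time axis and the sample values are assumptions, not details from the current meter processing.

```python
from collections import defaultdict
from statistics import mean


def hourly_means(times_s, speeds_cm_s):
    """Average speed samples into 1-hour bins.

    times_s: elapsed time of each sample in seconds (e.g. every 30 s).
    Returns a dict mapping hour index (0, 1, 2, ...) to the mean speed.
    """
    bins = defaultdict(list)
    for t, v in zip(times_s, speeds_cm_s):
        bins[int(t // 3600)].append(v)
    return {hour: mean(vals) for hour, vals in sorted(bins.items())}


# Hypothetical samples: two in the first hour, two in the second.
print(hourly_means([0, 30, 3600, 3630], [4, 6, 10, 14]))
```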

PELAGRA traps
The Particle Export LAGRAngian (PELAGRA) traps were developed at NOC (Lampitt et al., 2008) and, similar to the Group 1 NBSTs, were programmed to reach a chosen sampling depth (350 m in this study) where they drift in a Lagrangian fashion. The PELAGRA sediment traps were deployed without baffles. Two versions of PELAGRA were deployed during this cruise: the standard version (PELAGRA1, P1 and PELAGRA2, P2; area 0.11560 m²) and a time-series version (PELAGRA_ts, P_t-s; area 0.10061 m²), which allows for an 'early' and 'late' sequential sample to be collected. Each PELAGRA trap has the capacity for four sampling cups, with two cups and overlying collectors sometimes removed and replaced by the P-Cam camera system (Canon EOS 6D digital SLR camera equipped with a 50 mm macro lens and a Canon Speedlite 600EX RT flash gun), which is used to determine the abundance and size spectra of particles >100 µm. The P-Cam data were not included in this study. When PELAGRAs were deployed with gel cups (see Section 2.4 and Table 2 for details), they were deployed without the collection funnels above the gel cups, in which case the collection area of the gel cup was 0.001924 m². All PELAGRAs were deployed at 350 m.

On-board trap preparation and processing
All sampling tubes from the NBSTs, STT and cups from PELAGRA were prepared using the same brine and poison. The configurations of the tubes for each deployment are detailed in Table 2. Tubes or sample cups used to collect the primary fluxes were prepared with 70 ppt brine poisoned with formaldehyde (0.1% concentration) and borate buffered to pH 8.5. For both the wide and narrow tubes the brine layer (500 ml in the bottom of the tube for wide NBST style tubes and in a 125 ml bottle in the bottom of the narrow tubes) was overlain with 1 µm filtered seawater from 350 m. For PELAGRA, the 600 ml cups were filled with brine prior to deployment and were attached to the trap in the closed position; they were then opened and closed as programmed.
After recovery, the wide NBST tubes filled with brine were left to settle for 1-3 h. The samples were then processed on board by removing the water overlying the brine layer in the bottom of the sampling tube by a peristaltic pump. The bottom brine layer containing sinking particles was screened through 350 µm nylon mesh to remove so called "swimmers". Up to three replicate tubes were drained through a single screen and combined. The screen was picked under 12x magnification to remove swimmers whilst leaving on the screen particles and organisms, such as foraminifera, that passively sank into the trap. These particles were rinsed from the screen back into the main sample. Swimmers were rinsed onto a 25 mm diameter QMA filter (nominal pore size 1 µm), dried and counted on board for ²³⁴Th and analysed later by laboratory Group 1 for total carbon, nitrogen (TC/N) and particulate inorganic carbon (PIC). Passively sinking particles remaining after swimmer removal were combined and wet split into 1/8th aliquots using a custom rotary splitter. Generally, four of the splits were filtered onto QMA filters, dried and processed for ²³⁴Th counting and later for TC/N and PIC analysis (details for ²³⁴Th in Section 2.3.5). Three or four splits were filtered onto 0.2 µm polycarbonate membranes (Whatman Nucleopore) and rinsed with borate-buffered, pH 8.5 Milli-Q water to remove seawater salts. The filters were then dried and weighed on shore to determine mass flux and later analysed for biogenic silica (BSi; SiO₂). Some of the splits prior to filtration were also provided to laboratory Group 2 for cross calibration (see Section 2.3.2, Table 1 and Table 3).

Fig. 1. [...] The black box refers to the area of (b) and the grey box refers to the area of (c). Regions with continuous cloud cover are white. (b) Deployment 1 and (c) Deployment 2 deployment and recovery locations, tracks of the sediment traps and locations of CTDs deployed at the beginning (early, squares) and end (late, circles) of the deployments. The PAP-SO Site is marked with a red cross. The Group 1 neutrally buoyant sediment trap is the NBST and the Group 1 drifting surface tethered trap is the STT (Fig. 2). There were three PELAGRAs deployed, which will be referred to as P1, P2 and P_t-s (Group 2). The STT was tracked by GPS and provided continuous location data whereas only deployment and recovery locations were available for the neutrally buoyant drifting traps.

Table 2
Sediment trap deployment metadata. The three types of sediment trap deployed were PELAGRAs (neutrally buoyant sediment traps developed at NOCS, UK, Group 2), NBST (neutrally buoyant sediment trap developed at WHOI, US, Group 1) and the STT (surface tethered trap developed at WHOI, US, Group 1). Tubes are referred to as either wide (NBST tubes which were also deployed on the STT) or narrow (PIT tubes deployed on the STT). The medium in the cups/tubes was either brine (low formalin concentration), polyacrylamide gel (GPA) or cryogel (GCG).

Baker, et al. Progress in Oceanography 184 (2020) 102317

Similar to the wider NBST tubes, the narrow tubes on the STT had overlying seawater siphoned off, and the 125 ml sample collection bottles were removed. Generally two samples were combined into a secondary container and processed identically to the wider sampling tubes, with a screen to remove swimmers, wet split and 1/8 aliquots filtered on to either QMA or polycarbonate filters and processed in an identical manner to the other samples. Some of the single PELAGRA cups were also processed according to this protocol, while others were sent to Group 2 for further processing (see Section 2.3.2).

Sample processing at laboratory Group 2
Replicate Group 1 splits and PELAGRA cups were processed in parallel using Group 2 methodology. To better preserve the sample prior to processing on shore, additional concentrated formaldehyde was added upon recovery to raise the concentration to 5% formaldehyde and samples were stored at 4°C. Cups processed by Group 2 were handpicked for swimmers at 75× magnification under a laminar flow unit. There is no size distinction in the usual Group 2 protocol but to allow for a more robust intercomparison with the Group 1 protocol, swimmers for each sample were screened through a 350 µm mesh. The swimmers that passed through the mesh were filtered onto a pre-weighed pre-combusted GF/F (0.7 µm Fisherbrand) for dry weight and POC/N analysis.

Fig. 3. [...] The mixed layer depth (MLD; solid line) and the base of the primary production zone (PPZ; dashed line) derived from CTD data are plotted in black. Deployment 1 (D1) and 2 (D2) are marked. The fluorescence sensor (data up to April 17th) had been deployed for one year and may have undergone sensor drift, which may explain the lower chlorophyll concentrations from the water samples versus the sensor data. The fluorescence sensor was recovered on April 17th, hence the sampling gap, and a new calibrated sensor was deployed.

PELAGRA cups and Group 1 replicates were treated identically post-picking. Wet samples were split into 1/8th aliquots using a custom Group 2 rotary splitter into 60 ml Nalgene bottles and stored at 4°C. Two splits were filtered onto two separate pre-weighed pre-combusted 0.7 µm GF/Fs, one for Total Carbon (TC)/Nitrogen analysis and one for POC analysis. Both filters were used to measure dry weight to calculate mass fluxes (reported with propagated uncertainties from measurement error). GF/Fs were rinsed with borate-buffered Milli-Q to remove any salts. A third PELAGRA split was filtered onto a 0.4 µm polycarbonate filter (Whatman Nucleopore) for BSi analysis and rinsed with borate-buffered Milli-Q. An independent mass flux estimate was calculated from the Nucleopore filters for comparison. Group 1 replicates (1/8th of original) were split again into 1/8th splits (1/64th of original) for consistency and also to allow TC/N and POC analysis for samples with only one replicate 1/8th split available. For Group 1 samples with only one original split available, four splits (4 × 1/64th) were filtered for TC/N analysis and four splits (4 × 1/64th) were filtered for POC analysis. For Group 1 samples where two original 1/8th splits were available, all eight of the 1/64th splits from one original split were filtered for TC/N analysis and all eight from the other for POC analysis. The Group 1 replicates were handled identically post-filtering to the PELAGRA cups for mass flux, TC/N and POC analysis, to allow for a comparison of the analytical methods without the effect of heterogeneity between sample cups (although results may still differ due to split to split variability, see Appendix C).

In situ pumps
Two McLane in situ battery powered pumps were deployed twice during each deployment to collect size-fractionated particles. Water was pumped through a 51 µm pore size Nitex screen followed by a 1 µm pore size quartz filter allowing for size distinctions of particles greater than 51 µm and 1-51 µm in size. Filters were 142 mm in diameter and a baffle opening was used to keep particles from washing off the screen during recovery (Lam et al., 2015). The pumps operated for 2 h and were shut off before recovery. The volume pumped was recorded and the screen was rinsed with pre-filtered seawater onto a 1 µm silver filter (25 mm diameter). Weighed slices were used for C/N and PIC (methodology described in Section 2.3.4.1).

Analytical methods

2.3.4.1. Laboratory Group 1 analysis.
QMA filters for analysis at laboratory Group 1 were dried at 45°C on board, mounted, and immediately counted for low-level β emissions on board the ship (further ²³⁴Th methodology in Section 2.3.5). QMA filters were then unmounted, re-dried, and gravimetrically sub-setted into 4 pieces. One quarter of the filter was analyzed for TC/N after high-temperature combustion on a Thermo Electron FlashEA 1112 C/N analyzer. Coulometric analysis for PIC after sample acidification was performed on the second quarter of the filter (Honjo et al., 2000; Johnson et al., 1985). The remainder of the filter was archived. Polycarbonate filters for mass and BSi determination were dried and reweighed repeatedly on a microbalance until stable weights with a precision better than ± 0.01 mg were achieved. Filter tare weights were subtracted and net mass accumulation was calculated. Then the filters were digested to release BSi using a weak alkaline digest (0.2 N NaOH for 2 h at 95°C) and analysed following standard spectrophotometric methods (Strickland and Parsons, 1972).

Table 3
Variability intercomparison for split to split variability, a laboratory comparison between Group 1 and Group 2 methodology and PELAGRA cup to cup variability for mass, TC, PIC, POC, BSi and ²³⁴Th fluxes. The mean relative standard deviation (RSD, %) of replicate measurements and the number of samples, splits or cups used in the calculation is detailed. The POC split to split variability is calculated based on the difference between the TC and PIC means and the maximum uncertainty of both measurements is reported.

Laboratory Group 2 analysis.
Net mass accumulation was measured by re-weighing the dried pre-weighed GF/Fs filtered for C/N analysis and also the dried pre-weighed polycarbonate filters for BSi analysis. TC/N filters were pelleted and analyzed on an Elementar Vario Isotope Select (University of Southampton). Particulate organic carbon GF/Fs were fumed using concentrated HCl for 24 h to dissolve any inorganic carbon (Hedges and Stern, 1984). Filters were dried in a 40°C oven and handled similarly to TC samples. The PIC mass was the difference between TC and POC. BSi was measured using a spectrophotometric autoanalyser after digesting Si with 0.2 N NaOH as in Brown et al. (2003). The relative standard deviation (RSD) of the BSi analytical error was 0.25%. Fluxes were calculated by normalising mass to the sampling area and sampling time. All fluxes are reported with propagated uncertainties derived from analytical errors.
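The flux normalisation and the PIC-by-difference calculation described above can be written out as a short Python sketch. The function names, the explicit scaling of a split back to the whole sample, and the example numbers are illustrative assumptions; only the collection area (0.1156 m² for a standard PELAGRA) is taken from the text.

```python
def flux(mass_mg, split_fraction, area_m2, days):
    """Flux in mg m^-2 d^-1 from the mass measured on one split.

    split_fraction: fraction of the whole sample on the filter
    (e.g. 1/8); scaling by 1/split_fraction is assumed here.
    """
    total_mass_mg = mass_mg / split_fraction
    return total_mass_mg / (area_m2 * days)


def pic_by_difference(tc, poc):
    """PIC as the difference between total carbon and POC (same units)."""
    return tc - poc


# Hypothetical example: 1.0 mg on a 1/8 split, 0.1156 m2 trap, 2 days.
print(round(flux(1.0, 1 / 8, 0.1156, 2.0), 2))
print(pic_by_difference(10.0, 8.5))
```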

²³⁴Th
²³⁴Th profiles were determined at the beginning and end of each deployment, triangulated 10 km apart around the drifting trap location (Fig. 1b and c). A 4 L sample was collected from CTD casts, a stable Th yield monitor was added and the pH was adjusted to promote the formation of a Mn precipitate that scavenges Th. The sample was filtered through a 25 mm diameter quartz filter. The filter was dried, mounted and beta counted on board, and again 6 months post-cruise, to determine the amount of interfering beta activity and detector background that was not associated with ²³⁴Th in the sample. All total ²³⁴Th activities were corrected for the overall efficiency of the manganese precipitation method determined using ²³⁰Th as a yield monitor. After the final beta count the ²³⁰Th recovery was conducted as in Pike et al. (2005) but without using an ion exchange column to reduce Mn interferences. Mn precipitates were dissolved in a nitric acid and hydrogen peroxide solution and a known amount of ²²⁹Th was added to each sample. The ²³⁰Th/²²⁹Th ratio was analysed by ICP-MS to determine the amount of ²³⁰Th in each sample and allow for the correction of thorium loss during processing. The ²³⁴Th in the CTD samples had a small deficit relative to its source, ²³⁸U (determined by salinity; Owens et al., 2011), which contributed to the higher uncertainties for the ²³⁴Th fluxes as they are derived from the difference in ²³⁴Th and ²³⁸U activities. Integrated ²³⁴Th fluxes were calculated as in Buesseler et al. (2008) using a 1-D steady state model. ²³⁴Th was also measured from the in situ pumps using a silver filter and a 25 mm subsample from the QMA filter which were dried, mounted and beta counted in the same manner described above. For all analyses, measurement and analytical errors were propagated throughout all calculations.
POC fluxes were estimated using in situ pump POC:²³⁴Th ratios from the >51 µm particle fraction and integrated ²³⁴Th fluxes at 200 m and 350 m from CTD samples similar to Ceballos-Romero et al. (2016). The 1-51 µm particle fraction's POC:²³⁴Th ratios were very similar to the >51 µm particle fraction's ratios and so only the large particle fraction ratios were used to calculate POC fluxes.
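A generic form of this calculation can be sketched in Python. The sketch assumes the common steady-state, 1-D formulation in which the ²³⁴Th flux is the decay constant multiplied by the depth-integrated ²³⁸U minus ²³⁴Th deficit, and the POC flux is that flux multiplied by the particulate POC:²³⁴Th ratio. It is not the authors' implementation: the function names, the trapezoidal integration starting at the shallowest sample, the 24.1 d half-life and the example profile are assumptions.

```python
import math

TH234_HALF_LIFE_D = 24.1                      # assumed half-life, days
LAMBDA_TH = math.log(2) / TH234_HALF_LIFE_D   # decay constant, d^-1


def th234_flux(depths_m, th234_dpm_L, u238_dpm_L):
    """Steady-state 1-D 234Th flux (dpm m^-2 d^-1) at the deepest depth.

    Integrates the 238U - 234Th deficit with the trapezoidal rule from
    the shallowest to the deepest sample.
    """
    deficit = [u - th for u, th in zip(u238_dpm_L, th234_dpm_L)]
    integral = 0.0  # dpm L^-1 m
    for i in range(1, len(depths_m)):
        dz = depths_m[i] - depths_m[i - 1]
        integral += 0.5 * (deficit[i] + deficit[i - 1]) * dz
    return LAMBDA_TH * integral * 1000.0  # 1000 L per m^3


def poc_flux(th_flux_dpm_m2_d, poc_to_th_umol_per_dpm):
    """POC flux (umol m^-2 d^-1) from a 234Th flux and a POC:234Th ratio."""
    return th_flux_dpm_m2_d * poc_to_th_umol_per_dpm


# Hypothetical profile (not data from this study).
p_th = th234_flux([0, 100, 200], [2.0, 2.2, 2.4], [2.4, 2.4, 2.4])
print(p_th, poc_flux(p_th, 5.0))
```

Because the flux depends on a small difference between two activities, a small deficit translates into a large relative uncertainty, consistent with the higher uncertainties noted for the ²³⁴Th fluxes.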

Particle size distribution methodology
Gel samples were collected from all platforms in identical, transparent-bottomed polycarbonate jars. These were filled with approximately 50 g of 40% polyacrylamide gel (Durkin et al., 2015) or an equivalent volume of cryogel (Tissue Tek, O.C.T.™ Compound from Sakura). On the STT and NBSTs, the gel-filled jars were placed in the bottom of the wide trap tubes which were then filled to the top with 1 μm-filtered seawater. On the PELAGRA traps, the jars were filled to the brim with filtered seawater and then threaded into place on the trap carousel. The trap funnels above the gel collectors were removed to avoid aggregation of particles during collection. Gels were stored at 4°C except while they were being imaged, and then frozen (−20°C) once on-board imaging was complete. Polyacrylamide gel jars were shipped frozen for subsequent onshore analysis by Group 2.
On board the ship, the polyacrylamide and cryogels were imaged in their entirety by both Groups 1 and 2 using transmitted light at low magnification (pixel size 11.5 μm) with a custom imaging setup (Basler acA4600-7gc camera, Edmund Optics 16 mm/F1.8 86-571 lens). On shore, polyacrylamide gels were reimaged by Group 1 under transmitted light at two higher magnifications (pixel sizes 1.02 and 2.54 μm) using an Olympus IX83 inverted light microscope with automated stage and microscope control. The high magnification imaging was not done for the cryogel samples. Full details of particle size distribution calculations are given in Appendix D. Briefly, for the polyacrylamide gels (Group 1), fields of view were either selected (at high magnification) or manually cropped (at low magnification) to avoid swimmers, the background was removed, and particles were identified and sized using the Matlab 'bwconncomp' function (The Mathworks, Inc.). The cryogel samples were analysed by Group 2: the background was removed using a Gaussian filter, and the particles were identified and sized using the 'Matlab Image Processing Tool Box' (The Mathworks, Inc.).
At each magnification, particles were binned into log-spaced size bins, particle counts in the process blanks were subtracted, and counting uncertainty (±2.5 particles per size bin) was computed. Particle number fluxes as a function of size (units N μm⁻¹ m⁻² d⁻¹) were computed at each magnification by normalizing particle counts to the imaged area and the deployment length. PELAGRA particle counts imaged at low resolution were additionally normalized to the jar:trap opening area ratio (5.87) because the counts included the jar edges, which were not directly under the trap opening. High resolution counts, which were conducted on images from directly under the trap opening, were not normalized in this fashion.
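The binning and normalisation steps can be illustrated with a short Python sketch. It assumes that the number flux in each bin is the blank-corrected count divided by the bin width, the imaged area and the deployment length, which gives the stated units. The function names, the clipping of negative blank-corrected counts to zero and the example values are assumptions; the additional PELAGRA jar:trap area normalisation is deliberately left out because its exact form is given only in Appendix D.

```python
def log_bin_edges(d_min_um, d_max_um, n_bins):
    """Edges of n_bins logarithmically spaced size bins (um)."""
    ratio = (d_max_um / d_min_um) ** (1.0 / n_bins)
    return [d_min_um * ratio ** i for i in range(n_bins + 1)]


def number_flux(counts, blank_counts, bin_edges_um, imaged_area_m2, days):
    """Particle number flux per bin in N um^-1 m^-2 d^-1."""
    fluxes = []
    for i, (n, blank) in enumerate(zip(counts, blank_counts)):
        width_um = bin_edges_um[i + 1] - bin_edges_um[i]
        net = max(n - blank, 0)  # assumed: negative net counts set to zero
        fluxes.append(net / (width_um * imaged_area_m2 * days))
    return fluxes


# Hypothetical counts in two bins over a 0.001 m2 imaged area and 2 days.
edges = log_bin_edges(10.0, 1000.0, 2)
print(number_flux([12, 3], [2, 0], edges, 0.001, 2.0))
```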
Using relative uncertainty as a guide, merged particle size distributions were created from the three magnifications (Durkin et al., 2015). A two-parameter power-law function was fit to the particle size spectrum for each sample:

$$N(D) = N(D_0)\left(\frac{D}{D_0}\right)^{-\xi}$$

where N(D) gives the number of particles as a function of particle diameter, D_0 is a reference diameter, and ξ is the power-law size distribution slope. The value of ξ is insensitive to the choice of D_0 (and therefore the value of N(D_0)) so only ξ is reported here. A Monte Carlo procedure was used to propagate the error in N(D) into ξ.
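A minimal Python sketch of such a fit is given below. It assumes a power law of the form N(D) = N(D_0)(D/D_0)^(-ξ), fitted by ordinary least squares in log-log space, and a Monte Carlo scheme that perturbs each N(D) with Gaussian noise and refits. The actual procedure is described in Appendix D; the fitting method, noise model, function names and example data here are all assumptions.

```python
import math
import random
from statistics import mean, stdev


def fit_slope(diams_um, n_flux):
    """Least-squares fit of log N = a - xi * log D; returns xi."""
    x = [math.log(d) for d in diams_um]
    y = [math.log(n) for n in n_flux]
    x_mean, y_mean = mean(x), mean(y)
    sxy = sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y))
    sxx = sum((a - x_mean) ** 2 for a in x)
    return -sxy / sxx


def monte_carlo_slope(diams_um, n_flux, n_sigma, trials=1000, seed=0):
    """Propagate the uncertainty in N(D) into xi; returns (mean, std)."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(trials):
        trial = [rng.gauss(n, s) for n, s in zip(n_flux, n_sigma)]
        # Keep only positive values so that the logarithm is defined.
        kept = [(d, n) for d, n in zip(diams_um, trial) if n > 0]
        if len(kept) >= 2:
            d_kept, n_kept = zip(*kept)
            slopes.append(fit_slope(d_kept, n_kept))
    return mean(slopes), stdev(slopes)


# Hypothetical spectrum following an exact power law with xi = 3.
diams = [10.0, 100.0, 1000.0]
spectrum = [1000.0 * (d / 10.0) ** -3 for d in diams]
print(fit_slope(diams, spectrum))
print(monte_carlo_slope(diams, spectrum, [0.1 * n for n in spectrum]))
```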
A simple estimate of the particle volume fluxes was computed assuming particles were solid spheres, which introduces bias, as the area:volume ratio of a sphere is the smallest of all possible shapes. These volume fluxes are therefore likely to be overestimates, as particles are often porous or irregularly shaped. In addition, particle mass is unlikely to scale with volume, but more likely scales with diameter raised to a power between 2 and 3 (Jackson et al., 1997).

Trap aspect ratio and Reynolds number calculations
The aspect ratio and Reynolds number (R_t) were calculated as in Gardner (1985) for each sediment trap design (Table 5). We assume zero mean flow in our estimate of the turbulent velocity experienced by the neutrally buoyant traps for the R_t calculation. We calculated the turbulent motion at the trap depth based on typical dissipation rates of 10⁻⁸ to 10⁻¹⁰ m² s⁻³ for the upper 1000 m, following Estapa et al. (2017) and Osborn (1980). We assume that neutrally buoyant drifting traps were Lagrangian at length scales longer than 1 m. Estimated turbulent velocities ranged from 10⁻³ to 10⁻⁴ m s⁻¹. R_t was also estimated using relative median horizontal current velocity data from 350 m on the STT for the wide and narrow tubes.
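The order of magnitude of these estimates can be reproduced with a short Python sketch. It assumes that R_t is the product of the flow velocity and the trap mouth diameter divided by the kinematic viscosity, and that the turbulent velocity at a length scale L scales as (εL)^(1/3). The kinematic viscosity of 1.3 × 10⁻⁶ m² s⁻¹, the function names and the choice of example inputs are assumptions rather than values taken from Table 5.

```python
import math


def tube_diameter(area_m2):
    """Diameter (m) of a circular opening with the given area."""
    return math.sqrt(4.0 * area_m2 / math.pi)


def turbulent_velocity(dissipation_m2_s3, length_m=1.0):
    """Velocity scale (m s^-1) at length_m for a given dissipation rate."""
    return (dissipation_m2_s3 * length_m) ** (1.0 / 3.0)


def trap_reynolds(velocity_m_s, diameter_m, nu_m2_s=1.3e-6):
    """Trap Reynolds number; nu is an assumed seawater viscosity."""
    return velocity_m_s * diameter_m / nu_m2_s


d_wide = tube_diameter(0.0113)      # wide tube collection area from the text
u_turb = turbulent_velocity(1e-8)   # upper dissipation rate from the text
print(d_wide, u_turb)
print(trap_reynolds(u_turb, d_wide))   # neutrally buoyant case
print(trap_reynolds(0.13, d_wide))     # STT, D2 median horizontal velocity
```

With a 1 m length scale, dissipation rates of 10⁻⁸ and 10⁻¹⁰ m² s⁻³ give velocity scales of roughly 2 × 10⁻³ and 5 × 10⁻⁴ m s⁻¹, in line with the range quoted above.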

Field setting
The satellite-derived chlorophyll a concentration at the PAP-SO Site peaked in April 2017, with a maximum value of 1.6 µg L⁻¹. At 30 m, the depth of the PAP-SO mooring sensor frame, the mean chlorophyll a concentration was 2.1 µg L⁻¹ during the cruise (Fig. 3). The two sediment trap deployments occurred in a changing bloom environment. From mid to late April there was a deepening of the mixed layer from 10 m to 100 m (Fig. 3). The primary production zone was more stable during the cruise with a mean depth of 53.8 ± 14.7 m (Fig. 3).
An anticyclonic eddy was observed north of the PAP-SO site during both deployment periods, with positive sea level anomalies and low velocities in the eddy core, and faster velocities on the eddy flanks, particularly during D1 with the drifting traps traversing along the eddy flanks (Fig. A1). During D2 the sediment traps travelled through and sampled from a region of lower velocities and with minimal influence from the nearby eddy feature. The velocities on the eddy flank were weaker and more localised, away from the drifting sediment traps, during D2.

Methodological variability
To work towards establishing a standard metric for the use of sediment traps to measure particle flux in the upper mesopelagic, we explored the origin of any observed variability between the differing designs. Furthermore, to understand the origin of the observed differences we need to quantify the sources of variability: among different sample splits, between the Group 1 and 2 methodologies, and between different PELAGRA cups.
An intercomparison between replicate sediment trap splits analyzed by Group 1 was undertaken to determine the variability due to the analytical methods. The split to split mean variability for mass, TC, POC, BSi and 234 Th fluxes for all trap designs was ≤15% (further details in Table 3 and Appendix C). PIC fluxes exhibited greater split to split variability, with a mean relative standard deviation (RSD) of 19%, likely due to comparatively low PIC fluxes. The STT deployed with narrow tubes, and thus smaller sample sizes, exhibited the greatest split to split variability for C and 234 Th fluxes, whilst the NBST had the greatest variability for BSi fluxes, and the PELAGRAs exhibited the greatest variability for mass fluxes. A comparison to quantify the variability arising from different sample splits was not carried out for the Group 2 samples.
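For reference, a relative standard deviation of the kind reported here can be computed as in the short Python sketch below. It assumes the sample (n − 1) standard deviation, which the text does not specify, and the replicate values are hypothetical.

```python
import statistics


def relative_std_dev(values):
    """Relative standard deviation (%) of replicate measurements.

    Uses the sample (n - 1) standard deviation; that choice is an
    assumption of this sketch.
    """
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("RSD is undefined for a zero mean")
    return 100.0 * statistics.stdev(values) / mean


# Hypothetical flux values from three splits of one trap sample.
splits = [9.0, 10.0, 11.0]
print(relative_std_dev(splits))
```

Because the mean is in the denominator, the same absolute scatter gives a larger RSD when fluxes are low, consistent with the higher variability noted for PIC.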
Replicate splits analyzed independently by laboratory Group 1 and Group 2 were compared in order to determine variability due to different processing methods. TC and POC fluxes measured for the laboratory intercomparison had strong positive significant relationships (Appendix C). The TC and POC fluxes fall close to a 1:1 line whilst the PIC fluxes were greater from the Group 2 calculated values compared to the Group 1 measurements. This was likely due to differences in the digest of PIC/POC in the respective methods, with Group 1 measuring TC and PIC, whilst Group 2 measured TC and POC. Mass fluxes measured independently by Group 1 and 2 on replicate splits had a weak-to-moderate positive significant linear relationship (Fig. C1), with greater variability from the STT samples weakening the correlation.
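A general property of quantities obtained by difference is worth keeping in mind alongside this comparison. The sketch below assumes independent uncertainties combined in quadrature and uses hypothetical numbers; it is not a reconstruction of either group's calculation and does not by itself explain the observed offset.

```python
import math


def by_difference(total, part, total_unc, part_unc):
    """Return (total - part) and its uncertainty.

    Assumes the two measurements are independent, so their absolute
    uncertainties add in quadrature.
    """
    value = total - part
    uncertainty = math.hypot(total_unc, part_unc)
    return value, uncertainty


# Hypothetical fluxes: TC = 10.0 +/- 0.5 and POC = 9.5 +/- 0.5.
pic, pic_unc = by_difference(10.0, 9.5, 0.5, 0.5)
print(pic, pic_unc, 100.0 * pic_unc / pic)
```

When the component obtained by difference is a small fraction of the total, as PIC is here, its relative uncertainty can exceed 100% even if TC and POC are each known to about 5%.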
A third possible source of variability arises from intra-platform heterogeneity, between PELAGRA cups. The tubes used for the NBST and STT were combined and then split in an attempt to reduce this heterogeneity, whereas PELAGRA cups are usually analysed individually. PELAGRA intercup variability had a mean RSD of 45% for mass flux, 25% for TC flux, 79% for PIC flux and 23% for POC flux and there was considerable variability among the different PELAGRAs during the same deployment, particularly for mass and PIC flux, the latter of which was amplified by very low measured fluxes (Table 3). PELAGRA fluxes should still allow for the determination of whether trap design or sample heterogeneity was a larger control on export flux than differences in methodology.
To allow for a robust intercomparison below (Section 3.4.1), we have used the data determined following laboratory Group 1 methods, as the measurements had the most replicates and will only be affected by the relatively small variability introduced by split to split heterogeneity.

Figure caption: 234 Th fluxes (dpm m −2 d −1 ) sampled at the beginning of each deployment (early) and at the end of each deployment (late) down to 350 m (error bars show uncertainty). The 234 Th fluxes measured using the different drifting trap designs are shown by the markers: green circles = NBST, blue squares = STT, black circles = P1 and P2 and red circles = P t-s .

Spatial and temporal variability
Spatial and temporal variability in the upper ocean may also affect the sediment trap intercomparison, alongside the methodological variability addressed previously. The depth-integrated 234 Th flux was measured from two sets of 3 rosette casts collected in a 10 km triangle around the trap locations, at the start and end of the deployments ("Early" and "Late" respectively; Figs. 1 and 4). Spatial variability in 234 Th flux was smallest among the D1 "Late" casts, and largest (>1500 dpm m −2 d −1 ) among the D2 "Late" casts (Fig. 4). Variability was observed between the early and late sets of casts and between the two deployments; however, there was no consistent increase or decrease in predicted 234 Th fluxes over the sampling period. Overall, the sampling occurred after the peak in chlorophyll a and bloom biomass (Fig. 3), and thus the water column 234 Th fluxes may overestimate the 234 Th flux at the time of the trap sampling, given that the local 234 Th activity reflects processes that occurred days to weeks prior to the trap deployment. In periods of decreasing flux, 234 Th may overestimate particle flux if this is not considered (Ceballos-Romero et al., 2018). Spatial heterogeneity in 234 Th may also explain the observed fluctuations over shorter timescales (Resplandy et al., 2012). Sediment trap 234 Th fluxes also exhibited variability and aligned most closely with the lowest integrated 234 Th fluxes in both deployments (Fig. 5c). However, lacking a more complete 3D time-series, we assume that the integrated 234 Th flux, averaged spatially over the sets of 3 casts collected in 10 km triangles around the traps, informs us about changes in flux magnitude derived from 234 Th in the water column relative to the trap 234 Th fluxes.
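As a reminder of how a depth-integrated 234 Th flux is commonly obtained, the sketch below assumes the standard one-dimensional steady-state formulation, in which the flux equals the 234 Th decay constant multiplied by the depth integral of the 238 U minus 234 Th activity deficit, with advection and diffusion neglected. The function name, the trapezoidal integration and the example profile are assumptions for illustration.

```python
import math

TH234_HALF_LIFE_DAYS = 24.1
DECAY_CONSTANT = math.log(2) / TH234_HALF_LIFE_DAYS  # d^-1


def steady_state_th234_flux(depths_m, u238_dpm_per_l, th234_dpm_per_l):
    """234Th flux (dpm m^-2 d^-1) at the deepest sampled depth.

    Integrates the 238U - 234Th deficit between the first and last
    sample depths with the trapezoidal rule (steady state, no physical
    transport terms; both are assumptions of this sketch).
    """
    # Convert the deficit from dpm L^-1 to dpm m^-3.
    deficit = [(u - th) * 1000.0
               for u, th in zip(u238_dpm_per_l, th234_dpm_per_l)]
    integral = 0.0
    for i in range(1, len(depths_m)):
        dz = depths_m[i] - depths_m[i - 1]
        integral += 0.5 * (deficit[i] + deficit[i - 1]) * dz
    return DECAY_CONSTANT * integral


# Hypothetical profile with a uniform 0.4 dpm L^-1 deficit over 100 m.
print(steady_state_th234_flux([0.0, 100.0], [2.4, 2.4], [2.0, 2.0]))
```

Because the integral accumulates the deficit over the several-week mean life of 234 Th, the result reflects export over the preceding days to weeks rather than over the trap deployment alone.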

Sediment trap fluxes
To determine how effectively the different drifting sediment trap designs sample the in situ particle flux we evaluated samples analyzed by Group 1, as described above (Section 3.2). Mass fluxes sampled by wide tubes on the STT were greater than those from the NBST and from the narrow tubes (Fig. 5a). PELAGRA fluxes were often lower by more than 50% compared to the cylindrical (NBST and STT) trap fluxes except for the second samples ("Late", Fig. 5a) collected by the time series PELAGRA, P t-s . BSi fluxes were larger for the NBST and STT, with lower total PELAGRA fluxes, whilst the P t-s late fluxes were more comparable (Fig. 5b). The BSi fluxes were consistently lower for narrow tubes deployed on the STT compared to the wide tubes, similar to the mass fluxes. D1 234 Th fluxes at 200 m agreed very closely regardless of trap design, and the D1 STT fluxes at 350 m also agreed well, whilst the PELAGRA fluxes were 50% smaller in magnitude (Fig. 5c). The D1 234 Th fluxes from the late CTD samples agreed well with the trap fluxes whilst the early CTD 234 Th fluxes were somewhat larger. D2 STT 234 Th fluxes at 350 m were similar to D1 and were substantially greater than the NBST and PELAGRA fluxes. The D2 234 Th fluxes from the late CTD samples were >1000 dpm m −2 d −1 larger than the trap fluxes whilst the early CTD fluxes agreed well with the STT trap 234 Th fluxes (Fig. 5c). The mean PELAGRA 234 Th fluxes were 69% lower than the mean of cylindrical fluxes at 350 m for D2.
TC and POC fluxes observed during both deployments for the NBST, STT and P t-s late agreed well, whilst the PELAGRA "early" and "total" fluxes were lower (Fig. 6a-b). Split to split variability for mass, BSi, TC, POC and 234 Th fluxes was ≤15%, so a >50% difference between cylindrical and conical traps exceeds the expected variability (Table 3). The PIC fluxes were small in magnitude compared to POC fluxes (Fig. 6c). The STT consistently had greater PIC fluxes than the other trap designs, which suggests the sampled material was enriched in PIC. The NBST had larger fluxes than the PELAGRA samples in D1 and D2, except for the P t-s late PIC flux in D2. The PIC split to split variability was greater than for all other flux components, with a mean RSD of 19%. Therefore, it is more difficult to determine whether the large observed differences in PIC arise from methodological variability, split to split variability, and/or differences in trap design. There appeared to be no consistent differences between narrow and wide tubes deployed on the STT for carbon fluxes.

Flux composition
The TC: 234 Th and POC: 234 Th ratios in D1 for in situ pumps and sediment traps were broadly similar for both depths and within error, except for P t-s , which had slightly greater ratios (Fig. 7a and b). There was minimal difference between the in situ pump ratios for large (>51 µm) and small (1-51 µm) particle fractions. The TC: 234 Th and POC: 234 Th ratios were more variable for D2, with the STT material enriched in 234 Th compared to TC and POC and therefore showing a lower ratio, whilst some PELAGRAs were slightly depleted in 234 Th compared to the in situ pumps. In contrast, the PIC: 234 Th ratio exhibited greater variability, particularly during D1 (Fig. 7c). The PIC: 234 Th ratio for the 1-51 µm particle fraction collected by the in situ pumps was lower than for particles >51 µm and was considerably lower than the sediment trap ratios during D1. STT PIC: 234 Th ratios tended to be higher.

Particle size distribution
Differences in the sinking particle size distributions (PSDs) collected by the different platforms were compared in terms of number fluxes, particle size distribution slope, and volume fluxes of equivalent spherical particles (Section 2.4 and Appendix D). The size spectra of particles collected in cylindrical traps generally agreed regardless of platform type, while PELAGRA traps, with funnels removed, collected relatively fewer small particles (on the order of tens of microns) and more large particles (>100 µm; Fig. 8). The power-law slopes of the particle size distributions were flatter (lower) for PELAGRA than for cylindrical traps (Figs. 9, D1 and D3). PELAGRA traps carried both polyacrylamide (Group 1) and cryogel (Group 2) collectors, and differences between the PSD slopes determined from the different collector types and analysed by the different laboratories were consistent with cup-to-cup variability observed in other analytes on the PELAGRA traps. The implications of the PSDs for the interpretation of bulk fluxes are discussed in Section 4.1.4.
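A power-law slope of this kind can be estimated by ordinary least squares in log-log space, as in the minimal Python sketch below. The fitting procedure actually used is described in Appendix D and may differ; the function name and the synthetic spectrum are assumptions for illustration.

```python
import math


def psd_slope(diameters_um, number_flux_spectrum):
    """Least-squares slope of log10(number flux) against log10(diameter).

    For a spectrum n(d) proportional to d**(-k) the returned value is -k,
    so a flatter distribution has a slope closer to zero.
    """
    xs = [math.log10(d) for d in diameters_um]
    ys = [math.log10(n) for n in number_flux_spectrum]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic spectrum following n(d) = 1e9 * d**-3 (hypothetical values).
diameters = [10.0, 100.0, 1000.0]
spectrum = [1e6, 1e3, 1.0]
print(psd_slope(diameters, spectrum))
```

A collector that misses small particles or gains large ones would return a slope of smaller magnitude than this synthetic case, which corresponds to the flatter PELAGRA distributions described above.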

234 Th-derived POC fluxes
The 234 Th-derived POC fluxes (Table 4) exhibited temporal variability between the measurements made early and late in the deployments, even during short (<3 day) time periods. The time-averaged 234 Th-derived POC fluxes during D1 compared fairly well with cylindrical trap POC fluxes at 200 m, and agreed with the traps within uncertainty estimates at 350 m. During D2 the 234 Th-derived POC flux was greater than the trap estimates (broadly twice as large) but still within error of the cylindrical trap POC flux, while the conical trap POC flux mean was considerably lower.

Sediment trap sampling
The aim of this study was to determine whether different drifting sediment trap designs, methodologies and laboratory processing lead to differences in carbon export estimates. Firstly, to discern whether differences in observed fluxes are due to differing trap designs, we examine whether the sampled fluxes are expected to agree. The deployments were carried out during a dynamic bloom environment with moderate sinking particle fluxes. Water column 234 Th fluxes exhibited high relative variability on short temporal and spatial scales between the start and end of the 3 day deployment periods. The errors on the 234 Th flux increase with depth, and in part because of the small Th-U disequilibria, these errors are considerable at 200 m and increase by
C.A. Baker, et al. Progress in Oceanography 184 (2020) 102317
traps travelled in the same direction and were recovered in similar locations (Fig. 1b and c), likely driven by the currents on the flanks of the observed anticyclonic eddy. All sediment traps were deployed for similar timeframes, except for the P t-s , which had two shorter deployment periods of ~1 day, which may highlight temporal variability in flux.
One of the caveats of using drifting sediment traps is that it is not possible to control the path they traverse. Furthermore, even if the traps do have a similar travel path, this does not ensure that the sediment traps sampled the same particle source regions in the upper ocean, particularly with the heterogeneous nature and velocity shear within the upper ocean (Martin, 2005; Siegel et al., 2008). These factors combined with the small Th-U disequilibria may have been driving the temporal and spatial variability in 234 Th cumulative fluxes. However, due to all the currents moving west/northwest and the short deployment times we are confident that this has not impacted on the flux data or interpretations.
Differences in fluxes when comparing sediment trap data may be due to differences in methodology, which we have standardised in this study. This included standardising swimmer handling and measuring the contribution of <350 µm swimmers to POC fluxes, which was small (see Appendix C). Overall the methodology appeared to make minimal difference, particularly for mass and POC fluxes, with PIC differing depending on whether it was measured directly or as the difference between TC and POC. More generally, inter-cup variability within a single PELAGRA deployment was more likely to have a greater impact. By quantifying sources of variability, we are confident that we have identified differences in flux that are due to variability between the sediment trap designs and not artefacts of other processes.

Flux magnitude
[Figure caption fragment] Table 2 for trap configurations. These data were the basis for the power-law particle size distribution slopes shown in Fig. D3.
Fig. 9. Slope of power-law flux size distribution of particles for Group 1 gel traps during (a) Deployment 1 and (b) Deployment 2. Error bars show effect on particle size distribution (PSD) slope of propagated uncertainty in particle number fluxes.
Table 4. Estimated POC flux (mmol m −2 d −1 ) from predicted 234 Th fluxes and in situ pump POC: 234 Th ratios. The maximum value of either the standard deviation between fluxes or the analytical uncertainty is presented in the brackets. The mean POC fluxes from the cylindrical (NBST and STT) and conical (PELAGRA) traps at the respective depths are shown for comparison.

The magnitudes of mass, carbon and biogenic silica fluxes were fairly consistent. The overarching trend was that the STT collected the largest fluxes, the NBST fluxes were somewhat lower than the STT, and PELAGRA fluxes were considerably lower, by more than the expected split to split variability. Surface tethered traps have been found to over-collect in the upper ocean, likely due to tilt of the trap funnels caused by horizontal flows (Buesseler et al., 2007; Gardner, 1980). In this study, cylindrical trap collectors, used on the STT and NBST, appear to sample fluxes that are more representative of in situ fluxes when compared to predicted 234 Th fluxes. The PELAGRA fluxes, except for the "late" P t-s observations, were often at least 50% lower than the NBST and STT fluxes, indicating that there are clear discrepancies in the magnitude of the collected fluxes (Figs. 5 and 6). 234 Th fluxes were measured from water column profiles and sediment traps to allow for an independent estimate of how well the traps represent the true in situ particle flux. Driven by a relatively small disequilibrium at 350 m there was short-term temporal variability in the predicted 234 Th fluxes from the water column profiles at the start and end of the deployments (Fig. 4). The NBST and STT 234 Th fluxes agree more closely with the predicted Th fluxes from the water column samples, whereas PELAGRA 234 Th fluxes were lower in all cases (Fig. 5c).
The differences in the predicted versus trap 234 Th fluxes are likely to have been influenced somewhat by the observed spatial variability in the predicted 234 Th fluxes, and by the limitations of using a steady state model during a dynamic period of increasing/decreasing chlorophyll a concentration (Ceballos-Romero et al., 2018; Resplandy et al., 2012). The differences in magnitude between the PELAGRA fluxes and the NBST, STT and predicted Th fluxes do appear to be consistent and so we are confident they are not an artefact of the observed variability.
The 234 Th-derived POC flux was estimated using POC: 234 Th ratios from the in situ pumps as in Ceballos-Romero et al. (2016) and the predicted 234 Th fluxes to provide an independent flux estimate (Table 4). The 234 Th-derived POC fluxes in Table 4 were broadly within error and within an order of magnitude of the sediment trap POC fluxes, although the uncertainty was large due to the cumulative errors associated with the integrated 234 Th fluxes at the trap depths (relative error ranged between 47 and 156%, with an average of 95% at 200 m (n = 2) and 84% at 350 m (n = 4)). Some variability was expected due to the short deployment periods (<3 days), and the timescale of fluxes measured by 234 Th mentioned above, although as demonstrated by Ceballos-Romero et al. (2018), there is a 'window of success' in which 234 Th fluxes remain similar near in time to the peak in particle flux. However, PELAGRA fluxes in D2 were lower than the 234 Th-derived POC flux, which is consistent with other measured flux components collected by the platform.
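The estimate described above is a product of two terms, which the following Python sketch illustrates. It assumes the ratio is expressed in µmol dpm −1 and that relative uncertainties combine in quadrature; Table 4 instead reports the larger of the standard deviation between fluxes and the analytical uncertainty, so the error treatment here is an assumption, as are the function name and the input values.

```python
import math


def th_derived_poc_flux(th_flux, th_flux_unc, poc_th_ratio, ratio_unc):
    """POC flux (mmol m^-2 d^-1) from a 234Th flux and a POC:234Th ratio.

    th_flux      : 234Th flux in dpm m^-2 d^-1
    poc_th_ratio : POC:234Th ratio in µmol dpm^-1 (unit assumed here)
    Relative uncertainties are combined in quadrature, which assumes
    the two terms are independent.
    """
    flux = th_flux * poc_th_ratio / 1000.0  # µmol to mmol
    relative_unc = math.hypot(th_flux_unc / th_flux, ratio_unc / poc_th_ratio)
    return flux, flux * relative_unc


# Hypothetical inputs: 1000 +/- 500 dpm m^-2 d^-1 and 5.0 +/- 0.5 µmol dpm^-1.
flux, unc = th_derived_poc_flux(1000.0, 500.0, 5.0, 0.5)
print(flux, unc)
```

In this example the 50% relative error on the 234 Th flux dominates the result, in line with the statement that the large uncertainties stem mainly from the integrated 234 Th fluxes.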
Previous comparisons between drifting trap designs and other export estimates have reported inconsistent findings depending on timing, region and instrumentation. One study found differences in particle fluxes sampled using NBST and PIT traps, even with collectors of identical aspect ratio (Buesseler et al., 2000), and another study found differences between PIT trap and indented rotating sphere trap particle fluxes and 234 Th-derived fluxes, with the latter two methods agreeing well when using a steady state model for 234 Th fluxes (Buesseler et al., 1995; Hernes et al., 2001; Murray et al., 1996). Several other studies have generally observed larger 234 Th-derived fluxes compared to drifting sediment trap fluxes (Benitez-Nelson et al., 2001; Haskell et al., 2013, 2016; Hung et al., 2010). In contrast, other studies found good agreement between tethered cylindrical drifting traps and 234 Th-derived fluxes in the Arctic Ocean, with sediment trap collection efficiencies of 70-100% (Coppola et al., 2002), in the Southern Ocean with fluxes within a factor of 1.4 (Roca-Martí et al., 2017), in the oligotrophic Northern Gulf of Mexico (Maiti et al., 2016) and at the California Current Ecosystem research station (Stukel et al., 2019).
Three studies have compared PELAGRA fluxes with 234 Th fluxes, predominantly in post-bloom environments, or periods of high variability in export, in which the longer term average from radionuclide-derived fluxes would not be expected to be similar to the sediment trap fluxes collected during short deployments (Ceballos-Romero et al., 2016; Lampitt et al., 2008; Le Moigne et al., 2013). North Atlantic summer cruises observed integrated 234 Th fluxes up to three times greater than PELAGRA 234 Th fluxes, and the PELAGRA POC: 234 Th ratios were generally larger than the in situ pump ratios (Lampitt et al., 2008). In a further North Atlantic study PELAGRA POC fluxes (1-2 mmol m −2 d −1 ) were observed to be much lower than 234 Th-derived POC fluxes (9-11 mmol m −2 d −1 ; Ceballos-Romero et al., 2016). The differences between the fluxes were attributed to the different integration times of the methods and the stage of the bloom during sampling (Ceballos-Romero et al., 2016); however, samples from two cruises were collected in pre-bloom and bloom environments, in which sediment trap samples would be more likely to estimate similar fluxes to predicted Th fluxes (Ceballos-Romero et al., 2018).
When considering the discrepancies in flux magnitude of previous work, the 234 Th-derived fluxes from the NBST and STT align remarkably well for this study, whilst the PELAGRA fluxes were lower than expected. One further possibility, supported by the larger P t-s 'late' fluxes agreeing more closely with the cylindrical fluxes, is that PELAGRAs may experience a lag in particles sinking into the PELAGRA cups as they slide down the funnel walls. This lag could allow for (1) further particle transformations and degradation to occur, by zooplankton and bacteria feeding, before the particles reach the sample cups, resulting in a negative bias, and/or (2) simply a delay in arrival time to the cup, in which case the flux in the first cup might be lower than in later cups, due to the late arrival of material during its collection period. Either of these issues might explain differences in 234 Th fluxes and the smaller magnitude of PELAGRA fluxes. The observed differences in the Ceballos-Romero et al. (2016) study, combined with our findings, suggest that discrepancies in fluxes may arise from undercollection, or delayed collection, by PELAGRAs, as well as from differences driven by high export variability or bloom phase (Ceballos-Romero et al., 2018).

Flux composition
The composition of the particle flux material collected was similar, whilst the magnitude of fluxes collected by different drifting trap designs exhibited consistent variability. However, this does not mean that the collected material was similar in composition to the in situ sinking flux (Buesseler et al., 2007). The TC: 234 Th and POC: 234 Th in situ pump ratios were very similar for both particle fractions, and compared well with the trap ratios, although some PELAGRA samples were depleted in 234 Th with higher C/Th ratios. In contrast, the PIC: 234 Th in situ pump ratios had a clear size fractionation, with the small particle fraction having a lower ratio, and the large particle fraction agreeing well with sediment trap ratios. Therefore, the sediment traps appear to collect particles similar in composition to the large particle size fraction of water column samples, with PELAGRA samples showing greater variability in the composition.

Flux particle size
The particle size distribution data indicated that PELAGRA sediment traps appeared to collect fewer small particles and more large particles than the cylindrical NBSTs and STTs, under the deployment conditions. However, the PELAGRAs were deployed without the conical funnels above the sample cups and so we cannot use this information to assess how different collector shapes affect particle size. The differences in number flux of particles appear small (Fig. 8) but these differences lead to large differences in inferred particle volume shown in Fig. D2, albeit with the assumption that particles are solid spheres. The NBST and STT volume fluxes exhibited a distribution that was expected when considering that particles are a continuum of sizes, rather than of discrete size classes, as often determined by observations and used in models (Kriest & Evans, 1999).
The PELAGRA gel cups were deployed with no funnel above the cup. Therefore, it is possible that small particles likely to have been sinking slowly were continually moved with the small scale (<1 m) turbulent motions above the gel collector, and hence were collected less efficiently, compared to the other trap designs. The PELAGRAs sampled less material overall when the conical funnel was used, often 50% lower fluxes than the NBST and the STT. Small, likely slowly-sinking particles are thought to be systematically undercollected by upper ocean sediment traps, especially tethered traps, and the important contribution of slow sinking material to particle flux is often overlooked in sediment trap work (Baker et al., 2017; Buesseler et al., 2007; Durkin et al., 2015).
The differences in flux magnitude observed for the PELAGRAs, when the conical collector was used, compared to the cylindrical trap fluxes, suggest that the PELAGRAs were undercollecting in this study compared to different designs. Additionally, the PELAGRA samples had lower 234 Th fluxes, which may be indicative of sampling fewer small, slowly sinking particles. As the gel traps were deployed without a conical funnel, we cannot confidently comment on whether observed differences in particle size distribution are related to compositional differences and lower flux magnitudes observed with the funnel. Further deployments with a suite of different configurations, such as deploying PELAGRAs with the wide cylinders, would be needed to fully determine the effect of the conical funnel versus cylindrical collector on sampled particle fluxes. Possible drivers of the differences in sampled fluxes are discussed further in Section 4.2.

Factors affecting collection efficiency
The particle size distributions indicate that PELAGRAs collected fewer small particles and more large particles without the trap funnels installed, and significantly lower fluxes overall with the trap funnels installed. Modified flow fields around PELAGRAs versus NBSTs and the STT could drive these differences and would vary for PELAGRAs deployed with and without the conical funnels. For example, slowly-sinking, likely smaller, particles may have been collected less efficiently by the gel traps due to the particles continually moving with the turbulent flow above the gel collector. If undercollection of small particles also occurs with, or is exacerbated by, the use of the conical funnel then this may explain the differences in flux magnitude.
PELAGRAs were designed to be Lagrangian and have collecting funnels, of a broadly conical shape and a circular sector opening, to reduce issues relating to collection efficiency (Lampitt et al., 2008). It was assumed that horizontal flow across the top of the PELAGRA funnels would be negligible due to the neutral buoyancy of the Lagrangian traps (Buesseler et al., 2000; Gardner, 1980; Knauer et al., 1979; Salter et al., 2007). However, neutrally buoyant drifting traps only achieve quasi-Lagrangian drift, which should reduce the effect of horizontal flows at >1 m length scales (D'Asaro, 2003) on gravitationally settling particles, but is unlikely to eliminate it at smaller scales.
The major design difference of upper-ocean drifting traps is the shape of the collector, usually cones versus cylinders, and there has been much debate about how this affects the collection efficiency and composition of sampled particle flux (Buesseler et al., 2007). Theoretical calculations, laboratory flume experiments and field studies suggest that as the collector's aspect ratio (height:diameter ratio) decreases and the trap Reynolds number (R t ) increases, the effects of hydrodynamic bias will increase in importance (Baker et al., 1988; Buesseler et al., 2007; Butman, 1986; Gardner, 1985, 2000). As aspect ratio decreases the likelihood of washout increases; therefore, aspect ratios above 3 are preferable and those greater than 5 are more likely to provide good collection efficiencies (Buesseler et al., 2007). The aspect ratios of the different trap collectors were calculated and compared in Table 5. The narrow tubes have the highest aspect ratio whilst the P t-s funnels have the lowest among the tube and funnel designs. The aspect ratios of both the NBST and PITS collectors are above 5, which is considered ideal for good collection, whilst those of the PELAGRA collection funnels are below 5 (Buesseler et al., 2007). For PELAGRAs deployed without funnels, the aspect ratio of the gel cups is 0.33, which is lower than that of the conical funnel (aspect ratio of ~2).
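The aspect-ratio guideline can be expressed as a simple check. The sketch below uses the thresholds of 3 and 5 attributed above to Buesseler et al. (2007) and aspect ratios quoted in this section; the function names and the wording of the categories are assumptions for illustration.

```python
def aspect_ratio(height, diameter):
    """Collector aspect ratio: height divided by diameter (same units)."""
    if diameter <= 0:
        raise ValueError("diameter must be positive")
    return height / diameter


def washout_guideline(ratio):
    """Place an aspect ratio in the bands described in the text."""
    if ratio > 5.0:
        return "above 5: good collection efficiency more likely"
    if ratio > 3.0:
        return "above 3: preferable"
    return "3 or below: washout more likely"


# Aspect ratios quoted in this section of the text.
quoted = {
    "PELAGRA gel cup": 0.33,
    "PELAGRA conical funnel (approx.)": 2.0,
    "wide tube": 5.51,
    "narrow tube": 8.39,
}
for name, ratio in quoted.items():
    print(name, ratio, washout_guideline(ratio))
```

Both cylindrical tube types fall in the highest band, while the PELAGRA funnel and gel cup fall in the lowest, which is the contrast the discussion draws on.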
The collection areas of sediment traps, such as the PELAGRA sample jars and base of the cylindrical traps, should function similarly to the 'tranquil zone' defined by Gardner (1985) for moored traps. The tranquil zone is the area at the bottom of a sediment trap where particles, especially slow-sinking particles, are no longer affected by turbulence and hence are unlikely to be re-suspended out of the sediment trap collector. The tranquil zones of the wide and narrow tubes have larger aspect ratios, 5.51 and 8.39 respectively, and the collection cylinders were deployed with a honeycomb baffle to further reduce hydrodynamic issues and reduce washout. The PELAGRA sample jars are wider than the opening at the bottom of the collection funnel, which could impact the stability of the tranquil zone if complex streamlines of flow arise. For the gel cup deployments, with no funnel above the cup, the tranquil zone above the gel may not have been maintained with the possibility of small scale turbulent flows. For this reason, the low aspect ratio of the PELAGRA gel cups, the lowest of the collectors compared, may be a factor in the observed lack of smaller particles in the PELAGRA gels relative to the wide cylinders. As R t increases, the collection efficiency of sediment traps is thought to decrease, as discussed in detail and demonstrated in Fig. 2.2 of Buesseler et al. (2007). R t values were computed using the estimated turbulent velocities and relative current velocity from 350 m on the STT (Table 5). The values based on turbulent velocities were very low, which supports the assumption that horizontal flows and turbulent flows around the Lagrangian traps should have a minimal influence on the collection of particles. However, the R t is likely an oversimplification of the true nature of flows around Lagrangian traps, which may be affected by other factors such as tilt and changes in buoyancy. This is supported by the R t values calculated using the relative horizontal current velocity from the STT, which ranged between 3414 and 12 732 (Table 5).

Table 5
Dimensions, aspect ratios and estimated Reynolds (R t ) number for deployments 1 and 2 at 350 m. The estimated turbulent velocities, detailed in Section 2.2, and the kinematic viscosity values, which ranged minimally around 1.298 × 10 −6 m 2 s −1 , were used to calculate the lower and upper limit of R t .
R t entries: 10-98; D1 - 5 867**; D2 - 12 723**; 6-57; D1 - 3 414**; D2 - 7 403**
* The PELAGRA collection funnel opening has been assumed to be spherical when using the area to calculate the diameter but it is similar to a circular sector.
** R t calculated from the median relative velocities measured by a current meter below the STT tubes at 350 m for D1 and D2.

Study implications and recommendations
Drifting sediment traps are commonly used to measure carbon export and have been used previously to attempt to balance the carbon budget in the mesopelagic zone (Steinberg et al., 2008). The sinking POC flux is the major contributor to the carbon budget in the mesopelagic zone, and this study highlights that, depending on the drifting trap design used, the magnitude of the POC flux can vary considerably. Here we find generally lower fluxes in the PELAGRA traps (up to 50% for all flux components) than either a drifting or different neutrally buoyant design, and lower fluxes than predicted from 234 Th water column data for PELAGRA compared to the other traps. Prior studies have shown both positive and negative collection differences between predicted 234 Th flux and drifting trap flux (e.g. Buesseler, 1991), and general agreement between drifting traps and NBSTs, except on occasions when drifting traps are higher (Owens et al., 2013). Clearly the local conditions, both hydrodynamic variables and the quantity and quality of the sinking material, will impact any collection bias. As done here, it is also important to compare sample processing steps, and cup to cup, or split to split variability. We find, for example, differences in PIC and mass flux due to processing differences. In Owens et al. (2013) differences in processing, especially consideration of trap blanks, led to positive biases in traps that were not blank corrected. Therefore, we support the recommendation of McDonnell & Buesseler (2012) that multiple methodologies to measure carbon export should be employed in field studies, to better account for the merits and uncertainties of each sampling method and, in this case, of each sediment trap design.
PELAGRA inter-cup heterogeneity was often greater than differences originating from variability in laboratory protocols and sample handling methodology. We recommend that, for that platform, as many cups as possible be combined before analysis, similarly to NBST and STT tubes, especially if only a single measurement of the chemical components will be undertaken per trap deployment. Furthermore, when using the PELAGRAs with a time-series capability (P t-s ), both early and late samples should be analysed as fluxes can vary considerably. The differences in the 'early' and 'late' PELAGRA fluxes warrant further investigation into whether PELAGRA fluxes are impacted by a time lag before particle sampling starts. If this is an issue, one solution may be to delay the sampling cup opening to reduce the impact of the delay. In the future, for studies that focus on the magnitude of carbon export, at least two methods of quantifying export should be used. A further intercomparison is needed in which PELAGRAs are deployed with wide NBST tubes and with conical funnels above the sampling cups, as well as in varying configurations with sample cups of different aspect ratios, to fully examine where the differences in the collection efficiency originate. Ideally this needs to be done under varying conditions of flux magnitude and sinking particle composition, since the differences need not hold for all flux components, sinking rates and reactivities.

Conclusion
This study successfully compared a suite of modern methods for measuring carbon export, with a particular focus on assessing the performance of different drifting sediment trap designs. We found that whilst differences in flux sampled by the NBST and STT (cylindrical) traps were small, there were significant differences in fluxes collected by PELAGRAs (conical). The conical traps often collected fluxes less than half of the mean fluxes sampled by cylindrical traps under this set of conditions. When comparing trap fluxes to 234 Th-derived POC fluxes, in situ pump data and particle size distribution data, the NBST and STT appeared to collect fluxes representative of the magnitude of the in situ particle flux, whereas PELAGRAs consistently sampled smaller fluxes. Further work is still needed to establish whether differences between the trap fluxes were due to the collector shape, the sample cup shape or other features of the designs, and whether these differences would hold when the magnitude or type of sinking particles differs.
Recently, there have been two large projects focusing on carbon export and flux throughout the twilight zone in different localities (COMICS and EXPORTS; Siegel et al., 2016), using either one design (PELAGRA, COMICS) or two designs (drifting and NBST, EXPORTS) of sediment trap as a predominant method for sampling carbon flux. This study highlights that studies focusing on the magnitude of carbon flux may be more sensitive to the instrumentation used than studies focused on the composition of flux. Furthermore, discrepancies in methodology relating to trap design are an important issue that needs to be resolved if carbon export is to be compared across different studies, regions and times in the future, and if an improved global understanding of the processes driving and controlling carbon export is to be achieved.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Appendix C. Laboratory and methodology intercomparison
An intercomparison of replicate sediment trap splits analyzed by Group 1 was undertaken to determine the variability due only to the analytical methods. The split to split variability for mass fluxes was similar across trap designs, with a mean of 12.4% relative standard deviation (RSD) between replicate splits (Table 3). The split to split variability for TC fluxes was lowest for the NBST, with a mean RSD of 3.8%, and greatest for the STT (narrow tubes), with an RSD of 11.1%; the average RSD for all trap types was 8.0%. The PIC flux RSD indicates that split to split variability was also lowest for the NBST (11.7%) and greatest for the STT (narrow tubes, 41.7%), with a mean RSD of 18.9%. The split to split variability for the calculated POC fluxes was similar to that of the TC fluxes, with a mean RSD of 8.9%.
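The relative standard deviation used above can be illustrated with a minimal sketch. The function name, the example flux values and the choice of the sample standard deviation (n − 1 denominator) are assumptions made for illustration; the source does not state which estimator was used.

```python
import statistics


def relative_std_dev(values):
    """Return the relative standard deviation (%) of replicate measurements."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("RSD is undefined for a zero mean")
    # Sample standard deviation (n - 1 denominator); an assumption here.
    return 100.0 * statistics.stdev(values) / mean


# Hypothetical mass fluxes (mg m^-2 d^-1) from three replicate splits.
splits = [95.0, 100.0, 105.0]
rsd = relative_std_dev(splits)
```

A mean RSD per trap design would then be the average of such values over all deployments of that design.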

Table C1
Intercomparison data for directly comparing Group 1 and Group 2 methodology for TC, PIC, POC, Mass and BSi fluxes, with the analytical uncertainty in brackets and the number of replicates (n) stated. All data were collected at 350 m, except for the NBST and STT in deployment 1, for which the depth is stated in the table. In instances where the methodology requires either the POC flux (Group 1) or the PIC flux (Group 2) to be calculated, the number of replicates is replaced by 'calc'. The sample ID indicates the deployment number and which splits (A-H) were used for the measurements. Only split to split variability and differences in methodology should lead to changes in the fluxes for each trap deployment.
Replicate splits analyzed independently by Group 1 and Group 2 were compared in order to determine variability due to different processing methods (see Table C1). TC and POC fluxes measured for the laboratory intercomparison had strong, significant positive relationships (p < 0.001), with TC flux exhibiting the stronger correlation (R² of 0.91, compared with 0.88 for POC; Fig. C1). The TC and POC fluxes fall close to a 1:1 line, whilst the PIC fluxes calculated by Group 2 were greater than those measured by Group 1 and exhibited the greatest inter-laboratory variability. This was likely due to differences in the digest of PIC/POC in the respective methods, with Group 1 measuring TC and PIC, whilst Group 2 measures TC and POC. The variability between the laboratories was greatest for PIC fluxes, as they were comparatively small in magnitude, with a mean RSD for all trap designs of 63.0%, compared to 15.1% for TC fluxes and 14.7% for POC fluxes. Mass fluxes measured independently by Group 1 and Group 2 on replicate splits had a weak to moderate, significant positive linear relationship (R² of 0.439, p = 0.02; Fig. C1), with greater variability from the STT samples weakening the correlation. PELAGRA samples continually exhibited the most variability in carbon fluxes between the laboratory methodologies (Table 3).
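As an illustration of how paired inter-laboratory fluxes might be compared against a 1:1 line, the sketch below computes a least-squares slope, R² and a mean pairwise RSD. The function name and the pairwise definition of the inter-laboratory RSD are assumptions; the source does not describe how these statistics were computed.

```python
import numpy as np


def compare_labs(lab1, lab2):
    """Summarise agreement between paired flux measurements from two labs."""
    lab1 = np.asarray(lab1, dtype=float)
    lab2 = np.asarray(lab2, dtype=float)
    # Ordinary least-squares line of lab2 against lab1.
    slope, intercept = np.polyfit(lab1, lab2, 1)
    # Squared Pearson correlation coefficient.
    r2 = np.corrcoef(lab1, lab2)[0, 1] ** 2
    # RSD of each pair of measurements (assumed definition), then the mean.
    pairs = np.vstack([lab1, lab2])
    pair_rsd = 100.0 * pairs.std(axis=0, ddof=1) / pairs.mean(axis=0)
    return {"slope": slope, "intercept": intercept,
            "r2": r2, "mean_rsd": pair_rsd.mean()}
```

Perfect agreement gives a slope of 1, an intercept of 0 and a mean RSD of 0; a slope well away from 1 with a high R² would point to a systematic rather than random difference between methods.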
Mass flux was measured in two different ways by laboratory Group 2 for the PELAGRA samples: by weighing two GF/Fs for C/N analysis and by using a Nucleopore filter for BSi analysis (Fig. C2). When comparing the Group 2 mass fluxes against Group 1 fluxes it was apparent that the PELAGRA cups, as supported by the RSD in Table 3, exhibit inter-cup variability with no consistent trend, but also that the material in the cups was likely heterogeneous, meaning the split to split variability may be driving some of the differences. P t-s samples were recorded as having heterogeneous composition in laboratory processing notes, and P t-s appears to continually exhibit greater variability across all fluxes compared to total PELAGRA fluxes, but also sampled greater fluxes than the total PELAGRA fluxes. PELAGRA samples also continually exhibited the most variability in carbon fluxes between the laboratory group methodologies (RSD from 13.6% to 84.7%, Table 3); however, it was likely that inter-cup variability has a greater effect than laboratory processing methodology for PELAGRAs.
BSi fluxes measured by Group 1 and Group 2 also exhibit variability, even when the percentage of BSi is used to calculate the flux based on Group 1 mass fluxes to remove the effect of differences in mass. The BSi measurements by Group 2 were lower than the Group 1 fluxes (Fig. C2). However, there were no split replicate measurements, only measurements from different PELAGRA cups, which means these differences also include split to split and cup to cup variability, both of which have been shown to be considerable for PELAGRAs (Table 3).
One further methodological difference between the Group 1 processed PELAGRA cups and the Group 2 processed PELAGRA cups was that in the Group 2 methodology, swimmers were entirely picked by hand, while in the Group 1 methodology, only swimmers larger than 350 μm were removed. To determine the POC contribution by swimmers smaller than 350 μm to the PELAGRA cups, Group 2 screened swimmers through a 350 μm mesh after picking. On average, the swimmers smaller than 350 μm contributed 0.18 ± 0.09 mmol m⁻² d⁻¹ to the POC fluxes, which is equivalent to ~20% of the lowest PELAGRA POC flux.

Appendix D. Particle size distribution methodology
Gel samples were collected from all sediment trap types in identical, transparent-bottomed polycarbonate jars. These were filled with approximately 50 g of 40% polyacrylamide gel (Durkin et al., 2015) to give a gross jar weight of 120 g, or an equivalent volume of cryogel (Tissue Tek, O.C.T.™ Compound from Sakura). The pre-filled gel jars were stored frozen on board the ship until just prior to use to eliminate air bubbles. On the STT and NBSTs, the gel-filled jars were placed in the bottom of the wide trap tubes which were then filled to the top with 1 μm-filtered seawater. On the PELAGRA traps, the jars were filled to the brim with filtered seawater and then threaded into place on the trap carousel. The trap funnels above the gel collectors were removed to avoid aggregation of particles during collection. After trap recovery, the gels were removed from the traps and allowed to settle in the refrigerator (4°C) for several hours. Then, any remaining overlying seawater was gently pipetted off. Gels were stored at 4°C except while they were being imaged, and then frozen (−20°C) once on-board imaging was complete. Polyacrylamide gel jars were shipped frozen to Group 1 for subsequent onshore analysis. They continued to be stored at −20°C, except during imaging when they were allowed to thaw overnight at 4°C and then warm to room temperature prior to imaging.
On board the ship, both polyacrylamide gels and cryogels were imaged in their entirety using transmitted light at low magnification (pixel size 11 μm, one focal plane used) with a custom imaging setup belonging to M. Iversen (Basler acA4600 7gc camera, Edmund Optics 16 mm/F1.8 86,571 lens). On shore, Group 1 analyzed the polyacrylamide gels and Group 2 analyzed the cryogels. During the Group 1 analysis, thawed polyacrylamide gels were reimaged under transmitted light at two higher magnifications (pixel sizes 1.02 and 2.54 μm) using an Olympus IX83 inverted light microscope with automated stage and microscope control. The high magnification imaging was not done for the cryogel samples. For the polyacrylamide gels, at the first magnification (pixel size 2.54 μm) 10 quasi-randomly selected fields of view within a radius of 2.5 cm from the center of the gel were imaged through the entire working distance of the lens using the Olympus Cellsens software package. At the second magnification (pixel size 1.02 μm) 5 fields of view were imaged. Fields of view were selected to avoid swimmers. The size of the Z-steps ranged from 5 to 31 μm.
Fig. C2. Intercomparison of mass and BSi fluxes measured on PELAGRA cups only using Group 1 and Group 2 methodology. (a) Mass fluxes as measured from GF/F filters by Group 2 (2 replicates; slope −0.255) and a nucleopore filter by Group 2 (1 filter only; slope 0.193). (b) BSi fluxes measured using nucleopore filters by Groups 1 (slope 0.159) and 2 (slope 0.032). Note the different y axis scale in (b). The dashed line is the 1:1 line. No significant linear relationships were found. Differences in fluxes may be due to PELAGRA cup to cup variability as highlighted in Table 3 (mean of 40% for mass but highly variable) as well as differences in methodology. For Fig. C2b the percentage contribution of BSi from Group 2 analyses was used with the Group 1 mass flux to identify where discrepancies in the estimates originated.
Some sample jars had bottoms that were recessed upwards a few millimeters into the jar and thus the entire working distance of the lens could not be utilized. In these cases particle counts (below) were normalized to the thickness of the gel that was actually imaged.
Low magnification images collected by Group 1 aboard the ship included the jar edges and, like most sediment trap samples, typically contained several zooplankton "swimmers" that were presumed to have actively entered the trap. In order to exclude these features from further attenuance and particle size analysis (below), polygons surrounding the portions of each image to be retained for further analysis were constructed manually using the Matlab (Mathworks, Inc.) "roipoly" tool. In each image, the mode of the unmasked pixels was taken as the background value, and attenuance (ATN) was then computed on a per-pixel basis relative to that background. The attenuance flux (units m² m⁻² d⁻¹) for each sample was computed by summing the attenuance over the total unmasked area of all images collected, then normalizing to that area and to the length of the trap deployment. For the PELAGRA gel samples, the gel jar diameter was larger than that of the trap collection opening, so the measured attenuance fluxes were scaled by the ratios of the unmasked image areas to the trap collection area. Particle size distributions were computed from the red channel of the attenuance images following the same procedure described below for the high-resolution images.
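The attenuance flux calculation can be sketched as follows. The per-pixel definition used here, ATN = −log10(I / I_background), is an assumption based on the usual optical definition of attenuance and is not confirmed by the text; the function name, the `floor` guard against zero-valued pixels and the `area_scale` argument (standing in for the PELAGRA area-ratio scaling) are likewise hypothetical.

```python
import numpy as np


def attenuance_flux(image, mask, deployment_days, area_scale=1.0, floor=1.0):
    """Return an attenuance flux (m^2 m^-2 d^-1) from one masked image.

    image: 2-D array of transmitted-light intensities (grey levels).
    mask:  boolean array, True for pixels retained for analysis.
    """
    pixels = image[mask].astype(float)
    # The mode of the unmasked pixels is taken as the background value.
    values, counts = np.unique(pixels, return_counts=True)
    background = values[np.argmax(counts)]
    if background <= 0:
        raise ValueError("background intensity must be positive")
    # ASSUMED definition: ATN = -log10(I / I_background).
    atn = -np.log10(np.clip(pixels, floor, None) / background)
    # Summing ATN * pixel area and dividing by the unmasked area reduces
    # to the mean ATN, because the pixel area cancels.
    return area_scale * atn.sum() / pixels.size / deployment_days
```

For several images from one sample, the sums and pixel counts would be accumulated across images before dividing, so that the normalization is to the total unmasked area.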
High-resolution images collected on shore by Group 1 were first compressed to a single plane using a focus-stacking algorithm written for Matlab ('fstack', Pertuz et al., 2013). The size distribution of the particles was determined from the red channel using the following sequence of steps. First, the background was computed using a median filter (Matlab "medfilt2", 500x500 pixel filter) and subtracted from the original image. Next, a 1-pixel blurring element (Matlab "strel") was used to dilate and then erode the image (Matlab "imdilate" and "imerode") to remove noise and smooth particle features. Finally intra-particle holes were filled (Matlab "imfill"). For the low-resolution attenuance images collected on board the ship, the first background-subtraction step was skipped. After these pre-processing steps, particle statistics were computed using the Matlab function "bwconncomp" which identifies particles based on connectivity of non-zero pixels. During Group 2 onshore analysis, the cryogel samples were analysed by removing the background using a Gaussian filter and the particles were identified and sized using the 'Matlab Image Processing Tool Box' (The Mathworks, Inc).
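The pre-processing sequence above was carried out in Matlab; a rough Python analogue using `scipy.ndimage` is sketched below. It is not the authors' code. It assumes an input image in which particles are brighter than the background (for example an attenuance image), and it introduces a `threshold` parameter for binarisation, a step the text does not describe. The 3×3 structuring element stands in for the 1-pixel element, and 8-connectivity is chosen to mirror the Matlab `bwconncomp` default for 2-D images.

```python
import numpy as np
from scipy import ndimage


def segment_particles(red, threshold, background_size=500,
                      subtract_background=True):
    """Label particles in a single-plane red-channel image.

    Returns the label image and the area (in pixels) of each particle.
    """
    img = np.asarray(red, dtype=float)
    if subtract_background:
        # Median-filter background estimate; skipped for low-resolution
        # attenuance images, as in the text.
        img = img - ndimage.median_filter(img, size=background_size)
    # ASSUMED step: binarise with a user-supplied threshold.
    binary = img > threshold
    # Dilate then erode with a 1-pixel element to smooth particle outlines.
    element = np.ones((3, 3), dtype=bool)
    binary = ndimage.binary_dilation(binary, structure=element)
    binary = ndimage.binary_erosion(binary, structure=element)
    # Fill holes inside particles.
    binary = ndimage.binary_fill_holes(binary)
    # Connected components with 8-connectivity.
    labels, n_particles = ndimage.label(binary, structure=element)
    areas = np.bincount(labels.ravel(), minlength=n_particles + 1)[1:]
    return labels, areas
```

Projected areas in pixels can be converted to equivalent spherical diameters with the pixel size, for example `2 * np.sqrt(areas / np.pi) * pixel_size_um`.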
At each magnification, particles were binned into 10 log-spaced size bins and the particle counts were normalized to the width of the bins (in microns). The particle counts in the appropriate process blanks, determined in an identical manner to the samples, were subtracted from the sample particle size distributions. Uncertainty in particle counts was propagated from the counting uncertainty (±one particle per size bin) in both sample and blank. Particle number fluxes as a function of size (units N μm⁻¹ m⁻² d⁻¹) were computed at each magnification by normalizing particle counts to the unmasked, imaged area, and to the deployment length. PELAGRA particle counts imaged at low resolution were additionally multiplied by the jar:trap opening area ratio (5.87) because the counts included the jar edges, which were not directly under the trap opening. High resolution counts, which were conducted on images from directly under the trap opening, were not normalized in this fashion.
… (c) and (d) were estimated from larger size particles only, as high resolution images were not taken for Group 2 cryogels. Error bars show effect on particle size distribution (PSD) slope of propagated uncertainty in particle number fluxes.
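The binning and normalization steps can be sketched as below. The function and argument names are hypothetical. Two choices are assumptions not stated in the text: the blank is taken to have been imaged over the same area as the sample, and the ±1 particle uncertainties in sample and blank are combined in quadrature.

```python
import numpy as np


def number_flux_spectrum(diam_um, blank_diam_um, area_m2, days,
                         d_min, d_max, n_bins=10):
    """Blank-corrected particle number flux per size bin.

    Returns bin centres (um), number flux and its uncertainty
    (both in N um^-1 m^-2 d^-1).
    """
    # Log-spaced bin edges and widths (in microns).
    edges = np.logspace(np.log10(d_min), np.log10(d_max), n_bins + 1)
    widths = np.diff(edges)
    sample_counts, _ = np.histogram(diam_um, bins=edges)
    blank_counts, _ = np.histogram(blank_diam_um, bins=edges)
    net = sample_counts - blank_counts
    # +/- 1 particle per bin in sample and in blank, combined in
    # quadrature (ASSUMED combination rule).
    sigma = np.full(n_bins, np.sqrt(2.0))
    # Normalise to bin width, imaged area and deployment length.
    scale = 1.0 / (widths * area_m2 * days)
    centres = np.sqrt(edges[:-1] * edges[1:])  # geometric bin centres
    return centres, net * scale, sigma * scale
```

For PELAGRA low-resolution counts, the returned fluxes would additionally be multiplied by the jar:trap opening area ratio given in the text.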
Using relative uncertainty as a guide, merged particle size distributions were created from the three magnifications (Durkin et al., 2015). Size bins smaller than 40 μm were taken from the highest magnification (pixel size 1.02 μm), size bins ranging from 40 to 380 μm were taken from the middle magnification (pixel size 2.54 μm), and size bins above 380 μm were taken from the lowest magnification (pixel size 11 μm). In order to facilitate a simple comparison of particle size across all samples, a two-parameter power-law function was fit to the particle size spectrum for each sample:
$$N(D) = N(D_0)\left(\frac{D}{D_0}\right)^{-\xi}$$
where N(D) gives the number of particles as a function of particle diameter, D 0 is a reference diameter, and ξ is the power-law size distribution slope. The value of ξ is insensitive to the choice of D 0 (and therefore the value of N(D 0 )) so only ξ is reported here. A Monte Carlo procedure was used to propagate the error in N(D) into ξ. A simple estimate of the particle volume flux was made by computing the equivalent spherical volumes of particles from their equivalent spherical diameters (which were, in turn, determined from projected particle area). This introduces bias, because the area:volume ratio of a sphere is the smallest of all possible shapes. Therefore, the equivalent spherical volumes presented here are probably overestimates, particularly for large diameter particles, which are less likely to be spherical than small ones. In addition, particle mass is unlikely to scale with volume, but more likely scales with diameter raised to a power between 2 and 3 (e.g. Jackson et al., 1997).
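A minimal sketch of the power-law fit and the Monte Carlo error propagation follows. It assumes the conventional form N(D) = N(D0)·(D/D0)^(−ξ), fitted by least squares in log-log space, and Gaussian perturbations of N(D) with the propagated counting uncertainty. The fitting method, the perturbation distribution, the number of trials and all names are assumptions; the source states only that a Monte Carlo procedure was used.

```python
import numpy as np


def fit_power_law(d, n, d0=100.0):
    """Fit N(D) = N0 * (D / D0)**(-xi); return (xi, N0)."""
    slope, intercept = np.polyfit(np.log10(d / d0), np.log10(n), 1)
    return -slope, 10.0 ** intercept


def monte_carlo_xi(d, n, sigma, n_trials=1000, seed=0):
    """Propagate uncertainty in N(D) into xi; return (mean, std) of xi."""
    rng = np.random.default_rng(seed)
    xis = []
    for _ in range(n_trials):
        # Perturb each bin with Gaussian noise (ASSUMED distribution).
        trial = rng.normal(n, sigma)
        ok = trial > 0  # bins that went non-positive cannot be logged
        if ok.sum() >= 2:
            xis.append(fit_power_law(d[ok], trial[ok])[0])
    return float(np.mean(xis)), float(np.std(xis))
```

Under this sign convention a flatter spectrum (relatively fewer small particles and more large ones) corresponds to a smaller ξ.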
During D1 at 200 m the NBST and STT sampled a similar number flux and particle size, with the NBST collecting slightly fewer particles >800 µm in size. During D1 at 350 m the STT data look very similar to D1 at 200 m, whereas P2 appears to have collected fewer small particles (10-100 µm) and more particles >400 µm (Figs. D2 and D3). The observations in D1 appear to hold true for D2, although PELAGRAs collected more particles >100 µm, with a 'peak' at 400 µm. Fig. D3 was consistent with the observations in Fig. D2: while there was significant variability among all trap types, the PELAGRA particle size distribution slope was flatter (i.e., includes relatively fewer small particles and more large ones) than that of the cylindrical traps.
… Table 2 for trap configurations. The volume flux was computed by assuming all particles were solid spheres (see Section 2.4) and was likely an overestimate of the true particle volume flux.
Fig. D3. Total equivalent-spherical volume flux size distribution of particles in gel traps during (a) Deployment 1 and (b) Deployment 2. Note the y-axis scale difference between (a) and (b). The STT trap lids at 350 m failed to close during D1. The volume flux was computed by assuming all particles were spherically shaped, which will inflate the volume for large particles, and then the flux was summed over all size bins to give the totals shown here.
During D1 the equivalent spherical volume flux was lowest for the NBST at 200 m, similar for the STT (200 and 350 m) and largest for P2. During D2 the flux in Fig. D3b was almost identical for the NBST and STT, whilst the P1 and P2 fluxes were much greater, supporting the idea that the PELAGRAs are sampling more large particles.