Current issues confounding the rapid toxicological assessment of oil spills

We review the challenges of establishing standardised risk assessment


Introduction
Oil spills have the potential to cause immediate and widespread toxicity to the environment (Brussaard et al., 2016). Despite the significant reduction in the frequency and volume of oil spills from oil tankers over the past 30 years, Cedre (Centre of Documentation, Research and Experimentation on Accidental Water Pollution) reported 60 oil and chemical spills worldwide, ranging from 10 to over 100 tonnes, in 2016 alone (Cedre, 2016). In addition to these mostly small-scale spills, major releases have happened throughout the history of the oil industry. Such spills include the 1989 Exxon Valdez oil tanker grounding, which released 41 million litres of Alaska North Slope crude into the Prince William Sound, Alaska USA (Paine et al., 1996), and the 2010 Macondo deep-sea well blowout, commonly known as the Deepwater Horizon disaster, which caused an estimated spill of 507 million litres of crude oil into the Gulf of Mexico (U.S. District Court for the Eastern District of Louisiana, 2010). The widespread contamination caused by oil spills can have economic, environmental, public health and social and community impacts. Such threats include damage to fishing and tourism industries, endangering of species, and physical health effects in humans (Li et al., 2016). Hence it is essential that adequate methods are available to determine the potential toxicity of a spill and to guide rapid response.
Oil spill responses depend on several factors, including the type and volume of oil, location of the spill, weather, and sea state. For example, a spill in shallow waters or near a coastline would require a different response from a deep-water spill. Decision makers apply Net Environmental Benefit Analysis (NEBA), or the newer term Spill Impact Mitigation Analysis (SIMA), to weigh up the benefits of clean-up techniques against disadvantages to environmental and social health to ensure minimal damage is caused by a spill (IPECA, 2016).
Toxicity testing is required following a spill not only to assess the risks associated with the spill but also to support decision making and predict the future consequences of the event. In addition, biological monitoring techniques are recommended to monitor both the spill effects and the efficacy of spill clean-up methods (Lee and Merlin, 1999; Redman and Parkerton, 2015). Biomonitoring can be applied in-situ in a number of ways, from whole organism and community evaluations to the use of intracellular-effects assays which are sensitive to specific types of pollutants (Wieczerzak et al., 2016). In the past, biological testing has been expensive, time-consuming, highly variable and not always comparable with chemical analysis due to the wide range of methodologies and parameters available (Prasse et al., 2015; Snell and Persoone, 1989). Now, high-throughput biological indicators have been developed into cost-effective commercial kits which have been recommended for use within environmental management programmes (Blaise et al., 2004).
A number of reviews and studies, including Brack et al. (2016), Brussaard et al. (2016) and Whale et al. (2018), have highlighted the need to identify relevant suites of bio-indicator species and intracellular biomarkers to ensure hazard assessments are accurate for future oil spill responses, especially in environments with multistressors and complex mixtures. According to this literature, to provide the best possible guidance for decision makers, biological assessment methods should be highly characterised, and include intracellular indicators of toxicity and genotoxicity for a number of species across trophic levels (Blaise et al., 2004; Coelho et al., 2013; Galloway et al., 2002). Accordingly, a wide range of biological assays and commercial tests have been ring-tested and validated for use in the assessment of freshwater and effluent monitoring. Moser and Römbke (2009), Prasse et al. (2015), Wadhia and Thompson (2007) and Wieczerzak et al. (2016) have provided an overview of many of the highly validated tests in the form of reviews and ring tests. Despite this, there remains a lack of such tests that have been specifically validated using crude oil, dispersants or the individual chemicals found within them (Whale et al., 2018).
In this review we explore the significant progress that has been made in meeting these criteria and review the current state of science regarding the rapid assessment of biological effects following an oil spill. We first highlight the challenges associated with working with oil as a pollutant before discussing the most relevant oil-related biological impacts and some of the universal commercially available kits that may be appropriate for the rapid assessment of oil spill toxicity. Next we discuss approaches and the use of passive dosing (an alternative approach to oil dosing) in laboratory exposure scenarios and show how these methods can be supported by models to predict ecosystem damage, to improve the applicability of oil toxicity studies. Finally, we comment on the future work required to develop a state-of-the-art toolkit that will enable rapid risk assessment following a spill, a vital tool not currently available to decision makers.

Oil components
One of the issues with assessing oil toxicity is that crude oil is not a single compound; it is a unique combination of thousands of different compounds of different weights and structures (Beens and Brinkman, 2000), some of which are illustrated by the exemplary crude oils in Fig. 1. All crude oils contain saturated hydrocarbons, aromatic hydrocarbons, non-hydrocarbon molecular structures containing sulphur (S), nitrogen (N) and oxygen (O), and a proportion of trace metals. Crude oils vary in structure and composition based on the ratio of the above compounds. The main chemical groups associated with the toxicity of oil and their modes of action are shown in Table 1.

Weathering
Another key challenge is that the properties of oil change significantly following a release as the oil becomes a slick, spreads and weathers against different climatic conditions (Fig. 2). Weathering describes the process by which light, volatile compounds are evaporated into the air and/or dissolved into the water column. The reduction of light compounds causes an increase in the percentage of heavier compounds within the slick, increasing viscosity and promoting the formation of an oil-water emulsion which will thicken with increased wave action (Arey et al., 2007; Fingas, 1995; Mackay and Matsugu, 1973).
Such physical changes are demonstrated by the example of Deepwater Horizon, where the concentrations and compositions of the petrochemicals reported varied significantly across depths and were estimated to have persisted for up to a month (Reddy et al., 2011). Understandably, the composition of polycyclic aromatic hydrocarbons (PAHs) in deep water (~1000–1200 m depth) was significantly different to that of the surface, which had a higher proportion of the more toxic low molecular weight PAH (C1–C3) and BTEX (benzene, toluene, ethylbenzene, xylene) chemicals. In addition, these high concentrations of C1–C3 PAHs (up to 100 mg L⁻¹) are likely to have persisted in the water column for longer than relatively insoluble PAHs and alkanes which were transported to the sea surface or sea floor (Reddy et al., 2011; Ryerson et al., 2011). Sammarco et al. (2013) measured 66 surface water sites between May and November 2010 in the Gulf of Mexico, finding high mean concentrations (202 mg L⁻¹) of total petroleum hydrocarbons (C8–C40), but low mean concentrations of the total PAHs (0.047 mg L⁻¹). Their results showed PAH concentrations were not consistent across groups; C1 phenanthrenes and anthracenes appeared to be more persistent than most PAHs, with a mean concentration of 1.174 mg L⁻¹ across sites, while the concentrations of other groups were measurable but significantly lower.

Complex mixtures
The turbulent chemical characteristics of oil spills caused by weathering can result in the formation of complex, unresolved mixtures, the toxicity of which remains unclear (Petersen et al., 2017). Interacting chemical mixtures can cause toxicity through the independent action of each chemical, or additive or synergistic effects where the toxicity of a chemical combination is greater than the sum of the combined chemicals (Duan et al., 2017). Mixtures of PAHs are considered to have additive toxicity (Fent and Bätscher, 2000; Landrum et al., 2003). However, in the case of oil spill toxicity testing, it is necessary to assess oil and dispersant toxicity both individually and in combination to support this theory and identify any chemical-specific variation.
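The additive assumption for PAH mixtures can be illustrated with a toxic-unit calculation, in which each concentration is divided by the compound's LC50 and the results are summed. The following is a minimal sketch, assuming simple concentration addition; the LC50 values are the 120 h zebrafish figures quoted later in this review, while the exposure concentrations and function names are hypothetical and chosen only for illustration.

```python
# Sketch of concentration addition (toxic-unit summation) for a PAH mixture.
# LC50 values (mg/L) are the 120 h zebrafish figures quoted in this review;
# the exposure concentrations below are hypothetical.

LC50_MG_L = {
    "naphthalene": 6.309,
    "1-methylnaphthalene": 1.013,
    "phenanthrene": 0.334,
}


def toxic_units(concentrations_mg_l, lc50_mg_l):
    """Return per-compound toxic units (concentration divided by LC50)."""
    return {
        name: conc / lc50_mg_l[name]
        for name, conc in concentrations_mg_l.items()
    }


def mixture_toxic_units(concentrations_mg_l, lc50_mg_l):
    """Sum the toxic units; under additivity a total of about 1
    corresponds to the mixture's LC50."""
    return sum(toxic_units(concentrations_mg_l, lc50_mg_l).values())


# Hypothetical exposure (mg/L), for illustration only.
hypothetical_exposure = {
    "naphthalene": 1.0,
    "1-methylnaphthalene": 0.2,
    "phenanthrene": 0.1,
}
total_tu = mixture_toxic_units(hypothetical_exposure, LC50_MG_L)
print(f"Sum of toxic units: {total_tu:.2f}")
```

A sum below 1 suggests the mixture is below its predicted LC50 if the components act additively; synergistic interactions, such as those that may occur with dispersants, would not be captured by this sum.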
Oil and the emulsions it forms can have physiological effects. Observed effects following physical oiling/smothering include lack of mobility, inflammation, increased metabolic rate, and damage to digestive organs in birds (Hartung and Hunt, 1966; Khan and Ryan, 1991; Lambert et al., 1982; Lee et al., 1985; Patton and Dieter, 1980). Furthermore, teratogenic and early development stage effects have been identified in birds (Grau et al., 1977), sea urchins (Pillai et al., 2003), oyster larvae (Wessel et al., 2007) and marine worms following oil exposure. In marine worms (Arenicola marina and Alitta virens), oil contamination reduces fertilisation success and development rates, which coincide with an increase in developmental instability.

Mitigation methods
To effectively manage spills, responses are tailored to each spill scenario, considering the risk of both the spill and the mitigation methods used. Mitigation methods should be considered to ensure the response is not more harmful than the spill itself. Commonly employed mitigation techniques include the direct capture of oil by surface skimming; application of sorbent materials, including particles and gels; burning, and dispersal using chemical dispersing agents (US EPA, 2017).
Table 1. Commonly observed toxicological effects of polycyclic aromatic hydrocarbons.

Dispersants accelerate oil degradation, often preventing or reducing the amount of oil that reaches areas of high biodiversity such as a coastline or coral reef. However, some studies which investigated in-vitro toxicity of chemically-dispersed oil have suggested the potential for human health effects. These studies showed alterations in the gene expression of pathways associated with cancer and immune function in human lung epithelial cells exposed to chemically enhanced water accommodated fractions (CEWAFs; oil, water and dispersant mixtures) containing Corexit® EC9500A or Corexit® EC9527 and Macondo crude oil compared to Macondo crude oil water accommodated fractions (WAFs; oil-water mixtures). Furthermore, some studies have shown dispersants to have toxic effects individually, while others have shown organism sensitivity is increased by some oil-dispersant mixtures compared to exposure to oil alone. For example, low concentrations of dispersants can impact organism health and behaviour; Negri et al. (2018) investigated the impact of dispersants (Ardrox® 6120, Corexit™ 9500A, Slickgone LTSW, Slickgone NS, Finisol® OSR 52 and SDS) on the coral Acropora millepora over three exposure periods (2, 6 and 24 h). The results showed coral larval settlement was inhibited between 2 and 6 h with EC50s (the concentration at which 50% of the exposed population showed an effect) ranging from 2.6 mg L⁻¹ to 11.1 mg L⁻¹, suggesting that coral larvae are more sensitive to dispersants than other coral life stages (Negri et al., 2018). However, these results also suggest that the dispersants are less toxic to corals exposed to light crude oil (Negri et al., 2016).

Modern dispersing products follow an approval process, use less toxic solvents and are in some cases up to 100-fold less toxic than their predecessors (Fingas and Advisory, 2002; MMO, 2011). However, there is still uncertainty regarding the suitability of dispersants for environmental use due to challenges associated with quantifying the toxicity of dispersant alone, of dispersant and oil, and of dispersant, oil and environmental interactions (National Research Council, 1989; Rico-Martínez et al., 2013). The more information that can be gleaned about the nature of the spilled oil and the susceptibility of the exposed ecosystem, the better the choice of response can be guided. Therefore, the effect of a spill should be monitored throughout the spill, clean up and in subsequent years to better aid future risk assessment and decision making.
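To make the EC50 concept used above concrete, the sketch below estimates an EC50 from concentration-response data by linear interpolation on a log10 concentration scale. This is one simple approach chosen for illustration; it is not the method used in the cited studies, and published EC50s are normally derived from fitted dose-response models. The function name and the data are hypothetical.

```python
import math


def ec50_log_interpolation(concentrations, fraction_affected):
    """Estimate EC50 by linear interpolation on log10(concentration).

    concentrations: ascending, positive test concentrations (e.g. mg/L).
    fraction_affected: observed effect fraction (0 to 1) at each concentration.
    Returns None if no adjacent pair of responses brackets 50%.
    """
    points = list(zip(concentrations, fraction_affected))
    for (c_lo, f_lo), (c_hi, f_hi) in zip(points, points[1:]):
        if f_lo <= 0.5 <= f_hi and f_hi > f_lo:
            # Position of the 50% response between the two bracketing points.
            weight = (0.5 - f_lo) / (f_hi - f_lo)
            log_ec50 = math.log10(c_lo) + weight * (
                math.log10(c_hi) - math.log10(c_lo)
            )
            return 10 ** log_ec50
    return None


# Hypothetical settlement-inhibition data, for illustration only.
conc = [1.0, 2.0, 4.0, 8.0, 16.0]
inhibited = [0.05, 0.20, 0.45, 0.80, 0.95]
ec50 = ec50_log_interpolation(conc, inhibited)
print(f"Estimated EC50: {ec50:.2f} mg/L")
```

Interpolating on the log scale reflects the usual geometric spacing of test concentrations; a fitted logistic or probit model would additionally provide confidence limits.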

Effect based tools and rapid assessment methods
Oil can cause both acute and chronic damage to organisms from cellular to population level (Fig. 3). Hence biological testing following a spill is recommended to monitor toxicity across levels of biological organisation, provide operational guidance and quantify the success of the response and mitigation methods used (Lee and Merlin, 1999). Post-spill monitoring strategies should ideally assess a selection of species across trophic levels and life history strategies to indicate damage across the ecosystem (Blaise et al., 2004). Commercially important species should be included to assess whether contamination is likely to exceed regulatory limits for human consumption and to advise on fisheries closures (Law et al., 2011).
Recent advances mean there are a wide range of such assays, toxkits and bioassays available for rapid risk assessment. Extensive research has been conducted to ring test and validate biomarkers for use in effluent testing (Den Haan, 2011), the impact of pesticides (Burga-Perez et al., 2013) and in conjunction with passive samplers (Whale et al., 2018). However, only a small percentage are specialised for use within the marine environment (summarised in Table 2), particularly marine tropical environments, or are specifically validated for assessing PAHs or oil related chemicals (Whale et al., 2018;Wieczerzak et al., 2016). Therefore, it is challenging to encompass all of the potential health implications of exposures following a spill. Below, the impacts of oil and a selection of universal toxkits and biomarkers capable of assessing biological effects which may be seen following a spill are discussed in order to investigate the applicability and adaptability of such tests for rapid oil spill hazard assessment.

Whole organism assays
Whole organism assays are a useful indicator of species sensitivities to different chemical groups. Survival assays (48–96 h) conducted with individual oil components have shown the LC50 (the concentration lethal to 50% of the exposed population) for toluene, for example, to vary from 3.78 mg L⁻¹ in water flea (Ceriodaphnia dubia) to 133.4 mg L⁻¹ in fish (Tilapia zillii), with the majority of species, including salmon (Oncorhynchus kisutch) (Austin and Eadsforth, 2014), fathead minnow (Pimephales promelas) (Devlin et al., 1982), striped bass (Morone saxatilis) and shrimp (Crago franciscorum) (Benville and Korn, 1977), exhibiting LC50 values less than 30 mg L⁻¹ (McGrath et al., 2018). For alkylated naphthalenes, LC50s range from 0.295 mg L⁻¹ for glass shrimp (Palaemonetes pugio) exposed to 1-ethylnaphthalene for 48 h (Unger et al., 2007) to 8.13 mg L⁻¹ for mussel (Mytilus edulis) exposed to 2-methylnaphthalene for 96 h (Olsen et al., 2011). The toxicity of naphthalenes is compound-specific; for example, zebrafish (Danio rerio) exposed to naphthalene for 120 h were less sensitive (LC50 6.309 mg L⁻¹) than those exposed to 1-methylnaphthalene under the same conditions (LC50 1.013 mg L⁻¹). Three-ringed phenanthrene is more acutely toxic than naphthalene, suggesting that toxicity increases as solubility decreases. Zebrafish exposed to phenanthrene and octahydrophenanthrene for 120 h were significantly more sensitive (phenanthrene LC50 0.334 mg L⁻¹, octahydrophenanthrene LC50 0.052 mg L⁻¹) than those exposed to 1-methylnaphthalene and naphthalene.
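The compound-specific differences described above can be expressed as fold differences in potency relative to naphthalene. The short sketch below does this arithmetic with the 120 h zebrafish LC50 values quoted in the preceding paragraph; the function name is hypothetical and the ratio is only a descriptive comparison, not a formal relative potency factor.

```python
# Fold differences in acute toxicity relative to naphthalene, using the
# 120 h zebrafish LC50 values (mg/L) quoted in the paragraph above.
ZEBRAFISH_LC50_MG_L = {
    "naphthalene": 6.309,
    "1-methylnaphthalene": 1.013,
    "phenanthrene": 0.334,
    "octahydrophenanthrene": 0.052,
}


def relative_potency(lc50_mg_l, reference):
    """Return reference LC50 divided by each LC50 (higher means more toxic)."""
    ref = lc50_mg_l[reference]
    return {name: ref / value for name, value in lc50_mg_l.items()}


potency = relative_potency(ZEBRAFISH_LC50_MG_L, "naphthalene")
# List compounds from least to most potent.
for name, fold in sorted(potency.items(), key=lambda item: item[1]):
    print(f"{name}: {fold:.1f}x naphthalene")
```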
Furthermore, PAH toxicity presents differently in organisms depending on their ability to biotransform parent PAH compounds. Organisms which are able to metabolise PAHs, such as fish, are more at risk from PAHs as they can form reactive metabolites which can cause extensive damage by bonding with cellular macromolecules including DNA, RNA and proteins (Beyer et al., 2010). In animals lacking a highly developed repair system such processes can eventually result in mutagenesis, teratogenesis, and carcinogenesis (Tuvikene, 1995). Species with low biotransformation capability, such as invertebrate filter feeders and in particular the blue mussel Mytilus spp., bioaccumulate and bioconcentrate PAH compounds, making them a useful indicator of environmental PAH contamination through direct PAH measurement (Beyer et al., 2010). On the other hand, fish and other vertebrates generally accumulate low levels of PAHs but significantly increased levels of PAH metabolites (Budzinski et al., 2004). The measurement of PAH metabolites in fish bile to quantify PAH exposure has been used extensively and is reviewed by Beyer et al. (2010) and Van der Oost et al. (2003).
Whole organism assays are generally simple to perform, easily standardised and well validated (Law et al., 2011). Species applicable to rapid assessment are often from low trophic levels, such as planktonic species. Such species can provide insight into the level of exposure depending on their sensitivity, although short-term acute tests have been criticised for their lack of ecological evidence due to the challenge of linking individual effects to population and community effects (Forbes et al., 2006). Larger organisms, including fish species, can be used; however, they are less practical and their use may present ethical issues. Survival assays can be conducted using laboratory dosing of selected laboratory-prepared oil components or using field-collected water or sediment samples. For example, mysid shrimp (Americamysis bahia) and inland silversides (Menidia beryllina) were used in the follow up to Deepwater Horizon to assess the toxicity of field-collected water samples. The samples represented the highest 5% of concentrations compared to more than ten thousand samples collected from the spill site between May and December 2010. Although mysid shrimp were more sensitive than inland silversides, minimal toxicity was detected for either species prior to the well closure and decreased to no mortality following the well closure (Echols et al., 2015). This finding is supported by Hemmer et al. (2011), who reported the LC50 of mysid shrimp and inland silversides as 2.7 mg L⁻¹ and 3.5 mg L⁻¹, respectively, when exposed to Louisiana Sweet Crude water accommodated fraction (WAF) mixtures, which are significantly higher than the concentrations measured following the Deepwater Horizon spill (OSAT, 2010). These data illustrate how the results from laboratory exposures may be used in conjunction with field acquired samples to assess risk.

Commercially applied assays range from primary producer algal growth inhibition (Skeletonema costatum) to acute toxicity assays measuring the survival of primary consumers and pelagic zooplankton, for example brine shrimp (Artemia franciscana), marine rotifer (Brachionus plicatilis) (Kokkali and Van Delft, 2014; Wadhia and Thompson, 2007) and copepod (Tisbe battagliai) (Law et al., 2014). Species should be selected carefully based on the environment being assessed; freshwater endpoint assays have previously been used as a proxy for marine environments when marine data are unavailable. However, freshwater assays have been found to be less sensitive than marine assays. This difference is considered to result from physiological differences caused by a saline habitat (Young and Schmidt-Nielsen, 1985), which can change the speciation and bioavailability of toxicants (Heugens et al., 2001; Rocha et al., 2016; Rodrigues et al., 2014). One limitation of commercially available survival assays is that they are typically carried out in open multi-well plates, which result in the loss of volatile compounds such as low molecular weight PAHs and BTEX chemicals (Smith et al., 2010a, 2013). Methods may be adapted to allow passive dosing in closed vials to prevent the release of volatile organic compounds (VOCs) (e.g. Rojo-Nieto et al., 2012) (see section 'Current issues and recent advances').

Table 2 (rows as recovered, in source order; [...] marks text missing from the source)

Species: Artemia franciscana. Description: 24 h assay measuring LC50 in brine shrimp Artemia franciscana. Sensitive to toxins produced by freshwater and marine microalgae. Can be used in salinities of 10–35‰ (Walker, 2001). Comments: Commercially available well validated test which is used in routine monitoring (Rojo-Nieto et al., 2012).

Comments: Commercially available well validated test which is used in routine monitoring. ISO 19820. Validation(a): 3.

Assay: Fish embryo larval screen. Endpoint: Survival/Teratogenicity. Description: 96 h whole organism assay identifying embryo and larval mortality following egg fertilisation. Heartrate, hatching, length/growth rate and mortality are monitored (Cunha et al., 2017). Comments: Well validated and used in routine monitoring; however, ideally requires specialist lab with brood stock.

Description: [...] (Gabrielson et al., 2003). Comments: Commercially available routine laboratory test with high sensitivity to crude oil. Gives a holistic image of the impact to microbiota. Has been tested in raw waters, industrial effluents, sewage sludge and soil leachates. Validation(a): 3.

Assay: Mutatox. Endpoint: Genotoxicity. Species: Aliivibrio fischeri. Description: 24 h fluorescence assay. Uses a mutant strain (Aliivibrio fischeri M169) which has a gap in the regulatory system coding for the lux genes. Light production is restored by a low concentration of mutagens. Different types of mutagens can be detected (base substitution, insertions, deletions, DNA synthesis inhibition or DNA damage) (Klamer et al., 1997). Comments: Well validated and sold as a commercial kit. Similar sensitivity to highly validated Ames test which uses freshwater bacteria Salmonella typhimurium.

Description: [...] retained, the more permeable and therefore the more damaged the lysosomal membrane. Determined by photometric measurement (Borenfreund and Puerner, 1985). Comments: Requires tissue or blood sample from living organism.

Assay: EROD activity. Endpoint: Detoxification. Species: Rainbow trout, dab, zebra fish. Description: The measure of EROD, a P450 monooxygenase 1A (CYP1A) dependent enzyme, activity in in-vitro cell lines. EROD activity is measured using fluorescence to indicate CYP1A induction, which can be indicative of carcinogenesis or cellular toxicity. Uses rainbow trout cell lines (Bartram et al., 2012). Comments: Well validated and used in routine monitoring. Less sensitive than other assays but may be more environmentally relevant as cells originate from an organism with biotransformation capabilities (Brack et al., 2016). Validation(a): 2.

Assay: Alamar blue. Endpoint: Cell viability. Description: Active ingredient resazurin permeates into the cell; once in the cytoplasm the blue dye (resazurin) is reduced to resorufin, which is red in colour. Healthy cells continually reduce resazurin to resorufin, resulting in red fluorescent cells. Can be carried out both in a tube or in microtiter plates (Magnani and Bettini, 2000).

Description: Rapid colorimetric assay indicating antioxidant enzyme activity. Requires sample from organism, e.g. mussel gill tissue (Cohen et al., 1970; Owens and Belcher, 1965; Sun and Zigman, 1978). Comments: Well validated and frequently used in ecotoxicology. Tissue or blood sample from organism; some elements of sample and test preparation are technically challenging. Validation(a): 2.

(a) Validation: 3, well validated and sold as commercial kit, frequently used in monitoring; 2, used in some monitoring, less well validated; 1, used primarily for research purposes, has potential to be used in monitoring with better validation.
Whole organism survival assays are not always practical in rapid assessment due to the animal husbandry and large systems required. In order to overcome this, a number of embryo larval acute toxicity tests have been developed as indicators of vertebrate impacts. New, sub-lethal, embryo-larval assays based on wild type or transgenic fish cell lines are being developed but are currently time and sample intensive (Brack et al., 2016). Current screening methods include freshwater species such as transgenic cyp19a1b-GFP zebrafish (Danio rerio), in which brain tissue fluoresces in response to estrogens, and spg1-gfp medaka (Oryzias latipes), in which the kidneys fluoresce in response to androgens (Sébillot et al., 2014); these in turn act as indicators of endocrine disruption. The assessment of multiple other transgenic indicators, including heat shock proteins and indicators of oxidative stress, is reviewed in detail by Lee et al. (2015).
Technical developments are focused on increasing throughput in future assays (Pardo-Martin et al., 2010). In the meantime, marine invertebrate embryo assays, including oysters, worms and sea urchins, provide sensitive indicators of early life stage effects. For example, oysters (Crassostrea gigas) exposed to phenanthrene have shown impacts on development at concentrations of 0.02 mg L⁻¹, while concentrations of 2.0 mg L⁻¹ decreased development and shell size and caused morphological defects to the mantle and shell formation (Nogueira et al., 2017). These assays provide a sensitive indicator of toxicity, allowing the assessment of early life stage sublethal and lethal endpoints through physiological and biochemical means.

Single-celled organisms
Some dispersing products can inhibit the growth of oil degrading microorganisms, preventing the microbial degradation the dispersants are trying to promote (Rahsepar et al., 2016). Bacteria are effective indicators of BTEX and PAH metabolism and are commonly used in toxicity testing (Close et al., 2012; King et al., 1996). A number of bacterial tests have been validated for rapid assessment of water quality. Two examples are Aliivibrio fischeri growth inhibition, a 15 min assay which uses the natural bioluminescent properties of the bacteria to show a population decrease in response to toxicants, and the Microbial Assay for Risk Assessment (MARA), a multi-species test with 11 species of bacteria and one yeast that uses the same natural properties to indicate community damage. The most well validated version of the A. fischeri growth inhibition test is Microtox™, a commercially available kit sensitive to over 2700 chemical compounds and recommended for toxicological assessment (Hsieh et al., 2004; Kralj et al., 2007; Van der Grinten et al., 2010). However, the large initial investment and bespoke equipment required make it difficult to apply to wide-scale assessment (Liu et al., 2018). Conversely, the MARA introduces the concept of investigating community effects. It is a cost-effective, simple test which is especially useful for toxicological screening as it has potential to create unique fingerprints of toxicity (Gabrielson et al., 2003).
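Bioluminescence assays of this kind are commonly summarised as a percent inhibition of light output relative to an untreated control. The sketch below shows one simple way to compute this with a correction for drift in the control; it is an assumption about how such readings might be handled, not the documented procedure of any commercial kit, and the function name and readings are hypothetical.

```python
def percent_inhibition(sample_initial, sample_final,
                       control_initial, control_final):
    """Percent light loss in a sample after correcting for control drift.

    All arguments are luminescence readings in the same arbitrary units.
    """
    # Natural change in light output over the exposure, seen in the control.
    drift = control_final / control_initial
    # Light the sample would be expected to emit with no toxicant present.
    expected_final = sample_initial * drift
    return 100.0 * (expected_final - sample_final) / expected_final


# Hypothetical luminometer readings, for illustration only.
inhibition = percent_inhibition(
    sample_initial=1000.0, sample_final=450.0,
    control_initial=1000.0, control_final=900.0,
)
print(f"Inhibition: {inhibition:.1f}%")
```

Repeating the calculation across a dilution series gives the concentration-response data from which an EC50 can be estimated.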
Recent advances have investigated bioluminescent nanopaper, A. fischeri immobilised on bacterial cellulose nanopaper, as a low cost alternative for toxicological assessment. Early results show comparable but lower sensitivity than Microtox™. Inter-assay variances, likely caused by variations between batches of A. fischeri or assay conditions (Liu et al., 2018), currently prevent this method from being recommended for routine monitoring. Further method development and validation of this simple, fast, non-invasive and sensitive technique could make this a highly applicable tool in future (Liu et al., 2018).

Sublethal assessment
Sublethal assays can provide an indication of the mode of action of contaminants and the potential population and community effects (Martínez-Gómez et al., 2010). Knowing the mechanism of toxicity is not a regulatory requirement, so long as an effect has been shown. However, identifying the mechanism of toxicity may help to identify the compound or mixture causing the toxicity (Hylland et al., 2017). Sublethal assessment can be a useful tool to indicate damage in samples with low chemical concentrations (Brack et al., 2016). It is not the aim of this review to describe and evaluate all of the available tests; this section will evaluate a selection of sublethal endpoints, chosen based on the effects discussed below, which may be applicable to the rapid assessment of oil spills as biomarkers of oil toxicity.

In-vivo versus in-vitro bioassays
There are significant differences between the response of cells and enzymes in in-vitro test systems compared to in-vivo bioassays (Martínez-Gómez et al., 2010). In-vivo bioassays typically use tissue samples from organisms which have been exposed to a toxicant or environmental stimulus and can be time-consuming, as animals must be exposed prior to analysis. However, dosing and validating concentrations may be simpler to control during whole organism exposures for in-vivo bioassays compared to off-the-shelf in-vitro assay kits, which typically use plastic well plates. For the most relevant estimate of spill effects, bioassays typically use marine species such as mussels (Mytilus edulis), oysters (Crassostrea gigas), sea urchins (Paracentrotus lividus) and polychaete worms (Arenicola marina). These species are well validated for marine toxicological assessment; however, the selection of species can be adapted to the geographical location with the addition of highly characterised local species (Martínez-Gómez et al., 2010). It is also possible to assess in-vivo effects in a number of other species including fish (Budzinski et al., 2004; Patel et al., 2006; Pollino and Holdway, 2002); however, these are less applicable to rapid assessment due to the additional challenge of maintaining populations within the laboratory and the ethical implications associated with testing vertebrate species.
In-vitro kits are valuable in routine and rapid monitoring as they do not require tissue samples, allowing faster throughput. The applicability of some tests to routine monitoring is limited due to the specialised facilities and costs associated with them. One example is the CALUX (Chemical Activated Luciferase Gene Expression) assays, which are highly sensitive indicators of a number of endpoints including PAH metabolites, endocrine disruption, genotoxicity and oncogenesis (Martínez-Gómez et al., 2010). CALUX assays are cell-based assays using human osteosarcoma cells that have been genetically modified to fluoresce. They require a commercial licence, which prevents their adoption into some monitoring programmes, and their human cell basis limits their applicability to marine-specific risk assessment (Brack et al., 2016). The following section describes in-vivo bioassays which may be useful for the assessment of oil spill toxicity.

Cardiotoxicity
Recent advances regarding cardiac function in marine species have focused on the mode of action and physiological impacts on fish exposed to oil derived contaminants. Studies have shown polycyclic aromatic hydrocarbons can cause cardiotoxicity in multiple ways. Certain 4–6-ringed compounds, such as benz(a)anthracene and benzo(a)pyrene, activate the aryl hydrocarbon receptor (AhR), which interferes with cardiomyocyte formation and can lead to functional cardiac defects (Incardona et al., 2006, 2011). This form of cardiotoxicity can be prevented by knock-down of AhR genes (Tiem and Di Giulio, 2011). However, AhR-independent mechanisms exist. For example, tricyclic PAH compounds, e.g. phenanthrene, have been found to cause arrhythmia and reduce cardiac cell contractility (Incardona et al., 2011). Previous work found that crude oils disrupt excitation-contraction (EC) coupling pathways in fish cardiac cells (Brette et al., 2014), but further study revealed that the excitable cell pathway disruption was caused by phenanthrene at 0.9 mg L⁻¹. Phenanthrene inhibited mineral transport across cardiomyocyte membranes; these interruptions disturb EC coupling, leading to irregular cardiac contraction and rhythm (Brette et al., 2017). The mechanisms by which hydrocarbons and crude oils cause both physical and functional deformities in fish are described in detail by Incardona (2017) and Incardona and Scholz (2016).
Cardiac function in fish embryos appears to be very sensitive to PAH contamination. Embryonic pink salmon (Oncorhynchus gorbuscha) and Pacific herring (Clupea pallasii) exposed to 15 mg L⁻¹ and 0.23 mg L⁻¹ respectively showed reduced juvenile growth (salmon only) and reduced juvenile cardiorespiratory and cardiac function (both species), despite few visibly malformed embryos. When these exposure levels were compared with concentrations recorded during the herring spawning season in Prince William Sound, Alaska (following the Exxon Valdez oil spill in 1989), 108 of 233 water samples (46%) had concentrations that may have caused developmental cardiac defects likely to reduce adult fitness (Incardona et al., 2015).
Limited information is available regarding other species, although impacts have been observed in human studies investigating the impact of combustion particles (airborne particulate matter commonly containing PAHs, n-alkanes, hopanes and steranes). Javed et al. (2019) found links between environmental PAH exposure and cardiovascular disease in humans. In particular, studies suggested that benzo[a]pyrene interfered with AhR-dependent mechanisms and increased heart-to-body ratios. As in fish, cardiotoxic effects in humans have also been identified following exposure to pyrene, phenanthrene and benzo[e]pyrene, but less is known about the mode of action and its relation to AhR-regulated gene expression (Holme et al., 2019).
Thus, the assessment of cardiotoxicity is likely to provide a relevant, sensitive sublethal indicator in future monitoring programs. Recent work has focussed on the assessment of gene expression related to cardiac function (specifically, Nkx2.5) (Philibert et al., 2019), the electrophysiological properties of the heart and cardiomyocytes (Brette et al., 2017), and the implications for heart development and physiology (Morris et al., 2018). Some of these methods, e.g. the assessment of electrophysiological properties, are technically challenging. In-vitro AhR assays such as the H4IIE-Luc have also been used to assess hydrocarbon toxicity (e.g. Hong et al., 2012 and Kim et al., 2019); the kits' use of rat hepatoma cells may limit their relevance to marine ecological assessment, although it also provides an indication of mammalian effects. Therefore, the most appropriate method of assessing cardiotoxicity resulting from PAH exposure may be the manual monitoring of heart rate and heart physiology as described by Jung et al. (2013), Morris et al. (2018) and Philibert et al. (2019). Such methods may provide rapid sublethal assessment when used in conjunction with species at low trophic levels.

Detoxification responses and membrane damage
There is a lack of literature investigating marine species and intracellular indicators of BTEX and PAH toxicity (Duan et al., 2017). However, existing studies have investigated disruption of stress response pathways in a number of species. Agwuocha et al. (2013) investigated the effects of a 30-day exposure of the clam Gafrarium divaricatum to xylene (4.25 and 8.5 mg L⁻¹). They reported a reduction in cell membrane permeability across different tissues, associated with the inhibition of acid phosphatase (ACP), alkaline phosphatase (ALP) and ATPase enzyme activities, and an increase in anaerobic metabolism, indicated by an increase in lactate dehydrogenase (LDH) activity. They interpreted this as a stress response linked to shell closure, a common protective response in bivalves. Interactions of PAHs with the photosynthetic apparatus have also been attributed to interference with cell membrane function and integrity in algae (Gilde and Pinckney, 2012). For example, anthracene interferes with electron transfer (Aksmann and Tukaj, 2008; Sikkema et al., 1995); such interactions may also affect organelle and chloroplast morphology, organisation and functionality (Wang and Zheng, 2008).
With regard to risk assessment, detoxification processes and membrane damage can be identified in a number of ways, some of which are described in Table 2. One commonly used indicator is lysosomal damage. Lysosomes are organelles involved in nutrition, tissue repair and the management of cellular components, and play an important role in the sequestration and detoxification of contaminants. However, if lysosomes are overloaded, lysosomal damage occurs, which can lead to cytotoxicity and tissue dysfunction (Moore et al., 2006). This damage can be easily measured using neutral red dye. The neutral red retention assay is a miniaturised in-vivo assay which allows rapid, high-throughput colorimetric indication of lysosomal health and damage caused by contaminant sequestration.
Induction of detoxification processes can also be identified in-vivo by monitoring the induction of cytochrome P450 (CYP) 1A1, one of a family of proteins involved in the metabolism of xenobiotics and endogenous compounds, including PAHs (Tompkins and Wallace, 2007). The ethoxyresorufin-O-deethylase (EROD) assay (Burke and Mayer, 1974) indicates the conversion (O-deethylation) of 7-ethoxyresorufin (7-ER) to resorufin, a reaction catalysed by CYP 1A1 that can be measured fluorometrically (Rodrigues and Prough, 1991; Safe et al., 1989). Cytochrome P450 activity is calculated in proportion to the rate of 7-ER conversion (Petrulis et al., 2001). The assay provides a quantifiable measure of EROD activity, and therefore of detoxification pathway induction, allowing comparison between exposed organisms. Detoxification is an important toxicological indicator because, if protective detoxification and sequestration pathways become overwhelmed, genotoxic and cellular damage may occur. However, such techniques are unlikely to be applicable to rapid assessment due to the lack of available in-vitro tests and the challenges associated with maintaining animals within the laboratory.
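The EROD calculation described above can be sketched in a few lines of Python. This is a minimal illustration rather than a published protocol: the function names, the calibration factor (fluorescence units per pmol resorufin), the protein mass and all readings are hypothetical values chosen only to show how a rate of resorufin formation is turned into a protein-normalised activity and an induction ratio.

```python
def linear_slope(x, y):
    """Ordinary least-squares slope of y against x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


def erod_activity(times_min, fluorescence, fluor_per_pmol, protein_mg):
    """EROD activity in pmol resorufin per minute per mg protein.

    fluor_per_pmol is the slope of a resorufin standard curve
    (fluorescence units per pmol); it is an assumed calibration value.
    """
    rate_fluor = linear_slope(times_min, fluorescence)  # fluorescence units / min
    rate_pmol = rate_fluor / fluor_per_pmol             # pmol resorufin / min
    return rate_pmol / protein_mg


# Hypothetical kinetic readings, not measured data.
times = [0, 2, 4, 6, 8]                    # minutes
control = [100, 110, 120, 130, 140]        # fluorescence, unexposed organism
exposed = [100, 140, 180, 220, 260]        # fluorescence, exposed organism

control_activity = erod_activity(times, control, fluor_per_pmol=50.0, protein_mg=0.1)
exposed_activity = erod_activity(times, exposed, fluor_per_pmol=50.0, protein_mg=0.1)

# Fold induction of EROD activity in the exposed organism relative to the control.
induction = exposed_activity / control_activity
print(f"control: {control_activity:.2f}, exposed: {exposed_activity:.2f}, fold induction: {induction:.2f}")
```

Expressing the result as a fold induction relative to a control is one common way to compare exposed organisms; absolute activities depend on the calibration and protein measurement used.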

Oxidative stress
Oxidative stress can result in toxic and/or mutagenic effects following DNA base modifications, strand breaks, and the formation of lesions on the DNA of genes encoding antioxidant enzymes, stress-response genes, and cytokines (Girard and Boiteux, 1997). The regulatory mechanisms that control these responses are complex, involving activation of transcription factors and signal transduction pathways that are cell type and species specific. It is therefore not surprising that the reported responses vary. For example, exposure of freshwater catfish (Clarias gariepinus) to xylene, benzene and toluene for four days inhibited the activity of catalase (CAT) in the liver and gills (Otitoloju and Olagoke, 2011), whilst superoxide dismutase (SOD), a free-radical scavenger, was induced in the muscle and liver of mudskipper at 84.9 mg L⁻¹ after a 3-day exposure to p-xylene (Duan et al., 2017).
Glutathione-S-transferase (GST), an enzyme involved in oxy-radical defence, was particularly sensitive, being significantly induced in male amphipods (Gammarus locusta) exposed to 19-300 mg L⁻¹ p-xylene (Neuparth et al., 2014). Notwithstanding the induction of defence mechanisms, oxidative damage at the cellular level (lipid peroxidation) was identified in both fish (Clarias gariepinus) and amphipods (Gammarus locusta) following exposure to xylene; fish also showed sensitivity to benzene, toluene and crude oil (Neuparth et al., 2014; Otitoloju and Olagoke, 2011).
Food contaminated by PAHs poses a significant risk to consumers as some organisms concentrate contaminants within tissues. PAH-contaminated feed led to oxidative stress responses, elevated ROS production and induction of CYP1A in starfish (Asterias rubens) (Bucheli and Fent, 1995). Vertebrates, such as fish, are able to biotransform PAHs using cytochrome P450-dependent monooxygenase enzymes, which oxidise the parent compound to create polar metabolites, giving them a lower bioconcentration factor than invertebrates (Varanasi, 1989). Despite this, high molecular weight hydrocarbons, such as benzo[a]pyrene, have been shown to cause carcinogenesis through the formation of DNA adducts with dihydrodiol epoxides which have not been detoxified in the biotransformation process (Beiras, 2018).
Antioxidant enzyme activities allow the assessment of specific ROS responses: superoxide dismutase (SOD) converts superoxide (O₂⁻) into hydrogen peroxide (H₂O₂) (Otitoloju and Olagoke, 2011); catalases (CATs) facilitate the breakdown of H₂O₂ into water and oxygen (Martínez-Gómez et al., 2006). Some enzymes directly protect other intracellular functions: glutathione reductase (GR) maintains the balance between glutathione (GSH) and oxidised glutathione (GSSG), which are involved in the metabolism of electrophilic compounds under oxidative stress conditions (Winston and Di Giulio, 1991).
It is possible to assess total antioxidant activity (excluding glutathione) using a colorimetric ferric reducing antioxidant power (FRAP) assay (Halliwell and Gutteridge, 2015). Total oxyradical scavenging capacity is often recommended to allow a more robust assessment of oxidative stress, as antioxidant enzymes respond differently to different chemicals (Martínez-Gómez et al., 2006). However, in the case of oil spill assessment, specific endpoints used in conjunction with total enzyme response may be a useful way of fingerprinting the type of damage that compounds could be causing. There are numerous endpoints which could be considered as indicators of oxidative stress following an oil exposure, which, unfortunately, it is not possible to discuss in detail here. Each endpoint's relevance is species and location specific, and so any selection should be made with these factors in mind. For example, lipid peroxidation and (Ca²⁺, Mg²⁺)-ATPase activity significantly increased in corals following exposure to oil and a bacterial consortium (Fragoso Ados Santos et al., 2015); as the Ca²⁺-ATPase pump has been linked to membrane damage resulting in coral bleaching (Sandeman, 2006, 2008), Ca²⁺-ATPase may be more appropriate to the assessment of oil spills impacting tropical regions.

Genotoxicity
Genotoxicity has been identified following exposure to both oil constituents alone and dispersants. Genotoxic damage is either repaired by the cell or leads to apoptosis; uncontrolled DNA damage can lead to cancer (Prasse et al., 2015), which suggests that the assessment of genotoxicity may act as a precursor indicator of carcinogenesis. PAHs commonly cause genotoxicity in plants and algae: El-Sheekh et al. (2000) found that algae exhibit lower DNA, RNA and protein content when exposed to hydrocarbon contamination. In addition, a PAH mixture (fluoranthene, pyrene and benzo[a]pyrene) suppressed gene expression for photosynthetic pigments and silica-associated proteins that prevent cell division in diatoms (Bopp and Lettieri, 2007). Bagchi et al. (1998) hypothesise that genotoxicity resulting from PAH contamination is due to DNA damage caused by ROS formed during hydrocarbon degradation.
By using a selection of assays and endpoints it is possible to identify the genotoxic potential of samples (Brack et al., 2016). There are a number of genotoxicity assays available (see Table 2 and Brack et al. (2016) for a more detailed review). However, some of these methods are technically challenging and time-consuming. Examples include the widely employed comet assay, which detects and quantifies DNA strand breaks in individual cells (Singh et al., 1988), and the micronucleus assay, which assesses the formation of micronuclei containing part or whole chromosomes (Countryman and Heddle, 1976).
Tests applicable to rapid risk assessment are generally bacteria based, although few are marine specific. The umuC assay uses a fluorescence-based bioreporter strain of Salmonella sp. to indicate upregulation of the umuC operon, part of the SOS response to DNA damage. This response involves the induction of multiple genes encoding proteins for DNA protection, repair, replication, mutagenesis and metabolism (Janion, 2008), and regulates DNA repair (Oda et al., 1985). However, the umuC assay may not be the most effective indicator for marine systems because it uses the halo-sensitive bacterium Salmonella sp. (Podgórska and Węgrzyn, 2007). In this case Mutatox®, an assay using a mutated dark strain of the luminescent marine bacterium Aliivibrio fischeri, may be more applicable to marine rapid assessment. Light production is restored by low concentrations of mutagens, allowing the sensitive assessment of genotoxicity. Mutatox® is well validated, having been used in a number of studies, and has been reported as having similar sensitivity to the popular Ames test, a reverse mutation assay using the freshwater bacterium Salmonella typhimurium (Podgórska and Węgrzyn, 2007). In addition, as Mutatox® is available as a kit and uses a widely abundant marine species, it may be easily applicable to rapid risk assessment.
Endocrine disruption to androgen and/or estrogen receptors has been identified following PAH exposure in human osteosarcoma cells, mice (Arcaro et al., 1999; Chaloupka et al., 1993) and following PAH and dispersant exposure using in-vitro bioassays (Judson et al., 2010; Schultz and Sinks, 2002). In-vitro endocrine disruption kits have been shown to correlate well with in-vivo endocrine endpoints (Sonneveld et al., 2006). Such kits include the yeast estrogen screen and yeast androgen screen (YES/YAS), described in Table 2. YES and YAS are simple, cost-effective, high-throughput tools which measure exposure to estrogenic and androgenic compounds. The tests are highly validated and recommended for the assessment of oil spills (Martínez-Gómez et al., 2010).
Yeast assays have been criticised for being less sensitive than mammalian cell-based assays (Leusch et al., 2014). However, interspecific differences have been identified in the estrogen receptor subtypes of fish and humans (Cosnefroy et al., 2012; Ihara et al., 2015), suggesting that, although more sensitive, human cell lines may not be a reliable proxy of damage to marine species. Thus, yeast-based assays may provide a useful measure of toxicity until a wide variety of marine-specific indicators of toxicity are available. In addition, it should be considered that endocrine disruption may not cause permanent impacts and therefore may only provide insight into the health of the organism rather than long-term effects. Further work should be conducted to identify potential marine-specific indicators of endocrine disruption and assess the applicability of yeast-based assays to marine studies before they are incorporated into any marine monitoring program.
Not all biomarker species will be appropriate to rapid assessment due to the extensive and often lengthy exposures required to assess impacts at environmentally relevant concentrations. Further, the endpoints considered by these rapid tests may not represent the most sensitive targets for oil-exposed intact animals, as they were not developed with crude oil contamination in mind. Future studies should focus on collating baseline toxicity data for individual hydrocarbons, whole oils and their metabolites, using environmentally relevant and standardised conditions and a number of species from different trophic levels, against which rapid assessment data can be compared. In addition, studies should consider and aim to prevent variances or biases in sensitivity which may result from species seasonality, age or sex (Finch and Stubblefield, 2016; Harris et al., 2014; Tannenbaum et al., 2019). The possibility of more marine- and oil-specific assays should be investigated and developed to aid future assessment. Until then, the potential for a selection of toxkits and biomarker species appropriate for a rapid oil-specific monitoring program should be investigated.

Standardised dosing techniques
Laboratory toxicity testing is essential to our ability to make decisions about oil spills. However, the complex physicochemical properties of oil, for example its low solubility and high volatility, and the changes which occur following dispersant application make it difficult to reach and maintain target concentrations within test systems (Bragin et al., 2016). Variable study designs also limit the comparability, reliability and applicability of a number of studies to modelling, which relies on the accuracy of defined concentrations (McGrath et al., 2018). The Chemical Response to Oil Spills: Ecological Effects Research Forum (CROSERF) has developed standardised methods for the creation of water accommodated fractions (WAFs) and chemically enhanced water accommodated fractions (CEWAFs) to minimise this variability in testing (Redman and Parkerton, 2015). In addition, careful experimental designs using closed systems have effectively minimised loss of volatile and semi-volatile compounds (Mayer et al., 1999). However, these methods may not be applicable to chronic exposures, which often require additional culture medium to maintain test organisms (Jensen et al., 2008).
Passive dosing offers an alternative dosing method that can buffer and therefore prevent compound loss in small, large, static and flow-through test systems. A simplified version of the dosing system is shown in Fig. 4. Briefly, hydrophobic organic compounds (HOCs) are partitioned from a high-concentration methanol solution into a biocompatible polymer, most commonly silicone. When the polymer is added to test media, the HOCs partition from the polymer into the test media until an equilibrium is reached. Continued partitioning compensates for compound losses and helps to maintain HOC exposure concentrations (Smith et al., 2010a, 2013). Passive dosing has been used successfully (±20% of nominal concentration) in acute toxicity tests in closed vessels, most commonly around 20 mL, with algae (Raphidocelis subcapitata and Scenedesmus vacuolatus) (Bandow et al., 2009; Bragin et al., 2016; Mayer et al., 1999), water flea (Daphnia magna) (Smith et al., 2010b), springtail arthropods (Folsomia candida) (Mayer and Holmstrup, 2008), and bacteria (Aliivibrio fischeri) (Brown et al., 2001). The technique has also been used to dose test systems of various sizes with flow-through chambers of 130-750 mL and dosing chambers between 1 and 2 L, allowing the analysis of chronic exposures. Chronic exposure studies include zebrafish (Danio rerio) (Butler et al., 2013), shrimp (Americamysis bahia) and corals (Porites divaricata). Phenanthrene, anthracene, fluoranthene, and pyrene have been dosed into open polystyrene 24-well plates for in-vitro cell culture assays. However, the use of an open test system was less successful than sealed vials, with an unacceptable (>20%) loss of more volatile compounds such as naphthalene (Smith et al., 2010a, 2013). In addition, the use of polystyrene vessels may have altered the composition of the hydrocarbons due to interactions between oil and plastics (Rochman et al., 2013). Thus, the use of more easily sealed and more inert materials should continue to be explored for lower molecular weight compounds.
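The buffering effect of passive dosing can be illustrated with a simple equilibrium mass balance. The sketch below is a simplified illustration, not a description of any cited study: it assumes a single compound, instantaneous equilibrium between polymer and water, and hypothetical values for the polymer-water partition coefficient, volumes, loaded mass and loss. It compares the aqueous concentration after the same loss of compound in a passively dosed vessel and in an undosed vessel, and checks each against the ±20% of nominal criterion mentioned above.

```python
def equilibrium_water_conc(total_mass_ug, k_pw, polymer_l, water_l):
    """Aqueous concentration (ug/L) with polymer and water at equilibrium.

    Mass balance: total = C_p * V_p + C_w * V_w, where C_p = K_pw * C_w.
    """
    return total_mass_ug / (k_pw * polymer_l + water_l)


def within_tolerance(measured, nominal, tolerance=0.20):
    """True if measured is within +/- tolerance (as a fraction) of nominal."""
    return abs(measured - nominal) <= tolerance * nominal


# All values below are hypothetical, chosen only for illustration.
K_PW = 10_000.0     # polymer-water partition coefficient (L/L)
POLYMER_L = 0.0005  # 0.5 mL of silicone
WATER_L = 0.020     # 20 mL of test medium
LOADED_UG = 100.0   # compound mass loaded into the system
LOSS_UG = 0.2       # mass lost, e.g. by volatilisation or sorption

nominal = equilibrium_water_conc(LOADED_UG, K_PW, POLYMER_L, WATER_L)

# Passively dosed vessel: the polymer reservoir re-supplies the water phase.
dosed_after_loss = equilibrium_water_conc(LOADED_UG - LOSS_UG, K_PW, POLYMER_L, WATER_L)

# Undosed vessel starting at the same concentration: the loss comes
# entirely out of the small mass dissolved in the water.
undosed_after_loss = (nominal * WATER_L - LOSS_UG) / WATER_L

print(f"nominal: {nominal:.2f} ug/L")
print(f"dosed after loss: {dosed_after_loss:.2f} ug/L, ok={within_tolerance(dosed_after_loss, nominal)}")
print(f"undosed after loss: {undosed_after_loss:.2f} ug/L, ok={within_tolerance(undosed_after_loss, nominal)}")
```

Because almost all of the compound resides in the polymer, the same absolute loss barely changes the dosed concentration, whereas it removes roughly half of the dissolved mass in the undosed vessel. Real systems also depend on partitioning kinetics, which this equilibrium sketch ignores.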
The technique has been used most commonly with PAHs containing 1-6 benzene rings e.g. toluene, 1-methylnaphthalene, phenanthrene, benzo[a]anthracene, benzo[a]pyrene and trichlorobenzene (Bandow et al., 2009; Mayer et al., 1999; Smith et al., 2010a). However, recent work by Bera et al. (2018) has shown that passive dosing yields results comparable to WAF preparations. There are two main differences between WAF and passively dosed systems: passively dosed systems limit exposure to oil droplets, and higher molecular weight molecules partition more slowly and take longer to reach equilibrium. Thus, it is possible to reach test concentrations faster with WAFs than with passive dosing due to faster dissolution of oil; however, concentrations within passively dosed systems are more stable as they prevent the formation of oil droplets. As the composition, amount, and stability of oil droplets are specific to the oil and dosing method used, their presence can confound toxicity studies both by directly oiling organisms and by complicating exposures through droplet dissolution (Bera et al., 2018). Although it could be argued that mixtures containing droplets may be more comparable to a spill scenario, greater stability within test concentrations is likely to improve the reproducibility and consistency of oil toxicity testing. In addition, the use of both methods in conjunction could enable comparison of the toxicity of solubilised oil with that of oil at varied dissolved phases.
Challenges exist with adapting current toxicity kits and biomarker assays to allow passive dosing, as most developments have been made in closed test systems greater than 20 mL, whereas most toxicity assays are based on open well plate systems using smaller volumes (<1 mL). Future work should focus on continuing to validate the use of passive dosing with oil and on miniaturising passive dosing. Given the variability within exposures, toxicity testing should continue to be supported by chemical analysis to confirm exposure concentrations and stability for the length of the test.
Studies supported by analytical chemistry showing that exposure concentrations were maintained may allow the extrapolation and comparison of studies to other species and chemicals using the target lipid model (TLM). The TLM describes the relationship between PAH and species sensitivity as a quantitative structure-activity relationship, based on the assumption that narcosis occurs in the lipids of organisms (Di Toro et al., 2000). Previous work has observed that critical body burdens, the relative toxicity to organisms, increase linearly with the lipid concentrations within organisms (Van Wezel et al., 1995). The TLM uses this relationship to extrapolate and compare species sensitivities to different hydrocarbons against one another (McGrath et al., 2018). In addition, the outputs can be used as inputs to other, more detailed, risk-based and ecosystem models.
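The form of the TLM relationship can be sketched as follows. This is an illustrative sketch only: it assumes the commonly used log-linear form, log10(LC50) = slope × log10(K_OW) + log10(critical body burden), with LC50 in mmol L⁻¹ and the critical body burden in µmol g⁻¹ lipid. The default slope is the value usually attributed to Di Toro et al. (2000) and should be checked against that source before use; the log K_OW values are approximate and the critical body burdens are hypothetical.

```python
import math

# Assumed universal narcosis slope; verify against Di Toro et al. (2000).
ASSUMED_SLOPE = -0.945


def tlm_lc50(log_kow, critical_body_burden, slope=ASSUMED_SLOPE):
    """Predicted LC50 (mmol/L) from log Kow and a species-specific
    critical body burden (umol/g lipid), using the log-linear TLM form."""
    return 10 ** (slope * log_kow + math.log10(critical_body_burden))


# Approximate log Kow values, for illustration only.
LOG_KOW = {"naphthalene": 3.3, "phenanthrene": 4.5}

# Hypothetical critical body burdens (umol/g lipid) for two species.
CBB = {"sensitive species": 50.0, "tolerant species": 100.0}

for species, cbb in CBB.items():
    for compound, log_kow in LOG_KOW.items():
        lc50 = tlm_lc50(log_kow, cbb)
        print(f"{species}, {compound}: LC50 = {lc50:.4f} mmol/L")
```

Under this form, a more hydrophobic compound (higher K_OW) has a lower predicted LC50 for every species, and the ratio of LC50 values between two species for the same compound equals the ratio of their critical body burdens, which is what allows sensitivities to be compared across compounds and species.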

Linking toxicity tests to ecosystem effects
Ecological risk assessment aims to quantify and predict the likelihood that chemicals will have adverse effects on an ecosystem (Forbes et al., 2008). In the case of oil spill risk assessment, this means quantifying and predicting the damage an oil spill, and its remediation, is likely to cause to an environment. It is clear from the above discussion and previous reviews (e.g. Blaise et al., 2004) that the sensitivity of a single species cannot represent an ecosystem. However, an appropriate selection of bio-indicator species and biomarkers may generate a more representative estimate of ecosystem effects. In order to estimate risk, data must be extrapolated to predict impacts at population and community level. Extrapolation from individual to population responses is challenging as the relationships and population dynamics are nonlinear (Ferson et al., 1996).
A number of models have been developed to predict the fate of oil following a spill, but few have investigated the ecological impact of a spill (French-McCay, 2002). For example, the Spill Impact Model Application Package (SIMAP) is able to evaluate exposure considering oil and organism migration; impacts of spill mitigation measures; short-term acute toxicity; indirect effects of resource destruction, e.g. food sources and habitats; and population impacts resulting from mortality and sublethal effects. However, the model is limited as it is unable to quantify sublethal, chronic effects or changes in ecosystem structure and behaviour resulting from increased pressure on growth, survival, and reproductive stress (French-McCay, 2002, 2004). The progress of environmental models for oil spill risk assessment has been reviewed elsewhere (Li et al., 2016; McGrath et al., 2018; Nelson and Grubesic, 2018). Research should be directed to the collation of both whole organism and sublethal responses to oil and hydrocarbon contamination, enabling the improvement of such tools and, where possible, their inclusion into a biological assessment toolkit allowing holistic risk assessment.

Conclusions and future perspectives
It is clear from the preceding sections that the complex and variable nature of oil poses a unique challenge to risk assessment. Our review of the state of science regarding oil's unique complexity, the challenges following a spill, the biological impact of oil spills, and the use of rapid assessment tools (including commercial toxkits and bioassays) has highlighted current issues preventing effective, rapid risk assessment of oils.
The use of a selection of bio-indicator species and biomarkers covering a range of sensitivities and endpoints allows for the measurement and characterisation of such toxicity, the spatial extent of the spill and the estimation of the recovery time required and/or the effectiveness of any remediation measures (Judson et al., 2010; Martínez-Gómez et al., 2010). To be of use, methods should be specifically tailored to take into account the hydrophobic and volatile nature of oil. Passive dosing is a promising method for circumventing some of this complexity, allowing for reproducible and accurate validation of candidate bioassays. When used in tandem with the target lipid model (TLM), which uses the inverse relationship between LC50 and K_OW to account for the varying toxicity of individual PAHs (Di Toro et al., 2000), reliable data from individual compounds may also allow for comparison across species and trophic levels and the extrapolation of data to predict the potential impacts of oil (McGrath et al., 2018). Furthermore, robust standardised data provide validation for oil spill models, allowing more effective estimation of oil toxicity across trophic levels.
Organisms within the same trophic level can respond differently to toxicants (Codina et al., 1993), organisms within different trophic levels can exhibit varied responses (Okay et al., 2005; Ribo, 1997), and organismal physiology varies, creating a complex problem (Chapman, 2000). Suites of bioassays and biomarkers, such as those described in Table 2, applied across a range of test species, can encompass the range of sensitivities observed across ecosystems. Bioassays should ideally be simple and carried out using environmentally relevant concentrations and standardised dosing methods, and supported by chemical characterisation of both the samples and the oil. Assays applicable to passive dosing using closed test systems are of particular interest for further development. Remaining research gaps include the need for more marine test species, and further development of assays based on sublethal effects, such as swimming speed and cardiotoxicity.