Best practice in Ecopath with Ecosim food-web models for ecosystem-based management

Ecopath with Ecosim (EwE) models are easier to construct and use compared to most other ecosystem modelling techniques and are therefore more widely used by more scientists and managers. This, however, creates a problem with quality assurance; to address this we provide an overview of best practices for creating Ecopath models. We describe the diagnostics that can be used to check for thermodynamic and ecological principles, and highlight principles that should be used for balancing a model. We then highlight the pitfalls when comparing Ecopath models using Ecological Network Analysis indices. For dynamic simulations in Ecosim we show the state of the art in calibrating the model by ﬁtting it to time series using a formal ﬁtting procedure and statistical goodness of ﬁt. Finally, we show how Monte Carlo simulations can be used to address uncertainty in input parameters, and we discuss the use of models in a management context, speciﬁcally using the concept of ‘key runs’ for ecosystem-based management. This novel list of best practices for EwE models will enable ecosystem managers to evaluate the goodness of ﬁt of the given EwE model to the ecosystem management question.


Introduction
Ecopath with Ecosim (EwE) is a modelling complex that has been used to create mass balanced models of marine and aquatic ecosystems since the 1980s, when the first Ecopath model of the French Frigate Shoals was created by Jeff Polovina (1984). The software and techniques have been improved to include methods of comparing ecosystems using Ecological Network Analysis (Christensen and Pauly, 1992), to model dynamic changes using Ecosim (Walters et al., 1997), to model spatial changes using Ecospace (Walters et al., 1999) and to model both temporal and spatial dynamics using (2014) few have been fitted to historical data using Ecosim (27/105) and even fewer (13/105) were actually calibrated by fitting to time series data (Table S1 in Heymans et al., 2014). Additionally, very few models have actually been used in a management context. A noteworthy exception is the model of the North Sea (Mackinson et al., 2009b), which has recently been used to establish a "key run" (see Section 8 below for a description of key runs) for the ICES Working Group on Multispecies Assessment Methods, WGSAM http://www. ices.dk/community/groups/Pages/WGSAM.aspx (ICES, 2013), and used to evaluate EU Commission proposals for multi-annual plans in the North Sea (STECF, 2015). Models have also been used in the Gulf of Alaska and Bering Sea (Aydin et al., 2007;Gaichas et al., 2011) and the Western Scotian Shelf (Araújo and Bundy, 2012), where those outputs informed the management process by evaluating previously unexplained mortality.
Because of its ease of use, both the construction and comparison of EwE models can be easily misused by inexperienced users. Ecopath models are easy to construct, with very few automatic checks, but instead requiring the understanding and application of ecological knowledge by users. Ecological Network Analysis (ENA) comparisons are similarly easily undertaken without checking the thermodynamic and mathematical rules behind the analyses (but this is true for all ecological analyses). EwE models, once balanced (whether sensibly or not) can then be used to describe dynamics in the ecosystem without forced calibrations, which again can be used for management without proper verification and validation. This easy access, with limited quality control, may have contributed to the limited utilization of EwE models for management applications.
In order for EwE models to be accepted as being rigorous and consistent enough to be used for management, guidelines are needed to establish best practices in creating and using the models. These guidelines need to take into consideration the thermodynamics and ecological rules available to users, recommended approaches to balance an Ecopath model, the best tools to fit models to time series data, and how to evaluate uncertainty. In addition, we need best practices for comparing ecosystem models of different ecosystems or different models of the same ecosystem via standardized indicators, descriptors, etc. Here, we provide an overview of the best practices for creating Ecopath models, fitting Ecosim models to time series, evaluating uncertainty, and comparing models. The contents herein can serve as guidelines for testing and reviewing EwE models, especially if they are to be used for management purposes.

Why and how to create an Ecopath model
The FAO have suggested best practices for developing models for an ecosystem approach to fisheries, including considerations for model aggregation, spatial consideration, etc. However, in this paper we are providing the best practices specifically with respect to creating, balancing, fitting and using an EwE model, after one has followed the more generalized best practice guidelines of FAO (2008).

Why this model?
The choice of modelling technique should always depend on the policy or research question that is to be addressed. The model has to be constructed with the policy or research questions clearly formulated, which will lead to using the correct modelling framework. The FAO report on models used for an ecosystem approach to fisheries (Plagányi, 2007) gives a flow chart that summarizes the models explained and where they are best used. This is a very handy way of making sure the model one uses is able to answer the question being asked. However, it is important that the newest descriptors of the techniques are used when making that decision, as the 2007 report describes the capacity of modelling techniques at that time. According to the report, EwE can be selected to: identify and quantify major energy flows in an ecosystem, describe the ecosystem resources and their interactions among species, evaluate the ecosystem effects of fishing or environmental changes, explore management policy options by incorporating economic, social and ecological considerations of fisheries, evaluate the placement and impact of marine protected areas, or predict the bioaccumulation of persistent pollutants. EwE models are also useful for testing ecosystem theories on resilience, stability and regime shifts (Pérez-España and Arreguín-Sánchez, 2001;Tomczak et al., 2013;Arreguín-Sánchez and Ruiz-Barreiro, 2014;Heymans and Tomczak, 2016).

Creating an Ecopath model
A EwE model must represent the main species and trophic levels that are present in the modelled ecosystem and are of relevance for the policy or research question that is to be addressed. The time frame and spatial extent of the EwE model depends on the questions that are to be addressed, as well as data availability. The modelled ecosystem should, as a rule, include the whole habitat area of the main species of concern. If for instance the role of a migratory species on a smaller ecosystem needs to be studied, the biomass of that migratory species would have to be forced in the model. The model will not be able to predict its biomass without information of dynamics outwith the system. The baseline Ecopath models are usually based on average parameters for a given baseline year, and thus typically average over seasonal changes in the ecosystem, although daily (Orr, 2013) and seasonal  models have also been developed. To use a EwE model for ecosystem-based management, time series fitting is required to incorporate density-dependence, which best is done by evaluating how the EwE model can reproduce historical dynamics. It follows that sufficiently long time series are required to provide contrast, and the baseline Ecopath model should thus be based on the earliest time where data is available. Temporal extents have ranged among EwE models from 10 to >50 years, depending on the degree of available data. Alternatively, a model can be constructed for the year where most data is available, and then adapted to an earlier timeframe where time series data started, as was done in several examples such as for Central North Pacific (Cox et al., 2002), the Aleutian Islands (Heymans, 2005), Newfoundland (Pitcher and Heymans, 2002), the North Sea (Mackinson and Daskalov, 2007), and the north-western Mediterranean (Coll et al., 2008). It is important, however, to consider the choice of time frame in relation to changes in ecosystem structure that should be accounted for in the dynamic behaviour of the ecosystem model. For example the behaviour might be different in one climate regime than in another. In some cases it is possible to model this change, such as was done in the Baltic Sea (Tomczak et al., 2013), the Gulf of Alaska (Heymans et al., 2007), and the northern Benguela (Heymans and Tomczak, 2016). However, it might not be possible to model this behavioural change, which will severely limit the given model's long-term predictive capabilities.
EwE models must contain at least one detritus group and one consumer group, while there in practice is no upper limits to the number of groups that can be included. Primary producers, while most often included, are not required. As an example, in deep sea models (Tecchio et al., 2013) primary producers may not be present and more than 1 detritus group might be needed to describe the system adequately, for example incorporating marine snow. Often, a combination of functionality and lack of data will require aggregation of several species into 'functional groups' to describe the ecosystem. Functional groups can be individual species or groups of species that perform a similar function in the ecosystem, i.e. have approximately the same growth rates, consumption rates, diets, habitats, and predators. They should be based on species that occupy similar niches, rather than of similar taxonomic groups. All species in a functional group should ideally show similar biomass and catch trajectories over time. In addition, functional groups might be used when a specific policy question requires parsimony. For example it might not be necessary to describe the benthic functional groups in great detail when addressing top predators such as marine mammals. Instead, representation of benthos groups might be aggregated in a simplified way. In most fished ecosystems, fisheries discards may play an important role, and therefore discards might need to be included as a separate detritus group that are eaten by species such as seabirds, crabs and mammals before the discards are broken down to detritus. If more than one detritus group is defined the detritus fate needs to be described in the model. The detritus fate from all functional groups needs to be allocated to an appropriate detritus group otherwise there will be instability when moving towards modelling temporal dynamics.
Top predators are important to constrain the parameters of other consumers in the ecosystem, and therefore it is especially important to have biomass estimates for the top predators in the ecosystem. It is also important to split species into "multi-stanza" groups where it is necessary to capture ontogenetic diet shifts and/or different exploitation patterns. If species are split into multistanza groups, estimates are needed of the age (in months) of the split between stanzas (e.g. adult and juvenile), the total mortality per stanza, the von Bertalanffy K parameter, and the estimate of weight at maturity as a fraction of weight at infinity (W mat /W inf ) (relative weight at maturity, determines size-fecundity relationship). The parameter W mat /W inf is important because it influences the degree of compensation in recruitment (i.e. how productive the juvenile stanzas are at low spawning stock biomass), which has a direct influence on recovery and depletion rates in time dynamics (Ecosim) and estimates of F msy . The reason for this is that when W mat /W inf is small, fish mature at early ages, and it is quite possible for the 1st and/or 2nd "juvenile" stanza(s) to produce enough eggs to sustain recruitment even when the "adult" F is very high. This is a classic prediction from equilibrium theory, i.e. that F msy can be infinite if size at first capture is enough larger than size at maturity for animals to have replaced themselves before becoming vulnerable. The default setting for W mat /W inf (0.09) is probably low in many cases, and a better value from the Beverton-Holt invariants would be 0.22. For each stanza, estimates of diet and predation, catches and discards also need to be included.
For each consumer functional group, consumption (Q) = production (P) + respiration (R) + egestion (E) and production a function of the natural and fishing mortality of the ecosystem, so that: where P i is the total production rate of group i, Y i is the total fishery catch rate of i, M2 i is the total predation rate for i, B i the biomass, E i the net migration rate (emigration-immigration), BA i is the biomass accumulation rate for i. EE is the ecotrophic efficiency of the group, and indicates the proportion of the production or total mortality (where P/B = Z) that is actually explained in the model. 1 − EE is the proportion of the production that remains not explained, i.e. the 'other mortality' rate for i (Christensen et al., 2008). Normally, biomass, P/B and Q/B, in addition to diets and catches, are inputs of the model, while EE is an output. However, in some cases when data on biomass are not available, EE values are set to allow the model to calculate missing parameters. When using EE as input parameters, one should consider: • Top predators that are not fished or predated upon will have an EE of 0 -indicating that none of the production of that group is consumed or fished in the ecosystem, thus a biomass estimate for such groups is always required. • EE values for primary producers are also often significantly less than 1, with only 10% of macrophytes such as kelp being consumed in some ecosystems (Mann, 1988) and phytoplankton in bloom only ∼50% is utilized outside of the microbial loop (Azam et al., 1983). • Most other functional groups should have EE values approaching, but lower than 1 if all of their production is utilized within the ecosystem that is modelled.
It is, however, not advisable to use EE values to estimate parameters of functional groups that are placed very high or very low in the food web due to the low constraints of the algorithm to properly parameterize those groups: setting EE to calculate the biomass of top predators is generally not possible, as they are not consumed within the system and therefore EwE will be unable to calculate the biomass of top predators needed in the system. Similarly, setting EE to calculate the primary producer parameters, if necessary, needs to consider that EE is generally only high (close to 1) in clearwater systems such as coral reefs or highly oligotrophic systems such as open seas, while systems that have strong seasonal patterns in productivity, such as spring blooms, tend to have EE's of 0.5 or lower.
For each functional group, estimates are required for biomass, production/biomass (P/B), consumption/biomass (Q/B) and diet, while estimates of catches and discards are needed for fished groups. The excretion/egestion rate, for which the default is 20% can vary between functional groups, and will typically be higher for groups that predominantly feed on phytoplankton. An excretion rate of 40% is thus often more realistic for zooplankton -to evaluate this, it is important to check the respiration/biomass ratio for a group, and compare this to values from physiological studies. The issue with using the default excretion rate often appears when the estimated EE for detritus (which is defined as detritus required/detritus produced) exceeds unity: Increasing the excretion rate results in more detritus being produced.
The microbial loop is important, but not always of importance for a given policy question. Where it is, it is important to consider how to model the loop explicitly, and where it is not, it can often be considered as part of the detritus/phytoplankton part of a model. It often appears in models where the primary production is entered that it is insufficient to meet the demands of the consumers. Where models, however, have explicitly represented the microbial loop, e.g. North Sea (Mackinson and Daskalov, 2007), a considerable part of the production that needs to be passed on to the higher predators can be derived through microbial pathways. When including the microbial loop, estimates of primary production are rarely exceeded. In fact, a lot of primary production is not utilized directly by zooplankton, but settles to the bottom and enters the food web through detrital pathways.
Estimates of biomass for each functional group need to reflect the biomass of the group in the whole model area, not just the area within the ecosystem that they occupy. Thus, for species that only occupy part of the area the biomass need to be pro-rated by area. Biomass estimates are calculated in weight/area, e.g. t km −2 and can be calculated through standard surveying or stock assessment methods (Hilborn and Walters, 1992;Walters and Martell, 2004).
Diet estimates for functional groups are obtained from stomach content analyses and should reflect the best available data on diets of predators by weight, not frequency of occurrence or numbers (Stergiou and Karpouzi, 2002). Alternatively, they can also be obtained from stable isotopic analyses (Ramsvatn, 2013;Deehr et al., 2014) using Bayesian isotopic mixing models. Where functional groups are used, weighted averages using consumption rates or biomass proportions of species specific diets should be used. In some cases, species might consume food outside the ecosystem (i.e. birds that migrate from sea -land, etc.) and as such this should be reflected by an import to the diet of the species. Often the best diet estimates are not available for the time frame of the initial Ecopath model, in which case it is prudent to create a balanced Ecopath model for the year where most diet data is available as a first step, before adapting that model for the timeframe needed, by assuming that changes in prey biomass could reflect change in the diet while consumer preferences would stay similar over time as was done for the Aleutian Islands (Heymans, 2005), Newfoundland (Pitcher and Heymans, 2002) and the North Sea (Mackinson and Daskalov, 2007). In addition, it is important to be aware of the issues of cannibalism, which might impact the estimation of biomass. A rule of thumb is <5% cannibalism in one group, as this causes computational problems when estimating biomass, but also improves the representation of the groups (Christensen et al., 2008). If cannibalism is a problem it is as a rule best to split the group into a multi-stanza group.
Estimates of P/B and Q/B can be obtained from laboratory experiments. However, if these are not available readily available empirical equations can be used. The P/B ratio is related to the turnover rate of a functional group and is equal to the total mortality (Z, year −1 ), where is equal to fishing mortality (F, year −1 ) plus natural mortality (M, year −1 ), all expressed as instantaneous annual rates. F is easily obtained from stock assessments or calculated as: And for fish, M may be estimated from Pauly's empirical equations (Pauly, 1980): where L ∞ and W ∞ are the length (total length in cm) and wet weight (g) at infinity for the population, K is the von Bertalanffy growth parameter (year −1 ), and T ( • C) is the mean annual water temperature at which the population is maintained. Similarly, if not estimated directly from experiments, at least for fish the Q/B ratio (year −1 ) may be estimated from the empirical ratio of Palomares and Pauly (1998): where T' is the mean annual habitat temperature for the population expressed as 1000/(T + 273.1), A is the aspect ratio of the tail, h = 1 when the fish is a herbivore and 0 when not, and d = 1 when the fish is a detritivore and 0 when not. Alternatively, Palomares and Pauly (1998) estimated the Q/B ratio as: where Z is the total mortality. The aspect ratio of the tail is a measure of the swimming and metabolic activity of the fish expressed as: where h is the height (mm) of the caudal fin, and s the surface the area (mm 2 ) of the fin, extending to the narrowest part of the caudal peduncle (Palomares and Pauly, 1998). For invertebrates, the best estimates for natural mortality and Q/B ratios are obtained from the work of Thomas Brey (1999, 2012. These can be estimated using the formulas available at http://thomas-brey. de/science/virtualhandbook, or from the allometric relationships found in Peters (1983). For seabirds the consumption rate (DR or daily rate of fish consumed in g) can be estimated from Nilsson and Nilsson (1976) as: where W is the mean body weight of the species (g). While for marine mammals the consumption rate (DR or daily rate, in kg) can be estimated from Innes et al. (1987) as: where W is the mean body weight of the species expressed in kg.

PREBAL: rules of thumb
Once the input data to a model has been collected, it is imperative that the underlying assumptions are tested. This is described by Link (2010), who proposes a set of pre-balance (PREBAL) diagnostics including checking the slopes of biomass ratios, vital rates, total production, etc. based on trophic levels. These diagnostics are based on general ecological and fisheries principles, and should be checked before a model is balanced. The PREBAL criteria include some "rules of thumb" such as the assumption that biomass estimates should span 5-7 orders of magnitude, with >7 indicating too many taxonomic/age-structured groups, and <5 indicating that the model might be too focussed on specific trophic levels (Link, 2010). In addition, the slope of the biomass (on a log scale) should decline by 5-10% across all the taxa arrayed by trophic level (Link, 2010), and groups that are above or below the slope-line should be checked for data integrity. See Link (2010) for further information.
An example of some PREBAL diagnostics for a newly published model of the West Coast of Scotland (Alexander et al., 2015) is given in Fig. 1. These easy to plot diagnostics show that biomass ( Fig. 1a) estimates of harbour seals, cetaceans, seabirds, poor cod, lobster and most of the juvenile stanzas (haddock, whiting and cod) might potentially be underestimated in this model, while that of crustaceans, mackerel, blue whiting and large demersals, might be over-estimated. As with biomass, vital rates of predators should be lower than those of their prey (Link, 2010). In the West Coast Scotland model, annual P/B ratios (Fig. 1b) for cetaceans, lobster, edible crabs and scallops seem low, while those of adult whiting, epifauna and phytoplankton are high. Similarly, annual Q/B ratios (Fig. 1c) of seals, cetaceans, seabirds, cephalopods, epifauna, infauna and small zooplankton seem very high, while that of crustaceans, horse mackerel, edible crab, rays, large demersals and monkfish, seem low. When creating a new model, these estimates should be checked and although it was undertaken to some extent in the West Coast of Scotland model (Alexander et al., 2015), further analyses and re-parameterization might still be required, especially for the top predators. PREBAL has not been used extensively although examples are now forthcoming (Lassalle et al., 2014;Zetina-Rejón et al., 2015).
A further diagnostic, the P/Q ratio or the gross food conversion efficiency, indicates that a group cannot produce more than a fraction of what it has eaten, based on the 2nd law of thermodynamics  (Link, 2010). In fact, the P/Q ratio should mostly be between 0.1 and 0.3, as described in Darwall et al. (2010). Prior to final balancing, we recommend that these rules of thumb should be checked. A new PREBAL diagnostic algorithm is currently being developed for release in a new version of EwE (Fig. 2). This diagnostic is only meant to be a first check, and cannot undertake the more in depth analyses of the ratios of small pelagics to top predators, etc. as suggested in Link (2010). These more in depth analyses should be done before the model topology is decided and before the above mentioned diagnostics are run.

Balancing Ecopath models
Once the Ecopath parameters have been entered into the software and the underlying assumptions tested with the PREBAL approach, the model requires balancing to maintain the laws of thermodynamics. When balancing the model it is imperative that thermodynamic and ecological rules are followed. Darwall et al. (2010) describes the ecological and thermodynamic rules in an easy to read box (Box 1 Darwall et al., 2010). Once the model has been balanced it might be useful to recheck the PREBAL estimations, to check for incompatible vital rates (P/B, Q/B, etc).
It is important to annotate Ecopath input data with appropriate references and to describe the origin of the data used to create the model in the model pedigree Morissette, 2007), which describes the precision of the data and sets confidence intervals to be used with this data if undertaking Monte-Carlo simulations. These estimates can be picked up by the Monte-Carlo routine to test the efficacy of the input data on the dynamic simulations in Ecosim and can be used to calculate confidence intervals in ecological indicators. Model descriptions are often provided in technical reports, where data and methods used for the model are described in detail. A good example is given in Mackinson and Daskalov (2007).

Comparing Ecopath models
Over the past 30 years, the construction and comparison of Ecopath models and the use of Ecological Network Analysis (ENA) indices to compare ecosystem models has been the mainstay of many MSc and PhD theses, as well as for a wide range of projects. This started prior to the main use of Ecopath with the work of Baird et al. (1991), who compared six marine ecosystems using ENA approaches, followed by the estimates of maturity from Christensen (1995) based on the work of Odum (1969). However, only when an Ecopath model has been properly checked with PRE-BAL and balanced should any model comparisons be attempted. As stated above, model comparisons based on ecosystem summary statistics and ENA indices are widespread in the literature. Of the 105 models used for meta-analyses in Heymans et al. (2014), 39 models used ENA indices for comparison (see Table S1 in that reference); however very few of these followed the PREBAL or balancing rules given above.
When comparing Ecopath or ENA models it is important to realize that the topology (number compartments, distribution of species, links between species, etc.) of a model is key since some indicators are dependent of the model structure (Pinnegar et al., 2005). For example, in a model where the primary producers and primary consumers are well defined (i.e. a system where the microbial loop is separated from the detritus Mackinson and Daskalov, 2007), but the top predators are aggregated, the total energy recycled through the ecosystem (Finn cycling index, Finn, 1976) would be much higher than a model with the same number of functional groups where all the primary producers are aggregated, the primary consumers are aggregated, and the top predators disaggregated as much less energy can flow between mid-trophic level and top predators that have a much lower biomass. As suggested by Pinnegar et al. (2005), ecosystem indices are very different in ecosystems that are aggregated differently, i.e. if one places emphasis on the top predators vs. mid-level species, the ecosystem indices might be very different (see Figure 3 in Pinnegar et al., 2005).
A study on the comparison of ecosystem indicators from Ecopath models found eight indicators that were robust to model construction, including the number of living groups, number of linkages or total number of functional groups . These indicators are the ratios of primary production (PP), consumption (Q) and export (Ex) to total systems throughput

Box 1: Ecological and thermodynamic rules for balancing Ecopath models (from Darwall et al., 2010).
An ecologically and thermodynamically balanced Ecopath model requires a series of logical constraints: • EE < 1.0. Ecotrophic Efficiency (EE) is a measure of the proportion of production that is utilized by the next trophic level through direct predation or fishing. The value for EE (often a calculated output of Ecopath) can never exceed 1.0 as it is not possible for more production to be passed on to the next trophic level than was originally produced-unless the population is in decline. As a guideline an EE value near to 1.0 is expected when the main part of production is consumed by predators or taken by the fishery. A value near to 0.0 is expected for a group, such as an apex predator, which suffers no predation and is not exploited by a fishery. • 0.1 < GE < 0.3. Gross food conversion efficiency (GE) in general has a value between 0.1 and 0.3. Values greater than 0.5 are not often found but may be encountered in groups such as bacteria or in specially bred farmed fish. • Net Efficiency > GE. Net Efficiency is the value for food conversion after accounting for unassimilated food (U) for which the Ecopath default value is 20%, which is most applicable for finfish (Winberg, 1956). Thus Net Efficiency = P/(P + R). GE is P/Q, and Q = P + R + U. It is therefore clear that GE can never exceed Net Efficiency. • Respiration/Assimilation Biomass (RA/AS) < 1.0. The proportion of biomass lost through respiration cannot exceed the biomass of food assimilated. As a guideline, K selected species, which are expected to invest a relatively small proportion of energy intake in somatic and gonadal tissue production are expected to have RA/AS ratios close to 1.0. In contrast, r-selected species are more likely to invest a greater proportion of energy intake into growth and reproduction resulting in an RA/AS ratio well below 1.0. • Respiration/Biomass (RA/B) indicates the ''metabolic activity level'' of a group. RA/B ratios are expected to be within 1-10 year −1 for fish and may be as high as 50-100 year −1 for groups with higher turnover such as copepods. The default value for the proportion of unassimilated food (20%) may be changed to better reflect the RA/B ratio value expected of the group in question. • Production/Respiration (P/RA) < 1.0. This ratio effectively expresses the fate of assimilated food. Odum (1969) stated that P/RA, which is typically less than 1, approaches 1 as the system matures. However, Christensen and Pauly (1993) comparing 41 Ecopath models found that P/RA ranged from 0.8 to 3.2. The high ratio values were thought to have arisen because of the omission of bacterial activity that led to an underestimation of respiration.
(TST), the total biomass of the community (TBCo), the total systems throughput, relative ascendency (A/C), relative overhead (O/C) and redundancy (R) (for definitions see Heymans et al., 2014). When comparing models of different ecosystems these indicators would be the most conservative to use, as they are mainly standardized by TST. Most other indicators would more likely indicate differences in model construction. Ecosystem models that have been constructed with similar topologies (e.g. with or without microbial loop defined, similar top predators specifications or similar multiple stanzas for one species, etc.), and are constructed for similar areas can be used for comparison of other indicators. This has been done by Heymans et al. (2007) for ecosystems in the Gulf of Alaska, for upwelling systems (Shannon et al., 2003;Heymans et al., 2004;Moloney et al., 2005) and for the northeast US large marine ecosystem (Link et al., 2008). Similarly, changes in ecosystems over time can be described by these indicators if ecosystem topologies stay constant in Ecosim (Coll et al., 2009 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008

Data
Original fit 2nd Yaxis biomass (tonnes) Fig. 3. Adapted from Figure 4a in Alexander et al. (2015) where grey seal biomass data (tonnes) and original fit, refers to data and model fit from that paper and 2nd y-axis biomass refers to an order of magnitude difference in biomass (tonnes).
In addition ecosystem traits such as ecosystem depth, size, and location would naturally show differences in ecological indicators, and therefore should not be used for comparison , or should at least be accounted for in any such contrast. They can be used as co-variates in a statistical analysis, for example. Before looking at parameterization of Ecosim, it is important to reflect that the parameters that define a species' productivity which are specified in Ecopath also play an important role on the models dynamic predictions. For example, a species that has a low biomass and a high P/B ratio will have the same production as one with a high biomass and low P/B ratio. The impact of a low P/B ratio on the dynamics of that species is, however, very different to that of a higher P/B ratio. As an example, in a West Coast of Scotland model (Alexander et al., 2015), the initial biomass for grey seals (6204 t), and P/B ratio (0.114 year −1 ) estimates a production of 707 t year −1 , while an order of magnitude more biomass (62,040 t) and a P/B ratio of 0.0114 year −1 will give a similar production. Thus in Ecopath the model will be balanced (even though the lower P/B is unrealistic), but in Ecosim the dynamics will be very different, and time series fitting is important to evaluate which of such parameter settings are most likely. The key to this is setting the density-dependent vulnerability parameters in Ecosim, setting the predatory prey dynamics between top down or bottom-up control between predators and prey (Ahrens et al., 2012).
In Fig. 3, the estimated biomass of grey seals on the West Coast of Scotland is plotted against the input data used to fit that model as per Figure 4a of Alexander et al. (2015). If an order of magnitude mistake was made, the resulting biomass trajectory is shown in blue (on the second y-axis). It is clear that although the model would have balanced the predicted biomass trajectory (given the same vulnerability parameters, etc.) the trajectory would have been quite different. If the wrong Ecopath biomass and P/B ratios were used, the estimated vulnerabilities, etc. would have also been different when fitting to time series.

Time series fitting in Ecosim
In order for a model to predict changes in response to temporal changes in fishing and environmental drives, its historical temporal dynamics should first be calibrated. This process compares model predictions to reference data to evaluate its performance in terms the fit to data, and importantly the credibility of its behaviour. The process of fitting the model to time series data is the part of the calibration process which estimates predator-prey parameters that influence degree of density dependence and thus, the rates of change in the biomass of each group. The second part of the calibration process is evaluating whether the parameterization lead to credible behaviour, for example in its predictions of sustainable fishing rates. Taking a pattern oriented approach to assessing the modelling skill (Grimm et al., 2005;Kramer-Schadt et al., 2007) is the best approach, seeking patterns in both biomass trends and sustainable fishing rate estimates that are reasonable (more on this later).
6.1. Special note: exploitation rate 'F' and instantaneous F The fishing mortality (F, year −1 ) in Ecosim is essentially the instantaneous annual F as traditionally defined in fisheries, i.e. by the relation C = F*B bar , where C is the annual catch and B bar is the mean (integral) of biomass over the annual period. Ecosim only converts these F's to discrete rates in the multi-stanza monthly age-structure updates, where monthly survival rates are calculated as exp(−Z/12) and Z is the sum of F's over fisheries plus natural mortality rate components. In Ecosim, catch per year is always the sum of predicted monthly catches, where catch for each month is approximated by F*B month , i.e. F times the biomass at the start of the month. In other words, there is no need to convert to discrete time exploitation rates when looking at Ecosim F values.
Furthermore, if a species is to be split into multiple stanzas, the parameterization of the multi-stanza groups should be done with care such that the degree of compensation in recruitment at low stock biomass leads to predictions of F msy (fishing mortality at maximum sustainable yield, MSY) that are at least consistent with published ranges of F msy. This check needs to be undertaken as part of the fitting process in Ecosim, and should lead to plausible Beverton-Holt type stock-recruitment functions (Walters and Martell, 2004). Otherwise there is a danger that the vulnerabilities estimated for multi-stanza groups during time series fitting can lead to overly optimistic estimation of how productive the groups are at low stock biomass, and thus vastly inflated estimates of F msy (see Walters et al., 2005;Mackinson et al., 2009b). The 'calibration' of the stock recruitment relationship of multi-stanza groups is essential for analyses seeking to address questions about MSY, but is equally important to all Ecosim work.
To fit an Ecosim model time series, data is required to: (a) affect a change in the model, and (b) to compare the modelled output to reference data using statistical diagnostics Coll et al., 2008;Mackinson, 2014). Time series data is imported into Ecosim through a comma separated value (csv) file. Data used to affect a change in the model include forced biomass, time forcing data (e.g. on primary production, or for instance temperature trends that can affect the search rate, vulnerability or feeding arena of a predator), effort by gear type, fishing mortality by group, total mortality by group, forced total mortality by group and forced catches. Data used for statistical comparative purposes include relative or absolute biomass, total mortality, catches or average weight (for species that are defined by multi-stanzas). Diet compositions have also been used to fit models to data. The forcing data affects a change in the model that is described in biomass and catch estimates. These estimates are then compared with the reference observational data using a statistical measure of goodness of fit . The quality of the time series data is important, and needs to be documented properly if the fitted model is to be used for management purposes.
The time series data can be weighted to represent the reliability of the data by giving it a weight > 0, where 0 indicates that it will not be used to calculate the SS (Christensen et al., 2008). The weighting factors can be set to reflect the reliability of the time series, e.g. by using the confidence intervals from survey estimates or from retrospective analysis of assessment models. Alternatively, the weights can be used to place more emphasis on target species for modelling. For instance, a model that is focused on cod in the Baltic may increase the weight that is placed on cod biomass. One method that can be used to determine the weighting factors is to perform a signal to noise ratio assessment for each series. The signal can be fitted using "LOcally weighted Scatterplot Smoothing" (LOESS) with the degree of smoothing required given by the optimal span determined from the bias-corrected Akaike information criterion (AICc, Akaike, 1974) following the method of Hurvich and Tsai (1998). The noise can be determined from the variance of the model residuals (var res ) and weights for EwE can be determined from the inverse of the variance (1/var res ). When fitting a model to time series data, the best practice is to use the statistical hypothesis testing method as described by Mackinson et al. (2009a) and refined in Tomczak et al. (2012), Mackinson (2014) and Alexander et al. (2015). This procedure is based on the estimation both of the sum of squares (SS) and the Akaike Information Index (AIC), which penalizes for fitting too many parameters based on the number of time series available for estimating the SS (Akaike, 1974). The AIC is defined as: where n is the number of observations, or time series values, i.e. the number of series used multiplied by the number of years for each parameter, minSS is the minimum sum of squares calculated by the algorithm, and K is the number of parameters estimated. In this instance the AICc (Burnham and Anderson, 2004) is used, to address small sample size. The AICc is defined as: The fitting procedure is based on testing statistical hypotheses in the context of fitting the model, related to the impact of fishing, changes in predator-prey dynamics, also called the vulnerability settings (Christensen and Walters, 2000), possible changes in primary production through the estimation of a forcing function on primary production specifically, or all of the above. The hypothesis that obtains the lowest AIC is therefore the hypothesis that obtains the best fit of the model to the data while using the least number of parameters to do so. The hypotheses to be tested include: (1) Baseline (model run without fishing and with no changes to vulnerabilities, or primary production anomaly); (2) Baseline and changes in vulnerabilities; (3) Baseline and changes in primary production anomalies; (4) Baseline and changes in vulnerabilities and primary production anomalies; (5) Fishing, thus including fishing but excluding any fitting to time series for vulnerabilities or primary production anomaly searches; (6) Fishing and changes in vulnerabilities; (7) Fishing and changes in primary production anomalies; and (8) Fishing and changes in vulnerabilities and primary production anomalies.
The number of parameters that can be estimated (i.e. the number of changes in vulnerabilities, or changes in primary production anomalies, or changes in both) depend on the number of time series available to calculate the SS. The number of possible parameters that can be estimated is often significantly more than the data available to constrain the model, thus it is imperative to test the parameters that affect the most detectable change (e.g. Mackinson, 2014). To do so, an algorithm is used to calculate which of the vulnerability parameters creates the largest change of the SS, and the most sensitive vulnerability parameters are used. In addition, the number of primary production anomalies must be ≥2 as a spline point of 1 reverts to estimating a primary production estimate for each year of the simulation. It is also advisable not to estimate more primary production anomaly spline points than the number of time series you have to constrain your model. The AIC assumes that all data points are independent. In most cases fisheries data are not independent, thus as a conservative rule of thumb the total number of parameters estimated should therefore be K − 1. Thus, for example in the West Coast of Scotland model (Alexander et al., 2015), there were 48 time series of catch or biomass used for estimating the SS, and therefore 47 parameters were estimated, either as vulnerabilities or primary production spline points (up to the number of years in the time series). Thus in this case 2-23 spline points or 47 vulnerabilities could be tested, for a possible 1756 combinations. Alexander et al. (2015) tested each 5th combination (i.e. 5, 10, 15 vulnerabilities, 5, 10 15 spline points, etc.) to reduce the repetitive nature of this task. However, an automated routine to run through these options would be preferable and is currently being programmed as a plug-in to the EwE desktop software (Steenbeek et al., 2015, Scott et al., submitted). It is expected that this routine will be released with the next version of the software. A full description of the methodology for fitting a model is given in the Ecopath wiki: Fit to time series (http:// sources.ecopath.org/trac/Ecopath/wiki/EwEugFitToTimeSeries).
6.2. Special note: the model fits the data, but is its behaviour credible?
It's not uncommon that the estimated vulnerability parameters that provide the best fit of model predictions to observation data can lead to unrealistically high estimates of F msy . When Ecosim estimates low vulnerabilities, this implies high productivity at low biomass and potentially high estimates of F msy that would be not be considered either feasible, or credible based on other assessments and history of F rates that have led to stock depletions in the past. This presents a dilemma to the modeller -pretty fit or credible model behaviour? Taking a pattern oriented approach, the best method is to find the balance between credible fit and behaviour (see for example, Mackinson, 2014). As already noted, particular attention needs to be paid to multi-stanza groups where the connection between the vulnerability estimated for each stanza's (as well as their other parameters mentioned earlier) have a strong influence on the productivity and estimates of sustainable fishing rates.

Addressing uncertainty in input data
A rapid assessment of the effect of uncertainty in Ecopath input parameters on Ecosim simulations can be done by running Monte Carlo simulations in Ecosim. For example, in the West Coast of Scotland model (Alexander et al., 2015), 20 Monte Carlo runs based on a coefficient of variation (CV) of 0.1 around the input parameters for biomass, P/B, Q/B, EE, etc. gives twenty very different outcomes for grey seals (Fig. 4). The best statistical fit (lowest sum of squares) trajectory for grey seals is much higher than the best estimate based on the AICc with grey seal biomass being overestimated by the model. However, this Monte Carlo routine provides a good idea of the range of outputs available based on the uncertainty surrounding the input data, as seen by the 5th and 95th percentile values plotted (Fig. 4). The CV of input parameters can be set in the Monte Carlo routine or can be obtained from the quality of the input data given in the pedigree routine. The Monte Carlo routine also runs in a new "ecological indicators" plug in that will be release in the new version of software, enabling to assess the impact of input uncertainty in some output results of Ecopath and of Ecosim .
EwE users have recently developed several sophisticated routines for considering model parameter uncertainty, enabling representation of multiple possible representations of the modelled ecosystem and assessment of the impact of this performance model simulation (e.g. Gaichas et al., 2012;Prato et al., 2014;Guesnet et al., 2015;Platts and Mackinson, 2015). A new (soon to be released) plugin enables users to create alternative Ecopath models with associated Ecosim parameters set, and evaluate the performance of management strategies when this uncertainty is accounted for .

Quality assurance for EwE model applications to management: key runs
Only once a model has been evaluated for the quality of the input data through PREBAL and other diagnostics, the balance of the model has been evaluated through testing the thermodynamic and ecological rules, the model has been calibrated to the appropriate time series data, and the effect of parameter uncertainty examined should the model be considered for use in management purposes.
In Europe, the concept of a 'Key run' model (ICES, 2012(ICES, , 2013 is being used as part of quality assurance procedures (ICES, 2013) for assessing models being (or intended to be) used in provision of advice for management. The term Ecopath 'Keyrun' originated from earlier ICES multispecies working groups (ICES, 2009) as a way to facilitate a common understanding among modellers using multispecies models. These have typically been used for multispecies models and EwE models. A 'key-run' refers to a model parameterization and output that is accepted as a standard, reviewed and published by ICES Working Group on Multispecies Assessment Methods (WGSAM, http://www.ices. dk/community/groups/Pages/WGSAM.aspx), and thus serves as a quality assured source for scientific input to ICES advice. WGSAM described what this standard should look like for different models as part of their terms of reference in 2013. In general terms, a key run describes and makes accessible the model outputs for a case study of a particular ecosystem. It contains consistent outputs, together with documentation and information on inputs. ICES (2012) defines that the prime purposes of a key run include: (a) Demonstrating the utility of a particular model formulation in a controlled environment and thereby building confidence that this formulation is appropriate to use in providing advice. (b) Assisting with the development of multi-model approaches by providing a "standard" setup to aid understanding of different model frameworks, and a worked example of the results that can potentially emerge.
Key runs are typically run every three years, or alternatively, when a substantive change is made to the model parameters, when sufficient new data becomes available, or when the previous keyrun is deemed out of date. Thus, they are run regularly, but not at a high frequency (i.e. annually) to avoid any spurious changes attributed to simply just another year of data. A detailed example of the North Sea EwE model 'key run' is presented in Annex 5 of ICES (2012) and the forthcoming ICES report (ICES, 2016). The foundation of a EwE 'key run' is a published and time series fitted Ecosim model. Ongoing quality assurance work in ecosystem models in ICES WGSAM is helping to define protocols and give examples. In the future, there are aims to define and provide examples of a 'key run' spatial model -where a calibrated Ecospace representation is developed using the time series fitted Ecosim model and the spatial-temporal data framework (Steenbeek et al., 2013).

Conclusions
Ecosystem-based management (EBM) has now been written into policy in Australia, Europe, the USA and other countries. The requirement for executing EBM has led to a large number of ecosystem models being developed. During the past 30 years EwE has been continuously updated and extended, and this paper is a natural extension of codifying it as an important tool for ecosystem-based management. EwE models are easy to construct and manipulate, but we need to raise the standards of use for these models in a management context. The same applies when using these models to explore ecosystem theory. To use ecosystem models, one needs to have confidence in their ability to be executed correctly. To be executed correctly, one needs to understand the uncertainty in the input data, and be able to assess the confidence in the model outputs. This means taking accounting of model parameter uncertainty and undertaking specific performance evaluations. We describe a suite of best practices that should contribute to helping ensure that EwE models are able to be used for ecosystem-based management. We trust that what we have provided here is a useful step in that direction. These best practices will help with the uptake and utilization of model outputs and increase the credibility of EwE models for wider application.
EwE models cannot, and should not be used as the only tool for EBM. In many cases the quality of the data is just not sufficient or the questions being asked are not sufficiently related to predator-prey dynamics underlying this model. However, in many instances where EwE is an appropriate model, a clear expectation of reviewers, managers and stakeholders should be that the models presented have met the standards described above. We see EwE as being one of a suite of tools useful for the development of multimodel ensemble predictions of ecosystem responses to changes, where we seek to establish robust results from a suite of models (Fulton and Smith, 2004).
We realize that these guidelines for best practice may be challenging for some applications, and we sympathize with systems where data quality and quantity is insufficient. In these instances, using EwE models to describe the most critical data gaps, to design research strategies, as part of adaptive management and to test uncertainties in unknown parameters are still valid and useful. However, without best practice and without a clear understanding of the associated uncertainty, it should be expected that management interventions may not yield the desired outcomes.