Maximum entropy modelling to identify optimal locations for an IMTA system comprising Sparus aurata, Mytilus galloprovincialis and Ulva rigida on Europe ’ s Atlantic coastline

– Meeting the resource demand for an increasing human population has led to the emergence of the aquaculture industry as the fastest growing sector for food production worldwide. Modern ﬁ n ﬁ sh aquaculture has raised environmental concerns and, to address this, Integrated Multi-Trophic Aquaculture (IMTA) has gained popularity as a means to minimise environmental impacts. This is done by culturing extractive species alongside fed species to utilise excess nutrients and enhance their own growth. The current study, based within the Atlantic Area of Europe, identi ﬁ ed suitable habitats for the three species Sparus aurata, Mytilus galloprovincialis and Ulva rigida , for use in a new IMTA system. Models were created using MaxEnt software and input into GIS software (ArcMap 10.8.1) for analysis. For all species, the AUC results in the model were > 0.9, with values of 0.931 ( S. aurata ), 0.928 ( M. galloprovincialis ) and 0.939 ( U. rigida ), demonstrating signi ﬁ cant predictive power. Jackknife testing of the model for each species identi ﬁ ed the mean sea surface temperature ( ° C) and Chlorophyll A (mg m – 3 ) concentration as the two most important variables. The model showed that areas of > 50% suitability could be found throughout the study area, although the most suitable sites were in coastal areas in more southern latitudes. To identify the feasibility of establishing an IMTA system in different areas, the shipping density, MPA status and the locations of harbours were added to the maps for further consideration. Using this study, and the species-speci ﬁ c information identi ﬁ ed by the model, the aquaculture industry will be better equipped to identify potential IMTA sites and integrate these systems into the EU market for sustainable production


Introduction
The global population reached 8 billion people in 2022 and, although growth is slowing, the population is expected to peak at 10.4 billion people by 2100 (United Nations Department of Economic and Social Affairs, Population Division, 2022).The highest levels of population growth (~2% growth per year) can be seen in the poorest countries, with these countries tending to rely more heavily on aquatic foods as a source of protein than high-income countries (Ezeh et al., 2012;FAO, 2022).This increasing population has led to a higher demand for aquatic resources that wild stocks cannot sustainably meet; the proportion of wild fish stocks being fished at unsustainable levels has risen from 10% in 1974 to 35.4% in 201935.4% in (FAO, 2022)).To meet this growing demand, the aquaculture industry has become the fastest growing food production sector in the world, accounting for 49% of total aquatic animal production in 2020, with the annual output increasing by 609% between 1990 and 2020 (Troell et al., 2014;FAO, 2022).
The use of industrial farming methods in aquaculture has led to high production levels for farmed species, with annual aquaculture output reaching 122.6 million tonnes in 2020 (FAO, 2022).However, these methods have raised environmental concerns about the impact such high production will have on local environments and wild populations (Tidwell and Allan, 2001;Pillay, 2008).When culturing fish species at such high densities, nutrient pollution becomes an area of concern due to the input of uneaten food and the waste excreted by the fish.This can lead to eutrophication of the local area, increased level of pathogens and can make the area less suitable for both wild and cultured species (Folke and Kautsky, 1992;Martinez-Porchas and Martinez-Cordova, 2012).
In an effort to reduce the negative effects of aquaculture, Integrated Multi-Trophic Aquaculture (IMTA) has emerged as a more sustainable alternative.IMTA uses complementary species from multiple trophic levels and cultures them in the same locality, allowing the waste products from fed species to be used as a food source by extractive species, reducing the nutrient output to the environment (Chopin et al., 2001;Neori et al., 2004).Using an IMTA system also provides an economic benefit, as the extractive species can be sold and have been shown to have increased production levels when cultured in this way; a study examining the growth of Mytilus galloprovincialis when cultured in proximity to a fish farm found that the mussels gained a higher length, weight and biomass when compared to mussels grown away from the farms (Sarà et al., 2009).Similarly, a study of Ulva rigida production showed significantly higher growth rates when placed downstream from a fish farm, while also showing higher levels of carbohydrates, starch and soluble protein than those at a control site (Korzen et al., 2016).
Despite being practiced in Asia for centuries, IMTA has not yet been widely adopted within the European aquaculture industry (Troell et al., 2009;Kleitou et al., 2018).While the previous paper (Hughes and King, 2023) dealt with species with a more northern distribution, this study will produce habitat suitability models for commercially important southerly distributed species within the Atlantic Area of Europe, with the aim of improving the adoption of IMTA systems.The species selected for use are: Gilthead bream Sparus aurata, Mediterranean mussel Mytilus galloprovincialis and Sea lettuce Ulva rigida.Both Mytilus and Ulva species are of interest for temperate water IMTA systems due to their known habitats, commercial value, biomitigation ability and current husbandry practices (Barrington et al., 2009).The commercial value of these species is highlighted by the levels of production for each of them: Sparus aurata is the fourth most produced species in global aquaculture, with a production of 282,100 tonnes in 2020, while Mytilidae mussels were the fourth most produced mollusc group with a production of 1,108,300 tonnes (FAO, 2022).Ulva rigida was not identified as a major cultured species globally, however it has been identified as an effective bioremediator of economic interest within European countries (Marinho et al., 2013;Ara ujo et al., 2021).Additionally, the majority of seaweed aquaculture takes place in Asian countries, which provides a market opportunity for European aquaculture companies as the demand for these products is predicted to increase rapidly in the near future (Chopin, 2014;Kim et al., 2017).
This study took place within the European Union's INTERREG Atlantic Area as part of a series of papers for the INTEGRATE project.Using species occurrence data collected from the Global Biodiversity Information Facility (GBIF) and European Marine Observation and Data network (EMODnet) databases, habitat suitability models were produced for Sparus aurata, Mytilus galloprovincialis and Ulva rigida, using the Maximum Entropy (MaxEnt) modelling approach and were analysed with Geographic Information System (GIS) software, using the methodology outlined by Hughes and King (2023).This study identifies the locations that are best suited for an IMTA system using environmental conditions and additional criteria that may impact a sites feasibility.

Study site
This study was conducted within the spatial extent covered by the European Union's INTERREG Atlantic Area, as part of the INTEGRATE project.The INTEGRATE project includes the European countries that border the Atlantic Ocean: Spain, Portugal, France, Ireland, the United Kingdom and several islands (Madeira, the Canary Islands and the Azores).The geographical area used for the habitat suitability models in this study range from 36°N to 61°N latitude and 11°W to 2°E longitude, as shown in Figure 1, with the study site shaded in light blue.Although part of the INTEGRATE study area, Madeira, the Azores and the Canary Islands were excluded due to their geographic distance from the main study area and may require a separate model to examine their habitat suitability.

Data acquisition
As using too many variables can lead to overfit or overly complex models (Anderson and Gonzalez Jr, 2011), the initial selection included eight variables.Pairwise correlations were calculated for these variables, noting variables with a Pearson's |r| ≥ 0.8.For correlated variables, multiple models were run using each variable one at a time, with the variables that returned higher AUC (Area Under Curve) values in the model being retained.Removing correlated variables from the model is beneficial in order to reduce the effect of collinearity, which can cause the model to be over-fitted (Baldwin, 2009;Wei et al., 2018).After removing correlated variables, five variables were chosen for use in the model: mean Sea Surface Temperature (SST, °C), pH, Salinity (PSS), mean Current Velocity (m s -1 ) and mean Chlorophyll A concentration (mg m -3 ).These environmental variables were sourced from the BioOracle data repository (Tyberghein et al., 2012;Assis et al., 2018), found in RStudio (R Core Team, 2021).In order to determine an upper limit for temperature tolerance, a second model was also run which replaced the mean SST variable with the maximum SST (°C) in the study area.
Presence-only location data for Sparus aurata, Mytilus galloprovincialis and Ulva rigida was obtained using the GBIF and EMODnetdatabases (EMODnet Biology, 2023;GBIF, 2023a,b,c,d).As the selected environmental variables use an average value taken between 2000 and 2014, the species occurrence data was filtered to remove records from before 2000 or with no date information, in order to better reflect the current distribution of these species.Once duplicate records and records outside of the study area were removed, the final dataset used by MaxEnt to produce the model contained 256 data points (Sparus aurata -75, Mytilus galloprovincialis -73, Ulva rigida -108).

Data manipulation
Although this study uses the same methodology outlined by Hughes and King (2023), it is presented here for reader clarity.After downloading the environmental variables, RStudio was used to manipulate the data for use in the model (R Core Team, 2021; SI1).The geographical coordinate system for the data was set to WGS84 and the extent was set to the longitude and latitude defined in Figure 1.The environmental variables were cropped to this extent, and all data outside this range was excluded from the dataset by giving them an N/A value.The variables were then resampled to standardise their resolution to 30 arcseconds.As resampling can lead to changes in the spatial extent of the data, the variables were re-extended to match the study extent before being saved in an ascii format.The manipulations performed on the data were carried out using the 'raster' (Hijmans, 2021), 'sdmpredictors' (Bosch and Fernandez, 2022), 'rgdal' (Bivand et al., 2021) and 'sp' (Pebesma and Bivand, 2005;Bivand et al., 2013) packages within RStudio.
The location data for each species was merged into a spreadsheet for use in MaxEnt, and arranged using a 'species, longitude, latitude' format.

MaxEnt analysis
The environmental and species occurrence data were entered into MaxEnt, with the settings used to run the model being described in SI2.The MaxEnt models of habitat suitability were produced using tools available in the SDMToolbox package in ArcMap (Brown et al., 2017).In order to reduce the influence of spatial autocorrelation on the model results, the species distribution data for S. aurata, M. galloprovincialis and U. rigida were spatially rarefied to distances of 10 km and 20 km, with the best performing model in each case being retained (20 km for S. aurata and M. galloprovincialis, 10 km for U. rigida).Spatially rarefying data reduces multiple occurrence records to a single record within the user-specified distance and this, combined with the usage of bias files, is an effective method for reducing the effect of sampling bias in the model (Boria et al., 2014;Fourcade et al., 2014).Using the rarefied data for each species, bias files were generated with a Gaussian Kernel Density method.The sampling bias distance was set at 1 decimal degree for S. aurata, and 0.5 degrees for M. galloprovincialis and U. rigida.A separate bias file was made to address the latitudinal background selection bias in the study area; this was merged with the species bias file and used in the model.Models were run using different regularisation multipliers (RM; the multipliers used were 0.5, 1, 1.5, 2 and 2.5) and feature classes (FC) to find the best, least complex model for use.The models were first evaluated by which model had the lowest Omission Error Rate (OER), then by using the area under curve (AUC) value of the model.If models had the same OER and AUC then the model that used fewer feature classes was selected.The AUC is a value ranging from 0 to 1 that is used to assess the model's predictive accuracy, with higher values indicating better performance for the model, although the maximum AUC will always be below 1 when using presence-only occurrence data (Fielding and Bell, 1997;Phillips et al., 2006;Townsend Peterson et al., 2007).A jackknife test was performed on the models to measure individual variable importance.Prior to the jackknife test, the model analyses the contribution of individual variables through their 'permutation importance'; each variable in turn has its values randomly permutated and the decrease in model performance is given as a percentage, with the more important variables causing a larger decrease in performance (Phillips, 2017).The jackknife test is then used to determine how each variable contributes to the regularised training gain of the model by running models that exclude each variable in turn and a model that uses only the specified variable.Variables that have a higher gain in the model are more important to the final model output (Phillips, 2017).

ArcMap analysis
The MaxEnt models were examined using ArcGIS software (ArcMap 10.8.1) (ESRI, 2020), set to the WGS1984 geographic coordinate system.The 'World Countries' shapefile (Esri.World Countries, 2021) was used to produce each map.The average model output for the three species were added to ArcMap, and an average habitat suitability value across the three species was calculated using the 'Raster Calculator' tool.When calculating areas where the average suitability for all three species is >50%, if any individual species had a suitability below 50%, this pixel was classed as unsuitable for use and removed from the final output.

Other factors
After determining the average habitat suitability for the three species, several factors that may influence placement of an IMTA system were taken into account.Location data for Marine Protected Areas (MPAs) (UNEP-WCMC and IUCN, 2022) were used to determine any overlap with suitable areas.These areas were excluded from selection as establishing new aquaculture sites within a protected area is likely to be more difficult than in unprotected waters (Brown et al., 2020).Data on shipping routes was also added, to ensure an IMTA system is not established in an area that is at risk of being damaged by passing vessels (EMODnet Human Activities, 2022).For the purposes of this study, areas with a vessel density of >104 vessels per year (approximately 2 ships per week) were excluded.The model results were used to identify which areas had the most suitable habitats, as well as the largest areas of suitable habitat.After identifying these areas, harbour data was added, to establish which sites are easily accessible for use in an IMTA system (National Geospace Intelligence Agency, 2016).After applying these criteria, the remaining sites were highlighted and assessed based on their ease of access, proximity to built-up areas and the available habitat for IMTA use.In order to ensure the sites had a suitable depth to install IMTA infrastructure, Bathymetry (m) data was downloaded from MARSPEC (Sbrocco and Barber, 2013) and a minimum average depth requirement of 20 m was used to further identify the optimal location for an IMTA system.
The results of the test of permutation importance are shown in Table 1.For S. aurata and M. galloprovincialis, the variable that was most important to the model was Mean SST (50.9% and 57.4% respectively), while for U. rigida it was Chlorophyll A (81.1%).The least important variable for S. aurata and U. rigida was pH (0.1% and 0% respectively), and for M. galloprovincialis, Salinity was the least important (0%).The jackknife test (SI3) shows that, when used in isolation, Mean SST contributes the highest training gain for S. aurata (0.14) and M. galloprovincialis (0.36), with Chlorophyll A contributing the highest gain for U. rigida (0.52).For all species, the variable with the lowest training gain when used in isolation was Current Velocity: S. aurata -0, M. galloprovincialis -0.04, U. rigida -0.02.When omitted, Chlorophyll A was the variable that decreased the training gain the most for all species.

Optimal conditions
Using the response curves generated by MaxEnt for variables in isolation, the value that provides optimal habitat suitability can be identified for each species (Tab.2).
MaxEnt generates two sets of response curves for each variable: one which uses the variable in isolation, and another that keeps all other variables at their average values.Comparing these response curves (SI4) shows different suitability values for each variable and highlights how the interaction of other variables can influence habitat suitability, even for variables identified as more important by the model.This shows that balancing environmental conditions for all variables will result in a more suitable habitat than simply focusing on the variables with a higher individual influence.

Habitat suitability
Averaging the model results for S. aurata, M. galloprovincialis and U. rigida gave a habitat suitability map that ranged from 3.4% to 100% suitability (Fig. 2).The model output indicates that the sites with the highest suitability can be found in more southern latitudes, with large areas of the French, Spanish and Portuguese coastlines having a suitability between 80% and 100%.Despite this, areas of >50% suitability are found throughout the study area, from southern Spain to northern Scotland.The areas of highest suitability tend to be found in nearshore areas, indicating that this may be a more preferable location to use for an IMTA system.
To facilitate further identification of optimal sites, areas with a predicted habitat suitability of <50% were excluded from the map.Overlaying MPA and shipping density data also excludes many areas from selection, as they are within a protected area or at increased risk of damage from vessels.After overlaying these datasets, areas with an average depth of <20m were also excluded, to ensure that the remaining sites would have sufficient depth to establish an IMTA system (Fig. 3).This shows that areas of >80% suitability can be found in multiple locations, including: coastal areas close to Cádiz and Valencia in Spain, the southwestern coast of Portugal and several locations along the western coast of   France.Much of coastal Ireland is estimated to be 60-80% suitable for these species while some areas of southern and western UK are also suitable.Adding harbour data to these maps shows the relative accessibility of these areas and the size of the harbour that would be used to conduct IMTA work in the area.

Discussion
The MaxEnt modelling approach was selected for use in this study due to its high level of performance using presenceonly data and its ability to handle small datasets (Phillips et al., 2006;Elith et al., 2006;Baldwin, 2009).Using MaxEnt for selected species allows large areas, such as the Atlantic Area, to be modelled for habitat suitability, which will highlight specific locations to be considered for more detailed investigation.The suitable areas reported in this study could also be expanded by adjusting the criteria used when identifying sites, such as by considering sites within MPAs.
The model AUC results for all three species were >0.9, which indicates a high level of model performance.Although AUC does not always select the most parsimonious model for use, it was the secondary criteria for model selection here, after the OER.In addition, the use of multiple regularisation multipliers for each species model can produce more parsimonious models (Wiltshire and Tanner, 2020).Despite the strong model performance, combining models for multiple species can be difficult and is often imperfect without information on the prevalence of each species and this should be considered when interpreting the results.
After identifying suitable areas for the species to be cultivated, consideration must then be given to the area's suitability for an IMTA system.Although suitable locations can be seen in offshore areas, particularly in the southeast of Spain, few commercial-scale offshore IMTA systems exist and technical or financial constraints may limit the feasibility of establishing a new system in these areas (Buck et al., 2018).Of further consideration is the idea that establishing an IMTA system will then increase the suitability of an area further due to the positive interactions between species, the addition of feed for these species and working to reduce potential dangers in the IMTA area.How much of an impact these changes may have is hard to quantify as it will depend on various factors such as the configuration of the system or the density of the farmed species within it, which is beyond the scope of this study.
The model identified the optimum mean SST for all three species to be 20 °C (Tab.2), the highest value for this variable within the study area, suggesting that their optimum sea surface temperature may be higher than this value.This may be expected, as the species distribution data from GBIF and EMODnet show multiple occurrence records for all three species at more southerly latitudes, demonstrating a resilience to high temperatures (EMODnet Biology, 2023;GBIF, 2023a).Analysis of the temperature data used in the model suggests the highest mean SST in Spain is ~19 °C, although this is estimated to be rising by ~0.26 °C/decade in the period 1985-2009, similar to the estimated rise in UK sea temperatures of between 0.17 and 0.45°C/decade since 1984 (Garcia, 2015;Collins et al., 2020).The current temperatures are therefore suboptimal for cultivating these species, however this may increase the longevity of an IMTA system as ocean temperatures continue to increase.Mass cultivation of Ulva rigida may locally reduce negative effects of climate change such as ocean acidification by raising the pH of the water, while also acting as a source of oxygen in the system and reducing wave energy in coastal areas (Duarte et al., 2017;Williamson et al., 2017).
While these results offer a preliminary assessment for identifying suitable habitats, further study must be carried out on a local scale to ensure conditions are suitable for farming species at the required densities and establishing the infrastructure for an IMTA system.As an example, the mean current velocity required for successfully farming Sparus aurata can be as low as 10.5 cm s -1 , as demonstrated by an aquaculture site in Tunisia which produces 1470 tons of S. aurata per year (Abdou et al., 2017).Once a location with a suitable velocity has been found, the species must then be oriented and spaced apart in a way that maximises their nutrient assimilation abilities while minimising any negative interactions between them.
The habitats that are predicted to be suitable may be limited by the environmental variables selected for use in the model.These variables were average values, recorded from 2000 to 2014, and there is a possibility that they are no longer fully accurate within the study area.Additionally, variables that were not used in the model may have an unaccounted for influence on suitability within an area, further necessitating local analysis of potential sites.
This model, like the model produced in Hughes and King (2023), used presence only data and, despite using several methods to account for sampling bias, possible areas of bias must still be considered.Given the extent of the study area, sampling bias is an area of concern, with near shore locations receiving higher sampling effort than offshore locations, which will have a larger impact on models which use presence only data, compared to those that use presence-absence data (Phillips et al., 2009).
As this study uses data from natural populations, it is also predicting suitable habitats for these natural populations.When applied to sites for an IMTA system, some variables can be controlled and so were not used when building the model (e.g.bathymetry).The use of equipment such as longlines or cages can help to avoid boats and poor weather conditions by selecting an appropriate culture depth.The culture cages for S. aurata should allow for surface access to refill their swim bladder.In aquaculture, the inability of S. aurata to fill their swim bladders has been shown to lead to higher mortality and deformities in juvenile fish (Prestinicola et al., 2014).If this is not possible then one solution could be to integrate a dome filled with air inside the cage, an approach that was shown to be successful when used with Atlantic salmon Salmo salar (Korsøen et al., 2012).For all three species, the mean SST and Chlorophyll A are identified as the two most important variables in the model.The clearance rate of M. galloprovincialis has been shown to be significantly reduced when Chlorophyll A levels in the environment were outside a tolerated range (<2.08 mg L -1 or >26.91 mg L -1 ) for prolonged periods of time (Filgueira et al., 2009).The predicted optimal Chlorophyll A conditions for all three species lie within these values and should be considered an appropriate range for use.

Conclusion
When considering only the environmental factors in the model, this study shows the high potential for IMTA within the Atlantic Area.As aquaculture technology improves, more of these areas, particularly offshore, will become available for use in future systems.The suitable habitats highlighted by this study are designed to identify preliminary sites for investigation, although further assessments of the local environments at these sites is recommended, such as wave exposure or distance to shore.Introducing additional factors that will affect site selection, such as vessel density and MPA status, will further help to identify the ideal site for potential IMTA systems in the INTEGRATE area.

Fig. 1 .
Fig. 1.Study area covering the INTERREG Atlantic Area, ranging from 36°N to 61°N latitude and 11°W to 2°E longitude.Excludes Madeira, the Canary Islands and the Azores.

Fig. 2 .
Fig. 2. Results of the model, showing the average habitat suitability for S. aurata, M. galloprovincialis and U. rigida, within the INTEGRATE Atlantic area.

Fig. 3 .
Fig. 3. A) Suitable habitats after adding MPA data (in blue; UNEP-WCMC & IUCN, 2022), shipping route density within the Atlantic Area (EMODnet Human Activities, 2022) and excluding sites with <20m depth.The areas of highest suitability are shown to be in more southern latitudes.B) Spain, Portugal and western France, including harbour size and location data.C) Northern France, Ireland and the UK, including harbour data.

Table 1 .
Results of the test of permutation importance for S. aurata, M. galloprovincialis and U. rigida.

Table 2 .
MaxEnt output showing the optimal values for each variable in isolation for S. aurata, M. galloprovincialis and U. rigida (SI4).Also includes the optimal Max SST from the secondary model.
FundingThe INTEGRATE project is part funded by the European Union.INTEGRATE project area (Integrate Aquaculture: an eco-innovative solution to foster sustainability in the Atlantic Area, funded by the ERDF through the INTERREG Atlantic Area 2014-2020 Programme (project grant number EAPA_232/2016), www.integrate-imta.eu).