Remote Coastal Weed Infestation Management Using Bayesian Networks

The increasing prevalence of species that are detrimental to biodiversity is a major concern, particularly for managers of national parks. To develop effective programmes for controlling weeds, it is essential to have a thorough understanding of the extent and severity of infestations, as well as the contributing factors such as temperature, rainfall, and disturbance. Predicting these factors on a regional scale requires models that can incorporate a wide range of variables in a quantifiable manner, while also assisting with on-ground operations. In this study, we present two Bayesian Network models specifically designed for six significant weed species found along the southern coast of Australia. Our models are based on empirical data collected during a coastal weed survey conducted in 2015 and repeated in 2016. We applied these models to the coastal national parks in the isolated and pristine East Gippsland region. Importantly, the prediction models were developed at two different spatial scales that directly corresponded to the scale of the observations. Our findings indicate that coastal habitats, with their vulnerable environments and prevalence of open dune systems, are particularly susceptible to weed infestations. Moreover, adjacent regions also have the potential for colonization if these infestations are not effectively controlled. Climate-related factors play a role in moderating the potential for colonization, which is a significant concern for weed control efforts in the context of global climate change.


Introduction
The management of biodiversity within conservation reserves requires the control of species that expand their range to the detriment of other species. In regions that are relatively pristine in terms of disturbance, this change is inherently obvious as so-called 'weeds' colonize available space rapidly [1]. Often, this process is assisted by the disturbance generated by external factors such as fires, animals, and humans. Direct control of the weeds becomes a priority before the system is beyond repair and the ecosystems are required to accept the change to a novel ecosystem [2].
Early intervention is difficult when much of the vast conservation area is accessible only by foot, and weed control requires significant physical and chemical effort to have any noticeable effect [3]. This translates directly to high yearly expenditure on weed control and detection, with significant demands to spatially prioritize efforts [4]. The development of a strategic plan to ensure the greatest effectiveness of control efforts is essential, but these plans are often constructed in the face of high data uncertainty and inadequate weed behaviour models. On-ground surveys need to be fully leveraged to expand and interpolate weed presence/absence observations to regions that could not be surveyed. Supporting data, such as the spatial extent of vulnerable vegetation communities, are required to provide a regional assessment. However, combining expert opinion with survey data and remotely sensed imagery for the purpose of estimating the probability of a particular weed occurring is not trivial [3,5].
The theories of the ecological niche and environmental gradients are the foundation of habitat suitability probability modelling [6,7]. In this framework, the observation of the presence of weeds is statistically correlated to a suite of environmental conditions. For many applications of this approach, the assumption is that the system is in equilibrium and the absence of a species at an observed location indicates the likelihood of unsuitable environmental conditions [7]. However, for emerging weeds that are in the early stages of colonization, the observation of 'species absent' has an additional meaning: the survey space may simply have avoided colonization due mainly to stochastic events. This habitat suitability dynamic is also complicated when survey results are incomplete due to resource limitations. The statistical correlations will then be 'weak' for weed species that may only occur in small fractions of the available habitat. It may even be possible that a suitable habitat is incorrectly classified as 'unsuitable' because the correlation has not been observed. Increased sampling effort combined with systematic sampling design will assist [6], but expert opinion on specific weed species preferences may also be required. This expertise can often be acquired from weed occurrences in adjacent regions.
With such uncertainty regarding the impact and colonization success of weeds in a conservation area, the use of an adaptive management framework is important [8]. Routine field work such as track maintenance and visitor facility upkeep can be combined with biodiversity actions such as weed control and surveying [9]. Ideally, the feedback mechanisms in place for conservation managers, from weed observations to modelled vulnerability, can assist with a dynamic prioritization of targeted control actions. Equipping land managers with both the tools and knowledge to capture weed observations and environmental conditions is optimal for modelling the extent of the issues in the region [6]. Habitat suitability modelling will require a sophisticated capacity to integrate disparate data and provide rapid updates of the infestation extent and intensity, including previous measures of success in infestation control and contributing factors (i.e., soil disturbance). Adaptive management of the conservation areas requires a close linkage between monitoring, objectives, and action [10]. Critically, conservation managers require a model of the vulnerability to weed infestations across a range of habitat types (to assist in survey strategies) combined with another model of site-level contributing factors that can be physically controlled.
Models suitable for this type of environmental management need to be able to combine disparate data and require a common 'currency' to determine the relationships within the model. Simply combining the presence/absence of a weed with the coincident observation of a suite of environmental parameters ignores the complexities of the multicollinearity relationships between dependent variables [11], i.e., rainfall, soil type, and disturbance. In order to restrain the model complexity to maintain predictive power while negotiating uncertainty limits and yet offer spatially valid estimations of vegetation dynamics, alternative modelling approaches will be required [12]. One such approach is to base the probabilistic predictions on correlations between observations over space and time rather than formulate a set of precise interaction equations [13]. Correlations in a trophodynamic system do not necessarily directly equate to metabolic, behavioural, or ecological processes, but the tradeoff is the ability to predict with increased precision in a diverse and uncertain environment [14].
Bayesian Networks (BNs) are one such modelling technique that is particularly popular in ecology due to the capacity to support both complexity and uncertainty simultaneously [15,16]. BNs offer the capacity to encompass complex interactions of disparate data types within a probabilistic framework with only a few limitations [17][18][19][20]. Bayes' rule, combined with the chain rule, enables the efficient propagation of conditional probability throughout a network structure [21,22]. The network design is typically the result of expert opinion, although machine learning algorithms exist to formulate a possible network structure through an analysis of correlations [20]. A BN model is parameterized through the inclusion of observational cases that fully or partly describe a system state. The more cases used to inform the conditional probabilities within the model, the more accurate the predictions [13]. Algorithms such as expectation maximization can assist in adjusting for missing data [23]. Expert opinion, equations, numerical (continuous, discrete, and censored) data, and categorical data can be included in the model, which is particularly useful for socioecological models [15].
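To make the role of Bayes' rule and the chain rule concrete, the following is a minimal sketch of exact inference by enumeration on a toy three-node network. The node names (disturbance, dune, weed) and every probability are invented for illustration and are not taken from the survey or from the models in this study.

```python
from itertools import product

# Hypothetical network: Disturbance -> Weed <- Dune.
# All probabilities below are invented for illustration only.
P_DISTURBANCE = {True: 0.3, False: 0.7}
P_DUNE = {True: 0.4, False: 0.6}
# P(weed present | disturbance, dune)
P_WEED = {
    (True, True): 0.8,
    (True, False): 0.5,
    (False, True): 0.4,
    (False, False): 0.1,
}


def joint(disturbance, dune, weed):
    """Chain rule: P(D, U, W) = P(D) * P(U) * P(W | D, U)."""
    p_w = P_WEED[(disturbance, dune)]
    return P_DISTURBANCE[disturbance] * P_DUNE[dune] * (p_w if weed else 1.0 - p_w)


def posterior_disturbance_given_weed():
    """Bayes' rule by enumeration: P(D = True | W = True)."""
    numerator = sum(joint(True, dune, True) for dune in (True, False))
    evidence = sum(joint(d, u, True) for d, u in product((True, False), repeat=2))
    return numerator / evidence


posterior = posterior_disturbance_given_weed()
```

Observing a weed raises the belief in disturbance above its prior of 0.3 in this toy example. Practical BN software uses more efficient propagation schemes than brute-force enumeration, which is shown here only because it is the easiest form to read.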
Limitations relevant to ecosystem models include the prohibition of feedback loops and the inability to predict outside of the observational space [16]. The prohibition of feedback loops, in particular, has severely limited the application of BNs to trophic dynamics, but recent advances in network analyses [18] and time aggregation have established an acceptable compromise. Eklöf et al. [18] demonstrated the application of BNs to extinction rates in food web models via the simplification and retention of fundamental pathways between groups of species. The BN is able to predict the likelihood of a system being in a particular state given additional evidence. However, this requires that the conditional probabilities (from observed cases) have been previously included in the model parameters. Predicting how the system will respond to conditions outside of the observation space requires the inclusion of expert-derived predictions, often in the form of equations, generated from models such as IPCC climate models or experiments on metabolic thresholds. Even with such input, the propagation of predictions to unobserved biotic interactions becomes uncertain with a significant loss of accuracy.
Interestingly, the primary concepts behind BNs are familiar to the general population. For example, when assessing the appropriate clothes to wear for a walk in the forest, people will gather information about the likely weather patterns, the seasonal influences, their past experiences (being too hot or cold), and the available selection of clothes. The walker has a priori knowledge that the weather is uncertain and that events have a range of probabilities depending on the season and daily factors. The estimation of these probabilities in our minds is a regular occurrence, but few people would use a mathematical approach to carefully define the likelihoods. Bayes' theorem permits the calculation of these probabilities so that we are not solely reliant on expert opinion and vulnerable to surprises [23].
In this manuscript, we utilize the weed surveys conducted over multiple years in a remote section of the Australian coastline. The East Gippsland series of coastal national parks extends along an uninhabited and pristine coastline [24][25][26] for 176 km (Figure 1). The surveys encountered 84 weed species [9], although many are not considered a threat to the ecosystems present. However, if the key weed species are permitted to flourish, then endangered ecosystems such as wetlands and coastal dunes are likely to be diminished [24,26]. Here, we present the results of the two BN models that incorporate a range of influential data sets to generate predictive maps of weed distributions. Complementary BN models at two alternative spatial scales are presented as a mechanism to assist with the adaptive management of an expansive conservation area. The two models presented are, in themselves, interesting reflections of the influences that determine the weed dynamics. The questions addressed by each model have a different focus. What ecosystems are vulnerable to weed infestations across the entire East Gippsland national park (in Victoria, Australia)? What contributing factors can be managed at the site level to control weed infestations?

Materials and Methods
In brief, the methods consisted of four parts: the collection of weed observations, the recording of in situ environmental data across the study area, the compilation of geospatial data for use in a regional-scale model, and the development of a causal network to inform the Bayesian Network.

The East Gippsland Study Area
The spectacular and unspoilt coastline of the East Gippsland study area includes UNESCO World Biosphere Reserves amongst a diverse suite of inlets, rocky headlands, and isolated beaches (Figure 1). The enormous diversity of ecosystems, from heathlands, dunes, and rainforests to majestic forests, attracts visitors both nationally and internationally. The study area includes Croajingolong NP, Cape Conran NP, and Peach Tree Creek Reserve. The study area covers 100,094 hectares with a 176 km length of coastline and no significant human habitation in the region.

The Weeds Survey
Within the study area, the following landforms and features were surveyed for weeds:
1. Beach Strand: The area of beach between the high tide line and dunes.
2. Dune Complex: Primary (first) dune and swale beyond above beach strand.
3. Rocky Headlands: Elevated cape or point of land reaching out into the water, devoid of beach strand or dune characteristics.
4. Estuarine Shores: Areas of land abutting estuarine waters at the time of survey to a maximum of 250 m inland.
5. Human Access Nodes: Areas readily and frequently accessed by recreational users comprising the last 100 m of vehicular tracks servicing carparks and lookouts, and 20 m buffer around lookouts, carparks, and campgrounds.
Three key survey methods were applied across the study area:
1. Random stratified sampling (unbiased) of transects: The generation of 90 random point locations (using ET Geowizard within ArcGIS 10) within the ecological vegetation class (EVC) layer based on each area of an ecological vegetation class.
2. Random sampling (biased) of past infestations: Biased random transects across 110 locations within areas where weed species have previously been recorded.
3. Opportunistic searching: Data on weed species were recorded throughout the entire study area through meander searching. This involved crews of two people walking the entire stretch of the coastline within the study area between Point Ricardo and the NSW border.
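The random stratified step can be sketched as follows. This is only an illustration under stated assumptions: that the 90 points are allocated to strata in proportion to area (the text says only that allocation was "based on" the area of each EVC), that each stratum can be represented by a rectangle rather than its true polygon, and that the EVC names and areas shown are hypothetical. The survey itself used ET Geowizard within ArcGIS, not this code.

```python
import random


def allocate_points(areas_ha, total_points):
    """Split total_points across strata in proportion to area (largest-remainder rounding)."""
    total_area = sum(areas_ha.values())
    exact = {name: total_points * area / total_area for name, area in areas_ha.items()}
    counts = {name: int(value) for name, value in exact.items()}
    leftover = total_points - sum(counts.values())
    # Hand the remaining points to the strata with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda name: exact[name] - counts[name], reverse=True)
    for name in by_remainder[:leftover]:
        counts[name] += 1
    return counts


def sample_stratum(bounds, n, rng):
    """Draw n uniform points inside a rectangular stand-in for a stratum polygon."""
    xmin, ymin, xmax, ymax = bounds
    return [(rng.uniform(xmin, xmax), rng.uniform(ymin, ymax)) for _ in range(n)]


# Hypothetical EVC names and areas in hectares; not taken from the survey.
EVC_AREAS = {"coastal_dune_scrub": 1200.0, "heathland": 2500.0, "wet_forest": 5300.0}
counts = allocate_points(EVC_AREAS, 90)
points = sample_stratum((0.0, 0.0, 1000.0, 500.0), counts["heathland"], random.Random(0))
```

In a real workflow the rectangle would be replaced by a point-in-polygon test against the EVC layer so that every sampled point falls inside its vegetation class.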
For the surveys along the dune complex, the 3-way transect method was used. This required the surveyors to start at the beach and then head up to 100 m inland (perpendicular to the water's edge) over the fore dune and into the swale (where practical). The surveyors then followed the swale or dune for 100 m. Finally, the surveyors turned back out to the beach, recording along all three sections. The weed cover and extent were recorded by the two surveyors, who walked either side of the centre of the transect line (covering an estimated survey width of 20 m along each transect). A GPS was used to record the start and end points of each transect line (including changes in direction) and the location of weed species and related attributes (Table 1). Additional site-based observations were also collected (Table 2). For the estuary or campground and activity nodes, a 2-way transect was completed. The transect commenced at the edge of the estuary or campground activity node, heading directly away from the approximate centre of the node for 20 m.
The weed surveys conducted in November 2015 and 2016 noted 6 key species that were significant invasive pests in the region [9]. A total of 2522 survey sites (1486 in 2015 and 1036 in 2016) were recorded along the coastline, and the presence and absence of key weed species were noted as well as a range of environmental conditions. A linear distance of approximately 176 km of coast was surveyed in 2015 and repeated in 2016. During the 2016 survey, 173 transects were completed and 27 transects were abandoned due to steep inaccessible terrain, very close proximity of a transect to another transect, or lack of time on the day of surveying to complete the transect. The combined linear distance of transects is 2.3 km. A total of 84 different weed species (of which 8 were on adjoining private land) and 1538 weed records were captured during the survey. The 10 most frequent weed species recorded during the survey were Milk Thistle (Sonchus sp., 33), Flatweed (Hypochaeris sp., 35), Blackberry (Rubus fruticosus aggregate, 38), Panic Veldgrass (Ehrharta erecta, 47), Dolichos Pea (Dipogon lignosus, 50), Sea Rocket (Cakile sp., 76), Coast Gladiolus (Gladiolus gueinzii, 87), Marram Grass (Ammophila arenaria, 175), Coast Capeweed (Arctotheca populifolia, 209), and Sea Spurge (Euphorbia paralias, 521). Sea Rocket and Marram Grass are actually the most common species, so for them the number of observations represents only the intersects within the transects.

Model Development
The primary motive for this project was to develop a regional model of the vulnerability to key weed species for the entire study area. However, given the imperative to address adaptive management processes, a local-site-scale model was also developed directly from the environmental and weed observation data. While the regional-scale model utilized covariate data that were recorded or modelled across the region to develop a spatially explicit set of predictions, the local site model was not spatially explicit and captured fine-scale observations that were pertinent to field-based operatives.

Regional-Scale Weed Vulnerability BN
The critical first step in the regional model development is the construction of a causal diagram [20] for the emergence of weeds across the region. This required many iterations based on expert opinion to successfully capture the environmental influences and their association with weed colonization. Many region-scale environmental variables could have been included but were excluded simply due to the constraint of keeping the model sufficiently simple and manageable. A further constraint was the availability of data that were sufficiently high-resolution and temporally relevant and had regional coverage. Spatial information on the activities of feral animals, for example, was not available with sufficient accuracy to include. Finally, the network diagram showing the various parameters and the cross linkages was agreed on by the authors. The site-scale model, in contrast, used a machine learning tree-augmented naïve (TAN) algorithm based on the survey data alone to generate a BN model [27].
The environmental variables at the scales of the model output were gathered or created using GIS modelling techniques. The various data sources and complementary metadata are listed in Table 3. The GIS analysis was conducted in QGIS Version 2.18.2 (QGIS Development Team, 2009). The resolution of the output was set at 30 m by 30 m in order to capture some fine-scale features (precision) but remain sufficiently robust (accuracy) for the regional approach.
For every weed species, the spatial points showing the observed occurrence and the observations without any weeds were placed in separate shapefiles. The values for the raster environmental and GIS model data were extracted to every survey point. The attributes were exported, examined, and consolidated in R (Version 3.3.2) (R Core Team 2017). The scripts in R created a text file (referred to here as a 'case' file) where every spatial point was a data frame row with column information pertaining to the various lists of model parameters. Three case files were created for each weed. The first was the full survey case file with the associated environmental data. The second and third case files were random samples of the same file: a 20% subset and the complementary 80% of the data. The causal network formed the basis of a naive Bayesian Network (BN) within the Netica V6.04 software environment (Norsys Software Corp 2016). The conditional probability tables (CPTs) were updated by importing the 80% survey case file for the single weed using an expectation maximization procedure. This algorithm is particularly suited to data sets that contain significant levels of missing data [23]. The BN model was compiled and contained the marginal probabilities for each parameter. Essentially, this was a reflection of the observed likelihood of any parameter occurring in the survey data set, similar to a histogram but with bin sizes reflecting the frequency of data.
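The case-file handling described above can be sketched in a few lines. This is a simplified stand-in, not the R and Netica workflow used in the study: the field names and rows are hypothetical, and the marginal probabilities are plain relative frequencies that skip missing values, whereas the study used expectation maximization to handle missing data.

```python
import random
from collections import Counter


def split_cases(cases, test_fraction=0.2, seed=42):
    """Shuffle the cases and return complementary (train, test) subsets."""
    rng = random.Random(seed)
    shuffled = list(cases)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def marginal_probabilities(cases, field):
    """Relative frequency of each observed state of one field, skipping missing values."""
    counts = Counter(case[field] for case in cases if case.get(field) is not None)
    total = sum(counts.values())
    return {state: n / total for state, n in counts.items()}


# Hypothetical case rows; field names and values are illustrative only.
cases = [
    {"id": i, "weed_present": i % 4 == 0, "soil": "sand" if i % 3 else None}
    for i in range(100)
]
train, test = split_cases(cases)
weed_marginals = marginal_probabilities(cases, "weed_present")
```

Fixing the seed keeps the 80% and 20% subsets reproducible, so that the same held-out cases are used each time the model is re-tested.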
The BN was then tested for predictive accuracy for each weed species using the associated 20% reduced data set. The testing compared the observations of species occurrence with the BN predictions given the environmental data. This generated a number of indices (confusion matrix error rate, Gini coefficient, and area under the ROC curve) that provide a measure of accuracy of the model structure and parameterization [28]. The full survey case file was then used to fully update the CPT probabilities.
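As an illustration of two of these indices, the sketch below computes a rank-based area under the ROC curve and derives a Gini coefficient from it. It assumes the relationship Gini = 2 × AUC − 1 that is commonly used for classifiers; the source does not state which definition Netica applies, and the observations and predicted probabilities shown are hypothetical.

```python
def auc_from_scores(labels, scores):
    """Rank-based AUC: chance that a random presence outscores a random absence (ties count half)."""
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one presence and one absence")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def gini_from_auc(auc):
    """Gini coefficient as commonly defined for classifiers: 2 * AUC - 1."""
    return 2.0 * auc - 1.0


# Hypothetical held-out observations (1 = weed present) and predicted probabilities.
observed = [1, 0, 1, 1, 0, 0]
predicted = [0.9, 0.2, 0.7, 0.4, 0.5, 0.1]
auc = auc_from_scores(observed, predicted)
gini = gini_from_auc(auc)
```

The pairwise loop is quadratic in the number of cases, which is acceptable for a 20% test subset of a few hundred points but would be replaced by a sort-based method for larger data sets.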
The study region case file was compiled from the centroids of all 30 m × 30 m raster cells in the study polygon and attributed with the regional data sets listed in Table 3. This was used to predict the likelihood of a selected weed occurring within the entire study area. A new file was generated that recorded the probability of a particular weed occurring, given the conditional probability of the environmental and social parameters. This file was subsequently joined to the spatial points file and used to map the distributions in the GIS.
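The generation of cell centroids can be sketched as below. This assumes a rectangular extent in a projected, metre-based coordinate system and omits the clipping to the study polygon and the attribution with regional data that the study performed in the GIS; the extent values are hypothetical.

```python
def cell_centroids(xmin, ymin, xmax, ymax, cell_size=30.0):
    """Yield (x, y) centres of every full square cell inside a bounding box, row by row."""
    n_cols = int((xmax - xmin) // cell_size)
    n_rows = int((ymax - ymin) // cell_size)
    for row in range(n_rows):
        for col in range(n_cols):
            yield (xmin + (col + 0.5) * cell_size, ymin + (row + 0.5) * cell_size)


# Hypothetical 90 m x 60 m extent in a projected (metre-based) coordinate system.
centroids = list(cell_centroids(0.0, 0.0, 90.0, 60.0))
```

Each centroid then becomes one row of the study region case file, to which the raster values at that location are attached before prediction.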
The process of CPT updating is repeated for every key weed species so that the BN model structure (based on the causal diagram) remains consistent but the marginal probabilities are adjusted accordingly.

Local-Site-Scale BN
A second model was also developed from the information contained in the survey data alone. This model was not spatially explicit due to the fine-scale nature of the field-based observations and was used to describe the mechanisms that determine the local-scale processes promoting the occurrence and spread of the weeds. The selection of parameters to collect was based on the expert opinion of field staff with particular focus on Victorian national park operational management. Expert opinion generated the structure of the field survey data associations to develop the BN with the 'common weed names' as the target variable. The survey parameters observed during the field trip are detailed in Tables 1 and 2. This model, due to the key factors observable only at a site level (i.e., soil disturbance and drainage), cannot be extrapolated to a regional scale but still serves to provide insights into the influences affecting weed spread. Critically, this model can inform park managers about the actions required to control weed infestations at a site level. This approach of generating two models at different scales supports the adaptive management framework by providing synthesized information about weed behaviour. Following systematic repeated surveys, the data can also reveal the effectiveness of control measures, vulnerability of habitat types, and influential socioecological factors in weed colonization.

Results
Utilizing the survey data combined with the geospatial data, two BN models were developed. The site observations were used to construct a BN that provided management-orientated outputs useful for on-ground operations. The regional vulnerability BN model was able to predict the occurrence of several different weeds along the East Gippsland coastal national parks with mixed accuracy to facilitate weed control priorities.

Local Site BN
The model configuration is shown in Figure 2 and describes the observations in a 30 m radius. The error rate of this model was 27.74% based on a confusion matrix with a 20% random subset of the survey data (Table 4). Essentially, this compared the number of cases allocated by predictions against the observed. For example, in Table 4, 143 cases are accurately predicted to be Sea Spurge while in the cell below, 22 cases were predicted to be Sea Spurge but were actually Coast Capeweed. The marginal probabilities in the local BN model (Figure 2) show that surveys along the coast were conducted in mostly well-drained sandy soils with grassland cover or a dune/scrub/grassland mosaic (vegetation-type node in Figure 2). The observed weeds were predominantly noted as emerging and scattered, often covering a 10 m square area.
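The error rate reported from a confusion matrix is the share of test cases that fall off the diagonal. A minimal sketch follows; the labels are hypothetical and are not the counts in Table 4.

```python
from collections import Counter


def confusion_matrix(observed, predicted):
    """Count cases per (observed, predicted) pair of class labels."""
    return Counter(zip(observed, predicted))


def error_rate(matrix):
    """Fraction of cases that fall off the diagonal (observed != predicted)."""
    total = sum(matrix.values())
    wrong = sum(n for (obs, pred), n in matrix.items() if obs != pred)
    return wrong / total


# Hypothetical labels; not the survey's Table 4 counts.
observed = ["Sea Spurge", "Sea Spurge", "Coast Capeweed", "Marram Grass", "Coast Capeweed"]
predicted = ["Sea Spurge", "Sea Spurge", "Sea Spurge", "Marram Grass", "Coast Capeweed"]
matrix = confusion_matrix(observed, predicted)
rate = error_rate(matrix)
```

In this toy example one Coast Capeweed case is misclassified as Sea Spurge, which mirrors the kind of off-diagonal entry described for Table 4.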
The model can be used to predict the likelihood of weed occurrence if those identified parameters can be estimated. The model calculates the influences present in the model (calculated as variance reduction) as shown in Table 5. Vegetation type (grasslands, etc.), behaviour (emerging, etc.), and soil disturbance (wind, etc.) are the most influential nodes. The node 'Common Name' shows the occurrence of the observations for the field survey. Sea Spurge was rated as the most common with 50.3% of observations while Purple Groundsel was only 0.29%. The absence of weeds was noted in 15.0% of observations and this was used to highlight the more resilient vegetation types.
Figure 2. Local site BN model (the opening of the caption is truncated in the source and ends "... 1 and 2)"). The classes (continuous data) or states (for discrete data) for each variable appear within its box and show the occurrence as a percentage. The lines connecting the boxes show the correlations observed in the data and indicate that one variable has an effect on another.

Table 4. Confusion matrix showing the cases where the predicted (columns) occurrences are shown against the observed field data (rows) for the local BN. The diagonal is the optimal location of the predictions that match those observed.

From a management perspective, the capacity of the model to highlight the most likely site-specific factors that influence the presence of a specific weed is critical. By selecting the common weed name, the model, within Netica software, will automatically adjust the marginal probabilities and present a series of primary factors to observe or control. Given the large area of the landscape to manage, this capacity to focus on the most likely areas of emerging weeds is highly effective.

Regional Vulnerability BN Model
The regional vulnerability BN model was built from several iterations of a causal diagram containing nine environmental factors and two social factors (Table 3).
This BN model is shown in Figure 3. There are three main components: dispersal influences, habitat vulnerability, and climate. Each of the nodes (shown as boxes linked in a network) is described in Table 1. Spatial data, where available, were used to populate the model except in three variables called latent nodes (green boxes in Figure 3). These nodes do not have a spatial data set and are used to assist with the flow of conditional probability logic through the model; they are the climate, dispersal influences, and habitat vulnerability nodes. Climate is defined as a mix of geomorphology, such as aspect, and regional changes in temperature and rainfall. Dispersal influence captures the assisted transport of seeds and plants through vectors such as water and human disturbance. Habitat vulnerability is the combination of geology and existing vegetation types that might hinder or assist the establishment of these weeds [29]. For example, weeds are more likely to exist when habitat vulnerability is high, and this may be coastal dunes or coastal scrub EVC. The expectation maximization algorithm is used to 'shape' these latent node probabilities beyond simple expert opinion. The probabilities shown in the model in Figure 3 describe the 2522 survey observations and associated environmental data that were used to inform the CPTs customized for each weed species (noting that Figure 3 is the Coastal Capeweed BN). The dominance of coastal scrub along the coastal dunes is evident (ecological vegetation communities' node values in Figure 3) and the wilderness of the region is captured by the majority of survey points being away from roads (road cost distance node; mean = 964 m). The study area is generally dry, with 70% of points being more than 170 m from a creek or water source. The error for the spatially enabled regional BNs for the mix of weeds was estimated at 3.9% to 6.1% based on a confusion matrix in Table 4. The model accuracy could not be tested for Purple Groundsel and Tree Lupin because the small number of observations precluded sub-setting. Coastal Capeweed, Coastal Gladiolus, Dolichos Pea, and Sea Spurge predicted well for the immediate area around the survey sites and the model can be utilized for these species. The Gini coefficient and Area Under Curve (AUC) [28] highlight that for the survey locations, the predictions are considered accurate (values close to 1 in Table 6) in being able to predict the existence of the weed species.
The sensitivity of the "species occurrence" node to changes in the other nodes was examined for each weed species and is shown in Table 7. This table shows that climate and dispersal influences (particularly the distance to existing weed populations) are highly influential. Other factors, such as distance from a campground, were surprisingly weak in influencing the presence of weeds at a regional scale. Notably, each weed responds differently to the environmental and social factors (see Appendix A, Table A1, for examples of diversity).
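One common reading of a variance-reduction sensitivity measure is the expected drop in the variance of the query node once a finding node is observed, Var(Q) − Σ_f P(f) Var(Q | f). The sketch below computes this from a joint probability table. It is an assumption that this matches the exact calculation performed by Netica, and the node states and probabilities are hypothetical.

```python
def variance(dist, values):
    """Variance of a discrete variable with numeric state values under distribution dist."""
    mean = sum(dist[s] * values[s] for s in dist)
    return sum(dist[s] * (values[s] - mean) ** 2 for s in dist)


def variance_reduction(joint, values):
    """Expected drop in Var(Q) after observing F, from a joint table {(f, q): probability}."""
    p_q, p_f = {}, {}
    for (f, q), p in joint.items():
        p_q[q] = p_q.get(q, 0.0) + p
        p_f[f] = p_f.get(f, 0.0) + p
    expected_conditional = 0.0
    for f, pf in p_f.items():
        if pf == 0.0:
            continue
        conditional = {q: joint.get((f, q), 0.0) / pf for q in p_q}
        expected_conditional += pf * variance(conditional, values)
    return variance(p_q, values) - expected_conditional


# Hypothetical joint distribution of a finding node F (distance to existing
# weeds) and the query node Q (species occurrence).
VALUES = {"absent": 0.0, "present": 1.0}
JOINT = {
    ("near", "present"): 0.30, ("near", "absent"): 0.10,
    ("far", "present"): 0.10, ("far", "absent"): 0.50,
}
reduction = variance_reduction(JOINT, VALUES)
```

A finding node that is independent of the query node gives a reduction of zero, so larger values indicate nodes whose observation would tell a manager more about weed occurrence.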
Maps predicting the occurrence of these selected weed species were generated by asking the regional BN model to predict the likelihood of occurrence for every 30 by 30 m cell in the study area given the environmental data available.Figure 4 shows the predicted occurrence of one weed, Coastal Capeweed.Other weeds can be similarly mapped.It should be noted that the models predicted well where the field survey data were located but in the nearby unsurveyed regions, our confidence in the model was significantly less.This lack of confidence applies to both the presence and absence of a particular weed species.Highly vulnerable areas that are potentially remote and expensive to monitor can be targeted for the emergence of specific weeds.Similarly, the factors that continually enhance weed distribution can be controlled.survey locations, the predictions are considered accurate (values close to 1 in Table 6) in being able to predict the existence of the weed species.3. The latent variables are in green boxes while the nodes in the beige colour are based on data from surveys, remote sensing, and GIS.Table 6.Accuracy for each weed variation of the regional vulnerability BN model based on the confusion matrix.Accuracy testing using a 20% sample was not possible for weeds with low numbers and is indicated by Not Available (NA).The error rate is based on the ratio of correctly predicted cases verses the observed cases.The Gini coefficient varies in the range 0 to 1 where a value of 0 represents complete uncertainty and 1 represents complete certainty.AUC values' range is [0, 1], where 1 denotes no error, 0.5 denotes totally random models, and <0.5 denotes models that more often provide wrong predictions [28].The sensitivity of the "species occurrence" node to changes in the other nodes was examined for each weed species and shown in Table 7.This table shows that climate and dispersal influences (particularly the distance to existing weed populations) are highly 
influential.Other factors such as distance from a campground were surprisingly weak in influencing the presence of weeds at a regional scale.Notably, each weed responds differently to the environmental and social factors (see Appendix A, Table A1, for examples of diversity).3. The latent variables are in green boxes while the nodes in the beige colour are based on data from surveys, remote sensing, and GIS.Table 6.Accuracy for each weed variation of the regional vulnerability BN model based on the confusion matrix.Accuracy testing using a 20% sample was not possible for weeds with low numbers and is indicated by Not Available (NA).The error rate is based on the ratio of correctly predicted cases verses the observed cases.The Gini coefficient varies in the range 0 to 1 where a value of 0 represents complete uncertainty and 1 represents complete certainty.AUC values' range is [0, 1], where 1 denotes no error, 0.5 denotes totally random models, and <0.5 denotes models that more often provide wrong predictions [28].Maps predicting the occurrence of these selected weed species were generated by asking the regional BN model to predict the likelihood of occurrence for every 30 by 30 m cell in the study area given the environmental data available.Figure 4 shows the predicted occurrence of one weed, Coastal Capeweed.Other weeds can be similarly mapped.It should be noted that the models predicted well where the field survey data were located but in the nearby unsurveyed regions, our confidence in the model was significantly less.This lack of confidence applies to both the presence and absence of a particular weed species.Highly vulnerable areas that are potentially remote and expensive to monitor can be targeted for the emergence of specific weeds.Similarly, the factors that continually enhance weed distribution can be controlled.
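The summary statistics described in the Table 6 caption can be illustrated with a minimal sketch. It assumes a binary presence/absence outcome, takes accuracy as the share of correctly predicted cases in the confusion matrix, computes the AUC with the pairwise rank formulation, and uses the common convention Gini = 2 × AUC − 1. The source does not state the exact formulas or software behind Table 6, and the cases below are invented purely for illustration.

```python
def confusion_counts(observed, predicted):
    """Return (tp, fp, fn, tn) for binary presence (1) / absence (0) labels."""
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o == 1 and p == 1)
    fp = sum(1 for o, p in pairs if o == 0 and p == 1)
    fn = sum(1 for o, p in pairs if o == 1 and p == 0)
    tn = sum(1 for o, p in pairs if o == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(observed, predicted):
    """Share of cases on the diagonal of the confusion matrix."""
    tp, fp, fn, tn = confusion_counts(observed, predicted)
    return (tp + tn) / (tp + fp + fn + tn)


def auc(observed, scores):
    """Probability that a random presence case outscores a random absence case.

    Ties count as half a win (pairwise rank formulation).
    """
    pos = [s for o, s in zip(observed, scores) if o == 1]
    neg = [s for o, s in zip(observed, scores) if o == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one presence and one absence case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def gini(observed, scores):
    """Gini coefficient under the assumed convention Gini = 2 * AUC - 1."""
    return 2.0 * auc(observed, scores) - 1.0


# Hypothetical held-out cases (not data from the survey).
observed = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7]  # predicted P(presence)
predicted = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(accuracy(observed, predicted), auc(observed, scores), gini(observed, scores))
```

Under this convention a model no better than chance (AUC = 0.5) has a Gini coefficient of 0, which matches the "complete uncertainty" end of the range described in the caption.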

National Park Management
The development of predictive regional- and local-scale models from the field survey data has enabled the extrapolation of the survey observations to a wide area of interest for national parks in East Gippsland. The models are particularly tuned to the list of six notable weed species but could be applied to many others. The regional BN model is spatially enabled and is able to construct a cell-by-cell map of the study area showing the likelihood of weed occurrence. The dominant prediction of the model is that the fragile coastal dunes with their associated vegetation groups are particularly vulnerable and that disturbance by wind and storms is likely to extend the spread. In contrast, the local BN model focuses on-ground operations on areas of disturbance and past weed infestations for maximum impact.
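The cell-by-cell mapping idea can be sketched with a deliberately tiny network. The two parent variables, their states, and every probability below are hypothetical placeholders, not the nodes or conditional probability tables of the regional BN. The sketch also assumes the two parents are independent root nodes, so that an unobserved parent can be marginalized using its prior.

```python
# Hypothetical CPT: P(weed present | vegetation class, distance-to-known-population class).
CPT = {
    ("dune", "near"): 0.80,
    ("dune", "far"): 0.30,
    ("heath", "near"): 0.40,
    ("heath", "far"): 0.05,
}

# Hypothetical prior over the distance class, used when a cell has no distance data.
DIST_PRIOR = {"near": 0.25, "far": 0.75}


def p_present(veg, dist=None):
    """Probability of presence for one cell given its evidence.

    If the distance class is unobserved (None), it is marginalized out using
    its prior, assuming it is independent of the vegetation class.
    """
    if dist is not None:
        return CPT[(veg, dist)]
    return sum(DIST_PRIOR[d] * CPT[(veg, d)] for d in DIST_PRIOR)


def predict_grid(veg_grid, dist_grid):
    """Apply the network to every cell of two aligned raster layers."""
    return [
        [p_present(v, d) for v, d in zip(veg_row, dist_row)]
        for veg_row, dist_row in zip(veg_grid, dist_grid)
    ]


# A 2 x 2 toy raster; in the study each cell would be 30 by 30 m.
veg_grid = [["dune", "dune"], ["heath", "heath"]]
dist_grid = [["near", None], ["far", "near"]]

likelihood_map = predict_grid(veg_grid, dist_grid)
for row in likelihood_map:
    print(row)
```

A cell with missing evidence still receives a prediction, but that prediction leans on the prior rather than on local data, which is one way to read the lower confidence reported away from the survey locations.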
While the model performed satisfactorily along the coastline and for the widely distributed or commonly recorded weeds, the newly emerging weeds such as Purple Groundsel and Tree Lupin were not predicted with confidence. There are three likely reasons for this. Firstly, the scale of the model and the associated data may not capture the key ecological aspects of these weeds. Secondly, the survey in this large region did not cover sufficient ecological systems to note the full range of these less common weeds. Lastly, these weeds are sufficiently rare that correlations with a diverse suite of environmental factors were limited. In essence, more field data are required to gain confidence for these species. Field crews undertaking weed eradication programmes and park maintenance could also collect data on weed locations and conditions. Another potential solution is a citizen science approach that encourages people to use their phones to record observations of a small set of weeds in the park with GPS coordinates (see the iNaturalist app, https://www.inaturalist.org/, accessed on 1 May 2024, for example).
The models highlight the impact of letting existing populations flourish. Data from field work undertaken one year earlier were included and clearly show that weed occurrence is most likely where past populations succeeded. Early intervention with control of small patches, especially before seeding takes place, is supported by the model, especially in the distribution of individuals, size of clumps, and life stages.

Model Limitations and Future Directions
This type of model can be adjusted and developed as new information becomes available. Of particular note are climate data. Fine-scale humidity and temperature data would significantly increase the precision of the predictions for the regional BN model. The model structure can be used to help develop strategic plans despite some clear areas of error and uncertainty. This has the benefit of highlighting the predicted results while also emphasizing the need to sample in more regions. Regular systematic surveys will develop a database that can be directly used to evaluate the effectiveness of the management interventions and enable financial estimations of weed control. Models that are empirically based are able to adapt and ensure a non-stationary approach to decision making [4,10].
Several limitations and constraints have been identified, primarily with the execution of the survey effort. Due to the large study area, it was not practical to survey every square metre of the study area for weed presence and only a small percentage was surveyed on foot. Several sections of steep headland coast were not surveyed due to their inaccessibility. Practical transects were completed by walking in a straight line between points; however, this was not always possible due to the very thick coastal scrub. Several flora taxa were only identified to the genus level due to the lack of flowering material. Certain flora species are only readily identifiable onsite during periods of particular environmental and climatic conditions. Surveying of the site was undertaken during four consecutive weeks in Spring and there is potential that plants that flower outside of the survey period may not have been detected.
The models highlight that Sea Spurge and Coastal Capeweed are indeed very serious threats to the delicate coastal environments. They have the capacity to dominate the space made available and can exclude native species. In regions where the ecosystems are undisturbed, these weeds will be restricted to a narrow coastal strip unless they are able to opportunistically expand into the heathlands following a disturbance event. Coastal Capeweed has so far been concentrated in the northeastern sector, but its capacity to move south is noted. Sea Spurge dominates some beach areas that face southeast and this may be a function of storm and wave disturbance.
Of more concern is that the climate gradient is noted as a driver of weed presence. Given that climate models indicate warming, especially along the eastern seaboard [30], the capacity of weeds to dominate where natives are struggling is considerable. Early intervention to remove the established colonies will be essential to ensure the future resilience of these fragile habitats.

Conclusions
The optimization of on-ground surveys for weed management is described in this research. Critically, the contributing factors that are associated with the presence or absence of a set of key weed species can be modelled either at a regional level for strategic planning purposes or at a local scale for optimized detection and the potential amelioration of contributing processes. Here, we found that the type of ecological vegetation community and the disturbance history are important elements in the success of weeds in colonizing natural areas. In the pristine East Gippsland region where anthropogenic disturbances are absent or minimal, utilizing past weed monitoring to target control practices is highly effective. Our findings indicate that coastal habitats, with their vulnerable environments and prevalence of open dune systems, are particularly susceptible to weed infestations. This use of Bayesian Networks can enable updates and predictions based on changes in environmental conditions and new weed observations. Land management tools that leverage on-ground information are fundamental to modern national parks.

Figure 1. Overview map of East Gippsland study area.

Figure 2. The local BN from the survey data. Each box or node represents a variable noted in the field (Tables 1 and 2). The classes (continuous data) or states (for discrete data) for that variable appear within the box and show the occurrence as a percentage. The lines connecting the boxes show the correlations observed in the data and indicate that one variable has an effect on another.

Figure 3. The regional BN parameterized for Coastal Capeweed showing marginal probabilities for factors outlined in Table 3. The latent variables are in green boxes while the nodes in the beige colour are based on data from surveys, remote sensing, and GIS.

Figure 4. The predicted likelihood of Coastal Capeweed occurrence across the study region. Areas away from the survey locations have a low level of confidence. Inset map: zoomed-in section of the predicted model for Coastal Capeweed. The red regions indicate a high likelihood of observing the weed. Hence, the weed is biologically suited to the niche created in the red zones and is more likely to occur there than in locations shown in other colours.

Table 1. Weed Field Data Attributes Collected.

Table 2. Additional Weed Field Data Attributes Collected.

Table 3. GIS layers used to inform the model were sourced from Victorian Department of Environment, Land, Water and Planning (DELWP) and Bureau of Meteorology (BOM) unless otherwise stated.

Table 4. Confusion matrix showing the predicted occurrences (columns) against the observed field data (rows) for the local BN. The diagonal holds the predictions that match those observed.

Table 5. Sensitivity of the weed list node 'Common Name' to a finding at another node for the local BN using the variance reduction algorithm, for the top 7 influences. A higher variance percentage implies a higher influence on the weed species occurrence prediction.

Table 7. The Sensitivity of 'Species Occurrence' to a finding at another node in the regional vulnerability BN measured as percentage variance reduction.
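As a rough illustration of what a percentage variance reduction expresses, the sketch below treats 'Species Occurrence' as a 0/1 variable and computes how much of its variance is expected to be removed by observing one other node. This is one common definition of variance reduction and is assumed here; the source does not give the formula. The node names and probabilities are hypothetical, not values from Table 7.

```python
def variance(p_present):
    """Variance of a 0/1 occurrence variable with P(present) = p_present."""
    return p_present * (1.0 - p_present)


def percent_variance_reduction(p_finding, p_present_given_finding):
    """Expected percentage reduction in occurrence variance from one finding node.

    p_finding: dict mapping each state of the finding node to its probability.
    p_present_given_finding: dict mapping each state to P(present | state).
    """
    prior = sum(p_finding[s] * p_present_given_finding[s] for s in p_finding)
    v_prior = variance(prior)
    if v_prior == 0.0:
        return 0.0  # occurrence is already certain; nothing left to explain
    v_post = sum(p_finding[s] * variance(p_present_given_finding[s]) for s in p_finding)
    return 100.0 * (v_prior - v_post) / v_prior


# Hypothetical node: distance to an existing weed population (strong influence).
distance = percent_variance_reduction(
    {"near": 0.5, "far": 0.5}, {"near": 0.8, "far": 0.2}
)

# Hypothetical node: distance from a campground (weak influence).
campground = percent_variance_reduction(
    {"close": 0.5, "distant": 0.5}, {"close": 0.55, "distant": 0.45}
)

print(round(distance, 2), round(campground, 2))
```

With these invented numbers the distance node removes about 36% of the variance and the campground node about 1%, mirroring in kind (not in value) the contrast between strong and weak influences described in the text.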
