Ecological monitoring is a vital tool for effective conservation (Tittensor et al., 2014), but it can be a resource-heavy investment. Rapid habitat assessment methods may therefore be a valuable tool to increase our monitoring capabilities. However, when we tested five rapid assessment scorecards for peatlands and heathlands, we found that the overall site scores produced by the scorecards rarely matched. Scorecards may reflect project-specific goals, but in their current form they may not be suitable for wider monitoring. With some adaptations, however, the methodology could become a valuable tool outside of results-based AES.
4.1 Comparison of Scorecard Results
We compared the results of five heathland and peatland scorecards in the Blackstairs Mountains SAC to understand how scorecards assess ecological condition on the same site. Site scores rarely matched, with a minimum difference of 1.5 points and a maximum of 6.3 out of 10. The selected scorecards most often use 12 of the 18 indicators, with only drainage, burning, and bare soil used by all scorecards. Grazing (two cards) and secondary indicators of hydrological condition (one card each) are the least-used indicators. The RBAPS draft card scored sites the lowest on average, while the Test card scored sites most favourably. Of the scorecards currently in use, the HHP scored sites lower than both the PMP and BFF. Due to the small number of field sites in this case study, we used score simulations to replicate potential site scores with random score distributions. Applying a random score generator to each scorecard produced similar results to our field data, suggesting that the choice of scorecard will affect the site score through card design rather than variation in ecological condition.
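Such a simulation can be sketched in a few lines. The indicator names and point ranges below are hypothetical placeholders, not those of any of the five cards studied; the sketch only illustrates the approach of drawing random scores within each indicator's allowed range and comparing the resulting card-level means.

```python
import random

# Hypothetical indicator point ranges (min, max) for two illustrative
# scorecards; the real cards use different indicators and weightings.
SCORECARDS = {
    "card_A": {"positive_species": (0, 20), "bare_soil": (-2, 5), "drainage": (-30, 15)},
    "card_B": {"vegetation_structure": (0, 40), "bare_soil": (-2, 5), "burning": (0, 10)},
}

def simulate_site_score(card, rng):
    """Draw a random score for each indicator within its allowed range."""
    return sum(rng.uniform(lo, hi) for lo, hi in card.values())

def simulate(card, n_sites=10000, seed=42):
    """Mean simulated site score for one card, rescaled out of 10."""
    rng = random.Random(seed)
    max_score = sum(hi for _, hi in card.values())
    scores = [simulate_site_score(card, rng) for _ in range(n_sites)]
    return 10 * (sum(scores) / n_sites) / max_score

for name, card in SCORECARDS.items():
    print(name, round(simulate(card), 2))
```

Because the draws are uniform within each indicator's range, any systematic gap between the two cards' mean scores reflects card design (ranges and weightings) rather than simulated ecological condition, which is the logic of the benchmark comparison described above.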
Despite assessing similar metrics, there are some key differences between scorecards. While vegetation-based indicators typically provide the largest contribution (50-70%) to the total score, substantial differences occur within this category, and it is the primary cause of score variation. This is likely a result of project-specific goals. For example, the largest scorecard deviation (HHP) does not assess either positive indicator species number or cover, but instead awards a potential 40% of the total score for vegetation structure. The PMP places more weight on bryophyte cover and hydrology than on species present, whereas the BFF card places greater emphasis on positive indicators, with 40% of the total score available for number and cover (20% each). The weightings of these indicators make sense for the specific project; however, the data are of limited use for assessing underlying habitat condition if used outside of these aims.
The score weighting is not solely responsible for score variation, as the thresholds and qualifiers set for potential points loss can also have an effect. For instance, the RBAPS draft card penalises the number of negative indicator species and invasive species together, meaning that the detection of one individual from the list of negative indicator species can reduce the score; this is the leading cause of this card scoring sites lowest on average. The other scorecards assess this variable either by percentage cover or by separating negative indicators from invasives. Threshold testing has rarely been carried out in European ecological score indexes to date (Elmiger et al., 2023). As our data show, this can lead to variation in the ecological score, which has financial implications for landowners in results-based AES and obscures the overall ecological condition of the site. Testing and calibration require experts in the field, and so can add to project expenses. However, scenario testing and modelling may help limit this. Despite these merits, modelling has yet to find a regular role in the quantitative estimation of ecological condition, such as scorecard design and biodiversity offsetting (Borges-Matos et al., 2023). A relatively simple program such as the one used in our study could help with the creation of new scorecards by testing them against a benchmark set by a previously tested scorecard. This would allow creators to test whether their scorecard is likely to score sites above or below the benchmark, saving valuable field time.
Variation was low overall in site management indicators. This was due to the similarity in the thresholds and descriptions across most scorecards. A caveat is that none of the sites surveyed are subject to turbary or supplementary feeding, whilst damaging activities are infrequent. However, burning and bare soil are common in this region, and the thresholds differ between cards. The key difference for burning is that the BFF and Test scorecards score low burning evidence as neutral, whereas the other cards only score neutral or positive for no evidence of burning. Similarly, the BFF, HHP and RBAPS cards set the threshold for the lowest category score (-2 and -1.5 respectively) for bare soil cover at >5%, whereas the PMP sets it at >10% (-2). Hydrological indicators are mostly limited to drainage, and similar guidance is used throughout the selected scorecards. The variance in this category is due to the PMP's stronger weighting, reflecting the project's focus on water quality.
Scorecards typically have a maximum score of 100, but the potential points that can be lost are not standardised. When a metric does not score maximum in its category, the overall achievable score is reduced (i.e., if a metric can score 20 points and scores 0, then the maximum score is now 80). This is the cause of the negative means in the random score simulations. However, these data match the in-field findings and suggest that the lack of standardised points loss across cards does influence the average score achieved by each scorecard. Adding a negative score compounds the effect. For example, the PMP and HHP both scored a site below zero (Appendix 1). An indicator that can score both negatively and positively has an increased weighting, as it can now score across a wider range. For example, the PMP scorecard scores drainage over a range of +15 to -30. This gives a combined weighting of 45 points, which is 45% of the potential score on one indicator. The overweighting of indicators can lead to ‘attribute eclipsing’, where one indicator overshadows the rest. This problem was identified in three Australian rapid assessment methodologies for biodiversity offsetting (Oliver et al., 2014). Another problem these methods can face is the combination of positive and negative scores leading to all sites scoring average, despite clear differences in condition. This could potentially be the case with the BFF scorecard, as most sites scored maximum points on the positive indicator species metrics. A review of wetland rapid assessment methods in the US showed that two systems tested on the same sites persistently produced ‘medium’ ranked scores, despite the field sites showing apparent differences in condition (Fennessy et al., 2007).
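The widening effect of a negative-to-positive range can be made concrete with a small calculation. The helper below is our own illustration, applied to the PMP drainage range quoted above (+15 to -30 against a nominal 100-point card):

```python
def effective_weight(min_pts, max_pts, nominal_max=100):
    """Share of the card's nominal maximum controlled by one indicator.

    An indicator spanning negative to positive values controls
    (max_pts - min_pts) points of range, so its effective weighting
    exceeds the positive points it can award.
    """
    return 100 * (max_pts - min_pts) / nominal_max

# PMP drainage, as described in the text: +15 to -30.
print(effective_weight(-30, 15))  # 45.0 -> 45% of the potential score
```

The same calculation applied to a purely positive indicator (say 0 to +15) would give only 15%, which is why allowing negative scores effectively triples the influence of drainage in this example.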
A survey of ecologists currently using scorecards found that they consider them a ‘fairly accurate’ representation of ecological condition (Gorman, unpublished data); however, such variance suggests that scorecards may be accurate to project goals but not necessarily to true condition. The question that must be asked of scorecard methodologies is: what are we measuring? The purpose for which the card was created must be considered before any assumptions are made as to whether it represents true ecological condition. Ecological scorecards have proven to be a valuable tool in results-based systems, but in their current form they are unlikely to be suitable for wider uses.
4.2 Recommendations for Developing Scorecards Outside of Results-Based AES
One of the many benefits of ecological scorecards is that they are easily adaptable to new locations or purposes; however, we found that caution is needed before assuming an existing card will be appropriate for new project objectives or habitats. Designing a new card should not be ruled out on the assumption that an existing one will be adequate, as the choice and weighting of indicators may need to be adjusted.
The use of positive indicator species can be problematic for scorecard design. For a single-survey methodology, plants are the only viable taxa, but the number and choice of indicator species deserve particular attention, as their detection and correct identification can be a major source of error, especially in simple presence/absence surveys (Elmiger et al., 2023; Milberg et al., 2008; Scott & Hallam, 2003). Furthermore, heterogeneous landscapes such as the uplands can have incompatible species lists. Positive indicator species lists for specific habitats are detailed in established monitoring methodologies (Perrin et al., 2014; JNCC, 2009); however, scorecards that cover multiple habitats make the selection of species either too large to be practical or too vague to be accurate. Despite concerns regarding the suitability of indicator species to reflect broader trends (Lindenmayer & Likens, 2011), they are still gaining popularity as a cost-effective tool to measure a variety of ecological issues (Siddig et al., 2016).
Proxy indicators can be vulnerable to sampling bias, observer bias, and measurement errors, as shown in an evaluation of the US ecological integrity assessment (Brown & Williams, 2016). Further problems may arise from the biases inherent in converting raw data into categorical scores (Gorrod et al., 2013). Proxy indicators that rely on expert judgement should be used with caution, as differing assessor experience levels may result in varied opinions of the same site (Gordon et al., 2016). Indicators such as vegetation structure and grazing intensity are particularly vulnerable to observer bias, as they are reliant on the timing of the assessment and the surveyor's experience of the habitat. To reduce the chance of error, indicators should be clearly defined to reduce ambiguity in their interpretation, and raw data should be recorded, such as average vegetation height instead of visual grazing assessments. This also allows for the adjustment of yearly scores, should the scorecard need to be recalibrated. Applying scorecards with a fixed species list across a range of landscapes poses the problem of regional variation in the plant community. A scorecard that is suitable for use in one region may not be as accurate when applied to another, even if the habitats are classified the same. Recalibration is an important feature to consider when testing scorecards, but it may carry the risk of damaging landowner trust if executed midway through a project.
These scorecards are designed for rapid, single-survey assessment and so have reduced the sampling strategies used in traditional biological sampling, such as quadrats, transects, and relevés, in favour of those that suit rapid assessment. This favours visual estimates of percentage cover of positive indicator species alongside presence/absence surveys. Whilst this is undoubtedly rapid and easily accomplished, it does not adequately address the reasons why traditional biological sampling methods were developed. These sampling methods were designed to limit the bias and background variation that can blur results. Without an adequate sampling strategy, variation in site scores over time may be impossible to separate from natural variation or user variation (Magurran, 2021). This limits the ability to accurately assess the impacts of conservation efforts. Taking measurements, using assessment points, and recording raw data should limit variation and provide a record should the scoring criteria be adjusted. Shifting baselines and assessor experience will influence metrics that are based on expert opinion, such as grazing level or percentage cover over large areas.
Scorecards need to remain simple enough to be flexible and practical, and so there is inevitably a balance to be struck between sampling effort and accuracy (Elmiger et al., 2023). The number of metrics chosen and their weighting will define the clarity of the message the scorecard conveys, as measuring too many variables makes it hard to decipher the leading causes of change. This issue becomes apparent when scorecards are designed to be multi-habitat. Each additional habitat added to a scorecard's range will cost a degree of accuracy across all covered habitats. Stewart & Jones (2020) tested the HHP and PMP peatland scorecards on blanket bogs in the Outer Hebrides and did not find them to be an accurate representation of ecological condition across the mosaic of habitats that comprise heathland landscapes. To address this, the authors developed a more general habitat condition scorecard covering various grasslands and heaths, rather than making separate habitat-specific scorecards.
Addressing the issue of practicality and flexibility does not require a substantial overhaul of current methods or more time spent in the field. Methods exist to incorporate Common Standards Monitoring into scorecards in Yorkshire, Wales and Scotland (Stewart & Jones, 2020; POBAS, 2022). These cards require species to be present in a given number of stops, which helps to address the lack of frequency data in the positive indicator species variable and limits the influence of one patch overshadowing the condition of the whole site, provided that bias in stop placement is addressed. This method also incorporates a ranking system for positive indicator species, and scores are awarded based on the presence of high-ranking positive indicators and a selection of more commonly associated species. This helps distinguish sites with rarer or more unique species assemblages from sites that simply have a large number of the more common indicators.
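A stop-based frequency check of this kind might be sketched as follows. The species names, rank values, and thresholds are hypothetical, not those of any cited scheme; the sketch only illustrates the two mechanisms described above (a minimum-stops requirement and a bonus for high-ranking indicators).

```python
# Hypothetical ranked positive-indicator list: higher rank = rarer/more valued.
RANKS = {"sphagnum_spp": 3, "erica_tetralix": 2, "calluna_vulgaris": 1}

def frequency_score(stop_records, min_stops=4, high_rank=3):
    """Score positive indicators by how many survey stops they occur in.

    stop_records: list of sets, one set of recorded species per stop.
    A species only counts if present in at least `min_stops` stops,
    limiting the influence of a single good patch; species at or above
    `high_rank` earn a bonus point, elevating rarer assemblages.
    """
    counts = {}
    for stop in stop_records:
        for sp in stop:
            counts[sp] = counts.get(sp, 0) + 1
    score = 0
    for sp, n in counts.items():
        if sp in RANKS and n >= min_stops:
            score += 1
            if RANKS[sp] >= high_rank:
                score += 1  # bonus for a high-ranking indicator
    return score

# Ten stops: a widespread pair of indicators plus a patchy third species.
stops = [{"calluna_vulgaris", "sphagnum_spp"}] * 5 + [{"erica_tetralix"}] * 2
print(frequency_score(stops))
```

In this toy run the patchy species fails the minimum-stops requirement and contributes nothing, while the widespread high-ranking species contributes twice, mirroring how stop frequency damps single-patch effects.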
Scorecards rarely account for area or landscape-scale features despite their importance to the distribution of species such as birds, Carabidae and Lepidoptera (Jeanneret et al., 2003; Merckx et al., 2009; Peach et al., 2001). This is in part due to the requirement for the indicators used in results-based AES to be within the control or influence of landowners (Ruas et al., 2021). However, assigning a single condition score over large, heterogeneous landscapes such as an upland heath is a difficult task. A potential solution is to divide sites into land parcels, separated by condition or change of habitat, as is the practice in the PMP. However, opinion is split on this solution, as landowners may see low-scoring parcels as ‘sacrificial’ and be discouraged from making improvements (Gorman, unpublished data). Developing scorecards that are decoupled from an AES, and instead provide an ecological condition score to monitor restoration efforts, would not be bound by such restrictions.
Ultimately, there is a pressing need to increase our monitoring capabilities for Annex I habitats across the EU, yet priority areas included in results-based schemes are being monitored annually with project-specific methods producing data that are incompatible with wider monitoring. There is currently a missed opportunity to link these monitoring programmes and produce valuable data to assist national monitoring efforts. The IBECA framework (Jakobsson et al., 2021), developed for forest and alpine ecosystems in Norway, is an example of how a quantitative and cost-effective framework can be developed, and we believe a similar approach may be viable with ecological scorecards.
Before seeking to adapt ecological scorecards to new purposes, we should consider the following questions:
1) What are we measuring and how well do the existing scorecard metrics represent it? Is there a clear scientific link between the metrics and the target?
2) To what extent can good sampling strategy be employed in the survey method whilst remaining a rapid assessment?
3) How closely does the scorecard represent ecological condition assessments for related protected habitats?
4) Does the scorecard need recalibration to reflect regional variation?
If we can adequately answer these questions to provide multi-user scorecards for Annex I habitats, we could increase our monitoring capabilities and potentially empower local communities to monitor their own environment and become engaged in restoration efforts. Failure to address these questions may lead to inappropriate monitoring strategies being used in future conservation projects, wasting valuable resources. This is particularly important given the extent of peatland restoration planned in Ireland. Quantifying restoration efforts in the rewetting of bogs would be an ideal role for a new scorecard, if scorecards can be shown to accurately measure peatland recovery.