Quantifying GFAP immunohistochemistry in the brain – Introduction of the Reactivity score (R-score) and how it compares to other methodologies

Background: Immunohistochemical upregulation of glial fibrillary acidic protein (GFAP) is commonly used to detect astrogliosis in tissue sections and includes measurement of intensity and/or distribution of staining. There remains a lack of standard objective measures when diagnosing astrogliosis and its severity. New method: The aim was to test the reproducibility and sensitivity of a novel semi-quantitative assessment of GFAP, which we term the reactivity (R)-score, for measuring astrogliosis. The R-score, which is based on the proportion of astrocytes seen at each level of reactivity, was compared to 3 other quantification methods commonly employed in research: (1) thresholding, (2) point-counting, and (3) qualitative grading. Sub-regions of the hippocampus, medulla, and cerebellum were studied in piglets and in 4 human cases with clinically reported astrogliosis. Intra-assay coefficient of variation (CV) and percentage agreement cut-offs of ≤ 20% and ≥ 75%, respectively, were used to compare the methods, with outcome measures being reproducibility across serial and non-serial sections, resilience to changes in experimental conditions, and inter- and intra-rater concordance. Results: Averaged across 3 brain regions, the intra-assay CV was 5% for the R-score, with inter- and intra-rater kappa scores being 0.99 and 0.95, respectively. Comparison with existing methods and conclusions: Based on CV values, the R-score was superior to thresholding (CV of 51%) and point-counting (CV of 16%), with the qualitative grade found to be on par (percentage agreement 95%). Given the ease, reproducibility and selectivity of the R-score, we propose it as a valid method for future research and clinical application.


Introduction
Astrocytes are a subtype of glial cell that play an integral role in the development, structure, and function of the central nervous system (CNS). Astrogliosis is a highly characteristic but non-specific response to homoeostatic instability of the CNS (such as in the case of injury) whereby astrocytes become 'reactive', undergoing a range of morphological, molecular and functional changes, dependent on the nature and severity of the insult (Sofroniew and Vinters, 2010).
Glial fibrillary acidic protein (GFAP) is the main component of astrocyte intermediate filaments and has become a prototypical marker of reactive astrocytes (reviewed in (Preston et al., 2019)). Injury and/or insult to the CNS increases the production of GFAP so that previously immunonegative cells may become immunopositive (thus, an increase in the number of GFAP-positive cells), and astrocyte cell bodies and their processes may appear hypertrophied (thus, a change in morphology), both indicative of astrogliosis (Sun and Jakobs, 2012; Zhu et al., 2004). Acute and diffuse trauma is sufficient to evoke a transient upregulation of GFAP, even in the absence of tissue damage (Wilhelmsson et al., 2006), while longer and more severe injuries/insults lead to more significant GFAP upregulation, with more marked hypertrophy of processes and distal processes potentially extending into adjacent astrocyte domains (Myer et al., 2006). The degree of GFAP upregulation depends on the type of insult (Zamanian et al., 2012; Zhang et al., 2016), the brain region affected (Sun and Jakobs, 2012; Emsley and Macklis, 2006; Messing and Brenner, 2020; Boos et al., 2021), and survival time following the insult (Hausmann et al., 2000).
Different methods have been used to measure changes in GFAP expression in formalin-fixed, paraffin-embedded tissue using immunohistochemistry (summarised in Supplementary Table 1); however, the results differ depending on the method employed (Weber et al., 2013; [...] point-counting, or more complex quantitative methods of assessment (summarised in Supplementary Table 1). By contrast, in the clinical setting, where time, money and tissue availability are often more limited, a qualitative assessment ('mild', 'moderate' or 'severe') based on cell morphology and number is often used. However, in the absence of standardised criteria, this qualitative assessment is subjective, resulting in considerable inter-rater discordance (Escartin et al., 2019; Leitner et al., 2022). Numerical grades are often applied to qualitative descriptions in a research setting to allow for comparison between groups (i.e., 0 = normal, 1 = mild, 2 = moderate, 3 = severe) (Gouw et al., 2008; Sofroniew and Vinters, 2010), yet these remain subjective with no set criteria (summarised in Supplementary Table 1).
In an attempt to address this issue, we introduce a novel method, which we term the Reactivity score (R-score), that standardises the qualitative method. This R-score, a derivative of the Histochemical score (H-score; Supplementary Section 2), quantitatively combines assessment of cell number and morphology, providing a weighted score indicative of astrogliosis severity. Utilising piglet brain tissue, we apply the R-score method and compare it to three other methods commonly used to assess GFAP expression by immunohistochemistry, including the qualitative method based on morphology, and quantitative assessment based on the area of GFAP immunoreactivity or the number of GFAP-positive cells, neither of which takes cell morphology into account. We therefore test the hypothesis that the R-score will be reproducible, and subsequently apply it to four clinical cases to validate its clinical applicability in diagnosing astrogliosis.

Tissue collection
Piglet brains were those collected previously in our laboratory (Machaalani and Waters, 2003a,b; Peiris et al., 2004) following approval by the Animal Ethics Committee of the University of Sydney (K14/1-2000/3/3075 and K14/2-2003/3/3708). On extraction, the whole brain was fixed in 10% neutral-buffered formalin for 2 weeks, sectioned into 4 mm blocks, and returned to the fixative for a further 1 week before paraffin-embedding. The post-mortem interval never exceeded 30 min. Three piglets aged 13-14 days were studied; 2 piglets were not exposed to any insult while the 3rd had been exposed to intermittent hypercapnic hypoxia (model and exposure detailed in (Waters and Tinworth, 2003)), known to stimulate astrogliosis (Li et al., 2010; Turlejski et al., 2016). Formalin-fixed, paraffin-embedded (FFPE) blocks of the hippocampus, cerebellum and the medulla were serially sectioned at 7 µm thickness using a rotary microtome, mounted onto silanized glass slides, dried overnight, and stored at room temperature in a dust-free environment for at least 1 week prior to immunostaining.
FFPE tissue sections of hippocampus and cerebellum from 4 paediatric cases of sudden unexpected deaths were also studied. These cases were derived from a cohort in our laboratory with approval to utilise brain tissue for analyses by the Ethics Committee of the University of Sydney (X13-0038 & 2019/ETH06915) and the NSW State Coroner. Mild to moderate (2 cases) and moderate to severe (1 case) astrogliosis had been identified in the hippocampus and the cerebellum (1 case) at autopsy.

GFAP immunohistochemistry
Sections were immunostained for GFAP manually using a standard laboratory protocol (Luijerink et al., 2020) unless otherwise stated. Briefly, sections were deparaffinised in xylene and rehydrated through a graded series of ethanol washes. Microwave antigen retrieval was performed with 10% Tris-EDTA buffer (1 mM EDTA, 1 mM Na Citrate, 2 mM Tris; pH 9.0) for 14 min. Sections were quenched for 20 min in a 3% H2O2, methanol and phosphate-buffered saline (PBS) solution and subsequently blocked with 10% normal horse serum (NHS) for 30 min.
Overnight incubation was carried out using GFAP primary antibody (Z033401-2, Agilent, details and specificity described in our previous manuscript (Luijerink et al., 2021)) diluted to the relevant concentration using 1% NHS. The following day, sections were incubated at room temperature for 1 h in an affinity-purified anti-mouse/anti-rabbit secondary antibody (1:400; BA-400, Vector Laboratories Inc), followed by 1 h incubation with avidin-biotin horseradish peroxidase complex (VEPH-4000, Vector Stain ABC kit; Vector Laboratories Inc). Colour labelling was achieved with 3,3′-diaminobenzidine (DAB) (K346811, DAKO). Slides were counterstained with haematoxylin before dehydration through a graded ethanol series and clearing in xylene prior to cover slipping.

Microscopy and regions of analysis
Imaging, capturing, and quantification were performed by a single observer (LL). Stained sections were viewed using light microscopy (Olympus Upright BX51 Microscope, Olympus Optical Co., Ltd, Japan) and imaged using image capture software (DP Controller, Olympus Optical Co., Ltd, Japan).
Four microscopically defined regions of interest (ROI) were analysed: the hypoglossal nucleus (XII) in the rostral medulla; the internal granular layer of the cerebellar cortex (IGL); and the CA1 and CA4 regions in the hippocampus (Fig. 1). For each ROI, multiple, non-overlapping images were captured using a 40x objective lens, avoiding large blood vessels and artefacts. Where the ROI was smaller than the captured image, a grid overlay was applied using the inbuilt 'grid' plugin, and analysis was carried out within multiple 0.01 mm² subregions. This was mostly applied to the CA1 of the hippocampus. The Red Green Blue (RGB) images were then exported to Fiji imaging software (Schindelin et al., 2012) for GFAP quantification.

GFAP quantification
Four different methods were applied to assess GFAP staining: thresholding to determine the area of GFAP staining; point-counting to determine the number of GFAP-positive cells per mm²; qualitative grade based on cell morphology; and the R-score, which combines the graded score and point-counting (see below). These methods are commonly used in research settings and are often used in combination since they provide information on different components of the astrocyte response to injury (summarised in Supplementary Table 1).
1. Mean area of GFAP expression (thresholding): For each image, colour deconvolution was performed on RGB images using the inbuilt 'H DAB' plug-in to isolate blue, brown and green colour layers. The brown layer (R: 0.268, G: 0.570, B: 0.777), representing positive DAB-GFAP staining, was selected for analysis by manual thresholding using the histogram, which displays the distribution of pixel intensity values across the image (example provided in Supplementary Fig. 1). A threshold value that effectively separates GFAP immunostaining from background was selected. This method accounts for staining of astrocyte cell bodies, as well as staining from distal processes of other astrocytes. The area of the image above threshold was calculated as a percentage using the inbuilt 'measure' function. An average percentage was taken for ROIs with more than one image.
2. GFAP-positive astrocytes per mm² (point-counting): The number of GFAP-positive astrocytes (for which the nucleus was visible) was counted manually in each image using the inbuilt 'multi point' function and the value was converted to counts per mm². Each image was captured at 400X magnification, measuring 0.135 mm². Total counts for a given region were divided by the total area to produce the average count per mm².
3. A graded score based on the morphology of GFAP-positive cells, termed 'subjective grade' herein, adapted from (Gouw et al., 2008; Sofroniew and Vinters, 2010); examples provided in Fig. 2. For each image captured, astrogliosis was assessed using a 4-tier scoring system: '0' = no astrogliosis (visible astrocyte cell bodies and processes, with weak staining of processes); '1' = mild astrogliosis (slight cellular hypertrophy, with enlarged cell body and increased staining of glial processes); '2' = moderate astrogliosis (significant cellular hypertrophy whereby individual processes are no longer visible due to increased staining, although potentially a few long, thick processes); '3' = severe gliosis (gemistocytic appearance of cell bodies and dense staining of glial processes). A grade > 0 was only applied to an image when at least 3 GFAP-positive cells were reactive; for example, at least 3 cells that appeared to be mildly reactive had to be present to receive an overall grade of 1, and at least 3 moderately reactive cells had to be present to receive an overall grade of 2. Given that some ROIs had multiple images captured for analysis, the image with the highest score was taken. Thus, if 1 of 3 images for a given ROI had a grade of 1, with the others being graded 2, the final grade would be 2.
4. R-score: [...] astrogliosis score as defined in method #3 above. The reason for selecting this equation, as opposed to the other possible ones indicated in Supplementary Section 2, is that we did not count GFAP-negative astrocytes. As such, the score ranges from 0 to 400, with an R-score of 0 indicating that no GFAP-positive cells were identified, and 100 indicating that 100% of cells had a score of 0 (i.e. non-reactive).
Applying this equation allows for weighting the amount of astrogliosis given that a mixture of the scores could potentially be present (e.g., Fig. 2).
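The equation itself is not reproduced in this text, so the following is only a sketch under a stated assumption: each grade (0-3) is weighted by (grade + 1) and multiplied by the percentage of GFAP-positive cells at that grade. This H-score-style form is consistent with the stated range of 0 to 400 and with a value of 100 when every counted cell has a score of 0, but the function name `r_score` and its input format are illustrative and not taken from the authors.

```python
def r_score(grade_counts):
    """Assumed R-score: sum over grades of (grade + 1) * percentage of cells.

    grade_counts maps a morphology grade (0-3) to the number of GFAP-positive
    cells given that grade within one ROI. GFAP-negative cells are not counted.
    The (grade + 1) weighting is an assumption inferred from the stated range.
    """
    for grade, count in grade_counts.items():
        if grade not in (0, 1, 2, 3):
            raise ValueError(f"grade must be 0, 1, 2 or 3, got {grade!r}")
        if count < 0:
            raise ValueError("cell counts cannot be negative")

    total = sum(grade_counts.values())
    if total == 0:
        # No GFAP-positive cells identified in the ROI.
        return 0.0

    return sum(
        (grade + 1) * 100.0 * count / total
        for grade, count in grade_counts.items()
    )


# Hypothetical ROI: 5 non-reactive cells and 5 mildly reactive cells.
mixed_roi = r_score({0: 5, 1: 5})
```

Under this assumed weighting, an ROI containing only non-reactive cells scores 100 and an ROI containing only grade 3 cells scores 400, so a mixture of grades falls between those bounds in proportion to how many cells sit at each grade.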

Outcome measures
The reproducibility of each of the 4 quantification methods was assessed in both piglet and human tissue using serial sections, non-serial sections and different antibody concentrations, as detailed below:
1. Serial sections: to assess whether each method gave reproducible results when applied to up to 4 serial sections stained in a single experimental run.
2. Non-serial sections from different brain blocks across the ROI: to determine whether the quantification method yielded similar values for each ROI.
3. Primary antibody concentration: the effect of varying the primary antibody concentration (1:5000, 1:3000 and 1:1000) on quantification using the four methods was assessed. This was only done in hippocampal tissue, thus CA1 and CA4.
In addition, test-retest reliability and inter-rater reliability were assessed using human brain tissue:
4. For test-retest reliability, quantification was repeated after 2 weeks on 25 sections previously captured and analysed.
5. For inter-rater reliability, quantification was undertaken independently by 2 raters (LL and RM).

Statistical analysis
All data obtained were entered into Microsoft Excel and exported to SPSS for Windows (version 25; SPSS (IBM) Inc., Illinois, USA) for statistical analysis. For continuous outcome values across serial sections (mean area of positive GFAP expression, GFAP-positive astrocytes per mm² and R-score), the intra-assay coefficient of variation (CV) was calculated. The intra-assay CV refers to the variation obtained from replicates in the same experiment, and is calculated as (SD of duplicates)/(mean of duplicates) x 100% (Van Der Hoeven et al., 2022). For ligand-binding assays (Van Der Hoeven et al., 2022) and immunohistochemistry (Smith and Womack, 2014), CV ≤ 20% is accepted. For ordinal values (subjective grade), percentage agreements were calculated (Araujo and Born, 1985). Agreement ≥ 75% was considered acceptable (Graham et al., 2012). Comparisons for different tissue blocks, test-retest and inter-rater were made using 2-way mixed, consistency-type intraclass correlation coefficients (ICC). Ordinal inter-rater outcomes were compared using the Kappa statistic. The effect across antibody concentrations was tested using Pearson correlations for continuous data and Kendall's Tau correlations for ordinal data. Significance was taken at p ≤ 0.05.
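As a minimal sketch of the two acceptance criteria described above, the CV and percentage agreement can be computed as follows. The use of the sample standard deviation, and the definition of agreement as the share of paired observations with identical grades, are assumptions made for illustration; the function names are hypothetical and the analysis reported here was run in SPSS, not with this code.

```python
from statistics import mean, stdev

CV_CUTOFF = 20.0          # CV <= 20% is accepted (cut-off stated in the text)
AGREEMENT_CUTOFF = 75.0   # agreement >= 75% is accepted (stated in the text)


def intra_assay_cv(replicates):
    """Return SD / mean x 100 (%) for replicate measurements.

    Assumes the sample standard deviation (n - 1 denominator).
    """
    if len(replicates) < 2:
        raise ValueError("need at least two replicates")
    m = mean(replicates)
    if m == 0:
        raise ValueError("mean is zero, so the CV is undefined")
    return stdev(replicates) / m * 100.0


def percentage_agreement(grades_a, grades_b):
    """Return the percentage of paired ordinal grades that are identical."""
    if len(grades_a) != len(grades_b) or not grades_a:
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(grades_a, grades_b))
    return 100.0 * matches / len(grades_a)


# Hypothetical values, for illustration only.
cv_ok = intra_assay_cv([90, 110]) <= CV_CUTOFF
agreement_ok = percentage_agreement([0, 1, 1, 2], [0, 1, 2, 2]) >= AGREEMENT_CUTOFF
```

Keeping the two cut-offs as named constants makes it explicit that continuous measures (thresholding, counts, R-score) and the ordinal subjective grade are judged against different criteria.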

Subjective grade (0-3)
Subjective grades for each image ranged between 0 and 1, with only a few ROIs scoring 2 and none receiving a grade of 3. There was high concordance in values across consecutive sections, with percentage agreement being 95% (Table 1), yet not for non-consecutive sections (ICC = 0.40, p = 0.29, Table 1). There was no correlation between primary antibody concentration and the subjective grade (data not shown).

Correlation between quantification methods
When analysed across all tissue sections (n = 54), the results derived from the 4 quantification methods were highly correlated, with one exception: thresholding and subjective grade were not correlated (p = 0.10) (Table 2).

Comparison between brain regions
Although the results of quantitation were highly correlated across ROIs, when ROIs were analysed individually, the results derived by thresholding did not correlate with those derived using the other 3 methods (Table 1). Taking the 2 control piglets as baseline, thresholding showed the highest expression in the XII nucleus and the lowest in CA1, while counts and R-score showed the highest values in CA4 and the lowest in CA1, with subjective grade not differentiating between regions (Table 3).

Sensitivity to detect clinical assessments using human tissue
Based on the acceptable CV and percentage agreement values obtained in piglet brain, point-counting, the R-score and subjective grade were used to assess astrogliosis in human tissue, to ensure that in addition to being consistent, they were sensitive in detecting what is currently applied clinically. Across consecutive sections, values were concordant for the R-score (CV = 5%) and subjective grade (percentage agreement of 100%), but not for the point-counting method (CV = 24%, Table 4), while across non-consecutive tissue sections, values were concordant for all 3 methods (ICC > 0.97 and p < 0.001 for all, Table 4).
The R-scores had high concordance with the degree of astrogliosis described in the pathology report for each case (Table 4 & Fig. 4).

Discussion
In this study we demonstrate that the R-score, based on the morphological assessment of each GFAP-positive cell, was the most reproducible method for quantifying astrogliosis, with high inter- and intra-rater concordance.

Current methods for assessing GFAP immunohistochemistry
In the research setting, numerous methods have been described to assess astrogliosis based on GFAP immunohistochemistry (summarised

Table 2
Correlation between the methods.

Table 3
Pattern of GFAP levels across the brain regions according to quantification method.
L. Luijerink et al.
agreement (Escartin et al., 2019; Leitner et al., 2022), and only limited ability to distinguish the range of phenotypic changes in astrocytes (McNeal et al., 2016). Similarly, in the research setting, astrogliosis is poorly defined and may refer to an increase in overall GFAP immunostaining, a change in astrocyte morphology, an increase in the number of GFAP-positive cells, or a combination of these. Compared to qualitative grade, quantitative assessment offers certain advantages, including being less subjective and more sensitive to detecting subtle changes (McNeal et al., 2016).
We found that the thresholding method was not reproducible, having consistently failed to satisfy our measures of reproducibility across tissue sections and differing antibody concentrations. Furthermore, it appeared to be inconsistent with the other methods regarding the brain regions in which GFAP expression was highest. While thresholding identifies all GFAP-positive staining in both cell bodies and processes, the process is somewhat subjective (Muñoz-Castro et al., 2022; Healy et al., 2018; Johnson and Walker, 2015), since background normalisation to differentiate between signal and background or 'noise' is based on parameters set by the end user (Supplementary Fig. 1).
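To illustrate why a user-set threshold makes the area measure subjective, the sketch below computes the percentage of pixels at or above a threshold for a small, entirely hypothetical patch of stain-signal values. It assumes a signal in which higher values mean more DAB staining, which may be the reverse of how a given deconvolved image is encoded, and it is not the Fiji procedure used in this study.

```python
def percent_area_above(signal, threshold):
    """Return the percentage of pixels whose stain signal is >= threshold.

    signal is a list of rows of numeric values; higher is assumed to mean
    more staining (an assumption about the encoding, stated for illustration).
    """
    pixels = [value for row in signal for value in row]
    if not pixels:
        raise ValueError("image has no pixels")
    above = sum(value >= threshold for value in pixels)
    return 100.0 * above / len(pixels)


# Hypothetical 3 x 4 patch of stain-signal values (not study data).
patch = [
    [10, 40, 80, 200],
    [15, 60, 90, 220],
    [5, 55, 120, 240],
]

area_low_threshold = percent_area_above(patch, 50)
area_high_threshold = percent_area_above(patch, 100)
```

In this toy patch, moving the threshold from 50 to 100 halves the measured area (8 of 12 pixels versus 4 of 12), which shows how strongly the result can depend on where the end user places the cut.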
Although results were consistent when assessed in consecutive sections using the point-counting method, this was not the case when non-consecutive sections or differing antibody concentrations were used. The lack of consistency across non-consecutive sections is not surprising given that this method is purely based on cell quantity, and numbers are likely to differ across the rostral/caudal or ventral/dorsal aspect of a particular ROI. The hippocampus, for example, differs widely in function along its longitudinal axis, with the dorsal portion involved in learning and spatial memory, and the ventral portion involved in regulating emotion and motivation and subsequently projecting to different brain regions (Kheirbek and Hen, 2011). Animal models have reported that GFAP upregulation is not consistent between dorsal and ventral portions following injury (Jegliński et al., 1995; Mandwie et al., 2022). Variation in counts with changes in antibody concentration is likely due to increased binding of antibodies to their respective antigens at higher concentrations, indicating that at suboptimal, low antibody concentrations, true GFAP-positive astrocytes are missed. This was not an issue for the R-score, which is based on morphology and did not change with different antibody concentrations.
Compared to point-counting and thresholding methods, qualitative values gave consistent results across serial tissue sections and different tissue blocks (human only), with acceptable inter- and intra-rater concordance. A baseline standard was set in the current study, whereby at least 3 GFAP-positive cells had to be of a particular grade for that grade to be applied to the whole tissue section. We set this parameter ourselves, in contrast to the literature, where such parameters are not clearly defined (Lyck et al., 2008; McNeal et al., 2016).

R-score
While qualitative assessments may be useful for identifying moderate or severe astrogliosis, the novel R-score provides a standardised measure, incorporating the proportion of astrocytes in each ROI with different morphological states, reflecting the severity of astrogliosis. Although this remains somewhat subjective since it is based on assessing the morphology of individual cells, we show that the R-score is consistent when assessed at different time-points and by different raters. Given the large range of R-score values (0-400) compared to typical subjective measurements (2-4 tier value ranging from 'normal' to 'severe'), the R-score will likely be able to detect even subtle group differences when undertaking statistical analysis in both clinical and animal studies and can potentially be applied to other immunohistochemical markers. Moreover, we suggest that it has clinical diagnostic potential, whereby an R-score > 100 would indicate that astrogliosis is occurring, and set cut-off ranges can be applied such that an R-score of 120-160 would be suggestive of mild astrogliosis, and > 160 of moderate astrogliosis, as was demonstrated in the 4 clinical cases we studied. We propose that, by providing set criteria for quantification, the R-score will overcome non-concordance between pathologists. Of note, accuracy is still not guaranteed given that assessment of activated morphology remains subjective. Set criteria for assessing the reactivity of individual cells are required, which would need a global task force to define (Escartin et al., 2019), although the R-score could be incorporated into the method of diagnosis to account for the two major defining features of astrogliosis simultaneously: the change in astrocyte morphology and the proportion of cells affected.
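The suggested cut-offs can be expressed as a simple lookup, sketched below. The function name and the returned labels are hypothetical; the text gives no label for scores between 100 and 120 and no cut-off for severe astrogliosis, so the sketch leaves those cases explicitly unclassified or at the highest stated category instead of inventing boundaries.

```python
def interpret_r_score(score):
    """Map an R-score (0-400) to the interpretation suggested in the text.

    Only the cut-offs stated in the text are used: > 100 indicates
    astrogliosis, 120-160 suggests mild, and > 160 suggests moderate.
    """
    if not 0 <= score <= 400:
        raise ValueError("R-score must lie between 0 and 400")
    if score <= 100:
        return "astrogliosis not indicated"
    if score < 120:
        # Above 100 but below the stated 'mild' range: severity not specified.
        return "astrogliosis indicated, severity not specified"
    if score <= 160:
        return "suggestive of mild astrogliosis"
    # No separate cut-off for severe astrogliosis is given in the text.
    return "suggestive of moderate astrogliosis"


# Hypothetical scores, for illustration only.
labels = [interpret_r_score(s) for s in (100, 110, 130, 200)]
```

Treating these ranges as provisional seems prudent, since they were demonstrated on only 4 clinical cases.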

Conclusions
This study compared several commonly used methods for quantifying GFAP immunohistochemistry to detect astrogliosis and showed that the R-score, a derivative of the H-score that takes GFAP-positive cell morphology into consideration, was reproducible, not sensitive to methodological parameters, and showed good concordance, supporting its clinical applicability.

Declaration of Competing Interest
There is no conflict of interest to declare.

Fig. 2. GFAP immunostaining in CA4 and XII of the piglet hippocampus and CA4 and CA1 of the human infant, denoting examples of astrocytes given a score of 0 (green), 1 (blue), and 2 (red) within the same tissue section. Regions selected on the basis of having all 3 astrocyte sub-types present. Scale bar = 100 µm for main panel, and 10 µm for the enlargements of 0-2.
Pearson correlation was used for continuous variables and Kendall's tau for ordinal values. Values expressed as Pearson's r/Kendall's Tau B (p value). Significance taken at p ≤ 0.05.

Fig. 4. Examples of sections that received an overall R-score ~130 in the CA4, and ~160 in the CA4 and CA1, of cases #3 & 4 (respectively) of human cases diagnosed with mild-moderate astrogliosis post-mortem. As the R-score increases, the proportion of astrocytes stained with a higher Reactivity score (> 0) also increases. Scale = 100 µm.

Table 1
Mean scores across serial sections for the 4 quantification methods of thresholding, counts per mm², R-score and subjective grade.
), it is both time-consuming and labour intensive and is not practical for assessing astrogliosis in a diagnostic clinical setting. Consequently, in the clinical setting, where the number of sections available is limited and time and cost are constrained, reproducible and accurate methods that are time- and cost-efficient are required. Currently in the clinical setting, 'astrogliosis' is usually assessed qualitatively, and while this is cost- and time-efficient, it is subjective with poor inter-rater

Table 4
GFAP-positive astrocytes per mm², R-score and qualitative grade applied to human cases.