The use of integrated behavioural z-scoring in behavioural neuroscience – A perspective article

Complex pathophysiology in psychiatric disorders makes pre-clinical data difficult to interpret. Guilloux et al. (2011b) proposed an integrated behavioural z-scoring procedure to improve the predictive validity of animal models by combining converging evidence, similar to the way mental health conditions are diagnosed in humans. Here, I first give a brief review of the current methodology and literature using integrated behavioural z-scoring. Secondly, I discuss the benefits and drawbacks of integrated behavioural z-scoring and its potential future applications. Integrated behavioural z-scoring is used most frequently in animal models of depression and anxiety. I suggest broadening its application beyond depression and anxiety through a three-step methodology to obtain disease-specific behavioural z-scores (e.g., a schizophrenia index or an Alzheimer's disease index) to aid translatability and interpretation of data. Lastly, I suggest integrating not only behavioural but also biological variables, creating converging psychological and physiological evidence to sustain face and construct validity while improving predictive validity.


Introduction
Human behaviour can be modelled in animals using comprehensive behavioural test batteries that investigate the full range of phenotypic features resembling the disorder in humans (Bovenkerk and Kaldewaij, 2014). Animal models are a useful tool to investigate the pathophysiology, psychopathology and potential novel treatments of psychiatric disorders (Bovenkerk and Kaldewaij, 2014). Although behavioural testing allows researchers to measure multiple disease phenotypes (observable characteristics influenced by genetic and environmental factors; Bearden et al., 2016), variability between animals and differing experimental conditions, such as time of day (Bailey et al., 2006;Roedel et al., 2006), experimenter and handling (Bailey et al., 2006;Bohlen et al., 2014), housing conditions (Bailey et al., 2006;Balcombe, 2006), and the sex (Georgiou et al., 2022) and age (Shoji et al., 2016) of the animal, can alter outcomes; this is termed behavioural noise (Guilloux et al., 2011b). Behavioural noise, or variability, can make data difficult to interpret. This led Guilloux et al. (2011b) to propose an integrated behavioural z-scoring technique to comprehensively analyse anxiety- and depressive-like states in mice using complementary tests, that is, a battery of behavioural tests investigating similar behavioural phenotypes.
Z-scores are standard scores that represent the number of standard deviations a score lies above or below the mean (Andrade, 2021). Because the distribution is standardised, comparisons can be made across different variables (Andrade, 2021). A z-score is obtained by subtracting the population mean from a single raw score and dividing this difference by the population standard deviation (Labots et al., 2018). Guilloux et al. (2011b) used this technique to normalise the outcomes of multiple behavioural tests (complementary tests) into a single emotionality z-score index. This methodology relies on converging evidence, similar to that used to diagnose mental health conditions in humans, to summarise complex pathophysiology in a single measure. An example of converging evidence in human medicine is the Positive and Negative Syndrome Scale (PANSS), which is used to measure symptom severity in individuals with schizophrenia (Kay et al., 1987). This scale assesses positive symptoms, negative symptoms and general psychopathology, and the individual symptom scores can then be combined to assess overall symptom severity (Leucht et al., 2005).
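To make the definition concrete, the short Python sketch below applies it to a handful of hypothetical raw scores; the values and the function name `z_scores` are illustrative assumptions, not data from any study. It uses the population standard deviation, matching the definition above, so the resulting z-scores have a mean of 0 and a standard deviation of 1.

```python
from statistics import mean, pstdev


def z_scores(raw_scores):
    """Return (x - mean) / SD for each raw score, using the population SD."""
    mu = mean(raw_scores)
    sigma = pstdev(raw_scores)  # population standard deviation
    return [(x - mu) / sigma for x in raw_scores]


# Hypothetical raw scores on an arbitrary scale (e.g. seconds spent in an arena zone)
raw = [10.0, 20.0, 30.0, 40.0, 50.0]
z = z_scores(raw)
print([round(v, 2) for v in z])  # [-1.41, -0.71, 0.0, 0.71, 1.41]
```

Because every variable ends up on this same unitless scale, scores from tests measured in seconds, counts or percentages can be compared or averaged.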
In this review, I provide a brief overview of the current methodology of integrated behavioural z-scoring and of the literature using it in rodent studies. I then discuss the benefits and drawbacks of integrated behavioural z-scoring and potential future directions for this methodology.

Eligibility & inclusion criteria
I investigated original articles that used integrated behavioural z-scoring. To be included, studies had to apply z-scoring to original, complementary behavioural data in rodent models. No date or language restriction was applied. As this publication is based on Guilloux et al. (2011b), I included studies that cited this publication, as well as the publication itself.

Database search strategy
The databases used were PubMed and Scopus; additional records were identified by scanning reference lists and ResearchGate. The final search was performed on 21/06/2022.

Report selection
The author (AKK) determined the eligibility of papers by screening titles, abstracts, and methodology for relevance. Eligible documents were then read as a whole to analyse if the articles matched the inclusion criteria. Excluded articles were documented, and reasons were given for exclusion.

Data extraction
AKK extracted information from relevant publications including the area of research, animal model/paradigm, year of publication, behavioural test, behaviours integrated and naming of z-score/index by individual publications.

Database search
The initial search yielded 252 results, including the original publication by Guilloux et al. (2011b). After duplicates (n = 91) were excluded, 161 records were screened for eligibility. During eligibility screening, a further 61 articles were excluded: 17 were not original research articles (review, n = 13; book chapter, n = 1; protocol, n = 2; JoVE video, n = 1), 32 did not perform integrated behavioural z-scoring, seven did not apply z-scoring to behavioural analysis, four did not conduct behavioural testing in rodents, and for one the full text could not be obtained. This search process resulted in 100 articles being included in this review (Fig. 1).
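The screening arithmetic can be checked with a simple tally. The Python sketch below only re-adds the counts reported above; the dictionary keys are my own illustrative labels.

```python
# Counts reported in the text; the dictionary keys are illustrative labels
identified = 252
duplicates = 91
screened = identified - duplicates

not_original = {"review": 13, "book_chapter": 1, "protocol": 2, "jove_video": 1}
excluded = {
    "not_original_research": sum(not_original.values()),
    "no_integrated_z_scoring": 32,
    "no_behavioural_z_scoring": 7,
    "no_rodent_behavioural_testing": 4,
    "full_text_unavailable": 1,
}
total_excluded = sum(excluded.values())
included = screened - total_excluded
print(screened, total_excluded, included)  # 161 61 100
```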

A brief overview of the current methodology
Integrated behavioural z-scoring was first proposed by Guilloux et al. (2011b) to comprehensively analyse anxiety- and depressive-like states in mice, using complementary tests to reduce behavioural noise. Complementary tests, a battery of behavioural tests relating to similar emotional states, are based on the core symptoms evaluated in human depression scales (Guilloux et al., 2011b).
All publications identified in this review shared the goal of Guilloux et al. (2011b); however, some methodological variations can be identified. In this section, I describe these different methodologies.
In the original publication by Guilloux et al. (2011b), z-normalisation was calculated by subtracting the mean of the control group (μ) from the observed parameter (X) and dividing the difference by the standard deviation of the control group (σ):
$$z = \frac{X - \mu}{\sigma}$$
The result of this equation indicates how many standard deviations an individual lies above or below the mean of the control population. The direction of the scores must be adjusted according to the integrated behavioural variable. For example, when investigating social behaviour with several individual variables, we must decide whether all variables share the same direction. In a social behavioural paradigm, one might measure sniffing behaviour (social behaviour) or avoidance behaviour (anti-social behaviour). When z-scoring, these directions must be aligned; in this example, decreased avoidance behaviour was interpreted as increased social behaviour (Kraeuter et al., 2020).
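The sketch below expresses the control-referenced normalisation and the direction adjustment in Python. It is a minimal illustration under stated assumptions: the function name `control_z`, the data, and the choice of the sample (n - 1) standard deviation are mine, not taken from Guilloux et al. (2011b) or Kraeuter et al. (2020).

```python
from statistics import mean, stdev


def control_z(values, control_values, higher_is_more=True):
    """z = (X - mu) / sigma, with mu and sigma taken from the control group.

    Set higher_is_more=False for variables scored in the opposite direction
    (e.g. avoidance when the construct is social behaviour); the sign is then
    flipped so that a higher z-score always means more of the construct.
    """
    mu = mean(control_values)
    sigma = stdev(control_values)  # sample SD of the control group (assumption)
    if sigma == 0:
        raise ValueError("control group SD is zero; z-score is undefined")
    sign = 1 if higher_is_more else -1
    return [sign * (x - mu) / sigma for x in values]


# Hypothetical data: sniffing time (social) and avoidance time (anti-social)
sniffing_z = control_z([8.0, 16.0], control_values=[10.0, 12.0, 14.0])
avoidance_z = control_z([11.0, 3.0], control_values=[5.0, 7.0, 9.0],
                        higher_is_more=False)
print(sniffing_z, avoidance_z)  # [-2.0, 2.0] [-2.0, 2.0]
```

After the sign flip, an animal that avoids less than controls receives a positive z-score, so both variables can be averaged as measures of social behaviour.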
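To obtain an integrated behavioural z-score, the individual z-scores of each animal from each test are added and divided by the number of tests included in the analysis:
$$\text{Integrated behavioural z-score} = \frac{Z_{\text{Test 1}} + Z_{\text{Test 2}} + Z_{\text{Test 3}} + \dots}{\text{Number of tests}}$$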
This overall integrated behavioural z-score can then be averaged across all animals in each group to obtain group means and standard deviations for graphical representation. The z-score data can also be entered into further statistical analyses of significance.
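A minimal Python sketch of the integration step and the group summary follows. The animal identifiers and z-scores are hypothetical and assumed to be direction-adjusted already, and the function name `integrated_z` is illustrative.

```python
from statistics import mean, stdev

# Hypothetical, direction-adjusted z-scores of each animal in three complementary tests
z_by_animal = {
    "mouse_1": [0.5, 1.0, 1.5],
    "mouse_2": [2.0, 1.5, 2.5],
    "mouse_3": [3.0, 2.5, 3.5],
}


def integrated_z(test_z_scores):
    """Sum of an animal's test z-scores divided by the number of tests."""
    return sum(test_z_scores) / len(test_z_scores)


integrated = {animal: integrated_z(zs) for animal, zs in z_by_animal.items()}
scores = list(integrated.values())
print(integrated)                   # {'mouse_1': 1.0, 'mouse_2': 2.0, 'mouse_3': 3.0}
print(mean(scores), stdev(scores))  # 2.0 1.0
```

In a real study, the group summary would be computed separately for each experimental group before plotting or significance testing.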
Another method, based on the methodology of Guilloux et al. (2011b), was proposed by Labots et al. (2018). It was proposed because the original methodology does not accommodate studies that lack an identifiable reference or control group, such as studies of sex or strain differences. Secondly, the original methodology fails if the control group has a standard deviation of zero, because the z-score would then require division by zero (Labots et al., 2018). Labots et al. (2018) follow a similar methodology to Guilloux et al. (2011b); however, instead of using a single reference group, they use the pooled data of all groups as the reference, which reduces the chance of a standard deviation of zero.
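The pooled-reference variant can be sketched in the same way. This is a minimal Python illustration, not the published code of Labots et al. (2018): the function name `pooled_z`, the group labels and the values are hypothetical, and the use of the sample standard deviation is an assumption.

```python
from statistics import mean, stdev


def pooled_z(groups):
    """z-score every value against the mean and SD of all groups pooled together."""
    pooled = [x for values in groups.values() for x in values]
    mu, sigma = mean(pooled), stdev(pooled)
    return {name: [(x - mu) / sigma for x in values] for name, values in groups.items()}


# Hypothetical sex-difference study with no natural control group.
# The male values have zero spread, so a male-referenced z-score would divide by zero.
groups = {"male": [4.0, 4.0, 4.0], "female": [6.0, 8.0, 10.0]}
z = pooled_z(groups)
print({name: [round(v, 2) for v in values] for name, values in z.items()})
# {'male': [-0.79, -0.79, -0.79], 'female': [0.0, 0.79, 1.58]}
```

Because the reference spans every group, no single group has to serve as the control, and the zero spread within one group no longer makes the score undefined.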

Results: review of the current literature using z-scores
Integrated behavioural z-scoring has been used increasingly since the first publication by Guilloux et al. (2011b), rising from three publications in 2011 to 21 publications in 2021 (Fig. 2). Although the use of integrated behavioural z-scoring has increased (Fig. 2), the majority of research (64%) has not moved beyond the original field of Guilloux et al. (2011b), major depressive and/or anxiety disorder (Fig. 3).

Discussion: benefits and drawbacks of integrated behavioural z-scoring
Thus far, integrated behavioural z-scoring has not been used extensively across different areas of research (Fig. 3, Table 1). In this section of the review, I discuss the benefits and drawbacks of behavioural z-scoring and compare z-scoring with alternative methods.
Integrated behavioural z-scoring improves the interpretation of behavioural data by normalising data measured on different scales to a single scale (Guilloux et al., 2011b). Furthermore, as mentioned previously, behavioural testing can create great variability, or behavioural noise, due to changing testing conditions and within- and between-animal differences (Bailey et al., 2006;Balcombe, 2006;Bohlen et al., 2014;Georgiou et al., 2022;Roedel et al., 2006;Shoji et al., 2016). This variability can be reduced by using an integrated behavioural z-score (Guilloux et al., 2011b). An alternative to integrated behavioural z-scoring is principal component analysis (PCA). PCA can be used to produce composite variables, that is, variables made up of two or more highly related variables (Labots et al., 2018). PCA is less suited to behavioural data because it requires a consistent set of variables (Guilloux et al., 2011b).
Two methodologies are used to determine an integrated behavioural z-score (Guilloux et al., 2011b;Labots et al., 2018). Labots et al. (2018) demonstrated a technique for investigating cohorts that do not have an obvious reference or control group. Investigators wanting to use integrated z-scoring need to state clearly which of the two methods above they used.
Although combining complementary data might create stronger evidence, it risks overlooking important individual results and reducing comparability across studies. Integrated behavioural z-scoring allows individual results to be combined and compared with the absolute controls of the same study. Integrated behavioural z-scores can be compared across studies even if individual results conflict. The studies presented here used different behaviours and behavioural tests to measure the same construct in mice. In future, studies should aim to use the same or comparable behaviours to create consistency and reproducibility between laboratories when measuring specific constructs.
Reporting individual results allows for comparing results across studies. All publications reviewed here still report all individual results and use the integrated behavioural score as an additional variable to improve predictive validity by combining converging evidence.
Researchers using behavioural z-scoring need to take care not to over-value individual behaviours relative to their importance in the human condition. With behavioural z-scoring, authors can decide how much weight to give each individual behavioural parameter, which could create an imbalanced representation of the results. This can be problematic if not presented transparently; on the other hand, being able to place greater importance on a specific variable might help the score more closely resemble the human disease phenotype and thereby improve construct validity. It is therefore important for authors to provide a transparent description and justification of the weight placed on individual behavioural or biological variables.
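Weighting can be made explicit and reportable. The sketch below is a hypothetical illustration: the function name `weighted_integrated_z`, the test names, the z-scores and the weights are assumptions chosen only to show how doubling one test's weight shifts the integrated score; with equal weights it reduces to the unweighted mean.

```python
def weighted_integrated_z(z_scores, weights):
    """Weighted mean of test z-scores; the weights are the researcher's choice and must be reported."""
    if set(z_scores) != set(weights):
        raise ValueError("each test needs exactly one weight")
    total_weight = sum(weights.values())
    return sum(z_scores[test] * weights[test] for test in z_scores) / total_weight


# Hypothetical direction-adjusted z-scores for one animal
z = {"sucrose_preference": 1.0, "forced_swim": 0.0, "splash_test": -1.0}

equal = weighted_integrated_z(z, {"sucrose_preference": 1, "forced_swim": 1, "splash_test": 1})
sucrose_weighted = weighted_integrated_z(z, {"sucrose_preference": 2, "forced_swim": 1, "splash_test": 1})
print(equal, sucrose_weighted)  # 0.0 0.25
```

The same animal moves from 0.0 to 0.25 purely because of the weighting choice, which is why the weights belong in the methods section.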
Creating a single score helps non-specialists to interpret and understand behavioural data. It could also allow easier correlation of behaviours with biological variables. Authors should aim to model the integrated behavioural z-score on the complex psychological and physiological condition in humans (Wittchen et al., 2014). The current emotionality z-score relies solely on behavioural measures and ignores biological changes. In the section on future uses below, I describe the possibility of creating a comprehensive disease-specific z-score (Fig. 5), which uses behavioural and biological variables to create converging evidence.

Future use of integrated behavioural z-scores
Integrated behavioural z-scoring has mostly been used in the field of anxiety and major depressive disorder to assess emotionality in animal models (Fig. 3, Table 1). In the following section, I propose further uses of integrated behavioural z-scoring beyond its originally proposed purpose. Specifically, I propose grouping tests into behavioural domains, such as social, anxiety-like, depressive-like and cognitive behaviour, to create disease-related z-scores comparable to the human scales currently in use. The current z-score literature has started to investigate specific behavioural domains, such as a "Depression-like z-score" (Apazoglou et al., 2018;Bullich et al., 2022;de Sá-Calçada et al., 2015;Glover et al., 2019;Huynh et al., 2011;Kerloch et al., 2021;Orrico-Sanchez et al., 2020;Subhadeep et al., 2020), an "Anhedonia z-score" (Bansal et al., 2022;Fee et al., 2021) and a "Social z-score" (Giacomin et al., 2018;Glover et al., 2019). Currently, there is a lack of consistency in how many and which behaviours and behavioural tests are integrated into an integrated behavioural z-score (Table 1). Such inconsistency reduces comparability between studies, and defining behavioural domains might improve consistency. Here, I propose behavioural domains relevant to schizophrenia.
Individuals with schizophrenia are currently assessed according to the Positive and Negative Syndrome Scale (PANSS) (Kay et al., 1987).
Patients are rated from 1 to 7 on 30 symptom items, grouped into the positive, negative and general psychopathology scales (Kay et al., 1987;Leucht et al., 2005). Animal behavioural tests can also be classified into these three domains (Fig. 4; this list is only an example and is not exhaustive). As for the human scales, an integrated z-score can be generated for each behavioural domain. After generating an integrated z-score for each domain, an overall schizophrenia-like behaviour integrated z-score could be generated.
Step 1. Generate a z-score for each individual test.
Step 2. Cluster conceptually related behaviours together, as suggested in Fig. 4, and integrate the z-scores within each scale, for example:
$$\text{Overall General Psychopathology scale integrated behavioural z-score} = \frac{Z_{\text{Test 1}} + Z_{\text{Test 2}} + Z_{\text{Test 3}} + \dots}{\text{Number of tests}}$$
Step 3. Combine all three scale scores into one overall schizophrenia-like behaviour integrated z-score:
$$\text{Overall schizophrenia-like integrated behavioural z-score} = \frac{Z_{\text{Positive scale}} + Z_{\text{Negative scale}} + Z_{\text{General Psychopathology scale}} + \dots}{\text{Number of scales}}$$
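The three steps above can be sketched in Python for a single animal. Because Fig. 4 is not reproduced here, the test names (`pos_test_1` and so on), their assignment to scales and all z-scores are hypothetical placeholders, and the z-scores are assumed to be direction-adjusted already.

```python
from statistics import mean

# Step 1 (assumed already done): hypothetical direction-adjusted z-scores for one
# animal, one per test, where higher means more schizophrenia-like behaviour.
test_z = {
    "pos_test_1": 1.2, "pos_test_2": 0.8,
    "neg_test_1": 0.5, "neg_test_2": 1.5,
    "gen_test_1": -0.3, "gen_test_2": 0.9, "gen_test_3": 0.3,
}

# Step 2: hypothetical assignment of tests to the three PANSS-like scales,
# then one integrated z-score per scale (mean of that scale's test z-scores).
scales = {
    "positive": ["pos_test_1", "pos_test_2"],
    "negative": ["neg_test_1", "neg_test_2"],
    "general_psychopathology": ["gen_test_1", "gen_test_2", "gen_test_3"],
}
scale_z = {name: mean([test_z[t] for t in tests]) for name, tests in scales.items()}

# Step 3: combine the scale scores into one overall schizophrenia-like z-score
# (sum of the scale z-scores divided by the number of scales).
overall_z = mean(list(scale_z.values()))

print({name: round(v, 2) for name, v in scale_z.items()}, round(overall_z, 2))
```

Averaging within each scale first means that a scale with many tests (here general psychopathology) does not outweigh a scale with few; averaging all seven test z-scores directly would instead weight every test equally. Either choice is defensible, but it should be reported.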
A stepwise approach makes it transparent how individual behavioural results influence the integrated scores. Investigating individual domains can show whether a treatment is more beneficial for one domain than another. Lastly, the overall schizophrenia-like behaviour integrated z-score would provide a single objective measure, like the PANSS in humans. This measure might be used to determine how effective potential novel therapeutic treatments are. Beyond drug development, current and future animal models could be assessed for their translatability using the overall schizophrenia-like behaviour integrated z-score. This methodology could be applied to other neurodegenerative and neurodevelopmental disorders, generating further disease-specific integrated behavioural z-scores such as an Alzheimer's disease score.
Overall, I suggest constructing disease-specific integrated behavioural z-scores, as proposed here for schizophrenia, based on currently used diagnostic scales, to increase face and predictive validity and thereby the translatability of animal research.
Neurodegenerative and neurodevelopmental disorders are a complex combination of psychological and physiological changes (Wittchen et al., 2014). Therefore, to understand their complex pathophysiology, future research could use the same mathematical technique suggested here to generate a biological z-score, which could be further subdivided into biological domains such as a metabolic z-score (Zemdegs et al., 2016b), gene expression changes (Kerman et al., 2012;Marchisella et al., 2020), antioxidant activation (Spero et al., 2022) or further parameters such as plasma corticosterone levels, hippocampal volume or electrophysiological recordings. These additional variables could be used to correlate physiological markers with psychological markers to understand potential relationships between them. In addition, physiological and psychological markers could be integrated to produce a comprehensive disease score (Fig. 5). The approaches proposed here would allow a greater understanding of complex psychiatric disorders, which arise from an interplay of behavioural and biological variability, and would increase translatability from animal models to the human conditions. This comprehensive approach would create converging evidence that does not reduce disorders to either a psychological or a physiological condition but instead takes a whole-body approach, increasing the predictive validity of findings.
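A sketch of such a comprehensive score follows, under explicit assumptions. The two markers are taken from the examples listed above, but the values, the direction assigned to each marker (higher corticosterone and lower hippocampal volume treated as disease-like) and the equal weighting of the behavioural and biological components are hypothetical choices for illustration, not the scheme shown in Fig. 5.

```python
from statistics import mean, stdev


def control_z(x, control, higher_is_disease_like=True):
    """z = (x - control mean) / control SD, sign-flipped if lower values are disease-like."""
    sign = 1 if higher_is_disease_like else -1
    return sign * (x - mean(control)) / stdev(control)


# Hypothetical control-group measurements for two biological markers
controls = {
    "plasma_corticosterone": [50.0, 60.0, 70.0],
    "hippocampal_volume": [24.0, 25.0, 26.0],
}
# Assumed directions: higher corticosterone and lower volume count as disease-like
higher_is_disease_like = {"plasma_corticosterone": True, "hippocampal_volume": False}
animal = {"plasma_corticosterone": 80.0, "hippocampal_volume": 23.0}  # one treated animal

biological_z = mean([
    control_z(animal[m], controls[m], higher_is_disease_like[m]) for m in controls
])
behavioural_z = 1.5  # hypothetical integrated behavioural z-score for the same animal
comprehensive_z = mean([behavioural_z, biological_z])
print(biological_z, comprehensive_z)  # 2.0 1.75
```

Keeping `biological_z` and `behavioural_z` as separate values before combining them also leaves both available for the correlation analyses described above.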
Additional applications in animal research might include using behavioural z-scoring in a repeated-measures longitudinal design to evaluate behavioural noise, by following the behavioural trajectory of each mouse across experimental conditions, sex and age (Bailey et al., 2006;Balcombe, 2006;Bohlen et al., 2014;Georgiou et al., 2022;Roedel et al., 2006;Shoji et al., 2016).
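One way such a trajectory could be computed is sketched below. This is an assumption-laden illustration, not an established protocol: each time point is z-scored against controls tested at the same time point, and the function name `trajectory`, the weeks and the values are all hypothetical.

```python
from statistics import mean, stdev

# Hypothetical repeated measurements of one behaviour (e.g. distance travelled, m)
control_by_week = {1: [10.0, 12.0, 14.0], 4: [20.0, 22.0, 24.0]}
mouse_by_week = {1: 16.0, 4: 22.0}  # one hypothetical experimental mouse


def trajectory(animal, controls):
    """z-score the animal at each time point against controls tested at the same time point."""
    return {
        t: (animal[t] - mean(ctrl)) / stdev(ctrl)
        for t, ctrl in controls.items()
    }


print(trajectory(mouse_by_week, control_by_week))  # {1: 2.0, 4: 0.0}
```

Here the mouse's raw score rises from week 1 to week 4, but its z-score falls from 2.0 to 0.0 because the controls rose too; referencing each time point to age-matched controls separates the individual trajectory from the shared age effect.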
Behavioural z-scoring might also be applied to human studies, as has been done for cognitive tasks in patients with schizophrenia (Andrade, 2021), or used to integrate demographic variables that can then be correlated with variables of interest. For instance, one could create a socio-economic status index; this construct is complex and is often assessed using multiple measures such as education, income and occupation (Anon, 2022). A socio-economic status index would provide a comprehensive view of a participant's socio-economic status, which could be used to understand the impact socio-economic status might have on the outcome variables.
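Since a human sample usually has no control group, a socio-economic status index could use a whole-sample reference, analogous to the pooled approach of Labots et al. (2018). The sketch below assumes that design; the participants, the three measures and their units (years of education, annual income, an occupational score) are hypothetical, and `ses_index` is an illustrative name.

```python
from statistics import mean, pstdev

# Hypothetical participants: years of education, annual income, occupational score
participants = {
    "p1": {"education": 10, "income": 30_000, "occupation": 40},
    "p2": {"education": 12, "income": 50_000, "occupation": 50},
    "p3": {"education": 14, "income": 70_000, "occupation": 60},
}
measures = ["education", "income", "occupation"]


def ses_index(data, measures):
    """Mean z-score across measures, each z-scored against the whole sample."""
    ref = {}
    for m in measures:
        values = [person[m] for person in data.values()]
        ref[m] = (mean(values), pstdev(values))
    return {
        pid: mean([(person[m] - ref[m][0]) / ref[m][1] for m in measures])
        for pid, person in data.items()
    }


index = ses_index(participants, measures)
print({pid: round(v, 2) for pid, v in index.items()})
```

Standardising first puts income (in currency units) and education (in years) on the same scale, so neither dominates the index simply because of its units.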

Conclusion
In conclusion, in this perspective review I set out to summarise the current uses and methodologies of integrated z-scoring. I found that integrated behavioural z-scoring has not yet been widely used in the broader literature but is a methodology of increasing interest. I proposed expanding the use of this technique to other areas of behavioural neuroscience, such as neurodegenerative and neurodevelopmental disorders including schizophrenia, using overall disease-specific integrated behavioural z-scores based on currently used diagnostic scales and organised into disease-specific diagnostic domains. Such domains will enable greater translatability of animal models by improving face and predictive validity in areas such as the treatment development process. Additional applications might include integrated socio-economic status indexes in human studies. Integrated behavioural z-scoring provides strong support for disease- and treatment-related phenotypes.
Lastly, I proposed a comprehensive disease-specific integrated z-score that combines behavioural pathology and pathophysiology, creating converging evidence for a whole-body approach and increasing the predictive validity of findings.

CRediT authorship contribution statement
The author (Ann-Katrin Kraeuter) is responsible for the conceptualisation of the researched idea within the manuscript, literature search and inclusion and writing of all drafts, revisions and editing of the final manuscript.

Conflict of Interest
The author has no competing interests to declare.

Declaration of Competing Interest
None.

Data availability
No data was used for the research described in the article.