Coup in the coop: Rank changes in chicken dominance hierarchies over maturation

Chicken dominance hierarchies or pecking orders are established before maturation and maintained by consistent submissive responses of subordinate individuals, leading to stable ranks within unchanged groups. We observed interactions of 418 laying hens (Gallus gallus domesticus) distributed across three small (20) and three large (~120) groups. The observations were performed before sexual maturation (young period) and additionally after onset of maturation (mature period) to confirm stability of ranks. Dominance ranks were estimated via the Elo rating system across both observation periods. Diagnostics of the ranks revealed unexpected uncertainty and rank instability for the full dataset, although sampling appeared to be adequate. Subsequent evaluations of ranks based on the mature period only showed more reliable ranks than across both observation periods. Furthermore, winning success during the young period did not directly predict high rank during the mature period. These results indicated rank changes between observation periods. The current study design could not discern whether ranks were stable in all pens before maturation. However, our data suggested that active rank mobility after hierarchy establishment was the more likely cause of our findings. Once thought to be stable, chicken hierarchies may provide an excellent system to study causes and implications of active rank mobility.


Introduction
Dominance hierarchies were first described in chickens by Schjelderup-Ebbe (1922). They are found across multiple taxa (Holekamp and Strauss, 2016), and are characterised by asymmetrical relationships of dominance and subordination between the members of a group (Drews, 1993). From observations of dominance relationships or individual attributes of dominance, researchers can infer the social rank of an individual, which can have direct implications for an animal's fitness (Creel, 2001; Snyder-Mackler et al., 2020).
The mechanisms of hierarchy establishment and maintenance, as well as the advantages and disadvantages of individual rank, can be investigated by examining social instability (Strauss and Holekamp, 2019a; Tibbetts et al., 2022). Social instability, often referred to as hierarchy dynamics, can occur through various processes including demographic and/or ontogenetic changes (Goldenberg et al., 2016; Wallace et al., 2022; Williamson et al., 2016), and individuals' status-seeking behaviour (Ehardt and Bernstein, 1986; Strauss and Holekamp, 2019b). The resulting changes in individual rank can either be due to passive mobility, without changes in the hierarchy order, or active mobility, with reordering of individual ranks.
Social instability can be assessed at the group level by characterizing hierarchies in terms of their transitivity and steepness (Sapolsky, 1983; Silk et al., 2019). Transitivity is a measure of the orderliness of a hierarchy (McDonald and Shizuka, 2013). In a theoretical group of three animals, if animal A wins against animal B, B wins against animal C, and A also wins against C, the relationship is considered transitive. If all relationships within a group are transitive, a hierarchy is considered linear, and animals can be ranked perfectly from most to least dominant. The steepness of a hierarchy reflects the extent of the differences between individuals close in rank (De Vries et al., 2006). The higher the steepness of a hierarchy, the greater the probability that a higher-ranking individual wins against lower-ranking individuals.
The various mechanisms and interdependencies between the properties and measures associated with social instability render detecting and measuring its extent complex. For instance, rank instabilities can affect measures of transitivity and steepness. Changes in rank can decrease transitivity and reduce steepness by altering winning probabilities. Furthermore, methods which infer dominance ranks may introduce measurement uncertainties that can lead to false rank instabilities (Strauss and Holekamp, 2019a). Sources of measurement uncertainty include unknown dominance relationships and sampling issues (Neumann et al., 2011; Sánchez-Tójar et al., 2018). Transitivity and steepness can further affect the performance of rank estimations (De Vries, 1998; Sánchez-Tójar et al., 2018). The more intransitive and the flatter a hierarchy is, the more difficult it is to discern the rank order and the larger the measurement uncertainty.
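The three-animal example can be made concrete with a minimal Python sketch. The function name `is_transitive_triad` and the representation of dominance as a set of (winner, loser) pairs are illustrative assumptions, not part of the study's analysis.

```python
from itertools import permutations

def is_transitive_triad(dominates, triad):
    """Return True if the three animals can be ordered so that the first
    dominates the second and third, and the second dominates the third.

    `dominates` is a set of (winner, loser) pairs, one per dyad.
    """
    for a, b, c in permutations(triad):
        if (a, b) in dominates and (b, c) in dominates and (a, c) in dominates:
            return True
    return False

# A beats B, B beats C, A beats C: transitive (linear order A > B > C).
linear = {("A", "B"), ("B", "C"), ("A", "C")}
# A beats B, B beats C, C beats A: cyclic, hence intransitive.
cyclic = {("A", "B"), ("B", "C"), ("C", "A")}

print(is_transitive_triad(linear, "ABC"))  # True
print(is_transitive_triad(cyclic, "ABC"))  # False
```

A hierarchy in which every triad passes this check is linear in the sense described above.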
One method for estimating dominance rank, the Elo rating system, was adopted for its robustness to limited observations, unknown relationships, and changes in ranks (Albers and De Vries, 2001; Elo, 1978; Neumann et al., 2011). The Elo rating is based on a sequential approach, which updates rankings after each new interaction. The Elo rating process allows changes in group composition to be considered and active rank mobility to be observed. The randomised Elo rating is an extension of the method which adds an estimation of the measurement uncertainty of the inferred ranks (Sánchez-Tójar et al., 2018). Because the randomised Elo rating shuffles the interaction sequence, the randomised method is recommended only for datasets in which the sequence of interactions is assumed to be irrelevant, i.e., relatively stable systems. Nevertheless, in combination with data-splitting approaches the randomised Elo rating can be a useful tool to assess both measurement uncertainty and hierarchy dynamics (Sánchez-Tójar et al., 2018; Vilette et al., 2021).
In our experiment, we evaluated Elo ratings for six groups of chickens with a constant group composition. Observations of social interactions were made over two distinct periods of the animals' lives, the young (10-12 weeks of age, WoA) and the mature period (24 WoA). Preliminary examinations of the inferred dominance ranks from the full dataset showed surprisingly high measurement uncertainty and indicated active rank mobility in all six groups. Such hierarchy dynamics were unexpected, since established pecking orders in chickens within this age range have been reported to be stable (Gottier, 1968; Guhl, 1968; Rushen, 1982). Hence, we asked whether the dominance hierarchies and individual ranks of hens were indeed unstable across two observation periods that included a standard rehousing procedure and the onset of sexual maturation and egg laying. Furthermore, the investigation allowed us to explore the possibilities of recently developed methods to distinguish rank instability from measurement uncertainty.
To study the hierarchy dynamics, we split the full dataset into the two time periods and applied the Elo rating system for both periods separately, as well as the full dataset. First, we compared hierarchies within and between the two time periods to inspect individual rank changes. Second, we analysed whether individual winning success during the young period was positively associated with dominance rank during the mature period, according to winner-loser effect expectations (Chase, 1982; Cloutier et al., 1995; Cloutier and Newberry, 2000). Lastly, we compared the mature and full dataset as an additional investigation of rank changes. The comparison provided a more robust analysis regarding known issues of data-splitting approaches when investigating hierarchy dynamics, such as overestimation of the dynamics (Strauss and Holekamp, 2019b) or the possibility of low sample sizes after splitting. Furthermore, contrasting the mature and full dataset allowed us to explore the interdependencies between hierarchy dynamics and measures of hierarchy uncertainty, transitivity, and steepness.
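The sequential nature of the Elo rating can be illustrated with a short Python sketch. It is not the implementation used by the R packages: the update constant `k = 100`, the logistic slope `scale = 0.01`, and the function names are assumptions chosen for illustration, and the interaction sequence is hypothetical.

```python
import math

def elo_update(ratings, winner, loser, k=100.0, scale=0.01):
    """Update two ratings in place after one interaction.

    Assumptions (not taken from the study): k = 100 and a logistic
    win-probability curve with slope `scale`.
    """
    # Expected probability that the eventual winner would win.
    p_win = 1.0 / (1.0 + math.exp(-scale * (ratings[winner] - ratings[loser])))
    # The winner gains more when the win was unexpected (p_win small).
    delta = k * (1.0 - p_win)
    ratings[winner] += delta
    ratings[loser] -= delta
    return ratings

def elo_ratings(interactions, ids, start=0.0, **kwargs):
    """Process (winner, loser) pairs in the observed order."""
    ratings = {i: start for i in ids}
    for winner, loser in interactions:
        elo_update(ratings, winner, loser, **kwargs)
    return ratings

# Hypothetical sequence for three hens.
seq = [("A", "B"), ("B", "C"), ("A", "C"), ("A", "B")]
ratings = elo_ratings(seq, ids="ABC")
print(sorted(ratings, key=ratings.get, reverse=True))  # ['A', 'B', 'C']
```

Because each update depends on the ratings at that moment, reordering the same interactions can yield different final ratings, which is why shuffling the sequence is only appropriate for relatively stable systems.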

Animals
The study was conducted in 2021 in Zollikofen, Switzerland. All methodologies were performed in accordance with the relevant national and cantonal guidelines and regulations and approved by the Bern Cantonal authority (BE-126/2020). In total, 418 female Lohmann Selected Leghorn chicks were randomly distributed in six different pens at one day of age. Three of the pens held 119 (A), 120 (B) and 119 (C) animals (large groups, 2.05 m x 9.8 m) while the other three pens housed 20 animals each (D, E, F, small groups, 2.05 m x 1.7 m). The hens remained in the pens for the duration of the rearing period, until 17 WoA. At 17 WoA, with the assumed onset of sexual maturation and egg laying (the laying period), all hens were transferred to laying pens (small: 3.1 m x 2.3 m; large: 3.1 m x 13.8 m) as is common practice with commercial systems. The laying pens were equipped with additional resources (e.g., nestboxes, detailed below) to allow for performance of species-specific behaviours (e.g., nest seeking). Each group was transferred so that the same groups were maintained in the rearing and laying periods. For the transfer, hens were introduced into the new pen in random order by two teams working simultaneously. The procedure took less than five minutes for small groups and about 60 min for large groups between the transfer of the first and last individuals.
Both pens used for the rearing and laying periods were floor pens, containing a litter area with an automatic feeder line (Inauen AG), elevated slats with two perches (Sanatherm 38 mm, Inauen AG), and nipple drinkers (Inauen AG). Pens during the laying period also included nestboxes (small: 2, large: 12, Grando Nest, 15 × 150 cm, Vencomatic Group). Density between the group sizes was controlled for in terms of total floor space, perch length, feeder space, and number of available nestboxes at a small to large group size ratio of 1:6. Food and water were available ad libitum with additional fresh feed provided automatically at regular intervals with frequency depending on age. The light programme was a standard commercial schedule with a maximum of 15 light hours per day. Pens were visually separated from each other at all times by non-flexible plastic sheets attached to the partitions. During the experiment, five hens died in the large groups resulting in 116 (A), 119 (B) and 118 (C) animals; otherwise the groups remained unchanged.
Between seven and eight WoA, all hens were individually tagged for remote visual identification with two circular laminated paper labels (diameter 4.1 cm; one per wing) fixed with plastic label fasteners in the skin of the wings (Nazar et al., 2015). Six individuals in the large groups lost their identification and were thus excluded from analysis of the full dataset.

Behavioural observations
Social interactions of the hens were observed during three different events at several times of day to account for individual variation by time of day and resource use. The events were: feeder chain runs (when fresh feed was delivered to the feeding troughs), competition around a high-quality food source (grapes) provided by the researchers, and intervals in between two feeding chain runs (minimum 30 min after a run). Prior to the first observations of social interactions, hens were familiarised with the grapes by offering them on four days within a two-week window. The grapes were placed in one container for the small groups and in six containers for the large groups and were refilled after 10 min of observations to maintain hens' interest.
The observations were performed in two different time periods. The first time period was between 10 and 12 WoA (young period) during the developmental phase but following the establishment of a pecking order (Craig, 1992;Guhl, 1958;Rushen, 1982). Observations in the young period consisted of 12 days of 15-20 min each for each pen within the three-week period. During the young period, observations of feed chain runs and grape access were performed live, while observations in between the feeding chain runs were done using video recordings and the software INTERACT (Mangold, 2018).
The second set of observations, conducted after the onset of sexual maturation and egg laying at approximately 17 WoA, was done at 24 WoA (mature period) when 88% of hens were assumed to have laid their first egg (based on management protocols of LOHMANN BREEDERS GmbH). The timing of observations in the mature period allowed for seven weeks of habituation after the transition from rearing to laying pens. Observations during the mature period were restricted to four days of 10-20 min each within one week for each pen to confirm stability of the previously established dominance ranks. All observations during the mature period were done via video recordings for logistical reasons in the larger-sized laying pens. The full dataset consisted of all observations from both the young and mature periods.
All observations were performed by one observer (K.G.) who recorded all agonistic interactions during both sets of observations by means of an ethogram (Table 1). Whenever any aggressive or submissive act occurred, the aggressor was deemed the winner and the recipient the loser.

Hierarchy extraction and data analysis
All calculations were performed in R v. 4.2.0 (R Core Team, 2022), using the packages "aniDom" v. 0.1.5 (Farine & Sánchez-Tójar, 2017) and "EloRating" v. 0.46.11 (Neumann & Kulik, 2014). Based on preliminary assessments that suggested non-stable dominance hierarchies over the two observation periods, we divided the full dataset into observations from the young and mature periods yielding three datasets, the young, mature, and the full dataset.

Elo ratings and sampling assessments
Based on the outcome of observed interactions, the individual dominance rank was determined via the Elo rating system (Neumann et al., 2011) with a higher Elo rating indicating a higher rank (i.e., lower rank value). Initial Elo ratings were set to zero, which resulted in negative ratings if an individual predominantly lost. For each pen, we extracted the range of the Elo ratings (i.e., from the hen with the lowest to the hen with the highest rating). We calculated the ratio of the sum of all interactions per group to the number of individuals in a group (d/N), where values in the range of 10-20 indicate a reliable extraction of hierarchies in stable systems (Sánchez-Tójar et al., 2018). We further applied the randomised Elo rating (Farine & Sánchez-Tójar, 2017), calculated as a mean rating across n randomised interaction sequences (Scheme S1.1). Splitting the full dataset increased the risk of the sample size falling below the recommended criterion (d/N < 10). Hence, we implemented a minimum threshold of d/N > 5 for all datasets and pens (young, mature, and full) to derive dominance ranks. This minimum threshold was chosen due to simulations showing relatively robust rank estimates at d/N = 4 for moderately steep and steep hierarchies, as expected for chickens (Sánchez-Tójar et al., 2018).
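The sampling ratio and the randomised Elo rating described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the aniDom implementation: the update constant `k = 100`, the logistic slope `scale = 0.01`, the function names, and the four-hen dataset are all hypothetical.

```python
import math
import random

def sampling_ratio(interactions, n_individuals):
    """d/N: number of observed interactions per individual in the group."""
    return len(interactions) / n_individuals

def elo(interactions, ids, k=100.0, scale=0.01):
    """Sequential Elo ratings starting at zero (k and scale are assumed)."""
    r = {i: 0.0 for i in ids}
    for w, l in interactions:
        p = 1.0 / (1.0 + math.exp(-scale * (r[w] - r[l])))
        r[w] += k * (1.0 - p)
        r[l] -= k * (1.0 - p)
    return r

def randomised_elo(interactions, ids, n_rand=1000, seed=1):
    """Mean Elo rating per individual over shuffled interaction orders."""
    rng = random.Random(seed)
    totals = {i: 0.0 for i in ids}
    seq = list(interactions)
    for _ in range(n_rand):
        rng.shuffle(seq)
        for i, value in elo(seq, ids).items():
            totals[i] += value
    return {i: t / n_rand for i, t in totals.items()}

# Hypothetical pen of 4 hens with 24 interactions (d/N = 6).
ids = ["A", "B", "C", "D"]
interactions = [("A", "B"), ("A", "C"), ("A", "D"),
                ("B", "C"), ("B", "D"), ("C", "D")] * 4
ratio = sampling_ratio(interactions, len(ids))
mean_ratings = randomised_elo(interactions, ids, n_rand=200)
print(ratio, ratio > 5)  # 6.0 True
print(sorted(mean_ratings, key=mean_ratings.get, reverse=True))
```

In this toy pen the ratio of 6 would pass the minimum threshold of d/N > 5 but fall short of the recommended range of 10-20.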

Extraction of hierarchy measures
For each dataset of the two periods, and for the full dataset, we performed the same suite of analyses in order to determine the uncertainty of the inferred dominance ranks, the temporal dependence, and the characteristics of the underlying hierarchies. The hierarchy measures were used to assess and compare the hierarchies of the different datasets and are thus exclusively presented in the results section on the comparisons across datasets.
To estimate the uncertainty of dominance ranks, we first calculated the repeatability of Elo ratings of individuals across 1000 randomised sequences, where repeatability scores above 0.8 suggest a reliable underlying hierarchy (Farine & Sánchez-Tójar, 2017) (Scheme S1.2A). Second, 1000 randomised sequences were divided into two halves and dominance ranks were calculated for each half. Spearman's correlations were calculated between the dominance ranks of the halves (randomised split) with correlations above 0.5 indicating low uncertainty of the associated rankings (Farine & Sánchez-Tójar, 2017) (Scheme S1.2B).
As an estimate of the stability of the hierarchies, we considered the importance of the order of observed interactions (i.e., temporal dependence). A Spearman's rank correlation between Elo ratings and randomised Elo ratings was calculated as an estimate of group-level rank agreement, whereas an intra-class correlation coefficient (ICC) was calculated as a measure of within-individual agreement. High correlation (> 0.8) and good rating agreement (> 0.75; Koo and Li, 2016) between ranks derived from the Elo rating and randomised Elo rating indicate low temporal dependency and thus stable rankings (see Scheme S1.2C).
We evaluated two hierarchy characteristics for each dataset: transitivity and steepness. Triangle transitivity, as a measure of orderliness (Shizuka and McDonald, 2012), was calculated on the sum of interactions won by one individual of each pairing. The triangle transitivity (t_tri) ranges from 0 to 1, with 1 indicating perfect transitivity. To evaluate hierarchy steepness, we visually assessed plots for probabilities of higher-ranking birds to win in relation to the difference between the higher-ranking bird and its opponent's rank (Fig. 1), as suggested by Sánchez-Tójar et al. (2018). Furthermore, we quantified the probability for the higher-ranking individual to win in relation to the difference in rank to the opponent using estimates of the growth rate of a logistic function as indicator for steepness. For this estimation, we fitted a logistic regression as follows (Eq. 1):

$$P(y) = \frac{1}{1 + e^{-(k \cdot S_x + b)}}$$

with P(y) as the probability of a higher-ranking individual to win, x the difference in rank, S_x the standardised rank difference, and k the nonlinear least-squares estimate of the growth rate (i.e., the steepness). Rank differences (ranging from 0 to N-1 depending on group size) were standardised by the function S_x to a group size of 10 to account for different group sizes. The baseline probability that the higher-ranking individual wins was P(y) = 0.5 by setting b = 0 and b was kept constant. The higher k, the faster P(y) converges to 1 and the steeper the underlying hierarchy. A steepness k lower than 0.4 corresponds to higher-ranking individuals winning less than 70% of encounters when rank differences are small (< 3) (with standardised group size). Such a low k thus reflects a flat hierarchy. A winning probability above 80% for a rank difference of 1 corresponds to a k above 1.4, which therefore indicates a relatively steep hierarchy (for examples see Fig. S1).
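The two steepness benchmarks quoted above can be checked numerically. The sketch below assumes the standard logistic form P = 1 / (1 + exp(-(k·x + b))) with b = 0 and takes rank differences as already standardised; the function name `win_probability` is illustrative.

```python
import math

def win_probability(rank_diff, k, b=0.0):
    """Logistic probability that the higher-ranking bird wins."""
    return 1.0 / (1.0 + math.exp(-(k * rank_diff + b)))

# Flat hierarchy: k = 0.4 at a standardised rank difference of 2.
print(round(win_probability(2, 0.4), 2))  # 0.69
# Steep hierarchy: k = 1.4 at a rank difference of 1.
print(round(win_probability(1, 1.4), 2))  # 0.8
# No rank difference: baseline of 0.5 because b = 0.
print(win_probability(0, 1.4))            # 0.5
```

With k = 0.4 the higher-ranking bird wins just under 70% of encounters at a rank difference of 2, and with k = 1.4 it wins just over 80% at a rank difference of 1, in line with the flat and steep benchmarks.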
Confidence intervals for k are provided when possible, though in cases of very high steepness (i.e., if more than 80% of the observed probabilities are equal to 1), confidence intervals for k cannot be estimated due to ceiling effects (Helsel, 2011).

Table 1
Ethogram of agonistic interactions (Estevez et al., 2002). Behaviours grouped by outcome of social interactions into wins and losses.

Wins
Aggressive peck: A bird raises its head and stabs its beak towards another bird (usually directed to the head and neck region).
Chase: A bird runs after/towards another bird in an aggressive manner.
Threat: A bird raises the head (sometimes accompanied by raising of the neck feathers) and looks at another hen or makes an intentional movement towards the other hen.

Losses
Crouching: A bird crouches low and remains very still in front of another animal.
Avoidance: A bird suddenly lowers its head and walks (or runs) away after receiving attention from another bird.

Fights
Two birds perform a series of aggressive acts towards each other in rapid succession, including leaps and pecks. After a fight the retreating hen is defined as the loser of the interaction.

Analysis: Comparisons across datasets
For those pens which reached the minimum threshold criterion for both periods, we compared observations made during the young and mature periods to investigate hierarchy dynamics over time. First, we assessed the hierarchies in terms of uncertainty, temporal dependence, and characteristics (transitivity, steepness) for each period and compared them descriptively (Scheme S1.3A). Then, we compared the individual dominance ranks between the two observation periods by calculating Spearman's rank correlations and within-individual agreement using ICC to estimate consistency of ranks (see Scheme S1.3B). Furthermore, we performed Mantel correlations ("vegan" package v. 2.6-4, Oksanen et al., 2022) between the two periods, a method applied to test similarity of social networks (James et al., 2009).
To test whether individual winning success observed during the young period was associated with dominance ranks of the birds during the mature period, we used a linear mixed model ("lme4" package v. 1.1-30, Bates et al., 2015). The full model consisted of Elo ratings from the mature period as the outcome variable that was then related to the total number of wins and the total number of losses during the young period as continuous variables, treatment (small, large group), and all two- and three-way interactions as explanatory variables. Pen was included as random intercept, but then dropped due to lack of explained variance. Elo ratings, as well as total number of wins and losses, were scaled per pen. Model selection was based on the AIC criterion and model diagnostics were assessed using the "DHARMa" package v. 0.4.5 (Hartig, 2022).
Lastly, we compared all pens which reached the minimum threshold criterion for the mature period with the full dataset. The hierarchy measures of the mature period and the full dataset should be very similar if the interactions in the young period did not alter the hierarchy measures of the full dataset. The comparison between the mature period and the full dataset allowed the investigation of the effect of hierarchy dynamics on hierarchy measures of uncertainty, temporal dependence, and the characteristics (transitivity, steepness). Furthermore, we estimated the impact of the hierarchy dynamics on the dominance rank of individuals by correlating (Spearman) dominance rankings derived from Elo ratings of the full dataset with rankings based on randomised Elo ratings of the mature period (Scheme S1.3C). An ICC value was calculated to estimate the absolute within-individual agreement of Elo ratings between the mature period and full dataset (Scheme S1.3C). Dominance ranks for both the mature period and the full dataset were scaled using function S_x (Eq. 1) and rank differences calculated per group size to standardise the magnitude of rank changes across group size.
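The between-period rank comparison can be illustrated with a small Python sketch covering only the Spearman correlation and the mean rank change (not the ICC or Mantel tests). The ratings are hypothetical, the function names are illustrative, and the shortcut formula for Spearman's rho assumes no tied ranks.

```python
def ranks_from_ratings(ratings):
    """Rank 1 = highest rating. Assumes no tied ratings."""
    ordered = sorted(ratings, key=ratings.get, reverse=True)
    return {ind: pos + 1 for pos, ind in enumerate(ordered)}

def spearman_rho(rank_a, rank_b):
    """Spearman's rho for untied ranks: 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    n = len(rank_a)
    d2 = sum((rank_a[i] - rank_b[i]) ** 2 for i in rank_a)
    return 1.0 - 6.0 * d2 / (n * (n ** 2 - 1))

def mean_abs_rank_change(rank_a, rank_b):
    """Average number of rank positions an individual moved."""
    return sum(abs(rank_a[i] - rank_b[i]) for i in rank_a) / len(rank_a)

# Hypothetical Elo ratings for five hens in two periods.
young = {"A": 300.0, "B": 150.0, "C": 20.0, "D": -120.0, "E": -350.0}
mature = {"A": -200.0, "B": 280.0, "C": 90.0, "D": -40.0, "E": -130.0}

r_young = ranks_from_ratings(young)
r_mature = ranks_from_ratings(mature)
print(spearman_rho(r_young, r_mature))          # 0.0
print(mean_abs_rank_change(r_young, r_mature))  # 1.6
```

In this toy example a single top-ranked hen dropping to the bottom, with all others moving up one position, is enough to bring the rank correlation to zero.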

Elo ratings and sampling assessments
A total of 4442 interactions (young period: 1483; mature period: 2959) were observed, achieving a d/N > 10 for five out of six pens when considering the full dataset (Table 2; additional information on agonistic interactions provided in Tab. S1). Assessment of the observation periods independently (i.e., young and mature period) resulted in a d/N < 10 for all pens, where all values in the mature period exceeded 5. Values of d/N for observations in the young period exceeded 5 in two pens only (i.e., E and F; Table 2). Consequently, the between-period comparison was only performed for these two pens. For the comparison between the mature period and the full dataset we could use all pens. The calculated Elo ratings for all pens and datasets with d/N > 5 ranged from −608.2 to 701.41 in large groups and from −512.87 to 659.61 in small groups (Table 2).

Fig. 1. Hierarchy steepness of Pen E based on different time periods. Data points show the mean probability of higher-ranking individuals winning (y-axis) across all instances of interactions for a specific rank difference (x-axis). Black vertical lines indicate 95% confidence intervals of the probability means. The grey line is a loess fit (polynomial) to the means, the black line a logistic fit. The steepness k is based on nonlinear least-squares estimates of the growth rate of the logistic fit. A shows data based on interactions recorded between 10 and 12 WoA (young, k = 1.35), B depicts data from 24 WoA (mature, k = 1.87) and C is based on the full dataset (k = 0.17).

Table 2
Observed interactions of each pen for both periods and the full dataset. The observations for the young and mature periods do not add up to the full dataset for large pens due to individuals that lost their ID tag between periods and thus were excluded from the full dataset. Elo rating ranges are only given for datasets where the d/N was above the minimum threshold criterion (> 5).

Comparisons across datasets
Comparisons of the hierarchies between the young and mature period
Assessment showed overall high repeatability of individual Elo ratings (≥ 0.86), transitivity (t_tri ≥ 0.82), and steepness (k > 1.35) (see Fig. 1, Table 3) for the pens which met the minimum threshold criterion of d/N > 5 (E, F) for their respective hierarchies during observations of the young and mature periods. Comparison of periods within pen indicated more uncertainty in the young compared to the mature period, as indicated by the lower Spearman's rank correlation value for the randomised splitting. The Spearman correlation of the Elo rating and randomised Elo rating was high and ICC agreement of Elo ratings was excellent within each period. One pen (F) had a weak tendency for a rank correlation between the young and mature period (r = 0.42, p = 0.07) whereas no correlation was found in the other pen (r = 0.12, p = 0.61) (see Table 3; Fig. 2). ICC agreement and the Mantel correlation between the two periods were low (Table 3). Mean changes in dominance rank across the two time periods for pens E and F were 6 and 4 positions, respectively (Pen E: SD = 4.86, min = 0, max = 19; Pen F: SD = 4.81, min = 0, max = 16).

Association between winning success during the young period and rank in the mature period
Hens with the highest winning success based on observations during the young period were consistently less successful in the mature period in all pens (Fig. 3). For birds in the young period with a relatively high number of wins (i.e., above the 75th percentile), only 12% (10/83) in large groups and 25% (3/12) in small groups were found in the top 10% of highest-ranking birds during the mature period. At the same time, 9% (8/83) and 25% (3/12) of those birds with the high number of wins in the large and small groups during the young period, respectively, were even placed in the lowest 10% of ranks in the mature period. The best fitting model included solely an interaction of the wins and losses during the young period. The group size treatment was not included in the best fitting model (excluding treatment improved AIC by 2.8, see Table S2). Animals characterized with a relatively high number of both wins and losses during the young period were associated with high Elo ratings in the mature period (Elo rating estimate at max(Wins) & max(Losses): 3.1, 95% CI [0.97, 5.24]). In contrast, animals that mainly experienced winning and rarely lost during the young period were associated with low Elo ratings in the mature period (Elo rating estimate at max(Wins) & min(Losses): −2.25, 95% CI [−3.83, −0.67]). Examining the associated plot, the interaction effect between losses and wins (effect estimate: 0.12, 95% CI [0.04, 0.2], p = .004) was likely due to animals with extreme numbers of wins and losses (Fig. S2).

Comparison of the mature period with the full dataset
Results assessing hierarchy measures of the full and mature period datasets are provided in Table 4. The repeatability of Elo ratings for the mature period was high (≥ 0.87) and above threshold for all pens, whereas for the full dataset it was below 0.8 in two large and two small pens. Spearman correlations of rankings for 1000 randomised sequences were above 0.5 for all pens using the mature period dataset but below 0.5 for two pens of the full dataset.
Spearman correlations between the Elo rating and randomised Elo rating for the mature period were high (0.77 ≤ r ≤ 0.95, all p < .0001) and ICC agreement of Elo ratings per individual were good to excellent (all ICC > 0.86). For the full dataset, correlations were also high (0.63 ≤ r ≤ 0.84, all p < .05), with one exception, while the ICC agreement was good for four pens (ICC > 0.75) and moderate for two pens (ICC > 0.5).
Estimations of transitivity (i.e., orderliness) for the mature period were high in five pens (t_tri > 0.8, all p < .0001), with the sixth reaching t_tri = 0.67 (p < .0001). In contrast, transitivity for pens using the full dataset was low to moderate in five pens (0.73 ≥ t_tri ≥ 0.39) and high for one small pen (t_tri = 0.92). The hierarchy steepness was high for the mature period in two pens (k > 1.4), moderate in three pens (k > 0.5), and low in the pen with low transitivity (k = 0.36). Hierarchies were generally flat (all k < 0.4) for the full dataset.
Lastly, in five pens, there were high positive correlations of the dominance rank of the full dataset with dominance ranks of the mature period (r ≥ 0.84, p < .01). The remainder (pen D) showed a trend for a positive correlation (r = 0.77, p = 0.07). ICC agreement between Elo ratings for the full dataset and the mature period was good to excellent (all ICC > 0.84). Mean rank differences of scaled ranks were 1.16 (95% CI [1.01, 1.27], max = 6) in large and 1.27 (95% CI [0.96, 1.57], max = 5) in small groups (Table S3).

Discussion
In the present study, we evaluated Elo ratings of laying hens across two time periods, the young and mature, which differed in their housing environments, to determine whether dominance hierarchies remained stable. Against expectations for chicken hierarchies to be stable, comparisons between the time periods for two groups indicated active rank mobility across periods. Further analyses showed that in all groups high winning success during the young period did not directly associate with high rank in the mature period. Finally, comparisons between the mature period and the full dataset supported rank instabilities for all groups across the two time periods. The comparison also revealed increased hierarchy uncertainty, and reduced transitivity and steepness for the full dataset due to the dynamics. Our findings not only provide novel evidence for rank instabilities in chicken hierarchies across maturation and rehousing, but also showcase interdependencies between hierarchy measures and dynamics.

Table note: Values indicated with b as superscript fell below a certain threshold and/or were not significant. P-values are indicated with ≤ .05 *, ≤ .001 **, ≤ .0001 ***; note that significance only applies to simple correlations and triangle transitivity. 95% confidence intervals are also provided for randomised splitting, ICC estimates, as well as steepness k (exceptions due to censoring issues).

Evidence for rank instability
Preliminary diagnostics for the full dataset of the six pens indicated unexpected measurement uncertainty and potential changes in the underlying hierarchies between the observation periods. To achieve a more comprehensive picture of the hierarchies, we investigated the two time periods of observation together and separately. In summary, our evidence suggests that ranks were actively changing between the periods rather than remaining stable, a conclusion based on several analyses and comparisons.
Hierarchies in the young and the mature period could only be compared directly for two out of the six pens. Both pens had very transitive, steep, certain, and stable hierarchies in both periods, though not correlated between periods. In other words, the hierarchies took on a period-specific composition with inconsistent ranks between periods. Such period-dependent hierarchies would be indicative of animals successfully challenging higher-ranking individuals and causing changes in rankings between the two periods. Although these results were only based on two pens of the smaller group size and are not necessarily generalisable, they were in line with our finding that in all groups individual winning success during the young period was not directly associated with rank for the mature period.
Under assumptions of winner-loser effects (Chase et al., 1994; Cloutier and Newberry, 2000; Rutte et al., 2006) and stability of ranks across periods, one would expect to find animals with many wins during the young period also in the upper ranks in the mature period. Contrary to expectations, high winning success and few losses in the young period were associated with a low rank in the mature period. Interestingly, individuals with both high winning success and high numbers of losses in the young period were found to have a high rank in the mature period. Such an outcome suggests that opportunistic hens which continuously challenged higher-ranking conspecifics obtained a better position in the mature period compared to hens which mostly interacted with lower-ranking hens. Alternatively, hens with high numbers of both wins and losses might have interacted with a larger number of different individuals at a young age, resulting in the higher rank in the mature period. However, looking at the data for the two pens that met the minimum threshold criterion for the young period, we found no such indication. On the contrary, the highest-ranking individual in both pens had not only the highest winning success but also interacted with most other individuals. Nevertheless, both top-ranking hens during the young period dropped to the lowest ranks in the mature period, while the previous second and third in hierarchy rose to the top.

Fig. 2. Dominance hierarchies of Pen E based on two different time periods. Two-letter labels indicate individuals in Pen E. A shows Elo ratings of individuals in ascending order based on randomised Elo ratings from interactions recorded between 10 and 12 WoA (young period), while B shows Elo ratings equally as in A but based on data from 24 WoA (mature period). C compares dominance rankings of individuals based on Elo ratings from the mature (x-axis) versus the young (y-axis) period. Low values in rank indicate high hierarchical position (high Elo rating). Rank was not correlated between the two periods (Spearman's r = 0.12, p = 0.61). Grey dashed line indicates expected position if rank during the young and mature period were identical (difference = 0).

Fig. 3. Paired scatter plot of scaled number of wins for each individual collected during the young period compared to after maturation. Orange: animals placed above the 75th percentile of wins during the young period; blue: animals found in the top 10% of ranks during the mature period (rank 1 & 2 in small groups, rank 1-12 in large groups); grey: animals neither above the 75th percentile of wins during the young period nor in the top 10% of ranks during the mature period; animals in both orange and blue fulfil both conditions. Each plot represents one group (A-C large groups, D-F small groups).
A similar pattern was observed when considering pure winning success between the two periods for the other four pens (Fig. 3). Such a systematic "fall from grace" of the highest-ranking individuals in all six pens is very intriguing and indicates a common factor driving the active rank mobility. As a last comparison concerning rank instabilities, we assessed hierarchy measures between the mature period and the full dataset for all pens, which provided further evidence for active rank mobility in all chicken groups. All hierarchy-related measures for the mature period were increased and measurement uncertainty decreased in comparison to the full dataset. As mentioned, the d/N for the mature period remained below recommendation (10 > d/N > 5), but the steepness estimates of the hierarchies were intermediate to very high. Such steepness has been shown in simulations to result in more robust rank estimates at smaller d/N (Sánchez-Tójar et al., 2018). Furthermore, the ranks of the mature period seemed to be stable, as opposed to the ranks of the full dataset, reflected by high correlations and good within-individual agreement between results of the Elo rating and randomised Elo rating. If the mature period had shown unreliable, unstable results like the full dataset, it would have indicated inadequate sampling or continuous rank instabilities. However, the mature period appeared reliable and stable, suggesting that the additional interactions of the young period within the full dataset caused the different outcomes.
We collected interactions only during short periods, three weeks and one week for the young and mature period, respectively. Experiments with continuous observations would be necessary to fully understand the hierarchy dynamics over time. In this regard, the experimental design of the current study is not optimised to determine whether hierarchies in these life stages were indeed stable. Nonetheless, we believe our results to be important towards the overarching goal of understanding hierarchies in present-day commercial laying hybrids and to provide a framework for future, more targeted studies.

Effects of hierarchy dynamics on hierarchy measures
Another aim of the study was to inspect the impact of the rank instability between the periods on the measurement uncertainty and hierarchy measures of the full dataset. Repeatability scores of the inferred ranks of the full dataset were below the reliability criterion (< 0.8) for four out of six groups, indicating that agreement of Elo ratings between randomised sequences was not reliable. On the other hand, randomised splitting results (median split correlations of randomised interaction sequences) indicated robustness of hierarchies for four out of six pens. The differences in the uncertainty estimates suggested that repeatability, which operates on the Elo rating values rather than on rank orders as the randomised split does, may be more sensitive to rank instabilities. Correlations and agreement of ranks derived from Elo ratings and randomised Elo ratings were below 0.8 for most pens, especially in small groups with similar numbers of interactions between the young and mature period. Thus, comparisons of Elo ratings and randomised Elo rating outcomes can serve well to detect temporal dependency of the inferred hierarchies. Transitivity was low to moderate for most pens and hierarchies were flat in all pens, suggesting that the probability of a higher-ranking individual winning an interaction remained close to random (P = 0.5), especially for small rank differences. When directly comparing Elo ratings derived from the full dataset with the randomised Elo ratings of the mature period, the Elo ratings were, as expected, similar. Hence, the Elo rating method was rather robust to the rank changes, in line with studies showing that Elo ratings accurately reflect hierarchy dynamics (Neumann et al., 2011; Strauss and Holekamp, 2019a; Vilette et al., 2021), although this is contrary to the implications of a simulation study that showed extended periods of unreliable rankings after disruptions (Goffe et al., 2018). Mean differences in rank were relatively small and comparable across group size, though some individuals displayed large differences. These individual differences indicated that, while on average the hierarchy dynamics had a low negative impact on the final rankings, the impact was not uniform across animals, resulting in decreased reliability of individual estimates.

Table 4. Comparison of the mature period with the full dataset. Values indicated with b as superscript fell below a certain threshold and/or were not significant. P-values are indicated with ≤ .05 *, ≤ .001 **, ≤ .0001 ***; note that significance only applies to simple correlations and triangle transitivity. 95% confidence intervals are also provided for mean randomised splitting correlations, ICC estimates, as well as steepness k (exceptions due to censoring issues).
Taken together, hierarchy dynamics rendered hierarchy estimates for the full dataset uncertain and interaction outcomes between individuals difficult to predict, resulting in hierarchies having low transitivity and appearing flat.
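To illustrate why comparing Elo ratings with randomised Elo ratings can reveal temporal dependency, the following is a minimal sketch and not the analysis code used for this study. It assumes the classic logistic Elo update (base 10, scale 400), a starting rating of 1000, and k = 100; these constants, the function names, and the toy interaction list are all assumptions chosen for illustration. Sequential Elo ratings weight recent interactions more heavily, whereas averaging over shuffled sequences removes the order information, so a large rank difference between the two points to ranks that changed over time.

```python
import random


def elo_ratings(interactions, k=100.0, start=1000.0):
    """Sequentially update Elo ratings from ordered (winner, loser) pairs."""
    ratings = {}
    for winner, loser in interactions:
        rw = ratings.setdefault(winner, start)
        rl = ratings.setdefault(loser, start)
        # Logistic expectation that the eventual winner would win.
        p_win = 1.0 / (1.0 + 10 ** ((rl - rw) / 400.0))
        # Unexpected wins move ratings more than expected ones.
        ratings[winner] = rw + k * (1.0 - p_win)
        ratings[loser] = rl - k * (1.0 - p_win)
    return ratings


def randomised_elo(interactions, n_rand=200, seed=1, **kwargs):
    """Mean Elo rating per individual over shuffled interaction orders."""
    rng = random.Random(seed)
    totals = {}
    for _ in range(n_rand):
        shuffled = interactions[:]
        rng.shuffle(shuffled)
        for ind, rating in elo_ratings(shuffled, **kwargs).items():
            totals[ind] = totals.get(ind, 0.0) + rating
    return {ind: total / n_rand for ind, total in totals.items()}


def to_ranks(ratings):
    """Rank 1 = highest rating; ties are broken alphabetically for simplicity."""
    ordered = sorted(ratings, key=lambda ind: (-ratings[ind], ind))
    return {ind: pos + 1 for pos, ind in enumerate(ordered)}


def mean_abs_rank_diff(ranks_a, ranks_b):
    """Average absolute difference in rank across individuals."""
    return sum(abs(ranks_a[i] - ranks_b[i]) for i in ranks_a) / len(ranks_a)


# Toy sequence (invented): A dominates early, C dominates late.
interactions = (
    [("A", "B")] * 5 + [("A", "C")] * 5 + [("B", "C")] * 3
    + [("C", "A")] * 6 + [("B", "A")] * 4 + [("C", "B")] * 4
)

sequential = elo_ratings(interactions)
randomised = randomised_elo(interactions)
print(to_ranks(sequential))
print(to_ranks(randomised))
print(mean_abs_rank_diff(to_ranks(sequential), to_ranks(randomised)))
```

Because each update adds to the winner exactly what it subtracts from the loser, the sum of ratings stays constant, which is a convenient sanity check when adapting the sketch to other update rules.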

Potential reasons for the hierarchy dynamics
Even though hens can challenge higher-ranking individuals once ranks are established, it happens seldom and the animals appear to generally interact as their ranks would predict (Guhl, 1968; Mench and Keeling, 2001; Schjelderup-Ebbe, 1922), a phenomenon described as 'social inertia' by Guhl (1968). Thus, we had originally planned observations shortly after the presumed onset of hierarchy establishment with the expectation that the hierarchies would remain unchanged for the remainder of the experiment. However, agonistic interactions were increased during the mature period in both group sizes, even though aggression is expected to be reduced once hierarchies are established (Guhl, 1953, 1968; Guhl and Allee, 1944; Tibbetts et al., 2022). Such a rise in interactions could indicate that hens established their hierarchies later than expected, shifting the peak of agonistic interactions from the young towards the mature period. Under this assumption, our findings of active rank mobility would be explained by a lack of stable ranks in the young period, with not yet established hierarchies.
Alternatively, we did find stable hierarchies in two of the pens in the young period where we reached the minimum threshold for sampling. Thus, the low incidence of aggression in all pens during the young period could suggest hierarchies were indeed established, leading to reduced aggression, but social instability was incited in between the observation periods. Two major known disruptive events that occurred following observations in the young period may have contributed to sparking active rank mobility.
The first event was the rehousing procedure from rearing to laying pens. Location and territory play a vital role in fighting success of individuals (Cloutier et al., 1995; Piper, 1997; Wooddell et al., 2017). Therefore, it is plausible that the rehousing procedure led to rank instabilities, a possibility supported by anecdotal reports from our animal caretakers of increased numbers of comb lesions after rehousing. Head and comb injuries result from overly aggressive pecks in social interactions and have been related to social stress (Birkl et al., 2017). Furthermore, the order of introduction into a group has been shown to affect dominance rank (Bernstein and Gordon, 1980; Boucherie et al., 2022). Hens that have spent more time in a certain place have also been shown to win more frequently over later introduced individuals (Guhl and Allee, 1944; Schjelderup-Ebbe, 1922). In our study it is questionable how much an order effect might have contributed, especially for small groups, which were introduced into the pen in less than 5 min. Outside the familiar pen context, however, previously established relationships might have been contested despite the same individuals being present. Once a new context is associated with losing, subsequent losing is more likely (Cloutier et al., 1995).
Environmental perturbations have been shown to incite hierarchy instability. For instance, in some fish species, simulations of drought or turbulence resulted in hierarchy dynamics (Sloman et al., 2001; Sneddon et al., 2006), with cases of lower-ranking individuals rising to the top ranks. In contrast, the relocation of a stable group only resulted in one minor, temporary rank change for macaques (Honess et al., 2004) and did not increase aggression in chickens (Cloutier and Newberry, 2002). While there is a rising interest in the consequences of environmental perturbations for the social behaviour of a species, investigations are mostly focused on changes due to human impact, such as rise in temperature or light pollution (for an overview see Fisher et al., 2021), which are not necessarily comparable to a relocation as in the present study. More research into relocations will be necessary to estimate their impact on social hierarchies of stable groups.
The second disruptive event was the onset of sexual maturation. Because laying hens have been continuously selected for a variety of traits including early onset of egg laying, docility, and social factors (Muir and Cheng, 2014; Rodenburg et al., 2010), maturation may have disrupted social hierarchies in a manner distinct between previous and current laying hen strains. For instance, strains selected for early maturation showed increased aggression (Bhagwat and Craig, 1977; Craig et al., 1975; Lee and Craig, 1981; Muir and Cheng, 2014), while selection for dominance ability (winning pair encounters) resulted in decreased age at first egg (Craig, 1968). Furthermore, high rank and early maturation within flocks appeared to be positively correlated (Lee et al., 1982; Rushen, 1982; Tindell and Craig, 1959). Such a positive relationship between dominance and maturation onset in females has also been reported in other species (wasps: Barth et al., 1975; baboons: Bercovitch and Strum, 1993; mice: Drickamer, 1985), highlighting the close connection between sexual hormones and social behaviour across species.
Only two studies from over 40 years ago investigated chicken flocks from hierarchy establishment across maturation (Guhl, 1958; Rushen, 1982). In addition to genetic differences, both researchers used mixed flocks containing males and females. Males are known to suppress agonistic interactions between females in small and large groups (Appleby et al., 2004; Craig and Bhagwat, 1974; Guhl, 1969; Odén et al., 1999), which may have contributed to our differing results.
To summarise, the transition from rearing to laying pens and/or the impact of genetic selection on sexual maturation and social factors could have contributed to inciting active rank mobility between the observed time periods. Additionally, further factors apart from the discussed effects, such as third-party influences, individual attributes, or health status, are presumed to influence hierarchy formation and maintenance (Dehnen et al., 2022; Lindquist and Chase, 2009; Tibbetts et al., 2022).

A note on the group size treatment
Our study question for the current effort arose from preliminary observations for a related work comparing social interactions in different group sizes. The laying hen industry is undergoing a gradual but steady change from various forms of cage housing with small groups of animals (typically between 10 and 50) to cage-free housing with group sizes of thousands of animals (Schuck-Paim et al., 2021). Although such housing benefits hens regarding the expression of normal behaviours (Karcher and Mench, 2017), it is still discussed whether large groups (> 100 animals) would form hierarchies as seen in small groups (D'Eath and Keeling, 2003; Estevez et al., 2002; Pagel and Dawkins, 1997).
The findings presented here indicated similar social hierarchies in large and small groups with comparable hierarchy dynamics independent of group size. However, the d/N remained very low (< 4) in the young period, in particular for large groups, indicating reduced aggression in large groups compared to small groups. This difference in aggression by group size aligns with previous findings of decreased aggression in large hen flocks (Estevez et al., 1997, 2002; Nicol et al., 1999; Zimmerman et al., 2006). Then again, incidence rates of agonistic interactions were comparable between group sizes in both periods using video observations (see Table S1), whereas incidence rates were reduced for live observations in the young period for large groups. Therefore, the low rates for large groups were likely due to live observations in the young period limiting the ability to observe simultaneous interactions and not due to group size effects on aggression. Potential effects of group size on the social structure of laying hens will be more directly addressed in forthcoming work. Due to the possible impact of both maturation and rehousing on the hierarchy dynamics, it is too early to derive implications for the management of laying hens. Future efforts will be necessary to disentangle the influence of the two events. Nonetheless, the consistent drop in rank of high-ranking hens across time periods raises welfare concerns for the individuals in question, considering the high costs associated with losing high rank position presented in case studies of several species (Milewski et al., 2022).

Conclusion
When evaluating dominance ranks of laying hens from observations collected across two time periods, ranks were uncertain and temporally dependent. Comparisons across time periods indicated active rank mobility between the observed time periods. Potential reasons for active rank mobility could include social disruption by rehousing practices, sexual maturation of the birds, as well as the absence of males acting as mediators of agonistic interactions between females, in addition to decades of selective breeding. Furthermore, the discussed factors could have had an impact on discrepancies between the presented and previous findings (Gottier, 1968; Guhl, 1958; Rushen, 1982). Further research will be necessary to understand the underlying mechanisms.

Declaration of Competing Interest
This document reflects only the author's view and the European Union's Horizon 2020 research and innovation programme is not responsible for any use that may be made of the information it contains.

Data Availability
The data and code for this study are available at http://doi.org/10.17605/OSF.IO/EZUM6.