Rugby game performances and weekly workload: Using of data mining process to enter in the complexity

This study aimed to i) identify key performance indicators of professional rugby matches, ii) define synthetic indicators of performance and iii) analyze how weekly workload (2WL) influences match performance throughout an entire season at different time-points (considering WL of up to 8 weeks prior to competition). This study uses abundant sports data and data mining techniques to assess player performance and to determine the influence of 2WL on performance. WL, locomotor activity and rugby-specific actions were collected from 14 professional players (26.9 ± 1.9 years) during training and official matches. To highlight key performance indicators, a mixed-linear model was used to compare the players' activity relative to competition results. This analysis showed that defensive skills represent a fundamental factor of team performance. Furthermore, a principal component analysis demonstrated that 88% of the variability in locomotor activity could be summarized by 2 dimensions including total distance, high-speed/metabolic efforts and the number of sprints and accelerations. The final purpose of this study was to analyze the influence that WL has on match performance. To this end, 2 different statistical models were used. A threshold-based model, from data mining processes, identified the positive influence (p<0.05) that chronic body impacts have on the ability to win offensive 1-on-1 duels during competition. This study highlights practical implications necessary for developing a better understanding of rugby match performance through the use of data mining processes.


Introduction
PLOS ONE | https://doi.org/10.1371/journal.pone.0228107 | January 29, 2020

Rugby union (RU) became a professional sport in 1995 and has since come across multiple ethical and financial issues. The incessant increase in game intensity and competitive demands (i.e. promotion-relegation championships), among other factors, has greatly contributed to enhanced risks of injury and non-functional adaptations [1,2,3]. Optimizing physical preparedness has, therefore, become the main concern for team staffs. Workload (WL) monitoring, its management and the development of optimal adaptational capabilities are important parameters to consider in elite team-sport environments [4,5]. Indeed, many studies demonstrate the influence of weekly workloads (2WL) on acute and chronic physical performance, physiological adaptations and injury risks in elite rugby players [6,7]. Although the correlations between WL and performance, particularly physical performance, are commonly accepted in team sports, very few studies have successfully established these relationships in a competitive context [8,9]. One of the main reasons certainly lies in the difficulty of identifying and evaluating the key performance indicators (individual and collective) in team sports. Nevertheless, for some time now, various studies have succeeded in revealing some tactical, technical and physical key performance indicators during RU games at different age categories and levels of play [10,11,12]. Furthermore, some research identifies individual technical skills as being directly correlated with playing performance. Ortega et al. [13] and Den Hollander et al. [14] demonstrated that the percentage of successful tackles, the number of defensive line breaks and the number of offensive duels won (tackle breaks) positively influenced individual and team performance during RU matches.
Another reason for the complexity of studying WL and its effect on game performance is the difficulty of developing a valid and reliable longitudinal monitoring protocol (training and competition). Indeed, according to Fernandez et al. [15], "Physical performance has not yet taken much attention from the research community, due to the difficulty of accessing this information with the same devices during training and competition". For a few years now, microtechnology (GPS and inertial sensors) used in rugby has monitored activity during training and matches with acceptable accuracy [16]. Novel technology has made it possible to collect sport-specific data: individual (internal and external parameters) and team (match analysis) statistics. These tracking tools provide staff with a large amount of data to analyze. Appropriate modelling of training WL and performance (in a competitive context) is necessary to give practical meaning to this data [17].
The main objective of this study is to demonstrate how WL influences game performance (individual and collective) in the short and medium term during a professional RU season. However, as mentioned above, studying the relationships between WL and match performance requires preliminary steps. Hence, the intermediate objectives are i) to identify key performance indicators during professional RU matches, ii) to develop synthetic indicators of performance to facilitate data analysis, and iii) to analyze the influence of 2WL on changes in match performance during an entire season.

Participants
Fourteen professional RU players (6 forwards and 8 backs) (age: 26.9 ± 1.9 years; height: 185 ± 7.9 cm and weight: 97.6 ± 13.2 kg) volunteered to participate in this study. All players had been playing professionally for several years (experience: 137.1 ± 73.4 professional matches) and were active members of the same team (CA Brive Correze, which competed in the 1st professional division of the French championship, the Top 14). All subjects gave informed consent to participate in the experiment in accordance with the Declaration of Helsinki. The study protocol was conducted with the support of the medical and technical staffs of the professional team. Finally, the study respected the ethical guidelines of the Rennes university and of the research laboratory associated with this study.
these workload parameters were used to better understand the links between the workload and the performance. These data can be made public from June 2021. Nevertheless, early data access for qualifying researchers with an approved protocol will be possible with the agreement of the club (SASP Club Athletique Brive Correze) by contacting the corresponding author (romain.dubois55@orange.fr), the club (contact@briverugby.com) or the head S&C coach during the study period (s_pollyfr@yahoo.fr).

Funding:
No funding was received for this study and there was no conflict of interest for this study. The SASP Club Athletique Brive Correze provided us access to the GPS and activity data, but it had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Procedure
WL and match activity of 14 players were monitored throughout a professional RU season. WL parameters were obtained from different methods (S-RPE, heart-rate (HR) based methods, and GPS tracking). WL parameters were analyzed with different weekly rolling averages (up to 8 previous weeks). Rugby match activity was assessed by GPS tracking (locomotor activity) and complemented with video analysis to identify sport-specific activity (tackle count, duels won, etc.; see Table 1). Team performance (victory vs defeat and positive vs negative) was analyzed to highlight the key performance indicators during elite RU matches. Data mining processes were applied once data collection was completed. This strategy was designed to identify key performance indicators and to underline the influence of WL parameters (acute and chronic) on RU performance.

Raw data collection
The general organization and WL distribution during this season was presented in a previous study [6]. The season lasted 48 weeks including 8 pre-season microcycles. The competitive phase (40 weeks) contained 32 official matches. To reach the objectives of the present study, internal and external WL were quantified during training and matches. During matches, performance and physical activity were assessed by a microtechnological system (SPI-HPU, 5 Hz, GPSport, Australia), and through video analysis. Video analysis was used to record rugby-specific activity: attempted tackles, successful tackles, defensive line breaks, ruck participation, etc. (Table 1). Team performance was identified from match results (victory vs defeat), to which was added another type of classification (cf. "Britannic Ranking", see below) that considers the influence of match location on results [18].

Workload quantification: Throughout the season, WL was quantified during each training session using different monitoring methods: session-RPE (S-RPE = RPE (CR-10 Scale) × session duration (expressed in min)) [19] and HR-based methods (i.e., TRIMPS; Polar T34, Polar Electro, Finland). External WL was assessed with electronic performance and tracking systems, which included GPS and microsensor technology (accelerometers, gyroscopes and magnetometers). 2WL was defined as the sum of the WL of each session included in the microcycle (in the present case, all matches were held on Saturdays and a 1-week microcycle corresponds to a Monday-Sunday working week) [6]. The different parameters used to analyze WL during training are specified in Table 2.
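The S-RPE definition and the weekly summation described above can be sketched as follows. This is a minimal illustration, not the authors' code (the study itself was analyzed in R); the function names and the tuple layout of a session are assumptions made for the example. ISO calendar weeks are used because they run Monday to Sunday, which matches the microcycle described in the text.

```python
from collections import defaultdict
from datetime import date


def session_rpe(rpe_cr10, duration_min):
    """S-RPE = RPE (CR-10 scale) x session duration in minutes."""
    return rpe_cr10 * duration_min


def weekly_workload(sessions):
    """Sum S-RPE per Monday-Sunday week.

    sessions: iterable of (date, rpe_cr10, duration_min) tuples
    (hypothetical layout chosen for this sketch).
    Returns {(iso_year, iso_week): weekly load}.
    """
    totals = defaultdict(float)
    for day, rpe, duration in sessions:
        iso_year, iso_week, _ = day.isocalendar()  # ISO weeks start on Monday
        totals[(iso_year, iso_week)] += session_rpe(rpe, duration)
    return dict(totals)


example = [
    (date(2018, 9, 3), 5, 60),   # Monday
    (date(2018, 9, 9), 7, 80),   # Sunday of the same week
    (date(2018, 9, 10), 4, 45),  # Monday of the following week
]
weekly = weekly_workload(example)
```

The same summation would apply to any of the other WL parameters (TRIMPS, GPS distances) by replacing `session_rpe` with the per-session value of that parameter.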
Locomotor activity and performance during matches: During the 32 official matches of the season, the locomotor activity of players was tracked by microtechnology using the same parameters as those used during training. HR recordings were nevertheless different between matches and training (Table 2). Additional rugby-specific actions were recorded by video analysis, which allowed these actions to be both counted and qualified. A qualified video analyst was responsible for collecting data for each rugby match. The specific actions analyzed during the matches are presented in Table 1. To normalize the data, GPS data and sport-specific actions were expressed relative to playing time. Data corresponding to less than 10 min of playing time were not used for this study. To focus on individual variations (and to remove inter-individual differences in performance potential), a Z-score specific to each player was calculated for all performance and locomotor activity parameters. This Z-score is based on the average and standard deviation (SD) of the full season for each parameter.
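The normalization steps above (expression relative to playing time, exclusion of appearances shorter than 10 min, player-specific Z-score over the season) can be sketched as follows. This is an illustrative sketch only; the function names and record layout are hypothetical, and the use of the sample SD is an assumption because the text does not say which SD estimator was used.

```python
from statistics import mean, stdev

MIN_PLAYTIME_MIN = 10  # appearances shorter than this are discarded (from the text)


def per_minute(value, playtime_min):
    """Express a match count or distance relative to playing time."""
    return value / playtime_min


def player_z_scores(records):
    """Z-scores for one player and one parameter over the season.

    records: list of (raw_value, playtime_min), one entry per match.
    Assumption: sample SD (statistics.stdev) is used.
    """
    rates = [per_minute(v, t) for v, t in records if t >= MIN_PLAYTIME_MIN]
    mu, sd = mean(rates), stdev(rates)
    if sd == 0:
        return [0.0 for _ in rates]  # no variation over the season
    return [(r - mu) / sd for r in rates]


season = [(1000, 10), (2000, 10), (3000, 10), (50, 5)]  # last match is excluded
z = player_z_scores(season)
```

Because the mean and SD are computed per player, a Z-score of 0 means "this player's usual level" rather than a team-wide average, which is the intra-individual view the text describes.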

Table 2 (excerpt).
Parameter | Abbreviation | Unit | WL type | Definition
High-metabolic power distance | HMPD | m | External | Sum of the distance covered above 20 W·kg⁻¹ [20].
Sprint distance | Sp Dist | m | External | Sum of the distance covered above 25 km·h⁻¹.
Sprint number | Sp N | n | External | Number of times the player ran faster than 25 km·h⁻¹.
Accelerations | Acc | n | External | Number of accelerations performed above 2.5 m·s⁻².

In order to consider the influence of match location on results, the "Britannic Ranking" was used to determine positive, negative and neutral performance. More precisely, a bonus-point victory (offensive bonus) during a home match, or a defensive bonus or a victory during an away match, is considered a "positive performance". A defeat during a home match is considered a "negative performance". Finally, a victory during a home match and a defeat during an away match are considered "neutral performances". This type of ranking is often used by French rugby teams' staff to predict final standings when considering the number of remaining matches to be played at home and/or away.
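The classification rules stated above can be written as a small decision function. This is a sketch under stated assumptions: the function name and argument encoding are hypothetical, and draws are rejected because this section does not define how they are classified.

```python
def britannic_class(home, result, bonus=False):
    """Classify a match as 'positive', 'neutral' or 'negative'.

    home:   True for a home match, False for an away match.
    result: 'win' or 'loss' (draws are not covered by the rules in this section).
    bonus:  offensive bonus for a win, defensive bonus for a loss.
    """
    if result not in ("win", "loss"):
        raise ValueError("only 'win' and 'loss' are defined by the stated rules")
    if home:
        if result == "win":
            # Home win: positive only with the offensive bonus, otherwise neutral.
            return "positive" if bonus else "neutral"
        return "negative"  # any home defeat
    # Away match: a win or a defensive bonus is positive, a plain defeat is neutral.
    if result == "win" or bonus:
        return "positive"
    return "neutral"
```

For example, `britannic_class(home=False, result="loss", bonus=True)` returns `"positive"`, reflecting the value given to a defensive bonus earned away from home.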

Data contextualization and transformation:
As demonstrated in other studies, 2WL and match performance are influenced by different contextual factors such as the period of the season, player status (starter, substitute), playing position and match location, among other factors [6,21]. Therefore, the player's status (starter or substitute) and position (forward or back) on the field were taken into account in this study.
To study short- and medium-term effects, all WL parameters were analyzed on a rolling average. The rolling average over the previous 2, 3, 4, 5, 6, 7 and 8 weeks was analyzed for each parameter. A weighted average, which increases the impact of recent WL, was also used with the same time windows. Variability of training was considered by analyzing the SD of the previous weeks (2 to 8). In total, for each WL parameter (Table 2), 21 derived parameters were added: 7 rolling averages (one per time window), 7 weighted averages and 7 SDs.
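The construction of the 21 derived parameters can be sketched as follows. This is an illustration, not the authors' implementation: the feature names are hypothetical, the weighting scheme is an assumption (linearly increasing weights, most recent week heaviest) because the text does not specify the weights, and the sample SD is likewise assumed.

```python
from statistics import mean, stdev

WINDOWS = range(2, 9)  # windows of 2 to 8 previous weeks, as in the text


def rolling_features(weekly_loads):
    """Build 7 means, 7 weighted means and 7 SDs for one WL parameter.

    weekly_loads: chronological list of weekly values, most recent last.
    """
    if len(weekly_loads) < max(WINDOWS):
        raise ValueError("at least 8 weeks of history are required")
    feats = {}
    for n in WINDOWS:
        window = weekly_loads[-n:]
        weights = range(1, n + 1)  # assumption: linear weights, recent weeks count more
        feats[f"mean_{n}w"] = mean(window)
        feats[f"wmean_{n}w"] = sum(w * x for w, x in zip(weights, window)) / sum(weights)
        feats[f"sd_{n}w"] = stdev(window)
    return feats


loads = [100, 200, 300, 400, 500, 600, 700, 800]
features = rolling_features(loads)
```

Applying this to every WL parameter in Table 2 yields a wide feature table, which is why dimension reduction and tree-based selection become useful in the later analysis steps.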
After one year of data collection, an extensive analysis was performed to examine how WL influences match performance in successive matches for an elite RU team. This study provides a methodology based on data mining to relate variations in the players' physical performance during time-framed training sessions to their performance throughout the following matches. The study is structured in three major steps, each associated with different analysis methodologies. The first part focuses on constructing an informative dataset from GPS measurements and specific data on WL, match activity and performance indicators. The second part analyzes this information to identify rugby-specific actions in terms of player status (starter or substitute). The third part aims at identifying links between performance and match activity. Difficulties were encountered in the first two parts. Indeed, it was necessary to identify the useful information within such a large dataset in order to interpret the data optimally.

Statistical analysis
As shown in Fig 1, the methodological process can be divided into three steps. As a preliminary step, descriptive statistics were computed. Prior to the main analysis, the level and the variability (mean ± SD) of each training parameter were calculated relatively to playing position and the player's status using a linear mixed model. Effect size (ES) was then calculated using Cohen's d statistics where an ES <0.2 was considered non-significant (NS), 0.2-0.6 small, 0.6-1.2 moderate, 1.2-2.0 large and > 2.0 very large [22,23].
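The effect-size thresholds above can be sketched in code as follows. This is a minimal illustration: the pooled-SD form of Cohen's d is an assumption (the text only names "Cohen's d"), and the handling of values that fall exactly on a boundary is also a choice made for this sketch.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(a, b):
    """Cohen's d for two independent groups (assumption: pooled sample SD)."""
    na, nb = len(a), len(b)
    pooled_sd = sqrt(((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2))
    return (mean(a) - mean(b)) / pooled_sd


def es_label(d):
    """Map |d| to the qualitative scale quoted in the text."""
    d = abs(d)
    if d < 0.2:
        return "non-significant"
    if d < 0.6:
        return "small"
    if d < 1.2:
        return "moderate"
    if d <= 2.0:
        return "large"
    return "very large"


d_example = cohens_d([2, 4, 6], [1, 3, 5])
```

With this scale, the d = 1.6 reported later for the back-forward difference in the faster speed zone would read as "large", and d = 0.8 as "moderate".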
Because performance is measured through a set of several variables and not a unique response variable, multivariate statistical approaches were carried out. With the same constraints and objectives, Haghighat et al. [24] propose a review of several methods that allow an automatic selection of the most significant features based on data mining techniques. We favor a dimension reduction approach as it facilitates analysis during the third part, makes storage and computation less expensive and allows for easier interpretation [17]. For this purpose, a linear dimension reduction method called Principal Component Analysis (PCA) was used on the performance dataset to reduce the dimension of analysis [17]. A normalized PCA was used in the second part to reduce the high-dimensional raw features [17,25]. PCA is a descriptive multivariate statistical analysis that explores a set of quantitative variables in order to examine the collinearity between them and to discuss the importance of each variable in terms of variability. It is a mathematical tool used for computing a set of new synthetic variables. These variables, also called dimensions, aim at identifying high-variability components in higher-dimensional datasets. Subsequently, choosing a small number of new dimensions makes it possible to create a discriminative sub-space, based on features that are informative in terms of variability, onto which the high-dimensional dataset is mapped.
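The idea of a normalized PCA and of "explained variability per dimension" can be illustrated with a small dependency-free sketch. It is not the authors' R workflow. Here "normalized PCA" is read as PCA on standardized variables (i.e. on the correlation matrix), and the eigenvalues are obtained by power iteration with deflation, a simple technique swapped in for illustration; all function names are hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev


def standardize(rows):
    """Center each column and scale it to unit (population) SD."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) for c in cols]  # assumes no constant column
    return [[(x - m) / s for x, m, s in zip(r, mus, sds)] for r in rows]


def correlation_matrix(z):
    """Correlation matrix of already standardized rows."""
    n, p = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / n for j in range(p)] for i in range(p)]


def top_eigenpairs(mat, k=2, iters=500):
    """Leading eigenpairs of a symmetric matrix by power iteration + deflation."""
    p = len(mat)
    work = [row[:] for row in mat]
    pairs = []
    for _ in range(k):
        v = [float(i + 1) for i in range(p)]  # arbitrary non-constant start vector
        for _ in range(iters):
            w = [sum(work[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = sqrt(sum(x * x for x in w))
            if norm == 0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue for the converged vector.
        lam = sum(v[i] * sum(work[i][j] * v[j] for j in range(p)) for i in range(p))
        pairs.append((lam, v))
        # Deflation: remove the component just found.
        work = [[work[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]
    return pairs


def explained_variance(rows, k=2):
    """Share of total variance carried by each of the first k dimensions."""
    corr = correlation_matrix(standardize(rows))
    return [lam / len(corr) for lam, _ in top_eigenpairs(corr, k)]


# Toy data: columns 1 and 2 are perfectly collinear, column 3 is uncorrelated.
toy = [(1, 3, 1), (2, 5, 1), (3, 7, 1), (1, 3, 2), (2, 5, 2), (3, 7, 2)]
ratios = explained_variance(toy)
```

In the toy data the two collinear columns load on the same dimension, so the first dimension carries two thirds of the variance and the second one third; this mirrors, in miniature, how several running variables can be summarized by a single synthetic dimension.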
Finally, in the third step, we tried to explain the relationships between different WL parameters in the short and medium term (x-factors) and the performance/locomotor activity indicators (y-factors). A cross-correlation analysis was used to assess the level of cross-collinearity between performance descriptors (response variables) and training locomotor activity descriptors (descriptive variables). However, linear models are limited in their ability to highlight relationships between WL and performance (no significant correlation was found between the two groups). Regression trees were therefore also applied during this third step to extract discriminative information for performance and (potentially) to further reduce the dimension. A regression tree is a data mining process based on decision induction analysis. It estimates a regressive relationship through binary partitioning (splitting) by testing the link between a set of explanatory variables and a quantitative response variable. Classical and conditional regression trees were used to identify non-linear links through a graphical binary tree. This results in a discrete model based on a set of rules given by a categorical pattern of dependence, computed on the interaction between categorical explanatory variables and categorized quantitative explanatory ones. In this part, the response variables used are, successively, the first two dimensions of the PCA above. All analyses were conducted with R Statistical Software (R 3.3.3, R Foundation for Statistical Computing).
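The binary partitioning at the heart of a regression tree can be illustrated with a single split (a "stump"). This sketch shows only the classical least-squares criterion for one explanatory variable; it is not the authors' R code and does not implement the significance testing used by conditional regression trees. The function names and the `min_leaf` parameter are hypothetical.

```python
def sse(values):
    """Sum of squared deviations from the mean."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def best_split(x, y, min_leaf=2):
    """Find the threshold on x that minimizes the within-node SSE of y.

    Returns (threshold, total_sse, mean_left, mean_right), or None if no
    valid split exists.
    """
    pairs = sorted(zip(x, y))
    best = None
    for i in range(min_leaf, len(pairs) - min_leaf + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical x values
        left = [p[1] for p in pairs[:i]]
        right = [p[1] for p in pairs[i:]]
        cost = sse(left) + sse(right)
        if best is None or cost < best[1]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (threshold, cost, sum(left) / len(left), sum(right) / len(right))
    return best


# Toy example: the response jumps once the workload exceeds a threshold.
wl = [1, 2, 3, 4, 10, 11, 12, 13]
perf = [0.1, 0.0, -0.1, 0.0, 1.0, 1.1, 0.9, 1.0]
split = best_split(wl, perf)
```

A full tree repeats this search recursively within each resulting node and across all explanatory variables, which is how threshold rules on several WL parameters can be combined along one branch.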

Results
Workload description, collective performance and specific indicators of team performance: Table 3 shows the 2WL of players depending on their position and their playing status. Regarding playing position, our results show no significant difference in internal WL when the S-RPE method was used. However, backs covered a greater TD (p<0.001, d = 0.8) and had a higher NBL (p<0.001; d = 0.8) than forwards. This was more pronounced in the faster speed zone (p<0.001, d = 1.6) and the metabolic zone (p<0.001, d = 0.5). No other significant differences were observed between backs and forwards for the other 2WL parameters. Table 3 also shows that players who started the upcoming match were exposed to greater WL regardless of their position. This was true for the weekly S-RPE (p<0.001, d = 1.4), HSR and HMP distances (p<0.001, d = 0.5, respectively) and TRIMPS (p<0.05, d = 0.3). Table 4 provides information about the 2WL parameters, in the short and medium term, depending on the team's performance (victory vs defeat or positive vs negative) during official matches. It shows that, when the team studied won, some 2WL parameters were greater during the week prior to competition. Indeed, the acute S-RPE was greater (p<0.001, d = 0.4), as well [...]

Table 5 highlights the individual indicators of match performance according to the player's position and match results. It reveals that backs had a greater average speed (m·min⁻¹) during matches won (p<0.05, d = 0.4). In contrast, the relative distance traveled in the HMP zone was significantly greater in backs (p<0.05, d = 1.0) during lost matches. Concerning specific activities, forwards delivered a stronger defensive performance in matches won, with more completed tackles (p<0.05, d = 0.8) and more offensive tackles (p<0.05, d = 0.6) than in matches lost. Moreover, forwards had a greater activity index during matches won (p<0.05, d = 0.5). In contrast, backs played the ball significantly less during victories (p<0.05, d = 0.4). Table 5 also highlights other significant differences between backs and forwards concerning physical and rugby-specific actions during matches. Table 6 shows the influence of player status on playing activity during matches. Playing activity indexes were greater for substitutes regardless of the player's position (p<0.05, d = 1.0 and d = 1.9 for forwards and backs, respectively). Furthermore, forward substitutes conceded fewer penalties (relative to the ball-in-play time during which the player was on the field) than starting forwards (p<0.05, d = 0.9).

Summary of individual performance:
Characteristics of individual speed are used in this analysis. Fig 2 shows stronger collinearity of three variables (HMPD.min, HSR.min and TD.min) with Dim1, and that only sprints and accelerations (Sp+Acc) are highly correlated with Dim2. One can consider that the three variables of Dim1 measure the same aspect of performance, while Sp+Acc measures another aspect which is not correlated with the others. With these two new synthetic dimensions, around 88% of the variability of the measures can be explained. The first dimension (Dim1) accounted for 65.43%, while 23.22% was explained by the 2nd dimension (Fig 2). The heterogeneity between the observations is mainly due to the variables contained in Dim1, called "running.performance", which can be interpreted as follows: a negative value means "low performance" and a high positive value means "high performance".
Ten characteristics of match playing activities are used in this analysis, but only the meaningful ones can explain the variability of observations (cos2>0.5). They are shown in Fig 3. A larger degree of collinearity was seen between Tack and Tack.suc than between activity rate and Ms. win. Moreover, these 2 groups show no correlation with each other. However, the PCA is not very efficient here because only 38.51% of the total variability is explained by these 2 dimensions. The first dimension (Dim1) accounted for 21.6%, while 16.91% was explained by the 2nd dimension. This indicates that the 10 characteristics show little correlation with each other. Other links may exist, but these are not detectable by linear methods such as PCA.

Performance insights from descriptors of training activity
As a preliminary analysis, several correlation matrices were calculated to assess the level of collinearity among WL indicators (explanatory variables gathered into a matrix called X), among performance indicators (variables to be explained gathered into a matrix called Y), and the cross-correlation between X and Y. The results are presented using a black and grey gradient in which dark colors represent strong correlations, positive or negative (Fig 4). No significant collinearity was noted in the cross-correlation matrix, which encouraged the use of nonlinear statistical tools to study potential links between WL and performance indicators. "Running.performance", the first principal component of the PCA, corresponding to a performance descriptor, is analyzed here (Fig 5). On the left-hand side of the tree, node number 2 characterizes the mean level of performance of backs, which is logically lower than that of forwards in terms of offensive activity. Node 19 contains observations on backs with greater levels of offensive performance. This is due to an acute amount of speed exertion > 21e+3 combined with a 4-week rolling average of heavy impacts (Hi.4weeks.SD) < 9.4 heavy impacts. Node 8 contains observations on backs with lower levels of offensive activity, which is due to a Hi.4weeks.SD ≥ 9.4. According to the right branch of the tree, the higher level of offensive performance for forwards is due to a 6-week average of low running speed (LSR.moy.6) ≤ 8421. This regression tree illustrates that several parameters appear to influence running performance but have no significant effect according to a statistical test (conditional regression tree). Concerning the other significant effects observed in the relationships between WL parameters and activity indicators, Fig 6 reveals that the normalized (Z-score) number of sprints and accelerations was significantly and negatively affected when the average time spent under 85% HR max was above 218.992 min (Fig 6A). Similarly, the normalized number of offensive duels won was significantly and positively affected by the chronic (4-week rolling average) number of HI (Fig 6B). Table 7 presents the summary of all the conditions tested when using this method.
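The preliminary cross-correlation screening between WL indicators (X) and performance indicators (Y) described above can be sketched as follows. This is an illustrative sketch using Pearson's coefficient; the variable names in the example are hypothetical and the data are toy values, not results from the study.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb)  # assumes neither sequence is constant


def cross_correlation(X, Y):
    """Correlate every explanatory variable with every response variable.

    X, Y: dicts mapping a variable name to its list of observations,
    with observations in the same order in both dicts.
    """
    return {(xn, yn): pearson(xv, yv) for xn, xv in X.items() for yn, yv in Y.items()}


# Toy values only, for illustration.
X = {"srpe_4w_mean": [1, 2, 3, 4]}
Y = {"running_performance": [2, 4, 6, 8], "duels_won": [4, 3, 2, 1]}
cc = cross_correlation(X, Y)
```

A matrix of such coefficients that stays close to zero everywhere, as reported for Fig 4, only rules out linear links; threshold effects of the kind found by the regression trees can still be present.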

Discussion
The main goal of this study was to detect the existing relationships between WL in the short and medium term and performance or locomotor activity during matches of professional RU players throughout a season. Several preliminary steps were necessary to accomplish this: i) understanding the different factors that might influence WL and team performance, ii) synthesizing activity indicators to facilitate and simplify the modelling/definition of individual performance and iii) analyzing, with different statistical models (linear and threshold-based), the influence of different WL parameters on performance. The main findings showed that 2WL was influenced by playing position and player status. Indeed, backs presented greater external WL (GPS-based data) during training sessions than forwards (p<0.001), while starters showed greater internal (S-RPE and TRIMPS methods) and external WL (p<0.001). Furthermore, when team performance was analyzed considering home advantage (using the Britannic ranking classification), the findings demonstrated that collision load (number of heavy impacts) in the short and medium term negatively influenced team performance (p<0.001).
Positive team performance was observed when backs covered a greater relative distance (m·min⁻¹) and when forwards completed more successful tackles. Furthermore, forwards also had a greater (p<0.05) activity index during victories. Concerning our attempt to define performance with new synthetic variables, we found that one can construct a synthetic index based on individual GPS data (individual Z-score, based on the average and SD for each player and for each GPS parameter throughout the season) grouping HMPD.min, HSR.min and TD.min on one side, and Sp+Acc on the other. No synthetic index was found for specific activity parameters. The first method used to highlight the relationships between 2WL and individual locomotor activity/performance indicators, based on a linear model, did not reveal any significant effect. The use of a threshold model, from data mining processes, allowed us to illustrate some significant effects of WL parameters on individual performance indicators. Indeed, the chronic (4-week rolling average) number of heavy impacts positively influenced the number of duels won during matches (p<0.05). Finally, these results highlight the difficulty of identifying and synthesizing physical performance in RU and also point out the high level of complexity encountered when building models that relate WL to physical performance/locomotor activity during matches.

Before analyzing the influence of WL on performance, it was important to underline the different factors which influence 2WL. Indeed, several studies reported significant differences between forwards and backs concerning the internal and external WL of training weeks during preseason and in-season for professional rugby players [6,26]. The present findings confirmed, in part, the results arguing that backs have greater external WL than forwards.
These differences were mainly explained by the difference in locomotor activity during training sessions. Nevertheless, no significant differences were observed regarding the internal WL indicators (S-RPE & TRIMPS). The scrum and lineout training sessions for forwards, which account for 20 to 30% of weekly training, do not generate running activity. Thus, external WL during this type of training cannot be recorded by GPS technology. Therefore, the differences observed in external WL between forwards and backs do not reflect the difference in quantity of training. It is also important to note that the external load estimated by GPS technology may present some limitations regarding the accuracy and reliability of some variables, such as accelerations and the metabolic power approach variables [27]. These limitations lead us to remain cautious about the indiscriminate analysis of data derived from GPS signals [27]. Playing position was not the only factor that explained the observed differences among WL parameters. In a similar way to a study carried out in soccer [21], we analyzed the relationships between players' status and WL. Our findings demonstrate that substitute players, regardless of their position, had lower internal and external WL than starting players during weekly training. These results must be considered in medium- and long-term training processes. Indeed, players who are regularly substitutes were exposed to a lower internal and external WL. This trend may lead to undertraining. Thus, team staffs should offer complementary training to these players to expose them to high-intensity running and HR efforts. In their recent study, Dalton-Barron et al. [28] demonstrated that WL perception was influenced by different factors: playing position, previous match results, phase of the season and the time lapse between matches.
Therefore, results from this study and prior ones point to the need for a multifactorial approach to plan and monitor rugby players' WL during the different phases of the season (Tables 3 & 4).
Dalton-Barron's [28] study also reflects the significant impact of the competitive context on the perceived difficulty of training sessions during a "competitive-phase" week. Indeed, these results corroborate other studies which also demonstrated an effect of competitive stress on physiological adaptations, especially on endocrine responses [6,8,29]. In our findings, significant differences in WL parameters were found for weeks with a match victory (Table 4). The differences were mainly observed with the subjective perception (S-RPE) method and may suggest that stress before key matches of the season could induce an increase in WL perception. Indeed, in RU, home advantage has been statistically demonstrated [30,31]. The team studied here was a bottom-ranked team for which home matches were of crucial importance. Indeed, during the season studied, the team won 14 of their 16 home matches and only won one away game (out of 16). Thus, the team studied prepared for home matches under particularly high pressure. Subjective perception of the difficulty of training sessions seems to be influenced by the competitive context (p<0.001, d = 0.4). A greater acute total distance (p<0.05, d = 0.3) was also observed and may signal a greater WL exposure during the weeks of victory. Trainers seemed to have a tendency (maybe unconsciously) to increase the external WL by increasing tactical and strategic situations to prepare for a challenging match. In an attempt to minimize the influence of match location on team performance (especially with a bottom-ranked team), the Britannic ranking classification was applied to highlight the "positive" results (defensive bonus, draw and victory during away matches and bonus-point victories at home and away). With this filter of team performance, it appears that a high number of heavy impacts in the short and medium term influenced performance negatively.
This result could suggest that neuromuscular fatigue induced by repetitive heavy impacts [32,33] may also affect team performance during matches. However, because of our relatively small dataset, great caution should be taken when interpreting these results. Moreover, it is crucial to specify that all these results depend on multiple contextual factors (score, match location, level of the opposition, climatic conditions, etc.) [34]. Indeed, studying one team during a single season represents a complex protocol. Therefore, the results observed are highly linked to the context and specificity of the team.

In terms of performance, it also seems that different individual locomotor activity parameters influence team performance (Table 5). The specific activity (number of actions normalized by ball-in-play time) and defensive performance of forwards (number of completed and offensive tackles) were greater during matches won (p<0.05). Thus, the number and percentage of completed tackles seemed to be a good indicator of defensive performance, which contributes to team performance. These results confirmed those presented in other studies [13,23], which attested to the importance of defensive performance for match results. Our results also show that the running activity (relative distance) of backs was significantly greater (p<0.05) during matches won. Nevertheless, with the Britannic ranking classification, running activity was greater during negative results, especially for distance covered at high speed/power intensity (p<0.05). These results are similar to those reported in other studies for which, in other team sports, running activity at high intensity was significantly greater for losing teams [35,36,37]. Time spent in defense and the number of defensive line breaks conceded may require more high-intensity running efforts. This could explain the differences observed between positive and negative results. However, these results demonstrate the complexity of using GPS-based data to identify a valid and reliable key performance indicator in RU throughout a season.

Table 7. Overview of the different analyses performed to observe whether workload indicators influence (positively or negatively) the different performance/locomotor activity indicators.
The main aim of this study was to identify how 2WL influences activity/performance in professional RU players throughout a season. To analyze this influence while avoiding comparisons of inter-individual differences during matches, we chose to apply an individual normalized score (Z-score) based on the mean and the SD of all matches, individualized for each player and for each parameter over the entire season. Individual indicators, normalized by ball-in-play time, made it possible to smooth the activity differences induced by the players' position and profile. With this methodology, we expected to observe the intra-individual fluctuation of performance throughout the season, and we were therefore able to highlight the performance peaks and drops across the season. From this data transformation, a dimension reduction of the data was performed in order to summarize running and specific performance. Fig 2 shows that running performance could be synthesized into 2 dimensions. One grouped total distance, high-speed and high-power metabolic distances, while the other included the number of sprints and accelerations. These results partially corroborate those of Weaving et al. [17], who also showed that GPS data may be represented by total distance covered and by distance travelled at high speed. In our study, we used more variables than Weaving et al. [17]. This is probably why the analysis carried out in the present study demonstrated the importance of including the number of very-high-intensity efforts (sprints and high accelerations) in WL quantification in addition to "traditional" variables. The data collected for specific skills show that no variable has sufficient linear co-variability with the others to be summarized by a synthetic index (Fig 3). These results indicate that each action analyzed appears to be independent of the others and should be studied separately. Indeed, at professional level, specific tasks and player profiles are of significant importance.
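The individual normalization described above can be illustrated with a minimal sketch. It assumes one list of match values per player and per parameter, already normalized by ball-in-play time; the function name, the example values and the use of the sample SD are assumptions made for illustration, not details taken from the study.

```python
from statistics import mean, stdev


def player_z_scores(match_values):
    """Z-score each match value against the player's own season mean and SD."""
    mu = mean(match_values)
    sd = stdev(match_values)  # sample SD: an assumption, the text only says "SD"
    if sd == 0:
        # A player with identical values in every match shows no fluctuation.
        return [0.0 for _ in match_values]
    return [(v - mu) / sd for v in match_values]


# Hypothetical relative distances (m per min of ball-in-play time) for one player.
season = [118.0, 124.0, 121.0, 130.0, 112.0]
z = player_z_scores(season)

# Positive scores mark matches above the player's own season average,
# negative scores mark matches below it.
for match_number, score in enumerate(z, start=1):
    print(f"match {match_number}: z = {score:+.2f}")
```

Because each player is compared only with his own season, a forward and a back with very different absolute distances can both show a score near zero in an average match.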
The individual performance analysis, based on an individual normalized score, demonstrates that physical performance in RU is complex to summarize, especially in terms of sport-specific actions. Finally, using high-dimensional features for performance identification seems relevant for capturing large quantities of useful information. Difficulties nevertheless arise in storage, in computation and, consequently, in understanding the phenomenon.
The final objective of our study was to highlight the influence of 2WL, in the short and moderate term, on individual performance/locomotor activity during matches. The first method (Fig 4) was based on a linear model analyzing the correlation between WL variables (X) and activity parameters (Y). This method did not reveal any significant effect of WL on activity/performance during matches. This first result outlines the limitation of linear models for analyzing the interactions between WL and performance. Data mining processes, in contrast, made it possible to reveal significant effects of WL variables on some locomotor activity/performance parameters (Figs 5 & 6 and Table 7). Indeed, data mining processes demonstrated that the number of sprints and high accelerations was negatively influenced when the weighted average of the time spent in low HR intensity (>85% HRmax) exceeded 218.9 min (Fig 6A). This result emphasizes that too much time spent on low-intensity efforts during training sessions may negatively impact sprinting/accelerating ability. These results are in agreement with those of other studies showing the negative effect of training time spent in low-intensity zones on neuromuscular performance during a professional team-sport season [29,38]. Indeed, Dubois et al. [29] observed significant short-term correlations between the percentage of moderate and high-speed running distances and drop jump testing performance, which suggested a negative influence of low-intensity training sessions on neuromuscular performance. Nevertheless, these results do not suggest that low-intensity training should be completely abandoned. Indeed, during a typical competitive week, the first session of the week (36h after a match) was devoted to technical and tactical training and was performed at low intensity according to a tactical periodization approach [39]. Therefore, the present results suggest that it is more beneficial to devote the other training sessions of the week to high-intensity efforts, even if it means reducing training volume.
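To illustrate how a threshold-based analysis differs from a linear one, the following sketch searches for the single cut-off on a WL variable that best separates high and low match scores. This is one possible reading of a threshold-based model (a single-split regression stump), not the procedure used in the study; the function name, the minimum group size and all data values are hypothetical, and no significance test is included.

```python
from statistics import mean


def best_threshold(x, y, min_group=2):
    """Return (threshold, mean_y_below, mean_y_above) for the single split on x
    that minimizes the within-group sum of squared errors of y."""
    pairs = sorted(zip(x, y))
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    best = None
    for i in range(min_group, len(pairs) - min_group + 1):
        if xs[i] == xs[i - 1]:
            continue  # identical x values cannot be separated by a threshold
        low, high = ys[:i], ys[i:]
        m_low, m_high = mean(low), mean(high)
        sse = sum((v - m_low) ** 2 for v in low) + sum((v - m_high) ** 2 for v in high)
        if best is None or sse < best[0]:
            # Place the cut-off halfway between the two neighbouring x values.
            best = (sse, (xs[i - 1] + xs[i]) / 2, m_low, m_high)
    if best is None:
        raise ValueError("not enough distinct x values to split")
    return best[1:]


# Hypothetical data: weighted weekly minutes at low HR intensity (x) and
# individual Z-scores for the number of sprints in the following match (y).
low_intensity_min = [150, 170, 190, 205, 230, 240, 260, 275]
sprint_z = [0.6, 0.4, 0.7, 0.5, -0.5, -0.7, -0.4, -0.6]

threshold, mean_below, mean_above = best_threshold(low_intensity_min, sprint_z)
print(f"cut-off: {threshold} min, mean z below: {mean_below:.2f}, above: {mean_above:.2f}")
```

A step-like relationship of this kind can yield a weak linear correlation across the full range of X while still separating two clearly different groups, which is one reason a split-based search may detect effects that a linear model misses.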
In the present study, another significant correlation was observed between the chronic load (4-week rolling average) of the number of severe impacts (>8G) and the number of successful offensive duels (Fig 6B). In fact, a chronic number of severe impacts greater than 22.6 per week positively impacted this performance parameter. The capacity to beat a defender represents an important aspect of offensive performance and contributes to positive team performance [14]. In our study, a greater number of impacts was reached during small-sided training situations. This type of situation, which resembles competitive situations because of increased space-time pressure, enhanced a player's ability to beat his direct opponent. The results concerning the number of severe impacts also illustrate training complexity and the particular difficulty of balancing training loads between over-reaching and under-training [1,40]. Indeed, Table 4 shows the negative effects that high quantities of severe impacts during training have on performance in the short and moderate term. This result could be explained by collision-induced neuromuscular fatigue [32,33]. Dubois et al. [6] also showed a possible negative effect of low exposure to impacts during training sessions on short-term injury rate. However, that study did not specify the effect of this parameter on the types and severity of injuries. Nevertheless, all these results demonstrate the necessity of including specific training situations that combine high-intensity actions with an "optimal" number of contacts to promote the optimization of individual and team performance. Finally, data mining processes seem to be a "new" method that may contribute to a better understanding of the underlying interactions between practice dosage and locomotor activity/performance in a competitive context [41].
However, despite a high-dimensional approach including a large number of variables, only a few significant interactions were observed. This indicates that team and individual performance remain difficult to model and identify. Furthermore, contextual factors (social, psychological, motivational, . . .) were not considered in WL quantification. These factors could interfere with the "dose-response" relationships between training "dose" and physiological adaptations or performance (response). Finally, it would be interesting to study these interactions individually. Indeed, an individual's physical capacity profile may alter how the player copes with the physiological stress induced by practice [8].
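The chronic load discussed above (a 4-week rolling average of weekly severe impacts, compared with the 22.6 impacts per week cut-off) can be sketched as follows. The weekly counts are hypothetical, and whether the rolling window includes the current week is an assumption of this sketch rather than a detail given in the text.

```python
def chronic_load(weekly_values, window=4):
    """Rolling mean over the current week and the preceding window - 1 weeks.

    Weeks without a full window of history return None.
    """
    out = []
    for i in range(len(weekly_values)):
        if i + 1 < window:
            out.append(None)
        else:
            chunk = weekly_values[i + 1 - window : i + 1]
            out.append(sum(chunk) / window)
    return out


IMPACT_THRESHOLD = 22.6  # severe impacts (>8G) per week, value reported in the text

# Hypothetical weekly counts of severe impacts for one player.
weekly_impacts = [18, 20, 25, 27, 30, 21, 15, 12]

chronic = chronic_load(weekly_impacts)
above = [c is not None and c > IMPACT_THRESHOLD for c in chronic]

for week, (c, flag) in enumerate(zip(chronic, above), start=1):
    label = "n/a" if c is None else f"{c:.2f}"
    print(f"week {week}: chronic load = {label}, above cut-off = {flag}")
```

In this example the week with only 21 impacts is still flagged, because the chronic value reflects the heavier weeks that preceded it.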

Conclusion and practical applications
The study highlighted the importance of defensive skills for team performance during elite RU matches. Indeed, the number of tackles completed and the number of offensive tackles, especially involving forwards, seemed to be a positive indicator of performance in elite RU, thus corroborating the results of other studies [13,23]. Moreover, forwards presented a greater (p<0.05) activity index (number of coded actions normalized to ball-in-play time) when matches were won, demonstrating the importance of developing a player's ability to repeat high-intensity rugby-specific actions. As for backs, locomotor activity (GPS data) seems to be an indicator of performance. Nevertheless, all these results must be considered cautiously, as they were obtained from analyses based on a single team and were therefore largely influenced by the team's tactical and strategic preferences as well as its mindset. Secondly, this study pointed out that locomotor activity during matches can be summarized by 2 dimensions: one including the total distance travelled and high-speed and high-metabolic running efforts, and a second one corresponding to the number of sprints and fast accelerations. Unfortunately, it was not possible to summarize the different specific actions into a synthetic index, reflecting the influence of positional demands and activity profiles in elite rugby players. Finally, the last purpose of this study was to model the influence of WL over different terms (acute, chronic and up to 8 previous weeks) on match performance. The first method, based on linear correlation analysis, did not reveal significant relationships between WL parameters and performance variables. The use of a threshold-based model, from data mining processes, made it possible to identify the influence of WL parameters on different performance variables.
Thus, the chronic number of severe impacts seemed to be one of the most influential factors of specific performance, and more specifically of the number of offensive duels won. Therefore, specific drills/skills including contacts/collisions seem to increase the player's ability to beat the opposition. However, other studies revealed that a high exposure to collisions may induce neuromuscular fatigue [6,32,33]. This parameter perfectly illustrates the complexity of training, i.e. how to tune WL so that it falls between too much and not enough. To conclude, we think that data mining processes will help scientists and sports practitioners develop a better understanding of the underlying relationships between 2WL and match performance. This should contribute to the ongoing quest to reach peak performance by optimizing training processes.