Intra- and interrater reliability of subjective assessment of the drop vertical jump and tuck jump in youth athletes

Objectives: To investigate intra- and interrater reliability of the subjective assessments of filmed DVJ and TJA in youth male and female soccer players and to compare subjective assessment of the DVJ with two-dimensional movement analysis. Design: Cross-sectional study. Participants: 115

Neuromuscular control at landing can be assessed during the drop vertical jump (DVJ) test (Hewett et al., 2005). To make the test clinic friendly, criteria for subjective assessment of knee alignment during the DVJ have been developed (Nilstad et al., 2014; Stensrud et al., 2011). Assessment of reduced or poor knee control has been shown to be associated with increased frontal plane knee angles (Stensrud et al., 2011). Two-dimensional analysis of normalized knee and ankle separation distance is a reliable method to record lower limb axial alignment during the DVJ (Noyes et al., 2005, 2011, 2012). The association between subjective assessment and two-dimensional analysis of normalized knee separation distance during the DVJ is yet to be established. Interrater reliability for the subjective assessment of the DVJ was substantial to almost perfect (kappa 0.52–0.92) when assessed in elite female soccer players (Nilstad et al., 2014) and moderate (kappa 0.45) in youth athletes and non-athletes (Toivo et al., 2018).
The tuck jump assessment (TJA) was developed to assess neuromuscular control and identify lower extremity technique flaws during a demanding plyometric exercise over 10 s (Myer et al., 2008). The interrater reliability when assessing university students was almost perfect (kappa 0.88) using a small sample of 10 participants (Herrington et al., 2013), and poor (ICC 0.47) when 5 raters with different clinical experience assessed 40 participants (Dudley et al., 2013).
The reliability of the DVJ and TJA in youth athletes is unclear. Assessment of adolescents entails special challenges since young athletes might display greater heterogeneity in performance of screening tests due to large variations in physical maturity, fitness and neuromuscular control compared to adults and elite athletes (Quatman-Yates et al., 2012). The aim of the present study was to investigate the intra- and interrater reliability of the subjective assessments of filmed DVJ and TJA in youth male and female soccer players. A secondary aim was to compare subjective assessment of the DVJ with data from two-dimensional movement analysis. Our hypotheses were that assessments of filmed DVJ and TJA would have high intra- but varying interrater reliability, and that subjective and objective assessment of the DVJ would be mainly concordant.

Participants and recruitment
One hundred and fifteen soccer players (66 boys, 49 girls), mean age 14 ± 1 (range 13–16) years, from 8 youth teams were tested. Mean height and body mass of included players were 166 ± 9 cm and 55 ± 10 kg. The players had a mean of 7.4 ± 2.0 years of football experience and 4.3 ± 1.3 football exposures (trainings and matches) each week. All teams had scheduled football training at least 2 sessions per week. All players were informed that to take part in testing they should be able to participate with maximum effort. No players reported injuries at the time of testing, and all were participating regularly in football training. The study was carried out in Östergötland, Sweden. Testing was carried out at the beginning of the second half of the competitive season after the school summer break in August–September 2017. Informed consent was obtained from all players and legal guardians prior to participation. The study was approved by the Swedish Ethical Review Authority: Dnr 2017/294-31.

Assessments
All players were recommended to wear tight shorts, t-shirt, short socks and indoor shoes. Due to their preference, two players performed the DVJ barefoot and five players performed the TJA barefoot. Two physiotherapists served as test leaders. They used standardized instructions and demonstrated the performance of each test before the players made the practice trials. Both tests were filmed with two GoPro Hero5 cameras positioned to analyze movements in the frontal plane (TJA and DVJ) and the sagittal plane (TJA), and the films were used for post-test assessment.

Drop vertical jump
The DVJ was performed as described by Hewett et al. (2005). The player stood on a 30 cm high box with the feet 35 cm apart, dropped down from the box and immediately made a maximum vertical jump and tried to reach an overhead target positioned 2.6 m above. Players performed at least three practice trials prior to testing. Three valid trials were filmed and used for post-test assessment. The first landing phase of the DVJ, i.e. after the drop from the box, from initial contact to next take-off, was used for analysis (Hewett et al., 2005). The participants' ability to control their knees in the frontal plane during the landing phase was assessed using a graded scoring scale ranging from 0 to 2 according to the criteria by Nilstad et al. (2014) (Table 1 and Appendix 1).

Tuck jump assessment
The player jumped repeatedly for 10 s and attempted to lift the knees to hip level (parallel to the ground) during the jump and start a new jump immediately upon landing. Jump and landing technique was assessed according to ten criteria (Table 2 and Appendix 2). A dichotomized grading scale was used (Myer et al., 2008, 2011). Free practice was allowed prior to testing. Practice trials were short to avoid fatigue. One valid trial was filmed and used for post-test assessment.

Video analysis
Each video was viewed in real time and slow motion as many times as needed. The subjective assessments of the DVJ and the TJA from 3 independent raters were used to calculate interrater reliability (Fig. 1). All raters were sports physiotherapists with 13 (rater 1), 19 (rater 2), and 11 (rater 3) years of clinical experience, respectively. All raters were familiar with the tests, and before the assessments the raters discussed the interpretation of the assessment criteria. For the DVJ, interrater reliability was calculated in two separate ways: 1) for the scoring of all assessed trials (115 players × 3 trials, giving a total of 345 assessed trials), and 2) for the jump with the worst technique for each player (115 assessed trials). The latter was chosen since the worst trial of three is usually used for analyses (Nilstad et al., 2014). For the TJA, interrater reliability was also calculated in two ways: 1) for assessments on each separate criterion, and 2) for the total score (0-10 points). Intrarater reliability was calculated based on two assessments at least three weeks apart by raters 1 and 2 and presented as separate intrarater reliability scores for each rater. Intrarater reliability for the DVJ was calculated based on assessments of the trial representing the worst technique, and for the TJA on each separate criterion and the total score.
The trial representing the worst technique, as measured subjectively, was used for further motion analysis in Dartfish Pro Suite 7 performed by rater 1. In Dartfish, the hip, knee and ankle separation distances were calculated based on markings on the greater trochanters, the centers of the patellae and the lateral malleoli at three different time points: T1, initial contact (the frame where the player's feet just touched the ground); T2, maximum knee flexion; and T3, preparation for take-off (Appendix 3) (Noyes et al., 2005, 2012). The exact distance between the hip markers was used to calculate the normalized distance between the knee and ankle markings (Noyes et al., 2005) at the three time points.
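The two DVJ data sets described above (all trials versus the worst trial per player) can be illustrated with a minimal Python sketch. The player labels and scores are hypothetical, and the sketch assumes that the "worst" trial is the one with the highest score on the 0 to 2 scale, since 2 denotes poor control.

```python
from typing import Dict, List


def worst_trial_scores(trial_scores: Dict[str, List[int]]) -> Dict[str, int]:
    """Return the highest (assumed worst) 0-2 score among each player's trials."""
    return {player: max(scores) for player, scores in trial_scores.items()}


# Hypothetical scores from one rater: three filmed trials per player.
example = {
    "player_a": [0, 1, 0],
    "player_b": [2, 1, 1],
    "player_c": [0, 0, 0],
}

# Analysis 1: every assessed trial counts as one rating.
all_trials = [score for scores in example.values() for score in scores]

# Analysis 2: one rating per player, taken from the worst trial.
worst = worst_trial_scores(example)

print(len(all_trials), worst)
```

With 115 players and three trials each, the first data set holds 115 × 3 = 345 ratings per rater and the second holds 115.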
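As a rough illustration of the normalization step, the following Python sketch computes a separation distance relative to the hip separation distance. It assumes that normalization means dividing by the distance between the hip markers and expressing the result as a percentage, and that marker positions are available as 2D image coordinates; the function name and the coordinates are hypothetical and do not reproduce the Dartfish workflow.

```python
from math import dist
from typing import Tuple

Point = Tuple[float, float]  # (x, y) position of a marker in one video frame


def normalized_separation(left: Point, right: Point,
                          left_hip: Point, right_hip: Point) -> float:
    """Separation between two markers as a percentage of hip separation."""
    hip_separation = dist(left_hip, right_hip)
    if hip_separation == 0:
        raise ValueError("hip markers must not coincide")
    return 100.0 * dist(left, right) / hip_separation


# Hypothetical marker coordinates (pixels) at one time point, e.g. T2.
left_hip, right_hip = (0.0, 0.0), (40.0, 0.0)
left_knee, right_knee = (5.0, 50.0), (29.0, 50.0)

nksd = normalized_separation(left_knee, right_knee, left_hip, right_hip)
print(f"Normalized knee separation distance: {nksd:.1f}%")
```

Because the result is a ratio of two distances measured in the same frame, it does not depend on the pixel scale of the video.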

Statistical analysis
Percent agreement between raters and between assessments was calculated. Kappa statistics were used for calculation of reliability of nominal data (TJA separate criteria) and ordinal data (DVJ). Prevalence and bias-adjusted kappa (PABAK) was used for scores with a binary outcome (TJA separate criteria) and weighted kappa (quadratic) was used for scores with three possible outcomes (DVJ) (Gisev et al., 2013; Kottner et al., 2011; Sim & Wright, 2005; Walter et al., 1998). For calculation of reliability on interval data (TJA total score), the intraclass correlation coefficient (ICC) 3.1 (2-way mixed effects model, absolute agreement definition, single measure) was used for intrarater reliability, and ICC 2.1 (2-way random effects model, absolute agreement definition, single measure) was used for interrater reliability (Koo & Li, 2016).
For the measure of correspondence between the subjective assessment of the DVJ and the 2D analysis, the normalized knee separation distances (NKSD) were presented descriptively for players rated as 0, 1 and 2. Additionally, the distribution of NKSD was compared across the three time points of the DVJ using repeated measures ANOVAs and repeated contrast analyses among the players rated as 0, 1 and 2. The NKSD data were mainly normally distributed, with small skewness acceptable for analysis with ANOVA. Rater 1 did the 2D analyses and, correspondingly, her subjective assessments of the DVJ were the comparator.
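The agreement statistics named above can be sketched in Python for one pair of ratings (two raters, or two assessments by the same rater). This is a minimal illustration using the standard textbook definitions, not the software used in the study; the function names and the example ratings are hypothetical, and categories are assumed to be coded as integers starting at 0.

```python
from typing import Sequence


def _check_paired(a: Sequence[int], b: Sequence[int]) -> None:
    """Both rating series must cover the same, non-empty set of trials."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("ratings must be paired and non-empty")


def percent_agreement(a: Sequence[int], b: Sequence[int]) -> float:
    """Share of trials (in percent) given the same score in both series."""
    _check_paired(a, b)
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def pabak_binary(a: Sequence[int], b: Sequence[int]) -> float:
    """PABAK for a binary outcome: 2 * p_o - 1, p_o = observed agreement."""
    return 2.0 * percent_agreement(a, b) / 100.0 - 1.0


def quadratic_weighted_kappa(a: Sequence[int], b: Sequence[int],
                             n_categories: int = 3) -> float:
    """Cohen's kappa with quadratic disagreement weights (scores 0..k-1)."""
    _check_paired(a, b)
    n, k = len(a), n_categories
    observed = [[0.0] * k for _ in range(k)]
    for x, y in zip(a, b):
        observed[x][y] += 1
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    weighted_obs = weighted_exp = 0.0
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2 / (k - 1) ** 2
            weighted_obs += weight * observed[i][j]
            weighted_exp += weight * row[i] * col[j] / n  # chance-expected count
    if weighted_exp == 0:
        raise ValueError("kappa is undefined when all ratings are identical")
    return 1.0 - weighted_obs / weighted_exp


# Hypothetical ratings: one binary TJA criterion and DVJ scores on the 0-2 scale.
tja_rater_1, tja_rater_2 = [1, 0, 1, 1], [1, 0, 0, 1]
dvj_rater_1, dvj_rater_2 = [0, 1, 2, 1, 0, 2], [0, 1, 2, 2, 0, 1]

print(percent_agreement(tja_rater_1, tja_rater_2))
print(pabak_binary(tja_rater_1, tja_rater_2))
print(quadratic_weighted_kappa(dvj_rater_1, dvj_rater_2))
```

The quadratic weights penalize a disagreement between scores 0 and 2 four times as heavily as a disagreement between adjacent scores, which suits an ordinal scale such as the DVJ rating.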
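For the TJA total score, the single-measure, absolute-agreement ICC named in the statistical analysis can be sketched from the mean squares of a two-way layout (players by raters or by assessments). The sketch assumes the standard mean-squares formula for this ICC form; under that assumption the two-way random and two-way mixed models share the same point estimate and differ only in how the result is generalized. The function name and the example scores are hypothetical, and confidence intervals are not computed.

```python
from typing import Sequence


def icc_absolute_agreement_single(ratings: Sequence[Sequence[float]]) -> float:
    """ICC, two-way model, absolute agreement, single measure.

    ratings[i][j] is the score of subject i from rater (or assessment) j.
    """
    n = len(ratings)
    k = len(ratings[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete table with >= 2 subjects and >= 2 raters")

    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares for subjects (rows), raters (columns) and residual error.
    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_total = sum((x - grand_mean) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    denominator = ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    if denominator == 0:
        raise ValueError("ICC is undefined when all ratings are identical")
    return (ms_rows - ms_error) / denominator


# Hypothetical TJA total scores (0-10) for five players from two assessments.
scores = [[3, 4], [5, 5], [2, 3], [6, 5], [4, 4]]
print(round(icc_absolute_agreement_single(scores), 2))
```

Because the absolute-agreement definition includes the rater mean square in the denominator, a systematic offset between raters lowers the coefficient even when the ranking of players is identical.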

Drop vertical jump
Classification of neuromuscular control during the DVJ for all 345 trials and 115 worst trials is presented in Table 3. Intrarater reliability for the subjective assessment of the DVJ was substantial for rater 1 and almost perfect for rater 2. Interrater reliability scores were substantial to almost perfect (Table 4).

Comparisons between subjective and objective analysis of drop vertical jumps
When comparing the subjective assessments of the DVJ with 2D analysis, players rated as 0, 1 and 2 displayed similar NKSD at T1, whereas those rated as 1 or 2 had lower values at the other time points (Table 5). However, the variability between players in NKSD was also large within each group (subjectively rated as 0, 1 and 2). In the repeated measures ANOVA there was a significant main effect with a decline in NKSD between the three time points (T1, T2 and T3) (p < 0.01), and a group effect (p < 0.01), where players rated as 1 and 2 decreased their NKSD between time points whereas players rated as 0 did not change (Fig. 2).

Tuck jump assessment
Intrarater reliability for the TJA total score was moderate for rater 1 and almost perfect for rater 2. For the separate items, intrarater reliability varied from fair to almost perfect (Table 6). Interrater reliability for the TJA total score was moderate. For the separate items, interrater reliability scores varied from poor to almost perfect (Table 6).

Discussion
The main finding of the study was that the subjective assessment of filmed DVJ and TJA in youth athletes was rater dependent. Although the assessments were mainly concordant, the ratings of the same film were not identical, neither between different assessments by the same rater (intrarater reliability) nor between different raters (interrater reliability). This needs to be taken into consideration when using these tests for screening or evaluation purposes. The rater's clinical experience and experience in using the tests are probably prerequisites for consistency in assessments.

Table 1
Assessment of the participants' ability to control their knees in the frontal plane during the landing phase of the drop vertical jump using a graded scoring scale ranging from 0 to 2.
Score | Criteria (Nilstad et al., 2014)
0 | Good control of the knees in the frontal plane during the landing phase, in which the participant displayed proper knee alignment with a straight line from the knees to the mid toes, no obvious valgus motion of either knee and no mediolateral side-to-side movement of the knees.
1 | Reduced control and improper knee alignment, with 1 or both knees moving into a slight valgus position and/or some mediolateral side-to-side movement of the knees.
2 | Poor control and poor knee alignment, with at least 1 knee clearly moving into a substantial amount of valgus (i.e. knee medial to foot) and/or clear mediolateral side-to-side movement of the knee.

Table 2
Assessment of neuromuscular control during the tuck jump assessment according to ten criteria.
No. | Original criteria (Myer et al., 2008) | Additional clarification to criteria
Knee and thigh motion
1 | Lower extremity valgus at landing | Athlete displaying a "knock-kneed" position while in contact with the ground (Myer et al., 2011). Hip, knee and foot not aligned, collapse of the knee inwards (Herrington et al., 2013).
2 | Thighs do not reach parallel to the ground (peak of jump) | Athlete's inability to create enough power to achieve a height at which legs can become properly tucked (Myer et al., 2011).
3 | Thighs not equal side to side (during flight) | Side dominance is often visible when an athlete has one thigh that does not achieve the same height as their contralateral thigh (Myer et al., 2011).
Foot position during landing
4 | Foot placement not shoulder width apart | Feet either closer together or farther apart (Myer et al., 2011).
5 | Foot placement not parallel (front to back) | Often an athlete will drop one foot behind the other while on the ground to help minimize forces on the weaker limb (Myer et al., 2011).
6 | Foot contact timing not equal | The athlete will occasionally change the timing of the foot contacts to protect a weaker limb (Myer et al., 2011). Asymmetrical landing (Herrington et al., 2013).
7 | Excessive landing contact noise | Typically displayed by the athlete through landing with flat feet and is typically accompanied by a lack of knee and hip flexion during the stance phase (Myer et al., 2011).
Plyometric technique
8 | Pause between jumps |
9 | Technique declines prior to 10 s |
10 | Does not land in same footprint (excessive in-flight motion) | The athlete tends to float around the jumping area due to a lack of full body or core control (Myer et al., 2011). Inconsistent point of landing (Herrington et al., 2013).

In the present study, experienced sports physiotherapists performed the assessments. If less experienced test leaders assess the tests, the reliability can be expected to be even lower. In the present study we assessed filmed tests, while the subjective assessment of the DVJ was originally described using real-time observational screening (Nilstad et al., 2014). When assessing tests in real time, the reliability may be expected to be lower. Assessment of filmed tests may be more accurate as the films can be viewed several times in both real time and slow motion and can also be paused. Limitations of using filmed tests include that the video cameras need to be placed straight in front of and at the side of the test subject. If the player does not land in the same footprint during the test, the film is no longer in the exact movement plane. Further, some of the TJA criteria, such as excessive landing noise, can be difficult to assess on film.
In the present study the agreement between assessments and between raters differed for the separate items of the TJA. The intrarater reliability varied from fair to almost perfect and the interrater reliability from poor to almost perfect. Previous research found poor reliability for separate items (Gokeler & Dingenen, 2019; Read et al., 2016). Reliability between two separate test sessions of the TJA in youth soccer players was moderate for the total score, with only a few of the separate criteria showing acceptable reliability (Read et al., 2016). Consequently, caution is needed when solely interpreting the total score. The TJA consists of different factors and the use of the TJA total score, a unidimensional construct, for screening purposes has been questioned (Lininger et al., 2017; Mayhew et al., 2017). The same total score could be obtained by different combinations of scores on separate criteria (Gokeler & Dingenen, 2019), and hence reflect different alterations in movement pattern and neuromuscular control. The criteria used for assessments of the DVJ and TJA leave room for interpretation (Lininger et al., 2017). Clarification of the criteria for the tests would potentially increase the reliability between assessments. Since it is a subjective evaluation, it is difficult to quantify how severe an altered movement pattern needs to be to be marked as reduced neuromuscular control (Dudley et al., 2013). To improve the TJA we integrated both the original instructions and criteria (Myer et al., 2008) and the supplementary instructions for the assessments that have been published later (Herrington et al., 2013; Myer et al., 2011). Further, a subjective evaluation results in a rough estimate of the movement pattern, and even within each score there are variations in performance. For the TJA, we used the original dichotomous scoring system. This might be too simple to assess complex movement patterns, and a modified 3-point ordinal scale might be more useful to distinguish between minor and major flaws (Fort-Vanmeerhaeghe et al., 2017).
One to two fifths of the players were rated as having good control in the DVJ. Players with good control differed in NKSD compared to those rated as having reduced or poor control. Additionally, players rated as having good control maintained their fairly high NKSD across the three time points of the DVJ trial, whereas those rated as 1 and 2 landed with smaller NKSDs and decreased these even more over the three time points. It was also obvious that the NKSD varied within these three groups, suggesting that there is large variation among players. The result implies that it may be valuable to use subjective assessment and 2D analysis in tandem, since neither of the analyses might be sufficient for stand-alone measurements of knee control.
Whereas the DVJ offers greater consistency across trials with standardized jump-landing height, the TJA is more dynamic with greater variability in movement technique across repetitions and trials (Lloyd et al., 2019). In the present population players showed rather large variability in both tests, implying that for youths the measurement error may be rather large. The youth soccer players tested in the present study showed a median of 3–4 flaws during the TJA. This can be compared to the suggested cut-off of 6 or more flawed techniques as an indication of individuals who should be targeted for technique training (Myer et al., 2008). In our cohort, 'lower extremity valgus at landing' and 'thighs do not reach parallel to the ground' were the most common flaws. Some criteria of the TJA focus on asymmetrical movement patterns, which can be caused by an avoidance pattern where the athlete minimizes forces on the weaker limb. In our cohort there were few flaws on these criteria. This is an expected finding since we measured uninjured players, where asymmetry should be less frequent compared to athletes who return to sport after an injury or have ongoing complaints in one extremity. To resemble the players' movement pattern during regular training, they were tested wearing their usual indoor shoes. Testing barefoot or with a standard shoe could provide more comparable results. Laboratory-based screening tests are expensive and time consuming, and therefore not suitable for large-scale screening in the athletic field. Instead, clinic-based tests with simpler analysis methods enable field-based screening of athletes (Myer et al., 2008; Padua et al., 2015; Stensrud et al., 2011). A major flaw in using simple screening tests for evaluation of movement patterns is that motions are not viewed three-dimensionally. Valgus motion visible during a screening test is in fact not a motion restricted to a single plane, but rather also a combination of femoral internal rotation and knee flexion (Donnelly et al., 2012). This may explain the poor ability of screening tests to identify high-risk lower limb mechanics during a sport-specific landing task (Fox et al., 2017).
Maturation influences dynamic valgus alignment (Schmitz et al., 2009; Thompson-Kolesar et al., 2018) and jump-landing performance (Fort-Vanmeerhaeghe et al., 2019). Therefore, the present study contributes valuable data, since there are limited studies on screening tests in youth soccer players. Another strength of this study is that intra- and interrater reliability was calculated on a sufficient sample based on a sample size calculation. Most earlier studies involve smaller samples.
Clinical implications of the present study are that the subjective assessments of the DVJ and the TJA were mainly concordant but varied between raters and between assessments, and this needs to be taken into consideration when using these tests for screening or evaluation purposes. Due to large variation in NKSD between players and across the three groups subjectively rated as having good, reduced or poor knee control during the DVJ, neither subjective nor 2D analysis seems suitable for stand-alone measurements of knee control.

Conclusion
The subjective assessment of filmed DVJ and TJA in youth athletes was rater dependent. For the assessment of the DVJ, the intra- and interrater reliability was substantial to almost perfect. For the assessment of the TJA total score, using a dichotomized grading scale, intrarater reliability was moderate to almost perfect and the interrater reliability was moderate. On a group level, players with different mean normalized knee separation distance were categorized into the three groups subjectively rated as having good, reduced or poor knee control, but within-group variability was large, implying that the subjective assessment should be used in tandem with 2D analysis.

Trial registration
ClinicalTrials.gov identifier: NCT03251404.

Ethics approval and consent to participate
The study was approved by the regional ethical review board in Linköping, Sweden: Dnr 2017/294-31. Players and their legal guardians received written information about the study and gave written informed consent before study commencement. Players depicted in the additional files specifically consented to their pictures being shown in research presentations.

Table 6 notes: a Values are n (percent) of players rated with each flaw. b For criteria 1-10: percent agreement (PABAK, prevalence and bias-adjusted kappa; 95% CI); for TJA total score and for number of jumps: ICC (95% CI).

Fig. 1. Flow diagram of the intra- and interrater reliability assessments and the comparison between subjective assessment and two-dimensional motion analysis. Abbreviations: DVJ, drop vertical jump; NKSD, normalized knee separation distance; TJA, tuck jump assessment.

Fig. 2. Distribution of normalized knee separation distances across the three time points of the drop vertical jump in players subjectively rated as 0 (good control), 1 (reduced control) and 2 (poor control). Values are mean and 95% confidence intervals. Time points are T1, initial contact; T2, maximum knee flexion; and T3, preparation for take-off. Abbreviations: CI, confidence interval; DVJ, drop vertical jump; NKSD, normalized knee separation distance.

Table 3
Classification of neuromuscular control during the drop vertical jump based on all trials and for the trial representing the worst technique for each player.
Values are n (%). All trials: total n = 345; trial representing the worst technique: total n = 115.

Table 4
Intra- and interrater reliability of the assessment of neuromuscular control during the drop vertical jump.
a Weighted kappa (quadratic). b The trial representing the worst technique for each player.

Table 5
Distribution of normalized knee separation distances in relation to subjective rating of drop vertical jumps.

Table 6
Scoring and intra- and interrater reliability of the assessment of neuromuscular control on each separate criterion and the total score of the tuck jump assessment.