Quantifying Jump Height Using Markerless Motion Capture with a Single Smartphone

Goal: The countermovement jump (CMJ) is commonly used to measure lower-body explosive power. This study evaluates how accurately markerless motion capture (MMC) with a single smartphone can measure bilateral and unilateral CMJ jump height. Methods: First, three repetitions each of bilateral and unilateral CMJ were performed by sixteen healthy adults (mean age: 30.87$\pm$7.24 years; mean BMI: 23.14$\pm$2.55 $kg/m^2$) on force plates and simultaneously captured using optical motion capture (OMC) and one smartphone camera. Next, MMC was performed on the smartphone videos using OpenPose. Then, we evaluated MMC in quantifying jump height using the force plate and OMC as ground truths. Results: MMC quantifies jump heights with ICC between 0.84 and 0.99 without manual segmentation and camera calibration. Conclusions: Our results suggest that using a single smartphone for markerless motion capture is promising. Index Terms - Countermovement jump, Markerless motion capture, Optical motion capture, Jump height. Impact Statement - Countermovement jump height can be accurately quantified using markerless motion capture with a single smartphone, with a simple setup that requires neither camera calibration nor manual segmentation.


I. INTRODUCTION
The countermovement jump (CMJ) is commonly used to measure lower-body explosive power and is characterised by an initial downward movement of the centre of mass (COM), known as countermovement, before toe-off [1]. Performance assessment with CMJ often involves motion capture and measurement of metrics such as peak velocity and vertical jump height. Traditionally, motion capture is performed using wearable sensors, optical motion capture (OMC) equipment and force plates, which are highly accurate. However, compared to smartphones, they are relatively expensive, not readily portable, and their operation requires some level of technical instruction. In addition, OMC requires physical body markers, which can be affected by skin and clothing artefacts. Moreover, wearable sensors, physical markers, and the awareness of being under observation may alter the real performance of subjects [2,3].
Recent advances in computer vision research have enabled markerless motion capture (MMC) from videos. MMC often relies on human pose estimation (HPE) algorithms such as AlphaPose [4], OpenPose [5], and DeepLabCut [6]. These MMC techniques have shown potential to replace OMC, especially since smartphones are ubiquitous. However, there is still a lot to be done in evaluating the accuracy and usability of MMC.
Existing MMC approaches can be categorised based on capture plane (2D or 3D) and number of cameras (multi- or single-camera). 2D monocular (single-camera) techniques have been used for quantifying limb kinematics during underwater running [7] and sagittal plane kinematics during vertical jumps [8]. However, these works rely on deep learning approaches, where the generalisation ability depends on the size and diversity of the data and the model architecture. For example, trained athletes, casual trainers, and rehabilitation patients will exhibit different performance ranges. Since collecting large quantities of representative data is difficult, we instead take a quantitative approach and focus on ease of deployment and ease of use in practice. The MyJump2 [9] app has been deployed for measuring jump height using a single smartphone. However, it requires manual selection of jump start and end frames. Previous researchers have performed 3D MMC using multiple cameras [10,11]. However, the 3D multi-camera approach requires careful calibration and reconstruction of 3D poses from multiple 2D camera angles, which is not feasible for wide deployment in practice.
Therefore, we evaluated a single-smartphone-based MMC approach for measuring bilateral and unilateral countermovement jump height. Our main contributions are: 1) We use a simple setup with a single smartphone, with no strict requirements on view perpendicularity and the subject's distance from the camera. This is a more realistic application setting where MMC is used outside the lab, without specialised equipment. 2) We show how to exploit gravity as a reference for pixel-to-metric conversion as proposed in [12], removing the need for reference objects or manual calibration. 3) We analyse how accurately MMC measures jump heights compared with OMC and force plates. 4) We discuss situations in which MMC could be potentially useful.

A. Participants
Sixteen healthy adults (mean age: 30.87±7.24 years; mean BMI: 23.14±2.55 kg/m$^2$) volunteered to participate in this study. The dominant foot of each participant was determined based on the foot with which they kick a ball [13]. Each participant signed the informed consent form approved by the Human Research Ethics Committee of University College Dublin with Research Ethics Reference Number LS-C-22-117-Younesian-Caulfield.

B. Tasks
After a five-minute warm-up, each participant performed three repetitions each of bilateral (BL) and unilateral (UL) CMJ while simultaneous motion capture was performed using force plates, OMC, and MMC (Fig. 1).

C. Apparatus
1) Force Plate: AMTI$^1$ force plates sampling at 1000 Hz were used as the first ground truth. To obtain the flight time $T_f$ for each jump, we first identified jump repetitions by selecting force values less than 5% of the force in the stance phase. Then, for each selected repetition, we identified the toe-off and landing instants as sudden force changes relative to the noise of the unloaded force plate, thereby obtaining $T_f$ more precisely (Fig. 2). We obtained the jump height in centimetres as
$$h = 100 \cdot \frac{g\,T_f^2}{8},$$
where $T_f$ is in seconds and $g$ is the acceleration due to gravity in m/s$^2$ [14].
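The flight-time calculation can be illustrated with a short Python sketch. It is only an approximation of the procedure described above: the function names are hypothetical, flight is taken as the longest run of samples below a fixed force threshold (the refinement against unloaded-plate noise is not reproduced), and $g = 9.81$ m/s$^2$ is assumed.

```python
G = 9.81  # m/s^2; assumed value of the acceleration due to gravity


def flight_time_s(force_n, fs_hz, threshold_n):
    """Longest contiguous run of samples below threshold_n, in seconds.

    Simplified stand-in for toe-off/landing detection (an assumption,
    not the refinement step used in the study).
    """
    longest = current = 0
    for f in force_n:
        current = current + 1 if f < threshold_n else 0
        longest = max(longest, current)
    return longest / fs_hz


def jump_height_cm(t_flight_s, g=G):
    """Jump height in cm from flight time: h = 100 * g * T_f^2 / 8."""
    if t_flight_s < 0:
        raise ValueError("flight time must be non-negative")
    return 100.0 * g * t_flight_s ** 2 / 8.0


# Synthetic example: 700 N stance force, 0.5 s of flight sampled at 1000 Hz.
force = [700.0] * 100 + [0.0] * 500 + [700.0] * 100
t_f = flight_time_s(force, fs_hz=1000.0, threshold_n=0.05 * 700.0)
print(t_f, jump_height_cm(t_f))
```

The factor of 8 follows from the jumper spending half of the flight time rising and half falling, so that $h = \frac{1}{2} g (T_f/2)^2$.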
2) Optical Motion Capture: Optical motion capture was performed using four synchronised CODA$^2$ 3D cameras sampling at 100 Hz and synchronised with the force plate. Four clusters, each consisting of four light-emitting diode (LED) markers, were placed on the left and right lateral sides of the thigh and shank (Fig. 3). Moreover, six LED markers were placed on the anterior superior iliac crest (anterior and posterior), and greater trochanter (left and right). Three LED markers were attached to the lateral side of the calcaneus and on the first and fifth metatarsals of the dominant foot. For a motor task with duration $T$ seconds and $K$ tracked joints, CODA outputs a sequence of 3D coordinates $\{(x_i^t, y_i^t, z_i^t) \mid i = 1, ..., K;\ t = 1, ..., 100T\}$ in millimetres, where $z$ is the vertical axis and 100 is the sampling rate in Hz.
$^1$ Advanced Mechanical Technology, Inc (https://amti.biz)
$^2$ Charnwood Dynamics, UK (https://codamotion.com)
3) Markerless Motion Capture: Markerless motion capture was performed in the side view using one Motorola G4 smartphone camera with a resolution of 720p and a frame rate of 30 frames per second (fps). The smartphone was placed on a tripod perpendicular to the dominant foot of the participant. We placed no strict requirements on camera view perpendicularity and distance to the participant. However, we ensured that the camera remained stationary and participants remained fully visible in the camera view.
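To obtain motion data from the recorded videos, we performed 2D HPE using OpenPose [5]. For a video of duration $T$ seconds recorded at 30 fps, the HPE algorithm outputs a sequence $\{(x_i^t, y_i^t, c_i^t) \mid i = 1, ..., K;\ t = 1, ..., 30T\}$, where $(x_i^t, y_i^t)$ are the 2D coordinates in pixels, and $c_i^t \in [0, 1]$ is the probability for joint $i$ in frame $t$.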
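As an illustration of how such output might be turned into a time series, the sketch below parses per-frame OpenPose JSON, in which each detected person carries a flat `pose_keypoints_2d` list of $(x, y, c)$ triples. The function names, the choice of the first detected person, and the confidence cut-off of 0.1 are assumptions made for this example, not details taken from the study.

```python
import json

import numpy as np


def keypoints_from_openpose_json(frame_json):
    """Return a (K, 3) array of (x_px, y_px, confidence) for the first person.

    Returns an empty (0, 3) array when no person was detected in the frame.
    """
    frame = json.loads(frame_json)
    people = frame.get("people", [])
    if not people:
        return np.empty((0, 3))
    flat = np.asarray(people[0]["pose_keypoints_2d"], dtype=float)
    return flat.reshape(-1, 3)


def vertical_series(frames, joint, min_conf=0.1):
    """Pixel y-coordinate of one joint per frame; NaN for missing or
    low-confidence detections (min_conf is an assumed cut-off)."""
    ys = []
    for kp in frames:
        if kp.shape[0] > joint and kp[joint, 2] >= min_conf:
            ys.append(kp[joint, 1])
        else:
            ys.append(np.nan)
    return np.asarray(ys)


sample = '{"people": [{"pose_keypoints_2d": [10.0, 20.0, 0.9, 30.0, 40.0, 0.05]}]}'
kp = keypoints_from_openpose_json(sample)
print(kp.shape, vertical_series([kp], joint=0))
```

Note that image $y$-coordinates grow downwards, so the series would typically be negated before being interpreted as height.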

D. Data Preprocessing
During preprocessing, we performed denoising, segmentation, resampling, and rescaling.
1) Denoising: As shown in Fig. 3(a), occasional false detections in pose estimation appear as spikes on the motion time series. In most cases, these spikes could be removed by smoothing. However, 19 unilateral jumps such as Fig. 3(b) showed uncharacteristic movements and were removed as failure cases. To avoid filtering out important motion data, we performed smoothing of the OMC and MMC time series using z-score smoothing [15], proposed specifically for spike removal in motion sequences, and a second-order Savitzky-Golay [16] (Savgol) filter. The Savgol filter is known to smooth data with little distortion [17], and we chose a window size of 21 to preserve the main maxima and minima of the time series for accurate segmentation.
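A minimal Python sketch of this two-stage denoising is given below. The Savgol stage uses `scipy.signal.savgol_filter` with the window size (21) and polynomial order (2) stated above. The spike-suppression stage is only a hypothetical stand-in for the z-score smoothing of [15]: it flags samples whose deviation from a running median has a large z-score and replaces them by linear interpolation. The threshold of 3 and the median kernel of 5 are assumed values.

```python
import numpy as np
from scipy.signal import medfilt, savgol_filter


def suppress_spikes(y, z_thresh=3.0, kernel=5):
    """Replace isolated spikes by linear interpolation.

    Hypothetical stand-in for z-score smoothing [15]: a sample is treated
    as a spike when its residual from a running median has |z| > z_thresh.
    """
    y = np.asarray(y, dtype=float)
    resid = y - medfilt(y, kernel_size=kernel)
    sd = resid.std()
    if sd == 0:
        return y.copy()
    z = (resid - resid.mean()) / sd
    good = np.abs(z) <= z_thresh
    idx = np.arange(y.size)
    return np.interp(idx, idx[good], y[good])


def smooth_series(y, window=21, order=2):
    """Spike suppression followed by a second-order Savitzky-Golay filter."""
    return savgol_filter(suppress_spikes(y), window_length=window, polyorder=order)


# Synthetic example: a slow oscillation with one false detection at frame 40.
t = np.arange(90)
true = 100 + 20 * np.sin(2 * np.pi * t / 90)
noisy = true.copy()
noisy[40] += 500
print(np.max(np.abs(smooth_series(noisy)[10:-10] - true[10:-10])))
```

Removing the spike before the Savgol stage matters because a polynomial fit over 21 samples would otherwise spread a single outlier across the whole window.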
2) Segmentation and Resampling: Each jump repetition is characterised by a dominant peak corresponding to the maximum vertical height attained by the hip (Fig. 4). Using these peaks as reference, we segmented each jump with a window t secs to either side of each peak, where t is based on exercise duration and capture frequency. This enabled the synchronisation of OMC and MMC based on start and stop times for each task. After segmentation, we upsampled the MMC time series to match the length of the OMC time series using Fast Fourier Transform resampling [18], which minimised distortion.
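The peak-based segmentation and FFT resampling described above might be sketched as follows, using `scipy.signal.find_peaks` and `scipy.signal.resample` (which resamples in the Fourier domain). The function names, the prominence threshold, and the window length are assumptions for illustration. The input is assumed to be hip height with upward positive, so a pixel $y$-series would be negated first.

```python
import numpy as np
from scipy.signal import find_peaks, resample


def segment_jumps(hip_height, half_window, min_prominence):
    """Cut one segment of up to 2*half_window+1 samples around each dominant peak."""
    hip_height = np.asarray(hip_height, dtype=float)
    peaks, _ = find_peaks(hip_height, prominence=min_prominence)
    segments = []
    for p in peaks:
        start = max(p - half_window, 0)
        stop = min(p + half_window + 1, hip_height.size)
        segments.append(hip_height[start:stop])
    return peaks, segments


def upsample_to(segment, target_len):
    """FFT-based resampling of a segment to target_len samples."""
    return resample(segment, target_len)


# Synthetic 30 fps hip trace with three jumps peaking at frames 50, 150, 250.
frames = np.arange(300)
hip = sum(np.exp(-((frames - c) ** 2) / (2 * 8.0 ** 2)) for c in (50, 150, 250))
peaks, segments = segment_jumps(hip, half_window=30, min_prominence=0.5)
# 61 MMC samples at 30 fps span about 2.03 s, i.e. about 203 OMC samples at 100 Hz.
mmc_upsampled = upsample_to(segments[0], 203)
print(peaks, len(segments[0]), len(mmc_upsampled))
```

Because FFT resampling assumes a periodic signal, segments whose start and end values differ strongly can show edge ringing; symmetric windows around the peak help limit this.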
3) Rescaling: Two approaches were taken to rescale MMC from pixels (px) to a metric scale, namely reverse minmax (RMM) and pixel-to-metric (PTM).
Reverse MinMax (RMM) used OMC as a reference to rescale MMC into millimetres. This was done by applying MinMax normalisation to both OMC and MMC, and then rescaling MMC into mm using the scaling factor obtained from OMC. Let vectors $p_{mm}$ and $q_{px}$ represent the OMC (in mm) and MMC (in px) time series respectively. We obtained
$$q^*_i = \frac{q_{px,i} - \min(q_{px})}{\max(q_{px}) - \min(q_{px})},$$
where $i = 1, ..., N$, and $N$ is the length of $q_{px}$. We then obtained $q_{px}$ in mm scale as
$$q_{mm} = q^* \left(\max(p_{mm}) - \min(p_{mm})\right) + \min(p_{mm}).$$
Since RMM requires OMC as reference, it can be used for evaluation purposes only.
Pixel-to-Metric (PTM) conversion was performed based on the 'free fall' of the centre of mass during a vertical jump. PTM uses $g$, the acceleration due to gravity, as reference, as proposed in [12]. From Newton's laws of motion, the motion of a rigid body in free fall is described by
$$d = d_0 + v_0 t + \frac{1}{2} g t^2,$$
where $d_0$ is the initial position in metres (m), $v_0$ is the initial velocity in m/s, and $t$ is the elapsed time in seconds (s). We set the free-fall duration, $T$, to depend on the total vertical displacement of the hip.
Table I notes: Each value is the mean across three repetitions. F: Failure cases (Fig. 3). E: The corresponding FP, OMC, and RMM unilateral jumps are excluded from analysis.
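Both rescaling ideas can be sketched in a few lines of Python. `reverse_minmax` follows the RMM description directly. `pixel_to_metric_factor` is only one plausible reading of gravity-referenced conversion, not necessarily the procedure used in the study: it fits a quadratic to the pixel trajectory of a free-fall segment, takes the fitted acceleration in px/s$^2$, and divides $g$ by it to obtain a factor $R$ in mm/px. The function names and $g = 9.81$ m/s$^2$ are assumptions.

```python
import numpy as np


def reverse_minmax(q_px, p_mm):
    """Rescale an MMC series (px) to the range of an OMC reference series (mm)."""
    q = np.asarray(q_px, dtype=float)
    p = np.asarray(p_mm, dtype=float)
    span = q.max() - q.min()
    if span == 0:
        raise ValueError("MMC series is constant; MinMax is undefined")
    q_star = (q - q.min()) / span                     # MinMax to [0, 1]
    return q_star * (p.max() - p.min()) + p.min()     # reverse with OMC range


def pixel_to_metric_factor(y_px, fps, g=9.81):
    """Estimate R in mm/px from a free-fall segment of pixel positions.

    Assumed approach: fit y = d0 + v0*t + 0.5*a*t^2 and set R = g / |a|.
    """
    y = np.asarray(y_px, dtype=float)
    t = np.arange(y.size) / fps
    a_px = 2.0 * np.polyfit(t, y, 2)[0]               # px/s^2
    if a_px == 0:
        raise ValueError("no measurable acceleration in this segment")
    return 1000.0 * g / abs(a_px)


print(reverse_minmax([10, 20, 30], [100, 400, 250]))

# Synthetic free fall at 30 fps generated with a known scale of 3.43 mm/px.
t = np.arange(15) / 30.0
y = 500 - 800 * t + 0.5 * (9.81 / 0.00343) * t ** 2
print(pixel_to_metric_factor(y, fps=30))
```

In this reading, multiplying a pixel displacement by $R$ gives millimetres, which is also why noise in the pixel series is scaled by the same factor.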

E. Quantifying Jump Height
We measured jump heights directly from the OMC and rescaled MMC time series as the maximum vertical displacement of the fifth metatarsal (small toe). We believe this approach is more straightforward than basing measurements on the flight time of the centre of mass, which may vary based on jump strategy.

III. ANALYSIS AND RESULTS
The jump height reported for each participant is the mean of all three repetitions performed for each task (Table I). MMC jump heights were obtained using the reverse MinMax (RMM) and pixel-to-metric (PTM) approaches as described in Section II-D3. The mean R across all the participants was 3.43 mm/px. In cases of errors like the one shown in Fig. 3, the mean value of R was used.

A. Analysis
We consider all jump repetitions from all participants as individual measurements, thereby recording 6 jumps per participant and 96 jumps in total, of which 77 (48 bilateral and 29 unilateral) were valid and used for analysis. For quantitative comparison, we use the intraclass correlation coefficient [19] (ICC) and Bland-Altman analysis [20] (BA). ICC and BA are often used for comparing new methods of measurements with a gold standard [9,14,21].
1) Intra-class Correlation (ICC): We took the simultaneous capture of each jump repetition by FP, OMC, PTM and RMM each as a rating. Using ICC(2,1), also known as the "two-way random effects, absolute agreement, single rater/measurement" ICC according to the McGraw and Wong [22] convention, we compute the inter-rater ICC for four pairs of methods: OMC vs RMM, OMC vs PTM, FP vs RMM, and FP vs PTM, where FP and OMC are taken as ground truths in each case. The ICC ∈ [0, 1] is a measure of the agreement with the ground truths, where a value closer to 1 is preferred.
We also computed the intra-rater ICCs to obtain the intra-session test-retest reliability of each measuring technique (shown in Table II) across the three repetitions for each participant. We obtained the ICCs using the intraclass_corr function of Pingouin [23].
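For readers who want to see what ICC(2,1) computes, the following NumPy sketch evaluates it from the two-way ANOVA mean squares. It is written from the standard formula and is not Pingouin's implementation. The function name is hypothetical, and the example data are the classic six-target, four-rater table from Shrout and Fleiss, not data from this study.

```python
import numpy as np


def icc2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: array of shape (n_targets, k_raters), e.g. one row per jump
    and one column per measurement method.
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ss_rows = k * ((x.mean(axis=1) - grand) ** 2).sum()   # between targets
    ss_cols = n * ((x.mean(axis=0) - grand) ** 2).sum()   # between raters
    ss_err = ((x - grand) ** 2).sum() - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


shrout_fleiss = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc2_1(shrout_fleiss), 2))
```

Because ICC(2,1) measures absolute agreement, a constant offset between two methods lowers the coefficient even when their measurements are perfectly correlated.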
2) Bland-Altman Plots: The Bland-Altman plots are often used in clinical settings to visualise the agreement between two different methods of quantifying measurements based on bias and limits of agreement (LOA) [24]. The bias $b$ for each MMC measurement technique compared to ground truth is given by the mean of the differences between individual measurements. The LOA is defined as
$$[c_0, c_1] = [b - 1.96\,SD,\ b + 1.96\,SD],$$
where SD is the standard deviation of the differences between the two measurements. Assuming normally distributed differences, approximately 95% of jumps measured with MMC are expected to deviate from the ground truth by a value within the range $[c_0, c_1]$, where a narrower LOA means better agreement with ground truth. We perform Bland-Altman analysis (Fig. 6) using statsmodels [25].
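The bias and limits of agreement can be computed directly, as in the sketch below. The function name is hypothetical, the sign convention (method minus reference) is an assumption, and the sample standard deviation (`ddof=1`) is used. The numbers are made up for illustration and are not results from this study.

```python
import numpy as np


def bland_altman(method, reference):
    """Return (bias, lower LOA, upper LOA) for paired measurements."""
    d = np.asarray(method, dtype=float) - np.asarray(reference, dtype=float)
    bias = d.mean()
    sd = d.std(ddof=1)          # sample SD of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Illustrative jump heights in cm (made-up values).
mmc = [31.0, 29.0, 35.0, 33.0]
omc = [30.0, 30.0, 34.0, 32.0]
bias, c0, c1 = bland_altman(mmc, omc)
print(bias, c0, c1)
```

With these values the differences are 1, -1, 1, 1, giving a bias of 0.5 cm and an SD of 1 cm, so the LOA is [-1.46, 2.46] cm.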

B. Results
In this section, we analyse the level of agreement of MMC with OMC and we put this work in context with similar approaches based on ICC, bias, LOA, and simplicity of setup (Table III).

1) MMC vs OMC:
First, the accuracy of MMC in quantifying jump height is evaluated with OMC as ground truth. Both MMC and OMC are measured using the vertical displacement of the toe. As shown in Table III, both MMC$_{RMM}$ and MMC$_{PTM}$ achieve results comparable with the work of [21], which was also evaluated using OMC equipment. It is worth noting that our PTM approach assumes a simpler setup without manual calibration.
2) MMC vs Force Plate: The jump height measured from the force plates is taken as the main ground truth in this section. As shown in Table III, MMC$_{RMM}$ and MMC$_{PTM}$ fall short of the results achieved with MyJump2 [9], especially during unilateral jumps. This is because the MyJump2 app involves the manual selection of start and end frames of jumps, and also requires subjects to be 1 m away from the camera. In addition, effective usage of MyJump2 may also require a second party holding the camera. On the other hand, our methods are simpler and more convenient, requiring only a tripod stand and one calibrating jump.

IV. DISCUSSION
In this study, we have evaluated 2D markerless motion capture with a single smartphone in quantifying vertical jump height during countermovement jumps. Optical motion capture (OMC) was performed using CODA, and markerless motion capture (MMC) was performed using OpenPose with a single smartphone camera. Jump heights obtained from force plate flight times were used as the first ground truth for evaluating jump height, while OMC was used as the second ground truth. We found that MMC can quantify jump heights with ICC between 0.84 and 0.99 without manual segmentation and camera calibration. For all jumps, the greatest agreement is found between OMC and MMC$_{RMM}$ (LOA [-0.39, 3.70] cm) because Reverse MinMax is performed based on OMC. On the other hand, MMC$_{PTM}$ is more prone to errors (LOA [-3.20, 6.00] cm vs OMC, and [-6.70, 3.10] cm vs FP) since noise in the jump time series is further amplified by the pixel-to-metric conversion factor, R.
Although our proposed methods achieve comparable results, the acceptability of LOA will depend on measures similar to the minimally important difference [26] (MID) in each application context. In order to be acceptable, the LOA should be smaller than the MID. For example, the MID in an elite sports context with high accuracy and precision requirements would be considerably smaller than the MID in recreational athletes.
There are some limitations to our approach. For example, the pixel-to-metric conversion requires a calibrating jump, and movements towards or away from the camera during each task change the pixel-to-metric scale. In general, the main sources of error we identify in MMC are: 1) Video quality. The quality of the video and the amount of clutter in the background affect the confidence of detected keypoints during pose estimation. 2) Video viewpoint. Accurate detection of body parts is affected by video viewpoint. For example, pose estimation sometimes fails when used for unilateral CMJ in the side view (Fig. 3). Future studies will explore other views for the unilateral CMJ. 3) Noise in HPE output. The noise level could be influenced by HPE model accuracy, background clutter, and lighting conditions. 4) Approximations. Preprocessing steps such as smoothing, segmentation, MMC scaling and pixel-to-metric conversion involve approximations, introducing errors. The force plate and OMC are also prone to errors due to human factors. For example, OMC coordinates drop to zero when participants' hands or clothes occlude markers. Force values are also affected if participants step outside the force plates momentarily. In cases where such errors were discovered during data collection, the participant was asked to repeat the jump.

V. CONCLUSION
The results of the analyses in this study suggest that markerless motion capture with a single smartphone is promising. However, its use case will depend on the domain-specific minimally important differences (MID). For example, for applications with very small MID, monocular MMC could provide enhanced feedback and/or augmentation for body-worn sensors and markers. On the other hand, for applications such as measuring countermovement jump height, MMC frame-by-frame tracking accuracy is not critical for the method used in this study. Hence, as shown in this study, 2D monocular MMC could potentially replace sensors and physical markers for such applications.
This study focuses on two variants of one motor task with sixteen participants. Future studies will focus on improving and generalising the techniques used to cover a comprehensive range of motor tasks. In addition, the videos used in this study were captured in the side view. Future studies will consider other views and their effects on capture techniques.