Reduced Cervical Muscle Fat Infiltrate Is Associated with Self-Reported Recovery from Chronic Idiopathic Neck Pain Over Six Months: A Magnetic Resonance Imaging Longitudinal Cohort Study

Background: It is unclear why neck pain persists or resolves, making assessment and management decisions challenging. Muscle composition, particularly muscle fat infiltrate (MFI), is related to neck pain, but it is unknown whether MFI changes with recovery following targeted interventions. Methods: We compared muscle composition quantified from fat-water magnetic resonance images from the C3 to T1 vertebrae in individuals with and without chronic idiopathic neck pain at two time points 6 months apart. Those with neck pain received six weeks of intervention (physiotherapy or chiropractic) after their baseline MRI; at 6 months, they were classified as recovered (≥3 on the 11-point Global Rating of Change scale) or not recovered. Results: At 6 months, both asymptomatic and recovered individuals had decreased MFI compared to baseline (asymptomatic estimated marginal mean difference −1.6%; 95% CI −1.9, −1.4; recovered −1.6; −1.8, −1.4; p < 0.001), whereas those classified as not recovered had increased MFI compared to baseline (0.4; 0.1, 0.7; p = 0.014), independent of age, sex and body mass index. Conclusions: MFI appears to decrease with recovery from neck pain but to increase when neck pain persists. The relationship between cervical MFI and neck pain suggests MFI may inform diagnosis, theragnosis and prognosis in individuals with neck pain. Future development of a clinical test for MFI may assist in identifying patients who will benefit from targeted muscle intervention, improving outcomes.


Introduction
Neck pain is a common problem with a high burden for individuals, healthcare systems and society [1]. Neck pain ranks 11th of 369 health conditions in terms of years lived with disability [2], and when combined with low back pain, it is the leading cause of years lived with disability worldwide [3]. Globally, the years lived with neck pain-related disability have increased by 75% over the last 30 years [4]. The high prevalence rate [5] of neck pain is also increasing [1,6] and is expected to continue to rise, largely due to population growth and ageing [4] and potentially due to increases in overweight and obesity [1]. Interventions that provide long-term relief for neck pain remain elusive. Symptoms commonly recur [7], suggesting that current intervention approaches may not address the source of symptoms. One of the most common types of neck pain, termed non-specific or idiopathic neck pain, is characterised as having no identifiable cause of pain, as radiological investigations typically do not correlate with patient symptoms [8]. Thus, there is a need for investigations to understand the underlying contributors to neck pain so that new diagnostics and management strategies can be designed to target the underlying problem more effectively.
Muscle health is one possible biological mechanism that may affect the onset, persistence, and recovery from neck pain. The cervical muscles are commonly implicated as a source of dysfunction in a wide range of disorders, with anatomical, mechanical or functional associations to the jaw [9], eyes [10] and thoracic spine [11]. Muscle health is typically described by muscle size (cross-sectional area [CSA] or volume measured by imaging) and muscle composition [12]. Muscle fat infiltrate (MFI) is the estimated fat within the muscle, quantified using signal intensities from the water and fat images produced using multi-echo MRI acquisitions (e.g., Dixon technique) [13,14]. Muscle volume and MFI can be used to calculate 'relative muscle volume' representing lean muscle mass: the amount of tissue within the muscle boundaries that can be attributed to muscle mass (without fat or fascial tissue). A better understanding of muscle composition may contribute to more targeted therapies to resolve neck pain.
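As a rough illustration of the quantities defined above, MFI from Dixon images is commonly expressed as a fat signal fraction, fat / (fat + water) × 100, within the muscle boundary, and relative volume as the muscle volume scaled by the non-fat fraction. The sketch below assumes that formulation; the function names and the toy signal values are hypothetical and are not taken from the study's processing pipeline.

```python
def muscle_fat_infiltrate(fat_signal, water_signal):
    """Return MFI (%) as mean fat / (mean fat + mean water) over the muscle voxels.

    Assumption: both inputs list the signal intensities of the same voxels
    inside one muscle boundary, taken from the fat and water images.
    """
    if len(fat_signal) != len(water_signal) or len(fat_signal) == 0:
        raise ValueError("fat and water inputs must be non-empty and equally long")
    mean_fat = sum(fat_signal) / len(fat_signal)
    mean_water = sum(water_signal) / len(water_signal)
    total = mean_fat + mean_water
    if total == 0:
        raise ValueError("total signal is zero; MFI is undefined")
    return 100.0 * mean_fat / total


def relative_volume(volume_mm3, mfi_percent):
    """Return the volume attributable to lean muscle (assumed: volume x non-fat fraction)."""
    return volume_mm3 * (1.0 - mfi_percent / 100.0)


# Toy example with made-up intensities for four voxels.
fat = [20, 30, 10, 20]
water = [80, 70, 90, 80]
mfi = muscle_fat_infiltrate(fat, water)
lean = relative_volume(1000.0, mfi)
```

Under this assumed definition, a muscle can lose relative volume either by shrinking or by accumulating fat at constant size, which is why the two measures are reported together.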
Altered muscle composition has been associated with having neck pain [15]. Greater volume and MFI, with less relative volume, are observed in individuals with neck pain compared to asymptomatic populations [16,17]. Greater cervical MFI is consistently demonstrated in individuals with whiplash-associated disorder (WAD) [18–22] and appears to be localised to the cervical spine and not a generalised increase in muscle fat throughout the body [23,24]. The majority of studies of muscle composition in neck pain investigate individuals with WAD [25] or, when including idiopathic neck pain, limit their populations to females [16,17]. Two studies report muscle composition specifically in individuals with idiopathic neck pain [26,27], though only one of these includes MFI, reporting greater MFI in the multifidus of individuals with idiopathic neck pain compared to asymptomatic controls [27]. Thus, further studies of muscle composition in idiopathic neck pain are needed.
Relationships between chronic pain and MFI appear to lie along a continuum. For example, greater MFI in the cervical multifidus was observed in those with greater disability (>30% neck disability index [NDI]) compared to asymptomatic controls, but those with mild disability (<30% NDI) had similar MFI to controls [24]. This continuum suggests that muscle composition, particularly MFI, might be related to changes in pain or predict pain recovery. However, very few studies on any type of neck pain investigate whether muscle composition changes over time or in relation to changes in pain. One study showed that individuals with WAD had greater CSA in multifidus than asymptomatic controls at baseline and 10 years follow-up and greater CSA in semispinalis cervicis and semispinalis capitis compared to controls at follow-up [28]. However, both groups changed similarly over time (small increase), with the authors reporting this was likely due to ageing [28]. One single-group cohort study of 5 females with chronic WAD showed that MFI in the multifidus might decrease following 10 weeks of supervised exercise [29]. These preliminary findings suggest muscle composition may be one biomarker of neck pain that might be addressed via interventions that lead to decreased symptoms. For example, motor control training has been shown to improve neck pain and disability [30], presumably through an improvement in motor performance that has led to recovery. The lack of longitudinal studies of muscle composition and the diagnostic potential of muscle composition indicate a clear need for investigations to understand variations in muscle composition over time and whether these are related to recovery. This will support the development of diagnostic tests and interventions that include and address muscle composition.
Therefore, the aim of this study was to determine whether muscle composition (MFI, volume and relative volume) changes over six months in individuals with and without chronic idiopathic neck pain and whether changes are associated with recovery from neck pain. We hypothesised that muscle composition would improve (less MFI, greater lean muscle mass represented by relative volume) in individuals with neck pain who recovered over six months.

Materials and Methods
This longitudinal cohort study measured muscle composition in individuals with chronic idiopathic neck pain ≥ 3 months and age- and sex-matched asymptomatic controls at baseline and 6 months. The study population was drawn from two studies investigating the biological effects of physiotherapy (n = 42) and chiropractic (n = 21) interventions in individuals with idiopathic neck pain. Each study included age- and sex-matched asymptomatic controls to investigate normal variability over time (n = 20 and n = 10, respectively). For our analyses, we developed a convolutional neural network (CNN) for automatic muscle segmentation using the baseline scans from these two studies with an additional 24 participants from a third study (16 with idiopathic neck pain and nine controls). This resulted in 83 unique participant baseline scans, of which 70 were used for the training dataset, and 13 that were manually traced by two raters were used for the testing dataset. All participants provided written informed consent, and the studies were approved by the University of Newcastle Human Research Ethics Committee (H-2014-0416, H-2014-0233, and H-2015-0235). The two intervention trials were registered with the Australian New Zealand Clinical Trials Registry (physiotherapy: ANZCTR12614000303640; chiropractic: ACTRN12615000256572).
Participants were recruited from the general community using advertising (paper and electronic notices, social media, and a local research volunteer register and newsletter) and were screened for eligibility by telephone. Eligible participants were aged between 18 and 55 years. Those with neck pain were eligible if their current neck pain was ≥4 out of 10 on a verbal numerical pain rating scale and if their pain interfered with daily activity at least "moderately" over the previous four weeks (question 5 asked verbally from the 12-Item Short-Form Health Survey [31]). They were excluded if headache or dizziness was their primary complaint (though we did not exclude those with occasional headaches related to their neck pain) or if they had (a history of) migraine headaches, trauma/surgery to the neck, diabetes, peripheral vascular disease, inflammatory disease, neurologic conditions, neuropathic pain (score of 10 on the Self-Reported Leeds Assessment of Neuropathic Symptoms and Signs [32]), or referred symptoms past the tip of the shoulder; were receiving workers' compensation; had a history of long-term steroid use; were currently taking anticoagulant medication; or were pregnant/breastfeeding. (Additional participants included in the sample used to develop the CNN were not excluded on the basis of referred symptoms past the tip of the shoulder, and nine of these additional 24 participants had mild radiculopathy.) Asymptomatic participants were excluded if they had any current musculoskeletal pain in any area, if they had sought treatment for neck pain in the previous two years, or if they could not be matched to a participant with neck pain by sex and age within 5 years. All participants needed to be able to undergo an MRI exam (no metallic implants, pacemakers, or claustrophobia, and not pregnant).
Participants with pain were randomised to receive various interventions: manual therapy and exercise [33,34] with or without task-specific training [35] (physiotherapy study), or manual manipulation and exercise or wait-list control (chiropractic study [36]). All intervention participants were assigned to attend six sessions, 45 min in length, once per week for six weeks following their baseline MRI. In the physiotherapy study, interventions included manual therapy consisting of joint mobilisations as described by Maitland et al. [37] to cervical and upper thoracic joints as indicated and individually tailored exercises that included range of motion and stretching to the cervical and/or thoracic spine, postural training and deep cervical flexor training as described by Jull et al. [38,39]. In the chiropractic study [36], intervention participants received high-velocity, low-amplitude thrust manipulations as indicated in the upper thoracic vertebrae using a supine procedure [40] and in the cervical spine using the Gonstead Cervical Chair procedure (described by Bergman and Peterson as the "seated index/pillar push" [40]), with tailored dynamic neuromuscular stabilisation exercises [41], trigger point work, stretching, posture education and task modification. In each study, a single professional performed the interventions for both groups. Participants were not blinded to the treatment they received, though all interventions were described as 'expected to relieve neck pain'.
At six months, participants with pain completed the Global Rating of Change (GROC) score on an 11-point scale by answering the question, "With respect to your neck pain, how would you describe yourself now compared to before you had the intervention on your neck?" The 11-point GROC scale was anchored by 'very much worse (−5)' on the left and 'completely recovered (+5)' on the right, with 'unchanged (0)' in the middle of the scale [42]. A score of 3 points or more was considered "recovered"; ≤2 was classified as "not recovered".
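The classification rule above can be expressed as a small function. This is a minimal sketch of the stated cut-off only; the function name is hypothetical, and the input check assumes integer scores on the −5 to +5 scale.

```python
def classify_recovery(groc_score):
    """Classify a Global Rating of Change score (-5..+5) using the >= 3 cut-off."""
    if not isinstance(groc_score, int) or not -5 <= groc_score <= 5:
        raise ValueError("GROC score must be an integer from -5 to +5")
    return "recovered" if groc_score >= 3 else "not recovered"


# Example: label a handful of made-up six-month scores.
labels = [classify_recovery(score) for score in (-2, 0, 2, 3, 5)]
```

Participants without a six-month GROC score cannot be passed through this rule, which is why they are left out of the outcome groups.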
The following variables contextualised the participant sample: age, sex (male/female), weight (kg using a standard scale: Seca, Model 7621019009), height (cm using a standard stadiometer), body mass index (BMI), physical activity level (Godin Shepherd Leisure-time Physical Activity Questionnaire [43]), and depressive symptoms (Center for Epidemiologic Studies Short Depression Scale [CES-D 10] [44]). In participants with neck pain, we also collected neck disability (Neck Disability Index, NDI [45]), pain duration (months) and pain intensity (100 mm visual analogue scale [VAS] anchored by 'no pain' on the left and 'worst pain imaginable' on the right for current, past 24 h and past four weeks on average [46]). All participants had their neck range of motion measured in flexion, extension and rotation (right and left), using the Cervical Range of Motion instrument (CROM, Performance Attainment Associates, Minnesota, IL, USA [47]), recording the average of three repetitions.
MFI, muscle volume, and relative volume were measured from MR images from the intervertebral disc of C2/3 through the intervertebral disc of T1/2 (Figure 1). MRI was performed on a Siemens Magnetom Prisma 3-tesla scanner utilising a 64-channel head/neck array coil. An axial VIBE (T1-weighted gradient echo) sequence using the two-point Dixon technique (Dixon-VIBE) (TR/TE1/TE2 7.05/2.46/3.69 ms) was acquired with a 320 × 320 mm field of view and 448 × 448 acquisition matrix (0.7 mm in-plane resolution) with a slice thickness of 3 mm. A single slab with 52 slices was acquired from the cephalad portion of C3 through the caudal portion of the T2 vertebral end plate in 6:23 min. Axial slices were aligned parallel to the C2/3 intervertebral disc, allowing MRI slices to perpendicularly intersect muscles. The participant's head was positioned in an approximately neutral position, using the same coil for every study to standardise alignment. A foam pad was placed under the head of the participant for comfort, while additional padding was placed on either side of the head to minimise head movement. The participant was instructed to remain stationary throughout the examination.
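The acquisition parameters above determine the voxel size, and hence how a segmented voxel count converts to a volume in mm³. The following sketch works through that arithmetic; it assumes contiguous slices with no gap and no image interpolation (neither is stated in the text), and the function names are hypothetical.

```python
FIELD_OF_VIEW_MM = 320.0    # in-plane field of view (square)
MATRIX_SIZE = 448           # acquisition matrix (square)
SLICE_THICKNESS_MM = 3.0
N_SLICES = 52


def in_plane_resolution_mm():
    """Edge length of one pixel: field of view divided by matrix size."""
    return FIELD_OF_VIEW_MM / MATRIX_SIZE


def voxel_volume_mm3():
    """Volume of one voxel, assuming square pixels and the stated slice thickness."""
    resolution = in_plane_resolution_mm()
    return resolution * resolution * SLICE_THICKNESS_MM


def mask_volume_mm3(n_voxels):
    """Convert a count of segmented voxels to a volume in cubic millimetres."""
    return n_voxels * voxel_volume_mm3()


def slab_coverage_mm():
    """Cranio-caudal coverage of the slab, assuming contiguous slices."""
    return N_SLICES * SLICE_THICKNESS_MM
```

With these values, the in-plane resolution is about 0.71 mm (reported as 0.7 mm), each voxel is roughly 1.53 mm³, and the slab spans 156 mm.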
To identify each axial slice in relation to the cervical vertebrae, we used a sagittal localiser view to assign individual slices to vertebral levels. Firstly, the slices closest to the midsection of each intervertebral disc were identified. These slices marked the disc spaces and were assigned to the spinal level cephalad to the disc. Lastly, the slices between those that marked disc spaces were assigned to the appropriate spinal level.
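The slice-labelling procedure above can be sketched as follows. This is an illustrative reading of the rule, not the authors' code: it assumes slices are indexed from cephalad to caudal, that each disc slice and the slices between it and the previous disc slice belong to the vertebra cephalad to that disc, and that slices caudal to the last identified disc are left unlabelled. The function name and the example indices are hypothetical.

```python
def assign_levels(n_slices, disc_slices):
    """Label each axial slice with a vertebral level.

    disc_slices: list of (disc_name, slice_index) ordered cephalad to caudal,
    e.g. ("C3/4", 4). The level cephalad to a disc is the part of its name
    before the slash ("C3" for "C3/4", "C7" for "C7/T1").
    """
    levels = [None] * n_slices
    previous_disc = -1
    for disc_name, disc_index in disc_slices:
        if not previous_disc < disc_index < n_slices:
            raise ValueError("disc slices must be increasing and within range")
        cephalad_level = disc_name.split("/")[0]
        # The disc slice itself and every slice since the previous disc
        # are assigned to the vertebra above this disc.
        for slice_index in range(previous_disc + 1, disc_index + 1):
            levels[slice_index] = cephalad_level
        previous_disc = disc_index
    return levels


# Made-up example: 10 slices with three disc slices identified.
example = assign_levels(10, [("C2/3", 0), ("C3/4", 4), ("C4/5", 8)])
```

In this toy case, slice 0 is labelled C2, slices 1 to 4 are labelled C3, slices 5 to 8 are labelled C4, and slice 9 stays unlabelled.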
Two blinded raters (SS, OK) manually segmented the muscles of interest (i.e., left and right levator scapulae, multifidus including semispinalis cervicis (MFSS), semispinalis capitis, splenius capitis including splenius cervicis (SCSC), longus colli and sternocleidomastoid) across the C1-T1 cervical region on all images taken at baseline using anatomical cross-references [12] as previously described [26,27,48]. Both raters had training in cervical spine anatomy and interpreting the muscle boundaries from the MR images. To develop the CNN, all baseline images that had been manually segmented were assigned to a training dataset (n = 70; 45% female; mean age 35.9 years, SD 10.4; BMI 25.1 kg/m², SD 4.1), except the images from 13 participants manually segmented by two raters that were assigned to a testing dataset (n = 26; 62% female; age 38.6, 10.5; BMI 27.9, 5.8). We trained a modified 2D U-Net CNN for image segmentation using the MONAI framework for deep learning in healthcare imaging [49,50]. Model training used an NVIDIA RTX 3090 24 GB graphics processing unit (GPU; NVIDIA, Santa Clara, CA, USA) with the following settings: spatial window batch size = 10, batch size = 1, optimiser = AdamW, loss function = DiceCEloss, weight decay = 0.0001, and learning rate = 0.001. The 2D CNN model was trained on axial slices of the water images using a spatial window size of 368 × 144 × 1. The CNN used in this study is available, open-source, at https://github.com/MuscleMap/MuscleMap (accessed on 20 June 2024). Subsequent to the development of the CNN, all muscle data used in the current study were extracted using the CNN, including data from baseline images.

Data Analysis
As this was a sample of convenience, the sample size was not calculated a priori. CNN segmentation performance was evaluated by comparing its output to the 'ground truth', defined as the mean of the values extracted from the manual segmentations of both blinded raters on the testing dataset (n = 13). CNN segmentation accuracy and reliability were assessed using segmentation metrics (e.g., Sørensen-Dice index), intraclass correlation coefficients (ICC(2,1)), correlation plots and Bland-Altman analyses in the testing dataset (n = 13) using previously described methods [48]. Similarly, we also assessed the interrater reliability of manual segmentation between the two raters. ICCs were interpreted as <0.40 poor, 0.40-0.59 fair, 0.60-0.74 good, and 0.75-1.00 excellent [51]. See Supplementary Table S1 for descriptions of each of the segmentation metrics calculated.
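Two of the agreement measures named above are simple enough to sketch directly. The example below computes the Sørensen-Dice index, 2|A ∩ B| / (|A| + |B|), for two segmentation masks and the Bland-Altman bias with 95% limits of agreement (bias ± 1.96 SD of the paired differences). It is a generic illustration under assumed inputs (masks as sets of voxel coordinates, paired measurements as lists), not the analysis code used in the study.

```python
from statistics import mean, stdev


def dice_index(mask_a, mask_b):
    """Sørensen-Dice overlap between two masks given as sets of voxel coordinates."""
    a, b = set(mask_a), set(mask_b)
    if not a and not b:
        return 1.0  # two empty masks agree perfectly by convention
    return 2 * len(a & b) / (len(a) + len(b))


def bland_altman(method_1, method_2):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    if len(method_1) != len(method_2) or len(method_1) < 2:
        raise ValueError("need at least two paired measurements")
    differences = [x - y for x, y in zip(method_1, method_2)]
    bias = mean(differences)
    spread = 1.96 * stdev(differences)
    return bias, bias - spread, bias + spread


# Made-up example: two small masks and three paired MFI values (%).
overlap = dice_index({(0, 0), (0, 1), (1, 1)}, {(0, 1), (1, 1), (2, 2)})
bias, lower, upper = bland_altman([10.0, 12.0, 14.0], [9.0, 12.0, 16.0])
```

Dice describes spatial overlap of the segmentations, whereas Bland-Altman limits describe agreement in the values extracted from them, so a muscle can score well on one and poorly on the other.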
Participant characteristics at baseline were analysed with descriptive statistics, with potential baseline differences between groups determined using one-way analysis of variance with post-hoc tests adjusted using the least significant difference (for continuous measures) and Pearson's Chi-square for categorical measures. Participant characteristics were analysed including the asymptomatic group, except for variables not measured in the asymptomatic group and for range of motion, where the asymptomatic group would be expected to be different at baseline.
Bonferroni-adjusted estimated marginal means (EMM) from linear mixed regression models for each of MFI, volume and relative volume are used to report group means (SD), determine changes between baseline and 6 months, and determine between-group differences at baseline and at 6 months, accounting for participant group, side (left/right), spinal level, age, sex, body mass index, time (baseline vs. 6 months), and interaction for group × time. Models were conducted for each of the six muscles/muscle groups separately and for all muscles together. As models were analysed by MRI slice, and each participant had data from multiple slices and time points, we included a random effect for the participant in all models. For each model, the assumptions of normality, linearity, homoscedasticity, and independence of residuals were checked and confirmed. p-values < 0.05 were considered statistically significant.
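One way to write the model described above is shown below. The notation is an assumption introduced here for illustration: y is the outcome (MFI, volume or relative volume) for slice j of participant i at time t, the participant random effect is taken to be a random intercept, and the group and level terms stand for sets of indicator coefficients because those factors have more than two categories.

```latex
y_{ijt} = \beta_0
        + \beta_G\,\mathrm{group}_i
        + \beta_T\,\mathrm{time}_t
        + \beta_{GT}\,(\mathrm{group}_i \times \mathrm{time}_t)
        + \beta_S\,\mathrm{side}_{ij}
        + \beta_L\,\mathrm{level}_{ij}
        + \beta_A\,\mathrm{age}_i
        + \beta_X\,\mathrm{sex}_i
        + \beta_B\,\mathrm{BMI}_i
        + u_i + \varepsilon_{ijt},
\qquad u_i \sim \mathcal{N}(0, \sigma_u^2), \quad
\varepsilon_{ijt} \sim \mathcal{N}(0, \sigma^2)
```

In this form, the group × time coefficients carry the question of interest: whether the change from baseline to 6 months differs between the asymptomatic, recovered and not recovered groups.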
All differences between time points or between groups are reported as outputs from the post-hoc tests of the regression models; therefore, reported values are adjusted for age, sex, BMI, spinal level, and side. Within-group changes at six months are adjusted for baseline values. Between-group comparisons account for the two time points. Mixed models were completed using IBM SPSS Statistics, Version 28.0 (Armonk, NY, USA: IBM Corp).

Participants
Participants with neck pain were recruited from May through December 2015, with asymptomatic matched controls subsequently recruited through May 2017. Of 151 volunteers with neck pain screened, 33 enrolled (22 in the physiotherapy study and 11 in the chiropractic study). Reasons for exclusion were previous trauma (e.g., motor vehicle collision) or surgery (29%, n = 43), migraines (15%, n = 22), did not meet pain criteria, usually with pain levels too low (12%, n = 18), age > 55 years (12%, n = 17), radiculopathy (9%, n = 14), declined participation or not contactable after inquiring about the study (7%, n = 10), neuropathic pain or fibromyalgia (5%, n = 8), reports of dizziness of unknown origin (2%, n = 3), currently receiving treatment (1%, n = 2), congenital fused vertebrae (1%, n = 1), diabetes (1%, n = 1), or reason not recorded (8%, n = 12). Two participants were excluded from the intervention studies after their baseline scan (one had an unrelated injury, and the other was deemed not eligible for the assigned intervention by the treating practitioner); their baseline MRI scans are included in the data used to develop the CNN. Asymptomatic volunteers were enrolled when their age was within 5 years of, and their sex matched, a participant with pain.
Twenty-one participants with neck pain enrolled in the physiotherapy study and were randomised to tailored manual therapy + exercise with or without task-specific training; 10 enrolled in the chiropractic study and were randomised to manipulation + exercise or no treatment control. Seven participants with pain are missing their GROC score at six months (six did not return, and one did not complete questionnaires); thus, their data are removed from the current analyses, as they could not be categorised into an outcome group. Seven asymptomatic participants did not return at six months; their baseline scan is included. Characteristics of the included participants are reported in Table 1. Groups were not significantly different in terms of the characteristics listed in Table 1, and these characteristics did not significantly differ for the missing pain participants (n = 7) compared to the two pain groups (Table 1).

CNN Performance
The two-dimensional CNN model training was completed in 30,000 iterations. CNN segmentation accuracy was good to excellent with Sørensen-Dice ≥ 0.73 (range 0.73 to 0.87). We report good to excellent CNN reliability for MFI with ICC(2,1) ≥ 0.708 (range 0.708 to 0.977) except for the right SCM (ICC(2,1) = 0.565), left SCM (ICC(2,1) = 0.482), and the right longus colli (ICC(2,1) = 0.708). CNN reliability was excellent for muscle volume of all muscles with ICC(2,1) ≥ 0.880 (range 0.880 to 0.973). In comparing manual segmentation between the two raters, good to excellent interrater segmentation accuracy with Sørensen-Dice ≥ 0.72 (range 0.72 to 0.87) was observed. We report good to excellent interrater reliability for MFI with ICC(2,1) ≥ 0.633 (range 0.633 to 0.957) except for the right SCM (ICC(2,1) = 0.454) and left SCM (ICC(2,1) = 0.388). Interrater reliability was good to excellent for muscle volume of the individual muscles with ICC(2,1) ≥ 0.669 (range 0.669 to 0.975). Tables, Bland-Altman plots and correlation plots summarising the CNN and interrater segmentation accuracy and reliability are provided in the Supplementary Materials.

Changes in Muscle Composition Over Time
The recovered and asymptomatic groups had reduced MFI at six months compared to baseline (estimated marginal mean [EMM] difference, all muscles analysed together: recovered −1.6%; 95% CI −1.8, −1.4; asymptomatic −1.6; −1.9, −1.4; p < 0.001), whereas the group classified as not recovered had increased MFI compared to baseline (0.4; 0.1, 0.7; p = 0.014). Consistent across regression models for individual muscles, each muscle in the recovered and asymptomatic groups had significantly less MFI at 6 months compared to baseline (EMM differences ranging from −0.5 to −3.0% for recovered and −1.3 to −2.1% for asymptomatic groups, p < 0.001 for all muscles except levator scapula p = 0.012; Table 2, Figure 2). Three muscles in the not recovered group had greater MFI at six months (levator scapula, semispinalis capitis, SCSC, with EMM differences ranging from 0.5 to 1.0%; p ≤ 0.009; Table 2, Figure 2).
Volume was less for all groups at 6 months compared to baseline (EMM difference, all muscles analysed together: not recovered −37.2, Figure 2).
Relative volume was reduced at 6 months compared to baseline in the not recovered group (EMM difference, all muscles analysed together −32.5 mm³; 95% CI −49.5, −15.5; p < 0.001) but not significantly different for the recovered (−15.0; −30.9, 0.8; p = 0.063) and asymptomatic groups (1.0; −10.1, 11.3; p = 0.913). For analyses of individual muscles, relative volume was reduced at six months for all muscles in the not recovered group except for longus colli (EMM differences ranging from −12. 2, Figure 2).

Table 2. Bonferroni-adjusted estimated marginal means (EMM) and mean differences (95% CI) from linear mixed regression models for each muscle for muscle fat infiltrate (MFI), volume and relative volume, accounting for age, sex, body mass index, spinal level, side (left/right), and interaction for group × time. No group × time interaction, but an overall effect for time. ‡ No group × time interaction nor time effects. § Group × time interaction but no overall effect for time.

Differences between Groups
At baseline, the recovered group had greater MFI than the asymptomatic group (EMM difference, all muscles analysed together 3.5%, 95% CI 0.2, 6.8; p = 0.036). There were no other significant between-group differences at baseline or six months in EMMs when analysing all muscles together and accounting for all confounders. Examining the regression models for individual muscles, at baseline, the recovered group had greater MFI compared to the asymptomatic group in the MFSS (EMM difference 5.4%; 95% CI 1.5, 9.3; p = 0.004), semispinalis capitis (3.4; 0.2, 6.7; p = 0.037), and SCSC (5.5; 1.9, 9.1; p = 0.001).

Discussion
This study investigated the relationship between cervical muscle composition on MRI and chronic idiopathic neck pain over a six-month period. Individuals defined as recovered at 6 months (GROC ≥ 3) had less MFI compared to baseline in all muscles, whereas those who were not recovered (GROC ≤ 2) had greater MFI in three of the six muscles investigated (levator scapula, semispinalis capitis and SCSC). For the recovered group, the relative volume increased with no change in overall volume for longus colli, suggesting those who recovered may have increased their muscle mass, particularly in longus colli. At six months in the group that did not recover, muscles with greater MFI (levator scapula, MFSS and SCSC) had less volume and relative volume compared to baseline, suggesting MFI may have accumulated over time, occupying the muscle space. These changes in muscle composition may reduce the capacity to generate or sustain muscle forces, which may affect neck position, leading to anatomical changes that might underpin pain chronicity. There were few between-group differences in muscle composition, though the recovered group had greater MFI than the asymptomatic group in 3 of 6 muscles investigated at baseline and 6 months. Decreased MFI in the recovered group and increased MFI in the not recovered group over time suggest that changes in MFI are related to recovery from idiopathic neck pain. As participants with pain received neck muscle exercise interventions, this relationship between neck pain recovery and MFI reduction suggests a potential mechanism supporting targeted neck muscle training [34].
The majority of participants with neck pain in the current study received a form of treatment that included exercise, with many having deep cervical flexor training [34,39] as part of their treatment program. Changes in muscle composition (that were not reported to the patient) represent a possible mechanism for self-reported improvement, as reduced cervical MFI was observed in the recovered group: those for whom this treatment approach was successful. Specifically, longus colli was observed to have greater relative volume alongside reduced MFI at six months in the recovered group, suggesting an increase in active functional muscle mass. In contrast, other muscles showed reductions in MFI but no changes in relative volume. Longus colli is a muscle that is specifically activated with the deep neck flexor exercise that many participants received [52], suggesting the intervention was successful in affecting the target mechanism. Future research might develop a decision tree to identify patients who may benefit from mechanically focused exercise intervention (perhaps by identifying those with high MFI) and establish a specific intervention approach that includes deep cervical flexor training. If this approach is shown to be effective for reducing symptoms and MFI, it may lead to recommendations for interventions that target MFI and enable MFI to be used as a biomarker for theragnosis and prognosis.
At 6 months follow-up, the recovered group had less MFI compared to baseline, whereas the not recovered group had the same or more than their baseline values. Reduced MFI in the recovered group associated with self-reported recovery following specific muscle training suggests that the underlying pain mechanism in the recovered group may have been predominantly mechanical or nociceptive [53], as this would support the assumption of a successful outcome from targeted muscle training and manual therapy. The lack of self-reported recovery and minimally changed MFI in the not recovered group suggests their underlying predominant pain mechanism was unlikely to be mechanical or nociceptive. Their predominant pain mechanism may have been driven by centrally mediated mechanisms. These findings suggest MFI might be a useful biomarker to predict those who might respond to mechanical treatment focused on muscles, whereas those without excessive MFI might require other interventions, such as psychologically-informed interventions [54]. AI-based studies with larger numbers of participants are underway and aim to establish normative reference values for MFI that account for age, sex, gender, race, and ethnicity. The normative reference dataset of muscle composition across the lifespan will help diagnose pathology, gauge the efficacy of interventions, and develop new outcome measures capable of accurately assessing the impact of change in muscle composition (https://github.com/MuscleMap/MuscleMap, accessed on 20 June 2024).
Unexpectedly, there were few differences in muscle composition between asymptomatic, recovered or not recovered groups at baseline or six months. This might suggest that when individual characteristics are accounted for in a mixed model (i.e., age, sex and BMI), differences between individuals overshadow any group differences [55,56], and thus group differences are not detected. Regarding between-group differences, the recovered group had greater MFI than the asymptomatic group at baseline in levator scapula, semispinalis capitis and SCSC. Despite MFI reduction for those who recovered, the group mean MFI for the recovered group remained greater than the asymptomatic group at six months.
Notably, all of the relationships regarding muscle composition in this study have been analysed with adjustment for age, sex, BMI, spinal level, and side (left/right) in regression models. The models showed that muscle composition for all variables and muscles differed significantly with spinal level (patterns of differences depended on the muscle examined and were related to the muscle anatomy) and side (right side typically slightly greater), so these factors were included in all models. Older age was associated with greater MFI in all cervical muscles and less relative volume (active muscle mass) in MFSS, SCSC and SCM, suggesting that MFI may increase with age, reducing the available muscle mass for head and neck control. The relationship between increased MFI and increased age has been confirmed in multiple studies in asymptomatic individuals in the lumbar spine [57–64] and triceps surae [65] and in individuals with degenerative cervical myelopathy [66,67]. Sex was not associated with MFI, though males had greater volume and relative volume than females, likely due to larger muscles as a result of anthropometrics; males as a group tended to be taller and weigh more. Greater BMI was associated with greater MFI for all cervical muscles studied. However, greater BMI was associated with greater volume and relative volume for only two muscles: the levator scapula and the SCM. Associations between BMI and MFI are likely related to more body fat overall, though the relationship between MFI and self-reported recovery remained when accounting for BMI in the models. As age, sex and BMI were strongly associated with muscle composition for many muscles, they were retained in the models analysing each muscle. These findings are consistent with previous studies in the cervical spine [26,27,68].
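The adjusted model structure described above can be written schematically as follows. This is a sketch only: the fixed effects are those named in the text and in the Table 2 caption, but the random-effects structure (here a single random intercept per participant, u_i) and the indexing (participant i, time point j, measurement location k) are assumptions made for illustration, and the categorical terms (sex, level, side, group) are shown as single coefficients for brevity.

```latex
\begin{aligned}
y_{ijk} ={} & \beta_0
  + \beta_1\,\mathrm{age}_i
  + \beta_2\,\mathrm{sex}_i
  + \beta_3\,\mathrm{BMI}_i
  + \beta_4\,\mathrm{level}_k
  + \beta_5\,\mathrm{side}_k \\
 & + \beta_6\,\mathrm{group}_i
  + \beta_7\,\mathrm{time}_j
  + \beta_8\,(\mathrm{group}_i \times \mathrm{time}_j)
  + u_i + \varepsilon_{ijk},
\qquad u_i \sim \mathcal{N}(0,\sigma_u^2),\quad
\varepsilon_{ijk} \sim \mathcal{N}(0,\sigma^2)
\end{aligned}
```

Here y is the outcome for one muscle (MFI, volume or relative volume). Under this form, the group × time coefficient captures whether change from baseline to six months differs between the asymptomatic, recovered and not recovered groups after adjustment for the covariates.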
Clinically, our findings suggest that muscle composition may change in response to therapies, particularly if the primary pain mechanism is mechanical or nociceptive. All participants with pain received mechanically-focused interventions (manual therapy and exercise), suggesting that those who responded to this therapy regime and recovered might have had more of a mechanical or nociceptive pain mechanism (as compared to a neuropathic or central pain mechanism) [53]. If that hypothesis is accepted, it supports investigating whether MFI may be a biomarker to identify patients who may benefit from manual therapy and exercise, that is, patients with a pain mechanism that is predominantly mechanical or nociceptive.
This study has several limitations. First, the use of GROC to classify recovery is limited, although it is commonly used in clinical research [42] and recommended by research consortia [69]. Defining a GROC cut-off score to classify 'recovery' is somewhat arbitrary. The minimal clinically important change has been reported as 2, 2.5 or 3 [42,70,71], with higher cut-offs shown to better distinguish between improved and not improved patients [72]. Further, patients with less severe symptoms at baseline (like the participants in the current study) typically report smaller change scores [73]; thus, the cut-off score of ≥3 in the current study increases the certainty of true improvement for those assigned to the recovered group. Second, the number of participants in each of the pain groups was relatively small, so there was inadequate statistical power to investigate the effects of the different treatment approaches. Third, participants in the current study had a lengthy history of neck pain (92% had pain ≥ 1 year and 58% had pain ≥ 5 years, Table 1). It is unknown whether MFI might develop rapidly at the onset of pain, as in whiplash-associated disorder [20], or if it is present prior to pain onset.
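The GROC-based classification discussed above can be sketched in a few lines. This is an illustrative sketch, not the study's analysis code: the function names are hypothetical, and the score range of −5 to +5 is an assumption about how the 11-point scale is coded. The second function shows how the number of participants classified as recovered depends on which of the reported cut-offs (2, 2.5 or 3) is chosen.

```python
# Assumed coding of the 11-point GROC scale: integers from -5 to +5.
GROC_MIN, GROC_MAX = -5, 5


def classify_recovery(groc_score, cutoff=3):
    """Label a participant from a GROC score (hypothetical helper).

    A score at or above the cut-off is 'recovered', matching the
    study's >= 3 rule when the default cut-off is used.
    """
    if not GROC_MIN <= groc_score <= GROC_MAX:
        raise ValueError(f"GROC score out of range: {groc_score}")
    return "recovered" if groc_score >= cutoff else "not recovered"


def recovered_counts(scores, cutoffs=(2, 2.5, 3)):
    """Count how many scores are 'recovered' under each cut-off."""
    return {
        c: sum(classify_recovery(s, c) == "recovered" for s in scores)
        for c in cutoffs
    }
```

With integer scores, cut-offs of 2.5 and 3 give the same classification, while a cut-off of 2 also counts participants scoring exactly 2 as recovered, which is why the stricter threshold gives more certainty of true improvement.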
Our results linking reduced MFI with recovery should be viewed with caution. Changes in MFI are unlikely to be a cause of changes in pain and are only one possible physiological mechanism that may occur alongside pain changes. It is likely that reductions in pain severity are the result of complex interactions of multiple factors, such as psychological (e.g., cognitions and emotions) and social (e.g., socioeconomic and cultural) systems [74]. These factors were beyond the scope of this study but warrant examination in future studies that fully characterize these interactions before and after the exercise program. Nevertheless, improvement in clinical function following reduced muscle fat infiltration might be expected through increased muscle functional capacity per unit size [75]. It has been hypothesized that this will improve spinal resilience and/or reduce muscle fatiguability to metabolic demands [74,76]. Future research is warranted to further elaborate the inter-related associations between muscle composition, muscle function and spinal pain. Moreover, longitudinal studies with larger samples are needed to better understand the causal relationship between MFI, head and neck control, and pain. Automated methods of classifying muscle composition [77], as used in the current study, and/or data sharing will be necessary to potentially uncover findings that are masked in smaller samples due to variability between participants. Data sharing will require consideration of ethical and privacy issues [78,79].

Conclusions
This study investigated 31 participants with chronic idiopathic neck pain and 30 controls over six months. Reduced cervical MFI was related to self-reported recovery from chronic neck pain, accounting for age, sex, BMI, spinal level, and side (left/right). The 12 participants who recovered had greater MFI at baseline and greater reductions in MFI over six months compared to the 12 participants who did not recover. Thus, MFI may be a possible biomarker to identify patients expected to recover following intervention.

Supplementary Materials:
The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/jcm13154485/s1, Table S1. Descriptions of interrater segmentation metrics. Table S2. Interrater segmentation metrics for assessment of segmentation performance between two raters in the testing dataset (n = 13). Table S3. Accuracy and reliability of muscle fat infiltration (MFI, %) between two raters were assessed in the testing dataset (n = 13) for human-level interrater reliability. Table S4. Accuracy and reliability of muscle volume (mm³) between two raters were assessed in the testing dataset (n = 13) for human-level interrater reliability. Table S5. Performance of the convolutional neural network (CNN) model segmentations with respect to the ground truth assessed in the testing dataset (n = 13). Table S6. The accuracy and reliability of muscle fat infiltration (MFI) between the convolutional neural network (CNN) model and the ground truth were assessed in the testing dataset (n = 13). Table S7. Accuracy and reliability of muscle volume between the convolutional neural network (CNN) model and the ground truth were assessed in the testing dataset (n = 13). Figure S1. Interrater reliability and accuracy for muscle fat infiltration (MFI) between two raters (R1 and R2) in the testing dataset (n = 13), assessed by correlation and Bland-Altman plots for each muscle or muscle group. Figure S2. Interrater reliability and accuracy for muscle volume (mm³) between two raters (R1 and R2) in the testing dataset (n = 13), assessed by correlation and Bland-Altman plots for each muscle or muscle group. Figure S3. Interrater

Figure 2. Bonferroni-adjusted estimated marginal mean differences between baseline and six months for muscle fat infiltrate (MFI), volume and relative volume. * Indicates a significant difference between baseline and six months (for MFI p ≤ 0.01, volume ≤ 0.032, relative volume ≤ …), adjusted for age, sex, body mass index, spinal level, side (left/right), group, and interaction for group × time.

Table 1. Baseline characteristics of participants.

Table 2. Bonferroni-adjusted estimated marginal means (EMM) and mean differences (95% CI) from linear mixed regression models for each muscle for muscle fat infiltrate (MFI), volume and relative volume, accounting for age, sex, body mass index, spinal level, side (left/right), and interaction for group × time.


Not recovered; Rec = Recovered; Asymp = Asymptomatic. * Post-hoc comparison significant at p < 0.001 ***, p < 0.01 **, or p < 0.05 *. † No group × time interaction, but an overall effect for time. ‡ No group × time interaction nor time effects. § Group × time interaction but no overall effect for time.
