Modernising tactile acuity assessment; clinimetrics of semi-automated tests and effects of age, sex and anthropometry on performance

Background. Reduced tactile acuity has been observed in several chronic pain conditions and has been proposed as a clinical indicator of somatosensory impairments related to the condition. As some interventions targeting these impairments have resulted in pain reduction, assessing tactile acuity may have significant clinical potential. While two-point discrimination threshold (TPDT) is a popular method of assessing tactile acuity, large measurement error has been observed (impeding responsiveness) and its validity has been questioned. The recently developed semi-automated ‘imprint Tactile Acuity Device’ (iTAD) may improve tactile acuity assessment, but clinimetric properties of its scores (accuracy score, response time and rate correct score) need further examination.
Aims. Experiment 1: To determine inter-rater reliability and measurement error of TPDT and iTAD assessments. Experiment 2: To determine internal consistencies and floor or ceiling effects of iTAD scores, and investigate effects of age, sex, and anthropometry on performance.
Methods. Experiment 1: To assess inter-rater reliability (ICC(2,1)) and measurement error (coefficient of variation (CoV)), three assessors each performed TPDT and iTAD assessments at the neck in forty healthy participants. Experiment 2: To assess internal consistency (ICC(2,k)) and floor or ceiling effects (skewness z-scores), one hundred healthy participants performed the iTAD’s localisation and orientation tests. Balanced for sex, participants were equally divided over five age brackets (18–30, 31–40, 41–50, 51–60 and 61–70). Age, sex, body mass index (BMI) and neck surface area were assessed to examine their direct (using multiple linear regression analysis) and indirect (using sequential mediation analysis) relationship with iTAD scores.
Results. Mean ICC(2,1) was moderate for TPDT (0.70) and moderate-to-good for the various iTAD scores (0.65–0.86). The CoV was 25.3% for TPDT and ranged from 6.1% to 16.5% for iTAD scores. Internal consistency was high for both iTAD accuracy scores (ICC(2,6) = 0.84; ICC(2,4) = 0.86). No overt floor or ceiling effects were detected (all skewness z-scores < 3.29). Accuracy scores were only directly related to age (decreasing with increasing age) and sex (higher for men).
Discussion. Although reliability was similar, iTAD scores demonstrated less measurement error than TPDT, indicating a potential for better responsiveness to treatment effects. Further, unlike previously reported for TPDT, iTAD scores appeared independent of anthropometry, which simplifies interpretation. Additionally, the iTAD assesses multiple aspects of tactile processing, which may provide a more comprehensive evaluation of tactile acuity. Taken together, the iTAD shows promise in measuring tactile acuity, but patient studies are needed to verify clinical relevance.


2012), sex, and anthropometry (e.g., body mass index (BMI), waist-hip ratio, surface area of the tested body part) (Falling & Mani, 2016b; Peters, Hackeman & Goldreich, 2009), which further complicates interpretation of results. Although moderate to good intra-rater and inter-rater reliability has been reported, large measurement error has been observed (Catley et al., 2013), making it difficult to detect treatment effects. Correspondingly, TPDT was the least responsive sensibility test following median nerve injury and repair (Fonseca et al., 2018; Jerosch-Herold, 2003), as well as the least responsive to improvements in hand function after surgery (Fujimoto & Kon, 2016).
A variety of tests have recently been developed to improve tactile acuity assessment, such as the two-point orientation test (Tong, Mao & Goldreich, 2013), point-to-point test (Adamczyk et al., 2016), tactile acuity charts (Bruns et al., 2014), grating orientation task (Van Boven et al., 2000), and two-point estimation task (Adamczyk et al., 2019b; Zimney et al., 2020). Additionally, technological developments instigated semi-automated tests which may be less affected by inter-rater variability (Guemann et al., 2019; Rinderknecht et al., 2019; Hoffmann et al., 2018). An added benefit of these semi-automated tools is the potential for independent sensory training. This may be of clinical relevance, given that treatment success in manual sensory discrimination interventions could be limited by the need for caregiver involvement during at-home training (Ryan et al., 2014). However, despite neck pain being ranked among the top five leading causes of years lived with disability globally (Global Burden of Disease Study 2013 Collaborators, 2015), only a limited number of these novel procedures have been applied to the neck (Zimney et al., 2020; Harvie et al., 2017; Morrow & Ziat, 2018; Adamczyk et al., 2019a). As such, the body of knowledge about tactile acuity in neck pain appears limited compared to other painful conditions (Luedtke & Adamczyk, 2017). The development of improved tactile acuity assessment at the neck therefore has the potential to elucidate mechanisms of neck pain, inform development of new treatment strategies, and guide clinical decision making.
One recently developed tool designed to assess tactile acuity at the neck is the 'Imprint Tactile Acuity Device' (iTAD) (Olthof et al., 2021). The iTAD is a semi-automated device that uses single and successive vibrotactile stimuli to quantify absolute and relative tactile localisation performance (for details see elsewhere (Olthof et al., 2021)). As stimulus administration is automated and does not require synchronicity between locations, the iTAD may overcome some complications associated with TPDT. Despite some initial design issues, the iTAD prototype has shown comparable intra-rater reliability to existing tests (Olthof et al., 2021), suggesting prospective utility. In this manuscript, we report two experiments that investigate the clinimetric properties of an updated version of the iTAD. Experiment 1 aimed to determine inter-rater reliability and measurement error of both iTAD and TPDT assessments, the latter currently being the most reliable tactile acuity assessment at the neck (Harvie et al., 2017). Experiment 2 aimed to quantify internal consistency and identify floor or ceiling effects of the iTAD scores, and determine their relationship with age, sex, BMI and neck surface area.

Participants
Using a convenience sample, individuals without current pain and neurological symptoms as well as without a history of persistent (i.e., >3 months) pain and neurological symptoms in the past five years were recruited from the general public.

Assessors
Three final-year physiotherapy Master's students each performed iTAD and TPDT assessments after receiving approximately one hour of training for each procedure. Test instructions and performance were standardised using a testing protocol. Training included approximately 20-30 min of instructions on the testing protocol and about 30-40 min of practising the protocol. Each assessor practised each protocol five times and underwent each assessment at least once.

TPDT assessment
A digital caliper (Renegade industrial, carbon fibre, RCFVC150) was used to establish TPDT, utilising the two arms and pressure of its own weight as tactile stimuli. Contact area of the tip of each arm was approximately 0.25 mm × 0.5 mm and stimulus duration was <1 s. During TPDT assessment, participants were seated with their forehead resting on a table in front of them. On the dominant side, TPDT was measured in a cranio-caudal direction with the caudal arm of the caliper stationary at 15 mm lateral to the spinous process of C7.
Similar to a previously published procedure (Luedtke et al., 2018), a two-alternative forced-choice (one or two points) staircase method was used, alternating two ascending and two descending runs (see Fig. 1). The caliper distance started at 15 mm, increasing in 5 mm steps. Steps were reduced to 2 mm for the subsequent runs. To avoid guessing, three consecutive reports of either one or two points indicated a reversal. A 10 mm step was added in the direction of the completed run before running the reversed direction. The TPDT was calculated by averaging the scores of the four reversals, expressed in millimetres, with larger distances indicating poorer tactile acuity.

iTAD assessment
iTAD prototype. The iTAD consists of a wearable neoprene collar containing twelve vibrotactile stimulators (∼200 Hz with 0.75 g vibration amplitude), arranged in three rows of four (see Fig. 2). A wirelessly connected tablet operates the stimulators and records the user's responses. After fitting and familiarisation, two tactile acuity tests were performed: the localisation test, which measures the ability to localise the vibrations (one second stimulus duration), and the orientation test, which measures the ability to determine the orientation of two successive adjacent vibrations relative to each other (0.7 s stimulus duration each). Accuracy scores (i.e., percentage correct) for each test, and the overall score (i.e., mean of both tests), were calculated with higher scores indicating better tactile acuity. For a full description of the iTAD prototype and its assessment procedures, see elsewhere (Olthof et al., 2021).
Changes to the prototype. After development of the prototype, the internode distance was reduced to 32.5 mm (centre-to-centre) between all rows and columns. Additionally, a layer of foam was placed in the collar to aid fitting consistency. For the localisation test, the number of trials was increased from 48 to 72, delivered in six series of twelve. For the orientation test, the number of trials was increased from 48 to 64, delivered in four series of sixteen. Furthermore, trials were block randomised within each series, alternating between sides of the neck.
Additional scores. In order to better quantify tactile acuity, two new scores were added. For each test, and the overall score, (average) response time was recorded in milliseconds with lower scores indicating faster responses. Additionally, a rate correct score was calculated (= (correct responses)/(response times in minutes)), quantifying the number of correct responses per minute of response activity (Vandierendonck, 2017) with higher scores indicating better tactile acuity. As the rate correct score integrates response time and accuracy, it accounts for individual speed-accuracy trade-off strategies and has been suggested to provide a better estimate of perceptual performance (Vandierendonck, 2017).
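As an illustration of how the accuracy score and rate correct score relate, the following minimal Python sketch computes both from trial-level data. It assumes that "response times in minutes" means the summed response time over all trials of a test; the function names and the example values are hypothetical and are not taken from the iTAD software.

```python
def accuracy_score(n_correct, n_trials):
    """Percentage of trials answered correctly."""
    return 100.0 * n_correct / n_trials


def rate_correct_score(n_correct, response_times_ms):
    """Correct responses per minute of response activity.

    Assumption: the denominator is the sum of all response times
    (correct and incorrect trials), converted from ms to minutes.
    """
    total_minutes = sum(response_times_ms) / 60_000
    if total_minutes <= 0:
        raise ValueError("total response time must be positive")
    return n_correct / total_minutes


# Hypothetical localisation test: 72 trials, 54 correct, 2000 ms per response.
times_ms = [2000] * 72
print(accuracy_score(54, 72))            # percentage correct
print(rate_correct_score(54, times_ms))  # about 22.5 correct responses per minute
```

Two participants with the same accuracy score would obtain different rate correct scores if one responded more slowly, which is how the score reflects speed-accuracy trade-offs.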

Experiment 2: Internal consistency, floor and ceiling effects, and relationship with age, sex and anthropometry
Design and procedure
Using a cross-sectional design, the internal consistencies of the localisation and orientation accuracy scores were investigated. Additionally, floor and ceiling effects of all iTAD scores were assessed, as well as their relationship with age, sex, BMI and neck surface area. In one session, participants performed both iTAD tests after age, sex, and anthropometric measures (see 'Anthropometric measurements') were recorded. Measures were taken by a single assessor in a private, quiet room. The assessor had several hours of prior experience performing iTAD assessments.

Participants
Recruitment and selection criteria were identical to Experiment 1. However, participants formed a separate cohort with no overlap.

iTAD tests
For procedure of the iTAD tests, see 'iTAD assessment'.

Anthropometric measurements
After weight (kg) and height (m) were recorded, BMI was calculated (weight/height²). Using a tape measure, the distance from the caudal aspect of the external occipital protuberance to the spinous process of C7 was measured to quantify neck length (cm). Neck circumference (cm) was measured halfway along the neck length measurement, placing the tape measure horizontally around the neck. Using these measurements, the posterior neck surface area was estimated (neck length × (neck circumference/2)).
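The two anthropometric formulas above can be written out as a short Python sketch. The function names and the example participant are hypothetical; only the formulas themselves come from the text.

```python
def bmi(weight_kg, height_m):
    """Body mass index in kg/m^2: weight divided by height squared."""
    return weight_kg / height_m ** 2


def posterior_neck_surface_area(neck_length_cm, neck_circumference_cm):
    """Estimated posterior neck surface area in cm^2.

    Neck length multiplied by half the neck circumference, i.e. the
    posterior half of the neck approximated as a rectangle.
    """
    return neck_length_cm * (neck_circumference_cm / 2)


# Hypothetical participant: 80 kg, 1.80 m, neck length 14 cm, circumference 38 cm.
print(round(bmi(80.0, 1.80), 1))                # 24.7
print(posterior_neck_surface_area(14.0, 38.0))  # 266.0
```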

Statistical analysis
Internal consistency was assessed by calculating the inter-relatedness of accuracy scores between series within each test, using ICC model 2,k (i.e., two-way random, absolute agreement, average measures). The ICC(2,k) was chosen over Cronbach's alpha, the equivalent of the ICC(3,k) (i.e., two-way mixed, consistency, average measures), to include absolute differences between series. Although various cut-offs are proposed, most recommend 0.7–0.9 for high internal consistency (Taber, 2018).
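To make the difference between the two coefficients concrete, the sketch below computes ICC(2,k) and ICC(3,k) from a participants-by-series table using the standard two-way ANOVA mean squares (Shrout and Fleiss formulation). It is a minimal pure-Python illustration, not the analysis code used in this study, and the example scores are invented.

```python
def two_way_mean_squares(scores):
    """Mean squares for rows (participants), columns (series) and error.

    scores: list of n rows, each holding k series scores for one participant.
    """
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_error / ((n - 1) * (k - 1))
    return msr, msc, mse, n


def icc_2k(scores):
    """Absolute agreement, average measures: penalises shifts between series."""
    msr, msc, mse, n = two_way_mean_squares(scores)
    return (msr - mse) / (msr + (msc - mse) / n)


def icc_3k(scores):
    """Consistency, average measures (numerically equal to Cronbach's alpha)."""
    msr, _, mse, _ = two_way_mean_squares(scores)
    return (msr - mse) / msr


# Hypothetical accuracy scores (%) of four participants over three series.
base = [[50, 52, 51], [60, 61, 63], [70, 72, 71], [80, 79, 81]]
# Same data, but every participant scores 10 points higher in the last series.
shifted = [[a, b, c + 10] for a, b, c in base]

print(icc_2k(base), icc_3k(base))
print(icc_2k(shifted), icc_3k(shifted))
```

In this example the systematic 10-point shift leaves ICC(3,k) unchanged but lowers ICC(2,k), which is the sense in which ICC(2,k) includes absolute differences between series.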
For the accuracy scores, floor and ceiling effects were considered present if >15% of the participants scored within either the highest or lowest 20% of the scale (Terwee et al., 2007). However, such an assessment would not be adequate for response times or rate correct scores, as their scales have no limit on one end and scores are (near) impossible at the other. Therefore, floor and ceiling effects for all iTAD scores were assessed by calculating z-scores for the skewness of their distribution (i.e., the skew value divided by its standard error) (Kim, 2013; Ho & Yu, 2015). Floor and ceiling effects may be present with an absolute z-score >1.96 in small (n < 50) or >3.29 in medium-sized (50 < n < 300) samples (Kim, 2013; Ho & Yu, 2015).
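Both checks can be sketched in a few lines of Python. The sketch assumes the sample-adjusted (Fisher-Pearson) skewness and its standard error as commonly reported by statistics packages, treats the 20% bands as inclusive, and treats n = 50 as a medium-sized sample; these choices, the function names and the example data are assumptions for illustration only.

```python
import math


def skewness_z(values):
    """Skewness divided by its standard error."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    skew = n / ((n - 1) * (n - 2)) * sum(((x - mean) / sd) ** 3 for x in values)
    se_skew = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    return skew / se_skew


def skewness_z_threshold(n):
    """Absolute z-score cut-off by sample size (n = 50 treated as medium)."""
    if n < 50:
        return 1.96
    if n < 300:
        return 3.29
    raise ValueError("larger samples are not covered by these cut-offs")


def floor_ceiling_flags(scores, scale_min=0.0, scale_max=100.0,
                        band=0.20, limit=0.15):
    """Return (floor, ceiling): True if more than `limit` of the scores
    fall in the lowest or highest `band` of the scale."""
    span = scale_max - scale_min
    n = len(scores)
    low_share = sum(x <= scale_min + band * span for x in scores) / n
    high_share = sum(x >= scale_max - band * span for x in scores) / n
    return low_share > limit, high_share > limit


# Hypothetical right-skewed response times (ms) from ten participants.
times = [900, 950, 1000, 1000, 1050, 1100, 1100, 1200, 1500, 2600]
z = skewness_z(times)
print(z, abs(z) > skewness_z_threshold(len(times)))

# Hypothetical accuracy scores: 20% of participants score 95%, the rest 50%.
print(floor_ceiling_flags([95] * 20 + [50] * 80))  # (False, True)
```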
In order to study the direct multivariate relationships of age, sex, BMI and neck surface area with the localisation and orientation accuracy score, multiple linear regressions (enter models) were performed. Furthermore, to estimate the potential indirect effects of age and sex through BMI and/or neck surface area, sequential mediation analyses were performed using the SPSS extension PROCESS (model 6; 5000 bootstrapped samples) as a secondary analysis. Mediation analyses for all other iTAD scores were performed as exploratory analyses. Regression models are expressed in adjusted explained variance (adjusted R²). For all relationships, both the mean unstandardized regression coefficient (b) and the semi-partial correlation (sr) are provided. Indirect effects are expressed in percentage mediation (P_m).

Sample size
To examine internal consistency, a minimum of 100 participants is recommended (Terwee et al., 2007). To explore potential floor and ceiling effects, a minimum of 50 participants is recommended (Terwee et al., 2007). For the multiple linear regressions, 100 participants were needed to find a medium sized (f² = 0.15) prediction model using four predictors with a Bonferroni corrected p-value of 0.025 and 80% power. Taken together, the sample size was set at 100 participants, with 10 participants of each sex in each of five age brackets (18-30, 31-40, 41-50, 51-60 and 61-70).

Experiment 1: Inter-rater reliability and measurement error
Forty individuals (25 male) participated, with a mean (SD) age of 24.1 (4.5) years. One participant was left hand dominant and the others right hand dominant. Mean scores, inter-rater reliabilities and measurement errors are displayed in Table 1. Inter-rater reliability was good for iTAD orientation accuracy score, overall accuracy score, and all response times. All other scores displayed moderate inter-rater reliability. The CoV was <10% for all iTAD response times and >20% for TPDT. All other scores had a CoV of 10-20%.
For all scores, the SDC is presented with varying confidence intervals (80% to 95%) in Table 2. Each can be used to assess the chance that an observed change in score, when larger in either direction, could reflect measurement error: (100% − confidence interval)/2 (i.e., <10%, <7.5%, <5% and <2.5% respectively). For example, a change of +16.2% in localisation accuracy score would have a 7.5-10% chance of being a result of measurement error, whereas a change of +21.3% would have a 2.5-5% chance.
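The logic of reporting the SDC at several confidence levels can be sketched as follows. The sketch assumes the conventional formula SDC = z × √2 × SEM, which is not spelled out in this section, and uses a hypothetical SEM value; it does not reproduce the values in Table 2.

```python
import math
from statistics import NormalDist


def sdc(sem, confidence):
    """Smallest detectable change for a two-sided confidence level.

    Assumption: SDC = z * sqrt(2) * SEM, with z the standard normal
    quantile for the chosen confidence level (e.g. 1.96 for 95%).
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * math.sqrt(2) * sem


def one_sided_error_chance(confidence):
    """Chance that a change beyond the SDC in one given direction
    reflects measurement error: (100% - confidence) / 2, as a fraction."""
    return (1 - confidence) / 2


hypothetical_sem = 5.0  # in score units; illustrative only
for level in (0.80, 0.85, 0.90, 0.95):
    print(level, round(sdc(hypothetical_sem, level), 2),
          round(one_sided_error_chance(level), 3))
```

An observed change that exceeds the SDC at one confidence level but not at the next one up falls between the two corresponding error chances, which is how the worked example above arrives at ranges such as 7.5-10%.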

Experiment 2: Internal consistency, floor and ceiling effects, and relationship with age, sex and anthropometry
One hundred individuals participated, with ten of both sexes per predetermined age bracket. Ten participants were left-hand dominant, 88 right-hand dominant and two were ambidextrous. Mean (SD) BMI was 26.4 (4.6) kg/m² and mean (SD) neck surface area was 261.0 (48.1) cm². Mean (SD) duration was 03:06 (00:25) minutes for the localisation test and 03:38 (00:28) for the orientation test.

Floor and ceiling effects
None of the accuracy scores had >5% of participants scoring in either the highest or lowest 20%. Only localisation response time had an absolute skewness z-score >1.96 (z = +2.73), which remained below the 3.29 cut-off for medium-sized samples.
For the orientation accuracy score, both age (b = −0.39, sr = −0.36, p < 0.01) and sex (b = 6.69, sr = 0.19, p = 0.04) contributed significantly to the model, whereas BMI (p = 0.71) and neck surface area (p = 0.94) did not. This indicates that, on average, men scored 6.69 percentage points higher than women, and that scores decreased by 0.39 percentage points for each year of age.
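As a worked illustration of how these coefficients combine, the sketch below computes the expected difference in orientation accuracy between two people who differ in age and sex but have the same BMI and neck surface area. The coefficients are those reported above; the coding (female = 0, male = 1), the function name and the two example people are illustrative, and the result is a model-based average, not a prediction for any individual.

```python
B_AGE = -0.39  # percentage points per year of age
B_SEX = 6.69   # percentage points, male (1) relative to female (0)


def expected_difference(age_a, male_a, age_b, male_b):
    """Expected orientation accuracy of person A minus person B,
    holding BMI and neck surface area equal."""
    return B_AGE * (age_a - age_b) + B_SEX * (male_a - male_b)


# A 60-year-old woman compared with a 30-year-old man.
diff = expected_difference(60, 0, 30, 1)
print(round(diff, 2))  # -18.39 percentage points
```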
The mediation analyses indicated several significant relationships between demographic and anthropometric variables (see Fig. 3). However, for the localisation accuracy score, the total indirect effects of age (P_m = 0.16, p > 0.05) and sex (P_m = 0.06, p > 0.05) through BMI and neck surface area were non-significant. Similarly, the total indirect effects of age (P_m = 0.02, p > 0.05) and sex (P_m = 0.04, p > 0.05) were also non-significant for the orientation accuracy score. Additionally, all individual indirect effects were non-significant for both tests. This indicates that BMI and/or neck surface area did not significantly mediate the effects of age or sex for either accuracy score. Scatterplots of localisation and orientation accuracy scores as a function of age and sex are displayed in Fig. 4.
Exploratory mediation analysis of the other iTAD scores showed similar patterns, except that sex did not significantly predict the localisation or orientation response time and rate correct score. Additionally, rate correct scores were more strongly predicted by age. Figures of all mediation analyses can be found in the Supplemental files (Figs. S1-S3). Scatterplots of all iTAD scores as a function of age and sex can also be found in the Supplemental files (Figs. S4, S5).

Inter-rater reliability
Inter-rater reliability was moderate for TPDT and moderate to good for the iTAD scores. When directly compared, ICC(2,1) values were somewhat higher for iTAD's response times, but similar between TPDT and other iTAD scores. The ICC(2,1) values for TPDT appear comparable to previous research, although results vary (Catley et al., 2013; Harvie et al.,
Figure 3 Results of sequential mediation analyses. Relationships between demographics (sex and age), anthropometrics (body mass index (BMI) and neck surface area (NSA)) and iTAD accuracy scores for the localisation test (A) and orientation test (B). Relationships are expressed in semi-partial correlations (sr) and unstandardized regression coefficients (b), including their level of significance (p). Coding for sex: female = 0 and male = 1.

Measurement error
Results indicate a larger CoV for TPDT than for all iTAD scores. For TPDT assessment, differences in speed, timing and intensity of stimulus delivery affect results (Boldt et al., 2014; Lundborg & Rosen, 2004; Yokota et al., 2020). Therefore, variability in these parameters between raters, trials, and both arms of the caliper increases measurement error. An inherent problem of manual TPDT assessment is the inability to control, or assess, these variables in a clinical setting. The iTAD scores may be less prone to these sources of error variance, as stimulus administration is automated and does not require synchronicity between locations. Clinically, this implies less difficulty detecting change with iTAD assessments, potentially resulting in better responsiveness to treatment effects (Terwee et al., 2007). Notably, CoV for TPDT seems somewhat larger than previously reported (17.4-20.6% (Luedtke et al., 2018); 19.1% (Catley et al., 2013)). This may be due to the testing procedure (e.g., orientation of caliper, number of reversals), for which no standard is available (Adamczyk, Luedtke & Szikszay, 2018; Cashin, 2017). Yet, mean (SD) TPDT scores appeared similar to several other reports (mean (SD) range: 45.9 (18.4) to 62.6 (22.9)) (Catley et al., 2013; Zimney et al., 2020; Adamczyk et al., 2019a; Cheever et al., 2017), although somewhat higher than others (mean (SD) range: 21.7 (6.2) to 35.2 (9.6)) (Harvie et al., 2017; Luedtke et al., 2018; Elsig et al., 2014). Further, measurement error may depend on various factors related to both the participants and the assessors included (Catley et al., 2013). However, these were constant between the two assessments in this experiment, allowing for a more direct comparison.
Several SDC values were also presented. Although conventional, the SDC_95 may provide high specificity (few false positives) but low sensitivity (many false negatives) in detecting change due to its large confidence interval (Portney & Watkins, 2009). As both false conclusions can negatively impact clinical decision making, presenting a range of SDC values may support more precise interpretation of observed changes in relation to measurement error.

Internal consistency
Despite including absolute differences between series, internal consistency was high for both the localisation and orientation accuracy scores and higher than for the iTAD prototype (Olthof et al., 2021). Internal consistency is frequently applied to questionnaires but underutilised in experimental tasks, mostly because scores cannot be split into multiple representative parts (Green et al., 2016; Matheson, 2019). However, internal consistency has previously been established in measures such as electrocardiography (Van Lien et al., 2015), electroencephalography (Towers & Allen, 2009), joint position sense (Domingo & Lam, 2014) and motion analysis (Platz et al., 1999). One benefit of reporting internal consistency as a measure of reliability is its comparability between studies, even if only single measurements are taken (Green et al., 2016).

Floor and ceiling effects
No floor or ceiling effects, which may limit responsiveness (Terwee et al., 2007), were detected in iTAD scores; only a potential, yet debatable, floor effect for localisation response time. This could indicate difficulty in detecting improvements in already fast responders. However, this may not be clinically relevant, as slow responders are more likely targets for treatment.

Relationship with age, sex, BMI and neck surface area
Results indicated that localisation and orientation accuracy scores were only directly related to age (decreasing with increasing age) and sex (lower for women). This implies that age and sex, but not BMI or neck surface area, should be considered when interpreting scores.
Regarding the effect of age, similar sized negative correlations between age and tactile acuity have previously been established using TPDT (Kalisch et al., 2012; Falling & Mani, 2016a). One frequently proposed mechanism is the decreased cortical inhibition in response to tactile stimulation associated with older age (Kalisch et al., 2009; Lenz et al., 2012; Brodoehl et al., 2013; Pleger et al., 2016). Interestingly, these age-related declines in tactile acuity can potentially be reversed with sensory training (Pleger et al., 2016; Dinse et al., 2006).
Concerning the effect of sex, previous reports seem inconsistent and vary between body regions when measured with TPDT (Falling & Mani, 2017). For example, better tactile acuity has been reported for women at the orofacial region (Won et al., 2017) and knee (Falling & Mani, 2016b), for men at the knee (Stanton et al., 2013), and no differences were found at the lower back (Stanton et al., 2013; Falling & Mani, 2016a). To the best of our knowledge, this is the first study examining sex differences at the neck, making it difficult to compare results. Additionally, sex differences could depend on task type. In a single experiment, women made more errors in a tactile object recognition task despite demonstrating similar TPDT scores (Kalisch et al., 2012).
Results also contrast with previous reports indicating that tactile acuity at the fingertips relates to surface area, potentially due to its relationship with mechanoreceptor density (Peters, Hackeman & Goldreich, 2009). One explanation is that the utilised neck surface area assessment may be a poor proxy for receptive field configuration. Unlike the fingertips, necks exhibit variation in hairy (vs. non-hairy) skin, which typically does not contain Pacinian mechanoreceptors, known to be activated by high frequency vibrations (Abraira & Ginty, 2013). Proportion of hairy skin may therefore moderate the relationship between neck surface area and iTAD scores, which was not investigated in this study. Alternatively, lack of a significant relationship with surface area could also indicate that iTAD scores may be less affected by peripheral receptive field configuration. Correspondingly, the ability to accurately localise tactile stimuli may be more centrally organised (Braun et al., 2011), and higher order cognitive functions (such as cortical body representations) seem to play a more prominent role (Longo, Azanon & Haggard, 2010; Tame, Azanon & Longo, 2019). The iTAD may therefore be especially suited for conditions with altered body representations, such as musculoskeletal disorders (Viceconti et al., 2020) and persistent pain (Tsay et al., 2015).

Implications and future directions
Less measurement error for the iTAD could result in better responsiveness, although this needs investigation in future trials studying treatment effects. Further, the multiple measures of the iTAD may provide a more comprehensive evaluation of tactile acuity function. However, validity of the iTAD assessments has not been thoroughly established, precluding inferences about their clinical utility in addition to, or instead of, TPDT. Moreover, their clinical relevance needs further examination in patient trials. Additionally, future studies could explore to what extent iTAD scores reflect central somatosensory processing using neuroimaging techniques. Future studies may also investigate the clinimetric properties and clinical utility of other promising manual (Bruns et al., 2014; Van Boven et al., 2000; Morrow & Ziat, 2018; Bleyenheuft & Thonnard, 2007) and automated (Goldreich et al., 2009) procedures, including automated TPDT (Yokota et al., 2020; Frahm & Gervasio, 2021), at the neck.

CONCLUSION
Findings suggest that the iTAD and TPDT have similar inter-rater reliability when measuring tactile acuity at the neck in healthy individuals. However, the iTAD exhibits several advantages such as ability to assess multiple aspects of tactile acuity, less measurement error and a possibility for independent sensory training. Furthermore, no evidence was found that scores were affected by anthropometry, simplifying interpretation. Additionally, internal consistency of iTAD accuracy scores was high and no overt floor or ceiling effects were detected. These results highlight the potential clinical utility of the iTAD and support continued investigation.

Grant Disclosures
The following grant information was disclosed by the authors: The National Health and Medical Research Council of Australia (ID 1142929). The National Health and Medical Research Council of Australia (ID 1178444). The National Health and Medical Research Council of Australia.