Quantifying effects of stochasticity in reference frame transformations on posterior distributions

Reference frame transformations are usually considered to be deterministic. However, translations, scaling factors, or rotation angles could be stochastic. Indeed, variability of these quantities often originates from noisy estimation processes. The impact of transformation noise on the statistics of the transformed signals is unknown, and quantifying these effects is the goal of this study. We first quantify analytically and numerically how stochastic reference frame transformations (SRFT) alter the posterior distribution of the transformed signals. We then propose a new empirical measure to quantify deviations from a given distribution when only limited data are available. We apply this empirical measure to an example in sensory-motor neuroscience to quantify how different head roll angles change the distribution of reach endpoints away from the normal distribution.


Introduction
Reference frame transformations are crucial components in many areas of science and technology, including Engineering, Computer Graphics, Physics, Robotics, Mathematics, and Neuroscience. Until now they have been used in a deterministic fashion, i.e., assuming exact knowledge of the transformation parameters, such as rotation angles and axes. However, in real-world applications these transformation parameters are often noisy estimates. For example, measurement errors can result in noisy parameter estimates. Here, we are interested in describing the impact of noise in reference frame transformations on the statistical distribution of transformed data. We propose that reference frame transformations should sometimes more appropriately be described in stochastic terms, i.e., as stochastic reference frame transformations (SRFTs). We demonstrate the impact of SRFTs for Neuroscience research, but our findings generalize to other areas.
In Neuroscience, our application field of choice, reference frame transformations are omnipresent (Knudsen et al., 1987; Soechting and Flanders, 1992; Lacquaniti and Caminiti, 1998; Snyder, 2000; Cohen and Andersen, 2002; Engel et al., 2002; Henriques et al., 2002; Crawford et al., 2004; Buneo and Andersen, 2006; Schlicht and Schrater, 2007; Tagliabue and McIntyre, 2014), and we therefore expect a large impact of noise on transformed data and thus on neuronal and behavioral outcomes. For example, sensory signals enter the brain in different frames of reference (vision in a retinal frame, audition in a head-centered frame, etc.) and drive different motor systems (e.g., eye, head, arm movement), requiring motor commands to be specified in yet other coordinate frames. Thus virtually all sensory-motor computations are affected by the inevitable stochasticity of reference frame transformations (Rossetti et al., 1994; Sabes, 2003, 2005; Blohm and Crawford, 2007; Schlicht and Schrater, 2007; McGuire and Sabes, 2009; Burns and Blohm, 2010; Burns et al., 2011).
In the present manuscript, we provide new methods for quantifying SRFTs. First, we compute the exact change in the statistical posterior distribution compared to the original distribution due to SRFTs. In a second step, we propose a new measure to quantify deviations of statistical distributions from their original distribution in limited experimental data. Finally, we validate our hypotheses and approach on previously published data from a reaching task performed under different head roll positions (Burns and Blohm, 2010) to demonstrate that larger reference frame transformations lead to larger deviations from normality. Together, these three steps are the building blocks for capturing the effects of SRFTs on experimental data.

SRFT Analysis
We consider the following general linear transformation of the original data points $P_0$ into the corresponding data $P_1$:
$$P_1 = R\,P_0. \tag{1}$$
Our first goal was to find the distribution of the transformed data $P_1$, given $P_0$ with a known distribution and a noisy transformation matrix $R = R(k, \theta)$, where $k$ is a scaling factor and $\theta$ is a rotation angle. Without loss of generality, we use two-dimensional (2D) data and transformations of the form
$$R(k, \theta) = k \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}, \tag{2}$$
and investigate the effect of noise in both $k$ and $\theta$ on the resulting distribution of $P_1$. (Note that the linear transformation depends non-linearly on $\theta$ and is thus expected to result in non-linear effects.) To this end, we assume that the data to be transformed are normally distributed, because normality is assumed in almost all behavioral neuroscience data, but our conclusions and mathematical developments also apply to other distributions. Because a rotation with fixed parameters is a linear transformation, data transformed without noise remain jointly normal. Therefore, the effect of noisy transformations can be investigated by measuring how far the transformed data deviate from normality. We assume that the data vector $P_0 = (x_1, x_2)$ contains two jointly normal random variables with means $\mu_{x_i}$, variances $\sigma^2_{x_i}$, $i = \{1, 2\}$, and correlation coefficient $\rho$, with the following joint distribution:
$$f_X(x_1, x_2) = \frac{1}{2\pi\sigma_{x_1}\sigma_{x_2}\sqrt{1-\rho^2}} \exp\left(-\frac{1}{2(1-\rho^2)}\left[\frac{(x_1-\mu_{x_1})^2}{\sigma_{x_1}^2} - \frac{2\rho\,(x_1-\mu_{x_1})(x_2-\mu_{x_2})}{\sigma_{x_1}\sigma_{x_2}} + \frac{(x_2-\mu_{x_2})^2}{\sigma_{x_2}^2}\right]\right). \tag{3}$$
Let us further assume that the data are transformed by the transformation matrix $R$ into the new vector $P_1 = (y_1, y_2) = y$. The goal is then to find the distribution of the resulting transformed vector $y$ when the angle of rotation $\theta$ and the scaling factor $k$ are noisy with known distributions $f_\theta(\theta)$ and $f_k(k)$, respectively.
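The sampling view of this setup can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the counterclockwise rotation convention, the Gaussian noise on $\theta$ and $k$, the choice of drawing an independent $(k, \theta)$ for every data point, and all parameter values and function names (`rotation_scaling`, `apply_srft`) are hypothetical and not taken from the study.

```python
import numpy as np

def rotation_scaling(k, theta):
    """Scaled 2D rotation matrix (counterclockwise convention is an assumption)."""
    c, s = np.cos(theta), np.sin(theta)
    return k * np.array([[c, -s], [s, c]])

def apply_srft(p0, mu_theta, sigma_theta, mu_k=1.0, sigma_k=0.0, rng=None):
    """Transform each row of p0 with its own noisy draw of (k, theta)."""
    rng = np.random.default_rng() if rng is None else rng
    n = p0.shape[0]
    theta = rng.normal(mu_theta, sigma_theta, size=n)  # assumed Gaussian angle noise
    k = rng.normal(mu_k, sigma_k, size=n)              # assumed Gaussian scaling noise
    c, s = np.cos(theta), np.sin(theta)
    y1 = k * (c * p0[:, 0] - s * p0[:, 1])
    y2 = k * (s * p0[:, 0] + c * p0[:, 1])
    return np.column_stack([y1, y2])

# Illustrative parameters only (not the simulation parameters of the paper).
rng = np.random.default_rng(0)
mu_x = np.array([5.0, 0.0])
cov_x = np.array([[1.0, 0.3], [0.3, 0.5]])
p0 = rng.multivariate_normal(mu_x, cov_x, size=20000)

p1_noiseless = apply_srft(p0, np.deg2rad(30), 0.0, rng=rng)
p1_srft = apply_srft(p0, np.deg2rad(30), np.deg2rad(10), rng=rng)

print("total variance, noiseless:", np.trace(np.cov(p1_noiseless.T)))
print("total variance, SRFT:     ", np.trace(np.cov(p1_srft.T)))
```

With zero noise the function reduces to an ordinary matrix product with $R$, which gives a simple sanity check before noise is added.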
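If the data are jointly normal, then for fixed $k$ and $\theta$ the linearly transformed vector $y$ is jointly normal as well, with the following distribution:
$$f_{Y,\mathrm{noiseless}}(y_1, y_2 \mid \theta, k) = \frac{1}{2\pi\sigma_{y_1}\sigma_{y_2}\sqrt{1-\rho_y^2}} \exp\left(-\frac{1}{2(1-\rho_y^2)}\left[\frac{(y_1-\mu_{y_1})^2}{\sigma_{y_1}^2} - \frac{2\rho_y\,(y_1-\mu_{y_1})(y_2-\mu_{y_2})}{\sigma_{y_1}\sigma_{y_2}} + \frac{(y_2-\mu_{y_2})^2}{\sigma_{y_2}^2}\right]\right), \tag{4}$$
where $\mu_{y_1}$, $\mu_{y_2}$, $\sigma^2_{y_1}$, $\sigma^2_{y_2}$, and $\rho_y$ are defined, respectively, as:
$$\mu_{y_1} = k\,(\cos\theta\,\mu_{x_1} - \sin\theta\,\mu_{x_2}), \tag{5}$$
$$\mu_{y_2} = k\,(\sin\theta\,\mu_{x_1} + \cos\theta\,\mu_{x_2}), \tag{6}$$
$$\sigma^2_{y_1} = k^2\,(\cos^2\theta\,\sigma^2_{x_1} + \sin^2\theta\,\sigma^2_{x_2} - 2\rho\,\sigma_{x_1}\sigma_{x_2}\sin\theta\cos\theta), \tag{7}$$
$$\sigma^2_{y_2} = k^2\,(\sin^2\theta\,\sigma^2_{x_1} + \cos^2\theta\,\sigma^2_{x_2} + 2\rho\,\sigma_{x_1}\sigma_{x_2}\sin\theta\cos\theta), \tag{8}$$
$$\rho_y = \frac{k^2\left[\sin\theta\cos\theta\,(\sigma^2_{x_1} - \sigma^2_{x_2}) + \rho\,\sigma_{x_1}\sigma_{x_2}\,(\cos^2\theta - \sin^2\theta)\right]}{\sigma_{y_1}\sigma_{y_2}}. \tag{9}$$
The distribution of the transformed vector $y$ can now be defined as:
$$f_Y(y) = \int\!\!\int f_{Y,\mathrm{noiseless}}(y \mid \theta, k)\, f_\theta(\theta)\, f_k(k)\, d\theta\, dk. \tag{10}$$
Note that we have assumed that the scaling factor is independent of the angle of rotation. The integral in Equation (10) is difficult to solve analytically. However, numerical estimation of the integral can provide a good understanding of the shape of the distribution of the transformed data $y$.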
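As a quick numerical check of the noiseless moments, the mean and covariance of $y$ can be obtained in matrix form as $R\mu_x$ and $R\Sigma_x R^{T}$ and compared with a component expression. The sketch below assumes the counterclockwise rotation convention; the helper names and all parameter values are illustrative and not part of the original analysis.

```python
import numpy as np

def scaled_rotation(k, theta):
    """Scaled rotation matrix; the counterclockwise sign convention is assumed."""
    c, s = np.cos(theta), np.sin(theta)
    return k * np.array([[c, -s], [s, c]])

def transformed_moments(mu_x, cov_x, k, theta):
    """Mean and covariance of y = R x for fixed (noiseless) k and theta."""
    R = scaled_rotation(k, theta)
    return R @ mu_x, R @ cov_x @ R.T

# Illustrative parameters (not taken from the paper).
sigma_x1, sigma_x2, rho = 1.0, 0.7, 0.4
mu_x = np.array([5.0, 0.0])
cov_x = np.array([[sigma_x1**2, rho * sigma_x1 * sigma_x2],
                  [rho * sigma_x1 * sigma_x2, sigma_x2**2]])
k, theta = 1.5, np.deg2rad(30)

mu_y, cov_y = transformed_moments(mu_x, cov_x, k, theta)

# Component form of the variance of y2 under the same convention.
s, c = np.sin(theta), np.cos(theta)
var_y2 = k**2 * (s**2 * sigma_x1**2 + c**2 * sigma_x2**2
                 + 2 * rho * sigma_x1 * sigma_x2 * s * c)

# Correlation coefficient of the transformed data.
rho_y = cov_y[0, 1] / np.sqrt(cov_y[0, 0] * cov_y[1, 1])

print("mu_y  =", mu_y)
print("var_y2 (matrix form)    =", cov_y[1, 1])
print("var_y2 (component form) =", var_y2)
print("rho_y =", rho_y)
```

The matrix form is convenient because it does not depend on writing out each trigonometric term by hand, so it can serve as a reference when checking component formulas.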
To quantify the influence of noise on the transformed data distribution, we calculated the distance of the resulting distribution from the distribution obtained with a noiseless transformation. Doing so, we can investigate the effect of noise in the rotation angle and scaling factor on the shape of the resulting distribution.
In this paper we use the Kullback-Leibler distance (Kullback and Leibler, 1951) to calculate the divergence between the two known distributions. Since the Kullback-Leibler distance is a non-symmetric measure, we specifically set out to quantify the information lost when using $f_{Y,\mathrm{noiseless}}$ (Equation 4) to approximate $f_Y$ (Equation 10), such that:
$$D_{KL}\left(f_Y \,\|\, f_{Y,\mathrm{noiseless}}\right) = \int f_Y(y)\, \log\frac{f_Y(y)}{f_{Y,\mathrm{noiseless}}(y)}\, dy. \tag{11}$$
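A minimal numerical sketch of this computation is given here. It approximates the mixture integral over $\theta$ by a weighted sum on an equally spaced grid (with $k$ held fixed) and evaluates the divergence by a Riemann sum over a 2D grid. The Gaussian angle noise, the natural logarithm, the grid sizes, the rotation sign convention, and all function names and parameter values are assumptions made for illustration only.

```python
import numpy as np

def mvn_logpdf(y, mu, cov):
    """Log-density of a bivariate normal evaluated at the rows of y."""
    d = y - mu
    quad = np.einsum("ni,ij,nj->n", d, np.linalg.inv(cov), d)
    return -0.5 * quad - np.log(2 * np.pi) - 0.5 * np.log(np.linalg.det(cov))

def moments(mu_x, cov_x, k, theta):
    """Mean and covariance after a noiseless scaled rotation."""
    c, s = np.cos(theta), np.sin(theta)
    R = k * np.array([[c, -s], [s, c]])
    return R @ mu_x, R @ cov_x @ R.T

def srft_pdf(y, mu_x, cov_x, mu_theta, sigma_theta, k=1.0, n_nodes=201):
    """Mixture density over theta, approximated on an equally spaced grid."""
    thetas = np.linspace(mu_theta - 5 * sigma_theta,
                         mu_theta + 5 * sigma_theta, n_nodes)
    w = np.exp(-0.5 * ((thetas - mu_theta) / sigma_theta) ** 2)
    w /= w.sum()  # normalized Gaussian weights for the angle
    pdf = np.zeros(y.shape[0])
    for t, wt in zip(thetas, w):
        pdf += wt * np.exp(mvn_logpdf(y, *moments(mu_x, cov_x, k, t)))
    return pdf

def kl_to_noiseless(mu_x, cov_x, mu_theta, sigma_theta, k=1.0,
                    half_width=12.0, n_grid=241):
    """Riemann-sum estimate of D_KL(f_Y || f_Y,noiseless) in nats."""
    g = np.linspace(-half_width, half_width, n_grid)
    y1, y2 = np.meshgrid(g, g)
    y = np.column_stack([y1.ravel(), y2.ravel()])
    cell = (g[1] - g[0]) ** 2
    f = srft_pdf(y, mu_x, cov_x, mu_theta, sigma_theta, k)
    log_g = mvn_logpdf(y, *moments(mu_x, cov_x, k, mu_theta))
    mask = f > 0  # skip grid cells where the mixture density underflows
    return np.sum(f[mask] * (np.log(f[mask]) - log_g[mask])) * cell

# Illustrative parameters (not taken from the paper).
mu_x = np.array([5.0, 0.0])
cov_x = np.array([[1.0, 0.3], [0.3, 0.5]])
kl_small = kl_to_noiseless(mu_x, cov_x, np.deg2rad(30), np.deg2rad(2))
kl_large = kl_to_noiseless(mu_x, cov_x, np.deg2rad(30), np.deg2rad(10))
print("D_KL, sigma_theta = 2 deg: ", kl_small)
print("D_KL, sigma_theta = 10 deg:", kl_large)
```

Working with the log-density of the noiseless distribution avoids dividing by values that underflow far from the data, which would otherwise make the ratio inside the logarithm unstable.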

Assessing Experimental Deviations from Multivariate Normality
When the distribution of the transformed data is known or can be estimated numerically, it is possible to assess the deviation from multivariate normality (MVN) using the Kullback-Leibler distance (Equation 11). We use this in the first part of the results section for simulated data. However, in experiments, one typically does not have access to the noiseless distribution $f_{Y,\mathrm{noiseless}}$. To assess deviation from MVN for experimental data sets, two major groups of procedures have been used. The first group consists of statistical assessments that test the hypothesis of data being normally distributed with a given p-value, but such tests are not robust against sample size effects (Henderson, 2006). The second group of procedures uses graphical tools such as probability-probability plots (P-P) and quantile-quantile plots (Q-Q) (Wilk and Gnanadesikan, 1968; Thompson, 1990; Burdenski, 2000; Henderson, 2006). Here, we propose a different approach to quantify the amount of deviation from normality that consists in (1) reducing the data space dimensionality and (2) estimating the empirical cumulative distribution function (CDF) of the transformed data samples and the reduced-dimension original data. We propose that the distance between the two empirical CDFs is a measure of deviation from normality.
In order to reduce the data space from 2D (or 3D) to 1D, we compute the sample's Mahalanobis distance:
$$d_i^2 = (p_i - \mu_{P_0})^{T}\, \Sigma_{P_0}^{-1}\, (p_i - \mu_{P_0}), \tag{12}$$
where $p_i$ denotes the $i$-th data sample, $\mu_{P_0}$ and $\Sigma_{P_0}$ are the mean vector and covariance matrix of the original data ($P_0$), respectively, and $1 \le i \le n$ is the sample number. The Mahalanobis distance is computed for each sample of the transformed data $P_1$. The advantage of the Mahalanobis distance is twofold. It not only reduces the dimensionality of the data, but for MVN data the distance distribution also depends only on the dimensionality of the data, and on neither the marginal standard deviations nor the correlation coefficients of the data components. For MVN data with dimension $p$, the squared Mahalanobis distance follows a $\chi^2_p$ distribution, i.e., a $\chi^2$ distribution with $p$ degrees of freedom. To study SRFTs, we thus compute the Mahalanobis distance for the data transformed without noise ($P_1$) as well as for the SRFT data ($P_1^{\mathrm{SRFT}}$), and estimate empirical CDFs for the two sample sets.
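The dimensionality reduction can be illustrated with a short sketch. It draws bivariate normal samples, computes their squared Mahalanobis distances, and compares the first two moments with those of a $\chi^2$ distribution with 2 degrees of freedom (mean 2, variance 4). The function name `mahalanobis_sq` and all parameter values are illustrative assumptions, and the true mean and covariance are used here rather than sample estimates.

```python
import numpy as np

def mahalanobis_sq(samples, mu, cov):
    """Squared Mahalanobis distance of each row of samples from (mu, cov)."""
    d = samples - mu
    return np.einsum("ni,ij,nj->n", d, np.linalg.inv(cov), d)

# Illustrative parameters (not taken from the paper).
rng = np.random.default_rng(1)
mu = np.array([5.0, 0.0])
cov = np.array([[1.0, 0.3], [0.3, 0.5]])
x = rng.multivariate_normal(mu, cov, size=50000)

d2 = mahalanobis_sq(x, mu, cov)

# For bivariate normal data, d2 should behave like chi-square with 2 dof.
print("mean of d2:    ", d2.mean(), "(chi-square, 2 dof: 2)")
print("variance of d2:", d2.var(), "(chi-square, 2 dof: 4)")
```

Changing `mu` or `cov` leaves these two moments essentially unchanged, which is the property that makes the distance a convenient scale-free summary.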
For independent identically distributed (IID) random variables $x_i$, $1 \le i \le n$, with the common CDF $F(t)$, the empirical CDF can be defined as:
$$\hat{F}_n(t) = \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}(x_i \le t), \tag{13}$$
where $\mathbf{1}(\cdot)$ is the indicator function. The distance between the empirical CDFs of $d^2_{P_1}$ and $d^2_{P_1^{\mathrm{SRFT}}}$ is then calculated from the inverses $F^{-1}(\cdot)$ of the two empirical CDFs (Equation 14). The inverse exists, in the generalized sense of a quantile function, because a CDF is a monotonically non-decreasing function for any random variable; the domain of the inverse is the interval from 0 to 1.
To analyze the effects of noise in $\theta$ and $k$, we generated a random data sample, $P_0$, and then applied transformations $R$ with varying amounts of noise in $\theta$ and $k$.
Burns and Blohm (2010) showed that multi-sensory weights depend on contextual noise in reference frame transformations. They designed a reaching experiment to investigate the effect of head roll on sensory transformations and its consequences for multi-sensory integration weights. They showed that head orientation affects the weighting of visual and proprioceptive information in multi-sensory integration during reaching in two distinct ways. First, inaccurate head roll estimation results in an erroneous rotation of the visual information into proprioceptive coordinates. Second, unreliable head roll estimation affects motor planning and results in increased movement variability (Burns and Blohm, 2010). In other words, noise in the reference frame transformation between the rotated visual input during head roll and the spatially required movement resulted in more variable movements when the head was rolled than when the head was straight. In this paper we use their data to show that reaching under head-roll conditions also results in deviations from normality compared to the head straight-ahead situation, confirming the hypothesis that head roll estimation noise underlies SRFTs of the visual information into proprioceptive coordinates. Experimental procedures have been described in detail in Burns and Blohm (2010).
Briefly, in their experiment they asked seven participants to perform a reaching task while seated in an augmented reality setup with their head position kept in place using a bite bar. Subjects viewed visual stimuli that were projected from an overhead screen through a semi-mirrored surface in six different positions at 10 cm distance from a center start position cross, at 60°, 90°, 120°, 240°, 270°, and 300° around the center cross. Underneath the mirrored surface, an opaque board prevented the subjects from viewing their hand. A dot corresponding to real-time hand position provided subjects with feedback about their hand, but only until reach movements started, at which time the hand position cue was removed. Subjects were instructed to begin each trial by aligning the visual cue representing their hand with the center cross. They performed rapid reaching movements using a vertical handle mounted on an air sled while keeping their gaze fixated on the center cross (Burns and Blohm, 2010). Participants completed the task at three different head roll positions: −30°, 0°, and 30° head roll. We used these data to analyze the distribution of reach directions compared to the normal distribution using Equation (14).
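The empirical comparison used in this section can be sketched as follows. The sketch computes an empirical CDF, and then compares the empirical quantiles (inverse empirical CDFs) of squared Mahalanobis distances against a $\chi^2$ reference sample. The specific distance used here, the mean absolute difference between empirical quantiles at 99 probability levels, is an assumed stand-in chosen for illustration and is not claimed to be the exact form of Equation (14). All function names and parameter values are likewise hypothetical, and the data are simulated rather than experimental.

```python
import numpy as np

def ecdf(samples, t):
    """Empirical CDF: fraction of samples less than or equal to t."""
    s = np.sort(np.asarray(samples))
    return np.searchsorted(s, t, side="right") / s.size

def quantile_distance(a, b, n_levels=99):
    """Assumed distance: mean absolute difference of empirical quantiles."""
    levels = np.linspace(0.01, 0.99, n_levels)
    return np.mean(np.abs(np.quantile(a, levels) - np.quantile(b, levels)))

def mahalanobis_sq(samples, mu, cov):
    """Squared Mahalanobis distance of each row of samples from (mu, cov)."""
    d = samples - mu
    return np.einsum("ni,ij,nj->n", d, np.linalg.inv(cov), d)

# Simulated data with illustrative parameters (not experimental data).
rng = np.random.default_rng(2)
n = 5000
mu = np.array([5.0, 0.0])
cov = np.array([[1.0, 0.3], [0.3, 0.5]])
p0 = rng.multivariate_normal(mu, cov, size=n)

# Noisy rotation about a mean angle of zero, so the noiseless case is p0 itself.
theta = rng.normal(0.0, np.deg2rad(10), size=n)
c, s = np.cos(theta), np.sin(theta)
p1_srft = np.column_stack([c * p0[:, 0] - s * p0[:, 1],
                           s * p0[:, 0] + c * p0[:, 1]])

# Chi-square reference with 2 dof, built from squared standard normals.
reference = (rng.standard_normal((n, 2)) ** 2).sum(axis=1)

d_clean = quantile_distance(mahalanobis_sq(p0, mu, cov), reference)
d_srft = quantile_distance(mahalanobis_sq(p1_srft, mu, cov), reference)
print("D without transformation noise:", d_clean)
print("D with transformation noise:   ", d_srft)
```

Because the reference sample is generated independently of the data, the distance without transformation noise is small but not exactly zero, which mirrors the situation with real measurements where the original distribution has to be assumed.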

Results
In this section we examine the effect of noise in rotation angles on transformed simulated data (see Figure legends for exact simulation parameters) by comparing the statistical properties of the data before and after a noisy rotation. To do so, we first numerically computed the integral in Equation (10) to find the distribution of the transformed data under the assumption that the angle of rotation is normally distributed (see Figure 1). We then compare this noisy transformation result to the data transformed without noise in the rotation angle. As noted before, this latter distribution is normal for jointly normal original data. All transformations considered were performed under the assumption of independent Gaussian noise in the transformation parameters.
Three things can be observed in Figure 1 when comparing the transformed data without noise in the transformation angle (Figure 1B) to data from noisy transformations (Figures 1C-E). First, noise added to the transformation results in noisier transformed data, i.e., in larger variances and covariances of the transformed data. Second, even moderate noise can change the covariance of the transformed data (compare Figure 1B and Figure 1C), both in size and orientation. Third, noise in the transformation angle generally distorts data away from multivariate normality. This distortion is non-trivial, in particular for data with non-zero correlation $\rho$. This is best observed in the contour plots in the lower part of Figures 1C-E. It is thus important to quantify such distortions, which we do in the following.
To obtain a better idea of how noise in transformations affects the distribution of the transformed data, we quantified the difference between the distribution of the data transformed under noisy conditions and the distribution of the data transformed without noise, using the Kullback-Leibler distance ($D_{KL}$) defined in Equation (11). Figure 2 shows the result of this deviation-from-normality analysis. As one can see, $D_{KL}$ saturates for large transformation angle noise $\sigma_\theta$ but grows fast with the data eccentricity from the origin $\|\mu_x\|$. The former is observed because with infinite $\sigma_\theta$ the data become uniformly distributed on an annulus, while the latter occurs because for small $\|\mu_x\|$ the original data distribution, i.e., the variability ellipse, spans the origin and thus transformation noise does not have a big impact. It should also be noted that $D_{KL}$ is invariant to the mean transformation angle $\|\mu_\theta\|$ (data not shown).
Next we analyzed the effect of the scaling factor $k$ on $D_{KL}$, also as a function of $\sigma_\theta$ (Figure 3). Here one can observe the effect of $\sigma_\theta$ on $D_{KL}$ for small $\|\mu_x\|$ (i.e., $\|\mu_x\| = 1$), which was not visible in Figure 2 due to the scale of the effects plotted there. Interestingly, $k$ only has an influence on $D_{KL}$ for large $\sigma_\theta$, where it actually reduces the deviation from normality. This is because scaling in the transformation pushes data away from the origin and thus results in smaller relative deviations from normality. We also analyzed the effect of data correlation $\rho$ and $\sigma_\theta$ on $D_{KL}$. This is shown in Figures 4A, B. As can be observed, deviations from normality grow with increasing data correlation $\rho$. Thus, increased covariances (and variances) in the data make the transformation result more vulnerable to noise effects. In addition, Figure 4C shows that this relationship depends on the relative contribution of $\sigma_{x_1}$ and $\sigma_{x_2}$. Thus the orientation of the correlated noisy original data with respect to the eccentricity ($\mu_{x_1}$ and $\mu_{x_2}$) from the origin is an important factor in how noise in the rotation angle influences multivariate normality.
While deviations from normality can be quantitatively assessed when both the original and transformed data are available, this is not usually the case when dealing with experimental measurements, where we often do not have access to the original data distribution but only measure the data after they have been transformed. In order to still be able to assess deviations from normality, and to do so regardless of data dimensionality, we developed a novel measure based on the sample's Mahalanobis distance (see Equation 12). This has three advantages. First, it reduces the multi-dimensional data to a one-dimensional measure; second, the Mahalanobis distance (by definition) normalizes the deviations from the mean by the variance, thus providing a scale-invariant measure; and third, the Mahalanobis distance of normally distributed data follows a $\chi^2$ distribution. The latter means that we can generate $\chi^2$-distributed data for comparison with experimental data if the original data are not available. Figure 5 shows how the Mahalanobis distance of the transformed data behaves as a function of angular transformation noise $\sigma_\theta$ when plotted as Q-Q plots. As shown, the larger $\sigma_\theta$, the more the data deviate from the unity line. The unity line represents equal $P_1$ and $P_0$ distributions, and thus the larger the deviations from the unity line, the larger the deviation from normality of $P_1$ (note that the original data $P_0$ are normally distributed here).
Using the procedure illustrated by the Q-Q plots (Figure 5), we can quantify the deviation from normality in a single measure, as outlined in Equation (14). Using this single measure of the empirical distance $D$ from normality, we can analyze its susceptibility to data set size $n$. This is done in Figure 6. The small influence of data set size on the mean empirical distance $D$ (Figure 6A) stems from the random nature of the samples and from $P_0$ and $P_1$ being independently generated here, i.e., $P_1$ is not a rotated version of $P_0$. We did this to analyze the usefulness of this empirical measure for real data, where original distributions are often not available and have to be created based on an assumption about the underlying distribution. Thus, for small effects and limited data it is preferable to only compare $D$ across data sets of equal size. As expected, the variance of $D$ also depends on the data set size $n$ (Figure 6B), and as a result so does the coefficient of variation (Figure 6C). The larger the data set size $n$, the more robust the estimation of $D$.
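The dependence on data set size can be explored with a small simulation. The sketch below repeatedly draws two independent $\chi^2$ samples (2 degrees of freedom) of size $n$ and records the distance between them. As before, the distance is an assumed stand-in (mean absolute difference of empirical quantiles) rather than the exact form of Equation (14), and the sample sizes, number of repetitions, and function names are illustrative choices.

```python
import numpy as np

def quantile_distance(a, b, n_levels=99):
    """Assumed distance: mean absolute difference of empirical quantiles."""
    levels = np.linspace(0.01, 0.99, n_levels)
    return np.mean(np.abs(np.quantile(a, levels) - np.quantile(b, levels)))

def chi2_samples(rng, n, dof=2):
    """Chi-square samples built from sums of squared standard normals."""
    return (rng.standard_normal((n, dof)) ** 2).sum(axis=1)

rng = np.random.default_rng(3)
sizes = [50, 200, 1000]   # illustrative data set sizes
n_repeats = 200           # illustrative number of repetitions
summary = {}

for n in sizes:
    # Two independently generated samples from the same distribution.
    d = np.array([quantile_distance(chi2_samples(rng, n), chi2_samples(rng, n))
                  for _ in range(n_repeats)])
    summary[n] = (d.mean(), d.std(ddof=1))
    print(f"n={n:5d}  mean D={summary[n][0]:.3f}  sd D={summary[n][1]:.3f}")
```

Because both samples come from the same distribution, any non-zero distance here reflects sampling variability alone, which gives a baseline against which distances from real data of the same size can be judged.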
To demonstrate that our empirical measure of deviations from normality can be effectively used for real data, we computed the deviation from normality for reaching movements under different head roll angles as published in Burns and Blohm (2010) (see Methods for more details). For visually-guided reaches during head roll, the brain has to transform rotated visual inputs into spatially accurate reach motor commands. This requires a reference frame transformation. Burns and Blohm (2010) have shown, in accordance with other studies (Sabes, 2003, 2005; McGuire and Sabes, 2009; Burns et al., 2011), that such reference frame transformations introduce noise, and that the larger the transformation angle, the more noise is added. Based on our SRFT theory, this should lead to changes in the normality of reach distributions. Specifically, we expect larger deviations from normality when the head is rolled than when the head is upright. We test this hypothesis in Figure 7. As one can observe, the data deviate from normality even when the head is upright. This might have many causes, including measurement errors, biomechanical factors, or workspace anisotropies (e.g., we use our right arm more for rightward reaches). Regardless, the important comparison is between deviations from normality when the head is rolled and when the head is upright. Doing so in Figure 7, we find that reaching under eccentric head rolls leads to significantly larger deviations from normality than reaching when the head is upright (two-way ANOVA with factors subjects and head roll; main effect of head roll F(2, 117) = 6.27, p = 0.0026; main effect of subjects F(6, 117) = 0.96, p = 0.45; no interaction effect). This validates our hypothesis and confirms that reference frame transformations in the brain should indeed be viewed as being stochastic in nature.
FIGURE 7 | Experimental validation of deviation from normality measure. When the head was rolled and thus a larger reference frame transformation was needed, data deviated more from the normal distribution as compared to when the head was straight (head roll = 0). Average measures across all 7 subjects and across all six reach targets are shown for each head roll angle (means ± s.e.m.). Asterisks indicate significant differences (ANOVA with post-hoc paired t-tests, p < 0.05). Data points at −30° and 30° head roll angle were not significantly different from one another (p > 0.1).

Discussion
We have studied the impact of stochastic noise on reference frame transformations and argue that SRFTs can lead to distortions of the statistical distribution of transformed data. In neuroscience, this idea has been previously suggested (Rossetti et al., 1994; Sabes, 2003, 2005; Blohm and Crawford, 2007; Schlicht and Schrater, 2007; McGuire and Sabes, 2009; Burns and Blohm, 2010; Burns et al., 2011; Tagliabue and McIntyre, 2011, 2013, 2014) but never formally quantified. Indeed, noise added in reference frame transformations should lead to larger variability (in terms of variance) in the system's output. This has been reported for the geometry-dependence of hand localization (Scott and Loeb, 1994; Fuentes and Bastian, 2010), for reaching movements (Rossetti et al., 1994; Blohm and Crawford, 2007; Schlicht and Schrater, 2007; Burns and Blohm, 2010), and for sensory-motor transformations requiring explicit reference frame transformation (Blohm and Crawford, 2007; Burns and Blohm, 2010). We propose a new method to capture the changes of the statistical distribution of experimental data after a reference frame transformation. Applied to reaching data, it indicates that the sensory-to-motor transformation in the brain involves an SRFT based on noisy estimates of the head roll angle (Steinleitner, 1978; Guerraz et al., 2000).
Reference frame transformations are omnipresent in the brain (see Introduction). Therefore we argue that quantifying those SRFTs could provide deep insight into the working principles of the brain. This includes research areas as diverse as multi-sensory integration across reference frames, coordinate transformations in sensory-motor planning, and forward/inverse models in motor control. Indeed, based on our theory, one would expect deviations from specific distributions (usually Gaussian) in many of these studies. Our framework will for the first time allow quantifying these effects. In addition, many other research areas face similar problems. For example, in the pose estimation industry, different sensors capture data in different reference frames that need to be combined into a unique pose estimate, and the relative orientation of those reference frames is often not fixed (such as for a sensor attached to a moving body). In that case, the relative orientation between reference frames needs to be estimated from (noisy) sensory data. SRFTs should then be used to quantify the reliability of individual sensory information in the generation of the unique pose estimate. We therefore believe that this study has broad implications for science and industry that go beyond neuroscience research.
The method presented herein can be used to quantify how different experimental conditions affect output statistics, and thus to indirectly estimate the degree of stochasticity of the SRFTs involved, which can provide insight into different processing steps in the brain. There are of course limitations to our framework. So far we have explored 2D effects and did not consider higher dimensions. In general, only translational transformations will maintain normality, while scaling or rotations can result in deviations from the original distribution. It is straightforward to extend our main findings to 3D rotations, as data can be projected onto a 2D plane orthogonal to the rotation axis. However, investigating the effect of uncertainty in the orientation of the rotation axis remains to be done. Another limitation is that the original data (before any reference frame transformation) are often not known and have to be assumed, at least in neuroscience research. This also means that we do not know what the ideal distribution shape should be, as required by the empirical distance measure. However, one can get around this problem by simply assuming a certain distribution and computing deviations from that distribution for different conditions, as we have done here. Overall, these limitations are easily overcome in practice and should not prevent successful application of our theory.

Funding
This study was supported by NSERC (Canada), CFI (Canada) and ORF (Canada).