Models of allocentric coding for reaching in naturalistic visual scenes

To reach to objects, humans rely on the positions of target objects relative to surrounding objects (allocentric) as well as to their own bodies (egocentric). Previous studies demonstrated that scene configuration and the task relevance of objects modulate the combination weights of allocentric and egocentric information. Egocentric coding for reaching has been studied extensively; however, how allocentric information is coupled and used in reaching is unknown. Using a computational approach, we show that clustering mechanisms for allocentric coding, combined with causal Bayesian integration of allocentric and egocentric information, can account for the observed reaching behavior. To further understand allocentric coding, we propose two strategies: global and distributed landmark clustering (GLC and DLC). Both models can replicate the current data, but each has distinct implications. GLC efficiently encodes the scene relative to a single virtual reference but loses all local structure information. In contrast, DLC stores more redundant information about inter-object relationships and is consequently more sensitive to changes in the scene. Further experiments are needed to differentiate between the two proposed strategies.


Introduction
Previous studies demonstrated that humans combine allocentric and egocentric information when reaching toward visual targets (Byrne & Crawford, 2010; Klinghammer, Blohm, & Fiehler, 2015, 2017). While a vast number of studies have investigated how egocentric reference frames are used in movement planning, little is known about how humans incorporate allocentric information. To address this, a series of experiments investigated the role of allocentric information in reaching (Klinghammer et al., 2015, 2017). Participants were asked to memorize a scene configuration (encoding). After a short delay, a new scene appeared and participants were instructed to reach to the position of the missing object (decoding). Figure 1 illustrates the task procedure.
Figure 1. Task procedure, adapted from Klinghammer et al. (2015).
To manipulate the integrity of allocentric information, objects in the decoding scene were sometimes shifted horizontally, either all in the same direction or in different directions. Figure 2 shows examples of the encoding and decoding scenes. In addition, the number of shifted objects varied: either all the objects in the scene were shifted or only a subset (e.g., 1 out of 5 or 3 out of 5). We observed that shifting objects in opposite directions, or shifting only a subset of objects rather than the whole group, resulted in lower reliance on allocentric information. That is, the shifts in reach end points were smaller for conditions that violated scene consistency, whereas shifting objects in the same direction or shifting all the objects together resulted in larger end point biases. In addition, participants only considered the shifts of objects that were potential future targets (relevant objects, RO) and almost ignored changes in the configuration of the remaining objects (irrelevant objects, IO). Finally, we observed that violating scene consistency also increased movement variability. Although these data provide clear evidence that contextual factors of the scene modulate the combination weights of egocentric and allocentric information, the underlying mechanisms of allocentric coding are unknown.
To address this, we propose two chunking/clustering approaches for allocentric landmark coding. Clustering has been suggested as a method to compress the information in complex scenes (Lew & Vul, 2015). These models are used to reconstruct the memorized scene, and causal Bayesian inference is used to assess the consistency between the memorized scene and the current scene. Egocentric and allocentric information are then combined using causal Bayesian integration. Through simulations, we demonstrate that both models can replicate the current experimental data, while each model has distinct implications.
This work is licensed under the Creative Commons Attribution 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by/3.0

Modeling the allocentric coding
Similar to the experimental setup, our model includes two phases: encoding and decoding. In the encoding phase, the goal is to memorize the scene configuration. Following mechanisms suggested in the visual working memory literature, we use chunking or clustering to compress the information in complex scenes (Lew & Vul, 2015) and propose two modeling paradigms: global versus distributed allocentric landmark coding. The former encodes the scene by creating a global landmark (cluster) and calculating the distance of each object from this point. The latter encodes the position of the target using Barycentric coordinates relative to a distributed set of landmark points (object clusters). In the decoding phase, the goal is to infer the position of the target from the new scene (allocentric) and the information remembered from encoding (egocentric). To make this inference, the global model combines the egocentric estimate with the new scene's global cluster point to reconstruct the target position, while the distributed model combines the target position reconstructed from the new scene's clusters and the encoded Barycentric coordinates with the remembered egocentric position. In both paradigms, causal Bayesian integration is used to infer the reliability of the allocentric information and its contribution to the final estimate of the missing target position. The following sections describe the mathematical implementation of our proposal.

Encoding
The main challenge in encoding is to memorize the scene configuration given limited memory resources. We propose two compression mechanisms: global landmark clustering (GLC) and distributed landmark clustering (DLC). We assume that the internal representation of each object's position in the brain can be modeled as a Gaussian distribution.

Global landmark clustering (GLC)
GLC aims to represent the scene configuration with the least amount of resources. This can be achieved by coding the distance of every object from the center of mass of the collection of all objects. Under the assumption that the object representations are Gaussian, the center of mass, which serves as the central point (CP), can be calculated as a weighted sum of the object positions, with higher weights assigned to ROs than to IOs. To summarize the scene configuration, one then calculates the distance vector of each object from the CP. Since all these computations are linear combinations of Gaussian variables, both the CP and the distance vectors are Gaussian distributed. Storing the CP and the distance vectors supplies enough information to reconstruct the scene configuration.
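The GLC encoding step can be illustrated with a minimal sketch. It is not the implementation used in the study: it assumes one-dimensional (horizontal) positions, independent Gaussian noise on each object, and made-up RO/IO weights; the function name `glc_encode`, the parameters `w_ro` and `w_io`, and all numerical values are hypothetical.

```python
import numpy as np

def glc_encode(means, variances, is_relevant, w_ro=2.0, w_io=1.0):
    """Encode a scene as a central point (CP) plus per-object distance vectors.

    w_ro and w_io are assumed relative weights for relevant/irrelevant objects.
    """
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    w = np.where(np.asarray(is_relevant, dtype=bool), w_ro, w_io)
    w = w / w.sum()                      # normalized weights sum to 1

    cp_mean = w @ means                  # weighted center of mass
    cp_var = (w ** 2) @ variances        # variance of a weighted sum of independent Gaussians

    d_mean = means - cp_mean             # distance of each object from the CP
    # Var(x_i - CP) = var_i + cp_var - 2 * Cov(x_i, CP), with Cov(x_i, CP) = w_i * var_i
    d_var = variances + cp_var - 2.0 * w * variances
    return cp_mean, cp_var, d_mean, d_var

cp_mean, cp_var, d_mean, d_var = glc_encode(
    means=[-10.0, -4.0, 0.0, 6.0, 12.0],          # illustrative positions (cm)
    variances=[1.0, 1.0, 1.0, 1.0, 1.0],          # illustrative noise variances
    is_relevant=[True, True, True, False, False],
)
```

Because every step is a linear operation on Gaussian variables, the CP and the distance vectors are fully described by the returned means and variances, and adding `cp_mean` to `d_mean` recovers the original object positions.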

Distributed landmark clustering (DLC)
While GLC provides a framework to store the scene configuration with the least amount of resources, it ignores all the local statistical structure. In addition, GLC is not sensitive to the distances of objects relative to each other, although it has been shown that humans rely less on distant visual landmarks. Therefore, DLC aims to retain the local statistical structure and encodes the position of each possible target with regard to local clusters of objects. One approach to creating local clusters is to use familiar shapes such as triangles or rectangles. In this study, we chose triangles and created Barycentric coordinate systems to memorize the scene configuration. Using generalized Barycentric coordinates, the model can be extended to other shapes as well.
Therefore, by storing the triangle vertices and the corresponding Barycentric coordinates (the $\Lambda$'s), all the information regarding the scene configuration and the position of the target object can be retrieved later.
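As an illustration of Barycentric coding with a triangular cluster, the following sketch encodes a target relative to three landmarks and decodes it after the landmarks have moved. It is a noise-free toy example under stated assumptions: positions are two-dimensional, the helper names `barycentric_coords` and `reconstruct` are hypothetical, and the coordinates are invented for the example.

```python
import numpy as np

def barycentric_coords(p, tri):
    """Barycentric coordinates of 2-D point p w.r.t. triangle tri (3x2 array of vertices)."""
    tri = np.asarray(tri, dtype=float)
    # Solve [v1 v2 v3; 1 1 1] @ lam = [p; 1], so that the coordinates sum to 1.
    A = np.vstack([tri.T, np.ones(3)])
    b = np.append(np.asarray(p, dtype=float), 1.0)
    return np.linalg.solve(A, b)

def reconstruct(lam, tri):
    """Recover a position from stored coordinates and (possibly shifted) vertices."""
    return lam @ np.asarray(tri, dtype=float)

triangle = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])  # illustrative landmark cluster (cm)
target = np.array([2.0, 3.0])
lam = barycentric_coords(target, triangle)      # encoding
shifted = triangle + np.array([5.0, 0.0])       # all three landmarks shift 5 cm to the right
decoded = reconstruct(lam, shifted)             # decoding from the new scene
```

When all three landmarks shift together, the decoded target shifts by the same amount. If only one vertex moves, the decoded target moves in proportion to that vertex's coordinate, which is what makes this kind of code sensitive to local changes.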

Decoding
In the decoding phase, the goal is to estimate the position of the missing object based on the current scene configuration, $p(\check{x} \mid \acute{x}_1, \ldots, \acute{x}_{n-1})$, where $\check{x}$ is the estimated position of the missing object and $\acute{x}_1, \ldots, \acute{x}_{n-1}$ are the positions of the remaining $n-1$ objects in the current scene. Here, one first needs to use the information memorized during encoding to reconstruct the scene configuration and then identify whether the current scene represents the same scene as before ($C = 1$) or a different scene ($C = 2$). Finally, the position of the missing object is estimated by combining allocentric and egocentric information. It is worth noting that, since egocentric information is fixed throughout, we assume that the CP in GLC and the possible target positions in DLC are stored in egocentric coordinates.
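Therefore, similar to Körding et al. (2007), the problem can be formulated as
$$\check{x} = p(C=1 \mid \acute{\mathbf{x}})\,\check{x}_{C=1} + p(C=2 \mid \acute{\mathbf{x}})\,\check{x}_{C=2},$$
where $\acute{\mathbf{x}}$ collects the object positions in the current scene and $\check{x}_{C=1}$ represents the position of the missing object (target) when one is certain that the encoding and decoding phases represent the same scene. Similarly, $\check{x}_{C=2}$ is the estimate when the encoding and decoding phases represent different scenes.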
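The weighting of the two conditional estimates can be made concrete with a short numerical sketch in the spirit of the model averaging used by Körding et al. (2007). The function name `model_average` and the numbers are hypothetical and serve only to show how the probability of a common scene scales the influence of the allocentric information.

```python
def model_average(p_same, est_integrated, est_segregated):
    """Causal-inference estimate: posterior-weighted mix of the C=1 and C=2 estimates."""
    if not 0.0 <= p_same <= 1.0:
        raise ValueError("p_same must be a probability")
    return p_same * est_integrated + (1.0 - p_same) * est_segregated

# Illustrative numbers only: the integrated estimate suggests a 5 cm shift,
# the segregated estimate suggests none, and the scenes are judged 80% likely to match.
estimate = model_average(p_same=0.8, est_integrated=5.0, est_segregated=0.0)
```

With these values the final estimate is shifted by about 4 cm; lowering `p_same`, as would happen when scene consistency is violated, pulls the estimate back toward the segregated value.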

Reconstructing scene configuration
In GLC, if the CP is recovered, the whole scene can be reconstructed; therefore, the problem can be written as
$$p(\check{x} \mid \acute{x}_1, \ldots, \acute{x}_{n-1}, d_1, \ldots, d_{n-1}) = p(\check{x} \mid \acute{\mathbf{c}}),$$
where $\acute{x}_i$ is the position of object $i$ in the current scene, $d_i$ is the memorized distance vector of object $i$ from the CP, $\acute{\mathbf{c}} = [\acute{c}_1, \ldots, \acute{c}_{n-1}]$ is the vector of all the central position estimates, and $\acute{c}_i = \acute{x}_i - d_i$. Since both $\acute{x}_i$ and $d_i$ are Gaussian distributed, $\acute{c}_i$ is Gaussian distributed as well. The task is to estimate the distribution of the CP from the CP predictions provided by the individual objects in the scene. This problem is similar to causal inference in multisensory integration: if the two scenes are related, the memorized and the current information should be integrated; otherwise, they should be segregated. We followed a procedure similar to that of Körding et al. (2007) to build our causal Bayesian integration. Therefore, we provide only our final analytical solutions for the relevant probabilities and not the full derivation. To solve equation (8), three components must be calculated; in the following, we briefly explain our solution for each.
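The idea that every remaining object provides its own prediction of the CP can be sketched as follows. The sketch assumes that an object's prediction is its position in the decoding scene minus its memorized distance vector, ignores noise, and uses invented one-dimensional values; the name `cp_votes` is hypothetical.

```python
import numpy as np

def cp_votes(x_new, d_memorized):
    """CP location implied by each remaining object: observed position minus memorized distance."""
    return np.asarray(x_new, dtype=float) - np.asarray(d_memorized, dtype=float)

cp_encoded = -1.25                               # illustrative memorized CP (cm)
d = np.array([-8.75, -2.75, 1.25, 7.25])         # memorized distances of four remaining objects
shift = np.array([3.0, 3.0, 3.0, 0.0])           # three objects move 3 cm right at decoding
votes = cp_votes(cp_encoded + d + shift, d)
```

Shifted objects predict a CP displaced by their own shift, while unshifted objects still predict the memorized CP, so disagreement among the predictions signals a violation of scene consistency.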

Estimating the probability of the similar scenes
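The probability of a common cause can be written as
$$p(C=1 \mid \acute{\mathbf{c}}) = \frac{p(\acute{\mathbf{c}} \mid C=1)\,p(C=1)}{p(\acute{\mathbf{c}} \mid C=1)\,p(C=1) + p(\acute{\mathbf{c}} \mid C=2)\,\bigl(1 - p(C=1)\bigr)},$$
where $\acute{\mathbf{c}}$ is the vector of central position estimates and $p(C=1)$ is the prior probability of a common cause. The likelihood $p(\acute{\mathbf{c}} \mid C=1)$ is obtained by integrating over the unknown CP. It is worth noting that, when there is a common cause, the central position estimates $\acute{c}_i$ are correlated, and the correlation can be calculated from the encoding assumptions. All the factors in the integral are Gaussian, and therefore there is an analytical solution, expressed in terms of $\Lambda = \Sigma^{-1}$, where $\Sigma$ is the covariance matrix of the $\acute{c}_i$.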
For $p(\acute{\mathbf{c}} \mid C=2)$, we note that the central position estimates $\acute{c}_i$ are independent, so the likelihood factorizes. Since all these distributions are Gaussian, an analytical solution exists; it depends on the mean and standard deviation of the CP estimate from encoding and on the estimated CP position and the variability of this estimate from decoding.
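A rough numerical sketch of the common-cause posterior is given below. It is one possible parameterization in the spirit of Körding et al. (2007), not the closed-form solution referred to in the text. It assumes one-dimensional central position estimates, a memorized CP with mean `mu_mem` and variance `var_mem` under a common cause, and, under separate causes, independent estimates drawn from a broad prior of variance `var_prior` centred on the memorized CP. All names and values are hypothetical.

```python
import numpy as np

def log_mvn(x, mean, cov):
    """Log density of a multivariate Gaussian N(mean, cov) at x."""
    diff = np.asarray(x, dtype=float) - mean
    _, logdet = np.linalg.slogdet(cov)
    quad = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (len(diff) * np.log(2.0 * np.pi) + logdet + quad)

def p_common(cp_votes, cov, mu_mem, var_mem, var_prior, prior_c1=0.5):
    """Posterior probability that encoding and decoding scenes share one cause (C = 1)."""
    cp_votes = np.asarray(cp_votes, dtype=float)
    ones = np.ones(len(cp_votes))
    # C = 1: all estimates share one CP drawn from memory, adding a common term var_mem * 11^T.
    ll_c1 = log_mvn(cp_votes, mu_mem * ones, cov + var_mem * np.outer(ones, ones))
    # C = 2 (assumed): every estimate has its own independent cause from a broad prior.
    ll_c2 = log_mvn(cp_votes, mu_mem * ones, np.diag(np.diag(cov) + var_prior))
    # Bayes' rule evaluated in log space for numerical stability.
    a = ll_c1 + np.log(prior_c1)
    b = ll_c2 + np.log(1.0 - prior_c1)
    return float(np.exp(a - np.logaddexp(a, b)))

cov = np.eye(4)  # illustrative covariance of the four estimates
p_coherent = p_common([3.0, 3.0, 3.0, 3.0], cov, mu_mem=0.0, var_mem=1.0, var_prior=100.0)
p_opposed = p_common([3.0, -3.0, 3.0, -3.0], cov, mu_mem=0.0, var_mem=1.0, var_prior=100.0)
```

Under these assumptions, estimates that shift coherently yield a much higher probability of a common cause than estimates that shift in opposite directions. This is qualitatively in line with the lower reliance on allocentric information reported for inconsistent scenes.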

Estimating the position of the missing object
Finally, to estimate the position of the missing object, one needs the position estimates for the two cases in which the scenes are known to be related ($C = 1$) or unrelated ($C = 2$).
When one is certain that the two scenes are related, the position can be estimated using Bayesian integration. Since the position estimates from all objects are correlated, the solution follows Winkler (1981):
$$\check{x}_{C=1} = \frac{\mathbf{1}^\top \Sigma^{-1} \acute{\mathbf{c}}}{\mathbf{1}^\top \Sigma^{-1} \mathbf{1}},$$
where $\mathbf{1}$ is a vector of ones, $\acute{\mathbf{c}}$ is the vector of central position estimates, and $\Sigma$ is the covariance matrix of these estimates. Similarly, when one is certain that the two scenes are not related, segregation is selected, and therefore $\check{x}_{C=2}$ is based on the memorized information alone. Similar procedures can be used for DLC.
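The combination of correlated estimates can be illustrated with the standard minimum-variance rule for dependent, unbiased Gaussian estimates, which is the kind of result discussed by Winkler (1981). The sketch below is an assumption-laden example rather than the study's code: the function name `combine_correlated` and the covariance values are hypothetical.

```python
import numpy as np

def combine_correlated(estimates, cov):
    """Minimum-variance combination of correlated unbiased estimates.

    Returns the combined estimate 1' S^-1 x / (1' S^-1 1) and its variance 1 / (1' S^-1 1).
    """
    x = np.asarray(estimates, dtype=float)
    ones = np.ones(len(x))
    prec_ones = np.linalg.solve(np.asarray(cov, dtype=float), ones)  # S^-1 1 (S is symmetric)
    norm = ones @ prec_ones
    return (prec_ones @ x) / norm, 1.0 / norm

# Three estimates with unit independent noise plus a shared noise component of variance 0.5.
cov = np.eye(3) + 0.5
combined, combined_var = combine_correlated([1.0, 2.0, 3.0], cov)
```

With equal variances and equal correlations the rule reduces to a plain average, but the shared noise component keeps the combined variance from shrinking as fast as it would for independent estimates.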

Model predictions and implications
As expected, both proposed models could replicate human behavior. Following Klinghammer et al. (2017), the allocentric weight is defined as the slope of the regression of the observed shift in reach end points on the shift expected if reaching relied entirely on the shifted allocentric information. For instance, if all the objects are shifted 5 cm to the right, the end point shift predicted from allocentric information alone is also 5 cm. As Figure 3 illustrates, violating scene coherence modulated the allocentric weight. Specifically, when all the objects shifted coherently, participants relied on allocentric information the most, and their reliance decreased as the number of shifted objects decreased. However, this holds only for ROs; in other words, only the scene changes imposed by ROs were considered in the reaching movements. While both models predict human behavior, each has distinct implications. GLC encodes the statistical structure in a very compact format but loses the local structure information. This means that when the overall structure of the scene remains intact, such as under magnification of the scene, GLC predicts the same position for the missing object even though the surrounding objects are shifted. DLC, on the other hand, requires more resources to store some of the local structure. This added complexity provides higher sensitivity to changes in the scene. This tradeoff can be taken into account when choosing clustering strategies for task-relevant and task-irrelevant objects. For instance, one could use GLC to memorize the overall structure of the irrelevant objects, while a more detailed approach such as DLC would be more beneficial for memorizing the configuration of the task-relevant objects. Future experiments can shed light on how humans select between these strategies.
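The allocentric weight can be computed as a regression slope, as the following sketch shows. The data are synthetic and generated with a known weight of 0.4 purely for illustration; they are not experimental results, and the variable names are hypothetical.

```python
import numpy as np

# Expected end point shift if reaching relied fully on allocentric cues (cm).
expected_shift = np.array([-5.0, -3.0, 0.0, 3.0, 5.0])
# Synthetic observed end point shifts, generated here with a true weight of 0.4.
observed_shift = 0.4 * expected_shift

# Allocentric weight = slope of observed on expected shift (least-squares line fit).
slope, intercept = np.polyfit(expected_shift, observed_shift, deg=1)
```

A slope near 1 would indicate that reach end points follow the object shifts completely, while a slope near 0 would indicate that the shifted allocentric information is ignored.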