Territory surveillance and prey management: Wolves keep track of space and time

Abstract Identifying behavioral mechanisms that underlie observed movement patterns is difficult when animals employ sophisticated cognitive‐based strategies. Such strategies may arise when timing of return visits is important, for instance to allow for resource renewal or territorial patrolling. We fitted spatially explicit random‐walk models to GPS movement data of six wolves (Canis lupus; Linnaeus, 1758) from Alberta, Canada to investigate the importance of the following: (1) territorial surveillance likely related to renewal of scent marks along territorial edges, to reduce intraspecific risk among packs, and (2) delay in return to recently hunted areas, which may be related to anti‐predator responses of prey under varying prey densities. The movement models incorporated the spatiotemporal variable “time since last visit,” which acts as a wolf's memory index of its travel history and is integrated into the movement decision along with its position in relation to territory boundaries and information on local prey densities. We used a model selection framework to test hypotheses about the combined importance of these variables in wolf movement strategies. Time‐dependent movement for territory surveillance was supported by all wolf movement tracks. Wolves generally avoided territory edges, but this avoidance was reduced as time since last visit increased. Time‐dependent prey management was weak except in one wolf. This wolf selected locations with longer time since last visit and lower prey density, which led to a longer delay in revisiting high prey density sites. Our study shows that we can use spatially explicit random walks to identify behavioral strategies that merge environmental information and explicit spatiotemporal information on past movements (i.e., “when” and “where”) to make movement decisions. The approach allows us to better understand cognition‐based movement in relation to dynamic environments and resources.


| INTRODUCTION
Recent empirical and theoretic work suggests that cognition and memory are important for animals' daily movements (Fagan et al., 2013).
For example, spatial memory and memory of past experience allow animals to revisit profitable foraging locations and optimize energy intake (Hopkins, 2015;Merkle, Fortin, & Morales, 2014;Nabe-Nielsen, Tougaard, Teilmann, Lucke, & Forchhammer, 2013;Riotte-Lambert, Benhamou, & Chamaillé-Jammes, 2015;Van Moorter et al., 2009) or to travel efficiently to crucial resources such as waterholes (Polansky, Kilian, & Wittemyer, 2015). Cognitive abilities are associated with metabolic needs (e.g., larger brain size, maintenance of neural structures) and may entail both constitutive and induced costs in terms of fecundity and other fitness components (Burns, Foucaud, & Mery, 2011). Therefore, we would expect to find cognitive-based movement predominantly under conditions where benefits can outweigh costs, that is when resources are heterogeneous in space and time but also predictable (Avgar, Deardon, & Fryxell, 2013;Mueller, Fagan, & Grimm, 2011), and when resource patch density is low and distances between patches are high (Bracis, Gurarie, Van Moorter, & Goodwin, 2015;Grove, 2013). Despite the growing effort in addressing cognition in movement studies and the evidence that it can be important, unraveling the role of cognition and memory for movement is still inherently difficult because these processes can be inferred only indirectly, which requires both creative and state-of-the-art methodology (Fagan et al., 2013).
Here, we address whether gray wolves (Canis lupus) integrate spatiotemporal aspects (i.e., the "when" and "where") of their own travel history into their movement decisions. That memory of travel history is important in wolf movement decisions is reasonable because wolves exhibit little daily overlap in use of their territory, especially in winter, and it raises the questions as to the underlying mechanism (Jedrzejewski, Schmidt, Theuerkauf, Jedrzejewska, & Okarma, 2001).
We use a novel method of modeling memory-based animal movements (Schlägel & Lewis, 2014) to assess hypotheses (Table 1)  Wolves are known to be territorial and to scent mark their territories to advertise their presence to wolves from other packs (Lewis & Murray, 1993;Peters & Mech, 1975;Zub et al., 2003). Scent marks can be found across the territory, but usually territory edges are marked more heavily, especially when they border neighboring packs (Mech & Boitani, 2006;Peters & Mech, 1975;Zub et al., 2003). If fatal encounters with individuals from other packs occur close to the territory edge (Mech, 1994), we would expect avoidance of territory edges to be a major driver to wolf movement (risk avoidance; H 1 ). However, if scent marks decay and must be renewed regularly, we would expect avoidance of territory edges to decline for long TSLV (territory surveillance; H 2a , H 2b ) due to renewing scent marks.
Movement of wolves also may be driven by strategies for efficient prey capture. For example, selecting areas of high prey density (prey selection; H 3 ) would reduce search time to find and potentially kill a T A B L E 1 Tested hypotheses regarding drivers of wolf movement. Our main interest lies in testing time-dependent movement strategies (H 2 , H 5 , and H 6 ) but we included time-independent movement behaviors as possible simpler explanations (H 0 , H 1 , H 3 , and H 4 ). The probability of selecting a location is modeled as a logistic weighting function of the spatial attributes time since last visit (TSLV), distance from territory edge (edge), and prey density (prey) within a spatially explicit movement model. For hypotheses involving two spatial attributes, we tested both a model with additive term in the linear predictor (resulting in a shift of the logistic weighting function) and a model with additional multiplicative interaction (changing also the steepness of the logistic weighting function)

Hypothesis
Behavior Spatial attributes (model) A positive coefficient for this attribute means that the probability of selecting a location increases with its distance from the edge, that is, toward central locations. b Dominant effect of the attribute on the probability of selection (a negative coefficient can be compensated for a range of attribute values by a positive interaction term).
prey (Holling, 1959;McPhee, Webb, & Merrill, 2012). However, if prey concentrate in buffer zones between wolf territories that act as ref- uges to prey (Mech, 1994), wolves are faced with making trade-offs in finding prey while at the same time avoiding conspecifics from other packs (prey selection and risk avoidance; H 4a , H 4b ).
Prey can exhibit temporary avoidance, heightened vigilance, or retreat to safer habitats in areas of recent wolf presence or where conspecifics were recently killed by wolves (Berger-Tal & Bar-David, 2015;Latombe, Fortin, & Parrott, 2014;. Contrary to predictions by the "risky places hypothesis," which accounts only for varying antipredator behavior across sites with different long-term predation risk, observations of elk responses to wolves suggest that antipredator behavior adjusts dynamically to the presence of wolves in line with the "risky times hypothesis" and the "risk allocation hypothesis" (Creel, Winnie, Christianson, & Liley, 2008;Robinson & Merrill, 2013). These behavioral responses lower predation success, an effect called behavioral depression of prey (Charnov, Orians, & Hyatt, 1976).
To optimize hunting success, wolves may not only optimize giving-up times (Brown, Laundré, & Gurung, 1999;Charnov et al., 1976), but also select for longer TSLV (delayed return; H 5 ) to allow time for prey behavior to recover (Latombe et al., 2014;Laundré, 2010). This also spreads the risk over all hunting sites (Lima, 2002). However, wolves may return sooner to areas of high prey density (prey management; H 6a , H 6b ) because of success in finding prey (Kunkel & Pletscher, 2001;McPhee et al., 2012) and greater variation in recovery times of individual prey.
We examined the support for these hypotheses in a model selection framework using movement data of six GPS-collared wolves in winter when denning is less likely to influence movement, and packs are likely to be more cohesive (Metz, Vucetich, Smith, Stahler, & Peterson, 2011). We contrasted our behaviorally based models with a null model that assumed no preferences for spatiotemporal behaviors (H 0 ). We fit observed movement trajectories to random walks that included behavioral mechanisms via a spatially explicit and dynamic resource-selection component (Schlägel & Lewis, 2014). With this, we illustrate how to detect an interplay of travel history with current movement decisions in movement patterns of free-ranging animals.  Webb et al., 2008). The collars were programmed to collect location measurements every 2 hr. This led to regular time series of observed movement steps.

| Wolf and ungulate prey data
Successful fix attempts for locations were 90% (3300Sw model) and 82% (4400S model) indicating habitat-induced GPS bias was minimal (Frair et al., 2004;Hebblewhite, Percy, & Merrill, 2007). We analyzed data of six wolves from different packs whose territories were in the eastern part of the study area with low elevation and no mountain val-  (Basille et al., 2015;Kuijper et al., 2014). Wolf movement trajectories were considered accordingly on this spatial grid of cells, using the coordinates of the cell centers. Each location of a wolf was attributed to the grid cell in which it fell.

| Spatial information and travel history
Relocation data were analyzed using statistical movement models developed by Schlägel and Lewis (2014). These models are spatially explicit random walks in which spatial information influences movement decisions. The random walk is performed on a discrete grid of cells in correspondence to the prey density data. To test the hypothesized explanations of wolf movement behavior (Table 1), three types of spatial attributes were considered. First, the combined prey density measure (prey) was normalized over the territory (see next paragraph) of each wolf. Second, for each territory, the minimum distance of each location from the territory edge (edge) was calculated. Distance from edge is zero at the territory edge and increases for locations more centered within the territory. Third, time since last visit (TSLV) was based on an individual's own travel history. TSLV was defined to specify at each A territory was defined for each wolf based on a Brownian bridge kernel estimate of the individual's utilization distribution obtained with R package "adehabitatHR" (Calenge, 2006;Horne, Garton, Krone, & Lewis, 2007). For this estimation, we used all locations including the first 300 steps for initializing TSLV. The purpose of the territory was twofold. We used it to estimate the "edge" of the territory, close to which the mortality risk due to aggressive encounters with other wolf packs may be higher. We also used the edge as a reflective boundary in the movement model to avoid an artificial avoidance of areas with long TSLV that were not visited during our study period for possibly external reasons (e.g., other pack activity). Therefore, the territory included all locations within the 99.9% quantile of the estimated utilization distribution ( Figure 1), which was the area that contained all locations possibly relevant for an individual during the study period.

| Movement model
In the models, two aspects affect the probability for a movement step between times t − 1 and t from location x t−1 to x t . First, a movement kernel k describes general tendencies regarding speed and directional persistence. Here, the kernel is composed of a Weibull distribution for step lengths and a uniform distribution for bearings (Appendix A1).
Second, given a probability distribution for a step based on the kernel k, a weighting function w adjusts these probabilities based on preferences for the three spatial attributes, which are encoded in the vector . Because the model is spatially explicit, each location x has its own values of the spatial attributes, that is  t (x) = (prey(x),edge(x),TSLV(x)).
The overall step choice probability is given by .

F I G U R E 1
Maps of winter movements of six individual wolves during 10-20 weeks. Colors reflect standardized prey density. Prey density is a combined measure of densities of the main ungulate prey species (deer, elk, moose, feral horse). Black circles are wolf locations with black lines indicating the straight-line steps between locations. Depicted are only "relocating" steps used for the anysis and exclude non-relocating steps such as when handling prey and resting (number of relocating steps was 177-332) We recall that locations represent discrete 300 × 300-m cells in space. The sum in the denominator is a normalization constant over a large enough area Ω around the current location such that the probability of stepping outside this area is negligible. The radius of the area Ω (ranging 30-44 cells, i.e., 9.0-13.2 km) was chosen to be larger than the longest step taken by the wolf. Steps outside the territory have probability zero.
The weighting function is modeled after a resource selection probability function (Lele, Merrill, Keim, & Boyce, 2013), giving the binomial probability of selecting a location x based on the attributes of the location,  t (x). Here, we used a logistic form, The predictor term f( t (x),α,β,γ) contains additive and multiplicative combinations of the spatial attributes, according to our hypotheses (Table 1). For the no preference model, the weighting function is constant over space, that is, f( t (x),α,β,γ) = 0, and only the kernel k influences movement. In the models risk avoidance, prey selection, and delayed return, the weighting function includes one spatial attribute, and the predictor term f( t (x),α,β,γ) is simply given by α+β tslv ⋅ TSLV t (x), α+β edge ⋅ edge(x), or α+β prey ⋅ prey(x), for each model, respectively. The parameter α is the intercept of the predictor term.
For a sigmoidal logistic function, it determines the position of the inflection point of the curve, that is, where the function reaches the value 0.5. For hypotheses that involved two spatial attributes, two models were considered, one with additive term only and one with additional multiplicative interaction. For the model prey selection and risk avoidance, the additive term is α+β edge ⋅ edge(x) +β prey ⋅ prey(x) The models territory surveillance and prey management were built analogously, with interaction parameters γ t,e and γ t,p , respectively. The parameters α, β tslv , β edge , β prey , γ e,p , γ t,e , γ t,p determine the direction and strength of preferences.
Following Aarts, Fieberg, and Matthiopoulos (2012), the weighting function w( t (x)) is a function of geographical space, x, via the spatial attributes  t (x) at a location x. It can alternatively be viewed as a weighting function over environmental space, , where attribute values  range over the three different spatial attributes TSLV, prey, and edge. This latter perspective allows an interpretation of the effects of spatial attributes on movement decisions similar to a classical step-selection analysis (Fortin et al., 2005). When considering the weighting function in environmental space, w( ), as a function of one variable, for example, w(TSLV), it is a sigmoidal curve. Additional additive terms of the other attributes (having β coefficients) shift the curve, whereas multiplicative terms (having γ coefficients) additionally influence the nonlinearity or shape of the curve. A shift in the curve means that the switch from an avoidance (small probability of selection) to a preference (high probability of selection) of a location happens at a different value of the spatial attribute. If the steepness of the curve increases (decreases), the switch happens more (less) abruptly.

| Statistical analysis
Movement data were analyzed individually for each wolf, comparing the fit of 10 models ( Maximum-likelihood estimates of the model parameters were obtained by numerically optimizing the model's likelihood function T A B L E 2 Parameter estimates, together with standard errors (SE) of the kernel k describing general movement tendencies (i.e., step length). The parameters are the shape (λ) and scale (σ) of the Weibull distribution used to model step length. The last column gives the mean of the resulting Weibull distribution. For each wolf, we show the parameter estimates from the best model compared to the null model. The null model consistently overestimates general tendencies for step length Because the analysis operated on 300 × 300 m cells, the mean values translate into meters via multiplication by 300, for example, a mean of 10 translates into a mean step length of 3 km ± 300 m.
using a Nelder-Mead algorithm implemented in R (R Core Team 2015, Appendix A1). Model selection was performed via Akaike information criterion (AIC). We used the small sample criterion AIC c because ratios of available steps to number of parameters for the most complex model were ≤40 (Burnham & Anderson, 2002). Within nested models, more complex models were selected when AIC c differences were larger than 2. We based this on the rule of thumb given by Burnham and Anderson (2002) that AIC differences of 2 or smaller indicate substantial support for a model. Adhering to the principle of parsimony, we therefore only selected a more complex model when its ΔAIC c was larger than 2 com-

| General movement tendencies
Based on the best-fit model, mean displacements over 2-hr time intervals (relocating steps only), calculated from parameters of the Weibull distribution for step lengths in the movement kernel k, ranged from 2,500 to 3,600 m (±300 m due to the spatial discretization) for the six wolves (Table 2). When comparing this with estimates based on the null model, there is a consistent trend. Estimates of both the shape (λ) and scale (σ) of the Weibull distribution were smaller for the bestfit model, which included selection for spatial attributes, than for the null model ( Table 2). The null model distribution corresponds to the "empirical kernel" used in classic step-selection analyses to sample "control" steps (Fortin et al., 2005). Here, this would have consistently overestimated step length by approximately 300-570 m per 2-hr step.

| Selection for spatial attributes
For all six wolves, the territory surveillance model with interaction of TSLV and distance from territory edge (H 2b ) had minimum AIC c (Table 3). For one individual, w230, the same minimum AIC c was reached by the prey management model with additive terms of TSLV and prey (H 6a ). The territory surveillance and prey management hypotheses are not mutually exclusive, and therefore both could be supported by the data without contradiction. Because this suggested the importance of both territory surveillance and prey management, we also tested a combined model with these terms (TSLV + edge + prey + TSLV × edge + TSLV × prey) in the weighting function. For w230, this became the best model, and for w284 the model performed similarly well as the territory surveillance model but was neither significantly better nor parsimonious (Table A1 in Appendix A2).
Parameter estimates of the weighting function for the territory surveillance model (H 2b ) of all wolves were consistent with our predictions. All multiplicative coefficients (γ t,e ) were positive and their confidence intervals did not overlap zero, while most of the additive coefficients (β edge , β tslv ) had confidence intervals that overlapped zero (Table 4). The overall effect of TSLV and edge on the probability of selection (modeled by the weighting function) was dominated by the multiplicative coefficient γ t,e and was therefore positive. The overall selection coefficient for edge, given TSLV, was β edge +γ t,e log (TSLV). As TSLV increased, this became positive already at TSLV = 2 (4 hr) in all cases. Similarly, the overall selection coefficient for TSLV, given edge, was β tslv +γ t,e ⋅ edge. As edge increased, starting from 0, this became positive at edge = 1 or 2 (corresponding to approximately 300-900 m from the edge) in all cases. As a result, there was strong evidence for wolves avoiding territorial boundaries, and as TSLV increased, wolf avoidance of the edge declined (Figures 2 and A2). When locations had not been visited for more than approximately 7 days, the weighting function approached a function nearly constant at one, which means that edge and central locations were selected with the same probability.
Movement patterns of wolf w230 also supported the prey management model (H 6a ), where parameter estimates and the resulting weighting function only partly agreed with our expectations in relation to the prey management hypothesis (Table 4). Consistent with our prediction, the selection coefficient β tslv was positive, and therefore the wolf selected for longer TSLV, indicating that returns to previously  Figure 3b). However, the coefficient β edge was negative and the wolf selected for locations with lower prey density (Table 4, Figure 3a). As a result, the inflection point of the sigmoidal curve from low selection of recently visited sites to high selection of sites with longer absence was shifted to a higher value of TSLV, which led to a longer delay in revisiting sites when prey density was high ( Figure 3b). Likewise, the selection for lower prey density was shifted to the right for increasing values of TSLV, which resulted in nearly equal selection for all prey densities after 5 days of absence ( Figure 3a).
When considering the combined territory surveillance and prey management model for wolf w230, all estimated selection coefficients (all β and γ coefficients) had large confidence intervals that overlapped zero (Table A2). When we plotted the weighting function based on these estimates nonetheless, it was constant at one over most of the range of the spatial attributes, with only two exceptions ( Figure A3).
First, abrupt declines to zero selection occurred for locations that had been visited during the last 2-hr-time step, which simply may indicate persistent movement. Second, the estimates predicted a decline to zero selection for the lowest prey density very close to the edge, and this effect vanished already slightly further inside the territory.
Considering also the large and zero-overlapping confidence intervals, these effects may be over-fits to spurious effects at the most extreme ends of the attribute values.

| DISCUSSION
We investigated how the time since last visiting a location influenced movement decisions in relation to territory surveillance and prey management. Our models are statistical in the sense that they define a probability distribution for observed movements but mechanistic in that they describe a behavioral movement process. This is in contrast to classic resource (or step) selection analyses that treat movement steps as independent data points and sample control locations (or steps) before estimating selection coefficients (Forester, Im, & Rathouz, 2009;Fortin et al., 2005). The advantage of our method is that parameters of general movement tendencies and spatially explicit preferences are estimated simultaneously without assuming that the two aspects are independent (see also Avgar, Potts, Lewis, & Boyce, 2016), which produces consistently lower estimates of the step length distribution than if independent, a priori estimates for step α β tslv β edge β prey γ e,p γ t,e γ t,p Negative coefficients for the additive term still resulted in a mostly positive relationship due to the positive interaction coefficient because the overall selection coefficient for TSLV, given edge, is β tslv + γ t,e ⋅ edge. Vice versa, the overall selection coefficient for edge, given TSLV, is β edge + γ t,e ⋅ TSLV.
T A B L E 4 Parameter estimates (Est.) and standard error (SE) of the best-fit logistic weighting function w based on spatial attributes TSLV, distance from territory edge (edge), and prey density (prey). Parameters β are selection coefficients of additive terms (shifting w), and parameters γ describe the multiplicative interaction of two attributes (changing shape of w nonlinearly). Parameter α is the intercept of the linear predictor and determines the position of the inflection point of the logistic weighting function where it reaches 0.5. Two best model estimates are given for wolf 230 because they had equal support. Estimates for which Wald-type 95% confidence intervals do not overlap zero are highlighted in italics length would have been used. An additional advantage to this approach that we did not use in this analysis is incorporating directional autocorrelation of movement in the movement kernel k (Schlägel & Lewis, 2014). In our case, we did not use this approach because our time series spanned only several weeks, and because we eliminated non-relocating behaviors such as handling a kill, resting away from a kill site, or revisiting kill sites Merrill et al., 2010).
Using autocorrelated bearings would have decreased the number of steps available for the analysis even further, because more than two successive location measurements would have been needed to define the probability of a step.  (Carbyn, 1983;Mech & Harper, 2002). Second, the probability of wolves revisiting these areas increased in time suggesting wolves were responding to a decay in scent marks, which are needed for territorial maintenance (Peters & Mech, 1975;Zub et al., 2003). Scent marks contain pheromones and chemical signals that elicit responses from other individuals and can prevent direct, aggressive encounters (Mech, 1994). They are thought to be an effective means of advertisement because the scent remains in the environment for some time and is readily detected even at night (Feldhamer, Drickamer, Vessey, & Merritt, 2004). Peterson (1974) found on Isle Royale that wolves reversed direction of travel and retreated when they encountered a foreign scent mark along the edge of their territory. Ausband, Mitchell, Bassing, and White (2013) also reported that wolves will avoid areas where humans place wolf scats if they are regularly maintained.
Indeed, the consistency in territorial surveillance among all six wolves indicates there is strong motivation for rotational movements to revisit the territory edge for territory maintenance (Jedrzejewski et al., 2001).
In contrast, we found less support for prey density influencing wolf movements and for movements being consistent with behavioral depression of prey. One of the six wolves showed some evidence of its movements being influenced by prey density, but even this wolf did not select for areas of high prey density as was reported for this  weighting function w TSLV 1 week 1 day 6 hr 2 hr

F I G U R E 3
Weighting function for the prey management model (H 6a ), with parameter estimates from individual w230, for which this model shared first rank. The weighting function gives the probability of selecting a location based on spatial attributes, here depicted as function (in environmental space) of (a) prey density and (b) time since last visit (TSLV). Prey density (numbers per cell) was standardized across the territory. When prey density was high, the probability of selecting a location was higher when time since last visit was longer. When prey density was low, time since last visit mattered less rather than prey density influenced movements. In our approach, we focused on selection along the hunt path and found that the wolf selected areas of low rather than high density. In addition, we analyzed "relocating" movements and did not include short steps. Wolves possibly slow down in high prey density areas, which could have led to a removal of short steps in high prey density areas in our analysis. The movements of the wolf with an effect of prey density also showed evidence of prey management, where a predator delays a revisit to an area because a visit evokes prey behaviors that make them less vulnerable (Charnov et al., 1976;Jedrzejewski et al., 2001;Kotler, 1992;Laporte, Muhly, Pitt, Alexander, & Musiani, 2010). We had expected that wolves would revisit sites with high density sooner because there could be higher variation among individual prey relaxing postencounter anti-predator behaviors and predisposing them to wolf attacks; however, when locations had been visited recently, selection by wolf w230 was highest for areas of low prey density perhaps because low densities are associated with increased vulnerability if group sizes are small (Bergmann et al., 2006;Hebblewhite & Pletscher, 2002;Kuzyk, Kneteman, & Schmiegelow, 2004).
From a modeling point of view, we were able to test the influence of time since last visit separately for territory maintenance and for foraging behaviors; however, an integration of the two behaviors within one model was more difficult. For wolf w230, the combined model fit better than the territory surveillance and the prey management model alone. But parameter estimates of the weighting function in the combined model suggested an over-fit to spurious effects of the spatial attributes at their most extreme values. A possible explanation is that wolves make decisions in a way that our logistic weighting function was unable to represent. The logistic function could track earlier or later returns to locations based on distance to edge or prey density.
However, wolves may assimilate territorial and foraging behaviors in a different nonlinear way (Rothley, Schmitz, & Cohon, 1997). We suggest further research along this line, possibly by modifying the form of weighting function in our modeling framework.
Our model discretizes both space and time, which has implications for the generality of our results. In our random-walk model, we implicitly assume that temporal scales of the underlying behavioral process and our data (2-hourly) match. This is a common problem when fitting discrete-time movement models to data for statistical inference, leading to parameter estimates that are tied to the scale of the analysis and that may not necessarily agree with the "true" parameter values at the scale of the behavioral process (Schlägel & Lewis, 2016). Despite this, we believe our results qualitatively reflect the wolves' behavior, also because we used a logistic form of the weighting function instead of an exponential form; the former having performed better in a simulation-based analysis of the robustness of resource-selection type movement models (Schlägel & Lewis, 2016).
In general, impact of spatial resolution is less clearly understood. In our analysis, we used a relatively coarse discretization of 300 × 300 m cells. Using a finer discretization would have increased computational burden because the bottleneck during likelihood function optimization was the computation of the normalization constant in the step probability (eqn 1). This constant requires multiplication of kernel and weighting function for all locations within an area that the individual may possibly move to based on the current location (and this constant has to be computed for every data point in the time series). For a finer spatial resolution, the same area would consist of more locations, which would (nonlinearly) increase the amount of calculations necessary. With increasing computational power, or by further streamlining the code, it may be possible to reduce current runtime (1-2 days for our six wolves using multiple CPUs). However, we considered the discretization sufficient because of the design of TSLV in our model. For calculating TSLV, we used a buffer of about 1.2 km around the straight line between consecutive GPS fixes because wolf passage affects prey behavior beyond the actual movement path (Latombe et al., 2014;. Therefore, for the sake of TSLV, a finer spatial discretization would not have increased the resolution biologically. Ideally, the size of the buffer would be integrated as a free parameter that is estimated during model fitting, in which case it could vary for different models (e.g., prey management and territory surveillance). In our analysis, we fixed the buffer size to keep model complexity at a reasonable level given the limited time series length of our data.
The approach in this paper provides a step forward in the ongoing attempt to incorporate cognition and memory in movement analyses (Avgar et al., 2015;Börger, Dalziel, & Fryxell, 2008;Fagan et al., 2013;Oliveira-Santos, Forester, Piovezan, Tomas, & Fernandez, 2016). Our method goes beyond previous approaches that investigate traplining (Ohashi, Leslie, & Thomson, 2008) or periodicity in recursive movement patterns (Bar-David et al., 2009;English et al., 2014;Giotto, Gerard, Ziv, Bouskila, & Bar-David, 2015). In our models, time since last visit to locations is a spatially explicit feature that influences movement decisions in combination with information on territory geometry and prey densities. This allowed us to investigate behaviorally complex movement strategies in wolves, and we demonstrated that time since last visit influenced future movement decisions in relation to territory surveillance and prey management. Our approach can similarly be used to study the effect of time since last visit in other contexts of resource renewal (e.g., D'Souza, Patankar, Arthur, Marbà, & Alcoverro, 2015;Janmaat et al., 2006).
Despite some progress in studying cognitive aspects of animal movement, few studies have quantified the temporal and spatial scales at which individuals are aware of and respond to non-local information. Reported time spans during which ungulates shift their habitat selection after wolf presence range from 1 day (Creel, Winnie, Maxwell, Hamlin, & Creel, 2005) to up to 10 days (Latombe et al., 2014). In contrast, Avgar et al. (2015) found indication of no memory decay in a space-use analysis of woodland caribou. In our study, wolf w230 showed a varying response to prey density within approximately 5 days since last visit, after which the probability of selection leveled off at one for all locations. Similarly, after approximately 7 days of absence, wolf movement decisions became irrespective of distance from territory edge. These estimates are roughly in line with the scales reported by Latombe et al. (2014). In a predator-prey system where predators win the behavioral response race (Sih, 2005) we may expect predators' response times to be larger than the prey's response, and vice versa. We need more studies that track predators and prey simultaneously and analyze the temporal scales of awareness for both predators and prey to elucidate this further. Simultaneous tracking studies have the additional advantage that temporal scales of data can be matched. As discussed above, we should expect parameter estimates from resource-selection type analyses to be scale dependent.
Unless we use truly robust models, comparisons of cognitive awareness are best to be attempted when models make the same assumptions about the scales of the behavioral processes.
In our analysis, we used a fixed buffer size for modeling the spatial extent at which locations were considered "visited" for the purpose of calculating TSLV. A possible extension of our model would treat the buffer size as a free parameter to be estimated during model fitting.
With this, it would be possible to also gauge the spatial scale at which individuals experience their environment for this specific purpose.
Using information on elapsed times ("how long ago?") can be part of episodic-like memory in animals, a complex form of memory on the what, when, and where of events, which has been demonstrated in experiments in birds, rodents, and apes (Clayton & Dickinson, 1998;Martin-Ordas, Haun, Colmenares, & Call, 2010;Roberts et al., 2008). Wolves may store and retrieve information on elapsed times in internal memory (Jacobs, Allen, Nguyen, & Fortin, 2013;Lew, 2011), but wolves may also use externalized memory in the form of their own scent marks (Peters & Mech, 1975), as has been argued for neurologically simple amoebae (Reid, Beekman, Latty, & Dussutour, 2013). However, whereas scent marks need to be encountered to retrieve information on previous visits, internal memory allows more efficient integration of information for goaloriented movement (Asensio & Brockelman, 2011;Polansky et al., 2015). Therefore, including goal-oriented movement rules in a modeling framework such as ours would further elucidate the importance of internal memory.

CONFLICT OF INTEREST
None declared.
Processed data and R code to run the analysis for one individual can be found online in the Supporting Information for this article.

COMBINED PREY DATA
To calculate the combined prey density measure, we calculated a weighted sum of prey numbers of all four species. That is, our measure of prey density was where N is the number of individuals of the prey species indicated in the subscript, and w is a weight between 0 and 1 to adjust for the size of the prey. The weights were based on ungulate bodymass in winter ), for simplicity averaged over female and males, To make the weights unitless, we converted them to a number between 0 and 1 by dividing the value of each species by the sum of all,

TIME SINCE LAST VISIT
We defined the variable TSLV to specify at each time step t, and for each location x, the time, measured in time steps, since the animal had last been to the location, denoted by m t (x). For example, if between times t−1 and t the animal moved from location x t−1 to x t , we considered all locations on the path from x t−1 to x t as most recently visited and and set their value of TSLV at time t to be 1. That is, we defined m t (z) = 1 for all locations z that lie on the path between x t−1 and x t . For the calculation of TSLV, we defined the path to be the straight line between two locations. Because it is unlikely that an individual moves in a straight line, we also considered locations within a certain distance of the line as visited (for the purpose of calculating TSLV). For these locations, TSLV was also set to 1. Because we aimed to understand the influence of the travel history in relation to prey, we took into account at which distances wolf presence influences prey behavior.
Studies on elk-wolf relationships found that wolf presence can affect elk behavior, such as group size, vigilance, and movement rates, at distances of 1-5 km Proffitt et al. 2009). Here, we used discretized space with landscape cells of size 300 × 300 m. We defined a buffer around a cell using the rectilinear distance measure (more colloquially also referred to as Manhattan distance) where x and y are two locations on the grid with easting (x-axis) and northing (y-axis) coordinates x east , y east and x north , y north , respectively.
The coordinates were taken from the center of each cell. With this distance measure, the buffer becomes a diamond-shaped area around the center cell. If we define a buffer of size δ around the location x, the corners of the buffer area are those cells that are δ cells away from x in exact northern, eastern, southern, and western directions. For the calculation of TSLV, we used a buffer of size of four cells. We calculated the buffer for each cell that is intersected by the straight line of a step (Fig. A1,a). A distance of four cells in the discretized space corresponds to 1.2 km in continuous space. Using a buffer around the straight line between two locations was a simple way of accounting for the fact that we did not observe all locations that an animal visited on its path.
A more sophisticated approach would be to implement, for example, a Brownian bridge for the estimated path between two successive locations (Horne et al. 2007). One could even go further and expand a Brownian bridge model to include the more complex movement mechanisms studied here.
For all other locations that were not considered visited, TSLV increased by 1 at every time step. That is, we set m t (z) = m t−1 (z) + 1 for locations z not visited during times t−1 and t. This led to a map with values of TSLV similar to a map with environmental attributes, but which changed at every time step. TSLV increased in areas that the individual stayed away from and was reset to 1 whenever an individual visited a location, that is, when it came sufficiently close to the location (Fig. A1,b). The dynamic map was updated at the end of each movement step, and therefore, the weighting function w at time t was based on TSLV at time t−1.
Given TSLV for some point in time, it is straightforward to update it for all following time steps based on the animal's movement path. To obtain an initial map of TSLV, we separated movement trajectories into two segments. We used the first 300 movement steps to initialize TSLV and used the rest of the trajectory for statistical inference. The time that corresponded to the beginning of the second part of the trajectory was set to be t = 1. We calculated TSLV at t = 1 for all locations that were visited during the initialization phase. For locations that were not visited, we set TSLV to the length of the initialization phase.
The trajectories contained missed observations. If at a time step t the corresponding location was missing, we updated TSLV by increas- ing TSLV for all locations by 1. We did not reset any value to 1 because there was no current path available. However, we accounted for this later at the next available time step. At that time, we reset TSLV to 1 for the entire path since the last available location. Because at least two time intervals had passed since the last location, we increased the buffer size for these longer steps by 2.

MOVEMENT KERNEL
The general movement kernel k is the density function of a random walk in discretized two-dimensional space. For this, we sampled a continuous-space density at discrete points (representing the center  TSLV location of each cell in the landscape). The normalization constant in the step probability given in the manuscript assures that step probabilities are properly normalized over the discretized space. We used a Weibull distribution for step lengths and assumed a uniform distribution for bearings. A major reason for using simply a uniform distribution for bearings was to retain as many steps as possible. In a correlated random walk, bearings are autocorrelated, and therefore, three successive location measurements are needed to define the probability for one movement step. With missing measurements in the time series, this would decrease the number of available steps. For example, for wolf w233 with 177 steps available for analysis, this would remove 21 of those steps, despite a reasonable fix rate of 84%. Given that using autocorrelated bearings would also add a parameter, this would reduce the ratio of available data points and number of parameters to 17 (from 22).
Thus, the kernel is given by where λ and σ are the shape and scale parameter of the Weibull distribution, respectively. The factor 1/|| x−y || is due to a change from polar coordinates (using step lengths and bearings to define a step) to Euclidean spatial coordinates.

RELOCATING STEPS
We only analyzed steps with a minimum length.  used a hidden Markov model to identify the three major modes "bedding," "localized activity," and "relocating" in wolf behavior. They found that the relocating mode was characterized by steps with length above 200 m, with the majority of steps between 500 and 2500 m. These distances were obtained using movement data with hourly location measurements and therefore were not immediately transferrable to our study with 2-hourly movement data. Roughly, steps at a rate of 500 m per hour may be converted to 1000 m per 2 hr although it is known that measurements of travel distance are influenced by sampling rate, and the longer the time interval between location measurements the larger the risk of underestimating true travel distance (Pépin et al. 2004;Rowcliffe et al. 2012). Still, with these considerations, it seemed appropriate to set the threshold for defining relocating steps at about 1000 m. If movement was straight in east-west or northsouth direction, 1000 m corresponded to about three cells in the discretized space. Another point to consider for the threshold was the use of the buffer for TSLV. If a step was within the buffer size of the last visited location, the step naturally ended at a location with TSLV = 1 (Fig. A1,c). In contrast, if a step was larger than the buffer size, which was four cells (~1200 m), it could end at a location with TSLV = 1, especially when the animal backtracked. However, there was also a chance that the step ended in a location outside the buffer of the previous step with TSLV > 1. To avoid an artificial bias toward smaller values of TSLV for small steps, we defined the minimum step length to be five cells, corresponding roughly to 1,500 m in continuous space (Fig. A1,c).
Using only steps with a minimum length, strictly speaking we would have to adjust the movement kernel k by truncating the Weibull distribution at the minimum step length. This would lead to a slightly lower estimated mean step length for all models. Given the fairly large mean step lengths, we did not consider this problematic. In addition, we did not implement the truncated Weibull because our aim was to restrict the analysis to steps that can be attributed to a "relocating" behavioral mode. Ideally, a distinction of behavioral modes is performed by other means, for example, using a hidden Markov model (McClintock et al. 2012), in which case "relocating" steps could have also smaller step length. We did not embed our model into a hidden Markov model, because based on our restricted time series length, we could not increase model complexity arbitrarily. However, for data sets with longer time series length, for example, due to higher temporal resolution, we recommend more sophisticated approaches to data segmentation.

LIKELIHOOD FUNCTION AND OPTIMIZATION
The likelihood function of the model is for all available steps from Pépin, D., Adrados, C., Mann, C., & Janeau, G. (2004). Assessing real daily distance travelled by ungulates using differential GPS locations.

APPENDIX A2
T A B L E A 1 . Model selection results when the models with time-dependent effect for both distance from edge and prey density (last two rows) were included. Presented are AIC c differences, ΔAIC c,i = AIC c,i − AIC c,min for each model i. Best models are highlighted in bold. For individual w230 the most complex model becomes best, however parameter estimates suggest that the model is an over-fit to spurious effects of extreme values of the spatial attributes (Table A2, Fig. A3).

F I G . A 2
Weighting function for the territory surveillance model (H 2b ) with minimum AIC c using the estimated parameters. Panels correspond to individual wolves. The weighting function gives the probability of selecting a location based on spatial attributes, here depicted as function of distance from territory edge (in environmental space). Distance from the edge is measured in discrete cells (300 × 300 m). For all wolves, the increasing direction of the sigmoidal curve indicates that locations closer to the edge are avoided. With increasing time since last visit (TSLV), the curve is shifted to the left due to the positive coefficient (β tslv ) and becomes steeper due to the positive interaction parameter (γ t,e ), indicating that the avoidance of edge locations decreases. The weighting functions for the different wolves show the same general pattern and only vary slightly. This variation is likely due to individual variation of the wolves' behavior and territories (see also Fig. 1).  The weighting function gives the probability of selecting a location based on spatial attributes, here depicted as function in environmental space of the three spatial attributes time since last visit (TSLV) (a and b), prey density (c and d), and distance from territory edge (e and f). TSLV is measured in time steps; prey density (number per cell) is standardized over the territory; distance from edge is measured in cells (300 × 300 m). a and c: Distance from edge is fixed at 2 (approx. 600-900 m from edge). b and d: Distance is fixed at 5 (approx. 1.8-2.1 km from edge). e and f: Prey density is fixed at −1 (below average) and 1 (above average), respectively. The weighting function was nearly constant at 1 across most of the ranges of spatial attributes, with only two notable exceptions. First, abrupt declines to zero selection occurred for locations that had been visited during the last 2-h-time step (all panels), which simply may indicate persistent movement. Second, the estimates predict a decline to zero selection for very low prey density close to the edge (a and c), but this effect vanished already slightly further inside the territory (b and d). Thus, effects were predicted only at the most extreme ends of the attribute values. Furthermore, standard errors of parameter estimates were large such that Wald-type confidence intervals of the parameters would overlap zero (Table A2).

Runtime benchmarks
Here, we provide an overview of the computational load of the model fitting. The measurements provided here result from test runs on a PC with Intel Core i5 2.3 GHz processor. We used data of wolf 220. Model fitting required optimization of the likelihood function over a multidimensional parameter space. For the null model, there were only two parameters to estimate, while the most complex model contained eight parameters (those presented in Table A2, plus σ and λ). One evaluation of the likelihood function required approximately 0.54 seconds.
Most of this runtime can be attributed to computation of the normalization constant (denominator) in eqn 1. During the optimization routine, the likelihood function has to be evaluated many times. The amount of calls to the likelihood function necessary until the optimization routing converges varies depending on which model is fitted. During our test runs, the null model required up to 93 calls to the function; the territory surveillance model (H 2b ) required up to 1147 calls; the most complex model required up to 2000 calls (which we had set as maximum number of iterations in the optim function in R). To find a global maximum in multidimensional parameter space, it is customary to perform the optimization multiple times with varying starting values. We used 20 starting values and used the resulting optimum for the final optimization. Thus, if we use 1000 iterations until convergence as a baseline, one model fit requires approximately 190 minutes. We had 10 models (Table 1), not counting the additional runs with the most complex model (Table A1), and six wolves, resulting in an approximate runtime of 8 days. Note that the actual runtime, however, can be shortened by running the analysis simultaneously on multiple CPUs.