The challenges of estimating the distribution of flight heights from telemetry or altimetry data

Global positioning systems (GPS) and altimeters are increasingly used to monitor vertical space use by aerial species, a key aspect of their ecological niche, that we need to know to manage our own use of the airspace, and to protect those species. However, there are various sources of error in flight height data (“height” above ground, as opposed to “altitude” above a reference like the sea level). First the altitude is measured with a vertical error from the devices themselves. Then there is error in the ground elevation below the tracked animals, which translates into error in flight height computed as the difference between altitude and ground elevation. Finally, there is error in the horizontal position of the animals, which translates into error in the predicted ground elevation below the animals. We used controlled field trials, simulations, and the reanalysis of raptor case studies with state-space models to illustrate the effect of improper error management. Errors of a magnitude of 20 m appear in benign conditions for barometric altimeters and GPS vertical positioning (expected to be larger in more challenging context). These errors distort the shape of the distribution of flight heights, inflate the variance in flight height, bias behavioural state assignments, correlations with environmental covariates, and airspace management recommendations. Improper data filters such as removing all negative flight height records introduce several biases in the remaining dataset, and preclude the opportunity to leverage unambiguous errors to help with model fitting. Analyses that ignore the variance around the mean flight height, e.g., those based on linear models of flight height, and those that ignore the variance inflation caused by telemetry errors, lead to incorrect inferences. The state-space modelling framework, now in widespread use by ecologists and increasingly often automatically implemented within on-board GPS data processing algorithms, makes it possible to fit flight models directly to the output of GPS devices, with minimal data pre-selection, and to analyse the full distribution of flight heights, not just the mean. In addition to basic research about aerial niches, behaviour quantification, and environmental interactions, we highlight the applied relevance of our recommendations for airspace management and the conservation of aerial wildlife.

of animals in horizontal space has dominated ecological studies [1], however the vertical dimension is also important for flying animals, and for that matter also diving and tree-climbing animals [2][3][4][5]. For example, flight height data document the vertical niche and community ecology of aerial foragers [6,7]. Flight height data quantify the flight strategies and associated energy allocation tactics [8,9], and their relationships with environmental factors (e.g., [10]). Lastly, from an applied perspective, we need an accurate, error-free description of the distribution of birds and other animals in the aerosphere to avoid collisions with man-made structures and aircraft, in the current context of increasing human encroachment into the airspace [11,12].
However, monitoring vertical airspace use by wildlife remains challenging. Ground-based surveys are limited in their field of vision and time window. Airborne monitoring (e.g., from glider planes) is logistically challenging and constrained by weather conditions. Radar-based methodologies are not usually specific enough to assign records to species (but see [13,14]). Animal-borne tracking methodologies such as global positioning systems (GPS) and altimeters have therefore become popular to monitor flying species [15]. They record data even when the animals are out of sight for ground-based observers, over extensive, potentially uninterrupted periods of time, and with no uncertainty about which species or individuals are being monitored. For example, we can record raptors soaring over the high sea at night [16]. However, the data that GPS and altimeters record are not error-free [17][18][19][20]. Usually, a few unambiguously erroneous positions are recorded beyond unpassable barriers like the ground [10,[21][22][23][24][25], making the occurrence of errors particularly more obvious in flight height data than other movement tracking data.
Most of the research into ways to deal with sampling errors in positioning data has focused on horizontal animal movement [20,[26][27][28]. There is very little guidance for ecologists about the challenges specific to vertical space use data [29]. Many practitioners consider that vertical movement data need to be "filtered" before analysis, i.e., they discard some records before proceeding with the analysis. They may discard records that are too far from preceding ones (as often done for horizontal data [27]), too far beyond impassable barriers [24,25], or obtained from an unreliable configuration of the GPS satellite network [29]. Instead of discarding the more erroneous records, researchers have also sometimes chosen to reset them to plausible values [21,23]. However, when applied improperly, such filters can have undesirable consequences. We start by reviewing the sources of error in GPS and altimeter flight height data. Next, we reanalyse case studies into the flight height of three raptor species [10], and complement them with novel data from controlled field trials and from simulations, in order to illustrate the stakes of proper error-handling in vertical airspace use data.

Review of the sources of error in flight height data from GPS and altimeters
Throughout we refer to flight height h, which is the distance to the ground below the bird, different from flight altitude z. The flight altitude denotes the distance to a reference altitude, often the ellipsoid, i.e., a geometrically perfect (but simplistic) model of the sea level, as documented by the World Geodetic System (WGS84 or EPSG:4326). Alternatively, some GPS devices may provide the altitude relative to the empirical sea level, as measured at a reference point over a reference period. For example, in France the "NGF-IGN 1969" norm means that altitude is measured relative to the mean sea level in the port of Marseille between 1884 and 1896. Alternatively again, some GPS devices may measure the altitude relative to the geoid, which is a model of the sea level if it was only influenced by the local gravitational field and the rotation of the Earth, i.e., without the effect of landmasses and wind [30]. There are databases and simple formulae to convert from one system of reference to another, but this nevertheless represents a first potential source of error in flight height data.
Flight height above the ground is computed as where z DEM x, y is the ground altitude predicted by a digital elevation model (DEM) at the recorded horizontal position x, y , in the same system of reference as z. Errors in h can then be caused by errors in any of the three components: z , z DEM , or x, y ( Fig. 1). Importantly, depending on the application, researchers might want to study z not h [8,9]. In the list below, only the first and second sources of error influence z. The other three influence h but not z.

Error in z when z is given by a GPS
If recorded by a GPS, z is affected by the "user equivalent range error" (UERE) and the "vertical dilution of precision" (VDOP) [31,32].
The UERE stems from diffusion and diffraction in the atmosphere, reflection from obstacles, and receiver noise [31,32]. The acronym UERE usually directly refers to the root mean squared error, but here we will use the notation σ UERE instead. σ UERE is usually in the order of a few meters and considered constant over time for a given device. Some GPS manufacturers specify the horizontal σ UERE , or alternatively it can be estimated from the data h = z − z DEM x, y , [33]. The σ UERE is however reputedly larger in the vertical axis than the horizontal axes [19,34], meaning that manufacturer-provided σ UERE should be considered conservative for vertical applications and should be used with appropriate caution.
The vertical position dilution of precision factor (VDOP) quantifies the effect of changes in the size and spatial configuration of the available satellite network on the precision of GPS records [31,32] (Additional file 1: Fig. S1). The more satellites are available and the more evenly spread apart they are, the more reliable the positioning is. Some GPS manufacturers do provide a VDOP value for each record, but many only provide a more generic DOP value.
When σ UERE and VDOP are known, the error-generating process can then be approximated by a Gaussian process with time-varying standard deviation σ z (t) = VDOP(t) · σ UERE (Eq. 6.45 in [32]). Therefore, the DOP is not a direct index of precision. The spread of the error distribution increases with the DOP, but the error on any given record is stochastic. The DOP is therefore not intended to be used as a data filter (e.g., discard any data with DOP above a given threshold), but instead it should be used to model the error-generating process.

Error in z when z is given by an altimeter
If recorded using an altimeter, z is computed from the barometric pressure, using the formula z = c · T · log (P REF /P) [35,36]. c is a calibration constant that mostly depends on the composition of the air (e.g., percentage of vapour) and on the gravitational field. T is the air temperature in Kelvin, P is the air pressure, and P REF is the air pressure at an elevation of reference (both pressures in mbar or in Pascal). However, this formula only holds when the atmosphere is at equilibrium. Changes in temperature, pressure, and air composition, i.e., the weather, alter the link between z and P. These influences are difficult to control fully because one would need to measure the weather variables both where the bird is, and at the reference elevation immediately below the bird. In other words, altimeters can be more accurate than GPS to monitor flight height, but only over short periods of time when the weather can be considered constant and the altimeter is calibrated for that weather. One should ideally regularly re-calibrate the altimeters using direct observations of flight height and accurate measures of P REF and T. Unfortunately, field calibrations are rarely feasible in practice (but see [37,38]). The consequence is that altimeters are often miscalibrated. The degree of miscalibration depends mostly on the weather. This generates temporal autocorrelation in the error time series. Over a restricted time period, the error patterns are thus more akin to a bias (a systematic over-or underestimation of flight height) than to an error in the statistical sense of a zero-mean, identically and independently distributed random process. Importantly, altimeter data still allow one to compute the derivative of flight height, i.e., climb rate, because the amount of bias can be considered constant over short periods of time. In the following (cf. "The magnitude of vertical errors in GPS and altimeters" in "Results"), we will directly compare the errors from GPS and altimeters using controlled field experiments. x, y is also affected by a user equivalent range error and a dilution of precision (Fig. 1). The horizontal error in x, y can thus also be described as a Gaussian process with time-varying standard deviation: σ xy (t) = 1/ √ 2 · HDOP(t) · σ UERE . Note that we use here a horizontal dilution of precision factor, HDOP. An often-overlooked consequence of errors in the horizontal position is that they introduce flaws in the link to spatially explicit environmental covariates [39,40]. In particular, the ground elevation z DEM is extracted from a location x, y that is slightly different from the true location [24]. If the terrain is very rough, then the ground elevation at the recorded location x, y may be significantly different from the ground elevation below the actual location of the bird. In the following (cf. "Horizontal errors can cause vertical errors" in "Results"), we will use simulations to quantify the influence of horizontal errors.

Interpolation error in z DEM
z DEM is interpolated from discrete ground elevation measurements [41,42]. The ground elevation is measured at a few select locations, but it is interpolated between them. The result of the interpolation is then rasterized at a set resolution, and the result is the DEM. This process can be quite imprecise [41,42]. At a cliff, for example, the ground elevation may drop by several hundred meters within a single pixel of the DEM.

Errors in DEM base data
The original measurements from which DEMs are interpolated are not necessarily error-free either. These errors are assumed small relative to the other sources, however, there is, to our knowledge, not much information available about the base datasets from which DEM are interpolated and their precision.

Controlled field trials
To quantify the magnitude of the vertical error in altimeters and GPS devices, we conducted three controlled trial experiments.
First, we attached an "Ornitrack 25" GPS-altimeter unit (Ornitela) to a drone. We then flew the drone above the rooftop of the Max-Planck institute in Radolfzell, Germany, at heights ranging from 0 (drone landed on the rooftop) to 90 m. We conducted 6 flight sessions over 2 days, each lasting between 15 and 140 min, collecting one record every 10 min for a total of 30 records, flying between 0 and 100 m above the rooftop. We also monitored the air pressure and temperature on the rooftop, which we used to recalibrate the altimeter post hoc. Lastly, the drone carried a separate, on-board, altimeter.
In a second, separate experiment, we attached two "Gipsy 5" GPS units (Technosmart) to an ultra-light aircraft, with a vertical distance of 1.8 m between the two units. We then flew the aircraft near Radolfzell while the two units simultaneously tracked its flight height, collecting one record per second for a total of 11.5 h over 5 days, flying between 0 and 243 m above ground.
Third, we compared the vertical positions recorded by four different units from three different manufacturers: Technosmart (AxyTrek and Gipsy 5), Microwave (GPS-GSM 20-70), and Ornitela (GPS-GSM Ornitrack 85). We (RG and OD) carried these units to 21 known geodesic points, of which the altitude was precisely documented by the French National Geographic Institute. The units recorded their position once every minute for a total of 894, 934, 560, and 563 data points, keeping only the unit * location combinations that yielded more than 25 fixes. We computed the bias and root mean squared error of the vertical measurement by comparing these data to the actual, known altitudes of the geodesic points. Importantly, the manufacturers do not use the same reference to compute the altitude: microwave uses the geoid (WGS 84 EGM-96 norm), whereas the others use the mean sea level (assumed to correspond to the local reference, meaning the NGF-IGN 1969 norm, but see below). We expressed all altitudes in the same norm before computing biases and errors, and accounted for sampling effort (number of fixes) and location when comparing the performance of different units.

Simulations of flight tracks
We simulated flight tracks that followed Ornstein-Uhlenbeck processes [43]. This is a well-studied class of continuous-time stochastic models, which is not specific to vertical movement or even to movement [43]. In the case of vertical movement, the parameters of an Ornstein-Uhlenbeck process represent the mean flight height, the variance in flight height, and the autocorrelation time. The mean flight height varied from 10 to 800 m (drawn from a uniform distribution). The variance in flight height varied from 10 to 750 m 2 (6 values within this range). The autocorrelation time varied between 0.1 and 1.5 arbitrary time units (uniform distribution). We transformed the raw Ornstein-Uhlenbeck simulations using an atanh link as described by Péron et al. [10] to enforce positive flight height. Because these are simulations, we then knew both the true flight height and the recorded flight height, which is the true flight height plus an independent and identically distributed zero-mean Gaussian error.

Simulations of synthetic landscapes
The objective was to quantify the influence of horizontal errors. We generated synthetic landscapes of varying complexity and roughness (Additional file 1: Fig. S2). We then transposed the flight track of a lesser kestrel Falco naumanni over these synthetic landscapes. The individual originally flew over extremely flat terrain (the Crau steppe in France). The data (P. Pilard and OD, unpublished) were collected every 3 min using a Gipsy 5 GPS unit from Technosmart, and processed through the statespace model of Péron et al. [10] to account for real sampling errors before use. We then added simulated random telemetry noise of controlled standard deviation.

Raptor case studies
We reanalysed the data from Péron et al. [10], where the field procedure, data selection, and data analysis procedures are described in full. Briefly, we studied three species of large soaring raptors: Andean condors Vultur gryphus (five juveniles, 1692 individual days of monitoring, 15 min interval), Griffon vultures Gyps fulvus (eight adults, 2697 individual days, 1-5 min interval), and Golden eagles Aquila chrysaetos (six adults, 3103 individual days, 6-10 min interval). After applying the analytical procedure, for each data point, we could compare the corrected position, an estimate of the true position, to the recorded position, which was affected by the sources of errors we listed under "Background".
For the condors, we selected the period between 11:00 and 15:00, which concentrates condor activity and therefore flight time, and discarded other records. For the vultures, we selected the period between 09:00 and 16:00. For the eagles, we selected the period between 08:00 and 17:00 and, because a lot of time is spent motionless in this species even during their core activity period, we further removed all the records that were less than 15 m from the previous record. We acknowledge the arbitrary nature of this data selection and emphasize that it is not necessary or even recommended to apply such filters before analysis. We, however, stress that in the context of the present study, the case studies perform an illustrative function, meaning that we use them to highlight the effect of improper error-handling, at least during the particular time periods that we selected for analysis because we consider them relevant for biological inference, and that the same analytical procedures can indiscriminately be applied to other time frames.

Collision risk
In several instances, we will illustrate the potential effect of improper data-handling on management recommendations by estimating the risk of collision with wind turbines as the proportion of records between 60 and 180 m above ground (assuming no behavioural adjustment in the presence of wind turbines). Collision risk estimated from GPS tracks is increasingly used to make recommendations about the choice of locations for new turbines, or to schedule the operation of existing ones. We expected that the estimated collision risk would depend on flight parameters (mean flight height, variance in flight height), on the magnitude of errors, and on error-handling. For example, a large variance in flight height might lead to a high collision risk even if the mean flight height is beyond the collision zone. Improperly handled errors may lead to positions being erroneously recorded in the collision zone when the birds actually flew outside of it, and vice versa. The same type of thinking could be applied to other types of collision risk, e.g., antennas, utility lines, buildings with bay windows, except that the collision zone would be at a different height.

The magnitude of vertical errors in GPS and altimeters
During the first controlled field trial (with the drone), DOP values between 1.2 and 1.6 indicated that the configuration of the satellite network was reliable throughout. Nevertheless, 6.7% of the GPS flight height records were below the rooftop height, i.e., obviously erroneous. For the altimeter, with default settings, 10% of the records were below the rooftop height. The default settings of the altimeter therefore did not correspond to the atmospheric conditions during the experiment. The standard deviation of the difference between the recalibrated altimetry and the GPS data was 22 m, between the recalibrated altimetry and default-setting altimetry it was 14 m, and between the recalibrated altimetry and the onboard drone altimeter it was 19 m. This means that, with default settings, the altimeters had approximately the same precision as the GPS.
During the second controlled field trial (with two GPS units attached to the same aircraft), in 35% of cases, the lower unit was erroneously recorded above the higher unit. The standard deviation of the difference between the heights recorded by the two units was 7.1 m. The highest of the two units recorded 3% of negative flight heights. The lowest unit recorded 13% of negative flight heights.
During the third controlled field trial (with GPS units carried to a geodesic point of precisely known altitude), the mean absolute bias of the vertical measurement was 27 m on average across units and locations. The root mean squared error ranged from 14 to 42 m depending on the unit, with a small effect of location. However, the within-session standard deviation ranged only to 28 m, suggesting that a bias in the sea level reference point (probably incorrectly assumed to follow the French norm) inflated the RMSE. The average bias ranged between − 17 and + 12 m depending on the unit, after correcting for significant location effect, but without effect of altitude. Overall, this means that different brands of GPS devices yield different rate of error in their altitude measurements, which can impair the comparison of datasets collected by different devices. We thereby recommend accommodating the device-specificity of the error-generating process at the data analysis step, and also that the devices record their VDOP at each record (cf. "Statistical solution" below). Further investigation or communication with manufacturers should decipher whether this stems from different fix acquisition procedures (e.g., satellite detection) or different post-processing algorithms, and should also make clear which sea level reference point different manufacturers are using.
These controlled field trials, along with other similar reports [22,34], highlight that even in benign conditions, GPS and altimeter data are sufficiently error-prone to tamper with ecological inference in many cases (range of the standard deviation of the error: 4-50 m). The issue is only suspected to be more acute in operational conditions when the DOP is larger, the terrain rougher, the weather more variable, and there are more obstacles to signal diffusion than in controlled field trials. Furthermore, the rate of error depended on the brand of the unit and on the location, which can be of importance when comparing across studies.

Horizontal errors can cause vertical errors
In the synthetic landscape simulations, the frequency of negative flight height records increased with the standard deviation of both the horizontal and vertical telemetry error (Additional file 1: Fig. S2a), and with the landscape roughness and complexity (Additional file 1: Fig. S2b). However, the various sources of errors acted in a multiplicative way, so that even when the telemetry noise was small (SD of 1 m), the error in h could be large (SD of 20 m; Additional file 1: Fig. S2c; darkest grey curve). Perhaps unexpectedly, when the horizontal error was large, the error in the height above ground tended to be independent of the vertical error in the GPS (on average across all simulations; Additional file 1: Fig. S2c; lightest grey curve). This means that the effect of the horizontal error in the GPS can supersede the effect of the vertical error, if the terrain is rough. Even in the absence of any vertical error, the horizontal error was indeed routinely sufficient to cause 10-20% of the data points to be below ground (Additional file 1: Fig. S2a).

Errors inflate the recorded variance in flight height
In the simulations of flight tracks, errors in h inflated the variance in the distribution of recorded flight heights, i.e., the variance in the true flight height was consistently lower than the variance in the recorded flight height (Fig. 2). In the raptor case studies, we obtained the same result, with the caveat that we did not have access to the true flight height, but we could instead use the corrected flight heights (Fig. 2).
Indeed, if the vertical movement and error processes were independent, the total variance in flight height would exactly correspond to the sum of the movement and sampling variances (e.g., [44]; see also [45] and references therein). When the vertical movement and error processes are not fully independent, the total vertical variance is still larger than the vertical movement variance. In other words, the total vertical variance is a biased estimate of the vertical movement variance that confounds telemetry errors with rapid movements. The animals would appear more vertically mobile and with a more spread-out distribution in the aerosphere than they actually are. This type of issue is potentially quite widespread in other areas of movement ecology that pertain to horizontal movement as well, e.g., in behavioural assignment exercises that use movement variances (daily displacements, turning angles, etc.) to determine the behavioural state of animals.

Negative flight height records provide useful information
In this section we focus on negative records, i.e., unrealistically low records, but the same logic can be applied to unrealistically high records. Negative flight height records are more likely to occur when animals are near the ground, either perched or flying. If we remove the negative records [29], perching and low flight are under-sampled in the final dataset [21]. To illustrate this point, we used a GPS-tracked flight path from a migrating juvenile osprey (Pandion haliaetus) as it crossed the sea between the Italian mainland and Corsica [16]. During a portion of that sea crossing, its Ornitela GPS unit recorded flight heights that oscillated between − 2 and − 7 m below the sea level (Additional file 1: Fig. S3, inset). The amplitude of the oscillation suggested that the bird followed the swell of the waves. The complete sequence (Additional file 1: Fig. S3) depicts a progressive loss of altitude as the bird glided towards firm ground, and a period of active flapping flight (as per the accelerometry record) very low above the waves once the bird had lost all of its accumulated potential energy before reaching firm ground. These negative flight height records documented a critical time period. First, the risk of having to make a sea landing were clearly much greater in the few minutes when the osprey was flying low over the waves, compared to the rest of the sea crossing when the bird was often soaring high [16]. In addition, when flying low, the bird had no other choice than to flap and therefore expend energy; whereas when higher above the sea, the bird had the option to soar and therefore spare energy. It is critical that negative flight height records like these are maintained, even if, instead of a fully interpretable high-resolution sequence like in this example, there are just a few isolated negative flight height records in a low-resolution dataset.
In addition, if we only kept the records with positive flight height, we would obtain a biased sample of the distribution of flight height. Both in simulations and in the raptor case studies, discarding negative flight height records led to the overestimation of the mean flight height in the remaining dataset, the underestimation of the variance in flight height, the introduction of a right skew in the distribution of flight height, and the overestimation of the collision risk (Fig. 3). The latter result was because negative records mostly occurred when the bird was flying below the collision zone, and thus removing negative records led to under-sample safe periods of time. Note that this particular result pertains to the wind turbine application and d skewness of the distribution of flight height. A percentage bias of + 10% means that the focal quantity is 10% larger after we remove the negative records case only; in other types of collision risk, e.g., buildings and utility lines, the collision zone starts closer to the ground.
The simulations nicely complemented the raptor case studies by (1) eliminating any debate about whether the corrected flight heights in the raptor case studies were trustworthy or not (in the simulations, the true flight heights are exactly known) and (2) increasing the range of flight behaviours, since the raptors tended to exhibit lower percentage of time near the ground (in part because we purposely tried to exclude time spent perched) and different distributions of the sampling error. The amount of bias appeared highly dependent on the underlying flight behaviour and error distribution, and therefore not easy to predict and account for without appropriate error-handling methodology.
Additionally, there are many other major consequences of discarding negative flight heights. One is the disruption of the expected balance of positive and negative errors in the remaining data. Negative flight height records only arise when the error is negative, and so removing them introduces a bias towards positive errors, thereby disrupting the shape of the distribution of errors in the remaining data. Yet, we need the full range of errors to fit the statistical solution that we support (cf. "Discussion"). Another, unrelated consequence is the disruption of the sampling schedule of the remaining data. Many movement analyses are critically sensitive to the sampling schedule, and therefore their outcome will not be the same after removing the negative records. Lastly, and perhaps most importantly, negative flight height records can help fit the models that separate the error and movement processes, because they are unambiguously erroneous and can be informed as such in the model-fitting procedure (cf. "Statistical solution" in "Discussion"). Some authors have applied less stringent filters, such as removing only the most negative flight height records and removing an equal amount of extremely positive flight height records. While the effect on the remaining distribution, and on the balance of negative and positive errors is supposedly weaker than if removing all of the negative records, we warn that the remaining records are still affected by the same error process that generated the records that were deemed too erroneous to keep, thus the issues from the previous section ("Errors inflate the recorded variance in flight height") still need to be addressed. In addition, these extremely erroneous records are potentially the most informative regarding the shape of the error distribution (cf. "Statistical solution" in "Discussion").

The mean flight height is not sufficient to describe the distribution of flight heights
Flight height datasets are often reduced to a single summary metric, the mean flight height and its variation with environmental and individual covariates [29,[46][47][48][49]. This decision is mostly based on the ease of implementing spreadsheets, linear models, moving averages, or spline models. In this section, we instead call for approaches that describe the full distribution of flight heights in the aerosphere, not only the mean flight height. To justify this call, we again focus on collision risk estimation. Indeed, if the variance in flight height is large enough, a proportion of time may be spent in the collision zone even if the mean flight height is outside the collision zone. In simulations, the proportion of time spent in the collision zone indeed depended on both the mean and the variance in flight height (Fig. 4a, b). In the raptor datasets, the estimated probability of flying in the collision zone did not decrease much for the individuals whose mean flight height was estimated above the collision zone (Fig. 4c). Similarly, the individuals that had an estimated mean flight height well below the collision zone were predicted to spend about 20% of their time in the collision zone (Fig. 4c). We strongly recommend that collision risk forecasts should not be based on the fixed effects of linear models, but instead on the full distribution of flight heights-a recommendation that will likely hold for all studies into vertical airspace use.

Statistical solutions
Our results illustrate how the improper treatment of vertical errors in telemetry data can flaw the inference about the use of the aerosphere by flying animals. To avoid these issues, the state-space model framework [50] (Fig. 5) has a structure that is naturally aligned with the challenges of sampling errors in vertical space-use data. A state-space model is a stochastic model describing the changes over time in a state variable (here, the true flight height), when that variable is imperfectly observed (here, the recorded flight height). There is a "state process", separated from an "observation process" (Fig. 5). State-space models are routinely used to correct for positioning errors in satellite-tracking data (chap. 6 in [32]), including in wildlife studies [20,26,33,[51][52][53]. Importantly, these applications are not to be confused with another application of state-space models to movement data, when the focal state variable is a "behavioural state" whose Markovian transitions drive changes in movement rates [8,9,54]. Indeed, when the objective is, like in this study, to correct for positioning errors, the state variable is the position itself.
In studies of flight height, the movement model can be set up such that the state variable always stays above zero. Then, if the recorded flight height is − 7 m, the model "knows" that the error was at least 7 m [22], as for example was the case in the osprey example (Additional file 1: Fig. S3). Actually, the presence of unambiguously erroneous records makes flight height studies bettersuited to apply state-space models than many studies into horizontal space use by animals. Indeed, even when in theory the model is estimable, sometimes only a subset of the parameters of a state-space model are separately estimable, a phenomenon called "weak identifiability" that occurs when the sampling variance largely exceeds the process variance. An example of weak identifiability is when the difference between two classes of individuals are larger than the differences within the classes [55]. In addition, there are large statistical correlations between variance parameters in a movement model [52], making it extra difficult to accurately separate movements and errors in sparse datasets. In that context, unambiguously erroneous records, such as negative flight heights, represent an additional source of information [20]. They can help separate the process and sampling variances [10] and solve issues of weak identifiability.
As a perspective, we stress that there are also ways to obtain unambiguously correct records. These records could in theory perform a role similar to that of unambiguously erroneous records. For example, sometimes the position of the animals can be confirmed, e.g., at a documented feeding site, a nest, or by an incidental groundbased sighting. Those records can be matched to the GPS track, yielding an exact measure of the local error. Animal-borne devices may also include a transponder designed to signal passage near strategically placed emitters (e.g., [56]). This type of validation data is routinely used in other applications of the GPS technology [32]. Lastly, the state-space framework is naturally conducive to the joint analysis of multiple sources of error-prone data (e.g., [57]). In flight height studies, it is therefore possible to jointly analyse GPS and altimeter data, or multiple GPS streams coming from the same animal. This double-data approach is expected to help with statistical covariance issues, but cannot be expected to fully resolve all identifiability issues [58], which only error-free validation data can do. We should eventually stress that several wildlife GPS manufacturers already use a state-space model as part of the on-board data pre-processing algorithm, i.e., the released data have already been corrected by a proprietary state-space algorithm which may furthermore rely on proprietary validation data (Ornitela staff, pers. comm.). From our experience, in wildlife applications, these pre-processing algorithms are only applied during "bursts" of high-frequency data acquisition, not when the users request a more traditional low-frequency data acquisition schedule. Importantly, the data may not be pre-processed across bursts. The error from the first location of a burst is then carried over the entire burst sequence. Flight height tracks affected by this issue would exhibit a staircase-shaped profile. Overall, this type of data pre-processing trades a lower error variance against a larger error autocorrelation. Additional statespace modelling of the released pre-processed data can deal with this type of error autocorrelation, but the models need to be custom-made, i.e., are not routinely implemented in software. Perhaps more worryingly, some commercially available GPS units apparently simply truncate the recorded height at zero above sea level (pers. obs.). We call for a more open approach to these data manipulations, including making the raw, unprocessed GPS records available, in addition to any pre-processed data, and with a formal description of the pre-processing algorithm. Indeed manufacturers may not be aware of the specificties of vertical animal movements. Vertical movements are faster and less temporally autocorrelated than horizontal movements, and they depend on specific environmental covariates [10], making it necessary that end users obtain the unprocessed flight height data to parameterize the most ecologically relevant models.
We also acknowledge that the fitting of state-space models to space use data still requires relatively rare statistical skills. Nevertheless, there are already several free, open-source computing environments to fit state-space models to vertical (and horizontal) movement data, and thereby estimate the most likely movement track as a byproduct of the estimated parameters, similarly to how the individual values would be computed in a generalized mixed model with individual random effects: • The crawl [33] and ctmm [59] packages for R compute the likelihood of the state-space model using a Kalman filter. This algorithm is fast, but requires all the model processes to be Gaussian or approximately Gaussian (no truncation or constraint, no excess extreme values, no excess kurtosis or skew). • The TMB package for R [60] approximates the likelihood of the state-space model using the automatic differentiation algorithm with Laplace approximation. That approach makes computing times shorter than the next option, while still allowing for flexible modelling such as non-Gaussian errors [26], custom link functions [10], or multiple data streams. • The Monte Carlo Markov Chain Bayesian framework [61][62][63] generates parameter distributions that iteratively converge towards the solution. This option is the most flexible in terms of nonlinearities and non-Gaussian features, such as truncated distributions [20], but the computing time can be prohibitive for large datasets.

Data requirements and data quality checks
The state-space model-fitting procedure simply require the h or z time series along with the timestamps [10,26,33]. The interval between records needs to be sufficiently short that the effect of the temporal autocorrelation is visible, which in practice for raptors means an interval below 1 h and ideally below 30 min [10]. The observation error must not largely exceed the movement variance, otherwise the state-space model is likely to become unidentifiable. In practice, researchers may therefore find the following rough data quality checks useful: check that the median interval duration is < 1 h (ideally < 30 min), that the number of fixes per day is > 4, and that the proportion of records with negative flight height is < 50%. In addition, if there are very short intervals (< 1 min) we recommend incorporating into the movement model some temporal autocorrelation in the vertical velocity, in addition to temporal autocorrelation in the vertical position. If available, the VDOP or other metrics of triangulation reliability can predict the observation error in the statespace model, using a log-linear link between the standard deviation of the observation error and the VDOP, which should help with model fitting. Similarly, in case the researchers know for sure that there was no error on some of the records, they can fix the error parameter to zero for these records, which should also help with model fitting. On the other hand, the information that some recorded values are impossible is coded up using adequate link functions [10] and would thereby automatically inform the model about the minimum magnitude of the error on the involved records. Lastly, for a better fit to the data, environmental covariates that are expected to correlate with movement velocity or movement behaviour can also be incorporated using linear links with the movement model parameters (autocorrelation time, diffusion rate).

Conclusion
Improper error-handling methodologies yield a flawed picture of aerial niches. For example, discarding negative flight height records artificially truncates the observed distribution of flight heights (Fig. 3), and focusing on the mean flight height alone (for example when using linear models) does not fully describe the aerial niche (Fig. 4). While these observations are quite intuitive, bad practices remain common enough that it was important to stress these issues and illustrate them thoroughly. On the other hand, not addressing the occurrence of errors at all would artificially spread-out the observed distribution of flight heights (Fig. 2), leading for example to increased observed vertical overlap between species and individuals, which can modify the inference about community processes. Improper error-handling procedures would also tamper with the quantification of behaviour and flight strategies, by increasing or decreasing the observed vertical velocity, and interfere with behavioural state assignments. Lastly, errors may covary with environmental covariates. GPS positioning accuracy decreases with terrain roughness [19]. Thereby, selectively discarding records based on the number of available satellites or the dilution of precision would lead to biased sampling of terrain roughness. Wind speed decreases near the ground [64]. Discarding negative flight height records (that predominantly occur near the ground) would lead to misrepresent the relationship to wind speed.
Regarding applied consequences, we focused on demonstrating how improper methods would imperfectly quantify the time spent by GPS-tracked raptors in the rotor-swept zone of wind turbines (Fig. 3b). There are many other human-wildlife conflicts for the use of the aerosphere, for example bird strikes near airports and disturbance of wildlife by drones and other recreational aircraft. Regarding bird strikes, GPS-based predictive models of bird flight height (e.g., Péron et al. [10]) might help plan ahead the operation of airports. The state-space class of model that we advocate is actually already used, in real time, to exploit bird activity data from radar monitors and generate a warning system for airport managers [65]. Regarding recreational aircraft and drones, analysing bird-borne GPS tracks may help reveal the effect of the disturbance, which is expected to increase in frequency as drones in particular become more popular [66]. The recommendations we made about the effect of errors on the estimation of aerial niche overlaps and the quantification of behaviours seem particularly relevant in this context.
In conclusion, the issue of properly handling errors in flight height data is key to any aeroecology study. We strongly advise against ad-hoc "data quality" filters, and against statistical tools that only document variation in the mean flight height instead of the full distribution of flight height. Our proposed statistical framework based on state-space models and the analysis of the full distribution of flight heights requires interdisciplinary work between experts in flight behaviour and experts in data analysis, and the emergence of interface specialists, but the insights and the applied decisions based on those insights are expected to be more reliable.
Abbreviations h: Flight height above ground; z: Flight altitude (relative to the same reference as the DEM, e.g., the ellipsoid); DEM: Digital elevation model; UERE: User equivalent range error; DOP: Dilution of precision; SD: Standard deviation.