Airborne Laser Scanning for calibration and validation of inshore satellite altimetry: A proof of concept

Recent developments in satellite altimetry are leading to improved spatial resolution, allowing applications in the coastal zone and over inland waters. Validation of these sensors near the shore remains a challenge, since the process of upscaling from single point measurements (gauges or GPS buoys) to the radar altimetry footprint is a source of uncertainty. Meanwhile, Airborne Laser Scanning (LIDAR) has been proven capable of delivering accurate water surface heights rapidly over large areas. Here, we show a proof of concept by comparing airborne LIDAR heights over Lake Balaton, Hungary with near-concurrent Envisat and Jason-2 altimeter heights and water level gauge data. The accuracy of LIDAR heights was improved by strip adjustment and absolute georeferencing to ground control points; waveform retracking improved the accuracy of altimetry data. LIDAR heights were averaged within the outlines of the altimetry footprints. Bias is measured for LIDAR and altimetry with respect to gauge heights, and standard deviation of heights measures the vertical dispersion of footprints within one track. Results show standard deviation of heights is in the order of millimeters for LIDAR and 40–50 cm for altimetry and bias with respect to gauge heights is 5 cm for LIDAR compared to 40 cm for altimetry. We conclude that LIDAR may be used for calibration and validation of high resolution satellite radar altimetry over inland waters. © 2017 Published by Elsevier Inc.


Introduction
For decades, satellite altimetry has contributed to our knowledge of the geoid and ocean currents. In recent years, new processing methods have allowed using altimetry over coastal areas and inland water bodies as well (e.g. Calmant et al., 2008; Berry et al., 2005; De Oliveira Campos et al., 2001; Schwatke et al., 2015b).
With the arrival of new sensors such as SIRAL (SAR/Interferometric Radar Altimeter) on Cryosat-2 (launched 2010), which is a SAR (Synthetic Aperture Radar) altimeter, and AltiKa on SARAL (Satellite with ARgos and ALtika) (launched 2013) measuring with a Ka band radar, improved inland water level time series are possible (Verron et al., 2015; Nielsen et al., 2015; Villadsen et al., 2015). The Cryosat-2 mission has the downside of a long-repeat orbit (369 days) and SIRAL is not constantly measuring in SAR mode. The AltiKa instrument on SARAL is more sensitive to atmospheric water content (Schwatke et al., 2015a) and has been in a drifting orbit phase since July 2016, with the repetitive ground track no longer maintained.
In 2016, the new altimeter mission Sentinel-3 was launched which carries a SAR altimeter again but operates on a repeat orbit. New insights on the water cycle at local scales and applications such as lake level and river discharge monitoring are expected from this mission (Donlon et al., 2012).
In the near future, SWOT (Surface Water and Ocean Topography) will be a satellite altimeter based on interferometric RADAR, measuring over a swath and not only a profile, with the aim to observe river, lake and ocean height (Durand et al., 2010) with a resolution of tens of meters at a vertical precision in the order of a centimeter and height accuracy of 10 cm as written in the SWOT Science Requirements Document (Rodriguez, 2016). SWOT is planned to provide height for rivers wider than 100 m and lakes larger than (250 m)², with launch expected in 2021 (Solander et al., 2016).
Verifying the accuracy of high-resolution radar altimeters over inland and coastal waters remains a challenge as this requires data from the same location in the same reference frame, but with better vertical accuracy (Bonnefond et al., 2011). The classical approach is using water gauges, which are linked by levelling to a terrestrial elevation network and a geoid model. Heights of onshore gauges have to be transferred to offshore coastal altimetry footprints through a detailed local geoid and tide model. Besides the potential inaccuracies of the tide gauge height itself and the problems of transferring the measured water surface height to an offshore location, comparing a single point measurement at a gauge to an elevation measurement of an altimetry footprint several kilometers across may generally be problematic (Morris and Gill, 1994). Proximity to the shore is known to affect not only local water surface topography, but also the altimetry measurement itself (Bonnefond et al., 2013). All in all, tide gauge-based calibration has delivered accuracies up to 1 cm of standard deviation (STD) for a single measurement point (Bonnefond et al., 2011).
For inland water bodies the problems of transformation between gauge position and altimeter footprint are different. For rivers, exact overlap between gauges and altimetry footprints is rare. For inland lakes in case of hydrostatic equilibrium, it can be assumed that the water level is equal everywhere over the lake with regard to the correct geoid model (Schwatke et al., 2015b), and exact colocation of the footprint and gauge is not required. However, since no geoid model is perfect, in some cases deviations may be observed (Zlinszky et al., 2014). Additionally, many if not most inland water gauges do not have an absolute height which is linked to a terrestrial height network. In these cases, it is only possible to compare water level changes derived by altimetry measurements with water level changes measured by the gauge, without absolute height calibration.
Alternatively, GNSS (Global Navigation Satellite System) buoys have been proposed for comparison (Bonnefond et al., 2003b; Watson et al., 2003). Whereas these are still affected by upscaling of point measurements to altimetry footprint areas, they are independent from the proximity to the shore, can be placed within altimetry footprints on demand, have the same height datum as altimetry (ellipsoidal heights optionally corrected with a geoid model) and their accuracy (around 1-3 cm) is usually adequate for calibration (Bonnefond et al., 2011, 2013). In order to overcome the problem of upscaling from point to area measurements, or to obtain high-resolution local geoid models for height transfer from shore to ocean, the use of ship-mounted GNSS receivers has been established for mapping sea surface heights across larger areas (Bouin et al., 2009). This method allows sufficient area coverage and spatial resolution, but accuracy may be problematic (STD between 22 cm for a large ship (Bouin et al., 2009) and 2.7 cm for a waveriding catamaran (Bonnefond et al., 2003a), bias 1.9 cm compared to tide gauges). Also, the time needed to cover the area of at least a single footprint with a ship-mounted sensor may be long enough for dynamic water surface height changes to influence the measurement.
Remote sensing techniques can provide rapidly collected, area-covering, high resolution, point-based elevation measurements. These could be an alternative or complementary data source to shore gauges or GNSS floats, provided they can deliver comparable accuracies in height independent from (but linked to) shore heights in the same reference frame as satellite altimetry measurements. Airborne Laser Scanning (also known as airborne LIDAR or ALS) is gaining ground as a technology for surveying terrestrial topography, with centimeter-scale ranging accuracies for the individual measurement points. National or regional scans often cover water bodies that are of interest for altimetry, and dedicated scans at planned locations and times are also affordable as the technology is now mainstream (Heritage and Large, 2009). With the onset of bathymetric ALS, coastal zones and lakes are also directly scanned (Pastol, 2011).
Oceanography is moving towards higher resolution, and Melville et al. (2016) suggest that ALS measurements of waves and sea surface height can be used in the calibration and validation of satellite altimeter missions. They synchronized an ALS acquisition with a Jason-1 overpass over the Gulf of Mexico, with time lags below 1 h. The common track length is about 1.75° of latitude. Sea surface height and sea surface height anomaly were computed from the satellite radar altimeter and the airborne LIDAR data, with LIDAR averaged across the swath and along track. The RMS of differences was "a few centimeters". Significant wave height computed from both sensors also agrees to within about 0.1 m.
The utility of airborne LIDAR to measure water surface elevations accurately has been proven by additional studies (Carter et al., 2001; Zlinszky et al., 2014; Marmorino et al., 2015; Mandlburger et al., 2015). Expected error budgets are within 10 cm for individual measurement points and in the same range for global georeferencing (Zlinszky et al., 2014), which can be improved by using ground reference data (Kager, 2004). Based on this high accuracy, further enhanced by the statistical redundancy offered by high measurement densities, LIDAR is expected to have potential for calibration and validation of satellite altimetry. Connor et al. (2009) have compared airborne LIDAR with Envisat altimetry over sea ice areas, and conclude that under favourable conditions (re-frozen lead surfaces with little snow cover) the two datasets match with mean differences around 1 cm. However, this setup did not involve in-situ water height measurement (GPS buoys or gauges) and, to the best of our knowledge, no systematic three-way comparison of LIDAR water surface altimetry to satellite altimetry and water gauge heights has been carried out yet.
Our objective was to develop and test a methodology for processing airborne LIDAR data as a basis for comparison with satellite radar altimetry, and to assess accuracies of both LIDAR and altimetry-derived heights with respect to near-synchronous water gauge measurements linked to a terrestrial levelling network. Based on the outcome of this comparison, we aim to establish airborne LIDAR as a sensor for calibrating high-resolution satellite water surface altimetry.

Data and methods
Satellite altimetry relies on emitting a short pulse of electromagnetic radiation in the nadir direction, measuring its travel time, and calculating the target elevation from this travel time and the position of the satellite platform (Fu and Cazenave, 2000). Airborne Laser Scanning works with the same principle at a different emitted wavelength (Wagner, 2010). However, due to higher pulse repetition rates, lower beam divergence and slower platform speed, instead of a single track at nadir, a wide swath can be covered by deflecting the laser pulse perpendicular to the flight direction in a systematic scan pattern. As a result, a near-equidistant point cloud of measurement footprints is created within the swath, with point densities typically in the range of 0.5 to 10 points/m² (Wehr and Lohr, 1999).
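The ranging principle shared by both sensors can be illustrated with a minimal sketch. It assumes a nadir-pointing pulse and ignores atmospheric delay, scan angle and all instrument corrections; the function and variable names are hypothetical and not taken from any processing software used in this study.

```python
C = 299_792_458.0  # speed of light in vacuum, m/s


def surface_height(platform_height_m, two_way_travel_time_s):
    """Height of the reflecting surface in the platform's height system.

    Simplified: the one-way range is half the two-way travel time
    multiplied by the speed of light, and the pulse is assumed to travel
    straight down (nadir) without atmospheric delay.
    """
    range_m = 0.5 * C * two_way_travel_time_s
    return platform_height_m - range_m


# Hypothetical example: a platform 1000 m above the reference surface
# and an echo whose two-way travel time corresponds to a 500 m range.
height = surface_height(1000.0, 2 * 500.0 / C)
```

In practice the range is further corrected (atmosphere, tides, instrument effects for altimetry; scan angle and mounting calibration for ALS), as described in the following sections.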

LIDAR data processing
For this study, we used ALS point clouds collected during a scan of Lake Balaton, Hungary on the 26th August 2010 (Zlinszky et al., 2011). The data were collected using a Leica ALS50-II sensor operating at 1064 nm wavelength, 4 ns pulse length, 0.2 mrad beam divergence and 40° scan angle using an Applanix POS AV positioning system, collecting GNSS positions every second and inertial navigation system (INS) readings at 200 Hz. Over the open water, the data were collected as a by-product of airborne hyperspectral measurements targeting the lake water quality in N-S swaths (flying height 4500 m, point density 1 pt/5 m², footprint diameter 1 m, pulse repetition frequency 29 kHz, scan rate 58 Hz). The coastal zone was covered in a dedicated campaign with an irregular pattern following the shoreline (flying height 1400 m, point density 1 pt/m², footprint diameter 0.22 m, pulse repetition frequency 83.1 kHz, scan rate 45.1 Hz). The accuracy of individual height measurements is 2-5 cm (Leica Geosystems, 2006). The main source of the error, the relative orientation of the sensor system components (INS, GNSS and laser scanner) to each other as well as the GNSS itself (Habib et al., 2009; King, 2009) was corrected by relative and absolute georeferencing of the strips, further improving accuracy (Glira et al., 2015b).
Strip adjustment is an on-the-job calibration method to improve the geometric quality of LIDAR point clouds. As the observed scene is mostly static and sampled densely (open ground, building roofs, street surfaces, etc.) the overlapping areas of strips must have exactly the same elevations. Differences in observed elevation can be exploited to estimate improved parameters for the relative spatial orientation of the measurement system components, which are the laser scanner, the GNSS, and the INS. Systematic errors in the laser scanner observables, i.e. range and angle, can also be estimated and corrected. With given ground control measurements (e.g. terrestrially observed points on stable sloping surfaces), the absolute orientation (i.e. the datum) of the ALS strips can be improved.
In our case, this was carried out in three steps: first the fit of the high-density shore strips was improved by strip adjustment resulting in a median discrepancy of 0 cm and a STD of 5 cm between these LIDAR datasets (Zlinszky et al., 2014). Then a series of 393 ground control points were collected around the study area (Fig. 1), using an RTK GNSS (Real Time Kinematic Global Navigation Satellite System) receiver (Topcon Tesla RTK). 180 of these points were measured by averaging 3 measurements, leading to 7 cm height accuracy (STD) and were distributed over 11 horizontal flat surfaces in immediate vicinity to the lake shore. A further 113 points were surveyed by averaging 20 measurements each, leading to 2 cm height accuracy, and distributed on 7 sets of road surfaces, each set sloping in 3 approximately perpendicular directions in order to establish both vertical and horizontal absolute position (Kager, 2004). These were located around the edges of the dataset. The Iterative Closest Point (ICP) algorithm (Glira et al., 2015a) was used for fitting the block of high-density shore strips to these control points, and in the third step, for fitting the low density strips covering the open water (but overlapping the shore) to the already precisely georeferenced shore dataset. The remaining errors had a median of 0 cm and a STD of 9.6 cm, which is partly a result of the low point density which did not allow strict filtering of natural vegetation from the control areas. In the next step, non-water areas were excluded based on the lake outline and non-water objects within (ships, vegetation) were removed based on a height threshold (±2 m above and below the mean water level). Where the incidence angle to the local water surface was perpendicular, specular reflections resulted in very high intensity readings and an erroneous height offset, probably caused by range walk (Zlinszky et al., 2014). These data points were removed by excluding all points that had the maximum LIDAR intensity. In order to allow comparison to water gauge height measurements, which are the classical source of water surface height accuracy evaluation, ellipsoidal heights measured by LIDAR were converted to normal heights using the HGTUB2007 quasi-geoid model (Tóth, 2009).
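The point filtering and height conversion described above can be sketched as follows. This is an illustration under stated assumptions, not the processing chain used in the study: points are assumed to be already clipped to the lake outline, the maximum intensity is assumed to be 255 (an 8-bit intensity record), the quasi-geoid height is assumed to be supplied per point by an external lookup, and all function names are hypothetical.

```python
def filter_water_points(points, mean_water_level_m,
                        height_tol_m=2.0, max_intensity=255):
    """Keep points that plausibly belong to the water surface.

    points: iterable of (height_m, intensity) tuples, with heights in the
    same height system as mean_water_level_m.
    Removes points more than height_tol_m from the mean water level
    (ships, vegetation) and points at the maximum intensity value
    (specular reflections with a suspected range-walk height offset).
    """
    return [
        (h, i) for h, i in points
        if abs(h - mean_water_level_m) <= height_tol_m and i < max_intensity
    ]


def to_normal_height(ellipsoidal_height_m, quasi_geoid_height_m):
    """Normal height = ellipsoidal height minus quasi-geoid height (height anomaly)."""
    return ellipsoidal_height_m - quasi_geoid_height_m


# Hypothetical values for illustration only.
sample = [(148.0, 100), (151.5, 100), (148.1, 255)]
kept = filter_water_points(sample, mean_water_level_m=148.0)
```

With these made-up values, only the first point survives: the second exceeds the ±2 m height threshold and the third carries the maximum intensity.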

Altimetry data processing
The LIDAR data were compared to altimetry data over the lake. Although no exact temporal overlaps were found between the airborne data collection and satellite overpasses, some datasets with a relatively short time lag were available.
This work uses the high-frequency Sensor Geophysical Data Record (SGDR) of the Envisat/RA-2 and Jason-2 satellite missions. The heights are corrected for the atmospheric delay caused by the ionosphere and the dry and wet troposphere, for the crustal motions caused by pole and solid earth tides, and for the geoid and the radial bias between different altimeter missions. The radial bias between the missions is determined over the ocean with a cross over analysis and interpolated over the continents (Bosch et al., 2014). It is not possible to compute the radial bias over land, but the spatial variations of the bias are small and an interpolation over land is therefore legitimate.
For a proper comparison between Envisat and Jason-2 measurements, we adopted correction models that were available for the data of both missions. Table 1 summarizes the corrections applied to the data. For the comparison to the LIDAR data, we employ the same geoid model, the regional HGTUB2007 quasi-geoid (Tóth, 2009), therefore the LIDAR and altimetry data are both in the same height system (WGS84 ellipsoidal heights corrected with the HGTUB2007 quasi-geoid).
In addition to the corrections, the ranges were retracked to obtain more reliable heights. We employed the Multi-Subwaveform Retracker (MSR) (Boergens et al., 2016) on the data. The MSR extracts all subwaveforms from the waveform and retracks each with a 50% Threshold Retracker (Gommenginger et al., 2011a). This leads to more than one resulting height per measurement. The waveforms over the lake are neither ocean-like nor peaky but consist of several subwaveforms. By extracting all subwaveforms and their corresponding heights over the lake, we are able to choose the heights which form a straight surface. Some measurements do not yield a height near the water surface; these points were excluded.
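The idea of a 50% threshold retracker can be sketched in a few lines. This is a simplified illustration, not the MSR implementation: subwaveform detection is not shown, the noise floor is assumed to be the mean of the first few gates, and the amplitude is taken as the waveform maximum (operational retrackers often use an OCOG-type amplitude instead). The function name and parameters are hypothetical.

```python
def threshold_retrack(waveform, threshold=0.5, noise_gates=5):
    """Return the fractional gate index of the leading edge, or None.

    The retracking level is the noise floor plus `threshold` times the
    amplitude above the noise floor. The leading edge position is found
    by linear interpolation between the two gates that bracket the level.
    """
    noise = sum(waveform[:noise_gates]) / noise_gates
    level = noise + threshold * (max(waveform) - noise)
    for k in range(1, len(waveform)):
        if waveform[k] >= level > waveform[k - 1]:
            rise = waveform[k] - waveform[k - 1]
            return (k - 1) + (level - waveform[k - 1]) / rise
    return None  # no leading edge found, e.g. a flat waveform


# Synthetic (sub)waveform in arbitrary power units.
gate = threshold_retrack([0, 0, 0, 0, 0, 0, 4, 4, 3])
```

The retracked gate is then converted to a range correction via the gate width of the instrument; applying the function to each extracted subwaveform yields several candidate heights per measurement, as described above.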
We used data from the Envisat orbit 199, cycle 92, from 17th August 2010, 9 days before the airborne survey, as well as heights from the Jason-2 orbit 94, cycle 79, collected on 28th August, two days after the flight day (Fig. 1).
We compare our resulting heights over Lake Balaton with the standard data retracked with an ocean retracker and with the ICE1 retracker. For the Envisat pass, we are able to reduce the standard deviation over the lake without outliers from 61 cm in the original data, 62 cm for the ICE1 retracker, to 42 cm with the data retracked with the MSR. See Fig. 2 for the comparison. In the Jason-2 standard data, only 4 measurements over the lake are available with a standard deviation of 89 cm. Although more points are in the ICE1 product, they show many outliers and have an along-track standard deviation of 6.37 m. With MSR the lake height could be identified from all footprints within the overpass with a standard deviation of 46.8 cm. The heights along track show no defined pattern such as residuals from the geoid correction.

Comparing LIDAR and altimetry
In the comparison between altimetry and ALS, we only want to compare the ALS point elevations inside each altimetry footprint with the corresponding altimeter height measurement. The center of the footprint is given in the altimetry data. The footprint size mainly depends on the wave height of the water surface: the rougher the water surface, the larger the footprint. According to Chelton et al. (2001) the radius of the footprint $r_{out}$ can be estimated with $r_{out} = \sqrt{\frac{(c\,t + 2 H_w)\, R_0}{1 + R_0 / R_E}}$, where $R_0$ is the height of the satellite over the water, $H_w$ the wave height, $c$ the speed of light, and $R_E$ the mean Earth radius; $t$ is the duration for which the radar pulse illuminates the water surface. We decided to use as $t$ the time between the start and end of the subwaveforms we used for the height determination. The start of the subwaveform marks the point where the radar first strikes the water surface. The end of the surface illumination is less well defined, but we take the end of the subwaveform for this purpose. With this we found footprint sizes between 2 and 3 km, which agrees well with values used in Connor et al. (2009).
The wave height on the days of altimeter measurements was not directly measured but was approximated by 10 cm based on wind measurement archives and the established relation between wind speed and wave height on Lake Balaton (Muszkalay, 1973). This approximation was also verified by the lack of waves observed in the LIDAR data points.
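A footprint radius of this kind can be evaluated with a short sketch. It assumes the commonly quoted form of the pulse-limited footprint relation, r = sqrt((c t + 2 H_w) R_0 / (1 + R_0 / R_E)), using the symbols defined above; this form is an assumption of the example, as are the function name, the mean Earth radius of 6371 km and the illustrative input values (a 1336 km orbit height and a 3.125 ns illumination time, which are typical altimeter values and not taken from this study).

```python
import math

C = 299_792_458.0      # speed of light, m/s
R_EARTH = 6_371_000.0  # mean Earth radius, m (assumed value)


def footprint_radius(sat_height_m, wave_height_m, illumination_time_s,
                     earth_radius_m=R_EARTH):
    """Radius (m) of the illuminated area on the water surface.

    Assumed relation: r = sqrt((c*t + 2*H_w) * R_0 / (1 + R_0/R_E)).
    """
    numerator = (C * illumination_time_s + 2.0 * wave_height_m) * sat_height_m
    return math.sqrt(numerator / (1.0 + sat_height_m / earth_radius_m))


# Illustrative inputs: flat water versus 10 cm waves.
r_flat = footprint_radius(1_336_000.0, 0.0, 3.125e-9)
r_waves = footprint_radius(1_336_000.0, 0.1, 3.125e-9)
```

The radius grows with both wave height and illumination time, which is why using the full subwaveform duration as t gives larger footprints than the nominal pulse length alone.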
Individual footprints were exported as circular vector outlines (Fig. 1), and the mean heights and STDs of all filtered LIDAR points within these footprints were calculated for comparison. Therefore LIDAR measured the water surface elevation within nearly the complete area of the footprints, with no need for transfer in space between non-overlapping measurements and extrapolation only necessary in the cases where part of the altimeter footprint was not covered by LIDAR points.
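The per-footprint averaging can be sketched as a point-in-circle selection followed by summary statistics. This is a minimal illustration assuming that LIDAR points and footprint centers are given in the same projected (metric) coordinate system; the function name and the sample values are hypothetical.

```python
import math
import statistics


def footprint_stats(points, center_xy, radius_m):
    """Mean, sample STD and count of LIDAR heights inside a circular footprint.

    points: iterable of (x_m, y_m, height_m) tuples in projected coordinates.
    Returns None when fewer than two points fall inside the footprint.
    """
    cx, cy = center_xy
    heights = [z for x, y, z in points
               if math.hypot(x - cx, y - cy) <= radius_m]
    if len(heights) < 2:
        return None
    return statistics.fmean(heights), statistics.stdev(heights), len(heights)


# Hypothetical points: two inside a 1500 m footprint, one far outside.
pts = [(0.0, 0.0, 104.0), (100.0, 0.0, 104.2), (5000.0, 0.0, 110.0)]
mean_h, std_h, n = footprint_stats(pts, (0.0, 0.0), 1500.0)
```

For the millions of points per footprint reported here, a spatial index or vectorised array operations would be preferable to this plain loop; the logic stays the same.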

Water level gauge data
Finally, as a ground truth, the lake height from the Balatonakali shore water gauge operated by Hungarian National Water Directorate (OVF) was also included in the analysis separately for each altimetry and LIDAR measurement day. The gauge height data are a product of RTK GNSS levelling performed directly at the gauge station in five repetitions averaging 20 measurements each, with STD of 0.2-0.3 cm. The official registered water levels of the lake are obtained from ellipsoidal heights from GNSS by converting to heights above sea level with the official VITEL transformation parameter set supplied by the Hungarian Institute of Geodesy, Cartography and Remote Sensing (FÖMI). In order to maintain consistency with the altimetry and LIDAR data, we used the WGS84 ellipsoidal heights from the GNSS levelling and added the local quasi-geoid height again based on the HGTUB2007 model. The vertical difference between this height system and the height system of the official water level at the position of the gauge is 2.1 cm.

Results
The resulting mean water level of the gauging station, the two altimetry passes and the LIDAR points are summarized in Table 2. Both the Jason-2 and the Envisat altimetry footprints are approximately 45 cm higher than the gauge-based water surface, but their standard deviations are around 0.4 m. Jason-2 showed lower mean bias (38 cm) and slightly higher STD of water surface heights (46.8 cm), Envisat measurements have a higher bias of 56.6 cm and marginally lower STD of 43.5 cm.
Mean LIDAR point heights within the altimetry footprints are also biased compared to the gauge heights but the heights were about 5 cm lower than gauge heights. For the LIDAR points within the Jason-2 footprints, the measured water surface height is 4.8 cm lower compared to the corresponding gauge-based water level. The STD between the mean LIDAR heights within the individual altimetry footprints is 0.4 cm, while the STD of the 1-2 million individual LIDAR heights within a footprint is 14.4 cm (see Supplementary Table S1). The LIDAR measurements inside the Envisat footprints have a mean water surface height 4.6 cm lower than the gauge height, and the STD between each altimetry footprints is 0.3 cm. STD of point heights within the individual footprints amounts to 14.7 cm.
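The two summary measures used in this section, bias with respect to the gauge and STD between footprints of one track, can be written out as a short sketch. The function name and the sample heights are hypothetical and chosen only to illustrate the computation; they are not the values of Table 2.

```python
import statistics


def track_summary(footprint_heights_m, gauge_height_m):
    """Bias and dispersion of one track relative to a gauge height.

    bias:   mean of the per-footprint heights minus the gauge height
            (negative when the sensor measures lower than the gauge).
    spread: sample standard deviation between the per-footprint heights.
    """
    bias = statistics.fmean(footprint_heights_m) - gauge_height_m
    spread = statistics.stdev(footprint_heights_m)
    return bias, spread


# Hypothetical per-footprint mean heights (m) and gauge height (m).
bias_m, spread_m = track_summary([104.45, 104.55, 104.50], 104.55)
```

The same function applies to altimeter heights and to LIDAR footprint means, provided both are expressed in the same height system as the gauge.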
The results indicate that even the individual LIDAR point measurements were more accurate in terms of STD and bias than the nearest altimetry footprint, but two orders of magnitude less accurate than the single-point gauge measurement based on GNSS levelling (which has a STD of 0.2-0.3 cm). Combining a large number of LIDAR points available within the area of a single altimetry footprint allowed for statistical redundancy, and the STD between mean point heights within altimetry footprints suggests that LIDAR allowed for highly reliable measurements when combined across the footprint areas. Whereas the STD values compare favourably to float and ship-mounted GNSS measurements and also tide gauges, the bias in height is slightly worse than the typical bias values achieved by these conventional techniques. However, in an operational case, correction based on gauge heights would be possible since the height differences are consistent. Zlinszky et al. (2014) measured a median bias of −2 cm and a STD of 5 cm across 80 million individual LIDAR lake height measurements with lower flying heights (and footprint sizes) and a variety of wave conditions on Lake Balaton. However, Connor et al. (2009) observed bias within 1 cm when comparing airborne LIDAR with Envisat altimetry over frozen ocean leads (without available gauge data), whereas our results show at least an order of magnitude larger differences between these data sources. Not surprisingly, compared to ICESAT/GLAS satellite LIDAR altimetry, our airborne LIDAR-based lake height measurements are far more accurate (Baghdadi et al., 2011).

Discussion
Literature reports accuracies of 5-50 cm for satellite altimetry of inland water compared to surface gauges (Calmant et al., 2008; Schwatke et al., 2015b); both our bias and STD values are therefore within the expected range. The centimeter-order bias between LIDAR and gauge height and the sub-centimeter STDs between LIDAR heights in altimetry footprints are an important benchmark: the latter is similar to the STD of the GNSS levelling that was used to establish the height of the gauge itself. Since they are at least an order of magnitude better than the bias of altimetry or the STDs between altimetry footprints (and also compare favourably to the accuracy of ship-mounted GNSS surveys; Bouin et al., 2009), LIDAR shows sufficient accuracy to serve calibration and validation of altimetry. LIDAR data can inherently be used in the same height system as altimetry, and can be obtained exactly from the area of the altimetry footprints, as opposed to point data from gauges or GPS buoys. In case of this study, using a geoid model would not have been necessary for simply comparing satellite altimetry and LIDAR heights as both were collected in an ellipsoidal system. Nevertheless, in order to allow comparison to the gauge heights, we applied a geoid model. This also allowed calculating physically meaningful differences and deviations within and between footprints, excluding the effect of local variations in geoid undulation as far as possible.
However, some processing steps were crucial for reaching this level of accuracy. The positional error of each LIDAR point depends mainly on the accuracies of (a) the GNSS/INS flight trajectory, (b) the mounting calibration parameters (which describe the rotational and positional offset of the scanner regarding the GNSS antenna and the INS) and (c) the scanner measurement itself. After strip adjustment, these sources of inaccuracy are mainly removed, giving as a result LIDAR strips with an improved relative and absolute orientation. The residual errors stem to a large extent from the GNSS measurements (Habib et al., 2009;King, 2009). When planning a LIDAR campaign specifically for altimetry calibration and validation, it is therefore essential to include sufficient coverage of the shore to generate control surfaces for strip adjustment, and to measure control points for absolute georeferencing. In case of flat water, most of the LIDAR returns are at or near sensor nadir, with off-nadir echoes lost to specular reflection effects. In the case of the sensor we used, removal of specularly reflected LIDAR echoes (characterized by their extremely high point amplitudes) improved height accuracy since such points often have erroneous elevation (Zlinszky et al., 2014). The fact that flat water only allows LIDAR height measurement in a small part of the strip may raise concern, but initial studies have shown that such conditions allowed measuring the absolute water surface height with very little bias (2.6 cm, Zlinszky et al., 2015). When moderate waves were present, as in our case, echoes were registered across the full LIDAR swath width, and the large number of points provided a very robust height measurement as indicated by the STDs which are well below 1 cm.

Accuracies, error budgets and uncertainties
The GNSS levelling measurements resulting in gauge heights had a STD of 0.3 cm. Dynamic changes in water level such as seiche and setup are known to be within ±2 cm for the study area based on time series analysis of gauge data (Zlinszky et al., 2014). The estimated prediction errors of the quasi-geoid model we use are also within 2 cm for the study area (Tóth, 2009). However, both dynamic water level changes and errors of the geoid model are expected to vary across the study area, whereas the bias in the LIDAR data is nearly constant.
The altimetry data show a variation in height along track of ca. 45 cm. For a water body the size of Lake Balaton shore influence is present in all measurements, even in the middle of the lake and poses the main source of error in the height measurement. However, our retracking scheme tries to minimize the influence of shore infiltration. Alternative retracking algorithms were tested (ICE-2, Improved Threshold Retracker) but did not deliver better accuracies. The extent of shore infiltration on the altimeter measurements partly depends on the surrounding region and is different for every water body (Calmant et al., 2008). More and higher topography leads to less accurate altimeter measurements than flat lands. Water surfaces around the water body itself, like from wetlands, or urban areas are also suspected to influence the altimeter measurements (Boergens et al., 2016). But it is not possible to quantify the influence of shore infiltration on the resulting height measurements over a water body (Gommenginger et al., 2011b).
Furthermore, the accuracy of the altimeter data depends on the accuracy of the corrections applied (wet and dry troposphere, ionosphere, earth and pole tides, geoid). In Bonnefond et al. (2011), an error budget for Jason-2 GDR data over open ocean is given with 0.7 cm for the dry and 1.2 cm for the wet troposphere, 0.5 cm for the ionosphere and 1.7 cm for the altimeter noise. The ionosphere model used in this study by Scharroo and Smith (2010) is stated to have a global accuracy between 1 and 2 cm. The geoid model HGTUB2007 is given with a model accuracy of 8.3 cm. The accuracies of all these correction models, except the geoid model, generally degrade over land and near coasts and can be up to a few centimeters each (Andersen and Scharroo, 2011).
Because of the uncertainties in the formal accuracies of altimetry over land, the accuracy of inland altimetry is traditionally given through the comparison to in situ gauge data. It is assumed that the gauge data are without any inaccuracies and present the ground truth. For larger lakes, a root mean squared error (RMS) between altimetry and gauge of a few centimeters can be achieved (e.g. Calmant et al., 2008; Schwatke et al., 2015b). Generally, the accuracies decline for smaller water bodies and are more in the range of tens of centimeters. The standard deviations of 44 cm and 47 cm found in this study for the altimetry data are in the expected range for altimeter standard deviations over a small lake.
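As a rough illustration of how such component errors add up, the open-ocean values quoted above can be combined as a root sum of squares. This combination is not given in the source; it assumes that the components are independent, and that the altimeter noise figure is in centimeters like the other components. Names are hypothetical.

```python
import math


def root_sum_square(components_cm):
    """Combine independent error components (cm) into a total (cm)."""
    return math.sqrt(sum(c * c for c in components_cm))


# Open-ocean component values quoted from the error budget above (cm).
open_ocean_cm = {
    "dry troposphere": 0.7,
    "wet troposphere": 1.2,
    "ionosphere": 0.5,
    "altimeter noise": 1.7,  # unit assumed to be cm
}

total_cm = root_sum_square(open_ocean_cm.values())  # about 2.25 cm
```

Even allowing for a few centimeters of degradation per component over land, such a formal budget stays far below the along-track variation of ca. 45 cm observed here, which is consistent with shore influence being the dominant error source.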
The LIDAR data have been corrected with respect to field-measured terrestrial correction points, which have nominal height measurement accuracies below 2 cm STD. The mean bias of the LIDAR heights with respect to these measurements is 0 cm (STD 9.6 cm). The STD of the individual LIDAR points within each altimetry footprint is around 15 cm, which is the combined result of wave heights (around 10 cm) and penetration of the laser pulse into the water column. The latter is known to depend strongly on the incidence angle (Guenther et al., 2000). The remaining 5 cm negative bias is most probably a side effect of removing the points with high amplitudes due to specular reflection, since such points were observed to occur mainly on the wave crests. In case of our sensor, filtering based on amplitude was nevertheless necessary due to the range-walk problem (Zlinszky et al., 2014). Some sensors allow correction for the influence of amplitude on the height look-up table as demonstrated in Carter et al. (2001), but this was not feasible for our instrument which only records 8 bits of intensity information and includes automatic gain control. Newer sensors with more sophisticated peak detection algorithms or even full-waveform processing are expected to be less influenced by the range-walk effect.

The potential of LIDAR as a calibration data source for satellite altimetry
Classically, water level gauges are used as reference data for altimetry, but the absolute height accuracy or measuring precision of these gauges is rarely evaluated. The former depends on the accuracy of the height system used for levelling the gauge, the latter on the type of sensor or logger used for registering the water levels. In the absence of information on these, the accuracy of the "calibration data" cannot be assessed; in such cases only the relative differences to a mean water level are compared between gauges and satellite sensors (Crétaux et al., 2011). Wherever GPS or optical levelling data are available, these may identify the height of the gauge to within 0.5-1 cm (Bonnefond et al., 2011). In the case of GPS buoys, accuracy is validated through the STD of the heights they measure over a longer period of assumed hydrostatic equilibrium, which also yields values between 1 and 2 cm (Watson, 2005). In the LIDAR dataset we utilized, the STD of mean heights between footprints was within 0.5 cm in all cases, thereby delivering height data with deviations comparable to or better than gauge height or GNSS float accuracies. In our experiment, a small but systematic bias remained with respect to gauge levels, which can be corrected in the operational case in a manner similar to Bonnefond et al. (2003a). For these specific gauges, the internal errors of the height system are not specified well enough to determine whether this bias is a product of systematic LIDAR measurement errors or of differences between the height systems.
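The footprint-wise averaging behind the "STD of mean heights between footprints" can be sketched as below. This is a simplified illustration, not the procedure applied in this study: footprints are assumed to be circles, and the point cloud, footprint centers, radius and noise level are all synthetic, hypothetical values.

```python
import numpy as np

def footprint_mean_height(x, y, z, center_x, center_y, radius):
    """Mean height and point count of LIDAR points inside a circular footprint."""
    x, y, z = (np.asarray(a, dtype=float) for a in (x, y, z))
    inside = (x - center_x) ** 2 + (y - center_y) ** 2 <= radius ** 2
    if not inside.any():
        return np.nan, 0
    return z[inside].mean(), int(inside.sum())

# Synthetic water surface: flat at 104.5 m with 15 cm random point noise
rng = np.random.default_rng(0)
n_points = 200_000
x = rng.uniform(0.0, 4000.0, n_points)   # along-track coordinate (m)
y = rng.uniform(0.0, 1000.0, n_points)   # across-track coordinate (m)
z = 104.5 + rng.normal(0.0, 0.15, n_points)

# Hypothetical, non-overlapping footprint centers along one track
centers = [(500.0 + 350.0 * i, 500.0) for i in range(9)]
means = np.array(
    [footprint_mean_height(x, y, z, cx, cy, 150.0)[0] for cx, cy in centers]
)

# Dispersion of the footprint means along the track
std_between = means.std(ddof=1)
```

With several thousand points per footprint, the footprint means scatter by only a few millimeters even though individual points scatter by 15 cm. In practice the true footprint outline would replace the circular mask.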
All in all, this study demonstrates that the accuracy of airborne LIDAR (in terms of STD) is comparable to the accuracy of gauges and GPS buoys, which are regularly used for calibration of satellite altimetry, but a systematic bias may be present. The spatial resolution and measurement setup of the dataset used here are far from optimal; nevertheless, the resulting dataset is of satisfactory accuracy. Beyond its accuracy, airborne LIDAR allows data collection across large areas, so that altimeter footprint heights and LIDAR heights can be compared directly without spatial extrapolation. This will be of profound importance for calibrating new satellite sensors such as Sentinel-3 and upcoming missions such as SWOT. Specifically for SWOT, the swath-based acquisition of LIDAR matches the swath geometry of the altimeter, although the widths are different (hundreds of meters for LIDAR, tens of kilometers for SWOT). Given the high precision of LIDAR measurements, we expect this method to be useful for validating measurements over rivers and lakes and comparing height patterns in high resolution. The rapid on-demand data collection allowed by the airborne platform is especially important over water bodies where height patterns vary in time due to tides, setup or seiche, and it is considerably faster than ship-mounted GNSS. We conclude that, given the presence of gauge data for correcting systematic bias, LIDAR data are theoretically suitable for calibration and validation of satellite radar altimetry. As opposed to water gauges, LIDAR allows measuring ellipsoidal heights independently of a geoid model and, contrary to calibration data measured at single points, can exactly cover the water surface within altimetry footprints. The SWOT science requirements document (Rodriguez, 2016) states that "the science return of SWOT" may be increased by comparing SWOT "elevation, mask, and error products" for smaller water bodies with "in-situ or other data".
Our experiments, as well as the work of Melville et al. (2016), establish ALS as an appropriate, or even ideal, source of "in-situ or other data". It is also noted that LIDAR is able to discriminate between vegetation above water and the water surface at high spatial resolutions.
An ideal calibration setup would consist of well-levelled water gauges, meter-resolution LIDAR coverage of the shore and the water surface, and possibly GPS buoy data from the open water near the target location. Given the increasing availability of regional-scale airborne LIDAR and the ongoing collection of new data, such a setup should be feasible in the near future, if it is not already available.
This study also confirms that airborne LIDAR may be used as a data source for water level altimetry in its own right, with elevation accuracies sufficient for geodetic (Zlinszky et al., 2014) or hydrodynamic applications (Marmorino et al., 2015;Vrbancich et al., 2011). While many of the assumptions used by ocean altimetry do not apply at this fine spatial resolution, this method may allow further insight into current and eddy systems of coastal or inshore waters.

Conclusion
We present the first example of a three-way comparison between near-synchronous satellite radar altimetry, airborne LIDAR-derived water surface heights, and gauge-derived water level for the case of Lake Balaton. Accuracy, as represented by the mean bias with respect to gauge height, is one order of magnitude better for LIDAR (−0.048 m) than for altimetry (0.57 m), and the standard deviation of height (STD) between LIDAR footprints (0.004 m) is similar to the STD of gauge height levelling (0.003 m) and two orders of magnitude better than that of the altimetry sensors we investigated here (0.468 m). We therefore conclude that LIDAR can deliver sufficient accuracy to serve for calibration of water surface altimetry heights in coastal or inshore settings, while also allowing direct coverage of the altimetry footprint without the need for upscaling from a few point measurements. Establishing LIDAR for calibration and validation of altimetry should allow widespread ground truthing of such measurements, advancing the methodology of data processing and verifying results. This study is unique in combining altimetry, LIDAR and gauge data in a setup well suited to calibration, but with increasing LIDAR coverage, dedicated flights and ongoing archiving of altimetry, many similar cases can probably be found and used. Future work should investigate the interaction of the LIDAR and altimetry radar pulses with the water surface to better understand the error budget.