A Large Airborne Survey of Earth's Visible-Infrared Spectral Dimensionality

The intrinsic spectral dimensionality indicates the observable degrees of freedom in Earth's solar-reflected light field, quantifying the diversity of spectral content accessible by visible and infrared remote sensing. The solar-reflected regime spans the 0.38-2.5 μm interval, and is captured by a wide range of current and planned instruments on both airborne and orbital platforms. To date there has been no systematic study of its spectral dimensionality as a function of space, time, and land cover. Here we report a multi-site, multi-year statistical survey by NASA's "Classic" Airborne Visible Near InfraRed Spectrometer (AVIRIS-C). AVIRIS-C measured large regions of California, USA, spanning wide latitudinal and elevation gradients containing all canonical MODIS land cover types. The spectral uniformity of the AVIRIS-C design enabled consistent in-scene assessment of measurement noise across acquisitions. The estimated dimensionality as a function of cover type ranged from the low 20s to the high 40s, and was approximately 50 for the combined dataset. This result indicates the high diversity of physical processes distinguishable by imaging spectrometers like AVIRIS-C for one region of the Earth.

© 2017 Optical Society of America. OCIS codes: (010.0280) Remote sensing and sensors; (300.6340) Spectroscopy, infrared; (300.6550) Spectroscopy,


Introduction
The Visible-ShortWave InfraRed (VSWIR) electromagnetic spectrum from 0.38-2.5 µm contains most surface-reflected solar energy available for remote sensing. These wavelengths access diverse physical absorption and scattering properties throughout the Earth system. For example:
• 0.38-0.75 µm wavelengths penetrate water, revealing the condition and composition of benthic ecosystems, water optical properties, phytoplankton [1], and water quality [2].
• 0.4-2.5 µm wavelengths interact with vegetation, undergoing absorption by pigments, scattering and absorption in canopies, and absorption by structural elements, indicating plant health, structure, foliar chemistry and even species [3].
• 0.6-2.5 µm wavelengths show vibrational absorption features in solids including iron oxides, phyllosilicates, carbonates, and other materials such as metals and hydrocarbons [4]. This reveals a wide range of natural mineralogy and artificial materials in urban environments.
The total number of degrees of freedom determines the spectral sampling required to fully measure the VSWIR upwelling light field and disambiguate target phenomena from confounding processes. This "intrinsic spectral dimensionality" holds implications for future and proposed orbital instruments including HyspIRI [7], EnMAP [8], and HISUI [9]. Prior investigations have explored VSWIR dimensionality in specific scenes and pooled archives, but no experiment has yet assessed a wide temporal and spatial range to evaluate variability over space and time.
Prior research gives some clues. Boardman and Green evaluated a historical archive of NASA's "Classic" Airborne Visible Near Infrared Spectrometer (AVIRIS-C) [10]. They found dimensionalities ranging from 20 to 50 for scenes taken independently and far higher for the combined dataset. Small analyzed an urban AVIRIS-C scene, finding a spatially-coherent subspace of about 31-35 dimensions [11]. Simulations of coarser spatial resolutions reduced this dimensionality due to mixed pixels. The Carnegie Airborne Observatory later corroborated this spatial dependence; its sampling of 1-2 m revealed significantly higher dimensionality [12]. It is intuitive that open-water aquatic environments would have the smallest dimensionality due to a simplified optical system and water absorption beyond 1.0 µm. Conversely, urban environments are believed to be most spectrally diverse. Asner et al. recently demonstrated this for a two-site study comprising about 400 ha total [12]. There remains a need to characterize dimensionality magnitudes and trends rigorously over a wide spatiotemporal range, with a single sensor and a consistent spatial resolution similar to anticipated orbital instruments.
This study analyzes data from the HyspIRI preparatory campaign, a series of AVIRIS-C flights over California in 2013-2015. Here we focus on the 2013 and 2014 data comprising over 600 distinct flightlines of diverse biomes, latitudes, and elevations. AVIRIS-C flew onboard NASA's high-altitude ER-2 aircraft at a nominal 20 km altitude, above nearly all atmospheric scattering and absorption. The spatial instantaneous field of view was 1 milliradian, giving 20 m sampling. These properties made it relevant to a future class of Earth-orbiting imaging spectrometers operating at similar spatial and spectral sampling. We sought to (a) evaluate the properties of the dimensionality distribution and its stability over time, and (b) test the specific hypothesis that spectral diversity increases from aquatic environments to natural terrestrial environments to urban environments.

Approach
There are many ways to estimate dimensionality for different assumptions about the data distribution [13]; we desire a method that is simple, robust and automatic for application on large radiance datasets. A common approach is to analyze the eigenvalues of the covariance matrix as in Boardman and Green [10] and Asner et al. [12]. Low-rank data lie on a strict linear subspace, implying that the eigenvalues corresponding to the null space will be zero. In real data, eigenvalues are initially large, decay quickly, and never quite reach zero due to measurement noise. Consequently dimensionality estimation reduces to the question of which eigenvectors represent scene structure above the noise level. Wu et al. [14] evaluate alternative solutions including: model selection based on information criteria [15] and the minimum description length [16]; Gershgorin radii [17]; Signal Subspace Estimation, or SSE [18]; and Neyman-Pearson detection theory [19]. The question is ultimately a subjective issue of definitions [13], but most methods provide similar results, and relative comparisons of a single estimator across datasets are still meaningful. In the Wu et al. survey, only SSE achieves good performance without prespecified noise thresholds [14]. However, it relies on a nonnegativity property of geographic mixture models; this applies to reflectance data [20] but may be less appropriate for radiance data having variable atmospheric attenuation. For radiances the standard practice remains an orthogonal decomposition of the covariance matrix into eigenbases, provided a suitable noise threshold can be identified.
Our noise estimation approach was similar to that of Asner et al. [12] except that we automated the spatial coherence metric to enable objective estimates across a large dataset. We exploited the unique spatial uniformity of the AVIRIS-C instrument, a "whiskbroom" design in which the same optics and detectors measured every spatial location [21]. In the downtrack direction, AVIRIS-C acquired temporally discontinuous measurements across consecutive scans. Any spatial correlation between such spectra was unlikely to occur by chance unless it was a part of the scene [11]. This allowed separation of the spatially-coherent radiance field from spatially random noise. Singular Value Decomposition (SVD) factorized the total covariance $\Sigma$ into eigenvalues $\lambda_i$ and column eigenvectors $v_i$. The projection of a radiance spectrum $L$ onto the $i$th such eigenvector was written $x_i = v_i^T (L - \mu)$, where $\mu$ was the mean spectrum. The sequence of projections for a given cross-track location $j$ and downtrack positions $1, \ldots, n$ was the vector $y_{ij} = [x_{ij1}, \ldots, x_{ijn}]$. We estimated the spatially-correlated component of this signal by a nonparametric smoothing operator $f(y_{ij})$, and the difference $f(y_{ij}) - y_{ij}$ was the spatially-uncorrelated remainder. The standard deviations of these two components represented the signal and noise magnitudes respectively. The smoothing operator could have been any nonparametric smoothing algorithm, but we simply took the local average of an 11-pixel neighborhood in the downtrack direction. In summary, the sequence of steps was:
1. Calculate the mean spectrum and covariance matrix of the calibrated radiance cube.
2. Using SVD, decompose the covariance matrix into eigenvalues and eigenvectors ordered from most to least significant.
3. Project zero-meaned spectra onto eigenvectors, transforming the radiances to a Principal Component (PC) representation.
4. Smooth each channel and cross-track position of the PC cube using a 1-D local averaging operator in the downtrack direction.
5. Define the signal to be the smoothed PC cube, and the noise to be the difference between the original and smoothed PC cubes.
6. The estimated dimensionality is the last eigenvalue index for which the standard deviation of the signal image exceeds that of the noise image.
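The six steps above can be sketched in NumPy as follows. This is a minimal illustration rather than the processing code used for the survey: the function name `estimate_dimensionality`, the `(downtrack, cross-track, channel)` array layout, and the choice to discard the first and last few downtrack rows (so the moving average is never computed over a partial window) are all assumptions made for the example.

```python
import numpy as np


def estimate_dimensionality(cube, window=11):
    """Sketch of steps 1-6 for one radiance cube.

    cube   : array of shape (n_downtrack, n_crosstrack, n_channels) (assumed layout)
    window : odd width of the downtrack moving average (11 pixels in the text)
    Returns (dimensionality, signal_std, noise_std), the last two per component.
    """
    n_down, n_cross, n_chan = cube.shape
    spectra = cube.reshape(-1, n_chan).astype(float)

    # Step 1: mean spectrum and covariance matrix.
    mu = spectra.mean(axis=0)
    cov = np.cov(spectra, rowvar=False)

    # Step 2: SVD of the symmetric covariance; singular values are returned
    # in descending order, and the columns of u are the eigenvectors.
    u, eigvals, _ = np.linalg.svd(cov)

    # Step 3: project zero-mean spectra onto the eigenvectors (PC cube).
    pcs = ((spectra - mu) @ u).reshape(n_down, n_cross, n_chan)

    # Step 4: moving average along the downtrack axis for every cross-track
    # position and component. mode="valid" keeps only full windows.
    kernel = np.ones(window) / window
    signal = np.apply_along_axis(
        lambda y: np.convolve(y, kernel, mode="valid"), 0, pcs
    )

    # Step 5: noise is the original PC cube minus its smoothed version,
    # compared on the rows where a full window was available.
    half = window // 2
    noise = pcs[half:n_down - half] - signal

    # Step 6: last (1-based) component whose signal std exceeds its noise std.
    signal_std = signal.reshape(-1, n_chan).std(axis=0)
    noise_std = noise.reshape(-1, n_chan).std(axis=0)
    above = np.nonzero(signal_std > noise_std)[0]
    dim = int(above[-1]) + 1 if above.size else 0
    return dim, signal_std, noise_std
```

For spatially white noise of standard deviation σ, the 11-pixel average has standard deviation near σ/√11 while the residual stays near σ, so pure-noise components fall clearly below the threshold in this sketch.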
Our methodology contrasted with a related linear approach, the Minimum Noise Fraction (MNF) [11,22], which estimates the noise in the original measurement space and then whitens the data, normalizing the eigenvalues so that the noise is at unity. MNF requires the noise covariance to be specified in advance, and typically gives similar dimensionality estimates [11].
We radiometrically and spectrally calibrated the AVIRIS data to radiance units using procedures in Green et al. [21], and then located them on the Earth's surface by ray-tracing from the GPS/IMU-derived sensor position to a digital elevation model. Finally, we resampled the data to a consistent geographic spacing, using nearest-neighbor sampling to preserve pristine uninterpolated spectra. We then subdivided each flightline into segments 20 km along track, approximately twice the swath width, which varied slightly with aircraft motion. This provided over 4000 segments, with each segment containing approximately 700000 spectra. Simple heuristic filtering removed occasional bad data from electronic effects, leaving over $2.5 \times 10^{9}$ measurements. We then analyzed this dataset to determine the spectral dimensionality of flightline segments and of the dataset as a whole. A list of flightlines appears in Table 1.
Example noise and signal magnitude curves appear in Fig. 1, showing two typical flightlines over the AVIRIS-C Ivanpah calibration site separated by approximately two months. As expected, scene structure dominated projections with large eigenvalues while spatially-uncorrelated measurement noise dominated the smaller eigenvalues. The intersection between the two curves indicated the projection where signal and noise had equal magnitude; by convention we used this intersection point as the dimensionality estimate for the scene. For additional resilience to noise around the intersection where the two curves were nearly tangent, we smoothed them with a width-9 kernel before calculating the crossover point. In Fig. 1, the two colored lines show different flight days. They exhibited near perfect agreement with estimated dimensionalities of 37, demonstrating the method was consistent. We calculated a similar crossover point for each segment in the dataset (Fig. 2).
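The smoothed crossover described above might look like the following sketch. The text specifies only a width-9 kernel; the boxcar shape, the edge padding, and the function name `crossover_index` are assumptions for illustration.

```python
import numpy as np


def crossover_index(signal_std, noise_std, width=9):
    """Return the 1-based index of the last component whose smoothed
    signal magnitude exceeds the smoothed noise magnitude.

    A boxcar kernel is assumed; the curves are edge-padded so the smoothed
    curves keep their length and their ends are not pulled toward zero.
    """
    kernel = np.ones(width) / width
    pad = width // 2

    def smooth(curve):
        padded = np.pad(np.asarray(curve, dtype=float), pad, mode="edge")
        return np.convolve(padded, kernel, mode="valid")

    above = np.nonzero(smooth(signal_std) > smooth(noise_std))[0]
    return int(above[-1]) + 1 if above.size else 0


# Synthetic curves (made up for the example): an exponentially decaying
# signal and a flat noise floor with a small alternating ripple.
idx = np.arange(50)
signal_curve = 100.0 * np.exp(-0.2 * idx)
noise_curve = 1.0 + 0.05 * (-1.0) ** idx
print(crossover_index(signal_curve, noise_curve))
```

Smoothing both curves before comparing them keeps a single ripple near the tangent point from moving the estimate by several components.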
Finally, we associated the combined dataset with MODIS land cover categories [23,24]. Each polygonal segment of AVIRIS-C had a single dimensionality estimate, but contained many MODIS land cover map pixels. We tried two approaches to reconcile the resolutions: a hard assignment, defining each datapoint as an AVIRIS-C segment with a land cover determined by its center location; and a proportional assignment, defining each datapoint as a MODIS land cover pixel with dimensionality determined by the encompassing AVIRIS-C polygon. The latter violated the assumption of Independent and Identically Distributed (IID) data, but we shall see the choice did not affect our results.
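The two assignment schemes can be contrasted with a small sketch. The record layout (`dim`, `center_cover`, `cover_counts`) and the three example segments are hypothetical, invented only to show how the datapoints are formed in each case.

```python
from collections import defaultdict
from statistics import median


def hard_assignment(segments):
    """One datapoint per AVIRIS-C segment, labeled by the land cover
    found at the segment center."""
    groups = defaultdict(list)
    for seg in segments:
        groups[seg["center_cover"]].append(seg["dim"])
    return dict(groups)


def proportional_assignment(segments):
    """One datapoint per MODIS pixel, each inheriting the dimensionality
    of the segment that contains it."""
    groups = defaultdict(list)
    for seg in segments:
        for cover, n_pixels in seg["cover_counts"].items():
            groups[cover].extend([seg["dim"]] * n_pixels)
    return dict(groups)


# Hypothetical segments: dimensionality, center label, MODIS pixel counts.
segments = [
    {"dim": 24, "center_cover": "Water",
     "cover_counts": {"Water": 3, "Urban": 1}},
    {"dim": 36, "center_cover": "Urban",
     "cover_counts": {"Urban": 4}},
    {"dim": 30, "center_cover": "Grassland",
     "cover_counts": {"Grassland": 2, "Water": 2}},
]

hard = hard_assignment(segments)
prop = proportional_assignment(segments)
print({k: median(v) for k, v in hard.items()})
print({k: median(v) for k, v in prop.items()})
```

In the proportional scheme, pixels from the same segment share one dimensionality value, which is why those datapoints are not independent.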

Results
Fig. 3 shows the eigenvalue decay curve for the combined dataset, with segments' covariance matrices normalized according to their estimated noise. An eigenvalue of unity marks the eigenvector at which signal equals noise, which here occurs at an index of 45. This was slightly smaller than the pooled dimensionality reported by Boardman and Green [10]. The difference may have been related to our spatial coherence criterion, or to our estimation of noise independently in each scene rather than from a static Noise-Equivalent delta Radiance (NEdL) spectrum. Nevertheless, the result broadly agreed with prior estimates and underscored the high overall diversity of the VSWIR range.
Next we partitioned the dataset into MODIS land cover categories [23,24]. Fig. 4 (top panel) shows the dimensionality distribution for each land cover type using proportional assignment, with circles indicating the median and bars indicating quartiles. As expected, Urban and Cropland regions had the highest overall dimensionality. Barren regions and Water (which included both inland water and sea) had the lowest, while Forest, Shrubland and Grassland ecosystems had similar distributions with a dimensionality of approximately 30 per scene. Separating the terrestrial anthropogenic scenes having Cropland, Mosaic, or Urban land cover from other terrestrial scenes, and excluding outliers above dimensionality 50, we found each population was well-characterized by its own Gaussian distribution (Fig. 4, bottom panels). The mean and standard deviation of the natural scenes were 30.4 and 4.5, and those of anthropogenic scenes were 33.2 and 4.3. The difference was highly significant ($p < 10^{-10}$ for a two-sample t test). The values were in the range of prior case studies that independently studied individual natural scenes [13,14], or urban areas [11]. The HyspIRI preparatory dataset demonstrated that the two were best characterized by distinct distributions and that the trend held over wide areas.
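For reference, the two-sample comparison can be written out with the reported summary statistics. The text does not say which variant of the t test was used, nor does it give the group sizes, so the pooled-variance form below is an assumption and $n_N$, $n_A$ (the numbers of natural and anthropogenic datapoints) are left symbolic.

```latex
\begin{align*}
s_p^2 &= \frac{(n_N - 1)\, s_N^2 + (n_A - 1)\, s_A^2}{n_N + n_A - 2},
&
t &= \frac{\bar{d}_A - \bar{d}_N}{s_p \sqrt{1/n_N + 1/n_A}}, \\
\bar{d}_A - \bar{d}_N &= 33.2 - 30.4 = 2.8,
&
s_N &= 4.5, \quad s_A = 4.3
\;\Rightarrow\; 4.3 \le s_p \le 4.5 .
\end{align*}
```

The means therefore differ by roughly 0.62 to 0.65 pooled standard deviations. The two distributions overlap substantially, and the very small p-value reflects the large number of datapoints behind each mean.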
The Water class suggested a multimodal distribution, so we considered both a single Gaussian and a two-component Gaussian mixture model fit using Expectation Maximization (EM). The higher-dimensional mode may have resulted from polygons containing many terrain features, which controlled the dimensionality estimate despite the presence of some water. The lower-dimensional mode suggested segments of pure dark ocean with less observable spectral diversity, though perhaps a wider variability due to starkly-contrasting clouds, marine aerosols, and sunglint. Polygons containing very small patches of terrain with mostly water would likely show intermediate dimensionalities, further broadening the distributions. Thanks to the physical contiguity of biomes, the two methods of reconciling MODIS and AVIRIS-C resolutions produced virtually identical results. Using the hard assignment method only shifted the median dimensionality for a few land cover categories: Water, from 24 to 23; Permanent Wetlands, from 31 to 29; and Snow/Ice, Evergreen Broadleaf and Deciduous Needleleaf Forests, for which populations fell to one or zero datapoints. The aggregate distributions for natural, anthropogenic and water surfaces showed negligible change; means shifted by less than 1% and statistical separations remained highly significant.
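A two-component Gaussian mixture of the kind considered for the Water class can be fit with a few lines of EM. The sketch below is a generic textbook implementation, not the fitting code used in the study; the function name, the quartile-based initialization, the fixed iteration count, and the synthetic sample are all assumptions.

```python
import numpy as np


def fit_two_gaussians(x, n_iter=200):
    """Fit a two-component 1-D Gaussian mixture by EM.

    Returns (weights, means, stds), each an array of length 2.
    """
    x = np.asarray(x, dtype=float)

    # Initialization (an arbitrary but simple choice): means at the lower
    # and upper quartiles, a shared spread, and equal weights.
    mu = np.percentile(x, [25, 75])
    sigma = np.full(2, x.std())
    weight = np.array([0.5, 0.5])

    for _ in range(n_iter):
        # E-step: posterior probability of each component for every point.
        z = (x[:, None] - mu) / sigma
        dens = weight * np.exp(-0.5 * z ** 2) / (sigma * np.sqrt(2 * np.pi))
        resp = dens / dens.sum(axis=1, keepdims=True)

        # M-step: re-estimate weights, means, and standard deviations.
        nk = resp.sum(axis=0)
        weight = nk / x.size
        mu = (resp * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk)

    return weight, mu, sigma


# Synthetic bimodal sample (values chosen arbitrarily, not survey data).
rng = np.random.default_rng(0)
sample = np.concatenate([rng.normal(10, 2, 400), rng.normal(30, 3, 600)])
weights, means, stds = fit_two_gaussians(sample)
print(weights, means, stds)
```

Comparing this fit against a single Gaussian, for example by log-likelihood or an information criterion, is one way to decide whether the second mode is warranted.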
Several effects can change the dimensionality over time. Solar elevation and atmospheric state can cause changes in signal and noise levels. Other physical effects that can alter the dimensionality include the introduction of novel spectra from atmospheric phenomena such as clouds, or changes in surface cover due to seasonality in vegetation or snow. The Ivanpah example demonstrated the repeatability of the estimation method, i.e. the consistency in calculated dimensionality for specific conditions. That demonstration was possible because the Ivanpah playa scene was a barren environment without seasonal changes, and was observed multiple times under similar atmosphere and illumination. In contrast, the HyspIRI dataset was far more general, and likely contained influence from all factors.
While quantifying this partitioning was beyond our scope, several trends invited interpretation. Fig. 5 shows the dimensionality of each segment. The 2014 dimensionalities were generally lower because of reduced signal levels: the Spring 2014 datasets had lower solar elevation throughout, reducing average dimensionality. Label A shows high-altitude forest and meadow in Yosemite National Park, which had a high dimensionality that was likely related to the wide range of materials: mixed vegetation and canopy types together with geologic features and sparse melting snow. A2 identifies the same area in the subsequent year, acquired one month earlier under thick snow cover which muted this diversity. Label B shows a wilderness area of still lower spectral diversity composed mainly of low vegetation and grasses in the foothills below the Sierra Nevada. Label C shows a typical diverse urban area, here Fresno, CA. Other metropolitan areas such as Bakersfield, the San Francisco Bay Area, Santa Barbara and Los Angeles all showed significantly increased spectral diversity. Label D shows typical cropland: mixtures of fertile and fallow fields near the Salton Sea. Cultivated areas in the Sacramento Bay Delta showed similarly high dimensionality. Label E shows a contrasting desert area north of the Salton Sea dominated by bare soil and rock with much lower values. There was also depleted spectral diversity in certain less-vegetated areas including inland of Santa Barbara, the low savanna terrain beneath the Sierra Nevada mountain range, and the densely-forested areas northwest of San Francisco. The fall flightlines show similar overall trends as well as some unique features. Labels F and F2 show two areas imaged under very different illumination. The 2013 flightline was acquired with a low solar elevation (often less than 30 degrees), a signal reduction of over 40% from the same area a year later. Small-scale textures and cast shadow microshading within a pixel could have reduced this signal further; in aggregate, these effects significantly reduced the measurable spectral diversity. Label G shows another example: a prominent band of low dimensionality aligned with the uniform, closed canopy of a conifer forest under low illumination. Label H shows scattered clouds, which artificially inflated the dimensionality by modulating small areas with additional reflectance and/or attenuation.

Discussion and conclusions
This study provides the most comprehensive survey to date of the dimensionality of the VSWIR upwelling light field over wide spatiotemporal areas, showing dimensionalities of 30-35 for 20 km segments taken individually and approaching 50 for the dataset as a whole. The data for terrestrial scenes are well-represented by Gaussian distributions for natural and anthropogenic environments, with the latter having a significantly larger mean dimensionality. This is not a perfect unbiased measure of the Earth's surface, since the areal extent of different terrain types and elevations does not exactly match the planet and the flights were planned to coincide with clear days and favorable observing conditions. However, the study offers an informative view of the Earth's radiometric properties across a broad latitudinal and elevation gradient with diverse land types.
Our dimensionality estimates represent the subspace for which the average variability in the signal falls above that of noise for the scene as a whole. This approach produces a stable, repeatable estimate that is appropriate for many global mapping applications, though it may understate the number of spectra that could be distinguished in practice. For instance, techniques such as spatial averaging could recover signal from features with a magnitude below the spatially-random noise, provided they appear in large contiguous areas. At the other extreme, many eigenvector projections show small compact outlier features which are statistically significant but subtend a small number of pixels, contribute less to the total standard deviation along the eigenvector, and do not count toward the spectral dimensionality by our working definition. Even if there were no spatially-compact features, the covariance structure could underrepresent narrowband spectral diversity that appeared rarely in the dataset as a whole. Our use of second-order statistics was appropriately generic and well-defined, but should be considered a lower bound on spectral diversity. Covariance analyses do not substitute for task-specific degree of freedom or information content studies. Future work could investigate how dimensionality varies across spatial scale and spectral resolution, or after atmospheric effects are removed by cloud screening [25] and atmospheric correction [5,26]. AVIRIS-C is a good place to begin such a survey, since it is similar to many existing aircraft and anticipated orbital instruments.
Overall, the study echoes prior findings on the astonishingly high diversity of VSWIR spectra, with dozens of degrees of freedom, far exceeding many other remote sensing modalities in current use. In other words, much contemporary remote sensing practice lies in a spectral sampling domain where instruments and algorithms capture a shadow of natural variability. This low-dimensional projection is underdetermined, so explanatory power often relies on modeling assumptions that the measurement data cannot test directly. In contrast, spectroscopic analyses enabled by instruments like AVIRIS-C provide an overdetermined measurement with contiguous spectral sampling over the VSWIR range. This is highly desirable since it offers numerical leverage while also providing the capability to measure unexpected phenomena and falsify modeling assumptions. Spectroscopy provides the opportunity for investigation within the dataset itself, exploiting the rich physical interpretability of the many-channel measurement to discover information manifest in the VSWIR light field.

Fig. 1. Left: RGB subset of an example validation segment, the foothills of Ivanpah Playa, overflown on multiple occasions under different illumination conditions. Right: The average magnitude of the projection of each Principal Component, separated into spatially-correlated (thick line) and uncorrelated (thin line) components. The intersection indicates the dimension where the measurement noise is equal to the scene signal.

Fig. 2. Example segmentation of flightline f1405f140528r15 near the Bay Area, CA. The bar chart at the top shows dimensionality estimates for each segment. Images indicate representative segments with intermediate, high, and low values.

Fig. 3. Eigenvalue decay of the pooled covariance matrices, each rescaled to place noise at unity.

Fig. 4. Top: Intrinsic spectral dimensionality of MODIS land cover categories. Bottom: Frequency histograms of dimensionality for aquatic and terrestrial scenes. Fitting a bimodal distribution to the Water class yields means of 12.4 and 27.1, and standard deviations of 3.8 and 5.6 respectively. Natural and anthropogenic terrestrial scenes are well-described by Gaussian distributions with means of 30.4 and 33.2, and standard deviations of 4.5 and 4.3 respectively.

Fig. 5. Intrinsic spectral dimensionality of observations over California, 2013-14. Displayed dimensionality is clipped to the range 20-40. Letter annotations show locations of interest referenced in the text.

Table 1. Flightlines used in the experiment. Date: Flight date, N: Number of segments, Radiance: Mean radiance in W m$^{-2}$ sr$^{-1}$ nm$^{-1}$, Dim: Mean dimensionality, Lat: Center latitude of flightlines, Lon: Center longitude of flightlines.