An Approach for Unsupervised Change Detection in Multitemporal VHR Images Acquired by Different Multispectral Sensors

This paper proposes an approach for the detection of changes in multitemporal Very High Resolution (VHR) optical images acquired by different multispectral sensors. The proposed approach, which is inspired by a recent framework developed to support the design of change-detection systems for single-sensor VHR remote sensing images, addresses and integrates in the general approach a strategy to effectively deal with multisensor information, i.e., to perform change detection between VHR images acquired by different multispectral sensors on two dates. This is achieved by the definition of procedures for the homogenization of radiometric, spectral and geometric image properties. These procedures map images into a common feature space where the information acquired by different multispectral sensors becomes comparable across time. Although the approach is general, here we optimize it for the detection of changes in vegetation and urban areas by employing features based on linear transformations (Tasseled Caps and Orthogonal Equations), which are shown to be effective for representing the multisensor information in a homogeneous physical way irrespectively of the considered sensor. Experiments on multitemporal images acquired by different VHR satellite systems (i.e., QuickBird, WorldView-2 and GeoEye-1) confirm the effectiveness of the proposed approach.


Introduction
The use of Remote Sensing (RS) in the analysis and evaluation of environmental processes evolution is a valuable tool whose relevance has increased in conjunction with the use of digital image processing techniques. Due to the improvement of both acquisition sensor technology and data processing algorithms, it is possible to get an accurate and automatic identification and extraction of features for understanding the environmental changes occurring on the ground due to natural and anthropogenic interactions. The technological evolution resulted in the availability of multitemporal and multispectral satellite images with Very High spatial Resolution (VHR) acquired by passive sensors (e.g., QuickBird, WorldView-2, GeoEye, Pleiades). In the context of this paper, we define VHR images as those acquired with a metric to sub-metric spatial resolution in the panchromatic channel (i.e., spatial resolution lower than one meter). These images allow a detailed geometrical analysis when compared to medium or high spatial resolution data [1][2][3][4]. When considering VHR satellite systems, it is difficult to define Time Series (TS) made of images from one single sensor that satisfy the application temporal Among CD techniques, the most widely used are the ones based on Principal Component Analysis (PCA), Iteratively Reweighted Multivariate Alteration Detection (IR-MAD) and Change Vector Analysis (CVA) [20,[24][25][26].
CVA has been widely applied to the original spectral bands of multitemporal images from low to very high spatial resolution [22,23,[27][28][29]. Other authors used CVA on the vegetation index or the Tasseled Cap (TC) feature space, but with low and medium spatial resolution images only, and/or for the detection of specific/single vegetation changes [28,29]. In the case of VHR images, original spectral bands are traditionally used, and the CD mainly focuses on the separation of change and no change classes, or the identification of a single type of change, without considering the nature (type) of the changes [5,[30][31][32][33]. In low, medium, high and VHR cases, features are mostly selected according to the possible changes occurring on the ground. However, less attention is devoted to explicitly mitigating the differences induced by the acquisition system, both from the homogenization and feature selection perspectives [30][31][32][33], which in turn would lead to an improvement on the CD accuracy. In the specific case of IR-MAD, little attention is devoted to the possibility of using these features for CD in multisensor images [33]. Mostly qualitative visual analysis about the possibility to detect changes is conducted, but no quantitative change information extraction approach is provided [24]. In this context, the need arises for defining proper techniques and operational strategies for homogenization of multitemporal images acquired by different VHR multispectral sensors that mitigates differences in both atmospheric conditions and acquisition system parameters, thus reducing their impact on the multitemporal information extraction, and on the CD accuracy. The technique should also be able to extract changes automatically and distinguish among different types.
This paper presents an approach for unsupervised CD in multitemporal VHR images acquired by different multispectral sensors. The proposed approach takes inspiration from two existing frameworks presented in [5] and [34], but improves them by introducing guidelines and solutions for how to deal with multisensor VHR images in the context of CD. It exploits some of the concepts in [5] and extends and integrates them with a strategy for mitigating the effects of the non-homogeneous properties of multitemporal images acquired by different VHR multispectral satellite systems at both pre-processing and feature extraction/change detection level. In conclusion, this paper aims at illustrating the entire processing chain from image acquisition to CD map using multi-sensor images. Attention is focused on the possible sources of differences, particularly on the ones becoming more critical when multisensor images are used and may result in change detection errors.
The main steps of the proposed approach are: (i) mitigation of differences induced by the use of VHR multitemporal images acquired by different sensors; and (ii) detection of multiple changes occurring on the ground by means of high level physical features. The first step is conducted by defining homogenization procedures that address radiometric, spectral and geometrical differences. Thus, multitemporal multisensor images become more comparable (i.e., more homogeneous) across time. Homogenization is further improved by extracting physical features from multisensor images that allow for an effective multitemporal comparison across sensors at a given level of abstraction. Features are designed to detect multiple changes relevant to the user. Although the approach is general, here we concentrate on the selection of high-level physical features (i.e., features that account for the physical quantities) suitable to detect changes in vegetation and urban areas. Linear/Orthogonal transformation features are selected for the detection step. They are shown to be effective for representing the multisensor information in a coherent physical way versus the considered sensor. Other physical features should be selected depending on the application. The second step is conducted by means of CVA in spherical coordinates, where the changes are represented by magnitude and direction variables. Separation among the changes is carried out in an automatic way. In the case of the magnitude variable, changed and non-changed pixels are separated by means of a Bayesian decision rule [35], whereas along direction variables, an adaptation of the Two-Stage Multithreshold Otsu (TSMO) method [36] is used. Experiments carried out on bi-temporal VHR image pairs acquired by different sensors confirm the effectiveness of the proposed approach.
The remainder of this paper is structured as follows. Section 2 presents an overview and an analysis of the properties of a CD system for VHR RS images, the proposed approach for the mitigation of differences induced by the use of VHR multisensor multitemporal images and the CD process. Section 3 introduces the multisensor datasets, describes the design of experiments, and illustrates and discusses the experimental results. Finally, Section 4 draws the conclusions and future developments.

Methods
In this section, an overview of the conceptual framework for analyzing the properties of a CD system in the context of VHR RS images is presented. The theoretical structure of the problem in the context of multisensor VHR images is drawn. This is the model for the design of solutions for different specific CD problems. After that, the proposed approach for the mitigation of differences induced using VHR multisensor images is introduced. Last, the description of the CD approach for detecting changes of interest in bi-temporal images is provided.

Conceptual Framework: CD Systems for VHR Remote Sensing Images
When considering VHR RS images, the CD problem becomes complex due to the heterogeneous spatial and spectral characteristics. Further, standard CD techniques often do not account for the semantic meaning of changes of interest. Standard CD methods often assume that unchanged pixels have similar signatures on the two dates, whereas changed ones do not. Unfortunately, this assumption is often not satisfied when we consider multitemporal VHR images, since additional differences may appear due to spectral and spatial heterogeneity. This becomes more critical when multisensor images are considered [5,37]. In accordance with [5], a proper understanding and modeling of changes is fundamental for the development of effective techniques that can mitigate the intrinsic differences in multitemporal VHR data, and accurately detect multiple changes. Further, most of the current methods for CD in VHR images focus on: (i) handling images acquired by the same sensor [38,39], or (ii) detecting specific type of changes by using multisensor images (i.e., deforestation, burned areas, buildings detection) [30][31][32][33]. Thus, their applicability to CD in multisensor image pairs is limited.
In this paper, we aim at developing a CD approach for multisensor VHR optical images. For this purpose, the framework in [5] is used as a baseline. The general flowchart proposed in [5] consists of two main steps: (i) definition of the tree of radiometric changes; and (ii) detection of changes.

Definition of the Tree of Radiometric Changes
In this step, possible classes of radiometric changes are analyzed and their taxonomy is defined. The resulting tree of radiometric changes is specific for the considered CD problem. To this end, a categorization of the different possible radiometric changes that may be present in a multitemporal VHR dataset is required. Figure 1 shows the tree that models radiometric changes for a generic CD problem in multitemporal multisensor VHR images. Let Ω be the symbol representing the concept of change class. Thus, each specific type of change will be represented by the symbol Ω and a sub-script that refers to the meaning of the type of change. According to [5], two main types of radiometric changes (Ω Rad ) originate because of the complexity of VHR images: (i) changes due to acquisition conditions (Ω Acq ); and (ii) changes occurring on the ground (Ω Grd ). The former corresponds usually to changes of no interest for the end user. The latter includes the changes relevant from the user's viewpoint. Ω Acq changes are the ones associated to differences in atmospheric conditions (Ω Atm ) and in the acquisition system Ω Sys . The latter results in the appearance of undesired change patterns, differences in the geometry and in shadows. Ω Sys changes, such as the ones due to the type of sensor (Ω Sen ), can be mitigated by working with proper homogeneous features.
Among Ω Sen , we find changes due to differences in the spectral resolution (Ω Spe ) and spatial resolution (Ω Spa ). Ω Sys includes changes due to the sensor view angle (Ω Ang ) or the solar rays incidence angle (Ω Sun ). Ω Ang can be related to the topography (Ω Top ) and the relief (Ω Rel ). For example, they may generate changes in shadows, whereas Ω Sun refers to effects such the ones induced by seasonal variations of the solar ray incidence angle, which generates shadows differences that are not associated to changes on the ground. On the other hand, Ω Grd changes can be divided into four main categories: natural disasters (Ω Dis -e.g., earthquake damages), vegetation phenology (Ω Veg -e.g., leaves lost during winter), environmental conditions (Ω Env -e.g., variation in soil moisture levels) and anthropogenic activities (Ω Ant -e.g., harvested crop fields, new buildings). The tree structure illustrated in Figure 1 has a general validity and can be used to model most of the CD problems [5]. However, depending on the specific CD problem, the tree can be adapted and optimized. On the one hand, some leaves/nodes might be irrelevant and thus can be removed. On the other hand, some leaves may require to be further specified in accordance with the specific study case (see Section 3 for an example of how to define the tree of radiometric changes for a concrete CD problem). To extract the changes of interest, it is necessary to select effective features and to count on prior knowledge about the problem.

Detection of Changes
In this step, the type of changes identified in the previous step are detected by selecting the strategy for the design of the CD method. According to [5], two possible strategies can be adopted based on: (i) direct extraction of the radiometric changes of interest; or (ii) detection by cancellation of non-interesting radiometric changes. Most of the current available CD methods for VHR RS images make use of the direct extraction strategy since their goal is to extract a specific type of change. Nevertheless, sometimes it is easier to detect the radiometric changes that are of no interest, and therefore to detect relevant changes by cancellation.

Proposed Approach to Unsupervised CD in VHR Multispectral Images Acquired by Different Sensors
In this sub-section, details on the proposed approach to solve the CD problem in VHR multisensor optical images are given. For handling the problem, we focus on two issues: (i) mitigation of multisensory induced changes Ω Sys by the homogenization of multispectral data acquired by different VHR sensors; and (ii) detection of Ω Grd changes by mitigation of residual Ω Sys at the level of feature extraction. Figure 2 depicts the block scheme of the proposed approach. In order to accomplish the Ω Sys mitigation, two main steps are considered: (1) spectral; and (2) geometric differences mitigation. Ω Grd detection is accomplished in two steps: (3) multisensor feature extraction; and (4) change detection. Let us consider two VHR optical images, X 1,a and X 2,b , acquired by different sensors S 1 and S 2, where a = 1, 2, . . . , A and b = 1, 2, . . . , B represent the multispectral bands for S 1 and S 2 . The number A and B of spectral channels in S 1 and S 2 can be equal or different depending on the sensor. Given the use of different VHR sensors, X 1,a and X 2,b are likely to show different number of acquisition bands with slightly different bandwidth and spatial resolution, and/or different view angle. In other words, different spectral and geometrical properties. Let us assume that X 1,a and X 2,b sizes are I 1 × J 1 and I 2 × J 2 , respectively, and that the images are acquired over the same geographical area at different times t 1 and t 2 .

Ω Sys Differences Mitigation
When dealing with multitemporal images acquired by different sensors S 1 and S 2 , one of the critical issues is to identify and remove acquisition system induced changes (Ω Sys ). Handling the differences due to Ω Sys , contributes to mitigating issues on the left side of the tree of radiometric changes ( Figure 1). In single sensor VHR images, Ω Sys are mainly due to differences given by the sensor view angle (Ω Ang ), and are accentuated by the topography (Ω Top ) and relief (Ω Rel ). All of them contribute to the geometrical differences and result in radiometric distortions. When multisensor VHR images are considered, additional problems arise due to the type of sensor (Ω Sen ) and thus the differences in the spectral (Ω Spe ) and spatial (Ω Spa ) resolution. Ω Spe can be mitigated by performing a radiometric normalization of the images, whereas Ω Spa should be managed by means of geometric corrections, since they contribute to geometric differences.
To mitigate Ω Spe , two macro-categories of normalization methods exist in the literature: absolute and relative methods. The former involves the conversion of the DN values into the corresponding ground reflectance ones [6], while the latter performs image-to-image adaptation in the histogram feature space [9][10][11][12][13][14]. When data from two different sensors S 1 and S 2 are considered, the spectral information in the multisensor images is not comparable. Thus, absolute normalization is preferred with respect to the relative one, though not limited.
Absolute normalization estimates surface reflectance values providing information at physical level and mitigating spectral differences. The steps for spectral differences mitigation are shown in Figure 3, where X 1,a and X 2,b are converted from DN to At-Surface Reflectance (ρ ASR [unitless]) images, X s 1,a and X s 2,b (where s stands for spectrally corrected). Although this step might seem obvious in the CD process, several works can be found in the literature that use DN in the comparison step. Further, the mitigation of Ω Spe becomes more critical when multisensor data are considered. To get ρ ASR , the digitalization process performed at the sensor during image formation must be inverted [6]. Parameters such as the mean exoatmospheric solar irradiance, solar zenith angle, Earth-Sun distance, radiance value and others are required. They can be retrieved from the metadata or from user guides, and are specific for each satellite. The resulting X s 1,a and X s 2,b have the same physical meaning. However, some differences cannot be mitigated. Thus, in addition to physical driven strategies, some data driven ones (feature extraction) are required (see next sub-section).
Satellites carrying on-board VHR sensors have the capability to acquire images with different view angles; this increases the probability of having multitemporal images with the required time resolution and cloud free on a given area. However, when multitemporal images are considered, differences in the acquisition view angle can induce misalignments because of the impact of topography (Ω Top ), small changes in relief of the terrain or the presence of buildings (Ω Rel ) [40][41][42][43]. Further, when X 1,a and X 2,b are acquired by different VHR sensors, small differences in the spatial resolution (Ω Spa ) are also expected. To achieve the geometric differences mitigation (step 2, Figure 2), the block scheme shown in Figure 4 is followed. X s 1,a and X s 2,b are the input images and X s,g 1,a and X s,g 2,b are the spectrally and geometrically mitigated ones (g stands for geometrically corrected). Geometric distortions due to the joint effect of topography, relief and sensor view angle, are mitigated by applying orthorectification with a high resolution Digital Elevation Model (DEM). In this way, most of the misalignments between multitemporal images due to Ω Top and Ω Rel are corrected. However, additional issues remain because of differences in spatial resolutions, thus co-registration should be applied so that pixels with the same coordinates in the images may be associated with the same area on the ground. This step is very critical since a poor co-registration may result in an unreliable final CD map [44]. On the other hand, it is important to clarify that neither orthorectification, nor co-registration solve the problems derived by the presence of vertical structures (i.e., the parallax problem). These types of changes are usually considered as sources of noise and are not of interest, thus they can be removed/mitigated by some feature extraction strategy applied during the CD stage (e.g., shadow detection and removal).
Pansharpening (PS) could be applied between orthorectification and co-registration as an optional step. It is meant to improve the spatial information by integrating the high spectral and low spatial resolution bands with the high spatial and low spectral resolution panchromatic band.
Several PS methods exist in the literature, e.g., Intensity-Hue-Saturation (IHS), PCA, wavelets and Gram-Schmidt [45]. While co-registering, a resampling of the image with highest spatial resolution is performed in order to get the same common spatial resolution of the one with lower spatial resolution. The spatial resolution of VHR images is metric to sub-metric. Looking at existing satellite missions, the spatial resolution of VHR optical images ranges from the 0.3 m of WV-3 and -4 to the 1 m of Kompsat-2 in the panchromatic channel. Thus, when considering VHR multisensor image pairs the maximum resolution difference may rise up to about 0.7 m. Therefore, VHR images spatial resolutions are different yet similar and comparable. The outputs are multisensor VHR images, X s,g 1,a and X s,g 2,b , showing the same spatial resolution and the same physical information, where Ω Sys have been mitigated.

Ω Grd Detection
Once the Ω Sys have been mitigated, the proposed approach performs Ω Grd detection. It extracts the changes of interest by selecting and extracting significant features for specific changes present in the study area. Standard CD approaches like Univariate Image Difference (UID) [46] and CVA [47] perform multitemporal comparison by means of the difference operator. The multispectral (or single spectral, in the case of UID) difference image X D is composed by Spectral Change Vectors (SCV).
The rationale behind the use of the difference operator is that unchanged samples show similar spectral signatures and thus result in SCVs with almost all zero components, whereas changed samples show SCVs with components far from zero. However, when multisensor images are considered, such an assumption is seldom satisfied, even after Ω Sys mitigation. Thus, further homogenization is required in order to satisfy the a priori assumption for a successful employment of simple methods like UID and CVA. When multisensor images are considered, a proper feature space should be explicitly identified where pixel based comparison is meaningful. Here we propose the use of higher-level physical features, derived from the ρ ASR ones. While reducing residual sensor-induced differences in the unchanged areas, thus improving the homogenization level, a better highlighting of the changes of interest is achieved. In other words, working with higher-level physical quantities improves the level of abstraction while increasing the probability to detect the changes of interest and reducing the number of false alarms [6].
Since the proposed approach is general, any feature with high-level physical meaning can be used (e.g., radiometric indices). Further, since the approach is designed for VHR images, it may benefit from the use of spatial context information [5,48]. However here we are interested in understanding the performance of the approach for CD and thus of the multisensor homogenization procedure in mitigating the effects of sensor differences on the CD map. Accordingly, pixel-based features are considered such as radiometric indices. As an example, if changes due to vegetation phenology (Ω Veg ) are present, a radiometric index to detect vegetation can be used. In the case of natural disasters (Ω Dis ) in urban areas, a building index plus vegetation or soil index could be considered. Radiometric indices suitable to detect most of the relevant type of changes can be found in the literature.
Selection of a proper index becomes more and more complex when the Ω Grd are coming from different sources. Since here we focus on vegetation and urban changes, we select features based on linear transformations such as TC or Orthogonal Equations (OrE) among the others [49,50]. TC features were designed originally as a linear transformation for the agricultural analysis on single date images [49], but they were further analyzed and used for CD analysis in multitemporal images. The literature works mainly used TCs in medium resolution and single sensor images (e.g., Landsat) [28,29]. Three main TCs have been studied (i.e., Brightness, Greenness and Wetness) because of their ability to detect and monitor soil content or transitions, vegetation and canopy moisture. Since TC is an invariant transformation in the physical feature space, its features are consistent between different scenes in a multitemporal time series [49] and therefore could be invariant between multisensor multitemporal images. Similarly, the OrE were derived following the TC philosophy and highlight information on crops, vegetation and soil. Additionally, OrE were designed as a substitution for the sensors which TC coefficients have not been derived in the literature, yet.
Given the above discussion, we use TC or OrE for the Ω Grd detection process. These transformations compute a linear combination of the spectral bands. The number of features derived from TC is the same as the number of input features (in our case A and B, respectively), but only 3 TCs are generally used for CD. In the case of OrE, only 3 features are derived. Equation (1) shows the general equation to calculate TC or OrE features (F), where j represents each of the F features and C j,a are the coefficients calculated for each X F j . The same equation applies for TC and OrE, though only the red, blue, green and NIR bands of the sensors are used for the latter. The following analysis applies for both TC and OrE: Once physical level features have been extracted from different sensors, step 4 ( Figure 2) applies CD. As mentioned above, CVA is employed. Since three features are considered, a 3-D representation is obtained [34]. Each SCV of X F D is defined as in Equation (2): Each SCV component captures the multitemporal behavior of the corresponding feature (either TC or OrE). SCV components tend to assume small values when no change occurred, whereas if changes occurred, components assume large values (either negative or positive) depending on the type of change. This is true, even if single date features may have different ranges across each other since the difference operator (2) is applied to corresponding features computed at t 1 and t 2 showing similar range. To effectively perform CD in the multidimensional space defined by X F D , the information in X F D vector is represented in spherical coordinates by computing its magnitude (ρ), azimuth angle (θ), and elevation angle (ϕ). The relationship between X F D in Cartesian coordinates and Spherical coordinates is described by Equations (3)-(5): In the spherical representation, unchanged samples having small values in all the X F D components assume small magnitude (ρ) values, whereas changed samples assume large values in one or more X F D components thus showing a large magnitude (ρ) and a direction along θ and ϕ variables that depends on the ratios among values of the X F D components (i.e., on the type of change). Therefore, the magnitude variable carries information about the presence/absence of changes, whereas the direction variables carry information about the possible type of changes [22,34,47]. According to these observations, a magnitude-direction domain (D) ( Figure 5) can be defined as in Equation (6) that includes all SCVs: where ρ max is the maximum magnitude of X TC D . Three subsets of D are of interest: (i) the sphere (S n ) that includes unchanged pixels, i.e., the ones with small magnitude values; and (ii) the spherical shell (S c ) that includes changed pixels, i.e., the ones with large magnitude values. S n and S c are complementary, the radius T of S n is the inner radius of S c , and their union provides D. T separates changed from unchanged samples along the magnitude. Since the magnitude is a compressed 1-dimensional representation of the change problem, T is obtained as a trade-off among the effects of the types of change. Yet, the definition of T along the magnitude only is a simple and effective solution [22,34]; and (iii) truncated cone sectors (S k ) of changed pixels associated to preferred directions (θ k , ϕ k ) in S c (gray shaded truncated cone in Figure 5). Each preferred direction is associated to a specific type of change Ω Grd (see Figure 1). The volume S k associated with the k-th change is defined as: The upper and lower bounds θ k 1 , θ k 2 , ϕ k 1 and ϕ k 2 , as well as T, can be calculated manually or automatically [51]. Once the angular thresholds have been estimated, the magnitude threshold T can be refined for S k to account for the behavior of each specific type of change [52]. Finally, the change detection map (CD map ) is built by including the following labels Ω = {ω n , Ω Grd }, with Ω Grd = Ω Dis , Ω Veg , Ω Env , . . . , Ω Ant , where ω n refers to unchanged areas. As last step, non-relevant changes (e.g., misregistration, shadows), usually associated with the left side of the tree of radiometric changes shown in Figure 1, are removed from the CD map .

Dataset Description and Design of Experiments
The proposed approach was validated over different areas located in the Trentino region in the north of Italy ( Figure 6). These areas show interesting properties from the point of view of orographic conformation and environmental variety. Over a relatively small region it is possible to find: (i) flat regions including precious apple and vineyard fields, and urban, sub-urban and industrial areas with different density and structure; and (ii) hill and mountain environments with a variety of tree species. Three bi-temporal data sets made up of different couple of images among QuickBird (QB), two WorldView-2 (WV-2) and one GeoEye-1 (GE-1) were constructed over the sample areas (yellow squares in Figure 6). The three datasets were selected such that different types of change are represented. Therefore, dataset 1 shows the transition from forest area to several types of vegetation, dataset 2 shows transitions among different phenological states of crop areas; and dataset 3 shows transitions from vegetation and bare soil (and vice versa) and changes in roofs and roads around an urban area. These datasets allow us to evaluate the complexity of working with multisensor VHR optical images. Details about the three datasets are provided in Table 1. The three satellites show some remarkable differences due to differences in the view angle, the spectral resolution, the number of bands and the spatial resolution. The QB and GE-1 images have four multispectral bands, whereas WV-2 has eight. The spatial resolution of the QB image is 0.6 m for the panchromatic band and 2.4 m for multispectral bands, whereas WV-2 and GE-1 offer a higher spatial resolution in both panchromatic and multispectral bands with 0.5 m and 2 m, respectively. Table 2 summarizes the characteristics of QB, WV-2 and GE-1 images from the spectral and spatial point of view. The spatial resolution differences imply that the sizes I 1 × J 1 and I 2 × J 2 of X 1 and X 2 images, respectively, are different despite they cover the same surface. The size of QB image in Figure 6 is 8674 × 6300 pixels, whereas the size of WV-2 and GE-1 images is 10297 × 7139 pixels. Thus, pixel-by-pixel comparison cannot be directly applied since the same pixel coordinates in the two images do not correspond to the same position on the ground. Concerning spectral resolution, we can observe that the four primary multi-spectral bands of QB, GE-1 and WV-2 are acquired over similar spectral ranges (e.g., red), but not fully identical (e.g., blue). Similar considerations hold for green and NIR bands. In order to apply the proposed approach, we define the tree of radiometric changes (Ω Rad ) specific for the considered study areas (one single joint tree is provided for the three datasets), apply mitigation, extract suitable features and perform Ω Grd detection. As a first step, we define the specific tree of radiometric changes (Ω Rad ) for the considered problem by starting from the general tree given in Figure 1 [5]. The three datasets show changes due to acquisition conditions (Ω Acq ) and changes occurring on the ground (Ω Grd ). With regard to Ω Grd , there are changes in the phenological state of the vegetation (Ω Veg ) (e.g., radiometry of some crops yards, trees, small roads between crop yards (Ω Cro ), re-vegetation) and changes due to anthropogenic activities (Ω Ant ) (e.g., changes in road Ω Roa , deforestation Ω Def , roofs Ω Bui and crop planting). It is important to clarify that even though the types of changes can be visually separated by photointerpretation, we do not have enough information to give a precise "from-to" label to them.
Concerning Ω Acq , both atmospheric conditions (Ω Atm ) and acquisition system Ω Sys effects are present. Ω Atm is mitigated by means of atmospheric corrections [6]. Ω Sys is related to the type of sensor (Ω Sen ) and to the sensor view angle (Ω Ang ). The Ω Ang changes generate small differences in the appearance of objects, leading to geometric distortions, thus to residual misregistration even after proper alignment of images, and to spectral differences when tall buildings are present. We can also see some differences in shadows, which become more critical when high buildings, structures or reliefs are present. Ω Ang and Ω Sen changes are non-relevant from the application viewpoint. Therefore, they are explicitly handled before proceeding to detect Ω Grd and tuning the final CD map. Ω Sys such as the ones due to sensor acquisition mode (Ω Mod ) are not considered since we are working with passive sensors. Ω Grd like the ones due to natural disasters (Ω Dis ) or environmental conditions (Ω Env ) are ignored, since such events did not occur in the considered study area. According to this analysis and to the general taxonomy in Section 2, the tree of radiometric changes for the considered problem becomes the one in Figure 7. Once the radiometric tree was defined, we performed spectral and geometric differences mitigation. All images were provided by DigitalGlobe Foundation in the context of the "MS-TS-Analysis of Multisensor VHR image Time Series" project [54]. Conversion from DNs to At-Surface Reflectance (ASR) was conducted before delivery by means of the Atmospheric Compensation (AComp) algorithm [6,55,56]. Given the orography of the study area and the possible distortions, we applied orthorectification by using a DEM obtained from LiDAR data [57]. Further distortions appear in dataset 1, since it is located in mountain area, and dataset 3 because of the presence of buildings. Additional pixel-to-pixel problems are also observed due to Ω Ang , and co-registration should be applied.
In order to achieve a better co-registration, PS was applied by means of the Gram-Schmidt method. Here ENVI software package was employed [58]. After PS, the spatial resolution for QB is 0.6 m, and 0.5 m for GE-1 and WV-2 multispectral bands. Co-registration was carried out over the QB-WV-2 and WV-2-GE-1 pairs, covering the whole study area in Figure 6, by using a polynomial function of second order. For the QB 2006 and WV-2 2010 couple, 79 uniformly distributed Ground Control Points (GCP) were selected, whereas 68 uniformly distributed GCP were selected for the WV-2 2011 and GE-1 2011 couple. The WV-2 2010 image was resampled during co-registration. Resampling was performed by means of the nearest neighbor interpolation. Figure 8 shows the pansharpened multisensor VHR QB, GE-1 and WV-2 images after applying spectral and geometric mitigation in the first and second column, respectively. Datasets 1 and 2 show a common spatial resolution of 0.6 m and a size of 640 × 640 pixels, whereas dataset 3 shows a common spatial resolution of 0.5 m and a size of 1800 × 1800 pixels. Datasets 1, 2 and 3 appear in first, second and third row of Figure 8, respectively. In order to perform qualitative and quantitative analysis, a reference map for datasets 1 and 2 was defined by photointerpretation and exploiting prior knowledge on the scene as no ground truth was available (see Figure 8c,f, showing 332,414 and 280,928 unchanged pixels (white color) and 77,186 and 128,672 changed pixels (different colors), respectively). For dataset 3, considering the extent of the area and the fact that we have no complete knowledge of the changes occurring on the ground, it was not possible to derive a complete reference map. Thus, quantitative analysis was based on 62808 pixels marked as changed, and 6263 as unchanged, selected by photointerpretation. For comparison purposes, a false color composition of the two acquisitions is provided, green and fuchsia areas represent changes (Figure 8i). Changed pixels in the reference map include Ω Grd , only. For dataset 1, changes from (i) forest to bare soil; (ii) forest to grass; (iii) base soil to grass and (iv) bare soil to some road are identified. For dataset 2, changes from (i) dense vegetation to sparse or light vegetation and vice versa; and (ii) bare soil to vegetation are present. In addition, for dataset 3, changes are from (i) bare soil to vegetation, both dense and sparse; (ii) one to another color of the roofs; and (iii) old to new roads. Once spectral and geometric mitigation was achieved, mitigation of residual Ω Sys was performed at the level of feature extraction. The selection of the features is therefore designed to detect the residual Ω Sys and the Ω Grd . According to the tree of radiometric changes (Figure 7), residual Ω Sys might be related to Ω Ang , resulting in possible shadows and/or registration noise, whereas Ω Grd includes three type of changes: Ω Cro , Ω Roa and Ω Def . Residual Ω Sys due to shadows, due to vegetation or buildings, were detected by means of the method in [59], whereas Ω Sys due to registration noise are negligible. Ω Grd were detected by employing TC and OrE features.
Three main TCs (i.e., Brightness, Greenness and Wetness) and three OrE (Crop mark, Vegetation and Soil) have been studied because of their sensibility to soil content or transitions from-to soil, vegetation, canopy moisture and anthropogenic activities have been studied. Thus, we expect them to properly detect the different changes in the study area, with the exception of some transitions between green areas that do not show up in TC features. These are the cases of datasets 1 and 3, where transitions from forest to grass and crop to grass are misdetected. This is due to the fact that the difference is more in texture rather than in TC features. In order to evaluate and compare the proposed approach, experiments carried on features such as ASR, are also conducted. The detection of changes is obtained by applying CVA to a 3D TC and OrE feature space. However, other features such as IR-MAD [24] could be also used, despite no approaches for automatic detection with these features in multisensor VHR images exist yet. The drawback with IR-MAD is that it requires an end-user interaction to select the most specific components that represent the specific change of interest and to separate among changed and un-changed samples. This is time consuming and makes the approach not fully automatic.
Two experiments were designed: (i) experiment 1 (exp. 1) applies CVA to the transformed ASR bands features; and (ii) experiment 2 (exp. 2) applies CVA to TC (for datasets 1 and 2) or OrE (for dataset 3 which has a large presence or urban areas) features. In exp. 1, the first two selected bands are the Near-IR (NIR) and Red (R) given their high spectral sensibility in the analysis of vegetation and anthropogenic activities. The R bands of QB, GE-1 and WV-2 have quite similar spectral range, whereas the NIR ones do not (see Table 2). Therefore, between NIR1 and NIR2 WV-2 bands, NIR1 (770-995 nm) was selected given that its spectral range better matches to the QB NIR (760-900 nm) and GE-1 NIR (757-853 nm) band spectral range. Another ASR feature to be selected could be the Green or Blue band. Empirical experiments showed a slightly improvement in the final CD accuracy while using Green band instead of Blue one. Therefore, Green band was selected as third feature. In exp. 2, for datasets 1 and 2, TC features were selected based on: (i) the maximum number of TC features that can be derived for each specific sensor, (ii) the radiometric tree of changes; and (iii) the possibility to compare between multisensor TC features. Thus, 4 TC features were derived, bounded by QB properties. In accordance with the radiometric tree, features should be selected that are able to highlight Ω Veg and Ω Ant . Based on the level of comparison between the multisensor TC coefficients, and according to the state-of-the-art, only the first three TC features of each sensor show similar physical meaning. For dataset 3, only three OrE features exist in the literature and are derived by means of the 4 main spectral bands of the sensors.
TC and OrE features were derived directly from the spectrally mitigated data and by using the coefficients in Tables 3-5. For TC, only coefficients corresponding to the first three TC feature are present. Here the set of coefficients in [60] was applied to the QB image. The coefficients are derived from the DN feature space (Table 3). There are no TC coefficients derived from TOA values for QB images in the literature. Thus, we applied the QB TC coefficients over the QB DN features, and compared the derived TC features as a higher level primitive. For the WV-2 image, coefficients are applied as given in [61], but to the ASR features instead of TOA ones, as originally derived ( Table 4). The OrE features were derived by means of the coefficients shown in Table 5 and as per Equation (1).

Results and Discussion
In order to assess the effectiveness of the Ω Sys mitigation and the Ω Grd detection approach based on higher-level physical features, CVA was applied by considering the 3D feature space defined above and by means of Equations (2)- (5). We first extracted all the areas that correspond to radiometric changes in the image, by thresholding the magnitude variable. The selection of a threshold T over ρ showed to be a simple and fast way to separate among changed and non-changed pixels. A good separation, among the three dimensions of CVA was guaranteed in average. T was automatically selected by applying a Bayesian decision rule [35] and by following the implementation presented in [62]. In [62], the statistical distribution of the magnitude as a mixture model representing the classes of unchanged and changed pixels is approximated by a Gaussian mixture. Parameters for the statistical distribution are derived by means of the EM algorithm and decision of the threshold is made using a Bayesian minimum cost criterion [46]. The T values for each of the datasets in the two experiments are shown in Table 6. Figure 9 shows the multispectral difference image 3D histogram for the dataset 1, where bigger circles represent higher density (corresponding to the un-changed samples) and small circles represent lower densities (related to the different types of changes). In Figure 9a,c, it is possible to see how X ASR D and X TC D are distributed in a coplanar and sparse way, respectively. This kind of distribution leads to an easier visual interpretation of the X TC D , if compared to X ASR D . In fact, from the 3D histograms in Figure 9, we can see that the coplanar distribution of ASR features does not allow good separation between changes of interest and changes of no interest. A similar behavior is observed between the ASR and OrE features. Right column of Figure 9, presents the changed samples after removing the unchanged ones. Different clusters fit to different S k sectors and are associated with different changes. As we move from X ASR D to X TC D , or X ASR D to X OrE D , it becomes evident how different clusters locate around preferred directions and how the number of clusters increases, making their detection and separation more effective. Separation among different clusters can be performed by automatic or manual methods.  The last step builds the multiple CD map by means of the extraction by cancellation strategy. To this end, selection of each of the clusters in the changed region was conducted automatically by means of an adaptation of the TSMO proposed in [36]. The TSMO method yields the same set of thresholds as the conventional Otsu method, but it greatly decreases the required computation time. This is achieved by introducing an intermediate step based on histogram valley estimations, which is used to automatically determine the appropriate number of thresholds for an image. Final multiple CD maps were built by cancelling the remaining Ω Sys and the Ω Grd that are out of interest (e.g., cars in road and parking areas, Figure 10). As expected, some vegetation changes that affect more texture rather than TC features are misdetected in datasets 1 and 3. In other words, the selected higher-level physical features are not optimized for those changes, but different higher-level physical features could be selected to properly model most of the types of changes.
In order to perform a qualitative analysis, a comparison of the CD maps with their reference maps (for datasets 1 and 2) and the set of points collected by photointerpretation (dataset 3) was carried out. The comparison pointed out the improvement achieved when working with higher-level physical quantities, specifically for transitions from and to bare soil and different types of crops. This is confirmed when we analyze dataset 1 where changes are mainly from-to vegetation and bare soil. TC features outperform the results obtained when using ASR features ( Table 7). As expected, both TC and ASR features have similar problems to identify the change from forest to vegetation. An analogous situation occurs on dataset 2, with changes from-to different types of crops. In this case, TC outperforms ASR being able to separate among changes C2 and C3 (which cannot be discriminated when using ASR features-see Figure 10d-f). For datasets 1 and 2, the major improvement is related to the decrease of False Alarms (FA) (see Tables 7 and 8). In dataset 1, the FA correspond to the main road passing through the area and to some of the remaining shadows generated by the tree lines. Even though an index was used to remove the shadows, a small percentage of them remained. In dataset 2, the FA correspond mainly to the linear structures like roads in between the different crops. Moreover, it is possible to observe improvements in terms of detection and separation of the different types of changes. The number of Missed Alarms (MA) decreased as well when working with TC, leading to a better detection of changes, especially when a higher number of changes exist, such as the case of dataset 2. From the quantitative viewpoint, the reduction in both FA and MA reflects in the Overall Accuracy (OA) that increases of about five percent and seven percent for dataset 1 and 2, respectively (see Tables 7 and 8).  In the case of dataset 3, ASR features poorly separates among C3 and C4 classes that correspond to transitions from bare soil to sparse and dense vegetation, respectively. The number of MA for the class C3 by ASR features is clearly larger than for OrE ones (see Table 9). Moreover, OrE features are able to better detect the classes C2 and C7 than ASR features. C2 and C7 correspond to changes occurring on building roofs and roads renewed in the studied period. Finally, ASR features do not detect class C8 (change in roof color of a building), whereas OrE features do. Other transitions from-to vegetation and bare soil can be seen in the change classes with less MA and FA when the OrE features are used. It is worth noting that changes due to moving cars on roads or in parking areas were considered as non-relevant for this study and thus neglected. From the quantitative perspective, the OA obtained by OrE features increased the overall accuracy of about 2% over that of ASR features. This improvement can be considered relevant given the complexity of the scene with the presence of more types of changes. Proper higher-level physical features, such as some radiometric indexes or texture features, may provide better results. Tables 7-9 show the confusion matrices obtained for each dataset in the two experiments.

Conclusions and Future Developments
In this paper, an approach for CD in VHR multispectral multisensor optical images has been presented. The proposed approach aims at defining and illustrating a data flow for effectively handling differences due to acquisition sensors. It is based on a general framework for the design of CD systems for VHR multitemporal images presented in [5]. In order to deal with multispectral and multitemporal images acquired by different sensors, it integrates in the general approach the following two concepts: (i) spectral, radiometric and geometric homogenization between images acquired by different sensors; and (ii) detection of multiple changes by means of features that guarantees homogeneity over time and across sensors. Experimental results on real datasets, made-up of VHR bi-temporal and multisensor optical images, confirmed the effectiveness of the proposed block scheme and the improvement achieved by the use of higher-level physical features (i.e., TC and OrE) over the traditional features (i.e., TOA or ASR). In the specific cases of datasets 1 and 2, a major improvement is observed when changes from-to vegetation and bare soil, and different types of crops are considered. This given that the higher-level physical feature (i.e., TC) were selected to highlight such type of changes. In the case of dataset 3, the use of OrE for the detection of changes described above, as well as for changes from-to vegetation and bare soil (i.e., forest to grass, crop to grass, bare soil to grass/forest) and small changes on roads and roofs, resulted in better CD accuracy than that obtained by using ASR. In general, both TC and OrE features allow a better separation and interpretation of Ω Grd by guaranteeing that these changes are distributed in compact and well separated clusters (when in the 3D feature space).
As future developments, further analysis should be carried out to determine which cluster represents a specific type of change, and to define appropriate features for other types of changes. For the mitigation of remaining Ω Sys and the better detection of the Ω Grd , the use of additional features, either in the physical feature space or in the spatial feature space, could help to make the separation and distinction better, thus improving the final OA. Additional improvements from the Ω Sys mitigation process point of view in both spectral and geometric perspective should be considered. For the spectral differences, and given that some of the VHR multisensor optical images have different number of bands and different spectral ranges, the use of regression methods for predicting bands that match from the spectral viewpoint could be considered. For the geometric differences, improvements on the co-registration process by the use of co-registration methods designed for multisensor images could be also explored. Improvements on the detection of building changes could be also integrated.