Global Mapping of Surface Composition on an Exo-Earth Using Sparse Modeling

The time series of light reflected from exoplanets by future direct imaging can provide spatial information with respect to the planetary surface. We apply sparse modeling to the retrieval method that disentangles the spatial and spectral information from multi-band reflected light curves termed as spin-orbit unmixing. We use the $\ell_1$-norm and the Total Squared Variation norm as regularization terms for the surface distribution. Applying our technique to a toy model of cloudless Earth, we show that our method can infer sparse and continuous surface distributions and also unmixed spectra without prior knowledge of the planet surface. We also apply the technique to the real Earth data as observed by DSCOVR/EPIC. We determined the representative components that can be interpreted as cloud and ocean. Additionally, we found two components that resembled the distribution of land. One of the components captures the Sahara Desert, and the other roughly corresponds to vegetation although their spectra are still contaminated by clouds. Sparse modeling significantly improves the geographic retrieval, in particular, of cloud and leads to higher resolutions for other components when compared with spin-orbit unmixing using Tikhonov regularization.


INTRODUCTION
The photometric variation of directly imaged exoplanets has been considered as an invaluable probe for the environment of habitable planets as well as spectroscopy (Ford et al. 2001). A two-dimensional surface distribution can be decoded from the diurnal and seasonal variations in reflected light (Kawahara & Fujii 2010). To date, this technique, termed as spin-orbit tomography, has been studied in terms of the regularization of geography (Kawahara & Fujii 2011;Fujii & Kawahara 2012), Bayesian formulation (Farr et al. 2018;Kawahara & Masuda 2020), planet's axial tilt determination (Schwartz et al. 2016;Farr et al. 2018), dynamical mapping (Kawahara & Masuda 2020), non-Lambertian effect (Luger et al. 2021), and its application to real Earth data (Luger et al. 2019;Fan et al. 2019).
Furthermore, the importance of regularization has been recognized. Tikhonov regularization, which was originally applied by Kawahara & Fujii (2011) to spin-orbit tomography, tends to exhibit smooth solution as an inferred map. Aizawa et al. (2020) showed that sparse modeling improved the sharpness of the inferred map. Sparse modeling is an optimization technique that assumes the sparsity of the solution, as described in Appendix C. More recently, Asensio Ramos & Pallé (2021) introduced a neural-net-based regulator, learned denoiser, which also improved map quality.
Spin-orbit tomography is formulated to retrieve a two-dimensional spatial map of a single component such as single-band photometry, color difference, and a single principle component. On the other hand, multi-color photometric variation contains information on the spectra of individual surface components such as water, soil, vegetation, snow, and clouds. Rotational spectral unmixing is a blind retrieval of endmember spectra of individual surface components combined with the disentanglement of geography by spin rotation, and it has been explored using EPOXI satellite data (Cowan & Strait 2013;Lustig-Yaeger et al. 2018). However, rotational spectral unmixing suffers from non-uniqueness of the inferred endmember spectra (Fujii et al. 2017).
Recently, Kawahara (2020) proposed spin-orbit unmixing, which is a unified scheme of spin-orbit tomography and spectral unmixing that leverages non-negative matrix factorization and simplex volume minimization.
The latter technique is essentially regularization of the spectral components proposed in the remote sensing field (Craig 1994), which eliminates the ambiguity of the endmember determination. However, the regularization for geography is still a traditional Tikhonov ( 2 -norm) regularization in the current spin-orbit unmixing. In this study, we show that sparse modeling in spin-orbit unmixing improves not only the map quality but also the recovery of the endmember spectra.
The remainder of this paper is organized as follows. We review the basic formulation of spin-orbit unmixing in Section 2. In Section 3, we formulate the spin-orbit unmixing with sparsity. Furthermore, in Section 4, using a toy model, we verify our technique. In Section 5, we demonstrate the new technique by applying it to real observational data of the Earth obtained from the Deep Space Climate Observatory (DSCOVR) by Fan et al. (2019). In Section 6, we summarize our findings.

Spin-Orbit Tomography
The planet flux of reflected light from a solid surface is expressed (Appendix A in Kawahara 2020, for the derivation) as follows: where R p denotes a planet radius; a is a star-planet distance; f is the stellar flux; R(ϑ 0 , ϕ 0 , ϑ 1 , ϕ 1 ) is the bidirectional reflectance distribution function (BRDF); ϑ 0 and ϕ 0 are the incident zenith angle and azimuth angle, respectively; ϑ 1 and ϕ 1 are the reflected zenith angle and azimuth angle, respectively; e O , e S , and e R denote the unit vector from the center of the planet to the observer, primary star, and surface, respectively; and S I ∩ S V denotes the overlapped area of the illuminated region S I ( e S · e R > 0 ) and the visible region S V (e O · e R > 0). Assuming isotropic reflection (Lambertian), the local reflectivity on the planet surface is expressed as a function of the spherical coordinate: m(θ, φ) := R(ϑ 0 , ϕ 0 , ϑ 1 , ϕ 1 ). (2) Then, we obtain where The discretization of t and (θ, φ) yields In this study, we pixelized the planet surface using Hierarchical Equal Area isoLatitude Pixelization (HEALPix) (Górski et al. 2005). Then, Equation (5) is then written in matrix form as follows: where d i := f ref p (t i ); m j := m(θ j , φ j ); W ij := G(t i , θ j , φ j )∆S (i = 1, . . . , N i , j = 1, . . . , N j ); N i and N j are the number of observation data and pixels of the surface, respectively. We infer the surface distribution of planet m est from the observation data d obs . In reality, we must include an observation error ε in the model: Spin-orbit tomography infers m from d obs . In general, W is a function of an orbital inclination i, an orbital phase at Equinox Θ eq , and a planet spin vector or equivalently a set of planet obliquity ζ. Additionally, the planet rotation period should be inferred.
These nonlinear parameters can be inferred within the framework of spin-orbit tomography (Kawahara & Fujii 2010;Schwartz et al. 2016;Farr et al. 2018;Kawahara & Masuda 2020). In this study, we assume that these nonlinear parameters are known. This type of optimization problem then becomes a linear inverse problem. In general, the linear inverse problem requires regularization of the model to suppress the model instability due to noise, namely, to avoid overfitting or overlearning. Regularization can take a variety of formulations such as a bounded model (Kawahara & Fujii 2010), a non-negative condition (Kawahara 2020), regularization term (Kawahara & Fujii 2011;Aizawa et al. 2020), and neural net (Asensio Ramos & Pallé 2021). Using the regularization term, the inverse problem can be formulated as an optimization problem: Kawahara & Fujii (2011) used Tikhonov regularization as follows: where λ Tik denotes the regularization parameter. Tikhonov regularization tends to provide a smoother solution than the other regularization methods. To improve the sharpness of the inferred map, Aizawa et al. (2020) introduced a combination of 1 -norm regularization and Total Squared Variation (TSV) regularization (Kuramochi et al. 2018) as a sparse modeling as follows: where λ 1 and λ TSV denote the regularization parameters, · 1 and · TSV denote 1 -norm and TSV norm, respectively. Spin-orbit tomography can retrieve a map of a single component of a light curve. For instance, Kawahara & Fujii (2011) used the color difference between the near-infrared and visible bands to reduce the cloud contribution. The retrieved map accurately represented the land and ocean distribution. Fan et al. (2019) used a second principle component of the PCA, which also recovered the land/ocean distribution from the real data of Earth obtained via DSCOVR. However, these choices use prior knowledge of the surface of the planet.
To infer map and surface components, Cowan & Strait (2013) applied spectral unmixing to the longitudinal mapping of Earth. However, the retrieved map did not match the real land and ocean distribution. Kawahara (2020) extended their method to a two-dimensional mapping, corresponding to a combination of spectral unmixing and spin-orbit tomography, namely, spin-orbit unmixing. They also introduced regularization terms for geography and spectra. The detail of spectral unmixing is presented in Appendix A.

Spin-Orbit Unmixing
We explain spin-orbit unmixing. Spin-orbit unmixing uses the observation data matrix D ∈ R Ni×N l along the time and wavelength axes. Let M ∈ R Nj ×N l be the surface distribution defined as M jl := m(θ j , φ j , λ l ), and let A ∈ R Nj ×N k and X ∈ R N k ×N l be the surface distribution matrix and endmember matrix, respectively. Then, the formulation of the spin-orbit unmixing is as follows: As an optimization problem, the spin-orbit unmixing can be expressed as follows: where R(A, X) denotes a regularization term and · F denotes the Frobenius norm, which can be defined as follows: Kawahara (2020) used Tikhonov regularization for A and determinant type of the volume regularization for X as follows: where λ A and λ X denote the regularization parameters. Volume regularization is based on the concept of simplex volume minimization, which was developed in the field of remote sensing (Craig 1994;Fu et al. 2015;Lin et al. 2015;Fu et al. 2019;Ang & Gillis 2019). With respect to spectral unmixing, it is known that non-negative matrix factorization with the volume regularization term accurately reproduces high-resolution spectrum components from satellite data. Simplex volume minimization is justified under the assumption that the data points are widely spread in the convex hull defined by the endmembers. The true endmembers are then identified by the data-enclosing simplex, whose volume is minimized (Craig 1994). An intuitive explanation is provided in Figure 1 in Lin et al. (2015) and Figure  1 in Kawahara (2020). The choice of det (XX ) as a regularization term is based on the following fact: the volume of an (N l − 1)-simplex (or convex hull) in (N l − 1)-dimensional space with vertices {x 1 , x 2 , . . . , x N l } is det(XX )/(N l !), where x k denotes the k-th endmember (i.e., the k-th column vector) of X .

SPIN-ORBIT UNMIXING WITH SPARSITY
In this study, we introduce the sparsity of geography A into spin-orbit unmixing. Most of the elements are zero in a sparse matrix. Sparse optimization is described in Appendix C. The objective function in this study is expressed as follows: As a regularization of A, we consider two types of sparse modeling: 1 -norm+Total Square Variation (TSV) and trace norm regularization. We also computed Tikhonov regularization for comparison purposes. In Equation (15), we must optimize A and X. Based on Kawahara (2020), we used the block coordinate descent method, which divides the problem into two separate optimizations for A and X (for example, Kim et al. 2014), where q A = q A (A) and q X = q X (X) are defined by rearranging Equation (16) as and these optimizations are iteratively conducted until convergence.

Spin-Orbit Unmixing with 1 -Norm and TSV Regularization
In the context of spin-orbit tomography, Aizawa et al. (2020) described a procedure for obtaining a sparse solution using the 1 -norm and TSV regularization. We extend the optimization method to spin-orbit unmixing. The optimization problem with the 1 -norm and TSV regularization can be expressed as follows: where a k denotes the k-th column vector of A, and λ 1 , λ TSV , and λ X denote the regularization parameters. We used the determinant type of volume regularization for X as introduced in Section 2.2. First, we solve the optimization problem (21) for A. Q 1+TSV is rearranged with respect to a k as follows: where x k is the k-th column vector of X , p A := (1/ x k 2 2 )∆x k , and ∆ is a matrix defined as ∆ il := D il − j s =k W ij A js X sl . Therefore, the subproblem for solving the optimization problem (21) is expressed as follows: where λ 1 := λ 1 / x k 2 2 and λ TSV := λ TSV / x k 2 2 . Then, we suppose where δ + is defined in Equation (B28). It should be noted that f 1 +TSV (a k ) is differentiable, and ψ 1+TSV (a k ) is non-differentiable. By adopting the above equation, Equation (24) is rewritten as follows: This objective function consists of a combination of two proper convex functions (Definition 3 in Appendix F) that are differentiable and not necessarily differentiable. This type of objective function can be optimized using the proximal gradient method, as explained in Appendix B.2.
The gradient of f 1+TSV (a k ) and proximal operator of ψ 1 +TSV (a k ) can be obtained as follows: whereÑ is defined in Equation (C43), γ > 0 is the parameter indicating the step size, 1 = (1, 1, . . . , 1) denotes a vector such that all entries are one (one vector), and max denotes the element-wise maximum. Hence, the update formula of the proximal gradient method for (24) can be expressed as follows: In this study, we used Monotone FISTA (MFISTA) (Beck & Teboulle 2009a) to solve the subproblem for a k (24). MFISTA deals with the non-monotonic decreasing nature of Fast Iterative Shrinkage-Thresholding Algorithm (FISTA) (Beck & Teboulle 2009b), and Aizawa et al. (2020) used it to solve spin-orbit tomography with the 1 -norm and TSV regularization.
Next, we consider the optimization problem (21) for X. The objective function Q 1+TSV can be rearranged with respect to x k as follows: where L X := W a k 2 2 I, D X := λ X det(X kX k )(I − X k (X kX k ) −1X k ), l X := ∆ W a k , I is an identity matrix, andX k is a submatrix of X with the k-th row removed. The subproblem for solving the optimization problem (21) is expressed as follows: Subproblem (34) is an optimization problem with a non-negative constraint for a differentiable objective function. This problem can be optimized using the proximal gradient method as explained in Appendix B.2. The derivative with respect to x k can be obtained as follows: Then, the update formula of the proximal gradient method for (34) is written as follows: Kawahara (2020) optimized the subproblem (34) using FISTA, which is a proximal gradient method using Nesterov's acceleration method (Nesterov 2003) with the restart method. Nesterov's acceleration method increased the convergence speed. However, the function value did not necessarily decrease monotonically. The restart method (O'donoghue & Candes 2015) avoids increasing the objective function by restarting Nesterov's acceleration method when the function value does not decrease.
Consequently, the optimization algorithm for solving (21) is summarized as follows: Algorithm 1 Spin-orbit unmixing with 1 -norm and TSV regularization Initialization: non-negative matrices A0 and X0 for n in (1, Ntry) do for k in (1, N k ) do Update x k using FISTA with the restart method Update a k using MFISTA end for end for

Spin-Orbit Unmixing with Trace Norm Regularization
We also consider another type of sparse modeling, termed as trace norm regularization. Trace norm regularization is sparse modeling of a matrix, not a vector. It results in a low-rank matrix. Therefore, vectorization of A is not required as opposed to that in the 1 -norm and TSV regularization. The optimization problem using trace norm regularization for A is expressed as follows: where · Tr denotes trace norm defined in Equation (C52), and λ A and λ X denote regularization parameters.
The subproblem for solving the optimization problem is expressed as follows: For trace norm regularization, we do not impose a non-negative condition on A because it is too difficult to implement. Additionally, we used the determinant type of volume regularization for X, same as Section 3.1. Thus, the subproblem (41) is the same as the subproblem (34).
We solved the subproblem for x k (41) using the same scheme as that in Section 3.1. Let us consider the subproblem of A. Given that f Tr (A) := (1/2) D − W AX 2 F is differentiable and ψ Tr (A) := λ A A Tr is non-differentiable, it can be solved using the proximal gradient method as explained in Appendix B.2. The derivative of f Tr (A) and proximal operator of ψ Tr (A) are as follows: where W = U ΣV denotes the singular value decomposition of W (Lemma 8.4 in Tomioka 2015). Hence, the update formula of the proximal gradient method for (40) can be expressed as follows: Hence, the optimization algorithm for solving (39) can be summarized as follows: Algorithm 2 Spin-orbit unmixing with trace norm regularization Initialization: non-negative matrices A0 and X0 for n in (1, Ntry) do for k in (1, N k ) do Update x k using FISTA with the restart method end for Update A using FISTA with the restart method end for The code for optimization is publicly available 1 .

TESTING USING A CLOUDLESS MODEL
In this section, we describe the testing of our method using a cloudless Earth model. To compare with the results obtained using sparse modeling, we also consider spin-orbit unmixing with Tikhonov regularization and the determinant type of volume regularization (Kawahara 2020). The optimization algorithm for the latter is provided in Appendix D.

Cloudless Earth
First, we generate mock multi-color light curves as follows: where A true ∈ R Nj,true×N k,true and X true ∈ R N k,true ×N l denote the input geography and surface spectra, respectively. The time interval is one year divided by N i = 512. Furthermore, N j,true = 12288, N k,true = 3, and N l = 10 denote the number of pixels on the planet surface, surface components, and observing bands, respectively. We considered vegetation, land, and ocean as the endmembers of the surface components. Error matrix E∈ R Ni×N l is randomly generated as follows: where N (0, 1) is a normal distribution with a mean µ = 0 and standard deviation σ = 1. We used the classification map provided by the Moderate Resolution Imaging Spectroradiometer (MODIS) as the input surface distribution A true , ASTER spectral library (Baldridge et al. 2009) for the spectra of vegetation and land, and those produced by McLinden et al. (1997) for the ocean. We set the observation wavelength to 0.425 + 0.05(l − 1) µm (l = 1, . . . , N l = 10). A true and X true are shown in Figure 1. We also generated W using the method described in Section 2.1 with orbital inclination i = 45 • , orbital phase angle at the vernal equinox Θ eq = 90 • , obliquity ζ = 23.4 • , orbital period P orb = 365 days, and rotation period P spin = 23.9344699/24.0 days.
Based on D and W , we infer A ∈ R Nj ×N k and X ∈ R N k ×N l by solving the following optimization problem: where R(A, X) takes the form in Equation (22) ( 1 -norm and TSV regularization), (39) (trace norm regularization), or (14) (Tikhonov regularization). We set the number of pixels in the inferred map to N j = 3024 and number of endmembers to N k = 3.
There is a known indefiniteness of matrix factorization, as explained below.
The surface distribution at each wavelength M ∈ R Nj ×N l can be written as: where a k and x k (k = 1, . . . , N k ) denote the k-th column vectors of A and X , respectively. By using the constant, c k , we obtain the following.
This implies that the inferred surface distribution and spectrum have an indefiniteness of constant multiples. Hence, we normalize the inferred surface distribution and spectrum for X as follows: where x k,true (k = 1, . . . , N k,true = N k ) is the k-th column vector of X true . Furthermore, x k and x k,true denote the means of x k and x k,true , respectively.
The normalized surface distributionÂ and spectraX are shown in Figure 2. The inferred map accurately reproduces the structure of the input. Notably, the inferred spectra are in excellent agreement with the inputs, which is much better than the case of spin-orbit unmixing with Tikhonov regularization (Kawahara 2020).
We used the same evaluation measures as those in Appendix E to select the regularization parameters λ A and λ X . Furthermore, we adopted λ A and λ X with a local minimum value of MRSA similar to that in Appendix E.
In the same manner as in Section 4.2, the normalized surface distributionÂ and spectraX are shown in Figure 3.

Comparison by Varying The Regularization Term
We compare the inferred solutions with 1 -norm and TSV regularization, trace norm regularization, and Tikhonov regularization (Figure 4). Figure 4(d)(e)(f) are the inferred surface distributions by spin-orbit unmixing with 1 -norm and TSV regularization. We can see less noise in the area with zero values and a more continuous surface with values in each endmember than the case with other regularizations. The sparseness and continuousness of the inferred map are induced by 1 -norm and TSV norm regularization, respectively.
In general, it is difficult to retrieve the geography that contributes less to the data such as vegetation in Oceania and land in South Africa       whichever regularization we use. However, the land distribution in Chile inferred with the 1 -norm and TSV regularization seems consistent with the input map while one with Tikhonov norm regularization is equivalent to the noise. Additionally, the inferred maps obtained using spin-orbit unmixing with 1 -norm and TSV regularization (Figure 4(d)(e)(f)) are smoother than that with Tikhonov regularization (Figure 4(j)(k)(l)). This could be because of the differences in properties between TSV and Tikhonov regularizations, both of which tend to exhibit smooth solutions. TSV regularization induces a smooth map by minimizing the square of the difference between the values of neighboring pixels (described in Appendix C.2), while Tikhonov regularization induces a smooth solution by preventing overfitting. Furthermore, the choice of regularization parameters and the non-negative condition can affect the smoothness and noise of the inferred map. The results can be compared using various regularizations or constraints in future studies.
In the inference of the spectra (Figure 4(m)(n)(o)), we used simplex volume regularization in all cases. Although regularization terms for the spectra are the same, the inferred spectra are affected by the change in the regularization term for the surface distribution. Specifically, focusing on the ranges of 0.425-0.575 µm and 0.725-0.875 µm, the one using 1 -norm and TSV regularization (Figure 4(m)) is the closest to the input spectrum. These results indicate that the surface distribution and spectrum inferred by spin-orbit unmixing with 1 -norm and TSV regularization are superior to other regularizations.
Let us also note the solution inferred by spin-orbit unmixing with trace norm.
The inferred surface distributions (Figure 4(g)(h)(i)) capture the features of continuous surfaces such as continents, but pixels with small values are noisier than those inferred with 1 -norm and TSV regularization. This may be due to the similarity of the vectors corresponding to the surface distributions of the endmembers as a result of the low-rank matrix induced by using trace norm regularization.
In this study, we employed only trace norm regularization in the inference. We can consider adding other regularizations or constraints, especially the non-negative condition. The non-negative condition leads to a map in which large parts are zero, as shown in Section 4.2 (spin-orbit unmixing with 1 -norm and TSV regularization), Kawahara (2020) (spin-orbit unmixing with Tikhonov regularization), and Kawahara & Fujii (2010) (spin-orbit tomography with Bounded Variable Least-Squares Solver including the non-negative condition). Therefore, by including the non-negative condition in spin-orbit unmixing with trace norm, we expect improvement in inferences of geography and spectrum. The application of trace norm regularization is subject to future investigation.

APPLICATION TO REAL OBSERVED DATA
In this section, we apply our method with 1 -norm and TSV regularization to real long-monitoring data of Earth as observed by DSCOVR/Earth Polychromatic Imaging Camera (EPIC) (Jiang et al. 2018). Since 2015, DSCOVR has been continuously observing the dayside of the Earth from the first Sun-Earth Lagrangian point (L1). Given that Earth's rotation axis is tilted relative to its orbit, although it is not the same as that in the case of direct imaging, the observed data contain two-dimensional information about the planet surface. This allowed us to perform two-dimensional mapping (Fan et al. 2019). Kawahara (2020) inferred the surface distribution and unmixed spectra by applying spin-orbit unmixing with Tikhonov regularization to the DSCOVR data. In our experiment, we applied spin-orbit unmixing with the 1 -norm and TSV norm described in Section 3.1 to the DSCOVR data. Following the same setup as that in Kawahara (2020) We selected the regularization parameters using the procedure described in Appendix E, same as Section 4.2. However, it is not possible to calculate MRSA and CPR for actual exoplanet observations because the true surface distribution and reflection spectra are unknown. Therefore, only the mean residual and the normalized volume of the spectrum were used to determine the optimal parameters. Figure 5 shows the mean residuals and normalized spectral volumes calculated by varying one of λ 1 , λ TSV , and λ X . We selected λ 1 = 10 −2 , λ TSV = 10 −2 , and λ X = 10 −7 as optimal values because the mean residual significantly increased at a range higher than these values, and the value of normalized spectral volume was sufficiently small (∼ 10 −20 ) at these values. As shown in Figure 6, as the calculation proceeds, the mean residual increases at a point. Therefore, we used the inferred solution at the epoch where the mean residual is minimal. Figure 7 shows the inferred spectra, color composite maps, and the map that excludes Component 0 using the aforementioned procedure (we assume that N k = 4). In the unmixed spectra, the strong oxygen B and A absorption features were observed at 0.688 and 0.764 µm, respectively. As shown in Figure 7(b), we obtained the distribution of real clouds in the mid-latitudes of the Southern Hemisphere, depicted by Component 0. In addition, there are few in the vicinity of the Sahara Desert. Therefore, we can interpret that Component 0 corresponds to the cloud. Kawahara (2020) using Tikhonov regularization resulted in patchy cloud distributions (Figure 8) 2 that were inconsistent with the real distribution. In contrast, we were able to obtain more continuous maps of cloud than that from Tikhonov regularization. Note that real clouds are also distributed in the mid-latitude zone of the Northern Hemisphere (described in Figure 6 in Kawahara & Masuda (2020)), but we were unable to retrieve the distribution. This may be due to the degeneracy with the continents. The map that excludes Component 0 (Figure 7(c)) accurately resembles the real continental distribution. The structure of South America and Australia, depicted by Components 2 and 3, is consistent with Aizawa et al. (2020) (single band mapping using sparse modeling). Component 1 depicts the geography of the ocean. The spectrum of Component 1 also reasonably reproduces that of the ocean. Hence, we can interpret that Component 1 corresponds to the ocean.
On the other hand, Components 2 and 3 seem to be degenerate. However, we can see that the geographical features of North America and Australia are depicted by Component 2, and that of the Sahara Desert and Chile by Component 3. Furthermore, South America and Eurasia are depicted by both Components 2 and 3. This is probably because these continents contain both soil and vegetation on the surface. For unmixed spectra, Component 2 exhibits larger values at 0.688 and 0.764 µm than Component 3, but smaller for 0.779 µm, that is, Component 3 appears redder than Component 2. Additionally, the increase at 0.688 and 0.764 µm might be interpreted that the spectrum of Component 2 captures the red edge of vegetation although the strong oxygen absorption bands make the interpretation difficult. Thus, we can interpret that Component 2 corresponds to vegetation and Component 3 to soil or sands.
Let us also note the South Pole should be depicted as ice, namely Component 0, but it was depicted as Component 2 in Figure 7 and not visible in Figure 8. This inconsistency may be due to the low observational weights on the poles of DSCOVR (presented as Figure 3(b) in Aizawa et al. (2020)). When compared with the inferred map obtained using Tikhonov regularization in Kawahara (2020) (Figure 8), the continents are better separated from each other in the map obtained using 1 -norm and TSV regularization, especially for the Arabian Sea and North Atlantic Ocean.

CONCLUSION
In this study, we introduced sparse modeling ( 1 -norm and TSV regularization) to spin-orbit unmixing for the global mapping of planetary surfaces. For this purpose, we combined and improved the methods proposed by Aizawa et al. (2020) and Kawahara (2020), and modified the method proposed by Fan et al. (2019). Test calculations on a cloudless toy model of the Earth yielded surface distributions with sparsity and continuity. The inferred unmixed spectra were closer to the input model than those inferred by Kawahara (2020). Applying our method to real observation data of the Earth obtained by DSCOVR, we also found that the surface distributions and spectra were reasonably recovered by the current method. We concluded that sparse modeling provides better inferences of the surface distribution and unmixed spectra than the method based on Tikhonov regularization.
This study can be extended in several ways. In this study, we focused on the 1 -norm and TSV regularization, which prefers sparsity and continuity. However, other choices of regularizations for surface distributions and spectrum can be considered. Furthermore, another type of sparse modeling based on matrices was proposed in previous studies (e.g., Candès et al. 2011), and different types of volume regularization in remote sensing can also be used (e.g., Ang & Gillis 2019). Additionally, we assumed the surface distribution of the endmember as static, but we should also consider the dynamical motion of surfaces, especially for clouds. Recently, Kawahara & Masuda (2020) developed dynamic spin-orbit tomography to retrieve the geometry and surface maps in a single band using Tikhonov regularization, and we can extend their method based on sparse modeling. Ultimately, we might be able to combine dynamical mapping (Kawahara & Masuda 2020) and spectral unmixing (Kawahara 2020) into dynamic spin-orbit unmixing to solve the dynamical motions of planetary surfaces. These issues will be addressed in future research.
The authors are indebted to the DSCOVR team for making the data publicly available. We are grateful to Siteng Fan and Yuk L. Yung for providing the processed light curves and their geometric kernel from the DSCOVR dataset. We are also grateful to Kento Masuda and Shiro Ikeda for insightful discussions. We would also like to thank the anonymous reviewer for an attentive reading and fruitful suggestions. This study was supported by JSPS KAKENHI Grant No. JP18H04577, JP18H01247, JP20H00170, JP21H04998 (H. K. ), JP22000005, JP15H02063, and JP18H05442 (M. T. ). A. K. was also supported by JST SPRING, Grant Number JPMJSP2108. This study was supported by the JSPS Core-to-Core Program Planet 2 and SATELLITE Research from the Astrobiology Center (H. K. ).

A. SPECTRAL UNMIXING
In this section, we review spectral unmixing, which originates from remote sensing techniques. First, we consider hyperspectral images, which are targets of spectral unmixing. A hyperspectral image has dimensions in the spatial direction and also in the wavelength direction (Bioucas-Dias et al. 2013). Each pixel in the image contains multiple components (e.g., vegetation, land, and ocean). While we term it a pure pixel that a pixel contains only a single component, we term it a mixel that contains multiple components due to the observational resolution, and each component is termed an endmember. Decomposition of the observed image into the spectra of the endmembers and their abundance is termed as spectral unmixing.
We now consider spectral unmixing with non-negative matrix factorization (NMF) (Paatero & Tapper 1994). Let N j , N k , and N l denote the number of pixels of the image, endmembers, and wavelengths of observation, respectively. The hyperspectral image obtained from the observation is M ∈ R Nj ×N l , where M jl := m(r j , λ l ) represents the observational data at position r j and wavelength λ l . The linear mixing model is expressed as follows: where A ∈ R Nj ×N k denotes the surface distribution matrix, and X ∈ R N k ×N l denotes the endmember matrix. Here, a k , which denotes the k-th column vector of A, denotes the surface distribution of the k-th endmember; namely, an abundance of A jk = a k (r j ). x k , which denotes the k-th column vector of X , denotes the reflection spectrum of the k-th endmember X kl = x k (λ l ). The problem that is described by matrices, such as spectral unmixing, is expressed as optimization problem: where · F denotes the Frobenius norm defined as The optimization problem of adding the constraint that the entries of each matrix are non-negative is NMF.
On the other hand, we can generate matrices A and X using the regular matrix P ∈ R N k ×N k in (A1): Thus, in general, A and X that satisfy (A1) are not unique. Additionally, NMF is known to be NP-hard, and thus, it is difficult to determine the optimal solution. To address the above problems, it is necessary to add appropriate constraints or regularization terms.
where R(A, X) denotes the regularization term. With respect to NMF, using simplex volume minimization as a regularization term can reproduce the high-resolution spectrum components (Craig 1994;Lin et al. 2015;Fu et al. 2019;Ang & Gillis 2019). In the simplex volume regularization, det(XX )/(N l !), the volume of an (N l − 1)-simplex in (N l − 1)-dimensional space with vertices{x 1 , x 2 , . . . , x N l }, is used as the regularization term.
where λ X denotes the regularization parameter.

B. OPTIMIZATION OF A NON-DIFFERENTIABLE FUNCTION
Let f : R n → R be a function. The optimization problem of obtaining a solution x ∈ R n that minimizes f under constraint x ∈ S ⊂ R n is expressed as follows: minimize x∈R n f (x) subject to x ∈ S (constrained problem). (B10) Function f is termed the objective function. Specifically, when S = R n , the optimization problem can be re-expressed as follows: In the following, we consider the objective function f to be a convex function (Definition 1). Optimization problems for convex functions are extensively studied due to their tractable properties (for example, Rockafellar 1970). If f is a differentiable function, then the update formula for the gradient descent method, which is the simplest update method, can be provided as follows: where γ > 0 denotes the parameter indicating the step size.

B.1. Proximal Point Algorithm
If f is a non-differentiable function, then we cannot use the gradient descent method because ∇f does not exist. An algorithm used to solve this problem is the proximal point algorithm (Kanamori et al. 2016). Let ψ be a closed proper convex function (Definition 3, 8) that is not necessarily differentiable. We consider the following optimization problem: The update formula for the proximal point algorithm is expressed as where prox(x | γψ) denotes the proximal operator of ψ (Definition 18), and its value is unique for any x (Proposition 19). Now, we consider the following function for ψ: Subsequently, the function is expressed as where * is the conjugate function (Definition 9). Given that γψ(y) + (1/2) y 2 2 is a 1-strongly convex function (Theorem 6), its conjugate function is a 1-smooth function (Theorem 12). Hence, γψ(y) + (1/2) y 2 2 * is differentiable and ψ γ (x) is differentiable. We term ψ γ (x) the Moreau envelope of ψ, which smoothens ψ. The gradient of ψ γ (x) is Hence, the proximal point algorithm smoothens the objective function prior to applying the gradient descent method.

B.2. Proximal Gradient Method
Let f be a proper convex function that is differentiable and let ψ be a proper convex function that is not necessarily differentiable. We consider the following optimization problem: An algorithm used to solve this problem is the proximal gradient method (Kanamori et al. 2016). It first updates using the gradient descent method for f and then updates using the proximal point algorithm for ψ. These are summarized as Next, we consider the optimization problem for a differentiable proper convex function f with non-negative constraints as follows: The problem is equivalent to where δ + denotes the indicator function of a non-negative set defined as where x ≥ 0 denotes x ∈ {x ∈ R n | x j ≥ 0 (j = 1, . . . , n)}. Given that δ + is a closed proper convex function (Proposition 17), the proximal operator of δ + can be defined.
prox(x | γδ + ) = argmin where max denotes the element-wise maximum. The update fomula of the proximal gradient method for problem (B26) is written as: Sparse modeling is a technique that extracts and analyzes low-dimensional information to explain high-dimensional data. We consider a method to infer the sparse solution to optimization problems for a differentiable proper convex function f : The sparsity of a solution implies that most of its elements are zero. We then infer with the constraint to reduce the number of non-zero elements. A straightforward method to infer a sparse solution involves solving the following optimization problem: where x 0 := #{j | x j = 0} denotes the 0 -norm of x, and C denotes a parameter. However, solving (C34) incurs a huge computational cost because function values must be calculated continuously while changing the value of x 0 . A method of optimization using the 1 -norm instead of the 0 -norm is proposed as follows (Tibshirani 1996): where x 1 := n i=1 |x i | denotes the 1 -norm. A sparse solution can be inferred by using this method. The 1 -norm is an approximation of the 0 -norm because x 1 is a convex hull of x 0 in [−1, 1] n . Moreover, (C35) is equivalent to where λ denotes a parameter (Tomioka 2015). We can solve the problem (C36) using the proximal gradient method because x 1 is a non-differentiable proper convex function.

C.2. Total Variation and Total Squared Variation
Another example of sparse modeling is the Total Variation (TV) regularization defined as where λ denotes the regularization parameter. We then define matrix N to represent adjacent pixels: The TV regularization term is written using N as: TV regularization minimizes the difference between the values of neighboring pixels. This is expected to smooth the values of the neighboring pixels and reduce noise in the solution. Moreover, the Total Squared Variation (TSV) (Kuramochi et al. 2018) is expressed as an extension of TV regularization: (C40) TSV regularization allows us to infer solutions with smooth boundaries in addition to the effects of TV regularization.
To consider the discretized distribution using HEALPix, the TSV regularization term can be re-expressed as:

C.3. Trace Norm Regularization
A type of sparse modeling that uses the structure of matrices is trace norm regularization. First, the singular value decomposition of matrix X is written as X = U ΣV , where U ∈ R m×r and V ∈ R n×r are orthogonal matrices, Σ ∈ R r×r is a diagonal matrix, and r = min{m, n}. When Σ = diag (σ 1 (X), . . . , σ r (X)) (σ 1 (X) > σ 2 (X) > · · · > σ r (X)), σ j (X) (j = 1, . . . , r) is termed as the j-th singular value. We define the trace norm of the matrix X as follows: Let σ(X) be a singular value vector of X, defined as (σ(X)) j := σ j (X); then, the trace norm of X is re-expressed as: Hence, we can infer the matrix with a sparse singular value vector by solving the optimization problem with the trace norm as the regularization term. minimize where λ denotes the regularization term. The number of non-zero singular values of a matrix X is equal to the rank of X, and thus trace norm regularization allows us to infer a low-rank matrix.

D. SPIN-ORBIT UNMIXING WITH TIKHONOV REGULARIZATION
For comparison with sparce modeling, we also consider spin-orbit unmixing with Tikhonov regularization for geography A and volume regularization for X, By rewriting Q Tik as the quadratic form of a k , the k-th column vector of A, we obtain where L A := x k x k W W , T A := λ A I, l A := W ∆x k , and ∆ denotes a matrix defined as ∆ il := D il − j s =k W ij A js X sl . Furthermore, from Q Tik to the quadratic form of x k , the k-th column vector of X , we obtain where L X := W a k 2 2 I, D X := λ X det(X kX k )(I −X k (X kX k ) −1X k ), l X := ∆ W a k , I denotes an identity matrix, andX k denotes a submatrix of X with the k-th row removed. The subproblem to solve the optimization problem (D55) is expressed as minimize a k q A,Tik subject to (a k ) j ≥ 0 (j = 1, . . . , N j ), (D59) minimize x k q X,Tik subject to (x k ) l ≥ 0 (l = 1, . . . , N j ), (D60) q X,Tik := 1 2 Given that (D59) and (D60) are optimization problems with non-negative constraints for differentiable objective functions (B26), they can be solved then using the proximal gradient method. In general, for the following optimization problem: we have ∇q(w) = Lw − l; thus, the update formula of the proximal gradient method for (D63) is written as Kawahara (2020) performed optimization by solving the subproblems (D59) and (D60) at each k. The Fast Iterative Shrinkage-Thresholding Algorithm (FISTA) (Beck & Teboulle 2009b), which is a proximal gradient method using Nesterov's acceleration method (Nesterov 2003), is used to solve each subproblem. Nesterov's acceleration method increases the convergence speed although the function value does not necessarily decrease monotonically. A method to solve this involves using the restart method (O'donoghue & Candes 2015), which restarts Nesterov's acceleration method when the function value does not decrease. Hence, the optimization algorithm for solving (D55) is expressed as follows: Algorithm 3 Spin-Orbit unmixing with Tikhonov regularization Initialization: non-negative matrices A0 and X0 for n in (1, Ntry) do for k in (1, N k ) do Update x k using FISTA with the restart method Update a k using FISTA with the restart method end for end for

E. EVALUATION OF INFERRED SOLUTIONS
We present the evaluation measures used as criteria to select parameters λ 1 , λ TSV , and λ X in the optimization problem (21). The first is the mean residual.
which denotes the difference between the inferred model and observed data. The first row in Figure 9 shows mean residuals for each parameter. The second row of Figure 9 shows det(X X ), which corresponds to the volume of a simplex with each column vector of the normalized spectrum X , where X is defined as: Given that the mean residuals tend to increase as det(X X ) decreases from Figure 9, a trade-off exists between these two values in spin-orbit unmixing with the 1 -norm and TSV regularization. This relationship is also observed in spin-orbit unmixing with Tikhonov regularization (Kawahara 2020). The Mean-Removed Spectral Angle (MRSA) is defined as an evaluation measure that directly compares the inferred spectra with the true values: Because π MRSA(x, y) is the angle between (x − x1) / x − x1 2 and (y − y1) / y − y1 2 , we obtain that 0 ≤ π MRSA(x, y) ≤ π. Thus we get Specifically, if x = y, MRSA(x, y) = 0. We use the average of the inferred spectra at each endmember then compared with the true spectra as the evaluation measure of the model: The third row of Figure 9 shows MRSA for each parameter. Furthermore, we consider the Correct Pixel Rate (CPR) as an evaluation measure that directly compares the surface distribution with the true value. The classification of the j-th pixel of the inferred surface distribution is CPR 1 = 10 3.5 (fix) X = 10 2 X = 10 0 X = 10 2 10 5 10 4 10 3 10 2 TSV TSV = 10 4 (fix) X = 10 2 X = 10 0 X = 10 2 10 4 10 3.5 10 3 10 2.5 1 Figure 9. Evaluation measures for selecting regularization parameters in spin-orbit unmixing with the 1-norm and TSV regularization. The first row indicates mean residuals, second row indicates det(X X ), third row indicates MRSA, and fourth row indicates CPR. λX = 10 2 , λ 1 = 10 −3.5 , and λTSV = 10 −4 are fixed in the first, second, and third columns, respectively, and their values are calculated by changing the other parameters.
namely, c j (A) = k denotes the assignment of the j-th pixel of the surface distribution to the k-th end component. We then define CPR as The fourth row of Figure 9 shows CPR for each parameter. We select a regularization parameter that resulted in smaller mean residuals, det(X X ) and MRSA, and a larger CPR. Given the large rate of change in the value of MRSA, as shown in Figure 9, we measured the value of each evaluation measure by changing the parameter around the local minimum value of MRSA. Furthermore, λ 1 = 10 −3.5 , λ TSV = 10 −4 , and λ X = 10 2 when MRSA exhibits a local minimum. At these points, the mean residual and det(X X ) values are sufficiently small, and the CPR is sufficiently large. Thus, we adopt λ 1 = 10 −3.5 , λ TSV = 10 −4 , and λ X = 10 2 under the assumption that the rating scale changes independently for each regularization parameter. Figure 10 is same as Figure 9 but except with trace norm. We adopt λ A = 10 2.1 and λ X = 10 3.3 .

F. THEORY OF CONVEX OPTIMIZATION
In this section, we review the theory of convex optimization, which is the basis of spin-orbit unmixing.
(i) f is a µ-strongly convex function.
Proof. For any x, y ∈ dom f and any α ∈ [0, 1], we havẽ by Definition 4, we obtainf Therefore,f is convex. Furthermore, by the definition off , we obtain the following: Hence,f is proper convex.
(i)⇐(ii) Supposef is proper convex. Sincẽ by Definition 1, we obtain the following: which is equivalent to Definition 4. Hence, f is µ-strongly convex.
Proposition 7. Let f : R n → R ∪ {+∞} be a convex function. The following statements are true.
(i) If g is a convex function, then f + g is a convex function.
(ii) If g is a µ-strongly convex function and dom f ∩ dom g = ∅, then f + g is µ-strongly convex.
Hence, f + g is convex.
(ii) Suppose g is µ-strongly convex. The functiong(x) := g(x)−(µ/2) x 2 2 is proper convex according to Theorem 6. Given that dom f ∩ dom g = ∅, f +g is proper convex based on (i) of Proposition 7 and Definition 3. Moreover, we obtain the following: Hence, f + g is µ-strongly convex by Theorem 6.
Kuwata et al.
The conjugate function f * : R n → R ∪ {+∞} of f is defined as follows: f * (x) := sup y∈R n x y − f (y) .
Definition 11 (Smooth function, Kanamori et al. (2016, p. 14)). Let f : R n → R be a differentiable function. f is said to be smooth if there exists γ > 0 such that for any x, y ∈ R d , Theorem 12 (Rockafellar & Wets (2009, p. 566 Proposition 12.60)). Let f : R n → R ∪ {+∞} be a closed proper convex function. For any µ > 0, the following statements are equivalent.
Theorem 13 (Rockafellar (1970, p. 242 Theorem 25.1)). Let f : R n → R ∪ {+∞} be a function such that dom f is an open convex set. When f is differentiable in dom f , the following statements are equivalent: (i) f is convex function.
Proposition 14. Let f : R n → R ∪ {+∞} be a differentiable closed proper convex function. The following statements are equivalent: (i) g = ∇f (x).
Corollary 15. Let f : R n → R ∪ {+∞} be a differentiable closed proper convex function. Then, we obtain the following: ∇f * (x) = argmax y∈R n x y − f (y) .

(F99)
Proposition 17 (Fukushima (2001, p. 60 Theorem 2.43)). If S is a non-empty convex set, then δ S is a proper convex function. If S is also closed, then δ S is a closed proper convex function.
When at least one of x or y is not included in S, the function value becomes ∞. This also satisfies the definition of a convex function (Definition 1). Therefore, δ S is a convex function. Furthermore, δ S is a proper convex function based on the non-emptiness of S and definition of δ S . Using Definition 8, the following statement is true: δ S is a closed function ⇔ {x ∈ R n | δ S (x) ≤ c} is a closed set for any c ∈ R (F102) Therefore, if S is a non-empty closed convex set, δ S is a closed proper convex function.
Proof. Given that function · 2 2 is a strongly convex function, function ψ(w) + (1/2) w − x 2 2 is also a strongly convex function by Proposition 7. Hence, the value of prox (x | ψ) is unique based on Theorem 5.