A comparison between gradient descent and stochastic approaches for parameter optimization of a sea ice model

Two types of optimization methods were applied to a parameter optimization problem in a coupled ocean–sea ice model of the Arctic, and applicability and efficiency of the respective methods were examined. One optimization utilizes a finite difference (FD) method based on a traditional gradient descent approach, while the other adopts a microgenetic algorithm (μGA) as an example of a stochastic approach. The optimizations were performed by minimizing a cost function composed of model–data misfit of ice concentration, ice drift velocity and ice thickness. A series of optimizations were conducted that differ in the model formulation (“smoothed code” versus standard code) with respect to the FD method and in the population size and number of possibilities with respect to the μGA method. The FD method fails to estimate optimal parameters due to the ill-shaped nature of the cost function caused by the strong non-linearity of the system, whereas the genetic algorithms can effectively estimate near optimal parameters. The results of the study indicate that the sophisticated stochastic approach (μGA) is of practical use for parameter optimization of a coupled ocean– sea ice model with a medium-sized horizontal resolution of 50 km×50 km as used in this study.


Introduction
Sea ice plays an important role in shaping the climate system in the Arctic Ocean by altering heat, momentum and material exchanges between the atmosphere and ocean (e.g., Wadhams, 2000;McPhee, 2008;Thomas and Dieckmann, 2009).Development of a sea ice model is thus of great sig-nificance not only for understanding the sea ice physics itself but also for understanding the Arctic climate system and its linkage to the global climate.Comprehensive, large-scale sea ice models have existed for more than 3 decades and have provided various insights regarding sea ice and its role in the Arctic climate system (e.g., Hibler, 1979;Hibler and Bryan, 1987;Zhang and Hibler, 1997;Hunke and Dukowicz, 1997).However, even current sea ice models differ in the simulated ice properties and also show pronounced biases compared to observations (Rothrock et al., 2003;Gerdes and Köberle, 2007;Johnson et al., 2007Johnson et al., , 2012;;Martin and Gerdes, 2007;Eisenman et al., 2007).In order to improve simulated ice properties, explorations of suitable parameterization of dynamic and thermodynamic processes of sea ice (Shine and Henderson-Sellers, 1985;Lipscomb et al., 2007) as well as parameterization regarding atmosphere-ice-ocean fluxes (Lüpkes and Birnbaum, 2005;Lu et al., 2011) are still under way.Such studies are always accompanied by sensitivity studies with respect to newly introduced parameters or an estimation of an optimal parameter set.Particularly, an estimation of an optimal parameter set relevant to respective model configurations is nontrivial work for simulating realistic sea ice fields by a model (Miller et al., 2006;Kim et al., 2006;Nguyen et al., 2011).In this study we explore suitable and effective methods for parameter optimization of coupled ocean-sea ice models, which can be applied to any kind of similar models with relatively small programming effort.
To find an optimal parameter for a parameterization of a certain physical process, early studies performed sensitivity experiments in which a single parameter was varied at a time and other parameters were fixed (Shine and Henderson-Published by Copernicus Publications on behalf of the European Geosciences Union.Sellers, 1985;Ledley 1991a, b;Holland et al., 1993).However, Chapman et al. (1994) reported an interdependency of parameter sensitivity and thus the necessity of multivariate sensitivity experiments.In order to find an optimal parameter set in multi-dimensional parameter space, Harder and Fischer (1999) and Miller et al. (2006) performed multivariate sensitivity experiments with a sea ice model.Harder and Fischer (1999) optimized atmospheric and oceanic drag coefficient and ice strength to minimize the misfit between their sea ice model and observations.Similarly, Miller et al. (2006) optimized atmospheric drag coefficient, ice strength parameter and ice albedo simultaneously.In both studies, the parameter space was discretized in a lattice form and combinations of parameters were tested.Although they successfully obtained an optimal parameter set in the three-dimensional parameter space, they had to perform more than 100 experiments even for just 3 parameters.Generally, the number of experiments required for an n-dimensional parameter space increases with the n-th power.
In recent years, more sophisticated approaches for parameter optimization were presented.Kim et al. (2006) applied an automatic differentiation (AD) technique to examine parameter sensitivity of a dynamic thermodynamic sea ice model.In their approach, analytical derivatives of the model with respect to selected parameters were obtained by AD.They used so-called "identical twin experiment" for parameter optimization to demonstrate the performance of their algorithm.It was shown that an AD-based gradient combined with a quasi-Newton search algorithm can effectively retrieve the parameters.Nguyen et al. (2011) presented another sophisticated approach for parameter optimization.They optimized sea ice and ocean model parameters as well as the initial conditions of a coupled ocean-sea ice model with a Green's function approach.Sensitivities of the model with respect to the control parameters were assumed to be linear around a baseline experiment, and then the model Green's function was calculated by perturbation experiments.They obtained a set of parameters, forcing field and initial conditions, which reduces the cost function by 45 %.Other than sea ice models, such methods for model's parameter optimizations can be found in numerous atmospheric and oceanic studies (e.g., Garcia-Gorriz et al., 2003;Menemenlis et al., 2005;Mochizuki et al., 2007;Bocquet, 2012).
Although these approaches provided effective methods to perform a multivariate parameter optimization, problems could arise if the model exhibits a nonlinear response to control parameters, resulting in a complicated shape of the cost function (Evensen and Fario, 1997;Mazzega, 2000).For instance, if the cost function has a multimodal or ill-shaped structure, results of gradient descent methods depend on the initial guess for the parameter set.Generally we cannot exclude the possibility that the cost function has local minima besides the global minimum.In such a situation, one has to perform multiple individual optimizations starting from a variety of initial parameter guesses to find the global minimum.
Another problem may arise from a micro-scale structure of the cost function, because gradient descent approaches can only be reasonably applied if the cost is a smooth function of control parameters.Unfortunately, smoothness of sea ice model's response with regard to its control parameters is not always guaranteed (as will be shown in Sect.3).
One of the possible solutions to these problems is to apply stochastic algorithms, which perform a random search in the parameter space.Stochastic algorithms, such as simulated annealing or genetic algorithms, are kinds of global optimization algorithms (GOAs) and widely applied to parameter optimization problems in other research fields such as biogeochemical modeling (e.g., Athias et al., 2000;Schartau and Oschlies, 2003;Shigemitsu et al., 2012).Advantages of stochastic approaches are their applicability to multimodal or ill-shaped functions, easy implementation and suitability for parallel computational environment.On the other hand, a serious disadvantage of these approaches is that they generally require huge computational resources as compared with gradient descent approaches for an individual search (Vallino, 2000).This may be one of the reasons why these approaches have not been applied to parameter optimizations for sea ice models coupled with ocean general circulation models (OGCMs), which usually require substantial computational resources.Actually, most of the applications of stochastic approaches for parameter optimizations are found in zero-or one-dimensional model studies (e.g., Carroll, 1996;Vallino, 2000;Athias et al., 2000;Schartau and Oschlies, 2003;Lv et al., 2009), and only a few applications for three-dimensional model studies can be found (Huret et al., 2007).
Difficulties arising from a large computational burden can be overcome by combining parallel processing and a lowcomputational-cost stochastic approach.Athias et al. (2000) examined the efficiency of parameter optimization by 3 types of GOAs.They reported that a micro-genetic algorithm (µGA) can more efficiently reach a near-optimal solution than other GOAs like simulated annealing (Kirkpatrick et al., 1983) and TRUST (Cetin et al., 1993).The genetic algorithm (GA) is a quasi-stochastic search algorithm to find an optimal solution based on the natural selection of living things (Holland, 1975;Goldberg, 1989).The µGA is a realization with a small computational load by taking advantage of very small population size (Krishnakumar, 1989;Kim et al., 2002).It has been successfully applied to a variety of optimization problems (e.g., Johnson and Abushagur, 1995;Carroll, 1996;Kim et al., 2002;Schartau and Oschlies, 2003), and its advantages compared with the simple GAs were reported by Krishnakumar (1989) and Kim et al. (2002).As will be shown in Sect.2, the computational load of µGA is quite suitable for multi-processor architecture of recent computers, and a parameter optimization of a coupled ocean-sea ice model can be achieved within a reasonable computational time by using a medium resolution.
The purpose of this study is to provide a simple and systematic method for a parameter optimization of coupled ocean-sea ice models.In order to provide a suitable optimization method, we introduce a cost function composed of model-data misfit of ice concentration, ice drift velocity and ice thickness.We examine the characteristics of the cost function to specify a requirement necessary for an optimization approach, and then demonstrate parameter optimizations in practice by two types of different approaches: a gradient descent approach and a stochastic approach.As an example of gradient descent approaches, we apply the finite difference method, whereas as an example of stochastic approaches we apply the µGA.By examining applicability and efficiency of the respective approaches together with examinations of the characteristics of the cost function, we provide a useful parameter optimization procedure for modelers working on coupled ocean-sea ice models.To achieve our goal, we demonstrate parameter optimizations with a cost function defined by a 1 yr window.This is too short to estimate proper parameters for a long-term simulation, but long enough to examine the properties of the cost function and the efficiency of the methods.A parameter optimization for a realistic simulation by using a longer time window and examinations of the applicability of the estimated parameters to higher resolution models will be presented in forthcoming papers.
The paper is organized as follows: in Sect. 2 we describe the experiment design, which is composed of a brief introduction of the coupled ocean-sea ice model, sea ice data used in this study, definition of the cost function, a description of two types of optimization methods and a description of optimization experiments.In Sect.3, properties of the cost function are examined in two-dimensional parameter space, and then results from the two types of optimizations are provided.Conclusions are given in Sect. 4.

Coupled ocean-sea ice model
The coupled ocean-sea ice model used in this study is the North Atlantic/Arctic Ocean Sea Ice Model (NAOSIM) developed at Alfred Wegener Institute (AWI; Gerdes et al., 2003;Köberle and Gerdes, 2003;Kauker et al., 2003).The ocean part of NAOSIM is based on the MOM-2 model developed at the Geophysical Fluid Dynamics Laboratory (GFDL; Pacanowski, 1995), while the sea ice part of the model is a dynamic thermodynamic sea ice model with viscous plastic rheology (Hibler, 1979;Harder, 1996).Both parts of the model are coupled following the procedure devised by Hibler and Bryan (1987).The model domain encloses the whole Arctic and the North Atlantic Ocean north of approximately 50 • N (Fig. 1), and is formulated on a spherical rotated grid.The geographical North Pole was shifted to 60 • E on the Equator to avoid a numerical singularity arising from convergence of meridians.NAOSIM has been successfully applied to a variety of ocean and sea ice studies in the Arc- tic Ocean (e.g., Kauker et al., 2003Kauker et al., , 2005Kauker et al., , 2009;;Karcher et al., 2005Karcher et al., , 2007Karcher et al., , 2011)), and more descriptive information about the model configuration can be found in Karcher et al. (2003), Kauker et al. (2003) and studies mentioned above.
In this study we employ a low-resolution version of NAOSIM (Kauker et al., 2009), with a horizontal resolution of 0.5 • × 0.5 • and 20 levels in the vertical (     (Köberle and Gerdes, 2003).The model is driven by daily atmospheric forcing from 1948 to 2003 (NCEP/NCAR reanalysis, Kalnay et al., 1996) starting from temperature and salinity fields given by the PHC climatology and 100 % ice concentration with 2 m thickness in regions where the sea surface temperature falls below the freezing point of sea water.The initial model fields for the parameter optimization window are taken from 1 January 2003, and 1 yr integration window forced by daily NCEP forcing from 1 January to 31 December 2003 is used for parameter optimization.
For parameter optimizations by a gradient descent approach, we make use of two types of model codes.One is the standard model code, and the other is a "smoothed" model code (hereafter referred to as standard-and smth-code, respectively).As will be shown in Sect.3, responses of modeled sea ice fields to its control parameters do not provide a tractably smooth function for a local gradient estimation.Possible reasons are the parameterization of certain physical process and the model's coarse discretization in space and time.The smth-code of NAOSIM was developed for its 4DVar data assimilation system (Kauker et al., 2009) and is used here to mitigate the problems of local gradient estimation arising from the standard-code.In the smth-code, Fortran statements such as "if", "max(.)","min(.)"and "abs(.)" in the code, which potentially cause discontinuous model behaviors, were replaced by a continuous function such as "atan(.)" to smooth the local structure of the cost function.
Although the smth-code slightly modifies the modeled ice and ocean fields, the differences of the simulated fields between the standard-and smth-code are acceptable for the present purposes.Differences of the cost function between the standard-and smth-code will be examined in Sect.3.

Data
To evaluate modeled sea ice fields, we make use of 3 types of sea ice data sets obtained from satellite observations (i.e., ice concentration, ice drift velocity and ice thickness).With a basin-wide spatial coverage, these satellite data are suitable to measure model-data misfit and have been applied for parameter optimization of a sea ice model (e.g., Miller et al., 2006) as well as a coupled ocean-sea ice model (e.g., Nguyen et al., 2011).
For sea ice concentration, we use preprocessed sea ice concentration data set of the European Organisation for the Exploitation of Meteorological Satellites (EUMETSAT) Ocean and Sea Ice Satellite Application Facility (OSI SAF).For the data period in this study, the original data were measured by the Special Sensor Microwave/Imager (SSM/I) and processed following the algorithms described in Eastwood et al. (2010).Here we use the product OSI-409 (available at ftp://saf.met.no/reprocessed/ice/conc/v1/), which contains daily mean ice concentration on a polar stereographic grid with a horizontal resolution of 10 km, covering the entire Arctic Ocean except near the North Pole.We processed the original OSI-409 data set into monthly mean data on the model grid to facilitate model-data comparison.Only the original data whose status flag guarantees its reliability were used.The monthly mean values were defined at points where the number of valid data exceeds at least 30 % of the number of days of the respective month.For the data projection from the data grid to the model grid, we simply calculated the arithmetical mean of valid data contained in each model grid cell.Each grid cell generally contains a sufficient number of data points, and interpolation errors can be negligible.
For sea ice drift, we utilize the low-resolution sea ice drift product OSI-405 from EUMETSAT OSI SAF as well.The data used here are a single sensor product measured by the Advanced Microwave Scanning Radiometer of the Earth Observation System (AMSR-E) and processed following the algorithms described in Lavergne and Eastwood (2010).The data set provides information about positions of ice parcels before and after a certain time interval (48 h) as daily files from January to April and from October to December with some data gaps.In the original data set, the initial positions of the parcels are fixed to the grid points defined on the polar stereographic coordinate with 62.5 km mesh, while the position of the parcels after the time interval is provided as ice displacement data.This procedure introduces certain biases.Nevertheless, this is one of the best available estimates of sea ice motion with large spatial and long temporal coverage.In order to use the data for the present model-data comparison, we calculated monthly mean ice drift on the model grid.In this process, we firstly calculated monthly mean displacement of each parcels on a data grid point when observations were available for more than half of each month.We secondly projected the displacement data from the data grid to the model grid and calculated zonal and meridional ice drift in the model coordinate, with a limitation of maximum interpolation distance of 90 km.
In addition to the above data sets, we also use basin-wide ice thickness data provided by Kwok et al. (2009).The data set is composed of 10 campaigns of Ice, Cloud and land Elevation Satellite (ICESat) from 2003 to 2008 on a polar stereographic grid with a horizontal resolution of 25 km (available at http://rkwok.jpl.nasa.gov/icesat/download.html).Ice thickness is estimated by a method described in Kwok et al. (2007) and Kwok and Cunningham (2008) from total (sea ice plus snow) freeboard measured by a laser altimeter on the satellite.In the present study, we utilized the ON3 campaign from 24 September to 18 November 2003 for model-data comparison.We projected the original data onto the model grid by simply adopting the nearest data point, since the horizontal resolution of the original data is finer than that of the model.

Cost function
The cost function measuring the model-data misfit is defined by a combination of 3 types of sea ice data mentioned above and an additional penalty term.The total cost function, J , is given by where k = 1, 2 and 3 correspond to ice concentration, ice drift velocity and ice thickness observations, respectively; J k represents the contribution from respective component; N k is the number of observational data for the k-th component; P is a penalty term introduced for a gradient descent approach and will be defined later.To reduce the cost from the respective components simultaneously, we organize the cost function so that the contributions from the respective sea ice properties have the same order of magnitude.For this purpose, J k is divided by the number of respective observations, N k , since the numbers of the respective observations significantly differ from each other (N 1 = 50 730 (ice concentration); N 2 = 14 295 (ice drift velocity); N 3 = 2123 (ice thickness)).Together with the observational uncertainties defined later, this normalization makes it possible to evaluate the contribution from the respective components in the same order of magnitude.
We measure each component of the cost function by the squared L 2 norm of model-data misfit weighted by the uncertainties of the observations: (2) T is the convolution of measurement function with the full model dynamics (i.e., G i (m) gives the model's counterpart to the observational data, d i ); W is the weighting matrix to take uncertainties of the observed data into account.In the present experiments, we only consider the diagonal elements of W defined by W i = σ −2 k , where σ k is the uncertainty of the respective observations of k-th component.We assume provisional values, σ 1 = 5 % for ice concentration, σ 2 = 1 cm s −1 for ice drift velocity and σ 3 = 50 cm for ice thickness.These uncertainty values were chosen so as to make the costs associated with respective components have the same order of magnitude and, therefore, contribute to the total cost function to the same extent.An exact evaluation of uncertainties of the merged data and providing an exact form of the weighting matrix is nontrivial and would be quite time-consuming as well as digress from the subject of the present study.
Therefore, we simply assumed the provisional uncertainty values for ice concentration and ice drift velocity, whereas we adopted the uncertainty values provided by Kwok et al. (2009) for ice thickness data, since the thickness data are provided as a time average of each campaign period, and we did not process the data for temporal average.
As control parameters m for the optimization, we selected 7 model parameters from different physical processes as listed in Table 2 (i.e., M = 7 in the present case).Two of the parameters were taken from momentum transfer processes among atmosphere, ice and ocean, while others were taken from dynamic and thermodynamic processes of the sea ice model.Atmospheric and oceanic drag coefficients (cdwin and cdwat) are important tuning parameters for coupled ocean-sea ice models, and a number of studies have focused on obtaining appropriate values for realistic sea ice simulation (e.g., Holland et al., 1993;Chapman et al., 1994;Harder and Fischer, 1999).The present model employs a simple quadratic drag formulation and does not take the atmospheric surface stability into account (i.e., neutral conditions are assumed).The empirical ice strength parameter (P * ) is another key parameter controlling dynamic sea ice processes and has been chosen as one of the tuning parameters in a number of studies (e.g., Holland et al., 1993;Harder and Fischer, 1999;Nguyen et al., 2011).We also include the lead closing parameter in the ice compactness equation, h 0 , in the set of control parameters.Furthermore, we select latent and sensible heat transfer coefficients (cdlat and cdsens) and snow and ice albedo values (albedo), since these are the key parameters controlling the thermodynamic processes of sea ice.The representative albedo value in Table 2 is set to be equivalent to the albedo of frozen snow in the model, and is related to other albedo values by maintaining the original ratio between the albedo of frozen snow and other albedo values (0.77/0.8 for melting snow, 0.7/0.8 for frozen ice, and 0.68/0.8for melting ice).It should be mentioned that the choice of parameters is particular to this study and other Hibler (1979) class of models.A different parameterization of sea ice mechanics and thermodynamics (like in Hunke and Lipscomb, 2001) naturally leads to a different set of parameters and thus different optimization results.
We limited the number of control parameters to 7 to maximize the efficiency of the optimization by the gradient descent approach on our computational environment.This limitation comes from the number of available CPUs in a certain batch job class at the computer facility at AWI, and can be relaxed when we perform a parameter optimization for realistic model simulations.We also set bounds to the parameter values within a prescribed upper and lower limits (see m max i and m min i in Table 2).These limits are provisional values for examining performances of the present two optimization approaches, and also can later be relaxed within the physical constraints (e.g., that the albedo values should not exceed 1.0).The penalty term, P , is introduced so that the gradient descent algorithms can keep the estimated parameters within the prescribed range: where , respectively.This term rapidly increases the cost function when the estimated parameters exceed their prescribed ranges, while adding negligible cost when the parameters stay within.Although the stochastic approach does not need the penalty term, we applied the term to a couple of series of optimization experiments with a stochastic approach for comparison purposes.

Gradient descent approach
As a gradient descent approach, we employ a finite difference (FD) method combined with a quasi-Newton search algorithm to find an optimal parameter set.This is a very fundamental and simple method to optimize model parameters.One of the advantages of the method is that no changes to the model code are necessary to evaluate the gradient of the cost function.Therefore, the method can be easily applied to numerical models of all kinds without a special programming effort.Another advantage is that no linearization of the code is done, and one can explore the fully nonlinear cost function space.On the other hand, the computational cost for gradient estimation increases proportional to the number of the control parameters in contrast to sophisticated approaches such as an adjoint.By exploiting recent parallel computational environments, one can apply the method to parameter optimizations for a moderate number of parameters, while it is still impractical to apply the method to optimize initial and/or boundary conditions of the model.A disadvantage common to all gradient descent approaches is the possibility of becoming stuck in one of the local minima of the cost function.
In order to reduce this possibility, experiments with various initial guesses of the parameter set must be performed.
In the FD method, a gradient of the cost function with regard to each control parameter, ∂J ∂m i , is evaluated at a certain point in the M-dimensional cost function space by a difference of the cost functions divided by an increment: where m i = [0, 0, . . ., m i , . . .0] T is an increment of i-th control parameter normalized by the range of each parameter.By performing M+1 model runs and cost evaluations, we can obtain a full gradient of the cost function ∂J ∂m at a certain point.We applied one of the quasi-Newton search algorithms, the limited-memory Broyden-Fletcher-Goldfarb-Shanno algorithm (L-BFGS) by Liu and Nocedal (1989), to reduce the cost function by searching an optimal parameter set in the gradient descent direction.Re-evaluations of the cost and its gradient and an application of the search algorithm are repeatedly performed, until an Euclidean norm of the gradient (normalized by a Euclidean norm of the control vector) falls below a prescribed threshold or the line search routine in the search algorithm is unable to provide further steps, which satisfies the sufficient decrease and curvature conditions.The threshold for the norm of the gradient is set to 1.0 × 10 −12 .
Before performing parameter optimizations, we tested the performance of the L-BFGS algorithm by a pseudo-cost function defined by a M-dimensional continuous function with only one global minimum, and confirmed that the algorithm finds the minimum of the function within a computational accuracy after dozens of iterations.Figure 2a shows schematics of the parameter optimization system by the present method.As shown in the figure, we perform M+1 model runs and cost evaluations in parallel by exploiting M+1 processors.Therefore, the time required for L iterations is equivalent to L model runs plus the cost evaluations by one processor.Since a parallel computational environment has become quite common, the approach is feasible ….

Model run & cost evaluation
The  for the optimization of a limited number of paramters in a coupled ocean-sea ice model.
Before performing optimization experiments, we also made a survey of the increment, m, for the gradient estimation.If the cost function is a smooth and continuous function of the control parameter set m, we can apply an infinitesimal increment for local gradient estimation in Eq. (4).However, as will be shown in Sect.3, the shape of the cost function given by the coupled ocean-sea ice model is not smooth.It has a spiky micro-scale structure probably due to some parameterizations adopted in the model and the model's discretization in space and time.Although this situation is to some extent alleviated by adopting the smth-code mentioned above, it is not completely solved.A cost function, J , evaluated at a certain point in the parameter space inevitably reflects micro-scale local structure.If we apply a very small increment for a gradient estimation, the gradient represents an inclination of micro-scale unevenness of the function and is not applicable to the minimum search, whereas if we apply a large increment, the gradient cannot represent a local gradient and the accuracy of the search will decline.In such a situation, we have to figure out an appropriate increment size that is sufficiently larger than the width of the micro-scale structure while, at the same time, small enough to capture a local gradient of the function.We performed preliminary optimization experiments with various increment sizes, and evaluated the reduction ratio of the gradient of the function as a measure of an appropriate increment size.If a gradient of the function after an optimization is sufficiently small, the optimized parameter set is close to an extremum of the function.While the gradient remains large, it is still far from an extremum or the gradient captures micro-scale unevenness of the function due to an inordinately small increment.Therefore, we can say that a smallness of the gradient after an optimization compared to the initial gradient is a necessary condition for being close to an extremum.In the preliminary experiments, we tested various sizes of increment with smthcode and found that an increment with (m max i −m min i ) × 10 −2 gives the best reduction ratio of the gradient.We apply this increment value in the following experiments.The adequacy of the increment will be reexamined by two-dimensional survey of the cost function in Sect.3.

Stochastic approach
As an example of stochastic approaches, we apply the microgenetic algorithm (µGA), which is a small population version of the genetic algorithms (GAs).The GAs are global optimization algorithms based on the natural selection of living things (a description of the algorithms can be found in Holland, 1975, andGoldberg, 1989).The advantage of the algorithms is an applicability to an extremum search for illshaped or multimodal functions, whereas the disadvantage is a huge computational cost to obtain a solution with a high accuracy.In the algorithms, a single set of the control parameters (parameter vector) is represented by a genotype of an individual by encoding the parameter vector to binary bit strings (Fig. 3a), and then a generation that is composed of a prescribed number of randomly generated individuals is prepared (Fig. 3b).The generation can be regarded as a pool of various vectors, and the algorithms simulate an evolution of the generation based on a reproductive plan, which consists of selection of individuals, recombination of genes and mutation of individuals.The selection process is conducted by the Darwinian evolutionary principle of "survival of the fittest", which in the present context corresponds to a selection of suitable parameter vector by an evaluation of the cost function (Fig. 3c).The recombination of genes is carried out by an exchange of genes among selected individuals (Fig. 3d), corresponding to a generation of new parameter vector by a random combination of binary bit strings coming from a couple of vectors contained in the pool.The mutation introduces new genic information into the generation, corresponding to an introduction of random seeds of parameter vectors into the pool.Generally, GAs require a population size of O(10 2 ) to preserve sufficient possibilities for the search.Since the population size is correspondent with the number of model runs required for each generation, it is generally not possible to apply the algorithms to a parameter optimization of coupled ocean-sea ice models in its original form.
The µGA, on the other hand, requires a very small population size regardless of the chromosome length (Goldberg, 1989).The basic strategy of the µGA is to perform a quick search in parameter space by taking advantage of the small population size and to perform an intensive search by  iterative reinitializations.Technically, the algorithm is composed of 3 processes: a reproduction of individuals, an assessment of convergence and a reintroduction of randomly generated individuals throughout a reinitialization.The reproduction of individuals is achieved by a recombination of genes in the same manner as in the simple GAs with the exception that the best fittest individual of the current generation is reserved and is directly transferred to the next generation without any change (Fig. 3e).After each renewal of generation, the algorithm evaluates convergence of genotype in the generation, and if the diversity of the genes is lower than a certain criterion, all the individuals except the fittest one are replaced by randomly generated new individuals (reinitialization).Since the new information is repeatedly introduced by the reinitialization process, µGA does not need a mutation process.(An example of evolution of estimated parameters is shown in Appendix A.) Advantages of the µGA compared with the simple GAs have been reported in Krishnakumar (1989) and Kim et al. (2002), and examples of application of µGA to various problems can be found in the papers mentioned in the Introduction.
Figure 2b shows schematics of the parameter optimization system with the µGA.We adopted the genetic algorithm driver developed by Carroll (1996) to implement the µGA into the system.As shown in the figure, the number of CPUs necessary for running the system is equivalent to the number of the population size of the generation.The population size required for the µGA is generally less than 10; Goldberg (1989) indicated that a population size of 3 is sufficient for convergence; Coello and Pulido (2001) used a population size of 4; Krishnakumar (1989), Athias et al. (2000) and Kim et al. (2002) used a population size of 5. On the other hand, Schartau and Oschlies (2003) and Shigemitsu et al. (2012), for example, adopted larger population sizes of 13 and 19.In their studies, the population size is set to be the same number as the number of control parameters.Schartau and Oschlies (2003) mentioned that the choice is not mandatory, but was found to perform well in their test experiment.In the present experiment, we adopt population sizes of 5 and 8.The population size of 5 is recommended by a number of former studies, whereas 8 is a number slightly larger than the number of the control parameters.Both of the numbers are quite feasible for recent parallel computational environments.Since the system executes all the model runs in one generation in parallel, the computational time required for an optimization is given by the time for one model run times the number of generations.
The number of generations required for an optimal solution depends on the needed accuracy.Due to the quasistochastic nature of the µGA, the convergence of a solution particularly near the optimal solution is quite slow.In addition, since the parameter space is discretized by a prescribed increment and the size of the increment is inversely proportional to the size of the parameter space (or the number of possibilities), a small increment necessary for an accurate solution requires a search in a vast space.Therefore, we have to figure out a practical number of generations as well as a size of an increment in relation to the desired accuracy of a solution.In order to obtain a tentative relation among them, we performed preliminary optimization experiments with a pseudo-function.In these experiments we found that the number of possibilities of 2 7 for each parameter requires at least 400 generations for a solution with 1 % expected error, and 1000 generations for a solution with 0.5 % expected error (see Appendix B for details).It should be noted that since these tentative estimations of errors were obtained from a continuous function with only one minimum, it is not clear whether such an error estimate is applicable to a complicated function with a number of local minima.Nevertheless, from the practical point of view, an optimization experiment with 400 generations is an upper limit of model runs if we intend to apply the method to a parameter optimization with a realistic time window, and therefore we employ these tentative estimations to examine efficiency and applicability of the present method.

Optimization experiments
Before performing optimization experiments, we conducted two-dimensional complete surveys of the parameter space spanned by h 0 and P * to demonstrate the nature of the cost function derived from coupled ocean-sea ice model.The surveys were done for both standard-and smth-code with some intensive search in a specific area.All the two-dimensional maps showing the cost function structure are composed of a mesh of 40 × 40 points obtained from 1600 model runs and cost evaluations.
For the gradient descent approach, we performed a couple of series of optimization experiments with the standardand smth-code to examine the effect of smoothing of the model code on the efficiency of an optimization (Table 3).The cost function obtained from the smth-code is reevaluated by applying the optimized parameters to the standard-code for comparison purposes.In each series of experiments, 10 independent optimizations starting from different initial parameter sets were performed to see whether the optimized parameter sets converge to certain parameter values (Table 4).The initial parameter sets are generated as follows: the first parameter set is given by the standard parameter setup used in previous studies of NAOSIM, and the second (the third) parameter set is composed of the upper (lower) limits of respective parameter values.The fourth parameter set is composed of a combination of parameter values with their upper (h 0 , cdwat and cdsens) and lower (P * , cdwin, cdlat and albedo) limits.The fifth to the tenth parameter set are composed of random combinations of selected parameter values, modified in discrete, nearly equidistant steps inside the parameter ranges.It is essential to perform such experiments to assess the applicability of the algorithm to the objective function, because one of the difficulties arising from the approach is missing the global minimum of a function with an ill-shaped structure.
We also performed four series of optimization experiments with the µGA (Table 3).Each series of experiments was composed of 10 independent optimization experiments with different seeds for random number generator in order to statistically assess the efficiency of the algorithm.The first series of experiments (µGA -1) is conducted with the population size of five and the number of possibilities of (2 7 ) 7 .The second series of experiments (µGA -2) has exactly the same setup as the first one but includes the penalty term in the cost function to facilitate the direct comparison with the optimizations by the FD method.The third (µGA -3) and fourth (µGA -4) series of experiments employ different setup parameters in the µGA.The third series uses a population size of eight, whereas the fourth series uses a larger number of possibilities of (2 10 ) 7 .

Property of the cost function
Figure 4 shows a two-dimensional cost function map in h 0 − P * space obtained from the standard-code.The other parameters used for the cost evaluations were fixed to the standard setup values from Table 4.The figure shows that the total cost function reaches its minimum around h 0 ∼ 1.3 and P * ∼ 35 000 (Fig. 4a) with the standard parameter set.This combination of the h 0 and P * values does not correspond to minima of the 3 individual components (Fig. 4b-d) of the cost function.The shapes of the J k /N k coming from different ice properties significantly differ from each other.For example, around the center of the panels, the response of the ice thickness cost to h 0 variation (Fig. 4b) is opposite to that of the ice concentration cost (Fig. 4c), while at the same time, the ice drift cost is relatively insensitive to h 0 variation (Fig. 4d).
Such property-dependent responses of the cost suggest a couple of important features of the total cost function derived from a combination of different ice properties.The first point www.ocean-sci.net/9/609/2013/Ocean Sci., 9, 609-630, 2013  is that the function potentially has more than one minimum even if each component of the cost has only one global minimum.The property-dependent responses may be partly due to biased initial sea ice conditions, shortcomings of modeled physics and also partly due to errors in the forcing data.Since a parameter optimization work is inevitably accompanied by such circumstances, the search algorithms for a parameter optimization should be tolerant regarding the existence of local minima.In other words, the gradient descent approaches may have some difficulties finding a global minimum due to the characteristics of the cost function.The second point is that the shape of the total cost function, and therefore the optimal parameter set corresponding to the cost function, is strongly influenced by relative importance or weightings of respective components.Since the relative importance stems from squared differences between modeled and observed ice properties divided by observational uncertainty, an appropriate choice of respective uncertainties is of significant importance.Although we adopted here the provisional uncertainty values to concentrate on our examination of an applicability and efficacy of the optimization methods, we have to investigate appropriate uncertainties as well as their relative weightings when we perform parameter optimizations for realistic simulations.
Another characteristic of the cost function is its microscale structure.Figure 5 shows a magnification of the black solid rectangles in Fig. 4. The cost function obtained from the standard-code has a micro-scale uneven structure.Although most of the unevenness on this scale stems from the ice concentration cost, the ice drift and ice thickness cost also have uneven structures if we magnify the map further.Such an uneven structure makes it difficult to accurately estimate a gradient of the function required for the gradient descent approach, and then extremely lowers the accuracy of the solution.Although an application of the smth-code mitigates such a situation to some extent (Fig. 6a, e), the cost function still has an uneven structure on a smaller scale (Fig. 6f).If we estimate possible increments for the FD method from Fig. 6, h 0 ∼ 0.025 and P * ∼ 500 are respectively the smallest values that will not be affected by the micro-scale unevenness.These estimations are consistent with the increment values obtained from the preliminary experiments, (m max i − m min i ) × 10 −2 (i.e., h 0 = 0.019 and P * = 450).It should be kept in mind that although the smth-code does not change the basic responses of the cost function to variations of respective parameters, it increases the cost by ap-Fig.7. Time series of ice concentration cost (blue lines) and ice drift cost (red lines) obtained from the standard setup (solid lines) and from the optimized parameter set (dashed lines).The optimized parameter set is obtained by using finite difference method with the standard-code, starting from the standard setup (see Table 4).

Gradient descent approach
To assess the ability of the optimization system, we first examine the cost function and corresponding sea ice fields obtained from an optimized parameter set.As an example, Fig. 7 shows a time series of ice concentration cost and ice drift cost before and after an optimization.The optimization was performed by the FD method with the standard-code starting from the standard setup shown in Table 4.The total cost function was reduced by 26.3 % from 20.87 (standard setup) to 15.39 (after the optimization) after 44 iterations, and all of the 3 components of the cost were also reduced (ice concentration cost: 25.6 %, ice drift cost: 24.6 %, ice thickness cost: 28.7 %).As shown in Fig. 7, the system successfully reduced the ice concentration cost in summer, while the cost in winter and spring seasons remained unchanged.In August the broad ice-free area in the Eurasian Arctic found with the standard setup was drastically improved by an application of the optimized parameter set (Fig. 8a-c).As for ice drift velocity, the time series of the cost shows that the cost was reduced in the early months of the year, while the costs in October and November were rather increased (Fig. 7). Figure 8d-f show monthly mean ice drift velocity fields in March, in which the improvement of the modeled ice velocity field is evident.For the ice thickness field, we have only one ICESat campaign in 2003.The ice thickness averaged over the period also shows significant improvement in the Eurasian Arctic (Fig. 8g-i).These results indicate that the system can at least provide a parameter set that gives smaller cost function, although the set might not be optimal.
Figure 9 shows the initial and optimized cost functions and associated parameter values obtained from the 10 independent optimization experiments by the FD method with the standard-code.As shown in Fig. 9a, all the optimizations starting from various parameter sets successfully reduced the cost to similar values as the optimization starting from the standard setup (opt-1).The final costs range from 15.02 to 15.47 (mean value is 15.26).Nevertheless, the estimated parameter values except albedo show quite divergent distribu-tions despite the fact that they give similar costs (Fig. 9b-h).Possible reasons for the divergent distributions are (1) shortcomings of the FD method associated with an inaccurate gradient estimation caused by the micro-scale unevenness of the cost function and its manifestation particularly in weaksensitivity regions, and (2) the search algorithm being stuck to one of the local minima of the function.The first possibility comes from the characteristics of the cost function shown in Figs. 4 and 5, and also from the fact that the simulated ice fields corresponding to the estimated parameter sets show quite similar spatial patterns, regardless of the different parameter values (Fig. 10, the upper panels).The second possibility comes from the fact that some of the estimated parameters tend to have some specific values as can be seen in Fig. 9c-g (e.g., in Fig. 9f, opt-1, 6 and 8 give similar estimated values around 1.6 × 10 −3 , whereas opt-3, 5, 7 and 10 give values around 2.1 × 10 −3 ) .In any case, we can conclude that the FD method with the standard-code can reduce the cost to some extent, but at the same time, has a weakness for estimating parameters with weak sensitivity to the cost.
The application of the smth-code mitigates the divergent property of the estimated parameters to some extent.Figure 11 shows the initial and optimized cost functions and associated parameter values obtained from the 10 independent optimization experiments with the smth-code.Variations of some of the estimated parameters become small compared to the standard-code (Fig. 11b, g and h).Other parameters still vary over a similar range as with that of the standardcode (Fig. 11e and f).The cost function values after the optimizations range from 16.26 to 16.39, and those values corresponding to the same parameter sets but re-evaluated by the standard-code range from 15.26 to 15.59 (mean value is 15.38).These values are slightly larger than those obtained from the optimizations with the standard-code, indicating that the application of the smth-code, as a result, could not provide parameter values, which gives smaller cost than the cost obtained from the standard-code.

Stochastic approach
In this subsection, we survey applicability and efficiency of the µGA by examining results from µGA -1, in which the population size is set to 5 and the number of possible parameter values is set to (2 7 ) 7 (see Table 3).Afterwards, we briefly show results from a different setup, µGA -2, 3 and 4, to make a direct comparison of optimization efficiency between the µGA and the FD method, and also to examine the effects of setup parameters used in the µGA.
The parameter sets obtained from the µGA optimization experiments also succeeded to reduce the cost function and improved the simulated ice fields as summarized in Figs. 12 and 13.The time series of the cost function (Fig. 12) and simulated ice fields (Fig. 13b, e and h) were obtained from a model run for which the mean parameter values obtained from 10 independent optimization experiments by µGA -1 were applied.The cost function value after the optimization is 14.82, and the reduction rate of the cost compared to the standard setup is 29.0 % (ice concentration cost: 27.5 %, ice drift cost: 28.7 %, ice thickness cost: 32.0 %).As shown in Fig. 12, the reduction of ice concentration cost is emphasized in the summer months similar to the FD optimizations.The costs from January to June are also reduced more than 10 %.For ice drift velocity, the cost decreases in the early months of the year, whereas they increase in October and November, again similar to the FD optimizations.The spatial pattern of the simulated ice fields (Fig. 13b, e and h) also exhibit similar but slightly better structure than those from the FD optimizations (see also Fig. 8b, e and h for comparison).
Figure 14 summarizes the evolution of the cost functions and corresponding parameter values throughout the 10 independent optimization experiments by the µGA (µGA -1).As shown in Fig. 14a, the cost functions rapidly reduced during the first 100 generations, and the reduction of the cost throughout the following 300 generations is relatively small.After 400 generations, the costs obtained from the 10 optimization experiments range from 14.80 to 14.83 (mean value is 14.81).The evolution of the associated parameter values, on the other hand, shows slight modifications even after the 100th generation in some cases.The variances of the estimated parameters become nearly constant after 200 generations.(For line plots showing evolution of estimated parameters, see Appendix C.) It should be noted that the estimated parameters are not necessarily applicable to a longterm simulation and/or to assess physical processes of the model, regardless of the reduction of the cost and the convergence of the estimated parameters.This is mainly due to the assimilation window used in our test case being shorter than the spin-up time of the sea ice-ocean system.For example, extremely high albedo values (∼ 0.99) are probably due to the strongly biased ice concentration and thickness found in the original model run (Fig. 14a and g).The algorithm leads to more ice on the Eurasian Basin side by reducing sea ice melt in summer with extremely high albedo values.For realistic parameter estimation, the assimilation window should be at least the spin-up time of the sea ice-ocean system (i.e., at least about 7 yr).This work will be done in a forthcoming paper.
A remarkable point of µGA -1 optimizations is the small variance of the estimated parameter values even for parameters that varied considerably in the FD method, such as cdwat and cdlat (Fig. 14; see also Figs. 9 and 11 for comparison).
Figure 15 summaries standard deviations of the estimated parameters obtained from the 10 independent optimization experiments for the respective approaches.µGA -1 generally provides better convergence of the estimated parameters compared to the FD method.Although the FD method with smth-code provides slightly better convergence than the µGA for cdsens and albedo values, the standard deviations are quite small in both approaches and the difference between the two approaches is vanishing.
Figure 10d-f show the spatial pattern of standard deviations of the simulated ice fields calculated by the 10 model runs corresponding to the 10 estimated parameter sets by µGA -1.The figure shows that differences among the simulated ice fields are sufficiently small and generally limited to the marginal ice zone.In addition, the differences are quite small compared to those obtained from the FD method (Fig. 10a-c), indicating better convergence of the simulated ice fields than those obtained from the FD method.The small standard deviations of the simulated ice fields indicate that the corresponding cost functions are closer to the global minimum than those obtained from the FD method.
In order to make a direct comparison of the optimization efficiency between the µGA and the FD methods, we performed an additional series of experiments (µGA -2), in which the penalty term was included in the cost function.In this series of experiments, all the cost function values after 400 generations are smaller than the minimum cost obtained from the FD method, and the standard deviations of the estimated parameters are again satisfactorily small (less than 9 % of the prescribed parameter range), again supporting the advantage of the µGA compared to the FD method.
Further experiments were also performed to examine effects of the setup parameters in the µGA on the efficiency of the optimization.One series of experiments examines the effect of the population size (µGA -3), and the other series of experiments examines the effect of the number of possible parameter values (µGA -4).If we increase the population size from 5 to 8, the efficiency of an optimization slightly decreases (the maximum standard deviation of the estimated parameters is 12.3 % of the prescribed parameter range at 400th generation), probably because of a reduced number of reinitializations due to increased population size.For a large number of possibilities, the convergence of the solution again decreases, indicating that the larger parameter space worsens the efficiency of the optimization even for an identical cost function.These results indicate that the selection of the setup parameters used in the µGA is important to achieve a fast convergence within a limited number of generations.

Summary and conclusion
Two types of optimization methods were applied to a parameter optimization problem of a coupled ocean-sea ice model, and a comparison of the two methods was made to assess an applicability and efficiency of the respective methods.One is a finite difference method based on a gradient descent approach, while the other adopts the stochastic approach of genetic algorithms.To evaluate modeled sea ice properties, a cost function composed of model-data misfit of ice concentration, ice drift velocity and ice thickness was introduced.
An example of a two-dimensional complete survey of the cost function in the parameter space showed that the cost function composed of a combination of different types of observations potentially has more than one minimum value.In addition, the survey also showed that the cost function exhibits a micro-scale uneven structure in parameter space, which prevents the estimation of the gradient of the function with high accuracy.Due to the nature of the cost function, the finite difference (FD) method has difficulties estimating optimal parameters.The estimated parameter values depend on their initial guesses, and the standard deviations of the estimated parameters calculated from 10 independent optimization experiments were quite large for some parameters.A longer time window for a more realistic parameter optimization would render the cost function more complicated and would make the FD method even harder to apply.The genetic algorithm, on the other hand, provides satisfactory results regardless of a relatively small number of generations used here.The results show that the standard deviations of estimated parameters calculated from 10 independent optimization experiments were less than 6 % of the prescribed range of the respective parameters.Examinations of standard deviations of the simulated ice fields also suggest that the µGA can provide cost functions that are closer to the global minimum than those from the FD method.From these results, we conclude that the µGA is favorable compared to the FD method for a parameter optimization of coupled oceansea ice model of medium resolution.
In a forthcoming paper we will extend the assimilation window to be comparable to the spin-up time of the sea ice model of about 5 to 7 yr to obtain more "physical" optimal parameters.This set of parameters can be applied to higher resolution models (O ∼ 10 km).The cost function can be evaluated without large computational costs, and the benefit can be examined although the parameters are not "optimal" for these models.A direct application of the µGA to the O(10 km) model is possible but not achievable with the computer resources currently available to us at AWI .The computational costs of the µGA limit the direct application to eddy-resolving models of the Arctic (O ∼ 1 km).For this class of models, further development in highperformance-computing technology is needed.
, where L is the number of optimization experiments and mi is a mean of estimated i-th parameter.abandoned.In the third generation, no new fittest is generated.It is evident in the third generation that the diversity of the parameter values is decreased.In the fifth generation, the difference of binary bits among the individuals is small and correspondingly the diversity of the parameters is also small.
Then reinitialization is done in the sixth generation producing new individuals.

Appendix B Preliminary optimization experiment by the micro-genetic algorithm
In order to estimate the needed number of generations in relation to the number of possible parameter values (which is inversely proportional to the increment) and the expected accuracy of a solution, we performed optimization experiments with a pseudo-cost function.
The pseudo-function is defined as follows: where E is the expectation of the mean error and P (e j ) is a probability function.
Figure B1 shows the expected value of the mean error for different number of possibilities (proportional to reciprocal number of increment) and different number of generations.Each point in the figure (from 2 3 to 2 15 possibilities for each number of generations) is calculated by 100 independent optimization trials (J = 100) by the µGA.In each optimization experiment, we can obtain e j by Eq. (B2), while P (e j ) is simply given by the reciprocal number of optimization experiments, 1/J .The figure indicates that a larger number of generations generally gives a smaller expected error.On the other hand, the number of possibilities has a small contribution to the expected error in the range of possibilities larger than 2 7 .The figure also indicates that a solution with 1 % expected error can be obtained after 400 generations calculated with the number of possibilities of 2 7 , while a solution with 0.5 % expected error needs 1000 generations with 2 7 possibilities.Note that these results are only valid for the simple cost function applied here, and it is not clear whether these results can be directly applicable to an optimization for a realistic cost function with complicated structure.
. At the southern boundary of the model domain, an open boundary condition has been implemented along the Atlantic sector following Stevens (1991), while in the Pacific sector Bering Strait is treated as a closed wall.At the open boundary of the Atlantic sector, temperature and salinity at inflow points are restored toward PHC (Polar science center Hydrographic Climatology, Steele et al., 2001), and barotropic velocities normal to the boundary are specified from a model version covering the entire Arctic and Atlantic Ocean north of 20 • S Figure 2.

Fig. 2 .
Fig. 2. Schematics of a parameter optimization system with (a) finite difference method and (b) genetic algorithm.See text for description.

Fig. 4 .
Fig. 4. (a) A two-dimensional structure of the total cost function in h 0 − P * space obtained from the standard-code, and contribution to the cost from (b) ice thickness, (c) ice concentration and (d) ice drift velocity.Note that the same contour interval is adopted in all the panels, whereas the color scale is different.The black solid rectangles around the center of the panel are the area for the magnifications shown in Fig. 5.

Fig. 6 .
Fig. 6.A two-dimensional structure of the total cost function in h 0 − P * space obtained from the smth-code, and contribution to the cost from (b) ice thickness, (c) ice concentration and (d) ice drift velocity.(e) Magnification of the black solid rectangle in (a).(f) Magnification of the black solid rectangle in (e).(a-d) utilize the same contour interval but different color scale, whereas (e) and (f) utilize a different contour interval and color scale.

Fig. 8 .
Fig. 8. Modeled and observed (a-c) monthly mean sea ice concentration in August 2003, (d-f) monthly mean ice drift velocity in March 2003 and (g-i) ice thickness in ON3 ICESat campaign period.The panels in the left column are NAOSIM with standard setup, while the panels in the center column are NAOSIM with the optimized parameter set obtained from the finite difference method with the standard-code, starting from the standard setup.The panels in the right column are from satellite observations: (c) OSI -409, (f) OSI -405 and (i) ICESat.

Fig. 10 .
Fig. 10.Standard deviation of the simulated sea ice fields calculated from the 10 model runs.In the respective model runs, the optimized parameter sets obtained from the 10 independent optimization experiments by the finite difference method using the standard-code (the model runs in a-c), or those by the µGA -1 (the model runs in d-f) were applied.The standard deviations of (a and d) monthly mean sea ice concentration in August 2003, (b and e) monthly mean ice drift velocity in March 2003 and (c and f) ice thickness in ON03 ICESat campaign period.

Fig. 13 .
Fig. 13.The same as Fig. 8 but with the average of the µGA -1.The results from the standard setup (the left column) and from satellite observations (the right column) are again shown to facilitate comparisons.

Fig. 14 .
Fig. 14.The initial and optimized cost function values and associated parameter values obtained from 10 independent optimization experiments with the µGA (µGA -1; population size = 5, number of possibilities = (2 7 ) 7 ).(a) Cost function values at the initial (red) and after 100 (yellow), 200 (green) and 400 (blue) generations.(b-d) Parameter values at the initial and after 100, 200 and 400 generations (the same color correspondence as (a)).The initial cost functions and associated parameter values for respective optimization experiments were obtained from the best fittest individual in the first generation.

Fig. 15 .
Fig. 15.Standard deviations of the estimated parameters obtained from 10 independent optimization experiments by (a) the FD method and (b) µGA.The standard deviations were normalized by corresponding parameter ranges: the normalized standard deviation of i-th parameter is given by where M is the number of optimized parameter (M = 7), m i i-th parameter value and m central value i = (m max i + m min i )/2 again the central value of the prescribed range of i-th parameter.The function is smooth and continuous everywhere and has only one minimum, cost = 0, at m = m central value .A mean error of estimated parameters after a certain number of generations for a certain optimization trial is given by e is a mean error normalized by the prescribed parameter ranges, m OPT i i-th parameter value after a certain optimization trial, and m ANL i the analytical solution of i-th parameter that gives the minimum cost function.m max i and m min i are the prescribed upper and lower limits of i-th parameter value given by Table2.The expected value of the mean error after an optimization is given by

Fig. B1 .
Fig. B1.An expected error of the estimated parameters by the µGA as a function of number of possibilities and number of generations, obtained from the idealized pseudo-function experiments.Each point in the figure is calculated by 100 independent optimization trials by the µGA.See text for details.

Fig. C1 .
Fig. C1.Evolution of the cost function and associated parameter values through 10 independent optimization experiments by the µGA (µGA -1: population size = 5, number of possibilities of each parameter = 2 7 ): (a) evolution of the cost function, (b-h) evolution of associated parameters (b) h 0 , (c) P * , (d) cdwin, (e) cdwat, (f) cdlat, (g) cdsens and (h) albedo.Note that the division of the horizontal axes is nonlinear.The vertical dashed lines were plotted every 50 generations.

Table 1 .
Vertical grid spacing of the model.

Table 2 .
Parameters applied for the optimizations.The representative albedo of snow and ice is related to respective albedo values in the manner described in the text.

Table 3 .
Five series of optimization experiments with the FD and µGA.

Table 4 .
Initial parameter sets for optimization experiments with the FD method.

Table A1 .
Example of evolution of estimated parameter values by the µGA (µGA−1).No. 4) is transferred to the second generation without any change (No. 1 of the second generation).In the second generation, a new fittest individual emerges by a random combination of the binary bits coming from the individuals of the first generation.Then the new fittest is transferred to the third generation, and the old fittest individual is