Subgrid-scale physical parameterization in atmospheric modeling: How can we make it consistent?

Approaches to subgrid-scale physical parameterization in atmospheric modeling are reviewed by taking turbulent combustion flow research as a point of reference. Three major general approaches are considered for its consistent development: moment, distribution density function (DDF), and mode decomposition. The moment expansion is a standard method for describing subgrid-scale turbulent flows both in geophysics and engineering. The DDF (commonly called PDF) approach is intuitively appealing, as it deals with a distribution of variables at the subgrid scale in a more direct manner. Mode decomposition was originally applied by Aubry et al (1988 J. Fluid Mech. 192 115-73) in the context of wall boundary-layer turbulence. It is specifically designed to represent coherencies in a compact manner by a low-dimensional dynamical system. Their original proposal adopts the proper orthogonal decomposition (empirical orthogonal functions) as the mode-decomposition basis. However, the methodology can easily be generalized to any decomposition basis. Among those, the wavelet is a particularly attractive alternative. The mass-flux formulation currently adopted in the majority of atmospheric models for parameterizing convection can also be considered a special case of mode decomposition, adopting segmentally constant modes for the expansion basis. This perspective further identifies a very basic but also general geometrical constraint imposed on the mass-flux formulation: the segmentally-constant approximation. Mode decomposition can, furthermore, be understood by analogy with a Galerkin method in numerical modeling. This analogy suggests that subgrid parameterization may be re-interpreted as a type of mesh refinement in numerical modeling. A link between the subgrid parameterization and downscaling problems is also pointed out.


Introduction
A global infrared image from a geostationary satellite (figure 1) summarizes the rich morphologies of atmospheric phenomena with a wide range of horizontal scales involved. In particular, one finds extensive convective activity over the tropics, as suggested by the cloud distribution. These convective activities involve a wide spectral range. The smallest scale corresponds to individual cumulus convective towers (or hot towers: Riehl and Malkus 1958, Yano 2009) that are only marginally resolved in this image. These individual convective elements are organized into the mesoscale, as suggested by reddish clusters in the image. These mesoscale organizations are further organized into a planetary scale, as suggested by a modulation of this clustering. The best-known example of such planetary-scale coherency is the Madden-Julian oscillation (see Zhang 2005). As a whole, the image suggests that this wide spectral range of convective cloud variability involves intensive interactions crossing the scales (see Yano 1998, Moncrieff 2010). Thus, it is most likely that even for simulating the planetary-scale Madden-Julian oscillation, the contribution of individual convective towers must, somehow, be taken into account, as commonly assumed (but see Yano et al 2009, Delayen and Yano 2009 for a complementary view).
However, these individual convective towers (hot towers) are hardly resolved by a typical global model. This situation can be inferred by taking the longitude-latitude mesh superposed on the image as corresponding to a numerical model mesh. As a result, the contribution of those convective towers to the global tropical dynamics must somehow be accounted for in an indirect, parametric manner. This problem is called parameterization (see Plant and Yano 2015).
More often than not, the subgrid-scale parameterization is considered a highly technical problem consisting mostly of details and, a typical prejudice would say, mostly of tuning. Randall et al (2003) even conclude that it is in 'deadlock' (see also Randall 2013). The present review, on the other hand, sees this problem from a more general formulational perspective, and asks: how can we approach the parameterization problem in a general, consistent manner?

What is consistency?: basic issues of the parameterization problem
The notion of consistency is used in scientific discourse in various manners, but here we focus on matters of logical consistency. Mathematics is constructed upon such logical rigor. Any theoretical development in physics is also expected to follow this rule. However, an additional dimension emerges here, because these theoretical developments must also be consistent with the laws of physics for physical consistency. This is probably a main difference between mathematics and physics: mathematics has the freedom to choose a set of axioms as a starting point, but physics must start from the known laws of physics.
This being said as a matter of principle, the truth is that we hardly know all the laws of atmospheric physics (and chemistry). Cloud physics is a good example, with many processes still to be fully understood. However, this caveat does not change the basic principle that we should begin any theoretical development from known physical laws, as best as we can. It is the only robust manner of approaching the subgrid-scale parameterization problem. Needless to say, the physical laws constitute a set of partial differential equations (PDEs) from an applied-mathematical point of view, which we may simply call a system.
As the name suggests, the basic idea of parameterization is to represent the subgrid-scale processes in a parametric manner. The notion of 'parametric' may be rather ambiguous. However, the word suggests a drastic simplification of the original full physics, so that we can no longer say that those physics are described explicitly, but rather that they remain implicit in the formulation (see Ooyama 1982).
In order to develop a parameterization consistently, a certain strategy is required so that the full physical system (as known within our knowledge) is reduced to a parametric representation. A strategy must be logical so that a parameterization is derived in a consistent manner. In the present review, we identify three general approaches for reducing a full physical system to a parametric representation, as introduced in the next subsection.
However, reduction of a full system into a parameterization does not work like the derivation of a theorem from a set of axioms in mathematics. No logically, completely self-consistent reduction to a drastically simple representation is possible for many complex physical systems. Rather, certain approximations and hypotheses must inevitably be introduced.
Thus, in order to maintain consistency of a parameterization development, a third element becomes of practical importance: write down carefully any approximations and hypotheses introduced in the process of reducing a system to a parameterization, so that we can keep track of them. If all the published literature on subgrid-scale physical parameterizations followed this simple rule in a very careful manner, we would be in much less trouble with the parameterization problem.
Even with those various approximations and hypotheses, parameterization must be universally applicable. At least, it must be applicable to all the parameter ranges expected in our climate system. Such universality is particularly important in the context of the climate change predictions: in order to properly predict a climate change, the subgrid-scale parameterizations developed and verified under the present climate must also be valid under a modified climate state.
For this reason, these approximations and hypotheses must be carefully chosen, and certain universality must be known. Unfortunately, that is not always the case, and the range of their validity is often hard to estimate. In short, in order to introduce the approximations and hypotheses, the whole problem must be well-posed: however, how can we achieve this goal? This is a key question of this review.
Here, an important piece of wisdom would be to minimize the introduction of approximations and hypotheses as much as possible (see section 8.1). As Albert Einstein is supposed to have said, 'Everything should be made as simple as possible, but not simpler.' In other words, keep a parameterization as simple as possible, and minimize the number of tunable free parameters as much as possible. More emphatically stated: never invent an equation. Unfortunately, this is easier said than done: many operational developments are performed without following this simple rule.
Many physical processes are so complex that researchers often simply give up any such self-consistent deduction procedure, and simply write down an equation that looks empirically correct. Many parameterizations have been elaborated over the years in this manner, especially at meteorological operational research centers. Parameters introduced in association with these elaborations are tuned to give the best forecast. These elaborations are typically incremental: whenever a new parameter is introduced, it is often tuned while all the old ones are retained as they are.
Also note that these elaborations are, more often than not, simply built upon an existing scheme, in a rather ad hoc manner, without asking about its consistency. These elaborations are often made with the good intention of making a parameterization more realistic, but forgetting the key point that only a parametric representation is intended, in which a physical process must remain implicit. These rather ad hoc elaborations can also change the meanings of the original parameters, and re-tuning of them may be required as well (see section 6.3).
These modifications tend to be ad hoc also in the sense that a new element is added without asking how the original scheme is formulated deductively from the full physics. A more proper procedure to add a new process is to perform a deductive derivation from the beginning again. A goal of the present review is to suggest how such a procedure is possible in a more concrete manner.

Plan of the review
As for the basic strategies for developing subgrid-scale physical parameterizations, three possibilities are considered in the present review: (1) moment-based approaches, (2) approaches based on the distribution of subgrid-scale variables, and (3) mode decomposition.
The moments can be considered a type of expansion series of a physical variable, with the first three corresponding to the mean, variance, and skewness. The expansion by moments provides a systematic framework for parameterizations. Alternatively, the problem of parameterization can be reduced to that of defining a distribution of physical variables at the subgrid scale. The third possibility, mode decomposition, was proposed by Yano et al (2005) as an alternative general approach. It contains the mass-flux convection parameterization as a special case. The latter is the standard formulation for convection parameterization currently adopted by the majority of operational and climate models.
These three possibilities are discussed in sections 3-5, after a more formal statement of the subgrid-scale parameterization problem in the next section. The mass-flux convection parameterization is discussed in section 6 as a special case of mode decomposition. The moment-based approach receives the tersest attention, whereas mode decomposition receives the most, with the distribution-based approach in the middle. This choice is made because the mode-decomposition approach can elucidate wider contexts of the parameterization problem and its link to other problems, especially adaptive mesh refinement (AMR) and downscaling, as further discussed in section 7.
In order to maintain wider perspectives on the parameterization problem, turbulence research in general, and more specifically turbulent combustion modeling (see Bilger 1989, Veynante and Vervisch 2002, Bilger et al 2005, Pitsch 2006), is taken as a point of reference. Combustion is associated with complex chemical reactions, which may be compared to the complexity of cloud microphysics (see Khain et al 2015) in the climate system. Furthermore, radiative heat transfer plays an equally important role in combustion processes as in the climate system. Thus a close analogy between the climate system and turbulent combustion may be drawn in terms of the degree of complexity of the modeling. Most importantly, both systems are associated with turbulent flows. As a result, both systems face an equally formidable subgrid-scale parameterization problem.
The present review does not intend to be comprehensive. Probably, a monograph is required in order to accomplish that goal (see Plant and Yano 2015). References are also highly selective, and not all the materials that even I personally consider important are covered. The main purpose is, rather, to seek a general, consistent formulation for the whole atmospheric subgrid-scale parameterization problem under the principles outlined in section 1.1. Thus, the mathematical formulational structure of the problem becomes the chief concern, including the basic strengths and weaknesses of these three approaches. Many remarks in the following may be rather obvious to specialists but are rarely stated explicitly in the literature. General discussions are further extended in section 8 before concluding the paper in section 9.
The paper does not consider any specific phenomena, but rather asks the question of formulating the problem in a general and consistent manner as a whole. As a result, phenomenological issues receive the scantiest attention. From the latter perspective, the present review may be considered in stark contrast with Arakawa (2004), which takes more phenomenologically based operational perspectives.

Statement of the problem
In the present review, we exclusively work on a single generic prognostic equation for any physical variable designated by $\varphi$, which may be given by

$\frac{\partial}{\partial t}(\rho\varphi) + \nabla\cdot(\rho\mathbf{u}\varphi) + \frac{\partial}{\partial z}(\rho w\varphi) = \rho F.$   (2.1)

Here, $\mathbf{u}$ is the horizontal wind component, $w$ the vertical velocity, and $\rho$ the density, and $F$ designates a total source term for $\varphi$. The problem of the parameterization of the subgrid-scale physical processes arises because atmospheric models have only a limited horizontal resolution. The easiest way of seeing the issue is to take the interpretation that a single grid-point value within a model represents a grid-box mean value, say, $\bar\varphi$, where the overbar designates an average over a grid box. By taking an average over a grid box, the generic prognostic equation (2.1) reduces to

$\frac{\partial}{\partial t}(\rho\bar\varphi) + \nabla\cdot(\rho\bar{\mathbf{u}}\bar\varphi) + \frac{\partial}{\partial z}(\rho\bar w\bar\varphi) = \rho Q.$   (2.2)

The right-hand side, $Q$, is called the apparent source by Yanai et al (1973), which is defined in terms of the subgrid-scale variables as

$Q = \bar F - \frac{1}{\rho}\left[\nabla\cdot(\rho\overline{\mathbf{u}'\varphi'}) + \frac{\partial}{\partial z}(\rho\overline{w'\varphi'})\right],$   (2.3)

where the prime designates a deviation from the grid-box mean (e.g. $\varphi' = \varphi - \bar\varphi$). Note that no time average is explicitly taken here, by the conventional wisdom that the time scale would also be scaled by the spatial scale, though this is not always true.
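The origin of the eddy-flux contribution to the apparent source can be made explicit with one worked averaging step (a standard Reynolds-type manipulation, assuming the grid-box average satisfies $\overline{\bar{a}b} = \bar{a}\bar{b}$ and $\overline{a'} = 0$):

```latex
\overline{w\varphi}
  = \overline{(\bar{w} + w')(\bar{\varphi} + \varphi')}
  = \bar{w}\bar{\varphi}
    + \underbrace{\bar{w}\,\overline{\varphi'} + \overline{w'}\,\bar{\varphi}}_{=\,0}
    + \overline{w'\varphi'}
  = \bar{w}\bar{\varphi} + \overline{w'\varphi'}\,.
```

The surviving correlation, $\overline{w'\varphi'}$, is precisely the vertical eddy-flux term that must be parameterized.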
The goal of the subgrid-scale parameterization is to seek a closed expression for the apparent source, $Q$, in terms of the known resolved-scale variables (grid-box averaged quantities, $\bar\varphi$). Of course, this is a nontrivial problem, because by design a numerical model deals only with the grid-averaged quantities. The subgrid-scale fluctuation, $\varphi'$, is totally implicit, being estimated in a parametric manner.
More precisely, the subgrid-scale 'estimation' problem consists of two parts: (1) estimation of the subgrid-scale fluctuation (distribution), $\varphi'$, itself, and (2) a procedure for evaluating the apparent source term, $Q$. The first part, especially when the spatial distribution of a physical variable is concerned, is often dealt with separately as downscaling (see Maraun et al 2010): see section 7.2 for more. The second part is the core of parameterization.
In defining the parameterization problem as above, closely following a classical derivation by Yanai et al (1973), we have talked as if a grid box literally exists within a numerical model. However, this is strictly true only if a finite volume approach (see LeVeque 2002) is taken. In most of the numerical approaches, the concept of a grid box, in fact, does not exist. A more formal approach for defining the 'grid-box' average in turbulence studies is based on filtering (see Leonard 1974: see section 7.3 for more). An alternative, more mathematically elucidating approach is a multiscale asymptotic expansion (see Majda 2007).
The parameterization problem also takes a different flavor depending on the dominant term in the apparent source, $Q$. In many atmospheric turbulence problems, the dominant term is the vertical eddy transport, $\rho\overline{w'\varphi'}$. On the other hand, in deep moist convection, the diabatic heating, $F$, associated with the condensation of water vapor becomes a key term to be determined. To some extent, the approach to take depends on the nature of the apparent source term to be parameterized.

Moment-based approach
Description based on moment expansion is a standard approach in turbulence research. For this reason, the subgrid-scale boundary-layer turbulence is also often formulated in this manner (see Stull 1988, Garratt 1992). Many statistical descriptions of turbulent flows also follow this formulation (e.g. Tatsumi 1980). Here, an nth moment is formally defined by $\langle\varphi^n\rangle$ for any physical variable, $\varphi$, for $n = 1, 2, \ldots$, although more often than not in practice a deviation from the mean is considered by taking $\langle\varphi'^n\rangle$ instead, as presented below. Furthermore, cross variances such as vertical transport rates become important, as seen immediately below. Here, $\langle\ \rangle$ designates an average over a given grid box for the present purpose, equivalent to the overbar in the last section. The averaging operator can be interpreted in various different manners, as already suggested.
A prognostic equation for the nth moment is, in its general form, given by

$\frac{\partial}{\partial t}\langle\varphi'^n\rangle + \nabla\cdot(\langle\mathbf{u}\rangle\langle\varphi'^n\rangle) + \frac{1}{\rho}\frac{\partial}{\partial z}(\rho\langle w\rangle\langle\varphi'^n\rangle) = -\nabla\cdot\langle\mathbf{u}'\varphi'^n\rangle - \frac{1}{\rho}\frac{\partial}{\partial z}(\rho\langle w'\varphi'^n\rangle) + n\langle\varphi'^{n-1}F'\rangle + \cdots,$   (3.1)

where the ellipsis stands for additional correlation terms. It transpires that this set of equations constitutes an expansion series towards higher orders. The higher-order moments in concern include the correlation terms with the velocity, $\langle\mathbf{u}'\varphi'^n\rangle$ and $\langle w'\varphi'^n\rangle$. The prognostic equations for these quantities are not explicitly stated here for the sake of brevity. An obvious consequent problem is that this series of equations is not closed: higher-order moments are always required in order to compute the temporal tendency of a moment at a given order. In order to truncate this infinite expansion series at a finite order, a certain set of assumptions must be introduced, which is called the closure. Thus, the closure turns out to be a key detail to be resolved under the moment-based approach.
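To make the closure problem concrete, consider the variance ($n = 2$) budget; retaining only the vertical-transport, gradient-production, and source terms (a schematic form, omitting the horizontal terms):

```latex
\frac{\partial}{\partial t}\langle\varphi'^2\rangle
  = -\,\frac{1}{\rho}\frac{\partial}{\partial z}\!\left(\rho\,\langle w'\varphi'^2\rangle\right)
    \;-\;2\,\langle w'\varphi'\rangle\,\frac{\partial\bar{\varphi}}{\partial z}
    \;+\;2\,\langle\varphi' F'\rangle\,.
```

The vertical flux of variance, $\langle w'\varphi'^2\rangle$, is a third-order moment; its own budget involves fourth-order moments, and so on indefinitely, which is why the hierarchy never closes by itself.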
The simplest choice would be to truncate the expansion at the leading order. In this case, the eddy transport rates, $\langle\mathbf{u}'\varphi'\rangle$ and $\langle w'\varphi'\rangle$, must somehow be given in a closed form. The simplest choice is to assume that they are directed down-gradient, with the transport rate simply proportional to the gradient. Thus

$\langle\mathbf{u}'\varphi'\rangle = -\kappa_h\nabla\bar\varphi,$   (3.2a)
$\langle w'\varphi'\rangle = -\kappa_v\frac{\partial\bar\varphi}{\partial z},$   (3.2b)

where $\kappa_h$ and $\kappa_v$ are the eddy diffusion coefficients for the horizontal and vertical directions, respectively. This closure is very intuitive, because it draws a direct analogy with molecular diffusion. One of the earliest in-depth discussions of the eddy-diffusion concept in atmospheric modeling is found in Richardson (1926). Next-order closures in the context of boundary-layer turbulence modeling are considered by Mellor and Yamada (1974, 1982). Deardorff (1978), in turn, examines the physical consistency of the first three orders of closure by taking a Gaussian plume model.
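The down-gradient closure (3.2b) can be illustrated with a minimal numerical sketch (all profiles and the value of the eddy diffusivity below are hypothetical, chosen purely for illustration): diffuse a mean scalar profile vertically with a constant eddy diffusivity using an explicit flux-divergence step, which smooths gradients while conserving the column integral.

```python
import numpy as np

def eddy_diffusion_step(phi, kappa_v, dz, dt):
    """One explicit step of d(phi)/dt = d/dz (kappa_v d(phi)/dz),
    i.e. the down-gradient closure <w'phi'> = -kappa_v d(phi)/dz
    with constant eddy diffusivity and zero-flux boundaries."""
    # eddy flux at interior cell interfaces
    flux = -kappa_v * np.diff(phi) / dz
    # no flux through the bottom and top boundaries
    flux = np.concatenate(([0.0], flux, [0.0]))
    # flux-divergence update conserves the column integral of phi
    return phi - dt * np.diff(flux) / dz

# hypothetical initial profile: a sharp scalar anomaly mid-column
z = np.linspace(0.0, 1000.0, 101)            # height (m)
dz = z[1] - z[0]
phi = np.where(np.abs(z - 500.0) < 100.0, 1.0, 0.0)
kappa_v = 10.0                                # m^2/s, illustrative value
dt = 0.4 * dz**2 / kappa_v                    # respects the explicit stability limit

for _ in range(200):
    phi = eddy_diffusion_step(phi, kappa_v, dz, dt)
```

The sharp anomaly spreads down-gradient in both directions, exactly as the molecular-diffusion analogy suggests, while the column integral of the scalar is unchanged.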
Here, we already see a basic dilemma of parameterization development: the lower-order truncations are preferred in order to keep the formulation simple, as well as to keep the number of unknown parameters to a minimum. However, a certain degree of physical reality must also be maintained. From the latter perspective, the higher-order truncations are always preferred. However, as we proceed to higher orders, the closure problem becomes more involved, and more unknown parameters must be introduced.
A short survey of the literature would reveal that the choice of a closure is rather a matter of art than solid science. More often than not, a down-gradient transport, as in equations (3.2a) and (3.2b), is assumed in order to ensure the stability of a system. Thus, whatever order we take for a closure, effectively we cannot get away from the physically not-well-founded notion of eddy diffusivity (down-gradient transport). Traditionally, furthermore, various symmetries are introduced for further simplification (see Rotta 1951, Batchelor 1970). However, beyond this, it is hard to identify any clear, general rules for the closures. In this respect, the energy-cycle principle recently developed by Zilitinkevich et al (2007, 2008, 2009, 2013) is one of the promising developments: in order to establish a well-defined closure of a system, a complete energy cycle must be considered rather than focusing only on the turbulent kinetic energy, as traditional approaches do. The key to further pursuing a healthy development of the moment-based approach is to maintain such physical insights in the labyrinth of the closure issues.
A more fundamental difficulty associated with the moment-based approach is that it has principally been developed for conservative scalar variables, with a major focus on the vertical eddy flux, $\rho\overline{w'\varphi'}$. However, when a variable is no longer conserved, a contribution of the source term, $\langle\varphi'^{n-1}F'\rangle$, must also explicitly be taken into account. The contribution of the nonhydrostatic pressure in the moment equation is the simplest example, whose treatment is already far from trivial (see Mironov 2001, 2009). More difficulties are expected in handling more inherently nonconservative processes such as cloud microphysics (see Khain et al 2015). For this reason, in discussing the description of nonconservative processes under the moment-based approach, Mironov (2009) quotes mostly from studies based on the DDF approach, to be discussed in the next section.

Subgrid-scale distribution
An alternative general perspective for approaching the problem of describing subgrid-scale physical processes is to consider it as that of defining a distribution of a set of variables over a given grid box. Once this distribution is defined, the apparent source term (equation (2.3)) can readily be evaluated. That is the basic idea of the distribution-based approach.
Here, before we discuss the actual approach, a short remark on semantics is warranted, because this distribution is often called the probability density function (PDF) for a misleading reason. There is no probability notion behind this distribution-based description: neither uncertainty (in the Bayesian sense: Jaynes 2003, Gregory 2005) nor stochasticity (physical randomness: Horsthemke and Lefever 1984, Gardiner 1985, Paul and Baschnagel 1999) is in concern. In other words, the description is completely deterministic. For this reason, we propose to call it the 'subgrid-scale distribution', and the function to be determined the distribution density function (DDF). This proposal is along the line of the proposal for the concept of the filtered density function (FDF) in the context of large-eddy simulation (LES) by Pope (1990). Clearly, the concept of DDF is more general than FDF. Unfortunately, however, the acronym PDF nevertheless appears in the following due to certain established terminologies.

A formal approach for DDF: Liouville equation
Many studies already exist adopting the subgrid-scale distribution as a basis for constructing a parameterization, especially for total water and related quasi-conservative quantities for cloud parameterizations (Mellor 1977, Sommeria and Deardorff 1977, Bougeault 1981, Le Treut and Li 1991, Bechtold et al 1992, Richard and Royer 1993, Bony and Emanuel 2001, Golaz et al 2002, Tompkins 2002). Unfortunately, however, a clear basic principle is often not stated in the atmospheric literature. This situation is in contrast with the state of the art of turbulent combustion studies. In particular, Pope (1985) carefully reviews the various basic principles for subgrid-scale distribution descriptions.
In order to present the most basic principle among those, we rewrite the basic physical equation (2.1) in terms of the local time evolution of $\varphi$:

$\frac{\partial\varphi}{\partial t} = S,$   (4.1)

where $S$ designates the total source term. A general formulation exists for describing the time evolution of a distribution, $p(\varphi)$, of this variable at a given model grid point. Here, a grid box is considered a point with respect to the resolved large-scale processes. This interpretation is best understood when two coordinates are introduced for describing the resolved and the subgrid scales separately, as under a multiscale asymptotic expansion (see Majda 2007).
In particular, when the total source term, $S$, does not involve any spatial derivatives (thus the advection effects must be neglected, or parameterized as local processes), its evolution is defined by the well-known Liouville equation (see Landau and Lifshitz 1980):

$\frac{\partial p}{\partial t} + \frac{\partial}{\partial\varphi}(Sp) = 0.$   (4.2)

Note that this equation has a simple physical interpretation: in the phase space of $\varphi$, the distribution is conserved following its movement given by $S$. In other words, the equation is understood in analogy with the mass conservation law: here, the distribution is conserved in place of the mass. The Liouville equation is relatively straightforward to solve when the source term on the right-hand side is given purely in terms of a single point, as is often the case with many microphysical processes (with a major exception of sedimentation). Unfortunately, this fact is not widely appreciated (see Ehrendorfer 1994). The effort for computation of equation (4.2) is equivalent to bin calculations of the same microphysical process (see Khain et al 2015) when only a single-variable distribution is considered. A bin approach for microphysics may even be considered a special case of the Liouville equation. Thus, the real question is not whether the Liouville equation is a practical approach, but a matter of priority: whether we should explicitly compute a subgrid-scale distribution or a size distribution of hydrometeors. Possibilities of more extensively introducing DDF in LES for modeling turbulent combustion flows are discussed favorably in Pitsch (2006).
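A minimal numerical sketch of the Liouville equation (4.2) may clarify how straightforward the single-point case is (the relaxation source $S = -\varphi/\tau$ and all parameter values below are hypothetical, for illustration only): advance the distribution with a conservative upwind scheme and watch it drift towards $\varphi = 0$ while the total probability stays exactly conserved.

```python
import numpy as np

def liouville_step(p, phi, S, dt):
    """One conservative upwind step of the Liouville equation
    dp/dt + d(S p)/dphi = 0 on a uniform phi-grid (finite volume)."""
    dphi = phi[1] - phi[0]
    # phase-space "velocity" S at cell interfaces
    S_face = 0.5 * (S[:-1] + S[1:])
    # upwind flux: take p from the cell the phase-space flow comes from
    flux = np.where(S_face > 0.0, S_face * p[:-1], S_face * p[1:])
    # no flux through the domain boundaries
    flux = np.concatenate(([0.0], flux, [0.0]))
    return p - dt * np.diff(flux) / dphi

# hypothetical relaxation source S = -phi / tau: drags the whole
# distribution towards phi = 0 on the time scale tau
tau = 1.0
phi = np.linspace(-4.0, 4.0, 201)
dphi = phi[1] - phi[0]
S = -phi / tau

# initial distribution: Gaussian centred at phi = 2, normalized on the grid
p = np.exp(-0.5 * (phi - 2.0) ** 2) / np.sqrt(2.0 * np.pi)
p /= p.sum() * dphi

dt = 0.2 * dphi / np.abs(S).max()      # CFL-limited time step
for _ in range(500):                    # integrate to t = tau
    p = liouville_step(p, phi, S, dt)
mean = (p * phi).sum() * dphi
```

Because the update is in flux form with zero boundary fluxes, the discrete integral of $p$ is conserved to round-off, mirroring the mass-conservation analogy noted above; the mean decays roughly as $e^{-t/\tau}$, as the exact solution dictates.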
Computation of joint distributions can also be performed in the same manner, but the computational cost increases rapidly as more variables are considered for a joint distribution. For this reason, the best compromise would be to focus on single-variable distributions, or joint distributions of at most a couple of variables.
However, once the total source, $S$, is allowed to contain a spatial derivative (e.g. advection), the formulation suddenly becomes intractable. An exact formulation for evaluating the time evolution of the distribution is still available, given e.g. by equation (14) of Larson (2004), with its full derivation given e.g. by Pope (1985) and Klimenko and Bilger (1999). However, performing this computation becomes impractically demanding, because a full joint distribution over all spatial points must somehow be evaluated.
Thus, in summary, the Liouville equation (4.2) is a powerful principle for evaluating time evolution of subgrid-scale distribution when the physical processes are described solely in terms of a single physical point. However, once a spatial dependence begins to play a role, for example by advection (transport), the computation of the distribution suddenly becomes intractable.

Assumed PDF (DDF)
A shortcut for overcoming the difficulty of using the Liouville equation is to assume a certain fixed distribution form with a limited number of free parameters. This approach is called the assumed PDF (DDF) in the literature. One of the earliest applications to turbulence is by Lockwood and Naguib (1975). See also an earlier review by Pope (1979). One of the first applications of this approach to atmospheric modeling is Golaz et al (2002).
A starting point of the assumed-PDF approach is to notice a close link between the distribution function, $p(\varphi')$, and the moments, $\langle\varphi'^n\rangle$. The moments can readily be evaluated once a distribution is known:

$\langle\varphi'^n\rangle = \int \varphi'^n\, p(\varphi')\,\mathrm{d}\varphi',$   (4.3)

where the integral is performed over the full range expected for the given physical variable, $\varphi'$. We intuitively expect the reverse to also be true: once all the moments are known, the distribution can be perfectly defined. Unfortunately, however, such a series expression for a distribution in terms of its moments, in analogy with the Taylor expansion, is not known.
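The moment relation (4.3) is trivial to evaluate numerically; a purely illustrative sketch discretizes a unit Gaussian and recovers its low-order moments by quadrature.

```python
import numpy as np

# discretized distribution: a standard Gaussian in the deviation phi'
phi = np.linspace(-8.0, 8.0, 1601)
dphi = phi[1] - phi[0]
p = np.exp(-0.5 * phi**2) / np.sqrt(2.0 * np.pi)

def moment(n):
    """n-th moment <phi'^n> = integral of phi'^n p(phi') dphi' (equation (4.3))."""
    return (phi**n * p).sum() * dphi

m1, m2, m3, m4 = moment(1), moment(2), moment(3), moment(4)
# for a unit Gaussian: mean 0, variance 1, skewness 0, flatness 3
```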
The assumed-PDF (DDF) approach takes advantage of this link. More precisely, note that if a distribution is restricted to a certain category with a finite number of free parameters, a set of moments defined by equation (4.3) can be used to specify the distribution perfectly by determining the free parameters. The time evolution of the given set of moments can readily be computed based on our vast knowledge of moment-based approaches (see section 3). The estimated tendency for the moments, in turn, defines the tendency for the distribution. For example, Golaz et al (2002) develop this idea by assuming a double-Gaussian distribution for temperature, moisture, and liquid water.
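The spirit of the approach can be sketched as follows (a hypothetical single-Gaussian illustration, deliberately simpler than the double Gaussian of Golaz et al, with made-up numbers): given the first two moments of total water supplied by a moment model, fix a Gaussian DDF and diagnose the cloud fraction as the probability mass above saturation.

```python
import math

def cloud_fraction_gaussian(qt_mean, qt_var, q_sat):
    """Assumed-PDF diagnosis: fix the DDF of total water q_t to a Gaussian
    with the given mean and variance, then return the cloud fraction as
    the fraction of the distribution exceeding the saturation value q_sat."""
    sigma = math.sqrt(qt_var)
    # P(q_t > q_sat) for a Gaussian, via the complementary error function
    return 0.5 * math.erfc((q_sat - qt_mean) / (sigma * math.sqrt(2.0)))

# hypothetical grid-box values (kg/kg): mean one standard deviation
# below saturation, giving a partially cloudy grid box
frac = cloud_fraction_gaussian(qt_mean=8.0e-3, qt_var=1.0e-6, q_sat=9.0e-3)
```

Once the moment equations update the mean and variance, the diagnosed fraction updates with them; this is exactly the sense in which the moment tendencies define the tendency of the assumed distribution.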
Since the moment-based approach is best developed for describing eddy transport, the assumed-PDF approach also works best when the evolution of a distribution due to eddy transport is in concern. However, when the process is nonconservative, involving especially cloud microphysical processes, the moment-based approach works less well, as already remarked, and thus the assumed-PDF approach suffers from the same problem.
Various already-proposed subgrid-scale distribution schemes may be re-interpreted from the point of view of the assumed-PDF approach. In particular, the cloud-fraction scheme by Tompkins (2002) can essentially be re-interpreted as a type of assumed-PDF approach, taking the first three moments of the total water to determine an assumed distribution form. However, these three moments are treated in a less consistent manner in his approach.

Time-splitting method
As the last two subsections show, there are two principal methods for evaluating the subgrid-scale distribution: the Liouville principle and the assumed PDF. The Liouville principle works best when all the processes are locally described, whereas the assumed-PDF approach works best when only the transport processes are in concern. The advantages of both methods can be combined by evaluating the temporal tendencies for the local physics and for the transport (advection) in a time-split manner:

$\frac{\partial p}{\partial t} = \left(\frac{\partial p}{\partial t}\right)_{\mathrm{physics}} + \left(\frac{\partial p}{\partial t}\right)_{\mathrm{transport}}.$   (4.4)

Here, the local physical tendency, $(\partial p/\partial t)_{\mathrm{physics}}$, is evaluated based on the Liouville principle, whereas the tendency $(\partial p/\partial t)_{\mathrm{transport}}$ due to the transport is evaluated from the tendencies of the required moments, $\{(\partial\langle\varphi'^n\rangle/\partial t)_{\mathrm{transport}}\}$, due to the same transport process. The sum of the two tendencies provides the total temporal tendency for the subgrid-scale distribution. The time-splitting approach proposed here for evaluating the subgrid-scale distribution is commonly used in many numerical problems (see chapter 17, LeVeque 2002).
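The splitting mechanism itself can be sketched generically (a toy scalar example with made-up coefficients, not an atmospheric scheme): advance d y/d t = a y + b y by applying a 'physics' operator and a 'transport' operator sequentially over each step, each sub-step solved exactly for its own operator.

```python
import math

def lie_split_step(y, dt, apply_physics, apply_transport):
    """One first-order (Lie) splitting step: advance with the 'physics'
    operator over dt, then with the 'transport' operator over dt."""
    return apply_transport(apply_physics(y, dt), dt)

# toy tendencies: dy/dt = a*y ("physics") + b*y ("transport")
a, b = -1.0, -0.5
physics = lambda y, dt: y * math.exp(a * dt)
transport = lambda y, dt: y * math.exp(b * dt)

y, dt, n_steps = 1.0, 0.01, 100
for _ in range(n_steps):
    y = lie_split_step(y, dt, physics, transport)

# here the two operators commute, so the splitting is exact
exact = math.exp((a + b) * dt * n_steps)
```

In this commuting toy case the split solution matches the exact one to round-off; for non-commuting operators, first-order splitting carries an O(dt) error per unit time, which symmetrized (Strang) splitting reduces to O(dt^2).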
Another closely related analogue would be the transform spectral approach developed for global atmospheric modeling (Bourke 1972, 1974). Under this approach, all the physical tendencies are computed in physical space, whereas the advection (and transport) is evaluated in spectral space. By the same token, the proposal here is to compute the physical tendency directly with the Liouville equation, and to evaluate the advection tendency in a moment phase space.
Under this time-splitting approach, a joint distribution between a prognostic variable in concern, $\varphi'$, and the vertical velocity, $w$, clearly becomes important. Conversely, such a time-split approach would also be necessary under the moment-based approach in order to take the nonconservative processes into account, as suggested in section 2.5 of Mironov (2009).
An interesting variant of the time-split method introduced above is adopted by Thayer-Calder et al (2015). Here, the microphysics is evaluated more directly by subdividing a model grid column into subcolumns (ten by their default). To some extent, the subcolumn coordinate plays the same role as the variable coordinate in the Liouville equation in this case. Note that the idea of subdividing a model grid column is reminiscent of the segmentally constant formulation to be introduced in the next section.

Pope (1994), in his review on the DDF-based methods, makes the rather strong statement that the assumed-PDF approaches should not be considered DDF (PDF) based, because by design they no longer use a DDF. He, in turn, focuses on a promising possibility of pursuing a DDF calculation under a Lagrangian framework, along the coordinates moving with the fluid particles. As a result, the advection term, which must be added to the Liouville equation under the Eulerian framework, can be removed, and computations are much facilitated. The required particle trajectories may be evaluated with use of the Langevin equation in combination with a Monte Carlo approach.
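The Lagrangian Monte Carlo strategy can be sketched with a minimal particle ensemble whose velocities obey a Langevin (Ornstein–Uhlenbeck) equation. The time scale and variance below are assumed values for illustration, not parameters from the literature.

```python
import numpy as np

# Sketch: Lagrangian Monte Carlo evaluation of a velocity distribution.
# Each notional fluid particle carries a velocity w obeying the Langevin
# equation dw = -(w/tau) dt + sqrt(2 sigma^2 / tau) dW, whose stationary
# distribution is Gaussian with variance sigma^2.  No Eulerian advection
# term is needed: particles are simply moved with their own velocity.

rng = np.random.default_rng(0)
npart, dt, nsteps = 20000, 0.01, 2000
tau, sigma = 1.0, 0.5        # assumed relaxation time and velocity scale

w = np.zeros(npart)          # particle velocities
z = np.zeros(npart)          # particle positions (trajectories)
for _ in range(nsteps):
    dW = rng.normal(0.0, np.sqrt(dt), npart)
    w += -(w / tau) * dt + np.sqrt(2 * sigma**2 / tau) * dW
    z += w * dt              # advect each particle with its own velocity

mean_w = w.mean()            # should stay near 0
var_w = w.var()              # should approach sigma**2 = 0.25
```

The ensemble statistics of the particles approximate the PDF directly, which is the computational advantage Pope emphasizes.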

Alternative approaches
Another important variant of the DDF-based approaches is to take a conditional DDF. Especially in combustion modeling, a distribution of variables under a given chemical state becomes important. The conditional distribution facilitates predicting the most likely physical state under a given chemical state. The subgrid-scale modeling based on this methodology, called conditional moment closure (CMC), is carefully reviewed by Klimenko and Bilger (1999). More recent developments with CMC are found in Veynante and Vervisch (2002) and Bilger et al (2005). The multiple mapping conditioning (Chen et al 1989, Pope 1991, Klimenko and Pope 2003) may be considered an important recent development as an extension of CMC. These more sophisticated methodologies pursued in turbulent combustion studies are likely to be useful in atmospheric modeling as well.
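The central idea of conditioning can be sketched with synthetic data: instead of carrying the full joint distribution, one carries the mean of a physical variable conditioned on a "chemical" variable. The variable names and the functional relation below are assumptions for illustration only.

```python
import numpy as np

# Sketch of the idea behind conditional moment closure (CMC): estimate
# the conditional mean <phi | c> by binning samples on the conditioning
# variable c.  Synthetic samples with phi = c**2 + noise stand in for a
# turbulent reacting field (an assumed toy relation).

rng = np.random.default_rng(1)
c = rng.uniform(0.0, 1.0, 50000)          # "chemical state" samples
phi = c**2 + rng.normal(0.0, 0.05, c.size)

nbins = 20
edges = np.linspace(0.0, 1.0, nbins + 1)
idx = np.clip(np.digitize(c, edges) - 1, 0, nbins - 1)
cond_mean = np.array([phi[idx == k].mean() for k in range(nbins)])
centers = 0.5 * (edges[:-1] + edges[1:])

# the binned conditional mean tracks the underlying relation <phi|c> = c**2
err = np.max(np.abs(cond_mean - centers**2))
```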

Basic principles
Both the moment and the DDF-based approaches, reviewed in the last two sections, focus mostly on the local distribution of variables in the subgrid scales. An implicit premise behind both approaches is that there is no significant spatial structure in the subgrid-scale processes. In other words, the subgrid-scale processes must be reasonably 'disorganized', because otherwise a spatial correlation must be included into the statistics in order to account for the tendency for organization. Unfortunately, little is known about how much structure must be present before a spatial correlation needs to be included into these approaches. Thus, the sheer existence of coherencies does not automatically exclude the applicability of these two approaches (see Wyngaard 1997).
In general, when the subgrid-scale processes contain substantial spatial organization, a different approach may be taken, such as the mode-decomposition approach. Note that many of the subgrid-scale processes in the atmosphere contain clearly organized structures: boundary-layer wakes (cold pools), plumes, convective towers, mesoscale organization. Coherencies may be considered a fundamental ingredient of turbulent flows (see Townsend 1956, Robinson 1991, chapter 2 of Holmes et al 1996).

The mode-decomposition approach is specifically designed for representing these organized structures in the subgrid scales. Its first step is to choose a mode set for expanding the given full physical system. In principle, the mode set could be anything (e.g. Fourier, wavelet, EOF), though certain conditions (e.g. orthogonality) are preferred for the choice, as discussed in detail in Yano et al (2005). Moreover, from a physical point of view, the chosen mode set must represent the subgrid-scale structures of interest well, so that it can represent them efficiently with a small number of modes.
The basic premise behind the mode-decomposition approach is to take as severe a truncation as possible, so that the original full physical system can be represented in a much compressed manner, in the same sense as image compression (see Mallat 1998). If a given modal compression is extensive enough, the resulting model can already be considered a prognostic parameterization.

Proper orthogonal decomposition (POD)
One of the oldest applications of the mode-decomposition approach for representing coherencies in turbulent flows may be found in Lumley (1967) and Aubry et al (1988). Their basic idea is to use a low-order truncation of POD. This approach is further pursued by Deane et al (1991), Zhou and Sirovich (1992), Podvin and Lumley (1998), and Podvin (2001, 2009). General perspectives on the use of POD in fluid mechanics are reviewed by Berkooz et al (1993). Rationales for considering turbulent flows as a type of low-dimensional dynamical system are extensively discussed in Lumley (1990), along with the other basic turbulence issues. In turn, turbulence research is reviewed from the POD perspective in Holmes et al (1996). The mathematical structure of POD is systematically examined in Aubry et al (1991). As it turns out, POD is equivalent to singular value decomposition (SVD: Chatterjee 2000), whose extracted horizontal patterns are often called empirical orthogonal functions (EOFs) in the atmospheric sciences.
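As a minimal illustration of the POD–SVD equivalence, the following sketch decomposes a snapshot matrix by SVD and measures how much variance a two-mode truncation captures. The synthetic two-structure field is an assumption for illustration only.

```python
import numpy as np

# Sketch: proper orthogonal decomposition (POD) of a snapshot matrix via
# SVD.  Rows are grid points, columns are time snapshots; the leading
# left singular vectors are the POD modes (EOFs).  The synthetic field
# is built from two coherent structures plus weak noise.

rng = np.random.default_rng(2)
x = np.linspace(0.0, 2 * np.pi, 128)
t = np.linspace(0.0, 10.0, 200)
X = (np.outer(np.sin(x), np.cos(t)) +
     0.5 * np.outer(np.sin(2 * x), np.sin(3 * t)) +
     0.01 * rng.normal(size=(x.size, t.size)))

U, s, Vt = np.linalg.svd(X, full_matrices=False)
energy = s**2 / np.sum(s**2)
captured2 = energy[:2].sum()          # variance fraction in 2 leading modes

X2 = (U[:, :2] * s[:2]) @ Vt[:2]      # rank-2 (two-mode) reconstruction
rel_err = np.linalg.norm(X - X2) / np.linalg.norm(X)
```

For a field dominated by a few coherent structures, a drastic truncation retains almost all of the variance, which is the premise of POD-based low-dimensional modeling.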
In POD-based reduced-order modeling, a key concept is the minimal flow unit (MFU: Jiménez and Moin 1991), a domain just large enough to contain a single coherent structure. This concept is comparable to the notion of the 'grid box' (see section 2) invoked in developing atmospheric subgrid-scale parameterizations. However, a major difference of their modeling strategy is to develop a self-contained description over the MFU. Thus, a key question becomes, for example, the prescription of the pressure-gradient term. This is in contrast to the subgrid-scale parameterization, in which the subgrid-scale processes are considered under the external forcing of the large-scale processes, such as the pressure-gradient force.
Unfortunately, the POD-based approach faces the same dilemma as the moment-expansion approach: where to truncate the system. Though a more severe truncation is numerically more efficient, it is less accurate, and vice versa for a milder truncation. The series of papers referred to above is mostly concerned with careful quantifications of this general behavior.

Wavelet-based approaches
Considering the fact that these subgrid-scale coherent structures are often spatially localized, wavelets become a strong alternative candidate for this purpose. Yano et al (2004a) demonstrate that wavelets can extensively compress (in the same sense as image compression) an atmospheric convective system, in support of this expectation. The possibility of taking wavelets as a basis for developing a subgrid-scale parameterization is further discussed in Yano et al (2005).
The multiresolution analysis, as a generalization of the wavelet methods (see Mallat 1998), provides a general methodology for developing a spatially inhomogeneous mesh distribution with good control of the accuracy (Harten 1993, 1996). The methodology can be cast into the numerical modeling of PDEs (like an atmospheric model) with a capacity for time-dependent adaptivity (Harten 1994). Beylkin et al (1991) furthermore discuss an efficient methodology for representing linear operators by wavelets. Further discussions on adaptivity are found in Maday et al (1991). We refer to Yano et al (2004a, 2005) for more recent applications.
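The compression argument can be sketched with a hand-coded orthonormal Haar transform applied to an assumed isolated 'plume' signal: because the feature is spatially localized, only a small fraction of the wavelet coefficients is needed to reconstruct it accurately.

```python
import numpy as np

# Sketch: wavelet compression of a localized 1-D signal with a Haar
# transform, keeping only the largest coefficients.  Signal shape and
# threshold are assumptions for illustration.

def haar_forward(u):
    """Full multilevel orthonormal Haar transform of a length-2^n array."""
    u = u.astype(float).copy()
    coeffs = []
    while u.size > 1:
        s = (u[0::2] + u[1::2]) / np.sqrt(2.0)   # smooth (scaling) part
        d = (u[0::2] - u[1::2]) / np.sqrt(2.0)   # detail (wavelet) part
        coeffs.append(d)
        u = s
    coeffs.append(u)
    return coeffs

def haar_inverse(coeffs):
    u = coeffs[-1]
    for d in reversed(coeffs[:-1]):
        s = u
        u = np.empty(2 * s.size)
        u[0::2] = (s + d) / np.sqrt(2.0)
        u[1::2] = (s - d) / np.sqrt(2.0)
    return u

n = 1024
x = np.linspace(-1.0, 1.0, n)
signal = np.exp(-(x / 0.05) ** 2)            # isolated "plume"

coeffs = haar_forward(signal)
flat = np.concatenate(coeffs)
thresh = np.sort(np.abs(flat))[-n // 20]     # keep largest 5% of coefficients
kept = [np.where(np.abs(c) >= thresh, c, 0.0) for c in coeffs]
recon = haar_inverse(kept)
rel_err = np.linalg.norm(recon - signal) / np.linalg.norm(signal)
```

Keeping all coefficients reproduces the signal exactly (the transform is orthonormal); keeping 5% of them still reconstructs the localized plume to within a few per cent.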

Segmentally constant modes
However, the use of those orthogonal complete modes (PODs, wavelets) in atmospheric modeling turns out not to be so convenient in practice. Performing cloud microphysical processes (even condensational heating) in phase space is rather cumbersome. A solution for this problem could be to adopt a spectral transform method (see section 4.3). However, a slightly more careful reflection leads to the conclusion that if we could always stay in physical space without any transformation, the resulting algorithm would be much simpler still.
These reflections lead to the choice of segmentally constant modes as a basis for the expansion, in spite of all the associated drawbacks (i.e., lack of orthogonality and completeness). The basic idea behind this approach becomes clearer when it is re-interpreted as a type of approximation (segmentally constant approximation, or SCA: Yano et al 2010). As it turns out, SCA is closely linked to mass-flux parameterization (Yano et al 2005, Yano and Baizig 2012, Yano 2012a, 2014a): application of SCA to a cloud-resolving model (CRM) system leads to a fully prognostic prototype of the mass-flux parameterization formulation. More standard subgrid-scale parameterizations are obtained by introducing further approximations and hypotheses, as discussed below.
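The basic operation of SCA can be sketched in a few lines: a grid-box field is replaced by its mean over each of a few prescribed segments. The segment boundaries and the synthetic field below are assumptions for illustration only.

```python
import numpy as np

# Sketch of the segmentally constant approximation (SCA): a grid-box
# field is replaced by its mean over each of a few prescribed segments
# (e.g. environment / plume / environment).  Segment boundaries are
# arbitrary assumptions here.

x = np.linspace(0.0, 1.0, 400, endpoint=False)
field = np.where((x > 0.45) & (x < 0.55), 5.0, 0.0) + 0.1 * np.sin(20 * x)

bounds = [0.0, 0.45, 0.55, 1.0]          # three segments
sca = np.empty_like(field)
for a, b in zip(bounds[:-1], bounds[1:]):
    mask = (x >= a) & (x < b)
    sca[mask] = field[mask].mean()       # segment-mean (constant) value

box_mean_err = abs(sca.mean() - field.mean())
```

Because each segment keeps its own mean, the approximation preserves the grid-box mean exactly while discarding all intra-segment variability: everything stays in physical space, with no transform required.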

Segmentally constant approximation
The SCA is inspired by the mass-flux convection parameterization, originally proposed by Arakawa and Schubert (1974). The basic conceptualization of Arakawa and Schubert's mass-flux formulation is presented in their figure 1: an ensemble or spectrum of convective (updraft) plumes, as an idealization of hot towers (Riehl and Malkus 1958, Yano 2009), is considered within a grid box. A slightly more abstract version of the same conceptualization, in top view, is presented in figure 2. Here, the schematic is also slightly generalized by considering downdrafts (gray circles) along with the updrafts (black circles). The tones inside the circles are homogeneous for both the updrafts and the downdrafts, because within the mass-flux formulation it is assumed that the physical variables are horizontally homogeneous within each convective element. The surrounding environment is also assumed to be horizontally homogeneous. Thus, every subgrid-scale component (updrafts, downdrafts, environment, etc) is assumed to consist of horizontally homogeneous segments. This leads to the notion of SCA.
Note that the notion of the 'environment' plays a special role in figure 2 (see section 6 of Yano 2014a). However, the distribution of segmentally constant elements can further be relaxed: SCA, as it stands, can subdivide a grid box into horizontally homogeneous subsegments more flexibly. Figure 3 conceptually suggests such a representation by taking a distribution of cloud

Deduction of full physics into mass-flux convection parameterization
Yano (2014a) shows how a full physical system can be deductively reduced to a standard mass-flux convection parameterization. The first step consists of applying SCA to a full physical system, as a basic geometrical constraint imposed on the mass-flux convection parameterization formulation under the hot-tower hypothesis (Riehl and Malkus 1958, Yano 2009). This procedure may be considered a strict application of the consistency principle outlined in section 1.1. In order to achieve this goal, an AMR capacity is added so that the distribution of the SCA segments is adjusted with time by following the evolution of a convective system. The study suggests that the model can be compressed down to 20% of the full-resolution model for a dry well-mixed boundary-layer simulation. This study is further extended by Bouniol (2010, 2011) to the case of tropical squall lines in a moist atmosphere with the inclusion of minimum microphysics. In this latter case, a realistic simulation of a squall line is accomplished with a compression even closer to 10%.
Second, SCA is a basic geometrical constraint imposed on the mass-flux parameterization. Yano and Baizig (2012) show this point by considering only two SCA segments, corresponding to a plume (hot tower) and an environment. This geometrical configuration is equivalent to that of the bulk mass-flux parameterization. Thus, this model becomes a fully prognostic version of the latter. Yano (2014a), in turn, shows that the standard mass-flux parameterization formulation can be reconstructed from NAM-SCA by adding further hypotheses and approximations. Here, the two key hypotheses concern: (1) entrainment-detrainment, and (2) the environment.
The final step for recovering the standard mass-flux formulation is to take the asymptotic limit of vanishing fractional area occupied by convection within a grid box, i.e., σ_c → 0. Most importantly, the limit makes the system steady, and the problem of the mass-flux parameterization becomes a diagnostic one. This final step also leads to the need for introducing a closure.

Structure of the mass-flux parameterization formulation
We refer to Yano (2014a) for detailed discussions on the structure of the mass-flux parameterization formulation. Here, only a very succinct description is given focusing on the key issues that are relevant for considering the general problems of parameterization consistency discussed in section 1.1.
As the name suggests, under the mass-flux parameterization formulation, the key problem reduces to that of defining the convective mass flux, M, which is the vertical momentum flux associated with convection. It takes either a scalar or a vector form depending on whether the formulation is bulk or spectral. Under the standard formulation, the convective mass flux is defined in terms of the fractional entrainment and detrainment rates, ε and δ, by

∂M/∂z = (ε − δ)M. (6.2)

In order to solve this equation (6.2), we have to sort out two issues. First, the entrainment and detrainment rates, ε and δ, must be specified. Next, once the entrainment and detrainment rates are specified, the equation can be solved by vertically integrating it (usually upwards) under a given bottom boundary condition. Thus, the second issue is to identify the bottom boundary condition. The latter is called the closure in the context of the mass-flux parameterization. We discuss these two issues in turn in the following two subsections.
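The vertical integration of equation (6.2) can be sketched numerically. The entrainment and detrainment profiles and the bottom value below are arbitrary assumptions for illustration, not values from any operational scheme.

```python
import numpy as np

# Sketch: upward integration of the standard mass-flux equation
#   dM/dz = (eps - delta) * M          (cf. equation (6.2))
# from a closure-supplied bottom value M_B, with assumed entrainment
# and detrainment profiles.

z = np.linspace(0.0, 10e3, 101)              # height (m)
dz = z[1] - z[0]
eps = 2.0e-4 * np.exp(-z / 2.0e3)            # fractional entrainment (1/m)
delta = 1.0e-4 * np.ones_like(z)             # fractional detrainment (1/m)
M_B = 0.05                                   # closure: bottom mass flux

M = np.empty_like(z)
M[0] = M_B
for k in range(len(z) - 1):
    # exact layer integration assuming eps - delta constant over the layer
    M[k + 1] = M[k] * np.exp((eps[k] - delta[k]) * dz)

eta = M / M_B          # normalized vertical profile, eta(0) = 1
```

With entrainment dominating at low levels and detrainment aloft, the mass flux first grows and then decays with height, and the entire profile scales linearly with the closure value M_B.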

Entrainment-detrainment problem
The convective mass flux, being a type of momentum, would be most legitimately obtained by solving a momentum equation. A full form of this momentum equation is presented in Yano (2014c), for example. The entrainment-detrainment hypothesis is introduced rather as a historical consequence (see Yano 2014b). This hypothesis originates from a classical water-tank experiment on a convective plume by Morton et al (1956). This plume is maintained by a constant buoyancy flux at the bottom of the tank, leading to its growth with height under a constant fractional entrainment of mass. This entraining plume is adopted by Simpson et al (1965) and Simpson and Wiggert (1969) for verification of their cloud-seeding experiments, and with this model they obtained results consistent with observations (see Simpson 1983 for a review). In turn, this is the main reason why the entraining plume is adopted as the basic element of atmospheric convection by Arakawa and Schubert (1974) in formulating the mass-flux convection parameterization.
In presenting their formulation, they leave open the possibility of adding detrainment to the mass-flux equation, as presented in equation (6.2). Since then, equation (6.2) has become the standard formulation for evaluating the convective mass flux. However, the oversimplification of this picture was soon realized, and subsequently, various modifications to the entrainment-detrainment formulation have been proposed. See  for a succinct list of the proposed formulations, Yano (2014b) for a historical review, and de Rooy et al (2013) for the current status.
The entrainment-detrainment hypothesis is a good example of the importance of understanding the origin of a given formulation, here traced back to Morton et al (1956). The basic idea of this hypothesis is that the entrainment and detrainment parameters can be determined as simple functions of a given large-scale state of any flow, be that the atmosphere or a water tank. This is definitely the case with the original laboratory experiment by Morton et al (1956), but there is no reason to believe that it is also the case with atmospheric convection in general.
Here, we also see the importance of understanding the structure of a given parameterization formulation. If one well understands the structure of the mass-flux formulation as presented in Yano (2014a), the entrainment-detrainment hypothesis is hardly its indispensable part. As already suggested above, and explicitly shown in Yano (2014a), the mass flux can readily be evaluated by directly solving a corresponding vertical momentum equation, without the entrainment-detrainment hypothesis.
However, unfortunately, current research is almost exclusively focused on refining the entrainment-detrainment formulation (see de Rooy et al 2013), without much attention to this alternative possibility. This is an example of how a parameterization development problem is reduced to a tuning problem, lost in details.

Closure problem
A slightly different way of looking at the mass-flux problem (6.2) is to realize that it permits a separation of variables, so that the mass flux, M, can be given in the form

M(z, t) = η(z) M_B(t),

where η(z) is a normalized vertical profile and M_B(t) is a mass-flux amplitude. The entrainment-detrainment formulation is used for defining the normalized vertical profile, η(z), whereas a closure condition defines the mass-flux amplitude, M_B(t). Here, the subscript B is added to suggest that M_B refers to the convection-base value, but its general definition can easily be altered by changing the normalization condition for η.
A general approach to the closure may be formulated based on a conservation law. As examples of conserved quantities, we may take energy or moisture. A standard closure condition is often obtained by assuming a steady state for a given conservation law. A simple dictum derived as a corollary of this formulation is: any diagnostic closure hypothesis must have a prognostic counterpart in order to be physically consistent.
In this respect, the convective energy cycle appears to provide the most self-consistent formulation for the closure, as originally argued by Arakawa and Schubert (1974). Their convective quasi-equilibrium hypothesis arises as a natural steady closure under this formulation (Yano and Plant 2012a). A review by Emanuel et al (1994) promotes this idea at a very conceptual level, but is less precise in specifying the energy cycle. When a full energy cycle is considered, a prognostic version of the closure is obtained (Randall and Pan 1993, Pan and Randall 1998, Yano and Plant 2012b, 2012c, Plant and Yano 2013). Here, in contrast to the entrainment-detrainment problem, we identify a clear path for further investigating the closure problem. For further discussions of the mass-flux closure problem, see Yano et al (2013).
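The dictum that a diagnostic closure should have a prognostic counterpart can be illustrated with a deliberately simplified toy system. This is not the actual Arakawa-Schubert or Yano-Plant formulation; the linear coupling and all constants are assumptions for illustration only.

```python
# Toy illustration (assumed model): a cloud-work-function-like quantity A
# is generated by large-scale forcing F and consumed by convection at
# rate gamma * M_B.  With an assumed linear coupling M_B = alpha * A, the
# prognostic system relaxes to the steady (quasi-equilibrium) diagnostic
# closure M_B = F / gamma.

F, gamma, alpha = 2.0, 0.5, 0.1     # assumed constants, arbitrary units
dt, nsteps = 0.1, 2000

A = 0.0
for _ in range(nsteps):
    M_B = alpha * A                 # prognostic coupling
    A += dt * (F - gamma * M_B)     # dA/dt = F - gamma * M_B

M_B_prognostic = alpha * A
M_B_diagnostic = F / gamma          # steady-state (diagnostic) closure
```

The diagnostic closure is recovered as the long-time limit of the prognostic system, which is precisely the consistency relation the dictum demands.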
The discussion here again points to the importance of well understanding the structure of a given parameterization. The mass-flux convective closure is not necessarily diagnostically formulated, as is currently the case, but it can also be formulated in a prognostic manner, as just remarked above. We can even remove the closure problem entirely by more directly time-integrating a vertical momentum equation, while still keeping the entrainment-detrainment hypothesis intact, as shown by Yano (2012a).

Further generalization of mass-flux parameterization under SCA
Mass-flux parameterization was originally proposed for parameterizing atmospheric convection based on the hot-tower hypothesis (Riehl and Malkus 1958, Yano 2009). However, by considering the basic geometrical constraint for mass-flux parameterization in terms of SCA in a more general manner, the subgrid-scale processes to be parameterized are no longer restricted to hot-tower type convection.
It has been criticized that the mass-flux convection parameterization does not take into account various aspects of atmospheric convection that do not fit into the hot-tower hypothesis. Downdrafts would be the first elements to be included, as is done in almost all the operational models, but rather in an ad hoc manner (Yano 2015), without re-performing the formulation development from first principles (see section 1.1). Representation of convective organization under a shear flow turns out to be even harder under the standard mass-flux formulation. However, the re-interpretation of mass flux as SCA much facilitates these generalizations. Yano and Moncrieff (2016) show, by considering a four-segment version of NAM-SCA, that an SCA formulation can represent mesoscale convective organizations well and reproduce the realistic upgradient momentum transport associated with convective organization.
SCA is designed to represent spatially isolated coherencies, like hot towers, well. Thus, its representation of convectively generated gravity waves may be expected to be much less efficient. This issue is addressed by Yano and Lane (2014) by taking an adaptive version of NAM-SCA. As it turns out, NAM-SCA can represent these waves as well as it does convective organization, even when the model is compressed almost down to the level of 10% relative to the full-resolution case.

Links to the other problems
The mode-decomposition approach in general, and the SCA more specifically, elucidates the whole structure of a particular type of parameterization formulation well. Such generality much facilitates constructing a parameterization in a consistent manner by following the principles outlined in section 1.1. As a result, we also begin to see links between the subgrid-scale parameterization problem and other problems.

Adaptive mesh refinement
The mode-decomposition-based parameterization of the subgrid-scale processes essentially takes the form of a Galerkin projection in numerical modeling (see chapter 4, Holmes et al 1996), albeit under a severe truncation. A simple extrapolation of this re-interpretation suggests that the subgrid parameterization problem may be re-interpreted as a type of mesh-refinement problem in numerical modeling.
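The analogy can be made concrete with a minimal Galerkin projection. The PDE (the 1-D heat equation) and the retained modes are chosen purely for illustration; for this equation the sine modes are eigenfunctions, so the projected ODE system is exactly solvable and easy to check.

```python
import numpy as np

# Sketch: Galerkin projection of the heat equation u_t = u_xx on [0, pi]
# (u = 0 at the ends) onto a few sine modes sin(k x).  Under a severe
# truncation the PDE reduces to a small ODE system for the mode
# amplitudes: da_k/dt = -k**2 * a_k.

nmodes = 3
k = np.arange(1, nmodes + 1)            # retained wavenumbers
a0 = np.array([1.0, 0.5, 0.2])          # assumed initial mode amplitudes

a = a0.copy()
dt, nsteps = 1e-3, 1000
for _ in range(nsteps):
    a = a + dt * (-(k**2) * a)          # forward-Euler Galerkin ODEs

# compare with the exact modal decay exp(-k^2 t) at t = 1
exact = a0 * np.exp(-k**2 * 1.0)
err = np.max(np.abs(a - exact))
```

The structural point is that, once projected, the "model" consists only of a handful of amplitude equations, exactly as a mode-decomposition parameterization carries only a handful of subgrid-scale degrees of freedom.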
SCA is even closer to mesh refinement, because its idea is essentially that of subdividing a grid box into sub-grid boxes. The approach by Thayer-Calder et al (2015) also takes the same basic strategy. Bouniol (2010, 2011) further adds a dynamic adaptivity to the NAM-SCA modeling, thus leading to the re-interpretation of parameterization as AMR.
The AMR approaches have already been extensively investigated in the context of fluid mechanics. One of the first such applications is to a two-dimensional shock wave, with use of a method developed by Berger and Oliger (1984). More recently, AMR has been applied to LES by e.g. Hoffman and Johnson (2006); to shallow-water models on the sphere (2015); and by Hubbard and Nikiforakis (2003) to a three-dimensional global model. See the collection of papers edited by Nikiforakis (2009), and an overview of meteorological applications by Behrens (2006). Bacon et al (2000) and Gopalakrishnan et al (2002) further report the use of AMR methodology in operational numerical weather forecasts.

Downscaling
We furthermore see a link between the subgrid parameterization and downscaling problems along this line. The goal of downscaling is to retrieve information on the subgrid-scale variabilities missing in modeling (see Maraun et al 2010). Though crudely, the mode decomposition retrieves exactly such subgrid-scale information, φ'. Such information should surely be valuable for the purpose of downscaling, too. As it stands now, many of the downscaling methodologies are purely statistical, without much regard to consistency with physical laws. The link between parameterization and downscaling identified here would also provide a more physical basis for downscaling (see Yano 2010).

Limits of AMR interpretation
Taking a wider view of subgrid-scale parameterization from the point of view of mode decomposition, we might even conclude that the subgrid-scale parameterization problem reduces essentially to that of mesh refinement in numerical modeling. Unfortunately, this perspective appears to be rather too naive.
First of all, generating a refined mesh is itself far from a trivial problem (see Prusa and Smolarkiewicz 2003). A new grid point cannot be inserted arbitrarily into a system without regard to the whole system: a global mesh regularity must be maintained (Budd et al 2015). Furthermore, it would also be desirable for the refined mesh structure to be consistent, in a certain manner, with the PDE system to be solved, so that a numerical solution can be obtained consistently (Budd et al 2013, 2015). We should realize that a numerical solution is closely coupled to a refined mesh structure. In particular, the conservation laws of a given PDE system should be well preserved under mesh refinement (Kühnlein et al 2012). Düben and Korn (2014) show that abnormal reflection and scattering of waves arise over the transition zone of refinement if such care is not taken. This last issue becomes a critical problem when a very inhomogeneous finite-volume distribution is considered, as in the case of the SCA approach.
Another naivety of replacing subgrid-scale parameterization by AMR, from a physical point of view, is the anticipation that a relatively low-dimensional model can describe the subgrid-scale coherent structures in a closed manner (see Yano and Mukougawa 1992). The mass-flux convection parameterization may have a better justification, being based on the hot-tower hypothesis (Riehl and Malkus 1958, Yano 2009), which assumes that the bulk of the subgrid-scale transport is due to the hot towers. However, in general, there is no guarantee that this is the case.
More generally, unless the mesh refinement is performed in an extremely wise manner, there are always some unresolved processes that must somehow be represented in an indirect, implicit manner. In other words, AMR cannot be considered a panacea for the subgrid-scale parameterization problem.
However, introducing the subgrid-scale parameterization into an inhomogeneous grid mesh is far less trivial than in the homogeneous case (see Menter and Egorov 2010). This is probably a principal reason why AMR is not yet widely used for LES either (see Goodfriend et al 2013). As expected, the aforementioned transition zone of refinement is where the subgrid-scale parameterization becomes most problematic (Sullivan et al 1996, Goodfriend et al 2013). Here, mesh refinement may be considered a kind of coordinate transformation from a homogeneous grid-mesh distribution, regardless of whether such a transformation is actually explicitly introduced or not. The situation is like introducing Gal-Chen's terrain-following coordinate (Gal-Chen and Somerville 1975). As a result, complex metric terms are introduced into the differentiation terms (Wedi and Smolarkiewicz 2003). The subgrid-scale parameterization must be constructed with the contributions of those metric terms explicitly taken into account. (In this respect, probably, the way to construct a boundary-layer parameterization over complex terrain would be to apply the Gal-Chen coordinate transform and consider the problem over a flat surface, but with complex metric terms.) Alternatively, in constructing an LES model under an inhomogeneous grid-mesh distribution, the space-filtering operator must be adapted accordingly. Discrete filters satisfying these requirements are considered by Vasilyev et al (1998) and Marsden et al (2002).
However, such filtering would merely define a corresponding equation to (2.2) in a more consistent manner. A self-contained expression for the right-hand-side term (apparent source) of equation (2.2) is still to be identified. This procedure is not trivial, even for a simple scheme such as eddy diffusivity. A simple dimensional analysis would suggest that the eddy-diffusion coefficient scales as κ ~ Δx² as a function of the mesh size, Δx, as assumed e.g. by Smagorinsky (1963). However, a review of observational estimates by Richardson (1926) rather suggests the dependence κ ~ Δx^(4/3).
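The two mesh-size scalings can be contrasted in a few lines. The constants below (a fixed deformation rate and the Richardson prefactor) are arbitrary assumptions; only the dependence on Δx matters.

```python
# Sketch: the two mesh-size scalings for the eddy-diffusion coefficient
# mentioned in the text -- kappa ~ dx**2 (Smagorinsky-type dimensional
# argument, holding the deformation rate fixed) versus kappa ~ dx**(4/3)
# (Richardson's observational estimate).

def kappa_smagorinsky(dx, deformation=1e-4, cs=0.2):
    """kappa = (cs * dx)**2 * |deformation|  ->  scales as dx**2."""
    return (cs * dx) ** 2 * deformation

def kappa_richardson(dx, c=2e-4):
    """Richardson (1926) four-thirds law: kappa = c * dx**(4/3)."""
    return c * dx ** (4.0 / 3.0)

# halving the mesh reduces kappa by a factor 4 in one case, ~2.52 in the other
ratio_smag = kappa_smagorinsky(1000.0) / kappa_smagorinsky(500.0)
ratio_rich = kappa_richardson(1000.0) / kappa_richardson(500.0)
```

The practical consequence for an inhomogeneous mesh is that the two scalings prescribe different spatial variations of κ across the refinement zone, so the choice cannot be deferred to tuning.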
An attractive alternative possibility is to let the eddy-diffusion effect be taken care of by the implicit diffusion embedded in a numerical advection scheme, as proposed by Margolin et al (1999), leading to the idea of implicit large-eddy simulation (Margolin and Rider 2002, Hickel et al 2006). Though this approach has the attraction of circumventing the subgrid-scale parameterization issue, it is not yet clear whether it provides a physically consistent dissipation. Domaradzki et al (2003) attempt to evaluate this consistency by comparing the obtained implicit numerical dissipation with a theoretical prediction.

Principle of minimum empiricism
In the process of deducing the standard mass-flux convection parameterization from the NAM-SCA system, two less well-defined problems are introduced: entrainment-detrainment and closure. Especially the former problem can easily be trapped into a tuning exercise, as already suggested in section 6.3.
A couple of specific examples are presented here. The first is Mapes and Neale (2011), who emphasize that the entrainment rate is a difficult parameter to control: it must be neither too large nor too small. They call this issue the 'entrainment dilemma'. Their work attempts to resolve this dilemma by introducing a new parameter called the 'organization parameter'. As its name suggests, this new parameter is intended to mimic the tendency due to convective organization in the convection parameterization. Though the physical justification for its formulation is intuitive, the parameter is introduced without any clear derivation, against the consistency principle discussed in section 1.1. In this respect, it is curious to note that Hohenegger and Bretherton (2011) arrive at a similar formulation, but without explicitly introducing such a notion.
The second example stems from the efforts to include contributions of boundary-layer processes into the closure problem. It is commonly believed that a trigger mechanism is required in order to induce convection by overcoming the convective inhibition (CIN) in the boundary layer. Mapes (1998) proposes to call this the activation-control principle. Rio et al (2009, 2013) and Grandpeix and Lafore (2010) attempt to describe the boundary-layer processes controlled by CIN by introducing two further quantities called the available lifting energy (ALE) and the available lifting power (ALP). Here, ALE defines the threshold for overcoming the barrier of CIN, whereas ALP measures an eddy kinetic-energy generation rate in the boundary layer. Analysis of the problem by Yano (2012b, 2014c) under SCA suggests that ALE should play no role in convective kinetic-energy generation. ALP may be introduced, but only by a further drastic simplification of the SCA system. For further critical discussions on the role of triggering in convective closure, see Yano et al (2013).

Combinations and equivalence: unification?
At the beginning of this review (section 1.1), the importance of deriving a parameterized system deductively, starting from an original full system, is emphasized. If such a procedure were strictly followed, there would be only a single set of parameterizations for a whole system. However, as it stands now, parameterizations are divided among different physical processes, against this principle. Thus, a need for a unification of parameterizations (e.g. Arakawa 2004) is often claimed. However, from the point of view of the discussions on consistency in section 1.1, this is not a recommended path: do not 'unify' things that were developed in unrelated manners. Rather, everything should be developed together from the onset, as already emphasized.
Sometimes, simply combining formulations is claimed to be 'unification'. We critically examine these claims in this subsection.
The equivalence of the moment and DDF-based approaches is intuitively clear, though a robust proof may be difficult, as already discussed in section 4. On the other hand, one may note a distinctive difference between these first two approaches on the one hand and the mode-decomposition approach on the other: no spatial structure is explicitly considered in the former, whereas the latter is based on a specific consideration of geometrical structures, such as an ensemble of hot towers.
In the literature, a certain equivalence between the DDF-based and mass-flux approaches is claimed, based on the observation that the latter is essentially a top-hat distribution from the former point of view. Once a top-hat distribution is assumed, it is straightforward to compute all the moments by equation (4.1). Lappen and Randall (2001a, 2001b, 2001c) translate the mass-flux formulation into a moment-based description based on this recipe. Though they call their approach 'unified', we should be cautious with this claim, because the translation works only in one direction: from mass flux to moments via the top-hat distribution. In particular, entrainment and detrainment are calculated under a conceptually different principle after this translation: in Lappen and Randall (2001b), they are determined from the grid-box-averaged vertical-velocity variance equation, which is not linked to the lateral exchange between convection and environment in any trivial manner.
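The one-way character of this translation is easy to see concretely: under a top-hat (double-delta) distribution, the grid-box mean and all the central moments follow immediately from the fractional area and the two segment values. A minimal sketch in Python (the function name and the numerical inputs are hypothetical, for illustration only):

```python
# Central moments of a top-hat (double-delta) distribution:
# the variable takes the value phi_c over fraction sigma_c (convection)
# and phi_e over fraction 1 - sigma_c (environment).

def tophat_moments(sigma_c, phi_c, phi_e, n_max=4):
    """Return the grid-box mean and central moments up to order n_max."""
    sigma_e = 1.0 - sigma_c
    mean = sigma_c * phi_c + sigma_e * phi_e
    moments = {}
    for n in range(2, n_max + 1):
        # nth central moment of the two-delta distribution
        moments[n] = (sigma_c * (phi_c - mean) ** n
                      + sigma_e * (phi_e - mean) ** n)
    return mean, moments

# Illustrative values: 5% convective area, a moist anomaly inside convection
mean, m = tophat_moments(sigma_c=0.05, phi_c=18.0, phi_e=16.0)
```

Going the other way, from a given set of moments back to the mass-flux variables, is only possible when the distribution is assumed top-hat in the first place, which is the crux of the caution stated above.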
Another attempt is to combine two different approaches into a single formulation. The eddy-diffusivity mass-flux (EDMF) methodology, initiated by Soares et al (2004) and Siebesma et al (2007) and further developed by Neggers et al (2009), Neggers (2009) and others, for example, combines the moment-based and the mass-flux based approaches. Their original derivation is heuristic. However, a higher-order approximation to SCA establishes this possibility in a robust manner, with further elaborations. To make this point, instead of introducing the mass-flux approximation (equation (6.1)), we retain the contribution of eddies to the vertical transport:

$$\overline{w\varphi} = \sum_j \sigma_j \left( w_j \varphi_j + \overline{w_j' \varphi_j'} \right), \quad (8.1)$$

where $\varphi_j' = \varphi - \varphi_j$ designates a deviation from the SCA value over the jth subgrid-scale component. The easiest closure is to express the second contribution by an eddy-diffusion form (3.2). That is the essence of the EDMF idea. Thus the total vertical eddy transport is represented by

$$\overline{w'\varphi'} \simeq M_c(\varphi_c - \bar{\varphi}) - \kappa \frac{\partial \bar{\varphi}}{\partial z}, \quad (8.2)$$

with $\kappa$ the eddy-diffusion coefficient. Here, $\bar{\varphi}$ is interpreted as an approximation for the environmental mean, and $\varphi' = \varphi - \bar{\varphi}$.
However, equation (8.1) elaborates the matter further: the eddy transport within convective elements contributes with equal importance. This is shown by applying a standard scaling, $\sigma_e w_e' \sim \sigma_c w_c'$, to the eddy components, with the subscripts e and c designating the environment and convection components, respectively. This simple scaling suggests that the eddy diffusivity within convective elements must be enhanced by the factor $\sigma_e/\sigma_c$: when the fractional area occupied by convection is smaller, the eddy contribution from convective elements becomes inverse-proportionally larger. Thus, equation (8.2) may be better reformulated as

$$\overline{w'\varphi'} \simeq M_c(\varphi_c - \bar{\varphi}) - \left[ \sum_{j=c,e} \sigma_j \kappa_j \frac{\partial \varphi_j}{\partial z} \right], \quad (8.3)$$

with the sum spanning c and e, also assuming that the eddy-diffusion formulation is useful for the convective eddy transport. Note that the above scaling argument suggests $\kappa_c \sim (\sigma_e/\sigma_c)\kappa_e \gg \kappa_e$. Equation (8.2) is recovered from equation (8.3) only if the convective-scale eddy transport is negligible. Here, we must also return to the issue of the subgrid-scale parameterization under mesh refinement just discussed in section 7.3: actually defining the values for $\kappa_c$ and $\kappa_e$ is not trivial.
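The structure of this two-term decomposition, and the enhancement of the in-cloud diffusivity by the factor $\sigma_e/\sigma_c$, can be sketched numerically. The following is only an illustrative sketch loosely following the forms of equations (8.2) and (8.3); all numerical values, the function name, and the choice $\kappa_c = (\sigma_e/\sigma_c)\kappa_e$ as a hard-wired rule are hypothetical, not a tuned scheme:

```python
# Sketch of an EDMF-type flux decomposition:
# total eddy transport = organized mass-flux term + eddy-diffusion terms.

def edmf_flux(sigma_c, w_c, phi_c, phi_bar, kappa_e, dphi_dz_e, dphi_dz_c=None):
    """Total vertical eddy transport of phi under an EDMF-type closure.

    If dphi_dz_c is given, the convective-scale eddy diffusion is included,
    with kappa_c enhanced by sigma_e / sigma_c (the scaling argument in the
    text); otherwise only the standard two-term form is returned.
    """
    sigma_e = 1.0 - sigma_c
    mass_flux_term = sigma_c * w_c * (phi_c - phi_bar)   # M_c (phi_c - phi_bar)
    eddy_term = -sigma_e * kappa_e * dphi_dz_e           # environmental diffusion
    if dphi_dz_c is not None:
        kappa_c = (sigma_e / sigma_c) * kappa_e          # enhanced in-cloud diffusivity
        eddy_term -= sigma_c * kappa_c * dphi_dz_c       # convective-scale diffusion
    return mass_flux_term + eddy_term

# Illustrative values only (5% convective area, upward in-cloud motion)
flux = edmf_flux(sigma_c=0.05, w_c=2.0, phi_c=18.0, phi_bar=16.1,
                 kappa_e=10.0, dphi_dz_e=-0.005)
```

Note that with a small $\sigma_c$, the product $\sigma_c \kappa_c = \sigma_e \kappa_e$ remains finite, which is the quantitative content of the statement that the convective-eddy contribution is of equal importance.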
This discussion suggests how to apply the moment-based approaches more extensively when a grid box is subdivided into a number of subsegments (subgrid-scale components) under SCA: a generalization of SCA to a higher order combines the eddy-diffusion and the mass-flux formulations. It is, furthermore, straightforward to write the moment equations, $\overline{\varphi_j'^n}$, for the individual subgrid-scale components by moving to an even higher order. As a result, all the discussions in section 4 can be repeated by adding a subscript j standing for the jth subgrid-scale component.
In this manner, the SCA-based perspective leads to a possibility of applying a moment-based framework much more extensively: to the individual subgrid-scale components, not only to the environmental component, and also to the higher-order moments. The energy closure introduced by Zilitinkevich et al (2007, 2008, 2009, 2013) can also be generalized under this perspective by applying it to the different subgrid-scale components. As a result, a closure hypothesis developed under a moment-based approach finds a conceptual link to a counterpart developed under a mass-flux framework (Randall and Pan 1993, Pan and Randall 1998, Yano and Plant 2012b, 2012c, Plant and Yano 2013).

Stochastic approach?
It is often claimed that subgrid-scale parameterizations are best formulated in a stochastic manner, because the subgrid-scale processes are expected to be 'noisy'. Such a point of view is especially promoted by a series of reviews by Palmer (1993, 2012) and Palmer et al (2005). In certain corners of the community, there is even a fanatic tendency to believe that stochasticity is a panacea for all the subgrid-scale parameterization problems (see Berner et al 2016).
On the other hand, the present review does not identify stochasticity as a key concept in seeking a solid starting point. It may still turn out that stochasticity is useful to introduce for a certain problem. However, it must first be justified. The method of homogenization under asymptotic expansions (Pavliotis and Stuart 2007) provides such a solid methodology for introducing stochasticity in a justifiable manner.
In general, when a more viable theoretical explanation exists under a deterministic formulation for a given process, it would be wise not to invoke stochasticity unless there is a strong reason to do so. For example, there is a rather simple explanation for the transformation of shallow to deep convection based on a deterministic energy-cycle principle (Yano and Plant 2012c, Plant and Yano 2013). In the face of it, it is less sensible to pursue an alternative empirical approach invoking a Markov process (see Dorrestijn et al 2013).
On the other hand, when numerical computations of a full deterministic system are too costly, an alternative stochastic approach may be taken. Use of the Langevin equation in the context of the Lagrangian DDF approach, discussed in section 4.4, would be a good example. An alternative possibility would be to add a stochastic forcing to a low-order model constructed by POD, or under any mode decomposition, in order to crudely represent an unrepresented incoherent subgrid-scale process, provided that a formal method (e.g. homogenization) justifies it.
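The standard building block for such a forcing is the Langevin equation in its simplest, Ornstein-Uhlenbeck form, $\mathrm{d}x = -(x/\tau)\,\mathrm{d}t + \sigma\,\mathrm{d}W$. A minimal sketch (the parameter values and function names are illustrative only, not tied to any specific parameterization):

```python
import random

# Minimal Langevin (Ornstein-Uhlenbeck) sketch: a crude stand-in for an
# incoherent subgrid-scale forcing acting on a low-order model variable x,
# with relaxation time tau and noise amplitude sigma.

def ou_step(x, dt, tau, sigma, rng):
    """One Euler-Maruyama step of an Ornstein-Uhlenbeck process."""
    return x - (x / tau) * dt + sigma * (dt ** 0.5) * rng.gauss(0.0, 1.0)

def ou_path(n_steps, dt=0.1, tau=1.0, sigma=0.5, seed=0):
    """Integrate the process from x = 0 and return the whole trajectory."""
    rng = random.Random(seed)
    x = 0.0
    path = [x]
    for _ in range(n_steps):
        x = ou_step(x, dt, tau, sigma, rng)
        path.append(x)
    return path

path = ou_path(1000)
```

The point emphasized in the text still applies: the values of $\tau$ and $\sigma$, and indeed the very use of such a term, should come from a formal reduction (e.g. homogenization) of the full system rather than from tuning alone.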

Statistical approach?
Along the same line, the literature also often claims that a parameterization must be based on a statistical approach. The original notion may be traced back to Kuo (1974), who introduces it in discussing a life cycle of convection in his formulation. However, his discussion does not quite reveal what 'statistical' means. More often than not, in my own reading, the word is introduced merely as an excuse for not considering the subgrid-scale fluctuations, $\varphi'$, more explicitly in a formulation (see section 2).
A more solid proposal may be found in Arakawa and Schubert (1974). In discussing the closure problem, they suggest that an ultimate answer may be found by developing a statistical cumulus dynamics. However, we still lack systematic efforts for developing such a statistical dynamics for convection: see the review by Plant (2012). A compact, but very general, self-contained formulation of statistical mechanics, useful for this purpose, is presented by Jaynes (1978).

Conclusions
The present review began by asking how it would be possible to construct subgrid-scale parameterizations in a consistent manner. Some basic principles required to achieve this goal are outlined in section 1.1; see also Yano et al (2014). In order to see how these principles can be applied more concretely, three major general approaches for parameterization have been closely examined.
The moment-based approach has been extensively developed for turbulence as well as for the atmospheric boundary layer. The main concern here is to estimate the vertical eddy transports of various quantities, such as heat and moisture, within the boundary layer. From the point of view of the moments, this problem reduces to that of seeking a closed expression for the correlation between the vertical velocity and the transported quantities. The lack of a well-defined methodology for defining the closure is clearly the weakest part of this approach. The methodology faces further difficulties when a transported quantity is not conservative.
Another important limitation of the moment-based approach is that it solely depends on a single-point description of the subgrid-scale processes. Thus, when the subgrid-scale processes present substantial spatial structures, we need to take spatial correlations into account in order to make this approach useful. The extent of spatial organization that forces us to take spatial correlations into account is, however, not known. An optimistic point of view can still be taken that the moment-based approach remains valid with some coherencies, as observed in the atmosphere (see Mironov 2009).
An alternative approach is to consider the subgrid-scale distribution (DDF) more explicitly. It has been emphasized that the Liouville equation is the basic principle that describes the time evolution of the DDF. Existing efforts to exploit this principle as explicitly as possible in turbulent combustion modeling have also been emphasized. Similarly systematic efforts based on the DDF are still to be seen in atmospheric modeling.
The third possibility considered is mode decomposition as a basis for developing subgrid-scale parameterizations. This methodology is proposed with the ubiquitous existence of coherent structures in turbulent flows, including atmospheric flows, in mind. The hot tower in atmospheric convection is just a particular example. When a mode set is chosen carefully, these coherent structures can be expressed efficiently under a heavy truncation of the mode expansion. The POD (or SVD, EOF) is one such approach proposed in turbulence modeling. The mass-flux formulation for atmospheric convection is considered another example.
Application of mode decomposition to a system is rather straightforward. Our experiences with choices of mode decomposition are extensive enough to choose one for a new problem in a reasonable manner, though a matter of taste may always remain. Both POD and mass flux are chosen in such a manner. Wavelets are also considered an important alternative. However, the degree of truncation to adopt is far from obvious, and is as difficult as choosing a closure level in the moment expansion.
The standard mass-flux convection parameterization faces further issues, as it is constructed under further approximations and hypotheses. The two key issues are entrainment-detrainment and closure. Among those, I personally see that the entrainment-detrainment problem lacks a clear guiding principle to pursue. As a result, the choice of the entrainment and detrainment rates tends to turn into a tuning exercise. Though more often than not the choice is supported by a cloud-resolving simulation, the physical variable that controls these parameters is chosen rather on the basis of physical speculation, and a dependency curve is often drawn from a rather scattered plot. On the other hand, the closure problem can be formulated in a general manner in terms of the stationarity of any vertically integrated quantity (e.g. CAPE, column-integrated moisture). This leads to a much wider range of possibilities to pursue than hitherto considered. The mass-flux convection parameterization can, furthermore, be generalized by its reinterpretation as SCA. It has already been shown that this generalization can represent convectively generated gravity waves as well as organized convection under a shear flow.
The importance of an effort to deconstruct a particular parameterization formulation can hardly be overemphasized. Here, by better understanding the whole structure of the mass-flux formulation under SCA, we see the possibility of constructing a version without the entrainment-detrainment hypothesis, or without closure, or without either.
Construction of a parameterization under mode decomposition further suggests that it may essentially reduce to a problem of AMR. Where we expect an ensemble of hot towers, we simply place fine resolution so that they are resolved. Both the POD and mass-flux approaches are essentially constructed under this premise. However, this idea may not be possible to pursue in a more general manner, because not all subgrid-scale processes take a coherent structure. Even more generally, whatever resolution is taken locally, there are always certain processes that are not yet explicitly represented, unless the model resolution is high enough to resolve the molecular dissipation scale. Difficulties in the AMR procedures themselves have also been emphasized.
However, this last perspective suggests a close link between subgrid-scale parameterization and numerical algorithms. Consistency between them is clearly another issue to be closely examined, as attempted e.g. in Margolin and Rider (2002), and also by following the general principles outlined in section 1.1.