Predictability Problems of Global Change as Seen through Natural Systems Complexity Description 2. Approach

Developing the general statements of the proposed global change theory outlined in Part 1 of the publication, we use Kolmogorov's probability space to study the properties of information measures (unconditional, joint and conditional entropies, information divergence, mutual information, etc.). The space is composed of a set of elementary events, a specified algebra of its subsets and a probability measure on that algebra. The information measures are analyzed using the mathematical expectation operator and the correspondence between an additive set function and its equivalents in the form of the measures. As a result, multispectral satellite imagery visualization procedures are explained using Markov chains of random variables represented by the pixels of the imagery. The proposed formalism of applying the information measures makes it possible to describe the complexity of natural targets by syntactically governing probabilities. Stated as the problem of finding signal/noise ratios for anomalies of natural processes, the predictability problem is solved by analysing temporal data sets of related measurements for key regions and their background, within contextually coherent structures of natural targets and between particular boundaries of the structures.

General statements of research programmes concerning global change issues were outlined in Part 1 of the publication. The statements were considered as components of a possible application of discrete dynamics to the study of global change through induced representations of information sub-spaces, taking the stringent terms of sets, measures and metrics (SMM) into account. The categories of order and chaos in dynamic systems were described in order to develop conceptual models of global analysis, interpretation and modelling within the major framework of the SMM categories.
Correct (well-posed) and incorrect (ill-posed) problems were set up mathematically in order to establish the existence, uniqueness and stability of their solutions with respect to disturbances of the initial data given by remote sensing measurements.
Below we present our approach to employing information measures and entropy metrics for describing the complexity of natural targets and the construction of structures, in order to find ordering procedures for processing multispectral satellite imagery of the targets and structures. This is needed to proceed to the analysis of temporal data sets of the imagery, which should bring us closer to understanding the predictability problems of global change.

KOLMOGOROV'S PROBABILITY SPACE
In our considerations we shall operate with the chaos and order characteristics within the concept of Kolmogorov's probability space. This space is defined when the following three categories $(\Omega, \mathcal{F}, \mu)$ are known: $\Omega$ is a set of elementary events, together with their composite elements, which are the only ones considered in classical probability theory; $\mathcal{F}$ is a special algebra of subsets of $\Omega$, called a $\sigma$-algebra, that is associated with the events $A_k$; the latter are defined provided both their union (sum) $\bigcup_k A_k$ and their intersection (product) $\bigcap_k A_k$ exist in the infinite sequence of the events; $\mu$ is a probability measure on the algebra $\mathcal{F}$.
Random variables $X$ with particular values $x$ on a finite set $\mathcal{X}$ can then be defined as the result of the mapping $X: \Omega \to \mathcal{X}$ such that $X^{-1}(x) \in \mathcal{F}$ for all $x \in \mathcal{X}$. The probability of an event expressed in terms of these random variables is the $\mu$-measure of the corresponding subset $A$ of the set $\mathcal{X}$, i.e.
$$P\{X \in A\} = \mu(\{\omega : X(\omega) \in A\}).$$
We shall suppose hereafter that the major probability space $(\Omega, \mathcal{F}, \mu)$ is "sufficiently rich" in the sense that, for any pair of infinite sets $\mathcal{X}$ and $\mathcal{Y}$, any random variable $X$ with values in $\mathcal{X}$ and any distribution $P$ on $\mathcal{X} \times \mathcal{Y}$ (this notation means the Cartesian product of the infinite sets) whose restriction to $\mathcal{X}$ coincides with the given distribution $P_X$, a random variable $Y$ exists such that $P_{XY} = P$. This supposition is certainly valid if $\Omega$ is the unit interval $(0, 1)$, $\mathcal{F}$ is the family of its Borel subsets in the Euclidean space and $\mu$ is the Lebesgue measure (Kolmogorov and Fomin, 1976).
Let us identify the set of all probability distributions on the finite set $\mathcal{X}$ with a subset of the $n = |\mathcal{X}|$-dimensional Euclidean space that consists of all vectors with components $p_k \ge 0$ such that $\sum_k p_k = 1$. Linear combinations and convexity are then understood in accordance with this identification. For instance, the convexity of a real function $f(P)$ of a probability distribution on $\mathcal{X}$ means that
$$f(\alpha P_1 + (1 - \alpha) P_2) \le \alpha f(P_1) + (1 - \alpha) f(P_2)$$
for any distributions $P_1$, $P_2$ and $\alpha \in (0, 1)$.
It is possible to use topological terms for probability distributions on $\mathcal{X}$, assuming that they refer to the metric topology defined by the Euclidean distance. In particular, the convergence $P_n \to P$ implies that $P_n(x) \to P(x)$ for any $x \in \mathcal{X}$.
With these notations and definitions we can introduce information measures, keeping in mind their relevance to the well-known probability theory, on the one hand, and to remote sensing data applications, on the other. Of particular interest to us here are the following measures: unconditional, joint and conditional entropies, information divergence, mutual information, etc.

INFORMATION MEASURES
The scalar amount of information in any set of measurements is defined through the mathematical expectation operator $E$ in the following way:
$$H(X) = H(P) = E(-\log P(X)) = E \log \frac{1}{P(X)} = -\sum_{x \in \mathcal{X}} P(x) \log P(x).$$
This is the alternative representation of the entropy of a random variable $X$ or of a probability distribution $P_X = P$. The entropy is the measure of the a priori uncertainty contained in the variable $X$ before it is measured or observed; the main property of this measure of the amount of information is
$$0 \le H(P) \le \log |\mathcal{X}|.$$
Understanding the entropy as the measure of uncertainty about a process under study means that a "more homogeneous" distribution possesses larger entropy values. If two distributions $P$ and $Q$ are given on $\mathcal{X}$, then by this "homogeneity" we imply that $P > Q$ provided that, for the non-increasing orderings $p_1 \ge p_2 \ge \dots \ge p_n$, $q_1 \ge q_2 \ge \dots \ge q_n$ ($n = |\mathcal{X}|$) of the probabilities from these distributions, the following inequality is valid for any $k$, $1 \le k \le n$:
$$\sum_{i=1}^{k} p_i \le \sum_{i=1}^{k} q_i,$$
so that the condition $P > Q$ entails the other inequality:
$$H(P) \ge H(Q).$$
The information divergence, which is connected with the testing of statistical hypotheses, is the measure of the difference between the distributions $P$ and $Q$ and is also given by the operator $E$:
$$D(P, Q) = E \log \frac{P(X)}{Q(X)} = \sum_{x \in \mathcal{X}} P(x) \log \frac{P(x)}{Q(x)}.$$
The conditional entropy is the measure of the additional amount of information contained in the random variable $Y$ when $X$ is already known, and is expressed through the joint entropy of the pair of variables and the unconditional entropy:
$$H(Y \mid X) = H(X, Y) - H(X).$$
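The entropy bounds, the "more homogeneous" ordering and the asymmetry of the divergence can be checked numerically. The following is a minimal sketch, not part of the original derivation: the function names and the two three-level distributions are hypothetical, and logarithms are taken to base 2.

```python
import math
from itertools import accumulate


def entropy(p):
    """H(P) = -sum P(x) log2 P(x), in bits; terms with P(x) = 0 contribute 0."""
    return -sum(px * math.log2(px) for px in p if px > 0)


def divergence(p, q):
    """D(P, Q) = sum P(x) log2(P(x)/Q(x)); assumes Q(x) > 0 wherever P(x) > 0."""
    return sum(px * math.log2(px / qx) for px, qx in zip(p, q) if px > 0)


def is_more_homogeneous(p, q, eps=1e-12):
    """True if every partial sum of P (sorted in non-increasing order)
    does not exceed the matching partial sum of Q, i.e. P > Q in the text."""
    p_sums = accumulate(sorted(p, reverse=True))
    q_sums = accumulate(sorted(q, reverse=True))
    return all(a <= b + eps for a, b in zip(p_sums, q_sums))


# Hypothetical three-level brightness histograms (illustrative values only).
P = [0.4, 0.3, 0.3]
Q = [0.7, 0.2, 0.1]

print(is_more_homogeneous(P, Q))            # P is the flatter distribution
print(entropy(P), entropy(Q))               # both lie between 0 and log2(3)
print(divergence(P, Q), divergence(Q, P))   # the two values differ
```

The small tolerance `eps` only guards against floating-point rounding in the partial sums; it is an implementation choice, not part of the definition.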
Due to the definition $P_{Y|X}(y \mid x) = P_{XY}(x, y) / P_X(x)$ for any $P_X(x) > 0$, the conditional entropy can also be written as
$$H(Y \mid X) = \sum_{x \in \mathcal{X}} P_X(x) H(Y \mid X = x),$$
where
$$H(Y \mid X = x) = -\sum_{y \in \mathcal{Y}} P_{Y|X}(y \mid x) \log P_{Y|X}(y \mid x),$$
i.e. the properties of the information categories make it possible to express the conditional entropy $H(Y \mid X)$ as the mathematical expectation of the entropy of the conditional distribution of $Y$ under the condition $X = x$.
Mutual information of the variables $X$ and $Y$ serves as the measure of the stochastic dependence between these variables:
$$I(X \wedge Y) = H(X) + H(Y) - H(X, Y) = H(X) - H(X \mid Y).$$
We use a different letter for this measure ($I$ instead of $H$ in all other cases) simply to follow the traditional notation. In particular, the formula $I(X \wedge X) = H(X)$ expresses the amount of information that is contained in $X$ about itself.
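As an illustration of these definitions, the sketch below computes the conditional entropy as an expectation over the conditional distributions and the mutual information from the entropies. It assumes a joint distribution stored as a nested list `joint[x][y]`; this layout and all function names are illustrative choices, not taken from the source.

```python
import math


def entropy(p):
    """Entropy in bits of a list of probabilities."""
    return -sum(v * math.log2(v) for v in p if v > 0)


def marginals(joint):
    """Marginal distributions P_X and P_Y of joint[x][y]."""
    p_x = [sum(row) for row in joint]
    p_y = [sum(col) for col in zip(*joint)]
    return p_x, p_y


def joint_entropy(joint):
    """H(X, Y) of the joint distribution."""
    return entropy([v for row in joint for v in row])


def conditional_entropy(joint):
    """H(Y | X) as the expectation over x of the entropy of P(Y | X = x)."""
    total = 0.0
    for row in joint:
        p_x = sum(row)
        if p_x > 0:
            total += p_x * entropy([v / p_x for v in row])
    return total


def mutual_information(joint):
    """I(X, Y) = H(X) + H(Y) - H(X, Y)."""
    p_x, p_y = marginals(joint)
    return entropy(p_x) + entropy(p_y) - joint_entropy(joint)


# Hypothetical joint distribution of two binary variables.
example = [[0.25, 0.25],
           [0.50, 0.00]]
print(conditional_entropy(example), mutual_information(example))
```

Computing $H(Y \mid X)$ through the conditional distributions, rather than as $H(X, Y) - H(X)$, gives an independent check that the two expressions agree.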
It can be shown that all the listed and any other information measures have the following general properties:
(1) they are non-negative;
(2) they are additive, i.e. $H(X, Y) = H(X) + H(Y \mid X)$, $H(X, Y \mid Z) = H(X \mid Z) + H(Y \mid X, Z)$, ...;
(3) they satisfy the "chain rules" for sequences of random variables:
$$H(X_1, \dots, X_k) = \sum_{i=1}^{k} H(X_i \mid X_1, \dots, X_{i-1}),$$
$$I(X_1, \dots, X_k \wedge Y) = \sum_{i=1}^{k} I(X_i \wedge Y \mid X_1, \dots, X_{i-1}), \dots;$$
(4) $H(P)$ is a concave function of $P$ and $D(P, Q)$ is a convex function of the pair $(P, Q)$, i.e. if $P(x) = \alpha P_1(x) + (1 - \alpha) P_2(x)$ and $Q(x) = \alpha Q_1(x) + (1 - \alpha) Q_2(x)$ for any $x \in \mathcal{X}$ and $0 < \alpha < 1$, then
$$\alpha H(P_1) + (1 - \alpha) H(P_2) \le H(P)$$
and
$$\alpha D(P_1, Q_1) + (1 - \alpha) D(P_2, Q_2) \ge D(P, Q).$$
Owing to their additive properties, these information measures can be considered as formal identities for the random variables. A correspondence has been proven (Csiszar and Korner, 1981) between the identities that are valid for an arbitrary additive set function $f$ and their equivalents in the form of the information measures. Denoting the sign of such correspondence by $\doteq$, we can represent the proven facts as
$$H(X) \doteq f(A), \quad H(Y) \doteq f(B), \quad H(X, Y) \doteq f(A \cup B), \quad H(X \mid Y) \doteq f(A \setminus B), \quad I(X \wedge Y) \doteq f(A \cap B),$$
where $\cup$ and $\cap$ mean the union and the intersection of the sets $A$ and $B$, respectively. In general, the theorem proven in the cited reference states that any of the information measures corresponds to an expression of the type $f((A \cap B) \setminus C)$, with the backslash denoting the set difference, where $A$, $B$, $C$ are infinite unions of sets ($A$ and $B$ are supposed to be non-empty, $C$ may be empty). The converse also holds: any expression of the same type corresponds to the related information measure.
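Property (4) can be illustrated numerically. The sketch below, with hypothetical function names and arbitrarily chosen binary distributions, evaluates the two gaps that concavity of the entropy and convexity of the divergence require to be non-negative.

```python
import math


def entropy(p):
    """Entropy in bits of a list of probabilities."""
    return -sum(v * math.log2(v) for v in p if v > 0)


def divergence(p, q):
    """D(P, Q) in bits; assumes Q(x) > 0 wherever P(x) > 0."""
    return sum(a * math.log2(a / b) for a, b in zip(p, q) if a > 0)


def mix(p1, p2, alpha):
    """Convex combination alpha * P1 + (1 - alpha) * P2."""
    return [alpha * a + (1 - alpha) * b for a, b in zip(p1, p2)]


def concavity_gap(p1, p2, alpha):
    """H(P) - [alpha H(P1) + (1 - alpha) H(P2)]; non-negative by concavity."""
    p = mix(p1, p2, alpha)
    return entropy(p) - (alpha * entropy(p1) + (1 - alpha) * entropy(p2))


def convexity_gap(p1, q1, p2, q2, alpha):
    """[alpha D(P1, Q1) + (1 - alpha) D(P2, Q2)] - D(P, Q); non-negative by convexity."""
    p, q = mix(p1, p2, alpha), mix(q1, q2, alpha)
    weighted = alpha * divergence(p1, q1) + (1 - alpha) * divergence(p2, q2)
    return weighted - divergence(p, q)


# Arbitrary illustrative distributions and mixing weight.
P1, P2 = [0.9, 0.1], [0.2, 0.8]
Q1, Q2 = [0.5, 0.5], [0.3, 0.7]
print(concavity_gap(P1, P2, 0.4), convexity_gap(P1, Q1, P2, Q2, 0.4))
```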
As a result of these facts, the quantity
$$(A \setminus B) \cup (B \setminus A) = A \,\triangle\, B,$$
called the symmetric difference between the sets $A$ and $B$ (Kolmogorov and Fomin, 1976), can be used in the studies. This quantity defines a metric on the subsets $A$ and $B$ of the initial set $\Omega$ of Kolmogorov's space. The function $f((A \setminus B) \cup (B \setminus A))$ does not have any direct analog in information theory. However, the corresponding quantity
$$d(X, Y) = H(X \mid Y) + H(Y \mid X)$$
is a metric on the space of random variables. This can be verified by checking that its properties correspond to the initial axioms of metric spaces:
$$d(X, Y) \ge 0, \quad d(X, X) = 0, \quad d(X, Y) = d(Y, X), \quad d(X, Y) + d(Y, Z) \ge d(X, Z).$$
It is not difficult to find that the information measures are continuous relative to the entropy metric, for example:
$$|H(X) - H(Y)| \le d(X, Y).$$
However, the information divergence $D(P, Q)$ cannot serve as a measure that satisfies the requirements of a distance on the space of probability distributions, since it is not symmetric. Even the "symmetrized" divergence $J(P, Q) = D(P, Q) + D(Q, P)$ is not a distance, since probability distributions $P_1$, $P_2$, $P_3$ can be found for which the following inequalities are valid simultaneously:
$$D(P_1, P_2) + D(P_2, P_3) < D(P_1, P_3), \qquad D(P_3, P_2) + D(P_2, P_1) < D(P_3, P_1).$$
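Both statements can be made concrete with small examples. The sketch below assumes the entropy metric in the form $d(X, Y) = H(X \mid Y) + H(Y \mid X) = 2H(X, Y) - H(X) - H(Y)$ and uses three binary distributions chosen here purely for illustration; they are one possible triple for which the triangle inequality fails for the divergence and for its symmetrized form.

```python
import math


def entropy(p):
    """Entropy in bits of a list of probabilities."""
    return -sum(v * math.log2(v) for v in p if v > 0)


def divergence(p, q):
    """D(P, Q) in bits; assumes Q(x) > 0 wherever P(x) > 0."""
    return sum(a * math.log2(a / b) for a, b in zip(p, q) if a > 0)


def symmetrized_divergence(p, q):
    """J(P, Q) = D(P, Q) + D(Q, P)."""
    return divergence(p, q) + divergence(q, p)


def entropy_metric(joint):
    """d(X, Y) = H(X|Y) + H(Y|X) = 2 H(X, Y) - H(X) - H(Y) for joint[x][y]."""
    p_x = [sum(row) for row in joint]
    p_y = [sum(col) for col in zip(*joint)]
    h_xy = entropy([v for row in joint for v in row])
    return 2 * h_xy - entropy(p_x) - entropy(p_y)


# Three illustrative binary distributions lying "on a line".
P1, P2, P3 = [0.1, 0.9], [0.5, 0.5], [0.9, 0.1]

# Going through P2 is "shorter" than going directly: no triangle inequality.
print(divergence(P1, P2) + divergence(P2, P3), divergence(P1, P3))
print(symmetrized_divergence(P1, P2) + symmetrized_divergence(P2, P3),
      symmetrized_divergence(P1, P3))
```

The entropy metric, in contrast, vanishes when the two variables determine each other, as for a diagonal joint distribution.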
Following the listed results, we can study the properties of the information sub-spaces as embedded into the main probability space that comprises the random variables. Additional explanations are required to consider the sub-spaces relative to the distribution space, because of the concavity of the entropy and the convexity of the information divergence in that space. In the first case, an opportunity emerges to devise a unified description of different data set representations for selected classes of natural targets using their transformed images given by remote sensing measurements. The description is based on the SMM categories and gives rise to imagery visualization procedures, which are usually implied when speaking about the thematic interpretation of images in a particular subject area. These procedures make it possible to find an analog of the subjective analysis of single satellite pictures by the eyes of an experienced interpreter, when the analyzed picture is displayed on a computer screen or as a hard copy. The rigorous definition of such visualization, which also includes multispectral analysis that is practically inaccessible to subjective interpretation, originates from the SMM considerations in the metric information sub-spaces.

IMAGERY VISUALIZATION
Of particular importance for an objective interpretation of the variability of natural targets on their space imagery are Markov chains, an effective tool for identifying "a recipe" of pixel ordering within a spatial structure. The mutual information introduced above is the most useful quantity for analysing alterations on sets of pixels considered as sequences of random variables. By definition, a finite or infinite sequence of variables $X_1, X_2, \dots$ with finite sets of values is called a Markov chain (Pougachev, 1979) if, for any $i$, the variable $X_{i+1}$ is conditionally independent of $(X_1, \dots, X_{i-1})$ given $X_i$. The latter notation is commonly used in information theory: if information measures depend on a set of random variables and these variables can be represented by a single symbol, the set is written as an argument without any parentheses. The parentheses are used to emphasize mutual information between the variables. Random variables $X_1, X_2, \dots$ generate a conditional Markov chain relative to the variable $Y$ if, for any $i$, the variable $X_{i+1}$ is conditionally independent of $(X_1, \dots, X_{i-1})$ given $(X_i, Y)$. Both types of chains serve to find elements of ordering on the images.
Since, by the definition of mutual information, $I(X \wedge Y) = 0$ if the variables $X$ and $Y$ are independent, and $I(X \wedge Y \mid Z) = 0$ if $X$ and $Y$ are conditionally independent given $Z$, it can be stated that the elements (pixels) of a multispectral image represented by $X_1, X_2, \dots$ make up a Markov chain if and only if
$$I(X_1, \dots, X_{i-1} \wedge X_{i+1} \mid X_i) = 0$$
for any $i$. The similar condition for the conditional Markov chain relative to $Y$ reads
$$I(X_1, \dots, X_{i-1} \wedge X_{i+1} \mid X_i, Y) = 0.$$
Let the random variables $X_{i-1}$, $X_i$, $X_{i+1}$ be three levels of delineating the pixels of one spectral band of an image while identifying the decision rules for the separability of the pixels, and let the variable(s) $Y$ in the above sense be related to the second band of the image. Then the search for those pixels of the image structure for which the last "chain rule" is valid represents the essence of the visualization procedure for the two-band image. Applying these rules results in finding contextually coherent structures (Kozoderov, 1997) of natural targets and in accounting for the related measures of the description of the targets' complexity. Considering the random variables $X_1, \dots, X_i, X_{i+1}, Y$ as characteristics of particular spectral bands, the number of which is $i + 2$, this search for the ordering measures using the chain rules serves to elucidate the optimal selection of the number of bands and the efficiency of the relevant instruments called imaging spectrometers (Mission to Planet Earth, 1996). Both these aspects of the applicability of the information measures need to be realized in constructing new versions of special computer languages.
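The Markov condition can be tested directly on a joint distribution by evaluating the conditional mutual information. The following is a minimal sketch under stated assumptions: pixel values are taken to be already quantized to a few discrete levels, the joint distribution is stored as a dictionary from outcome tuples to probabilities, and all names (`markov_joint`, `conditional_mutual_information`) are hypothetical. It uses the identity $I(A \wedge B \mid C) = H(A, C) + H(B, C) - H(C) - H(A, B, C)$.

```python
import math
from collections import defaultdict
from itertools import product


def entropy(dist):
    """Entropy in bits of a dict mapping outcomes to probabilities."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def marginal(joint, keep):
    """Marginal distribution over the coordinates listed in `keep`."""
    out = defaultdict(float)
    for outcome, p in joint.items():
        out[tuple(outcome[i] for i in keep)] += p
    return out


def conditional_mutual_information(joint, a, b, c):
    """I(A, B | C) = H(A, C) + H(B, C) - H(C) - H(A, B, C);
    a, b, c are tuples of coordinate indices."""
    return (entropy(marginal(joint, a + c)) + entropy(marginal(joint, b + c))
            - entropy(marginal(joint, c)) - entropy(marginal(joint, a + b + c)))


def markov_joint(initial, transition, length):
    """Joint distribution of the first `length` states of a Markov chain."""
    joint = {}
    for path in product(range(len(initial)), repeat=length):
        p = initial[path[0]]
        for s, t in zip(path, path[1:]):
            p *= transition[s][t]
        joint[path] = p
    return joint


# A two-level chain (illustrative numbers): I(X1, X3 | X2) should vanish.
chain = markov_joint([0.3, 0.7], [[0.9, 0.1], [0.2, 0.8]], 3)
print(conditional_mutual_information(chain, (0,), (2,), (1,)))

# A non-Markov triple: X3 = X1 XOR X2 with independent uniform X1, X2.
xor = {(a, b, a ^ b): 0.25 for a in (0, 1) for b in (0, 1)}
print(conditional_mutual_information(xor, (0,), (2,), (1,)))
```

In practice the joint distribution would have to be estimated from pixel counts, so the computed value is only approximately zero for a Markov structure and a threshold has to be chosen.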

NATURAL TARGETS COMPLEXITY
Most pattern recognition and scene analysis techniques that form the scientific basis for multispectral imagery processing fall into two groups: one is approached from the decision making position (Tou and Gonzalez, 1978) and the other is considered within the syntax approach (Fu, 1977). In the first case, natural objects (specific targets) are characterized by sets of numbers. These numbers are digital equivalents of the results of remote sensing measurements. Pattern recognition, as a procedure of attributing each pixel of an image to some class, is carried out in this case by sub-dividing the entire space of characteristic features into selected areas delineated by sets of such rather standard procedures. Classes are to be defined in accordance with the probability distribution functions for sets of pixels on the scene being processed. In the second case (the structural description of each pattern), it is required that the recognition procedure should not only assign an object to a particular class, but also describe those peculiarities of the object which exclude its assignment to any other class.
Developing the known metric pattern recognition theory (Grenander, 1976; 1978), we can extend the definition of pixels in the techniques of the first case to elements $x \in \mathcal{X}$ in the second case.
The recognition of the images in the second case is based on an analogy between "the structural patterns" (hierarchical or in a tree form) and the syntax of a computer language. The recognition in this case consists in a syntax analysis of "a grammatical sentence" that describes a concrete scene under analysis. This scene is reflected by sets of various objects to be described quantitatively by the information measures. Such elements, which can be called generators, are natural to use for constructing configurations. This means that, by inducing a group of transformations on the set $\mathcal{X}$, the set of objects to be recognized is divided into equivalence classes. The configurations are determined by the composition and structure of their generators and by the combinatorial theory of constructing the configurations on particular imagery of natural objects to be analyzed by the proposed treatments.
If it is possible to assume a structural combination of the generators into configurations, then these combined objects, characterized by the composition of bonds between the elements and by their own structures, are the starting point for studying new classes of metric images. The direct problem of studying the processes of image formation through the mathematical operations of combination, identification and deformation is usually called imagery synthesis, whilst the inverse problem of selecting particular configurations on the images is called imagery analysis (Grenander, 1978). Denoting by $\mathcal{R}$ a system of rules or restrictions that define which configurations are regular, we can write the following symbolic expression for a computer language representation on a set of such regular configurations $\mathcal{C}(\mathcal{R})$:
$$\mathcal{R} = \langle G, S, \Sigma, \rho \rangle,$$
where $G$ is the set of generators, $S$ is the set of transformations of the generators, $\Sigma$ is the type of the bonds for the taken sets of generators, and $\rho$ is the ratio of consistency between the possible bonds in their structural connections. These bonds and connections may serve, in the first approximation, as a measure of the structure complexity. A more comprehensive definition of the complexity in matrix form will be given below. The regularity of these configurations on particular imagery is understood in these studies as the presence of specific structural connections between the elements that are not purely random; otherwise, no opportunity could be found in traditional supervised procedures of pattern recognition techniques, and all attempts to create "an artificial intelligence" by finding regular rules of the element connections would be ambiguous.
There is a theorem (Grenander, 1976) that tree-type bonds and the equality for the ratio of consistency $\rho$ induce the above Markov properties of the probability measures on sets of their regular configurations. These sets are understood in the sense of the existence of a topology $\tau$ on Kolmogorov's initial space (on the set $\Omega$), where the system $\tau$ of subsets $F$ satisfies the following requirements (Kolmogorov and Fomin, 1976): (i) the set $\Omega$ and the empty set belong to $\tau$; (ii) the union $\bigcup_k F_k$ of any finite or infinite collection and the intersection $\bigcap_{k=1}^{n} F_k$ of any finite number of such sets from $\tau$ belong to the topology.
Three known axioms of separability are valid for such topological spaces $T = (\Omega, \tau)$ (Sadovnichii, 1979): (1) for any two points $x$ and $y$ of the space $T$ there exist a neighbourhood $O(x)$ of the point $x$ that does not contain $y$ and a neighbourhood $O(y)$ of the point $y$ that does not contain $x$; (2) any two points $x$ and $y$ of the space $T$ have disjoint neighbourhoods $O(x)$ and $O(y)$ (the known Hausdorff axiom); (3) any point and any closed set not containing the point have disjoint neighbourhoods.
The above regularity of the configurative probability space is accepted by us as satisfying the axioms (1) and (3). Having these rules in mind, the syntactically governing probabilities can be defined as
$$1 \ge p_r > 0, \qquad \sum_{r \in T_i} p_r = 1 \quad \text{for any } i \in N,$$
where $r$ runs from 1 to $\rho$ and $N$ is the number of elements of the syntactical variables. Now it is possible to introduce the complexity matrix for the grammatical rules describing the bonds of the regular configurations:
$$M = \{m_{ij} : i, j = 1, 2, \dots, v\},$$
where $m_{ij} = \sum_{r \in T_i} p_r n_j(r)$; $T_i$ is the set of the permutation (rewriting) rules for the syntactical variable with index $i$; $n_1(r), n_2(r), \dots, n_v(r)$ are the numbers of appearances of the first, second, ..., $v$th syntactical variables in the rule $r$, and $v$ is the total number of these variables in the grammar being tested.
Recalling that the entropy is a measure of any ordering, we can use it to write the following expression for the syntactically governing probabilities $p_r$ in the attempt to order the grammar rules:
$$h_i = h(T_i) = -\sum_{r \in T_i} p_r \log p_r, \qquad i = 1, 2, \dots, v.$$
The entropy of a style of imagery description can then be introduced as
$$H = H(B) = -\sum_{J} p_J(B) \log p_J(B),$$
where the probabilities $p_J$ are to be known for any possible chain $J \in \mathcal{C}(\mathcal{R})$ for the sets of outputs while fitting the style of the imagery description in the tree form. The symbol $B$ denotes that, using the proposed concept of generators, bonds and sets of regular configurations for the information measures, the rules forming the description style of the chosen computer language are to be selected only from the induced syntactically governing probabilities.
Returning to the above relation between the information measures, $H(X, Y) = H(X) + H(Y \mid X)$, and considering $X$ as a random variable for the resulting grammar and $Y$ as that for the style of its description, one can obtain
$$H_i = h_i + \sum_{j=1}^{v} m_{ij} H_j,$$
or in the matrix form
$$H = [I - M]^{-1} h,$$
where the matrix $H$ of the ordering of the style of description is expressed through the matrix $h$ of the ordering of the imagery syntax by means of the inverse of the difference between the unit matrix $I$ and the complexity matrix $M$. This gives a rule for testing a special language of the structural imagery description (Kozoderov, 1997).
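A toy grammar shows how the complexity matrix and the matrix relation work together. The sketch below is illustrative only: the representation of rules as `(probability, counts)` pairs and the function names are assumptions made here, logarithms are to base 2, and instead of inverting $I - M$ explicitly it iterates the component form $H_i = h_i + \sum_j m_{ij} H_j$, which converges to the same solution when the spectral radius of $M$ is below one.

```python
import math


def rule_entropy(probs):
    """h_i = -sum p_r log2 p_r over the rules of one syntactical variable."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def derivation_entropy(rules, tol=1e-12, max_iter=10_000):
    """rules[i] lists (p_r, counts) for variable i, where counts[j] is the
    number of times variable j appears on the right-hand side of rule r.
    Returns (M, h, H) with H solving H = h + M H by fixed-point iteration."""
    v = len(rules)
    h = [rule_entropy([p for p, _ in rules[i]]) for i in range(v)]
    M = [[sum(p * counts[j] for p, counts in rules[i]) for j in range(v)]
         for i in range(v)]
    H = h[:]
    for _ in range(max_iter):
        H_new = [h[i] + sum(M[i][j] * H[j] for j in range(v)) for i in range(v)]
        if max(abs(a - b) for a, b in zip(H_new, H)) < tol:
            return M, h, H_new
        H = H_new
    raise ValueError("no convergence: spectral radius of M may be >= 1")


# Toy grammar with one variable S: S -> a S (p = 0.5) | b (p = 0.5).
loop_rules = [[(0.5, [1]), (0.5, [0])]]

# Toy grammar with two variables: S -> A A (p = 1); A -> a | b (p = 0.5 each).
tree_rules = [[(1.0, [0, 2])],
              [(0.5, [0, 0]), (0.5, [0, 0])]]

print(derivation_entropy(loop_rules)[2])
print(derivation_entropy(tree_rules)[2])
```

For the first grammar the derivation length is geometrically distributed, so the result can be compared with the entropy of that distribution; for the second, the two independent binary choices give two bits for $S$.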
To sum up the results, the related languages can be tested by applying them to the description of the contextually coherent structures mentioned above. It is worthwhile to develop the techniques for computer workstations in order to proceed from these improvements to the final stage of multispectral satellite imagery analysis. This stage concerns the extraction of signal/noise ratios from temporal sets of consecutive images with the structures described beforehand in accordance with the given complexity procedures. By analysing temporal sets of satellite imagery obtained by multispectral radiometers of different spatial resolutions, and describing the imagery structures by the proposed techniques, we are able to proceed from the description of regional structures for selected natural targets to their global changes.
Let us add a few words about the complexity category.
Discussing the incorrect problems of data set interpretation in Part 1 of the paper, we used complexity functionals in the regularization techniques. These functionals were applied to find the models which, while satisfying the general formalism of solutions of the inverse problems, would be of "the minimal complexity". This term was used there to select, from sets of similar models, those specific models which are comparable with the accuracy of the observational data that were to be fitted to the theoretical results of modelling. More complicated models in this sense could be less consistent with the relevant set of observations than these models of the minimal complexity.
Speaking here about rules and restrictions in the structural combination of the proposed "standard blocks" (generators), we have introduced the structural complexity of configurations, which are regular in the sense of how one set of these "details" can be embedded into another, higher-level construction. If necessary, we can use the definition of "the quantitative complexity" of a configuration, simply counting the number of generators in the configuration. By the general expression of the complexity in the matrix form above, we determined the grammar rules of the complexity description using the syntax and style of the language in the analysis of the structures on multispectral satellite images for selected classes of natural targets. The term "complexity of terrestrial ecosystems" in the international Global Change and Terrestrial Ecosystems (GCTE) project (Kozoderov, 1995) is in fact identical to biological diversity. Our intention is to describe changes in terrestrial ecosystems by the overall complexity matrix of natural systems while using regular observations of the systems. Biodiversity changes would inevitably result in observable changes on remote sensing images. Thus, we do have an opportunity to filter out all seasonal harmonics of vegetation growth on the images and to deal with the "signals" of their possible change by applying the discrete dynamics techniques proposed below.

PREDICTABILITY OF GLOBAL CHANGE
The scientific basis for solving the predictability problems is given by cross-correlation techniques for finding asynchronous correlations of anomalies of the fields under study (outgoing long-wave radiation, the biomass amount of vegetation, etc.). These correlations are represented in the following form of the signal/noise ratio for two first-order autoregressive Markov processes (Marchuk et al., 1990):
$$\alpha(\tau) = \frac{R_{XY}(\tau)}{SD(R_{XY}(\tau))}.$$
Here
$$R_{XY}(\tau) = \frac{1}{n - \tau} \sum_{k=1}^{n - \tau} \frac{(X_k - \bar{X}_T)(Y_{k+\tau} - \bar{Y}_T)}{S_X S_Y}$$
are the cross-correlations of the anomalies (deviations from the mean, normalized by the root-mean-square standard) of the studied quantities for $n$ observations and different time shifts $\tau$; $SD(C_{XY})$ is the standard deviation of the observation covariances for two time intervals; $C_{XY} = R_{XY}(\tau) S_X S_Y$ is the sampling estimate of the cross-covariance for the two discrete processes $X$ and $Y$, where $S_X$ and $S_Y$ are the sample standard deviations of the two series; and
$$\bar{X}_T = \frac{1}{n} \sum_{k=1}^{n} X_k, \qquad \bar{Y}_T = \frac{1}{n} \sum_{k=1}^{n} Y_k.$$
The index $T$ characterizes the averaging procedure for the two data sets.
The analytical solution for the problem with two Markov random processes is known [...], where the covariance of the processes is taken under two shifts $\tau_1$ and $\tau_2$, $N_{XY}$ is the effective number of pairs of the processes under the shifts, $\gamma_{XX}(\tau) = \sigma^2(X) \exp(-\lambda_X \tau)$, $\gamma_{YY}(\tau) = \sigma^2(Y) \exp(-\lambda_Y \tau)$; $\sigma^2(X)$ and $\sigma^2(Y)$ are the variances (dispersions) of the analyzed fields for different periods of time.
The statistical confidence of the cross-correlation coefficients given by these formulae at the 95% confidence level is expressed by $R_{XY}(0) > 2\,SD(C_{XY}) / (\sigma(X)\sigma(Y))$. The predictability of the process $X(t)$ via $Y(t)$, both represented in discrete form in time $t$, is then determined by the expression $P_{XY}(\tau) = \alpha_{XY}(\tau)/2$, which is expressed through $R_{XY}(\tau)$, $N_{XY}$ and an exponential term in the decay rate $\lambda_X$ [...]. The statistical significance of the anomalies of the fields under study is defined here by the number of independent samples $N_{XY}$, which is determined by the products $R_{XX}(k) R_{YY}(k)$ of the autocorrelations of the two processes [...].
Ultimately, the predictability problem is reduced to finding the cross-correlations between "particular points" of multispectral remote sensing imagery and their background. The selection of these points and the analysis of the structures that comprise them are the subject of the above visualization techniques. The next stage of imagery processing is to retrieve the state parameters of natural objects classified in accordance with routine pattern recognition and scene analysis techniques using relevant imagery transformation procedures (Curran et al., 1990). The final stage is the analysis of temporal data sets, which makes it possible to understand the predictability problem in the way presented here. All these stages of information and mathematical applications to the interpretation of satellite imagery data sets are an example of advances in the multidisciplinary description of natural processes.
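The cross-correlation stage can be sketched as follows. This is an illustration under explicit assumptions rather than the procedure of the cited reference: the lagged correlation pairs $X_k$ with $Y_{k+\tau}$, the effective number of independent samples is taken as $n / \sum_k R_{XX}(k) R_{YY}(k)$ (a commonly used estimate, assumed here), the noise level of the correlation is taken as $1/\sqrt{N_{XY}}$, and the synthetic first-order autoregressive series and all function names are hypothetical.

```python
import math
import random


def anomalies(series):
    """Deviations of a series from its mean."""
    mean = sum(series) / len(series)
    return [v - mean for v in series]


def cross_correlation(x, y, lag):
    """Sample correlation between x[k] and y[k + lag] for lag >= 0."""
    xa, ya = anomalies(x), anomalies(y)
    n = len(xa)
    cov = sum(xa[k] * ya[k + lag] for k in range(n - lag)) / (n - lag)
    s_x = math.sqrt(sum(v * v for v in xa) / n)
    s_y = math.sqrt(sum(v * v for v in ya) / n)
    return cov / (s_x * s_y)


def effective_sample_size(x, y, max_lag):
    """Assumed estimate: n divided by the sum of autocorrelation products."""
    total = sum(cross_correlation(x, x, abs(k)) * cross_correlation(y, y, abs(k))
                for k in range(-max_lag, max_lag + 1))
    return len(x) / total


def signal_to_noise(x, y, lag, max_lag=20):
    """Ratio of the lagged correlation to its assumed noise level 1/sqrt(N)."""
    n_eff = effective_sample_size(x, y, max_lag)
    return cross_correlation(x, y, lag) * math.sqrt(n_eff)


def ar1_series(n, rho, seed):
    """Synthetic first-order autoregressive series (illustrative data only)."""
    rng = random.Random(seed)
    values, current = [], 0.0
    for _ in range(n):
        current = rho * current + rng.gauss(0.0, 1.0)
        values.append(current)
    return values


# Build two series so that x[k] equals y[k + 3]: y repeats x three steps later.
full = ar1_series(503, 0.8, seed=1)
x, y = full[3:], full[:500]
best_lag = max(range(8), key=lambda lag: cross_correlation(x, y, lag))
print(best_lag, signal_to_noise(x, y, best_lag))
```

With this convention a ratio above 2 corresponds to the 95% confidence criterion quoted in the text, so half of the ratio plays the role of the predictability measure.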

CONCLUSION
The information and mathematical aspects of global change presented in the publication are demonstrated by a unified approach to comparing sets of data in information sub-spaces and to understanding the predictive capabilities in solving the problem. Although formal information theory does not make it possible to solve all problems of satellite imagery interpretation, we have elaborated techniques to describe the complexity of natural structures using the mathematical formalism of sets of data, information measures and entropy metrics. We employ the known axiomatics of Kolmogorov's probability space to emphasize the difference between our approach and the tendency towards purely numerical applications in current international scientific programmes of global change. Our studies are designed to remove the deficiencies of the commonly used information theory that are due to the assumption that a structure under study is finite. Our improvements of the classical theory become possible owing to the proven possibility of unifying different observational data in terms of sets, measures and metrics. The opportunity to account scientifically for the comparability, ordering and calculation measures enables us to extend the existing knowledge about the description of natural structures. In keeping with updated views on order, chaos and similar categories in natural dynamic systems, we have shown how these categories are represented in metric spaces for our purpose of finding regularity in the structures of natural targets represented in information sub-spaces by the imagery of these targets. Our main intention for the future is to elucidate the problem of the "genesis" of information and situations using the unified description of natural processes in the proposed way. Thus, we are approaching an understanding of global change based on major achievements in the information and mathematical sciences.