LMC and SDL Complexity Measures: A Tool to Explore Time Series



Introduction
The word complexity, in its common-sense meaning, refers to systems that are difficult to describe, design, or understand. However, since Kolmogorov presented the concept of computational complexity [1], new ideas have been associated with this word, mainly in the life sciences [2], relating complexity and information [3].
As a consequence, complexity started to be associated with systems and with the emergence of unexpected behaviors due to nonlinearities [4,5] and, concerning system theory [6], a new meaning was carved, postulating that complexity lies halfway between equilibrium and disequilibrium [7].
Developing this idea, in a seminal paper [8], López-Ruiz, Mancini, and Calbet proposed the LMC (López-Ruiz, Mancini, and Calbet) complexity measure for a random distribution by using informational entropy [9] to evaluate equilibrium, and the quadratic deviation from the uniform distribution to evaluate disequilibrium.
However, there has been some criticism of the LMC measure, considering that it is inaccurate for some classes of systems obeying Markovian chains and cannot be considered to represent an extensive variable. Feldman and Crutchfield [10] proposed a correction for the disequilibrium term, replacing it by the relative entropy with respect to the uniform distribution.
Shiner, Davison, and Landsberg proposed another modification of the LMC measure, replacing the disequilibrium term by the complement of the equilibrium term. This measure is called SDL (Shiner, Davison, and Landsberg) [11] and leads to conclusions similar to those obtained by using LMC, for the majority of the usual statistical distributions [2].
The main restriction to LMC and SDL complexity measures is due to Crutchfield, Feldman, and Shalizi, as they argue that an equilibrium system can be structurally complex [12], but this problem could be solved by weighting order and disorder, according to the specific problem to be analyzed.
Since the early 2000s, the idea of adapting LMC and SDL to dynamical systems has been successfully applied to different types of time evolution problems: bird songs [13], neural plasticity [14], interactions between species in ecological systems [2], physiognomies of landscapes [15], economic series [16], spread depression [17], and quantum information [18].
With these ideas in mind, this article presents a systematization of the methodology used in the referred papers, based on LMC and SDL measures, to be applied to temporal series, by defining and calculating the dynamic complexity measures.
The procedure, applied to a temporal series representing some organizational or functional aspect of a system, provides insights regarding the evolution of its complexity.
As the LMC and SDL dynamical measures are based on informational entropy [16], the first task, described in the next section, is to define an alphabet source, associating a probability distribution with the possible system states.
Following the definition of the probability distribution, a new section defines how dynamical LMC and SDL measures can be calculated at each time, based on the individual information associated with the system state at this time, generating temporal series for LMC and SDL measures.
To illustrate the calculation procedure, two examples are presented: one related to a meteorological time series and the other to an economic time series. In both cases, a practical discussion about how to divide the range of the values assumed by the system state is presented.
The examples were chosen to show that the methodology can be applied to different types of phenomena: precipitation (first example), with a strong periodic component, and an economic time series (second example) that seems to be random.
The work is closed with a conclusion section, emphasizing that the same procedure can be applied to any kind of real-valued temporal series, even with different temporal scales, to calculate complexity measures.

Defining Source and Probability Distribution for a Temporal Series
Considering Shannon's model [9] for an information source, a time series $x(t)$ is considered to be a function of the nonnegative integers into a real interval, i.e., $x(t): \mathbb{Z}_+ \to (a, b)$, associating with each time $t_0 + jT$ a real number belonging to $(a, b)$, with $t_0 > 0$ being the initial instant and $T > 0$ an arbitrary period, depending on the data availability. The set $x(t_0), x(t_0 + T), \ldots, x(t_0 + nT)$ is assumed to be a sequence of independent random variables and the stochastic process $x(t)$ as a whole is stationary [19].
The first step is to divide the interval $(a, b)$ into N subintervals. For the sake of simplicity, N is chosen equal to $2^k$, $k \in \mathbb{Z}_+$.
At this point, it could be asked how to choose N, as there is a compromise between precision (high values of N) and speed of calculation (low values of N). This question will not be addressed theoretically; however, in the example section, practical hints about this choice are presented.
Consequently, the source alphabet is defined by the intervals $I_i$, $i = 1, \ldots, N$, with $\bigcup_{i=1}^{N} I_i = (a, b)$ and $I_i \cap I_j = \emptyset$, $\forall i \neq j$. Then, a time interval defined by a given n must be chosen, and for the time sequence $t_0, t_0 + T, \ldots, t_0 + nT$ the values of the variable $x(t)$ must be read and associated with the intervals $I_i$ containing their respective values.
Therefore, for the whole set $t_0, t_0 + T, \ldots, t_0 + nT$, each interval $I_i$ belonging to the source alphabet is associated with $x(t)$ a certain number of times $n_i$, which defines a relative frequency $p_i = n_i/(n + 1)$.
As $p_i \geq 0$ and $\sum_{i=1}^{N} p_i = 1$, it can be taken as a probability, associated with each interval $I_i$.
Following the definition, for each subinterval $I_i \subset (a, b)$, its individual contribution to the whole information entropy is given by $S_i = -p_i \log_2 p_i$; and the maximum value of the informational entropy for the whole source, $S_{max} = \log_2 N = k$, can be calculated [9].
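The construction described in this section can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation. It assumes that the range of the series is taken as its observed minimum and maximum and that the N subintervals have equal width, neither of which is fixed by the text. The function names and the toy data are hypothetical.

```python
import math

def source_distribution(series, k):
    """Partition the range of `series` into N = 2**k equal-width intervals.

    Returns (labels, probs): the interval index of every sample and the
    relative frequency of every interval.
    ASSUMPTIONS: the range is [min(series), max(series)] and the
    intervals have equal width.
    """
    n_bins = 2 ** k
    lo, hi = min(series), max(series)
    width = (hi - lo) / n_bins or 1.0   # guard against a constant series
    # Interval index of each sample; the maximum is folded into the last bin.
    labels = [min(int((x - lo) / width), n_bins - 1) for x in series]
    counts = [0] * n_bins
    for i in labels:
        counts[i] += 1
    # Relative frequency: occurrences divided by the number of samples (n + 1).
    probs = [c / len(series) for c in counts]
    return labels, probs

def entropy_contributions(probs):
    """Per-interval term -p * log2(p), with the convention 0 * log2(0) = 0."""
    return [-p * math.log2(p) if p > 0 else 0.0 for p in probs]

series = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]   # toy data, not from the paper
labels, probs = source_distribution(series, k=2)     # N = 4 intervals
s_i = entropy_contributions(probs)
s_max = math.log2(len(probs))                        # log2 N = k

print(labels)   # interval index visited at each time step
print(probs)    # relative frequency of each interval
print(sum(s_i), s_max)
```

For this evenly spread toy series every interval is visited equally often, so the sum of the individual contributions coincides with the maximum entropy.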

Dynamical LMC and SDL
As the source alphabet and the individual information were defined, the instantaneous values of $x(t)$ are associated with their respective $S_i$, allowing the calculation of the instantaneous value of the equilibrium (disorder) term, $\Delta(t)$ (Equation (1)). Combining (1) with the different definitions of the disequilibrium (order) terms, dynamical LMC and SDL measures are defined.

LMC Dynamical Measure.
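As indicated by López-Ruiz, Mancini, and Calbet [8], the dynamic disequilibrium (order) term can be calculated as the quadratic deviation of the source alphabet probability distribution from the uniform distribution and, consequently, the individual contribution of each interval $I_i$ is
$$D_{LMC}(t) = \left(p_i - \frac{1}{N}\right)^2, \quad (2)$$
where $p_i$ is the relative frequency of the interval $I_i$ containing $x(t)$ and $N$ is the number of intervals. Extending the definition of the LMC measure, the dynamical LMC, calculated at $t = t_0 + jT$, is given by
$$C_{LMC}(t) = \Delta(t)\, D_{LMC}(t). \quad (3)$$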

SDL Dynamical Measure.
As proposed by Shiner, Davison, and Landsberg [11], the dynamic disequilibrium (order) term can be calculated as the complement of the dynamic equilibrium term:
$$D_{SDL}(t) = 1 - \Delta(t). \quad (4)$$
Extending the definition of the SDL measure, the dynamical SDL, calculated at $t = t_0 + jT$, is given by
$$C_{SDL}(t) = \Delta(t)\, D_{SDL}(t). \quad (5)$$
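Both dynamical measures can be computed together once each sample has been assigned to its interval. The sketch below is a hedged illustration rather than the authors' code. It assumes equal-width intervals over the observed range of the series and, more importantly, it assumes that the equilibrium term at each instant is the entropy contribution of the visited interval divided by the maximum entropy, $-p_i \log_2 p_i / \log_2 N$. That normalization is an assumption of this sketch and can be replaced by another one without changing the rest of the code. The function name `dynamic_measures` is hypothetical.

```python
import math

def dynamic_measures(series, k):
    """Return (c_lmc, c_sdl), one value per sample of `series`,
    using N = 2**k equal-width intervals (k >= 1)."""
    if k < 1:
        raise ValueError("k must be at least 1 so that log2(N) > 0")
    n_bins = 2 ** k
    lo, hi = min(series), max(series)
    width = (hi - lo) / n_bins or 1.0          # avoid zero width for constant data
    labels = [min(int((x - lo) / width), n_bins - 1) for x in series]
    probs = [labels.count(i) / len(series) for i in range(n_bins)]
    s_max = math.log2(n_bins)                  # maximum entropy, log2 N = k

    c_lmc, c_sdl = [], []
    for i in labels:
        p = probs[i]                           # p > 0: interval i was visited
        # ASSUMPTION: equilibrium term = -p*log2(p) / log2(N).
        delta = -p * math.log2(p) / s_max
        d_lmc = (p - 1.0 / n_bins) ** 2        # quadratic deviation from uniform
        d_sdl = 1.0 - delta                    # complement of the equilibrium term
        c_lmc.append(delta * d_lmc)
        c_sdl.append(delta * d_sdl)
    return c_lmc, c_sdl

# Toy series alternating between two values (not data from the paper).
c_lmc, c_sdl = dynamic_measures([0.0, 1.0, 0.0, 1.0], k=1)
print(c_lmc)   # [0.0, 0.0, 0.0, 0.0]
print(c_sdl)   # [0.25, 0.25, 0.25, 0.25]
```

In this toy case the two intervals are equally likely, so the quadratic deviation from the uniform distribution vanishes while the SDL product sits at 0.25 under the assumed normalization.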

Applying the Method to Meteorological Data
A monthly meteorological temporal series is studied in this section, showing that the described method can be applied independently of the natural time scale and periodicity of the phenomenon.
The meteorological data series of precipitation in Dourados-MS-Brazil [20] is analyzed only from a methodological point of view, without any meteorological conjecture about the results.
The monthly precipitation index temporal series, from January 2004 to September 2012, shown in Figure 1, represents the value of $x(t)$ [20], whose complexity is analyzed.
Based on these probability distributions, $C_{LMC}(t)$ and $C_{SDL}(t)$ are calculated and plotted, giving an idea about how the measure choice and the interval division affect the results.

Equivalence between LMC and SDL.
Dividing the range of $x(t)$ into 8 parts, the results of the calculation of the $C_{LMC}(t)$ and $C_{SDL}(t)$ measures are shown in Figures 2(a) and 2(b), respectively. As Figures 2(a) and 2(b) show, in spite of the numerical differences, the time evolutions of $C_{LMC}(t)$ and $C_{SDL}(t)$ are represented by similar curves, in the eight-part division case.
Observing the figures, it is possible to infer that the LMC measure captures the periodic character of the precipitation along the years in a better way. The SDL measure, however, reaches its maximum value (0.25).
If the range of $x(t)$ is divided into 16 parts, Figures 3(a) and 3(b) show the results for $C_{LMC}(t)$ and $C_{SDL}(t)$.
It can be observed that, in this case (sixteen-division case), $C_{LMC}(t)$ and $C_{SDL}(t)$ differ by a scale factor, with the LMC measure expressing the periodic character of the rainy seasons more accurately. The SDL measure presents high peaks, but the maximum value (0.25) is not reached.
Comparing Figures 2(a) and 3(a), which show $C_{LMC}(t)$ for different range partitions, the global aspects of the curves are the same and, by increasing the number of divisions, the dynamical range of the measure decreases, and some rapid oscillatory variations similar to noise appear.
Comparing Figures 2(b) and 3(b), which show $C_{SDL}(t)$ for different range partitions, the overall aspects of the curves are the same and the noisy aspect due to the increasing number of interval divisions is similar to that presented by $C_{LMC}(t)$.
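The maximum value of 0.25 quoted for the SDL measure follows directly from its product form. As a short check, assuming only that the equilibrium term $\Delta(t)$ lies in $[0, 1]$ and that the disequilibrium term is its complement, completing the square gives

$$
C_{SDL}(t) = \Delta(t)\,\bigl[1 - \Delta(t)\bigr] = \frac{1}{4} - \left(\Delta(t) - \frac{1}{2}\right)^{2} \le \frac{1}{4},
$$

with equality if and only if $\Delta(t) = 1/2$. A peak at 0.25 therefore marks an instant at which the equilibrium term equals exactly half of its maximum, and peaks below 0.25 indicate that no sample reaches that balance for the chosen partition.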

Range Interval Partition.
As observed, the dynamical range of both measures decreases as the number of divisions increases, because the number of elements of the source increases, producing a more uniform distribution of the possible measures.
To better understand this phenomenon, the measures are recalculated by increasing the number of intervals of $x(t)$, and the result for a thirty-two-part partition is shown in Figure 4(a) for $C_{LMC}(t)$ and in Figure 4(b) for $C_{SDL}(t)$.
By analyzing the results from Figures 2(a), 3(a), and 4(a), it can be observed that, by increasing the number of intervals, the dynamical range of $C_{LMC}(t)$ decreases but, apparently, for this long series, the temporal evolution of $C_{LMC}(t)$ maintains its qualitative behavior, mixing noise with accuracy.
By analyzing the results from Figures 2(b), 3(b), and 4(b), it can be observed that, by increasing the number of intervals, the dynamical range of $C_{SDL}(t)$ decreases and its maximum value (0.25) is not reached. Apparently, for this long series, the temporal evolution of $C_{SDL}(t)$ maintains its qualitative behavior, mixing noise with accuracy.
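The sensitivity to the partition can be explored numerically before committing to a value of N. The sketch below uses a synthetic periodic series, not the Dourados record, and recomputes the dynamical LMC measure for 8, 16, and 32 intervals. As assumptions of this sketch, the intervals have equal width over the observed range, and the equilibrium term is taken as $-p_i \log_2 p_i / \log_2 N$ for the interval visited at each instant. The function name `dynamic_lmc` is hypothetical.

```python
import math

def dynamic_lmc(series, k):
    """Dynamical LMC value at each sample, with N = 2**k equal-width bins."""
    n_bins = 2 ** k
    lo, hi = min(series), max(series)
    width = (hi - lo) / n_bins or 1.0
    labels = [min(int((x - lo) / width), n_bins - 1) for x in series]
    probs = [labels.count(i) / len(series) for i in range(n_bins)]
    s_max = math.log2(n_bins)
    out = []
    for i in labels:
        p = probs[i]
        delta = -p * math.log2(p) / s_max      # ASSUMED equilibrium term
        out.append(delta * (p - 1.0 / n_bins) ** 2)
    return out

# Synthetic series with period 12: ten repetitions of 0, 1, 4, ..., 121.
series = [float((t % 12) ** 2) for t in range(120)]

dynamic_range = {}
for k in (3, 4, 5):                            # N = 8, 16, 32
    c = dynamic_lmc(series, k)
    dynamic_range[2 ** k] = max(c) - min(c)    # spread of the measure over time

for n_bins, spread in dynamic_range.items():
    print(n_bins, spread)
```

For this toy series the spread shrinks as N grows, which agrees with the behavior described for the precipitation data. Other series may respond differently, so the loop is best read as a quick diagnostic to run on the data at hand.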

Applying the Method to Economic Data
In this section, the economic series relative to the conversion of currencies studied in [16] is taken as an example, showing the applicability of the methodology to random phenomena.
The temporal series related to the daily dollar to Brazilian real (USD/BR) conversion rate, from January 1999 to September 2015, shown in Figure 5 [16], is analyzed only from a methodological point of view, without any economic conjecture about the results.
This conversion rate represents the value of $x(t)$, whose complexity is analyzed.
Based on these probability distributions, $C_{LMC}(t)$ and $C_{SDL}(t)$ are calculated and plotted, giving an idea about how the measure choice and the interval division affect the results.

Equivalence between LMC and SDL.
Dividing the range of $x(t)$ into 8 parts, the results of the calculation of the $C_{LMC}(t)$ and $C_{SDL}(t)$ measures are shown in Figures 6(a) and 6(b), respectively. As Figures 6(a) and 6(b) show, in spite of the numerical differences, the time evolutions of $C_{LMC}(t)$ and $C_{SDL}(t)$ are qualitatively the same and represented by very similar curves, in the eight-part division case.
If the range of $x(t)$ is divided into 16 parts, Figures 7(a) and 7(b) show the results for $C_{LMC}(t)$ and $C_{SDL}(t)$. It can be observed that, in this case (sixteen-division case), $C_{LMC}(t)$ and $C_{SDL}(t)$ differ only by a scale factor, with the same qualitative time evolution.
Comparing Figures 6(a) and 7(a), which show $C_{LMC}(t)$ for different range partitions, the overall qualitative aspects of the curves are the same and, by increasing the number of divisions, the dynamical range of the measure changes, implying some rapid oscillatory variations, similar to noise.
Comparing Figures 6(b) and 7(b), which show $C_{SDL}(t)$ for different range partitions, the overall qualitative aspects of the curves are the same and the noisy aspect due to the increasing number of interval divisions is maintained.
Consequently, from now on, only the LMC measure will be analyzed, since SDL presents the same qualitative dynamical behavior and partition sensitivity.

Range Interval Partition.
By increasing the number of intervals of $x(t)$ and recalculating $C_{LMC}(t)$, the result for a thirty-two-part partition is shown in Figure 8(a) and, for a sixty-four-part partition, in Figure 8(b).
By analyzing the results from Figures 6(a), 7(a), 8(a), and 8(b), it can be observed that, by increasing the number of intervals, the maximum value of $C_{LMC}(t)$ decreases, improving the precision but, apparently, for this long series, the temporal evolution of $C_{LMC}(t)$ maintains its qualitative behavior, mixing noise with accuracy.
Attempting to be more precise about the effect of the range interval partition, $C_{LMC}(t)$ is calculated for the several partitions, but considering a shorter time period for the data. The interval between July and December of 2002 is chosen because, as explained in [16], it is critical concerning the conversion rates in Brazil. It can be observed from these results, shown in Figure 9, that, for shorter intervals, the general qualitative characteristics of the time evolution appear, independently of the partition. However, as the number of subintervals increases, the instantaneous numerical values change but the precision increases, allowing a more accurate analysis.

Conclusions
A methodology for calculating LMC and SDL dynamical complexity was developed, starting with the construction of a source and a probability distribution, for any temporal series. The contribution consists of extending ideas, mainly applied to static situations, to the temporal evolution of variables representing some kind of organizational phenomenon.
LMC and SDL measures were observed to be equivalent in some temporal analyses but, when there is a strong oscillatory component, the LMC measure seems to express the temporal evolution of the complexity more accurately, as the meteorological data analysis shows.
For more randomly distributed data, the two measures (LMC and SDL) present the same accuracy, as the economic data analysis shows.
A point that is always an object of discussion is the range interval partition. The choice of the number of subintervals is a matter of experience.
Long time intervals are not so sensitive to the increase in the number of divisions, although the meteorological data are more sensitive than the economic data. However, for short time intervals, increasing the number of divisions produces a less precise analysis, introducing noise.
The examples presented serve only to illustrate the methodological approach, without any commitment to meteorological or economic conclusions, which can be inferred by a specialist using the developed tool.

Figure 9: LMC temporal evolution for shorter time intervals.