Presentation of Coupling Analysis Techniques of Maximum and Minimum Values Between N Sets of Data Using Matrix [µ][MKN]

This paper presents and studies alternative techniques for analyzing the coupling of maximum and minimum values between data sets, that is, whether maximum or minimum values appear across data sets at the same or at neighboring time points. The data can be time-dependent (time series) or non-time-dependent. The analysis here focuses on time series, and novel indices are defined to measure whether N sets of data display their maximum or minimum values at the same or at very close time instances. For this purpose two methods are compared, one direct and one indirect. The indirect method is based on matrices of dimensionless indicators, denoted by [μ][MKN], and the direct method is based on a variance-type measure, denoted by [V][MKN].


Introduction
One purpose of data analysis is to understand the variation of the data, in order to predict future values when the data depend on time, or their frequency of occurrence when the data are independent of time. The tools used to achieve this goal are statistical methods such as descriptive statistics, hypothesis testing, regression analysis, analysis of variance, quality control, regression models (Álvarez et al., 2021), time series analysis, etc. Many papers have been published in the field of time series analysis, with applications to data from demography, economics, financial stability (Nguyen and Bui, 2020) and financial indicators, biological laboratories, health science (clinical trials), industrial production lines, etc. The analysis of time series arising from this great variety of situations is based on the study of various characteristics of the data, such as the Trend, the Periodicity, the Stationarity or the Similarity between time series.
Similarity analysis uses various techniques to study the common changes of two (or more) time series. The similarity between two time series can be calculated with simple mathematical measures (Iglesias and Kastner, 2013), by applying transformations to the data (Lin et al., 2003), by using algorithms (Keogh et al., 2001; Morse and Patel, 2007; Nakamura et al., 2013; Serra and Arcos, 2012), by relying on local autopatterns (Baydogan and Runger, 2016), or by using correlation-aware measures (Mirylenka et al., 2017). Another approach is given by the measures of divergence defined between two probability distributions, the best known being the Kullback-Leibler measure (Kullback and Leibler, 1951), Csiszár's φ-divergence family of measures (Ali and Silvey, 1966; Csiszár, 1964), the Cressie and Read measures, which include well-known measures such as Pearson's χ² measure and the classical log-likelihood ratio statistic (Cressie and Read, 1984), the BHHJ measure (Basu et al., 1998), the generalized BHHJ family of measures (Mattheou et al., 2009), and entropy-type measures and divergences with applications in engineering (Koukoumis and Karagrigoriou, 2021). Finally, there are many dissimilarity measures (Huber-Carol et al., 2002; Meselidis and Karagrigoriou, 2020; Toma, 2008, 2009; Zhang, 2002).
The indices and the corresponding matrices presented in this work continue the idea of the MKN indices defined in Makris (2017) and Makris (2018). Those indices were used to compare, based on the values of the time series, the simultaneous time pairing (i.e., time display) of the maximum (or minimum) values between time series. MKN indices were first defined for experimental data measuring forces, moments, displacements, and rotations for two types of floating wind turbines. More specifically, the combined effect of the anchor lines and the wind turbine was considered in relation to the response of the floating body, in order to study the hydrodynamic aspects of the floating wind turbine and the undulations of the combined wind and wave action for two data cases: data coming from regular waves and data coming from irregular waves (Makris, 2017). Epidemiological data on influenza, collected in Greece during the period 2004-2017, were considered in Makris (2018).
The analysis begins in Section 2 with the presentation of a direct way of solving the above problem.

Introduction of the Method V
To make the concepts of this work easier to understand and compare, we first present an alternative method for calculating the coupling of maximum and minimum values between time series. The method is denoted by the letter V, since it is calculated from a modification of the Variance. It is a direct method, as opposed to the indirect method presented later. The analysis in this work focuses on time-dependent data, i.e., time series.

Definition of the Matrix [V][MKN]
The first part of the analysis begins with the presentation of the measure V, which is defined for N = 2 time series by relation (1). In relation (1), i and j are two time series of equal length (i.e., each time series has n observations). For each of the two time series, the relation involves the time at which the series displays its (n - k + 1)th ordered value, that is, its kth largest value, together with the (n - k + 1)th ordered observation itself (see also Makris et al., 2021).
The function defined in (1), as well as the others that follow, depends on three parameters denoted by M, K and N. The parameter M indicates whether the Maximum or the Minimum values of the time series under investigation are used, i.e., M is a binary parameter which takes as values the functions Min and Max, or in code 0 for the Minimum values and 1 for the Maximum values. The parameter K is the number of maximum or minimum values on which the calculations are based (for instance, if K = 3 then the 3 highest values of a time series of n observations are the ordered observations of rank n, n - 1 and n - 2). The parameter N is the number of time series that are compared simultaneously; in (1), as noted above, N is equal to 2 (time series i and j).
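The roles of M and K can be made concrete with a short Python sketch that returns the time points at which a series displays its K largest or K smallest values. The function name `extreme_times`, the 1-based time convention and the tie-breaking rule (earliest time first) are assumptions made for this illustration and are not taken from the original definitions.

```python
from typing import List, Sequence


def extreme_times(series: Sequence[float], K: int, M: int) -> List[int]:
    """Return the 1-based times of the K largest (M=1) or K smallest (M=0) values.

    Times are listed from the most extreme value to the K-th most extreme one.
    Ties are broken by the earliest time (an assumption of this sketch).
    """
    if M not in (0, 1):
        raise ValueError("M must be 0 (Min) or 1 (Max)")
    if not 1 <= K <= len(series):
        raise ValueError("K must lie between 1 and the series length")
    sign = -1 if M == 1 else 1  # sort descending for Max, ascending for Min
    times = range(1, len(series) + 1)
    ranked = sorted(times, key=lambda t: (sign * series[t - 1], t))
    return ranked[:K]


example = [3.0, 9.0, 4.0, 7.0, 1.0, 8.0]
print(extreme_times(example, K=3, M=1))  # [2, 6, 4]
print(extreme_times(example, K=2, M=0))  # [5, 1]
```

With K = 3 and M = 1 the sketch picks the ordered observations of rank n, n - 1 and n - 2 and reports when they occur.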
If M = min, the measure takes the form given in relation (2).
For the case where the number of time series analyzed simultaneously is N > 2, the N×N matrix of functions [V][MKN] (relation (3)) is created by combining the time series in pairs, based on (1) or (2). The letter V is enclosed in brackets to distinguish the matrix from definitions (1) and (2), which refer to a single value. It is easy to see (for M = min, and analogously for M = max) that the measure of a time series with itself is equal to 0 and that the measure between time series i and j is equal to the measure between j and i (the symmetric property (5)). Therefore, the matrix [V][MKN] is symmetric with diagonal elements equal to 0.
The problem with the above direct method is that in some cases it yields incorrect results, as will be evident from Examples 1 and 2 in Section 6.
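The pairwise construction of the N×N matrix can be sketched in Python as follows. Relations (1) and (2) are not reproduced here, so the pairwise function `time_gap` is only a stand-in chosen for illustration (the mean squared gap between the times of the k-th extremes); it is an assumption and not the measure V of the paper. The sketch only shows how combining the series in pairs gives a symmetric matrix with a zero diagonal.

```python
from typing import List, Sequence


def extreme_times(series: Sequence[float], K: int, maximum: bool = True) -> List[int]:
    """1-based times of the K largest (or K smallest) values, most extreme first."""
    sign = -1 if maximum else 1
    times = range(1, len(series) + 1)
    return sorted(times, key=lambda t: (sign * series[t - 1], t))[:K]


def time_gap(a: Sequence[float], b: Sequence[float], K: int, maximum: bool = True) -> float:
    """Stand-in pairwise measure (an assumption, NOT relation (1) of the paper):
    mean squared gap between the times of the k-th extremes of a and b."""
    ta = extreme_times(a, K, maximum)
    tb = extreme_times(b, K, maximum)
    return sum((x - y) ** 2 for x, y in zip(ta, tb)) / K


def pairwise_matrix(data: Sequence[Sequence[float]], K: int, maximum: bool = True) -> List[List[float]]:
    """Combine the N series in pairs into an N x N matrix."""
    return [[time_gap(a, b, K, maximum) for b in data] for a in data]


data = [
    [1, 5, 2, 9, 3, 4],  # two largest values at times 4 and 2
    [2, 6, 1, 8, 0, 3],  # two largest values at times 4 and 2
    [7, 1, 2, 3, 9, 4],  # two largest values at times 5 and 1
]
for row in pairwise_matrix(data, K=2):
    print(row)
```

Because the stand-in depends only on squared differences of times, swapping the two series leaves the value unchanged and comparing a series with itself gives 0, which mirrors the two properties stated for the matrix.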

Definition of the Matrix [ ] [ ]
In the previous section the function V was defined for non-complementary variables. In this section the function V is defined for complementary variables. An example of complementary variables is the demand for two substitute goods, where substitute goods are two products that the consumer can use for the same purpose (for example tea and coffee, or water supplied by one company and water supplied by another). The definitions of the parameters K and N remain unchanged from the previous section. For the parameter M, maximum and minimum values are used in combination in this section, so that the maximum values of one time series are compared with the minimum values of the other time series simultaneously (a separate symbol is used for the parameter M in this setting). For example, in the case of the demand for tea and coffee it is well known that when the price of coffee increases, the demand for coffee falls and the demand for tea (the substitute good for coffee) increases, replacing the decrease in coffee demand.
For the case where N > 2, the N×N matrix of measures is built from relations (6) and (7). In contrast with the previous section, the diagonal elements of this matrix are not equal to zero. Finally, the symmetric property (5) of the previous section does not apply in the present section. Similarly, for M = Min the analogous relations hold, defined in terms of the K time points at which the time series i displays its K smallest values (relations (12) and (13)).
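The loss of symmetry and the non-zero diagonal in the complementary setting can be illustrated with a small Python sketch. Relations (6) and (7) are not reproduced here, so `cross_gap` is a hypothetical stand-in (the mean squared gap between the times of the K largest values of one series and the times of the K smallest values of the other), and the tea and coffee numbers are invented for the illustration.

```python
from typing import List, Sequence


def extreme_times(series: Sequence[float], K: int, maximum: bool) -> List[int]:
    """1-based times of the K largest (maximum=True) or K smallest values."""
    sign = -1 if maximum else 1
    times = range(1, len(series) + 1)
    return sorted(times, key=lambda t: (sign * series[t - 1], t))[:K]


def cross_gap(a: Sequence[float], b: Sequence[float], K: int) -> float:
    """Stand-in measure (an assumption, NOT relations (6)-(7) of the paper):
    mean squared gap between the times of the K largest values of `a`
    and the times of the K smallest values of `b`."""
    t_max = extreme_times(a, K, maximum=True)
    t_min = extreme_times(b, K, maximum=False)
    return sum((x - y) ** 2 for x, y in zip(t_max, t_min)) / K


# Invented demand figures, used only to exercise the sketch.
tea = [1, 5, 2, 9, 3, 4]
coffee = [7, 2, 6, 0, 8, 5]
series = [tea, coffee]

matrix = [[cross_gap(a, b, K=2) for b in series] for a in series]
for row in matrix:
    print(row)
```

In this toy case the maxima of tea coincide in time with the minima of coffee, so the (tea, coffee) entry is 0, while the (coffee, tea) entry and both diagonal entries are non-zero. The order of the two series matters because each entry pairs the maxima of the first with the minima of the second.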

Definition of the Matrix [µ][MKN]
As an example, if μ = average, the maximum values are under consideration and the parameter N is equal to 2 (i.e., two time series are analyzed), the above matrix reduces to the matrix defined in (15).

Results and Discussion
In this section we study the two aforementioned methods (the direct method via the function V and the indirect method using the proposed indices μ) through two numerical examples for two different pairs of time series.

Example 1 (Function V)
Consider two time series A and B consisting of 20 values, where the data are given in rows 2 and 3 respectively of Table 1.
Figure 1 shows that the two time series are in fact coupled.

Example 2 (Function V)
Consider two time series C and D with 20 values each, where the data are given in rows 4 and 5 respectively of Table 1.

Examples 1 and 2 (Indices μ)
For Example 1, notice in Table 2 that the indices are equal in pairs for each value of the parameter K considered, a result which shows that the time series A and B display their maximum values at the same time points for all K = 1, 2, 3, 4. We arrive at the same conclusion as with the direct method and the measure [V][MKN]. Also observe that as the value of the parameter K increases, the value of the MKN indices decreases, since the numerator of the index tends to the value of the denominator.
For Example 2, we notice in Table 3 that the indices are equal in pairs for K = 2 and K = 4, a result which shows that the time series C and D display their K largest values in alternating order in terms of time. The index gives better results than the direct method here, since the measure [V][MKN] between C and D gave extremely high values.
One of the advantages of the MKN indices is that they provide more information: in addition to recognizing the possible common placement of the maximum values between the time series, they also provide through their value a new measure of the similarity between the time series.
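The comparison described above can be mimicked with a short Python sketch for μ = average and maximum values. The exact definition of the indices is not reproduced in this section, so the form used here is an assumption: the average of a target series over the K time points at which a reference series displays its K largest values, divided by the overall average of the target series. The name `average_index` and the data are hypothetical and are not the series of Table 1.

```python
from statistics import mean
from typing import List, Sequence


def top_times(series: Sequence[float], K: int) -> List[int]:
    """0-based positions of the K largest values (ties: earliest first)."""
    return sorted(range(len(series)), key=lambda t: (-series[t], t))[:K]


def average_index(target: Sequence[float], reference: Sequence[float], K: int) -> float:
    """Assumed form of an average-based index (NOT taken from the paper):
    mean of `target` at the K times where `reference` is largest,
    divided by the overall mean of `target` (hence dimensionless)."""
    positions = top_times(reference, K)
    return mean([target[t] for t in positions]) / mean(target)


# Invented data in which A and B peak at the same time points.
A = [2.0, 8.0, 4.0, 10.0, 5.0, 7.0]
B = [1.0, 7.0, 3.0, 9.0, 2.0, 4.0]

for K in (1, 2, 3):
    own = average_index(A, A, K)    # A evaluated at its own peak times
    cross = average_index(A, B, K)  # A evaluated at the peak times of B
    print(K, own, cross, own == cross)
```

Under this assumed form, equal own and cross indices indicate that the two series display their K largest values at the same time points, and the index decreases as K grows because the numerator approaches the overall average in the denominator.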

Conclusions
In this work an indirect method of data analysis based on the indices [µ][MKN], as well as a direct method based on the measure [V][MKN], were presented. The first method is defined in terms of the coupling of the maximum or minimum values of the time series considered, while the second method is defined in terms of the time instances at which the maximum or minimum values occur. The method based on the indices [µ][MKN] has an advantage over the direct method: besides the information about the similarity of the time series, as far as the times of occurrence of the maximum and minimum values are concerned, it also provides information about statistical characteristics of the data such as the Average. This work is part of ongoing research, and the capabilities of the [µ][MKN] indices will be explored further.