Jaccard matrix for nonlinear filter statistics

ABSTRACT We propose the Jaccard matrix (JM) and the Jaccard cell (JC), define them as extensions of the Jaccard index, and analyze them theoretically and numerically. Data on the Euclidean plane typically yield the JM as a sparse matrix. We show that the JC inherits the similarity property of the Jaccard index and behaves as an exponential function of mutual information. We confirm theoretically and numerically that the local correlation coefficient of the data on the Euclidean plane relates the JC to the mutual information. Although one could select an arbitrary cell size for the grid that defines the JM, the knowledge obtainable from the matrix decreases if the cell size is too large or too small to distinguish the data clusters appropriately. The JM therefore needs a computational procedure that determines the cell size at an appropriate scale. Maximizing the variance of the JCs supports determining a unique cell size, whose value lies in the middle range of a parabolic function of the cell-size parameter. The JM could also yield an index that extracts nonlinear correlation in the data. The maximized standard deviation of the JCs, used as such an index, is a decreasing function of the noise scale of the data under the constraint conditions. The ability to determine the homogeneous rectangular grid pattern of the JM might be a significant feature for finding nonlinear correlation. We summarize this study as that of a nonlinear filter that may work as an efficient component of explainable AI and statistics.


Introduction
Discovering functional relationships between two variables is a significant task of data mining. The Pearson correlation coefficient (PCC) [1], a measure of linearity between two variables, is a classical tool for the task, but it cannot detect a general association, such as a nonlinear function with random noise. An association that PCC cannot measure adequately is called nonlinear correlation [2]. The maximal information coefficient (MIC), proposed by Reshef et al. [3], is a measure of the nonlinear correlation between two variables. MIC is the maximum value of the mutual information (MI) calculated over various grid patterns that divide the two-dimensional space. Recent studies on MIC include, for instance, a new algorithm to optimize it [4] and an extension to multi-dimensional data [5]. MIC is an excellent example of an advanced application of MI, a well-known measure. Inspired by the procedure of MIC, we are interested in developing a method that creates an index of nonlinear correlation by aggregating an appropriately modified canonical measure.
On the other hand, from the general viewpoint of mathematical formulation, we pay attention to mathematical operability and analyzability. A general association between two variables, including nonlinear correlation, can be regarded as a binary relation R defined as a subset of the direct product A × B of data sets A and B. A function is a binary relation such that every element of A is related to one and only one element of B. In this sense, data mining for two variables amounts to discovering binary relations or functions from observed data with random noise. Studies on machine learning that estimate such relations or functions suggest that matrix representation is effective in terms of the operability and analyzability of the mathematical models. The rectangles of the MIC grids are not congruent to each other, which might be a disadvantage for mathematical formulation and operation via matrix representation.
Moreover, several experimental sciences have developed indices of similarity between two sets. The Jaccard index was proposed as a measure of similarity in plant biology [6]. It has several versions coming from different disciplines, such as the Tanimoto coefficient [7] in computer science, the Tversky index [8] in psychology, and pARIs [9][10][11][12] in cognitive science. These indices have almost the same form, defined as the quotient of the meet (intersection) of two sets divided by their join (union). In particular, pARIs, which is based on research on causal inference [13] including causal induction, has an interesting background in terms of the detection of general association. Causal induction is the human cognitive ability to find causality from the co-occurrence of two kinds of events [14].
In this context, the indices give non-parametric statistical models of the strength with which causality is recognized. The two kinds of events in the models can be assigned to Bernoulli variables when we regard the models as functions of random variables in probability theory. pARIs is an index of causal induction that reflects human cognitive features well, although why the index fits human behaviour so well is not known. It also has the same mathematical form as the Jaccard index.
In this introduction, we have given an overview of three contexts: data mining of nonlinear correlation, the operability and analyzability of mathematical methods, and the indices of set similarity that originated in the experimental sciences. At the junction of these three contexts, we propose the concepts of the Jaccard cell and the Jaccard matrix. The Jaccard matrix was defined as a set of multi-dimensional Jaccard indices by one of the authors of the present article. The first idea appeared as pARIX in 2014 [15], and the name Jaccard matrix appeared in 2016 [16]. In this article, we describe the Jaccard matrix as a component of statistical informatics, independently of pARIs and pARIX and their connection to the cognitive sciences. Here the concept is defined and analysed mathematically, which had not yet been done sufficiently in the previous studies. Concretely, we think that the idea of monotonicity in the previous studies was too rough to analyse the results. The mathematical improvements in this study promote, theoretically and numerically, an understanding of what aggregating the Jaccard index means and of the mechanism by which nonlinear correlation is derived from it.
In the second section, we derive the Jaccard cell and the Jaccard matrix from the Jaccard index and pARIs. In the third section, we mathematically define the Jaccard cell and the Jaccard matrix for data on the Euclidean plane and show their computational procedure. In the fourth section, we study the mathematical features of the Jaccard cell, which is approximated locally by an exponential function of MI. In the fifth section, we study a simple model of the variance of the Jaccard cells induced from the approximated equation and show that maximizing the variance of the Jaccard cells supports determining a unique cell size. In the sixth section, we use simulations to analyse numerically the features of the Jaccard cell and the Jaccard matrix shown theoretically in the previous sections. Finally, we briefly discuss the maximized standard deviation of the Jaccard cells as a decreasing function of the noise scale of the data in terms of nonlinear correlation, and we summarize this study as that of a nonlinear filter.

Jaccard cell and Jaccard matrix
In this section, we define the Jaccard cell and the Jaccard matrix. Their names are inherited from the Jaccard index, known as the first of the affiliated indices. We introduce them as a generalization of pARIs, a statistical index of causal induction that is one of these indices. This explanation might make the calculation procedures that follow easier to accept.

pARIs and Jaccard index
In quantitative studies of causal induction, the count information of co-occurring events is given in the form of a so-called 2 × 2 contingency table [14]. Let C and E be events that correspond to causes and effects. Let a, b, c, d be the frequencies of occurrence of the events (C ∧ E), (C ∧ ¬E), (¬C ∧ E), (¬C ∧ ¬E), where ¬ denotes negation. Then the 2 × 2 contingency table is given as Table 1.
Note that Table 1 is the transpose of the 2 × 2 contingency table of [14]; i.e. we swap the rows and columns so that the rows and columns of the table are easier to associate with those of the matrix and with the horizontal x-axis and vertical y-axis of $\mathbb{R}^2$ in the following sections. In addition, not distinguishing the rows from the columns of Table 1 and of the following Table 2 causes no trouble in our study, since the Jaccard cell and pARIs are symmetric under swapping rows and columns. Although pARIs is an index of causal induction, it has no asymmetry between cause and effect.
Given the information of the 2 × 2 contingency table, we can define pARIs as a statistical index of causal induction [9,11] by the following:
$$ \mathrm{pARIs} := \frac{a}{a + b + c}, \tag{1} $$
where it satisfies 0 ≤ pARIs ≤ 1 for all a, b, c ∈ N. The Jaccard index of two sets A and B is defined as
$$ J(A, B) := \frac{|A \cap B|}{|A \cup B|}. \tag{2} $$
Taking A and B as the sets of observations in which C and E occur gives |A ∩ B| = a and |A ∪ B| = a + b + c, so pARIs, as a model of the strength of human causal inference, has the same form as the Jaccard index.
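The correspondence between pARIs and the Jaccard index can be checked with a few lines of Python. This is a minimal sketch, not code from the original study: the function names `paris` and `jaccard_index`, the trial identifiers, and the convention of returning 0 for an empty denominator are all assumptions made for illustration.

```python
def paris(a: int, b: int, c: int) -> float:
    """pARIs from 2 x 2 contingency counts.

    a = #(C and E), b = #(C and not E), c = #(not C and E).
    The count d = #(not C and not E) does not enter the index.
    """
    total = a + b + c
    if total == 0:
        return 0.0  # assumption: define the index as 0 when nothing was observed
    return a / total


def jaccard_index(A: set, B: set) -> float:
    """Classical Jaccard index |A ∩ B| / |A ∪ B| of two finite sets."""
    union = A | B
    return len(A & B) / len(union) if union else 0.0


# Hypothetical observations: identifiers of trials in which the cause
# and the effect occurred.
cause = {1, 2, 3, 4, 5}
effect = {3, 4, 5, 6}

a = len(cause & effect)   # cause and effect together
b = len(cause - effect)   # cause without effect
c = len(effect - cause)   # effect without cause

print(paris(a, b, c), jaccard_index(cause, effect))  # 0.5 0.5
```

Both functions return the same value because the three counts a, b, c partition the union of the two sets.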

Definition of Jaccard cell and Jaccard matrix
In this section, we define the Jaccard cell and the Jaccard matrix. Let X and Y be Bernoulli random variables, $X, Y : \Omega = \{\text{not occur}, \text{occur}\} \to \{0, 1\}$. First, we define an index ψ as the following:
$$ \psi := \frac{P(X = 1, Y = 1)}{P(X = 1) + P(Y = 1) - P(X = 1, Y = 1)}. \tag{3} $$
Equation (3) has the same mathematical form as the Jaccard index and pARIs. It is a model for the Jaccard index and pARIs in probability theory. Secondly, we define the Jaccard cell $\psi_{ij}$ as a generalization of the index ψ in (3) to multi-dimensional Bernoulli random vectors. Let $X = (X_0, \dots, X_j, \dots, X_{m-1})$ and $Y = (Y_0, \dots, Y_i, \dots, Y_{n-1})$ be one-hot vectors following categorical distributions, where the one-hot encoding or 1-of-K encoding, like X = (0, 0, . . ., 1, . . ., 0), is well known in machine learning. A categorical distribution $P_{\mathrm{cat}}$, also called a generalized Bernoulli distribution, is a particular case of the multinomial distribution [17]. Given the above X and Y, we define the Jaccard cell $\psi_{ij}$ as the following:
$$ \psi_{ij} := \frac{P(X_j = 1, Y_i = 1)}{P(X_j = 1) + P(Y_i = 1) - P(X_j = 1, Y_i = 1)}. \tag{5} $$
Each of the components $X_j$ and $Y_i$ is a Bernoulli random variable under the one-hot condition $\sum_j X_j = \sum_i Y_i = 1$. Consequently, the Jaccard cell (5) is a natural extension of the index (3), which is equivalent to the Jaccard index and pARIs. Moreover, we define the Jaccard matrix as the n × m matrix
$$ \Psi := (\psi_{ij}), \quad i = 0, \dots, n-1, \; j = 0, \dots, m-1, $$
given by aggregating the Jaccard cells (5).
Let us return to the context of pARIs. The extension to the Jaccard cell means that "the event occurs or not" is replaced by "the event indexed by i occurs." In this sense, the 2 × 2 contingency table of Table 1 is also extended to the n × m contingency table of Table 2, where $c_{ij} \in \mathbb{N} \cup \{0\}$ is the number of data for which the event $(X_j = 1) \wedge (Y_i = 1)$ occurs. Table 2 derives a frequency distribution of event occurrence:
$$ P(X_j = 1, Y_i = 1) = \frac{c_{ij}}{N_c} \tag{8} $$
and
$$ P(X_j = 1) = \frac{1}{N_c} \sum_{i'=0}^{n-1} c_{i'j}, \qquad P(Y_i = 1) = \frac{1}{N_c} \sum_{j'=0}^{m-1} c_{ij'}, \tag{9} $$
where $N_c = \sum_{i=0}^{n-1} \sum_{j=0}^{m-1} c_{ij}$ is the data size. Consequently, Equations (5), (8), and (9) derive the following statistical form of the Jaccard cell:
$$ \psi_{ij} = \frac{c_{ij}}{\sum_{j'=0}^{m-1} c_{ij'} + \sum_{i'=0}^{n-1} c_{i'j} - c_{ij}}. \tag{10} $$
The denominator of the Jaccard cell (10) is the total number of data in the local cross-shaped region of the n × m contingency table.
In this section, we defined the Jaccard cell and the Jaccard matrix. pARIX(i, j) of Equation (14) in [15] is mathematically equivalent to the Jaccard cell $\psi_{ij}$ defined in Equation (5), except for an insignificant difference in the indices (i, j). It is now reasonable to construct the concept rigorously and to give a fresh starting point for research on this theme as mathematics for information systems, away from the context of the cognitive sciences, including pARIs and pARIX.
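As an illustration of the statistical Jaccard cell, the following Python sketch evaluates one cell of a contingency table as the cell count divided by the count in the cross-shaped region (row total plus column total minus the cell itself). The function name `jaccard_cell` and both example tables are hypothetical; returning 0 for an empty cross is an assumed convention.

```python
import numpy as np


def jaccard_cell(C: np.ndarray, i: int, j: int) -> float:
    """Jaccard cell of an n x m contingency table C at row i, column j."""
    row_total = C[i, :].sum()          # all data sharing the row index i
    col_total = C[:, j].sum()          # all data sharing the column index j
    cross = row_total + col_total - C[i, j]  # cell counted once, not twice
    return float(C[i, j] / cross) if cross > 0 else 0.0


# Hypothetical 2 x 3 contingency table.
table = np.array([[4, 1, 0],
                  [0, 3, 2]])
print(jaccard_cell(table, 0, 0))  # 4 / (5 + 4 - 4) = 0.8

# For a 2 x 2 table laid out as [[a, c], [b, d]], the cell (0, 0)
# reduces to pARIs = a / (a + b + c).
a, b, c, d = 3, 2, 1, 7
two_by_two = np.array([[a, c],
                       [b, d]])
print(jaccard_cell(two_by_two, 0, 0))  # 3 / (3 + 2 + 1) = 0.5
```

The second call shows that the generalization is conservative: the count d never enters the cell (0, 0), just as it never enters pARIs.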

Jaccard matrix induced from the data on the Euclidean plane
In this section, we induce the Jaccard matrix from data on the Euclidean plane and give the representations used to compute it. See Figure 1 for a brief overview of the following definitions.
Assume a two-dimensional rectangular space $S_x \times S_y \subset \mathbb{R}^2$. Let us divide $S_x \times S_y$ into a grid as the direct sums of subspaces, $S_x := \bigoplus_{j=0}^{m-1} S_{x_j}$ and $S_y := \bigoplus_{i=0}^{n-1} S_{y_i}$, where m, n ∈ N are the numbers of sections along the x-axis (columns) and the y-axis (rows). $S_{x_j} \times S_{y_i}$ expresses a small rectangular cell in the grid. $(x_0, y_0)$ and $(x_{m-1}, y_{n-1})$ express the centre points of the cells at the bottom left and top right in $\mathbb{R}^2$. Consequently, the size of each cell is $\Delta x = (x_{m-1} - x_0)/(m - 1)$ and $\Delta y = (y_{n-1} - y_0)/(n - 1)$; thus $S_{x_j} \times S_{y_i}$ is concretely expressed by $[x_j - 0.5\Delta x, x_j + 0.5\Delta x) \times [y_i - 0.5\Delta y, y_i + 0.5\Delta y)$, and the centre point is $(x_j, y_i) = (x_0 + j\Delta x, y_0 + i\Delta y)$.
Based on the above procedure, we count the number of data $c_{ij}$ observed in each cell $S_{x_j} \times S_{y_i}$, which corresponds to $S_3 = S_1 \cap S_2$ in Figure 1. This yields an n × m matrix $C = (c_{ij})$ of non-negative integers, which corresponds to an n × m contingency table.
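The counting step can be sketched in Python as follows. The sketch assumes the half-open cells centred at (x_0 + jΔx, y_0 + iΔy) described above; the function name `count_matrix`, its argument names, the sample points, and the choice to discard points outside the grid are assumptions made for illustration.

```python
import numpy as np


def count_matrix(points, x0, y0, dx, dy, m, n):
    """Return the n x m count matrix C.

    C[i, j] is the number of points in the half-open cell
    [x0 + (j - 0.5) dx, x0 + (j + 0.5) dx) x [y0 + (i - 0.5) dy, y0 + (i + 0.5) dy).
    """
    pts = np.asarray(points, dtype=float)
    # Column index j from the x coordinate, row index i from the y coordinate.
    j = np.floor((pts[:, 0] - (x0 - 0.5 * dx)) / dx).astype(int)
    i = np.floor((pts[:, 1] - (y0 - 0.5 * dy)) / dy).astype(int)
    # Assumption: points that fall outside the grid are ignored.
    inside = (j >= 0) & (j < m) & (i >= 0) & (i < n)
    C = np.zeros((n, m), dtype=np.int64)
    np.add.at(C, (i[inside], j[inside]), 1)  # unbuffered add handles repeated cells
    return C


# Hypothetical points on [0, 2) x [0, 3) with unit cells; the last point
# lies outside the grid and is not counted.
points = [(0.2, 0.1), (0.9, 0.1), (1.4, 2.6), (5.0, 5.0)]
C = count_matrix(points, x0=0.5, y0=0.5, dx=1.0, dy=1.0, m=2, n=3)
print(C)
```

`np.add.at` is used instead of `C[i, j] += 1` because fancy-indexed in-place addition would count a cell only once when several points fall into it.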
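Let us adjust some matrix representations for GPU programming. Defining $\mathbf{1}_n$ as the n-dimensional column vector whose elements are all 1, i.e. $\mathbf{1}_n := {}^t(1, \dots, 1)$, we obtain the relationship between the matrix C and the data size $N_c$ as the following:
$$ N_c = {}^t\mathbf{1}_n \, C \, \mathbf{1}_m. \tag{12} $$
On the other hand, defining $\mathbf{1}_{n \times m}$ as the n × m matrix whose elements are all 1, i.e. $\mathbf{1}_{n \times m} := \mathbf{1}_n \, {}^t\mathbf{1}_m$, we obtain the following n × m matrix H:
$$ H := C \, \mathbf{1}_{m \times m} + \mathbf{1}_{n \times n} \, C - C. \tag{13} $$
The element $h_{ij}$ of H is the denominator of the Jaccard cell $\psi_{ij}$, just as $c_{ij}$ is its numerator, since Equation (13) derives
$$ h_{ij} = \sum_{j'=0}^{m-1} c_{ij'} + \sum_{i'=0}^{n-1} c_{i'j} - c_{ij}. \tag{14} $$
The region in which these data are counted corresponds to the cruciform region $S_1 \cup S_2$ in Figure 1. Equation (14) has another representation, i.e.
$$ h_{ij} = c_{ij} + \sum_{j' \neq j} c_{ij'} + \sum_{i' \neq i} c_{i'j}. \tag{15} $$
For all i, j, $c_{ij} \geq 0$, thus $\sum_{j' \neq j} c_{ij'} + \sum_{i' \neq i} c_{i'j} \geq 0$. Therefore, $h_{ij} = 0 \Rightarrow c_{ij} = 0$ for all i, j. Consequently, the n × m Jaccard matrix is given by the element-wise quotient of the matrices C and H as the following:
$$ \Psi = C \oslash H, \tag{16} $$
where
$$ \psi_{ij} = \begin{cases} c_{ij} / h_{ij} & (h_{ij} \neq 0), \\ 0 & (h_{ij} = 0). \end{cases} \tag{17} $$
The mean of $\psi_{ij}$, $\bar{\psi}$, is calculated by
$$ \bar{\psi} = \frac{1}{nm} \, {}^t\mathbf{1}_n \, \Psi \, \mathbf{1}_m. \tag{18} $$
The variance of $\psi_{ij}$, $V_\psi$, is calculated by
$$ V_\psi = \frac{1}{nm} \, {}^t\mathbf{1}_n (\Psi \circ \Psi) \mathbf{1}_m - \bar{\psi}^2, \tag{19} $$
where ∘ is the Hadamard product or element-wise product defined by $(A \circ B)_{ij} := a_{ij} b_{ij}$.
As a concrete example of the above, assume $S_x \times S_y := [0, 2] \times [0, 3]$ and Δx = Δy = 1; then we obtain six cells of the grid in the space. If the observed data are given as D = {(0.5, 0.7), (0.3, 0.2), (0.4, 2.1), (1.6, 2.5), (1.1, 1.5), (1.3, 1.7), (1.8, 1.2)}, they derive the following matrix, with rows indexed by i = 0, 1, 2 (the y direction) and columns by j = 0, 1 (the x direction):
$$ C = \begin{pmatrix} 2 & 0 \\ 0 & 3 \\ 1 & 1 \end{pmatrix}, \tag{20} $$
then $N_c = 7$, and $c_{11} = 3$ means that we observed three data in the cell [1, 2] × [1, 2]. For the matrix C, we obtain the matrix H and the Jaccard matrix as the following:
$$ H = \begin{pmatrix} 3 & 6 \\ 6 & 4 \\ 4 & 5 \end{pmatrix}, \qquad \Psi = \begin{pmatrix} 2/3 & 0 \\ 0 & 3/4 \\ 1/4 & 1/5 \end{pmatrix}. $$
It is not rare for the matrix C to be sparse. The above procedure is inefficient when C is a sparse matrix, although it helps with GPU programming. Indeed, data based on an ordinary function $f : S_x \subset \mathbb{R} \to \mathbb{R}$ can derive a sparse matrix C when Δx is sufficiently small. Therefore, for effective programming, we can use a list that includes only the non-zero elements of C, such as the COO or CSR format [18,19]. For instance, the COO format of the matrix (20) is given by the three arrays
$$ \text{row} = (0, 1, 2, 2), \quad \text{col} = (0, 1, 0, 1), \quad \text{data} = (2, 3, 1, 1). $$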
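The worked example can be reproduced with NumPy. The sketch below is illustrative rather than the authors' implementation: it bins the seven data points of D with unit cells, builds H from the all-ones matrices, forms the element-wise quotient with the convention ψ = 0 where h = 0, and then repeats the cell computation from COO-style triplets only. Variable names such as `Psi` and `psi_sparse` are assumptions.

```python
import numpy as np

# Data D of the worked example on [0, 2] x [0, 3] with unit cells.
D = np.array([(0.5, 0.7), (0.3, 0.2), (0.4, 2.1), (1.6, 2.5),
              (1.1, 1.5), (1.3, 1.7), (1.8, 1.2)])
n, m = 3, 2                                   # rows follow y, columns follow x

C = np.zeros((n, m))
rows_of_points = np.floor(D[:, 1]).astype(int)
cols_of_points = np.floor(D[:, 0]).astype(int)
np.add.at(C, (rows_of_points, cols_of_points), 1)

ones_n = np.ones((n, 1))
ones_m = np.ones((m, 1))

N_c = (ones_n.T @ C @ ones_m).item()          # data size
H = C @ (ones_m @ ones_m.T) + (ones_n @ ones_n.T) @ C - C
Psi = np.divide(C, H, out=np.zeros_like(C), where=H != 0)

mean_psi = (ones_n.T @ Psi @ ones_m).item() / (n * m)
var_psi = (ones_n.T @ (Psi * Psi) @ ones_m).item() / (n * m) - mean_psi ** 2

# COO-style triplets (row, col, value) of the non-zero counts.
rows, cols = np.nonzero(C)
vals = C[rows, cols]
coo = list(zip(rows.tolist(), cols.tolist(), vals.astype(int).tolist()))

# Sparse evaluation: only non-zero cells can have a non-zero Jaccard cell.
row_sum, col_sum = C.sum(axis=1), C.sum(axis=0)
psi_sparse = vals / (row_sum[rows] + col_sum[cols] - vals)

print(C, H, Psi, sep="\n")
print(coo)  # [(0, 0, 2), (1, 1, 3), (2, 0, 1), (2, 1, 1)]
```

In the sparse variant the mean and variance over all n × m cells still need the total cell count nm, because the zero cells contribute to both statistics.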

Mathematical analyses of the Jaccard cell and Jaccard matrix
In this section, we analyse the mathematical features of the Jaccard cell derived from data on S x × S y ⊂ R 2 .

Jaccard cell depending on the cell-size parameters
The value of the Jaccard cell depends on the size of the cell (Δx, Δy). Therefore, we construct a representation of the Jaccard cell that explicitly incorporates the cell-size parameters, i.e. we replace the discrete data counts in Equation (10) by integrals of their probability density function as the following:
$$ P_1 := \iint_{S_1} p(x', y') \, dx' dy', \qquad S_1 := [x - \delta_x, x + \delta_x) \times S_y, \tag{23} $$
$$ P_2 := \iint_{S_2} p(x', y') \, dx' dy', \qquad S_2 := S_x \times [y - \delta_y, y + \delta_y), \tag{24} $$
$$ P_3 := \iint_{S_3} p(x', y') \, dx' dy', \qquad S_3 := S_1 \cap S_2, \tag{25} $$
where $\delta_x, \delta_y > 0$ are the cell-size parameters, and p(x, y) is the probability density function of data observed at the point (x, y). Here $S_1$, $S_2$, and $S_3$ denote the regions of the integrals $P_1$, $P_2$, and $P_3$; $P_3$ is the probability that data are observed in $S_3$. Figure 1 represents the regions $S_1, S_2, S_3 \subset S_x \times S_y \subset \mathbb{R}^2$. $S_1 \cup S_2$ is a local cruciform region, and $S_3 = S_1 \cap S_2$ is the rectangular region at its centre. Using Equations (23)-(25) and replacing the sums over i, j ∈ N with integrals over x, y ∈ R, we obtain the Jaccard cell depending on the continuous variables (x, y) and the cell-size parameters,
$$ \psi(x, y; \delta_x, \delta_y) := \frac{P_3}{P_1 + P_2 - P_3}, \tag{26} $$
which satisfies
$$ 0 \leq \psi \leq 1, \tag{27} $$
as do pARIs and the Jaccard cell $\psi_{ij}$ in (10). If we divide $S_x \times S_y \subset \mathbb{R}^2$ into n × m rectangular cells by the procedure of the previous section, then
$$ \delta_x = 0.5\Delta x, \qquad \delta_y = 0.5\Delta y, \tag{28} $$
for Equations (10) and (26).

Jaccard cell approximated by the exponential function of MI
The correlation coefficient and MI are well-known measures of association between two random variables. In this section, we study the mathematical relationship between the Jaccard cell (26) and these measures.
Let X and Y be random variables following the two-dimensional normal distribution p(x, y) with the correlation coefficient ρ as a parameter:
$$ p(x, y) = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1 - \rho^2}} \exp\left( -\frac{1}{2(1 - \rho^2)} \left[ \frac{(x - \mu_x)^2}{\sigma_x^2} - \frac{2\rho(x - \mu_x)(y - \mu_y)}{\sigma_x\sigma_y} + \frac{(y - \mu_y)^2}{\sigma_y^2} \right] \right). \tag{29} $$
Assume that the observed data for the Jaccard cell (26) follow the distribution (29) in the local cell $S_3$, and that the centre of the local cell is located at the expected values $(\mu_x, \mu_y)$. This assumption expresses probabilistically that the observed data have a linear functional structure around the local space where the Jaccard cell is defined, although the spatial pattern of the data might be nonlinear globally. The strength of the linear functional structure corresponds to the correlation coefficient, and the data are normally distributed around the linear function. By the calculation in the Appendix, we obtain the approximated Jaccard cell as the following:
$$ \psi \approx \frac{\delta_x \delta_y}{\sqrt{\pi(1 - \rho^2)/2} \, (\delta_x\sigma_y + \delta_y\sigma_x) - \delta_x\delta_y}. \tag{30} $$
In the following two cases under different conditions, we show that the Jaccard cells are locally approximated by the exponential of MI.

Case 1: the condition of the cell-size parameters that are proportional to the standard deviations of the data distribution
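Let us set a sufficiently small value of δ as
$$ 0 < \delta \ll 1 \tag{31} $$
and set the cell-size parameters, $\delta_x$ and $\delta_y$, as
$$ \delta_x = \delta\sigma_x, \qquad \delta_y = \delta\sigma_y. \tag{32} $$
Substituting the above conditions into Equation (30), we obtain the value of the approximated Jaccard cell (AJC) as the following:
$$ \psi \approx \frac{\delta}{\sqrt{2\pi(1 - \rho^2)} - \delta}. \tag{33} $$
The MI I(X, Y) of two normal random variables X and Y following (29) depends only on their correlation coefficient ρ; i.e.
$$ I(X, Y) = -\frac{1}{2} \ln(1 - \rho^2). \tag{34} $$
Therefore, if δ is sufficiently small that $\delta \ll \sqrt{2\pi(1 - \rho^2)}$, then we obtain the value of the approximated Jaccard cell using the mutual information (JCMI) as the following:
$$ \psi \approx \frac{\delta}{\sqrt{2\pi(1 - \rho^2)}} = \frac{\delta}{\sqrt{2\pi}} \exp\bigl(I(X, Y)\bigr). \tag{35} $$
We compare the value of (33) with that of (35) numerically in Section 6.1.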
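The gap between the two approximations can be explored numerically. The sketch below assumes that the AJC has the form δ / (√(2π(1 − ρ²)) − δ) and the JCMI the form (δ/√(2π)) exp(I) with I = −½ ln(1 − ρ²); these forms are consistent with the bounds on ρ quoted in the numerical section, but the function names and the sample values of ρ and δ are illustrative assumptions.

```python
import numpy as np


def mutual_information(rho):
    """MI of a bivariate normal pair with correlation coefficient rho."""
    return -0.5 * np.log(1.0 - rho ** 2)


def ajc(rho, delta):
    """Approximated Jaccard cell (assumed form of the AJC)."""
    return delta / (np.sqrt(2.0 * np.pi * (1.0 - rho ** 2)) - delta)


def jcmi(rho, delta):
    """Small-delta limit written through the exponential of MI (assumed form)."""
    return delta / np.sqrt(2.0 * np.pi) * np.exp(mutual_information(rho))


for delta in (0.2, 0.4, 0.6):
    for rho in (0.0, 0.5, 0.9):
        a, j = ajc(rho, delta), jcmi(rho, delta)
        # The relative gap a / j - 1 equals the AJC itself, so the two
        # agree only while the Jaccard cell is small.
        print(f"delta={delta:.1f} rho={rho:.1f} AJC={a:.4f} JCMI={j:.4f} gap={a / j - 1:.4f}")
```

Under these assumed forms the relative gap between the AJC and the JCMI equals the AJC itself, which gives a quick rule of thumb for when the MI-based expression is adequate.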
When studying the mathematical features of the Jaccard cell based on the approximated equations, we should pay attention to the intrinsic constraint conditions on the parameters δ and ρ. The Jaccard cell (26) satisfies condition (27) exactly, so the approximated value must obey the same bound (36). Substituting Equations (A5), (A6), (A8) in the Appendix and the conditions (31) and (32) into inequality (36), we obtain
$$ 0 \leq \frac{\delta}{\sqrt{2\pi(1 - \rho^2)} - \delta} \leq 1 \tag{37} $$
and then, since |ρ| ≤ 1,
$$ 0 < \delta \leq \sqrt{\frac{\pi(1 - \rho^2)}{2}} \leq \sqrt{\frac{\pi}{2}}. \tag{38} $$

Case 2: the condition of the two cell-size parameters that have the same small value
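If we set the cell-size parameters, $\delta_x$ and $\delta_y$, to the same small value
$$ \delta_x = \delta_y = \delta_0, \tag{39} $$
then, by the calculation in the Appendix, we obtain the approximated value of the Jaccard cell (26) as the following:
$$ \psi \approx \frac{\delta_0}{\sqrt{\pi(1 - \rho^2)/2} \, (\sigma_x + \sigma_y) - \delta_0}. \tag{40} $$
If $\delta_0 \ll \sqrt{\pi(1 - \rho^2)/2} \, (\sigma_x + \sigma_y)$, we obtain
$$ \psi \approx \sqrt{\frac{2}{\pi}} \, \frac{\delta_0}{\sigma_x + \sigma_y} \exp\bigl(I(X, Y)\bigr). \tag{41} $$
Equation (41) derives Equation (35) under the condition $\delta_0 = \delta\sigma_x = \delta\sigma_y$. We analyse a simple model of the above approximated Jaccard cell (40) in the following section.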
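A Monte Carlo sanity check of the local approximation may help here. The sketch below assumes that the Case 2 approximation reads δ_0 / (√(π(1 − ρ²)/2)(σ_x + σ_y) − δ_0), which follows from first-order estimates of the strip and cell probabilities around the mean; the exact expression in the Appendix is not reproduced in this text, so this form, the function names, and the parameter values are assumptions. The cruciform strips are taken over the whole plane, centred at the mean of the distribution.

```python
import numpy as np


def approx_cell(delta0, sigma_x, sigma_y, rho):
    """Assumed closed form of the approximated Jaccard cell for Case 2."""
    beta = np.sqrt(np.pi * (1.0 - rho ** 2) / 2.0) * (sigma_x + sigma_y)
    return delta0 / (beta - delta0)


def monte_carlo_cell(delta0, sigma_x, sigma_y, rho, size=1_000_000, seed=0):
    """Estimate P3 / (P1 + P2 - P3) from bivariate normal samples with zero mean."""
    rng = np.random.default_rng(seed)
    z1 = rng.standard_normal(size)
    z2 = rng.standard_normal(size)
    x = sigma_x * z1
    y = sigma_y * (rho * z1 + np.sqrt(1.0 - rho ** 2) * z2)  # corr(x, y) = rho
    in_s1 = np.abs(x) < delta0          # strip of half-width delta0 in x
    in_s2 = np.abs(y) < delta0          # strip of half-width delta0 in y
    p1, p2 = in_s1.mean(), in_s2.mean()
    p3 = (in_s1 & in_s2).mean()         # central cell S3 = S1 ∩ S2
    return p3 / (p1 + p2 - p3)


estimate = monte_carlo_cell(0.1, 1.0, 2.0, 0.5)
closed_form = approx_cell(0.1, 1.0, 2.0, 0.5)
print(estimate, closed_form)
```

The two values should agree to within sampling error for small δ_0; the agreement degrades as δ_0 approaches the standard deviations, where the first-order estimates no longer hold.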

A simple model of the approximated Jaccard cells and the cell-size parameter determined uniquely
In this section, we study a simple model of the variance of the Jaccard cells (19) induced from the approximated Equation (40). As shown in the previous section, the Jaccard cell and the Jaccard matrix depend on the cell-size parameters. To determine the Jaccard matrix uniquely, we select the value of the cell-size parameter by a computable procedure. The analysis of the model in this section supports the unique determination of the Jaccard matrix and its cell-size parameter.

A simple model of the Jaccard cell
Let β = β(σ) be a function of σ, a standard deviation that serves as the scale of the noise of the observed data. For instance, in Equation (40),
$$ \beta = \sqrt{\frac{\pi(1 - \rho^2)}{2}} \, (\sigma_x + \sigma_y), \tag{42} $$
which is a part of its denominator. In this view, we study the following equation as a model of the Jaccard cell (40):
$$ \phi = \frac{\delta_0}{\beta - \delta_0}. \tag{43} $$
The solid lines of (i) and (ii) in Figure 2 show sketches of the graphs of (43) in the cases where β or $\delta_0$ is fixed at a constant value.

Determination of the cell-size parameter
Assume that σ takes various values across the Jaccard cells, i.e. we regard σ as a random variable for each Jaccard cell.
In other words, σ of the whole given data is a fixed value, but the data divided locally by the Jaccard cells have various values of σ. Note that σ of the whole given data might not even be the mean of σ of the locally divided data. Moreover, assume that β is proportional to σ (i.e. β is also a random variable); Equation (42) satisfies this assumption.
For simplicity, we assume that β in Equation (43) is a continuous uniform random variable on $\delta_0 < \beta_b < \beta < \beta_a$. We can regard Equation (43) as inverse transform sampling that generates the random numbers of ϕ. Equation (43) therefore yields the probability density function of ϕ, Equation (44). From it we obtain quadratic functions, including Equation (46), which expresses the variance of ϕ as a parabolic function of its mean, together with the maximum value of the variance. We express the values of $\delta_0$ and $\bar{\phi}$ at this maximum as $\delta_0^*$ and $\bar{\phi}^*$ in Equation (48). We assumed that β is a uniform random variable in order to express, in a simple model, the existence of various values of σ across the Jaccard cells. Consequently, this derived the procedure that determines the unique value of the cell-size parameter as Equation (48).

Relationship between the standard deviation of the Jaccard cells and that of the given data
Under the $\delta_0^*$ determined by (48), we regard β as a fixed value again (i.e. σ is the value for the given data). We can regard the standard deviation SD* as given by Equation (49), obtained by substituting $\delta_0^*$ for $\delta_0$ in Equation (43). Under the assumptions that β is proportional to σ and $\delta_0 \ll \beta$, Equation (49) derives
$$ \mathrm{SD}^* \propto \sigma^{-1}. \tag{50} $$
The above relationship between the standard deviation of the Jaccard cells and that of the given data might look strange at first glance, but the features of the Jaccard cell could make it possible. If the Jaccard matrix also realizes this relationship in numerical calculations in general, we could use SD* as an index of nonlinear correlation.

Jaccard cell depending on the correlation coefficient
As the first of the numerical simulations, we investigate how the Jaccard cell changes with the correlation coefficient ρ. Figure 3 shows the graphs of AJC (33) for δ = 0.2, 0.4, 0.6, 0.8, JCMI (35) for δ = 0.2, 0.4, 0.6, MI (34), and the absolute value of the correlation coefficient |ρ|. Figure 3 shows that (33) is well approximated by the MI-based (35) for small δ (i.e. δ = 0.2), but not for δ = 0.4 and 0.6. Moreover, the condition (38) on δ constrains (33) and (35). Solving (38) for ρ, we obtain the bound of ρ for (33) and (35) as the following:
$$ |\rho| \leq \rho_\delta := \sqrt{1 - \frac{2\delta^2}{\pi}}. \tag{51} $$
Thus, the bounds in Figure 3 are $\rho_{0.2} \approx 0.987$, $\rho_{0.4} \approx 0.948$, $\rho_{0.6} \approx 0.878$, and $\rho_{0.8} \approx 0.770$. The values of the approximated Jaccard cells are flat in the low and middle range of |ρ| and increase abruptly in the high range. Therefore, the Jaccard cell could react strongly to correlation formed locally in an appropriately small region.
Table 3. The instances of the function to generate test data.
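The quoted bounds can be reproduced from the constraint that the approximated Jaccard cell must not exceed 1. The short sketch below assumes the bound has the form ρ_δ = √(1 − 2δ²/π), obtained by requiring δ / (√(2π(1 − ρ²)) − δ) ≤ 1; the function name `rho_bound` is illustrative.

```python
import math


def rho_bound(delta: float) -> float:
    """Largest |rho| for which the approximated Jaccard cell stays at or below 1."""
    return math.sqrt(1.0 - 2.0 * delta ** 2 / math.pi)


for delta in (0.2, 0.4, 0.6, 0.8):
    print(f"delta = {delta:.1f}: rho bound = {rho_bound(delta):.3f}")
# delta = 0.2: rho bound = 0.987
# delta = 0.4: rho bound = 0.948
# delta = 0.6: rho bound = 0.878
# delta = 0.8: rho bound = 0.770
```

The printed values match the four bounds stated for Figure 3, which supports the assumed form of the constraint.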

Jaccard matrix and the cell-size parameter uniquely determined by the standard deviation maximization
As the second of the numerical simulations, we investigate the mean $\bar{\psi}$, the variance $V_\psi$, and the standard deviation $\sqrt{V_\psi}$ of the n × n Jaccard matrix, defined by (18) and (19). We express their numerical values as Ave(Ψ), V(Ψ), and SD(Ψ). The input data set is
$$ D := \{ (x, y) \mid x = u, \; y = f(u) + r \}, \tag{52} $$
where u is a uniform random variable on an appropriately defined interval of real numbers and r is a noise term defined as a normal random variable following $\mathrm{Norm}(0, \sigma_r^2)$. Table 3 lists the instances of Equation (52) used to generate the data D. For each function f, we test 30 data sets, one for each of the noise scales $\sigma_r = 10^{-a}$ (a = 0.1, 0.2, . . ., 2.9, 3.0). The data size of D is $N_c = 2 \times 10^4$.
Figure 4 shows the typical behaviour of Ave(Ψ), V(Ψ), and SD(Ψ) as functions of n, which defines the n × n Jaccard matrix. These lines are generated by Equation (52) using the function Cos14x in Table 3 and $\sigma_r = 10^{-3}$.
The variances V(Ψ) have stationary points in the numerical simulations, like the parabolic function of the theoretical model (46). Therefore, we can uniquely determine the value of the cell-size parameter and the Jaccard matrix not only in the theoretical model but also in the numerical calculation. Note that the results of the simulations correspond with those of the model qualitatively but not quantitatively.
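The scan over n can be sketched as follows. Several choices here are assumptions rather than details from the text: the test function is taken to be cos(14x) on the unit interval as a stand-in for the Table 3 entry named Cos14x, the grid spans the range of the data via `np.histogram2d` instead of the centre-point construction of the third section, and the candidate values of n are arbitrary.

```python
import numpy as np


def jaccard_matrix(x, y, n):
    """n x n Jaccard matrix of the points (x, y) on a grid spanning the data range."""
    C, _, _ = np.histogram2d(y, x, bins=n)     # rows follow y, columns follow x
    H = C.sum(axis=1, keepdims=True) + C.sum(axis=0, keepdims=True) - C
    return np.divide(C, H, out=np.zeros_like(C), where=H > 0)


def scan_cell_size(x, y, n_values):
    """Return the n that maximizes the variance of the Jaccard cells, and all variances."""
    variances = np.array([jaccard_matrix(x, y, n).var() for n in n_values])
    return n_values[int(np.argmax(variances))], variances


rng = np.random.default_rng(0)
u = rng.uniform(0.0, 1.0, size=20_000)        # assumed interval for u
r = rng.normal(0.0, 1e-3, size=u.size)        # noise scale sigma_r = 10^-3
x, y = u, np.cos(14.0 * u) + r                # assumed form of "Cos14x"

n_values = list(range(2, 201, 2))
n_star, variances = scan_cell_size(x, y, n_values)
sd_star = float(np.sqrt(variances.max()))     # standard deviation at the selected n
print(n_star, sd_star)
```

Plotting `variances` against `n_values` is a simple way to inspect whether the curve is unimodal for a given data set before trusting the selected n.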

The standard deviation of the Jaccard cells as a nonlinear correlation
As the third of the numerical simulations, we investigate the relationship between the standard deviation of the Jaccard cells and that of the input data given by (52), which is expressed theoretically by (50).
Figure 6 shows the values of $\log_{10} \mathrm{SD}^*$ against the standard deviation $\sigma_r$ of the additional noise r of Equation (52). Note the irregular representation of the axes in Figure 6: the horizontal axis is logarithmic in the value of $\sigma_r$, and the vertical axis is linear in the value of $\log_{10} \mathrm{SD}^*$, since the absolute value of the derivative of SD* is very small. SD* = SD(Ψ*) is the value at the stationary point of SD(Ψ) in Figure 5. All conditions of the input data are the same as in the previous subsection. The horizontal dashed lines (RandomAve±s) in Figure 6 show the mean ± standard deviation of SD* generated from 30 data sets of i.i.d. normal random numbers, x, y ∼ Norm(0, 1). In Figure 6, the numerical results follow the trend
$$ \mathrm{SD}^* \propto \sigma_r^{-\kappa}. \tag{53} $$
The simple theoretical model (43) of the Jaccard cell predicts the relation (50), which corresponds to κ = 1 in (53), but the numerical results show κ = 0.013. The trend of (53) suggests that SD* might be used as an index of nonlinear correlation, given the qualitative correspondence between the theoretical prediction and the numerical results, although a quantitative difference remains between them.
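An exponent such as κ can be estimated by a straight-line fit in log-log coordinates. The sketch below assumes a power-law trend of the form SD* ∝ σ_r^(−κ); the function name `fit_kappa` is hypothetical, and the SD* values are synthetic numbers generated from an exact power law purely to exercise the fit, not results from the paper.

```python
import numpy as np


def fit_kappa(sigma_r, sd_star):
    """Least-squares fit of log10(SD*) = -kappa * log10(sigma_r) + const."""
    slope, intercept = np.polyfit(np.log10(sigma_r), np.log10(sd_star), 1)
    return -slope, intercept


# Noise scales sigma_r = 10^-a for a = 0.1, 0.2, ..., 3.0, as in the simulations.
sigma_r = 10.0 ** (-np.arange(1, 31) / 10.0)

# Synthetic SD* values from an exact power law (illustrative only).
sd_star = 0.3 * sigma_r ** (-0.02)

kappa, intercept = fit_kappa(sigma_r, sd_star)
print(kappa)  # recovers the exponent used to generate the synthetic values
```

With measured SD* values, the residuals of this fit indicate how far the data depart from a pure power law.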

Discussion
In this section, we discuss the implications of the theoretical and numerical results of the previous analyses and summarize our view of this study as nonlinear filter statistics.
First, one informatically significant property of the Jaccard cell is that it behaves as an exponential function of MI, which is shown theoretically by Equations (35) and (41) and numerically by Figure 3. The local correlation coefficient of the data relates the Jaccard cell to the MI. Interestingly, as a result, the indicators that our method aggregates, the Jaccard cells, are related to MI, as in MIC. Through the effect of the exponential function, the Jaccard cell could sharpen the x-y dependence of two-dimensional data more strongly than the MI alone.
Second, the standard deviation of the data affects the value of the Jaccard cell. The approximated Jaccard cell (41) suggests that elongating the spread of the data along either axis decreases the value of the Jaccard cell. That is, localized data give a larger value of the Jaccard cell than widely spread data.
Third, we uniquely determine the Jaccard matrix, which depends on the cell-size parameters, by selecting them at the stationary point of the variance V(Ψ), although we could in principle choose the values of the cell-size parameters arbitrarily. The variances V(Ψ) have stationary points in the numerical simulations, like the parabolic function of the theoretical model (46). Note that the results of the simulations correspond with those of the model qualitatively but not quantitatively. Thus, we will need to improve the model to analyse the features of the Jaccard cell in future research.
Fourth, SD* at the stationary point determined by the above method is a decreasing function of the noise scale σ of the data under the constraint conditions. The proportionality (50) suggests this theoretically, and Figure 6 and the proportionality (53) suggest it numerically. This means that we might be able to use SD* as an index of nonlinear correlation. MIC needs to vary a heterogeneous grid pattern in its computational procedure for each data set. By contrast, the Jaccard matrix has a homogeneous rectangular grid pattern. The variance V(Ψ) of the Jaccard matrix shows unimodality and has a stationary point, as described above. Thus, it potentially has good features for finding a solution. As with the stationary-point analysis above, the difference between the theoretical and numerical values is a matter for further study.
The structure of our method is abstractly similar to that of a convolutional neural network (CNN) composed of a convolution layer and a pooling layer. A CNN extracts features of the input data with a convolution layer, called a filter, and synthesizes them with a pooling layer. The first layer of our method, which makes the Jaccard matrix, acts on the information derived from the input data as a "nonlinear filter" extracting their features, but it is not a convolution operation. The second layer calculates an index of nonlinear correlation. We expect that SD* could, and should, be replaced in future studies by an alternative statistic that expresses nonlinear correlation better. The Jaccard matrix and the procedure of our method will remain effective with such a replacement index, since V(Ψ) can determine the cell-size parameters through its unimodality. Characterizing the features of numerical data by low-dimensional measures is significant when humans try to understand and judge the structure and order of the data. In this sense, AI and statistics each have their own roles, although they are closely related in modern statistical machine learning. We think that nonlinear filters like the Jaccard matrix might work as efficient components of explainable AI and statistics.

Conclusion
In this article, we proposed the Jaccard matrix and the Jaccard cell as its element, defined them as extensions of the Jaccard index, and analysed them theoretically and numerically. Data on the Euclidean plane typically yield the Jaccard matrix as a sparse matrix.
The Jaccard index is well known as a classical index of similarity between two sets and is derived from the 2 × 2 table describing two binary classes. The Jaccard matrix has n × n (generally n × m) elements called Jaccard cells. It is an extension of the classical Jaccard index, and the cell size used to derive it from observed data on $\mathbb{R}^2$ is arbitrary. We showed that the Jaccard cell inherits the similarity property and behaves as an exponential function of MI. We confirmed theoretically and numerically that the local correlation coefficient of the data on the Euclidean plane relates the Jaccard cell to the MI.
Although one could select an arbitrary cell size for the grid that defines the Jaccard matrix, the knowledge obtainable from the matrix decreases if the cell size is too large or too small to distinguish the data clusters appropriately. The Jaccard matrix therefore needs a computational procedure that determines the cell size at an appropriate scale. Maximizing the variance of the Jaccard cells supports determining a unique cell size, whose value lies in the middle range of a parabolic function of the cell-size parameter.
Moreover, the Jaccard matrix could yield an index that extracts nonlinear correlation in the data. The maximized standard deviation of the Jaccard cells, used as such an index, is a decreasing function of the noise scale of the data under the constraint conditions. The ability to determine the homogeneous rectangular grid pattern of the Jaccard matrix might be a significant feature for finding nonlinear correlation.
The structure of our method is abstractly similar to that of a convolutional neural network. The first layer, which makes the Jaccard matrix, acts on the information derived from the input data as a nonlinear filter extracting their features. The second layer calculates an index of nonlinear correlation. We summarize this study as that of a nonlinear filter that may work as an efficient component of explainable AI and statistics.

Figure 1. A sketch of the two-dimensional rectangular space $S_x \times S_y \subset \mathbb{R}^2$ used to define the Jaccard matrix and the cruciform region $S_1 \cup S_2$ used to calculate the Jaccard cell.

Figure 2. The solid lines of (i) and (ii) show sketches of the graphs of the simple model of the Jaccard cell, Equation (43), in the cases where β or δ 0 is fixed as each constant value.
Equation (44) is a Pareto distribution cut off by a and b. When we substitute b and $\beta_b$ for ϕ and β in Equation (43), we obtain $2\delta_0 < \beta_b$. The condition of total probability, $\int_a^b p_\phi \, d\phi = 1$, derives the equation $\ln(b/a) = -a(b/a) + a + 1/\delta_0$, which uniquely determines the ratio b/a, where $1 < b/a < 1 + 1/(a\delta_0)$. Based on the distribution (44), we can calculate the mean of ϕ as $\bar{\phi} = E[\phi] = \int_a^b p_\phi \, \phi \, d\phi = \delta_0 Z_E$, where $Z_E = \frac{1}{2}(b - a)(2 - b - a)$, and the variance of

Figure 4. The typical behaviour of Ave(Ψ), V(Ψ), and SD(Ψ) as functions of n, which defines the n × n Jaccard matrix. These lines are generated by Equation (52) using the function Cos14x in Table 3 and $\sigma_r = 10^{-3}$.

Figure 5 shows the numerical tests of the theoretical model (46), which estimates the relationship between the mean and the variance of the Jaccard cells. The plotted dots are generated by the function Cos14x with $\sigma_r = 10^{-3}, 10^{-2}, 10^{-1}, 10^{-0.1}$. The grey dashed line shows the parabolic function $V(\Psi) = -\alpha_0 (\mathrm{Ave}(\Psi) - \alpha_1)^2 + \alpha_2$, where $\alpha_0 = 0.6$, $\alpha_1 = 0.61$, and $\alpha_2 = 0.096$, which approximates the dots of $\sigma_r = 10^{-3}$ phenomenologically. Although the values in Figures 4 and 5 are the results for the function Cos14x, all of the instances in Table 3 show qualitatively equivalent behaviour.

Figure 6.The values of log 10 SD * for the standard deviation σ r of the additional noise r of Equation (52).The horizontal axis is a logarithmic one for the value of σ r , and the vertical axis is a normal one for the value of log 10 SD * .