MAXENT3D_PID: An Estimator for the Maximum-Entropy Trivariate Partial Information Decomposition

Partial information decomposition (PID) separates the contributions of a set of sources about a target into unique, redundant, and synergistic components of information. In essence, PID answers the question of “who knows what” in a system of random variables and hence has applications in a wide spectrum of fields, ranging from the social to the biological sciences. This paper presents MAXENT3D_PID, an algorithm that computes the PID of three sources based on a recently proposed maximum entropy measure, using convex optimization (cone programming). We describe the algorithm, explain how to use the accompanying software, and report the results of various experiments assessing its accuracy. Moreover, the paper shows that a hierarchy of bivariate and trivariate PIDs allows recovering the finer quantities of the trivariate partial information measure.


Introduction: Motivation and Significance
The characterization of dependencies within complex multivariate systems helps to identify the mechanisms operating in the system and to understand their function. Recent work has developed methods to characterize multivariate interactions by separating n-variate dependencies for different orders n [1][2][3][4][5]. In particular, the work of Williams and Beer [6,7] introduced a framework, called partial information decomposition (PID), which quantifies whether different input variables provide redundant, unique, or synergistic information about an output variable when combined with other input variables. Intuitively, inputs are redundant if each individually carries information about the same aspects of the output. Information is unique if it is not carried by any other single variable (or group of variables), and synergistic information can only be retrieved by combining several inputs.
Despite this great potential, the applicability of the PID framework has been hindered by the lack of agreement on the definition of a suitable measure of redundancy. In particular, Harder et al. [23] indicated that the original measure proposed by [6] only quantifies common amounts of information, instead of shared information that is qualitatively the same. Consider a target T and sources X, Y, Z with joint distribution P. The PID decomposes MI(T; X, Y, Z) into finer parts, namely synergistic, unique, unique redundant, and redundant information. These finer parts respect certain identities [6]; e.g., a subset of them sums up to MI(T; X) (all identities are explained in Appendices A and C). Following the maximum entropy approach [24], to obtain this decomposition, it is necessary to solve the following optimization problems:

min_{Q ∈ ∆_P} MI_Q(T; X, Y, Z), (1a)

min_{Q ∈ ∆_P} MI_Q(T; X_i | X_j, X_k), for X_i ∈ {X, Y, Z} and {X_j, X_k} = {X, Y, Z}\{X_i}, (1b)

where ∆_P = {Q ∈ ∆ : Q(T, X) = P(T, X), Q(T, Y) = P(T, Y), Q(T, Z) = P(T, Z)} and ∆ is the set of all joint distributions of (T, X, Y, Z). The four minimization problems in Equation (1a,b) can be formulated as exponential cone programs, a special case of convex optimization.
We refer to [41] for a nutshell introduction to cone programs, in particular exponential ones. The full details on how to formulate (1a,b) as exponential cone programs, along with their convergence properties, are given in [51] (Chapter 5). MAXENT3D_PID on its own returns the synergistic information and the unique information terms collectively. In addition, with the help of the bivariate solver [39] (used in a specific way), the finer synergistic and unique information terms can also be extracted. Hence, the presented module obtains all the trivariate PID quantities. The full details for recovering the finer parts can be found in Appendices C and D.

Software Architecture and Functionality
MAXENT3D_PID is implemented in standard PYTHON. The module uses the optimization software ECOS [52] to solve the several optimization problems needed to compute the trivariate PID. To install the module, the ECOS Python package has to be installed, and then the files MAXENT3D_PID.py, TRIVARIATE_SYN.py, TRIVARIATE_UNQ.py, and TRIVARIATE_QP.py must be downloaded from the GITHUB repository [53].
MAXENT3D_PID has two Python classes, Solve_w_ECOS and QP. Class Solve_w_ECOS receives the marginal distributions of (T, X), (T, Y), and (T, Z) as Python dictionaries. These distributions are used by the Solve_w_ECOS subclasses Opt_I and Opt_II to solve the optimization problems of Equation (1a,b), respectively. The class QP is used to recover the solution of any of the optimization problems of Equation (1a,b) when Solve_w_ECOS fails to obtain a solution of good quality. Figure 1 gives an overview of how these two classes interact.

The Subclasses Opt_I and Opt_II
The subclasses Opt_I and Opt_II formulate the problems of Equation (1a,b), use ECOS to obtain the optimal values, and compute the violations of the optimality certificates. They return the optimal values together with their optimality violations, which serve as quality measures of the obtained PID. Figure 1 describes this process within the class Solve_w_ECOS. Note that both subclasses optimize conditional entropy functionals; however, the different number of arguments leads to a difference in how the problems are fit into the cone program and how the optimal solution is retrieved, hence the need to split them into different classes.

The Class QP
Class QP acts when Solve_w_ECOS returns values of a subset of the problems in Equation (1a,b) with high optimality violations. It improves the errant values by best-fitting them using quadratic programming, in such a way that the PID identities (A12) are respected.
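The idea behind this fitting step can be illustrated as follows (a minimal sketch, not the module's actual implementation): the equality-constrained part of the best fit is a Euclidean projection of the errant estimates onto the affine subspace defined by the linear PID identities; the full class QP additionally enforces nonnegativity. The function name and the toy identity below are ours.

```python
import numpy as np

def fit_to_identities(v_hat, A, b):
    """Euclidean projection of estimated PID values v_hat onto the
    affine subspace {v : A v = b} encoding linear PID identities."""
    lam = np.linalg.solve(A @ A.T, A @ v_hat - b)  # Lagrange multipliers
    return v_hat - A.T @ lam

# Toy identity: two PID parts must sum to a known mutual information of 0.7.
A = np.array([[1.0, 1.0]])
b = np.array([0.7])
v = fit_to_identities(np.array([0.5, 0.3]), A, b)
# v is the closest vector to the estimates satisfying v[0] + v[1] == 0.7
```

The same projection applies with several identities at once: each identity contributes one row to A and one entry to b.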

Using MAXENT3D_PID
The process of computing the PID is packed into the function pid(). This function takes as input the distribution P of (T, X, Y, Z) via a Python dictionary in which the tuples (t, x, y, z) are the keys and the associated probabilities P(t, x, y, z) are the values; see Figure 2. The function formulates and solves the problems of (1a,b) using Solve_w_ECOS and, if needed, uses QP to improve the solution. The function pid() returns a Python dictionary, explained in Tables 1 and 2, containing the PID of (T, X, Y, Z) in addition to the optimality violations.

Figure 2. Using MAXENT3D_PID to compute the PID of the distribution obtained from the ANDDUPLICATE gate (andDgate). The ANDDUPLICATE gate evaluates T as the logical AND of X and Y (X ∧ Y), while Z copies X.

Table 1. The keys of the trivariate PID quantities in the returned dictionary. Note that UI(T; X_i\X_j, X_k) and UI(T; X_i, X_k\X_j) refer to unique and unique redundant information for X_i, X_k, X_j ∈ {X, Y, Z}, CI(T; X, Y, Z) refers to synergistic information, and SI(T; X, Y, Z) refers to redundant or shared information.

The function pid() has three other optional inputs. The first is called parallel (default parallel='off'), which determines whether the computation is parallelized. If parallel='off', the computation is done sequentially: the four problems of Equation (1a,b) are formulated and solved one after the other, their optimality violations are computed consecutively, and then the final results are obtained. When parallel='on', the formulation of the four problems of Equation (1a,b) is done in parallel, the four problems are solved simultaneously, and finally the optimality violations along with the final results are computed in parallel. Thus, when parallel='on', there are three sequential steps (formulating the problems, solving them, and obtaining the final results), as opposed to parallel='off', which requires at least twelve sequential steps.
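For concreteness, the dictionary for the andDgate distribution of Figure 2 can be built as follows. This is a sketch: the commented import and call illustrate the interface described above and require the module files [53] on the path.

```python
from itertools import product

# Joint distribution of the ANDDUPLICATE gate: T = X AND Y, Z copies X,
# with X and Y uniform binary inputs. Keys are (t, x, y, z) tuples,
# values are the probabilities P(t, x, y, z).
andDgate = {(x & y, x, y, x): 0.25 for x, y in product((0, 1), repeat=2)}

# Handing the dictionary to the estimator (illustrative):
# from MAXENT3D_PID import pid
# returned_pid = pid(andDgate)   # dictionary with the keys of Tables 1 and 2
```

Only the four outcomes with z = x get nonzero mass, each with probability 1/4.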

The second optional input is a dictionary that allows the user to tune the tolerances controlling the optimization routines of ECOS, listed in Table 3. In this dictionary, the user sets only the parameters to be tuned. For example, to achieve high accuracy, the parameters abstol and reltol should be small (e.g., 10^{-12}) and the parameter max_iter should be high (e.g., 1000); in this case, the solver will take longer to return the solution. Figure 3 shows how to modify the parameters. For further details about parameter tuning, see [41]. The third optional input is called output, and it controls what pid() prints on the user's screen; it is explained in Table 4.

Table 4. Description of the printing modes in the function pid().
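A sketch of such a tolerance dictionary, in the spirit of Figure 3 (the parameter names are those listed in Table 3; the commented call is illustrative):

```python
# Only the tolerances to be changed need to appear in the dictionary;
# all other ECOS parameters keep their defaults.
parms = {
    'abstol': 1e-12,   # absolute accuracy of the cone program solution
    'reltol': 1e-12,   # relative accuracy of the cone program solution
    'max_iter': 1000,  # allow more interior-point iterations
}

# The dictionary is then passed to pid() as its optional tolerance input:
# returned_pid = pid(P, parms)
```

Tightening abstol and reltol while raising max_iter trades running time for accuracy.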

Time Mode:
In addition to what is printed when output=0, pid() prints a flag when it starts preparing the optimization problems of Equation (1a,b), the total time to create each problem, a flag when it calls ECOS, brief statistics from ECOS for each problem after solving it (Figure 4), the total time for retrieving the results, the total time for computing the optimality violations, and the total time to store the results.

Illustrations
This section shows some performance tests of MAXENT3D_PID on three types of instances. We describe each type of instance and show the results of testing MAXENT3D_PID on each of them. The first two types, paradigmatic gates and COPY gates, are used as validation and memory tests. The last type, random probability distributions, is used to evaluate the accuracy and efficiency of MAXENT3D_PID in computing the trivariate partial information decomposition. More precisely, accuracy is evaluated by how close the values of UI(T; X\Y, Z) and UI(T; Y\X, Z) are to zero when Z has a considerably higher dimension, as expected theoretically. Efficiency is reflected in how fast MAXENT3D_PID produces the results. The machine used has an Intel(R) Core(TM) i7-4790K CPU (four cores) and 16 GB of RAM. Only the computations of the last type were done using parallelization.

Paradigmatic Gates
As a first test, we used some trivariate PIDs that are known and have been studied previously [25]. These examples are the logic gates collected in Table 5. For these examples, the decomposition can be derived analytically, and thus, they serve to check the numerical estimates.

Table 5. Paradigmatic gates with a brief explanation of their operation, where ⊕ is the logical XOR and ∧ is the logical AND.


Testing
The test was implemented in test_gates.py. For all gates, MAXENT3D_PID returns the same values as ([25], Table 1) up to a precision error of order 10^{-9}. The slowest solving time (not in parallel) was one millisecond.
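As a sanity check of the kind performed in test_gates.py, a gate distribution can be built directly and a quantity known analytically can be verified; for the XOR gate, the entire mutual information MI(T; X, Y, Z) equals 1 bit. A sketch (gate construction and helper names are ours, not part of the module):

```python
from itertools import product
from math import log2
from collections import defaultdict

def gate_dist(op):
    """Joint distribution of (T, X, Y, Z) for a gate T = op(X, Y, Z)
    with uniform, independent binary inputs."""
    return {(op(x, y, z), x, y, z): 1 / 8
            for x, y, z in product((0, 1), repeat=3)}

def mi_target_inputs(P):
    """MI(T; X, Y, Z) in bits, computed straight from the joint dictionary."""
    pt, pxyz = defaultdict(float), defaultdict(float)
    for (t, x, y, z), p in P.items():
        pt[t] += p
        pxyz[(x, y, z)] += p
    return sum(p * log2(p / (pt[t] * pxyz[(x, y, z)]))
               for (t, x, y, z), p in P.items())

xorgate = gate_dist(lambda x, y, z: x ^ y ^ z)  # T = X ⊕ Y ⊕ Z
print(mi_target_inputs(xorgate))                # → 1.0
```

Other gates of Table 5 can be generated by swapping the lambda, e.g., `lambda x, y, z: x & y` for an AND of two inputs.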

Copy Gate
As a second test, we used the COPY gate example to examine how the solver handles large systems in terms of speed and reliability. Reliability, in this context, means the consistency of the measure on large systems and the degree to which the results can be trusted to be accurate enough.
Since X, Y, and Z are independent, it is easy to see that the only nonzero quantities are the unique information terms UI(T; X\Y, Z), UI(T; Y\X, Z), and UI(T; Z\X, Y).

Testing
The test was implemented in test_copy_gate.py. The slowest solving time was less than 100 s, and the worst deviation from the analytical values was 0.0001%. For more details, see Table 6.

Table 6. COPY gate results. The results are divided into three sets, ordered increasingly w.r.t. the size of the joint distributions. Dimensions capture the unordered triplet (|X|, |Y|, |Z|), and the deviation is computed as the maximum over all PID quantities of 100·|r̂ − r|, where r̂ is the obtained PID quantity and r is the analytical PID quantity. Note that the analytical results are either zero or log2(|S|), where S ∈ {X, Y, Z}.
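The COPY gate instances used in this test can be generated directly; analytically, UI(T; X\Y, Z) = log2 |X|, and similarly for Y and Z. A sketch (the helper name is ours):

```python
from itertools import product

def copy_gate(nx, ny, nz):
    """COPY gate joint distribution: X, Y, Z independent and uniform,
    and the target T = (X, Y, Z) copies all three inputs."""
    p = 1.0 / (nx * ny * nz)
    return {((x, y, z), x, y, z): p
            for x, y, z in product(range(nx), range(ny), range(nz))}

P = copy_gate(2, 3, 4)
print(len(P))   # → 24
```

Growing nx, ny, and nz makes the support (and hence the cone program) as large as desired, which is what the memory test exploits.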

Random Probability Distributions
As a last example, we used joint distributions of (T, X, Y, Z) sampled uniformly at random over the probability space to test the accuracy of the solver. The sizes of T, X, and Y were fixed to two, whereas |Z| varied in {2, . . . , 14}. For each |Z|, 500 joint distributions of (T, X, Y, Z) were sampled.
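Sampling a joint distribution uniformly over the probability simplex amounts to drawing from a flat Dirichlet distribution. A sketch of how such test instances can be generated (the helper name is ours, not part of the module):

```python
import numpy as np
from itertools import product

def random_joint(sizes, rng):
    """Sample a joint distribution over the given alphabet sizes,
    uniformly at random over the probability simplex (flat Dirichlet)."""
    support = list(product(*(range(s) for s in sizes)))
    probs = rng.dirichlet(np.ones(len(support)))
    return dict(zip(support, map(float, probs)))

rng = np.random.default_rng(0)
P = random_joint((2, 2, 2, 14), rng)   # |T| = |X| = |Y| = 2, |Z| = 14
print(len(P))                          # → 112
```

Repeating the draw 500 times per |Z| reproduces the experimental setup described above.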

Testing
As |Z| increased, the average value of UI(T; X\Y, Z) and of UI(T; Y\X, Z) decreased, while that of UI(T; Z\X, Y) increased. In Figure 6, the accuracy of the optimization is reflected in the low divergence from zero obtained for the unique information terms UI(T; X\Y, Z) and UI(T; Y\X, Z). In Figure 7, the running time shows a constant trend, and the highest value recorded was 0.8 s.

Challenging Distributions
We also tested MAXENT3D_PID on distributions sampled uniformly at random, but with large sizes of T, X, Y, and Z. For each m with 2 ≤ m ≤ 19, 500 joint distributions of (T, X, Y, Z) were sampled with |T| = |X| = |Y| = |Z| = m. The idea was to check how stable the estimator is on random, very large distributions (not structured as in the case of the COPY gate).

Testing
For m ≥ 5, some of the optimization problems (1a,b) did not converge due to numerical instabilities. This issue became frequent and significant when m ≥ 14; for example, 5% of the distributions had numerical problems in some of their optimization problems. We noticed that the solution returned by a non-convergent problem was feasible and off from optimal by a factor of at most 100. The feasibility of the returned solution suggested fitting it, along with the (optimal) solutions returned by the convergent problems, to the system of PID identities (A12), which reduces the optimality gap.

Recommendation
These challenging distributions have mainly two features, namely the high dimensionality of the quadruple (T, X, Y, Z) and a significant number of relatively small (almost null) probability masses alongside a few concentrated probability masses. We suspect that these two features combined are the main reason for the convergence problems. Our approach was to use quadratic programming (class QP), which focuses on reducing the optimality gap and thus returns a PID close to the optimal one (the PID that would be obtained in the absence of convergence problems).
Furthermore, we advise users to mitigate such distributions by dropping some of the points with almost null probability mass. Since the objective functions in (1a,b) are continuous and smooth (for full-support distributions) on ∆_P, the PID of the mitigated distribution is a good approximation of that of the original distribution. Although we did not test this heuristic on MAXENT3D_PID, the same technique was applied to such instances for BROJA_2PID ([51], Chapter 5).
We speculate that for m ≥ 50, the solver would suffer severe numerical instabilities. We recommend that users avoid large discrete binnings that result in very large distributions.

Time Complexity
Theoretically, Makkeh et al. [39,51] showed that the worst-case time complexity for solving (1a) (the computationally hardest problem) is O(N^{3/2} log N), where N = |T × X × Y × Z|. Note that this complexity bound holds for the so-called barrier method, whereas ECOS uses the primal-dual Mehrotra predictor-corrector method [54], which has no theoretical complexity bound [55].

Summary and Discussion
In this work, we presented MAXENT3D_PID, a Python module that computes a trivariate decomposition based on the partial information decomposition (PID) framework of Williams and Beer [6], in particular following the maximum entropy PID of [38] and exploiting the connection with the bivariate decompositions associated with the trivariate one [28]. This is, to our knowledge, the first available implementation extending the maximum-entropy PID framework beyond the bivariate case [39][40][41][42].
The PID framework allows decomposing the information that a group of input variables has about a target variable into redundant, unique, and synergistic components. For the bivariate case, this results in a decomposition with four components, quantifying the redundancy, the synergy, and the unique information of each of the two inputs. In the multivariate case, finer parts appear, which do not correspond to purely redundant or unique components. For example, the redundancy components of the multivariate decomposition can be interpreted based on local unfoldings when a new input is added, with each redundancy component unfolding into a component also redundant with the new variable and a component of unique redundancy with respect to it [38]. The PID analysis can qualitatively characterize the distribution of information beyond the standard mutual information measures [56] and has already been proven useful to study information in multivariate systems (e.g., [14,17,37,[56][57][58][59][60][61][62]).
However, the definition of suitable measures to quantify synergy and redundancy is still a subject of debate. Among all the proposed PID measures, the maximum entropy measures of Bertschinger et al. [24] have a preeminent role in the bivariate case because they provide bounds for any alternative measure that shares fundamental properties related to the notions of redundancy and unique information. Chicharro [38] generalized the maximum entropy approach, proposing multivariate definitions of redundant information and showing that these measures implement the local unfolding of redundancy via hierarchically-related maximum entropy constraints. The package MAXENT3D_PID efficiently implements the constrained information minimization operations involved in the calculation of the trivariate maximum entropy PID. In Section 2, we described the architecture of the software, presented in detail the main function that computes the PID along with its optional inputs, and described how to use it. In Section 3, we provided examples verifying that the software produces correct results on paradigmatic gates, simulated how it scales to large systems, and assessed its accuracy in estimating the PID. In that section, we also presented challenging examples on which the MAXENT3D_PID core optimizer had convergence problems and discussed our technique for retrieving an approximate PID, as well as some suggestions to avoid such anomalies.
The possibility to calculate a trivariate decomposition of the mutual information represents a qualitative extension of the PID framework that goes beyond an incremental extension of the bivariate case, both regarding its theoretical development and its applicability. From a theoretical point of view, regarding the maximum-entropy approach, the multivariate case requires the introduction of new types of constraints in the information minimization that do not appear in the bivariate case (Section 2 and [38]). More generally, the trivariate decomposition allows further studying one of the key unsolved issues in the PID formulation, namely the requirement of the nonnegativity of the PID measures in the multivariate case.
In particular, Harder et al. [23] indicated that the original measure proposed by [6] only quantified common amounts of information and required new properties for the PID measures, to quantify qualitatively and not quantitatively how information is distributed. However, for the multivariate case, these properties have been proven to be incompatible with guaranteeing nonnegativity, by using some counterexamples [30,32,43]. This led some subsequent proposals to define PID measures that either focus on the bivariate case [23,24] or do not require nonnegativity [26,29]. A multivariate formulation was desirable because the notions of synergy and redundancy are not restrained to the bivariate case, while nonnegativity is required for an interpretation of the measures in terms of information communication [34] and not only as a statistical description of the probability distributions. MAXENT3D_PID will allow systematically exploring when negative terms appear, beyond the currently-studied isolated counterexamples. Furthermore, it has been shown that in those counterexamples, the negative terms result from the criterion used to assign the information identity to different pieces of information when deterministic relations exist [32]. Therefore, a systematic analysis of the appearance of negative terms will provide a better understanding of how information identity is assigned when quantifying redundancy, which is fundamental to assess how the PID measures conform to the corresponding underlying concepts.
From a practical point of view, the trivariate decomposition allows studying qualitatively new types of distributed information, identifying finer parts of the information that the inputs have about the target, such as information that is redundant for two inputs and unique with respect to a third [6]. This is particularly useful when examining multivariate representations, such as the interactions between several genes [8,63] or characterizing the nature of coding in neural populations [64,65]. Furthermore, exploiting the connection between the bivariates and the trivariate decomposition due to the invariance of redundancy to context [28], MAXENT3D_PID also allows estimating the finer parts of the synergy component (Appendix D). This also offers a substantial extension in the applicability of the PID framework, in particular for the study of dynamical systems [66,67]. In particular, a question that requires a trivariate decomposition is how information transfer is distributed among multivariate dynamic processes. Information transfer is commonly quantified with the measure called transfer entropy [68][69][70][71][72], which calculates the conditional mutual information between the current state of a certain process Y and the past of another process X, given the past of Y and of any other processes Z that may also influence those two. In this case, by construction, the PID analysis should operate with three inputs corresponding to the pasts of X, Y, and Z. Transfer entropy is widely applied to study information flows between brain areas to characterize dynamic functional connectivity [73][74][75], and characterizing the synergy, redundancy, and unique information of these flows can provide further information about the degree of integration or segregation across brain areas [76].
More generally, the availability of software implementing the maximum entropy PID framework beyond the bivariate case promises to be useful in a wide range of fields in which interactions in multivariate systems are relevant, spanning the domain of social [12,77] and biological sciences [3,10,17,63]. Furthermore, the PID measures can also be used as a tool for data analysis and to characterize computational models. This comprises dimensionality reduction via synergy or redundancy minimization [19,22], the study of generative networks that emerge from information maximization constraints [78,79], or explaining the representations in deep networks [50].
The MAXENT3D_PID package presents several differences and advantages with respect to other software packages currently available to implement the PID framework. Regarding the maximum entropy approach, other packages only compute bivariate decompositions [39][40][41][42]. The dit package [42] also implements several other PID measures, including bivariate implementations for the measures of [23,27]. Among the multivariate decompositions, the ones using the measures I min [6] or I MMI [80] can readily be calculated with standard estimators of the mutual information. However, the former, as discussed above, only quantifies common amounts of information, while the latter is only valid for a certain type of data, namely multivariate Gaussian distributed. Software to estimate multivariate pointwise PIDs is also available [26,29,81]. However, as mentioned above, these measures by construction allow negative components, which may not be desirable for the interpretation of the decomposition, for example in the context of communication theory, and which limit their applicability for data analysis in such regimes [22]. Altogether, MAXENT3D_PID is the first software that implements the mutual information PID framework via hierarchically-related maximum entropy constraints, extending the bivariate case by efficiently computing the trivariate PID measures.

Funding: This research was supported by the Estonian Research Council, ETAG (Eesti Teadusagentuur), through PUT Exploratory Grant #620. D.C. was supported by the Fondation Bertarelli. R.V. also acknowledges financial support from ETAG through the personal research grant PUT1476. We also gratefully acknowledge funding by the European Regional Development Fund through the Estonian Center of Excellence in IT, EXCITE.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; nor in the decision to publish the results.

Appendix A. Williams-Beer PID Framework
In order to decompose MI(T; S), where T is the target and S is the set of sources, Williams and Beer [6] defined a set of axioms leading to what is known as the redundancy lattice (Figure A1). These axioms and this lattice form the framework for partial information decomposition (PID) upon which all the existing definitions of PID are formulated.

Appendix A.1. Williams-Beer Axioms
Suppose that a source A is a subset of S and that a collection α is a set of sources. A shorthand notation inspired by [38] will be used to represent collections of sources; for example, if the system is (T, X, Y, Z), then the collection of sources {{X, Y}, {X, Z}} will be denoted XY.XZ. Williams and Beer [6] defined the following axioms that redundancy should comply with:

• Symmetry (S): MI(T; α) is invariant with respect to the order of the sources in the collection.
• Monotonicity (M): MI(T; α) does not increase when a source is added to the collection, and it is unchanged when the added source is a superset of a source already in α.

Williams and Beer [6] then defined a lattice formed from the collections of sources, using (M) to define the partial ordering between the collections. Each atom of the lattice represents a partial information decomposition quantity. More importantly, not all collections of sources are atoms of the lattice, since, by (M), adding a superset of any source to a collection does not change redundancy. The set of collections of sources included in the lattice, which form its atoms, is defined as:

A(S) = {α ∈ P₁(P₁(S)) : ∀A, B ∈ α, A ⊄ B},

where P₁(S) denotes the set of nonempty elements of the power set P(S) of S. For this set of collections (atoms), the partial ordering relation that constructs the redundancy lattice is:

α ⪯ β ⟺ ∀B ∈ β, ∃A ∈ α : A ⊆ B,

i.e., for two collections α and β, α ⪯ β if for each source in β there is a source in α that is a subset of that source. In Figure A1, the bivariate and trivariate redundancy lattices are shown. The mutual information decomposition was constructed in [6] by implicitly defining partial information measures δ_C(T; α) associated with each node α of the redundancy lattice C (Figure A1), such that the redundancy measures are obtained as:

MI(T; α) = Σ_{β ∈ ↓α} δ_C(T; β), (A3)

where ↓α refers to the set of collections lower than or equal to α in the partial ordering, and hence reachable descending from α in the lattice C.
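The atom set A(S) can be enumerated directly by filtering collections of nonempty sources for the antichain condition; the counts recover the 4 atoms of the bivariate lattice and the 18 atoms of the trivariate lattice shown in Figure A1. A sketch (function names are ours):

```python
from itertools import combinations

def antichains(n):
    """Enumerate A(S) for |S| = n: nonempty collections of nonempty
    sources (subsets of S) in which no source contains another."""
    sources = [frozenset(c) for r in range(1, n + 1)
               for c in combinations(range(n), r)]
    atoms = []
    for r in range(1, len(sources) + 1):
        for coll in combinations(sources, r):
            # keep only antichains: no source a strictly inside another b
            if all(not (a < b or b < a) for a, b in combinations(coll, 2)):
                atoms.append(coll)
    return atoms

print(len(antichains(2)), len(antichains(3)))   # → 4 18
```

The same filter scales (in principle) to more sources, although the number of atoms grows very quickly with n.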

Appendix B. Bivariate Partial Information Decomposition
Let T be the target random variable, X and Y be the two source random variables, and P be the joint probability distribution of (T, X, Y). The PID captures the synergistic, unique, and redundant information as follows:

• The synergistic information between X and Y about T, namely CI(T; X, Y).
• The redundant information of X and Y about T, namely SI(T; X, Y).
• The unique information of X about T, namely UI(T; X\Y).
• The unique information of Y about T, namely UI(T; Y\X).
This decomposition, using the Williams-Beer axioms, yields these identities:

MI(T; X) = SI(T; X, Y) + UI(T; X\Y),
MI(T; Y) = SI(T; X, Y) + UI(T; Y\X),
MI(T; X, Y) = SI(T; X, Y) + UI(T; X\Y) + UI(T; Y\X) + CI(T; X, Y).

Given the generic structure of the PID framework, the work in [24] (BROJA) defined PID measures considering the following polytope:

∆_P = {Q ∈ ∆ : Q(T, X) = P(T, X), Q(T, Y) = P(T, Y)},

where ∆ is the set of all joint distributions of (T, X, Y). The work in [24] (BROJA) used the maximum entropy decomposition over ∆_P in order to quantify the above quantities. Moreover, BROJA assumed that the following assumptions hold (Assumption A1).
1. All partial information measures of the redundancy lattice are nonnegative.
2. The synergistic term, namely δ(T, XY), vanishes on ∆_P upon minimizing the mutual information MI(T; X, Y).
Under the above assumptions and using the maximum entropy decomposition, BROJA defined the following optimization problems that compute the PID quantities:

UI(T; X\Y) = min_{Q ∈ ∆_P} MI_Q(T; X | Y), (A6a)
UI(T; Y\X) = min_{Q ∈ ∆_P} MI_Q(T; Y | X), (A6b)
SI(T; X, Y) = max_{Q ∈ ∆_P} CoI_Q(T; X; Y), (A6c)

where CoI(T; X; Y) is the co-information of T, X, and Y, defined as MI(T; X) − MI(T; X | Y). Note that [38] proved that (A6c) is equivalent to:

SI(T; X, Y) = MI(T; X) − min_{Q ∈ ∆_P} MI_Q(T; X | Y),

since MI_Q(T; X) is invariant on ∆_P.

Appendix B.1. Mutual Information over the Bivariate Redundancy Lattice
This subsection writes down some mutual information quantities in terms of redundancy lattice partial information measures using (A3). These formulas will be used in the following subsection to verify that the measures defined in (A6a–c) quantify the desired partial information quantities. MI(T; X, Y) is the sum of the partial information measures over every node of the redundancy lattice C:

MI(T; X, Y) = δ(T, XY) + δ(T, X) + δ(T, Y) + δ(T, X.Y). (A7)

The mutual information of one source and the target is expressed as:

MI(T; X) = δ(T, X) + δ(T, X.Y). (A8)

The mutual information of one source and the target conditioned on knowing the other source is expressed as:

MI(T; X | Y) = δ(T, XY) + δ(T, X). (A9)

The co-information CoI(T; X; Y) is expressed as:

CoI(T; X; Y) = δ(T, X.Y) − δ(T, XY). (A10)
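The redundancy measures in (A3) are sums over down-sets of the partial ordering ⪯; for instance, the atoms at or below X in the bivariate lattice are exactly {X} and {X.Y}, so MI(T; X) = δ(T, X) + δ(T, X.Y). A small sketch of this down-set computation (names are ours):

```python
# Bivariate redundancy lattice atoms as tuples of frozenset sources
# (collections): XY, X, Y, and X.Y.
X, Y = frozenset({"X"}), frozenset({"Y"})
XY = X | Y
atoms = [(XY,), (X,), (Y,), (X, Y)]

def below(alpha, beta):
    """alpha ⪯ beta iff every source in beta contains some source in alpha."""
    return all(any(a <= b for a in alpha) for b in beta)

# Down-set of the node X: every atom whose delta contributes to MI(T; X).
down_X = [atom for atom in atoms if below(atom, (X,))]
print(down_X == [(X,), (X, Y)])   # → True
```

The same two functions, applied to the 18 atoms of the trivariate lattice, yield the analogous sums used in Appendix C.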

Appendix B.2. Verification of BROJA Optimization
This subsection verifies that the measures defined in (A6a–c) quantify the desired partial information quantities under the maximum entropy decomposition principle. Assumption A1 implies (A11).

Appendix C. Maximum Entropy Decomposition of Trivariate PID
Let T be the target random variable; X, Y, and Z be the source random variables; and P be the joint probability distribution of (T, X, Y, Z). Chicharro [38], using maximum entropy, decomposed the mutual information MI(T; X, Y, Z) into synergistic, unique, unique redundant, and redundant information. In this decomposition:

• the unique information, UI(T; X_i\X_j, X_k), captures the sum of the information that X_i has about T solely, δ(T; X_i), and the information X_i knows redundantly with the synergy of (X_j, X_k), δ(T; X_i.X_jX_k), for all X_i, X_j, X_k ∈ {X, Y, Z};
• the unique redundant information, UI(T; X_i, X_j\X_k), captures the actual unique information that X_i and X_j have redundantly about T, δ(T; X_i.X_j), for all X_i, X_j, X_k ∈ {X, Y, Z};
• and the redundant information, SI(T; X, Y, Z), captures the actual redundant information of X, Y, and Z about T, i.e., δ(T; X.Y.Z).
Using the Williams-Beer axioms, the decomposition yields these identities:

MI(T; X_i) = SI(T; X, Y, Z) + UI(T; X_i\X_j, X_k) + UI(T; X_i, X_j\X_k) + UI(T; X_i, X_k\X_j), for all X_i, X_j, X_k ∈ {X, Y, Z},
MI(T; X, Y, Z) = CI(T; X, Y, Z) + SI(T; X, Y, Z) + Σ_i UI(T; X_i\X_j, X_k) + Σ_{i<j} UI(T; X_i, X_j\X_k). (A12)

The optimization is carried out over the polytope:

∆_P = {Q ∈ ∆ : Q(T, X) = P(T, X), Q(T, Y) = P(T, Y), Q(T, Z) = P(T, Z)},

and ∆ is the set of all joint distributions of (T, X, Y, Z). The measure uses the maximum entropy decomposition over ∆_P in order to compute the above quantities. Moreover, the work in [38] made some assumptions about the partial information measures of the redundancy lattice.
Assumption A2 (Assumptions a.1 and a.2 in [38]). On the trivariate redundancy lattice (Figure A1), the following assumptions are made to quantify the PID:

1. All partial information measures of the redundancy lattice are nonnegative.
2. The terms δ(T; X.Y.Z) and δ(T; X_i.X_j) for all X_i, X_j ∈ {X, Y, Z} are invariant on ∆_P.
3. All synergistic terms, δ(T; XYZ), δ(T; XY.XZ.YZ), δ(T; X_iX_j), and δ(T; X_iX_j.X_iX_k) for all X_i, X_j, X_k ∈ {X, Y, Z}, vanish at the minimum over ∆_P.
4. The partial information measures δ(T; X_i.X_jX_k) for all X_i, X_j, X_k ∈ {X, Y, Z} vanish at the minimum over ∆_P.
Under the above assumptions and using the maximum entropy decomposition, the work in [38] defined the following optimization problems that compute the PID quantities:

min_{Q ∈ ∆_P} MI_Q(T; X, Y, Z), (A13a)
min_{Q ∈ ∆_P} MI_Q(T; X | Y, Z), (A13b)
min_{Q ∈ ∆_P} MI_Q(T; Y | X, Z), (A13c)
min_{Q ∈ ∆_P} MI_Q(T; Z | X, Y). (A13d)

Mutual Information over the Trivariate Redundancy Lattice
This subsection writes down some mutual information quantities in terms of the trivariate redundancy lattice's partial information measures using (A3). The verification that the optimizations defined in (A13a–d) quantify the desired partial information quantities was discussed in detail by [38] and is therefore omitted here. However, these formulas are needed later, when discussing how to compute the individual PID terms using a hierarchy of the BROJA and [38] PID decompositions.
MI(T; X, Y, Z) is the sum of the partial information measures over every node of the redundancy lattice C (A14). For all X_i, X_j, X_k ∈ {X, Y, Z}, the following quantities are likewise expressed in terms of the partial information measures: the mutual information of two sources (jointly) and the target, MI(T; X_i, X_j); the mutual information of one source and the target, MI(T; X_i); the mutual information of two sources (jointly) and the target conditioned on knowing the other source, MI(T; X_i, X_j | X_k); the mutual information of one source and the target conditioned on knowing only one of the other sources, MI(T; X_i | X_j); the mutual information of one source and the target conditioned on knowing the other sources, MI(T; X_i | X_j, X_k); the co-information of two sources and the target, CoI(T; X_i; X_j); the co-information of one source, two sources (jointly), and the target, CoI(T; X_i; X_jX_k); the co-information of two sources (jointly), two sources (jointly), and the target, CoI(T; X_iX_j; X_iX_k); and the co-information of two sources and the target conditioned on knowing the other source, CoI(T; X_i; X_j | X_k).

Appendix D. The Finer Quantities of Trivariate Maximum Entropy PID
In Appendix C, the maximum entropy decomposition for trivariate PID returns a synergistic term, which is the sum of all the individual synergy quantities, and a unique term, which is the sum of the unique and unique redundancy quantities. This section shows how to use the maximum entropy decomposition for bivariate PID in order to obtain each individual synergy quantity, as well as each individual unique and unique redundancy quantity.
Let T be the target random variable, X, Y, Z be the source random variables, and P be the joint probability distribution of (T, X, Y, Z). BROJA will now be applied to some subsystems of (T, X, Y, Z), namely (T, (X_i, X_j), X_k) (one single source) and (T, (X_i, X_j), (X_i, X_k)) (two double sources), for all X_i, X_j, X_k ∈ {X, Y, Z}. Note that the pairs (X_i, X_j) and (X_i, X_k) are ordered alphabetically. Consider the following probability polytopes, over which the optimization will be carried out:
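Applying a bivariate BROJA solver to such a subsystem requires collapsing a pair of primary sources into one composite source. A minimal sketch, assuming the joint distribution is stored as a dict mapping (t, x, y, z) tuples to probabilities (the function name and interface are illustrative, not part of the package):

```python
from collections import defaultdict

def group_sources(p, i=0, j=1, k=2):
    """Collapse the joint distribution p[(t, x, y, z)] of (T, X, Y, Z) into
    the bivariate input q[(t, (s_i, s_j), s_k)] for the subsystem
    (T, (X_i, X_j), X_k), where i, j, k index the sources."""
    q = defaultdict(float)
    for (t, *s), prob in p.items():
        # The pair (s[i], s[j]) becomes a single composite source value.
        q[(t, (s[i], s[j]), s[k])] += prob
    return dict(q)
```

The resulting distribution can then be passed to a bivariate BROJA solver (after adapting it to that solver's input convention) to obtain the decomposition of (T, (X_i, X_j), X_k).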

Appendix D.1. One Single Source Subsystems
These subsystems have the form (T, (X_i, X_j), X_k). The unique information of (X, Y) is first expressed in terms of the redundancy lattice atoms; then the shared information of (X, Y) and Z is written in terms of the redundancy lattice atoms. Hence, for all X_i, X_j, X_k ∈ {X, Y, Z}, the BROJA decomposition of (T, (X_i, X_j), X_k) follows.
δ(T; X_i.XX_j), for all X_i, X_j ∈ {Y, Z}, since the (X, X_j) marginal is fixed.

Appendix D.3. Synergy of Three Double Source Systems
Consider the system of the form (T, (X, Y), (X, Z), (Y, Z)). The sources here are called composite, as they are compositions of the primary sources X, Y, and Z. Applying the PID measure of [38] based on the maximum entropy decomposition (A13a-d) captures the synergy of the composite sources only and cannot capture other contributions, such as those involving unique or redundant composite sources; this means that the optimization (A13a) is the only useful one for a system of composite sources. Therefore, using (A13a), the optimization is taken over the polytope:

∆^{XY.XZ.YZ}_P = {Q ∈ ∆ : Q(T, X_i, X_j) = P(T, X_i, X_j) for all X_i, X_j ∈ {X, Y, Z}}. (A25)

On this polytope, MI(T; X_i, X_j), CoI(T; X_i; X_j), and MI(T; X_i | X_j) are invariant for all X_i, X_j ∈ {X, Y, Z}. Therefore, in addition to Assumption 2, the following partial information measures are invariant over ∆^{XY.XZ.YZ}_P.
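The invariance of MI(T; X_i, X_j) over this polytope can be checked numerically: it depends on Q only through the marginal Q(T, X_i, X_j), which (A25) fixes. A small sketch with illustrative helper names, assuming distributions are dicts mapping (t, x, y, z) tuples to probabilities:

```python
import math
from collections import defaultdict

def pair_marginal(p, i, j):
    """Marginal distribution of (T, X_i, X_j) from p[(t, x, y, z)]."""
    m = defaultdict(float)
    for (t, *s), prob in p.items():
        m[(t, s[i], s[j])] += prob
    return m

def mi_pair(p, i, j):
    """MI(T; X_i, X_j) in bits, computed from the (T, X_i, X_j) marginal
    only; hence it is constant over the polytope fixing that marginal."""
    m = pair_marginal(p, i, j)
    pt, ps = defaultdict(float), defaultdict(float)
    for (t, a, b), q in m.items():
        pt[t] += q
        ps[(a, b)] += q
    return sum(q * math.log2(q / (pt[t] * ps[(a, b)]))
               for (t, a, b), q in m.items() if q > 0)
```

For instance, for T = X XOR Y with X, Y, Z uniform and independent, mi_pair gives MI(T; X, Y) = 1 bit and MI(T; X, Z) = 0 bits, as expected.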

1. δ(T; X_k.X_iX_j), for all X_i, X_j, X_k ∈ {X, Y, Z}, since the (X_i, X_j) marginal is fixed.

2. δ(T; X_iX_j.X_iX_k), for all X_i, X_j, X_k ∈ {X, Y, Z}, since the (X_i, X_j), (X_i, X_k), and (X_j, X_k) marginals are fixed.

3. δ(T; X_iX_j.X_iX_k.X_jX_k), for all X_i, X_j, X_k ∈ {X, Y, Z}, since the (X_i, X_j), (X_i, X_k), and (X_j, X_k) marginals are fixed.

4. δ(T; X_i), for all X_i, X_j, X_k ∈ {X, Y, Z}, since MI(T; X_i) and δ(T; X_i.X_jX_k) are invariant over ∆^{XY.XZ.YZ}_P.

5. δ(T; X_iX_j), for all X_i, X_j, X_k ∈ {X, Y, Z}, since CoI(T; X_i; X_j), δ(T; X_iX_j.X_iX_k), δ(T; X_iX_j.X_jX_k), δ(T; X_iX_j.X_iX_k.X_jX_k), and δ(T; X_k.X_iX_j) are invariant over ∆^{XY.XZ.YZ}_P.
Hence, the only partial information measure that is not fixed is δ(T; XYZ). The quantities δ(T; X.YZ), δ(T; Y.XZ), and δ(T; Z.XY) are recovered from UI(X_k\X_i, X_j) of (T, (X_i, X_j), X_k) and UI(X_k\X_i, X_j) of (T, X, Y, Z), for all X_i, X_j, X_k ∈ {X, Y, Z}.
To recover the individual synergistic quantities, construct the following system of equations from the synergies of (T, X, Y, Z), (T, (X, Y), (X, Z), (Y, Z)), (T, (X_i, X_j), X_k), and (T, (X_i, X_j), (X_i, X_k)), for all X_i, X_j, X_k ∈ {X, Y, Z}. The hierarchy needed to compute the trivariate PID quantities is implemented in the script file test_trivariate_finer_parts.py.
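Once the subsystem synergies have been computed, the recovery step amounts to solving a small linear system. The sketch below is purely illustrative: the 0/1 incidence matrix, the atom names, and the right-hand side are made up for demonstration and are not the paper's actual equations; only the solution pattern (knowns = sums of unknown synergy atoms, solved by least squares) is the point.

```python
import numpy as np

# Unknowns: hypothetical individual synergy atoms (names illustrative).
atom_names = ["d(XYZ)", "d(XY.XZ)", "d(XY.YZ)", "d(XZ.YZ)"]

# Each row states that one subsystem's maximum entropy synergy (an entry
# of b) equals the sum of a subset of the atoms.  These coefficients are
# made up for illustration; the actual rows follow from the lattice
# identities in [38].
A = np.array([[1.0, 1.0, 1.0, 1.0],
              [1.0, 1.0, 1.0, 0.0],
              [1.0, 1.0, 0.0, 1.0],
              [1.0, 0.0, 1.0, 1.0]])
b = np.array([0.9, 0.7, 0.6, 0.5])   # made-up subsystem synergies

# Least squares also accommodates an overdetermined (redundant) system.
s, *_ = np.linalg.lstsq(A, b, rcond=None)
recovered = dict(zip(atom_names, s))
```

With these made-up inputs the unique solution assigns d(XY.XZ) = 0.4, d(XY.YZ) = 0.3, d(XZ.YZ) = 0.2, and d(XYZ) = 0.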