Bayesian Inference and Testing of Group Differences in Brain Networks

Network data are increasingly collected along with other variables of interest. Our motivation is drawn from neurophysiology studies measuring brain connectivity networks for a sample of individuals along with their membership to a low or high creative reasoning group. It is of paramount importance to develop statistical methods for testing of global and local changes in the structural interconnections among brain regions across groups. We develop a general Bayesian procedure for inference and testing of group differences in the network structure, which relies on a nonparametric representation for the conditional probability mass function associated with a network-valued random variable. By leveraging a mixture of low-rank factorizations, we allow simple global and local hypothesis testing adjusting for multiplicity. An efficient Gibbs sampler is defined for posterior computation. We provide theoretical results on the flexibility of the model and assess testing performance in simulations. The approach is applied to provide novel insights on the relationships between human brain networks and creativity.


INTRODUCTION 1.1 Brain networks and creativity groups
Recent advances in brain imaging technologies have played a major role in stimulating current research in neuroscience, while providing key evidence against the modular paradigm which considers brain regions as independent actors in specific cognitive functions (Fuster, 2000).This has motivated an increasing consensus within the neuroscience community towards the need to explore the entire brain architecture, with cognitive processes a result of interconnected brain circuits rather than the expression of specialized brain regions (Bressler and Menon, 2010).
Brain connectivity data, also known as connectomes, are now available to facilitate this task, with non-invasive imaging technologies, such as structural magnetic resonance imaging (sMRI), diffusion tensor imaging (DTI) and magnetization-prepared gradient-echo (MP-RAGE) sequence, providing accurate brain network data at increasing spatial resolution; see Stirling and Elliott (2008), Craddock et al. (2013) and Wang et al. (2014) for an overview and recent developments on these brain scanning technologies.Recent studies measure brain networks along with categorical variables, such as the presence or absence of a neuropsychiatric disease or behavioral state or categorized ability scores, including creativity and different aspects of intelligence.In such studies, there is a need for methods assessing how the brain connectivity structure varies across groups.
We are specifically interested in studying the relationship between the brain connectivity structure and creativity.For subject k (k = 1, . . ., n), data consist of the creativity group y k and an adjacency matrix A k representing the brain network.We focus on dataset MRN-111 available at http://openconnecto.me/data/public/MR/, preselecting subjects having high (> 111, y k = 2) or low (< 90, y k = 1) creativity scores.The first group comprises 19 subjects and the second 17, with thresholds chosen to correspond to 15% and 85% quantiles.Creativity scores are measured via the composite creativity index (CCI) (Jung et al., 2010).We are interested in assessing evidence of differences in brain connectivity between the low and high creativity groups, while additionally inferring the types of differences and learning which connections are responsible for these variations.
The brain networks are obtained by processing structural MP-RAGE and DTI brain scans via a recently developed pipeline (Roncal et al., 2013) to obtain symmetric adjacency matrices A k , k = 1, . . ., n, with elements A vu,k = A uv,k = 1 if there is a connection between brain regions v (v = 2, . . ., V ) and u (u = 1, . . ., v − 1) in individual k and A vu,k = A uv,k = 0 otherwise.In our application, V = 70 and each node in the network characterizes a specific brain region according to the Desikan atlas (Desikan et al., 2006), with the first 35 in the left hemisphere and the remaining There has been recent interest in whether brain connectivity varies with creativity, and which brain circuits are responsible for these variations; see Arden et al. (2010), Jung et al. (2010) and Jung et al. (2013) for a review of recent studies.These previous results are primarily based on descriptive analyses relying on subsamples of the brain circuits or focusing on activity of each brain region in isolation.Hence, they lack the ability to infer how the global brain connectivity architecture differs across creativity groups.When the entire network is considered, current literature proceeds by reducing A k to representative features θ k = (θ 1k , . . ., θ P k ) T prior to statistical analysis, assessing via classical methods whether these network features (e.g., number of connections, average path length) exhibit significant variations across y k .Similar procedures have been recently employed in exploring the relation between the brain network and neuropsychiatric diseases, such as Parkinson's (Olde Dubbelink et al., 2014) and Alzheimer's (Daianu et al., 2013), as well as learning whether brain circuits differ with intelligence groups (Li et al., 2009).
Although providing interesting insights, these analyses are sensitive to the chosen network features, with substantially different results obtained for different types of features (Arden et al., 2010).Moreover, inference is available only on the scale of these topological features, which typically discards important information about the brain connectivity architecture that may crucially explain differences among groups.As there is no gold standard for selecting these features, our contribution aims to overcome previous issues by providing a fully generative Bayesian joint modeling approach for the data (y k , A k ), k = 1, . . ., n, which directly models A k instead of reducing analysis to feature vector θ k , while favoring inference and testing on the association between y k and A k .

Statistical modeling of network-valued random variables
A rich variety of models have been considered for a single network, including exponential random graphs (Erdös and Rényi, 1959;Holland and Leinhardt, 1981;Frank and Strauss, 1986), stochastic block models (Nowicki and Snijders, 2001) and latent space models (Hoff et al., 2002).Such approaches have been recently generalized to more complex data structures.Examples include dynamic networks (Robins and Pattison, 2001;Xing et al., 2010;Durante and Dunson, 2014b), where data are available via time-varying adjacency matrices; multiview networks (Tang et al., 2011) in which the dataset is characterized by multiple network views, with each view measuring a different type of relationship on the same set of nodes; and population of networks (Durante et al., 2014) when the replicated network data consist of measurements of the same type of network on different individuals.When node-or edge-specific variables are available, these variables can impact the edge probabilities (Snijders et al., 2006;Hoff et al., 2002;Durante and Dunson, 2014a).
In our motivating application, we do not have brain region or connection-specific predictors, but instead observe a categorical response y k denoting low or high creativity for the subject.We are interested in defining a fully generative nonparametric model for the joint probability mass function (pmf) of the data (y k , A k ), k = 1, . . ., n.This problem is more closely related to the literature on nonparametric Bayes modeling of multivariate categorical data (Dunson and Xing, 2009).In particular, we can characterize the data as counts in the cells of a (V × V + 1)-way tensor, with the first V × V dimensions corresponding to the binary connections among each pair of brain regions and the last dimension denoting the creativity status.With this representation, we can analyze the data as unstructured multivariate categorical data via Dunson and Xing (2009), but this would fail to exploit the network structure and not accommodate testing.
We instead explicitly consider network information in our model formulation, allowing testing of the association between y k and A k and borrowing information across subjects in learning the network structure.This is accomplished by factorizing the joint pmf for (y k , A k ) as the product of the marginal pmf of y k and the conditional pmf for the brain network A k given y k .By modeling the collection of group-dependent pmfs for A k via a flexible mixture of low-rank factorizations with group-specific mixing probabilities, we develop simple testing procedures for global and local variations, while preserving the dependence structure within the network also under the null hypothesis of no association between y k and A k .
In Section 2 we provide the model formulation via dependent mixture of low-rank factorizations and describe the testing procedures.Prior specification and posterior computation are considered in Section 3, with particular focus on theoretical properties.Section 4 provides simulations to assess model performance and testing behavior.Results for our motivating neuroscience application are discussed in detail in Section 5. Concluding remarks are provided in Section 6.

DEPENDENT MIXTURE OF LOW-RANK FACTORIZATIONS AND TESTING
2.1 Dependent mixture of low-rank factorizations Let (y k , A k ) represent the group membership and the network observation, respectively, for the kth subject, k = 1, . . ., n, with y k ∈ M = {1, . . ., M } and A k the V ×V adjacency matrix characterizing the edges in the network.As the brain network structure is available via undirected edges and selfrelationships among brain regions are not of interest, we induce a joint pmf for (y k , A k ) by placing )/2 the vector encoding the lower triangular elements of A k , which uniquely define the network as A vu,k = A uv,k for every v = 2, . . ., V , u = 1, . . ., v − 1 and k = 1, . . ., n.
Let p y,A denote the joint pmf for data {y k , L(A k )}, k = 1, . . ., n, with p y,A (m, i) = Pr{y k = m, L(A k ) = i}, m ∈ M and i ∈ I V a network configuration.We seek to develop a flexible representation for p y,A , which efficiently exploits network information, favors dimensionality reduction and accommodates simple testing procedures for assessing global and local associations between y k and A k .According to these aims, we factorize p y,A as for every m ∈ M and i ∈ I V .It is always possible to characterize the joint pmf p y,A ∈ P M ×|I V | as the product of the marginal p y ∈ P M for the grouping variable and the conditional pmfs p A|m ∈ P |I V | , m ∈ M, of the network-valued random variable given the group membership.This also favors inference on how the network structure varies across the M groups, with p A|m , m = 1, . . ., M fully characterizing such variations.
The marginal probability p y assigned to the M groups is straightforward to represent, and the challenge arises in modeling the group-specific network pmfs p A|m .For every group m ∈ M, one needs a parameter p A|m (i) for each network configuration i ∈ I V to uniquely characterize p A|m , with the number of possible configurations being |I V | = 2 V (V −1)/2 .Recalling our motivating applications, 2 70(70−1)/2 − 1 = 2 2415 − 1 parameters are required to uniquely define each conditional brain network pmf under the usual restriction i∈I V p A|m (i) = 1, m ∈ M. Clearly this number of parameters is massively largely than the sample size.
To address this dimensionality problem in modeling of network-valued random variables without a categorical response, Durante et al. (2014) recently proposed a mixture of low-rank factorizations.
We generalize this approach to characterize changes with y k , while accommodating testing of the null hypothesis of no global association between y k and A k , H 0 : p y,A = p y p A against the alternative of some association H 1 : p y,A = p y p A , where p A denotes the marginal pmf for the network-valued random variable p A (i) = Pr{L(A k ) = i}.This is accomplished by letting for groups m = 1, . . ., M and i ∈ I V , where (ν 1m , . . ., ν Hm ) T ∈ P H is a vector of group-specific mixing probabilities, G k ∈ {1, . . ., H} a latent class variable with Pr( ) defines the probability of an edge among the lth pair of nodes in class h, for each l = 1, . . ., V (V − 1)/2.
According to (2) and recalling our motivating application, conditionally on G k = h and given the corresponding class-specific edge probability vector π (h) ∈ (0, 1) V (V −1)/2 , the connections among brain regions comprising the vector L(A k ) are independent Bernoulli random variables, with π (h) l ∈ (0, 1) defining the probability of a connection among the pair of brain regions indexed by l for l = 1, . . ., V (V − 1)/2.Dependence on the creativity groups is introduced in the latent class assignment mechanism via group-specific mixing probabilities, so that brain networks in the same class share a common edge probability vector π (h) , with the probability assigned to each class depending on the group membership y k of each subject k.
Although massively reducing the number of parameters from M {2 V (V −1)/2 − 1} to H{M + V (V − 1)/2} − M , the choice of modeling separately the connection probabilities in π (h) in ( 2) provides a sub-optimal formulation, which does not explicitly account for the network structure of the brain interconnections.For example if region v has a high probability of connection with both regions u and w, this information might provide insights into the connectivity behavior among u and w.Hence, focusing on the lth pair of brain regions, corresponding to nodes v and u, v > u and following Durante et al. (2014), we further reduce the dimensionality of the problem and explicitly borrow information within the brain network by mapping each π l )} ∈ via the logistic distribution function, with this latent similarity measure further factorized as for each l = 1, . . ., V (V − 1)/2. is modeled via a low-rank matrix factorization, exploiting the network structure to cope with less information in the data about class-specific deviations.This is accomplished by embedding the brain regions in a class-specific R-dimensional latent space, with the location of each brain region in the latent space denoted by a vector of latent coordinates vr represents the rth coordinate of region v in class h.
According to (3) brain regions with coordinates in the same directions have a higher similarity in class h, with the non-negative parameter λ ≈ 0 limit the effect of the corresponding rth coordinate and favor the deletion of redundant dimensions of the latent space.Hence, the connectivity structure in each class is defined by the latent positions, with the parameters λ (h) r measuring the importance of each dimension.For example, if R = 1, connections among pairs of brain regions {v, u} and {v, w} will favor a link between {u, w}, as if u1 and X (h) w1 ; the same will be true for w1 .
We could have considered more complicated scenarios with group dependence introduced also in π (h) , and in the quantities in (3).However, including group dependence only in the mixing probabilities induces further dimensionality reduction, while favoring borrowing of information across the groups in modeling π (h) , h = 1, . . ., H and provides simpler testing procedures.Moreover, as stated in Lemma 2.1, such a representation is sufficiently flexible to fully characterize the collection of group-dependent pmfs p A|m , m = 1, . . ., M .
Lemma 2.1 Any collection of group-dependent probability mass functions p A|m ∈ P |I V | , m = 1, . . ., M can be characterized as in (2) for some H with class-specific edge probability vectors π (h) , h = 1, . . ., H factorized as in (3) for some R.
This additionally ensures that any joint probability mass function p y,A for the pair {y k , L(A k )} admits representation (1)-( 3) and hence our formulation can be viewed as nonparametric given sufficiently flexible priors for the components.It is also important to note that although we assume edges as independent Bernoulli variables conditioned on G k and the class-specific edge probability vectors, the mixture representation and the shared dependence on a common set of node-specific latent coordinates induce rich dependence structures.All proofs can be found in the Appendix.

Testing for group differences in network data
Factorizations (1) and ( 2) facilitate testing of the global association between the network and the grouping variable.Under factorization (1), the hypotheses of interest, H 0 : p y,A = p y p A versus H 1 : p y,A = p y p A , coincide with assessing whether the conditional network pmf remains equal H 0 : for some m, m ∈ M and i ∈ I V .The characterization of p A|m in (2) further simplifies the previous systems to testing equality of the mixing probabilities via This massively reduces the number of parameters to test from M {2 with the shared dependence on the common set of node-specific latent coordinates preserving the dependence structure in the network also under H 0 .Additionally, recalling Lemma 2.1, under our model formulation the system (4) uniquely characterizes H 0 : p y,A = p y p A versus H 1 : p y,A = p y p A .
Recalling our neuroscience application, rejection of H 0 implies that there are differences in brain architecture across creativity groups, but fails to provide insights on which connections are responsible for such global variations.The global differences may be attributable to variations in all brain connections, but more likely there are changes in specific circuits.To obtain further insights, we consider local tests to learn whether each brain connection L(A k ) l , has no association with

differs across low and high creativity subjects
To conduct inference, we measure the association between L(A k ) l and y k exploiting the model-based version of Cramer's V, proposed in Dunson and Xing (2009), obtaining Measuring the local association with ρ l ∈ (0, 1) provides an appealing choice in terms of interpretation, with ρ l = 0 meaning that p y,L(A k ) l = p y p L(A k ) l , and hence the random variable for connection L(A k ) l has no differences across groups.Additionally, as stated in Lemma 2.2, each ρ l , l = 1, . . ., V (V − 1)/2 can be easily computed from the quantities in our model.

Prior specification and properties
We specify independent priors for the quantities Lemma 3.1 According to the prior specification in (6) the full conditional probability of H 0 is with with Γ(x i ) the Gamma function.
Although providing a key choice for performing global testing, it is impractical to adopt formulation (6) for each local point null H 0l : Hence, we replace local point nulls with small interval nulls } to be easily estimated as the proportion of Gibbs samples in which ρ l > .Moreover, as noted in Berger and Sellke (1987) the small interval hypothesis H 0l : ρ l ≤ can be realistically approximated by H 0l : ρ l = 0 .
Beside key computational properties, as stated in Lemma 3.2, our choices induce a prior Π for p y,A with full L 1 support over P M ×|I V | , meaning that Π can generate a p y,A within an arbitrarily small L 1 neighborhood of the true data-generating model, allowing the truth to fall in a wide class.
Lemma 3.2 Based on the specified priors Π y , Π Z , Π X , Π λ , and Π ν , and letting B (p 0 y,A ) = {p y,A : This is a key property to ensure good performance in our application because without prior support about the true data-generating pmf, the posterior cannot possibly concentrate around the truth.Moreover, as p y,A is characterized by finitely many parameters p y,A (m, i), m ∈ M, i ∈ I V , Lemma 3.2 is sufficient to guarantee that the posterior assigns probability one to any arbitrarily small neighborhood of the true joint pmf as the number of observations n → ∞, meaning that Π{B (p 0 y,A ) | (y 1 , A 1 ), . . ., (y n , A n )} converges almost surely to 1, when p 0 y,A is the true joint pmf.

Posterior computation
Posterior computation is available via the following simple and efficient Gibbs sampler.
4. Sample the testing indicator T from the full conditional Bernoulli with probability (7).
5. If T = 0, let ν m = u, m = 1, . . ., M with u updated from the full conditional Dirichlet Since the number of mixing components in (2) and the dimensions of the latent spaces in (3) are not known in practice, we perform posterior computation by fixing H and R at conservative upper bounds.The priors are chosen to allow adaptive emptying of the redundant components, with the posteriors for parameters controlling unnecessary dimensions concentrated near zero.If all the classes h are occupied, then H should be increased.Similarly, if the posterior for λ (h) R is not concentrated near zero for any h, then R should be increased.Our algorithm can easily be modified to instead allow updating of H and R via slice-sampling methods (Walker, 2007).

SIMULATION STUDIES
We consider simulation studies to evaluate the performance of our method in accurately estimating the joint pmf for the pair (y k , A k ), in correctly assessing the global hypothesis of association among the network-valued random variable and the grouping outcome, and in identifying local variations in each edge probability across groups.For comparison we also implement a MANOVA procedure (see e.g.Krzanowski, 1988) to test for global association between the feature vector θ k and y k , with θ k comprising the most commonly used global features (e.g.density in the whole network, density in nodes blocks, averaged shortest path length, clustering coefficient); see Kantarci and Labatut (2013) for an overview on these topological network measures.For the local testing, we compare our procedure to the results obtained when testing on the association of L(A k ) l and y k for each l = 1, . . ., V (V − 1)/2 via separate two-sided Fisher's exact tests (see e.g.Agresti, 2002).
We consider exact tests to avoid issues arising from the χ 2 approximations in sparse tables.
We generate n = 50 pairs (y k , A k ) from our model ( 1)-( 3) with y k having two equally likely groups p 0 y = (1/2, 1/2) T and A k a V × V network with V = 20 nodes.We consider H = 2 latent classes, with π 0(h) defined as in (3).Brain networks are typically characterized by tighter intrahemispheric than inter-hemispheric connections (Roncal et al., 2013;Durante et al., 2014).Hence, we consider two node blocks {1, . . ., 10} and {11, . . ., 20}, and generate entries in Z 0 to favor more likely connections between pairs in the same block than pairs in different blocks.To assess the local testing performance, we induce group differences only on a subset of node circuits V * ⊂ V .
A possibility to favor this behavior is to let R = 1, λ 0(1) = λ 0(2) = 1 and generate X In both scenarios, inference is accomplished by considering H = R = 10, Pr(H 1 ) = Pr(H 0 ) = 0.5 and letting p y ∼ Dir(1/2, 1/2).To favor deletion of unnecessary classes h, we fix the hyperparameter vector in the Dirichlet for u and u m to a = (a 1 = 1/H, . . ., a H = 1/H) T .As noted in Ishwaran and Zarepour (2002), this choice provides also a finite approximation to the Dirichlet Process.For priors Π Z , Π X and Π λ , we choose the same hyperparameters settings suggested by Durante et al. (2014).We collect 5000 Gibbs iterations, discarding the first 1000.In both scenarios mixing was assessed via effective sample sizes for the quantities of interest for inference ν m , m = 1, . . ., M , π (h) , according to Lemma 2.2.This vector coincides with the group- h) under factorization (2).In both scenarios, most of the effective samples sizes are ≈2000 out of 4000 samples, demonstrating excellent mixing performance.As Figures 2 and 3 provide inference on class-specific quantities, we additionally accounted for label switching via the Stephens (2000) relabeling algorithm.However, no relabeling was necessary in our simulations.
The multiplicative inverse gamma for Π λ and the sparse Dirichlet priors for quantities u and Instead, the MANOVA testing procedure on the features vector favors the hypothesis of no association in both scenarios at a level α = 0.1.This result further highlights how global features often fail in accurately characterizing the whole network architecture.
Figures 2-3 confirm the good performance of our method in correctly recovering the key quantities which define the joint pmf through ( 1) and ( 2) in both scenarios.As expected we learn posterior (lower triangular matrix), and posterior mean of the absolute value of π (h) −π 0(h) (upper triangular).
Squares dimensions are proportional to the length of the 95% highest posterior density intervals.
distributions for the mixing probabilities which shift over the grouping variable or remain constant under dependence and independence, respectively, as shown in Figure 2. Borrowing of information across the groups provides accurate estimates of the class-specific edge probability vectors π (h) with posterior distribution concentrated around the true values, as confirmed in Figure 3.We obtain similar performance in estimating p y , with the posterior concentrated around the true p 0 y .
Focusing on the dependence scenario, Figure 4 shows how accounting for sparsity and network information, via our dependent mixture of low-rank factorizations, provides good estimates of local variations in edge probabilities, correctly highlighting pairs which differ across groups in the true generating process.Conducting inference on each pair of nodes separately provides instead poor estimates, with the sub-optimality arising from inefficient borrowing of information across the edges.This lack of efficiency strongly affects also the local testing performance as shown in Figure 5, with our procedure having higher power than the one obtained via separate Fisher's exact tests.
In Figure 5, each Fisher's exact test p-value is transformed via 1/(1 − ep l log p l ) if p l < 1/e and     Table 1 confirms the superior performance of our approach in maintaining all error rates close to zero, in both global and local testing, while automatically adjusting for multiplicity.The information reduction via features for the global test and the lack of a network structure in the local Fisher's exact tests lead to procedures with substantially less power.Although Table 1 has been constructed using a particular threshold of 0.9 in the local testing procedure, we also have clearly In considering sample size versus type I and type II error rates, it is interesting to assess the rate at which the posterior probability of the global alternative Pr{H 1 | (y, A)} converges to 0 and 1 under H 0 and H 1 , respectively, as n increases.We evaluate this behavior by simulating 100 datasets as in the previous simulation for increasing sample sizes n = 20, n = 40 and n = 100 and for each scenario.Figure 7 provides the histograms showing the estimated posterior probabilities of H 1 for the 100 simulated datasets under the two scenarios and for increasing sample sizes.The separation between scenarios is evident for all sample sizes, with Pr{H 1 | (y, A)} consistently concentrating around 0 and 1 under the no association and association scenario, respectively, as n increases.
When n = 20 the test has lower power, with 32/100 samples having Pr{H 1 | (y, A)} < 0.9 when H 1 is true.However, type I errors were rare, with 1/100 samples having Pr{H 1 | (y, A)} > 0.9 when data are generated under H 0 .These values are ≈0 when the sample size is increased to n = 40 and n = 100, with the latter showing strongly concentrated estimates around 1 and 0, when H 1 is true and H 0 is true, respectively.

APPLICATION TO HUMAN BRAIN NETWORKS AND CREATIVITY
We apply our method to the dataset described in the introduction using the same settings as in We also attempted to apply the MANOVA test as implemented in the simulation experiments, with the same network features.However, the averaged shortest path length was undefined for three subjects, as there were no paths between several pairs of their brain regions.Removing these subjects, there was no significant difference between creativity groups (p=0.134).A similar conclusion is obtained when considering all subjects, but replacing the undefined shortest path lengths with the maximum of the finite averaged shortest path lengths (p=0.085).When excluding this feature, we instead obtain a borderline significant p-value of 0.048.This sensitivity to feature choice further motivates tests that avoid choosing features, which is an inherently arbitrary exercise.
Figure 8 provides summaries of the posterior distribution for πhigh −π low , with πhigh = H h=1 ν h2 π (h)   and πlow = H h=1 ν h1 π (h) encoding the edge probabilities in high and low creativity groups, respec-  Right: highlighted pairs of brain regions with Pr{H 1l | (y, A)} > 0.9.tively, as well as the expected value of the corresponding network-valued random variable.Most of these connections have a similar probability in the two groups, with more evident local differences for connections among brain regions in different hemispheres.Highly creative individuals display a higher propensity to form inter-hemispheric connections.Differences in intra-hemispheric circuits are less evident.These findings are confirmed by our local testing procedure displayed in Figure 9.The threshold on ρ l is fixed at = 0.1 and the decision rule rejects the local nulls when Pr{H 1l | (y, A)} > 0.9.These choices provide reasonable settings based on simulations, and results Previous studies show that intra-hemispheric connections are more likely than inter-hemispheric connections for healthy individuals (Roncal et al., 2013;Durante et al., 2014).This is also evident in our dataset, with subjects having a proportion of intra-hemispheric edges of ≈0.55 over the total number of possible intra-hemispheric connections, against a proportion of ≈0.21 for the interhemispheric ones.Our estimates in Figure 8 and local tests in Figure 9 highlight differences in terms of inter-hemispheric connectivity behavior, with highly creative subjects having a stronger propensity for links between regions in different hemispheres.This is consistent with the idea that creative innovations are a result of communication between brain regions that ordinarily are not strongly connected (Heilman et al., 2003).

0L
These findings contribute to the ongoing debate on the sources of creativity in the human brain, with original theories considering the right-hemisphere as the seat of creative thinking and more recent empirical analyses highlighting the importance of the level of communication between the two hemispheres of the brain; see Sawyer (2012), Shobe et al. (2009) and the references cited therein.Beside the different techniques in monitoring brain networks and measuring creativity, as stated in Arden et al. (2010), previous lack of agreement is likely due to the absence of a unifying approach to statistical inference in this field.Our method addresses this issue, while essentially supporting modern theories considering creativity as a result of cooperating hemispheres.
Figure 10 shows how the differences in terms of inter-hemispheric connectivity are found mainly in the frontal lobe, where the co-activation circuits in the high creativity group are denser.This result is in line with recent findings highlighting the major role of the frontal lobe in creative cognition (Carlsson et al., 2000;Jung et al., 2010;Takeuchi et al., 2010).Previous analyses focus on variations in the activity of each region in isolation, with Carlsson et al. (2000) and Takeuchi et al. (2010) inferring an increase in cerebral blood flow and fractional anisotropy, respectively, for highly creative subjects, and Jung et al. (2010) showing a negative association between creativity and cortical thickness in frontal regions.We instead provide inference on interconnections among these regions, with an increase in bilateral frontal connectivity for high creativity subjects, consistent with both the attempt to enhance frontal brain activity as suggested by Carlsson et al. (2000) and Takeuchi et al. (2010) or reduce it according to Jung et al. (2010).

DISCUSSION
This article proposes the first approach in the literature (to our knowledge) for inference on group differences in network-valued data without reducing the network data to features prior to analysis.The last integral is 1 as it represents the integral of the density of a Dirichlet random variable with parameters n h + a h , h = 1, . . ., H over all its support.The proof for ( H h=1 u n hm hm )dΠ um = B(a + nm )/B(a) proceeds in the same manner.

Figure 1 :
Figure1: For two selected subjects in different creativity groups, the brain network A k for a subset of brain regions, with positions given by the spatial coordinates in the brain.

Representation ( 3
) allows inference on shared versus class-specific components of variability in the brain connectivity structure, with Z l ∈ a latent similarity measure for the lth pair of regions common to all individuals and D (h) l ∈ a class-specific deviation.Here Z l is left unstructured as it can be estimated borrowing information across all individuals, while D (h) l H and ν m = (ν 1m , . . ., ν Hm ) T ∼ Π ν , m = 1, . . ., M , to induce a prior Π on the joint probability mass function p y,A with full support over the M × |I V | dimensional simplex P M ×|I V | , while obtaining desirable asymptotic behavior, simple posterior computation and allowance for testing.As p y is the pmf for the grouping variable, we simply let (p 1 , . . ., p M ) T ∼ Dir(a 1 , . . ., a M ), and consider the same prior specification suggested byDurante et al. (2014) for the quantities in (3) by choosing Student-t priors for the entries in Z, standard Gaussians for the elements in X(h)   and multiplicative inverse gammas for λ (h) ∼ MIG(a, b), h = 1, . . ., H.This choice for Π λ favors shrinkage effects with elements in λ (h) stochastically decreasing towards 0 as r increases, so as to learn the dimensions of the latent spaces and penalize high dimensional representations.A key of our prior specification is incorporation of global hypothesis testing (4) in the definition of Π ν .Specifically letting u = (u 1 , . . ., u H ) T and u m = (u 1m , . . ., u Hm ) T , we induce Π ν throughν m = (1 − T )u + T u m , m = 1, . . ., M, u ∼ Dir(a 1 , . . ., a H ), u m ∼ Dir(a 1 , . . ., a H ), m = 1, . . ., M,(6)T ∼ Bern{Pr(H 1 )}.In (6), T is a hypothesis indicator, with T = 0 for H 0 and T = 1 for H 1 .Under H 1 , we generate group-specific mixing weights independently, while under H 0 we have equal weight vectors.By choosing small values for the hyperparameters in the Dirichlet priors, we favor automatic deletion of redundant classes(Rousseau and Mengersen, 2011).In assessing evidence in favor of the alternative, we can rely on the posterior probability, Pr{H 1 | (y, A)}, or Bayes factor [Pr{H 1 | (y, A)}/Pr(H 1 )]/[Pr{H 0 | (y, A)}/Pr(H 0 )].These quantities are easily obtained from the output of the Gibbs sampler proposed below.Specifically, under our prior specification, the full conditional Pr(T = 0 | −) = Pr(H 0 | −) = 1 − Pr(H 1 | −) is analytically available as in Lemma 3.1.
for all v ∈ V * , while fixing the latent coordinates of the remaining nodes to 0. As a result, no variations in edge probabilities are displayed when mixing probabilities remain constant, while only local differences are highlighted when mixing probabilities shift across groups.Under the dependence scenario, data are simulated with group-specific mixing probabilitiesPr(G k | y k = 1) = (0.8, 0.2) T , Pr(G k | y k = 2) = (0.2, 0.8) T .Instead, constant mixing probabilities Pr(G k | y k = 1) = Pr(G k | y k = 2)= (0.5, 0.5) T are considered under independence.Although we focus only on 20 nodes to facilitate graphical analyses, the mixture representation in (2) and the low-rank factorization in (3) scale to much higher V settings.

Figure 2 :
Figure 2: Boxplots constructed using posterior samples of the group-specific mixing probabilities in the the dependence and independence scenarios.

u
m allow us to efficiently remove redundant dimensions, with empty classes for h = 3, . . ., 0 in both scenarios.Similarly good performance is obtained for the global testing procedure with Pr{H 1 | (y, A)} > 0.99 for the association scenario and Pr{H 1 | (y, A)} < 0.01 when y k and A k , k = 1, . . ., n are generated independently.

Figure 3 :
Figure 3: For the same scenarios, posterior mean of π (h) , for the two non-empty classes h = 1, 2

Figure 4 :Figure 5 :
Figure 4: Lower Triangular: posterior mean of the difference between the edge probability vectors in the two groups under our model (left) and maximum likelihood estimate of this difference when considering each pair of nodes separately (right).Upper Triangular: true differences.Triangles highlight the pairs of nodes which differs across groups in the true generating process.

Figure 6 :
Figure 6: For local testing, ROC curves from each simulation constructed using the true hypotheses indicator vector δ(δ l = 0 if H 0l is true, δ l = 1 if H 1l is true, l = 1, . . ., V (V − 1)/2) and Pr{H 1l | and Pr{H 1l | (y, A)} (left) or 1 − p l with p l the Fisher's exact p-values (right).We highlight in black the ROC curves having the highest, median and lowest area under the curve in the 100 simulations.below p * , with p * the Benjamini and Hochberg (1995) threshold to maintain a false discovery rate FDR ≤ 0.1.To maintain a similar rejection region in the two procedures we reject all l with Pr{H 1l | (y, A)} > p * * , p * * = 1/(1 − ep * log p * ) if p * < 1/e, 0.5 otherwise, under our local Bayesian testing approach.

Figure 7 :
Figure 7: For increasing sample sizes n, histograms of the estimated posterior probabilities of the global alternative H 1 in each of the 100 simulations under association and no association scenario.
superior performance allowing the thresholds in the local tests to varying to obtain the ROC curves shown in Figure (6).
the simulation examples, but with upper bound H increased to H = 15.This choice proves to be sufficient with classes h = 12, . . ., 15 having no observations and redundant dimensions of the latent spaces efficiently removed.The efficiency of the Gibbs sampler was very good, with effective sample sizes ≈1500 out of 4000.Our results provide interesting insights into the global relation between the brain network structure and creativity, with Pr{H 1 | (y, A)} = 0.995 strongly favoring the alternative hypothesis of association between brain region interconnections and level of creativity.

Figure 8 :Figure 9 :
Figure 8: Mean and quartiles for the posterior distribution of the difference between the edge probabilities in high creativity subjects πhigh and low creativity subjects πlow .

Figure 10 :
Figure 10: Left: weighted network visualization with weights given by the estimated difference among the edge probabilities in the two creativity groups πhigh − πlow .Edges colors range from red to green as the corresponding difference goes from −1 to 1. Solid lines refer to inter-hemispheric connections and dashed lines to intra-hemispheric connections.Right: estimated difference among the edge probabilities in the two groups highlighted only for those connections with Pr{H 1l | (y, A)} > 0.9 from our local testing procedure.Frontal lobe regions (circles), other lobes (squares) according to Kang et al. (2012) classification of Desikan atlas in anatomical lobes.
Proof of Lemma 2.2 Recalling factorization 2 and letting I * −l the set containing all the possible network configurations for the node pairs except the l-th one, we have that p L(A k ) l |m (1) is equal to i pmf of independent Bernoulli random variables and hence the summation overI * −l = {0, 1} V (V −1)/2−1 , provides i The proof of p L(A k ) l (1) = M m=1 p y (m) H h=1 ν hm π (h) lfollows directly from the previous result on p L(A k ) l |m (1) after noticing that p L(A k ) l (1) = M m=1 p y,L(A k ) l (m, 1) = M m=1 p y (m)p L(A k ) l |m (1).

Table 1 :
Comparison of error rates for our procedure against MANOVA on features for global testing and separate Fisher's exact tests for local hypotheses.
).Moreover, we adjust for multiplicity in the Fisher's exact tests by rejecting all those pairs having a p-value First Quartile : π high − π low Posterior mean : π high − π low Third Quartile : π high − π low Proof of Lemma 3.1 To prove Lemma 3.1 we need to show that ( H h=1 u n h h )dΠ u = B(a+n)/B(a) and ( H h=1 u n hm hm )dΠ um = B(a + nm )/B(a) under the Dirichlet prior for u and u m , m = 1, . . ., M specified in (6).Hence h + a h )} H h=1 Γ(n h + a h ) h + a h )} = B(a + n)/B(a).