Generative network models of altered structural brain connectivity in schizophrenia

Alterations in the structural connectome of schizophrenia patients have been widely characterized, but the mechanisms remain largely unknown. Generative network models have recently been introduced as a tool to test the biological underpinnings of altered brain network formation. We evaluated different generative network models in healthy controls (n=152), schizophrenia patients (n=66), and their unaffected first-degree relatives (n=32), and we identified spatial and topological factors contributing to network formation. We further investigated how these factors relate to cognition and to polygenic risk for schizophrenia. Our data show that among the four tested classes of generative network models, structural brain networks were optimally accounted for by a two-factor model combining spatial constraints and topological neighborhood structure. The same wiring model explained brain network formation across study groups. However, relatives and schizophrenia patients exhibited significantly lower spatial constraints and lower topological facilitation compared to healthy controls. Further exploratory analyses point to potential associations of the model parameter reflecting spatial constraints with the polygenic risk for schizophrenia and cognitive performance. Our results identify spatial constraints and local topological structure as two interrelated mechanisms contributing to regular brain network formation as well as altered connectomes in schizophrenia and healthy individuals at familial risk for schizophrenia. On an exploratory level, our data further point to the potential relevance of spatial constraints for the genetic risk for schizophrenia and general cognitive functioning, thereby encouraging future studies in following up on these observations to gain further insights into the biological basis and behavioral relevance of model parameters.


Introduction
Schizophrenia is a highly heritable neurodevelopmental disorder Lee et al., 2012 ;Sullivan et al., 2003 ) characterized by abnormalities in perception, cognition, affect, behavior, and social functioning ( van Os and Kapur, 2009 ). Converging evidence sup-More generally, human brain networks show a complex architecture favoring topologically advantageous properties while minimizing material and metabolic costs, in line with the proposal that the developmental architecture of the human brain connectome results from an economic trade-off between minimizing wiring costs and allowing adaptively valuable topological features ( Bullmore and Sporns, 2012 ). Indeed, there is evidence for alterations in both connection distance Bassett et al., 2008 ) and network topology including reduced local clustering and modularity ( Liu et al., 2008 ;Alexander-Bloch et al., 2010 ) in schizophrenia, consistent with a biased trade-off between wiring cost and topology ( Bullmore and Sporns, 2012 ). However, the principles by which brain networks form and facilitate disturbances in schizophrenia are poorly understood.
Current network neuroscience approaches predominantly focus on descriptive individual or population-level differences, offering little insight into the mechanisms that give rise to network alterations in brain disorders ( Vertes and Bullmore, 2015 ;Braun et al., 2018 ). The recent adoption of generative network models (GNMs) may help to address some of these limitations. For example, predictive ( Beul et al., 2015 ) and random network models ( Kaiser et al., 2009 ;Samu et al., 2014 ;Song et al., 2014 ) have been used to investigate cortical wiring principles, and applications of generative network models to brain network across species have substantially increased our understanding of preserved and species-specific evolutionarily features ( Horvat et al., 2016 ;Kaiser and Hilgetag, 2004 ). GNMs formalize the stepwise development, growth, or evolution of networks, which can potentially be linked to brain development if the wiring rules mimic neurodevelopmental factors ( Kaiser and Hilgetag, 2004 ;Betzel and Bassett, 2017 ). One can then compare synthetic networks generated by the model to empirical brain networks reconstructed from neuroimaging data, thereby explicitly testing different mechanistic explanations that might govern their (disordered) structural formation ( Betzel and Bassett, 2017 ;Bassett and Sporns, 2017 ). Two recent examples have tested different wiring rules of healthy brain networks, finding converging evidence for a two-factor model where one factor accounts for the spatial embedding of brain networks by penalizing spatially distant connections while the other factor enhances the complexity of local topological organization ( Betzel et al., 2016 ;Vertes et al., 2012 ). The model parameters could be potentially linked to biological processes as they account for the metabolic cost of wiring and the strength of a Hebbian-like wiring rule, in part buttressed by the fact that they undergo progressive changes over the lifespan and show alterations in disease states ( Betzel et al., 2016 ;Vertes et al., 2012 ).
In addition to their biological plausibility and developmental sensitivity, an appropriate model of brain network architecture might illuminate genetic aspects underlying network alterations and formation in mental disorders. An important strategy is the examination of unaffected first-degree relatives of patients, who have an increased familial risk for developing the disorder ( Erk et al., 2014 ;Collin et al., 2017 ). This strategy allows for the identification of intermediate brain phenotypes linked to psychiatric risk independent of potential disease-related confounders . In addition, the genetic contributions to these phenotypes can be studied with modern genetic approaches utilizing the potential of cumulative genetic risk scores.
Here, we combined GNMs with imaging genetics to identify potential developmental mechanisms promoting the altered formation of structural brain networks in schizophrenia. Building on a family of previously described and validated generative models ( Betzel et al., 2016 ), we first replicated their optimal-fitting model in a healthy sample, and subsequently applied it to a group of unaffected first-degree relatives and schizophrenia patients. Following the hypothesis of an aberrant balance between wiring cost and topological properties during network formation in schizophrenia, we tested whether these model parameters show the quality of an intermediate phenotype. Moreover, on an exploratory level we examined the model parameters for potential associations with schizophrenia polygenic risk and cognitive function.

Participants
Patients were recruited from the Department of Psychiatry and Psychotherapy at the Central Institute of Mental Health in Mannheim, Germany. Diagnoses were made by staff psychiatrists. Clinical evaluation included ascertainment of personal and family history, and detailed physical and neurological examination. Patients were excluded if: (i) they were aged < 18 or > 65 years, or (ii) they had a history of brain trauma or neurological disease. We studied 152 healthy controls (HC) without a first-degree relative with mental illness (mean [SD] age,30.32 [10.28] years; 94 women), 32 unaffected first-degree relatives (REL) of patients with schizophrenia (33.25 [11.50] years, 19 women), and 66 unrelated patients satisfying DSM-IV-TR criteria for schizophrenia (SZ,32.77 [9.26] years; 20 women). All participants provided written informed consent for the protocols approved by the local Ethics Committee of the University of Heidelberg.

Neuroimaging data acquisition and processing
Diffusion Tensor Imaging (DTI) data were acquired with a 3-T Siemens Trio scanner using two echo planar imaging (EPI) sequences with different parameters: 1) 32 channel multi-array headcoil, TE/TR = 86/8400 ms, 2 mm slice thickness, field of view (FOV) = 256 * 256 mm 2 , 64 slices, and 46 diffusion directions at b -value of 1000 s/mm 2 ; 2) 12 channel coil, TE/TR = 86/14,000 ms, 2 mm slice thickness, FOV = 256 * 256 mm 2 , 64 slices, and 60 diffusion directions at b -value of 1000 s/mm 2 . A total of 163 participants were scanned with the first sequence and 87 participants were scanned with the second sequence.
DTI data were preprocessed with standard routines implemented in the software package FSL ( https://fsl.fmrib.ox.ac.uk/fsl/ ) including correction for head motion and eddy currents, extraction of non-brain tissues ( Smith, 2002 ), and linear diffusion tensor fitting. After estimating the diffusion tensor, we performed deterministic whole-brain fiber tracking using a modified FACT algorithm ( Yeh et al., 2013 ). When performing deterministic whole-brain fiber tracking, we initiated 1,000,000 streamlines for each subject and removed those with a length of less than 10 mm. To construct the structural connectome, the cerebral cortex was parcellated into 360 areas ( Glasser et al., 2016 ) and the number of streamlines connecting every pair of brain areas was used as an estimate of structural connectivity. Notably, if the number of streamlines connecting two regions was less than 5, we set the connection weight to zero to minimize bias due to false positives ( Zhang et al., 2015 ;Zhu et al., 2020 ) and to enforce an averaged network density of ~2.5% across subjects. As inter-hemispheric connections cannot be adequately modeled, we focused on the right hemisphere (180 areas) following the precedent set in related prior work ( Betzel et al., 2016 ;Vertes et al., 2012 ). To compare the structural connectome between groups and retain connections not only according to weight but also according to length, every connection was required to be present in at least 70% of subjects ( Roberts et al., 2017 ).

Construction of generative network models
For each subject, we constructed synthetic networks using generative models ( Fig. 1 ). After defining a seed network consisting of all edges that were consistently identified across all subjects, edges between nodes were added one at a time until the number of edges in the synthetic network conformed to that of the observed network. The relative probability of edge formation was evaluated at each step according to the equation: Here ( , ) denotes the fiber distance between brain areas and , which was obtained using a streamline-based quantification of distance Overview of generative network models. Deterministic whole-brain fiber tracking was performed to reconstruct white mater pathways, from which we constructed structural networks linking 180 regions of interest. By retaining edges present in all subjects, a seed network was created and then edges were added stepwise with a certain probability of edge formation, P( u , v ) , until the number of connections in the synthetic network was the same as that in the observed structural network. The fitness of the synthetic network was evaluated by comparing the degree, clustering coefficient, betweenness centrality, and edge length distributions between the synthetic network and the observed structural network. We used BrainNet viewer to visualize brain networks ( Xia et al., 2013 ). ( Roberts et al., 2017 ). Note that controls the edge length; when is negative, short-distance edges are favored, whereas when is positive, long-distance edges are favored. The term ( , ) represents the topological relationship between brain areas and , and represents the relative importance of the topological term. Importantly, ( , ) can be varied to realize different wiring rules. All topological parameters are defined in Table S2 and were computed using the Brain Connectivity Toolbox ( https://sites.google.com/site/bctnet/Home ) as implemented in MATLAB.
In this study, we limited our analysis to four generative models, each representing one of four previously-studied classes: the geometric model, the degree-product model, the clustering-product model, and the matching index (MI) model ( Betzel et al., 2016 ). In the geometric model, the probability of forming a connection between two brain regions is a function of their fiber distance (represent by the spatial parameter: ) , based on the intuition that brain regions are less likely to be connected if they are further apart. In the degree-product, clustering-product, and MI models, the connection probability function includes an additional topological term (represent by the parameter ) which is the product of degrees (number of connections of a brain region) between two nodes, the product of clustering coefficients (fraction of connected triangles around a brain region) between two nodes, or the normalized number of nearest neighbors in common between two nodes (homophily), respectively. Here, the intuition would be that two brain regions are more likely to be connected if their connectivity profiles are similar rather than dissimilar.
To evaluate the fitness of synthetic networks and to optimize models, we define an energy function that measures how dissimilar a synthetic network is from the observed network as follows: Here, each term is a Kolmogorov-Smirnov statistic that compares degree ( ), clustering coefficient ( ), betweenness centrality ( ), and edge length ( ) distributions of synthetic and observed networks. Since we defined energy as the maximum of the four statistics, smaller energy indicated greater fitness.
We used classical Monte Carlo methods to find the parameters ( , ) that generated networks with minimal energy, i.e. networks that were most similar to the observed networks. The procedure starts from randomly sampling 2000 points from the defined parameter space. Then, by computing the energy at each point and dividing the whole parameter space into a subset of Voronoi cells, we sample points preferentially within Voronoi cells with low energy. We repeated this procedure five times until it converged to a (locally) optimal solution. Further details regarding this process can be found in Ref. ( Betzel et al., 2016 ).
To explore whether our optimal-fitting model could capture other structural network abnormalities in schizophrenia, we also calculated the global efficiency, modularity (Q, the degree to which the network may be subdivided into such clearly delineated groups where edges are more likely within groups than between groups, Newman's community detection algorithm with default resolution parameter, gamma = 1 ( Newman, 2004 )) and hub degree (defined as the mean degree of the top 10% highest-degree nodes), and then compared them between HC and SZ in both synthetic and observed networks.

Cognitive assessment and factor construction
In a subset of 120 individuals (74 healthy controls, 21 relatives, and 25 patients), we assessed a range of cognitive subdomains frequently impaired in schizophrenia (including attention and psychomotor speed, executive function, memory, impulsivity, and social emotional cognition) using the Cambridge Neuropsychological Test Automated Battery (CANTAB) ( Mesholam-Gately et al., 2009 ;Barch and Ceaser, 2012 ). Because the CANTAB measures displayed significant shared variance, we performed a principal component analysis (PCA) to reduce the redundancies and minimize potential for Type I error ( Levin et al., 2013 ). For this purpose, we selected two main outcome measures per test and performed a PCA, which resulted in five components whose eigenvalues were larger than 1. The first component accounted for 27.1% of the variance, and factor loadings with lower factor values indicated better individual cognitive performance and captured mainly executive function and memory. The detailed description of methods and a full list of included outcome measures and cognitive factors are provided in the supplementary methods and Table S1.

Polygenic risk score
We used standard methods to extract genomic DNA from Ethylenediaminetetraacetic acid blood to perform genome-wide SNP (single nucleotide polymorphism) genotyping of all individuals using the Infinium PsychArray (Illumina Inc). Quality control (QC) and imputation was performed with Gimpute ( Chen et al., 2019 ) including the following steps: Removal of SNPs with sex chromosome heterozygosity, a missing rate greater than 0.05, deviation from Hardy-Weinberg equilibrium in controls ( P < 10 − 6 ) and autosomal heterozygosity deviation of greater than 0.2 as well as removal of samples with a missing rate greater than 0.02. Phasing and imputation was conducted using SHAPEIT and IM-PUTE2 ( ( Kay et al., 1987 ). tSNR = temporal signal to noise ratio, QC = quality control with the imputation reference panel from the 1000 Genome Project dataset (August 2012, 30,069,288 variants, release "v3.macGT1 "). After imputation, we only retained SNPs with an imputation INFO score larger than 0.6, minor allele frequencies larger than 0.01 and successfully imputed in at least 20 individuals. The proportion of alleles shared identity-by-decent estimated using PLINK ( Chang et al., 2015 ( www.cog-genomics.org/plink/1.9/ ) was used to identify relatedness for all pairs of samples. A threshold of ̂> 0.2 was used to identify related pairs of samples and exclude one member of each pair at random.
To control for population stratification, we performed a PCA on the linkage-disequilibrium pruned set of autosomal SNPs using GCTA ( Yang et al., 2011 ). Then we excluded outliers whose principal components were larger than six standard deviations from the group mean and used the first five principal components as covariates in the following association analyses of model parameters.
We computed the polygenic risk score with PRSice v-2, while the expected value of the missing genotypes were imputed based on the sample allele frequency ( Euesden et al., 2015 ). In this study, genomewide association ( Schizophrenia Working Group of the Psychiatric Genomics, 2014 ) nominal P < 0.05, was used to achieve a balance between the number of false-positive and true-positive risk alleles ( Wray et al., 2014 ;Agerbo et al., 2015 ). The association analyses were repeated for thresholds of nominal P < 0.01 and of nominal P < 0.1.

Olanzapine equivalents
To investigate the effect of antipsychotics on the results, we converted the daily doses of patients' antipsychotic medication to olanzapine equivalents (OLZe) according to the classical mean dose method presented by Leucht and colleagues ( Leucht et al., 2015 ). This method is based on the analyses of 13 oral second-generation antipsychotics, haloperidol, and chlorpromazine compared with olanzapine 1 mg/d. To obtain OLZe, we weighted the mean dose of each antipsychotic by the study's sample size and finally divided by the weighted mean olanzapine dose.

Statistical analysis
For each participant, we tuned the parameters ( , ) to the range where the generative model always produced synthetic networks with near-lowest energy, i.e. networks that were most similar to the observed structural brain networks, by using the Monte Carlo methods mentioned above. Within this range, we analyzed the top 1% minimal-energy synthetic networks (100 networks per participant, 10,000 networks in total). We compared individual averaged energy of these top 1% lowestenergy synthetic networks between the four types of generative models and between different groups of participants using a repeated-measures analysis of variance (ANOVA) with the Statistical Package for the Social Sciences 24. We compared the parameters of the optimal-fitting model between groups using a general linear model.
In order to exclude the effect of antipsychotics on our results, we computed individual olanzapine equivalents and then assessed the correlation coefficient between those values and the model parameters.
Furthermore, to investigate the genetic association of the model factors and to circumvent the potential effect of confounding factors not related to the genetic risk for the disorder in patient populations ( Chen et al., 2018 ), we assessed the correlation between the parameters of the optimal-fitting model and polygenic risk scores in healthy controls only while controlling for age, sex, DTI protocol, temporal signal to noise ratio (tSNR) ( Roalf et al., 2016 ), and the first five principal components of population structure. To evaluate the behavioral relevance of the identified model parameters, we assessed the correlation coefficient between the parameters of the optimal-fitting model and the individual cognitive factor loadings, while controlling for age, sex and tSNR in the three groups separately.

Sample characterization
The groups were matched for age, education, tSNR, and head motion, but not for sex and acquisition protocol (see detailed demographic and clinical characteristics as well as image quality control parameters in Table 1 ). To account for the group differences in these latter variables, we included sex and DTI protocol as covariates in all analyses that included multiple groups. The demographic and neuroimaging characteristics for the participants of the different acquisition protocols are provided in Table S3.

Generative network models
Comparison between the four network models revealed significant differences in mean energy (repeated measures ANOVA: F (3,453) = 2964.277, p < 0.001) in HC, with the MI model showing the lowest energy level (see Fig. 2 ). The Akaike information criterion is − 746 for the one-factor (spatial) model and − 1219 for the two-factor (MI) model. In line with this result, the Bayesian Information Criterion of the one-factor model is − 742, and of the two-factor model is − 1212. Both results imply that the quality of the two-factor model is superior to that of the one-factor model (see Supplementary Results for further details). Importantly, when including all diagnostic groups in the analysis, the group-by-model-type interaction was not significant (repeated measures ANOVA with group as a between-subjects factor, and with sex and DTI protocol as covariates: F (6,735) = 0.870, p = 0.516), arguing for the same pattern across all groups. We did not find a sexby-model interaction ( F (3,735) = 1.155, p = 0.326). We also found a significant protocol-by-model interaction effect ( F (3,735) = 26.37, p < 0.001), which comes from the spatial and clustering-product models (see Supplementary Fig. 6 for more details). Hence, in our subsequent investigation we focused on the analysis of the MI model, as among the four tested classes of generative network models it provided the optimal fit to the individual, experimentally-derived structural networks across diagnostic groups. We show the energy landscape and the parameter distribution of the MI model for different groups in the supplement (Fig.  S2). Compared to MI models for the randomized networks, MI models for observed network showed lower energy and significantly different parameters distributions ( Figure S4), implying that the MI model contains meaningful information that is worth investigating further. To show how similar the synthetic networks and observed networks were at the edge level, we calculated the percent in edge overlap and the percent of correctly-modeled subjects for each edge among different groups (Fig. S3).
When comparing global efficiency, modularity, and hub degree between groups, we detected significant between-group differences in global efficiency (ANOVA, with same covariates; F (2,245) = 5.289, p = 0.006) and in hub degree (ANOVA; F (2,245) = 3.875, p = 0.022) in the observed networks. In the synthetic networks, we also detected a marginal between-group difference in hub degree (ANOVA; F (2,245) = 2.751, p = 0.066), and a significant difference in global efficiency (ANOVA; F (2,245) = 3.967, p = 0.02). No group differences were found in modularity for either the observed data (ANOVA; F (2,245) = 0.088, p = 0.916) or the synthetic network (ANOVA; F (2,245) = 0.039, p = 0.962, see Fig. 3 A). Note that, our definition of hub degree ensures that every network has the same number of hubs. To demonstrate that our results are robust to variation in the threshold that defines hubs, we repeated our analysis using different threshold. Specifically, we find that when using a threshold of 8%, the hub degrees of healthy controls are higher than the hub degrees of patients and rel-atives for both the observed networks ( F (2,245) = 3.462, p = 0.033) and for the synthetic networks ( F (2,245) = 2.408, p = 0.092). Similarly, when using a threshold of 12%, the hub degrees of healthy controls are higher than the hub degrees of patients and relatives for both the observed networks ( F (2,245) = 4.138, p = 0.017) and the synthetic networks ( F (2,245) = 2.578, p = 0.078).
Please see Fig. S5 for scatterplots of observed networks (data) and synthetic networks (model) in hub degree, global efficiency and modularity for each group.
In Fig. 3 B, we illustrate the synthetic network structure of a single subject at different parameter values, thereby offering an intuition regarding the roles of each parameter in the MI model. As expected, the two parameters and are strongly anti-correlated (Pearson correlation: r = − 0.613, p < 0.001) in HC, and this relation was conserved across diagnostic groups (relatives: r = − 0.535, p = 0.002, patients: r = − 0.492, p < 0.001). Investigating the between-group differences of the two parameters, we found a significant between-group effect on the distance parameter (ANOVA, with sex and DTI protocol as covariates; F (2,245) = 4.777, p = 0.009) and also on the topological parameter (ANOVA, same covariates; F (2,245) = 3.054, p = 0.049, see Fig. 4 ). The effect sizes (Partial Eta Squared) of the significant group difference were 0.038 ( ) and 0.024 ( ). Post-hoc analyses confirmed significant differences between HC and SZ ( : F (1,214) = 3.956, p = 0.048; : F (1,214) = 4.707, p = 0.031) as well as between HC and REL in ( F (1,180) = 8.970, p = 0.003), but not in ( F (1,180) = 2.609, p = 0.108). We found no significant correlation between individual olanzapine equivalents and model parameters in SZ ( : r = − 0.121, p = 0.393; : r = 0.092, p = 0.517). We did not find any significant correlation between model parameters and positive or negative syndrome scale scores (see the Supplementary Results for further details). While the varying size of parcellations in the atlas can potentially impact the number of successive streamlines, this factor did not influence our main results (see Supplementary Results).

Polygenic risk score
Based on the observed alterations in topological network formation in schizophrenia patients and healthy individuals at familial risk for schizophrenia, we further examined, on an exploratory level, potential associations to polygenic risk for schizophrenia. All healthy controls were of Caucasian ethnicity. A PCA plot to account for potential population stratification is provided in the Supplement (Fig. S1). There was a significant group-difference in the genetic risk scores (ANOVA; F (2,194) = 7.685, p = 0.001) with SZ showing the highest risk scores. To characterize the influence of genetic risk for schizophrenia on both model parameters, we correlated the individual participants' risk scores with and . We found a significant positive association for the distance parameter ( r par = 0.173, p = 0.045, where "r par " refers to a partial correlation that treats age, sex, and principal components as covariates. Fig. 5 A) and a weaker, trend-wise negative association for the topological parameter ( r par = − 0.154, p = 0.073) in HC for all genetic variants with a nominal genome-wide significant association to schizophrenia ( P < 0.05, uncorrected). The 95% confidence interval of the correlation coefficient for PRS and estimated by bootstrapping is (0.003, 0.345). Supplemental analyses confirmed the robustness of this finding to the choice of significance threshold used for polygenic risk score computation: for a nominal P < 0.01, we obtained r par = 0.154 and p = 0.074 for , and we obtained r par = -0.156 and p = 0.071 for , while for a nominal P < 0.1, we obtained r par = 0.196 and p = 0.022 for , and we obtained r par = − 0.174 and p = 0.043 for .

CANTAB
Based on the observed alterations in topological network formation in schizophrenia patients and healthy individuals at familial risk for schizophrenia, we further examined, on an exploratory level, potential and schizophrenia patients (SZ). The matching index model captured the abnormal hubness and global efficiency in SZ well, while also modeling no modularity difference between groups. The deep blue bars represent the global efficiency, modularity, and hub degree of the synthetic network (model) and observed network (data) in HC, respectively, while the blue bars represent REL and light blue bars represent SZ. Bars indicate mean values. Error bars indicate 95% confidence intervals. Asterisks denote significant difference between diagnostic groups. (B) Visualization of parameter effects on network structure for a single subject. The topological parameter mainly influences the degree distribution with larger corresponding to the occurrence of more and larger hubs, while the distance parameter mainly affects the edge distance distribution. Here, = − 1.2 and = 0.24 correspond to the optimal-fitting model. associations to cognitive function. Exploring the correlation between the individual participants' scores of the first component and the distance parameter (or the topological parameter ), we found a significantly negative association for ( r = − 0.261, p = 0.029, Fig. 5 B) and no association for ( r = 0.108, p = 0.374) in HC. The 95% confidence interval of the correlation coefficient for the first component and estimated by bootstrapping is ( − 0.482, -0.014). No association was found in relatives ( : r = − 0.137, p = 0.589; : r = 0.035, p = 0.890) or in patients ( : r = 0.261, p = 0.240; : r = − 0.133, p = 0.556). As expected, healthy controls showed lower factor loadings compared to relatives and patients ( F (2, 115) = 7.680, p = 0.001). We did not detect any significant correlation between the other four cognitive components and the network model parameters.
Limiting our analysis to the subset of individuals acquired with the same DTI protocol (32 channel coil), we still found a significant between-group effect on (ANOVA, with sex as a covariate: F (2,159) = 4.827, p = 0.009), and a trend-wise significant betweengroup effect on (ANOVA, with sex as a covariate: F (2,159) = 2.798, p = 0.064). We list the network density for each group and each protocol separately (Tables S4 and S5). Here denotes the residuals after regressing out covariates. (B) After performing principal component analysis on 13 main outcome measures of the neuropsychological test battery, we obtained five components whose eigenvalues were larger than 1, and we found a significant negative correlation between the first component and ( r = − 0.261, p = 0.029) in healthy controls. As in panel (A), denotes the residuals after regressing out covariates.

Discussion
In this study, we replicate prior work in GNM ( Betzel et al., 2016 ) by demonstrating that the resulting synthetic networks simulate many properties of structural brain networks in both health and disease. We further identify significant differences between healthy controls, firstdegree relatives and schizophrenia patients for the two optimally-fitting model parameters in a pattern that aligns with the increasing familial risk for schizophrenia. Specifically, these differences imply lesser geometric constraints and lesser topological constraints in the processes driving network formation in schizophrenia. Moreover, additional exploratory analyses suggest potential associations between network formation parameters, schizophrenia polygenetic risk and latent features of cognitive functioning, respectively. Notably, these observations highlight a potential plausible explanation of how genetic risk contributes to the malformation of brain networks and cognitive dysfunction, although future studies are needed to validate the biological basis and behavioral relevance of model parameters.
Our study extends the current state of the field in several notable ways. Firstly, our data replicate previous accounts of a superior performance of a model integrating geometric constraints and topological complexity of structural brain networks ( Betzel et al., 2016 ). This observation aligns well with the current theory that the regular formation of brain networks follows an important evolutionary rule ( Chklovskii et al., 2002 ;Kaiser and Hilgetag, 2006 ;Chklovskii and Koulakov, 2004 ) by minimizing the metabolic cost of building and maintaining long-range axonal connections ( Chklovskii, 2004 ) while preserving the adaptive properties of the human connectome, such as the capacity for information processing ( Costa Lda et al., 2007 ) through formation of topological features ( Bullmore and Sporns, 2012 ).
Secondly, when extending this framework to model mechanisms of connectome formation in schizophrenia patients and first-degree relatives, we detected no significant between-group differences in the fit between the synthetic networks and the observed networks across models. Importantly, this suggests that the same wiring rules can equally well describe both normal and abnormal brain network formation. Previous studies have found increased connection distance of brain networks Bassett et al., 2008 ) and altered network topology including reduced clustering and modularity ( Liu et al., 2008 ;Alexander-Bloch et al., 2010 ;Lynall et al., 2010 ) in schizophrenia (see Fig. 3 A). In line with this prior work, theoretical accounts suggest that the abnormal organization of brain networks in schizophrenia may result from a biased trade-off between generative factors of homophilic attraction and distance penalization in the process of brain network formation ( Bullmore and Sporns, 2012 ). To probe further, we tested whether individual model parameters contributed differently to network formation in all three study groups. We identify smaller values of the distance parameter in HC than in relatives and patients, while values of the topological parameter were higher in HC than in relatives and patients. This said, larger values in relatives and patients indicate a lower distance penalization, thus increasing the edge length distribution, which is consistent with the increased connection distance of brain networks in schizophrenia Bassett et al., 2008 ). Since the topological parameter mainly influences the degree distribution (see our Fig. 3 B), smaller values in patients and relatives indicate the presence of fewer and smaller hubs) ( Vertes et al., 2012 ). This observation corroborates previous findings suggesting that brain networks in schizophrenia are less clustered and have fewer hubs ( Bassett et al., 2008 ;van den Heuvel et al., 2010 ). Notably, the presence of decreased spatial constraints and homophilic association in our sample of unaffected first-degree relatives suggests that these network mechanisms may resemble intermediate phenotypes, i.e., relate to the increased familial risk for schizophrenia rather than being epiphenomena of potential confounds such as antipsychotic medication.
Thirdly, we explored the associations between model parameters and schizophrenia polygenic risk in healthy individuals. We identified a weak, but nominally significant positive correlation between polygenic risk and the distance parameter . In addition, we detected a trend-wise negative association between polygenic risk and the topological parameter . Together these findings suggest that increasing genetic risk load for schizophrenia leads to a diminished distance penalization and local homophily of the structural brain connectome. While these effects are small and do not survive correction for multiple comparison, they are in the range of commonly observed effect sizes for polygenetic risk associations ( Dudbridge, 2013 ) and provide an interesting observation that could potentially guide replication efforts in future studies. In general, two interconnected factors are thought to contribute to the formation of long distance connections in the brain: a) neurons connect to each other at an early stage of neurodevelopment when the neural system is small in scale, and b) at later stages, axons follow developmental pathways already established by earlier pioneer neurons (fasciculation) ( Kaiser, 2017 ). Studies in animals models suggest that most neurons already form early connections in a spatially localized system ( Varier and Kaiser, 2011 ) where navigation through differential expression of guidance molecules is feasible. Importantly, the genes coding for such guidance molecules have been repeatedly implicated in the pathophysiology of schizophrenia ( Aoki-Suzuki et al., 2005 ;Eastwood and Harrison, 2008 ;Shi et al., 2004 ). It is interesting to speculate that altered spatial expression of guidance molecules imposes less spatial constraints on brain network formation, and these fine-scale mechanisms are reflected in our large-scale model parameters. However, these observations critically require replication in larger datasets and further call for integrating novel, spatially differentiated methods of gene-expression such as the Allen Brain Atlas. Moreover, we additionally sought to investigate the association of the generative network parameters with a broad and general marker of cognition in an exploratory manner. We found a negative correlation between the distance parameter and individual scores of the first principal cognitive factor, predominantly capturing converging aspects of executive function and memory, in HC. In particular, we observed that larger values of (reflecting less geometric constraints, thus a higher probability of long distance connections) were associated with better cognitive performance. Healthy human brain networks usually contain only a small fraction of long-distance shortcuts preferentially linking hub regions Bullmore and Bassett, 2011 ). Although these long-distance connections are expensive in terms of material and metabolic cost, they greatly reduce the path-length of information transfer between spatially remote regions, thus increasing the potential for efficient information processing in a binary network ( Buzsaki et al., 2004 ;Bullmore and Sporns, 2012 ). A number of previous studies have shown that more topologically efficient structural and functional networks are associated with enhanced cognitive performance Breckel et al., 2013 ). Cross-species comparisons of connectome have suggested that modifications of human brain connectivity that are beneficial for higher cognitive function may also render humans vulnerable to brain dysfunction ( van den Heuvel et al., 2019 ). On one hand, long-range connections are essential to maintain the integrity of the expanding brain network and are therefore under positive selection pressure ( Hofman, 2014 ). On the other hand, these long range connections are at an increased vulnerability ( Crow, 1997 ) due to their increased metabolic cost and also due to their topological importance. Hence brain disorders such as schizophrenia manifest themselves in alterations of these long-range connections. Indeed, previous studies have shown an increased proportion of long-range connections in schizophrenia resulting in brain networks shifted towards random networks ( Lo et al., 2015 ). The association of the distance parameter and cognition was not detectable in schizophrenia patients and firstdegree relatives, suggesting an optimum in the number of long-range connections potentially exceeded in those populations, meaning that a higher proportion of long-range connections in patients and relatives may not lead to higher network efficiency, but subtle randomization ( Lo et al., 2015 ).
There are a number of methodological considerations that deserve discussion. Firstly, even complete correspondence of two networks does not necessarily imply that both models have been shaped by the same biological mechanism(s). While we have attempted to limit interpretational restraints by external validation with other well-established network features, it is important to note that generative models can be used to offer candidate mechanisms for an observed topology, but cannot conclusively prove that a given candidate mechanism actually occurred in the developing organism ( Betzel and Bassett, 2017 ). Secondly, while GNMs can provide insights into the formation of structural brain networks, they do not explicitly model neurodevelopmental processes. Such investigations are warranted, and will require longitudinal datasets as well as the use of an advanced GNM framework explicitly modeling variant developmental processes within subjects. Finally, our sample size used here is relatively small for investigating associations between model parameters and polygenic risk score and cognitive scores, calling for bigger sample sizes and replications in independent samples in future studies.
In conclusion, we show that the distinct wiring rules can simulate normal and abnormal network formation in humans, resemble intermediate connectomic phenotypes for schizophrenia familial risk and manifest as altered spatial and topological characteristics of brain connectome formation in schizophrenia and first-degree relatives. We further demonstrate preliminary evidence that the GNM model parameters can be linked to schizophrenia polygenic risk and explore their relevance for cognitive functions frequently disturbed in schizophrenia. Together, these data suggest that brain network formation is under genetic control, is potentially optimized to support cognitive functioning and is disturbed in heritable developmental disorders such as schizophrenia. While these results provide an important first step to harness the potential of GNM by linking it to neurobiologically interpretable factors, longitudinal studies in developmental cohorts are needed to further elucidate successful and aberrant brain connectome formation.