Topological data analysis of task-based fMRI data from experiments on schizophrenia

We use methods from computational algebraic topology to study functional brain networks in which nodes represent brain regions and weighted edges encode the similarity of functional magnetic resonance imaging (fMRI) time series from each region. With these tools, which allow one to characterize topological invariants such as loops in high-dimensional data, we are able to gain understanding of low-dimensional structures in networks in a way that complements traditional approaches that are based on pairwise interactions. In the present paper, we use persistent homology to analyze networks that we construct from task-based fMRI data from schizophrenia patients, healthy controls, and healthy siblings of schizophrenia patients. We thereby explore the persistence of topological structures such as loops at different scales in these networks. We use persistence landscapes and persistence images to represent the output of our persistent-homology calculations, and we study the persistence landscapes and persistence images using k-means clustering and community detection. Based on our analysis of persistence landscapes, we find that the members of the sibling cohort have topological features (specifically, their one-dimensional loops) that are distinct from the other two cohorts. From the persistence images, we are able to distinguish all three subject groups and to determine the brain regions in the loops (with four or more edges) that allow us to make these distinctions.


INTRODUCTION
Schizophrenia is a chronic psychiatric disorder that affects more than 21 million people worldwide (World Health Organization, 19 September 2015). Up to 80% of the risk factors appear to be genetic, although it has proven difficult to identify the specific genes that are involved in the disease (Bertolino & Blasi, 2009). The disease usually commences in early adulthood, and symptoms range from hallucinations and avolition to cognitive deficits (such as impaired working memory) (Dawson et al., 2014;World Health Organization, 19 September 2015). The cause of the cognitive deficits is thought to originate from compromised functional integration between neural subsystems (Bassett et al., 2008;Bullmore, Frangou, & Murray, 1997;Dawson et al., 2014;Peled et al., 2001). There can be significant differences in the properties of time series from imaging measurements of healthy versus schizophrenic individuals. Different studies have found seemingly contradictory results when comparing functional magnetic resonance imaging (fMRI) time series from two distinct brain regions in a schizophrenia patient and a healthy control. The majority of studies have concluded that schizophrenia patients have less-similar time series across different brain regions . Zalesky, Fornito, Egan, Pantelis, and Bullmore (2012) suggested that such reduced similarity may arise from an altered coupling between brain regions and local decoherence within brain regions in schizophrenia patients. However, some studies have observed that schizophrenia patients have more-similar series than controls across brain regions. For a detailed discussion of these seemingly contradictory findings, see Fornito and Bullmore (2015). In some cases, methodological steps in fMRI analyses seem to yield increases in these similarities, but abnormal neurodevelopment or drug treatment may play a role in increasing them in other cases (Fornito & Bullmore, 2015).
Studies of functional networks of schizophrenia patients have revealed that such networks differ significantly from the functional networks of healthy controls (Alexander-Bloch et al., 2012;Bassett et al., 2008;Fornito et al., 2012;Y. Liu et al., 2008;Lynall, Bassett, Kerwin, McKenna, & Kitzbichler, 2010;Rubinov & Bullmore, 2013;Singh & Bagler, 2016). For example, schizophrenia patients can have rather different community structure from controls (Alexander-Bloch et al., 2012;Flanagan, Lacasa, Towlson, Lee, & Porter, 2018). In one paper, Alexander-Bloch et al. (2012) observed that a small subset of brain regions lead to significant differences in the community assignments in schizophrenia patients, whereas the communities for healthy subjects appear to be consistent with each other. Moreover, the maximum modularity of functional networks appears to be smaller for schizophrenia patients than in healthy controls (Alexander-Bloch et al., 2010, 2012. Two recent papers, Flanagan et al. (2018) and Towlson, Vértes, Müller, and Ahnert (2018), compared the network structures of schizophrenia patients and healthy controls under the effects of different drugs and a placebo. t fMRI imaging data: Brain activity is monitored over time while subjects are performing a task.

Functional parcellation:
The brain is divided into regions of interest; these resions are the nodes of a functional network. 5 4 3 2 1 6 Functional network: We determine edge weights between the different brain regions based on their time-series similarity. A large edge weight signifies high similarity, and a small edge weight signifies low similarity between the time series.
In this paper, we construct functional networks using fMRI data from schizophrenia patients, healthy controls, and siblings of schizophrenia patients. We create a nested sequence of networks in which we add edges, one by one, to the networks in order from largest edge weights to smallest. (In the unlikely case of two edges having the exact same weight, we add both edges simultaneously in one step.) We then construct a weight rank clique filtration (WRCF) (Petri et al., 2013) by determining cliques and tracking their changes in every step of the network sequence. We then compute PH and Betti numbers (Croom, 1978;Edelsbrunner & Harer, 2010) of the WRCF and examine the results by applying tools from statistics and machine learning, respectively, on the persistence landscapes and persistence images that result from our computation of PH. We compare our findings from these two approaches and also study Betti numbers using Betti curves. We focus on loops (with four or more edges) 1 in the networks in our nested sequence, rather than on connected components, because one can also study the latter using more conventional approaches, such as by examining the spectrum of the combinatorial graph Laplacian (Bollobás, 1998;Newman, 2018).
The rest of our paper is organized as follows. We introduce the data set and the mathematical methods in an intuitive way in Materials and methods, present our findings in Results, and discuss our comparisons in the context of current biological research in Discussion. We give some additional details about a few results in Appendix.

Data set: fMRI data of schizophrenia patients, siblings of schizophrenia patients, and healthy controls
We use a data set that consists of time series from blood oxygen level dependent (BOLD) functional magnetic resonance imaging (fMRI) data collected from 281 subjects (54 schizophrenia patients, 50 healthy siblings of schizophrenia patients, and 177 healthy controls) with 120 time steps (where the length of 1 time step corresponds to ∆t = 2 s). The brain regions were determined according to the Montreal Neurological Institute template (Talairach & Tournoux, 1988). Prior to obtaining the time series, the fMRI data were corrected for head motion, and they were normalized and smoothed with a Gaussian filter. The voxel-wise signal intensities were normalized to the whole-brain global mean. The data set was acquired by Bertolino, Blasi, and their collaborators as part of a larger fMRI data set over a period of approximately 10 years. Subsets of the data set have been studied previously Rampino et al., 2014;Sambataro et al., 2010); these previous studies of the data did not include the data for siblings.
The experimentalists obtained fMRI images while subjects were performing a block paradigm of a so-called 'n-back task'. During an n-back task, subjects are presented with a sequence of numbers. In each step m of the sequence, they are first shown a number and then asked to recall the number from 1 We use the term 'loop' to refer to at least four edges in a network that are connected in a way that forms a cycle. Conventionally, loops (other than self-loops) in undirected graphs must have at least 3 edges, and loops in directed graphs must have at least 2 edges. In our paper, we adapt this terminology to represent the topological features that we detect in our simplicial complexes. sequence step m − n. For example, during a 2-back task, subjects are shown a sequence {. . . , x i−1 , x i , x i+1 , x i+2 , . . . } and are asked to recall number x i−1 while being shown number x i+1 , recall number x i while being shown number x i+2 , and so on. For the present data set, the stimuli consisted of alternating blocks of 40 seconds each of 0-back tasks and 2-back tasks.
We preprocess the data in the following way. For each subject and time step, we calculate the mean signal for white-matter brain regions, the mean signal for regions that consist of cerebrospinal fluid, and the mean of the global signal. In addition to these mean values, we also use the squares and cubes of the global signal means, as well as head-motion parameters (3 translation and 3 rotation parameters), to construct 11 × 120 subject-specific design matrices. We then perform linear regression for each time series using MATLAB's command for the Moore-Penrose pseudoinverse 2 PINV(); we exclude brain regions without grey matter from our calculations. We then use the residuals from the regression as our time series for the 120 brain regions that we list in Tables A.1-A.5.

Functional connectivity
We obtain functional networks from the fMRI time series for each subject by using the 120 distinct brain regions (see Tables A.1-A.5) as the nodes of the networks and calculating Pearson correlations 3 (without a time lag) between the nodes' time series as a measure of pairwise functional connectivity. The values of the pairwise functional connectivity give the edge weights between the brain regions in the functional networks.
In all but one of our analyses, we consider four contiguous time regimes of 30 time points each, yielding four functional networks per subject. (The exception is Subsection , in which we use each subject's full time series, which consists of 120 time points, to construct a single functional network for each subject.) We summarize each functional network in an adjacency matrix A = A(subject, time regime), whose entry A ij is given by the edge weight between node i and node j.
We apply a statistical threshold, described in Bassett et al. (2011), to the weighted adjacency matrices 2 As some of the matrices are ill-conditioned, there are variations in the resulting networks across different runs of the preprocessing code. However, in our observations, the matrices differ by only up to 0.2% of entries after two runs of preprocessing. 3 There are numerous ways to measure functional connectivity (Bullmore & Bassett, 2011;Smith et al., 2011;Zhou, Thompson, & Siegle, 2009). For a discussion in the context of schizophrenia research, see Fornito et al. (2012).
without modifying the remaining edge weights. To obtain the thresholded adjacency matrices, we estimate p-values for the correlations using the MATLAB function CORRCOEF and retain only those entries whose p-value is less than 0.05. We then separate each adjacency matrix into a positive and a negative part, A = A + + A − , and study only the positive A + part of the adjacency matrix 4 . In Fig. 2 Figure 2: Steps that we perform on the preprocessed time series of each brain region to construct a functional network for each subject during each of four time regimes. We study the positive parts of the resulting networks using persistent homology.

Persistent homology
Persistent homology (PH) is a technique from topological data analysis, which aims to understand the 'shape' of data (Otter et al., 2017). PH is based on the topological concept of homology, which is used to study the shape of objects, disregarding changes from stretching and bending.
We motivate our use of PH for brain networks by considering different types of cheese and how they differ in their homology. Calculating homology allows one to differentiate between the shape of a 4 By using this approach, we avoid interpreting of negative correlations between time series. stereotypical Swiss cheese (of the Emmental sort) with holes and the shape of a mozzarella cheese by providing information about the presence or absence of holes in the cheeses. (See Fig. 3 for examples of the aforementioned cheeses.) One can thereby consider the space surrounding the holes; these are the so-called loops. However, homology does not give information about the geometry of the cheeses; for example, it does not 'see' that the Swiss cheese is a cube or that the mozzarella is a sphere (unless it happens to be hollow), as it only detects the differences in the number of holes.
We now give a brief intuitive introduction to a few concepts behind homology and PH for network data. For more mathematical introductions, see Croom (1978); Harer (2008, 2010

Simplicial complexes
To study the characteristics of topological spaces (Kosniowski, 1980), such as the Swiss cheese or the mozzarella, we consider small, simplified pieces ('morsels'), on which we can perform computations more easily. When reassembled, the morsels carry the same overall topological information as the original space. We begin building these morsels (i.e., 'spaces', to be more formal) using a discrete set of points, which we call 'nodes'. We then add 'edges' to connect pairs of nodes; 'triangles', which consist of three nodes, three edges, and a face; 'tetrahedra'; and so on. Formally, these elements are called k-simplices, where k indicates the dimension of the simplex. A point is a 0-simplex, an edge is a 1-simplex, a triangle is a 2-simplex, and the tetrahedron is a 3-simplex.
We can combine different simplices to capture different aspects of a topological space. For example, to capture the holes in the Emmental cheese, we glue together a collection of triangles and edges around the holes, enclosing the same number of holes as in the original cheese. Note that we can only capture the holes that are enclosed inside the cheese (using the triangles), as one can deform the visible holes on the surface into a smooth surface of the cheese. For demonstrative purposes, we therefore assume that the Emmental cheese in Fig. 3 is a cross section of a larger cheese that encloses the holes that are visible in the image.
One can combine simplices to obtain a simplicial complex Σ, and we take the dimension of Σ to be the dimension of its highest-dimensional simplex. We show examples of simplicial complexes in Fig. 4, where we again note that we are assuming that the Emmental cheese is only a cross section of a larger hunk of cheese.

Homology and Betti numbers
Homology assigns a family of vector spaces (called homology groups in more general settings) to a simplicial complex. For a given dimension, the vector spaces capture the topological features in that dimension. For example, for dimension 0, homology gives a vector space whose elements are connected components; for dimension 1, homology gives a vector space that has loops as its elements. The dimensions of these vector spaces are called Betti numbers, where β D denotes the Betti number for dimension D. One can interpret the first three Betti numbers (β 0 , β 1 , and β 2 ), respectively, as representing the number of connected components, the number of 1-dimensional holes (i.e., loops), and the number of 2-dimensional holes (as found in the Emmental cheese) in a simplicial complex.

Weight rank clique filtration (WRCF)
Similarly to being able to distinguish between two types of cheese, we are interested in whether we can use homology (and specifically persistent homology) to distinguish between functional networks of schizophrenia patients, siblings of schizophrenia patients, and healthy controls.
In a network, we take a loop to consist of a sequence of four or more nodes and edges that begins and ends at the same node. If two loops surround the same hole and can be deformed into one another in the space without tearing open one of the loops, then one counts the loops only once, and we construe them to be different representatives (also called generators) of a loop.
To obtain simplicial complexes from a weighted network, we construct a so-called filtration. A filtration is a sequence of embedded simplicial complexes that starts with the empty complex: One can obtain a filtration from data in various ways. When given data in the form of a weighted network, the easiest way is to filter by weights (Lee, Kang, Chung, Kim, & Lee, 2012). In the first filtration step, one includes all nodes and the edge(s) with the largest weight in the simplicial complex. In the second step of the filtration, one adds the edge(s) with the second-largest weight to the simplicial complex from step one, and so on. In this way, one obtains a sequence of embedded simplicial complexes that fulfills the properties of a filtration. To construct a weight rank clique filtration (WRCF) (Petri et al., 2013), one performs one additional step: Whenever three edges in a simplicial complex of a filtration form a triangle, one fills in the associated face, and one construes the triangle as a 2-simplex. Similarly, when four nodes are all connected pairwise by edges, the nodes form a (filled) tetrahedron (i.e., a 3-simplex). We use the WRCF to analyze our weighted networks. The WRCF has been applied to weighted neuronal networks in several previous studies, including Giusti et al. (2015); Petri et al. (2014Petri et al. ( , 2013; Stolz et al. (2017).
We can use homology to study topological features, such as loops, in every step of a filtration and determine the amount of persistence of a feature with respect to the filtration (Otter et al., 2017). We say that a topological feature h in a given dimension is born at filtration step m if the homology group of Σ m is the first homology group of a simplicial complex in the filtration to include the feature. Similarly, we say that a topological feature dies at filtration step n if it is present in the homology group of Σ n−1 but not in the homology group of Σ n . The lifetime of a feature in a filtration is defined as the persistence p. That is, If a feature persists until the last filtration step, we say that it has infinite persistence. Persistence was first used as a measure to rank topological features by their lifetime in a filtration in Edelsbrunner et al. (2002).

Representations of persistent homology
There are multiple ways to represent the output of persistent homology (PH) calculations and to visualize the persistence of topological features and their location within a filtration. The most common representations are barcodes and persistence diagrams. In recent years, a desire to leverage the output of PH computations for machine-learning and data-mining tasks has resulted in the development of alternative representations to both barcode and persistence diagrams (Otter et al., 2017). Two of these alternative representations are persistence landscapes (Bubenik, 2015;Bubenik & Dłotko, 2017) and persistence images (Adams et al., 2017). In the following subsubsections we describe barcodes, persistence diagrams, persistence landscapes and persistence images.

Barcodes
A common representation of the output of PH calculations is a barcode (Ghrist, 2008). See   circles), which we interpret as the nodes (indicated by dots) of a network, and weighted edges between the nodes. To construct the filtration, we add the nodes in step 0, followed by the edge with the largest weight in step 1, the edge with the second-largest weight in step 2, and so on. As soon as three nodes are all connected pairwise by edges, we cover the resulting region with a triangle. When four nodes are all connected pairwise, we fill in a tetrahedron. In a 0-dimensional barcode, each connected component is represented by a bar starting when the component is born and ending when it dies (e.g., when two components combine with each other). In a 1-dimensional barcode, each bar represents a loop, which consists of 4 or more edges and starts and ends at the same node. In persistence diagrams, on represents topological features by points rather than by bars. The distance of a point to the diagonal (the purple line) indicates the persistence of the corresponding feature in the filtration.

Persistence diagrams
As an alternative to a barcode, one can use a persistence diagram (PD) (Cohen-Steiner, Edelsbrunner, & Harer, 2007), which is a planar representation of a barcode that conveys the same information: each [birth, death) interval in the barcode is mapped to birth-death coordinates, where the horizontal coordinate of a point represents the birth time of a feature in the filtration and its vertical coordinate represents the death time of that feature. Alternatively, one can use a birth-persistence coordinate system, which is particularly useful when examining persistence images.
Points that are farther away from the diagonal identity line represent more-persistent topological features in a filtration. We show an example of a persistence diagram in Fig. 5. As with barcodes, one can treat persistence diagrams as mathematical objects, and one can endow the space of persistence diagrams with a distance.

Persistence landscapes
A persistence landscape (PL) (Bubenik, 2015;Bubenik & Dłotko, 2017) is a sequence of piecewise-linear functions that one can use to visualize and analyze the information in a barcode or persistence diagram. Instead of using a bar and its length to represent a feature and its persistence, one now interprets each topological feature as a peak, whose height is determined by the feature's persistence and whose location corresponds to the feature's location in the filtration. In contrast to a barcode or persistence diagram, a PL has three dimensions. As in a barcode, the horizontal axis represents the filtration step. The other two dimensions of a PL are the persistence of a feature and the different layers of the PL.
To create a PL from a barcode, one first defines peak functions for each bar. For a given [birth, death) interval in a barcode, one constructs the function One then collapses the collection of peak functions onto the horizontal axis of the barcode. For a barcode that consists of the collection {[birth, death) l } t l=1 of intervals, the qth layer (with q ≥ 0) of the PL (i.e., the qth PL) is the following set of functions: If the qth-largest value does not exist, λ q (x) = 0. The 0th layer of a PL consists of the maximum function values among the collection of functions evaluated across the filtration. Similarly, the 1st layer of the PL consists of the second-largest values of the collection of functions evaluated across a filtration. Other layers are defined analogously. The persistence landscape λ of a barcode {[birth, death) l } t l=1 is defined as the sequence {λ q } of the functions λ q . We illustrate the pipeline from a barcode to a PL in Fig. 6. An advantage of PLs is that one can construct a mean PL for a set of landscapes. A mean landscape no longer corresponds to a barcode or a persistence diagram. However, one can define pairwise distances between two or more mean landscapes and use them to quantify the difference between two sets of barcodes. We will use the L 2 distance. One can also use a variety of statistical tools on PLs (Bubenik, 2015). Such calculations have been used for applications such as conformational changes in protein binding sites (Kovacev-Nikolic, Bubenik, Nikolic, & Heo, 2016), phase separation in binary metal alloys (Dłotko & Wanner, 2016), classification of music audio signals (J.-Y. Liu, Jeng, & Yang, 2016), and human motor learning (Stolz et al., 2017).

Persistence images
Another representation of topological features in PH calculations are persistence images (PIs), which are based on persistence diagrams and take the form of real-valued vectors 5 that one can use as an input into a variety of machine-learning approaches. The transformation from a PD to a PI is stable with respect to the 1-Wasserstein distance and maintains a clear and interpretable connection to the original PD (Adams et al., 2017).
We show a schematic of the mapping from a PD to a PI in Fig. 7, which depicts the various stages involved in the transformation. Recall that the output of a PH computation is usually a set of points (or intervals) corresponding to the birth and death times of each topological feature for a specified homological dimension. In Fig. 7(b), we show a sample PD 6 for the sample point cloud from Fig. 7(a). In  Fig. 7(d) over a uniformly-spaced grid. (We set the resolution so that we have a 50 × 50 grid of elements.) One can then reshape this final PI into a vector by stacking the columns (or, equivalently, the rows), as is often done in image processing. As described in Adams et al. (2017), the generation of a PI involves the choice of (1) a 2D probability density function to center at each point in the birth-persistence PD, (2) a resolution, and (3) a weighting function. The role of the weighting function is, when necessary, to suppress points in a PD that lie very close to the diagonal and are often construed as "noisy" features. For all PIs in this paper, we use the default settings for the code: 2D Gaussian probability density functions, a linear weighting function, and a 50 × 50 grid of elements. We choose additional parameters that are associated with these choices (e.g., the variance of the Gaussians) according to the defaults in Adams et al. (2016).

Software employed
For our PH calculations, we implement MATLAB code constructed using JAVAPLEX Adams, Tausz, and Vejdemo-Johansson (2014), a software package for PH 7 . For a given filtration of a 5 We use the term 'vectorization' for the production of such vectors. 6 This corresponds to the output after running a WRCF for dimension 1 on the functional network of the first sibling from the first time regime in our data set. 7 For an overview of available PH software and additional references, see Otter et al. (2017). simplicial complex, JAVAPLEX can output [birth, death) barcode intervals, representatives for each topological feature, and PDs. JAVAPLEX outputs PDs in the standard birth-death coordinates, from which one computes birth-persistence coordinates as (birth, death − birth). For the WRCF, we also use a maximal clique-finding algorithm (that is based on the Bron-Kerbosch algorithm (Bron & Kerbosch, 1973)) from the Mathworks library (Wildmann, 2011). For the analysis and interpretation of our barcodes, we use the PERSISTENCE LANDSCAPES TOOLBOX (Bubenik & Dłotko, 2017). We create PIs using the code available at Adams et al. (2016) with the default parameters.

Clustering methods from data mining and network analysis
Given output of PH calculations, one can use clustering methods. There are myriad ways to proceed. In this paper, we use a few different approaches. First, we apply the k-means clustering algorithm and community detection to examine whether we can separate the three subject groups based on the topological features of their functional networks. Second, we apply a linear sparse support vector machine (SSVM) to identify pixels in PIs to discriminate between the subject groups and examine which brain regions are generators of loops that help discriminate between groups. We describe these techniques in the following subsections.
Employing k-means clustering for subject-group separation The method of k-means clustering aims to produce a partition of a metric space into k clusters of points (Gan, Ma, & Wu, 2007). Suppose that there are µ data points in the metric space. One selects k of the µ points as "centers" and assigns all other points of a data set into clusters based on their closest center point. The "score" of such a clustering is the sum of the distances from each point to its nearest center. The desired output of k-means clustering is an assignment of points to clusters with the minimum clustering score. An exhaustive search for a global minimum is often prohibitively expensive. A typical approach to search for a global minimum is to choose a large selection of k initial centers uniformly at random, improve each selection of centers iteratively until the clustering score stabilizes, and then return the identified final clustering with the lowest score for each initialization. One iteratively updates the centers by setting the new center equal to the mean of the points assigned to the center from the current iteration. One can apply k-means on either a distance matrix (which one can calculate for either PDs or PLs) or on a set of input vectors (such as those obtained from a PI).

Community detection for persistence-landscape classification
Community detection is a method from network analysis that attempts to partition a network into sets (called "communities") of nodes that are more densely connected to themselves than to other sets of networks (Fortunato & Hric, 2016;Newman, 2018;Porter, Onnela, & Mucha, 2009). One can detect communities in either weighted or unweighted networks. In a weighted network, one finds larger total edge weight within communities than between them.
One can also use community detection to partition data (e.g., for classification) by studying a given distance matrix of data objects such as (mean) PLs. One interprets the n PLs as n nodes of a network and converts the pairwise distances into edge weights, where a large edge weight signifies closeness in the distance matrix and a small edge weight signifies a long distance between two landscapes. We convert the distance d(i, j) between landscapes i and j into an edge weight A ij between nodes i and j with the following formula: This yields an adjacency matrix A with elements A ij . Naturally, there are many choices for converting from pairwise distances to pairwise weights, and one has to be careful about how that influences community structure and other computations.
There are numerous methods that one can use for community detection in networks (Fortunato & Hric, 2016). One approach for decomposing a network into communities (i.e., for performing a "hard partitioning") is to seek a partition that maximizes an objective function Q. The quality function that we use is modularity where P (with elements P ij ) is a null-model matrix (which specifies the expected edge weight between nodes i and j), the resolution parameter γ is a factor that determines how much weight one gives to the null model, and δ(g i , g j ) = 1 if nodes i and j are in the same community g (i.e., if g i = g j ) and δ(g i , g j ) = 0 otherwise (Fortunato & Hric, 2016;Porter et al., 2009).
For our computations, we use the GENLOUVAIN package (Jutla, Jeub, & Mucha, 2011Mucha, Richardson, Macon, Porter, & Onnela, 2010), which maximizes Q using a variant of the Louvain algorithm (Blondel, Guillaume, Lambiotte, & Lefebvre, 2008) to algorithmically detect communities in our (mean) PLs. We vary the weighting factor γ (which is often called a "resolution parameter") to compare results for different values of γ.
Linear sparse support vector machines for discriminatory feature selection The 1-norm, regularized, linear support vector machine (i.e., SSVM) classifies data by generating a separating hyperplane between data points that depends on very few input-space features (Bradley & Mangasarian, 1998;Zhang & Zhou, 2010;Zhu, Rosset, Hastie, & Tibshirani, 2004). A hyperplane is a flat surface that cuts an ambient space into two parts. One can use an SSVM to identify discriminatory features between different groups of data points. One implements linear SSVM feature selection on data points in the form of vectors, so we can use it on our PIs to select "distinguishing pixels" during classification. In a PI, a distinguishing pixel is a bounded region in the birth-persistence coordinate system. For clarity, we use the term "distinguishing pixel' to signify a region selected by SSVM and a "feature" to refer to a topological feature from a PH computation. During the analysis of our results (see Results from analysis of PIs), we aim to match distinguishing pixels to their corresponding features.
We apply a "one-against-all" (OAA) SSVM to dimension-1 PIs from each subject to identify pixels in PIs that can discriminate between the subject groups. In a one-against-all SSVM, there is one binary SSVM for each class to separate members of that class from members of all other classes. In our case, this amounts to defining three hyperplanes: one that separates patients from controls and siblings, one that separates siblings from patients and controls, and one that separates controls from patients and siblings. We use a 5-fold cross-validated SSVM. One specifies an optimal separating hyperplane by a normal vector. The values of the components of the normal vector are called SSVM weights. We select distinguishing pixels for each classifier by retaining the vector components (which are pixels in this application) with nonzero SSVM weights, ordering the nonzero SSVM weights by decreasing magnitude, and discarding SSVM weights when the ratio of successive SSVM weights drops below a user specified tolerance; for details, see Chepushtanova, Gittins, and Kirby (2014). Given a set of distinguishing pixels, we can see for each subject whether there are any loops in the associated functional network that are born and persist in the corresponding bounded PI region. If there are loops in this region, we can identify a set of brain regions that are representative of that loop in the network. We are thereby able to leverage PIs to obtain (biologically) interpretable information about the involvement of different brain regions in the task (as measured with fMRI) for different subject types.

RESULTS
We now present our results of our PH computations to examine loops in functional brain networks. We focus exclusively on topological features in dimension 1 and, except in Results from Betti curves, we perform our computations on all four times regimes as part of one data set rather than separating the data for each time regime. Aside from the aforementioned exception, we run our PH computations on four functional networks per subject. From the PH output, we create either PLs or PIs. We then perform our computations either on (i) the full data set of PLs or PIs of 281 subjects and four time regimes (which gives 1124 landscapes or images, respectively, for the data set) or on (ii) the 12 subject-group means of the landscapes/images (from three subject groups with four time regimes each). We indicate which case we are examining in the relevant subsections. In Results from Betti curves, we consider one full time series for each subject; in other words, we study one functional network per subject.
For both PLs and PIs, we find that there seem to be differences in the topological features of the functional networks between subject groups, although we only observe these for PLs when examining means across groups. To illustrate limitations of the methods, we also discuss results in which we were unable to find differences between subject groups.

Results of k-means clustering on PLs
Using k-means clustering on mean PLs, we are able to separate siblings of schizophrenia patients from controls and patients. For these calculations, recall that we use all four time regimes in each of the 12 mean landscapes.
We construct mean PLs from the 1D barcodes (i.e., the barcodes that represent loops in the networks) for each time regime and each subject group. We obtain 12 mean landscapes and exclude infinitely-persisting bars, because all of our landscapes include (persistent) infinite features, and these tend to dominate the first several layers of the landscapes. Other researchers have excluded layers of landscapes (e.g., the first twenty) to filter out "topological noise" (Patrangenaru, Bubenik, Paige, & Osborne, 2018). In our case, the presence of infinite features prevented us from discriminating between landscapes based on pairwise distances between them.
We calculate a pairwise L 2 distance matrix of the mean landscapes, and we then perform k-means clustering on the distance matrix (which has 12 × 12 entries). For k = 3, we obtain the expected division of the mean landscapes into patients, controls, and siblings. Although the fact that one can separate the three cohorts based on fMRI data is not a new finding -see We also perform k-means clustering for k = 2. Surprisingly, we find that the patients and controls are grouped in one cluster for all time regimes, whereas the siblings are in a separate cluster for all time regimes. We show the mean landscapes and clusters in Fig. 8. For larger values of k, we do not observe a clear subject-group separation. To compare our results with ones from other clustering methods, we also apply average linkage clustering to the distance matrix and perform community detection on networks that we construct from the distance matrices (as described in Community detection for persistence-landscape classification). We also obtain the same qualitative result for these two methods. For community detection, we observe a clear separation for resolution-parameter values γ = 0.82, 0.83, . . . , 1.14 into two communities (the siblings versus the patients and controls). Our results appear to indicate that the sibling cohort is particularly distinct from the other two cohorts, as compared to any other pairwise comparison among the three cohorts, with respect to their loop topology in the functional networks.
We also perform a permutation test on the mean PLs for each time regime. In the permutation test, we regroup the individual landscapes into three groups uniformly at random, create a new mean landscape for each newly assigned group, and calculate the pairwise L 2 distances between them. We then count how many of the L 2 distances of the new groups are larger than the ones that we observe when using the mean landscapes of the three subject groups. We use 10000 permutations to obtain our results, which we summarize in Table 1. Interestingly, for time regimes 1 and 2, we find significant distances between the patient and sibling mean landscapes, whereas the p-values for time regime 3 and 4 suggest that the distance is not significant (even though the p-values are comparably small). The distance between the mean landscapes of the controls and the siblings appears to be significant for time regime 2, but this does not appear to be the case for the other time regimes, although the p-values are again much smaller than for the distances between the mean landscape of the patients and controls. Thus, for the controls and the patients, there are many other divisions into two groups that lead to more extreme distances between the mean landscapes than what one obtains by simply assigning them to a control group and a patient group.
To see if we can further support our result from k-means clustering for k = 2, we artificially group the controls and patients into one group to create a mean landscape and again perform a permutation test to verify whether the distance between the mean landscapes for the two groups is significant. In Table 2, we show the p-values that we obtain with 10000 permutations. For time regime 2, we obtain a significant distance, but the p-values for time regimes 1, 3, and 4 are about 0.1. Given the artificial grouping of the two subject groups, we construe these values as small, although they are not statistically significant.

Results of community detection on a distance matrix from individual PLs
We construct PLs from each of the 1D barcodes, which we calculate by examining each subject in each of the four time regimes, and we calculate the L 2 distance matrix for the resulting 1124 PLs. We again use the distance matrix to construct a network between the PLs, and we detect communities in this network by maximizing modularity. For γ = 0.92, 0.93, . . . , 1, we obtain a separation into two communities. The partition that is closest to what we observe with 2-means clustering for the mean landscape distance occurs for the resolution-parameter value γ = 0.93. We summarize our results in Table 3.
We also apply k-means clustering and average linkage clustering to the distance matrix from the individual PLs (results not shown). Of all classification methods that we perform on these distance matrices, community detection appears to perform best at "separating" the subject groups, although we do not observe a very clear separation.

Results from analysis of PIs
We find that PIs can identify discriminatory topological features across the three subject groups considered. We generate PIs for each of the subjects for each of the four time regimes for the 1D persistence diagrams. We set the resolution, probability density function, and weighting function to the defaults in the PI code available from Adams et al. (2016).
However, there is an additional pair of values that one must choose from the data that one is analyzing  Figure 9: (a) Mean vectorized PI for each subject group generated using the maximum values of birth and persistence across all subjects to create all PIs. We then take the means over the PIs of each group. (b) Mean vectorized PI for each subject group generated using maximum values of birth and persistence determined by calculating the maximum birth and persistence for each of the three groups separately and using this group-specific information to create the PIs for each subject within its group.
We then take the means over the PIs of each group.
Alternatively, if we use a priori knowledge of subject-group membership and fix the maximum birth values separately for each subject group (based on the collection of PDs that we computed separately for each subject group), we can discriminate between the three subject groups. This provides a first interesting observation from PIs: the maximum birth time which corresponds (or almost corresponds, in exceptional cases in which multiple edges have exactly the same weight) to the number of pairs of regions in the brain with positive functional connectivity, appears to contain nontrivial information.
(Recall that we do not add edges that correspond to negative Pearson correlations.) On the right of Fig. 9, we show the mean vectorized PI for each subject group, where we set the maximum birth and persistence values separately for each subject group (instead of setting the maximum birth and persistence values to be the same for all subjects). Observe that the sibling and control means both have two humps, whereas the patients have one that is clearly discernible. Similarly, in Fig. 10, we observe two patches along the prominent diagonal with high intensity for the sibling and control means; however, in the bottom row, we It is also worth noting the locations of the local maxima for each subject type. Relative to the maximum values across each class, groupings of loops occur at different locations. From the values of the vectors, we see that the controls and patients have more similar maximum magnitudes than do the patients and their siblings. Based on these similarities and differences, we conclude that we are able to accurately separate the populations using PIs. Surprisingly, despite the pronounced difference in PI performance when we use different maximum values for each class, the distributions of the maximum birth times and persistences for each subject type are not statistically-significantly different from each other. In Fig. 11, we show Gaussian fits to the set of maximum birth times and maximum persistences for each subject type. Observe the strong similarity across all classes and the especially close similarity between the control and patient distributions. Because the maximum values are linked closely to the preprocessing of the data, it is important to conduct further research into how to account for these observations. The results that we discuss subsequently are based on the PIs that we generate using a priori membership knowledge. As we discussed in Linear sparse support vector machines for discriminatory feature selection, it is possible to apply a linear SSVM to the set of PIs to identify distinguishing pixels that allow interpretation of our classification results. Using a one-against-all SSVM with 5-fold cross validation, we obtain a 100% classification accuracy. In Fig. 12, we show the distinguishing pixels from each of the three binary classifiers. We obtain the complete accuracy using the set of 41 unique pixels from the total of 625 pixels in the PIs. Again, we emphasize that each of these pixels corresponds to a bounded region in the birth-persistence plane.
Interpreting these distinguishing pixels requires discussing relations with particular regions in the brain. We make these connection as follows: For each subject, it is possible to determine whether or not a topological feature (in our case, a loop) in a filtration of a network exists in the bounded region of the birth-persistence plane that corresponds to a particular distinguishing pixel. If a loop does exist, one can identify a set of brain regions that comprise the loop (i.e., representatives of this loop; see Discussion).
We are particularly interested in brain regions that are consistently involved in the generation of particular loops across subjects. We identify the set of nodes, which we call top node(s) 8 , that are involved in the generation of loop(s) for each distinguishing pixel in each of the four of the time regimes for each subject. We then create histograms of the union of the nodes that we select in this fashion to examine the relative importances of top nodes across each subject type.
In Fig. 13, we present the relative importances of different brain regions for each pixel. In the left panel, we show the top nodes for each subject type based on frequency (proportion of the subject type for which that top node is involved in the generation of a loop in the distinguishing-pixel region). In the right panel, we show the proportion of the subjects for which the top node(s) is (are) present. The vertical gaps in each plot signify that there are no nodes that are consistently involved in loops for that distinguishing pixel. We can make several observations from the left panel of Fig. 13. First, there are only five distinguishing pixels for which we find top nodes for the patients. We are thus unable to predict which brain regions are involved in loops in the functional networks during the given task for schizophrenia patients. By contrast, there are many distinguishing pixels for which we find top nodes for the siblings.
The control group lies between the other two in terms of its number of distinguishing pixels with top nodes, but there are still few top nodes, relative to the number of distinguishing pixels that have top nodes. In Tables 4-7 (see also Figs. A.1-A.3 of Appendix A: ), we indicate which brain regions (as well 8 One can construe our calculation of top nodes in a similar spirit as calculations of node centralities (Newman, 2018).
as their locations) we identify as top nodes. We include only the distinguishing pixels at which top nodes exist within a cohort.
An equivalent way to identify a top node is to calculate the percentage of a given subject class that has a topological feature in the corresponding pixel region (see the bar graph in Fig. 13) and determine if a specific node is in the group of representatives for all of the subjects that have a topological feature in the pixel region. If a node occurs in the list of representatives for a topological feature for every subject of the class with a topological feature in the pixel region, then it is identified as a top node. Thus, when considering Tables 4-7, it is possible to see the same brain regions listed for more than one distinguishing pixel index. This is also reflected in Fig. 13 by the occurrence of multiple markers along the same horizontal line.

Results from Betti curves
Finally, we also study Betti curves, introduced in Giusti et al. (2015), which describe Betti numbers and their changes across a filtration. We use the entire time period (i.e., one time regime, rather than four separate ones) of the experiment. In all other respects, we construct the functional networks as we described previously (see Functional connectivity). We compute the mean and standard deviation across the Betti numbers for dimension 1 (i.e., the number of loops) for each cohort in each filtration step. We find that, apart from a slightly larger standard deviation in the patient cohort, the Betti curves for the three groups look essentially the same. We show our results in Fig. A.4.

DISCUSSION
We applied methods from persistent homology to analyze loops in functional brain networks of schizophrenia patients, siblings of schizophrenia patients, and healthy controls. We constructed both PLs and PIs, and we analyzed them using several clustering techniques.
We observed topological differences in the functional brain networks of schizophrenia patients, siblings of schizophrenia patients, and healthy controls with respect to the loops in their networks. We also found that PLs and PIs have different practical advantages and disadvantages when applied to the same data set, and these insights may useful for interpreting the results of persistent-homology computations in networks in diverse applications.     Computing PLs gave interesting results when comparing mean PLs of the cohorts but not when comparing individual landscapes of the subjects. Using mean PLs, we were able to separate the sibling cohort from the other two subject groups in each of the four time regimes. This is supported by the p-values that we obtained for the distances between the mean landscapes of the sibling cohort versus the controls and patient cohorts, though not all of our p-values are statistically significant. The shape of the mean PLs seems to suggest that loops that occur in the functional brain networks of siblings are on average more persistent than those in the functional networks of controls or patients. This could imply either that loops in the networks of siblings tend to be longer or that the third edge between three nodes has a small edge weight and thus that three brain regions with a large pairwise Pearson correlation between one region and two of the other regions do not necessarily imply that there is a large correlation between the other two brain regions; this facilitates the creation of a loop structure in the filtration.
(Recall that we need at least four nodes for our loops.) To examine this issue further, it may be useful to analyze cross links in the functional networks, as in Bassett, Wymbs, Porter, Mucha, and Grafton (2014).
For the above computations and their interpretation, we need to take into account that we did not include infinitely-persisting loops (which persist until the end of a filtration). We also only include positive edge weights in our networks, so we only analyzed loops that arise from brain regions with positive pairwise Pearson correlations.
Although we were able to obtain interesting insights about the data using mean PLs, we did not find interpretable results from comparing individual landscapes, and only being able to use the mean landscapes reduces the amount of information the we can obtain from this approach. By contrast, using individual PIs and SSVMs allowed us to separate the entire set of subjects (with 100% accuracy) in each of the four time regimes. In previous work, Anderson and Cohen (2013) obtained 65% accuracy for schizophrenia classification by applying machine-learning techniques to functional brain networks. A study on Alzheimer's disease using PLs (Bubenik, 2016) and machine learning attained a 73% separation of diseased and healthy subjects. It is important to note, however, our results are based on using a priori knowledge of group membership, including specifically the maximum birth times of loops within subject groups. These birth times seem to include nontrivial information, which is important to pursue further in future studies. This a priori knowledge is tied closely to the choice of statistical thresholding when preprocessing fMRI data. Developing a statistical model that can classify a novel subject based on a PI representation thus also requires further exploration into how to choose such a threshold.
Computing PIs also allowed us to identify brain regions with consistent involvement in loops in the functional networks within subject cohorts. Of the three cohorts, we found that siblings have the highest level of consistent brain-region involvement in the performance of the mental task in this study across the four time regimes. That is, regions that are involved in loops for siblings in one of the time regimes are more likely to also be involved in loops in other time regimes than is the case for patients or controls. It is particularly noteworthy that the number of brain regions that are consistently involved in the separation of the three cohorts is larger in the siblings of schizophrenia patients than in the healthy controls. We view variable involvement of brain regions in loops as a notion of neurological 'flexibility'. Various works have studied concepts of brain flexibility using community structure Braun et al., 2016). In those studies, flexibility was defined differently -based on how often a brain region changes its allegiance to a community of nodes over time, so it does not use loops directly -but it is noteworthy that Braun et al. (2016) observed that relatives of schizophrenia patients have large flexibility than healthy controls. In our work, we found that a specific group of brain regions leads to the separation of the three subject groups when using PIs and observed for the schizophrenia patients that the regions that lead to a separation consistently in each of the four time regimes are fewer in number than for the siblings and controls. Braun et al. (2016) reported that there is larger node flexibility in network organization of schizophrenia patients than in healthy controls. Additionally, Siebenhühner, Weiss, Coppola, Weinberger, and Bassett (2013) observed a greater variability in temporal networks constructed from Magnetoencephalography (MEG) data of schizophrenia patients than those from in healthy controls.
We did not observe any differences between the four time regimes, which each consist of responses to both a 0-back task and a 2-back task, in any of our calculations. No significant changes seem to be occurring in the persistence or appearance of loops in the networks over the course of the data measurement. Additionally, when studying experiments as a single regime using Betti curves, we did not observe a clear difference between the cohorts.
Schizophrenia has a high genetic determinism, so siblings of schizophrenia patients have a significant genetic risk of developing the disease themselves (Bertolino et al., 2009), and it has been demonstrated that they have abnormalities in their structural neuronal networks (Collin, Kahn, de Reus, Cahn, & van den Heuvel, 2014). Although our results that functional brain networks constructed from fMRI measurements of siblings differ both from patients and from healthy controls do not agree completely with the current standard in the literature, other studies have also reported that the features of fMRIs of siblings of schizophrenia patients differ from both schizophrenia patients and controls. For example, Callicott et al. (2003) observed in an fMRI study that there was no difference in task performance between healthy siblings of schizophrenia patients and healthy controls, yet they detected a physiological similarity between the sibling cohort and the schizophrenia patients in the corresponding fMRI data.
Similarly, Sepede et al. (2010) observed using fMRI data from a different data set that healthy siblings of schizophrenia patients exhibit differences in brain function to schizophrenia patients, although they did not differ significantly in task performance.
It was demonstrated recently that schizophrenia patients undergo a cortical normalization process over the course of the disease (Guo, Palaniyappan, Liddle, & Feng, 2016), and a current study on blood samples of schizophrenia patients (Scolamiero, 2016) has also observed that the measurements for patients who have had the disease for a long time are more similar to the measurements of healthy controls than to those of early stage patients. We would need further phenotypic information to assess whether any of the aforementioned studies can be connected more directly to our observations.
To give another cautionary note, one needs to take into account that there are difficulties when interpreting the information about node participation in loops from computations of PH, as the software used for such computations (including, specifically, JAVAPLEX, which is what we used) only finds representatives of the loops. These representatives are not determined optimally, and they need not be 'geometrically nice' (Adams & Tausz, 2015). For example, in these calculations, one often encounters double loops or even triple loops as generators for one loop in a functional network. Selecting a basis of homology generators that behaves in a biologically representative way corresponds mathematically to solving a problem known as the "optimal homology-basis problem", which is not trivial and can be NP-hard (Erickson, 2012).
Despite these difficulties, our list of discriminating nodes provides a useful starting point for further investigations into neuronal abnormalities in functional networks of schizophrenia patients.
Another important issue is that we preprocessed the data for our study. This is very common when working with fMRI data, but such steps are not uncontroversial, and studies on functional connectivity in schizophrenia patients have found contradictory results depending on whether or not one performed global signal correction (Fornito & Bullmore, 2015;Fox, Zhang, Snyder, & Raichle, 2009). It is also relevant to keep in mind that the choice of functional connectivity measure can influence results (Smith et al., 2011). We chose to use a Pearson correlation due to its simplicity and the fact that it is a widely used measure of functional connectivity (Bassett, Nelson, Mueller, Camchong, & Lim, 2012;Wang, Metzak, Honer, & Woodward, 2010). Many other choices are also available.

A: APPENDIX
We now give some additional details on a few calculations and results.

Top brain regions in the distinguishing pixel birth-persistence bounds
In this section, we illustrate the top nodes (i.e., top brain regions) in the distinguishing pixel birth-persistence bounds for the three cohorts. Recall that each pixel in the birth-persistence plane corresponds to a bounded region of the original PD (i.e., in the birth-death plane). We show results for siblings in Fig

Supplementary Tables
In Tables A.1-A.5, we give the numbering of the brain regions and their corresponding IDs.

Betti curves
In Fig. A.4, we show the results of computing Betti curves.