scoreInvHap: Inversion genotyping for genome-wide association studies

Polymorphic inversions contribute to adaptation and phenotypic variation. However, large multi-centric association studies of inversions remain challenging. We present scoreInvHap, a method to genotype inversions from SNP data for genome-wide association studies (GWASs), overcoming important limitations of current methods and outperforming them in accuracy and applicability. scoreInvHap calls individual inversion-genotypes from a similarity score to the SNPs of experimentally validated references. It can be used on different sources of SNP data, including those with low SNP coverage such as exome sequencing, and is easily adaptable to genotype new inversions, either in humans or in other species. We present 20 human inversions that can be reliably and easily genotyped with scoreInvHap to discover their role in complex human traits, and illustrate a first genome-wide association study of experimentally-validated human inversions. scoreInvHap is implemented in R and it is freely available from Bioconductor.

Introduction
Frequent polymorphic inversions contribute to adaptation and phenotypic variation [1,2]. However, their global contribution to complex traits remains unknown because there is no specific high-throughput technology to genotype inversions in large cohorts. Previous methods have successfully used SNP data to detect the presence of polymorphic inversions by linkage differences at the breakpoints [3][4][5] as well as to infer inversion genotypes from the mapping of inversion status to haplotype groups, when the breakpoints are known [6][7][8]. While inversion calling can be performed by the congruence of different SNP signals [8], only a limited number of experimentally-validated inversion genotypes have been available for assessing reliable inferences in large cohorts. As such, large association studies that infer inversion genotypes from SNP data have been limited to three human inversions [9][10][11].
Those inversions that have been successfully genotyped at large scale are either tagged by SNPs (e.g. inv-17q21.31) or their genotypes fully explain the clustering of individuals in the first principal components (PCs) of the SNPs within their breakpoints, such as inv-8p23.1 [6] or inv-16p11.2 [10]. In the latter cases, the subject clusters correspond to different haplotype-genotypes (e.g. A/A, A/B, B/B) of divergent haplotype groups (A and B), supported by the suppression of recombination between inversion states (inverted: I, non-inverted: N) [7]. A few individuals can then be selected for costly experimental genotyping, with methods like FISH, to help label the clusters according to the inversion-genotypes (I/I = A/A, I/N = A/B, N/N = B/B). The genotypes of the rest of the subjects are then inferred by haplotype-genotype cluster membership [6]. This unsupervised inference, with subsequent experimental labeling of the clusters, has allowed the genotyping of inversions in large cohorts [6][7][8]. However, this approach is still very limited because individual inferences are based on the analysis of entire population samples, making them computationally inefficient [9] and forcing the reanalysis of the whole dataset when new individuals are included. In addition, it has been observed that some inversions exhibit multiple clusters that exceed the three inversion genotypes and therefore their labeling is unclear [10,12]. Current methods do not address the needs of meta-analyses of inversion association studies, which include efficient and reliable genotyping in large population samples and inversion-genotype harmonization across different sources of SNP data.
To tackle these problems, we developed scoreInvHap, a novel inversion-genotype classifier that enables the inclusion of inversions in regular GWASs. scoreInvHap compares how similar the SNPs of a new individual are to those in reference haplotype-genotypes, previously linked to reported experimental inversion-genotypes. Our current implementation enables the efficient and reliable genotyping of 20 human inversions in large cohorts. We studied the performance of the method on the inversion calling of inv-8p23.1 and inv-17q21.31 against two other methods (invClust and PFIDO) in a wide range of data types: whole genome sequencing, four SNP microarray studies and two exome datasets. We also evaluated the performance of scoreInvHap in inv-7p11.2 and inv-Xq13.2 [13], showing that scoreInvHap can confidently call inversions with multiple haplotypes. We illustrate how scoreInvHap can be used to replicate previous associations of inv-8p23.1 and inv-17q21.31 with autism and schizophrenia, and to perform a genome-wide association study of 15 inversions on breast cancer.

scoreInvHap for 20 human inversions
scoreInvHap can generate reliable and scalable inferences for 20 human inversions, whose experimentally validated inversion-genotypes are highly concordant with the haplotype-genotypes of the European individuals of the 1000 Genomes Project [14] (S1 Text, S1 Dataset). These inversions can be genotyped with scoreInvHap in any GWAS of European individuals, showing high prediction accuracy on experimental genotypes not used in the classifier (Table 1, S1 Table). Six of these inversions cannot be called with previous methods as they support more than two inversion-haplotypes, revealed by the presence of more than three clusters in the first PCs of the SNPs within the inverted region. We demonstrated that haplotype-inversion labeling for these inversions is recovered at higher PC dimensions, where subject clusters are reliably mapped to numerous haplotype-genotype groups (Fig 1). Using a coalescent simulator for inversions [15], we observed that the existence of more than two haplotype groups is a common feature (S1 Fig).

scoreInvHap against current methods
We compared the inversion-calling performance of scoreInvHap against invClust [8] and PFIDO [6]. First, we assessed the methods' accuracy in predicting experimental genotypes in the European subjects of the 1000 Genomes Project. We found that invClust and PFIDO had low accuracy for inversions with more than 2 haplotypes (Table 1) or when the first MDS component did not completely capture the inversion genotype groups (S1 Text). Second, we tested how sample size affected the calling accuracy of the methods for inversions inv-8p23.1 and inv-17q21.31. We resampled varying numbers of individuals from the 1000 Genomes Project and observed that scoreInvHap had high accuracy even when using only one reference individual per haplotype-genotype, whereas invClust and PFIDO required at least 20 and 30 subjects for accurate classification (S2 Fig). We also ran the three methods on the African individuals of the 1000 Genomes Project for inv-8p23.1 and observed that scoreInvHap and invClust had lower accuracy (85%), while PFIDO was unable to return a classification. These results can be explained by a low concordance between haplotype-genotypes and experimental inversion-genotypes. Nonetheless, we could not completely rule out experimental error that penalized the methods' accuracy (S3 Fig).
Third, we compared the three methods on the inversion calling of inv-8p23.1 and inv-17q21.31 (Table 2, S1 Table) among studies with different sources of SNP data: four SNP microarray studies and two exome sequencing datasets (S2 Table). The four SNP microarray studies came from trio studies, so we could evaluate the transmission errors. Although the three methods returned similar inversion frequencies (S4 and S5 Figs), we observed that scoreInvHap and invClust had very low transmission errors while PFIDO underperformed (Table 2). We did not find any substantial differences among the accuracies of the methods in imputed data (S6 and S7 Figs, Table 2). Inversion calling in the UK10K exome data allowed us to demonstrate the suitability of the method under low SNP coverage. We observed that scoreInvHap returned inversion frequencies consistent with those observed for the Europeans of the 1000 Genomes Project, while PFIDO's frequencies were significantly different and invClust failed to identify the inv-8p23.1 genotype clusters (Fig 2, S8 Fig).
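Transmission errors in trio data can be counted with a simple Mendelian-consistency check. The sketch below is illustrative only: it assumes autosomal inversion genotypes coded as the number of inverted alleles (0 = N/N, 1 = N/I, 2 = I/I), and the function names are hypothetical rather than part of any of the three packages.

```python
def possible_gametes(genotype):
    """Alleles (0 = N, 1 = I) that a parent with this genotype can transmit."""
    return {0: {0}, 1: {0, 1}, 2: {1}}[genotype]


def is_mendelian_consistent(father, mother, child):
    """True if the child's genotype can arise from one allele per parent."""
    return any(
        pat + mat == child
        for pat in possible_gametes(father)
        for mat in possible_gametes(mother)
    )


def transmission_error_rate(trios):
    """Fraction of (father, mother, child) trios violating Mendelian inheritance."""
    errors = sum(not is_mendelian_consistent(f, m, c) for f, m, c in trios)
    return errors / len(trios)


# Toy calls: the last trio (N/N x N/N -> N/I) is a transmission error.
trios = [(0, 1, 1), (1, 1, 2), (2, 0, 1), (0, 0, 1)]
print(transmission_error_rate(trios))
```

An X-linked inversion such as inv-Xq13.2 would need a sex-aware version of this check, which is not shown here.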
Finally, we compared the runtime of the three methods on one of the trio datasets (SSC 1Mv3) and found that the parallel version of scoreInvHap was the fastest method (S3 Table).

scoreInvHap for inversions with multiple haplotypes
We then demonstrated that the method is efficient in calling genotypes of inversions with multiple haplotype groups. We specifically studied the performance of scoreInvHap to call inversion genotypes of inv-7p11.2 and inv-Xq13.2, the largest inversions with multiple haplotype-genotypes (Table 1, S1 Table). We observed that scoreInvHap classification matched true inversion-genotypes under low SNP densities (10% of original SNP coverage), for both inversions (S9 Fig). Testing the performance in SNP array data, we observed consistent inversion frequencies with those found for the Europeans of the 1000 Genomes project (S4 Table, S10 and S11 Figs) and found low transmission errors (S5 Table).
[Fig 1 caption, partially recovered: "... group for the standard allele shows two possible haplotype subpopulations (A and C). (C-D) If a fourth haplotype group (D) is supported by the inversion, the clustering pattern on the first three MDS components should reveal a tetrahedron pattern where the inverted allele can be mapped to either one (1+3), two (2+2) or three (3+1) haplotype groups." https://doi.org/10.1371/journal.pgen.1008203.g001. Running header: New method to genotype inversions]

scoreInvHap in association studies
We applied scoreInvHap to validate initial association analyses of inv-17q21.31 and inv-8p23.1 with autism (cases/controls = 604/5,529) and schizophrenia (cases/controls = 1,308/5,528), using the exome data of UK10K studies [16]. Note that scoreInvHap is the only method that allows testing associations with inv-8p23.1 since inversion calling from such low-coverage data could not be performed with other methods (Fig 2). We tested the associations under three inheritance models, adjusting for genome-wide PCs (Fig 3A). We replicated a significant association between schizophrenia and inv-8p23.1 (additive OR = 0.91, P = 4.9×10⁻²) and inv-17q21.31 (additive OR = 0.84, P = 1.4×10⁻³). However, we did not replicate the association with autism (Fig 3A), where we could not rule out remaining differences in genetic ancestry between the studies nor the lack of power for a study with 604 cases and 4,358 controls to detect OR ≈ 1.12 (power = 0.466), as computed with the Genetic Association Study (GAS) Power Calculator [17]. Finally, to illustrate a first genome-wide association study of experimentally-validated human inversions, we tested the association between breast cancer and 15 inversions of Table 1 that could be reliably called in a GWAS of 1,061 cases and 1,033 controls [18,19]. We did not detect any significant association after adjusting for genome-wide PCs, age and multiple comparisons (Fig 3B). However, we did observe associations at a nominal significance level for inversions at 7p11.2 (additive OR = 1.14, P = 4.2×10⁻²), 6p21.33 (recessive OR = 1.36, P = 1.8×10⁻²) and 6q23.1 (recessive OR = 4.30, P = 3.5×10⁻²), which should be further investigated in larger association studies. These applications demonstrate that scoreInvHap is a robust genotyping tool of inversions, easy to use on already available GWAS data.
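To make the inheritance models concrete, the following sketch recodes inversion genotypes and computes a crude, unadjusted odds ratio from a 2×2 table. It is a simplified illustration, not the adjusted analysis reported above: the text names additive and recessive models, while the dominant model, the coding relative to the inverted allele, the function names and the toy data are all assumptions.

```python
def encode(genotype, model):
    """Recode an inversion genotype given as the count of inverted alleles (0, 1, 2)."""
    if model == "additive":
        return genotype
    if model == "dominant":
        return int(genotype >= 1)
    if model == "recessive":
        return int(genotype == 2)
    raise ValueError(f"unknown model: {model}")


def crude_odds_ratio(case_genotypes, control_genotypes, model):
    """Unadjusted odds ratio for a binary (dominant or recessive) coding."""
    if model not in ("dominant", "recessive"):
        raise ValueError("crude OR is defined here for binary codings only")
    a = sum(encode(g, model) for g in case_genotypes)     # exposed cases
    b = len(case_genotypes) - a                            # unexposed cases
    c = sum(encode(g, model) for g in control_genotypes)  # exposed controls
    d = len(control_genotypes) - c                         # unexposed controls
    return (a * d) / (b * c)  # undefined if b or c is zero


# Toy genotypes, for illustration only.
cases = [2, 2, 1, 0, 2, 1, 0, 0]
controls = [2, 1, 1, 0, 0, 0, 1, 0]
print(crude_odds_ratio(cases, controls, "recessive"))
```

An additive odds ratio, and any adjustment for principal components or age, would require a logistic regression model rather than a 2×2 table.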

Discussion
We developed scoreInvHap, a new bioinformatics tool to call inversions from SNP data. Its main advantage is the quick call of inversion genotypes from SNP data at the individual level with consistent genotype labeling. As a consequence, inversion genotyping is readily harmonized. Another important advantage is that the method allows the calling of inversion-genotypes using different sets of SNPs. As a result, inversions can be called on datasets with lower SNP coverage than the dataset used for the references, as well as on individuals with missing SNP genotypes.
Previous bioinformatics methods relied on applying a dimensionality reduction technique to SNP data followed by clustering the individuals. Although these methods have been used to associate chromosomal inversions with phenotypic traits [20], they have some limitations. First, these methods partition a population sample into inversion-genotypes but require external information for labeling the inverted-homozygous group, which hampers the harmonization of inversion calling in multi-centric studies. Second, the methods are computationally intensive and are inefficient for calling inversion genotypes in large cohorts [9]. Finally, they require a minimum number of individuals to compute accurate calls, so the whole dataset needs to be recalled to include inferences in new individuals. In contrast, the link between haplotypes and inversion status is established before scoreInvHap classification. Consequently, scoreInvHap classification is readily comparable across different studies and genotyping techniques (from SNP array to exome data), allowing the harmonization of inversion calling in large meta-analyses. As the method classifies each individual separately, further gains in computational efficiency can be obtained by processing large datasets in batches, allowing the genotyping of multiple inversions to be included in association studies.
scoreInvHap is the only method designed to genotype inversions with multiple haplotypes, whose abundance in the human genome is likely underestimated. We found inversions with multiple haplotypes on simulations under neutrality and in the inversions reported in invFEST and 1000 Genomes. This result suggests that the less common presence of only two haplotypes in inversions inv-8p23.1 and inv-17q21.31 could be due to the reported selection process that occurred in these regions [6,11,21]. Inversions supporting three or four haplotypes have already been described in the literature [10,12]. For inversions inv-7p11.2 and inv-Xq13.2, Aguado and colleagues generated inversion-haplotype trees [13]. Based on the major branches of these trees, they observed that both inversions support four possible haplotype groups, where inv-7p11.2 supports two standard and two inverted haplotypes and inv-Xq13.2 supports three standard and one inverted haplotypes. The tetrahedron structure that we observed for the first three MDS components of these inversions clearly matched the phylogeny of the haplotypes. Sanders and colleagues described more than 100 polymorphic inversions based on a single cell sequencing method [22]. Most of these inversions have not been previously detected with bioinformatics methods designed for inversions with two haplotypes. Therefore, an assessment of the haplotype complexity of these inversions for inference in large association studies is warranted. Further research is also needed for establishing the frequency of complex haplotype patterns in inversions and for elucidating the mechanisms involved in the formation of divergent haplotype groups, supported by the presence of an inversion polymorphism.
scoreInvHap, nonetheless, also has limitations. In particular, its performance depends on the representativeness of the reference inversion-genotypes. For inversions inv-8p23.1 and inv-17q21.31 and European samples, we captured the haplotypic variability of the inversions using only one reference per inversion genotype. However, scoreInvHap needs a larger number of experimental references for inversion calling in population samples with higher within-haplotype variability, such as inv-8p23.1 in Africans. Further studies are needed to determine the accuracy of the method in inversions with larger genetic variability or populations with admixture.
The inversion genotyping by scoreInvHap, like other SNP based methods, is indirect: it does not detect the change of DNA orientation but relies on the haplotype structures generated by inversions. Therefore, it has some clear limitations against experimental methods to detect inversions, such as iPCR [13], next generation sequencing or single strand sequencing [22]. In particular, scoreInvHap cannot detect small, recent or de novo inversions, as these inversions do not support different haplotype groups. In addition, scoreInvHap will produce wrong classifications for recurrent inversions, where the same haplotype can be found in standard and inverted chromosomes. Despite these limitations, scoreInvHap has the advantage of working with stringent conditions of SNP coverage and sample sizes.
All inversions in Table 1 can be genotyped with scoreInvHap using SNP data in common formats, like PLINK, snpMatrix or VCF. New inversions can be genotyped in large studies, in humans and other species, by creating their classifiers within scoreInvHap. To build a new classifier, the first step is to demonstrate that a reference sample of individuals can be clustered into haplotype-genotypes. The second step is to show that haplotype-genotypes are unambiguously labeled by experimental inversion-genotypes. Finally, the reference haplotype-genotypes can be included in the program for genotyping the inversion in new individuals.
We showed how scoreInvHap inferences can be used to perform association studies, but additional analyses are needed to understand how the inversion affects the phenotype. One option is that the positional change caused by the inversion affects the regulation of nearby genes, leading to phenotypic differences between individuals. Another option is that the inversion captures the allele (or combination of alleles) that is the causal variant. Structural variants, such as deletions, copy number alterations or complex re-arrangements, can also be captured by the inversion status and produce the phenotype. Only further analyses can elucidate the mechanism linking an inversion to a phenotype.
In summary, scoreInvHap can reliably perform inversion calling for large multi-centric studies with SNP genotype data. The method has been implemented for the calling of 20 human inversions, which can be immediately included in any GWAS to advance our understanding of the role of inversions in complex traits. The method is easily extended to other inversions, in humans and other species, as soon as experimental inversion genotypes become available.

Inversion-haplotype mapping
Inversions suppress recombination within the inverted segment when heterozygous. Therefore, for an ancient non-recurrent inversion, two divergent haplotype groups emerge, one for each inversion status [7]. Haplotype groups that map to a single inversion status are defined as inversion-haplotypes. In this model, standard and inverted homozygotes can be considered as subpopulations where chromosomes belong to the same haplotype group, while individuals that are heterozygous for the inversion carry a 1:1 mixture of standard and inverted chromosomes. This mixture can be seen in the first multidimensional scaling (MDS) components of the SNPs within the inverted region. In the simplest cases (i.e. inv-17q21.31 and inv-8p23.1), two clear haplotype groups (A and B) emerge, one for each inversion status (N and I), resulting in three differentiable clusters, or haplotype-genotypes, on the first MDS component (Fig 1A). Heterozygous haplotype-genotype individuals (AB) are visualized equidistant to the homozygous haplotype-genotype groups (AA/BB). Therefore, a one-to-one map, given by experimental validations, between inversion status and haplotype groups can be established (A = N, B = I). However, in other inversions, more than two haplotype groups have been observed. The inversion at 16p11.2 shows, for instance, a pattern consistent with two haplotype groups (A, C) in the standard configuration and one haplotype group in the inverted allele (B) [10]. In the first MDS components of the SNPs in the region, one can see that heterozygous haplotype-genotype clusters (e.g. AC) are equidistant to their respective homozygous haplotype-genotype clusters (AA and CC), forming a triangular 6-cluster pattern (Fig 1B).
Experimental validation is therefore needed to correctly assign an inversion status to each haplotype group (A, C = N, B = I). More complex scenarios are also possible, where four haplotype groups are observed in the region, supporting ten clusters in the first three MDS components, consistent with all possible haplotype-genotypes. Experimental inversion-genotypes are then needed to identify the inversion status to which the haplotype-genotypes map.
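The cluster counts mentioned above (three, six and ten) follow from enumerating unordered pairs of haplotype groups. The sketch below illustrates this and the mapping from haplotype-genotypes to inversion-genotypes; the function names are hypothetical, and the status map uses the three-group pattern described for 16p11.2 purely as an example.

```python
from itertools import combinations_with_replacement


def haplotype_genotypes(groups):
    """All unordered pairs of haplotype groups, e.g. AA, AB, BB."""
    return ["".join(pair) for pair in combinations_with_replacement(groups, 2)]


def inversion_genotype(haplotype_genotype, status):
    """Map a haplotype-genotype to an inversion-genotype, e.g. AC -> N/N."""
    return "/".join(sorted(status[h] for h in haplotype_genotype))


# Example map: A and C carry the standard orientation (N), B the inverted one (I).
status = {"A": "N", "B": "I", "C": "N"}
for hg in haplotype_genotypes("ABC"):
    print(hg, "->", inversion_genotype(hg, status))
```

With h haplotype groups the number of haplotype-genotypes is h(h+1)/2, which gives 3, 6 and 10 for two, three and four groups.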

Selection of reference haplotype-genotypes
We studied 59 inversions reported in the 503 European individuals of the 1000 Genomes Project. For each inversion, we performed an MDS analysis of all SNPs within the inverted region and studied whether the clustering conformed to a model where haplotype-genotypes could be unambiguously defined. We therefore selected the inversions that followed any of the patterns illustrated in Fig 1, increasing the number of MDS components, from 1 to 3, until one of the patterns was clearly identified. This heuristic procedure is described in S1 Text and can be used as a guideline to extend scoreInvHap to new inversions. Each cluster of individuals was then identified as a reference group for a given haplotype-genotype, to which new individuals are compared for inferring their own haplotype-genotypes. Consequently, at least one reference individual is needed for each haplotype-genotype. The haplotype-genotypes were then mapped to experimental inversion-genotypes to determine their inversion status. At this stage, we measured the degree of concordance between the haplotype and inversion genotypes by their percentage of agreement across individuals, accounting for the cases where more than one haplotype group was found in a single inversion status.
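The concordance measure described above can be illustrated as a percentage of agreement. This is a minimal sketch under stated assumptions: the haplotype-to-status map, the labels and the toy data are hypothetical, and several haplotype groups are allowed to share one inversion status.

```python
def concordance(haplotype_genotypes, experimental, status):
    """Percentage of individuals whose haplotype-genotype implies the same
    inversion-genotype as the experimental call."""
    agree = 0
    for hg, exp in zip(haplotype_genotypes, experimental):
        inferred = "/".join(sorted(status[h] for h in hg))
        agree += inferred == exp
    return 100.0 * agree / len(experimental)


# Two haplotype groups (A, C) map to the standard status N; B maps to I.
status = {"A": "N", "B": "I", "C": "N"}
hap = ["AA", "AC", "AB", "BB", "BC"]
exp = ["N/N", "N/N", "I/N", "I/I", "N/N"]  # the last experimental call disagrees
print(concordance(hap, exp, status))
```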

Algorithm description
We developed scoreInvHap for inversions that could be consistently mapped to haplotypes. Therefore, scoreInvHap is suitable for those inversions for which the clustering pattern presents no haplotype sharing between the inverted and standard status, and where individuals can be reliably classified into haplotype-genotype groups. We considered that both conditions were fulfilled when clusters followed at least one of the inversion-haplotype mappings in Fig 1. scoreInvHap computes a similarity score between a subject's SNP genotypes in the inverted region and the haplotype-genotypes that map to experimentally validated inversion-genotypes. Note that the mapping is at the level of SNP and haplotype genotypes and not on individual chromosomes. As such, no phasing is needed for the inferences.
scoreInvHap then classifies a new individual into the reference haplotype-genotypes for which their link to inversion-genotypes has been established. The classification is based on similarity scores between the SNP genotypes of the individual and the SNPs in each haplotype-genotype reference group. To compute the score, we first build the classifier from the frequency of each SNP $i$ in each reference haplotype-genotype $k$ made of $M_k$ reference individuals,
$$f_{ki}(x) = \frac{n_k(x_i)}{M_k},$$
where $f_{ki}$ is the frequency of the $i$-th SNP genotype $x = \{0,1,2\}$ in the haplotype-genotype reference group $k$. The frequency is the ratio between the number of reference individuals in $k$ with SNP genotype $x_i$, $n_k(x_i)$, and $M_k$. The score $H_k$ of a subject $S$, with $L$ ($L \le N$) SNP genotypes in the inverted segment $(s_1, \ldots, s_L)$, $s = \{0,1,2\}$, in the haplotype-genotype reference group $k$ is defined as [equation not recoverable from the source], where $\rho_i^2$ is the maximum linkage disequilibrium between the SNP $i$ and the haplotype groups in the reference individuals. For inversions with two haplotypes, $\rho_i^2$ corresponds to the linkage disequilibrium $R^2$ between SNP $i$ and the inversion-genotypes. For inversions with three haplotypes (A, B and C), we compute the $R^2$ between SNP $i$ and each haplotype-genotype. For instance, for haplotype A the three haplotype-genotypes are given by RR: {BB, BC, CC}, RH: {AB, AC} and HH: {AA}. We use these three haplotype-genotypes to compute the $R^2$ between the haplotype group A and SNP $i$ in the reference individuals. $\rho_i^2$ is then the highest $R^2$ across A, B and C. The inferred haplotype-genotype of the individual $S$ is, therefore, the genotype for which the score is maximum, that is $\arg\max\{H_1, \ldots, H_J\}$, where $J$ is the total number of haplotype-genotypes: 3 haplotype-genotypes for 2 haplotype groups, 6 for 3 groups, 10 for 4, and so on (Fig 1). The inversion-genotype for the individual follows from the link between haplotype-genotypes and experimental inversion-genotypes in the reference individuals.
For imputed data, the score is computed as [equation not recoverable from the source], where $P_i(t)$ is the probability that the individual $S$ has genotype $t$.
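The classification step can be sketched in a few lines. The frequency table follows the definition in the text, but the form of the score is an assumption made for illustration: a ρ²-weighted mean of the reference frequencies of the subject's observed genotypes, restricted to the SNPs the subject has. The function names, toy reference genotypes and ρ² values are hypothetical, and the normalization used by the package may differ.

```python
from collections import Counter


def build_classifier(reference):
    """reference: {haplotype_genotype: [genotype vectors of its M_k individuals]}.
    Returns f[k][i][x] = (number of individuals in k with genotype x at SNP i) / M_k."""
    freqs = {}
    for k, individuals in reference.items():
        m_k = len(individuals)
        n_snps = len(individuals[0])
        freqs[k] = []
        for i in range(n_snps):
            counts = Counter(ind[i] for ind in individuals)
            freqs[k].append({x: counts[x] / m_k for x in (0, 1, 2)})
    return freqs


def scores(subject, freqs, rho2):
    """ASSUMED score: rho^2-weighted mean of f_ki(s_i) over non-missing SNPs.
    subject: list of genotypes (0, 1, 2), with None for a missing call."""
    typed = [i for i, s in enumerate(subject) if s is not None]
    total = sum(rho2[i] for i in typed)
    return {
        k: sum(rho2[i] * f[i][subject[i]] for i in typed) / total
        for k, f in freqs.items()
    }


def classify(subject, freqs, rho2):
    """Haplotype-genotype with the maximum score."""
    s = scores(subject, freqs, rho2)
    return max(s, key=s.get)


# Toy reference with two individuals per haplotype-genotype and three SNPs.
reference = {
    "AA": [[0, 0, 0], [0, 0, 1]],
    "AB": [[1, 1, 1], [1, 1, 0]],
    "BB": [[2, 2, 2], [2, 2, 2]],
}
rho2 = [0.9, 0.8, 0.2]  # made-up weights
freqs = build_classifier(reference)
print(classify([1, None, 1], freqs, rho2))
```

Because each subject is scored independently against fixed reference frequencies, missing SNPs simply drop out of the sum, and no phasing or population-level clustering is involved.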
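For imputed genotypes, one natural reading of the description is that the observed genotype is replaced by its expectation under the posterior probabilities. The sketch below assumes that form, a ρ²-weighted mean of the sum over t of P_i(t) f_ki(t); this is an illustrative assumption, and the names and toy values are hypothetical.

```python
def imputed_scores(probs, freqs, rho2):
    """probs[i][t] = P_i(t), the probability of genotype t at SNP i.
    ASSUMED score: rho^2-weighted mean of the expected reference frequency."""
    total = sum(rho2)
    return {
        k: sum(
            rho2[i] * sum(probs[i][t] * f[i][t] for t in (0, 1, 2))
            for i in range(len(probs))
        ) / total
        for k, f in freqs.items()
    }


# Toy reference frequencies f[k][i][t] for two SNPs (illustrative values only).
freqs = {
    "AA": [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]],
    "AB": [[0.0, 1.0, 0.0], [0.1, 0.8, 0.1]],
    "BB": [[0.0, 0.0, 1.0], [0.0, 0.2, 0.8]],
}
rho2 = [1.0, 0.5]  # made-up weights
probs = [[0.05, 0.90, 0.05], [0.10, 0.70, 0.20]]
result = imputed_scores(probs, freqs, rho2)
print(max(result, key=result.get))
```

When every P_i(t) is 0 or 1, this expression reduces to the frequency of the single called genotype, so hard calls are a special case.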

Implementation
We implemented scoreInvHap in an R package that supports snpMatrix or VCF formats, two standard Bioconductor classes for SNP data. The stable version is available in Bioconductor (https://bioconductor.org/packages/release/bioc/html/scoreInvHap.html) while the development version can be installed from the GitHub repository (https://github.com/isglobal-brge/scoreInvHap/). scoreInvHap requires the SNP genotypes of an individual in the inversion region. The allele frequencies of SNPs in each genotype reference and the ρ² between the SNPs and the validated inversion genotypes are built in the classifier and included in the package for the 20 inversions described in Table 1. We have also developed imputeInversion, a wrapper to impute SNP array data to use scoreInvHap. This tool is available from the GitHub repository (https://github.com/isglobal-brge/imputeInversion).

Datasets
We used the SNP data (MAF > 5%) of 503 European individuals of 1000 Genomes phase 3 [23]. We analyzed autism cohorts from the Autism Genome Project (AGP) [24] and the Simons Simplex Collection (SSC) [25]. SSC contained data from three different arrays: Illumina 1Mv1, Illumina 1Mv3 Duo and Illumina HumanOmni 2.5. We considered each array as a different dataset. To include European subjects only, we ran a Principal Component Analysis (PCA) using 128 SNP markers for ancestry [26] including all autism cohorts and HapMap3 individuals [27]. We generated a confidence ellipse of 0.99999 around European HapMap subjects and we discarded all individuals outside the ellipse. We discarded 111 subjects of AGP.
We obtained exome data from the UK10K Neurodevelopment datasets. We analyzed two datasets to compare scoreInvHap to clustering methods: one of schizophrenia cases (UK10K_NEURO_ABERDEEN) and another of autism cases (UK10K_NEURO_ASD_GALLAGHER). Both datasets are deposited in the European Genome-phenome Archive (EGA) under study accession codes EGAD00001000433 and EGAD00001000436. To select European individuals, we performed a genome-wide PCA of the merged UK10K neurodevelopment datasets and two control GWAS datasets: British Birth Cohort (BBC) and National Blood Service (NBS). We discarded subjects outside the central PCA cluster, as for AGP.
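An ancestry filter based on a confidence ellipse can be sketched as follows. The construction shown is an assumption: it fits a bivariate normal to the reference coordinates on the first two PCs and keeps points whose squared Mahalanobis distance is below the chi-squared quantile with two degrees of freedom. The function names and toy coordinates are hypothetical.

```python
import math


def fit_ellipse(points):
    """Mean and 2x2 sample covariance of reference (PC1, PC2) coordinates."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def inside_ellipse(point, mean, cov, level=0.99999):
    """True if the point lies within the `level` confidence ellipse."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    # Squared Mahalanobis distance via the inverse of the 2x2 covariance.
    d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    # The chi-squared quantile with 2 degrees of freedom has a closed form.
    return d2 <= -2.0 * math.log(1.0 - level)


# Toy reference coordinates standing in for the European HapMap subjects.
reference = [(0.0, 0.0), (1.0, 0.2), (-1.0, -0.1), (0.3, 1.0), (-0.2, -1.1)]
mean, cov = fit_ellipse(reference)
print(inside_ellipse((0.5, 0.5), mean, cov), inside_ellipse((30.0, -30.0), mean, cov))
```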

Inversion simulation
We generated four different inversions using invertFREGENE [15]. We used default values of recombination (1.25×10⁻⁷) and mutation rates (2.3×10⁻⁷). In all simulations, the entire simulated region was 2 Mb while the inversion comprised 800 kb. Stop frequency was set at 0.4 for the first three inversions and at 0.2 for the fourth.

Inversion genotyping
We ran scoreInvHap on SNP arrays, imputed data and exome data using the inversion-genotype references included in the package. We discarded SNPs with call rate lower than 0.9. We ran invClust using the first two multidimensional scaling components of the SNPs in the inverted regions. We ran PFIDO with the default values of SNPs and subject call rate filtering (0.9). We forced the model to return 3 groups and set all the other parameters to default.
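The call-rate filter is simple enough to illustrate directly. This is a minimal sketch in which missing calls are represented as `None`; the function names and SNP identifiers are hypothetical.

```python
def call_rate(snp_calls):
    """Fraction of non-missing genotype calls for one SNP (None = missing)."""
    return sum(g is not None for g in snp_calls) / len(snp_calls)


def filter_snps(genotypes, min_call_rate=0.9):
    """Keep SNPs whose call rate is at least `min_call_rate`.
    genotypes: {snp_id: [calls across individuals]}."""
    return {
        snp: calls
        for snp, calls in genotypes.items()
        if call_rate(calls) >= min_call_rate
    }


genotypes = {
    "snp1": [0, 1, 2, 1, 0, 0, 1, 2, 1, 0],        # call rate 1.0
    "snp2": [0, None, 2, 1, 0, 0, 1, 2, 1, 0],     # call rate 0.9
    "snp3": [None, None, 2, 1, 0, 0, 1, 2, 1, 0],  # call rate 0.8
}
print(sorted(filter_snps(genotypes)))
```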

Association analysis
We tested the associations of autism spectrum disorder and schizophrenia with inversions inv-8p23.1 and inv-17q21.31 in ten UK exome studies of the UK10K project (S6 Table). We used subjects from the Wellcome Trust Case Control Consortium 2 as controls. This dataset consists of two cohorts (National Blood Service (NBS) Cohort and 1958 British Birth Cohort) genotyped with Illumina 1.2M. We only included individuals classified as Europeans by peddy [28]: 5,529 controls, 604 autism cases and 1,308 schizophrenia cases. To run peddy, we created two datasets: the first one was the merger between controls and autism cohorts and the other was the merger between controls and schizophrenia. In both cases, we included the 68,689 SNPs that were common between the SNP arrays and the exome data. We applied scoreInvHap on each dataset. As case and control cohorts belong to different studies, we tested whether inversion frequencies differed significantly (chi-squared test) between the two control cohorts, and among the ten case cohorts. We used SNPassoc for association testing between disease status and inversion genotypes in the joint dataset across all cohorts, adjusting for the joint genome-wide PCs.
We tested the association between inversions and breast cancer on the Cancer Genetic Markers of Susceptibility (CGEMS) study [18,19], available in dbGaP (dbGaP Study Accession: phs000147.v3.p1). We only included individuals classified as European with a probability higher than 0.9 inferred by peddy [28]: 1,061 cases, 1,033 controls. We imputed the chromosomes containing inversions in Table 1 with the Michigan Imputation Server [29]. We selected HRC r1.1 2016 as reference panel and SHAPEIT as phasing algorithm. We removed SNPs with an imputation R² smaller than 0.4. We called inversion genotypes with scoreInvHap in the 15 inversions having at least 4 SNPs with high quality imputation. We used SNPassoc for association testing between disease status and inversion genotypes, adjusting for age and the joint genome-wide PCs.
EUR is the frequency in the European individuals of the dataset described in this study (https://www.sfari.org/resource/simons-simplex-collection/) by applying at https://base.sfari.org/.
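The comparison of inversion frequencies between two cohorts can be illustrated with a Pearson chi-squared test on genotype counts. The sketch below covers only the two-cohort, three-genotype case, where the p-value has an exact closed form; the function name and the counts are hypothetical.

```python
import math


def chi_squared_homogeneity(counts_a, counts_b):
    """Pearson chi-squared test comparing inversion-genotype counts
    (N/N, N/I, I/I) between two cohorts. Returns (statistic, p-value)."""
    col_totals = [a + b for a, b in zip(counts_a, counts_b)]
    grand_total = sum(col_totals)
    stat = 0.0
    for row in (counts_a, counts_b):
        row_total = sum(row)
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / grand_total
            stat += (observed - expected) ** 2 / expected
    # A 2 x 3 table has 2 degrees of freedom, for which the chi-squared
    # survival function is exactly exp(-x / 2).
    return stat, math.exp(-stat / 2.0)


# Toy genotype counts for two control cohorts.
cohort_1 = [300, 500, 200]
cohort_2 = [310, 490, 200]
stat, p_value = chi_squared_homogeneity(cohort_1, cohort_2)
print(round(stat, 4), round(p_value, 4))
```

Comparing more than two cohorts changes the degrees of freedom, so the closed-form p-value used here would no longer apply and a general chi-squared distribution function would be needed.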
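The selection of callable inversions in the imputed breast cancer data (SNPs with imputation R² of at least 0.4, and at least 4 such SNPs per inversion) can be sketched as below. The data structure, function name, inversion labels and R² values are hypothetical.

```python
def callable_inversions(imputation_r2, min_r2=0.4, min_snps=4):
    """Return inversions with at least `min_snps` SNPs whose imputation
    R^2 is at least `min_r2`. imputation_r2: {inversion: {snp_id: r2}}."""
    kept = {}
    for inversion, snps in imputation_r2.items():
        good = [snp for snp, r2 in snps.items() if r2 >= min_r2]
        if len(good) >= min_snps:
            kept[inversion] = good
    return kept


imputation_r2 = {
    "inv-A": {"s1": 0.95, "s2": 0.80, "s3": 0.40, "s4": 0.55, "s5": 0.10},
    "inv-B": {"s1": 0.90, "s2": 0.30, "s3": 0.20, "s4": 0.85},
}
print(callable_inversions(imputation_r2))
```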