Bio-collections in autism research

Autism spectrum disorder (ASD) is a group of complex neurodevelopmental disorders with diverse clinical manifestations and symptoms. In the last 10 years, there have been significant advances in understanding the genetic basis for ASD, critically supported by the establishment of ASD bio-collections and their application in research. Here, we summarise a selection of major ASD bio-collections and their associated findings. Collectively, these include mapping ASD candidate genes, assessing the nature and frequency of gene mutations and their association with ASD clinical subgroups, and insights into related molecular pathways, such as those of the synapse, chromatin remodelling and transcription, and into ASD-related brain regions. We also briefly review emerging studies on the use of induced pluripotent stem cells (iPSCs) to potentially model ASD in culture. These provide deeper insight into ASD progression during development and could generate human cell models for drug screening. Finally, we provide perspectives concerning the utility and limitations of ASD bio-collections, and highlight considerations in setting up a new bio-collection for ASD research.


Background
Autism spectrum disorder (ASD) is a group of early onset and heterogeneous neurodevelopmental disorders affecting males (1/42) more often than females (1/189) [1]. The reported prevalence of ASD has risen rapidly, from 0.5/1000 people in early epidemiological studies of 1960-1970 [2,3] to 1/68 children of school age according to recent data from the Centers for Disease Control and Prevention [1].
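The prevalence figures above can be put on a common scale with a few lines of arithmetic. The following is a minimal sketch using only the numbers quoted in the text; the variable names are illustrative, and the fold change is a crude comparison because the early and recent estimates refer to different populations (all people versus school-age children).

```python
from fractions import Fraction

# Prevalence figures quoted in the text.
male = Fraction(1, 42)        # males
female = Fraction(1, 189)     # females
early = Fraction(5, 10000)    # 0.5 per 1000 people (1960-1970 studies)
recent = Fraction(1, 68)      # school-age children (recent estimate)

male_to_female = male / female   # 189/42, which reduces to 9/2
fold_increase = recent / early   # 2000/68, which reduces to 500/17

print(f"male:female ratio    = {float(male_to_female):.1f}:1")
print(f"recent prevalence    = {float(recent * 1000):.1f} per 1000")
print(f"fold change vs early = {float(fold_increase):.1f}")
```

Exact fractions are used so that the ratios are not affected by floating-point rounding before they are formatted for display.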
ASD is characterised by atypical development of social behaviour, communication deficits and the presence of repetitive and stereotyped behaviours [4]. It is highly clinically heterogeneous and accompanied by commonly occurring comorbidities that are not core to the disorder but are frequently disabling. Communication deficits are also present in social communication disorder (SCD), and the new DSM-5 diagnosis of SCD makes it possible to distinguish individuals with ASD from those with SCD. The severity may vary across a range of parameters including ASD symptoms, IQ and comorbid behaviours [4]. For example, 70% of ASD patients will have at least one comorbid psychiatric disorder [5], such as social anxiety, depression and bipolar disorder [6]. In addition, ASD is frequently associated with epilepsy, gastrointestinal and immune disorders [7]. ASD is a highly heritable complex polygenic condition. Heritability estimates based on family and twin studies are 50-80% [8,9]. It is strongly linked to genetic factors involving the development and function of the nervous system [10], mitochondrial function [11], the immune system [12] and epigenetic regulation [13]. Genetic risk is attributed to rare copy number variants (CNV) and single nucleotide variants (SNV) acting on the background of common genetic variation (reviewed by [14]). High throughput genome sequencing technologies have facilitated genomic discovery, and advanced bioinformatics methodologies have enabled investigation of protein-protein interactions [15,16] and functionally related pathways [17,18]. The pathway to gene discovery has required large-scale international collaborative efforts based on the assembly of large bio-collections that are now publicly available and are the subject of this review. In parallel to bio-collections, large-scale patient registries have provided epidemiological data that illustrate the course and prognosis of ASD and are helping to identify environmental factors influencing the aetiology [19-22].
Despite the advances, significant gaps in our knowledge of the aetiology remain and effective treatments for core ASD symptoms are elusive. The genetic and clinical heterogeneity of ASD means that further advancement will require larger bio-collections coupled with rich clinical data, ideally collected longitudinally, to obtain a clear picture of the disorder at both the molecular and physiological levels.

Autism bio-collections
A bio-collection is a large set of biologically characterised samples, such as blood or tissue collected from a group of individuals who typically have a specific medical condition. Bio-collections are useful as a dedicated resource to generate clinical and scientific data for the analysis of medical conditions on a large scale [23], as well as to create functional disease models to explore the biology of clinical conditions. Large-scale bio-collections, together with comprehensive associated data, can help to address the issue of heterogeneity by supporting interrogation of the relationship between genotype and phenotype at the individual and group levels. The purpose of this review is to provide a summary of the publicly available ASD bio-collections, to highlight the impact of these on ASD research and to identify new directions for ASD bio-collections for future research purposes.

Methods and search criteria
A literature search of studies published from Jan 2001 to Nov 2016 was conducted on the electronic databases Web of Science, EBSCO, PubMed, Science Direct, MEDLINE and Wiley Online Library. The search terms included "biobank", "registry", "collection", "autism" and citations of bio-collections. A total of 263 studies from ASD bio-collections have been included in the tables and references of this review (Tables 2, 3, 4, 5 and 6).

Inclusion criteria
This review included (a) studies using original samples of human tissues in ASD bio-collections; (b) studies using bio-samples extracted from systematically collected bio-resources (i.e. DNA, RNA, protein) for investigating the risk or influence of ASD; (c) population studies involving participants with autism, Asperger syndrome or pervasive developmental disorder not otherwise specified (PDD-NOS); (d) studies published in peer-reviewed journals and (e) studies published in English.

Exclusion criteria
Studies were excluded (a) if they did not mention the collection(s) in the research data, references, acknowledgements or supplementary materials; (b) if the bio-samples were not derived from a systematic sample collection; and (c) if studies only concerned animal models of ASD without using ASD bio-collections or data.
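The inclusion and exclusion criteria can be read as a simple screening rule applied to each candidate study. The sketch below is illustrative only and is not the procedure used in this review: the `Study` record and all of its field names are hypothetical, and it assumes that inclusion criteria (a) and (b) can be collapsed into a single flag and that any one exclusion criterion is sufficient to reject a study.

```python
from dataclasses import dataclass

# Diagnoses named in inclusion criterion (c).
ELIGIBLE_DIAGNOSES = frozenset({"autism", "Asperger", "PDD-NOS"})


@dataclass(frozen=True)
class Study:
    # All field names are hypothetical; they are not taken from the review.
    uses_biocollection_material: bool  # inclusion (a)/(b), collapsed by assumption
    systematic_collection: bool        # False triggers exclusion (b)
    mentions_collection: bool          # False triggers exclusion (a)
    diagnoses: frozenset               # compared against inclusion (c)
    peer_reviewed: bool                # inclusion (d)
    language: str                      # inclusion (e)
    animal_model_only: bool            # True triggers the animal-model exclusion


def is_eligible(study: Study) -> bool:
    """Return True if the study passes inclusion and triggers no exclusion."""
    meets_inclusion = (
        study.uses_biocollection_material
        and bool(study.diagnoses & ELIGIBLE_DIAGNOSES)
        and study.peer_reviewed
        and study.language == "English"
    )
    meets_exclusion = (
        not study.mentions_collection
        or not study.systematic_collection
        or study.animal_model_only
    )
    return meets_inclusion and not meets_exclusion


example = Study(True, True, True, frozenset({"autism"}), True, "English", False)
print(is_eligible(example))
```

Keeping the inclusion and exclusion tests as two separate expressions mirrors the two lists in the text, so each clause can be traced back to a lettered criterion.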
We focus largely on studies from five bio-collections: four providing DNA, cell lines and metabolites, namely the Autism Genetic Resource Exchange (AGRE), the Simons Simplex Collection (SSC), the Danish Newborn Screening Biobank (DNSB) and The Autism Simplex Collection (TASC); and one providing brain tissue, Autism BrainNet (formerly the Autism Tissue Program (ATP)). We also included two emerging bio-collections that have few or no publications released yet, but could be of significant impact in the future. They are the Autism Inpatient Collection (AIC) [24] and the Autism Spectrum Stem Cell Resource [25]. An overview of the bio-collections and their website links can be found in Table 1.

Autism Genetic Resource Exchange (AGRE)
AGRE was established in 1997 by the Cure Autism Now (CAN) Foundation and the Human Biological Data Interchange (HBDI). Samples are provided by families with children affected by ASD and are coupled with anonymously coded clinical diagnostic data, such as the Autism Diagnostic Interview-Revised (ADI-R) and the Autism Diagnostic Observation Schedule (ADOS). Additional clinical data include photographic dysmorphology, neurological and physical examination, and family and medical history. AGRE is currently managed by Autism Speaks. It contains over 2500 families, and the resource has contributed to high profile genetic discoveries relating to ASD (Table 2). Samples are housed at the National Institute of Mental Health repository at Rutgers University in the form of immortalised cell lines, DNA and serum samples, which can be accessed by researchers through applications [20].
AGRE lymphoblastoid cells enabled studies into shared ubiquitin and neuronal gene expression in lymphoblastoid cells and brain [73,74], glutathione metabolism, oxidative stress [75,76] and stress response [77], microRNAs and their use in ASD profiling [78,79], CYFIP1 dosage effect on mTOR regulation [80], and changes in methylation patterns of RORA and BCL2 and their effects on apoptosis, cellular differentiation, inflammation and neural development [47].
The AGRE collection was also used to establish genetic methodologies and bioinformatic tools. These included using mismatch repair to detect amplicons in ASD [81], using multiplex ligation-dependent probe amplification (MLPA) to improve detection of microduplications and microdeletions [82], incorporating disease symptoms to improve linkage detection in genetic data [83] and analysing genetic loci to search for candidate genes [84].

Simons Simplex Collection (SSC)
The SSC is a genetic and clinical repository, which contains material derived from 2600 families. Whereas the AGRE contains multiplex families and trios, the SSC ascertained "simplex" ASD families, defined as families in which only one child has ASD and there is at least one typically developing sibling. DNA is available for both parents, the affected child and an unaffected sibling. Thus the SSC samples are particularly valuable in evaluating parental inheritance. Samples were collected at multiple sites and were stored as immortalised cell lines at the Rutgers University Cell and DNA Repository (RUCDR). Each sample was verified for parentage, gender and Fragile X mutation. In-depth clinical phenotypes were characterised for all participants to support genotype-phenotype analyses. These included data on diagnostic status, medical and psychiatric comorbidity, family history and medication use for the affected person. Broader ASD phenotype measures were collected for unaffected family members.
The SSC has become a vast resource for ASD research and has contributed significantly to numerous whole-exome sequencing studies of ASD in the past ~7 years (Table 3). The main findings showed that de novo mutations were frequently enriched in ASD patients [60]. Whole-genome sequencing results showed a significant enrichment of de novo and private disruptive mutations in putative regulatory regions of previously identified ASD risk genes. It also identified the novel risk factors CANX, SAE1 and PIK3CA with small CNVs and exon-specific SNPs, which were overlooked in previous CNV studies or exon sequencing [85]. It has also been observed that many de novo mutations were of paternal origin (4:1) and positively correlated with paternal age [65]. The disruptive mutations were located in genes involved in transcription regulation, chromatin remodelling and synapse formation [86,87].
CHD8 was further evaluated as an ASD candidate gene in children with developmental delay or ASD, and 15 independent mutations were identified and enriched in a subset of ASD with altered brain size, distinct facial features and gastrointestinal complaints. Disruption of CHD8 in zebrafish recapitulated some of the patient phenotypes including increased head size and impaired gastrointestinal motility [88]. CHD8 has been shown to control expression of other high-confidence de novo ASD risk genes such as DYRK1A, GRIN2B and POGZ [89]. Mutation of DYRK1A was strongly linked to a subset of ASD patients with seizures in infancy, hypertonia, intellectual disability, microcephaly, dysmorphic facial features and impaired speech [71,89]. The POGZ gene, which plays a role in cell cycle progression, was also found to contribute to a subset of ASD with varying developmental delay, vision problems, motor coordination impairment, tendency to obesity, microcephaly, hyperactivity and feeding problems [90].

Danish Newborn Screening (NBS) Biobank
The NBS Biobank has a large collection of dried blood spot samples (DBSS), which are taken from new-borns 5-7 days after birth. They are sent to the New-born Screening lab at the Statens Serum Institut for analysis, and stored at −20°C in a separate freezing facility at the NBS Biobank. Prior to collection, parents are informed via leaflets about the biobank, with a focus on what the samples will be used for (documentation, testing and retesting, research, etc.). Participants can opt out of storage at any time via a letter to the department. For security, the clinical data and biological samples are linked via a unique number, kept in separate buildings, and are accessible by authorised personnel only [91]. The advantage of the NBS resource is that it provides a large number of non-ASD controls as well as Danish ASD samples.
In the past 30 years, the NBS Biobank has accumulated samples from 2.2 million individuals, around 65,000-70,000 samples per year from Denmark, Greenland and the Faroe Islands. Most recently, this resource has been included under the Danish iPSYCH consortium, which, with the Psychiatric Genomics Consortium (PGC), added 8000-12,000 samples to the PGC analysis and significantly increased its power to detect common genetic effects for ASD; these findings have recently been published [92].
DBSS were also used to examine metabolites. A group led by Abdallah carried out a series of studies on Danish collections (Table 4) to examine the potential role of cytokines and chemokines involved in signalling and the immune response in ASD. Initially using amniotic fluid from the Danish Birth Cohort (DBC) collection [93,94], they followed up with DBSS from new-borns cross-referenced from that cohort [95,96] and detected an imbalance of cytokines amongst ASD subjects compared to the controls. Levels of most of these molecules were lower than normal, including Th-1- and Th-2-like cytokines involved in the proliferation, priming and activation of these cell types, whereas a small number of cytokines displayed increased expression in ASD. The abnormal levels of these molecules could lead to a hypoactive or "inactive" immune system in the brain, making it more susceptible to infection-related ASD. However, when chemokine levels were examined in amniotic fluid, no concrete relationship could be established.

The Autism Simplex Collection (TASC)
TASC is a trio-based international bio-collection that was assembled in collaboration with the Autism Genome Project and funded by Autism Speaks [97]. Trios comprise both parents and a child affected with ASD with no known medical or genetic cause. Collection of samples took place between 2008 and 2010 across 13 sites: 9 in North America and 4 in Europe. Management, storage and distribution of TASC data are handled by the Centre for Collaborative Genetic Studies on Mental Disorders (CCGSMD) [97]. Samples are housed at the NIMH and AGRE repositories, both of which are located at Rutgers University.
So far, TASC has been used for GWAS [66], CNV studies [72,98,99] and WES studies [16,100,101]. In addition, TASC has also been used in WGS as part of the MSSNG project, which is discussed below.

Autism Inpatient Collection (AIC)
The AIC is a bio-collection for ASD research based on those at the severe end of the spectrum with severe language impairment, intellectual disability and self-injurious behaviour. This collection was founded on the basis that this segment of ASD patients is largely under-represented in current studies. Bio-samples were initially collected from 147 patients, and ongoing recruitment is estimated at 400 per year. Psychiatric, clinical and phenotypic data are collected in addition to blood samples for the creation of lymphoblastoid cell lines by RUCDR. Amongst this collection, over half are nonverbal, over 40% have intellectual disability and a quarter exhibit self-injurious behaviour [24]. This collection has yet to be used in any genetics-based studies. The fact that many patients are at the severe end of the spectrum makes it a welcome addition, and it opens opportunities to explore this under-represented group.

Autism Tissue Program (ATP)/Autism BrainNet
The Autism Tissue Program, now Autism BrainNet, is a post-mortem ASD brain collection coordinated by a network of parents, caregivers, physicians and pathologists. Brain samples are preserved in formalin and/or in −80°C freezers to maximise the range of potential studies. In some cases, both hemispheres are fixed in formalin when there is freezing capacity or if the post-mortem interval exceeds 24 h. Corresponding clinical data include age, sex, ethnicity, diagnosis, brain size, cause of death, post-mortem interval and preservation method for the left and right hemispheres of the brain. Due to the rarity of the samples, a thorough application procedure assesses the scope, scale and feasibility of proposed projects prior to access to tissue, with the expectation that data, images and presentations generated by research on the samples are provided back to Autism BrainNet 3 months after formal release of publications [102].
Brain pathology and molecular mechanisms have been the focus of studies using the ATP resource (Table 5). Although many studies looking at brain anatomy and cell morphology employed samples from this collection, molecular and genetic studies are the primary focus of this review. Such studies included transcriptomics [103-105], epigenetics [29,106-115] and alternative splicing [116,117]. A key discovery was the identification of convergent molecular pathology linking to neuronal, glial and immune genes [105] in a transcriptomics study that investigated the gene co-expression network between autistic and control brains. This led to the proposal of abnormal cortical patterning as an underlying mechanism due to attenuated differential expression in frontal and temporal cortices in ASD brains.
A recent study to which the ATP made a very large contribution showed reduced vitamin B12 in ASD brains [118]. Post-mortem examination of brain tissue from subjects ranging from foetal to elderly also showed a marked decline of brain vitamin B12 with age, together with lower activity of methionine synthase in the elderly, but the differences were more pronounced in ASD and schizophrenia subjects when compared to controls. Acetylation is an important post-translational modification in the field of epigenetics. ATP also made a significant contribution to a large-scale histone acetylome-wide association study (HAWAS) using the prefrontal cortex, cerebellum and temporal cortex in ASD patients and controls. Despite their heterogeneity, 68% of syndromic and idiopathic ASD cases shared a common acetylome signature at >5000 cis-regulatory elements in the prefrontal cortex and temporal cortex. Aberrant acetylome was found to be associated with synaptic transmission, ion transport, epilepsy, behavioural abnormality, chemokinesis, histone deacetylation and immunity [113].
The ATP sample was used in a methylation study that investigated differential methylation in CpG loci in three brain regions: temporal cortex, dorsolateral prefrontal cortex and cerebellum. Differential methylation of four genes (PRRT1, C11orf21/TSPAN32, ZFP57 and SDHAP3) was detected. PRRT1 and C11orf21/TSPAN32 were hypomethylated while the latter two were hypermethylated [109]. A further investigation in Brodmann's area also found a pattern of hypomethylation of a number of genes, including C11orf21/TSPAN32, that are implicated in immune function and synaptic pruning [111]. These hypomethylated genes correlated with those reported as overexpressed by Voineagu et al. The methylation studies have further uncovered dysregulation of the OXTR and SHANK3 genes in ASD. The OXTR gene, encoding the oxytocin receptor, was significantly hypermethylated in the peripheral blood cells and temporal cortex of ASD subjects, suggesting a role for reduced oxytocin signalling in the aetiology of ASD [108] and a potential therapeutic target for ASD. Differential methylation of the SHANK3 gene was detected between ASD and control brains. When three 5′ CpG islands of the gene were examined, altered methylation was also found to change SHANK3 splicing, with specific SHANK3 isoforms expressed in ASD [114]. This is echoed by a recent study, which revealed dynamic microexon regulation associated with the remodelling of protein-interaction networks during neurogenesis. The neural microexons are frequently dysregulated in the brains of individuals with ASD, which is associated with reduced expression of SRRM4 [116]. The neuronal-specific splicing factor A2BP1/FOX1 and A2BP1-dependent splicing of alternative exons are also dysregulated in ASD brain [105].

Replication studies and pooling resources
Research data from one bio-collection are not always replicable in another sample set. Therefore, cross-validation between different bio-collections will not only minimise false positives, but also identify the common risk factors and subset-specific factors. For example, a genome-wide survey was carried out to test trans-generational effects of mother-child interactions, and the AGRE and SSC samples were used to replicate the original findings of 16 ASD risk genes (PCDH9, FOXP1, GABRB3, NRXN1, RELN, MACROD2, FHIT, RORA, CNTN4, CNTNAP2, FAM135B, LAMA1, NFIA, NLGN4X, RAPGEF4 and SDK1) involving urea transport and neural development. The results from the AGRE and SSC cohorts did not match the original study and showed fewer associations. When statistical correction was applied, the results lost their significance [119]. This could partially be due to differences in array design, with different coverage of SNPs, and/or different methodologies.
The meta-analysis of five data sets including the AGRE and SSC demonstrates that females have a greater tolerance to CNV burden. This leads to a speculation that the maternal tolerance of the CNVs can result in decreased foetal loss amongst females compared to males, and that ASD-specific CNV burden contributes to high sibling occurrence. What is interesting about this study is that the results for high CNV burden in females are consistent throughout each data set. This is an example showing how multiple bio-collections can give a clearer picture in a combined study where individual studies may be ambiguous [120,121].
Many major studies on the genetics of ASD have also been accomplished as a result of collaborations amongst institutions (Tables 2, 3, 4, 5 and 6). An effort was made to evaluate the association of the Fragile X Mental Retardation 2 locus (AFF2) with ASD using joint resources from AGRE (127 males) and SSC (75 males). AFF2 encodes an RNA-binding protein, which is silenced in Fragile X. The study found that 2.5% of ASD males carry highly conserved missense mutations in the AFF2 gene, which were significantly enriched in ASD patients when compared to >5000 unaffected controls [122]. A WES study was published recently, which sequenced the exomes of over 20,000 individuals, including those from the SSC and Swedish registries. The study identified 107 candidate genes, and reinforced the ASD pathways of synaptic formation, chromatin remodelling and gene transcription. This study detected mutations in genes encoding calcium- (CACNA2D3, CACNA1D) and sodium-gated channels (SCN2A), which are related to neuronal function, and in genes involved in post-translational methylation (SUV420H1, KMT2C, ASH1L, SETD5, WHSC1) and demethylation (KDM4B, KDM3A, KDM5B, KDM6B) of lysine residues on histones, which provided a molecular basis linking neuronal excitation and epigenetic changes in ASD [86].
Multiple bio-collections were employed to investigate SHANK1, 2 and 3, which are scaffolding proteins implicated in ASD. The investigators carried out a genetic screen and meta-analysis on patients and controls, including cohorts from the AGRE, SSC and Swedish twin registry. In total, ~1% of all patients in the study had a mutation in this group of genes. Mutations in SHANK3 had the highest frequency (0.69%) and occurred in patients with ASD and profound intellectual disability. SHANK1 (0.04%) and SHANK2 (0.17%) mutations occurred less frequently and were present in individuals with ASD and normal IQ, and ASD with moderate intellectual disability, respectively [123].
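As a quick consistency check, the per-gene frequencies quoted above can be summed and compared with the overall figure of ~1%. This is a minimal sketch using only the percentages given in the text; it assumes that no patient carries mutations in more than one SHANK gene, which the text does not state.

```python
from decimal import Decimal

# Mutation frequencies (percent of ASD patients) quoted in the text.
freq_percent = {
    "SHANK1": Decimal("0.04"),
    "SHANK2": Decimal("0.17"),
    "SHANK3": Decimal("0.69"),
}

# Sum of the per-gene frequencies, assuming non-overlapping carriers.
total_percent = sum(freq_percent.values(), Decimal("0"))

# The same frequencies expressed as carriers per 10,000 patients.
per_10000 = {gene: int(f * 100) for gene, f in freq_percent.items()}

print(f"combined frequency: {total_percent}%")
for gene, n in per_10000.items():
    print(f"{gene}: {n} per 10,000 patients")
```

The combined value of 0.90% is in line with the rounded ~1% reported for the three genes together.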
Recently, Autism Speaks, in coordination with Google and Genome Canada, launched another initiative, MSSNG (https://www.mss.ng/). The objective of the MSSNG project is whole-genome sequencing of 10,000 genomes from families affected by ASD. It incorporates AGRE along with other bio-collections to sequence the entire genomes of families with autistic children, and as of the summer of 2016 it had reached the halfway point of 5000 of the 10,000 genomes, with contributions from AGRE (1746) and TASC (458). Two studies have been published from this initiative. In the first study, genomes from 200 families were sequenced [124]. The findings revealed that most de novo mutations (75%) originated from fathers, and that their number increased dramatically with paternal age. Clustered de novo mutations, however, were mostly of maternal origin and located near CNV regions subject to high mutation rates. The ASD genomes were enriched with damaging de novo mutations, of which 15.6% were non-coding and 22.5% genic non-coding, respectively. Many of the mutations affected regulatory regions that are targeted by DNase I or involved in exon skipping [124]. The second study [125] featured 5205 sequenced genomes with clinical data, in which an average of 73.8 de novo single nucleotide variants and 12.6 insertions/deletions/CNVs were detected per ASD patient. Eighteen new genes that had not previously been reported in ASD were also discovered (CIC, CNOT3, DIP2C, MED13, PAX5, PHF3, SMARCC2, SRSF11, UBN2, DYNC1H1, AGAP2, ADCY3, CLASP1, MYO5A, TAF6, PCDH11X, KIAA2022 and FAM47A). These data clearly demonstrate that ASD is associated with multiple risk factors and that, within an individual with ASD, multiple genetic alterations may be present. Whole-genome sequencing is therefore a powerful tool to detect genetic changes at all levels.
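The MSSNG progress figures quoted above can be related to one another with simple arithmetic. The sketch below uses only the counts given in the text (target, genomes sequenced by summer 2016, and the AGRE and TASC contributions); the variable names are illustrative.

```python
# Counts quoted in the text for the MSSNG project.
target_genomes = 10_000   # overall sequencing goal
sequenced = 5_000         # genomes sequenced by summer 2016
agre = 1_746              # genomes contributed by AGRE
tasc = 458                # genomes contributed by TASC

progress = sequenced / target_genomes          # fraction of the goal reached
combined = agre + tasc                         # AGRE plus TASC genomes
share_of_sequenced = combined / sequenced      # their share of sequenced genomes
other_sources = sequenced - combined           # genomes from other collections

print(f"progress towards target: {progress:.0%}")
print(f"AGRE + TASC: {combined} genomes ({share_of_sequenced:.1%} of sequenced)")
print(f"from other sources: {other_sources} genomes")
```

On these figures, AGRE and TASC together account for a little under half of the genomes sequenced at that point, with the remainder coming from the other contributing collections.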
Resources like MSSNG are valuable, and pooling of ASD bio-collections is essential for identifying the common and subgroup-specific pathways and drug targets of a multifactorial disorder such as ASD, which involves hundreds of risk factors.

Stem cell research and autism spectrum stem cell resource
A major impediment to recent drug discovery, particularly in the field of neuroscience, is the lack of human cell models. The iPSC technology developed by Nobel Laureate Shinya Yamanaka has provided an excellent opportunity [126]. Fibroblasts from patient biopsies can be converted with defined transcription factors into iPSCs, which resemble embryonic stem cells and can become most cell types in the body. Therefore, patient-derived iPSCs may be used to investigate disease pathology, progression and mechanisms, and to create human disease models for drug screening and testing [127,128]. The SSC has also commenced efforts to create iPSC lines from idiopathic ASD patients who have large head circumference but unknown gene association [129]. The iPSCs were grown into organoids to mimic cortical development, and ASD organoids were shown to display a disproportionate ratio of inhibitory to excitatory neurons. The cortical gene FOXG1 was overexpressed in ASD organoids, and this overexpression correlated with the severity of ASD and head size [129]. This study has demonstrated a proof-of-concept for modelling ASD in cultured stem cells.
The Children's Hospital in Orange County, California, has set up a bio-collection dedicated to this task, the ASD Stem Cell Resource. Patients were screened and accepted based on the following criteria: ASD patients if they have no conditions affecting the central nervous system other than ASD (i.e. trauma, stroke, seizure disorders) and no features of other known genetic conditions (e.g. tuberous sclerosis); Fragile X patients if they are genotypically confirmed for the CGG repeat number of the FMR1 mutation; idiopathic autism patients if they are negative for FMR1 mutation and chromosomal abnormality; if they possess an IQ of 40 or greater; and if they are 8 years old or above. Skin punches and blood were collected in one location (MIND Institute), and fibroblasts were cultured and stored at the Hospital. The collection has been organised into seven groups: unaffected controls, Fragile X without ASD, Fragile X with ASD, premutations without ASD, premutations with ASD, ASD (not meeting full criteria for idiopathic status) and idiopathic ASD.
As of 2014, this resource was composed of iPSCs from 200 unaffected donors and patients. The collection includes fibroblasts, blood, iPSCs, and iPSC-derived neuronal and glial cells. The first study published using this bio-collection described iPSC models of Fragile X syndrome [130]. Fragile X patient fibroblasts were used to derive iPSCs, which were differentiated into neurons for transcriptomic analysis. The neuronal differentiation genes (WNT1, BMP4, POU3F4, TFAP2C, PAX3) were shown to be upregulated, whereas potassium channel genes (KCNA1, KCNC3, KCNG2, KCNIP4, KCNJ3, KCNK9, KCNT1) were downregulated in Fragile X iPSC-derived neurons. The temporal regulation of the SHANK1 and NNAT genes was also altered, with reduced SHANK1 mRNA and increased NNAT mRNA in patient cells. While the stem cell collection is relatively new, it has great potential to facilitate brain cell culture in vitro, which would otherwise not be feasible using post-mortem brain tissue.

Discussion
It is clear from the studies reviewed here that large ASD bio-collections have had an indisputable impact on progressing genomic discovery in ASD, leading to enhanced understanding of ASD neurobiology. While many studies used private collections as sources for tissue and data, large and well-characterised samples from the collections reviewed have supported the discovery of small genetic effects, e.g. in GWAS, and of rare genetic mutations such as pathogenic CNVs and SNVs. However, as highlighted for other neurodevelopmental disorders such as schizophrenia, larger samples are required. Both genetic and phenotypic heterogeneity are impediments to gene discovery. Large bio-collections aim to reduce these effects but challenges remain. Each of the bio-collections reviewed has its own strengths and limitations.

Phenotypic and genotypic heterogeneity
Some of the bio-collections, e.g. SSC, AGRE and TASC, reduced phenotypic heterogeneity through the use of the research gold standards for ASD diagnosis, ADI-R and ADOS. Different versions of these instruments have been used, depending on when the data were collected. IQ is more complex to measure because of the broad range of IQ commonly represented within bio-collections. Differences also exist in the clinical profile of subjects included in the different collections, with some samples, e.g. SSC, comprising more individuals with higher cognitive functioning relative to AGRE, TASC or AIC. Medical and psychiatric comorbidities [7] have gained greater recognition but are not systematically evaluated in each of the collections. Differences in ascertainment are also relevant. The SSC focused on simplex autism, i.e. families where only one child was affected, to maximise the detection of rare variants. Consequently, the relative contribution of common genetic risk within the SSC sample appears reduced. In contrast to autism-specific bio-collections, the DNSB provides a large population-based sample with clinical diagnosis that can maximise power within GWAS to detect common genetic variation, but it does not provide in-depth clinical data for phenotype-genotype analyses. This was evident in the studies on amniotic fluid and DBSS, where different diagnostic criteria would have been applied at the time of the subjects' diagnoses, meaning that one set of criteria (ICD-8) would have excluded subjects whom another (ICD-10) would not [93-96]. Throughout the studies listed here, there is an ethnic imbalance in the bio-collections, as many of the studies rely heavily on participants of Caucasian/European descent, a point that has been raised in the literature [131], and this imbalance can lead to population stratification [133]. Collections should also consider diverse family structures [132].
Fortunately, efforts are underway to explore genetics of ASD in other countries such as China [134] and Brazil [135], which will reinforce many of the earlier findings covered in this review.

Samples
Large collections providing DNA for genomics studies have been advantageous; however, as studies move beyond the scope of genetics into transcriptomics, epigenomics and proteomics, a wider variety of sample types will be required. Serum will be valuable for investigating circulating metabolites and proteins that are expressed peripherally, including chemokines [93,95], cytokines [94,96], neurotrophins [136], MMPs [137] and hormones [138]; however, it may not be the most appropriate tissue in which to investigate brain-relevant ASD genes and proteins. DBSS, which can be useful for WES [139,140], methylation [141] and gene expression [142], would not be as useful as freshly drawn blood for WGS, as DBSS-derived DNA would need to be amplified prior to analysis, potentially introducing bias.
However, human brain tissue is a rare resource: it is very difficult to access, and the preservation methods used may limit the studies that can be carried out. The types of brain cells obtained also depend on the tissue used, i.e. neuronal tissue in grey matter or glial tissue in white matter. Many of the studies listed under Autism BrainNet, for example, utilised particular regions of the brain; the most commonly used are the prefrontal cortex, temporal cortex, Brodmann's area, cerebellum and cingulate gyrus. While findings from these regions have been of crucial importance, a capacity to model the entire brain and to observe the progression of ASD during development would be ideal. Patients' somatic cells can now be converted to iPSCs and then into disease-relevant cell types.
iPSCs have been used as disease models for Fragile X syndrome [143][144][145] and Rett syndrome [146], and iPSCs have been generated from patients with deletions in SHANK3 [147], which are implicated in a number of neurodevelopmental disorders. Three-dimensional culture systems have also been developed, in which iPSCs can be used to create mini-organoids that come very close to mimicking aspects of brain development [129,148]. In addition to the brain cell types discussed earlier [129,149], iPSCs could be used to generate other cell types implicated in ASD comorbidities, such as those of the gut [88,150] and the blood-brain barrier [151,152].
Fibroblasts were the first cell type used to make iPSCs from mice [126] and humans [153], and they remain the most popular cell type for generating neural stem cells, neurons or iPSCs. Fibroblasts are easier to reprogram than many other somatic cells, with a reprogramming efficiency of between 0.1 and 1% depending on the reprogramming method [154]. They require only basic culture media and proliferate rapidly, so large numbers of fibroblasts can be generated in a short period. Unlike keratinocytes, however, they require trained medical personnel to obtain skin biopsies, which could be distressing to some ASD patients. Low-passage fibroblasts are required for reprogramming, as higher passages dramatically reduce reprogramming efficiency and increase genomic instability [155]. In addition to their use for generating iPSCs, fibroblasts can be used to investigate amino acid transport: ASD fibroblasts were found to have greater affinity for transporting alanine but less affinity for tyrosine, a key precursor in the synthesis of the neurotransmitter dopamine [156]. Fibroblasts can also be used as a proxy to investigate transport across the blood-brain barrier [156,157] and to investigate calcium signalling [158,159].
Keratinocytes can also be used for generating iPSCs [160]. They are obtained from plucked hair, so collection is less invasive than skin biopsy and can be carried out by non-medical personnel. The hair samples are easy to transport and culture, and transformed cells are easier to identify and isolate. As with fibroblasts, keratinocytes are reprogrammed at low passages, although fewer methods have been employed to reprogram keratinocytes than fibroblasts. Lentiviral, retroviral and episomal reprogramming have all been used successfully [155,161,162], and keratinocytes were shown to have a high reprogramming efficiency of 1-2%. The major challenge is the reproducibility of keratinocyte growth, which often requires repeated rounds of hair plucking from the same donor.

Organization
There are many generic articles and white papers for biobanks available, including consensus best-practice recommendations. For those who may wish to start their own bio-collections, we have listed a few articles in Table 7 for further reading on topics pertaining to collection, management, sustainability and quality control. In addition, links to international guidelines can be found here (http://www.oecd.org/sti/biotech/guidelinesforhumanbiobanksandgeneticresearchdatabaseshbgrds.htm; http://www.isber.org/?page=BPR; https://biospecimens.cancer.gov/practices/). However, even when using best-practice guidelines, the storage and use of biosamples will be subject to the laws of the jurisdiction where the facilities are located, and these will vary from country to country [163].

Participation and ethics
Stakeholders can have a considerable influence on how a bio-collection is set up, managed and monitored [164]. In addition to the researchers, clinicians and parents involved in ASD bio-collections, autistic individuals should be included in the stakeholder group, which could help guide and inform how research is carried out. A recent survey [165] examined researcher-community engagement in ASD research in the UK. Parents and patients expressed a high level of dissatisfaction and disengagement; they felt that research outcomes made little or no difference to their day-to-day lives and that they were not communicated with, involved or valued. Patients also felt that they did not receive follow-up and that researchers were unapproachable and driven by data collection. Establishing and sustaining good stakeholder engagement is essential in ASD research and in biobanking. This will not only help guide research towards subjects that matter to the community, but will also help secure the future of the biobank. One such initiative, SPARK (Simons Foundation Powering Autism Research for Knowledge), is underway to encourage ASD communities in the USA to participate in ASD research. While such a goal is laudable, it is crucial that participants are engaged in the entire process. They should not be seen merely as suppliers of samples and data for research, but should also have input into research areas that directly affect their quality of life. Meanwhile, regular public events to update the stakeholder community on research progress and challenges may help win their understanding, appreciation and continued support.
Ethics and the obtaining of consent are significant factors for bio-collection research. The main considerations include what information should be given to potential donors regarding the protocol and the implications of the research, how consent should be obtained [166] and what should be done if consent was not clearly given [167]. It is also a matter of debate whether consent should be "broad", with the patient consenting to a framework of research; whether ethical review of each project should be carried out by independent committees; what strategies should be used to inform participants and renew consent if there is significant deviation from the framework; whether consent should be revisited and renewed for every new study [168]; how the data will be protected and accessed [169,170]; and how the findings will be communicated [171].
The latter is especially important if findings are of clinical significance to certain donors or may affect their health or well-being [167]. These are issues that every ethics application must address. For people with ASD, they can be very complicated. Parents will give consent for their children if they want to donate samples to the bio-collection, but there is a question over adults who may not have the ability to give consent or to fully understand the implications. It is also important to communicate clearly what the research will mean for the patient and the family, including findings that may be of pathological as well as clinical significance. Liu and Scott have commented on how the discoveries made in ASD research can be distorted by the media. If parents and patients are misled into believing that a cure will emerge within a few years, this may lead to disappointment and make them reluctant to participate in further research. Liu and Scott also pointed out that members of the Neurodiversity Movement (a group of high-functioning autistic individuals) would have issues with certain research. They will not participate in research if they feel it may threaten or undermine people with ASD [128]. They prefer investment in services and therapies rather than in genetic studies, which may result in the prevention of autistic people being born [172][173][174], and the idea of curing autism is a complicated topic of debate [175].

Table 7 Description of papers relating to aspects of biobanking
Reference | Subject of paper
[357] | Introduces the concept of adding value to stakeholders (patient donors/funders/research customers) and of finding a balance between aspects of sustainability (acceptability/efficiency/accomplishment)
[358] | Feasibility of a simplified consent form for biobanking. Results indicate that simplified forms combined with supplemental information for further reading are effective in minimising form length and complexity
[359] | Review paper detailing best-practice guidelines for sample collection and storage, and for management of data and infrastructure. In addition, ethical, legal and social issues are explored
[360] | Paper discussing aspects of embryonic stem cell banking that can be applied to iPSCs
[361] | Key issues relating to delivery and safety testing of iPSC stocks for use in research and therapy. The importance of internationally and nationally coordinated banking systems is also discussed
[362] | Description of an enclosed culture system for iPSCs and neural precursors for use in preclinical and basic research
For iPSC research, it has been suggested that participants be educated on the current state of research, that the benefits and risks of biopsy donation be clearly explained, and that the ASD community be consulted on the research focus of an ASD bio-collection and on the distribution of the cell lines [128]. For clinical trials of stem cells, stem cell counsellors should inform participants of the benefits and risks of enrolling in stem cell trials and safeguard them from the dangers of stem cell tourism. Such an approach should also be considered for ASD-related studies [176].

Conclusions
In conclusion, bio-collections have proved to be valuable resources and have enabled large-scale studies on ASD. Recent genetic studies have begun to reveal de novo mutations affecting major cellular pathways [17,177]. There is also emerging evidence that the ASD continuum contains subgroups with discrete mutations in specific genes such as CHD8 [88], DYRK1A [71] and POGZ [90], while mutations in genes such as NRXN1 [28,60,73,178,179] and the SHANKs [72,98,114,123] recur in broad populations. A vast amount of clinical and biological information is available in these bio-collections, and there is a need for concrete guidelines on the ethics and governance of these data. Communication and trust must be maintained between the researchers and the families who have given biological and personal information. Finally, the availability of iPSC resources dedicated to idiopathic and syndromic forms of ASD could be a tremendous boon to the research community; such models are anticipated to complement animal models and to speed up the development of therapeutic interventions for ASD. They could open up the possibility of functional studies of ASD on a large scale and could become a model for other iPSC bio-collections to be set up worldwide.