Genetic analyses in a bonobo (Pan paniscus) with arrhythmogenic right ventricular cardiomyopathy

Arrhythmogenic right ventricular cardiomyopathy (ARVC) is a disorder that may lead to sudden death and can affect humans and other primates. In 2012, the alpha male bonobo of the Milwaukee County Zoo died suddenly and histologic evaluation found features of ARVC. This study sought to discover a possible genetic cause for ARVC in this individual. We sequenced our subject’s DNA to search for deleterious variants in genes involved in cardiovascular disorders. Variants found were annotated according to the human genome, following the classification currently used for human diseases. Sequencing of DNA from an unrelated unaffected bonobo was also used for prediction of pathogenicity. Twenty-four variants of uncertain clinical significance (VUSs) but no pathogenic variants were found in the proband studied. Further familial, functional, and bonobo population studies are needed to determine whether any of the VUSs, or a combination of them, may be associated with the clinical findings. Future establishment of genotype-phenotype correlations will be beneficial for the appropriate care of the captive zoo bonobo population worldwide as well as for conservation of the bonobo species in its native habitat.

the human reference genome, and further annotated and analyzed for their possible involvement in the etiology of ARVC. This analysis included several in silico prediction algorithms as well as variant frequencies available in human population databases as described in the Methods section. Most variants, though not listed as the reference in the human genome, were listed as the reference nucleotide in the chimpanzee genome and/or the bonobo genome (https://www.ncbi.nlm.nih.gov/genome/10729), which suggested that they may be tolerated and simply reflect differences between humans and apes. Therefore, to better assess the involvement of each variant in disease, we focused on Lody's variants that were not found in our unaffected control bonobo, Kitty, reportedly the oldest bonobo to have lived in captivity in North America (Table 1). This resulted in the identification of two heterozygous variants of uncertain clinical significance (VUS) in the CTNNA3 and JUP genes.
The first Lody-specific DNA change found was a non-synonymous heterozygous c.1774G > A (p.A592T) variant in exon 6 of the CTNNA3 gene (Fig. 2) leading to the semi-conservative amino acid substitution of an Alanine residue with a Threonine residue at a position conserved in human, rhesus, mouse, dog, and elephant, as well as in bonobo and chimpanzee (but a Serine residue in chickens). This sequence change has not been previously described as a disease-causing variant in cardiomyopathy patients and it has not been reported in the single nucleotide polymorphism database (dbSNP), in the 2,500 subjects of the 1000 Genomes Browser, in the 6,500 subjects of the NHLBI Exome Sequencing Project (ESP) database, in the 60,312 unrelated individuals of the Exome Aggregation Consortium (ExAC), or in the gnomAD browser beta, which evaluated exome data from individuals of European, African American, Hispanic, Asian and other backgrounds, suggesting this is not a common apparently neutral variant in these populations [16][17][18][19][20] . Variants that affect the same amino acid p.A592G/D/S are present in the gnomAD browser beta with less than 0.00001 allele frequency 20 . While not validated for clinical use, the computer-based algorithms PolyPhen (HumDiv and HumVar) and Mutation Taster classified this variant as probably damaging or disease causing, while SIFT and SNPs&Go classified this variant as tolerated or as a neutral polymorphism.
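The correspondence between the coding-DNA position and the protein position reported above can be checked with simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the actual codon sequence at residue 592 is not given in the text, so all four alanine codons are considered.

```python
from typing import Tuple


def locate_in_codon(cdna_pos: int) -> Tuple[int, int]:
    """Map a 1-based coding-DNA position to (codon number, base within codon)."""
    if cdna_pos < 1:
        raise ValueError("coding-DNA positions are 1-based")
    return (cdna_pos - 1) // 3 + 1, (cdna_pos - 1) % 3 + 1


# In the standard genetic code, alanine is GCN and threonine is ACN
# (N = any base). The real codon at residue 592 is an unknown here.
ALA_CODONS = {"GCT", "GCC", "GCA", "GCG"}
THR_CODONS = {"ACT", "ACC", "ACA", "ACG"}


def first_base_g_to_a(codon: str) -> str:
    """Apply a G > A substitution at the first base of a codon."""
    if len(codon) != 3 or codon[0] != "G":
        raise ValueError("expected a three-base codon starting with G")
    return "A" + codon[1:]


if __name__ == "__main__":
    # c.1774 falls on the first base of codon 592
    print(locate_in_codon(1774))
    # every alanine codon becomes a threonine codon after G > A at base 1
    print(all(first_base_g_to_a(c) in THR_CODONS for c in ALA_CODONS))
```

Under these assumptions, c.1774 maps to the first base of codon 592, and a G > A change at that base converts any alanine codon into a threonine codon, which agrees with the reported p.A592T.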
The second Lody-specific variant was a synonymous heterozygous c.1506 C > T (p.I502I) variant in exon 9 of the JUP gene (Fig. 3A). The p.I502I is a silent variant located in the third codon of exon 9. However, the bonobo reference genome (https://www.ncbi.nlm.nih.gov/genome/10729), which is based on the sequencing of one female bonobo 21 , carries the variant nucleotide at this position (that is, the thymine and not the cytosine, as in the human reference), while the chimpanzee reference (https://www.ncbi.nlm.nih.gov/genome/?term=pan+troglodytes) has a gap in the sequence at that position and surrounding nucleotides and thus could not be evaluated. The Isoleucine residue is conserved in human, rhesus, mouse, dog, elephant, chicken, and frog, while a similar amino acid, Valine, is found in zebrafish and lamprey. This sequence change has not been previously described as a disease-causing variant in cardiomyopathy patients. It has been reported in the SNP database as rs372963143, but was not detected in the 2,500 subjects of the 1000 Genomes Browser or in the 6,500 subjects of the NHLBI ESP database, while it was found in 4/248,880 alleles (0.002%) in the gnomAD browser beta. This variant was detected in heterozygosity in 1/8806 alleles (0.01%) from African subjects, in 1/13798 alleles (0.007%) from South Asian subjects, and in 1/60274 alleles (0.002%) from European (non-Finnish) subjects, while it was not detected in any other population in the ExAC database. Overall, population data suggests this is not a common apparently neutral variant. The computer-based algorithm Mutation Taster classified this variant as disease causing, and the in silico prediction tool ESE Finder predicts the addition of an SRSF5 (SRp40) putative responsive exon sequence to the mutant transcript (Fig. 3B).
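The allele frequencies quoted for the JUP variant follow directly from the allele counts. A minimal sketch of that arithmetic is given below; the function name and dictionary labels are illustrative, and only the counts themselves come from the text.

```python
def allele_frequency_percent(variant_alleles: int, total_alleles: int) -> float:
    """Return the allele frequency as a percentage."""
    if total_alleles <= 0:
        raise ValueError("total_alleles must be positive")
    return 100.0 * variant_alleles / total_alleles


# (variant alleles, total alleles) as quoted in the text
counts = {
    "gnomAD, all": (4, 248_880),
    "African": (1, 8_806),
    "South Asian": (1, 13_798),
    "European (non-Finnish)": (1, 60_274),
}

if __name__ == "__main__":
    for label, (alt, total) in counts.items():
        print(f"{label}: {allele_frequency_percent(alt, total):.4f}%")
```

Rounded to one significant figure, these values reproduce the percentages given in the paragraph above.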
Given that we did not find any pathogenic/likely pathogenic variants using our Sanger approach, we used a custom NGS panel to look for variants in the coding and splicing regions of 246 genes associated with cardiovascular disorders in Lody's whole blood and heart and Kitty's whole blood DNA as described in the Methods section and in a previous study 22 . In total, using the human genome GRCh37/hg19 as reference, 20,756 variants  were found in Lody's whole blood DNA, 19,817 variants were found in Lody's formalin-fixed paraffin-embedded (FFPE) heart DNA, and 20,040 variants were found in Kitty's whole blood DNA. There were 20,679 variants that were found in Lody's whole blood DNA but not in Lody's FFPE heart DNA, while there were 551 variants that were found in Lody's FFPE heart DNA but not in Lody's whole blood DNA (19,343 variants were found in both DNA sources and 21,230 variants were unique to one of the DNA sources). After variant classification, subtractive filtering analyses and Sanger confirmation studies, no pathogenic/likely pathogenic variants were found in Lody's blood or heart DNA (see Methods section). However, using our NGS approach, several VUSs were found in Lody's blood or heart DNA, and are listed in Table 1. The CTNNA3 and JUP VUSs found by Sanger sequencing alone were detected by the NGS approach but did not pass our stringent post-bioinformatics filtering steps because their damage score was below 4.
In addition to sequencing approaches, we performed SNP microarray analysis of Lody's whole blood and FFPE DNA using a microarray chip designed for the human genome. The ancestors of humans split from bonobos approximately 4 million years ago. Recent completion of the bonobo genome has revealed it to be 98.7% identical to corresponding sequences in the human genome 21 . We analyzed the data for regions of significant absence of heterozygosity (AOH), deletions or duplications involving genes that could be potentially influencing Lody's clinical phenotype. Using the call setting described in the Methods section, we found a total of 536 calls in the heart and 907 calls in the blood. After further data curation to exclude AOH, deletion and duplication calls with no OMIM genes, we found a total of 242 calls in the heart (16 AOH, 73 gain, and 153 loss) and 479 calls in the blood (3 AOH, 5 gain, and 471 loss) ( Table 2 and Supporting Information Table S1). There were no significant calls involving the 10 ARVC genes sequenced by Sanger methods. Additionally, we determined if there were patterns of long stretches with AOH, also termed runs of homozygosity (ROH). ROH has been utilized to determine the level of parental relatedness in an individual [23][24][25] . We did not observe substantial ROH in Lody, suggesting there has not been inbreeding in immediate generations. Overall, data for the heart FFPE sample was suboptimal, likely due to the poor quality of the DNA for use in microarrays.
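The curation step described above (keeping only calls that involve at least one OMIM gene, then tallying by call type) can be sketched as follows. The input layout, a pair of call type and list of OMIM genes, is an assumption made for illustration and is not the export format of any particular software; the per-type counts are those reported in the text.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def tally_curated_calls(calls: Iterable[Tuple[str, List[str]]]) -> Counter:
    """Count calls by type, keeping only calls that overlap >= 1 OMIM gene."""
    return Counter(kind for kind, omim_genes in calls if omim_genes)


# Per-type counts after curation, as reported in the text
reported = {
    "heart": {"AOH": 16, "gain": 73, "loss": 153, "total": 242},
    "blood": {"AOH": 3, "gain": 5, "loss": 471, "total": 479},
}


def totals_are_consistent(counts: dict) -> bool:
    """Check that the AOH, gain, and loss counts add up to the stated total."""
    return counts["AOH"] + counts["gain"] + counts["loss"] == counts["total"]


if __name__ == "__main__":
    for tissue, counts in reported.items():
        print(tissue, totals_are_consistent(counts))
```

The reported per-type counts add up to the stated totals for both tissues.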

Discussion
The primary goal of our study was to investigate the molecular basis of the seeming ARVC in a specific subject, a 38-year-old founder bonobo (Lody), who died suddenly of a histologically characterized arrhythmogenic cardiomyopathy. A definitive identification of the molecular culprit(s) would have aided in identifying individuals harboring the genetic variant(s) and attaining proper risk-stratification of Lody's offspring, thus warranting a rigorous surveillance and appropriate medical treatment. Two of three full-sibling descendants from this founder's mating with a founder female with progressive limb weakness of unknown etiology had early deaths, further supporting the potential inherited nature of the disease, thus adversely affecting future generations and the sustainability of this species. A secondary goal of our project was to find variants that would shed new light into the genetics of heart diseases in bonobos and develop a test panel that could be utilized by other institutions and provide a replicable, non-invasive test for all great apes to be easily screened for cardiovascular disease. This preliminary information may help identify and diagnose individuals at an earlier stage of disease progression and thus enroll them in training to participate in echocardiogram tests and other non-invasive monitoring and treatment plans. Early diagnosis is vitally important to prevention and treatment, and is especially important for the bonobo species, which is currently listed as endangered in their wild habitat (west-central Africa). Finally, in the case of the bonobo, these results will assist the Bonobo SSP in making more knowledgeable breeding recommendations for the already limited captive population.
In our efforts to identify the genetic etiology for the clinical presentation for Lody, we sequenced the coding regions and flanking exon-intron boundaries of 10 ARVC-associated genes (LMNA, CTNNA3, DES, TGFB3, JUP, TMEM43, PKP2, DSC2, DSG2, and DSP) in our subject and compared his data to that of an unaffected female bonobo (Kitty). The initial method of choice for this proposal was Sanger sequencing as it provided the flexibility necessary to design the assay, the complete coverage of the targeted regions, and avoidance of duplicated homologous loci. We did not find any variants with sufficient evidence to be clearly pathogenic in the blood or heart tissue from Lody; however, we found two VUSs in the CTNNA3 and JUP genes that were found in Lody's whole blood and heart DNA, but not in Kitty's whole blood DNA. We do not have enough functional and segregation data to support that any of these variants may alone explain Lody's phenotype, but we cannot rule out that one of them or the combination of these two variants, may be related to the etiology of ARVC in our subject.
The catenin (cadherin-associated protein), alpha 3 (CTNNA3) protein is a member of the vinculin/alpha-catenin family and has a role in cell-cell adhesion in muscle cells, with mutations being associated with ARVC 26 . The heterozygous c.1774G > A (p.A592T) variant found in Lody has not been studied functionally, has not been previously described in human cardiomyopathy patients or control populations, and in silico data for prediction of its effect on the structure/function of the protein is inconclusive. Two rare (<0.05% in the ExAC database) missense variants are found at the same residue, p.A592D and p.A592G (rs768797369), but no functional or segregation data is available for these variants. The functional significance of this sequence change is not known at present and its contribution to Lody's disease phenotype cannot be definitively determined. The junction plakoglobin (JUP) protein is a member of the catenin family and is present in desmosomes and intermediate junctions, with mutations being associated with ARVC 27 . The silent heterozygous c.1506 C > T (p.I502I) variant found in Lody has not been studied functionally, has not been previously described in human cardiomyopathy patients, and is very rare in control populations. The in silico exonic splicing enhancer (ESE) software predicts the creation of a novel responsive sequence for the binding of the serine/arginine-rich splicing factor 5 (SRp40) in the mutated transcript. However, we cannot determine how this may affect the structure/function of the protein, and thus we cannot determine its contribution to Lody's phenotype. Interestingly, the bonobo reference genome, which is based on the DNA sequence of one female individual 21 , also carries the thymine at c.1506. Currently, clinical information about this individual is limited but indicates that she is healthy and is approximately 22.5 years old (personal communication with Jean-Pascal Guery, La Vallee des Singes zoological director, 04/20/2016). At present, we cannot be certain that the presence of the thymine at this position is benign.
Given the Sanger sequencing results, we performed supplementary sequencing using NGS for a group of 246 cardiovascular genes, but did not find any pathogenic/likely pathogenic variants. Instead, we found several additional VUSs in Lody's blood and/or heart DNA which may be causing or contributing to his clinical phenotype alone or in combination with other variants. Finally, we found no significant deletions, duplications, or AOH in the blood or heart tissue from Lody using chromosomal microarray analysis testing.
We acknowledge several limitations to our study. Further studies would be desirable to better understand the functional effect of each variant found in this report. Our results are based on the analysis of one affected and one unaffected individual, which is not powerful enough to resolve the potential pathogenicity of the variants we found, since they are not well-characterized variants. Also, it is possible that some variants were overlooked because they were found in both subjects, and we should consider the possibility of low penetrance of such variants and/or their effect on disease expressivity. In addition, our population data is based on human databases such as gnomAD, the ExAC browser, the ESP, and the 1000 Genomes Project, which, although very helpful, are not necessarily optimal for interpretation in the bonobo. Furthermore, the current bonobo reference genome is based on the sequencing of the genome of a single female, Ulindi. Personal communication with La Vallee des Singes, the current housing institution for Ulindi, indicated that she was born in August 1993, was strong and healthy at the age of 22 years (when communication was established), and had a healthy one-year-old baby. Finally, although NGS resequencing is employed daily in clinical practice for human subjects, its application to the bonobo is limited by the current UCSC Genome Browser assembly (ID: panPan1), uploaded in 2012. Although de novo genome sequencing is possible, it requires an enormous bioinformatics and post-bioinformatics effort for variant calling and curation, which is beyond the scope of this project.
To our knowledge, this is the first comprehensive genetic approach using targeted gene sequencing, next generation sequencing, and genome-based chromosomal microarray analysis to seek a genetic etiology for the ARVC presentation of a deceased alpha male bonobo from the Milwaukee County Zoo (MCZ) by searching for pathogenic variants in genes previously linked to ARVC, as well as in 246 cardiovascular genes. The sequencing of other affected relatives and segregation analyses may further support a pathogenic role for each variant, or support a multi-locus additive effect of a combination of variants if they co-segregate with the disease phenotype in Lody's family members. Alternatively, sequencing of the genes where VUSs were found, or whole exome sequencing, in several additional affected and unaffected captive bonobos, as well as in individuals from all four non-human primate genera, may help to elucidate the basis of this potentially devastating disease. We hope that through the development of this cardiovascular assessment strategy, the bonobo, as well as the other great apes, will benefit from the molecular technologies currently available and therefore be more likely to live a longer, more productive life and to contribute to the breeding and preservation of their species. We have a rare opportunity to utilize the large, collaborative efforts and data banks from human studies for the benefit of our closely related apes. Finally, findings from non-human primates may make clinically significant contributions to the overall understanding of ARVC in human subjects.

Methods
Study Subjects. "Lody," the founder male of the MCZ bonobo troop, died of sudden cardiac arrest at the age of 39 after several years suffering from heart failure with severely reduced ejection fraction (EF = 21%) and arrhythmias, despite medical therapy employing ACE inhibitors, beta blockers, and low dose aspirin. The autopsy revealed no evidence of gross atherosclerosis and normal cardiac chamber dimensions. In addition to the more typical left ventricular myocardial fibrosis, gross anatomical and fine histologic analysis revealed unexpected findings of right ventricular fibro-fatty replacement of the myocardium and trabeculated endocardial surfaces in both ventricles. The right ventricular focal transmural fatty replacement of the myocardium along with patchy subendocardial fibrosis in the left ventricular free wall was histologically identical to ARVC in humans (Fig. 1).
"Kitty," an unrelated female bonobo who died at age 64 years with normal cardiac findings at necropsy, served as an unaffected control for gene sequencing. All post-mortem studies were approved by the Research Committee of the MCZ in accordance with principles of the Bonobo Species Survival Project of the Association of Zoos and Aquariums. Imaging and other phenotypic analyses often used in human patients, including MRIs, Holter monitoring, and ECGs, are not available for the study subjects, which were wild, non-domesticated animals; these and similar diagnostic procedures require general anaesthesia in bonobos, even those in captivity.
DNA Extraction. Samples were sent to the Molecular Genetics Diagnostic Laboratory of the Department of Medical and Molecular Genetics, Indiana University School of Medicine, for DNA extraction and sequence analysis. DNA from Lody's and Kitty's whole blood was extracted using Qiagen's Gentra Puregene Blood Kit (Qiagen, Germantown, MD) following the manufacturer's instructions. DNA from Lody's FFPE heart tissue was extracted using Qiagen's All Prep DNA/RNA FFPE Kit following the manufacturer's instructions.
Design of PCR Primers. Primers were designed using Primer3 web 28 for the coding regions and flanking exon-intron boundaries of the LMNA, CTNNA3, DES, TGFB3, JUP, TMEM43, PKP2, DSC2, DSG2, and DSP genes using the corresponding human genes from assembly GRCh37/hg19 as templates 29 . At the time this project started, the first assembly of the bonobo genome (panPan1) had recently been published 21 (https://www.ncbi.nlm.nih.gov/genome/10729) and was available for BLAST search to identify sequence homology with the more common Pan troglodytes (chimpanzee; https://www.ncbi.nlm.nih.gov/genome/?term=pan+troglodytes; the chimpanzee and bonobo genomes share 99.6% identity) 21 and with Homo sapiens. However, the major genome browsers [29][30][31] did not support genome analysis for the bonobo, and a refined annotation was not available at that time. Primers were verified by in silico PCR 30 against the chimpanzee genome to ensure that they would align properly. The UCSC Browser BLAT tool for the chimpanzee genome was used to ensure that a single product would result 30 . The NCBI BLAST tool for P. paniscus was used to check for regions containing SNPs between chimpanzee and bonobo (to avoid any possible allele dropout) 32 .
PCR, BigDye Sequencing, and Analysis of Variants. Genomic DNA from Lody (whole blood and heart tissue) and from Kitty (whole blood) was amplified by PCR using Qiagen's HotstarTaq DNA Polymerase ® and lab designed primers from Integrated DNA Technologies (IDT, Coralville, IA) with M13 tails used for BigDye (Sanger) sequencing (Supporting Information Table S2) [33][34][35][36][37][38][39][40][41][42][43][44] . Finally, variants were determined to have more burden if they were found in Lody (our subject) but not in Kitty (our control), as those in common between the two individuals were judged to be either unlikely to be clinically relevant or to be of low penetrance (see Table 1).
Next Generation Sequencing and Analysis of Variants. A previously described custom next generation sequencing (NGS) panel containing probes for human DNA was used in the DNA from Lody's whole blood and heart and from Kitty's blood to sequence the coding and splicing regions of 246 genes associated with cardiovascular disorders including arrhythmias, cardiomyopathies, congenital heart defects, aortopathy, connective tissue disorders, Noonan spectrum disorders, pulmonary arterial hypertension, metabolic disorders that afflict the heart and lipid disorders 22 . Briefly, paired-end sequencing was performed using an Illumina MiSeq sequencer (Illumina, Inc., San Diego, CA), followed by read alignment using the BWA software, local realignment, base quality recalibration, and variant identification using the GATK software, and variant annotation using ANNOVAR 22 . A 300× average depth of coverage was obtained among variants for the three samples.
Variants found in Lody's whole blood were divided into those found in the Human Gene Mutation Database (HGMD) and those not found in the HGMD (non-HGMD). The HGMD variants were classified as being pathogenic/likely pathogenic, pathogenic/likely pathogenic in autosomal recessive disorders, modifier, VUS, or benign/likely benign based on our interpretation of the literature 22 . In order to filter the non-HGMD variants (variants that did not have a previous association with human diseases) and find those with a higher likelihood of disease association, we selected variants that were in coding regions, excluded synonymous and nonsynonymous variants that were found in more than 2% of the populations in the 1000 Genomes browser, the NHLBI Exome Sequencing Project, and the gnomAD browser beta 20 , excluded variants that had a damage score lower than 4 (based on in silico software prediction of damage to protein structure/function) 22 , excluded synonymous and non-frameshift TTN variants, excluded variants that mapped to regions of known segmental duplications, and excluded non-frameshift variants that were found in more than 2% of the populations in the 1000 Genomes browser, in the NHLBI Exome Sequencing Project, or in the gnomAD browser beta. Subsequently, a subtractive analysis was performed to select only those resulting HGMD and non-HGMD variants of acceptable sequencing quality scores that were specific to Lody (not in Kitty, unless heterozygous in Kitty for a homozygous variant in Lody), as those in common between the two individuals were judged to be either unlikely to be clinically relevant or to be of low penetrance. Finally, the presence of each variant in the bonobo and in the chimpanzee genome was annotated as explained above. This analysis resulted in several VUSs but no pathogenic/likely pathogenic variants. VUSs found to have acceptable quality and sequence depth scores were carefully chosen (Table 1).
Scientific Reports | (2018) 8:4350 | DOI:10.1038/s41598-018-22334-5
Variants found in DNA extracted from Lody's FFPE heart but not in DNA extracted from Lody's blood were divided into those found in the HGMD and those not found in the HGMD (non-HGMD). Filtering and classification of variants was performed as explained above. This analysis resulted in several VUSs but no pathogenic/likely pathogenic variants. VUSs found to have acceptable quality and sequence depth scores were carefully chosen (Table 1).
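The non-HGMD filtering rules and the subtractive analysis described above can be summarized in a short sketch. This is not the pipeline used in the study: the record layout, field names, and effect labels are hypothetical, the three population databases are collapsed into a single maximum frequency (an assumption), and the damage score is taken as a precomputed integer.

```python
from dataclasses import dataclass
from typing import Iterable, List

FREQ_CUTOFF = 0.02      # "more than 2%" of a population
MIN_DAMAGE_SCORE = 4    # variants scoring below 4 are excluded
FREQ_FILTERED = {"synonymous", "nonsynonymous", "nonframeshift"}


@dataclass(frozen=True)
class Variant:
    key: str             # hypothetical identifier, e.g. "GENE1:c.10A>G"
    gene: str
    effect: str          # hypothetical labels, e.g. "nonsynonymous"
    coding: bool
    max_pop_freq: float  # assumed: highest frequency across the databases
    damage_score: int
    in_segdup: bool
    zygosity: str        # "het" or "hom"


def passes_filters(v: Variant) -> bool:
    """Apply the exclusion rules listed for non-HGMD variants."""
    if not v.coding:
        return False
    if v.effect in FREQ_FILTERED and v.max_pop_freq > FREQ_CUTOFF:
        return False
    if v.damage_score < MIN_DAMAGE_SCORE:
        return False
    if v.gene == "TTN" and v.effect in {"synonymous", "nonframeshift"}:
        return False
    if v.in_segdup:
        return False
    return True


def proband_specific(proband: Iterable[Variant],
                     control: Iterable[Variant]) -> List[Variant]:
    """Keep proband variants absent from the control, or homozygous in the
    proband while only heterozygous in the control."""
    control_by_key = {v.key: v for v in control}
    kept = []
    for v in proband:
        c = control_by_key.get(v.key)
        if c is None or (v.zygosity == "hom" and c.zygosity == "het"):
            kept.append(v)
    return kept


def candidate_variants(proband: Iterable[Variant],
                       control: Iterable[Variant]) -> List[Variant]:
    """Filter first, then subtract the control."""
    return proband_specific([v for v in proband if passes_filters(v)], control)
```

Under this reading, a variant with a damage score below 4 is removed by the filter regardless of whether it is specific to the proband.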

SNP Microarray Analysis.
Single nucleotide polymorphism (SNP) array analysis was performed on Lody's blood and FFPE heart DNA using the Affymetrix CytoScan HD array, which contains 1,953,246 human non-polymorphic markers and 743,304 human SNP markers (Affymetrix, Santa Clara, CA). Allele peaks and signal intensity log2 ratios were analyzed with the Chromosome Analysis Suite (ChAS) software (Version 2.0.0.195) (Affymetrix) to determine gains, losses, and copy-neutral absence of heterozygosity (CN-AOH). The settings used for calls were as follows: 60 marker counts, 300 kbp for gain; 60 marker counts, 300 kbp for loss; 3000 kbp for AOH. Data was analyzed based on the NCBI human genome build GRCh37/hg19.
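The call settings listed above amount to simple size and marker-count thresholds. The sketch below expresses them as a stand-alone filter; it does not use or reproduce the ChAS software, the record layout is hypothetical, and treating the thresholds as inclusive minimums is an assumption.

```python
from dataclasses import dataclass

MIN_MARKERS = 60      # marker count required for gain and loss calls
MIN_CNV_KBP = 300     # minimum size for gain and loss calls
MIN_AOH_KBP = 3000    # minimum size for AOH calls


@dataclass(frozen=True)
class Call:
    kind: str         # "gain", "loss", or "aoh" (hypothetical labels)
    markers: int
    size_kbp: float


def passes_call_settings(call: Call) -> bool:
    """Return True if a call meets the thresholds (assumed inclusive)."""
    if call.kind in ("gain", "loss"):
        return call.markers >= MIN_MARKERS and call.size_kbp >= MIN_CNV_KBP
    if call.kind == "aoh":
        return call.size_kbp >= MIN_AOH_KBP
    raise ValueError(f"unknown call type: {call.kind}")
```

For example, a 450 kbp loss supported by 80 markers would be retained under these settings, whereas a 2,000 kbp AOH segment would not.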