Introduction

The influence of genetic variation on the pathogenesis of pediatric kidney disease extends from the earliest stages of kidney development in utero to conditions that arise throughout a child’s life. Most successful gene discovery efforts have come from studying rare, large kindreds with multiply affected members [1–4], groups of children who share a similar renal phenotype [5], or children with a chromosomal abnormality who have multiple anomalies, including those of the kidney [6]. Techniques such as linkage analysis, positional cloning, and karyotype analysis have defined the single gene or chromosomal region underlying numerous kidney conditions, ranging from autosomal recessive polycystic kidney disease [7] to branchio-oto-renal syndrome [5] to Wolf-Hirschhorn syndrome [8].

Major advances in genomic technologies and increases in computing power have resulted in the ability to generate, process, and store genotype and/or sequence data on a genome-wide level. There has been a parallel development of sophisticated analytical approaches to incorporate genomics data into the study of human disease, led by biostatisticians, bioinformaticians, and human geneticists. Importantly, these advancements have been accompanied by continually decreasing costs of sequencing, which have allowed more investigators to study a greater number of individuals.

These technologies have been successfully incorporated into existing family-based studies of Mendelian disease. In addition, the technologies and accompanying methodologies are allowing investigators to ask new sets of questions using large cohorts of unrelated individuals. These analyses include the role of low-frequency genetic variants or the contribution of copy number variants (CNV) in disease pathogenesis.

In this review, we will first define key terminology as it relates to describing types of genetic variants and methodologies for genomic inquiry. We will then highlight studies over the past year that have used these approaches to discover novel genes and loci associated with pediatric renal disease. We will also review studies that utilize risk variants, previously identified through genome-wide association studies (GWAS), to study the effects of these alleles in new populations or to identify the functional role of these variants in model systems. Finally, we will discuss how we believe genomic inquiry will evolve in pediatric kidney disease research moving forward.

Definitions for Genomic Inquiry

Prior to discussing recent studies that employ innovative genomics approaches to define genes and loci associated with pediatric kidney disease, it is important to develop a working understanding of the terms and methodologies employed in genomic discovery investigations (Table 1).

Table 1 Genomic discovery methodologies

Genomic Variants

Across the genome, nucleotide positions can be found that vary from the reference sequence in a certain percentage of the population. These can be referred to as “single nucleotide variants (SNVs).” Rare, disease-causing SNVs, absent in healthy subjects, were historically referred to as “mutations,” such as those in nephrin in congenital nephrotic syndrome [9] and alanine/glyoxylate aminotransferase in primary hyperoxaluria type 1 [10].

Beginning in the early 2000s, SNVs and their frequencies were systematically defined genome-wide in different populations. Millions of SNVs with a minor allele frequency greater than 5 % were discovered. These were called single nucleotide polymorphisms (SNPs) and were employed to perform GWAS. With the deeper sequencing of more subjects over the past decade, increasingly rare SNVs have been annotated. A common nomenclature that has arisen refers to SNVs of a frequency between 0.5 and 5 % as “low-frequency variants” and less than 0.5 % as “rare variants.” A generalizable observation is that there is an inverse relationship between the frequency of a disease-associated variant and the magnitude of its association with disease risk (the lower the frequency, the larger the effect size). SNVs may follow recessive, dominant, or additive modes of disease association.
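The frequency-based nomenclature above can be expressed as a simple binning rule. The following is a minimal sketch only: the function name `classify_variant` is hypothetical, and the handling of the exact boundary values (a frequency of exactly 5 % or exactly 0.5 %) is an assumption, since the text does not specify it.

```python
def classify_variant(maf: float) -> str:
    """Bin a variant by minor allele frequency (MAF), given as a fraction (0.05 = 5 %)."""
    if not 0.0 <= maf <= 0.5:
        raise ValueError("a minor allele frequency must lie between 0 and 0.5")
    if maf > 0.05:
        return "common"          # > 5 %: the SNPs used for GWAS
    if maf >= 0.005:
        return "low-frequency"   # 0.5-5 % (boundary handling is an assumption)
    return "rare"                # < 0.5 %


for maf in (0.20, 0.01, 0.001):
    print(f"MAF {maf:.3f}: {classify_variant(maf)}")
```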

Protein-Encoding Genes

Protein-encoding genes are composed of exons, with intervening introns, and also have non-coding features associated with them, such as promoters or enhancers, which regulate their transcription. Of the 3,572 disease phenotypes listed in the Online Mendelian Inheritance in Man (OMIM) database, the vast majority are due to either exonic variants or deletions or duplications of genes [http://omim.org/]. The traditional method for gene discovery has been identification of candidate causal genes followed by Sanger sequencing of their exons and splice sites. While incredibly successful and accurate, this approach is time-consuming, costly, and interrogates only one gene at a time.

Genomes and Exomes

Besides protein-encoding genes, other functional elements of DNA include promoters, enhancers, transcription factor binding sites, epigenetic marks, and genes that are transcribed into micro-RNAs and long non-coding RNAs [11•]. DNA also has structural variants, with duplications and deletions, ranging from less than 10 base pair insertion–deletions (indels) to entire chromosomes [12]. Finally, DNA has a three-dimensional structure, involving histones, chromatin, and physical interactions of DNA segments that are far away in terms of linear distance across the nucleotide sequence [12, 13]. Taken together, all protein-coding genes, as well as these other characteristics of DNA, comprise the genome, and the study of these elements is referred to as “genomics.”

Next Generation Sequencing Technologies

There are many children with kidney disease for whom a clinical phenotype or family history suggests that there is a genetic component. Many times there are no obvious candidate genes on which to perform Sanger sequencing, and either small family size or limited numbers of affected patients preclude traditional linkage or positional cloning approaches. In this situation, whole exome sequencing (WES), a method in which every exon of every protein-coding gene in an individual undergoes simultaneous next-generation sequencing (NGS), can discover rare, causal variants in a novel gene [14]. In other situations in which the clinical phenotype of an unsolved disease suggests a certain mechanism of disease (e.g., a ciliopathy), targeted exon capture can be used to select large numbers (tens to hundreds) of hypothesis-based candidates for sequencing, rather than all 20,000 genes, to discover a causal variant [15]. This method can also be used to determine the prevalence of variants in a group of known disease genes in individuals with a genetically heterogeneous disease.

Whole genome sequencing (WGS), in which every nucleotide is sequenced, provides additional information about variants in non-coding, yet conserved regions of the genome. It can also determine copy number variants (CNV) in intergenic regions that may contribute to disease. However, its cost and the complexity of analyzing a dataset that is ~50× larger than that of WES have thus far precluded its widespread use.

Genotyping Arrays

Another major tool of genomic discovery is the genome-wide SNP genotyping array or “SNP array.” These can include millions of SNPs across the genome, or can be customized to SNPs that have a particular functionality or association with a certain type of disease. SNP arrays are cost efficient, allowing investigators to study genomic variation in large cohorts of affected individuals across the genome in an undirected manner. In this way, investigators can identify risk loci and follow up these leads through fine mapping of the implicated regions.

SNP arrays can also detect CNV [16], segments of the genome that have either more or fewer copies as compared to the reference sequence. These can be detected at a resolution down to 1 kb [17•]. CNV can be inherited or occur de novo, and can be benign or deleterious [18, 19]. CNV have been associated with disease through mechanisms such as direct changes in the dosage of a particular gene, or alteration of regulatory sites of the genome. CNV can also be detected by array comparative genomic hybridization (aCGH) [20].

We will now highlight studies that have emerged over the past 12 months that illustrate how investigators in the human genetics of kidney disease have taken advantage of each of these technologies to (1) improve their ability to discover new genes causing rare, Mendelian diseases, (2) search for more common variants associated with complex diseases, or (3) study known genomic risk loci in different populations, different phenotypes, or in model systems.

Rare, Causal Variants

Pseudohypoaldosteronism Type II (PHAII), a rare Mendelian disease characterized by hypertension, hyperkalemia, and metabolic acidosis, has been previously found to be caused by mutations in a class of genes called WNK kinases [21]. However, many affected subjects remain who lack variants in these genes. Boyden et al. [22•] recruited 52 patients with PHAII and their immediate families, of which only 7 had WNK mutations, to discover new genes causing this disease. They performed whole exome sequencing (WES) on 11 unrelated affected individuals to discover rare variants in novel genes, and found 124 genes with three or more protein-altering variants. At the same time, they performed SNP genotyping of the index cases and their families and determined that variants in 28/124 genes segregated with the disease state in two or more families. Using their knowledge of the physiology of PHAII and the distribution of rare variants in these candidates, they focused on kelch-like 3 (KLHL3), which had both heterozygous and homozygous variants in 5 of the initial 11 kindreds undergoing WES. Targeted sequencing of this gene in the affected members of all 52 kindreds found KLHL3 mutations in 24 of the investigated kindreds. A main binding partner of KLHL3 is cullin 3 (CUL3), and rare heterozygous variants in this gene were found in 2/11 individuals initially undergoing WES. Thus targeted resequencing of CUL3 was performed in all unsolved cases, and mutations were found in 17 individuals.
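The first filtering step described above, retaining genes that carry three or more protein-altering variants across the sequenced exomes, can be sketched as follows. This is an illustrative sketch rather than the authors' pipeline: the function name `candidate_genes`, the input layout, and the placeholder identifiers are hypothetical, and counting distinct variants per gene (rather than distinct carriers) is an assumption.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple


def candidate_genes(variant_calls: Iterable[Tuple[str, str, str]],
                    min_variants: int = 3) -> List[str]:
    """Return genes carrying at least `min_variants` distinct rare, protein-altering variants.

    Each call is (sample_id, gene, variant_id); the calls are assumed to have been
    pre-filtered to rare, protein-altering variants.
    """
    variants_by_gene = defaultdict(set)
    for _sample_id, gene, variant_id in variant_calls:
        variants_by_gene[gene].add(variant_id)  # a set ignores repeat observations
    return sorted(g for g, v in variants_by_gene.items() if len(v) >= min_variants)


# Placeholder data: sample, gene, and variant identifiers are invented for illustration.
calls = [
    ("S1", "GENE_A", "v1"), ("S2", "GENE_A", "v2"), ("S3", "GENE_A", "v3"),
    ("S1", "GENE_B", "v4"), ("S4", "GENE_B", "v4"),
]
print(candidate_genes(calls))
```

In the study, the resulting list was then narrowed further by requiring segregation with disease in families and by physiological plausibility; neither step is modeled here.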

KLHL3 and CUL3 are present in the distal nephron and act together to ubiquitinate targets. They are posited to cause the PHAII phenotype through ubiquitination of substrates that activate the Na–Cl cotransporter and affect K+ and H+ handling. In this study, patients with CUL3 mutations were found to be younger at presentation, with more severe hyperkalemia and acidosis, as well as growth impairment. Subjects with KLHL3 mutations had a disease severity in between those with causal variants in CUL3 and WNK.

Two independent groups used similar methodologies to discover rare variants in myosin 1E (MYO1E) as a cause of focal segmental glomerulosclerosis (FSGS) [23, 24•]. Both groups identified a consanguineous kindred that had three siblings with FSGS. By genotyping the family through high-density SNP arrays and performing both linkage analysis and homozygosity mapping (an approach to find autosomal recessive disease genes), the investigators were able to identify a region on chromosome 15 presumed to harbor the disease-causing variant. In one study, WES had already been performed on these subjects, so analysis was restricted to the 263 positional candidates at this locus. In the other study, targeted exon capture followed by NGS was performed on the 112 genes at the locus. Variant analysis and bioinformatic query identified MYO1E, which harbored homozygous variants, as the most likely causal gene for FSGS in this family. Rare variants in this gene were subsequently found in an unrelated family, and functional studies using cell cultures and model systems confirmed MYO1E as a newly discovered cause of FSGS [24•]. These studies showed that MYO1E, a non-muscle myosin, localizes to the podocyte and affects actin cytoskeleton dynamics, and when knocked out, causes nephrotic syndrome (NS) in mice [24•].
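The core idea of homozygosity mapping, scanning ordered markers for stretches in which all affected relatives are homozygous for the same allele, can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the method used in the cited studies: genotypes are coded as alternate-allele counts (0, 1, 2), markers are assumed to be ordered along one chromosome, and the function name and the `min_markers` threshold are hypothetical. Real analyses also account for genotyping error, marker spacing, and physical or genetic distance.

```python
from typing import List, Tuple


def shared_homozygous_runs(genotypes: List[List[int]],
                           min_markers: int = 3) -> List[Tuple[int, int]]:
    """Find runs of consecutive markers where every affected individual is
    homozygous for the same allele.

    genotypes[i][j] is the alternate-allele count (0, 1, or 2) of individual i
    at marker j. Returns inclusive (start, end) marker index ranges.
    """
    n_markers = len(genotypes[0])
    runs = []
    start = None
    for j in range(n_markers):
        calls = {individual[j] for individual in genotypes}
        # Shared homozygosity: one distinct call, and it is 0 or 2 (never 1).
        shared = len(calls) == 1 and calls <= {0, 2}
        if shared and start is None:
            start = j
        elif not shared and start is not None:
            if j - start >= min_markers:
                runs.append((start, j - 1))
            start = None
    if start is not None and n_markers - start >= min_markers:
        runs.append((start, n_markers - 1))
    return runs


# Three hypothetical affected siblings typed at eight ordered markers.
siblings = [
    [0, 1, 2, 2, 0, 2, 1, 0],
    [0, 2, 2, 2, 0, 2, 1, 0],
    [1, 1, 2, 2, 0, 2, 0, 0],
]
print(shared_homozygous_runs(siblings))
```

Regions flagged this way define the positional candidates, which sequencing then narrows to individual variants.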

Congenital anomalies of the kidney and urinary tract (CAKUT) have been associated with rare variants in numerous genes, discovered both in children with this condition and in animal models of CAKUT. This represents an ideal condition for targeted exon sequencing to test in many children at once whether variants in these known disease genes, or in biologically plausible genes, contribute to the CAKUT phenotype. Saisawat et al. [25•] pooled the DNA from 40 children with non-syndromic CAKUT, 29 with unilateral renal agenesis (URA) and 11 with other forms of CAKUT. Simultaneous PCR of 313 exons of 30 candidate URA genes was performed, followed by NGS and variant filtering and analysis. After numerous filtering steps, heterozygous variants were confirmed in 7/40 subjects, in two known human CAKUT genes, RET and BMP4, as well as two novel ones, FRAS1 and FREM2. Homozygous rare variants in FRAS1 and FREM2 have been implicated in URA in the context of Fraser syndrome, but have not been previously implicated in non-syndromic CAKUT. The results of this study may lead to consideration of these genes as a more prevalent cause of non-syndromic CAKUT.

Copy Number Variants (CNV)

The pathogenic role of CNV in pediatric renal disease has been most apparent in CAKUT, with a previous report of four large pathogenic CNV in 3/30 children who had CAKUT plus an additional extrarenal anomaly [19]. These large CNV contained hundreds of genes and regulatory regions that require further investigation to determine the relationship with CAKUT. A recently published report builds upon this observation and expands its scope.

Sanna-Cherchi et al. [26•] used SNP arrays to examine the burden of rare CNV, >500 kb, in 192 patients with renal agenesis and/or hypodysplasia (RHD). Known and novel disorders of genomic copy number were discovered in this cohort and then replicated in 330 RHD cases from two independent cohorts. As compared to CNV of controls, subjects with RHD were found to have larger CNV and enrichment of duplications versus deletions. There were 34 known genomic disorders detected in 55/522 cases, including renal cysts and diabetes syndrome, Potocki-Lupski syndrome, and Wolf-Hirschhorn syndrome. In addition, another 40/522 cases had large, gene-disrupting CNV that were absent or extremely rare in 13,838 controls, thus identifying 38 novel or rare genomic disorders for RHD. Altogether, this indicates that about 18 % of subjects with CAKUT have a large, rare CNV.
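The 18 % figure follows directly from the case counts quoted above: the 522 cases are the 192 discovery and 330 replication subjects combined, and the numerator adds the cases with known and with novel or rare genomic disorders. A short check, using only numbers from the text (the function name is hypothetical):

```python
def cnv_burden(known_cases: int, novel_cases: int,
               discovery_n: int, replication_n: int) -> float:
    """Fraction of all cases carrying a large, rare CNV (known or novel disorder)."""
    total_cases = discovery_n + replication_n
    return (known_cases + novel_cases) / total_cases


fraction = cnv_burden(known_cases=55, novel_cases=40,
                      discovery_n=192, replication_n=330)
print(f"{fraction:.1%} of cases carry a large, rare CNV")
```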

Genome-Wide Association Studies (GWAS)

The discovery of rare, causal variants associated with pediatric kidney diseases has been invaluable. However, because of their rarity, these specific variants will not impact the risk of disease across a population. To identify common variants associated with increased risk of disease in populations, a GWAS can be employed. Once discovered, the higher frequency of these risk variants allows them to be studied in other populations. Specific questions can be asked about how a given risk SNP impacts other populations, disease phenotypes, or outcomes.

Encouragingly, in the past 4 years, GWAS has discovered risk alleles associated with glomerular diseases such as FSGS [27, 28], membranous nephropathy (MN) [29], IgA nephropathy (IGAN) [30], and acquired nephrotic syndrome [31], as well as with phenotypes relating to kidney function, such as estimated glomerular filtration rate (eGFR), chronic kidney disease (CKD) [32], and albuminuria [33]. The GWAS of FSGS and MN are notable in that variants of large effect size were identified using small but histologically very well-characterized cohorts of affected individuals. Identification of specific risk SNPs in all of these phenotypes has allowed them to become the focus of numerous follow-up investigations. We will discuss some of these pertinent studies that relate to the risk alleles associated with FSGS and eGFR/CKD.

Initial GWAS of African-Americans (AA) with FSGS and hypertensive ESRD implicated a chromosome 22 locus with greatly increased risk of these conditions [27, 34]. MYH9 was the initial gene implicated as harboring the causal risk variant for these phenotypes in AA. Four years later, it is known that while MYH9 is associated with modestly increased odds of disease, it is in fact variants in apolipoprotein L1 (APOL1) at this locus that confer the majority of the increased risk of FSGS in this population [28]. The odds ratio (OR) for FSGS in individuals with two copies of the risk allele is 17 [35]. In the past year, APOL1 risk alleles have been studied in adults to characterize their impact on a variety of kidney phenotypes and clinical decisions.
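For readers less familiar with the statistic, an odds ratio under a recessive model compares the odds of carrying two risk alleles among cases with the same odds among controls. The sketch below shows the calculation from a 2×2 table, with an approximate 95 % confidence interval based on the standard error of the log odds ratio. The counts are invented for illustration and are not taken from any APOL1 study; the function name is hypothetical.

```python
import math
from typing import Tuple


def odds_ratio(case_exposed: int, case_unexposed: int,
               control_exposed: int, control_unexposed: int) -> Tuple[float, float, float]:
    """Odds ratio with an approximate 95 % confidence interval from a 2x2 table.

    "Exposed" here means carrying two risk alleles (recessive model).
    """
    a, b, c, d = case_exposed, case_unexposed, control_exposed, control_unexposed
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    estimate = (a * d) / (b * c)
    # Standard error of log(OR) from the four cell counts.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - 1.96 * se_log)
    upper = math.exp(math.log(estimate) + 1.96 * se_log)
    return estimate, lower, upper


# Hypothetical counts: 60 of 100 cases and 10 of 100 controls carry two risk alleles.
estimate, lower, upper = odds_ratio(60, 40, 10, 90)
print(f"OR = {estimate:.1f} (95 % CI {lower:.1f} to {upper:.1f})")
```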

In a disease-based cohort study, FSGS associated with two APOL1 risk alleles was associated with earlier onset of disease and faster progression of disease [36•]. The same study showed that having two risk alleles also puts an AA at a 4 % lifetime risk for FSGS and explains 18 % of FSGS in AA. The presence of two copies of the APOL1 risk variants in AA adults was associated with microalbuminuria and eGFR <60 ml/min/1.73 m² [37], increased risk of hypertensive end-stage nephropathy and rate of kidney functional decline [38], and lower age at initiation of dialysis in those with non-diabetic ESRD [39, 40]. Finally, it was shown that in kidney transplants, allograft survival was shorter if the AA donor had two APOL1 risk alleles [41], but was not affected by the APOL1 status of the recipient [42].

With regard to loci that impact eGFR and CKD, Kottgen et al. performed two GWAS in 2009 and 2010 on more than 85,000 adults of European ancestry and discovered common variants in more than twenty loci associated with eGFR, prevalent CKD, or creatinine production/secretion [32, 43]. As opposed to the two APOL1 risk alleles in FSGS that had large OR, all of these risk variants for CKD/eGFR together explain less than 2 % of the variance seen in eGFR in individuals. As with APOL1, follow-up investigations of these validated eGFR/CKD SNPs have focused on identifying the effects of these variants in other populations and in other phenotypes. There has also been interesting work done to use existing biological knowledge and functional studies both to prioritize risk loci for further investigation and to characterize the role of these SNPs and their associated genes in disease pathogenesis.

eGFR/CKD risk SNPs were applied to other phenotypes. SNPs associated with a lower eGFR were not found to be associated with urinary albumin to creatinine ratio or albuminuria [44•]. However, a SNP tagging the gene cubilin was found via GWAS to be associated with albuminuria [31], and it was also subsequently found to be associated with ESRD in native kidneys and graft failure in transplanted kidneys [45]. The majority of eGFR-associated SNPs were shown to be associated with or show a strong trend towards association with incident CKD, but were not strongly associated with ESRD [46].

In terms of clinical translation of GWAS findings in CKD, investigators created a genetic risk score composed of 16 SNPs associated with stage 3 CKD [47•]. They tested its ability to predict incident stage 3 CKD in a previously collected prospective cohort. They found that the score performed no better than known clinical risk factors.
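A genetic risk score of this kind is typically built by summing, across the selected SNPs, the number of risk alleles each individual carries, optionally weighting each SNP by its effect size. The sketch below illustrates that general construction only; the text does not state whether the cited score was weighted, so the default of an unweighted sum, the function name, and the example genotypes are all assumptions.

```python
from typing import Optional, Sequence


def genetic_risk_score(risk_allele_counts: Sequence[int],
                       weights: Optional[Sequence[float]] = None) -> float:
    """Sum risk-allele counts (0, 1, or 2 per SNP), optionally weighted per SNP.

    With weights=None every SNP contributes equally (an assumption, not
    necessarily the construction used in the cited study).
    """
    if any(count not in (0, 1, 2) for count in risk_allele_counts):
        raise ValueError("each risk-allele count must be 0, 1, or 2")
    if weights is None:
        weights = [1.0] * len(risk_allele_counts)
    if len(weights) != len(risk_allele_counts):
        raise ValueError("need exactly one weight per SNP")
    return sum(count * weight for count, weight in zip(risk_allele_counts, weights))


# Hypothetical individual genotyped at 16 risk SNPs.
counts = [0, 1, 2, 1] * 4
print(genetic_risk_score(counts))
```

Whatever the construction, the clinically relevant question is the one the investigators asked: whether adding the score to known clinical risk factors improves prediction.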

Finally, multiple studies are integrating eGFR/CKD GWAS data with biological functional data, such as gene expression data or gene ontology analysis. One successful study used this approach to help prioritize novel risk SNPs for kidney function [48•]. Another study determined the genes linked to the top SNPs implicated in a GWAS of CKD, and then used morpholino knockdown of these genes in zebrafish to illustrate the importance of these genes to the kidney phenotype [49, 50].

Future Directions

In both the near and long term, the future of genomic inquiry in pediatric nephrology is exciting (Table 2). Aided by the ability to successfully perform gene discovery studies using small kindreds or single affected individuals, the genetic lesions underlying other Mendelian forms of pediatric renal disease will be discovered using WES and WGS. In addition, rare variants in highly conserved, non-coding or regulatory regions of the genome may be implicated in some inherited forms of renal disease. The deleterious effects of non-coding variants are becoming increasingly evident in other recently published work [51, 52].

Table 2 Near- and long-term future directions for genomic research of pediatric kidney disease

WES-based or WGS-based studies of unrelated children with the same clinical phenotype will identify rare SNVs associated with disease. Additionally, meaningful genetic risk variants may be found in association studies utilizing relatively small cohorts of well-phenotyped children with non-Mendelian forms of kidney disease. The discovery of significant risk variants with large effect sizes in a GWAS of FSGS using 190 subjects [27] and of MN in a 75-subject cohort [29] supports this concept. Genotyping children for the risk SNPs discovered from adult GWAS, such as those associated with eGFR and CKD [32, 43] and IGAN [30], may help us to understand the genetic influence on these diseases in pediatric patients.

We will capitalize on large, deeply phenotyped cohorts of children with non-Mendelian, complex disease such as NS in the Nephrotic Syndrome Study Network (NEPTUNE) [53] or CKD in the Chronic Kidney Disease in Children (CKiD) Study [54]. Genotyping arrays could identify novel disease-associated loci using either GWAS or specialized pathway-based SNV analyses. Genome-wide sequencing or targeted NGS efforts can be used as well to discover novel rare variants. These cohorts would also allow the study of the effects of known risk variants previously identified in adult studies.

Finally, we will continue to integrate variant data with previous biological knowledge and newly generated gene expression information to expand our understanding of the genomic influence on the pathogenesis and progression of pediatric kidney disease. Recent results from the ENCODE project indicated that 80 % of the human genome has a functional role [55•]. The series of studies that used eGFR/CKD GWAS data demonstrated that defining whether a non-coding SNP from a GWAS has a functional effect can both narrow down a list of candidate SNPs as well as elucidate the biological effect of the particular variant on the disease under investigation [43, 48, 49].

This integrative, functional genomics approach will also be employed by NEPTUNE [53] to understand the role of regulatory variation in the pathogenesis of NS. Subjects with NS will undergo WGS, targeted exon sequencing, and exome chip genotyping, as well as generation of gene expression data derived from glomerular and tubulointerstitial tissue obtained via renal biopsy. With these DNA- and RNA-based datasets, subjects with known or novel genetic risk variants will be identified, and the effects of these variants on gene expression in their own diseased kidney tissue can then be functionally characterized. This information will then be integrated with clinical data to connect coding and non-coding genetic variation to clinical phenotypes and outcomes. In the future, this approach could be applied to many pediatric renal diseases in cohorts of varying sizes.

Conclusion

In this era, it is apparent that almost every child with a kidney condition can be enrolled in some form of genomic investigation. Family-based studies can now be performed in kindreds of all sizes to discover rare, causal variants for conditions such as disorders of bone and mineral metabolism or nephronophthisis. A child with a particular phenotype, such as renal dysplasia, that has not been evaluated genetically can be enrolled in a cohort with similarly affected, unrelated children. If a child has a phenotype (such as steroid-resistant nephrotic syndrome or atypical hemolytic uremic syndrome) with known disease-causing genes, these genes can all be screened simultaneously to achieve a genetic diagnosis. For those children who remain undiagnosed, further discovery efforts can be made in a genome-wide manner or by analyzing hundreds of candidate genes simultaneously. Children with sporadic or syndromic forms of CAKUT may benefit from undergoing an analysis for CNV. Finally, for children affected by conditions that do not follow simple Mendelian inheritance patterns, such as NS, CKD, or nephrolithiasis, enrollment into a cohort with similar individuals could allow unique approaches that incorporate common and rare sequence variants, structural variants, and/or gene expression data to discover risk alleles.

Moving forward, efforts should be made to increase awareness that many children with a renal phenotype may have a genetic component to their conditions. It is important to help clinicians determine whether a clinical- or research-based genetic study is indicated. From a research perspective, the process of connecting patients and families with a particular condition to the appropriate investigator for the optimal genetic test should be made easier. By identifying the genomic contribution to pediatric renal disease in increasing numbers of children, diagnostic tests can be improved to include those genes known to be involved in the disease, genotype–phenotype correlations can be discovered or refined to inform clinical decision making, and key biological pathways and networks can be identified and targeted for therapeutic development and intervention. It is with these advances that the discoveries from genomic research can be increasingly translated into genomic medicine for children with kidney disease.