Over the past three decades, the combination of genetic analysis based on well-defined families with inherited disease, and pathobiochemical studies of aggregated brain proteins, has paved the way towards our present understanding of the causes and effects of many neurodegenerative diseases. One of the first disorders to be investigated in this way was Huntington’s disease (HD), where genetic linkage studies first established the HD locus on chromosome 4 in 1983 [27]. Nonetheless, it took a further 10 years to ascertain that linkage to this locus was due to a mutation in the huntingtin gene (HTT), this being a trinucleotide (CAG) repeat expansion in exon 1 of the gene [60]. Normal individuals carry up to 32 repeats, whereas individuals with HD bear in excess of 36 repeats. Success in this area quickly led to the identification in 1994 of similar disorders, such as dentatorubral-pallidoluysian atrophy (DRPLA) [36] and spinocerebellar ataxia type 1 (SCA-1) [5], also with CAG repeat expansions.

A breakthrough in Alzheimer’s disease (AD) research came in 1991 with the identification of missense mutations in and around the sequence region responsible for encoding the amyloid β peptide (Aβ) sequence component of the amyloid precursor protein (APP) gene on chromosome 21 [24], from which the peptide is derived. This was an entirely logical and expected finding once the peptide sequence of Aβ had been identified [23], and it had been recognized that persons with Down’s syndrome bearing an extra copy of chromosome 21 developed the pathology of AD by middle age in an entirely predictable and orderly fashion, initiated by the deposition of Aβ within the cerebral cortex [41]. Soon further causative mutations were identified in the presenilin-1 (PSEN1) gene in families showing linkage to chromosome 14 [56], quickly followed by the finding of mutations within a homologous gene, presenilin-2 (PSEN2) [54]. APP and PSEN1 form an active focus that affects the degradation of APP into Aβ, favouring an alternative cleavage pathway involving sequential cleavages by β- and γ-secretases, which cut the APP molecule at the amino-terminus and carboxy-terminus of the Aβ sequence to yield the peptide. Use of this pathway is normally prevented, or is at least minimized, by the employment of a further enzyme, α-secretase, which cuts APP across the mid-point of the Aβ peptide sequence, thereby precluding formation of Aβ. Most mutations in APP, and those in PSEN1, increase the use of this alternative pathway, elevating the production of Aβ to a point where it can then aggregate into the amyloid plaques characteristic of AD. Hence, genetics signposted the ‘amyloid cascade hypothesis’ [28], which despite its many critics still remains the most plausible explanation of the cause of AD to date.
The recognition that, in AD, the mutated genes were allied to the aggregated protein led to the identification of mutations in the α-synuclein gene (SNCA), given that the protein α-synuclein had been identified [59] as an integral component of the Lewy body, the classic neuronal lesion of Parkinson’s disease and dementia with Lewy bodies.

While this was happening, other groups were focusing on the genetics of FTLD, trading on the high familial prevalence (as much as 40 %) in individuals with this disease. Observations of the presence of tau protein within structures resembling the neurofibrillary tangles of AD, or in the so-called Pick bodies, led to the identification in 1998 of a causative locus on chromosome 17, for which mutations in the tau gene (MAPT) itself were unsurprisingly responsible [31]. Missense mutations outside exon 10 of MAPT favour the phosphorylation of tau, preventing it from normally binding microtubules, and diverting it towards aggregation. On the other hand, mutations in and around exon 10 promote the use of an alternative splicing site, which increases the generation of tau transcripts containing this exon [31]. When exon 10 is included, tau proteins with four microtubule binding repeats (4-R tau) are produced; exclusion leads to proteins with three repeats (3-R tau). Transcription of tau isoforms is tightly regulated, and disturbance of stoichiometry in favour of 4-R tau will lead to its excess production and intracellular aggregation [31]. While this, in itself, may not necessarily compromise microtubule function, the tau aggregates, as in AD, may exert toxicity, either as fibrillar or non-fibrillar (oligomeric) structures.

Nonetheless, there remained other families with linkage to chromosome 17 where mutations in MAPT could not be identified. For a long time thereafter, unusual causal genetic events in MAPT were unsuccessfully sought, though the lack of tau pathology in affected individuals should have directed efforts elsewhere. Eventually, it was found in 2006 [4, 14] that disease in such families was associated with insertions/deletions in the progranulin gene (GRN), located almost next door to MAPT! Most mutations drive a similar loss-of-function effect related to a haploinsufficiency of PGRN protein caused by nonsense-mediated decay of prematurely terminated RNA transcripts [4, 14]. The enthusiasm surrounding the discovery of this second gene for FTLD was, however, tempered by the knowledge that collectively MAPT and GRN mutations were unlikely to explain more than 20 % of genetic variation associated with inherited FTLD, and less than 10 % of all cases of FTLD. The hunt was on for further loci.

The discovery that the aggregated protein present in around 50 % of cases of FTLD, and in those cases with GRN mutation specifically, was the nuclear transcription factor TDP-43, and not PGRN itself (the latter observation would have been predicted given the haploinsufficiency state), combined with the observation that this same protein was contained in the inclusion bodies in motor neurons in ALS [2, 51], consolidated the long-known clinical association between FTD and ALS [30]. Moreover, it pointed towards a shared locus in families with FTD, ALS or FTD + ALS clinical phenotypes, known to be associated with linkage to chromosome 9 [48, 64]. Initially, the linked region provided by such families was wide, though eventually genome-wide association studies (GWAS) in ALS and FTLD narrowed it to a region containing just three genes [55, 65].

Finally, late in 2011, the chromosome 9 baby was born. This relates to a hexanucleotide [GGGGCC (G4C2)] expansion in a gene with unknown function, termed C9ORF72 [16, 53]. Since then, work on the gene in both health and disease has exploded, with around 300 publications on C9ORF72 currently being listed in PubMed. In the series of articles included in this issue of Acta Neuropathologica, the genetic structure of C9ORF72 is discussed in terms of how it might influence clinical phenotype (Cooper-Knock et al. and Woollacott and Mead). Pliner and colleagues comment on the origins and spread of the expansion. The tissue consequences of bearing the expansion are described and discussed by Neumann et al. Gendron and colleagues highlight possible toxic mechanisms associated with the expansion, and Stepto and colleagues critically evaluate present cell and animal models in terms of how these have furthered our understanding of the effects of carrying the expansion.

So what have we learned in the past 2 years? Firstly, we know that expansions in C9ORF72 account for 20–80 % of familial and 5–15 % of sporadic ALS and FTLD in North American and European populations [16, 53], making this the most common cause of disease in these groups so far identified. However, worldwide prevalence varies, with the highest numbers of cases seen in Northern Europe, decreasing towards Southern Europe, and with cases being uncommon in Asia, raising the question of a common founder [44], or repeated genesis due to inherent chromosomal instability (see Pliner and Cooper-Knock). Healthy individuals generally carry no more than 20 repeats (and usually only a handful of such), but in persons with inherited FTLD or ALS, the expansion is huge, running to as many as 2,000 repeats [16, 53]. Intermediate alleles of 20–30 repeats have been recorded [11, 25], though the status of these in terms of disease risk remains unclear (see Cooper-Knock and Woollacott and Mead). Apart from questions as to how the expansion translates into disease mechanism, there are other serious issues to be resolved and explained. For example, does the number of repeats affect age at onset or disease severity (as in HD)? This has been difficult to assess due to technical limitations in the resolving power of Southern blotting. Refined techniques now make determination of expansion length more accurate [6, 10, 62], but still no associations with age at onset or disease severity have been detected [17, 62]. However, such observations are confounded by regional mosaicism, with reports that longer repeat lengths in cerebellum are associated with a longer, not shorter, duration (see Cooper-Knock). Such somatic heterogeneity is consistent with an inherent degree of instability within the repeat, and evidence of anticipation has been reported, with decreasing age at onset in subsequent generations [13].

Secondly, given that there appear to be no consistent (group) differences in repeat length between ALS and FTLD cases [17, 62], it remains to be explained how the pathological or clinical phenotype (FTD vs. ALS) is determined. Expansion length variability, or genetic modifiers such as TMEM106B genotype (see articles by van Blitterswijk et al. and Gallagher et al. this issue), could explain phenotype, though there is much more to be done in this respect. Nonetheless, observations of clinical heterogeneity suggest gene modifiers which could indicate alternative therapies.

What do we know regarding how possession of the expansion might cause disease? We know that C9ORF72 is variably transcribed with at least three major transcripts being formed (see Stepto), and while there is evidence for loss of transcription with haploinsufficiency [7, 16], it is notable that rare cases of homozygosity for expansion in C9ORF72 do not display a more aggressive clinical phenotype, and appear not to accrue greater pathology [20]. This sets it apart from other disorders, where homozygosity yields a more severe, or even a different, phenotype (see TREM2 and Nasu-Hakola Disease vs. AD [26], or GBA-1 and Gaucher’s disease versus Parkinson’s disease [57]), thereby arguing against a haploinsufficiency state.

Investigations into the inheritance and cause of certain other neurological disorders, like spinocerebellar ataxia type 8, myotonic dystrophy types 1 and 2 (DM1 and DM2) and Fragile X Tremor/Ataxia Syndrome (FXTAS), have provided major clues as to pathogenicity. In these disorders, the major pathogenic function of repeat-expanded RNA arises from the formation of nuclear RNA foci [42, 49, 66]. As in these conditions, in C9ORF72 disease toxic RNA foci are formed from the expanded sequences through formation of hairpin structures, and these might sequester other RNA transcripts and binding partners [43] (see Gendron), thereby reducing key elements of transcription and protein production. Apoptosis, through engagement of caspase-dependent mechanisms, may then ensue (see Stepto). While this remains a plausible mechanism underpinning neurodegeneration, the critical partner(s) in crime remain to be identified. A growing list of potential candidates, such as hnRNP A3, hnRNP-H, Pur-alpha and ADARB2, has been reported from in vitro binding studies [18, 38, 46, 67], but which, if any of these, is guilty remains to be determined.

In spinocerebellar ataxia type 8 and myotonic dystrophy type 1, expanded (CTG) repeat sequences can be translated in the absence of an upstream AUG codon, a process known as repeat-associated non-ATG (RAN) translation [68]. This too occurs in C9ALS and C9FTLD with translation of the expansion occurring in both a sense and an antisense direction, leading to the formation and brain aggregation of all five possible dipeptide repeat proteins (DPRs) [3, 21, 45, 47, 67, 69]. Indeed, DPR inclusions can consist of any or all of the five possible species [3, 21, 39, 40, 45, 47, 67, 69]. These DPR proteins seem to be entirely specific to the expansion, not being seen in FTLD and ALS cases without expansions [39, 40]. Curiously, though, expansions are not specific for FTLD and ALS per se, with rare appearances in cases with pathologically confirmed corticobasal degeneration [40, 58], AD ([12, 29, 35], but see [15]), Creutzfeldt–Jakob disease [6], and even in apparently healthy controls [6, 13]. This raises issues of penetrance, or even preclinical disease [6] (see Cooper-Knock, Mead). Given that proteinaceous accumulations have figured strongly in the pathogenesis of AD, Parkinson’s disease, Creutzfeldt–Jakob disease, and in forms of FTLD associated with tau, TDP-43 and FUS, it is highly tempting to regard DPRs in this light, and attribute neurotoxic properties to them. However, while this is possible, it remains to be proven, at least in a disease context. In C9 FTLD/ALS the major cell types bearing DPR are within the cerebellar cortex (granule cells), hippocampus (dentate gyrus and CA2-4 pyramidal cells), and pyramidal neurons in deeper layers of the cerebral cortex, even the occipital cortex, and various subcortical structures [1, 9, 39, 40]. Nonetheless, the distribution of such DPR-containing cells is at odds with the known distribution of neurodegeneration in FTLD/ALS, which itself maps closely to that of TDP-43 inclusion distribution [39] (see Neumann).
Indeed, overlap of DPR and TDP-43 is confined to a few cells in the dentate gyrus, but interestingly it would appear that within such co-localisations DPR is innermost and TDP-43 outermost, suggesting DPR changes might predate, or even predispose to, TDP-43 pathology [39]. Despite this, it cannot be denied that the cells affected by DPR appear not to be those lost in FTLD/ALS. Consequently, whether these structures confer anything beyond diagnostic utility remains questionable. It is possible that DPR inclusion body formation is a potentially protective response to cope with soluble toxic protein species, as is possible in other neurodegenerative diseases. However, in the latter context, it is notable that those cells that are vulnerable to TDP-43 pathology and lost in FTLD and ALS never appear to show even occasional DPR aggregates, which might be expected if precursor soluble species were present at any stage in the cell death process. Hence, we have the alternative that the production of DPR proteins is a disease marker, but not a relevant factor in C9ORF72 disease pathogenesis.
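
Why RAN translation of the expansion gives exactly five dipeptide repeat species can be checked by reading the hexanucleotide in all three frames on both strands. The following is a minimal sketch, assuming the standard genetic code and the GGGGCC repeat unit given in the text; the function names are hypothetical and serve only for illustration. Because each product is a repeating dipeptide, the order of the two residues within a name is arbitrary, so dipeptides are compared as unordered pairs.

```python
from itertools import product

# Standard genetic code, DNA alphabet, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (the antisense strand)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def dipeptide_for_frame(unit, frame):
    """Translate two consecutive codons of a tandem hexanucleotide repeat.

    A 6-nt repeat unit spans exactly two codons, so each reading frame
    yields one repeating dipeptide. No start codon is required, which
    mirrors the non-ATG initiation described in the text.
    """
    tract = unit * 3  # enough repeats to read any of the three frames
    window = tract[frame:frame + 6]
    return CODON_TABLE[window[:3]] + CODON_TABLE[window[3:]]


def dpr_species(unit="GGGGCC"):
    """Map (strand, frame) to the dipeptide encoded by the repeat."""
    strands = (("sense", unit), ("antisense", reverse_complement(unit)))
    return {
        (strand, frame): dipeptide_for_frame(seq, frame)
        for strand, seq in strands
        for frame in range(3)
    }


species = dpr_species()
# Treat each repeating dipeptide as an unordered pair of residues.
unique = {"".join(sorted(d)) for d in species.values()}

for (strand, frame), dipeptide in species.items():
    print(f"{strand:9s} frame {frame}: poly-{dipeptide}")
print(f"{len(unique)} distinct dipeptide repeat species")
```

Six frame and strand combinations collapse to five species because glycine-proline is encoded by one frame on each strand; the sense strand contributes glycine-alanine and glycine-arginine, and the antisense strand contributes proline-alanine and proline-arginine.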

Therefore, at present we have three plausible, and indeed not mutually exclusive, hypotheses for neurotoxicity: haploinsufficiency, toxic RNA foci, and DPR proteins. In fact, studies have shown that cells accumulating toxic RNA foci again do not overlap with those accumulating DPRs [21, 43], suggesting these are complementary and not competing events. So where does this leave us?

A further line of argument could be that the expansion does not in fact determine, or direct, disease, per se, but in some way damages the brain, or reduces its capacity for resistance, thereby opening the door to those very same factors which might cause FTLD/ALS in non-C9ORF72-associated disease. There is good evidence in support of this line of argument. Firstly, although rare, pathologies other than FTLD/ALS, such as corticobasal degeneration [40, 58], AD ([12, 29, 35], but see [15]) and Huntington disease-like symptoms [6], have been associated with expansions in C9ORF72. Indeed, there has been one Belgian case with C9ORF72 expansion and clinical FTLD that lacks detectable TDP-43 pathology, but shows FTLD-UPS pathology [22]. Secondly, there appears to be no link between expansion size and TDP-43 histological type [40], and while FTLD-TDP type B (preponderance of neuronal cytoplasmic inclusions) is the most common TDP-43 histological subtype associated with C9FTLD and C9FTLD/ALS [39], a significant number of cases display type A histology [40], and rare cases with type C histology have been described [33, 50]. If expansions in C9ORF72 were to directly drive, or trigger, events leading to a TDP-43 proteinopathy, it is hard to reconcile this with the diversity of TDP histologies recorded. Thirdly, there is a higher than expected coincidence of C9ORF72 repeat expansions with mutations in other genes, namely GRN [19, 63] or MAPT [8, 34, 37, 63] (so-called oligogenic inheritance [61]), suggesting that another ‘hit’ may be necessary for clinical disease. Yet in these cases, apart from the DPR changes, either a TDP-43 proteinopathy or a tauopathy, typical of the accompanying (GRN or MAPT) mutation, prevails [8, 19, 34, 37, 63].

Consequently, until we know more about the normal function of C9orf72 protein, what cells it is present in, and where in the cell it is located, all is mere speculation. Nonetheless, the concept of the expansion being a risk factor for disease, rather than a cause, is tempting since it would neatly accommodate the observed diversity of clinical and histological subtypes associated with C9 FTLD/ALS. However, one problem would be to explain why expansions are not also (more widely?) seen in more common disorders like AD and Parkinson’s disease, though in these the frequent co-occurrence of limited TDP-43 proteinopathy could in some way stem from variations in C9ORF72 genetic structure or expression and allied protein function which render regions of frontal and temporal cortex vulnerable to a more limited TDP-43 proteinopathy. Nevertheless, it is clear that expansions in C9ORF72 are not without pathological effect, though what this might translate into in clinical terms is not at all clear, nor is it even certain that they have a clinical counterpart.

While the depth of knowledge regarding C9ORF72 increases apace, it is again worth remembering that present cases with the expansion, combined with those with GRN and MAPT mutations, may still only account for about half of autosomal dominant inherited disease. This implies that there may be other major gene(s) linked to disease out there, waiting to be discovered. However, having said that, recent linkage analysis of cases originally employed to show linkage to chromosome 9 still shows residual linkage, even when known C9ORF72 carriers are excluded [32]. Such observations imply either a second nearby locus (and there are precedents in the FTLD world for this with MAPT and GRN), or as yet undisclosed variability within the C9ORF72 locus itself. Our own observations of FTLD cases with expansions in C9ORF72 determined by both Southern blot and the presence of p62/DPR pathology [52], but not showing this on previously validated repeat-primed PCR [53], would imply an alternative sequence, and suggest that the present screening PCR methodology may not detect all cases with expansions. There may, therefore, be (possibly many) other cases that also bear (such variant) expansions in C9ORF72, and these could account for a substantial proportion of the remaining genetic variability within FTLD and ALS.

Thus at present, we are like the blind men describing the elephant as a whole from the various parts they can individually feel. We have an idea what it looks like, but not what it does, nor what happens when it goes wrong. We have firmly grabbed this particular tiger by the tail, but what lies at the head end, and how sharp its teeth and claws might be, remains a mystery. We presently have some clues, but there are many more questions to be answered before we know whether this particular beast actually roars or just plain whimpers.