Deviating from the norm

At first glance the nucleus is a highly conserved organelle. Overall nuclear morphology, the octagonal nuclear pore complex, the presence of peripheral heterochromatin and the nuclear envelope appear to be near-constant features right down to the ultrastructural level. New work is revealing significant compositional divergence within these nuclear structures and their associated functions, likely reflecting adaptations and distinct mechanisms between eukaryotic lineages, especially the trypanosomatids. While many examples of mechanistic divergence currently lack obvious functional interpretations, these studies underscore the malleability of nuclear architecture. I will discuss some recent findings highlighting these facets within trypanosomes, together with the underlying evolutionary framework, and make a call for the exploration of nuclear function in non-canonical experimental organisms.


Background
Eukaryogenesis, one of the most significant events in life history, was the transition of prokaryotic into eukaryotic forms, and encompassed the origins of most eukaryotic organelles [1–4] (Figure 1). Eukaryotes likely arose from archaebacterial ancestors and the splendidly named Asgard archaea represent, at the genome level, the closest known relatives [5,6], albeit with no evidence of internal membranous cellular structures [7]. Important eukaryogenesis events are the origin and diversification of intracellular membrane-bounded organelles, including the nucleus. The critical proteins involved in organelle identity, construction and maintenance are largely derived from paralogous families [8]. Significantly, the evolution of protein transport vesicle coats and membranous organelles has occurred more than once, as such features are prominent within some bacteria, but these organisms are not ancestral to the eukaryotic lineage [9].
The ultimate product of eukaryogenesis, the last eukaryotic common ancestor (LECA), was a complex flagellate [10] that diversified into the modern extant groupings. Among these, trypanosomes (Euglenozoa) represent the earliest branch and contain a great diversity of unicellular organisms. The Euglenozoa include Diplonema (an abundant group of aquatic organisms), Euglenids (mainly photosynthetic freshwater forms) and Kinetoplastida (predominantly parasitic species). Kinetoplastida in particular display considerable divergence, albeit frequently as examples of extreme utilisation of a particular mechanism rather than complete novelty [11]. For example, trypanosomes use an unusual trans-splicing mechanism for mRNA maturation, where a 39 nucleotide spliced leader is ligated to the 5' end of maturing messages, and which, coupled with a polycistronic mode of transcription, removes much of the influence of promoter elements from transcriptional regulation. Trypanosome mRNA maturation relies almost exclusively on trans-splicing, whereas elsewhere in eukaryotes trans-splicing occurs in combination with cis-splicing [12]. Despite parasitic lifestyles, which invoke concerns of secondary losses due to reliance on host resources, multiple divergent features are present across the Euglenozoa, including the free-living Diplonema, suggesting that many of these aspects arose at the root of the Euglenozoa or even earlier and hence are not a result of parasitism per se, which implies broader relevance. Most mechanistic studies of Euglenozoa focus on a small number of human pathogens, prominently Trypanosoma brucei, the African trypanosome, a haemoflagellate with a complex life cycle. Conventional interpretations of trypanosome divergence are that these reflect features from the LECA as well as specific adaptations from later evolutionary events. An alternative interpretation is that trypanosomes represent more closely the configuration in the LECA, with other lineages having implemented distinct systems post divergence from trypanosomatids. Taking several recent examples within nuclear biology, I will consider these possibilities.

Trypanosomes are the analog kids
Multiple widely conserved factors are absent from trypanosomes, including the LINC complex, lamin-binding proteins, canonical Cajal bodies, much of the nuclear envelope protein cohort and many mRNA processing complexes [13*,14,15*,16]. However, a plethora of core nuclear functions are supported by analogs, i.e. proteins/protein complexes with equivalent functions but lacking evidence for shared ancestry. Prominent examples are the lamina, kinetochore and several subcomplexes of the nuclear pore complex (NPC). Briefly, there are no lamins in trypanosomes but nuclear integrity, telomeric positioning and silencing are supported by two large coiled-coil proteins, NUP-1 and NUP-2, that form oligomers and connect with the NPC [13*,14]. Similarly, the kinetochore is composed of kinetoplastida-specific proteins, albeit with a level of organisation into inner and outer complexes that resembles the canonical form [17*]. Both of these examples are probably specific to the kinetoplastida and do not extend into the euglenids [12], which suggests an origin following the split between the Euglenozoa and the main eukaryote lineage. Adaptations to the nuclear pore complex are also recognised, specifically a symmetric arrangement of the FG-repeat nucleoporins (distinct from animal and fungal NPCs, where the arrangement is asymmetric), absence of a cytoplasmic mRNA export platform and its possible replacement by cytoplasmic mRNA processing structures or nuclear peripheral granules, together with novel nuclear basket protein components [18–20]. Once more these analogs may be specific to the kinetoplastida, but in silico evidence is equivocal and experimental evidence lacking, and hence these adaptations may extend more widely. Regardless, these examples underscore the distinct nuclear organisation in trypanosomes, and more recent work is extending these observations to provide additional examples of divergence.
Mex67 is a conserved mRNA transport factor and functions within a heterodimeric complex that also includes Mtr2p (yeast nomenclature) and which binds to mRNA. In animals, fungi and higher plants Mex67:Mtr2p acts in an ATP-dependent manner to facilitate mRNA export. There is no evidence for an interaction between Mex67 and the GTPase Ran that modulates protein transport across the NPC. Trypanosomes possess three orthologs of Mex67, two of which have canonical architecture while the third is somewhat divergent [21**]. Rather uniquely, all exploit a Ran-dependent export mechanism, with division of labour between the two canonical Mex67 paralogs in mRNA transport, while the third paralog associates with both ribosomal assembly proteins and rRNA, indicating an mRNA-independent function (see Ref. [22] for a detailed discussion).
Figure 1. Timeline of life on Earth and divergence in nuclear structure. Panel a: Key events from the origin of life to the present day. Life originated over four billion years ago, with eukaryotes appearing up to two billion years ago (although this is contested). It is accepted that eukaryotes arose from Archaea, and a transitional period saw the emergence of the eukaryotic cellular bauplan, likely via a combination of paralog gene expansions, acquisition of the mitochondrion (not shown) and other processes. Euglenozoa, which contain the trypanosomatids, most likely arose as one of the earliest eukaryotic lineages, with fungi and metazoa among the more recent. For scale, Tyrannosaurus rex, a late Cretaceous (Maastrichtian) species, arose ~65 million years ago. Numbers are millions of years. The figure is intentionally simplistic and the reader is referred elsewhere for insights into the complexity of this topic. Panel b: Range of divergence in nuclear components. Key nuclear factors are indicated above the line, which is coloured blue to red for conserved to part replaced (amalgam) to absent. Significantly, several processes, such as mRNA processing and chromatin modification, appear to be an amalgam of conserved and lineage-specific proteins.
mRNA maturation involves multiple interacting complexes, including Spt-Ada-Gcn5 Acetyltransferase (SAGA), TRanscription-EXport (TREX) and TREX-2. SAGA is a coactivator of transcription (with multiple roles and over 18 subunits) and interacts with TREX-2 (five subunits), a nuclear pore complex-associated complex involved in genome stability, mRNA biogenesis and export. Significantly, TREX-2 and SAGA share the Sus1 protein. TREX possesses seven or more subunits and associates with both RNAP II and the mRNA export machinery. Together this trio of multisubunit complexes supports a plethora of functions coordinating gene expression in a wide context. Comparative genomics suggests that the subunits of SAGA, TREX and TREX-2 (with the exception of Sub2) are either absent or of such divergence as to be unrecognisable in trypanosomes. The former possibility was supported by proteomics analysis of mRNA maturation complexes, which failed to identify the absent subunits [15**]. Moreover, this work also identified a considerable cohort of trypanosome-specific proteins, suggesting that analogs are functioning to process and validate mRNA for export [15**]. Overall, these examples suggest a conserved core of proteins involved in mRNA export, including Mex67, Sub2 and many nucleoporins, but surrounded by a constellation of lineage-specific proteins. A combination of polycistronic transcription and trans-splicing essentially removes the role of promoters and the need for complex mechanisms to both mediate and validate splicing in trypanosomes, as essentially there is a single splicing event common to all protein coding messages. Some workers, including this author, have argued that an exclusive use of trans-splicing is the driver behind the specialisations at the NPC, within mRNA maturation complexes and elsewhere [22], but when taken together the divergence between kinetoplastids and remaining eukaryotic lineages may speak to something deeper in terms of evolutionary origins. Mechanistically there is little that differentiates trans- and cis-splicing, and arguments for secondary losses due to 'no frills' splicing pathways may be over-simplistic in light of the presence of so extensive a cohort of lineage-specific proteins, which essentially excludes simplification within the trypanosomes.

Moving pictures: heterochromatin and transcriptional regulation
The presence of polycistronic transcription and transcriptional start sites (TSSs) responsible for recruitment of RNAP II in trypanosomes is well documented, with clear evidence that histone modifications regulate genome activity [23*,24*]. Similarly to higher eukaryotes, post-translational modifications or replacement of canonical histones with variants leads to alterations in nucleosome packing and provides binding sites for specific factors. Trypanosomes contain one variant of each of the four core histones [25,26], with TSSs enriched with H2A.Z and hyperacetylated nucleosomes.
A significant array of histone modifications is present in trypanosomes, including acetylation, methylation and phosphorylation marks [27*]. Significantly, the canonical H3K9 methylation site, which in higher eukaryotes is critical to specifying heterochromatin, is absent but a lysine is present at position 10 and can be methylated [27*], albeit at low abundance, suggesting that the histone code for heterochromatinisation in trypanosomes is likely distinct. Histone acetylation events are part of the mechanism defining the TSS. Specifically, depletion of one of two MYST-class acetylases leads to loss of TSS-associated H4 acetylation and H2A.Z deposition and to altered RNAP II initiation sites. Depletion of the second MYST acetylase decreases H2A.Z acetylation, leading to a global decrease in transcripts [27*]. This suggests these two histone acetylases function in tandem to secure TSS fidelity.
Histone modifications also act to recruit multiple factors controlling transcriptional activity, and are recognised by proteins bearing a variety of domains, the most prominent being bromodomain and chromodomain proteins. An extensive cohort of proteins assembles into a set of subcomplexes and associates with TSSs and H2A.Z marks in trypanosomes. Within this cohort are multiple bromodomain proteins that participate in four interaction networks enriched at TSSs [28**] (Figure 2). Additionally, two SET-domain proteins, SET27 and SET26 (likely methyltransferases), define a TSS-associated complex and a complex more extensively associated with downstream transcription units, respectively. Furthermore, SET27 forms a complex with a chromodomain protein and additional factors to constitute a SET27 promoter-associated regulatory complex (SPARC), which, critically, when silenced leads to upregulated mRNA expression across megabase chromosomes as well as derepression of subtelomeric variant surface glycoprotein (VSG) genes [29**]. Broadly, roles for bromo, chromo and SET-domain proteins appear conserved in trypanosomes [30], but many novel proteins were also identified in these studies, a reflection of the amalgam observed in RNA maturation pathways discussed above.
A final example of distinct mechanisms in trypanosomes comes from recent studies of the telomerase complex, which is intimately connected to antigenic variation. The minimal telomerase enzyme is telomerase reverse transcriptase (TERT) and telomerase RNA (TR), but, as in higher eukaryotes, the complex in vivo contains many additional factors, which in trypanosomes are both conserved and divergent. For example, the dyskerin-binding H/ACA domain in metazoan TR is replaced by a novel C/D box domain in trypanosomes and several unique C/D box snoRNA binding proteins (snoRNPs) are present [31*]. Most unusually, universal minicircle sequence-binding proteins (UMSBPs) are conserved at replication origins of minicircles in the mitochondrial genome of kinetoplastids, but also operate at telomeres. Of two paralogs, TbUMSBP1 is mitochondrial but TbUMSBP2 is nuclear and a telomerase complex component [31*]. Silencing TbUMSBP2 leads to a decrease in nucleosome disassembly with impacts on gene expression, most significantly VSG [32**]. This is particularly interesting as the experiments were performed in insect stage trypanosomes, where all VSG genes are normally inactive.

Vital signs: the key to antigenic variation
A key survival mechanism of African trypanosomes is immune evasion, and in particular antigenic variation. VSGs are expressed at the parasite surface and monoallelic expression mechanisms ensure that only one VSG is expressed, albeit with periodic switching between VSG genes to accomplish antigenic variation. Relevant here is that there are some 15 VSG expression sites (ESs), each situated at a subtelomeric location and transcribed from an RNAP I promoter, producing a polycistronic message of several expression site-associated genes and a single VSG (Figure 2). Expression of a single ES is secured by complex chromatin interactions and, significantly, while the mechanism appears specific to African trypanosomes, the vast majority of factors involved are either pan-eukaryotic or pan-kinetoplastida, indicating a repurposing rather than wholesale evolution of new chromatin modulators for monoallelic expression. The single active ES is located within the ES body (ESB), which locates close to, but is distinct from, the nucleolus. Multiple proteins associated with the ESB are known (Figure 2), and various approaches indicate that epigenetics, telomeric complexes, the nuclear lamina and other chromatin factors, including SUMOylation, all contribute towards ESB function and/or monoallelic expression, as silent VSG ESs are silenced by association with heterochromatin [14,33,34*].
Amongst key players activating an ES are the VSG exclusion proteins, VEX1 and VEX2, which physically associate with the histone chaperone CAF complex [35] and locate proximal to the ESB. VEX1 and 2 have major roles in monoallelic expression; when silenced many VSGs are expressed. VEX1 binds chromatin at the spliced leader locus, while VEX2 associates with the active ES [36**]; this arrangement also explains the efficiency of VSG expression, as VSG transcripts are delivered to high concentrations of spliced leader. Nearly 700 VSG mRNAs are estimated to be synthesised per hour, ultimately producing 10% of cell protein [37*]. Significantly, both VEX1 and 2 are present beyond African trypanosomes, albeit with VEX1 in trypanosomes only and not extending to Leishmania or more basal kinetoplastids, while VEX2 is present across all kinetoplastida, but not beyond. Regardless, VEX1 and 2 provide a paradigm for the recruitment of proteins into highly specific functions from presumably more general roles, as the VSG/SL association and extreme expression level are unknown in other kinetoplastida. Additional proteins associated with the ESB and other subnuclear compartments are being identified and provide novel examples of mechanisms controlling nuclear/chromatin architecture, although our current picture is likely very sparse. ESB1 is essential in VSG-expressing life stages and required for ES transcription [38**]. Interestingly, ESB1 is non-essential in insect stages that do not express VSG, but this is likely a consequence of the essentiality of VSG expression. ESB1 is restricted to trypanosomes, with a similar evolutionary distribution to VEX1, albeit with no evidence for direct interaction between these proteins. The presence of a RING-like domain at the C-terminus of ESB1 suggests a role in ubiquitylation, and is also consistent with ESB1 overexpression leading to activation of additional VSG ESs, potentially due to degradation of regulatory factors [38**].
An additional subnuclear structure, NUFIP, located close to the ESB, has been described recently [37*]. NUFIP contains at least five kinetoplastida-specific proteins, all of which contain RNA-binding motifs. The precise role of NUFIP is unclear, but perhaps most provocative is that all identified components are also associated with the divergent kinetochore, suggesting a connection between VSG transcription and chromosomal segregation.

Conclusions and outstanding questions
It is generally accepted that trypanosomes (Euglenozoa) are one of the earliest lineages to part from the main eukaryote grouping, a history reflected in possession of nuclear systems ranging from conserved to completely replaced. Further, inactivation of specific loci (e.g. telomere-proximal VSG genes) by heterochromatinisation is analogous with higher eukaryotes; inactive genes are associated with the lamina, telomere-specific protein complexes, SUMOylation status and a nuclear peripheral location. Electron dense material is located at the nuclear periphery (as well as elsewhere in the nucleus), but the absence of histone modifications normally associated with heterochromatin points to mechanistic divergence. There is also clear novelty within mRNA processing, transcriptional start sites and nuclear export mechanisms that extends to a deep level. This raises a considerable conundrum: are trypanosomes reflective of the state that existed in the LECA nucleus, or representative of divergence post-LECA, illuminating a cohort of lineage-specific adaptations? While present evidence does not provide a robust answer, which may lie between the two extremes, the sheer scope of changes in trypanosomes indicates huge flexibility within the manner by which the nucleus operates and profoundly influences our views of transcriptional mechanisms within the LECA.

Declaration of competing interest
The author declares that they have no known competing financial interests or personal relationships that could have influenced the work reported in this paper.

Figure 1

Figure 2 Unique