Comparative Genomics of Peroxisome Biogenesis Proteins: Making Sense of the PEX Proteins

PEX genes encode proteins involved in peroxisome biogenesis and proliferation. Using a comparative genomics approach, we clarify the evolutionary relationships between the 37 known PEX proteins in a representative set of eukaryotes, including all common model organisms, pathogenic unicellular eukaryotes and human. A large number of previously unknown PEX orthologs were identified. We analyzed all PEX proteins, their conservation and domain architecture and defined the core set of PEX proteins that is required to make a peroxisome. The molecular processes in peroxisome biogenesis in different organisms were put into context, showing that peroxisomes are not static organelles in eukaryotic evolution. Organisms that lack peroxisomes still contain a few PEX proteins, which probably play a role in alternative processes. Finally, the relationships between PEX proteins of two large families, the Pex11 and Pex23 families, were analyzed, thereby contributing to the understanding of their complicated and sometimes incorrect nomenclature. We provide an exhaustive overview of this important eukaryotic organelle.


INTRODUCTION
Peroxisomes occur in almost all eukaryotes. Their number, size and protein composition are highly variable. In lower eukaryotes, such as yeast, peroxisome proliferation is stimulated by specific growth substrates. In higher eukaryotes, peroxisome abundance and composition vary with organism, tissue and developmental stage. Conserved peroxisomal pathways are the β-oxidation of fatty acids and hydrogen peroxide degradation. Examples of specialized pathways are the biosynthesis of bile acids and ether lipids in man, photorespiration in plants and the biosynthesis of antibiotics in certain filamentous fungi (Smith and Aitchison, 2013). The crucial role of peroxisomes for human health is illustrated by the occurrence of inborn errors that cause severe diseases and are often lethal. However, roles in non-metabolic processes such as aging, anti-viral defense and cancer show that the significance of peroxisomes in human health goes far beyond the relatively rare inherited peroxisomal disorders (Islinger et al., 2018).
Peroxisomes are very simple organelles that consist of a protein-rich matrix surrounded by a single membrane. Peroxisomal enzymes almost exclusively occur in the matrix. The membrane contains transporters, pores for solute transport and proteins involved in diverse processes such as matrix and membrane protein sorting, organelle fission and movement (Figure 1).
In 1996, the term peroxin was coined for proteins "involved in peroxisome biogenesis (inclusive of peroxisomal matrix protein import, membrane biogenesis, peroxisome proliferation, and peroxisome inheritance)" (Distel et al., 1996). Peroxins are encoded by PEX genes and also called PEX proteins. So far, 37 PEX proteins have been described. Some are highly conserved, whereas others only occur in a limited number of species. Since 1996, tremendous progress has been made in our understanding of the molecular mechanisms involved in peroxisome biology. However, with the increasing number of PEX proteins, their nomenclature became more and more complex (Smith and Aitchison, 2013).
Previous comparative genomics studies on peroxisomes have shed light on the origin of peroxisomes (Gabaldón et al., 2006;Schlüter et al., 2006) and their absence in some species, mostly parasitic protists (Žárský and Tachezy, 2015;Gabaldón et al., 2016;Moog et al., 2017;Mix et al., 2018). However, a comprehensive and up-to-date overview of all PEX proteins was still missing. Here, we present an exhaustive up-to-date overview of all the PEX protein families. We analyzed PEX proteins in a highly diverse set of eukaryotes, including but not limited to model organisms frequently used in cell biology, pathogenic unicellular eukaryotes and higher eukaryotes. Using this information, we combine phylogenetic reconstructions with other protein features (e.g., Pfam domain, protein disorder and transmembrane domain predictions) to understand the evolutionary relationships between these proteins, clarifying certain inconsistencies in the nomenclature of PEX proteins. Important questions that we answer are (i) how are the different PEX genes conserved across eukaryotes, (ii) what is the core set of PEX genes present in the last eukaryotic common ancestor and (iii) what are the typical features of the PEX proteins.

MATERIALS AND METHODS
For the ortholog detection of PEX proteins, we systematically used two approaches: reciprocal searches of single protein sequences and reciprocal searches based on protein profiles (Hidden Markov models). We selected a set of eukaryotic proteomes from UniProt (The UniProt Consortium, 2017) (see Table 1) and for both approaches, performed the reciprocal searches starting from the sequences of different organisms (see Table 1) and made a consensus for the assignment of orthologs between the searches.
The first approach was based on phmmer searches [HMMER package (Potter et al., 2018)]. As peroxisomal proteins can be multidomain proteins, when the first reciprocal hit failed, we also checked the best domain e-value hit from the target proteome. In this way, we also retrieved potential orthologs with alternative domain architectures. The second approach was based on reciprocal jackhmmer searches followed by hmmsearch searches [HMMER package (Potter et al., 2018)]. This method was applied to detect divergent orthologs undetectable by the first approach, although it can be problematic for proteins containing common domains, such as PEX1/6 and PEX4 (common functional domains include WD40, ATPase, zinc-finger and ubiquitin ligase domains; see Table 2). Due to the diverse nature of the PEX proteins, different e-value thresholds and iterations were applied. For example, searches involving transmembrane proteins and tetratricopeptide repeat (TPR) proteins were conducted with two iterations and a relaxed e-value threshold of 1e-2. For the other common domains, we applied two iterations and a stringent e-value threshold of 1e-20. Reciprocal detection for these common domains was often unsatisfactory, showing the limitation of this method for abundant and common domains.
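The reciprocal search logic described above can be illustrated with a minimal Python sketch. It assumes that search results have already been parsed into per-query hit lists; the function names, the data layout and the toy identifiers and e-values are hypothetical and are not taken from the study.

```python
from typing import Dict, List, Optional, Tuple

# A hit is (target_id, e_value). Hit lists are assumed to be pre-parsed
# from tabular search output by the caller (hypothetical data layout).
Hits = Dict[str, List[Tuple[str, float]]]


def best_hit(hits: Hits, query: str, max_evalue: float) -> Optional[str]:
    """Return the target with the lowest e-value at or below the cutoff, or None."""
    candidates = [(e, t) for t, e in hits.get(query, []) if e <= max_evalue]
    if not candidates:
        return None
    return min(candidates)[1]


def reciprocal_best_hits(forward: Hits, reverse: Hits,
                         max_evalue: float = 1e-2) -> Dict[str, str]:
    """Pair a query with a target only if each is the other's best hit."""
    pairs = {}
    for query in forward:
        target = best_hit(forward, query, max_evalue)
        if target is not None and best_hit(reverse, target, max_evalue) == query:
            pairs[query] = target
    return pairs


# Toy, made-up identifiers and e-values, purely for illustration.
forward = {"ScPEX3": [("HsA", 1e-30), ("HsB", 1e-5)],
           "ScPEX8": [("HsC", 1e-3)]}
reverse = {"HsA": [("ScPEX3", 1e-28)],
           "HsC": [("ScOther", 1e-10), ("ScPEX8", 1e-3)]}
result = reciprocal_best_hits(forward, reverse)
```

In this toy case only the first pair survives, because the best reverse hit of the second target points to a different protein. A stricter cutoff (for example 1e-20) can be passed as `max_evalue` for proteins with common domains.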
Once the ortholog assignments of both methods were combined, for each set of orthologs we manually filtered out possible false positives by performing a multiple sequence alignment using Mafft [einsi-mode (Katoh and Standley, 2013)] followed by visual inspection. We additionally searched for missing orthologs: we built HMM profiles with hmmbuild using the MSA generated previously and searched the proteome in question with hmmsearch [both from the HMMER package (Potter et al., 2018)]. It is important to note that if no orthologs were identified for a particular PEX protein in a specific organism, this does not necessarily mean that no ortholog exists. Possible causes of not identifying orthologs are incomplete genome information and sequence divergence of the 'true' ortholog. For example, the T. pseudonana proteome seems to be incomplete in the UniProt database: a previous study identified a T. pseudonana Pex12 ortholog (Mix et al., 2018) that matches our criteria for orthology, but is absent from UniProt.
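As a rough illustration of the profile-based search for missing orthologs, the following Python sketch only assembles the command lines for the three steps (alignment, profile building, profile search) without running them. The exact options used in the study are not stated, so the flags chosen here, the file names and the helper names are assumptions.

```python
from typing import List


def mafft_einsi_cmd(fasta_in: str) -> List[str]:
    # Assumption: E-INS-i is requested via --genafpair with iterative
    # refinement; MAFFT writes the alignment to standard output.
    return ["mafft", "--genafpair", "--maxiterate", "1000", fasta_in]


def hmmbuild_cmd(hmm_out: str, msa_in: str) -> List[str]:
    # hmmbuild takes the output profile first, then the alignment file.
    return ["hmmbuild", hmm_out, msa_in]


def hmmsearch_cmd(hmm_in: str, proteome: str, tbl_out: str,
                  evalue: float) -> List[str]:
    # --tblout saves a per-target table; -E sets the reporting e-value cutoff.
    return ["hmmsearch", "--tblout", tbl_out, "-E", str(evalue), hmm_in, proteome]


# Hypothetical file names for one ortholog set and one target proteome.
pipeline = [
    mafft_einsi_cmd("pex12_orthologs.fasta"),
    hmmbuild_cmd("pex12.hmm", "pex12_orthologs.aln"),
    hmmsearch_cmd("pex12.hmm", "target_proteome.fasta", "pex12_hits.tbl", 1e-2),
]
```

Each list could be passed to `subprocess.run`, with the MAFFT output redirected to the alignment file, provided the tools are installed; hits would still need the manual inspection described above.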
Ortholog sequences included in the final dataset were aligned with Mafft, and gap positions were trimmed with trimAl using different thresholds. Phylogenetic trees were constructed using IQ-TREE (Nguyen et al., 2015), obtaining branch supports with ultrafast bootstrap (Hoang et al., 2018) and applying the automatic model selection calculated by ModelFinder (Kalyaanamoorthy et al., 2017). Trees were visualized and annotated using iTOL (Letunic and Bork, 2019). Functional domain annotation was carried out using the Pfam database (El-Gebali et al., 2019), transmembrane domains were predicted using the TMHMM server, and structural disorder was predicted with IUPred2 (Mészáros et al., 2018).
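The idea behind threshold-based gap trimming can be sketched in a few lines of Python. This is not the trimAl implementation; it is a simplified illustration in which the function name, the threshold semantics (maximum tolerated fraction of gaps per column) and the toy alignment are assumptions.

```python
from typing import Dict


def trim_gappy_columns(alignment: Dict[str, str],
                       max_gap_fraction: float) -> Dict[str, str]:
    """Drop alignment columns whose gap fraction exceeds max_gap_fraction."""
    seqs = list(alignment.values())
    if not seqs:
        return {}
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    # Keep a column when the share of '-' characters is within the threshold.
    keep = [
        i for i in range(length)
        if sum(s[i] == "-" for s in seqs) / len(seqs) <= max_gap_fraction
    ]
    return {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}


# Toy alignment, purely illustrative.
toy = {"a": "MK-LV", "b": "M--LV", "c": "MKALV"}
trimmed = trim_gappy_columns(toy, max_gap_fraction=0.5)
```

Here the third column is removed because two of three sequences carry a gap there. Varying the threshold, as described above, trades alignment length against the amount of missing data per column.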

RESULTS
The proteomes of 38 eukaryotes were investigated to identify all PEX proteins known to date. Not all eukaryotes contain peroxisomes (Schlüter et al., 2006;Žárský and Tachezy, 2015;Gabaldón et al., 2016). In agreement with previous studies (Schlüter et al., 2006;Gabaldón et al., 2016), several protist species of our initial analysis were found to lack most PEX proteins, namely Cryptosporidium parvum, Theileria annulata, Babesia bovis, Monosiga brevicollis, Plasmodium falciparum, Blastocystis hominis, and Entamoeba histolytica. To facilitate comparison between species containing and (likely) lacking peroxisomes, the latter species was included in further analyses, but all others likely lacking peroxisomes were omitted.

FIGURE 1 | Schematic representation of the PEX proteins. Core conserved PEX proteins (shapes in dark colors, names in white), fungi-specific proteins (light, names in black) and the moderately conserved PEX protein PEX26 (gray, name in black, which is only present in Metazoa and Fungi) are depicted. Membrane proteins are ovals, soluble proteins round. Matrix protein import. Peroxisomal matrix proteins contain a peroxisomal targeting signal (PTS) that is recognized by cytosolic receptors: a C-terminal PTS1 or (less commonly) an N-terminal PTS2, recognized by PEX5 and PEX7, respectively. PTS2 import involves a co-receptor (Co): PEX5 (animals, plants, and protists), PEX18/21 (S. cerevisiae) or PEX20 (Fungi). Next, the receptor-cargo complex associates with the docking complex, consisting of PEX13/14 (and in Fungi PEX17 or PEX33). Upon cargo translocation and release, the PTS (co-)receptor is ubiquitinated and recycled. Ubiquitination involves the ubiquitin conjugating enzyme (E2) PEX4 (recruited to the membrane by PEX22) and the ubiquitin ligase (E3) activities of the RING finger complex, consisting of PEX2/10/12. Receptor extraction requires the AAA+ ATPase complex PEX1/6, which is recruited to the membrane via PEX26 (PEX15 in S. cerevisiae, APEM9 in plants; only PEX26 shown). PEX8 links the docking and RING finger complexes, and functions in receptor-cargo dissociation. Peroxisomal membrane protein (PMP) targeting involves PEX3, PEX19 and PEX16. PMPs can sort directly to peroxisomes or indirectly via the ER. In the direct pathway PEX19 acts as receptor/chaperone, while it functions at the ER in PMP sorting via the indirect pathway. The Pex11 protein family (all shown as PEX11) and the fungal peroxins PEX35 and PEX37 have been mainly implicated in peroxisome proliferation. Pex11 family proteins are also present in mitochondria-peroxisome contact sites and PEX11 functions as non-selective ion channel. Members of the fungal Pex23 protein family localize to the ER and are involved in the formation of peroxisome-ER membrane contact sites. Created with BioRender.com.

Distribution and General Description of PEX Proteins Across Eukaryotic Lineages
The results of our computational survey are summarized in Figure 2. We detect a core of PEX proteins that are broadly conserved in all eukaryotic lineages, encompassing PEX3/19/16 [peroxisomal membrane protein (PMP) sorting], PEX1/6, PEX2/10/12, PEX13/14, and PEX5/7 (matrix protein import) and proteins of the Pex11 family (peroxisome proliferation and contact sites). Some detected absences are probably real. On the other hand, in other cases the function of missing PEX proteins may be taken over by other homologous proteins. For instance, the function of the ubiquitin conjugating enzyme (E2 enzyme) PEX4 in receptor ubiquitination is performed by proteins of the E2D family in Metazoa, which lack a PEX4 ortholog (Grou et al., 2008). Similarly, the function of PEX26 is complemented by the homologous protein APEM9 in plants (Cross et al., 2016) and PEX15 in S. cerevisiae. Furthermore, we observe an important bias toward Fungi (yeasts and filamentous Fungi) reflected in the large number of PEX proteins that are specific to fungi, such as PEX8, PEX20/18/21 and the Pex23 family (Figure 2). This is a result of the fact that the large majority of studies investigating peroxisomes, in particular their biogenesis, have been performed in yeast models such as S. cerevisiae, O. polymorpha, K. phaffii, and Yarrowia lipolytica.
We analyzed the structural features of the PEX proteins (see Table 2). Structural protein disorder seems to be a common feature among some PEX proteins. In some of them, structural disorder is only predicted for a short fragment, but others like PEX19, PEX18/20/21, PEX14/33-13 are predicted as almost entirely disordered. Also, transmembrane helical domains are usually present in certain PEX proteins, such as PEX3, PEX14, and PEX26. Several PEX proteins have common eukaryotic structural domains, like the E2 enzyme PEX4 and the AAA+ ATPase domain present in PEX1 and PEX6. We also detect several functional domain associations, such as the RING finger (zinc finger) domain in PEX2/10/12 and SH3 domains in PEX13, both being involved in signal transduction and controlling protein-protein interactions. Other recognizable fold types in PEX proteins include the α-solenoid formed by the TPR repeat domains in PEX5 and the β-propeller formed by WD40 repeats in PEX7.
The functional diversification of proteins is caused by the duplication of the respective genes. This is one of the main sources of cellular complexity and development. This process is called paralogization, where paralogous proteins are those having a common origin, i.e., belonging to the same protein family. These gene duplications (paralogizations) can be ancestral (deep paralogs) or they can be asynchronous during evolution: appearing later and being restricted to specific taxonomic clades (in-paralogs). The paralogization of PEX proteins seems to have been relevant for the development of peroxisomes in the domain Eukarya. Indeed, some of these paralogizations preceded the diversification of eukaryotes, like the peroxins of the AAA+ ATPase protein family PEX1/6, the RING finger proteins PEX2/10/12 and proteins of the Pex11 protein family. On the other hand, some other PEX proteins have been duplicated in specific eukaryotic taxa. These proteins have often been inconsistently named, since newly discovered proteins were sometimes given a new number. This should be kept in mind when studying such proteins. For instance, the S. cerevisiae PEX9 is actually a copy of PEX5 (in-paralogs, not ancestral duplication in Fungi). Similarly, the fungal PTS2 co-receptors PEX18/20/21 should be considered as a single group: PEX18 and PEX21 of S. cerevisiae are actually the result of a duplication of the ancestral PEX20 form. The PEX23 family is specifically found in Fungi and encompasses multiple copies in specific organisms, such as PEX30/31/32 and PEX28/29 in S. cerevisiae, resulting from the duplication of PEX23 and PEX24, respectively. In the previous examples, different proteins derived from the same ancestral protein, i.e., belonging to the same protein family, have received different numbers. On the other hand, the opposite has happened for certain other PEX proteins.
Many members of the Pex11 family have the same number, but were given a different suffix instead: for instance, PEX11α/β/γ or PEX11A/B/C/D/E. We detected that these paralogs originated from independent paralogizations in different lineages, but their naming does not always reflect this. For instance, fungal PEX11C belongs to the same subfamily as human PEX11γ, but PEX11C from A. thaliana does not. Similarly, A. thaliana PEX11A is not equivalent to human PEX11α. Based on phylogenetic reconstructions, we propose that two different subfamilies can be distinguished within the Pex11 family. In addition, the Pex11 family includes an in-paralog group specific to Fungi, containing PEX25/27/34/36. Therefore, in some cases, the nomenclature ascribed to the PEX protein paralogizations could lead to confusion, because there is no uniformity in the way in which paralogous, in-paralogous or non-paralogous/unrelated proteins have been named. Furthermore, some paralogizations have led to paralogs of PEX proteins that may no longer function in peroxisome biology. For instance, vertebrates express a PEX5 paralog called PEX5R (TRIP8b), whose only known function is the regulation of hyperpolarization-activated cyclic nucleotide-gated (HCN) channels, key modulators of neuronal activity (Han et al., 2020).
Taking into account all of the above, we review the role of these PEX proteins below, in order to gain a comprehensive understanding of their functional classification.

A Core Set of PEX Proteins Is Broadly Conserved in Eukaryotes
A core set of PEX proteins is broadly conserved across all eukaryotic lineages, encompassing proteins involved in PMP sorting (PEX3, PEX19, and PEX16; the latter absent in some species), matrix protein receptors (PEX5 and PEX7), components of the receptor docking site (PEX13 and PEX14), enzymes involved in receptor ubiquitination (PEX2, PEX10, PEX12, and PEX4), two AAA+ ATPases that play a role in receptor recycling (PEX1 and PEX6) and a protein family involved in peroxisome proliferation (Pex11 family). The function of these conserved PEX proteins is central to peroxisome biology and thus maintained. In the following section, we will review how these processes define the biology of the canonical peroxisomes as well as the mechanistic models proposed in the field. Furthermore, we describe variations in the repertoire of PEX proteins in certain eukaryotes.

Table footnote: The groups indicate the main groups (deep paralogs) identified in phylogenetic reconstructions. Protein disorder was predicted using IUPred and transmembrane helices using TMHMM software. Functional protein domains were annotated using the Pfam database. Question marks indicate features that were present in only a subset of protein sequences from our data set.
Sorting of PMPs (PEX3, PEX19, and PEX16)
Only three PEX proteins (PEX3, PEX16, and PEX19) are known to be involved in targeting of PMPs. Two mechanisms of PMP sorting to the peroxisome membrane have been described (see Figure 1; for a detailed review, see Jansen and van der Klei, 2019). According to the direct sorting model, PEX19 binds to newly translated PMPs in the cytosol. In this pathway PEX19 acts as a chaperone and cycling receptor (Jansen and van der Klei, 2019). The PEX19-PMP complex binds to the PMP PEX3 and is subsequently inserted in the membrane by a currently unknown mechanism. In the indirect pathway, PMPs traffic first to the ER and accumulate at a subdomain, where PMP-containing vesicles bud off. PEX3 plays a role in the intra-ER sorting of PMPs (Fakieh et al., 2013), while PEX19 is important for vesicle budding (Van Der Zand et al., 2012;Agrawal et al., 2016). PEX16 plays a role in the indirect pathway (Hua and Kim, 2016). Notably, PEX3 is also involved in a host of other functions, including pexophagy, peroxisome retention during yeast budding and the formation of contacts between peroxisomes and vacuoles. In all these processes, PEX3 recruits proteins to the peroxisomal membrane (e.g., Atg30/36, Inp1) (Jansen and van der Klei, 2019). Our computational survey shows that PEX3, PEX19, and PEX16 are well conserved, with a few exceptions, suggesting minor variations in mechanisms of PMP sorting. For instance, PEX16 is widely conserved, but is absent in all (investigated) yeast species, C. elegans and several protists. A characteristic motif in PEX19 orthologs of many species is a CaaX box at the C-terminus. Farnesylation of this motif causes conformational changes in PEX19 and increases its binding affinity for PMPs (Rucktaschel et al., 2009;Emmanouilidis et al., 2017). Previous studies in S. cerevisiae and humans are contradictory regarding the importance of this post-translational modification for peroxisome function (Vastiau et al., 2006;Rucktaschel et al., 2009;Schrul and Kopito, 2016). Interestingly, Schrul and Kopito (2016) found that the CaaX box of human PEX19 was important for targeting of lipid droplet protein UBXD8, but not for peroxisome biogenesis. We checked if the CaaX box is present in all eukaryotes. We found that while this motif is present in all animals, plants and Fungi, it is absent (or difficult to align) in many protists, like Euglenozoa and Amoebozoa, despite these organisms expressing the enzyme required for farnesylation (see e.g., Buckner et al., 2002) (Figure 3). Interestingly, putative PEX19 orthologs were also identified in Entamoeba histolytica and M. brevicollis, despite these species very likely lacking peroxisomes. This may suggest an alternative function for PEX19, unrelated to peroxisomes.
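A simple way to screen sequences for a C-terminal CaaX box is a pattern match on the last four residues, as in the following Python sketch. The residue set used for "aliphatic" (A, V, I, L, M), the unrestricted final position, the function name and the toy sequences are all assumptions made for illustration; the check in this study relied on alignments rather than on such a pattern.

```python
import re

# Assumption: "aliphatic" is taken here as A, V, I, L or M, and the
# final position X is left unrestricted.
CAAX = re.compile(r"C[AVILM]{2}[A-Z]$")


def has_caax_box(protein_seq: str) -> bool:
    """True if the sequence ends in C, two aliphatic residues, any residue."""
    # Upper-case the input and drop a trailing stop symbol if present.
    return CAAX.search(protein_seq.upper().rstrip("*")) is not None


# Hypothetical toy sequences, not real PEX19 orthologs.
examples = {
    "toy_with_box": "MSTAEEKCLIM",
    "toy_without_box": "MSTAEEKSLIM",
    "toy_internal_only": "MCLIMSTAEEK",
}
flags = {name: has_caax_box(seq) for name, seq in examples.items()}
```

A strict pattern like this will miss divergent or shifted motifs, which is consistent with the motif being reported as absent or difficult to align in some lineages.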

Matrix Protein Receptors (PEX5 and PEX7)
Newly synthesized matrix proteins are first recognized by their cytosolic peroxisomal targeting signal (PTS) receptor. The majority of peroxisomal matrix proteins contain a PTS1 or a PTS2, recognized by PEX5 and PEX7, respectively.
PEX7 contains WD40 repeats, which fold into a β-propeller structure that provides a platform for interaction with the PTS2 motif and PTS2 co-receptor (Pan et al., 2013). While PEX5 was identified in all eukaryotic organisms, PEX7 is absent in C. elegans, T. pseudonana, and G. sulphuraria, which may be explained by a loss of the PTS2 targeting pathway. This was shown to be the case in C. elegans: proteins normally containing a PTS2 have gained a PTS1 instead (Motley et al., 2000). A similar loss of the PTS2 targeting pathway has been proposed for T. pseudonana and the red alga Cyanidioschyzon merolae (Gonzalez et al., 2011). As G. sulphuraria is a red alga belonging to the same family as C. merolae (Cyanidiaceae), it is likely that the same happened in G. sulphuraria. Why most species utilize multiple matrix protein targeting pathways as opposed to just one is unclear. It could be that proteins of different pathways are differentially expressed depending on growth conditions, as is the case for PEX5 and its copy PEX9 in S. cerevisiae (Effelsberg et al., 2016;Yifrach et al., 2016). In a similar vein, it may be a matter of targeting priority, with one pathway responsible for targeting key proteins, while the other targets proteins that are less important. Another possibility is that the location of the targeting signal at either the N-or C-terminus affects protein function, making one of the targeting signals not feasible for a particular protein.
FIGURE 3 | Phylogeny and protein features of PEX19 orthologs. The phylogeny is rooted at mid-point to ease the visualization and labels of the main taxonomic groups are colored according to the legend. Note that the topology does not necessarily reflect the actual evolutionary trajectory of such proteins. Protein domain architecture is defined by Pfam annotations and transmembrane helices (TMH) according to TMHMM software. The line-dot plot indicates the regions predicted to be disordered (red) and not disordered (gray). The sequence alignment shows the conservation of the CaaX box in PEX19 orthologs of distant eukaryotes, with 'C' denoting Cys, 'a' an aliphatic residue and 'X' usually being a Ser, Thr, Gln, Ala or Met. Asterisks indicate alignments that were manually forced.

PEX5 is conserved in all eukaryotes analyzed and is characterized by a disordered region at the N-terminus and several tetratricopeptide repeats (TPR) at the C-terminus (Figure 4). While the TPR domains are responsible for its interaction with the PTS1 motif (Gatto et al., 2000), the N-terminal region interacts with a rarer PTS, PTS3 (Rymer et al., 2018) and with docking proteins PEX13 and PEX14 (Saidowsky et al., 2001;Otera et al., 2002), with the interacting regions partially overlapping (Rymer et al., 2018). As previously recognized, the structurally disordered region at the N-terminus of some PEX5 proteins shares sequence similarities with the Fungi-specific PEX20 proteins (Kiel et al., 2006). These similarities between the PEX5 N-terminus and PEX20 rely on: (i) a conserved motif at the N-terminal domain, (ii) followed by one or more WxxxF/Y motifs and (iii) a PEX7-binding domain (Schliebs and Kunau, 2006). The conserved N-terminal domain of PTS2 co-receptors contains a highly conserved cysteine residue (Schliebs and Kunau, 2006), which has been implicated in (co-)receptor recycling and cargo translocation (Leon and Subramani, 2007;Hensel et al., 2011;Okumoto et al., 2011). The WxxxF/Y motifs are important for binding to PEX14 and PEX13 (Saidowsky et al., 2001;Otera et al., 2002). These WxxxF/Y motifs are not only found in PTS2 co-receptors, but also in PEX5 of species where PEX5 does not act as PTS2 co-receptor but only as PTS1 receptor (Schliebs et al., 1999). As the name implies, the PEX7-binding domain allows the co-receptors to bind to PEX7. We checked the conservation of this domain by manually generating a hidden Markov model of the fungal PEX20, and found that this domain is detected in some but not all PEX5 orthologs that act as PTS2 co-receptors (see Figure 4; Pex20 * domains in PEX5).
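Occurrences of the WxxxF/Y motif discussed above can be located with a short pattern search. The Python sketch below is illustrative only: the function name and the toy sequence are hypothetical, and a pattern hit says nothing about whether a given site actually binds PEX14 or PEX13.

```python
import re
from typing import List, Tuple

# A lookahead is used so that overlapping occurrences are all reported.
WXXXFY = re.compile(r"(?=(W[A-Z]{3}[FY]))")


def find_wxxxfy(protein_seq: str) -> List[Tuple[int, str]]:
    """Return (1-based start, pentapeptide) for each W-x-x-x-F/Y occurrence."""
    seq = protein_seq.upper()
    return [(m.start() + 1, m.group(1)) for m in WXXXFY.finditer(seq)]


# Hypothetical toy sequence, not a real PEX5 or PEX20 fragment.
toy_seq = "MAWSEEFGGWAQEYKK"
motifs = find_wxxxfy(toy_seq)
```

Reporting 1-based positions keeps the output comparable with residue numbering in sequence databases.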
Phylogeny shows that vertebrates and S. cerevisiae have duplicated their PEX5 gene independently (Figure 4). In S. cerevisiae, PEX5 works as a general import receptor for all PTS1-containing peroxisomal matrix proteins, while its paralog PEX9 acts as a condition-specific receptor for a subset of PTS1 proteins (Effelsberg et al., 2016;Yifrach et al., 2016). PEX9 has lost the N-terminal disordered region that is normally present in PEX5 (see Figure 4). Vertebrates express PEX5R, a PEX5-related protein also called TRIP8b. PEX5R is preferentially expressed in the brain and can bind PTS1-containing proteins in vitro (Amery et al., 2001). Nevertheless, it is unclear whether PEX5R plays any role in matrix protein targeting, although the paralogizations of PEX5 could involve different functional novelties for peroxisome protein import as in S. cerevisiae.

The Docking Site (PEX13 and PEX14)
Once the peroxisomal matrix protein is bound to its receptor, the receptor-cargo complex associates with the docking complex, consisting of PEX13 and PEX14 (and in Fungi PEX17 or PEX33), at the peroxisomal membrane (Figure 1).

FIGURE 4 | Phylogeny and protein features of PEX5 orthologs. The phylogeny is rooted at mid-point to ease the visualization and labels of the main taxonomic groups are colored according to the legend. Note that the topology does not necessarily reflect the actual evolutionary trajectory of such proteins. Protein domain architecture is defined by Pfam annotations. The Pex20 * is a manually generated hidden Markov model (CSM, this study). The line-dot plot indicates the regions predicted to be disordered (red) and not disordered (gray).
Transmembrane helices were predicted in some, but not all, PEX13 orthologs (Supplementary Figure 1A). In addition, only in Opisthokonta organisms (Fungi and Metazoa) and Amoebozoa, PEX13 has a predicted SH3 domain at the C-terminus (Supplementary Figure 1A), which likely controls its interaction with other proteins. PEX14 also contains a predicted transmembrane helix, but seems to be largely structurally disordered (Supplementary Figure 1B), although it also includes several coiled-coil domains (e.g., Lill et al., 2020). In vitro protease protection experiments using human PEX13 and PEX14 confirmed that both proteins are integral membrane proteins. Human PEX14 has an Nin-Cout topology, while PEX13 adopts an Nout-Cin topology, thereby exposing its SH3 domain to the peroxisomal matrix (Barros-Barbosa et al., 2019a). The architecture of the S. cerevisiae PEX14-PEX17 complex was recently elucidated and revealed that PEX14 forms a 3:1 heterotetrameric complex with PEX17, forming a rod-like structure of approximately 20 nm that is exposed to the cytosol (Lill et al., 2020). This structure is mainly formed by the coiled-coil domains of PEX14 and PEX17. Besides its coiled-coil domains, PEX14 has a predicted intrinsically disordered C-terminal domain, which may be involved in recruiting import receptor PEX5 (Lill et al., 2020).
After docking, the cargo is translocated into the peroxisomal matrix. For S. cerevisiae PTS1 protein import it was shown that PEX5 integrates into the peroxisomal membrane to form a transient translocation pore alongside PEX14 (Meinecke et al., 2010). For PTS2 import, the pore is formed by PEX14, PEX17, and PEX18 (Montilla-Martinez et al., 2015). Little is known about the matrix protein import pores in other organisms, but the involvement of PEX14 seems to be a common denominator (Barros-Barbosa et al., 2019b). After formation of the translocation pore, the cargo is released into the peroxisomal matrix.
Receptor Ubiquitination (PEX4, PEX22, PEX2, PEX10, and PEX12)
After cargo release, the PTS (co-)receptor needs to be extracted from the peroxisomal membrane, so it can be used in subsequent rounds of peroxisomal matrix protein import (Platta et al., 2014). PEX5 is mono-ubiquitinated at a conserved cysteine, leading to its extraction and recycling (Platta et al., 2014). In most eukaryotes, this ubiquitination depends on the ubiquitin-conjugating enzyme (Ubc or E2 enzyme) PEX4, associated with the peroxisomal membrane via PEX22, and on the ubiquitin ligase activities of PEX2, PEX10, and PEX12. Notably, PEX4 and PEX22 are absent in all Metazoa and in several other species (see Figure 2). However, mono-ubiquitination at a conserved cysteine of PEX5 occurs in a comparable manner in mammalian cells through the E2D proteins UbcH5a/b/c (Grou et al., 2008). PEX4 and metazoan UbcH5a/b/c belong to the same protein family and are thus closely related, but belong to different protein subfamilies (see Supplementary Figure 2). Indeed, the true orthologs of metazoan UbcH5a/b/c in Fungi are the soluble Ubc4/5 proteins (see Supplementary Figure 2). S. cerevisiae Ubc4 and the partially redundant Ubc1 and Ubc5 catalyze polyubiquitination of PEX5 at two lysine residues, targeting PEX5 for proteasomal degradation (Platta et al., 2014). This reveals that in the absence of a PEX4 ortholog, functional compensation in specific organisms is possible, showing that the ubiquitination process can be shifted between subfamilies of the whole ubiquitin-conjugating enzyme family. Thus, for other organisms lacking PEX4 and PEX22, it could be expected that other E2 enzymes perform this function.
The RING finger complex proteins PEX2/10/12 have ubiquitin (E3) ligase activity (Platta et al., 2014) and are broadly conserved in eukaryotes (Figure 1). The three paralogous proteins PEX2, PEX10, and PEX12 form a heterotrimeric complex (El Magraoui et al., 2012). Characteristic for these three proteins is a highly conserved region at the N-terminus (annotated as Pex2_Pex12 pfam) and a zf-RING finger domain at the C-terminus. While the first domain can display a transmembrane helix (predicted in some of the species, suggesting membrane anchoring), the latter domain is responsible for the E3 ubiquitin ligase activity of the proteins (Platta et al., 2014) (Figure 5). The strong conservation of both domains in most of the sequences could indicate that the cooperation of both domains is crucial for peroxisome biology. The phylogeny of these enzymes, which clearly establishes the three main subfamilies (PEX2, PEX10, and PEX12 that each contain organisms from almost all lineages), suggests that they are deep paralogs and that their functional speciation was important and early in eukaryotic evolution.

Receptor Extraction (PEX1/6)
Once PEX5 is ubiquitinated, peroxisomal AAA+ ATPases PEX1 and PEX6 are responsible for PEX5 export from the peroxisomal membrane in order to recycle it back to the cytosol. PEX1 and PEX6 belong to the AAA (ATPase associated with diverse cellular activities) family (Pedrosa et al., 2018), a group of protein motors that use ATP binding and hydrolysis to mechanically unfold, disaggregate or remodel substrates (Olivares et al., 2016). Proteins of this family form ring structures with a central channel, through which they can translocate their substrates (Gates and Martin, 2020). PEX1 and PEX6 form a hetero-hexameric complex with alternating subunits in a double-ring structure (Blok et al., 2015;Gardner et al., 2015). In S. cerevisiae, the complex mechanically unfolds its substrates via progressive threading in an ATP-dependent manner (Gardner et al., 2015). Pedrosa et al. (2018) demonstrated using an in vitro setup that the PEX1/PEX6 complex directly interacts with ubiquitinated (human) PEX5, unfolding it during extraction. The phylogeny of PEX1 and PEX6 splits both subfamilies, while their protein domain architecture shows that the architecture is more conserved in PEX1 than in PEX6 (Supplementary Figure 3). Similar to PEX2/10/12, these facts suggest that the functional speciation of PEX1 and PEX6 was also important and early in eukaryotic evolution.

The Pex11 Family
Pex11 family proteins coordinate peroxisome proliferation (Koch et al., 2010). The Pex11 family is a large and complex protein family, with some members containing predicted transmembrane helices. A previous evolutionary analysis of the Pex11 family revealed that PEX11 is highly conserved and underwent independent paralogizations in different Opisthokont lineages (Chang et al., 2015). Unlike this previous reconstruction, we investigated PEX11 phylogeny by combining all recognized PEX11 homologs from the organisms analyzed in this study (Figure 6). We obtained a topology that is difficult to interpret, probably due to the low sequence conservation and divergent sequences. This results in weak support at some basal nodes (i.e., bootstrap values lower than 80%), which limits what can be inferred about PEX11 evolution; these results should therefore be interpreted with caution. On the other hand, several paralogizations in Opisthokonta, as reported previously (Chang et al., 2015), but also in Archaeplastida and different protists can be inferred from this reconstruction, demonstrating independent PEX11 protein expansion in eukaryotic lineages. Thus, the PEX11 protein family reveals a complex evolutionary history through eukaryotic evolution, as previously suggested (Chang et al., 2015). We can tentatively distinguish two main groups within the Pex11 protein family: one containing amongst others fungal PEX11 and vertebrate PEX11α/β, and one containing fungal PEX11C and vertebrate PEX11γ (Figure 6, shaded light and dark gray, respectively).
FIGURE 5 | Phylogeny and protein features of PEX2/10/12 orthologs. The phylogeny is rooted at mid-point to ease the visualization and labels of the main taxonomic groups are colored according to the legend. Note that the topology does not necessarily reflect the actual evolutionary trajectory of such proteins. Protein domain architecture is defined by pfam annotations and transmembrane helix according to TMHMM software.
Both groups contain organisms from most taxonomic lineages, with the exception of plants, which apparently do not have orthologs from the group with fungal PEX11C, although they have intermediary Pex11 sequences that fall outside our main groups (along with other Pex11 protist sequences; Figure 6). Due to the limitations of this phylogeny, it is unclear whether these forms are actually deep paralogs or whether they represent alternative evolutionary histories.
The phylogeny of PEX11 shows that these (putative) deep paralogs have subsequently undergone independent paralogizations in different lineages. For instance, proteins from the group containing fungal PEX11 (indicated in light gray in Figure 6) were clearly duplicated independently in vertebrates and in several filamentous fungi, resulting in human PEX11α and PEX11β among others. Notably, some additional paralogs seem to have undergone extreme sequence divergence, which probably causes artifactual clustering, as in the Fungi-specific PEX25/27/34/36 subgroup within this group, which contains shortened proteins up to 144 amino acids.
FIGURE 6 | Phylogeny and protein features of PEX11 family proteins. The phylogeny is rooted at mid-point to ease the visualization and labels of the main taxonomic groups are colored according to the legend. Note that the topology does not necessarily reflect the actual evolutionary trajectory of such proteins. Protein domain architecture is defined by pfam annotations and transmembrane helix prediction (black box). The two main groups, shaded light and dark gray, are distinguished according to the most supported and basal bootstraps and their taxonomic compositions.
Fungal PEX11 and human PEX11α and PEX11β contain a conserved amphipathic helix capable of tubulating negatively charged membranes in vitro (Opaliński et al., 2011; Yoshida et al., 2015). We mapped this amphipathic helix onto the multiple sequence alignment of Pex11 family proteins, observing that three positively charged residues are generally conserved in these proteins. However, the second positively charged position is not conserved in the Pex11 group containing fungal PEX11C and human PEX11γ (indicated in dark gray in Figure 6; Supplementary Figure 4), which may suggest a possible functional difference between these main Pex11 groups. This is, however, speculative and would need to be verified experimentally. Furthermore, we observed that S. cerevisiae PEX34 has lost this amphipathic helix, while the C-terminal region is conserved (Supplementary Figure 4).
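The kind of per-position check described above can be illustrated with a minimal sketch. It is not the procedure used in this study: the toy alignment, the sequence labels, the column indices and the function name `positive_fraction` are all hypothetical placeholders, and only lysine and arginine are counted as positively charged (an assumption; histidine is left out).

```python
# Illustrative sketch only: fraction of positively charged residues (K/R)
# at chosen alignment columns. All sequences below are invented placeholders,
# not real Pex11 family sequences.
POSITIVE = set("KR")  # assumption: histidine is not counted


def positive_fraction(alignment, columns):
    """For each column index, return the fraction of non-gap residues that are K or R."""
    result = {}
    for col in columns:
        residues = [seq[col] for seq in alignment.values() if seq[col] != "-"]
        if not residues:
            result[col] = 0.0  # column contains only gaps
            continue
        result[col] = sum(r in POSITIVE for r in residues) / len(residues)
    return result


# Hypothetical aligned sequences of equal length (gaps as "-").
toy_alignment = {
    "group1_seqA": "MKALRFLAKT",
    "group1_seqB": "MRALKFLSRT",
    "group2_seqC": "MKALQFLAKT",
    "group2_seqD": "MR-LEFLGRT",
}
helix_columns = [1, 4, 8]  # hypothetical positions of the three charged residues

fractions = positive_fraction(toy_alignment, helix_columns)
for col, frac in fractions.items():
    print(f"column {col}: {frac:.2f} positively charged")
```

In this toy input the first and third positions are positively charged in every sequence, whereas the second is charged in only half of them, mimicking a position that is conserved in one group but not in the other.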
Several members of the Pex11 protein family have been studied. So far, the majority of studies have investigated members of the group that includes PEX11 from Fungi and PEX11α/β from mammals. In the yeast species S. cerevisiae, O. polymorpha and K. phaffii, the absence of PEX11 results in fewer and larger peroxisomes, while cells overexpressing PEX11 have increased peroxisome numbers with smaller size (Erdmann and Blobel, 1995; Krikken et al., 2009; Joshi et al., 2012). Intriguingly, the absence of Y. lipolytica PEX11 results in cells that lack morphologically identifiable peroxisomes (Chang et al., 2015). Similarly, overproduction of PEX11α or PEX11β in vertebrates induces peroxisome proliferation, while reduction of protein levels resulted in lower peroxisome numbers (Schrader et al., 1998; Li and Gould, 2002). This led to the hypothesis that these proteins play a role in peroxisome fission. Peroxisome fission takes place in three steps: organelle elongation, constriction and scission (Schrader et al., 2016). PEX11 plays a role in the first step where it functions in membrane remodeling (Schrader et al., 2016). So far, no proteins have been identified that are responsible for organelle constriction. Peroxisomal fission shares several components with the mitochondrial fission machinery, such as the dynamin related protein Dnm1 (Drp1/DLP1), Fis1 and Mff (Schrader et al., 2016). Human PEX11β recruits DRP1 to the peroxisomal membrane (Li and Gould, 2003; Koch and Brocard, 2012), and both S. cerevisiae PEX11 and human PEX11β have been reported to function as GTPase activating protein (GAP) for Dnm1 (DRP1) (Williams et al., 2015). Interestingly, while S. cerevisiae PEX11 and its more divergent homologs PEX25 and PEX27 are all involved in regulating peroxisome numbers, each protein seems to play a distinct role in this process: PEX11 is important for peroxisome maintenance and promotes proliferation of existing peroxisomes, while PEX25 seems to initiate membrane elongation and may act in de novo biogenesis, whereas PEX27 in turn may have an inhibitory function (Huber et al., 2012). It seems plausible that similar patterns are present in other organisms expressing multiple Pex11 family proteins.
Several other functions have been attributed to proteins of the group containing fungal PEX11. O. polymorpha PEX11 has been implicated in peroxisome segregation during cell division (Krikken et al., 2009). S. cerevisiae PEX11 and PEX34 are involved in peroxisome-mitochondria contact sites (Ušaj et al., 2015; Shai et al., 2018), while O. polymorpha PEX11 has been implicated in peroxisome-ER contact sites (Wu et al., 2020). S. cerevisiae PEX11 has also been proposed to act as a pore-forming protein (Mindthoff et al., 2016) and has been implicated in medium chain fatty acid oxidation as well (van Roermund et al., 2000). As only a subset of proteins from this group have been investigated, perhaps other functions will still be discovered.
Much less is known about proteins of the other Pex11 group (shaded dark gray in Figure 6), which includes PEX11γ from Metazoa, fungal PEX11C and GIM5A/B from T. brucei. However, they also play a role in peroxisome proliferation (see, e.g., Koch and Brocard, 2012; Opaliński et al., 2012). PEX11γ has been suggested to coordinate peroxisomal growth and division via heterodimerization with other mammalian PEX11 paralogs and interaction with Mff and Fis1 (Schrader et al., 2016). O. polymorpha PEX11C is downregulated upon shifting from peroxisome repressing (glucose) to peroxisome inducing (methanol) growth conditions (van Zutphen et al., 2010), suggesting that PEX11C is not required for peroxisome proliferation. In Penicillium rubens, deletion of PEX11C has no significant effect on peroxisome number or size, while overexpression strongly stimulates peroxisome proliferation (Opaliński et al., 2012). In T. brucei, the absence of both GIM5A and GIM5B is fatal due to cellular fragility (Voncken et al., 2003). In S. cerevisiae, proteins of the Pex11C group are absent.
The remaining proteins, which do not have a clear evolutionary relationship with the two groups described above, illustrate independent protein expansions. In plants, the most studied PEX11 proteins in this category are PEX11C/D/E from A. thaliana. These proteins cooperate with FIS1b and DRP3A in peroxisome growth and division during the G2 phase just prior to mitosis (Lingard et al., 2008). Interestingly, in plant cells where PEX11C, PEX11D, and PEX11E were silenced simultaneously, peroxisomes were enlarged, but not elongated, suggesting that these proteins act in peroxisome growth, but not tubulation (Lingard et al., 2008).

PEX Proteins Specific for Fungi
Several PEX proteins are specific to Fungi (see also Schlüter et al., 2006). The high number of known fungal PEX proteins is probably due to the extensive screens for yeast peroxisome-deficient mutants that have been performed in the past (Erdmann et al., 1997). Additionally, current peroxisome biogenesis research is still taking advantage of a wealth of genetic and biochemical toolboxes to analyze the molecular biology of these organelles in yeast.

The PEX7 Co-receptors (PEX18, PEX20, PEX21)
In plants, animals and protists like T. brucei, D. discoideum and L. major, (a longer splicing variant of) PEX5 acts as PEX7 co-receptor for PTS2 protein import (Schliebs and Kunau, 2006). In contrast, in many Fungi the PEX7 co-receptor is a separate PEX protein, namely PEX18, PEX20, or PEX21 (for more detailed reviews see, e.g., Schliebs and Kunau, 2006; Kunze, 2020). Duplication of the ancestral PEX20 in S. cerevisiae (see Supplementary Figure 5) resulted in the partially redundant paralogs PEX18 and PEX21 that perform the same function (Purdue et al., 1998). Therefore, these proteins can be considered as a single PEX20 group. As previously described, some sequence features relate PEX20 with the N-terminus of PEX5 proteins: a conserved cysteine, WxxxF motifs and PEX7 binding domain (Schliebs and Kunau, 2006). Because PEX5 is present in most eukaryotes and Pex20 domains can be found at the N-terminus of many such proteins, it is most likely that PEX20 is the result of a protein domain separation specific to Fungi, rather than the previously proposed protein fusion of PEX5 and PEX20 (Kiel et al., 2006).

PEX17 and PEX33
In all species, PEX13 and PEX14 are components of the receptor docking site. An additional component of the docking site in yeasts is PEX17, while in filamentous fungi PEX33 is part of the docking complex. PEX17 is characterized by a single transmembrane helix at the N-terminus. As described above, S. cerevisiae PEX14 and PEX17 together form a rod-like structure at the peroxisomal membrane (Lill et al., 2020). PEX33 is a paralog of PEX14, whereas PEX17 partially aligns to the C-terminus of PEX14 and PEX33, suggesting that PEX17 is a PEX14-like protein. The exact functions of PEX17 and PEX33 are still unclear, but PEX17 in S. cerevisiae is a main component of the PTS2 import pore (Montilla-Martinez et al., 2015) and seems to increase the efficiency of binding of import receptors PEX5 and PEX7 to the docking complex (Lill et al., 2020).

Pex23 Family Proteins
PEX23, PEX24, PEX29 and PEX32 (in O. polymorpha, for example) and PEX28, PEX29, PEX30, PEX31 and PEX32 (in S. cerevisiae) are homologous proteins containing a highly conserved domain called the Pex24p domain (pfam). This domain contains a Dysferlin (DysF) motif at the C-terminal region, the function of which is still unclear (Wu et al., 2020). At the N-terminus, these proteins have several transmembrane domains, suggesting that they are anchored to membranes. A group of proteins related to this Pex23 protein family are the Pex23-like proteins (Kiel et al., 2006), including SPO73, a protein involved in sporulation. Pex23-like proteins usually lack the region containing the predicted transmembrane helices. The phylogeny of all these proteins can be divided into three main groups that we here call the PEX23 subfamily, the PEX24 subfamily and the Pex23-like proteins (Figure 7A). The sequences from the PEX23 and PEX24 subfamilies appear to differ in protein extensions at their C- and N-termini, respectively, which are predicted to be structurally disordered. Because the main PEX23 and PEX24 subfamilies contain most of the Fungi analyzed, it is likely that both subfamilies originated from an ancestral duplication in Fungi. Later, these PEX23 and PEX24 paralogs duplicated in yeasts, leading to amongst others PEX28/PEX29 and PEX30/31/32 in the ancestor of S. cerevisiae. In filamentous fungi, on the other hand, no duplication occurred, and these fungi express only one protein of each group. Thus, these proteins have diversified differentially in Fungi.
Unlike other peroxins, proteins of the Pex23 family localize to the ER instead of peroxisomes. Although initially reported at the peroxisome (Brown et al., 2000; Tam and Rachubinski, 2002; Vizeacoumar et al., 2003, 2004), later studies either reported dual localization to peroxisomes and ER (Yan et al., 2008; David et al., 2013) or exclusive localization at ER subdomains (Joshi et al., 2016; Mast et al., 2016; Wu et al., 2020). A recent study characterizing O. polymorpha Pex23 family members reported the involvement of PEX24 and PEX32 in peroxisome-ER contact sites (Wu et al., 2020). This could explain the previous contradictory reports on their localization, as they can be expected to be present in spots where peroxisomes and ER interact. Furthermore, S. cerevisiae PEX30 and PEX31 are ER membrane shaping proteins (Joshi et al., 2016). S. cerevisiae PEX30 plays a role in regulating budding of pre-peroxisomal vesicles and lipid droplets from specific ER subdomains (Joshi et al., 2016, 2018). It has been proposed to facilitate this by collaborating with seipin to organize ER subdomains to alter the membrane lipid composition (Wang et al., 2018). In humans, no orthologs of PEX30 have been identified, but MCTP2 has been suggested to act as a functional analog (Joshi et al., 2018).
PEX23 homologs were found in Metazoa, but these proteins cannot be considered orthologs of PEX23. These proteins were previously published as metazoan PEX23 orthologs (e.g., Jeynov et al., 2006; Mast et al., 2011; Di Cara et al., 2017) and are also annotated as such in some databases (e.g., protein Q9VWB0|TECPR_DROME annotated as PEX23 in UniProt and FlyBase). However, their domain architecture (see Figure 7B) is clearly different from previously established Pex23 family proteins, and they actually belong to the TECPR1 family of proteins. TECPR1 proteins are localized to lysosomes and play a role in autophagy (Chen and Zhong, 2012). While TECPR1 proteins do contain a DysF domain, like the proteins from the PEX23 family, they also contain several tectonin repeats (TECPR) and a PH domain, in addition to a beta-propeller structure (Ogawa et al., 2011). It is therefore unlikely that they perform a function similar to PEX23 family proteins in Fungi and they cannot be considered PEX23 orthologs.

PEX35 and PEX37
Little is known about either PEX35 or PEX37, but both seem to play a role in regulating peroxisome proliferation. PEX35 is unique to S. cerevisiae and closely related species in the Saccharomycetaceae family, while PEX37 is found in most other yeast species and filamentous fungi. PEX35 has no known functional domains or similarity to other known PEX proteins. Only one study investigating PEX35 has been published to date, showing that PEX35 is a PMP that interacts with vesicle budding inducer Arf1 and localizes in the proximity of proteins from the Pex11 family (Yofe et al., 2017). The authors speculate that PEX35 may regulate peroxisome fission alongside proteins of the Pex11 family. O. polymorpha PEX37 is a peroxisomal transmembrane protein that affects peroxisome segregation and proliferation under peroxisome-repressing conditions, but not under peroxisome-inducing conditions. So far, only one study has investigated this protein (Singh et al., 2020). PEX37 belongs to the same protein family as human PXMP2, N. crassa Woronin body protein Wsc and S. cerevisiae mitochondrial inner membrane protein Sym1 and its human homolog MPV17, many of which are thought to act as channels. Human PXMP2 is able to partially rescue the phenotype present in the absence of O. polymorpha PEX37, suggesting that these proteins have similar functions (Singh et al., 2020).

Moderately Conserved PEX Proteins
In many species, the PEX1/PEX6 complex is recruited to the peroxisomal membrane via an anchoring protein. These membrane anchors are much less conserved than PEX1 and PEX6 themselves, with different homologous, but not orthologous, proteins acting as anchoring protein in different species. In vertebrates and most Fungi, the anchoring protein is PEX26, while in S. cerevisiae and closely related species in the Saccharomycetaceae family it is PEX15 (Kiel et al., 2006) and in plants it is APEM9 (Cross et al., 2016). Despite sharing only weak sequence identity, PEX15, PEX26, and APEM9 do have several features in common. All three proteins are tail-anchored proteins (Halbach et al., 2006; Cross et al., 2016) and tether the PEX1/PEX6 complex to the peroxisomal membrane via PEX6 (Birschmann et al., 2003; Matsumoto et al., 2003; Goto et al., 2011).

DISCUSSION
We used a comparative genomics approach to provide an up-to-date overview of all PEX protein families identified so far, in a range of representative organisms from distant eukaryotic lineages, including many model organisms that are currently extensively used in cell biology research. In agreement with previous studies, our computational survey identified a core set of PEX proteins that is broadly conserved across distant eukaryotic lineages: PEX1/2/3/5/10/12/14/19 (Gabaldón et al., 2006; Schlüter et al., 2006) and, in addition, PEX6/7/11/13/16 (this study). Gabaldón et al. (2006) previously proposed a minimal ancestral eukaryotic peroxisome in LECA consisting of PEX1/2/4/5/10/14 and several non-PEX proteins, based on the proteins found in yeast, rats, T. brucei and L. major. Similarly, Schlüter et al. (2006) proposed an extended peroxisomal core set including PEX3/19/10/12 proteins as peroxisomal markers. We suggest a broader core set of PEX proteins based on a wider range of species than previous studies, encompassing organisms from distant clades such as Metazoa, Fungi, Amoebozoa, Archaeplastida, Stramenopiles, Alveolates and Rhizaria (SAR) and Excavata. This suggests that the core set of PEX proteins (or protein families like Pex11) defined in this study was likely already present in the last eukaryotic common ancestor (LECA) and that these proteins define the identity of ancestral peroxisomes.
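The logic of deriving a broadly conserved core set from presence/absence calls can be sketched as follows. This is a simplified illustration under stated assumptions, not the pipeline used in this study: the lineage labels, protein labels and the `min_fraction` threshold are hypothetical placeholders, and real analyses must also account for homology detection failure.

```python
# Illustrative sketch only: proteins found in at least a given fraction of
# lineages are called "core". The presence table is invented placeholder data.
def core_set(presence, min_fraction=1.0):
    """Return the sorted list of proteins present in >= min_fraction of lineages."""
    lineages = list(presence)
    all_proteins = set().union(*presence.values())
    core = set()
    for protein in all_proteins:
        n_present = sum(protein in presence[lineage] for lineage in lineages)
        if n_present / len(lineages) >= min_fraction:
            core.add(protein)
    return sorted(core)


# Hypothetical presence/absence calls per lineage.
toy_presence = {
    "lineage_A": {"protA", "protB", "protC", "protD"},
    "lineage_B": {"protA", "protB"},
    "lineage_C": {"protA", "protB", "protC"},
}

strict_core = core_set(toy_presence)                      # present everywhere
relaxed_core = core_set(toy_presence, min_fraction=0.6)   # tolerates one loss
print(strict_core)
print(relaxed_core)
```

A relaxed threshold is one simple way to keep a protein in the core set when a single lineage has lost it secondarily; the appropriate cutoff is a judgment call and is not prescribed by the text.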
It is well-established that several organisms, mainly pathogenic protists, do not present any of the core set of PEX proteins or retain only few of them (Žárský and Tachezy, 2015; Gabaldón et al., 2016; Moog et al., 2017). However, given that related organisms from the same clade contain the majority of proteins from this core set, it is likely that these absences are due to secondary losses. For example, while D. discoideum presents a complete core set of PEX proteins, its close relative E. histolytica (an anaerobic human pathogen) only presents some PEX proteins, such as PEX5, PEX16, and PEX19 (see Figures 2-4). This could mean that these organisms have lost this organelle relatively recently and thus have not entirely lost all PEX proteins yet, but it could also suggest that the remaining PEX proteins retain non-peroxisomal functions. This is not implausible, as some PEX proteins have already been suggested to be involved in non-peroxisomal functions. For instance, human PEX3 and PEX19 have been implicated in targeting of lipid droplet protein UBXD8 (Schrul and Kopito, 2016). On the other hand, the example of E. histolytica illustrates drastic evolutionary changes in peroxisomal biology, a fact already observed in other amoeba species like Mastigamoeba balamuthi (Le et al., 2020). Although peroxisome evolution is beyond the scope of our manuscript, we refer to the seminal contributions of others to the field (e.g., Gabaldón et al., 2006; Schlüter et al., 2006; Žárský and Tachezy, 2015; Gabaldón, 2018). These evolutionary studies revealed that peroxisomes likely originated from the endoplasmic reticulum (Gabaldón et al., 2006; Schlüter et al., 2006), with a portion of the peroxisomal proteins, mostly enzymes, having a mitochondrial origin (Gabaldón et al., 2006).
Besides a broadly conserved core set of PEX proteins, we found that a large number of PEX proteins is specific to the kingdom of Fungi, in line with previous findings by Schlüter et al. (2006). Although there is increasing consensus that homology detection failure is frequent (Weisman et al., 2020), our internal controls (see Methods) still suggest that these Fungi-specific proteins are absent in other lineages. This not only exposes a bias in peroxisome research toward Fungi, but also reveals that peroxisomes are dynamic organelles, their composition evolving under different evolutionary pressures. The loss of specific PEX proteins in some eukaryotes, such as the loss of proteins associated with the PTS2 targeting pathway in C. elegans and the loss of PEX16 in S. cerevisiae and other yeasts, further supports this notion.
Intriguingly, PEX proteins in human pathogens, like T. gondii, T. brucei, and L. major, were often difficult to detect. Moreover, these PEX proteins frequently had additional domains, which could indicate that they may have obtained additional functions. The low homology between PEX proteins of human and human pathogens may be advantageous for the identification of specific drug targets.
The vast majority of the core PEX proteins (PEX1/2/5/6/7/10/12/13 and 14) are involved in matrix protein import, while only a few (PEX3, PEX16, and PEX19) play a role in PMP sorting. In addition to these core PEX proteins all eukaryotes contain multiple proteins of the Pex11 family, which are involved in several peroxisome-related processes. It is unclear why so few proteins have been identified that play a role in PMP sorting. Proteins of the common ER protein sorting machineries, such as the Sec and GET translocons, have been reported to function in the indirect pathway of PMP sorting. The absence of these proteins is lethal in yeast, explaining that such mutants have not been obtained in screens for yeast peroxisome deficient mutants. For the direct pathway of PMP sorting it is unlikely that the entire sorting/insertion machinery consists of only three, or even two for yeast (PEX3/19), proteins.
Most of the currently known PEX genes were identified in the 1990s by very successful genetic approaches to identify peroxisome deficient (pex) yeast mutants. Yeast pex mutants are viable and have distinct growth phenotypes (e.g., deficiency to grow on oleic acid or methanol), which greatly facilitated the isolation of these mutants and cloning of the corresponding genes by functional complementation. Most likely this caused the bias toward fungal PEX genes. In addition to S. cerevisiae, which is the main yeast model in cell biology, a few other yeast species were used to identify PEX proteins (Komagataella phaffii [formerly Pichia pastoris], Ogataea polymorpha [formerly Hansenula polymorpha] and Yarrowia lipolytica). Notably, several conserved PEX proteins that are present in the latter three yeast species are absent in S. cerevisiae (for instance PEX20, PEX26, PEX37 and proteins of the main Pex11 group containing fungal PEX11C), while orthologs of the S. cerevisiae PEX proteins PEX9, PEX15 and PEX35 are absent in all other species that we analyzed. This stresses the importance of using several yeast models besides S. cerevisiae in cell biology research.
Fusion of human cell lines, derived from patients suffering from peroxisome biogenesis disorders, resulted in the classification of these patients in 12 genotypes/complementation groups (Fujiki, 2016). Using known yeast PEX genes, human orthologs were identified by homology searches on the human expressed sequence tag database. By functional complementation of the cell lines with these putative human PEX genes, 12 of the currently known human PEX genes were identified. Because mislocalization of the PTS1 protein catalase was used as criterion for peroxisome deficiency, the human PTS2 receptor PEX7 was not identified by this approach (Fujiki, 2016). Together with the results of functional complementation of mutant Chinese hamster ovary (CHO) cell lines, at present 16 mammalian PEX proteins are known (compared with 29 in S. cerevisiae).
It is unlikely that all human/mammalian PEX proteins have been identified. Mutations in human/mammalian PEX genes could cause lethal phenotypes, explaining why they have not been isolated in mutant screens. Also, there may be functional redundancy among human PEX genes, which prevents their identification by mutant complementation approaches. Conversely, mutations in yet unknown mammalian PEX genes could cause relatively weak phenotypes and hence were overlooked. Indeed, the approaches used so far resulted in the identification of PEX11β, but not of PEX11α and PEX11γ. Alternative approaches, like the identification of novel peroxisomal proteins using proteomics of isolated mammalian peroxisomes may result in the characterization of novel mammalian PEX proteins.
PEX proteins (peroxins) were originally defined as proteins "involved in peroxisome biogenesis (inclusive of peroxisomal matrix protein import, membrane biogenesis, peroxisome proliferation, and peroxisome inheritance)" (Distel et al., 1996). However, proteins fitting this definition are not always named as such. For instance, T. brucei GIM5A is a member of the Pex11 protein family, but is not named 'PEX.' Also, two proteins involved in peroxisome inheritance, Inp1 and Inp2, are not called PEX. Therefore "inheritance" could be omitted from the original definition of PEX proteins, or these proteins could be renamed. Some proteins that fulfill the PEX protein definition are also involved in other processes and obviously not called PEX. This is for instance the case for the organelle fission proteins FIS1 and DRP1, and ER proteins that play a role in the indirect sorting pathways of PMPs.
Current PEX protein nomenclature has several issues and inconsistencies that can easily lead to confusion. As PEX proteins are numbered chronologically, there is no intuitive link between their names and their function and/or conservation. Additionally, there are several naming inconsistencies relating to PEX protein families. For instance, in higher eukaryotes Pex11 protein family members are named PEX11'X' (e.g., PEX11α/β/γ, PEX11A/B/C). The nomenclature of yeast proteins does not allow the addition of the extra symbol 'X.' These genes invariably consist of a three-letter code (PEX) followed by a number, explaining why PEX11 orthologs in yeast have been designated PEX25, PEX27, and PEX34, not PEX11X.
Since most PEX proteins were initially identified in yeast species and numbered in the order in which they were described, proteins belonging to the same protein family have received different names. For instance, the two AAA ATPases are called PEX1 and PEX6, while the three RING proteins are called PEX2, PEX10 and PEX12. Lastly, there are proteins carrying the same name that are not actually orthologs (e.g., PEX23 in Metazoa). In summary, current PEX protein nomenclature can easily lead to confusion as it is often far from intuitive, sometimes inconsistent and occasionally wrong. This not only leads to confusion within the peroxisome field, but the large number of PEX proteins numbering up to 37 can be quite intimidating for researchers from other fields.
We therefore suggest that it may be prudent to come up with a new naming system. Although it is beyond the scope of the current paper, similar new naming systems are not unprecedented. Indeed, the name PEX protein itself was devised to unify nomenclature regarding proteins involved in peroxisome biogenesis (Distel et al., 1996), thereby renaming the 13 proteins known at the time to be involved in peroxisome biogenesis. More recently, proteins involved in mitochondrial contact site and cristae organizing system (MICOS) (Pfanner et al., 2014), autophagy-related proteins (Klionsky et al., 2003) and ribosomal proteins (Ban et al., 2014) have been re-named. In addition, we recommend setting up guidelines for naming newly discovered 'PEX proteins,' taking into account phylogeny to extend to ortho- and inparalogs. Moreover, we propose amending the definition of 'PEX proteins' as posed in 1996 (Distel et al., 1996). Proteins involved in peroxisome inheritance such as Inp1 and Inp2 have so far been named differently and should be removed from the definition.
Adopting an entirely new naming system may be very difficult. However, it would already be very helpful to only re-name the most confusing and inconsistent parts. The two largest protein families, the Pex11 family and the Pex23 family, together make up about one-third of all PEX numbers and are arguably the most confusingly named.

DATA AVAILABILITY STATEMENT
Publicly available datasets were analyzed in this study. This data can be found here: Uniprot (https://www.uniprot.org/).

ACKNOWLEDGMENTS
A preprint of this manuscript has been released on bioRxiv.