A novel predicted ADP-ribosyltransferase-like family conserved in eukaryotic evolution

The presence of many completely uncharacterized proteins, even in well-studied organisms such as humans, seriously hampers full understanding of the functioning of living cells. ADP-ribosylation is a common post-translational modification of proteins; nucleic acids and small molecules can also be modified by the covalent attachment of ADP-ribose. This modification, important in cellular signalling and infection processes, is usually executed by enzymes from the large superfamily of ADP-ribosyltransferases (ARTs). Here, using bioinformatics approaches, we identify a novel putative ADP-ribosyltransferase family, conserved in eukaryotic evolution, with a divergent active site. The hallmark of these proteins is the ART domain nestled between flanking leucine-rich repeat (LRR) domains. LRRs are typically involved in innate immune surveillance. The members of the novel family appear to be putative novel ADP-ribosylation-related actors, most likely pseudoenzymes. Sequence divergence and the lack of a clearly detectable “classical” ART active site suggest that the novel domains are pseudoARTs, yet atypical ART activity or an alternative enzymatic activity cannot be excluded. We propose that this family, including its human member LRRC9, may be involved in an ancient defense mechanism, with analogies to the innate immune system, coupling pathogen detection to ADP-ribosyltransfer or other signalling mechanisms.


INTRODUCTION
ADP-ribosyltransferases (ARTs) are enzymes catalyzing the transfer of ADP-ribose from oxidized nicotinamide adenine dinucleotide (NAD+) to different acceptor molecules. Thus, they are responsible for chemical modification of various targets such as proteins, nucleic acids and small molecules. These enzymes are highly conserved in evolution and widespread in nature. They are common to all three domains of life: the Archaea, the Bacteria and the Eukarya. The multiplicity of ADP-ribosylated substrates is reflected in the variety of functions performed by ARTs (Aravind et al., 2015;Cohen & Chang, 2018;Munnur & Ahel, 2017;Otto et al., 2005;Teloni & Altmeyer, 2016).
The members of the poly-ADP-ribosyltransferase (PARP) family, PARP1 and PARP2, are examples of ARTs that modify proteins interacting with DNA. PARP1 and PARP2 are localized in the nucleus where they are involved in many cellular processes (Choi et al., 2016;Liang et al., 2013;Riccio, Cingolani & Pascal, 2016;Szántó et al., 2014). PARP1 and PARP2 are engaged in DNA repair, that is, the elimination of DNA damage such as deamination, hydroxylation and methylation, as well as in the repair of single-strand breaks.
Notes. *PARP family members are sorted by subgroup (font format) on the basis of similarities in amino acid sequence, intron positions and associated protein domains (Otto et al., 2005). **For these domains, Pfam family assignments cannot be made using the standard Pfam HMM tool. Italicized motifs were identified based on HHpred and FFAS03 alignments to canonical PARPs; in some cases the two methods were not in agreement. DUF3715 domains correspond to a part of TASOR-like ART domains.
domains (Dudkiewicz & Pawłowski, 2019). Other examples of novel ADP-ribosylation players are provided by the viral macro domains, present in many dangerous viruses, including the SARS and SARS-CoV-2 coronaviruses (Saikatendu et al., 2005), and by the recently characterized novel ART-like domain (DUF3715) in the human TASOR protein involved in gene silencing (Douse et al., 2020;Tchasovnikarova et al., 2015).
In this paper, building on our experience in the identification of novel enzyme families (Dudkiewicz, Lenart & Pawłowski, 2013;Dudkiewicz & Pawłowski, 2019;Dudkiewicz et al., 2012;Pawłowski et al., 2006;Sreelatha et al., 2018), we identified structural similarity between ART-like catalytic domains and a family of uncharacterized eukaryotic protein domains present in homologs of the leucine-rich repeat containing protein 9 (LRRC9). For clarity, we will call this domain LRRC9-ART. Human LRRC9 is annotated in the UniProt database as a protein of unknown function with experimental evidence at transcript level. The LRRC9 locus is located on chromosome 14 (q23.1). The gene and protein names refer to the twenty-two leucine-rich repeats located at positions 97-246 and 717-1365. According to The Human Protein Atlas (Uhlen et al., 2015), expression of LRRC9 mRNA is enhanced in brain, pituitary gland and testis.
We investigated sequence similarities between the novel and the known ADP-ribosyltransferases. We reconstructed phylogenetic relationships between sequences within the LRRC9 domain family and other ART clan families. We also explored sequence variability in the novel ART-like domain in the context of predicted three-dimensional structures and proposed likely biological functions for these domains.

MATERIALS AND METHODS
The FFAS03 (Xu et al., 2014), HHpred (Zimmermann et al., 2017) and Phyre2 (Kelley et al., 2015) servers were used to determine distant sequence similarities of LRRC9 central domain to proteins of known structures from public databases. Standard parameters and significance thresholds were selected.
The representative set of LRRC9 ART-like domain sequences was collected by submitting the ART-like domain of the human LRRC9 protein (UniProtKB: Q6ZRR7.2, positions 389-628) to two iterations of JackHMMER (also with standard parameters) run on the Reference Proteomes database.
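The seed-extraction and search step described above can be sketched as follows. This is a minimal illustration rather than the script actually used: the helper names and the file names (`seed.fasta`, `reference_proteomes.fasta`, `hits.tbl`) are hypothetical, and the only JackHMMER options set are `-N` (number of iterations) and `--tblout` (tabular output), with everything else left at defaults.

```python
def extract_domain(sequence, start, end):
    """Return residues start..end using 1-based, inclusive coordinates
    (the convention used for the UniProt positions quoted in the text)."""
    if not (1 <= start <= end <= len(sequence)):
        raise ValueError("coordinates fall outside the sequence")
    return sequence[start - 1:end]


def jackhmmer_command(seed_fasta, database_fasta, iterations=2, tblout="hits.tbl"):
    """Build (but do not run) a jackhmmer command line as an argument list.

    File names are placeholders; pass the list to subprocess.run() if
    HMMER is installed and the files exist.
    """
    return [
        "jackhmmer",
        "-N", str(iterations),   # maximum number of search iterations
        "--tblout", tblout,      # per-target tabular summary of hits
        seed_fasta,
        database_fasta,
    ]
```

For the coordinates given above (389-628), `extract_domain` returns a 240-residue segment from any sequence at least 628 residues long.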
The MAFFT program (Katoh, Rozewicki & Yamada, 2019) was used to build the multiple sequence alignments of the novel domains and PARP catalytic domains obtained from the rp75 set from the Pfam database (El-Gebali et al., 2019). In-house scripts were used to merge family-wise multiple sequence alignments according to a FFAS pairwise alignment of representatives of the two families. The WebLogo server (Crooks et al., 2004) was used to visualize results as sequence logos.
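The in-house merging scripts are not described in detail, so the following is only a sketch of one way such a merge could work, under stated assumptions: the first row of each family alignment is the representative, and the pairwise alignment is global, covering both representatives end to end. The function name `merge_alignments` and the toy sequences are hypothetical.

```python
def merge_alignments(msa_a, msa_b, pair_a, pair_b):
    """Merge two family alignments guided by a pairwise alignment of their
    representatives (assumed to be the first row of each family alignment).

    pair_a / pair_b are the gapped representative sequences from the
    pairwise alignment; gaps are written as '-'.
    """
    cols_a = list(zip(*msa_a))
    cols_b = list(zip(*msa_b))
    gap_a = ("-",) * len(msa_a)
    gap_b = ("-",) * len(msa_b)
    merged = []
    i = j = 0

    def flush_a():
        # Emit family-A columns in which the representative itself has a gap.
        nonlocal i
        while i < len(cols_a) and cols_a[i][0] == "-":
            merged.append(cols_a[i] + gap_b)
            i += 1

    def flush_b():
        nonlocal j
        while j < len(cols_b) and cols_b[j][0] == "-":
            merged.append(gap_a + cols_b[j])
            j += 1

    for x, y in zip(pair_a, pair_b):
        flush_a()
        flush_b()
        if x != "-" and y != "-":      # representatives aligned to each other
            merged.append(cols_a[i] + cols_b[j])
            i += 1
            j += 1
        elif x != "-":                 # residue only in representative A
            merged.append(cols_a[i] + gap_b)
            i += 1
        elif y != "-":                 # residue only in representative B
            merged.append(gap_a + cols_b[j])
            j += 1
    flush_a()
    flush_b()
    return ["".join(row) for row in zip(*merged)]
```

Columns where both representatives carry a residue are fused, while insertions private to one family are padded with gaps in the other, so the within-family alignments are preserved unchanged.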
The CLANS (Frickey & Lupas, 2004) algorithm was used to visualize close and distant similarities between the putative and the known ADP-ribosyltransferase domains. Sequence similarities up to a BLAST E-value of 1, 1e-2 or 1e-4 and the BLOSUM45 substitution matrix were used. To assemble a collection of the known ADP-ribosyltransferase domains, the following sets of sequences were obtained from the Pfam database: ADPrib_exo_Tox rp15, ADPRTs_Tse2 rp15, Anthrax-tox_M rp75, Arr-ms rp15, ART rp15, ART-PolyVal rp15, AvrPphF-ORF-2 rp15, Diphtheria_C rp75, Dot_icm_IcmQ rp35, DUF2441 rp15, DUF952 rp15, Enterotoxin_a rp15, Exotox-A_cataly rp75, NADase_NGA rp35, PARP rp15, Pertussis_S1 rp15, PTS_2-RNA rp15, RolB_RolC rp55 and TNT rp15. Next, the collected data were supplemented with three sequence sets representing the following ART-like domain families not included in the Pfam database: ESPJ, NEURL4 and DUF3715/TASOR. Sets of ESPJ and SidE domain sequences were obtained by submitting sequences from the UniProtKB and Protein NCBI databases (UniProtKB: W0AL45, positions 52-214 and refseq ID: YP_094288.1, positions 686-912, respectively) to two iterations of JackHMMER with standard parameters, run on the UniProtKB database. The input sequence of NEURL4 (UniProtKB: F2TYZ7, positions 102-275) was subjected to the same procedure, but extended to three iterations and run on the Reference Proteomes database. The set of DUF3715 sequences was obtained by submitting the TASOR ART domain (UniProtKB: Q9UK61.3, positions 92-337) to two iterations of JackHMMER with standard parameters. Next, the whole collection of putative and known domain sequences was prepared for the CLANS procedure by clustering with CD-HIT and selecting representatives at a 70% sequence identity threshold (Huang et al., 2010).
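The effect of the three E-value cut-offs on the similarity network can be illustrated with a short sketch. It assumes pairwise hits are already available as `(query, subject, e_value)` tuples; this tuple format and the function names are hypothetical and are not part of CLANS itself.

```python
THRESHOLDS = (1.0, 1e-2, 1e-4)  # the three E-value cut-offs quoted in the text


def edges_at_threshold(hits, max_evalue):
    """Keep pairwise hits (query, subject, e_value) with E-value <= max_evalue,
    dropping self-hits, which carry no clustering information."""
    return [(q, s, e) for q, s, e in hits if q != s and e <= max_evalue]


def edge_counts(hits, thresholds=THRESHOLDS):
    """Number of retained similarity edges at each threshold."""
    return {t: len(edges_at_threshold(hits, t)) for t in thresholds}
```

Stricter thresholds retain only the stronger similarities, so distantly related families that are connected at an E-value of 1 may separate into distinct clusters at 1e-4.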
In order to determine the phylogenetic spread of the novel ART domains, a set of LRRC9 ART-like domain homologs was obtained using a JackHMMER search seeded with the input sequence UniProtKB: Q6ZRR7.2, positions 389-628. Then, the Phylogeny.fr platform (Dereeper et al., 2008) was used to construct a phylogenetic tree. A multiple sequence alignment was generated using the MUSCLE program (Edgar, 2004). Next, the advanced mode of the platform was used to build the phylogenetic tree with the maximum likelihood PhyML program (Guindon et al., 2009) with standard settings (WAG amino-acid substitution model, four substitution rate categories, aLRT test for branch support). Phylogenetic relationships between homologs were visualized as a dendrogram using the iTOL server (Letunic & Bork, 2016).
Putative NAD binding pockets in all modelled structures and in selected native PARP structures were first found and described using DoGSiteScorer, a fully automatic algorithm for pocket and druggability prediction available at the ProteinsPlus server (Volkamer et al., 2012). They were independently identified using the COFACTOR algorithm (Roy & Zhang, 2012), an option provided by the I-Tasser (Iterative Threading ASSEmbly Refinement) server for protein structure and function prediction (Yang et al., 2015), which threads the query structural model through the BioLiP protein function database. COFACTOR identifies potential functional sites by local and global structure matches and suggests annotations. Putative poses of the NAD ligand in the LRRC9-ART trRosetta model were predicted by docking using the AutoDock Vina module in UCSF Chimera (Trott & Olson, 2010). Structures were visualized and analyzed using UCSF Chimera tools (Huang et al., 2014).
The three-domain model for full-length human LRRC9 protein was built using AIDA (Ab Initio Domain Assembly Server) (Xu et al., 2015) based on structures modelled using Modeller9v21 separately for three identified domains: N-terminal LRR region with helical fragment (PDB code of modelling template: 3OJA, 15% identity), ART-like domain (template: 3HKV), and C-terminal LRR region domain (template: 4LSX, 21% identity).
Conservation values of sequence positions in multiple sequence alignment created for the human LRRC9-ART domain homologues (for mapping onto the modelled structures), were derived from Jalview alignment editor (Waterhouse et al., 2009) and were automatically calculated as the number of conserved physicochemical properties for each column of the alignment (Livingstone & Barton, 1993).
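The idea behind a property-based conservation score can be sketched as below. This is a simplified illustration and not Jalview's implementation: the property table is an approximate Taylor-style classification assumed here for demonstration, gaps are simply ignored (an assumption), and a property counts as conserved when every residue in the column has it or every residue lacks it.

```python
# Illustrative property sets (an approximation; not Jalview's exact table).
PROPERTIES = {
    "hydrophobic": set("ACFGHIKLMTVWY"),
    "polar": set("CDEHKNQRSTWY"),
    "small": set("ACDGNPSTV"),
    "tiny": set("AGS"),
    "aliphatic": set("ILV"),
    "aromatic": set("FHWY"),
    "positive": set("HKR"),
    "negative": set("DE"),
    "charged": set("DEHKR"),
    "proline": set("P"),
}


def column_conservation(column):
    """Count properties on which all non-gap residues of a column agree."""
    residues = [r for r in column.upper() if r not in "-."]
    if not residues:
        return 0
    score = 0
    for members in PROPERTIES.values():
        # One distinct truth value means the property is shared or absent in all.
        if len({r in members for r in residues}) == 1:
            score += 1
    return score


def alignment_conservation(alignment):
    """Per-column scores for a list of equal-length aligned sequences."""
    return [column_conservation("".join(col)) for col in zip(*alignment)]
```

With this table an invariant column scores 10, while a column mixing residues of different character scores lower; such per-column values are what would be mapped onto a structure model.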
The LRR repeats were identified using the LRRFinder tool (Offord, Coffey & Werling, 2010). Gene Ontology analysis of human genes encoding LRR-containing proteins (i.e., their cellular/extracellular localization, molecular function and association with biological processes) was performed with the use of Panther Classification System (Mi et al., 2017). The domain organization of LRRC9 proteins was visualised using the DOG2.0 tool (Ren et al., 2009).
For analysis of protein and mRNA expression, the Proteomics-db, PAXdb, neXtProt and Protein Atlas databases were used (Samaras et al., 2020;Thul & Lindskog, 2018;Wang et al., 2015;Zahn-Zabal et al., 2020). The BioGRID Interaction Database was used to obtain information about possible place of LRRC9 in human protein-protein interaction networks (Oughtred et al., 2019). For prediction of the subcellular localization of LRRC9, the DeepLoc server was used (Almagro Armenteros et al., 2017).

Assignment of the central domain of LRRC9 to the ADP-ribosyl clan
A sequence similarity screen of human proteins for remote homologs of ADP-ribosyltransferases (ARTs) using the FFAS03 method indicated that weak ART similarity might exist in the central region of the LRRC9 protein (Figs. 1, 2). In order to examine in detail this similarity, we used FFAS03, HHpred and Phyre2 servers. All of them are dedicated to protein structure prediction, allowing detection of non-obvious sequence similarity between very distant protein homologs. These three independent bioinformatics tools revealed statistically significant albeit distant sequence similarity of a region of human LRRC9 protein to representatives of the PARP catalytic domain family between positions 392 and 625 (Table 2). Such remote sequence similarity cannot be the basis for drawing a definite conclusion about an enzymatic function of the putative ART domain.
The relationship of the novel LRRC9 ART-like family, found in many eukaryotic lineages (Fig. 3), to the known ART families can be visualized using the sequence-based CLANS clustering approach (Fig. 4). CLANS analysis was performed at three different E-value thresholds.
Multiple sequence alignment comparison allowed us to identify few similarities and substantial differences between sequences of PARP catalytic domains and the ART-like domains, which we identified in the central part of the human LRRC9 protein sequence, from position 392 to 625. The tyrosine residue (motif II), one of the catalytic triad elements in PARP, was found to be replaced by a weakly conserved Glu (E525) in human LRRC9-ART. Also, the His and Ser residues (from motif I, His-x-Ser, positions 474-476 in the logo in Fig. 1), highly conserved among PARP enzymatic domain members, were not conserved in LRRC9. The His residue was most often substituted by Tyr. Glutamate, the third element of the catalytic triad (motif III, position 604 in the sequence logo), was replaced by a poorly conserved Lys (Fig. 1). In PARP1, PARP2, PARP10, PARP12 and PARP15 sequences, there is also another important Tyr residue (Tyr932 in human PARP10) that is responsible for nicotinamide stacking (Karlberg et al., 2015). Some authors include this second Tyr in the group of conserved residues that are the hallmark of PARP ART domains (Karlberg et al., 2015), hence we will use the term ''catalytic tetrad HYYE'', representing the non-contiguous catalytic motif H-x(n1)-Y-x(n2)-Y-x(n3)-E. This tyrosine is conserved in PARP (at the logo position 538, Fig. 1A) and not conserved in the LRRC9 ART-like domain (Fig. 1B). The lack of most of the catalytic amino acid residues, evolutionarily conserved in PARP catalytic domains, suggests that LRRC9 ART-like domains may be pseudoenzymes. Interestingly, the FFAS03 sequence alignment between the human LRRC9 ART-like and PARP10 ART domains, especially in its central part, is not unequivocal.
We obtained different alignments using the FFAS03 and HHpred servers, and even in the FFAS results there are some suboptimal alignment paths to be considered (Figs. S1 and S2). According to the FFAS alignment, the PARP region flanked by the two tyrosines belonging to the catalytic tetrad HYYE is aligned to a poorly conserved region 525-538 of the LRRC9 ART-like domain (Fig. 1A), but three positions upstream and downstream of this fragment there are two distinct motifs of the LRRC9-ART domain: PR[IL] and [DE]xxxFRHG, respectively. This suggests that sequence-based alignments may be inaccurate in this region and that one or both of the above-mentioned LRRC9 conserved motifs may in fact be involved in the active site. Also, the strongly conserved D-(5)-PEY-(2)-EFEY motif in LRRC9 (logo positions 609-623), although not aligned to the PARP active site, may be speculated to be functionally important.
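The three LRRC9-ART motifs named above can be located in a sequence with simple regular expressions. The sketch below assumes that ''x'' and the bracketed numbers in the motif notation stand for any residue; the function name and the toy sequence in the usage note are hypothetical and not taken from LRRC9.

```python
import re

# Motif notation from the text translated into regular expressions,
# assuming 'x' / '(n)' mean n arbitrary residues.
MOTIFS = {
    "PR[IL]": re.compile(r"PR[IL]"),
    "[DE]xxxFRHG": re.compile(r"[DE].{3}FRHG"),
    "D-(5)-PEY-(2)-EFEY": re.compile(r"D.{5}PEY.{2}EFEY"),
}


def find_motifs(sequence):
    """Return 1-based start positions of each motif (non-overlapping matches)."""
    sequence = sequence.upper()
    return {
        name: [m.start() + 1 for m in pattern.finditer(sequence)]
        for name, pattern in MOTIFS.items()
    }
```

For example, `find_motifs("AAPRLGGGDKKKFRHGAADAAAAAPEYAAEFEY")` reports one occurrence of each motif in this made-up sequence. Positions are within the scanned sequence, so an offset must be added when scanning an excised domain rather than the full-length protein.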

Structural analysis of the homology model
Because distant homology detection methods did not offer us a consensus for template selection, we decided to create models based on three different approaches: classical one-template homology modeling (Modeller), reassembling structural fragments from threading templates using Monte Carlo simulations (I-Tasser approach, best model with C-score 0.5) and deep learning based methods which use covariance signals obtained from sequence alignments to predict inter-residue distances (trRosetta, best model confidence: 0.70). In the next step, we used DoGSiteScorer to analyse the structure models to identify putative ligand binding pockets and compare them to the pockets in known PARP structures (Table S1, Fig. S3). To illustrate the uncertainty of the structural arrangements of putative ligand-binding residues identified by FFAS03 alignments, we used the homology model (PARP10/3HKV template-based) and the trRosetta (deep learning) model, which according to ligand pocket analysis are more likely to accommodate a ligand. A comparison of the template 3HKV structure and these two alternative models is presented in Fig. 5. To answer the question of whether the observed distant sequence similarity between the LRRC9-ART and PARP families can translate into functional similarity, we focused first on the known structural determinants of ADP-ribosyltransferase activity (Karlberg et al., 2015) and related these to LRRC9-ART features. In classical PARPs, like PARP1 and PARP2, possessing poly-ADP-ribosylation (PAR) activity, besides the canonical H-x(n1)-Y-x(n2)-Y-x(n3)-[EI] tetrad mentioned above, there are two other key positions associated with structural and functional roles of these proteins: a glycine residue with a nicotinamide anchoring function and an additional catalytic amino acid, lysine 903 (according to PARP1 numbering) (Karlberg et al., 2015). In PARP10 and PARP12, which have mono-ADP-ribosylation (MAR) activity, there is a leucine or tyrosine residue instead of this catalytic lysine.
Inspection of the profile-profile alignments of the human LRRC9 and PARP10 ART domains shows that, of the positions aligned to the canonical tetrad, only the first is conserved within the LRRC9 family, and it is substituted relative to PARPs (H->Y). Thus, the histidine and glycine responsible for nicotinamide anchoring and binding are replaced by tyrosine and valine (Figs. 5B, 5C). In PARPs, the glycine makes two conserved H-bonds to the nicotinamide via its main chain carbonyl oxygen and amine hydrogen. These should not be disrupted by the valine substitution, as the side chain points outward from the active site. Interestingly, in LRRC9-ART at position 532 there is a weakly conserved Lys, as in PARP family members with poly-ADP-ribosylation (PAR) activity. However, instead of the second Tyr (logo position 538) and the Glu/Ile residue (604), there are poorly conserved Ser and Lys.
To identify possible ligand pockets in modelled structures we used DoGSiteScorer. For comparison, we also analysed classic PARP structures where ligand binding sites are known and described. We selected PARP1 (ART domain with poly-ADP-ribosylation activity), PARP10 (mono-ADP-ribosylation activity) and two examples of inactive PARP/ART-like domains: PARP13 and TASOR. In case of native structures, DoGSiteScorer correctly identified pockets in the vicinity of PARP hallmark H-Y-Y-E tetrad or its counterpart as the best scoring ones. In case of models, pockets detected in the neighborhood of H-Y-Y-E aligned residues were the best scoring ones for I-Tasser and trRosetta models (Table S1). Overall shape, volume and surface of modelled pockets were very different, but the best score (close to the score obtained for native NAD binding pocket in PARP1) and parameters mostly resembling active PARPs were identified in trRosetta model of LRRC9-ART (Fig.  S3). Thus, this model appears to best capture the putative ligand binding site in LRRC9.
Additionally, to investigate possible ligands for the LRRC9-ART domain we used COFACTOR. COFACTOR is a functional annotation tool based on threading of the protein structure query through a database of ligand-protein binding interactions by local and global structure matches to identify functional sites (Zhang, Freddolino & Zhang, 2017). COFACTOR suggests the best candidates fitting into the ligand binding site of a given structure. Adenosine monophosphate (AMP) turned out to be among the best scoring ligands for the LRRC9-ART homology model while NAD, a typical ligand for ART domains, was absent from the predicted ligand list. AMP molecule, which differs from NAD by the lack of nicotinamide mononucleotide moiety, is smaller and probably fits better in the modelled LRRC9-ART ligand binding groove. The docked AMP molecule pose was deduced from homology to crystal structure of the catalytic domain of Pseudomonas aeruginosa exotoxin A (PDB ID: 1DMA) which catalyzes the transfer of ADP ribose from NAD to elongation factor-2 in eukaryotic cells and hence inhibits protein synthesis. These docking results do not suggest that AMP is a physiological ligand of LRRC9-ART, rather, they provide a likely binding pose of the AMP moiety of NAD. For trRosetta model, COFACTOR found tankyrase-1 complexed with a PARP inhibitor, P4L, as the best ligand binding site analogue. Here, NAD molecule was not present on the list of ligands complexed with identified binding site structural homologues.
To examine the possibility of NAD binding, we attempted to dock the full NAD molecule into the hypothetical LRRC9-ART active site in the trRosetta (deep learning) structure model. The ligand was constructed with the UCSF Chimera build tool and then docked using the AutoDock Vina procedure, without specific constraints, to the binding groove identified by DoGSiteScorer (Fig. S4B). The docking procedure located NAD in the best scoring pocket in a few different poses (Fig. S4A), the best one with a good docking score of −6.8 (Fig. S4C). This suggests that we cannot exclude that the LRRC9-ART domain, at least as modelled using the trRosetta approach, can bind NAD. However, docking a ligand to a structure model built using very distant templates has to be seen as speculative.
Replacing the catalytic Glu residue by Ile is characteristic for PARP family members having mono-ADP-ribosylation (MAR) activity (Karlberg et al., 2015). Among FFAS/HHpred hits for LRRC9-ART domain, proteins with MAR, as well as PAR activity and with no confirmed ADP-ribosylation activity appeared. In the sequence logos of ART and LRRC9-ART, there are no common conserved motifs between the two domain families. In LRRC9, the second tyrosine of the HYYE tetrad is replaced by glutamine 525, which is noticeably less conserved.  positions 616-623). As mentioned earlier, these motifs may correspond to each other, and the apparent misalignment may be just an artifact. Figure S4 shows the trRosetta model of LRRC9-ART structure, with NAD molecule docked in the predicted active site. Figure S4B focuses on the same residues as Fig. 5 but colored according to conservation in the LRRC9-ART family. Remarkably, only tyrosine 474 and valine 475 (out of the residues aligned to the catalytic tetrad) are quite strongly conserved in the whole family. The rest of LRRC9 amino acids aligned to the classical HYYE tetrad residues in the ligand-binding hydrophobic pocket are not highly conserved. This approach of docking a ligand to a structure model built using remote sequence similarity cannot be regarded as authoritative; it is only preliminary and can give a general idea of ligand placement, but no detailed information on its interactions. The model presented in surface representation (Fig. S4C) shows the conserved residues are grouped at the bottom of the putative ligand binding groove.
The ART-like domain described here is only one region of approx. 200 residues within the whole LRRC9 protein molecule, which is 1453 amino acids long, with clear similarity to leucine-rich repeat-containing domains in its N- and C-terminal parts, and a very long helical ''spine'' in the center. We were tempted to construct a model of the full-length molecule, based on homology to LRR-containing templates and using the already modelled ART-like domain structure. The results are shown in Fig. S5. In the full-length structure model proposed here, a large groove is surrounded by LRR ''horseshoes'' on three sides and by the ART-like domain on the fourth. The size of the groove seems to suggest the possibility of binding a large molecule, e.g., a protein rather than a small cofactor. It is tempting to speculate that LRRC9, upon binding some ''bait'', rearranges its conformation and activates the enzymatic ART-like domain, thereby triggering a cellular signal. However, the full-length model is only an illustration of possible domain arrangements.

Taxonomic distribution of the LRRC9-like proteins
The LRRC9 protein is widespread in several of the major eukaryotic lineages. It is found in Alveolata and Stramenopiles (from the TSAR supergroup; Burki et al., 2020), Cryptophyta, Haptophyta, Viridiplantae, Fungi and Metazoa (Opisthokonta) and Euglenozoa, while it is absent from several other main lineages, e.g., Hemimastigophora, CRuMS, Metamonada, Malawimonadida, Ancyromonadida, Ancoracysta and Picozoa (Fig. 3). Thus, this protein appears to be an ancestral eukaryotic feature that was subject to losses in a number of lineages.
The ART-like domain is present in all vertebrate classes as well as in 11 non-vertebrate animal phyla (e.g., Cnidaria, Echinodermata, Brachiopoda, Mollusca, Annelida, Hemichordata, Chordata). However, notable is its absence in several animal phyla, some of which include important model organisms (e.g., Arthropoda, Nematoda, Porifera; see Fig. 3). Also, strikingly, although LRRC9-ART domains are found in some Viridiplantae, e.g., mosses, green algae, liverworts, and club mosses, they are absent from flowering plants with the exception of the water lily Nymphaea colorata. Notably, water lilies belong to Nymphaeales, an order thought to be among the earliest to diverge from the lineage leading to most extant flowering plants (Zhang et al., 2020). Altogether, this taxonomic spread suggests an ancient function for LRRC9, common to many diverse eukaryotes.

DISCUSSION
Because the regions of the LRRC9-ART domain corresponding to ADP-ribosyltransferase active sites are poorly conserved, the most straightforward conclusion appears to be that LRRC9-ART is a pseudoenzymatic domain. However, as exemplified by the recent cases of two pseudokinases identified by us, SelO and SidJ, apparent pseudoenzymes may in fact perform enzymatic functions somewhat similar to those performed by their ''non-pseudo'' cousins (Black et al., 2019;Sreelatha et al., 2018). Also, many inactive pseudoenzymes engage their non-pseudo cousins and modulate their activity, e.g., allosterically (Adrain, 2020). Thus, we discuss the significance of the LRR repeats flanking the ART-like domain while speculating that the central domain either performs by itself an ADP-ribosylation-related catalytic function, or modulates such a function performed by yet another molecule.
Apart from the ART-like domain, LRRC9 proteins comprise a variable number of leucine-rich repeats (LRRs) flanking the ART domain (Fig. 2). In all the homologs analysed, the LRR-ART-LRR domain architecture is conserved. The N-terminal region possesses 1-6 LRR repeats while the C-terminal region possesses 11-20 such repeats. The N-terminal repeats and C-terminal repeats are most similar to the LRR repeats from proteins such as Leucine Rich Repeat Containing 23 (LRC23) and TLR4 Interactor with leucine-rich Repeats (TRIL), respectively. However, the degree of similarity between LRR motifs from different proteins is generally low, which makes it difficult to draw specific functional conclusions. The LRR motifs typically have a length of about 20-30 amino acid residues and are rich in leucines or other aliphatic residues (typically seven per motif, localized at positions 2, 5, 7, 12, 16, 21 and 24 within the consensus LRR sequence) (Kobe & Deisenhofer, 1994). The LRR motifs may form various combinations of two or three secondary structures, usually involving two stretches of a β-strand, e.g., two β-strands and an α-helix. The LRR repeats typically fold into a horseshoe-like conformation with helices on its convex face and a parallel β-sheet on the concave one (Enkhbayar et al., 2004). LRR motifs determine the functionality of LRR-containing proteins, enabling them to make protein-protein interactions (Kobe & Deisenhofer, 1994) as well as to bind various target structures such as small molecule hormones, lipids and nucleic acids (Helft et al., 2011). The non-globular shape of LRR motifs may facilitate the contact between LRR-containing proteins and the target structures, increasing the interaction area (Kobe & Deisenhofer, 1994). For example, ribonuclease inhibitor is able to bind pancreatic ribonucleases and inhibit their enzymatic activity. This interaction may affect RNA turnover in the context of angiogenesis (Kobe & Deisenhofer, 1993).
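The consensus positions quoted above lend themselves to a toy scoring function. The sketch below is not the LRRFinder method; it only counts how many of the seven consensus positions in a candidate segment carry an aliphatic residue, with the aliphatic set (L, I, V) and the function name being assumptions made for illustration.

```python
# Consensus positions of leucine/aliphatic residues within an LRR motif,
# as quoted in the text (1-based).
CONSENSUS_POSITIONS = (2, 5, 7, 12, 16, 21, 24)
ALIPHATIC = set("LIV")  # assumed definition of "aliphatic" for this sketch


def lrr_consensus_score(segment):
    """Count consensus positions occupied by an aliphatic residue (0-7).

    Positions beyond the end of a short segment are simply not counted.
    """
    segment = segment.upper()
    return sum(
        1
        for pos in CONSENSUS_POSITIONS
        if pos <= len(segment) and segment[pos - 1] in ALIPHATIC
    )
```

A segment matching the consensus at every listed position scores 7; real repeats are irregular, so such a count is at best a crude first filter before a dedicated tool is applied.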
Employing the LRR motifs, LRR proteins perform various functions. They may play a role in signal transduction and cell development, and they participate in RNA splicing and in the response to DNA damage (Kobe & Deisenhofer, 1995). Many LRR-containing proteins are adhesive molecules that perform important functions in processes such as regulation of collagen-fibril formation, osteogenesis, myelination and platelet adhesion at a site of vascular injury. Many proteins with LRR motifs play a role in signal transduction as specific receptors. For example, in response to LPS stimulation, the LRR-containing CD14 protein induces tyrosine phosphorylation of intracellular proteins to stimulate the antibacterial activity of macrophages. Other receptors exhibit specificity to gonadotrophins such as luteinizing hormone, chorionic gonadotrophin and follicle-stimulating hormone (Bluhm et al., 2004).
The LRR domains, structurally conserved in evolution, occur widely within plant, invertebrate and vertebrate proteins responsible for innate immunity. Due to their ability to mediate protein-protein interactions, LRR repeats determine the signaling function of many pattern recognition receptors (PRRs), such as Toll-like receptors (TLRs) and NOD-like receptors (NLRs), by conditioning their affinity to various ligands, e.g., viral, bacterial, fungal and parasite antigens. On the other hand, ADP-ribosylation is known to be involved in regulation of the host immune response (Fehr et al., 2020;Rosado et al., 2013). It may direct intracellular signaling to parthanatos (a type of programmed cell death) (Fatokun, Dawson & Dawson, 2014;Greenwald & Pierce, 2019;Hoch & Polo, 2019); such a scenario implies induction of an inflammatory response to infection. Poly-ADP-ribosylation is known to influence the stability of transcripts encoding proinflammatory cytokines (Ke et al., 2019). PARP-dependent chromatin modification may counteract progression of viral infection. ADP-ribosylation also influences other innate immune mechanisms such as NF-κB expression, phagocytosis and macrophage polarization (Kunze & Hottiger, 2019). Here, the co-occurrence of ART and LRR domains within the LRRC9 protein suggests that they may act together to support a version of host innate immunity. In this context, the LRR domains could perform a signal-sensor function, and the ART domain might play the effector role.
Overall, in the human proteome there are 234 proteins containing LRR repeats. The molecular functions significantly overrepresented among these proteins include peptide binding (11 proteins, FDR p-value 8E-6), ubiquitin-protein transferase activity (11 proteins), heparin binding (7 proteins) and G-protein-coupled peptide receptor activity (7 proteins). However, as many as 182 human LRR proteins are functionally uncharacterized, according to Panther-db. Human proteins containing LRR repeats are involved in processes such as neurogenesis (17 proteins, FDR 1E-5), ubiquitin-dependent protein catabolic process (11 proteins) and positive regulation of innate immune response via toll-like receptor signaling pathway (10 proteins, FDR 8E-10).
The ART-like domain combined with LRR motifs has precedents. As many as 35 human LRR proteins possess enzymatic domains. Notable here are the NLRP immune sensors/effectors containing NACHT ATP-ase domains, totaling 17 human proteins (Martinon & Tschopp, 2007;Pawlowski et al., 2001). Other enzymes with LRR motifs include protein kinases (11 proteins), nucleases and peroxidases. Eight human LRR proteins also possess F-box motifs and are involved in ubiquitin ligase complexes.
A bioinformatics study analyzing non-LRR regions embedded within LRR proteins identified the LRRC9 family as containing such a ''non-LRR island'', noted a few conserved sequence motifs and homologs in a number of non-vertebrate eukaryotes but did not observe similarity to any characterized functional domains (Matsushima et al., 2009).
As no high-quality protein expression data are available for LRRC9, it is annotated as “existence validated on transcript level” in the neXtProt database (Zahn-Zabal et al., 2020). According to PAXdb (Wang et al., 2015), LRRC9 protein expression is elevated in tonsil, esophagus and seminal vesicle tissues. This may suggest that LRRC9 protein expression is limited to specific circumstances and is not easily captured in typical tissue proteomics experiments. Recently, however, a study aimed at validating the “missing proteins” confirmed the existence of LRRC9 protein in human sperm using targeted mass spectrometry and antibody-based methods (Carapito et al., 2017). LRRC9 mRNA expression is highest in the brain as well as in endocrine and male reproductive tissues, i.e., pituitary gland and testis, respectively (Protein Atlas database). The apparent discrepancy between mRNA and proteomics expression data for LRRC9 may be related to technical difficulties (e.g., too few unique peptides of sufficient length). According to the DeepLoc prediction, the subcellular localization of LRRC9 is cytoplasmic (likelihood 0.79).
For an uncharacterized protein, its physical interactors may shed light on its function. However, according to the BioGRID database, only a single physical interactor of LRRC9, ZFP36L2, has been reported, based on affinity capture-mass spectrometry analysis (Noguchi et al., 2018). ZFP36L2 is an RNA-binding protein that regulates the cell cycle (Galloway et al., 2016) and contributes to the pathogenesis of pancreatic ductal adenocarcinoma (PDAC), where it is responsible for an increase in cancer cell aggressiveness (Yonemori et al., 2017). A Kaplan-Meier survival analysis (Protein Atlas) shows that LRRC9 mRNA expression is positively correlated with survival among patients with pancreatic cancer (p = 0.0034), although the gene is not classified as prognostic. This may suggest an antagonistic relation between LRRC9 and ZFP36L2 activities in the context of ZFP36L2-dependent promotion of PDAC progression. However, the LRRC9-ZFP36L2 interaction was reported only in a high-throughput experiment, and further investigation is required.
LRRC9-ART should be classified as a separate ART-like domain family, not only because of its co-occurrence with LRR domains, but primarily because the similarity of LRRC9-ART to known ARTs is very remote, and the conservation patterns in the LRRC9-ART family differ from those typical of either the PARP or the TASOR families.
Pseudoenzymes are characterized by the lack of enzymatic activity despite a common evolutionary origin with, and structural similarity to, catalytically active homologues (Jeffery, 2020;Murphy, Mace & Eyers, 2017;Ribeiro et al., 2019). Pseudoenzymes do occur among PARP family members, including PARP13, which is distinguished by amino acid substitutions and structural changes that prevent catalysis. Like LRRC9-ART, the PARP13 catalytic domain lacks the conserved His and Glu residues (Karlberg et al., 2015). It has been shown that substitution of the catalytic His with Ala within the ART domain of PARP1 substantially decreases its catalytic activity (Marsischky, Wilson & Collier, 1995). These results were consistent with previous studies on diphtheria toxin, another member of the H-Y-[EDQ] clade, in which replacement of the catalytic His reduced the toxin's activity by at least 70-fold (Blanke et al., 1994). Thus, the LRRC9-ART family, lacking most of the ART active site, can be hypothesized to consist of pseudoenzymes. However, one cannot exclude the possibility that this ART-like domain has retained or regained the ART catalytic function, or acquired a novel enzymatic activity, in both cases employing an atypical and/or migrated active site. Such a scenario has recently been reported for the apparent pseudokinases SelO and SidJ, which were shown to be an AMPylase and a polyglutamylase, respectively (Black et al., 2019;Sreelatha et al., 2018). It can also be envisaged that LRRC9 may require another protein factor to complement its active site, similarly to the ADP-ribosyltransferases PARP1 and PARP2, which form complexes with HPF1, a protein that contributes a “missing residue” to the otherwise “incomplete” ART active site (Suskiewicz et al., 2020). Another precedent for a “third party” protein required for ART activity is provided by the PARP9/DTX3L heterodimer, in which the otherwise inactive PARP9 needs the interaction partner to become active (Yang et al., 2017).
On the other hand, in eukaryotic cells, poly-ADP-ribosylating proteins may perform some of their functions without utilizing their enzymatic activity. For example, PARP1 participates in NF-κB-dependent gene transcription via two different mechanisms, only one of which requires poly-ADP-ribosylation; the second seems not to involve PARP1 enzymatic function and depends on the structure of the protein (Hassa et al., 2003;Weaver & Yang, 2013). Both a pseudoenzymatic character of LRRC9-ART and migration of the active site are possible for this intriguing novel family. Experimental studies are needed to distinguish between these hypotheses. The unique, conserved domain architecture of LRRC9 suggests that this mysterious protein family could be involved in a defense mechanism, with some analogies to the innate immune system, coupling within a single molecule the detection of foreign objects (LRRs) and downstream signalling (ART domain).