Microproteins: Overlooked regulators of physiology and disease

Summary
Ongoing efforts to generate a complete and accurate annotation of the genome have revealed a significant blind spot for small proteins (<100 amino acids) originating from short open reading frames (sORFs). The recent discovery of numerous sORF-encoded proteins, termed microproteins, that play diverse roles in critical cellular processes has ignited the field of microprotein biology. Large-scale efforts are currently underway to identify sORF-encoded microproteins in diverse cell types and tissues, and specialized methods and tools have been developed to aid in their discovery, validation, and functional characterization. Microproteins identified thus far play important roles in fundamental processes including ion transport, oxidative phosphorylation, and stress signaling. In this review, we discuss the optimized tools available for microprotein discovery and validation, summarize the biological functions of numerous microproteins, outline the promise of developing microproteins as therapeutic targets, and look ahead to the future of the field of microprotein biology.


INTRODUCTION
Recent studies have shown that the mammalian genome harbors hundreds of previously unannotated short open reading frames (sORFs) with the potential to code for functional small proteins called microproteins. [1][2][3][4][5] Microproteins are typically less than 100 amino acids (AAs) in length. Until recently, they evaded detection because traditional genome annotation methods relied on stringent rules, including a minimum ORF length of 300 base pairs (bp), to distinguish protein-coding RNAs from non-coding RNAs (ncRNAs) and minimize false positives. This ad hoc 100-codon threshold was initially selected based on the calculated probability that ORFs longer than 300 bp are significantly more likely to encode stable proteins. 6 sORF-encoded microproteins have since emerged as important new players in cellular biology and physiology, and they continue to be identified at a high rate. [1][2][3][4][5] Of course, small proteins (i.e., peptides) have long been recognized as key players in biology, such as the essential peptide hormone insulin 7 and the neuropeptides substance P 8 and neuropeptide Y. 9 However, these peptides differ fundamentally from microproteins in that they are synthesized as larger precursor molecules (i.e., preproproteins) that are post-translationally processed and cleaved by proteases to generate the active peptide. 10 At the genomic and mRNA level, microproteins and preproproteins share many features. Their genes can consist of single or multiple exons, and their mRNAs are post-transcriptionally processed prior to translation (capped at the 5′ end, spliced, and polyadenylated at the 3′ end). However, microproteins are translated directly from their mRNA as mature, functional proteins and are not typically products of post-translational cleavage.
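The length filter described above can be illustrated with a minimal sketch. The Python below is a simplified, hypothetical illustration, not the code of any real annotation pipeline: it scans only the three forward reading frames, treats ATG as the sole start codon, and uses the standard stop codons. The names `find_orfs` and `is_sorf` are ours. ORFs shorter than 100 codons are exactly the sORFs that a 300-bp rule would have discarded.

```python
# Hypothetical sketch: find ATG...stop ORFs on the forward strand and flag
# those below the classic 100-codon (300-bp) annotation cutoff.
STOP_CODONS = {"TAA", "TAG", "TGA"}
ANNOTATION_CUTOFF_CODONS = 100


def find_orfs(seq: str, min_codons: int = 1) -> list[tuple[int, int, int]]:
    """Return (start, end, n_codons) per ORF; end includes the stop codon."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] == "ATG":
                # Walk codon by codon until the first in-frame stop codon.
                for j in range(i, len(seq) - 2, 3):
                    if seq[j:j + 3] in STOP_CODONS:
                        n_codons = (j - i) // 3  # sense codons, stop excluded
                        if n_codons >= min_codons:
                            orfs.append((i, j + 3, n_codons))
                        i = j  # resume scanning after this stop codon
                        break
                else:
                    break  # no in-frame stop: nothing more to find in this frame
            i += 3
    return sorted(orfs)


def is_sorf(n_codons: int) -> bool:
    """True if the ORF falls below the 100-codon annotation threshold."""
    return n_codons < ANNOTATION_CUTOFF_CODONS


toy = "CC" + "ATG" + "GCT" * 10 + "TAA" + "GG"
print(find_orfs(toy))  # [(2, 38, 11)]
print(is_sorf(11))     # True
```

Real annotation pipelines also consider the reverse strand, alternative start codons, and transcript structure; the sketch only shows how a fixed length cutoff removes short ORFs.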
Microproteins participate in diverse biological processes and are under investigation as possible therapeutic targets for diseases such as heart failure, obesity, and cancer. Many identified microproteins function as allosteric regulators of larger proteins, modifying or fine-tuning their activities, [11][12][13][14] while others work independently as signaling molecules or effector proteins. [15][16][17] Interestingly, a large proportion of the microproteins functionally characterized thus far contain conserved transmembrane domains and localize to the plasma membrane or to the membranes of subcellular organelles, where they exert their functions. [12][13][14][18][19][20][21][22][23] While microproteins may genuinely be enriched in membranes, another possibility is that current identification methods are inherently biased toward membrane microproteins because their highly conserved transmembrane domains are readily detected by domain prediction tools. [24][25][26] Therefore, it is important to continue to [...] systems and identified by MS analysis. For example, the APEX system was recently used to identify interacting partners of the uncharacterized microprotein C11orf98. 45 Additionally, a recent study elegantly applied the MicroID system to identify numerous microproteins and alternative proteins (alt-proteins) in subnuclear compartments both in vitro and in vivo, 44 highlighting the potential of this method for microprotein discovery when coupled with MS. While these proximity labeling assays have enabled the discovery of microproteins, it is important to note that they rely on treatment with biotin or hydrogen peroxide, both of which are also present endogenously and may yield false positives, thus necessitating experimental validation. 44

Publicly available bioinformatic tools and databases
Numerous bioinformatic tools and databases have been developed for the identification of microproteins, and a subset of these tools is featured in Table 1. Additionally, a recent review highlighted many of the publicly available tools and databases, 29 including SmProt 47 and sORFs.org, 48 which contain thousands of predicted sORFs with protein-coding potential. However, concerns have been raised regarding the stark differences between these two databases, even though both were built from many of the same datasets. 5 This is presumably because each applied different criteria for classifying putative coding ORFs, further highlighting the need for more cohesive approaches to classifying sORFs. Collaborative efforts to standardize the annotation of translated sORFs are currently underway and will facilitate high-confidence microprotein identification in the future. 49 Still, although a growing number of bioinformatic tools are available to identify sORF-encoded microproteins, these tools and databases need continued optimization to specifically detect microprotein-coding potential in the genome.
Here, we highlight one such tool, PhyloCSF, a conservation-based algorithm that has been successfully used to identify numerous microproteins. [11][12][13][14]50,[56][57][58][59] A hallmark feature of protein-coding ORFs is a high degree of sequence conservation at the AA (codon) level, which distinguishes them from ncRNAs, which are typically not highly conserved. These conservation signatures can be leveraged to identify sORF-encoded microproteins using PhyloCSF, which applies a phylogenetic sequence conservation algorithm to analyze and score codon substitution frequencies across >50 mammalian species. 50 PhyloCSF scores positively both synonymous codon substitutions (those that encode the same AA) and substitutions to AAs with similar properties (polarity, charge, hydrophilicity, etc.), while nonsynonymous or missense substitutions are scored negatively. Importantly, PhyloCSF can classify very short portions of coding sequence in isolation from the full sequence, which is necessary when considering individual exons, and this inherent property can be leveraged to identify sORFs with microprotein-coding potential. 49,50 PhyloCSF is user-friendly and can be easily accessed via the UCSC Genome Browser. 60
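To make the idea of codon-level conservation scoring concrete, the Python sketch below scores gap-free, in-frame pairwise alignments with a deliberately simplified rule: +1 for a synonymous codon change, -1 for a nonsynonymous one, and 0 for an identical codon. This is not PhyloCSF. PhyloCSF compares the likelihood of a multispecies alignment under empirical coding and non-coding codon substitution models on a phylogeny and, as described above, also rewards substitutions to AAs with similar properties, which this toy ignores. The helper names and example sequences are hypothetical.

```python
from itertools import product

# Standard genetic code; codons enumerated in TCAG order (first base varies slowest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def toy_codon_score(reference: str, ortholog: str) -> int:
    """Toy score: +1 per synonymous codon change, -1 per nonsynonymous change."""
    if len(reference) != len(ortholog) or len(reference) % 3:
        raise ValueError("expected gap-free, in-frame alignments of equal length")
    score = 0
    for i in range(0, len(reference), 3):
        ref_codon, orth_codon = reference[i:i + 3], ortholog[i:i + 3]
        if ref_codon == orth_codon:
            continue  # identical codon: no substitution to score
        score += 1 if CODON_TABLE[ref_codon] == CODON_TABLE[orth_codon] else -1
    return score


def toy_alignment_score(reference: str, orthologs: list[str]) -> int:
    """Sum the toy score over several aligned species (hypothetical helper)."""
    return sum(toy_codon_score(reference, seq) for seq in orthologs)


ref = "ATGCTGAAA"                              # Met-Leu-Lys
print(toy_codon_score(ref, "ATGCTCAAG"))       # 2  (CTG->CTC, AAA->AAG: both synonymous)
print(toy_codon_score(ref, "ATGCCGGAA"))       # -2 (Leu->Pro, Lys->Glu)
```

A positive total suggests that the aligned species preserve the encoded protein rather than the nucleotide sequence, which is the intuition behind conservation-based coding predictors.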

Microprotein characterization and validation
It is important to note that while bioinformatic and computational approaches can be readily applied to assess the microprotein-coding potential of genomic regions of interest, such predictions do not guarantee that the identified coding ORFs are translated into functional protein products. Therefore, predicted microproteins must be validated experimentally, for example by raising a sequence-specific antibody to detect the microprotein of interest. A caveat of antibody-based validation is that, because microproteins are inherently small, there are limited options for antigen design, and the available sequences may not be sufficiently antigenic. Additionally, because many microproteins contain transmembrane domains or are associated with larger proteins, the antibody-binding site may be inaccessible under native conditions, hindering recognition in methods such as immunoprecipitation and immunocytochemistry. Alternatively, the CRISPR/Cas9 genome editing system can be used to insert the coding sequence for an epitope tag (e.g., HA, FLAG, myc) in frame with the microprotein of interest within its endogenous locus via homology-directed repair (HDR). 61 When using CRISPR/Cas9 HDR knock-in methods to generate epitope-tagged fusion proteins, it is ideal to screen knock-in clones with the tag inserted at the N terminus and at the C terminus of the sORF of interest, because the tags themselves may influence microprotein localization and function. 61 Many of these methods have been successfully implemented to characterize the microproteins described in detail below.
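The in-frame requirement for an epitope-tag knock-in can be checked computationally before a donor template is designed. The Python sketch below is a simplified, hypothetical illustration: it models only the coding sequence (no homology arms, guide RNA, PAM-blocking mutations, or linkers), uses one commonly used nucleotide encoding of the HA tag (YPYDVPDYA), and builds either an N-terminal fusion (tag after the start codon) or a C-terminal fusion (tag before the stop codon) so that both configurations can be compared. The function names are ours.

```python
from itertools import product

# Standard genetic code in TCAG order (first base varies slowest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
STOP_CODONS = {"TAA", "TAG", "TGA"}

# One commonly used nucleotide sequence encoding the HA tag (YPYDVPDYA).
HA_TAG_NT = "TACCCATACGATGTTCCAGATTACGCT"


def translate(cds: str) -> str:
    """Translate complete codons; stop codons appear as '*'."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


def tag_sorf(sorf: str, tag_nt: str, terminus: str) -> tuple[str, str]:
    """Insert tag_nt in frame at the N terminus (after ATG) or C terminus (before stop)."""
    if len(sorf) % 3 or len(tag_nt) % 3:
        raise ValueError("sORF and tag must both be multiples of 3 nt to stay in frame")
    if not sorf.startswith("ATG") or sorf[-3:] not in STOP_CODONS:
        raise ValueError("sORF must start with ATG and end with a stop codon")
    if terminus == "N":
        fused = sorf[:3] + tag_nt + sorf[3:]
    elif terminus == "C":
        fused = sorf[:-3] + tag_nt + sorf[-3:]
    else:
        raise ValueError("terminus must be 'N' or 'C'")
    protein = translate(fused)
    if "*" in protein[:-1]:
        raise ValueError("premature stop codon in the fusion")
    return fused, protein


toy_sorf = "ATG" + "GCT" * 10 + "TAA"   # hypothetical 11-codon sORF
print(tag_sorf(toy_sorf, HA_TAG_NT, "N")[1])
print(tag_sorf(toy_sorf, HA_TAG_NT, "C")[1])
```

Generating both fusions side by side mirrors the recommendation above to screen N- and C-terminally tagged clones, since either tag position may perturb a small protein.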

THE FUNCTIONS OF MICROPROTEINS IN DISTINCT SUBCELLULAR DOMAINS
sORF-encoded microproteins perform diverse functions in cellular physiology that can be attributed in part to their localization to specific subcellular structures and organelles, including the plasma membrane, sarco/endoplasmic reticulum (S/ER), endo/lysosome, mitochondria, cytoplasm, and nucleus. When characterizing new microproteins, determining their subcellular localization is an important step toward identifying potential interacting partners and understanding their function. Here, we introduce numerous newly discovered microproteins and discuss the critical biological functions they perform in discrete subcellular domains (summarized in Table 2).

Plasma membrane
In addition to partitioning individual cells, the plasma membrane mediates functions such as ion exchange, cell fusion, and signal transduction. Many of these processes are controlled by large protein complexes that are subject to regulation by microproteins. For example, the Phe-Xaa-Tyr-Asp (FXYD) family (FXYD1-7) consists of type I single-pass transmembrane microproteins that regulate ion transport at the plasma membrane. FXYD proteins contain an invariant PFXYD motif in their extracellular N-terminal domain that enables them to serve as auxiliary subunits of the sodium (Na⁺)/potassium (K⁺) ATPase (NKA). 18,62 NKA is a plasma membrane enzyme that generates an electrochemical gradient for Na⁺ and K⁺ across the membrane by exporting 3 Na⁺ ions and importing 2 K⁺ ions for every ATP molecule hydrolyzed. This ubiquitous membrane transporter is essential for maintaining cell volume and pH, and for establishing the ion gradients that power nutrient uptake and other secondary membrane transport. FXYD proteins have been recognized for several decades as small proteins (61-95 AAs, except FXYD5 at 178 AAs) that can affect ion transport when overexpressed, although their functions and interactions remained unknown until more recently. 18 FXYD2 (the γ-subunit of NKA) was the first member of the family shown to regulate NKA. Expressed primarily in the kidney, 63 FXYD2 modulates the ion affinity of NKA in a membrane potential-dependent manner. In addition, human mutations in FXYD2 have been linked to primary hypomagnesemia, 64 representing one of the few known instances of a pathology-causing microprotein. The other members of the FXYD family are expressed in a tissue-specific manner, allowing precise regulation of NKA throughout the body. The cardiac-enriched FXYD1, also called phospholemman (PLM), is depicted in Figure 2A.
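The 3:2 stoichiometry above also implies that the pump is electrogenic: each cycle exports one more positive charge than it imports. The short Python sketch below simply tallies ions and net charge for a given number of ATP molecules hydrolyzed; the function name is hypothetical, and the model deliberately ignores everything except the stoichiometry (no kinetics, voltage dependence, or FXYD regulation).

```python
# Per-cycle stoichiometry of the Na+/K+ ATPase as stated in the text.
NA_EXPORTED_PER_ATP = 3
K_IMPORTED_PER_ATP = 2


def nka_flux(atp_hydrolyzed: int) -> dict[str, int]:
    """Ions moved and net positive charge exported after a number of pump cycles."""
    na_out = NA_EXPORTED_PER_ATP * atp_hydrolyzed
    k_in = K_IMPORTED_PER_ATP * atp_hydrolyzed
    # Na+ and K+ are both monovalent, so net charge exported = Na+ out - K+ in.
    return {"na_out": na_out, "k_in": k_in, "net_charge_out": na_out - k_in}


print(nka_flux(1))  # {'na_out': 3, 'k_in': 2, 'net_charge_out': 1}
```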
PLM is one of the best examples of the dynamic actions of microproteins and their ability to fine-tune cellular physiology. Originally described to interact with and regulate NKA, PLM was subsequently found to also be an endogenous regulator of the Na⁺/calcium (Ca²⁺) exchanger (NCX), 19 [...] investigation into its role in regulating skeletal and cardiac muscle contractility. 65 PLM activity, and therefore NKA and NCX function, is regulated by the phosphorylation state of PLM at cytoplasmic serine residues 63 and 68. 66,67 PLM is phosphorylated by protein kinase A (PKA) and protein kinase C (PKC) during stress and catecholamine release, which induces its dissociation from NKA and relieves NKA inhibition. In contrast, phosphorylated PLM inhibits NCX via interaction with its intracellular loop. 66,68 The elegance of PLM regulation is evident at the level of the cardiomyocyte: during stress responses, PLM is phosphorylated, which relieves inhibition of NKA and decreases intracellular Na⁺. 19 The relative loss of intracellular Na⁺ activates NCX, which in turn reduces cytoplasmic Ca²⁺. 19 However, because phosphorylated PLM inhibits NCX activity, the loss of intracellular Ca²⁺ is limited. These coordinated effects simultaneously lower the risk of arrhythmia while preserving contractility 19 (Figure 2A).
Plasma membrane microproteins function beyond ion exchange to affect other critical processes, such as cell fusion and cell signaling during development. For example, the 63-AA microprotein NEMEP (Nodal Enhanced MEsendoderm Peptide) is a transmembrane microprotein that helps regulate early mesendoderm development. 69 Discovered as a putative ncRNA target of Nodal, NEMEP localizes to the plasma membrane and interacts with the glucose transporters GLUT1 and GLUT3 to directly enhance glucose uptake into cells. NEMEP loss-of-function studies demonstrate impaired mesodermal differentiation and glucose uptake in vitro and mesodermal developmental defects in vivo. 69 Similarly, the 84-AA microprotein myomerger, 70 also called myomixer 71 and minion, 72 is important for myoblast fusion during muscle formation. It is now known that two myoblast proteins, myomerger and the 221-AA protein myomaker, regulate fusion in a stepwise manner: myomaker is required on both fusing cells, whereas myomerger is required on only one cell to complete the process. 73 Myomerger-null mice do not undergo myoblast fusion, which leads to neonatal lethality, [70][71][72] establishing myomerger as a microprotein essential for muscle formation and viability.
Endocytosis is a fundamental cellular process that uses membrane-bound vesicles to compartmentalize functions such as signal transduction, energy sensing, autophagy, and the digestion of exogenous proteins within the endo/lysosome. 74 Microproteins contribute to several distinct aspects of these processes. One example is the 83-AA cancer-associated small integral membrane open reading frame 1 (CASIMO1), which was discovered using a microarray-based approach as a sORF-encoded microprotein upregulated in breast cancer samples. 75 Functional characterization revealed co-localization of CASIMO1 with the late endosomal marker LAMP1 and uncovered its role in actin cytoskeletal organization, cell migration, and proliferation, which are key features of cancer progression and prognosis.
SPAR is another endo/lysosome-associated microprotein, discovered through MS-based proteomic screening of ncRNAs. 76,77 The ncRNA LINC00961 was identified as a candidate microprotein-coding transcript and validated to encode a 90-AA protein that localizes to the late endo/lysosome and interacts with the membrane-associated v-ATPase proton (H⁺) pump. Further results showed that SPAR inhibits mTORC1, a key player in cell growth and proliferation, in the response to skeletal muscle injury 78 (Figure 2B). SPAR is now recognized to have important implications for skeletal muscle regeneration and continues to be investigated as a regulator of the tissue injury response.
Several human microproteins have Drosophila melanogaster homologs, including the 88-AA hemotin, 79 which functions at the late endosome. Hemotin is a necessary component of the Drosophila immune system through its regulation of phagocytosis in macrophage-like cells, a function shared with its human homolog, stannin. Stannin promotes endosomal maturation and phagocytic processing in macrophages, 79 confirming the evolutionarily conserved function of hemotin/stannin in organismal immunity.

Sarco/endoplasmic reticulum

The S/ER regulates cellular homeostasis on multiple fronts, including protein synthesis, protein folding, and storage of intracellular Ca²⁺. Ca²⁺ is a ubiquitous second messenger that participates in numerous cell-signaling events upon its release from the S/ER into the cytosol. 80 Ca²⁺ is actively transported back into the S/ER by the S/ER Ca²⁺ ATPase (SERCA) to restore basal cytoplasmic Ca²⁺ levels. [...] (Figure 2C). The SERCA-regulins are co-expressed with different SERCA isoforms (SERCA1a, SERCA2a, SERCA2b, and SERCA3) in a tissue-specific manner. 12,81 These small proteins share sequence homology within their transmembrane (TM) domains, which enables direct interaction with SERCA. 12 The potent activity of these SERCA-regulins can be appreciated in the context of cardiomyocyte contractility. During cardiac contraction, Ca²⁺ is rapidly released from the S/ER, which induces sarcomere shortening and activates SERCA2a (the primary SERCA gene product in the heart). SERCA2a uses the energy generated by ATP hydrolysis to pump Ca²⁺ back into the S/ER, allowing cardiac relaxation. 82 This process is tightly regulated by the cardiomyocyte-expressed SERCA-regulins (PLN, SLN, ALN). Under normal physiological conditions, a population of SERCA2a is inhibited by PLN to maintain a "cardiac reserve" for situations in which heart contractility must be rapidly increased.
In response to catecholamine-induced β-adrenergic signaling, PLN is phosphorylated by PKA on its cytoplasmic serine 16 residue, causing dissociation from SERCA2a. This alleviates its inhibitory effects and potently increases SERCA2a Ca²⁺ transport activity and cardiomyocyte contractility (i.e., the "fight-or-flight" response). 22 Notably, PLN-deficient mice exhibit maximal myocardial contractile performance at baseline and completely lose the ability to respond to β-adrenergic signaling. 83 In addition to the SERCA-regulins, dwarf open reading frame (DWORF, 34-AA) has been identified as a positive regulator of SERCA2a in cardiac and slow-twitch skeletal muscle. 14 The exact mechanisms underlying DWORF's stimulatory effects on SERCA2a are not yet fully understood, but DWORF has been shown to antagonize PLN by binding to the same residues on SERCA2a and actively displacing PLN from the pump, relieving its inhibitory effects and increasing the Ca²⁺ affinity of SERCA2a. 14,84 Additionally, new evidence from heterologous cell culture systems suggests that DWORF may also directly stimulate the maximal Ca²⁺ transport activity (Vmax) of SERCA2a. 85,86 Given its potent stimulatory effects on SERCA2a and cardiomyocyte contractility, DWORF has recently emerged as a potential heart failure therapeutic. 84,87,88 There has also been a recent report of another SERCA-interacting microprotein, the 48-AA pTUNAR, that enhances SERCA activity during neural differentiation. 58 Both DWORF and pTUNAR interact directly with SERCA, but their transmembrane domains lack the SERCA-inhibitory motif present in the SERCA-regulins, 12,58 suggesting that their unique transmembrane domains may mediate their distinct ability to activate SERCA.

Mitochondria
A large number of the microproteins identified thus far localize to the mitochondria. 89,90 Mitochondria are the primary organelles responsible for integrating cell signaling and survival cues to match changing metabolic and energetic demands. Several studies have identified microproteins that affect various aspects of tissue and systemic metabolism, implicating them in health and disease. One such protein is mitoregulin (Mtln), 91,92 also known as MOXI (micropeptide regulator of β-oxidation). 13 Previously annotated as a non-coding RNA (LINC00116), the 56-AA mitoregulin was discovered by three independent groups, which defined its role in fatty acid oxidation, lipid metabolism, and respiratory chain activity in heart and skeletal muscle. 13,91,92 Additional studies have described roles for mitoregulin in regulating lipolysis and mitochondrial β-oxidation in adipocytes 93 and in systemic lipid metabolism. 94 The exact molecular mechanisms driving mitoregulin function are still under active investigation; however, its involvement in lipid and fatty acid metabolism is well supported, suggesting that it could be developed as a target for modifying these processes in disease.
Numerous studies have defined microproteins as essential core components, accessory subunits, and assembly factors of the electron transport chain (ETC). 56,89,91,95,96 This is not surprising given that the ETC complexes are known to contain a disproportionately high number of small proteins. In fact, whereas proteins under 100 AAs make up <2% of the known cellular proteome, they constitute 28% of ETC proteins. 89 Recently identified microproteins directly involved in ETC function and supercomplex (i.e., respirasome) assembly include BRAWNIN (71-AA), 89 UQCC3 (93-AA), 96 UQCC4 (132-AA), 95 SMIM4 (70-AA), 95,97 mitolamban, 56 and mitoregulin 91 (Figure 2D). Collectively, these studies highlight the important roles that microproteins play in the formation of the functional ETC and respiratory supercomplexes, processes that are currently not fully understood.
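The two percentages quoted above imply a strong enrichment of small proteins in the ETC. The minimal Python calculation below uses only those two figures; the function name is hypothetical. Because <2% is an upper bound on the background frequency, the resulting ratio of about 14 is a lower bound on the fold enrichment.

```python
def fold_enrichment(fraction_in_subset: float, fraction_in_background: float) -> float:
    """Frequency of a protein class in a subset divided by its background frequency."""
    return fraction_in_subset / fraction_in_background


# <100-AA proteins: 28% of ETC proteins vs <2% of the known proteome (figures from the text).
print(round(fold_enrichment(0.28, 0.02), 2))  # 14.0, i.e., at least ~14-fold enriched
```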
In addition to contributing to metabolism and respiration, microproteins have been implicated in organelle-organelle communication at the mitochondria. Such is the case for the 54-AA microprotein PIGBOS, which facilitates interactions between the outer mitochondrial membrane (OMM) and the mitochondria-associated ER membrane (MAM) during the stress-induced ER unfolded protein response (UPR). 57 The UPR has been implicated in numerous neurodegenerative diseases, such as Alzheimer's disease, Parkinson's disease, and amyotrophic lateral sclerosis, among others, 98,99 and is thus an area of active investigation for targeting disease. PIGBOS increases resistance to ER stress and apoptosis by interacting with the ER membrane protein chloride channel CLIC-like 1 (CLCC1), suggesting a unique role for PIGBOS in inter-organelle communication and the cellular stress response.
Another mitochondrial microprotein is the anti-tumorigenic 133-AA miPEP133. 100 miPEP133 is translated from the precursor of the microRNA (miRNA) miR-34a, which is known to suppress the expression of oncogenes in multiple cancers. 101-103 miPEP133 itself functions independently in an anti-tumorigenic positive-feedback loop, increasing transcription of p53 and miR-34a. miPEP133 exerts anti-cancer effects by interacting with HSPA9 in the mitochondria, where it inhibits interactions between HSPA9 and its binding partners. This causes changes in membrane potential, ATP production, and the cell cycle that induce apoptosis and limit tumorigenesis. miPEP133 levels correlate with favorable cancer prognosis, emphasizing the relevance of this microprotein as a prognostic factor.

Cytoplasm
One of the earliest discovered microproteins was humanin, a 24-AA secreted protein that has been shown to be beneficial in Alzheimer's disease (AD). 104 Humanin is encoded by the mitochondrial genome, with its ORF located within the mitochondrial 16S rRNA gene. The beneficial effects of humanin in AD have been linked to its anti-apoptotic activity and prevention of neuronal cell death, acting both in the cytoplasm and via cell surface receptors. Within the cytoplasm, humanin binds and stabilizes Bax, preventing downstream apoptotic signaling by the Bax/Bcl family of apoptosis-inducing proteins. 105 Humanin also exerts cytoprotective effects by binding to cell surface receptors, including the cytokine receptor complex CNTFR-α/WSX-1/gp130 106 and the G protein-coupled receptor FPRL1. 107 Interest in humanin as a promising therapeutic extends beyond AD to other neurodegenerative diseases, diabetes, and cardiovascular disease, among others. [108][109][110]
The 68-AA microprotein NoBody functions within the cytoplasm to regulate mRNA stability. 111 Previously annotated as the ncRNA LINC01420, NoBody is expressed in multiple cell lines and shows high levels of AA conservation in mammals. NoBody localizes to processing bodies (P-bodies) in the cytosol, which regulate mRNA degradation and translational repression. 112 P-bodies achieve partitioning through liquid-liquid phase separation, raising interesting questions about how NoBody functions within a cellular compartment that lacks a membrane. NoBody interacts with the mRNA decapping complex protein EDC4, leading to enhanced mRNA decapping, decreased mRNA stability, and increased mRNA degradation. Accordingly, NoBody levels correlate inversely with the number of P-bodies, with downstream effects on mRNA stability and overall gene expression.

Nucleus
The nucleus is home to many highly regulated processes; it is therefore not surprising that numerous microproteins localize to this site of complex cellular function. Two such examples are the splice isoforms MRI-1 (modulator of retrovirus infection homolog 1) and MRI-2, which modulate distinct pathways of DNA repair. MRI-1 was first described as a 157-AA modulator of retroviral infection, 113 though later studies discovered its role in regulating DNA repair, leading to its renaming as CYREN (Cell cYcle REgulator of NHEJ). 114 CYREN suppresses the error-prone and potentially genotoxic non-homologous end-joining (NHEJ) DNA repair pathway in favor of the more accurate homologous recombination (HR) [...] localizes to the nucleus and associates with the Ku-DNA NHEJ complex. 115 In the presence of DNA double-strand breaks, MRI-2 increases the rate of NHEJ to promote DNA repair and prevent apoptosis. CYREN and MRI-2 represent microprotein splice isoforms that independently regulate DNA repair and overall cell survival.
Another microprotein that localizes to the nucleus is the 87-AA pTINCR, named after its associated lncRNA, TINCR, which plays a role in epithelial differentiation. 59,116 Through mechanisms distinct from those of TINCR, pTINCR was found to promote epithelial differentiation by contributing to the post-translational modification of CDC42. CDC42 is an epithelial pro-differentiation factor that has also been associated with oncogenic phenotypes. However, the interaction of pTINCR with CDC42 was shown to be anti-tumorigenic and pro-differentiating, marking pTINCR as an anti-oncogenic factor in many epithelial cancers.

MICROPROTEINS AS THERAPEUTIC TARGETS
As discussed throughout this review, microproteins often function as regulators of ion channels, enzymes, or multi-protein complexes, and their dysregulation can lead to disease phenotypes. The discovery of numerous microproteins that regulate critical cellular processes raises the exciting possibility of using them as therapeutic targets to treat human diseases such as heart failure and cancer. An added benefit of using microproteins rather than conventional drugs or small-molecule inhibitors is that the latter are often accompanied by cytotoxicity and off-target effects, whereas microproteins have very specific targets and may be less likely to cause cytotoxicity.
Microprotein gene therapy can be achieved using approaches such as adeno-associated viruses (AAVs), which can be designed to include cell type-specific promoters that drive microprotein expression and limit off-target effects. 117 Because microproteins are inherently small, their coding sequences easily fit within the packaging capacity of AAV vectors. 118 Nanoparticle delivery systems can also be used to deliver DNA, mRNA, or si/shRNAs to target microproteins, 119 and antisense locked nucleic acid (LNA) gapmers or antisense oligonucleotides (ASOs) can be used to modify microprotein expression levels, 120 although these methods currently lack tissue specificity. Here, we discuss several microproteins that show promise as therapeutic targets for heart failure, obesity, diabetes, and cancer (Figure 3, Table 2).
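The size argument can be made concrete with simple arithmetic. The Python sketch below computes the coding-sequence length of a microprotein (3 nucleotides per AA plus a stop codon) and compares it with an assumed packaging limit of roughly 4.7 kb for a single-stranded AAV genome, a commonly cited approximate value rather than a figure from this review. The optional `other_elements_nt` parameter is a hypothetical placeholder for the promoter, polyadenylation signal, and other vector elements, whose sizes are not specified here.

```python
# Assumed approximate AAV packaging limit (~4.7 kb); a rough reference value only.
AAV_CAPACITY_NT = 4700


def cds_length_nt(n_amino_acids: int) -> int:
    """Coding-sequence length in nucleotides, including the stop codon."""
    return 3 * (n_amino_acids + 1)


def remaining_capacity_nt(n_amino_acids: int, other_elements_nt: int = 0) -> int:
    """Vector space left after the CDS and any (hypothetical) regulatory elements."""
    return AAV_CAPACITY_NT - cds_length_nt(n_amino_acids) - other_elements_nt


print(cds_length_nt(34))          # 105  (DWORF, 34 AAs)
print(cds_length_nt(100))         # 303  (upper size bound for a microprotein)
print(remaining_capacity_nt(34))  # 4595
```

Even at the 100-AA upper bound, the coding sequence occupies only a few hundred nucleotides, leaving most of the vector available for promoters and regulatory elements.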

Targeting microproteins in heart failure, obesity, and diabetes

Several microproteins are abundantly expressed in the heart and regulate important processes such as Ca²⁺ homeostasis (the SERCA-regulins and DWORF), energy metabolism (MOXI, BRAWNIN, UQCC3, UQCC4, SMIM4, mitolamban, and MOTS-c), and ER stress (PIGBOS). These microproteins therefore hold great potential as therapeutic targets for treating heart failure and obesity.

DWORF
Ca²⁺ dysregulation is a universal feature of heart failure and is characterized by impaired Ca²⁺ sequestration into the S/ER and reduced SERCA2a activity and expression. [121][122][123] Therapeutic approaches that enhance the expression and/or activity of SERCA2a can restore Ca²⁺ homeostasis and ameliorate disease in animal models of heart failure, and have even reached clinical trials. 122,[124][125][126][127][128] An innovative approach to enhancing SERCA2a activity is overexpression of the potent SERCA2a-activating microprotein DWORF. 14,84,87,88 DWORF is highly expressed in the heart and soleus, and it has been shown to activate SERCA2a in part by displacing the SERCA2a inhibitor PLN, leading to increased peak systolic Ca²⁺ transients, SR Ca²⁺ load, and SERCA2a enzymatic activity. 14 DWORF protein and mRNA levels are decreased in various models of heart failure, and DWORF overexpression using transgenic or AAV-mediated gene delivery is cardioprotective and reduces disease pathogenesis in genetic 84,88 and experimental 87 mouse models of heart failure. These studies highlight the potential of microproteins as therapeutics that modulate the activity of ion transporters such as SERCA2a, and development of DWORF as a heart failure therapeutic is currently ongoing. 84,87,88

MOTS-c

MOTS-c is a microprotein encoded in mitochondrial DNA (mtDNA) that was first shown to target skeletal muscle and regulate insulin sensitivity and glucose metabolism. 15,129 MOTS-c is detected in the circulation, indicating that it acts as a mitochondrial hormone, or "mitokine." 15,129,130 MOTS-c has also been shown to relieve hyperglycemia and insulin resistance in a high-fat diet/low-dose streptozotocin mouse model of gestational diabetes mellitus 131 and to reduce lipid accumulation and fatty acid levels in the livers of mice fed a normal diet. 132 Additionally, circulating MOTS-c levels have been reported to be reduced in obese male children and adolescents, with a further decrease in those who also have insulin resistance. 133 The mechanism of action of MOTS-c is an active area of investigation; studies have found that it acts primarily on skeletal muscle and the heart, increases cellular levels of AICAR (an AMP-activated protein kinase [AMPK] agonist), activates AMPK, and helps maintain metabolic flexibility and homeostasis. 15,130 MOTS-c has also been shown to translocate to the nucleus and affect gene expression during metabolic stress. 134 MOTS-c has been labeled an "exercise mimetic" because it confers some exercise-like effects, such as lowering blood glucose levels, and it holds promise for treating obesity and type 2 diabetes. 15,130 Additionally, another group has reported that MOTS-c modulates the TGF-β/SMAD signaling pathway and can improve osteoporosis. 135,136

Targeting microproteins in cancer

Numerous microproteins have been implicated in cancer as key players in cell proliferation, tumor suppression, invasion, and metastasis. 75 [...] potential therapeutic targets using overexpression (CIP2A-BP, MIAC, MP31) or inhibition (ASAP, APPLE) strategies (Figure 3).

CIP2A-BP
Encoded by the ncRNA LINC00665, the 52-AA microprotein CIP2A-BP was found to be downregulated by TGF-β in breast cancer cell lines, and low CIP2A-BP expression is associated with poor survival in patients with triple-negative breast cancer. 139 CIP2A-BP competes with protein phosphatase 2A (PP2A) for binding to CIP2A (cancerous inhibitor of PP2A) and thereby inhibits activation of the PI3K/AKT/NF-κB pathway, preventing the migration and invasion of triple-negative breast cancer cells both in vitro and in vivo. 139

MP31
Extensive ribosome profiling of human glioblastoma (GBM) samples and paired normal brain tissues identified MP31, a 31-AA microprotein translated from the 5′ UTR of the well-described tumor suppressor PTEN. 142 Under normal conditions, MP31 binds to lactate dehydrogenase B (LDHB) to suppress lactate-pyruvate conversion. Downregulation of MP31 in GBM cells was associated with increased lactate utilization, a characteristic metabolic feature of highly tumorigenic cancer cells. 143 Patient-derived GBM cell lines exhibited downregulation of MP31, and MP31 overexpression in these cells reduced tumorigenicity. Additionally, systemic injection of recombinant MP31 reduced tumor size and prolonged survival in a patient-derived xenograft in situ GBM mouse model. 142

ASAP
The 94-AA microprotein ASAP (ATP synthase-associated peptide) is encoded by the ncRNA LINC00467. ASAP was identified as being highly upregulated in HCT116 colorectal cancer (CRC) cells, and its expression predicts poor outcomes in CRC patients. 144 ASAP was shown to interact with ATP synthase and enhance mitochondrial ATP production, resulting in increased CRC cell proliferation in vitro and in vivo. Targeting ASAP with CRISPR/Cas9 in patient-derived xenografts inhibited CRC cell proliferation and reduced tumor size, suggesting that targeting ASAP may be therapeutically beneficial in CRC. 144

APPLE
The 90-AA microprotein APPLE was identified from Ribo-seq data from hematopoietic malignancies 145 and from publicly available MS data. 41 APPLE is highly expressed in acute myeloid leukemia (AML) patient samples, and its expression is associated with poor outcomes in hematopoietic malignancies. 145 APPLE has been termed an "oncomicroprotein". It was shown to localize to the ribosome-bound ER and to interact with translation elongation factors and poly(A)-binding proteins, promoting mRNA looping and eIF4A complex assembly. In this way, APPLE regulates the translation of a subset of mRNAs that contribute to AML progression, supporting a pro-cancer translation program. 145 Targeting APPLE with shRNA produced broad anti-cancer effects both in vitro and in vivo, indicating that targeting APPLE may be a clinically relevant approach to inhibit oncoprotein synthesis in cancer cells. 145

CONCLUSIONS AND FUTURE DIRECTIONS
Since the recent genesis of the microprotein field, many sORF-encoded microproteins have been defined as key players in cellular physiology. However, a substantial number of microproteins are hypothesized to remain hidden in the genome, yet to be identified, validated, and functionally characterized. Numerous experimental methods, computational tools, and bioinformatic platforms have been developed and applied to discover microproteins, and these tools are undergoing further optimization to increase their sensitivity and reliability. In this review, we have highlighted many of these powerful tools. They incorporate sequencing- and MS-based discovery, domain prediction, and evolutionary conservation, and they have been used alone or in combination to robustly identify sORF-encoded microproteins. Following their identification and experimental validation, extensive functional characterization of candidate microproteins will help elucidate their contributions to cellular and whole-body physiology. In the future, it may also be advantageous to apply artificial intelligence tools such as AlphaFold to predict microprotein structures from their AA sequences and gain insight into their putative functions, 146 or to develop new drugs that target microprotein activity.
As comprehensively demonstrated in this review, many microproteins that have been identified thus far localize to discrete subcellular domains where they regulate pivotal cellular processes, often via their direct interaction with larger proteins and/or multi-protein complexes. Due to their unique ability to endogenously regulate and fine-tune specific physiological processes, microproteins represent attractive potential therapeutic targets for future development. Herein, we have highlighted the ongoing discovery of microproteins in parallel with advances in bioinformatics, experimental methods, and therapeutic development that collectively contribute to the exciting future of microprotein biology.

ACKNOWLEDGMENTS
This work was supported by grants from the National Institutes of Health (HL160569, HL141630), Cincinnati Children's Research Foundation, and American Heart Association (PRE1020028). All figures were created using BioRender (www.biorender.com).

DECLARATION OF INTERESTS
The authors declare no competing interests.