Computational and experimental characterization of the novel ECM glycoprotein SNED1 and prediction of its interactome

The extracellular matrix (ECM) protein SNED1 has been shown to promote breast cancer metastasis and control neural crest cell-specific craniofacial development, but the cellular and molecular mechanisms by which it does so remain unknown. ECM proteins exert their functions by binding to cell surface receptors, sequestering growth factors, and interacting with other ECM proteins, actions that can be predicted using knowledge of a protein’s sequence, structure, and post-translational modifications. Here, we combined in-silico and in-vitro approaches to characterize the physico-chemical properties of SNED1 and infer its putative functions. To do so, we established a mammalian cell system to produce and purify SNED1 and its N-terminal fragment, which contains a NIDO domain. We have determined experimentally SNED1’s potential to be glycosylated, phosphorylated, and incorporated into insoluble ECM produced by cells. In addition, we used biophysical and computational methods to determine the secondary and tertiary structures of SNED1 and its N-terminal fragment. The tentative ab-initio model we built of SNED1 suggests that it is an elongated protein presumably able to bind multiple partners. Using computational predictions, we identified 114 proteins as putative SNED1 interactors. Pathway analysis of the newly-predicted SNED1 interactome further revealed that binding partners of SNED1 contribute to signaling through cell surface receptors, such as integrins, and participate in the regulation of ECM organization and developmental processes. Altogether, we provide a wealth of information on an understudied yet important ECM protein with the potential to decipher its functions in physiology and diseases.


INTRODUCTION
The extracellular matrix (ECM) is a complex scaffold made of hundreds of proteins that instructs cell behaviors, organizes tissue architecture, and regulates organ function (1). It plays prominent roles during embryonic development, aging, and diseases (2)(3)(4)(5)(6)(7). Mechanistically, ECM proteins can play these roles through their interactions with each other, with growth factors or morphogens, and with receptors present at the cell surface (1,8,9). These molecular interactions are mediated by specific protein domains, motifs, or sequences and govern the nature of the chemical and mechanical signals conveyed by the ECM. Characterizing the composition of the ECM alongside the interactions taking place within this compartment and determining how they regulate cellular functions is the first step towards building a systems biology view of the ECM.
We previously used the characteristic domain-based organization of known ECM proteins (10)(11)(12) to computationally predict, via sequence analysis, the ensemble of genes coding for ECM proteins and ECM-associated proteins. We termed this ensemble the "matrisome" (13). Our interrogation of the human genome found that 1027 genes encoded matrisome proteins, among which 274 encoded "core" ECM components such as collagens, proteoglycans, and glycoproteins. While all 44 collagen genes (14) and 35 proteoglycan genes (15) had previously been reported, several of the 195 genes predicted to encode structural ECM glycoproteins based on the protein domains present were or still are of unknown function (13,16,17). One such gene is SNED1. It encodes the Sushi, Nidogen, and EGF-like domain-containing protein 1 (SNED1) and was initially named Snep for stromal nidogen extracellular matrix protein, since the murine gene was cloned from stromal cells of the developing renal interstitium, and its pattern of expression overlapped with that of the ECM basement membrane proteins nidogens 1 and 2 (18). Sned1 is broadly expressed during mouse development, particularly in neural-crest-cell and mesoderm derivatives (18,19). The interrogation of RNA-seq databases indicates that SNED1 is also expressed in multiple human adult tissues, although at a low level (unpublished data from the Naba lab). A decade after the cloning of this gene, we identified SNED1 in a proteomic screen comparing the ECM of poorly and highly metastatic mammary tumors and further reported the first function of this protein as a promoter of mammary tumor metastasis (20). Intrigued by this novel protein, we sought to identify its physiological roles. To do so, we generated a Sned1 knockout mouse model and demonstrated that
Downloaded from http://portlandpress.com/biochemj/article-pdf/doi/10.1042/BCJ20200675/905850/bcj-2020-0675.pdf by guest on 19 March 2021 Biochemical Journal. This is an Accepted Manuscript. You are encouraged to use the Version of Record that, when published, will replace this version. The most up-to-date-version is available at https://doi.org/10.1042/BCJ20200675
glycine solution at pH 3 and pH 2.5, dialyzed against phosphate-buffered saline (PBS), and stored at 4˚C. The reactivity and specificity of the antibody were assessed by western blot (Supplementary Figure S1).

Plasmid constructs
The cDNA encoding full-length human SNED1 (fl-SNED1) cloned into pCMV-XL5 (clone SC315884) was obtained from Origene. The cDNA encoding full-length murine Sned1 cloned into pCRL-XL-TOPO (clone 40131189) was obtained from Open Biosystems (now Thermo Fisher). Fl-SNED1 and a construct spanning the most N-terminal region of SNED1 and including the NIDO domain (amino acids 1 to 260, referred to as "N-terminal fragment" of SNED1 in the text and as "N-ter" in the figures) were subcloned into the bicistronic retroviral vector pMSCV-IRES-Hygromycin between the BglII and HpaI sites, and a FLAG tag (DYKDDDDK) was added at the C-terminus of both constructs (Figure 1A). These constructs were used to establish stable cell lines (see below). 6x-His-tagged constructs of human and murine SNED1 cloned into pCDNA5/FRT (Thermo Fisher) between the FseI and AscI sites were used to transiently transfect 293T cells to validate the anti-SNED1 antibody generated in this study (Supplementary Figure S1). Fl-SNED1 was subcloned into p-Select-eGFP-Blasti (Invivogen) between the AgeI and NcoI restriction sites. Fl-SNED1-GFP or GFP alone was then shuttled into the bicistronic retroviral vector pMSCV-IRES-Puromycin between the BglII and EcoRI sites. Retroviral particles were obtained as described below and used to express GFP and fl-SNED1-GFP in immortalized mouse embryonic fibroblasts (see below). All primers used are listed in Supplementary Table S1. All constructs were validated by Sanger sequencing.

Cell culture
Human embryonic kidney (HEK) 293T cells (further referred to as 293T cells) were cultured in Dulbecco's Modified Eagle's medium (DMEM) supplemented with 10% fetal bovine serum and 2 mM glutamine.

Retrovirus production
293T cells were plated at ~30% confluency and transfected 24 h later using the Lipofectamine 3000 system (Invitrogen) with a mixture containing 1 μg of retroviral vector with the construct of interest, 0.5 μg of packaging vector (pCL-Gag/Pol), and 0.5 μg of coat protein (VSVG). The transfection mix was prepared according to the manufacturer's instructions and added to the cells for 24 h, after which the transfection mix was removed and cells were fed with fresh culture medium. Cells were then cultured for an additional 24 h, after which the viral-particle-containing culture medium was collected and filtered through a 0.45-μm filter and then either stored at −80 °C or immediately used.

Establishment of 293T cells stably expressing fl-SNED1 or the N-terminal fragment of SNED1
293T cells were plated at ~40% confluency. Undiluted viral-particle-containing conditioned medium (see above) was added to the cells 24 h after seeding, and cells were fed with fresh culture medium 24 h after transduction. Transduced cells were selected with hygromycin (100 μg/mL) over a period of 10 days. Protein expression and secretion were monitored by performing western blot analysis on cellular protein extracts obtained by lysing cells using 3X Laemmli buffer (0.1875 M Tris-HCl, 6% SDS, 30% glycerol) supplemented with 100 mM dithiothreitol, and on the cell conditioned media (CM) with the rabbit polyclonal anti-SNED1 antibody (2 μg/mL) described below, a rabbit polyclonal anti-FLAG antibody (2 μg/mL; Sigma, F7425), or the monoclonal anti-FLAG M2 antibody (Sigma, F3165).
Secondary anti-rabbit antibody conjugated to the horseradish peroxidase (Thermo Fisher, 31460) was used and immunoreactive bands were detected by chemiluminescence (SuperSignal West Pico PLUS, Thermo Fisher or ECL Prime Western Blotting System, GE Healthcare).
For large-scale expression of fl-SNED1 and the N-terminal fragment of SNED1, cells were cultured in HYPERFlasks™ in DMEM (Sigma-Aldrich, D5796) supplemented with 50 μg/mL of gentamicin (Sigma-Aldrich, G1272) as previously described (21). Culture media were harvested every 48 h for up to 18 days. After collection, 3 tablets of EDTA-free cOmplete inhibitor (Roche) were added to the culture medium, which was then centrifuged at 14,000 × g for 30 min at 4°C. Supernatants were stored at -80°C until use. Fl-SNED1 and the N-terminal fragment were purified by affinity chromatography on an anti-FLAG resin (Sigma-Aldrich, A2220) as previously described (21) in the presence of 150 mM NaCl. In brief, FLAG-tagged proteins were purified at 4°C on the anti-FLAG resin at a flow rate of 20 mL/h with a P1 pump (GE Healthcare) and eluted by competition with a FLAG peptide solution at 200 μg/mL in 10 mM HEPES, 150 mM NaCl, pH 7.4 (HEPES buffered saline, HBS). The purified proteins were then concentrated on either Amicon (Merck Millipore, MWCO 10 kDa) or Vivaspin (Sartorius, MWCO 5 kDa) concentration columns. The yield was approximately 200 μg/L and 600 μg/L of culture medium for the N-terminal fragment of SNED1 and fl-SNED1, respectively.
iMEFs were plated at ~40% confluency. Undiluted viral-particle-containing conditioned medium containing the cDNA encoding fl-SNED1-GFP (see above) was added to the cells 24 h after seeding, and cells were fed with fresh culture medium 24 h after transduction. Transduced cells were selected with hygromycin (100 μg/mL) over a period of 10 days. Protein expression and secretion in the culture medium were monitored by performing western blot analysis on cellular protein extracts and conditioned medium obtained as described above, using the mouse monoclonal anti-GFP antibody [9F9.F9] (Abcam #ab1218; used at a final concentration of 2 μg/mL) and a secondary HRP-coupled anti-mouse antibody.

Deoxycholate (DOC) solubility assay
293T cells stably expressing FLAG-tagged fl-SNED1 were grown to full confluency and lysed in DOC buffer: 2% deoxycholate; 20 mM Tris-HCl, pH 8.8 containing 2 mM EDTA, 2 mM N-ethylamine, 2 mM iodoacetic acid, 167 μg/mL DNase and 1X protease inhibitor (Thermo Scientific, A32953) as previously described (22). The lysate was then passed through a 26G needle to further shear DNA and reduce viscosity. Centrifugation was used to pellet the DOC-insoluble, ECM-enriched protein fraction from the DOC-soluble supernatant, enriched for intracellular components. These fractions were analyzed for the presence of SNED1 by western blot as described above.

Preparation of cell-derived matrices
Cell-derived matrices (CDMs) from iMEFs were prepared following the protocol published by the Schwarzbauer lab (23). In brief, glass coverslips were coated with 0.2% gelatin (Sigma, G1890) for 1 h

Immunofluorescence
CDMs were fixed with 4% paraformaldehyde, and immunofluorescence staining was performed using the following primary antibodies: mouse monoclonal anti-GFP antibody [9F9.F9] (Abcam #ab1218; used at a final concentration of 10 μg/mL) and rabbit serum containing anti-fibronectin polyclonal antibodies (a kind gift from Richard Hynes, MIT, used at a 1:100 dilution), and the following secondary antibodies: goat anti-mouse coupled to Alexa Fluor 647 (Thermo Fisher A21236; used at 4 μg/mL) and goat anti-rabbit coupled to Alexa Fluor 568 (Thermo Fisher A11036; used at 4 μg/mL).
Coverslips were mounted on glass slides and imaged using the Zeiss Axio Imager Z2 or the Zeiss Confocal LSM 880. Images were acquired and processed using ZEN v2.3. All negative-control stainings are provided in Supplementary Figure S9.

Analysis of SNED1 post-translational modifications by SDS-PAGE and western blot
Conditioned media from 293T cells stably expressing FLAG-tagged fl-SNED1 and the N-terminal fragment of SNED1 were incubated with PNGase F as previously described (21), or with heparinase III and chondroitinase ABC (2 mU per 40 μL of conditioned medium) as previously described (24).
Proteins were separated by SDS-PAGE and transferred onto nitrocellulose membranes. Membranes were probed with an anti-FLAG antibody or the rabbit polyclonal anti-SNED1 antibody generated in this study to identify recombinant SNED1. To determine whether SNED1 is phosphorylated, FLAG-tagged fl-SNED1 was immunoprecipitated from 1.25 mL of medium conditioned by cells for 72 h using an anti-FLAG resin (Sigma-Aldrich, A2220). Bound proteins were resolved by SDS-PAGE and western blots were performed with the anti-FLAG antibody to validate the immunoprecipitation of FLAG-tagged fl-SNED1 and with anti-phosphoserine (1 μg/mL; Abcam, ab9332), anti-phosphothreonine (1 μg/mL; Sigma-Aldrich, AB1607), or anti-phosphotyrosine (1 μg/mL; Sigma-Aldrich, 05-321) antibodies.

Circular dichroism (CD)
Far-UV CD spectra were recorded in a quartz cuvette at 20°C with a path length of 0.1 cm in a 7.12 (Malvern Instruments Ltd). The theoretical hydrodynamic radii of fl-SNED1 and the N-terminal fragment of SNED1 were calculated using folded proteins parameters and the number of amino acid residues of each protein (27).

Bioinformatic analysis of the amino acid sequence of human SNED1
The sequence of human SNED1, without its signal peptide, was used for further queries unless stated otherwise (UniProtKB Q8TER0, residues 25-1413; Figure 1A). The domain organization of human SNED1 was drawn with Illustrator of Biological Sequences 1.0.3 (28). The secondary structure of SNED1 and NIDO was predicted using Proteus2 (29 Ser-Gly-X-Gly, X being any amino acid residue except proline (35) and Glu/Asp-X-Ser-Gly (36) sequences, corresponding to glycosaminoglycan (GAG) attachment sites were searched with PATTINPROT (https://npsa-prabi.ibcp.fr) (37). The Ser-Gly pattern was searched manually in the SNED1 sequence. Prediction of disulfide bond-forming cysteines and ternary cysteine classification were performed using DISULFIND (34)  A template-free, ab-initio protein structure prediction algorithm, QUARK (41), was used to generate a model of the NIDO domain. The model was refined with ModRefiner (42) and its quality was assessed with ProSAII (43), Verify-3D (44), and PROCHECK Ramachandran plot analysis (45).
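The consensus patterns used for the GAG attachment site search can be written as regular expressions. The following is a minimal sketch only: it is not the PATTINPROT implementation, the function name `find_gag_motifs` is hypothetical, and the toy sequence is invented for illustration rather than taken from SNED1. Positions are reported 1-based, and lookaheads are used so that overlapping occurrences are not missed.

```python
import re

# Consensus patterns named in the text, written as regular expressions.
# Each pattern sits inside a lookahead so overlapping hits are all reported.
GAG_PATTERNS = {
    "Ser-Gly": r"(?=(SG))",
    "Ser-Gly-X-Gly": r"(?=(SG[^P]G))",      # X = any residue except proline
    "Glu/Asp-X-Ser-Gly": r"(?=([ED].SG))",  # X = any residue
}


def find_gag_motifs(sequence):
    """Return {motif name: [(1-based start, matched residues), ...]}."""
    sequence = sequence.upper()
    hits = {}
    for name, pattern in GAG_PATTERNS.items():
        hits[name] = [
            (match.start() + 1, match.group(1))
            for match in re.finditer(pattern, sequence)
        ]
    return hits


if __name__ == "__main__":
    toy_sequence = "MKDASGAGTTSGPGEESGL"  # invented sequence, illustration only
    for name, found in find_gag_motifs(toy_sequence).items():
        print(name, found)
```

In the toy sequence, the stretch SGPG is reported as a Ser-Gly hit but not as a Ser-Gly-X-Gly hit, because proline is excluded at the X position.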

Reactome Pathway Analysis
The 114 proteins predicted to interact with SNED1 were input as a dataset into the Reactome pathway algorithm (https://reactome.org/) (57,58). In brief, a statistical test determines whether certain Reactome pathways and biological processes are enriched in the submitted dataset. The test produces a probability score corrected for false discovery rate using the Benjamini-Hochberg method.
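The Benjamini-Hochberg correction mentioned above can be illustrated with a short sketch. This is a generic implementation of the procedure, not Reactome's own code; the function name `benjamini_hochberg` is hypothetical and the p-values in the usage example are invented.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03, 0.005]  # invented pathway p-values
    for p, q in zip(raw, benjamini_hochberg(raw)):
        print(f"p = {p:.3f}  adjusted = {q:.3f}")
```

Each raw p-value is multiplied by the number of tests divided by its rank, and the running minimum keeps the adjusted values monotonic.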

Computational analysis of the sequence of the ECM protein SNED1
As previously described (18,19), SNED1 is a multidomain protein containing one NIDO domain, one follistatin domain, one Sushi domain, also known as complement control protein (CCP) domain, 15 EGF-like and EGF-Ca++ domains, and 3 fibronectin III domains in its C-terminal region (Figure 1A; domain boundaries were predicted using SMART (59)). While these protein domains are found in lower organisms (60,61), the results of our phylogenetic analysis have revealed that orthologs of SNED1 are found in vertebrates, including the following model organisms: mouse, rat, chicken, zebrafish, and Xenopus. Sequence homology ranges from 85% between mammalian species to ~45-50% with other vertebrates. No ortholog of SNED1 was found in lower organisms (19).
An interesting feature of SNED1 is the presence of a NIDO domain (SMART: SM00539) in its N-terminal region (amino acids 103-260). This domain is only found in 4 other human or rodent proteins in addition to SNED1: the basement membrane proteins nidogen-1 and nidogen-2 (62,63), mucin-4, and alpha-tectorin, a component of the tectorial membrane, the apical ECM of the inner ear (64)(65)(66).
Identity between the NIDO domain of human SNED1 and that of other vertebrate SNED1 orthologs ranges between 73% and 92% (19). Sequence alignment within the NIDO domains of the 5 human proteins showed that the NIDO domain of SNED1 is most closely related to that of alpha-tectorin (77% similarity and 58% identity; Figure 1B
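The distinction between percent identity and percent similarity can be made concrete with a small sketch. Everything here is an assumption for illustration: the residue groups are a simplified stand-in for the substitution matrix an alignment program would use, gap columns are excluded from the denominator, and the function name `identity_and_similarity` and the aligned strings are hypothetical.

```python
# Hypothetical grouping of residues with similar side chains; alignment
# programs normally rely on a substitution matrix (e.g. BLOSUM62) instead.
SIMILAR_GROUPS = ["AVLIM", "FYW", "KRH", "DE", "ST", "NQ"]
GROUP_OF = {aa: i for i, group in enumerate(SIMILAR_GROUPS) for aa in group}


def identity_and_similarity(aligned_a, aligned_b):
    """Return (% identity, % similarity) over alignment columns without gaps."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = identical = similar = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # gap columns are skipped in this sketch
        compared += 1
        if a == b:
            identical += 1
            similar += 1  # identical residues also count as similar
        elif a in GROUP_OF and GROUP_OF[a] == GROUP_OF.get(b):
            similar += 1
    if compared == 0:
        return 0.0, 0.0
    return 100.0 * identical / compared, 100.0 * similar / compared


if __name__ == "__main__":
    identity, similarity = identity_and_similarity("ACDE-K", "ACEEGR")
    print(f"identity = {identity:.1f}%, similarity = {similarity:.1f}%")
```

Because identical positions are a subset of similar positions, similarity is always at least as high as identity, as in the values reported for the NIDO domains.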

Development of a mammalian cell system to produce and purify SNED1 in vitro
In order to study the biochemical and biophysical properties of SNED1, we devised a mammalian cell system to produce recombinant FLAG-tagged full-length SNED1 (fl-SNED1) or the N-terminal fragment that contains the NIDO domain of SNED1 (Figure 1A). We found that both proteins were secreted by the cells, since we detected them in the conditioned medium of 293T cells stably expressing them, from which we could purify the proteins using the FLAG tag added to their C-terminal ends (Figure 2A). In order to study SNED1, we also generated a rabbit polyclonal antibody that we validated and found to be specific to human SNED1 since it did not recognize murine SNED1 (Supplementary Figure S1A and S1B).
The canonical DOC solubility assay has been used to demonstrate the incorporation of proteins, such as fibronectin, into the ECM (22,70). Here, we show that fl-SNED1 is detected in the DOC-insoluble fraction, indicating its relative insolubility and likely incorporation in the ECM deposited by 293T cells in vitro (Figure 2B).

Determination of the secondary and tertiary structures of full-length and of the N-terminal fragment of SNED1
Determining the structure of SNED1 has the potential to shed light on the mechanisms underlying its possible functions and signaling mechanisms. We thus turned to molecular modeling and biophysical assays using purified proteins to determine the secondary and tertiary structures of SNED1 and its N-terminal fragment.

Secondary structures of SNED1 and its N-terminal fragment
The predicted secondary structure of the N-terminal fragment of SNED1 using Proteus2 (29) was 33% β-strand, 9.3% helix, and 57.6% random coil (Supplementary Figure S2), whereas the deconvolution of its circular dichroism (CD) spectra showed the presence of 39% β-strand, 5% helix, and 36% random coil (Figure 3A). These results confirmed that the N-terminal fragment of SNED1 containing the NIDO domain is mostly composed of β-strands, with a small percentage of helices (≤9%). Two helices were predicted in the NIDO domain itself (124 PAMLRRATEDVRHY 137 and 235 DMAEVETT 242). A large proportion of random coil (73%) was predicted in fl-SNED1 together with 26% β-strands and 1% helix corresponding to the sequence also found in the N-terminal fragment of SNED1, 124 PAMLRRATEDVR 135 (Figure 3A). However, a higher percentage of β-strands (52%) and a lower amount of random coil (41%), together with 9% turns, were found by CD analysis. Interestingly, no helix was experimentally detected in fl-SNED1.

Determination of the molecular weight of SNED1 and its N-terminal fragment using size-exclusion chromatography-multi-angle laser light scattering
The theoretical molecular weight (Mw) of fl-SNED1 determined by the ProtParam tool

SNED1 and its N-terminal fragment are disulfide-bonded
The sequence of human SNED1 contains 107 cysteine residues. All the cysteine residues except one are located in two regions: residues 265-902 and 1311-1391. DISULFIND (73) predicted the presence of 53 disulfide bonds in the SNED1 sequence (Supplementary Table S3). Most domains of SNED1 were predicted to be disulfide-bonded except the fibronectin III domains (Supplementary Table S3).
Only one cysteine residue, Cys99, is present in the N-terminus of SNED1.
To experimentally determine if SNED1 is stabilized by disulfide bonding in vitro, purified

Determination of the hydrodynamic radius of SNED1 and its N-terminal fragment
The hydrodynamic radius (Rh) of fl-SNED1 determined by dynamic light scattering (DLS) was 9.02 ± 0.97 nm. This value was more than two-fold greater than the value calculated for a fully folded protein comprising the same number of amino acid residues as SNED1 (3.7-3.9 nm), suggesting that SNED1 is  (Figure 3A). We did not obtain a concentration of the N-terminal fragment of SNED1 sufficient to get a clear signal in the DLS experiment and a reliable experimental value of its hydrodynamic radius, but its theoretical hydrodynamic radius, calculated assuming that it folds as a globular protein, is 2.3-2.4 nm. The Stokes radius of the N-terminal fragment estimated by SEC was 3.4 nm. Altogether, our results provide the first computational and experimental determination of the structural and biophysical parameters of SNED1 (Table 1).
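The comparison between measured and theoretical hydrodynamic radii can be sketched numerically. The text does not state the formula behind the theoretical values, so this example assumes an empirical power law often used for compact folded proteins, Rh ≈ 4.75 × N^0.29 Å, where N is the number of residues. Both constants, the function name `folded_hydrodynamic_radius_nm`, and the choice to ignore the FLAG tag when counting residues are assumptions of this sketch.

```python
def folded_hydrodynamic_radius_nm(n_residues, prefactor=4.75, exponent=0.29):
    """Estimate Rh (nm) of a compact folded protein from its residue count.

    Assumes the empirical power law Rh = prefactor * N**exponent in angstroms;
    the default constants are an assumption, not values quoted in the text.
    """
    if n_residues <= 0:
        raise ValueError("n_residues must be positive")
    radius_angstrom = prefactor * n_residues ** exponent
    return radius_angstrom / 10.0  # 10 angstroms per nanometre


if __name__ == "__main__":
    measured_rh_nm = 9.02            # DLS value reported for fl-SNED1
    n_full_length = 1413 - 25 + 1    # residues 25-1413, signal peptide removed
    n_n_terminal = 260 - 25 + 1      # residues 25-260, FLAG tag ignored
    predicted_fl = folded_hydrodynamic_radius_nm(n_full_length)
    predicted_nter = folded_hydrodynamic_radius_nm(n_n_terminal)
    print(f"fl-SNED1 folded estimate: {predicted_fl:.2f} nm")
    print(f"N-terminal fragment folded estimate: {predicted_nter:.2f} nm")
    print(f"measured / estimated (fl-SNED1): {measured_rh_nm / predicted_fl:.2f}")
```

Under these assumptions, the estimates fall within the 3.7-3.9 nm and 2.3-2.4 nm ranges quoted above, and the measured radius of fl-SNED1 exceeds the folded estimate by more than two-fold.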

3D model of the NIDO domain of SNED1
Since no crystal structure or model of SNED1 or its NIDO domain is available, we sought to build a computational model of the NIDO domain using QUARK and then refined it with ModRefiner (coordinates file provided in Supplementary File S1). The TM-score was 0.9 (the topology is assumed to be correct if this value is > 0.5), and the QMEAN, which should be above -4, was -3.77.
The model contained two helices predicted by Proteus2 (although the second helix was longer than predicted, 233 TADMAEVETTT 243), a short β-sheet, and two additional β-strands (Figure 3E). The ProSA z-score, which indicates the overall model quality, was -4.21, within the range of scores typically found for X-ray structures of proteins of similar size (Supplementary Figure S4). The model was also correct according to Verify-3D (Supplementary Figure S5) and according to PROCHECK, which reported that 99.7% of the residues were found in allowed regions and 69.9% in the most favored regions (Supplementary Figure S6). Only 2 residues, excluding glycine and proline residues, were found in disallowed regions. The overall quality of the model of the NIDO domain was thus considered good.

SNED1 is a glyco-phosphoprotein
A key feature of ECM proteins is their high level of post-translational modifications (PTMs), including glycosylations (75) and phosphorylations (76)(77)(78), which potentially mediate protein-protein interactions and scaffolding (especially of mineralized tissues in the case of phosphorylations). We thus used several algorithms and queried multiple databases to determine whether SNED1 was predicted to be subject to PTMs (Supplementary Table S4) and further tested our findings experimentally.  In addition to glycosylation sites, several potential attachment sites of glycosaminoglycans (GAGs) were identified within the sequence of SNED1, including 8 Ser-Gly motifs, 1 Ser-Gly-X-Gly sequence ( 846 SGGG 849 ), and 2 Glu/Asp-X-Ser-Gly sequences, 66 Figure   S7B).

Phosphorylation
Sequence analysis also revealed 133 predicted phosphorylation sites in SNED1 (Supplementary residues are predicted to be phosphorylated by casein kinase II which is known to phosphorylate ECM proteins, including collagen XVII (ecto-caseinase 2), fibronectin, and vitronectin (78).
Through database interrogation, we found experimental evidence showing the phosphorylation of 12 of these residues: 5 serine, 5 threonine, and 2 tyrosine residues, none of which lie within the N-terminal domain of SNED1 (Supplementary Table S4D). In order to determine whether SNED1 was phosphorylated when secreted by 293T cells, we immunoprecipitated FLAG-tagged SNED1 and conducted western blot analysis of the immobilized protein using anti-phosphoserine, anti-phosphothreonine, and anti-phosphotyrosine antibodies. While we were not able to obtain consistent results with the anti-phosphotyrosine antibody, our results show that both human fl-SNED1 (Figure 4C, left panel) and its N-terminal fragment (Figure 4C, right panel) were phosphorylated on serine and threonine residues.
Altogether, our results provide evidence that SNED1 secreted by 293T cells is both N-glycosylated and serine- and threonine-phosphorylated, and that some of the modified residues lie within the N-terminal region of SNED1. Future studies are needed to determine which enzymes are responsible for these PTMs and how these PTMs relate to SNED1 structure, interactions, and functions.

Domain-domain interaction network of SNED1
A domain-domain interaction network of fl-SNED1 was built using 3did, the database of 3-dimensional interacting domains, and returned 106 unique interactions (Figure 5 and Supplementary Figure 5). Two interactions were retrieved twice (Sushi/EGF and EGF/EGF_CA). Of note, the lack of knowledge about the NIDO domain was further exemplified here, since NIDO is simply absent from the 3did database and no other protein domain has ever been reported to interact with it.

Prediction of the protein-level SNED1 interactome
The query of the interaction databases MatrixDB (49) and IntAct (50)  Methods). We focused on secreted proteins and membrane proteins, which resulted in the prediction of 114 unique interactions by at least one algorithm, including SNED1 auto-interaction ( Figure 6A).
Sequence analysis of SNED1 revealed that it displays two integrin-binding consensus sequences, RGD and LDV (Figure 1A), which suggests that integrins may serve as SNED1 receptors. Two additional membrane proteins, Indian hedgehog (IHH) and tissue factor (F3), are also annotated as matrisome-associated and secreted proteins, respectively (Figure 6A and Supplementary Table S6). Last, 47 partners of SNED1 are identified as being extracellular proteins, including 30 matrisome proteins, 10 matrisome-associated proteins, and 7 secreted proteins (Figure 6A and Supplementary Table S6). 45 unique interactions were also predicted to involve intracellular proteins (Supplementary Table S6).
Comparison of the predictions made by the different algorithms revealed that 10 interactions were predicted by at least two methods, including the ECM proteins collagen VII (COL7A1), tenascin N (TNN), and fibronectin (FN1), the ECM receptor integrin β4 subunit (ITGB4) and the related secreted We then focused on the 13 binding partners predicted to interact with SNED1 by HOMCOS (Supplementary Table S6E). This tool specifically allows the 3D modeling of structures and interactions using 3D molecular similarities, resulting in the mapping of the potential interactor binding sites to SNED1 domains. We found that most putative interactor binding sites were located within EGF-like domains, whereas only a few partners, including the proteoglycan aggrecan (ACAN) Of note, no partner was predicted to interact with the NIDO, follistatin, or C-terminal domains of SNED1 (Supplementary Figure S8B and C), which, again, may reflect the limited experimental data available for these domains. Future studies will be aimed at testing experimentally whether these predicted interactors indeed bind SNED1 and, if so, we will further determine the characteristics of these interactions (e.g. binding affinity, precise mapping of interaction sites) and their biological relevance.
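Identifying interactions supported by at least two prediction methods amounts to counting, for each candidate partner, how many methods returned it. The sketch below shows one way to do this; the function name `consensus_interactors`, the method labels, and the partner identifiers are placeholders and do not reproduce the actual predictions.

```python
from collections import Counter


def consensus_interactors(predictions, min_methods=2):
    """Return partners predicted by at least `min_methods` methods.

    `predictions` maps a method name to an iterable of predicted partners.
    """
    counts = Counter()
    for partners in predictions.values():
        counts.update(set(partners))  # count each partner once per method
    return sorted(p for p, n in counts.items() if n >= min_methods)


if __name__ == "__main__":
    # Placeholder data: method and partner names are invented.
    toy_predictions = {
        "method_A": ["P1", "P2", "P2"],
        "method_B": ["P2", "P3"],
        "method_C": ["P3", "P1", "P4"],
    }
    print(consensus_interactors(toy_predictions))
```

Converting each method's output to a set before counting prevents a partner listed twice by the same method from being mistaken for independent support.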

Potential binding partners of SNED1 are involved in multiple signaling pathways
The in-silico interaction network of SNED1 was then analyzed using Reactome (58) to identify associated biological pathways. No annotation could be retrieved for SNED1 itself, further highlighting the critical gap in knowledge about this protein (Reactome, version 73, released June 17, 2020). The Reactome database included information on 94 of the 114 predicted SNED1 partners, and since the majority of them are either part of the matrisome or are transmembrane receptors, the processes most over-represented in SNED1's network were "signal transduction", "cell-cell communication", "ECM organization", and "developmental biology" (Figure 6B). More specifically, the predicted SNED1 interactors were found to be part of, or contribute significantly to, more defined pathways, including "integrin cell surface interaction" and "ECM proteoglycans" (Figure 6C). This analysis, together with the list of predicted SNED1 interactors, will help prioritize future lines of investigation focused on uncovering the molecular mechanisms by which SNED1 controls aspects of embryonic development (19) and breast cancer metastasis (20). While 293T cells are an excellent mammalian system to produce and purify proteins and ECM proteins, they do not assemble an ECM scaffold in vitro. In order to test the interaction of SNED1 with its predicted partners, we needed a cellular system in which we can assess ECM proteins in situ.

SNED1 is a fibrillar ECM protein and colocalizes with fibronectin
Fibroblasts are the main producers of ECM proteins in vivo. In vitro, these cells can secrete, deposit, and assemble ECM proteins into a structural ECM scaffold (23,86,87). To study the pattern of deposition of SNED1 within the ECM and test its interaction with other ECM partners, we sought to take advantage of mouse embryonic fibroblasts (MEFs) we recently obtained from the Sned1 knockout mouse model we generated (19). While the SNED1 antibody reported in the present study detected SNED1 in applications such as western blot (Figure 2), it failed to detect SNED1 in situ. We with the anti-GFP antibody, further confirming the specificity of the fibrillar pattern observed. This fibrillar pattern was reminiscent of that of other known fibrillar ECM proteins such as fibronectin (23,86,87). Since we predicted that fibronectin could be a potential interactor of SNED1 (Figure 6A), we sought to determine whether SNED1 and fibronectin colocalized within the ECM. We observed a partial overlap and co-alignment between these two proteins (Figure 7C and Supplementary Figure S9C). Future studies are now needed to determine whether SNED1 and fibronectin are capable of physically interacting and, if so, whether their interaction is direct or mediated by other ECM proteins or GAGs. It would also be interesting to determine, in future studies, the role of this possible interaction in ECM deposition, assembly, or remodeling.

DISCUSSION
Deciphering the nature of protein-protein interactions within the ECM is critical to understand the mechanisms governing proper ECM assembly and signaling functions in health and disease. We report here the computational prediction of the structure and interaction network of the novel ECM protein SNED1 and provide experimental insight into this protein's properties. While SNED1 shares structural The NIDO domain found in the N-terminal region of SNED1 is only found in 4 other human proteins.
Structure/function analysis of the NIDO domain of mucin-4 has revealed its role in promoting the invasiveness of pancreatic tumor cells (68,88). We have previously demonstrated that SNED1 promotes mammary tumor metastasis (20), and SNED1 was also identified in a screen as a mediator of p53-dependent pancreatic tumor cell invasive phenotype (89 interactors contribute identified "ECM organization" as one of the most significantly enriched pathways, and 6 collagen chains (COL6A3, COL7A1, COL12A1, COL14A1, COL16A1, and COL20A1) were predicted to bind to SNED1. Together, these results hint at a potential role for SNED1 in regulating collagen deposition and organization, which will need to be experimentally assessed.
While our focus is on the full-length, secreted ECM protein SNED1, early reports indicated that the 3' half of SNED1 could encode an intracellular protein, then named insulin response element-binding protein 1 (IRE-BP1), since it was identified by phage display to bind the IRE of the gene encoding insulin-like growth factor-binding protein 3 (IGFBP3) (91,92). Sequence analysis suggests that an alternative start codon could generate this shorter isoform. Further database interrogation revealed multiple putative isoforms of SNED1; however, none of them has been reported experimentally yet.
Last, the only experimentally-detected interactor of SNED1 is the intracellular estrogen receptor beta (79). Whether the full-length secreted SNED1 can interact with this protein, or whether it is an intracellular shorter isoform or a truncated form of SNED1 remains to be determined, as does the physiological relevance of this interaction.

CONCLUSION
In summary, our study has provided the first biochemical and biophysical insights into the novel ECM protein SNED1 and is paving the way for future mechanistic studies that will eventually help us understand its multi-faceted roles in development, health, and disease.

DATA AVAILABILITY STATEMENT
All antibodies, constructs, and cell lines generated for this study are available upon request to Dr. Naba. Additional information on experimental design can be obtained from Dr. Ricard-Blum and Dr. Naba.

FUNDING SOURCES
This work has been supported by a grant from the Fondation pour la Recherche Médicale n°DBI20141231336 to SRB, and by a start-up fund from the Department of Physiology and Biophysics at UIC to AN.

CONFLICT OF INTEREST
The authors declare that they have no conflicts of interest with the contents of this article.