Ten Years of Extracellular Matrix Proteomics: Accomplishments, Challenges, and Future Perspectives

The extracellular matrix (ECM) is a complex assembly of hundreds of proteins forming the architectural scaffold of multicellular organisms. In addition to its structural role, the ECM conveys signals orchestrating cellular phenotypes. Alterations of ECM composition, abundance, structure, or mechanics have been linked to diseases and disorders affecting all physiological systems, including fibrosis and cancer. Deciphering the protein composition of the ECM and how it changes in pathophysiological contexts is thus the first step toward understanding the roles of the ECM in health and disease and toward the development of therapeutic strategies to correct disease-causing ECM alterations. Potentially, the ECM also represents a vast, yet untapped reservoir of disease biomarkers. ECM proteins are characterized by unique biochemical properties that have hindered their study: they are large, heavily and uniquely posttranslationally modified, and highly insoluble. Overcoming these challenges, we and others have devised mass spectrometry-based proteomic approaches to define the ECM composition, or "matrisome," of tissues. The first part of this review provides a historical overview of ECM proteomics research and presents the latest advances that now allow the profiling of the ECM of healthy and diseased tissues. The second part highlights recent examples illustrating how ECM proteomics has emerged as a powerful discovery pipeline to identify prognostic cancer biomarkers. The third part discusses remaining challenges limiting our ability to translate findings to clinical application and proposes approaches to overcome them. Lastly, the review introduces readers to resources available to facilitate the interpretation of ECM proteomics datasets. The ECM was once thought to be impenetrable. Mass spectrometry-based proteomics has proven to be a powerful tool to decode the ECM. In light of the progress made over the past decade, there are reasons to believe that the in-depth exploration of the matrisome is within reach and that we may soon witness the first translational application of ECM proteomics.

Alexandra Naba 1,2,*

THE EXTRACELLULAR MATRIX: THE MASTER ORGANIZER OF MULTICELLULAR ORGANISMS
The extracellular matrix (ECM) is a complex assembly of proteins forming the architectural scaffold of all multicellular organisms (1)(2)(3). As such, the ECM guides cell polarization, serves as a substrate for cell migration, organizes cells into tissues and tissues into organs, and confers mechanical properties to tissues. In addition to its structural roles, the ECM exerts signaling functions through mechanotransduction (4,5). It also provides biochemical cues interpreted by cells via cell-surface receptors (e.g., integrins (6), syndecans, adhesion GPCRs (7)) that orchestrate most, if not all, cellular functions, from cell proliferation and survival to adhesion and migration, to stemness and differentiation. The ECM thus plays critical roles during development, growth, and other physiological processes including wound healing and aging (8)(9)(10)(11)(12). Simply put, the ECM is essential for multicellular life.

THE EXTRACELLULAR MATRIX: A KEY PLAYER IN DISEASE ETIOLOGY AND PROGRESSION
The ECM is a dynamic compartment that undergoes compositional turnover and structural remodeling mediated by both enzymatic and nonenzymatic processes. Disruption of ECM homeostasis, caused by mutations in ECM genes (13), by an imbalance between ECM production and degradation, or by inadequate ECM remodeling, results in disorders and diseases affecting all physiological systems (14-16) including the musculoskeletal system (e.g., Ehlers-Danlos syndrome (17), arthritis), the skin (e.g., scleroderma (18), epidermolysis bullosa (19)), the cardiovascular system (e.g., Marfan syndrome (20)), the respiratory system (lung fibrosis (21)), and the excretory system (e.g., Alport syndrome, Goodpasture syndrome, renal fibrosis (22,23)), to list a few. In addition, excessive ECM accumulation is a hallmark of fibrosis (24) and cancer (25)(26)(27). The extent of ECM deposition in the context of cancer, assessed by the tumor:stroma ratio, has been shown to be of prognostic value for patients with colorectal cancer (28,29). Nine genes of the 70-gene MammaPrint panel used for early breast cancer diagnosis (30) are ECM genes. ECM proteins present the advantage of being readily accessible, outside the cells. Consequently, they can be used for the targeted delivery of imaging agents (31)(32)(33) or drugs, for example, by using bispecific agents composed of a moiety recognizing a disease-specific ECM protein and an immunomodulatory cytokine (34)(35)(36). Lastly, it has been proposed that modulating the architecture or biophysical properties of the ECM or ECM-cell interactions could be a valid therapeutic approach in various contexts (37)(38)(39)(40)(41)(42). The ECM thus constitutes a large reservoir of biomarkers and potential therapeutic targets. Yet, while some proteins (e.g., fibronectin, elastin) or families of proteins (e.g., collagens, tenascins) of the ECM have been extensively studied, the ECM as a whole, remained, until recently, largely underexplored (43) and uncharted (44).

CHALLENGES POSED BY ECM PROTEINS
The very biochemical properties allowing ECM proteins to assemble into an architectural scaffold capable of withstanding significant mechanical stress and deformations have hindered our ability to study the global composition of the ECM. The core, structural proteins of the ECM tend to be very large, on average 1045 amino acids long. ECM proteins undergo extensive intracellular and extracellular posttranslational modifications (PTMs), including glycosylation, glycation, and, for collagens and collagen-domain-containing proteins, lysine and proline hydroxylation, which contributes to the stabilization of the triple-helical structure of collagens (45). ECM proteins also assemble into higher-order molecular structures established via hydrogen bonds (e.g., collagen triple-helical structures (46,47)), disulfide bonds (e.g., fibronectin dimers (48)), and covalent cross-links (e.g., elastin (49), collagens (50)). These biochemical properties contribute to making ECM proteins highly insoluble and, hence, challenging to study using standard biochemical approaches like SDS-PAGE, immunoprecipitation and pull-down assays, or mass spectrometry (MS). Because of their high insolubility, ECM proteins are underrepresented in global proteomic datasets. Further contributing to this underrepresentation is the fact that, apart from a few exceptions, the ECM represents a small fraction of healthy organ and tissue mass.
The second challenge limiting the comprehensive characterization of the ECM is its broad dynamic range in terms of protein abundance. The ECM comprises very large and highly abundant structural components, which can generate many peptides (for example, there are 121 trypsin cleavage sites in the alpha 1 chain of collagen I), as well as smaller secreted factors, such as ECM-remodeling enzymes, growth factors, or morphogens, present in much lower abundance. This limitation is not unique to the ECM. Advances in instrumentation and in methods to fractionate protein and peptide samples, which will not be discussed here, have been key to capturing the complexity of different subproteomes and are now being applied to ECM proteomics (see below).
The first attempts at profiling the protein composition of the ECM of ECM-rich tissues, such as cartilage, or following ECM enrichment for other tissues, employed SDS-PAGE or 2D gel electrophoresis to separate the subsets of ECM proteins that could be solubilized, followed by liquid chromatography coupled to tandem mass spectrometry (LC-MS/MS). These studies reported the detection of up to a few dozen structural ECM proteins. At the time, this was no small feat, and these early studies have been instrumental in helping shape the field of ECM proteomics (51)(52)(53)(54)(55)(56). Of note, with sample preparation protocols tailored to account for the unique challenges posed by ECM proteins (insolubility, extensive glycosylation), separation by 1D SDS-PAGE resulted in the identification of nearly 100 distinct extracellular proteins (57,58).
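The peptide-generating potential of a large protein can be estimated by in silico digestion. The following is a minimal Python sketch, assuming the commonly used simplified trypsin rule (cleavage C-terminal to lysine or arginine, except when the next residue is proline). The function names and the short toy sequence are illustrative only and are not taken from the studies discussed here; applying the same function to a full collagen sequence retrieved from a protein database would give a site count under this simplified rule, which may differ from counts obtained with other rule sets.

```python
def trypsin_cleavage_sites(sequence: str) -> list[int]:
    """Return 1-based positions of residues after which trypsin is expected to cut.

    Simplified rule (an assumption of this sketch): cut after K or R,
    unless the following residue is P.
    """
    seq = sequence.upper()
    sites = []
    # The last residue is skipped: there is nothing after it to cut from.
    for i, residue in enumerate(seq[:-1]):
        if residue in "KR" and seq[i + 1] != "P":
            sites.append(i + 1)
    return sites


def digest(sequence: str) -> list[str]:
    """Split a sequence into fully tryptic peptides (no missed cleavages)."""
    seq = sequence.upper()
    peptides, start = [], 0
    for site in trypsin_cleavage_sites(seq):
        peptides.append(seq[start:site])
        start = site
    peptides.append(seq[start:])
    return peptides


# Toy, made-up collagen-like sequence used only to exercise the functions.
toy_sequence = "GPKGDRGEKPGAR"
print(trypsin_cleavage_sites(toy_sequence))  # [3, 6]
print(digest(toy_sequence))                  # ['GPK', 'GDR', 'GEKPGAR']
```

In the toy sequence, the lysine at position 9 is not counted because it is followed by proline, which illustrates why proline-rich proteins can yield fewer or longer tryptic peptides than their lysine and arginine content alone would suggest.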
However, for most studies, known ECM proteins, expected to be detected in those tissues, were not identified. One may then ask: how can we ensure that we capture potential novel ECM proteins or proteins not known to be present in these tissues? And indeed, the third challenge our field faced when attempting to characterize, in an unbiased manner, the protein composition of the ECM of tissues, was the lack of a defined parts list to systematically annotate experimental output. As a result, in the early days of ECM proteomics, many proteins listed as "ECM" proteins were in fact intracellular proteins involved in cell-ECM adhesions or secreted proteins not incorporated in the ECM. Conversely, proteins for which no prior knowledge existed would fail to be annotated as belonging to the ECM. This represented a significant limitation to any attempt aiming to identify biomarkers of diseased states. It thus became obvious that tailored experimental and analytical approaches would be needed to decipher the complexity of the ECM. This review will discuss the latest developments in ECM proteomics, from enhancement in sample preparation and analytical methods to the application of ECM proteomics for the purpose of biomarker and therapeutic target discovery with a focus on cancer. As part of the Special Issue on Clinical Proteomics, this article will highlight selected studies performed on clinical samples or rodent models of human diseases that show translational promise. Of note, ECM proteomics is also applied to study the ECM of many multicellular organisms, for example, zebrafish (59)(60)(61), Drosophila (62), or planarians (63), or the ECM produced by cells in culture. These studies are instrumental to advance fundamental knowledge of the ECM in the context of development, health, and disease.
In addition, this review will focus on bottom-up MS-based proteomics, but it is worth noting that other MS-based modalities can be employed to study different facets of the ECM, including the glycosylation patterns of ECM proteins using glycomics (64)(65)(66)(67), the identification of cleavage fragments of ECM proteins using degradomics (68), or the localization and distribution of ECM proteins using imaging MS (69,70).

DEFINING THE "MATRISOME" AND ESTABLISHING A FRAMEWORK FOR ECM PROTEOMICS RESEARCH
In 2012, we published in this journal an article describing a two-pronged approach to define the protein composition of the ECM of tissues (71). While prior studies had attempted to overcome some of the limitations described above (for example, by decellularizing samples or extracting ECM proteins in guanidine hydrochloride), we set out to tackle them all. In brief, we first took advantage of the differential solubility of intracellular and ECM proteins to deplete non-ECM proteins by sequential incubations in extraction, or decellularization, buffers, concomitantly enriching for ECM proteins. Observing that incubation in 8 M urea and 100 mM DTT did not fully solubilize ECM-enriched samples and suspecting that many ECM components would be found in the insoluble material, we processed "crude" 8 M-urea-resuspended samples. We then hypothesized that deglycosylating ECM proteins would enhance trypsin accessibility and thus treated samples with peptide-N-glycosidase F (PNGaseF). We further preincubated the deglycosylated ECM-enriched protein suspension with LysC, a smaller protease capable of digesting tightly folded proteins, prior to tryptic digestion. To capture a broad range of ECM components, we fractionated peptide samples using off-gel electrophoresis. Last, to ensure the correct identification and quantification of all ECM proteins, we specified the ECM-specific PTMs lysine and proline hydroxylation as variable modifications for the database search. Indeed, proline represents 19% of the amino acid sequence of the alpha 1 chain of collagen I, is found in positions X and Y of X-Y-Gly repeats, and is often hydroxylated (45).
In parallel, we developed a robust nomenclature to annotate and classify ECM proteins. In brief, we used sequence analysis and the characteristic domain-based organization of ECM proteins (72,73) to derive an ECM parts list (71,74-76) that we called the "matrisome." This term was originally coined by Dr George Martin, in the early 1980s, to describe the components of a specialized type of ECM called the basement membrane (77). In 2012, we proposed to expand the definition of the "matrisome" to describe the compendium of genes predicted to encode structural ECM proteins, i.e., the "core matrisome" including collagens, noncollagen ECM glycoproteins, and proteoglycans. The "matrisome" also included proteins that do not directly contribute to the structure of the ECM meshwork but are involved in ECM homeostasis and signaling functions, such as ECM remodeling enzymes and growth factors capable of interacting with ECM proteins (71,74-76).
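To illustrate how such a parts list can be used to annotate experimental output, the sketch below maps detected gene symbols to a matrisome division and category. It is a minimal example under stated assumptions: the lookup table is a small hand-picked excerpt written for this illustration rather than the actual matrisome list, the labels "ECM regulators" and "Secreted factors" are used here as shorthand for the remodeling enzymes and growth factors described above, and matching mouse to human symbols by uppercasing is a simplification that does not hold for every ortholog.

```python
from collections import Counter

# Hypothetical excerpt of a lookup table: gene symbol -> (division, category).
# A real analysis would load the complete, curated list instead.
MATRISOME_EXCERPT = {
    "COL1A1": ("Core matrisome", "Collagens"),
    "COL12A1": ("Core matrisome", "Collagens"),
    "FN1": ("Core matrisome", "ECM glycoproteins"),
    "BGN": ("Core matrisome", "Proteoglycans"),
    "LOXL2": ("Matrisome-associated", "ECM regulators"),
    "TGFB1": ("Matrisome-associated", "Secreted factors"),
}

NOT_MATRISOME = ("Non-matrisome", "Not annotated")


def annotate(gene_symbols):
    """Return {symbol: (division, category)} for each detected gene symbol."""
    return {
        symbol: MATRISOME_EXCERPT.get(symbol.upper(), NOT_MATRISOME)
        for symbol in gene_symbols
    }


def summarize(annotated):
    """Count detected proteins per matrisome division."""
    return Counter(division for division, _category in annotated.values())


# Made-up list of detected proteins, written with mouse-style gene symbols.
detected = ["Col1a1", "Fn1", "Loxl2", "Vim", "Myh9"]
annotations = annotate(detected)
summary = summarize(annotations)
print(summary["Core matrisome"], summary["Matrisome-associated"], summary["Non-matrisome"])
```

Proteins absent from the table are explicitly labeled as non-matrisome rather than dropped, which keeps intracellular contaminants visible when assessing how well a sample was enriched for ECM.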
The combination of an experimental approach, tailored to study ECM proteins, and the definition of the matrisome parts list resulted in the characterization of the ECM of murine lung and colon samples. These comprised well over 100 proteins, including both core matrisome and matrisome-associated proteins, such as the ECM-cross-linking enzymes from the lysyl oxidase and transglutaminase families, ECM-degrading enzymes of the matrix metalloproteinases family, growth factors, and interleukins (71).
A broad distribution via an open access publication (71) and a companion website (http://matrisome.org), easy-to-follow step-by-step protocols and videos (78,79), and the utilization of either commercially available reagents or reagents easy to prepare in any wet-lab with basic biochemistry knowledge have allowed our methods to become broadly adopted and built upon.

OVERVIEW OF CURRENT ECM PROTEOMIC WORKFLOWS
Over the past decade, bottom-up proteomics has become the method of choice to profile the protein composition of the ECM. While nuances exist in the ECM-enrichment/decellularization protocols employed, in the reagents used to extract or solubilize ECM proteins, or in the enzymes used for ECM protein deglycosylation or to generate peptides, all current workflows are based on a similar stepwise process illustrated in Figure 1. Recent developments are briefly reviewed here.

Sample Preparation Methods
Similar to our pipeline, the Schiller and Mann laboratories have devised an ECM-enrichment strategy using detergent that generates four fractions, three containing proteins of low to intermediate solubility and an insoluble ECM-enriched fraction (80). They further proposed to analyze not only the insoluble fraction but also those of intermediate solubilities by LC-MS/MS to derive a quantitative detergent solubility profile (QDSP) of the ECM of the murine lung (80), human fibrotic lung and skin (81), aging human lung (82), and, more recently, the different regions of the murine brain (83). When the lists of matrisome proteins detected in each fraction are collated, QDSP is among the protocols that yield the largest number of matrisome proteins detected. However, it is a costly procedure, since it requires running a larger number of samples. In addition, it is important to appreciate that the detection of matrisome proteins in fractions of low or intermediate solubility can correspond to the pool of intracellular matrisome proteins not yet secreted or to matrisome proteins found in the extracellular space but not as tightly assembled into the insoluble ECM meshwork. Yet, QDSP is a powerful approach for researchers interested in evaluating changes in matrisome protein solubility in different physiological or pathological states.
The Hansen laboratory has contributed substantially to the development of approaches for ECM proteomics, including the nonenzymatic digestion of proteins using hydroxylamine or cyanogen bromide (84,85). More recently, McCabe, Hansen, and colleagues conducted an extensive comparative study and assessed the efficiency of five different ECM enrichment methods, four protein extraction methods, and two deglycosylation methods to profile the ECM of several murine tissues (86). However, this study only focused on the impact of experimental parameters on the identification of core matrisome proteins, only partially capturing the complexity of the ECM. Other recent comparative studies of protocols, differing in the methods employed for ECM enrichment, ECM protein solubilization, and/or peptide generation, have revealed that there is a large overlap in the matrisome proteins identified, and that there are also, as expected, subsets of proteins uniquely identified using one protocol or another (86)(87)(88).
The Amorim lab has proposed a simpler two-step fractionation approach using commercially available MS-compatible reagents and separating tissue lysates into two pools of protein: detergent soluble and detergent insoluble (89). Similarly, the Ge lab has developed a photocleavable detergent called azo and applied it in a two-step fractionation approach to characterize the ECM composition of murine mammary tumors (90). As anticipated, in both studies, the pool of soluble proteins contained a larger number of matrisome-associated proteins and the pool of insoluble proteins contained a larger number of core matrisome proteins. As discussed above, the detection of matrisome proteins in fractions of low or intermediate solubility can correspond to intracellular matrisome proteins or to matrisome proteins found in the extracellular space but not incorporated in the insoluble ECM meshwork. In addition, the initial in silico prediction of the matrisome-associated genes was intentionally broad and included all members of protein families even if only one member of that family was known to act extracellularly (e.g., cathepsins, cystatins, or annexins). Orthogonal validation, for example, using tissue staining, is thus necessary to gain information on protein localization, which, in turn, will provide crucial information on protein function.
ECM protein deglycosylation plays a key role in enhancing trypsin accessibility. Common protocols employ PNGaseF, an enzyme that cleaves N-linked glycans. However, some peptides may be resistant to PNGaseF (91), and PNGaseF will not cleave O-linked glycans. Removal of O-linked glycans can be achieved by exoglycosidases (e.g., sialidase, O-glycosidase, galactosidase). Recent protocols have reported other deglycosylation methods using trifluoromethanesulphonic acid or enzymes that cleave the glycosaminoglycan (GAG) moieties of proteoglycans, such as chondroitinase ABC, heparin lyase, heparanase, and keratanase. While deglycosylation may enhance the release of peptides and thus protein discovery and quantification, it is worth noting that not capturing the glycosylation profile of ECM components results in only a partial view of this compartment (see below).
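One simple way to summarize the distribution of a protein across solubility fractions is to express its signal in each fraction as a share of its total signal. The sketch below is an illustration under stated assumptions, not the published QDSP analysis: the fraction names, the function names, and the intensity values are all made up for this example.

```python
# Hypothetical labels for four fractions of decreasing solubility.
FRACTIONS = ("fraction_1", "fraction_2", "fraction_3", "insoluble")


def solubility_profile(intensities):
    """Return the share of a protein's total intensity found in each fraction."""
    total = sum(intensities[fraction] for fraction in FRACTIONS)
    if total == 0:
        # Protein not detected in any fraction: return an all-zero profile.
        return {fraction: 0.0 for fraction in FRACTIONS}
    return {fraction: intensities[fraction] / total for fraction in FRACTIONS}


def insoluble_shift(control, disease):
    """Change in the insoluble share between two conditions (positive = less soluble)."""
    return solubility_profile(disease)["insoluble"] - solubility_profile(control)["insoluble"]


# Made-up intensities for one protein in two conditions.
control = {"fraction_1": 10.0, "fraction_2": 20.0, "fraction_3": 30.0, "insoluble": 40.0}
disease = {"fraction_1": 5.0, "fraction_2": 5.0, "fraction_3": 10.0, "insoluble": 80.0}

print(solubility_profile(control)["insoluble"])
print(round(insoluble_shift(control, disease), 3))
```

Normalizing within each protein separates a change in solubility from a change in overall abundance, which is the distinction a solubility profile is meant to capture.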
In conclusion, there is no "best" or no "right" sample preparation method. Different methods will yield different results. It is thus important to consider the strengths and limitations of each protocol available, as well as more practical considerations like ease of use, reagent availability, time, and cost. Ultimately, researchers should opt for the method that is best suited to investigate the one facet of the multifaceted ECM they are interested in, for example, identifying the largest number of proteins, focusing on the more insoluble components of the ECM to understand the mechanisms of assembly of the ECM meshwork, or detecting ECM remodeling enzymes present in differential abundance between healthy and diseased tissue for the purpose of drug target discovery.

Tailoring Data Acquisition and Database Search to Enhance Matrisome Protein Discovery
Over the past decade, ECM proteomics has adopted methods broadly used for global proteomics, such as tandem mass labels (TMT, iTRAQ) for accurate determination of protein abundance or the inclusion of peptide fractionation using high-pH reversed-phase liquid chromatography (79). While data-dependent acquisition remains the main MS modality for ECM profiling, a few recent studies have also employed data-independent acquisition (92)(93)(94)(95).
For accurate protein identification and estimation of protein abundance, it is imperative to maximize peptide-to-spectrum matching, which requires prior knowledge, in particular on the nature of the posttranslational modifications that can affect proteins of interest. For collagens and collagen-domain-containing proteins, these include oxidation (or hydroxylation) of lysines and prolines. These variable modifications are typically not selected by default for database searches, since they increase the search time for global proteomic studies and have limited impact on the output. However, we and others have shown that omitting lysine and proline oxidation results in the underestimation of the number and abundance of matrisome proteins present in samples (79,96).
To overcome the limitation posed by the large dynamic range of matrisome protein abundance, we attempted to implement a custom dynamic exclusion approach to "ignore" all peptides derived from the most abundant proteins detected in our samples (97). To do so, we constructed a list of 2332 unique peptides including all identified peptides from the alpha 1 and alpha 2 chains of collagen I (Col1a1 and Col1a2), the alpha 1 chain of collagen type III (Col3a1), and the alpha 2 chain of collagen IV (Col4a2). We also excluded all identified peptides from vimentin, myosin-9, and filamin-A, three nonmatrisome proteins detected with high spectral count in our samples. This attempt had only modest success in increasing the number of matrisome proteins identified; however, it resulted in a significant increase in the number of matrisome peptide spectrum matches and, consequently, led to a more accurate quantification of the proteins detected across our experimental conditions (97). A drawback of this approach is that it requires an initial "discovery" run to identify abundant components to be excluded subsequently. It thus necessitates additional material (which may pose a problem when working with small human biopsies) and comes at a higher cost, which may limit the applicability of such an approach.
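The effect of variable modifications on the search space can be illustrated with a short calculation. The sketch below is a simplified illustration, not the behavior of any particular search engine: it treats every proline and lysine as a candidate hydroxylation site, caps the number of modifications per peptide at an assumed `max_mods` value, and uses a made-up collagen-like peptide. The mass shift of one hydroxylation is taken as 15.994915 Da, the monoisotopic mass of one added oxygen atom.

```python
from math import comb

HYDROXYLATION_DELTA = 15.994915  # monoisotopic mass of one added oxygen, in Da


def modifiable_sites(peptide, residues="KP"):
    """Return 0-based indices of residues that could carry a hydroxylation."""
    return [i for i, aa in enumerate(peptide.upper()) if aa in residues]


def count_modified_forms(peptide, max_mods=3):
    """Number of distinct site combinations to score, unmodified form included."""
    n_sites = len(modifiable_sites(peptide))
    return sum(comb(n_sites, k) for k in range(min(n_sites, max_mods) + 1))


def candidate_mass_shifts(peptide, max_mods=3):
    """Possible total mass shifts (Da) for 0 up to max_mods hydroxylations."""
    n_sites = len(modifiable_sites(peptide))
    return [round(k * HYDROXYLATION_DELTA, 6) for k in range(min(n_sites, max_mods) + 1)]


# Made-up peptide with three prolines and one lysine.
peptide = "GPPGPKGAR"
print(modifiable_sites(peptide))       # [1, 2, 4, 5]
print(count_modified_forms(peptide))   # 15
print(candidate_mass_shifts(peptide))
```

With four candidate sites and at most three modifications, a single peptide expands into 15 forms to be scored. This is why these modifications lengthen searches, and also why proline-rich collagen peptides are missed when they are omitted.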
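The construction of such an exclusion list from a discovery run can be sketched as follows. This is a minimal illustration under stated assumptions and not the implementation used in the cited study: peptide-spectrum matches are represented as hypothetical (peptide, protein) pairs, the peptide sequences are made up, the gene symbols Vim, Myh9, and Flna are used here to denote vimentin, myosin-9, and filamin-A, and a real instrument exclusion list would be expressed as m/z values and retention-time windows rather than sequences.

```python
# Abundant proteins whose peptides should be ignored in subsequent runs.
ABUNDANT_PROTEINS = frozenset(
    {"Col1a1", "Col1a2", "Col3a1", "Col4a2", "Vim", "Myh9", "Flna"}
)


def build_exclusion_list(psms, abundant=ABUNDANT_PROTEINS):
    """Return the sorted unique peptides identified from the abundant proteins.

    psms: iterable of (peptide_sequence, protein_symbol) pairs from a discovery run.
    """
    return sorted({peptide for peptide, protein in psms if protein in abundant})


# Made-up discovery-run output; repeated identifications collapse to one entry.
discovery_psms = [
    ("GPPGPQGAR", "Col1a1"),
    ("GPPGPQGAR", "Col1a1"),
    ("GEAGPQGPR", "Col1a2"),
    ("LLEGEESR", "Vim"),
    ("VTEPSGAGR", "Sned1"),
]

exclusion_list = build_exclusion_list(discovery_psms)
print(exclusion_list)  # ['GEAGPQGPR', 'GPPGPQGAR', 'LLEGEESR']
```

The peptide assigned to the low-abundance protein is left off the list, so instrument time freed by skipping abundant peptides can be spent sampling it.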

TRANSLATIONAL IMPLICATIONS OF ECM PROTEOMICS STUDIES: A FOCUS ON CANCER
Having overcome inherent challenges in studying the ECM and having established the feasibility of analyzing the protein composition of the ECM using bottom-up proteomics, the question became: can we derive meaningful biological and clinical information from ECM proteomic studies?
Since the ECM plays key roles in maintaining the homeostasis of all physiological systems, dysregulation of the ECM has consequences on all those systems. Over the past decade, ECM proteomics has thus been applied to a vast range of samples (e.g., clinical biopsies, tissues from mouse models of human diseases, tissue interstitial fluid or serum, ECM produced by cells in culture) and in the context of diverse diseases and disorders. Excellent reviews have recently discussed the application of proteomics to study the ECM of the skin (98), cardiovascular diseases (99,100), liver diseases (101,102), lung diseases (103), and neurodegenerative diseases (64,104,105).
This section will thus focus on selected studies of the past 5 years that have employed matrisomics to study the ECM of cancer microenvironments. For a survey of early proteomic studies of the cancer matrisome (including melanoma, multiple myeloma, colorectal cancer), readers are invited to refer to our previous review (106).

The Matrisome of Breast Cancers
In an early study, we applied ECM proteomics to characterize the protein composition of the ECM of poorly and highly metastatic primary mammary tumor xenografts. This study led to the identification of a 43-matrisome-protein signature characteristic of mammary tumors of higher metastatic potential (107). This signature included matrisome proteins that had previously been associated with breast cancer progression such as lysyl oxidase-like 2 (108) and angiopoietin-like 4 (109), and also identified known matrisome proteins that had never been associated with this disease before such as the latent TGFβ-binding protein 3 (LTBP3). It also identified a novel ECM glycoprotein of unknown function, SNED1. Importantly, we further demonstrated that the level of expression of LTBP3 and SNED1 negatively correlated with breast cancer patient prognosis (110). In a follow-up study, we employed TMT-based proteomics to compare the protein composition of the ECM of metastases arising from human mammary tumor cells in different organs, namely, the brain, lungs, liver, and bone marrow. This approach allowed us to delineate the contribution of tumor cells and stromal cells to the building of the metastatic niche, since human proteins secreted by tumor cells can be distinguished from murine proteins secreted by host cells (111). This study led to the quantification of over 300 human and mouse matrisome proteoforms produced by tumor cells, stromal cells, or both. We further showed that each metastatic niche was characterized by a unique ECM composition contributed by both tumor cells and stromal cells (111).
Work pioneered by the late Dr Patricia Keely revealed that remodeling of the architecture of the fibrillar collagen meshwork is a key driver of breast cancer progression (112)(113)(114). Over the past 5 years, there has thus been an increased interest in applying ECM proteomics to characterize these changes and identify the mechanisms driving them. Tomko and colleagues applied ECM proteomics to profile compositional changes accompanying the remodeling of the fibrillar collagen meshwork observed in invasive ductal carcinomas (115). They identified 27 matrisome proteins with differential abundance between healthy mammary tissue and invasive ductal carcinomas, including tenascin-C and thrombospondin-2, which presented a distribution pattern correlating with that of the remodeled fibrillar collagen meshwork in tumor samples. In a 2020 study, the Weaver laboratory employed ECM proteomics to identify protein signatures correlating with the four classes of mammographic densities according to the Breast Imaging-Reporting and Data System (116). The team found that samples from tissues with higher mammographic density were enriched for fibrillar collagens (type I and V) and the fibril-associated collagen type XII. Importantly, a higher mammographic density has been correlated with an increased lifetime risk of breast malignancy.
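The principle that allows tumor-derived and stroma-derived proteins to be told apart in xenografts is that some peptides occur only in the human sequence, others only in the mouse sequence, and others in both. The sketch below illustrates that principle under stated assumptions: the two short sequences and the peptides are made up, the function name is hypothetical, and the exact substring matching used here ignores complications such as the indistinguishable masses of leucine and isoleucine.

```python
def classify_peptide(peptide, human_proteome, mouse_proteome):
    """Assign a peptide to its likely species of origin by exact sequence match."""
    in_human = any(peptide in sequence for sequence in human_proteome.values())
    in_mouse = any(peptide in sequence for sequence in mouse_proteome.values())
    if in_human and in_mouse:
        return "shared"
    if in_human:
        return "human (tumor-derived)"
    if in_mouse:
        return "mouse (stroma-derived)"
    return "unassigned"


# Made-up orthologous sequences differing by a single residue (S versus T).
human_proteome = {"PROT_A": "MKTGPAGSLRVDEK"}
mouse_proteome = {"Prot_a": "MKTGPAGTLRVDEK"}

for peptide in ("TGPAGSLR", "TGPAGTLR", "VDEK", "AAAA"):
    print(peptide, "->", classify_peptide(peptide, human_proteome, mouse_proteome))
```

Only species-specific peptides can attribute a protein to tumor or host cells. Shared peptides confirm that the protein is present but say nothing about which compartment produced it.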
The Cox laboratory recently published a study surveying changes in matrisome protein composition at different stages of tumor progression using a genetically engineered mouse model of breast cancer (117). The team reported the identification of four classes of proteins: the first one encompassing proteins detected with increased abundance over time in healthy mammary gland and with even higher abundance at different stages of tumor progression; the second consisting of proteins detected in higher abundance in tumors as compared with healthy tissues; the third including proteins with abundance negatively correlating with tumor progression; and the fourth representing proteins detected in decreased abundance in early and mid-stages of development as compared with healthy tissue but with increased abundance in late-stage tumors as compared with normal age-matched mammary glands (117). The team reported that collagen type XII strongly correlated with tumor progression and confirmed results of a prior study conducted in patients with breast cancer demonstrating an increased expression of COL12A1 in breast cancers as compared with normal mammary tissue (116). The team further showed, using tissue microarray (TMA) staining on a cohort of 150 patients, that the abundance of collagen type XII negatively correlated with disease-specific and recurrence-free survival. This result demonstrates the robustness of data obtained using ECM proteomics on a small number of samples (n = 5) and the translational potential of findings obtained on preclinical models to patients. Mechanistically, the authors showed that collagen type XII is expressed by cancer-associated fibroblasts and contributes to create a permissive microenvironment supportive of cancer invasion by remodeling the architecture of the fibrillar collagen meshwork. This study opens up the possibility of using COL12A1 not only as a prognostic marker but also as a target for antimetastatic strategies.
Recent large-cohort studies have established that obesity is associated with an increased risk of cancer, including breast cancer (118). The Fischbach laboratory established that obesity could induce mammary ECM remodeling and, in turn, promote breast tumorigenesis in a mouse model (119). Based on this observation, the Oudin laboratory employed TMT proteomics to compare the ECM composition of mammary glands from mice fed a chow diet or a high-fat diet. The team reported the identification of a set of ECM proteins present in differential abundance between the two conditions, with collagen type XII presenting the highest abundance increase in the mammary gland of obese mice as compared with control (120). Comparison of the ECM proteins found enriched in obese mammary gland as compared with control mammary gland, in murine mammary tumors as compared with healthy mammary gland (121), and in highly metastatic versus poorly metastatic mammary tumor xenografts (107) identified a 9-ECM-protein signature comprising collagens type VI and type XII, fibronectin, elastin, vitronectin, the laminin alpha 5 chain, annexin A3, galectin 1, and the von Willebrand Factor A Domain Containing 1 protein (120). This suggests that obesity-induced compositional changes in the mammary gland ECM overlap with changes observed during cancer progression. The functional relevance and prognostic value of this signature were further exemplified by showing that collagen type VI promoted breast cancer invasiveness and that the expression level of the three genes COL6A1, COL6A2, and COL6A3, encoding the three chains that assemble into the functional collagen type VI protein, negatively correlated with breast cancer patient prognosis (120).
Another study from the Oudin laboratory characterized the changes in the ECM composition of the mammary gland of mice that received chemotherapy (paclitaxel or doxorubicin) or a vehicle control (122). The authors showed that treatment with paclitaxel resulted in changes in ECM composition (5 matrisome proteins detected in higher abundance and 54 in lower abundance, as compared with vehicle control) distinct from the changes induced by doxorubicin treatment (32 matrisome proteins detected in higher abundance and 28 in lower abundance, as compared with vehicle control, with a marked decrease in the abundance of several collagens). The comparison of the two signatures identified two secreted factors, S100a9 and trichohyalin, in higher abundance upon treatment, while 11 matrisome proteins were detected in lower abundance upon treatment, including the core matrisome glycoproteins periostin, thrombospondin 2, and hemicentin.

The Matrisome of Pancreatic Ductal Adenocarcinomas
Pancreatic cancers accounted for close to 60,000 new cancer cases diagnosed in the United States in 2022 (123). With one of the worst 5-year survival rates (11%), pancreatic ductal adenocarcinoma (PDAC) is also one of the deadliest cancer types (124). This is because it tends to be diagnosed at late stages and only limited therapeutic interventions are available (125). PDAC is also one of the most fibrotic cancer types and is characterized by an excessive ECM accumulation, which, we now know, is a key factor driving cancer aggressiveness and resistance to treatment (126). Proteomic profiling of the ECM of genetically engineered mouse models of pancreatic ductal adenocarcinomas at different stages of tumor progression revealed that the early stage of PDAC progression was characterized by an increase in the abundance of the glycoprotein fibronectin, the matricellular proteins tenascin-C and thrombospondins 1 and 2, and the ECM cross-linking enzymes lysyl oxidase-like 2 and transglutaminase 2 and a concomitant decrease in the abundance of basement membrane components: alpha chains 1 and 2 of collagen IV and the chains composing the laminin trimer 221 (127). Further disease progression saw a decrease in matricellular proteins but an increase in several proteoglycans (fibromodulin, biglycan, prolargin) (127).
More recently, Tian and collaborators profiled the ECM composition of early-stage human pancreatic intraepithelial neoplasia and advanced PDAC samples using TMT proteomics and identified a subset of 136 matrisome proteins, including collagen type I, fibronectin, fibrillin 1, and periostin, present in high abundance in pancreatic intraepithelial neoplasias as compared with healthy pancreatic tissue, and in even higher abundance in PDAC samples (128). Moreover, using xenograft models, Tian and collaborators demonstrated that over 90% of the matrisome proteins of the PDAC microenvironment were secreted by stromal cells, while 10% were secreted by tumor cells. In a follow-up study, the team demonstrated that three of the proteins secreted by the tumor cells, agrin, SERPINB5, and cystatin B, promoted PDAC metastasis and correlated with poor patient prognosis (129).

ECM Signatures of Distinct Steps of Cancer Progression
Most cancer types undergo a stepwise progression. One of the key steps in cancer progression is angiogenesis, the step at which tumors become vascularized (130). This step is critical for tumor growth through the increased availability of oxygen and nutrients and for tumor dissemination via the systemic circulation (131). Using label (iTRAQ)-based quantitative proteomics and a genetically engineered mouse model recapitulating insulinoma progression and characterized by a precisely defined angiogenic switch (132), we profiled the ECM composition of nonangiogenic and angiogenic tumors. We found that the core matrisome proteins periostin and the EGF-containing fibulin extracellular matrix protein 1 (Efemp1) increased in abundance, while decorin, hemicentin, DMBT1, and the von Willebrand factor A domain containing protein 5A (VWA5A) decreased in abundance during the angiogenic switch (133).
Dormancy is another critical step of cancer progression. During this phase, tumor cells that have disseminated to distant organs enter quiescence prior to reawakening, which leads to patient relapse, sometimes decades later (134). Recent studies have pointed to the role of the local and systemic environments in the maintenance of dormancy or, on the contrary, in the reactivation of a proliferative program (135,136). Yet, the precise mechanisms by which the ECM influences this key step in cancer progression are not known. In a 2022 study, we reported the characterization of the ECM of a mouse model of proliferative (T-Hep3) and dormant (D-Hep3) head and neck squamous cell carcinoma xenografts and identified collagen type III as produced in higher abundance by dormant tumor cells (137). We further showed that re-expression of COL3A1, encoding collagen type III, restored the proliferative potential of tumor cells via the discoidin domain receptor 1 and the Stat signaling pathway. Mechanistically, we showed that the reactivation or "awakening" of tumor cells upon collagen type III production was supported by a change in the architecture of the fibrillar collagen meshwork within the ECM (137). This study strongly supports the idea that manipulating the tumor microenvironment may be a viable approach to preventing metastatic growth and relapse.

ECM Signatures of Cancer Subtypes
Tumors arising in a given organ can originate from different cell types and/or in different regions in that organ and hence may contain a different ECM. Proteomics has been applied to study the ECM composition of two of the main common types of brain tumors, medulloblastomas and glioblastomas (GBMs), and has revealed that, in addition to unique ECM signatures distinguishing medulloblastomas and glioblastomas, a subset of ECM proteins was detected in higher abundance in both tumor types as compared with control brain tissues (138). In a recent study, Sethi et al. applied glycoproteomics and glycomics to investigate the ECM composition (glycoproteins and GAG moieties of proteoglycans, respectively) of different human glioblastoma subtypes with the goal of identifying potential GBM biomarkers (139). This study employed an interesting protocol, using a TMA of fixed GBM and control samples as starting material. Presumably, the limited size of the starting material did not allow sample decellularization, but using a chemical inkjet printer, the team was able to "print" on each TMA core a combination of enzymes (e.g., deglycosylating enzymes, proteinases) and derive the matrisome profile of each sample. The study identified a total of 146 matrisome components and revealed differences in the nature of the GAGs detected, in the nature and abundance of ECM proteins detected, and also, interestingly, in the nature and abundance of certain types of posttranslational modifications (e.g., proline hydroxylation of collagen peptides) between the different GBM subtypes. Since the study was conducted on intact tissue samples, the team was also able to detect changes in the abundance of intracellular enzymes responsible for the PTMs of ECM proteins (e.g., glycosyltransferase and glycosidase enzymes), something that cannot be captured when working on decellularized samples.
In summary, ECM-focused proteomics is now broadly adopted to study the composition of tumor microenvironments and to identify proteins characteristic of disease stages. As recent studies have shown, matrisome proteins whose change in abundance correlates with disease progression can serve as prognostic markers or be used to monitor treatment efficiency. Interestingly, certain ECM proteins or families of proteins (matricellular proteins such as tenascin-C, thrombospondins, periostin; fibrillar and fibril-associated collagens like collagens type I, III, and XII; ECM cross-linking enzymes of the lysyl oxidase and transglutaminase families) are consistently altered across different cancer types, opening the possibility of the existence of common ECM-dependent mechanisms being at play during tumor progression. This is of paramount importance as we think about ways to exploit the ECM to develop new anticancer strategies.

Capturing ECM Proteoform Diversity
Once challenging, the characterization of the protein composition of the ECM of tissues is now broadly accessible. But ECM proteins, beyond their identity, have much more to reveal. For example, to what extent are they posttranslationally modified? Do ECM PTMs vary during the time course of disease progression or with aging? How are ECM proteins folded? With which other proteins do they interact? Addressing these questions will contribute to increasing our knowledge of ECM protein functions and of the fundamental mechanisms orchestrated by cell-ECM interactions. MS-based proteomics has the potential to help address these questions.
The recognition of proteoform diversity is not recent (140), nor is the recognition of the need to profile proteoforms (141,142). Examples of key roles played by ECM proteoforms abound in the context of cancer: proteoforms arising from alternative splicing of fibronectin or tenascin-C are specifically detected in tumors but not healthy tissues and have been shown to promote tumor progression (143-145). Disease-specific ECM proteoforms have thus been proposed to be excellent candidates to mediate the targeted delivery of imaging or therapeutic agents to diseased tissues (31,35,146). Yet, current ECM proteomics approaches result, with the exception of the most abundant ECM proteins, like collagen type I, in modest sequence coverage, limiting our ability to capture proteoform diversity. For example, interrogation of the ECM protein knowledge database, MatrisomeDB (147), reveals that the coverage of fibronectin (UniProt P11276) is 20.6% in healthy liver samples or 11.5% in primary metastatic colorectal tumor samples, and that of thrombospondin 1 (UniProt P07996) is 10.9% in the ECM of normal saphenous vein samples.
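To make the notion of sequence coverage concrete, the following minimal sketch computes the percentage of residues of a protein covered by at least one identified peptide. It is an illustration only, not the way MatrisomeDB computes its values: the function name `sequence_coverage`, the toy sequence, and the toy peptides are all hypothetical, and peptides are located by exact string matching.

```python
def sequence_coverage(protein: str, peptides: list[str]) -> float:
    """Return the percentage of residues covered by at least one peptide.

    Assumption: peptides are matched by exact substring search, and every
    occurrence of a peptide in the protein counts as covered.
    """
    if not protein:
        return 0.0
    covered = [False] * len(protein)
    for pep in peptides:
        if not pep:
            continue  # ignore empty strings
        start = protein.find(pep)
        while start != -1:
            for i in range(start, start + len(pep)):
                covered[i] = True
            start = protein.find(pep, start + 1)
    return 100.0 * sum(covered) / len(protein)


# Toy data (hypothetical, not a real matrisome protein)
toy_protein = "MKTAYIAKQRQISFVKSHFSRQ"
toy_peptides = ["TAYIAK", "QISFVK"]
print(f"{sequence_coverage(toy_protein, toy_peptides):.1f}% coverage")
```

Under this definition, a proteoform-specific region (e.g., an alternatively spliced domain) can only be detected if at least one identified peptide maps to it, which is why low coverage limits proteoform discovery.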
In addition to proteoforms resulting from alternative splicing, ECM proteins can include single amino acid variants. For example, we have recently shown that, in most cancers, matrisome genes are mutated with higher frequency than nonmatrisome genes (148). Importantly, we have further shown that, in certain cases, mutations in ECM genes correlated with patient survival (148). The Polyak laboratory has also recently demonstrated that there is an enrichment in somatic mutations in matrisome genes during the transition of ductal carcinoma in situ to invasive ductal carcinoma (149). However, the functional impact of these mutations remains unknown, and we are yet to identify products of mutated genes in cancer samples.
The main process leading to proteoform diversity is via posttranslational modification. For ECM proteins, this process can occur both intracellularly and extracellularly and has been shown to contribute to different pathophysiological processes. For example, increased collagen cross-linking resulting in a stiffer ECM is a hallmark of cancer (150); fibronectin citrullination increases cell migration in vitro and in vivo in the context of wound healing (151); lysine acetylation of fibronectin is involved in renal fibrosis (152). ECM proteins can also be phosphorylated. However, the kinases responsible for the phosphorylation of secreted proteins have only recently been discovered (153-156) and most ECM proteomic studies, including our own, did not allow phosphorylations of serine, threonine, and tyrosine as variable modifications. Current strategies aimed at enriching posttranslationally modified proteins (157) cannot be readily applied to study the ECM because of the insoluble nature of ECM proteins. It is thus becoming obvious that strategies to increase protein sequence coverage (e.g., protein and peptide fractionation), as well as the enrichment of protein databases used to search LC-MS/MS output with patient-specific sequences derived from the large body of RNA-Seq studies, and broader or even open search strategies, should contribute to enhance our ability to detect and quantify ECM proteoforms. As a community, we will also need novel analytical methods to interpret LC-MS/MS output. For example, Merl-Pham and colleagues have reported the development of a pipeline aimed at mapping and quantifying proline and lysine hydroxylations and lysine glycosylations for 15 collagen chains produced in vitro by primary lung fibroblasts isolated from patients with idiopathic lung fibrosis (158).
Proteolytic cleavage is an important mechanism leading to matrisome proteoform diversity. ECM protein degradation by matrix metalloproteinases (159,160), a disintegrin and metalloproteases (ADAM) (161), a disintegrin-like and metalloproteinase domain with thrombospondin-type 1 motifs proteins (ADAMTS) (162), or cathepsins (163) plays key roles in pathophysiological processes (15). In addition, cleavage fragments of ECM proteins, known as matrikines (or matricryptins) (164), exert signaling functions, sometimes distinct from that of the proteins they arise from (165,166). For example, endostatin is a naturally occurring cleavage fragment of type XVIII collagen, arresten and tumstatin are cleavage fragments of type IV collagen, and all have antiangiogenic properties (167). Identifying proteolytic fragments of proteins using standard proteomics is challenging. To overcome this, novel MS-based methods grouped under the term of "degradomics" targeting the investigation of protein N termini and protease substrates have been devised (168). In recent years, ECM degradomics has become an active field of investigation (68,169,170), in particular in musculoskeletal and dermatology research (171-175).
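The idea of looking for protein N termini that were not generated by the digestion enzyme can be illustrated with a deliberately simplified heuristic. The sketch below assumes a tryptic digest (cleavage after lysine or arginine) and flags peptides whose N terminus is preceded by any other residue as candidate neo-N termini. This is not the enrichment-based degradomics workflow cited above; the function name `candidate_neo_n_termini` and the toy data are hypothetical, and only the first occurrence of each peptide is considered.

```python
def candidate_neo_n_termini(protein: str, peptides: list[str]) -> list[tuple[int, str]]:
    """Flag peptides whose N terminus is not explained by tryptic cleavage.

    Returns (1-based start position, peptide) pairs. Assumptions: trypsin
    cleaves after K or R, and peptides starting at the first residue of the
    protein are treated as the native N terminus and skipped.
    """
    hits = []
    for pep in peptides:
        start = protein.find(pep)  # first occurrence only
        if start <= 0:
            continue  # not found (-1) or native protein N terminus (0)
        if protein[start - 1] not in "KR":
            hits.append((start + 1, pep))
    return hits


# Toy data (hypothetical sequence and peptides)
toy_protein = "MKTAYIAKQRQISFVKSHFSRQ"
toy_peptides = ["TAYIAK", "ISFVK", "MK"]
for position, peptide in candidate_neo_n_termini(toy_protein, toy_peptides):
    print(f"candidate neo-N terminus at residue {position}: {peptide}")
```

In practice, such candidates would still need to be distinguished from nonspecific digestion or in-source fragmentation, which is one reason dedicated N-terminal labeling and enrichment methods were developed.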
Studying the protein components of the ECM only gives a partial view of this complex compartment. Indeed, most ECM proteins are glycosylated, and, in the case of proteoglycans (e.g., perlecan, decorin, versican) (176), the protein moiety only constitutes the minor portion of these components, with GAGs such as chondroitin sulfate, dermatan sulfate, and keratan sulfate composing their majority. MS-based glycomics and glycoproteomics are powerful approaches to characterize the glycans conjugated to proteins (67,177-179). These approaches have begun to be applied to the study of the ECM (64) and have led to the identification of glycosylation sites and the characterization of the nature of GAGs (180).
Lastly, posttranslational modifications that accumulate over time can alter protein folding and this can impact the nature and quantity of peptides detected by LC-MS/MS. We can thus envision being able to leverage ECM proteomics to gain structural information on ECM proteins. To this end, Eckersley and colleagues have developed a peptide mapping strategy called "peptide location fingerprinting" to identify structural changes, approximated by peptide yields, in ECM proteins (181-183). By reinterrogating previously published datasets on the ECM of aging mouse lung, aging human intervertebral discs, and atherosclerotic arteries, the team identified proteins displaying conserved structural differences in healthy versus altered states (181,184).
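The principle of approximating regional structural differences by peptide yields can be sketched as follows. This is a loose illustration of the general idea, not the published peptide location fingerprinting implementation: the fixed segment size, the assignment of each peptide to the segment containing its midpoint, the normalization, and all names and numbers are assumptions made for the example.

```python
def segment_yields(protein_length: int,
                   peptides: list[tuple[int, int, int]],
                   segment_size: int = 50) -> list[int]:
    """Sum peptide counts per protein segment.

    `peptides` holds (start, end, count) with 1-based inclusive positions.
    Assumption: each peptide is assigned to the segment containing its midpoint.
    """
    n_segments = -(-protein_length // segment_size)  # ceiling division
    yields = [0] * n_segments
    for start, end, count in peptides:
        midpoint = (start + end) // 2
        yields[(midpoint - 1) // segment_size] += count
    return yields


def yield_fractions(yields: list[int]) -> list[float]:
    """Normalize segment yields to fractions of the protein total."""
    total = sum(yields)
    return [y / total if total else 0.0 for y in yields]


# Hypothetical peptide counts for one 120-residue protein in two conditions
healthy = segment_yields(120, [(1, 10, 3), (55, 70, 2), (101, 120, 5)])
diseased = segment_yields(120, [(1, 10, 6), (55, 70, 2), (101, 120, 2)])
differences = [d - h for h, d in zip(yield_fractions(healthy), yield_fractions(diseased))]
print(differences)  # per-segment change in relative peptide yield
```

Segments whose relative yield shifts consistently between conditions would be candidates for regions with altered accessibility to the protease; any real analysis would additionally require replicates and statistical testing.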

Toward a More Accurate Quantification of Matrisome Protein Abundance
The field of ECM proteomics has not yet broadly adopted targeted assays. This is not unexpected as target identification is a prerequisite to designing such assays. Yet, being able to quantify the abundance of ECM proteins across different conditions more accurately is a first step toward understanding the mechanisms leading to ECM dis-homeostasis in disease, and a necessary step toward the development of interventional strategies aimed at modulating the ECM to revert or correct disease-associated phenotypes. With significant advances made over the past decade, our field is now mature and ready to enter the next phase and employ targeted and more quantitative approaches.
One such approach uses QConCATs, which are standard peptides concatenated in artificial polypeptides, allowing the precise quantification of a limited number of target proteins (185,186). The Hansen lab has developed QConCATs to quantify 77 matrisome proteins and successfully applied this strategy to profile the ECM of the mammary gland and liver microenvironments (85).
Selective or multiple reaction monitoring (187) is another approach permitting the quantification, with high accuracy, of a subset of proteins of interest. As of today, such approaches have not been applied to ECM research. Yet, resources are available. Of the 3500 validated assays available in October 2022 in the National Cancer Institute's Clinical Proteomic Tumor Analysis Consortium (CPTAC) portal (188,189), 359, or approximately 10%, are designed to quantify 169 human or murine matrisome proteins: 48 ECM glycoproteins, 8 collagens, 7 proteoglycans for the core matrisome and 29 ECM-affiliated proteins, 71 ECM regulators, 28 secreted factors, for matrisome-associated proteins (supplemental Table S1). Global ECM proteomics, by permitting the identification of proteins of interest, has thus laid the foundations of future targeted studies.

Building Spatially Resolved Maps of the ECM
The first step of all sample preparation protocols described above consists of the mechanical disruption of tissues. Any information regarding the spatial distribution of ECM proteins within tissues and organs is thus lost (190). Yet, we know that ECM proteins are not homogeneously distributed across tissues. For example, the specialized basement membrane ECM surrounding blood vessels or underlying epithelia is different in composition, structure, and functions than the interstitial ECM found in connective tissues. Knowledge of the patterns of distribution and organization of ECM proteins in situ can thus provide crucial information on their functions. Certain structures within organs can be physically isolated to allow for a more specific ECM analysis. For example, treatment of murine pancreata by collagenase releases pancreatic islets that can further be processed for matrisome profiling (133). The Lennon laboratory has employed laser capture microdissection to isolate glomeruli from kidneys for analysis of the glomerular basement membrane ECM (191). A similar approach was adopted by Paunas and colleagues to identify ECM changes in the glomerular basement membrane of patients diagnosed with IgA nephropathy (192). Using laser capture microdissection of decellularized skin samples to isolate six distinct regions of the skin, Li and colleagues have defined the first spatially resolved map of the skin matrisome and have shown that each region is characterized by a unique ECM signature and by distinct cellular programs (94). Using the spatial separation of different brain regions and precise dissection, the Götz laboratory compared the ECM composition of the nonneurogenic somatosensory cortex, the olfactory bulb, the lateral subependymal zone where most neural stem cells reside, and the medial subependymal zone and identified a subset of matrisome proteins enriched in the neural stem cell niche of adult murine brain samples (83).
While these approaches allow the analysis of defined structures or regions within organs and tissues, we are still lacking the tools to build spatially resolved maps of the ECM at cellular resolution. Progress may come from another MS-based technology, imaging mass spectrometry (IMS). IMS is commonly applied to the study of lipids, glycans, and metabolites (193) but has recently been applied for the first time to define the distribution pattern of ECM proteins within tissues (69,70,193). IMS, however, presents limitations: it has a narrow dynamic range and ECM proteins may not be good candidates for IMS due to their highly cross-linked nature, which may hinder ionization. Leveraging LC-MS/MS data, multiplexed antibody-based approaches can also be envisioned (e.g., CODEX (190)). Significant efforts are being put toward the validation of antibodies, for example, through the Human Protein Atlas (194) or the production of Organ Mapping Antibody Panels by the Human BioMolecular Atlas Program (195). ECM proteins should not be forgotten in these efforts, since they provide the context in which cells function.

RESOURCES FOR ECM PROTEOMICS RESEARCH
The emergence of MS-based proteomics as the method of choice to study the ECM has been concurrent with the emergence of principles dictating "Findability, Accessibility, Interoperability, and Reuse" (FAIR) of -omic datasets (196). This has permitted the development and expansion of ECM knowledgebases and databases briefly described here and summarized in Table 1.

MatrisomeDB and the ECM Atlas
Witnessing the adoption of proteomics to study the ECM, we released in 2016 a draft of an ECM atlas, including ECM proteomics datasets from 14 different tissue and tumor types and made the data available through a searchable database (197). We expanded our efforts and released in 2019 MatrisomeDB (https://matrisomedb.org/), the ECM protein knowledge database, which included curated ECM datasets from 17 published studies reprocessed using unified search parameters and available through an interface allowing multiple query inputs (e.g., gene symbol, protein name, protein signature, matrisome category, tissue type, species) (198). The latest release of MatrisomeDB includes datasets from 42 curated ECM proteomic studies, providing data on 2051 human and 949 mouse matrisome proteoforms identified from 6,891,623 human and 4,763,174 mouse matrisome-protein-derived peptide-to-spectrum matches (147). Novel functionalities implemented in this release include enhanced peptide and PTM mapping on domain-based representations and 3D structures of matrisome proteins, as well as referencing to external databases such as the CPTAC assay portal (188) and the Peptide Atlas (199) to facilitate the application of targeted MS to the study of matrisome proteins (147).

The Manchester Peptide Location Fingerprinting
The Manchester Peptide Location Fingerprinting developed by the laboratory of Dr Michael Sherratt at the University of Manchester (https://www.manchesterproteome.manchester.ac.uk/#/MPLF) allows the user to query preanalyzed datasets with the goal of identifying structural alterations of ECM proteins, resulting in differential peptide yields in proteomic datasets, across pathophysiological conditions (183).

TopFIND
Developed by the laboratory of Dr Overall, the Termini-oriented protein Function Inferred Database (TopFIND) is a database of protease cleavage sites, protein termini, and protein terminus modifications (https://topfind.clip.msl.ubc.ca/) (200). While this resource is not exclusively dedicated to ECM research, users can interrogate TopFIND for cleavage fragments and neotermini identified or inferred for their favorite ECM proteins via the "protein" interface.

MatrixDB: the ECM Interaction Database
A key to understanding ECM roles in pathophysiological processes is to decipher the nature of ECM protein interactions with each other, which lead to the assembly of the ECM scaffold, and also with cells. However, the insoluble nature of matrisome proteins significantly limits our ability to employ strategies used in other contexts to study protein interactomes, such as immunoprecipitation coupled to MS (201). The Ricard-Blum laboratory developed MatrixDB (http://matrixdb.univ-lyon1.fr/), a database reporting ECM protein-ECM protein and ECM protein-glycan interactions validated experimentally in vitro and allowing the building of interaction networks. However, to interact, proteins need to be present in the same tissue with precise stoichiometry. To account for this, we integrated the content of MatrisomeDB into the 2019 release of MatrixDB, allowing users to build tissue-specific interaction networks (202).
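The logic of a tissue-specific interaction network, in which an interaction is retained only if both partners were detected in the same tissue, can be sketched in a few lines. This example does not use the MatrixDB or MatrisomeDB interfaces; the function name `tissue_specific_network`, the interaction pairs, and the list of detected proteins are toy data chosen for illustration and are not taken from either database.

```python
def tissue_specific_network(interactions: list[tuple[str, str]],
                            detected: set[str]) -> list[tuple[str, str]]:
    """Keep only interactions whose two partners were both detected in a tissue.

    Assumption: proteins are identified by gene symbol, and detection is
    treated as present/absent (abundance and stoichiometry are ignored).
    """
    return [(a, b) for a, b in interactions if a in detected and b in detected]


# Toy interaction list and toy detection list (illustrative only)
all_interactions = [("FN1", "COL1A1"), ("FN1", "TNC"), ("COL4A1", "LAMA5")]
detected_in_tissue = {"FN1", "COL1A1", "LAMA5"}
print(tissue_specific_network(all_interactions, detected_in_tissue))
```

A fuller treatment would weight edges by measured abundance, since, as noted above, co-occurrence alone does not guarantee that the partners are present at a stoichiometry compatible with interaction.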

MatriNet
MatriNet (https://www.matrinet.org/) was developed by the Izzi laboratory at the University of Oulu to study the ECM connectome and ECM networks in normal and cancerous tissues (203). Originally developed to mine transcriptomic datasets and identify coregulated ECM genes and pathways, MatriNet can also leverage proteomic profiles to identify features of ECM networks across pathophysiological conditions.

Basement membraneBASE
Basement membraneBASE (https://bmbase.manchester.ac.uk/) is a knowledgebase jointly developed by the University of Manchester and Duke University that provides information on the composition of the specialized basement membrane ECM in development, life, and disease and across multiple species (204). It also includes information on matrisome protein localization from immunohistochemical staining experiments and provides users with resources, including a list of antibodies and protocols, to study basement membrane proteins.

CONCLUSION
A better understanding of the biochemical properties of ECM proteins has led, over the past decade, to the development of MS-based strategies capable of decoding the compositional complexity of the ECM of tissues. The democratization of such strategies has resulted in the definition of the matrisome of many tissues, across diverse pathophysiological states, and in the identification of ECM signatures characteristic of these states, leading, in some cases, to the discovery of novel biomarkers of human diseases such as cancer. The ECM holds the promise to be an important reservoir of novel biomarkers and therapeutic targets (205,206). In 2016, we had postulated that the ECM was ready to enter the -omic era (207). With the significant advances made over the past years, and successful examples from preclinical studies, ECM proteomic research is ready for a new chapter and can now enter the clinical proteomic era.
Supplemental data - This article contains supplemental data.
Acknowledgments - I would like to thank Dr Richard O. Hynes, Steven A. Carr, and Karl R. Clauser for their mentorship and support in the early days of ECM proteomics research. I would also like to thank my close collaborators Dr Sylvie Ricard-Blum, Dr Valerio Izzi, and Dr Yu (Tom) Gao, and all past and present members of the Naba laboratory for helpful discussions. Last, I would like to thank Nandini Kapoor, undergraduate research assistant in the Naba lab, for her critical reading of the manuscript. This work was supported in part by the National Institutes of Health [1U01HG012680 and 1R21CA261642]. The content is solely the responsibility of the