Inherited Connective Tissue Disorders of Collagens: Lessons from Targeted Mutagenesis

Introduction
The extracellular matrix (ECM) is the structural environment of cells in tissues and organs. The ECM is a dynamic structure that is constantly remodelled. It contributes to tissue integrity and mechanical properties. It is also essential for maintaining tissue homeostasis, morphogenesis and differentiation, which it does through specific interactions with cells. The ECM is composed of a mixture of water and macromolecules classified into four main categories: collagens, proteoglycans, elastic proteins, and non-collagenous glycoproteins (also called adhesive glycoproteins). The nature, concentration and ratio of the different ECM components are all important factors in the regulation of the assembly of complex tissue-specific networks tuned to meet the mechanical and biological requirements of tissues.
Collagens form a superfamily of 28 trimeric proteins, distinguishable from the other ECM components by their particular abundance in tissues (collagens represent up to 80-90% of total proteins in skin, tendon and bones) and their capacity to self-assemble into supramolecular organized structures (the best known being the banded fibers). The collagen superfamily is highly complex and shows a remarkable diversity in structure, tissue distribution and function (Ricard-Blum and Ruggiero, 2005).
The importance of collagens has been illustrated by the wide range of mutations in collagen genes that result in minor and severe human diseases. Various mutations (point, null or structural mutations, insertions, exon skipping, deletions) in genes encoding collagens are known to be responsible for a large spectrum of human disorders (e.g., Ehlers-Danlos syndrome, epidermolysis bullosa, chondrodysplasia, osteogenesis imperfecta, Alport syndrome, Bethlem myopathy, Ullrich congenital muscular dystrophy, Fuchs' endothelial dystrophy, Knobloch syndrome) that affect different tissues and organs, such as skin, blood vessels, cartilage, bones, kidney, muscle, cornea and retina. Considering the variety of collagen-related diseases and the complexity of collagen biology, there is a clear need to understand how mutations alter collagen synthesis, cell trafficking, and cell and molecular interactions to result in tissue dysfunction. In the eighties, targeted mutagenesis emerged as a new approach to help establish the structure-function relationship of collagens. Along with the emergence of protein engineering and genetically modified mice, site-directed mutagenesis has become instrumental in understanding the physiopathology of diseases, as well as in developing new and specific therapies and drugs for the treatment of human diseases. To date, about 20 distinct genes encoding collagen chains have been ablated (by knock-out mutations) in mice or are involved in naturally occurring mutations. Only a few knock-in modified mice have been generated, in which a single point mutation or an exon deletion, for example, has been introduced into a specific gene. This is likely due to the very large size of collagen genes. Site-directed knock-in mutations in mice have often proven to be more useful than knock-out mutations (which inactivate genes) for the analysis of the genotype-phenotype relationship, since small mutations represent the primary bases of inherited diseases.
The aim of this chapter is to describe the use of targeted mutagenesis in the understanding of the physiopathology of inherited connective tissue disorders. Specifically we are concerned with mutations in collagen genes. We will focus on the use of site-directed mutagenesis to analyze the causative effects of human-identified collagen gene mutations. Recombinant molecules were used to analyze the effects of these mutations on collagen structure, biosynthesis, posttranslational modifications and interactions with binding partners and cells. This work has considerably improved our knowledge in development and in human disorders. These results will then be compared with the limited information about the introduction of subtle targeted mutations into murine collagen genes.

The collagen superfamily at a glance
The 28 members of the collagen superfamily exhibit considerable complexity and diversity in structure, assembly and function. However, collagens also share common features. (i) All members are modular proteins composed of collagenous (COL) domains flanked by non-collagenous (NC) domains or linker regions. (ii) They are trimeric molecules formed by the association of three identical or different α-chains, which are characterized by repetitions of the G-x-y tripeptide (with the x and y positions often occupied by proline and hydroxyproline, respectively). (Abbreviations and single-letter codes for amino acids are given in Table 1 of the chapter by Figurski et al.) (iii) They are able to assemble into supramolecular aggregates in the extracellular space, although this property has not been proven for all recently identified collagen members. Collagens also undergo various posttranslational modifications, including proteolytic processing, fibril formation, reticulation, shedding of transmembrane collagens and production of functional domains (also called matricryptins) (Ricard-Blum and . The mechanisms of collagen biosynthesis are far from being completely understood. Our knowledge is primarily based on the biosynthesis of fibril-forming collagens. Triple-helix formation commonly starts at the C-terminus (C-propeptide) of the pro-α-chain and proceeds toward the N-terminus (N-propeptide) in a zipper-like fashion. Prior to and simultaneously with triple-helix formation, specific prolines and lysines are chemically modified by the addition of a hydroxyl group. These modifications play a pivotal role in the stabilization of the triple helix and its resistance to temperature. Completed trimeric procollagens are secreted from the cells, proteolytically processed and assembled into collagen fibrils (Ricard-Blum and Ruggiero, 2005).
Based on their structure and supramolecular organization, collagens have been divided into several subfamilies (Myllyharju and Kivirikko, 2001). They are (i) the fibril-forming collagens I, II, III, V, XI, XXIV and XXVII, which share the capacity to assemble into organized fibrils; (ii) the network-forming collagens IV, VIII and X and the FACIT (Fibril-Associated Collagens with Interrupted Triple helices) collagens IX, XII, XIV, XVI, XIX, XX, XXI and XXII, which are known to mediate protein-protein interactions; (iii) the basement membrane multiplexin (multiple triple-helix domains and interruptions) collagens XV and XVIII; (iv) the transmembrane collagens, including the neuronal collagen XXV and types XIII, XVII and XXIII; and finally (v) other unconventional collagens, such as the anchoring fibril collagen VII and the ubiquitous collagen VI, which assembles into characteristic beaded filaments (Table 1).
The length of the triple-helical domains varies noticeably among different collagen types. Fibril-forming collagens consist of a long central COL domain with about 1000 amino acids (330 G-x-y tripeptide repeats), flanked by small terminal globular extensions (NC domains). After proteolytic processing of the N- and C-terminal extensions, the mature molecules aggregate into highly ordered fibrils with a banded pattern observable by transmission electron microscopy. In other collagens, the COL domains are shorter and/or contain interruptions. The NC domains can represent the main part of the molecule, as for the FACIT collagen XII. Most, if not all, collagen types are recognized by specific cell receptors, such as the major ECM integrin receptors, collagen-specific discoidin domain receptors (DDR) and the transmembrane proteoglycan syndecans (Humphries et al., 2006; Xian et al., 2010; Leitinger et al., 2007). Through various interactions with these cell receptors, collagens can induce intracellular pathways directly or indirectly and regulate cell functions, such as migration, proliferation and differentiation. Certain collagens can also bind to growth factors and control their bioavailability by acting as reservoirs. The controlled release of growth factors by proteolytic activity or expression of a splice variant that does not contain the binding site controls morphogenesis, as described for the cartilage collagen II (Zhu et al., 1999).

A large spectrum of mutations in collagen genes causes inherited disorders
A myriad of mutations has been characterized in collagen genes (Table 1). The function of the gene product and its tissue localization are criteria that lead to a number of inherited connective tissue disorders (reviewed in Bruckner-Tuderman and Bruckner, 1998; Bateman et al., 2009). Typically, mutations in collagen genes are null mutations, i.e., those resulting in the translation of an α-chain that cannot assemble into a triple helix and is consequently degraded intracellularly. Null mutations reduce the overall quantity of collagen in tissue and generally cause a human disorder. Small deletions and base substitutions can lead to synthesis of a mutated α-chain that is able to form a triple helix. The molecule is secreted, but its structure is compromised for supramolecular assembly, which normally occurs in the extracellular space. Ultimately, collagen gene mutations result in defective matrix assembly and organization that in turn can affect cell function (Figure 1). In cases of large multimeric molecules, such as collagens, dominant-negative mutations can be more deleterious than null mutations. However, a growing body of evidence shows that the synthesis of a large quantity of abnormal collagen molecules in cells during development can induce endoplasmic reticulum stress, with consequences ranging from cell recovery to death (Tsang et al., 2010). The correlation between phenotype severity and the location of a point mutation in the gene is not clear. However, a mutation located in the coding region for the amino-terminus of the fibrillar collagen triple helix generally results in a mild phenotype, whereas a mutation in the coding region for the carboxy-terminus of the molecule is often lethal. This observation may be related to the C- to N-terminus directional propagation of the triple helix and the role of the C-propeptides in α-chain registration and triple-helix nucleation.
The nature of glycine substitution in the G-x-y repeats and the neighboring amino-acid sequence may have different biochemical and clinical consequences. These consequences include (i) delay of the triple-helix formation and over-glycosylation (Raghunath et al., 1994); (ii) alteration of procollagen processing (Lightfoot et al., 1994); (iii) retention of unfolded abnormal proteins intracellularly, leading to ER stress; and (iv) formation of abnormal unstable trimeric molecules, leading to disrupted fibrillogenesis.
The presence of a glycine in every third position is critical for triple-helix formation, since only glycine, the smallest amino acid, fits into the center of the triple helix. The majority of dominant-negative mutations in collagen genes are due to replacements of one of the glycines in the collagenous domains of the α-chains with a larger amino acid. Glycine substitution mutations in collagen genes underlie heritable connective tissue diseases, such as osteogenesis imperfecta (OI), chondrodysplasias, certain subtypes of Ehlers-Danlos syndrome (EDS), or Alport syndrome (reviewed in Bruckner-Tuderman and Bruckner, 1998; Bateman et al., 2009). Since a non-glycine amino acid does not easily fit into the interior space of the triple helix, helix formation is distorted, thereby affecting its structure and stability and impeding fibrillogenesis. Delay in triple-helix formation can result in overmodification and may affect collagen function.
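The glycine-in-every-third-position rule described above lends itself to a simple check. The following is a minimal sketch, not a validated bioinformatics tool: it assumes the input is a single-letter amino-acid string that starts in G-x-y register, and the function name and the two model peptides are hypothetical illustrations rather than real collagen sequences.

```python
def find_gxy_interruptions(seq: str) -> list[tuple[int, str]]:
    """Return (1-based position, residue) for every complete triplet whose
    first residue is not glycine.

    Assumption: the sequence begins at the first position of a G-x-y
    triplet. Any trailing residues that do not fill a triplet are ignored.
    """
    seq = seq.upper()
    interruptions = []
    usable_length = len(seq) - len(seq) % 3  # drop an incomplete last triplet
    for i in range(0, usable_length, 3):
        if seq[i] != "G":
            interruptions.append((i + 1, seq[i]))
    return interruptions


# Hypothetical five-triplet model peptides (not real collagen sequences).
wild_type = "GPPGPPGPPGPPGPP"
mutant = "GPPGPPSPPGPPGPP"  # glycine at position 7 replaced by serine

print(find_gxy_interruptions(wild_type))
print(find_gxy_interruptions(mutant))
```

Such a scan only flags where the repeat is broken. It says nothing about the severity of a substitution, which, as discussed in this section, depends on the replacing residue and on its position within the helix.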
Osteogenesis imperfecta (OI), also known as brittle bone disease, is caused by mutations in genes for collagen I, the most abundant collagen in organisms. OI is characterized by fragile bones that break easily and by reduced bone mass. Most OI cases are believed to be associated with glycine substitution mutations in the COL1A1 or COL1A2 genes. Over 200 mutations have been reported for the COL1A1 (located on chromosome 17) and COL1A2 (located on chromosome 7) genes, which code for the collagen I pro-α1 and pro-α2 chains, respectively. This fact may explain the wide range of clinical characteristics and degrees of severity that are seen in the disease (Kuivaniemi et al., 1991; Byers and Steiner, 1992; Dalgleish, 1998). Because collagen I is found in other tissues of the body, OI has non-skeletal manifestations as well. People with OI may also suffer from muscle weakness, hearing loss, fatigue, joint laxity, distensible skin, or dentinogenesis imperfecta. The fibril-forming collagen I is mostly synthesized as the [α1(I)]2α2(I) heterotrimer, though a minor form, [α1(I)]3, is expressed in embryonic tissues. COL1A1 and COL1A2 are both susceptible to various mutations responsible for the production of quantitatively or qualitatively deficient fibrils. The clinical severity of OI relates to the extent of the conformational change in the collagen triple helix induced by the glycine substitution. These mutations result in altered fibrillogenesis. However, no general mechanism can be drawn from genotype/phenotype analyses.
Collagen VII, encoded by COL7A1, is the major component of the anchoring fibrils at the dermo-epidermal junction (Burgeson, 1993). COL7A1 gene mutations cause dystrophic epidermolysis bullosa (DEB), a skin-blistering disorder (Bruckner-Tuderman, 1999). Approximately 200 mutations of COL7A1 have been characterized, leading to a very high molecular heterogeneity of collagen VII defects (Dunnill et al., 1996). Almost all cases of dominant DEB are caused by a glycine substitution in the triple helical region of collagen VII, and most of the mutations are unique to individual families. Some glycine substitutions in collagen VII interfere with biosynthesis of the protein in a dominant-negative manner, whereas others may lead to collagen VII retention within the rough endoplasmic reticulum.
Mutations in the COL5A1 and COL5A2 genes, encoding respectively the pro-α1 and pro-α2 chains of the fibril-forming collagen V, have been identified in approximately 50% of patients with a clinical diagnosis of classic Ehlers-Danlos syndrome (EDS) (Malfait et al., 2010). Collagen V contains a third chain, proα3(V), but no mutation in COL5A3 has been reported so far. Classic EDS is a heritable disorder of connective tissues characterized by skin hyperextensibility, fragile and soft skin, delayed wound healing with formation of atrophic scars, easy bruising, and generalized joint hypermobility. The majority of mutations lead to a non-functional COL5A1 allele. One mutant COL5A1 transcript showed a premature stop codon. A minority of mutations affect the structure of the central helical domain. In approximately one-third of patients, the disease is caused by a mutation leading to a non-functional COL5A1 allele, resulting in collagen V haploinsufficiency. Structural mutations in COL5A1 or COL5A2, resulting in the production of a functionally defective protein, account for a small proportion of patients.
Collagen V is a quantitatively minor fibril-forming collagen that co-polymerizes with collagen I to form heterotypic fibrils (Fichard et al., 1995). Co-polymerization has a critical role in the nucleation and growth of fibrils in tissues. A distinctive feature of collagen V is that the mature molecule retains a major part of the α1(V) N-propeptide, which projects beyond the surface of collagen fibrils. This domain was proposed to limit heterotypic fibril growth by steric hindrance and electrostatic interactions (Linsenmayer et al., 1993). Skin biopsies from patients revealed abnormalities in fibril formation (altered diameter, contour, or shape of dermal fibrils). However, abnormalities of fibril structure affected less than 5% of fibrils (reviewed in Fichard et al., 2003). Moreover, the clinical phenotype of classical EDS supports an important role of collagen V in the biomechanical integrity of the skin, tendon and ligaments, although collagen V is only a minor component of the affected tissues. Thus, collagen V may be involved in functions other than the control of fibril growth in classical EDS. A likely hypothesis is that collagen V might be involved in the physiopathology of EDS through interactions with other fibril-associated components and/or with cell receptors. Along this line, it has been shown that mutations in the genes for the collagen V-binding partners, tenascin-X (TNXB gene) and collagen I (COL1A1 gene), resulted in EDS (Lindor and Bristow, 2005).
Table 1. Collagen types, associated diseases and mouse models.
Although mutant gene products are thought to impair matrix structure and assembly, which eventually alters tissue function, growing evidence links ER stress and the unfolded protein response (UPR) to the initiation and progression of a broad repertoire of connective tissue disorders, including those caused by collagen gene mutations. Some mutant chains cannot be incorporated into procollagen molecules, consequently causing protein degradation with important downstream effects. Misfolded or slowly folding collagens are retained within the endoplasmic reticulum (ER) and ultimately targeted for degradation by a mechanism initially called "protein suicide." Because connective-tissue cells typically produce large quantities of collagens, the contribution of ER stress induced by misfolded collagens to disease pathogenesis has certainly been underrated. The current knowledge on the implications of the unfolded protein response and ER stress in connective tissue diseases has been recently reviewed, and readers are referred to these reviews for further reading (Boot-Handford and Briggs, 2010; Tsang et al., 2010). Notably, mutations in genes encoding collagen I (COL1A1 and COL1A2) (osteogenesis imperfecta), collagen II (COL2A1) (spondyloepiphyseal dysplasia), and collagen X (COL10A1) (metaphyseal chondrodysplasia) have been shown to induce ER dilatation in patient cells. Mutations that affect the triple helix, the C-propeptide for the fibril-forming collagens, and splice donor sites, as well as single amino-acid substitutions, were shown to cause ER stress. Recently, mutations that affected the signal peptide domain of the proα1(V) collagen chain were shown to cause classic EDS. Signal peptides act as the addresses of proteins destined for secretion. The mutant procollagen V is retained within the cell, leading to collagen V haploinsufficiency and altered collagen fibril formation. It is probable that the signal peptide mutation also causes accumulation of the mutated protein within the ER and eventually leads to ER stress, as described for other collagen-related disorders (Symoens et al., 2009).
Mutations in the three major collagen VI genes (COL6A1, COL6A2 and COL6A3) cause multiple muscle disorders, including the severe Ullrich congenital muscular dystrophy (UCMD) and the mild Bethlem myopathy, which is characterized by muscle weakness with striking joint laxity and progressive contractures. Three genetically distinct novel chains, α4(VI), α5(VI), and α6(VI), have recently been identified, but very little is known about their molecular assembly and biosynthesis and their possible involvement in human diseases (Gara et al., 2011). Collagen VI biosynthesis is a complex multistep process. Monomer formation results from the heterotrimeric association of the three chains [α1(VI), α2(VI), and α3(VI)] encoded by the COL6A1, COL6A2 and COL6A3 genes. Monomers first assemble into antiparallel dimers that associate laterally to form tetramers stabilized by disulphide bonds. The tetramers associate linearly to form the unique beaded filaments, the ultimate step of collagen VI biosynthesis. Autosomal dominant and recessive mutations in COL6A1, COL6A2, and COL6A3 primarily result in dysfunctional microfibrillar collagen VI in muscle extracellular matrix. However, they also affect other connective tissues, such as skin and tendons. Different mutations have been shown to have variable effects on protein assembly, secretion, and the ability of the protein to form a functioning extracellular network. As observed in other collagen-related diseases, glycine-substitution mutations in COL6A1, COL6A2, or COL6A3 that disrupt the triple-helix motif constitute a frequent pathogenic mechanism. Triple-helix distortion may exert a dominant-negative effect by reducing the ability of mutated monomers to form beaded filaments. Interestingly, mitochondrial dysfunction was implicated in the pathogenesis of a myopathic phenotype. Muscles lacking collagen VI are characterized by the presence of a dilated sarcoplasmic reticulum and dysfunctional mitochondria. This condition triggers apoptosis and leads to myofiber degeneration. Recently, it was shown that the persistence of abnormal organelles and apoptosis observed in some congenital muscular dystrophies are caused by defective activation of the autophagic machinery. Autophagy has a key role in the clearance of damaged organelles and in the turnover of cell components and is thus essential for tissue homeostasis. Recently, 56 novel mutations have been described, allowing a clinical classification and revealing the complexity of genotype-phenotype relationships (Briñas et al., 2010).
The paucity of evidence-based data regarding correlations of genotype and phenotype is in part due to the large spectrum of mutations reported for the collagen genes [e.g., about 200 mutations for the collagen I genes responsible for OI (Dalgleish, 1998); 160 mutations in the COL4A5 gene encoding the collagen IV α5 chain responsible for Alport syndrome; 200 mutations in COL7A1 responsible for DEB]. Things are not as simple as one gene, numerous mutations, one phenotype. Sometimes a combination of a mutation for a connective tissue disorder and a specific collagen gene mutation will result in another disease. Some patients with UCMD show clinical characteristics typical of classical disorders of connective tissue, such as EDS. Ultrastructural analysis of skin biopsy samples from patients with UCMD showed alterations of collagen fibril morphology that resemble those described in patients with EDS (Kirschner et al., 2005). Recently, using the yeast two-hybrid approach, we showed a direct interaction between collagen V and collagen VI that may nicely explain the overlap of UCMD and classic EDS (Symoens et al., 2011). Unexpectedly, an arginine-to-cysteine substitution localized at position 134 of the α1(I) collagen chain resulted in classical EDS (Nuytinck et al., 2000). This finding is indicative of genetic heterogeneity in collagen-related disorders.
A powerful approach to study the biochemical consequences of mutation and the protein structure/function relationship is to engineer a specific mutation into a functional domain of the molecule. Targeted mutagenesis approaches, including the use of alanine-scanning mutagenesis techniques, have led to important insights into the effects of collagen mutations on protein structure and function. A major limitation of mutagenesis strategies to investigate collagens is the large number of collagen gene mutations to be investigated in order to have a better understanding of the molecular mechanisms of "collagenopathies." Knowledge about the impact of collagen mutations has also been hampered by the technical difficulty of introducing targeted mutations of very large collagen genes into mice.

Lessons from site-directed mutagenesis of recombinant collagen genes and derived fragments
Production of a recombinant collagen gene represents a powerful technique to introduce a human mutation into the gene of interest by site-directed mutagenesis. It allows one to analyze the impact of the mutation on collagen assembly and secretion. Collagen biosynthesis is a complex multistep process that takes place in the intracellular and extracellular space and includes various post-translational modifications, such as prolyl- and lysyl-hydroxylation, glycosylation, trimerization, proteolytic processing, polymerization and cross-linking. Thanks to recombinant technology, these large multimeric proteins have been produced in large amounts in almost all existing expression systems (Ruggiero and Koch, 2008). This technological breakthrough enabled researchers to analyze in detail the effects of collagen mutations on biosynthesis, molecular and cell interactions, processing and, in some cases, self-assembly. Researchers can also address the question of the correlation of genotype, protein structure and function.
Mutations occurring in collagen I genes are the most extensively studied mutations among all collagen types. A first set of experiments substituted glycine 859 of the proα1(I) chain with cysteine or arginine by site-directed mutagenesis to reproduce two mutations identified in OI patients. In order to study the expression of the mutant molecule in the presence or absence of the wild-type proα1(I) chain, the mutated constructs were transfected either into normal fibroblasts, to look for a dominant-negative effect in the presence of the wild-type gene, or into fibroblasts isolated from Mov13 homozygous mice (referred to as Mov13 fibroblasts hereafter), whose cells carry a provirus that prevents transcription initiation of the natural proα1(I) gene (Schnieke et al., 1987). In agreement with observations of collagen I in OI patients, the mutated collagens were poorly secreted from the cells and exhibited reduced thermal stability and increased sensitivity to degradation. This supported the idea that the strict preservation of the G-x-y triplets is absolutely required for proper formation of the triple helix.
The integrity of the C-propeptide is pivotal for the trimerization of all fibril-forming collagens. The C-propeptides of the proα1(I) and proα2(I) chains contain an Asn-Ile-Thr sequence. That sequence fits a consensus sequence for the addition of N-linked oligosaccharides. To analyze the role of this post-translational modification, the asparagine residue of the proα1(I) chain was changed to glycine by site-directed mutagenesis. The expression of the corresponding molecule was analyzed in transfected normal and Mov13 fibroblasts (Lamandé and Bateman, 1995). The mutation did not impair heterotrimeric assembly and secretion of hybrid procollagen I into the extracellular space. Only a slight effect on C-proteinase cleavage efficiency was observed with the unglycosylated molecule. To circumvent the difficulty of producing a large repertoire of full-length mutated collagens I in order to undertake a genotype/phenotype analysis, a recombinant trimeric mini-collagen I was recently expressed in an Escherichia coli system. Recombinant mini-collagens can be obtained by fusing the sequence encoding a fragment of the proα1(I) chain triple helix to the sequence encoding the C-terminal domain (called "foldon") of the bacteriophage T4 fibritin, which is capable of trimerization (Xu et al., 2008). Two mutations (G901S and G913S), corresponding to mild and severe types of OI, respectively, were introduced into the recombinant mini-collagen I. Biophysical measurements and protease cleavage analysis revealed that the G913S mutant chain resulted in the formation of an unstable collagen I triple helix by disrupting salt bridges important for maintaining the chains in a triple-helix conformation (Yang et al., 1997; Xu et al., 2008). A very recent study utilized a recombinant bacterial collagen to develop a mutagenesis scheme in which a glycine residue within the triple-helix sequence is substituted with arginine or serine. The purpose was to analyze the positional effect of glycine mutations on triple-helix formation and stability (Cheng et al., 2011). Interestingly, all glycine mutations provoked a significant delay in triple-helix formation. However, a more severe defect was observed when the mutation was located near the trimerization domain of the triple helix, where folding is initiated.
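The Asn-Ile-Thr site discussed in this section matches the general consensus for N-linked glycosylation, Asn-x-Ser/Thr with x being any residue except proline. As a hedged illustration of why replacing the asparagine abolishes the site, the sketch below scans a single-letter sequence for that motif. The function name and the short peptide are hypothetical; the peptide is not the actual C-propeptide sequence, and a motif match only indicates a potential site, not that glycosylation occurs in vivo.

```python
import re

# N-x-S/T sequon, where x is any residue except proline. The lookahead
# makes the match zero-width so that overlapping sites are all reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def n_glycosylation_sites(seq: str) -> list[int]:
    """Return the 1-based positions of asparagines that start an
    N-x-S/T sequon in a single-letter amino-acid sequence."""
    return [match.start() + 1 for match in SEQUON.finditer(seq.upper())]


# Hypothetical peptide containing Asn-Ile-Thr (NIT).
wild_type = "AKNITYQ"
# Asn -> Gly at the sequon, mirroring the substitution described above.
mutant = wild_type.replace("NIT", "GIT")

print(n_glycosylation_sites(wild_type))
print(n_glycosylation_sites(mutant))
```

In this toy example the wild-type peptide reports one candidate site and the Asn-to-Gly mutant reports none, which is the sequence-level counterpart of the unglycosylated molecule analyzed in the transfection experiments.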
COL7A1 mutations cause dystrophic epidermolysis bullosa (DEB), a skin blistering disorder. Woodley and collaborators (2008) have used site-directed mutagenesis to elucidate the effect of human mutations on the function of collagen VII, which is the major component of the epidermal anchoring fibrils. To undertake a comprehensive analysis of the impact of human mutations on the formation, folding and stability of collagen VII and, particularly relevant to the DEB phenotype, on cell attachment and migration, four distinct substitutions occurring in collagen VII (G2049E, R2063W, G2569R, and G2575R) were introduced using COL7A1 cDNA. The authors demonstrated that the G2049E and R2063W mutants caused local destabilization of the triple helix and reduced the capability of collagen VII to elicit cell adhesion and migration. The G2569R and G2575R mutants interfered with triple-helix formation and stability. Alterations of protein stability and/or cell attachment to collagen VII mutants help explain the fragility of the dermal-epidermal junction observed in DEB patients. Naturally occurring COL7A1 mutations were investigated in a separate study (Hammami-Hauasli et al., 1998). As commonly described for glycine-substitution mutants of collagens, the authors showed that three glycine substitutions located in the same triple-helix portion affected folding, stability and secretion of procollagen VII in a dominant-negative manner. However, the glycine substitution G1519D located in another segment of the triple helix had no effect on procollagen VII secretion or its ability to anchor fibril assembly. These data showed that the biological impact of glycine substitutions can depend on their position within the triple helix, as shown for collagen I (Cheng et al., 2011).
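Substitutions such as G2049E or R2063W follow the compact convention of wild-type residue, position, and replacing residue. As an illustration only, the sketch below parses that notation and separates glycine substitutions from other missense changes. The class and function names are hypothetical, and the parser deliberately handles only simple single-residue substitutions, not deletions, insertions or splice variants.

```python
import re
from typing import NamedTuple


class Substitution(NamedTuple):
    wild_type: str  # single-letter code of the original residue
    position: int   # residue number in the protein chain
    mutant: str     # single-letter code of the replacing residue


# One uppercase letter, one or more digits, one uppercase letter.
_PATTERN = re.compile(r"([A-Z])(\d+)([A-Z])")


def parse_substitution(label: str) -> Substitution:
    """Parse a label such as 'G2049E' into its three components."""
    match = _PATTERN.fullmatch(label.strip())
    if match is None:
        raise ValueError(f"not a simple substitution label: {label!r}")
    return Substitution(match.group(1), int(match.group(2)), match.group(3))


def is_glycine_substitution(sub: Substitution) -> bool:
    """True when a glycine is replaced by any other residue."""
    return sub.wild_type == "G" and sub.mutant != "G"


# The four collagen VII substitutions named in the text above.
labels = ["G2049E", "R2063W", "G2569R", "G2575R"]
glycine_subs = [
    label for label in labels
    if is_glycine_substitution(parse_substitution(label))
]
print(glycine_subs)
```

Grouping variants this way is only a first sorting step: as the collagen VII data show, two glycine substitutions can still differ in their effect depending on where they fall in the triple helix.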
Human collagen IV mutations, thought to affect the biosynthesis of this basement membrane collagen, were extensively investigated. These mutations were known to cause Alport syndrome, a severe renal disease leading eventually to kidney failure. Collagen IV chains, α1(IV)-α6(IV), are encoded by 6 genes, COL4A1-COL4A6, respectively. Although mutations have also been identified in COL4A3 and COL4A4, about 30% of known missense mutations occur in the COL4A5 gene, which encodes the human α5(IV) chain. Most of them are glycine substitutions. One glycine-substitution mutation in COL4A5 could prevent correct α-chain folding and/or the association with other α-chains to form a stable triple helix. To address this question, the authors took advantage of the bacterial system. A DNA encoding a 22-kDa recombinant domain of the α5(IV) triple helix in its wild-type form or harboring the G1015V or G1030S mutations was expressed in E. coli (Wang et al., 2004). The recombinant wild-type and mutant proteins were purified and assayed for changes in triple-helix assembly and stability by circular dichroism. The two different glycine-substitution mutants displayed different defects in the secondary structures of their protein products that matched the severity of the patient phenotypes. However, the use of a bacterial system to analyze the effects of specific human mutations on mini-collagen assembly and stability presents several disadvantages. Because collagens are large multimeric proteins, full-length molecules cannot be produced in a bacterial host. Most importantly, not all post-translational modifications needed for triple-helix formation and stability, such as hydroxylation, glycosylation, and disulfide-bond formation, are present in bacteria.
A few years later, the bacterial limitations were bypassed by the production of full-length recombinant collagen molecules in mammalian cells (Fichard et al., 1997; Ruggiero and Koch, 2008). No fewer than eighteen human mutations (11 substitutions and 7 deletions) were introduced into the sequence encoding the trimerization NC1 domain of the α5(IV) chain gene. The constructs were transfected into cells together with constructs containing the wild-type sequences of the α3(IV) and α4(IV) chains to analyze the impact of the mutations in the NC1 domain on the formation of the α3α4α5 collagen IV heterotrimer. Twelve out of 15 mutant chains lost their capacity to assemble into heterotrimeric molecules. The three remaining mutants formed heterotrimers, but the mutations prevented their secretion into the extracellular space (Kobayashi et al., 2008). The authors demonstrated, using site-directed mutagenesis, that amino acid substitutions in the α5(IV) NC1 trimerization domain are specifically responsible for impairment of collagen IV heterotrimer assembly. This defect may be a main molecular mechanism for the pathogenesis of Alport syndrome. Interestingly, an interactome (a map of known and predicted molecular interactions, as well as phenotypic and structural landmarks) of collagen IV was recently constructed to identify functional and disease-associated domains and genotype-phenotype relationships (Parkin et al., 2011). Construction of such interactomes will greatly improve our capacity to integrate data from different site-directed mutagenesis experiments. This advance will help our understanding of the molecular mechanisms underlying "collagenopathies" and, consequently, may lead to the development of specific treatments.
Collagens undergo a great variety of proteolytic modifications. The fate and functions of the released fragments derived from collagens are still under intensive investigation, but the consequences of mutations in the regions coding for the cleavage sites on collagen structure, self-assembly and function have not been investigated in detail. A large repertoire of proteinases is responsible for these processing events. Included among such enzymes are the ADAMTS (a disintegrin and metalloprotease with thrombospondin motifs) and the BMP-1/tolloid families of metalloproteinases and, more recently, the furin-like proprotein convertases (Ricard-Blum and . To investigate collagen processing, laborious extraction and purification steps were often necessary to obtain limited amounts of unprocessed proteins and fully active enzymes for in vitro enzymatic assays. To circumvent this problem, we recently described a new cell system that allows a rapid and straightforward analysis of processing events. Our system relies on the use of site-directed mutagenesis. This strategy was particularly instrumental in analyzing the complex processing of procollagen V during maturation, which we showed to be unique among the fibril-forming collagens (Bonod-Bidaud et al., 2007). Collagen V is a minor fibrillar collagen that can be distinguished from the others by its capacity to control fibrillogenesis (Fichard et al., 1995). In addition, this molecule undergoes a particular form of processing, and it is involved in fundamental processes, such as development, and in human connective tissue disorders. The proα1(V) N-terminus can be processed by the procollagen proteinases ADAMTS-2 and BMP-1 (Colige et al., 2005; Bonod-Bidaud et al., 2007), whereas the C-propeptide can be cleaved by furin and BMP-1 (Kessler et al., 2001).
The pro1(V) Cpropeptide furin cleavage site, which occurs immediately downstream of the recognition sequence RTRR, was double-mutated to alanine residues (R1584A/R1585A) to abolish furin cleavage. All constructs were introduced into cells, along with a BMP-1-expressing construct; and the cleavage products were directly analyzed in conditioned medium of the transiently transfected cells. We were able to show that BMP-1 is capable of processing the 1(V) C-propeptide in absence of furin activity (Bonod-Bidaud et al., 2007). In the same way, the determinant for 1(V) N-propeptide processing by BMP-1 activity was identified by introducing in the coding region for the cleavage site (S254/Q255-D256) three single mutations (S254A, Q255A and D256A), two double mutations (S254A/Q255N and Q255A/D256A) and one triple mutation (S254A/Q255A/D256A). The data highlighted the unexpected importance of the aspartic acid in the P2' position of the BMP-1 cleavage site (Bonod-Bidaud et al., 2007). Processing, proteolytic release of functional domains and shedding of collagens are involved in fundamental processes. It is likely that substitutions located in the proteolytic cleavage sites may represent a molecular cause of connective tissues disorders. A reported mutation in the 1(V) N-propeptide in one patient with classic EDS resulted in a protein product missing the sequence of exon 5 that encompasses the BMP-1 cleavage site. The abnormal-sized N-propeptide present in the mutated collagen V caused dramatic alterations in fibril structure (Takahara et al., 2002).

Lessons from site-directed mutagenesis in mice
In vitro studies are useful and necessary approaches to understand the mechanisms of collagen biosynthesis and to establish structure-function relationships. However, they do not always reflect the normal and pathological in vivo situations. Genetically modified mice are a powerful tool to better understand the pathophysiology of connective tissue disorders. Several different genetically modified mice have been created during the last 10 years (reviewed in Aszódi et al., 2006). This work clearly opened doors to a better understanding of collagen function in developing tissues and provided reliable mouse models for inherited collagen diseases. Along this line, a targeted disruption of the Col4a3 gene led to renal failure and eventually to the death of mice at 3-4 months of age (Cosgrove et al., 1996; Miner and Sanes, 1996). This result is consistent with the defects described for Alport syndrome.
In most cases, the gene of interest was disrupted to generate knock-out mice. Few transgenic mice harbouring point mutations or small deletions in collagen genes have been generated (Table 1). Naturally occurring mutations in mice disrupting collagen genes have also been identified and characterized. The oim mice carry a spontaneous deletion in the Col1a2 gene that leads to an accumulation of [α1(I)]3 collagen homotrimer in the extracellular matrix. These mice develop a phenotype similar to moderate OI in humans, providing a good model for this collagen disorder (Chipman et al., 1993). It was shown that homozygous Mov13 embryos harboring an inactivated proα1(I) chain (due to the insertion of the Moloney murine leukaemia virus into the first exon of the Col1a1 gene) died in utero around day 12 because of vascular failure (Löhler et al., 1984). By contrast, in 1999 Forlino et al. developed the first knock-in mouse model for human OI by introducing a G349C mutation into the Col1a1 gene. Along this line, a knock-in mouse model for OI harboring a point mutation (G610C) in Col1a2 was recently created (Daley et al., 2010). These mice had reduced body mass and bone strength and exhibited bone fracture susceptibility consistent with the clinical features of human OI. Thus, the G610C knock-in mouse represents a novel model for the study of OI pathogenesis and also for testing potential therapies for OI.
Another example concerns collagen V deficiency/dysfunction, which is responsible for Ehlers-Danlos syndrome (EDS). In the absence of the Col5a1 gene, the mice died at the onset of organogenesis at approximately embryonic day 10 (Wenstrup et al., 2004). Interestingly, a targeted deletion in the Col5a2 gene, encoding the proα2(V) chain, recapitulated many of the clinical, biomechanical, morphologic, and biochemical features of the classical EDS. The deletion removes the sequence encoding the N-telopeptide (pN), a 20-residue region that confers flexibility to the N-terminal part of the molecule (Andrikopoulos et al., 1995). A detailed study of the skin at the morphological, histological, ultrastructural and biochemical levels indicated that the Col5a2 deletion impairs assembly and/or secretion of the [α1(V)]2α2(V) heterotrimer. Consequently, the [α1(V)]3 homotrimer, and not the [α1(V)]2α2(V) heterotrimer, is the predominant species deposited into the matrix, which in turn severely impaired extracellular matrix organization (Chanut-Delalande et al., 2004). These data underscored the importance of the collagen V [α1(V)]2α2(V) heterotrimer in dermal fibrillogenesis and can explain defects observed in the dermis of EDS patients.

Concluding remarks
Site-directed mutagenesis has been extensively used in collagen engineering and has shed light on collagen structure, expression, folding, secretion, interactions and self-assembly in the extracellular space. It also opened the way for the analysis of specific functional domains. It allowed the study of a wide variety of collagen types, including those that are expressed in trace amounts in tissues but nevertheless display pivotal functions. While it is true that site-directed mutagenesis has yielded important information on the functional consequences of a range of collagen mutations responsible for human diseases, only a few studies have addressed the consequences of collagen gene mutations on cell adaptation to ER stress. Collagen gene mutations can disturb protein synthesis, folding and secretion, and the resulting imbalance eventually induces ER stress. In vitro studies have been done on transfected cells, in which expression and trafficking of mutant collagen can be easily manipulated and analysed at the cellular level. The effects of gene manipulation can be studied in vivo using mice. The effect of collagen gene mutations on induction of an ER stress response could therefore be addressed straightforwardly in the near future. This response may be a key factor in pathogenesis (Boot-Handford and Briggs, 2010).
Mouse models are particularly useful for analysing the biological significance of collagens in pathological situations. Knock-out mice often show embryonic lethality, which hampers in-depth analysis of the phenotype. A few knock-in mice have been created with subtle mutations or small deletions that reproduce human mutations. The major reason for the paucity of knock-in mice is certainly that collagen genes are very large and therefore difficult to manipulate. The introduction of a small deletion or a single point mutation in murine collagen genes still represents a considerable challenge. Nevertheless, the few examples of knock-in mouse lines indicate that mouse models can bring new information about in vivo consequences of collagen dysfunction that cannot be predicted by in vitro approaches. Knock-in mice are also indispensable models for assessing the effects of subtle mutations on tissue function, development, and aging. They are also valuable for developing specific gene therapy approaches to combat collagen-related disorders. The combination of site-directed mutagenesis in transfected cells and knock-in approaches in mice to address the impact of specific mutations will enable us to identify mechanisms