Genome-wide analysis of the Catalpa bungei caffeic acid O-methyltransferase (COMT) gene family: identification and expression profiles in normal, tension, and opposite wood

Caffeic acid O-methyltransferase (COMT) is an important protein that participates in lignin synthesis and is associated with the ratio of G-/S-type lignin in plants. COMTs are associated with the wood properties of forest trees; however, little known about the COMT family in Catalpa bungei, a valuable timber tree species in China . We performed a comprehensive analysis of COMT genes in the C. bungei genome by describing the gene structure and phylogenetic relationships of each family member using bioinformatics-based methods. A total of 23 putative COMT genes were identified using the conserved domain sequences and amino acid sequences of COMTs from Arabidopsis thaliana and Populus trichocarpa as probes. Phylogenetic analysis showed that 23 CbuCOMTs can be divided into three groups based on their structural characteristics; five conserved domains were found in the COMT family. Promoter analysis indicated that the CbuCOMT promoters included various cis-acting elements related to growth and development. Real-time quantitative polymerase chain reaction (PCR) analysis showed differential expression among CbuCOMTs. CbuCOMT2, 7, 8, 9, 10, 12, 13, 14, 21, and 23 were mainly expressed in xylem. Only CbuCOMT23 was significantly downregulated in tension wood and upregulated in opposite wood compared to normal wood. Our study provides new information about the CbuCOMT gene family and will facilitate functional characterisation in further research.


INTRODUCTION
Caffeic acid O-methyltransferase (COMT, EC:2.1.1.68) is a lignin monomer-specific enzyme that catalyses O-methylation of the C5 hydroxyl moiety of suitably hydroxylated phenolic rings of monolignols, leading to preferential formation of syringyl subunits associated with syringyl (S) lignin monomer synthesis. Trees often have their specific S/G ratio (S-type to G-type lignin monomers) in xylem (Cai et al., 2016); this ratio may be related to the mechanical properties of wood due to the different molecular structures of the monomers (Ozparpucu et al., 2018). Higher G-unit content causes lignin to be more condensed due to larger proportions of biphenyl, phenylcoumaran, and other carbon-carbon linked units because the C5 position on their aromatic rings is available for radical coupling reactions. In contrast, S-units are usually linked by ether bonds at their available 4-hydroxy position and eventually yield linear, chemically labile lignin polymers (Ozparpucu et al., 2018). In lignin, a proper S/G ratio is beneficial for application in various wood materials as well as for plant resilience (Li et al., 2016;Wang et al., 2018a;Wang et al., 2018b). Some recent studies have focused on the effects of altering the S/G lignin ratio using genetic engineering modification methods or oversuppression of COMT gene expression in plants such as Arabidopsis (Vanholme et al., 2010), wheat (Wang et al., 2018a;Wang et al., 2018b), Miscanthus sinensis (Yoo et al., 2018) and switchgrass (DeBruyn et al., 2017;Li et al., 2017).
The COMT gene was first cloned in aspen in 1991 (Li et al., 2015a); since then, several studies have identified COMT family genes and their expressional profiles in plants for their great potential in molecular breeding, especially among important forest timber species (Xu et al., 2009;Myburg et al., 2014;Shi et al., 2010;Carocha et al., 2015). The results have suggested that the COMT gene family comprises multiple members. For example, there are seven COMT genes in Eucalyptus grandis (EgrCOMT 1, EgrCOMT 2, EgrCOMT 3, EgrCOMT 4, EgrCOMT 5, EgrCOMT 6, and EgrCOMT 7; Carocha et al., 2015), 25 in Populus trichocarpa (PtrCOMT 1-25; Shi et al., 2010), and 14 and 42 in Arabidopsis and Brassica napus, respectively (Li et al., 2016). Within these species, differential COMT gene expression is observed among different tissues. For example, in Eucalyptus grandis, EgrCOMT 1 was dramatically expressed in xylem compared to the other six COMTs, but exhibited low expression in the fruit capsule, flower buds, and shoot tips, whereas EgrCOMT 2, 3, 4, and 5 were highly expressed in flowers and fruits, but showed low expression in xylem and phloem (Carocha et al., 2015). Specific spatiotemporal expression patterns may be due to their different functions.
Catalpa bungei belongs to the Bignoniaceae family and is a valuable timber forest tree native to China. Wood products of C. bungei exhibit excellent mechanical properties that make it ideal for construction and the manufacture of upmarket wooden products (Jing et al., 2015;Shi et al., 2017;Zheng et al., 2017). COMT genes are associated with wood mechanical properties (Li, Wu & Southerton, 2011;Liu et al., 2015;Wang et al., 2018a;Wang et al., 2018b); therefore, an understanding of the COMT gene family members and their expression profiles in C. bungei would lay a foundation toward future improvement of wood properties; however, this work has not yet been conducted. In this study, we identified possible members of the COMT gene family in C. bungei and investigated their evolutionary divergence and conserved domains. Previous studies have demonstrated that tension wood (TW) and opposite wood (OW) developed under tension or compression stress generally possess physical properties that differ from those of normal wood (NW), possibly due to differences in the composition and structure of polymers that comprised of the cell wall (Chen, Chen & Zhang, 2015). Tension wood usually forms on the upper side of the bent stems and was induced by gravistimulation and has been used as a model system for the study of carbon partitioning between lignin and cellulose in trees (Zinkgraf et al., 2018). Compared to NW, TW usually has less lignin, mannose and xylose, but more glucose and cellulose (Guedes et al., 2017). To select CbuCOMT genes associated with wood properties, the expression of 23 CbuCOMT genes in tension, normal, and opposite wood was detected by quantitative reverse-transcription polymerase chain reaction (qRT-PCR) analysis. The results of this study will provide a foundation for future functional studies of CbuCOMT genes.
Pairwise identity and similarity scores of the identified CbuCOMT s were calculated using the MatGAT v2.0 software. The theoretical isoelectric point (pI) and molecular weight (MW) of the COMT family genes in C. bungei were determined using the Expert Protein Analysis System (ExPASy, http://cn.expasy.org/).
We selected 1-year-old field-grown C. bungei clone ''9-1'' trees (height: 3-3.5 m, diameter at breast height (DBH): 2-2.5 cm) for the induction of TW and OW by bending and fixing the stems using nylon ropes to maintain the breast height (ca. 1.5 m) point of the stem at an angle of ca. 45 • . Bending treatment was applied from April 19 to July 20 (the sample collection date) in 2017, during active cambial growth. Stem pieces were isolated at the bending point using lopping shears. TW (upper side) and OW (lower side) were collected from the same stem section using a sharp chisel after removing the bark, phloem, and cambium following the method of Li, Yang & Wu (2013). The breast height points of upstanding trees were selected to represent NW and sampled simultaneously. All samples were collected in the morning, cut to ca. 2 × 1×4 mm, and then immediately stored in liquid nitrogen.

Sequence alignment, phylogenetic analyses and gene structure determination
Multiple sequence alignment was conducted using the DNAMAN software v. 6.0 (Lynnon Biosoft, Quebec, QC, Canada). Phylogenetic trees were constructed using the MEGA7.0 software. The phylogenetic trees were constructed using maximum likelihood (ML) method with the following parameters: and a bootstrap test with 1,000 replications, Poisson model, uniform rates, partial deletion of gaps and nearest neighbor interchange (Kumar, Stecher & Tamura, 2016). Full-length amino acid sequences of these genes were aligned using the ClustalW program under the default settings. We aligned the coding sequences to their corresponding genomic sequences to obtain the exon-intron structures of the COMT genes. A graph of the exon-intron structures was prepared using the online Gene Structure Display Server (GSDS, http://gsds.cbi.pku.edu.ch; Hu et al., 2015). The MEME web tool (http://meme-suite.org/) was used to search for motifs among all CbuCOMT genes. The number of motifs was set to eight (Kasirajan & Aruchamy, 2015).

Analysis of regulatory elements in the promoter regions of CbuCOMT genes
The elements in the promoter fragments of the CbuCOMT genes (1,500 bp upstream of the translation initiation sites) were identified using the online program PlantCARE (http://bioinformatics.psb.ugent.be/webtools/plantcare/html/).

RNA isolation and qRT-PCR
Total RNA was extracted using the RNeasy Plant Mini Kit (Tiangen) according to the manufacturer's instructions. First-strand cDNA synthesis was performed using ca. 2 µg RNA using the PrimeScript II 1st Strand cDNA Synthesis Kit (TaKaRa, Kyoto, Japan) according to the manufacturer's protocol. Gene-specific primers (Table S1) with melting temperatures of 58-62 • C and amplification lengths of 150-260 bp were designed using the Primer 5.0 software (Applied Biosystems, Life Technologies, New York, NY, USA). qRT-PCR was performed as follows: initial denaturation at 95 • C for 30 s, followed by 40 cycles of 5 s at 95 • C and 30 s at 60 • C, then one cycle of 5 s at 95 • C, 60 s at 60 • C, and a final stage at 95 • C (acquisition mode: continuous; five acquisitions per • C). RT-qPCR was performed using a Roche LightCycler 480 System (Roche, Basel, Switzerland) using the SYBR Premix Ex Taq Kit (TaKaRa) and an internal control (actin) primer pairs (Table S1) were selected (Jing et al., 2015). All reactions were conducted with four technical replicates and three biological replicates. Results obtained for different tissues were standardised to the levels of actin using the 2 − CT method. The data were statistically analysed by one-way ANOVA using SPSS 19.0 (SPSS Inc., Chicago, IL, USA). For the gene expression differences between OW, NW and TW, we employed a fold change not less than 2 as significant differential expressed genes.

Genome-wide identification of the COMT gene family in C. bungei
To identify COMT genes in C. bungei, we performed a BLASTP search against the C. bungei protein database using COMT conserved domain sequences and amino acid sequences from Arabidopsis and Populus trichocarpa. After removing sequences lacking the functional domain using InterProScan and SMART, a total of 23 genes encoding putative COMT proteins were identified and named as CbuCOMT 1 to CbuCOMT 23 (Table 1).

Sequence features and sequence similarities among CbuCOMT s
Sequence feature analysis of COMT genes suggested that CbuCOMT gene lengths varied from 1,132 bp (CbuCOMT 7) to 5,870 bp (CbuCOMT 22) and the lengths of open reading frames (ORFs) ranged from 579 bp (CbuCOMT 7) to 1,482 bp (CbuCOMT 22), with deduced amino acid sequence lengths varying from 193aa to 493aa (Table 1). Phylogenetic analysis of the 23 CbuCOMT s indicated that the CbuCOMT genes could be classified into three groups: group I was composed of 12 CbuCOMT proteins; CbuCOMT 1, CbuCOMT 2, and CbuCOMT 19 belonged to an independent branch, group II; the remaining CbuCOMT genes comprised group III (Fig. 1). A matrix of amino acid sequence similarity for the CbuCOMT gene family is presented in File S1. A high percentage of amino acid identity and similarity was observed between CbuCOMT 5 and CbuCOMT 21 (91.3 and 94.9%, respectively), CbuCOMT 14 and CbuCOMT 22 (88.4 and 92.8%, respectively), CbuCOMT 4 and CbuCOMT 15 (94.1 and 97.2%, respectively), and between CbuCOMT 18 and CbuCOMT 19 (83.6 and 92.1%, respectively). However, most CbuCOMT pairs showed low amino acid sequence identity and similarity (<80%).
To gain further insight into the structural diversity of CbuCOMT genes, full-length cDNA sequences were compared with the corresponding genomic DNA sequences to determine the numbers and positions of exons and introns within the genomic DNA (Fig. 1). The number of introns fluctuated markedly from zero to five. CbuCOMT 2 contained no introns, whereas CbuCOMT 9 had five, indicating that gene structure may not be conserved among CbuCOMT family members (Fig. 1).

Motifs and cis-regulatory elements in the promoter regions of CbuCOMT
After performing a search using the MEME motif search tool, eight consensus motifs were detected among the CbuCOMT s (Fig. 5). Most CbuCOMT s possessed motifs 1, 2, and 3. Notably, all five conserved domains were totally or partly contained within the five motifs; for example, conserved domain I (LVDVGGGxG), which is an S-adenocyl-L-methionine (SAM) binding domain, was included in motif 2 and found in many COMT genes. In addition, three COMT-specific catalytic residues were found in motifs 1 (histidine, H), 4 (glutamic acid, E), and 8 (glutamic acid, E) (Li et al., 2015b).
To identify the likely cis-regulatory elements (CREs) of CbuCOMT s, the promoter regions of the CbuCOMT genes were used to search the PlantCARE database; the results are listed in Table 2. A series of CREs involved in developmental processes such as circadian motifs and the Skn-1 and GCN4 motifs found in the promoters of the CbuCOMT 14, 20, and 8 genes. Additionally, Box I, Box 4, the ATCT motif and other light response-related motifs were also found among the promoters of CbuCOMT genes, indicating that the expression of CbuCOMT genes may be regulated by light. Notably, several hormone response motifs were also identified. For example, ABRE (abscisic acid [ABA]-responsive element), GARE-motif, P-box, and TATC-box (gibberellic acid-responsive elements), ERE (ethylene-responsive element), and TCA (salicylic acid-responsive element), and TGA (auxin-responsive element) were found among the promoters. In addition, several CREs related to abiotic stress responses were found in the promoters of CbuCOMT genes; for example, HSE was present in the promoters of 14 CbuCOMT genes. The anaerobic induction element (ARE), defence and stress-responsive element (TC rich), and MYB binding sites involved in drought inducibility (MBS) were found in 15, 14, and 13 CbuCOMT gene promoters, respectively.

Expression profiles of CbuCOMT genes in various tissues
To identify the expression patterns of CbuCOMT s, qRT-PCR was performed on the 23 CbuCOMT s in six different tissues, the leaves, bark, xylem, phloem, branches and flowers of 8-year-old C. bungei plants. To verify the specificity of each primer for qRT-PCR, the primers were checked using BLAST against the genome sequence of C. bungei, and qRT-PCR products were sequenced. The qRT-PCR results are shown in File S2. COMT genes exhibited markedly different expressional profiles; for example, CbuCOMT 10, and 23 were highly expressed in xylem, but low expression was observed among the other five tissues, whereas CbuCOMT 17 and 18 were more highly expressed in leaves, young stems, and phloem, but showed very little expression in flowers and bark. Among the 23 CbuCOMT s, CbuCOMT 1, 3, 6, 7, 8, 13, 14, 16, 19 showed extremely low expression in all six tissues, possibly due to spatially or temporally specific expression. Expression levels of CbuCOMT 1 and 20 were much higher in flowers than in other tissues.

Expression profiles of CbuCOMT s in TW, NW, and OW
The relative expression levels of the 23 CbuCOMT genes in TW, NW, and OW are shown in Table 3. The qRT-PCR results showed that CbuCOMT 3, 6, 13, 16, 19, and 22 were not expressed in xylem (TW, NW, and OW), and CbuCOMT 11 was not expressed in TW. Although the expression levels of most CbuCOMT genes changed following the bending treatment, only CbuCOMT 10 and 23 showed significant differences (|log 2 FC| ≤1). The

DISCUSSION
COMT family genes are ubiquitous among plant species and have been surveyed at the whole-genome level in several species including A. thaliana, Brassica napus, Brachypodium distachyon, Oryza sativa, Populus trichocarpa , and others. In this study, a total of 23 COMT genes were identified based on the Catalpa bungei genome and named as CbuCOMT 1 to CbuCOMT 23. Genomic DNA, CDS lengths, and deduced amino acid sequences were variable among these genes (Table 1), which led to changes in the theoretical MW and pI of CbuCOMT members. The structure of the phylogenetic tree obtained from the alignment of AtCOMT s, PtCOMT s, and CbuCOMT s indicates that COMT genes can be divided into five major groups, as reported in a previous study (Li et al., 2016). The functions of COMTs have been identified, and they play vital roles in S-type lignin production; however, to date, their roles in C. bungi have remained unclear. Therefore, we preliminarily predicted the functions of CbuCOMT s based on the homology of other COMTs. According to the phylogenetic tree, AT5G54160.1 belonged to group III and was closely related to CbuCOMT 3,7,8,9,10,11,14,and 22, indicating that CbuCOMT s belonging to group III may have similar functions. AT5G54160.1 encodes a caffeic acid O-methyltransferase that participates in S-lignin synthesis (Moinuddin et al., 2010;Vanholme et al., 2010) and melatonin synthesis by catalysing N-acetylserotonin methylation in Arabidopsis ((Byeon et al., 2014)). It has been reported that some COMT genes are involved in lignin biosynthesis and other processes such as glucosinolate metabolism. At1g21100, At1g21110, At1g21120, At1g21130, and At1g76790 mainly participate in glucosinolate metabolism and other  processes (Li et al., 2016). These genes clustered together with CbuCOMT s in group III, indicating that CbuCOMT s in this group may also participate in glucosinolate metabolism. Except for CbuCOMT 3, CbuCOMT s in group III also clustered with PtrCOMT 2, of which protein abundance and COMT total activity gradually and continuously increased during the lignification of Populus early stems, indicating that PtrCOMT 2 may participate in the lignification of Populus stems (Liu et al., 2015). In our study, CbuCOMT 7, 9, and 10 from group III were more highly expressed in xylem than in the other five organs, suggesting that they may be involved in the lignification of xylem in C. bungei. CbuCOMT s in groups I and II clustered with PtrCOMT s. The downregulation of COMT expression in poplar has been explored by several researchers. Jouanin et al. (2000) found a COMT downregulated transgenic poplar line with almost no COMT activity. In this line, the lignin level was reduced by 17% and lignin structure was strongly altered, with a 200% increase in condensed bond content and a nearly complete lack of S-lignin. In Populus trichocarpa, 25 COMT genes were identified by Shi et al. (2010); the expression profiles of 25 PtrCOMT differed significantly from each other. PtrCOMT 2 was highly expressed in leaves, shoots, phloem, and xylem, whereas PtrCOMT 19 had only slight or no expression in these organs, indicating that, although these genes belong to the same family, they may have different functions . This finding is also supported by a study of the AT4G35160.1 gene, which was identified as a COMT gene based on functional domain analysis, but was later shown to mainly participate in melatonin synthesis with very little caffeic acid O-methyltransferase activity in Arabidopsis (Lee et al., 2014). Notably, CbuCOMT 23 expression was dramatically higher than that of the other 22 CbuCOMT s in NW (xylem), indicating that CbuCOMT 23 may have important functions in xylem. TW usually contains more cellulose and less hemicellulose and lignin than does NW, whereas OW has greater lignin content. Thus, many researches have conducted microarray analysis of artificially bent trunks to identify genes that participate in the synthesis of lignin, phytohormone, cellulose, and many other components (Chen, Chen & Zhang, 2015). In our study, CbuCOMT 23 expression decreased under tension stress (TW) and increased in OW, implying that CbuCOMT 23 gene may be involved in lignin synthesis in xylem, and associated with C. bungei wood properties. The expression of CbuCOMT 10 showed significantly reduce in TW, suggesting CbuCOMT 10 may be involved in the declined lignification of TW. However, these hypotheses must be verified through further experiments. In addition, the expression of genes were normalized by a reference gene, not absolute quantification, which is just a relative quantification. In later study, we will further quantify the COMT genes using RNA-seq or absolute quantification.
Fewer 3.48 Gb). The number of COMTs did not increase with the enlargement of the genomes, possibly due to the unusual expansion and contraction history of the COMT gene family in these species. The phylogenetic tree indicated that C. bungei may have a closer genetic evolutionary relationship with Sesamum indicum than with the other plants investigated in this study; this result was similar to those of our studies on other genes (unpublished data, the relevant paper is under preparation now).
In our study, most CbuCOMT s had one to three introns, and within the same phylogenetic group, members generally had similar exon-intron structures. For example, CbuCOMT s in group II had one intron or none, whereas CbuCOMT s in group III had two or more introns, indicating that the evolution of the COMT gene family may be closely related to the diversification of gene structures. Similar results have been obtained in other gene families . It has been reported that genes with lower intron densities are rapidly expressed after induction because introns may affect expression efficiency through at least three possible mechanisms, delaying transcript production by (1) splicing, (2) additional length of the nascent transcript, or (3) increased energetic cost due to increased transcript length (Jeffares, Penkett & Bähler, 2008). In our study, most CbuCOMT s clustered in groups I and II had fewer introns than CbuCOMT s from group III, suggesting a possible faster response to induction, however, it still needs to be further demonstrated.
The regulation of gene transcription is a complex process involving various proteins bound in a sequence-specific manner to cis-regulatory elements present in the promoter regions. In our study, numerous cis-regulatory elements related to light response were found; among these, GATA-motifs (Argüello-Astorga & Herrera-Estrella, 2018), I-box (Donald & Cashmore, 1990), GT1-motif (Gao et al., 2004), and G-box (Giuliano et al., 1988) are essential for light-mediated transcriptional activity. The results of our study suggest that CbuCOMT genes may regulate S-type lignin synthesis by interacting with light-inducible proteins and that 14 CbuCOMT gene promoters also contained motifs for circadian cycles. In higher plants, the expression of a large number of genes is under circadian regulation, including genes associated with photosynthesis, starch mobilising enzymes, and some metabolic pathways. Hormones are key regulators of plant growth and development. ABA-, auxin-, ethylene-, GA-, and salicylic acid-responsive elements were found in the promoters of CbuCOMT genes, indicating a potential role for hormones in the regulation of CbuCOMT genes, in agreement with Kim et al. (2013), who demonstrated that the expression of a COMT gene from kenaf (Hibiscus cannabinus) increased after 6 h of treatment with salicylic acid. They also found that the COMT gene could be induced by cold, H 2 O 2 , and salt (Kim et al., 2013), which indicated that the transcription of COMT genes may be induced by both hormones and some abiotic stresses. Li et al. (2016) found that some COMT family genes in Brassica napus had higher expression levels under drought treatment than under non-stressed conditions (Li et al., 2016). Similarly, Li et al. (2015a) demonstrated that the expression of a COMT gene in Ligusticum could be significantly induced by cold and drought, but not salt (Li et al., 2015a). In our study, motifs related to stress, such as ARE, LTR, and MBS were present in some COMT promoter sequences, indicating that these COMT genes may responsible for sensing environmental stresses. Heat-responsive cis-acting regulatory elements were found in some CbuCOMT gene promoters, suggesting their possible roles in heat stress response and MBS-conferred drought response in plants (Asif et al., 2014; Table 2). Other CREs such as Box-W1, TC-rich repeats, ARE, and LTR are also involved in stress response . Our results indicate that CbuCOMT gene expression may be induced by abiotic stresses; however, this finding must be further studied.

CONCLUSION
A relatively complete basic analysis of COMT gene family members in C. bungei was performed in this study. We identified 23 genes as putative CbuCOMT genes and their expression levels in six different C. bungei tissue types were assessed using qRT-PCR. Distinctly different expression profiles among members of the CbuCOMT gene family suggest that these genes may play different roles in development. Our results provide a foundation for elucidating the functions of CbuCOMT family genes; however, further study of each family member using genetic modification is essential to resolve their specific functions.