Genome-wide identification and function analysis of HMAD gene family in cotton (Gossypium spp.)

Background The abiotic stress such as soil salinization and heavy metal toxicity has posed a major threat to sustainable crop production worldwide. Previous studies revealed that halophytes were supposed to tolerate other stress including heavy metal toxicity. Though HMAD (heavy-metal-associated domain) was reported to play various important functions in Arabidopsis, little is known in Gossypium. Results A total of 169 G. hirsutum genes were identified belonging to the HMAD gene family with the number of amino acids ranged from 56 to 1011. Additionally, 84, 76 and 159 HMAD genes were identified in each G. arboreum, G. raimondii and G. barbadense, respectively. The phylogenetic tree analysis showed that the HMAD gene family were divided into five classes, and 87 orthologs of HMAD genes were identified in four Gossypium species, such as genes Gh_D08G1950 and Gh_A08G2387 of G. hirsutum are orthologs of the Gorai.004G210800.1 and Cotton_A_25987 gene in G. raimondii and G. arboreum, respectively. In addition, 15 genes were lost during evolution. Furthermore, conserved sequence analysis found the conserved catalytic center containing an anion binding (CXXC) box. The HMAD gene family showed a differential expression levels among different tissues and developmental stages in G. hirsutum with the different cis-elements for abiotic stress. Conclusions Current study provided important information about HMAD family genes under salt-stress in Gossypium genome, which would be useful to understand its putative functions in different species of cotton. Supplementary Information The online version contains supplementary material available at 10.1186/s12870-021-03170-8.


Background
Halophytes are ideal candidate crop for soil reclamation of heavy metal polluted soils [1]. Heavy metals (HMs), on the one hand, as micronutrient elements level (such as Fe, Cu, Zn, Co, Mn, Mo, Ni) is essential for the plant growth while become toxic in excess; on the other hand, other heavy metals (Ag + , Cd 2+ , Pb 2+ , Hg 2+ ) even at low doses, are highly toxic because of no need for life and biological roles [2]. HMs contamination significantly affects not only the plant itself, but also the soil microbial community structure and function [3][4][5]. Heavy metal stress mainly concentrated in the signaling networks of calcium signaling, hormone signaling and MAPK (mitogen activated protein kinase) signaling and peroxide, which focused on ion detoxification and transport [6,7]. Metal chelators is majorly Phytochelatins (PCs) and Metallothioneins (MTs), although MTs protects the plant from heavy metals by scavenging of the ROS and sequestration, even which is multi-resistant under abiotic stress such as cold, heat, salt, drought and so on [8,9]. Compared to metal chelators, prominent groups of heavy metal ion transport families are P-type ATPases and the cation antiporters, such as HMA (Heavy metal ATPase), ABC (the ATP-binding cassette), NRAMP (Natural resistance and macrophage protein), CDF (Cation Diffusion Facilitator), yellow-stripe-like (YSL) transporter, ZIP (the Zrt, Irt-like proteins), CAX (the cation exchanger), CTR (the copper transporters), pleiotropic drug resistance (PDR) transporters, and metal responsive transcription factor 1 (MTF-1), distributing at plasma membrane or on tonoplast membrane of cell [10][11][12][13][14]. For HMA hyperaccumulators, vacuolar compartmentalization and HMs ion long-distance translocation that depends on P-type ATPases and a set of tonoplast transporters play important role in heavy metals homeostasis [15][16][17].
HMA (Heavy Metal ATPase) belonging to P1B-type ATPases (also called CPx-ATPases), is responsible for ion detoxification/transport [29][30][31] and vacuolar compartmentalization [32]. It is interesting in double mutant that HMA not only affects the transport of heavy metals [33], but also affect the plant growth and development [33]. And in rice, the DNA methylation state was altered in response to the heavy metal stress and showed transgenerational inheritance [34]. In Sorghum bicolor, arsenic stimulates expression of the P1B-ATPase transporter through the abscisic acid signaling pathway. In addition, Antioxidant Protein1 (OsATX1), as a Cu chaperone in rice, interacts with the P1B-ATPases HMA4, HMA5, HMA6, and HMA9, resulting in Cu trafficking and distribution in order to maintain Cu homeostasis in different rice tissues [35]. In a model of semihalophyte M. crystallinum, HMA4 (heavy metal ATPase 4) and IRT2 (iron-regulated protein 2) had a significantly higher expression level compared to the control between Cd-untreated and NaCl-untreated, and effects on IRT2 expression were cumulative [36]. Moreover, salinity stress overlaps with HMs toxicity to some extent, as several integrated mechanical and chemical signals are responsible for stress-related responses [37]. For example, chloroplast and chlorophyll content can measure salt stress [38], also affect the transport of heavy metals [39,40]. Even flavonols have shown the ability in alleviating toxic effect of Pb and improving the resistance of plants, because it activated anti-oxidative process [41].
The relationship between salt and heavy metals needs more research to show that the combined application of NaCl and CuSO 4 has a significant adverse effect on wheat varieties [101]., while in cucumber (Cucumis sativus L.), salinity decreases the content of Zn uptake and increased other heavy metals (Cd, Cu, Ni, Pb) uptake [102]. What is more, there is an antagonistic effect of sodium chloride to differential heavy metal toxicity, especially to Cd 2+ [103]. In Spirodela polyrrhiza (Lemnaceae), a high level of salinity inhibits the accumulation of the cadmium (Cd) and nickel (Ni) [104]. Ni at 20 mg kg-1 will increase the growth of wheat by alleviating salinity stress [105]. Additionally, Cd inhibited the Cu absorption of the root system [106], and cadmium was more toxic than copper on plants [107]. So far, the most researches about microorganisms have been reported both salt-tolerant and heavy-metal resistant, some of which can alleviate the heavy metal and salt stress in plants [108][109][110][111]. In addition, halophytes [112] and semi-halophyte [36] is known to be related to both salt and heavy metals. Besides, the eggplant breeding lines resistant against salt and drought stresses had higher Pb tolerance [113]. In willow species, S. linearistipularis had higher salt tolerance than S. matsudana, which plays important roles in heavy metal phytoextraction [106,114].
Cotton (Gossypium L.), as a moderately salt-tolerant cash crop, is a pioneer crop for soil reclamation of saline-alkaline land [115,116]. And cotton is an important fiber crop which provides the natural fiber for the textile industry [117]. Previously, much progress has been made in the identification of HMAD (heavy-metalassociated domain) genes in different plants [118][119][120]. However, there are no detail study has been reported in the identification, functional characterization, conserved domain analysis and expression profiles of the HMAD genes under salt-stress condition in cotton. The released genome sequence data of cotton and a publicly available database on CottonGen (https://www.cottongen.org/) allow us to comprehensively identify and analyze the HMAD gene family in cotton [117]. In this study, we conducted a comprehensive identification of HMAD genes in G. hirsutum, G. barbadense, G. raimondii and G. arboreum, with their chromosomal distribution, syntenic analysis, gene structure and conserved motifs analysis, as well as Ka/Ks values and expression pattern. In addition, predicted regulatory mechanism showed 111 HMAD genes were possibly regulated by salt-stress. This study will provide the basic information to explore the specific functions of HMAD gene family in cotton under salt-stress.

Genome-wide identification and phylogenetic analysis
We used the Hidden Markov Model (HMM) profile of HMAD domain (PF00403) from Pfam (http://www.pfam. sanger.ac.uk/) database as queries to search the HMAD members in G. hirsutum, G. arboreum, G. raimondii and G. barbadense by Hmmer software with default parameters. A total of 169 proteins were identified belonging to the HMAD gene family in G. hirsutum with the number of amino acids ranged from 56 to 1011 (Table 1). Furthermore, we identified 84, 76 and 159 HMAD proteins in each G. raimondii, G. arboreum and G. barbadense, respectively (Table S1).
In order to explore the evolutionary relationships of the HMAD gene family, an unrooted phylogenetic tree was constructed using the full length HMAD protein sequences from G. arboreum, G. barbadense, G. raimondii, G. hirsutum (Fig. 1). The HMAD proteins in the four Gossypium species were divided into five groups (I, II, III, IV, Va, Vb, Vc), which the P1B-ATPases HMA5-8 belongs to IV group (Table S3). Additionally, 87 orthologs of HMAD genes ( Table 2) were identified in four Gossypium species (I account for 18.39%, II account for 18.39%, III account for 1.15%, IV account for 10.34%, Va account for 1.15%, Vb account for 20.69%, Vc account for 29.89%) ( Fig. 1), such as genes Gh_D08G1950 and Gh_A08G2387 of G. hirsutum are orthologs of the Gorai.004G210800.1 and Cotton_A_25987 gene in G. raimondii and G. arboreum, respectively.

Chromosomal distribution and syntenic analysis
Physical mapping of the 169 G. hirsutum HMAD genes showed that 79 and 77 HMAD genes were variably distributed on 26 chromosomes of the A and D subgenomes, respectively (Fig. 2), among which 13 genes localized in scaffold. Additionally, a maximum of 17 and 16 genes were localized on the paralogous chromosome 12 of the A sub-genomes and D sub-genomes. Moreover, there were nine pairs and two gene clusters were marked as tandem duplication based on the criteria of less than five intervening genes. Among these tandem duplication genes, five pairs and two clusters belonged to group Vb except of Gh_D05G1684 -Gh_D05G1685 and Gh_A05G1510 -Gh_A05G1511pairs, which belonged to group III. To study the locus relationship of orthologs between the A and D sub-genomes, we also performed synteny analysis. 72 and 73 HMAD genes    were unevenly mapped onto 13 chromosomes of G. arboreum and G. raimondii, respectively. In G. arboreum, each chromosome contained 2-11 HMAD members. Chromosome 12 contained 11 HMAD genes, while chromosome 5 and chromosome 8 had two HMAD genes, respectively. And one gene of G. arboreum on chromosome 12 correspond to Gh_Sca013298G01 in scaffold13298 (Fig. 3). In G. raimondii, the number of each chromosome genes ranged from 1 to 15 HMAD members. Chromosome 8 contained maximum 15 HMAD genes, while chromosome 13 had only one HMAD gene. Otherwise, one gene of G. raimondii on chromosome 6 correspond to Gh_Sca004952G01 in scaf-fold4952 (Fig. 3). The result of synteny analysis indicated that most of the HMAD genes loci were highly conserved between the A and D sub-genomes respectively (Fig. 3), and 15 genes were lost during evolution, among which 4 in A genome (Cotton_A_04626, Cotton_A_25931, Cotton_A_00150, Cotton_A_35231), 11 in D genome (Gorai.001G250300.1, Gorai.005G218500.1, Gorai.005G220100.1, Gorai.007G134300.1, Gorai.007G295300.1, Gorai.008G005700.1, Gorai.009G162900.1, Gorai.009G199900.1, Gorai.009G414800.1, Gorai.012G027800.1, Gorai.008G245900.1). We surveyed the collinear relationship among the orthologous HMAD genes between G. barbadense and G. hirsutum (Fig. S2). There were 161 pair genes in G. barbadense and G. hirsutum. In G.

Analysis of gene structure and conserved motifs
Gene structure is important to determine its role in showing the phylogenetic relation between the HMAD genes. A NJ tree was generated with MEGA using all the HMAD protein sequences from G. hirsutum and gene structure was determined (Fig. 4). As shown in the Fig.  4, HMAD genes from G. hirsutum were divided into five subclades (group I, group II, group III, group IV, group Va and group Vb, among which, group I contained 13 genes while group II to group Va and group Vb contained 66, 29, 14, 22 and 25 genes, respectively. Furthermore, the analysis of gene structure showed that the introns in the gene structure are particularly variable among of HMAD gene family, which include 5 genes (Gh_D01G1640, Gh_Sca011408G01, Gh_A05G3385 of group I, Gh_D08G1263 and Gh_A08G0990 of group Vb) without intron, 35 genes with 1 intron, 79 genes with 2 introns, 23 genes with 3 introns, 17 genes with 4 introns, one gene (Gh_A05G0564 belonging to P1B-ATPases HMA5) with 7 introns, 3 genes (Gh_D05G0693 belonging to P1B-ATPases HMA5, Gh_A12G0443 belonging to P1B-ATPases HMA7, Gh_D12G0446 belonging to P1B-ATPases HMA7) with 8 introns, one gene (Gh_ D07G0041) with 15 introns, 3 genes (Gh_D06G0881 belonging to P1B-ATPases HMA8, Gh_A06G0745 belonging to P1B-ATPases HMA8 and Gh_A03G1525 belonging to P1B-ATPases HMA6) with 16 introns. Gh_ D06G0881 and Gh_A06G0745 was divided into cluster I between the four Gossypium species (Fig. 1), and in G. hirsutum (Fig. 4). Gh_A03G1525 was divided into cluster 1 between the four Gossypium species (Fig. 1), whereas it was grouped into cluster II in G. hirsutum (Fig. 4). Gh_A12G0443, Gh_D12G0446, Gh_D05G0693, Gh_ A05G0564 was divided into cluster I between the four Gossypium species (Fig. 1), whereas it was grouped into cluster III in G. hirsutum (Fig. 4). Though the number of genes used for generating this phylogenetic tree was different from the phylogenetic tree shown in Fig. 1, the gene members within the subclades were nearly same.
To investigate the presence of domain sequence and the degree of conservation of the HMAD domain in G. hirsutum, we performed multiple sequence alignment by using the full-length sequences of the HMAD family proteins. The result of different HMAD protein groups indicated that five conserved motifs were identified in the sequences of HMAD family proteins, and the order of motifs on each family protein was not exactly the same (Fig. 5a). In addition, we also analyzed the conserved HMAD domain in all family proteins by multiple sequence alignment, and found a highly conserved motif presence in the domain (Fig. 5b), in which, an anion binding box (CXXC) exist in the catalytic center. Consistent with previous studies [122,123], the anion binding box with two conserved cysteines as the metal binding.
Based on the Ka/Ks ratio, it could be assumed that Darwinian positive selection was linked with the HMAD gene divergence after duplication [124,125]. In our study, we found that 79 genes pairs had low Ka/Ks ratios (smaller than 0.5) and 24 gene pairs had the Ka/Ks ratios between 0.5 and 1.0. And 13 genes pairs had Ka/Ks larger than 1, might be due to relatively rapid evolution following duplication ( Table 2). As most of the Ka/Ks ratios were smaller than 1.0, we presumed that the cotton HMAD gene family had undergone strong purifying selection pressure with limited functional divergence that occurred after segmental duplications and whole genome duplications (WGDs).

Expression profile of HMAD genes across different tissues and different stress conditions in TM-1
To understand the temporal and spatial expression patterns of different HMAD genes, publicly deposited RNAseq data was used to assess the expression profile across different tissues (root, stem, leaf, torus, petal, stamen, pistil, calycle), developmental stages (−3dpa ovule, −1dpa ovule, 0dpa ovule, 1dpa ovule, 3dpa ovule, 5dpa ovule, 10dpa ovule, 20dpa ovule, 25dpa ovule, 35dpa ovule, 5dpa fiber, 10dpa fiber, 20dpa fiber, 25dpa fiber) and stresses treatment (1 h treated with cold, 3 h treated with cold, 6 h treated with cold, 12 h treated with cold, 1 h treated with hot, 3 h treated with hot, 6 h treated with hot, 12 h treated with hot, 1 h treated with salt, 3 h treated with salt, 6 h treated with salt, 12 h treated with salt, 1 h treated with PEG, 3 h treated with PEG, 6 h treated with PEG, 12 h treated with PEG). Results showed that the 169 genes can be divided into 10 groups, which include cluster 1 with two genes (Gh_ A08G1780, Gh_D08G2126), cluster 2 with two genes (Gh_A05G3446, Gh_D04G0145), cluster 3 with just one gene (Gh_A01G1576), cluster 4 with two genes (Gh_ Fig. 2 Mapping of the HMAD genes in the chromosomes. Partial HMAD genes localized in scaffolds. White color bar indicated the chromosomes from At and Dt sub genomes of G. hirsutum. At_chr1-At_chr13 represented the chromosomes from At sub genome while Dt_chr1-Dt_chr13 represented the chromosomes from Dt sub genome. Genes' chromosomal locations calculated from published genome data were presented at the left side of each chromosome of At and Dt sub genome; and corresponding gene names were written at the right of each chromosome of At and Dt sub genome D01G1883, Gh_D12G1886), cluster 5 with just one gene (Gh_D03G0414), cluster 6 with just one gene (Gh_ D09G0521), cluster 7 with three genes (Gh_A01G1399, Gh_D01G1640 and Gh_Sca011408G01), cluster 8 with two genes (Gh_D04G0001, Gh_Sca013298G01), cluster 9 with four genes (Gh_A12G2296, Gh_D10G2047, Gh_ D12G2433 and Gh_D12G2434), cluster 10 with 151 genes (Fig. S3).
In cluster 1, the expression level was higher in torus, ovule development every once day, 25dpa fibers and stresses treatment after 6 h. In cluster 2, the expression level was all high (except petals, −3dpa ovule, −1dpa ovule, 0dpa ovule, 1dpa ovule, 3dpa ovule, 5dpa ovule, 10dpa ovule). In cluster 3, the expression level was higher in calycle tissue and stresses treatment after 1 h, which decreased gradually. In cluster 4, the expression level was higher in calycle tissue, 1 h treated with cold, 1 h treated with hot, 1 h treated with salt, 3 h treated with salt, 1 h treated with PEG, 3 h treated with PEG. In cluster 5, the expression level was higher in root tissue and 1dpa ovule. In cluster 6, the expression level was higher in pistil tissue and ovule development especially at 3dpa, 5dpa and 35dpa. In cluster 7, the expression level was higher in leaf tissue and ovule development especially at -1dpa ovule. In cluster 8, the expression level was higher in root, petal, stamen and pistil. In cluster 9, the expression level was higher in torus tissue, calycle, 6 h treated with hot, 6 h treated with salt and 12 h treated with PEG. In cluster 10, there was the largest number of 151 genes, but most of whose expression level were low or even none. While some genes expression level is different, for example, Gh_D05G1684 highly expressed in the 10dpa in fiber. Interestingly, we found that some HMAD genes highly expressed under stress condition (Fig. 6). For example, Gh_D08G0132 and Gh_A05G1510 highly expressed after 12 h of the salt stress condition, while Gh_A01G1576 highly expressed after 1 h of the stress condition (cold, salt, PEG). Gh_A09G1374, Gh_ D09G1375, Gh_D10G0078 expression level increased under stress condition (cold, salt, PEG).

Core promoter element analysis
To further explore why HMAD gene family highly expressed under biotic stress condition except heavy metal, the core promoter element of HMAD genes from G. hirsutum were divided into four types (hormone, stress, tissue and others) (Fig. 7), among which, element involved in hormone-responsiveness mainly contained . Element involved in defense and stress responsiveness mainly contained drought, low-temperature, dehydration, salt stress, anaerobic, among which, 72 genes involved in drought, 51 genes involved in low-temperature responsiveness, 55 genes involved in defense and stress responsiveness with TC-rich repeats element, and 1 gene (Gh_ D04G1066) both involved in salt and low-temperature responsiveness. In total, there were 111 genes of 169 HMAD genes with core promoter element responding to stress. As described above in TM-1 RNA-seq data, 12 of the 18 genes were highly expressed with at least one abiotic stress-related promoter element (Table S2). Element involved in tissues including the palisade mesophyll cells, meristem, endosperm, seed-specific. And element involved in Fig. 5 Logo of conserved motifs in HMAD domain in G. hirsutum. a: Conserved motifs were predicted from MEME (http://meme-suite.org/tools/ meme). The length of proteins were estimated using the scale at the bottom. The motifs were displayed in the different colored boxes with various numbers, black line indicated the non-conserved amino acid. b: Logo of conserved motif was predicted from MEME (http://meme-suite.org/tools/meme) other's function, such as circadian control, cell cycle, flavonoid biosynthetic. It was interesting that 9 of 12 genes with element of flavonoid biosynthetic were along with other's stress element. In previous study, anthocyanins, as secondary metabolites, may respond to stress resistance through osmotic equilibrium [126][127][128]. For example, Gh_A01G1576 highly expressed after 1 h of the stress condition (cold, salt, PEG), whose core promoter element contained drought-inducibility, low-temperature responsiveness and MBSI promoter element involved in flavonoid biosynthetic genes regulation (Table S2).

The expression level of HMAD gene in different tissues under Na 2 SO 4 stress
To identify the function of HMAD genes under other abiotic stress, we used the material Zhong 9835 [129]. Based on the HMAD gene family of RNA-seq data (Fig. 8) in Zhong 9835 (Table S5), 14 genes significantly expressed differentially in roots, stems and Fig. 6 Expression levels of HMAD genes in different tissues and different stress. A phylogenetic tree at left from HMAD protein sequences was constructed with MEGA X using neighbor-joining method. The heatmap at right was generated on the basis of RNA-seq data from the website (http://structuralbiology.cau.edu.cn/gossypium/). The color bar represents the expression values. The color scale was shown at the right of the figure. Higher expression levels were shown in red, and lower in green leaves between control and treatment with 300 mM Na 2 SO 4 (Table S4, Fig. S1), in which 10 genes with at least one core promoter element about stress (Table S2). It is interesting to note that 3 of 4 flavonoid biosynthetic element were along with the stress element. More important, some genes highly expressed in both TM-1 and Zhong 9835 under stress condition, such as Gh_D04G0145, Gh_ D10G0078, Gh_Sca011408G01, Gh_A01G1576 and so on.

Discussion
In this study, HMAD family genes from G.arboreum (84 genes), G. raimondii (76 genes), G. hirsutum (169 genes), and G. barbadense (159 genes), respectively were identified, which contain the total numbers of HMAD genes in the two diploid cotton (G. arboreum and G. raimondii), as A and D genome donor species, were lower than that in allotetraploid (G. hirsutum and G. barbadense) cotton. Syntenic analysis of the HMAD gene family in four cotton species revealed that 4 genes in G. arboreum and 11 genes in G. raimondii were lost during evolution, while 24 genes appeared in G. hirsutum, showing that these genes played a critical role in cotton evolution. As most of the Ka/Ks ratios were smaller than 1.0, we presumed that the cotton HMAD gene family underwent strong purifying selection pressure with limited functional divergence. These results suggested that there was possible gene loss and/or as a result of chromosome rearrangement during the evolution [121].
169 G. hirsutum genes were identified belonging to the HMAD gene family. The molecular weights (kDa) of 169 HMAD proteins ranged from 5.8 to 108.5 kDa ( Table 1). The isoelectric point (pI) of the majority of the 169 HMAD proteins was alkaline except for 55 genes less than 7.6 ( Table 1). The various molecular weight and gene sequence length indicated that the physical and chemical properties of HMAD family genes have little difference. Based on the WoLF PSORT analysis, the HMAD family genes are mainly distributed in the chloroplast (62 genes), the cytosol (54 genes), the nucleus (39 genes) and the plasma membrane (11 genes) ( Table 1). 169 HMAD genes were divided into five subclasses: I, II, III, IV, Va, Vb, among which the II subclass contained the highest number of genes (66 members) and followed by III subclass (29 members). Structural analysis of the 169 HMAD gene family showed that just 5 genes (Gh_D01G1640, Gh_Sca011408G01, Gh_ A05G3385 of group I, Gh_D08G1263 and Gh_A08G0990 of group Vb) contained no intron. While the rest of the HMAD genes contain multiple introns, especially P1B-ATPases HMA5-8 contains most introns than other genes. Among the gene functional annotations of 169 HMAD genes, the number of Heavy metal transport/detoxification superfamily proteins is 116 (Table S5), which are divided into four categories between the phylogeny tree and gene structural. 13 genes pairs had Ka/Ks larger than 1, which includes 13 Heavy metal transport/detoxification superfamily proteins (Gh_A05G2686, Gh_ A08G1875, Gh_A10G1490, Gh_A10G2083, Gh_ Gh_A12G0038, Gh_A12G0079, Gh_ A13G2272, Gh_D05G0215, Gh_D05G2984, Gh_ D07G0938, Gh_D08G2237, Gh_D10G1733). Additionally, the signature of four conserved amino acids CXXC for binding metal ions was discovered through sequence alignment [54,55]. The classified genes and conserved motifs with conserved amino acids CXXC for binding metal ions indicated that 169 HMAD genes may be different response to heavy metals in various organelles, especially some Heavy metal transport/detoxification superfamily proteins under relatively rapid evolution.
Gene expression patterns with the differentiation of promoter regions can provide important insights to gene function [130]. After the RNA-seq data of TM-1 analysis, the most genes of expression level cluster 10 with 151 genes had a lower expression level or none. And after the promoter element analysis of four types (hormone, stress, tissue and others), there were 111 genes of 169 HMAD genes with core promoter element responding to stress. The results showed that 169 HMAD genes were not widely expressed in tissues as well as under stress condition (cold, salt, PEG) (Fig. 6), indicating their Cotton is half halophytes, and Zhong 9835 was resistance to salt [131], including Na 2 SO 4 . Based on the transcriptome data of TM-1, we found that heavy metal transport protein highly expressed under adversity abiotic stress condition. Further, through gene sequences and promoter element analysis, we found that HMAD evolution speed was quickly, which divided into five types of HMAD family, and some of those genes with responding to stress element had a highly expression under adversity abiotic stress condition. According to the analysis of the root, stem and leaf between Na 2 SO 4 treatment and control, 14 genes with stress element significantly expressed differentially (Fig. S1). HMAD highly expressed under salt condition, probably because of ROS caused by ion balance [6]. For example, on the one hand, gene expression in ROS way and ion balance maintenance, such as Ca 2+ signaling pathway and MAPK, MYB transcription factor [132,133], programmed cell death [134,135]. And then the GSH, as the main way to remove ROS under the condition of high concentration, can not only response to heavy metal ions [136], also can response to salt stress ion [137]. At last, the balance of ions, such as anthocyanins were associated with the salt stress [6]. HMAD with anthocyanins related promoter elements highly expressed under Na 2 SO 4 condition, similar to previous study that anthocyanins involved in resistance to salt, at the same time involved in heavy metal transport [137]. On the other hand, the transfer of heavy metals and salt stress are vacuole segregation [138,139], such as the P-type ATP as an important role, can not only balance the salt ions and also can balance of heavy metal ions [140,141].
The sequences of HMA (Heavy Metal ATPase) of P1B-ATP from G. hirsutum based on the sequences of HMA in A. thaliana, also contained P-ATPases (E1-E2 ATPases) and HAD (halo acid dehydrogenase) domain and HMAD (heavy-metal-associated domain) domain (Table S3). In this study, HMAD gene family contained HMA5-HMA8 (except Gh_A08G2387) (Table S3). HMA5 localized in the plasma membrane, of which Gh_ A05G0564, Gh_A08G2388, Gh_D05G0693 with 8 TMHs, while Gh_D08G1950 with 6 TMHs. In HMA6, Gh_A03G1525 with 7 TMHs localized in the plasma membrane, whereas Gh_A04G0969 and Gh_D04G1512 without TMHs localized in the chloroplast. HMA7 and HMA8 localized in the plasma membrane with 8 TMHs and 5 TMHs, respectively. Obviously, in cotton HMA genes evolutionarily adapted quickly in the TM region through the analysis of the sequence, gene structure, Ka/ Ks ratio and the phylogenetic tree [144].

Conclusions
In summary, we identified 169, 159, 76 and 84 fulllength putative HMAD genes in G. hirsutum, G. barbadense, G. arboreum and G. raimondii, which were much larger than that of the other gene families. We also found that HMAD gene family with promoter elements in response to stress, may plays important roles in different abiotic stress. Our results provided a foundation to further explore the crosstalk of molecular mechanism of HMAD genes under abiotic stress and heavy metal condition.

Phylogenetic analysis
The multiple sequence alignment of HMAD domain sequence containing genes of four cotton species was accomplished by ClustalX2 software [146] with default parameters. The unrooted phylogenetic tree was constructed by the neighbour joining tree (NJ) in MEGA X software [147] (https://www.megasoftware.net/) with the bootstrap analysis for 1000 iterations and ggtree (v2.2.4) packages [148] of R (v4.0.3) software.

Chromosomal mapping and gene duplication
The physical location data of HMAD genes were retrieved from genome sequence data of four cotton species, and was subsequently used to map these genes using Mapchart-2.23 [149]. Synonymous and non-synonymous rates of evolution were computed using the maximum likelihood method by the Ka/Ks calculator 2.0 [150].

Gene structure and domain analysis
The exon and intron organizations of HMAD genes inferred in the gene structure display server (http://gsds. cbi.pku.edu.cn/) through comparison of genomic and CDS sequences. The conserved motifs in HMAD genes were identified by MEME (http://meme-suite.org/tools/ meme) and TBtools-0.6673 [151].
Genome wide synteny analysis of HMAD genes A BLASTP comparison was used to obtain the pair wise gene information between two allotetraploid cotton species (G. hirsutum and G. barbadense) and two diploid cotton species (G. raimondii and G. arboreum). According to the BLASTP output, the synteny analysis was constructed using circos-0.69-3 software package (http:// circos.ca/software/) with default parameters.

Analysis of cis-elements in the promoters
Promoter element sequences extracted from upstream 2000 bp of genes, cis-element were found through Plant CARE database (http://bioinformatics.psb.ugent.be/ webtools/plantcare/html/).

RNA-seq between control and treatment with Na 2 SO 4
Zhong 9835, a preserved self-bred line from cultivar of G. hirsutum by our lab, was used for this study. Seeds were sown in sand soil pots. The sand was washed cleanly and sterilized at 121°C for 8 h. Four seedlings in each pot were cultivated in a 28°C/16 h light and 25°C/ 8 h dark cycle with a light intensity of 150 μmol·m-2·s-1 and 75% relative humidity for approximately 30 days. Then, 300 mM Na 2 SO 4 after 12 h was chosen as the applicable stress concentration and time. Seedlings transplanted into normal water were used as controls. After exposure for 12 h, leaf, stem and whole root samples were collected. Each sample was tested three times. Samples were frozen in liquid nitrogen and stored at − 80°C.

RNA extraction and qRT-PCR analysis
Total RNA was isolated from root, stem and leaf between control and treatment with 300 mM Na 2 SO 4 in the Zhong 9835 by the EASY spin Plant RNA Kit (TIANGEN). Afterwards, first-strand cDNA was synthesized using Prime Script TM II 1st strand cDNA Synthesis Kit (TaKaRa) according to the manufacturer's instructions. The qRT-PCR was carried out in 20 μL volume containing 1.4 μL cDNA, 0.8 μL of 10 μM forward and reverse primer, 10 μL SYBR Premix Ex Taq II (2×), and 7.8 μL ddH 2 O. PCR amplification was performed under the denaturation at 95°C for 30 s; 40 cycles at 95°C for 5 s and 60°C for 30 s; followed by 95°C for 15 s, 60°C for 1 min by Bio-Rad CFX96 Real-Time PCR system. qRT-PCR was carried out by the gene-specific primers, His-tone3 (AF024716) (F: TCAAGACTGATTTGCGTT TCCA, R: GCGCAAAGGTTGGTGTCTTC) was employed as an internal control. In the end, relative gene expression was quantified using the 2 -△△Ct method.