Adaptive and degenerative evolution of the S-Phase Kinase-Associated Protein 1-Like family in Arabidopsis thaliana

Genome sequencing has uncovered tremendous sequence variation within and between species. In plants, in addition to large variations in genome size, a great deal of sequence polymorphism is also evident in several large multi-gene families, including those involved in the ubiquitin-26S proteasome protein degradation system. However, the biological function of this sequence variation is not yet clear. In this work, we demonstrate a single origin of the retroposed Arabidopsis Skp1-Like (ASK) genes using an improved phylogenetic analysis. Taking advantage of the 1,001 Genomes Project, we provide several lines of polymorphism evidence showing both adaptive and degenerative evolutionary processes in ASK genes. Yeast two-hybrid quantitative interaction assays further suggested that recent neutral changes in the ASK2 coding sequence weakened its interactions with some F-box proteins. The observation that ASK1 alleles with highly polymorphic upstream regions tend to be expressed at higher levels suggests that ASK1 expression is negatively regulated by an as-yet-unknown transcriptional suppression mechanism, which may contribute to the polymorphic roles of Skp1-CUL1-F-box (SCF) complexes. Taken together, this study provides new evolutionary evidence to guide future functional genomic studies of SCF-mediated protein ubiquitylation.

… reference sequence to assemble two allelic sequences. A variant allele was considered correctly assembled only when the newly assembled Col-0 allele was 100% identical to the reference sequence. To assemble an outgroup sequence, an amino acid sequence alignment of each ASK protein and its A. lyrata ortholog was obtained with MAFFT (Katoh et al. 2017) and used to guide the assembly of the outgroup nucleotide sequence. Sites that introduced gaps into the reference ASK sequence were removed.
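The outgroup assembly step can be illustrated with a minimal Python sketch. It assumes, without confirmation from the text, that "guiding the assembly" means threading each coding sequence onto its MAFFT-aligned protein codon by codon (back-translation), after which codon columns that would open a gap in the reference ASK sequence are dropped. The function names and toy sequences are hypothetical; `col0_assembly_ok` simply encodes the 100% identity criterion for Col-0.

```python
from typing import Tuple


def thread_codons(aligned_protein: str, cds: str) -> str:
    """Back-translate an aligned protein onto its own coding sequence.

    Each residue is replaced by the next codon of `cds`; each alignment gap
    '-' becomes '---'. A trailing stop codon in `cds` is simply ignored.
    """
    n_residues = len(aligned_protein.replace("-", ""))
    if len(cds) < 3 * n_residues:
        raise ValueError("CDS is too short for the aligned protein")
    out, k = [], 0
    for aa in aligned_protein:
        if aa == "-":
            out.append("---")
        else:
            out.append(cds[3 * k:3 * k + 3])
            k += 1
    return "".join(out)


def drop_reference_gap_codons(ref_aln: str, out_aln: str) -> Tuple[str, str]:
    """Remove codon columns in which the reference sequence has a gap."""
    if len(ref_aln) != len(out_aln):
        raise ValueError("codon alignments must have equal length")
    ref_kept, out_kept = [], []
    for i in range(0, len(ref_aln), 3):
        if ref_aln[i:i + 3] == "---":
            continue  # this site would introduce a gap into the reference
        ref_kept.append(ref_aln[i:i + 3])
        out_kept.append(out_aln[i:i + 3])
    return "".join(ref_kept), "".join(out_kept)


def col0_assembly_ok(assembled_col0: str, reference: str) -> bool:
    """Accept an allele assembly only if Col-0 reproduces the reference exactly."""
    return assembled_col0.upper() == reference.upper()


# Toy example (hypothetical sequences): the alignment places a gap in the reference.
ref = thread_codons("M-SK", "ATGTCTAAA")
lyr = thread_codons("MASK", "ATGGCTTCAAAG")
print(drop_reference_gap_codons(ref, lyr))
```

Gaps on the outgroup side (residues present in the reference but absent in A. lyrata) are kept as `---`, so the trimmed outgroup stays in frame with the reference.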

Determination of tandemly duplicated Skp1 genes
Two Skp1 genes were considered tandemly duplicated if they were separated by ≤5 intervening genes and located within 10 kb of each other.
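The tandem-duplication criterion can be written as a small filter. This sketch assumes that "≤5 genes" counts annotated genes lying between the two Skp1 loci and that "within 10 kb" refers to the intergenic distance between them; both interpretations, the `Gene` record and the toy coordinates are illustrative assumptions rather than the authors' code.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Gene:
    gene_id: str
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def tandem_pairs(all_genes: Iterable[Gene], skp1_ids: Set[str],
                 max_between: int = 5, max_gap_bp: int = 10_000) -> List[Tuple[str, str]]:
    """Return Skp1 gene pairs separated by <= max_between intervening
    annotated genes and by <= max_gap_bp of intergenic sequence."""
    ordered = sorted(all_genes, key=lambda g: (g.chrom, g.start))
    rank = {g.gene_id: i for i, g in enumerate(ordered)}
    skp1 = [g for g in ordered if g.gene_id in skp1_ids]
    pairs = []
    for a, b in combinations(skp1, 2):  # a precedes b in genome order
        if a.chrom != b.chrom:
            continue
        between = rank[b.gene_id] - rank[a.gene_id] - 1  # intervening genes
        gap = b.start - a.end - 1                          # intergenic bp
        if between <= max_between and gap <= max_gap_bp:
            pairs.append((a.gene_id, b.gene_id))
    return pairs


# Hypothetical annotation: G2 is a non-Skp1 gene lying between G1 and G3.
genes = [
    Gene("G1", "chr1", 1_000, 2_000),
    Gene("G2", "chr1", 3_000, 4_000),
    Gene("G3", "chr1", 5_000, 6_000),
    Gene("G4", "chr1", 50_000, 51_000),
    Gene("G5", "chr2", 1_000, 2_000),
]
print(tandem_pairs(genes, {"G1", "G3", "G4", "G5"}))
```

Requiring both conditions guards against gene-dense regions (many genes in a short span) and gene-poor regions (few genes over a long span) each passing on a single criterion.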

Clustering analysis
Sequences were clustered using heatmap.2 (dist method = "manhattan", hclust method = "ward.D") in R (http://www.r-project.org) to group sequences with similar evolutionary constraints on mutations, as described previously.

… gaps and mis-matched sites, but also retained a reproducible result, with 95 ± 5% of the full-length Skp1 protein sequences aligned (Figure S1), maximizing the sequence length and the number of variable sites essential for a robust phylogenetic analysis (Nei and Kumar 2000). As a proof of concept, an ASK2-rooted maximum likelihood tree built from the 19 aligned ASK protein sequences showed a topology compatible with the one reported previously (Kong et al. 2007) (Figure 1A). However, unlike the previous tree, in which intronless and intronic ASK genes are intermingled, all intron-containing ASK genes (except for ASK15, whose intron was gained after duplication) clustered at the base of the tree (Figure 1A). This result better supports a single origin of the intronless ASK genes, which were duplicated through retroposition from a highly expressed ASK gene, most likely the ancestor of ASK1.
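For readers without R, the clustering configuration described under "Clustering analysis" can be approximated as follows. In R this setting presumably corresponds to calling `heatmap.2` with `distfun = function(m) dist(m, method = "manhattan")` and `hclustfun = function(d) hclust(d, method = "ward.D")`. The Python sketch below reproduces only the tree-building part, not the heatmap. R's "ward.D" applies the Lance-Williams Ward update to the unsquared dissimilarities (unlike "ward.D2", which squares them first), so the update is implemented directly here. The input rows are hypothetical numeric profiles.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def manhattan(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(abs(a - b) for a, b in zip(u, v))


def ward_d(rows: List[Sequence[float]]) -> List[Tuple[int, int, float, int]]:
    """Agglomerative clustering with the "ward.D" Lance-Williams update
    applied to (unsquared) Manhattan dissimilarities.

    Leaves are numbered 0..n-1; the i-th merge creates cluster n + i.
    Returns (cluster_a, cluster_b, merge_height, new_cluster_size) tuples.
    """
    n = len(rows)
    dist: Dict[Tuple[int, int], float] = {
        (i, j): manhattan(rows[i], rows[j]) for i, j in combinations(range(n), 2)
    }

    def d(a: int, b: int) -> float:
        return dist[(a, b) if a < b else (b, a)]

    size = {i: 1 for i in range(n)}
    active = list(range(n))  # kept in ascending order; new ids are always largest
    merges = []
    for new in range(n, 2 * n - 1):
        a, b = min(combinations(active, 2), key=lambda p: d(*p))  # closest pair
        h = d(a, b)
        na, nb = size[a], size[b]
        for k in active:
            if k in (a, b):
                continue
            nk = size[k]
            # Lance-Williams update with Ward coefficients (ward.D variant)
            dist[(k, new)] = ((na + nk) * d(a, k) + (nb + nk) * d(b, k) - nk * h) / (na + nb + nk)
        active = [k for k in active if k not in (a, b)] + [new]
        size[new] = na + nb
        merges.append((a, b, h, size[new]))
    return merges


# Hypothetical one-dimensional profiles for four sequences.
for merge in ward_d([[0.0], [1.0], [5.0], [7.0]]):
    print(merge)
```

The quadratic search over pairs is fine for a few dozen Skp1 sequences; larger inputs would call for a nearest-neighbour-chain implementation.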

Subsequently, a maximum likelihood tree rooted with ASK2 was generated by RAxML (Stamatakis 2014) from the consensus protein sequence alignment of the 47 Arabidopsis Skp1 protein sequences. Encouragingly, the resulting phylogenetic tree showed a topology similar to that obtained for the ASK genes, and the 47 Skp1 genes from the three Arabidopsis species were intermingled in 11 clusters (Figure 1B). Because genes from different species group together within these clusters rather than by species, most Arabidopsis Skp1 genes must have been duplicated at least 5-10 million years ago (mya), before the three Arabidopsis species split (Hu et …
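The inference from intermingled clusters to pre-speciation duplication can be made explicit with a toy check: if a cluster contains genes from every species, its lineage already existed before the species diverged, so the duplication separating it from other clusters must be older than the split. The cluster memberships and the species labels `sp1` to `sp3` below are hypothetical placeholders, not the data behind Figure 1B.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical cluster membership: cluster id -> [(gene, species), ...]
Clusters = Dict[str, List[Tuple[str, str]]]


def clusters_spanning_all_species(clusters: Clusters, species: Set[str]) -> List[str]:
    """Clusters containing genes from every species; the paralog split that
    defines such a cluster is inferred to predate speciation."""
    return sorted(cid for cid, members in clusters.items()
                  if {sp for _, sp in members} >= species)


def fraction_pre_speciation(clusters: Clusters, species: Set[str]) -> float:
    """Fraction of all genes that fall in clusters spanning every species."""
    spanning = set(clusters_spanning_all_species(clusters, species))
    total = sum(len(m) for m in clusters.values())
    inside = sum(len(m) for cid, m in clusters.items() if cid in spanning)
    return inside / total if total else 0.0


toy: Clusters = {
    "c1": [("g1", "sp1"), ("g2", "sp2"), ("g3", "sp3")],
    "c2": [("g4", "sp1"), ("g5", "sp2"), ("g6", "sp3"), ("g7", "sp1")],
    "c3": [("g8", "sp1")],  # species-specific: may reflect a recent duplication or gene loss
}
all_species = {"sp1", "sp2", "sp3"}
print(clusters_spanning_all_species(toy, all_species), fraction_pre_speciation(toy, all_species))
```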

We further reconciled the gene tree derived from this maximum likelihood tree with a species tree and detected significant variation in birth/death rates among species (Figure 1C).

… Table S1). This clear orthology relationship allowed us to examine evolutionary constraints on the sequence divergence of ASK genes. We primarily applied the method of Nekrutenko et al. (2002) to test whether the Ka/Ks ratio (ω) of an ASK gene differs significantly from 1, which indicates …

In this study, our improved phylogenetic analysis resolved the inconsistency between the phylogeny of ASK genes and the single origin of retroposed ASK members. Through evolutionary selection analysis and sequence polymorphism comparison, we discovered both adaptive and degenerative evolutionary processes in the ASK family. Yeast two-hybrid quantitative interaction assays further indicated that recent neutral changes in the ASK2 coding sequence likely weakened its interactions with F-box proteins, and expression analysis across different accessions indicated that highly polymorphic upstream regions of ASK1 may contribute to adaptive roles of SCF complexes in Arabidopsis.
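Returning to the selection test described above, the Ka/Ks comparison can be sketched as a likelihood ratio test, assuming the method of Nekrutenko et al. (2002) takes its common likelihood-ratio form: the log-likelihood of a codon model with ω fixed at 1 is compared with that of a model in which ω is estimated, and twice the difference is referred to a χ² distribution with one degree of freedom. The log-likelihoods would come from codon-model fits (for example, with PAML's codeml); the values below are hypothetical. A significant result is conventionally read as purifying selection when ω < 1 and positive selection when ω > 1.

```python
import math
from typing import Tuple


def lrt_omega_one(lnl_free: float, lnl_fixed: float) -> Tuple[float, float]:
    """Likelihood ratio test of H0: omega = 1 against H1: omega estimated.

    lnl_free  -- log-likelihood with omega estimated (one extra parameter)
    lnl_fixed -- log-likelihood with omega fixed at 1
    Returns (2 * delta lnL, p-value from a chi-square with 1 df).
    """
    # Clamp tiny negative values that can arise from optimizer noise.
    stat = max(2.0 * (lnl_free - lnl_fixed), 0.0)
    # Survival function of chi-square(df = 1): P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical log-likelihoods from two codon-model fits of one ASK gene.
stat, p = lrt_omega_one(lnl_free=-1000.0, lnl_fixed=-1001.9207294103471)
print(f"2dlnL = {stat:.3f}, p = {p:.3f}")
```

The closed form via `math.erfc` avoids a SciPy dependency; with several ASK genes tested, the resulting p-values would normally also be corrected for multiple testing.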

Figure 1
A short evolutionary history of the Arabidopsis Skp1 genes within the Arabidopsis genus.
A) Phylogenetic relationships of the ASK members. Intronic genes are highlighted in red.

Manuscript to be reviewed

Quantitative interaction assay of ASK1 and ASK2 with 15 selected known F-box proteins.

The accession IDs of the F-box proteins are listed in Table S2. Mated yeast cells expressing the indicated bait and prey pairs were grown on SD-Leu-Trp medium and used for the assay. Beta-galactosidase activities are shown as means ± SD from two independent assays.
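The caption reports beta-galactosidase activities as mean ± SD of two independent assays but does not state the units. Assuming the common liquid ONPG assay and Miller units (1000 × OD420 / (t × V × OD600), with t in minutes and V in mL of culture), the summary statistic could be computed as in this sketch; all optical densities, times and volumes are hypothetical.

```python
import statistics
from typing import List, Tuple


def miller_units(od420: float, od600: float, minutes: float, ml_culture: float) -> float:
    """Assumed liquid ONPG assay formula: 1000 * OD420 / (t * V * OD600)."""
    return 1000.0 * od420 / (minutes * ml_culture * od600)


def mean_sd(values: List[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation (n - 1 denominator)."""
    return statistics.mean(values), statistics.stdev(values)


# Two hypothetical independent assays of one bait-prey pair.
assays = [miller_units(0.5, 1.0, 30, 0.1), miller_units(0.6, 1.0, 30, 0.1)]
m, sd = mean_sd(assays)
print(f"{m:.1f} ± {sd:.1f} Miller units")
```

With only two replicates the sample SD equals |x1 - x2| / √2, so the error bars mainly show assay-to-assay spread rather than a precise variance estimate.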