Arid5a uses disordered extensions of its core ARID domain for distinct DNA- and RNA-recognition and gene regulation

AT-rich interacting domain (ARID)-containing proteins, Arids, are a heterogeneous DNA-binding protein family involved in transcription regulation and chromatin processing. For the member Arid5a, no exact DNA-binding preference has been experimentally defined so far. Additionally, the protein binds to mRNA motifs for transcript stabilization, supposedly through the DNA-binding ARID domain. To date, however, no unbiased RNA motif definition and clear dissection of nucleic acid–binding through the ARID domain have been undertaken. Using NMR-centered biochemistry, we here define the Arid5a DNA preference. Further, high-throughput in vitro binding reveals a consensus RNA-binding motif engaged by the core ARID domain. Finally, transcriptome-wide binding (iCLIP2) reveals that Arid5a has a weak preference for (A)U-rich regions in pre-mRNA transcripts of factors related to RNA processing. We find that the intrinsically disordered regions flanking the ARID domain modulate the specificity and affinity of DNA binding, while they appear crucial for RNA interactions. Ultimately, our data suggest that Arid5a uses its extended ARID domain for bifunctional gene regulation and that the involvement of IDR extensions is a more general feature of Arids in interacting with different nucleic acids at the chromatin–mRNA interface.

Among the large number of DNA-binding proteins (DBPs), ARIDs compose a distinct family of nuclear proteins with manifold functions in cellular processes alongside transcriptional regulation (reviewed in (1, 2)). ARID proteins are classified with respect to their shared DNA-binding domain, named AT-rich interactive domain (ARID), reflecting the supposed preference for AT-rich DNA (3, 4). Beyond that, ARID-containing proteins (further referred to as 'Arids' for the sake of clear distinction from the ARID domain) are diverse in size and domain architecture, based on which the 15 known human Arids are divided into seven subfamilies (5). All ARID domains share a conserved fold, comprising a minimal core structure of six α-helices (H1 to H6, Fig. 1), with H3/4 and H5 forming a central helix-turn-helix (HTH) motif, a widespread DNA-binding unit of DNA-binding domains (5-7). Turn-containing motifs, similar to the HTH, are in principle also capable of recognizing dsRNA (8). It is thus not surprising that the general ability of nucleic acid-binding proteins to interact with both DNA and RNA (DRBPs) is now considered more widespread than previously thought (9). Still, most DRBPs are assumed to exploit distinct domains to interact with DNA and RNA, respectively, as known, for example, for Sox2 (10) and SAFB proteins (11). Yet, certain domains, such as zinc finger motifs, were found early on to interact with both types of nucleic acids, as described, for example, for the Xenopus laevis protein TFIIIA (12).
Arid5a is the only Arid representative described as capable of binding both RNA and DNA (13). The Arid5 family members 5a and 5b share the least conserved domain architecture among Arids. Their ARID domains, however, are 73% identical (5). The large divergence of Arid5a and 5b reflects distinct functions: Arid5b is categorized as a transcriptional coactivator with essential roles in adipogenesis and liver development, involving chromatin interaction (14). The existing high-resolution information for an Arid5b ARID-DNA complex has been obtained with the supposedly specific dsDNA consensus motif 5′-AATA[CT]-3′ (15, 16). However, the motif has only been challenged by single-nucleotide exchanges and, more importantly, motif expansions have not been tested. At the same time, a possible capability of the Arid5b ARID domain to interact with (ds)RNA has not been investigated, as is true for all other Arids.
Arid5a, significantly smaller in size, has been classified mainly as a transcriptional repressor (17), for example, for nuclear hormone receptors (18). On the other hand, Arid5a is thought to function actively in the transcription of specific genes and support gene de-repression by histone acetylation together with Sox9 (19). In 2013, Arid5a was termed an RNA-binding protein (RBP), stabilizing the mRNA of Il-6 (13), thus counteracting the degradation mediated by the regulatory RBPs Regnase-1 and Roquin (20). Follow-up work suggested additional targets of Arid5a in an immunological context, among them Stat3 (21) and Ox40 (22), soon categorizing the protein as a pro-inflammatory factor. In these studies, the ARID domain is claimed to interact with particular RNA stem-loop structures, known to exist in Stat3 (21), Ox40 (23, 24), and possibly also the Il-6 3′-UTR, suggesting shape-specific recognition of RNA cis-regulatory elements similar to Regnase-1 and Roquin (23, 25-27). Though RNA recognition has indirectly been attributed to the Arid5a ARID domain in mice (21), direct proof of its interaction with RNA is still missing. At the same time, we have no insight into how Arid5a uses its ARID domain to distinguish between specific DNA- and RNA-binding and whether flanking regions are involved.
Arid5a was initially found differentially expressed in tissues unrelated to the adaptive immune system, but with a clear nuclear localization, in line with transcriptional regulation (17). Recent work has extended both findings, with the protein being able to shuttle upon lipopolysaccharide stimulation in immune cells (28), a prerequisite for transcript protection against cytoplasmic nucleases. It remains unexplored how RNA motif preferences of the Arid5a ARID domain are related to this, but they should exist independent of cellular localization. To date, there is no systematic study identifying the transcriptome targeted by Arid5a independently of its immunological role.
In Arids, the so-called core ARID can appear as an N- and/or C-terminally extended domain (Fig. 1), that is, additional helices (H0, H7) or intrinsically disordered regions (IDRs) enlarge the interface with nucleic acids and likely modify preferences. The capacity of IDRs to modify the function and specificity of DBPs has recently been highlighted for transcription factors (TFs), many with a previously unknown affinity to RNA mediated through their IDRs (29). While well conceivable as a more general feature, for example, for compartmentalizing (co)-transcriptional processes, no structural proof exists for a simultaneous or mutually exclusive interaction of protein domains with RNA and DNA. Similarly, the lack of high-resolution ARID structures with DNAs, with only few exceptions, has hindered the identification of concepts of specific target recognition through core domains and in combination with flanking regions. As such, most motifs assigned to individual Arids are derived from genetic studies or do not unambiguously define the ARID domain as responsible for interactions. Moreover, the currently known studies have not addressed binding of RNAs by Arids.
We here present a systematic analysis of the Arid5a ARID domain towards specific DNA- and RNA-binding. Using a combination of NMR and EMSAs, we compare nucleic acid recognition of the core with the IDR-extended ARID. Our work provides unambiguous proof for the dual nucleic acid recognition by the domain. We provide in-depth evidence for its preference towards specific AT-DNA motifs, while RNA Bind-n-Seq (RBNS) reveals a preference for an unexpected CAGGCAG consensus motif, accompanied by a general preference for AU-rich motifs (Data Table S1). We find that the ARID-flanking IDRs strongly modulate affinity for complex RNA and nonspecific DNA sequences. We show that Arid5a exists in the nucleus under unstressed conditions and perform the first individual-nucleotide resolution UV crosslinking and immunoprecipitation (iCLIP2) experiment to map Arid5a-binding sites throughout the transcriptome and identify an in vivo (A)U-rich consensus target RNA motif. We find Arid5a to bind RNA-processing related nascent transcripts. While we suggest Arid5a to mainly exert DBP functions, we show that extended ARID domains of other Arids have a similar capacity to interact with RNA. Thus, we stress the idea of Arids as more general dual nucleic acid-binding proteins. We suggest an essential role of the ARID-extending IDRs in nucleic acid recognition, in particular for, but not restricted to, Arid5a.

Highly conserved ARID domains show distinct DNA-binding preferences
Doubts have grown over recent years as to whether all eponymous AT-rich interactive domains of the 15 human Arids share an exclusive preference for AT-rich sequences (recently reviewed by Korn and Schlundt (5)). Indeed, controversial sequence preferences reported for a number of Arids gave reason to probe for individual target sequences in an unbiased manner (1, 30-32). We thus picked representative ARID domains from three subfamilies that had been described to target DNA with different sequence preferences (Fig. 2A). While some literature describes Arid1a to bind DNA nonspecifically through its ARID domain, the ARID domains of Arid5b and JARID1a are suggested to be specific for AT- and GC-rich dsDNA, respectively (1, 16, 32). We used fluorescently labeled AT- or GC-rich dsDNA to monitor preferences of these ARID domains in EMSAs (Figs. 2B and S1). Interestingly, the ARID domains of Arid1a and JARID1a are less specific for AT-rich DNA than the ARID domain of Arid5b (see also Fig. S1). Furthermore, and in line with multiple studies (30, 31, 33), the Arid1a ARID domain displays similar affinity for a GC-rich dsDNA, supporting its nonspecificity for DNA. In summary, the data argue against ARID domains as exclusive AT-binders and raise the need to carefully define de novo and interpret available consensus motifs for the individual domains despite their highly conserved fold.

The Arid5a ARID domain uses an extended binding interface with AT-rich DNA
Because of the above-described variance in the DNA-binding preferences of ARID domains, we first decided to investigate the Arid5a ARID domain's sequence preference. Although 9mer dsDNA sequences were sufficient for binding, we observed a minor increase in affinity with longer dsDNAs plateauing at 13 bp length (Fig. S3) and thus used 13mers for our study. In EMSAs, we tested ARID 37-183 against fluorescently labeled dsDNAs, either GC-rich based on Jarid1a binding to a "CCGCCC" motif (32) or with variations of a central AT-stretch based on a published motif for the closely related Arid5b ARID (16) (Table S3 and Fig. S3). We find that the Arid5a ARID domain clearly favors AT-rich dsDNA, and a central "AATA" motif is important, as evident from EMSA-derived affinities around 0.8 to 2.3 μM (13merAT WT, var1, var2, and var4), in contrast to the DNAs without four consecutive A/Ts, for which no K_D could be derived (Fig. S4).
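An apparent K_D of the kind quoted above can be estimated by fitting the fraction of bound DNA in each lane to a binding isotherm. The following is a minimal sketch, assuming a simple one-site model with the labeled DNA far below K_D; the function names and the titration values are hypothetical, and the fitting procedure actually used in the study may differ.

```python
def fraction_bound(protein_um, kd_um):
    """One-site isotherm; reasonable when the labeled DNA is well below K_D."""
    return protein_um / (kd_um + protein_um)

def fit_kd(protein_um, bound_fraction):
    """Least-squares K_D (in uM) chosen from a log-spaced grid (1 nM to 1 mM)."""
    grid = [10 ** (exp / 100) for exp in range(-300, 301)]

    def sse(kd):
        # Sum of squared errors between model and observed bound fractions.
        return sum((fraction_bound(p, kd) - f) ** 2
                   for p, f in zip(protein_um, bound_fraction))

    return min(grid, key=sse)

# Hypothetical titration: protein concentrations (uM) and quantified bound fractions.
conc = [0.1, 0.3, 1.0, 3.0, 10.0]
frac = [0.06, 0.17, 0.40, 0.67, 0.87]
print(f"apparent K_D ~ {fit_kd(conc, frac):.2f} uM")
```

A grid search is used here only to keep the sketch dependency-free; a nonlinear least-squares routine would be the more usual choice. For DNAs that never approach saturation within the tested concentration range, such a fit is poorly constrained, which is consistent with no K_D being reported for them.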
To investigate differential complex formation on the residue-resolved level, we next used NMR and performed ¹H-¹⁵N-heteronuclear single quantum coherence (HSQC) titrations of either 13merAT or 13merGC dsDNA to the extended ARID 37-183 and plotted the combined chemical shift perturbations (CSPs) over the protein sequence (Fig. 3, A and B, see Experimental procedures section for details regarding CSP calculation). With this experiment, we sought to (i) identify the precise interface(s) of the ARID domain with DNA beyond its core fold and (ii) spot potential differences in CSP patterns caused by the two dsDNA ligands. The titrations clearly show that ARID 37-183 binds to both the 13merAT and the 13merGC dsDNA. However, different exchange regimes (fast exchange for 13merGC and intermediate exchange for 13merAT; insets in Fig. 3A) support the significantly higher affinity of Arid5a ARID to AT-rich than GC-rich DNA observed in EMSAs (Fig. S4). Of note, maximum CSPs within core ARID residues are much smaller for 13merGC than for the AT-rich DNA (Fig. 3, A and B). Interestingly, CSP differences of flanking IDR residues, especially the C-terminal extension (residues 150-160), are less pronounced between GC- and AT-rich DNA targets than within the core domain, suggesting that the contribution of IDRs to DNA-binding is less specific or nonspecific.
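Combined CSPs of this kind are commonly computed as a weighted Euclidean distance between the free and bound peak positions. The sketch below assumes a ¹⁵N scaling factor of 0.15 and uses hypothetical peak positions; the formula and weighting actually applied are those given in the study's Experimental procedures.

```python
import math

# Assumed weighting of the 15N dimension (hypothetical choice for this sketch).
N_SCALE = 0.15

def combined_csp(free, bound, n_scale=N_SCALE):
    """Return {residue: combined CSP in ppm} for residues assigned in both states.

    `free` and `bound` map residue labels to (1H ppm, 15N ppm) peak positions.
    """
    csps = {}
    for res, (h_free, n_free) in free.items():
        if res not in bound:
            continue  # e.g. peak broadened beyond detection or not assigned
        h_bound, n_bound = bound[res]
        d_h = h_bound - h_free
        d_n = (n_bound - n_free) * n_scale
        csps[res] = math.sqrt(d_h ** 2 + d_n ** 2)
    return csps

# Hypothetical peak positions for two residues, free and DNA-bound.
free_peaks = {"K85": (8.00, 120.0), "Q86": (7.50, 118.0)}
bound_peaks = {"K85": (8.03, 120.4)}
print(combined_csp(free_peaks, bound_peaks))
```

Residues whose peaks disappear in the bound state, as expected in intermediate exchange, are skipped rather than assigned a value, so line broadening has to be reported separately from the CSP plot.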
From the CSP plots, we concluded that the Arid5a ARID domain interacts with DNA through residues in loop L1 and the HTH motif (H4-L2-H5) (Fig. 3B). This is in good agreement with the reported DNA-binding interface found for other ARID domains (16, 34-36) and an R-to-A mutant in murine Arid5a (corresponding to R133 in the human version, see Fig. 1B) incapable of DNA-binding (21). Importantly, our data reveal an additional contribution of residues K152 and L154 within the C-terminal extension. Mapping significant CSPs obtained for the AT-DNA interaction on an Arid5a ARID RoseTTAFold model clearly shows them to cluster in the canonical DNA-binding interface (Fig. 3C).
To investigate the potential contribution of both N- and C-terminal extensions to DNA-binding in more detail, we created constructs of the ARID domain with either the separate N-terminal (ARID 37-152) or C-terminal (ARID 49-183) extension and compared their DNA interaction with 13merAT to the core domain (ARID 49-152) and the extended ARID 37-183 (Figs. 3D, S7, and S8). In contrast to the N-terminal IDR, the C-terminal extension shifted the ARID-DNA interaction towards an NMR-observed intermediate-to-slow exchange regime (Figs. 3D and S7), supported by observable changes in the EMSA patterns (Fig. S9). The EMSAs not only support the higher affinities for C-terminally extended ARID constructs (ARID 49-183 and ARID 37-183) but also reveal the formation of more prominent complex bands for these two constructs. This indicates DNA-protein complexes that are sufficiently tight to maintain their integrity under native gel conditions and that are less pronounced for ARID domains devoid of the C-terminal extension.
To confirm sequence-specific DNA recognition in the 13merAT DNA compared to 13merGC, we titrated increasing concentrations of ARID 37-183 to the respective DNAs and monitored effects on imino protons (Fig. 3E). We undertook a complete assignment of 13merAT imino resonances, which allowed a base pair-resolved analysis (Fig. S5). In line with the EMSA-observed stable complex formation, we found strong line broadening within the 13merAT DNA after addition of the protein. As expected, this effect is more pronounced for the central base pairs of the 13merAT DNA suggested to form the interface with ARID (Fig. 3C) and including the central AATA motif, as compared to the flanking terminal base pairs (compare residues G2/12 and T7), which, however, still show weak CSPs. In contrast, the 13merGC merely displayed minor line broadening upon ARID 37-183 addition, more evenly distributed over all imino signals. This supports a weak, but nonspecific interaction with the GC-rich DNA, driven by electrostatic interactions with the DNA backbone rather than base-specific contacts.

Figure caption (fragment). Merely predicted domains are striped black and white. B, the DNA-binding preference of extended ARID domains comprising the minimal core ARID plus 18 N- and 20 C-terminal residues was studied by EMSAs with either 10 nM 13merAT or 13merGC fluorescently labeled dsDNA (Table S3). Protein concentrations are shown above each lane in μM, and all experiments have been carried out in standard Arid5a buffer. Of note, the EMSA gel for Jarid1a with 13merGC has been spliced to skip additional concentrations to better align with the 13merAT EMSA above (indicated by the lines). Uncropped gels (and replicates) are given in the source data file.
Arid5a-extended ARID binds DNA and RNA

Mutational studies of Arid5a confirm key residues for DNA-binding
To confirm the ARID DNA-binding interface, we designed protein mutants by replacing selected residues, located either in L1 or the HTH motif, by alanine. Residues were chosen based either on their high CSPs observed in the ARID 37-183 titration with 13merAT (K85 and Q86) or on literature and sequence comparison to other Arids (especially Arid5b) and their key DNA-binding residues (R78A, R109A, T125A/S126A) (1, 5, 16). Mutations were introduced both in the core ARID 49-152 and extended ARID 37-183 background, to further elucidate the role of IDRs in this context. As the spectra for the mutants only showed minor local CSPs (Fig. S10), we were able to unambiguously transfer most assignments from the WT spectra to the mutants (see also Experimental procedures section). ¹H-¹⁵N-HSQC spectra of proteins alone and in the presence of a 2-fold molar excess of 13merAT dsDNA were recorded to quantify the effect of single, double, triple, and quadruple mutations on DNA-binding (Figs. 4, S11, and S12).
Mutation of T125 and S126, located in the HTH at the transition of loop 2 to helix 5, strongly impaired DNA-binding of the core ARID domain (Fig. 4A). This is in line with their expected role in making specific contacts with an AT base pair in the DNA major groove, as suggested by the complex structure of the closely related Arid5b ARID domain with AT-rich dsDNA (15). Loop 1 mutations (K85A/Q86A), on the other hand, despite high CSPs (Fig. 3 [...] the effect of DNA-binding mutations was less pronounced in the presence of the extending IDRs, as evident when comparing global CSPs between ARID 37-183 and ARID 49-152 (Fig. 4, B and C). We thus conclude that the C-terminal extension to the ARID domain can compensate for mutations within the core ARID domain by a general, but nonspecific mode of increasing affinity for dsDNA.

In vitro RNA-binding of the Arid5a ARID domain
Arid5a was recently identified to stabilize the Ox40 mRNA in murine CD4+ T cells by direct interaction with a stem-looped structure in its 3′-UTR, known as an alternative decay element (ADE) (22). In doing so, Arid5a interferes with controlled degradation of the Ox40 transcript by the nuclease Regnase-1 through targeting the same cis-regulatory element. We wondered if the ARID domain in Arid5a was responsible for the underlying complex formation with the RNA stem-loop and used NMR spectroscopy to observe atom-resolved binding of the ARID domain to the ADE element (Figs. 5 and S13). Interestingly, the minimum core ARID 49-152 showed only marginal interactions, even with a high stoichiometric excess of the 19-nt ADE (Figs. 5, D and E and S13), when judged by the magnitude of CSPs compared to the AT-DNA before (see Fig. 3B for comparison); the pattern is in fact rather reminiscent of binding to GC-DNA. However, similar to DNA-binding, the basic C-terminal extension in ARID 37-183 contributed to an increased ADE interaction, as indicated by both visible CSPs within this region and slightly increased CSPs for the core ARID 49-152 domain (Fig. S13B), suggesting that the extension plays an essential role in Arid5a-based mRNA regulation in vivo.
While our data are the first structural proof of a direct ARID-RNA interaction, we were surprised by the observed moderate binding affinity. We therefore decided to set up an unbiased search for a general consensus RNA target motif of the core ARID fold, which had not been undertaken prior to this study. We thus performed RBNS to test the ARID domain's capability to interact with specific RNA motifs. This in vitro high-throughput assay allows identification of the binding preferences of an RBP (37, 38). A pulldown is performed with a 20 nt random RNA pool flanked by short constant adapter sequences with different concentrations of Strep-tagged RBP (ARID 49-152: 0.25, 1, 5 μM). The constant regions are then used to add sequencing adapters for subsequent analysis by next-generation sequencing. We obtained 35 to 50 million unique reads for each ARID protein concentration. By comparing the frequencies of k-mers in the input library with the pulldown libraries, we were able to identify enriched 6-mers (Fig. 5B and Data Table S1). The motifs found here can be broadly divided into two types: (i) those that contain AGGC as a core motif and no uracil and (ii) those that are rich in AU. Analysis of enriched 5- and 7-mers yielded similar results (Fig. S14A and Data Table S1). Complex binding motifs were calculated to gain insight into the environment of the binding sites (Fig. S14C). Clustering of the AGGC core motifs results in the 9-mer (A)CAGGCA(GG) (Fig. 5C). Based on this, we designed two reverse-complementary 9-mer RNAs in agreement with our minimal length for high-affinity DNA-binding (Figs. 5C and S3). The structural features of the identified binding motifs were estimated in silico by calculating the average base pairing probability with RNAfold. Here, the AGGC-containing motifs show almost no base pairing and appear unstructured, while the AU-rich ones show no particular preference for being structured or unstructured (Fig. S14D).
We tested ARID 49-152 binding to the (A)CAGGCA(GG) motif both as ssRNA with the forward (fw) and reverse (rev) strand individually as well as their annealed dsRNA (Fig. 5C). Comparison of CSPs, similar to the RBNS data, reveals a clear preference for ssRNA over dsRNA; yet within the ssRNA context, specificity for a defined motif is not particularly pronounced (Figs. 5, D and E and S14D). Unexpectedly, the protein regions interacting with ssRNA are the same as those interacting with dsDNA, with residues 80 to 90 and 120 to 130 showing the highest CSPs, which raises the question of a so-far unknown single-stranded nucleic acid-binding mode by ARID. Furthermore, titrations with the RBNS-based ssRNAs indeed show stronger CSPs compared with the ADE despite the larger size of the latter and the previously suggested specificity of Arid5a for the ADE interaction mediated by the ARID domain (22). Those findings are in line with the lack of ADE-related motifs observed in RBNS. Altogether, our data suggest a previously unknown RNA sequence preference specific to the core ARID domain, which may point to as yet unidentified (m)RNAs bound by Arid5a in vivo.
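The k-mer comparison between input and pulldown libraries described above can be sketched as a simple frequency ratio. The code below is an illustrative sketch only: the function names and toy reads are hypothetical, the constant adapters are assumed to have been trimmed already, and the published RBNS pipelines apply additional corrections that are not reproduced here.

```python
from collections import Counter

def kmer_frequencies(reads, k):
    """Relative frequency of every k-mer over all positions of all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: n / total for kmer, n in counts.items()}

def kmer_enrichment(pulldown_reads, input_reads, k=6):
    """Ratio of pulldown to input frequency for k-mers seen in both libraries."""
    f_pd = kmer_frequencies(pulldown_reads, k)
    f_in = kmer_frequencies(input_reads, k)
    return {kmer: f_pd[kmer] / f_in[kmer] for kmer in f_pd if kmer in f_in}

# Toy example with 3-mers; real libraries contain millions of 20-nt reads.
toy_input = ["AAAC", "AAAG"]
toy_pulldown = ["AAAC", "AACC"]
ranked = sorted(kmer_enrichment(toy_pulldown, toy_input, k=3).items(),
                key=lambda item: item[1], reverse=True)
print(ranked)
```

K-mers missing from the input library are dropped in this sketch rather than given a pseudocount, which avoids dividing by zero but would need revisiting for sparse libraries or longer k.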

IDRs increase the ARID RNA-binding affinity in a length-dependent manner
At this stage, our data suggest that the ARID core domain prefers ssRNA over dsRNA and folded RNA and that the C-terminal IDR clearly contributes to the binding affinity for the ADE. Consequently, we wondered how the ARID-extending IDRs would influence binding to other RNA sequences that are longer and fold in a more complex manner. We started with a longer dsRNA (19mer_ds), with a central AU-rich core and GC-stabilized flanking regions (Figs. 6 and S15). We recorded ¹H-¹⁵N-HSQCs of the minimal core and the extended ARID domain in the presence and absence of RNA. The core ARID domain interacted only weakly with the 19mer_dsRNA, indicated by minor CSPs in the spectral overlay (Fig. 6A). In contrast, severe line broadening of the IDR-extended ARID 37-183 suggested a strongly increased interaction with the 19mer_dsRNA (Fig. 6B). This suggests an increasing contribution of IDRs in the case of extended RNA stretches, likely owing to the steric possibilities and high density of charges.
To test this hypothesis, we used a previously described physiologically relevant target sequence (13) located in the 3′-UTR of the Il-6 mRNA. The 129-nt sequence likely represents the naturally occurring RNA-folding complexity, providing stretches of ssRNA (loops) and base-paired regions (Fig. S15G). Strikingly, while the ARID core domain still binds with only moderate affinity to the RNA in 1.2-fold molar excess, the extended ARID 37-183 strongly interacts already at substoichiometric concentrations (<0.1×) (Fig. 6, C and D). Our results suggest that the ARID IDRs drive RNA-binding affinity depending on the provided density of negative charge. To confirm this hypothesis, we further compared EMSA-derived apparent affinities to RNAs of increasing length and see a clear correlation between affinity and RNA length (Fig. 6E). This effect is likely also supported by more than one protein binding to the larger RNAs (see right panel).
In conclusion, our results show that Arid5a is principally capable of interacting with RNA. The intrinsically low affinity of the core ARID domain is compensated by its IDR extensions in a nonspecific manner. Consequently, those nonspecific interactions favor RNAs of increasing size, while the core ARID domain remains restricted to very specific sequences.

iCLIP2 reveals that Arid5a binds to ssRNA in cells with a preference for U-rich stretches
Our data so far suggest that Arid5a is capable of tight interactions with available RNAs, albeit primarily driven by charge interactions. On the other hand, the core ARID domain shows a particular preference for short motifs, which still may steer specific interactions with RNAs more selectively. We speculated that those motifs will be embedded in a larger RNA context in vivo, where RNP complex formation will be supported and modulated by the presence of the ARID-flanking regions and potentially further regions of Arid5a beyond that. Motivated by those assumptions, we performed individual-nucleotide resolution UV crosslinking and immunoprecipitation (iCLIP2) (39) with full-length Arid5a in murine P19 cells. To our knowledge, this is the first CLIP experiment carried out with an Arid protein to date.
Murine P19 cells express Arid5a mRNA (Fig. S16A), but a specific antibody suitable for iCLIP is lacking. Hence, we generated an expression plasmid with the murine full-length Arid5a (see sequence alignment and conservation with human Arid5a in Fig. S2) fused to a C-terminal GFP-tag (mArid5a-GFP, Table S2). P19 WT cells were transfected in three replicates and subjected to the iCLIP2 procedure using an anti-GFP antibody (Fig. S16B, see Experimental procedures). All three replicate experiments were highly reproducible and gave rise to more than 7 million crosslinks (Fig. S16, B and C) and 9895 binding sites with an optimal width of 7 nucleotides (nt) that were used for downstream analysis. Arid5a-binding sites are found predominantly in 2607 protein-coding genes but also in 48 lncRNAs and other noncoding RNAs (Fig. 7A). A 3-mer enrichment analysis reveals that Arid5a crosslinks preferentially at U-rich stretches (Fig. 7C). The ramp-like enrichment pattern indicates that Arid5a sits at the very 3′-end of polyU stretches (Fig. 7, C and E). We hypothesized that this positioning of Arid5a may result from a specific interaction downstream of the polyU stretches via the ARID domain, while its adjacent IDRs crosslink to Us in a fixed Arid5a-RNA orientation. U-rich stretches were also found enriched by RBNS (Fig. S14, B and C), suggesting that this preference is not merely due to a UV crosslinking bias for U. To test our hypothesis, we searched for enriched 3-mers downstream of the Arid5a-binding sites. Of note, we observe a general enrichment of AG-rich 3-mers (Fig. 7D). Moreover, the four 3-mers CAG, AGG, GGC, and GCA contained within the RBNS-enriched consensus motif (A)CAGGCA(G) (Fig. 5, B and C) are consistently enriched downstream of the binding sites (Fig. 7, D and F), suggesting that Arid5a shows a preference for this RNA motif also in vivo. The positioning of these 3-mers downstream of the polyU stretches might help to position the binding of the ARID domain.
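A ramp-like pattern around binding sites can be visualized with a position-resolved nucleotide profile. The following is a minimal sketch under stated assumptions: sites are given as hypothetical (sequence, center index) pairs, and only single-nucleotide content is scored, whereas the analysis in the study is based on 3-mers and genome-wide binding sites.

```python
def positional_fraction(sites, base="U", flank=5):
    """Fraction of sites carrying `base` at each offset around the site center.

    `sites` is a list of (sequence, center_index) pairs; offsets that fall
    outside a sequence are ignored for that site.
    """
    profile = {}
    for offset in range(-flank, flank + 1):
        hits = 0
        total = 0
        for seq, center in sites:
            pos = center + offset
            if 0 <= pos < len(seq):
                total += 1
                hits += seq[pos] == base
        profile[offset] = hits / total if total else 0.0
    return profile

# Hypothetical sites: a U-stretch ending at the center, followed by CAG.
toy_sites = [("AAUUUCAGG", 4), ("GCUUUUCAG", 5)]
for offset, frac in positional_fraction(toy_sites, flank=2).items():
    print(f"{offset:+d}: {frac:.2f}")
```

A sharp drop in U content immediately downstream of the center, as in this toy input, is the kind of signature that would place the protein at the 3′-end of a U-stretch.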
Looking at the bound transcripts, we observe Arid5a across all transcript regions, including introns, 3′-UTRs, and exons (Fig. 7B), indicating that Arid5a binds to pre-mRNAs in the nucleus. Arid5a targets are enriched for transcripts involved in mRNA processing, chromatin remodeling, and translation regulation (Fig. 7G). Altogether, our data suggest that Arid5a is a sequence-specific dual DNA- and RNA-binding protein that binds to a subset of transcripts in vivo and may have an accessory function in chromatin-related transcript processing or in transcription regulation, possibly in an RNA-supported context (see Discussion).

Arid5a is a strictly nuclear protein and colocalizes with heterochromatin
The proposed dual function of Arid5a in gene regulation both via interaction with DNA/chromatin and protection against mRNA degradation in the cytoplasm (13, 21, 22) requires the protein to be present in both the nucleus and the cytoplasm. However, our iCLIP2 data show that Arid5a binds preferentially to unspliced pre-mRNAs, suggesting an exclusively nuclear function of Arid5a associated with chromatin. To test the subcellular localization of Arid5a under normal conditions, we performed confocal fluorescence microscopy of P19 WT cells transfected with mArid5a-GFP. As controls, we transfected SRSF3-GFP as a marker for the nucleoplasm (40) and performed immunofluorescence for G3BP1 as a cytoplasmic marker. A plasmid expressing GFP alone was used as a ubiquitously present protein (41).
Arid5a clearly localizes to the nucleus, and no signal is detectable in the cytoplasm (Fig. 8). However, Arid5a shows a markedly distinct localization pattern compared to the splicing regulator SRSF3, which is found in nuclear speckles and the nucleoplasm. Arid5a perfectly colocalizes with some of the bright heterochromatin dots, indicating a close proximity to silenced chromatin. Together with our iCLIP2 data, this suggests that Arid5a might use its dual nucleic acid-binding capability to interact with DNA and pre-mRNA simultaneously, for example, to detect transcribed loci and then modulate transcriptional repression, as described earlier (17, 18) and very recently for TFs with RBP activity (42).

RNA-binding is a more widespread capacity of ARID domains
Prior to this study, RNA-binding had only been described for Arid5a but for none of the other 14 human Arids. Driven by the observations for Arid5a above, we wondered if RNA-binding was a more general feature within this protein family. To test this, we used the closely related Arid5b as well as Arid1a and Jarid1a to perform analogous ¹H-¹⁵N-HSQC measurements with and without the 19mer_dsRNA (Fig. S15, A-E). Surprisingly, all three Arids showed obvious line broadening, with Jarid1a being most affected and very similar to ARID 37-183. Arid5b and Arid1a showed less but still noticeable line broadening. This indicates that other Arids are also able to bind RNA in vitro and hints at so-far unexplored functions of these proteins. To corroborate the protein-observed HSQC spectra, we recorded imino proton spectra of the 19mer_dsRNA with and without the Arids in order to examine the influence of protein binding on the RNA chemical shifts (Fig. S15F). We did not observe significant CSPs for the imino peaks, but minor line broadening indicates that all Arid proteins interact with the RNA backbone, suggesting little specificity for the RNA motif provided here. Nonetheless, these results reveal that some, if not all, Arid proteins are generally able to interact with RNA through their respective ARID domains, and RNA-binding competence is thus not a unique observation for Arid5a. This raises interesting questions for detailed future studies into whether and how RNA-binding is of functional relevance for them (as suggested for Arid5a).

Discussion
A recent study suggests more than 100 TFs are actively involved in splicing through their DRBP function (43). The ability to interact with both DNA and RNA is either conferred by a combination of specialized domains, for example, in Sox2 (10) or SAFB2 (44), or by the dual exploitation of one domain (12,45). Recently, the role of IDRs for DNA- and RNA-recognition, often via the same sequences (46), has come into focus, but specificity parameters like in Arid5a remain elusive based on the lack of structure-derivable knowledge.
Arid proteins are categorized as exclusive DNA-binders with only one exception: Arid5a is capable of binding RNA, with specific target mRNAs and a responsible folded motif presented in earlier work (2,13). However, not only the structural basis of this unique observation has remained unresolved, but also a clear understanding of the precise target nucleic acid preferences of Arid5a, all of which are expected to involve regions beyond the core ARID domain. In support, prior data on Arid5a RNA-binding had been achieved with the full-length protein (21,22,28), while RNA-binding is abolished in the absence of the ARID domain (13). The latter, as well as a study involving a mutant within the core ARID (21), claim that the domain is sufficient for RNA-binding but ignore contributions from sequence elements directly adjacent. Altogether, an atom-resolved proof of the Arid5a ARID domain interacting with DNA and RNA in an isolated, in vitro setup had been missing.
We here provide a detailed interrogation of the DNA target motif preferred by the Arid5a ARID, focusing on the core domain but taking into account contributions of the N- and C-terminally extending IDRs. With a core "AATA/TATT" sequence, we find that the core ARID prefers a similar DNA target motif as its related family partner Arid5b (15,16). This was unexpected considering the core ARIDs of both proteins share a sequence identity of only 70.2%, and the extended domains an even lower 58.3%. Interestingly, the regions involved in DNA-binding (L1 and H4-L2-H5) share a sequence similarity of 97.6% and identity of 85.4%, explaining their preference for identical DNA motifs (Fig. 1B). This is further supported by the finding that amino acids analogous to L2-residue T125 in Arid5a are determinants of DNA preference (32). T125 is conserved both in Arid5a between species and between Arid5a and 5b. Finally, early work had already suggested Arid5a to interact with multiple AT-rich sites, but not with a precise motif (17). In contrast, other members of the Arid family do not necessarily prefer AT-rich sequences, as, for example, reported for Arid1a (31,35) and JARID1a (32) with a serine and lysine, respectively, at this position. Our data (Fig. 2) confirm that Arid1a and JARID1a can bind AT- and GC-rich DNA equally strongly. Accordingly, we do not confirm Jarid1a to exclusively bind GC-rich DNA, thus contradicting the previous suggestions (32).
The co-existence of Arid5a and 5b in higher eukaryotes remains enigmatic, seeing their shared DNA target motif preference of the core ARID domain. Notably, literature does not list an overlap of genes regulated in transcription. Our findings suggest that a fine-tuning of DNA targets may take place through modulation by the non-identical IDRs. This is supported by earlier findings, in which Arid5b was shown to interact with DNA via its C-terminal extension (36).
Our data show that the positively charged C-terminal IDR also supports the affinity of Arid5a to DNA. Notably, the NMR data reveal a larger relative contribution to binding of GC DNA. This suggests this region provides a general support in DNA engagement, ultimately allowing the core ARID domain to selectively encounter AT motifs (Fig. 9A). While here, we suggest opposing charges to drive encounter complex formation, a recent study found negatively charged IDRs to accelerate specific motif search (47), likely preventing too tight interactions. We, however, did not find a similar contribution from the negatively charged N-terminal extension. Certainly, nature has established multiple modes of fine-tuning DNA-recognition through IDRs (48)(49)(50), including roles for hydrophobic sequences as recently shown by Jonas et al. (51).
The strong similarity in DNA-binding between Arid5a and Arid5b raises the question why to date only Arid5a was found to bind RNA. A sequence conservation within the extended ARID domains of the two proteins below 60% supports the hypothesis that the IDRs play a central role in (distinctive) RNA-binding competence. For Arid5a, we here unambiguously provide an atom-resolved proof for its proposed dual nucleic acid-binding competence (Fig. 9A), while no work had shown RNA-binding by robust in vitro experiments before. Unexpectedly, we found only weak binding of the utilized Arid5a constructs to the previously described Ox40 ADE motif (22)(23)(24), although binding was visibly enhanced by the IDRs. Our RBNS approach (unprecedented for an ARID domain) suggested short ssRNAs as ligands superior to the ADE. In support, these motifs were partially found in vivo, as demonstrated by the first iCLIP2 experiment with an Arid protein.
In general, binding to sequence- and size-equivalents of DNA (Fig. S17) revealed the subordinated affinity of the ARID domain to RNA. In fact, we find that RNA binding shows a CSP pattern reminiscent of nonspecific DNA-binding by NMR (Fig. S18). We do not rule out that we missed a complex-folded RNA motif preferentially bound by the ARID domain, similar to the unique binding of ADE and CDE elements by the Roquin ROQ domain (23,52).
The nuclear localization of Arid5a is supported by early characterization of the protein in different tissue types (17). More recent work identified Arid5a as specific RBP in stimulated immune cells, including export to the cytosol (13,21,22,28,53). Our data reveal a full nuclear localization, while we did not perform iCLIP2 under differential conditions and do not question a possible engagement with specific transcripts outside the nucleus. Still, we doubt Arid5a is a broadly acting RBP, neither in the nucleus nor cytoplasm, as it crosslinked much less to RNA than, for example, the splicing factor SRSF5 with 40 times more binding sites in a similar approach (54).
Apart from the above, the general capability of interacting with RNA had not yet been tested for other Arid proteins to the best of our knowledge. When we compare findings for Arid5a to Arid1a, 5b, and Jarid1a, our data hint at a more common RNA-binding capability of ARID domains than expected. This suggests unknown functions for Arids with respect to gene regulation, for example, at the interface of transcriptional and posttranscriptional levels as very recently shown for Arid1a (55). Considering a difference in affinity between DNA and RNA, as found here for the Arid5a ARID domain, we can speculate whether RNA-binding functions require high protein or target RNA concentrations. In such a scenario, for example, Arid5a will automatically expand from DNA-binding (transcription regulation) to RNA (transcript)-binding as a consequence of its own abundance. For example, Arid5a may first act as transcriptional repressor on the chromatin level, while it then stabilizes or blocks transcripts from translation at a later stage, including its abundance-based co-export from the nucleus, thus fulfilling a regulatory role on multiple levels.
In fact, TFs possibly involve simultaneous RNA-binding as a feedback mechanism in transcription or for recruitment to transcriptional start sites, for example, via (l)ncRNAs. Similar to the emerging role of circRNAs for RBPs (56), RNAs may also act as sponges for excessive DBPs (57) via IDR interactions. DNA- and RNA-binding is a strong indicator for subcompartmental clustering of transcriptional processes, for example, for co-transcriptional splicing (43) or miRNA processing. The latter was suggested for SAFB2 (58,59) as a bona fide example of a DRBP (44). Our iCLIP2 and microscopy data now suggest a similar role for Arid5a, which could function in a mechanism of RNA-induced transcriptional silencing or activation in line with differential regulation of transcription in Arid5a k.o. conditions (60). Both scenarios will involve the core ARID domain binding to dsDNA and to particular ssRNAs. The observed ramping effect in our iCLIP2 data suggests that Arid5a recognizes steep changes in nucleotide composition, that is, from longer U-rich stretches to purine-rich sequences (in accordance with RBNS), using its core ARID domain and the flanking IDRs. Such changes in nucleotide composition occur at intron-exon junctions in pre-mRNAs (Fig. 9B). We speculate that Arid5a normally binds to DNA but hops on nascent RNAs in the vicinity when such boundaries emerge and could thereby discriminate normal pre-mRNAs from spurious transcripts. The fact that we detect bound transcripts by iCLIP2 suggests that Arid5a binding rather prevents silencing of the locus, perhaps through loss of DNA-binding, but this requires further investigation.
The support by its adjacent IDRs additionally allows for protein-regulatory features steerable via posttranslational modifications (PTM). Likewise, IDRs are by default susceptible to proteolysis, an excellent tool to disrupt functional protein moieties (61,62). Similar to the described PTMs in more distal parts of Arid5a (63), PTMs in the extended ARID domain may be relevant with respect to DNA versus RNA preference and general affinity.
We here focused on solution NMR spectroscopy as a valuable tool to correlate chemical shift information with binding modes in detail. CSP patterns, that is, trajectories, magnitudes, and exchange regimes, are unambiguous indicators of a protein domain's preference for nucleic acid, as recently shown in similar studies by us (44,64) and others (65,66). As such, CSPs can be used to compare DNA- and RNA-binding by Arid5a, and consequently, the approach is transferable to other nucleic acid-binding domains of interest. In the long run, the straightforward NMR-centered biochemical setting will also allow an unambiguous readout of the selective inhibition of one or both of the DNA and RNA functions, as intended for Arid5a earlier (67).

Experimental procedures

Arid protein construct design and mutagenesis
Human Arid5a ARID constructs used in this study were designed and cloned as described previously (68). In brief, we used different domain boundaries, comprising the minimal ARID core (ARID 49-152) alone or extended either N- (ARID 37-152) or C- (ARID 49-183) terminally or both (ARID 37-183), with the numbers representing the natural sequence in the full-length context (Fig. 1A). ARID-coding DNA sequences for human Arid1a (residues 999-1132), Arid5b (residues 300-434), and Jarid1a (residues 66-198) were designed to comprise the minimal core ARID plus 18 N- and 20 C-terminal amino acids. They were obtained from Eurofins Genomics, optimized for Escherichia coli codon usage, and sub-cloned into the pET24d-derived vector pET-Trx1a (Gunter Stier, EMBL/BZH Heidelberg) (69, 70) by NcoI/XhoI restriction and subsequent ligation. Minimal core ARID domains were generated using the respective oligonucleotides listed in Table S1.
Arid5a ARID point mutations, in either the minimal ARID 49-152 or extended ARID 37-183 context, were introduced by site-directed mutagenesis (Tables S1 and S2). Constructs with multiple nonadjacent mutations were cloned in subsequent steps. A gene encoding murine full-length Arid5a (fl-Arid5a) (Eurofins Genomics) was cloned into the vector pEGFP-N1 (CLONTECH) (Tables S1 and S2) to obtain Arid5a-GFP for transfection and imaging in human and murine cell lines. Cloning was performed via Gibson assembly (71). Briefly, the PCR-linearized pEGFP-N1 and fl-Arid5a with homologous 5′- and 3′-ends were mixed in the reaction, incubated for 60 min at 50 °C, and transformed into E. coli DH5α.

To create an Arid5a production vector for a recombinant ARID domain with Strep-tag for RBNS experiments, we amplified the Arid5a gene from pET-Trx1a_ARID 49-152 and cloned it into pET_TRX_Bsa_StrepTag-N via Golden Gate Assembly (72) using BsaI restriction sites (Table S1). The resulting fusion protein then contains a His6-tag followed by a thioredoxin-tag (TRX), a TEV cleavage site, a Twin-Strep-tag (IBA Lifesciences, Göttingen), and ARID 49-152. The amino acid sequences are listed in Table S6.

DNA ligand constructs
All DNA oligonucleotides used in this study were obtained from Sigma-Aldrich. dsDNA was obtained through annealing of complementary strands (5 min at 98 °C followed by cooling down to room temperature). An overview of the DNAs used here is given in Tables S3 and S5.

RNA in vitro transcription
Unlabeled RNAs of 15 nt and longer were produced by in-house optimized in vitro transcription (IVT) from either a linearized plasmid or annealed oligonucleotides (Table S4) and purified as described in (64). Briefly, plasmid DNA was linearized with HindIII prior to IVT by in-house-expressed T7 RNA polymerase. Alternatively, complementary oligonucleotides (Sigma-Aldrich) were annealed and used as templates for IVT. RNAs from preparative-scale (10-20 ml) transcription reactions (4 h at 37 °C) were precipitated with 1.5 volumes of 2-propanol overnight at −20 °C. RNAs were separated on 12 to 18% denaturing polyacrylamide gels and visualized by UV shadowing. The excised RNA fragments of expected length were eluted into 0.3 M NaOAc overnight and subsequently washed, concentrated, and buffer-exchanged to the experimental buffer.
RNAs below 15 nt in length (Table S5) were obtained from Dharmacon Horizon in 1-µmol-scale quantities, deprotected, and desalted. Each RNA was dissolved in the respective volume of ddH2O to a final concentration of 3 mM.

In vitro transcription of the RBNS input pool
As template, a T7 promoter-containing oligonucleotide was annealed to an equimolar quantity of RBNS T7 template oligonucleotide (a random 20-mer flanked by partial Illumina primers). 500 fmol template were transcribed overnight at 37 °C with 200 mM Tris-HCl pH 8.0, 20 mM magnesium acetate, 8% (v/v) DMSO, 20 mM DTT, 20 mM spermidine, 4 mM nucleoside triphosphates (each), and self-made T7 RNA polymerase. The RBNS pool was purified by PAGE (polyacrylamide gel electrophoresis). Oligonucleotide sequences are given in Table S7.

NMR spectroscopy
NMR experiments were performed at the Frankfurt BMRZ using Bruker Avance III/Avance Neo spectrometers of 600, 700, and 900 MHz proton Larmor frequency, equipped with cryogenic probes, and using Z-axis pulsed field gradients. All measurements containing protein were performed at 298 K in standard Arid5a buffer containing 20 mM Bis-Tris pH 6.5, 150 mM NaCl, 2 mM TCEP, 0.02% NaN3 supplemented with 5% (v/v) D2O. Topspin versions 3 and 4 were used for data acquisition and processing. Graphical plots of spectra were created using the program NMRFAM-Sparky (73) version 1.470.
NMR backbone resonance assignments of WT ARID constructs were taken from BMRB entries 51811 and 51812 (68). Amide assignments of mutant ARID versions were accomplished by directly transferring the majority of assignments for peaks matching both spectra. Assignments for shifted peaks were transferred to the closest neighbor and/or with most obvious fit, which led to an unambiguous assignment completeness of 92 to 98% in the mutant ARID versions. All assignment transfers for individual apo and DNA-bound mutants are summarized in the Source Data file compared to 135 total amide assignments for the ARID 37-183 apo spectra and 126 assignments for ARID 37-183 with 2x 13merAT, as well as a total of 101 resonances for the apo and DNA-bound ARID 49-152 WT. Note that all significantly perturbed residues in DNA-binding were successfully re-assigned for comparison and later use in the box plot analysis.
NMR titrations were performed by preparing two initial samples: (i) a protein apo sample and (ii) a sample comprising protein in the presence of the maximum DNA/RNA concentration. All intermediate titration points were mixed from those samples subsequently (from high to low) to avoid side effects of protein dilution. For each sample, we monitored protein peaks by recording 15N-(TROSY)-HSQCs and DNA/RNA imino peaks by acquisition of 1D imino proton spectra. For HSQC spectra, we typically recorded 128 and 2048 points in the indirect 15N and direct 1H dimensions, respectively, with spectral widths of 32 ppm (offset at 116.5 ppm) and 16 ppm.
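The two-sample mixing scheme can be illustrated with a short calculation. The following is a minimal sketch, assuming that each titration point is mixed directly from the apo sample and the maximum-ligand sample (a simplification of the stepwise high-to-low procedure); the function name `mixing_volumes` and the 200 µl sample volume are hypothetical and not taken from the protocol. Because both samples contain the same protein concentration, only the ligand concentration changes upon mixing.

```python
def mixing_volumes(target_ligand_uM, max_ligand_uM, total_volume_ul):
    """Return (apo volume, max-ligand volume) in µl for one titration point.

    Both stock samples hold the same protein concentration, so mixing them
    changes the ligand concentration only.
    """
    if not 0 <= target_ligand_uM <= max_ligand_uM:
        raise ValueError("target must lie between 0 and the maximum ligand concentration")
    fraction_max = target_ligand_uM / max_ligand_uM  # share of the max-ligand sample
    v_max = fraction_max * total_volume_ul
    v_apo = total_volume_ul - v_max
    return v_apo, v_max


# dsDNA concentrations as in the 13merAT titration; 200 µl is an assumed volume
for ligand in (280, 140, 70, 35, 17.5):
    v_apo, v_max = mixing_volumes(ligand, 280, 200)
    print(f"{ligand:>6} µM ligand: {v_apo:.1f} µl apo + {v_max:.1f} µl max-ligand sample")
```

In practice, the sample consumed for each measurement has to be accounted for when mixing stepwise, which this sketch does not do.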
For 70 µM samples used in titrations, we recorded 32 scans per increment, while 40 scans per increment were recorded for 50 µM samples. For the DNA 1D imino proton spectra, we recorded a second set of experiments, where the DNA concentrations were kept constant at 80 µM and the protein concentration varied (10 µM, 20 µM, 40 µM, 80 µM). Spectra were recorded with 8192 points and 512 scans for 13merAT and 2560 points and 256 scans for 13merGC. The spectral width was set to 23.5 ppm and 21 ppm for 13merAT and 13merGC, respectively. Analysis of spectra and quantification/plotting of CSPs from titrations were performed in the CCPNMR Analysis 2.5 software (74). Significance of CSPs was defined as above average plus one SD, if not indicated differently. 1H-15N-CSPs were calculated in ppm according to Equation 1. For the full integration of CSPs into statistics and graphical depiction, we used box plots according to the OneSampletTest (descriptive statistics) in OriginPro 2021b. Each box represents the interquartile range from the 25th to the 75th percentile. Whiskers show deviating values with a coefficient of 1.5. Values further beyond this threshold are shown in black or colored triangles, with colored triangles representing the five highest CSPs from ARID 37-183 with 13merAT or the single highest CSP of ARID 37-183/ARID 49-152 with 13merGC. Those colors are used throughout the panel for comparison.
For the assignment of imino protons in the 13merAT, we recorded a 1H-1H-NOESY at 278 K with a spectral width of 22 and 15 ppm and 4096 and 266 points for the direct and indirect proton dimensions, respectively. The mixing time was set to 300 ms. Based on this, we transferred the assignment to 298 K in a peak-traceable temperature series (Fig. S5).
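As an illustration of the CSP quantification and the significance criterion (mean plus one SD), the following sketch may be helpful. It assumes a commonly used weighted Euclidean distance for combining 1H and 15N shift differences; the 15N weighting factor of 0.15 and the use of the sample SD are assumptions made for this example and are not necessarily those used in this study. The function names are hypothetical.

```python
import math
import statistics


def combined_csp(delta_h_ppm, delta_n_ppm, n_weight=0.15):
    """Combined 1H/15N chemical shift perturbation in ppm.

    n_weight scales the 15N shift difference; 0.15 is an assumed value,
    not taken from the study.
    """
    return math.sqrt(delta_h_ppm ** 2 + (n_weight * delta_n_ppm) ** 2)


def significant_residues(csps):
    """Return (threshold, residues) with CSPs above mean + 1 SD.

    csps maps residue numbers to CSP values; the sample SD is used here,
    which is an assumption.
    """
    values = list(csps.values())
    threshold = statistics.mean(values) + statistics.stdev(values)
    return threshold, sorted(res for res, csp in csps.items() if csp > threshold)


# Invented shift differences (1H, 15N) in ppm for four residues
shifts = {60: (0.01, 0.05), 95: (0.02, 0.10), 125: (0.12, 0.60), 140: (0.01, 0.02)}
csps = {res: combined_csp(dh, dn) for res, (dh, dn) in shifts.items()}
threshold, hits = significant_residues(csps)
print(f"threshold = {threshold:.3f} ppm, significant residues: {hits}")
```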

Structures and structure models
Five structural models of Arid5a were generated ab initio with RoseTTAfold (75) using residues 37 to 183 from the sequence deposited in Uniprot (76) under ID Q03989 (see Fig. S6). We confirmed the secondary structural elements within the ARID domain by a comparison of the generated models with secondary chemical shift data obtained in earlier work (68). The 13merAT dsDNA was modeled using the program Avogadro (77) from its primary sequence as B-DNA. PyMOL (The PyMOL Molecular Graphics System, Version 2.0 Schrödinger, LLC.) was used to align both the Arid5a models and the 13merAT dsDNA model individually to the structure of Arid5b ARID in complex with DNA (PDB 2OEH (16)). To visualize the extended Arid5a ARID domain binding to DNA, the aligned models were then manually arranged relative to each other by means of a slight positional adjustment to exclude steric clashes caused by the nonidentical sequences of DNAs and between Arid5a and 5b.

Electrophoretic mobility shift assay
To decipher the interaction of RNA/DNA and protein, we used EMSAs with radioactively labeled RNA (rEMSA) and fluorescently labeled DNA. The RNA was in vitro transcribed with T7-RNA polymerase and labeled with γ-32P according to a protocol by Nahvi and Green (78). We used 30 pmol of RNA, which was dephosphorylated at the 5′-end with 3 µl of Quick-CIP (5000 U/ml, NEB) in a total volume of 20 µl according to the manufacturer's instructions. Next, we performed a phenol/chloroform extraction and precipitated the RNA with ethanol and sodium acetate in the presence of 20 µg glycogen for 30 min at −20 °C. The precipitated RNA was pelleted for 15 min at 16,000g at 4 °C. The pellet was resuspended in 10 µl ddH2O, from which 5 µl were used for the 32P-labeling. For this, 1.5 µl γ-32P-ATP (10 pmol, Hartmann Analytic), 2 µl T4-PNK buffer (NEB), 2 µl T4-PNK (10 U/µl, NEB), and 9.5 µl H2O MQ were added. The reaction was incubated for 60 min at 37 °C to allow phosphorylation, followed by 10 min at 80 °C to inactivate the kinase. To purify the radioactively labeled RNA, we used NucAway Spin Columns (Thermo Fisher Scientific) according to the manufacturer's instructions. Finally, the RNA was refolded (4 min at 95 °C, cooled down on ice water), diluted to a final volume of 400 µl, and stored at −20 °C.
For the fluorescently labeled DNA, complementary DNA oligonucleotides were used (Table S3), with one oligonucleotide 5′-labeled with fluorescein (FAM) and the other one unlabeled. Complementary oligonucleotides (100 µM) in Arid5a buffer (20 mM Bis-Tris, 150 mM NaCl, 2 mM TCEP, 0.02% NaN3, pH 6.5) were mixed in a 1:1 ratio and heated to 95 °C for 5 min before cooling down to allow for the annealing of dsDNA.
EMSA reactions were prepared in a final volume of 20 µl. For this, we mixed 0.6 µg of yeast tRNA (Roche), 10 mM MgCl2, Arid5a buffer (20 mM Bis-Tris, 150 mM NaCl, 2 mM TCEP, 0.02% NaN3, pH 6.5), and respective amounts of protein. Finally, 2 µl labeled RNA or DNA were added, and the reaction (final concentration of fluorescent ligand was 10 nM and of 32P-ligand ≤1 nM) was incubated for 10 min at room temperature (22-24 °C). Immediately before loading 10 µl onto a 6% polyacrylamide gel, 3 µl of loading buffer were added. Gel electrophoresis was run for either 40 min at 80 V for DNA or 80 min at 80 V for RNA. The gels were imaged with a Typhoon Imager (GE Healthcare) either in the glass plates (for DNA) with a laser at 488 nm excitation and an emission filter at 520 nm or dried and indirectly imaged by phosphor imaging (for RNA).
Quantification of EMSAs was carried out as follows: The free band intensity was quantified in ImageQuantTL 8.1 by measuring the pixel intensity in a fixed window for each given protein concentration (see Fig. S4F as an example). Afterward, intensities were normalized to the lane with 0 µM protein, which was automatically set to 1. These values were then subtracted from 1 (1-free DNA) and again normalized, so that the highest values would reach 1 (only done for EMSAs with visible complex formation and 13merGC as a control). The twofold normalized data were plotted as a function of the respective protein concentrations used. The single data points were fitted by a nonlinear fit (Hill-fit) in OriginPro 2021b according to Equation 2. In the equation, "V max" stands for the maximum possible bound fraction represented by the upper asymptote, "k" is the protein concentration at the transition point, and "n" is the Hill coefficient.
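The quantification steps can be sketched in a few lines of code. The example below assumes the standard form of the Hill function, y = V max · x^n / (k^n + x^n), which matches the three parameters named above; whether this is the exact parametrization of Equation 2 is an assumption. The coarse grid search is only a simple stand-in for the nonlinear fit in OriginPro, and all function names and numbers are hypothetical.

```python
def bound_fraction(free_band_intensities):
    """Twofold normalization of free-band intensities from one EMSA.

    The first entry must be the lane without protein.
    """
    reference = free_band_intensities[0]
    free = [i / reference for i in free_band_intensities]  # lane 0 becomes 1
    bound = [1.0 - f for f in free]                         # 1 - free ligand
    top = max(bound)
    if top <= 0:
        raise ValueError("free band does not decrease; nothing to normalize")
    return [b / top for b in bound]                         # highest value becomes 1


def hill(x, v_max, k, n):
    """Assumed Hill function: bound fraction at protein concentration x."""
    return v_max * x ** n / (k ** n + x ** n)


def fit_hill_grid(conc, bound, k_grid, n_grid, v_max=1.0):
    """Least-squares grid search for k and n (stand-in for a nonlinear fit)."""
    best = None
    for k in k_grid:
        for n in n_grid:
            sse = sum((hill(x, v_max, k, n) - y) ** 2 for x, y in zip(conc, bound))
            if best is None or sse < best[0]:
                best = (sse, k, n)
    return best[1], best[2]


# Invented free-band intensities for 0, 1, 2, 4, and 8 µM protein
conc = [0.0, 1.0, 2.0, 4.0, 8.0]
bound = bound_fraction([1000, 800, 500, 200, 60])
k, n = fit_hill_grid(conc, bound, [0.5 * i for i in range(1, 17)], [1, 1.5, 2, 2.5, 3])
print(f"k = {k} µM, n = {n}")
```

A proper nonlinear least-squares routine would additionally provide parameter uncertainties, which the grid search does not.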

RNA bind-n-seq
An RBNS assay was performed with the Twin-Strep-tagged ARID 49-152 domain and a randomized input RNA pool based on reference (38). The protein was equilibrated in binding buffer (25 mM Tris-HCl, pH 7.5, 150 mM KCl, 3 mM MgCl2, 0.01% Tween, 500 µg/ml BSA, 1 mM DTT) at three different concentrations (0.25, 1, 5 µM) for 30 min at 4 °C. Next, the RNA was folded by snap-cooling and added to a final concentration of 1 µM with 40 U Ribonuclease Inhibitor (moloX, Berlin) and incubated for 1 h at room temperature. A pulldown was performed by incubating the RNA/ARID mixture with 1 µl of washed MagStrep "type3" XT beads (IBA Lifesciences) for 1 h at 4 °C. Subsequently, unbound RNA was removed by washing three times with wash buffer (25 mM Tris-HCl pH 8.0, 150 mM KCl, 60 µg/ml BSA, 0.5 mM EDTA, 0.01% Tween). Afterward, the RNA-ARID complexes were eluted twice with 25 µl of elution buffer (wash buffer containing 50 mM biotin). RNA was extracted with the Zymo RNA Clean & Concentrator-5 kit (Zymo Research) according to the manufacturer's instructions. The extracted RNA was reverse transcribed into cDNA, amplified by PCR to add Illumina adapters (Table S7) and an index for each concentration (Table S8), and subjected to deep sequencing (GENEWIZ).
Next-generation sequencing data were analyzed using the RBNS pipeline as described in (79), available at https://bitbucket.org/pfreese/rbns_pipeline/overview. The sequence context was analyzed using a self-written Python script. The script searches for a given motif (in this case the enriched k-mers) in each read and collects the sequences upstream and downstream of the motif. Sequence logos are then calculated from these flanking sequences and corrected for the base composition (background) of the input pool.
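The sequence context analysis can be illustrated as follows. This is a minimal sketch and not the script used in this study: it assumes that the background correction is a position-wise log2 ratio of the base frequency next to the motif over the base frequency in the input pool, and it skips motif occurrences that lie too close to a read end. The function names and the example reads are hypothetical.

```python
import math
from collections import Counter


def base_composition(reads):
    """Fraction of each base over all reads (background of the input pool)."""
    counts = Counter(base for read in reads for base in read)
    total = sum(counts.values())
    return {base: c / total for base, c in counts.items()}


def flanking_enrichment(reads, motif, background, flank=3):
    """Position-wise log2 enrichment of bases flanking each motif occurrence.

    Returns (upstream, downstream): lists of {base: log2(freq / background)}
    ordered 5' to 3'. Occurrences without a complete flank are skipped.
    """
    up = [Counter() for _ in range(flank)]
    down = [Counter() for _ in range(flank)]
    for read in reads:
        start = read.find(motif)
        while start != -1:
            end = start + len(motif)
            if start >= flank and end + flank <= len(read):
                for j in range(flank):
                    up[j][read[start - flank + j]] += 1
                    down[j][read[end + j]] += 1
            start = read.find(motif, start + 1)  # allow overlapping occurrences

    def to_log2(counter):
        total = sum(counter.values())
        return {b: math.log2((c / total) / background[b]) for b, c in counter.items()}

    return [to_log2(c) for c in up], [to_log2(c) for c in down]


# Invented pulldown reads and a uniform background for illustration
reads = ["ACAGGCUU", "GCAGGCUA"]
background = {"A": 0.25, "C": 0.25, "G": 0.25, "U": 0.25}
upstream, downstream = flanking_enrichment(reads, "AGGC", background, flank=2)
print(upstream, downstream)
```

In a real analysis, the background would be derived with `base_composition` from the sequenced input pool, and the resulting values would be passed to a logo-plotting tool.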

Culturing and transfection of P19 cells
Murine P19 WT cells were cultivated on 10-cm culture dishes pre-coated with 0.1% gelatin (in PBS) under humidified conditions at 5% CO2 and 37 °C in DMEM GlutaMAX Medium, supplemented with 10% (v/v) heat-inactivated fetal bovine serum and 100 µg/ml penicillin-streptomycin (all Gibco, Thermo Fisher Scientific). P19 WT cells were transfected with 4 µg plasmid DNA in 10-cm plates using the jet-OPTIMUS Transfection reagent (Polyplus) according to the manufacturer's instructions. The cells were harvested after 24 h of incubation.

Confocal microscopy
For GFP and immunofluorescence microscopy, P19 cells were grown on precoated 10 mm glass coverslips in 10-cm plates. The coverslips were transferred into a 24-well plate and washed with 1× PBS. After removing the PBS, cells were fixed with 4% paraformaldehyde (in PBS; Thermo Fisher Scientific) for 20 min at room temperature. Fixed cells were washed twice with 1× PBS and then permeabilized in permeabilization buffer (5% BSA, 0.1% Triton in 1× PBS) for 30 min. As a cytoplasmic marker, mouse anti-G3BP1 antibody (Abcam, ab56574) was diluted in blocking buffer (5% BSA in 1× PBS) to a final concentration of 2 µg/ml and incubated overnight (16 h) at 4 °C in the dark. The coverslips were washed twice with 1× PBS and incubated with the secondary antibody (donkey anti-mouse coupled to Alexa Fluor 594, Abcam; 1:500 in blocking buffer) for 60 min at room temperature in the dark. After washing the coverslips twice with 1× PBS, the DNA was stained with Hoechst 34580 (Thermo Fisher Scientific) at a final concentration of 5 µg/ml in Tris-buffered saline with 0.1% Tween-20 for 30 min at room temperature in the dark. After a final wash, the coverslips were dried and mounted on ProLong Diamond Antifade Mountant (Thermo Fisher Scientific P36961).

Figure 1. The Arid5a ARID domain is a minimal ARID core motif. A, domain architecture of full-length (fl)-Arid5a and overview of ARID domain boundaries used in this study. B, comparison of human and mouse Arid5a ARID sequences with the human Arid5b ARID, as obtained by Clustal Omega (88). The ARID secondary structure elements (helices H1 to H6) are indicated above the sequence for human Arid5b (blue, PDB 1IG6) and human Arid5a (red (68)). For full sequence alignment of Arid5a human with mouse, see Fig. S2. C, structural model of the Arid5a core ARID domain as derived from a RoseTTAFold (75) run with the sequence of ARID 37-183. The model represents member 1 of an ensemble (see Fig. S6).

Figure 3. Arid5a interacts with AT-rich DNA through loops in its core ARID and the C-terminal IDR. A, 1H-15N-HSQC overlay of ARID 37-183 alone and after titration with 4-fold 13merAT (orange) or 13merGC (blue). Insets show all titration points and assignments. Spectra were recorded at a constant protein concentration of 70 µM with 17.5, 35, 70, 140, and 280 µM dsDNA at 600 MHz and 298 K. B, chemical shift perturbation (CSP) plot of ARID 37-183 upon titration with 4-fold 13merAT (upper panel) or 13merGC (lower panel). Negative bars in light gray and gray show prolines and unassigned residues, respectively. Significantly shifting peaks, that is, above mean + 1 SD, are shown in Fig. S8B. See methods section for details on the quantification of CSPs in this manuscript. C, RoseTTAFold (75) model of ARID 37-183 as picked from an ensemble (see Fig. S6) highlighting highest CSPs from (B) (above mean + 1 SD) in red. For simplification, only residues 37 to 165 are shown. The DNA is shown for orientation revealing the putative interface (see Experimental procedures section). D, 1H-15N-HSQC zoom-ins of four Arid5a ARID constructs with/without N- and/or C-terminal extension, as indicated above, showing spectra of proteins alone and when titrated with 13merAT dsDNA. Spectra were recorded at a constant protein concentration of 70 µM with 17.5, 35, 70, 140, and 280 µM dsDNA at 600 MHz and 298 K. E, 1D imino proton spectra of 13merAT (upper panel) and 13merGC (lower panel) upon titration of DNAs with ARID 37-183. See also Fig. S5. Spectra were recorded at a constant DNA concentration of 80 µM with 10, 20, 40, and 80 µM protein at 600 MHz and 298 K. All experiments have been carried out in standard Arid5a buffer.

Figure 4. Mutational studies of the Arid5a ARID domain reveal the central core-binding residues. A, 1H-15N-HSQC zoom-ins of ARID 49-152 WT and mutants overlaying apo protein spectra (black) with those of samples containing 2-fold 13merAT dsDNA (orange/red) or 2-fold 13merGC dsDNA (blue). Spectra were recorded in standard Arid5a buffer at a protein concentration of 70 µM with 140 µM dsDNA for the complex sample at 600 MHz and 298 K. B, boxplot of CSP quantifications of ARID 37-183 WT and mutants upon addition of 2-fold 13merAT dsDNA. C, boxplot of CSP quantifications of ARID 49-152 WT and mutants upon addition of 2-fold 13merAT dsDNA. For comparison and as a reference, each boxplot also shows the CSPs of the WT with 2-fold 13merGC dsDNA. The box represents the interquartile range from the 25th to the 75th percentile with a whisker coefficient of 1.5 for outliers and further outliers shown as black/colored triangles. The median is shown as a horizontal line within boxes and mean values are indicated by black squares. The five highest CSPs for 13merAT and the highest CSP for 13merGC with ARID 37-183 or ARID 49-152 are color coded for direct comparison between WT and mutants. Source data with all CSPs are provided as a Source Data file.

Figure 5. The Arid5a ARID domain binds RNA motifs with moderate affinity. A, the Ox40-ADE forms a stem-loop element, confirmed by the depicted imino-proton spectrum. Assignments have been transferred from Janowski et al., 2016 (23). Spectrum measured with 20 µM RNA at 600 MHz and 298 K. B, enrichment of all 6-mers at 0.25, 1, and 5 µM ARID 49-152 concentration from RNA bind-and-seq (RBNS). Values greater than three SDs above the mean are highlighted in blue. Sequences are given for the 10 most significantly enriched motifs. RBNS experiments have been carried out in 25 mM Tris-HCl, pH 7.5, 150 mM KCl, 3 mM MgCl2, 0.01% Tween, 500 µg/ml BSA, 1 mM DTT. C, RBNS-based 9mer sequences that can be clustered from the enriched 6mers containing AGGC. D, zoom-ins of 1H-15N-HSQC spectra of apo ARID 49-152 (40 µM for ADE/70 µM for RBNS-RNAs) overlaid with 1.7-fold molar excess of ADE RNA or 2-fold molar excess of RBNS-9mer RNAs. Spectra were measured at 600 MHz (for ADE RNA) or 700 MHz (for RBNS-RNAs) at 298 K in standard Arid5a buffer. E, CSP plots of ARID 49-152 upon addition of Ox40-ADE or RBNS-9mer RNAs as shown in (D).

Figure 6. Arid5a ARID domain binding to RNA. A and B, 1H-15N-HSQCs of ARID 49-152 (A) or 37-183 (B) without RNA (black) or with 1-fold 19mer_ds RNA (red). C and D, 1H-15N-TROSY-HSQC of ARID 49-152 (C) or 37-183 (D) without RNA (black) or with 1.2-fold and 0.1-fold human Interleukin-6 mRNA (red), respectively. Protein concentration for all NMR measurements was 50 µM. Spectra were measured at 600 (A and B) or 900 MHz (C and D) and 298 K. E, EMSAs showing that increased RNA size leads to higher-affinity binding by ARID 37-183. All experiments have been carried out in standard Arid5a buffer.

Figure 7. Binding preferences of Arid5a in endogenous RNAs determined by iCLIP2. A, binding site distribution of Arid5a in different gene biotypes. B, Arid5a-binding sites in transcript regions of protein-coding genes. C, heatmap showing clusters of 3-mers starting/ending with U around Arid5a-binding sites in a window of ±50 nt. D, heatmap showing clusters of all other 3-mers around Arid5a-binding sites in a window of ±50 nt. RBNS-derived 3-mers are underlined. E, frequency of UUU per position in a window of ±50 nt around the binding sites. F, frequency of the 3-mers AGG, CAG, GCC, and AGC from the RBNS consensus motif per position in a window of ±50 nt around the binding sites. G, functional enrichment analysis (Gene Ontology Biological Process) for transcripts with Arid5a-binding sites. CDS, coding sequence; UTR, untranslated region; lincRNA, long intergenic non-coding RNA; snRNA, small nuclear RNA; snoRNA, small nucleolar RNA.

Figure 8. Subcellular localization and RNA-binding of Arid5a. Representative confocal micrograph showing that Arid5a-GFP localizes to the nucleus of murine P19 cells and colocalizes with bright heterochromatin dots. GFP was used to stain the entire cell and SRSF3-GFP as marker for nuclear speckles. Staining for G3BP1 in all cells labels the cytoplasm, and Hoechst stains chromatin. All zoom-ins are twofold. Scale bars represent 5 µm.

Figure 9. Summary of Arid5a interacting with nucleic acids. A, overview of possible and preferred interactions of the extended ARID domain, ARID 37-183, with DNAs (left) and RNAs (right) with an apparent hierarchy of affinity and selectivity. B, hypothetical model of transcription modulation by Arid5a integrating the in vitro and in vivo findings of Arid5a's specific and nonspecific nucleic acid interactions mediated by the core domain (dark blue rectangle) and the IDR extensions (broken lines): Recruitment of Arid5a to DNA/RNA ('scanning') will primarily locate the protein to AT-rich DNA promoter/enhancer regions (I). Increase of local Arid5a concentration through recognition of intron-exon boundaries in nascent transcripts closely located to transcribing DNA (II). Arid5a binding could allow productive transcription simply by relieving the block of DNA promoter/enhancer regions (III). Alternatively, Arid5a recruitment to pre-mRNA can lead to recognition of the gene's promoter region and its silencing.