Atomic force microscopy-based approaches for single-molecule investigation of nucleic acid–protein complexes

The interaction of nucleic acids with proteins plays an important role in many fundamental biological processes in living cells, including replication, transcription, and translation. Therefore, understanding nucleic acid–protein interaction is of high relevance in many areas of biology, medicine and technology. During almost four decades of its existence atomic force microscopy (AFM) accumulated a significant experience in investigation of biological molecules at a single-molecule level. AFM has become a powerful tool of molecular biology and biophysics providing unique information about properties, structure, and functioning of biomolecules. Despite a great variety of nucleic acid–protein systems under AFM investigations, there are a number of typical approaches for such studies. This review is devoted to the analysis of the typical AFM-based approaches of investigation of DNA (RNA)–protein complexes with a major focus on transcription studies. The basic strategies of AFM analysis of nucleic acid–protein complexes including investigation of the products of DNA–protein reactions and real-time dynamics of DNA–protein interaction are categorized and described by the example of the most relevant research studies. The described approaches and protocols have many universal features and, therefore, are applicable for future AFM studies of various nucleic acid–protein systems.


Introduction
Most fundamental biological processes that occur inside a cell involve the interaction of proteins with nucleic acids. In particular, the specific interaction of proteins with DNA and RNA is one of the main factors in the regulation of gene expression (i.e. the transmission of hereditary information from genes to proteins), including the process of transcription.
The traditional methods of investigating nucleic acid-protein interaction include a number of gel electrophoresis-based methods (such as electrophoretic mobility shift assay (EMSA), footprinting, cross-linking, and western blotting techniques), filter binding assay, chromatin immunoprecipitation analysis, protein-binding microarrays, and surface plasmon resonance (Ferraz et al. 2021). These techniques are highly informative in determining the amount of DNA associated with a protein or the specificity of the DNA sequence to a protein. However, they provide an averaged picture of nucleic acid-protein interaction, and if several configurations of complexes are present in the system, the rarest fraction may not be detected. Moreover, some of these techniques, such as polyacrylamide gel electrophoresis, are limited to the study of relatively short DNA fragments (up to several hundred base pairs) and are not applicable to the analysis of large DNA-protein complexes. Also, the movement of nucleic acid-protein complexes through the polyacrylamide gel matrix during electrophoresis may disrupt some weak but important bonds (Wyman et al. 1997).
AFM is an alternative method for studying nucleic acid-protein interaction. It allows individual DNA molecules with protein molecules bound to them to be visualized with high spatial resolution, and the dimensions and number of protein molecules in the complex, their position on the DNA template, and the length of the DNA-protein complex to be determined (Bangalore and Tessmer 2018; Beckwitt et al. 2018; Kasas and Dietler 2018; Main et al. 2021; Zhao et al. 2022). At the same time, the length of DNA molecules involved in the formation of complexes that may be studied by AFM is practically unlimited. Despite a number of limitations intrinsic to AFM, such as the need to adsorb biomolecules to a solid support, possible mechanical impact of the cantilever on the sample, under- or overestimation of the sample dimensions in certain cases (Winzer et al. 2012; Kasas and Dietler 2018; Tolstova and Dubrovin 2019), and the use of simplified in vitro models, which usually do not contain all components of karyoplasm, AFM was shown to be an effective tool for genomic research (Lyubchenko and Shlyakhtenko 2016; Konrad et al. 2021). In particular, AFM is widely used to study numerous aspects of transcription, translation, replication, repair, recombination, DNA restriction, DNA packaging, and viral genome transfer, to map protein binding sites on DNA, to analyze the structure and properties of nucleic acid-protein complexes with biotechnological applications (for example, nanocontainers for drug delivery), and to study nucleic acid-protein interactions involving proteins whose function remains unknown.
This review describes the characteristic features of AFM investigation of nucleic acid-protein interactions with a focus on the study of transcription. Transcription is the process of copying genetic information from DNA to RNA, i.e. RNA synthesis on the DNA template, performed by the enzyme DNA-dependent RNA polymerase (RNAP). From a chemical point of view, transcription is a reaction of polymerization of ribonucleotides. Transcription takes place in all living cells and has a fundamental significance as one of the key phases of gene expression. The diversity of RNAP enzymes, as well as of transcription regulation factors inherent in certain organisms, creates great variability in the details of this process in different cells, which often remain not fully understood. At the same time, transcription has general patterns, in particular, the presence of three phases: initiation, elongation, and termination.
During the initiation step, RNAP recognizes a specific DNA sequence, which is called a promoter. The promoter is located just before the coding sequence of the gene. By binding to the DNA, RNAP forms a closed transcription complex. After that, RNAP undergoes structural changes and transforms from a closed promoter complex to an open promoter complex (OPC), which is accompanied by the melting of a small portion of the DNA double helix and the formation of the so-called transcription bubble. The next stage, elongation, implies the movement of RNAP together with the transcription bubble along the DNA template and the simultaneous synthesis of RNA as a result of the polymerization of ribonucleoside triphosphates (ATP, CTP, UTP, and GTP). The complex of DNA, RNAP, and RNA that is formed during elongation is called a triple transcription complex. Termination is characterized by the stopping of RNAP movement along the DNA template (and of RNA synthesis), the dissociation of the triple transcription complex, and the release of the RNA transcript. Termination usually occurs when the elongation complex reaches a special sequence on the DNA template called a terminator, but can also be regulated by protein factors.
There are two main approaches to the study of transcription using AFM. The first involves AFM study of reaction products either in air or in aqueous solution. For example, various transcription complexes (Hansma et al. 1993; Billingsley et al. 2012) and DNA complexes with transcription regulation factors, such as FIS (factor for inversion stimulation of the homologous Hin and Gin site-specific DNA recombinases of Salmonella and phage Mu, respectively) and H-NS (histone-like nucleoid-structuring protein) (Zhang et al. 2004; Shin et al. 2005), have been characterized in this way. The second approach involves a dynamic study of the interaction of DNA with RNAP on a substrate in an aqueous solution. In this case, a sequence of AFM images of the individual DNA molecules interacting with the protein molecules is recorded, and the changes in morphology and biopolymer conformation are analyzed (Kasas et al. 1997; Bustamante et al. 1997; Guthold et al. 1999; Suzuki et al. 2012). In dynamic studies, the frequency of recording AFM images varies from one frame in a few minutes to several tens of frames per second (for high-speed AFM).
An important condition for the study of nucleic acid-protein complexes using AFM is their immobilization from the solution onto the surface of the substrate in a way that preserves the integrity of the complexes and the mutual arrangement of the bound enzyme and DNA. Such deposition of the sample on the substrate is usually provided by the equilibrium adsorption of DNA-protein complexes, which is determined by the adsorption of the DNA molecule as the dominant component of the DNA-protein complex. Most often, the equilibrium adsorption of DNA-protein complexes is achieved during their deposition on mica in the presence of mono- and divalent cations (Rivetti et al. 1996; Margeat et al. 1998).
During transcription initiation, RNAP binds to a specific promoter sequence of the DNA molecule. In many cases, the interaction between RNAP and the promoter is mediated by other proteins, which bind to the regulatory region of the promoter. This leads to the formation of a certain spatial structure, which changes the configuration of the promoter region and influences the regulation of the nearest promoters (Browning and Busby 2004). In particular, DNA bending and looping play an important role in many cellular processes; for example, they control the approach of distant DNA regions during transcription (Saiz and Vilar 2006). Controlling the morphology of the promoter and the region near the promoter is one of the ways to regulate gene activity.
Therefore, it is difficult to overestimate the importance of direct visualization of individual DNA-protein complexes using AFM. Morphological characteristics of DNA-protein complexes extracted from AFM images, such as the contour length of DNA, the height, thickness, area and volume of the complex, the bending angle of the DNA, and the distance between enzymes on the DNA template, as well as between the enzyme and the ends of the DNA molecule, make it possible to reconstruct the molecular structure of complexes and to solve a wide range of problems related to the characterization of molecular biological processes. The main classes of problems related to the study of nucleic acid-protein interaction that can be efficiently solved using AFM are considered below (summarized in Fig. 1).

DNA mapping and identification of the position of protein molecules on a DNA template
Identification of DNA-protein binding sites (or DNA mapping) is a key to understanding complex intracellular mechanisms, primarily the mechanisms of the control of gene expression, when many DNA-binding proteins work simultaneously, specifically binding to the DNA near its promoter region. An AFM-based analysis of the position of bound enzyme molecules on a DNA template makes it possible to determine (map) the location of the binding sites of these enzymes. Thus, AFM complements traditional biochemical methods, such as EMSA or footprinting, allowing the mapping of individual DNA fragments (hundreds of base pairs or more).
The results of the identification of binding sites on DNA using AFM turned out to be in good agreement with the expected location of these sites in a number of model systems, for example, yeast transcription factors GAL4 and Pho4, as well as eukaryotic transcription factor AP2, with long DNA fragments containing specific binding sites of the corresponding proteins (Nettikadan et al. 1996; Yokota et al. 1998; Moreno-Herrero et al. 2001); the mapping accuracy was about a dozen base pairs. AFM was also used to map the binding sites of another yeast transcription regulation factor, Mig1p, and two binding sites were identified near the HXK2 promoter. The obtained data suggested that Mig1p is a factor involved in the regulation of transcription of the HXK2 gene (Moreno-Herrero et al. 2001).
The mapping of enzyme binding sites on DNA using AFM remains possible even if the enzyme is also prone to nonspecific interaction with DNA. In such a case, the position of a specific binding site can be determined from a statistical analysis of a large number of DNA-protein complexes as the most probable position of the enzyme on the DNA template. An analysis of the spatial distribution of binding sites for the p53 transcription factor made it possible to identify specific binding sites against a background of nonspecific binding of this protein (Nuttall et al. 2016).
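The statistical step described above can be illustrated with a minimal sketch. It assumes that the distance of each bound protein from the nearer DNA end has already been measured along the contour; the function name, the bin width, the sample values, and the 0.34 nm per base pair conversion (the rise of B-form DNA) are illustrative assumptions, not data or procedures from the cited studies.

```python
from collections import Counter

RISE_NM_PER_BP = 0.34  # assumption: axial rise of B-form DNA


def most_probable_position(positions_nm, bin_width_nm=10.0):
    """Return (centre of the most populated histogram bin, its count)."""
    bins = Counter(int(p // bin_width_nm) for p in positions_nm)
    best_bin, count = max(bins.items(), key=lambda item: item[1])
    return (best_bin + 0.5) * bin_width_nm, count


# Hypothetical protein positions (nm from the nearer DNA end):
# a cluster near 105 nm on top of scattered nonspecific binding events.
positions = [12, 48, 95, 101, 103, 104, 106, 108, 109, 150, 171, 230]

site_nm, n_hits = most_probable_position(positions)
site_bp = site_nm / RISE_NM_PER_BP  # approximate position in base pairs
print(f"most probable site: {site_nm:.0f} nm (~{site_bp:.0f} bp), {n_hits} complexes")
```

In practice the bin width would be chosen to match the positional uncertainty of the measurement, and a peak would be accepted as a specific site only if it stands clearly above the nonspecific background.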
AFM analysis of the positions of nucleosomes in yeast RNAP II-nucleosome complexes revealed an upstream transfer of a small subpopulation of the transcribed nucleosomes, suggesting a looping mechanism of histone transfer. AFM images have also revealed two fractions of transcribed nucleosomes, which differ in height. These fractions were attributed to nucleosome octamers and hexamers. The obtained results provided a mechanistic insight into the histone-transfer process during transcription and suggested a competition between elongation, looping, and histone dissociation (Bintu et al. 2011).
Analysis of AFM snapshots of transcriptional elongation complexes and, in particular, of the positions of protein molecules on DNA templates made it possible to characterize transcription regulation by a number of repressor proteins, such as the lac repressor (LacI) and the 186 CI and λ CI repressors. It was shown that LacI becomes a strong roadblock for RNAP in looped DNA, in contrast to unlooped DNA, demonstrating that the looped DNA topology strongly interferes with transcription (Fig. 2) (Vörös et al. 2017). The 186 CI or λ CI repressors bound along unlooped DNA demonstrated negligible interference with transcription (Lu et al. 2022).
Analysis of the spacing distributions of binding sites of 5mC (5-methylcytosine) was performed using AFM of anti-5mC antibody complexes with DNA (Bu et al. 2022). It was shown that 5mC binding sites are non-homogeneously distributed on genomic DNA and that the distributions differ between species.
Identification (or control) of protein binding sites and their mutual arrangement on the DNA template is an essential task of most AFM studies of nucleic acid-protein complexes, and it is often solved in combination with other tasks.

Stoichiometry of nucleic acid-protein complexes
The AFM analysis of the morphology and dimensions of nucleic acid-protein complexes makes it possible to determine the number of protein molecules bound to the DNA template. In particular, the formation of dimeric DNA-protein complexes of eukaryotic transcription factor AP2 (Nettikadan et al. 1996), multimeric complexes of E. coli transcriptional trp repressor (Margeat et al. 1998) and yeast transcription factor GAL4 (Yokota et al. 1998), as well as several complexes of a regulatory transcription protein of λ-bacteriophage on a single DNA template molecule (Erie et al. 1994), was confirmed using AFM.
The obtained information about the stoichiometry of DNA-protein complexes can clarify how enzymes operate on the DNA molecule. For example, AFM visualized loop structures when examining the human heat-shock transcription factor 2 (HSF2) complex with a DNA fragment containing specific binding sites for this protein, heat shock elements (HSE) (Wyman et al. 1995), which are located both near and far from the promoter. At least one such element near the promoter is required for transcription, while HSEs far from the promoter act as enhancers that stimulate transcription. As a result, loop-like DNA structures are formed that bring two HSEs together for their contact with the promoter. The HSF2 factor binds to the HSE as a trimer; however, the number of HSF2 trimers required for the formation of loop structures remained unknown. AFM images revealed the presence of two symmetrical topographic maxima near the loop-like complex, whose volume analysis showed that each maximum corresponds to one HSF2 trimer.
Fig. 1 The summary of the tasks and approaches typically used for AFM investigation of DNA-protein interactions
In another work devoted to the study of the interaction of DNA with the bacterial nitrogen-regulatory protein NtrC, which promotes the transition from a closed to an open transcription complex and thereby activates transcription, the analysis of the volumes of complexes from AFM images demonstrated the multimerization of NtrC on enhancers, which are specific binding sites for this protein (Wyman et al. 1997). Multimerization was also confirmed when using two mutant variants of NtrC with a reduced ability to activate transcription, one of which also lacked the ability to bind to DNA. Together with the results of transcription analysis, which showed the transcriptional activity of complexes with mutant NtrC, the AFM data confirmed the important role of the cooperative interaction of NtrC in the activation of transcription. Accurate analysis of the volume of protein globules on DNA molecules has revealed different stoichiometries of the binding of the bacterial toxin-antitoxin system DinJ-YafQ to the promoter region, providing evidence of DinJ-YafQ binding cooperativity (Fig. 3a) (Bonini et al. 2022).
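The volume-based estimates of stoichiometry discussed above amount to comparing the measured volume of a protein globule with the volume of a single protein unit. A minimal sketch of this arithmetic follows; the function name and both volumes are hypothetical, and in a real study the monomer volume would come from a calibration (for example, AFM volumes of proteins of known molecular weight), which is not reproduced here.

```python
def oligomer_state(measured_volume_nm3, monomer_volume_nm3):
    """Estimate the number of protein units in a globule from its AFM volume.

    Returns (nearest whole number of units, raw volume ratio).
    """
    ratio = measured_volume_nm3 / monomer_volume_nm3
    return max(1, round(ratio)), ratio


# Hypothetical values: calibrated monomer volume and one measured globule.
monomer_volume = 60.0   # nm^3, assumed calibration value
globule_volume = 175.0  # nm^3, assumed measurement

n_units, volume_ratio = oligomer_state(globule_volume, monomer_volume)
print(f"volume ratio {volume_ratio:.2f} -> about {n_units} protein units")
```

Because AFM volumes are affected by tip convolution and by the DNA lying under the protein, such estimates are usually drawn from the distribution of many globules rather than from a single measurement.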
The multimerization of p53 upon binding to a specific sequence of DNA extracted from γ-irradiated cells has been demonstrated using AFM (Legartová et al. 2023). AFM has also revealed two different modes of binding of IFI16 (interferon inducible protein 16) to DNA molecules, as oligomers or as globular complexes, depending on DNA topology (linearized or supercoiled) and molar ratio (Valková et al. 2023).
AFM helped to establish the important role of transcriptionally inactive RNAP binding sites located upstream of the E. coli canonical FIS promoter in the regulation of transcription of this gene (Gerganova et al. 2015). In this work, the concomitant binding of the RNAP holoenzyme to the canonical promoter and a transcriptionally inactive promoter located upstream was reported. This construct has been shown to stabilize the transcription complex in vitro and influence its activity in vitro and in vivo. The authors suggest that transcriptionally inactive RNAP molecules serve as factors initiating transcription from a closely located promoter.

Enzyme-mediated DNA looping
Bringing remote DNA regions together (for example, the enhancer and promoter during transcription) is a key factor in the regulation of many biomolecular processes. Many enzymes are responsible for such juxtaposition, which takes place due to protein-protein interaction or the presence of more than one DNA-binding site on one enzyme. The analysis of the number of DNA loops observed in AFM images of the DNA-protein complex, together with the determination of the stoichiometry of such a complex, makes it possible to determine the number of DNA-binding centers of an enzyme or an enzyme complex.
The study of looping in transcription complexes helps to understand the molecular mechanism of activation or suppression of transcription. The formation of loops that activate transcription has been studied in a number of works. As mentioned above, the looping of a DNA molecule containing specific promoter and enhancer sites for binding to HSF2 was observed during the formation of the corresponding complexes (Wyman et al. 1995). In another work, the formation of a loop in the transcription complex was demonstrated in a bacterial system. The loop was formed due to the interaction between RNAP molecules bound to the glnA promoter and nitrogen-regulatory protein C molecules bound to the enhancer (Rippe et al. 1997). Another example of a transcription-activating complex revealed by AFM was the triple complex of RNAP holoenzyme, FIS-activating protein and DNA, which forms short loops (Maurer et al. 2006). The investigation of mitochondrial transcription factor A (TFAM) binding to mitochondrial DNA (mtDNA) containing two promoter regions (LSP and HSP1) has demonstrated TFAM-mediated formation of DNA loops (Fig. 3b). The formation of these loops was dependent on TFAM concentration and correlated with maximal HSP1 transcription activity. In contrast, TFAM devoid of 26 amino acid residues of the carboxy-terminal tail (TFAM-CTΔ26) was not able to form DNA loops, indicating the important role of the carboxy-terminal tail in transcription regulation in mitochondria (Uchida et al. 2017).
AFM has also revealed DNA loops whose formation is associated with the suppression of transcriptional activity of the gene. For instance, transcription suppression of two overlapping promoters in the gal operon of E. coli is mediated by the Gal repressor (GalR) and the histone-like protein HU. In this case, GalR tetramers bind, with the participation of HU, to two remote sites on the DNA template, contributing to DNA looping. Such GalR-HU-DNA complexes with a loop were visualized using AFM (Lyubchenko et al. 1997), and their analysis determined the antiparallel orientation of the formed DNA loops (Virnik et al. 2003).
Fulcrand et al. have investigated the formation of short (91 base pairs) DNA loops containing O1, O2, and O3 operators in the lac promoter upon binding to the lactose repressor of E. coli, which is a negative transcription regulation factor (Fulcrand et al. 2016).
Suppression of the expression of the λ-bacteriophage gene responsible for the activation of the lytic pathway proceeds with the participation of the λ-repressor CI, which specifically binds to triple sites located in regions of the phage genome separated by 2300 base pairs. AFM visualization of complexes of the λ-repressor CI with DNA revealed their multimerization with the formation of loops (Wang et al. 2009). The data obtained confirmed that the interaction of CI with O3 operators significantly stabilizes the loop-like shape of DNA fragments, and thus directs the bacteriophage replication by the lysogenic pathway.

DNA bending and wrapping around the enzyme
Many enzymes, including regulatory transcription factors, bend DNA upon specific binding to it. The advantage of using AFM for DNA bending analysis is the ability not only to determine the average value of the measured angle, but also to obtain the distribution of this value. The shape of such a distribution may indicate the flexibility of the DNA-protein complex, as well as the presence of several populations of complexes characterized by different bending angles. As a rule, the bending of DNA is characterized from AFM images directly by the angle between tangents to the contour of the DNA molecule at the points of visible entry of the two arms of the DNA strand into the enzyme. The bending angle of short DNA molecules can also be determined indirectly by analyzing the distance between the ends of the DNA molecule forming the complex (Rivetti et al. 1999).
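As a rough illustration of the direct measurement, the bend angle can be computed from three points picked on an AFM image: one on each DNA arm close to the protein and one at the protein centre. This three-point construction is a simplifying assumption that stands in for the tangents at the entry points; the function name and coordinates are hypothetical. The bend angle is taken as the deviation from a straight path, i.e. 180° minus the angle between the two arms.

```python
import math


def bend_angle_deg(arm1_point, centre, arm2_point):
    """Bend angle (degrees) as the deviation of the DNA path from a straight line."""
    v1 = (arm1_point[0] - centre[0], arm1_point[1] - centre[1])
    v2 = (arm2_point[0] - centre[0], arm2_point[1] - centre[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    cross = v1[0] * v2[1] - v1[1] * v2[0]
    inter_arm_angle = math.degrees(math.atan2(abs(cross), dot))
    return 180.0 - inter_arm_angle


# Hypothetical image coordinates in nm.
straight = bend_angle_deg((-10.0, 0.0), (0.0, 0.0), (10.0, 0.0))
right_angle = bend_angle_deg((-10.0, 0.0), (0.0, 0.0), (0.0, 10.0))
print(f"straight DNA: {straight:.1f} deg, sharply bent DNA: {right_angle:.1f} deg")
```

Applying such a function to every complex in a data set yields the distribution of bend angles, whose width and number of peaks carry the information on flexibility and on the presence of several populations.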
As a rule, the bending of DNA is closely related to the function of the protein that bends it. In particular, DNA bending can regulate transcription in a variety of ways. For example, DNA bending that occurs at the RNAP binding site facilitates transcription initiation and the transition from initiation to transcription elongation (van der Vliet and Verrijzer 1993). The average DNA bending angle induced by specific binding of the σ70-containing E. coli RNAP holoenzyme, determined from AFM, was 54° (Rees et al. 1993) or 55° (Rivetti et al. 1999) for the open transcription complex and 54-67° (depending on the sequence of the transcribed region) for the elongation complex (Rivetti et al. 2003). In another study, the average DNA bending angle in the elongation complex was 92° (Rees et al. 1993). AFM analysis of the DNA bending angle in a closed transcription complex formed by another E. coli RNAP holoenzyme, containing the σ54 subunit, showed that the average bending angle is 49°, and that it increased to 114° upon transition from a closed to an open transcription complex (Rippe et al. 1997). Finally, for eukaryotic RNAP III, the average DNA bend in the elongation complex, according to AFM data, was 49° (Rivetti et al. 2003).
Moreover, DNA bending can facilitate the approach of transcription factors located far away from each other on the DNA template, favoring the formation of DNA loops. Such an interpretation was proposed for the complex of the Pseudomonas TOL plasmid with the integration host factor (IHF): upon binding to the plasmid, this factor caused a 123° bend in the DNA, thereby facilitating the contact, necessary for transcription activation, between the promoter-bound RNAP and the XylR protein located on the upstream part of the promoter (Seong et al. 2002).
AFM can also be used to study DNA bending caused by non-specific binding to a protein. Specific and non-specific DNA-protein complexes can be distinguished in AFM images if the position of a specific binding site relative to one of the ends of the DNA template is known. It is assumed that non-specific DNA-protein interactions play an important role in facilitating the search for specific sites by enzymes (Suzuki et al. 2015; Lyubchenko and Shlyakhtenko 2016). Comparison of specific and non-specific DNA-protein complexes allows for a better understanding of the molecular basis of the specific binding process. For instance, the bending angle of the complex formed upon DNA binding to the λ-bacteriophage transcription regulation factor Cro was analyzed by Erie et al. (1994). It turned out that this enzyme bends DNA upon both specific and non-specific binding, while the bending angle differs: it was 62° in non-specific complexes and 69° in specific complexes. The data obtained suggested that the bending of non-specific complexes increases the specificity of protein binding and can be used by the enzyme to improve the contact with DNA necessary for recognition of a specific sequence. Thus, DNA bending can be used to facilitate the recognition of specific sites by the protein.
DNA bending by an enzyme is often accompanied by DNA wrapping around this enzyme. Like DNA bending, DNA wrapping around an enzyme is also a factor in the regulation of transcription. The length of wrapped DNA can be determined from AFM images of the complex by the shortening of the apparent length of the DNA molecule forming the complex relative to the length of the enzyme-free DNA molecule. AFM has revealed a DNA shortening of about 30 nm (or 90 base pairs) for a bacterial open transcription complex formed by the σ70-containing RNAP holoenzyme, which was associated with DNA wrapping around RNAP (Rivetti et al. 1999).
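The conversion from contour-length shortening to the number of wrapped base pairs is simple arithmetic, sketched below. The function name and the two contour lengths are hypothetical, and the rise of 0.34 nm per base pair is the textbook value for B-form DNA, assumed here rather than taken from the cited work.

```python
RISE_NM_PER_BP = 0.34  # assumption: axial rise of B-form DNA


def wrapped_dna(free_length_nm, complex_length_nm):
    """Return (shortening in nm, equivalent number of base pairs)."""
    shortening_nm = free_length_nm - complex_length_nm
    return shortening_nm, shortening_nm / RISE_NM_PER_BP


# Hypothetical mean contour lengths of free DNA and of DNA in the complex.
shortening_nm, wrapped_bp = wrapped_dna(340.0, 310.0)
print(f"shortening {shortening_nm:.0f} nm ~ {wrapped_bp:.0f} bp")
```

With these assumed inputs, a 30 nm shortening corresponds to roughly 88 base pairs, in line with the figure of about 90 base pairs quoted above.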
An analysis of the lengths of the DNA fragments from the bound enzyme to both ends of the DNA ("arms") also made it possible to determine which particular nucleotide site near the promoter is involved in the DNA wrapping around the RNAP molecule. The obtained results led to a molecular model of the OPC, which considers both DNA bending and DNA wrapping around the RNAP holoenzyme by approximately 300°. According to this model, DNA wraps around an RNAP molecule with a region from nucleotide −70 to nucleotide +24 (where the transcription initiation point corresponds to the "+1" position, negative values are upstream of the transcribed region, and positive values refer to the transcribed region).
Later, AFM experiments conducted on a series of DNA constructs with different nucleotide sequences upstream of the promoter and using mutant variants of RNAP showed that the degree of DNA wrapping around RNAP varies depending on the position of A/T-rich sites located upstream of the promoter (Doniselli et al. 2015) and the length of the interdomain linker of the α-subunit (Cellai et al. 2007), while DNA wrapping around RNAP devoid of the C-terminal domain of its α-subunit is practically absent (Cellai et al. 2007). This is explained by the fact that the DNA wrapping around RNAP is mostly mediated by the interaction of the carboxy-terminal domain of the α-subunit of RNAP with A/T-rich DNA regions located upstream of the "−35" hexamer.
In the elongation transcription complex with E. coli RNAP, the average value of DNA compaction was 22 nm, which agrees with the 180° wrapping of DNA around the holoenzyme (Rivetti et al. 2003). The decrease in the degree of DNA wrapping around RNAP in the elongation complex compared to the open transcription complex is associated with the loss of contact between the RNAP α-subunit and the specific DNA sequence located upstream of the promoter. The same study reports the wrapping of 30 nm of DNA around eukaryotic RNAP III in the elongation complex. DNA wrapping in the elongation complex suggests a possible mechanism by which RNAP can overcome the physical barrier in the form of nucleosomes during transcription (Rivetti et al. 2003).
The absence of DNA compaction upon binding to an enzyme that is known to induce DNA wrapping may indicate non-specific binding. Chammas et al. observed three peaks in the contour length distributions of the complexes of E. coli RNAP with 1144 bp DNA templates containing two specific promoters (Fig. 4). The first peak was indistinguishable from the contour length of the free DNA molecule and was assigned to non-specific DNA-protein complexes. The other peaks corresponded to the shortening of DNA contour length by ~25 nm and ~50 nm, respectively, and were associated with single and double OPCs (Chammas et al. 2017).
It was demonstrated that not only RNAP but also many regulatory transcription factors induce DNA wrapping. For instance, DNA segments of different lengths (29 or 80 nm), depending on the specific promoter, wrap around the transcription regulation factor FIS upon its binding to the regulatory region (Zhang et al. 2004; Maurer et al. 2006). It was also found that 40 nm of DNA was wrapped around the PutR protein, which positively regulates the expression of the putA gene of Agrobacterium tumefaciens in response to proline (Jafri et al. 1999). The DNA wrapping around the bacteriophage 186 repressor protein CI has been shown to be a regulatory factor of lytic and lysogenic transcription (Wang et al. 2013).
Using AFM, it was shown that different degrees of DNA wrapping around RNAP in the presence of another transcriptional regulatory protein, H-NS, in systems with σ70- and σ38-containing RNAP underlie the selective repression of transcription by the H-NS factor (Shin et al. 2005). The DNA wrapping around the σ70-containing RNAP is more complete, which leads to the crossing of the DNA strand wrapped around the enzyme. In this case, the resulting DNA crossing is fixed by the protein-protein interaction of H-NS molecules bound to the near-promoter region, and the transition of the transcription complex to the elongation step is impossible. In the transcription complex based on the σ38-containing RNAP, wrapping is observed to a lesser extent, so that DNA does not form a self-crossing, and transcription initiation becomes possible (Fig. 5).

Conformation and flexibility of protein-DNA complexes
A free DNA molecule is characterized by a certain conformation (i.e. the position of all atoms in space), which in many practically relevant cases can be approximated by the conformation of a continuously flexible isotropic rod, i.e. by the worm-like chain (WLC) model (Rivetti et al. 1996). The flexibility of a polymer molecule in the WLC model is quantitatively described by the persistence length P, which is defined as the characteristic decay length of the mean cosine of the angle θ between two tangents to the polymer contour at points separated by a certain length l along the contour (Fig. 6a): $\langle \cos\theta(l) \rangle = \exp(-l/P)$. For molecules equilibrated on a two-dimensional surface, the exponent becomes $-l/(2P)$ (Rivetti et al. 1996). The persistence length of DNA can be determined directly from the analysis of the molecular contours in AFM images. It was shown that the persistence length of DNA varies from ~25 to ~115 nm depending on the electrolyte composition and ionic strength of the solution (Mantelli et al. 2011; Guilbaud et al. 2019).
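A minimal sketch of how the persistence length might be extracted from a traced contour is given below. It assumes that the contour is available as (x, y) points equally spaced along the molecule, and that the mean cosine decays as exp(-l / (dim_factor · P)), with dim_factor = 2 for molecules equilibrated on a two-dimensional surface and dim_factor = 1 for the three-dimensional form; the function names, the parameter dim_factor, and the test data are all illustrative assumptions rather than a published protocol.

```python
import math


def mean_cos_vs_separation(contour, max_sep):
    """Mean cosine of the angle between segment tangents separated by 1..max_sep steps.

    contour: list of (x, y) points equally spaced along the DNA contour.
    """
    tangents = []
    for (x0, y0), (x1, y1) in zip(contour, contour[1:]):
        d = math.hypot(x1 - x0, y1 - y0)
        tangents.append(((x1 - x0) / d, (y1 - y0) / d))
    result = []
    for k in range(1, max_sep + 1):
        dots = [a[0] * b[0] + a[1] * b[1] for a, b in zip(tangents, tangents[k:])]
        result.append(sum(dots) / len(dots))
    return result


def persistence_length(step_nm, mean_cos, dim_factor=2.0):
    """Least-squares fit of ln<cos> = -l / (dim_factor * P) through the origin.

    Requires all mean cosines to be positive and not all equal to 1.
    """
    seps = [step_nm * (k + 1) for k in range(len(mean_cos))]
    num = sum(l * math.log(c) for l, c in zip(seps, mean_cos))
    den = sum(l * l for l in seps)
    return -den / (dim_factor * num)


# Check on ideal data: a 2D decay generated with P = 50 nm and 5 nm steps.
ideal_cos = [math.exp(-(5.0 * (k + 1)) / (2.0 * 50.0)) for k in range(10)]
p_fit = persistence_length(5.0, ideal_cos)

# A perfectly straight contour gives a mean cosine of 1 at every separation.
straight_cos = mean_cos_vs_separation([(float(i), 0.0) for i in range(6)], 3)
print(f"fitted persistence length: {p_fit:.1f} nm")
```

For real images the mean cosine would be averaged over many molecules before fitting, and the fit would be restricted to separations where the average is still well above the noise level.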
Protein binding to DNA may change its conformation and flexibility; therefore, the persistence length may provide important information about protein-DNA complex formation and the biological functionality of the complexes. It was shown that HU binding changes the persistence length of DNA, and that this change depends on protein concentration: initially the persistence length monotonically decreases from ~50 nm in the absence of protein to ~25 nm at an HU concentration of 500 nM, and then it increases up to ~35 nm when the HU concentration reaches 900 nM (Nir et al. 2011). TFAM and its deletion mutant TFAM-CTΔ26 have also demonstrated the ability to decrease the persistence length of DNA upon binding, although to a different extent. The persistence length decreased from ~50 nm in the absence of protein to ~30 nm in the presence of 10 nM of wild-type TFAM, but only to ~46 nm in the presence of 10 nM of TFAM-CTΔ26, which indicates a lower DNA-binding affinity of the TFAM deletion mutant (Uchida et al. 2017).
The increase of the radius of gyration estimated from AFM image analysis has revealed the increase in the stiffness of H-NS-DNA filaments as compared with the naked DNA molecules (Yamanaka et al. 2018) (Fig. 6b).
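As an illustration of the quantity involved, the radius of gyration of a traced contour is the root-mean-square distance of the contour points from their centroid. The sketch below assumes that each molecule is represented by points sampled evenly along its contour; the function name is hypothetical and the cited work may have used a different implementation.

```python
import numpy as np

def radius_of_gyration(points):
    """Root-mean-square distance of contour points from their centroid."""
    points = np.asarray(points, dtype=float)
    centered = points - points.mean(axis=0)
    return np.sqrt(np.mean(np.sum(centered ** 2, axis=1)))
```

For contours of equal length, a stiffer (more extended) molecule gives a larger value than a more compact one, which is the basis of the comparison between filaments and naked DNA.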
The comparison of the cos(θ) dependence on l for free and protein-bound DNA has allowed to estimate a contact length of DNA with a Dps molecule, a ferritin-like bacterial protein, which crystallizes DNA in the stationary phase.Angle analysis has demonstrated that Dps molecule contacts with a DNA segment of ~ 6 nm in length (Dubrovin et al. 2021).

Analysis of binding/dissociation constant and binding specificity of DNA-protein complexes
The possibility of directly counting DNA-protein complexes and free (unbound) DNA molecules allows quantitative characterization of the efficiency of DNA-protein complex formation. Such an approach has been used to investigate the complex formation of the isoforms of p53 protein, which acts as a transcription factor and a tumor suppressor, with supercoiled and linear DNA molecules. Analysis of the frequency of occurrence of p53-DNA complexes has shown stronger binding of p53 to supercoiled than to linear DNA molecules and confirmed that the various p53 isoforms differ in their affinity for DNA (Goswami et al. 2022).
The association and specificity constants of DNA-protein binding are the primary thermodynamic characteristics of DNA-protein interaction. EMSA, filter-binding assay, surface plasmon resonance (SPR), and calorimetric analysis are traditionally used to determine the thermodynamic constants of DNA-protein interaction (Yang et al. 2005). One disadvantage of these methods is that they analyze the "average picture" of DNA-protein interaction, which may misrepresent the contributions of different types of interaction. Another disadvantage of traditional methods for determining thermodynamic constants is their indirect nature: for example, the constants are calculated from the heat signal in calorimetry or from the refractive index in SPR. A linear dependence of such a signal on the DNA-protein binding constant is usually assumed, which is not always correct (Lohman and Bujalowski 1991).
Yang and coauthors proposed an AFM-based method for determining binding constants and specificities that overcomes these limitations (Yang et al. 2005). This method makes it possible to determine the binding constant and binding specificity of an enzyme for a specific DNA site because protein binding to all DNA sites can be observed directly by AFM. The binding constant $K_i$ for a given DNA-binding site ($\mathrm{DNA}_i$) is determined from the expression $K_i = [\mathrm{protein{-}DNA}_i]/([\mathrm{DNA}_i] \times [\mathrm{protein}])$, where $[\mathrm{protein}]$, $[\mathrm{protein{-}DNA}_i]$ and $[\mathrm{DNA}_i]$ are the concentrations of the free protein, the protein bound at the given site, and the unoccupied given site, respectively (Fig. 7a). The specificities can be obtained either from the ratio of the binding constant at the specific site to that at a nonspecific site ($S = K_{SP}/K_{NSP}$) or by analysis of the distribution of the positions of bound proteins on the DNA template (Fig. 7b). Therefore, the determination of equilibrium thermodynamic constants and binding specificity is based on directly counting certain DNA-protein complexes and measuring the distances from the bound enzyme to one of the ends of the DNA molecule. The use of this method to analyze the interaction of the MutS mismatch repair protein with DNA showed a much higher specificity of this enzyme for mismatched DNA nucleotides than the values obtained from EMSA (Yang et al. 2005). This discrepancy is explained by the high affinity of MutS binding to DNA ends.
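The counting step behind these expressions can be illustrated with a short sketch. It assumes that the ratio $[\mathrm{protein{-}DNA}_i]/[\mathrm{DNA}_i]$ is estimated from the numbers of DNA molecules with the site occupied and unoccupied in the AFM images, and that the free protein concentration is known independently. The function names are hypothetical and the numbers in the usage note are invented for illustration; they are not data from Yang et al. (2005).

```python
def binding_constant(n_occupied, n_total, free_protein_molar):
    """K_i = [protein-DNA_i] / ([DNA_i] * [protein]), estimated from counts.

    n_occupied: DNA molecules with a protein bound at site i
    n_total: all DNA molecules counted (occupied + unoccupied at site i)
    free_protein_molar: free protein concentration in mol/L (assumed known)
    """
    n_unoccupied = n_total - n_occupied
    if n_unoccupied <= 0 or free_protein_molar <= 0:
        raise ValueError("need unoccupied sites and a positive protein concentration")
    return (n_occupied / n_unoccupied) / free_protein_molar

def specificity(k_specific, k_nonspecific):
    """S = K_SP / K_NSP."""
    return k_specific / k_nonspecific
```

For example, with 50 of 100 molecules occupied at the specific site and 20 of 100 at a nonspecific site, both at 10 nM free protein, the sketch gives $K_{SP} \approx 1 \times 10^{8}$ M⁻¹, $K_{NSP} \approx 2.5 \times 10^{7}$ M⁻¹ and $S \approx 4$.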
Using this method, the dissociation constants of various transcription complexes and their dependence on factors regulating gene expression, such as guanosine tetraphosphate and DksA, were determined (Doniselli et al. 2015). In particular, promoter complexes formed on the bacteriophage λ pR promoter have been shown to be more stable than those on the E. coli rrn promoter. In this work, the inhibition of the rrn promoter by guanosine tetraphosphate and DksA, and the antagonism of DksA and guanosine tetraphosphate at the λ pR promoter (DksA reduced the effect of guanosine tetraphosphate), were shown (Doniselli et al. 2015). In another work, it was shown that the dissociation constant of the GabR transcription regulation factor with DNA doubled after the addition of γ-aminobutyrate, which is responsible for the initiation of transcription of the B. subtilis gabT and gabD genes (Amidani et al. 2017). Thus, it was concluded that the binding of γ-aminobutyrate to GabR promotes protein conformational changes that favor transcription initiation. Apparent dissociation constants $K_D^{app}$ were calculated for polycomb repressive complex 2 (PRC2) binding to DNA: $K_D^{app}$ was 150 ± 12 nM when PRC2 bound DNA without looping and 900 ± 400 nM for the looped configuration of PRC2-DNA complexes (Heenan et al. 2020).

Fig. 7 Determination of a binding constants and b specificities from the AFM images: $K_{SP}$, specific binding constant; $K_{NSP}$, nonspecific binding constant; $K_E$, binding constant for a DNA end. $N_{bp,P}$, the number of DNA base pairs covered by a protein. b Illustration of the position distribution of a bound protein molecule on a DNA template with a single specific site: $A_{nsp}$ and $A_{sp}$, the areas under the position distribution of protein-DNA complexes for non-specific and specific binding, respectively; $P_{min}$ and $P_{max}$, the observed minimum and maximum occurrence probabilities. Reproduced from Yang et al. (2005) under the Creative Commons CC BY license (http://creativecommons.org/licenses/by/4.0/)

AFM investigation of convergent transcription
One of the fundamental mechanisms of transcription regulation is associated with so-called transcription interference, which occurs when two RNAP molecules initiate transcription from closely located promoters either in the same direction (tandem transcription) or toward each other (convergent transcription). Genes with two promoters have been found in a large number of prokaryotes (Ward and Murray 1979; Horowitz and Platt 1982; Callen et al. 2004) and eukaryotes (Puig et al. 1999; Prescott and Proudfoot 2002), including humans (Lehner et al. 2002). "Traditional" methods of investigating transcription, such as footprinting, transcriptional analysis, and reporter assay, have revealed specific features of transcription interference, such as the suppression of transcription initiated from a weaker promoter in the presence of a stronger one in systems of convergent promoters (Sneppen et al. 2005; Shearwin et al. 2005; Palmer et al. 2011). To explain the observed effects, several mechanisms of interaction of RNAP molecules have been proposed, such as occlusion and collision events (Callen et al. 2004; Shearwin et al. 2005). Occlusion implies the inability of RNAP to bind to one of the promoters in the presence of another RNAP molecule that has initiated transcription from a convergent promoter. The collision of two transcriptional elongation complexes, or of an elongation complex with an OPC, may result in the dissociation of one or both RNAPs from the DNA template or in the stalling of both RNAPs ("sitting ducks") (Gibson et al. 2005; Shearwin et al. 2005). High-resolution microscopy such as AFM complements traditional transcription analysis by providing insight into individual RNAP-DNA complexes, which is important for understanding the mechanisms of transcription interference.
AFM was used to visualize various products of transcriptional interference, including convergent open transcriptional complexes, elongation complexes, and stalled complexes (Crampton et al. 2006) (Fig. 8). The collision of two elongation complexes led to the backward motion of one of the RNAP molecules (usually the stalled one) along the DNA template.
To study the role of the mutual orientation of the promoters and the distance between them in the suppression of convergent transcription, Koroleva et al. constructed a series of plasmids with closely spaced convergent promoters separated by different distances. AFM analysis revealed two distinct configurations of double OPCs: depending on the interpromoter distance, either the less probable cis-configuration or the more probable trans-configuration was observed. The cis-configuration of the double OPC corresponded to DNA templates with promoters separated by an integral number of DNA helical turns, whereas the trans-configuration corresponded to DNA templates with promoters separated by an integral number of DNA helical turns plus a half-turn. It has been suggested that the mutual orientation of convergent promoters regulates the formation and stability of convergent OPCs (Fig. 9) (Koroleva et al. 2016).
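The helical-phase argument above can be made concrete with a small sketch. It assumes a B-DNA helical repeat of 10.5 bp per turn and an arbitrary tolerance of a quarter turn for assigning a template to one class; both values, the function names, and the way the interpromoter distance is defined are assumptions of this illustration and are not taken from Koroleva et al. (2016).

```python
def promoter_phase(distance_bp, helical_repeat_bp=10.5):
    """Fractional part of the number of helical turns between two promoters."""
    return (distance_bp / helical_repeat_bp) % 1.0

def expected_configuration(distance_bp, helical_repeat_bp=10.5, tol=0.25):
    """Return 'cis' near an integral number of turns, 'trans' near a half-turn offset."""
    phase = promoter_phase(distance_bp, helical_repeat_bp)
    if phase < tol or phase > 1.0 - tol:
        return "cis"    # promoters face the same side of the helix
    return "trans"      # promoters face opposite sides of the helix
```

With these assumptions, a separation of 21 bp (two turns) would be classified as cis and a separation of 26 bp (about two and a half turns) as trans.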

Studying the dynamics of DNA-protein interaction
For real-time AFM study of DNA-protein interaction, the adsorption of a sample onto a substrate should satisfy two contradictory conditions: it should be strong enough to yield stable AFM images, yet the adsorbed biomolecules should retain sufficient mobility to move or turn on the substrate (Bustamante and Rivetti 1996; Billingsley et al. 2012). To meet these conflicting requirements, special attention is paid to the ionic composition of the solution in which the AFM study is carried out. The ionic composition and concentration of ions in a solution regulate the interaction of DNA and protein molecules with the mica surface, which is the substrate mainly used for real-time AFM studies of biomolecular processes. The possibility of free diffusion of DNA molecules on the mica surface in a solution containing a low concentration (1-10 mM) of divalent cations is confirmed by the DNA conformation observed in AFM images obtained in air, which is consistent with the theoretical equilibrium two-dimensional conformation of the polymer (Rivetti et al. 1996). In addition, the two-dimensional diffusion of DNA adsorbed on the mica surface at divalent cation concentrations up to 20 mM was directly confirmed by AFM studies performed in tapping mode in solution (Fig. 10a) (Bustamante et al. 1999; Jiao et al. 2001; Suzuki et al. 2010; Vanderlinden and De Feyter 2013). Therefore, solutions containing 1-20 mM divalent cations, i.e. solutions optimized for AFM studies of DNA in an equilibrium two-dimensional conformation, are often used for AFM studies of the dynamics of DNA-protein interaction on mica (van Noort et al. 1998; Ellis et al. 1999; Bustamante et al. 1999; Jiao et al. 2001; Abdelhady et al. 2003).
The optimal conditions for real-time AFM investigation of p53 interaction with DNA in tapping mode were found to be 20 mM MgCl2 in the presence of 5 mM Na-Hepes. These conditions made it possible to visualize a number of molecular events and to suggest two modes of interaction of p53 molecules with DNA: direct specific binding, and initial non-specific binding followed by translocation of the protein along the DNA to the specific site by one-dimensional diffusion (Jiao et al. 2001).
Two or three buffer solutions differing in composition were used for real-time AFM studies of transcription: one for deposition of transcription complexes on mica, another for obtaining AFM images, and a third for carrying out transcription (Guthold et al. 1999; Bennink et al. 2003). This approach made it possible to visualize the DNA movement along the RNAP during transcription and the dissociation of RNAP during termination. The rate of RNAP movement relative to DNA (0.5-2 nucleotides per second) was directly measured. In addition, one-dimensional RNAP diffusion along a DNA template that does not contain a specific promoter was visualized, demonstrating one of the possible mechanisms that speed up the promoter search by the enzyme. Other promoter search mechanisms, such as intersegment transitions and jumps, were also revealed by the analysis of consecutive AFM images obtained in solution (Bustamante et al. 1999).
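A rate of this kind can in principle be obtained from the change in length of a DNA arm across consecutive frames. The sketch below is a hypothetical illustration, not the analysis of the cited studies: it assumes that the arm length on one side of the RNAP has been measured in each frame, converts nanometres to nucleotides using a B-DNA rise of 0.34 nm per base pair, and takes the slope of a linear fit as the rate.

```python
import numpy as np

RISE_NM_PER_BP = 0.34  # assumed B-DNA rise per base pair

def transcription_rate(times_s, arm_length_nm, rise_nm_per_bp=RISE_NM_PER_BP):
    """Estimate the RNAP translocation rate in nucleotides per second.

    times_s: acquisition time of each AFM frame, in seconds
    arm_length_nm: length of one DNA arm measured in each frame, in nm
    """
    t = np.asarray(times_s, dtype=float)
    position_nt = np.asarray(arm_length_nm, dtype=float) / rise_nm_per_bp
    slope, _intercept = np.polyfit(t, position_nt, 1)  # linear fit: position vs time
    return abs(slope)
```

For instance, an arm that shortens by 3.4 nm every 10 s corresponds to about 1 nucleotide per second under these assumptions.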
It should be noted that another widespread method of mica treatment, modification with aminosilanes, is also used in AFM studies of the dynamics of DNA-protein interaction, especially when the presence of divalent cations in solution is undesirable (Shlyakhtenko et al. 2009, 2012; Suzuki et al. 2011). For this purpose, special mica modification protocols have been developed that make it possible to regulate the strength of DNA interaction with the modified mica surface and, in particular, to preserve the mobility of adsorbed DNA (Mikheikin et al. 2006; Lyubchenko and Shlyakhtenko 2009).
In recent years, high-speed AFM (HS-AFM), which can acquire up to several tens of AFM images per second, has been increasingly used to study dynamic processes. For example, the transcription elongation step was visualized at a recording rate of one AFM image per 0.5 s, and the basic mechanisms by which RNAP searches for a promoter, such as one-dimensional diffusion along DNA, intersegment transition and jumping, were confirmed (Suzuki et al. 2012). At the same time, HS-AFM made it possible to observe transcription at a rate of 15 nucleotides per second under conditions close to physiological.
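One-dimensional diffusion observed in an image sequence is often quantified through the mean squared displacement, which for free diffusion in one dimension grows as $2D\tau$. The sketch below is a generic illustration under that assumption; the function name and the input (protein positions along the DNA contour, one value per frame) are hypothetical and do not reproduce the analysis of the cited HS-AFM studies.

```python
import numpy as np

def diffusion_coefficient_1d(positions_nm, frame_interval_s, max_lag=1):
    """Estimate D (nm^2/s) from MSD(tau) = 2 * D * tau, fitted through the origin.

    positions_nm: protein position along the DNA contour in each frame
    frame_interval_s: time between consecutive frames
    max_lag: largest frame lag included in the fit
    """
    x = np.asarray(positions_nm, dtype=float)
    lags = np.arange(1, max_lag + 1)
    msd = np.array([np.mean((x[k:] - x[:-k]) ** 2) for k in lags])
    tau = lags * frame_interval_s
    slope = np.sum(tau * msd) / np.sum(tau ** 2)  # least squares through the origin
    return slope / 2.0
```

Tip-sample interactions and localization error both affect such estimates, so the value should be read as an apparent diffusion coefficient on the surface.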
The emergence of DNA origami technology (Rothemund 2006) provided an opportunity to place individual molecules at a precisely defined site on DNA nanostructures (called DNA frames) for their subsequent AFM study. Recently, this approach using DNA frames has become widely used to visualize, by HS-AFM, the movement of individual molecules as they interact with each other (Endo and Sugiyama 2014). It was used to characterize DNA-protein interactions during replication (Chao et al. 2016), recombination (Suzuki et al. 2014a, 2014b; Okholm et al. 2015), repair (Endo et al. 2010a), restriction (Endo et al. 2010b) and transcription (Endo et al. 2012). To study transcription, a promoter-containing linear DNA template was fixed at both ends on a DNA frame, and the whole structure was then adsorbed onto the mica surface. RNAP binding to the DNA template, RNAP movement along DNA, RNA synthesis, and RNAP dissociation were visualized in the constructed system (Endo et al. 2012). A DNA origami frame was also constructed for HS-AFM investigation of the mechanism by which a photoresponsive transcription factor, GAL4-VVD, searches for its specific binding site. Various types of motion of the GAL4-VVD dimer near a dsDNA template that contained several specific GAL4 binding sites have been visualized, including binding, sliding, stalling on dsDNA, and inter-strand jumping between two dsDNA templates (Raghavan et al. 2019).

Fig. 10 b … coli RNAP-pSF-OXB19 DNA) adsorbed on a stearylamine-modified HOPG surface (images 1-2) before and (images 3-4) after the addition of nucleoside triphosphates (NTP). The E. coli RNAP molecule is denoted by the arrow. The time (in min) of each AFM image relative to the first AFM image in the sequence is shown at the bottom left. The scan size is 500 × 500 nm². AFM images were obtained in the buffer. Reprinted with permission from Dubrovin et al. (2017). Copyright (2017) American Chemical Society

AFM of DNA-protein complexes on a modified HOPG surface
It has been shown that a high ionic strength may arise spontaneously near the mica surface due to the dissociation of K+ ions in aqueous solutions and the formation of K2CO3 upon mica cleavage in air (Christenson and Thomson 2016). This factor may be important for the correct interpretation of AFM images of DNA-protein complexes adsorbed on mica. For example, intensive dissociation of DNA complexes with the restriction enzyme EcoRI upon their adsorption onto a mica surface was explained by the high ionic strength of the solution near the surface (Sorel et al. 2006).
Alternative approaches to AFM investigation of DNA-protein complexes may be based on a HOPG surface modified with organic monolayers as a substrate (biopolymer adsorption on graphitic surfaces is reviewed in Dubrovin and Klinov (2021)). DNA molecules were shown to adsorb in an extended conformation on HOPG surfaces modified with dodecylamine, stearylamine, and stearic acid, which form nanostructured monolayers on graphite (Adamcik et al. 2009; Dubrovin et al. 2010, 2014). Moreover, the surface diffusion of single DNA molecules and molecular segments in water has been directly imaged on stearylamine- and stearic acid-modified HOPG surfaces (Dubrovin et al. 2016), indicating their potential suitability for real-time AFM investigations of biomolecular processes. An approach for real-time investigation of transcription on stearylamine monolayers on a HOPG surface has been proposed in Dubrovin et al. (2017) (Fig. 10b).
A HOPG surface modified with the oligopeptide-hydrocarbon derivative N,N'-(decane-1,10-diyl)bis(tetraglycineamide) (GM) was used as a substrate for AFM investigation of DNA (Prokhorov et al. 2021) and protein molecules (Barinov et al. 2016). Although DNA is kinetically trapped on a GM-HOPG surface in water, extensive surface diffusion of DNA was observed in 100 mM NaCl solution (Dubrovin et al. 2017). Therefore, the GM-HOPG surface may have considerable potential for AFM investigation of DNA-protein complexes (Klinov et al. 2020; Dubrovin et al. 2021).

Conclusions
Owing to its high spatial resolution and its ability to study individual biomolecules and dynamic processes, AFM is widely used in the investigation of DNA (RNA)-protein interaction as a complementary tool to traditional molecular biology approaches. Analysis of multiple AFM studies of DNA (RNA)-protein complexes, with a main focus on DNA transcription, made it possible to distinguish the general approaches and characteristic tasks in such investigations. The approaches include AFM investigation of the products of nucleic acid-protein interaction in an ambient environment or in aqueous solution, and real-time study of biomolecular processes such as one-dimensional diffusion of protein molecules along a DNA template, RNAP movement along DNA during transcription, and RNA synthesis. The basic types of tasks in AFM investigation of nucleic acid-protein complexes include DNA mapping and defining the stoichiometry of nucleic acid-protein complexes; investigation of protein-induced DNA looping, DNA bending and wrapping around a protein molecule; conformation and flexibility of protein-DNA complexes; analysis of binding/dissociation constants and specificity of DNA-protein binding; and the study of the dynamics of DNA-protein interactions. Special attention is paid to the contribution of AFM to understanding the fundamental mechanisms of convergent transcription, which is believed to be one of the mechanisms of regulation of gene expression. Methodological aspects of sample preparation, such as the choice of a substrate and the ionic content of DNA-protein solutions, play an important role in DNA-protein adsorption and modulate the strength of biomolecule adhesion to a substrate, which is crucial for AFM imaging. AFM has provided unique information about the morphology, conformation, and functioning of a large number of various DNA (RNA)-protein complexes, which helps to understand the fundamental processes taking place inside a cell and may be used for the development of functional biological surfaces for biomedical and biotechnological purposes.

Fig. 2 a Representative AFM snapshots of transcription-elongation complexes on (first column) unlooped and (second and third columns) looped DNA.DNA loops are formed due to the specific binding of LacI with (second column) Os and O1 operators or (third column) O2 and O1 operators (complexes without LacI are imaged in the first column).Streptavidin was used to label the biotinylated downstream end of the DNA template.The rows I-V represent transcription elongation complexes with RNAP position progressing downstream along a DNA template: I, RNAP bound at the promoter; II, RNAPs are between the promoter and near operator; III, RNAPs contact LacI at the near oper-

Fig. 3 a (Top) AFM snapshots and image profiles of DinJ-YafQ complexes of different volume specifically bound to the promoter region; (bottom) the distribution of volumes of specific DinJ-YafQ complexes. Freshly cleaved mica was used as a substrate. AFM images were obtained in air. Scale bar is 100 nm. Adapted from Bonini et al. (2022) under the CC 4.0 license (http://creativecommons.org/licenses/by/4.0/). b AFM image of the complex of streptavidin labeled

Fig. 6 a The definition of the persistence length P: angle θ between the tangents (dashed lines) at two points (black circles) along a contour of a polymer molecule (solid curve) separated by a distance l along the contour.b AFM images of (left) the naked slp promoter DNA (~ 700 bp) and (right) the same DNA incubated with 600 nM

Fig. 9 (A-C) The illustration of formation of (A) single OPCs; (B and C) two configurations of two closely located OPCs: two RNAP molecules occupy and bend a DNA template (B) from one side and (C) from different sides.(D-E) AFM snapshots of two closely located open promoter complexes, illustrating cases (B) and (C).A DNA bend near RNAP molecule is indicated by the white arrow in