Data on structural analysis of cholesterol binding and sterol selectivity by ABCG5/G8

ATP-binding cassette subfamily G (ABCG) sterol transporters maintain whole-body homeostasis of endogenous and exogenous sterols. A substantial portion of exogenous sterols consists of indigestible phytosterols (plant sterols), which can cause complications when they accumulate. ABCG5/G8 is the main protein that removes ingested plant sterols, protecting against their toxic effects. However, the structural features underlying substrate binding by ABCG5/G8 remain poorly resolved. In this data article, we present extended procedures for determining the cholesterol-bound crystal structure and for the sterol docking analysis. The crystal structure was deposited in the Protein Data Bank under accession number 8CUB, and the diffraction images were deposited at the SBGrid Data Bank. This dataset accompanies the research article entitled "Structural analysis of cholesterol binding and sterol selectivity by ABCG5/G8" (doi: 10.1016/j.jmb.2022.167795).


Specifications
Subject: Biological sciences
Specific subject area: Structural biology
Type of data: Figure, Table, External repository
How the data were acquired: Figs. 2 and 3 and Tables 1 and 2: the data were acquired by X-ray crystallography, collecting X-ray diffraction images at beamline 19-ID of the Advanced Photon Source (APS). Fig. 4 and Tables 3-5: the data were acquired by molecular docking with UCSF DOCK 6.9 through the SBGrid Consortium. Fig. 5: the data were acquired by multiple sequence alignment using PSI-BLAST and PROMALS3D; only five representative species are shown in the figure.
Data format: Raw, Analyzed
Description of data collection: The X-ray diffraction data were collected remotely at beamline 19-ID of the Advanced Photon Source. Diffraction images from a total of three crystals were merged for data processing. Molecular docking data were obtained after excluding published ligands from the input models. Amino acid sequences from the top 300 mammalian species were obtained with the online PSI-BLAST tool. All sequences in FASTA format were subjected to multiple sequence alignment for conservation analysis.
Data

Value of the Data
• The crystallographic data show the electron density of a novel cholesterol ligand in the ABCG5/G8 crystal structure and provide structural information at the highest resolution attainable from these crystals.
• The ligand-docking data show a collection of predicted poses of cholesterol, sitosterol and stigmasterol, the three sterols most enriched in sitosterolemia patients. Contrasting the sterol poses in ABCG1 and ABCG5/G8 provides hypothesis-driven models for sterol selectivity by ABCG sterol transporters.
• The sequence analysis data show amino acid conservation patterns on the transmembrane helices along the putative substrate translocation pathways of mammalian and yeast ABCG transporters. These preliminary results suggest a novel key region within ABCG proteins that may play a role in regulating transporter activity, and they can guide experimental structural and functional studies.

Data Description
The data reported herein describe the acquisition, processing and presentation of the crystal structure of a cholesterol-bound conformation of the ABCG5/G8 sterol transporter, including figures, tables, and external repositories. Fig. 1 depicts a flow chart of protein purification leading to crystallization. Figs. 2 and 3 illustrate the atomic model of the crystal structure and the fitting of electron density to the protein backbone and the cholesterol ligand, where the sterol-binding site is near alanine 540 of ABCG5 (A540). Table 1 summarizes the X-ray reflection intensities of the processed X-ray images, and Table 2 presents the results of i) model searching by molecular replacement in Phaser and ii) model refinement in Phenix. The refined and validated structure is publicly available at the Protein Data Bank under accession number 8CUB. Three sets of raw X-ray diffraction images are publicly accessible at the SBGrid Data Bank under identification numbers 971, 972, and 973. Fig. 4 shows the molecular docking of cholesterol and two plant sterols (i.e., sitosterol and stigmasterol) to the ABCG5/G8 sterol transporter. Tables 3, 4 and 5 summarize the ligand-docking results on ABCG5/G8 for cholesterol, stigmasterol, and sitosterol, respectively. Fig. 5 highlights the amino acid conservation analysis of the Phenylalanine Highway (PH) motif on transmembrane helix 2 (TMH2) in five mammalian ABCG sterol transporters.

The β-DDM-solubilized membrane preparation was subjected to Ni-NTA/CBP tandem affinity chromatography to extract ABCG5/G8 heterodimers and to exchange the detergent for MNG. Endo H and 3C protease were used to remove the extracellular N-glycans and the CBP tag, respectively. CBP-free proteins were collected from the flow-through of a second CBP column and further purified by gel-filtration chromatography. Purified transporters were subjected to lysine methylation and cysteine-capping alkylation, and the excess reagents were removed using a second Ni-NTA column. Synthetic phospholipids were added to the Ni-NTA-bound proteins, and the eluates were then desalted on a PD-10 spin column. If needed, ligands (e.g., ATPase inhibitors) were mixed with the second Ni-NTA eluates before desalting. The relipidated proteins were equilibrated with cholesterol and then concentrated to the desired concentration for crystal growth.

In the electron density panels, densities are shown as mesh contoured at 1 σ, with a mesh thickness of 1.5 as implemented in PyMOL.
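Contouring a map "at 1 σ" means drawing the isosurface at a density value tied to the spread of the map values rather than at a fixed absolute value. The following is a minimal sketch that assumes the common convention level = mean + n·σ, with σ taken as the standard deviation of all map values. PyMOL's own map normalization is not reproduced here, and the toy map is hypothetical.

```python
from statistics import mean, pstdev

def sigma_contour_level(density, n_sigma=1.0):
    """Absolute density value for an n-sigma contour.

    Assumed convention: mean + n_sigma * population standard deviation
    of all map values.
    """
    values = list(density)
    return mean(values) + n_sigma * pstdev(values)

def fraction_enclosed(density, n_sigma=1.0):
    """Fraction of map points at or above the n-sigma contour level."""
    values = list(density)
    level = sigma_contour_level(values, n_sigma)
    return sum(v >= level for v in values) / len(values)

# Hypothetical 1D stand-in for a flattened map: mostly flat, one peak.
toy_map = [0.0, 0.0, 0.0, 4.0]
print(sigma_contour_level(toy_map))  # mean 1.0 plus pstdev sqrt(3)
print(fraction_enclosed(toy_map))
```

Raising `n_sigma` shrinks the enclosed volume, which is why the contour level chosen for figures should be stated alongside the density.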

Table 1
Summary of reflection intensities and R-factors by resolution shell.

Protein crystallization
Proteins used for crystallization were prepared according to the previously established protocol [4]. Fig. 1 highlights the key steps of the protein purification. The relipidated proteins were treated with 1 mM AMPPNP (sodium salt, Roche) overnight at 4 °C and desalted on a PD-10 column. The desalted, relipidated proteins were incubated overnight with cholesterol (prepared in isopropanol) at a final concentration of ∼20 μM. Protein precipitates were removed by ultracentrifugation at 100,000 × g and 4 °C for 10 minutes, using a TLA-100 rotor in a benchtop Optima TLX ultracentrifuge. The supernatants were concentrated to a final protein concentration of 30-50 mg/ml. The cholesterol-treated proteins were then reconstituted into 10% DMPC/cholesterol/CHAPSO bicelles. To prepare the 10% bicelle stock solution, lipids and detergent (CHAPSO) were mixed at a 3:1 (w/w) ratio, where the lipids comprised 5 mol % cholesterol and 95 mol % DMPC. The proteins and bicelle stock were gently mixed at a 1:4 (v/v) ratio, giving a final protein concentration of ∼10 mg/ml. The protein/bicelle mixture was incubated on ice for 30 min and then mixed with an equal volume of crystallization reservoir solution (usually 0.5 or 1 mL). Crystals grew at 20 °C by hanging-drop vapor diffusion in 48-well VDX48 crystallization trays using thin glass cover slides. The crystallization reservoir solution contained 1.8-2.0 M ammonium sulfate, 100 mM MES pH 6.5, 2-5% PEG400, and 1 mM TCEP. After 1-2 weeks of incubation, the protein crystals (Fig. 2) were harvested by submerging them in 0.2 M sodium malonate and flash-freezing them in liquid nitrogen.
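The bicelle preparation above involves several ratios. The sketch below works through them under stated assumptions: "10%" is read as 100 mg/mL of lipid plus detergent (w/v), "1:4 (v/v)" as 1 part protein to 4 parts bicelle stock, and the molar masses are rounded standard values. With these readings, a 50 mg/ml protein input reproduces the ∼10 mg/ml stated above. The function names are illustrative only.

```python
# Molar masses in g/mol (standard values, rounded).
MOLAR_MASS = {"DMPC": 677.9, "cholesterol": 386.7, "CHAPSO": 630.9}

def bicelle_stock_mM(total_mg_per_ml=100.0, lipid_to_chapso_w=3.0,
                     chol_mol_fraction=0.05):
    """Component concentrations (mM) of a bicelle stock.

    Assumes "10%" means 100 mg/mL lipid + detergent (w/v), split 3:1 (w/w)
    between lipids and CHAPSO, with lipids at 5 mol% cholesterol, 95 mol% DMPC.
    """
    lipid_mg = total_mg_per_ml * lipid_to_chapso_w / (lipid_to_chapso_w + 1.0)
    chapso_mg = total_mg_per_ml - lipid_mg
    # Mole-fraction-weighted mean molar mass of the lipid mixture.
    mean_lipid_mass = (chol_mol_fraction * MOLAR_MASS["cholesterol"]
                       + (1.0 - chol_mol_fraction) * MOLAR_MASS["DMPC"])
    lipid_mM = lipid_mg / mean_lipid_mass * 1000.0  # (g/L)/(g/mol) = M, then mM
    return {
        "cholesterol": chol_mol_fraction * lipid_mM,
        "DMPC": (1.0 - chol_mol_fraction) * lipid_mM,
        "CHAPSO": chapso_mg / MOLAR_MASS["CHAPSO"] * 1000.0,
    }

def after_mixing(conc, own_parts, other_parts):
    """Concentration after mixing own_parts volumes with other_parts volumes."""
    return conc * own_parts / (own_parts + other_parts)

# Protein:bicelle = 1:4 (v/v), read as 1 part protein to 4 parts bicelle.
for protein_mg_ml in (30.0, 50.0):
    print(protein_mg_ml, "->", after_mixing(protein_mg_ml, 1, 4), "mg/ml protein")
print("bicelle in mixture (%):", after_mixing(10.0, 4, 1))
# The hanging drop then mixes this 1:1 with reservoir, halving it again.
print("protein in drop (mg/ml):", after_mixing(after_mixing(50.0, 1, 4), 1, 1))
print(bicelle_stock_mM())
```

Under these assumptions, the 30-50 mg/ml range gives 6-10 mg/ml protein in an 8% bicelle mixture before the drop is set up.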

Collection of X-ray diffraction data
The X-ray diffraction images were collected on a Quantum 315r detector at beamline 19-ID of the Advanced Photon Source (APS). Data collection was carried out remotely using the SBCCollect GUI for beamline control. The crystals were exposed to X-rays (wavelength ∼1 Å) under cryogenic conditions, with the beam window fully open and without attenuation. For each crystal, 90 diffraction images were collected in 1° increments with 2-minute exposures (Fig. 2). Diffraction images from each crystal were processed individually with Denzo and XdisplayF as implemented in HKL2000, and each dataset was then scaled with Scalepack in HKL2000. The datasets from three crystals had similar unit cell dimensions, so they were merged. They were scaled over the 30-4 Å resolution range to ensure a sufficient signal-to-noise ratio (I/σ(I) > 1), in space group I222 with unit cell dimensions a = 173.27 Å, b = 230.11 Å, c = 249.82 Å, and α = β = γ = 90°. Table 1 summarizes the reflection intensities and R-factors by resolution shell.
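Two quantities behind Table 1 can be checked from the numbers above. For an orthorhombic cell (all angles 90°), the volume is simply a·b·c. Statistics "by shells" are usually reported in resolution bins. The sketch below assumes one common binning convention, shells of equal reciprocal-space volume (∝ 1/d³), which hold roughly equal numbers of reflections. Whether this matches the exact binning used for Table 1 is an assumption, and the choice of 10 shells is hypothetical.

```python
def orthorhombic_cell_volume(a, b, c):
    """Unit cell volume in cubic angstroms; valid since alpha = beta = gamma = 90 deg."""
    return a * b * c

def equal_volume_shells(d_max, d_min, n_shells):
    """Resolution limits (angstrom) from d_max down to d_min such that every
    shell encloses the same reciprocal-space volume (proportional to 1/d^3)."""
    s3_low = (1.0 / d_max) ** 3
    s3_high = (1.0 / d_min) ** 3
    limits = []
    for k in range(n_shells + 1):
        s3 = s3_low + (s3_high - s3_low) * k / n_shells
        limits.append(1.0 / s3 ** (1.0 / 3.0))
    return limits

volume = orthorhombic_cell_volume(173.27, 230.11, 249.82)
print(f"V = {volume:.4e} A^3")
shells = equal_volume_shells(30.0, 4.0, 10)
for outer, inner in zip(shells, shells[1:]):
    print(f"{outer:6.2f} - {inner:5.2f} A")
```

Because volume grows as 1/d³, the low-resolution shells span a wide range of d while the high-resolution shells are narrow.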

Model building and refinement of the crystal structure
For molecular replacement, we used the cryo-EM structure of ABCG5/G8 (PDB ID: 7JR7) as the starting model in Phaser to obtain phase information and a structural solution [1, 2]. Before model refinement in Phenix, the sequence registry of Phaser's solution at the Walker A, Walker B, and Signature motifs was corrected to match the precise registry of the previous ABCG5/G8 crystal structure (PDB ID: 5DO7) [3, 4]. The initial model underwent several rounds of refinement, yielding a model with Rfree and Rwork of 0.338 and 0.309, respectively. Close inspection of the Fo-Fc map (Fig. 3) revealed two orthogonal electron densities that clearly show the shape of polycyclic rings and lie within van der Waals distance of ABCG5 A540. We refined the structure with one cholesterol molecule on each ABCG5/G8 heterodimer, testing a series of sterol orientations in COOT [5]. After the final refinement, Rfree and Rwork improved to 0.302 and 0.244, respectively. Finally, model quality was validated with MolProbity as implemented in Phenix [3, 6]. We also attempted to place cholesteryl hemisuccinate (CHS, used during purification) or ergosterol (the native sterol of yeast), but each ligand led to severe ligand-protein clashes. Cholesterol was therefore the most suitable ligand in this study.
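The Rwork and Rfree values quoted above follow the standard crystallographic residual R = Σ| |Fo| − |Fc| | / Σ|Fo|. Rwork is computed over reflections used in refinement, and Rfree is computed over a held-out test set. The following is a minimal sketch that assumes Fc is already on the same scale as Fo (real programs also fit a scale factor and bulk-solvent terms). The toy amplitudes are hypothetical.

```python
def r_factor(f_obs, f_calc):
    """R = sum(| |Fo| - |Fc| |) / sum(|Fo|); assumes Fc is already scaled to Fo."""
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator

def r_work_and_free(f_obs, f_calc, is_free):
    """Split reflections by their test-set flag and return (R_work, R_free)."""
    work = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, is_free) if not free]
    test = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, is_free) if free]
    return r_factor(*zip(*work)), r_factor(*zip(*test))

# Hypothetical amplitudes: two working and two free reflections.
print(r_work_and_free([10.0, 10.0, 10.0, 10.0],
                      [9.0, 10.0, 12.0, 7.0],
                      [False, False, True, True]))
```

Because the free set is never used to fit the model, a modest gap between the two values (0.058 for the final values reported here) indicates limited overfitting.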

Molecular docking
ABCG5/G8 models were obtained from the Protein Data Bank, whereas the ABCG5/G8 mutants (ABCG5 A540F/G8 and ABCG5 Y432F/G8 N564P) were generated with UCSF MODELLER using PDB ID 5DO7 as the template. Four hundred models were generated and ranked by discrete optimized protein energy (DOPE) score. The best model was also visually inspected for disordered regions within the transmembrane domain (TMD).
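Ranking the 400 homology models by DOPE amounts to sorting by an energy-like score where more negative values are better. The sketch below assumes the scores have already been collected as (model name, DOPE score) pairs, for example from the modelling log. The file names and score values are hypothetical, and the subsequent visual inspection of the TMD is not automated here.

```python
def rank_models_by_dope(scores):
    """Sort (model_name, dope_score) pairs best-first.

    DOPE is an energy-like score, so more negative values are better.
    """
    return sorted(scores, key=lambda item: item[1])

def best_model(scores):
    """Name of the lowest-DOPE model."""
    return rank_models_by_dope(scores)[0][0]

# Hypothetical names and scores for three of the 400 models.
example = [
    ("mutant.B99990001.pdb", -61250.4),
    ("mutant.B99990002.pdb", -61980.7),
    ("mutant.B99990003.pdb", -60875.1),
]
print(best_model(example))
```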
Molecular docking was performed with UCSF DOCK 6.9 [7], accessed through SBGrid [8]. Protein files were downloaded from the Protein Data Bank or, for the mutants, generated with MODELLER [9]. All protein files were in .pdb format. The ligand files were downloaded from PubChem as SDF files. Chimera was used to visualize all docking steps [10]. The nucleotide-binding domains (NBDs) were removed before docking to reduce the overall size of the receptor, which was necessary because DOCK6 has a size limit. During preparation, all three-dimensional .pdb and .sdf model files (protein and ligand) were converted to .mol2 format, and hydrogens and charges were added to each. A surface model of ABCG5/G8 in .dms format was also generated and used to create spheres (with the sphgen accessory program) marking the ligand-binding site. The spheres were restricted to a 10 Å radius from the center of each dimer interface, where the ligands were located. A box was then built around these spheres and visually inspected to ensure that it maximized coverage of the protein surface and the marked binding site. The box also defined the extent of the energy grid, which was computed with the grid accessory program. Two grid files were generated: the .nrg file, used to score ligand conformations, and the .bmp file, used to determine whether receptor and ligand atoms overlap. The allowed van der Waals overlap was set to 0.75, a stringent restriction on overlap. The grid spacing was set to 0.4 Å. The grid output files were fed into DOCK6's minimization step, in which the ligands were moved rigidly in three dimensions, i.e., without rotation of any bonds.
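The site definition above is essentially geometry: keep the spheres near the interface center, enclose them in a box, and sample that box on a 0.4 Å grid. The helpers below sketch those steps on plain coordinates. They are not DOCK's accessory programs, which do this work on real files. The bump rule assumes an overlap is flagged when two atoms are closer than 0.75 × the sum of their van der Waals radii; this reading of the 0.75 setting is an assumption and should be checked against the DOCK 6 manual. All coordinates and radii are hypothetical.

```python
import math

def select_spheres(centers, site_center, radius=10.0):
    """Keep sphere centres within `radius` angstroms of the chosen site centre."""
    return [c for c in centers if math.dist(c, site_center) <= radius]

def bounding_box(points, margin=0.0):
    """Axis-aligned box (low corner, high corner) around points, padded by margin."""
    low = tuple(min(p[i] for p in points) - margin for i in range(3))
    high = tuple(max(p[i] for p in points) + margin for i in range(3))
    return low, high

def grid_dimensions(low, high, spacing=0.4):
    """Grid points per axis for a box sampled every `spacing` angstroms."""
    # Small epsilon guards against floating-point noise in (high - low) / spacing.
    return tuple(math.ceil((h - l) / spacing - 1e-9) + 1 for l, h in zip(low, high))

def is_bump(distance, radius_a, radius_b, allowed_overlap=0.75):
    """Assumed bump rule: flag pairs closer than allowed_overlap * (r_a + r_b)."""
    return distance < allowed_overlap * (radius_a + radius_b)

# Hypothetical sphere centres; the third lies outside the 10 A cutoff.
spheres = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
selected = select_spheres(spheres, (0.0, 0.0, 0.0))
low, high = bounding_box(selected, margin=1.0)
print(selected, low, high, grid_dimensions(low, high))
print(is_bump(2.0, 1.7, 1.7), is_bump(3.0, 1.7, 1.7))
```

A finer grid spacing increases the number of points cubically, which is one reason the box is kept tight around the selected spheres.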
The rigid minimization step moved each ligand to a more energetically favorable position within the binding site and provided the reference position (input file) from which flexible docking started. Flexible docking allowed DOCK6 to rotate any of the ligand's rotatable bonds to better fit potential binding sites. Maximum orientations of 1,000, 5,000 and 10,000 were tested and yielded no differences, indicating that these values exceeded the number needed to saturate sampling of new poses. Flexible docking produced three output files: scored, orients and conformers. The scored file was opened in Chimera's ViewDock tool, and the poses were ranked by van der Waals energy (vdw_energy), where lower values are better. This scoring method was chosen on the basis of redocking controls on ABCG2 [11]. Here, we docked cholesterol and two plant sterols (stigmasterol and sitosterol) to wild-type ABCG5/G8 and to the mutants ABCG5 A540F/G8 and ABCG5 Y432F/G8 N564P [1]. In Fig. 4, both plant sterols appear to adopt similar binding conformations, forming two separate clusters. The poses, parameters and docking data for all three sterols are given in Tables 3-5. In addition, both plant sterols lack the top-most horizontal conformation adopted by cholesterol, which was resolved in the crystal structure and also observed in the docking simulation [11]. In the A540F mutant, both plant sterols remain clustered similarly at the bottom of the protein, with their polar heads near the intracellular interface. Cholesterol, by contrast, clusters similarly in the lower membrane region but maintains a horizontal conformation, this time pointing in the opposite direction (Fig. 4, Tables 3-5).
In the wild type, the docked poses extend from the top of the transmembrane domain (TMD), whereas in the A540F mutant, sitosterol clusters exclusively at the bottom, with its hydroxyl group facing the intracellular side of the membrane.
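Ranking poses by vdw_energy and checking that several maximum-orientation settings give the same result can be scripted outside ViewDock. The sketch below assumes the scored output stores per-pose values in comment headers of the form "########## <key>: <value>". Both that header format and the key name Grid_vdw_energy are assumptions that should be verified against actual DOCK 6 output. The file excerpt and energies are hypothetical.

```python
def parse_scores(lines, key="Grid_vdw_energy"):
    """Collect one value per pose from header lines of the form
    '##########  <key>:  <value>' (format and key name are assumptions)."""
    scores = []
    for line in lines:
        if not line.startswith("##########"):
            continue
        name, sep, value = line.lstrip("#").strip().partition(":")
        if sep and name.strip() == key:
            scores.append(float(value))
    return scores

def rank_poses(scores):
    """Pose indices ordered best-first; lower vdW energy is better."""
    return sorted(range(len(scores)), key=lambda i: scores[i])

def is_saturated(runs, top_k=5, tol=1e-3):
    """True if every max-orientations setting returns the same top_k energies."""
    tops = [sorted(energies)[:top_k] for energies in runs.values()]
    reference = tops[0]
    return all(
        len(top) == len(reference)
        and all(abs(a - b) <= tol for a, b in zip(top, reference))
        for top in tops[1:]
    )

# Hypothetical excerpt of a scored output file.
scored = [
    "##########                Name:   cholesterol",
    "##########     Grid_vdw_energy:   -38.50",
    "@<TRIPOS>MOLECULE",
    "##########                Name:   cholesterol",
    "##########     Grid_vdw_energy:   -41.25",
]
energies = parse_scores(scored)
print(rank_poses(energies))  # best pose first
print(is_saturated({1000: energies, 5000: energies[::-1], 10000: energies}))
```

Comparing sorted top-k energies, rather than pose order, keeps the saturation check insensitive to the order in which DOCK writes poses.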

Conservation analysis of protein sequences
The conservation analysis started with protein FASTA sequences retrieved from UniProt for ABCG1, ABCG2, ABCG4, ABCG5 and ABCG8 (P45844, Q9UNQ0, Q9H172, Q9H222, Q9H221). Each sequence was submitted individually to NCBI's Basic Local Alignment Search Tool for proteins (BLASTp), with the search parameters set to 300 species. Although full-length sequences were submitted to BLAST, only the Phenylalanine Highway (PH) was monitored for conservation. The PH was found to be at least 99% conserved. Manual inspection showed that the remaining percentage was occasionally lost because differing alignments introduced breaks in the sequences. Fig. 5 shows the conservation of the ABCG proteins' PH in five different mammalian species.
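A percentage such as "at least 99% conserved" can be computed column by column over an aligned motif window. The sketch below scores each column as the fraction of sequences that carry that column's most common residue. Gaps never count as matches, which mirrors how alignment breaks reduced the observed conservation. The toy alignment and the window positions are hypothetical; real windows would come from the PH positions in the alignment.

```python
from collections import Counter

def column_conservation(aligned, start, end):
    """Per-column conservation over alignment columns [start, end).

    Conservation is the fraction of sequences carrying the most common residue;
    gaps ('-') never count as a match, so alignment breaks lower the score.
    """
    fractions = []
    for col in range(start, end):
        residues = [seq[col] for seq in aligned]
        counts = Counter(r for r in residues if r != "-")
        most_common = counts.most_common(1)[0][1] if counts else 0
        fractions.append(most_common / len(residues))
    return fractions

def motif_conservation(aligned, start, end):
    """Mean per-column conservation across the motif window."""
    fractions = column_conservation(aligned, start, end)
    return sum(fractions) / len(fractions)

# Hypothetical toy alignment; one sequence carries a gap inside the window.
toy = ["AFFLF", "AFFLF", "AFF-F", "AYFLF"]
print(column_conservation(toy, 1, 5))
print(motif_conservation(toy, 1, 5))
```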

Ethics Statements
This work did not involve human subjects. The manuscript adheres to the Ethics in Publishing standards.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.