Analysis of the conformations of the HIV-1 protease from a large crystallographic data set

The HIV-1 protease performs essential roles in viral maturation by processing specific cleavage sites in the Gag and Gag-Pol precursor polyproteins to release their mature forms. Here the analysis of a large HIV-1 protease data set (containing 552 dimer structures) is reported. These data are related to the article entitled “Conformations of the HIV-1 protease: a crystal structure data set analysis” (Palese, 2017) [1].


© 2017 The Authors. Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

Subject area
Chemistry, Biology.

More specific subject area
Biochemistry, HIV-1 protease structure.

Type of data
Table (csv files), text file, figure, animated figures.

How data was acquired
Input data for the analysis were obtained as pdb files from a public database.

Data format
Raw: pdb files (as text files). Analyzed: table (csv files), text file, graph, animated GIF.

Experimental factors
Raw pdb files were checked for quality.

Experimental features
The pdb files included in the database were analyzed by different computational protocols.

Data source location
Not applicable.

Data accessibility
Analyzed data are within this article.

Value of the data
The described data set includes a very large number of the publicly available structures of the HIV-1 protease.
The database can be useful in drug design and analysis studies. The evidence that different sequences adopt preferential conformations could provide an interesting benchmark for the computational prediction and fine tuning of protein structures.

Data sets
The large HIV-1 protease data set used in the analysis is reported in csv format (file name HIV-1_dataset.csv). Data in this file are arranged in columns, with headers in the first row: the first column reports the PDB id of each entry; the second column reports the internal sequence id; the last two columns report the first and second principal component projections, respectively (calculated by the truncated SVD method [1]). The high quality structures are listed in the file HIV-1_HQ_dataset.csv. This file reports the PDB id and the available quality data (R observed, R all, R work, R free, refinement resolution, and the R difference); the last column reports the sequence cluster id.
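As an illustration of the column layout described above, the following minimal Python sketch reads a file arranged like HIV-1_dataset.csv. Columns are addressed by position because the exact header names are not stated here; the function name read_projections, the sample header names, the PDB ids and the numerical values are all hypothetical and only stand in for the real file.

```python
import csv
import io


def read_projections(handle):
    """Return {pdb_id: (sequence_id, pc1, pc2)} from a CSV laid out as described.

    Assumed layout (from the text): first column = PDB id, second column =
    internal sequence id, last two columns = first and second principal
    component projections. The first row is treated as a header and skipped.
    """
    reader = csv.reader(handle)
    next(reader)  # skip the header row
    table = {}
    for row in reader:
        if not row:
            continue  # tolerate blank lines
        pdb_id = row[0].strip()
        seq_id = row[1].strip()
        pc1, pc2 = float(row[-2]), float(row[-1])
        table[pdb_id] = (seq_id, pc1, pc2)
    return table


# Hypothetical sample: header names, ids and values are invented for illustration.
SAMPLE = "pdb_id,seq_id,pc1,pc2\n0AAA,1,-1.5,0.25\n0AAB,2,2.0,-0.75\n"
projections = read_projections(io.StringIO(SAMPLE))
```

With the real file, the same function could be called on a handle opened with open("HIV-1_dataset.csv", newline=""), provided the file follows the layout assumed here.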
The full set of fluctuations (see [1]) is reported in the file fluctuations.csv. Each row in this file represents an eigenvector (297 eigenvectors describe the monomer), and each column represents an amino acid (99 amino acids compose the monomer).
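The stated dimensions are consistent with three Cartesian coordinates per alpha-carbon (3 × 99 = 297). A minimal Python sketch of a shape check for a file arranged like fluctuations.csv is given below. Whether the real file carries a header row is not stated, so this is exposed as the has_header parameter; the function names and the toy values are hypothetical.

```python
import csv
import io

N_RESIDUES = 99            # amino acids per monomer (from the text)
N_MODES = 3 * N_RESIDUES   # 297 eigenvectors: x, y, z per alpha-carbon


def read_fluctuations(handle, n_modes, n_residues, has_header=False):
    """Read a modes-by-residues table and verify its shape.

    has_header is an assumption: the text does not say whether the
    file starts with a header row.
    """
    reader = csv.reader(handle)
    if has_header:
        next(reader)
    rows = [[float(value) for value in row] for row in reader if row]
    if len(rows) != n_modes:
        raise ValueError("expected %d rows, found %d" % (n_modes, len(rows)))
    for row in rows:
        if len(row) != n_residues:
            raise ValueError("expected %d columns, found %d" % (n_residues, len(row)))
    return rows


def most_mobile_residue(rows, mode_index):
    """Return the 1-based residue number with the largest absolute value in one mode."""
    row = rows[mode_index]
    best = max(range(len(row)), key=lambda i: abs(row[i]))
    return best + 1


# Toy table with 2 residues and 6 modes; the values are invented for illustration.
TOY = "0.1,0.9\n0.4,0.2\n0.0,0.3\n0.5,0.5\n0.7,0.1\n0.2,0.6\n"
toy_rows = read_fluctuations(io.StringIO(TOY), n_modes=6, n_residues=2)
```

For the real file, the call would use n_modes=N_MODES and n_residues=N_RESIDUES, so that a table with a different shape is rejected before any further analysis.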
The first and second principal modes calculated for the monomer data set are reported as animated GIF images (see [1] for details). Some relevant modes are reported as nmd files [1][2][3].
Supplementary material related to this article can be found online at: doi:10.1016/j.dib.2017.09.076.
Some results of the analysis reported in [1] on the data set described above are shown in Figs. 1-4. The reader can refer to [1] for full details.

Relevant sequence clusters in the data set
Some of the sequence clusters of the HIV-1 protease data set discussed in [1] are reported in Fig. 5; differences with respect to the Consensus B sequence (Stanford HIV database) [2][3][4][5] are shown in red.

Experimental design, materials and methods
Structures sharing 90% identity with the Consensus B sequence (Stanford HIV database) [4][5][6][7] were initially considered. The X-ray structures of the HIV-1 protease were obtained from the PDB [8][9][10]. A total of 581 structures in the PDB met this criterion. Structures that were solved by X-ray crystallography, were in dimeric form, were classified with E.C. number 3.4.23.16 (HIV-1 retropepsin), and had a refinement resolution of 3.1 Å or better were further selected. The number of alpha-carbon atoms in the downloaded pdb files was checked with grep in bash, after the multiple conformations had been deleted with sed. A few structures required a further manual editing step. Finally, 552 HIV-1 protease structures, in dimeric form, were included in the data set.
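The alpha-carbon check described above was carried out with grep and sed; the following Python sketch is only an equivalent illustration, not the original commands. It assumes the fixed-column PDB ATOM record layout and assumes that, where alternate locations exist, only the one labelled "A" is kept (the text says only that multiple conformations were deleted). The helper make_atom_line and the sample records are hypothetical.

```python
from collections import Counter

EXPECTED_CA_PER_CHAIN = 99  # residues per monomer (from the text)


def count_alpha_carbons(pdb_lines):
    """Count CA atoms per chain, keeping a single alternate location.

    Fixed PDB columns (0-based slices): atom name 12:16,
    alternate location 16, chain identifier 21.
    """
    counts = Counter()
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue  # skip HETATM, TER, REMARK and other records
        atom_name = line[12:16].strip()
        alt_loc = line[16]
        # Assumption: keep atoms without an alternate location, or the one labelled "A".
        if atom_name == "CA" and alt_loc in (" ", "A"):
            counts[line[21]] += 1
    return counts


def make_atom_line(serial, name, alt_loc, res_name, chain, res_seq):
    """Hypothetical helper: build a minimal, column-aligned ATOM record."""
    return "ATOM  %5d %-4s%1s%3s %1s%4d    %8.3f%8.3f%8.3f" % (
        serial, name, alt_loc, res_name, chain, res_seq, 0.0, 0.0, 0.0)


# Invented records: residue 2 of chain A has two alternate locations.
sample_lines = [
    make_atom_line(1, " N  ", " ", "PRO", "A", 1),
    make_atom_line(2, " CA ", " ", "PRO", "A", 1),
    make_atom_line(3, " CA ", "A", "GLN", "A", 2),
    make_atom_line(4, " CA ", "B", "GLN", "A", 2),
    make_atom_line(5, " CA ", " ", "PRO", "B", 1),
]
ca_counts = count_alpha_carbons(sample_lines)
```

For a complete dimer, each of the two chains would be expected to report EXPECTED_CA_PER_CHAIN alpha-carbons; entries that deviate are candidates for the manual editing step.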
The structures contained in the data set were aligned to a common reference by Tcl (www.tcl.tk) scripting in VMD [3]. The new atomic coordinates were stored in a pdb file. For the analysis, the Cartesian coordinates of the alpha-carbon atoms of the superposed structures were extracted and arranged in matrix form by a Tcl script in VMD. Brackets in the resulting text file were removed in vi (www.vim.org). As a result, the coarse-grained conformations were arranged in a matrix such that each row represented a sample and each column a degree of freedom. This data matrix was analyzed by the methods described in [11][12][13][14][15][16][17][18][19][20][21][22][23][24][25], as reported in [1].
Fig. 5. Some sequence groups of the HIV-1 protease data set (see [1]).
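The bracket removal and the row-per-sample arrangement can be sketched in Python as follows. This is an illustration under stated assumptions, not the original Tcl script or vi commands: it assumes that each line of the text file holds one structure written as a Tcl list of per-atom coordinate triplets enclosed in braces. The function names and the sample coordinates are hypothetical.

```python
def parse_coordinate_line(text):
    """Strip Tcl list braces and return a flat list of floats (x1, y1, z1, x2, ...)."""
    cleaned = text.replace("{", " ").replace("}", " ")
    return [float(token) for token in cleaned.split()]


def build_data_matrix(lines):
    """Arrange one structure per row and one degree of freedom per column."""
    matrix = [parse_coordinate_line(line) for line in lines if line.strip()]
    widths = {len(row) for row in matrix}
    if len(widths) > 1:
        # Every structure must contribute the same number of coordinates.
        raise ValueError("rows have different lengths: %s" % sorted(widths))
    return matrix


# Invented input: two structures with two alpha-carbons each.
raw_lines = [
    "{1.0 2.0 3.0} {4.0 5.0 6.0}",
    "{1.5 2.5 3.5} {4.5 5.5 6.5}",
]
data_matrix = build_data_matrix(raw_lines)
```

The length check guards against structures with missing or extra atoms, which would otherwise shift the columns and mix different degrees of freedom.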