Dataset of 2-(2-(4-aryloxybenzylidene) hydrazinyl) benzothiazole derivatives for GQSAR of antitubercular agents



Abstract
A fragment-based quantitative structure-activity relationship (QSAR) analysis was carried out on a reported dataset of 25 2-(2-(4-aryloxybenzylidene) hydrazinyl) benzothiazole derivatives with antitubercular activity. The molecules in the dataset were fragmented into six fragments (R1, R2, R3, R4, R5, R6). Group-based QSAR (GQSAR) models were derived using multiple linear regression (MLR) analysis and were selected on the basis of various statistical parameters. Analysis of the benzothiazole dataset revealed that the presence of halogen atoms is an essential requirement for activity. The generated models describe the structural requirements of benzothiazole derivatives, which can be used to design and develop potent antitubercular agents.

Value of the data
Tuberculosis is one of the most lethal diseases of the current decade, and potent antitubercular compounds are urgently needed.
The GQSAR modelling data identify the structural properties of the benzothiazole derivatives that contribute to their antitubercular activity.
The generated GQSAR models can be used to screen various heterocyclic datasets for antitubercular potency, which may lead to the development of novel antitubercular compounds.

Data
The data presented here concern the development of GQSAR equations. These equations are used to predict the contribution of substituents to the antitubercular potential of the benzothiazole dataset.
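The general form of such an equation can be illustrated as follows. The selected descriptors and fitted coefficients are not reproduced here. An MLR-based GQSAR model can be written as shown below, with these definitions:
- D_j is a descriptor computed on one of the fragments R1-R6.
- c_j is its regression coefficient.
- c_0 is the intercept.
- k is the number of descriptors in the model, capped at three as stated in the model-building section.

```latex
\mathrm{pMIC} = c_0 + \sum_{j=1}^{k} c_j D_j, \qquad k \le 3
```

The sign of c_j indicates whether an increase in the corresponding fragment property raises or lowers the predicted pMIC. Comparing coefficient magnitudes across descriptors is meaningful only if the descriptors are on comparable scales.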

Data set preparation
The molecular dataset for the current study was taken from the work reported by Telvekar et al. [1]. All 25 structures of the benzothiazole derivatives were drawn using the 2D builder module of VLife MDS 4.3. They were then converted into 3D structures using the VLife engine platform. The geometries of the 3D molecules were optimized by energy minimization using the Merck molecular force field (MMFF) and Gasteiger charges. A common template representing all the molecules under study was prepared, with a dummy atom (X) at each substitution site.
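The common-template idea can be illustrated with a small data structure. Each molecule is stored as the substituents attached at the dummy-atom sites of a shared core. This is a hypothetical sketch, not the representation used by VLife MDS. The class and function names, the SMILES-fragment encoding and the choice of hydrogen as the default substituent are all assumptions.

```python
from dataclasses import dataclass, field

# Hypothetical site labels matching the six fragments R1-R6 described in the text.
SITES = ("R1", "R2", "R3", "R4", "R5", "R6")


@dataclass
class TemplateMolecule:
    """One dataset member expressed as substituents on a common template.

    Each key of `substituents` is a substitution site (a dummy atom X on the
    template); each value is the substituent at that site as a SMILES fragment
    (an assumed encoding, chosen only for illustration).
    """
    mol_id: str
    substituents: dict = field(default_factory=dict)


def build_record(mol_id, **subs):
    """Fill unspecified sites with hydrogen and reject unknown site labels."""
    unknown = set(subs) - set(SITES)
    if unknown:
        raise ValueError(f"unknown substitution sites: {sorted(unknown)}")
    filled = {site: subs.get(site, "[H]") for site in SITES}
    return TemplateMolecule(mol_id, filled)


if __name__ == "__main__":
    # Hypothetical example: a chloro substituent at R3, hydrogen elsewhere.
    rec = build_record("cmpd_01", R3="Cl")
    print(rec.substituents)
```

Filling unspecified sites with hydrogen keeps every record on the same set of sites. This is what allows a descriptor to be computed site by site for every member of a congeneric series.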

Calculation of descriptors
The common chemical structure shown in Fig. 1 was used to develop the GQSAR model. The molecules in the dataset were fragmented into six different fragments (R1-R6). The fragmented molecules were imported into the QSAR module of VLife MDS to calculate molecular descriptors. Molecular descriptors are numerical values that represent physical and chemical information about the molecules. In GQSAR studies, descriptors are calculated for individual fragments, so they represent the physical and chemical behavior of the substituents present.
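The following sketch makes fragment-level descriptors concrete by computing one toy descriptor, a halogen-atom count, separately for each substitution site. The descriptor, the `<site>_HalogenCount` naming and the SMILES inputs are hypothetical. The study itself used the descriptor set calculated by VLife MDS. The regular expression handles only simple SMILES fragments.

```python
import re

# Toy fragment descriptor, for illustration only: counts halogen atoms
# (F, Cl, Br, I) in a substituent written as a simple SMILES fragment.
# This is NOT a VLife MDS descriptor. "Cl" is listed before single letters
# so that chlorine is matched as one atom.
_HALOGEN = re.compile(r"Cl|Br|F|I")


def halogen_count(fragment_smiles: str) -> int:
    """Count halogen atom symbols in a simple SMILES fragment."""
    return len(_HALOGEN.findall(fragment_smiles))


def fragment_descriptors(substituents: dict) -> dict:
    """Compute one descriptor per substitution site, named '<site>_HalogenCount'."""
    return {f"{site}_HalogenCount": halogen_count(smi)
            for site, smi in sorted(substituents.items())}


if __name__ == "__main__":
    # Hypothetical substituents on sites R1-R3 of the common template.
    subs = {"R1": "[H]", "R2": "c1ccc(Cl)cc1", "R3": "C(F)(F)F"}
    print(fragment_descriptors(subs))
```

Each site contributes its own column. A regression on these columns therefore attributes activity to a specific position rather than to the whole molecule, which is what distinguishes GQSAR from whole-molecule QSAR.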

Data selection and building of the G-QSAR model [2-5]
The dataset of 25 benzothiazole derivatives was randomly divided into a training set of 17 molecules and a test set of 8 molecules. Random selection is intended to distribute the range of biological activity uniformly across the training and test sets. Multiple linear regression analysis was used to develop the GQSAR models. The number of independent variables (descriptors) was limited to at most three per model (Table 1).
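The split and the MLR fit described above can be sketched with the Python standard library alone. This is not the procedure used in VLife MDS. The fixed seed, the function names and the Gauss-Jordan solver for the normal equations are illustrative assumptions. The three-descriptor cap mirrors the limit stated in the text.

```python
import random


def random_split(ids, n_train, seed=0):
    """Randomly partition compound ids into a training set and a test set."""
    rng = random.Random(seed)  # fixed seed (an assumption) makes the split reproducible
    shuffled = list(ids)
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: descriptors may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:  # eliminate this column from every other row
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_mlr(X, y, max_terms=3):
    """Ordinary least-squares MLR; returns [intercept, c1, ..., ck].

    X holds one row of k descriptor values per training compound; k is capped
    at max_terms to mirror the limit of three descriptors per model.
    """
    k = len(X[0])
    if k > max_terms:
        raise ValueError(f"{k} descriptors exceeds the limit of {max_terms}")
    A = [[1.0] + [float(v) for v in row] for row in X]  # prepend intercept column
    p = k + 1
    ata = [[sum(r[i] * r[j] for r in A) for j in range(p)] for i in range(p)]
    aty = [sum(r[i] * yi for r, yi in zip(A, y)) for i in range(p)]
    return solve(ata, aty)  # normal equations: (A^T A) c = A^T y


def predict(coef, row):
    """Evaluate the fitted MLR equation for one descriptor row."""
    return coef[0] + sum(c * x for c, x in zip(coef[1:], row))


if __name__ == "__main__":
    train, test = random_split(range(1, 26), n_train=17)
    print(len(train), len(test))  # 17 8
```

Solving the normal equations directly is adequate for models with at most three descriptors. For larger or ill-conditioned descriptor sets, a QR-based least-squares solver is numerically safer.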

Validation of the developed G-QSAR model [6-10]
Validation is a critical step in QSAR model development. Validation methods are required for two purposes: to establish the predictive ability of a QSAR model on unseen data, and to determine the model complexity that the data under study can justify. A number of methods have been reported for the internal validation of QSAR models. These include the least-squares fit (R²), cross-validation (Q²), adjusted R² (R²adj), the chi-squared test (χ²), root mean squared error (RMSE), bootstrapping and scrambling (Y-randomization). The observed activity of the molecules in the dataset was expressed as MIC (μg/ml) and converted into pMIC for QSAR analysis. All molecules in the dataset have MIC values in the range 1.5-29.00 μg/ml.
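The MIC-to-pMIC conversion and several of the internal validation statistics named above can be sketched in plain Python. This is not the VLife MDS implementation, and three simplifications apply:
- The code assumes pMIC = -log10(MIC in mol/L). The source does not state its convention, so the molecular-weight argument is a hypothetical input.
- It uses a single-descriptor least-squares fit to keep leave-one-out cross-validation short.
- It does not cover χ², bootstrapping or Y-randomization.

```python
import math


def pmic(mic_ug_per_ml, mol_weight):
    """pMIC = -log10(MIC in mol/L); this molar convention is an assumption."""
    # ug/ml equals mg/L; multiply by 1e-3 for g/L, divide by MW (g/mol) for mol/L.
    molar = mic_ug_per_ml * 1e-3 / mol_weight
    return -math.log10(molar)


def r_squared(obs, pred):
    """Coefficient of determination R^2 = 1 - SS_res / SS_tot."""
    mean = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n, k):
    """Penalise R^2 for k descriptors fitted on n training compounds."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


def rmse(obs, pred):
    """Root mean squared error between observed and predicted activities."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def fit_line(x, y):
    """One-descriptor least-squares fit (a simplification); returns (intercept, slope)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def q2_loo(x, y):
    """Leave-one-out cross-validated Q^2 = 1 - PRESS / SS_tot (x, y are lists)."""
    press = 0.0
    for i in range(len(y)):
        b0, b1 = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])  # refit without compound i
        press += (y[i] - (b0 + b1 * x[i])) ** 2
    mean = sum(y) / len(y)
    return 1.0 - press / sum((v - mean) ** 2 for v in y)
```

For ordinary least squares, each leave-one-out residual is at least as large in magnitude as the corresponding fitted residual. Q² computed this way therefore never exceeds R², and a large gap between the two is a common sign of overfitting.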

QSAR analysis
A congeneric dataset is a basic prerequisite for any QSAR analysis. Fragment-based QSAR is a recent methodology by which complex structures can be analyzed. Thirty different G-QSAR models were generated, and the best of them were selected on the basis of statistical values such as r², q², pred_r², the F-test and the standard error. The activities predicted by the QSAR models were in accordance with the observed biological activities, with small variations that are clearly identified in the correlation plots of the different models (
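The selection statistics named above can be computed with commonly used textbook definitions, assumed here for illustration. The exact conventions in VLife MDS may differ in detail. In this sketch, pred_r² is referenced to the training-set mean, n is the number of training compounds and k is the number of descriptors in the model.

```python
import math


def pred_r2(obs_test, pred_test, mean_train):
    """External predictive r^2 for the test set, referenced to the training-set mean."""
    press = sum((o - p) ** 2 for o, p in zip(obs_test, pred_test))
    ss = sum((o - mean_train) ** 2 for o in obs_test)
    return 1.0 - press / ss


def f_statistic(r2, n, k):
    """Overall F for an MLR model with k descriptors fitted on n compounds."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


def standard_error(obs, pred, k):
    """Standard error of estimate: sqrt(SSE / (n - k - 1))."""
    n = len(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    return math.sqrt(sse / (n - k - 1))
```

For instance, a three-descriptor model with r² = 0.8 on 17 training compounds gives F = (0.8/3)/(0.2/13) ≈ 17.3. These numbers are illustrative, not results from this study.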

Acknowledgement
The authors are thankful to VLife Sciences for providing the software used in this study.

Transparency document. Supporting information
Transparency data associated with this article can be found in the online version at http://dx.doi.org/10.1016/j.dib.2017.08.006.