Computer Science (计算机科学), 2022, Vol. 49, Issue (6A): 309-313. doi: 10.11896/jsjkx.210700262
SUN Fu-quan (孙福权)1,2, LIANG Ying (梁莹)1
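Abstract: N6-methyladenine (6mA) sites play a crucial role in regulating gene expression in eukaryotes, and identifying them accurately helps to clarify the distribution and biological functions of 6mA sites across the genome. Various experimental assays are currently used to identify 6mA sites in different species, but they are expensive and time-consuming. This paper therefore proposes P6mA-Rice, an XGBoost-based model for identifying 6mA sites in the rice genome. First, by introducing the position specificity of nucleotides in the sequence together with other relevant DNA properties, effective feature extraction criteria are proposed from seven aspects so that DNA information is captured more comprehensively. Next, the features are further screened according to XGBoost feature importance, which yields the feature set P6mA. Finally, on this basis, the P6mA-Rice methylation site identification model is built with the XGBoost classification algorithm. In the jackknife test, P6mA-Rice achieves a sensitivity of 90.55%, a specificity of 88.48%, a correlation coefficient of 79.00%, and an accuracy of 89.49%. Extensive experiments verify the effectiveness of the P6mA-Rice model.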
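The abstract reports four jackknife metrics without defining them. The sketch below shows how such figures are commonly computed. It assumes the standard confusion-matrix definitions of sensitivity (Sn), specificity (Sp) and accuracy (Acc), and it assumes that the "correlation coefficient" is the Matthews correlation coefficient (MCC); the abstract confirms neither. The names `confusion_counts`, `binary_metrics`, `jackknife_predict` and `majority_vote` are hypothetical, and `majority_vote` is only a stand-in for the XGBoost classifier used in the paper.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN) for binary labels: 1 = 6mA site, 0 = non-6mA site."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 0:
            tn += 1
        elif t == 0 and p == 1:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def binary_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Sn, Sp, Acc and MCC under the assumed standard definitions."""
    total = tp + tn + fp + fn
    sn = tp / (tp + fn) if (tp + fn) else 0.0
    sp = tn / (tn + fp) if (tn + fp) else 0.0
    acc = (tp + tn) / total if total else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc}


def jackknife_predict(
    samples: Sequence,
    labels: Sequence[int],
    fit_predict: Callable[[List, List[int], object], int],
) -> List[int]:
    """Jackknife (leave-one-out) test: hold out each sample once,
    train on all the others, and predict the held-out sample."""
    samples, labels = list(samples), list(labels)
    preds = []
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        preds.append(fit_predict(train_x, train_y, samples[i]))
    return preds


def majority_vote(train_x: List, train_y: List[int], test_x: object) -> int:
    """Placeholder classifier (NOT the paper's model): predicts the majority training label."""
    return 1 if 2 * sum(train_y) >= len(train_y) else 0
```

To evaluate a real model, `majority_vote` would be replaced by a function that trains a classifier on `train_x` and `train_y` and returns its prediction for `test_x`. The result of `jackknife_predict` is then passed through `confusion_counts` and `binary_metrics`.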