Data Analysis and Knowledge Discovery (数据分析与知识发现), 2022, Vol. 6, Issue (11): 1-12     https://doi.org/10.11925/infotech.2096-3467.2022.0034
Research Paper
Abstracting Biomedical Documents with Knowledge Enhancement (知识增强的生物医学文本生成式摘要研究)*
Deng Lu, Hu Po (corresponding author), Li Xuanhong
School of Computer Science, Central China Normal University, Wuhan 430079, China
Abstract

[Objective] This study maps biomedical texts onto the biomedical super-thesaurus (Metathesaurus) to obtain the biomedical terms they contain and the corresponding concepts, and incorporates these terms and concepts into a text summarization model as background knowledge, so as to improve the quality of the summaries generated for biomedical texts. [Methods] The important content of a document is first obtained with extractive summarization techniques. The terms appearing in this important content, together with their corresponding concepts in a biomedical knowledge base, are then extracted and injected as background knowledge into the attention mechanism of a neural abstractive summarization model. Guided by domain knowledge, the model can focus on the important information within the text while suppressing the noise that the introduction of external information may cause, which markedly improves the quality of the generated summaries. [Results] Experiments on three biomedical datasets verify the effectiveness of the proposed method: the proposed PG-meta model achieves an average ROUGE score of 31.06 across the three datasets, 1.51 higher than that of the original PG model. [Limitations] The influence of different ways of acquiring biomedical background knowledge on the enhancement effect has not been explored. [Conclusions] The proposed method helps the model learn the deeper meaning of biomedical texts and improves the quality of summary generation.
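To make the knowledge-acquisition step above concrete, the sketch below links terms found in the extracted core content to concepts from a miniature, hypothetical term-concept dictionary that stands in for the biomedical Metathesaurus. It is a minimal illustration under that assumption, not the authors' implementation, which relies on a full biomedical knowledge base.

```python
import re

# Hypothetical miniature knowledge base standing in for the biomedical
# Metathesaurus; a real system would query UMLS or a comparable resource.
TERM_TO_CONCEPT = {
    "glaucoma": "Eye disease",
    "iop": "Intraocular pressure",
    "poag": "Glaucoma, Primary Open Angle",
}

def extract_term_concept_pairs(core_content):
    """Return (term, concept) pairs whose term appears in the core content."""
    pairs = []
    for term, concept in TERM_TO_CONCEPT.items():
        # Whole-word, case-insensitive match against the extracted core content.
        if re.search(rf"\b{re.escape(term)}\b", core_content, flags=re.IGNORECASE):
            pairs.append((term, concept))
    return pairs

core = ("control of iop regulation resides within the aqueous outflow "
        "system of the eye and iop regulation becomes abnormal in glaucoma.")
print(extract_term_concept_pairs(core))
# [('glaucoma', 'Eye disease'), ('iop', 'Intraocular pressure')]
```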

Abstract

[Objective] This study proposes a knowledge-enhanced text summarization model for biomedical documents, aiming to improve the quality of their abstracts. [Methods] First, we obtained the important content of the biomedical texts with extractive summarization techniques. Then, we combined the important content with a related biomedical knowledge base to extract the key terms and their corresponding concepts. Third, we integrated these terms and concepts into the attention mechanism of a neural abstractive summarization model as background knowledge. With the help of domain knowledge, the proposed model can not only focus on the important information in the texts but also reduce the noise introduced by the external information. [Results] We examined the proposed model on three biomedical datasets. The average ROUGE score of the proposed PG-meta model reached 31.06, which was 1.51 higher than that of the original PG model. [Limitations] We did not investigate the impact of different knowledge acquisition methods on the effectiveness of our model. [Conclusions] The proposed model can better learn the in-depth meaning of biomedical documents and improve the quality of their abstracts.
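One plausible way to wire such term-concept knowledge into the attention mechanism is as an extra additive term on the attention scores, so that encoder positions covered by recognized knowledge receive adjusted weight. The sketch below follows the spirit of pointer-generator attention; the layer names (W_h, W_s, W_k) and the per-token knowledge features are assumptions of this illustration, not the paper's exact equations.

```python
import torch
import torch.nn as nn

class KnowledgeAwareAttention(nn.Module):
    """Bahdanau-style attention with an extra term for background knowledge.

    Illustrative only: scores e_ti = v^T tanh(W_h h_i + W_s s_t + W_k k_i),
    where k_i is a knowledge feature for source position i (e.g., an embedding
    of the term-concept pair covering that token, or zeros if none).
    """
    def __init__(self, enc_dim, dec_dim, know_dim, attn_dim):
        super().__init__()
        self.W_h = nn.Linear(enc_dim, attn_dim, bias=False)
        self.W_s = nn.Linear(dec_dim, attn_dim, bias=True)
        self.W_k = nn.Linear(know_dim, attn_dim, bias=False)
        self.v = nn.Linear(attn_dim, 1, bias=False)

    def forward(self, enc_states, dec_state, know_states):
        # enc_states: (B, L, enc_dim), dec_state: (B, dec_dim),
        # know_states: (B, L, know_dim) -- knowledge features per source token.
        scores = self.v(torch.tanh(
            self.W_h(enc_states)
            + self.W_s(dec_state).unsqueeze(1)
            + self.W_k(know_states)
        )).squeeze(-1)                        # (B, L)
        attn = torch.softmax(scores, dim=-1)  # attention distribution
        context = torch.bmm(attn.unsqueeze(1), enc_states).squeeze(1)
        return attn, context

# Toy usage with random tensors.
layer = KnowledgeAwareAttention(enc_dim=8, dec_dim=8, know_dim=4, attn_dim=8)
a, c = layer(torch.randn(2, 5, 8), torch.randn(2, 8), torch.randn(2, 5, 4))
print(a.shape, c.shape)  # torch.Size([2, 5]) torch.Size([2, 8])
```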

Key words: Biomedical Text Mining; Generative Abstract; Domain Knowledge; Knowledge Enhancement
Received: 2022-01-12      Published online: 2023-01-13
CLC numbers: TP393; G250
Funding: * Research Project of the State Language Commission of China (YB135-149); Fundamental Research Funds for the Central Universities (CCNU20ZT012)
Corresponding author: Hu Po, E-mail: phu@mail.ccnu.edu.cn
Cite this article:
邓露,胡珀,李炫宏. 知识增强的生物医学文本生成式摘要研究*[J]. 数据分析与知识发现, 2022, 6(11): 1-12.
Deng Lu,Hu Po,Li Xuanhong. Abstracting Biomedical Documents with Knowledge Enhancement. Data Analysis and Knowledge Discovery, 2022, 6(11): 1-12.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2022.0034      或      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2022/V6/I11/1
Fig.1  Overall architecture of the PG-meta model
Fig.2  An example of knowledge acquisition
Reference abstract: Glaucoma is a leading cause of blindness within the United States and the leading cause of blindness among African-Americans. Measurement of intraocular pressure only is no longer considered adequate for screening. Recognition of risk factors and examination of the optic nerve are key strategies to identify individuals at risk. Medical and surgical treatment of glaucoma have ······
Extracted core content: control of iop regulation resides within the aqueous outflow system of the eye ( grant , 1958 ) and iop regulation becomes abnormal in glaucoma.<q>iop is the only treatable risk factor.<q>the intrinsic outflow system abnormality in glaucoma is unknown but is described as poag······
Term-concept pairs: Glaucoma: Eye disease; IOP: Intraocular pressure; POAG: Glaucoma, Primary Open Angle
Table 1  An example of the relationship between the acquired knowledge and the corresponding abstract
Dataset     Split        Size     Avg. source       Avg. abstract     Avg. term-concept     Avg. term-concept pairs
                                  length (words)    length (words)    pairs per source      per reference abstract
Full-Abs    Training     3 200    25 825            1 072             #                     10
            Validation   400      24 348            903               #                     9
            Test         400      24 484            936               #                     9
Abs-Ti      Training     3 514    1 477             111               9                     3
            Validation   439      1 439             109               9                     3
            Test         439      1 466             113               9                     3
BioAbsTi    Training     24 631   1 574             118               109                   4
            Validation   8 210    1 584             118               109                   4
            Test         8 210    1 562             117               10                    4
Table 2  Statistics of the experimental datasets
Fig.3  Experimental results under different values of d
                  Full-Abs               Abs-Ti                 BioAbsTi
Model             R-1    R-2    R-L     R-1    R-2    R-L      R-1    R-2    R-L     AVG
Lead              27.93  15.38  24.58   23.36  13.79  24.58    31.79  16.55  28.67   22.96
TextRank          27.82  14.64  24.83   24.83  13.56  20.93    33.52  17.29  30.23   23.07
PG                35.95  20.42  29.87   33.25  18.36  31.45    36.58  25.56  34.52   29.55
Keywords-PG       36.23  20.89  30.52   33.75  19.06  31.83    36.93  26.22  34.85   30.03
PG-meta(All)      36.15  20.85  30.28   33.54  18.75  31.79    36.88  25.96  34.58   29.86
BERT+Clustering   32.85  18.87  28.85   27.53  14.22  23.93    35.25  21.52  32.85   26.20
BERTSum           33.93  19.86  29.18   28.70  14.56  24.34    37.60  22.81  33.65   27.18
PG-meta           37.05  21.96  33.58   34.82  20.21  32.97    37.58  26.26  35.19   31.06
Table 3  Results of the baseline models and the PG-meta model on the three datasets
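The AVG column in Table 3 agrees, up to rounding, with the simple mean of each model's nine ROUGE values (R-1/R-2/R-L on the three datasets); the small check below reproduces the PG and PG-meta rows from the figures in the table.

```python
# Reproduce the AVG column of Table 3 as the mean of the nine ROUGE values
# (R-1/R-2/R-L on Full-Abs, Abs-Ti and BioAbsTi) reported for each model.
rows = {
    "PG":      [35.95, 20.42, 29.87, 33.25, 18.36, 31.45, 36.58, 25.56, 34.52],
    "PG-meta": [37.05, 21.96, 33.58, 34.82, 20.21, 32.97, 37.58, 26.26, 35.19],
}
for model, scores in rows.items():
    print(model, round(sum(scores) / len(scores), 3))
# PG 29.551      -> reported as 29.55
# PG-meta 31.069 -> reported as 31.06
```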
Source document: ······. A recent study reported that cardiac lymphatic endothelial cells (LECs) stem from venous and non-venous origins in mice. Here, we identified Isl1-expressing progenitors as a potential non-venous origin of cardiac LECs. Genetic lineage tracing with Isl1-Cre reporter mice suggested a possible contribution from the Isl1-expressing pharyngeal mesoderm constituting the second heart field to lymphatic vessels around the cardiac outflow tract as well as to those in the facial skin and the lymph sac. Isl1(+) lineage-specific deletion of Prox1 resulted in disrupted LYVE1(+) vessel structures, indicating a Prox1-dependent mechanism in this contribution. ······
Reference abstract: Isl1-expressing non-venous cell lineage contributes to cardiac lymphatic vessel development.
BERTSum output: Here, we identified Isl1-expressing progenitors as a potential non-venous origin of cardiac LECs.
PG output: The non-venous cell lineage can help the development of cardiac lymphatic vessels.
PG-meta output: The non-venous cell lineage of Isl1-expressing promotes the development of cardiac lymphatic vessels.
Table 4  Comparison of the abstracts automatically generated by the three models
Dataset     Metric   PG      PG-meta(LEAD)   PG-meta(TR)   PG-meta(BS)
Full-Abs    R-1      35.95   36.89           36.97         37.05
            R-2      20.42   21.88           21.97         21.96
            R-L      29.87   32.95           33.56         33.58
Abs-Ti      R-1      33.25   34.67           34.85         34.82
            R-2      18.36   19.66           19.35         20.21
            R-L      31.45   32.58           32.73         32.97
BioAbsTi    R-1      36.58   37.42           37.55         37.58
            R-2      25.56   25.93           26.13         26.26
            R-L      34.52   34.97           35.16         35.19
AVG                  29.55   30.77           30.91         31.06
Table 5  Experimental results with different important-content extraction methods
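Table 5 varies the method used to select the important content fed to PG-meta: LEAD, TextRank (TR), and BERTSum (BS). As a rough illustration of the TextRank variant only, the sketch below ranks sentences by PageRank over a word-overlap similarity graph; the overlap-based similarity and the networkx implementation are illustrative assumptions, not the paper's exact setup.

```python
import itertools
import networkx as nx

def textrank_select(sentences, top_k=2):
    """Rank sentences with PageRank over a word-overlap similarity graph."""
    graph = nx.Graph()
    graph.add_nodes_from(range(len(sentences)))
    tokenized = [set(s.lower().split()) for s in sentences]
    for i, j in itertools.combinations(range(len(sentences)), 2):
        overlap = len(tokenized[i] & tokenized[j])
        if overlap:
            graph.add_edge(i, j, weight=overlap)
    scores = nx.pagerank(graph, weight="weight")
    ranked = sorted(scores, key=scores.get, reverse=True)[:top_k]
    return [sentences[i] for i in sorted(ranked)]  # keep original order

sents = [
    "Glaucoma is a leading cause of blindness.",
    "Intraocular pressure is the only treatable risk factor.",
    "The weather was pleasant that day.",
]
print(textrank_select(sents, top_k=2))
```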
Dataset     Metric   PG-meta(term)   PG-meta(con)   PG-meta(t-c)
Full-Abs    R-1      36.53           36.75          37.05
            R-2      21.85           21.72          21.96
            R-L      33.24           33.35          33.58
Abs-Ti      R-1      34.63           34.79          34.82
            R-2      20.19           20.25          20.21
            R-L      32.73           32.79          32.97
BioAbsTi    R-1      37.46           37.59          37.58
            R-2      26.07           26.12          26.26
            R-L      35.07           35.03          35.19
AVG                  30.86           30.93          31.06
Table 6  Experimental results with different granularities of knowledge association
Fig.4  Experimental results with different knowledge fusion methods
[1] Mishra R, Bian J T, Fiszman M, et al. Text Summarization in the Biomedical Domain: A Systematic Review of Recent Research[J]. Journal of Biomedical Informatics, 2014, 52: 457-467.
doi: 10.1016/j.jbi.2014.06.009 pmid: 25016293
[2] See A, Liu P J, Manning C D. Get to the Point: Summarization with Pointer-Generator Networks[C]// Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics. 2017: 1073-1083.
[3] Wang Kaixiang. Survey of Query-Oriented Automatic Summarization Technology[J]. Computer Science, 2018, 45(S2): 12-16. (in Chinese)
[4] Yu Shanshan, Su Jindian, Li Pengfei. Improved TextRank-Based Method for Automatic Summarization[J]. Computer Science, 2016, 43(6): 240-247. (in Chinese)
doi: 10.11896/j.issn.1002-137X.2016.06.048
[5] Devlin J, Chang M W, Lee K, et al. BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding[C]// Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics:Human Language Technologies. 2019: 4171-4186.
[6] Liu Y. Fine-Tune BERT for Extractive Summarization[OL]. arXiv Preprint, arXiv: 1903.10318.
[7] Hermann K M, Kočiský T, Grefenstette E, et al. Teaching Machines to Read and Comprehend[OL]. arXiv Preprint, arXiv: 1506.03340.
[8] Zhou L, Hovy E. Template-Filtered Headline Summarization[C]// Proceedings of the ACL-04 Workshop:Text Summarization Branches Out. 2004: 56-60.
[9] Shi Lei, Ruan Xuanmin, Wei Ruibin, et al. Abstractive Summarization Based on Sequence to Sequence Models: A Review[J]. Journal of the China Society for Scientific and Technical Information, 2019, 38(10): 1102-1116. (in Chinese)
[10] Rush A M, Chopra S, Weston J. A Neural Attention Model for Abstractive Sentence Summarization[C]// Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. 2015: 379-389.
[11] Chen T, Xu R F, He Y L, et al. Improving Sentiment Analysis via Sentence Type Classification Using BiLSTM-CRF and CNN[J]. Expert Systems with Applications, 2017, 72: 221-230.
doi: 10.1016/j.eswa.2016.10.065
[12] Gehring J, Auli M, Grangier D, et al. Convolutional Sequence to Sequence Learning[C]// Proceedings of the 34th International Conference on Machine Learning. 2017: 1243-1252.
[13] Cai T, Shen M J, Peng H L, et al. Improving Transformer with Sequential Context Representations for Abstractive Text Summarization[C]// Proceedings of the 8th CCF International Conference on Natural Language Processing and Chinese Computing. 2019: 512-524.
[14] Luo Pengcheng, Wang Yibo, Wang Jimin. Automatic Discipline Classification for Scientific Papers Based on a Deep Pre-Training Language Model[J]. Journal of the China Society for Scientific and Technical Information, 2020, 39(10): 1046-1059. (in Chinese)
[15] Bhatia N, Jaiswal A. Automatic Text Summarization and It's Methods—A Review[C]// Proceedings of the 6th International Conference-Cloud System and Big Data Engineering(Confluence). IEEE, 2016: 65-72.
[16] Nallapati R, Zhou B, Dos Santos C, et al. Abstractive Text Summarization Using Sequence-to-Sequence RNNs and Beyond[C]// Proceedings of the 20th SIGNLL Conference on Computational Natural Language Learning. 2016: 280-290.
[17] Gu J, Lu Z, Li H, et al. Incorporating Copying Mechanism in Sequence-to-Sequence Learning[C]// Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics. 2016: 1631-1640.
[18] Tu Z, Lu Z, Liu Y, et al. Modeling Coverage for Neural Machine Translation[C]// Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics. 2016: 76-85.
[19] Jiang X P, Hu P, Hou L W, et al. Improving Pointer-Generator Network with Keywords Information for Chinese Abstractive Summarization[C]// Proceedings of the 7th CCF International Conference on Natural Language Processing and Chinese Computing. 2018: 464-474.
[20] Nasr-Azadani M, Ghadiri N, Davoodijam E. Graph-Based Biomedical Text Summarization: An Itemset Mining and Sentence Clustering Approach[J]. Journal of Biomedical Informatics, 2018, 84: 42-58.
pii: S1532-0464(18)30111-4 pmid: 29906584
[21] Yoo I, Hu X H, Song I Y. A Coherent Graph-Based Semantic Clustering and Summarization Approach for Biomedical Literature and a New Summarization Evaluation Method[J]. BMC Bioinformatics, 2007, 8(S9): S4.
[22] Afzal M, Alam F, Malik K M, et al. Clinical Context-Aware Biomedical Text Summarization Using Deep Neural Network: Model Development and Validation[J]. Journal of Medical Internet Research, 2020, 22(10): e19810.
doi: 10.2196/19810
[23] Moradi M, Dorffner G, Samwald M. Deep Contextualized Embeddings for Quantifying the Informative Content in Biomedical Text Summarization[J]. Computer Methods and Programs in Biomedicine, 2020, 184: 105117.
doi: 10.1016/j.cmpb.2019.105117
[24] Kondadadi R, Manchanda S, Ngo J, et al. Optum at MEDIQA 2021: Abstractive Summarization of Radiology Reports Using Simple BART Finetuning[C]// Proceedings of the 20th Workshop on Biomedical Language Processing. 2021: 280-284.
[25] Mahajan D, Tsou C H, Liang J J. IBM Research at MEDIQA 2021: Toward Improving Factual Correctness of Radiology Report Abstractive Summarization[C]// Proceedings of the 20th Workshop on Biomedical Language Processing. 2021: 302-310.
[26] Sotudeh S, Goharian N, Filice R. Attend to Medical Ontologies: Content Selection for Clinical Abstractive Summarization[C]// Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020: 1899-1905.
[27] Bhattacharya S, Ha-Thuc V, Srinivasan P. MeSH: A Window into Full Text for Document Summarization[J]. Bioinformatics, 2011, 27(13): i120-i128.
doi: 10.1093/bioinformatics/btr223
[28] Plaza L, Díaz A, Gervás P. A Semantic Graph-Based Approach to Biomedical Summarisation[J]. Artificial Intelligence in Medicine, 2011, 53(1): 1-14.
doi: 10.1016/j.artmed.2011.06.005 pmid: 21752612
[29] Bodenreider O. The Unified Medical Language System(UMLS): Integrating Biomedical Terminology[J]. Nucleic Acids Research, 2004, 32(S1): D267-D270.
doi: 10.1093/nar/gkh061
[30] MacAvaney S, Sotudeh S, Cohan A, et al. Ontology-Aware Clinical Abstractive Summarization[C]// Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval. 2019: 1013-1016.
[31] Zhang Y, Ding D Y, Qian T, et al. Learning to Summarize Radiology Findings[C]// Proceedings of the 9th International Workshop on Health Text Mining and Information Analysis. 2018: 204-213.
[32] Mohan S, Li D. MedMentions: A Large Biomedical Corpus Annotated with UMLS Concepts[OL]. arXiv Preprint, arXiv: 1902.09476.
[33] Du Y P, Li Q X, Wang L L, et al. Biomedical-Domain Pre-Trained Language Model for Extractive Summarization[J]. Knowledge-Based Systems, 2020, 199: 105964.
doi: 10.1016/j.knosys.2020.105964
[34] Lin C Y. ROUGE: A Package for Automatic Evaluation of Summaries[C]// Proceedings of the 2004 Workshop on Text Summarization Branches Out. 2004: 74-81.
[35] Lin C Y, Hovy E. Automatic Evaluation of Summaries Using N-Gram Co-Occurrence Statistics[C]// Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology. 2003: 71-78.