[Objective] This study proposes a new text summarization model for biomedical research papers, aiming to improve the quality of automatically generated abstracts. [Methods] First, we obtained the important content of the biomedical texts with extractive summarization techniques. Then, we linked this content to a related knowledge base to extract the key terms and their corresponding concepts. Third, we fed the extracted content and concepts into the neural abstractive summarization model as background knowledge for its attention mechanism. With the help of domain knowledge, the proposed model can not only focus on the important information in the texts, but also reduce the noise introduced along with the external information. [Results] We evaluated the proposed model on three biomedical datasets. The average ROUGE score of the proposed PG-meta model reached 31.06, which is 1.51 higher than that of the original PG model. [Limitations] We did not investigate the impact of different knowledge-acquisition methods on the effectiveness of our model. [Conclusions] The proposed model can better learn the in-depth meaning of biomedical documents and improve the quality of their abstracts.
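The three-stage pipeline described above (extractive pre-selection, knowledge-base linking, then abstractive generation with knowledge-aware attention) can be summarized in a short sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names, the toy term-concept dictionary standing in for a real knowledge base such as UMLS, and the stub decoder are all hypothetical.

```python
# Minimal sketch of the three-stage pipeline described in the abstract.
# All helper names and the toy dictionary are illustrative assumptions,
# not the authors' actual implementation.

def extract_important_content(document: str, k: int = 3) -> list[str]:
    """Stage 1: extractive pre-selection (a trivial LEAD-style extractor)."""
    sentences = [s.strip() for s in document.split(".") if s.strip()]
    return sentences[:k]

# Stand-in for a real biomedical knowledge base such as UMLS.
TERM_CONCEPTS = {
    "glaucoma": "Eye disease",
    "iop": "Intraocular pressure",
    "poag": "Glaucoma, Primary Open Angle",
}

def extract_term_concept_pairs(sentences: list[str]) -> dict[str, str]:
    """Stage 2: link terms in the selected content to KB concepts."""
    pairs = {}
    for sentence in sentences:
        for token in sentence.lower().split():
            if token in TERM_CONCEPTS:
                pairs[token] = TERM_CONCEPTS[token]
    return pairs

def pg_meta_generate(document: str, content: list[str],
                     knowledge: dict[str, str]) -> str:
    """Stage 3 placeholder: a real system would decode with a
    pointer-generator network whose attention also reads `knowledge`."""
    return " ".join(content)  # stub output

def summarize(document: str) -> str:
    content = extract_important_content(document)
    knowledge = extract_term_concept_pairs(content)
    return pg_meta_generate(document, content, knowledge)
```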
| Category | Text content |
|---|---|
| Original text | Glaucoma is a leading cause of blindness within the United States and the leading cause of blindness among African-Americans. Measurement of intraocular pressure only is no longer considered adequate for screening. Recognition of risk factors and examination of the optic nerve are key strategies to identify individuals at risk. Medical and surgical treatment of glaucoma have …… |
| Core content | control of iop regulation resides within the aqueous outflow system of the eye (grant, 1958) and iop regulation becomes abnormal in glaucoma. iop is the only treatable risk factor. the intrinsic outflow system abnormality in glaucoma is unknown but is described as poag …… |
| Term-concept pairs | Glaucoma: Eye disease; IOP: Intraocular pressure; POAG: Glaucoma, Primary Open Angle |

Table 1 An example of the relationship between acquired knowledge and the corresponding abstract
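Running the stage 1 and stage 2 helpers from the sketch above on a passage like the one in Table 1 recovers exactly the three term-concept pairs listed there (given the toy dictionary; a real system would query the full knowledge base):

```python
doc = ("Glaucoma is a leading cause of blindness in the United States. "
       "Control of iop regulation resides within the aqueous outflow "
       "system of the eye. The intrinsic outflow system abnormality in "
       "glaucoma is described as poag.")
pairs = extract_term_concept_pairs(extract_important_content(doc))
print(pairs)
# {'glaucoma': 'Eye disease', 'iop': 'Intraocular pressure',
#  'poag': 'Glaucoma, Primary Open Angle'}
```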
| Dataset | Split | Size | Avg. source length (words) | Avg. abstract length (words) | Avg. term-concept pairs in source | Avg. term-concept pairs in reference abstract |
|---|---|---|---|---|---|---|
| Full-Abs | Training | 3,200 | 25,825 | 1,072 | # | 10 |
| | Validation | 400 | 24,348 | 903 | # | 9 |
| | Test | 400 | 24,484 | 936 | # | 9 |
| Abs-Ti | Training | 3,514 | 1,477 | 111 | 9 | 3 |
| | Validation | 439 | 1,439 | 109 | 9 | 3 |
| | Test | 439 | 1,466 | 113 | 9 | 3 |
| BioAbsTi | Training | 24,631 | 1,574 | 118 | 109 | 4 |
| | Validation | 8,210 | 1,584 | 118 | 109 | 4 |
| | Test | 8,210 | 1,562 | 117 | 10 | 4 |

Table 2 Statistics of the experimental datasets
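The per-split figures in Table 2 are straightforward corpus averages. A minimal sketch of how they could be computed, assuming each split is a list of (source, abstract) string pairs and reusing the hypothetical extract_term_concept_pairs() helper sketched earlier:

```python
def split_statistics(split: list[tuple[str, str]]) -> dict[str, float]:
    """Table 2 statistics for one corpus split of (source, abstract) pairs."""
    n = len(split)
    return {
        "size": n,
        "avg_source_words": sum(len(s.split()) for s, _ in split) / n,
        "avg_abstract_words": sum(len(a.split()) for _, a in split) / n,
        "avg_source_pairs": sum(
            len(extract_term_concept_pairs([s])) for s, _ in split) / n,
        "avg_abstract_pairs": sum(
            len(extract_term_concept_pairs([a])) for _, a in split) / n,
    }
```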
Fig.3 Experimental results under different values of d
| Model | Full-Abs R-1 | R-2 | R-L | Abs-Ti R-1 | R-2 | R-L | BioAbsTi R-1 | R-2 | R-L | AVG |
|---|---|---|---|---|---|---|---|---|---|---|
| Lead | 27.93 | 15.38 | 24.58 | 23.36 | 13.79 | 24.58 | 31.79 | 16.55 | 28.67 | 22.96 |
| TextRank | 27.82 | 14.64 | 24.83 | 24.83 | 13.56 | 20.93 | 33.52 | 17.29 | 30.23 | 23.07 |
| PG | 35.95 | 20.42 | 29.87 | 33.25 | 18.36 | 31.45 | 36.58 | 25.56 | 34.52 | 29.55 |
| Keywords-PG | 36.23 | 20.89 | 30.52 | 33.75 | 19.06 | 31.83 | 36.93 | 26.22 | 34.85 | 30.03 |
| PG-meta(All) | 36.15 | 20.85 | 30.28 | 33.54 | 18.75 | 31.79 | 36.88 | 25.96 | 34.58 | 29.86 |
| BERT+Clustering | 32.85 | 18.87 | 28.85 | 27.53 | 14.22 | 23.93 | 35.25 | 21.52 | 32.85 | 26.20 |
| BERTSum | 33.93 | 19.86 | 29.18 | 28.70 | 14.56 | 24.34 | 37.60 | 22.81 | 33.65 | 27.18 |
| PG-meta | 37.05 | 21.96 | 33.58 | 34.82 | 20.21 | 32.97 | 37.58 | 26.26 | 35.19 | 31.06 |

Table 3 Results of the baseline models and the PG-meta model on the three datasets
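R-1, R-2, and R-L denote ROUGE-1, ROUGE-2, and ROUGE-L, and AVG is the mean of a model's nine scores across the three datasets. A minimal sketch of this evaluation using the `rouge-score` package (the specific package is our choice; the paper only specifies the ROUGE metric):

```python
from statistics import mean
from rouge_score import rouge_scorer  # pip install rouge-score

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"],
                                  use_stemmer=True)

def rouge_f1(reference: str, candidate: str) -> dict[str, float]:
    """ROUGE-1/2/L F-scores (x100) for one reference/candidate pair."""
    scores = scorer.score(reference, candidate)
    return {name: 100 * s.fmeasure for name, s in scores.items()}

def avg_column(per_dataset: list[dict[str, float]]) -> float:
    """AVG column: mean of the nine scores (3 metrics x 3 datasets)."""
    return mean(v for d in per_dataset for v in d.values())
```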
| Category | Text content |
|---|---|
| Original text | …… A recent study reported that cardiac lymphatic endothelial cells (LECs) stem from venous and non-venous origins in mice. Here, we identified Isl1-expressing progenitors as a potential non-venous origin of cardiac LECs. Genetic lineage tracing with Isl1-Cre reporter mice suggested a possible contribution from the Isl1-expressing pharyngeal mesoderm constituting the second heart field to lymphatic vessels around the cardiac outflow tract as well as to those in the facial skin and the lymph sac. Isl1(+) lineage-specific deletion of Prox1 resulted in disrupted LYVE1(+) vessel structures, indicating a Prox1-dependent mechanism in this contribution. …… |
| Reference abstract | Here, we identified Isl1-expressing progenitors as a potential non-venous origin of cardiac LECs. |
| PG summary | The non-venous cell lineage can help the development of cardiac lymphatic vessels. |
| PG-meta summary | The non-venous cell lineage of Isl1-expressing promotes the development of cardiac lymphatic vessels. |

Table 4 Comparison of the automatically generated summaries
| Dataset | Metric | PG | PG-meta(LEAD) | PG-meta(TR) | PG-meta(BS) |
|---|---|---|---|---|---|
| Full-Abs | R-1 | 35.95 | 36.89 | 36.97 | 37.05 |
| | R-2 | 20.42 | 21.88 | 21.97 | 21.96 |
| | R-L | 29.87 | 32.95 | 33.56 | 33.58 |
| Abs-Ti | R-1 | 33.25 | 34.67 | 34.85 | 34.82 |
| | R-2 | 18.36 | 19.66 | 19.35 | 20.21 |
| | R-L | 31.45 | 32.58 | 32.73 | 32.97 |
| BioAbsTi | R-1 | 36.58 | 37.42 | 37.55 | 37.58 |
| | R-2 | 25.56 | 25.93 | 26.13 | 26.26 |
| | R-L | 34.52 | 34.97 | 35.16 | 35.19 |
| AVG | | 29.55 | 30.77 | 30.91 | 31.06 |

Table 5 Experimental results with different important-content extraction methods
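LEAD, TR, and BS in Table 5 refer to the extractor used in the first stage: the leading sentences, TextRank, and BERTSum respectively. A sketch of the first two, using the `summa` implementation of TextRank (the library choice is our assumption; BERTSum is omitted because it needs a trained checkpoint):

```python
from summa.summarizer import summarize as textrank_summarize  # pip install summa

def lead_extract(document: str, k: int = 3) -> str:
    """PG-meta(LEAD): keep the first k sentences as important content."""
    sentences = [s.strip() for s in document.split(".") if s.strip()]
    return ". ".join(sentences[:k]) + "."

def textrank_extract(document: str, ratio: float = 0.2) -> str:
    """PG-meta(TR): graph-based sentence ranking via TextRank."""
    return textrank_summarize(document, ratio=ratio)
```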
| Dataset | Metric | PG-meta(term) | PG-meta(con) | PG-meta(t-c) |
|---|---|---|---|---|
| Full-Abs | R-1 | 36.53 | 36.75 | 37.05 |
| | R-2 | 21.85 | 21.72 | 21.96 |
| | R-L | 33.24 | 33.35 | 33.58 |
| Abs-Ti | R-1 | 34.63 | 34.79 | 34.82 |
| | R-2 | 20.19 | 20.25 | 20.21 |
| | R-L | 32.73 | 32.79 | 32.97 |
| BioAbsTi | R-1 | 37.46 | 37.59 | 37.58 |
| | R-2 | 26.07 | 26.12 | 26.26 |
| | R-L | 35.07 | 35.03 | 35.19 |
| AVG | | 30.86 | 30.93 | 31.06 |

Table 6 Experimental results with different knowledge-association granularities
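The three Table 6 variants differ in what is handed to the attention mechanism as background knowledge: terms only, concepts only, or full term-concept pairs. A minimal sketch of the three serializations (the string format is our assumption; the paper does not specify one):

```python
def serialize_knowledge(pairs: dict[str, str], granularity: str) -> str:
    """Render term-concept pairs at the three granularities of Table 6."""
    if granularity == "term":   # PG-meta(term): terms only
        return "; ".join(pairs)
    if granularity == "con":    # PG-meta(con): concepts only
        return "; ".join(pairs.values())
    if granularity == "t-c":    # PG-meta(t-c): full term-concept pairs
        return "; ".join(f"{t}: {c}" for t, c in pairs.items())
    raise ValueError(f"unknown granularity: {granularity!r}")

# serialize_knowledge({"iop": "Intraocular pressure"}, "t-c")
# -> 'iop: Intraocular pressure'
```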
Fig.4 Experimental results of different knowledge-fusion methods