Diagnosis Then Aggregation: An Adaptive Ensemble Strategy for Keyphrase Extraction

Jin, Xin; Liu, Qi; Yue, Linan; Liu, Ye; Zhao, Lili; Gao, Weibo; Gong, Zheng; Zhang, Kai; Bi, Haoyang

doi:10.1007/978-981-99-8850-1_46

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14473)

Included in the following conference series:

CAAI International Conference on Artificial Intelligence

Abstract

Keyphrase extraction (KE) is a fundamental task in information extraction that has recently gained increasing attention. However, when facing text with complex structure or high noise, individual keyphrase extraction methods struggle to capture multiple features, which limits extraction performance. To address this, ensemble learning methods are employed to achieve better performance. Unfortunately, traditional ensemble strategies rely only on each algorithm's extraction performance (e.g., accuracy) over the whole dataset, and the aggregation weights are commonly fixed, lacking fine-grained consideration of, and adaptiveness to, the data. To this end, we propose an Adaptive Ensemble strategy for Keyphrase Extraction (AEKE) that aggregates individual KE models adaptively. Specifically, we first obtain the multi-dimensional abilities of individual KE models by employing cognitive diagnosis methods. Then, based on the diagnosed abilities, we introduce an adaptive ensemble strategy that yields an accurate and reliable weight distribution for model aggregation when facing new data, and apply it to improve keyphrase extraction. Extensive experimental results on real-world datasets validate the effectiveness of AEKE. Code is released at https://github.com/kingiv4/AEKE.
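The contrast between fixed and data-dependent aggregation weights can be illustrated with a minimal sketch. This is not the AEKE implementation (see the released code for that). It assumes, purely for illustration, that each model's diagnosed abilities are already available as a vector over a few hypothetical dimensions and that a new document is described by a relevance profile over the same dimensions. The names `adaptive_weights`, `aggregate`, the model labels, and all numbers are invented for the example.

```python
import math
from collections import defaultdict


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_weights(abilities, doc_profile):
    """Turn per-model ability vectors into document-specific weights.

    abilities: {model_name: [ability on each dimension]} (hypothetical inputs)
    doc_profile: [how much the document stresses each dimension]
    """
    names = sorted(abilities)
    # Score each model by how well its abilities match this document.
    match = [sum(a * d for a, d in zip(abilities[n], doc_profile)) for n in names]
    return dict(zip(names, softmax(match)))


def aggregate(candidates, weights, top_k=3):
    """Weighted sum of per-model phrase scores, highest combined score first."""
    combined = defaultdict(float)
    for model, phrase_scores in candidates.items():
        for phrase, score in phrase_scores.items():
            combined[phrase] += weights[model] * score
    ranked = sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))
    return [phrase for phrase, _ in ranked[:top_k]]


# Illustrative inputs only: two models, two ability dimensions.
abilities = {"graph_model": [0.9, 0.2], "stat_model": [0.3, 0.8]}
candidates = {
    "graph_model": {"keyphrase extraction": 0.9, "ensemble": 0.4},
    "stat_model": {"ensemble": 0.8, "dataset": 0.5},
}

weights_a = adaptive_weights(abilities, [1.0, 0.0])  # document stresses dimension 0
weights_b = adaptive_weights(abilities, [0.0, 1.0])  # document stresses dimension 1
top_a = aggregate(candidates, weights_a, top_k=1)
top_b = aggregate(candidates, weights_b, top_k=1)
```

With a fixed weight vector both documents would be ranked using the same mixture of models. In this sketch the weights shift toward whichever model is stronger on the dimension the document stresses, so the top-ranked phrase can differ between documents.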

Notes

1. https://github.com/boudinfl/pke

Acknowledgements

This research was supported by grants from the National Key Research and Development Program of China (Grant No. 2021YFF0901003).

Author information

Authors and Affiliations

Anhui Province Key Laboratory of Big Data Analysis and Application, University of Science and Technology of China, Hefei, China
Xin Jin, Qi Liu, Linan Yue, Ye Liu, Lili Zhao, Weibo Gao, Zheng Gong, Kai Zhang & Haoyang Bi
State Key Laboratory of Cognitive Intelligence, Hefei, China
Xin Jin, Qi Liu, Linan Yue, Ye Liu, Lili Zhao, Weibo Gao, Zheng Gong, Kai Zhang & Haoyang Bi

Corresponding author

Correspondence to Qi Liu.

Editor information

Editors and Affiliations

Tsinghua University, Beijing, China
Lu Fang
Duke University, Durham, NC, USA
Jian Pei
Shanghai Jiao Tong University, Shanghai, China
Guangtao Zhai
Chinese Academy of Sciences, Beijing, China
Ruiping Wang

About this paper

Cite this paper

Jin, X. et al. (2024). Diagnosis Then Aggregation: An Adaptive Ensemble Strategy for Keyphrase Extraction. In: Fang, L., Pei, J., Zhai, G., Wang, R. (eds) Artificial Intelligence. CICAI 2023. Lecture Notes in Computer Science, vol 14473. Springer, Singapore. https://doi.org/10.1007/978-981-99-8850-1_46

DOI: https://doi.org/10.1007/978-981-99-8850-1_46
Published: 04 February 2024
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8849-5
Online ISBN: 978-981-99-8850-1
eBook Packages: Computer Science, Computer Science (R0)
