Abstract
This article presents the principles of an algorithm for identifying trends through the analysis of big text data and for presenting the results in formats convenient for decision makers, developed for implementation in the iFORA Big Data Mining System. The paper reviews existing text analytics algorithms; outlines the mathematical basis for identifying terms that signify trends, an approach proposed and tested in dozens of implemented projects; describes approaches to clustering terms based on their vectors in the Word2vec space; and gives examples of two key visualizations (semantic maps and trend maps) that outline the range of topics and trends characterizing a particular area of study, as a way to adapt the results of the analysis to the tasks of decision makers. The limitations and advantages of the proposed approach for decision support are discussed, and directions for future research are suggested.
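As a rough illustration of what clustering terms by their vectors can look like, the sketch below groups four terms with a plain k-means loop. It is a minimal sketch under stated assumptions, not the authors' implementation: the two-dimensional vectors are made up for readability (real Word2vec vectors typically have hundreds of dimensions), the choice of k-means is an assumption, and the names `kmeans` and `term_vectors` are hypothetical.

```python
import math


def kmeans(vectors, k, iterations=20):
    """Plain k-means with a deterministic start: the first k vectors seed the centroids."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: attach each vector to its nearest centroid (Euclidean distance).
        labels = [
            min(range(k), key=lambda c: math.dist(v, centroids[c]))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its assigned vectors.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = [sum(xs) / len(members) for xs in zip(*members)]
    return labels


# Toy stand-ins for Word2vec term vectors (assumed values, for illustration only).
term_vectors = {
    "neural network": [0.9, 0.1],
    "deep learning": [0.8, 0.2],
    "solar panel": [0.1, 0.9],
    "wind turbine": [0.2, 0.8],
}

terms = list(term_vectors)
labels = kmeans([term_vectors[t] for t in terms], k=2)

# Collect terms by cluster label to form candidate topics.
clusters = {}
for term, label in zip(terms, labels):
    clusters.setdefault(label, []).append(term)
print(clusters)
```

With these toy vectors the two machine learning terms end up in one cluster and the two energy terms in the other. In practice the number of clusters and the distance measure (cosine similarity is a common choice for word embeddings) would need to be tuned to the corpus.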
Funding
The study was supported by the Analytical Center at the Government of the Russian Federation, agreement no. 000000D730321P5Q0002, and by HSE University, agreement no. 70-2021-00139 dated November 2, 2021, within a grant for the support of research centers in the area of artificial intelligence, including strong artificial intelligence, systems of authorized artificial intelligence, and the ethical aspects of applying artificial intelligence.
Ethics declarations
The authors declare that they have no conflicts of interest.
Additional information
Publisher’s Note.
Allerton Press remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Lobanova, P.A., Kuzminov, I.F., Karatetskaia, E.Y. et al. Trend Detection Using NLP as a Mechanism of Decision Support. Sci. Tech. Inf. Proc. 50, 440–448 (2023). https://doi.org/10.3103/S0147688223050106
DOI: https://doi.org/10.3103/S0147688223050106