Abstract
Understanding the evolutionary relationships among scientific topics and tracing the evolutionary process of innovations are crucial for strategic decision makers in governments, firms and funding agencies when they carry out forward-looking research activities. However, traditional co-word network analysis for topic identification cannot effectively capture semantic relationships from context, and the fixed time window method cannot adequately reflect how topics evolve. This study proposes a network-analytic framework for identifying topic evolutionary pathways. First, keyword networks are constructed, using a piecewise linear representation method to divide time periods and a Word2Vec model to capture semantics from the context of titles and abstracts. Second, a community detection algorithm is used to identify topics in the networks. Finally, evolutionary relationships between topics are represented by measuring topic similarity between adjacent time periods, and topic evolutionary pathways are then identified and visualized. An empirical study of information science, followed by empirical validation, demonstrates the reliability of the methodology.
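The framework's first and last steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: time periods are cut with the classic top-down piecewise linear representation (split a segment at its worst-fitting point until every point lies within `max_error` of the line joining the segment's endpoints), and topics in adjacent periods are linked by the cosine similarity of their averaged keyword vectors. The input series (yearly counts), the `max_error` and `threshold` values, the averaging rule, and all function names are hypothetical; hand-made 2-D vectors stand in for trained Word2Vec embeddings, and the community detection step is left out.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple


def plr_breakpoints(series: Sequence[float], max_error: float) -> List[int]:
    """Top-down piecewise linear representation (PLR) of a time series.

    Each segment is approximated by the straight line joining its endpoints.
    If some interior point deviates from that line by more than max_error,
    the segment is split at the point with the largest deviation and both
    halves are processed recursively. Returns the sorted breakpoint indices,
    including the first and last index.
    """
    if len(series) < 2:
        return list(range(len(series)))

    def split(lo: int, hi: int) -> List[int]:
        if hi - lo < 2:
            return []  # no interior points to split at
        slope = (series[hi] - series[lo]) / (hi - lo)
        worst, worst_err = -1, max_error
        for i in range(lo + 1, hi):
            err = abs(series[i] - (series[lo] + slope * (i - lo)))
            if err > worst_err:
                worst, worst_err = i, err
        if worst == -1:
            return []  # the straight line is within tolerance
        return split(lo, worst) + [worst] + split(worst, hi)

    last = len(series) - 1
    return [0] + split(0, last) + [last]


def topic_vector(keywords: Set[str], vectors: Dict[str, List[float]]) -> List[float]:
    """Average the embeddings of a topic's keywords (unknown words are skipped)."""
    known = [vectors[k] for k in keywords if k in vectors]
    if not known:
        return []
    return [sum(col) / len(known) for col in zip(*known)]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; 0.0 if either vector is empty or all zeros."""
    if not a or not b:
        return 0.0
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def link_topics(
    earlier: List[Set[str]],
    later: List[Set[str]],
    vectors: Dict[str, List[float]],
    threshold: float,
) -> List[Tuple[int, int, float]]:
    """Link topic i of one period to topic j of the next if similarity >= threshold."""
    links = []
    for i, old in enumerate(earlier):
        v_old = topic_vector(old, vectors)
        for j, new in enumerate(later):
            sim = cosine(v_old, topic_vector(new, vectors))
            if sim >= threshold:
                links.append((i, j, sim))
    return links


if __name__ == "__main__":
    # Hypothetical yearly counts; breakpoints mark candidate period boundaries.
    counts = [10, 12, 14, 16, 30, 44, 58, 60, 62, 64]
    print(plr_breakpoints(counts, max_error=3.0))  # [0, 3, 6, 9]

    # Toy 2-D vectors standing in for trained Word2Vec embeddings.
    vectors = {
        "retrieval": [1.0, 0.0], "ranking": [0.9, 0.1],
        "citation": [0.0, 1.0], "altmetrics": [0.1, 0.9],
    }
    earlier = [{"retrieval", "ranking"}, {"citation"}]
    later = [{"ranking"}, {"altmetrics", "citation"}]
    links = link_topics(earlier, later, vectors, threshold=0.9)
    print([(i, j) for i, j, _ in links])  # [(0, 0), (1, 1)]
```

Here the breakpoints [0, 3, 6, 9] would divide ten years into three periods whose boundaries follow changes in the growth trend rather than a fixed window length; a tighter `max_error` yields more, shorter periods, and a higher `threshold` keeps only the strongest topic-to-topic links.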
Notes
JASIST changed its name from Journal of the American Society for Information Science and Technology to Journal of the Association for Information Science and Technology in 2014.
VantagePoint is a text-mining and visualization tool for bibliometric data (such as scientific papers, patents, and academic project applications). For details, please visit the website: www.thevantagepoint.com.
Acknowledgements
This work was supported by the National Natural Science Foundation of China [Grant No. 71774013]. Yi Zhang acknowledges support from the Australian Research Council under Discovery Early Career Researcher Award DE190100994.
About this article
Cite this article
Huang, L., Chen, X., Zhang, Y. et al. Identification of topic evolution: network analytics with piecewise linear representation and word embedding. Scientometrics 127, 5353–5383 (2022). https://doi.org/10.1007/s11192-022-04273-1