Understanding the linkages of smart-city technologies and applications: Key lessons from a text mining approach and a call for future research

There have been many attempts to transform cities into smart cities worldwide. However, it is difficult to understand and describe smart cities from different perspectives, given the widespread application of the concept of smart city in diverse disciplines, such as urban planning, electronic engineering, and computer sciences. This work conducted a comprehensive smart city literature review based on text mining of 3,315 papers on smart cities published in journals indexed in the Science Citation Index Expanded and Social Sciences Citation Index databases. These include “all papers” classified as research articles published from 1999 to April 2020. Our findings show the state of the art of research on smart cities, including (i) smart city literature statistics from 1999 to 2019, (ii) 23 research topics related to smart cities, and (iii) geographical variations in smart-city research. Based on these findings, we offer theoretical and practical implications of (1) missing fields of studies, (2) future research directions, and (3) the applicability of text-mining techniques to literature reviews. We believe that this work, which aims to establish a common ground for understanding smart cities from multiple disciplinary perspectives, will encourage further research and development regarding smart cities.


Introduction
With the development of Artificial Intelligence (AI) and Internet of Things (IoT) technology, the idea of smart cities is attracting attention as a new growth engine and development strategy in many global cities. Many cities are introducing new urban services, using information and communications technology (ICT) to build smart cities that improve the quality of life of citizens. However, despite its obvious importance and popularity, there is a lack of in-depth understanding of smart cities in the literature, and the concept of "smart city" remains unclear. What is a smart city? What are the major smart city research areas and how are they related? What are the key technological factors of smart cities? What are the main application areas and how do they vary in different countries? In short, how should we describe "smart cities"?
A comprehensive review of smart cities across different fields may facilitate smart city planning and development. However, such integrative work is not easy to achieve because of the variety and volume of studies and applications related to smart cities. One of the commonly applied methodologies for integrative work is the bibliometric approach, which makes it possible to identify how a specific concept or research discipline is formulated and how it correlates with other concepts by clarifying the relationships between all the concepts (Kalmaski and Kirby, 2012).
There have been several meaningful studies of the subject of the smart city from a bibliometric perspective. For instance, using a systematic webometric exercise, Joss et al. (2019) selected 27 cities and collected 346 highly-read online texts across those cities. Analyzing these texts reveals five key issues regarding the evolving smart city discourse, including socio-technical bifurcation, transformative change, and scale issues. Mora et al. (2017) conducted a bibliometric analysis of the literature published between 1992 and 2012, showing that research in the European context follows a substantially different path from that in the North American context: while the former supports a holistic perspective on the smart city, the latter shows a more techno-centric understanding of the subject. Fu and Zhang (2017) understood the smart city as one of the major evolutionary trajectories of contemporary city concepts. Examining clusters of papers published from 1980 to 2015, they found two distinct concepts of "sustainable" and "smart" cities. The "sustainable city" literature considers the socio-economic structure of urban sustainability, while the "smart city" concept is more heavily connected with various technological nuclei.
One of the virtues of the bibliometric approach is that a quantitative and objective review can avoid concerns over potential bias in qualitative reviews caused by subjectivity (Liu et al., 2020). This study extends these earlier contributions with a more systematic and automated interpretation of the results. To capture the essence of the literature text corpus, our analytical method incorporates metrics for measuring the statistical and semantic significance of word features of the corpus, and unsupervised machine learning algorithms such as spectral clustering (von Luxburg, 2007) and non-negative matrix factorization (NMF; Lin, 2007). A machine learning approach differs from bibliometric and systematic literature reviews in terms of the process of knowledge discovery (i.e., automation with algorithms versus manual interpretation by humans). In fact, it adds to the traditional approach, rather than subtracting from or conflicting with it (Lim and Maglio, 2018). In terms of methodological contributions, to the best of our knowledge, this is the first study to apply machine learning to the data-driven review of an urban planning topic. Our analysis of 3,315 articles identified basic statistics, significant keywords, research topics, technological factors, application areas, and geographical characteristics of smart city literature. By combining these quantitatively derived findings and our own qualitative review of literature, we offer theoretical and practical implications for smart city research and practice. This paper is organized as follows. In Section 2, we describe the review methodology, including the data collection and analysis methods. In Section 3, we review the features of smart city literature as revealed by the text-mining technique, and describe the state of the art of research on smart cities. In Section 4, we discuss the theoretical and practical implications of our work. In Section 5, we present concluding remarks.

Methods and materials
Establishing a common ground for central concepts is essential in science (Boehm and Thomas, 2013; Fortunato et al., 2018). Developing a unified conceptualization of smart cities is difficult given the variety and volume of the related studies and applications. Here, to integrate the perspectives on and capabilities of methods for developing smart cities, we provide a systematized view of the dispersed knowledge of smart cities by integrating such knowledge into a robust conceptualization of smart cities and identifying the commonalities and diversities in smart city literature using a text-mining approach.
Text mining or text data mining is a process of discovering previously unknown knowledge from textual data (Bird et al., 2009). Text mining methods and machine learning algorithms have been used for many purposes with a wide range of data types, such as the analysis of technological trends using patent data (Yoon and Park, 2004), understanding customers using hotel review data (Mankad et al., 2016), and descriptions of specific research fields using scientific documents (Fortunato et al., 2018). Text mining is an appropriate method for achieving our research objective because we aim to comprehensively explore aspects and areas of research on smart cities, and it is a challenge to achieve such work using only human capacity. In addition, despite the insights of experts, subjective reviews and descriptions of smart cities can be difficult to evaluate. A data-driven approach, such as text mining with metrics and machine learning algorithms, can constitute an excellent alternative (Blei et al., 2003). Moreover, an expert's analysis built upon a machine's findings from massive amounts of data often generates rich insights and implications (Lim and Maglio, 2018; Ordenes et al., 2014). Accordingly, we collected and analyzed a comprehensive set of 3,315 journal articles as shown in Fig. 1.
We identified and downloaded journal papers related to smart cities from the Web of Science Core Collection databases of the Science Citation Index Expanded (SCIE) (1945-) and Social Sciences Citation Index (SSCI) (1987-) using the two queries of {TOPIC: ("smart city")} and {TOPIC: ("smart cities")}. The Web of Science Core Collection provides quality papers. We only collected "research articles" because other types of papers, such as book reviews and editorials, may be extremely broad and generate noise on specific topics and characteristics of smart city literature. Unlike existing reviews of selected sample papers, we collected and analyzed the data of the full population from the databases in a semi-automatic manner. We initially downloaded 4,227 articles, but removed 912 duplicate or incomplete records, resulting in a text corpus of 3,315 articles for analysis. Deciding from what part of the documents to extract text data for analysis, such as from the abstract only or the full text, is crucial in text mining application research. From the 3,315 articles, we extracted text data composed of the title, abstract, and keywords, following existing studies (e.g., Lim and Maglio, 2018; Noh et al., 2015; Xie and Miyazaki, 2013).
C. Lim et al.
Fig. 1. Overview of text mining process of smart city literature in this study.
The aim of text preprocessing (Step 2 in Fig. 1) was to prepare the text corpus for subsequent analysis by removing potential sources of noise. It involved eliminating "stop words" (e.g., "it," "and," or "for") and any words containing non-alphabetic characters, changing all text to lowercase (e.g., from "City" to "city"), lemmatizing all words (e.g., from "processes" to "process"), and applying customized rules (e.g., we deleted words commonly used in journal articles that are noncontextual, such as "paper" and "result"). We then focused on word-feature selection. Words in a corpus of journal articles can be categorized into three types: (1) specific words developed by the authors or specific companies, such as acronyms and application names; (2) contextual words relevant to the topic, such as "urban," "sensor," or "communications"; and (3) general words frequently used in scientific publications, such as "method" and "approach," and in English documents, such as "within" and "over." Word-feature selection requires the inclusion of Type 2 words and exclusion of Type 1 and 3 words. Based on the algorithm and word significance metrics proposed by Lim and Maglio (2018), we eliminated Type 1 and 3 words from the original data set and identified Type 2 words as representing smart city literature. As a result, 1,708 word features were selected from the original 20,351 features.
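The preprocessing steps described above can be illustrated with a minimal Python sketch. The stop-word list, the noise-word list, and the small lemma table are hypothetical stand-ins chosen for illustration; the study's actual lists and lemmatizer are not given in the text, and a real pipeline would rely on a full stop-word list and a dictionary-based lemmatizer.

```python
import re

# Hypothetical, deliberately tiny resources (assumptions, not the study's lists).
STOP_WORDS = {"it", "and", "for", "the", "of", "a", "in", "to", "is", "from", "this"}
CUSTOM_NOISE = {"paper", "result"}  # non-contextual journal words named in the text
LEMMAS = {"processes": "process", "cities": "city", "sensors": "sensor", "results": "result"}


def preprocess(text):
    """Lowercase, keep purely alphabetic tokens, lemmatize, drop stop and noise words."""
    cleaned = []
    for token in text.lower().split():
        token = token.strip(".,;:!?()\"'")       # trim surrounding punctuation
        if not re.fullmatch(r"[a-z]+", token):
            continue                             # drop tokens with non-alphabetic characters
        token = LEMMAS.get(token, token)         # table lookup stands in for a lemmatizer
        if token in STOP_WORDS or token in CUSTOM_NOISE:
            continue
        cleaned.append(token)
    return cleaned
```

For instance, calling `preprocess` on a sentence that mixes capitalized words, a token such as "5G", and the word "paper" returns only the lowercase, lemmatized contextual words.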
For learning the literature text corpus and identifying key topics representing smart city literature (Step 3 in Fig. 1), we applied spectral clustering (von Luxburg, 2007), which is a graph-partitioning method known for its effectiveness in identifying clusters of data points (i.e., articles in this work) that are nonlinearly separable in the original data space. This algorithm embeds the data points into the space spanned by the leading eigenvectors of the Laplacian of their similarity graph (i.e., the spectral domain) to represent the global graph structure of the data and to partition the graph into a set of separable clusters. This algorithm was chosen based on pilot studies that tested the performance of various clustering algorithms applied to sample data. The graph-partitioning problem in spectral clustering is an NP-hard problem, requiring the use of a heuristic algorithm, meaning that the clustering result changes with each run and it is difficult to determine an optimal number of clusters. We used the mean of Silhouette Coefficient values (Rousseeuw, 1987) of the entire data to determine the number of clusters in the text corpus. We checked the average of 100 iterations for each of 40 cases (i.e., testing candidate numbers of clusters from 1 to 40); the mean of Silhouette Coefficient values was high when the number of clusters was 22, 23, and 24. We compared all cases by checking the representative words determined by the metrics proposed by Lim and Maglio (2018). We also reviewed the results of each case for manual evaluation. Finally, we concluded that an optimal number of clusters was 23.
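As an illustration of the model-selection step, the following sketch computes the mean Silhouette Coefficient for a given labeling and picks the candidate labeling with the highest value. It is a simplified stand-in under stated assumptions: it uses Euclidean distance, takes the candidate labelings as given rather than running spectral clustering, and omits the averaging over 100 runs; the function names are hypothetical.

```python
import math


def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def mean_silhouette(points, labels):
    """Mean Silhouette Coefficient over all points (defined for two or more clusters)."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")
    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(euclidean(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(euclidean(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)


def best_labeling(points, candidates):
    """Return the key (e.g., number of clusters) whose labeling scores highest."""
    return max(candidates, key=lambda k: mean_silhouette(points, candidates[k]))
```

In practice, each candidate labeling would come from one clustering run with a given number of clusters, and the scores would be averaged over repeated runs before comparison.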
In interpreting each cluster, we used the NMF algorithm (Lin, 2007) and word-significance metrics (Lim and Maglio, 2018) to identify words representative of each cluster, and interpreted the results using the visualization method of Longabaugh (2012). NMF is a representation learning method for extracting the latent representation of data with nonnegative values, which are often more interpretable in describing the original features. Thus, we used NMF to interpret the topic across article data and word features in a specific cluster. The metrics we adopted from Lim and Maglio (2018) include the mean of the TF-IDF scores of a word feature across data, where a high value indicates that the word is generally important, as well as the mean of the dot-product scores of a word feature with other features, where a high value indicates that the word is close to many other words and important across many data at the same time.
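The two word-significance metrics described above can be sketched as follows. This is an illustrative reading of the description, not the exact formulation of Lim and Maglio (2018): the TF-IDF weighting (term frequency times log(N / df)) is one common variant chosen as an assumption, and the function names are hypothetical.

```python
import math


def tfidf_matrix(docs, vocab):
    """Rows are documents, columns are words; tf * log(N / df) weighting (assumed variant)."""
    n_docs = len(docs)
    df = {w: sum(1 for d in docs if w in d) for w in vocab}
    matrix = []
    for doc in docs:
        row = []
        for w in vocab:
            tf = doc.count(w) / len(doc)
            idf = math.log(n_docs / df[w]) if df[w] else 0.0
            row.append(tf * idf)
        matrix.append(row)
    return matrix


def mean_tfidf(matrix):
    """Mean TF-IDF score of each word feature across all documents."""
    n_docs = len(matrix)
    return [sum(row[j] for row in matrix) / n_docs for j in range(len(matrix[0]))]


def mean_dot_product(matrix):
    """Mean dot product of each word column with every other word column."""
    n_words = len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_words)]
    scores = []
    for j in range(n_words):
        dots = [
            sum(a * b for a, b in zip(columns[j], columns[k]))
            for k in range(n_words)
            if k != j
        ]
        scores.append(sum(dots) / len(dots))
    return scores
```

Words that score high on both lists are candidates for the representative words of a cluster, since they are both heavily weighted and co-occur with many other features.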
Step 3 in Fig. 1 shows a binary adjacency matrix. Each cluster is highlighted by a yellow square. The density of each cluster indicates the level of homogeneity (i.e., text similarity), whereas its size indicates the amount of data. Thus, a cluster may indicate a broad topic if its density is low and its size is large; conversely, a dense, small cluster indicates a specific topic. The homogeneity and size were considered when each cluster was interpreted and named based on its top representative words as determined by the NMF algorithm and the word significance metrics.
We subsequently validated the outcome from spectral clustering (i.e., hard clustering) using the Latent Dirichlet Allocation (LDA; Blei et al., 2003) and NMF topic-modeling algorithms (i.e., soft clustering) for different topic number scenarios to check whether any significant topics were missed by the analyses. LDA is a generative probabilistic model for collections of documents and words; it assumes that every document contains every topic latently and that every word contributes to every topic. This method has been applied to a variety of topic extraction problems. As a result of the validation with LDA, we found that no significant topic was missed. We also reviewed the contents of the top and bottom data of each cluster during interpretation, which were identified based on the cosine similarity between each item and the centroid of its cluster, as well as the data sources (i.e., journals) of each cluster to assess the homogeneity of the cluster; a cluster may be homogeneous if the top and bottom data discuss the same topic and the data sources are highly correlated.
For further interpretation, we analyzed the basic statistics (e.g., geographical variation) and the relational network of the 23 topics (Step 4 in Fig. 1). For the network analysis, we identified the centroids within the clusters and computed the cosine similarities between the centroids. Given that all clusters are highly related, the similarity scores are generally high and all nodes (clusters) are connected; thus, we connected each cluster only to its three most relevant clusters to observe the most significant relationships between the research topics; the size of the node represents the network degree (i.e., relationship strength). This allowed us to identify a higher-level overview of the 23 topics; for example, the nodes in the network center may represent broad topics of the smart cities literature, whereas the nodes on the boundaries pertain to specific topics. This is attributed to the fact that the former nodes have strong relationship values (i.e., connections with many other nodes), whereas the latter are weaker and connected through the former. For the geographical analysis, we identified the corresponding author's country for each paper and related this information with the clustering results to understand the geographical variation in smart city research. Sections 3 and 4 present the detailed outcomes of these analyses.
Finally, building on the quantitatively derived findings, we further reviewed the data manually to add human insight and to describe the overall structure of smart city literature. In doing so, we tried to incorporate a broad range of studies on smart cities from technology and engineering to urban planning and social science, thereby filling the gap between existing studies on technological development and social advance. As a result, we categorized the 23 topics into three levels: 1) smart city technology, 2) smart city service, and 3) smart city policy. The main research categories and their sub-topics identified through text mining are reviewed to draw theoretical and practical implications for studies of smart cities.
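The idea of inspecting the "top" and "bottom" items of a cluster can be sketched as follows: each item's vector is compared with the cluster centroid by cosine similarity, and the items are ranked from most to least central. The vectors and function names are hypothetical; the study's feature vectors are not reproduced here.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def centroid(vectors):
    """Component-wise mean of the vectors in one cluster."""
    n = len(vectors)
    return [sum(v[j] for v in vectors) / n for j in range(len(vectors[0]))]


def rank_by_centrality(vectors):
    """Indices of cluster members, from most to least similar to the centroid."""
    c = centroid(vectors)
    return sorted(range(len(vectors)), key=lambda i: cosine(vectors[i], c), reverse=True)
```

The first few indices returned correspond to the top data of the cluster and the last few to the bottom data; reading both ends gives a quick check of whether the cluster discusses a single topic.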
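A minimal sketch of the network construction described above: every cluster is linked to its k most similar clusters by centroid cosine similarity, and the degree of each node is then counted. Treating the links as undirected and counting a mutual link once are assumptions of this sketch; the centroid values in any usage are hypothetical.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k_network(centroids, k=3):
    """Link each cluster to its k most similar clusters; return edges and node degrees."""
    names = list(centroids)
    edges = set()
    for name in names:
        others = [n for n in names if n != name]
        others.sort(key=lambda n: cosine(centroids[name], centroids[n]), reverse=True)
        for neighbour in others[:k]:
            edges.add(frozenset((name, neighbour)))  # undirected link (assumption)
    degree = {name: sum(1 for e in edges if name in e) for name in names}
    return edges, degree
```

Nodes that many other clusters select as a nearest neighbour accumulate a high degree and end up at the center of the network, which matches the reading of central nodes as broad topics.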

Results and findings
This section presents the results and findings of the text-mining analysis in three parts, from a bibliometric perspective. First, we name and briefly describe the research topics and categories yielded by the analysis: the 23 topics under three upper-level categories as shown in Fig. 2. Second, the trends of smart city research topics in recent years are analyzed focusing on these 23 topics, and the evolution of smart city research is discussed. Third, focusing on the differences between Asian, European, and North American institutions, geographical variances in smart city research topics are examined.

The 23 research topics related to smart cities
Through text mining of the smart city literature, and based on the network analysis of the identified topics and a review of representative keywords and articles, this study identified three research categories comprising 23 sub-research topics: (1) Smart City Technologies, (2) Smart City Services, and (3) Smart City Policies. The research topics differ greatly across the research clusters, as shown in Fig. 3. Under Smart City Technologies, strong emphasis has been placed on "computing," "IoT," and "learning," while in Smart City Policies, the focus is on "urban," "sustainability," and "innovation." In Smart City Services, specific service fields such as "energy" and "traffic" have been indicated with an emphasis on "management." "Sensors," "network," and "system" were commonly found in both the smart city technologies and services categories. "Data" was found as the main topic in all three categories, implying that the data generated from the sensing and monitoring system using IoT could be key resources for smart cities.
The classification of smart city literature using three research clusters is consistent with other studies. For example, Angelidou (2015) described smart cities using four forces: Technology Push, Application Pull, Urban Futures, and the Knowledge and Innovation Economy. In that article, technology push refers to the fact that the recent development of ICT makes smart cities possible. Application pull is related to smart city services, and urban futures and the knowledge and innovation economy could be included in smart city policy and planning. D'Auria et al. (2018) divided smart city literature into three groups: How (development, management, monitoring, big data, etc.), What (services, implementation of technologies), and With (people, institutions, policies, innovation). Their classification could be renamed services (What) and policies (With) in smart cities. Further, technology and policy are also included in the three elements of the smart city initiative framework suggested by Chourabi et al. (2012).
Regarding sub-research topics, as summarized in Table 1, the Smart City Technologies category includes seven sub-fields that focus on ICT for data communication, collection, management, and analysis. Wireless network and IoT explore communication technologies as the core infrastructure of data-driven smart cities. Big data analytics includes systems and practices for analyzing sensed and monitored data in smart cities. Security and privacy covers technologies for ensuring the security and privacy of personal information collected in smart cities. Cloud computing is a technology where cloud services are applied for data storage, management, and analysis in smart cities to reduce the cost of hardware ICT infrastructure, such as servers, for data storage. Machine learning focuses on traditional machine learning or modern deep learning algorithms for prediction and optimization of smart cities. Mobile crowdsensing is a data collection technology that utilizes mobile smartphones as sensors.
The Smart City Services category consists of 12 sub-topics. In general, Models and applications includes a variety of case studies and model developments of smart city services. System architecture focuses on the practical development of system architecture for various smart city services. Service management also includes various case studies of smart city services with an emphasis on users' acceptability and service delivery and management. Transportation has been analyzed as the most important application area of smart cities. Electric vehicle charging infrastructure focuses on the optimal location of charging stations of electric vehicles. Another important application area of smart city services is the environment, considering the fact that Energy management, Environmental sensing and monitoring, Water management, and Waste management are identified as subfields of smart city services. Sensors and monitoring systems for each subfield are key contents of these studies. In particular, Environmental sensing and monitoring focuses on sensing systems for noise, air quality, and air pollution monitoring. Video surveillance addresses the development of services and technologies using CCTV, which is attracting attention as a key information gathering tool for transportation and crime prevention in a smart city.
The Smart City Policies category has four sub-topics. Citizen governance addresses political and social aspects of smart cities with an emphasis on citizen participation and governance. This research topic constitutes the largest share in the Smart City Policies category, implying the importance of citizen-centric smart cities, as noted by Joss et al. (2017). Planning and development focuses on urban planning approaches in smart cities with various theoretical and practical frameworks. Both the Citizen governance and Planning and development topics include diverse case studies and evaluation studies to find policy and planning implications for smart cities. Sustainability and mobility addresses the connections between smart cities and sustainable cities, and transportation and mobility is the main topic that links these two concepts. Finally, Human capital is a line of research that uses the concept of the smart city in describing cities with knowledge industries and highly skilled and educated labor forces. These studies are mainly conducted from urban economy perspectives and are less focused on the role of technological advancement and services. However, considering the research result that smart city policies could contribute to urban economic growth (Caragliu and Del Bo, 2018), research on the subject of Human capital can be expanded and developed into more systematic research on the relationship between smart cities and urban economy.
As reference materials for readers, the top keywords of each research topic are summarized in Table 1, and a short review of the most relevant research articles for each topic in terms of similarity of texts is presented in Appendix 1. The articles in Appendix 1 were selected from among the top 10 research articles that were central in the vector space of each research topic, considering their publication date and the number of citations. It is important to note that these studies do not represent the most influential and important studies for each subject. Rather, they can be considered studies with higher centrality in the research subject cluster in terms of text-mining technology. We provide a short description of these studies in the table in the appendix to show how the 23 topics differ from each other in research themes and focal points (Table 2).
Since Mahizhnan (1999) published the first research paper on smart cities, the amount of smart-city-related research has continued to increase and the subject has become increasingly diverse. In the early 2010s, smart city research began to surge. In 2011, only 13 studies were selected as smart city literature; however, this number increased to 503 in 2017, 805 in 2018, and 1,017 in 2019, as shown in Fig. 4.
Mahizhnan (1999) described the possibility and applicability of a smart city in Singapore, presenting the vision of Singapore as an intelligent island and its IT strategies in various fields such as education, infrastructure, and economy with an emphasis on the final goal of improving the quality of life of its citizens. Considering the fact that Singapore launched a "Smart Nation" vision in 2014, this paper is meaningful as pioneering research in which technology and policy goals are well-balanced. Shapiro (2006) used "Smart Cities" in the main title of his article; however, the term "smart cities" referred to the growth of jobs for a highly educated labor force due to a better quality of life in a metropolitan area rather than technology-driven urban services. Yovanof and Hazapis (2009) is one pioneering study that focused on a technology-driven smart city. They considered cities as "dynamic and evolving smart ecosystems known as Intelligent Cities" (p. 445) and presented a framework for a digital city where ICT creates smart services in cities.
An Technologies. These research trends in smart city technologies show that analysis and processing technologies for sensed data are emerging as more important fields in smart city research. In the Smart City Services category, since Stephan et al. (2008), who introduced the product development process of the Micro Compact Car for smart cities by Daimler, Models and applications has constituted the largest share of Smart City Services research. In the Smart City Policies category, Sustainability and mobility is a fast-growing field, implying that the concept of smart cities has expanded to include not only technological advancement but also social and environmental sustainability.

Geographical variations in smart city research
Regarding the regional trends of smart city research, we reviewed the geographical variations in smart city research topics to examine the existence of differences in research focus by region. As noted earlier, the geographical location of each study is identified based on the location of the corresponding author's affiliated institution. Overall, 37%, 42%, and 12% of smart city studies were conducted in Asian, European, and North American (U.S. and Canada) institutions, respectively, as shown in Table 3. The differences between the share of each research topic in a specific region and the proportion of all smart city research articles of the geographic region were statistically tested with the t-test of the equality of proportions using the prtest function in STATA. In general, contrary to studies in European institutions, smart city studies in Asian institutions tend to emphasize technology rather than policies, which corroborates previous studies (Angelidou, 2014; Lee et al., 2014). In the Smart City Policies category, Asian institutions account for only 18%, while European institutions have a 57% share. There is no statistical difference by region in the proportion of Smart City Services research. The results also indicate that European smart city research has recently focused on policy contexts with citizen-centric approaches (Joss et al., 2017), while North American smart city research shows relatively holistic approaches for all three categories identified in this study. This finding is somewhat different from that of Mora et al. (2017), which highlighted the techno-centric understanding of smart cities in North American contexts. Although a slightly higher share of studies on Smart City Policies than on Smart City Technologies and Services was conducted in North American institutions, smart city research in this region is generally better distributed across different topics than in the Asian and European regions.
Regarding the variances in specific research topics, in research on smart city technologies in Asian institutions, strong emphasis was placed on wireless networks, security and privacy, cloud computing, and machine learning. North American institutions focused on machine learning studies. Interestingly, Asian institutions conducted the largest number of studies on security and privacy, even though privacy is often regarded as a more important social and cultural value in Western countries. This might be because smart city technologies that facilitate security and privacy are more easily initiated and have been more actively implemented in Asian countries under Asian legal and cultural contexts, thus providing data and systems for study. In the Smart City Services category, there is little difference between regions except for energy management, video surveillance, and water management. A strong emphasis on video surveillance research was found in Asian institutions. This might be because people in Asian countries, such as China and Singapore, might have relatively higher acceptance of technology-based services that potentially infringe their privacy. European institutions focused on practical research on smart city management such as energy and water management. In the Smart City Policies category, there was a weak emphasis on human capital in Asian institutions as opposed to a very strong emphasis on this topic in North American institutions. On the other hand, European institutions are more actively conducting various studies of smart city policies. In sum, the geographical variation of smart city research topics showed an overall trend of technology-driven approaches in Asian institutions, policy-centered approaches in European institutions, and holistic approaches in North American institutions.
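The comparison of a region's share within one topic against that region's overall share can be illustrated with a one-sample test of a proportion. The sketch below uses the large-sample normal approximation and reports a z statistic with a two-sided p-value; whether this matches the exact settings used in the study is an assumption, and the sample size in the usage note is hypothetical.

```python
import math


def one_sample_proportion_test(successes, n, p0):
    """Large-sample test of H0: true proportion == p0; returns (z, two-sided p-value)."""
    p_hat = successes / n
    se = math.sqrt(p0 * (1 - p0) / n)           # standard error under the null hypothesis
    z = (p_hat - p0) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value
```

For example, if a hypothetical topic contained 100 articles of which 18 came from Asian institutions, `one_sample_proportion_test(18, 100, 0.37)` would compare that 18% share against the overall Asian share of 37% reported in the text.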

Discussion
This section includes a discussion of the theoretical, practical, and methodological implications of our findings. Based on the review of the growing body of smart city literature, we present several missing perspectives and suggestions for future studies.

Strengthening multidisciplinary aspects through urban analytics
Smart city research, which comprises three research categories-technologies, services, and policies-has strong interdisciplinary characteristics.Urban planners who make and implement a smart city plan are required to have basic knowledge of the smart city technology and service systems to communicate effectively with engineers and data scientists.Scientists and engineers working for smart cities also need to understand the potential negative consequences of their technologies and services on societies to realize a people-centered smart city.Effective collaboration between policy makers, urban planners, engineers, and businesspeople could be a prerequisite for successful smart city creation.However, it is not easy for experts from different disciplines to collaborate because language usage, thinking and communication methods, and problem-solving approaches are different in each area of expertise.
The networks of research topics identified in this study (Fig. 2) provide clues for successful cooperation in the multidisciplinary smart city discipline. As noted earlier, each research topic is linked to the three most relevant topics in terms of research keywords. Models and applications and System architecture have the highest degree of centrality and are linked to almost all research topics. In spite of their centrality in the research network, these two topical areas are strongly oriented toward engineering studies; consequently, it is somewhat difficult to create a link between smart city technologies and policies through these topics. Thus, considering the current research networks, it is necessary to build a new field to connect the various research topics of smart cities, and "urban analytics" could be one of the fields of study with the highest potential. As shown in Fig. 2, Big-data analytics and Machine learning had no connection with the four sub-topics of Smart City Policies; however, Urban analytics could effectively link the policies, technologies, and service systems. In the Urban analytics field, data scientists who focus on managing and analyzing big data generated from smart city sensors and service systems could play an important role in supporting effective communication among experts in various fields. Recently, many universities have introduced new graduate education programs for urban analytics (e.g., Urban Analytics at the University of Hong Kong and the University of Glasgow, Smart Cities and Urban Analytics at University College London, Applied Urban Analytics at the University of Manchester, and Applied Urban Science and Informatics at New York University). Students trained in these programs will be able to become key professionals in the field of smart cities. As urban analytics is one of the emerging disciplines related to smart cities, continuous research and development on its theories and practices should be conducted. More specifically, a curriculum and ethical standards for urban analytics should be established and evaluated, and artificial intelligence algorithms that are effectively applicable to urban big data analysis, such as high-resolution spatiotemporal analysis, should be developed.
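The topic network described above (each topic linked to its three most relevant topics, then ranked by degree centrality) can be illustrated with a minimal sketch. The keyword sets below are hypothetical placeholders rather than the keywords of Table 1, and Jaccard overlap is an assumed similarity measure chosen only for illustration; it is not necessarily the measure used in this study.

```python
# Hypothetical keyword sets; the actual topic keywords are reported in Table 1.
topic_keywords = {
    "Models and applications": {"model", "application", "system", "data", "service"},
    "System architecture": {"system", "architecture", "platform", "service", "data"},
    "Big-data analytics": {"data", "analytics", "platform", "model"},
    "Citizen governance": {"citizen", "governance", "public", "policy"},
    "Urban analytics": {"data", "analytics", "policy", "citizen", "service"},
}


def jaccard(a, b):
    """Share of keywords two topics have in common (assumed similarity measure)."""
    return len(a & b) / len(a | b)


def top_k_links(keywords, k=3):
    """Link every topic to its k most similar topics; edges are undirected."""
    edges = set()
    for topic, words in keywords.items():
        others = [t for t in keywords if t != topic]
        # Sort by descending similarity, breaking ties alphabetically.
        ranked = sorted(others, key=lambda t: (-jaccard(words, keywords[t]), t))
        for other in ranked[:k]:
            edges.add(frozenset((topic, other)))
    return edges


def degree_centrality(keywords, edges):
    """Degree of each topic divided by the maximum possible degree (n - 1)."""
    degree = {topic: 0 for topic in keywords}
    for edge in edges:
        for topic in edge:
            degree[topic] += 1
    n = len(keywords)
    return {topic: d / (n - 1) for topic, d in degree.items()}


edges = top_k_links(topic_keywords)
centrality = degree_centrality(topic_keywords, edges)
for topic, score in sorted(centrality.items(), key=lambda kv: (-kv[1], kv[0])):
    print(f"{topic}: {score:.2f}")
```

Because a topic can also be chosen by other topics, its degree may exceed three, which is how a few topics can end up linked to almost all others.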

Mobility and transportation studies
One of the unexpected findings is a relatively weak connection between sustainable mobility and transportation-related topics. The studies on mobility were grouped in the same cluster as the subject of sustainability (3.3). Some of the studies indicate that the ultimate goals of sustainable and smart transport are not different but highly interdependent (Bamwesigye and Hlavackova, 2019; Lyons, 2018), or that smart mobility could be understood as a tool for realizing sustainable transport (Zawieska and Pieriegud, 2018). The concept of mobility is somewhat comprehensive and represents diverse aspects of transport issues depending on the study context. Some studies used the term for individual-level transport behavior (Miyazawa et al., 2019; Pappalardo and Simini, 2018; Semanjski and Gautama, 2015), while in other studies it is more closely related to public transport or the overall transport system of a city (Garau et al., 2016; Peprah et al., 2019). Although a broader concept of mobility seems to include tech-driven transport services such as Connected vehicles (2.4), Traffic management (2.7), EV-charging infrastructure (2.9), or Parking management (2.11), Fig. 2 shows a clear demarcation between policy-oriented and tech-driven research fields. The four transport-related clusters (2.4, 2.7, 2.9, and 2.11) constitute a larger cluster network within Service, while the Sustainability and mobility cluster (3.3) maintains a strong connection with Policy-relevant clusters such as Citizen governance (3.1), Planning and development (3.2), and Human capital (3.4). In particular, studies of EV-charging infrastructure and Parking management have addressed spatial analysis and location optimization problems, but there seems to be no direct connection with planning and development. Our finding may reflect long-lasting gaps between social science and engineering studies. Regarding education in transportation planning, there is an increasing trend of offering education that makes multidisciplinary connections between traditional technical and soft skills for handling a wide range of conflicting problems and demands (Handy et al., 2002). From a research perspective, however, we could not find any evidence of an increasing trend toward a multidisciplinary approach that connects both disciplines. Meanwhile, the finding also implies a current gap within smart mobility studies and a promising research subject in the near future.

Note: *, ** refer to the statistical significance of t-tests at the 5% and 1% level, respectively.

Missing fields and perspectives in smart city literature
Based on the systematic review of smart city literature, we identified 23 research fields that have been widely explored by smart city researchers. However, there remain missing and weaker fields in smart city studies. First, smart health care is a missing and underdeveloped field in smart city literature. Regardless of whether the main providers are in the public or private sector, health care is traditionally one of the core services of cities (Vandecasteele et al., 2019). With technological advancement, the provision of remote medical care services continues to grow; however, this topic is rarely found in smart city literature. In South Korea, smart health care is one of the key R&D service areas of the national smart city projects. Indeed, Busan Gimhae, one of the nation's pilot smart cities, is preparing to develop new towns that combine remote health care and community care services for the elderly. The recent outbreak of COVID-19 shows that public health research combined with smart technology is a key area for effectively responding to epidemics in smart cities (Kang et al., 2020). Therefore, more systematic research on health care services in the context of smart cities should be emphasized in the future.
Second, air quality management is one of the most important areas that has not been systematically studied in smart city literature. Air quality management is drawing attention as one of the most important urban policies not only in developing countries such as China, but also in developed countries in Europe, because better air quality could lead to healthier and safer lives for citizens (Vandecasteele et al., 2019; Wang and Hao, 2012). Reflecting this importance, many air quality monitoring systems and services with IoT sensors were exhibited at smart city exhibitions such as the Smart City Expo World Congress in Barcelona in 2019. While studies on Environmental sensing and monitoring (2.6) have included air quality monitoring, this work has not yet reached the level of effective management seen in other studies of smart city services that focus on "management." This might be because rigorous academic research is extremely difficult given the lack of available high-resolution air pollution data. Specifically, air pollution monitoring and modeling in cities is only possible when linkages between macroscopic meteorological models, urban microclimate models, and sufficient empirical data are secured. Since data from IoT meteorological and air pollution sensors are gradually accumulating, more systematic research and development in air quality management in the context of smart cities should be pursued in the future.
Lastly, there is less emphasis on the role of public sectors and local communities in smart city studies. Public sectors such as local governments and public enterprises have played a key role in the provision of services in cities. The identified topics in Smart city services include many public domains such as energy, traffic, water, and waste management. Despite the importance of public services in cities, there is a lack of emphasis on the public in smart city literature. In fact, the only research topic with the keyword "public" is Citizen governance (3.1). Smart city research led by private industry does not automatically guarantee the sustainable development of smart cities. For instance, studies of smart parking management, traffic management, and electrified vehicles could promote more convenient auto-dependent cities, which are widely known to be less sustainable. Although Citizen governance (3.1) includes "social" and "innovation" as main keywords, consideration of the role of communities, such as community-based air monitoring and renewable energy communities, is also lacking in smart city studies. Therefore, more research related to public services and local communities needs to be conducted with reference to all research topics. For example, future studies on Smart mobility regarding personal mobility vehicles, such as e-scooter sharing services that address first- and last-mile problems in public transit, could contribute to sustainable and smart urban development. Considering the provision of services in cities, the importance of the role of public sectors and public service development for smart cities cannot be overstated.

Conclusion
This study is meaningful in applying text mining techniques to a literature review to facilitate the understanding of the overall composition and trends of smart city research. The smart city, which has been presented theoretically and conceptually in social science literature, is now growing to become a new academic and industrial field that combines technologies and services as well as policy discussions from various disciplines. Based on the findings of this study, we define the smart city as a city that provides various smart services using ICT under smart policies to solve urban challenges and improve the quality of life of citizens. Specifically, wireless networks, big-data analytics, and IoT are core technologies of smart cities, and energy and environmental management are key application areas of smart city services. Smart city policies focus on ensuring citizen governance and sustainability goals.
Our work advances the understanding of smart cities by mapping dispersed knowledge from the scientific literature to achieve a more systemized and integrated conceptualization. We believe our work is important because there is ongoing intense discussion among researchers and practitioners regarding the importance of smart cities for the prosperity of humankind. Our work can provide clarification and establish a foundation for this discussion by expanding the multidisciplinary understanding of smart city research fields. Further, methodologically, to the best of our knowledge, this study is one of the first to apply machine learning to the data-driven review of urban planning literature. We took a different approach and developed an understanding of smart cities based on the text mining of a vast number of related documents. In fact, as shown in the previous sections, our approach adds to the traditional approach rather than subtracting from or conflicting with it; the findings of our work are consistent with the information that we can glean by reading existing studies. This implies that the findings of our semi-automated analysis of textual big data make sense, naturally and clearly reflecting the existing structure of smart city literature. Our approach can be used in other conceptual foundation studies in the future. For example, we believe a machine learning approach can be effective in "clarifying the concept of the sustainable city," "comparing the concepts of the smart city and sustainable city," or "analyzing the evolution of urban studies."
Nonetheless, our work has several limitations that can be addressed in future research. First, our findings are data-dependent, which means the results might change if the data source or time frame for data collection changes. Although we used all available data from the Web of Science Core Collection databases, the scope and source of data could be expanded to other databases (e.g., Scopus) or reduced to specific domains (e.g., urban planning journals). We collected data on the literature up to April 8, 2020; data in April 2022 would undoubtedly show an extended list of research topics and factors of smart cities. Further, as this study analyzes studies that include the words "smart city," it may not include studies that explain smart cities with other concepts such as "smart urbanism" or "smart citizenship." Depending on the keywords used in databases, the outcomes could vary. This study's approach is therefore meaningful at least in showing the structured characteristics of a research field at a particular time point, although it is difficult to generalize outcomes derived from a database query based on a specific keyword at a specific time point. Second, our work did not provide detailed information on each research topic or factor of smart cities. For instance, this study does not clearly identify public services for citizens and businesses, such as issuing permits, getting certificates, and automatic notifications, even though ICT-based e-government is one of the key elements of smart cities (Caragliu et al., 2011; Yeh et al., 2017). Considering the keywords in Table 1 and the relevant studies in the Appendix, Service management may partially include these types of public services, but there is a limit to showing the overall structure and characteristics of smart city-based public services. Therefore, a systematic review of existing literature on a specific topic may be valuable in describing a particular aspect in more detail, such as reviews focused on public services for citizens and businesses, and on mobility in smart cities. Finally, our findings obtained through text mining should be integrated with real research and development projects related to smart cities through empirical studies. An integration of this text-mining based literature review with such practical project cases will further facilitate an in-depth understanding and promotion of smart cities.
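The keyword dependence noted among the limitations can be illustrated with a minimal sketch that compares how many records a narrow query and a broader query retrieve. The records and the helper `select` are hypothetical and purely illustrative; they do not reproduce the Web of Science query used in this study.

```python
# Hypothetical bibliographic records standing in for database search results.
records = [
    {"title": "Smart city services and IoT platforms"},
    {"title": "Smart urbanism and the politics of data"},
    {"title": "Smart citizenship in participatory planning"},
    {"title": "Traffic flow prediction with neural networks"},
    {"title": "Governance of smart cities in Europe"},
]

# Narrow query: only the "smart city" wording (singular and plural).
narrow_terms = ("smart city", "smart cities")
# Broader query: adds related concepts mentioned in the limitations.
broad_terms = narrow_terms + ("smart urbanism", "smart citizenship")


def select(records, terms):
    """Return the records whose title contains at least one search term."""
    return [
        record
        for record in records
        if any(term in record["title"].lower() for term in terms)
    ]


narrow = select(records, narrow_terms)
broad = select(records, broad_terms)
print(f"narrow query: {len(narrow)} records, broad query: {len(broad)} records")
```

Any topic model fitted to the broader corpus would draw on documents that the narrow query never sees, which is why the identified topics may shift with the search terms.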

[…] studies of Traffic and parking management service systems are developed.

3.2. Trends in smart city literature over two decades
An interesting finding from the trend in smart city literature is that smart city concepts and research began in theoretical and conceptual discussions in policy disciplines and extended to more technology-oriented disciplines. In fact, the first 6 of the 8 articles published before 2011, and 17 of the 38 studies published by 2012, were classified in the Smart City Policies category, while a greater proportion of papers are currently being published under the Smart City Technology and Service categories. Since 2014, more than 100 studies in the smart city area have been published, and approximately 76% of such research papers have been published in journals indexed in the SCIE database, while 24% have been published in journals in the SSCI. As a reference, the journals in which smart city studies are most frequently published are summarized in Table 2. Institute of Electrical and Electronics Engineers (IEEE)-related journals are listed in the Smart City Technologies and Services categories, while social science journals such as Sustainability and Cities are listed in the Smart City Policies category. Sensors, IEEE Access, Future Generation Computer Systems, and Sustainable Cities and Society are listed in all three research categories. The trends of smart city literature also show the evolution of research topics over two decades, as shown in Fig. 5. The first research article in the Smart City Technology category was by Chen et al. (2010), who presented big data technologies for managing and utilizing visual data with digital watermarking techniques. Since this study, "big data analytics" has become one of the fastest-growing research areas in smart city technologies, along with "cloud computing." By 2015, Mobile crowdsensing was the more frequently studied technology, but its share in smart city technologies is now decreasing, while IoT and Security and privacy have maintained their share. From 2016, Machine learning emerged as a new research field and has expanded its share in Smart City

Fig. 5. Trends in research topics of smart cities.

Table 1
23 research topics and their keywords

Table 2
List of the most frequently published journals of smart city studies

Table 3
Research topics by regions

Research topics / Review of representative articles

1.1. Wireless network (218)
− Qiu et al. (2018) introduced a system architecture of heterogeneous IoT and its application fields, such as smart homes, intelligent transportation systems, and security systems, as well as smart health care systems.
− Han et al. (2017) conducted a study on 5G cell-less communication networks and the probability of their coverage, energy efficiency, and future challenges as mobile terminals in smart cities.
− Magno et al. (2014) proposed technology for a wake-up radio receiver with a smart power unit that could improve the energy efficiency of wireless sensor networks.

1.2. Big data analytics (208)
− Hashem et al. (2016) reviewed the communication technologies and smart city applications, then proposed the vision and business model of big-data analytics in smart cities.
− Osman (2019) presented a conceptual framework of big-data analytics in smart cities called "Smart City Data Analytics Panel," based on a synthetic literature review.

1.3. Internet of Things (204)
− Zanella et al. (2014) provided a comprehensive review of communication technologies of an urban IoT system for smart cities, and introduced the case of the Padova Smart City, Italy.
− Lin et al. (2017) conducted a systematic review of IoT technologies in terms of system architecture, security, and privacy, focusing on fog/edge computing-based IoT.

1.4. Security and privacy (162)
− Zhang et al. (2017) reviewed the privacy and security challenges in smart city applications and suggested future research directions.
− Sookhak et al. (2018) overviewed the security and privacy issues in smart cities and suggested the security requirements of smart city applications.

1.5. Cloud computing (131)
− Masip-Bruin et al. (2016) introduced a layered fog-to-cloud architecture and suggested its advantages and research challenges.
− Naranjo et al. (2019) presented a fog-supported smart city network architecture and its advantages in energy usage.

1.6. Machine learning (109)
− Mohammadi and Al-Fuqaha (2018) argued that much of the data generated in smart cities is wasted, and proposed a semi-supervised learning framework with a deep reinforcement learning model to ensure the usability of smart city big data.
− Diro and Chilamkurti (2018) compared the performance of a deep learning model to that of a conventional machine learning model in detecting cyber-attacks in a social IoT system.

1.7. Mobile crowdsensing (84)
− Cardone et al. (2013) reviewed socio-technical challenges of mobile crowdsensing and proposed an Android-based crowdsensing platform.
− Abualsaud et al. (2018) comprehensively reviewed the challenges and opportunities of mobile crowdsensing and discussed the ICT infrastructure to support this crowdsensing.

2.1. Models and applications (438)
− Leydesdorff and Deakin (2011) proposed the triple-helix model, by which the interactions between university, industry, and government contribute to a knowledge-based economy, to explain the process of reconstruction from traditional cities to smart cities.
− Bellini et al. (2017) suggested methodological approaches for utilizing Wi-Fi based sensing to understand and analyze city users' behaviors in smart cities, focusing on the localization technology.
− Yang and Lee (2019) suggested applications to improve the accuracy of an automated 3-D city model by analyzing the combined information from aerial photographs and terrestrial instruments.

2.2. System architecture (236)
− Mora-Mora et al. (2015) presented a system architecture for collecting and representing pedestrian flow information based on a smart sensor network with RFID communication technologies.
− Chifor et al. (2017) proposed a system architecture and the implementation of solutions for utilizing social network platforms to connect citizens with smart objects.
− Pribyl et al. (2018) proposed a decentralized intelligence system for smart cities based on the Intelligent Transport System architecture.

2.3. Energy management (158)
− Moreno et al. (2014) introduced an energy-efficient building management system with IoT and its performance for energy saving and consumption.
− Mosannenzadeh et al. (2017) proposed a theoretical and practical framework of a smart energy city based on a literature review and interviews with experts.

2.4. Connected vehicles (115)
− Saleem et al. (2019) introduced the routing scheme of Mobile Ad-hoc Networks in which Vehicle-to-Vehicle data transmission is possible.
− Wan et al. (2019) proposed a vehicle mobile IoT coverage enhancement algorithm to ensure effective network communication among in-vehicle sensors.

2.5. Service management (105)
− Kuk and Janssen (2011) explored the relationship between business models and information architecture in implementing smart city services based on the case study of two Dutch cities.
− Lytras and Visvizi (2018) focused on citizens' awareness of smart city applications and their ability to use the applications, then showed that even skilled users have various concerns, such as safety and efficiency, while utilizing smart city services.
− Yeh (2017) analyzed citizens' willingness to accept and use ICT-based smart city services in Taiwan, then showed that frequent use of smart city services can improve the quality of life of the users.

2.6. Environmental sensing and monitoring (105)
− Kotsev et al. (2016) overviewed the AirSensEUR project, which is an open platform for monitoring air quality and pollution using low-cost sensors.
− Mydlarz et al. (2017) proposed smart and low-cost noise monitoring sensor systems with an emphasis on the calibration of a microphone to acquire reliable noise information.
− Venkatanarayanan et al. (2019) suggested a smart sensing system for collecting real-time air pollution and health data of bike users.

2.7. Traffic management (94)
− Galán-García et al. (2014) presented an accelerated-time simulation of traffic flow for smart traffic control based on cellular automata and neural network approaches.
− Younis and Moayeri (2017) proposed a dynamic traffic light control system by which a traffic light schedule can be adjusted considering the traffic conditions.

2.8. Video surveillance (57)
− Calavia et al. (2012) proposed an intelligent video surveillance system to identify abnormal object movement as a safety and security solution for smart cities.
− Sultana and Wahid (2019) introduced a video surveillance system with an Internet of Video Things (IoVT) framework, and presented a system architecture of the IoVT surveillance system.

2.9. Electric vehicle charging infrastructure (47)
− Lam et al. (2014) identified an electric vehicle charging station placement problem and evaluated the performance of possible solutions based on various simulations.
− Chaudhari et al. (2017) proposed a hybrid optimization algorithm of energy storage management for the photovoltaic (PV)-integrated electric vehicle charging station considering the price of electricity, PV power, and operation cost.

2.10. Water management (37)
− Lee et al. (2015) provided a comprehensive review of smart water grid technologies and introduced smart water management platforms.
− Chen and Han (2018) presented a multi-parameter water quality monitoring system that provides real-time water quality data based on the case of the Bristol Floating Harbour.

2.11. Parking management (35)
− Bagula et al. (2015) introduced a smart parking infrastructure and proposed a multi-objective optimal sensor placement model in terms of coverage, lifetime, and cost.
− Vlahogianni et al. (2016) proposed a methodological framework of parking availability based on the actual parking sensing data of the city of Santander, Spain.

2.12. Waste management (24)
− Anagnostopoulos et al. (2015) proposed an algorithm for an efficient and scalable waste collection system for the areas requiring immediate collection.
− Laso et al. (2019) compared conventional door-to-door collection with pneumatic collection based on life cycle assessment, focusing on biodegradable waste.

3.1. Citizen governance (340)
− Vanolo (2014) reviewed the concept of smart city based on the European Union cases with an emphasis on the role of private sectors and citizens in managing urban development.
− Albino et al. (2015) provided an overview of the definition, dimensions, performances, and initiatives of smart cities, and noted that smart city concepts emphasize not only ICTs but also "qualities of people and communities" (p. 18).
− Thomas et al. (2016) investigated the responses of citizens to the concept of smart cities based on a survey conducted in London, Manchester, and Glasgow.

3.2. Planning and development (223)
− Caragliu et al. (2011) reviewed the concept of smart city and suggested factors for evaluating the performance of smart cities in Europe using the 2004 Urban Audit data set.
− Batty et al. (2012) discussed the goals, research challenges, scenarios, and project areas of smart cities. The seven project areas included smart city database, sensing, networking, mobility, urban land use, and so on.
− March and Ribera-Fumaz (2016) criticized smart city strategies and projects in Barcelona, where environmental concerns were politically utilized to develop the smart city initiatives.

3.3. Sustainability and mobility (149)
− Ahvenniemi et al. (2017) reviewed the cases of assessment frameworks for urban sustainability and smart cities, and found that in the frameworks of smart cities, less emphasis had been placed on environmental aspects.
− Lyons (2018) explored how smart urban mobility in smart cities is related to the concept of sustainability, and defined smart urban mobility as connectivity within sustainable cities.
− Bibri and Krogstie (2019) argued that the sustainability performance of cities is monitored, analyzed, and improved using ICT, and proposed a conceptual framework for data-driven smart sustainable cities.

3.4. Human capital (36)
− Shapiro (2006) considered a smart city as the place where human capital results in employment growth not only through knowledge spillover but also through quality of life considering consumption amenities.
− Winters (2011) considered a smart city as a center of higher education, then showed that students who complete their education and stay in the city, and in-migrants within a state, are the main contributors to the smart city.
− Dameri and Ricciardi (2015) suggested a smart city intellectual capital (SC-IC) framework with six capital concepts (human; social; institutional; process; renewal; environmental), and proposed five expected outcomes of a smart city (value creation, competitiveness, resilience, sustainability, quality of life).

Zanella, A., Bui, N., Castellani, A., Vangelista, L., Zorzi, M., 2014. Internet of things for smart cities. IEEE Internet Things J. 1 (1), 22-32.
Zawieska, J., Pieriegud, J., 2018. Smart city as a tool for sustainable mobility and transport decarbonisation. Transport Policy 63, 39-50.
Zhang, K., Ni, J., Yang, K., Liang, X., Ren, J., Shen, X.S., 2017. Security and privacy in smart city applications: Challenges and solutions. IEEE Commun. Mag. 55 (1), 122-129.

Dr. Chiehyeon Lim is an Associate Professor in the Department of Industrial Engineering and the Graduate School of Artificial Intelligence at UNIST. His research interests include smart service systems, service systems engineering, and knowledge discovery with data mining.

Dr. Gi-Hyoug Cho is an Associate Professor in the Department of Urban and Environmental Engineering at UNIST. He conducts research on pedestrian safety, travel behaviors, and applications of smart city technology in the urban planning process. His research has been published in Accident Analysis & Prevention, Urban Studies, Cities, and Journal of Transport Geography, among others.

Dr. Jeongseob Kim is an Associate Professor in the Department of Urban and Environmental Engineering at UNIST. His research interests lie in housing, neighborhood change, and smart cities with an emphasis on applications of urban data analytics. Dr. Kim earned his doctoral degree in Design, Construction and Planning at the University of Florida and served as a planner and consultant for various institutions, such as the Inter-American Development Bank and Daegu Metropolitan Council.