International Journal of Applied Earth Observation and Geoinformation

are aligning with the commonplace idea that the main advantage of using DTs is economic as, for example, DTs can improve the planning of activities, thus saving money and time. But how can they be useful for a city? Instead of looking at DTs as solutions in search of problems to be solved, we start from city needs. Our approach is two-fold. We start by briefly reviewing existing possibilities for meeting some specific needs, but keep the focus on identifying and attempting to close the gap between the needs arising from everyday city functions and the latest DT techniques useful for meeting those needs. DTs are technically different and serve different applications, yet they share a common identity and name, as well as several technical similarities. Adopting computer science terminology, we see a back-end city DT as the container of all information, while any single front-end, visualized or used either by humans or robots, offers a limited but meaningful representation of the DT for a specific application. Alas, there are multiple open questions regarding the realization and benefits of such a back-end DT. Nevertheless, we discuss how the back-end DT (or any specific DT) could be updated autonomously from sensor data using artificial intelligence techniques, and how the front-ends could be used to bring large benefits to the entire city ecosystem.


Introduction
Cities host over 74% of the population in Europe (WorldBank, 2022). As the population increases in cities, problems emerge, but so do solutions (Caragliu et al., 2013). A city with human capital, if well managed, may become an exporter of solutions, a so-called innovation machine (Florida et al., 2017). This is desirable since innovations bring benefits to the economy and the quality of life of citizens (Lehtola and Ståhle, 2014). However, it is not self-evident how such a machine can be put together by the city management. For city management, the thing to desire is not actually a machine, but an ecosystem. There is a promise of increased economic efficiency if all players (private companies and public parties) organize themselves in a good manner (Calzada, 2020). Because such a system is so complex that central management would slow down its dynamics, the solution is an ecosystem that rewards participants for proper
Fig. 1. Digital twins differ depending on the context. In manufacturing, the DT is replicated into products, while in construction the work site is first surveyed and adapted to match the constructed building. Manufacturing and construction start from the DT and end up with the physical twin, using a linear timeline, while maintenance and smart city management are cyclic processes. Both twins are changed recursively. Finally, when a building is maintained as a whole, only a part of a city (and its DT) is considered to be of interest for a particular operation. In a city, different parts of the city obtain attention with different frequency and intensity.
ready and the production is started. In construction, Building Information Modeling (BIM) is the closest match to DTs (Greif et al., 2020; Deng et al., 2021). BIM models are high fidelity three-dimensional (3D) construction plans that sometimes need to be adjusted to fit in the landscape, for example, when single form-work concrete bridges are built.
Facilities management (Wong et al., 2018; Matarneh et al., 2019), industrial maintenance (Errandonea et al., 2020), and smart city applications require updating of the DT (Ketzler et al., 2020; Farsi et al., 2020). This requires the process to be cyclic, see Fig. 1.
In contrast to previous DTs, see Fig. 1, a holistic city DT must support different local levels of fidelity. Buildings in a city can be from different centuries, and need to be scanned before they can be added to the DT. This, and the updates, likely happen only as a side product of other activities, meaning that the granularity level of the DT varies internally between areas, themes, etc. If a construction or maintenance activity is taking place in a city district, that area is scanned in detail, providing a local high fidelity update. The rest of the city DT is not affected and retains its previous level of fidelity. Nonetheless, a city DT would follow the original idea in manufacturing that all of the planning is done digitally and in a jointly shared model. A city DT must also be approached from the perspective of human factors, if the DT is to serve the ecosystem orchestration. A city DT is not only about the properties of a computer model but also about how the everyday services in the city are organized to perform all planning activities in a joint digital model. Technocracy must be balanced with democracy, if stakeholders are to be truly included (D'Hauwers et al., 2021). Thus, if something is changed in an existing city, the digital planning should answer the question of what the consequences are for the city dynamics. This digital planning should also be available to third parties, to test new services and changes to city plans.
City management benefits from specific digital twin techniques, for example, in the cities of Helsinki (Ruohomäki et al., 2018), Zurich (Schrotter and Hürzeler, 2020), and Vienna (Lehner and Dorffner, 2020). These DTs share a common identity under the DT umbrella (Van der Valk et al., 2020), but they are technically different and serve different applications. We could argue that they are, in fact, manifestations of developing DTs residing between the maintenance DTs and the city DTs that answer more fully to the multitude of city needs.
We ask: how can DTs be useful in answering the city needs? What should a digital twin of a city contain, and how should it be updated (cyclicity in Fig. 1) and offered to serve a dynamic ecosystem and thereby enhance the efficiency of that ecosystem? We argue that a digital twin of a city would be the pinnacle of digitization of city assets and services if it consisted of the following four parts.
1. The DT must be designed to address the needs of the city (Section 2).
2. Support for high fidelity content must be offered, including mature BIM information, but also support for low fidelity content and local differences in fidelity (Section 3).
3. Updating the DT of a city is of utmost importance, as the city DT is never complete, because a city is constantly changing. We review the literature not only directly under the DT umbrella, but also look for methods that could serve the automatic updating of city DTs. Autonomous updating from sensor systems includes Internet of Things (IoT) sensor networks, drones, and robotic cars, but also data from professional surveying (Section 4).
4. The benefit of interacting with DTs can be ensured only by safe and usable systems that enable agents to visualize and share information appropriately to enhance decision making. A human factors perspective is essential to identify the potential advantages and the future use of DT systems to support city decision making (Section 5).
Should these four items be met, we see benefits for autonomous and human operated systems in various forms including asset management and planning of activities and robotic services (Section 6).

City needs
The needs of a city follow from what the city is and what the city does. We consider a city to be an entity that is managed in terms of city planning and governance (Beall and Fox, 2009) to provide good living environments for the inhabitants. In Europe, this management role typically falls to a democratically elected city council or equivalent, which brings forward a plurality of needs from the civil society it represents. Hence, the needs of a city are not constant; they change. Furthermore, cities need to react to both internal and external pressures demanding change. For example, the United Nations Sustainable Development Goals (SDGs) were formulated rather recently, in 2015. Yet, some cities are facing more traditional challenges, as a rapidly growing number of citizens and limited land availability lead to vertical growth (Chen et al., 2008). Current urban planning and design practices and operating procedures favor sectorial rather than systematic developments with synchronized data planning and exchange (Klyukin et al., 2018). This often results in conflicting situations that lead to wasted investments (OECD, 2020).
We consider the following city needs, typical for European cities. The cities need
• City planning and urban development to obtain good living environments. The development needs to be sustainable (SDG 11, United Nations). The city's digital twin can help here in multiple ways (Qian et al., 2022). These include zoning and municipal development, high rise planning, climate simulation and use in visualization for architectural competitions and participatory planning activities, to mention some examples reported for the city of Zürich (Schrotter and Hürzeler, 2020).
• Primary and secondary education, arranged jointly with city planning, avoiding socio-economic segregation and problems (e.g. Renzulli and Evans, 2005).
• Healthcare and emergency services arranged so that they are available and adequate (Ahmadi-Assalemi et al., 2020).
• Infrastructure to be constructed and properly maintained (roads, water, sewage, energy). DT is a viable solution in, for example, predictive maintenance of roads (Sofia et al., 2020), simulations and analyses of traffic (Rudskoy et al., 2021), energy (Tzanis et al., 2020), but also water and waste and telecommunications (Callcut et al., 2021).
• Taxes to fund these activities. Fostering the growth of economic activities, including innovations (Florida et al., 2017), leads to increases in city tax revenues and further investments in bettering the lives of the citizens. Digital twins and data analytics can enable the design and development of new commercial activities and services based on data (see Section 6). Benefits are reachable if challenges can be met, standardization being one of them (see Section 3).
There is a gap between what can theoretically be done with DTs and what is done with DTs in practice (Kar et al., 2019). In order to see the actual state of progress with city DTs in Europe, we look at a particular area in the Helsinki metropolitan area, Espoo, the most sustainable city of Europe in 2016-2017 (Zoeteman et al., 2016, 2017). Espoo has up-to-date 3D models for internal use, but to a certain extent 3D models can also be distributed to third parties via interfaces. 3D models have been used in urban planning in Espoo for a long time, but the integration of real-time data (e.g. on traffic, energy consumption, etc.) into the urban model is in the development phase. In Helsinki, the current city strategy emphasizes the role of digitalization as both a method to increase productivity in city services and a tool to facilitate better prediction and response to potential changes and crises, such as climate change (Helsinki, 2021a). The current city data strategy perceives the digital twin as primarily a tool for scenario analysis and simulation (Helsinki, 2021b). Research on the Helsinki example shows that solid data infrastructure forms the foundation for a successful urban digital twin (Hämäläinen, 2021). Solid data infrastructure builds on standards and the interoperability they enable, discussed in the next Section.

Information content of DT
When in digital form, our urban environment should replicate its essential properties, including those related to buildings, infrastructure, vegetation, terrain and other elements, but also offer ways to link information from various processes onto them. Determining the information content of an urban environment requires cognition. However, cognition is different for humans and computers (or robots). DTs can only be stored digitally in a way that is understood by the computer, with so-called representations. The 3D models used for DTs are usually created using point clouds and/or images as raw material. Therefore, we refer to them as data derivatives. These derivatives fall into five distinct categories, listed in Table 1.
Why are there so many different data derivatives? Because they are all useful. Point clouds are the natural output from sensor systems (Section 4.1). Voxels are the aggregation of point data into cubes with predefined spacing (e.g. 1 × 1 × 1 m), especially useful for physical simulations. They answer to city needs in, e.g., visibility analyses for road safety (Aleksandrov et al., 2019; Golub et al., 2018), but also traffic noise (Saran et al., 2018), solar radiation (Liang and Gong, 2017), and wind analyses (see Fig. 2b). However, the use of voxels is limited by their memory consumption, which forces a painful trade-off between keeping the details accurate and having a large scale needed for a city DT. Mesh surface models refer to, e.g., the triangulation of point clouds into surface models (Berger et al., 2017; Brédif et al., 2020), which are useful in topographic mapping (Vosselman and Maas, 2010). Finally, 3D models are end products for GIS-based (Section 3.1) or BIM-based (Section 3.2) DTs, or used in visualizing them (Section 5.2). City DTs encompass a lot of information, arguably more than the previous DTs. We shall see that city DTs have (therefore) naturally emerged as a continuation of two existing ecosystems related to 3D modeling: one founded on Geo-Information Systems (GIS) and the other founded on Building Information Modeling (BIM).
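The voxel aggregation and its memory trade-off described above can be illustrated with a minimal sketch. It assumes points given as plain (x, y, z) tuples in metres; the function names and the 10 km × 10 km × 100 m extent are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import floor


def voxelize(points, size=1.0):
    """Aggregate (x, y, z) points into cubic voxels with edge length `size`.

    Returns a Counter mapping integer voxel indices (i, j, k) to point counts.
    Only occupied voxels are stored (a sparse representation).
    """
    counts = Counter()
    for x, y, z in points:
        counts[(floor(x / size), floor(y / size), floor(z / size))] += 1
    return counts


def dense_grid_cells(extent_m, size):
    """Number of cells in a dense grid covering extent_m = (dx, dy, dz) metres."""
    dx, dy, dz = extent_m
    return round(dx / size) * round(dy / size) * round(dz / size)


# Four sample points; the first two fall into the same 1 m voxel.
points = [(0.2, 0.3, 0.1), (0.8, 0.9, 0.4), (1.5, 0.2, 0.0), (-0.1, 0.0, 2.7)]
occupied = voxelize(points, size=1.0)

# Hypothetical city extent at two voxel sizes (assumed numbers).
coarse = dense_grid_cells((10_000, 10_000, 100), 1.0)   # 1 m voxels
fine = dense_grid_cells((10_000, 10_000, 100), 0.1)     # 10 cm voxels
print(len(occupied), coarse, fine)
```

Reducing the voxel edge by a factor of ten multiplies the number of cells in a dense grid by a thousand, which is one way to read the trade-off between detail and city-scale coverage. Storing only occupied voxels, as the Counter does here, is one common way to soften it.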

GIS origins
Geo-information science has its roots in paper maps, but its modern version is focused on vector data and how to organize it. One of the first requirements for organized data management is to store the collected and used datasets in a relational database management system (DBMS) with a spatial extension, such as PostGIS or Oracle Spatial. Moreover, they can be stored in the so-called 3DCityDB (Rossknecht and Airaksinen, 2020) and DB4Geo (Breunig et al., 2010). The most widely supported data formats for 3D city digital twins are CityGML (Chen et al., 2020) and CityJSON (Nys et al., 2020). CityGML is an Open Geospatial Consortium (OGC) standard for multi-hierarchical geographical, topological and semantic representation (Brasebin et al., 2018), supported by widely used geo-spatial software such as ArcGIS and QGIS. CityJSON was proposed to OGC in 2021 as another standard and is expected to be accepted. Using CityGML, cities can in principle be visualized in five levels of detail (LoD0 to LoD4), starting from a simple building representation and increasing in complexity, up to incorporating even the indoor spaces in LoD4. Standards may be extended to enhance their applicability. For example, SimStadt developed by TU Stuttgart extends CityGML for energy simulation (Nouvel et al., 2017). In addition, Agugiaro et al. (2018) further developed an Energy Application Domain Extension (ADE) for the same standard, which on the one hand solves data interoperability issues among multisource energy-related applications, and on the other hand allows detailed single-building energy simulations and also city-wide energy assessments. Other applications include seismic vulnerability assessment (Catulo et al., 2018), noise mapping (Deng et al., 2016), and flood damage assessment (Amirebrahimi et al., 2016).
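To make the idea of levels of detail concrete, the following sketch reads a small CityJSON-like document and lists which city objects carry geometry at which LoD. The document is hand-written for illustration and has not been validated against the CityJSON schema; the object identifiers and helper names are hypothetical. It assumes that each geometry carries a "lod" member, read here as a string.

```python
import json
from collections import defaultdict

# Hand-written, minimal CityJSON-like document (illustrative only).
DOC = """
{
  "type": "CityJSON",
  "version": "1.1",
  "transform": {"scale": [0.001, 0.001, 0.001], "translate": [0, 0, 0]},
  "CityObjects": {
    "bldg-1": {"type": "Building",
               "geometry": [{"type": "MultiSurface", "lod": "1", "boundaries": []},
                            {"type": "Solid", "lod": "2", "boundaries": []}]},
    "bldg-2": {"type": "Building",
               "geometry": [{"type": "MultiSurface", "lod": "1", "boundaries": []}]},
    "road-1": {"type": "Road", "geometry": []}
  },
  "vertices": []
}
"""


def lods_per_object(city):
    """Map each city object id to the sorted list of LoDs of its geometries."""
    return {
        obj_id: sorted({str(g["lod"]) for g in obj.get("geometry", []) if "lod" in g})
        for obj_id, obj in city["CityObjects"].items()
    }


def objects_per_lod(city):
    """Invert the mapping: which objects are available at each LoD."""
    result = defaultdict(list)
    for obj_id, lods in lods_per_object(city).items():
        for lod in lods:
            result[lod].append(obj_id)
    return dict(result)


city = json.loads(DOC)
summary = objects_per_lod(city)
print(summary)
```

A summary of this kind shows the local differences in fidelity within one model: here only one of the two buildings is available at the higher level of detail, and the road has no geometry at all.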
Validation of models is crucial before those models are used in decision-making. However, there are issues, for example, in validating the geometrical relationships of 3D buildings and their real values (Ham and Kim, 2020). Another highly relevant example is the cadastre. Typically, land administration systems are the platforms where updates on ownership have to be registered in the form of rights, restrictions, and responsibilities (RRRs), and are managed by the national cadastral agencies (van Oosterom et al., 2020). Unfortunately, the majority of the current cadastral systems represent, visualize, store, and validate ownership data in 2D format (Shnaidman et al., 2019). However, successful efforts in the direction of a 3D cadastre have been made, for example, in the Netherlands, Sweden (Larsson et al., 2020), Finland (Krigsholm et al., 2020), China (Ying et al., 2015), and Australia (Rajabifard et al., 2018). The first 3D cadastre building in the Helsinki metropolitan area, Finland, was registered in 2020. A 3D cadastre combines 1) the legal model based on advanced policies and standards and 2) the physical model providing the spatial property registration (van Oosterom et al., 2018), linking different standards such as LADM (Land Administration Domain Model, ISO19152, 2012), 3D GIS (CityGML) or BIM with its IFC models (see Section 3.2). One of the main challenges has been that the principles and validation rules in the current 2D systems are not designed for examining 3D models (Karki et al., 2010). Here, the knowledge of how to do it largely exists (Thompson et al., 2019; Asghari et al., 2020), but the practice is lagging behind.

BIM origins
The second point of origin of an urban digital twin is recognized in Building Information Modeling (BIM). BIM models are detailed models of our built-up environment incorporating the geometries of the buildings, their spatial and topological relationships, and detailed information of their physical infrastructure. Consequently, as BIM includes a large amount of data combining physical and functional building information, it requires high level technical data storage and maintenance (Chen et al., 2018). Here, difficulties have been observed in data transfer since the BIM domain is still using proprietary data formats, workflows and software (Olfat et al., 2019), although open data formats, so-called OpenBIM, are gaining momentum. One of the most used data schemes for building representation, modeling and storage, supported by most BIM software, is Industry Foundation Classes (IFC) (Sun et al., 2019). The IFC standard supports data and model transferability and reproducibility. Considering city DTs, we see that open data formats would be of paramount importance to ensure wide adoption of intercommunicating technical solutions.
The BIM ecosystem inherently addresses certain city needs. In addition to the 3D information, which offers great benefits to civil engineers and architects, BIM models can be enriched with information about cost, maintenance and construction time (Fernández-Rodríguez et al., 2018), and be used to assess risks (Zou et al., 2017). Furthermore, they can be used for other benefits serving city needs such as cultural heritage management (Fadli and AlSaeed, 2019) and indoor navigation (Hamieh et al., 2020).
Validation of BIM models is done using model checkers (Sacks et al., 2017). Concerning the city, one concrete step in increasing automation related to model validation could be automatic granting of building permits based on IFC models, which is technically feasible (Noardo et al., 2020). The benefits for the city organization would be a reduction in the work hours needed to assess the building plans, and for the construction company it would mean shortened throughput times. There are also indirect beneficiaries from the faster construction processes, for example, the future residents. From a BIM model, different kinds of DTs for buildings may be created (Khajavi et al., 2019). For example, a building DT can be an IoT-integrated BIM (Pan and Zhang, 2021). In general, BIM models are developed towards DTs by including aspects such as real-time monitoring and simulation (Boje et al., 2020). Remarkably, the primary tenet of BIM does not require the BIM model itself to be updated in real-time to incorporate real-time sensory data (Bruno et al., 2018). In fact, although the well-established concepts of as-is BIM and Scan-to-BIM (Wang et al., 2019b) offer a systematic approach for updating the BIM model, this is only done at very few key milestones (e.g., after the construction) and thus at a very low frequency. These updates can also serve as one means of updating a city DT, but many other means exist as well (see Section 4). Finally, analogously with the GIS-based modeling, BIMs act as an attractive starting point for forming city DTs, although they are not sufficient per se.

Table 1
Data derivatives of captured (lidar) point clouds and RGB images.
Derivative | Raw data: Point clouds (a) | Raw data: Images (b) | Human creation (BIM, CAD)
Point clouds with color | X | X | -
Voxels | X | - | -
Mesh surface models | X | - | -
3D models (e.g. BIM, CityGML) | X | - | Planning
3D models (photorealistic or textured) | X | X | Art
Specific process related data | (Section 4) | (Section 4) | Reports
(a) (XYZ, Intensity), including the back-scattering intensity of a pulsed lidar.
(b) Images are 2D perspective projections of the 3D world in the visible light range yielding Red-Green-Blue (RGB) colors, e.g., to color point clouds.

Best of both worlds
The applicability of BIM models can be enriched by combining them with a 3D model of the environment obtained from GIS. For example, BIM models used in construction can be automatically transformed into CityGML (Donkers et al., 2016). Hence, GIS and BIM can be technically fused together, and the BIM models can be used to locally update a 3D city model, possibly also increasing the level of detail. One interesting example is offered by Lu et al. (2020), where the authors model 600 buildings, of which certain important buildings are represented using BIM. However, when working at a large city scale, computations may become intractable and data management issues arise (Chen et al., 2018). Also, BIM-GIS data conversion still faces challenges when huge data sets covering complete cities are processed (Olfat et al., 2019).
Combining GIS and BIM makes sense, given that the overall objective is to serve the city needs. Facilitating this development has been acknowledged in the development of the CityGML standard, where one of the most significant recent developments has been the CityGML 3.0 (Kutzner et al., 2020). Considering DTs, the mechanism for including dynamic sensor data is highly relevant.
The most established approach to realizing any urban DT appears to be the use of a 3D city information model as its starting point, see e.g. Schrotter and Hürzeler (2020). Understandably, the city model is focused on (and developed for) describing the state of the physical city environment. At the same time, there appears to be a widely shared understanding that the digital twin does not simply equal a 3D city model, but contains additional attributes and properties. The urban digital twin expands the scope towards modeling e.g. stakeholder relations, processes and handling simulation scenarios (Nguyen and Kolbe, 2021). Thus, the DT becomes, from the information content perspective, an extended, linked 3D city model. Amongst these added properties, at least the following have been mentioned:
1. Lifecycle management of individual city objects and assets;
2. Simulation use of the 3D city model to assess various scenarios; and
3. Linking the city model with real-time (sensor) data sources.
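The three added properties listed above can be pictured as a small data structure. The sketch below is only one possible reading, not a schema from CityGML or any other standard: the class names, field names, the sensor identifier and the toy what-if function are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SensorLink:
    """Link from a city object to one (real-time) sensor data source."""
    sensor_id: str
    quantity: str
    readings: list = field(default_factory=list)  # (timestamp, value) pairs

    def push(self, timestamp, value):
        self.readings.append((timestamp, value))

    def latest(self):
        return self.readings[-1][1] if self.readings else None


@dataclass
class CityObject:
    """A 3D city model object extended with lifecycle and sensor links."""
    object_id: str
    kind: str
    lifecycle: list = field(default_factory=list)  # (date, event) pairs
    sensors: dict = field(default_factory=dict)    # sensor_id -> SensorLink

    def log_event(self, date, event):
        self.lifecycle.append((date, event))

    def link_sensor(self, link):
        self.sensors[link.sensor_id] = link


def what_if(obj, quantity, delta):
    """Toy scenario: shift the latest value of each matching sensor by delta."""
    return {
        sid: link.latest() + delta
        for sid, link in obj.sensors.items()
        if link.quantity == quantity and link.latest() is not None
    }


road = CityObject("road-17", "Road")
road.log_event("2020-05-01", "resurfaced")          # 1. lifecycle management
counter = SensorLink("tc-3", "vehicles_per_hour")   # 3. sensor data link
counter.push("2021-03-01T08:00", 420)
counter.push("2021-03-01T09:00", 480)
road.link_sensor(counter)
scenario = what_if(road, "vehicles_per_hour", 100)  # 2. simulation scenario
print(scenario)
```

The geometry itself is deliberately left out: the point is that the extra attributes hang off the same object identifier that the 3D city model already uses.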
Maturity level assessment developed for BIM models may be applied when urban DTs emerge from that direction. Previous research shows that there are, in fact, multiple BIM maturity levels (Siebelink et al., 2018). This means that even if a BIM model has lifecycle information (cyclicity in Fig. 1), there can be a great variance in (i) what dimensions (e.g. time, money, maintenance) the model covers, (ii) how detailed the information on these individual dimensions is, and (iii) how this information is used in different processes. This variance has created a need to assess the maturity of the BIM systems, in order to identify their stage of development. Currently, one of the most advanced BIM maturity models is the Nordic BIM maturity model (Nordic BIM, 2020). However, a maturity indicator still missing from the assessment is that of automated updates. This would be greatly useful for city DTs. Constant updating of the contents of a city-scale model is an immense task, even if full GIS-BIM integration is reached standard-wise. Doing it manually is unimaginable. Therefore, automated updating of models from sensor data is needed.

Updating DTs with sensors and AI
Large scale 3D city models can serve a variety of city needs. Benefits may already be reaped related to thermal simulations (Muñoz et al., 2019), urban ventilation analyses (Luo et al., 2017) and solar potential estimations (Machete et al., 2018). However, the limiting factor in doing this (and more) is that these models need to be created with so-called procedural modeling techniques, requiring manual work steps.
Significant resources and time are needed for updating DTs (Bshouty et al., 2020). Therefore, to reduce the cost of these efforts, focus is needed on automating the processing of data into derivatives, see Table 1. The deployment of new sensor systems to supplement the old ones offers various ways to acquire data on the state of physical assets (Section 4.1). It is not self-evident how to turn this big data into useful information that can be used to automatically update city DTs (Section 4.2), but we argue that the new artificial intelligence (AI) algorithms can enable ways to do this (Section 4.3).

Sensor systems
Sensor systems, both airborne and ground-based, are used to capture geo-referenced data from the real world, see Fig. 3. These systems can be categorized as in Table 2, where the main characteristics are also listed. This shows that sensor systems are both overlapping and complementary in various ways. Airborne systems, including unmanned aerial vehicles (UAVs), may have lidar and imaging systems that output point clouds, orthophotos, and oblique imagery (Vosselman and Maas, 2010). Terrestrial systems include static systems such as camera networks and traffic counters, and also mobile systems such as individual car perception systems, as with Tesla, and professional car-mounted lidar mappers. The output of these systems is point clouds and images from which data derivatives can be obtained, see Table 1. Lidars have already proven useful for BIM purposes (see e.g. Liu et al., 2021), as they offer geometrically more accurate measurements than imaging systems.
Capturing indoor data is fundamentally different from capturing outdoor data, because GNSS signals do not penetrate building roofs or walls. The consequential extra effort to process and geo-reference indoor data makes indoor modeling much harder and explains the gap between the abundance of LoD3 models and the scarceness of LoD4 models (see Section 3). In other words, LoD4 models are expected to contain details about the indoor spaces and objects, but as this information cannot be obtained with the same methods as for LoD3 models, a persistent gap has emerged between these two model types. Indoor mobile mapping systems combine both lidar and camera data to simultaneously position themselves and output point clouds and imagery (Lehtola et al., 2017). This technique is called simultaneous localization and mapping (SLAM) and it is needed to replace the missing GNSS signals. Finally, indoor data is converted into 3D models, which can be either in BIM or CityGML format.
IoT networks may be formed from sensors that are installed permanently in the urban environment and connected to the internet. Communication between consumer and sensor can either be direct or via software systems (Jacoby and Usländer, 2020). Sensor networks are used for a variety of tasks, of which data acquisition, human-sensor interaction, knowledge discovery and generation, and intelligent control are the most related to DTs. Data acquisition is broad in terms of what can be acquired; most of it relates to the real-time capturing of phenomena such as traffic and air quality. Results of these data captures are shown to humans or go into further data processing within the DT environment. In current practice, many of the sensor networks are connected only within a certain domain. In an ideal DT environment, the different networks communicate with each other by sharing data and insights (Ivanov et al., 2020).
Cities also have underground utilities. Monitoring and surveying above-ground infrastructure is different from retaining an accurate representation of the underground assets and related processes (Delmastro et al., 2016). However, underground utility surveys can be successfully used to update 3D data models (Yan et al., 2021). Also, nothing prevents extending the BIM-GIS integration or the IFC and CityGML schemas (Section 3) to cover underground facilities as well (Wang et al., 2019a). The focus with underground facilities is more on predicting their maintenance needs than on surveying them frequently, as they are protected from the effects of weather.

Autonomous updating of city DTs
Fig. 3. [...] (Tomljenovic et al., 2015). Drones equipped with cameras or lidars can also be used to acquire data (Image: Flaticon.com). (c) Laser scanning backpack for measuring for indoor 3D model, e.g. BIM, generation (Karam et al., 2021). (Color online).
Table 2
Sensor systems and their strengths colored in bold green.
A digital twin represents a real scene in a digital way, and uses several types of data, such as historical data, real-time data, and algorithm models to simulate, verify, predict, and control the entire life cycle of the same real scene (Lv and Xie, 2021). This is our context, while the focus is on updating city DTs automatically from sensor data. It is good to mention that in our context the data acquisition is usually not autonomous, although autonomous systems can also be used as data sources. Autonomous in this context refers to data processing without human intervention. Sensor data, i.e., the new data, comes in the form of point clouds and imagery. The flow of information from raw data to high-level decision making can be sensor-to-sensor, sensor-to-model, and model-to-model fusion (Liu et al., 2018). Data of the same type can be straightforwardly fused together. For example, point clouds can be co-registered and re-sampled, even if the different mobile scanning systems yield data of different quality (Lehtola et al., 2017). Model-to-model matching also serves as a straightforward way to fuse BIMs (Tran et al., 2019). Updating city DTs from point clouds, however, means fusing data with different modalities. This is where difficulties arise. Consider a BIM model consisting of elements, as in Fig. 4. The elements of such a model can be parametrized and then fitted on top of a point cloud, giving a limited way to automatically update a BIM model (Rausch and Haas, 2021). This technique is limited to resizing and moving the existing BIM elements, and cannot deal with the removal of elements or the addition of new ones (e.g. walls, windows, pillars). The same issue is present with image data (Zhang and Lippoldt, 2019) and when updating large-scale 3D city models (Bonczak and Kontokosta, 2019). Hence, difficulties arise because real world data from the sensors is messier than e.g. planned and validated models. Straightforward geometry matching typically fails because it is highly susceptible to sensor noise and co-registration assumptions, and cannot respond to changes in a dynamic scene. We can say that monitoring even a normal urban environment poses insurmountable challenges for these traditional techniques, because humans move and interact with different objects as part of their normal life, and these humans or the moved objects are then visible in essentially random parts of the sensor data, creating large geometrical errors. However, if the elements and objects are first semantically labeled using artificial intelligence methods, and then matched smartly label-wise, this problem can be avoided (see Section 4.3).
Hence, the problem requires the application of AI methods, so that the map updating can be done by first pairing the semantic elements and then their geometry.
Model validation is important, as we saw in Section 3, and it is equally important for updates. The validation of updated models can be conducted automatically by rule-based techniques (Ledoux, 2020), regardless of whether the models are created by humans or by computers. So, even though rule-based techniques are unreliable in computer-based urban model creation, both indoor and outdoor, they excel in model checking. This can be explained by human architecture tending to prefer artistic forms that serve aesthetic objectives, whereas model checking focuses on clear, programmable objectives concerning the details of these models.
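To illustrate why rule-based checks are easy to program, the following minimal sketch tests three simple rules on one polygonal face of a model. The rules, the function name `check_face`, and the tolerance `tol` are assumptions made for illustration; they are not the validation procedure of Ledoux (2020).

```python
import numpy as np

def check_face(ring, tol=0.01):
    """Return a list of rule violations for one polygonal face.

    ring: sequence of (x, y, z) vertices, first vertex repeated at the end.
    tol:  hypothetical planarity tolerance in model units (e.g. metres).
    """
    pts = np.asarray(ring, dtype=float)
    errors = []
    # Rule 1: the ring must be closed (last vertex equals first vertex).
    if len(pts) < 2 or not np.allclose(pts[0], pts[-1]):
        errors.append("ring not closed")
        body = pts
    else:
        body = pts[:-1]
    # Rule 2: a face needs at least three distinct vertices.
    if len(np.unique(body, axis=0)) < 3:
        errors.append("fewer than 3 distinct vertices")
        return errors
    # Rule 3: all vertices must lie on one plane within the tolerance.
    centered = body - body.mean(axis=0)
    # The last right singular vector is the normal of the best-fit plane.
    _, _, vt = np.linalg.svd(centered)
    distances = np.abs(centered @ vt[-1])
    if distances.max() > tol:
        errors.append("face not planar")
    return errors
```

For a flat unit square given as a closed ring the function returns an empty list, whereas lifting one corner by 0.5 units yields a planarity violation. Each rule is a clear, programmable objective, which is the property the paragraph above attributes to model checking.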
Therefore, the autonomous updating of city DTs needs to be conducted stepwise, pairing the semantically labeled elements first and analyzing their geometry second. This ensures that (1) object pairing can be established before geometrical analysis to overcome issues in using real-world data and (2) dynamic objects can be removed from analyses that focus on the immovable objects or structural elements. This sets a call for novel AI techniques.
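The ordering described above (remove dynamic objects, pair by label, then compare geometry) can be sketched as follows. This is a minimal illustration under stated assumptions: objects are reduced to a label and a centroid, the labels are taken as already produced by some semantic labeling method, and `DYNAMIC_LABELS`, `plan_updates`, and `max_shift` are hypothetical names and values, not part of any published method. The pairing is a simple greedy nearest-neighbour match.

```python
import numpy as np

DYNAMIC_LABELS = {"person", "car"}  # hypothetical set of movable classes

def plan_updates(model, observed, max_shift=0.5):
    """Pair observed objects with model elements by label, then by geometry.

    model, observed: lists of (label, centroid), centroid given as (x, y, z).
    max_shift: hypothetical distance limit for treating two objects as one.
    Returns (moved, added, removed).
    """
    # Step 1: drop dynamic objects so they cannot cause geometric errors.
    static = [(lab, np.asarray(c, dtype=float))
              for lab, c in observed if lab not in DYNAMIC_LABELS]
    unmatched = list(range(len(static)))
    moved, removed = [], []
    for label, centroid in model:
        centroid = np.asarray(centroid, dtype=float)
        # Step 2: semantic pairing, only same-label candidates are considered.
        candidates = [i for i in unmatched if static[i][0] == label]
        # Step 3: geometric analysis restricted to the paired label.
        best = min(candidates,
                   key=lambda i: np.linalg.norm(static[i][1] - centroid),
                   default=None)
        if best is None or np.linalg.norm(static[best][1] - centroid) > max_shift:
            removed.append(label)  # not observed; may also be merely occluded
            continue
        unmatched.remove(best)
        shift = static[best][1] - centroid
        if np.linalg.norm(shift) > 1e-6:
            moved.append((label, tuple(shift.tolist())))
    # Static observations without a model counterpart are candidate additions.
    added = [static[i][0] for i in unmatched]
    return moved, added, removed
```

With a model containing a wall, a window, and a pillar, and observations of a slightly shifted wall, a person standing where the window was, and a new door, the sketch reports the wall as moved, the door as added, and the window and pillar as removal candidates. The person is never compared against the window, which is the failure mode of purely geometric matching.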

Artificial intelligence methods
Artificial intelligence methods have shown prominence with manufacturing DTs (Rathore et al., 2021; Huang et al., 2021). In digital twins, the objects are represented by 3D vector models (also for cities, see Section 3). Vectorizing objects such as buildings out of the sensor data first requires the semantic recognition of the data pieces that represent these objects, while vectorization itself consists of localizing and generalizing the object boundaries.
Fig. 6. Cognition-based AI techniques for model updating. (a) A concept hierarchy is built while the sensor data is being captured, labeling objects, places, rooms, and buildings (Hughes et al., 2022). (b) The ground truth of a city scene, which (c) can be (partly) reconstructed by multiple robots traversing the streets. The matching of semantic labels of objects enables combining geometrical updates from multiple sources, i.e., data fusion (Tian et al., 2022). Note that roofs and high walls are left unseen (blank spots). (d) AI can be taught to find holes in the point clouds and to patch them with generated points using so-called inpainting techniques, even for complex objects (Väänänen and Lehtola, 2019). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Point cloud processing with AI uses deep learning methods for three tasks: classification, i.e., assigning the extracted subsets to specific object categories (Fig. 5a), e.g. for ground use mapping; segmentation, i.e., semantically labeling each point in the scene by grouping each subset of similar points under the same label (Fig. 5b); and object detection, i.e., detecting subsets of points that represent objects, such as assets, from the data (Fig. 5c). These are not easy problems. In fact, the structure of point clouds is irregular and three-dimensional, two factors that restricted the use of deep learning methods until the quite recent development of PointNet (Qi et al., 2017). Shortly after, Thomas et al. (2019) presented a way to span regular kernels around the irregular points to enable the use of convolution operators, which had previously been found powerful in image analysis. A renaissance of deep learning on 3D point cloud data began (Guo et al., 2020a), focusing mostly on convolutional neural networks (CNN) and recurrent neural networks (RNN). The three above-mentioned problems are connected. Classification and detection share similarities in the (philosophical) sense that the scene segmentation algorithm can be thought to operate as a multi-object detector (Plachetka et al., 2021), except that it prioritizes labeling everything rather than the correctness of the labels of one specific group. Hence, scene segmentation is applicable in visualization applications, while applications that require more specific and correct information, e.g. asset management and construction planning, are better served by object detectors (Lang et al., 2019).
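The irregularity mentioned above means that a point cloud has no natural point order, so a network must give the same answer for any ordering. A common way to achieve this, and the core idea in PointNet-style networks, is to apply the same layer to every point and then pool with a symmetric function such as the maximum. The sketch below is a heavily simplified illustration of that idea, not the published architecture; the function name and the random weights are placeholders.

```python
import numpy as np

def global_feature(points, weights, bias):
    """Order-independent descriptor of a point set.

    points:  (n, 3) array of x, y, z coordinates.
    weights: (3, k) matrix and bias: (k,) vector of a shared per-point layer;
             random placeholders here, learned in a real network.
    """
    # The same layer is applied to every point independently (shared weights).
    per_point = np.maximum(points @ weights + bias, 0.0)  # ReLU activation
    # Max pooling over the point axis is symmetric: reordering the points
    # cannot change which values are the largest.
    return per_point.max(axis=0)

rng = np.random.default_rng(0)
cloud = rng.normal(size=(100, 3))
w, b = rng.normal(size=(3, 16)), rng.normal(size=16)
shuffled = cloud[rng.permutation(len(cloud))]
same = np.allclose(global_feature(cloud, w, b), global_feature(shuffled, w, b))
```

Here `same` evaluates to `True`: shuffling the points leaves the 16-value descriptor unchanged. A classifier would map this descriptor to one label per subset, whereas segmentation additionally needs per-point outputs.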
Vectorizing information from image and point cloud data by deep learning was first described by Li et al. (2019), who present PolyMapper, a deep network built upon a CNN-RNN architecture. This branch of research focuses on 2D and 3D modeling of objects using AI, e.g., for buildings from an airborne perspective (Zhao et al., 2021). AI algorithms can be taught to generate relatively simple models, such as building roofs, on top of point clouds, because these objects have a limited number of edges and roof faces (Wichmann et al., 2018). Still, learning and generating a variety of 2D and 3D shapes remains an open research field. In particular, it is an open research problem to decide where to draw an object boundary when the sensor data is noisy and the object is represented by a set of points. Another interesting branch of development is in Neural Radiance Field (NeRF) multi-camera-view techniques, which offer ways to construct implicit surfaces from image pairs (Wang et al., 2021). In other words, with two RGB images, one can create a 3D surface that is a model for an object. This is achievable in real time (Yu et al., 2021). Moreover, the model can be in the form of triangle meshes and textures that are directly usable in graphics engines (Munkberg et al., 2022). Although NeRF techniques are powerful, open questions exist in how the generated surfaces that are implicit, i.e., without holes, could be joined together and altered so that indoor modeling would be feasible. For example, doorways in buildings cannot be modeled with implicit surfaces, i.e., facades cannot have respective openings, and thus LOD4 models cannot be created. Also, the geometrical accuracy obtained from images falls short of that obtained from lidars, which sets a call for AI-based sensor fusion techniques.
Deep learning approaches could focus on imitating humans, specifically cartographers, in how to look at different data and create 2D and/or 3D models. However, the success of deep learning methods builds on the availability of big and detailed training data and a network architecture that can process these big and detailed data, while human activities are driven by intuition (Winiwarter et al., 2019). How to match these two seemingly incompatible items so that computers could learn from humans is an open question. The same problem persists in indoor and outdoor environments. Consequently, solutions are likely to emerge from cognitive approaches.
A cognitive concept hierarchy can be established digitally once the semantic labeling of the data is done (Beetz et al., 2018). This can happen in parallel to vectorization, or when the data is being acquired. In indoor scenes, sensor observations and detected objects are linked to the rooms where they reside, see Fig. 6(a), and an ensemble of rooms makes up a building. In outdoor scenes, objects are linked to the street where they reside, see Fig. 6(b, c). Streets can be contained in, e.g., a city. The straightforward benefit of this is that data fusion from different sources can be successfully done (Tian et al., 2022). We see that this would be a key factor in enabling the automated updates from sensor systems for urban DTs (Section 4.2). In addition, human cognition is not the only option for organizing the concept hierarchy. Deep generative spatial models of environments can be learned by robots or computers using sensor data (Pronobis and Rao, 2017). The benefit of generative models is that they offer universality, because a multitude of observations (seeing a place) is encoded in a small piece of information (descriptor of the place, for humans e.g., ''a fancy hotel''). Because of this universality, generative models can also be used to cover imperfections in measured data, i.e., to do inpainting with AI (Väänänen and Lehtola, 2019), see Fig. 6(d).
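A concept hierarchy of the kind described above can be represented as a tree in which observations from different sources meet at nodes that share a label. The sketch below is only an illustration of that data structure; the class `Node`, the function `fuse`, and the example labels are hypothetical and do not reproduce the systems of Hughes et al. (2022) or Tian et al. (2022).

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """One concept in the hierarchy, e.g. a city, street, room, or object."""
    label: str
    children: dict = field(default_factory=dict)  # child label -> Node
    sources: set = field(default_factory=set)     # who observed this concept

    def add(self, path, source):
        """Insert a path such as ("Main St", "lamp post 3") below this node."""
        self.sources.add(source)
        if path:
            child = self.children.setdefault(path[0], Node(path[0]))
            child.add(path[1:], source)

def fuse(observations):
    """Merge (source, path) observations from several sensors into one tree."""
    root = Node("city")
    for source, path in observations:
        root.add(tuple(path), source)
    return root

city = fuse([
    ("robot A", ("Main St", "lamp post 3")),
    ("robot B", ("Main St", "lamp post 3")),
    ("robot B", ("Main St", "bench 1")),
])
```

After fusion, the node for "lamp post 3" lists both robots as sources, so geometric updates from either can be attached to the same object, while "bench 1" is known from one robot only. Matching on labels before geometry is what makes the merge unambiguous in this toy case.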
Privacy issues emerge when high resolution data is acquired from city environments. Car number plates and humans fall under the definition of personally identifiable information (PII) of the General Data Protection Regulation (GDPR). These cannot be handled or stored without appropriate permissions. However, first detecting, for example, human faces (Kumar et al., 2019), and then technically anonymizing the detected data beyond recognition allows for the removal of personal information and biometrics. Anonymization of humans and cars is nowadays available in many software packages and systems, such as in mobile phones for protecting the privacy of bystanders in mobile imagery (Darling, 2021).
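The detect-then-anonymize sequence can be sketched for imagery as follows. The sketch covers only the second step: it assumes that bounding boxes have already been produced by a separate face or number plate detector, which is not shown. The function name `pixelate` and the block size are illustrative assumptions, and whether a given block size anonymizes data beyond recognition in the legal sense is not something this sketch establishes.

```python
import numpy as np

def pixelate(image, boxes, block=8):
    """Return a copy of image with each box replaced by coarse blocks.

    image: (h, w, 3) uint8 array.
    boxes: list of (top, left, bottom, right) pixel bounds inside the image,
           assumed to come from a separate detector (not shown here).
    block: hypothetical block size in pixels; larger removes more detail.
    """
    out = image.copy()
    for top, left, bottom, right in boxes:
        for y in range(top, bottom, block):
            for x in range(left, right, block):
                # Clip the tile to the box so pixels outside stay untouched.
                tile = out[y:min(y + block, bottom), x:min(x + block, right)]
                # Overwrite the tile with its mean colour, destroying detail.
                tile[...] = tile.mean(axis=(0, 1)).astype(np.uint8)
    return out
```

The original array is left unmodified, so the raw data can be discarded or access-restricted separately while only the anonymized copy is stored in the DT.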

Using DT in city decision making
Decision making in cities is typically related to investments (e.g. investing in a new autonomous service, Section 5.1), allocating money to city functions, or city planning. City DTs can be highly relevant for all of these three cases. This raises two essential topics: viewing DTs (Section 5.2) and interacting with them (Section 5.3).

Uncrewed and autonomous systems
Autonomous driving, shipping, and flight operations using uncrewed vessels are being developed with a vision that they would be used on a large scale. Their relationship with city DTs is two-fold. On the one hand, autonomous systems can use the information content of city DTs (Section 3) to operate. On the other hand, their sensors can be used to update the city DTs (Section 4). However, before this point can be reached, there are multiple open questions on how such autonomous operations would otherwise impact the city. These questions can likely be answered by detailed simulations.
Simulations in robotics are used to design new systems, specifically, their control and perception properties. These simulations are physically detailed, e.g. ROS Gazebo, and can thus be thought of as digital twins. In this line of thought, city DTs can be utilized to simulate and plan solutions for new robotic applications. This is technically feasible as, e.g., CityGML data can be used in the ROS2/Gazebo simulator (de Haag et al., 2021). In a similar fashion, simulation by city DTs could help traffic centers and city councils in deploying more efficient Intelligent Transportation Systems (ITS) (Dimitrakopoulos and Demestichas, 2010), while also considering the city needs. Simulation fed by data collected from sensors distributed on the roads may enhance the current ways to regulate "traffic flows, providing end-users with greater information content and safety, as well as, qualitatively increasing the level of interaction between road users in comparison with conventional transport systems" (Rudskoy et al., 2021). In fact, city DTs may offer multiple benefits:
• Autonomous driving becoming part of the smart traffic scheme, after safety issues are resolved and legal clarity established (Muhammad et al., 2020). City DTs could help, for example, to answer whether it would be plausible to partially isolate some part of the road system for autonomous driving, in order to clarify liabilities and reduce risks.
• Drone deliveries are envisioned to be operated from a truck, i.e. the traveling salesman, that launches flying drones which cover the last meters to the doorstep (Murray and Raj, 2020). City DTs could help simulate the ecological, economical, and legal benefits and risks of such services. If longer drone flights are required, however, narrow dedicated flight ways could be planned in city DTs to minimize risks.
• Port operations could be planned to optimize ship wait time and preceding steaming speed, which would reduce both fuel consumption and emissions (Olba et al., 2018). Autonomous perception systems on ships that could be used for autonomous updates are slowly gaining popularity but face multiple hindrances (Thombre et al., 2020).
• Autonomous ITS based on city DTs could assist, by predictive modeling and real-time dynamic visualization, decision-makers in traffic centers to better support the intervention of essential services (e.g., police, ambulance, fire departments etc.) as well as to better serve citizens by reducing reaction time towards unexpected events and increasing safety on the roads (Rudskoy et al., 2021; Hämäläinen, 2021).
In principle, robotic simulations can already be conducted at a large scale. For example, the OpenDrive standard defines a representation of the road networks to be applied in vehicle and traffic simulations (footnote 5). Such applications have also been acknowledged in the 3D city modeling side, with e.g. further development of the transportation module of CityGML (Labetski et al., 2018).

(Peters et al., 2021), Image copyright 3D BAG by tudelft3d. Right: DT in VR (Image courtesy of xD Visuals Oy: xD Twin solution, https://www.xd-twin.io/).

Viewing DTs is a cartographic problem
One of the most obvious interactivity modalities for DTs is to view them, which immediately implicates cartography and geovisual analytics, as well as current practices in video game and computer animation visualization (Batty, 1997; Indraprastha and Shinozaki, 2009; Biljecki et al., 2015; Chandler et al., 2018; Lock et al., 2019; Noghabaei et al., 2020). Most cartographic data visualization to date has been 2D, but Digital Twins will typically require oblique, 3D views with panning, zooming, and rotating abilities (Halik and Kent, 2021; Abdelaal et al., 2022). Contemporary video game and 3D animation visualization is directly and immediately applicable to DT visualization, as it shares the same fundamental 3D coordinate geometry, lighting, and perspective view concepts. But while video gaming and computer animation generally present objects symbolized in a single, certain way (i.e., usually with colors and textures that might represent the given object photorealistically), DT visualization will also benefit from more abstract symbolization of the sort common in thematic mapping and geovisualization, such that graphic variables like color and texture can be used to symbolize object attribute properties, rather than how the real object would actually look to human eyes. A rich history of research on the perceptual and communicative properties of graphical variations (how symbol choices like color or shape are typically interpreted by human viewers) exists in cartography (Bertin, 1983; MacEachren, 2004), as does an understanding of methods of interaction with digital maps, and how these enable and affect their use (Roth and Harrower, 2008; Roth, 2015; Roth and MacEachren, 2016; Tominski et al., 2021). These concepts will need to be brought to bear in symbolizing immersive, 3D views of DTs (Çöltekin et al., 2020; Austin et al., 2020; Newbury et al., 2021; Danyluk et al., 2021; Lee et al., 2021b; Ghaemi et al., 2022). Fig. 7 shows illustrations of how DTs could be viewed.
The origins from GIS (baselayer map) and BIM (detailed building models) are both visible (Section 3). Conveniently, BIM models can be transformed into visualization models adaptable for a game engine, for straightforward viewing (White et al., 2021b). In addition, rendering 3D point clouds is also possible (Schütz et al., 2016). This means that there are few back-end-related technical challenges. Instead, cartographic visualization problems related to DT include:
• Symbolization. How exactly things are drawn for the viewer, using what graphic variables, lighting conditions, textures, etc., and whether they are symbolized analytically, as in thematic mapping, or photorealistically, as in a faithful simulation of the real world. E.g., CAD models could be shown on the screens of excavators to show the location of existing pipes, and these pipes could be best shown either symbolized for what they carry (e.g., blue for water), or photorealistically.
• Scale. Whether DT models are viewed at 1:1 cartographic scale to simulate the real, embodied experience, or at smaller scales, such that models are viewed on tables like doll houses. E.g., city plans are typically visualized from a top-down view at map scales smaller than 1:1.
• Perspective. Models can be viewed using perspective, or in orthographic projections, and either choice supports diverse visual analyses in better or worse ways. E.g., Google Streetview offers the possibility to look at surroundings from the path of a vehicle, in first-person perspective view.
• Interface usability. Panning, zooming, and rotating abilities need to be provided to viewers, and how these are made available needs to be considered in terms of usability. The characteristics of the device affect the interface design and its usability, e.g., an augmented reality headset makes panning and rotating as natural as moving one's head, but also needs some gestural or vocal input abilities for zooming and other movements. Beyond viewing, interfaces also have to provide querying and settings manipulations. The interface to a visualization system will also include visual augmentations beyond the symbolized objects themselves, such as labels, text boxes, and icons, and the interactivity programmed into these elements needs to be usable as well.
• Bring Your Own Device (BYOD) compatibility. A diversity of devices, from augmented reality headsets to traditional computers and smartphones, need to be able to participate in 3D DT viewing, and since these have different capabilities, viewing environments need to be designed for their diverse contexts.

5 https://www.asam.net/standards/detail/opendrive/.
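The perspective item in the list above contrasts two projections, and the difference is easy to state in code. The following minimal sketch assumes that points are already expressed in camera coordinates with the viewing direction along positive z; the function name `project` and the `focal` parameter are illustrative and not tied to any particular rendering engine.

```python
import numpy as np

def project(points, mode="perspective", focal=1.0):
    """Project camera-space points (x, y, z with z > 0 in front) to 2D.

    mode:  "perspective" divides by depth, "orthographic" ignores depth.
    focal: hypothetical focal length scaling the perspective image.
    """
    pts = np.asarray(points, dtype=float)
    if mode == "orthographic":
        # Parallel rays: size on screen does not depend on distance.
        return pts[:, :2].copy()
    if mode == "perspective":
        # Rays converge at the eye: distant objects appear smaller.
        return focal * pts[:, :2] / pts[:, 2:3]
    raise ValueError(f"unknown mode: {mode}")
```

Two identical lamp posts at depths 2 and 4 land on the same screen position in the orthographic view, which suits measuring and comparing, while the perspective view halves the farther one, which suits an embodied first-person impression.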

Human factors
Discussion about human factors and how to ensure a reliable, comprehensible and safe manipulation of DT elements is still lagging behind, although, as we saw in Section 5.2, there has been a lot of effort in designing and deploying technically high-fidelity DTs.
Certainly, there is a growing interest in using DT as interactive tools to support (shared and augmented) decision making, co-design, simulation and training (Bilberg and Malik, 2019). However, as recently recognized by Liu et al. (2022), researchers have done only little so far to agree on a framework regarding the implementation of intuitive and natural modes of interaction with DTs. Currently, clear guidelines are missing on how to implement the interaction with DTs. What is also missing are frameworks to exhaustively evaluate the key elements of usability (namely, efficiency, effectiveness and satisfaction in a specific context of use) defined by the International Organization for Standardization (ISO9241-11, 2018). Regulators are pushing the definition of a technical framework for DT in the field of manufacturing, however, with little or no interest in the impact on the potential stakeholders. For instance, the recently proposed ISO 23247 for the application of DT in the specific domain of manufacturing defines DTs as ''fit for purpose [. . . ] data element representing a set of properties of an observable manufacturing element [. . . ] with synchronization between the element and its digital representation'' (ISO23247-1, 2021). Although ISO 23247 refers generically to the usability (ISO 9241-11) of DT solutions, it does not provide any vision of how such solutions should be evaluated as ''fit for purpose'' and used appropriately by different operators. A similar focus on the implementation can be identified in the recent perspective proposed by the International Electrotechnical Commission technical committee for Internet of Things and Digital Twin, which suggests that a DT could be considered a digital entity that aims to represent a target entity with data connections that ''enable convergence between the physical and digital states at an appropriate rate of synchronization'' (JTC1-SC41/261/C, 2021).
This committee is proposing that a DT emerges from the interplay of three key elements: 1. technical aspects and computation, i.e., semantics; 2. symbols and relationship between concepts, i.e., semiotics; and 3. pairing of semantic and semiotic elements, i.e., morphisms.
Nevertheless, again, what is left out of the discussion are human factors and the involvement of stakeholders in the generation and assessment of such a system. In domains other than manufacturing, in which DT solutions are currently not regulated, researchers are exploring how DT should be designed and assessed to maximize the benefit of such systems. For instance, models are emerging to integrate data and utilize DT to support service maintenance (Steinmetz et al., 2021) and management of transport systems (Guo et al., 2020b). Overall, however, researchers are rarely looking at the usability aspect of interactive DT systems, and when they do, the tendency is to adopt different modalities of assessment, for instance, performing a heuristic review (Sefrin et al., 2021), or involving participants in the assessment of concepts or prototypes of a DT through various forms of user testing or through standardized scales of satisfaction that investigate the reaction of the users after the interaction (Kalantari et al., 2022; Yeom and Woo, 2021).
The lack of a well-defined methodological framework to assess DTs makes it difficult to benchmark the usefulness of the different solutions and also limits the emergence and exchange of good practices. The risk is that each domain will adopt different, if not diverging, approaches to assess the quality of DT, exposing stakeholders to potentially unsafe ecosystems. While regulators and practitioners are pursuing the right direction, there is a growing need for a cross-sector definition of DT and for consensus on how to assess and ensure safety in the usage of interactive DT systems.
Researchers agree that DT solutions can present several advantages for their stakeholders, such as offering a way of designing digitally, before the real implementation, or as ways to train people in performing tasks or making decisions before the impact with reality (Shahzad et al., 2022). Concurrently, DT systems are seen as a way of involving stakeholders in co-design and in reviewing systems to define how to implement modifications in complex ecosystems like cities (Lee et al., 2021a; Du et al., 2020; Dembski et al., 2020). Although the tendency among experts is to aim for very accurate DT solutions that mirror reality, it should be noted that the necessity for recreating identical (high fidelity) digital twins depends on the purpose of the DT, and fraternal (mixed level of fidelity) twins could also be enough in many cases, for instance, in the context of training of specific procedural skills (Borsci et al., 2015; Mao et al., 2021), where they offer operators a more efficient and effective process of skills acquisition compared to traditional training. High fidelity of DT is, however, essential for safety-critical tasks like, for instance, systems of decision-making management during disasters (Deren et al., 2021). In an ideal world, DT systems are not going to be used only by expert decision-makers, and there is a growing need for involving lay users and potential beneficiaries of these powerful systems in the co-design and assessment of the interaction. This could realize the possibility to, for instance, implement an effective practice of universal design, intended here as universally participated design (Al-Kodmany, 2000), by involving a large cohort of citizens with different needs and levels of individual functioning in open experiments of user-driven adaptation of their socially shared environments, rethinking their urban and service organization, the accessibility of spaces, and transport organization.
This potential process of changing the virtual to affect the real world, and of supporting policymakers in their decisions regarding the development of urban spaces, was recently tested by White et al. (2021a) in the Docklands area of Dublin, with roughly 30 participants providing feedback about the potential construction of a new building. Future implementation of such a co-design paradigm, supported with intuitive and usable interfaces, could be used for including a wider group of citizens in decision-making and in co-creating and revamping designs of urban environments. It could also be used to alert about potential issues in specific areas, as well as a way to communicate and instruct citizens about best practice during crises or natural disasters.

Conclusion and benefits
Digital twins have many forms, as we have seen, yet they share the same identity under the DT umbrella. DTs first emerged in manufacturing, where their role is acting as a detailed digital model that can be duplicated in physical copies. There, the economic benefits are reaped from detailed planning. In construction, DT techniques provide similar benefits, with the addition that the information about the physical landscape of the building can be brought into the planning. Facility management, including building maintenance and asset management, also gains economic benefits from DT techniques, but needs periodic updating of DTs, see Fig. 1. City DTs bring in even more complexity in the form of both technical issues, such as the integration of GIS and BIM, and human factors, such as that it becomes unclear what purpose the city DTs ultimately serve.
Technical interoperability in the form of integration of GIS and BIM data for city DTs supports co-creation, management, and data sharing among various stakeholders (Section 3). Clear communication between specialists in the municipalities supports fair decision making processes in the design phase and can prevent mistakes. In addition, DTs representing the city both above ground and underground provide reliable information to test different scenarios such as preparing for disasters or other societal challenges that directly impact the living environment. Further efforts should be made to strengthen the connection between DTs and official decision-making, including cadastral applications, for example by advancing automated model validation methods.
Cities never stop changing, and therefore city DTs also need to be constantly updated with appropriate autonomous techniques. The role of AI is to enable the automated updating of the city DTs from (crowd-sourced) sensor data (Section 4). This needs to be done in a manner that preserves the fidelity of these DTs. Technical challenges include deciding what parts of the DTs should be updated given a new set of data, and how these updates are incorporated into existing DTs. Standards are required to allow for this bottom-up updating (Section 3). Open interfaces and operational clarity are good ways to attract third parties to offer their data for automated city DT updating. Automated updating creates new opportunities, some of which are visible and some, we argue, are not yet visible. For example, we see that the feasibility of robotic applications may be tested in DTs, and robotic systems may serve in updating the city DT, but how far the interaction between a (robotic) smart city and humans can be taken is yet to be seen. Another visible example is that change in natural elements can be tracked, e.g. snow or water on the streets and green growth, as well as temporary settings such as construction sites and related exceptional traffic arrangements. Yet, monitoring events as they happen is different from foreseeing those events and their impact, and planning ahead. It is yet to be seen how impactful DT techniques can be when decisions are made on preparations against simulated risks.
We may ask who is benefiting from city DTs. The primary beneficiary should be the city, based on the city's needs (Section 2) that follow from the everyday routines of city functions, namely, supporting the creation of a better living environment and providing services for the citizens with limited resources. City DTs can serve these needs in several ways that culminate in planning activities and involving the citizenry. City planning and modeling (Section 3) is connected to many applications that aim at improving the living environment of the residents or the companies operating there, thus making these actors of the civil society central stakeholders of these activities, and therefore of any DT techniques that support these activities. Involving members of the civil society in decision making is important to enrich the decision-making process of the city organization with insights from a larger group of stakeholders and beneficiaries (Section 5).
We may also ask who could benefit from city DTs. Using maps is an established way to plan human activities, be it constructing something new or planning activities or processes. The idea of such a map is that (1) an environment is visualized so that all the relevant information is shown and that (2) all persons participating in the process share the same information. If DTs are used for joint planning purposes, they must be understood as maps (Section 5.2). Now, some of the city's needs are being answered with the help of DT techniques (Section 2). Yet, for each separate item/need, there is a separate DT. This divergence of DT manifestations is driven by the underlying specific (sectorial) needs. Therefore, we also expect divergence in DTs developed in different cities, as the needs across cities vary. However, we foresee a future where the development of city DTs turns into a converging phase: overall planning and risk assessment derived from a joint DT could benefit a large group of civil society stakeholders. The approach here would not be a gargantuan effort to combine these under a single technological solution, but rather to develop a modular architecture with collaborative interfaces. Successful urban DTs are founded on solid data infrastructure, including appropriate data organization and standardization (Section 3). Hence, we see that standards should be developed in a direction that makes it technically possible to have a back-end DT infrastructure that encompasses all information. The back-end could then be used to derive different front-end representations for different users (as in Fig. 2). This type of approach would bring together the benefits of joint planning and the flexibility to use only the information that is essential.
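The back-end and front-end split described above can be sketched as a single store from which each application derives only the view it needs. Everything in the sketch is an assumption for illustration: the `Feature` fields are not a standard schema, and `BackEnd`, `upsert`, and `front_end` are hypothetical names rather than an existing interface.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Feature:
    """One back-end record; the fields are illustrative, not a standard schema."""
    fid: str
    theme: str          # e.g. "building", "pipe", "tree"
    underground: bool
    attributes: tuple = ()  # (key, value) pairs

class BackEnd:
    """Container of all information, from which front-ends are derived."""

    def __init__(self):
        self._features = {}

    def upsert(self, feature):
        # An update replaces the record with the same id, so every
        # front-end derived afterwards sees the same, current state.
        self._features[feature.fid] = feature

    def front_end(self, predicate):
        """Return the limited representation that one application needs."""
        return [f for f in self._features.values() if predicate(f)]

twin = BackEnd()
twin.upsert(Feature("b1", "building", False, (("floors", 4),)))
twin.upsert(Feature("p1", "pipe", True, (("carries", "water"),)))
twin.upsert(Feature("t1", "tree", False))

excavator_view = twin.front_end(lambda f: f.underground)
planner_view = twin.front_end(lambda f: f.theme in {"building", "tree"})
```

The excavator operator sees only the pipe and the planner sees the building and the tree, yet both views come from one shared store. Keeping the filters outside the store is one possible way to stay modular: a new user group adds a predicate instead of a new DT.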
From the human factors point of view (Section 5.3), the diffusion of interactive, usable and safe to use DT solutions could enable new forms of (more or less immersive) collaborations between city stakeholders (e.g., decision-makers, city planners, citizens etc.), paving the way for new approaches of rethinking communal spaces, acknowledging the needs of citizens and minorities, and providing a way to constructively involve stakeholders in the identification of unmet or hidden needs. This could open up the possibility for citizens to have a digital experience of their city on top of their daily real experience, cross-pollinating real and virtual worlds in a positive cycle through which citizens can actually modify or adapt their digital common living space, and this digital transformation, when accepted by the community, could then be implemented in the real world. Concurrently, municipalities could share their future plans, letting the citizens view and even explore new buildings and see the impact of these plans, such as changes in the sunlight coming to neighboring properties. DT techniques could help raise awareness about future changes in the city before their implementation and guide this development towards what is needed, be it new green areas, attractions, or commercial sites. Nevertheless, we see that it is necessary to invest in identifying approaches and rules to make DTs available, interactive, and editable by all to benefit shared decision-making processes about future cities, in order to support this potential positive loop between digital and real worlds.
Open questions remain. Cyber-security challenges step in when the DT must be made not only available to third parties but preferably updateable by them. Incentives for third parties to provide DT updates from their sensor data must be established. When the third-party AI-processed data comes in, quality control measures must be in place. In other words, the AI needs to be embedded between the official city organization and the civil society (Borsci et al., 2022).

Declaration of competing interest
One or more of the authors of this paper have disclosed potential or pertinent conflicts of interest, which may include receipt of payment, either direct or indirect, institutional support, or association with an entity in the biomedical field which may be perceived to have potential conflict of interest with this work. For full disclosure statements refer to https://doi.org/10.1016/j.jag.2022.102915. JPV is employed at Forum Virium Helsinki, which is a company owned by the city of Helsinki.