FROM SUB-OPTIMAL DATASETS TO A CITYGML-COMPLIANT 3D CITY MODEL: EXPERIENCES FROM TRENTO, ITALY

More and more cities are moving towards the creation and adoption of three-dimensional virtual city models as a means for data integration, harmonisation and storage. To this purpose, CityGML is an international standard conceived specifically as an information and data model for semantic city models at urban and territorial scale. Building reconstruction up to Level of Detail 2 (LoD2) can nowadays be achieved nearly completely automatically and with a high degree of accuracy, provided that high-quality input data (e.g. a dense DSM obtained from LiDAR or dense stereo-matching with 10÷15 pt/m² or better) are available. This paper deals with the creation of a CityGML-compliant, LoD2 city model starting from sub-optimal datasets and tries to address some of the issues tied to the use of sub-standard data, which however represents a quite common case in “real life”. The study area is a part of the city of Trento, in the northern Alpine region of Italy, and contains about 2300 buildings of different typology, use and construction year. Only existing datasets were gathered and used.


INTRODUCTION
More and more cities are moving towards the creation and adoption of three-dimensional virtual city models as a means for data integration, harmonisation and storage. The beneficial effects tied to a unique and spatio-semantically coherent urban model (Stadler and Kolbe, 2007) are multiple, as is the possibility to exploit such a model for further, more advanced applications ranging from urban planning, noise mapping (Czerwinski et al., 2007), augmented reality and utility network management (Becker et al., 2013) to energy simulation tools (Krüger and Kolbe, 2012). To this end, CityGML (Gröger and Plümer, 2012) is an international standard conceived specifically as an information and data model for semantic city models at urban and territorial scale. It is being adopted by more and more cities all over the world (CityGML).
In general terms, the fully (or semi-) automatic creation of a virtual city model requires the heterogeneous input datasets to be sufficiently "clean" and properly structured before moving on to the actual data integration process. More specifically, when it comes to the geometric modelling of buildings, LoD2 models are generated today nearly fully automatically, provided that vector footprints (generally as shapefiles) and a sufficiently detailed DSM (at least 4 pt/m², better if 10÷15 pt/m²) are given as input. An overview of techniques and methodologies for automatic and semi-automatic 3D building reconstruction is given in (Macay Moreia et al., 2013). If however these minimum requirements are not met, the creation of a city model may not be possible at all, or it may not reach an adequate accuracy, or the manual effort may simply be too large to make the whole operation feasible or worthwhile.
This paper deals with the creation of a CityGML-compliant, LoD2 city model starting from sub-optimal datasets and tries to address some of the above-mentioned issues. The study area is a part of the city of Trento, in the northern Alpine region of Italy, and contains about 2300 buildings of different typology, use and construction year. Only existing datasets were gathered and used. This paper therefore represents an example of a "real-world" situation, where decisions and strategies need to be adapted to the given, often sub-optimal input data. Section 2 contains the description of the study area and a list of the datasets used as input, with their characteristics and peculiarities. Section 3 is dedicated to the creation of the 3D city model: the different data integration strategies are discussed, both for the spatial and the non-spatial data. In section 4 the results are presented, followed by the concluding remarks and future improvements in section 5.
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XL-4, 2014. ISPRS Technical Commission IV Symposium, 14-16 May 2014, Suzhou, China. This contribution has been peer-reviewed. doi:10.5194/isprsarchives-XL-4-7-2014
Figure 2 Examples of spatial datasets: a) building footprints from the land cadastre map (in pink) and from the topographic map (in red). Planimetric misalignments between corresponding features, and with regard to the underlying DSM (in grey), are clearly visible; b) building footprints and sub-footprints around the main square of Trento. The church is a clear example of a complex structure represented by means of sub-footprints; c) example of home address points falling outside the corresponding building footprint; d) the home address points after semi-automatic correction.

Test area
The test area is located in Trento, a city of about 115,000 inhabitants in the Trentino-Alto Adige/Südtirol region (north-eastern Italy). It lies on the banks of the river Adige, in the homonymous valley, and it is surrounded, mainly to the east and west, by the Alps, whereas the Adige valley stretches mainly north to south. The test area covers approximately 3.5×1.5 km and contains circa 2300 residential, industrial and commercial buildings of varying size and geometric complexity. It is moreover located along the direction of the city's growth over time, spreading from the older, central and densely built-up areas (building blocks and multi-family houses) southwards to the immediate outskirts, which are more recent and less densely built-up, with single and terraced houses. This allows for a better distribution of building typologies and years of construction.

Spatial data sources
Heterogeneous data were collected, mainly from the Autonomous Province of Trento (PAT) and the Municipality of Trento. On purpose, no new data (spatial or non-spatial) were acquired, in order to simulate a real-case scenario as closely as possible. In the following, the data sources are classified into spatial and non-spatial ones for easier reading. All spatial data, where applicable, were delivered already georeferenced (WGS84/UTM32N) and consist of:
1. A raster-based DSM (and the resulting DTM) derived from a LiDAR flight in 2006/7 at 1×1 m geometric resolution. Height accuracy for the original LiDAR data is given as σ_z = 15 cm for the DSM and σ_z = 30 cm for the DTM. It must be noted that, although geometrically accurate, the DSM is somewhat outdated (buildings erected after 2007 are missing) and its resolution is below today's standards for automatic building reconstruction from a DSM.
2. A set of 17 land cadastre maps in vector format covering the whole area of the city of Trento at a nominal scale of 1:2000 and provided as shapefiles. The shapefiles contain, among other things, the building footprints, each provided with a unique land parcel ID. Several misalignments and distortions can be found all over the 17 datasets when compared to the LiDAR DSM (considered as ground truth). A simple visual inspection suffices to detect planimetric displacements, sometimes as high as 10 m or more. An example is given in Figure 2a.
3. A vector topographic map, at a nominal scale of 1:1000, which also contains the building footprints (sometimes also sub-footprints) and a rough classification of the building types (residential, religious, commercial/industrial, etc.). The footprints show no significant planimetric misalignments; however, they are over-segmented (i.e. building parts and other details are present) and they lack any information with regard to the land cadastre IDs or to those of the registry of buildings (see later). An example is given in Figure 2b.
4. A set of digital orthophotos with an average GSD (ground sampling distance) of 12 cm, as well as a set of oblique aerial images. The images were used mostly for visual reference and for checking the reconstructed buildings in the study area.
5. A point-based shapefile containing all home addresses (street name and house number). When overlaid on either the land cadastre or the topographic footprints, some points do not fall within (i.e. do not spatially intersect) the corresponding polygon feature, as they sometimes refer to the garden gate (or some other location) rather than to the actual house door. An example is given in Figure 2c.
6. For a limited number of buildings, the 2D vector maps (plus attributes) of the interior rooms. They are however not georeferenced, i.e. they are provided only in a local reference system.

Non-spatial data sources
The non-spatial datasets used in this work consist of:
7. Data from the register office regarding the number of residents, as well as the number of families, per home address.
8. Data from the register of buildings, containing all information about each property unit (e.g. owner, number of rooms, surface, property category such as residential or commercial, etc.). It is important to stress that there is no direct correspondence between the land cadastral entities and those in the register of buildings. One can think, as a simplifying example, of a building footprint (contained in the land cadastre) having multiple flats (contained in the register of buildings). In cardinality terms, the relation is not 1:n, but a more general m:n one (see section 3.3 for more details).
9. A list of restoration, transformation and refurbishment activities carried out over time on the buildings (e.g. roof rebuilt, windows changed, etc.). Each record is also identified by means of address data and the land cadastre ID.
10. A list of Energy Performance Certificates (EPCs) for some residential property units inside the study area. Each record is also identified by means of address data and the land cadastre ID.

3D CITY MODELLING
The creation of the 3D virtual city model in the study area was divided into three operative steps. In the first one, all necessary datasets were homogenised and integrated in order to obtain a CityGML-compliant model, with particular attention to the inputs needed for the semi-automatic generation of the 3D building geometries, which was carried out in the second step.
The third part dealt with the so-called "enrichment" of the geometric model, in that the remaining datasets were used and integrated into the 3D model.
Although applied to the study area, all integration strategies were conceived to be replicable in any part of the city.

Step 1: Spatial data integration
Given the need to keep the cadastral information tied to each building footprint, different approaches were tried to adjust the land cadastral maps. Due to the considerable distortions mentioned before, the dataset cannot be used "as is" and must be transformed properly in order to match the underlying DSM; this is indeed (understandably) a strict requirement for any automatic 3D reconstruction process of the buildings. The topographic map, instead, shows no relevant distortions with regard to the DSM. Therefore, an initial adjustment of the cadastral map was attempted by means of rubber sheeting, using the topographic map as reference.
In order to perform an initial quantitative characterisation of the distortions, circa 100 pairs of pass points were first selected manually between common features in all 17 cadastral maps and the reference topographic map. An analysis of the distortion vectors showed in general no significant regular pattern: distortions are irregularly distributed, both in terms of direction and magnitude. The final accuracy after rubber sheeting, measured on a set of selected and evenly distributed features (building footprints), was on average circa 2 m, and therefore not sufficient. Slightly better results could be achieved by raising the number of pass points, but at an unacceptable cost in terms of time. Moreover, implementing automatic procedures for matching corresponding features, as described for example in (Beinat et al., 2004a; Beinat et al., 2004b), was out of the scope of this work, also considering that the cadastral office of the Trento province is itself adjusting the maps (although not all of those for Trento had been completed at the time of this work).
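The characterisation of the distortion vectors described above can be illustrated with a minimal sketch. It assumes the pass points are available as pairs of planar coordinates in metres; the function name and the "consistency" indicator (length of the mean vector divided by the mean vector length) are illustrative choices and not the procedure actually used in this work. A value close to 1 would indicate a systematic shift, a value close to 0 irregularly oriented distortions.

```python
import math

def distortion_stats(pairs):
    """Summarise distortion vectors between pass-point pairs.

    pairs: list of ((x_cad, y_cad), (x_topo, y_topo)) in metres
           (hypothetical input layout).
    """
    n = len(pairs)
    dx = [topo[0] - cad[0] for cad, topo in pairs]
    dy = [topo[1] - cad[1] for cad, topo in pairs]
    lengths = [math.hypot(x, y) for x, y in zip(dx, dy)]
    mean_length = sum(lengths) / n
    # Length of the averaged vector: equals mean_length when all vectors
    # are identical, approaches 0 when directions cancel out.
    mean_vector_length = math.hypot(sum(dx) / n, sum(dy) / n)
    consistency = mean_vector_length / mean_length if mean_length > 0 else 0.0
    return {
        "mean_length": mean_length,
        "max_length": max(lengths),
        "consistency": consistency,
    }
```

With such an indicator, a single global transformation would only be promising when the consistency is high; the irregular pattern reported above corresponds to the opposite case.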
When it comes to the topographic map, the advantage of negligible map distortions and displacements has already been mentioned. In addition, having multiple sub-footprint polygons for particularly complex buildings leads to a less approximate 3D building reconstruction in the successive phases. A visual example is given in Figure 3, where the same building is represented in LoD1, generated once from a single footprint (cadastral map) and once from multiple sub-footprints (topographic map).
Figure 3 Comparison between 3D models (at LoD1) of the same building, obtained from a single footprint (left) and from multiple sub-footprints.
For the above-mentioned reasons, it was eventually chosen to adopt a hybrid approach: to merge and integrate the cadastral and the topographic datasets in order to use, on the one hand, the topographic geometries and, on the other hand, the cadastral IDs for each building.
In order to achieve this, two steps had to be carried out. In the first step, some footprints from the topographic map had to be split according to the cadastral parcel subdivisions. This operation was carried out manually, however on a relatively small number of cases (circa 5% of the buildings). In the second step, the cadastral IDs were assigned to the topographic (sub-)footprints by means of spatial overlay operations. In order to take planimetric distortions into account (and therefore inexact overlaps between footprint polygons), a set of classification rules, based on the ratio of area overlap, was implemented. ID assignment was carried out nearly completely automatically, with manual assignment required in circa 7% of the cases. An example is shown in Figure 4. The final integrated map contained nearly 3050 (sub-)polygons describing the circa 2300 buildings of the study area.
Parent and children IDs were used to define "part-of" relations in case of multiple sub-footprints belonging to the same cadastral ID. The dataset was finally exported as a shapefile to be used in the successive 3D modelling phase.
Figure 4 Example of overlay and intersection between land cadastre parcels (red lines, with IDs) and topographic map polygons. The pink polygons need to be split, as they cover multiple cadastre IDs. Cadastral IDs are assigned automatically to the other polygons, depending on the ratio of overlapping area.
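The overlap-based classification rules are not detailed in the text. The following minimal sketch shows what such a rule set could look like, assuming the intersection areas between one topographic footprint and the overlapping cadastral parcels have already been computed by a GIS overlay. The function name and both thresholds (`accept`, `split_share`) are hypothetical values chosen for illustration only.

```python
def assign_cadastral_id(footprint_area, overlaps, accept=0.8, split_share=0.3):
    """Classify one topographic footprint by its overlap with cadastral parcels.

    footprint_area: area of the topographic footprint (m²)
    overlaps: dict parcel_id -> intersection area with the footprint (m²)
    accept, split_share: illustrative thresholds, not taken from the paper.
    Returns (parcel_id or None, status), status being "automatic",
    "split" (footprint spans several parcels) or "manual".
    """
    if footprint_area <= 0 or not overlaps:
        return None, "manual"
    ranked = sorted(overlaps.items(), key=lambda item: item[1], reverse=True)
    shares = [(pid, area / footprint_area) for pid, area in ranked]
    best_id, best_share = shares[0]
    if best_share >= accept:
        # One parcel clearly dominates: assign its ID automatically.
        return best_id, "automatic"
    substantial = [pid for pid, share in shares if share >= split_share]
    if len(substantial) >= 2:
        # Several parcels each cover a large share: candidate for splitting.
        return None, "split"
    return None, "manual"
```

For instance, a footprint of 100 m² overlapping parcel "p1" by 92 m² would be assigned to "p1", whereas overlaps of 55 m² and 42 m² with two parcels would flag it as a candidate for splitting.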

Step 2: 3D modelling
A schematic representation of the steps required to reconstruct the 3D building geometries is presented in Figure 5. Once the building footprints were prepared, they were used together with the DSM as input for the BuildingReconstruction 2012 software by VirtualCitySystem GmbH (B-REC), which allows for the automatic reconstruction of 3D buildings, offers some manual editing functions, and exports the results as CityGML-compliant files (or as 3D shapefiles).
Figure 5 Schematic representation of the steps required to reconstruct the 3D building geometries starting from a DSM and the vector-based building footprints.
Due to the DSM resolution (1 pt/m²) being much lower than the recommended one (≈10÷15 pt/m²) and even lower than the minimum required (4 pt/m²), a high degree of accurate automatic reconstruction could not be achieved. Therefore, around 40% of the buildings had to be edited manually, as they were partially incorrectly modelled, or not reconstructed at all. Once the manual editing was done, the global height RMSE of all buildings with respect to the input DSM was circa 0.7 m. By means of the 3DcityDB tools (3DcityDB), the reconstructed buildings were imported into a PostgreSQL 9.2/PostGIS 2.0 database.
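For reference, a height RMSE such as the one reported above can be computed as sketched below. How the height samples were taken (e.g. per DSM cell or per roof surface) is not stated in the text, so the two input lists are an assumption: they are simply paired heights, in metres, of the reconstructed model and of the DSM at the same locations.

```python
import math

def height_rmse(model_heights, dsm_heights):
    """Root mean square error between paired model and DSM heights (metres)."""
    if not model_heights or len(model_heights) != len(dsm_heights):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(m - d) ** 2 for m, d in zip(model_heights, dsm_heights)]
    return math.sqrt(sum(squared) / len(squared))
```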
For some buildings, it was also possible to reconstruct and integrate the interior rooms fully automatically, starting from a new spatial and non-spatial data format for the registry of buildings, which was officially adopted in December 2013 in the autonomous provinces of Trento and Bolzano/Bozen. In this case, FME Desktop by Safe Software was used for all data transformation and integration steps (Fronza et al., 2013). An example is given in Figure 6.
When it comes to the dataset containing the home address points, those points not intersecting the building footprints were corrected semi-automatically, in order to guarantee a clear assignment of each address to the corresponding building polygon. In some cases, uncertainties had to be resolved manually, using Google Streetview as reference or (in a few cases) by direct observation on site.
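The semi-automatic correction is not specified further. One plausible heuristic, sketched here under stated assumptions, is to keep points that already fall inside a footprint, to assign the remaining ones to the nearest footprint when it is close enough and clearly nearer than the second candidate, and to flag everything else for manual checking. The function names and the thresholds `max_dist` and `min_gap` are hypothetical; footprints are simple rings without holes.

```python
import math

def point_in_polygon(pt, ring):
    """Ray-casting test; ring is a list of (x, y) without repeated end vertex."""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def distance_to_ring(pt, ring):
    """Shortest distance from a point to the boundary of a ring."""
    best = math.inf
    n = len(ring)
    for i in range(n):
        (x1, y1), (x2, y2) = ring[i], ring[(i + 1) % n]
        dx, dy = x2 - x1, y2 - y1
        seg_len2 = dx * dx + dy * dy
        if seg_len2 == 0:
            t = 0.0
        else:
            t = ((pt[0] - x1) * dx + (pt[1] - y1) * dy) / seg_len2
            t = max(0.0, min(1.0, t))
        best = min(best, math.hypot(pt[0] - (x1 + t * dx), pt[1] - (y1 + t * dy)))
    return best

def assign_address(pt, footprints, max_dist=15.0, min_gap=3.0):
    """footprints: dict building_id -> ring. Thresholds are illustrative only."""
    for bid, ring in footprints.items():
        if point_in_polygon(pt, ring):
            return bid, "inside"
    ranked = sorted((distance_to_ring(pt, ring), bid) for bid, ring in footprints.items())
    if not ranked or ranked[0][0] > max_dist:
        return None, "manual"
    if len(ranked) > 1 and ranked[1][0] - ranked[0][0] < min_gap:
        # Two footprints are almost equally close: leave to the operator.
        return None, "manual"
    return ranked[0][1], "snapped"
```

Points returned with status "manual" correspond to the uncertain cases that, in this work, were resolved with Google Streetview or on site.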

Step 3: Non-spatial data integration
Once the geometric (and semantic) generation of the buildings was completed, the city model was enriched with the remaining datasets. Several m:n relations, as well as diachronic inconsistencies, had to be identified, quantified and taken care of, some automatically, some manually.
For all non-spatial datasets, a general harmonisation process was carried out, in which all semantic and structural incompatibilities had to be identified and resolved. Typical examples are table fields with different names but containing analogous data, or different formats adopted to store addresses. Attribute mapping rules were therefore defined. Data from the registry office (number of residents, number of families per home address) were coupled with the home address data (and eventually aggregated up to building level).
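Attribute mapping rules of this kind can be sketched as follows. All field names in `FIELD_MAP` and the address normalisation conventions are hypothetical examples, since the actual table schemas are not given in the text; the sketch only illustrates renaming analogous fields to a common schema and building a comparable key from differently formatted addresses.

```python
import re

# Hypothetical source field names mapped to a common target schema.
FIELD_MAP = {
    "via": "street",
    "indirizzo": "street",
    "street_name": "street",
    "civico": "house_number",
    "num_civ": "house_number",
    "particella": "cadastre_id",
}

def harmonise_record(record):
    """Rename known fields to the common schema and trim string values."""
    out = {}
    for key, value in record.items():
        name = key.strip().lower()
        target = FIELD_MAP.get(name, name)
        out[target] = value.strip() if isinstance(value, str) else value
    return out

def address_key(street, house_number):
    """Build a normalised key so that address variants compare equal."""
    street = re.sub(r"\s+", " ", street.strip().upper())
    street = re.sub(r"^V\.\s*", "VIA ", street)  # expand abbreviated "V."
    number = re.sub(r"[\s/\-]", "", str(house_number).upper())
    return f"{street}|{number}"
```

Two records written as "v. Roma 12/a" and "Via  Roma 12 A" would then share the key "VIA ROMA|12A" and could be joined or aggregated per address.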
With regard to the data contained in the registry of buildings, the most time-consuming task was to resolve the m:n relations for some buildings. Keeping in mind the differences between the land cadastre units and those in the registry of buildings mentioned before, in most cases one or multiple property IDs exist for each cadastre polygon. In these 1:1 or 1:m cases, the assignment can be performed automatically. This situation applied to circa 94% of the buildings in the test area, and is conceptually exemplified in Figure 7 (left).
In the remaining 6% of the cases, two or more distinct cadastre polygons share the same ID. Hence, which properties belong to which polygon is a problem that can be solved only manually, as no other reliable secondary keys (e.g. addresses, building class) can be used with certainty. A conceptual example is given in Figure 7 (right). It must be added that this is actually a well-known issue for which no definitive (automatic) solution has been proposed so far.
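The distinction between automatically and manually solvable cases can be expressed compactly. The sketch below assumes two simple input lists, (polygon, cadastre ID) pairs and (property unit, cadastre ID) pairs; the function and variable names are illustrative and do not reproduce the database procedure used in this work.

```python
from collections import defaultdict

def classify_cadastre_ids(polygon_ids, property_units):
    """Separate 1:1 / 1:m cases from those needing manual assignment.

    polygon_ids: iterable of (polygon_id, cadastre_id)
    property_units: iterable of (unit_id, cadastre_id)
    Returns (automatic, manual):
      automatic: polygon_id -> list of unit_ids
      manual: cadastre_id -> {"polygons": [...], "units": [...]}
    """
    polygons_by_id = defaultdict(list)
    for polygon_id, cadastre_id in polygon_ids:
        polygons_by_id[cadastre_id].append(polygon_id)
    units_by_id = defaultdict(list)
    for unit_id, cadastre_id in property_units:
        units_by_id[cadastre_id].append(unit_id)

    automatic, manual = {}, {}
    for cadastre_id, polygons in polygons_by_id.items():
        units = units_by_id.get(cadastre_id, [])
        if len(polygons) == 1:
            # A single polygon carries this ID: all units belong to it.
            automatic[polygons[0]] = units
        else:
            # Several polygons share the ID: units cannot be split reliably.
            manual[cadastre_id] = {"polygons": polygons, "units": units}
    return automatic, manual
```

The share of polygons ending up in `manual` gives a quick estimate of the manual effort to be expected before extending the integration to a larger area.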
Once the m:n relations were solved, the registry of buildings dataset could be queried to extract and automatically assign the number of above-ground and underground storeys, the number of rooms, and an average value for the building use (residential, commercial, industrial, etc.). Finally, all available data regarding the "history" of the building (construction date, restoration, refurbishment, etc.), as well as the Energy Performance Certificates (EPCs), were integrated and linked to the corresponding building.
From the 3D database, further values such as the building volume and the surface area of the outer and shared walls were computed and integrated as attributes as well.
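As a simplified illustration of such derived attributes, the following sketch computes volume and wall surface for a flat-roofed block obtained by extruding a footprint (i.e. a LoD1-like approximation). This is an assumption made for brevity: in this work the values were derived from the geometries stored in the 3D database, and shared walls are not handled here.

```python
import math

def block_metrics(ring, height):
    """Footprint area, volume and wall surface of an extruded footprint.

    ring: list of (x, y) vertices in metres, without repeated end vertex
    height: extrusion height in metres
    """
    n = len(ring)
    twice_area = 0.0
    perimeter = 0.0
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1          # shoelace formula term
        perimeter += math.hypot(x2 - x1, y2 - y1)
    area = abs(twice_area) / 2.0
    return {
        "footprint_area": area,
        "volume": area * height,
        "wall_area": perimeter * height,
    }
```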
All data were imported directly into the PostgreSQL database or linked by means of external references. External links to Google Streetview, OpenStreetMap as well as to the WebGIS of the city of Trento (providing oblique aerial imagery) were also added.
Figure 6 Automatic reconstruction of the interior rooms of a building, and successive integration in CityGML as LoD4 geometries. Visualisation in Google Earth of the exported dataset with the accompanying attributes.
Figure 7 Example of cardinality problems encountered during integration of cadastral cartography with the registry of buildings. While in most cases (94%) a direct assignment of the units in the registry of buildings can be carried out automatically due to 1:1 or 1:n relations, the remaining 6% have to be solved manually.

RESULTS
An example of the integrated model, exported as KMZ and visualised in Google Earth, is shown in Figure 8.
For each building, the user can click on the corresponding geometry and retrieve information, which is either presented directly in the balloon or can be retrieved online using a normal Web browser. This is the case for the tables containing the details of the "history" of the building, the registry of buildings, and the Energy Performance Certificates, which are accessed by means of a standard Apache/PHP server.
The user can further explore, compare (and check) the data visually by connecting to Google Streetview, OpenStreetMap or similar web services.
For each step of the data integration and harmonisation process, the effort in terms of time and resources was recorded in order to provide, at the end, an (approximate) quantitative reference and to help estimate how much it would cost to extend the approach to the whole city.
For the spatial data integration part, circa 6 man-months were necessary, while for the non-spatial data integration approximately 4 man-months were required. According to the experience gained in the test area, the overall time necessary to model the whole city of Trento can be estimated at about ten times as much.
Figure 8 Visualisation in Google Earth of the 3D virtual city model of the study area in Trento. The main attributes of each building can be accessed by means of balloons. Balloons also contain hyperlinks to external data sources (see Figure 9).
Figure 9 Example of information retrieval for a building: list of restoration, transformation and refurbishment activities carried out over time on the selected building (left); list of Energy Performance Certificates (EPCs) linked to the selected building (right).

CONCLUSIONS AND OUTLOOK
This paper has dealt with the creation of a LoD2 CityGML-compliant 3D virtual city model for a part of the city of Trento, Italy. One of the goals of this work was to test the overall feasibility of the project, given the sub-optimal input datasets used for the geometric 3D modelling. In addition, attention was paid to the resources (namely: time) required with regard to the results obtained. The concluding remarks can be summarised in the following points:
• Although the input data were sub-optimal, a sufficiently accurate city model (with regard to the buildings' geometries) was successfully generated, with a global RMSE of 0.7 m. Therefore, creating a LoD2 3D virtual city model from such data is indeed possible.
• Although limited to the study area, the approach can be extended to the whole city using the same data sources, as no study-area-specific solutions have been developed.
• The integration and harmonisation of data has led to mutual benefits for each newly added dataset, reducing redundancy and inconsistencies and adding more controls to check data quality.
• Work on the test area provided experience on how to change and better structure existing data to facilitate future integration.
• Although the initial effort to integrate all data was not negligible, it fully complies with the concept of "do once, use many". This means that any investment required for the 3D model creation would bring benefits to multiple organisations willing to adopt it as a standard platform for data organisation and sharing.
• The most time-consuming tasks were the preparation of the building footprint datasets and the 3D modelling, the latter due to the low-resolution DSM. In both cases, better and updated datasets are being prepared: the cadastral maps are being progressively updated by the cadastre office, while a high-resolution DSM from a new LiDAR flight (≥10 pt/m²) should be available in the course of 2014. This means that the time required could potentially be reduced significantly.
• Once the cadastral maps are updated, it will be possible to use them directly as footprint source. However, the hybrid approach presented in this paper, although more complex, allows a better 3D reconstruction to be reached, thanks to the sub-footprints contained in the topographic map. Which approach should finally be adopted is a decision to be evaluated also according to the set of applications the 3D model will be created for.
• As an application example, the enriched semantic city model has already been used as input to estimate the heating energy demand of each residential building in the study area, thus serving also as a basis for other energy-related simulation applications (Figure 10), similarly to the approach described in (Kaden and Kolbe, 2013).
• In the near future, integration with the existing datasets regarding the PV-suitability of the roofs, which were obtained in a previous project (Agugiaro et al., 2012; Nex et al., 2013), will be carried out, too.

Figure 1 Position of the city of Trento in Italy with the extents of the study area (Image source: Google Earth).