Multi-unit building address geocoding: An approach without indoor location reference data

Accurately mapped locations within multi-unit properties are useful for several organizations in today's society. Published work on geocoding methods either requires detailed location reference data or does not apply to multi-unit buildings. In this research, a generalizable method is realized to map apartment addresses to their explicit locations without access to indoor location reference data, based on publicly available address and geospatial building information. The performance of this approach is measured by conducting a comparative study between a linear interpolation baseline and a gradient-boosted decision trees model. The proposed method can successfully geocode addresses across different building shapes and sizes. Furthermore, the model significantly outperforms the baseline in terms of positional accuracy, proving the feasibility of approximating apartment locations from their address and geospatial building information.

Currently, most Geographic Information Systems (GIS) applications like Apple Maps and Bing Maps are still two-dimensional (2D). Three-dimensional (3D) systems like Google Earth only show building outlines and do not have the level of detail required to locate places within a building. Difficulties might arise when figuring out specific locations, as names and numbers might be inconsistently distributed across a building, making it unclear on which floor and where on that floor a unit is located. This can be inconvenient for delivery personnel or life-threatening for residents when first aid is required.
Published work on geocoding systems has been limited to the same types of methods and either requires detailed location reference data or only applies to streets and is therefore not applicable to multi-unit buildings (Zandbergen, 2008). This research realized a generalizable method to map apartment addresses to their explicit locations without access to indoor location reference data using linear interpolation and gradient-boosted decision trees.

| Research question
To what extent can gradient-boosted decision trees outperform linear interpolation in geocoding addresses in multi-unit residential buildings in terms of positional accuracy, using address and geospatial building information?

| Sub questions
The following questions will contribute to answering the main question:
1. To what extent do gradient-boosted decision trees outperform linear interpolation for floor-level prediction?
2. To what extent do gradient-boosted decision trees outperform linear interpolation for on-floor location prediction?
Furthermore, the following questions will be used to answer each sub-question:
a) How does linear interpolation perform in terms of mean absolute error (MAE), root mean squared error (RMSE), maximum error, and precision when compared with the ground truth?
b) How do gradient-boosted decision trees perform in terms of MAE, RMSE, maximum error, and precision when compared with the ground truth?
c) How do the gradient-boosted decision trees perform compared to linear interpolation?
d) What features play the most influential role?

| RELATED WORK
This section will give insights into the research gap where existing geocoding methods fall short as well as address state-of-the-art models used to solve problems with similar data.

| Geocoding methods
The published work on geocoding systems has been limited to the same set of interpolation methods and reference dataset methods for over two decades. The work of Zandbergen explains and compares these methods, consisting of linear street segment interpolation, parcel polygon determination, and address point mapping. Recent work focuses on the improvement of the reference datasets required by the latter two methods. This work uses state-of-the-art machine learning models to either optimize address naming to improve address point mapping (Fize et al., 2021; Lee et al., 2020; Matci & Avdan, 2018) or learn from images to more accurately identify parcels and other shapes for polygon mapping (Laumer et al., 2020; Wang et al., 2018; Yin et al., 2019). No work has been found that suggests another method to geocode addresses in the absence of such detailed reference data, and none of these methods can accurately geocode addresses located in multi-unit buildings.
Lee proposed a solution to geocoding addresses in multi-unit locations. The first step is to determine the correct hallway of a unit within a building based on the corresponding address range. Next, the distance from the hallway starting point to an approximate unit location is calculated using linear interpolation. Finally, a user-specified offset is used to further reflect on which side of the hallway the address is located (Lee, 2004). Lee subsequently proposed a more accurate addition to this method by mapping the approximated location to a known existing unit location (Lee, 2009). These methods both rely on the availability of detailed 3D building models as reference data to identify the correct hallways and/or existing unit locations.
Since there is no method that can reliably geocode addresses in multi-unit locations without the use of 3D detailed reference data, this research seeks to address that gap in the literature.
Considering the absence of indoor location reference data, state-of-the-art machine learning work in an architectural context shows the effective use of generative adversarial networks to generate floor plans (Chaillou, 2020; Nauata et al., 2021). However, these methods are not constrained to either a specific number of rooms or a given building shape. Hence, these models are unsuitable for generating indoor location reference data.
Following the work of Zandbergen, the overall quality of any geocoding result can be characterized by completeness, positional accuracy, and repeatability (Zandbergen, 2008). Completeness (also referred to as match rate) represents the percentage of records that can reliably be geocoded. Addresses that cannot, by any type of reference, be linked to a certain street or area, cannot be geocoded and therefore lower the completeness score for a particular method. Furthermore, positional accuracy indicates how close each geocoded point is to the "true" location of the address. Estimates of "typical" positional accuracy errors for single-unit residential addresses range from 25 to 168 m. This broad range can largely be explained by the differences between urban and rural residential areas. Finally, repeatability is used to compare different geocoding systems by observing how sensitive the geocoding results are to variations in the geospatial input data.

| Explainable supervised regression
In the Netherlands, apartment addresses in multi-unit buildings can intuitively be geocoded by the logic of their house number. These house numbers are unique identifiers along a street segment and can either be a single number or a combination of two numbers (a building number with a unit number). Several municipalities have policies concerning the naming and numbering conventions of addresses in multi-unit residential buildings (Gemeente Breda, 2012; Gemeente Heerhugowaard, 2015). However, not all municipalities have established such policies, and buildings built before the establishment of a given policy might have used different numbering rules than those built after it.
As an alternative to the aforementioned interpolation geocoding method, regression can be used to solve interpolation problems (Szymanowski & Kryza, 2012). This allows for more complex models to potentially learn policy and construction year patterns. Recent work by Luo et al. (2021) addresses the favorability of traditional machine learning approaches over complex deep learning methods in the context of supervised regression with heterogeneous tabular input data. Furthermore, recent work conducting comprehensive comparisons of state-of-the-art interpretable and uninterpretable models on several datasets reveals that CatBoost's implementation of an ensemble algorithm for gradient boosting on decision trees outperforms all other models on average (Bentéjac et al., 2021).

| METHODOLOGY
In this section, usage of the two chosen methods is argued, after which their implementation and a new location representation is explained. Finally, the evaluation process is described.

| Interpolation baseline
All state-of-the-art geocoding methods either require detailed location reference data or do not apply to multi-unit buildings. Since building a model to generate such reference data is out of scope, none of these methods can form an appropriate baseline. However, interpolation is a recurring concept, used both as a geocoding method itself (Lee, 2004; Zandbergen, 2008) and in state-of-the-art geocoding methods to fill "gaps" where the proposed more advanced models fall short (Laumer et al., 2020; Lee et al., 2020). In the context of multi-unit buildings and streets, interpolation is used to approximate address locations on a hallway or street segment within a known address range. Similarly, but without the availability of such a known segment, this concept will be used as a baseline to interpolate locations on a universally applicable building space, based only on the publicly known exterior information of the building.
Following the geocoding accuracy metrics addressed by Zandbergen, only positional accuracy is a relevant geocoding validation metric for this research, since the addresses known by Kadaster are complete and the goal of this article is not to compare multiple input datasets. As for a baseline performance indication, neither of the multi-unit building geocoding methods has been evaluated on more than a single building and both methods lack an evaluation of positional accuracy. However, a median positional accuracy error of 38 m has been identified for linear interpolation when used to geocode addresses in buildings along street segments in urban areas and will serve as a baseline performance indication (Zandbergen, 2009).

| CatBoost regression
CatBoost's implementation of gradient-boosted decision trees has a proven track record in the context of several different and similarly sized problems (Bentéjac et al., 2021; Luo et al., 2021). Since the proposed solution is novel, the explainability of the underlying decision trees can help in investigating performance and identifying which variables play the most significant role in location predictions. Furthermore, Kadaster's services are required by law to be accurate, and Kadaster is therefore required to be able to explain the decision-making path of automated results.
As the goal of this research is to create a generalizable way to geocode locations, the features used for training the gradient-boosted decision trees are limited to publicly available information originating from one of Kadaster's key registers. Furthermore, the hyperparameters used for tuning will be based on a merge of the experimental setups of the previously mentioned comparisons (Bentéjac et al., 2021; Luo et al., 2021).

| Research method
A comparative study is conducted after splitting the problem into two sub-problems: (1) floor-level prediction and (2) on-floor location prediction. The problems are solved and evaluated separately since both pieces of information are relevant and if one of the two tasks shows lower reliability, the reliability of the other task does not suffer. Finally, the results are combined to evaluate performance across the overall problem.
1. In the case of floor-level prediction, as a baseline, the number of floors will be approximated and the house numbers will be interpolated across these floors. The house numbers of the addresses will be used to distribute the apartments equally across the approximated floors from bottom to top in ascending order. Furthermore, a CatBoost model will be trained with the actual floor as label and featurized publicly available (geospatial) building/address information as input. Since the number of floors can technically, but not realistically, be unbounded, the model cannot be bounded to a fixed range of prediction labels. Therefore, the model's predictions can also fall below 0. These values will be set to 0. Furthermore, all predicted labels will be rounded, since floors are discrete values.
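The floor-level baseline and the post-processing of model outputs described above can be illustrated with a minimal sketch. The function names, the convention that the ground floor has index 0, and the choice to round ties upward are assumptions made for illustration; the source only states that apartments are distributed equally in ascending order and that predictions are clipped at 0 and rounded.

```python
import math


def interpolate_floors(house_numbers, n_floors):
    """Distribute apartments equally over n_floors, bottom to top.

    The lowest house numbers are placed on the lowest floor (index 0,
    an assumed convention); the result keeps the input order.
    """
    n_floors = max(1, n_floors)
    count = len(house_numbers)
    # Rank the apartments by ascending house number.
    order = sorted(range(count), key=lambda i: house_numbers[i])
    floors = [0] * count
    for rank, idx in enumerate(order):
        # Equal-sized groups of consecutive ranks share one floor.
        floors[idx] = rank * n_floors // count
    return floors


def postprocess_floor_predictions(raw_predictions):
    """Round regression outputs to whole floors and clip negatives to 0."""
    # floor(v + 0.5) rounds ties upward (assumed tie rule).
    return [max(0, math.floor(v + 0.5)) for v in raw_predictions]
```

For example, six apartments numbered 1 to 6 in a building with three approximated floors would be placed two per floor.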
2. As for the on-floor location prediction, a generalizable space on which apartment locations can be represented must be determined, and this space has to be consistent and applicable across buildings. Since every building has a different shape, floor plan shapes can vary significantly, and the 2D representation of a floor is therefore unsuitable as a general space to place addresses on. To ensure a consistent space for apartment locations, the assumption is made that every apartment is directly connected to the outside world in some way (e.g., by a door or window). This allows the building outline to serve as a one-dimensional space. Although the outline varies in length between buildings, expressing an apartment's position as a relative distance to a consistent starting point allows for generalizability across buildings. The only building types that might cause issues are buildings with multiple connections to the outside world, for example, buildings with apartments only connected to a courtyard. Therefore, these buildings will be manually discarded from the dataset.
Furthermore, a consistent starting place on the building outline will have to be determined. Like before, the starting place has to be consistent across different buildings and generalizable in the sense that it can be identified without access to floor plans. Therefore, the corner of the building located on the lower end of the numbering direction directly facing the adjacent street will be selected as the starting point.
To keep the relative distance to the starting point consistent for both sides of the building, the outline will be split in half. Apartments located in the clockwise half will receive a positive distance value between 0 and 100. Apartments located in the counterclockwise half will receive a relative distance value between 0 and −100. This way, the magnitude of the value will reflect the relative distance to the starting point equally for both sides of the building, and the sign will reflect the side on which the apartment is located. Figure 1 shows the visualization of the determined "true" location for a small set of apartments. The green corner represents the starting point, the blue line represents the space following the clockwise numbering direction of the adjacent street, the red line represents the counterclockwise numbering direction of the adjacent street, and the yellow dots represent the closest location for an apartment on the building outline.

FIGURE 1 Locations for a set of apartments on a single floor.
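The signed relative distance can be sketched as follows. This is an illustration under stated assumptions: the position is given as an arc length measured clockwise along the outline from the starting corner, and a point exactly halfway around the outline is assigned to the positive side. The function name and its parameters are hypothetical.

```python
def signed_relative_distance(arc_clockwise, perimeter):
    """Map a position on the building outline to the range [-100, 100].

    arc_clockwise: distance walked clockwise along the outline from the
        starting corner to the apartment's outline point (same unit as
        perimeter). This input convention is an assumption.
    perimeter: total length of the building outline.
    """
    if perimeter <= 0:
        raise ValueError("perimeter must be positive")
    s = arc_clockwise % perimeter
    half = perimeter / 2.0
    if s <= half:
        # Clockwise half: positive share of the half-outline.
        return 100.0 * s / half
    # Counterclockwise half: negative share, measured the other way round.
    return -100.0 * (perimeter - s) / half
```

With an outline of 80 m, a point 10 m clockwise from the starting corner maps to 25, while a point 10 m counterclockwise from it maps to −25.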
As a baseline, the house numbers will be interpolated in turn around the building outline in clockwise and counterclockwise directions. Finally, as opposed to the floor predictions, the range of output values for on-floor locations is fixed. To realize a model that will not predict labels outside the range of −100 to 100, a CatBoost model will be trained that is bounded to a range of outcomes. This can be achieved by training a binary CatBoost classifier with a cross-entropy loss function, which acts similarly to a logistic regression model and only allows for labels between 0 and 1. Using this method, a multi-label output model will be trained to predict the following two target variables:
1. The distance to the starting point (0-100), which will be normalized to fit within the allowed range of 0 to 1 and reflects the probability that an apartment lies further along the building outline.
2. The probability of the apartment being located on the street side. An output of 1 represents the street side and an output of 0 represents the opposite side.
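To make the two bounded targets concrete, the sketch below encodes a location as two values in the range 0 to 1 and decodes model outputs back again. The helper names and the 0.5 decision threshold for the street side are assumptions; the source does not specify how the side probability is turned into a decision.

```python
def encode_targets(distance, on_street_side):
    """Turn an on-floor location into two labels within [0, 1].

    distance: absolute relative distance to the starting point (0-100).
    on_street_side: True if the apartment lies on the street side.
    """
    if not 0.0 <= distance <= 100.0:
        raise ValueError("distance must lie between 0 and 100")
    return (distance / 100.0, 1.0 if on_street_side else 0.0)


def decode_outputs(p_distance, p_street_side, threshold=0.5):
    """Turn two model outputs in [0, 1] back into a distance and a side.

    The threshold is an assumed decision rule, not taken from the source.
    """
    clipped = min(1.0, max(0.0, p_distance))
    return (100.0 * clipped, p_street_side >= threshold)
```

Because both outputs are probabilities, the decoded distance can never leave the 0-100 range, which is the property the bounded model is meant to guarantee.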

| Results evaluation
Both sub-problem results will be compared using the maximum error, MAE, RMSE, and precision. The maximum error is the largest absolute difference between the true and predicted values. This gives insight into the worst possible prediction. MAE gives the same weight to all errors and is calculated as follows:

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_{\mathrm{true},i} - y_{\mathrm{pred},i}\right|$$

RMSE gives more weight to larger absolute errors, causing variance to be penalized. This makes it more suitable for analyzing the performance of models that have many small errors and is calculated using the equation below:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_{\mathrm{true},i} - y_{\mathrm{pred},i}\right)^{2}}$$

For both equations, and depending on the sub-problem being evaluated, y_pred either represents a predicted value for a floor level or an on-floor location, whereas y_true represents the corresponding true value and n is the number of samples.
In the case of floor-level prediction, the output label is a discrete value. Therefore, precision will be used to measure the effectiveness of precisely determining the correct floor.
As for the on-floor predictions, precision will be used to investigate the model's performance in determining the side of the building and the absolute location value will be used to calculate the other mentioned evaluation metrics to reflect the effectiveness of determining the distance to the starting point.
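The evaluation metrics can be computed with a few lines of code. The following is a minimal sketch; in particular, interpreting precision for discrete labels as the share of exactly matching predictions is an assumption, as the source does not give a formula for it.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error: every error carries the same weight."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: larger errors weigh more heavily."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def max_error(y_true, y_pred):
    """Largest absolute difference, i.e. the worst single prediction."""
    return max(abs(t - p) for t, p in zip(y_true, y_pred))


def exact_match_rate(y_true, y_pred):
    """Share of predictions equal to the true label (assumed reading of
    'precision' for discrete labels such as floors or building sides)."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return hits / len(y_true)
```

For true floors 0, 1, 2, 3 and predicted floors 0, 1, 3, 5, the MAE is 0.75, the maximum error is 2, and half of the floors are matched exactly.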
To investigate what features play the most significant role, Shapley additive explanations (SHAP) will be used.
SHAP measures the contribution of each feature in changing the output value up or down in comparison to the mean output value from the training data.
Furthermore, to answer the main question, x, y, and z-coordinates will be approximated to evaluate positional accuracy and to conduct a median performance comparison with the baseline indication from related work. The vertical z-coordinate will be calculated using an average floor height of 3 m, which is based on the minimum Dutch floor height of 2.6 m and an apartment floor depth of 0.4 m (Ministerie van Binnenlandse Zaken en Koninkrijksrelaties, 2003; VBI, 2020). The horizontal x, y-coordinate will be calculated by moving the apartment back into the building toward its medial axis (Figure 2), using a previously determined average offset, and translating it to a real-world coordinate in the national triangulation system (RD), which gives an x, y location in meters.
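Once coordinates in meters are available, the positional error per dimension follows directly. The sketch below assumes that the x, y-coordinates have already been derived as described and that the ground floor has index 0; the function names are hypothetical.

```python
import math

AVERAGE_FLOOR_HEIGHT_M = 3.0  # 2.6 m minimum floor height + 0.4 m floor depth


def floor_to_z(floor_level):
    """Approximate the vertical coordinate (m) of a floor level."""
    return floor_level * AVERAGE_FLOOR_HEIGHT_M


def positional_errors(true_xy, true_floor, pred_xy, pred_floor):
    """Return the (horizontal, vertical, combined) positional error in m."""
    horizontal = math.hypot(pred_xy[0] - true_xy[0], pred_xy[1] - true_xy[1])
    vertical = abs(floor_to_z(pred_floor) - floor_to_z(true_floor))
    # The combined error is the Euclidean distance in three dimensions.
    combined = math.sqrt(horizontal ** 2 + vertical ** 2)
    return horizontal, vertical, combined
```

A prediction that is 5 m off horizontally and four floors off vertically (12 m) would thus have a combined positional error of 13 m.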
The vertical, horizontal, and combined positional accuracy will also be evaluated in the context of their building sizes, which are building height (m), floor size (m²), and the product of the two (m³), respectively. Floor size will be deduced by taking the sum of the sizes of all apartments for each separate floor in the building, after which the median is chosen as the representative floor size for the building. The median is less sensitive to outliers, making it robust against unusually large or small floors.
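The representative floor size can be sketched as follows, assuming each apartment is available as a pair of floor level and apartment size in square meters. The input layout and function names are assumptions for illustration.

```python
from collections import defaultdict
from statistics import median


def representative_floor_size(apartments):
    """Median of the per-floor sums of apartment sizes (m2).

    apartments: iterable of (floor_level, apartment_size_m2) pairs.
    """
    per_floor = defaultdict(float)
    for floor_level, size_m2 in apartments:
        per_floor[floor_level] += size_m2
    return median(per_floor.values())


def building_volume(height_m, floor_size_m2):
    """Building height multiplied by the representative floor size (m3)."""
    return height_m * floor_size_m2
```

In a building with floors of 110, 110, and 200 m², the median keeps the representative floor size at 110 m² despite the one large floor.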
Finally, an appropriate significance test will be identified to determine the significance of the positional accuracy results between the baseline and gradient-boosted decision trees. The lightweight nature of the models allows for running experiments multiple times.

| EXPERIMENTAL SETUP
This section describes the data that is used in this research. First, information about its origin will be provided.
Second, the ground-truth preparation process is described. Next, an explanation is given as to how the outcome quality of this automated process was assured. Finally, data analysis is conducted and the used features/hyperparameters are described.

| Data sources
Kadaster is the main land administration organization in the Netherlands that collects and registers administrative and spatial data on properties and the rights involved. They are responsible for all six Dutch key registers and in doing so, they protect legal certainty. One of these databases is the "Basisregistratie Adressen en Gebouwen" (BAG). This is a public GIS containing information on all addresses and buildings in the Netherlands, for example, their location, size, purpose, and construction year. Furthermore, Kadaster also holds the notarial deeds for bought properties. These are stored and protected under the "Basisregistratie Kadaster" (BRK), which is responsible for the registration and administration of all legal space in the Netherlands. These deeds are unstructured documents composed by government officials describing the ownership of a property and commonly also contain information on the size and location of the specific property. It is not possible to own parts of a property. Therefore, in the case of a multi-unit building, the property is split up into apartment rights, which give the exclusive right to use certain parts of the building. These apartment rights are contained in apartment deeds and can also hold information about the floor on which a certain unit is located. In 1973, a law was enacted that requires the inclusion of floor plans with apartment deeds; these must visualize how units are distributed across a building (Kadaster, 2022).

| Data preparation
Due to the unstructured and raw nature of the data, thorough pre-processing is required to generate a dataset eligible for apartment geocoding. The preparation is performed using Python 3.9.7. Data is provided by Kadaster, as a download from the BRK system for apartment buildings in the provincial capitals of the Netherlands.
FIGURE 2 A full shift from building outline to medial axis.
Both the deeds and floor plans are stored as PDF files. Therefore, to build the datasets, the relevant text and numbers will have to be extracted from the image-type files. Extracting textual information from images is referred to as Optical Character Recognition (OCR) (Wu et al., 1997). Recently, the Data Science team at Kadaster conducted a comprehensive comparison of OCR tool performance for text extraction on Dutch notarial documents. In that comparison, Azure Computer Vision (ACV) achieved the highest accuracy (Kadaster Labs, 2021). Furthermore, Kadaster already has a legal partnership with Microsoft to guarantee data protection when processing privacy-sensitive documents like deeds in their cloud environment, assuring GDPR compliance. Figure 3 contains an example describing a unit's location encapsulated in a deed. The raw dataset initially contains the deeds and floor plans for 144 buildings.

| Deeds
To build a dataset for floor-level predictions, the deeds are loaded, and text is extracted using ACV. Next, the texts are processed to extract the floor of an apartment. Every text relates to a single apartment, and the floor extraction is done using simple Natural Language Processing by looking for keywords in the text and defining a set of rules on how to handle the relevant parts found. First, words relating to a floor are sought in the text to find sections containing elevation information ("verdieping," "woonlaag," "etage," and "bouwlaag"). The previously shown deed example reveals a complexity: the text can contain multiple floor information sections, as apartment rights often also include some form of storage or parking space location information, and these spaces are commonly located on different floors than the apartment itself. Therefore, the text is split on each mention of elevation information, and the text before the mention is checked to see whether living space is mentioned, since "woning" is commonly stated before the floor level. Finally, the first written number between the living space and the elevation information is extracted and mapped to the respective numerical representation. Deeds for which an apartment's floor cannot be extracted based on these rules are discarded from the dataset.
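The rule-based extraction can be illustrated with a minimal sketch. It is not the implementation used in this research: the keyword list assumes the Dutch storey terms "verdieping", "woonlaag", "etage", and "bouwlaag", the ordinal dictionary is limited to ten written numbers, and the function name is hypothetical.

```python
import re

# Assumed Dutch terms signalling elevation information.
FLOOR_KEYWORDS = ("verdieping", "woonlaag", "etage", "bouwlaag")
# Written ordinals mapped to floor numbers (illustrative subset).
ORDINALS = {
    "eerste": 1, "tweede": 2, "derde": 3, "vierde": 4, "vijfde": 5,
    "zesde": 6, "zevende": 7, "achtste": 8, "negende": 9, "tiende": 10,
}
KEYWORD_PATTERN = re.compile("|".join(FLOOR_KEYWORDS))


def extract_floor(deed_text):
    """Return the apartment's floor, or None if the rules do not match."""
    lowered = deed_text.lower()
    start = 0
    for match in KEYWORD_PATTERN.finditer(lowered):
        # Text between the previous elevation mention and this one.
        segment = lowered[start:match.start()]
        start = match.end()
        position = segment.find("woning")
        if position == -1:
            continue  # e.g. a storage or parking space, not the dwelling
        # First written number between "woning" and the elevation keyword.
        for word in re.findall(r"[a-z]+", segment[position:]):
            if word in ORDINALS:
                return ORDINALS[word]
    return None
```

A deed stating "de berging op de eerste verdieping en de woning op de vijfde etage" would yield floor 5, because the first elevation mention is not preceded by a living space.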

| Floor plans
As for the extraction of the on-floor location on floor plans, Computer Vision (CV) techniques are used to isolate the relevant information. The provided floor plans commonly contain multiple floors on a single page; therefore, OpenCV is used to split the file into separate floor images by recognizing shape contours and drawing a convex hull around them to generate floor building outlines and fill potential gaps in their respective contours. Multiple versions of the separated floors are generated using OpenCV's morphology functions to remove noise like small lines and dots to potentially recognize more information. The morphology settings can be found in Appendix C (Figure C1). Next, ACV is used to locate the numbers on the different versions of the floor plans and the text below them to recognize the respective floor. Next, the locations of the apartment numbers on their floors are shifted to the closest point on the building outline. The average shift distance is stored and will later be used as an offset to place apartments back into the building from the border.

FIGURE 3 A piece of a notarial apartment deed revealing the floor of the respective apartment.
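Shifting an apartment number to the closest point on the building outline amounts to projecting a point onto the edges of a polygon. The following is a minimal sketch in plain Python, assuming the outline is available as an ordered list of (x, y) vertices; the function names are hypothetical and no OpenCV calls are involved.

```python
import math


def closest_point_on_outline(point, outline):
    """Project a point onto a polygon outline.

    outline: ordered list of (x, y) vertices; the last vertex is
        implicitly connected to the first.
    Returns ((x, y), shift_distance) for the nearest outline point.
    """
    px, py = point
    best = None
    for i in range(len(outline)):
        ax, ay = outline[i]
        bx, by = outline[(i + 1) % len(outline)]
        dx, dy = bx - ax, by - ay
        length_sq = dx * dx + dy * dy
        # Position of the projection along the edge, clamped to the edge.
        t = 0.0 if length_sq == 0 else ((px - ax) * dx + (py - ay) * dy) / length_sq
        t = min(1.0, max(0.0, t))
        cx, cy = ax + t * dx, ay + t * dy
        distance = math.hypot(px - cx, py - cy)
        if best is None or distance < best[1]:
            best = ((cx, cy), distance)
    return best


def average_shift(points, outline):
    """Mean shift distance, usable later as an inward offset."""
    shifts = [closest_point_on_outline(p, outline)[1] for p in points]
    return sum(shifts) / len(shifts)
```

For a 10 by 10 square outline, a number located at (2, 5) is shifted 2 units to the outline point (0, 5).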
The starting point is determined by matching the shape of the building's bottom floor with the actual building geometry retrieved from the BAG. The geometry retrieved from the BAG is a polygon reflecting the actual orientation and shape of the real-world object. Both shapes are scaled to the same size, and the floor shape is rotated around its centroid until the shapes are most similar. Next, the location of the adjacent street in relation to the building is determined using the "Basisregistratie Grootschalige Topografie" (BGT), and the neighboring buildings and their location on the street are retrieved to determine the numbering direction and the corresponding building corner location.
Finally, after determining the starting point, the percent distance from the starting point to the apartment index locations on the building outline is calculated.

| Quality assurance
To ensure that the final dataset adheres to the previously described steps and is as accurate as possible, error-prone entries are removed. Buildings that are built across multiple parcels or contain apartments that are addressed by different streets are not included in the dataset. Furthermore, the orientation of a building cannot be determined automatically if the building and parcel outlines are both symmetrical about any axis; hence, these buildings are manually discarded from the dataset. As for the on-floor location, if the floor plan contains a large amount of non-building space (e.g., garden or parking spots) and apartments on that floor are located on both sides of the building, the apartments on that floor might be shifted to the wrong side of the building; therefore, these floors, as well as floors below the ground floor, are removed. Finally, low-quality floor plans result in fewer apartments being found by ACV. After plotting the completeness of apartments found in buildings, a gap in floor plan quality can be identified, as shown in Figure 4. Buildings below the 22%-35% completeness gap were removed from the dataset, resulting in the removal of 96 apartments across 25 buildings.
The real-world orientation, street side, and numbering direction determination were verified for the remaining 70 buildings. As a result, the building orientation had to be manually adjusted two times and the street side 25 times.

| Data analysis
To identify relevant significance tests, the normality of the data is investigated. First, the normality of the floor labels was investigated visually (Figure 5). The histogram shows a left-skewed bell shape that seems to represent an F-distribution. Furthermore, the QQ-plot shows that the labels represent a systematic pattern that does not follow a straight line. Using the Shapiro-Wilk test, the tested null hypothesis is that the data come from a normally distributed population (H0: P is a normal distribution). The resulting p-value of <0.001 is smaller than alpha = 0.05. Therefore, H0 is rejected. Based on the Shapiro-Wilk test and the investigated plots, it can be assumed that the floor levels have not been sampled from a normal distribution.
As for the on-floor locations, the distribution of an apartment's building side is somewhat skewed toward the street side, with 1778 samples as opposed to 1307 on the other side. Furthermore, as shown in Figure 6, the histogram of the distance on the building outline shows a left-skewed shape, which seems to represent a lognormal distribution.
The QQ-plot shows that the labels represent a systematic pattern that does not follow a straight line. Combined with a Shapiro-Wilk test result of p-value <0.001 under alpha = 0.05, it can be assumed that the on-floor location distances have not been sampled from a normal distribution.
Since both datasets are assumed to not follow a normal distribution and both the baseline and model predictions will be calculated on the same sample during each experiment, a paired nonparametric Wilcoxon signed-rank test will be used to determine the significance of the results. The minimum sample size for this test is 16 (Dwivedi et al., 2017); therefore, the experiments will be conducted 16 times to meet this requirement. A single experiment will consist of the following:
1. Randomizing all samples and creating an 80:20 stratified building train/test split based on municipality distribution.
2. Applying a 5-fold cross-validation grid search within the train split to find the optimal hyperparameters. The parameters will be tuned by minimizing the results of CatBoost's implementation of the following two objective functions: RMSE for floor-level predictions and MultiCrossEntropy for on-floor location predictions. Furthermore, early stopping is set to 100 iterations to prevent over-fitting.
3. Predicting the locations of the test split using the previously determined optimal hyperparameters.
4. Linearly interpolating the locations of the test split.
5. Calculating the validation metrics and positional accuracy.
Finally, the positional accuracy significance is determined.

FIGURE 5 Normality plots for floor-level labels.

FIGURE 6 Normality plots for on-floor distance location labels.
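The first step of an experiment, a train/test split at building level that preserves the municipality distribution, could look like the sketch below. It is one possible reading of the described split rather than the procedure actually used: whole buildings are assigned to either split so that apartments of one building never appear in both, and the function name, input layout, and seed handling are assumptions.

```python
import random
from collections import defaultdict


def stratified_building_split(building_municipality, test_fraction=0.2, seed=0):
    """Split buildings into train and test sets per municipality.

    building_municipality: dict mapping a building id to its municipality.
    Returns (train_buildings, test_buildings) as two disjoint sets.
    """
    rng = random.Random(seed)
    per_municipality = defaultdict(list)
    for building, municipality in building_municipality.items():
        per_municipality[municipality].append(building)

    train, test = set(), set()
    for municipality in sorted(per_municipality):
        buildings = sorted(per_municipality[municipality])
        rng.shuffle(buildings)  # randomize within the municipality
        n_test = round(len(buildings) * test_fraction)
        test.update(buildings[:n_test])
        train.update(buildings[n_test:])
    return train, test
```

Repeating this with 16 different seeds would give the 16 paired experiment results on which the Wilcoxon signed-rank test is applied.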

| Features and hyperparameters
Tables 1 and 2 display all hyperparameters tuned and features used, respectively. The number of floors will be approximated using the building height, by dividing it by the average Dutch floor height (3 m) and rounding it. Decision trees are immune to feature magnitude; therefore, the quantitative features will not be normalized. However, due to the house numbers potentially starting with varying offsets as buildings are located differently in a street address range, all house numbers will be standardized to start at 0 by subtracting the lowest house number in the building. Finally, multiple postal codes can occur in a single building and will therefore be indexed alphabetically starting from 0 within each building. The apartment count distribution across datasets for different municipalities and construction years is found in Figures 7 and 8.
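The feature preparation steps described above can be sketched per building as follows. The function names are hypothetical, and two details are assumptions: ties are rounded upward when approximating the number of floors, and a building is given at least one floor.

```python
import math


def approximate_floor_count(building_height_m, floor_height_m=3.0):
    """Approximate the number of floors from the building height."""
    # floor(x + 0.5) rounds ties upward; at least one floor is assumed.
    return max(1, math.floor(building_height_m / floor_height_m + 0.5))


def standardize_house_numbers(house_numbers):
    """Shift house numbers so that the lowest one in the building is 0."""
    lowest = min(house_numbers)
    return [number - lowest for number in house_numbers]


def index_postal_codes(postal_codes):
    """Replace postal codes by their alphabetical index within a building."""
    index = {code: i for i, code in enumerate(sorted(set(postal_codes)))}
    return [index[code] for code in postal_codes]
```

For a 14 m high building with house numbers 12, 14, and 16, this gives five approximated floors and standardized house numbers 0, 2, and 4.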

| Floor level
Figures 9 and 10 display the average performance and prediction error across different floor levels for interpolation and CatBoost, respectively. Both methods seem to be able to predict increasingly higher floors as the actual floor levels increase. However, after floor level 10, only the interpolation can keep following this trend.
Furthermore, when evaluating both methods' precision using Figure 11, the graph confirms that interpolation can cope better with higher-level floors than the CatBoost model. Table 3 displays the mean performance metrics across all experiments, which, similar to the graphs, intuitively reflect the favorability of the baseline linear interpolation method.
Finally, Figure 12 shows the top three features with the highest mean absolute impact in descending order.

| On-floor location
[...] error range of the CatBoost model is generally smaller. Furthermore, as opposed to the interpolation method, the CatBoost model's predictions also somewhat follow the upward trend, predicting higher distances when the actual distances also increase.
When evaluating both methods' precision in building side prediction using Figure 15, the graph shows a significantly higher average precision rate for the CatBoost model. Table 4 displays the mean performance metrics across all experiments, which intuitively reflect the favorability of the CatBoost model.
Finally, Figures 16 and 17 show that the standardized house number and the size of an apartment have the most significant impact on predicting on-floor locations. Both features appear to be positively correlated with distance to [...] Figures B2 and B3).

FIGURE 7 Municipality apartment count distribution.

FIGURE 8 Construction year apartment count distribution.

FIGURE 9 Interpolation performance for floor levels.

| Positional accuracy
Tables 5-7 display the median positional accuracy of both methods across all three dimensions (vertical, horizontal, and combined) and indicate whether the difference between the two methods is significant at alpha = 0.05 according to the Wilcoxon signed-rank test. Furthermore, Figure 18 displays the positional accuracy in relation to each sample's building size to show whether the positional error is consistent across different building sizes.

FIGURE 14 CatBoost performance for on-floor locations.

The results show that CatBoost does not have a significantly better positional accuracy for vertical location predictions. However, horizontally and overall, CatBoost does show a significantly lower positional error. Furthermore, the positional error across all three dimensions seems to be relatively higher for the smallest buildings in the dataset. All experiment results, including the optimal hyperparameters, are found in Appendix A (Tables A1-A8).
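
To make the significance check concrete, the following is a minimal pure-Python sketch of a two-sided Wilcoxon signed-rank test on paired positional errors, using the normal approximation without tie or continuity correction. The text does not state which implementation or variant the study used, so the function name, the approximation, and the error values below are assumptions for illustration only.

```python
import math
from statistics import median


def wilcoxon_signed_rank(errors_a, errors_b):
    """Two-sided Wilcoxon signed-rank test (normal approximation).

    Returns (w, p_value), where w is the smaller of the positive and
    negative rank sums. Zero differences are dropped and tied absolute
    differences receive their average rank. The p-value is approximate.
    """
    diffs = [a - b for a, b in zip(errors_a, errors_b) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank the absolute differences, averaging ranks over ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)

    mean_w = n * (n + 1) / 4
    sd_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w - mean_w) / sd_w
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return w, p_value


# Made-up paired errors in meters (same apartments, two methods).
interpolation_errors = [18.2, 21.5, 9.8, 30.1, 15.4, 12.7, 25.3, 19.9, 14.1, 22.6]
catboost_errors = [12.1, 15.0, 9.1, 20.4, 11.3, 10.2, 17.8, 13.5, 12.9, 16.0]

w, p_value = wilcoxon_signed_rank(interpolation_errors, catboost_errors)
median_interpolation = median(interpolation_errors)
median_catboost = median(catboost_errors)
significant = p_value < 0.05
```

A paired test fits this comparison because both methods are evaluated on the same apartments, and the signed-rank variant does not assume that the error differences are normally distributed.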

| DISCUSSION
A question that arises from the results is why CatBoost does not outperform linear interpolation in the context of floor-level predictions. The overall difference between the two methods is not statistically significant, which can be explained by the lack of samples used for training and the underlying decision tree structures. Furthermore, this section describes the impact of building side predictions, a comparison with related results, the limitations in terms of reliability of the results, and finally validity in terms of generalizability.

FIGURE 17 Top three features with the highest impact on building side predictions.

| Lack of samples
CatBoost is a more sophisticated model than linear interpolation; therefore, it could potentially learn more complex patterns from the input data. However, CatBoost relies on observed data for decision-making and, as shown in Figure 19, the dataset contains relatively few buildings with more than approximately 10 floors. Due to the small number of observed samples for higher-level floors, the model is unable to predict these properly.
Next, it is notable that house_letter is not among the most impactful features for floor-level predictions, even though in some municipalities this feature directly corresponds to certain floors (e.g., A is the first floor in Utrecht). This can also be explained by the lack of observed samples, since only 9% of the apartments used contain a house letter.
Furthermore, over 40% of the initial dataset was disregarded because its textual structure could not be processed effectively by the automated pre-processing pipeline. It is assumed that the removal of such a large portion of the buildings had a negative impact on the final accuracy, as the models had significantly fewer examples from which to deduce patterns. Improvements to this pipeline are recommended so that a larger portion of the dataset can be used.
Scaling the experiment up to a larger dataset will require more work, as manual correction of the building orientation was required for 36% of cases. Therefore, pipeline improvements should focus on processing texts for floor-level labels more effectively and on identifying street sides more accurately for real-world orientation determination.
For a smaller dataset, the interpolation method is most useful, as no training is required. Once a model has been developed, the proposed approach can geocode addresses in new buildings, as opposed to existing multi-unit geocoding methods, where every new building requires manual work.

| Decision tree structure
When investigating the structure of CatBoost's underlying decision trees, it is noticeable that every node bases its decision on a single feature compared against a learned value. An example tree branch, with a floor-level regression output, is shown in Figure 20. Since the model internally does not use feature combinations as separate features or take into account the interaction strength of each feature pair, it is not able to learn certain contextual information. For example, if an apartment surface is bigger than 90% of the parcel surface, it could be the penthouse on the highest floor.
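
One possible way to expose such context to a model that splits on single features is to precompute the combination as its own feature. The sketch below illustrates the idea for the penthouse example; it is not part of the study's method, and the field names (`apartment_surface_m2`, `parcel_surface_m2`, `surface_ratio`) and values are hypothetical.

```python
def add_surface_ratio(samples):
    """Return copies of the samples with an explicit apartment/parcel surface ratio."""
    return [
        {**s, "surface_ratio": s["apartment_surface_m2"] / s["parcel_surface_m2"]}
        for s in samples
    ]


# Illustrative values only: both apartments have the same surface area,
# so no threshold on apartment_surface_m2 alone can tell them apart.
samples = [
    {"apartment_surface_m2": 180.0, "parcel_surface_m2": 190.0},
    {"apartment_surface_m2": 180.0, "parcel_surface_m2": 900.0},
]
enriched = add_surface_ratio(samples)

# A single split on the derived feature now captures the "bigger than 90%
# of the parcel surface" rule of thumb from the text.
penthouse_candidates = [s["surface_ratio"] > 0.9 for s in enriched]
```

With the ratio available as a column, a single threshold comparison is enough to separate the two apartments, which a split on either raw surface feature alone cannot do in this example.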

| Side prediction
CatBoost shows a significantly higher success rate in predicting on which side of the building an apartment is located.
However, since 68% of buildings in the dataset contain apartments that touch both sides of the building, incorrectly predicting the side does not have a large impact on horizontal positional accuracy. This is because the final coordinate is calculated from a point moved toward the center of the building. Therefore, if additional buildings with apartments touching only a single building side were included, the difference in horizontal positional accuracy between the methods would be reflected more accurately.

FIGURE 19 Distribution of apartments per number of floors in the building.

| Related work comparison
The median positional accuracy for linear interpolation in the context of geocoding single-unit houses along street segments in urban areas is 38 m (Zandbergen, 2009). Since this metric originates from a US-based study conducted in an environment that is more spread out than Dutch cities, the median positional accuracy error of all methods was expected to be lower, and this turned out to be the case: the rounded horizontal, vertical, and overall median positional accuracy errors are 3, 18, and 19 m for linear interpolation and 3, 12, and 14 m for CatBoost, respectively.

| Reliability
In terms of ground-truth preparation, ACV-OCR has an average accuracy of 91% across documents of varying quality and might misclassify numbers on the floor plans or be unable to find them at all (Kadaster Labs, 2021).
This can occur when the floor plans are handwritten, contain other text/numbers, hold much architectural detail, or include other noise potentially caused by the different morphology steps.

FIGURE 20 Branch structure of an on-floor decision tree.

Next, the lowest occurring floor of an apartment is currently chosen as the actual floor location. Therefore, if an apartment spans multiple floors and the entrance is not on the lowest floor, the model does not accurately represent the apartment's entrance.
Moreover, the real-world coordinates used to determine the horizontal positional accuracy are calculated based on the orientation of the bottom floor. Higher floors are aligned based on the earlier described corner; hence, if the bottom floor plan significantly differs in shape compared to the upper floors, the upper floors get shifted to the wrong real-world location. This will not impact positional accuracy as both the ground-truth and prediction points are shifted similarly. However, if the results are visualized on a map, the apartments will show an incorrect location.
Furthermore, the measure of error is calculated from an approximated true position deduced from a number located in their respective floor plan. These numbers are ideally located at the center of an apartment but in practice, they show up in different places. Nevertheless, these points are all "true" apartment locations as they are all located inside the apartment. However, since these points are used to calculate the measure of error, some predictions could end up looking worse or better than they truly are as the predictions are compared against a truth that could be located anywhere inside the apartment. Either way, this might only impact the final error by a few meters.
Unless a dataset with consistent accurate center apartment coordinates is found or created, this limitation cannot be overcome.

| Generalizability
Since the total number of apartments in the Netherlands is over 2.75 million (Achmea, 2020), the sample sizes used for this research can be considered relatively small (n = 4703, n = 3085). The sample only contains apartments from buildings built on a single parcel after 1960 with at least 12 other apartments located in relatively large Dutch cities. Furthermore, the one-dimensional representation of apartments on their building outline is only applicable to buildings with a single connection to the outside world. For example, this method will not work for apartments that are only connected to a courtyard. When implementing this method in other regions, one should look for buildings where a single line can be drawn around the building outline that touches every apartment inside the building. All of these factors restrict the generalizability to the rest of the population or other countries and may also impact the prediction performance and conclusion validity, as more samples might lead to better and/or more valid results.

| CONCLUSION
The goal of this article was to realize a generalizable method to map apartment addresses to their explicit locations without access to indoor location reference data while attaining a lower median positional accuracy error than geocoding single-unit addresses along street segments. Apart from a new way to represent apartment locations in multi-unit residential buildings that is applicable to different building shapes and sizes, a comparative study was conducted to predict the locations in this newly proposed representation using publicly available address and geospatial building information.
The results show that gradient-boosted decision trees do not significantly outperform the linear interpolation baseline in the context of vertical (floor-level) location predictions. Horizontally (on-floor) and overall, gradient-boosted decision trees show significantly better positional accuracy than linear interpolation. Furthermore, across all three dimensions, the median positional accuracy error of the proposed method is lower than that of linear interpolation in the context of single-unit buildings, demonstrating the effective use of public apartment and geospatial building information for approximating apartment locations. The most influential variables are the (standardized) house number, the approximated number of floors in the building, and the apartment's surface area.
In the best-found sample, an apartment location was approximated less than a meter off target.
Limitations in the reliability of the results due to errors in the ground-truth preparation process, and in generalizability due to sample size, should be considered when applying this method to a larger population.
Future work is advised to investigate a way of applying the representation to buildings with multiple outsides, to use a larger, more diverse dataset, and to apply a model that can learn patterns between features directly. Lastly, to substantiate the out-of-the-box performance of the interpolation method and to assess the positional accuracy gains in relation to the preparation and processing costs of the machine learning approach, it is recommended to conduct a comparison with the performance of geocoding apartments using their building center point.

TABLE A3 On-floor location interpolation results
Exp.