A Mean-Variance Optimization Approach for Residential Real Estate Valuation


 This paper introduces a new approach to the sales comparison model for the valuation of real estate that can objectively estimate the coefficients associated with the explanatory price variables. The coefficients of the price adjustment process are estimated from the formulation of a quadratic programming model similar to the mean-variance model in the portfolio selection problem and are shown to be independent of the property to be valued. It is also shown that the sales comparison model should minimize the variance of the adjusted prices, and not their coefficient of variation as indicated by some national and international valuation regulations. The paper concludes with a case study on the city of Medellín, Colombia.


Introduction
Price prediction is a matter of great importance in the real estate industry (Ahn et al., 2012). In fact, interest in property valuation has increased over time due to the appearance of new investment organizations such as real estate investment trusts (REITs), as well as for reasons of property taxes, investment decisions, sales transactions, price formation, etc. (Arribas et al., 2016; Raslanas et al., 2010). Since real estate is an important asset for most people, undervaluing or overvaluing it can cause a series of problems for both owners and buyers. Professional valuers estimate a property's equity in such a way that overvaluation implies underestimating the default risk, which can be passed on to buyers or secondary mortgage providers (Guo et al., 2013). Valuers must supplement their skills with valuation methods that can systematically analyze larger datasets with an output that is readily applicable to single-property appraisal (Kane et al., 2004).
Different methodologies have been proposed to estimate the value of real estate. Aznar et al. (2011) classify the different valuation methodologies into (1) Economic methods, (2) Non-economic methods, and (3) Mixed methods. The Economic methods provide a monetary value for the asset being considered, whereas Non-economic methods assign a value measured by a non-monetary scale to each asset. Mixed methods include a set of procedures that combine both methods with the aim of deriving more realistic and effective monetary valuations. Most of the studies in the literature focus on economic methods, since they are simpler and more objective when the necessary information is available (Dmytrow & Gnat, 2019). In this sense, we must point out that the application of one method or another is conditioned precisely by the quantity and quality of the information available. Researchers have found that many factors can affect prices and that the amount of sales evidence varies widely, but generally there are few sales of properties similar enough to be considered comparable and none that can be considered identical.
www.degruyter.com/view/j/remav vol. 29, no. 3, 2021
Programming (Aznar & Guijarro, 2007), Analytic Hierarchy Process (Aznar et al., 2011), Analytic Network Process (Aznar et al., 2010), Artificial Neural Networks (García et al., 2008; Selim, 2009), Support Vector Machine (Kontrimas & Verikas, 2011; Plakandaras et al., 2015), Genetic Algorithm (Gu et al., 2011), Decision Tree (Fan et al., 2006; Cupal et al., 2019), Random Forest (Antipov & Pokryshevskaya, 2012), and Rough Set Theory (d'Amato, 2002, 2004, 2007). Recent papers focus on the application of machine learning methods to the valuation of real estate, which has led to a significant increase in research in this area (Baldominos et al., 2015; Park & Bae, 2015; Hausler et al., 2018; Hu et al., 2019; Liu & Liu, 2019; Pérez-Rave et al., 2019; Liu & Wu, 2020).
Valier (2020) states that "machine learning models are more accurate than traditional regression analysis in their ability to predict value. Nevertheless, many authors point out as their limit their black box nature and their poor inferential abilities".
The main objective of this work is thus to propose a new version of the sales comparison model for the valuation of real estate from a quadratic programming model, overcoming the inconvenience of subjectivity in the judgments issued by the valuers. The proposed model is compared with the multiple regression model and its similarity with the celebrated mean-variance model proposed by Markowitz (1952) for the portfolio selection problem is shown. It is also shown that optimizing the process of price adjustment through the coefficient of variation generates suboptimal solutions, since the adjusted prices are more dispersed than in the optimization of the variance of the adjusted prices. In addition, a positive bias (overvaluation) is introduced in the estimated price of the property to be valued, so price variance is optimized to obtain more precise and unbiased solutions.
The paper unfolds as follows: Section 2 introduces our modified version of the sales comparison model and highlights the similarities with the regression model and the mean-variance model. In Section 3, we apply the proposed model to a database of apartments in the city of Medellín (Colombia), analyzing the results according to the different variables used as predictors. Lastly, the concluding remarks are given in Section 4.

Methodology
Let us assume a set of $n$ properties in addition to the property to be valued, so that the $n + 1$ properties are comparable to each other. To explain the price $p_i$ ($i = 1, \dots, n$), we have a total of $m$ explanatory variables or predictors $x_{ij}$ ($i = 1, \dots, n$; $j = 1, \dots, m$). The value of these variables is also known for the property to be valued, $x^*_j$ ($j = 1, \dots, m$), but not its price, which is precisely what has to be estimated. Since the characteristics of the $n$ comparable properties differ from those of this property, the price of each comparable property must be adjusted according to its characteristics, relative to the same characteristics in the property to be valued. Finally, the price of the property is estimated as the average of the adjusted prices.
Instead of performing the price adjustment by means of an expert's subjective judgment, we propose the quadratic mathematical program Eq. (1a)-(1f), whose solution estimates the price of the property, $\bar{h}$, without the need to consult an expert valuer.
The objective function (Eq. 1a) minimizes the variance of the adjusted prices (Eq. 1b) of the properties considered in the comparable set. In the case study described in Section 3, we explain why it is preferable to optimize the variance instead of the coefficient of variation.
The adjusted price of the $i$-th property, $h_i$, is obtained by multiplying its original price $p_i$ by the factor $f_i$. This factor is the product of other factors $f_{ij}$, each of which is obtained by comparing the $j$-th characteristic between the $i$-th comparable property and the property to be valued. The comparison for the $j$-th variable is performed by dividing $x^*_j$ by $x_{ij}$. For example, if the $i$-th comparable property has an area of 100 m² and the property to be valued has an area of 90 m², the result of this comparison is $x^*_j / x_{ij} = 90/100 = 0.9$. This difference would justify the property to be valued having a price 10% lower than the comparable; i.e., the price of the comparable must be adjusted by a reduction of 10%. The other variables would also adjust the price of the $i$-th property, so that the different adjustments have a multiplicative effect (they could also be considered to have an additive effect). Logically, not all the variables need to have the same importance in determining the price; thus, in constraint Eq. (1f) we introduce an adjustment factor $w_j$ that graduates the importance of each variable. In the example above, a 10% lower surface area would not necessarily imply a 10% lower price. This factor $w_j$ is precisely the variable that model Eq. (1a)-(1f) must solve, since the rest of the variables are either known or derived from the said adjustment factor. Once the value of $w_j$ is solved for the different explanatory variables, the estimated price of the property to be valued (Eq. (1c)) is obtained as the average of the adjusted prices $h_i$ (Eq. (1d)):
$$\min \; \sigma^2 \tag{1a}$$
$$\text{s.t.} \quad \sigma^2 = \frac{1}{n} \sum_{i=1}^{n} \left( h_i - \bar{h} \right)^2 \tag{1b}$$
$$\bar{h} = \frac{1}{n} \sum_{i=1}^{n} h_i \tag{1c}$$
$$h_i = p_i f_i, \quad i = 1, \dots, n \tag{1d}$$
$$f_i = \prod_{j=1}^{m} f_{ij}, \quad i = 1, \dots, n \tag{1e}$$
$$f_{ij} = \left( x^*_j / x_{ij} \right)^{w_j}, \quad i = 1, \dots, n; \; j = 1, \dots, m \tag{1f}$$
where:
$p_i$: price of the $i$-th property
$x_{ij}$: value of the $j$-th variable in the $i$-th property
$x^*_j$: value of the $j$-th variable in the property to be valued
$w_j$: factor of adjustment for the $j$-th variable
$f_{ij}$: adjusted factor for the $j$-th variable in the $i$-th property
$f_i$: compounded adjusted factor for the $i$-th property
$h_i$: adjusted price of the $i$-th property
$\sigma$: standard deviation of the adjusted prices
$\bar{h}$: mean of the adjusted prices
The way in which prices are adjusted explains why the objective function minimizes their variance: the price adjustment makes properties equivalent even though their characteristics are not. When a property has characteristics superior to those of the property to be valued, its original price is adjusted downwards to make it comparable to the property to be valued. If another property has characteristics inferior to those of the property to be valued, its price is corrected upwards. In both cases, the adjusted price tries to estimate the price that the property to be valued would have, given the prices and characteristics of other properties that are similar, but not identical, to it. In this way, each adjusted price constitutes an estimate of the value of the property to be valued. If the estimates are very similar to each other, the vector of adjusted prices will have little variance and we can conclude that the consensus is high. However, if the adjusted price vector has a high variance, this is a symptom that the adjustment process has not made consistent comparisons, since the estimated price of the property will vary according to the comparable property used for its valuation.
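The multiplicative adjustment and its effect on the variance can be illustrated with a minimal Python sketch. The function name `adjusted_prices` and the toy data are hypothetical, and the weights are fixed by hand here rather than solved for by the model.

```python
import numpy as np

def adjusted_prices(p, X, x_star, w):
    """Adjusted price h_i = p_i * prod_j (x*_j / x_ij) ** w_j for each comparable."""
    p = np.asarray(p, dtype=float)
    X = np.asarray(X, dtype=float)
    # f_ij: one adjustment factor per comparable (row) and variable (column)
    factors = (np.asarray(x_star, dtype=float) / X) ** np.asarray(w, dtype=float)
    return p * factors.prod(axis=1)

# Hypothetical toy data: three comparables described only by their area (m^2).
p = np.array([200.0, 180.0, 220.0])
X = np.array([[100.0], [90.0], [110.0]])
x_star = np.array([90.0])

h_full = adjusted_prices(p, X, x_star, w=[1.0])  # area ratio applied in full
h_none = adjusted_prices(p, X, x_star, w=[0.0])  # no adjustment: h equals p

print(h_full, h_full.var())
print(h_none, h_none.var())
```

In this toy set prices are exactly proportional to area, so a weight of 1 brings all adjusted prices to the same value and the variance collapses, whereas a weight of 0 leaves the original dispersion untouched.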
Although model (Eq. (1a) -(1f)) can be solved using any commercial optimization software, below we show how a slight modification allows an analytical solution to be obtained.
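The value of $h_i$ can be expressed as follows:
$$h_i = p_i \prod_{j=1}^{m} \left( x^*_j / x_{ij} \right)^{w_j} \tag{2}$$
If we take logarithms, we reach the following equality:
$$\log h_i = \log p_i + \sum_{j=1}^{m} w_j \log \left( x^*_j / x_{ij} \right) \tag{3}$$
Eq. (3) can be expressed in vector form:
$$\log \mathbf{h} = \log \mathbf{p} + \sum_{j=1}^{m} w_j \log \left( x^*_j / \mathbf{x}_j \right) \tag{4}$$
so that $\mathbf{h}$ is an $n$-vector with the adjusted prices of the set of comparable properties, $\mathbf{p}$ is an $n$-vector with the original prices of such properties, and $\mathbf{x}_j$ is an $n$-vector with the values of the comparable properties in the $j$-th variable. The ratio $x^*_j / \mathbf{x}_j$ is the result of dividing $x^*_j$ by each of the elements of $\mathbf{x}_j$.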
Next we make the following variable changes:
$$w_0 = 1 \tag{5}$$
$$\mathbf{z}_0 = \log \mathbf{p} \tag{6}$$
$$\mathbf{z}_j = \log \left( x^*_j / \mathbf{x}_j \right) \tag{7}$$
with $j = 1, \dots, m$. In this way we can rewrite Eq. (4) as a function of the new variables:
$$\log \mathbf{h} = \mathbf{Z} \mathbf{w} \tag{8}$$
where $\mathbf{Z}$ is a matrix with $n$ rows and $m + 1$ columns, and $w_0 = 1$. This expression points to greater similarities between the regression model and the sales comparison model than those reported in the literature and by professional valuation associations. Traditionally, both models have been considered within the same family of economic comparison methods. But whereas the regression model infers a general valuation function that is then applied to a particular property to estimate its price, the sales comparison model adjusts the prices of each comparable property to finally estimate the price of the property to be valued as the simple average of the adjusted prices. Therefore, the sales comparison model does not estimate a valuation function, but directly estimates the price of the property.
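The equivalence between the multiplicative adjustment and the linear form in Eq. (8) can be checked numerically. The sketch below uses synthetic data and illustrative weights (none of them taken from the paper) and base-10 logarithms, which is the base the case study appears to use.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 6, 2                                  # hypothetical sample size and number of variables
p = rng.uniform(100, 300, size=n)            # prices of the comparables
X = rng.uniform(50, 150, size=(n, m))        # characteristics of the comparables
x_star = np.array([90.0, 120.0])             # property to be valued
w = np.array([1.0, 0.4, -0.2])               # w_0 = 1, then illustrative w_1 and w_2

# Z has n rows and m + 1 columns: log price, then the log ratios x*_j / x_ij.
Z = np.column_stack([np.log10(p), np.log10(x_star / X)])

log_h_linear = Z @ w                                      # linear form, as in Eq. (8)
h_direct = p * np.prod((x_star / X) ** w[1:], axis=1)     # multiplicative form
```

Both routes give the same adjusted prices, which is what allows the problem to be handled with linear-algebra tools in the remainder of the section.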
However, Eq. (8) shows that the sales comparison model can also be understood from a holistic perspective in which the different variables contribute individually to the formation of the price, i.e., as a valuation function. Eq. (8) allows for the following quadratic programming model to be proposed:
$$\min_{\mathbf{w}} \; \mathbf{w}' \mathbf{V} \mathbf{w} \tag{9a}$$
$$\text{s.t.} \quad \mathbf{w}' \mathbf{e} = 1 \tag{9b}$$
where $\mathbf{V}$ represents the $(m + 1) \times (m + 1)$ covariance matrix of $\mathbf{Z}$. The other two new elements are $(m + 1)$-vectors: $\mathbf{w} = (w_0 \; \dots \; w_m)'$ and $\mathbf{e} = (1 \; 0 \; \dots \; 0)'$. The objective function in Eq. (9a) minimizes the variance of the logarithm of the adjusted prices, $\log \mathbf{h}$, while the constraint Eq. (9b) forces the coefficient $w_0$ to take the value of 1. We must note that Eq. (9b) is actually not a constraint we impose on the model. It comes from Eq. (5) and is closely related to the intercept in the classical regression model.
The model Eq. (9a)-(9b) is similar to the model developed by Markowitz (1952) for the portfolio selection problem. We refer to it as a mean-variance approach because Markowitz's model proposes the minimization of the variance of a portfolio (risk) for a given return (average portfolio return). In our case, we do not impose any constraint on the average of prices because this is precisely the variable that we want to solve. But the objective function also consists of minimizing the variance and, as we will see in the next section, we can also impose a constraint on the average adjusted price to obtain a solution frontier in the mean-variance space. As in the model by Markowitz (1952), we also assume that $\mathbf{V}$ is positive definite. If this assumption is not guaranteed, the variable (or variables) causing the problem must be excluded. This does not influence the solution achieved, because such a variable is redundant (perfectly correlated with some of the remaining variables).
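From the quadratic program of Eq. (9a)-(9b), with $\mathbf{V}$ the covariance matrix of $\mathbf{Z}$, $\mathbf{e} = (1 \; 0 \; \dots \; 0)'$ and $\lambda$ the Lagrange multiplier of the constraint, we can state the KKT conditions:
$$\mathbf{w}' \mathbf{e} = 1$$
$$2 \mathbf{V} \mathbf{w} - \lambda \mathbf{e} = \mathbf{0} \tag{11}$$
In Eq. (11) we can clear the value of $\mathbf{w}$ as a function of $\lambda$:
$$\mathbf{w} = \frac{\lambda}{2} \mathbf{V}^{-1} \mathbf{e}$$
If we premultiply both sides of the equality by $\mathbf{e}'$, and bearing in mind that $\mathbf{e}' \mathbf{w} = 1$, we get:
$$1 = \frac{\lambda}{2} \mathbf{e}' \mathbf{V}^{-1} \mathbf{e} \tag{14}$$
For convenience, we will denote the matrix product $\mathbf{e}' \mathbf{V}^{-1} \mathbf{e}$ as $c$. We clear the value of $\lambda$ in Eq. (14):
$$\lambda = \frac{2}{c}$$
In this way we can already obtain the vector of coefficients $\mathbf{w}$:
$$\mathbf{w} = \frac{1}{c} \mathbf{V}^{-1} \mathbf{e} \tag{17}$$
The product $\mathbf{V}^{-1} \mathbf{e}$ returns a vector with the first row of the matrix $\mathbf{V}^{-1}$, so that Eq. (17) obtains the coefficients associated with each variable by dividing the elements of this vector by the scalar $c$.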
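The closed-form solution can be sketched in a few lines of NumPy. The function name `min_variance_weights` and the synthetic data are assumptions made for illustration; column 0 of `Z` plays the role of the log price.

```python
import numpy as np

def min_variance_weights(Z):
    """Minimiser of w' V w subject to w_0 = 1, where V is the covariance matrix of Z."""
    V = np.cov(Z, rowvar=False)
    V_inv = np.linalg.inv(V)            # requires V to be positive definite
    # First column (equal to the first row) of V^{-1}, divided by its (1,1) element.
    return V_inv[:, 0] / V_inv[0, 0], V

rng = np.random.default_rng(1)
Z = rng.normal(size=(40, 4)) @ rng.normal(size=(4, 4))   # hypothetical correlated data
w, V = min_variance_weights(Z)

# Any other vector with w_0 = 1 should give a larger variance.
w_alt = w + np.array([0.0, 0.05, -0.03, 0.02])
print(w @ V @ w, w_alt @ V @ w_alt)
```

In this sketch the weights on the explanatory columns coincide with the negated least-squares slopes of column 0 regressed on the remaining columns, which is one way to cross-check the result with standard regression software.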
The constant $c = \mathbf{e}' \mathbf{V}^{-1} \mathbf{e}$ is the (1,1) element of the matrix $\mathbf{V}^{-1}$. This scalar represents the inverse of the residual variance of the logarithm of the price regressed on the set of explanatory variables. In fact, each element of the diagonal of $\mathbf{V}^{-1}$ is the inverse of the residual variance of a regression between the variable defined by the diagonal position and the rest of the variables. If we call $v^{ij}$ the $(i, j)$ element of the matrix $\mathbf{V}^{-1}$, then we get:
$$w_{j-1} = \frac{v^{1j}}{v^{11}} \tag{18}$$
where $j = 1, \dots, m + 1$. Therefore, the solution to model Eq. (1a)-(1f) is obtained directly from the simple ratio of Eq. (18) between two elements of the matrix $\mathbf{V}^{-1}$. Here too we find a similarity with the traditional regression model, where the coefficient of the explanatory variable is obtained by dividing the covariance of this variable and the price by the price variance. In our model, the same elements are used in the calculation of the coefficients $w_j$, but taken from the matrix $\mathbf{V}^{-1}$ instead of the matrix $\mathbf{V}$.
From Eq. (18) it is easy to show the relation between the coefficients $w_j$ and the partial correlation between the logarithm of the price and the explanatory price variables. The partial correlation coefficient measures the linear dependence between two variables while controlling for the effect of the remaining variables. The partial correlation $\rho_{z_0 z_1 \cdot z_2 \dots z_m}$ between the variables $z_0$ and $z_1$, controlling for the variables $z_2, \dots, z_m$, is calculated in two steps. First, the residual of the regression between $z_0$ and the set of variables $z_2, \dots, z_m$ is obtained, along with the residual of the regression between $z_1$ and the set of variables $z_2, \dots, z_m$; the partial correlation coefficient is then calculated as the simple correlation between the two residuals calculated in the first step. The partial correlation between the variables in positions $i$ and $j$, controlling for the rest, can be calculated from the elements of the matrix $\mathbf{V}^{-1}$:
$$\rho_{ij} = -\frac{v^{ij}}{\sqrt{v^{ii} v^{jj}}} \tag{19}$$
From Eq. (18) and Eq. (19), we can express the partial correlation between the logarithm of the price and the explanatory variable in position $j$ ($j = 2, \dots, m + 1$) as a function of the coefficient $w_{j-1}$:
$$\rho_{1j} = -w_{j-1} \sqrt{\frac{v^{11}}{v^{jj}}} \tag{20}$$
And clearing $w_{j-1}$, we verify that the coefficients calculated in Eq. (18) have opposite signs to the partial correlation between the price logarithm and the explanatory variable analyzed:
$$w_{j-1} = -\rho_{1j} \sqrt{\frac{v^{jj}}{v^{11}}} \tag{21}$$
The sign of these coefficients is thus the opposite of what we would expect in a classical multiple regression model between the price and explanatory variables. However, as we have argued before, the process of price adjustment involves lowering the price of properties with better characteristics than the property to be valued, and raising the price of those with worse characteristics than that property. Hence the apparently counter-intuitive result obtained by estimating the coefficients ($w_j$).
Another peculiarity of the coefficients $w_j$ is that their values are independent of the property to be valued, since they only depend on the properties included in the comparable set. In the model Eq. (1a)-(1f), we have defined $x^*_j$ as the value of the $j$-th variable in the property to be valued, so that the original explanatory variables are transformed as $x^*_j / x_{ij}$. The solution of model Eq. (1a)-(1f) may therefore be thought to depend on the characteristics of the property to be valued. However, this is not so.
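The link between the weights and the partial correlations can be verified on synthetic data. The sketch below is only illustrative: the data are random, column 0 of `Z` stands in for the log price, and the helper `residual` is a hypothetical name.

```python
import numpy as np

rng = np.random.default_rng(2)
Z = rng.normal(size=(50, 3)) @ rng.normal(size=(3, 3))   # hypothetical data
P = np.linalg.inv(np.cov(Z, rowvar=False))               # inverse covariance matrix

w = P[0] / P[0, 0]                                       # ratio of elements of the inverse; w[0] == 1
rho = -P[0, 1:] / np.sqrt(P[0, 0] * np.diag(P)[1:])      # partial correlations with column 0

def residual(y, others):
    """Residual of a least-squares regression of y on `others` plus an intercept."""
    A = np.column_stack([np.ones(len(y)), others])
    return y - A @ np.linalg.lstsq(A, y, rcond=None)[0]

# Two-step definition for variable 1, controlling for variable 2.
r0 = residual(Z[:, 0], Z[:, [2]])
r1 = residual(Z[:, 1], Z[:, [2]])
rho_two_step = np.corrcoef(r0, r1)[0, 1]
```

The two routes to the partial correlation agree, and each weight carries the opposite sign of the corresponding partial correlation, scaled by the square root of a ratio of diagonal elements of the inverse matrix.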
If, instead of wanting to value a property whose $m$-vector of characteristics is $\mathbf{x}^*$, the purpose is to value a property with characteristics $\tilde{\mathbf{x}}^*$, the solution $v^{1j} / v^{11}$ would remain unchanged. In this case, we would have $\mathbf{z}_j = \log ( \tilde{x}^*_j / \mathbf{x}_j )$. This change would not affect the value of $v^{11}$, since the variance of the regression residuals between the logarithm of the price and the rest of the variables does not vary with a change in the scale of the regressors. In the same way, $v^{1j}$ would also remain unchanged. While the values of the coefficients $w_j$ would be maintained, the estimated price for the property to be valued, $\bar{h}$, would change as a result of the change in the adjusted prices $h_i$ of the comparable set.
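This invariance can be checked with a short sketch. The comparable set below is synthetic, the two subject properties are hypothetical, and the helper name `weights_and_estimate` is an assumption made for illustration.

```python
import numpy as np

def weights_and_estimate(p, X, x_star):
    """Weights from the inverse covariance matrix and the mean log adjusted price."""
    Z = np.column_stack([np.log10(p), np.log10(x_star / X)])
    P = np.linalg.inv(np.cov(Z, rowvar=False))
    w = P[0] / P[0, 0]
    return w, (Z @ w).mean()

rng = np.random.default_rng(3)
area = rng.uniform(50, 150, size=20)
expenses = rng.uniform(500, 2000, size=20)
X = np.column_stack([area, expenses])
p = 20 * area ** 0.8 * rng.uniform(0.9, 1.1, size=20)   # synthetic prices driven by area

# Same comparable set, two different properties to be valued.
w_a, est_a = weights_and_estimate(p, X, np.array([120.0, 1800.0]))
w_b, est_b = weights_and_estimate(p, X, np.array([70.0, 925.0]))
print(w_a, w_b, est_a, est_b)
```

Changing the subject property only shifts each log-ratio column by a constant, so the covariance matrix and the weights are unaffected, while the estimated log price moves with the subject's characteristics.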
Finally, it is interesting to know to what extent the process of price adjustment has produced homogeneous adjusted prices. The ideal situation would be for all adjusted prices to have the same value, i.e., for the variance of these prices to be zero. On the other hand, the worst possible situation would be for the adjusted prices to coincide with the original prices. In parallel with regression models, we can calculate the coefficient of determination of the adjustment made by the model with Eq. (22):
$$R^2 = 1 - \frac{\sigma^2_{\log h}}{\sigma^2_{\log p}} \tag{22}$$
where $\sigma^2_{\log h}$ and $\sigma^2_{\log p}$ are the variances of the logarithms of the adjusted and original prices, respectively. If we call $v_{ij}$ the element in the $i$-th row and the $j$-th column of the covariance matrix $\mathbf{V}$, and $v^{ij}$ the corresponding element of $\mathbf{V}^{-1}$, Eq. (22) can be rewritten as $R^2 = 1 - 1 / (v_{11} v^{11})$.
In the following section, we present a case study that serves as an illustrative example of the proposed model. Table 1 shows information on 28 properties in the city of Medellín, Colombia, plus two properties to be valued in the last two rows. We want to estimate the price from 4 explanatory variables: surface area, Stratum, administration expenses and age. In Colombia, cities are usually segmented into strata, from 1 to 6, so that the poorer neighborhoods with the lowest real estate prices are grouped in Stratum 1, whereas the most expensive ones are classified as Stratum 6. For the properties to be considered comparable, most of the properties considered in the sample were from Strata 4 and 5. Age was another of the variables considered in the research, though not as a continuous variable but as a categorical variable. The information was gathered from the main real estate website in Colombia, metrocuadrado.com, where age is not expressed in years but as a range of values in the form of a categorical variable: between 0 and 5 years, between 6 and 10 years, between 11 and 20 years, and more than 20 years. The last two rows contain the properties that illustrate the valuation proposal: properties 29 and 30.
The first one has a surface area of 120 square meters, is classified as Stratum 5, has annual administration expenses of 1,800,000 Colombian pesos, and was built between 0 and 5 years ago. The second property has a surface area of 70 square meters, the Stratum is 4, administration expenses are 925,000 Colombian pesos, and the age is between 11 and 20 years.

Empirical results
The Stratum needs to be transformed before the application of the valuation model. If we used this variable without transforming it, we would be assuming a price behavior that does not necessarily reflect the market. For example, if no transformation were applied to the Stratum, the model would assume that the (price) distance between Stratum 3 and 4 is the same as the distance between Stratum 4 and 5, which might be correct, but does not necessarily have to be true in the real estate market. Our proposal is to overcome this limitation by following the same procedure used in least-squares regression models: transforming $k$-level variables into $k - 1$ dichotomous variables.
Let us assume that $s^*$ is the Stratum of the property to be valued, where the Stratum takes values between 3 and 5 in our case. This translates into two dichotomous variables, Stratum_4 and Stratum_5, that follow Eq. (23). Hence, the values of Stratum_4 and Stratum_5 depend on the value of the Stratum in the property to be valued (Table 2). The same approach was used to transform the Age variable, with 4 original levels, into 3 dichotomous variables: age_6_10, age_11_20, and age_20. These values serve to calculate the columns in Tables 3 and 4, which record the logarithmic transformation of the explanatory variables corrected by the values of the property to be valued ($\mathbf{z}_1$ to $\mathbf{z}_m$). Another column shows the logarithm of the price per square meter ($\mathbf{z}_0$) for the 28 properties that form the set of comparable properties.
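The generic step of turning a $k$-level variable into $k - 1$ dichotomous variables, as done in least-squares regression, can be sketched as follows. This is only the standard zero/one coding with the lowest level as baseline; the helper name `to_dummies` is hypothetical, and the coding of Eq. (23), which additionally depends on the property to be valued, is not reproduced here.

```python
def to_dummies(values, levels):
    """Encode a k-level variable as k - 1 zero/one columns; the first level is the baseline."""
    return [[1 if v == level else 0 for level in levels[1:]] for v in values]

strata = [3, 4, 5, 4]                              # hypothetical sample of strata
dummies = to_dummies(strata, levels=[3, 4, 5])     # columns: Stratum_4, Stratum_5
print(dummies)
```

A property in the baseline level (Stratum 3 here) gets zeros in every column, so the effect of each remaining level is measured relative to it.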
From these tables, the covariance matrix $\mathbf{V}$ and its inverse $\mathbf{V}^{-1}$ are calculated. The first 4 columns of Table 5 show the values of matrix $\mathbf{V}$, while the next 4 columns include those of $\mathbf{V}^{-1}$. Note how the matrix $\mathbf{V}$ remains constant for properties 29 and 30; that is, it does not depend on the specific values taken by the properties to be valued. The last column of the table contains the coefficients $w_j$, which have been calculated from Eq. (18). Like the matrix $\mathbf{V}$, the weights are also independent of the property to be valued. Table 5 includes the weights computed by the proposed model. In other words, these weights determine the factors of adjustment for the variables involved in the valuation process, according to Eq. (7).
Equations (5)-(8) serve to compute the factors of adjustment from the ratio between the values of the property to be valued and the corresponding values in the comparable set. The higher the value of $x_{ij}$, the lower the value of $x^*_j / x_{ij}$. This ratio is then weighted by the factor of adjustment, $(x^*_j / x_{ij})^{w_j}$; hence a negative value of $w_j$ ($w_j < 0$) translates into a positive relation between the estimated price and feature $j$, while a positive coefficient $w_j$ translates into a negative relation between the estimated price and feature $j$.
According to the last column of Table 5, weights of Area, Stratum_4 and Stratum_5 are negative. This would result in a positive impact of area on prices, and higher estimated prices for properties in Stratum 4 and Stratum 5 (when compared with Stratum 3). The coefficient associated with age_6_10 is close to zero (-0.001), hence we can conclude that the estimated prices for properties between 6 and 10 years of age are similar to those of properties between 0 and 5 years old. However, age_11_20 and age_20 obtain positive coefficients: 0.023 and 0.035, respectively. On average, the estimated price for an 11-20-year-old property is lower than the estimated price for a nearly new property (0-5 years old), and even lower for a property built 20 years ago or more, ceteris paribus.
The adjusted prices are obtained from the coefficients estimated by the model; the average of their logarithms is 3.150 for property 29 and 3.105 for property 30 (Table 6). If we undo the logarithmic transformation, we obtain the estimated price per square meter for the property to be valued, $10^{\bar{h}}$: that is, $10^{3.150}$ for property 29 and $10^{3.105}$ for property 30.
The value of the determination coefficient is $R^2 = 1 - 1 / (v^{11} v_{11}) = 1 - 1 / (2{,}526.64 \times 0.0030) = 0.8680 \cong 86.8\%$, which shows that a relatively good price adjustment has been obtained, albeit far from the ideal (and unrealistic) situation, which would imply a determination coefficient of 100%.
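The arithmetic behind this figure can be reproduced directly from the two reported matrix elements. The variable names below are hypothetical, and the inputs are the rounded values given in the text.

```python
v_inv_11 = 2526.64   # (1,1) element of the inverse covariance matrix, as reported
v_11 = 0.0030        # variance of the log price, as reported (rounded)

residual_variance = 1.0 / v_inv_11              # minimum variance of the log adjusted prices
r_squared = 1.0 - residual_variance / v_11      # share of the original variance removed
print(residual_variance, r_squared)
```

The intermediate quantity `residual_variance` is the variance of the logarithm of the adjusted prices at the optimum, which is the figure that enters the coefficient of variation.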
The coefficient of variation was also calculated as the standard deviation of the adjusted prices divided by their average, since, as mentioned in the introductory section, this is the measure used by some national and international associations to measure the quality of valuation processes. The result obtained was a coefficient of variation of $\sqrt{0.00039578} / 3.150 = 0.63\%$ for property 29 and $\sqrt{0.00039578} / 3.105 = 0.64\%$ for property 30, much lower than the maximum threshold of 7.5% set by the Colombian regulations.
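The two coefficients of variation follow from the reported variance and averages; a minimal check, with hypothetical variable names, is:

```python
import math

residual_variance = 0.00039578     # variance of the log adjusted prices, as reported
mean_log_price = {"property 29": 3.150, "property 30": 3.105}

# Coefficient of variation: standard deviation divided by the mean.
cv = {name: math.sqrt(residual_variance) / mean for name, mean in mean_log_price.items()}
threshold = 0.075                  # 7.5% ceiling cited in the text

for name, value in cv.items():
    print(name, f"{value:.2%}", value < threshold)
```

Both values sit roughly an order of magnitude below the regulatory ceiling, even though the same adjustment leaves about 13% of the original variance unexplained.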
The solution found in model Eq. (9) minimizes the variance of the logarithm of the adjusted prices of the 28 comparable properties. Continuing with the similarity between the proposed model and that proposed by Markowitz (1952), we can obtain a solution frontier for different values of the logarithm of the estimated price. For this, we propose the model Eq. (24a)-(24c):
$$\min_{\mathbf{w}} \; \mathbf{w}' \mathbf{V} \mathbf{w} \tag{24a}$$
$$\text{s.t.} \quad \mathbf{w}' \mathbf{e} = 1 \tag{24b}$$
$$\frac{1}{n} \mathbf{1}' \mathbf{Z} \mathbf{w} = h^* \tag{24c}$$
where $\mathbf{e} = (1 \; 0 \; \dots \; 0)'$ as in Eq. (9b), $\mathbf{1} = (1 \; \dots \; 1)'$ is an $n$-vector of ones, and $h^*$ is the particular value of the logarithm of the price we want to obtain.
Figure 1 represents the frontier obtained by the model Eq. (24a)-(24c) for different values of $h^*$. The leftmost point of the curve, labeled Minimal variance, represents the solution obtained with model Eq. (9). In the same figure, we also represent the solution of the problem in which, instead of minimizing the variance of prices, we had minimized the coefficient of variation, as proposed by some valuation regulations. The straight line in Figure 1 represents the inverse of the coefficient of variation, and its steepest slope is at the tangent point labeled Minimal coef. of variation, which is precisely the solution we would get by minimizing the coefficient of variation instead of the variance. We can observe that the solution obtained when optimizing the coefficient of variation is worse than that obtained by the proposed model, since its variance is greater. We must remember that the greater the variance of the adjusted prices, the lower the similarity of the set of comparable properties to the property studied, taking into account the explanatory variables. In an extreme case, a model that presented zero variance in the adjusted prices would indicate that the comparable properties allow us to perfectly adjust the price of the property to be valued, with a coefficient of determination of 100%.
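A frontier of this kind can be traced by solving the equality-constrained quadratic program through its KKT linear system. The sketch below is an assumption-laden illustration: the data are synthetic, the function name `frontier_point` is hypothetical, and the mean constraint is written on the average of the log adjusted prices.

```python
import numpy as np

def frontier_point(Z, h_target):
    """Minimise w' V w subject to w_0 = 1 and mean(Z w) = h_target via the KKT system."""
    k = Z.shape[1]
    V = np.cov(Z, rowvar=False)
    C = np.vstack([np.eye(k)[0], Z.mean(axis=0)])        # the two equality constraints
    K = np.block([[2 * V, C.T], [C, np.zeros((2, 2))]])
    rhs = np.concatenate([np.zeros(k), [1.0, h_target]])
    w = np.linalg.solve(K, rhs)[:k]
    return w, w @ V @ w

rng = np.random.default_rng(4)
n = 30
X = rng.uniform(50, 150, size=(n, 2))                    # synthetic characteristics
p = 30 * X[:, 0] ** 0.7 * rng.uniform(0.9, 1.1, size=n)  # synthetic prices
Z = np.column_stack([np.log10(p), np.log10(np.array([90.0, 120.0]) / X)])

P = np.linalg.inv(np.cov(Z, rowvar=False))
w_min = P[0] / P[0, 0]                                   # solution without the mean constraint
h_min = (Z @ w_min).mean()

targets = h_min + np.linspace(-0.05, 0.05, 5)
variances = [frontier_point(Z, h)[1] for h in targets]
print(list(zip(targets, variances)))
```

When the target equals the mean obtained without the extra constraint, the frontier point coincides with the minimum-variance solution; moving the target in either direction can only increase the variance.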
In addition, since prices always take positive values, it is easy to show that the solution found when optimizing the coefficient of variation will always yield a price higher than that obtained when optimizing the variance; i.e., this criterion introduces a positive bias in the estimated price. In the case study, this bias is only 1,000 Colombian pesos, but if we look closely at Figure 1, we can see how the bias increases as the variance of the adjusted prices increases at equal prices, i.e., as the adjustment process becomes less reliable.
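The direction of this bias can be illustrated numerically. Because the frontier comes from a quadratic program with linear equality constraints, its variance is a quadratic function of the target mean; the sketch below assumes such a parabola with hypothetical parameters `m`, `s` and `a` (mean and variance at the minimum-variance point, and curvature) and locates the point of minimal coefficient of variation.

```python
import numpy as np

m, s, a = 3.15, 4e-4, 0.5    # hypothetical frontier parameters, with m > 0

h = np.linspace(m - 0.05, m + 0.05, 200001)   # candidate mean values along the frontier
variance = s + a * (h - m) ** 2               # frontier variance, quadratic in the mean
cv = np.sqrt(variance) / h                    # coefficient of variation at each point

h_cv_grid = h[np.argmin(cv)]                  # grid search for the minimal CV
h_cv_closed = m + s / (a * m)                 # stationary point of variance / h**2
print(m, h_cv_grid, h_cv_closed)
```

Setting the derivative of `variance / h**2` to zero gives `h = m + s / (a * m)`, which exceeds `m` whenever `m`, `s` and `a` are positive, and the gap grows with the minimum variance `s`, in line with the behavior described for Figure 1.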
Finally, we can also analyze the relative importance of the explanatory variables that we consider in our model. Figure 2 compares different models with two explanatory variables and the one in the previous figure with all four explanatory price variables. We can see how the model that includes the Stratum and administration expenses is close to the general model that includes all variables, with a coefficient of determination of 83.3%. The combination of Stratum and age also reaches a high value, 81.6%. However, all models that include the area obtain a worse fit.
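A comparison of this kind can be organized by computing the coefficient of determination for every pair of explanatory variables. The sketch below uses synthetic data, so the variable names are labels only and the scores bear no relation to the case study; `r_squared` is a hypothetical helper built on the ratio of matrix elements described earlier.

```python
from itertools import combinations
import numpy as np

def r_squared(Z, cols):
    """1 - 1 / (v_11 * v^11) using column 0 (log price) and the chosen explanatory columns."""
    V = np.cov(Z[:, [0, *cols]], rowvar=False)
    return 1.0 - 1.0 / (V[0, 0] * np.linalg.inv(V)[0, 0])

rng = np.random.default_rng(5)
names = ["area", "stratum", "expenses", "age"]            # labels only; data are synthetic
Z = rng.normal(size=(28, 5)) @ rng.normal(size=(5, 5))

pair_scores = {(names[i - 1], names[j - 1]): r_squared(Z, (i, j))
               for i, j in combinations(range(1, 5), 2)}
full_score = r_squared(Z, (1, 2, 3, 4))

for pair, score in sorted(pair_scores.items(), key=lambda item: -item[1]):
    print(pair, round(score, 3))
print("all variables", round(full_score, 3))
```

No pair can exceed the score of the model with all variables, so the gap between each pair and the full model gives a rough measure of what the omitted variables contribute.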

Conclusions
This paper proposes a new approach to the sales comparison model for the valuation of real estate. Within the economic valuation models, the sales comparison and multiple regression models share some similarities and differences that have been highlighted in this study. The main disadvantage of the sales comparison model versus the regression models is that it involves a subjective comparison by an expert valuer between the property to be valued and the properties included in the comparable set, and thus the price adjustment process can be seriously biased. The model proposed in this paper determines such comparisons objectively through a quadratic programming model similar to that used in the portfolio selection problem.
Although it has been considered that the sales comparison model does not obtain a valuation function, our work shows that the proposed model does indeed generate a valuation function that is independent of the property to be valued, so that the coefficients of the variables are obtained as a simple ratio between two elements of the inverse of the covariance matrix.
The study also shows that the process of price adjustment should not be carried out according to the criterion of minimizing the coefficient of variation, contrary to what is suggested by different national and international valuation regulations. This procedure obtains solutions that are less precise and have a positive bias on the estimated price, compared to the proposed alternative of minimizing the variance of the adjusted prices. The present work thus shows that these regulations should be modified to update the valuation procedures along the lines indicated here. The results obtained from the case study verified that a model that barely obtains a coefficient of determination of 86.8% would comply with the quality requirements imposed by valuation regulations, which should also lead to a revision of these standards to increase the validation requirements of their models.