Testing Uncertainty in ILUTE - An Integrated Land Use-Transportation Micro-simulation Model of Demographic Updating

In microsimulation models, the behaviours of agents as intelligent objects are modeled through various behavioral modeling techniques, which give rise to uncertainties in the modeling workflow and the predicted outputs. ILUTE (Integrated Land Use Transportation Environment model) is an agent-object-based microsimulation model system designed to simulate different activities in a city, in which many agents (persons, families, households, firms, etc.) act and interact intelligently and/or randomly in complex ways. A synthetic base year (1986) sample of 150,000 (150K) households from the Greater Toronto and Hamilton Area (GTHA) population is used as the starting point for twenty-year historical simulations (1986-2006). In order to analyze the uncertainty associated with stochastic behaviours in the ILUTE model system, 1000 independent runs using different initial random number seeds are generated and tested through parallel programming on a high performance cloud computing (HPCC) platform. In this paper, the behaviour of the demographic updating module is tested by examining the variability and the distribution of its outputs, including births, deaths, families, and households, across simulation runs. The test results show that the distributions of the simulated outputs are generally normal, or very close to normal. In general, ILUTE can simulate the demographic updating process with considerable reliability and confidence.


Introduction
The complex and multi-modal nature of urban transportation systems on the one hand, and the diverse travel patterns of individuals in an urban system on the other, make modeling travel behaviour and land use activities a challenging task. This reality has motivated planners and modellers to develop and implement more comprehensive modeling frameworks in order to predict and simulate the future state of an urban system more reliably and accurately. Land-use-transportation models (LUTMs) address this issue by modeling the two-way interaction between the transportation and land use systems in a comprehensive solution space. Such integrated urban system models have been under development since the 1960s. A variety of model systems have been developed over time, including PECAS [1], MEPLAN [2], TRANUS [3], ITLUP [4], ILUTE [5], MUSSA [6], Delta [7], and UrbanSim [8].
LUTMs vary in terms of their level of spatial (and associated socio-economic) aggregation, as well as the extent to which they are deterministic or stochastic in their determination of process outcomes. The focus of this paper is on agent-based microsimulation LUTMs, in which system behavior is the emergent outcome of the decisions of individual agents (persons, households, firms), where these decisions are usually determined probabilistically through Monte Carlo simulation. Such models are motivated by the intrinsically non-linear nature of the behavior of significantly heterogeneous populations [9]. Delta [7], UrbanSim [8], and ILUTE [10] are three examples of highly disaggregated LUTM frameworks. This paper focuses on the ILUTE model system.
Of course, no LUTM system is able to simulate an urban system with perfect accuracy. This is due to the complexity of an urban region's system on the one hand, and the uncertain and changeable (stochastic) behavior of agents in the system on the other. In addition, human beings are the intelligent objects of an urban system, so simulating their behaviors and decision-making processes over time is not a straightforward modeling task. Given this, it is essential to analyze and test the uncertainties and stochastic properties associated with microsimulation modeling processes. This paper tests the stochastic behavior of ILUTE's demographic updating module by running the model system with many random seeds for the same base scenario and comparing the resulting forecasts. ILUTE is an integrated, disaggregated land use transportation modeling framework designed to simulate the dynamic evolution of urban spatial form, economic structure, demographics, and travel behavior over time for the Greater Toronto and Hamilton Area (GTHA). Several outputs simulated by the tested module are statistically scrutinized to understand the stochastic behavior of the ILUTE model.

ILUTE (Integrated Land Use Transportation Environment model) is an agent-object-based microsimulation model system designed to dynamically simulate the evolution of demographics, land use, and travel within urban areas [11]. It is a comprehensive, integrated solution space containing both transportation and land use components, intertwined to explore or project the impact of different policies on an urban region. ILUTE has been under development at the University of Toronto for several years [9,10,12-14].
Figure 1 illustrates a high-level flowchart of the ILUTE simulation engine, which consists of nine main modules: base year synthesis, demographic updating of agents, labour market, housing market, auto ownership model, commercial vehicle movement model, activity-based daily travel (TASHA), road and transit network assignment model, and environmental model [14]. The simulation starts from a base year, representing the initial state of the urban system, and proceeds towards a target end year, simulating the evolution of the whole urban system in monthly or yearly time steps depending on the prediction model used. An extensive time-series database has been assembled from a wide variety of sources to support ILUTE model development, including census, household travel survey, real estate, demographic, and special data. More details about input data are summarized by Miller et al. Several exogenous inputs, such as interest rates, energy prices, in- and out-migration rates, and zoning, are fed to the model system at each simulation time step.
In ILUTE, the evolution of the system state over time is defined in terms of several agents, including individual persons, families, households, dwelling units, firms, etc., that collectively define the urban system being modelled [15]. Agents play an important role in ILUTE's evolutionary engine as the decision-making units whose behaviours and attributes are simulated over time. First, lists of aggregate distributions of agents are synthesized for the base year to generate disaggregated sets of statistically representative agents. In this step, the interconnections of agents and objects, and their attributes and descriptions in the urban system, are simulated. At the next time step, population demographics such as aging and mortality are updated. Family/household composition (such as marriage, divorce, births, adult children leaving/returning home) and school participation are also updated in the demographic updating module. The other key processes explicitly modelled within the ILUTE evolutionary engine are listed below [16]:
1. Supply of new housing and commercial floorspace.
2. Firm growth/decline, location/relocation resulting in changes in the amount, type and location of employment.
3. Changes in labour force and school participation.
4. Changes in household residential location.
5. Changes in household auto ownership levels (also persons' possession of driver's licences and transit passes).
6. Person-based activity and travel.
The focus of this paper is testing the stochastic behaviour of demographic updating by examining the variability of predicted outputs in response to many random seed starting points. In the next section, a brief overview of ILUTE's demographic updating module is presented.

Demographic updating module
The demographic update component of ILUTE updates the socio-demographic attributes of person, family, and household agents throughout the simulation run. It is concerned with how life-changing events and major life decisions affect the demographic characteristics of the simulated population over time [17]. Individuals, families, and households are the three primary agents in ILUTE responsible for all demographic decisions. The module updates residential population demographics throughout the simulation in yearly time steps. Several demographic attributes, such as births, deaths, divorces, marriages, in-migration, out-migration, and dwellings, are updated in each simulation year. New agents (persons, families, and households) are introduced into the model through birth and in-migration, while agents exit from the model through death and out-migration events. Unions between agents are formed through a marriage market, while a divorce model dissolves existing ones [18]. Figure 2 illustrates the model structure of the demographic updating component in ILUTE.
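To make the yearly update cycle concrete, the following is a minimal sketch of one time step in which agents exit through death and out-migration and enter through birth and in-migration. It is not ILUTE's implementation: the `Person` class, the `update_one_year` function, the mortality curve, and all rates are hypothetical and chosen only for illustration, and marriage, divorce, families, and households are left out.

```python
import random
from dataclasses import dataclass

# Hypothetical rates for illustration only; these are NOT ILUTE parameters.
BIRTH_RATE = 0.06            # annual birth probability for women aged 20-39
OUT_MIGRATION_RATE = 0.01    # annual probability that a person leaves the region
IN_MIGRANTS_PER_YEAR = 5     # fixed number of arrivals per year


@dataclass
class Person:
    age: int
    female: bool


def death_probability(age: int) -> float:
    """Toy mortality curve that rises with age (an assumption, not real data)."""
    return min(1.0, 0.001 + 0.00002 * age ** 2)


def update_one_year(population, rng):
    """Advance the population by one yearly step; return (new_population, events)."""
    events = {"births": 0, "deaths": 0, "out_migration": 0, "in_migration": 0}
    survivors, newborns = [], []
    for person in population:
        if rng.random() < death_probability(person.age):
            events["deaths"] += 1          # agent exits through death
            continue
        if rng.random() < OUT_MIGRATION_RATE:
            events["out_migration"] += 1   # agent exits through out-migration
            continue
        if person.female and 20 <= person.age <= 39 and rng.random() < BIRTH_RATE:
            newborns.append(Person(age=0, female=rng.random() < 0.5))
            events["births"] += 1          # new agent enters through birth
        survivors.append(Person(age=person.age + 1, female=person.female))
    migrants = [
        Person(age=rng.randint(18, 40), female=rng.random() < 0.5)
        for _ in range(IN_MIGRANTS_PER_YEAR)
    ]
    events["in_migration"] = len(migrants)  # new agents enter through in-migration
    return survivors + newborns + migrants, events


rng = random.Random(1986)
population = [Person(age=rng.randint(0, 90), female=rng.random() < 0.5) for _ in range(1000)]
population, events = update_one_year(population, rng)
print(len(population), events)
```

Every random draw goes through the `rng` object passed in, so a whole run is determined by its seed. This is the property that repeated runs with different seeds rely on.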
Historical validation runs of the demographic updating module have been carried out for the twenty-year (1986-2006) simulation period, with model outputs compared to Canadian Census data and Transportation Tomorrow Survey (TTS) data for 1991, 1996, 2001 and 2006. Figure 3 presents the simulated vs. historical family and household type distributions for these four years (based on Census data). The results show that ILUTE overestimates households, and the errors appear to propagate slightly over the last three years. On the other hand, ILUTE reproduces the number of families with high accuracy over the twenty-year simulation, with a slight overproduction of families in 2006.
Figure 4 plots the simulation results for births, deaths, out-migration, and total population from ILUTE against the corresponding historical data over time. The results show that ILUTE simulates the birth and death rates with high accuracy (with a slight under-prediction of deaths), but out-migration rates start off slightly too high. As ILUTE simulates the evolution of the urban system incrementally over time in yearly time steps, the out-migration results are corrected; this may be due to the population levels increasing faster than they should have [19].

Stochastic Analysis of ILUTE Model System
In micro-simulation modeling, the behaviors of agents (as intelligent objects) are modeled through a variety of behavioral modeling techniques. ILUTE is a multi-agent micro-simulation model system in which a variety of agents, such as persons, families, households, and firms, act and interact in complex ways. In ILUTE, a behavior is implemented by objects called "decision-making units" [10]. In order to project and predict real-world interactions between agents, a variety of behavioral models and events were designed in ILUTE to simulate the random behaviours of different agents (known as intelligent objects) through a set of random seed starting points. Specifically, the Mersenne Twister pseudo-random number generator is used [20]. However, using such random numbers introduces uncertainty into the outputs of the ILUTE model system, since each independent simulation run produces a different set of outputs. Because ILUTE is an incremental model system, it is very important to examine the behavior of each individual component of ILUTE in a series of independent runs and to investigate the distribution of the results and their associated errors. This examination shows how errors propagate incrementally over the simulation of urban system evolution, and also whether different components of the system interact correctly with each other. Because most components of ILUTE simulate natural phenomena, the error term of the model system is expected to have a normal distribution. If instead the distribution of simulation runs is skewed to the right or left, there is a chance that uncertainties or errors in the model system propagate over the simulation years. In this regard, this paper investigates the uncertainty in the predictive models that arises from the random (Monte Carlo) simulation nature of the ILUTE model system.
Studying the variability in ILUTE outputs due to the random effects of behavioral models is essential to test its predictive performance. To do that, 1000 independent simulation runs over a twenty-year (1986-2006) historical simulation period were undertaken for a sample of 150K households (around 10% of the full population). Each simulation run starts with over 686K agents (including 423K persons, 113K families, and 150K households), and the overall number of agents grows past 1.1 million after a twenty-year run. On a computer with an i7-2600 processor (3.4 GHz, 8 threads) and 16 GB of RAM running a 64-bit Windows 7 operating system, each simulation run takes under 5 minutes to complete. In order to test the stochastic properties of ILUTE, a set of random seed starting points is generated for each simulation run. The 1000 simulation runs take 22 hours and 30 minutes to complete. In order to reduce the simulation runtime, a high performance cloud computing (HPCC) platform comprising 10 nodes (Canada's SHARCNET cloud platform) is also used, with parallel programming, for the 1000 simulation runs. Each node runs at 1.6 GHz with 4 threads. Using the HPCC platform reduces the simulation runtime significantly, to around 9 hours.
The stability of ILUTE across the 1000 simulation runs is tested by examining the variability of various demographic outputs, including births, deaths, families, and households, which are discussed in the following three sub-sections.
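The multi-run design described above can be sketched as follows, assuming a toy stand-in for the simulation. The `simulate_births` function, its parameters, and the seed values are hypothetical; only the general pattern (one independent pseudo-random stream per seed, with outputs collected per year) reflects the text. CPython's `random.Random` uses the Mersenne Twister, the same family of generator named above.

```python
import random
import statistics


def simulate_births(seed, years=20, women=1000, birth_rate=0.06, growth=0.01):
    """One hypothetical run: yearly births as Bernoulli draws over a growing cohort."""
    rng = random.Random(seed)  # CPython's random module is a Mersenne Twister
    births_by_year = []
    cohort = women
    for _ in range(years):
        births = sum(rng.random() < birth_rate for _ in range(cohort))
        births_by_year.append(births)
        cohort = int(cohort * (1 + growth))
    return births_by_year


def run_many(seeds):
    """Run the toy model once per seed; each run depends only on its own seed."""
    return [simulate_births(seed) for seed in seeds]


runs = run_many(range(50))                   # 50 independent runs (illustrative)
final_year = [run[-1] for run in runs]       # births in the last simulated year
print("mean:", statistics.mean(final_year))
print("st. dev.:", statistics.stdev(final_year))
```

Because a run is fully determined by its seed, the list of seeds can be split across worker processes or cluster nodes without changing the results. That is what makes this kind of experiment straightforward to parallelize.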

Testing distribution of demographic outputs
In ILUTE's simulation engine, demographic outputs are simulated in yearly time steps. The distribution of each demographic output under examination is plotted across simulation runs for all simulation years. Figure 5 shows the distribution of the simulated number of births as an example. In order to analyze the variability of the demographic outputs from the 1000 runs, normal frequency distribution curves are fitted to each simulated demographic output for each year and then overlaid for all simulation years. Figure 6 plots the fitted normal frequency distributions for the predicted number of births, deaths, families, and households generated each year by the ILUTE demographic updating module. Each line in the graphs represents a normal frequency distribution for a specific simulation year. As can be seen from these plots, births and deaths each have generally similar probability distributions (variances) across the years, while the variances in the number of families and households increase over time following roughly the same pattern. Although the variance grows over time, the run-to-run variability is small in numerical terms, so the overall confidence in the simulated numbers of births, deaths, families, and households is high. To explore the uncertainties in the demographic outputs in more detail, the spread in predicted values and the coefficient of variation (COV) for each simulation year are examined for each variable in the next two sections.
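Fitting a normal curve to the run-to-run outputs of each year amounts to estimating a mean and a standard deviation per year and evaluating the corresponding density. The sketch below illustrates this with synthetic data. The three means are the average birth counts reported in this paper for 1987, 2003, and 2006, whereas the standard deviations and the generated samples are assumptions made only for illustration.

```python
import random
from statistics import NormalDist

rng = random.Random(0)

# (mean, st. dev.) per year: means are the reported averages,
# standard deviations are hypothetical placeholders.
assumed = {1987: (5901, 75), 2003: (6370, 80), 2006: (6655, 82)}

# Synthetic stand-in for the births produced by 1000 runs in each year.
samples_by_year = {
    year: [rng.gauss(mu, sigma) for _ in range(1000)]
    for year, (mu, sigma) in assumed.items()
}

# Fit one normal distribution per year from the samples.
fitted = {year: NormalDist.from_samples(vals) for year, vals in samples_by_year.items()}

# Evaluate each fitted density at the mean and at +/- 1 and 2 standard deviations.
densities_by_year = {}
for year, dist in fitted.items():
    grid = [dist.mean + k * dist.stdev for k in (-2, -1, 0, 1, 2)]
    densities_by_year[year] = [dist.pdf(x) for x in grid]
    print(year, round(dist.mean, 1), round(dist.stdev, 1))
```

Overlaying the densities for all years on one axis gives curves of the kind shown in Figure 6: a curve that moves to the right indicates a growing mean, and a curve that becomes flatter indicates a growing variance.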

Births and deaths
Figure 7 plots the spread in the number of births and deaths by year across the simulation runs (plots a and c) and the corresponding COVs for these two outputs (plots b and d). Different colors in Figures 7a and 7c represent different runs of ILUTE. The red line in each graph is the average over all runs by year for each output. The results show that the number of deaths increases monotonically over time, which is to be expected since deaths basically track the overall growth in the GTHA population during the simulation period. The COV, however, declines over time, indicating that the spread in predicted values grows more slowly than the mean. The COVs are also small in magnitude.
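The COV is the standard deviation across runs divided by the mean across runs, so it can fall even while the absolute spread rises. A small numerical sketch, using made-up death counts from five hypothetical runs in an early and a late simulation year:

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)


# Hypothetical death counts across five runs (illustrative numbers only).
early_year = [2950, 2980, 3000, 3020, 3050]
late_year = [4440, 4470, 4500, 4530, 4560]

for label, values in (("early", early_year), ("late", late_year)):
    print(
        label,
        "mean =", statistics.mean(values),
        "st. dev. =", round(statistics.stdev(values), 2),
        "COV =", round(coefficient_of_variation(values), 4),
    )
```

In this example the standard deviation grows from the early to the late year, but the mean grows proportionally more, so the COV declines. This is the pattern described above for deaths.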
Births display a more complex pattern (Figure 7a). The long-term average birth rate appears to be more or less constant, but the year-to-year changes in average birth rates follow a somewhat sinusoidal pattern. This presumably reflects the trend in exogenous birth rates that are inputs to the model, as well as possibly trends in the relative number of females of prime child-bearing ages, the number of marriages in the modeled system, etc. The birth COV displays approximately the inverse pattern, rising when the birth rate falls and vice versa, overall displaying a relatively constant (and small) variance, as was qualitatively observed in Figure 7a. For both births and deaths, the COVs are consistently small in value, indicating the stability of ILUTE in predicting the two variables. In order to test the normality of the simulation runs for the birth and death components of the demographic updating module, the Shapiro-Wilk test is applied at the 0.05 significance level. The Shapiro-Wilk test checks the null hypothesis that a sample comes from a normally distributed population [21]. Its statistic is calculated as:

$$W = \frac{\left(\sum_{i=1}^{n} a_i x_{(i)}\right)^2}{\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2}$$

where $W$ is the Shapiro-Wilk value, $x_i$ is the value of the variable of interest in simulation run $i$, $x_{(i)}$ is the $i$-th smallest of these values, $\bar{x}$ is the mean over all $n$ simulation runs, and the $a_i$ are constants which depend on the expected values of the order statistics of independent and identically distributed random variables sampled from the standard normal distribution and on the covariance matrix of those order statistics.
The results of the test for the simulated births and deaths are summarized for all years in Table 1. At an alpha level of 0.05, the p-values for births and deaths are greater than 0.05 (except for deaths in 1999), so the null hypothesis of normality cannot be rejected and the data are consistent with a normal distribution. It can also be concluded that the error term has a normal distribution and that there is little chance that errors propagate over the simulation years (Figure 4).
Examining the normal Q-Q plots for each simulation year confirms the observation and the statistical test. As an example, normal Q-Q plots and detrended normal Q-Q plots for the simulation years 1991, 2001, and 2006 are graphed, comparing the observed births and deaths simulated in ILUTE on the X-axis with the expected values (assuming a normal distribution) on the Y-axis (Figures 8 and 9). If the ILUTE-generated births and deaths for a simulation year are distributed exactly like a normal distribution, the points should fall on a straight line. The Q-Q plots indicate that the generated birth and death distributions are very close to normal in each simulation year but deviate slightly at the tails of the normal curve for a small number of ILUTE runs. The detrended Q-Q plots illustrate how much the observed simulated values deviate from the expected values under a normal distribution. They also show that the birth and death results for the 1000 runs are concentrated very close to the mean, implying the stability of ILUTE in simulating these outputs.
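The Q-Q comparison can be sketched with the standard library alone. The code below builds the (observed, expected) pairs of a normal Q-Q plot, the deviations shown in a detrended plot, and the squared correlation between the two axes. That squared correlation is the Shapiro-Francia statistic, a simplified relative of Shapiro-Wilk, and is used here only as a stand-in; it is not the test reported in Table 1. The Blom plotting positions and the synthetic samples are assumptions for illustration.

```python
import random
from statistics import NormalDist, mean


def normal_qq_points(values):
    """Return (observed, expected) pairs for a normal Q-Q plot."""
    n = len(values)
    fitted = NormalDist.from_samples(values)
    ordered = sorted(values)
    # Blom plotting positions approximate the expected normal order statistics.
    expected = [fitted.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    return list(zip(ordered, expected))


def detrended(pairs):
    """Observed minus expected value, as plotted in a detrended Q-Q plot."""
    return [obs - exp for obs, exp in pairs]


def squared_correlation(pairs):
    """Squared Pearson correlation of the Q-Q points (Shapiro-Francia style)."""
    xs, ys = zip(*pairs)
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)


rng = random.Random(2006)
normal_like = [rng.gauss(6655, 80) for _ in range(1000)]   # synthetic, near-normal
skewed = [rng.expovariate(1.0) for _ in range(1000)]       # synthetic, right-skewed

w_normal = squared_correlation(normal_qq_points(normal_like))
w_skewed = squared_correlation(normal_qq_points(skewed))
print(round(w_normal, 4), round(w_skewed, 4))
```

A value close to 1 means the Q-Q points lie close to a straight line. The skewed sample gives a visibly lower value, mirroring what a Q-Q plot shows graphically when the tails depart from the line.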

Families and households
Figures 10a and 10c plot the spread in the predicted number of families and households by year, respectively, with the red line indicating the mean value for each year. As can be seen from the two figures, while the range of the number of families and households generated grows somewhat over the twenty-year simulation, the spreads for the two variables remain small and tight around their means. Predicted families and households follow roughly the same pattern, increasing monotonically over time. Figures 10b and 10d plot the coefficient of variation (COV) in predicted families and households for each year. The COV for the two variables follows a similar pattern, increasing for the first couple of simulation years and then tending to stabilize over time. In particular, the values of the COV across the simulation years are very small, which indicates that the deviation in predicted families and households from their means across simulation runs of ILUTE is negligible. It thus appears that ILUTE is very stable in simulating families and households across independent runs with different random seed starting points.
The Shapiro-Wilk normality test is performed for the 1000 simulation runs of families and households. The results of the test are summarized for all years in Table 2. At an alpha level of 0.05, the p-values for families are greater than 0.05, so the null hypothesis of normality cannot be rejected and the data are consistent with a normal distribution. The results for households differ, however: the simulation outputs are normally distributed for the years 1987 to 2003, but not for the last three years. Comparing the historical household data with the ILUTE results also shows that the errors tend to increase over the last three years of simulation. This indicates that the error term of the model does not have a normal distribution and that there is a greater chance that the errors propagate over the simulation years, as expected from the comparison of the historical data with the ILUTE results (Figure 3). This condition may also affect the accuracy of other ILUTE components which interact directly with the components of the household simulation module.
In general, these results are very encouraging for the use of large-scale stochastic LUTMs for practical policy analysis. They are, however, still partial in that other system components (notably the labour market and transportation demand models) still need to be added to the tests. The results also need to be extended to the 100% population case. Based on the current results, however, there is no reason to expect that a (properly validated) more complete model system or a larger sample size would produce significantly more chaotic or unreliable results.

Figure 2: Overall model structure of the demographic updating component in ILUTE.

Figure 5: Distribution of simulated number of births.

Figure 6: Normal distribution curves of ILUTE Multi-run over simulation years for (a) births (b) deaths (c) families (d) households.
Figure 5 plots the histograms of the simulated number of births for the years 1987, 2003, and 2006, together with their means and standard deviations, as an example. The results illustrate that the distribution of births is very close to normal for the given years. The average number of births grows from 5901 in 1987 to 6370 in 2003 and 6655 in 2006. The standard deviations (St. Dev) change only slightly across the three years, indicating that the spread of births across runs stays close to the mean in each year. Testing the distributions of births, deaths, families, and households for all simulated years (1986-2006) indicates that they behave normally across independent simulation runs.

Figure 7: Number of births (a) and deaths (c) by year, and the coefficient of variation (COV) in these predicted values for births (b) and deaths (d).

Figure 8: Normal Q-Q plots of number of births.

Figure 9: Normal Q-Q plots of number of deaths.

Table 1: Test of normality (Shapiro-Wilk test) for births and deaths simulations.