A stochastic approach for quantifying immigrant integration: the Spanish test case

We apply stochastic process theory to the analysis of immigrant integration. Using a unique and detailed data set from Spain, we study the relationship between local immigrant density and two social and two economic immigration quantifiers for the period 1999−2010. As opposed to the classic time-series approach, by letting immigrant density play the role of "time" and the quantifier the role of "space", it becomes possible to analyze the behavior of the quantifiers by means of continuous-time random walks. Two classes of results are obtained. First, we show that social integration quantifiers evolve following a pure diffusion law, while the evolution of economic quantifiers exhibits ballistic dynamics. Second, we make predictions of best- and worst-case scenarios taking into account large local fluctuations. Our stochastic process approach to integration lends itself to interesting forecasting scenarios which, in the hands of policy makers, have the potential to improve political responses to integration problems. For instance, estimating the mean first-passage time and the maximum span of the walk reveals local differences in integration performance for different immigration scenarios. Thus, by recognizing the importance of local fluctuations around national means, this research constitutes an important tool to assess the impact of immigration phenomena on municipal budgets and to set up solid multi-ethnic plans at the municipal level as immigration pressure builds.

I. INTRODUCTION
A particular political challenge of growing immigration is immigrant integration. It is considered a necessity for minimizing frictions and confrontation between immigrants and natives in the host community, as well as a precondition for a competitive and sustainable economy [1]. In response to the recent rapid growth in the number of immigrants throughout many major regions in the world, the need for political intervention targeting integration has become increasingly urgent [2]. Still, effective policymaking in this area is obstructed by the lack of rudimentary knowledge about how immigrant integration responds to an increase in immigration.
To this end, a recent work [3] proposed a new approach for studying key integration quantifiers, based on methods, models, and ideas from statistical physics. The theory describes and predicts how typical integration quantifiers change when the density of migrants increases. It predicted a linear growth for the averages of economic quantifiers, such as the permanent and temporary jobs given to immigrants, and a square-root growth for the averages of social quantifiers, such as mixed marriages and newborns to mixed couples. This framework is a powerful tool for policy makers interested in assessing and evaluating integration progress at the national level.
To deal with the phenomena at the municipality level, we use here a different theoretical framework based on the theory and techniques of continuous-time random walks [13,18]. The approach developed in [3], based on a full micro-macro statistical mechanics theory, proved highly effective at forecasting average values. However, since that model does not yet have an exact solution, its phase-space picture is not fully disclosed and it does not yet describe the structure of the fluctuations around the mean values. The random walk approach followed here, based on a meso-macro stochastic process, instead has the advantage of allowing full analytical control of both mean values and fluctuations.
We consider classical quantifiers of integration such as the fraction of all temporary and permanent labor contracts given to immigrants, the fraction of marriages with spouses of mixed origin (native and immigrant), and the fraction of newborns with parents of mixed origin. The evolution of these quantifiers versus the percentage of migrants inside the host country is "locally erratic"; that is, when looked at a fine level of resolution such as the municipality, it can be thought of as a random walk in which the change of time is represented by the change of migrant density in the municipality, while the integration quantifier (playing the role of the space variable) changes according to suitable probability distributions defining the stochastic process. Instead of being obtained via statistical mechanics, the evolution of averages is here the result of averaging over the whole ensemble of municipalities, i.e., over all the random walks.
From a sociological perspective, the evolution of the quantifiers with respect to the density of immigrants is, in fact, a random process whose stochasticity may depend on several exogenous factors driving immigration: fluctuations in the ratio between labor demand and labor supply in the host country [2], or "biases" resulting from, for example, push-pull factors [2] or different types of network-induced migration outcomes [4][5][6]. However, our aim here is not to explain or disentangle these mechanisms, but rather to look at the evolution of quantifiers as the combined effect of a "drift" in the presence of some "noise", regardless of its source. For this task we use random walk theory, which is the prototypical stochastic process and, at the same time, the basic model of diffusion phenomena and non-deterministic motion. Applications can be found, for example, in the study of transport in disordered media (e.g., [7]), anomalous relaxation in polymer chains (see e.g., [8]), financial markets (see e.g., [9]), and quantitative analysis in sports (see e.g., [10]).
Using stochastic process theory allows us to obtain a mesoscopic description of the behavior of the integration quantifiers and to address questions such as whether these socio-economic metrics are determined by memory-less stochastic processes or by processes with long-time correlations. Moreover, this framework allows us to analyze rare events and non-Markovian quantities, which are important determinants for planning insofar as they are key tools for quantifying fluctuations. That is, we aim to provide efficient tools to help assess the progress (or deficit) in integration, as well as to generate strong predictions for extreme-case scenarios at lower administrative levels such as municipalities; thereby, through an interplay between statistical mechanics and stochastic processes, we broaden the scope of practical applications of the quantitative theory of immigrant integration as a whole. Typical questions begging an answer are, for example: What is the worst/best case scenario in the two integration branches (social and economic integration) in a particular municipality if immigrant density changes from, say, 5 to 7 percent? And how does the magnitude of this change compare to that of an equivalent change at the national level, i.e., the average change, or in a similar/dissimilar municipality? In other words, through first-passage-time and maximum-span techniques, we obtain estimates for the expected value of immigrant density at which a particular integration quantifier (say, the share of immigrant workers or the number of mixed marriages) reaches a given threshold above which new policies, structures, services, facilities, etc., have to be made available.
The work is organized as follows: first we describe the database and the procedures for data extraction (Sec. II), then we explain in detail the mapping between the evolution of integration quantifiers and random walks (Secs. III and IV) and we report the related results (Sec. V). Finally, we discuss how such outcomes may be exploited to set up multi-ethnic plans and immigration policies in general more effectively (Sec. VI).

II. DATA DESCRIPTION, ANALYSIS AND ELABORATION
The data considered here refer to quarterly observations during the period 1999 to 2010. They are drawn from Spain's Continuous Sample of Employment Histories (the so-called Muestra Continua de Vidas Laborales, or MCVL) [19] and from the local offices of Vital Records and Statistics across Spain (Registro Civil) [20]. The former provides detailed data on labor contracts, and the latter provides detailed data on spouses and on parents of newborns. Information on the immigrant density of municipalities is drawn from the Municipal population registers [21]. A unique feature of the Spanish data is that all three data sources also include so-called undocumented immigrants, that is, immigrants who lack a residence permit. Undocumented immigrants are usually not included in official statistical sources. However, their share of the immigrant population is often significant, and excluding them would underestimate the true size of the immigrant population as well as the frequency of the socio-economic events used to measure integration.
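Because the municipality is the lowest administrative level for which data on density are available, the individual data on mixed events are aggregated at the municipality level. From these datasets, for each municipality [22] we obtain quarterly time series for the following quantities:
$$J_t = \frac{\#\,\text{temporary contracts to immigrants}}{\#\,\text{temporary contracts}},$$
$$J_p = \frac{\#\,\text{permanent contracts to immigrants}}{\#\,\text{permanent contracts}},$$
$$M_m = \frac{\#\,\text{marriages with spouses of mixed origin}}{\#\,\text{marriages}},$$
$$B_m = \frac{\#\,\text{newborns with mixed parents}}{\#\,\text{newborns}}.$$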
As explained below, by studying how the quantities in Eqs. 1-4 vary with the overall fraction of immigrants, we can unveil the growth law determining their evolution and, based on this information, make predictions.
In order to assess the evolution of the immigrant-native system, a convenient quantity to use as control parameter is the cross-link measure $\Gamma$, defined in terms of $\gamma = N_{imm}/N$, the fraction of immigrants. Indeed, $\Gamma$ provides an intensive measure of the cross-links existing between the communities of natives and of immigrants (note that for small values of $\gamma$, $\Gamma \sim \gamma$, so the percentage of migrants can roughly be identified with the "time" variable of our mapping). Moreover, unlike other possible choices such as time itself, using $\Gamma$ avoids inaccuracies due to seasonality and allows us to compare municipalities of different sizes directly (see also [3]). Complete time series for data on labor contracts involve $M_J = 124$ municipalities and consist of 2976 data entries over the period 2005−2010, sampled quarterly (i.e., 24 trimesters overall). Complete series for data on marriages and newborns involve $M_M = 581$ municipalities and consist of 23240 data entries spanning the period 1999−2008, sampled quarterly (i.e., 40 trimesters overall).
Thus, for any municipality $i$, we consider five time series: one for $\Gamma$ and one for each observable in Eqs. 1-4, hereafter denoted generically as $X^{(i)}$.
As $\Gamma$ varies, each series $X^{(i)}$ traces a "path" in the related space, and this point process can be regarded as a continuous-time random walk (CTRW) [23], where the time variable is given by $\Gamma$, while the space variable is given by $X^{(i)}$; see Fig. 1. This mapping is fully described in the next section.
Finally, in Fig. 2 we show the time series for $X^{(i)}$ and $\Gamma^{(i)}$ vs time (in units of trimesters) to highlight the different shapes of the paths.

A. Telegraphic introduction to CTRWs
A CTRW process can be depicted as a dynamical point (for concreteness, embedded in a one-dimensional space, the only case needed here) which occupies a position $r(t)$ at time $t$ (see also Fig. 3). Let us suppose that the point starts at the origin, that is, $r(0) = 0$. It then stays at its position until time $t_1$, when it jumps to $\xi_1$, where it waits until time $t_2 > t_1$, when it jumps to a new location $\xi_1 + \xi_2$, and so on. The series $\{t_1, t_2, \dots\}$ defines the times of the jumping events. The waiting times $\{\tau_i\}$, with $\tau_i = t_i - t_{i-1}$, and the widths of the instantaneous jumps $\{\xi_i\}$ are continuous random variables drawn from the distribution $\psi(\xi, \tau)$. The latter determines the long-time properties of the walk: a diverging average waiting time typically corresponds to subdiffusive behavior, while a diverging variance of the jump widths typically corresponds to superdiffusive behavior.
FIG. 2: [...] (see Fig. 1) are depicted in different colors. Notice that seasonality effects emerge for marriages: marriages are more frequent during the summer months.
FIG. 3: Example of a path realized by a CTRW whose step widths and waiting times are drawn from the distributions given by Eqs. 28 and 30, respectively, with parameters consistent with those found experimentally (see Tab. II).
The position $r$ of the particle at the $k$-th jump, that is, at time $t_k$, is given by the sum $r(t_k) = \sum_{i=1}^{k} \xi_i$. Obtaining $r(t)$, namely a direct dependence on $t$, requires the introduction of the random variable $n(t)$, representing the number of steps performed up to time $t$ and defined by $n(t) = \max\{m : t_m \le t\}$, in such a way that
$$r(t) = \sum_{i=1}^{n(t)} \xi_i .$$
The expected value $\langle r(t) \rangle$ of the displacement can be derived from the probability distributions for the waiting time and for the step length. In fact, focusing on the decoupled case $\psi(\xi, \tau) = f(\xi)\,\psi(\tau)$ [24], we can define $\bar{\xi} = \int \xi f(\xi)\, d\xi$ and $\bar{\tau} = \int \tau \psi(\tau)\, d\tau$, whereby, as long as $\bar{\tau}$ is finite, one can show that, in the limit of large $t$ [11],
$$\langle r(t) \rangle \simeq \frac{\bar{\xi}}{\bar{\tau}}\, t .$$
Thus, if there is no net drift ($\bar{\xi} = 0$), the average displacement is zero and one usually looks at the mean square displacement, which turns out to scale as $\langle r^2(t) \rangle \sim \overline{\xi^2}\, t/\bar{\tau}$, so that the purely diffusive limit is recovered.
On the other hand, in the presence of a net drift ($\bar{\xi} \neq 0$), the mean displacement can also be expressed in terms of the mean number of steps $\langle n(t) \rangle$ performed up to time $t$ as (see e.g., [11,12])
$$\langle r(t) \rangle = \bar{\xi}\, \langle n(t) \rangle \tag{8}$$
and, accordingly, $\langle r^2(t) \rangle \sim \langle r(t) \rangle^2$ [11,12]. From Eq. 8, one can see that if the mean waiting time diverges or displays any anomalous behavior, the biased motion turns out to be anomalous as well.
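For a decoupled CTRW with finite mean waiting time, the mean displacement is expected to behave as $\langle r(t) \rangle \simeq \bar{\xi}\, t/\bar{\tau}$ at large $t$. The following is a minimal Monte Carlo sketch of that statement, not the procedure used in this work: it assumes exponential waiting times and Gaussian jumps with hypothetical parameters. With exponential waiting times $\langle n(t) \rangle = t/\bar{\tau}$ holds exactly, so the estimate should match $\bar{\xi}\, t/\bar{\tau}$ up to sampling noise.

```python
import random


def ctrw_position(t, sample_step, sample_wait, rng):
    """Position r(t) of one decoupled CTRW started at r(0) = 0.

    The walker waits, then jumps; jumps whose time t_k exceeds t are not
    counted, so r(t) is the sum of the first n(t) jumps.
    """
    r, clock = 0.0, 0.0
    while True:
        clock += sample_wait(rng)  # time t_k of the next jump
        if clock > t:
            return r
        r += sample_step(rng)  # instantaneous jump xi_k


def mean_position(t, sample_step, sample_wait, walks=5000, seed=1):
    """Ensemble average <r(t)> over independent walks."""
    rng = random.Random(seed)
    total = sum(ctrw_position(t, sample_step, sample_wait, rng) for _ in range(walks))
    return total / walks


# Hypothetical parameters: mean waiting time tau_bar = 1, mean jump xi_bar = 0.5.
TAU_BAR, XI_BAR = 1.0, 0.5


def exp_wait(rng):
    return rng.expovariate(1.0 / TAU_BAR)  # exponential waiting time, mean TAU_BAR


def gauss_step(rng):
    return rng.gauss(XI_BAR, 1.0)  # Gaussian jump, mean XI_BAR, unit variance


for t in (10.0, 50.0, 100.0):
    # Monte Carlo estimate next to the large-t prediction xi_bar * t / tau_bar.
    print(t, mean_position(t, gauss_step, exp_wait), XI_BAR * t / TAU_BAR)
```

Setting `XI_BAR = 0` turns the sketch into the drift-free case, where the mean stays near zero and one would look at $\langle r^2(t) \rangle$ instead.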
Of course, the definitions given here can be extended to a geometrical space with arbitrary topology [13].
Although this random walk process is, by definition, Markovian, one can also introduce related non-Markovian quantities such as the mean first-passage time $\tilde{t}$ and the maximum span $\tilde{r}$ [14].
The mean first-passage time represents the mean time taken by a random walk to first reach a (fixed) point placed at a given initial distance $r$. Its dependence on $r$ qualitatively depends on the kind of diffusion realized; in particular, $\tilde{t}$ scales as $r^2$ for pure diffusion and as $r$ for ballistic (drift-dominated) motion. The maximum span represents the farthest distance ever reached by a random walk up to time $t$. Again, the functional form of $\tilde{r}$ as a function of $t$ depends on the kind of diffusion realized: $\tilde{r} \sim \sqrt{t}$ for pure diffusion and $\tilde{r} \sim t$ for ballistic motion. These relatively simple laws stem from the peculiarity of the one-dimensional structure. In general, the behavior of $\tilde{t}$ and $\tilde{r}$ depends functionally on the underlying topology.
Indeed, due to their non-Markovian nature, estimating such quantities may be rather tricky, yet they are intensively studied as they provide useful information and play an important role in many real situations (e.g. transport in disordered media, neuron firing, spread of diseases and target search processes [13,15,16]).
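In one dimension the mean span of a walk is expected to grow like $\sqrt{t}$ for pure diffusion and like $t$ in the presence of a drift. The short simulation below illustrates this; it assumes, for brevity only, a discrete-time walk with unit time steps and $\pm 1$ jumps plus an optional constant drift, rather than a full CTRW. Quadrupling $t$ should roughly double the mean span without drift and roughly quadruple it with drift.

```python
import random


def mean_span(t, drift, walks=2000, seed=2):
    """Ensemble average of the span max_{n <= t} r_n of a 1D walk started at 0."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(walks):
        r = best = 0.0
        for _ in range(t):
            r += drift + (1.0 if rng.random() < 0.5 else -1.0)
            if r > best:
                best = r  # farthest point reached so far
        total += best
    return total / walks


for drift in (0.0, 0.5):
    ratio = mean_span(400, drift) / mean_span(100, drift)
    print(f"drift={drift}: span(400)/span(100) = {ratio:.2f}")
```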
To summarize, the CTRW is a stochastic model for which ψ(τ ) and f (ξ) serve as input functions. The output is provided by the temporal series {t 1 , t 2 , ...} and {r 1 , r 2 , ...} from which quantities such as mean squared displacement, mean first-passage time, etc. can be calculated.
In the next section, the jump widths $\xi_i$ as well as the positions $r(t)$ will take on different meanings (i.e., the number of mixed marriages, of newborns from mixed couples, or of temporary/permanent contracts to immigrants) according to the specific quantifier addressed.

III. THE MAPPING IN A NUTSHELL
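Let us denote by $X^{(i)}$ a generic quantifier (i.e., the fraction of mixed marriages, of newborns from mixed couples, or of temporary/permanent contracts to immigrants), where $i$ specifies the municipality. Depending on the quantifier considered, $i$ ranges from 1 to $M_J$ or to $M_M$. Therefore, we have the time series
$$\{(\Gamma^{(i)}_k,\, X^{(i)}_k)\}_{k=0}^{T},$$
where $\Gamma^{(i)}_k$ and $X^{(i)}_k$ are the values measured in the $k$-th trimester. For a (one-dimensional) CTRW of $T$ steps, defined by the two series $\{\xi_n\}_{n=1}^{T}$ and $\{t_n\}_{n=1}^{T}$, where $\xi_n$ is the jump width and $t_n$ is the time at which the $n$-th step occurs, we recall that the position $r(t)$ of a walker at time $t$ is obtained by
$$r(t) = \sum_{n=1}^{n(t)} \xi_n .$$
Analogously, for the $i$-th municipality, the displacement of the quantifier from its initial value at cross-link degree $\Gamma$ is
$$X^{(i)}(\Gamma) - X^{(i)}_0 = \sum_{k=1}^{n^{(i)}(\Gamma)} \Delta X^{(i)}_k ,$$
where $\Delta X^{(i)}_k = X^{(i)}_k - X^{(i)}_{k-1}$ plays the role of the jump width $\xi_k$, $\Gamma^{(i)}_k$ that of the jump time $t_k$, and $n^{(i)}(\Gamma) = \max\{k : \Gamma^{(i)}_k \le \Gamma\}$ that of $n(t)$. Therefore, we can look at the set of $M$ municipalities as a set of $M$ random walks. Before proceeding, however, a couple of remarks are in order.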
In principle, $\Gamma$ and $X$ are bounded by 1; yet the number of immigrants corresponds to a small fraction of the overall population, in such a way that $\Gamma, X \ll 1$ and we can neglect boundaries [25].
Moreover, $\Gamma$ and $X$ are not continuous variables, as there exists an intrinsic unit given by 1/(number of marriages), 1/(number of newborns) and 1/(number of contracts), representing our experimental sensitivity. However, such a unit is in general much smaller than the quantities measured, which can therefore be considered continuous.
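The mapping from a municipality's quarterly series to a CTRW can be sketched in a few lines. This is an illustrative helper, not the authors' code: it assumes one municipality's series are given as Python lists `gammas` and `xs` (hypothetical names) with $\Gamma$ non-decreasing, turns them into waiting times $\Delta\Gamma_k$ and jumps $\Delta X_k$, and rebuilds the displacement $X(\Gamma) - X_0$ by summing the jumps whose "time" $\Gamma_k$ does not exceed $\Gamma$, just as $r(t)$ sums the first $n(t)$ jumps.

```python
from bisect import bisect_right
from itertools import accumulate


def to_ctrw(gammas, xs):
    """Increments of one municipality's series: waiting times and jump widths."""
    waits = [g1 - g0 for g0, g1 in zip(gammas, gammas[1:])]  # Delta Gamma_k
    jumps = [x1 - x0 for x0, x1 in zip(xs, xs[1:])]  # Delta X_k
    return waits, jumps


def displacement(gamma, gamma0, waits, jumps):
    """X(Gamma) - X_0: sum of the jumps that occur at 'times' Gamma_k <= gamma."""
    # Jump times Gamma_1, Gamma_2, ...; sorted because waits are assumed >= 0.
    jump_times = list(accumulate(waits, initial=gamma0))[1:]
    n = bisect_right(jump_times, gamma)  # n(Gamma): steps performed up to gamma
    return sum(jumps[:n])


# Toy series with hypothetical numbers for a single municipality.
gammas = [0.0, 0.25, 0.5, 1.0]
xs = [0.0, 0.5, 0.5, 1.25]
waits, jumps = to_ctrw(gammas, xs)
print(waits, jumps, displacement(0.6, gammas[0], waits, jumps))
```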
Therefore, we can treat the set of $M$ municipalities as a set of $M$ random walks, for which we can build the following ensemble average:
$$\langle X(\Gamma) \rangle = \frac{1}{M} \sum_{i=1}^{M} X^{(i)}(\Gamma) .$$
Similarly, for the average square distance covered,
$$\langle X^2(\Gamma) \rangle = \frac{1}{M} \sum_{i=1}^{M} \left[X^{(i)}(\Gamma)\right]^2 .$$
The progression of the quantifiers $X(\Gamma)$ averaged over the whole set of municipalities, that is to say, the average displacement of the related CTRW, is shown in Fig. 4, where fits show the following behaviors:
$$\langle J_t(\Gamma) \rangle \sim \Gamma, \qquad \langle J_p(\Gamma) \rangle \sim \Gamma, \qquad \langle M_m(\Gamma) \rangle \sim \sqrt{\Gamma}, \qquad \langle B_m(\Gamma) \rangle \sim \sqrt{\Gamma},$$
which are perfectly consistent with those outlined in [3], even though the procedure for their derivation is conceptually different; this lends robustness to both results. To summarize, in our random-walk picture for the evolution of an integration quantifier $X$, in each municipality the quantifier starts from zero and, for a given variation of the related immigrant percentage $\Gamma$, increases or decreases until the path ends. The trajectory of $X$ versus $\Gamma$ qualitatively resembles the position of a CTRW as a function of time (see Figs. 1 and 3).
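The ensemble average and the two growth laws can be computed along the following lines. This is a hedged sketch: the binning in $\Gamma$, the helper names, and the ordinary least-squares fit of $p\,u(\Gamma) + q$ with $u(\Gamma) = \sqrt{\Gamma}$ (family quantifiers) or $u(\Gamma) = \Gamma$ (job quantifiers) are our assumptions about a reasonable procedure, not a description of the exact fitting routine behind Fig. 4.

```python
import math
from bisect import bisect_right


def binned_average(walks, edges):
    """Binned ensemble average <X(Gamma)> over municipalities.

    walks: list of (gammas, xs) pairs, one per municipality.
    edges: increasing bin edges; returns (centres, means) for non-empty bins.
    """
    nbins = len(edges) - 1
    sums, counts = [0.0] * nbins, [0] * nbins
    for gammas, xs in walks:
        for g, x in zip(gammas, xs):
            b = bisect_right(edges, g) - 1  # index of the bin containing g
            if 0 <= b < nbins:
                sums[b] += x
                counts[b] += 1
    centres = [0.5 * (edges[b] + edges[b + 1]) for b in range(nbins) if counts[b]]
    means = [sums[b] / counts[b] for b in range(nbins) if counts[b]]
    return centres, means


def fit_law(gammas, values, transform):
    """Least-squares fit values ~ p * transform(Gamma) + q; returns (p, q, R^2)."""
    u = [transform(g) for g in gammas]
    n = len(u)
    mu, mv = sum(u) / n, sum(values) / n
    suu = sum((a - mu) ** 2 for a in u)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, values))
    p = suv / suu
    q = mv - p * mu
    ss_res = sum((b - (p * a + q)) ** 2 for a, b in zip(u, values))
    ss_tot = sum((b - mv) ** 2 for b in values)
    return p, q, 1.0 - ss_res / ss_tot


def sqrt_law(g):
    return math.sqrt(g)  # family quantifiers: p1*sqrt(Gamma) + p2


def linear_law(g):
    return g  # job quantifiers: p3*Gamma + p4
```

Comparing the $R^2$ returned with `sqrt_law` and with `linear_law` on the same binned averages is one simple way to decide which growth law describes a quantifier better.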
In the next section we analyze the CTRWs associated with the quantifiers and try to gain a microscopic perspective on the origin of these laws. Such a perspective will allow us to speculate about possible effects and to make crucial forecasts.

IV. FORMALIZING THE MAPPING
We first check that the CTRWs corresponding to $J_p$, $J_t$, $M_m$ and $B_m$ are decoupled, that is, that the joint probability distributions $\psi(\Delta X, \Delta\Gamma)$ for the generic increments $\Delta X$ and $\Delta\Gamma$ can be factorized into $f(\Delta X)\,\psi(\Delta\Gamma)$: this is verified through direct inspection of the scatter plots reported in Fig. 5.
FIG. 4: [...] $B_m$ (panel d). The available data were binned over $\Gamma$ and averaged over the set of $M$ municipalities; the resulting values (•) and the related best fit (solid line) are shown. In particular, for family quantifiers we fitted the law $r = p_1\sqrt{t} + p_2$, while for job quantifiers we used the law $r = p_3 t + p_4$; best-fit coefficients are summarized in Tab. I. In general, the goodness of fit $R^2$ ranges between 0.97 and 0.99. Notice that $\langle X^2(\Gamma) \rangle \sim \langle X(\Gamma) \rangle^2$ suggests the presence of a drift [11].
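Thus, we can proceed by studying $f(\Delta X)$ and $\psi(\Delta\Gamma)$ separately. We recall that such distributions provide qualitative information about the diffusive behavior of the walks associated with our quantifiers, that is, about their progress in time. Moreover, from $f(\Delta X)$ and $\psi(\Delta\Gamma)$ we can derive the expectation values
$$\overline{\Delta X} = \int \Delta X\, f(\Delta X)\, d(\Delta X), \qquad \overline{\Delta\Gamma} = \int \Delta\Gamma\, \psi(\Delta\Gamma)\, d(\Delta\Gamma),$$
which play the roles of the expected jump length and of the expected waiting time, respectively. Analogously, we can derive $\overline{n}(\Gamma)$, which plays the role of the expected number of steps performed up to "time" $\Gamma$, that is
$$\overline{n}(\Gamma) = \sum_{n=0}^{\infty} n\, Q(n|\Gamma),$$
where $Q(n|\Gamma)$ is the probability that $\sum_{j=1}^{n} \Delta\Gamma_j$ is smaller than $\Gamma$, but $\sum_{j=1}^{n+1} \Delta\Gamma_j$ is larger than $\Gamma$. From these quantities, one finally has (see e.g., [11,12])
$$\overline{X}(\Gamma) = \overline{\Delta X}\; \overline{n}(\Gamma).$$
Of course, the expectation $\overline{X}(\Gamma)$ and the ensemble average $\langle X(\Gamma) \rangle$ ought to be consistent (as checked in the next section). This consistency supports the ergodicity of the system and will allow us to exploit the analytical results derived from the probability distribution functions also for our "time" series.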
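The identity $\overline{X}(\Gamma) = \overline{\Delta X}\,\overline{n}(\Gamma)$ relies only on the independence of jumps and waiting times, so it can be checked on synthetic decoupled walks. In this sketch the uniform waiting times and the drifted, symmetric-exponential jumps are hypothetical choices, and $Q(n|\Gamma)$ is estimated as the empirical frequency of $n$ among the simulated walks.

```python
import random
from collections import Counter


def run_walks(gamma, sample_jump, sample_wait, walks, rng):
    """Return (n(Gamma), X(Gamma)) for independent decoupled walks."""
    out = []
    for _ in range(walks):
        clock, n, x = 0.0, 0, 0.0
        while True:
            clock += sample_wait(rng)  # cumulative 'time' sum_j dGamma_j
            if clock > gamma:
                break
            n += 1
            x += sample_jump(rng)
        out.append((n, x))
    return out


DRIFT, SCALE = 1e-3, 1e-3  # hypothetical mean jump and exponential scale


def laplace_jump(rng):
    # Symmetric exponential noise around a positive drift.
    return DRIFT + rng.choice((-1.0, 1.0)) * rng.expovariate(1.0 / SCALE)


def uniform_wait(rng):
    return rng.uniform(0.0, 2e-3)  # mean waiting time 1e-3


rng = random.Random(3)
runs = run_walks(0.05, laplace_jump, uniform_wait, walks=4000, rng=rng)
Q = Counter(n for n, _ in runs)  # empirical counts behind Q(n | Gamma)
mean_n = sum(n * c for n, c in Q.items()) / len(runs)  # sum_n n Q(n | Gamma)
mean_x = sum(x for _, x in runs) / len(runs)
print(mean_n, mean_x, DRIFT * mean_n)
```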

A. Step width and Waiting time distributions
Let us start with the distribution of the "step lengths" $f(\Delta X)$. In Fig. 6 we show the histograms of the increments $\Delta J_t$, $\Delta M_m$, $\Delta J_p$ and $\Delta B_m$ obtained from experimental data. In all cases the symmetric, centered exponential distribution $f(\Delta X) = (\lambda_X/2)\, e^{-\lambda_X |\Delta X|}$ provides an excellent fit. An exponential distribution of step lengths ensures that the related CTRW does not exhibit any superdiffusive feature, since the central limit theorem is fulfilled. The fit coefficient $\lambda_X$ depends on the quantifier considered and is directly related to the expected absolute step by $\lambda_X^{-1} = \overline{|\Delta X|}$. Results are collected in Tab. II, where a comparison with the experimental average values $\langle |\Delta X| \rangle$ and $\langle \Delta X \rangle$ is also provided. The goodness of the fit is corroborated by the fact that $\lambda_X^{-1}$ and $\langle |\Delta X| \rangle$ coincide within the error. However, looking at $\langle \Delta X \rangle$ we report a slight deviation: while one would expect a null average value because the distribution is centered, the average is systematically positive for all quantifiers, which implies that, as $\Gamma$ increases, $X$ is more likely to grow than to decrease. In the random-walk picture, this can be interpreted as the presence of a drift which biases the motion of the walker.
TABLE II: [...] Fig. 6, while the third and fourth columns contain the related average values, where the average is performed on raw data over all municipalities. Since the exponential distribution is supported on positive values, $\lambda_X^{-1}$ has to be compared with $\langle |\Delta X| \rangle$. Moreover, we checked that the absolute error on $\langle |\Delta X| \rangle$ is approximately equal to $\langle |\Delta X| \rangle$ itself, as expected for an exponentially distributed variable. Notice that the average displacement $\langle \Delta X \rangle$ in a single step is positive for every quantifier.
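For a centered law $f(\Delta X) = (\lambda_X/2)\,e^{-\lambda_X|\Delta X|}$, the maximum-likelihood estimate of $\lambda_X^{-1}$ is simply the sample mean of $|\Delta X|$, which is why $\lambda_X^{-1}$ is compared with $\langle |\Delta X| \rangle$. The helper below is a sketch of that estimate (not necessarily how the histograms of Fig. 6 were fitted); it also returns the signed mean $\langle \Delta X \rangle$, whose systematic positivity is read as a drift, and the spread of $|\Delta X|$, which for exponential data should be close to its mean.

```python
import statistics


def fit_centred_laplace(increments):
    """ML fit of f(dX) = (lam/2) exp(-lam*|dX|) with the centre fixed at 0.

    Returns (lam, mean_abs, signed_mean, sd_abs):
      mean_abs    = <|dX|>, the MLE of 1/lam;
      signed_mean = <dX>, zero for a truly centred law, positive under a drift;
      sd_abs      = spread of |dX|, roughly equal to mean_abs for exponential data.
    """
    abs_values = [abs(d) for d in increments]
    mean_abs = statistics.fmean(abs_values)
    signed_mean = statistics.fmean(increments)
    sd_abs = statistics.pstdev(abs_values)
    return 1.0 / mean_abs, mean_abs, signed_mean, sd_abs
```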
Let us now move to the distribution for the "waiting times" ∆Γ. In Fig. 7 we show the histogram for the increments ∆Γ obtained from experimental data related to the time period and to the municipalities considered. Interestingly, here qualitative differences emerge between the job quantifiers, i.e. J t and J p , and the family quantifiers, i.e. M m and B m .
Before proceeding, it is worth stressing that for job quantifiers and family quantifiers the sampling periods are not exactly the same, being 2005−2010 and 1999−2008, respectively (of course, the consistency between the related time series has been checked over the overlapping period [3]). In order to ensure that the qualitative differences reported do not stem from the different time intervals but are intrinsic, we repeated the analysis shown in Fig. 7 restricting it to the common period 2005−2008, and found the result to be robust.
Calling $\psi_F$ and $\psi_J$ the distributions for family and job quantifiers, respectively, the reason for their intrinsic difference can be traced back to the way the mapping between quantifier evolution and random walks has been fixed. In particular, there exist trimesters $i$ in which a growth in the number of immigrants is reported, i.e., $\Gamma_i - \Gamma_{i-1} > 0$, but no change in the quantifier $X$ occurs, i.e., $\Delta X_i = 0$. In such cases the two trimesters are effectively merged, as the overall waiting time becomes $\Gamma_{i+1} - \Gamma_{i-1}$. This procedure can be repeated iteratively until each step of the walk corresponds to a true displacement. As one can see from Fig. 7, such mergings are more frequent for family quantifiers, so that the related waiting times span a larger range. Otherwise stated, the integration of immigrants within the labor market is more direct: as long as new immigrants arrive, a fraction of them get a job, either permanent or temporary. Conversely, integration from a family perspective is more complex and does not follow a prescribed pattern: not surprisingly, the arrival of new immigrants does not necessarily translate into integration when considering these quantifiers. This is consistent with the results in [3], where, from a different perspective, it is shown that the qualitative difference between the laws $M_m(\Gamma)$, $B_m(\Gamma)$ and $J_t(\Gamma)$, $J_p(\Gamma)$ is due to a different degree of interaction among agents in the two scenarios (families and jobs).
It is worth stressing that this effect is not directly attributable to the seasonality of marriages; this can be seen, for instance, from the fact that the same effect emerges for newborns, whose time series do not display any seasonality.
Let us now analyze the waiting time distributions in more detail.
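The merging rule for trimesters with no change in the quantifier can be written as a single pass over a municipality's series. This is an illustrative sketch under two stated assumptions: an unchanged quantifier is stored as exactly the same number in consecutive trimesters (so that $\Delta X = 0$ can be tested exactly), and any growth of $\Gamma$ accumulated after the last non-zero jump is dropped, because the walk ends there.

```python
def merge_null_steps(gammas, xs):
    """Collapse trimesters with Delta X = 0 into longer waiting times.

    Returns (waits, jumps) in which every jump is non-zero and each wait is the
    total growth of Gamma since the previous non-zero jump, e.g.
    Gamma_{i+1} - Gamma_{i-1} when trimester i had Delta X_i = 0.
    """
    waits, jumps = [], []
    pending = 0.0  # Gamma growth not yet attached to a jump
    for k in range(1, len(xs)):
        pending += gammas[k] - gammas[k - 1]
        dx = xs[k] - xs[k - 1]
        if dx != 0:
            waits.append(pending)
            jumps.append(dx)
            pending = 0.0
    return waits, jumps
```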
For family quantifiers, the distribution $\psi_F(\Delta\Gamma)$ fitting the experimental histogram is a log-normal distribution, whose average value is expected to be $\overline{\Delta\Gamma} = e^{\mu + \sigma^2/2}$. For jobs, the best fit is provided by a half-normal distribution, for which the average value is expected to be $\overline{\Delta\Gamma} = \mu$. Details on fitting coefficients and average values are collected in Tab. III; notice that, in both cases, $\overline{\Delta\Gamma}$ turns out to be comparable with the ensemble average $\langle \Delta\Gamma \rangle$.
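As a consistency check on the log-normal case, one can plug the Family row of Tab. III into the log-normal mean formula $\overline{\Delta\Gamma} = e^{\mu + \sigma^2/2}$, assuming (our reading of the table) that $\mu = -6.6$ and $\sigma^2 = 0.32$ are the mean and variance of $\ln \Delta\Gamma$:

```latex
\overline{\Delta\Gamma} = e^{\mu + \sigma^2/2}
  = e^{-6.6 + 0.32/2}
  = e^{-6.44}
  \approx 1.6 \cdot 10^{-3},
```

which agrees, within the quoted errors, with the tabulated value $(1.7 \pm 0.3)\cdot 10^{-3}$.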
Thus, although both $\psi_J$ and $\psi_F$ fulfill the central limit theorem and display a finite mean, the latter displays a long tail, so that we expect the growth of family quantifiers to be slowed down.
In particular, we expect such slowing down to be more evident at "short times", namely for small values of $\Gamma$. This can be seen intuitively: for family quantifiers waiting times are more broadly distributed, so that for relatively small values of $\Gamma$ the number $n$ of steps performed is likely to be rather small, that is, smaller than the mean-field expectation value $\Gamma/\overline{\Delta\Gamma}$. Given $\psi_J$ and $\psi_F$, we can derive the mean number of steps performed up to time $\Gamma$ by exploiting the properties of Laplace transforms (see e.g., [11,12]). Examples of numerical results of these calculations are shown in the lower inset of Fig. 7: the difference between the two cases is striking.
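The text obtains $\langle n(\Gamma) \rangle$ via Laplace transforms; as a swapped-in alternative, the sketch below estimates $\langle n(\Gamma) \rangle = \sum_n n\,Q(n|\Gamma)$ by Monte Carlo sampling of renewal sequences. The log-normal sampler reads the Family row of Tab. III as the mean and variance of $\ln\Delta\Gamma$, and the half-normal sampler uses a hypothetical scale $\sigma = \sqrt{6.7\cdot 10^{-6}}$ taken from the Job row; both readings are assumptions, so the printed numbers should not be taken as the curves of Fig. 7.

```python
import math
import random


def mean_steps(gamma, sample_wait, walks=2000, seed=4):
    """Monte Carlo estimate of <n(Gamma)>: mean number of completed waits <= Gamma."""
    rng = random.Random(seed)
    total = 0
    for _ in range(walks):
        clock, n = 0.0, 0
        while True:
            clock += sample_wait(rng)
            if clock > gamma:
                break
            n += 1
        total += n
    return total / walks


def lognormal_wait(rng, mu=-6.6, sigma=math.sqrt(0.32)):
    # Assumed reading of the Family row of Tab. III (parameters of ln dGamma).
    return rng.lognormvariate(mu, sigma)


def half_normal_wait(rng, sigma=math.sqrt(6.7e-6)):
    # Hypothetical half-normal with scale sigma taken from the Job row of Tab. III.
    return abs(rng.gauss(0.0, sigma))


for gamma in (0.005, 0.02, 0.05):
    print(gamma, mean_steps(gamma, lognormal_wait), mean_steps(gamma, half_normal_wait))
```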
In order to check this point, we directly measure on raw data the average number $\langle n(\Gamma) \rangle$ of steps performed before reaching the time $\Gamma$ (see Fig. 7). Indeed, for jobs we find a roughly linear growth, i.e., $\langle n(\Gamma) \rangle \sim \Gamma$, while for marriages and births we find a slower growth, i.e., $\langle n(\Gamma) \rangle \sim \sqrt{\Gamma}$. Such a qualitative difference, together with Eq. 8, immediately explains the results of Eqs. 20-23.

TABLE III: Best-fit coefficients obtained by fitting the probability distribution function of the "waiting time" ΔΓ shown in Fig. 7 according to Eqs. 29 and 30. The relative error on fit coefficients ranges between 10% and 20%. Within the error there is perfect consistency between the average values $\overline{\Delta\Gamma}$ and $\langle \Delta\Gamma \rangle$, as well as between the variance of such distributions and the variance of the related raw data. Here we report only data for marriages and permanent jobs; for newborns and temporary jobs, analogous analyses show only slight quantitative changes.

Quantifier | µ                  | σ²                 | $\overline{\Delta\Gamma}$ | $\langle \Delta\Gamma \rangle$
Job        | (1.2 ± 0.2)·10⁻³   | (6.7 ± 0.6)·10⁻⁶   | (2.0 ± 0.2)·10⁻³          | (1.7 ± 0.2)·10⁻³
Family     | −6.6 ± 0.9         | 0.32 ± 0.04        | (1.7 ± 0.3)·10⁻³          | (1.9 ± 0.3)·10⁻³
Summarizing, both processes display a non-null positive drift, i.e., $\langle \Delta X \rangle > 0$, yet the resulting behaviors are qualitatively different over the time window considered. Such a difference ultimately stems from deep differences in the waiting times: a broader distribution of $\Delta\Gamma$ occurs for family quantifiers, and the related random walks may experience rather long waiting times, although the jump widths remain narrowly distributed. The net result is a slowing down in the progress of the quantifier.
Conversely, for jobs, both $\Delta X$ and $\Delta\Gamma$ are narrowly distributed, so that from one trimester to the next we do not expect strong variations in the fraction of new immigrants getting a job.
Such a difference suggests an intuitive explanation, namely that the mechanisms underlying the emergence of mixed marriages are more complex and may be subject to mutual interaction among individuals. This is perfectly consistent with the statistical-mechanics description of the phenomenon provided in [3].

V. FIRST PREDICTIVE OUTCOMES FOR SOCIAL PLANNERS
We now turn to the theory's predictive capacity. The aim is to present concrete instruments directed to aid policy makers at the municipal level in their work to accommodate and plan for further immigration. We focus on two well-known observables: the (mean) first passage time, and the (mean) maximum walk span.

A. Mean first-passage time
Mean first-passage-time quantities have been extensively investigated in a number of different fields, ranging from chemical kinetics to finance, as they provide an estimate of the average time at which a given stochastic event is triggered [15,16]. Given the process $X(\Gamma)$, we calculate the value $\tilde{\Gamma}(x)$ at which the quantifier first reaches a certain threshold $x$. In order to evaluate the typical value of $\tilde{\Gamma}(x)$, we perform an average over the ensemble of walks, that is
$$\langle \tilde{\Gamma}(x) \rangle = \frac{1}{M} \sum_{i=1}^{M} \tilde{\Gamma}^{(i)}(x) .$$
The quantity $\langle \tilde{\Gamma}(x) \rangle$ allows predictions about the consequences that additional immigration has on integration, and about when an integration threshold is likely to be reached. For instance, suppose that when an integration quantifier reaches the threshold $x$, some integration policies, activities, or services must be activated (e.g., concerning public education, public health, etc.). Then, as $\Gamma$ approaches $\langle \tilde{\Gamma}(x) \rangle$, local projects and plans need to be activated.
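The first-passage value $\tilde{\Gamma}^{(i)}(x)$ of a municipality is the first observed $\Gamma$ at which its quantifier reaches the threshold $x$, and $\langle \tilde{\Gamma}(x) \rangle$ averages these values over the ensemble. A direct way to estimate it from data is sketched below. The helper names are hypothetical, and one choice is ours: municipalities whose quantifier never reaches $x$ within the observed period are excluded from the average (and counted separately), since their first-passage value is unknown.

```python
def first_passage_gamma(gammas, xs, threshold):
    """Smallest observed Gamma at which X first reaches `threshold`; None if never."""
    for g, x in zip(gammas, xs):
        if x >= threshold:
            return g
    return None


def mean_first_passage(walks, threshold):
    """Average of Gamma~(x) over the walks reaching the threshold, and their number.

    walks: list of (gammas, xs) pairs, one per municipality.
    """
    hits = [first_passage_gamma(g, x, threshold) for g, x in walks]
    hits = [h for h in hits if h is not None]
    if not hits:
        return None, 0
    return sum(hits) / len(hits), len(hits)
```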
In Fig. 8 we show the mean first-passage time for the quantifiers considered in this work as a function of the threshold $X$. The mean first-passage time is especially useful for policy plans and services that are coupled with a concrete "discrete" integration target, when we need to know the expected time at which the politically defined threshold is reached and the activation of the plans is called for.
For example, we could ask at which value of $\Gamma$ (roughly, the percentage of migrants) we expect the fraction of newborns from mixed parents to reach the threshold of 10%. By simply inverting the average law $\langle B_m(\Gamma) \rangle$, we would get $\Gamma \approx 0.2$. However, because of large fluctuations, the threshold of 10% can be reached much earlier in some municipalities, as the mean first-passage time returns a value $\Gamma \approx 0.04$. Hence, planning based on average evolutions only may overestimate the immigrant density at which the threshold is first reached by a factor of about five (0.2/0.04 = 5), rendering planning and resource allocation extremely ineffective.

B. Walk span
The walk span represents the largest point reached by the walker up to a given time, that is, the largest value $\tilde{X}$ reached by $X$ up to $\Gamma$. More precisely, for the $i$-th walk, the span at the $k$-th step is $\tilde{X}^{(i)}(k) = \max_{n \le k} X^{(i)}(n)$. Again, in order to evaluate the typical value of $\tilde{X}(k)$, we perform an average over the ensemble of walks, that is
$$\langle \tilde{X}(k) \rangle = \frac{1}{M} \sum_{i=1}^{M} \tilde{X}^{(i)}(k) .$$
FIG. 8: [...] Experimental data (•) are obtained by first computing the mean number of steps needed to first reach the distance $X$ and then converting it to $\Gamma$ through $\langle n(\Gamma) \rangle$ (see Fig. 7). [...]
FIG. 9: [...] Fig. 4; see also data in Tab. I for comparison.
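The span $\tilde{X}^{(i)}(k) = \max_{n \le k} X^{(i)}(n)$ is a running maximum, which `itertools.accumulate` computes in one call. The sketch below (hypothetical helper names) computes it for each walk and averages over the walks that have at least $k+1$ observations; this way of handling series of different lengths is our assumption.

```python
from itertools import accumulate


def span_series(xs):
    """Running maximum X~(k) = max_{n <= k} X(n) of one walk."""
    return list(accumulate(xs, max))


def mean_span_at(walks, k):
    """Ensemble average <X~(k)> over walks (lists of X values) that reach step k."""
    values = [span_series(xs)[k] for xs in walks if len(xs) > k]
    if not values:
        raise ValueError("no walk has an observation at step k")
    return sum(values) / len(values)
```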
The average walk span provides information on the capacity to integrate further immigration. In fact, when organizing local integration policies and making appropriate priority decisions among different integration initiatives, one is interested in the span of, say, the number of children or the number of immigrants with permanent jobs, rather than in their average number, as the latter may lead to dramatic over- and underestimations.
In Fig. 9 we show the span of the quantifiers considered in this work as a function of Γ. The qualitative differences already evidenced for X(Γ) are robust: the span for marriages and births grows like √Γ, while the span for temporary and permanent jobs grows like Γ. The persistence of such behaviors is consistent with the fact that these random walks display waiting-time and step-width distributions with finite mean and variance. For instance, for a simple random walk on a line the span grows in time like √t, while in the presence of a drift it grows linearly in t [13].
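The textbook scaling quoted from [13] can be checked numerically. The sketch below is an illustration under simplifying assumptions, not the paper's analysis: it uses discrete ±1 lattice walks in place of the empirical continuous-time walks, an arbitrary up-step probability of 0.6 to represent a drift, and estimates the growth exponent of the mean span with a least-squares fit in log-log scale.

```python
import numpy as np


def mean_span_lattice_walk(p_up, n_walks, n_steps, seed=0):
    """Mean running maximum of +/-1 lattice walks (step +1 with probability p_up)."""
    rng = np.random.default_rng(seed)
    # Steps in {-1, +1}, stored as int8 to keep the (n_walks, n_steps) array small.
    steps = np.where(rng.random((n_walks, n_steps)) < p_up, 1, -1).astype(np.int8)
    positions = np.cumsum(steps, axis=1, dtype=np.int32)
    # Running maximum of each walk, including the starting point X = 0.
    span = np.maximum(np.maximum.accumulate(positions, axis=1), 0)
    return span.mean(axis=0)  # element k is the mean span after k + 1 steps


def loglog_slope(t, y):
    """Least-squares slope of log y versus log t, i.e. alpha in y ~ t**alpha."""
    return np.polyfit(np.log(t), np.log(y), 1)[0]


n_steps = 4000
# Log-spaced observation times between 100 and n_steps steps.
t = np.unique(np.logspace(np.log10(100), np.log10(n_steps), 20).astype(int))
for label, p_up in [("no drift", 0.5), ("drift", 0.6)]:
    m = mean_span_lattice_walk(p_up, n_walks=1000, n_steps=n_steps)
    print(label, round(loglog_slope(t, m[t - 1]), 2))
```

The fitted exponents should come out close to 0.5 without drift and close to 1 with drift; small departures are expected from finite-time corrections and sampling noise.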

VI. CONCLUSIONS
Theoretical models originally developed to solve physical problems are increasingly being used to study social phenomena. Statistical mechanics and stochastic process theory are particularly well suited for this task, and have generated a novel quantitative understanding of the underlying complexity of social interactions. In this paper we focused on stochastic processes. We identified the random behavior of the four integration quantifiers with that of random walkers: each municipality draws a random walk in the quantifier versus migrant-density plane. Averaging over all the municipalities then allowed us to investigate the evolution of the quantifier averages, which are found to scale with the square root of the percentage of migrants for the family-related quantifiers and linearly with it for the job quantifiers, in complete agreement with previous findings obtained through the statistical-mechanics route [2]. We inferred the distributions of jumps and waiting times, which are found to be decoupled. Jumps are exponentially distributed for all the quantifiers, whereas the waiting-time distributions depend on the context: social quantifiers have log-normal waiting-time distributions, while economic quantifiers display Gaussian ones.
This difference has a simple explanation. While there is a correlation, even on short timescales, between the arrival of a migrant and that migrant's incorporation into the labor market (needed to make a living), the same is not true for marriages or newborns. Indeed, the correlation between the last-arrived migrant and a mixed marriage or birth event is likely to be negligible (i.e., it is unlikely that the arriving migrant and the one, say, marrying a native are the same person). This results in stronger noise affecting our social quantifiers, which destroys the net drift and leaves simple diffusion as the only surviving behavior. On the contrary, driven by the necessity to work to survive, our economic quantifiers display ballistic motion. A further factor contributing to the macroscopic differences is the much broader distribution of jumps for the job quantifiers: the fat tail, which encodes the long jumps in the job quantifiers, implies a larger drift that, coupled with much less noise (for the reasons just mentioned), leads to ballistic motion.
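A decoupled continuous-time random walk of the kind summarized above can be simulated by drawing waiting "times" (increments in Γ) and jumps independently, then reading off X at any Γ. The sketch below is hypothetical in its details: the function names, the use of a two-sided exponential (Laplace) law for signed jumps, the clipping that keeps Gaussian waiting times positive, and every parameter value are assumptions for illustration. Only the overall structure (log-normal versus Gaussian waiting times, exponential-type jumps, no drift versus drift) mirrors the text.

```python
import numpy as np


def ctrw_position(waits, jumps, gamma_grid):
    """X(Gamma) of a decoupled CTRW: the k-th jump occurs after the k-th waiting time.

    waits: increments in Gamma between jumps; jumps: jump sizes.
    X = 0 before the first jump.
    """
    event_gamma = np.cumsum(waits)                        # Gamma at which each jump occurs
    position = np.concatenate(([0.0], np.cumsum(jumps)))  # position[n] = X after n jumps
    n_jumps_done = np.searchsorted(event_gamma, gamma_grid, side="right")
    return position[n_jumps_done]


def sample_walk(rng, n_jumps, wait_kind, drift):
    """Draw one hypothetical walk with independent waiting times and jumps.

    Assumptions (illustration only): jumps are Laplace-distributed with mean `drift`;
    waits are log-normal, or Gaussian clipped to stay positive; parameters are made up.
    """
    if wait_kind == "lognormal":
        waits = rng.lognormal(mean=-6.0, sigma=0.5, size=n_jumps)
    else:
        waits = np.clip(rng.normal(loc=3e-3, scale=1e-3, size=n_jumps), 1e-6, None)
    jumps = rng.laplace(loc=drift, scale=1e-3, size=n_jumps)
    return waits, jumps


rng = np.random.default_rng(1)
grid = np.linspace(0.01, 0.2, 5)
# "Social-like" walks: log-normal waits, no drift; "economic-like": Gaussian waits, drift.
for kind, drift in [("lognormal", 0.0), ("gaussian", 2e-4)]:
    xs = np.array([ctrw_position(*sample_walk(rng, 200, kind, drift), grid)
                   for _ in range(500)])
    print(kind, drift, xs.mean(axis=0).round(4))
```

With zero drift the ensemble mean stays near zero while its spread grows; with a positive drift the mean grows roughly linearly in Γ, which is the distinction between diffusive and ballistic behavior discussed above.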
From a practical point of view, no power-law distributions are found. Hence the Central Limit Theorem holds, implying that the theory is suitable for generating predictions. To this end, we introduced two predictive non-Markovian tools: the "mean first passage time" and the "maximum span walk". Using these tools it becomes possible to tackle, in a more scientific way, questions that have traditionally been answered with guesstimates. For example, what happens to the share of newborns with mixed parents following an increase in the share of immigrants from, say, 3 to 5 percent? Our predictive framework can easily produce two types of forecasts. First, we assess the evolution of the mean of this quantifier. This is obtained by evaluating the average increment from Figure 4, which is roughly from B(Γ) = 0.04 to B(Γ) = 0.05. Second, we assess the mean worst case by dealing with fluctuations. These fluctuations are obtained by extrapolating data from Figure 8, which gives B(Γ) ∼ 0.08, i.e. more than fifty percent higher than its average value. Although the investigated quantities ($\tilde{X}(\Gamma)$ and $\tilde{\Gamma}(X)$) are non-Markovian, their behavior is still tractable: each of them can indeed be studied separately within the theory of one-dimensional random walks, which also covers the first passage time and the maximum span walk. On a broader level, this work provides a concrete, rigorous method for quantitative studies of social-science problems. The choice of immigrant integration is motivated by its prominent place in both the EU and the US political agendas. By uncovering the local variation pattern in the quantifiers, we produced a scientific tool for anticipating the consequences of further immigration on local integration processes. Information of this type has not been available in the past and is of great value for the development of immigration policies and multi-ethnic planning at the local level.
However, while this work advances our knowledge of integration phenomena, other effects that may spontaneously develop in the host country, such as segregation, have yet to be considered and incorporated into the theoretical framework developed here.
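For the newborn example above, the size of the worst-case excess over the mean forecast at a 5 percent immigrant share follows directly from the two quoted values:

$$\frac{0.08 - 0.05}{0.05} = 0.6,$$

i.e. the worst-case share of newborns with mixed parents exceeds the mean forecast by about 60 percent.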
we estimate local immigrant densities for different points in time between 1999 and 2010.
[22] Due to data protection, data on mixed marriages and newborns with mixed parents are only available for municipalities with a population larger than 10000. Likewise, for data-protection reasons, municipality coding for the labor contract data is only available if the municipality's population exceeds 40000. However, about 85% of Spain's immigrants reside in the included municipalities.
[23] The continuous time random walk (CTRW) was introduced by Montroll and Weiss [17]; see also [13,18] for recent reviews and SI for a deeper description.
[24] As we will show, this is the case recovered by our experimental data.
[25] Conversely, if boundaries cannot be neglected, the mapping could still be feasible, but one should then refer to the theory of random walks on finite chains.