Data-driven assessment of arch vortices in simplified urban flows

Understanding flow structures in urban areas is widely recognized as a challenging concern due to its effect on urban development, air quality, and pollutant dispersion. In this study, state-of-the-art data-driven methods for modal analysis of simplified urban flows are used to study the dominant flow processes in these environments. Higher order dynamic mode decomposition (HODMD), a highly-efficient method to analyze turbulent flows, is used together with traditional techniques such as proper-orthogonal decomposition (POD) to analyze high-fidelity simulation data of a simplified urban environment. Furthermore, the spatio-temporal Koopman decomposition (STKD) will be applied to the temporal modes obtained with HODMD to perform spatial analysis. The flow interaction within the canopy influences the flow structures, particularly the arch vortex. The latter is a vortical structure generally found downstream of wall-mounted obstacles, which is generated as a consequence of flow separation. Therefore, the main objective of the present study is to characterize the mechanisms that promote these phenomena in urban areas with different geometries. Remarkably, among all the vortical structures identified by the HODMD algorithm, low- and high-frequency modes are classified according to their relation with the arch vortex. They are referred to as vortex-generator and vortex-breaker modes, respectively. This classification implies that one of the processes driving the formation and destruction of major vortical structures in between the buildings is the interaction between low- and high-frequency structures. The high energy revealed by the POD for the vortex-breaker modes points to this destruction process as the mechanism driving the flow dynamics.


Introduction
The study of the flow around building-like obstacles has been extensively addressed in the literature [1,2,3] due to its implications in urban-environment phenomena, i.e. pollutant dispersion, air quality, and heat propagation. The very high levels of air pollution to which the vast majority of the urban population is exposed are undoubtedly related to myriad health issues [4]. The hunt is for predictive models capable of accurately reproducing the pollutant and thermal distributions within urban environments. Some of these models have already been introduced by the European Union (EU) [5]. However, their inability to provide the spatiotemporal accuracy required to model pollutant dispersion through urban environments forces researchers to improve those methods to ensure urban sustainability. For instance, to establish a proper action plan to alleviate the associated adverse consequences, several studies [1,2,3,6,7,8] have focused their efforts on analyzing the spatio-temporal structures of the flow. The main point is to identify the three-dimensional flow regions responsible for the pollutant dispersion within a given urban geometry. Therefore, this study aims to apply recently-developed tools from system dynamics, notably higher-order dynamic mode decomposition (HODMD), to turbulent flows within urban environments to understand how different city configurations influence the mechanisms leading the flow dynamics.
The large number of spatio-temporal features present in the high-dimensional nonlinear system of a turbulent flow complicates the analysis. Nonetheless, the fact that physical-flow features are shared across a wide variety of flows suggests that they may be used to describe the dynamics of such a flow. Many experimental [1,6,7,9,10,11,12,13,14,15] and numerical [10,16,17,18] studies have focused on the flow around a wall-mounted cylinder of different aspect ratios, which is highly three-dimensional [6,7]. Hunt et al. [1] performed one of the first experimental analyses to examine the general pattern of the streamlines of the flow around a single wall-mounted bluff obstacle. They proved the absence of a closed surface, i.e. a separation bubble or a cavity, in the wake of the obstacle due to the interaction of four different vortical structures: (I) the horseshoe vortex formed around the obstacle, (II) the roof vortex and (III) the vortices on the obstacles sides, both having a strong interaction with the wake, which yield the formation of the so-called (IV) arch vortex downstream the obstacle. The latter is depicted in Fig. 1 and consists of two spanwise vortical legs on each side of the obstacle with rotation in the vertical axis and a roof, where the flow rotates in the spanwise direction. These vortices are a consequence of the interaction of the outer flow within the urban-canopy layer. Understanding their underlying physics is essential for developing strategies to reduce pollution dispersion and perform pedestrian-comfort assessment.
The apparent complexity of urban-based environments leads to more intricate physics due to the interaction of flow structures around individual buildings. Oke [2] provided an analysis of the resultant flow regimes as a function of the geometrical parameters that define an urban model. Interestingly, the author discovered that the street width was the critical parameter in establishing the flow regimes [2]: in the case of narrow streets, the flow above the canopy can barely reach down to the street (skimming flow), and only one vortex can be seen between the obstacles; gradually broader streets lead to the wake-interference regime first and then to the isolated-roughness flow, which exhibits much more contact with the flow above the roofline. Meinders [19] further examined this classification by analyzing the interaction of flow patterns around wall-mounted rectangular obstacles with different spacing ratios in the streamwise direction. The separated shear layer from the first obstacle reattached on the windward side of the downstream obstacle for the lowest separation, resulting in an inter-obstacle area with an arc-shaped vortex confined by the side flow [19]. With larger separation ratios (isolated-roughness regime), flow reattachment occurs in the region between the obstacles, from which a horseshoe vortex emerges around the downstream obstacle. This results in similar flow patterns for both obstacles, but with lower intensity in the downstream block due to the flow disruption of the upstream block [19].
A wide range of criteria has been developed to identify these vortical structures. Monnier et al. [15] aimed at identifying the main flow patterns present in the wind-tunnel flow around the geometry of the Mock Urban Setting Test (MUST) experiment using different criteria. They started evaluating the vorticity components, namely the wall-normal and spanwise mean components,ω y and ω z , respectively, to identify the location of the arch vortex. Using the modulus of the spatially-averaged vorticity vector allowed them to define a local threshold to properly characterize the influence of the angle of incidence (AOI) on this vortical structure, extracting similar conclusions to those of Becker et al. [6]. They also employed some popular methods for vortex identification, based on the second invariant of the velocity gradient tensor, i.e., the Q-criterion [1] and the λ 2 criterion [20]. However, they improved the identification of large-scale vortical structures using the normalized angular momentum technique Γ 1 , introduced by Sousa [21] to locate the center of vortical structures downstream of a single cuboid obstacle. This method allowed the authors to describe the relationship between the arch vortex and high-turbulence areas. As shown in Fig. 1, Monnier et al. [15] concluded that the arch vortex is located between high-turbulence areas. They consist of two regions of significant streamwise velocity fluctuations on both sides of the obstacles due to the separation of the shear layer and a high spanwise velocity fluctuating region along the windward face of the downstream obstacle. This experimental study led to relevant conclusions in analyzing coherent structures in a more realistic urban model.
Here, we focus on Oke's classification [2] to extract through data-driven procedures the key dominant patterns present in the three-dimensional instantaneous fields of the flow through urban environments with different separation ratios. In this regard, we explain the origin and evolution of the various three-dimensional topological patterns that precede the formation of the well-known flow structures found in these geometries: the horseshoe vortex, the roof vortex, the vortices of the obstacles sides, and the arch vortex. For the first time, several modaldecomposition techniques are used to identify how the previous vortices are related to the physical mechanisms driving the system's dynamics, shedding light on new possibilities for future urban-flow control research. First, we use properorthogonal decomposition (POD) [22] to identify those spatial modes energetically more relevant to the system and their associated time coefficients. We compare them with the results obtained using a recently-developed higher-order variant of dynamic-mode decomposition (DMD) [23], named HODMD [24]. Via this novel nonlinear dynamic mode decomposition approach, we can analyze the dynamics of a highly complex turbulent flow [25,24,24,26,27,28], cleaning noisy artifacts and small amplitude modes from data. Recently, Amor et al. [29] showed the potential of HODMD to understand the complicated physics of the wake in a wall-mounted square cylinder, which decomposes spatio-temporal data into a group of modes orthogonal in time, representing the leading flow dynamics [24]. Balanced POD (BPOD) [30], spectral POD (SPOD) [31] and spatiotemporal Koopman decomposition (STKD) [32] are other successful variants of POD and DMD for analysis of turbulent flows. It is notorious the similarities between HODMD and SPOD algorithms. Both methods combine a sliding window process with singular-value decomposition (SVD) to reduce the data dimensionality, selecting the most relevant features of the flow. The successful application of these methods for the analysis of turbulent flows has been presented in several works (e.g., Refs. [33,34,31,23]). Although both techniques are suitable for identifying the main dynamics of the flow in the cases studied in the present article, we will use HODMD. The advantage of this method is that it is automatic. In other words, HODMD identifies the main frequencies driving the flow dynamics without needing previous knowledge about the physics of the case studied [35]. Also, the method provides a complete frequency spectrum driving the flow. Hence it is possible to establish connections between some of the highestamplitude modes modeling the complex flow dynamics.
Following the introduction to urban flows, the present work provides a general overview of the performed numerical simulations in § 2. A summary of the mathematical concepts behind the modal-decomposition techniques used to characterize the flow structures over the numerical simulation data is addressed during § 3. The main vortices and structures found in the mean flow are presented in § 4. The mechanisms driving the flow dynamics within urban environments, which results from of the application of different data-driven tools, are investigated in § 5.
These structures are further examined in § 6 using the results obtained from the application of POD. Finally, a summary of the main conclusions of the project is provided in § 7, and the justification of the selected modes is performed in Appendix A through the calibration process of the methods. A review of the formation and destruction mechanisms of arch vortices in urban flows has already been addressed in a companion paper [36]. Here, a detailed analysis of the modaldecomposition techniques, which shed light on the mechanisms driving the flow dynamics, will be addressed with an overview of the high-order numerical simulations carried out to perform the present analysis.

Numerical simulations
The high-order spectral-element code Nek5000 [37] was used to solve the incompressible Navier-Stokes equations governing the flow in the cases under consideration. Based on the spectral-element method (SEM) of Patera [38], Nek5000 exhibits both geometrical flexibility and the accuracy of the high-order spectral methods [18,39,40,41,42,43,44]. Due to the flow complexity in urban environments, high-order methods need to be used to resolve all the relevant flow structures properly. In this database, we use a well-resolved large-eddy simulation (LES), the resolution of which is close to that of a direct numerical simulation (DNS) [41]. This code has been extensively used for high-fidelity simulations of complex turbulent flows, see Refs. [42,43,44]. The main focus of the present section is on the parameters with important implications in the modal decompositions. Additional details on the numerical scheme, employed resolution, and flow statistics can be found in Ref. [45].
The geometrical domain comprises two wall-mounted obstacles, as depicted in Fig. 2. The size of the computational box dimension varies according to the separation of the obstacles. While the wall-normal and spanwise directions remain the same for the three cases, the streamwise length changes proportionally to the separation ℓ, which modifies the computational cost of the numerical simulation associated with each case. The variation of this parameter according to the flow regime is depicted in Table 1. The obstacles are then defined by the height h, length w b and width b. The length-to-height w b /h and width-to-height b/h ratios correspond to 0.5 for both obstacles. All dimensions are normalized with the height of the obstacle h. The velocity field is given by v(x, y, z, t), where x, y, and z are the streamwise, wall-normal, and spanwise directions, respectively, and t is time. Every velocity is normalized with the free stream velocity. The components of the velocity are v = (u, v, w), which denote the streamwise, wall-normal, and h ℓ L y spanwise components, respectively. Using Reynolds decomposition, v is defined as v = V +ṽ, where V = v is the average in time andṽ is the turbulent fluctuation. Primes are reserved for intensities v ′ =ṽ 2 1/2 .
As an inflow condition, a numerically-tripped [46,47] laminar Blasius profile allows the flow for undergoing a rapid transition to turbulence without needing to accelerate the flow before reaching the obstacles. This numerical tripping consists of a weak wall-normal volume randomly added in the forcing terms of the incompressible Navier-Stokes equations to create flow disturbances, thus inducing turbulence. The inflow is located at x/h = −10, and the tripping force is applied at x/h = −9, allowing the boundary layer to develop in the region upstream the obstacles, i.e., −8 ≤ x/h ≤ −1. In this region, both z-averaged friction and momentum-thickness Reynolds numbers, i.e. Re τ and Re θ respectively, increase in the streamwise direction, reaching Re τ ≃ 175 and Re θ ≃ 450 upstream the obstacle at x/h = −2, which corresponds to fully-turbulent conditions. The adverse pressure gradient induced by the obstacles leads to an increase of the Rota-Clause pressure-gradient parameter and a decrease in the skin-friction coefficient, which are β = 0.6 and C f = 4.6 × 10 −3 at x/h = −2, respectively. In this same location, the boundary-layer thickness evaluated at the 99% of the freestream velocity, δ 99 , and the shape factor H, i.e., the displacement thickness to momentum thickness ratio, are δ 99 = 0.38 and H = 1.62 [45]. The stabilized outflow condition devel- oped by Dong et al. [48] is used as an outflow condition. At the upper part of the domain, a combination of outflow and Dirichlet conditions is used to simulate an open-air urban environment: a zero-stress condition is applied in the wall-normal direction and a Dirichlet condition in the other two directions [37]. Finally, periodicity is applied in the spanwise direction. A smooth-wall (including the no-slip and no-penetration) condition is applied to the bottom plane of the domain and the surfaces of the obstacles. We consider a spectral-element mesh with an eight-point Gauss-Lobatto-Legendre (GLL) quadrature in each element to solve the scale disparity of the flow. The mesh is refined in the near-obstacle area to increase resolution, which has a direct impact on flow statistics. Following the criteria of Negi et al. [41], the mesh employed in this work satisfies all the resolution criteria to be considered a well-resolved LES. Furthermore, the study focuses only on neutral stability conditions, where no buoyancy effects are simulated [49].
As we focus on the flow near the obstacles, the following region is extracted from the computational domain: 0 ≤ y/h ≤ 2 and −1.5 ≤ z/h ≤ 1.5. For the streamwise direction, we use −1 ≤ x/h ≤ 5 for the SF regime, −1 ≤ x/h ≤ 7 for WI, and −1 ≤ x/h ≤ 11 for IR, owing to the change in the separation between the obstacles. Using this reduced domain, we consider 225, 215, and 90 three-dimensional instantaneous fields of the three components of the velocity to perform the modal decompositions on the SF, WI, and IR flow regimes, respectively. Note that additional analyses conducted with a larger number of snapshots did not yield significant differences in the large-scale structures identified by the methods; hence, results are considered to be converged with the previous sets of snapshots. The previous fields were spectrally interpolated from the original SEM mesh to another one with a coarser resolution. The analyzed database has the temporal parameters gathered in Table 1. Note that this information is critical for the analysis of the spatio-temporal structures of the flow since they define the dynamical behavior of the system, which is closely related to the time span and time step of the snapshots to be analyzed. On this account, every database is obtained over a time span of the same order of magnitude, which is sufficient to accurately capture the low-frequency mechanisms occurring in the flow. All the introduced parameters are expressed in convective time units, i.e. a ratio between a characteristic length and a velocity. In the present work, time is obtained from the freestream velocity U ∞ and the height of the obstacle, h. In all cases, the Reynolds number is set to Re h = 10, 000.

Proper-orthogonal decomposition (POD)
The proper orthogonal decomposition (POD) is a modal-decomposition technique, introduced in fluid mechanics by Lumley [22], which aims at extracting coherent patterns from a given flow field. Thus, the objective of the POD algorithm is to decompose a set of data of a given field variable into a minimal number of modes (basis functions) that capture as much energy as possible. This process implies that POD modes are optimal in minimizing the mean-square error between the signal and its reconstructed representation. For instance, if the field variable to be examined is the velocity, the modes representing such variable are optimal to capture the kinetic energy of the flow field. This low-dimensional latent space provided by the POD modes is attractive for interpreting the most energetic and dominant patterns within a given flow field. Let us consider a vector field q (ξ, t), which may represent the velocity or the vorticity field depending on a spatial vector ξ and time. In fluid-flow applications, subtracting the temporal meanq (ξ) allows for the analysis of the unsteady component of the field variable: where x(t) represents the fluctuating component of the vector data with its temporal mean removed. This representation emphasizes that the data vector x(t) is considered as a collection of snapshots at different time instants t k . If the m snapshots are then stacked into a matrix form, we obtain the so-called snapshot matrix X: where J represents the number of points in x, y and z. The objective of the POD analysis is to find the optimal basis to represent the given set of data x(t). This can be solved finding the eigenvectors Φ j and the eigenvalues λ j from: where C states for the covariance matrix of the input data, defined as The size of this matrix depends on the spatial degrees of freedom of the problem.
In the case of fluid flows, this value is usually large since it equals the number of grid points times the variables to be considered. The POD modes are derived from the eigenvectors of Eq. (3), with the eigenvalues reflecting how well each eigenvector Φ j represents the original data in a least-squares optimal sense, i.e. it offers a way to find lower-dimensional linear approximations of a given data set [22]. This enables a hierarchy of modes in terms of captured energy, which improves understanding of the most prominent patterns, e.g. in a specific flow field. Another method for computing the POD algorithm is based on the singularvalue decomposition (SVD) [50], which can be applied directly on the snapshot matrix X to obtain the left Φ and right Ψ singular vectors as where Φ ∈ R J×J , Ψ ∈ R K×K and Σ ∈ R J×K . The matrix Σ contains the singular values (σ 1 , σ 2 , . . . , σ N ) along its diagonal, which relates to the eigenvalues as σ 2 j = λ j . Moreover, the left and right singular vectors correspond to the eigenvectors of matrices XX T and X T X, respectively. Therefore, the SVD can be seen as a rectangular-matrix decomposition technique capable of computing the POD modes.

Higher order dynamic mode decomposition (HODMD)
Aiming at identifying the spatio-temporal coherent patterns present in highdimensional flow data, Schmid [23] developed a data-driven tool that retrieved the spatially-correlated structures with similar behavior in time. This methodology, known as dynamic-mode decomposition (DMD), provides not only a reduction in dimension concerning a reduced set of modes that best reproduce the input flow field, but also a model for the interaction of those modes in time.
The method decomposes the vector field data v (x, t) as an expansion of M Fourier-type modes: for k = 1, ..., K, where u m represents the DMD modes weighted by an amplitude a m , ω m , their associated frequencies and δ m , their associated growth rates, which symbolize the temporal growth or decay of the u m modes in time.
The standard DMD algorithm assumes a linear relationship of two consecutive snapshot matrices using the linear Koopman operator R. To this end, a general snapshot matrix V k 2 k 1 can be defined for k 1 < k 2 so that its columns represent the snapshots varying equidistantly between k 1 and k 2 , namely: Therefore, using the previous nomenclature, the standard DMD can be defined based on the Koopman operator as where V K 2 and V K−1 1 represent here the second to last snapshots and the first to the second last snapshots of the data matrix, respectively. Recalling Eq. (6), this equation might be seen as the simplest equation exhibiting such behavior [51]. The Koopman matrix R, which is independent of k, contains the dynamical information of the system. Recently, Le Clainche & Vega [51] extended the DMD method for the analysis of various types of flows, e.g. turbulent, multi-scale or transitional flows and noisy experimental data. Based on Takens' delayed-embedded theorem [51], the higher order dynamic mode decomposition (HODMD) relates d time-delayed snapshots using higher-order Koopman assumption defined as which relates each flow field with the d subsequent fields. The HODMD algorithm can be encompassed into two main steps.

Step 1: Dimension reduction
First of all, the SVD technique is employed to reduce spatial redundancy and filter out noise caused by numerical or experimental errors. The truncated SVD allows for the reduction of the original snapshot data into a series of linearly independent vectors of dimension N (where N < J is the spatial complexity), based on a certain tolerance ε SVD : where Σ includes the singular values σ 1 , ..., σ N and W T W = T T T = I are N × N unitary matrices. Note that the parameter ε SVD is tunable based on previous information of the simulation or experimental data, e.g. if the noise level of the snapshots is known in advance, then ε SVD may be set to be comparable to that level (see details in Ref. [27]). Above all, this parameter determines the number N of SVD retained modes as: Following the definition in Eq. (10), the reduced snapshot matrixT can be defined as: The dimension of this reduced snapshot matrix is N × K.

Step 2: The DMD-d algorithm
The higher-order Koopman assumption, defined in Eq. (9), is now applied to the modified snapshot matrix as: The above equation may be cast in a more generic form by incorporating the modified snapshot matrixṼ k−d+1 1 and the modified Koopman matrixR as:Ṽ where the many Koopman operatorsR 1 , . . . ,R K are then combined into a single matrix after some computations, from which the eigenvalue problem can be solved to obtain the DMD modes, frequencies and growth rates defining the DMD expansion of Eq. (6). Sorted in decreasing order of the mode amplitudes, this expansion is further reduced by removing the modes such that: for m = 1, . . . , M , where ε DMD represents a parameter tunable by the user. The number of retained modes, M , represents the spectral complexity of the analysis. This complexity, together with the spatial one, determines the performance of the HODMD algorithm, which reduces to the standard DMD when d = 1. In complex fluid flows, the spatial complexity is usually smaller than the spectral one, N < M , where the standard DMD fails, thus requiring the use of the DMD-d algorithm.
Furthermore, using the tunable parameters ε SVD and ε DMD enables retaining only the large scales of the input data, which is particularly interesting for complex turbulent flows involving a large number of scales.

Spatio-temporal Koopman decomposition (STKD)
Spatio-temporal Koopman decomposition (STKD) is an extension of HODMD introduced to identify spatio-temporal structures as an expansion of traveling and standing waves driving the flow dynamics both in the streamwise and spanwise directions. For the streamwise direction, the spatio-temporal modes u mn 1 and growth rates ν mn 1 are defined in the following modal expansion, which reconstruct the original flow field analyzed as: for k = 1, . . . , K and j = 1, . . . , J. It must be emphasized that the spatio-temporal expansion is useful when the data exhibit exponential/oscillatory behavior in both the x coordinate and time. In this case, it is interesting to compare the expansion (16) with the purely temporal expansion (6). The former can be easily obtained by simply applying HODMD to the DMD modes in Eq. (6), resulting in the following DMD expansion: for j = 1, . . . , J. Eq. (16) is obtained by combining this solution with Eq. (6), where the spatio-temporal amplitudes are defined as a mn 1 = a m a n 1 . In a similar way, it is possible to obtain spatio-temporal expansions defined along the spanwise direction as: for k = 1, . . . , K and r = 1, . . . , R, where λ mn 1 and β mn 1 are the growth rates and wavenumbers related with the spanwise direction. Using this expansion, it is also possible to describe the analyzed data as a group of traveling waves, the phase velocity of which is defined as c mn 1 = ω m /β mn 1 . A more detailed description of the method can be found in Refs. [32,28]. Also, notice that this method has been successfully applied to turbulent non-periodic flows [33,52]. Finally, we conclude this section with a brief comparison of the previous techniques for the sake of completeness. While POD decomposes a given field variable into a set of linearly-superposed modes orthogonal in space and capturing as much energy as possible, HODMD offers a more advantageous decomposition when the dynamics of the system are the focus of the study. On the one hand, the dynamical behavior of the variables is not considered in the decomposition provided by POD, but is rather included in a set of time-varying coefficients associated with each of the modes. In contrast, HODMD decomposes the spatiotemporal data into a set of Fourier-type modes ranked in terms of their dynamical behavior, i.e. frequencies, amplitudes, and growth rates. Therefore, a combined analysis using HODMD and POD provides insights into the most relevant mechanisms of the flow in terms of dynamics and energy, respectively. Finally, STKD is an extension of HODMD, which identifies spatio-temporal structures as a collection of modes and waves. In this case, the method is particularly useful when the flow dynamics exhibit exponential or oscillatory behaviour in the streamwise and spanwise directions and in time.

Mean-flow structures
In this section, we focus on the evolution of the primary structures characterizing the mean flow for the three flow regimes. These flow regimes are the skimming flow (SF), the wake interference (WI) and the isolated roughness (IR) [2]. The change from a cavity-like to a wave-like flow topology is the most noticeable result of increasing the distance between the obstacles [45,53]. They both represent a substantial recirculation zone in the rear part of the upstream obstacle; however, for the cavity-like flow, this recirculation region extends throughout the entire space between the obstacles, whereas for the wave-like topology, it is only restricted to the immediate rear part of the first obstacle [45]. Meinders [19,54] also extended the work on flow structures around wall-mounted cubes by analyzing the interaction between the obstacles when more than one cuboid was introduced. Using oil-film visualizations, Meinders experimentally analyzed the influence of the separation distance between the obstacles, i.e. ℓ, on the flow around an in-line tandem disposition of two cubes. It was proved that the separation variance only led to a substantial modification of the mean flow patterns and, as a consequence, of the interaction of the free stream within the inter-obstacle region. In Fig. 3, the streamline time-averaged flow patterns for the three flow regimes are depicted. The main conclusions addressed here are in good agreement with the experimental results of Meinders [19]. For the lowest separation ℓ/h = 1, i.e., the skimming-flow regime, the inter-obstacle region is characterized by an arch-shaped vortex. This means that the flow between the obstacles is fully confined by the flow on the sides and the roof, a feature characteristic of cavity-like flows. In addition, a horseshoe vortex emerges upstream of the windward face of the leading cube, and it is deflected downstream along the sides of both obstacles, as a result of the low penetration of the flow above and around the sides of the canopy. For the isolated-roughness regime, i.e., ℓ/h = 4, the surrounding flow eventually interacts with the inter-obstacle region: a shear layer detached from the sides and top sharp edges of the upstream obstacle breaches the interobstacle spacing before reaching the downstream obstacle. Because of that flow interaction within the region in between the obstacles, a second horseshoe vortex emerges around the downstream cube, which is characteristic of wave-like flows and is not observed for the previous case. The wake-interference regime, with a separation ratio ℓ/h = 2, exhibits similar flow patterns to those of the SF and IR regimes. In this case, the arch vortex does not span the entire region between the obstacles. This is characteristic of a transition from a cavity-like to a wake-like flow: the separation is large enough for the free stream to slightly interact with the inter-obstacle region. Nevertheless, this interaction is not large enough for the flow to exhibit independent structures, like a horseshoe vortex, around the second obstacle.
Remarkably, in the three flow regimes, the wake behind the second obstacle is relatively similar: a second arch vortex is formed on the leeward side of the downstream obstacle, albeit with a lower intensity. This is mainly due to the flow disruption of the upstream obstacle, which induces a different turbulent-intensity level upstream of the second one [2,54,18]. Finally, the reader is referred to Ref. [45] for a more detailed discussion on the flow statistics for the three regimes.

Spatio-temporal structures
Data-driven modal decompositions are powerful techniques to extract the energetically important features from a given flowfield. Having identified the main flow structures present in the time-averaged fields in § 4, the analysis of the instantaneous fields will allow for the characterization of the main mechanisms driving the flow dynamics. To that end, the results obtained with a highly-efficient tool for the analysis of complex flows, i.e. higher-order DMD, are analyzed in this section. As seen in § 3.2, an important characteristic of this method is the higherorder Koopman assumption, which relates each flow field with d subsequent fields. Therefore, a proper selection of this parameter is paramount for the identification of the main mechanisms of the flow, specially when the amount of data is restricted. In Appendix A we provide a detailed overview of the calibration process of the identification method of the modes. In particular, Fig. 4 shows the frequency versus amplitude of the different modes computed using DMD-d for the three flow regimes. These results were obtained using the databases specifications from Table 1. A dominant mode, i.e. the one with highest amplitude, is located between the frequencies ω m = 1 and ω m = 1.3, while the rest of the modes are subharmonics and harmonics of it. Additionally, another relevant mode is the one with the lowest frequency, ω m = 0.1, since it is the first mode to appear in the spectrum and it is the frequency that drives the periodicity of the main physics.
Based on this identification, we highlight two types of modes, which are also shown in Fig. 5: vortex-generator (A) modes and vortex-breaker (B) modes. Vortex-generator modes are the ones responsible for the main flow structures and vortices; therefore, this suggests that they could be related to the mechanism that creates the arch and horseshoe vortices. These modes usually appear in the lowfrequency area of the spectrum and have a lower amplitude than B modes. For instance, for the WI flow regime, some of these modes appear at the frequencies ω m = 0.11, 0.22 and 0.33. Conversely, the vortex-breaker modes are suggested to be responsible for the destruction mechanisms of the main flow structures and the ones that generate the turbulent wake. As opposed to A modes, B modes appear in the high-frequency region of the domain, i.e. for the ninth harmonic of the lowest frequency onward. Further studies need to be carried out to study in detail the main mechanism and instabilities connected to the presence of the previous modes.
To verify this classification of modes in a quantitative way, the modal-assurance criterion (MAC) analysis [55,35] was used. Given two complex modal vectors u i and u j , the MAC value represents the normalized dot product of the modal vectors at common points, namely: If the mode's shapes are identical, the MAC will have a value of 1. In the present study, modes with MAC > 0.8 are considered similar. Note that in this study MAC < 1, since the large range of scales found within turbulent flows play a role. Note as well that the relative error made in the calculations when comparing the reconstructed and original fields always remains fenced in the set of tolerances used. Different calculations with these tolerances, defined in Eqs. (11) and (15), were made during the calibration process, see Appendix A. The goal of this study, however, is to identify some of the large-scale structures that provide a broad description of the fundamental patterns driving the flow, rather than to build any accurate reduced-order models based on the physical knowledge of the flow. As a result, the relative error will not be examined further in this study.
It has been demonstrated that areas with substantial recirculation cause an increase in the concentration of passive scalars [7], i.e. the diffusive contaminants present in a fluid flow that have no dynamical effect on the fluid motion. As a result, according to Monnier et al. [15], A modes, which are associated with these prominent recirculating zones, may be linked to areas of high pollution concentration. Similarly, since B modes are related to the processes that cause these coherent structures to shatter, it is possible to hypothesize that they are associated with areas of lower pollution concentration. In particular, the arch vortex is a recirculating feature formed in the region in between the building-like obstacles. Therefore, further understanding of these flow patterns could allow for the development of some control strategies that might reduce the presence of A modes in urban areas, hence, reducing pollutant concentration. However, further research needs to be carried out to identify these types of modes in databases modeling multi-phase flows, which remains as a topic for future research. Fig. 5 shows a three-dimensional view of the main HODMD modes presented in Fig. 4 as a function of the separation ratio between the obstacles. Fig. 5 (left) corresponds to the vortex-generator low-frequency mode, mode A, whereas the vortex-breaker high-frequency mode, mode B, is depicted in Fig. 5 (right). These results are further discussed in the following sections.

Generation process of the main vortices
The wall-normal velocity component of mode A exhibits a three-dimensional pattern on the upper windward side of the upstream obstacle, which is shared among all the flow regimes. This cap-like structure created on top of the first building interacts primarily with the roof of the arch vortex. At this location, the flow experiences a high-velocity region due to the impact of the flow over the edge and the shear layer on the upper part of the obstacle. This region is then followed by another fluctuating part in the wall-normal direction in between the obstacles. In addition, a similar cap-like structure is observed in the IR case for the downstream obstacle. Finally, similar to the wall-normal component, the flow encounters a high-spanwise velocity gradient owing to the effect of the flow over the edges of the first obstacle, which tends to deviate the flow towards the outer regions of the domain. Furthermore, the second obstacle has comparable structures on its windward lateral edges as a result of the flow reattachment that occurs in the IR case [56]. On the other hand, due to the slight interaction of the freestream flow coming from the lateral sides in the spanwise direction with this region, no fluctuating zones develop in between the obstacles. This characteristic distinguishes A modes from B modes, where the latter exhibit substantial spanwise variations   ]Three-dimensional iso-surface of the (left) streamwise, (middle) wall-normal and (right) spanwise velocities of the vortex-generator mode shown in Table [ between the obstacles, aiming at providing a general overview of the main flow structures related to each mode. Fig. 10 (left) corresponds to the vortex-generator low-frequency mode, whereas the vortex-breaker high-frequency mode is depicted in Fig. 10 (right). These results are further discussed in the following sections.

A. Generation process of the main vortices
In the first case, the vortex-generator mode presents some characteristic three-dimensional structures which relate to the formation of the arch and horseshoe vortices in a low-frequency fashion. First, with regard to the streamwise velocity component, a dome-like structure prevails in the intermediate section of the obstacles. Increasing the distance between the obstacles does not result in a reallocation of such a flow pattern, which remains in the same place throughout all flow regimes. This indicates that when the flow reaches this region in the WI and IR cases, it interacts with the canopy, which does not happen in the SF regime, where the flow in between the obstacles remains trapped, i.e. the spherical structure covers the same region as the separated zone between the obstacles. Note as well that the dome feature in the IR case is complemented by a structure on the wake which extends through both lateral sides up to the leeward side of the downstream obstacle. These conclusions were already extracted from the analysis of the time-averaged fields in the streamwise direction at the symmetry plane zh = 0, where the increased separation between the buildings yielded a higher interaction of the flow within this zone. On the other hand, although not presenting relevant flow structures in the constant−y planes analysed in Fig. 11, the wall-normal velocity component of mode A exhibits a three-dimensional pattern on the upper windward side of the upstream obstacle, which is shared among all the flow regimes. At this location, the flow experiences a high-velocity region due to the impact of the flow over the edge and the shear layer on the upper part of the obstacle. This region is then followed by another fluctuating part in the wall-normal direction in between the obstacles. Albeit to a lesser extent, the downstream obstacle in the IR case also exhibits a similar flow structure. Finally, similar to the wall-normal component, the flow encounters a high-spanwise velocity gradient over this zone owing to the effect of the flow over the edges of the first obstacle, which tends to deviate the flow towards the outside regions of the domain. Furthermore, the second obstacle has comparable structures on its windward lateral edges as a result of the flow reattachment that occurs in the IR case. On the other hand, due to the slight interaction of the flow in the spanwise direction within this region, several fluctuating zones develop in between the obstacles. This characteristic distinguishes this generating mode from the breaker modes, which exhibit substantial spanwise variations between the obstacles and are thus responsible for the destruction of the vortical structures in this region, as seen in the following. To conclude, it can be state that the flow features that dominate in the vortex-generator mode are located close to the first obstacle rather than on the wake, emphasising the idea of being a formation-type mode.  between the obstacles. This may suggest that the structures of A modes could be connected with the formation of the vortical structures in this region. In particular, Fig. 6 shows the streamlines associated with this type of mode, where it can be noticed the high resemblance of this mode with the arch-vortex structure shown in Fig. 3. It can then be stated that the flow features that predominate in the vortexgenerator mode are resemblant of those of the mean flow, emphasizing the idea of being a formation-type mode.
To conclude, the most unfavorable characteristics of the generation mechanisms with regards to the pollutant dispersion may appear to be the dome and the cap since they are linked to a delay in the pollutant vertical escape. Columns appearing on both sides of the buildings for the spanwise component may have less influence. They prevent the flow from spilling over the sides of buildings, but not from leaving the urban area. Further research is ongoing to confirm the relationship of the previous mechanisms with the dispersion of pollutants within urban environments. However, this topic remains out of the scope of this paper.

Breaking process of the main vortices
The three-dimensional structures of the vortex-breaker mode are illustrated in Fig. 5 (right). This mode is closely connected with the wake as opposed to what was observed in the previous case. In this type of mode, three additional structures can be distinguished for each velocity component: a turbulent wake, coherent cluster between the buildings, and arrowhead-like shapes. For all of the flow regimes investigated here, the spanwise-fluctuating areas are shown to occupy the whole region between the obstacles. As a result, increasing the distance between the obstacles leads to a greater number of these arrowhead-like structures in the spanwise direction: in the IR case, up to three alternating structures may be seen, whereas only one can be noticed in the SF regime. Conversely, the streamwise component of the present mode does not exhibit the same behavior; the coherent structures formed on the leeward side of the upstream obstacle remain unchanged for all three flow regimes. However, another oscillating zone arises connected to the downstream obstacle, the position of which is modified among the various regimes. Finally, in terms of velocity in the wall-normal direction, the high-velocity clusters appear within the region in between the buildings. As the separation increases, the flow interacts inside the canopy with considerably more significance, resulting in larger flow structures for both the WI and IR cases. These structures are consistent with the results of Monnier et al. [15] (see Fig. 1), where strong streamwise fluctuations on both lateral sides and a high turbulent spanwise region near to the windward side of the downstream obstacle were reported. The arch vortex is known to exist between these regions, and these structures will be related to the process of breaking rather than creation, owing to its location on the wake. The interaction of the previous structures relates to the creation of a tunnel-shaped vortical structure between the buildings which might be responsible for the breaking process of the arch vortex. This flow mechanism is clearly elucidated by means of the streamlines patterns depicted in Fig. 6. While the first mode resembles the arch-vortex structure, the second exhibits a helicoidal tunnelshaped flow pattern in the region in between the obstacles, owing to the increased correlation in the spanwise direction. The location of these structures perfectly matches the gaps in between the velocity-fluctuating regions in the streamwise and spanwise directions. Therefore, since these velocity-fluctuating regions define the location of such tunnel-shaped flow patterns, the number of structures is modified from case to case: up to three structures are observed in between the buildings for the IR regime. Consequently, the interaction of the structures of B modes within this region results in a mixing procedure, leading to the breaking process of the vortical structures that are generated by A modes.

Interaction between vortex-generator and vortex-breaker modes
Finally, the three-dimensional structures of the most significant HODMD modes can be compared to those of the various modes identified by the algorithm. Specifically, the following lines will be dedicated to the classification of the modes in vortex-generator or breaking-vortex modes based on resemblance with the prior patterns. Note that this classification is driven by the MAC analysis stated in previous sections, which allows for selecting the more robust modes identified by HODMD. Fig. 7 shows a contour representation at y/h = 0.25 of the HODMD modes presented in Fig. 4. The main A and B modes are highlighted in bold and their two-dimensional structures can be compared with the previously-discussed three-dimensional ones. From these structures, a limit frequency can be established such that greater-frequency values result in flow structures which are more related to the vortex-breaking process. For these modes, the MAC analysis gives values which are closer to 1 when compared to B modes, indicating similarity in shape with B modes.
For the SF case, the mode ω m = 0.37 still exhibits some flow trapped in between the obstacles and high spanwise fluctuations on the lateral edges of the first obstacle. Even though the flow in this region appears to be modified by the slight interaction with the surrounding flow, this mode can be thought of as a vortex-production mode with a different production mechanism. Higherfrequency modes (ω m > 0.85) are characterized as breaker modes since both the streamwise and spanwise components share the same flow features as the main B mode (with frequency ω m = 1.22). Note as well that higher-frequency modes exhibit smaller spatio-temporal scales. This highlights the association of lowfrequency modes with large flow scales (dominant patterns) and high-frequency modes with smaller turbulent structures. A similar conclusion can be extracted for the IR regime, where the threshold value is set for the mode with frequency ω m = 0.57. This mode exhibits some flow structures around the upstream obstacle combined with particular features on the wake, which makes it a transitory mode between the vortex-generator and vortex-breaker modes. Finally, regarding the WI case, apart from the vortex-generator mode (ω m = 0.13), the lowest-frequency mode (ω m = 0.45) exhibits a flow pattern similar to that of the vortex-breaker mode. Therefore, in this situation, the threshold value should be set lower than this frequency, resulting in all modes fulfilling ω m > 0.45 becoming of breakingtype. Note that for this case, modes with associated frequencies ω m = 0.11, 0.22 and 0.33, which are harmonics of the lowest-frequency mode, are examples of vortex-generator modes with a MAC value close to 1 when compared to the main A mode, indicating significant similarity with this mode.
Knowledge of the mechanisms of generation and destruction of relevant vortical structures within urban flows provides sufficient information to be able to perform studies of pollutant dispersion within urban environments, so that groundlevel concentrations significantly higher than those occurring in the absence of the building can be avoided. In this sense, for pollutants emitted at street level, the vortex-breaker mode could provide, at a high frequency, a higher interaction with the surrounding clean air, while for the arch-generating mode, the flow between buildings could be hardly influenced by the flow outside. Therefore, B-type modes could be connected to the promotion of the pollutant dispersion within cities, whereas the generation of A-type modes should be minimized owing to their low interaction with the surrounding atmosphere. Another important aspect closely related with the pollutant-dispersion aspect is the direction of the fluctuations. Regarding vortex-breaker modes, the tunnel-shaped structures shown in Fig. 6, mainly influenced by the arrow-shaped spanwise fluctuations, would disperse rapidly those pollutant emitted at the street level towards the atmosphere. However, the streamwise fluctuations, which increase close to the building walls, produce an increased concentration of pollutants within the city.

Streamwise and spanwise-periodic structures
After discussing the different possible generation and breaking mechanisms of the main vortical structures associated with the temporal modes obtained by HODMD, the STKD algorithm is applied to these modes in order to obtain the spatio-temporal modes, with the aim of understanding more in detail the mentioned mechanisms and how they link with the physics of the problem.
As already shown in § 3.3, the modal expansion can be applied to different spatial directions and, in this work, we analyze both the streamwise and spanwise components. Once the STKD is applied to the temporal HODMD modes, we obtain a spectrum of the spatio-temporal modes as can be seen in the Fig. 8 for both directions, where the modes obtained for the x-direction are called X modes and the ones obtained in the z-direction are the Z modes. The dominant wavenumbers are α m = 0.6 for the X modes, and β m = 2.2 for the Z modes. In general, it can be observed that the spatial modes obtained from the A modes have lower amplitude than the ones obtained from the B modes. After studying the results obtained from the spatio-temporal modes, it can be observed that the X modes are connected to the mechanisms of the breaking process and the Z modes show the resulting structures from these breaking mechanisms. Fig. 9 depicts the different structures appearing in the STKD analysis when applied to generator modes. The interaction zone for all the displayed modes is near and between the buildings, hence, the shedding of the arch vortex still has not started. Regarding the unsteady modes, it is possible to identify different types of flow patterns: for the Z modes, the patterns that appear in the streamwise component of the velocity surround both buildings for the SF and WI regimes while these patterns are divided in the IR case, creating a new structure in the second building. This phenomenon occurs due to the wave-like topology of the flow in the IR regime, where the free stream has enough distance between the buildings to adapt. Something similar happens for the wall-normal component in the IR case, where the cap-like structure starts to appear again in the second building. Also, a phase shift appears for the patterns formed in the streamwise component of the Z mode between the real and imaginary part (the imaginary part Figure 9: Three-dimensional iso-surfaces of the different spatio-temporal modes obtained from the generator modes. (Top), (middle) and (bottom) show the SF, WI and IR cases respectively, while each column represents a different mode: the first column shows the streamwise velocity of the temporal generator mode, the second and third columns display the streamwise and wall-normal velocity of the dominant Z mode with a wavenumber of β m = 2.2, and the last column represents the streamwise velocity of the dominant X mode with a wavenumber of α m = 0.6. Red and blue denote positive and negative velocities respectively, and the white-transparent structures represent the mean-flow of the spatio-temporal modes with zero wavenumber.
is not shown for the sake of brevity), showing that these structures are traveling waves along the spanwise direction. The same applies to the X modes, where the phase shift between the real and imaginary part shows that these structures are traveling waves along the streamwise direction. Regarding the interaction between the mean flow and the unsteady modes, for the streamwise component of the Z modes, the zero-frequency mode or mode 0 appears between the buildings interacting with the structures of the dominant mode, suggesting its connection with the presence of the arch vortex. Similar phenomena are observed for the wall-normal Z modes, although the interaction occurs on the top and bottom of the first building, suggesting that there is connection between the cap recirculation structure and the horseshoe vortex. Finally, the streamwise component of the X modes shows that the mode 0 surrounds the two buildings interacting with the unsteady modes, suggesting its relationship with the arch vortex again.
Regarding Fig. 10, where vortex-breaker modes are shown, the structures of the Z modes continue to appear between the buildings rather than on the sides of the buildings, as expected. The reason behind this phenomenon lies in the strong streamwise influence present in the flow, making the breaking mechanisms appear near the arch vortex. The X modes display large streamwise structures on the sides of the buildings (except in the IR case), suggesting the connection with the destruction of the main vortices and generation of the turbulent wake. With regards to the Z modes, a large structure appears between the buildings after the arch vortex, suggesting a possible connection with the shedding of the arch vortex. Regarding the zero-frequency mode, it begins to appear between the buildings with low intensity for the SF, it enlarges and gains relevance in the flow for the WI case, while for the IR case its presence decreases. However, the dominant mode becomes important, especially in the IR, suggesting its connection with the wake that forms downstream of the buildings. For the vortex-breaker case, the interaction between the zero-frequency mode and the dominant frequency mode is present, however, as the intensity of the structures related to the unsteady modes is considerably higher than in the vortex-generator modes, the interaction has a lower relevance.

Energy analysis
In this section, we analyze the same urban database discussed in § 2 using POD. In Fig. 11, we show the eigenvalues λ m (left) and cumulative sum of the eigenvalues ∑ i=m i=1 λ i (right) spectrum normalized with the total energy of the eigenvalues ∑ M i=1 λ i of the POD modes corresponding to the complete set of velocity components of the three reference regimes. The objective is to identify energy gaps that make some modes energetically more relevant than others. This energy gap is noticed between the second and third modes for all regimes, which highlights that the first two modes contain the most relevant information of the flow. A second but smaller energy gap is appreciated between the fourth and fifth modes. These constitute the set of POD modes that will be compared with the main HODMD modes in the present section. A separated analysis conducted on the three components of the velocity revealed that the wall-normal component only accounts for 15% of the total energy content of the flow. This means that the influence of the wall-normal velocity fluctuations is lower than that of the other velocity components. This is consistent with the findings of Monnier et al. [15] depicted in Fig. 1, who discovered that for a zero-incidence angle, the streamwise and spanwise fluctuating components are more significant than the wall-normal component. On the basis of the above, only the streamwise and spanwise components will be further studied for the three flow regimes. Figs. 12 and 13 depict the orthogonal POD basis for the first ten modes of the streamwise and spanwise velocity fields corresponding to the three flow regimes. In addition, an analysis of the temporal coefficients associated with these modes is performed in the frequency domain through the fast-Fourier-transform (FFT) method [57]. Fig. 14 depicts the power spectrum for the first five POD modes in all the flow regimes. We can classify the time coefficients linked to each spatial mode into low-and high-frequency phenomena in the frequency range ω m = [0, 2] using this modal-decomposition technique, whose characteristics are very relevant to the vortex-generating and -breaking processes, respectively. In fact, the first two modes, with associated frequencies that are similar to those of the HODMD B modes (ω m ≈ 1), are characterized by high-velocity streamwise fluctuations on both sides of the buildings; these are complemented by spanwise-velocity fluctua- tions in the area between the obstacles for each flow regime. These regions match with the high-turbulent-kinetic-energy (TKE) regions of the streamwise component identified by Monnier et al. [15] for an array of building-like blocks, see Fig. 1. However, the main differences among regimes depend on the position of the secondary structures, which are associated with the downstream block. For instance, while these streamwise fluctuating regions span the zone in between the obstacles for the skimming flow and the wake interference cases, in the isolated roughness, they are only located on the immediate leeward side of the upstream block. In such a fashireon, increasing the separation of the obstacles does not yield more streamwise fluctuating regions, at least for the more energetic modes, as it occurs in a vortex-shedding case. Furthermore, the spatial structures of the first two modes are observed to be the same for the three flow regimes, except for a shift in phase: they are both antisymmetric about the z-axis and they both represent fluctuating regions on the wake with matching frequency values. This is an evidence that these modes represent a wave-like periodic structure of the flow that develops in the streamwise direction.
On the other side, increasing the distance between the obstacles increases the number of high-intensity flow structures of spanwise fluctuating regions, although the width of these structures decreases. These results could be connected to the interplay of lateral flow within the canopy, which suggests that the arch vortex formed on the leeward side of the upstream obstacle is shattered. A tunnel-shaped structure is then produced as a result of the interaction between these two different types of structures, a fact that suggests that it is this process that breaks the main flow structures (see Fig. 6). Therefore, the vortex-breaking process has been identified as the most energetically relevant mode present in the flow field.
The third and fourth POD modes are related to the vortex-generating modes, or the A modes, because of their low-frequency behavior (ω m < 0.8). In this case, the streamwise component demonstrates how a dome-like structure encloses the area between the obstacles and expands further in the wake, with low spanwise fluctuations. Due to the resemblance with the time-averaged field, a generating process for these structures is implied. This is consistent among the three flow regimes; however, for the IR regime, these structures appear closer to the first obstacle and a clear interaction with the second obstacle is not observed. A similar conclusion was drawn for the most energetic modes, a fact that highlights the independence of flow around both obstacles in the IR regime.
The fifth mode, which produces flow structures as a result of the interaction of the aforementioned low-and high-frequency modes, can be thought of as a transitional mode between them. Therefore, higher-order modes exhibit flow structures which might result from the combination of the previous modes and if even higherorder modes (m > 10) were studied, certain high-frequency phenomena would be captured due to the smaller turbulent-flow scales associated with them. As a result, due to the wide range of frequencies in their spectrum, a clear comparison with HODMD modes is more difficult to be set for these higher-order modes since the former are associated with a single frequency value. Nevertheless, this transitional behavior is also observed in the spatial structures from the fifth to tenth modes in Figs. 14 and 13, where the wide range of structures found might be regarded as a combination of vortex-breaker and vortex-generator features. Remarkably, for some of the higher-order modes of the isolated roughness regime, e.g. modes 6 and 10, individual structures are observed around the downstream obstacle. Again, this provides more evidence of the independence of the flow in this flow regime, but also of the ability of the method to discern between flow patterns around both obstacles, a characteristic that was not found for the HODMD modes.

Summary and conclusions
A simplified urban environment model consisting of an array of two buildings with variable spacing ratios was examined using high-fidelity simulations. These simulations were carried out to provide a complete physical description of the fundamental mechanisms controlling the dynamics in different urban streets. The aim was to provide an insightful analysis of the physics of the flow within environments that approximately reproduce some aspects of different types of urban areas. The growing expansion of cities boosts the search for physical models capable of reproducing the pollutant and thermal distributions within cities, although we consider simplified versions of those flow cases. Here, the three-dimensional flow patterns responsible for pollutant dispersion have been characterized. Isosurfaces and contour slices were used to demonstrate the complicated flow behavior of the modes identified by the POD and HODMD algorithms. The results show that the flow behavior can be split into low-and high-frequency phenomena, each with significant consequences related to the formation and destruction of vortical structures such as the arch or horseshoe vortices. Low-frequency modes are named vortex-generator (A) modes since their associated structures have been re-lated to the mechanism triggering the formation of the arch vortex formed on the leeward side of both buildings. These structures are particularly noticeable on the windward side of the upstream obstacle for the wall-normal and spanwise velocity components and the leeward side for the downstream one, which defines the location and shape of the arch vortex. Furthermore, this location is kept constant among the different flow regimes, which highlights the idea that the process of formation of the arch vortex does not strongly depend on the separation between the obstacles.
In addition, HODMD identifies a high-frequency mode that correlates with the largest-amplitude mode in all cases. Because of the streamline flow patterns in the intermediate section of the obstacles, they are referred to as vortex-breaker modes. The large amplitude of these modes emphasizes their importance in this type of simplified urban environment. Indeed, their structures are linked to the first two highest-energy POD modes. Furthermore, as the separation increases, the flow becomes more correlated in the spanwise direction, owing to the more significant interaction of the flow with the wake layer inside the canopy. This effect yields to fluctuating velocity regions occupying the whole section between the obstacles, thus being associated with the destruction of the vortical structures rather than their formation. Besides, the vortex-breaker modes will be responsible for the emergence of high-TKE regions on both sides of the obstacles and on the windward side of the downstream obstacle. Interestingly, these results are consistent with the wind-tunnel results of Monnier et al. [15], performed on a more complex urban environment. Therefore, the conclusions of the present work could potentially be relevant to more realistic urban environments by considering that the turbulence levels from one street to the next one are expected to decrease significantly. It is also interesting to note that the results provided by both POD and HODMD show that the wall-normal velocity component does not significantly influence the more prominent structures as the streamwise and the spanwise components do.
Regarding the results obtained from the STKD analysis, the Z modes display the results of the mechanisms obtained in the X modes about the generation and destruction of the coherent structures. When analyzing the temporal generator modes, the X and Z modes show a phase shift between the real and imaginary parts. Consequently, traveling waves appear in each direction. In addition, the symmetry is conserved, and the influence area shows that the structures are still unbroken. On the other hand, when the STKD analysis is applied to the temporal breaker modes, the X modes show that the main structures are broken, and the structures causing this destruction are shaped as large streaks in the stream-wise direction. Meanwhile, the Z modes show the resulting structures from these breaking mechanisms.
From an environmental point of view, urban areas with highly-separated buildings, i.e. the isolated-roughness regime, would exhibit much more interaction with clean air sources, thus enabling the rapid propagation of those pollutants emitted at the street level. However, power plants, commonly located close to urban centers, are also responsible for pollution issues within cities. In those cases, owing to the low interaction of the flow above the urban canopy with the streets, it would be convenient to decrease the separation between buildings, i.e. establishing the skimming-flow regime. In such a case, the air at the street level would also be in contact with clean sources of air through the arch-vortex legs.
We conclude our work with a brief discussion over the main limitations of the methods employed here. Firstly, a numerical data set of more than 200 threedimensional snapshots for the three components of the velocity has been found to be accurate enough to represent the main large-scale structures of the flow. However, this number of fields has a direct impact on the required computational resources, especially given the amount of information employed for each field. Although some additional analyses have been performed with a slightly larger number of snapshots, these results remain to be confirmed with data sets encompassing a significantly larger number of snapshots. Additionally, higher-order modes could also be analysed: their associated structures might differ from those of the modes presented here, thus yielding to new types of modes. Other modaldecomposition techniques such as SPOD may also be employed with the objective of confirming the similarity of the obtained modes with the present spatiotemporal structures.

Appendix A. Calibration process of the HODMD algorithm
The large number of phenomena associated with complex turbulent flows motivates the use of highly-efficient methods to properly identify the behavior of such dynamical structures. This Appendix aims at providing a brief summary of the calibration process concerning the HODMD algorithm. A well-established criteria must be used in order to identify the most robust modes that best characterize the system from the very large number of modes calculated with each variation. Fig. A.15 shows the frequency versus amplitude of the different modes computed using HODMD with a variety of parameters for the three flow regimes. The largest-amplitude and lowest-frequency modes are selected since they possess the more relevant information about the system and the classification will be made based on this distinction. As highlighted in Fig. A.15, these modes are known to form clusters throughout the spectrum. The amplitude and frequency of the selected modes will be, therefore, the average value of the collection of modes. The number of preserved modes is also a function of the tolerances employed, which were ε = 10 −3 and 10 −4 in this study. Similarly, the other user-controlled parameter, d, changes depending on the number of snapshots to be analyzed. The skimming-flow and wake-interference regimes, with K = 225 and 215 snapshots, respectively, were studied with d = 10, 20 and 30, whilst the 90 snapshots of the isolated roughness case were studied with d = 5, 10 and 15. Note that, as mentioned during the theoretical derivation of the Koopman operator (see § 3.2), when the number of snapshots is reduced, the value of d, which represents the characteristic sliding window process, must also decrease in the same proportion [33]. Furthermore, the relative error obtained in the calculations remains fenced in the set of tolerances used. The goal of this study, however, is to identify the largest-amplitude modes in order to provide a broad description of the fundamental patterns driving the flow, rather than to build any accurate reduced-order models based on the physical knowledge of the flow. As a result, the relative error has not be examined further in this study. The reader is referred to Vega and Le Clainche [33] for a more detailed explanation of the calibration process and the influence of the error in the solution.
It is worth noting that when the distance between the obstacles increases, the flow complexity decreases, allowing a smaller number of snapshots to be used to estimate the same flow behavior. Note as well that, despite the fact that the number of snapshots in the isolated-roughness regime is smaller, the time span covered is of the same order of magnitude, as the time step between snapshots has been increased. This is particularly important when dealing with computationally expensive data, but one should bear in mind that in order to accurately represent smaller turbulent scales, a larger number of snapshots with a shorter time step should be utilized instead. However, in general, the HODMD provides a fair balance of computational cost and accuracy.