Shared Memory Architecture for Simulating Sediment-Fluid Flow by OpenMP

Simulation of fluid flow using the shallow water equations (SWE) and of the sediment motion beneath it using the Exner equation is presented. The two equations are coupled using a splitting technique, in which the SWE are computed using the Harten-Lax-van Leer and Einfeldt (HLLE) numerical flux, and the Exner equation is then computed semi-implicitly. This paper elaborates the steps of constructing the SWE-Exner model. To show the agreement of the scheme, two problems are elaborated: (1) a comparison between the analytical and numerical solutions, and (2) a parallel OpenMP implementation of transcritical flow over a granular bump. The first problem reports the discrete $L^{1}$-, $L^{2}$-, and $L^{\infty}$-norm errors of the scheme; the second shows the simulation results together with the speedup and efficiency of the scheme, whose average efficiency is around $56.44\%$.


Introduction
Particle movements below the surface of a fluid flow are divided into three categories: saltation, suspension, and bed-load. Saltation is defined as an alteration of particle positions from one place to another. Meanwhile, suspension is defined as particles moving together with the fluid when they are affected by a huge amount of flux, and bed-load is defined as a movement of particles at the bottom of the current [1], [2]. Generally, these particle movements produce sedimentation under the water flow. However, sedimentation can have a negative impact or create natural hazards for living organisms near water areas, such as flooding, erosion, etc.

Jurnal Ilmu Komputer dan Informasi (Journal of Computer Science and Information), volume 12, issue 1, February 2019
Through this paper, the simulation of sediment movement by a mathematical model and its discretization is elaborated [2]. Here, the focus is on bed-load particle movement. The mathematical models used are the shallow water equations (SWE) and the Exner equation, which model the fluid current and the sediment movement respectively. An illustration of the one-dimensional SWE-Exner model can be seen in Figure 1. Since the SWE-Exner equations in this paper are discretized using the Finite Volume Method (FVM), the fineness of the simulation depends on how many grid points are used. In the numerical simulation, an explicit FVM scheme is used, so the time step that guarantees stability depends on the number of grids. Therefore, the more grid points are used, the longer the numerical computation takes [3].
With a parallel technique, the execution time of the numerical scheme can be reduced. Some of the aspects that determine the efficiency gain of the parallel over the serial technique are the simplicity of the code, the shared-memory model, the parallel architecture of the computer, and the robustness of the parallel code [4]. A time comparison between the parallel and serial techniques helps determine how big the impact of the parallelization is. The parallel performance is then measured by computing its speedup and efficiency.
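For reference, the standard definitions of these two quantities, with $T_1$ the serial execution time and $T_p$ the execution time on $p$ threads, are:

```latex
S_p = \frac{T_1}{T_p}, \qquad E_p = \frac{S_p}{p} \times 100\%.
```

An efficiency of 100% would mean a perfectly linear speedup; thread communication and synchronization overhead usually keep $E_p$ below that.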
OpenMP is one of the methods in parallel programming [5]. Its simplicity (see [4], [6]) is one of the main reasons people choose it over other techniques (see also the details of the OpenMP process in Section 3). For instance, computing the MPS method in [4] is shown to give 49.57% efficiency. Moreover, in the simulation of water oscillation on a paraboloid using SWE in [7], an OpenMP efficiency of up to 84.4% is obtained. Both results show that OpenMP is a good choice to accelerate the computation and shorten the computation time. This paper points out two things: (1) the errors of the SWE-Exner model, obtained by comparing the analytical and numerical solutions, and (2) the simulation of transcritical flow over a granular bump, together with the speedup and efficiency of the serial and parallel techniques. The errors in the first problem are reported as the discrete $L^1$-, $L^2$-, and $L^\infty$-norm errors. The parallel architecture for the second problem is OpenMP on an Intel Core i5-4200 @2.3GHz using 4 threads.

Governing equations
The Shallow Water Equations (SWE) are a hyperbolic system of conservation laws used to solve geophysical flow problems [8], [9]. Generally, SWE are applied to water flow and gas flow. For water flow, SWE may be applied if the current has a wave length more than twenty times larger than its depth, i.e. $L \geq 20h$ [10].
SWE are formed by two equations: (1) the continuity equation, which is obtained from the mass conservation law, and (2) the momentum equation, which is obtained from the momentum conservation law. Because in this paper the depth of the bedrock with respect to the reference level ($B$) is fixed, $\partial B / \partial t = 0$, and the SWE can be described as the following system of equations,

$$\frac{\partial h}{\partial t} + \frac{\partial (hu)}{\partial x} = 0,$$

$$\frac{\partial (hu)}{\partial t} + \frac{\partial}{\partial x}\left(hu^2 + \frac{1}{2}gh^2\right) = -gh\frac{\partial z}{\partial x} - ghS_f,$$

where $x$ describes the spatial coordinate, $t$ the time, $h(x,t)$ the water depth, $g$ the gravitational acceleration, $u(x,t)$ the velocity of the fluid, $S_f$ the friction slope, and $z(x,t)$ the height of the sediment (see Figure 1 for more detail about these variables). The friction slope used in this paper is obtained using the Darcy-Weisbach friction force, because the fluid stream tends to be turbulent [11]. It is given as

$$S_f = \frac{f\,u|u|}{8gh},$$

where $f$ is the Darcy-Weisbach friction factor.
In order to describe the interaction between the fluid (SWE) and the sediment, a mathematical model of the sediment is needed. Here, the Exner model is used to simulate the movement of the sedimentation. The Exner equation is derived from mass conservation and models sediment that moves because of its interaction with the fluid flow [9]. It may be combined with the SWE to model the bed-load movement below the fluid current as the sediment elevation $z$, which is affected by the water velocity $u$ [12]. It reads

$$\frac{\partial z}{\partial t} + \xi \frac{\partial q_b(u)}{\partial x} = 0, \qquad \xi = \frac{1}{1-\varepsilon}, \qquad q_b(u) = A_g\, u|u|^{m-1},$$

where $\varepsilon$ is the sediment porosity, $u$ is the velocity as in Eqs. (1-2), $m$ is a parameter for the sediment with range $1 \leq m \leq 4$ (see [13] for more detail about this parameter), and $A_g$ is a number in $0 \leq A_g \leq 1$ expressing the interaction strength between the fluid and the sediment; the higher the value of $A_g$, the stronger the interaction between the two. Note that this Exner equation uses the Grass formula [13] as the bed-load function. See references [9], [14] for more detail about other bed-load functions.

Numerical Scheme
Using the Finite Volume Method (FVM), the SWE are discretized from the system of conservation laws [15]. The fluid-flow equations (1-2) can be written in the compact form [16]

$$\frac{\partial U}{\partial t} + \frac{\partial F(U)}{\partial x} = S(U), \qquad U = \begin{pmatrix} h \\ hu \end{pmatrix}.$$

The spatial step is obtained by subtracting a spatial point from the one after it (see Figure 2), which can be written as $\Delta x = x_{i+1} - x_i$. The discrete time levels are obtained from $t^n = n\Delta t$, with $n$ a natural number and $\Delta t \geq 0$. Using the conservative form above, the SWE are transformed into the numerical scheme

$$U_i^{n+1} = U_i^n - \frac{\Delta t}{\Delta x}\left(F_{i+1/2}^n - F_{i-1/2}^n\right) + \Delta t\, S_i^n.$$

The interface states entering the fluxes $F_{i\pm 1/2}$ are defined using the hydrostatic reconstruction scheme, which in its standard form reads

$$h_{i+1/2}^{-} = \max\bigl(0,\ h_i + z_i - \max(z_i, z_{i+1})\bigr), \qquad h_{i+1/2}^{+} = \max\bigl(0,\ h_{i+1} + z_{i+1} - \max(z_i, z_{i+1})\bigr).$$

For the friction force on the topography, the Darcy-Weisbach friction slope $S_f$ of the previous section enters the discrete source term. Finally, the Harten-Lax-van Leer and Einfeldt (HLLE) numerical flux is given as

$$F_{i+1/2} = \begin{cases} F(U_i), & 0 \leq S_L, \\ \dfrac{S_R F(U_i) - S_L F(U_{i+1}) + S_L S_R \left(U_{i+1} - U_i\right)}{S_R - S_L}, & S_L < 0 < S_R, \\ F(U_{i+1}), & S_R \leq 0, \end{cases}$$

where $S_L$ and $S_R$ estimate the smallest and largest wave speeds at the interface. The solution of the Exner equation is sought after the solutions of the SWE have been found. The splitting technique means that the HLLE method is no longer used for the Exner equation; a semi-implicit method is applied instead. That is, the value of $z_i^{n+1}$ needs not only $z_i^n$, but also the velocity $u^{n+1}$ that was found earlier from the SWE (see Figure 3). The Exner equation is then discretized as in [14].

Parallel Architecture
Parallel architectures are used to reduce the execution time of a computation. They work by splitting tasks over threads; each thread then works on its task until it is done. One of the most popular shared-memory programming models is Open Multi-Processing (OpenMP). In OpenMP, each worker is given a chunk of tasks (iterations of a parallel for-loop, or sections in a parallel-sections block), and a barrier construct is used to synchronize the workers (not the tasks) [17].
An OpenMP program is essentially a sequential program augmented with compiler directives to specify parallelism. Each thread working under the OpenMP program has its own program counter and executes one task at a time, similar to sequential program execution [18] (see Figure 4). For more information about OpenMP, see [19]. This paper uses OpenMP as its shared-memory programming model because of its simplicity [3], [4], [6], [7], [20]-[22]. OpenMP owes its popularity to the following properties:
• it can be implemented incrementally [23],
• it can still run the serial program without any modification needed [24], and
• it has a directive-based interface [25].
OpenMP directives are categorized into three parts [23]:
• Work-sharing directives (omp parallel for, omp parallel sections, omp single), which focus on partitioning work between threads. One example is omp parallel for: if you run it, the output would likely look like Figure 5. In Figure 5, the number of threads is 4 and the array has 12 elements; thus, with the #pragma omp parallel for directive, array indices 0-2 are executed by the first thread, and the remaining chunks are executed by the other threads.
• Data-property clauses (shared, private, reduction, firstprivate, etc.), which control how variables are treated in the parallel region. The default for variables in the OpenMP standard is shared.
• Synchronization directives (omp barrier, omp atomic, omp critical, omp flush, etc.), which are incorporated into the control-flow graph.
For the details of every directive, the reader is encouraged to see [25]. OpenMP also provides a basic syntax and runtime library routines; for other runtime library routines, the reader is encouraged to check reference [24].
The flowchart of the SWE-Exner implementation with its serial and parallel parts is shown in Figure 6. In the serial part, variables are declared and the current time is inspected. The current-time inspection has to be executed in the serial part because the loop has to advance step by step. Still in the serial part, the boundary condition is defined serially because it only needs two execution lines, so a parallel technique is not needed. Last but not least, the time step is computed in serial because it only needs to be computed once per iteration.
In the OpenMP part, the new values of $h$, $hu$, and $z$ are found by solving the SWE and the Exner equation. Because each iteration does not need the values of the other iterations, a parallel technique can be used. The boundary conditions of $h$ and $hu$ require four lines, so the parallel technique is applied by attaching each line to an available thread. Note that each processor executes the parallel part on its share of the grid, obtained by dividing the maximum number of spatial points by the number of processors in the architecture. After that, the velocity $u$ is computed using the parallel technique, because each of its iterations does not need the other iterations. The same applies to updating the sediment ($z$) and water ($h$) profiles.

Results and Discussions
In this section, two numerical simulations are elaborated: (1) a comparison between the numerical and analytical solutions, and (2) transcritical flow over a granular bump. The parallel part is performed in the second simulation, after which the speedup and efficiency of the parallel method are given.

Comparison between numerical and analytical solution
Here the goal is to compute a numerical solution for a case whose analytical solution is available, taken from the paper [26]. The numerical simulation is initialized on the space domain $\Omega = [0,7]$ with the Grass formula and friction slope $S_f = 0$; the remaining model coefficients are set to $0.005$. The analytical solution of this simulation is detailed in [26]. The result on 50 grid points shows that the numerical scheme has good agreement with the analytical solution (see Figure 7). In order to quantify the distance between the numerical and analytical solutions, the discrete $L^p$-norm errors are given (here $p = 1, 2, \infty$). The discrete $L^1$-, $L^2$-, and $L^\infty$-norms are non-negative quantities that measure the magnitude of a grid function; indeed, the discrete $L^p$-norm error is the standard measurement for validating numerical solutions in numerical analysis [27]. Depending on the value of $p$, the discrete $L^p$-norm error of a grid function $\tilde v$ is defined (in its standard form) as

$$\|\tilde v\|_{1} = \Delta x \sum_{i} |\tilde v_i|, \qquad \|\tilde v\|_{2} = \Bigl(\Delta x \sum_{i} \tilde v_i^{2}\Bigr)^{1/2}, \qquad \|\tilde v\|_{\infty} = \max_{i} |\tilde v_i|.$$

Tables 1, 2, and 3 show the discrete $L^1$-, $L^2$-, and $L^\infty$-norm errors of the simulation on 100, 200, 400, 800, and 1600 grid points. Note that $h_a$, $u_a$, and $z_a$ stand for the analytical solutions of $h$, $u$, and $z$, and that the notations $\tilde h = h - h_a$, $\tilde u = u - u_a$, and $\tilde z = z - z_a$ denote the distance between the approximation and the analytical solution. By increasing the number of points, the discrete $L^p$-norm errors of $h$, $u$, and $z$ are shown to decrease significantly. Since the errors tend to zero, the numerical results are in good agreement with the analytical solution and the numerical solution is acceptable. For instance, using 1600 points, the numerical error of the sediment ($z$) is $1.257 \times 10^{-3}$, $1.299 \times 10^{-3}$, and $1.492 \times 10^{-2}$ in the discrete $L^1$-, $L^2$-, and $L^\infty$-norm respectively.

Transcritical flow over a granular bump
In this test case, the transcritical flow over a granular bump problem from [14] is elaborated. In [14], a staggered numerical scheme is used; here, the numerical scheme of Eqs. (8-20) is used instead. Consider an initial state of subcritical and supercritical flow over a granular bump, with conditions on the space domain $\Omega = [0,10]$ given by $(hu)(0,x) = 0.6$, $u(0,x) = 0.1$, the bump profile $z_0(x) = 0.1 + 0.1\,e^{-(x-5)^2}$, the inflow condition $(hu)_{\mathrm{in}}(t) = 0.6$, the outflow condition $h(x) + z(x) = 0.4$, and $A_g = 0$. These conditions are kept until the scheme reaches a steady state (see Figure 8). That is, when a bump appears at the bottom of the flow, it takes some time for the flow to adjust its shape to the bump; once it has, the flow is said to be in steady state. Changing $A_g$ only after the steady state is reached makes the simulation more realistic, because the effect of the bump on the sediment should only occur after the flow has become stable.
After that, the value $A_g = 0.0005$ is applied to the scheme as the time restarts. The movement of the sediment can be seen in Figure 9. The figure shows that, going from the steady state to the flat state, both the fluid flow and the sediment need a certain amount of time to flatten out. Furthermore, the joint movement of fluid and sediment (see Figure 9) shows the harmony between the two. Using the parallel technique with the shared-memory architecture (see the last paragraph of Section 1), the time needed for the numerical computation is reduced for every number of grid points. Table 4 shows the execution times of the serial and parallel techniques. It shows that, using the parallel technique, the execution time of the simulation is reduced: for instance, using 3200 mesh points, the execution times of the serial and parallel runs are 6118.536118 and 2015.589241 seconds respectively. In other words, the parallel technique gives a speedup of almost 3 over the serial one.
To make the comparison easier, graphs of the speedup and efficiency are shown in Figures 10 and 11 respectively. Figure 10 shows that the more mesh points are used in the model, the higher the speedup. Similarly, Figure 11 shows that high efficiency is obtained as the number of mesh points increases. However, the speedup is observed to grow quickly only up to 800 mesh points. The formulas of speedup and efficiency can be found, for example, in [7], [20], [21]. Here, a good speedup (3 times the serial performance) is observed when the grid number is greater than 800. This phenomenon is natural behaviour in parallel computing: for small amounts of data, the communication time between threads dominates the execution time. Therefore, a stable speedup profile is observed as the number of grids is increased. Similarly, the efficiency of the parallel technique is stable when the number of grids is more than 800. In Figure 11, this simulation is observed to reach up to 70% efficiency using the parallel algorithm.

Conclusion
In this paper, a numerical scheme based on a semi-implicit collocated FVM using the average velocity to approximate the SWE-Exner model is given. Two numerical problems are successfully elaborated: (1) a comparison between the analytical and numerical solutions, and (2) a parallel implementation for transcritical flow over a granular bump. The first problem shows that the results of the proposed scheme are in good agreement with the analytical solution. The agreement is shown by the discrete $L^p$-norm errors (with $p$ equal to 1, 2, or $\infty$), which can be seen in Section 4.1. The second problem shows that, in terms of the algorithm, the efficiency of the parallel implementation gradually increases until it reaches up to 70% when using more than 800 grid points. Moreover, the average efficiency of the parallel implementation is around 56.44%. For the other parallel measurement, the speedup of the parallel computation is shown to be 3 times that of the serial computation when the number of grids is more than 800. Therefore, through this speedup and efficiency, the parallel technique is shown to be useful when less computation cost is needed. Moreover, parallel computing is shown to be able to decrease the execution time of the numerical simulation approximating the SWE-Exner model with the semi-implicit collocated FVM.

Figure 1.
Figure 1. The one-dimensional illustration of shallow water with sediment and a fixed bedrock, in a horizontal slice.

Figure 2.
Figure 2. The one-dimensional explicit stencil in FVM, showing that $\Delta x$ is found by subtracting the left side from the right side of every grid point, and that every point has the same gap length to the points before and after it (the same value of $\Delta x$).
• #pragma omp construct [clause [clause] ...] as the basic syntax, to explicitly instruct the compiler to parallelize the chosen block of code,
• omp_get_num_threads() to return the number of threads currently working in a parallel region,
• omp_get_thread_num() to get the thread rank within the team,
• omp_set_num_threads() to set the number of threads used in a parallel region,
• etc.

Figure 7.
Figure 7. Comparison of the analytical and numerical solutions at $t = 7$, and their initial condition at $t = 0$.

Figure 8. Figure 9. Figure 10.
Figure 8. The initial condition of the transcritical flow over a granular bump.

TABLE 1
DISCRETE $L^1$-NORM ERROR OF THE SCHEME AT $t = 7$

TABLE 2
DISCRETE $L^2$-NORM ERROR OF THE SCHEME AT $t = 7$

TABLE 3
DISCRETE $L^\infty$-NORM ERROR OF THE SCHEME AT $t = 7$

TABLE 4
EXECUTION TIME OF THE SIMULATION FOR EACH NUMBER OF MESH POINTS IN SERIAL AND PARALLEL