Spectral-element simulation of two-dimensional elastic wave propagation in fully heterogeneous media on a GPU cluster

We present an implementation of the spectral-element method for simulation of two-dimensional elastic wave propagation in fully heterogeneous media. We have incorporated most of realistic geological features in the model, including surface topography, curved layer interfaces, and 2-D wave-speed heterogeneity. To accommodate such complexity, we use an unstructured quadrilateral meshing technique. Simulation was performed on a GPU cluster, which consists of 24 core processors Intel Xeon CPU and 4 NVIDIA Quadro graphics cards using CUDA and MPI implementation. We speed up the computation by a factor of about 5 compared to MPI only, and by a factor of about 40 compared to Serial implementation.


Introduction
Over the past several years, numerical modeling using graphics processing units (GPUs) have been successfully used to simulate seismic wave propagation. One of the most widely used approach is the finite-difference method [1,2] with varying degree of sophistication. It has been successfully applied to simulate seismic wave propagation in fluid saturated porous media [3,4], high-order simulation [5], and heterogeneous model [6]. Unfortunately, significant difficulties arise in the present of topography [7]. Pseudospectral methods have become popular for acoustic and elastic simulations [8,9], but are restricted to models with regular variations.
The spectral-element methods have been used for more than 30 years in computational fluid dynamics [10]. The method easily accommodates topography and naturally incorporates free-surface condition. It uses a weak formulation for solving elastic wave equation and lends itself well to be applied in parallel computation with distributed memory [11] and GPU cluster [12].
In this article, we present an implementation of the spectral-element method for simulation of twodimensional elastic wave propagation in fully heterogeneous media. Based upon this implementation, a realistic topography, a complex subsurface structure, and wave-speed heterogeneity can be efficiently incorporated within the spectral-element mesh. The effects of heterogeneity, influence of realistic surface topography and analysis of simulation time by the implementation of Message Passing Interface (MPI) and Compute Unified Device Architecture (CUDA) are investigated.

Model Setting
We define a model for simulation based on Merapi gravity data inversion in 2011 [13] as shown in Figure  1. It had a dimension from 0 to 20 km horizontally, and from 0 to 2912 m vertically. Throughout the entire model, P-wave speed was approximated based on the value of density [14], which may be written in the form with V P denotes P-wave speed in m/s and ρ denotes density of medium in g/cm 3 . The S-wave speed was approximated using the ratio of P-and S-wave speed given by [15], which has the value of 1.86. The summary of material properties of the model is given in Table 1.

Mesh Generation
To incorporate a considerable surface topography, complex subsurface interfaces, and 2-D wave speed heterogeneity, we used an unstructured quadrilateral meshing technique with an initial element size of 30 m. We used a paving mesh scheme algorithm provided by TRELIS meshing software to produce a fairly uniform mesh shape and accommodated curved interfaces in the model (see Figure 2). The total number of elements in the model is 30999.

GPU Cluster
Simulation was carried out on a GPU cluster, which consists of 24 core processors Intel Xeon CPU E5-620 @2.4 GHz and 4 NVIDIA Quadro K4000 graphics cards using CUDA and MPI implementation provided by SPECFEM2D package [12] at the Computational and Seismic Laboratory, Geophysics Sub-Department, Universitas Gadjah Mada.

Snapshots of Wave Propagation
We decomposed the model into 30999 quadrilateral elements. The minimum and maximum distances between grid points in the model are 14.07 m and 71.71 m respectively. In each spectral element, we interpolated a function with the polynomial degree of N = 4, and thus each element contains (N + 1) 3 = 125 Gauss-Lobatto-Legendre (GLL) integration points. The source of the earthquake is an elastic point source placed at the coordinates x s = (10009, 1912). We used a Ricker source time function with a corner frequency of 8 Hz. The time step used is Δt = 0.1 ms, and we propagated the signal for 100,000 time steps, thus total simulation time is 10 s.
Snapshots of the simulation for the vertical component of velocity wave field are displayed in Figure  3. We have normalized the color scale to the maximum of all time steps. We observed at t = 1 s, the wave field begins to propagate on each layer. We also observed a fairly dense wave field in the uppermost layer, magma chamber and pipe. At t = 3, 5, and 8 s, a relatively dense wave field is still quite visible at the layer 1, magma chamber and pipe compared to the layer 3 which is concentrated around the top, as well as the right and left of the slope. At t = 10 s, only at the layer 1, magma chamber and pipe, the wave field is still quite visible despite its small amplitude.

Performance of Simulation
We performed the simulation using three different schemes. The first scheme is a Serial simulation (1 CPU, 1 Process). This scheme requires 1 hour, 25 minutes and 31 seconds of the wall-clock time. The second scheme is an MPI only simulation (48 CPU, 48 Processes). This scheme requires 10 minutes and 50 seconds. The last scheme is MPI and CUDA simulation (4 GPU, 4 Processes). This scheme requires only 2 minutes and 8 seconds. By using MPI only implementation, we can speed up the computations by a factor of about 7 compared to Serial implementation. By using MPI and CUDA, we can speed up

Conclusions
We presented an implementation of the spectral-element method for simulation of two-dimensional elastic wave propagation in fully heterogeneous media. Based upon this implementation, a realistic geological feature including surface topography, a complex subsurface interface and 2-D wave-speed heterogeneity can be easily accommodated.
By using MPI only implementation, we can speed up the computation by a factor of about 7 compared to Serial implementation. By using MPI and CUDA implementation, we can speed up the computation by a factor of about 5 compared to MPI only, and by a factor of about 40 compared to Serial implementation.