Primal Domain Decomposition Method with Direct and Iterative Solver for Circuit-Field-Torque Coupled Parallel Finite Element Method to Electric Machine Modelling

. The analysis and design of electromechanical devices involve the solution of large sparse linear systems, and require therefore high performance algorithms. In this paper, the primal Domain Decomposition Method (DDM) with parallel forward-backward and with parallel Preconditioned Conjugate Gradient (PCG) solvers are introduced in two-dimensional parallel time-stepping ﬁnite element formulation to analyze rotating machine considering the electromagnetic ﬁeld, external circuit and rotor movement. The proposed parallel direct and the iterative solver with two preconditioners are analyzed concerning its computational eﬃciency and number of iterations of the solver with diﬀerent preconditioners. Simulation results of a rotating machine is also presented.


Introduction
The numerical field calculation of electromechanical devices is a very complex task, because a lot of different physical aspects should be considered for the appropriate modelling.The performances of electrical equipments are not defined only by their electromagnetic field, because the electromagnetic field has strong interaction between the following quantities: electromagnetic field distribution, mechanical oscillation equa-tion, external circuits, etc.The electric machines are the most obvious examples.
The Finite Element Method (FEM) [1], [2] is a wellknown technique for the solution of a wide range of problems in science and engineering.However, a few years back, the simulation of complex structures considering multiple aspects in a same set of equations was restrictive, due to the unavailability of enough computer capabilities for data processing.However, nowadays, thanks to the improvements in the computer architecture, the analysis of complex electromagnetic systems is more affordable.But, the analysis of complex systems, e.g.rotating electrical machine analysis considering movement and voltage drive source require a computing effort to solve large sparse linear systems.These large linear systems arise from the discretization with the help of finite element method.The solution of these large equation systems are very resource-intensive and time consuming, wherein the resources and time of the calculation plays an important role for designers and researchers.Therefore, the solution of a complex system should be parallelised in order to speed-up the numerical computations with less computer requirement.
In this paper, we propose to solve a two-dimensional parallel time-stepping finite element problem using primal Domain Decomposition Method [3], [4], [5], [6], [7].The used primal DDM is also called the static condensation method, the method of sub-structuring or the Schur complement method [3].The direct solver is the parallel forward-backward method with parallel factorisation [4], [5].The iterative solver is a parallel Krylov method, the parallel preconditioned conjugate gradient method [3], [6], which is currently one of the most popular method for systems with real symmetric positive definite matrices.Two preconditioners, Jacobi and Neumann-Neumann preconditioner [7] used in the solver algorithm to improve the convergence behaviour.We present the numerical behaviour of the parallel direct solver and the parallel PCG solver with preconditioners for the modelling of electrical machine with direct coupled field-circuit formulations [2].
The paper is organized as follows.The next section briefly describes the used equations and methods to give an introduction to the formulation of the parallel time-stepping finite element method coupled with circuit and mechanical oscillation equations.The third section describes the Schur complement method and how this method, and its direct and iterative solver algorithms can be used to formulate and solve a coupled problem.Section 4. collect numerical results to illustrate the potential of the method, an induction machine with different mesh size are then presented.Finally, some extensions of the method are discussed.

Field-Circuit Coupling Finite Element Formulation
The electrical machine is modelled in two-dimensional space, using the FEM to discretize the domain, which is based on the weak formulation of the partial differential equations, which can be obtained by Maxwell's equations and the weighted residual method [1].The magnetic vector potential formulation has been applied, and the temporal derivatives are discretized by the backward Euler's scheme [2].The field and external circuit equations are combined together using the direct coupling method [2], [6].Equation (1) shows the matrix system of the field equations [2], [6]: where A is the vector of magnetic vector potential, I is the vector of currents in the windings, U is the vector of voltages at the terminal of the winding, S is the matrix related to permeability, N is the matrix related to electric conductivity, P is the matrix associated with constant coil current, Q is the matrix associated with flux linkage, R is the matrix of d.c.resistance of windings, L is the matrix of the end-windings inductances.
In order to simulate the rotation of the rotor in the two-dimensional case, we used one of the most common method, the so called sliding surface technique with first order nodal interpolation method [8].The interpolation method is necessary, when the fixed (stator) and mobile (rotor) part of the mesh are non-conforming because of variation of angular speed.The new angu-lar speed and rotor displacement are evaluated by the mechanical oscillation equation [6]: where J r is the rotor inertia moment, D r is the friction damping coefficient, T e is the electromagnetic torque, T L is the load torque acting on the mechanical axis, ω r is the rotor speed, and α r is the rotor angular position.At each time step, the electromagnetic torque is calculated via the Maxwell's stress tensor [2], [6] by the following relationship: where L is the axial length of the domain, and r is the position vector linking the rotation axis to the element dΓ, and Γ is a surface, which is placed in the middle of the air gap, B is the magnetic flux density, µ 0 is the permeability of vacuum, and n is the normal unit vector to the surface.

Primal Domain Decomposition Method
When domain decomposition method is used, the problem domain Ω is to divide into several sub-domains in which the unknown magnetic vector potentials and currents can be calculated simultaneously, i.e. in a parallel way.The general form of a linear algebraic problem arising from the discretization of a parabolic type problems defined on the domain Ω can be written as Ka = b [3], [5], in more detail: where K ∈ R (n×n) is the mass matrix, b ∈ R (n×1) on the right hand side of the equations, and a ∈ R (n×1) contains the unknowns.Here n is a number of unknowns.
After the problem is partitioned into a set of N S disconnected sub-domains, as it can be seen in Fig. 1 and the linear sparse system, Ka = b has been split into N S particular blocks [3], [4], [5], [6].For each sub-domains, the nodes are partitioned into interior nodes designated by the subscript i, and interface boundary nodes designated by the subscript Γ.If the interior nodes are numbered first and the interface boundary nodes are numbered last, then the sumdomain equation system can be written in the following matrix form [3], [6].
where j = 1 • • • N S , K jj is the positive definite submass matrix of the j th sub-domain, b j is the vector of the right hand side defined inside the sub-domain.
The sub-matrix K jΓ = K T Γj contains the coefficients of j th sub-domain, which connect to the interface boundary unknowns of that region.The superscript T denotes the transpose.K ΓΓ , and b Γ expresses the coupling of the interface unknowns.It should be noted, that it is much easier in the parallel computation, if the sliding surface is used as an interface boundary in the air gap, as it can seen in Fig. 1.Each sub-domain will be allocated to an independent processor core, because the sub-matrix K jj with the K jΓ , K Γj and the right-hand side b j are independent, i.e. they can be assembled in parallel on distributed memory.Only the matrix K ΓΓ , and the vector b Γ are not independent.The matrix K ΓΓ and the vector b Γ are assembled after interprocess data transfer, because they are the assembly of K ΓΓ and b Γ .K ΓΓ is called the local Schur complement, and K ΓΓ is the mass matrix of the reduced system, the Schur complement of the problem.
The assembly and the solution of the sub-matrices can be performed parallel by independent processors.However, the solution requires exchange of interface values, a Γ between the processes in charge of the various sub-domains.In many practical applications, the forward-backward method with LU factorisation, or the preconditioned conjugate gradient method is used because of its simplicity and efficiency.
The practical implementation of the parallel forward-backward substitution with parallel factorisation can be found in [4] and [5].
The parallel implementation of the preconditioned conjugate gradient method can be found in [3] and [6].
In this case, two preconditioners have been used, the Jacobi preconditioner [3], [6] and the Neumann-Neumann preconditioner [7].The Jacobi preconditioner is one of the simplest forms of preconditioning, in which the preconditioner is chosen to be the diagonal of the matrix.The Neumann-Neumann preconditioner is defined by approximating the inverse of the sum of local Schur complement matrices by the weighted sum of the inverses [7].
To illustrate how the above mentioned domain decomposition method with parallel solvers are implemented into the field-circuit coupled finite element method.

Application
In this section, to demonstrate the operation of the presented methods, a 4-pole 3-phase 3 kW induction motor with un-skewed rotor slot fed by sinusoidal voltage is tested.The test problem and parameters, and the GMSH model is from the free GetDP models it can find in [9], [10].The studied domain consists of one pole of the machine, i.e. a 45 • domain, as you can see in Fig. 2.
Anti-periodic boundary conditions are used to represent the whole problem.In this simulation, 20 periods have been calculated, and a period of the 50 Hz voltage excitation has been divided into 300 time steps.
Numerical experiments have been performed on platform composed of four CPUs Intel Xeon L5420.Each CPU is a Dual-core processor running at 2.5 GHz.It supplies 8 × 4 GB RAM memory.The benchmark presented in this paper consists in performing 15 times the same operation in order to overcome the problem caused by the finite precision of the clock.The implemented program has been developed under the MAT-LAB [11] computing environment in C language and in own scripting language of the MATLAB [11].
We compare the implemented method for different mesh size.Table 1 contains the data about the partitioned finite element mesh in various global element size factors.In the Tab. 1, the number of degree of freedom of the unpartitioned problem, GDoF, the number of degree of freedom of one sub-domain, DoF, the number of interface unknown, CDoF are summarized.In order to use the same stop criterion for the methods, ε = 10 −8 .The speedup has been calculated by the following formula, where T ime 1 is the running time in the case of the smallest element size factor, and T ime n is the running time of the different size factor [6].The efficiency has been calculated by the following formula: where Speedup sf is the speedup in the case of the different element size factor, and n is the applied processor cores [6].
The performance results of the parallel program are reported in Fig. 3, Fig. 4, Fig. 5, Fig. 6, Fig. 7 and Fig. 8 for all element size factor.The speedups (Fig. 3, Fig. 4 and Fig. 5) are computed using the wall clock time of smallest problem (2900 GDoF) as the reference point.The results show the speedup as high as 6.6 with direct solver, 9.8 and 10.8 with iterative solver for the Jacobi preconditioner, and the Neumann-Neumann preconditioner, respectively.In both iterative case, the speedup is continuously increase until the 0.004 element size factor, because the time of the interprocess communication is relatively smaller, than the time of the parallel PCG.This is also true for the direct solver until 0.006 element size factor.However, this is not true for the largest test cases.When the sub-problems are too big, and the operations of both parallel solvers are very time consuming.Further the memory requirement of the program is also very high in this case.This conclusions are also supported by the figures of efficiency, as it can be show in Fig. 6 in the case of direct solver, Fig. 7 in the case of Jacobi preconditioner, and in Fig. 8   The interprocess communication hardly depend on the number of interface unknowns (CDoF in Tab. 1) and the number of applied processor cores.However, the number of iteration shows the robustness of the presented algorithm, because the curves are continuously increase, so the solver more or less independent from the number of degree of freedom and the number of interface unknown.13 show the simulation results of the induction machine.These figures show the first ten periods of the simulations.Figure 11 shows the transient speed waveform.Figure 12 shows the transient torque waveform of the machine.Figure 13 shows the transient current waveforms of the stator windings.

Conclusion
In this paper, a two-dimensional parallel finite element modelling of an induction machine have been presented taking rotor movement and field-circuit equation of the windings into account.To study the operation of the implemented method, different global finite element size factor have been considered.Results of numerical experiments on all mesh sizes compared.
The parallel direct and the parallel PCG solver with the preconditioners are work properly, as the presented results show.Furthermore, the results obtained for the simulation of the induction machine have also been presented.
The numerical experiments show the work of the implemented program is hardly depend on the size of the problem.If the problem size is too large, the efficiency of the computation is decreased, so the running performance of the implemented program is depend on the size of the problem.However, it can be concluded based on the presented results, the parallel direct solver is more efficient for the smaller problems, whereas the iterative solver is more useful for larger problem.
It should be noted, that only a two-dimensional benchmark has been used for the numerical tests.The tests with more complex three-dimensional problems will be the subject of a forthcoming work.

Fig. 1 :
Fig. 1: Domain decomposition of the finite element mesh of the induction motor.

Fig. 2 :
Fig. 2: The assembled results, the equipotential lines of magnetic vector potential and the magnetic flux density vectors.

Figure 9 and
Figure9and Fig.10show the running performance of the preconditioned conjugate gradient solver, the number of iteration versus global element size

Figure 11 ,
Figure 11, Fig.12and Fig.13show the simulation results of the induction machine.These figures show the first ten periods of the simulations.Figure11shows the transient speed waveform.Figure12shows the transient torque waveform of the machine.Figure13shows the transient current waveforms of the stator windings.

Fig. 13 :
Fig. 13: The time variation of the three phase currents.
Tab. 1: Data of the different used finite element meshes.