EFFECTIVE EXPECTATION MAXIMIZATION ALGORITHM IMPLEMENTATION USING MULTICORE COMPUTER SYSTEMS

The paper considers the expectation maximization algorithm, which is widely used in modern data processing systems to solve various problems, including optimization and parameter estimation. The goal of the study was to reduce the execution time of the algorithm. The execution rate of the EM algorithm was improved by exploiting the multicore architecture of modern computer systems. Modifications aimed at better parallelism were proposed for the implementation of the EM algorithm. The efficiency of the software implementation was tested on the classic problem of separating a mixture of Gaussian random variables. It is shown that in the mixture separation problem EM algorithm performance degrades when the distance between the mean values of the distributions is less than three standard deviations, fully in the spirit of the three sigma law. In such cases it is especially important to have an efficient EM algorithm implementation in order to process such test cases in a reasonable time.


Introduction
Rapid developments in modern information and computer technologies require new methods and techniques for data processing, the implementation of data mining systems for processing large databases, and the solution of many other practical problems. Especially important and useful today are new methods for solving optimization problems, as well as problems of mathematical modeling, image and speech recognition, forecasting, and control. These procedures usually form the kernel of data processing units in modern informational decision support systems (IDSS) [3,7].
The expectation maximization algorithm (EM algorithm) is widely used in mathematical and applied statistics, optimization theory, and their numerous applications for computing unknown model parameters, imputing lost measurements, finding minima and maxima of various functions, etc. One of its applications is maximum likelihood estimation of unknown parameters of probabilistic models, including cases where some variables cannot be measured directly.
The algorithm works iteratively in two steps. At the E-step (expectation step), the expected value of the likelihood function is computed using the current approximation of the non-measurable variables. At the M-step (maximization step), the model parameter estimates that maximize the expected likelihood obtained at the E-step are computed. The EM algorithm is also often used for data clustering, machine learning, computer vision, and natural language processing (a well-known special case is the Baum-Welch algorithm). Because it can operate under missing data, the EM algorithm provides a very useful instrument for portfolio risk estimation in the analysis of financial data [1]. Other known applications include medical image processing, for example positron emission tomography and single-photon emission computed tomography.
We use the EM algorithm to find maximum likelihood parameters of a statistical model in cases where the equations cannot be solved directly. Typically these models involve latent variables in addition to unknown parameters and known data observations: either there are missing values among the data, or the model can be formulated more simply by assuming the existence of additional unobserved data points.

Problem statement
A mixture separation problem can be defined as follows. Given a set of N points in a D-dimensional space, assumed to be drawn from a mixture of K Gaussian distributions, estimate the parameters of the mixture components (their weights, means, and covariances) and the membership probability of each point.
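The formal statement of the model did not survive extraction; a standard formulation of the Gaussian mixture model, consistent with the experiments described later, can be sketched as

p(x_n) = \sum_{k=1}^{K} w_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k), \qquad \sum_{k=1}^{K} w_k = 1, \quad w_k \ge 0,

where the task is to find the parameters \{w_k, \mu_k, \Sigma_k\}_{k=1}^{K} that maximize the log-likelihood \sum_{n=1}^{N} \ln p(x_n).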

Problem solution
As shown in [5], this problem has the following solution:
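The solution equations were lost in extraction; the standard maximum-likelihood conditions for a Gaussian mixture, written with the membership probabilities p(k|n) used in the next section, have the form

p(k|n) = \frac{w_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} w_j \, \mathcal{N}(x_n \mid \mu_j, \Sigma_j)},

w_k = \frac{1}{N} \sum_{n=1}^{N} p(k|n), \qquad \mu_k = \frac{\sum_{n} p(k|n) \, x_n}{\sum_{n} p(k|n)}, \qquad \Sigma_k = \frac{\sum_{n} p(k|n) \, (x_n - \mu_k)(x_n - \mu_k)^{\mathsf{T}}}{\sum_{n} p(k|n)}.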

Iterative computing procedure
These equations are strongly coupled, because the membership probabilities p(k|n) on the right-hand sides of both equations depend on all the variables on the left-hand sides. This makes it hard to solve the equations directly. Nevertheless, the EM algorithm allows us to build an iterative procedure for solving this system [8]. The EM algorithm is an iterative process with the following steps: 1) E-step: estimate the conditional membership probabilities p(k|n) from the current parameter values; 2) M-step: recompute the parameter estimates from these probabilities. The computation of the sum of current membership probabilities for each point, the conditional membership probability loop, the expectation estimation loop, and the variance estimation loop were parallelized using OpenMP technology. The implementation was written in C++ with OpenMP, so the efficiency of the algorithm can be tested experimentally.
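One parallelized iteration of this kind can be sketched as follows. This is a minimal illustrative implementation for the one-dimensional two-component case, not the paper's actual code; the struct and function names are assumptions.

```cpp
// A minimal sketch (not the paper's actual code) of one EM iteration for a
// 1-D mixture of two Gaussians, with the per-point loops parallelized via
// OpenMP. Struct and function names are illustrative.
#include <cassert>   // used by the accompanying usage test
#include <cmath>
#include <vector>
#include <omp.h>

struct Model { double w[2], m[2], s2[2]; };  // weights, means, variances

static double gauss(double x, double m, double s2) {
    const double PI = 3.14159265358979323846;
    return std::exp(-(x - m) * (x - m) / (2.0 * s2)) / std::sqrt(2.0 * PI * s2);
}

void emStep(const std::vector<double>& x, Model& md) {
    const long n = (long)x.size();
    double sw0 = 0, sw1 = 0, sm0 = 0, sm1 = 0, sv0 = 0, sv1 = 0;
    // E-step: responsibilities p(k|n), accumulated into the M-step sums.
    #pragma omp parallel for reduction(+:sw0,sw1,sm0,sm1,sv0,sv1)
    for (long i = 0; i < n; ++i) {
        double p0 = md.w[0] * gauss(x[i], md.m[0], md.s2[0]);
        double p1 = md.w[1] * gauss(x[i], md.m[1], md.s2[1]);
        double t = p0 + p1;
        p0 /= t; p1 /= t;                         // normalized p(k|n)
        sw0 += p0;               sw1 += p1;
        sm0 += p0 * x[i];        sm1 += p1 * x[i];
        sv0 += p0 * x[i] * x[i]; sv1 += p1 * x[i] * x[i];
    }
    // M-step: closed-form parameter updates from the accumulated sums.
    md.m[0]  = sm0 / sw0;              md.m[1]  = sm1 / sw1;
    md.s2[0] = sv0 / sw0 - md.m[0] * md.m[0];
    md.s2[1] = sv1 / sw1 - md.m[1] * md.m[1];
    md.w[0]  = sw0 / n;                md.w[1]  = sw1 / n;
}
```

The reduction clause combines the per-thread partial sums, which corresponds to the parallelized summation loops described above; the M-step itself is cheap and runs serially.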
To validate correctness and measure performance, we used two simple metrics for all tests: the precision of the estimated parameters, and Time, the wall-clock time of the program execution.
All tests were executed on a PC with an Intel Core i7-2600K CPU. This CPU has 4 cores with Hyper-Threading technology, which makes it possible to execute 8 parallel threads.

Fig. 1. EM algorithm running time for single thread implementation
As one can see, the EM algorithm keeps good precision even when the distance between the means is less than the standard deviation. On the other hand, the algorithm running time increases exponentially in such cases (Tab. 1, Fig. 1). Now consider the range where the ratio of the distance between the means to the standard deviation is less than four. As shown in Tab. 2 and Fig. 2, algorithm performance degrades when the distance between the mean values is less than three standard deviations, which is easily explained by the three sigma rule. Nevertheless, the precision of the algorithm degrades slowly, while the running time increases exponentially (Fig. 3). The algorithm works in a single thread.

Multithreaded algorithm version
The multithreaded version of the algorithm was implemented using OpenMP technology [9,10]. The same tests were used for the multithreaded version; the only difference is that the threads used (Threads#) column replaces the distribution parameters (a, s) column.

Separating a mixture of two Gaussians with equal variance and different means which are far apart
We first separate the mixture of two Gaussians with equal variance σ.

Effectiveness of parallelization
All tests in this section used easily separable mixtures as data, meaning that the distances between the mean values were significantly greater than the standard deviation. We used the mixture of two Gaussians with equal variances σ1 = σ2 = 1 and different means m1 = -m2 = 100; the sample size is 100 000 000 (Fig. 5).
Parallelization effectiveness [1] is calculated as the ratio of the single-threaded program running time to the multithreaded running time multiplied by the number of threads: E = T1 / (p·Tp). For higher-quality time measurements, all calculations were performed 10 times and then averaged.
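The effectiveness computation itself is straightforward; the following is a minimal sketch of the averaged-time version of the metric (helper names are illustrative, not from the paper).

```cpp
// Minimal sketch of the effectiveness metric from the text: E = T1 / (p * Tp),
// with running times averaged over repeated measurements. Names are illustrative.
#include <cassert>  // used by the accompanying usage test
#include <vector>

// Average of repeated time measurements (the text averages 10 runs).
double averagedTime(const std::vector<double>& times) {
    double s = 0.0;
    for (double t : times) s += t;
    return s / (double)times.size();
}

// Parallelization effectiveness: T1 on one core versus Tp on p cores.
double effectiveness(double t1, int p, double tp) {
    return t1 / ((double)p * tp);
}
```

With a single thread E is 1 by definition; values below 1 quantify how far the speedup falls short of linear.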
As shown in Fig. 5, EM algorithm parallelization effectiveness drops as the number of threads increases. This is caused by the fact that the EM algorithm uses a large amount of memory, which impacts performance when the number of cores used is greater than the number of memory channels.

Conclusions
The problem of separating mixtures of random variables was considered using the expectation maximization algorithm running on fast processors. Running time estimates were obtained and presented as a result of computing experiments. As an example, the quality of Gaussian mixture separation was measured for different sizes of the mixture samples.
It is shown that in the mixture separation problem the performance of the constructed EM algorithm degrades when the distance between the mean values of the mixture clusters is less than three standard deviations, fully in the spirit of the three sigma law. In such cases it is very important to have an efficient EM algorithm implementation in order to process such test cases in a reasonable time. To measure the effectiveness of the parallelization of the C++ program we used the expression E = T1 / (p·Tp), where E is the effectiveness, p is the number of used cores, and T1, Tp are the program running times on one and p cores respectively. As the experiments show, EM algorithm parallelization effectiveness is almost linear (E = 0.9) when the number of cores is not greater than the number of memory channels, and degrades as the number of cores increases.
It is reasonable to direct future studies toward the implementation of the selected EM algorithms on other types of modern computing systems, such as popular and inexpensive graphics processors.