Embarrassingly Parallel GPU Based Matrix Inversion Algorithm for Big Climate Data Assimilation

M. Varalakshmi, Amit Parashuram Kesarkar, Daphne Lopez
Copyright: © 2018 |Volume: 10 |Issue: 1 |Pages: 22
ISSN: 1938-0259|EISSN: 1938-0267|EISBN13: 9781522543367|DOI: 10.4018/IJGHPC.2018010105
Cite Article

MLA

Varalakshmi, M., et al. "Embarrassingly Parallel GPU Based Matrix Inversion Algorithm for Big Climate Data Assimilation." IJGHPC, vol. 10, no. 1, 2018, pp. 71-92. http://doi.org/10.4018/IJGHPC.2018010105

APA

Varalakshmi, M., Kesarkar, A. P., & Lopez, D. (2018). Embarrassingly Parallel GPU Based Matrix Inversion Algorithm for Big Climate Data Assimilation. International Journal of Grid and High Performance Computing (IJGHPC), 10(1), 71-92. http://doi.org/10.4018/IJGHPC.2018010105

Chicago

Varalakshmi, M., Amit Parashuram Kesarkar, and Daphne Lopez. "Embarrassingly Parallel GPU Based Matrix Inversion Algorithm for Big Climate Data Assimilation." International Journal of Grid and High Performance Computing (IJGHPC) 10, no. 1 (2018): 71-92. http://doi.org/10.4018/IJGHPC.2018010105

Abstract

Attempts to harness the big climate data that come from high-resolution model output and advanced sensors, in order to provide more accurate and rapidly updated weather prediction, call for innovations in existing data assimilation systems. Matrix inversion is a key operation in the majority of data assimilation techniques. Hence, this article presents an out-of-core CUDA implementation of an iterative method of matrix inversion. The results show significant speedup for square matrices of size 1024 × 1024 and larger, without sacrificing the accuracy of the results. In a similar test environment, the approach is compared with a direct method, a Gauss-Jordan implementation modified to process matrices too large to be handled within a single kernel call, and is found to be twice as efficient. This acceleration is attributed to the division-free design and the embarrassingly parallel nature of every sub-task of the algorithm. The parallel algorithm is designed to scale well when implemented across multiple GPUs for handling large matrices.
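The abstract does not spell out which iterative scheme is used, but a well-known division-free iterative inversion method with exactly the embarrassingly parallel character described is the Newton–Schulz iteration, X_{k+1} = X_k (2I − A X_k), in which every update consists only of matrix multiplications whose output entries can be computed independently. The sketch below (plain NumPy, not the authors' CUDA code; the function name and iteration count are illustrative assumptions) shows the idea on a small matrix:

```python
import numpy as np

def newton_schulz_inverse(A, iters=50):
    """Approximate A^{-1} by the division-free Newton-Schulz iteration.

    Illustrative sketch only; the paper's actual out-of-core CUDA
    algorithm may use a different iterative scheme.
    """
    # Standard convergent starting guess: X0 = A^T / (||A||_1 * ||A||_inf)
    X = A.T / (np.linalg.norm(A, 1) * np.linalg.norm(A, np.inf))
    I = np.eye(A.shape[0])
    for _ in range(iters):
        # Each update is pure matrix multiplication: every output entry
        # is an independent dot product, hence embarrassingly parallel
        # and free of divisions (unlike Gauss-Jordan pivoting).
        X = X @ (2 * I - A @ X)
    return X
```

On a GPU, each of these matrix products maps naturally onto a tiled multiplication kernel, and for matrices too large for device memory the products can be formed block by block, which is consistent with the out-of-core design the abstract describes.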
