As HPC systems gradually evolve into the exascale era, algorithms and applications face challenges from architectural changes, significantly increased levels of parallelism and heterogeneity, and new domain requirements from big data analytics and machine learning. This issue focuses on novel ideas, methods, and software development efforts that address these challenges and fill the gap between applications and hardware systems in the coming exascale era.

Nine invited papers were selected for this special issue through a peer-review procedure. Together they cover a number of different aspects of the algorithm and application challenges mentioned above.

The first part of the special issue focuses on algorithmic improvements for sparse matrix solvers, which have become a major design challenge given the continuously increasing compute density of recent high-end hardware platforms. Four papers discuss algorithmic innovations in different aspects of parallel sparse matrix solvers.

  • The paper written by Takeshi Iwashita et al. tackles the ordering part and presents hierarchical block multi-color ordering, a new parallel ordering method for vectorizing and parallelizing sparse triangular solvers. The method demonstrates better performance than the conventional block and nodal multi-color ordering methods in 18 out of 21 test cases.

  • Dr. Xiaowen Xu and his colleagues investigate the coarse-level construction in the setup phase of AMG (algebraic multigrid) solvers, which is usually a limiting factor for parallel scalability. Their paper proposes α Setup-AMG, an adaptive-setup-based AMG solver for sequences of sparse matrices, and demonstrates significantly improved efficiency in a radiation hydrodynamics simulation on thousands of cores.

  • The paper written by Dr. Yu Li et al. provides a parallel generalized conjugate gradient method for large-scale eigenvalue problems. Besides introducing techniques for orthogonalization and for solving Rayleigh–Ritz problems, the authors also provide an open-source package, GCGE-1.0, which can improve the stability, efficiency and scalability of the eigenvalue solver.

  • The last paper in this part focuses on the new challenges brought by heterogeneous HPC platforms. Dr. Xin He and his colleagues propose a communication-avoiding variant of the BiCGStab iterative method that reduces the number of global synchronizations per iteration to one third of that in the original method. A communication-overlapped implementation of the sparse matrix–vector multiplication is also provided to hide the data transfer between the host and the accelerators.

The second part of the special issue, consisting of three research papers, focuses on tool support for HPC applications. With almost all applications facing the dramatic architectural changes of the upcoming exascale supercomputers, much current software research has moved toward tools that facilitate such a transition.

  • The paper written by Dr. Nan Ding et al. demonstrates an efficient tool for automatic hardware counter-based performance modeling of parallel applications. Experiments on a number of climate simulation models show performance analysis that is both accurate (average error rate around 15%) and efficient (average performance overhead of 3%) in different scenarios.

  • The second paper, written by Dr. Peng Zheng et al., presents parallel and automatic isotropic tetrahedral mesh generation for misaligned assemblies. Starting from thousands of geometry components, the proposed method can produce hundreds of millions of consistent mesh elements suitable for high-performance numerical simulation, as demonstrated on a giant dam model and an integrated circuit board model.

  • The third paper, written by Xiaohui Duan et al., targets the memory bottleneck of simulation applications on Sunway TaihuLight. A general-purpose software cache library is proposed for a wide range of applications, especially constantly changing molecular dynamics applications and simulation cases with dynamic unstructured grids.

The third part is a research paper from Prof. Jun Makino’s group, which is well known for its decades of continuous effort on particle simulation and for the Gordon Bell Prizes it has won over the years. The paper provides an overview of the current status and future development directions of FDPS (Framework for Developing Particle Simulator).

The last part of the special issue is a survey paper on reconfigurable computing for HPC applications. Written by Dr. Lin Gan and his colleagues at Tsinghua University and Prof. Wayne Luk from Imperial College London, this paper offers a perspective that differs from conventional HPC hardware and software approaches, and demonstrates the ideas of data-flow oriented parallel processing and customized hardware architectures for specific applications.

We would like to take this chance to thank all the authors and the reviewers for their splendid contributions to this special issue of CCF THPC. Only with their great efforts were we able to put together the nine papers, which discuss different topics and present different ideas that help to bridge HPC algorithms and applications with the underlying hardware platforms.