ABSTRACT
The World Wide Web has been evolving towards processing extra-large data volumes, such as those collected by the Linked Life Data or Open PHACTS repositories, which are capable of hosting billions of information entities (e.g., RDF triples used in the Semantic Web) and beyond. In view of explosive data growth, along with demanding QoS requirements on scalability and processing time, the Web is expected to dominate data-centric computing within the next decade. On the other hand, most current HPC infrastructures, both academic and industrial, do not support parallel Web applications, e.g., those developed in the Hadoop framework, due to their service-oriented implementation in the Java programming language, which is (and will surely remain) prevalent in Web programming. In response to the challenges of bringing data-centric supercomputing to the Web, we present a solution that introduces Message Passing Interface (MPI) bindings for Java, seamlessly integrated into one of the most popular current MPI implementations, Open MPI. Our implementation enables Java-based Semantic Web applications to be ported to most modern HPC systems. We also discuss the design features of Open MPI that enable the proliferation of MPI into Java applications. Finally, we present Random Indexing, a pilot Semantic Statistics scenario implemented with MPI, and discuss future work in terms of promising Semantic Web applications, such as Reasoning.
Index Terms
- OmpiJava: a tool for development of high-performance reasoning applications for the semantic web