Efficient scheduling of scientific workflows in a high performance computing cluster

ABSTRACT
The scientific computing community, and academia especially, clearly needs technology to handle and organize the 1-100+ terabyte datasets produced by computer simulations and scientific instruments. In this paper we briefly describe GrayWulf, an exemplar cluster for data-intensive applications built on SQL Server and HPC clusters. One of GrayWulf's key software components is Trident, a scientific workflow workbench that automatically schedules workflows across the cluster. We examine the challenges of scheduling workflows on GrayWulf, present algorithms to improve performance, and report early results from applying Trident to schedule data-loading workflows on GrayWulf for an actual e-Science project.
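The core idea the abstract names, placing workflow tasks on cluster nodes while accounting for where their input data already resides, can be illustrated with a minimal sketch. This is our own greedy data-locality heuristic for illustration only; the function names, tuple layout, and placement rule are assumptions, not Trident's actual algorithm.

```python
def schedule(tasks, nodes, data_location):
    """Greedily place workflow tasks on cluster nodes.

    tasks: list of (task_id, input_dataset, cost) tuples.
    nodes: list of node names.
    data_location: dict mapping dataset name -> node that stores it.
    Returns a dict mapping task_id -> assigned node.
    """
    load = {n: 0.0 for n in nodes}
    placement = {}
    # Place expensive tasks first so cheap ones can fill load gaps.
    for task_id, dataset, cost in sorted(tasks, key=lambda t: -t[2]):
        local = data_location.get(dataset)
        # Prefer the node already holding the input data (no transfer),
        # unless that node is badly overloaded relative to the rest.
        if local is not None and load[local] <= min(load.values()) + cost:
            target = local
        else:
            target = min(nodes, key=lambda n: load[n])
        load[target] += cost
        placement[task_id] = target
    return placement
```

For example, with two nodes where `n1` holds dataset `d1` and `n2` holds `d2`, tasks reading `d1` land on `n1` until its load outweighs the locality benefit, after which work spills to the less-loaded node.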