Efficient scheduling of scientific workflows in a high performance computing cluster

ABSTRACT
The scientific computing community, and academia especially, clearly needs technology to handle and organize the 1-100+ terabyte datasets produced by computer simulations and scientific instruments. In this paper we briefly describe GrayWulf, an exemplar cluster for data-intensive applications built on SQL Server and HPC clusters. One of GrayWulf's key software components is Trident, a scientific workflow workbench that automatically schedules workflows across the cluster. We examine the challenges of scheduling workflows on GrayWulf, present algorithms to improve performance, and report early results from applying Trident to schedule data-loading workflows on GrayWulf for an actual e-Science project.
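The core idea the abstract names, placing workflow tasks on cluster nodes while accounting for where their input data already resides, can be illustrated with a minimal sketch. This is our own greedy data-locality heuristic for illustration only; the function names, tuple layout, and placement rule are assumptions, not Trident's actual algorithm.

```python
def schedule(tasks, nodes, data_location):
    """Greedily place workflow tasks on cluster nodes.

    tasks: list of (task_id, input_dataset, cost) tuples.
    nodes: list of node names.
    data_location: dict mapping dataset name -> node that stores it.
    Returns a dict mapping task_id -> assigned node.
    """
    load = {n: 0.0 for n in nodes}
    placement = {}
    # Place expensive tasks first so cheap ones can fill load gaps.
    for task_id, dataset, cost in sorted(tasks, key=lambda t: -t[2]):
        local = data_location.get(dataset)
        # Prefer the node already holding the input data (no transfer),
        # unless that node is badly overloaded relative to the rest.
        if local is not None and load[local] <= min(load.values()) + cost:
            target = local
        else:
            target = min(nodes, key=lambda n: load[n])
        load[target] += cost
        placement[task_id] = target
    return placement
```

For example, with two nodes where `n1` holds dataset `d1` and `n2` holds `d2`, tasks reading `d1` land on `n1` until its load outweighs the locality benefit, after which work spills to the less-loaded node.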