Conquering Big Data Through the Usage of the Wrangler Supercomputer

Salazar, Jorge

doi:10.1007/978-3-319-33742-5_16


Abstract

Data-intensive computing brings a set of challenges that only partly overlap with those met by typical, and even state-of-the-art, High Performance Computing (HPC) systems. Working with 'big data' can involve analyzing thousands of files that need to be rapidly opened, examined and cross-correlated, tasks that classic HPC systems might not be designed to do. Such tasks can be conducted efficiently on a data-intensive supercomputer like the Wrangler supercomputer at the Texas Advanced Computing Center (TACC). Wrangler allows scientists to share and analyze, in a user-friendly manner, the massive collections of data being produced in nearly every field of research today. It was designed to work closely with the Stampede supercomputer, the HPC flagship of TACC, which TOP500 ranks as the tenth most powerful supercomputer in the world. Wrangler was designed to keep much of what was successful with systems like Stampede, but also to introduce new features such as a very large flash storage system, a very large distributed spinning disk storage system, and high speed network access. This gives users whose data analysis needs were not being met by traditional HPC systems like Stampede a new way to access HPC resources. In this chapter, we provide an overview of the Wrangler data-intensive HPC system along with some of the big data use-cases that it enables.




Acknowledgement

We are grateful to the Texas Advanced Computing Center, the National Science Foundation, the Extreme Science and Engineering Discovery Environment, Niall Gaffney (Texas Advanced Computing Center), Rebecca Young (University of Texas at Austin), Denne Reed (University of Texas at Austin), Steven Finkelstein (University of Texas at Austin), and Joshua New (Oak Ridge National Laboratory).

Author information

Authors and Affiliations

Science and Technology Writer, Texas Advanced Computing Center, University of Texas at Austin, Austin, TX, USA
Jorge Salazar


Corresponding author

Correspondence to Jorge Salazar.

Editor information

Editors and Affiliations

Texas Advanced Computing Center, Austin, Texas, USA
Ritu Arora


About this chapter

Cite this chapter

Salazar, J. (2016). Conquering Big Data Through the Usage of the Wrangler Supercomputer. In: Arora, R. (eds) Conquering Big Data with High Performance Computing. Springer, Cham. https://doi.org/10.1007/978-3-319-33742-5_16


DOI: https://doi.org/10.1007/978-3-319-33742-5_16
Published: 17 September 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-33740-1
Online ISBN: 978-3-319-33742-5
eBook Packages: Computer Science, Computer Science (R0)
