
Getting Ready for Prime Time

  • Chapter in Pro Spark Streaming

Abstract

Application development is an incremental and continuous process: once an application has been designed, implemented, and deployed, it needs to be constantly monitored and improved. The same applies to real-time pipelines, with two additional variables: scalability and capacity. The volume and velocity of the incoming data may grow, or latency requirements may tighten. As requirements change over time, initial design choices need to be reevaluated, and developers and infrastructure engineers strive to squeeze the last bit of performance out of both the software stack and the hardware. Whatever the trigger, all such efforts require rigorous and extensive instrumentation, from logging and monitoring to alerting and metrics.

Notes

  1. Ellery Wulczyn and Dario Taraborelli, “Wikipedia Clickstream,” Figshare, January 4, 2016, http://figshare.com/articles/Wikipedia_Clickstream/1305770 .

  2. Haoyuan Li et al., “Tachyon: Reliable, Memory Speed Storage for Cluster Computing Frameworks,” Proceedings of SOCC ’14 (ACM, 2014).

  3. Download Tachyon version 0.6.4 and fill in $TACHYON_HOME/conf/tachyon-env.sh with values appropriate for your setup. If you run it on top of the local file system, make sure TACHYON_UNDERFS_ADDRESS points to a local folder, such as /tmp. Like any other file system, Tachyon must be formatted with $TACHYON_HOME/bin/tachyon format before it is started with $TACHYON_HOME/bin/tachyon-start.sh [all|local].

  4. Instead of keeping track of each individual article, the code aggregates all of them under the key “wikipedia”.

  5. The application ID and RDD ID will vary from run to run.

  6. Andrew Or, “Understanding Your Spark Application through Visualization,” Databricks, June 22, 2015, https://databricks.com/blog/2015/06/22/understanding-your-spark-application-through-visualization.html .

  7. You can click Input Rate to see a breakdown per receiver.

    Figure 7-19. Streaming Statistics for an unhealthy application

  8. http://metrics.dropwizard.io/3.1.0/ .

  9. http://graphite.readthedocs.org/en/latest/index.html .

  10. http://grafana.org/ .

  11. www.docker.com/ .

  12. Install Docker on a *nix system by running wget -qO- https://get.docker.com/ | sh. Then point your shell at the Docker daemon: docker-machine env default prints the required environment variables, and eval "$(docker-machine env default)" sets them in the current shell.

  13. https://github.com/SamSaffron/graphite_docker .

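  14. Find out the IP address of the Docker host, where the container's published ports are reachable, via docker-machine ip <machine_name> (for example, docker-machine ip default). Note that docker-machine ip takes a machine name, not a container ID.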

  15. Point your browser at the Grafana dashboard on port 3000, and choose Data Sources from the menu on the far left. Then click Add New, and enter http://localhost:80 as the URL.

  16. https://collectd.org/ .

  17. To set up collectd, download it from https://collectd.org/download.shtml and install it on all machines in your cluster via ./configure && make all install.

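  18. Add LoadPlugin write_graphite to your collectd.conf, which is typically located at /etc/collectd.conf, together with a <Plugin write_graphite> block whose <Node> section sets Host and Port to your Graphite server.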

  19. https://www.nagios.org/ .

  20. Download Nagios Core from https://www.nagios.org/downloads/nagios-core . The installation process is standard: ./configure followed by make all and make install. Refer to the Nagios web site for more information.

  21. https://exchange.nagios.org/directory/Plugins .

  22. https://exchange.nagios.org/directory/Plugins/Clustering-and-High-2DAvailability/check_spark_cluster-2Epl-(Advanced-Nagios-Plugins-Collection)/details .

  23. Clone the plugin collection with: git clone https://github.com/harisekhon/nagios-plugins

  24. http://<nagios_host>/nagios.
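
Note 4 describes collapsing every article onto a single key instead of tracking each article separately. The following is a minimal plain-Python sketch of that difference. It assumes hypothetical (article, clicks) pairs as input and does not use Spark, so it is not the chapter's code. In a Spark Streaming job, the same idea would amount to emitting a constant key before whatever keyed aggregation the job performs.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical input shape: (article_title, click_count) pairs.
Click = Tuple[str, int]


def per_article_totals(clicks: Iterable[Click]) -> Dict[str, int]:
    """Keep one running total per article (state grows with the number of articles)."""
    totals: Counter = Counter()
    for article, n in clicks:
        totals[article] += n
    return dict(totals)


def single_key_totals(clicks: Iterable[Click], key: str = "wikipedia") -> Dict[str, int]:
    """Collapse all articles onto one constant key, as note 4 describes."""
    totals: Counter = Counter()
    for _article, n in clicks:
        totals[key] += n
    return dict(totals)


if __name__ == "__main__":
    sample = [("Main_Page", 3), ("Apache_Spark", 2), ("Main_Page", 1)]  # made-up data
    print(per_article_totals(sample))  # {'Main_Page': 4, 'Apache_Spark': 2}
    print(single_key_totals(sample))   # {'wikipedia': 6}
```

The single-key variant keeps exactly one entry of state no matter how many distinct articles arrive, at the cost of losing the per-article breakdown.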
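
Notes 8 to 18 set up Graphite as the metrics store, with collectd's write_graphite plugin as one of its feeders. Graphite's Carbon daemon accepts a simple plaintext protocol over TCP (port 2003 by default). In this protocol, each metric is one newline-terminated line of the form "<metric path> <value> <timestamp>", with the timestamp in Unix epoch seconds. Below is a minimal Python client sketch for that protocol. The host, the metric name, and the assumption that the Graphite container publishes port 2003 are illustrative and not taken from the chapter.

```python
import socket
import time
from typing import Iterable, Optional, Union

Number = Union[int, float]


def graphite_line(path: str, value: Number, timestamp: Optional[int] = None) -> str:
    """Format one metric as a newline-terminated Carbon plaintext line."""
    if not path or any(ch.isspace() for ch in path):
        # Fields are space-separated, so whitespace in the path would corrupt the line.
        raise ValueError("metric path must be non-empty and contain no whitespace")
    if timestamp is None:
        timestamp = int(time.time())  # Unix epoch seconds
    return f"{path} {value} {int(timestamp)}\n"


def send_lines(lines: Iterable[str], host: str = "localhost", port: int = 2003,
               timeout: float = 5.0) -> None:
    """Push preformatted lines to a Carbon plaintext listener over TCP."""
    payload = "".join(lines).encode("utf-8")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload)


if __name__ == "__main__":
    # Hypothetical metric name and fixed timestamp; only formats, does not send.
    print(graphite_line("spark.streaming.example.records", 42, 1450000000), end="")
```

With a Carbon listener reachable, send_lines([graphite_line("spark.streaming.example.records", 42)], host="<docker_host_ip>") would record a single data point. Here <docker_host_ip> is a placeholder for the address found as in note 14.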
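
Notes 19 to 24 point to Nagios and to ready-made plugins such as check_spark_cluster. A Nagios plugin is simply a program that does two things. It prints a one-line status, optionally followed by "|" and performance data. It then exits with 0 (OK), 1 (WARNING), 2 (CRITICAL), or 3 (UNKNOWN). The following minimal Python sketch illustrates that contract with a hypothetical check on the number of alive Spark workers. The thresholds are made up, and the worker count is read from the arguments instead of being fetched from the cluster, so this does not reflect how check_spark_cluster works internally.

```python
from typing import List, Optional, Tuple

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def check_alive_workers(alive: Optional[int], warn_below: int,
                        crit_below: int) -> Tuple[int, str]:
    """Map an alive-worker count to a Nagios exit code and status line."""
    if alive is None:
        return UNKNOWN, f"{LABELS[UNKNOWN]} - alive workers: unknown"
    if alive < crit_below:
        code = CRITICAL
    elif alive < warn_below:
        code = WARNING
    else:
        code = OK
    # Performance data after '|': label=value;warn;crit;min. In Nagios range
    # syntax "N:" means "alert when the value drops below N".
    perfdata = f"alive_workers={alive};{warn_below}:;{crit_below}:;0"
    return code, f"{LABELS[code]} - alive workers: {alive} | {perfdata}"


def main(argv: List[str]) -> int:
    """Hypothetical entry point: the worker count arrives as the first argument.

    A real check would instead query the cluster, e.g. the Spark master's web UI.
    """
    try:
        alive = int(argv[0])
    except (IndexError, ValueError):
        alive = None
    code, line = check_alive_workers(alive, warn_below=3, crit_below=1)
    print(line)  # Nagios reads the first line of stdout as the status text
    return code
```

Wired up as a script entry point with sys.exit(main(sys.argv[1:])), running it with the argument 2 would print a WARNING line and exit with status 1, which Nagios would surface as a warning.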

Copyright information

© 2016 Zubair Nabi

About this chapter

Cite this chapter

Nabi, Z. (2016). Getting Ready for Prime Time. In: Pro Spark Streaming. Apress, Berkeley, CA. https://doi.org/10.1007/978-1-4842-1479-4_7
