Abstract
This paper presents Distributed Systems Foundation (DSF), a common platform for distributed systems research and development. It can run a distributed algorithm written in Java under multiple execution modes: simulation, massive multi-tenancy, and real deployment. DSF provides a set of novel features to facilitate testing and debugging, including chaotic timing tests and time-travel debugging with mutable replay. Unlike existing research prototypes that offer advanced debugging features by hacking programming tools, DSF is written entirely in Java, without modifications to any external tools such as the JVM, Java runtime library, compiler, linker, system library, OS, or hypervisor. This simplicity stems from our goal of making DSF not only a research prototype but, more importantly, a production tool. Experiments show that DSF is efficient and easy to use. DSF's massive multi-tenancy mode can run 4,000 OS-level threads in a single JVM to concurrently execute (as opposed to simulate) 1,000 DHT nodes in real time.
Copyright information
© 2009 IFIP International Federation for Information Processing
About this paper
Cite this paper
Tang, C. (2009). DSF: A Common Platform for Distributed Systems Research and Development. In: Bacon, J.M., Cooper, B.F. (eds) Middleware 2009. Lecture Notes in Computer Science, vol 5896. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-10445-9_21
DOI: https://doi.org/10.1007/978-3-642-10445-9_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-10444-2
Online ISBN: 978-3-642-10445-9
eBook Packages: Computer Science (R0)