Non-Strict Execution in Parallel and Distributed Computing

Cristobal-Salas, Alfredo; Tchernykh, Andrei; Gaudiot, Jean-Luc; Lin, Wen-Yen

doi:10.1023/A:1022664724413

Published: April 2003

Volume 31, pages 77–105 (2003)

International Journal of Parallel Programming

Abstract

This paper surveys and demonstrates the power of non-strict evaluation in applications executed on distributed architectures. We present the design, implementation, and experimental evaluation of single-assignment, incomplete data structures in a distributed memory architecture and in the Abstract Network Machine (ANM). Incremental Structures (IS), Incremental Structure Software Cache (ISSC), and Dynamic Incremental Structures (DIS) provide non-strict data access and fully asynchronous operations, which make them well suited to exploiting fine-grain parallelism in distributed memory systems. We focus on split-phase memory operations and non-strict information processing under a distributed address space to improve overall system performance. A novel optimization technique at the communication level is proposed and described. We use partial evaluation of local and remote memory accesses not only to remove much of the excess overhead of message passing, but also to reduce the number of messages when some information about the input, or part of the input, is known. We show that the split-phase transactions of IS, together with the ability to defer reads, allow partial evaluation of distributed programs without losing determinacy. Our experimental evaluation indicates that commodity PC clusters with both IS and a caching mechanism, ISSC, are more robust. The system can deliver speedup for both regular and irregular applications. We also show that partial evaluation of memory accesses decreases the traffic in the interconnection network and improves the performance of MPI IS and MPI ISSC applications.
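To make the split-phase, deferred-read behaviour described in the abstract concrete, here is a minimal single-process sketch in Python. It assumes a deliberately simplified model: one address space, callbacks standing in for the continuations that a split-phase reply message would trigger, and no message passing, software caching (ISSC), or dynamic allocation (DIS). The class and method names (`IStructure`, `read`, `write`) are hypothetical and are not taken from the paper.

```python
from typing import Any, Callable, Dict, List


class IStructure:
    """Hypothetical single-assignment array with deferred (split-phase) reads.

    Simplified model: one address space, callbacks instead of reply messages.
    """

    _EMPTY = object()  # sentinel marking an element that has not been written

    def __init__(self, size: int) -> None:
        self._values: List[Any] = [self._EMPTY] * size
        # Per-element queues of continuations waiting for a value.
        self._deferred: Dict[int, List[Callable[[Any], None]]] = {}

    def read(self, index: int, on_value: Callable[[Any], None]) -> None:
        """Issue a read and return at once (split-phase).

        A full element delivers its value immediately; an empty element
        queues the continuation until the element is written.
        """
        value = self._values[index]
        if value is self._EMPTY:
            self._deferred.setdefault(index, []).append(on_value)
        else:
            on_value(value)

    def write(self, index: int, value: Any) -> None:
        """Write an element exactly once and wake any deferred readers."""
        if self._values[index] is not self._EMPTY:
            raise RuntimeError(f"element {index} is already written")
        self._values[index] = value
        for continuation in self._deferred.pop(index, []):
            continuation(value)


if __name__ == "__main__":
    results: List[Any] = []
    s = IStructure(4)
    s.read(2, results.append)  # element 2 is empty, so this read is deferred
    s.write(2, 42)             # the write satisfies the deferred read
    s.read(2, results.append)  # element 2 is full, so this read completes now
    print(results)             # [42, 42]
```

Because a read either completes immediately or waits for the single permitted write, every consumer observes the same value whether the read or the write is issued first; this is the property that lets deferred reads coexist with determinacy. A second write to the same element is rejected rather than silently overwriting the first.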

Author information

Authors and Affiliations

CICESE Research Center, Ensenada, BC, Mexico
Alfredo Cristobal-Salas & Andrei Tchernykh
UCI Parallel Systems & Computer Architectures Lab, Department of Electrical and Computer Engineering, University of California, Irvine, California, 92697
Jean-Luc Gaudiot
TIA Mobile, Inc., Los Angeles, California
Wen-Yen Lin

About this article

Cite this article

Cristobal-Salas, A., Tchernykh, A., Gaudiot, JL. et al. Non-Strict Execution in Parallel and Distributed Computing. International Journal of Parallel Programming 31, 77–105 (2003). https://doi.org/10.1023/A:1022664724413

Issue Date: April 2003
DOI: https://doi.org/10.1023/A:1022664724413
