Optimisation and parallelisation strategies for Monte Carlo simulation of HIV infection
Introduction
Understanding the behaviour of immune system is a key issue for immunologists and medical practitioners in seeking new treatments for persistent diseases. Most of the time, particularly for lethal bacteria or viruses, in vivo experiments cannot be performed, hence, models are built to simulate part of the real-life behaviour of the system. Analysis of the simulation results leads to the discovery of new behavioural rules that can be used in exploring the potential for new drugs and treatments.
The immune response is the natural defense of the body against foreign entities that invade the host cells, causing minor infections to major diseases. From antigen infection to elimination, the immune response is a complicated process that involves the interactions of a variety of immune cells such as: macrophages (responsible for phagocytosis of pathogens, dead cells and cellular debris), cytotoxic T cells and helper T cells (types of white blood cell or leukocyte with surface antigen receptors that can bind to fragments of antigens). Additionally, the immune repertoire includes plasma B cells, secreting antibodies, which destroy antigens by binding to them and making them easier targets for phagocytes. Further defense features include memory B cells that are specifically primed by the antigen(s) encountered during the primary immune response; these memory cells are able to live for a long time and can respond quickly upon second exposure to an antigen for which they are specific. Finally, the system contains the viral cells and antibodies which react to them. The immune system has been modelled in several ways, featuring different subsets of cells, together with interactions for describing progression and inhibition of an infection. See Kuby [1] and Roitt [2] for an in-depth introduction to immunology.
Cellular automata (CA) models were first introduced by Neumann [3] in the late 1940s to describe elementary units that can reproduce themselves. CA are used to mimic the global behaviour of a system from its local rules and gave good results in early studies on biological, chemical and physical systems [4], [5].
Over the last several decades they have been applied to many different problems, such as heat and wave equations [6], forest fire spread [7], railway traffic [8] or even cryptography [9]. Typically defined on an n-dimensional grid , discrete CA are able to describe continuous dynamical systems through a set of simple rules. Lattice CA are characterised by the following properties:
- •
An n-dimensional discrete lattice of cells.
- •
A cell state chosen from a finite set of states.
- •
A neighbourhood (the set of cells that can interact with each given cell).
- •
A set of rules determining states of a cell and its neighbours.
- •
The evolution, in discrete time steps, of the cellular state.
- •
Any rule can depend on a specified probability.
Monte-Carlo (MC) methods are statistical simulation methods, using random numbers to give approximate solutions to mathematical problems. Developed by Neumann [3], Ulam [10] and Metropolis [11] during World War Two, this method was first used to model neutron diffusion in fissile material and give an approximation of the eigenvalues of the Schrödinger equation [12]. Since then MC methods have been used to model a large variety of problems, from pi estimation [13] to immune system simulation [14], [15].
In the case of HIV modelling, the MC method can be used with CA to mimic different aspects of immune system behaviour and allow for stochastic updating. Thus, for a 3D lattice of size l, sites will be randomly updated each MC run. To produce good statistics, several MC runs for each initial situation of seeding are common. The codes considered in this article [16], [17] typically average over 10 runs which is a good compromise considering both statistics and the increase in execution time. A simple MC–CA HIV simulation structure can then be represented as in Fig. 1.
The nrun loop is the one that repeats the overall simulation several times to ensure good statistics. The nMonteCarloSteps is the loop realising a complete MC simulation, the number of steps determining the evolution period of the simulation system. The allSites loop is the heart of the simulation, in which all sites will be randomly updated. For typical MC HIV simulation:
- •
.
- •
.
- •
.
The allSites loop presented here is a basic element; in fact the full update of the lattice during an MC step may require more than one loop that increases further the amount of computation to be done.
Therefore, particularly in models of biosystems such as the immune response, which require large-scale simulation, memory and speed issues are crucial. In this paper some optimisation and parallelisation solutions are presented for existing MC HIV simulation code [16], [17]. These models and corresponding code for these two examples will be referred to as A and B, respectively, in what follows.
Section snippets
Description of the CA models and MC methods considered
Numerous CA models have been built to simulate immune system response to HIV infection. Early models of cell-mediated immune response were based on three key cell types, e.g. Pandey and Stauffer (PS1) [20], Pandey (P1) [21], Kougias and Schulte (KS1) [22], considering the helper cell (H), the cytotoxic killer cell (C) and the viral-infected cell (V). Further cell-mediated models include the five cell Pandey and Stauffer model (PS2) [23] or eight cell Pandey (P2) [24]. Other variants were
Choices of optimisation technique
In this section, we discuss optimisation that can be done on existing code. Of course these are not universal rules as optimisation has proven to be implementation-dependent, with each particular problem being optimisable in a different way. In fact, working on existing code requires us to respect existing algorithms in order to preserve the underlying model and obtain similar results to those of the original code. Even if hardware-optimised code in assembly language offered the best solution
Choices of parallelisation techniques
A look back at the MC loop described in the Introduction, Fig. 1, gives clues on how to parallelise the computation. Due to the independence of each run (each one being performed on a newly seeded lattice with a different sequence of random updates), the nrun loop can be made parallel. This is relatively easy to do, giving each processor a number of runs to execute, then gathering the data of each computation at the end. The nMonteCarloSteps loop cannot be parallelised as each step is
Optimisation efficiency
To obtain an estimate of efficiency, we use a form of speed-up measure, which simply compares execution time of original code and optimised codes:In the case of code A [16], the code optimisation techniques used were the following:
- •
Code was rewritten in C, in order to improve optimisation capability.
- •
Compiler optimisation was used, as it was found to be most efficient, and giving no further improvement.
- •
Four Boolean lattices were replaced by a single char lattice,
Discussion
The optimisation issues described above generalise to many problems, although implementation details vary and are typically problem-specific. In studies on the various types of disorder in materials, for example, the natural length scale is that of the basic constituents of the material and in most solids this is microscopic, with disorder usually topological, while in fibrous composites (mesoscopic scale), disorder depends on fibre orientation and distribution, which leads at larger scale to
Conclusion
The objective of the optimisation and parallelisation of HIV infection simulation code is to find optimal ways to reduce computation time of CA and MC simulations. While this article is not meant to give general optimisation and parallelisation techniques for biocomputational simulations, it does suggest the most effective parallelisation method for parallel execution of CA and MC codes. Methods using spatial parallelisation have a big communication overload as the exchanges done to update each
Acknowledgements
The work described in this paper was supported by a grant from the National Institute for Cellular Biotechnology (NICB) under the Higher Education (HEA) Programme for Research in Third Level Institutions (PRTLI). I would like to thank my supervisors Prof. Heather Ruskin and Dr. Martin Crane for all their patience and advice.
Damien Hecquet received his M.Sc. in Computing (2005) in the Institut National des Sciences Appliquées de Lyon, France. He is currently working as an Information System Software Designer for Capgemini.
References (36)
- et al.
Interactive simulation of bushfire spread in heterogeneous fuel
Math. Comput. Modelling
(1990) - et al.
Cellular automata computations and secret key cryptography
Parallel Comput.
(2004) Monte Carlo simulation of the shape space model of immunology
Physica A: Stat. Theor. Phys.
(1992)- et al.
Effects of viral mutation on cellular dynamics in a Monte Carlo simulation of HIV immune response model in three dimensions
Theory Biosci.
(2002) Cellular automata approach to interacting network models for the dynamics of cell population in an early HIV infection
Physica A
(1991)- et al.
Pattern formation in one and two-dimensional shape–space models of the immune system
J. Theor. Biol.
(1992) Computer simulation of immunological cellular automata: the Tsallis Model
Physica A
(1993)- et al.
Diversity emergence and dynamics during primary immune response: a shape space, physical space model
Theory Biosci.
(2004) - et al.
Parallelization strategies for Monte Carlo simulations of thin film deposition
Comput. Phys. Commun.
(2002) Immunology
(1992)
Essential Immunology
Cellular Automata and Complexity: Collected Papers
Cellular Automata: Theory and Experiment
Published in Physica D
Cellular automata as an alternative to (rather than an approximation of) differential equations in modeling physics
Physica D
Cellular automaton model for railway traffic
J. Comput. Phys.
The Monte Carlo method
J. Am. Stat. Assoc.
The beginning of the Monte Carlo method
Los Alamos Sci.
Cited by (8)
Optimization of cell seeding in a 2D bio-scaffold system using computational models
2017, Computers in Biology and MedicineCitation Excerpt :As the cell growth patterns are random for various configurations within a given space, a Combinatorial-Monte Carlo computational model is proposed to optimize cell seeding positions. In the case of cell growth modelling, the Monte Carlo method will be used with the Cellular Automata concept proposed in Section 2 to mimic various aspects of cell growth behaviour and allow for stochastic updating [14]. Our ultimate goal is to obtain an optimal seeding configuration that is able to expand to a given desired amount of cells at a minimal time, based on the given space and total number of cells to be seeded.
Response to the comments by Bernaschi et al. on 'Optimisation and parallelisation strategies for Monte Carlo simulation of HIV infection'
2008, Computers in Biology and MedicineHandling of uncertainty in medical data using machine learning and probability theory techniques: a review of 30 years (1991–2020)
2021, Annals of Operations ResearchEfficient Computational Models for assessment of Spatial Infection Features
2019, 2019 International Conference on High Performance Computing and Simulation, HPCS 2019
Damien Hecquet received his M.Sc. in Computing (2005) in the Institut National des Sciences Appliquées de Lyon, France. He is currently working as an Information System Software Designer for Capgemini.
Heather J. Ruskin received her B.Sc. degree in Physics and M.Sc. in Medical Statistics from London University (Kings/London School of Hygiene and Tropical Medicine) and her Ph.D. in Statistical and Computational Physics from Trinity College Dublin. She is currently a Professor in the School of Computing and Associate Dean of Research in Engineering and Computing in Dublin City University. Her research interests include Computational Models for Complex Systems; spatiotemporal processes and many-body problems in biosystems (biomimetics) and in socioeconomic systems (traffic and finance).
Martin Crane received his B.A. and B.A.I. (Mechanical Engineering) degrees from Trinity College Dublin in 1989 and his Ph.D. from the same institution in 1993. He has worked in a variety of areas of Computational Science such as CFD, Combustion Modelling, Financial Data Analysis and, more recently, Systems Biology. He has been a College Lecturer in Dublin City University since 1999.