Comparing neural simulations by neural density estimation

A common problem in computational neuroscience is the comparison of competing models in the light of observed data. Bayesian model comparison provides a statistically sound framework for this comparison based on the evidence each model provides for the data. In practice, however, models are often defined through complex simulators so that methods relying on likelihood functions are not applicable. Previous approaches in the field of Approximate Bayesian Computation (ABC) rely on rejection sampling to circumvent the likelihood, but are typically computationally inefficient. We propose an efficient method to perform Bayesian model comparison for simulation-based models. Using recent advances in posterior density estimation, we train a mixture-density network to map features of the observed data to the parameters of the posterior over models. We show that the method performs accurately on two tractable example problems, and present an application to a use case scenario from computational neuroscience – the comparison of ion channel models.


Introduction
Complex simulations play an important role in cognitive science, computational neuroscience, and other fields, and are used to develop and evaluate hypotheses for mechanisms underlying the processes under investigation (Gerstner, Sprekeler, & Deco, 2012). For hypothesis-building, we often want to decide which of several candidate models provides the best explanation of empirical data, i.e. we want to perform model comparison. However, in practice, complex mechanistic models are often defined implicitly through stochastic simulators, e.g., a set of differential equations to be integrated.
Bayesian model comparison offers a statistically sound framework for comparing competing hypotheses (Dienes, 2011; Bishop, 2006). Given candidate models m_i with parameters θ and observed data x_o, Bayes' rule yields the posterior probability of each model,

p(m_i | x_o) ∝ p(x_o | m_i) p(m_i),  with  p(x_o | m_i) = ∫ p(x_o | θ, m_i) p(θ | m_i) dθ,

where p(x_o | θ, m_i) denotes the likelihood of the data given the model m_i and parameters θ, and the integral is called the marginal likelihood of the model (model evidence). In the context of complex simulator models, however, it is often impossible to compute the likelihood function p(x_o | θ, m_i), implying that conventional approaches to Bayesian model comparison cannot be applied.
Classical approaches to solving this problem originate in the field of Approximate Bayesian Computation (ABC; Sunnåker et al., 2013) and rely on rejection sampling (Rubin, 1984). Basic rejection sampling was extended to a sequential Monte Carlo approach (ABC SMC) by Toni, Welch, Strelkowa, Ipsen, and Stumpf (2009), which improves sampling efficiency using importance weights. Overall, however, these approaches do not scale to high-dimensional applications and often depend on the choice of summary statistics, distance functions, and acceptance thresholds.
Recently, several studies used conditional density estimation to infer the parameters of simulation-based models (Papamakarios & Murray, 2016; Lueckmann et al., 2017; Greenberg, Nonnenmacher, & Macke, 2019). In contrast to rejection sampling approaches, these methods approximate the posterior over the parameters of a model in parametrized form, avoiding the choice of distance functions. Furthermore, they improve simulation efficiency by drawing new samples from adaptively chosen proposal distributions, giving rise to the name sequential neural posterior estimation (SNPE).
However, previous studies concentrated on neural density estimation for posterior inference and did not provide methodology for model comparison. Here, we investigate how neural density estimation can be used for model comparison, in order to provide methodology for comparing complex simulation-based models in neuroscience and beyond.

Model comparison via neural density estimation
The goal of neural posterior estimation is to learn a parametric approximation to the exact posterior, using simulations from the model. SNPE approaches originate in regression-correction approaches to ABC (Beaumont, Zhang, & Balding, 2002), which proposed to learn properties of the posterior, such as its mean and variance, directly via a regression from simulated parameters to generated data. More recently, this idea was developed further using mixture-density networks (MDNs) to estimate the entire conditional density of the parameters given the observed data, p(θ | x_o) (Papamakarios & Murray, 2016; Lueckmann et al., 2017). Greenberg et al. (2019) use SNPE with flow-based conditional density estimators, and their method allows for full flexibility in adaptively sampling simulations.
Here, we extend the idea of learning the posterior over model parameters of a single model to learning the posterior over multiple competing models, in order to perform model comparison: our goal is to approximate the posterior probability of each model i given the observed data, p(m_i | x_o). Instead of using rejection sampling to approximate the posterior probability (or the evidence) from a set of sampled parameters, our approach follows the rationale of posterior density estimation, as outlined in Figure 1.

Figure 1: Schematic overview of the approach: in the simulation phase, training data are generated from the simulator model. The resulting training dataset D is used in the density estimation phase to train a neural network which approximates the posterior over models. In practice, one often reduces the data x to a set of summary statistics s(x).
First, we choose priors over models and parameters. Then we repeatedly sample from the priors to generate a large amount of simulated data D = {(x_n, m_n)}_{n=1}^{N} (Figure 1, Simulation). The simulated data are used to train a density estimation neural network q_ψ(m | x) with parameters ψ that maps from data to the model index (Figure 1, Density estimation). Finally, the density estimator is used to predict the posterior over models given the empirically observed data, p̂(m_i | x_o).
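The simulation phase can be sketched as follows. This is a minimal illustration using the Gaussian toy problem from the Experiments section as the simulator; all concrete values (prior means, standard deviations) are assumed for illustration and are not the paper's exact settings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Three candidate models that differ only in their Gaussian prior
# p(mu | m_i) over the mean parameter mu (assumed values).
prior_means = [-2.0, 0.0, 2.0]   # prior means of p(mu | m_i)
prior_std = 0.5                  # prior std of p(mu | m_i)
obs_std = 1.0                    # likelihood std of the simulator p(x | mu)

def simulate_training_set(n_samples):
    """Draw (x_n, m_n) pairs: model index from a uniform prior p(m),
    parameters from p(mu | m), and data from the simulator p(x | mu)."""
    m = rng.integers(0, 3, size=n_samples)                # m_n ~ p(m)
    mu = rng.normal(np.array(prior_means)[m], prior_std)  # mu_n ~ p(mu | m_n)
    x = rng.normal(mu, obs_std)                           # x_n ~ p(x | mu_n)
    return x, m

x_train, m_train = simulate_training_set(50_000)
```

The resulting dataset D = {(x_n, m_n)} is exactly what the density estimation phase consumes; for more complex simulators, x would be replaced by summary statistics s(x).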
Starting from the joint data-generating distribution p(x, m), one can derive a loss function for neural network training that minimizes the Kullback-Leibler divergence between the true posterior p(m | x) and the approximating parametrized posterior q_ψ(m | x). This results in the objective of maximizing the expected log-probability of the model, E_{p(x,m)}[log q_ψ(m | x)].
The derivation involves the posterior approximations both over models and over parameters; thus, in principle, the network can be trained to predict both the parameters θ_n and the generating model m_n, effectively performing model comparison and inference simultaneously. Here, we are interested in the discrete posterior over models only, so that the objective corresponds to the cross-entropy loss

L(ψ) = −(1/N) Σ_{n=1}^{N} log q_ψ(m_n | x_n). (3)
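The chain of steps from the KL objective to the empirical loss can be written out as follows (a standard derivation, sketched here for completeness; the expectation over x makes the entropy term of the true posterior a constant in ψ):

```latex
\begin{aligned}
\psi^{*} &= \arg\min_{\psi}\; \mathbb{E}_{p(x)}\!\left[ D_{\mathrm{KL}}\!\big( p(m \mid x)\,\|\, q_{\psi}(m \mid x) \big) \right] \\
         &= \arg\max_{\psi}\; \mathbb{E}_{p(m,x)}\!\left[ \log q_{\psi}(m \mid x) \right] \\
         &\approx \arg\max_{\psi}\; \frac{1}{N} \sum_{n=1}^{N} \log q_{\psi}(m_n \mid x_n),
\end{aligned}
```

where the last step replaces the expectation with a Monte Carlo estimate over the simulated training pairs (x_n, m_n), recovering the cross-entropy objective.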

Experiments
We perform two experiments to demonstrate and test the accuracy of the approach, and one experiment to showcase an application in a computational neuroscience setting. In all numerical experiments, we used a two-layer neural network q_ψ with hyperbolic tangent activation functions, and used ADAM (Kingma & Ba, 2014) to optimize the loss (eq. 3).
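A minimal numpy sketch of this setup, assuming the Gaussian toy problem below as the data source: a two-layer tanh network with a softmax head, trained with the cross-entropy loss (eq. 3) and ADAM. All architectural details (hidden size, learning rate, iteration count, data values) are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in training data: three 1-D Gaussian models with means -2, 0, 2 (assumed).
m_train = rng.integers(0, 3, size=6000)
x_train = rng.normal(np.array([-2.0, 0.0, 2.0])[m_train], 1.0)[:, None]

n_hidden, n_models = 20, 3
params = {
    "W1": rng.normal(0, 0.5, (1, n_hidden)), "b1": np.zeros(n_hidden),
    "W2": rng.normal(0, 0.5, (n_hidden, n_models)), "b2": np.zeros(n_models),
}

def forward(x, p):
    """Two-layer tanh network with a softmax head: q_psi(m | x)."""
    h = np.tanh(x @ p["W1"] + p["b1"])
    logits = h @ p["W2"] + p["b2"]
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    q = np.exp(logits)
    return h, q / q.sum(axis=1, keepdims=True)

def grads(x, m, p):
    """Gradient of the mean cross-entropy loss -log q_psi(m_n | x_n)."""
    h, q = forward(x, p)
    d_logits = q.copy()
    d_logits[np.arange(len(m)), m] -= 1.0   # softmax + cross-entropy gradient
    d_logits /= len(m)
    g = {"W2": h.T @ d_logits, "b2": d_logits.sum(0)}
    d_h = d_logits @ p["W2"].T * (1 - h**2)  # backprop through tanh
    g["W1"], g["b1"] = x.T @ d_h, d_h.sum(0)
    return g

# ADAM optimizer (Kingma & Ba, 2014), full-batch updates.
mom = {k: np.zeros_like(v) for k, v in params.items()}
vel = {k: np.zeros_like(v) for k, v in params.items()}
lr, beta1, beta2, eps = 1e-2, 0.9, 0.999, 1e-8
for t in range(1, 2001):
    g = grads(x_train, m_train, params)
    for k in params:
        mom[k] = beta1 * mom[k] + (1 - beta1) * g[k]
        vel[k] = beta2 * vel[k] + (1 - beta2) * g[k] ** 2
        m_hat, v_hat = mom[k] / (1 - beta1**t), vel[k] / (1 - beta2**t)
        params[k] -= lr * m_hat / (np.sqrt(v_hat) + eps)

# Predicted posterior over models at three test observations.
_, q_test = forward(np.array([[-2.0], [0.0], [2.0]]), params)
```

After training, each row of q_test is a normalized posterior over the three models, concentrating on the model whose prior mean is closest to the observation.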

Gaussian toy example
We first provide an intuitive toy example: the problem of deciding between three Gaussian models with different means, given only a single data point as observation. Here, the training data consist of samples (m_n, x_n) from all three models, each with a different Gaussian prior p(μ | m_i) on the mean parameter μ (Figure 2, top). After training with 50,000 data points, the posterior probability of each model, predicted over the whole range of observed test data points x_o, closely matches the analytic posterior probability (Figure 2, bottom).
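For this conjugate setting the analytic posterior used as ground truth has a closed form: with a Gaussian prior on μ, the marginal likelihood p(x_o | m_i) is itself Gaussian with variance equal to the sum of prior and observation variances. A short sketch, with assumed prior means and scales (the paper's exact settings are not reproduced here):

```python
import numpy as np

# Assumed toy configuration: three models differing in their prior mean over mu.
prior_means = np.array([-2.0, 0.0, 2.0])
prior_var, obs_var = 0.5**2, 1.0**2

def analytic_posterior(x_o):
    """p(m_i | x_o) under a uniform prior over models: the Gaussian prior on mu
    integrates out, giving a Gaussian evidence with variance prior_var + obs_var."""
    var = prior_var + obs_var
    log_ev = -0.5 * (x_o - prior_means) ** 2 / var - 0.5 * np.log(2 * np.pi * var)
    ev = np.exp(log_ev - log_ev.max())  # stabilized before normalizing
    return ev / ev.sum()                # uniform p(m) cancels in the ratio

post = analytic_posterior(-2.0)
```

Evaluating the trained network against this function over a grid of x_o values gives the comparison shown in Figure 2 (bottom).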

Spike count models
Second, we present a model comparison problem from computational neuroscience for which the exact posteriors can be calculated: the comparison of a Poisson model and a negative-binomial model for spike count data. It relates to the question of whether spike counts generally have equal mean and variance (Poisson model) or whether the variance exceeds the mean, i.e., the counts are overdispersed (negative-binomial model; Taouali, Benvenuti, Wallisch, Chavane, & Perrinet, 2015). Thus, a statistical model comparison procedure is expected to favour the overdispersed model whenever the variance is sufficiently large. In this case, the training data consist of the mean and the variance of simulated spike counts from both models. We use a uniform prior p(m) over the two models and Gamma priors over the parameters of the Poisson and the negative-binomial model to generate 50,000 training data points and 300 test data points. To illustrate the performance, we compute the predicted posterior probability of the Poisson model, q_ψ(m_POI | s(x)), for a fine grid of means and variances (Figure 3, background) and plot the exact posterior probabilities p(m_POI | x) of the test set on top using the same color code (Figure 3, orange circles, foreground). The close match in color indicates accurate performance.
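The training-data generation for this example can be sketched as follows. The Gamma prior shapes and scales here are placeholder assumptions, not the paper's settings; the negative binomial is parametrized so that its mean matches the Poisson rate while its variance is inflated by the dispersion parameter r.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(n_datasets, counts_per_dataset=100):
    """Per dataset: draw a model index (0: Poisson, 1: negative binomial),
    draw its parameters from Gamma priors, simulate spike counts, and
    return the summary statistics s(x) = (mean, variance)."""
    m = rng.integers(0, 2, size=n_datasets)
    stats = np.empty((n_datasets, 2))
    for i, mi in enumerate(m):
        lam = rng.gamma(shape=2.0, scale=5.0)       # rate prior (assumed)
        if mi == 0:
            counts = rng.poisson(lam, size=counts_per_dataset)
        else:
            r = rng.gamma(shape=2.0, scale=1.0)     # dispersion prior (assumed)
            # negative binomial with mean lam and variance lam * (1 + lam / r)
            p = r / (r + lam)
            counts = rng.negative_binomial(r, p, size=counts_per_dataset)
        stats[i] = counts.mean(), counts.var()
    return stats, m

stats, m = simulate(2000)
```

Because the negative-binomial datasets have variance-to-mean ratios well above 1 while the Poisson datasets sit near 1, the (mean, variance) summary statistics carry the information the classifier needs.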

Ion channel models
Finally, we present a use case from computational neuroscience for which no ground-truth posterior is available: the comparison of two ion channel models on the basis of observed current traces. Both ion channel models are taken from Pospischil et al. (2008) and were selected based on the ion channel genealogy (Podlaski et al., 2017). The output of the models consists of current traces in response to stereotypical voltage-clamp protocols applied to the membrane (Figure 4); summary statistics were calculated using PCA, as described in Podlaski et al. (2017).
In this example, the priors over the two models and the priors over the model parameters are uniform. We train q_ψ with 100,000 training data points to predict the probability of each channel model given observed summary statistics, and evaluate it on 300 test data points. Because no ground-truth posterior is available, no direct evaluation against exact posteriors is possible here. However, we observe that q_ψ reliably assigns high posterior probability to the correct underlying channel model.

Evaluation
For a comparison to state-of-the-art model comparison methods, we apply ABC SMC (Toni et al., 2009), as provided in the pyABC toolbox (Klinger, Rickert, & Hasenauer, 2017), to our example models and compare it with the neural density estimation (NDE) method presented here. We base the comparison on the same number of training simulations. For the tractable examples, the absolute difference between predicted and exact posterior serves as the evaluation metric, while performance on the ion channel example is assessed using the cross-entropy loss with respect to the true underlying model.

Table 1: Comparison of the neural density estimation (NDE) and sequential Monte Carlo (ABC SMC) methods on the presented example problems. The third column shows the cross-entropy loss, as ground-truth posteriors are not available for the ion-channel model.
Given the same number of simulations and based on a test set of 300 different observations, NDE is competitive with ABC SMC on the analytically tractable examples. The comparison based on the cross-entropy loss for the ion channel example shows that NDE clearly outperforms ABC SMC.

Conclusions
Simulation-based models with intractable likelihoods play a central role in neuroscience and cognitive science (Palestro et al., 2018): recurrently connected spiking neurons (Ladenbauer, McKenzie, English, Hagens, & Ostojic, 2019) and models of decision making based on nonlinear diffusion mechanisms (Ratcliff & McKoon, 2008; Park, Lueckmann, von Kriegstein, Bitzer, & Kiebel, 2016) are examples of simulation-based models for which likelihoods are intractable or challenging to compute. The goal of this work is to provide statistical methods for performing Bayesian model comparison even on such complex mechanistic models. We show how neural density estimation for simulation-based inference can be extended to the task of Bayesian model comparison, and demonstrate its competitive performance on three example problems.
Our goal is to provide methods which will enable scientists to apply Bayesian model comparison to simulation-based models in computational neuroscience, cognitive science and beyond. Although we concentrated on training a neural network to do model comparison, one can perform both inference and model comparison simultaneously, which opens up new research possibilities in machine learning and neuroscience.