Parameter Inference

Research

The parameter inference problem involves inferring the parameters of a descriptive model (e.g., a simulator) from observed data. When the likelihood function of the model is available, techniques such as maximum likelihood estimation (MLE) can be used. A more interesting case is likelihood-free parameter inference, where inference must rely solely on the observed data and the ability to run the descriptive model/simulator. Approximate Bayesian computation (ABC) is an established method for such problems. ABC samples parameter values from a specified prior distribution, runs the simulator with each sampled parameter value, and compares the simulated output to the observed data using a distance function, often computed on low-dimensional features of the data (summary statistics). If the simulated output lies within a tolerance bound of the observed data, the sample is accepted. Once the desired number of accepted samples has been accumulated, they form an approximation of the posterior distribution of the inferred parameters. In practice, ABC parameter inference can be slow and may require a large number of rejection sampling iterations. In recent years, however, there has been considerable progress on improving various aspects of ABC, including the selection of priors, the selection of summary statistics, and the use of adaptive tolerance bounds.
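The rejection ABC loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in our work: the toy simulator (Gaussian data with an unknown mean), the uniform prior, the use of the sample mean as the summary statistic, and the tolerance value are all assumptions chosen for the example.

```python
import numpy as np

def rejection_abc(observed, simulate, summarize, prior_sample, distance,
                  tolerance, n_accept, rng, max_iter=1_000_000):
    """Basic rejection ABC: keep prior draws whose simulated summaries
    lie within `tolerance` of the observed summaries."""
    s_obs = summarize(observed)
    accepted = []
    for _ in range(max_iter):
        theta = prior_sample(rng)                # draw a parameter from the prior
        x_sim = simulate(theta, rng)             # run the simulator with it
        d = distance(summarize(x_sim), s_obs)    # compare in summary-statistic space
        if d <= tolerance:
            accepted.append(theta)
            if len(accepted) == n_accept:
                break
    return np.array(accepted)  # samples approximating the posterior

# Toy problem (assumption): 50 draws from Normal(theta, 1) with true theta = 2.
rng = np.random.default_rng(0)
observed = rng.normal(2.0, 1.0, size=50)

posterior = rejection_abc(
    observed,
    simulate=lambda th, r: r.normal(th, 1.0, size=50),
    summarize=np.mean,
    prior_sample=lambda r: r.uniform(-5.0, 5.0),
    distance=lambda a, b: abs(a - b),
    tolerance=0.1,
    n_accept=200,
    rng=rng,
)
print(posterior.mean(), posterior.std())
```

With a uniform prior of width 10 and a tolerance of 0.1, only about 2% of proposals are accepted here, which illustrates why plain rejection sampling can require many simulator runs.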

Selected Research

Learning Summary Statistics With Deep Neural Networks

The use of informative summary statistics is crucial for accurate parameter inference. Methods for selecting approximately sufficient summary statistics exist, but they do not scale well with the number of candidate statistics. In recent years, learning high-quality summary statistics with regression models has gained popularity: the model maps simulator output (e.g., a time series) to an estimate of the posterior mean of the parameters by minimising the squared loss. Because convolutional architectures are well suited to structured data, we propose them for simulator models that produce time series (structured 1D input). We comprehensively evaluate the convolutional architecture, particularly on parameter inference problems involving complex biochemical reaction networks [1].

We further propose robust and data-efficient Bayesian convolutional networks that are less prone to overfitting and more resilient to noise, among other improvements [2].
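The regression idea behind learned summary statistics can be illustrated without a deep network. The sketch below swaps the convolutional network for a linear least-squares model on a few hand-picked features of a toy AR(1) time series; the simulator, the features and all function names are hypothetical and chosen only for illustration. Minimising the squared loss makes the fitted model the best approximation to the posterior mean of the parameter within the chosen model class, and its output can then serve as a one-dimensional summary statistic in ABC.

```python
import numpy as np

def simulate_ar1(phi, rng, n_steps=100):
    """Hypothetical toy simulator: AR(1) series x_t = phi * x_{t-1} + noise."""
    x = np.zeros(n_steps)
    for t in range(1, n_steps):
        x[t] = phi * x[t - 1] + rng.normal()
    return x

def features(x):
    """Hand-picked features standing in for a learned convolutional feature extractor."""
    xc = x - x.mean()
    lag1 = np.dot(xc[:-1], xc[1:]) / np.dot(xc, xc)   # lag-1 autocorrelation
    return np.array([1.0, lag1, xc.var(), np.abs(np.diff(x)).mean()])

rng = np.random.default_rng(1)
thetas = rng.uniform(-0.9, 0.9, size=2000)             # parameters drawn from the prior
X = np.stack([features(simulate_ar1(th, rng)) for th in thetas])

# Least squares minimises ||X w - theta||^2 on the training split, so X @ w
# approximates the posterior mean E[theta | x] within this linear model class.
w, *_ = np.linalg.lstsq(X[:1500], thetas[:1500], rcond=None)

def learned_summary(x):
    """Learned one-dimensional summary statistic: estimated posterior mean of phi."""
    return features(x) @ w

# Check on held-out simulations how well the statistic tracks the true parameter.
pred = X[1500:] @ w
corr = np.corrcoef(pred, thetas[1500:])[0, 1]
print(f"held-out correlation: {corr:.3f}")
```

A convolutional network replaces `features` and the linear readout with learned filters, which removes the need to hand-pick features for structured 1D input.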

[1] Åkesson, M., Singh, P., Wrede, F., & Hellander, A. (2021). Convolutional neural networks as summary statistics for approximate Bayesian computation. IEEE/ACM Transactions on Computational Biology and Bioinformatics.

[2] Wrede, F., Eriksson, R., Jiang, R., Petzold, L., Engblom, S., Hellander, A., & Singh, P. (2022, July). Robust and integrative Bayesian neural networks for likelihood-free parameter inference. In 2022 International Joint Conference on Neural Networks (IJCNN) (pp. 1-10). IEEE.

Scalable Inference, Optimization, and Parameter Exploration (Sciope) Toolbox

The Sciope toolbox [1] implements our state-of-the-art summary statistic neural networks, the SMC-ABC and Replenishment SMC-ABC parameter inference methods with their associated machinery, and basic statistical sampling and optimization methods. The toolbox supports parallelization via Dask and uses the TensorFlow library for deep learning. Sciope also serves as the backend for parameter inference workflows in StochSS (Stochastic Simulation Service) [2, 3].
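To show what an SMC-ABC sampler does, the sketch below implements a common population Monte Carlo variant of SMC-ABC in plain NumPy. It does not use or reproduce Sciope's API, and it is not the Replenishment SMC-ABC method. Assumed design choices: a one-dimensional parameter, a Gaussian perturbation kernel with twice the weighted particle variance, importance weights equal to the prior density divided by the kernel mixture density, and a tolerance set adaptively as a quantile of the previous generation's distances. The function name, its arguments and the toy problem are hypothetical.

```python
import numpy as np

def smc_abc(s_obs, simulate_summary, prior_sample, prior_pdf,
            n_particles, n_generations, rng, quantile=0.5):
    """Hypothetical 1D SMC-ABC (population Monte Carlo) sampler."""
    # Generation 0: particles drawn from the prior (tolerance effectively infinite).
    thetas = np.array([prior_sample(rng) for _ in range(n_particles)])
    dists = np.array([abs(simulate_summary(t, rng) - s_obs) for t in thetas])
    weights = np.full(n_particles, 1.0 / n_particles)
    eps = np.inf
    for _ in range(n_generations):
        # Adaptive tolerance: a quantile of the previous generation's distances.
        eps = np.quantile(dists, quantile)
        # Gaussian perturbation kernel with variance 2 x weighted particle variance.
        sigma = np.sqrt(2.0 * np.cov(thetas, aweights=weights))
        new_thetas, new_dists = [], []
        while len(new_thetas) < n_particles:
            parent = thetas[rng.choice(n_particles, p=weights)]  # resample a particle
            cand = parent + sigma * rng.normal()                 # perturb it
            if prior_pdf(cand) == 0.0:                           # outside prior support
                continue
            d = abs(simulate_summary(cand, rng) - s_obs)
            if d <= eps:
                new_thetas.append(cand)
                new_dists.append(d)
        new_thetas = np.array(new_thetas)
        # Importance weights: prior density / mixture of perturbation kernels.
        # The Gaussian normalising constant is shared by all terms and cancels.
        kern = np.exp(-0.5 * ((new_thetas[:, None] - thetas[None, :]) / sigma) ** 2)
        new_w = np.array([prior_pdf(t) for t in new_thetas]) / (kern @ weights)
        weights = new_w / new_w.sum()
        thetas, dists = new_thetas, np.array(new_dists)
    return thetas, weights, eps

# Toy problem (assumption): unknown mean of Normal(theta, 1), summary = sample mean.
rng = np.random.default_rng(3)
observed = rng.normal(2.0, 1.0, size=50)
thetas, weights, eps = smc_abc(
    s_obs=observed.mean(),
    simulate_summary=lambda th, r: r.normal(th, 1.0, size=50).mean(),
    prior_sample=lambda r: r.uniform(-5.0, 5.0),
    prior_pdf=lambda th: 0.1 if -5.0 <= th <= 5.0 else 0.0,
    n_particles=300, n_generations=6, rng=rng,
)
post_mean = np.sum(weights * thetas)
print(f"final tolerance {eps:.3f}, weighted posterior mean {post_mean:.3f}")
```

Compared with the rejection loop, each generation proposes from the previous population instead of the prior, so the tolerance can shrink while the acceptance rate stays workable. The simulator calls inside the `while` loop are independent of each other, which is the kind of work a framework such as Dask can distribute.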

[1] Singh, P., Wrede, F., & Hellander, A. (2021). Scalable machine learning-assisted model exploration and inference using Sciope. Bioinformatics, 37(2), 279-281.

[2] Jiang, R., Jacob, B., Geiger, M., Matthew, S., Rumsey, B., Singh, P., … & Petzold, L. (2021). Epidemiological modeling in StochSS Live! Bioinformatics, 37(17), 2787-2788.

[3] Drawert, B., Hellander, A., Bales, B., Banerjee, D., Bellesia, G., Daigle Jr, B. J., … & Petzold, L. R. (2016). Stochastic simulation service: bridging the gap between the computational expert and the biologist. PLoS Computational Biology, 12(12), e1005220.

Surrogate Models of Summary Statistics

Parameter inference problems sometimes involve highly complex simulators. For example, stochastic biochemical reaction networks in computational biology often involve tens of reactions among several interacting proteins, and the simulator may have several tens of control parameters. Running such a complex simulation model repeatedly during inference often incurs substantial computational cost.

An efficient alternative is to train a surrogate model [1] of only the species of interest that take part in the parameter inference process. The surrogate model learns the mapping from the parameter space (thetas) to the summary statistic space (figure right-top). The table (right-bottom) shows prediction times for ~15000 samples on a test problem with 6 parameters; the surrogate model delivers a speed-up of several orders of magnitude. The surrogate model can be optimized to obtain a point estimate of the inferred parameters, or coupled with a density estimator to conduct parameter inference.
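The surrogate workflow can be sketched as follows, under strong simplifying assumptions. A cheap toy simulator (normally distributed data with unknown mean and standard deviation) stands in for an expensive reaction-network simulator, a quadratic polynomial fitted by least squares stands in for the surrogate model, and the point estimate is found by random search over the surrogate rather than by any particular optimizer. All names, the prior box and the sample sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)

def simulator(theta, rng, n=100):
    """Hypothetical stand-in for an expensive simulator: Normal(mu, sigma) samples."""
    mu, sigma = theta
    return rng.normal(mu, sigma, size=n)

def summaries(x):
    return np.array([x.mean(), x.std()])

def poly_features(thetas):
    """Quadratic features [1, t1, t2, t1^2, t2^2, t1*t2] for each row of thetas."""
    t1, t2 = thetas[:, 0], thetas[:, 1]
    return np.column_stack([np.ones_like(t1), t1, t2, t1**2, t2**2, t1 * t2])

low, high = np.array([-3.0, 0.1]), np.array([3.0, 2.0])   # assumed prior box

# 1. Run the (expensive) simulator a modest number of times for training data.
train_thetas = rng.uniform(low, high, size=(500, 2))
train_stats = np.stack([summaries(simulator(t, rng)) for t in train_thetas])

# 2. Fit the surrogate mapping theta -> summary statistics by least squares.
coef, *_ = np.linalg.lstsq(poly_features(train_thetas), train_stats, rcond=None)

def surrogate(thetas):
    """Predicted summary statistics, shape (n, 2); vectorised, no simulation."""
    return poly_features(thetas) @ coef

# 3. Point estimate: minimise the squared distance between surrogate predictions
#    and observed summaries over many random candidates (cheap on the surrogate).
true_theta = np.array([1.5, 0.8])
s_obs = summaries(simulator(true_theta, rng))
candidates = rng.uniform(low, high, size=(20_000, 2))
loss = np.sum((surrogate(candidates) - s_obs) ** 2, axis=1)
theta_hat = candidates[np.argmin(loss)]
print("point estimate:", theta_hat)
```

Evaluating 20,000 candidates here requires no simulator runs at all, which is where the speed-up comes from; the quality of the estimate is limited by how well the surrogate approximates the true mapping.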

[1] Singh, P., & Hellander, A. (2017, December). Surrogate assisted model reduction for stochastic biochemical reaction networks. In Proceedings of the 2017 Winter Simulation Conference (p. 138). IEEE Press.