Abstract
Extracting information from data, often also called data analysis, is an important scientific task. Statistical approaches, which use methods from probability theory and numerical analysis, are well-founded but difficult to implement: the development of a statistical data analysis program for any given application is time-consuming and requires knowledge and experience in several areas. In this paper, we describe AutoBayes, a high-level system that generates data analysis programs from statistical models. A statistical model specifies the properties of each problem variable (i.e., observation or parameter) and its dependencies in the form of a probability distribution. It is thus a fully declarative problem description, similar in spirit to a set of differential equations. From this model, AutoBayes generates optimized and fully commented C/C++ code which can be linked dynamically into the Matlab and Octave environments. Code is generated by schema-guided deductive synthesis. A schema consists of a code template and applicability constraints which are checked against the model during synthesis using theorem proving technology. AutoBayes augments schema-guided synthesis with symbolic-algebraic computation and can thus derive closed-form solutions for many problems. We outline the AutoBayes system, its theoretical foundations in Bayesian probability theory, and its application by means of a detailed example.
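To make the notion of a closed-form solution derived from a statistical model concrete, consider the simplest textbook case: observations x_1, ..., x_n drawn independently from a Gaussian with unknown mean and variance. Setting the derivatives of the log-likelihood to zero yields the estimates directly, with no iterative optimization. The Python sketch below is an illustration only. It is not AutoBayes output (AutoBayes emits C/C++) and does not use AutoBayes model syntax; the function names `gaussian_ml_estimate` and `log_likelihood` are hypothetical and chosen for this example.

```python
import math


def gaussian_ml_estimate(xs):
    """Closed-form maximum-likelihood estimates for x_i ~ N(mu, sigma_sq).

    Illustrative only: the model and names are assumptions made for this
    sketch, not something taken from AutoBayes itself.
    """
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one observation")
    # d/d(mu) of the log-likelihood vanishes at the sample mean.
    mu = sum(xs) / n
    # d/d(sigma_sq) vanishes at the mean squared deviation (divisor n, not n - 1).
    sigma_sq = sum((x - mu) ** 2 for x in xs) / n
    return mu, sigma_sq


def log_likelihood(xs, mu, sigma_sq):
    """Gaussian log-likelihood of the data for the given parameters."""
    n = len(xs)
    squared_error = sum((x - mu) ** 2 for x in xs)
    return -0.5 * n * math.log(2 * math.pi * sigma_sq) - squared_error / (2 * sigma_sq)


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0]
    mu_hat, sigma_sq_hat = gaussian_ml_estimate(data)
    print(mu_hat, sigma_sq_hat, log_likelihood(data, mu_hat, sigma_sq_hat))
```

For models where no such closed form exists, an iterative numerical method would be needed instead; the sketch covers only the case in which the symbolic solution is available.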
© 2000 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Fischer, B., Schumann, J., Pressburger, T. (2000). Generating Data Analysis Programs from Statistical Models. In: Taha, W. (eds) Semantics, Applications, and Implementation of Program Generation. SAIG 2000. Lecture Notes in Computer Science, vol 1924. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45350-4_15
DOI: https://doi.org/10.1007/3-540-45350-4_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-41054-6
Online ISBN: 978-3-540-45350-5
eBook Packages: Springer Book Archive