Generating Data Analysis Programs from Statistical Models

Position Paper

  • Conference paper
Semantics, Applications, and Implementation of Program Generation (SAIG 2000)

Abstract

Extracting information from data, often also called data analysis, is an important scientific task. Statistical approaches, which use methods from probability theory and numerical analysis, are well-founded but difficult to implement: the development of a statistical data analysis program for any given application is time-consuming and requires knowledge and experience in several areas. In this paper, we describe AutoBayes, a high-level generator system for data analysis programs from statistical models. A statistical model specifies the properties for each problem variable (i.e., observation or parameter) and its dependencies in the form of a probability distribution. It is thus a fully declarative problem description, similar in spirit to a set of differential equations. From this model, AutoBayes generates optimized and fully commented C/C++ code which can be linked dynamically into the Matlab and Octave environments. Code is generated by schema-guided deductive synthesis. A schema consists of a code template and applicability constraints which are checked against the model during synthesis using theorem proving technology. AutoBayes augments schema-guided synthesis by symbolic-algebraic computation and can thus derive closed-form solutions for many problems. In this paper, we outline the AutoBayes system, its theoretical foundations in Bayesian probability theory, and its application by means of a detailed example.
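
The schema idea in the abstract (a code template paired with applicability constraints that are checked against the model) can be illustrated with a small sketch. This is not AutoBayes's schema language or its generated output: the names Model, Schema, GAUSS_ML and synthesize are hypothetical, the constraint check is a plain predicate rather than theorem proving, the closed-form maximum-likelihood formulas for a Gaussian are written into the template by hand rather than derived symbolically, and the sketch emits Python where AutoBayes emits C/C++.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Model:
    """Hypothetical, heavily simplified statistical model description."""
    observed: str                 # name of the data vector
    distribution: str             # distribution family, e.g. "gauss"
    parameters: Tuple[str, ...]   # names of the unknown parameters
    iid: bool = True              # independent, identically distributed data?


@dataclass(frozen=True)
class Schema:
    """A code template plus an applicability constraint on the model."""
    name: str
    applies: Callable[[Model], bool]
    template: str


# Assumed example schema: closed-form ML estimates for an i.i.d. Gaussian.
GAUSS_ML = Schema(
    name="gauss_closed_form_ml",
    applies=lambda m: (
        m.distribution == "gauss" and m.iid and len(m.parameters) == 2
    ),
    template=(
        "def estimate({x}):\n"
        "    # Closed-form maximum-likelihood estimates (mean, std. deviation)\n"
        "    n = len({x})\n"
        "    {mu} = sum({x}) / n\n"
        "    {sigma} = (sum((v - {mu}) ** 2 for v in {x}) / n) ** 0.5\n"
        "    return {mu}, {sigma}\n"
    ),
)


def synthesize(model: Model, schemas: Sequence[Schema]) -> Optional[str]:
    """Return source code from the first applicable schema, or None."""
    for schema in schemas:
        if schema.applies(model):
            mu, sigma = model.parameters
            return schema.template.format(x=model.observed, mu=mu, sigma=sigma)
    return None


# Declarative model: x ~ gauss(mu, sigma), with mu and sigma unknown.
model = Model(observed="x", distribution="gauss", parameters=("mu", "sigma"))
source = synthesize(model, [GAUSS_ML])

# Load the generated function and run it on a small data set.
namespace = {}
exec(source, namespace)
mu_hat, sigma_hat = namespace["estimate"]([1.0, 2.0, 3.0, 4.0])
```

A model that violates the constraint (for instance, a different distribution family) yields None here; in a fuller system, that is the point at which other schemas, such as iterative numerical ones, would be tried.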

Copyright information

© 2000 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Fischer, B., Schumann, J., Pressburger, T. (2000). Generating Data Analysis Programs from Statistical Models. In: Taha, W. (eds) Semantics, Applications, and Implementation of Program Generation. SAIG 2000. Lecture Notes in Computer Science, vol 1924. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45350-4_15

  • DOI: https://doi.org/10.1007/3-540-45350-4_15

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-41054-6

  • Online ISBN: 978-3-540-45350-5

  • eBook Packages: Springer Book Archive
