
Parallel MARS Algorithm Based on B-splines

Published in: Computational Statistics

Summary

We investigate one possible way of improving Friedman’s Multivariate Adaptive Regression Splines (MARS) algorithm, which is designed for flexible modelling of high-dimensional data. In our version of MARS, called BMARS, we use B-splines instead of truncated power basis functions. The fact that B-splines have compact support allows us to introduce the notion of a “scale” of a basis function. The algorithm starts building up models using large-scale basis functions and switches over to a smaller scale once the fitting ability of the large-scale splines has been exhausted. The process is repeated until a prespecified number of basis functions has been produced. In addition, we discuss a parallelisation of BMARS as well as an application of the algorithm to the processing of a large commercial data set. The results demonstrate the computational efficiency of our algorithm and its ability to generate models competitive with those of the original MARS.
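The coarse-to-fine idea described above can be illustrated with a toy one-dimensional sketch. This is not the paper's full algorithm (which handles multivariate tensor-product bases, backward elimination and parallel execution); the helper names, candidate grid and tolerance below are our own illustrative choices:

```python
import numpy as np

def hat(x, t0, t1, t2):
    """Piecewise-linear B-spline: rises from 0 at t0 to a peak of 1 at t1,
    falls back to 0 at t2, and is zero outside [t0, t2]."""
    up = np.clip((x - t0) / (t1 - t0), 0.0, 1.0)
    down = np.clip((t2 - x) / (t2 - t1), 0.0, 1.0)
    return np.minimum(up, down)

def bmars_1d(x, y, scales=(0.5, 0.25, 0.125), tol=1e-3, max_basis=8):
    """Greedy forward selection, one scale at a time: keep adding the best
    hat function of the current scale until the RSS improvement drops
    below tol, then switch to the next (smaller) scale."""
    X = np.ones((len(x), 1))                  # model starts with the intercept
    rss = np.sum((y - y.mean()) ** 2)
    chosen = []
    for s in scales:
        peaks = np.arange(s, 1.0 - s / 2, s)  # candidate peak locations on [0, 1]
        while len(chosen) < max_basis:
            best = None
            for p in peaks:
                Xc = np.column_stack([X, hat(x, p - s, p, p + s)])
                coef, *_ = np.linalg.lstsq(Xc, y, rcond=None)
                r = np.sum((y - Xc @ coef) ** 2)
                if best is None or r < best[0]:
                    best = (r, p)
            if rss - best[0] < tol:           # this scale is exhausted
                break
            rss = best[0]
            chosen.append((best[1], s))
            X = np.column_stack([X, hat(x, best[1] - s, best[1], best[1] + s)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return chosen, coef, rss
```

When the target function is itself a small-scale hat, the sketch first absorbs what it can at the coarse scale and then recovers the exact basis function once the matching scale is reached.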

[Figures 1–6 appear in the published article.]

Notes

  1. In fact, the ratio of the cost of optimising the knots of a B-spline to that of optimising a single knot of a truncated power basis function is roughly 1 : Kr, where K is the total number of candidate knot locations on a particular variable.

  2. In this paper, the multivariate models which are piecewise linear in each covariate are referred to as piecewise linear models. The same applies to the piecewise quadratic models mentioned in the later sections of the paper.

  3. The set of B-splines of the largest scale turns out to consist of a single linear function.

  4. Friedman (1991) recommends setting Jmax to 2Jfinal, where Jfinal is the size of the model after elimination of the suboptimal basis functions (see below). Thus, one generally has to run MARS or BMARS several times to determine the optimal value for Jmax.

  5. The knots of a bivariate tensor product basis function \(T\left( {x,y} \right) = B_{{t_1}}^{{l_1}}\left( x \right)B_{{t_2}}^{{l_2}}\left( y \right)\) are the four corners of its support rectangle (x1, y1), (x1, y3), (x3, y3), (x3, y1) as well as the location of its peak (x2, y2), where (x1, x2, x3) and (y1, y2, y3) are the knots of the univariate splines \(B_{{t_1}}^{{l_1}}\left( x \right)\) and \(B_{{t_2}}^{{l_2}}\left( y \right)\) respectively.

  6. One least-squares fit per candidate basis function (13) defined by an admissible triplet (j, v, t).
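The tensor-product construction in note 5 can be checked numerically. The sketch below (with our own helper names, not the paper's notation) builds a bivariate basis function from two linear B-splines and confirms that it peaks at (x2, y2) and vanishes at the corners of its support rectangle:

```python
import numpy as np

def hat(u, t0, t1, t2):
    """Univariate linear B-spline with knots (t0, t1, t2): peak of 1 at t1,
    zero outside [t0, t2]."""
    up = np.clip((u - t0) / (t1 - t0), 0.0, 1.0)
    down = np.clip((t2 - u) / (t2 - t1), 0.0, 1.0)
    return np.minimum(up, down)

def tensor_hat(x, y, kx, ky):
    """Bivariate tensor-product basis T(x, y) = B(x) * B(y); its support is
    the rectangle [kx[0], kx[2]] x [ky[0], ky[2]], with peak at (kx[1], ky[1])."""
    return hat(x, *kx) * hat(y, *ky)
```

Evaluating `tensor_hat` at the peak gives 1, and evaluating it at any of the four corners of the support rectangle gives 0, matching the knot description in note 5.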

References

  • Breiman, L., Friedman, J.H., Olshen, R.A. & Stone, C.J. (1984), Classification and Regression Trees, Wadsworth, Belmont, California.

  • Chen, Z. (1990), Beyond additive models: interactions by smoothing spline methods, Technical Report SMS-009-90, The Australian National University.

  • Cox, M.G. (1981), Practical spline approximation, Topics in Numerical Analysis, Lancaster, 79–112.

  • Fayyad, U., Piatetsky-Shapiro, G. & Smyth, P. (1996), From Data Mining to Knowledge Discovery: An Overview, in ‘Advances in Knowledge Discovery and Data Mining’, pp. 1–36.

  • Friedman, J.H. (1991), ‘Multivariate Adaptive Regression Splines’, The Annals of Statistics, 19(1), 1–141.

  • Friedman, J.H. (1981), Estimating functions of mixed ordinal and categorical variables, Technical Report 108, Stanford University.

  • Friedman, J.H. & Stuetzle, W. (1981), ‘Projection Pursuit Regression’, Journal of the American Statistical Association, 76, 817–823.

  • Geist, A., Beguelin, A., Dongarra, J., Jiang, W., Manchek, R. & Sunderam, V. (1994), PVM: Parallel Virtual Machine, MIT Press.

  • George, E.I. & McCulloch, R.E. (1993), ‘Variable selection via Gibbs sampling’, Journal of the American Statistical Association, 88, 881–889.

  • Luenberger, D.G. (1984), Linear and Nonlinear Programming, Addison-Wesley, Reading, Massachusetts.

  • McCullagh, P. & Nelder, J.A. (1983), Generalized Linear Models, Chapman and Hall.

  • Miller, A.J. (1990), Subset Selection in Regression, Chapman and Hall.

  • Stone, G. (1997), Analysis of Motor Vehicle Claims Data using Statistical Data Mining, CMIS Confidential Report CMIS-97/73, CSIRO, Australia.

  • Wahba, G. (1990), Spline Models for Observational Data, SIAM, Philadelphia.

Acknowledgements

We are most grateful to Prof J.H. Friedman for suggesting the idea of the experiment involving the synthetic data set and to Dr B. Turlach for very fruitful discussions. Our thanks are also due to the anonymous referees for their constructive comments, which greatly helped to improve the quality of this paper. The research of S. Bakin was supported by the Australian Government (Overseas Postgraduate Research Scholarship), by the Australian National University (ANU PhD Scholarship) and by the Advanced Computational Systems CRC (ACSys), Australia.

Cite this article

Bakin, S., Hegland, M. & Osborne, M.R. Parallel MARS Algorithm Based on B-splines. Computational Statistics 15, 463–484 (2000). https://doi.org/10.1007/PL00022715
