Semiparametric regression during 2003-2007.

Semiparametric regression is a fusion between parametric regression and nonparametric regression that integrates low-rank penalized splines, mixed models and hierarchical Bayesian methodology, thus allowing more streamlined handling of longitudinal and spatial correlation. We review progress in the field over the five-year period between 2003 and 2007. We find semiparametric regression to be a vibrant field with substantial involvement and activity, continual enhancement and widespread application.


Introduction
Semiparametric regression is a fusion between traditional parametric regression analysis (e.g. Cook and Weisberg [54]; Draper and Smith [77]) and newer nonparametric regression methods (e.g. Wahba [275]; Hastie and Tibshirani [126]; Green and Silverman [115]). This emerging field synthesizes research across several branches of Statistics: parametric and nonparametric regression, longitudinal and spatial data analysis, mixed and hierarchical Bayesian models, Expectation-Maximization (EM) and Markov Chain Monte Carlo (MCMC) algorithms. Semiparametric regression is a field deeply rooted in applications and its evolution reflects the increasingly large and complex problems that are arising in science and industry.
We do not view semiparametric regression as a competitor to parametric and nonparametric approaches, but rather as a bridge between them. The need for parsimonious statistical models is well-known and parametric models are often a convenient method for achieving parsimony. However, nonparametric models exist because there are many examples where parametric models do not provide a suitable fit to the data. Semiparametric modeling allows a researcher to have the best of both worlds: the parametric and the nonparametric. Those features of the data that are suitable for parametric modeling are modeled that way and nonparametric components are used only where needed. For example, in the study discussed in Section 4.1, the effect of blood lead concentration on a child's intelligence quotient is modeled with a spline in order to detect a nonlinear dose response. In that study, the effects of the numerous confounders are modeled linearly and within-child correlations are modeled by a parametric mixed model to achieve a parsimonious fit. The extreme cases, fully parametric or completely nonparametric models, can be used when they are appropriate.
Two prominent features throughout much of semiparametric regression are:
• keeping the nonparametric regression part relatively simple by using low-rank penalized splines;
• utilizing the mixed model representation of penalized splines.
These bring several benefits. Firstly, longitudinal and spatial effects are easily incorporated. Secondly, fitting and inference can be performed within the established frameworks of maximum likelihood and best prediction. Established mixed model software in R and SAS can aid implementation. If a Bayesian approach is used then the infrastructure of Bayesian inference can be called upon. This includes the BUGS software project (e.g. Lunn et al. [177]). The Bayesian/BUGS route is particularly attractive in non-standard situations, such as when the data are overdispersed or incomplete. An overarching benefit is extendability: the modularity of the mixed model-based penalized spline approach allows 'twists', such as missing data, to be incorporated in a straightforward manner.
In 2003 we published the book Semiparametric Regression (Ruppert, Wand and Carroll [232]). Since it was the first book to make use of both of these ideas, its publication 6 years ago constitutes something of a 'line in the sand' for this exciting area of research. Although Semiparametric Regression was released in April 2003, the final drafts were written in late 2002. Hence Semiparametric Regression contains a survey of the literature up until (roughly) the end of 2002.
In this review we revisit the field 5 years later and summarize the state of the field as of the end of 2007. We are pleased to report that semiparametric regression is a thriving area of research, with contributions to its theory, methodology and software being continually made by research teams around the world. Especially pleasing is the rate at which semiparametric regression is being used in applications. While surveying the area over 2003-2007 we learned about applications in several fascinating and diverse areas, including on-line auctions, genomics, air pollution, agricultural soil and cosmology. A great deal of penalized spline (especially smoothing spline) research does not make use of the mixed model representation. For example, Wood [292] and the accompanying R package mgcv (Wood [295]) mainly use generalized cross-validation (GCV) and a version of Akaike's Information Criterion (AIC).
There is also an enormous literature on flexible regression analysis that does not involve penalized splines. Examples include regression splines (e.g. Stone et al. 1997), local polynomials (e.g. Fan and Gijbels [93]) and wavelets (e.g. Ogden, [205]). Ruppert et al. [232] discuss each of these choices but promote the mixed model-based penalized spline approach to semiparametric regression. Largely because of time and space limitations, we will stay mainly with this approach throughout the review.

Summary of mixed model-based penalized spline approach
In this section we provide a summary of the mixed model-based penalized spline approach to semiparametric regression that is adopted by many of the papers in this review.
We begin with some examples of semiparametric regression models: Here $x_{1i},\dots,x_{4i}$ are scalar predictors corresponding to the response variable $y_i$, $x_{2i}$ is a vector of predictors and $u_{i,\mathrm{sbj}}$ is a random subject intercept with variance $\sigma^2_{\mathrm{sbj}}$. The term $f_2(x_{2i})$ means a smooth function of $x_{2i}$. Other functional notation is defined similarly. Model (1) is an extension of the generalised additive model paradigm that allows nonparametric bivariate components. If $(x_{3i}, x_{4i})$ correspond to geographic position then (1) is sometimes called a geoadditive model (e.g. Kammann & Wand [140]). In Model (2), $\beta_0$ and $\beta_1$ are smooth functions of the $x_1$ variable. This model is known as a varying coefficient model. Model (3) is usually called an additive mixed model since it represents the fusion of an additive model and a linear mixed model.
An example data set that benefits from (3) is shown in Figure 1. It consists of longitudinal measurements on the spinal bone mineral density of a cohort of young female subjects (source: Bachrach et al. [7]). A question of interest is how spinal bone mineral density differs among the four ethnicity groups. However, the analysis is complicated by (a) the non-linear effect of age, and (b) correlation arising from repeated measurements on the same subject. Model (3) with the $x_{1i}$s corresponding to the age measurements and the $x_{2i}$s corresponding to ethnicity indicators is appropriate.
In the mixed model approach to semiparametric regression, generic nonparametric functional relationships are handled through modelling mechanisms such as: Here $z_1,\dots,z_K$ are a set of spline basis functions. The simplest example is $z_k(x) = (x - \kappa_k)_+$ for some knot sequence $\kappa_1,\dots,\kappa_K$, where $u_+$ equals $u$ for $u \ge 0$ and equals $0$ otherwise. However, more sophisticated options now exist and these are reviewed in Section 2.1. Most of the spline bases described there are in accordance with the classical nonparametric regression method known as smoothing splines (e.g. Wahba [275]; Eubank [88]). This approach is extendable to multivariate functions using either radial basis functions (e.g. Wood [290]; Ruppert et al. [232]) or tensor products (e.g. Wood [294]). A consequence of (4) is that many frequentist semiparametric regression models are expressible as where $y$ denotes the vector of responses, $X$ and $Z$ are design matrices and $\boldsymbol{\beta}$ and $u$ are vectors containing coefficients. Here $g$ is a scalar 'link' function, evaluated element-wise for vector arguments. For a general random vector $v$, $v \sim (\boldsymbol{\mu}, \boldsymbol{\Sigma})$ is shorthand for $E(v) = \boldsymbol{\mu}$ and $\mathrm{Cov}(v) = \boldsymbol{\Sigma}$. The fixed effects term, $X\boldsymbol{\beta}$, handles covariates that enter the model linearly. The random effects component $Zu$ and corresponding covariance matrix $G$ handle non-linear effects through spline basis functions, but may also incorporate random subject effects and spatial correlation structure in longitudinal and spatial contexts. There will often be other parameters arising, for example, in the variance structure (e.g. $R = \mathrm{Cov}(y|u)$), but we will ignore these in the current discussion.
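As a concrete illustration of the truncated-line mechanism just described, the short Python sketch below (illustrative only; the review itself presents no code, and its software examples use R and BUGS) builds the basis functions $z_k(x) = (x - \kappa_k)_+$ and the corresponding design matrices $X$ and $Z$ for a toy data set:

```python
import numpy as np

def truncated_line_basis(x, knots):
    """Evaluate z_k(x) = (x - kappa_k)_+ for each knot kappa_k.

    Returns the n x K spline design matrix; (u)_+ = max(u, 0)."""
    x = np.asarray(x, dtype=float)
    return np.maximum(x[:, None] - np.asarray(knots)[None, :], 0.0)

# Toy data: n = 5 points, K = 3 equally spaced interior knots.
x = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
knots = np.array([0.25, 0.5, 0.75])

X = np.column_stack([np.ones_like(x), x])   # fixed-effects design: [1, x]
Z = truncated_line_basis(x, knots)          # random-effects design: z_k(x_i)
```

Stacking $X$ and $Z$ in this way is exactly what allows standard mixed model software to fit the spline, with the $u_k$ treated as random effects.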
Most commonly (5) is embedded in a fully specified probabilistic model. This allows fitting and inference to be achieved through the paradigms: Maximum Likelihood (ML) for $\boldsymbol{\beta}$; Restricted Maximum Likelihood (REML) for $G$; Best Prediction (BP) for $u$.
BP is defined according to minimum mean squared error and has the solution $\tilde{u} = E(u|y)$ (e.g. McCulloch, Searle & Neuhaus [191]). Depending on the form of the model (e.g. normal versus Poisson), execution of (6) can range from easy exact calculation using standard mixed model software (e.g. lme() in the R language; R Development Core Team [221]) to difficult approximation via computationally intensive algorithms such as MCMC.
The hierarchical Bayesian version of (5) takes the form where $A_{\boldsymbol{\beta}}$ and $A_G$ are hyper-parameters. In semiparametric regression it is very rare that analytical solutions for these posteriors exist, so approximation methods need to be employed. MCMC approximation via the BUGS software (e.g. Lunn et al. [177]) often provides satisfactory solutions.

Layout of review
The rapidity with which semiparametric regression is growing as a field means that a concise and informative review of the five years since 2002 is quite challenging. For instance, we estimate that more than three hundred papers in 2003-2007 are connected with the area. After surveying the literature we decided on the following layout for the remainder of the article:

Section 2: Advancement of Primitives
By primitives we mean the 'nuts and bolts' of semiparametric regression. Examples include spline basis specification, computing and asymptotic theory. Much of Ruppert et al. [232] is concerned with the primitives of semiparametric regression. However, some have undergone noticeable refinement in the past five years. Section 2 summarizes these developments.

Section 3: Advancement of Models and Methods
During 2003-2007 semiparametric regression models have continually become more sophisticated in response to the complexities of contemporary data sets and scientific questions. Section 3 reviews these advancements.

Section 4: Applications
Semiparametric regression is very much an applications-oriented branch of Statistics. In Section 4 we highlight several case studies which have benefited from the semiparametric regression paradigm.

Overlooked literature
The production of this review article has involved an immense amount of retrieval and reading over a relatively short time period. While we have tried hard to peruse all relevant contributions it is certain that some have been inadvertently overlooked. We welcome any omissions being drawn to our attention. Also, we point out that the end of 2007 cut-off for inclusion in this review is slightly fuzzy. For example, some relevant papers that we have known about for some time turned out to be 2008 or 2009 papers. These are still included.

Advancement of primitives
In this section we summarize 2003-2007 research on the primitives of semiparametric regression with emphasis on important advancement.

Univariate spline bases
All commonly used penalized spline models for a smooth real-valued function $f$ can be expressed in the form where $p$ is the degree of the polynomial component, with coefficients $\beta_0,\dots,\beta_p$, and $\{z_k(\cdot) : k = 1,\dots,K\}$ is a set of spline basis functions for handling departures from $p$th degree polynomials. The spline coefficients $u = (u_1,\dots,u_K)$ are subject to penalization. In the mixed model representation $u$ is usually taken to be random according to $N(0, G)$ for some $G$. Already it is clear that there are a lot of options for spline bases and the penalization. Without loss of generality, we can take $G = \sigma^2_u I$, since this just involves a linear transformation of the $z_k$s. There is also a lot to be said for taking the polynomial to be linear, for example, if tests for linearity are of interest. So, while $p > 1$ may sometimes be desirable, the $p = 1$ canonical form is useful for sorting out the various spline basis options. In addition, (8) is convenient for implementation since it corresponds to the standard mixed model structure (with $x_1,\dots,x_n$ being data on the $x$ variable):

Ruppert et al. ([232], Chapter 2) survey options and strategies for $K$ and the $z_k$s up to about 2002. However, there have been some interesting developments since then. Currie and Durbán [65] show how the Eilers and Marx [85] P-splines can be expressed in mixed model forms such as (8). Welham, Cullis, Kenward and Thompson [287] produce a useful exposition on how the various versions of penalized splines are connected to each other. In a similar vein to (8) they propose 'A general model for polynomial splines' that includes several options under one umbrella. A large-scale simulation comparison study shows no clear winner across all settings. Some practical advice about penalty choice and order of differencing is offered by Welham et al. [287].
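The 'without loss of generality' step above can be checked numerically: if $G = LL'$ is a Cholesky factorization, then replacing the basis columns $Z$ by $ZL$ (a linear transformation of the $z_k$s) reproduces the covariance of the random component $Zu$ with identity-covariance coefficients. A small numpy sketch (illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)

# A generic random-effects design Z and a positive-definite G.
n, K = 6, 4
Z = rng.normal(size=(n, K))
A = rng.normal(size=(K, K))
G = A @ A.T + K * np.eye(K)            # arbitrary SPD covariance for u

# If u ~ N(0, G) and G = L L' (Cholesky), then Zu has the same
# distribution as (ZL) u~ with u~ ~ N(0, I): transforming the basis
# columns absorbs G into the design.
L = np.linalg.cholesky(G)
Z_tilde = Z @ L

cov_original = Z @ G @ Z.T             # Cov(Zu) under u ~ N(0, G)
cov_transformed = Z_tilde @ Z_tilde.T  # Cov(Z~ u~) under u~ ~ N(0, I)
```

The two covariance matrices coincide exactly, which is why one may work with the canonical $G = \sigma^2_u I$ form when comparing spline bases.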
The 'minimum worst possible change' approach of Wood [290], described in Section 2.2 for multivariate smoothing, also yields univariate low-rank spline bases as a special case.
Wand and Ormerod [280] study the O'Sullivan [209] low-rank approximation of smoothing splines. The name O-splines is suggested for this exact counterpart of P-splines. Results for exact computation of the $z_k$ are derived that allow implementation in R with only a few lines of code. A simulation study shows O-splines and P-splines to be quite close in the interior, but the former to have better extrapolation behavior, and also very close to smoothing splines even for $K \ll n$. Given the well-established good properties of smoothing splines, such as natural boundary behavior and asymptotic optimality (Nussbaum [202]), the evidence points towards O-splines as the better option in comparison with P-splines, and as an excellent default for univariate spline bases in semiparametric regression analysis.

Multivariate smoothing
In principle, all smoothing techniques can be extended to the multivariate case. In practice, though, this extension is a delicate art because of the additional complications that high-dimensional domains bring. Chapter 13 of Ruppert et al. [232] summarized bivariate smoothing approaches based on kriging and splines, including low-rank extensions. General multivariate extensions were briefly described. We now summarize interesting new work in this direction from the last few years.
Wood [290] develops an approach to low-rank thin plate spline smoothing that circumvents the knot placement issue. His basis reduction is based on a 'worst possible change' criterion. For small data sets, implementation involves standard linear algebra manipulations, while Lanczos iteration (e.g. Demmel [70]) is suggested for larger sample sizes. In Wood [294] the same author opts for tensor products as a means of extending penalized spline smoothing to the multivariate situation. Scale invariance is the main mechanism for achieving this extension in an attractive way. An advantage of this version of multivariate smoothing is that each direction has its own smoothing parameter. Particular attention is also paid to the cogent incorporation of random effect structure for generalized additive mixed modeling.
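The tensor-product construction can be sketched in miniature: the bivariate basis evaluated at the data is the row-wise Kronecker (Khatri-Rao) product of two marginal bases, so the columns factor by direction and each direction can carry its own smoothing parameter. A hedged numpy illustration (random stand-in bases; real applications would use, e.g., marginal B-spline bases):

```python
import numpy as np

def row_tensor(B1, B2):
    """Row-wise tensor (Khatri-Rao) product of two marginal bases.

    Each observation's row of the bivariate basis is the outer product
    of its rows in B1 and B2, flattened to K1*K2 columns."""
    n = B1.shape[0]
    return (B1[:, :, None] * B2[:, None, :]).reshape(n, -1)

# Toy marginal bases in the two coordinate directions.
n, K1, K2 = 5, 3, 4
rng = np.random.default_rng(2)
B1 = rng.normal(size=(n, K1))   # stand-in basis in the x-direction
B2 = rng.normal(size=(n, K2))   # stand-in basis in the y-direction
B = row_tensor(B1, B2)          # n x (K1*K2) tensor-product design
```

Entry $(i,\, jK_2 + k)$ of the result equals $B_1[i,j]\,B_2[i,k]$, the defining property of the tensor-product basis.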
Fahrmeir, Kneib & Lang [90] show that common geostatistical approaches to bivariate smoothing have a representation in terms of stationary Gaussian random fields. They then point out that Gaussian random fields can be approximated by Markov random fields and that the latter have computational advantages. Markov random fields are also a common vehicle for Bayesian smoothing of spatial count data. Hence, the Markov random field approach to bivariate smoothing has the advantage of being in concert with that used for spatial count data. Kneib & Fahrmeir [148] also use the Markov random field approach to bivariate smoothing and relate it to mixed models.
Currie, Durban & Eilers [67] and Eilers, Currie & Durbán [83] treat the special case of smoothing on multidimensional grids. They develop an arithmetic that results in higher computation speed and lower storage requirements.
Paciorek [210,211] investigates the use of the spectral representation of stationary process structure (Wikle [289]) in semiparametric regression contexts. He identifies advantages for large sample sizes and MCMC mixing in the generalized response situation.
The problem of complicated domains in bivariate smoothing is addressed by Wang & Ranalli [281]. Motivated by a study on mercury concentrations in estuaries, Euclidean 'as the crow flies' distance is replaced by a geodesic 'as the fish swims' distance. This distance depends on the intrinsic structure of the domain and needs to be estimated. A procedure based on shortest path theory and Floyd's algorithm (Floyd [98]) is described.

Bayesian semiparametric regression
Bayesian semiparametric regression is progressively becoming more prevalent, and could eventually challenge the frequentist version in terms of popularity. Reasons include (1) the attractiveness of hierarchical Bayesian models for quantifying multiple sources of variability, (2) models becoming more sophisticated (e.g. dealing with complications such as missing data and measurement error) to the point that standard (likelihood-based) mixed model software cannot be used, (3) continual improvement of Monte Carlo methods for Bayesian inference, and (4) continual improvement of the BUGS computing environment (Lunn et al. [177]) for MCMC sampling from posterior distributions of interest. We expand on aspect (4) in Section 2.7.
Recent Bayesian modeling research has also impacted upon Bayesian semiparametric regression since 2002. A prominent example is Gelman [106] which advises on non-informative prior distribution specification for variance parameters and, in particular, argues against the use of inverse Gamma distributions.
With robustness in mind, Jullion & Lambert [139] study prior specification for Bayesian P-splines models. The Bayesian hierarchical modeling is quite advanced and uses, for example, Dirichlet-based mixture priors.
Several other Bayesian semiparametric regression contributions, involving new models and methodology, are described in Section 3.
Bayesian methodology and software are currently areas of vigorous research activity in both Statistics and Computer Science (see Section 2.5). This has led to the Bayesian brand of semiparametric regression becoming more prominent in recent years, a trend that we expect to continue.

Monte Carlo methods
Since the early 1990s Markov Chain Monte Carlo (MCMC) methods have been a mainstay of Bayesian inference. However, in the intervening years, we have noticed the emergence of new Monte Carlo methods. Some of these are more elaborate versions of MCMC, while others fall outside of the Markov chain paradigm.
Staying first within the MCMC family we note that specifically tailored Metropolis-Hastings schemes are developed by Baladandayuthapani et al. [8], Paciorek & Schervish [213], Gryparis et al. [117] and Baladandayuthapani et al. [9]. The BayesX software package makes use of elaborate MCMC schemes. In each case, the goal is improved mixing for the complex semiparametric regression model at hand. This means that inference for parameters of interest can be made with smaller MCMC samples.
The single component adaptive Metropolis algorithm of Haario, Saksman & Tamminen [120] is a recent modification of the random walk Metropolis-Hastings algorithm that adapts according to what it has learnt from previous sampled iterates. The resulting chain is not Markovian, although Haario et al. [120] prove that it does lead to samples from the correct posterior distributions. The adaptation aspect means that fiddly tuning runs are not required. Nott [201] successfully applied Haario et al.'s algorithm to a semiparametric regression setting.
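A simplified caricature of such an adaptive random-walk scheme (not Haario et al.'s exact algorithm, which has additional safeguards underlying their correctness proof) can be sketched in a few lines of Python, here targeting a standard normal 'posterior':

```python
import numpy as np

rng = np.random.default_rng(3)

def log_target(theta):
    # Standard normal log-density up to a constant (stand-in posterior).
    return -0.5 * theta ** 2

# Random-walk Metropolis whose proposal scale adapts to the spread of
# past iterates, removing the need for manual tuning runs.
n_iter, theta, scale = 5000, 0.0, 1.0
draws = np.empty(n_iter)
for t in range(n_iter):
    prop = theta + scale * rng.normal()
    if np.log(rng.uniform()) < log_target(prop) - log_target(theta):
        theta = prop                       # accept the proposal
    draws[t] = theta
    if t >= 100:
        # Adapt: 2.4 * sd of the history (the classical 1-d scaling).
        scale = 2.4 * draws[: t + 1].std() + 1e-6
```

Because the proposal depends on the whole history, the resulting chain is not Markovian, which is exactly the point addressed by Haario et al.'s theory.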
Quasi-Monte Carlo is a vibrant research area in the general problem of high-dimensional numerical integration via importance sampling. It differs from ordinary Monte Carlo integral approximation in that random samples are replaced by cleverly chosen deterministic ones. While much of quasi-Monte Carlo research is outside of Statistics, Hickernell, Lemieux & Owen [132] provides a recent survey for a statistical audience. Kuo, Dunsmuir, Sloan, Wand & Womersley [155] apply state-of-the-art quasi-Monte Carlo algorithms to a class of statistical problems that encompass some important semiparametric regression models.
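The flavor of quasi-Monte Carlo is easy to convey in one dimension: replace pseudo-random uniforms by a deterministic low-discrepancy sequence such as the van der Corput sequence. The Python sketch below (a toy illustration, unrelated to the algorithms of Kuo et al.) approximates the integral of x^2 over [0, 1], whose true value is 1/3, both ways:

```python
import numpy as np

def van_der_corput(n, base=2):
    """First n points of the base-b van der Corput sequence, the
    standard one-dimensional low-discrepancy point set."""
    seq = np.zeros(n)
    for i in range(n):
        f, k, x = 1.0, i + 1, 0.0
        while k > 0:            # reflect the base-b digits of k about
            f /= base           # the radix point
            x += f * (k % base)
            k //= base
        seq[i] = x
    return seq

n = 1024
qmc_est = np.mean(van_der_corput(n) ** 2)                       # deterministic
mc_est = np.mean(np.random.default_rng(4).uniform(0, 1, n) ** 2)  # pseudo-random
```

For smooth integrands the deterministic estimate converges at nearly O(1/n), compared with the O(n^{-1/2}) of ordinary Monte Carlo.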
Sequential Monte Carlo samplers are a generalization of importance sampling that produce weighted samples from the target distribution by sampling sequentially from a slowly evolving set of distributions. Del Moral, Doucet & Jasra [69] is the main reference for this emerging methodology. Fan, Leslie and Wand [95] represents early work on application of sequential Monte Carlo samplers to Bayesian semiparametric regression.

Computer science interface
The foreword of a recent special issue of Statistical Science proclaimed "the dissolving of the frontier between Statistics and Computer Science" (Casella and Robert [43]). In 2006 Statistica Sinica had a special issue titled Challenges in Statistical Machine Learning. Hastie, Tibshirani and Friedman's cross-disciplinary book The Elements of Statistical Learning has had colossal impact since its publication in 2001. In keeping with this zeitgeist, strong connections between semiparametric regression and contemporary Computer Science are becoming apparent.
Most of the connections are concerned with methodology for classification (or supervised learning in the Computer Science world) and the sub-field of Computer Science known as Machine Learning. Support vector machines (e.g. Moguerza & Muñoz [193]) and other kernel machines share many attributes and issues with nonparametric regression (e.g. Hastie & Zhu [128]). Wahba's [276] comment on Moguerza & Muñoz [193] describes recent convergence between support vector machine and regularization research. Pearce & Wand [215] show how penalized splines and semiparametric regression structure such as additive models can be embedded within the kernel machine framework.
Boosting (Schapire [235]), described in Section 3.1, is another innovation from Machine Learning that is now benefiting semiparametric regression. For example, Bühlmann & Yu [32] use smoothing spline theory and simulations to explain the interplay between the number of boosting iterations and the 'weakness' of the smoother. Tutz & Reithinger [266] apply their lessons to semiparametric mixed models and derive an alternative fitting algorithm called BoostMixed.
Another area on the Computer Science interface where we see great potential for benefits to semiparametric regression is graphical models (e.g. Jordan [137]). Wand [278] provides detailed discussion on this topic. Directed acyclic graphs have become a common way of representing hierarchical Bayesian models and, indeed, comprise the architecture on which BUGS is built (Cowell et al. [56]). Figure 2 is a directed acyclic graph representation of the model: where the data are $(x_i, y_i) \in \mathbb{R} \times \{0, 1\}$, $1 \le i \le n$, the $z_k$ are a spline basis as described in Section 1.1, and $\sigma_\beta, A > 0$ are hyper-parameters. Nodes of the graph correspond to the components of the model, while arrows convey 'parent-child' relationships of the hierarchical structure. Suppose we add the complication that the $x_i$ in (9) are subject to measurement error and that we instead observe $w_i = x_i + z_i$, where the $x_i$ are now modeled as coming from a $N(\mu_x, \sigma^2_x)$ distribution and the contaminating variable $z_i$ is from a known fixed distribution. Then an appropriate hierarchical Bayesian model is that represented by Figure 3, a more complex graph with four additional edges and nodes.
MCMC is currently the most common mechanism for approximation of posteriors in the models depicted in Figures 2 and 3. The graphical models setting allows for graph-theoretic structure, such as Markov blankets, to be exploited in the design and implementation of MCMC algorithms (Jordan [137]). An emerging alternative to MCMC is variational approximation (e.g. Jordan, Ghahramani, Jaakkola and Saul [138]). Joint work between the second author and J.T. Ormerod is investigating variational approximations that are specific to semiparametric regression analysis.
Interplay with Computer Science is one of the most exciting recent developments in semiparametric regression. We anticipate this to be an increasingly fruitful area of research.
Figure 3: Directed acyclic graph representation of model (9), but with the predictor subject to measurement error. Shaded nodes correspond to the observed data.

Asymptotic theory
Hall & Opsomer [121] is the first paper to study the asymptotic theory of penalized splines. They replace the nonparametric regression model $y_i = m(x_i) + e_i$, $i = 1,\dots,n$, where $x_i$ lies in a compact interval $I$, by the model $y(t) = m(t) + e(t)$, $t \in I$, where $e(t) = n^{-1/2} v(t)^{-1/2} DW(t)$ and $DW(t)$ is the 'derivative' of standard Brownian motion in the sense that $DW(t)\,dt = dW(t)$. This white noise plus drift model is "asymptotically equivalent" to nonparametric regression, meaning that the distribution of $y_1,\dots,y_n$ converges to that of $y(t)$, $t \in I$, in a metric due to Le Cam (Brown and Low [30]; Brown, Cai, Low and Zhang [29]). Hall and Opsomer [121] use an idealized version of a penalized spline where there is a continuum of knots. Their spline is $\int_I \beta(s)\rho(s)(x - s)^p_+\,ds$, where $\beta(s)$ is the spline coefficient at knot $s$, $\rho(s)$ is the knot density, and $(x - s)^p_+$ is the spline basis function with knot $s$. In this framework, they find asymptotic expressions for the bias and the stochastic part of the penalized spline estimator. These expressions are infinite series with terms depending on the eigenvalues and eigenvectors of a certain functional operator. They derive an expression for the mean integrated squared error in terms of the sample size $n$, the degree $p$ of the spline and the penalty parameter $\lambda$. If $\lambda$ is a constant multiple of $n^{-2(p+1)/(2p+3)}$, then the mean integrated squared error is $O(n^{-2(p+1)/(2p+3)})$, which is the optimal rate for functions with $p + 1$ square-integrable derivatives (Stone [250]).
Penalized spline asymptotics with a finite but increasing number of knots can be divided into two cases, depending on the rate at which the number of knots K increases with the sample size n. "Small-K" asymptotics are similar to those of ordinary least squares regression splines, that is, splines fit by least squares without a penalty. "Large-K" asymptotics are similar to those of smoothing splines. The bias of a penalized spline has two components, the approximation (or modeling) bias and the shrinkage (or smoothing) bias. Approximation bias is the bias of an ordinary least squares regression spline and is due solely to the approximation of the regression function by a spline. Shrinkage bias is the difference between the bias of the penalized spline and the approximation bias, and so is the additional bias due to the penalty. Under large-K asymptotics, the approximation bias is negligible compared to the shrinkage bias. Hall and Opsomer's [121] framework is an extreme case of large-K asymptotics. Under small-K asymptotics, the approximation bias converges to zero at the same rate as, or more slowly than, the smoothing bias. The approximation bias is controlled by K, and under small-K asymptotics K is a smoothing parameter. Under large-K asymptotics, the approximation bias is negligible (or exactly zero in Hall & Opsomer's case), the exact value of K has no effect on the asymptotic distribution (provided only that K grows fast enough to be in the large-K case), and the penalty parameter λ is the only smoothing parameter. We believe that large-K asymptotics are the most relevant to current practice. The original penalized spline methodology proposed by Eilers & Marx [85] assumes that the number of knots is sufficiently large that approximation bias is negligible compared to smoothing bias.
Numerical evidence in Ruppert [230] supports this assumption, as does the current practice of using the data to carefully select the penalty parameter while using some rule of thumb applied to the sample size to select the number of knots. Moreover, under small-K asymptotics, one needs to use a data-based method to select K. One might also need to be careful about the locations as well as the number of knots. These issues have not been investigated, except in the case of pure regression splines with no roughness penalty, where both the number and locations of the knots are chosen ([73]), and the hybrid adaptive splines of Luo & Wahba [178], which use both adaptive knot selection and a roughness penalty.
Li & Ruppert [162] use large-K asymptotics. They study Eilers and Marx's [85] P-splines which involve B-splines and difference penalties on the spline coefficients. Li & Ruppert [162] obtain simple, explicit expressions for the asymptotic bias and variance. This allows asymptotic distributions of P-splines to be compared with those of kernel regression, local polynomial regression, and smoothing splines. Their results are restricted to the cases of zero-degree or linear splines and a first or second order difference penalty. We say that a penalty is of qth order if the penalty is on the qth derivative (O-splines) or the qth finite difference (P-splines). O-splines (Wand & Ormerod [280]) use the same penalty as used by smoothing splines, but are similar to the P-splines of Eilers & Marx [85] in that a reduced set of knots is used.
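The qth-order difference penalty of P-splines is easy to construct explicitly: the penalty matrix is D_q' D_q, where D_q takes qth-order finite differences of the K spline coefficients. A short numpy sketch (illustrative only) shows that, for q = 2, coefficient vectors that are linear in the index are unpenalized:

```python
import numpy as np

def difference_penalty(K, q):
    """P-spline penalty matrix D_q' D_q, where D_q takes qth-order
    finite differences of the K spline coefficients."""
    D = np.diff(np.eye(K), n=q, axis=0)   # (K - q) x K difference matrix
    return D.T @ D

# A second-order penalty on K = 6 coefficients: it penalizes
# sum_k (u_{k+2} - 2 u_{k+1} + u_k)^2.
P = difference_penalty(6, 2)

# A linear coefficient sequence lies in the null space of the penalty.
u_linear = np.arange(6, dtype=float)
```

This null space is why a second-order difference penalty shrinks the fit toward a linear function rather than toward zero.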
In a nutshell, Li & Ruppert [162] found that P-splines with a $q$th order penalty are asymptotically equivalent to Nadaraya-Watson kernel regression estimators with the equivalent kernel found by Silverman [239] for smoothing splines with a $q$th order penalty. The asymptotic distribution of $\hat{m}(x)$ at any fixed $x$ does not depend on the degree $p$, but the minimum rate at which $K$ must increase does depend on $p$, and the rate is slower if higher degree splines are used. First consider the case where the $x$s are equally spaced on a finite interval. For first-order penalties, the equivalent kernel for $\hat{m}(x)$ in the interior of $I$ is the double-exponential kernel, which is second order. This is the equivalent kernel for smoothing splines with a first-order penalty (Silverman [239]). If the penalty parameter $\lambda$ is chosen at the optimal rate, then the equivalent bandwidth $h$ satisfies in distribution as $n \to \infty$. In the boundary regions of $I$, which consist of the points within a multiple of $n^{-1/5}$ of the left or right boundaries, the equivalent kernel is of first order. At the boundaries themselves, the equivalent kernel is an exponential function. If the penalty parameter is chosen to be optimal for the interior, then in the boundary region of $I$ bias dominates and the convergence rate is $O(n^{-1/5})$, the same as for a Nadaraya-Watson kernel estimator.
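To make the equivalent-kernel statement concrete, the following Python sketch (an illustration, not the authors' code) evaluates a Nadaraya-Watson estimator with the double-exponential kernel at an interior point of a simulated data set:

```python
import numpy as np

def nadaraya_watson(x0, x, y, h, kernel):
    """Nadaraya-Watson estimate: a kernel-weighted average of the y_i."""
    w = kernel((x0 - x) / h)
    return np.sum(w * y) / np.sum(w)

def double_exp(u):
    # Double-exponential (Laplace) kernel: the second-order equivalent
    # kernel for first-order-penalty splines in the interior.
    return 0.5 * np.exp(-np.abs(u))

rng = np.random.default_rng(5)
n = 400
x = np.sort(rng.uniform(0, 1, n))
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, n)

# Estimate the regression function at the interior point x = 0.5,
# where the true value is sin(pi) = 0.
m_hat = nadaraya_watson(0.5, x, y, h=0.05, kernel=double_exp)
```

In the interior, a first-order-penalty P-spline fit behaves asymptotically like this estimator with a bandwidth determined by the penalty parameter.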
For second-order penalties, Li & Ruppert [162] find that the equivalent kernel in the interior is fourth order, so takes negative as well as positive values, and is proportional to $\exp(-|x|)\{\cos(x) + \sin(|x|)\}$, $-\infty < x < \infty$, which is the equivalent kernel for cubic smoothing splines (Silverman [239]), which also have a second-order penalty. The rate of convergence in the interior is $n^{-4/9}$, the same as for a Nadaraya-Watson kernel estimator with a fourth-order kernel. In the boundary region the equivalent kernel is only second order and the rate of convergence is slower.
The results so far assume equally-spaced knots. Li & Ruppert [162] also study unequally-spaced knots. In this case, the asymptotic bias of m(x) depends on derivatives of the design density, which means that penalized splines are not "design-adaptive" in the sense of Fan [92].
Penalized splines have slower convergence at the boundaries of I than in the interior, whereas local polynomial regression with odd-degree polynomials has the same rate of convergence at the boundaries as in the interior. This might seem like an advantage of local polynomial smoothing over penalized splines. However, if the widely-used local linear regression smoother is compared with penalized splines having the typical second-order penalty, the two have the same boundary rate of convergence, while in the interior the penalized spline converges faster. Thus, as typically implemented in practice, penalized splines suffer no disadvantage in rate of convergence relative to local linear estimators and, in fact, have an advantage in the interior region.
Kauermann, Krivobokova & Fahrmeir [144] study small-K asymptotics for generalized spline modeling, that is, with possibly non-Gaussian responses and a link function relating the expected response to the spline. They put an upper bound on the rate at which the smoothing parameter increases and K is required to grow at a fixed rate, rather than faster than this rate as in Li & Ruppert [162]. The framework is the generalized linear mixed model, and Laplace approximation is used to integrate out the random effects. The authors obtain rates of convergence for the mean squared error and expressions for the asymptotic bias and variance.
Kauermann [142] compares REML and C_p methods of selecting the amount of smoothing for univariate penalized splines. Both asymptotic and finite-sample simulation results are presented. The asymptotics assume that K is fixed. One general conclusion is that REML tends to under-smooth, in that it leads to less smoothing than is optimal for minimizing mean squared error.
In contrast, C_p targets the mean squared error-optimal amount of smoothing. However, a more detailed look at Kauermann's results shows that REML and C_p have very different behaviors, and which one smooths more depends on the underlying regression function, the number of knots, the sample size, and the random sample itself. One advantage of REML can be seen in Kauermann's [142] results: the REML choice of the amount of smoothing is less, and often far less, variable than that of C_p.
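As a minimal numerical illustration of C_p selection for a linear smoother (not Kauermann's exact setup; the basis, penalty and grid in the sketch below are illustrative), one minimizes RSS(λ) + 2σ²·tr(H_λ), where H_λ is the hat matrix of the penalized least-squares fit:

```python
import numpy as np

def cp_choose(B, D, y, sigma2, lams):
    """Mallows' C_p over a grid of smoothing parameters for the penalized
    least-squares smoother with design B and penalty matrix D:
    C_p(lam) = RSS(lam) + 2 * sigma2 * df(lam), df = trace of the hat matrix."""
    best_cp, best_lam = np.inf, None
    for lam in lams:
        H = B @ np.linalg.solve(B.T @ B + lam * (D.T @ D), B.T)
        resid = y - H @ y
        cp = resid @ resid + 2.0 * sigma2 * np.trace(H)
        if cp < best_cp:
            best_cp, best_lam = cp, lam
    return best_lam
```

In practice σ² must itself be estimated (e.g. from a deliberately under-smoothed fit), which is one source of the extra variability of C_p relative to REML noted above.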
As described more fully in Section 3.1, Bühlmann & Yu [32] derive asymptotics for boosting in a nonparametric regression context.
Our view of asymptotic theory is that, at least at present, it is mainly of theoretical interest. Penalized spline methodology already had a well-established place in practice before the recent advances in large-sample theory, and we have not yet seen cases where asymptotic theory has led to new methodology or changes in practice. It is well-known that, in nonparametric estimation, it often takes extremely large sample sizes before the asymptotics "kick in". So to be of practical value, asymptotics must be carefully compared with finite-sample results, either exact or by simulation. Nonetheless, asymptotics are important because they show that low-rank penalized splines can achieve the same rates of convergence as full-rank estimators such as smoothing splines.

Software
Semiparametric regression research is now being conducted at a time of rapid change in computing technology. In particular, the Internet age now facilitates fast and convenient dissemination of code. Software for semiparametric regression is continually being added to the Comprehensive R Archive Network (CRAN) (http://cran.r-project.org) allowing free widespread use for anyone who 'speaks' R (R Development Core Team [221]). Developments in commercial packages are also afoot. For example, SAS (SAS Institute, Incorporated [234]) added PROC GAM for generalized additive model analyses in 2000.
Generalized additive model analysis in R is now well-served by the packages gam (Hastie [125]), mgcv (Wood [295]) and VGAM (Yee [299]). The mgcv package is accompanied by the book of Wood [292], which contains numerous illustrations of its use. It also provides for automatic selection of degrees of freedom values via GCV. The VGAM package distinguishes itself by facilitating the 'vector' extension of generalized additive models (Yee & Wild [302]) and now provides for quantile regression (Yee [298]). Additive semiparametric quantile regression is also available in R's quantreg package (Koenker [150]).
In Ruppert et al. [232] we mentioned SemiPar as a suite of S-PLUS functions to accompany the book's mixed model-based methodology. It has evolved into a package on CRAN (Wand et al. [279]). Other packages with direct links to semiparametric regression include AdapFit (Krivobokova [152]) on spatial adaptive smoothing and polspline (Kooperberg [151]) on regression spline fitting.
BayesX is a public domain software package that supports Bayesian semiparametric regression analysis using MCMC. It is housed in the Department of Statistics, University of Munich, Germany, and its current Internet address is www.stat.uni-muenchen.de/∼bayesx/. Brezger, Kneib & Lang [26] provides an overview of the capabilities of BayesX. They also demonstrate superior mixing and speed of their MCMC implementations in comparison to WinBUGS.
Several other software modules indirectly benefit semiparametric regression analysis through their support of related methodology such as geostatistics, kernel machines and mixed models. While they exist in a variety of forms, we will mainly confine discussion to those available on CRAN.
As we explain later in Section 3.2 kernel machines have fundamental connections with semiparametric regression. The R packages e1071 (Dimitriadou et al. [74]) and kernlab (Karatzoglou et al. 2007) provide for kernel machine fitting, including support vector machines.
As demonstrated by Ngo & Wand [200], mixed model software can be very useful for semiparametric regression analysis. A key feature is the support of general random effects design matrices (Zhao et al. [310]). The SAS procedure PROC MIXED and the R package nlme (Pinheiro et al. [218]) each support general design matrices. The function glmmPQL() in the package MASS (Venables & Ripley [269]) has structure similar to that of lme() and facilitates generalized response semiparametric regression analyses via penalized quasi-likelihood. In lmeSplines (Ball [10]) mixed model-based splines are the main focus. Exact likelihood ratio tests for semiparametric regression analysis, as discussed in Section 3.6, are supported by the R package RLRsim (Scheipl [236]).
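The key idea that makes mixed model software usable here is the mixed model representation of a penalized spline: fixed effects carry the polynomial part, the knot coefficients become random effects, and the ridge penalty λ plays the role of the variance ratio σ_ε²/σ_u². A bare-bones numpy version of this representation (truncated-line basis; all names and parameter choices below are illustrative, not from any of the cited software) is:

```python
import numpy as np

def mixed_model_spline(x, y, knots, lam):
    """Penalized spline as a linear mixed model: fixed part X = [1, x],
    random part Z = truncated lines (x - kappa)_+ for each knot kappa.
    The ridge penalty lam on the random coefficients corresponds to the
    variance ratio sigma_eps^2 / sigma_u^2 in the mixed model."""
    X = np.column_stack([np.ones_like(x), x])        # fixed effects design
    Z = np.maximum(x[:, None] - knots[None, :], 0.0)  # random effects design
    C = np.column_stack([X, Z])
    # Penalize only the random (knot) coefficients, not the fixed effects.
    P = np.diag([0.0, 0.0] + [lam] * len(knots))
    coef = np.linalg.solve(C.T @ C + P, C.T @ y)
    return C @ coef  # BLUP-style fitted curve
```

In real use the variance components, and hence lam, would be estimated by (RE)ML via lme(), PROC MIXED or similar, rather than fixed by hand.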
As we discussed in Section 2.3, practical Bayesian inference has benefited enormously from the BUGS software project (Lunn et al., 2000). The employment of BUGS is currently the fastest way to get hierarchical Bayesian models fitted, or at least prototyped. Ruppert et al. [232] and Crainiceanu, Ruppert & Wand [63] demonstrate the use of BUGS for Bayesian semiparametric regression analysis. A brief example, which incorporates the variance component prior recommendations of Gelman [106], is the Bayesian logistic nonparametric regression model given at (9). Figure 2 provides a graphical representation of this model. Implementation in BUGS involves model specification code in which tauBeta and tauA are the reciprocals of the hyper-parameters σ_β^2 and A^2. WinBUGS, the most popular version of BUGS, can generate samples from the posteriors of interest from such code via a graphical user interface. However, a major breakthrough for efficient and well-managed analyses is the R package BRugs (Thomas, O'Hara, Ligges and Sturtz [257]; Ligges et al. [165]), which was first released in 2005, and its predecessor R2WinBUGS (Sturtz, Ligges and Gelman [254]; Sturtz et al. [253]), first released in 2004. These packages allow a single R script to (1) set up the data, spline basis functions, and various tweaking factors; (2) write a BUGS script and call BUGS; and then (3) produce summaries of interest using the vast graphical capabilities of R. These important facets are not available if WinBUGS is used alone. Crainiceanu et al. [63] illustrated this approach with R2WinBUGS. However, our most recent Bayesian semiparametric regression work (as yet unpublished) has employed BRugs.

Advancement of models and methods
After reviewing the semiparametric regression literature from 2003-2007, we categorized the various contributions according to broad themes concerning the advancement of models and methods. The following subsections emerged, and are presented alphabetically.

Boosting
Boosting uses an ensemble of classification or regression fits as a means of improving their performance. The ensemble elements are obtained iteratively, after application of standard procedures to weighted versions of the data. Boosting began within the field of machine learning during the 1990s. Early references are Schapire [235], Freund [100] and Freund & Schapire [101]. Far-reaching statistical connections, involving gradient descent methods and additive models, were discovered by Breiman [24] and Friedman, Hastie & Tibshirani [102]. These acted as catalysts for a great deal of statistical research on boosting, including its interplay with smoothing techniques. Section 2.1 of Tutz & Binder [264] describes the evolution from boosting as a means to improve classification procedures to a powerful tool for semiparametric regression analysis. We now describe some of these developments.
Bühlmann & Yu [32] provides an excellent introduction to the main ideas of boosting, by working with linear smoothers and the simplest version of boosting, known as L_2 boosting. Let y be the response vector and ŷ_λ be the vector of fitted values obtained from a linear smoother (e.g. penalized spline, kernel-based local linear) with smoothing parameter λ. Then the smoother matrix S_λ is defined by ŷ_λ = S_λ y. The L_2 boosting fit at iteration m has fitted values ŷ_{λ,m} = {I − (I − S_λ)^{m+1}} y. The case m = 1 corresponds to the 'twicing' methodology of Tukey [260]. Boosting, in general, involves repeated fitting of 'weak' classification or regression procedures. Bühlmann & Yu use asymptotics to explain a new type of bias-variance trade-off that arises from the interplay between m and λ. A very interesting result is that, for optimal values of m, the optimal smoothing parameter is larger than for the ordinary (m = 0) case. This is consistent with the boosting 'folklore' which says that iteration of weak procedures leads to better performance. For linear smoothers, 'weakness' can also be achieved by replacing S_λ by S_{λ,ν} = νS_λ, 0 < ν ≤ 1. Bühlmann & Yu [32] provide simulation results showing that very small ν and quite large m can be optimal. One example has (ν, m) = (0.01, 1691) as the optimal configuration, showing how slow convergence in boosting can be.
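The iteration behind these fitted values is simply repeated fitting of the (shrunken) smoother to the current residuals, with the fits accumulated. A small numpy sketch, with the ν-shrinkage included (the smoother matrix S passed in is arbitrary and illustrative):

```python
import numpy as np

def l2boost(S, y, m, nu=1.0):
    """L2 boosting of a linear smoother with matrix S: repeatedly fit the
    shrunken smoother nu*S to the current residuals and accumulate the fits.
    After m+1 passes the fit equals {I - (I - nu*S)^(m+1)} y, so m = 0 is the
    ordinary single fit and m = 1 is Tukey's 'twicing'."""
    fit = np.zeros_like(y, dtype=float)
    for _ in range(m + 1):
        fit = fit + nu * S @ (y - fit)  # weak fit to current residuals
    return fit
```

The residual recursion y − fit_{k+1} = (I − νS)(y − fit_k) is what yields the closed form above, and small ν with large m corresponds to the slow-convergence regime in Bühlmann & Yu's example.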
Tutz & Reithinger [266] integrated the ideas of boosting with semiparametric mixed models based on penalized splines. Their BoostMixed algorithm works with weak versions of the smoothers obtained by inflation of the smoothing parameters. Versions of AIC and BIC are used as stopping criteria. Leitenstorfer & Tutz [160] use boosting for knot selection in a regression spline approach to smoothing.
The fitting of generalized additive models via likelihood-based boosting is developed by Tutz & Binder [264], resulting in their GAMBoost algorithm. Advantages are found in the case of very many predictors. Binder & Tutz [16] use a large-scale simulation study to show that GAMBoost compares favorably with other methods for fitting generalized additive models where there are many candidate predictors.
A comprehensive account of the statistical aspects of boosting is provided by Bühlmann & Hothorn [31] and accompanying discussion. Important contributions include connections to smoothing splines and the lasso, asymptotic theory, degrees of freedom and implementation in the R computing environment.

Connections with kernel machines
Let X be a general domain. Given data (x_i, y_i) ∈ X × R, 1 ≤ i ≤ n, and a convex loss function L, kernel machine estimation of f = argmin_g E{L(y, g(x))} (where (x, y) has the same distribution as the (x_i, y_i)) involves modeling f as f(x) = c_0 + Σ_{i=1}^n c_i K(x, x_i) for a kernel function K, with c_0 and the c_i chosen to minimize L(y, c_0 1 + Kc) + λ c^T Kc. (10) Here y and c are the vectors containing the y_i and c_i, λ > 0 is the regularization parameter, K = [K(x_i, x_j)] is the Gram matrix and 1 is a vector of ones of length n. There are several ways by which (10) can be derived, including reproducing kernel Hilbert space projection theory (e.g. Kimeldorf & Wahba [146]), best linear prediction of stationary spatial processes (e.g. Stein, 1999), maximum a posteriori estimation in Gaussian processes (e.g. Rasmussen & Williams [225]) and Tikhonov regularization of ill-posed problems (Tarantola [256]). Support vector machines (e.g. Cristianini & Shawe-Taylor [64]) are a special type of kernel machine, in which L(y, g) = Σ_{i=1}^n (1 − y_i g_i)_+ for vectors y = (y_1, . . . , y_n) and g = (g_1, . . . , g_n) such that y_i ∈ {−1, 1}.
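For the special case of squared-error loss, the minimizing coefficients have a closed form, c = (K + λI)^{-1} y. A compact numpy sketch with a Gaussian kernel (the intercept c_0 is dropped for brevity, and the kernel and parameter choices are illustrative):

```python
import numpy as np

def gauss_gram(X1, X2, gamma=10.0):
    """Gram matrix of the Gaussian kernel K(x, x') = exp(-gamma * ||x - x'||^2)."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

def kernel_ridge(X, y, lam=1e-2, gamma=10.0):
    """Kernel machine with squared-error loss: minimize
    sum_i (y_i - f(x_i))^2 + lam * c' K c, whose solution is
    c = (K + lam * I)^{-1} y.  Returns a prediction function."""
    K = gauss_gram(X, X, gamma)
    c = np.linalg.solve(K + lam * np.eye(len(y)), y)
    return lambda Xnew: gauss_gram(Xnew, X, gamma) @ c
```

Swapping the squared-error loss for the hinge loss (1 − y_i g_i)_+ turns this into a support vector machine, at the cost of losing the closed-form solution.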
Hastie & Zhu [128] show that kernel machine methods, such as support vector machines for classification, are no different in substance from many statistical methods involving penalization. Their second section provides some revealing connections via the use of spectral decomposition of the Gram matrix of kernel machines.
Pearce & Wand [215] elucidate connections between the penalized spline and kernel machine literatures. Particular attention is paid to support vector machines. Computational aspects of the resulting penalized spline support vector classifiers are studied by Ormerod, Wand and Koch [208].
Takeuchi, Le, Sears & Smola [255] exemplifies research from the machine learning community on nonparametric regression problems. They tackle the nonparametric quantile regression problem using kernel machines. Included are solutions to the quantile crossing problem and incorporation of monotonicity constraints on quantile functions.
Gianola, Fernando & Stella [111] combine the ideas of linear mixed models and kernel machines to predict total genetic value for quantitative traits. Random effects are used for genetic effects, while kernel machines are used for expression of single-nucleotide polymorphisms. Liu, Lin & Ghosh [174] derived similar models, with kernel machines used to handle interactions between expression of several genes. They conclude with some interesting commentary on further opportunities for the use of kernel machine methodology in biostatistical research.

Epidemiological aspects
In our view, epidemiology is an area for which semiparametric regression has a lot to offer and this is reflected in much of our own research. We now review recent semiparametric regression research having epidemiological aspects.
Kim, Carroll & Cohen [145] take a penalized spline/mixed model approach to generalized additive model analysis in matched case-control studies. They develop an approximate cross-validation scheme to choose the smoothing parameters and explore both Monte Carlo EM and Bayesian approaches to fitting.
The methodology of Carroll, Ruppert, Crainiceanu, Tosteson & Karagas [42]) (see Section 3.10) is applied to the OPEN (Observing Protein and ENergy intake) nutritional epidemiological study. Doubly labeled water, a biomarker for nutrient intake, is used as an instrument in a nonparametric regression measurement error model that relates true protein intake with that reported via a food frequency questionnaire.
Dominici, McDermott & Hastie [75] work with smoothing spline-based Poisson additive models to assess the effect of particulate matter air pollution on mortality. The data are daily time series with smooth function components to account for seasonal and meteorological effects. Improved inferential techniques lead to strong evidence of association between short-term exposure to particulate matter less than 10 microns in diameter (PM_10) and mortality.
Congdon [53], MacNab [180] and MacNab & Gustafson [181] use semiparametric regression techniques in spatial epidemiological analyses. The first applies the methodology to spatial count data on lip cancer in Scotland and suicide data in London. The second and third of these apply the methodology to spatial count data on adverse medical events to hospitalized children, youth and elderly patients in British Columbia, Canada. Temporal and spatial trends within 84 local health areas are estimated and assessed.

Functional data analysis
Functional data analysis is concerned with data that are collected at very fine gradations in time or space. It is sometimes referred to colloquially as 'curves as data' and, from the outset, has had strong connections with nonparametric regression techniques. Ramsay & Silverman [222,223] provide a solid foundation for this relatively new field. Functional data analysis is also a close relative of longitudinal data analysis. It is not surprising that some recent work in functional data analysis makes use of modern semiparametric regression methodology.
Cardot, Ferraty & Sarda [38] consider functional linear models, where the predictor is a random function. They consider approaches involving penalized B-splines and smooth principal components regression and establish L 2 rates of convergence for each.
Coull & Staudenmayer [55] take a linear mixed model approach to self-modeling regression for multiple response curve data. An Expectation-Conditional Maximization algorithm (Meng & Rubin [192]) is developed for fitting and inference. Application is made to data on the respiratory effects of residual oil fly ash inhalation in humans.
Motivated by functional data on particulate matter exposure and heart-rate variability, Harezlak, Coull, Laird, Magari & Christiani [122] extend historical functional linear models (Malfait & Ramsay [182]) in the direction of mixed model-based splines with REML smoothing parameter selection. L_1 penalties, with AIC smoothing parameter selection, are considered as well.
Qin & Guo [219] build periodicity into functional mixed-effects models to better model the circadian rhythms of cortisol concentrations. They develop a state space representation of periodic splines and use Kalman filtering for estimation.
Morris, Vannucci, Brown & Carroll [197] make the initial step of wavelet-based nonparametric modeling of hierarchical data, using Bayesian fitting methods. Morris & Carroll [196] introduce the notion of a wavelet-based functional mixed model. Regularization and smoothing are done within the Bayesian paradigm, with easy-to-use code available at odin.mdacc.tmc.edu/∼jeffmo/papers files/wfmm supplement.html. Their methods are applied to data on the O^6-methylguanine methyltransferase deoxyribonucleic acid repair enzyme in a colon carcinogenesis experiment. Morris et al. [194] extend this method to allow for missing response data and apply it to an accelerometer profile study. Morris, Brown, Herrick, Baggerly & Coombes [195] use the wavelet-based functional mixed model to analyze mass spectrometry data. Antoniadis & Sapatinas [3] also work with wavelet-based functional mixed models. Recent work on likelihood ratio testing for penalized splines (e.g. Crainiceanu et al. [63]; see Section 3.6) is employed. Risk bounds are established and the methodology is applied to stepping-cycle data from an orthosis study.
Marx & Eilers [189] extend their earlier work on penalized signal regression (Marx & Eilers [187]) to two-dimensional signal regressors. Their example of such a regressor involves digitizations along the emission wavelength axis of curves arising from a sugar processing experiment; the second dimension arises from these digitizations being done at several excitation levels. The prediction of ash content and color is of interest. The regression fitting and modeling involves tensor product extensions of P-splines and cross-validation. This leads to an estimated coefficient surface, and an image of 't-like' statistics over the wavelength/excitation plane. In Marx & Eilers [188] and Eilers & Marx [86] the authors apply their general approach to other chemometrics data sets, with some tailoring to the problems at hand.
Reiss & Ogden [226] also treat the signal regression problem. They start by pointing out that there are two main approaches to dealing with the multicollinearity problem: smoothing (e.g. Marx & Eilers [187]) and component selection (e.g. Massy [190]). They develop functional versions of principal component regression and partial least squares, which combine these two approaches. Selection of the smoothing parameter is studied in depth. Both GCV and REML are considered, the latter arising from a linear mixed model representation of their procedures. Their simulation results show good performance of REML.
Yao & Lee [297] treat the functional principal component analysis problem (e.g. Ramsay & Silverman [222], Chapter 8) using an iterative penalized spline procedure that addresses within-subject correlation in functional data. Consistency results are established and application is made to yeast cell cycle gene expression data. Zhou, Huang & Carroll [313] use a novel low-rank principal components approach to address joint modeling of bivariate functional data and show that a seemingly unrelated regression phenomenon exists.
Baladandayuthapani, Mallick, Hong, Lupton, Turner & Carroll [9] develop an elaborate hierarchical functional data analytic model for data arising from a colon carcinogenesis study. It is tailored to suit the colonic crypt structure of rats. Bayesian representations of penalized splines are used to model signals as a function of distance within a crypt, while the Matérn covariance family is used to model correlation of signals between the crypts.

Geoadditive models
Geoadditive models combine the ideas of geostatistics and additive models. Kammann & Wand [140] give an example of such a model and show how linear mixed models can be used for geoadditive model fitting and inference. However, several other papers (e.g. Wood [290]) have treated the same structure in other ways.
Extensions of geoadditive models in the direction of generalized responses are contained in Fahrmeir & Echavarría [89] and Zhao, Staudenmayer, Coull & Wand [310]. Zhao et al. [310] deal with exponential family models, whilst Fahrmeir & Echavarría [89] treat over-dispersed and zero-inflated count data. Each uses a Bayesian mixed model framework, with fitting via MCMC, and provides applications.
The extension of geoadditive models to survival data has seen considerable research since 2003. Hennerfeind, Brezger & Fahrmeir [130] develop geoadditive survival models for both geographical point data and count data. They take a Bayesian P-spline approach and use Gaussian and Markov random fields for the spatial components. Kneib & Fahrmeir [149] lays out the mathematics underpinning geoadditive hazard regression models. Kneib [147] extends these models to handle interval censored data. Adebayo & Fahrmeir [1] develop a geoadditive discrete-time survival model and use it to analyze child mortality data. Ganguli & Wand [105] also deal with geo-referenced survival data, and use the low-rank radial smoothers of Kammann & Wand [140].
Geoadditive models have also been adapted to model space-time data. Fahrmeir, Kneib & Lang [90] and Kneib & Fahrmeir [148] use low-dimensional smooths involving time and age to model forest health data, in conjunction with Gaussian and Markov random fields for the spatial effects. Gryparis, Coull, Schwartz & Suh [117] also work with space-time data, but their geoadditive model is an elaborate one that includes latent variable structure for multiple exposures from mobile particulate matter.
Geoadditive models with missing covariate data are studied by French & Wand [99]. Chen & Ibrahim [48] extend that work to geoadditive models that allow for specification of the covariate distribution and the missing data mechanism.

Inference
In their early years, smoothing techniques were developed with little regard to related inferential questions, such as linearity versus non-linearity of a particular covariate effect. This is especially noticeable in the early kernel smoothing literature. In recent years, however, this situation has been redressed and there is now quite a large literature on inference in smoothing contexts. The mixed model representation of smoothing splines and penalized splines offers a particularly attractive framework for this endeavor, because the well-established tools of likelihood-based and Bayesian inference are readily available. While there has been a great deal of research on inference for other approaches to smoothing since 2002, we confine discussion mainly to smoothers based on mixed models. A significant portion of the 2003-2007 literature involves likelihood ratio tests for testing departures from linear models. This boils down to tests of whether variance components differ from zero. The classical references for tests of this type, in which the null value of the parameter is on the boundary of its space, are Self & Liang [237] and Stram & Lee [252]. However, their theory assumes independence under the null and alternative hypotheses. This is not the case for many mixed model scenarios, including penalized splines, and several recent papers by C. Crainiceanu and co-authors are concerned with rectifying this situation. The main smoothing paper from this body of work is Crainiceanu, Ruppert, Claeskens & Wand [63]. It builds upon Crainiceanu & Ruppert [58], where exact distribution theory for the likelihood ratio statistic in Gaussian linear mixed models is obtained. Crainiceanu et al. [63] also obtain confidence intervals for the smoothing parameter by inverting likelihood ratio tests.
Claeskens [51] contains asymptotic results for this setting, but with the number of knots increasing with the sample size and certain restrictions on the design matrices that are not satisfied by standard penalized spline models. Crainiceanu and Ruppert [59] develop likelihood ratio and restricted likelihood ratio tests of goodness-of-fit of nonlinear regression models. Liu & Wang [175] review various versions of linearity tests based on Bayesian representations of smoothing splines and conduct a simulation study to assess their frequentist properties.
The exact distribution theory used in the papers of the previous paragraph applies only to the situation where there is a single variance component. Extensions to models with multiple variance components are developed by Crainiceanu & Ruppert [60] and Greven, Crainiceanu, Kuechenhoff & Peters [116]. For testing hypotheses involving variance components being distinct from zero, remedies to the null distribution problem include use of the parametric bootstrap and approximation of the likelihood ratio statistic by a product of independent χ²_1 and Bernoulli random variables. Greven et al. [116] demonstrate good performance of the second approach and also propose an approximation to the null distribution of the restricted likelihood ratio statistic using an idea similar to pseudo-likelihood estimation. Crainiceanu, Ruppert, Claeskens and Wand [63] show via simulation studies that the power properties of the likelihood ratio tests compare favorably with those of competing tests.
In the context of least-squares kernel machines, Liu, Lin & Ghosh [174] develop a score test for non-linearity that relies on a mixed model representation. Satterthwaite's approximation is used to obtain approximate p-values.
Extension of likelihood ratio tests to the generalized response setting is challenging due to the presence of intractable integrals in the likelihoods. Lin [166] and Lin & Zhang [171] developed score tests for GLMM settings, the latter reference including generalized additive models through the mixed model representation of smoothing splines. This general approach has since been extended to additive mixed models (Zhang & Lin [308]), varying coefficient models for longitudinal data (Zhang [307]) and proportional hazards models (Lin, Zhang & Davidian [172]).
Wood [293] develops approximate Bayesian confidence intervals (see Section 6.4 of Ruppert et al. [232] and references given there) for the estimated functions in generalized additive models. He takes advantage of the low-rank aspect of penalized splines so that the distribution theory involves the relatively small random vector of spline basis coefficients. The generalized case is dealt with by using a weight matrix approximation in the ridge regression expression. It is also explained how inference for functionals of the coefficient vector can be made without time-consuming bootstrap replications. This innovative paper finishes with proposals on how to avoid MCMC in the 'fully Bayesian' case, in which variability due to the choice of smoothing parameters (variance components) is taken into account.
The Bayesian mixed model approach to semiparametric regression has immediate benefits regarding inference. For example, non-linearity versus linearity of covariate effects can be assessed through the posterior distributions of variance components. There are several new papers on Bayesian semiparametric regression, scattered throughout Section 3 of this review article.
Lastly, we mention contributions of spline-based approaches to the scale-space approach to feature significance, sometimes known as 'SiZer' (Chaudhuri & Marron [44]; summarized in Section 6.9 of Ruppert et al. [232]). Ganguli & Wand [104] facilitate feature significance for bivariate smoothing, or geostatistics, by developing the appropriate theory for low-rank thin plate splines. Marron & Zhang [186] develop the requisite theory for a (full-rank) smoothing spline version of SiZer.

Latent variable models
The introduction of Skrondal & Rabe-Hesketh [242] defines a latent variable as a random variable whose realizations are hidden from the analyst and gives, as examples of their utility, data measured with error, hypothetical constructs and latent responses underlying categorical variables. Mixed models play a prominent role in latent variable modeling which, for this reason alone, has common ground with contemporary smoothing techniques. Latent variable modeling is a growth area in Statistics in general, and has had some interplay with semiparametric regression in the last five years.
Tutz & Scholz [267] use the principle of maximum random utility to link multi-category responses to latent utilities. They allow for dependence on covariates via additive and varying coefficient structure, aided by penalized splines. Fahrmeir & Raach [91] develop Bayesian semiparametric latent variable models, including those that allow spatial effects to be incorporated. These involve measurement models for mixed continuous, binary and ordinal responses. For example, the discrete values of ordinal responses are assumed to be generated through a threshold mechanism.
Elliott [87] uses smoothing splines and their mixed model representation to build flexibility into latent cluster models. These relate underlying 'clusters' of variability to measures of interest. Application is made to data on depression levels for patients recovering from myocardial infarction.
Gryparis, Coull, Schwartz & Suh [117], described more fully in Section 3.5, has a latent variables aspect for handling multiple exposures.

Longitudinal data analysis
Mixed models have been a staple of longitudinal data analysis for the last 25 years. This partnership has resulted in a high volume of mixed model methodology and software development over the same time period. The mixed model approach to penalized spline smoothing not only allows one to take advantage of these developments, but means that longitudinal structure is easy to incorporate. Nowadays, a single linear mixed model can be used to perform an elaborate longitudinal data analysis that incorporates nonparametric estimation of several smooth functions (e.g. Zhao et al. [310]). A component of recent semiparametric longitudinal data analytic research has been concerned with marginally specified models such as (11); we review this research in Section 3.9. The models covered in this subsection differ in that they are defined conditionally.
Ghidey, Lesaffre & Eilers [109] develop the penalized Gaussian mixture linear mixed model. It involves function estimation via spline basis functions that are Gaussian densities and random effects modeled as mixtures of normal distributions. Particular attention is paid to two-dimensional random effects structure.
Durbán, Harezlak, Wand & Carroll [79] describe mixed models for fitting subject-specific curves to longitudinal data. Models of this general type have been developed by several other authors (e.g. Donnelly, Laird & Ware [76]; Verbyla et al. [271]). The low-rank spline approach of Durbán et al. [79] is particularly simple and has the ability to handle very large sample sizes with standard mixed model software (code is included in an appendix). Harezlak, Ryan, Giedd & Lange [124] fit similar models to data from accelerated longitudinal designs where subjects enter the study at different points of their growth trajectory and are observed over a relatively short time period. Application is made to longitudinal magnetic resonance imaging data from an ongoing developmental study. Smith & Wand [243] focus on the variance calculations required for inference in semiparametric mixed models. They describe streamlined algorithms that yield two orders of magnitude improvements over naïve variance calculations.
Welham et al. [286] and Zhang et al. [309], as detailed in Section 3.16, deal with semiparametric longitudinal models under periodicity constraints. Zhao et al. [310], discussed in Section 3.13, provides quite a general treatment of Bayesian generalized response models that include longitudinal models as a special case.
The likelihood ratio methodology of Crainiceanu & Ruppert [58] and Greven et al. [116] (Section 3.6) is applied to inference in longitudinal settings. Qu & Li [220] develop quadratic inference functions for fitting and inference in varying coefficient models for longitudinal data.
Harezlak, Naumova & Laird [123] devise a bump hunting test for longitudinal data, based on the subject-specific curves model of Durbán et al. [79].
Finally, we note that more extensive reviews of this subsection's general topic are provided by five chapters under the heading Nonparametric and Semiparametric Methods for Longitudinal Data in Fitzmaurice, Davidian, Verbeke & Molenberghs [97].

Marginal longitudinal models
Research on the marginal longitudinal nonparametric regression model (see (11) below) continues at a steady rate. Early contributions to this setting include Zeger & Diggle [306] and Lin & Carroll [167]. While most early research involved kernel smoothing, more recent approaches involve spline smoothing. Marginal models differ from the conditionally specified models of Section 3.8 in that they do not model the within-subject correlation or the error process.
The simplest setup is as follows. For 1 ≤ i ≤ m subjects we observe 1 ≤ j ≤ n (n ≪ m) responses y ij and predictors x ij . (Somewhat annoyingly, the m and n notation alternates in the literature between the roles given here and the reversal; i.e. that where n is the number of subjects and m is the number of measurements. In this paper we stick with the notation used by Diggle, Heagerty, Liang & Zeger [72] and Ruppert, Wand & Carroll [232].) Let y i be the vector of responses for the ith subject and x i be defined similarly. The marginal longitudinal nonparametric regression model is then y i = f (x i ) + ε i , E(ε i ) = 0, Cov(ε i ) = Σ, (11) for some smooth function f , applied elementwise, and n × n covariance matrix Σ. A noteworthy, somewhat paradoxical, result is that ordinary kernel smoothers are more efficient if so-called working independence is assumed (Lin & Carroll [168]). Wang [282] develops a more elaborate kernel smoothing strategy that escapes from this paradox and is uniformly more efficient. Welsh, Lin & Carroll [288] use equivalent kernel theory to demonstrate that penalized spline estimators are non-local compared with kernel smoothers. This means that the 'legality' of working independence justified by Lin & Carroll [167] for ordinary kernel smoothers does not apply to penalized splines. Lin, Wang, Welsh & Carroll [170] brings together earlier papers by the authors on (11). Theoretical results include asymptotic equivalence between the Wang [282] kernel estimator and a smoothing spline-based estimator, and optimality of these two approaches.
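The efficiency comparisons behind the working-independence debate can be made concrete for a parametric submodel: with known within-subject covariance Σ, generalized least squares is at least as efficient as the unweighted (working independence) estimator. A small numerical check, with an illustrative exchangeable Σ not taken from any cited paper:

```python
import numpy as np

n = 5                                   # measurements per subject
rho, sigma2 = 0.6, 1.0
Sigma = sigma2 * ((1 - rho) * np.eye(n) + rho * np.ones((n, n)))  # exchangeable

X = np.column_stack([np.ones(n), np.arange(n, dtype=float)])      # intercept + time

Sinv = np.linalg.inv(Sigma)
var_gls = np.linalg.inv(X.T @ Sinv @ X)          # GLS covariance: (X' S^-1 X)^-1
bread = np.linalg.inv(X.T @ X)
var_ols = bread @ X.T @ Sigma @ X @ bread        # working-independence (sandwich)

# Gauss-Markov: var_ols - var_gls is positive semidefinite.
eigs = np.linalg.eigvalsh(var_ols - var_gls)
```

The paradox reviewed above is that this familiar linear-model intuition does not carry over directly to kernel smoothers, and carries over differently for penalized splines.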
Other contributions to theory and methodology for (11) include [169]. Interestingly, there is little use of low-rank spline modeling in this context. Linton, Mammen, Lin & Carroll [173] and Carroll, Hall et al. [40] discuss a two-stage approach that estimates Σ from the residuals of an unweighted fit and then computes a penalized spline estimator, finding good efficiency. But to the best of our knowledge the low-rank spline mixed model approach has not been implemented in this context. Current research on this approach, led by the second and third authors, is under way.

Measurement error models and deconvolution
Carroll, Ruppert, Stefanski & Crainiceanu [41] provides a recent and comprehensive review of non-linear measurement error models. In their preface to this second edition the authors point out that, in the 11 years since the book's first edition, semiparametric regression and Bayesian computation via MCMC have grown enormously. These threads run through much of the contemporary research on nonlinear measurement error models. Chapters 9, 12 and 13 of Carroll et al. [41] summarize most of the relevant literature. We now supplement those with some recent literature that is closest to the Ruppert, Wand & Carroll [232] genre.
Carroll, Ruppert, Crainiceanu, Tosteson & Karagas [42] study non-linear and nonparametric regression when there is covariate measurement error and an instrumental variable is available. They consider several approaches to estimation and, in a simulation study, a Bayesian spline estimator similar to the one in Berry, Carroll, and Ruppert [15] is the most effective.
Ganguli, Staudenmayer & Wand [103] study additive model fitting and inference when measurement error is present in one or more predictors. They use a maximum likelihood approach and advocate use of the Monte Carlo EM algorithm for fitting and inference.
The periodicity-constrained functional mixed models of Zhang, Lin & Sowers [309] (see Section 3.16) handle measurement error in the predictor, follicle stimulating hormone, via a two-stage approach.
Ma & Carroll [179] show how to estimate nonparametric functions in semiparametric models while making no assumptions about the distribution of the variables measured with error.
Mallick, Hoffman & Carroll [184] use a Bayesian approach to fitting nonparametric functions when the measurement errors are mixtures of Berkson and classical types. They use a Dirichlet process to estimate the distribution of the mismeasured covariate essentially nonparametrically. Their method is applied to the Nevada Test Site radiation study. Carroll, Delaigle & Hall [39] use a deconvolution approach in the same context.
Liang, Wu & Carroll [164] develop mixed effects varying coefficient measurement error models, applying the methods to AIDS data.
Ruppert, Nettleton & Hwang [231] use penalized B-splines on a deconvolution problem from multiple testing. Assume the ith hypothesis is H 0i : δ i = 0, i = 1, . . . , n; δ i ≥ 0 might be a non-centrality parameter. The problem is to estimate the distribution of δ i , but δ i is not observed. To estimate this distribution, π 0 , the proportion of true nulls, and G, the distribution of δ i under the alternative, are estimated. The estimate of π 0 is useful for estimating the false discovery rate (Benjamini and Hochberg [13]) and the authors show that G can be used to estimate the expected discovery rate, the true negative rate, and the true positive rate.
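To illustrate why π 0 matters, here is a deliberately simple tail-based (Storey-type) estimator of the proportion of true nulls; it is not the penalized B-spline deconvolution estimator of Ruppert, Nettleton & Hwang [231], and the p-value mixture below is simulated.

```python
import numpy as np

def storey_pi0(pvals, lam=0.5):
    """Tail-based estimate of the proportion of true nulls:
    null p-values are uniform on (0,1), so the proportion of
    p-values above lam is roughly pi0 * (1 - lam)."""
    pvals = np.asarray(pvals)
    return float(np.mean(pvals > lam) / (1.0 - lam))

# Simulated mixture: 80% true nulls (uniform p-values) and
# 20% alternatives with p-values concentrated near zero.
rng = np.random.default_rng(1)
m = 10000
p_null = rng.uniform(size=int(0.8 * m))
p_alt = rng.beta(0.5, 20.0, size=int(0.2 * m))
pi0_hat = storey_pi0(np.concatenate([p_null, p_alt]))
```

Plugging such an estimate into the Benjamini-Hochberg procedure gives adaptive control of the false discovery rate of the kind discussed above.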
Staudenmayer, Ruppert & Buonaccorsi [248] study density estimation under heteroskedastic measurement error. Deconvolution methods that assume homoskedasticity over (under) correct in regions where the measurement error variance is smaller (greater) than average. To remedy this problem they introduce a variance function and estimate the density and the variance function as splines.
It is our belief that penalized splines are proving to be a very effective, perhaps the most effective, method for deconvolution and correction for measurement error. The reason is that, when given a Bayesian implementation, they utilize the likelihood and achieve high efficiency. In contrast, earlier, more ad hoc methods extract less of the information available in the data.

Missing data
Because of space and time limitations, a missing data chapter was missing from Ruppert et al. [232]. However, many contemporary methods for handling missing data use likelihood-based or Bayesian inference that is in keeping with our semiparametric regression methodology. While there has been a modest amount of work in this direction, which we now summarize, our feeling is that there is still room for more such research.
French & Wand [99] develop a likelihood-based model for missing covariate data in geoadditive model (Kammann & Wand [140]) analyses. Monte Carlo EM and a version of penalized likelihood are used for fitting and inference. An application involving relative cancer mapping, with missingness in smoking status, is presented.
Chen & Ibrahim [48] develop likelihood-based semiparametric regression models, including those with bivariate smoothing, for specifying the covariate distribution and the missing data mechanism. The EM algorithm is recommended for fitting, and application is made to data from a melanoma clinical trial.
Bivariate smoothing is also considered by Geraci & Bottai [108]. They treat the incorporation of auxiliary data when there are non-ignorable missing responses. Mixed model-based low-rank kriging is used for bivariate smoothing, and Monte Carlo EM for fitting. Application is made to mapping of phytoplankton data.
Penalized splines are used in a missing data situation with clustering by Yuan & Little [305]. Several missing data mechanisms are entertained. Hierarchical Bayesian models are used and Gibbs sampling employed for fitting. Application is made to a childhood obesity study.

Model selection
In Ruppert, Wand & Carroll [232] we noted (Section 8.6) that model selection for semiparametric regression was still in its infancy, and provided a handful of references, particularly in the special case of additive models with several candidate predictors. There have been a few developments since 2003 on this problem.
Wager, Vaida & Kauermann [273] use the mixed model representation of penalized spline semiparametric regression models and versions of AIC to obtain a model selection algorithm for the continuous response case. The smoothing parameters of the fitted models are estimated from the data using (restricted) maximum likelihood.
Avalos, Grandvalet & Ambroise [6] work with smoothing splines and the lasso (Tibshirani, 1996) to choose among additive models. The lasso has the feature of annihilating coefficients rather than shrinking them, resulting in greater parsimony. An approximation of generalized cross-validation is used for smoothing parameter selection. Vandenhende, Eilers, Ledent & Renard [268] make use of penalized splines, GCV and the lasso to sift through candidate biomarkers in drug development applications.
Hens, Aerts & Molenberghs [131] concerns model selection for incomplete and design-based samples based on a weighted AIC. While it is mainly concerned with parametric regression models, the approach is extendable to semiparametric regression.

Non-Gaussian response models
Fitting and inference for semiparametric regression models when the response is non-Gaussian usually entails an extra layer of complexity due to the non-explicit forms that arise. Research on this front roughly parallels that of generalized linear mixed models, where analytic approximations and MCMC are the main combatants.
Wood's [292] book is a contemporary account of generalized additive model fitting and analysis, accompanied by the R package mgcv [295]. Generalized additive mixed models are also treated. Smoothing parameter selection is achieved through generalized cross-validation, AIC and penalized quasi-likelihood. Zhao, Staudenmayer, Coull & Wand [310] work with a general form of the generalized linear mixed model that includes most exponential family semiparametric regression models as a special case. They adopt a Bayesian approach and describe MCMC fitting and inference using BUGS (e.g. Lunn et al. [177]) software. Skaug & Fournier [241] investigate the use of automatic differentiation in a general GLMM framework. A semiparametric regression example is included.
The past few years have seen several extensions of semiparametric regression beyond the one-parameter exponential family situation. Nott [201] works with the double exponential family (Efron [80]) and shows it to be a good vehicle for handling both mean and variance functions. He calls upon the Single Component Adaptive Metropolis algorithm of Haario, Saksman & Tamminen [120] to perform fitting. Branscum, Johnson & Thurmond [19] extend the Bayesian semiparametric regression approach to responses from the Beta family of distributions. The paper revolves around two applications on household expenditure and foot-and-mouth disease. Houseman, Coull & Shine [133] and Skaug & Fournier [241] each include models with negative binomial responses. Tutz [261] and Tutz & Scholz [267] develop semiparametric regression models for, respectively, ordinal and multinomial responses.
Another area of recent activity for non-Gaussian semiparametric regression is modeling of sample extremes. Chavez-Demoulin & Davison [45] develop smoothing spline-based generalized additive models for exceedances-above-threshold data. The penalized likelihood corresponds to the generalized Pareto distribution because of its role as a limiting distribution in this context. Yee & Stephenson [301] work with sample maxima data and the generalized extreme value distribution and develop vector generalized additive models in this context. Several other papers, published since 2003, deal with semiparametric regression with non-Gaussian response, but are discussed elsewhere in this review article. Non-Gaussian spatial models are dealt with by Fahrmeir [308], and Zhang [307] describes hypothesis testing for variance components in GLMM smoothing contexts. Tutz & Binder [264] and Binder & Tutz [16] provide boosting-type procedures for fitting generalized additive models.

Quantile regression
There have been a few new approaches to semiparametric regression that target quantiles, rather than means and variances. None of these use mixed models or hierarchical Bayesian approaches.
Yee [298] embeds the LMS (i.e. λ, µ, σ) quantile regression method of Cole & Green [52] in the vector generalized additive models (VGAM) framework. An improvement to the LMS method, based on the Yeo-Johnson transformation, is developed. Non-Gaussian responses, such as those from the Gamma family, are also treated.
Bollaerts, Eilers & Aerts [17] and Takeuchi, Le, Sears & Smola [255] each use the ideas of constrained smoothing and 'pinball' loss functions to impose non-crossing in quantile regression. Bollaerts et al. [17] use P-splines, making their approach more in keeping with traditional semiparametric regression. Takeuchi et al. [255] use the kernel machine approach which, as mentioned in Sections 2.5 and 3.2, is becoming increasingly intertwined with semiparametric regression research.
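The 'pinball' (check) loss is simple to state: ρ τ (u) = τ u for u ≥ 0 and (τ − 1)u for u < 0, and minimizing its empirical average over constants recovers a sample τ-quantile, which is why it drives quantile regression. A minimal sketch with invented data:

```python
import numpy as np

def pinball(u, tau):
    """Check loss: tau*u for u >= 0, (tau - 1)*u for u < 0."""
    return np.where(u >= 0, tau * u, (tau - 1.0) * u)

def quantile_by_pinball(y, tau):
    """The constant minimizing mean pinball loss is a tau-quantile;
    a minimizer always occurs at one of the observed values."""
    cands = np.sort(y)
    losses = [float(pinball(y - c, tau).mean()) for c in cands]
    return float(cands[int(np.argmin(losses))])

y = np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0, 5.0])
q_med = quantile_by_pinball(y, 0.5)   # tau = 0.5 recovers the median
```

In the papers above, this loss replaces squared error inside a P-spline or kernel machine objective, with additional constraints forcing the fitted quantile curves not to cross.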
Choudhary [46] used Bayesian penalized splines to estimate a quantile function for the problem of assessing agreement between two measurement methods.

Sample survey aspects
An interesting development in recent survey sampling estimation research, led by F.J. Breidt and J.D. Opsomer, is the incorporation of nonparametric regression methodology. An early reference is Breidt & Opsomer [22] on the use of local polynomial regression. However, some more recent contributions have used penalized splines. The first of these is Breidt, Claeskens & Opsomer [20] where it is stated that, unlike for local polynomial regression, the theory for penalized splines closely follows the established survey linear regression theory. Breidt et al. [20] is concerned with the incorporation of auxiliary covariate information in the design-based estimation of finite population totals in complex surveys. Theorems on design root-n consistency of the penalized spline regression estimator are provided.
The previous article uses the fixed-penalty formulation of the penalized spline and primarily considers inference with respect to the sampling design, as is most commonly done in survey estimation. Other work uses the penalized spline's mixed-model representation to develop model-based estimators for survey data. Zheng and Little [311] estimate a finite population total by predicting the unobserved part of the population based on a model for the relationship between the variable of interest and the inclusion probabilities. This is extended to the two-stage sampling context in Zheng and Little [312], and a random item response model is further incorporated in Yuan and Little [305]. Opsomer, Claeskens, Ranalli, Kauermann & Breidt [207] consider spline-based small area estimation, a type of modeling widely used for survey estimation problems but relying almost exclusively on linear mean model specifications (Rao [224]). Opsomer et al. [207] combine univariate and bivariate penalized splines with the commonly used small area random effects model, and they establish a theorem on the predicted mean squared error properties of the resulting REML-based predictor of the small area means.
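The model-based idea running through this work, predicting the unobserved part of the finite population from a regression on auxiliary information, can be sketched with a linear working model standing in for the penalized spline; the population, sample and model below are simulated for illustration.

```python
import numpy as np

# Simulated finite population with an auxiliary variable x.
rng = np.random.default_rng(2)
N = 1000
x = rng.uniform(0.0, 10.0, size=N)
y = 2.0 + 1.5 * x + rng.normal(scale=1.0, size=N)

# Observe a simple random sample; x is known for the whole population.
mask = np.zeros(N, dtype=bool)
mask[rng.choice(N, size=100, replace=False)] = True

X_s = np.column_stack([np.ones(mask.sum()), x[mask]])
beta = np.linalg.lstsq(X_s, y[mask], rcond=None)[0]
X_u = np.column_stack([np.ones((~mask).sum()), x[~mask]])

# Model-based estimate of the population total: observed part
# plus predictions for the unobserved part.
total_hat = float(y[mask].sum() + (X_u @ beta).sum())
```

Replacing the linear predictor with a penalized spline in mixed-model form gives model-based estimators in the spirit of the approaches cited above.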
Opsomer, Breidt, Moisen & Kauermann [206] is at the applied end of the spectrum. They describe how design-based estimation of quantities, such as forested area or total wood volume over large regions, can be enhanced through the incorporation of geographic auxiliary information such as elevation and slope and of satellite-derived measurements. Generalized additive models are used to incorporate the auxiliary variables.
A recent review article (Breidt and Opsomer [23]) on nonparametric and semiparametric estimation methods in complex surveys discusses these methodological developments in more detail, and provides further information on the design-based and model-based modes of inference for surveys.

Smoothing under constraints
Another area of semiparametric regression that has seen vigorous activity during 2003-2007 is smoothing subject to constraints on mean and quantile functions. The predominant types of constraints in this work are monotonicity and periodicity of regression functions and non-crossing of quantile functions.
Bollaerts, Eilers & van Mechelen [18] explain how to build several shape constraints into univariate and multivariate P-spline quantile regression. Ghosh [110] focuses on monotonicity in the binary response regression problem, making use of mixed models and the pooled adjacent violators algorithm, geared towards biomarker evaluation. Tutz & Leitenstorfer [265] take a boosting approach to enforcing monotonicity. They arrive at two algorithms: MonBoost for continuous responses and GMonBoost for generalized responses.
Driven by data from longitudinal studies, Welham, Cullis, Kenward & Thompson [286] and Zhang, Lin & Sowers [309] impose periodicity constraints on their fitting curves. Welham et al. [286] use the notion of L-splines (e.g. Kimeldorf & Wahba [146]; Ansley, Kohn & Wong [2]) in the penalized spline/mixed model set-up, using specifically designed differential operators that annihilate sine and cosine functions. Zhang et al. [309] work with smoothing splines, and also account for measurement error, in work motivated by a hormone study. Eilers, Gampe, Marx & Rau [84] build periodicity-type constraints into models for data from seasonal incidence tables.
The quantile regression articles of Bollaerts, Eilers, and Aerts [17] and Takeuchi et al. [255], outlined in Section 3.14, allow for the imposition of monotonicity.
Other constrained smoothing research includes Eilers [81], in which unimodality is the focus, and Gluhovsky & Vengerov [113] in which penalized splines are used to do multivariate constrained extrapolation.

Spatial adaptivity
Each of the main smoothing techniques (e.g. local polynomials, smoothing splines, wavelets) have an accompanying literature on methods by which improved spatial adaptivity can be achieved. The idea is to perform differing amounts of smoothing at different locations and better recover spatially heterogeneous signals. Chapter 17 of Ruppert, Wand & Carroll [232] describes spatially adaptive extensions of penalized splines. However, there has been some further work in this area.
Lang & Brezger [158] develop spatially adaptive Bayesian penalized splines for univariate and bivariate smoothing by allowing smoothing parameters to be locally adaptive. Baladandayuthapani, Mallick & Carroll [8] take a different approach that involves incorporation of a penalized spline estimate of the variance function into the penalty. Extension is made to additive models. Crainiceanu, Ruppert, Carroll, Joshi & Goodner [61] develop a Bayesian approach to spatially-adaptive penalized splines in the presence of heteroscedastic errors. They combine three spline models: one for the regression function, a second for the logarithm of the locally varying penalty on the regression function, and a third for the logarithm of the variance function. The authors also generalize their model to multivariate smoothing using low-rank thin-plate splines. In Baladandayuthapani et al. [8] and Crainiceanu et al. [61], special Metropolis-Hastings schemes are developed for implementation. Particular attention is paid to improved mixing via innocuous model modifications.
Krivobokova, Crainiceanu & Kauermann [153] use similar models to those used by Baladandayuthapani et al. [8] and Crainiceanu et al. [61]. However, they use Laplace approximation rather than MCMC and thereby obtain substantial speed improvements. Non-normal response is also treated. An R package named AdaptFit accompanies the paper.
Leitenstorfer & Tutz [160] also achieve spatial adaptivity via model selection on the knots and a version of boosting.
Paciorek & Schervish [213] introduce a new class of non-stationary covariance functions for spatial smoothing via Gaussian processes. Non-stationarity essentially equates to spatial adaptivity.

Spatial and other high-dimensional data
Section 2.2 covers advancement of fundamental principles for multivariate smoothing. In this section we review new semiparametric regression models and methodology that have a multivariate smoothing component. Excluded, however, are geoadditive models, which are treated in Section 3.5.
Wager, Coull & Lange [272] develop an approach labeled "mixed model intensity kriging" based on inhomogeneous Poisson spatial processes. A low-rank version of kriging is achieved through Voronoi tessellation of the plane. Application is made to spatial data arising from brain imaging studies.
Sain, Jagtap, Mearns & Nychka [233] develop a new multivariate spatial model, utilizing splines and mixed models, for soil water profiles. A particularly novel aspect is bivariate smoothing of the soil-texture triangle, where the relative proportions of sand, silt and clay are plotted.
Brezger, Fahrmeir & Hennerfeind [25] use the ideas of Bayesian semiparametric regression in the analysis of functional magnetic resonance imaging data. Space-varying coefficient models are developed, with the goal of improving upon the voxel-by-voxel approaches of earlier functional magnetic resonance imaging papers. Heim, Fahrmeir, Eilers & Marx [129] apply the same class of models to diffusion tensor images, also arising from magnetic resonance techniques. Penalized splines are used at all stages of a three-step cascade of data processing: voxel-wise regression, smoothing and interpolation.
Dean, Nathoo & Nielsen [68] use penalized splines as a component of multistate models for longitudinal panel count data, where the processes corresponding to different subjects may be spatially correlated. Application is made to weevil infestation in white spruce trees.
Crainiceanu, Diggle & Rowlingson [57] use the binary response version of penalized bivariate splines to model Loa loa prevalence in tropical Africa. A Bayesian/MCMC approach to fitting and inference is adopted. A fast method for approximate predictive inference, based on a calibration model, is developed.
Apanasovich et al. [4] investigate low-rank spline smoothing in a spatial context. They use penalized regression splines and develop a novel method for smoothing parameter selection that overcomes the well-known biases of cross-validation with correlated data. Li et al. [163] show how to estimate a correlation function in longitudinal and spatial data. Both papers give applications to colon carcinogenesis experiments.

Survival analysis
The extension of parametric survival models for survival data to accommodate non-linear covariate and geographical effects continues to be a vibrant area of semiparametric regression research.
A variety of methods for fitting, inference and smoothing parameter selection are proposed. For example, Lambert & Eilers [156] call upon the Langevin-Hastings algorithm, while Brown et al. [28] develop a hybrid smoothing parameter selector, based on AIC and penalized quasi-likelihood.
Lin, Zhang & Davidian [172] work with mixed model and spline-based extensions of the proportional hazards model. Score tests for the proportional hazards assumption and covariate effects are developed.
Namata et al. [198] develop GLMM-based methodology for current status data, geared towards an infectious diseases application.
Another interesting development is the integration of penalized spline smoothing into actuarial science, as exemplified by Currie, Durbán & Eilers [66]. In this case, the data take the form of mortality tables. The raw mortality table data used here, obtained from a United Kingdom insurance and pensions database, take the form of two 53 × 90 matrices corresponding to the calendar years 1947-1999 and males between 11 and 100 years of age. One matrix is number of deaths; the other is number of years lived. The raw hazards matrix is the ratio of the first matrix to the second. Univariate and bivariate penalized spline smoothing is applied to the raw hazards to arrive at forecasts of mortality rates up to 2050.
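The raw hazards construction described above is an elementwise ratio of the two matrices. A toy stand-in (the numbers are invented, not taken from the UK database, and the matrices are far smaller than 53 × 90):

```python
import numpy as np

# Rows index calendar years, columns index ages.
deaths = np.array([[2.0, 5.0, 11.0],
                   [3.0, 6.0, 14.0]])
years_lived = np.array([[1000.0, 800.0, 500.0],
                        [1100.0, 850.0, 520.0]])

raw_hazard = deaths / years_lived   # elementwise ratio, as in Currie et al.
```

Bivariate penalized spline smoothing is then applied over the (year, age) grid, with forecasts obtained by extending the fit in the year direction.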

Temporal data
The use of smoothing techniques in the analysis of temporal (time series) data has flourished in the past two decades; see, for example, Fan & Yao [94]. However, most of this work has involved local polynomial kernel smoothing. The permeation of these ideas into spline-based semiparametric regression is still quite limited.
Houseman, Coull & Shine [133] develop negative binomial time series models for modeling enterococcus counts in Boston Harbor, utilizing penalized splines and mixed model representations. Jank & Shmueli [136] use the same general approach to model concurrency of events in on-line auctions.
General correlation structures for mixed model-based smoothing are considered by Durbán & Currie [78] and Krivobokova & Kauermann [154]. The latter reference contains asymptotic theory for the smoothing parameter chosen via AIC and REML, and application to finance time series data.
As mentioned in Section 3.3, Dominici et al. [75] use and modify generalized additive model technology for air pollution time series data. Gryparis et al. [117], discussed in Sections 3.5 and 3.7, has a temporal data aspect.

Miscellanea
A few 2003-2007 papers involving semiparametric regression do not fall into any of the categories corresponding to the previous subsections.
Yee & Hastie [300] extend reduced-rank regression (e.g. Izenman [135]) to the class of vector generalized linear models. While this work is mainly parametric, some non-linear modeling based on regression splines is used.
Yu & Ruppert [304] build on their earlier work (Yu & Ruppert [303]) on partially linear single-index models using penalized splines. In particular, they remove the assumption of compactness and establish root-n consistency of the regression coefficients.
Wood [291] is a rare instance of a semiparametric regression contribution that delves deeply into numerical issues. For example, pivoted QR decomposition is used to make GCV parameter choice in generalized additive (mixed) models more stable and efficient. Later releases of the author's R package, mgcv (Wood [295]), make use of this methodology.
Banerjee, Maiti & Mukhopadhyay [11] use penalized splines to build classification rules for the pathological state of prostate cancer patients. In Choudhary & Ng [47], penalized spline estimates of both mean and variance functions are employed to assess agreement between two methods of measurement.
Piepho & Ogutu [217] explain how simple state-space models can be expressed as linear mixed models. Estimation via REML as an alternative to the Kalman filter is investigated and some advantages are found. It is also explained how smoothing is achieved via integration of state-space components and how the class of covariance structures for modeling serial correlation is broadened via state-space representations.
Lee & Oh [159] develop robust semiparametric regression procedures based on M-type penalized spline smoothers. Extension is made to additive mixed models, with a robust modification of REML for variance component estimation.
Eilers [82] uses the discrete Whittaker smoother in meta-analysis. His approach includes nonparametric estimation of the latent distribution of event probabilities in control and treatment groups, and a smoothed EM algorithm with improved convergence to maximum likelihood estimates of the parameters in the latent distribution model.

Review articles
A few articles have reviewed aspects of semiparametric regression in the last few years. We briefly mention some of them here.
Tutz [262] reviews semiparametric mixed models in the case of generalized responses. Generalized linear mixed models are shown to play a central role. Maximum likelihood is the main fitting tool. Techniques for dealing with the intractable integrals, such as Gauss-Hermite quadrature and the EM algorithm, are described. Similar structures, although within a Bayesian/MCMC framework, are treated by Zhao et al. [310].
Brezger & Lang [27] review Bayesian penalized spline approaches to generalized additive models. Pointers to implementation in the authors' BayesX package are included.
In Section 3.8 we mentioned the five-chapter component of Fitzmaurice et al. [97]. Together, these provide a detailed account of recent semiparametric regression research involving longitudinal data.
Finally, we mention two books from the last few years that have strong semiparametric regression themes. Wood [292] presents a thorough account of generalized additive models, with emphasis on implementation in R. Wu & Zhang [296] focuses on semiparametric regression for longitudinal data, with emphasis on mixed model approaches.

Applications
Ruppert et al. [232] emphasize the modularity of low-rank spline smoothers; a spline can be embedded as a nonparametric module into a larger model with parametric components. This type of use of such splines in applications has become increasingly sophisticated, as the following selection of applied papers shows.

Blood lead exposure on intellectual impairment
Canfield et al. [36] present an interesting application of semiparametric modeling to an important health problem. The authors study the intellectual impairment in children due to blood lead concentrations below 10 µg per deciliter, the "level of concern" as defined by the Centers for Disease Control and the World Health Organization. They measured blood lead concentration in 172 children at 6, 12, 18, 24, 36, 48, and 60 months of age and modeled longitudinal effects with a mixed model. Maternal intelligence quotient (IQ), quality of the home environment, and other potential confounders are adjusted linearly. Preliminary data analysis suggests that the dose-response curve for IQ might be steeper, that is, IQ decreases more rapidly, in the 0-10 µg per deciliter range compared to blood lead concentrations above 10 µg per deciliter. To model the nonlinear dose-response, the authors used a penalized spline. This semiparametric analysis corroborates the preliminary finding that IQ declines more rapidly with blood lead concentration at low doses compared to doses above 10 µg per deciliter. This result is in disagreement with the previous belief that 10 µg per deciliter is the "level of concern," and the authors suggest that considerably more children are adversely affected by lead exposure than previously believed.

Spatial and temporal distribution of particulate air pollution
Gryparis, Coull, Schwartz & Suh [117] model the spatial and temporal distribution of particulate air pollution in the greater Boston area. Data are available mostly from three Boston area monitoring studies, and there are two surrogates of mobile source pollution, black carbon (BC) and elemental carbon. The authors use a semiparametric latent variable model for combining these multiple surrogates for a common mobile source of pollution. The measurement error model is y_ij = g(Λ_i, η_ij) + ε_ij, where y_ij is a vector of measurements at location i and day j, g is a known function, Λ_i is an unknown matrix of factor loadings, η_ij is a latent variable, and ε_ij is an error vector. The loadings matrix Λ_i is modeled as having a linear regression on known covariates. Interest centers on the latent variable η_ij, and a geoadditive model is used to express η_ij as the sum of a linear function of certain covariates, univariate functions of other covariates, a bivariate function of longitude and latitude, and error. As is typical of factor models, constraints are needed to achieve identifiability. The model is fit separately to summer and winter data. The authors perform a Bayesian analysis and use a Gibbs sampler with Metropolis-Hastings steps. The geoadditive model facilitates visual presentation of the results. There is an obvious nonlinear day-of-the-year effect on particulate pollution. Maps of median predicted log-BC show the distribution of mobile source pollution in the greater Boston area during the summer and winter seasons.

Time series of enterococcus counts in a harbor
Houseman, Coull and Shine [133] use semiparametric regression methods to develop a new type of model particularly suited to their application: enterococcus counts in Boston Harbor. A noteworthy aspect of their model is that it handles a non-stationary time series of counts. The aim of their research was to understand the effects of changes in sewage treatment that were initiated to improve water quality. The authors assume that counts y_t are observed on a finite set of time points in an interval T, and that they depend on random effects Q_t and fixed covariate effects μ_t so that y_t | Q_t is Poisson(Q_t μ_t). Here the Q_t are independent Gamma with shape and rate parameters both equal to σ^{-1} and induce overdispersion, while μ_t models covariate and time effects. Specifically, μ_t = exp{x_t^T β + f(t)}, where x_t^T β is a parametric model for covariate effects and f(t) is a nonparametric penalized spline model for possibly non-stationary time trends. Because the covariates are time dependent, this semiparametric model is non-identifiable without constraints, which are carefully explained by the authors. The time-varying covariates include four variables characterizing sewage flows, together with temperature, tide height, salinity, and a sinusoidal seasonal effect. This model is fit separately at each spatial location. Then the authors use a geoadditive fit (Kammann & Wand [140]) to create a spatial summary of the effects of major interest, those of flows from the Nut Island and Deer Island Treatment Plants.
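The Gamma(σ^{-1}, σ^{-1}) random effect has mean 1 and variance σ, so marginally y_t is negative binomial with Var(y_t) = μ_t + σ μ_t²; this is how the model induces overdispersion. A small simulation sketch, with hypothetical values of μ and σ, checks that variance inflation:

```python
import math
import random

def simulate_overdispersed_counts(mu, sigma, n, seed=1):
    """Draw y ~ Poisson(Q * mu) with Q ~ Gamma(shape=1/sigma, scale=sigma),
    so E[Q] = 1, Var(Q) = sigma, and marginally Var(y) = mu + sigma * mu**2."""
    rng = random.Random(seed)
    ys = []
    for _ in range(n):
        q = rng.gammavariate(1.0 / sigma, sigma)  # shape 1/sigma, scale sigma
        lam = q * mu
        # Poisson draw by Knuth's inversion method; fine for moderate lam
        thresh = math.exp(-lam)
        k, p = 0, 1.0
        while True:
            p *= rng.random()
            if p <= thresh:
                break
            k += 1
        ys.append(k)
    return ys
```

With μ = 5 and σ = 0.5, the theoretical marginal variance is 5 + 0.5 · 25 = 17.5, far above the Poisson value of 5, and a simulated sample variance lands near that figure.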

Concurrency of events in on-line auctions
On-line auctioning is a relatively new and rich source of challenging statistical problems. Jank & Shmueli [136] investigate concurrency in on-line auctions. They define concurrency as the effect upon an event of the same or similar events at or near the same time. The on-line auction web-site eBay conducts hundreds of simultaneous auctions for similar items. In addition, eBay makes available information on auctions that have closed in the last 15 days. Both types of information can be expected to affect the final price of an item being auctioned. To study these effects, Jank & Shmueli [136] use the model y_t = g_AC(x_t) + g_SC(x_t) + g_TC(x_{t:(t-1)}) + ε_t, where y_t is the log-price of an item sold at time t, x_t is covariate information available at time t, and x_{t:(t-1)} is covariate information over the time period from t − 1 to t. Time is modeled continuously since an auction can close at any time. The three components of the model are: g_AC, the "auction component"; g_SC, the "spatial component"; and g_TC, the "temporal component". In their example, they use the prices of laptop computers. The auction component is modeled linearly, but the other components are modeled nonparametrically. "Spatial" refers to a feature space in which distance measures similarity between laptops in terms of their features, e.g., screen size, memory size, and presence of a DVD drive. Accordingly, the authors estimate g_SC using a radial penalized spline in 7-dimensional space. The temporal component requires a more complex model than the other two components. The covariates are the prices from time t − 1 to t and the features of those laptops. Various functions of the prices, e.g., mean, median, minimum, maximum, and slope of the time trend, are computed for laptops most similar, least similar, and of average similarity to the laptop sold at time t. The result is 18 time-lag variables, which are reduced to three principal components.
The temporal component is an additive spline model in these principal components. Jank & Shmueli [136] test their model on a hold-out sample of 30% of the laptops sold, with the remaining 70% used to train the model. They compare the performance of their model with that of linear parametric models and of fully nonparametric models that use regression trees. They find that the nonparametric models outperform the linear models, but that their semiparametric model outperforms the nonparametric models.
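The dimension-reduction step, taking many correlated time-lag variables down to a few principal component scores, can be sketched with power iteration on the sample covariance matrix. The toy data below are hypothetical stand-ins for the 18 lag variables; this is not the authors' computation, only the generic technique.

```python
import math

def top_principal_component(X, iters=200):
    """Leading principal component of the rows of X by power iteration on the
    sample covariance matrix; returns the loading vector and the PC scores."""
    n, p = len(X), len(X[0])
    # center each column
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    # p x p sample covariance matrix
    C = [[sum(Xc[i][a] * Xc[i][b] for i in range(n)) / (n - 1)
          for b in range(p)] for a in range(p)]
    # power iteration converges to the eigenvector of the largest eigenvalue
    v = [1.0] * p
    for _ in range(iters):
        w = [sum(C[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    scores = [sum(Xc[i][j] * v[j] for j in range(p)) for i in range(n)]
    return v, scores
```

Repeating the iteration on the residual covariance (deflation) yields the second and third components, which is the low-dimensional input the additive spline model then uses.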

Genomic-assisted prediction of genetic value
Gianola, Fernando & Stella [111] use a semiparametric model for the genetic value of single nucleotide polymorphisms (SNPs) and other genetic markers. Let y_i be a measurement, such as the height of a plant or the milk production of a cow, and let x_i be a vector of dummy genetic marker variables, e.g., indicators of the presence of SNP or microsatellite covariates. Gianola et al. [111] use the model y_i = w_i^T β + z_i^T u + g(x_i) + e_i (12), where w_i and z_i are known incidence vectors, β is a vector of nuisance location effects, u is a q × 1 vector of additive genetic effects of q individuals, which are modeled as random effects, and g is an unknown function. They present several estimation methods for this model. One of these methods converts (12) into a mixed model by using a type of radial basis function model for g(x). Specifically, they follow Mallick, Ghosh & Ghosh [183] and use a reproducing kernel mixed model that assumes g(x_i) = Σ_j α_j exp(−h ‖x_i − x_j‖²), where the α_j are iid N(0, σ²_α) and h is a non-negative smoothing parameter. Their model has one knot at each x_j, but it should be possible to use only a subset of these knots, chosen, say, by a space-filling design. The authors use a simulation experiment to compare the reproducing kernel mixed model method with a parametric mixed model approach. They find that the two estimators have nearly equal performance when the parametric model holds and that the semiparametric method outperforms the parametric method when the linear model does not hold.
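The kernel design matrix implied by this representation can be sketched directly; the 0/1 marker codes and the bandwidth h below are hypothetical.

```python
import math

def kernel_matrix(X, h):
    """Gaussian reproducing kernel evaluated at all pairs of marker vectors:
    K[i][j] = exp(-h * ||x_i - x_j||^2).  With the alpha_j treated as iid
    N(0, sigma_a^2) random effects, g evaluated at the data points is K times
    the vector alpha, so fitting g reduces to a mixed-model (ridge-type) problem."""
    n = len(X)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d2 = sum((a - b) ** 2 for a, b in zip(X[i], X[j]))
            K[i][j] = math.exp(-h * d2)
    return K
```

Identical genotypes receive kernel value 1, and dissimilar ones decay toward 0 at a rate set by h, which is what makes h a smoothing parameter.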

Carbon sequestration in agricultural soils
Sequestration of carbon in soils has the potential to reduce greenhouse gases. This was the motivation for a study by Breidt, Hsu, and Ogle [21], who use a semiparametric mixed model to compare carbon sequestering under no-till and traditional tillage. Their main conclusion is that more carbon is sequestered under no-tillage than under traditional tillage, especially in wet climates but also in dry climates. Their data come from soil cores. A core is divided into increments, e.g., from 0 to 15 cm depth, and total carbon is measured in each increment. They use 63 paired (no-till versus traditional tillage) studies, with 211 increments in total. The boundaries of the increments varied from study to study, making increment-wise comparisons impossible. Therefore, the authors used a varying coefficient penalized spline model for the concentration of carbon as a function of depth, so that the total carbon in any increment is the integral of this function over the increment. These "instantaneous" carbon sequestration functions can be estimated from the increment data and then compared between the two types of tillage. A varying coefficient model of the instantaneous function is needed to accommodate the effects of covariates such as soil type, climate factors, and the number of years since the change to no-till. More specifically, the model for the difference between no-till and traditional tillage in carbon concentration at soil depth t in the ith study is Σ_{ℓ=1}^q α_ℓ(t) w_{iℓ} (13), where α_ℓ(t) is a spline, w_{iℓ} is the value of the ℓth covariate in the ith study, and q is the number of covariates. Several choices of covariates are considered, and the final choice was to use the indicator of wet climate and the number of years since the change in soil management. Two models are considered for the covariance function within a core. The first has i.i.d. random intercepts, which implies a correlation matrix with compound symmetry.
The second, which uses a non-homogeneous Ornstein-Uhlenbeck model, allows heteroscedasticity and a more general type of correlation. The second model fits significantly better than the first and is the one used by the authors. By plotting the fitted models given by (13) for different values of the covariates, the authors show the effects of no-tillage. Under no-tillage, there is more sequestered carbon in the upper soil layers and less in the lower layers compared to traditional tillage, but the former effect is dominant, so that overall more carbon is sequestered under no-tillage. This suggests that a change to no-till, which has a number of other advantages, also has the beneficial effect of reducing the amount of CO2 in the atmosphere.
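The link between the "instantaneous" concentration function and the observed increment totals can be sketched as a numerical integral of the difference function (13) over a depth increment. The coefficient functions and covariate values below are hypothetical placeholders for the fitted splines.

```python
def increment_total(alphas, w, lo, hi, n=1000):
    """Integrate the varying-coefficient difference function
    d_i(t) = sum_l alphas[l](t) * w[l] over the depth increment [lo, hi]
    by the trapezoid rule; in the real analysis the alphas are fitted splines."""
    def d(t):
        return sum(a(t) * wl for a, wl in zip(alphas, w))
    step = (hi - lo) / n
    total = 0.5 * (d(lo) + d(hi)) + sum(d(lo + k * step) for k in range(1, n))
    return total * step
```

For example, with α_1(t) = 1, α_2(t) = t, and covariates w = (2, 3), the total over the increment [0, 1] is ∫ (2 + 3t) dt = 3.5; matching such integrals to the observed increment totals is what allows estimation from increments with study-specific boundaries.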

Time series of air pollution and mortality
In studies of the effects of air pollution on mortality, confounders that are unmeasured, and perhaps even unknown, can bias the estimates. To circumvent this problem, analysts often include in the model a smooth function f(t) of time t to capture the effects of confounders that vary smoothly in time. An example, the Milan study of air pollution and mortality, can be found in Ruppert et al. [232]. The technique of including f(t) in the model is given careful study by Peng, Dominici & Louis [216]. An issue of primary concern is selecting the degrees of freedom for estimation of f(t). Peng et al. [216] find that the estimator of f(t) should be undersmoothed to reduce the bias in the estimate of the effect of pollution, which is modeled linearly with coefficient β. This finding agrees with asymptotic theory for partially linear models (Rice [228]; Speckman [245]). The authors find that the most accurate method for estimating β is to use GCV to find the degrees of freedom for f(t) that best predict the pollution series, and then to estimate f(t) with that same degrees of freedom. The function f(t) can be estimated either by an ordinary least squares fit with a natural cubic spline basis or by a penalized spline. The latter requires more degrees of freedom for f(t) to achieve approximate unbiasedness of β. Peng et al. [216] include an interesting example involving a 100-city study of the effect of suspended particulate matter on mortality. Data are available from 1987 to 2000. They use an over-dispersed Poisson model with a log link for the daily number of deaths. Known confounders are accounted for explicitly: there are age-specific intercepts, a day-of-week effect, and smooth functions of temperature and dewpoint. Particulate matter enters the model linearly, and the estimate of its coefficient β is studied as the degrees of freedom for f(t) varies.
The ordinary least squares natural cubic spline needs about 9 degrees of freedom per year before β stabilizes. For the penalized spline, about 15 degrees of freedom are needed.
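The GCV criterion behind this degrees-of-freedom choice can be sketched for any linear smoother whose trace is known. The bin-means smoother below is a hypothetical stand-in for a spline smoother, convenient because m bins give a projection with exactly m degrees of freedom; the simulated series is likewise hypothetical.

```python
def gcv_bins(y, max_bins):
    """Choose smoother flexibility by GCV for a bin-means smoother:
    with m bins the smoother matrix is a projection with trace m, so
    GCV(m) = n * RSS(m) / (n - m)^2; return the minimizing m."""
    n = len(y)
    best_m, best_score = None, None
    for m in range(1, max_bins + 1):
        rss = 0.0
        for b in range(m):
            # within-bin mean is the fitted value for every point in the bin
            seg = y[b * n // m:(b + 1) * n // m]
            mu = sum(seg) / len(seg)
            rss += sum((v - mu) ** 2 for v in seg)
        score = n * rss / (n - m) ** 2
        if best_score is None or score < best_score:
            best_m, best_score = m, score
    return best_m
```

The (n − m)² denominator is what penalizes the trace: RSS always falls as m grows, but GCV trades that against the effective degrees of freedom.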

The cosmic microwave background
Genovese, Miller, Nichol, Arjunwadkar & Wasserman [107] address an important problem in cosmology. They study the peaks in the temperature power spectrum of the cosmic microwave background radiation. Let y_ℓ be the estimated spectrum at multipole index ℓ. The model is y_ℓ = f(x_ℓ) + ε_ℓ, where x_ℓ = ℓ/max(ℓ). A parametric model for f has three peaks, and the existence of the third (rightmost) peak would provide the clearest support for the existence of dark matter. The response y_ℓ is highly heteroscedastic, with its variance increasing rapidly in ℓ. This complicates inference, especially for higher values of x_ℓ, which is precisely where the third peak should be located. Genovese et al. [107] estimate f by a truncated cosine expansion. To construct a uniform confidence set, they extend the methodology of Beran and Dümbgen [14] to accommodate the heteroscedasticity. The result is a 900-dimensional confidence ball which is, of course, difficult to visualize. To explore the ball, they create targeted "probes", which are functionals of interest. Using the probes they can, for example, find 95% confidence intervals for the heights and widths of the first two peaks. The nonparametric fit is compared with the so-called concordance model, which maximizes the joint likelihood under the parametric model of five independent data sets. The nonparametric fit does not have the third peak, although the concordance model does, since the third peak is an intrinsic part of the parametric model. The lack of the third peak in the nonparametric fit does not mean that the third peak does not exist. Rather, more precise data would be needed to establish its existence. This paper is noteworthy both for addressing a very interesting scientific question and for its novel use of simultaneous inference.
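A truncated cosine expansion of this general kind can be sketched with the discrete cosine (DCT-II) basis, which is exactly orthogonal on an equispaced grid; the grid size and truncation level below are hypothetical, not those of the paper.

```python
import math

def cosine_fit(y, K):
    """Fit a truncated cosine expansion using the orthogonal DCT-II basis:
    fhat_i = b_0 + sum_{k=1}^{K} b_k * cos(pi * k * (i + 0.5) / n),
    with coefficients obtained from the discrete orthogonality relations."""
    n = len(y)
    b = []
    for k in range(K + 1):
        c = sum(y[i] * math.cos(math.pi * k * (i + 0.5) / n) for i in range(n))
        # normalization: the k = 0 basis vector has squared norm n, the rest n/2
        b.append(c * (1.0 if k == 0 else 2.0) / n)
    return [sum(bk * math.cos(math.pi * k * (i + 0.5) / n)
                for k, bk in enumerate(b)) for i in range(n)]
```

Truncating at K basis functions is the smoothing step here; K plays the role that the degrees of freedom play for spline smoothers.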

Needle losses of Norway spruces
Augustin, Lang, Musio & von Wilpert [5] study needle loss, which is an indicator of tree vitality. They work with survey data on Norway spruces (Picea abies) in the southwestern region of Germany. One novel aspect of the paper is that the response is ordered categorical. The categories are healthy, intermediate, or damaged, defined, respectively, as less than 10%, 10-25%, or more than 25% needle loss. Augustin et al. use a geoadditive model for a latent continuous variable U such that U = f_1(x_1) + ··· + f_P(x_P) + f_spat(c_1, c_2) + w^T γ + ε, where x_1, ..., x_P are continuous covariates, (c_1, c_2) is spatial location, w is a vector of covariates that enter linearly, and ε is N(0, 1). The categorical response is a discretized version of U with cutoffs θ_1 < θ_2. Univariate P-splines are used to model f_1, ..., f_P and a tensor product of B-splines to model f_spat. Including f_spat in the model accommodates unknown covariates, but it also acts as a partial surrogate for known covariates and reduces the size of their effects. The analysis is Bayesian using MCMC. One important problem is prediction of needle loss at locations not covered by the surveys. The model can be used for prediction, but a complication is that some covariates are also unknown at these locations.
To circumvent this problem, the authors use a spatial model for these covariates and draw multiple imputations from their posterior distributions.
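Given the latent-Gaussian formulation, the three category probabilities follow directly from the standard normal cdf. The predictor value and cutoffs below are hypothetical, and this sketch assumes the orientation in which larger U corresponds to more needle loss.

```python
import math

def category_probs(eta, theta1, theta2):
    """Ordered-probit class probabilities for the three needle-loss categories,
    where U = eta + eps with eps ~ N(0, 1), eta the geoadditive predictor, and
    class boundaries theta1 < theta2 on the latent scale:
      healthy:      U <= theta1
      intermediate: theta1 < U <= theta2
      damaged:      U > theta2"""
    Phi = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal cdf
    p_healthy = Phi(theta1 - eta)
    p_intermediate = Phi(theta2 - eta) - Phi(theta1 - eta)
    p_damaged = 1.0 - Phi(theta2 - eta)
    return p_healthy, p_intermediate, p_damaged
```

Because ε is fixed at N(0, 1), the scale is identified and only η and the cutoffs θ_1 < θ_2 need to be estimated.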

Capture-recapture studies
Mark-recapture studies are a common means of assessing animal abundances and survival probabilities. Frequently, survival probabilities depend on covariates. For example, Gimenez, Crainiceanu, Barbraud, Jenouvrier & Morgan [112] describe a case study where the survival probabilities of snow petrels nesting at Petrels Island, Terre Adélie, Antarctica, depend upon the Southern Oscillation Index (SOI). The SOI is negatively related to temperature and can be used as an index of overall climate condition. Gimenez et al. [112] assume that there are I + 1 sampling occasions at times t_1, ..., t_{I+1}. They define φ_i to be the probability that an animal survives to time t_{i+1} given that it is alive at time t_i. The data consist of the number of animals captured, marked, and released at each sampling occasion and the number marked at time t_i and recaptured for the first time at t_j. The authors begin with the Cormack-Jolly-Seber model, which has among its parameters φ_1, ..., φ_I. Then they use a semiparametric model with a logit link function for the dependence of φ_i upon covariates. The nonparametric dependencies are modeled by splines. They propose a Bayesian analysis with computations by MCMC. In the snow petrel case study, they use WinBUGS. They find that survival probabilities of snow petrels decrease, possibly in a nonlinear way, with increasing values of the SOI. The estimated rate of decrease is high at low values of the SOI but diminishes at higher values of the SOI. Because the data are sparse, there is too much uncertainty to conclude that the effect of SOI is nonlinear. However, Gimenez et al. [112] note that a nonlinear effect of SOI is biologically plausible; lower values of SOI might increase access to prey, but prey abundance may increase with higher values of SOI (Loeb et al. [176]).
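The logit-link survival model can be sketched with a truncated-line spline on the link scale. The coefficients and knot below are hypothetical, chosen only to mimic the reported qualitative pattern of a steep decline at low SOI that flattens at higher SOI.

```python
import math

def survival_prob(soi, coefs, knots):
    """Logit-link spline model for survival:
    logit(phi) = b0 + b1*soi + sum_k u_k * (soi - knot_k)_+,
    with coefs = (b0, b1, u_1, ..., u_K)."""
    b0, b1 = coefs[0], coefs[1]
    eta = b0 + b1 * soi + sum(u * max(soi - k, 0.0)
                              for u, k in zip(coefs[2:], knots))
    return 1.0 / (1.0 + math.exp(-eta))  # inverse logit
```

With a negative slope below the knot and a positive knot coefficient, survival declines quickly for low SOI and more slowly thereafter, which is the kind of nonlinear effect the authors consider biologically plausible.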