Timescale effect estimation in time-series studies of air pollution and health: A Singular Spectrum Analysis approach

A wealth of epidemiological data suggests an association between mortality/morbidity from pulmonary and cardiovascular adverse events and air pollution but, despite the abundance of data, uncertainty remains as to the extent of the effects implied by those associations. In this paper we describe a Singular Spectrum Analysis (SSA) based approach to decomposing the time series of particulate matter concentration into a set of exposure variables, each one representing a different timescale. We apply our methodology to investigate both acute and long-term effects of $PM_{10}$ exposure on morbidity from respiratory causes within the urban area of Bari, Italy.


Introduction
A wealth of epidemiological studies based on time-series analysis have shown evidence of an association between morbidity/mortality caused by respiratory and cardiovascular adverse events and exposure to airborne particles [3]. Total suspended particulate matter is a complex mixture of organic and inorganic substances in either liquid or solid phase. Particles can vary in size, composition and origin, and can be characterized both physically and chemically. Particles with an aerodynamic diameter of less than 10 microns are referred to as PM 10 : they may be inhaled, reaching the upper airways and the lungs, with a risk of adverse health effects.
Assuming the daily counts $y_t$ of adverse health events to be conditionally independent Poisson given the rate $\phi_t$, a standard ecological Poisson regression model (used, with minor variations, in most large-scale epidemiological studies [11]) is
$$\log(\phi_t) = \beta_0 + \beta_1 PM_{10,t} + [DOW_t] + S(t, \delta_1) + S(temp_t, \delta_2) + S(umr_t, \delta_3) \qquad (1.1)$$
where $PM_{10,t}$ is measured in µg/m³, $[DOW_t]$ is a six-dimensional vector of dummy variables indicating the day of the week, and $S(t, \delta_1)$ is a smooth function of calendar time controlling for seasonality and other trends (the degree of roughness being controlled by the smoothing parameter $\delta_1$); further smooth confounders include temperature (temp), measured in °C, and relative humidity (umr), expressed as a percentage (meteorological confounders may affect the pollution-morbidity association: see [26] for an interesting discussion).
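To fix ideas, the core of model (1.1) can be sketched with simulated data. The following Python fragment is a minimal illustration, not the software used in this study: it fits a Poisson log-linear model by iteratively reweighted least squares (IRLS), omitting the day-of-week dummies and the smooth terms, and the simulated exposure series and coefficients are hypothetical.

```python
import numpy as np

def fit_poisson_glm(X, y, n_iter=50):
    """Fit log(phi_t) = X beta for a Poisson GLM by IRLS (Fisher scoring)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)
        W = mu                       # Poisson: Var(y) = mu, so IRLS weights are mu
        z = eta + (y - mu) / mu      # adjusted dependent variable
        WX = X * W[:, None]
        beta = np.linalg.solve(X.T @ WX, X.T @ (W * z))
    return beta

# Simulated example: daily counts driven by a PM10-like covariate
rng = np.random.default_rng(0)
n = 579
pm10 = rng.gamma(4.0, 10.0, n)       # synthetic exposure, ug/m3 (hypothetical)
eta_true = 1.0 + 0.005 * pm10        # true log-rate
y = rng.poisson(np.exp(eta_true))
X = np.column_stack([np.ones(n), pm10])
beta_hat = fit_poisson_glm(X, y)
```

With 579 simulated days the exposure coefficient is recovered close to its true value of 0.005.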
Among the most recent results we may cite MISA-2, a planned study over 15 Italian cities for the period 1996-2002 [4]. Updated city-specific estimates show an overall RR=1.005 (C.I.: 0.991-1.018, estimate ± 2 std. err.) per 10 µg/m³ increase in PM 10 concentration for respiratory causes, with a similar RR=1.005 (C.I.: 1.000-1.010) for cardiovascular causes. The estimated lag-3 days overall RR of hospitalization due to respiratory causes is 1.006 (C.I.: 1.002-1.011), while RR=1.003 (C.I.: 1.000-1.006) is estimated for cardiovascular adverse events. Another important recent study is the European meta-analysis conducted by the Regional Office of the World Health Organization (WHO) for Europe, based on 17 country-specific estimates [1]; the mortality RRs reported in the WHO meta-analysis are 1.009 (C.I.: 1.005-1.013) for cardiovascular causes and 1.013 (C.I.: 1.005-1.020) for respiratory ones (per 10 µg/m³ increase in PM 10 ).
Despite this growing body of evidence, considerable uncertainty remains: this raises the question of whether these associations merely represent premature mortality, within only a few days, among those already near death. Such a displacement (or harvesting) effect has been discussed by several authors after [19], and can complicate the interpretation of results: a reasonable underlying hypothesis is that mortality/morbidity displacement is related to associations on shorter timescales, while longer timescales are supposed to be resistant to mortality displacement. If the associations reflect only harvesting, then from a public-health point of view the effect of air pollution on morbidity can be considered as having a limited impact. A first attempt to assess short- and long-term effects was proposed in [28], using Cleveland's STL decomposition, based on the LOESS smoothing algorithm, to separate the time series of daily deaths into long, intermediate and short (residual) timescale series. Similarly, [10] developed a methodology based on the Discrete Fourier Transform to obtain a set of orthogonal predictors at given timescales, by partitioning the base interval [0, π] into a given set of Fourier frequencies. Expanding the time series of particulate matter concentration into a set of lagged exposure variables is a competing approach: a distributed-lag model was proposed in [36] by replacing the pollutant effect with a distributed-lag specification, each lag coefficient representing a specific contribution to exposure; cumulative effects on a given timescale can then be obtained by summing the contributions over a given range of lags.
The aforementioned methods share a common drawback: suitable timescales need to be arranged by the researcher in advance, rather than emerging naturally from the data analysis process. For example, [28] examines mid-scale components of the daily number of deaths with smoothing windows of 15, 30, 45 and 60 days: risk estimates are provided for each mid-scale window without attempting to provide a data-based criterion to choose among the alternatives. Similarly, [10] estimate the association between air pollution and mortality using six fixed timescales: ≥ 60 days, 30-59 days, 14-29 days, 7-13 days, 3.5-6 days and < 3.5 days. In short, current approaches to mortality displacement estimation do not provide automatic, data-driven methods to decompose air pollution time series into a set of suitable exposure variables, each one representing a different timescale. Improvements in this field would be greatly beneficial to public health time-series studies. For this reason, we propose an alternative approach based on Singular Spectrum Analysis (SSA) [14]. The word "spectrum" may be confusing here, since SSA is not derived from Fourier analysis: it is an algorithmic technique rooted in dynamical system theory, linear algebra and multivariate geometry. SSA can be defined as a model-free approach to decomposing a time series into easy-to-interpret components such as trend, harmonic intermediate components and pure noise (short-scale residual). This task can be accomplished by exploiting a functional clustering algorithm based on a suitable metric that allows a sensible grouping of more "elementary" components. No fixed timescales need to be known in advance in our approach: the proposed methodology is used to test the harvesting hypothesis on a dataset of residents in the city of Bari (Apulia, Italy).
The paper is structured as follows. Sec. 2 briefly reviews the data and the pre-processing methods used to deal with missing information and outliers; Sec. 3 gives a short introduction to SSA; the first part of Sec. 4 develops a functional clustering algorithm to group elementary components into interpretable exposure variables at several timescales, while the second part describes timescale estimation by means of GAM models with integrated smoothing parameter selection, and also reports results for our data set. Finally, Sec. 5 gives a brief discussion of the results and outlines future research opportunities.

Data description and pre-processing
Epidemiological data were obtained from the Apulian Regional Epidemiological Center, concerning the daily time series of hospitalized residents of the city of Bari between June 1st, 2000 and December 31st, 2001 (in total N = 579 days), diagnosed as suffering from pulmonary diseases (ICD-IX Classification: 460-519). Time series of particulate matter concentration and meteorological data were collected by a monitoring network subgroup of the Municipality of Bari (Department of Environmental Protection and Health), comprising four monitoring stations named "S. Nicola", "King", "Savoia" and "Cavour", which collected information on a wide set of pollutants such as benzene, CO, NO, NO 2 , NO x , O 3 and SO 2 . It is worth noticing that most of these time series contain a large number of missing data points.
Bi-hourly measurements of PM 10 were available at each of the four monitoring stations, whereas temperature and relative humidity were available on an hourly basis at the "S. Nicola", "Savoia" and "Cavour" stations only. These data were pre-processed in order to calculate an overall daily series for each variable (from midnight to midnight). Preliminary data analysis involved adjusting for the most disturbing outliers, attributable to temporary failures of the monitoring devices. Specifically, we computed a robust estimate of the covariance matrix of each multivariate time series using the Minimum Covariance Determinant (MCD) estimator (see [25] for details: the robust MCD estimator requires much less data for reliable results than the classical covariance matrix estimator, and gives more interpretable results, as extreme values are well isolated). We then applied an empirical rule, removing the five multivariate observations with the highest Mahalanobis distance (from the barycentre) based on the MCD estimate. Fig. 1 shows some details for the PM 10 series: most of the bi-hourly data are treated as missing, as the MCD estimation algorithm forcibly removes all rows containing at least one missing datum.
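The outlier-screening rule described above can be sketched as follows. This is an illustrative Python fragment using scikit-learn's MinCovDet estimator on synthetic station data: the series, the injected outliers and their positions are assumptions of the example, not the study data.

```python
import numpy as np
from sklearn.covariance import MinCovDet

rng = np.random.default_rng(1)
# Synthetic multivariate daily series (e.g. PM10 at four stations), with
# five device-failure outliers injected at known rows
X = rng.multivariate_normal([40, 42, 38, 45], 25 * np.eye(4) + 15, size=200)
outlier_rows = [10, 50, 90, 130, 170]
X[outlier_rows] += 200.0

mcd = MinCovDet(random_state=0).fit(X)
d2 = mcd.mahalanobis(X)          # squared robust Mahalanobis distances
flagged = np.argsort(d2)[-5:]    # empirical rule: drop the 5 farthest rows
```

On this toy data the five flagged rows coincide with the injected failures; the robust fit keeps the outliers from masking one another, which is the point of using MCD rather than the classical covariance estimate.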
For each monitoring station we recovered daily measures by averaging the twenty-four hourly observations (the mean of the twelve two-hour observations in the case of PM 10 ) when at least 75% of the hourly observations were available (this criterion is compliant with the APHEA protocol, [2]); otherwise, the daily datum was considered missing. After averaging over time, many missing values remained in the PM 10 series, as shown in the right panel of Fig. 2. The left panel contains the daily time series of hospitalizations.
Some exploratory statistics obtained after daily averaging are reported in Tab. 1: it is readily apparent that the proportion of missing data in the PM 10 series is about 60% at the "King" station. Comparable values were observed at the other monitoring stations; temperature and relative humidity were far less problematic. In order to obtain an overall daily measure for each variable, we computed a spatial average of each daily time series. This synthesis of information rests mainly on the assumption of constant exposure over the whole urban area. This is a reasonable assumption for temperature and relative humidity, while for the daily PM 10 series it can be justified on the grounds that the correlation coefficients between the station measurements ranged from 0.66 to 0.87. Some synthetic values of the reconstructed series can be found in Tab. 2; it is worth noticing the dramatic reduction in the number of missing points in the PM 10 series. A few missing values remained on those days for which none of the four PM 10 measurements was available: in these cases a 15-day causal moving average was used to impute the missing values. The final result of this information filtering and retrieval work is shown in Fig. 3.
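The 15-day causal moving-average imputation can be sketched as follows. This is an illustrative Python fragment on a toy series; how leading missing values are handled is an assumption of the sketch, not something specified in the text.

```python
import numpy as np

def impute_causal_ma(x, window=15):
    """Fill each NaN with the mean of up to `window` preceding observed values
    (causal: only past data are used, so the imputation could run in real time)."""
    x = np.asarray(x, dtype=float).copy()
    for t in np.flatnonzero(np.isnan(x)):
        past = x[max(0, t - window):t]
        past = past[~np.isnan(past)]
        if past.size:                # leading NaNs with no past data are left as-is
            x[t] = past.mean()
    return x

series = np.array([30.0, 32.0, np.nan, 35.0, np.nan, np.nan, 40.0])
filled = impute_causal_ma(series, window=15)
```

Note that already-imputed values feed into later imputations, as with any sequential causal filter.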

Singular Spectrum Analysis
In this section a brief review of Singular Spectrum Analysis (SSA) is provided: a detailed description can be found in [13; 14]. We denote the daily $PM_{10}$ time series by $\{x_t\}_{t=0}^{N-1}$: SSA relies on the Karhunen-Loève decomposition of a covariance matrix estimate of $K$ lagged copies of the original time series [29; 34; 35].
Let $L < [N/2]$ be a fixed integer called the window length, and let $K = N - L + 1$. The first SSA stage embeds the series $\{x_t\}_{t=0}^{N-1}$ into a lower-dimensional manifold than the original phase space by transforming it into the $L$-trajectory matrix
$$X = \left(x_{i+j-2}\right)_{i=1,\dots,L}^{j=1,\dots,K} = \begin{pmatrix} x_0 & x_1 & \cdots & x_{K-1} \\ x_1 & x_2 & \cdots & x_K \\ \vdots & \vdots & \ddots & \vdots \\ x_{L-1} & x_L & \cdots & x_{N-1} \end{pmatrix}.$$
Some celebrated results in dynamical system theory (a good reference is [5]) ensure that all the qualitative (topological) characteristics of the phase space are preserved by such a dimensionality reduction. It is worth noticing that every $L$-trajectory matrix is Hankel, i.e. its entries coincide along the anti-diagonals $i + j = s$ for $2 \le s \le L + K$; conversely, every $L \times K$ Hankel matrix is the $L$-trajectory matrix of some time series.
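The embedding step can be sketched in a few lines of Python (a minimal illustration on a toy series):

```python
import numpy as np

def trajectory_matrix(x, L):
    """Embed a series of length N into its L x K Hankel trajectory matrix,
    with K = N - L + 1; column j is the lagged window x[j], ..., x[j+L-1]."""
    K = len(x) - L + 1
    return np.column_stack([x[j:j + L] for j in range(K)])

x = np.arange(10.0)            # toy series, N = 10
X = trajectory_matrix(x, L=4)  # 4 x 7 matrix with X[i, j] = x[i + j]
```

The Hankel structure is visible directly: entries are constant along each anti-diagonal i + j.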
In the second stage, further information collapsing is carried out by computing the eigenvalues $\lambda_i$ of the $L \times L$ matrix $S = XX^T$. Let $d = \mathrm{rank}(X) = \mathrm{rank}(XX^T)$ be the number of nonzero eigenvalues of $S$: if $d < L$ then $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_d > 0$ and $\lambda_i = 0$ for all indexes $i$ larger than $d$. The trajectory matrix then admits the Singular Value Decomposition (SVD, [12])
$$X = \sum_{i=1}^{d} \sqrt{\lambda_i}\, u_i v_i^T = X_1 + \dots + X_d, \qquad (3.1)$$
where $\{u_i\}$ is a system of orthonormal eigenvectors associated with the nonzero eigenvalues of $S$ and $v_i = X^T u_i / \sqrt{\lambda_i}$. Hence the trajectory matrix is decomposed into a sum of elementary rank-one, pairwise bi-orthogonal matrices $X_i = \sqrt{\lambda_i}\, u_i v_i^T$. The collection $(\sqrt{\lambda_i}, u_i, v_i)$ will be referred to as the $i$th eigentriple of the SVD (3.1).
In the third phase, SSA attempts to reconstruct components $\{\tilde{x}_{\ell t}\}_{t=0}^{N-1}$ such that the original time series decomposes into a sum of $p$ time series with meaningful interpretations,
$$x_t = \sum_{\ell=1}^{p} \tilde{x}_{\ell t}. \qquad (3.2)$$
Reconstruction of such components requires a suitable grouping of the index set $I = \{1, \dots, d\}$ into $p$ disjoint subsets $I_1, \dots, I_\ell, \dots, I_p$, with $I_\ell = \{\ell_1, \dots, \ell_{n_\ell}\}$, such that the SVD expansion (3.1) can be reformulated as
$$X = X_{I_1} + \dots + X_{I_p}, \qquad (3.3)$$
where $X_{I_\ell} = X_{\ell_1} + \dots + X_{\ell_{n_\ell}}$. If all the matrices on the right-hand side of (3.3) are Hankel, then they are $L$-trajectory matrices from which components on different timescales can easily be reconstructed. Nevertheless, this situation rarely occurs in practice: in most real data sets no sensible grouping can be found such that $X_{I_1}, \dots, X_{I_p}$ are all $L$-trajectory matrices. The last SSA phase therefore consists of the Hankelization, or diagonal averaging, of the resultant matrices $X_{I_1}, \dots, X_{I_p}$. Let $Y$ be any $L \times K$ matrix with elements $y_{ij}$, and let $||Y||_M^2 = \sum_{ij} y_{ij}^2$ denote its squared Frobenius-Perron norm. Hankelization is carried out by means of a linear operator $H$ acting on the space of $L \times K$ matrices: the resulting matrix $Z = HY$ is obtained by averaging $Y$ along each anti-diagonal $s = i + j$, i.e. $z_{ij}$ is the mean of the entries $y_{uv}$ with $u + v = i + j$. It can be proved that $Z$ is Hankel, and thus is the $L$-trajectory matrix of some time series; it is also the best approximation to $Y$ in the sense of the Frobenius-Perron norm [6]: if $M_{L \times K}$ is the space of real $L \times K$ matrices and $M^{(H)}_{L \times K}$ is the linear subspace of Hankel $L \times K$ matrices, then $Z = HY$ minimizes $||Y - Z||_M^2$ over $M^{(H)}_{L \times K}$, from which it readily follows that $H$ is an orthogonal projection operator and $HX = X$. By applying diagonal averaging to both sides of (3.3), every resultant matrix $X_{I_1}, \dots, X_{I_p}$ produces an $L$-trajectory matrix, from which the decomposition (3.2) can easily be recovered.
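The four SSA stages (embedding, SVD, elementary decomposition, diagonal averaging) can be sketched as follows. This minimal Python fragment reconstructs the elementary component series of a toy trend-plus-cycle signal; it is an illustration, not the code used in the study.

```python
import numpy as np

def hankelize(Y):
    """Diagonal averaging: project an L x K matrix onto the nearest Hankel
    matrix (in Frobenius norm) and read off the reconstructed series."""
    L, K = Y.shape
    z = np.zeros(L + K - 1)
    counts = np.zeros(L + K - 1)
    for i in range(L):
        for j in range(K):
            z[i + j] += Y[i, j]
            counts[i + j] += 1
    return z / counts

def ssa_components(x, L):
    """Return the elementary reconstructed series of an SSA decomposition."""
    X = np.column_stack([x[j:j + L] for j in range(len(x) - L + 1)])
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return [hankelize(s[i] * np.outer(U[:, i], Vt[i]))
            for i in range(len(s)) if s[i] > 1e-10]   # drop numerically null terms

t = np.arange(200)
x = 0.05 * t + np.sin(2 * np.pi * t / 12)   # toy trend plus monthly-like cycle
comps = ssa_components(x, L=60)
```

Because the SVD is exact and diagonal averaging is linear with HX = X, the elementary reconstructed series sum back to the original signal.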

Reconstruction of components
We can consider the lag-covariance matrix $C = S/K$ instead of $S$ for obtaining the singular values. $C$ is a sort of non-centered covariance matrix among the columns of $X$ (the $L$-lagged vectors), and its use is justified by the fact that $Cu_i = \lambda^C_i u_i$ implies $Su_i = K\lambda^C_i u_i$: the orthonormal system $\{u_i\}$ is unaffected by the choice between $S$ and $C$, and the only difference in the SVD of $X$ lies in the magnitude of the corresponding eigenvalues (they are $K$ times larger for $S$). This fact simplifies the visual mining of the component series. We set the window length to $L = 60$, as adverse effects at timescales longer than two months are likely to be confounded with long-term effects due to other causes. The left panel of Fig. 4 shows the singular value spectrum: the singular values are very close to each other, except for a large dominant value, which points to a long-period slowly varying component (trend), and several plateaux corresponding to shorter-period oscillatory components or pure noise. The right panel shows the degree of approximation provided by each single component of the SVD of $X$: it can be proved that $||X||_M^2 = \sum_{i=1}^{d} \lambda_i$, and that $X - X_{I_\ell}$, for $I_\ell = \{\ell_1, \dots, \ell_{n_\ell}\} \subseteq \{1, \dots, d\}$, has SVD given by the eigentriples with indexes $\bar{I}_\ell = \{1, \dots, d\} \setminus I_\ell$; hence a measure of the degree of approximation, referred to as the eigenvalue share of the eigentriples with indexes $I_\ell = \{\ell_1, \dots, \ell_{n_\ell}\}$, can be defined as
$$g(I_\ell) = \sum_{i \in I_\ell} \lambda_i \Big/ \sum_{i=1}^{d} \lambda_i.$$
The largest eigenvalue ($\sqrt{\lambda_1} = 357.85$) accounts for about 88% of the total variability: this strengthens the belief that the corresponding eigentriple can be identified with the slowly varying component (trend), whereas the identification of further components demands a more elaborate algorithm.
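The eigenvalue-share computation can be sketched as follows. This is an illustrative Python fragment on a synthetic series whose level, cycles and noise are assumptions of the example; the dominance of the first eigentriple mirrors the behaviour described above for a series with a large slowly varying level.

```python
import numpy as np

rng = np.random.default_rng(2)
t = np.arange(579)
# Synthetic PM10-like series: slowly varying level plus a weekly cycle and noise
x = (40 + 10 * np.sin(2 * np.pi * t / 365)
        + 3 * np.sin(2 * np.pi * t / 7)
        + rng.normal(0, 1, t.size))

L = 60
X = np.column_stack([x[j:j + L] for j in range(t.size - L + 1)])
lam = np.linalg.svd(X, compute_uv=False) ** 2   # eigenvalues of S = X X^T

share = lam / lam.sum()    # eigenvalue share of each eigentriple
dominant = share[0]
```

Here the first eigentriple absorbs most of the energy, exactly the pattern that identifies the trend group in the real series.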
A suitable decomposition of the PM 10 series can be determined by modifying the four-step algorithm described in the previous section. We suggest applying Hankelization to each term of the full SVD decomposition (3.1); let $\langle Y, W \rangle_M = \sum_{ij} y_{ij} w_{ij}$ denote the inner product compatible with the Frobenius-Perron norm. Moreover (see [14], p. 258), since $X$ is a Hankel matrix and $H$ is a linear operator with $HX = X$, it can be proved that
$$X = HX = \sum_{i=1}^{d} HX_i.$$
Each hankelized term $HX_i$ is therefore the $L$-trajectory matrix of some component time series: this condition will be referred to as weak $L$-separability.
By joining elementary hankelized components having minimum distance in terms of weak $L$-separability, we often obtain a sensible grouping (3.3) whose component matrices are as close as possible (in the sense of the Frobenius-Perron distance) to Hankel matrices, and hence amenable to a suitable interpretation after the diagonal averaging step. A sensible measure of weak $L$-separability between components $\ell$ and $j$ is the w-correlation
$$\rho^{(w)}_{\ell j} = \frac{\langle HX_\ell, HX_j \rangle_M}{||HX_\ell||_M \, ||HX_j||_M}.$$
The periods of the reconstructed components are not fixed in advance: therefore, periodogram analysis of each reconstructed component may help estimate the corresponding timescales. More sophisticated approaches include non-parametric and parametric ARMA-based spectral density estimation [7; 8], both based on a suitable smoothing of the Fast Fourier Transform output; these require considerable skill for proper application. For this reason, we suggest a faster, heuristic approach to period estimation. The period is the amount of time needed to complete one cycle: hence the number of peaks (local maxima) observed during the $N = 579$ days is a rough estimate of the frequency (the number of cycles per unit time), and an approximate period estimate is given by its inverse:
$$\Pi(G_i) = \frac{\text{Number of days in the observation period}}{\text{Number of peaks in the } i\text{th reconstructed component}}.$$
Using this simple estimator we found $\Pi(G_1) \simeq 27.57$, $\Pi(G_2) \simeq 7.24$, $\Pi(G_3) \simeq 3.97$, $\Pi(G_4) \simeq 2.84$, $\Pi(G_5) \simeq 2.46$, the day being the natural measurement unit.
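The peak-counting period estimator can be sketched directly (illustrative Python; the test signal is a hypothetical pure 7-day waveform, not a reconstructed component from the study):

```python
import numpy as np

def peak_count_period(g):
    """Estimate a component's period as
    (number of days in the observation period) / (number of local maxima)."""
    interior = np.arange(1, len(g) - 1)
    peaks = np.sum((g[interior] > g[interior - 1]) &
                   (g[interior] > g[interior + 1]))
    return len(g) / peaks if peaks else np.inf

t = np.arange(579)
weekly = np.sin(2 * np.pi * t / 7)   # pure 7-day waveform over N = 579 days
period = peak_count_period(weekly)   # close to 7 days
```

The estimator recovers the weekly period to within a fraction of a day, which is all the grouping step requires.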
It is not surprising that some redundancies may arise: for example, the estimated timescales Π (G 4 ) and Π (G 5 ) are almost identical. This can be explained by noting that there is no simple one-to-one correspondence between weak L-separability and the frequency content of the elementary series joined together by our functional clustering algorithm. For this reason, if [Π (G ℓ )] = [Π (G j )] we shall say that the reconstructed components ℓ and j are non-identifiable (by [·] we denote the integer-part function): in this case we prescribe that the corresponding elementary series be merged into a single group, and a new component be reconstructed by diagonal averaging.
This criterion seems reasonable for the application at hand, but it may be inappropriate in other contexts. For example, an orthogonal design is based on a set of orthogonal exposure variables: unfortunately, w-orthogonality does not imply classical (Euclidean) orthogonality, so the reconstructed components are not pairwise orthogonal. In fact, only the elementary components on the right-hand side of (3.1) have this property, being the principal components of the L-trajectory matrix. In such cases, a suitable additional constraint to remove non-identifiability would be R G ℓ ,G j < ε, which consists in merging reconstructed series ℓ and j only when their Pearson correlation coefficient R G ℓ ,G j is below a predetermined tolerance ε.
In our case, components G 4 and G 5 are non-identifiable according to our definition, and R G4,G5 ≃ 0.19. Therefore, a new grouping into p = 4 components can be obtained by merging the elementary series formerly classified into groups G 4 and G 5 . The new reconstructed components are shown in Fig. 6: looking at the first two panels, a remarkable overfitting of the longest-period waveform is apparent. We did not explore this feature further, as prediction and change-point detection are not objectives of our analysis. We propose the following interpretation of the estimated series. A main feature of SSA, of course, is that it provides an automated, data-driven approach to decomposing the time series into timescale components: the reconstructed short-period series can be used to test the mortality displacement hypothesis. In addition, long-period air pollution effects can be estimated with respect to exposure variables that are easily interpretable and that vary at timescales of scientific interest (such as seasonality and trend). We address this issue in the next section.

Timescale estimation
A GAM accounting for the exposure variables at given timescales can be formulated as
$$\log(\phi_t) = h_t + \sum_{\ell=1}^{p} \beta_\ell x_{\ell t}, \qquad (4.2)$$
where $h_t$ is a shortcut for the intercept, the dummy vector $[DOW_t]$ and the smooth components, while $\{x_{\ell t}\}$ is a suitable set of exposure variables chosen to account for timescale effects. Our functional clustering method can be considered a dimensionality-reduction algorithm that allows model (4.2) to be parametrized in terms of a small number of waveforms $x_{\ell t}$; other methods, such as standard principal component analysis (PCA) and independent component analysis (ICA, [20]), impose mathematically convenient constraints (orthogonality and independence) that have no meaningful justification here: they do not decompose the original data into a set of interpretable exposure variables, and they complicate the interpretation of the components.
The initial SSA decomposition can be modified on the grounds of a careful consideration of both the estimated timescales and the singular value amplitudes: merging or widening groups yields alternative decompositions (for example, a trend-plus-season solution can be estimated by widening the first group). Although the overall number of suitable alternative decompositions is not large, a subjective adjustment still seems to be required to obtain a correct and easy-to-interpret decomposition. A simple procedure for forward traversal of the space of all sensible decompositions relies on estimating the statistical goodness of fit of each set of exposure variables entering the GAM model (4.2), after adjusting for the increasing complexity by means of a suitable penalty: data-driven model choice is certainly appealing, since interest focuses on the relationship between air pollution and morbidity. A computationally feasible solution is the UBRE criterion, a rescaled version of the AIC statistic (see [16], p. 160), based on minimization of the expected mean squared error (details are discussed in [32] and [33]): for $n$ independent observations from an exponential family with scale parameter $\phi$ (in the Poisson case $\phi = 1$), the UBRE has the expression
$$\mathrm{UBRE} = \frac{1}{n} D(y; \hat{\mu}) + \frac{2\phi}{n} \mathrm{tr}(R) - \phi,$$
where $D(y; \hat{\mu})$ is the log-likelihood ratio statistic (deviance) of the current model, $R$ is the weighted linear operator producing estimates of the adjusted dependent variable at each step of the GAM local scoring estimation algorithm, and $\mathrm{tr}(R)$ represents the overall effective degrees of freedom of the model (see [32] for details). It can be proved that UBRE is a simplified version of the generalized cross-validation (GCV) criterion, which remains valid when the scale parameter is unknown; in addition, it is very similar to the GCV criterion introduced in [22].
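Given the deviance and the effective degrees of freedom of a fitted model, the UBRE score is trivial to compute. The following Python sketch compares two hypothetical candidate decompositions; the deviance and edf values are made up for illustration only.

```python
def ubre(deviance, edf, n, phi=1.0):
    """UBRE score: deviance per observation plus a complexity penalty of
    2*phi*edf/n, minus phi; phi = 1 for the Poisson case."""
    return deviance / n + 2.0 * phi * edf / n - phi

# Hypothetical comparison: a slightly better-fitting but more complex model
# pays a penalty through its larger effective degrees of freedom
score_a = ubre(deviance=640.0, edf=12.0, n=579)
score_b = ubre(deviance=628.0, edf=25.0, n=579)
```

With these made-up numbers the simpler model wins (score_a < score_b): its extra deviance of 12 is cheaper than the 13 extra effective degrees of freedom, each costing 2/n.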
Another benefit of automatic model selection concerns the choice of the degrees of freedom (df) associated with the smooth terms entering model (4.2) to adjust for unmeasured temporal confounders and meteorological variables. Each smooth term is a natural cubic spline, which can be derived from a conditional minimization problem with respect to the coefficients of a suitable function basis, given a quadratic penalty depending on a symmetric smoothing matrix $S(\delta_j)$ [17]: for fixed values of the smoothing parameters $\delta_j$, parameter estimation for the GAM model (4.2) is easily shown to be equivalent to a conditional minimization problem with multiple quadratic penalties [33]. It is usually difficult to decide how much smoothing to allow for the smooth functions in model (4.2): fixing each df at a pre-specified value may sometimes be justified on the grounds of biological knowledge and of the peculiarities of the problem, although it is likely to undermine the reproducibility of epidemiological findings [23]. For example, [9] estimates a model very similar to (1.1) (except for the exclusion of the relative humidity term) by setting $\delta_1 = 7$, $\delta_2 = 8$; an identical model is estimated in [3] by choosing $\delta_1 = 7 \times \mathrm{years}^{-1}$ (i.e. $\delta_1 = 3.5$ for a two-year observation period) and $\delta_2 = 6$. Data-driven model choice is thus a necessary task in particulate matter time-series studies, even if the number of candidate models may be prohibitive: when $k$ smoothing terms are allowed in the model (as in our case), simultaneous parameter estimation and smoothing parameter selection based on UBRE minimization over a $k$-dimensional space was computationally infeasible until the methods of [30; 31] were recently introduced. The related software, the R mgcv::gam library (version 1.3-29), has been used for model estimation and selection throughout this section.

Results and some comparisons
To test the harvesting hypothesis we assumed that model (4.2) holds: evidence of short-term effects may reflect increased recruitment into the frail pool caused by air pollution (and not by a pre-existing disease condition). Table 3 shows effect estimates and approximate p-values for each predictor entering a model based on the $G_{\mathrm{new}}$ decomposition. The estimated degrees of freedom (edf) of each smooth term are inversely related to the smoothing parameter estimates. In order for a GAM to be identifiable, each smooth term evaluated at its covariate values must sum to zero ([16] refers to this as 'centering' the smooths). This identifiability constraint removes one degree of freedom, so the lower limit for a smooth with no curvature at all (a straight line) is 1 (in contrast with the bounds on the degrees of freedom of free smoothing splines, which range between 2, a straight line, and +∞, a perfectly interpolating spline).
Conditionally on $G_{\mathrm{new}}$ we found little evidence for mortality displacement, and no associations were present on longer timescales either. We investigated the sensitivity of our results to the chosen decomposition: the model based on the $G^{(2)}_{\mathrm{new}}$ set provided comparable global scores (UBRE = 0.3618, deviance explained 51.6%). The only significant effect occurred at $G^{(2)}_{\mathrm{new},1}$ (adjusted RR = 1.02, C.I.: 1.001-1.039), although a four-month association between morbidity and air pollution is quite unrealistic, and is likely due to data snooping.
When the $G^{(3)}_{\mathrm{new}}$ decomposition was used, the global model scores showed a significant decrease (approximate ANOVA tests comparing this model with the previous ones were both significant). Results are shown in Table 4 (effect estimates, approximate significance of smooth terms and global model scores for the $G^{(3)}_{\mathrm{new}}$ decomposition of PM 10 ): the smooth term controlling for unmeasured temporal confounders was significant, and significant effects occurred at the $G^{(3)}_{\mathrm{new},1}$ timescale; the corresponding relative-risk decrease may be associated with a pool of healthy individuals who are still healthy two months after exposure to air pollution. The largest effect occurred at the timescale of one month (RR = 1.004, C.I.: 1.001-1.007), corresponding to a relative-risk increase of about 0.4% per 10 µg/m³ increase in PM 10 concentration. These findings are quite comparable to those of [10], who reported (by means of their FFT-based decomposition method, applied to both cardiovascular and respiratory causes in four US cities) larger effects at longer timescales, from 14 days up to 2 months. We applied Dominici's methodology to our PM 10 data for a direct comparison: suitable R software (the decompose() function described in [10]) was used to generate two new sets of exposure variables. For the first one, the "breaks" in the decompose function were set to 1, 19, 41, 83, 165, 579 days: except for the first and the last, these values are defined as 579/r with r = 30, 14, 7, 3.5. This choice was used as a standard in [10], and generated waveforms quite equivalent to those obtained by the SSA $G^{(3)}_{\mathrm{new}}$-based decomposition. According to [10], the interpretation of such timescales is: more than 30 days (timescale 1, trend plus large-scale periodicity, see Fig. 8), 14-30 days (timescale 2), and so on down to 1-3.5 days (timescale 5).
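A band-limited decomposition in the spirit of the decompose() function can be sketched in Python as follows. Note that this illustration partitions the Fourier frequencies by period boundaries expressed in days, whereas the breaks quoted above are frequency indices; the toy series is hypothetical.

```python
import numpy as np

def fft_timescale_decompose(x, breaks):
    """Split a series into band-limited components: component k keeps the
    Fourier frequencies whose periods lie in [breaks[k], breaks[k+1])."""
    n = len(x)
    F = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(n)    # cycles per day
    periods = np.where(freqs > 0, 1.0 / np.maximum(freqs, 1e-12), np.inf)
    comps = []
    for lo, hi in zip(breaks[:-1], breaks[1:]):
        mask = (periods >= lo) & (periods < hi)
        if np.isinf(hi):          # the longest band also keeps the mean (freq 0)
            mask |= np.isinf(periods)
        comps.append(np.fft.irfft(F * mask, n))
    return comps

t = np.arange(579)
x = 40 + 5 * np.sin(2 * np.pi * t / 30) + 2 * np.sin(2 * np.pi * t / 7)
comps = fft_timescale_decompose(x, breaks=[1, 3.5, 7, 14, 30, np.inf])
```

Since each frequency bin is assigned to exactly one band, the five components sum back to the original series; the orthogonality of the bands is the property the fixed-timescale regressors of [10] rely on.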
A second decomposition was generated by setting the breaks to 1, 21, 80, 146, 220, 579; except for the first and the last, these were obtained by taking r = 27.57, 7.24, 3.97, 2.63. The three decompositions (SSA, FFT-A and FFT-B, for short) are shown side by side in Fig. 8: surprisingly, no significant effects occurred conditionally on FFT-A (see Table 5), a puzzling result confirmed by the poor global model score (UBRE = 0.3608). Identical results occurred for FFT-B (UBRE = 0.36048, deviance explained 51.7%, no significant effects): it must therefore be stressed that the SSA decomposition provides a better explanation of the data, the number of predictors entering the model being equal.
Most of the literature reports little evidence of mortality displacement due to PM 10 : our findings reinforce the conclusion that the increased chronic morbidity risks associated with even small increases in PM 10 are well established, although these increases are not greater for susceptible populations. In agreement with our results, several authors report a lag of about four weeks between pollutant exposure and an increase in the mortality/morbidity risk: for example, [21] reported significant effects on cardiovascular-respiratory mortality in Sydney, Australia, at longer timescales (one month or more). Similarly, [36] assumed a distributed-lag model for mortality from natural causes in Milan, Italy, including lags up to 45 days: the authors reported an estimated total suspended particulate relative risk RR=1.037 (C.I.: 1.019-1.056) for the first 15 days, close to zero for lags between 16 and 30, and RR=1.027 (C.I.: 1.019-1.56) for lags up to 45 days.

Discussion and conclusions
A great deal of uncertainty exists about the extent of the life-shortening, or of the increase in morbidity, due to the effects of air pollution. A limited amount of evidence from particulate matter time-series studies suggests that the increased morbidity/mortality risk is not greater for susceptible populations. If a mortality/morbidity displacement (harvesting) effect is evident only a few days after exposure, then the relevance of the findings of daily time-series studies could be questioned, as adverse health effects might arguably be attributed only to the frailty of the individuals at risk. To test and estimate the pattern of mortality displacement, we proposed a statistical framework based on Singular Spectrum Analysis (SSA), a geometric technique derived from dynamical system theory, suitable for constructing easily interpretable exposure variables at several timescales. We believe that our method is, from a practical point of view, superior to FFT-based methods: the decomposition of the original series is itself part of a data-driven process, and no Fourier frequencies need to be fixed in advance. The only free parameter is the window length L, and the problem at hand can suggest a sensible value, since the main (and only) rule of thumb for stationary series containing multiple harmonic components is that periods larger than L are confounded with long-term effects.
Promising theoretical developments are reported in [15], where SSA is extended to deal with missing data: the issue of missing information is ubiquitous in meteorological and pollutant time series, and the power of such newer data pre-processing methods remains to be established. Some subjective adjustments are still needed during the grouping phase. In particular, the correct separation of the dominant period (trend) from sub-harmonic frequencies merged into the sub-dominant group is a critical step: we believe that the data-adaptive wavelet-based method introduced in [27] is the right route towards fully automated data analysis in multi-scale public health time-series studies. More efficient functional clustering algorithms will also be needed: for example, the shape-based curve clustering procedure described in [18] is very promising.