The West African Monsoon Onset: A Concise Comparison of Definitions

The onset of the West African monsoon (WAM) marks a vital time for local and regional stakeholders. While the seasonal progression of monsoon winds and the related migration of precipitation from the Guinea Coast toward the Sudan/Sahel is apparent, there exist contrasting man-made definitions of what the WAM onset means. Broadly speaking, onset can be analyzed regionally, locally, or over a designated intermediate scale. There are at least 18 distinct definitions of the WAM onset in publication, with little work done on comparing observed onset from different definitions or comparing onset realizations across different datasets and resolutions. Here, nine definitions have been calculated using multiple datasets of different metrics at different resolutions. It is found that mean regional onset dates are consistent across multiple datasets and different definitions. There is low interannual variability in regional onset, suggesting that regional seasonal forecasting of the onset provides few benefits over climatology. In contrast, local onsets show high spatial, interannual, and interdefinition variability. Furthermore, it is found that there is little correlation between local onset dates and regional onset dates across West Africa, implying a disharmony between regional measures of onset and the experience on a local scale. The results of this study show that evaluation of seasonal monsoon onset forecasts is far from straightforward. Given a seasonal forecasting model, it is possible to simultaneously have a good and a bad prediction of monsoon onset simply through selection of the onset definition and observational dataset used for comparison.


Introduction
The annual advance of the West African monsoon (WAM) is a pivotal period for the inhabitants of the region. Because of a lack of irrigation, most farmers in West Africa are dependent on monsoon rains for sustainable crop farming (Ingram et al. 2002; Sultan et al. 2005a).
Furthermore, the onset of rains affects health organization operations, particularly for preparation for and prevention of dengue fever and malaria epidemics and mapping the seasonal reduction in meningitis over West Africa (Molesworth et al. 2003; Sultan et al. 2005b; Mera et al. 2014). For meteorologists, the progression of the monsoon system into the Sahel marks a shift in regional climate dynamics signifying the beginning of an active period for the formation of westward-propagating mesoscale convective systems and African easterly waves (AEW; Fink and Reiner 2003), which can transform into tropical storms in the Atlantic Ocean (Berry and Thorncroft 2005).
During boreal spring and early summer, warm sea surface temperatures over the Gulf of Guinea (GoG) and eastern equatorial Atlantic (5°S-5°N, 10°W-10°E) lead to an increase in atmospheric water content. The rapid cooling of the sea surface around 5°S during April and May leads to the increase in coastal rainfall around 5°-7°N; this marks the beginning of the coastal onset of the WAM (Okumura and Xie 2004; Hagos and Cook 2009; Thorncroft et al. 2011; Nguyen et al. 2011; Brandt et al. 2011; Caniaux et al. 2011; Liu et al. 2012). Precipitation is maximized at this time near 5°N. Because of the strengthening of the north-south pressure gradient during this time (with high pressure near the coast and low pressure toward the Sahara), moist, cool, southwesterly winds protrude from the Gulf of Guinea and the eastern Atlantic inland toward the Sudanian region. At the Intertropical Front (ITF), these winds meet the drier, warmer, northeasterly winds emanating from farther inland. The location of the ITF establishes the northernmost location of seasonal rains, and its passage marks the beginning of the local rainy season (Dettwiller 1965; Sultan and Janicot 2003; Lélé and Lamb 2010). The zone of maximum deep convection and heavy precipitation, sometimes referred to as the intertropical convergence zone (ITCZ) or the tropical rain belt, lies approximately 400 km to the south of the ITF (Lélé and Lamb 2010). As the heating of the Sahara intensifies, increasing the magnitude of the Saharan heat low (SHL), the pressure gradient between the coastal region and the Sahara maximizes (Roca et al. 2005; Lavaysse et al. 2009; Messager et al. 2010; Janicot et al. 2011). For reasons not yet fully understood, convection is suppressed at 5°N, followed swiftly by a rapid shift of the ITCZ toward a new quasi-stationary location in the Sudanian region around 10°N across the monsoon region (typically defined as the longitude range 10°W-10°E).
The northward shift of the ITCZ marks the regional onset of the WAM (Sultan and Janicot 2003; Gu and Adler 2004; Fontaine and Louvet 2006; Hagos and Cook 2007; Fontaine et al. 2008; Gazeaux et al. 2011). The timing of this shift appears to be reasonably consistent, with a standard deviation of eight days (Sultan and Janicot 2003). One important result from previous work is that the timing of the start of the local rainy season and that of the ITCZ shift are very weakly correlated (0.01) (Sultan and Janicot 2003).
While there is a clear overarching structure to the monsoon progression over West Africa, the concept of monsoon onset is more fluid and open to interpretation and modification for end user needs. Onset definitions can be broadly classified as local or regional, based on the scale on which they are applied. The majority of regional onsets identify a sudden northward shift (or jump) of the ITCZ (or closely linked variant) from near the Guinea Coast into the Sahel (Table 1). It is important to note that the abrupt jump of the monsoon system may not be a reliable feature for forecasting every year. Gazeaux et al. (2011) purposely looked for a statistical switch point in monsoon dynamics representative of the jump, but found no such feature present in 25% of studied years. Currently, it is not fully understood why in some years regional onset is more gradual than in other years.
Local onset definitions, by contrast, predominantly focus on precipitation totals at single locations and provide a more specific onset measure for local stakeholders (see Table 2). Because of the inherently noisy nature of local precipitation [alluded to in Maloney and Shaman (2008), for example], these onset definitions often have low spatial and interannual homogeneity, and their patterns have not been explored extensively. Nevertheless, the onset dates provided by these definitions could be far more suitable for individuals and local agencies operating in West Africa.
Onset can be diagnosed either from satellite or local in situ observations, such as rain gauges. Satellite data are generally more complete but may show substantial biases (e.g., Tompkins and Adebiyi 2012). Rain gauges are usually a more accurate representation of localized rainfall but, because of the inhomogeneity of rainfall, may not be representative over scales of more than 50-100 km. Furthermore, the coverage of in situ observations across West Africa is not spatially consistent, and there exist many regions in which observations are very sparse (Ali et al. 2005; Roca et al. 2010).
There are few intercomparisons of the multiple onset definitions that have been proposed [exceptions include Ati et al. (2002), Fontaine et al. (2008), and Gazeaux et al. (2011)]. In addition, to the best of our knowledge, there is no study that systematically and objectively compares WAM onset definitions realized over different scales using multiple datasets. Without a systematic overview of onset definitions and evaluation of dataset variability on onset, a complete and reliable study of onset forecasting skill is not possible. We offer here an evaluation of onset calculation methods and recommend pragmatic local and regional onset definitions for forecasters.
The purpose of this article is to provide an overview of existing definitions of the WAM onset and examine the variability of onset date found using different definitions and datasets. It is not the aim of this paper to evaluate the triggers for local and regional onset, nor to evaluate the predictability of onsets. This is left for future studies.
Section 2 provides an outline of published onset definitions, with section 3 providing a brief overview of datasets chosen for analysis. Sections 4 and 5 analyze specific regional and local onset definitions, with section 6 offering recommendations for future work.

[Tables 1 and 2: published regional and local onset definitions. The recoverable entries are listed below as reference (label): data; definition. Gaps in the source are marked [...].]

Le Barbé et al. (2002) (LeB_Reg): Daily precipitation data; rain gauge data taken from the Centre Regional Agrometeorologie-Hydrologie-Meteorologie (AGRHYMET) dataset, 1950-90. Onset date inferred from an appreciable jump in the maximum of zonally averaged (10°W-10°E), 10-day smoothed daily rain event occurrence from 5° to 10°N.
Sultan and Janicot (2003) [...] (Huffman et al. 2007).
[...] Motivated by Sijikumar et al. (2006); onset occurs during the first pentad rainfall in a predefined Sahelian index that is greater than for a Guinean index for at least five consecutive pentads.
[...] where D is the number of days in the first month with P > 50.8 mm (MER), R is rainfall in MER, and F is the total rainfall in previous months.
Kowal and Knabe (1972), as analyzed in Ati et al. (2002): Same data as in Walter (1967). The first 10-day period in which rainfall exceeds 25 mm and the subsequent 10-day rainfall total is greater than 0.5 of the potential evapotranspiration.
Benoit (1977), as analyzed in Ati et al. (2002): Same data as in Walter (1967). Date when accumulated daily rainfall exceeds 0.5 of the accumulated potential evapotranspiration for the remainder of the season, provided this date is not immediately followed by a 5-day dry spell.
Olaniran (1983), as analyzed in Ati et al. (2002): Same data as in Walter (1967 [...] > 50.8 mm, F_x is the rainfall of month following MER, and R_x is the rainfall of the second month following MER.
Sivakumar (1988), as analyzed in Ati et al. (2002): Same data as in Walter (1967). First date after 1 May when 3-day accumulated rainfall exceeds 20 mm with no 7-day dry spell in subsequent 30 days.
Omotosho et al. (2000) (Omo); see also Omotosho (1990): Rain gauge station data; data used for years between 1971 and 1997 but not continuous. The first 3 or 4 rainy days (>10 mm) to occur with not more than 7 days between them.
[...] Here, we adapt the method found in the original work to prescribe a specific onset date; this method can be modified to be studied either regionally or semilocally.
Zeng and Lu (2004) (semilocal): Global daily precipitable water data, 1988-97 (Randel et al. 1996). Taking a grid cell within the West African region, and its eight direct neighboring grid cells, onset occurs at the first date when the normalized precipitable water index (a measure of current precipitable water values against historical values for the last 10 years) exceeds the golden ratio (0.618) in seven of the nine locations examined.
Marteau et al. (2009) [...] > 20 mm with no 7-day period of total rainfall less than 5 mm in the succeeding 20 days.
Vellinga et al. (2012): GPCP daily data at 1° × 1° resolution for period 1979-2010 (Adler et al. 2003; Xie et al. 2003) and TRMM v6 3-hourly data at 0.25° × 0.25° resolution for period 1998-2010 (Huffman et al. 2007). The date before which a given percentage of climatological annual rainfall has fallen (20% given as example).
Yamada et al. (2013) (Yam): Atmosphere general circulation model study (Numaguti et al. 1997). Date when 6-day average rainfall exceeds 2 mm day⁻¹.
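Several of the threshold-based local definitions in Table 2 reduce to short calculations on a daily rainfall series. The following is a minimal sketch, assuming a plain list of daily rainfall totals (mm) for a single grid cell or gauge. It illustrates the Yamada et al. (2013) criterion (6-day average rainfall exceeding 2 mm per day) and the Vellinga et al. (2012) percentage criterion. The function names, the use of a trailing 6-day window, and the convention of returning a zero-based day index are assumptions made for illustration and are not taken from the original papers.

```python
from itertools import accumulate


def yam_onset(daily_rain, window=6, threshold=2.0):
    """Return the index of the last day of the first `window`-day period
    whose mean rainfall exceeds `threshold` (mm/day), or None.

    Assumption: the window is trailing, so the onset is dated to its final day.
    """
    for end in range(window - 1, len(daily_rain)):
        mean = sum(daily_rain[end - window + 1:end + 1]) / window
        if mean > threshold:
            return end
    return None


def percent_onset(daily_rain, climatological_total, fraction=0.2):
    """Return the first day index on which accumulated rainfall reaches
    `fraction` of the climatological annual total, or None."""
    target = fraction * climatological_total
    for day, total in enumerate(accumulate(daily_rain)):
        if total >= target:
            return day
    return None


# Hypothetical series: ten dry days followed by ten days of 5 mm.
rain = [0.0] * 10 + [5.0] * 10
print(yam_onset(rain), percent_onset(rain, climatological_total=100.0))
```

Because both functions depend on fixed thresholds, the same rainfall season can yield different onset dates when the input comes from datasets with different biases or resolutions, which is the sensitivity examined in the later sections.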

a. Overview of published onset definitions
There exist at least 18 definitions of the WAM onset explicitly defined or inferable from literature (this excludes local definitions used by nonacademic stakeholders, such as local farmers with traditional methods). The definitions can be grouped by the scale of onset considered. Regional definitions are classed as definitions of monsoon onset on a supranational scale. Local onset definitions work with data from a single rain gauge or grid cell with no additional spatial information. Any onset definition that exists between these scales is termed semilocal. Table 1 shows a list of published regional onset definitions; all onset definitions have been taken as written in their initial publication, including any wording that can be open to interpretation. The regional definitions employ a variety of data: precipitation (e.g., Sultan and Janicot 2003), outgoing longwave radiation (OLR, such as Fontaine et al. 2008), moist static energy (MSE, as found in Vellinga et al. 2012), zonal winds (Nguyen et al. 2014), and daily rainfall event frequency (Le Barbé et al. 2002), among others. Certain regional definitions include a level of subjectivity (e.g., Sultan and Janicot 2003; Fontaine et al. 2008). These definitions require relative changes in observable features across the monsoon region, as opposed to specific metric thresholds being met (a common feature of local onsets).
Whereas regional onset definitions focus on the large-scale migration and modulation of dynamics over the West African region, local onset definitions almost exclusively define monsoon onset as a given threshold of local precipitation (Table 2). Local onset definitions based on precipitation data at one grid cell have an inherent "ignorance" of rainfall from any neighboring locations. The separation of local and regional onsets in literature betrays a disconnection between local stakeholders and global climate model work; it is important to appreciate the difference between local and regional WAM onset to understand the effects of prediction on end users.
Onset dates with explicit thresholds (Kowal and Knabe 1972; Sivakumar 1988; Omotosho et al. 2000; Marteau et al. 2009; Yamada et al. 2013) carry the risk of dataset and resolution bias and may not be applicable across the whole of West Africa. For agronomic definitions, it is difficult to ascertain the risk/reward of flexible threshold values without further investigation, as chosen thresholds relate directly to crop yield information.
Farther northward into the Sahel, the likelihood of precipitation reaching a certain level is lower than for more southern locations. Local stakeholders within these regions have therefore adapted and constructed local onset indicators for their specific purposes that may not reflect the thresholds often found in literature. This is not a failing of onset definitions in publication, as most local definitions are created for a specific study region (Ati et al. 2002; Marteau et al. 2009).
Overreliance on one particular definition for monsoon onset can lead to unintended bias in forecasting (Vellinga et al. 2012). Recent works suggest that using a combination of several different local and regional onset definitions with different observed metrics is preferable (Fontaine et al. 2008; Gazeaux et al. 2011; Vellinga et al. 2012).

b. Comparison of definitions
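Five regional definitions, three local definitions, and one semilocal definition have been chosen. The regional definitions were taken from Le Barbé et al. (2002) (LeB_Reg), three realizations of Sultan and Janicot (2003) (SJ, SJ_10, and SJ_15), and Fontaine et al. (2008) (Font). The local definitions are Mart (Marteau et al. 2009), Omo (Omotosho et al. 2000), and Yam (Yamada et al. 2013), and the semilocal definition is LeB_Loc. Here we expand on definitions for which the descriptions in Tables 1 and 2 are not sufficient and provide comments on each definition used. Where definitions are self-evident, discussion is left for sections 4 and 5.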
The most prevalent onset definition in literature comes from Sultan and Janicot (2003). There exists a level of subjectivity within specific calculation of onset date using this definition (this is also true for Font and LeB_Reg). In order to determine the level of influence this subjectivity has on onset observation, three realizations of the original definition have been produced. We compare the verbatim definition from Sultan and Janicot (2003) (called SJ here) to two objective modifications (denoted SJ_10 and SJ_15). Full descriptions of these and comparisons can be found in section 4a.
Le Barbé et al. (2002) infer daily rainfall event frequency data from daily rainfall totals. Their work highlights an appreciable jump (or abrupt northward surge) of the time-smoothed maximum daily rain event frequency in composites from 1950 to 1990 from 5°N to 10°N during late June/early July. This result allowed for formation of a regional onset definition, LeB_Reg, influenced by the original work. In our definition, a rain event is any precipitation event of at least 1 mm that lasts for at least 3 h. It is possible for there to be more than one rain event on a given day or for a rain event to continue overnight. For the latter of these situations, the rain event has been credited to the day on which the initial rainfall occurred. A semilocalized equivalent of LeB_Reg, LeB_Loc, has also been produced. LeB_Loc has been constructed in the same way as LeB_Reg for 1° longitude ranges, as opposed to zonal averaging across 10°W-10°E. This allows for examination of longitudinal variability across the monsoon region within a given onset definition and a direct comparison of one definition across two scales of observations.
The inclusion of the regional definition Font in our analysis allows for a contrast between calculating onset using inferred and observable satellite data. While precipitation is the key factor for local stakeholders in the region, it is not possible to observe precipitation from space (as satellites cannot see through clouds). Therefore, there is always an inherent risk with precipitation data that totals calculated are not guaranteed to be representative of true rainfall on the ground (though most satellite data products are very good; see Roca et al. 2010). Instead, by measuring onset using cold cloud-top coverage, onset is not subject to potential observational bias (however, cold cloud-top temperature is not necessarily indicative of precipitation rate). Both Font and SJ focus on the seasonal northward movement of the ITCZ but use different observational metrics. SJ highlights the northward migration of the maximum rain belt, whereas Font considers the northward shift of the maximum zone of convergence. It is to be expected for a well-established ITCZ that these two definitions would be largely similar.
Mart was produced and analyzed originally in Senegal, Mali, and Burkina Faso for agronomic purposes. The use of a 20-day dry-test period following the initial rainfall event allows for capture of potential false onsets or sporadic rain events. It is possible that, in some regions, a more restrictive dry test is necessary (this is considered in Marteau et al. 2009). The relatively long observation window of Mart can limit its use for real-time analysis compared to Omo or Yam. A potential link between local and regional definitions, or a consistent seasonal trigger for Mart onset, would reduce this restriction.
The definition Yam is in direct contrast to other local definitions, most notably that of Mart. Yam does not consider the onset with regard to specific needs of the end users, nor does it relate onset to a key shift in local or regional dynamics. The comparison between this definition and that given by Mart provides an interesting contrast between a readily modifiable definition and an agronomic definition of WAM onset.

Datasets
Two daily, satellite-based datasets and one reanalysis dataset have been used to map precipitation-based onset dates. The period 1998-2012 is used, giving 15 years of data.
Daily precipitation data from the NASA Tropical Rainfall Measuring Mission level 3 product 3B42 (TRMM; Huffman et al. 2007) and the Global Precipitation Climatology Project (GPCP; Adler et al. 2003; Xie et al. 2003), retrieved over 0.25° × 0.25° and 1° × 1° grids, respectively, have been used. In addition, we have calculated onset using reanalysis data from the ERA-Interim (ERA-I) on a 1.5° × 1.5° grid (Dee et al. 2011). ERA-I is often used as a benchmark analysis product for West African dynamical studies. Therefore, it is of interest to compare the onset statistics found using reanalysis data to observational datasets; however, it must be stressed that ERA-I results should not be considered true observed onset dates.
The recent development of TRMM version 7 (Huffman et al. 2007; see http://trmm.gsfc.nasa.gov/3b42.html for more details) offers a comparison of onset dates found in versions 6 and 7 (v6 and v7) for the period 1998-2010. To test onset sensitivity to spatial resolution, TRMM v6 and v7 data were coarse grained to 0.5° × 0.5°, 1° × 1°, and 2.5° × 2.5°. Finally, TRMM v6 3-hourly data have been used for comparison of LeB_Reg and LeB_Loc with other onset definitions for the period 1998-2010.
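Coarse graining of this kind can be sketched as simple block averaging. The example below is a minimal illustration, assuming a rectangular latitude-longitude field stored as nested lists and an integer coarsening factor (2 for 0.25° to 0.5°, 4 for 0.25° to 1°, 10 for 0.25° to 2.5°). The function name, the equal weighting of cells, and the lack of missing-value handling are simplifying assumptions and are not necessarily the procedure used in this study.

```python
def coarse_grain(field, factor):
    """Average non-overlapping `factor` x `factor` blocks of a 2D field.

    `field` is a list of rows (latitude) of equal length (longitude).
    Assumption: every cell carries equal weight (no area weighting).
    """
    n_lat, n_lon = len(field), len(field[0])
    if n_lat % factor or n_lon % factor:
        raise ValueError("grid dimensions must be divisible by the factor")
    coarse = []
    for i in range(0, n_lat, factor):
        row = []
        for j in range(0, n_lon, factor):
            block = [field[a][b]
                     for a in range(i, i + factor)
                     for b in range(j, j + factor)]
            row.append(sum(block) / len(block))
        coarse.append(row)
    return coarse


# Hypothetical 4 x 4 rainfall field (mm/day) reduced to 2 x 2.
fine = [[1, 3, 5, 7],
        [1, 3, 5, 7],
        [2, 2, 6, 6],
        [2, 2, 6, 6]]
print(coarse_grain(fine, 2))
```

Block averaging preserves the area-mean rainfall but smooths local extremes, so threshold-based local onsets may respond to resolution more strongly than zonally averaged regional onsets.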
For all definitions, analysis of data was performed for the period 1 May-30 August. This period encompasses the date of ITF advancement toward 15°N (Sultan and Janicot 2003; Lélé and Lamb 2010) until after the annual monsoon season in the Sahel. For local onset definitions, this allows for onset dates triggered by the preonset phase of the monsoon (as defined by Sultan and Janicot 2003) to be observed. It was expected that regional onset dates would not be altered by this early observation window, and special care was taken to make sure that all regional onset dates reflect the large-scale shift studied in the original work.
The OLR-based definition Font was calculated using the National Oceanic and Atmospheric Administration (NOAA) daily averaged, satellite-based OLR dataset interpolated on a 2.5° × 2.5° grid (Liebmann and Smith 1996; see http://www.esrl.noaa.gov/psd/data/gridded/data.interp_OLR.html for more details). During the analyzed period, there are no relevant missing dates.
Given that this study aims to recreate published onset definitions, time smoothing of data has been performed only on definitions where time smoothing was originally performed. The regional definitions SJ, SJ_10, SJ_15, Font, and LeB_Reg, as well as the semilocal onset definition LeB_Loc, use time-smoothed data to better relate the results shown here to the original papers.

4. Regional definition comparison

a. Regional onset definition by Sultan and Janicot (2003)
The onset definition SJ taken verbatim from Sultan and Janicot (2003) does not use set onset thresholds and contains a level of subjectivity in specific onset date selection. An example of unambiguous onset selection can be found using TRMM v7 data for 2009 (Fig. 1a). Here, there is a decrease in precipitation at 5°N (red line), simultaneous with an increase in precipitation at 10°N (blue) and 15°N (green); regional onset is taken as 4 July (black line in Fig. 1a). By contrast, Fig. 1b shows the corresponding time series in 2010. Evidently, there is no clear time at which the three precipitation time series satisfy the criteria given by Sultan and Janicot (2003) simultaneously. Subjectively, for 2010 it is unclear whether onset should be defined as occurring around 15 June or after 1 July, which is a significant difference. The plots suggest that the clear triggering of SJ and existence of a definitive onset date for each year is not guaranteed.
To provide consistency and repeatability of the results found here, two objective modifications of SJ have been made. The new definitions SJ_10 and SJ_15 are given as the first date when zonally averaged, 10-day smoothed precipitation at 5°N is below precipitation levels at 10° and 15°N, respectively, for at least 7 days. The two objective onset definitions have statistically significant (to the 5% level) rank correlation (0.52 for TRMM v7), but the objective onset definitions do not correlate well with SJ (0.22 for SJ and SJ_10 using TRMM v7). This is disconcerting, as it implies that the perceived onset given all three time series is not linked to the onset date identified using two of the three time series. For the rest of this section, we use the objective definition SJ_10, given its clarity and repeatability.
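The objective criterion above can be sketched as follows. This is a minimal illustration, assuming two daily series of zonally averaged precipitation (one at 5°N and one at 10°N or 15°N) that start on the first day of the analysis window. The function names, the centered form of the 10-day running mean, the shortened window at the series edges, and the choice to date onset to the first day of the qualifying 7-day run are assumptions made here and may differ from the implementation used in this study.

```python
def running_mean(series, window=10):
    """Centered running mean; the window is truncated at the series edges."""
    half = window // 2
    smoothed = []
    for i in range(len(series)):
        lo = max(0, i - half)
        hi = min(len(series), i - half + window)
        chunk = series[lo:hi]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def first_sustained_day(south, north, min_days=7):
    """First day index starting a run of at least `min_days` consecutive
    days on which `south` is strictly below `north`, or None."""
    run = 0
    for day, (s, n) in enumerate(zip(south, north)):
        run = run + 1 if s < n else 0
        if run == min_days:
            return day - min_days + 1
    return None


def sj_objective_onset(p_south, p_north, window=10, min_days=7):
    """SJ_10-style onset (p_north at 10N) or SJ_15-style onset (p_north at 15N)."""
    return first_sustained_day(running_mean(p_south, window),
                               running_mean(p_north, window),
                               min_days)


# Hypothetical data: rainfall at 5N drops on day 40 after a brief 3-day dip.
p5 = [8.0] * 40 + [3.0] * 52
p5[10:13] = [3.0] * 3
p10 = [5.0] * 92
print(sj_objective_onset(p5, p10))
```

In this synthetic case the 3-day dip does not trigger onset, because it is both smoothed out by the running mean and shorter than the 7-day persistence requirement.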
For consistency with other regional onsets and to avoid the potential of erroneously early regional onset, SJ_10 is calculated for the period 1 June-30 August each year. This removes two early onsets found within the TRMM v6 1-degree and ERA-I datasets, which occur in early May and skew results. Given a 1 May start date, regional onset in the 1-degree coarse-grained version of TRMM v6 occurs on 20 May. With a 1 June start date, onset is 11 July, which agrees much more closely with other datasets and expected results (not shown). Table 3 gives the mean SJ_10 onset date and standard deviation for each dataset studied here and those found in Sultan and Janicot (2003). (See Table 1 for a definition of SJ and section 4a for definitions of SJ_10 and SJ_15.) The mean onset dates for SJ_10 calculated in this study for different datasets closely agree with the result of the original paper as well as subsequent papers (such as Janicot et al. 2011), suggesting a consistency of the definition across differing time periods and datasets. In our study, there is greater variability of the onset dates, which could potentially be because of the smaller sample size in our study; however, this difference is only of the order of days.

1) SUMMARY STATISTICS
Overall, the largest spread of onset dates occurs in the ERA-I dataset, with 42 days separating the earliest (2005) and latest (1998) onsets. TRMM v6 and v7 show a similar range of onset dates; however, the interquartile range of v6 is much larger than v7. The majority of onsets are observed within a narrow date range in the TRMM v7 dataset, implying that typical interannual variability within this dataset is very low. A similar argument can be made for GPCP and the two coarse-grained TRMM v6 datasets. While the interannual variability of regional onset is of scientific interest, in general, a forecasted onset date within a week of true observed onset is sufficient for agronomical purposes (Sultan et al. 2005a). Therefore, a climatological regional onset date taken from any of the datasets studied here would generally provide sufficient information on onset for practical purposes. This raises the question of the need for accurate prediction of regional onset on an interannual basis. The standard deviation coupled with the spread of onset dates shows the low interannual variability given by this onset definition. Given the much higher interannual variability found in local onsets (highlighted in section 5), this raises the question of whether a regional definition can provide sufficient information for practical use at the local scale. This does not nullify the use of regional onsets for other forecasting needs and scientific research. Instead, it is important to highlight that onset date selection must relate to the needs of the forecast users.

RESOLUTION FOR INDIVIDUAL YEARS
The mean onset results mask interannual disagreements between datasets. Figure 2 gives the precipitation time series at 5°, 10°, and 15°N for GPCP and TRMM v7 in 2008, as well as the SJ_10 onset date for both datasets. GPCP gives earlier-than-dataset-mean onset (13 June compared to an average of 29 June), whereas TRMM v7 observes 2008 as a slightly later-than-average onset (8 July compared to a 4 July average).
The reason for this difference in onset date is the relative levels of precipitation found at 5° and 10°N prior to and around 13 June. For GPCP (Fig. 2a), precipitation at 5°N gradually decreases from a peak value at the end of May and is less than precipitation at 10°N from the date of SJ_10 onset until the end of our observed period. By contrast, in TRMM v7 (Fig. 2b), there is a secondary peak in rainfall at 5°N in early June and, while the precipitation at 5°N does temporarily dip below that found at 10°N, this does not trigger our SJ_10 onset.
Of interest is the fact that precipitation levels at 10° and 15°N are similar across the entire observed period for both GPCP and TRMM v7. The reason for the difference in onset date is the variability in precipitation at 5°N. This example is representative of other comparisons between SJ_10 onsets realized using different data. Figure 3 highlights the interannual range of onsets found using different datasets. Comparing onset dates for the various datasets against SJ_10 onsets for TRMM v7, it is found that the difference between TRMM v7 onset and ERA-I onset is greater than 7 days for 5 of the 15 years; likewise, for GPCP this difference occurs in 6 of the 15 years studied. In 4 years, the difference between TRMM v6 and v7 also exceeds 7 days. There is no systematic year-to-year difference between pairs of datasets, and, therefore, agreement cannot be reached through a simple bias correction. The onset dates calculated across different datasets show statistically significant (to the 5% level) positive correlation using Spearman rank correlation. Of note is the correlation between TRMM v6 and TRMM v7 (0.55) and TRMM v6 0.25-degree with its coarse-grained counterparts (0.63 and 0.67 for TRMM v6 0.5-degree and TRMM v6 1-degree, respectively). The GPCP onset dates are also well correlated with the other datasets (0.81 for TRMM v6, 0.76 for TRMM v6 1-degree data, 0.51 for TRMM v7, and 0.54 for TRMM v7 coarse grained to 1 degree).
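The Spearman rank correlation used for these comparisons is the Pearson correlation of the ranked onset dates. The sketch below is a minimal illustration, assuming two equal-length lists of onset dates expressed as day of year; the function names and the sample values are hypothetical, tied dates receive their average rank, and no significance test is included.

```python
def average_ranks(values):
    """Rank values from 1..n, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical onset dates (day of year) from two datasets over five years.
dataset_a = [180, 185, 176, 192, 188]
dataset_b = [178, 190, 181, 195, 186]
print(round(spearman(dataset_a, dataset_b), 2))
```

Because the statistic depends only on ranks, two datasets can be highly correlated while still differing by more than a week in individual years, which is why the correlations above are reported alongside the year-by-year differences.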
The onset definition SJ_10 does not appear to be sensitive to the resolution of observations used here. In particular, TRMM v7 shows very high consistency of onset dates across all resolutions. TRMM v7 0.25-degree onset dates are significantly correlated at the 5% level with the TRMM v7 0.5-degree onset dates (0.99), as well as the 1-degree (0.99) and 2.5-degree (0.90) coarse-grained versions of TRMM v7. The northward migration of the maximum rain belt can be reliably captured using high-resolution and lower-resolution precipitation data.
Root-mean-square error (RMSE) analysis allows for comparison of onset dates given by each of the three precipitation datasets used. The RMSE is consistent between GPCP and the various resolutions of TRMM v7 used (12.0 between TRMM v7 0.25-degree resolution and GPCP; 12.1 between TRMM v7 1-degree resolution and GPCP). The RMSE between TRMM v6 0.25-degree resolution and GPCP is lower (7.2), as is the error between TRMM v6 and TRMM v7 at 0.25-degree resolution (10.6). The fact that the RMSE is of a similar order to the variability found in each dataset suggests that the variability of SJ_10 is well constrained across datasets.
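The RMSE between two onset series, together with the number of years in which they differ by more than 7 days, can be computed as in the following minimal sketch. It assumes two equal-length lists of onset dates expressed as day of year; the function names and sample values are hypothetical and are not results from this study.

```python
import math


def rmse(onsets_a, onsets_b):
    """Root-mean-square difference (days) between two onset-date series."""
    if len(onsets_a) != len(onsets_b):
        raise ValueError("series must cover the same years")
    squared = [(a - b) ** 2 for a, b in zip(onsets_a, onsets_b)]
    return math.sqrt(sum(squared) / len(squared))


def years_exceeding(onsets_a, onsets_b, days=7):
    """Number of years in which the two onset dates differ by more than `days`."""
    return sum(1 for a, b in zip(onsets_a, onsets_b) if abs(a - b) > days)


# Hypothetical onset dates (day of year) for three years.
series_a = [180, 185, 190]
series_b = [183, 181, 199]
print(round(rmse(series_a, series_b), 2), years_exceeding(series_a, series_b))
```

Reporting both quantities is useful because a moderate RMSE can arise either from small differences in every year or from close agreement in most years combined with a few large disagreements.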
While there is correlation between the SJ_10 onset time series, the fact that there are certain years (up to 20% of those studied) where two datasets can disagree by over seven days suggests that dataset choice can still greatly impact the onset date found for certain years, even though, in most years, this impact will not be too severe.

b. Regional onset definition by Fontaine et al. (2008)
Composites of annual OLR progression show a northward migration of regionally averaged (10°W-10°E) deep convective activity from 5°N toward the Sahel during late June/early July (Fig. 4). This occurs concurrently with the annual movement of the ITCZ and maximum rain belt into the Sahel. The regional definition Font expands on the information in Fig. 4 to establish the date at which the maximum convective belt shifts from 5° to 10°N, with annual onset dates of Font computed using the method in Table 1. The definition Font contains a level of subjectivity in the period of time required to establish the shift of the maximum convective belt to 10°N.
As with SJ, the lack of definitive threshold values for Font allows for potential ambiguity within the definition. To remain objective and to allow for repeatability of the results shown here, the onset Font has been interpreted as close to verbatim as is pragmatic. Onset dates for Font have been previously published in Gazeaux et al. (2011) for the years 1979-2004. Analyzing the crossover between this study and our own, we find years in which the onset dates agree (1998, Fig. 5a) and ones with unclear results (1999, Fig. 5b). In 1998, a shift of the maximum convective belt from 5° to 10°N around 1 July is evident, at which point convective activity remains at, or northward of, 10°N for the remainder of the monsoon season (Fig. 5a); 1 July is also given by Gazeaux et al. (2011) as the onset date for this year. For 1999 (Fig. 5b), onset date is more difficult to determine. Gazeaux et al. (2011) determine onset 10 days later than for 1998; however, there exists much more frequent, low OLR coverage (indicative of deep convective activity) over the Sahelian region for 1999 during mid/late June. Although convective activity seems to diminish at about 5°N around 11 July, part of the convective maximum has already established itself over 10°N prior to this time. It is quite possible to make a case for the onset occurring prior to 11 July; however, given the exact wording of the definition, 11 July would be the most appropriate date.
Font is open to interpretation and difficult to objectively calculate, making this an unsuitable metric for sole use in evaluating onset prediction for West Africa. Despite this, the use of OLR data represents a significant benefit of the definition Font, because of the long and reliable dataset (NOAA 1979-present) and data not being reliant on satellite retrievals. Font could be used along with other regional definitions, such as SJ_10 and LeB_Reg, in order to observe the large-scale shift of the monsoon system.

c. Regional onset definition based on Le Barbé et al. (2002)

1) SUMMARY STATISTICS
The regional definition LeB_Reg determines a representative date for the seasonal northward shift of the ITCZ similar to SJ and Font. As with the two previous definitions, there is room for subjectivity in explicit onset date identification. In 2002, for example, there is a clear shift in the location of the maximum rain event frequency from near 5° to 10°N during mid-July (Fig. 6a). By contrast, in 1999, the maximum in rain event frequency does not reach 10°N until mid/late June. However, as early as 1 June 1999, rain event frequencies at 10°N are greater than 0.5 (i.e., at least 5 rain events per 10 days) (Fig. 6b). It is open to interpretation which date should be taken as the precise onset date for this year, highlighting the flexibility in the calculation and comparison of LeB_Reg. It should be noted that the result found here may be closely tied to the ambiguity found in Font for 1999 (Fig. 5b). Figure 7 highlights the comparison between onset dates for the various SJ_10 representations using different datasets and LeB_Reg. For TRMM v6, several SJ_10 onset dates agree closely with their complementary LeB_Reg onset date, with 2004 marking the largest temporal gap between the two definitions for these datasets. This is because of the existence of a double peak in rain event frequencies occurring at 10°N during July affecting the LeB_Reg onset but not the SJ_10 onset (not shown). There exist more frequent large outliers found when comparing LeB_Reg to SJ_10 onsets using other datasets. In all dataset comparisons, there are years when the difference between onset dates is greater than 7 days.

The black vertical lines mark onset dates given in Gazeaux et al. (2011).

2) COMPARISON BETWEEN SJ_10 AND LEB_REG
The mean onset date for LeB_Reg using TRMM v6 over the period 1998-2010 is 3 July with a standard deviation of six days. This is comparable with the onset mean found for SJ_10 (28 June for TRMM v6, 5 July for TRMM v7) and consistent with the observed pattern of Font. Despite the mean onset dates being comparable between SJ_10 and LeB_Reg, there is little interannual correlation between the two definitions with no correlations significant at the 5% level found. Different onset definitions have disparate perceptions of which years have late and early onsets.
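The interannual comparison described above can be illustrated with a short sketch. It assumes two onset series expressed as day of year and tests their Pearson correlation with a permutation test, which is a stand-in chosen here for simplicity and not necessarily the significance test used in this study. The onset values, the function names, and the number of permutations are all hypothetical.

```python
import random
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def permutation_p_value(x, y, n_perm=5000, seed=0):
    """Two-sided p value: share of shuffles of y whose |r| matches or exceeds the observed |r|."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson_r(x, shuffled)) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical onset dates (day of year) for 13 years; NOT values from this study.
sj_10 = [178, 181, 175, 190, 172, 185, 179, 183, 176, 188, 180, 174, 186]
leb_reg = [184, 180, 186, 179, 188, 182, 191, 178, 185, 181, 187, 183, 180]

r = pearson_r(sj_10, leb_reg)
p = permutation_p_value(sj_10, leb_reg)
print(f"r = {r:.2f}, p = {p:.3f}, significant at 5%: {p < 0.05}")
```

Two series can share a similar mean onset date and still fail this test, which is the situation reported for SJ_10 and LeB_Reg.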
Overall, there is moderate agreement between the mean patterns of all regional definitions used here, implying that the same large-scale dynamical process is being measured by all three regional definitions using different metrics. However, the lack of interannual correlation between onset dates calculated using different datasets poses a significant problem for practical seasonal forecasting. Without agreement between datasets, there is limited use in evaluation of forecast models because of the lack of clarity on what an accurate prediction of monsoon onset would be for any given year. Forecasts should, therefore, be compared to multiple datasets, with the forecasted regional onset date expected to lie within the spread of observable onset dates. Finally, the onset definition used must be relevant to the goals of the forecaster and give practical implications to users. For tracking the seasonal ITCZ movement, the inclusion of definitions that do not use precipitation intensity data, such as Font and LeB_Reg, should also be considered, while care is taken to understand the limits of definition subjectivity.
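As a minimal sketch of the multi-dataset evaluation suggested above, the following checks whether a forecast onset date lies within the spread of observed onset dates, and computes the share of years in which two datasets disagree by more than seven days. The dataset labels, onset values, and function names are hypothetical and serve only to illustrate the idea.

```python
def within_observed_spread(forecast_day, observed_days, tolerance=0):
    """True if the forecast onset lies between the earliest and latest observed onset."""
    low = min(observed_days.values())
    high = max(observed_days.values())
    return low - tolerance <= forecast_day <= high + tolerance


def disagreement_share(series_a, series_b, max_gap=7):
    """Fraction of common years in which two datasets differ by more than max_gap days."""
    years = sorted(set(series_a) & set(series_b))
    gaps = [abs(series_a[y] - series_b[y]) for y in years]
    return sum(g > max_gap for g in gaps) / len(years)


# Hypothetical regional onset dates (day of year) for one year from three datasets.
observed = {"dataset_a": 176, "dataset_b": 183, "dataset_c": 180}
print(within_observed_spread(181, observed))  # forecast inside the observed spread
print(within_observed_spread(190, observed))  # forecast outside the observed spread

# Hypothetical onset series (year -> day of year) from two datasets.
a = {1998: 180, 1999: 175, 2000: 190, 2001: 182, 2002: 178}
b = {1998: 182, 1999: 185, 2000: 188, 2001: 181, 2002: 179}
print(disagreement_share(a, b))
```

A forecast judged this way is only as well constrained as the observational spread itself, so a wide spread makes almost any forecast look acceptable.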

Local definitions
Unlike the regional definitions analyzed above, the three local definitions examined in this study all use the same observational parameter (precipitation) but capture different aspects of the local seasonal precipitation time series. Mart captures the onset of persistent rainfall, Omo is triggered by heavy rainfall, and Yam identifies the commencement of local rainfall at the gridcell level.
The motivation behind and construction of Mart makes the definition directly relevant to local, agronomic stakeholders. In contrast, the definition given by Yam is, by the authors' admission, readily modifiable. It is expected that the beginning of local rains (i.e., Yam) at a given grid cell will occur before or simultaneous with the onset of persistent local rainfall (Mart). Therefore, potential strong correlation between Mart and Yam would provide a useful window of prediction for the agronomical onset of local rainfall with practical benefits for local farmers.

a. Geographical patterns of local onset in TRMM v7
Figure 8 compares the three local definitions studied here using the TRMM v7 dataset. Both Omo and Yam show widespread onset prior to the middle of June (day 169; Figs. 8b,c). Mart (Fig. 8a) has onset dates occurring later than the other two local definitions and shows a rough southerly progression of onset dates through the West African region. For some locations, onset dates in Mart occur over 28 days after local triggering of Omo (Fig. 8d) and Yam (Fig. 8e), with little difference in the average timing of onset for Yam and Omo, except over the northern Sahel (Fig. 8f). It is apparent that it is far easier to trigger Yam or Omo than Mart over most of the analyzed domain.
There is only sporadic statistically significant (at the 5% level) interannual correlation between the three definitions studied at the gridcell level (Figs. 8g,h,i; white areas denote regions where no significant correlation is found). This lack of widespread correlation shows there is generally little interannual consistency between the local commencement of rainfall (Yam), the commencement of heavy rainfall (Omo), and the local commencement of persistent rainfall (Mart). The local onset definition chosen can significantly alter the onset pattern observed across the West African region and therefore impact the relationship between local and regional onsets considered.

1) COMPARISON OF TRMM V6 AND TRMM V7 ONSET COMPOSITES
In general, local onset dates for Mart and Yam agree closely across TRMM v6 and v7 (cf. Figs. 9a,c with Figs. 8a,c), with little difference in mean onset dates across datasets (Figs. 9d,f); the notable exceptions occur around 8°-12°N, 7°-15°W and 8°-11°N, 5°-12°E. The onset date Omo occurs earlier in TRMM v6 than in TRMM v7 by about two weeks or more across the entire West African region (Figs. 9b,e). It is possible that the difference found between onset dates using the two datasets is due to the different representation of intensive rainfall (Seto et al. 2011).
There is statistically significant correlation (at the 5% level) across datasets for Mart and Yam, but poorer correlation for Omo (Figs. 9g,i for Mart and Yam; Fig. 9h for Omo). This suggests that both TRMM datasets agree in their retrievals of light rainfall (<10 mm) and potentially have differing retrievals of heavier precipitation, with TRMM v7 less likely to observe the heavy early season rainfall required to trigger Omo.
In both TRMM versions, there is a zonal maximum in onset date Mart for low latitudes around 10°W and in the longitude region 5°W-0°. Around these regions, it is possible that there exists significant coupling between African Easterly waves and local rainfall, which potentially explains the later onset dates observed (Bain et al. 2014, their Fig. 4). As this paper does not focus on the dynamical triggers for onset, this potential link has not been investigated further here but is worthy of further research.

2) COMPARISON OF TRMM V7 AND GPCP ONSET COMPOSITES
Onset composites for Mart, Omo, and Yam using GPCP show earlier onsets than for TRMM v7 for almost the entire West African region. In particular, the agronomic onset Mart is triggered during May for almost all longitudes in the latitude range 8°-12°N, with the northeastward progression of onset dates apparent in TRMM v7 observations not present in GPCP (not shown).
Furthermore, there is little to no statistically significant (at the 5% level) interannual local correlation between onsets in TRMM v6 or TRMM v7 and those in GPCP for any of the three definitions studied. While it is possible that this disparity is due to the different spatial resolution of TRMM and GPCP, this lack of correlation is of concern with regard to the Mart definition. Poor correlation between GPCP and TRMM v6/v7 suggests an inherent disagreement as to the optimal time for effective planting of crops at a local scale and suggests that observational dataset choice can greatly affect the onset dates found. A potential reason for this is that GPCP has a wet bias of greater than 1 mm day⁻¹ over the study region during boreal summer, making it easier for GPCP to satisfy the dry-test restriction of Mart than for either of the TRMM products (Adler et al. 2012, their Fig. 7).

3) COMPARISON OF TRMM AND ERA-I ONSET COMPOSITES
The three local definitions studied here are triggered over a much smaller spatial area of West Africa within the reanalysis data of ERA-I than in the higher-resolution TRMM datasets (not shown). There is very poor coverage of both Mart and Omo using ERA-I data; this pattern is not consistent with composites of coarse-grained TRMM data (not shown). There is little correlation between ERA-I onsets and TRMM onsets, regardless of version used or the level of coarse graining used. Because of the lack of onset triggering, it is clear that ERA-I struggles to calculate local onsets Mart, Omo, and Yam without modification of thresholds.

4) SUMMARY OF LOCAL ONSET COMPARISONS
The local onset pattern of the West African monsoon is very sensitive to the onset definition calculated and observation dataset used. The two onset definitions that do not require a prolonged period of sustained precipitation (Yam and Omo) both occur earlier than the agronomic definition Mart in TRMM datasets. There is good intradataset agreement for the local definitions Mart and Yam across TRMM v6 and v7, consistent with their agreement for regional onsets. TRMM v6 and v7 0.25-degree onset dates do not agree closely with those found using the coarser dataset GPCP for any of the three local definitions. Likewise, the reanalysis data do not appear to be able to satisfy the thresholds needed for local onset to be triggered. The local onset pattern found for West Africa is highly dependent on definition and dataset choice. Thresholds used in local onset definitions may need to be modifiable given potential biases across different datasets and forecast models.
c. Sensitivity of onset definitions to thresholds

1) SENSITIVITY OF MART TO DRY-TEST PERIOD
False onset poses the largest agronomical risk for local stakeholders in West Africa. Premature planting of crops before persistent rainfall can lead to near-total crop yield loss and widespread famine. It is therefore vital that onset definitions are robust and not overly sensitive to parameter thresholds.
To test the local precipitation threshold definitions, Mart was subjected to a modified dry-test period. A comparison between 20-day and 30-day dry tests was performed for the TRMM v7 dataset. Figure 10 shows the mean onset patterns for both realizations of Mart and the correlation between the two. The mean onsets for 20- and 30-day dry tests exhibit approximately the same general spatial pattern, suggesting that the definition is robust (Figs. 10a,b). However, there are several areas in which onset dates occur substantially later for a 30-day dry test. Most notably, parts of Ghana and Burkina Faso (8°-14°N, 5°W-0°E) have onset dates occurring 25-30 days later with the more restrictive onset definition. Although in places the timing of mean onset differs, there exists significant correlation at the 5% level between the onset definitions of Mart with 20- and 30-day dry tests across the entire region studied (Fig. 10c). It is possible, therefore, that in regions of difference between the two dry-test plots, the definition Mart can be calibrated on a local scale. Mart can be considered a stable onset definition that is not overly sensitive to the length of dry test employed.
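The dry-test comparison above can be sketched for a single grid cell as follows. This is a hedged illustration only, and not the code used in this study. The dry test (no 7-day spell with less than 5 mm in the days following the initial rain event) follows the description given in this paper. The wet-spell trigger (a day with more than 1 mm that starts one or two consecutive days totalling at least 20 mm) is an assumption made here about the Marteau et al. (2009) criteria. The function name, parameter names, and rainfall series are hypothetical.

```python
def mart_style_onset(rain, dry_test_days=20, wet_total=20.0, wet_day=1.0,
                     spell_len=7, spell_total=5.0):
    """Index of the first day meeting the wet-spell trigger and passing the dry test, else None.

    rain: daily rainfall totals (mm) for one grid cell and one season.
    """
    n = len(rain)
    for d in range(n - 1):
        if rain[d] <= wet_day:
            continue  # onset must start on a wet day (assumed > 1 mm)
        # Assumed trigger: 1 or 2 consecutive days accumulating at least wet_total.
        if rain[d] < wet_total and rain[d] + rain[d + 1] < wet_total:
            continue
        window_end = d + 1 + dry_test_days  # exclusive end of the dry-test window
        if window_end > n:
            break  # not enough data left to run the dry test
        # Dry test: look for any spell_len-day run with less than spell_total mm.
        has_dry_spell = any(
            sum(rain[s:s + spell_len]) < spell_total
            for s in range(d + 1, window_end - spell_len + 1)
        )
        if not has_dry_spell:
            return d
    return None


# Synthetic series: an early wet day, light rain, a 10-day dry break, then sustained rain.
rain = [0.0] * 10 + [25.0] + [1.0] * 20 + [0.0] * 10 + [30.0] + [2.0] * 48

print(mart_style_onset(rain, dry_test_days=20))
print(mart_style_onset(rain, dry_test_days=30))
```

In this synthetic series the dry break falls just outside the 20-day window but inside the 30-day window, so the longer dry test rejects the early candidate and selects the later one. This is the kind of local shift in onset date described for parts of Ghana and Burkina Faso.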

2) SENSITIVITY OF OMO TO DRY-TEST SELECTION
The lack of a dry test within the definition of Omo means that a short, isolated, intense rain event could trigger onset prior to persistent rain necessary for crop growth being installed over local regions. To highlight the risk of this occurrence, Omo was modified to include the same dry test found in Mart (i.e., no 7-day spell, with less than 5 mm of rain in the subsequent 20 days of the initial rain event).
The inclusion of a dry test to the Omo definition drastically changes local onset dates observed across West Africa (not shown). Onset dates are much later across the entire region, with very few onsets triggered in May. There is little to no significant local correlation (at the 5% level) between the definitions of Omo with and without a dry test across the entire region studied. Because of the lack of agreement found, it is suggested that the local definition Omo should not be used to calculate onset for agronomic purposes unless a dry test of at least 20 days is included in calculation of the definition.

3) SENSITIVITY OF YAM TO OBSERVATION PERIOD AND PRECIPITATION INTENSITY
The local onset definition Yam was subjected to an extended observation period to ascertain its sensitivity to false onset. In order for onset to be triggered, it was required that the average rainfall for 20 days was greater than 2 mm day⁻¹, as opposed to the original window of 6 days. In low latitudes, there is little change between the two definitions (not shown). Onset dates are still triggered in May/early June across the entire West African region, and there is significant correlation between the two definitions for most of the region 8°-12°N.
The onset dates given by Yam are more sensitive to the intensity of precipitation required for triggering compared to the length of observation period. Figures 11a and 11b, respectively, display the onset distribution given by Yam with requirements of 2 and 4 mm day⁻¹ needed in the initial 6-day period. The onsets seen in the more restrictive definition still occur during May/mid-June, but they are more closely representative of the onset seen in Omo (Fig. 8b). Despite this, Fig. 11c shows that there is statistically significant correlation at the 5% level between the two Yam realizations over a large part of the region considered, implying that interannual onset patterns are consistent across both onset requirements.
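The two sensitivity tests for Yam can be sketched with a single function in which the window length and the intensity threshold are parameters. This is a minimal illustration, assuming that onset is assigned to the first day of the first window whose mean rainfall exceeds the threshold; the convention for which day of the window counts as onset, the function name, and the rainfall series are assumptions made here.

```python
def yam_style_onset(rain, window=6, threshold=2.0):
    """Index of the first day starting a `window`-day run whose mean rainfall exceeds threshold.

    rain: daily rainfall totals (mm) for one grid cell; threshold in mm per day.
    Returns None if no window satisfies the criterion.
    """
    for d in range(len(rain) - window + 1):
        if sum(rain[d:d + window]) / window > threshold:
            return d
    return None


# Synthetic series: a short early burst, a dry break, then sustained rain.
rain = [0.0] * 5 + [2.25] * 6 + [0.0] * 10 + [5.0] * 25

original = yam_style_onset(rain, window=6, threshold=2.0)
longer_window = yam_style_onset(rain, window=20, threshold=2.0)
higher_threshold = yam_style_onset(rain, window=6, threshold=4.0)
print(original, longer_window, higher_threshold)
```

With this synthetic series the 6-day, 2 mm day⁻¹ test triggers on the early burst, while the 4 mm day⁻¹ threshold is only met once the sustained rains approach. Real gridcell behavior depends on the dataset, as discussed above.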

d. Semilocal onset distribution given by LeB_Loc
The definition LeB_Loc provides an onset value for each full degree of longitude with direct comparison to LeB_Reg. The temporal and latitudinal variation in rain event frequency differs depending on the longitude observed, giving further information on the possible reasons behind the contrast between local and regional onsets. This is evident by observing patterns at two locations in 2002 (Fig. 12). At 8°W, an appreciable shift in the maximum of rain events from near 5°N to about 10°N can be seen to occur around late June/early July (Fig. 12a). Farther east this shift is not evident. Instead, rain event frequencies around 5°N remain approximately constant throughout the monsoon season, with rain event frequencies in higher latitudes showing a seasonal pattern (Fig. 12b). It is possible that, at the edges of the monsoon region, the influence of moisture transport from the Gulf of Guinea and the eastern Atlantic (western boundary) and progression of mesoscale convective systems from the Ethiopian Highlands (eastern boundary) on rain event frequency is high. Furthermore, topography could influence the patterns shown in Fig. 12. These regions also display the highest variability in local onset date for Mart.
There appears to be little interannual consistency between the regional definition LeB_Reg and the semilocal definition LeB_Loc (Fig. 13a). No apparent link between the range of LeB_Loc dates for a given year and that year's LeB_Reg onset date is present in our results. However, there is a discernible spatial pattern of LeB_Loc onset dates across the 13 years when dates are grouped by longitude (Fig. 13b). Onset dates occur earliest near the western boundary of the studied region, with latest dates occurring from 5°W to 0° (the region that is most likely connected to AEW activity). Although earlier dates are also observed across the eastern boundary of the monsoon region (5°-10°E), there is generally a wider range of onset dates observed than around the western boundary, implicitly highlighting higher interannual variability in onset dates around the eastern half of the monsoon region than farther west. This suggests potentially that drivers that affect LeB_Loc over the western extent of the monsoon region are more consistent year to year than those that affect onset across the eastern part of the monsoon region.
The semilocal definition LeB_Loc suggests propagation of onset coming into the Sahel from both the east and west of the monsoon region. This is in agreement with findings for the local definition Mart. Semilocalized onset dates given by LeB_Loc occur predominantly in late June or early July, which is concurrent with the regional dates given by SJ and LeB_Reg but after Mart for the latitudes 8°-10°N. The semilocalized onset can therefore be seen as a middle ground between the regional and local onset definitions mentioned.

e. Correlation between local, semilocal, and regional definitions
It is already known that there is no correlation between the migration of the zonal maximum rain belt and the beginning of the Sahelian rainy season (Sultan and Janicot 2003). However, the explicit link between local and regional onsets has not been assessed previously. Figure 14 shows the local correlation between local and regional onsets within the predefined monsoon region. We find that there is no considerable section of the West African region that shows statistically significant (at the 5% level) correlation between regional onset dates (SJ_10) and local onset dates (Mart). North of 10°N, correlation is largely positive; here, later regional onset is concurrent with later local onset, on average. In lower latitudes, there exists a region of typically negative correlation between the two definitions. The local onset Mart precedes regional onset SJ_10 for the latitudes 8°-12°N, with the opposite occurring north of this point. Coupled with the poor correlation between the two definitions, this implies that regional onset cannot be used as a predictor of local onset for latitudes 8°-12°N.

FIG. 11. Sensitivity of Yam to precipitation threshold required. (a) Yam composite for 2 mm day⁻¹ threshold; (b) Yam composite for 4 mm day⁻¹ threshold; and (c) correlation between the two definitions.
The semilocalized definition LeB_Loc was also assessed to test whether a link exists between localized onset patterns and latitudinal shifts in rain event frequency. We find that there is little significant correlation between the localized and semilocal metrics across the West African region.
We conclude from these findings that local and regional onset dates are not linked on an interannual level at the local scale. It is possible that different dynamics are affecting local and regional onset on a year-to-year basis. This result is disconcerting, as it suggests that progress in understanding the regional shift of the ITCZ will not necessarily aid in understanding of what affects local onset.

Conclusions
The West African monsoon onset can be viewed as a two-stage seasonal progression. During April and May, the Intertropical Front gradually moves northward, increasing available moisture within the West African region. About 200 km behind the front, societally useful rainfall occurs (Lélé and Lamb 2010), marking the approximate start of the local rainy season. During late June/early July, the zonal maximum rain belt rapidly shifts from a quasi-stationary location near 5°N to another near 10°N. This sudden shift (or jump) marks the regional onset of the West African monsoon. Our work provides a comparison between different local and regional onset definitions and their observations in different datasets.
Nine definitions [including three realizations of Sultan and Janicot (2003)] of the West African monsoon onset have been calculated using four observational datasets and one reanalysis product for the period 1998-2012. Five regional definitions have been calculated across the monsoon region (10°W-10°E), with all definitions showing a shift of the West African monsoon from near the Guinea Coast into the Sahel around the end of June or the beginning of July.
The regional definition given by Sultan and Janicot (2003) measures the annual shift of the maximum rain belt in West Africa from near the Guinea Coast (~5°N) farther inland (~10°N). The mean onset date given by this definition is consistent across the four precipitation datasets studied here, as well as in the original work, occurring in late June with a standard deviation of eight or nine days in all datasets. Although the mean patterns agree, there is poor interannual correlation between onsets calculated using different datasets. The interannual variability of regional onset is dependent on dataset choice despite the similar long-term patterns found. This result is disconcerting; if observational datasets cannot agree on regional onset dates year to year, even if their mean onset dates are similar, then it is unclear which dataset should be chosen to evaluate seasonal forecasting of monsoon onset. For forecast evaluation, this means that the skill with which a model appears to predict onset is inherently tied to the observations the model is evaluated against.
The OLR definition given by Fontaine et al. (2008) shows an annual shift in convection from the Guinean coast to the continental region, as does the rain event frequency definition given by Le Barbé et al. (2002). Because of the subjectivity of the definition, specific calculation of onset dates using OLR was not completed. The mean rain event frequency onset is consistent with the definition from Sultan and Janicot (2003); however, interannual correlation between definitions varies depending on dataset choice.
Local onset definitions are more spatially and interannually variable than their regional counterparts. The three definitions calculated [originally found in Omotosho et al. (2000), Marteau et al. (2009), and Yamada et al. (2013)] each require different precipitation patterns in order to be triggered. All three definitions identify widespread onset occurring before regional onset. There is poor correlation between the agronomic definition presented by Marteau et al. (2009) and the other two local definitions analyzed, but there is statistically significant (at the 5% level) correlation between Omotosho et al. (2000) and Yamada et al. (2013).

FIG. 14. Correlation between local and regional onsets within the monsoon region. Local interannual correlation between SJ_10 and Mart using the TRMM v7 dataset.
To test robustness, the local definitions were modified with more restrictive triggering conditions. It was found that the definition from Marteau et al. (2009) was not strongly affected by the inclusion of an extended dry test with significant (at the 5% level) local correlation found across the region. The inclusion of a 20-day dry test onto the Omotosho et al. (2000) definition greatly altered the local interannual pattern of onset with little significant correlation (at the 5% level) found, implying that the definition may be susceptible to the risk of false onset. Yamada et al. (2013) is not sensitive to the length of observation period required or to the intensity of precipitation needed for triggering, suggesting that the use of an arbitrary threshold does not seem to affect the expected onset pattern. We would therefore argue that the specific selection of threshold value for agronomic purposes requires further study.
Because of the variability found in local onset dates, a link between regional and local onsets was explored. We found that there is minimal interannual agreement between the local and regional definitions studied. For local stakeholders, there may be little use in knowing the regional onset date. Furthermore, there is no link found between local onset dates and the semilocalized onset definition inspired by Le Barbé et al. (2002), nor between the semilocal definition and regional onsets. Local definitions appear to be triggered in isolation to the regional dynamic shift portrayed in regional onsets. Skill in predicting the seasonal progression of the maximum precipitation band will not necessarily imply skill in predicting local precipitation onset. Other local and regional definitions have been tested with similar results. Focus on local onset prediction and associated dynamics is therefore necessary for local stakeholders.
For evaluation of seasonal prediction of the West African monsoon onset, it is apparent that the definition chosen and dataset used will greatly affect assessment of forecasts. Local and regional onset definitions must be considered, the subjectivity found within regional definitions must be taken into consideration, and multiple observational datasets should be used. Although a lot of research exists evaluating the potential dynamical triggers for regional onset, local onsets require considerable future study, as these are of more pressing need for local stakeholders and are uncorrelated with regional onset.
The concept of West African monsoon onset is not a straightforward issue. While the monsoon system is clearly defined, picking a singular point at which onset occurs is reliant on understanding what onset means to the end users. For regional climatology, an understanding of when the maximum rain belt and convection zone reaches the Sahel is of interest. For local stakeholders, it is more important to identify when local rainfall sufficient for their needs begins. There is no necessary right or wrong onset definition, but care must be taken in future studies of onset sensitivity to understand what the scope of findings is for local and regional stakeholders.