Multiple mantle upwellings in the transition zone beneath the northern East‐African Rift system from relative P‐wave travel‐time tomography

Mantle plumes and consequent plate extension have been invoked as the likely cause of East African Rift volcanism. However, the nature of mantle upwelling is debated, with proposed configurations ranging from a single broad plume connected to the large low‐shear‐velocity province beneath Southern Africa, the so‐called African Superplume, to multiple lower‐mantle sources along the rift. We present a new P‐wave travel‐time tomography model below the northern East‐African, Red Sea, and Gulf of Aden rifts and surrounding areas. Data are from stations that span an area from Madagascar to Saudi Arabia. The aperture of the integrated data set allows us to image structures of ∼100 km length‐scale down to depths of 700–800 km beneath the study region. Our images provide evidence of two clusters of low‐velocity structures consisting of features with diameter of 100–200 km that extend through the transition zone, the first beneath Afar and a second just west of the Main Ethiopian Rift, a region with off‐rift volcanism. Considering seismic sensitivity to temperature, we interpret these features as upwellings with excess temperatures of 100 ± 50 K. The scale of the upwellings is smaller than expected for lower mantle plume sources. This, together with the change in pattern of the low‐velocity anomalies across the base of the transition zone, suggests that ponding or flow of deep‐plume material below the transition zone may be spawning these upper mantle upwellings.


Introduction
The causes of hotspot volcanism, i.e., volcanic activity away from or enhanced at plate boundaries, remain debated. Hot upwellings rising from the base of the lower mantle-commonly referred to as mantle plumes [Morgan, 1971]-are the dominant hypothesis [e.g., Ito and van Keken, 2007]. However, without conclusive seismic detection, the debate about plume existence, number, and shape continues [King and Anderson, 1998;Ritsema and Allen, 2003;Montelli et al., 2004b;Foulger, 2005;Boschi et al., 2007].
One of the most prominent continental hotspots in terms of buoyancy flux and upper mantle seismic expression is Afar, at the northern end of the East African Rift (EAR) [Sleep, 1990;Ito and van Keken, 2007;Styles et al., 2011]. Afar is characterized by voluminous Late Eocene-Oligocene-Miocene flood basalts [Hofmann et al., 1997;Kieffer et al., 2004], topographic swells [Ebinger et al., 1989], active magmatism with geochemical characteristics often attributed to plumes, including, in some samples, high 3 He/ 4 He ratios [Trull, 1994;Marty et al., 1996;Pik et al., 1999], and some of the lowest upper mantle seismic velocities imaged globally [Ritsema and Allen, 2003;. the presence of broad low-velocity structures in the upper mantle below the East African Rift, and interpreted them to be parts of the superplume [Benoit et al., 2006a;Hansen et al., 2012;Mulibo and Nyblade, 2013a]. Further support for a broad-scale structure feeding East African and Afar volcanism from a source below Kenya/Tanzania is given by some receiver function analyses [Huerta et al., 2009;Mulibo and Nyblade, 2013b] and studies of seismic anisotropy that are consistent with an overall upper mantle flow pattern along the EAR in a south-north to southwest-northeastward direction Montagner et al., 2007;Sicilia et al., 2008;Gao et al., 2010;Bagley and Nyblade, 2013;Hammond et al., 2014]. Such a large-scale plume is also able to match the broad dynamic topography of Afar and Arabia [Daradich et al., 2003].
More recent global and regional tomographic models have instead proposed that two/three separate plumes coming from the deep mantle feed East African Rift volcanism: one currently below Tanzania/Kenya, a second currently below Afar, and a possible third beneath Arabia [Davaille et al., 2005;Montelli et al., 2006;Chang and Van der Lee, 2011;Chang et al., 2015]. Trace-element and isotopic characteristics of the EAR lavas plus 40 Ar/ 39 Ar dating appear to require at least two plume sources, where the southern plume fed early (45 Ma) Ethiopian flood basalts and (due to northward movement of Africa) current Kenyan volcanism, and the northern one has supplied Afar volcanism since about 30 Ma [George et al., 1998;Rogers et al., 2000;Furman et al., 2004;Furman et al., 2006;Pik et al., 2006]. A few seismic studies have indeed imaged distinct structures with narrow (few 100 km wide) tails that extend into the mid mantle below Afar and Kenya Koulakov, 2007;Chang and Van der Lee, 2011]. Dynamic modeling of upper mantle flow, temperature, and melt productivity also led to the conclusion that two plumes entering the upper mantle are required to fit the observed evolution of volcanism and rifting from Tanzania to Afar [Lin et al., 2005].
Higher resolution body-wave images of the mantle down to 400 km below the northern East African rift, including the Afar region, reveal smaller-scale structure. Structure down to 100-200 km, which is likely dominated by signatures of melt, connects to multiple low-velocity zones down to at least 400 km depth, which may be upwellings on a smaller scale (100-200 km) than those proposed in the two to three-plume models [Bastow et al., 2008;Hammond et al., 2013;Thompson et al., 2015]. Surface wave tomography has also been used to suggest that the broad low-velocities below Afar are either unusually strong or made up of multiple smaller features [Debayle et al., 2001;Montagner et al., 2007]. Meshesha and Shinjo [2008] argue for the existence of small-scale and evolving upwellings based on the identification of four different geochemical mantle source flavors in East African volcanism since the Eocene. It is debated how and if these ''diapirs'' connect to the broad lower-mantle structure imaged by other studies [Debayle et al., 2001;Furman et al., 2006;Bastow et al., 2008;Hammond et al., 2013].
In this paper, we extend the previous high-resolution studies from Bastow et al. [2008] (below the Main Ethiopian Rift (MER)) and Hammond et al. [2013] (MER and Afar), by combining and updating P-wave travel-time data sets from all available regional seismic deployments from Madagascar to Saudi-Arabia ( Figure 2). This gives us good resolution for P-velocity structure from 100 to 700-800 km depth (see supporting information Figures S1-S6) below the region comprising the Afar Depression and the Main Ethiopian Rift (MER).
As another improvement upon previous studies, we use a starting model based on the structure from a previous surface-wave study [Fishwick, 2010] to add constraints to the inversion for anomaly amplitudes above 300 km depth. This helps in interpreting our velocities in terms of temperatures, using a conversion based on published thermodynamic parameters for the sensitivity of velocity to temperature, pressure, and phase together with tailored resolution tests of plume-like structures.

Potential Plume Structures
Dynamic models, numerical and analog, have cataloged the variety of mantle plume morphologies that are possible for likely ranges of mantle viscosity, thermal parameters, phase transitions, and chemical heterogeneity [e.g., reviews by Davaille et al., 2005;Lin and van Keken, 2006;Ribe et al., 2006;Ito and van Keken, 2007]. Hot plumes form at a thermal boundary layer, often with a large head followed by a thinner tail. Arrival of the head at the base of the lithosphere is expected to give rise to wide spread melting, and is often invoked as the cause for flood basalts [Richards et al., 1989], such as the Late Eocene-Oligocene-Miocene Ethiopian Traps [Hofmann et al., 1997;Kieffer et al., 2004]. After the head material has spread below the lithosphere, the tail may reach an almost steady state structure and continue to feed more localized volcanism [e.g., Ribe et al., 2006]. The size of thermal plume tails is governed by mantle rheology, and in the Geochemistry, Geophysics, Geosystems 10.1002/2015GC005948 upper mantle, diameters of few tens to few hundred km are expected [Van Keken et al., 1993;Van Keken and Gable, 1995;Goes et al., 2004;King and Redmond, 2007].
The ''Superplume'' proposed based on seismic tomography [Hansen et al., 2012;Hansen and Nyblade, 2013] is clearly larger than such tails. It is similar in scale to that predicted for domes or heads of mega plumes that can develop in a thermochemical mantle [e.g., Davaille et al., 2005]. Furthermore, although some plume bending by mantle flow is possible [Steinberger and O'Connell, 1998], the 608 angle from the vertical proposed for the superplume is too large to be dynamically stable [Loper and Stacey, 1983;Davaille et al., 2005]. Instead, Davaille et al. [2005] propose East Africa to be underlain by at least two, subvertical, plumes: a large-scale thermochemical plume/dome below southern Africa that has not yet reached the upper mantle, and a dying thermal plume below Afar of which only the shallower part of the tail is left. However, such a scenario is not able to explain a large-scale upper mantle low-velocity anomaly stretching from Kenya/Tanzania to Afar.
Petrological, topography, and seismic constraints give 100-300 K estimates for the excess temperatures below hotspots (relative to the mantle average) [Herzberg et al., 2007;Ito and van Keken, 2007]. Excess mantle temperatures below Afar are likely on the lower end of this range [Rooney et al., 2012;Rychert et al., 2012;Ferguson et al., 2013;Armitage et al., 2015]. An important consideration when attempting to seismically image mantle Figure 2. Distribution of sources (red dots in inset) and stations (main map) used in this tomographic study. The stations are color coded according to their network and cover the region from Madagascar in the south to Saudi Arabia in the north. The black rectangle shows the area on which we concentrate our analyses and interpretation. Source-receiver distances shown by the circles on the inset are from the center of the black rectangle. Network and station information can be found in supporting information Tables T1 and T2.
Geochemistry, Geophysics, Geosystems 10.1002/2015GC005948 plumes is the variation in structure with depth. At upwelling speeds of 10-20 cm/ yr, obtained from seismic estimates of plume width, surface topography, and flow modeling [Turcotte and Schubert, 1982;Nolet et al., 2006], plume temperature contrast will change adiabatically with depth. Across the upper mantle, this would amount to a relative minor increase in thermal plume anomaly with depth of <508C.
By contrast, the sensitivity of seismic velocities to temperature decreases very strongly with depth due to a combination of elastic and anelastic effects [Goes et al., 2004;Stixrude and Lithgow-Bertelloni, 2005;Styles et al., 2011] ( Figure  3). Additional strong but local anomalies are expected due to the depression of exothermic phase boundaries in the plume (like around 410 km depth) and uplift of endothermic ones (like near 660 km depth) [e.g., Stixrude and Lithgow-Bertelloni, 2007] (Figure 3). However, due to their limited size, these will not be resolvable in teleseismic travel-time tomography (as discussed in section 4 and shown in supporting information Figure S7). As a result, for a continuous thermal upper mantle plume of 100-200 K excess temperature, we expect to see V P anomalies of 0.4-0.8% at the top of the lower mantle, and a 5-7 times stronger signature in the upper 100 km of the mantle (Figure 3). The presence of melt at the shallowest depths (<50-70 km) would further enhance shallow anomalies adding to this contrast in anomaly strength [Armitage et al., 2015].

Data Sets
In this study, we combine P travel-time data from all available temporary and global seismic networks in Tanzania, Kenya, Uganda, Democratic Republic of the Congo, Ethiopia, Eritrea, Djibouti, Yemen, Oman, Malawi, Zambia, Madagascar, and Saudi Arabia, leading to a total of 452 stations (Figure 2, supporting information Table T2). When combining relative travel-times from different networks, there is a risk that some long-wavelength structure across the array cannot be recovered . However, even though the data sets are not all recorded simultaneously, in our region of interest in Afar/MER, all networks either overlap spatially or in time, and the permanent network is common to all the regional deployments. Station information, including duration of deployment and references for each network, can be found in supporting information Tables T1 and T2. We visually inspected the direct P arrivals from all events of magnitude m b 5.5 in the epicentral distance range 308 < D < 908 recorded between 1995 and 2012. Before analysis, the data were band pass filtered between 0.1 and 1.5 Hz. We picked the dominant peak or trough of the P-phase across all stations and used the multichannel cross-correlation method of VanDecar and Crosson [1990] to estimate relative arrival times (using a window of 1 s before and 2 s after our pick). Those arrival times with correlation coefficients >0.80 and a minimum number of picks per event 4 are retained for inversion. Furthermore, outliers . Two profiles of dV P /dT derivatives that we use to set up our synthetic plume resolution tests and perform our conversion of imaged velocity to temperature anomalies. The blue profile is the full (metamorphic) derivative that includes the effects of phase transitions. It was computed along a 13008C adiabat for a pyrolite composition using mineral parameters from database stx08 [Xu et al., 2008], with composite attenuation model Qg (above 400 km) [Van Wijk et al., 2008] and Q6 (below) [Goes et al., 2004]. The red profile corresponds to the smoothed isomorphic dV P /dT derivative without the effects of phase boundary topography. For our scaling, we use the isomorphic derivatives because the differential travel-time tomography cannot resolve localized phase boundary anomalies (see section 4 and supporting information Figure S7). Above 70 km depth, the isomorphic scaling is set to a constant value of 22% per 100 K, more representative for a lithospheric geotherm.
Geochemistry, Geophysics, Geosystems 10.1002/2015GC005948 exceeding 6 10 s (which might be due to clock errors or cycle skipping) are taken out. Mean travel-time residuals per event were removed to obtain relative travel-times. The formal picking error estimate from the multichannel cross-correlation is 0.01 s; however, a more realistic error estimate, taking into account that correlation pairs are not all independent [Tilmann et al., 2001], is 0.07 s.
This procedure gives us relative P-wave travel-times from 528 events of magnitude m b 5.5. To this, using the same processing as described above, we added travel-times from 261 lower magnitude earthquakes (2007)(2008)(2009)(2010)(2011)5.0 < m b < 5.5), mainly from source regions with low rates of m b 5.5 events (e.g., Mediterranean, mid-Atlantic Ridge). Figure 2 shows that the selected earthquakes provide a good azimuthal and distance coverage. The new seismic data from lower magnitude events together with data from south of Kenya and north of Eritrea provide more crossing rays at the base of the transition zone and top of the lower mantle (supporting information Figures S1 and S2). This improves resolution at these depths (supporting information Figures S3-S6). Our final data set consists of 16,420 relative P-wave travel-times.

Tomographic Inversion and Model Parameterization
The tomographic inversion is performed using the method of VanDecar et al. [1995], where we jointly invert for slowness, earthquakes-source corrections, and near-surface station corrections (which take into account the effect of the crust and uppermost mantle structure directly below the stations where no crossing rays are present), using an iterative conjugate gradient algorithm. The inversion recovers velocity anomalies relative to the average regional background, i.e., the method cannot determine the absolute background seismic velocity. We use ray theory (which may lead to somewhat underestimated anomaly amplitudes but should not significantly modify anomaly patterns [Montelli et al., 2004a]) and do a linear inversion (raypaths are calculated through the iasp91 1-D velocity model). The same inversion scheme has been used in previous tomographic analyses of the region [Benoit et al., 2006a;Hammond et al., 2013;Mulibo and Nyblade, 2013a].
The effect of outliers is reduced by Huber downweighting of travel-times having residuals larger than 3.0 standard deviations [Huber, 1981]. We choose this relatively high cut-off for downweighting after several tests and because other studies have shown that very strong velocity contrasts are to be expected below the region [e.g., Benoit et al., 2006a;Bastow et al., 2008;Hammond et al., 2013]. For our final models, we perform 10 Huber iterations with 2000 conjugate inner iterations to ensure a converged solution (756 data included some level of downweighting).
Where the stations are located (latitude 19.68N-24.48S, longitude 28.48E-55.28E, depth 0-1500 km), the model has a node spacing of 0.58 in latitude, 0.48 in longitude, and 50 km vertically. Outside this region, we extend the grid with node spacing of 18 both in latitude and in longitude and 100 km in depth to a volume spanning 288N-25.408S in latitude, 258E-57.208E in longitude, and from the surface to 2000 km depth, comprising a total of 263,736 nodes. Fresnel zone estimates indicate that such a large model is necessary to limit the mapping of distant structure into the area of interest. In this paper, we only interpret structure in a box spanning 5-178N and 35-478E around the Afar/Main Ethiopian Rift region ( Figure 1).

Regularization
Generally, this style of tomography is only regularized by supressing spatial gradients (smoothing and flattening), and a 1-D starting model is used without explicit damping toward it [e.g., Bastow et al., 2008;Hammond et al., 2013]. However, teleseismic body-waves recorded at regional networks give relatively low depth resolution directly below the stations where there is a lack of crossing paths; in our study this occurs for depths between 0 and 100-200 km. This is exactly the depth range where we expect the strongest anomalies and indeed very strong anomalies have been observed with other seismic data [Makris and Ginzburg, 1987;Maguire et al., 2006;Sicilia et al., 2008;Fishwick, 2010;Guidarelli et al., 2011;Hammond et al., 2011;Stork et al., 2013]. To reduce mapping of these strong shallow anomalies into station corrections and deeper structure, we additionally regularize the solution by damping to a starting model that above 300 km is constrained by surface waves. It has previously been shown that at these depths, surface waves can provide good complementary constraints to teleseismic travel-times [e.g., Allen et al., 2002;Schmid et al., 2008;Chang and Van der Lee, 2011;Fishwick and Rawlinson, 2012]. Including surface-wave structure as an initial model can also help constrain the Geochemistry, Geophysics, Geosystems 10.1002/2015GC005948 long-wavelength structure that relative travel-time tomography may not resolve .
To build our 3-D starting model, we use the surface-wave model of Fishwick [2010] (herein referred to as F2010), which is the most recent and highest resolution surface-wave model available for our entire study area. Model F2010 covers all of Africa and fits surface waveforms at periods between 50 and 120 s for the fundamental and first four higher modes, achieving vertical resolution of 50 km down to 300 km depth and lateral resolution on the order of 150 km below our region of interest. This is a coarser horizontal resolution than we can obtain with our travel-time data, which yields a lateral resolution on the scale of station spacing, i.e., 50-100 km. However, the regularized inversion adds smallscale structure to the long-wavelength starting model where the travel-times require it.
To obtain a P-wave speed starting model, we convert the absolute shear velocities from F2010 to temperature and V P following Goes and van der Lee [2002], who use a compilation from the mineral physics literature of phase stability fields, of elastic constants, density, and their dependence on temperature T, pressure P, composition and phase, and an Arrhenius expression for anelasticity Q as a function of P and T. Our conversion is performed using anelasticity model Qg and a constant pyrolitic mantle composition [see Van Wijk et al., 2008 supplement]. P-wave velocity perturbations for the 3-D starting model for our relative traveltime tomography are defined by subtracting the average velocity for the whole African model in each depth interval, essentially giving us a better constrained background model at these depths. Below 300 km depth, we smoothly grade into a zero-anomaly model. This conversion accounts for the fact that scaling from V S to V P anomalies varies strongly with temperature and depth [Goes et al., 2000;Cammarano et al., 2003;Goes et al., 2004]. Although, there are uncertainties in the resulting dV P of around 0.1% [Cammarano et al., 2003], the approximation is much better than using a constant scaling factor. Furthermore, these conversion parameters have previously yielded lithosphere and shallow mantle temperatures that can reconcile P and S-velocities, surface heat flow, and attenuation in the shallow mantle below several continental regions [e.g., Goes et al., 2000;Goes and van der Lee, 2002;Huang et al., 2009]. The conversion by necessity assumes the surface wave velocity anomalies are only thermal, as we cannot unravel the contributions of temperature and melt. Where melt is present, this might cause our V P anomalies to be an overestimate [e.g., Hammond and Humphreys, 2000]. However, melt effects are likely quite localized and thus will have only a strongly damped expression in the long-wavelength surface-wave structure, while the travel-times can add small structure that may be the result of melt concentration. Additionally, melt is likely only present in the top 90 km [Rooney et al., 2007;Rychert et al., 2012;Armitage et al., 2015], thus below these depths a thermal conversion is reasonable.
We first perform a set of inversions to select our preferred smoothing and flattening parameters. We construct a trade-off curve to find the best compromise between model fit and roughness and choose a model close to the ''knee'' of the curve (flattening factor 5 4800; smoothing factor 5 153,600; Figure 4). The tradeoff curves look very similar for different levels of damping ( Figure 4). The maximum achievable RMS reduction, above the noise level, is derived from the estimated RMS travel-time error. Given the data error

Resolution Tests
Before presenting our results, we test the resolution our models can achieve (Figures 5, 6, and supporting information Figures S3 and S4). All resolution tests use the same ray paths and inversion parameters as the data inversion. The synthetic travel-times are calculated by performing full three-dimensional ray tracing [Julian and Gubbins, 1977] through the synthetic model. Gaussian random noise (with an average amplitude of 0.07s, similar to the estimated noise level for the data) is added to the synthetic data. With the exception of the checkerboard tests, the starting model is the 3-D synthetic model down to a depth of 350 km. We first illustrate the spatial resolution and next the effect of damping and our ability to distinguish between the three plume scenarios.

Checkerboard and Transition-Zone Resolution Tests
We perform standard checkerboard tests (supporting information Figures S3 and S4), where we include positive and negative anomaly spheres of 125-250 km diameter (where diameter is defined as the distance to where the amplitude of the anomalies is reduced to 20% of their maximum). These show that we have good resolution for features of at least 125 km diameter from 200 to 700 km depth below most of the area, and at uppermost lower mantle depths in the eastern half and at the south-western edge of the domain. Note that the damping factor does not affect spatial resolution, only the relative recovery of amplitudes with depth. Supporting information Figures S5 and S6 show the same checkerboard tests for the data set and inversion parameters previously used by Hammond et al. [2013]. This clearly illustrates how the increased aperture of our data set and the addition of lower magnitude events have improved resolution in the transition zone and shallowest lower mantle.
As we concentrate our interpretation on the transition zone, Figure 5 further illustrates transition zone resolution (for a case with damping 5 35). Figure 5 (top) shows that if the input consists of only the surface Figure 5. Two resolution tests along cross-section A-B (orientation is shown in the 800 km depth slice through the second input model). The first row shows synthetic input models and the second row the corresponding recovered output models. The synthetic inversion is done using the same parameters as for the data-derived inversion. Regions with less than 3 rays per node are shaded gray. The spacing between the contours is 0.25%. White points indicate the distance every 28. The synthetic tests in the first column use the P-structure estimated from the surface-wave model from Fishwick [2010] as the input model. The plots in the second column show a test with the surface-wave-derived structure at shallow depths plus a set of 1.6% low-velocity anomalies of 380 km diameter (defined as the distance to 20% of the maximum amplitude) along line A-B beneath the transition zone (800 km depth) to represent lower-mantle structure with a large excess T (4008C) (estimated from the isomorphic derivatives shown in Figure 3). Neither the structure above 400 km, nor the structure at the top of the lower-mantle smear significantly into the transition zone.
Geochemistry, Geophysics, Geosystems 10.1002/2015GC005948 wave derived structure above 350 km depth, a minimal amount of smearing into transition zone structure occurs, with amplitudes less than 0.2%. Figure 5 (bottom) shows the effect of adding two checkers of 380 km diameter with a relatively strong amplitude of 1.6% (similar to what we might expect for thermal anomalies as high as 4008C), positioned just below the transition zone (800 km depth). There is low resolution of the deeper structure along this profile, yet this does not lead to spurious structure in the transition zone. Thus, we are confident that any transition zone structure imaged can be resolved independently from structure above and below. 4.1.2. Plume Resolution Tests Next, we present a set of resolution tests of the three end-member plume models (Figure 1). We use simplified plume geometries (Figure 6), motivated by published geodynamic models (see discussion in section 2). We define the plumes in terms of temperatures, and then scale to dV P according to the isomorphic derivatives (i.e., smoothed across the phase boundaries) shown in Figure 3. For case A, the ''Superplume'' ( Figure  1), we use an ellipsoid centered below Kenya, which fills the upper mantle depth to represent a spread plume head in the upper mantle (only part below our study area shown). The ellipsoid is flattened at the top, reaching its maximum temperature anomaly of 4008 at 120 km depth, with a Gaussian fall off down to 600 km depth. At depths less than 100 km, we add to this model a set of low velocities (4%) that follow the rift zone, to capture the effect of channeling and melting along topography at the base of the lithosphere, as imaged in previous seismic studies of the region Bastow et al., 2008;Hammond et al., 2013]. For case B (single upwelling) and C (multiple upwellings) (Figure 1), we use different sizes of vertical cylindrical plumes with a Gaussian variation of temperature across: for case B, a single cylinder of 380 km diameter (defined as the distance to 20% maximum anomaly) directly below Afar, and for case C, a set of 150 km diameter plumes, one below Afar and one below the MER-to illustrate the variable resolution across the network. Maximum excess DT for the latter two cases is set to 2008 and does not vary with depth. To both case B and C, we add a 200 km thick layer of hot (2008) material that has spread out below the lithosphere, as well as additional 4% shallow V P anomalies along the rift to represent the effect of localized melt concentrations and lithospheric topography. Figure 6 shows the resulting synthetic seismic plume structures. Due to the correction for seismic sensitivity to temperature, the synthetic anomalies decrease strongly with depth. (The additional structure predicted with a full metamorphic derivative, which includes the effect of phase boundary shifts with temperature (Figure 3), is not resolvable with this data and method, see supporting information Figure S7). The derivatives are computed along a 13008C adiabat for a pyrolite composition using mineral parameters from database stx08 [Xu et al., 2008], with composite attenuation model Qg (above 400 km) [Van Wijk et al., 2008] and Q6 (below) [Goes et al., 2004]. We set the derivative above 70 km depth to a constant value of 22% per 100 K, as the reference adiabat is too hot at lithospheric depths. Furthermore, the structure superimposed on the plume structures above 100 km depth accounts for the localized effects of high-temperature material and melt at shallow depths.
As for the checkerboards, we calculate synthetic data and then perform separate damped (factor 5 35) and undamped (factor 5 0) resolution tests for the synthetic plumes. For case A, the superplume analog, we see that an undamped test recovers only about 20-30% of the strong input structure, because the lateral variations in structure across the network are small and the 1-D background is removed in the inversion. The damped inversion with constraints on the shallow mantle structure recovers a structure much closer to the input anomalies, with anomaly recovery between 40% and 90%. Furthermore, the undamped solution leads to smearing of the strong anomaly to depths below the transition zone. This smearing is reduced in the damped model, which thereby returns a better representation of the depth extent of the input plume structure.
For case B and C, the vertical thermal plumes of different scales, the undamped inversion recovers mainly the structure below 200 km depth. The layer of plume material above 200 km is removed with the background model, while most of the shallow melt/topography anomalies are absorbed in the station corrections. Damping toward the mantle structure above 350 km, leads to a much better recovery of the geometry and continuity of the structures (although there is some overestimate of anomaly amplitude between 200 and 500 km depth) as well as a better representation of the anomaly depth extent through the transition zone.

Main Model Features
The P-wave structures from inversions with our preferred combination of smoothing and flattening and for a case without damping and one with damping factor 5 35 are shown in Figure 7-9. A model with stronger Geochemistry, Geophysics, Geosystems 10.1002/2015GC005948 damping (factor 70) is included in the supplement (supporting information Figure S8). As in previous body and surface-wave models, the shallow mantle below the region is slow overall [e.g., Debayle et al., 2001;Bastow et al., 2008;Park et al., 2008;Fishwick, 2010;Hammond et al., 2013]. Embedded within this slow background, down to about 100 km depth (not shown), strong low-velocity structures align with the surface rift zones, very similar to previous tomographic models Benoit et al., 2006a;Bastow et al., 2008;Hammond et al., 2013].
From about 200 km depth, two clusters of low-velocity anomalies, each covering an area about 400 km in diameter, start to appear: one below the Afar/Red Sea region, and the second cluster west of the Main Ethiopian Rift, an area characterized by off-rift volcanism. While these two clusters were already seen in previous tomography Benoit et al., 2006a;Bastow et al., 2008;Hammond et al., 2013], with our deeper resolution, it is now clear that they continue from 200 km down through the transition zone (Figures 7 and 8). The transition zone anomalies are very prominent in the case without damping. However, even when damping forces more of the low velocities into the shallow mantle, the anomalies through the transition remain required by the data (and this is still true for stronger damping-supporting information Figure S7). With a damping factor of 35, the low-velocity zones decrease in magnitude with depth from dV P 22.0% in the uppermost mantle to about 20.3% in the uppermost part of the lower mantle, a decrease similar in magnitude to what is expected from seismic sensitivity to temperature as a function of depth (Figure 3, section 2).
The two low-velocity clusters may each comprise additional smaller-scale structure of 100-200 km diameter (Figure 9). The cluster west of the MER contains at least two structures that are more or less vertical through the transition zone, one directly west of the rift zone ( 398E, 98N; profile A-B and profile E-F), and the second at the western edge of the model (368E, 118N; profile C-D). The cluster under Afar contains one prominent narrow structure through the transition zone (profiles A-B, C-D, and G-H), but may contain a second, less well-resolved, one below Djibouti (438E, 128N). This small-scale structure is further highlighted by less smooth models (see supporting information Figure S9).
One effect of damping is that the station terms are significantly reduced and are negative beneath the oceanic Red Sea and positive elsewhere, mirroring the distribution of continental and oceanic crust. In contrast, station delays in the undamped model are largely positive. This indicates that the damped inversion recovers much of the uppermost mantle structure that without damping is absorbed into the station terms. Note that on a larger-scale, negative delays at the stations outside of our region of interest reflect the influence of the African and Arabian cratons, and balance out the generally positive delays inside the investigated area (supporting information Figure S10), again emphasizing that structure below the northeastern EAR is anomalously slow.
In the upper part of the lower mantle, down to at least 800 km depth, there are resolvable low-velocity anomalies of a similar scale as those in the transition zone, and of the magnitudes expected from seismic sensitivity to a thermal anomaly of 1008 (Figure 3). However, the anomaly pattern changes across the transition from upper to lower mantle, hinting at three low-velocity features below 700-750 km depth (Figures  7 and 8) and there is often a misalignment between lower and upper mantle structures (Figure 9).

Temperature Estimates
The anomalous low P-wave velocities found beneath the Northern East-African rift can be caused by changes in temperatures, chemical composition, partial melting, water content, or a combination of Figure 6. Three synthetic tests to examine the resolving power of our tomographic inversion for mantle upwelling structures based on the proposed scenarios A-C from Figures 1. (a, f, k) Depth slices through the input models at 400 km depth. (b, g, l) Cross sections through the input models. Input models are defined in temperature and then multiplied by the isomorphic derivative from Figure 2 to obtain V P anomalies. In all cases, a set of 4% V P checker anomalies below the rift is added to the plume structures above 100 km depth to mimic lateral variability due to lithospheric topography and melt concentration. Regions with less than 3 rays per node are shaded gray. The spacing between the contours is 0.25%. White points indicate the distance every 28. (c, h, m) Recovered models using the same parameters as in the data inversion, with no damping. (d, i, n) Recovered models using the same parameters as in the data inversion, including damping (factor 5 35). The starting model to which the inversion is damped is the 3-D synthetic model down to a depth of 350 km. (a-d) Case A: superplume, represented by a flat ellipsoid with a maximum excess T of 4008C at 120 km depth, and a Gaussian fall off with depth leading to 08 anomalies at 660 km; (f-i) Case B: a single vertical cylinder (representing a larger upwelling) positioned beneath Afar with maximum 2008C excess T that is constant with depth and Gaussian variation laterally over a 380 km width (defined as the distance to 20% of the maximum amplitude), plus a layer of 2008C excess temperature above 200 km depth; (k-n) Case C: two vertical cylinders (representing a smaller upwelling), positioned beneath Afar and MER, again assuming a maximum 2008C T anomaly that is constant with depth, a Gaussian variation laterally over a 150 km width and a 2008C hot layer above 200 km depth. (e, j, o) Profiles through the center of the plumes showing the input and retrieved anomalies. These tests illustrate that without damping most shallow structure is not recovered. Damped inversions yield a better representation of the vertical continuity of the plume structures.
Geochemistry, Geophysics, Geosystems 10.1002/2015GC005948 these. Given (i) the strong sensitivity of the seismic velocities to temperature and weak sensitivity to composition [e.g., Sobolev et al., 1997;Goes et al., 2000;Schutt and Lesher, 2006], (ii) the evidence from surface observations for high mantle temperatures below many hotspots including Afar [McKenzie, 1984;Nolet et al., 2006;Putirka et al., 2007;Class, 2008;Rooney et al., 2012;Ferguson et al., 2013;Armitage et al., 2015]; and (iii) the expectation that thermal buoyancy is the main driver of active upwellings from the deep mantle [Morgan, 1971;Ito and van Keken, 2007], we perform an interpretation of the velocity anomalies purely in terms of temperature variations. Below (section 5.2), we discuss the possible effect of the presence of fluids and melt.
When converting to temperature, it is important to consider the effect of seismic resolution [Ritsema et al., 2007;Schuberth et al., 2009]. We do this by scaling the tomographic models with the estimated amplitude recovery from the plume resolution tests shown in Figure 6, before converting to relative temperature anomaly. Given the spatial extent of the anomalies we find, we use the amplitude scaling from case C (Figure 1, section 4.1.2). There is some uncertainty in this, as estimated amplitude recovery is geometrydependent. We will use the range of models with different degrees of damping (factor 5 0-70) to bracket the plausible range of velocity anomalies resulting from the regularization choice. We estimate the temperature variations dT from the velocity perturbations dV P using the isomorphic dlnV P /dT of Figure 3. Uncertainties in the isomorphic derivative lead to uncertainties in temperature anomalies of a few tens of degrees [Goes et al., 2000;Cammarano et al., 2003].  Figure 7h represent the sign and magnitude of the station static terms. Regions with less than 3 rays per node are shaded gray. The spacing between the contours is 0.25%. These maps show that the two clusters of low-velocity anomalies that appear at 200 km depth persist throughout the transition zone.

10.1002/2015GC005948
Results are shown for the two regions where low-velocity anomalies appear continuous from the surface to the lower mantle (boxes marked on Figure 8e). Figure 10 displays the distribution of the seismic velocity anomalies and inferred temperature anomalies within these boxes at a range of depths. We focus on the depths greater than 300 km where the two low-velocity clusters are the main velocity structures. This avoids those depths where velocities are likely complicated by rift-related structure and melt. The original, unscaled, tomographic V P anomalies decrease strongly with depth, from 21.2 to 21.6% at 300 km to 20.3 to 20.5% at 700 km depth. After scaling the anomalies according to the resolution inferred from the synthetic plume tests, the anomalies at 300 km depth become 20.8% and at 700 km depth become 20.5 to 20.7%.
When the velocity anomalies are converted to temperature without scaling them for the variable resolution with depth, the resulting temperature anomalies decrease from 150-1708C at 300 km depth to 70-1108C at 700 km depth. This contrasts with the expectation that plume temperature anomalies should increase with depth. However, after accounting for resolution the converted temperatures give a slight increase with depth from 1008C at 300 km to 120-1508C at 700 km ( Figure 10). This is in line with the small increase in temperature expected with depth for adiabatically rising plumes.
The two supporting information Figures S11 and S12 plot the temperature estimates for the end-member cases with no damping (body-wave only model) or stronger damping (factor 70) (surface-wave dominated  Figure 8h represent the sign and magnitude of the station static terms. Regions with less than 3 rays per node are shaded gray. The spacing between the contours is 0.25%. The two boxes covering Afar (A) and an area west of the MER (M) in Figure 8e show the regions used in our temperature interpretation in Figure 10. These maps clearly illustrate the change in structure from the rift-related and broad shallow mantle anomalies at 200 km depth to two low-velocity clusters that persist throughout the transition zone and may link to deeper structure in the lower mantle.
Geochemistry, Geophysics, Geosystems 10.1002/2015GC005948 model). For the case without damping, mean temperature anomaly estimates after resolution correction increase from about 908C at 300 km to 210-2308C at 700 km depth. For the case where damping 5 70, thermal anomalies show minimal change from 908C at 300 km depth to 50-1008C at 700 km depth. This shows that, with the inclusion of damping in the parameterization, we get more realistic trends of constant or slightly increasing temperature anomaly with depth, with estimates of the temperature between 50 and 1508C. Due to the integrated constraint on overall travel-times, the main difference is the variation in anomaly magnitude with depth.

Comparison With Other T Estimates
Our tomographic temperature estimates of 50-1508C excess temperature are consistent with those from previous studies in the region. Geochemical studies yield asthenospheric source temperatures of 100-1508C above the mantle temperature typically inferred below mid-ocean ridges [Rooney et al., 2012]. A shallowmantle (<140 km depth) receiver function analysis infers temperatures between 1350 and 14008C, no more than 50-1008C above ambient mantle [Rychert et al., 2012;Ferguson et al., 2013]. However, uncertainties in what property of the melt zone the receiver functions actually image allow temperature estimates up to Figure 9. Vertical cross sections through our preferred (damping 5 35) (a-d) and undamped (damping 5 0) models (e-h) (both flattening 5 4800, smoothing 5 153,600). The location of the cross-sections (black lines) is shown in the 500 km depth slice through the damped model (i). Regions with less than 3 rays per node are shaded gray. The spacing between the contours is 0.25%. White points indicate the distance every 28. Cross-section A-B cuts through subvertical downwellings below Afar and west of the MER. C-D is a cross section through the prominent low-velocity anomalies in the Afar region and directly next to the MER. Section E-F cross cuts the anomaly next to the MER, while section G-H provides another view of the Afar low-velocity anomaly. The undamped models (e-h) illustrate the 100-200 km width of the low-velocity structures, while the damped models (a-d) emphasize the continuity between shallow and transition zone structure.
Transition-zone temperatures have been estimated from topography on the phase boundaries imaged with seismic receiver functions. While studies focussing beneath Tanzania/Kenya have concluded that there needs to be a substantial thermal anomaly of 200-3008C through the transition zone below the southern rift [Owens et al., 2000;Huerta et al., 2009], contradictory results have been obtained below the northern EAR. Cornwell et al. [2011] invoked anomalies of about 2508C to explain a deep ''410'' plus significant chemical heterogeneity to explain the 30-40 km topography they recovered on the ''660'' discontinuity. However, Thompson et al. [2015] find little variation in depth of ''410'' and ''660'' below the region, and consequently suggest that the temperature anomaly is less than 1008C, to explain the relatively constant thickness of the transition zone similar to the global average (as also found below two stations in the region by Nyblade et al. [2000]). Our temperature estimates are more consistent with the lower temperature estimates of Thompson et al. [2015] and Nyblade et al. [2000]. According to the thermodynamic database we use [Xu et al., 2008], the 660 discontinuity should be governed by the ringwoodite to bridgmanite1magnesiow€ ustite transition at such warm temperatures, and hence be elevated.
Given the consistency between our temperature estimates and those from other studies, there is no reason to invoke other mechanisms to explain the imaged P-velocity anomalies. However, within the uncertainties of our inversion, we can also not rule out that other factors have some contribution. In the transition zone, Thompson et al. [2015] propose the presence of fluids and hydrous melt to explain a low-velocity layer they see above the  Figures 10b and 10d show the mean and standard deviation of the temperature anomalies estimated in each region. These show a slightly increase of excess temperature with depth from 1008C at 300 km to 120-1508C at 700 km.
Geochemistry, Geophysics, Geosystems 10.1002/2015GC005948 ''410'' discontinuity below Afar and the presence of a strong ''520'' discontinuity directly below it. Our teleseismic travel-times have no resolution to rapid changes of velocity over short vertical distances (see supporting information Figure S7). It has been suggested that carbonate melts may be present in the mantle below 200 km to explain the extremely slow delay times at some Ethiopian stations and inferred strong low-velocity anomalies at depth [Rooney et al., 2012], but our models do not require this. At lithospheric-asthenospheric depths (<80 km), however, there is ample independent evidence for the presence of significant amounts of (likely silicate) melt [Ebinger and Casey, 2001;Bastow et al., 2005;Kendall et al., 2005;Sicilia et al., 2008;Guidarelli et al., 2011;Hammond et al., 2011;Desissa et al., 2013;Stork et al., 2013;Hammond, 2014;Hammond et al., 2014;Korostelev et al., 2014].

Geodynamical Implications
By comparing the tomographic models (Figures 7-9) with the resolution tests in Figure 6, it is clear that the structure we resolve is significantly more complex and smaller scale than that expected for large-scale flow spanning most of the upper mantle from a superplume to the south [Hansen et al., 2012], or for a single plume rising directly from the lower mantle below Ethiopia/Arabia Chang and Van der Lee, 2011]. Rather the structure we find is of a scale most like that of the third case, C, with smaller scale upwellings that rise through the transition zone below the region. The idealized structures of our synthetic case C test do not capture all of the complexity of the actual structure imaged, but from the three models previously proposed to account for both mantle structure and surface expressions, case C is the closest to the structure actually imaged.
Dynamic models generate such small-scale upper-mantle upwellings when plume material stalls below an endothermic phase transition at the base of the transition zone, where it forms a regional thermal boundary layer from which smaller-scale upper mantle plumes spawn [e.g., Kumagai et al., 2007;Tosi and Yuen, 2011] ( Figure 11). In such models, the difference in lower and upper mantle scale of the upwellings is the result of a decrease in viscosity at the lower-upper mantle interface. Stagnation of plume material below the transition zone may be aided by dense compositional heterogeneity in the plume [Farnetani and Samuel, 2005;Kumagai et al., 2008], but this is not required; numerical and analog models found that plume material can also pond below an endothermic phase transition at the ''660'' discontinuity [e.g., Kumagai et al., 2007;Tosi and Yuen, 2011;Bossmann and van Keken, 2013]. If there is sufficient viscosity contrast between the plumes and surrounding mantle, the ponded material can disperse over distances as large as 1000 km [Tosi and Yuen, 2011]. Kumagai et al. [2007]'s analyses predict a spacing of 500-1200 km between the secondary upper mantle upwellings. The distance between our MER and Afar features is at the lower end of these, implying a relatively low effective upper mantle viscosity (10 19 Pa s).
Our relative travel-time inversions are insensitive to a constant velocity layer that might correspond to ponded plume material. However, our models do indicate that the velocity anomaly pattern changes across 660, which would be consistent with different dynamics below (ponding) and above (small-scale plumes) the base of the transition zone. Our lower mantle patterns are consistent with those imaged by Chang and Van der Lee [2011] in particular the low velocity anomaly at 800 km depth at southern end of MER coincides with the top of the lower mantle low-velocity feature they image. A further expansion of the joint network aperture, ideally with deployments away from the rift in Sudan/Somalia, will be necessary to improve resolution deeper and test whether there is ponded material at the top of the lower mantle and how it is linked to potential deeper upwellings ( Figure 11). More detailed seismic imaging and modeling will also be required to test how patterns of seismic anisotropy, which indicate strong horizontal mantle flow in the mantle below the MER and Afar Montagner et al., 2007;Sicilia et al., 2008;Gao et al., 2010;Hammond et al., 2014], can be reconciled with the upwellings structures we find.
In no other parts of the world have small-scale upwellings been seen at the resolution we have achieved below northeastern Africa, although the existence of multiple upwellings below East Africa has been suggested from geochemical arguments [Meshesha and Shinjo, 2008]. Based on seismic tomography, small-scale plumes have been suggested to be present below Europe [Granet et al., 1995;Ritter et al., 2001], where they have been attributed to plume-slab interaction [Goes et al., 1999]. Analog models of small-scale plumes [Kumagai et al., 2007] have also been used to explain the clusters of hotspots in the central Atlantic (including Azores, Canaries, Cape Verde, Madeira, and Great Meteor) and the Indian Ocean (Marion/Crozet) and the temporal evolution of the Reunion hotspot. Tosi and Yuen [2011] proposed a model of ponded and flowing plume material below ''660'' to explain some of the enigmatic observations for Hawaii [Cao et al., 2011]. The Geochemistry, Geophysics, Geosystems 10.1002/2015GC005948 tomography below Hawaii does not rule out that there is more complexity than that predicted for a single upwelling [Wolfe et al., 2011]. Hence it is possible that multiple upper mantle upwellings above a single lower-mantle source are quite common.

Conclusions
We combine P travel-time data from 26 networks in Africa and Arabia to obtain a tomographic model of P velocity from the surface into the top of the lower mantle below the northern East African, Red Sea and Gulf of Aden rifts, comprising Yemen, the Afar Depression, Ethiopia/Eritrea/Djibouti and Main Ethiopian Rift (MER). Structure above 300 km depth, where the vertical resolution of teleseismic body waves is limited, is damped moderately toward the 3-D surface wave velocity structure from Fishwick [2010], converted to P wave speed using a thermodynamic approach.
We find at least two clusters of low-velocity structures that are continuous from the shallow mantle down to depths of 700 km. One of the low-wave speed clusters is located below the Afar depression, and the other one below the western side of the MER, an area characterized by off-rift volcanism. Each of the clusters appears to comprise one or more smaller-scale subvertical low velocity features of 100-200 km diameter. Amplitudes of the low velocity anomalies decrease by about a factor of four between 100 and 800 km depth. However, when we take into account the decrease of seismic resolution with depth and the decreasing sensitivity of V P to temperature with depth, these correspond to a constant or slightly increasing temperature anomaly with depth of 100 6 508C. Figure 11. Cartoon showing the geodynamical interpretation proposed based on our tomographic study and thermal conversion. We interpret the low-velocity structures we image through the transition zone to be upwellings with a temperature excess of 100 6 50 K. The small scale of these structures requires a local thermal boundary layer at the top of the lower mantle. This ponded material may be fed by previously suggested lower mantle upwellings [Ritsema et al., 1999;Hansen et al., 2012, Chang andVan der Lee 2011].

Acknowledgments
We would like to thank the many people involved in the collection of data used in this study. Data specifically provided for this study included those recorded instruments in Yemen, Ethiopia, and Eritrea provided by SEIS-UK. The facilities of SEIS-UK are supported by the Natural Environment Research Council under Agreement R8/ H10/64. These data will be available at the IRIS-DMC in 2015/2016 (http://ds.iris. edu/ds/nodes/dmc/). The instruments deployed in Djibouti belong to the French national pool of portable seismic instruments Sismob-RESIF (http:// sismob.obs.ujf-grenoble.fr) and are funded by Actions-Marges. The network deployment in Yemen was funded by ANR YOCMAL 07BLAN0135. The instruments deployed in Uganda-Congo were provided by the Geophysical Instrument Pool Potsdam (GIPP) and data were archived at the GEOFON data center (http://geofon.gfz-potsdam.de). The permanent instruments in Ethiopia are run by the Institute of Space Science, Geophysics and Astronomy, Addis Ababa University (Contact for data: atalay.ayele@aau.edu.et), and financial support for most of the Ethiopian permanent seismic stations comes from the International Science Program (ISP) of Uppsala University (Sweden). Other data were downloaded from the IRIS and GEOFON data centers. We thank Ivan Koulakov, Cindy Ebinger, and three anonymous reviewers for their constructive and critical comments that helped us improve our paper. Funding for data collection was provided by NERC grants NE/E007414/1, and NE/J012297/1 and BHP-Billiton. We thank Ian Bastow and John Armitage for discussions. C.C. was supported by a Janet Watson Fellowship from the Department of Earth Science and Engineering at Imperial College, J.H. by NERC Fellowship NE/I020342/1, D.K. is supported by NERC grant NE/L013932/1. Geochemistry, Geophysics, Geosystems 10.1002/2015GC005948