Subset relations in ellipsis licensing

This paper aims to provide arguments for the claim that ellipsis licensing requires elided material to constitute a subset of its antecedent. On the empirical side, I focus on deriving Ross’ (1970) generalization that backward gapping is restricted to OV contexts. A new operation Total Impoverishment is proposed for ellipsis, which involves insertion of null morphemes into the ellipsis site in a late insertion framework such as Distributed Morphology. This approach is developed as a possible alternative to Merchant’s (2001) account based on the formal [E]-feature. It will be shown that the correlation between word order and the directionality of gapping falls out under the present approach, and in addition more general arguments for the role of the subset in ellipsis will be presented, as well as a critique of the [E]-feature approach from the perspective of alternatives to formal features.


Introduction
Ellipsis is a pervasive property of natural language and has been the subject of close attention since the onset of Generative Grammar (e.g. Ross 1969; 1970; Wasow 1972; Sag 1976; Williams 1977). Decades of cumulative research have done much to uncover the conditions under which ellipsis can apply, since it is clear that it is not completely unconstrained. The focus of this paper will be on the licensing conditions for ellipsis, that is, the structural configurations in which ellipsis is possible. Although approaches range from syntactic to semantic identity and positions in between (see Merchant to appear for an overview), recent work has begun to explore the idea that ellipsis licensing requires that the ellipsis site be, in some relevant sense, a subset of the antecedent (e.g. Merchant 2013a; Rooryck & Schoorlemmer 2014; Saab 2015). While it is certainly possible to enrich existing approaches to ellipsis licensing, e.g. Merchant's (2001) [E]-feature approach, to include this subset property, I take a radically different view that reduces ellipsis to morphology. The central idea is that ellipsis is actually the result of the insertion of null exponents into the structure in a 'late insertion' approach to morphology, rather than lack of pronunciation at PF. This idea rests on a new operation, Total Impoverishment, an impoverishment rule that completely reduces the feature set of a given node to the empty set in contexts where the aforementioned subset condition is met. Once a terminal bears an empty feature set, the only possible realization is as a null 'Elsewhere' morpheme (Ø), thereby deriving the effect of non-pronunciation. I will demonstrate how this approach allows us to neatly derive an old, yet still puzzling, generalization about the directionality of gapping going back to Ross (1970). Ross' observation was that VO languages like English allow for 'forward gapping', i.e.
gapping of the verb in the second conjunct only, whereas OV languages appear to be significantly more flexible in allowing for 'backward gapping' (ellipsis in the first conjunct). By paying closer consideration to the interaction
(1) a. We want to invite someone, but we don't know [ CP  arrived. (Lobeck 1995: 20)
Crucially, Lobeck views ellipsis sites as empty categories (pro) (2) and as such, the general requirements on empty categories (e.g. the Empty Category Principle (ECP); Chomsky 1981; Lasnik & Saito 1992) then also apply to ellipsis. Lobeck proposes the following condition, which encompasses both the identification and licensing requirements mentioned above:
(2) Licensing and Identification of pro: An empty, non-arbitrary pronominal must be properly head-governed, and governed by an X-0 specified for strong agreement. (Lobeck 1995: 41)
The fact that the ellipsis sites in (1) seem to correlate with the complement positions of certain functional heads seems to follow from the assumption that these are an empty category pro that must be head-governed.

Ellipsis licensing in Minimalism: [E]-feature (Merchant 2001)
In the transition from GB to Minimalism, there was a move away from the previous proliferation of empty categories and the corresponding filters (e.g. ECP). Furthermore, government was dispensed with in Chomsky (1995) and with it, Lobeck's licensing condition for ellipsis. In his seminal work, Merchant (2001: 60f.) imports Lobeck's original insights into the Minimalist Program, 'recasting them as featural matching requirements in a head-head relation'. To do this, Merchant proposes an [E]-feature on the relevant head in (1). The [E]-feature is 'the sole repository of all information about the ellipsis' (Merchant to appear) and effectively states the phonological, syntactic and semantic requirements on the ellipsis site. Its primary function is to instruct PF to 'skip its complement for the purposes of parsing and production' (Merchant 2001: 60). For example, in a case of VP ellipsis, we can assume, following Merchant (2013b), that the head bears the [E]-feature triggering ellipsis of its complement (VP) as in (3):
(3) John spoke first and then Mary did 〈speak〉.
The first thing to note is that Merchant, for good reasons, assumes a phrasal ellipsis approach, i.e. that ellipsis sites are not empty pronominals but rather contain fully-fledged syntactic structure. However, if we compare this approach to Lobeck's, it becomes apparent that the conditions on ellipsis no longer follow as elegantly. For Lobeck, the fact that ellipsis sites were empty pronominals meant that their licensing conditions stemmed from other independently motivated restrictions on empty categories such as the ECP. 1 Since these concepts were culled in the transition to Minimalism, the [E]-feature simply restates the conditions on ellipsis. For example, there is no deeper reason why only particular functional heads license ellipsis other than the fact that they bear the [E]-feature. Criticism along these lines has already been made by Thoms (2010):
One problem with these feature-based accounts is that they adopt the notion of a licensing head in a context where it has been denuded of its wider theoretical justification. The move to Minimalism has effectively rendered the licensing head an ellipsis-specific stipulative category, since notions like "government" have been dispensed with […]. The elegance of Lobeck's account has thus been lost in transition, and the modern accounts of ellipsis licensing have been left to look somewhat stipulative in the context of Minimalist inquiry. (Thoms 2010: 2f.)
If we compare this state of affairs to the trigger for movement, the EPP feature, we see that there is a clear parallel.  (Merchant 2001: 61). In the following sections, I will address the semantic, phonological and syntactic aspects of the [E]-feature in turn.

The syntax of [E]
In the previous section, we saw that some of the conceptual objections levelled at the EPP feature also hold for the [E]-feature. We will in fact see that certain assumptions about the syntax of [E] actually exacerbate the problem, as they involve a tacit proliferation of [E]-features. It is assumed that the [E]-feature is bundled with certain other syntactic features (Merchant 2001; Aelbrecht 2011). For example, consider the case of sluicing (TP ellipsis) such as in (4) The data in (4) show that sluicing is only licensed by particular C-heads (Ross 1969; Lobeck 1995; Merchant 2001 (Merchant 2014: 20) and [E_S] for sluicing (Merchant 2004: 670). The question then arises of exactly how many [E]-features we are dealing with. It seems that, having to entertain variation in syntactic specification both within and across languages, we are forced to say that there are as many different specifications of the [E]-feature as there are ellipsis constructions. Thus, the [E]-feature has a considerably construction-specific nature, which is also true of its formal feature cousin, the EPP feature.

The semantics of [E]
The [E]-feature imposes a semantic identity requirement on elided material. Merchant (2001) claims that elided material must be 'e-given' in order for ellipsis to be possible. The e-GIVENness requirement is stated as follows:
(5) e-GIVENness (Merchant 2001: 26): An expression E counts as e-given iff E has a salient antecedent A and, modulo ∃-type shifting,
(i) A entails F-clo(E), and
(ii) E entails F-clo(A)
Thus, ellipsis requires a salient antecedent and there must be a mutual entailment relation between the F(ocus)-clo(sure) of the antecedent and ellipsis site. This can be illustrated on the basis of the following example from Merchant (2001: 27).
(6) a. Abby called Ben an idiot after Mary did 〈call Ben an idiot〉.
(∃x.x called Ben an idiot ↔ ∃x.x called Ben an idiot)
b. #Abby called Ben an idiot after Mary did 〈insult Ben〉.
(∃x.x called Ben an idiot ↮ ∃x.x insulted Ben)
In (6a), ellipsis is possible, whereas the near-synonymous ellipsis site in (6b) is not licensed. F-closure involves existential closure of the elided category and the antecedent. For example, the antecedent in each case called Ben an idiot contains lambda-abstraction over the subject position, which, under ∃-closure, yields ∃x.x called Ben an idiot 'there is an x, such that x called Ben an idiot'. Crucially, there is no entailment between the ellipsis site insult Ben (∃x.x insulted Ben) and the antecedent called Ben an idiot (∃x.x called Ben an idiot) since it is possible to insult somebody without necessarily calling them an idiot. The motivation for this semantic condition on licensing/identification comes from the observation that syntactic isomorphism is not a necessary requirement for ellipsis, i.e. there can be mismatches in form between antecedents and ellipsis sites:
(7) a. Mary took Intro to Syntax because Peter did 〈take Intro to Syntax〉.
b. Decorating for the holidays is easy, if you know how 〈to decorate for the holidays〉.
c. John finished his homework before Mary did 〈finish her homework〉.
However, the e-GIVENness requirement of the [E]-feature can be shown to be not quite restrictive enough (a fact acknowledged by Merchant 2008b: 134). One case of overgeneration discussed by Hartman (2009) involves so-called relational opposites such as win-lose:
(8) Relational opposites: John will win at chess and then Mary will 〈lose at chess〉.
(∃x. x wins at chess ↔ ∃x. x loses at chess)
Despite the mismatching ellipsis site in (8) being illicit, e-GIVENness predicts this to be a legitimate instance of ellipsis since relational opposites inherently stand in a mutual entailment relation; if somebody wins at chess, then this entails that somebody loses at chess. Furthermore, Merchant (2013b) discusses active/passive voice mismatches in sluicing such as (9). In this example, there is a passive antecedent John was murdered and an active ellipsis site x murdered John. As Merchant (2013b: 87) points out, the fact that this particular mismatch is illicit constitutes a problem for the semantic licensing/identification condition imposed by the [E]-feature as described in Merchant (2001). If we assume that the denotation of the passive antecedent involves an existential quantification over the agent (cf. Bruening 2013), then F-clo of the antecedent and the ellipsis site will be identical, and thus mutually entailing (9).
(9) *John was murdered but we don't know who 〈murdered John〉.
(∃x.x murdered John ↔ ∃x.x murdered John)
Saab (to appear) discusses a problematic case for the approach to identity in which there is a mismatch between tense forms with Spanish TP ellipsis. Furthermore, Saab (2015) discusses cases of what he calls Bias Vehicle Change in which a synonymous expression or nickname (e.g. rabbit vs. bunny) is not licensed in the ellipsis site, despite this not being ruled out by a theory of identity based on mutual entailment. These are then further instances of over-generation that indicate that the licensing conditions on ellipsis involve more than just e-GIVENness.

The phonology of [E]
Arguably, the primary function of the [E]-feature is to trigger non-pronunciation of its complement. This can be represented as follows:
(10) ϕ_TP → Ø / E ____ (Merchant 2004: 671)
This is simply a statement that PF carries out ellipsis by either non-parsing or somehow 'deleting' the complement of the licensing head. An alternative is to achieve the effect of non-pronunciation by assuming that ellipsis somehow blocks Vocabulary Insertion in a 'late insertion' approach to morphology such as Distributed Morphology (Halle & Marantz 1993 with the instruction 'no Vocabulary Insertion may apply to the terminals of my complement'. However, if Vocabulary Insertion proceeds blindly from the root (as suggested by contextual allomorphy), then by the time the C head is reached Vocabulary Insertion will have already taken place at the ellipsis site. er points out this approach could be made to work if we assume that lower can see the features of higher As such, the instruction not to insert should actually come too late, if strict locality is maintained. Nevertheless, it seems that trying to establish a link between ellipsis and Vocabulary Insertion in a DM approach is a promising avenue of inquiry that has not yet been fully explored. I will pursue this idea in the remainder of this paper.

The subset requirement on ellipsis licensing
It was shown in the previous discussion that syntactic identity, in terms of isomorphism, cannot be correct in the strictest sense. There are, however, a number of arguments that at least some syntactic identity is required (Chung 2006; Merchant 2008a; Tanaka 2011; Merchant 2013b; Chung 2013 In (11a), it is clear that English tolerates preposition stranding under sluicing. Furthermore, (11b) indicates that 'sprouting' is also possible, i.e. it is not necessary for the prepositional phrase of who(m) to have an overt antecedent such as of someone in (11a). However, preposition stranding with sprouting is ungrammatical (11c). Under a semantic approach to licensing, it is unclear why the presence of an extra of in the ellipsis site should affect mutual entailment. Chung (2013: 30) suggests that the DP 'must be Case-licensed by a head identical to the corresponding head in the antecedent clause'. Since there is no corresponding P head in the antecedent clause of (11c), the ellipsis is ungrammatical. This can be captured on an intuitive level by Merchant's (2013a) paraphrase of Chung's (2006) lexico-syntactic matching requirement:
No new words 'pedantic recoverability': Every lexical item in the numeration of the sluice that ends up (only) in the elided IP must be identical to an item in the numeration of the antecedent CP.
Merchant (2013a: 460) also offers a more precise definition that he dubs the No new lexeme requirement: where M_E is the set of lexemes in the elided phrase marker and M_A is the set of lexemes in the antecedent phrase marker. (M_E - t ⊆ M_A) (Any non-trace lexeme m that occurs in an elided phrase must have an equivalent overt correlate mʹ in the elided phrase's antecedent.)
It is clear from the statement in parentheses (M_E - t ⊆ M_A) that this equates to saying that the set of lexemes in the ellipsis site (minus any traces) must be a subset of those in the antecedent, that is, there cannot be additional material in the ellipsis site that is not present in the antecedent (a conclusion also arrived at by Rooryck & Schoorlemmer 2014; Saab 2015 and, to some extent, Oku 1998; 2001 3). It is possible to interpret these data in one of two ways: either the set of terms or lexemes in the ellipsis site must be a subset of the elements in the antecedent or, alternatively, the morphosyntactic features of the ellipsis site must be a subset of those in the antecedent. Looking at the previous example, it seems either view of the subset approach would work since the preposition in the ellipsis site destroys the subset relation between the ellipsis site and the antecedent:
(15) a. *The vase was stolen, but we don't know who 〈stole the vase〉.
b. *Someone murdered Joe, but we don't know who by 〈Joe was murdered〉.
Following Merchant (2013b), the reason why the examples in (15)  These examples show that the -ing form of the verb has more functional structure than the past participle; they claim that the progressive form has an additional underspecified mood feature. As such, both left and leaving have a tense feature specified for [-fin] and only leaving has this additional mood feature ensuring that the features of left are a subset of those of leaving, with the inverse not being true. Tanaka (2011) discusses categorial mismatches under ellipsis and shows that VPs cannot serve as antecedents for NP ellipsis (17a) and derived nominals cannot be antecedents for VP ellipsis (17b): (17) a. *Eddy has already reported on the accident, but we cannot find Tim's 〈report on the accident〉. b. *We read Tim's report on the accident, but Eddy hasn't 〈reported on the accident〉.
As a reviewer points out, there do seem to be some cases where it is possible for a deverbal noun to license non-pronunciation of the verb (see Fu et al. 2001; Johnson 2001). Consider the following minimal pair from Merchant (2013a: 446):
(18) a. ?That man is a robber, and when he does 〈rob〉, he tries not to make any noise.
b. *That man is a thief, and when he does 〈steal〉, he tries not to make any noise.
As discussed by Fu et al. (2001), Johnson (2001) and Merchant (2013a), in cases like (18) it seems that the verbal part of the derived nominal can still license ellipsis of the verb. Furthermore, the subset relation in (18a) appears to go in the same direction as observed so far. If the deverbal noun contains all the features of the verb plus nominal features, then the infinitive form of the verb should constitute a subset of it. Saab (2015) also advocates the subset approach to ellipsis licensing. He claims that the subset relation between the ellipsis site and antecedent could also explain possible Vehicle Change (Fiengo & May 1994) mismatches such as the following:
(19) A: Will you help John? B: I will 〈help him〉.
Saab argues, following Nunberg (1993) and Elbourne (2005), that we can view pronouns as D elements with an index (presumably with φ-features transferred in post-syntax, cf. Kratzer 2009). Thus, it is possible to conceive of a representation of (19), in which the pronoun in the ellipsis site constitutes a subset of its antecedent:
(20) A: Will you help John_7? B: I will 〈help D_7〉.
In the light of these observations, it seems that there is motivation for the following condition on ellipsis licensing similar to the one proposed by Rooryck & Schoorlemmer (2014) and Saab (2015): 4
(21) Subset Condition on Ellipsis: The morphosyntactic features in the ellipsis site must be a proper subset of those in the antecedent (F_E ⊂ F_A).
For the purposes of this article, we can adopt (21) as our working hypothesis, that is, in order for ellipsis to be licensed, a proper subset relation must hold between the morphosyntactic features of the elided material and those of some relevant antecedent (E⊂A). 5 In the following section, I will develop this idea further and show how we can implement the subset condition on ellipsis licensing to derive Ross' observations about the directionality of gapping.
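As a rough illustration of (21), the condition can be stated as a proper-subset check over sets of feature labels. The following Python sketch is only meant to make the set-theoretic content explicit: the function name `licenses_ellipsis` and the labels `LEAVE`, `T:-fin` and `Mood` are assumptions introduced here, loosely modelled on the left/leaving contrast discussed above, and are not part of the proposal itself.

```python
def licenses_ellipsis(ellipsis_features, antecedent_features):
    """Subset Condition (21): F_E must be a proper subset of F_A."""
    # For Python sets, '<' tests for a proper subset.
    return frozenset(ellipsis_features) < frozenset(antecedent_features)


# Hypothetical feature bundles (labels are illustrative assumptions).
LEFT = {"LEAVE", "T:-fin"}             # past participle
LEAVING = {"LEAVE", "T:-fin", "Mood"}  # -ing form: one additional mood feature

print(licenses_ellipsis(LEFT, LEAVING))  # True: left is a proper subset of leaving
print(licenses_ellipsis(LEAVING, LEFT))  # False: the inverse does not hold
print(licenses_ellipsis(LEFT, LEFT))     # False: a set is not a proper subset of itself
```

Note that, as (21) is stated, two identical feature sets do not satisfy the condition, since the relation is a proper subset relation.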

The Directionality of Gapping
In this section, I will address an old empirical puzzle regarding the restrictions on gapping and show how the subset view of ellipsis licensing can derive the relevant facts. Gapping is 'an ellipsis in which a verb is removed in one, or more, of a series of coordinations' (Johnson 2014). This is illustrated by the examples in (22) (Ross 1970: 251) This led Johannessen (1998: 55f.) to claim that there is a direct correlation between headedness (head-initial vs. head-final) and the directionality of gapping (forward vs. backward). However, a number of head-final languages are in fact not restricted to backward gapping, but allow gapping either in the first or second conjunct. Examples of such languages are Hindi (Sjoblom 1980; McShane 2005)  Thus, it seems that it is not the case that head-final languages always show backward gapping (we will return to the case of Japanese below), but instead actually allow for backward gapping in addition to forward gapping. In light of these facts, the following generalization emerges: Ideally, a theory of gapping should be able to capture this. Hernández (2007: 2139) suggests that gapping is the result of an unspecified 'syntactic dependency between a null […] head and its antecedent'. She suggests that the Directionality Generalization in (27) can be captured by the type of coordinator used in gapping. Her claim is that if a language uses a different coordinator for clauses and NPs, then it allows forward gapping. This is claimed to follow from a kind of complex intervention effect, the details of which I will not recount here. It will suffice to point out that this generalization is questionable for Turkish since the coordinators ve and da exhibit different properties (Kornfilt 2010: 113), with the latter seemingly preferred in gapping structures (Kornfilt 2000; İnce 2009; also, see Sato 2010: 48 for problems with Hernández' approach).
In the following section, I will discuss what are arguably the two main approaches to gapping in the literature and show how they struggle to capture the aforementioned generalization. Subsequently, I will present an alternative approach to gapping that can adequately derive the directionality facts.

Move-and-delete approaches
An [E]-feature approach to ellipsis entails that entire phrases (i.e. complements of heads) are deleted. Consequently, in order to achieve the effect of 'intermediate deletion' as in gapping, the remnant (normally the direct object) must evacuate the VP prior to ellipsis. I dub this the 'move-and-delete approach' familiar from the standard analysis of sluicing (Ross 1969;Merchant 2001). This view treats gapping as essentially another instance of VP ellipsis preceded by movement of some kind (cf. Sag 1976;Coppock 2001). The general approach is shown in (28): (28) John ate a burger and Mary ate a pizza.
The question that arises is what kind of movement operation exists in English to derive this. There have been a number of proposals for pseudogapping in the literature that could equally be applied to cases of gapping without do or an auxiliary/modal: Heavy-NP Shift (Jayaseelan 1990), Object Shift (Lasnik 1995; and Movement to a low FocP (Gengel 2013). However, it is far from straightforward to adopt these for the analysis of gapping. For example, while Heavy-NP Shift does exist in English, non-heavy NPs such as a pizza in (28) are clearly acceptable as gapping remnants. The Object Shift account is ad hoc for languages such as English, which lack this phenomenon independently, and the presence of a low focus phrase in languages such as English has, to my knowledge, not yet been clearly demonstrated. Putting this issue to one side, however, it still seems to be the case that even if we can find a suitable candidate for this evacuation movement, this general approach brings us no closer to deriving the generalization in (27). We would require the pre-ellipsis movement step in VO languages to be restricted to the final conjunct only, whereas OV languages would have the option of having this operation apply in either conjunct. It is difficult to find a non-stipulative link between headedness and the directionality of gapping in this theory.

ATB-movement (Johnson 2009)
In an alternative approach, Johnson (2009) eschews the conflation of VP ellipsis and gapping, since they do not share the same distribution. Instead, he argues that gapping should be analyzed as so-called Across-The-Board (ATB) movement, i.e. movement from two parallel positions in a coordination, of remnant VPs: (29) Some will eat beans and others eat rice.
Under this view of gapping, the remnant XPs are moved out of the VPs in each clause and then, rather than VP ellipsis, there is ATB-movement of the VP in each conjunct to the specifier of a higher head (PredP). Since ATB-movement necessarily involves one filler and two gaps (Williams 1978), this creates the effect of ellipsis of one of the VPs. With leftward movement, it will be the rightmost verb that then appears to be elided, i.e. forward gapping. The question is whether it is even possible to derive backward gapping under this approach.
To do so, we would require that the landing site of movement be to the right, either a rightward specifier or right-adjoined. Interestingly, if this were ATB-head movement rather than phrasal movement, we would predict that OV languages only have backward gapping (in line with Ross' original claim) since the Pred⁰ landing site would be to the right: However, this is not what Johnson has in mind, and it seems to run afoul of the Head Movement Constraint (Travis 1984). Furthermore, in order to correctly capture the generalization in (27), we would need there to be a head position also to the left in order to derive the additional option of forward gapping in Turkish and Hindi. The same problem holds for Johnson's original proposal in that there is no clear way one can unify the properties of word order and gapping. 6

Total Impoverishment approach to gapping
This section proposes an alternative analysis of gapping that relies on the subset requirement on ellipsis argued for in Section 3. This approach to gapping will allow us to naturally derive the generalization that OV languages are more flexible with regard to the directionality of gapping. I adopt the view that ellipsis does not involve null pronunciation, but rather insertion of null exponents at PF. This approach is thereby contingent on a 'late insertion' approach to morphology as in Distributed Morphology (DM) (e.g. Halle & Marantz 1993;Harley & Noyer 2003;Embick & Noyer 2007;Nevins 2015). The following section will recapitulate some of the core assumptions of DM necessary for the analysis to follow.

Core assumptions of Distributed Morphology
In DM, lexical material is inserted into terminals created by syntax. The Vocabulary Item (VI) that can be inserted must be a subset of the terminal into which it is being inserted. This is the so-called Subset Principle (Embick & Noyer 2007: 298). If we have the tree in (31) and the VIs in (32), then it is only possible to insert the VI in (32a) speak into the V terminal, and not spoke (32b) since the latter has an additional feature (past), which is not present on the V terminal.
(32) Vocabulary Items:
Competition between exponents is resolved by a further principle (Specificity Principle) stating that the VI realizing the most features should be preferred, but this will not play any major role in the analysis to come. In addition, there are a number of syntax-like operations that DM assumes the PF component can carry out. There are two post-syntactic operations that derive similar effects to Head Movement: Lowering and Local Dislocation (Embick & Noyer 2001; Embick 2007 Both of these operations will play an important role in the analysis to come.
6 At this juncture, one may wonder why these are assumed to be cases of gapping and not Right-Node Raising, as has been sometimes claimed in the literature (e.g. Maling 1972; Hankamer 1979 (Postal 1974; Sabbagh 2007), ellipsis (Wilder 1997; Hartmann 2000) and multidominance (McCawley 1982; Wilder 1999). İnce (2009) argues in detail that, of the three, only the ellipsis approach is tenable for Turkish. The ellipsis account involves backwards deletion of the 'right-node raised' material and is indistinguishable from the view of backward gapping advocated here. Thus, whether one dubs the phenomenon at hand 'backward gapping' or 'Right-Node Raising' will ultimately depend on the kind of analysis one adopts.
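The interaction of the Subset Principle and the Specificity Principle described above can be sketched as a small competition procedure. The Python code below is a minimal sketch under stated assumptions: the feature labels `SPEAK` and `past`, the class `VocabularyItem` and the function `insert_exponent` are hypothetical names chosen for illustration, and the toy vocabulary merely mimics the speak/spoke contrast together with a null 'Elsewhere' item.

```python
from typing import FrozenSet, List, NamedTuple


class VocabularyItem(NamedTuple):
    exponent: str
    features: FrozenSet[str]


# Hypothetical Vocabulary Items; the feature labels are assumptions.
VOCABULARY: List[VocabularyItem] = [
    VocabularyItem("speak", frozenset({"SPEAK"})),
    VocabularyItem("spoke", frozenset({"SPEAK", "past"})),
    VocabularyItem("Ø", frozenset()),  # null 'Elsewhere' item
]


def insert_exponent(terminal: FrozenSet[str]) -> str:
    # Subset Principle: only items whose features are a subset
    # of the terminal's features may compete for insertion.
    candidates = [vi for vi in VOCABULARY if vi.features <= terminal]
    # Specificity Principle: the candidate realizing the most features wins.
    return max(candidates, key=lambda vi: len(vi.features)).exponent


print(insert_exponent(frozenset({"SPEAK"})))          # speak
print(insert_exponent(frozenset({"SPEAK", "past"})))  # spoke
print(insert_exponent(frozenset()))                   # Ø
```

In this toy model, spoke cannot be inserted into a terminal bearing only `SPEAK`, because its additional `past` feature is not present on that terminal, and a terminal with no features can only be realized by the Elsewhere item.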

Total Impoverishment
For the purpose of treating ellipsis as insertion of null exponents in the morphological component, I will introduce one new tool into the DM arsenal. In DM, the feature specifications of nodes generated in syntax can be manipulated by postsyntactic operations, such as impoverishment rules (Bonet 1991) that delete features on terminals prior to insertion (to derive syncretism, for example). I propose a particular variant of an impoverishment rule, namely one that deletes all the features on a given terminal. 8 I call this operation Total Impoverishment (TI) (36): 9
(36) Total Impoverishment: For any F, F a feature on L, F → Ø iff there is an Lʹ such that F ∈ Lʹ and L_F ⊂ Lʹ_F.
7 However, note that this does not preclude having Head Movement in the syntax (or indeed PF) in addition to the aforementioned postsyntactic operations.
8 Note that it is unclear whether the following approach makes any different predictions to one based on a post-syntactic operation such as Obliteration (Arregi & Nevins 2007) that deletes entire terminals rather than features.
9 I am grateful to an anonymous reviewer for their advice on the best formulation of this rule.
The above condition ensures that the feature set of a lexical item L is reduced to the empty set in the context in which there is another (distinct) instance of that lexical item Lʹ and the feature set of L is a proper subset of the feature set of Lʹ. In the Copy Theory of Movement (Chomsky 1995), marking lexical items as distinct in the numeration is required to avoid problems with Chain Reduction (see Chomsky 1995: 227, Nunes 2001: 306 for discussion). 10 Thus, in order for ellipsis to be carried out via (36), there must be an asymmetry between two parallel elements. This can be captured in gapping in the following way. Let us assume, following Coppock (2001), Johnson (2009) and Toosarvandani (2016), that gapping involves 'low' coordination of νPs. Thus, we will always have an asymmetry between two verbs and a single T head in gapping structures. As we have seen above, T is assumed to be fused with the verb either via Lowering or Local Dislocation. For now, let us assume that this process is Lowering operating on hierarchical structure. Since the verb in the first conjunct is the hierarchically closest (due to the asymmetric structure of the conjunct phrase; Munn 1993; Johannessen 1998), T will fuse with the verb in the first conjunct via Lowering, thereby creating a complex V+T. 11
(37)
Fusion of T+V now creates a terminal with the additional features contained on T. The result of this is that V_2 is now a proper subset of V_1 and the conditions for Total Impoverishment in (36) are met. TI will then apply, reducing the feature set of the verb in the second conjunct to the empty set (Ø). If we consider the relevant Vocabulary Items from (32) that can be inserted, the impoverished V terminal containing no features can only be realized by the null 'Elsewhere' marker (Ø) since this is the only VI that constitutes a subset of the empty set.
As a result, upon Vocabulary Insertion into the tree in (36), the null exponent is inserted into the verb in the second conjunct, thereby deriving the effect of gapping: If we consider this approach from the point of view of Lobeck's (1995) identification and licensing requirements on ellipsis, the identification requirement is lexical. In order for an element to be elided, there must be another distinct (i.e. non-copy 12 ) instance of this lexeme accessible as an antecedent for the postsyntactic component. 13 The licensing requirement is the Subset Condition (21); it is only possible to elide an element if its features constitute a proper subset of the relevant antecedent.
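The derivation just described, in which T fuses with one verb, Total Impoverishment empties the other, and the null exponent is inserted, can be sketched procedurally. The following Python code is a simplified sketch, not a definitive implementation of (36): terminals are modelled as plain sets of feature labels, the requirement that L and Lʹ be distinct instances of the same lexical item is approximated by the subset relation over a shared root label, and all names (`total_impoverishment`, `realize`, `SPEAK`, `past`) are assumptions made for illustration.

```python
from typing import FrozenSet, List

Features = FrozenSet[str]


def total_impoverishment(terminals: List[Features]) -> List[Features]:
    """Empty every terminal whose features are a proper subset of
    the features of some other terminal in the same domain, cf. (36)."""
    result = []
    for i, feats in enumerate(terminals):
        has_superset = any(
            i != j and feats < other for j, other in enumerate(terminals)
        )
        result.append(frozenset() if has_superset else feats)
    return result


def realize(terminal: Features) -> str:
    # Toy Vocabulary Insertion: Subset Principle plus Specificity.
    vocabulary = [
        ("spoke", frozenset({"SPEAK", "past"})),
        ("speak", frozenset({"SPEAK"})),
        ("Ø", frozenset()),  # null 'Elsewhere' item
    ]
    candidates = [(exp, f) for exp, f in vocabulary if f <= terminal]
    return max(candidates, key=lambda c: len(c[1]))[0]


v1 = frozenset({"SPEAK", "past"})  # V_1 after fusion with T
v2 = frozenset({"SPEAK"})          # V_2, which has not fused with T

after_ti = total_impoverishment([v1, v2])
print([realize(t) for t in after_ti])  # ['spoke', 'Ø']
```

Because the check is made against the feature sets as they stand before any deletion, the order in which the terminals are inspected does not affect the outcome in this sketch.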

Deriving the directionality of gapping
Now, we will see how this approach to gapping derives the connection between head-directionality and gapping. Recall that we are trying to derive the fact that OV languages such as Hindi and Turkish allow for both forward and backward gapping, whereas VO languages only permit forward gapping. Furthermore, we saw that DM has two possible ways of fusing T with the verb, Lowering and Local Dislocation, the difference lying in whether fusion takes place pre- or post-linearization. If we consider the relevant structure for an OV sentence with gapping, we see that for Lowering, the hierarchically closest c-commanded verb is in the first conjunct, despite T now being to the right in this head-final language. Thus, if Lowering applies, T will fuse with V in the first conjunct.
(39)

12 However, note that this approach could very well be extended to Chain Reduction, i.e. all lower copies of a movement chain would have to stand in a subset relation to the higher copy (see Muñoz-Pérez 2016 for an approach very much along these lines).

13 I assume that 'accessible' here entails being contained in the same phase that is sent to PF, see Section 4.4 for discussion of some implications of this.
The result of this will be the same as with English above: V2 now counts as a subset of V1 and TI will apply, leading to forward gapping. However, consider what happens if Local Dislocation applies instead of Lowering. Once the structure has been linearized, we have a flat structure as in (40) and the closest V head to T is actually V2, the verb in the second conjunct. Since Local Dislocation only cares about linear proximity, fusion will take place between T and V2:

Now, it is V1, the verb in the first conjunct, that constitutes a subset of the verb in the second conjunct (V2). As a result, TI will apply to V1, resulting in backward gapping. Interestingly, it makes no difference for VO languages whether Lowering or Local Dislocation applies, since the verb in the first conjunct is always closer:

(41) Lowering:
(42) Local Dislocation:

Thus, we derive the optional direction of gapping from the fact that a language can freely choose whether to apply Lowering or Local Dislocation in order to combine T and V in the postsyntax. For VO languages, whether Lowering or Local Dislocation applies makes no difference to the outcome, only ever deriving forward gapping. Since the choice between these two operations only makes a difference for OV languages, we capture Ross' Generalization in a simple, direct way. Furthermore, this underlines the importance of whether a particular operation makes reference to hierarchical or linear closeness and how this can derive potentially different outcomes (see Marušič et al. 2015 for a similar conclusion about conjunct agreement in Slavic). At this point, we can establish the following order of postsyntactic operations:

(43) Order of PF operations (Preliminary version): {Lowering, Local Dislocation} >> Total Impoverishment >> Vocabulary Insertion

One final point pertains to the question of how we can derive cases in which both verbs are pronounced.
Since there will necessarily be an asymmetry in gapping structures, we would always expect Total Impoverishment to apply. Thus, the fact that ellipsis appears to be 'optional' is not reduced to the presence vs. absence of a licensing [E]-feature, but rather particular structures, e.g. low coordination, will always lead to an elliptical structure following the assumptions here. In order to have both verbs pronounced, one would require TP conjunction and a particular order of operations, see Section 4.5. 14
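The reasoning in this section can be made concrete with a small Python sketch. It rests on assumptions that are not part of the analysis itself: the linearized strings are schematic, hierarchical closeness is hard-coded as "the verb of the first conjunct", and the function names are hypothetical. The point is only to show that the two fusion operations diverge when T is linearized to the right of both verbs.

```python
from typing import Iterable, List, Set

# Schematic linearizations of low (vP) coordination under a single T head.
VO = ["S1", "T", "V1", "O1", "and", "S2", "V2", "O2"]
OV = ["S1", "O1", "V1", "and", "S2", "O2", "V2", "T"]
VERBS = ["V1", "V2"]  # ordered by conjunct


def lowering_host(verbs: List[str]) -> str:
    """Assumption: the verb of the first conjunct is hierarchically closest to T."""
    return verbs[0]


def local_dislocation_host(linear: List[str], verbs: List[str]) -> str:
    """Pick the verb that is linearly closest to T."""
    t_pos = linear.index("T")
    return min(verbs, key=lambda v: abs(linear.index(v) - t_pos))


def gapping_direction(linear: List[str], verbs: List[str], operation: str) -> str:
    """The verb that does not host T is a proper subset of the one that does, so it is elided."""
    if operation == "Lowering":
        host = lowering_host(verbs)
    else:
        host = local_dislocation_host(linear, verbs)
    elided = next(v for v in verbs if v != host)
    return "forward" if elided == verbs[1] else "backward"


def available_directions(
    linear: List[str],
    verbs: List[str],
    operations: Iterable[str] = ("Lowering", "Local Dislocation"),
) -> Set[str]:
    """Collect the gapping directions derivable with the given fusion operations."""
    return {gapping_direction(linear, verbs, op) for op in operations}


print(available_directions(VO, VERBS))
print(available_directions(OV, VERBS))
```

Restricting the `operations` argument to a single operation models a language in which only one way of fusing T and V is available; with only Local Dislocation, the OV order yields backward gapping alone.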

Obligatory backward gapping in Japanese revisited
Now, let us return for a moment to gapping in Japanese. As discussed by Ross (1970) and Hernández (2007), Japanese appears to constitute a counterexample to the account presented in the previous section, since it is an OV language where only backward gapping seems to be possible:

However, it is perhaps possible to reconcile these data with the general picture here. One striking fact is that, in coordinate structures without ellipsis, the tense suffix can only appear on the verb in the final conjunct:

(45) Watakusi wa sakana o tabe(*-ta), Biru wa gohan o tabe-ta.
I prt fish prt eat(-past) Bill prt rice prt eat-past
'I ate fish and Bill ate rice.'

The single occurrence of tense would seem to be indicative of a low coordination structure with a single T head. In terms of the present system, (45) seems to indicate that Japanese either lacks, or does not make use of, the hierarchically oriented Lowering operation, and that fusion of T to V is always conditioned by linear order, i.e. fusion of T and V always follows Linearization.

(46) Local Dislocation:

If this is the case, then the predictions of the current system are clear: since Lowering is a prerequisite for forward gapping, a language that independently lacks it (as Japanese appears to) is correctly predicted to lack forward gapping.

Larger gapping sites
Up to now, we have only considered ellipsis of a single verb. However, it is well-known that gapping can remove considerably more material: (47) a. I want to try to begin to write a novel and Mary 〈wants to try to begin to write〉 a play. (Ross 1970) b. John has been reading War & Peace and Mary 〈has been reading〉 1984.
One of the main questions at this point is how we can avoid distributed deletion in the above examples, which would derive something like (48). This has typically been a problem for approaches such as the present one employing non-constituent deletion (Wasow 1972; Williams 1997).
(48) a. *I want to try to begin to write a novel and Mary 〈wants〉 to try 〈to〉 begin 〈to write〉 a play. b. *John has been reading War & Peace and Mary 〈has〉 been 〈reading〉 1984.
Let us assume that elements in an extended verbal projection (Grimshaw 2005) are connected via a series of syntactic Agree dependencies (also see Merchant 2015b: 211). This is motivated by the fact that the inflectional form of each element of the extended verbal projection seems to be determined by the next higher element. Adger (2003) implements this idea as a series of Agree dependencies where each head in the verbal projection inherits some 'inflectional' feature determining its morphological shape from the closest c-commanding head (49).
Following much recent work, I assume that Agree is split into two steps: Agree-Link and Agree-Copy (e.g. Arregi & Nevins 2012; Bhatt & Walkow 2013; Nevins 2014; Marušič et al. 2015; Smith 2015). The Agree-Link operation takes place in Narrow Syntax as with the standard approach to Agree (Chomsky 2000). However, valuation or transfer of features is delayed to PF (Agree-Copy). This is illustrated for case assignment in (50) and (51).

(50) Agree-Link (Narrow Syntax):
(51) Agree-Copy (Post-Syntax):

This means that the successful transfer of features via Agree can be bled by postsyntactic operations such as Total Impoverishment. For example, if the probe undergoes Total Impoverishment before Agree-Copy applies, then the Agree-Link is destroyed and the deleted features can no longer be transferred to the goal: 15

(52) Total Impoverishment:
(53) Bled Agree-Copy:

If we consider the structure of a gapping example such as (47b), we have the structure in (54) with low coordination of PerfPs, where each conjunct contains the Agree-Link dependencies we saw in (49) (indicated by dashed lines); however, valuation has not yet taken place.
As in the previous section, T is fused with the closest c-commanded head (Perf) in the first conjunct via the Lowering operation. After Lowering, the Perf2 head in the second conjunct constitutes a subset of Perf1+T in the first conjunct and undergoes Total Impoverishment. Now that the features on Perf2 in the second conjunct have been deleted, the Agree-Link to Prog2 is broken.
What follows is a 'cascade effect' similar to the analysis in Williams (1997). Agree-Copy between Perf and Prog now applies, but crucially it can only apply in the first conjunct. Now, the Prog head in the second conjunct is a featural subset of Prog1 and can undergo Total Impoverishment. As a result of deletion of the features on Prog2, the Agree-Link to v+V2 is broken. Agree-Copy applies in the first conjunct, but again cannot apply in the second conjunct. As before, the v+V complex in the second conjunct is a subset of the one in the first conjunct and now Total Impoverishment will delete the features on the verb. Thus, we see that Total Impoverishment on the verb can bleed Agree-Copy to other elements it is linked to.
Since this effect percolates down the verbal spine, it is possible to derive gapping of larger spans of material as well as avoid unwanted cases of distributed deletion as in (48). In order to achieve this flexibility, we can assume that Agree-Copy has the option of applying before or after Total Impoverishment: (57) Order of PF operations (Final version): {Lowering, Local Dislocation} >> (Agree-Copy) >> Total Impoverishment >> (Agree-Copy) >> Vocabulary Insertion
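The cascade described in this section can be sketched as a simple loop over the heads of the extended verbal projection. The sketch below is a toy model under stated assumptions: each head starts out with only its category label as a feature, Agree-Copy is modelled as adding a hypothetical `infl:` feature to the next lower head, and Total Impoverishment is interleaved with Agree-Copy head by head. The function name `cascade` and the feature labels are illustrative only.

```python
from typing import List, Set


def cascade(spine: List[str], t_fused: bool = True) -> List[bool]:
    """For each head of the second conjunct, report whether it ends up elided."""
    conj1: List[Set[str]] = [{head} for head in spine]
    conj2: List[Set[str]] = [{head} for head in spine]
    if t_fused:
        # Lowering fuses T with the highest head of the first conjunct.
        conj1[0].add("T")

    elided: List[bool] = []
    for i, head in enumerate(spine):
        # Total Impoverishment: applies only under a proper subset relation.
        if conj2[i] < conj1[i]:
            conj2[i] = set()
            elided.append(True)
        else:
            elided.append(False)

        # Agree-Copy to the next lower head; bled if the probe was impoverished.
        if i + 1 < len(spine):
            inflection = f"infl:{head}"
            conj1[i + 1].add(inflection)
            if conj2[i]:
                conj2[i + 1].add(inflection)
    return elided


print(cascade(["Perf", "Prog", "v+V"]))
print(cascade(["Perf", "Prog", "v+V"], t_fused=False))
```

In this model the elided heads always form a contiguous stretch starting at the top of the spine, since a lower head can only become a proper subset of its counterpart if Agree-Copy has been bled by impoverishment of the head above it. Without the initial asymmetry created by T, no head is elided.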

Restrictions on gapping
Gapping is known to obey a number of restrictions that do not hold for VP ellipsis, for example. One hallmark of gapping is that it cannot occur in embedded clauses as noted by Hankamer (1979): (58) *Alfonso stole the emeralds, and I think that Mugsy 〈stole〉 the pearls.
This suggests that gapping is phase-bound (Grano & Lasnik 2015). If postsyntactic computation proceeds in phases (e.g. at least CPs), then the verb in the embedded clause in (58) would belong to a different phase than its antecedent and would therefore not be able to undergo Total Impoverishment. The fact that non-finite boundaries do not trigger this effect (47a) suggests that locality conditions play a role here. Furthermore, consider the following contrast from Grano & Lasnik (2015):

(59) a. *Joe 1 claims that Bill likes apples and Tim 2 〈claims that Bill likes〉 oranges.
b. ?Joe 1 claims that he 1 likes apples and Tim 2 〈claims that he 2 likes〉 oranges.
The observation Grano & Lasnik (2015) make is that, while gapped material cannot ordinarily span a finite clause boundary, it can do so if there is a bound pronoun in the ellipsis site. As discussed in the previous section, gapping of larger chunks of material involves a syntactic dependency traced back to the main verb. In order to elide the necessary material in (59), there needs to be a syntactic link between the verb in the embedded clause that undergoes TI and, for example, the matrix verb. However, this dependency cannot cross a phase boundary: (60) *Joe 1 claims that Bill likes apples and Tim 2 oranges].
If we follow the analysis of Grano & Lasnik (2015) in which binding is via Agree and unvalued features 'void phasehood', then the (un)bound pronoun will keep the CP phase open until its antecedent (Joe or Tim in (59b)) is merged, i.e. long enough for the necessary TI relation with the matrix verb to be established. This link can then be used to derive gapping as described in the previous section.

The size of coordinated constituents
The present system derives gapping as the automatic result of low (νP) coordination (e.g. Coppock 2001;Johnson 2009;Toosarvandani 2016). If TPs are conjoined and Total Impoverishment applies before Vocabulary Insertion, then we can derive pseudogapping since the 'stray' features on the T in the second conjunct will be spelled out either as an auxiliary, modal or as do. Consider the following example of pseudogapping from Hoeksema (2006): (61) This may not bother you, but it does 〈bother〉 me.
In the first conjunct, T cannot combine with the verb since it is already occupied by a modal. Nevertheless, the modal agrees with the verb to determine its inflection, as discussed in Section 4.3. 16 In the derivation of pseudogapping, Agree-Copy applies between T and the verb in the first conjunct. Consequently, the verb in the second conjunct constitutes a proper subset of the verb in the first conjunct and undergoes Total Impoverishment. Since we have coordination of TPs rather than low coordination of νPs, the T head in the second conjunct has 'stray' features that are spelled out as do. 17

(62)

Crucially, in the present approach cases of ordinary gapping cannot involve coordination of TPs (or CPs) since we always expect pseudogapping to emerge as above. There are, however, some cases where on the surface it looks like gapping actually involves the coordination of CPs, as the conjunct with gapping contains a left-peripheral wh-phrase (Sag 1976; Pesetsky 1982; Kubota & Levine 2016):

(63) Bill asked [which books I gave to Mary] and [which records to John] (Pesetsky 1982: 646)

It is tempting to conclude from this that (63) must then have the structure in (64):

(64) Bill asked [ CP which books 1 I gave t 1 to Mary] and [ CP which records 2 〈I gave t 2 〉 to John].

16 Note that negation also has a blocking effect for Lowering and Local Dislocation. An anonymous reviewer wonders how the present system would derive the following:
(i) Alex did not eat caviar or Brian 〈eat〉 beans.
Since T cannot cross negation (for reasons that are unclear; perhaps English has only Local Dislocation), T does not combine with the verb. However, an Agree-Link still exists between elements of the extended verbal projection. Crucially, (i) involves low coordination, T agrees with the closer verb and transfers its 'inflection' feature to this verb. Now the necessary subset relation is given, the verb in the second conjunct is elided. The stray features on T are then realized as do.

17 TP coordination will not always necessarily result in pseudogapping given the flexible ordering of Lowering and Total Impoverishment. If the former applies first in a TP coordinate structure, then we will derive (i) rather than pseudogapping:
(i) It may not bother you but it bothers me.
However, as discussed by Johnson (2014), there are a number of reasons to be suspicious about an analysis with conjoined CPs. For example, one striking fact about English gapping is that complementizers cannot occur in the second conjunct, suggesting that the conjuncts are smaller than CPs:

It therefore seems implausible to assume that coordinated CPs are involved in (64). Instead, we can assume, following Johnson (2014), that the wh-phrase remains in situ in the second conjunct:

(67) Bill asked which books 1 I [ νP gave t 1 to Mary] and [ νP 〈gave〉 which records to John].
This low coordination analysis can also make sense of the following ungrammatical example from Pesetsky (1982: 646): (68) *Bill asked [which books 1 Mary likes t 1 ] and [which records 2 John 〈likes t 2 〉] If we were to assume that CP coordination is involved, the contrast between (63) and (68) would be puzzling. However, under a low coordination approach in which the wh-phrase in the second conjunct is in situ, the ungrammaticality of (68) follows from the fact that an in situ object cannot precede the subject in Spec-νP (again see Johnson 2014). 18

Open questions and directions for future research
In this paper, I have pursued the strongest version of the subset approach to ellipsis licensing, namely that the features of elided material must constitute a proper subset of some relevant antecedent. This is an approach based purely on syntactic identity and it is well-known that such approaches fail to capture the entire body of heterogeneous ellipsis constructions. In this section, I discuss some potentially problematic data for the Subset Condition proposed here, as well as possible directions for extending this approach to phrasal ellipsis constructions such as sluicing.

Challenges for the Subset Condition
One case in which a subset-based approach would seem to make incorrect predictions is with personal pronouns. If, following Harley & Ritter (2002) for example, one assumes that third person pronouns are structurally less complex than first or second person ones, then we may expect, contrary to fact, that a third person antecedent cannot license deletion of a first person pronoun since the necessary subset relation would not hold:

18 One remaining puzzle involves the interpretation of negation in gapping. As also pointed out by a reviewer, there is the question of examples such as (i) (Siegel 1984):
(i) We can't eat caviar and Alex 〈eat〉 beans.
Such examples are reported to allow both a reading where the modal/negation takes wide scope (can't > and) and a reading where the modal/negation scopes below the coordination (and > can't) (see Repp 2009; Centeno 2012). One possible analysis of this latter reading is to assume that the second conjunct does indeed include negation and is therefore (at least) a TP. This would mean that gapping can involve conjunction of TPs in certain circumstances. The existence of gapping with TP coordination would pose a serious problem for the theory developed here, since TP coordination should result in pseudogapping (it would seem equally problematic for other low coordination analyses such as Johnson's (2009), which involves ATB-movement to a Pred projection directly above νP). It may be possible to maintain a low coordination analysis by assuming that negation starts off in each conjunct and somehow ATB-raises out to a higher position (cf. neg-raising), reconstructing at LF; however, I will not be able to pursue this alternative analysis here.
(69) John loves his mother and I do 〈love my mother〉 too.
One potential solution to this problem would be to assume that pronouns acquire their φ-features at PF in the spirit of Kratzer (2009) (via Agree-Copy in the present approach), and that such a process can be bled by ellipsis. 19 Another potentially problematic case comes from finiteness mismatches such as the following from Merchant (2001: 22):

(70) I'll fix the car if you tell me how 〈to fix the car〉.
Intuitively, it seems that the terms in the ellipsis site form a superset of the relevant material in the antecedent. This is of course the opposite of the relation we would expect. However, if we focus on the features involved, rather than the terms, the antecedent TP plausibly contains a tensed T head passing on some inflectional features to the verb (in line with the assumptions in the preceding sections). In the ellipsis site, however, we have a tenseless T and an uninflected verb (presumably both featureless). Assuming that case assignment to the DP can also be bled, it would be possible to view the relevant features in the ellipsis site as a subset of the antecedent.
A final set of problems comes from mismatches with fragment answers to imperatives. Examples such as (71) suggest Isac 2015). If this can be shown, then a subset relation could potentially be argued for. A more problematic instance of this sort involves cases such as (72) discussed by Thoms (2013: 562), where it seems that there must be a modal present in the ellipsis site that is not in the antecedent. Similar cases in German are reported in Merchant (2001: 23). It is worth noting that these are cases of 'sprouting' (Chung et al. 1995), which serves to complicate matters further as exophoric ellipsis is arguably a challenge for all syntactic approaches to identity (cf. Miller & Pullum 2013). At present, if we wish to posit the ellipsis site in (72), then it is unclear how to reconcile this with the subset approach pursued here. 20

Despite the evidence in support of subset relations discussed in Section 3, this section has shown that there are still a number of challenges. This is of course true for any account relying solely on syntactic or semantic identity as we saw in Section 2.2 (also see Merchant to appear).

19 A reviewer points out that it is not possible to adopt Kratzer's proposal entirely faithfully, since a crucial aspect of her analysis is that third person fake indexicals are 'born fully-specified' (2009: 187). Since fake indexicals seem to be able to license ellipsis of a first or second-person pronoun as in (i) (suggested by a reviewer), one would have to assume that φ-features are transferred at PF, and try to develop a different account of the fake indexical facts.
(i) Yesterday, only Jack loved his mother. Today, only I do.
(fake indexical reading ok)

20 However, there is an observation attributed to Howard Lasnik that there is a potential underlying source without a modal, namely an echo question: 〈Amuse you〉 with what? Under an in situ approach to sluicing (Abe 2015), this analysis may become tenable. Nevertheless, it is worth noting that the in situ approach to sluicing will inevitably struggle to explain sprouting cases, which are not possible in echo questions (〈*Amuse you〉 what with?).
While the directionality facts for gapping seem to fall out using syntactic (subset-based) identity, a more comprehensive theory of ellipsis capturing the full range of ellipsis facts will almost certainly require a hybrid approach combining notions of syntactic and semantic identity. 21

Sluicing and VPE
Cases in which larger constituents such as TPs or VPs are elided pose a challenge for the Total Impoverishment approach to ellipsis developed in this paper. However, all hope is not lost if one is willing to entertain some more radical ideas about the nature of movement and Spell-Out in the Minimalist Program. The tentative approach to sluicing I will sketch here depends on two central, albeit non-standard assumptions. The first idea is to import the notion of phrasal Spell-Out from Nanosyntax (e.g. Neeleman & Szendrői 2007;Caha 2009;Starke 2009). Without going into detail, phrasal Spell-Out entails that the portions of structure which are spelled out are not terminal nodes, but rather 'spans' of tree structure (cf. Svenonius 2012; Merchant 2015a) (73). Since Total Impoverishment equates Spell-Out and ellipsis, entertaining phrasal Spell-Out now opens the door to phrasal ellipsis in a Spell-Out approach.
(73) Phrasal spellout:
(74) Ellipsis:

21 One final challenging set of data pointed out by a reviewer involves 'polarity reversals' under sluicing (Kroll 2016). Here, it is claimed that some ellipsis sites in sluicing can be interpreted with a negation that is absent in the antecedent.
(i) I don't think that [ TP Trump will comply with the debate requirements], but I don't know why 〈 [ TP he won't comply with them] 〉.
Putting the actual availability of these interpretations aside, many of the cited examples involve NEG-raising predicates such as think. Crucially, if NEG-raising involves genuine syntactic movement from the embedded clause, as recently argued for by Collins & Postal (2014), then these data may not turn out to be quite as problematic after all (however, cf. Gajewski 2007).

The question that remains is how the relevant subset conditions for TI can be met. As noted by Thoms (2010), particular types of Ā-movement can license ellipsis. Since the abandonment of traces in the Minimalist Program, most researchers adopt the 'Copy Theory of Movement' (Chomsky 1995). However, this will not help us in a subset-based account, since movement will not create any difference between the antecedent and the ellipsis site if movement leaves a copy. There is, however, an alternative, possibly more minimal, approach in which movement does not leave any copies (Epstein et al. 1998; Epstein & Seely 2002; Fox & Pesetsky 2005; Müller 2014). Adopting this view derives the subset condition in a very straightforward way: movement will create an asymmetry between the elements (and their associated functional structure) in the elided TP and the antecedent TP and thereby license Total Impoverishment, and subsequent null insertion, for the entire TP under a phrasal Spell-Out approach. 22

(75) [  (13) (M E -t ⊆ M A ). This, of course, follows naturally from the assumption that movement does not leave anything behind.
Furthermore, the otherwise puzzling P-Stranding restrictions discussed by Chung (2013) repeated in (76) can also be neatly captured by this approach:

(Chomsky 1995; Merchant 2001; Platzack 2013), then there is potentially another interesting interaction between head movement and Total Impoverishment, with head movement applying in the second conjunct first to create the conditions for ellipsis to apply. While this approach needs to be fully worked out in order to capture the many properties of sluicing and VP ellipsis, it constitutes a promising point of departure for an alternative view of the nature of Spell-Out, movement and ellipsis licensing.

Conclusion
One of the main goals of this paper was to argue for the role of subset relations in the licensing of ellipsis. From the point of view of formal features, we saw that the [E]-feature shares a number of the conceptual problems associated with the EPP. Furthermore, it was noted that there are also empirical problems with the [E]-feature account in its current form, such that it is worth exploring a possible alternative, non-feature-based approach to ellipsis. The proposal in this paper locates ellipsis in the postsyntactic component, carried out by the operation Total Impoverishment deleting entire feature sets. Ellipsis is then not due to deletion, but rather insertion of null Elsewhere markers into syntactic terminals.

22 An anonymous reviewer points out an interesting challenge for this approach. Namely, if we want to say that certain wh-adverbs are base generated in Spec-CP, e.g. why or how come (see Stepanov & Tsai 2008), then we would expect these not to license sluicing. However, this appears not to be the case. This means that either the aforementioned analysis of sluicing or the base generation analysis of causal adverbs should be reconsidered; I leave it to future research to decide which.
The licensing conditions for this operation are argued to be a proper subset relation, where the features of the ellipsis site must be a proper subset of the features of the antecedent. The analysis of gapping proposed here shows that there can be interesting crosslinguistic results depending on whether ellipsis (Total Impoverishment) applies before or after linearization, and that such an approach can successfully derive the observation by Ross (1970) that backward gapping is restricted to OV languages. It seems clear that a theory of ellipsis licensing that relies purely on syntactic or semantic identity will not cover all the relevant facts, and what we probably need is an approach that allows for some aspects of both. In general, isomorphism has been shown not to work as a criterion for syntactic identity and, as such, a subset approach such as the one developed here could be adopted in its place. While there is still much to be done to uncover the correct balance between syntactic and semantic identity, this paper provides further motivation for an approach to syntactic identity involving subset relations.