Probabilistic grammar and constructional predictability: Bayesian generalized additive models of help + (to) Infinitive in varieties of web-based English

The present study investigates the construction with help followed by the bare or to-infinitive in seven varieties of web-based English from Australia, Ghana, Great Britain, Hong Kong, India, Jamaica and the USA. In addition to various factors known from the literature, such as register, minimization of cognitive complexity and avoidance of identity (horror aequi), it examines how the predictability of the infinitive given help, and of help given the infinitive, affects the language user's choice between the constructional variants. These probabilistic constraints are tested in a series of Bayesian generalized additive mixed-effects regression models. The results demonstrate that the to-infinitive is particularly frequent in contexts with low predictability, or, in information-theoretic terms, with high information content. This tendency is interpreted as communicatively efficient behaviour, in which more predictable units of discourse get less formal marking, and less predictable ones get more formal marking. However, the strength, shape and directionality of predictability effects vary across the countries, which demonstrates the importance of the cross-lectal perspective in research on communicative efficiency and other universal functional principles.


Introduction
The present paper investigates the English construction with help followed by the infinitive with or without to, as in (1):
(1) a. Mary helped John to install the program.
    b. Mary helped John install the program.
The construction help + (to) Infinitive is one of the rare cases in Present-Day English where this choice is possible. Different factors have been proposed to explain when one or the other variant is preferred. Some of them are related to the universal functional principles of iconicity, minimization of cognitive complexity and avoidance of identity (also known as horror aequi). Other factors include register, morphological form and the presence or absence of the Helpee. Lohmann's (2011) quantitative study of help in British English showed that the variation is multifactorial and probabilistic. Moreover, it has been observed that American English has a particularly strong preference for the variant without to, although the bare infinitive is more common than the to-infinitive in both British and American varieties (e.g. Biber et al. 1999: 735). In addition, the bare infinitive has been gradually replacing the to-infinitive in the constructions with […] more expected words, syllables or phonemes are more likely to undergo length reduction and loss of articulatory detail than less expected ones (e.g. Jurafsky et al. 2001; Aylett & Turk 2004; Bell et al. 2009; Mahowald et al. 2013).
This correlation inspired Aylett and Turk's (2004) smooth signal redundancy hypothesis, which says that information content should be spread evenly across the signal. A similar idea has also been expressed as the hypothesis of Uniform Information Density (see Levy & Jaeger 2007). These proposals involve concepts from Shannon's (1948) information theory. Information content, or surprisal, is the negative logarithm of the conditional probability of a unit given its context, e.g. the n words to its left or right. It is thus inversely related to predictability: the less predictable a unit is from its context, the more informative it is.
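As a minimal illustration of the relation between conditional probability and information content, the Python sketch below computes surprisal in bits. The base-2 logarithm is an assumption (the base only rescales the values), and the function name is illustrative.

```python
import math


def surprisal(p: float) -> float:
    """Information content of a unit in bits: -log2 P(unit | context)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("a conditional probability must lie in (0, 1]")
    return -math.log2(p)


# A highly predictable unit (P = 0.5) carries less information than a
# less predictable one (P = 0.125).
print(surprisal(0.5))    # 1.0
print(surprisal(0.125))  # 3.0
```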
Of particular relevance for the present study are the studies of grammatical alternations with optional markers, which tend to be omitted when the structures that they introduce are predictable from the context, e.g. the relativizer that in English relative clauses after definite NPs (Wasow et al. 2011), the object marker in Japanese in typical agent-patient configurations (Kurumada & Jaeger 2015), or head-marking of the subject of the relative clause in Yucatec Maya after definite NP heads (Norcliffe & Jaeger 2016).
As far as help + (to) Infinitive is concerned, one can expect that the particle to will be used more often when the information content is higher. In the present study, information content is defined in two ways: a) based on the predictability of the infinitive given help and b) based on the predictability of help given the infinitive. 1 These two measures have analogues in usage-based construction linguistics, known as Attraction, i.e. the conditional probability of a word given a construction, and Reliance, i.e. the conditional probability of a construction given a word (Schmid 2000). Although many corpus linguists find it useful to compute a single bidirectional measure of the association between a construction and one of its collexemes (e.g. Stefanowitsch & Gries 2003), Schmid has argued that Attraction and Reliance represent two different types of information, each valuable in its own right (e.g. Schmid & Küchenhoff 2013). To the best of my knowledge, these two types of predictability (predictability of a collexeme given the construction and the other way round) have not been taken into account in previous studies of predictability effects in morphosyntactic alternations.
In speech, the use of additional coding material may give the speaker and the listener more time to plan and process the utterance. Predictability effects have been observed in writing as well. As Wasow et al. (2015) hypothesize, this may happen because speech habits carry over to writing, or because of temporal pressures on readers. Still, the predictability effects found in writing are robust enough to test the main hypothesis of the present study on data from a written corpus.

Principle of iconicity
Iconicity is the correspondence between linguistic form and function. There exist many types of iconic relationships at all levels of language structure, from phonology and orthography to morphology and syntax. For our case study, the most relevant type of iconicity is the correspondence between formal and conceptual distance. As formulated by Haiman (1983: 782), "[t]he linguistic distance between expressions corresponds to the conceptual distance between them." With regard to help + (to) Infinitive, one can say that the formal distance between help and the infinitive is greater when the latter is preceded by the particle to. In addition, iconicity of independence or autonomy may also be relevant (cf. Bybee 1985): events that are more integrated conceptually are also more integrated formally. In the case of help, the bare infinitive, which is very restricted and occurs primarily as a complement to auxiliary and modal verbs and with supportive do, can be regarded as more strongly integrated with help than the to-infinitive, which occurs in a wide range of constructions (Huddleston & Pullum 2002: 1174). Conceptual proximity and dependence, however, are very difficult to define. In the literature, they are understood as a number of different phenomena, for example, spatiotemporal integration of the events, the degree of control and agentivity of the participants, etc. (Givón 1990: Section 13.2). With regard to help, it has been proposed that the variant with the bare infinitive designates a more active involvement of the Helper in carrying out the event expressed by the infinitival complement (Dixon 1991: 199). Consider the following examples:
(2) Dixon (1991: 199)
    a. John helped Mary eat the pudding (he ate half).
    b. John helped Mary to eat the pudding (by guiding the spoon to her mouth, since she was still an invalid).
When to is omitted, as in (2a), the sentence is likely to describe a cooperative effort in which Mary and John ate the pudding together; when to is included, as in (2b), the sentence means that John acted as a facilitator for Mary, who actually ate the pudding herself (Dixon 1991: 199, 230). Similarly, Duffley (1992: Section 2.3) suggests that the use of the to-infinitive evokes help as a condition that enables the Helpee to bring about the event denoted by the infinitive. It has also been argued that animate Helpers have a potentially greater involvement in the event (Lind 1983). Indeed, Lohmann (2011) finds that animate Helpers have higher odds of the bare infinitive than inanimate Helpers, which can be regarded as evidence in support of the iconicity account. Yet many researchers have questioned the relevance of this semantic distinction. For example, Huddleston & Pullum (2002: 1244) argue that there are numerous contexts and examples where this distinction cannot be traced. Similar claims were made by McEnery & Xiao (2005).

Principle of (minimization of) cognitive complexity
The principle of minimization of cognitive complexity says, "In the case of more or less explicit grammatical options the more explicit one(s) will tend to be favoured in cognitively more complex environments" (Rohdenburg 1996: 151). The more words between help and the infinitive, the more difficult it is to recognize the latter as part of the construction. Consider an example of a complex environment in (3), where the distance between help and the infinitive is six words.
(3) (Great Britain, blog, 3069710) 2 …it's a way for me to make a contribution, to help the country in a small way to get back on its feet.
The longer the distance, the more likely it is that the infinitive will be marked by the particle to (see also Lohmann 2011).

Principle of avoidance of identity, or horror aequi
Horror aequi is a widespread tendency to avoid repetition of identical elements (Rohdenburg 2003). This idea is also known as the Obligatory Contour Principle, which was first formulated for phonology (Leben 1973) but has since been used to explain different phenomena at all linguistic levels (e.g. omission of optional that in Walter & Jaeger 2008). Rohdenburg uses horror aequi to explain why the to-infinitive tends to be avoided immediately after a governing to-infinitive (e.g. to try to do). When the verb help is itself preceded by to, the following infinitive is usually without to (Biber et al. 1999: 737). See an example in (4):
(4) (Great Britain, general, 303502) Sorry, but how is this supposed to help answer the question?
This hypothesis was confirmed by Lohmann (2011), who also finds an interaction between this factor and complexity (see Section 2). The more words there are between help and the infinitive, the weaker the influence of horror aequi.

Other factors
• Register: The shorter variant with the bare infinitive is considered to be less formal than the one with the marked infinitive (e.g. Rohdenburg 1996: 159; see also Biber et al. 1999: 736-737).
• Inflectional form: Lohmann (2011) observes that in British English the form helping is used with the to-infinitive more frequently than the other inflectional forms of help. According to Rohdenburg (2009: 317), the effect of helping has an analogy with daring and needing, which differ from all other forms of dare and need in being virtually always associated with marked infinitives. In addition, there is a weakly significant preference of the third person singular form helps for the to-infinitive in comparison with the base form (Lohmann 2011).
• Presence or absence of the Helpee: Biber et al. (1999: 735) show that the bare infinitive is particularly dominant in the pattern help + NP + infinitive clause. This observation is also supported by Lohmann (2011).
• Passive or active infinitive: According to McEnery & Xiao (2005), the passive infinitive should always be marked with to. However, this is not supported by my data: both the bare and the to-forms can be used, as shown in (5).
(5) a. (USA, general, 288902) If rural voices are important -the bread basket, our farmers, our miners -then an electoral approach, not a pure popular vote, helps them to be heard.
    b. (USA, blog, 3177307) Thank you so much for sharing and helping our Vets be heard!
One should also mention phonological factors. There is some evidence that the use of to in different constructions depends on prosody. In particular, Wasow et al. (2015) found an effect of prosody on the use of the bare or to-infinitive in their investigation of the do-be construction, e.g. All we want to do is (to) celebrate. Namely, they discovered that to was used to avoid a stress clash when both the copula and the first syllable of the infinitive after be were stressed. I am not aware of any studies of help that focused directly on the effect of stress clash. However, Lohmann (2011) tested two other phonetic variables, namely, whether the infinitive begins with a vowel and whether the first syllable of the infinitive is stressed. Neither variable had a significant effect on the choice between the forms of the infinitive.

Corpus and the procedure of data extraction
The data used in the present study come from the Corpus of Global Web-based English (GloWbE) created by Davies (2013). This large corpus contains 1.9 billion words and represents online English from twenty countries. For this case study, seven geographic varieties were chosen from different parts of the world: Australia, Ghana, Great Britain, Hong Kong, India, Jamaica and the USA. The choice of this corpus was motivated primarily by its size. One needs large corpora in order to compute reliable information-theoretic measures, especially if the construction of interest is not very frequent. I used a part of the corpus with eighteen million words per country, nine million from the General subcorpus and nine million from the Blog subcorpus.
The data extraction procedure was as follows. First, I used a Python script to collect all instances of help in any inflectional form followed by an infinitive somewhere in the sentence. If there were finite verb forms, clause-combining conjunctions like because, or subject pronouns like I, he and she between help and the infinitive, the instance was discarded. A quality check based on one hundred manually extracted examples from five subcorpora revealed that this approach was quite successful in recognizing instances of the construction: recall was 86%, and precision was 93%. Only active uses were collected because the bare infinitive can be used only in active sentences (Huddleston & Pullum 2002: 1244), as shown in (6):
(6) a. John was helped to cook the dinner.
    b. ??John was helped cook the dinner.
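The filtering heuristic described above can be sketched in Python as follows. This is not the study's script: it assumes POS-tagged input with Penn Treebank tags, and the blocker lists (`FINITE_TAGS`, `CONJUNCTIONS`, `SUBJECT_PRONOUNS`) are hypothetical extensions of the examples named in the text. Passive uses and the dummy-it construction would still have to be removed separately.

```python
from typing import List, Optional, Tuple

Token = Tuple[str, str]  # (word form, Penn Treebank POS tag)

HELP_FORMS = {"help", "helps", "helped", "helping"}
# Hypothetical blocker lists: the text names only examples of each class.
FINITE_TAGS = {"VBD", "VBZ", "VBP", "MD"}
CONJUNCTIONS = {"because", "although", "when", "while", "if", "since"}
SUBJECT_PRONOUNS = {"i", "he", "she", "we", "they"}


def find_help_infinitive(tokens: List[Token]) -> Optional[Tuple[int, int]]:
    """Return (index of help, index of infinitive) for the first match, else None."""
    for i, (word, tag) in enumerate(tokens):
        if word.lower() not in HELP_FORMS or not tag.startswith("VB"):
            continue  # not a verbal form of help
        for j in range(i + 1, len(tokens)):
            w, t = tokens[j]
            if t == "VB":  # base-form verb: candidate infinitive
                return i, j
            if t in FINITE_TAGS or w.lower() in (CONJUNCTIONS | SUBJECT_PRONOUNS):
                break  # blocker found: discard this instance of help
    return None


sentence = [("Mary", "NNP"), ("helped", "VBD"), ("John", "NNP"),
            ("to", "TO"), ("install", "VB"), ("the", "DT"), ("program", "NN")]
print(find_help_infinitive(sentence))  # (1, 4)
```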
The spelling variants of the verbs were normalised, so that the pairs like maximize and maximise, fulfil and fulfill were treated as one word.
Although the corpus compilers performed some cleaning, there were still quite a few duplicate sentences in the data. They were removed with the help of a script. Nonsense sentences, which were probably machine-generated or contained advertising, posed another problem (cf. similar problems reported in Mair 2015: 31-32). However, they were not numerous and were removed during the process of variable coding.
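The normalisation and de-duplication steps might look like the following minimal sketch. The variant table `SPELLING_VARIANTS` and the choice of canonical spellings are hypothetical, and treating sentences that differ only in case or whitespace as duplicates is an assumption about what the cleaning script did.

```python
import re
from typing import Dict, Iterable, List

# Hypothetical variant table; which spelling is kept as canonical is arbitrary.
SPELLING_VARIANTS: Dict[str, str] = {
    "maximise": "maximize",
    "fulfill": "fulfil",
}


def normalise_verb(verb: str) -> str:
    """Map spelling variants of a verb onto one canonical form."""
    v = verb.lower()
    return SPELLING_VARIANTS.get(v, v)


def deduplicate(sentences: Iterable[str]) -> List[str]:
    """Keep the first occurrence of each sentence, ignoring case and extra whitespace."""
    seen = set()
    unique = []
    for s in sentences:
        key = re.sub(r"\s+", " ", s.strip().lower())
        if key not in seen:
            seen.add(key)
            unique.append(s)
    return unique


print(normalise_verb("Maximise"))                          # maximize
print(deduplicate(["It helps.", "it  helps.", "Help!"]))   # ['It helps.', 'Help!']
```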
Finally, I manually removed the instances of a formally similar but functionally different construction with the dummy it-subject, where the to-infinitive is always used (McEnery & Xiao 2005). An example is shown in (7). 3
(7) Ruth Bader Ginsburg's Relationship Advice: "It helps to be a little deaf".
After the data collection and cleaning, I obtained the frequencies shown in Table 1. Since the subcorpora were of identical size (18 million words), the "raw" frequencies are directly comparable across the varieties. Hong Kong has the highest total frequency of the construction, and Jamaica the lowest, although the differences are not very large. As for the relative frequencies of the variants, the bare infinitive is the more frequent one in all countries. The USA subcorpus displays the highest relative frequency of help followed by the bare infinitive (84.9%), whereas the Jamaican subcorpus has the lowest (60.8%), followed by Great Britain (70.3%) and the other countries.
The next section describes the predictor variables, which represent the factors mentioned in Sections 2 and 3. The Helper's animacy is not taken into account because the annotation procedure was very difficult to automate: the parser returned very poor results due to highly complex syntactic structures, e.g. when help was itself part of an infinitival clause. Note that the effect of animacy in Lohmann's (2011) study was rather weak. Prosodic factors (in particular, stress clash) are not tested either, due to the practical difficulties of obtaining stress patterns from written data of such size. I added one new variable, the valency of the infinitive.

Constructional predictability
To test the effects of constructional predictability, I computed two measures for each unique infinitive, which are described below.
• Information content of the infinitive given the construction, defined as the negative log-transformed conditional probability of the infinitive (with or without to) given the construction with help: -log P(verb|help). This conditional probability is computed as the number of occurrences of a given infinitive with help divided by the total frequency of the construction with help in the relevant subcorpus. In corpus-based constructional studies this probability is known as Attraction (Schmid 2000). The more frequently a verb is used in the construction with help in comparison with the other verbs, the lower the information content. 4
• Information content of the construction given the infinitive, defined as the negative log-transformed conditional probability of the construction with help (with or without to) given the infinitive: -log P(help|verb). This conditional probability, also known as Reliance (Schmid 2000), is computed by dividing the number of occurrences of a given infinitive with help by the total frequency of the verb in the subcorpus in all forms. The more frequently a verb is used with help in comparison with the other uses of the same verb, the lower the information content.
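The two measures can be computed from frequency counts as in the sketch below. The counts are invented toy values, the base-2 logarithm is an assumption (the text says only "log-transformed"), and `verb_totals` is assumed to hold the frequency of each verb in all its forms in the same subcorpus.

```python
import math
from typing import Dict, Tuple


def information_content(
    with_help: Dict[str, int], verb_totals: Dict[str, int]
) -> Dict[str, Tuple[float, float]]:
    """For each verb, return (-log2 P(verb|help), -log2 P(help|verb)).

    with_help:   frequency of each infinitive in the help construction
    verb_totals: frequency of each verb in the subcorpus, in all forms
    """
    construction_total = sum(with_help.values())
    result = {}
    for verb, n in with_help.items():
        attraction = n / construction_total  # P(verb | help)
        reliance = n / verb_totals[verb]     # P(help | verb)
        result[verb] = (-math.log2(attraction), -math.log2(reliance))
    return result


# Invented toy counts, not frequencies from GloWbE.
with_help = {"make": 8, "ensure": 4, "eat": 4}
verb_totals = {"make": 64, "ensure": 8, "eat": 16}
print(information_content(with_help, verb_totals))
# {'make': (1.0, 3.0), 'ensure': (2.0, 1.0), 'eat': (2.0, 2.0)}
```

In the toy data, make is strongly attracted to help (low first value) but does not rely on it (high second value), whereas ensure shows the opposite profile; this is why the two directions carry different information.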

Cognitive complexity
This principle is represented by linguistic distance, measured as the number of words between the word form of help and the infinitive (the particle to was not counted). For example, the sentence in (8) has a distance of four words. Although there are different ways of defining syntactic complexity, such as counting the number of syntactic nodes or quantifying the level of embedding, word counts serve as a good proxy for the more sophisticated measures (Szmrecsanyi 2004). This is why I use simple word counts in this study.
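The distance measure can be sketched as follows, assuming a whitespace-tokenized sentence and known positions of help and the infinitive (the function name is hypothetical). Applied to example (3), it reproduces the distance of six words.

```python
from typing import List


def linguistic_distance(tokens: List[str], help_idx: int, inf_idx: int) -> int:
    """Words strictly between help and the infinitive; the particle to
    directly before the infinitive is not counted."""
    between = tokens[help_idx + 1:inf_idx]
    if between and between[-1].lower() == "to":
        between = between[:-1]
    return len(between)


# Example (3): "...to help the country in a small way to get back on its feet."
tokens = "to help the country in a small way to get back on its feet".split()
print(linguistic_distance(tokens, tokens.index("help"), tokens.index("get")))  # 6
```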

Horror aequi
This factor is represented by a variable that reflects the presence of the particle to before help, as in (9):
(9) (India, blog, 3388613) The Plate-Inversion protocol, and this post are two simple hacks to help you get started.
This is a binary variable with the values "Yes" and "No".
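A minimal coding function for this variable is sketched below. It assumes that the relevant to must immediately precede help, as in (4) and (9); the text does not spell out how other configurations were treated, and the function name is hypothetical.

```python
from typing import List


def to_before_help(tokens: List[str], help_idx: int) -> str:
    """Code the horror aequi variable: 'Yes' if help is immediately preceded by to."""
    if help_idx > 0 and tokens[help_idx - 1].lower() == "to":
        return "Yes"
    return "No"


# Example (9): "...two simple hacks to help you get started."
tokens = "are two simple hacks to help you get started".split()
print(to_before_help(tokens, tokens.index("help")))  # Yes
```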

Other variables
-Formality, which is represented by the average word length in the website text where a given instance of help was attested. The greater the average word length, the more formal the text. This operationalization is based on Biber's (1988) multidimensional analysis of register variation. He found, in particular, that longer word forms, alongside the type-token ratio and the relative frequency of nouns and adjectives, contribute strongly to the negative pole of the first factor or dimension, which is interpreted as "Involved vs. informational production" and has conversations and academic texts at its extremes. The choice of mean word length is purely practical: many texts in the corpus are very short and cannot provide reliable relative frequencies for the lexico-grammatical categories required for a full-fledged multidimensional analysis.
-Morphological form of the verb help, which can be help, helps, helped or helping.
-The presence or absence of the Helpee, illustrated by (10a) and (10b), respectively:
(10) a. (Great Britain, blog, 3058500) It provides a systematic approach to helping people defeat dyslexia and related reading problems. [Presence]
    b. (Ghana, general, 1259905) These bumps and turns will only help contribute towards a relationship. [Absence]
According to the previous studies (see Section 3.4), the contexts with zero Helpees are expected to contain the to-infinitive more often than those with overt Helpees.
-Valency of the infinitive, which can be intransitive (including copulas), transitive (including ditransitives) or followed by a clause. Examples are shown in (11). In order to code this variable, the sentences were first parsed syntactically with the Stanford Parser (Klein & Manning 2003). The contexts were then checked by hand, and the category "Clause" was added manually. Examples with passive infinitives were excluded: due to their extremely low frequencies, it was impossible to include them as a separate category in the regression models. At the same time, it did not seem reasonable to merge them with any other category, since previous studies suggested that they might behave differently from active forms (McEnery & Xiao 2005; see also Section 3.4).
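The formality proxy described above, average word length per website text, might be computed as in this sketch. Tokenization is an assumption: words are taken to be runs of letters, so digits and punctuation are ignored and a form like it's counts as two words. The function name is hypothetical.

```python
import re


def mean_word_length(text: str) -> float:
    """Average length, in letters, of the alphabetic word tokens in a text."""
    words = re.findall(r"[^\W\d_]+", text)  # runs of letters only
    if not words:
        raise ValueError("text contains no words")
    return sum(len(w) for w in words) / len(words)


print(mean_word_length("Thank you so much for sharing"))  # 4.0
```

Per-text values would then be centred before entering the models, as described in the section on model characteristics.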

Bayesian inference and characteristics of the models
To test the effect of the predictors on the use of bare and to-infinitives, I used Bayesian mixed-effects generalized additive models. For this purpose, I employed Stan, a programming language and platform for Bayesian inference (Stan Development Team 2015), and the package brms (Bürkner 2017), which provides an R interface to Stan (R Core Team 2017). Seven Bayesian logistic regression models were fitted, one for each variety. The response variable was the use of the bare or to-infinitive. The predictors described in Section 5 were treated as fixed effects. The individual websites and the verbs filling the infinitive slot were treated as random effects (more precisely, random intercepts). Sum contrasts were used with all categorical and binary variables, so that zero represents the grand mean (i.e. the unweighted mean of means) of the categories. The numeric variables were centred around the mean. Two interaction terms were included on the basis of diagnostic tests. One is the interaction between linguistic distance and the horror aequi variable, which was found to be significant by Lohmann (2011). The other is the interaction between the form of help and the presence or absence of the Helpee. In addition, an interaction between the two information-theoretic measures was taken into account by introducing bivariate smoothing terms (see below). The discriminating power of the models was excellent (all concordance indices C were greater than 0.9).
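Sum contrasts and centring can be illustrated with the sketch below. The layout follows the convention of R's `contr.sum` (the last level is coded -1 in every column); the function names are illustrative, and the sketch shows only how predictors are coded, not how brms or Stan fit the models.

```python
from typing import Dict, List, Sequence


def sum_contrasts(levels: Sequence[str]) -> Dict[str, List[int]]:
    """Sum-to-zero coding: k levels -> k-1 columns, last level coded -1 everywhere.

    Each column sums to zero over the levels, so zero on every column
    corresponds to the unweighted mean of the level means.
    """
    k = len(levels)
    coding = {}
    for i, level in enumerate(levels):
        if i < k - 1:
            coding[level] = [1 if j == i else 0 for j in range(k - 1)]
        else:
            coding[level] = [-1] * (k - 1)
    return coding


def centre(values: Sequence[float]) -> List[float]:
    """Subtract the sample mean, so that zero corresponds to the mean value."""
    m = sum(values) / len(values)
    return [v - m for v in values]


print(sum_contrasts(["help", "helps", "helped", "helping"]))
# {'help': [1, 0, 0], 'helps': [0, 1, 0], 'helped': [0, 0, 1], 'helping': [-1, -1, -1]}
print(sum_contrasts(["No", "Yes"]))  # {'No': [1], 'Yes': [-1]}
print(centre([1, 2, 3, 6]))          # [-2.0, -1.0, 0.0, 3.0]
```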
Bayesian regression allows the researcher to test the research hypothesis directly. In our case, we can obtain the probability that a predictor has a positive or negative effect on the presence or absence of the particle to. In Bayesian inference, such probabilities are called posterior probabilities, or posteriors, because they are computed after the data have been taken into account. They also depend on prior probabilities, or priors, which represent the researcher's beliefs about the values of the model parameters before the data are taken into account. Non-informative priors (e.g. uniform ones, where any value is equally probable) yield posteriors that are driven by the data alone, as in frequentist statistics. As recommended by the Stan developers, I used the default weakly informative priors, which only help to constrain the posteriors to reasonable values, i.e. those normally found in logistic models. Bayesian regression is well suited to probabilistic grammar because posterior probabilities can be easily compared cross-lectally. They also allow us to study a continuum of credibility without forcing binary decisions based on p-values. For the technical details of Bayesian modelling, the reader is referred to Kruschke (2011). In what follows, I focus on the results.
The algorithm returns 6,000 posterior estimates of each regression parameter (1,500 draws from each of four Markov chains per model). These probability distributions can be represented in a histogram, which displays our posterior beliefs after the data have been taken into account. An example is provided in Figure 1. It shows the effect of average word length on the chances of the bare and to-infinitive in the websites from Great Britain. The numeric values on the horizontal axis are log-odds ratios. A positive log-odds ratio means that the odds of the to-infinitive increase with average word length, whereas a negative value means that the odds of the to-infinitive decrease (and, conversely, the odds of the bare infinitive increase). From the posterior distribution one can compute the posterior mean, displayed as a dot in Figure 1, as well as the 95% credible interval, which spans the region between the 2.5th and the 97.5th percentiles, where 95% of the posterior distribution lies. Credible intervals thus contain the most believable parameter values. If a categorical judgment is needed, of the type "Does the variable increase the chances of one or the other outcome?", one can use the following criterion: if a credible interval does not include zero, as in this illustration, the effect can be said to be credibly nonzero.
The posterior distribution also allows us to assess the probability of a positive or negative effect of a given predictor on the chances of the to-infinitive, by computing the proportions of the posterior draws that are greater and less than zero. In our example, the proportion of the posteriors greater than zero is 100%. This information allows us to test the alternative hypothesis directly.
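The summaries described in the last two paragraphs (posterior mean, 95% percentile credible interval and the proportion of draws above zero) can be computed from any vector of posterior draws. The sketch below uses simulated draws, not the study's posteriors, sized to match four chains of 1,500 draws each; the function name is hypothetical.

```python
import random
from statistics import fmean, quantiles
from typing import Dict, Sequence


def summarise_posterior(draws: Sequence[float]) -> Dict[str, float]:
    """Posterior mean, 95% percentile credible interval and P(effect > 0)."""
    cuts = quantiles(draws, n=40, method="inclusive")  # 2.5%, 5%, ..., 97.5%
    return {
        "mean": fmean(draws),
        "ci_lower": cuts[0],   # 2.5th percentile
        "ci_upper": cuts[-1],  # 97.5th percentile
        "p_positive": sum(d > 0 for d in draws) / len(draws),
    }


# Simulated draws standing in for 4 chains x 1,500 draws of one log-odds coefficient.
random.seed(1)
draws = [random.gauss(0.8, 0.2) for _ in range(6000)]
summary = summarise_posterior(draws)
print(summary)

# "Credibly nonzero": the 95% credible interval excludes zero.
credibly_nonzero = not (summary["ci_lower"] <= 0.0 <= summary["ci_upper"])
print(credibly_nonzero)
```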
Additional diagnostic tests with polynomials suggested that some of the effects of the predictability variables are non-linear. To take that into account, I used the methods of generalized additive modelling (Wood 2006), which uses smooth functions to model non-linear relations between predictors and the response. More specifically, I used bivariate smoothing terms, which take into account possible non-additive effects of two predictors. On the basis of the leave-one-out cross-validation (LOO) criterion for model comparison, I chose isotropic smooths, which are appropriate when variables are on similar scales. For the other continuous variables, no convincing non-linearity was detected.

Predictability-related variables
The marginal effects of the information content of a verb given help are displayed in Figure 2. They are based on the predicted probabilities of the to-infinitive. Recall the hypothesis: the greater the information content, the higher the chances of the marked form. Although some of the plots point in the expected direction (e.g. the data from Ghana, Hong Kong and the USA), the 95% credible bands are very broad in comparison with the magnitude of those effects, which means that these effects are marginal at best.
In contrast, the marginal effects of the information content of the construction given a verb are more robust, as shown in Figure 3. The effect is weakest in the USA data. In Australia, Great Britain, India and Jamaica, we also observe some non-monotonicity, with a small dip in the centre.
The effects of both information-theoretic variables in interaction are displayed in Figure 4. The lighter areas (from violet to blue and then to green and yellow) indicate the information content values where the chances of the to-infinitive increase, while the darker areas show the values with a higher preference for the bare infinitive. When the information content of help given a verb is very high (see the top part of the plots), the chances of the to-infinitive tend to increase. There is also a slight increase in the bottom right part of the plots in some of the varieties. This is a region with high information content of a verb given help (the horizontal axis) and low to middle information content of help given a verb (the vertical axis). This increase explains the non-linear patterns discovered in Figure 3.

Cognitive complexity, horror aequi and their interaction
The effects of cognitive complexity and horror aequi are as expected in all countries. Table 2 displays the effects of linguistic distance. With each word between help and the infinitive, the odds of the to-infinitive credibly increase. There is some variation in the strength of this effect, with the American variety displaying the smallest value and the Indian one the largest. Table 3 shows the effects of the presence of to before help at mean linguistic distance. The chances of the to-infinitive decrease if there is to before help. Again, there is some variation: the Hong Kong data display the weakest effect, and the Jamaican subcorpus the strongest.
The positive interaction terms (see Table 4) indicate that the negative effect of to before help on the odds of the to-infinitive weakens as the linguistic distance between help and the infinitive increases. The US data display the weakest interaction, while the Jamaican variety has the strongest, closely followed by several others.
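To show how such an interaction makes the effect of to before help depend on distance, the sketch below computes the Yes-minus-No difference in log-odds at several centred distances. The coefficients are hypothetical, not the estimates in Tables 3 and 4, and the coding Yes = +1, No = -1 is an assumption: under sum contrasts, the sign of a reported term depends on which level is coded +1.

```python
import math


def horror_aequi_contrast(b_ha: float, b_int: float, distance_c: float) -> float:
    """Log-odds difference (to before help: Yes minus No) at a centred distance.

    Assumes Yes = +1, No = -1 coding, so the Yes-No difference is twice the
    term b_ha + b_int * distance_c.
    """
    return 2 * (b_ha + b_int * distance_c)


# Hypothetical coefficients, not the estimates reported in the study.
b_ha, b_int = -1.0, 0.25
for d in (0, 2, 4):
    diff = horror_aequi_contrast(b_ha, b_int, d)
    print(d, diff, round(math.exp(diff), 3))
# 0 -2.0 0.135
# 2 -1.0 0.368
# 4 0.0 1.0
```

With these made-up values, to before help multiplies the odds of the to-infinitive by about 0.135 at mean distance, and the effect disappears at a centred distance of four words.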

The form of help, the presence or absence of the Helpee and their interaction
The results are best represented visually. Figure 5 displays the mean posteriors and the 95% credible intervals. In all varieties, the form helping without the Helpee has the highest chances of being used with the to-infinitive. With the exception of the Indian variety, the base form help is the most likely to be followed by the bare infinitive. However, when the Helpee is present, the differences between the forms are small. In general, the presence of the Helpee increases the chances of the bare infinitive, although its effect is quite small after the base form help, where the credible intervals largely overlap (see especially the US variety). In the Ghanaian variety, we even see a small increase in the chances of the to-infinitive.
Table 5 shows the posterior probabilities that represent the effect of transitivity of the infinitive on the presence of to. High probabilities (greater than 90%) are observed in the data from Hong Kong, India and Jamaica, followed by the USA (almost 87%). In the other countries, there is no strong bias in either direction. A separate check (not shown here) reveals that clausal complements have no highly credible effects (i.e. posterior probabilities close to 100%) in any of the varieties. In the USA, there is a 92.3% probability that clausal complements increase the chances of the bare form, followed by Jamaica (85.1%) and Hong Kong (84.2%).

Formality (average word length)
Finally, let us consider the degree of formality, represented by the average word length of the text at an individual website. The posteriors in Table 6 show the effect of a one-letter increase in average word length on the log-odds of the to-infinitive vs. the bare infinitive. In most countries, average word length has a positive effect on the chances of the to-infinitive, as predicted. The strongest effect is observed in Great Britain. The Ghanaian and US data display very weak positive effects. The Indian data show, surprisingly, the opposite effect: the longer the words in a text, the higher the chances of the bare infinitive.

Summary and discussion of the results
In general, the bare infinitive is the preferred variant in all varieties discussed here. The highest proportion of the bare infinitive is observed in the US data, whereas the lowest proportion is found in the Jamaican subcorpus, followed by the British data. The remaining countries exhibit proportions very similar to the British one. The results of the previous studies are largely corroborated, although there are also quite a few new details.
• The variables related to horror aequi and the principle of cognitive complexity behave in accordance with expectations in all varieties. They interact, such that the effect of to before help weakens with increasing linguistic distance between help and the infinitive. Here, the models reveal no surprises.
• The varieties also behave similarly with regard to the form helping, which substantially increases the chances of the to-infinitive. It is followed by helps in most varieties. However, the models demonstrate that this contrast is strong only in the absence of the Helpee. When the Helpee is explicit, the differences between the forms are small. For the base form help, the chances of the bare infinitive tend to be the highest, with or without the Helpee (except for the Indian variety, where helped is also very likely to be followed by the bare infinitive).
• As expected, the presence of the Helpee increases the chances of the bare form with all forms of help except the base form help, where the presence of the Helpee makes little difference.
• There is a positive effect of the average word length, which serves as a proxy for formality, on the probability of the to-infinitive in most varieties, although it has low credibility in the Ghanaian and US subcorpora. Surprisingly, one finds a credible reverse effect in the Indian variety.
• There is also some evidence that transitive infinitives increase the chances of the to-infinitive in the varieties of Hong Kong, India, Jamaica and the USA, although this effect is only sufficiently credible in the data from Hong Kong. There are also some indications that clausal complements play a role in some of the varieties, but these indications are very weak.
To summarize, there are very strong cross-lectal similarities with regard to the factors of horror aequi and cognitive complexity. As far as the other contextual factors (stylistic, morphological and syntactic) are concerned, most varieties behave in a similar way, but there are also exceptions. Interestingly, the US model often exhibits relatively weak effects in comparison with the other models. This may be because the to-variant is closest to extinction in that variety, so that the competition between the variants is gradually disappearing.
Let us now turn to the second question of the present study. The aim was to find out whether constructional predictability determines the use of the bare or to-infinitive in the varieties of English and whether these effects (or their absence) are consistent across varieties. The generalized additive models show that there are some common effects in the expected direction in all seven varieties, but their directionality, strength and shape vary. The main conclusion one can draw is that the information content of help given the infinitive displays stronger and more systematic effects in the expected direction than the information content of the infinitive given the construction. The infinitives associated with high information content of help are, as a rule, highly frequent verbs, such as be, have, do, say, ask, try and use.⁵ These verbs appear in many diverse constructions, so help accounts for only a small share of their occurrences; its conditional probability given such a verb is therefore low, and its information content high. A few examples are provided in (12).
(12) a. (Hong Kong, blog, 3585980) Growing plants will help you to be patient.
b. (India, general, 623003) It will help your partner to have clear insight regarding your travelling habits.
c. (USA, general, 44601) …if I try to help him to do it better, he gets an attitude and yells "I don't care about baseball".
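The explanation above relies on the information content (surprisal) of help given the infinitive, that is, -log2 P(help | verb). As a minimal sketch, assuming that the conditional probabilities are estimated by plain relative frequencies without smoothing (the study's exact estimation procedure may differ), the two directions of predictability can be computed as follows. The counts are hypothetical and only show why a frequent, versatile verb yields a high information content of help.

```python
import math


def surprisal_bits(joint_count, context_count):
    """Information content in bits: -log2(joint_count / context_count)."""
    if not 0 < joint_count <= context_count:
        raise ValueError("need 0 < joint_count <= context_count")
    return -math.log2(joint_count / context_count)


def info_help_given_verb(f_help_verb, f_verb):
    """How unexpected help is, given the infinitive (estimated from counts)."""
    return surprisal_bits(f_help_verb, f_verb)


def info_verb_given_help(f_help_verb, f_help):
    """How unexpected the infinitive is, given help (estimated from counts)."""
    return surprisal_bits(f_help_verb, f_help)


# Hypothetical counts: a versatile verb that rarely combines with help,
# and a specialised verb that often does.
versatile = info_help_given_verb(f_help_verb=500, f_verb=1_000_000)
specialised = info_help_given_verb(f_help_verb=250, f_verb=1_000)  # -log2(0.25) = 2 bits
```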
To support this conclusion, Figure 6 displays the differences between the percentages of to in all examples and in those where help is highly informative given the infinitive (top 5% of the scores).
⁵ One might argue that the verbs be, have and do are commonly used in the auxiliary function and may be special in some way. However, the effect is not limited to those verbs. Additional analyses show that a positive effect remains in all varieties even if one excludes the auxiliaries be, have and do.
Figure 6: Percentages of to-infinitives in all contexts and in those where help given the infinitive is highly informative (top 5% of the scores).
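The comparison shown in Figure 6 can be sketched as follows, assuming that each example is stored with a flag for the presence of to and its information-content score, and that the top 5% is defined as scores at or above the 95th percentile. The field names has_to and surprisal are hypothetical, and the study may have defined the cut-off differently.

```python
import statistics


def to_percentage(rows):
    """Percentage of examples with the to-infinitive."""
    return 100 * sum(1 for row in rows if row["has_to"]) / len(rows)


def compare_top_informative(rows, score_key="surprisal"):
    """Return the percentage of to in all examples and in the top 5% by score.

    The cut-off is the 95th percentile of the scores: the last of the 19
    cut points returned by statistics.quantiles with n=20.
    """
    scores = [row[score_key] for row in rows]
    cutoff = statistics.quantiles(scores, n=20, method="inclusive")[-1]
    top = [row for row in rows if row[score_key] >= cutoff]
    return to_percentage(rows), to_percentage(top)
```

Under the communicative-efficiency account, the second percentage is expected to exceed the first, since to should be more frequent where help is less predictable.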