Predictability and Causality in Spanish and English Natural Language Generation

In recent years, the field of Natural Language Generation (NLG) has been boosted by advances in deep learning technologies. Nonetheless, these new data-intensive methods introduce language-dependent disparities in NLG, as the main training data sets are in English. Also, most neural NLG systems use decoder-only (causal) transformer language models, which work well for English but were not designed with other languages in mind. In this work we start from the hypothesis that such models may introduce generation bias in target languages with less rigid word ordering, subject omission, or different attachment preferences for relative clauses, so that for these target languages other language generation strategies may be more desirable. This paper first compares causal and non-causal language modeling for English and Spanish, two languages with different grammatical structures and over 1.5 billion and 0.5 billion speakers, respectively. For this purpose, we define a novel metric of the average causal and non-causal context-conditioned entropy of the grammatical category distribution for both languages, as an information-theoretic a priori approach. The evaluation of natural text sources (such as training data) in both languages reveals lower average non-causal conditional entropy in Spanish and lower causal conditional entropy in English. According to this experiment, Spanish is more predictable than English given a non-causal context. Then, by applying a conditional relative entropy metric to text generation experiments, we find that the best performance is achieved with causal NLG in English and with non-causal NLG in Spanish. These insights support further research on NLG in Spanish using bidirectional transformer language models.


I. INTRODUCTION
THANKS to their capacity to acquire universal language representations from vast amounts of unlabeled text data, transformer-based Natural Language Generation (NLG) models [1] have achieved unprecedented success [2].
Transformers build on Sequence-to-Sequence (seq2seq) language models [3], enhancing them with positional encoding, which enables parallel training while still considering word order, and with a novel self-attention mechanism that selects the most relevant parts of the input sequence. The unsupervised nature of transformer pretraining facilitates handling vast amounts of raw text data.
However, the linguistic prevalence of English on the Internet is a primary source of bias [4]. Most evaluation data sets and benchmarks are primarily or entirely written in English, with very few exceptions [5], and therefore the most innovative contributions rarely target other languages. This linguistic data imbalance has a detrimental effect on polyglot language models' word embeddings [6], [7], as tokenizers assign longer tokens to character sequences in languages with more extensive representation [8], [9], [10], despite some proposed solutions such as those in [11] and [12]. The multilingual NLP paradigm of cross-lingual transfer learning, which tends to view non-English languages as particular use cases, makes language models favor English-like grammars with stricter word ordering and an explicit subject [13].
Nowadays, the majority of generative language models are decoder-only transformers. OpenAI's GPT-3/3.5/4 [14] and ChatGPT are the most popular, and Big Science's BLOOM, Meta AI's OPT, and Google AI's BARD are also well-known examples. Most of these models are multilingual; however, their performance varies between languages. This has led to monolingual implementations of the smaller GPT-2 model in languages other than English, for example in German, French, Italian [15], and Spanish [16].
These generative models are exclusively causal, that is, they produce text from left to right by recursively feeding the model with previously generated sequences. As in the case of Recurrent Neural Networks (RNN), decoder-only transformers are expectation-based word predictors. These systems tend to favor structures in which related elements are close together along the sequence, such as relative clause attachments to syntactically lower nominals in ambiguous contexts, which fits nicely into English syntax [17].
However, the mutually beneficial congruence between causal language modeling and English may not apply to other languages. Not only does Spanish prefer a higher nominal attachment in the resolution of ambiguous relative clauses, but its syntax is also highly flexible, even within declarative sentences [18]. This is strongly opposed to the stricter subject-verb-object structure of the English language, which allows for few inversion exceptions [19].
Unlike causal language models, encoder-only non-causal language models generate word embeddings using bidirectional contexts, which means that the model output can be conditioned by both left and right tokens.This eliminates the output sequence's sequential dependencies and allows alternative generation orders.
In light of this, we start from the hypothesis that decoder-only (causal) transformer language models may introduce generation bias in target languages with less rigid word ordering than English, subject omission, or different attachment preferences for relative clauses, so that for these target languages other language generation strategies may be more desirable. To put this hypothesis to the test, in addition to English we consider Spanish, a language with a different grammatical structure and also a broad base of speakers (together, these languages have over 1.5 billion and 0.5 billion speakers, respectively, a substantial share of the world's population). However, the approaches in this work can be extended to obtain insights on other languages and NLP tasks.
Our contributions are the following. A) First, we present a novel information-theoretic approach to study language predictability. We compare the causal context-conditioned entropy and the non-causal context-conditioned entropy of the grammatical category distribution of source natural texts to assess whether their language is more predictable from causal or non-causal language contexts. This reveals lower average non-causal conditional entropy in Spanish and lower causal conditional entropy in English. According to this assessment, Spanish is more predictable than English given a non-causal context. B) Then, using both automatic (based on conditional relative entropy) and manual evaluation methodologies, we put decoder-only and encoder-only transformer language models to the test to assess empirical causal and non-causal NLG performance, seeking to evaluate whether the currently dominant causal NLG paradigm is adequate from a language-agnostic perspective or whether specific languages may benefit from other word generation orderings. We find that the best performance is achieved with causal NLG in English and non-causal NLG in Spanish. These insights support further research on NLG in Spanish using bidirectional transformer language models instead of the dominant decoder-only ones.

The rest of this paper is organized as follows. Section II reviews related work on both psycholinguistic language predictability and language model causality in NLG. Section III describes the proposed analytical methodology used for the experiments. Sections IV and V present the details and results of the assessments of predictability and text generation performance, respectively. Section VI summarizes and discusses the results obtained. Finally, Section VII concludes the paper.

II. RELATED WORK
In this section, we discuss relevant works on both causality in NLG (Section II-A) and language predictability (Section II-B).

A. CAUSALITY IN GENERATIVE TRANSFORMER LANGUAGE MODELS
The contextual awareness of a transformer is controlled by self-attention. The base concept behind this attention mechanism is a mapping of a query (q) into pairs of keys (k) and values (v). By respectively denoting the matrices of the query, key, and value sets as Q, K, and V, we define self-attention as:

$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V$ (1)

where $d_k$ is the dimension of the keys. Transformers, rather than applying a single attention function, project queries, keys, and values onto $h$ separate heads. This is called multi-head attention:

$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)W^O$ (2)

by denoting each head attention function as:

$\mathrm{head}_i = \mathrm{Attention}(QW_i^Q, KW_i^K, VW_i^V)$ (3)

where $W_i^Q$, $W_i^K$, $W_i^V$, and $W^O$ are parameter projection matrices for the queries, keys, values, and output, respectively. This attention mechanism is present in all the layers of both the encoder and the decoder, if present. While the encoder's attention is bidirectional, the decoder has two different types of attention: (i) a masked multi-head attention block that masks the non-causal context and (ii) a bidirectional multi-head attention block that receives non-causal information from the encoder.
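As a concrete illustration, scaled dot-product attention and the causal masking used by decoder-only models can be sketched in a few lines of NumPy. This is a minimal single-head sketch, not the paper's implementation; the helper names and the large negative masking constant are our own choices:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V, causal=False):
    # Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-2, -1) / np.sqrt(d_k)
    if causal:
        # Decoder-only models mask the non-causal (right) context so that
        # position i can only attend to positions <= i.
        L = scores.shape[-1]
        mask = np.triu(np.ones((L, L), dtype=bool), k=1)
        scores = np.where(mask, -1e9, scores)
    return softmax(scores) @ V
```

With `causal=True`, the first output position attends only to itself, which is exactly the left-to-right restriction discussed above.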
Even though this encoder-decoder architecture is popular in some NLP tasks such as machine translation [20], [21], [22], [23], several transformer-based models only have one of these components. By omitting the encoder in decoder-only transformers, all non-causal contextual dependencies are removed, since these models exclusively use masked attention. Decoder-only transformers are nowadays the best-performing task-agnostic NLG systems. Nevertheless, there exist some state-of-the-art non-causal NLG solutions. For example, non-causal language models can be trained for the Masked Language Modeling (MLM) objective, a task in which the language model predicts masked words within a sentence [24]. Typically, non-causal NLG systems focus on particular tasks such as speech recognition [25], [26], [27], style transfer and grammar correction [28], textual data augmentation [29], and task-specific dialog systems [30], [31].

B. LANGUAGE PREDICTABILITY
Conditional entropy is a typical metric for evaluating the predictability of a problem given its input variables and expected output probability distribution [32], [33]. Conditional entropy H(X | Y) measures the uncertainty that remains about a variable X when another variable Y is available as side information.
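For instance, H(X | Y) can be estimated directly from paired samples. The following is a small sketch under our own naming; it is not part of the paper's tooling:

```python
import math
from collections import Counter

def conditional_entropy(pairs):
    # H(X | Y) in bits, estimated from a list of (x, y) samples as
    # -sum over (x, y) of p(x, y) * log2 p(x | y).
    joint = Counter(pairs)
    marg_y = Counter(y for _, y in pairs)
    n = len(pairs)
    h = 0.0
    for (x, y), c in joint.items():
        p_xy = c / n            # empirical joint probability
        p_x_given_y = c / marg_y[y]
        h -= p_xy * math.log2(p_x_given_y)
    return h
```

When X is fully determined by Y the result is 0 bits; when X is a fair binary variable independent of Y it is 1 bit.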
In psycholinguistics, surprisal theory also uses this information-theoretic concept to quantify processing difficulty in sentence comprehension [34], [35]. Multiple studies provide empirical evidence for this expectation-based theory by showing correlations between textual surprisal and both test subjects' reading times, as in [36], and brain activity, as in [37].
Even if generally accepted, surprisal theory does not model working memory in text comprehension, disregarding the processing difficulties of integrating words or components that are far apart within a text [38], [39], [40], [41], [42]. Lossy context surprisal [43] combines expectation- and memory-based predictability theories by modeling working memory constraints as noise. Even though this model's premise is independent of language, it can accurately reflect several language-specific text-processing phenomena.
Lossy context surprisal recreates structural forgetting by dropping part of the context and re-sampling it incorrectly from the a priori language knowledge probability model. Structural forgetting [44] is a common grammatical illusion in English in which ungrammatical double-embedded relative clauses can be perceived as correct. Probabilistic language expectations can determine this exclusively. In [45] it is shown that native and non-native speakers exhibit structural forgetting in English, but do not behave this way when presented with the same syntactic structures in German or Dutch. This propensity of the probabilistic distribution of English to such backward prediction mistakes is coherent with the issues of non-causal English text generation.
The main goals of neural NLP and psycholinguistic approaches to language cognition are very similar: (i) to give formally explicit descriptions of the mental structures underpinning cognitive processes, and (ii) to explain the learning mechanisms behind them [46]. Even if research in these areas tends to diverge, recent contributions to the study of linguistic theory use language models [47], further evidencing their alignment.
However, to our knowledge, psycholinguistic concepts have yet to be applied to neural language modeling other than for data set elaboration [48]. With this in mind, our work provides a novel linguistics-based conditional entropy hypothesis test for language modeling causality (see contribution A above), whose findings can support future NLG designs and methodologies.

III. METHODOLOGY

A. PREDICTABILITY HYPOTHESIS TEST
Causal language models predict the next token in a sequence of tokens. These models are solely concerned with the left context for sinistrodextral (i.e., written from left to right) languages such as Spanish and English (conversely, non-causal models trained on the MLM task consider the bidirectional context for blank-filling-based text generation).
Given a sequence of tokens X as context, language models provide the probability mass function of the next predicted token $\hat{X}$ over a vocabulary set $V$. For a generation index $i$, we define the $N$-long input causal context as:

$X_c^N = (X_{i-N}, \ldots, X_{i-1})$ (4)

and the non-causal context as:

$X_n^N = (X_{i-N_l}, \ldots, X_{i-1}, X_{i+1}, \ldots, X_{i+N_r})$, with $N_l + N_r = N$ (5)

so that we can express the output of a causal language model as:

$p(\hat{X}_i = x \mid X_c^N = x_c)$, with $x \in V$ (6)

and the output of a non-causal language model as:

$p(\hat{X}_i = x \mid X_n^N = x_n)$, with $x \in V$ (7)

We want to assess whether a language is more predictable given causal or non-causal contexts (and thus, for example, whether Spanish NLG may benefit from a non-causal language generation ordering). The less conditional entropy a problem has, the more predictable its outcome. As NLG is a language prediction task in which previously generated words are available as context, we want to compare the conditional entropy in two scenarios: (i) causal text generation, in which text is generated from left to right, so that we provide the words to the left of the predicted one as context; and (ii) non-causal text generation, which uses both left and right context for word prediction.
In order to test the predictability of causal and non-causal language models for English and Spanish, we compute and compare the average causal and non-causal conditional entropy for textual data in both languages:

$H(\hat{X} \mid X) = \sum_{x} p(X = x) H(\hat{X} \mid X = x)$ (8)

with:

$H(\hat{X} \mid X = x) = -\sum_{\hat{x} \in V} p(\hat{X} = \hat{x} \mid X = x) \log_2 p(\hat{X} = \hat{x} \mid X = x)$ (9)

where $X$ stands for either the causal context $X_c^N$ or the non-causal context $X_n^N$. It must be noted that both $X_c^N$ and $X_n^N$ can take $|V|^N$ different values.
Context length and vocabulary size determine the accuracy of our estimation results. As we have no previous information about the token probability distribution, we model both $p(X)$ and $p(\hat{X} = \hat{x} \mid X = x)$ as categorical distributions. The estimators used for these distributions are:

$\hat{p}(X = x_i) = L_i / L$ (10)

and

$\hat{p}(\hat{X} = x_j \mid X = x_i) = L_{ij} / L_i$ (11)

with $L$ being the token sequence length, $L_i$ the number of instances of the context $i$, and $L_{ij}$ the number of instances in which token $j$ appears given context $i$.
In case both $p(X)$ and $p(\hat{X} = \hat{x} \mid X = x)$ are discrete uniform distributions, these estimators have normalized variances proportional to $|V|^N / L$ and $|V|^{N+1} / L$, respectively. This means that, in order to keep our estimators' normalized variances at a given value, the number of analyzed tokens should be proportional to $|V|^N$ and $|V|^{N+1}$, respectively.
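Putting the estimators above together, the average context-conditioned entropy of a token (e.g., POS tag) sequence can be estimated as follows. This is a sketch under our own naming; `context_offsets` encodes causal contexts such as `(-2, -1)` and non-causal ones such as `(-1, 1)`, where the predicted tag lies between the two context tags:

```python
import math
from collections import Counter, defaultdict

def avg_conditional_entropy(tags, context_offsets):
    # Average entropy (in bits) of the tag at position i conditioned on the
    # tags at positions i + o for each offset o in context_offsets.
    counts = defaultdict(Counter)       # context tuple -> predicted-tag counts
    for i in range(len(tags)):
        if all(0 <= i + o < len(tags) for o in context_offsets):
            ctx = tuple(tags[i + o] for o in context_offsets)
            counts[ctx][tags[i]] += 1   # accumulates L_ij
    total = sum(sum(c.values()) for c in counts.values())   # L
    h = 0.0
    for c in counts.values():
        n_ctx = sum(c.values())         # L_i
        for n_tag in c.values():        # L_ij
            p = n_tag / n_ctx           # estimate of p(tag | context)
            h -= (n_ctx / total) * p * math.log2(p)
    return h
```

A perfectly alternating sequence is fully predictable from either context type (0 bits), while mixed sequences yield strictly positive values.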
Therefore, given the data available, neither word nor subword tokenization is feasible. We instead use a grammatical categorization based on Part-Of-Speech (POS) tagging. It reduces vocabulary size and data requirements dramatically while maintaining the original goal.
Algorithm 1 Non-causal text generation.
The resulting hypothesis test evaluates how predictable natural English and Spanish syntaxes are for causal and non-causal language models. Our first intuition is that non-causal predictability, as determined by the inverse of the non-causal context-conditioned entropy, will be higher for Spanish than for English, and the opposite for causal predictability. By validating this, we can demonstrate that causal ordering may not be ideal for Spanish NLG, paving the way for further study of non-causal Spanish text generation approaches based on bidirectional transformers.

B. NON-CAUSAL TEXT GENERATION
For non-causal NLG, we first start with a sequence of [MASK] tokens of the desired length K. At each iteration, we re-sample every token once. We mask and fill tokens in groups of size N. In order to fill the masked tokens, we sample the output of a non-causal language model, from which we remove adjacent tokens, short prefixes and suffixes, and unknown tokens to enhance the overall quality of the produced sequence. This process is formally described in Algorithm 1.
The number of iterations I and the number of tokens masked in each generation step N must be predetermined. These parameters influence the performance and computational efficiency of the algorithm. More masked tokens per generation step mean fewer calls to the language model function (⌈K/N⌉ calls per iteration), resulting in improved computing efficiency. In this work we set N = 2 and I = 30.
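The masking-and-refilling loop can be sketched as follows. This is a simplified sketch, not Algorithm 1 itself: `fill_fn` stands in for a bidirectional mask-filling model (e.g., a BERT-style sampler), and the filtering of adjacent tokens, short affixes, and unknown tokens described above is omitted:

```python
MASK = "[MASK]"

def noncausal_generate(fill_fn, K, N=2, I=30):
    # Start from K [MASK] tokens and, for I iterations, re-mask and re-fill
    # the sequence in chunks of N tokens using a bidirectional fill model.
    # fill_fn(tokens, positions) must return one token per masked position.
    tokens = [MASK] * K
    for _ in range(I):
        for start in range(0, K, N):    # ceil(K / N) model calls per iteration
            positions = list(range(start, min(start + N, K)))
            for p in positions:
                tokens[p] = MASK        # re-mask the current chunk
            filled = fill_fn(tokens, positions)
            for p, tok in zip(positions, filled):
                tokens[p] = tok         # fill from the bidirectional model
    return tokens
```

Because every chunk is conditioned on both its left and right neighbors, later iterations can revise early choices, unlike left-to-right decoding.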

C. AUTOMATIC EVALUATION
The relative entropy D_KL(P || Q), also known as KL divergence, quantifies the expected increase in uncertainty that comes from modeling a reference distribution P with another distribution Q. In this study, we use the following formulation of conditional relative entropy for both causal ($X = X_c^N$) and non-causal ($X = X_n^N$) contexts:

$D_{KL}(p(\hat{X} \mid X) \,\|\, q(\hat{X} \mid X)) = \sum_{x} p(X = x) \sum_{\hat{x} \in V} p(\hat{X} = \hat{x} \mid X = x) \log_2 \frac{p(\hat{X} = \hat{x} \mid X = x)}{q(\hat{X} = \hat{x} \mid X = x)}$ (12)

where $p$ and $q$ are the conditional probability mass functions of the POS tags of our reference textual data set and of the sequences to evaluate, respectively.
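A plug-in estimate of this conditional relative entropy from (context, tag) samples might look as follows. This is a sketch with our own naming, not the paper's code; contexts and tags unseen in the generated data are simply skipped here, whereas a real implementation would need smoothing to keep the divergence finite and well defined:

```python
import math
from collections import Counter, defaultdict

def conditional_kl(ref_pairs, gen_pairs):
    # Conditional relative entropy D(p || q) in bits between the
    # context-conditioned tag distributions of a reference corpus (p)
    # and of generated text (q), each given as (context, tag) samples.
    def cond_dist(pairs):
        d = defaultdict(Counter)
        for ctx, tag in pairs:
            d[ctx][tag] += 1
        return d

    p, q = cond_dist(ref_pairs), cond_dist(gen_pairs)
    n = len(ref_pairs)
    kl = 0.0
    for ctx, tags in p.items():
        q_ctx = q.get(ctx)
        if not q_ctx:
            continue                    # unseen context: skipped (no smoothing)
        n_ctx = sum(tags.values())
        q_total = sum(q_ctx.values())
        for tag, c in tags.items():
            if tag not in q_ctx:
                continue                # unseen tag: skipped (no smoothing)
            p_t = c / n_ctx
            q_t = q_ctx[tag] / q_total
            kl += (n_ctx / n) * p_t * math.log2(p_t / q_t)
    return kl
```

Identical sample sets yield 0 bits; the further the generated tag distribution drifts from the reference, the larger the value, matching the metric's use as an adherence score.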

D. MANUAL EVALUATION
The annotators were asked yes/no questions on the following aspects to assess the quality of the generated sequences:
• Q1. Word concordance, penalizing improper use of verb tenses, number, and, if applicable, gender of determiners, adjectives, nouns, and pronouns.
• Q2. Syntactic structure correctness, by checking if all sequences have at least one subject and one verb and assessing that the sentences are syntactically sound in general.
• Q3. Word or phrase-level repetitions, by penalizing word redundancy, duplication in enumerations, and subject redundancy, while trying to respect those repetitions that may be considered stylistic choices.
• Q4. Word sense. Language models can generate new words by combining prefixes, suffixes, and pronouns as sequential tokens. This question penalizes nonsensical words.
We assessed inter-annotator agreement with accuracy and α-reliability [49] to verify that the annotations were neither arbitrary (acc = α = 0) nor redundant (acc = α = 1).
Finally, we also included a more general rating question (Q5) in which we asked annotators to provide a numerical score between 1 and 5 based on their impression of the annotation and their experience.

IV. PREDICTABILITY TEST RESULTS
We used two different experimental setups to perform the hypothesis test in Section III-A: (i) a first setup using two data sets, one in English and one in Spanish, which are exact translations of each other (Section IV-A); and (ii) another setup with larger, relatively similar English and Spanish data sets that cannot be considered exact translations of each other (Section IV-B). Both setups have advantages and disadvantages: it is more desirable to compare parallel content (setup #1), but bigger data volumes allow for analyzing lengthier contexts (setup #2). As mentioned, we preprocessed the data sets with a POS tagger. The POS tagging module used the spaCy es_core_news_sm (https://spacy.io/models/es) and en_core_web_sm (https://spacy.io/models/en) pipelines for Spanish and English, respectively. In order to balance the categories, we reduced the original seventeen Universal POS tags to the following nine: adjectives (ADJ), adpositions (ADP), adverbs (ADV), conjunctions (CONJ), determiners (DET), nouns (NOUN), pronouns (PRON), verbs (VERB), and a last category combining unknown words, interjections, blank spaces, punctuation marks, and symbols (OTHER).
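A plausible reduction mapping is sketched below. Note that the paper only specifies the OTHER category explicitly (unknown words, interjections, blank spaces, punctuation, symbols); the assignments of AUX, PROPN, NUM, and PART are our own assumptions for illustration:

```python
# Assumed reduction of the 17 Universal POS tags (plus spaCy's SPACE)
# to the paper's nine categories. AUX -> VERB, PROPN -> NOUN, and
# NUM/PART -> OTHER are illustrative choices, not stated in the paper.
UPOS_TO_REDUCED = {
    "ADJ": "ADJ", "ADP": "ADP", "ADV": "ADV",
    "CCONJ": "CONJ", "SCONJ": "CONJ",
    "DET": "DET",
    "NOUN": "NOUN", "PROPN": "NOUN",
    "PRON": "PRON",
    "VERB": "VERB", "AUX": "VERB",
    "INTJ": "OTHER", "NUM": "OTHER", "PART": "OTHER",
    "PUNCT": "OTHER", "SYM": "OTHER", "X": "OTHER", "SPACE": "OTHER",
}

def reduce_tags(upos_tags):
    # Map a sequence of Universal POS tags to the reduced nine-tag set;
    # anything unexpected falls back to OTHER.
    return [UPOS_TO_REDUCED.get(t, "OTHER") for t in upos_tags]
```

Collapsing the tag set this way shrinks |V| from 17 to 9, which, per the variance argument in Section III-A, reduces the amount of text needed for reliable entropy estimates.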
We executed the experiments using two Nvidia A100 GPUs with the specifications in Table 1.

A. TALE DATA SETS
TABLE 2: Tale data set sources.

Source                         Language
Ciudad Seva                    Spanish
Rincón Castellano              Spanish
Elejandría                     Spanish
Andersenstories.com            Spanish & English
Grimmstories.com               Spanish & English
Americanliterature.com         English
D. L. Ashliman's compilation   English
Long long time ago             English
Project Gutenberg              English
The H.P. Lovecraft Archive     English

The tale data sets of setup #1 consisted of public domain tales, short novels, and fables with Creative Commons-licensed translations. We crawled the English and Spanish text collections, from Portable Document Format (PDF) files with the Python pdftotext library and by web scraping using Scrapy web spiders, so that the sizes of the respective data sets would be identical (3.7M words of raw text each) and they could be considered direct translations. Table 2 shows all the data sources.
TABLE 3: Two-word context-conditioned entropy in bits per tag, tale data set.

We could not compute conditional entropy values for contexts longer than two words due to data set size constraints. However, because of the similarities in content between the English and Spanish data, we could efficiently study short-term grammatical dependencies in both languages. In the experiment, we compared the causal two-word context with a specific case of non-causal context in which the predicted word lies between the two contextual words. We provide a brief qualitative analysis of the contexts that resulted in lower conditional entropy values for the predicted term $\hat{X}_i$ for causal and bidirectional contexts in both languages.

Table 3 shows the results of hypothesis testing on the English and Spanish tale data sets. They are coherent with our initial intuition that Spanish is more suited for non-causal text prediction than English. However, examining Table 3 row-wise, middle tag prediction seemed to have lower entropy than causal text prediction in both languages. This does not necessarily mean that non-causal NLG outperformed its causal counterpart, as this experiment disregarded relevant factors, such as initial text generation steps. Tables 4, 5, 6 and 7 show left- and bidirectional-context predicted word pairs with entropy lower than log2(3) ≈ 1.585. We can note that the number of combinations satisfying this condition for the Spanish data set was much higher for bidirectional contexts than for causal contexts. Figure 1 shows that, for Spanish, causal patterns were also less diverse, as all of them relied on either pronoun/verb or determiner/noun grammatical dependencies.
As shown in Tables 6 and 7, causal and bidirectional low-entropy contexts in English were more balanced. Figure 2 reflects the prevalence of adjective/noun dependencies (especially in the causal case). Conversely, adjectives have a much less rigid position within the sentence in Spanish.

B. WIKIDUMP DATA SETS

For setup #2, we extracted the text of the English and Spanish Wikipedia dumps using WikiExtractor. Then we loaded and mapped the resulting JSON files with HuggingFace's Datasets library. We picked one million random non-empty articles in each language for hypothesis testing. The amount of data available allowed accurately computing the conditional entropy for longer contexts than in the previous case, yielding average conditional entropy results for contexts of up to six words. For this experiment, we explored all possible non-causal contexts and assessed the impact of the location of the predicted tag on our predictability results.
Table 8 shows the average causal and non-causal conditional entropy results for the Wikidump data set. Compared to the tale data set, with its tighter equivalence between languages, Spanish conditional entropy values often exceeded those for English. For shorter contexts, even non-causal conditional entropy was higher in the Spanish data set than in the English data set. For this reason, we report the average entropy conditioned by the set of non-causal contexts after subtracting the causal-context value; this difference is always lower in Spanish than in English, highlighting the lower efficiency of left-to-right word prediction in Spanish. We can further appreciate this effect in Figure 3, which depicts the average conditional entropy for all possible predicted tag locations within contexts of diverse lengths. As we can see, the more balanced the left and right contexts are, the more predictable the grammatical category in both languages. However, this tendency was considerably more noticeable in Spanish and became more evident as the context got longer. Since the average number of words per sentence tends to be higher in Spanish than in English [50], [51], [52], Spanish non-causal NLG is even more promising.

V. TEXT GENERATION RESULTS
For the text generation experiment, we used four different language models: Deep ESP's Spanish GPT-2 and University of Chile's Spanish BERT for Spanish causal and non-causal language modeling, respectively, and OpenAI's GPT-2 small and Google's BERT base for English causal and non-causal language modeling, respectively. Specifically, we used the small versions of these language models, which range from 110M to 117M parameters. We are aware that these models are not on par with state-of-the-art generative language models, but our goal was not to achieve the best text generation results but rather to compare the performance of causal and non-causal language models with similar characteristics. The chosen models were comparable in training data, number of parameters, and vocabulary size, and thus appropriate for this experiment.
We fine-tuned the models on the tale data set described in Section IV-A using HuggingFace's Transformers library.
The training was executed in 10 epochs with batch size 8 using an Nvidia A100-PCIE-40GB (see Table 1).
Then, we generated 1,000 different 50-token sequences with each of the four language models. We assessed the generation performance of each language model using both automatic (Section V-A) and manual (Section V-B) evaluation metrics.

A. AUTOMATIC EVALUATION
In the sequel, we will call "opposite" the language, either Spanish or English, that a model was not trained on in an experiment. We used the conditional relative entropy metric described in Section III-C for automatic evaluation, comparing the produced sequences with the tale data sets in the target and opposite languages. This was possible because language-independent POS tagging was used for tokenization.
As expected, Table 9 reveals that the sequences generated by all four models adhered more closely to their respective target language's POS tag distributions. There were no significant differences between causal and non-causal context-conditioned relative entropy results.
English BERT performed significantly worse in terms of adherence to the target language, with conditional relative entropy values notably greater than those of the other three models (∼0.1 vs. ∼0.02), which is consistent with the state of the art [53], [54]. For Spanish, the results of causal and non-causal NLG are more comparable. The Spanish BERT non-causal language model had the lowest conditional relative entropy, outperforming even English GPT-2 when considering adherence to the respective target languages. These results show that causal models corresponded more closely to English grammar and non-causal models corresponded more closely to Spanish grammar (considering grammar as reflected in the English and Spanish data sets). Note that this also held when using the models of the opposite language. That is, the conditional relative entropy of English BERT for Spanish was lower than that of English GPT-2 for Spanish, and the conditional relative entropy of Spanish GPT-2 for English was lower than that of Spanish BERT for English.

B. MANUAL EVALUATION
We chose 250 sequences at random for each pairing of language (Spanish or English) and model (BERT or GPT-2) to reduce the annotation load while still obtaining useful insights. Each sequence was examined independently using the questions in Section III-D.
Five annotators participated in the evaluation. Table 10 shows the global inter-agreement analysis of the yes/no replies. Using the thresholds in [55], α-reliability coefficients lay between fair and moderate agreement. Accuracy values were also acceptable for all questions. The grammatical structure (Q2) was the most controversial aspect of the first four questions due to different opinions on linguistic demand, as some annotators were more lenient with one of the languages.
Next, we can see that the manual assessment scores for the first four questions in Table 11 were consistent with the automatic metrics: BERT was considered to perform better in Spanish and GPT-2 to perform better in English. The best outcome for Spanish language models was in word sense (Q4), whereas English language models scored better in word concordance (Q1). This is consistent with the fact that there is no gender concordance in English. For all language models, the most challenging question was word repetition (Q3). For the more subjective fifth question, as the average rating from one annotator to another ranged from 3.529 to 4.31, we normalized the scores of each annotator to an average of 1.0. The results in Table 12 indicate that the subjectively perceived quality of Spanish texts generated by BERT is higher than when using GPT-2, and vice versa in the case of English, which is consistent with our initial intuition and all the results so far.
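The per-annotator normalization can be sketched as follows, assuming that "normalized to an average of 1.0" means dividing each annotator's ratings by that annotator's own mean (the helper name is ours):

```python
def normalize_scores(scores_by_annotator):
    # Divide each annotator's ratings by that annotator's mean rating,
    # so every annotator averages 1.0. This removes individual leniency
    # or harshness before aggregating scores across annotators.
    out = {}
    for annotator, scores in scores_by_annotator.items():
        mean = sum(scores) / len(scores)
        out[annotator] = [s / mean for s in scores]
    return out
```

After this step, a score above 1.0 means "better than this annotator's average", making ratings comparable across annotators with different baselines.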

VI. DISCUSSION
All the results of the previous section are aligned with our initial intuition that Spanish is more suited for non-causal language modeling than English. As shown in Table 13, natural English text was demonstrated to be more predictable than text in Spanish given a causal context in the predictability test results, by a relatively constant margin of ∼5%. However, given a non-causal context, Spanish was more predictable than English, by a margin that increased as the context got longer.
In the automatic evaluation with conditional relative entropy, Spanish BERT showed the highest adherence to its target language grammar. These results are consistent with the text generation ranking summarized in Table 14 (whose "automatic evaluation" row reflects the conditional relative entropies in Table 9), as English GPT-2 performed better than English BERT, and Spanish BERT better than Spanish GPT-2, in all the evaluation experiments. Manual evaluation consistently ranked English GPT-2 and Spanish BERT as the best language models for NLG. Spanish BERT ranked worse than English BERT in concordance, but, as previously stated, these results might be biased by the lack of gender concordance in English.
Overall, the results of our experiments show that non-causal language modeling is more promising for Spanish NLG than for English NLG.

VII. CONCLUSIONS
In this paper, we have first assessed English and Spanish predictability given causal and non-causal contexts, demonstrating that Spanish is more predictable given a non-causal context. For this purpose, we developed and computed a novel metric of the average causal and non-causal context-conditioned entropies of the grammatical categories present in similar and strictly parallel English and Spanish textual data sets. The experiments have shown that average causal context-conditioned entropy is higher in Spanish texts than in English texts, and that average non-causal context-conditioned entropy is higher in English texts than in Spanish ones. This was further supported by a study of the grammatical dependencies that are more predictable in each language and of how word location within a context influences predictability.
Following the validation of the hypothesis about the relation between causal and non-causal contexts and language predictability, we selected causal and non-causal language generators based on Spanish and English models to assess their quality depending on the target language to generate. To make the experiments comparable, we chose similarly dimensioned unidirectional and bidirectional pretrained transformer language models and fine-tuned them on highly equivalent Spanish and English data sets.
Finally, we evaluated the outcome both automatically and manually to assess the performance of text generation in all test scenarios. In the first case, to compare the compliance of the language models with the grammatical structure of their target languages, we proposed a conditional relative entropy metric. Manual evaluation, which was validated using inter-agreement metrics, was coherent with the automatic evaluation.
The insights of this study motivate further research into analyses of language predictability in languages other than English, as well as into efficient text generation using bidirectional transformers in Spanish and other languages with similar grammatical structures.

TABLE 1: Specifications of the GPUs.

F. JAVIER GONZÁLEZ-CASTAÑO received the B.S. degree from the University of Santiago de Compostela, Spain, in 1990, and the Ph.D. degree from the University of Vigo, Spain, in 1998. He is currently a professor at the University of Vigo, Spain, where he leads the Information Technologies Group. He has authored over 100 papers in international journals in the fields of telecommunications and computer science, and has participated in several relevant national and international projects. He holds three U.S. patents.

SILVIA GARCÍA-MÉNDEZ received the Ph.D. degree in Information and Communication Technologies from the University of Vigo in 2021. Since 2015, she has been working as a researcher with the Information Technologies Group at the University of Vigo. She is currently collaborating with foreign research centers as part of her postdoctoral stage. Her research interests include Natural Language Processing techniques and Machine Learning algorithms.

FRANCISCO DE ARRIBA-PÉREZ received the B.S. degree in telecommunication technologies engineering in 2013, the M.S. degree in telecommunication engineering in 2014, and the Ph.D. degree in 2019 from the University of Vigo, Spain. He is currently a researcher in the Information Technologies Group at the University of Vigo, Spain. His research includes the development of Machine Learning solutions for different domains such as finance and health.
TABLE 8: Conditional entropy in bits per tag, Wikidump data set.