Prompting Metalinguistic Awareness in Large Language Models: ChatGPT and Bias Effects on the Grammar of Italian and Italian Varieties

-----------------------------------------------------------


Introduction
The Left Periphery of the clause (Rizzi 1997, 2001, 2004; Rizzi & Bocci 2017), primarily investigated for Italian, is the portion of the syntactic architecture hosting landing positions for internal merge, triggered for scope-discourse properties such as Topics (as in 1b, Rizzi 1997: 289, 15a) and Foci (1c, Rizzi 1997: 290, 16b). The patterns presented here represent marked word orders, reorderings of canonical clauses (1a, cf. Haspelmath 2006), realized via A'-movement of the dislocated element, leaving a gap within the clause (cf. filler-gap dependencies). Their parsing creates asymmetries across populations of speakers (developmental grammars and language pathology; Friedmann et al. 2009, Durrleman et al. 2016 and related works). In a similar vein, filler-gap dependencies do not represent a complex challenge for Large Language Models (henceforth LLMs) and neural network architectures (see Linzen & Baroni 2021 for an overview; see also Sinha et al. 2021, Wallat et al. 2021, and Chaves and Richter 2021), usually trained on a large set of (tendentially) non-annotated, raw data (Gulordava et al. 2018, Wilcox et al. 2019, Wilcox et al. 2022; but see Merlo & Ackermann 2018 on other cases of long-distance dependency and reorderings). In a recent paper published in Linguistic Inquiry, Wilcox et al. (2022) propose that the investigation of LLMs might provide evidence against the Argument of the poverty of the stimulus (Chomsky 1965). On the other hand, Katzir (2023) suggests that these models are poor representations of human cognition.
Recent developments in LLMs culminated in ChatGPT (OpenAI 2023; see an overview and critical discussions in Bang et al. 2023) [1], an LLM-based conversational AI (GPT, Radford et al. 2018 inter alia). When ChatGPT is prompted with a question (see also ChatGPT as an informant, Haider 2023) about a reordering of the type of (1b), the provided answer sounds somewhat "creative" (Figure 1; see caption for a translation, section 3 for glosses). In translation, the reply reads: "This is a correct form of Italian construction to indicate that Mario took the book. However, it should be noted that this is a very formal construction and it may sound archaic or antiquated. In Modern Italian, it is more common to adopt the construction 'Mario took the book', which is simpler and more direct."
In this paper, we aim to evaluate such behavior by raising and investigating two research questions:
i. Does ChatGPT find the reordering archaic because these orderings are rare/absent in the training? To answer this question, we investigate large-scale general-domain databases to examine the distribution of reorderings and thus forms of bias (see Samo et al. 2022 for a definition and Samo & Bonan 2023 for a series of case studies) (section 2).
ii. What research questions can be raised with respect to ChatGPT's metalinguistic knowledge through prompt engineering? (section 3).

Understanding bias: reordering frequency in corpora
Reordering entails the notion of a given, canonical/unmarked order. Canonicity, beyond a universal structural approach (cf. Kayne 1994), can be expressed in terms of parsing effects (cf. Aaravind et al. 2018 and related works), but also in terms of frequencies in large-scale databases (Merlo 1994, 2016; Samo & Merlo 2019, 2021 and related works) [2].
As specified in OpenAI (2023), developers have not disclosed the content of ChatGPT's training dataset.
However, we can study the distribution of canonical and marked patterns in a series of syntactically annotated treebanks (Table 1), annotated under the schemata of Universal Dependencies (UD; Nivre 2015, de Marneffe et al. 2017, Zeman et al. 2023), which might also allow cross-linguistic comparative dimensions in future studies.
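The kind of count described above can be illustrated with a minimal Python sketch. It is not the authors' query set (which is only referred to in Table 2); it is an assumption-laden illustration that reads UD annotation in CoNLL-U format and tallies whether a nominal direct object follows or precedes its head, and whether a `dislocated` dependent occurs to the left of its head. The two-sentence sample, its annotation, and the names `parse_conllu` and `count_orders` are hypothetical and written for this sketch only.

```python
from collections import Counter

# Toy CoNLL-U fragment. The annotation is hypothetical (written for this
# sketch) and is not taken from any of the treebanks in Table 1.
SAMPLE = """\
# text = Mario ha preso il libro
1\tMario\tMario\tPROPN\t_\t_\t3\tnsubj\t_\t_
2\tha\tavere\tAUX\t_\t_\t3\taux\t_\t_
3\tpreso\tprendere\tVERB\t_\t_\t0\troot\t_\t_
4\til\til\tDET\t_\t_\t5\tdet\t_\t_
5\tlibro\tlibro\tNOUN\t_\t_\t3\tobj\t_\t_

# text = Il libro Mario lo ha preso
1\tIl\til\tDET\t_\t_\t2\tdet\t_\t_
2\tlibro\tlibro\tNOUN\t_\t_\t6\tdislocated\t_\t_
3\tMario\tMario\tPROPN\t_\t_\t6\tnsubj\t_\t_
4\tlo\tlo\tPRON\t_\t_\t6\tobj\t_\t_
5\tha\tavere\tAUX\t_\t_\t6\taux\t_\t_
6\tpreso\tprendere\tVERB\t_\t_\t0\troot\t_\t_
"""


def parse_conllu(text):
    """Return a list of sentences; each sentence is a list of token dicts."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():          # a blank line closes a sentence
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):      # sentence-level comments
            continue
        cols = line.split("\t")
        # Skip multiword tokens (e.g. "1-2"), empty nodes and malformed rows.
        if len(cols) != 10 or not cols[0].isdigit() or not cols[6].isdigit():
            continue
        current.append({
            "id": int(cols[0]),
            "form": cols[1],
            "upos": cols[3],
            "head": int(cols[6]),
            "deprel": cols[7],
        })
    if current:
        sentences.append(current)
    return sentences


def count_orders(sentences):
    """Count nominal objects by position and left-dislocated dependents."""
    counts = Counter()
    for sent in sentences:
        for tok in sent:
            if tok["head"] == 0:
                continue
            rel = tok["deprel"].split(":")[0]   # ignore relation subtypes
            if rel == "obj" and tok["upos"] in {"NOUN", "PROPN"}:
                if tok["id"] < tok["head"]:
                    counts["obj_before_verb"] += 1
                else:
                    counts["obj_after_verb"] += 1
            elif rel == "dislocated" and tok["id"] < tok["head"]:
                counts["dislocated_before_head"] += 1
    return counts


counts = count_orders(parse_conllu(SAMPLE))
```

Clitic objects (UPOS `PRON`) are deliberately left out of the object count, so that the resumptive `lo` in the second sentence is not mistaken for a preverbal lexical object. Whether a given treebank labels the fronted topic as `dislocated` or otherwise is an annotation choice that would need to be checked per treebank.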
Treebanks are presented in Table 1, examples of methodology in Table 2, and results in Figure 3 (see captions for details) [3].

Table 1. Treebanks, genres, size and references. Genres: L = Legal, N = News, NF = Non-fiction, SM = Social Media, W = Wiki. We only selected treebanks bigger than 1000 trees, excluding treebanks containing learner essays, selected grammatical examples (MarkIT) and parallel treebanks.

[...] suggest that there seems not to be a pure metalinguistic awareness that is extracted from the data.

Prompting other structures and Italian varieties
ChatGPT's developers have not disclosed the content of its dataset (OpenAI 2023), but we assume that while major languages are present in the model's dataset (cf. Bender et al. 2021), minor languages, such as Italo-Romance varieties, might not be. To find out and to find trends in the way these are distributed, we query examples through prompt engineering in one of these languages. We specifically targeted left-peripheral phenomena such as topics, summarized here by some examples. All screenshots of prompts are available in the supplementary files.
We started from a major Romance language, Italian, and tested non-pro-drop sentences (with Mario being the subject, Table 3) with fronted internal arguments (il libro, 'the book'), occurring with a co-indexed object clitic (lo, 'it'; as mentioned in Figure 1, section 1). Prompt and output are given in Table 3. [...] 1990). As the output in Table 4 shows, the clitic is described "as not referring to any element in the sentence".

La sintassi della frase "il libro Mario lo ha preso" è corretta, ma può essere considerata ambigua.
'The syntax of the sentence "the book Mario it has taken" is correct, but it can be considered ambiguous. This is because the pronoun "lo" does not clearly refer to any specific word within the phrase, and it could be interpreted differently according to the context. A more straightforward form of the sentence could be "Mario has taken the book", where the position of the direct object is more common in the Italian language and clarifies which is the subject carrying out the action and which is the object undergoing the action.'

We now turn to non-standard Romance languages. We assume that the model might perform better with respect to texts in/about Neapolitan, one of the best-documented Italo-Romance varieties (cf. Basile 2021). As a matter of fact, ChatGPT correctly identifies the language but is not able to identify Information Structure phenomena such as 'double subjects' (Ledgeway 2011). While ChatGPT correctly identifies chillo's grammatical category (i.e., a demonstrative), it interprets it as a canonical demonstrative (Table 5, compare with screenshots 42-44 in the supplementary files) rather than a topic marker (and see Ledgeway 2011).

'The sentence "Ciro (topic) is blond" is in the Neapolitan dialect and it means "that Ciro is blond" in Standard Italian. As a virtual assistant, I cannot express any judgment on languages and on dialects, but I can underline that every language is beautiful and complex and that every dialect has its own history and culture behind. However, if you want to communicate in a formal context or in a situation where it is appropriate to use a standard language, it is important to use correct Italian.'
We then moved to smaller languages and attempted to write prompts in a small Apulian Italo-Romance variety spoken by around 15,000 people on the Gargano promontory (S. Marco in Lamis; Massaro 2022, 2023). In this case, the output wrongly labeled the language as Neapolitan (perhaps because of the presence of schwas, a typical feature of the vocalic systems of Continental Southern Italo-Romance varieties). As expected, the language does not seem to be present in the training dataset. However, we thought it would be worth observing the output and the model's behavior when it is confronted with a lower-resource language. As shown in Table 6, the prompt included a preposed prepositional accusative, in line with our aim of detecting syntactic bias in the model's understanding of the Left Periphery. We anticipated that the model might not retrieve a coherent answer, and that it might not be able to identify a sentence's basic constituents. The bot in fact mislabeled the verb construction as a determiner phrase, thus translating le vistə ('I have seen him') as 'the glasses'. It probably interpreted the Apulian clitic+auxiliary compound l.e as the Italian article le and, consequently, the past participle vistə as a noun, which it translated as 'glasses' (note: viste does not mean 'glasses' in Italian).

Finally, we explore a regional variety of Italian spoken in the same area. In the prompt, we used a left-peripheral internal argument (Michele). Roughly translated, the sentence we used means 'Is it (exactly) Michele that you saw?'. As shown in Table 7, in this case ChatGPT wrongly identified Michele as a vocative ('Michele, did you see?'). We interpret this as an indication that the model does not contain data from regional varieties of Italian, or that, at least in this case, it is biased against fronted internal arguments without an object clitic. In this instance, speakers interpret Michele as a direct object and not as a vocative thanks to specific intonational contours, a piece of information that is missing in the provided textual input. This suggests that these models are not good models of human linguistic competence (see also Dentella et al. 2023), at least with respect to the syntactic structures and the languages investigated in this paper.
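The two prompt frames documented for Italian (the acceptability question in Table 3 and the request for syntactic commentary in Table 4) can be written down as reusable templates. The following Python sketch only assembles prompt strings; it does not call any model or API. The frames are copied from Tables 3 and 4, while the names `TEMPLATES`, `build_prompt` and `build_battery` are hypothetical, and it is an assumption of this sketch that the same frames would be reused for other test sentences.

```python
# Prompt frames copied from Tables 3 and 4. All names below are hypothetical
# and introduced only for this illustration.
TEMPLATES = {
    "acceptability": "Si può dire: '{sentence}'?",
    "commentary": "Commenta sulla sintassi della seguente frase: {sentence}",
}


def build_prompt(kind, sentence):
    """Fill one prompt frame with a test sentence."""
    return TEMPLATES[kind].format(sentence=sentence)


def build_battery(sentences):
    """Pair every test sentence with every prompt frame."""
    return [
        (kind, sentence, build_prompt(kind, sentence))
        for sentence in sentences
        for kind in TEMPLATES
    ]


# The Italian test sentence used in Tables 3 and 4.
battery = build_battery(["il libro Mario lo ha preso"])
for kind, sentence, prompt in battery:
    print(f"{kind}: {prompt}")
```

Keeping the frame constant while varying only the test sentence makes it easier to attribute differences in the output to the syntactic configuration rather than to the wording of the prompt.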

Conclusion
In this paper, we aimed to answer the following questions:
i. Does ChatGPT find the reordering archaic because these orderings are rare/absent in the training?
ii. What research questions can be raised with respect to ChatGPT's metalinguistic knowledge through prompt engineering?
As for (i), we performed an observational study on general-domain corpora and we observed that reorderings are neither rare nor archaic. As for (ii), we focused on prompting different complex structures in different varieties. Our prompts are, however, not an exhaustive method, and future studies should take into account more comprehensive methods to extract grammatical information from LLMs.
Future research should also focus on how LLMs may encode grammaticality (cf. Haider 2023 on English), despite the fact that NNs can encode forms of felicitousness of the sentence (Samo & Chen 2022).
On the other hand, treating the output of ChatGPT (as well as of other types of conversational AI based on LLMs) as a corpus could be a worthwhile investment for syntactic research, in the spirit of quantitative and computational works on detecting micro- and macro-variation (Merlo 2015 and related works; Van Cranenbroek et al. 2019; Pescarini 2021, 2022; Crisma et al. 2021 inter alia).

(1)
a. Ho comprato il tuo libro
   have.1.SG bought the your book
   'I bought your book'
b. Il tuo libro, lo ho comprato
   The your book it have.1.SG bought
   'Your book, I bought it'
c. IL TUO LIBRO ho comprato __ (non il suo)
   The your book have.1.SG bought (not the his)
   'It is YOUR BOOK that I bought (not his)'

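Figure 1. User: Can you say (lit.) 'The book, Mario it has taken'? Reply: Yes, you can say (lit.) 'The book, Mario it has taken'. This is a correct form of Italian construction to indicate that Mario took the book. However, it should be noted that this is a very formal construction and it may sound archaic or antiquated. In Modern Italian, it is more common to adopt the construction 'Mario took the book', which is simpler and more direct.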

Table 2. Configurations, queries and one example (ID). Dedicated queries have been implemented in a Python environment run on grew.match.count.fr. Table 1 summarizes the queries and provides an example. All queries and data are available at the following repository: https://github.com/samo-g/massaro-itaLP

Table 3. Prompt: Si può dire: 'il libro Mario lo ha preso'? ('Can you say: 'the book, Mario it has taken'?')

Si può dire: 'il libro Mario lo ha preso'?
si can say the book Mario it has taken?
'Can one say: Is it Mario who has taken the book?'

Output:
Sì, si può dire "Il libro Mario lo ha preso". Questa è una forma corretta della costruzione Italiana della frase usata per indicare che Mario ha preso il libro. Tuttavia, va notato che questa è una costruzione molto formale e a volte può risultare un po' arcaica o antiquata. In italiano moderno, è più comune usare la costruzione "Mario ha preso il libro", che è più semplice e diretta.
'Yes, one can say "the book Mario it has taken". This is a correct form of the Italian construction of the phrase used to say that Mario has taken the book.

Table 4. Prompt: Commenta sulla sintassi della seguente frase: il libro Mario lo ha preso ('Comment on the syntax of the following sentence: 'it is Mario who has taken the book'')