Use of Prompt-Based Learning for Code-Mixed and Code-Switched Text Classification

Code-mixing and code-switching (CMCS) are prevalent phenomena observed in social media conversations and various other modes of communication. When developing Natural Language Processing (NLP) systems such as sentiment analysers and hate-speech detectors that operate on this social media data, CMCS text poses challenges. Recent studies have demonstrated that prompt-based learning of pre-trained language models (PLMs) outperforms full fine-tuning of PLMs across various NLP tasks. Despite the growing interest in CMCS text classification, the effectiveness of prompt-based learning for the task remains unexplored. Our study endeavours to bridge this gap by examining the impact of prompt-based learning on CMCS text classification. We discern that the performance in CMCS text classification is significantly influenced by the inclusion of multiple scripts and the intensity of code-mixing. In response, we introduce a novel method, Dynamic+AdapterPrompt, which employs distinct models for each script, integrated with adapters. While DynamicPrompt captures the script-specific representation of CMCS text, AdapterPrompt emphasizes capturing the task-oriented functionality. Our experiments span across Sinhala-English, Kannada-English, and Hindi-English


Introduction
Code-mixing involves borrowing words from one language and incorporating them into another without affecting the context. In the context of code-mixed text, we distinguish two subtypes: text comprising words that alternate between two languages and text transitioning from one script to another by substituting letters in a predictable manner (transliteration). Code-switching, also known as language alternation, occurs when individuals alternate between two or more languages within a single conversation or situation [1].
Code-mixing and code-switching (CMCS) are intricate phenomena of linguistic behaviour, characterized by the intentional or spontaneous alternation of languages within a single discourse. Another characteristic of CMCS data is lexical borrowing, where words or phrases from one language are used in another. Grammatical hybridity [2], a distinct feature of CMCS, results in blending grammatical structures from different languages. Furthermore, CMCS is influenced by linguistic, social, and cultural constraints, leading to a specific contextual framework.
CMCS is commonly observed in online conversations. A thorough understanding of CMCS data is pivotal for effective communication, advertising, sentiment analysis, and fostering inclusivity across language boundaries. However, the inherent characteristics of CMCS data introduce unique challenges to NLP systems. In particular, the inclusion of multiple scripts and lexical patterns and the potential misidentification of transliterated tokens pose challenges even to modern NLP systems when processing such text. These challenges are particularly pronounced when working with low-resource languages [3,4].
In recent years, the domain of NLP has witnessed remarkable advancements, notably propelled by the emergence of pre-trained language models (PLMs) [5,6]. These PLMs have been trained on extensive datasets, preserving a task-agnostic stance regarding the specific tasks for which they will be later used. To leverage the extensive knowledge embedded in PLMs for diverse NLP tasks, the PLM has to be fine-tuned with task-specific data [7]. This "pre-train and fine-tune" paradigm has been able to activate and harness the comprehensive knowledge within PLMs, leading to very promising results across various downstream tasks such as text classification and named entity recognition [7,8]. On the negative side, this paradigm faces challenges due to the disparity between pre-training and fine-tuning objectives, leading to inefficiencies in utilizing PLMs across diverse tasks, as they may be unstable in low-resource settings, and less transferable to new tasks after fine-tuning [7][8][9][10].
Prompt-based learning has recently been demonstrated to yield promising results compared to full fine-tuning of PLMs for many downstream tasks [10], even in low-resource scenarios [11]. This paradigm involves redefining downstream tasks using textual prompts, encompassing both prompt engineering and answer engineering [8]. In contrast to fine-tuning, prompt-based learning leverages the existing knowledge of PLMs by redefining downstream tasks as pre-training objectives [7,8,12]. This removes the need for extensive parameter updates in PLMs, thus preserving their transferability across various tasks. Prompt-based learning has been extended to incorporate pre-trained multilingual language models (PMLMs) as well, enabling experimentation in languages beyond English [13][14][15].
Despite exhibiting success over full fine-tuning for monolingual text, prompt-based learning of PLMs with CMCS data for downstream tasks has not been explored. In the context of CMCS data, we are only aware of full fine-tuning of PLMs [3,16]. Given that prompt-based learning relies on textual prompts, the design of such prompts for CMCS text is an open question. In other words, a prompt formulated in one language might not be suitable for effectively classifying CMCS data.
In this study, we focus on prompt-based learning for CMCS text classification. To the best of our knowledge, we are the first to explore prompt-based learning for CMCS text classification. Therefore, we first delve into the challenges surrounding CMCS text classification and the intricacies introduced by the presence of multiple scripts within a single text. Our experiments unveil that the performance of prompt-based CMCS text classification is influenced by the intensity of code-mixing and the inclusion of multiple scripts.
In response to the aforementioned challenges, we propose a novel methodology named Dynamic+AdapterPrompt. This approach employs distinct models for each script to generate script-specific representations by considering the script of the input sentence (DynamicPrompt). Additionally, it effectively captures task-specific representations necessary for the respective CMCS classification tasks through the utilization of adapters (AdapterPrompt). This amalgamated approach leverages the benefits of both adapters and dynamic script considerations.
We have conducted extensive experiments across Sinhala-English, Kannada-English, and Hindi-English datasets, for the tasks of sentiment classification, hate-speech detection, and humour detection. It is noteworthy that Sinhala and Kannada are categorized as low-resource languages [17]. The outcomes demonstrate that our novel approach, Dynamic+AdapterPrompt, outperforms the existing methodologies: full fine-tuning, adapter-based fine-tuning, and conventional prompt-based learning techniques.
The key contributions of this paper are:
• We present the first comprehensive exploration of the impact of the script in CMCS text classification.
• We introduce a novel prompt tuning approach for CMCS text classification, Dynamic+AdapterPrompt, to address the intricacies introduced by the inclusion of multiple scripts in CMCS data, providing script-specific and task-specific representations.

Related Work
In this section, we delve into three key areas: prompt-based learning, adapter-based fine-tuning of PLMs, and the challenges and advancements in CMCS text classification.

Prompt-Based Learning
Until recently, full fine-tuning, also known as vanilla fine-tuning, was the predominant method for adapting PLMs to downstream tasks [10,18,19]. In full fine-tuning, all the parameters of the PLM are trained for an underlying downstream task, which demands a significant amount of computational resources.
Prompt-based learning primarily comprises three key components: the prompt, the PLM, and the verbalizer [8]. As depicted in Figure 1, prompt engineering involves the selection of a prompt template for a downstream task. Early research used manually designed human-readable prompts, referred to as manual or discrete prompts [19,20,22]. Subsequent studies have shifted focus towards soft prompts, also known as continuous prompts, which are optimized during training for specific downstream tasks [19,20,22]. Answer engineering refers to the selection of the verbalizer. The verbalizer is the component that maps the predicted mask token of the PLM into the intended label [23], as illustrated in Figure 1. Verbalizers that are human-readable are denoted as discrete verbalizers, whereas soft verbalizers undergo optimization during the training process. Several studies have explored designing suitable verbalizers for downstream tasks, utilizing both discrete and soft tokens [9,13,24]. The primary aim of these studies has been to broaden the coverage of the answer space of the verbalizer for each respective label. The effectiveness of this pipeline is significantly determined by prompt engineering and answer engineering [25].

Adapter-Based Fine-Tuning of PLMs
Adapters are compact trainable modules that can be integrated into transformer layers. They provide a lightweight fine-tuning alternative to the full fine-tuning approach [26]. Houlsby [26] and Pfeiffer [27] are the two adapter architectures that are commonly used. The key distinction between these two is that the Houlsby adapter employs two down- and up-projection modules, whereas the Pfeiffer adapter utilizes only one module. Adapters can be generally categorized into two categories: task adapters, which learn task-specific representations, and language adapters, which learn language-specific representations [27]. Typically, language adapters are used in conjunction with task adapters [3,28]. Extensive research has been conducted on adapters as a parameter-efficient fine-tuning method for various tasks. In Rathnayake et al. [3], Sinhala-English CMCS text classification was performed employing different combinations of adapters, yielding improved results compared to full fine-tuning with minimal parameter updates. Moreover, Rücklé et al. [29] demonstrated the benefits of adapters beyond lightweight fine-tuning. They observed a minimal impact on task performance when adapters were dropped from the lower layers of the PLM.
The application of adapters has proven to be beneficial for prompt-based learning as well. Karimi Mahabadi et al. [12] introduced a few-shot learning method utilizing a masked language modelling objective, and leveraged task-specific adapters as a prompt-free strategy. Their experimental results showcased the effectiveness of this technique in comparison to manual and soft prompts.
Smaller language models face difficulties with soft prompts, as discussed by Shah et al. [30]. Li and Liang [19] and Reynolds and McDonell [31] suggest that, as the model size increases, the performance gap between prompt-based approaches and fine-tuning narrows, indicating that larger models tend to benefit more from fine-tuning. To enhance smaller language models' effectiveness, Shah et al. [30] suggest using adapters in combination with soft prompts. Their novel approach shows promise in optimizing smaller models, achieving up to 98% of the performance of full fine-tuning. This research holds the potential to significantly improve the efficiency of smaller language models.

CMCS Text Classification
Classifying CMCS text poses a significant challenge in NLP, largely due to the scarcity of annotated datasets, particularly in the context of low-resource languages. Despite these challenges, strides have been made in developing manually annotated CMCS text classification datasets for low-resource languages [1,3,16,32,33]. A range of deep-learning approaches has been employed for classifying CMCS data. For instance, Chathuranga and Ranathunga [34] and Kamble and Joshi [35] utilized techniques such as capsule networks, LSTM, and BiLSTM for CMCS text classification.
Currently, state-of-the-art performance in CMCS text classification is achieved using PMLMs [3,16,36,37]. However, Zhang et al. [38] showed that PMLMs are not perfectly code-switching compatible. When no training examples are provided (zero-shot), the observed performance of PLMs for CMCS-related tasks shows that these models are less effective compared to models that have been specifically trained for a task. Additionally, they exhibit limited learning capabilities in few-shot settings.

Datasets
We select three publicly available CMCS datasets. These cover low-resource languages (Sinhala, Kannada), as well as Hindi, a high-resource language [17]. They exhibit different levels of code-mixing and have been annotated for various classification tasks.
Note: Si-En, Ka-En, and Hi-En denote Sinhala-English, Kannada-English, and Hindi-English, respectively. Code-mixing intensity measures the extent to which multiple languages are integrated within a single sentence or discourse. The term [other] represents the script of the language combined with English in the CMCS context.
The first dataset [3] includes CMCS sentences in Sinhala and English languages. This dataset has been annotated for sentiment classification, humour detection, and hate-speech detection tasks. The second dataset [39] consists of Kannada and English CMCS content and has annotations for sentiment analysis and hate-speech detection. The Hindi-English dataset contains CMCS content written in the Latin script, which has been annotated for the humour detection task. Each language possesses its unique script for writing (Latin for English, Sinhala for Sinhala, Kannada for Kannada, and Devanagari for Hindi).
Altogether, these corpora exhibit six distinct CMCS variations with respect to the script used in training instances, as shown in Table 1. In the first two variants, the text is exclusively composed in one language, employing characters from the same language. Conversely, in the next two variants, the text is written in one language, utilizing characters from a different language. The fifth variant comprises sentences that alternate between languages, while the last variant involves sentences that blend elements from two or more of the aforementioned types.
To better analyze the extent of language mixing, we systematically classify sentences in each corpus based on the percentage of characters from each script, as outlined in Algorithm 1. Opting to examine sentences at the character level, as opposed to the word level, enables us to capture finer details of language mixing. In CMCS sentences, particularly in informal communication, individual words can seamlessly blend characters from multiple scripts.
Algorithm 1: Instance Classification Based on Script. The term [other] represents the script of the language combined with English in the CMCS context. (Only fragments of the listing survive: each token is modified by removing symbols and numbers, and a loop then runs over all characters in the token.)
A threshold of 100% is considered, implying that if all characters in a sentence belong to one script, it is categorized under that script; otherwise, it is labelled as a mixed-script sentence. As shown in Table 1, note that the Hindi-English dataset does not have content in Devanagari script; instead, Hindi words have been written in Latin script. Comprehensive statistics for all three datasets are provided in Appendix A.
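The character-level classification described above can be sketched as follows. This is a minimal illustration and not the authors' Algorithm 1: the function names, the use of Unicode block ranges to identify scripts, and the "Unknown" fallback for sentences without any countable character are assumptions made for the example.

```python
# Unicode block ranges used to recognise each non-Latin script.
# Using block ranges is an assumption of this sketch, not the paper's method.
SCRIPT_RANGES = {
    "Sinhala": (0x0D80, 0x0DFF),
    "Kannada": (0x0C80, 0x0CFF),
    "Devanagari": (0x0900, 0x097F),
}


def char_script(ch):
    """Return the script of one character, or None if it should be ignored."""
    code = ord(ch)
    for name, (low, high) in SCRIPT_RANGES.items():
        if low <= code <= high:
            return name
    if ch.isascii() and ch.isalpha():
        return "Latin"
    # Digits, punctuation, whitespace, emoji and similar symbols are skipped.
    return None


def classify_script(sentence, threshold=1.0):
    """Label a sentence with a single script, or 'Mixed' below the threshold."""
    counts = {}
    for ch in sentence:
        script = char_script(ch)
        if script is not None:
            counts[script] = counts.get(script, 0) + 1
    total = sum(counts.values())
    if total == 0:
        return "Unknown"  # hypothetical fallback; not described in the paper
    script, n = max(counts.items(), key=lambda item: item[1])
    # threshold=1.0 corresponds to the 100% threshold used in the paper.
    return script if n / total >= threshold else "Mixed"
```

With the default threshold, `classify_script("I passed the wibhagaya")` yields "Latin", while a sentence containing even one Sinhala character next to Latin letters is labelled "Mixed".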

Baselines
For our experiments, we use a random baseline (assigning class labels to instances without any predetermined criteria) and a majority/minority class baseline (where classification is based on the majority or minority class), along with three additional baselines associated with PLMs. As mentioned earlier, only full fine-tuning and adapter-based fine-tuning of PLMs have been employed for CMCS text classification [3]. Therefore, we use these two techniques as our baselines. Basic prompting entails training artificial tokens while maintaining the frozen state of the PLM. Since prompt-based learning has not been attempted previously for CMCS text classification, we utilize Soft Prompt + Soft Verbalizer as our baseline for prompting.

Full Fine-Tuning (Full FT)
We train the PLM by updating all parameters, including the task-dependent sequence classification head added on top, as proposed by Devlin et al. [6]. Throughout this process, the PLM weights are adjusted using task-specific data, which facilitates the learning of task-specific representations. We fine-tune the PLM separately for each downstream task (single-task fine-tuning).

Adapter-Based Fine-Tuning (Adapter-Based FT)
We integrate randomly initialized adapters into the PLM. During the fine-tuning phase, we specifically train the introduced adapter parameters while keeping the original PLM parameters frozen. For each downstream classification task, we train distinct sets of adapters. We experiment with both Houlsby [26] and Pfeiffer [27] adapter architectures.
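To give a rough sense of how lightweight the two adapter architectures are, the following sketch counts the trainable parameters of bottleneck adapters. The hidden size of 768 and the 12 layers correspond to XLM-RoBERTa-base; the reduction factor of 16, the function names, and the omission of any layer-normalisation parameters are assumptions of this illustration, not values taken from the paper.

```python
def bottleneck_params(hidden, bottleneck):
    """Parameters of one adapter module: down-projection plus up-projection."""
    down = hidden * bottleneck + bottleneck  # weights + bias
    up = bottleneck * hidden + hidden        # weights + bias
    return down + up


def adapter_params(arch, hidden=768, layers=12, reduction=16):
    """Total adapter parameters for an architecture across all layers.

    Houlsby places two bottleneck modules in each transformer layer,
    Pfeiffer places one. The reduction factor is an assumed value.
    """
    modules_per_layer = {"houlsby": 2, "pfeiffer": 1}[arch]
    bottleneck = hidden // reduction
    return layers * modules_per_layer * bottleneck_params(hidden, bottleneck)


if __name__ == "__main__":
    for arch in ("houlsby", "pfeiffer"):
        print(arch, adapter_params(arch))
```

Under these assumptions the Houlsby configuration trains exactly twice as many adapter parameters as the Pfeiffer one, and both remain a small fraction of the frozen PLM.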

Prompt-Based Learning with Soft Prompt + Soft Verbalizer (SP+SV)
We employ soft prompt (SP), which comprises artificial token embeddings, and soft verbalizer (SV), which consists of artificial tokens in label words, with the PLM, as proposed by Hambardzumyan et al. [24]. SP and SV replace traditional discrete tokens with artificial ones. During the training phase, we fine-tune SP and SV while keeping the PLM parameters frozen.

Experimental Setup
PMLMs excel in CMCS text classification by leveraging their contextual understanding and transfer learning capabilities. Their multilingual proficiency, derived from diverse training datasets, enables effective handling of language variations within the same text. In this study, we utilize the XLM-RoBERTa-base (XLM-R) [5] model as the PMLM for our experiments. This choice is motivated by the fact that the XLM-R model has been pre-trained on a range of languages, including the languages considered in our study. Moreover, it proves to be well-suited for our work, particularly within the constraints of a resource-efficient computing infrastructure. For full fine-tuning and adapter-based fine-tuning, we employ the code released by Rathnayake et al. [3]. We implement all the prompt-based learning models using the OpenPrompt [23] framework, which supports Hugging Face Transformers and is built upon the PyTorch framework. For adapter-based implementations within OpenPrompt, we utilize the Adapter-Transformers library, which is built on Hugging Face Transformers. The datasets specified in Section 3 are partitioned into training, validation, and testing subsets in a stratified manner, with respective proportions of 80%, 10%, and 10% (statistics are provided in Appendix A). As suggested by Rathnayake et al. [3], we employ Random Oversampling (ROS) to address the class imbalance issue of the hate-speech detection task within the Sinhala-English CMCS dataset. Given the pronounced class imbalance in these datasets, we adopt the macro F1-score as our primary evaluation metric, as it facilitates a more consistent and reliable comparison.
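The two imbalance-related choices above, random oversampling and macro-averaged F1, can be illustrated with a small dependency-free sketch. The function names and the toy labels are hypothetical, and oversampling every class up to the size of the largest class is an assumption about how ROS is configured.

```python
import random


def random_oversample(texts, labels, seed=42):
    """Duplicate randomly chosen minority-class items until classes are balanced."""
    rng = random.Random(seed)
    by_label = {}
    for text, label in zip(texts, labels):
        by_label.setdefault(label, []).append(text)
    target = max(len(items) for items in by_label.values())
    pairs = []
    for label, items in by_label.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        pairs.extend((text, label) for text in items + extra)
    rng.shuffle(pairs)
    return [t for t, _ in pairs], [y for _, y in pairs]


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # A majority-class predictor on a 9:1 toy split: high accuracy, low macro F1.
    y_true = ["hate"] + ["not"] * 9
    y_pred = ["not"] * 10
    print(round(macro_f1(y_true, y_pred), 4))
```

On the toy split, the majority-class predictor is correct on 9 of 10 instances, yet its macro F1 stays below 0.5 because the minority class contributes an F1 of zero. This is why the metric is more informative than accuracy under class imbalance. In practice, oversampling would be applied to the training split only.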
All models are tested across three different seeds (8, 42, 77), and the average results are reported. The maximum sequence length for the input sentence is set at 128. We conduct each experiment for 20 epochs with a batch size of 32. Early stopping is employed in the experiments with a patience of 5 epochs. An evaluation is conducted at the end of each epoch, and the best-performing model is chosen for testing. We use the Adam optimizer as the gradient optimizer, paired with a linear learning rate scheduler. Additionally, a grid search is conducted for hyperparameter tuning to boost the performance of each model. The optimized hyperparameters for each experiment are delineated in Appendix B. All the experiments are conducted using NVIDIA Tesla P100 GPU machines on the Kaggle platform.
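The model-selection protocol described above (per-epoch evaluation, early stopping with a patience of 5, and averaging over three seeds) can be sketched as below. The helper names and the idea of passing a per-seed `run_fn` callable are assumptions of this example; the actual training loop is not shown in the paper.

```python
from statistics import mean

SEEDS = (8, 42, 77)  # seeds reported in the paper


def best_epoch_with_early_stopping(val_scores, patience=5):
    """Return (epoch, score) of the best checkpoint under early stopping.

    val_scores holds one validation score per epoch, in training order.
    Training stops once `patience` consecutive epochs fail to improve.
    """
    best_score, best_epoch, waited = float("-inf"), -1, 0
    for epoch, score in enumerate(val_scores):
        if score > best_score:
            best_score, best_epoch, waited = score, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_epoch, best_score


def average_over_seeds(run_fn, seeds=SEEDS):
    """Average the test score returned by run_fn(seed) over all seeds."""
    return mean(run_fn(seed) for seed in seeds)
```

Note that a late improvement occurring after the patience window has elapsed is never observed, which is the usual trade-off of early stopping.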

Impact of Script Variation and Code-Mixing Intensity on CMCS Text Classification
The variations in CMCS data, as outlined in Table 1, underscore the unique properties and characteristics inherent to CMCS data. To better understand the complexities in handling CMCS text, let's consider an illustrative example: "I passed the විභාගය". This sentence illustrates a classic instance of Sinhala-English code-mixing. The word "විභාගය" is a Sinhala term for "examination". When entirely transliterated into the Latin script, it might read: "I passed the wibhagaya".
For a model primarily trained on English data, the term might be unfamiliar.Conversely, for a model with extensive Sinhala training, the transliterated version "wibhagaya" might pose confusion.
To explore the influence of scripts on CMCS text classification, we conduct training on baseline models using the comprehensive training set outlined in Section 5, which encompasses training samples from all scripts. Table 2 illustrates the script-wise results of this experiment. In the Sinhala-English context, despite the Latin script demonstrating the best performance in full fine-tuning, the Sinhala script outperforms the other two scripts in adapter-based fine-tuning and SP+SV. Conversely, in the Kannada-English context, the Latin script yields the highest performance in full fine-tuning and SP+SV, while the Mixed script excels in adapter-based fine-tuning. The sentences in the Kannada script exhibit the lowest results. It is evident that in both CMCS contexts, significant performance variations exist based on the script of the training instance. To delve further into the impact of script on CMCS text classification, we create distinct training sets by considering the language script of the training instances. These training sets are employed to train the models, each focusing on a single script, to investigate the impact of the training script. For both the Sinhala-English and Kannada-English datasets, we first create separate training sets such that each training set contains only training instances written in one type of script (Latin, Sinhala/Kannada, Mixed). The first experiment involves selecting 10% of the total training data from the Latin script, Sinhala/Kannada script, and mixed script sentences, respectively.
We then expand our analysis to include a larger subset, comprising 20% of the overall training data for each script category. In this phase, 20% of the training data is selected from the Latin script instances and another 20% from a combined subset of Sinhala/Kannada and mixed script instances, because the sentences in Sinhala/Kannada and mixed scripts individually constitute less than 20% of the overall training data. Each subset is stratified based on the label distribution of the task, ensuring a balanced representation of task labels within each script category. Subsequently, we train the underlying PLM, the XLM-R model, utilizing the aforementioned training subsets.
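The construction of these script-specific training subsets can be sketched as follows. This is a minimal illustration under stated assumptions: the example records are dictionaries with hypothetical "label" and "script" keys, the subset size is a fraction of the whole training set, and stratification is interpreted as preserving the label proportions found within the chosen script pool.

```python
import random
from collections import defaultdict


def stratified_script_subset(examples, scripts, fraction, seed=42):
    """Sample `fraction` of the full training set from the given scripts only.

    Label proportions inside the script pool are preserved; because of
    rounding, the subset size may differ from the budget by a few items.
    """
    rng = random.Random(seed)
    pool = [ex for ex in examples if ex["script"] in scripts]
    budget = round(fraction * len(examples))
    if budget > len(pool):
        # Mirrors the situation where a script alone has too few sentences.
        raise ValueError("not enough instances in the requested script(s)")
    by_label = defaultdict(list)
    for ex in pool:
        by_label[ex["label"]].append(ex)
    subset = []
    for items in by_label.values():
        k = round(budget * len(items) / len(pool))
        subset.extend(rng.sample(items, min(k, len(items))))
    return subset
```

Requesting 20% from a single minority script raises an error in this sketch, which corresponds to the reason given above for merging the Sinhala/Kannada and mixed-script instances into one pool at the 20% level.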
The test dataset utilized in all experiments is as described in Section 5. Note that the Hindi-English dataset, consisting solely of Latin script instances, is excluded from this particular experiment.Also, note that this analysis is focused exclusively on the sentiment classification task.
Table 3 shows that in the Sinhala-English context, training on Latin-script sentences leads to optimal results across all three baselines. With the 10% training dataset proportion, full fine-tuning shows similar outcomes when trained on either Sinhala-script or mixed-script sentences. With adapter-based fine-tuning, training on mixed-script sentences outperforms Sinhala, whereas SP+SV shows better results  The prompt-based learning baseline, SP+SV, reaches its highest performance with Latin-script training in both Sinhala-English and Kannada-English contexts. It can be observed that the performance of full fine-tuning, adapter-based fine-tuning, and SP+SV for both datasets exhibits significant fluctuations based on the training script.
Revisiting our example, when trained on Latin-script sentences, the model might proficiently classify the sentence "I passed the wibhagaya" due to the dominance of Latin content. However, the original sentence, "I passed the විභාගය", which blends Sinhala and Latin scripts, might be more challenging, primarily contingent on the model's familiarity with Sinhala characters.
These experiments show that classification performance depends on the script of the input sentence. We therefore conclude that CMCS text classification performance is significantly influenced by the inclusion of multiple scripts and the intensity of code-mixing.

Optimizing Prompt-Based Learning Through Script-Based Adaptations
To address the limitations observed when employing soft prompts with small language models [30], as mentioned in Section 2.2, we conduct experiments with adapters in the context of prompt-based learning, referred to as AdapterPrompt. As elaborated in Section 6, the effectiveness of prompt-based learning for CMCS text classification depends on the script of the input text. To address this dependency, we believe that instead of using the same soft prompt and soft verbalizer for inputs of different scripts, dynamically determining the prompt and verbalizer based on the input script could enhance the model's capability. To achieve this, we propose DynamicPrompt. Finally, we combine DynamicPrompt with adapters, forming a fusion of DynamicPrompt and AdapterPrompt, to leverage the strengths of both and present Dynamic+AdapterPrompt.

AdapterPrompt
As mentioned in Section 2.2, when utilizing soft prompts, the effectiveness of small pre-trained language models such as XLM-RoBERTa-base diminishes, thereby reducing the efficacy of prompt-based learning [30]. To mitigate this, we utilize AdapterPrompt, while preserving the static state of the PLM parameters. In AdapterPrompt, we integrate adapters with the SP+SV model to classify CMCS data, as depicted in Figure 2. Instead of solely querying the PLM using soft prompts as in the SP+SV approach, we incorporate task adapters into the PLM. This enhancement augments the task-specific representation for the underlying task.
We experiment with the two commonly used adapter architectures, Houlsby [26] and Pfeiffer [27], both integrated into SP+SV models. Additionally, employing the adapter-dropping technique [29], we progressively remove adapters, starting from higher layers of the PLM, as illustrated in Figure 3. This iterative process aims to identify the optimal set of adapters necessary to effectively acquire the task-specific representations for the respective task associated with the adapters.
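The progressive removal of adapters from the higher layers can be expressed as a simple schedule of active-layer sets. The sketch below assumes the 12 transformer layers of XLM-RoBERTa-base, indexed from 0; the function name and the choice to continue removal down to a single active layer are assumptions of this illustration.

```python
def adapter_drop_schedule(num_layers=12):
    """List the sets of layers that keep their adapters at each step.

    The first entry keeps adapters in every layer; each later entry drops
    the adapter of the highest remaining layer.
    """
    return [list(range(k)) for k in range(num_layers, 0, -1)]


if __name__ == "__main__":
    for active in adapter_drop_schedule():
        print("adapters active in layers %d-%d" % (active[0], active[-1]))
```

Each configuration in the schedule would be trained and evaluated separately, and the one with the best validation score retained.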

DynamicPrompt
Our DynamicPrompt architecture consists of separate SP+SV models, each optimized for a specific script category. Each model is trained exclusively with sentences from its respective script, yielding script-specific representations. While the PLM remains frozen, all SP+SV models share this common PLM. The script of the input sentence is programmatically determined based on the percentage of characters from each script, as elaborated in Section 3. A threshold of 100% is applied, meaning that if all characters in a sentence belong to one script, it is categorized under that script; otherwise, it is labelled as a mixed-script sentence.
Based on the identified script, the corresponding SP+SV model is selected dynamically. We then concatenate the soft prompt with the input sentence and feed it into the PLM, which predicts the masked token based on the surrounding context. Subsequently, the soft verbalizer maps the predicted answer tokens to the corresponding label using soft answer tokens. The architecture of DynamicPrompt is illustrated in Figure 4.
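The dynamic selection step can be sketched as a small dispatcher. In this illustration the per-script SP+SV models are stand-in callables rather than real OpenPrompt models, and the class names, the simplified `detect_script` helper, and the category name "Other" (for the non-Latin script of the language pair) are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict


def detect_script(sentence: str) -> str:
    """Simplified script detector with a 100% threshold over letters."""
    letters = [ch for ch in sentence if ch.isalpha()]
    latin = sum(1 for ch in letters if ch.isascii())
    if letters and latin == len(letters):
        return "Latin"
    if letters and latin == 0:
        return "Other"
    # Sentences mixing scripts (or containing no letters) fall back to "Mixed".
    return "Mixed"


@dataclass
class ScriptPromptModel:
    """Stand-in for one SP+SV model sharing the frozen PLM."""
    script: str
    predict: Callable[[str], str]


class DynamicPromptRouter:
    """Route each sentence to the SP+SV model trained on its script."""

    def __init__(self, models: Dict[str, ScriptPromptModel]):
        self.models = models

    def predict(self, sentence: str) -> str:
        script = detect_script(sentence)
        return self.models[script].predict(sentence)
```

A router built with one `ScriptPromptModel` per category ("Latin", "Other", "Mixed") then behaves like a single classifier from the caller's point of view, while only the selected soft prompt and soft verbalizer are used for a given input.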

Dynamic+AdapterPrompt
By amalgamating the aforementioned approaches, we propose a novel prompt-based learning methodology termed Dynamic+AdapterPrompt. With the introduction of Dynamic+AdapterPrompt, we train separate SP+SV models, each augmented with adapters, for each script category. Each model is exclusively trained on sentences corresponding to its designated script, thereby providing a script-specific representation for enhanced classification. The frozen PLM serves as the backbone shared across all the SP+SV models, with adapters being integrated into the PLM to encapsulate task-specific functionality. This strategy effectively capitalizes on the inherent strengths of both DynamicPrompt and AdapterPrompt.
Two architectural variants of Dynamic+AdapterPrompt can be implemented, employing distinct methods for integrating adapters to the PLM.
• Separate SP+SV models for each script category with shared adapters across all models: Here, we employ shared adapters across all the SP+SV models. The goal of this approach is to allow the models to leverage common task-specific functionality through shared adapters while simultaneously benefiting from the script-specific representation provided by the separate SP+SV models. The architecture of this variant is delineated in Figure 5.
• Separate SP+SV models for each script category with distinct adapters for each model: In this variant, we integrate adapters with the PLM separately for each SP+SV model. The objective of this approach is to facilitate fine-tuning and adaptation that are specific to the characteristics of each script category.
In our experiments, we explore the first architectural variant, employing separate SP+SV models for each script category, while applying shared adapters across all models. Subsequently, in an ablation study detailed in Section 8.3, we investigate the second variant to assess the effectiveness of both architectures and to determine the impact and viability of such architectural variants. Within these variations, we experiment with the Houlsby adapter architecture.
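The difference between the two variants comes down to how many adapter sets are trained and which SP+SV model uses which set. The following sketch records that wiring as plain configuration; the component identifiers and the script category names are hypothetical labels for this illustration, not names used in the paper's implementation.

```python
SCRIPTS = ("Latin", "Other", "Mixed")  # assumed script categories


def build_variant(shared_adapters):
    """Map each script to its SP+SV model and the adapter set it uses."""
    return {
        script: {
            "prompt_model": "sp_sv_" + script,
            "adapter": "adapter_shared" if shared_adapters else "adapter_" + script,
        }
        for script in SCRIPTS
    }


def num_adapter_sets(variant):
    """Count the distinct adapter sets that must be trained and stored."""
    return len({cfg["adapter"] for cfg in variant.values()})


if __name__ == "__main__":
    print(num_adapter_sets(build_variant(shared_adapters=True)))
    print(num_adapter_sets(build_variant(shared_adapters=False)))
```

With shared adapters, a single adapter set receives training signal from every script, whereas the distinct-adapter variant multiplies the adapter parameters by the number of script categories and trains each set on its own script only.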

Evaluation
In this section, we evaluate the effectiveness of the proposed Dynamic+AdapterPrompt approach compared to the baselines. Subsequently, we conduct two ablation studies, with a particular focus on the sentiment classification task in the Sinhala-English context: a script-based analysis to examine the impact of different scripts, and a comparative study to explore the effectiveness of the two adapter integration architectures within Dynamic+AdapterPrompt, namely shared adapters and script-wise adapters. The training and test datasets utilized in all experiments are as outlined in Section 5. Throughout this section, Ac. denotes Accuracy, while Precision (Pr.), Recall (Re.), and F1 correspond to macro averages.

Overall Evaluation
We first conducted a detailed study to determine the optimal adapter architecture for prompt-based learning in CMCS text classification. In the results for the Sinhala-English sentiment analysis task, as depicted in Appendix C, the Houlsby architecture, with adapters active in layers 0-10, yielded the highest performance in AdapterPrompt. Therefore, the results reported in Tables 4, 5, and 6 are with the Houlsby adapter architecture.
In the baseline evaluation presented in Tables 4, 5, and 6 for the majority class, minority class, and random baselines, it is observed that the random baseline outperformed the majority and minority class baselines.However, all three of these baselines exhibit significantly lower performance in comparison to the baselines associated with PLMs.
Notably, the SP+SV approach outperforms the XLM-R full fine-tuning in all cases, except in the Hindi-English context.It achieves superior or competitive results compared to adapter-based fine-tuning, except for hate-speech detection in the Sinhala-English context.The enhanced performance of SP+SV over full fine-tuning can be traced back to the substantial discrepancy in objectives during the pretraining and fine-tuning phases within the fine-tuning paradigm, which hinders the full exploitation of knowledge within PLMs, as we discussed in Section 2.1.AdapterPrompt consistently demonstrates superiority over the baseline results in sentiment classification, hate-speech detection, and humour detection as shown in Tables 4, 5, and 6 across all language contexts, except hate-speech detection in the Sinhala-English context.This observation is aligned with previous research that highlights the effectiveness of integrating adapters into the PLM within the fine-tuning paradigm for improving CMCS text classification [3].Importantly, our findings reiterate this trend, emphasizing that even within the prompt-based learning paradigm, the integration of adapters into the PLM results in performance improvements.This strategic use of adapters significantly enhanced the SP+SV approach, showcasing a substantial improvement in the model's understanding of specific task intricacies by providing task-specific representations.
DynamicPrompt performs worse than the baselines in all language contexts, except for sentiment classification in the Sinhala-English context. It also yields inferior results compared to AdapterPrompt across all tasks and language contexts. Section 8.2 analyses the models' performance across different script categories in depth. Despite the lower performance of DynamicPrompt alone, the Dynamic+AdapterPrompt approach outperforms both DynamicPrompt and AdapterPrompt in the majority of tasks across all language contexts (except for sentiment classification). This improvement can be attributed to the adapters' ability to learn task-specific representations, while the soft prompt and soft verbalizer within each model acquire script-specific knowledge for classification. The combined approach therefore addresses challenges intrinsic to both script and task across the language combinations studied. Note that for humour detection in Hindi-English, DynamicPrompt and Dynamic+AdapterPrompt are not explored because the dataset is entirely in the Latin script.
In summary, the SP+SV approach demonstrates superior or competitive results compared to XLM-R full fine-tuning and adapter-based fine-tuning for most tasks, with only a few exceptions. AdapterPrompt outperforms the baseline results in nearly all cases, showing the effectiveness of integrating adapters into the PLM within the prompt-based learning paradigm. DynamicPrompt alone exhibits lower overall performance. However, the combination of DynamicPrompt and AdapterPrompt, Dynamic+AdapterPrompt, emerges as the most effective strategy. This underscores the benefit of combining script-based prompts and adapters to address script and task variations in CMCS text classification.

Script-Based Analysis
Section 6 revealed a significant variance in performance depending on the script of the input sentence, particularly for prompt-based learning. In this ablation study, we further analyse the impact of the script, using the sentiment classification task in the Sinhala-English context as a case study.
The discrepancy introduced by the inclusion of multiple scripts is further highlighted by the findings presented in Table 7. Despite the integration of adapters into SP+SV (AdapterPrompt), this variance persists. This can be ascribed to the adapters providing only task-specific representations, which may not fully rectify the script-related disparities. However, AdapterPrompt is notably effective when the input is in a single script.
Although DynamicPrompt has relatively lower performance, it reduces the script's influence on CMCS text classification, as demonstrated in Table 7. Because DynamicPrompt provides script-specific representations, it is more robust to script variations than fine-tuning, SP+SV, and AdapterPrompt. This underscores its contribution to addressing the challenges of multiple scripts in CMCS text classification.
For Dynamic+AdapterPrompt, the results indicate that integrating adapters leads to a variance in performance across scripts, similar to that observed for AdapterPrompt. However, it narrows the gap between Latin and Mixed scripts compared to SP+SV and AdapterPrompt, owing to the script-specific representations provided by DynamicPrompt.
In conclusion, DynamicPrompt helps reduce script variations, and AdapterPrompt enhances performance with task-specific representations. The combination, Dynamic+AdapterPrompt, achieves even better results by leveraging the strengths of both approaches.

Evaluating the Efficacy of the Architecture in Dynamic+AdapterPrompt
In Section 7.3, we employ the shared adapter architecture; however, the script-wise adapter architecture remains a viable alternative. To compare the effectiveness of these two adapter integration architectures, we conduct an ablation study, with results presented in Table 8. The superiority of the shared adapter architecture can be attributed to its capacity to develop a uniform task representation across all scripts. Because the adapters are shared across all SP+SV models, they benefit from the knowledge obtained from the entire training dataset. Conversely, script-wise adapters learn only from the samples of the script with which they are associated, resulting in a narrower, script-dependent understanding. Consequently, the shared adapter architecture enables Dynamic+AdapterPrompt to leverage a wider range of training data, leading to markedly better performance than the script-wise adapter architecture.
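The data-exposure argument above can be made concrete with a small sketch that counts how many training sentences would update each adapter under the two architectures. The function name, the adapter names, and the script counts in the usage note are hypothetical; the sketch only illustrates the reasoning and is not the released implementation.

```python
from collections import Counter


def adapter_training_exposure(script_tags, shared=True):
    """Count how many training sentences each adapter would be trained on.

    script_tags: one script label per training sentence (e.g. "latin").
    shared=True:  a single adapter reused by every script-specific SP+SV model.
    shared=False: a separate adapter attached to each script's model.
    """
    if shared:
        # Every sentence, whatever its script, updates the same adapter.
        return {"shared_adapter": len(script_tags)}
    # Each adapter only sees the sentences routed to its own script.
    return {f"{script}_adapter": n for script, n in Counter(script_tags).items()}
```

For a hypothetical split of 6 Latin-script, 1 Sinhala-script, and 3 mixed-script sentences, the shared adapter is trained on all 10 sentences, whereas the Sinhala-script adapter in the script-wise setting is trained on only one.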

Conclusion
In this paper, we explore the potential of prompt-based learning for CMCS text classification. Our experiments on the impact of the script reveal that the effectiveness of prompt-based CMCS text classification is significantly influenced by the inclusion of multiple scripts and the intensity of code-mixing. In light of these findings, we propose a novel prompt-based tuning method named Dynamic+AdapterPrompt. It employs separate models for each script category, integrated with adapters, to capture both the script-specific representation and the task-oriented functionality of CMCS text. The experimental results show that the proposed method outperforms strong baselines across various CMCS contexts and text classification tasks. This underscores its robustness and efficiency in classifying CMCS text, particularly text involving low-resource languages. We have released our code to facilitate future research. In future work, we intend to investigate multi-task learning for code-mixed text classification using prompt-based learning techniques.

Statements and Declarations

Funding
No external funding was received for this research.

Appendix C AdapterPrompt Performance
Table C7 illustrates the results of the comprehensive study conducted to identify the optimal adapter architecture for prompt-based learning in CMCS text classification, with a specific emphasis on the sentiment classification task in the Sinhala-English context.
Full fine-tuning also struggles to fully exploit the linguistic knowledge acquired during pre-training, owing to the disparity between the objectives of the pre-training and fine-tuning stages [7, 20, 21]. While pre-training typically encompasses self-supervised tasks such as masked language modelling, full fine-tuning has to use task-specific training objectives (e.g. classification, sequence labelling, or generation). Prompt-based learning aims to bridge this gap between pre-training and fine-tuning objectives. In other words, prompt-based learning reformulates downstream tasks to resemble the training objectives used during PLM pre-training [8]. For encoder-based models that use a masked language modelling objective, one such reformulation technique is to convert the downstream task into a cloze-style format, as illustrated in Figure 1.
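A minimal sketch of the cloze-style reformulation is given below. It uses a manual (hard) template and a manual verbalizer, which is simpler than the soft prompt and soft verbalizer (SP+SV) studied in this paper. The template wording, the label words, and the function names are assumptions chosen for illustration; only the mask string `<mask>` corresponds to XLM-R's actual mask token.

```python
MASK = "<mask>"  # XLM-R's mask token; other PLMs use a different string

# Hypothetical manual verbalizer: each class is tied to a few label words.
VERBALIZER = {
    "positive": ["good", "great"],
    "negative": ["bad", "terrible"],
}


def to_cloze(sentence, template="{text} It was {mask}."):
    """Wrap an input sentence in a cloze template with one mask slot."""
    return template.format(text=sentence, mask=MASK)


def label_from_mask_scores(mask_scores, verbalizer=VERBALIZER):
    """Pick the class whose label words score highest at the mask position.

    mask_scores: mapping from vocabulary word to the score the masked
    language model assigns to that word at the mask slot (toy input here).
    """
    def class_score(words):
        # Average the scores of the class's label words; unseen words count as 0.
        return sum(mask_scores.get(w, 0.0) for w in words) / len(words)

    return max(verbalizer, key=lambda label: class_score(verbalizer[label]))
```

In this form the classifier reuses the masked language modelling head: the model fills the mask slot, and the verbalizer maps the predicted words back to class labels.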

for all instances in corpus do
    Initialize counters for total characters, Latin script characters, and [other] script characters
    for all tokens in instance do
        if token is tagged as irrelevant or only numbers then
            ...
        Calculate percentages of Latin and [other] characters
        Classify the token based on percentage
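The character-percentage idea in the algorithm fragment above can be sketched as follows. This is a simplified, sentence-level approximation: the steps missing from the fragment are not reproduced, and the 90% threshold, the category names, and the function names are assumptions made for illustration.

```python
import unicodedata


def script_percentages(text):
    """Return (latin_pct, other_pct) computed over alphabetic characters."""
    latin = other = 0
    for ch in text:
        if not ch.isalpha():
            continue  # digits, punctuation, and spaces carry no script signal
        if "LATIN" in unicodedata.name(ch, ""):
            latin += 1
        else:
            other += 1
    total = latin + other
    if total == 0:
        return 0.0, 0.0
    return 100.0 * latin / total, 100.0 * other / total


def classify_script(text, threshold=90.0):
    """Label text as "latin", "other", "mixed", or "irrelevant".

    threshold is a hypothetical cut-off: a script must account for at
    least this percentage of alphabetic characters to label the text.
    """
    latin_pct, other_pct = script_percentages(text)
    if latin_pct == 0.0 and other_pct == 0.0:
        return "irrelevant"  # no alphabetic characters, e.g. only numbers
    if latin_pct >= threshold:
        return "latin"
    if other_pct >= threshold:
        return "other"
    return "mixed"
```

A classifier of this kind could be used to route each input sentence to the model for its script category, which is the role script identification plays in DynamicPrompt.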

with Sinhala-script compared to the mixed script. At a 20% training dataset proportion, combinations of Sinhala-script and mixed-script sentences underperform compared to training with Latin-script sentences. In the Kannada-English context, at the 10% training dataset proportion, training on mixed-script sentences outperforms training on Latin-script and Kannada-script sentences for full fine-tuning and adapter-based fine-tuning, while SP+SV is most effective with Latin-script sentences. Training on Kannada-script sentences results in the lowest performance across all baselines. At a 20% training dataset proportion, the patterns between Latin-script and Kannada+mixed-script training resemble those at the 10% level.

Fig. 4 DynamicPrompt Architecture: A Sinhala-English Example. The input sentence translates into English as "I like to watch cricket matches".

Fig. 5 Dynamic+AdapterPrompt Architecture with Shared Adapters: A Sinhala-English Example. The input sentence translates into English as "I like to watch cricket matches".

Table 1 Variations of CMCS Data Across Datasets.

Table 2 Results by Script obtained through training using the entire training dataset: Sentiment Classification.
1 A-B FT stands for Adapter-Based Fine-Tuning.

Table 4 Overall Results: Sentiment Classification

Table 6 Overall Results: Humour Detection

Table 7 Results by Script: Sentiment Classification for Sinhala-English

Table 8 Dynamic+AdapterPrompt Results by Architecture: Sentiment Classification for