Authorship Attribution of Short Historical Arabic Texts using Stylometric Features and a KNN Classifier with Limited Training Data

Abstract: Authorship Attribution (AA) is the task of recognizing the author of an unknown text based on writing style. Among the various approaches to the AA problem, stylometry is a promising one. This paper explores the use of a K-Nearest Neighbor (KNN) classifier combined with stylometric features to perform AA. The study indicates the robustness of KNN in performing AA on short historical Arabic texts written by different authors. To classify the texts by author, KNN was trained with a set of stylometric features including rare words, character counts and character-level 2-, 3- and 4-grams. Feature set sizes ranging from 34 to 2000 were tested in the experiments. The experiments were conducted on limited training data, with a dataset consisting of 3 short texts taken from one book per author. This method proved to be at least as effective as Information Gain (IG) when selecting the most significant n-grams. Moreover, the KNN classifier achieved high accuracy, with the best classification accuracies reaching up to 90%, except for 5-NN using character-level 4-grams. This work contributes towards utilizing KNN for identifying distinctive stylometric features for robust AA in short historical Arabic texts.


Introduction
Authorship Attribution (AA) is the process of identifying the author of anonymous texts by providing some samples of texts of a few authors as a training set, assuming that the anonymous text is written by one of the authors of the known text samples (Shaker and Corne, 2010; Nirkhi et al., 2014). AA is a kind of Text Classification (TC) task. However, AA differs from TC because in AA the writing style is as important as the text content, whereas in TC only the content is important. Additionally, with different data sources such as articles and books, feature sets and classifiers may behave differently in AA (Bozkurt et al., 2007). These differences make AA more challenging than TC.
In general, AA is useful for resolving issues such as plagiarism detection and historical questions regarding unclear or disputed authorship. Recently, practical applications for AA have grown in areas such as criminal law, civil copyright law and computer security, for example in tracking the authors of computer virus source code. The vast majority of AA work has been dedicated to identifying the author of long texts ranging from single passages to book chapters. More recently, studies have begun to focus on short texts. The present paper focuses mainly on the short-text issue, which refers to the amount of training data available per author. An author's stylistic choices are commonly accepted to show in their texts, but they occur less frequently in short texts (Luyckx and Daelemans, 2011). Therefore, working with short texts constitutes a particular challenge and requires a robust and reliable representation of these texts. One of the fundamental sub-problems of AA is the extraction of the most suitable features to represent the writing style of each author. This problem is known as stylometry. This paper uses the K-Nearest Neighbor (KNN) classifier to perform AA by extracting various character n-gram and lexical feature vectors of the writing style per author, as the style of a text can be used as a distinctive feature to identify its writer (Takçı and Ekinci, 2012). Also, the writing style can be analyzed using factors within the same document, or by comparing two documents by the same author (Menai, 2012).

Literature Review
Only a few studies in the field of AA have focused explicitly on data size. This area has not yet been probed in much depth in many languages, since most stylometry research tends to focus on long texts or on multiple short texts per author, where the longer the texts, the better the identification (Ouamour and Sayoud, 2012). Stamatatos (2009) stated that text samples should be long enough to ensure the text features sufficiently represent the author's style.
Nevertheless, there is no consensus on the minimum text sample size required. Some studies investigated AA for short texts and found that samples shorter than 2,500 words would hardly provide a reliable result (Eder and Maciej, 2010). Also, AA accuracy deteriorates with reduced training data size (Luyckx and Daelemans, 2011; Al-Sarem and Emara, 2019). This paper proposes an overall investigation into AA that addresses 30 different short texts written by 10 ancient Arabic travelers who wrote several books describing their travels. A special Arabic dataset called AAAT was built. In AAAT, the number of words per author ranged between 1,289 and 1,785. Traditionally, the reliable minimum for an authorial set is considered to be 100,000 words per author (Ramnial et al., 2016). Nevertheless, Knaap and Grootjen (2007) used texts no longer than a sentence. Their experiment showed that the text was correctly classified in 2 out of 5 classification tests.
As for Arabic texts, a few studies in the field of AA can be found, namely (Abbasi and Chen, 2005a; 2005b). These studies tested a dataset of Arabic forum messages written by 20 authors, with 20 messages per author. Their experiments obtained a best accuracy of 94%, but the overall performance was lower than that for the English language. Despite these notable results, their dataset is quite large, which allows enough features to be extracted.
Ouamour and Sayoud (2012) also investigated Arabic text AA using a small data size. A variety of character n-gram and word n-gram features were used. Their results yielded a best accuracy of 80% using a Support Vector Machine (SVM) classifier. In another work on the same dataset and the same sets of features, Ouamour and Sayoud (2018) examined the authorship of short historical Arabic texts using the following classifiers: SVM, Linear Regression (LR), Multi-Layer Perceptron (MLP) and a new fusion method called Vote Based Fusion (VBF). Their results indicated that the classifiers scored different accuracies and that VBF gave the highest accuracy (90%) among them. Shaker and Corne (2010) studied the task of AA on Arabic text as well. They tested a set of Arabic function words using a hybrid of evolutionary search and linear analysis. For training and testing, the texts were divided into chunks of two sizes: 1,000-word chunks and 2,000-word chunks. The best performance obtained was 87.63% accuracy when 2,000-word chunks were used. In part, they showed that chunks of at least about 1,000 words are necessary to obtain an adequate characterization of function word usage for the Arabic authors. Moreover, they stated that the longer the text is, the higher the performance. The disadvantage of their method is that it depends only on function words to discover the authors, and these function words were chosen to reflect the semantics of English function words from previous research.
On the other hand, other studies (Al-Ayyoub et al., 2017) showed that, even with short texts, the performance of the classifiers depends mainly on the feature types rather than on the text size. They applied three well-known classifiers, Naïve Bayes (NB), SVM and Bayes Networks, using stylometric features and Bag Of Words (BOW) methods for AA of Arabic articles. They also concluded that stylometric features can generate more accurate results under most settings. Notably, they tested their method on a large dataset consisting of 14,039 short articles.
Similar findings were reported by Ouamour et al. (2016), who examined the performance of Manhattan distance, Stamatatos distance, LR, MLP and SVM. Two types of features were investigated on an Arabic dataset: character n-grams and words. The text length varied from 100 to 3,000 words per document. The results were quite interesting, showing that the minimum text size required to obtain a fair AA solution depends on both the feature types and the classification methods.
Furthermore, the same work (Ouamour et al., 2016) reported that the optimal data size for good AA is at least 2,500 words per sample. This confirmed the finding of (Eder and Maciej, 2010) for the English language that the minimum text size is 2,500 words per sample. Their results are useful; however, we cannot extend them to every feature or classifier.
Recently, Al-Sarem and Emara (2019) investigated the effect of increasing training set size on the performance of attribution classifiers in the context of short religious Arabic texts. They used a dataset consisting of 4,631 short texts. Mahalanobis distance, MLP and LR classifiers were employed. They stated that, as the size of the training set increased, the accuracy of the MLP classifier first increased and then decreased sharply. With slight differences, the same behavior was noted with the Mahalanobis distance and LR classifiers. An interesting result in Al-Sarem and Emara (2019) is that the n-gram features led to a decrease in classifier performance as the size of the training set increased. Generally speaking, however, n-gram approaches provide the best results among all stylometric features.
Regarding stylometric features, Nirkhi et al. (2014) investigated the use of stylometric features for AA in short texts of online English messages, evaluating the performance of classification methods such as the SVM and KNN classifiers. SVM obtained 92% average accuracy, which was higher than KNN (80% average accuracy). This study showed that stylometric features offer classifiers a representation that requires fewer input variables than traditional statistics.
In summary, different classifiers have been tested for AA in short texts. The SVM classifier is the most used classifier in the literature. The results in the literature showed that classifiers behave differently depending on the features used and the text length. Thus, we propose to investigate the performance of the KNN classifier on short historical Arabic texts ranging between 1,289 and 1,785 words per author, where the text length varies from 290 to 800 words per document. That is, we aim to train the KNN classifier on limited data. For this purpose, different stylometric features were used. Additionally, n-fold cross-validation and Feature Selection (FS) methods were used to enhance KNN performance.
In the following sections, the AA approach used in this study is described.

Methodology
The common method used for solving the AA problem begins with a set of training data whose authors are known. Then, a set of features is extracted. These features are used in Machine Learning (ML) algorithms for the classification process. This step allows the researcher to classify a test document whose author is unknown. The stylometric feature set and classification method used in this study are presented in the next sections.

Stylometric Features
Stylometry is a behavioral feature that an author exhibits throughout his writing. Therefore, stylometry can be extracted and potentially used for checking the identity of the author of texts (Brocardo et al., 2013). Stylometry mainly relies on the assumption that individuals have distinctive ways of writing and this writing style cannot be manipulated consciously (Kusakci, 2012). Some examples of stylometric features include sentence length, word length, letter frequencies, word n-grams, character n-grams and function words.
The basic categorization of these features is based on character, lexical, syntactic and semantic features (Oliveira Jr. et al., 2013). In this study, several character and lexical features were tested. Table 1 presents a description of each proposed feature with an example. The main features employed in the proposed system are listed below:
1. Character N-grams: These features provide information about the author's style (or at least the topic of interest), which cannot be determined using only lexical features (Schwartz et al., 2013). Besides, character n-gram frequency is helpful to reliably handle limited data, which is why this parameter needs to be tested to facilitate short-text AA. A variety of character n-grams and words were used in another work (Ouamour and Sayoud, 2018), with the results yielding a best score of 90% accuracy. Meanwhile, Türkoğlu et al. (2007) focused on extracting bi-gram and tri-gram features using different classifiers, including the KNN classifier. They concluded that n-grams yielded more successful results than additional features with all-purpose classifiers. Character n-grams are strings of n consecutive characters from a given text (Stamatatos, 2009). Consequently, we distinguish character-level n-grams. For instance, for the text "the data", all character-level 4-grams that can be generated are: "the_", "he_d", "e_da", "_dat", "data", where the underscore character (_) represents the space, as is the convention in this study. Characters such as the space can provide vital information about the author's style (Takçı and Ekinci, 2012). For the Arabic language, all character-level 4-grams that can be generated from the text "عدد الكلمات" are: "عدد_", "دد_ا", "د_ال", "_الك", "الكل", "لكلم", "كلما", "لمات". Note that Arabic text is read from right to left.
2. Character Count of Alphabets: According to these features, a text is viewed as a sequence of individual characters, so simple character-level measures can be defined as a character count (Chen et al., 2012).
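The character n-gram definition above can be illustrated with a minimal Python sketch. The function names `char_ngrams` and `ngram_profile` are hypothetical, and collapsing runs of whitespace into a single underscore is an assumption made here; the study's own extraction tool is not reproduced.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> list[str]:
    """Return all character-level n-grams of `text`, in reading order.

    Spaces are written as underscores, following the convention of this study.
    Assumption: runs of whitespace are collapsed into a single underscore.
    """
    normalized = "_".join(text.split())
    return [normalized[i:i + n] for i in range(len(normalized) - n + 1)]


def ngram_profile(text: str, n: int) -> Counter:
    """Count how often each character n-gram occurs in `text`."""
    return Counter(char_ngrams(text, n))


if __name__ == "__main__":
    print(char_ngrams("the data", 4))
    # Python stores Arabic in logical (reading) order, so no reversal is needed.
    print(char_ngrams("عدد الكلمات", 4))
```

A text of L characters (after whitespace normalization) yields L - n + 1 n-grams, which matches the five 4-grams listed for "the data" and the eight listed for the Arabic example.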

KNN Classifier
The K-Nearest Neighbor (KNN) algorithm is amongst the simplest Machine Learning (ML) algorithms. It is a type of instance-based learning, which runs local approximations. All computations are deferred until classification. An object is classified by a majority vote of its neighbors, with the object being assigned to the class most common amongst its k nearest neighbors (Nirkhi et al., 2014). Here, k is a small positive integer. If k = 1, the object is simply assigned to the class of that single nearest neighbor. Ramnial et al. (2016; Nirkhi et al., 2014) applied two ML algorithms, KNN and Sequential Minimal Optimization (SMO), using stylometric features; all results yielded an accuracy of 90%, except for the KNN classifier. The KNN classifier was chosen as the classification method in this study.
In this study, the steps for the AA task included preprocessing, feature extraction, classification and author identification. A flowchart illustrating the text processing and classification process in this research is shown in Fig. 1.
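The majority-vote rule described in this section can be sketched in a few lines of Python. This is only an illustration: the Euclidean distance and the names `euclidean` and `knn_predict` are assumptions, since the distance measure used in the experiments is not stated.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_vectors, train_labels, query, k=3):
    """Assign `query` to the class most common among its k nearest neighbors.

    All work happens at prediction time; there is no separate training step.
    """
    # Rank every training sample by its distance to the query vector.
    ranked = sorted(
        zip(train_vectors, train_labels),
        key=lambda pair: euclidean(pair[0], query),
    )
    # Count the author labels of the k closest samples and take the majority.
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

With k = 1 the function reduces to returning the label of the single closest training vector, as noted above. Tie handling is left simplistic in this sketch.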

Data Pre-Processing
Data pre-processing is a crucial step in AA. Text documents in their original form are not suitable for learning. These documents must thus be converted into a vector space, since most learning algorithms use an attribute-value representation (Elayidom et al., 2013; Abu-Hamad and Mohd, 2019; Salam and Kadir, 2017). In this study, the AA dataset was passed to a pre-processing algorithm, built in C#, based on the following steps:
1. Tokenization: Tokenization is a method of splitting a stream of text input into meaningful elements (Elayidom et al., 2013). These elements are called tokens, for example symbols, words, phrases and so on. The extracted group of tokens serves as input for further processes such as parsing and text mining, which, in turn, are part of lexical analysis (Elayidom et al., 2013). In this study, the dataset was processed into character 2-, 3- and 4-grams by tokenizing the characters on white space. White spaces were treated as characters and replaced with the underscore character (_). Table 2 shows the character grams and character count of the Arabic text "وكان ارتحالي", a sample from the AAAT dataset, after applying the tokenization process.
2. Punctuation Mark Removal: All punctuation marks (e.g., " ! \ : ; ? . , ،) were removed from the texts of each document.
3. Normalization: This is the process of finding the standard form of all letters found in each document in a dataset (Al-Badarenah et al., 2016). Normalization is used to help overcome the variation in text representation (Altheneyan and Menai, 2014; Omar et al., 2013; Saad and Latiff, 2018). In this study, the different forms of some Arabic letters, such as the forms of alef (آ, إ, أ), were normalized to a single form (ا). Also, the final ة was replaced with ه and the final ى was replaced with ي. All these letters were converted to a single form so that the dimensionality of the vector space is reflected more accurately. Also, numbers such as the authors' dates were removed. This step was done because this type of information may give an unfair advantage on the controlled dataset that will not scale to other authors, genres or topics (Luyckx, 2010).
This text pre-processing step is crucial for determining the quality of the subsequent stages, which include the feature extraction and classification stages.
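The punctuation removal and normalization steps listed above can be sketched as follows. The original tool was written in C#; this Python version is only an illustrative approximation, and the function name `normalize_arabic`, the exact punctuation set, and the order of the steps are assumptions.

```python
import re

# Code points are spelled out to avoid confusion between look-alike letters.
ALEF_VARIANTS = "\u0622\u0623\u0625"    # alef with madda, hamza above, hamza below
ALEF = "\u0627"                         # bare alef
TEH_MARBUTA, HEH = "\u0629", "\u0647"   # final teh marbuta -> heh
ALEF_MAKSURA, YEH = "\u0649", "\u064A"  # final alef maksura -> yeh

# Punctuation marks named in the text, including the Arabic comma (U+060C).
PUNCTUATION = re.compile("[\"!\\\\:;?.,\u060C]")
# For str patterns, \d matches ASCII and Arabic-Indic digits alike.
DIGITS = re.compile(r"\d")


def normalize_arabic(text: str) -> str:
    """Remove punctuation and digits, then unify alef, teh marbuta and alef maksura."""
    text = PUNCTUATION.sub("", text)
    text = DIGITS.sub("", text)
    text = re.sub(f"[{ALEF_VARIANTS}]", ALEF, text)
    # \b restricts the replacement to word-final occurrences.
    text = re.sub(TEH_MARBUTA + r"\b", HEH, text)
    text = re.sub(ALEF_MAKSURA + r"\b", YEH, text)
    # Collapse any whitespace left behind by the removals.
    return " ".join(text.split())
```

Running the removals before the letter replacements means that a word followed directly by a digit or punctuation mark is still recognized as ending in ة or ى.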

Extracting Features
The features and their extraction are dependent on the text language. The features are extracted from the authors' texts and can be used to understand the peculiarities of an author's writing. Different character and lexical features are extracted here, including rare words, character count, character bi-grams, character tri-grams and character tetra-grams.

Feature Selection and Reduction Methods
Some features, such as character and lexical features, can considerably increase the dimensionality of the feature set (Stamatatos, 2009; Howedi and Mohd, 2014). In such cases, Feature Selection (FS) methods such as information gain can be used to reduce the dimensionality by selecting just a subset of the original features. Some features can be removed based on their frequencies, by discarding those whose frequencies are greater than or less than a defined threshold value (Al-Badarenah et al., 2016). Many data mining algorithms perform better with lower dimensionality because the most characteristic features remain after FS (Fissette, 2010).
[Fig. 1 flowchart labels: Text pre-processing; Built file of n-gram types; IG / Chi-x²; Split into 30% test data and 70% training data; 3-fold cross-validation; Training set; Test set; Probabilistic model for each author; Built author models with KNN classifier; Predicted authors]
As reported here, Information Gain (IG) and Chi-squared (Chi-x²) were used as the FS techniques:
(1) Chi-Squared (Chi-x²): A statistical method that measures divergence from the expected distribution, assuming that feature occurrence is independent of class value
(2) Information Gain (IG): Considers each feature independently of the others and offers a ranking of the features based on their IG score, so that a certain number of features can be selected easily
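To make the two selection criteria concrete, the following Python sketch scores features and keeps the top-k. It is a simplified illustration, not the procedure used in the experiments: it assumes each feature is reduced to a binary present/absent indicator per document, and the names `information_gain`, `chi_squared` and `top_k` are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(present, labels):
    """IG of a binary feature: H(class) minus H(class | feature present/absent)."""
    total = len(labels)
    gain = entropy(labels)
    for value in (True, False):
        subset = [lab for p, lab in zip(present, labels) if p == value]
        if subset:
            gain -= len(subset) / total * entropy(subset)
    return gain


def chi_squared(present, labels):
    """Chi-squared statistic between a binary feature and the class labels."""
    total = len(labels)
    n_present = sum(present)
    score = 0.0
    for cls, n_cls in Counter(labels).items():
        for value, n_value in ((True, n_present), (False, total - n_present)):
            # Expected count if feature occurrence were independent of class.
            expected = n_cls * n_value / total
            if expected == 0:
                continue
            observed = sum(
                1 for p, lab in zip(present, labels) if p == value and lab == cls
            )
            score += (observed - expected) ** 2 / expected
    return score


def top_k(features, labels, k, score=information_gain):
    """Return the names of the k highest-scoring features.

    `features` maps a feature name to its per-document presence list.
    """
    ranked = sorted(features, key=lambda name: score(features[name], labels), reverse=True)
    return ranked[:k]
```

For example, a feature that occurs in every document of one author and in no document of another receives the maximum IG, while a feature spread evenly across authors scores zero under both criteria.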

Classification Model
As in the case of this study, if the researcher only has a small dataset to work with for the classification problem, it is difficult to provide enough data for separate training and testing sets. In this case, it is possible to apply the n-fold cross-validation technique, which uses all the data for both training and testing. Thus, the cross-validation technique was applied to provide a more meaningful result (Taş et al., 2007; Burrows, 2010). Cross-validation also helps to avoid over-fitting and provides an unbiased estimate of the predictive performance of the learning algorithm (Keevers, 2019). The dataset is divided randomly into n partitions, called folds. One of the n partitions is kept as the testing data and the remaining n-1 partitions are used as training data; the training and testing data must not share any data points. The classifier is then trained n times. In each training run, a single partition is selected as the test set and the rest are used for training. A model is fitted on the training set and evaluated on the test set. The model is discarded after its evaluation score has been stored.
In this study, we set n = 3, which means that the AAAT dataset was divided into 3 partitions (folds), each containing 10 text files from the AAAT dataset. Three models were then trained and evaluated, with each fold given a chance to be the held-out test set. Each model was trained on a unique pair of training folds and tested on a unique test fold. Therefore, the classification was performed three times:
• Model1: Trained on Fold1 + Fold2, tested on Fold3
• Model2: Trained on Fold2 + Fold3, tested on Fold1
• Model3: Trained on Fold3 + Fold1, tested on Fold2
Each classification model was discarded after its evaluation score had been retained. The scores were then summarized to obtain a final accuracy measure.
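The fold rotation described above can be sketched as a short Python helper. The function name `cross_validate` and the `train_and_score` callback are hypothetical; how the 30 files were actually assigned to folds is not specified beyond the random split, so the folds are taken here as given.

```python
def cross_validate(folds, train_and_score):
    """Run n-fold cross-validation over pre-built folds.

    `folds` is a list of n lists of documents. `train_and_score(training, test)`
    is a caller-supplied function that fits a model on `training`, evaluates it
    on `test` and returns the evaluation score; the model itself is not kept.
    """
    scores = []
    for i, test_fold in enumerate(folds):
        # Every fold except the i-th one contributes to the training set.
        training = [doc for j, fold in enumerate(folds) if j != i for doc in fold]
        scores.append(train_and_score(training, test_fold))
    return scores
```

With three folds, the run that holds out the third fold corresponds to Model1 above, and the final figure is a summary such as `sum(scores) / len(scores)`.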

Evaluation
Macro-averaged precision (Pr(M)) and recall (Re(M)) were used to evaluate this work. Furthermore, the macro-averaged F1-measure was also used to compare the experiments so that increases or decreases in classification efficiency could be measured. The resulting scores were evaluated by computing the number of True Positives (TP) and True Negatives (TN) over all the experiments and calculating the precision (Pr(M)), recall (Re(M)) and F1-measure.
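For reference, the standard macro-averaged definitions of these measures are sketched below. The notation is assumed here rather than taken from the study: C is the number of authors, and FP_i and FN_i are the false positives and false negatives for author i, which the per-class precision and recall require alongside TP_i. Taking the F1-measure as the harmonic mean of the macro-averaged precision and recall is one common convention; another averages the per-class F1 values, and which of the two was used is not stated.

```latex
\begin{align}
\mathrm{Pr}_i &= \frac{TP_i}{TP_i + FP_i}, &
\mathrm{Re}_i &= \frac{TP_i}{TP_i + FN_i}, \\
\mathrm{Pr}^{(M)} &= \frac{1}{C}\sum_{i=1}^{C} \mathrm{Pr}_i, &
\mathrm{Re}^{(M)} &= \frac{1}{C}\sum_{i=1}^{C} \mathrm{Re}_i, \\
F_1^{(M)} &= \frac{2\,\mathrm{Pr}^{(M)}\,\mathrm{Re}^{(M)}}{\mathrm{Pr}^{(M)} + \mathrm{Re}^{(M)}}.
\end{align}
```

Macro-averaging gives every author equal weight regardless of how many test documents they have, which suits a balanced dataset such as one with the same number of texts per author.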

Experiment and Results Discussion
A set of experiments was run to evaluate the effect of short Arabic texts with limited training data (two short text documents per author) on different features, in order to show the robustness of KNN performance. Moreover, the effect of feature size was tested via the IG and Chi-x² FS methods.

Dataset Description
The study considers a standard dataset of short texts as an approximation of the ancient Arabic texts. Ten books written by ten different authors were used. One book per author was chosen from the "Alwaraq library" website to form the AAAT dataset (i.e., Authorship attribution of Ancient Arabic Texts). Three pages were selected from each book, with each page stored in a separate text file. According to Table 3, the average length of each file was 550 words. This allows probing into the scalability of the approach with limited training data and short text documents.

Overall Results
As can be seen from Table 4, a good attribution score of 90.42% average accuracy was obtained by applying 5-NN with tetra-gram character features. This was the best score of all the features employed in the separately conducted experiments. Additionally, good average accuracies of 89.29 and 88.33% were obtained with the rare words features using 5-NN and 3-NN, respectively. Moreover, character-based n-grams performed better than character counts, whose best attribution score was only 23.33%.
Also, it can be observed from Table 4 that the average attribution accuracy was 62.83% with 5-NN, compared to 61.84% with 3-NN. This result suggests that the KNN model is more stable when constructed with more neighbors. However, there is no direct relationship between prediction performance and neighborhood size. This is because, based on the experimental results in Tables 7 and 8, the feature size also has an impact on prediction performance: each value of k (3-NN and 5-NN) produced different accuracies depending on the feature size. For instance, when the rare words feature and IG were used with KNN (Table 7), 5-NN produced average precisions of 83.17, 89.29 and 49.91% for feature sizes of 100, 500 and 1000, respectively, while 3-NN produced average precisions of 88.33, 78.33 and 23.75% for the same feature sizes. Thus, the performance of KNN depends on both the number of neighbors (k) and the number of features available. This indicates that the feature size and the neighborhood size both have an impact on the prediction performance of KNN.
It is important to mention that average accuracies of 90.42, 89.29 and 85.00% with limited training samples are relatively high, given that previous work by (Ramnial et al., 2016) stated that the average accuracy is high with 10,000 words per author and is reduced with 1,000 words. Also, (Eder and Maciej, 2010) stated that texts should not be shorter than 2,500 words per sample to obtain good results. By contrast, this paper used short texts ranging between 1,289 and 1,785 words per author.

Feature Selection for Enhancing KNN Algorithm Performance
The aim of Feature Selection (FS) methods is to eliminate useless features in order to maximize the success of the authorship identification system and reduce the dimensionality of the vector space (Bay and Çelebi, 2016). The effect of FS methods on KNN performance was also tested. The FS methods were applied with 3-NN and 5-NN using the different features. We used IG and Chi-x² in the RapidMiner tool. We conducted the experiments in two different ways. In the first, we applied 3-NN and 5-NN separately to each feature condition before applying IG and Chi-x². In the second, we reduced the feature set to different sizes by eliminating the worst features according to IG and Chi-x², and then applied 3-NN and 5-NN separately to each feature condition at each size. Improvements in the KNN performance rates after applying the FS procedure can be observed in Table 5.

The Effect of Short-Text Documents on Feature Selection using Different Feature Sizes
Table 6 summarizes the total number of features generated for each feature condition from the AAAT dataset. It also shows the top-k feature sizes selected from the total number of generated features (e.g., top-k with k = 500 means that the 500 most frequent features were selected). Each top-k feature set, containing the features with the highest IG or Chi-x² values, was fed separately into the KNN classifier.
According to Table 7, the average precision results indicate that, for most of the features used, IG achieved better results (between 11.21 and 90.42%) than Chi-x² (between 9.21 and 75.17%), in both the 3-NN and 5-NN cases, applied separately to each feature condition. Besides, the best results for both IG and Chi-x² were obtained using 5-NN applied to the rare words and tetra-gram features with a feature size of 500.
In another experiment, conducted separately for each feature using KNN, Table 8 shows the average recall results using different feature sizes weighted with IG and then Chi-x².
Table 8 shows that the best average recall (83.33%) across both IG and Chi-x² was recorded using 3-NN with the rare words feature and a feature size of 100 selected by IG. Likewise, IG achieved better results than Chi-x² with most features, in both the 3-NN and 5-NN cases. From the results, the most suitable feature set could be obtained according to the outcomes of Re, Pr and F1, as shown in Fig. 2. To summarize, different attribution results were obtained by applying KNN with the IG and Chi-x² methods using different feature sizes of 100, 300, 500, 1000 and 2000. This result indicates that the size of the feature set has an impact on attribution performance. This is because feature size has an impact on frequency, which is considered the most important criterion for feature selection. In general, the more frequent the feature, the more stylistic variation it captures (Putniņš et al., 2006).
Furthermore, Chi-x² did not work well on short texts, as it yielded worse results than IG. This is supported by the data: the texts used in this study were too short to allow the regular reoccurrence of characters. Chi-x² begins to perform better on larger data, which can produce a higher-dimensional feature space. Although character n-gram features can considerably increase the dimensionality, the text size was not sufficient to produce a high-dimensional space. The results presented in Tables 7 and 8 indicate that Chi-x² achieved its best results (up to 75.17%) using the tetra-gram feature, which produced higher dimensionality than the character count and bi-gram features, which achieved 23.33 and 50.00% of good attribution, respectively. Previous work by (Mohsen et al., 2016) showed that Chi-x² can outperform other FS methods only when a high-dimensional feature space is used. Nicolosi (2008) stated that Chi-x² is more suitable for larger data. On the other hand, the present investigation demonstrated that IG can work well on a low-dimensional feature space and outperforms Chi-x².
Nevertheless, Chi-x² was faster to run than the IG algorithm, which took more than 40 min in the experiments. Lastly, a slight failure was also noted using bi-gram features (attribution accuracy scores of about 40% and 50% were obtained). This is because the regular reoccurrence of character bi-grams was low in such short texts. The character count feature also failed, obtaining a very low classification result (11.40%) for average precision. The reason is that the number of features generated for this feature type was very small, only 34 features. In such cases, the character count feature may not carry enough information to make the best decision. Thus, we can observe that smaller feature sets yield lower accuracy, as shown in Table 9. There is a large difference in Pr(M) and Re(M) between the character count feature and the other features used.

Comparison with Related Works
In this section, we consider the most recent and closest works to ours, since most previous related studies investigated the problem of AA with many short Arabic texts. For example, Al-Ayyoub et al. (2017) considered a dataset consisting of 14,039 short articles written by 42 authors, and Al-Sarem and Emara (2019) used a large dataset consisting of 4,631 short text documents distributed among 15 authors. By contrast, we investigated the performance of KNN with a small dataset consisting of only 30 short Arabic texts written by 10 authors, as in the case studies of (Ouamour and Sayoud, 2018; 2012). We therefore decided to compare our approach with the works of (Ouamour and Sayoud, 2018; 2012). Ouamour and Sayoud (2018) applied three classifiers, LR, MLP and SVM, and the Vote Based Fusion technique with different feature sets of character and word n-grams. Ouamour and Sayoud (2012) used the same feature sets with SVM. We found that our approach using KNN, enhanced by FS methods, achieved the best accuracy (90.42%), while the second-best accuracy (90.00%) was obtained by (Ouamour and Sayoud, 2018) using the Vote Based Fusion method with MLP, followed by SVM with 80% accuracy and LR with 70% accuracy. The comparison gives an indication of the performance of different classifiers.

Conclusion
This paper presented a new AA task to investigate the use of a KNN classifier for the Authorship Attribution (AA) of short Arabic texts, in order to determine the robustness of this method with text samples of different lengths (varying from 290 to 800 words) used for training. The KNN was trained on limited data of two text documents per author, where the average text length was about 550 words per document. Several state-of-the-art features were tested for the Arabic language, with experiments carried out separately for each feature condition, including rare words, character counts and character-level n-grams (bi-gram, tri-gram and tetra-gram). The last set of tests evaluated the effects of feature size using various feature set sizes by applying the IG and Chi-x² selection methods. Some noteworthy points of this new AA task are listed below:
• Although the size of the texts used in this study was small (ranging between 1,289 and 1,785 words per author), the performance of the KNN classifier was interesting (90.42% average accuracy for the best score)
• The character tetra-gram and rare words features have the best performance. These feature sets are very effective even with limited training data size. On the other hand, classification failure was observed when the character count feature was used
• Feature selection methods are necessary to achieve an outstanding performance of classifiers
• The Information Gain (IG) selection method is more suitable for short texts than Chi-x²
• Our results show that, using about 2000 words per author, the authors of Arabic short texts can be successfully identified
• This work on AA is one of the few works done on short Arabic texts, so it serves as real motivation to conduct more AA investigation on the Arabic language
In the future, a new set of stylometric features could be used to enhance the performance of the KNN classifier.

Fig. 1: Flowchart of text processing and classification process in AA

Table 1: Description of each stylometric feature with an example

Table 2: Generated features of the Arabic text "وكان ارتحالي" after tokenization process

Table 3: Size of texts in terms of words

Table 4: Percentage accuracy of good attribution obtained using KNN with different features and feature sizes

Table 5: Accuracy of good attribution in % before and after Feature Selection (FS)

Table 6: Generated features and best feature size selected for each feature type

Table 7: Results of average precision in % using KNN with different feature selection sizes via IG and Chi-x²

Table 8: Results of average recall in % using KNN with different feature selection sizes via IG and Chi-x²

Table 9: The best average Pr(M) and Re(M) according to the total number of each feature