Aspect-based sentiment analysis using smart government review data

Digital resources such as smart application reviews and online feedback information are important sources for seeking customers' feedback and input. This paper aims to help government entities gain insights into the needs and expectations of their customers. Towards this end, we propose an aspect-based sentiment analysis hybrid approach that integrates domain lexicons and rules to analyse the entities' smart app reviews. The proposed model aims to extract the important aspects from the reviews and classify the corresponding sentiments. This approach adopts language processing techniques, rules, and lexicons to address several sentiment analysis challenges and produce summarized results. According to the reported results, the aspect extraction accuracy improves significantly when the implicit aspects are considered. Also, the integrated classification model outperforms the lexicon-based baseline and the other rule combinations by 5% in terms of accuracy on average. Moreover, when using the same dataset, the proposed approach outperforms machine learning approaches that use a support vector machine (SVM). However, using these lexicons and rules as input features to the SVM model achieved higher accuracy than the other SVM models.


Introduction
Governments around the world aim to deliver quality government services that are efficient, cost-effective, and socially just while keeping the needs of the public at the centre of the provided services [1,2]. Digital sources such as smart application reviews and online feedback information are considered important sources for obtaining customers' feedback and input. With the advances in machine learning and natural language processing techniques, it is now possible to analyse the obtained data to measure citizens' satisfaction with the services provided [3].
Sentiment analysis, also known as opinion mining, is the field of study that analyses people's opinions and attitudes towards entities such as products, services, and topics and their attributes. Since 2000, this field has become an active research area due to its applicability to almost every possible domain, from consumer products, services, and healthcare to social events and elections, providing a strong motivation for research [4].
In this paper, our objective is to design a model that helps government agencies gain greater insights and a broader holistic view of their customers' needs and expectations. Accordingly, the proposed model examines the data for the government mobile apps reviews to improve the government smart applications and services performance. Towards this end, we propose a model to integrate several lexicons and rule-based techniques to extract users' sentiments from smart government apps reviews. The dataset and lexical resources produced by Alqaryouti et al. [5] were used in this study. To extract structured and useful information from the unstructured sources of textual data, aspect extraction is considered. Aspect extraction is based on the notion that an opinion aimed at no target (i.e. aspect) is of limited use. Thus, the goal of aspect-based sentiment analysis is to discover sentiments and the aspects that those sentiments are aimed at [4].
This paper is the first attempt to provide a way to help governments and enterprises identify the improvement areas of their services by understanding their customers' feedback through smart app reviews. The proposed model can highlight the issues in government mobile apps in terms of the following aspect categories: user interface, user experience, functionality, performance, support and updates according to the users' reviews [5]. This helps in framing the challenges that governments need to address to transform their services by finding the most innovative solutions to increase citizens' satisfaction.
This paper is divided into six sections. Section 2 provides a background about the selected topic and discusses the related work in the literature. Section 3 illustrates the research methodology and related experiments to extract aspects and classify the sentiments of smart government apps domain. Section 4 discusses the experiments performance results. Section 5 discusses the research implications. The last section provides the conclusion of our work.

Background and related work
Sentiment analysis is the computational study of individuals' attitudes, excitement, emotions, expressions, viewpoints and opinions towards a certain entity. An opinion can be represented as a quintuple (e, t, s, h, dt), where e represents the entity, t represents the target aspect or feature for the sentiment, s represents the sentiment towards the target, h represents the opinion holder, and dt represents the time stamp of the opinion. Figure 1 illustrates the quintuple sentiment components extracted from a sample review.
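As a minimal sketch, the quintuple can be held in a small record type. The class name, field names and the sample values below are hypothetical and chosen only for illustration; they are not taken from the dataset.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Opinion:
    """One opinion quintuple (e, t, s, h, dt)."""
    entity: str          # e: the entity being reviewed
    aspect: str          # t: the target aspect or feature
    sentiment: str       # s: the sentiment towards the target
    holder: str          # h: the opinion holder
    timestamp: datetime  # dt: the time stamp of the opinion


# Hypothetical review data, for illustration only.
opinion = Opinion(
    entity="ExampleGovApp",
    aspect="design",
    sentiment="positive",
    holder="user123",
    timestamp=datetime(2019, 1, 15),
)
```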
There are different levels at which sentiments can be analyzed, namely the document, sentence and aspect levels. At the document level, the whole review is considered as a basic information unit and is then classified into positive, negative or neutral opinions. Similarly, sentences are regarded as short documents. The sentiment at the document level is determined for the document as a whole, which clarifies the sentiment associated with the target. In this case, the opinion holder might have a positive opinion regarding the entity but might not be satisfied with all the "aspects" of that target. To extract such information, aspect-based classification is used. Aspect-based analysis covers both entities and aspects. It does so by decomposing the entity into aspects (aspect extraction), then classifying each aspect sentiment into positive, negative or neutral (aspect sentiment classification), and finally summarizing the results of the previous steps.
Another sentiment analysis level that has been considered is the concept-level [6]. Unlike word-based approaches, concept-level sentiment analysis focuses on the semantic analysis of text through the use of web ontologies and semantic networks. This allows the combination of conceptual and affective words associated with natural language. For example, if the concept "Cloud Computing" is split into two words, the word "cloud" would be wrongly associated with the weather. However, concept-level sentiment analysis is limited by the bounds of the knowledge base and by the fact that it fails to detect important discourse structure information that is essential for effectively detecting the polarity expressed by natural language opinions [7].
In this paper, our concern is aspect-based sentiment analysis, which is one of the levels of sentiment classification. It comprises aspect or feature extraction, sentiment polarity prediction and classification, and sentiment aggregation [8,9]. Aspects extraction in sentiment analysis is now becoming an active area of research as it is the most vital task in the aspect-based recognition [4,5]. Aspect-based sentiment analysis is the process in which sentiments in respect to different aspects are detected [10]. Aspects are attributes, characteristics, or features of a product or service. Aspect extraction phase involves the identification of these review characteristics through consumers' comments to identify aspects. Next, the polarity prediction and classification take place to decide if the aspect sentiment polarity denotes positive, negative or neutral orientation as well as its strength or tone level [11]. In the previous example, the word "amazing" denotes a positive sentiment polarity towards the "design" aspect. The last step is to summarize the results according to the extracted aspects and their corresponding classified polarity. This summary is essential to determine the strengths and weaknesses of each aspect within the app and compared to others. This summarization of results can be done, qualitatively through text-based aggregated opinions summary [12], or quantitatively through graphical and analytical representation [13].
To automate the summarization of reviews, aspect words are grouped into aspects categories. In an unsupervised machine learning approach, the model tries to make sense of the data and extract features on its own. However, when the aspect categories are known in advance, and there is enough training data available, a supervised machine learning approach to aspect category and polarity detection is feasible and may yield better results [14]. On the other hand, semi-supervised approaches use a small set of labeled data to label a larger set of unlabeled data [15].
Review aspects are domain dependent and differ from one context to another. Aspect-based sentiment analysis has been broadly used in numerous application domains such as product reviews, social media, hotel reviews and restaurant reviews.

Aspect-based sentiment analysis
Researchers have reported various approaches to extract aspects from textual resources. For instance, Samha et al. [16] used frequent Part of Speech (POS) tags and rules in addition to an opinion lexicon to identify aspects and opinion words from reviews as well as to group them into categories and summarize the results. POS tagging is the process of marking up a word in a text as corresponding to a particular part of speech (i.e. verb, adjective, preposition, etc.), according to the word's definition and context [15]. Devi et al. [17] proposed a feature-based approach for sentiment analysis using Support Vector Machine (SVM). The authors collected product reviews on laptops from e-commerce platforms such as Amazon and eBay. SentiWordNet, which is a lexical resource for sentiment classification and opinion mining applications, was used to identify objective sentences and later to identify the polarity of the opinion words. Furthermore, POS tagging was used to extract aspect terms from the dataset. The authors used the Stanford parser to extract the opinion words and to find the grammatical dependencies that determine the connection between the opinion words and the aspects extracted in the previous step. The dependencies also assist in determining the negations that were considered in calculating the polarity score. The SVM classifier was used to classify aspects and determine their sentiment polarity score. The result of the SVM classifier is a set of vectors that contain the aspects and their opinion words for each review. The approach achieved an overall accuracy of 88.16%.
Similarly, Manek et al. [18] proposed a feature selection method based on the Gini index. The authors used an SVM classifier to predict sentiment polarities for a movie reviews dataset. In their approach, the reviews were pre-processed with tokenization, case transformation, stop word filtering and stemming. Further, Term Frequency/Inverse Document Frequency (TF/IDF) was adopted, with a weighting mechanism using the Gini index as the feature selection approach. This helped in measuring the impurity of each attribute for categorization and in creating the feature vector from the top 50 attributes according to the Gini impurity index value. Then, the SVM classifier was applied in order to train and test the model. This approach achieved an overall accuracy of 92.81%.
The introduction of the SemEval competition resulted in a rise in the number of proposed methods for aspect extraction. For instance, Mubarok et al. [19] used sentiment analysis and classification techniques to determine the sentiment polarity of restaurant reviews using the SemEval 2014 dataset. Feature extraction was performed using Chi Square, resulting in higher computational speed despite reducing the system performance. Naïve Bayes was used to classify both aspects and sentiments. The evaluation results indicated that the system performed well, with a highest F1-measure of 78.12%. Al-Smadi et al. [20] proposed two supervised machine learning approaches, namely a deep recurrent neural network (RNN) and SVM. The authors aimed to investigate three different tasks: aspect category identification, aspect opinion target expression (OTE) extraction, and aspect sentiment polarity identification. The Arabic hotels' reviews dataset from the SemEval-2016 framework was used to evaluate the efficiency of the proposed approaches. When compared to baseline research, the results indicated that SVM outperforms the deep RNN in all the investigated tasks. However, the deep RNN was found to be more appropriate and faster in terms of training and testing execution time. Moreover, Al-Smadi [21] used two applications of deep learning long short-term memory (LSTM) neural networks for aspect-based sentiment analysis of Arabic hotels' reviews. The first application was a character-level bidirectional LSTM with a conditional random field classifier used for aspect OTE extraction. The second application was an aspect-based LSTM for aspect sentiment orientation classification. The evaluation results indicated an improvement of 39% compared to baseline research for the aspect-OTE extraction application and 6% for the orientation classification application. Likewise, Al-Ayyoub et al. [22] proposed an enhanced supervised machine learning approach to extract aspects and classify sentiments for Arabic hotels' reviews in the SemEval-2016 dataset as well. The approach consisted of three tasks, namely identifying aspect categories, extracting the opinion targets, and identifying the sentiment polarity. Evaluation results indicated that the approach outperformed the benchmark approaches using the same dataset.
On the other hand, García-Pablos et al. [23] presented W2VLDA, an aspect sentiment classification system that requires minimal supervision and does not require language or domain specific resources. The system is able to distinguish opinion words from aspect terms in an unsupervised way. The only supervision required from the user is a single seed word per aspect and polarity. The system performance was also evaluated using the SemEval 2016 dataset. The analysis showed competitive results for different languages and domains. Similarly, Dragoni et al. [24] presented an aspect-based unsupervised system for opinion monitoring that supports data visualization. The authors adopted an open information extraction approach to extract the aspects. The system aimed at providing users with an effective analysis and visualization tool based on user-generated content. The approach proved its effectiveness compared to baseline supervised approaches that participated in the SemEval campaign.
Rathan et al. [25] proposed an ontology-based feature level sentiment analysis model for the domain of "Smartphones" for tweets. The authors included various features such as spelling correction and emoji and emoticon detection. The lexicon-based approach was used for automated training data labelling. A "Smartphone" specific domain lexicon in addition to the general one was used to improve the accuracy of the classification using SVM. The system provided good accuracy and demonstrated the importance of using emoji and emoticons detection as well as the use of attribute specific lexicons.
Cambria et al. [26] proposed an ensemble technique of symbolic and sub-symbolic AI by employing a LSTM network to discover the verb-noun primitives by lexical substitution and link them to commonsense concepts in a new three-level knowledge representation for sentiment analysis, termed SenticNet 5.
On the other hand, Xianghua et al. [27] suggested an unsupervised approach to determine the aspects and sentiments in Chinese social reviews. The authors used Latent Dirichlet Allocation (LDA) on social reviews to identify multi-aspect global topics. Then, they extracted the local topic and the related sentiment according to a sliding window through the review text. Chen et al. [28] improved the LDA model and presented the Automated Knowledge LDA, which is a fully automatic approach that can use existing domain independent data to learn prior knowledge and identify new aspects. The Automated Knowledge LDA approach was able to produce aspects and resolve issues related to wrong knowledge by adopting and enhancing the Gibbs sampler method. Moreover, Poria et al. [29] presented an unsupervised rule-based approach to obtain both explicit aspects and implicit aspect clues (IACs) from product and restaurant reviews. In the proposed approach, the authors identified the IACs in the review and mapped them to the aspects they signify, depending on common-sense knowledge and on the dependency structure of the sentence, using WordNet and SenticNet.
Pham and Le [30] proposed a multi-layer architecture for denoting customers' hotels' reviews. Learning techniques such as word embeddings and compositional vectors models to obtain words, sentences, aspect-based graphs and higher aspect representation layers were used, resulting in rich knowledge of the input text. These representations were then integrated into a neural network to create a model for predicting the hotels' overall ratings, while being able to efficiently extract aspects ratings and weights.
Liu et al. [31] integrated both supervised and unsupervised domain independent automatic rule-based methods to improve double propagation. In their research, the double propagation method assumes that the opinion words will always have a target. Therefore, there is a syntactic relation between the opinion word and the target within the same sentence. This approach is able to select an effective subset of the rules that performs better than the whole set. Lastly, Ma et al. [32] combined the LSTM network with a hierarchical attention mechanism. The results indicated that the combination of the proposed attention architecture and Sentic LSTM outperformed state-of-the-art methods in aspect-based sentiment analysis tasks.
As can be noticed, most of the aforementioned works require labeled datasets for training their models for each of the domains. When a new domain is studied, it would be difficult to train the models without first extracting the domain specific aspects. When the domain independent parameters are used to describe the connections between aspects and their associated opinion expressions, the models will be able to capture the aspect specific sentiment with minimal data requirement [33].

Methodology
In this paper, we propose an integrated lexicon and rule-based aspect-based sentiment analysis approach to extract government mobile app aspects and classify the corresponding implicit and explicit sentiments. This approach is selected due to the nature of the targeted dataset, which consists of short reviews and irregular sentences related to the various aspects of the government mobile apps. Shaalan [34] highlighted the significance of the rule-based approach over other approaches, as it depends on manually created rules and makes it easier to integrate domain knowledge with natural language processing tasks. The proposed approach utilizes the manually generated lexicons by [5] with hybrid rules to handle some of the key challenges in aspect-based sentiment analysis in particular and sentiment analysis in general. Figure 2 illustrates the proposed methodology. The following subsections describe the methodology in detail, covering the extraction of both explicit and implicit aspects in addition to their sentiment classification.

Dataset
In this paper, we use the lexical resources and dataset produced by [5]. These comprise domain-specific lexical resources and a dataset of government smart app reviews, which were manually annotated and verified. According to the authors [5], the dataset is characterized by reviews that are short on average, slang, spelling mistakes, incorrect grammar, no spam reviews, and few occurrences of ineffective emoticons. These characteristics led to choosing an integrated lexicon and rule-based model.

The dataset was split into two subsets: the training and testing sets. The training set comprises 5141 (70%) of the reviews and the testing set comprises 2205 (30%) of the reviews. This provides a better indication of how well the proposed model will perform on unseen data compared to other approaches. Table 1 summarizes the dataset content statistics.

Pre-Processing
When adopting the rule-based approach with supportive lexicons, it is crucial to perform text pre-processing tasks. The first step in the proposed algorithm is to split the review into sentences based on the punctuation marks that identify a sentence end, such as the full stop, question mark and exclamation mark. This has an important impact on linking the polarity score with the right aspect term without interference from irrelevant sentences. In addition, the review subject is added as the first sentence of the review. Next, the sentences are tokenized; for each token, punctuation is removed and all letters are converted to lowercase. However, the pre-processing tasks do not perform normalization, such as removing repeated characters, because the aspect sentiment scoring phase treats repeated characters as intensification, which affects the polarity score. For example, the proposed approach considers the word "greaaat" as "very great". Finally, stop words are marked for exclusion from all subsequent phases by using a customized list of stop words such as "the", "an" and "of". This list was compiled by studying the domain and the reviews in the dataset.
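The pre-processing steps above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and the stop-word list is an assumed, abbreviated stand-in for the customized list described in the text.

```python
import re

# Assumed, abbreviated stop-word list; the customized list is not reproduced here.
STOP_WORDS = {"the", "an", "a", "of", "is", "it"}


def split_sentences(review, subject=None):
    """Split on sentence-ending punctuation; the subject becomes the first sentence."""
    parts = [p.strip() for p in re.split(r"[.!?]+", review)]
    sentences = [p for p in parts if p]
    if subject:
        sentences.insert(0, subject.strip())
    return sentences


def tokenize(sentence):
    """Lowercase and strip punctuation; return (token, is_stop_word) pairs.

    Repeated characters are deliberately kept so that a later scoring
    phase can treat them as intensification.
    """
    tokens = []
    for raw in sentence.split():
        word = re.sub(r"[^\w']", "", raw).lower()
        if word:
            tokens.append((word, word in STOP_WORDS))
    return tokens


def preprocess(review, subject=None):
    """Return one list of marked tokens per sentence."""
    return [tokenize(s) for s in split_sentences(review, subject)]


result = preprocess("The app design is greaaat! But it is buggy.", subject="Nice app")
```

Stop words are only marked here rather than deleted, so that later phases can decide whether to skip them.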

Implicit and explicit aspect extraction
Aspects categories are vital for the aspect extraction task in sentiment analysis. To address this requirement in the government mobile apps domain, [5] have defined a set of aspects categories according to the written standards by Android, Apple and Smart Dubai Office [35][36][37][38]. The resulting aspects categories were User Interface, User Experience, Functionality and Performance, Security, and Support and Updates. These aspects categories are used in this study.
One of the main challenges in the aspect extraction task is that some aspects are not explicitly mentioned in a review. A sentence with an explicit aspect contains a term that indicates the aspect category, while an implicit aspect is not expressed by any specific word or term. For example, in a review with two sentences "The app design is attractive. But it is buggy", the first sentence contains an explicit term "design" with an opinion word "attractive" that indicates a positive sentiment toward the aspect category "User Interface". On the other hand, the second sentence does not contain a term that explicitly indicates the aspect (something like: "The app functionalities are buggy"). However, it contains an opinion word "buggy" which implicitly implies the aspect category "Functionality and Performance".
One of the approaches that are widely used in aspect identification is to consider opinion words as a good potential candidate for implicit aspect extraction [29]. Thus, the designed algorithm first looks for opinion words that directly denote aspect according to the lexicon. Otherwise, if the opinion word cannot determine the aspect category, the algorithm will search for the nearest aspect term in the same sentence with maximum window size of two with more priority to the right side, since the adjective usually occurs before the term. The pair of identified opinion word and aspect term will be looked up in the lexicon in order to determine the aspect category.
Function 1 illustrates the algorithm to extract the explicit and implicit aspects. This function returns two arrays, where the first array (aspectIndices) represents the indices of the aspect terms in the review and the second array (aspectCategories) represents the aspect categories of the corresponding aspect terms in the first array.
In the first sentence of the previous example "The app design is attractive", the algorithm will identify "attractive" as an opinion word that cannot determine the aspect as per the lexicon, so it will look for the nearest aspect term in the same sentence which is "design" and determine the "User Interface" as aspect category through the lexicon. Conversely, in the second sentence "But it is buggy", as per the lexicon, "buggy" implies an opinion toward the aspect category "Functionality and Performance" even though the sentence does not contain any explicit aspect term. Similar approach was also adopted by [39].
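The extraction logic described above can be approximated by the sketch below. It is an assumption-laden illustration rather than Function 1 itself: the three toy lexicons are hypothetical, stop words are assumed to have been dropped already, the aspect term alone (not the opinion-aspect pair) is looked up for the category, and for implicit aspects the index of the opinion word is returned.

```python
# Hypothetical toy lexicons; the real lexical resources come from [5].
OPINION_WORDS = {"attractive", "buggy"}
IMPLICIT_ASPECTS = {"buggy": "Functionality and Performance"}
ASPECT_TERMS = {"design": "User Interface"}
WINDOW = 2  # maximum distance between opinion word and aspect term


def extract_aspects(tokens):
    """Return (aspect_indices, aspect_categories) for one tokenized sentence."""
    aspect_indices, aspect_categories = [], []
    for i, word in enumerate(tokens):
        if word not in OPINION_WORDS:
            continue
        if word in IMPLICIT_ASPECTS:
            # Implicit aspect: the opinion word itself denotes the category.
            aspect_indices.append(i)
            aspect_categories.append(IMPLICIT_ASPECTS[word])
            continue
        # Explicit aspect: nearest term within the window, right side first.
        for dist in range(1, WINDOW + 1):
            found = None
            for j in (i + dist, i - dist):
                if 0 <= j < len(tokens) and tokens[j] in ASPECT_TERMS:
                    found = j
                    break
            if found is not None:
                aspect_indices.append(found)
                aspect_categories.append(ASPECT_TERMS[tokens[found]])
                break
    return aspect_indices, aspect_categories


explicit = extract_aspects(["app", "design", "attractive"])
implicit = extract_aspects(["but", "buggy"])
```

With these toy lexicons, the first call links "attractive" to the term "design", and the second call resolves "buggy" to its category without any aspect term in the sentence.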

Aspect sentiment scoring
The approach that has been followed employs the populated lexicons produced by [5]. As Function 2 shows, the algorithm navigates through the sentences and, once an opinion word is identified, retrieves its polarity score from the lexicon and links it with the extracted aspect. In the experiment, we applied several settings to the algorithm in order to identify opinion words in a sentence in addition to the use of the lexicons. For instance, various rules are adopted to handle negations, intensification, downtoners, repeated characters, and the special case of negation-opinion rules. Finally, as illustrated in Function 2, the function returns two arrays that store the polarity scores in the first array (polarities) and the corresponding aspect categories in the second array (orderedAspectCategories).

Function 2: Aspect Sentiment Scoring Algorithm
As a baseline, the basic lexicon approach is followed, where a straightforward lookup returns the polarity score. However, the polarity score retrieved from the sentiment lexicon is not final. Several modifiers can affect the accuracy of the polarity score, such as negation, intensification and downtoners [40]. For each of these modifier types, a lexicon that lists the terms that imply a modification has been constructed. For instance, the term "not" implies a negation; when it occurs near an opinion word, as in the sentence "The app design is not attractive", the polarity score is inverted by multiplying it by (−1). This case is called "switch negation" as it reverses the polarity of the opinion word. However, there is another negation case in which the polarity is not inverted. For example, in the review "The app design is not innovative", the negation does not invert the polarity since the opinion word "innovative" is highly positive. Instead of polarity reversal, the algorithm performs polarity shifting by multiplying the score by a certain threshold value (a fraction, e.g. Threshold = 0.5). In shifting negation, the proposed approach identifies highly positive opinion words, whose scores are greater than (0.75), and highly negative opinion words, whose scores are less than (−0.75). In the previous example, the polarity score of "innovative" as per the lexicon is (1.0), which is greater than (0.75), so this value will be shifted to (0.375). Table 2 shows examples of negation words from the negation lexicon.
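The distinction between switch and shift negation can be sketched as below. The word lists and polarity scores are hypothetical, and the sketch assumes that a negation applies only when the negation word directly precedes the opinion word; the text says only that it occurs "near" it.

```python
# Hypothetical toy lexicons and scores, for illustration only.
NEGATIONS = {"not", "never", "no"}
POLARITY = {"attractive": 0.6, "superb": 0.8, "terrible": -0.9}

STRONG = 0.75       # scores beyond +/-0.75 count as highly polar
SHIFT_FACTOR = 0.5  # the fraction used for shift negation


def apply_negation(score):
    """Shift highly polar scores; switch (invert) all others."""
    if abs(score) > STRONG:
        return score * SHIFT_FACTOR
    return score * -1


def score_opinion(tokens, index):
    """Score the opinion word at index, negated if the previous token is a negation."""
    score = POLARITY[tokens[index]]
    if index > 0 and tokens[index - 1] in NEGATIONS:
        score = apply_negation(score)
    return score


switched = score_opinion(["design", "not", "attractive"], 2)
shifted = score_opinion(["not", "superb"], 1)
```

Keeping the strength cut-off and the shift fraction as named constants makes it easy to try other settings.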
The proposed algorithm handles intensifiers and downtoners, which can increase or decrease the polarity score of an opinion word when they appear in the sentence. For example, the sentence "The app design is very attractive" implies a more positive sentiment than the sentence "The app design is attractive", so "very" is considered an intensifier. On the other hand, the sentence "The app design is quite attractive" implies a less positive sentiment, so the word "quite" is considered a downtoner. Hence, the algorithm modifies the polarity score when it locates any intensifier or downtoner from the lexicon that occurs before the opinion word. However, not all intensifiers or downtoners have the same impact. For instance, the word "most" denotes more intensification than the word "very". Similar to [41], the proposed approach assigns a multiplication factor to each intensifier and downtoner. For example, the polarity score of the opinion word that follows the intensifier "very" is multiplied by the factor (1.25). On the other hand, the polarity score of the opinion word that follows the downtoner "quite" is multiplied by the factor (0.75). Table 3 shows several intensifiers and downtoners along with their factors and examples. Modifications can appear in different forms and can be embedded in the opinion word itself. For example, the suffix "est" as in "the greatest" is also considered an intensifier and is equivalent to the word "most". Likewise, people like to exaggerate their feelings by repeating letters, as in "greaaaat" or "baaaaad", which is roughly equivalent to "very great" or "very bad". The authors of [42] treated repeated characters as an intensification problem and suggested strengthening the word sentiment score, for both positive and negative orientations, whenever two or more consecutive repeated characters are found. For example, in the review "Baaaaaaad app", assuming that the score of "bad" is "−2", the sentiment score due to the repeated characters could be "−4". Hence, the algorithm identifies such cases by looking for repeated characters or specific suffixes and treats them as intensifiers. Table 4 provides the details of the complete set of intensifiers and downtoners with their factors.
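A rough sketch of how preceding and embedded modifiers could be applied is given below. Only the factors for "very" (1.25) and "quite" (0.75) come from the text; the factor for "most", the polarity scores, and the choice to leave the resulting scores unclipped are assumptions made for illustration.

```python
import re

# Factors for "very" and "quite" follow the text; the factor for "most" is assumed.
MODIFIERS = {"very": 1.25, "quite": 0.75, "most": 1.5}
REPEAT_FACTOR = MODIFIERS["very"]  # "greaaaat" is treated like "very great"
SUFFIX_FACTOR = MODIFIERS["most"]  # "greatest" is treated like "most great"
POLARITY = {"great": 0.8, "bad": -0.6, "attractive": 0.6}  # hypothetical scores


def normalize(word):
    """Return (base_word, factor) after undoing repeated letters or an 'est' suffix."""
    if word in POLARITY:
        return word, 1.0
    # Collapse every run of a repeated letter to two letters, then to one.
    for replacement in (r"\1\1", r"\1"):
        base = re.sub(r"(.)\1+", replacement, word)
        if base in POLARITY:
            return base, REPEAT_FACTOR
    if word.endswith("est") and word[:-3] in POLARITY:
        return word[:-3], SUFFIX_FACTOR
    return word, 1.0


def score_with_modifiers(tokens, index):
    """Score the opinion word at index, scaled by embedded and preceding modifiers."""
    base, factor = normalize(tokens[index])
    score = POLARITY.get(base, 0.0) * factor
    if index > 0 and tokens[index - 1] in MODIFIERS:
        score *= MODIFIERS[tokens[index - 1]]
    return score
```

Words already in the lexicon are looked up first, so legitimate double letters such as the "tt" in "attractive" are not mistaken for exaggeration.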

Aspect sentiment aggregation
The algorithm aims to determine the star rating for the different aspects extracted from the review. A five-star rating scale (1 to 5) is chosen in the experiment, where one star expresses a very negative sentiment toward the aspect, two stars express a negative sentiment, three stars express a neutral sentiment, four stars express a positive sentiment, and five stars express a very positive sentiment. This can play a crucial role in understanding users' feedback toward specific aspects rather than general feedback, so that smart government app owners can be aware of the pains and gains of their customers. Since the opinion words are extracted along with their final polarity scores and aspects, it is a straightforward procedure to calculate the average of the polarity scores for the opinion words that are categorized under each aspect. Consider the following illustrative scenario of the user review: "The app design is very attractive. It is organized and neat. But it is quite buggy and some features are missing". The algorithm identifies the five sentiments shown in Table 5.

Function 3: Aspect Sentiment Aggregation Algorithm
Function 3 illustrates the algorithm that handles the aggregation task as explained in the previous example and returns the final star rating of a review. The final stage of the aspect-based sentiment analysis is to use data visualization to summarize the full set of reviews in a graphical representation that shows the equivalent star rating of each aspect, based on the five-star scale returned from Function 3.
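The aggregation step can be sketched as follows, using the five polarity scores of the illustrative review. The mapping from an average polarity to a star rating is not spelled out in the text; the equal-width bins used here are an assumption that happens to reproduce the 4-star and 2-star outcomes of the example.

```python
from collections import defaultdict


def aggregate(polarities, aspect_categories):
    """Average the polarity scores per aspect category."""
    grouped = defaultdict(list)
    for score, category in zip(polarities, aspect_categories):
        grouped[category].append(score)
    return {cat: sum(scores) / len(scores) for cat, scores in grouped.items()}


def to_stars(average):
    """Map an average polarity in [-1, 1] to 1..5 stars (assumed equal-width bins)."""
    # Assumed cut points between star levels: -0.6, -0.2, 0.2, 0.6.
    for stars, upper in enumerate((-0.6, -0.2, 0.2, 0.6), start=1):
        if average < upper:
            return stars
    return 5


# Final polarity scores and categories from the illustrative review.
polarities = [0.86, 0.47, 0.36, -0.375, -0.56]
categories = ["User Interface"] * 3 + ["Functionality and Performance"] * 2

averages = aggregate(polarities, categories)
ratings = {category: to_stars(avg) for category, avg in averages.items()}
```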

Discussion of results
As explained in the previous sections, several experiments were conducted to evaluate the algorithm improvements on both aspect extraction and opinion classification. The experimental evaluation for aspect extraction and sentiment classification was carried out according to several settings and parameters. A standard confusion matrix is employed to measure the performance on the unseen testing set, which comprises 2205 reviews (30%) of the overall reviews (7345). This helps in calculating more advanced evaluation metrics such as Precision, Recall, F-measure and Accuracy. The confusion matrix elements for both aspect extraction and sentiment classification are defined as illustrated in Tables 6 and 7. In addition, the Pearson product-moment correlation coefficient was adopted to examine the correlation between the predicted values and the actual values of the sentiment classification task. This metric returns a value between −1 and +1. A higher positive value denotes a positive relationship between the predicted and the actual values. On the other hand, a lower negative value denotes that the predicted and actual values are negatively correlated, which means that if one value increases, the other value decreases. A correlation value close to zero refers to the situation where there is no relationship between the predicted and actual values.

Table 5 (last row): Missing | Functionality and Performance | −0.56 | - | −0.56

The aggregation task groups the polarities by the aspect categories and calculates the average. Here there are two aspect categories:
1. User Interface: (0.86 + 0.47 + 0.36)/3 = 0.56. This is considered as a 4-star rating in terms of "User Interface".
2. Functionality and Performance: (−0.375 + −0.56)/2 = −0.47. This is considered as a 2-star rating in terms of "Functionality and Performance".
The corresponding performance evaluation measures are calculated according to the following formulas, where TP, TN, FP and FN are the confusion matrix elements:

$$\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}$$

$$\text{F-measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}, \qquad \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

The aspect extraction results include a comparison of the experiments that were conducted in this study. As Table 9 shows, there is a significant increase in the evaluation results when implicit aspects are considered as well as explicit ones. When the algorithm identified sentiments toward aspects without taking implicit aspects into consideration, it was still able to identify opinion words and to search the surrounding terms to recognize the aspect. Thus, it succeeds in aspect extraction in some cases, especially when there is another opinion word with an explicit aspect in the same sentence. However, in many cases a misleading aspect term appeared near the identified opinion word, so the sentiment was assigned to a false aspect and the correct aspect was not extracted, as the sample reviews in Table 8 show. This may increase both the FP and FN values and, as a result, reduce both Precision and Recall.
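The effect of a misassigned aspect on the confusion-matrix measures can be shown with a small Python sketch. The function name and all counts below are invented for illustration and are not results from the study. The standard definitions of Precision, Recall, F-measure and Accuracy are assumed.

```python
def evaluation_metrics(tp, fp, fn, tn):
    """Standard confusion-matrix measures; empty denominators yield 0.0."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "accuracy": accuracy,
    }


# Illustrative counts only.
baseline = evaluation_metrics(tp=80, fp=10, fn=10, tn=100)

# One sentiment attached to a misleading nearby aspect term: the false
# aspect adds an FP, and the missed correct aspect turns a TP into an FN.
misassigned = evaluation_metrics(tp=79, fp=11, fn=11, tn=100)

print(baseline)
print(misassigned)
```

A single misassignment therefore lowers Precision and Recall at the same time, which is the pattern described for the runs without implicit aspects.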
Likewise, several parameter settings have been employed to measure the progression of the aspects' sentiment classification. As stated in Section 3.5, the basic lexicon-based (L) approach is considered the baseline for the performance evaluation. Other rule settings are then added to measure the improvements in the performance measures. These rule settings are adopted to handle several challenges in sentiment analysis: negation (N), intensification (I), downtoners (D), repeated characters (R) and special cases of negation-opinion rules (S). As shown in Table 10, the integrated lexicon and rules (L + N + I + D + R + S) achieved the highest performance scores with respect to aspect extraction and opinion classification.
Furthermore, to compare the performance of our proposed approach with other techniques, the feature-based approach for aspect-based sentiment analysis using SVM proposed by [17] and the Gini Impurity Index with SVM proposed by [18] were adopted. The supervised machine learning approach by [17] was replicated to extract aspects and classify the corresponding sentiments from the mobile government apps reviews. First, SentiWordNet was used to identify and remove the objective sentences in each review. Second, the aspect terms were extracted using the NNS, NN, and NNP POS tags to identify the most and least frequently used aspects. The most frequently used aspects were defined based on a pre-defined frequency threshold (n = 10). These aspects were considered in the experiment, whereas the least frequently used aspects were discarded. Third, the Stanford parser was used to extract the opinion words and to find the grammatical dependencies that connect the opinion words to the extracted aspects. Finally, the feature sets that comprise the aspects, opinion words and polarity scores were fed into the SVM classifier model.
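The second step of the replicated pipeline, selecting frequent noun-tagged aspect candidates, might look like the following Python sketch. Several details are assumptions rather than statements from the paper: the reviews are taken to be already POS-tagged with Penn Treebank tags, terms are lower-cased before counting, and a term is kept when its count is greater than or equal to the threshold. The function name `frequent_aspects` is hypothetical.

```python
from collections import Counter

# POS tags treated as aspect candidates, as listed in the text.
NOUN_TAGS = {"NN", "NNS", "NNP"}


def frequent_aspects(tagged_reviews, threshold=10):
    """Return noun terms whose corpus frequency reaches the threshold.

    tagged_reviews: list of reviews, each a list of (token, pos_tag) pairs.
    The ">=" comparison and the lower-casing are assumptions.
    """
    counts = Counter(
        token.lower()
        for review in tagged_reviews
        for token, tag in review
        if tag in NOUN_TAGS
    )
    return {term for term, count in counts.items() if count >= threshold}


# Toy corpus with a lowered threshold so the filter is visible.
tagged = [
    [("The", "DT"), ("app", "NN"), ("crashes", "VBZ")],
    [("Great", "JJ"), ("app", "NN")],
    [("Login", "NN"), ("fails", "VBZ")],
]
print(frequent_aspects(tagged, threshold=2))
```

Terms below the threshold, such as "login" in the toy corpus, are discarded, which mirrors how infrequent aspects were excluded from the experiment.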
RapidMiner was used to train the SVM model on the training set so that the classifier could gain the domain knowledge. The model was then evaluated on the same unseen testing set used for our proposed integrated rule-based model. The result of the SVM classifier is a set of vectors that contain the aspects and their opinion words for each review. The vector can be represented as $V_k(R_n, F_i, O_{ij}, P_{ij}, N_{ij})$, where $V_k$ is the $k$-th vector, $R_n$ is the $n$-th review, $F_i$ is the $i$-th feature in $R_n$, $O_{ij}$ is the $j$-th opinion word for the $i$-th feature, $P_{ij}$ is the positive score for the $j$-th opinion word, and $N_{ij}$ is the negative score for the $j$-th opinion word.
Additionally, we used RapidMiner to replicate the model proposed by [18], with minor modifications, for further evaluation. TF/IDF was used as the weighting technique. TF/IDF is among the most common and widely used term weighting techniques. It is an empirical, frequency-based approach that considers both word and document frequencies, and it is well known for its proven performance and efficiency [43]. However, this method treats low-frequency words as important and high-frequency words as unimportant, which negatively impacts the precision of the classifier [18]. To overcome this issue, weights were assigned to the attributes using the Gini Impurity Index. This index calculates the weight of each attribute with respect to the label attribute by computing the Gini Index of the class distribution. This is done by adding the "Weight by Gini Index" operator after the "Process Documents" operator, which uses TF/IDF as its vector creation method. To evaluate the performance, the "Cross Validation" operator was used with 10 folds along with the SVM classifier. Cross-validation evaluates a model by repeatedly dividing the dataset into a training set and a testing set. In 10-fold cross-validation, the original dataset is randomly divided into 10 equal subsets. A single subset is retained for testing the model, and the remaining 9 subsets are used as training data. The process is repeated 10 times, with each of the 10 subsets used exactly once as the testing set. The results of the 10 folds are then averaged.
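The 10-fold procedure can be sketched in plain Python as follows. This illustrates the splitting and averaging logic only, not RapidMiner's "Cross Validation" operator. The function names and the `evaluate_fold` callback (which would train the SVM on the training indices and return a score on the testing indices) are hypothetical. When the dataset size is not divisible by the number of folds, the fold sizes here differ by at most one.

```python
import random


def k_fold_splits(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)          # random partition
    folds = [indices[i::k] for i in range(k)]     # k near-equal subsets
    for i, test_idx in enumerate(folds):
        train_idx = [
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        ]
        yield train_idx, test_idx


def cross_validate(n_samples, evaluate_fold, k=10, seed=0):
    """Average the score returned by evaluate_fold over the k folds."""
    scores = [
        evaluate_fold(train_idx, test_idx)
        for train_idx, test_idx in k_fold_splits(n_samples, k, seed)
    ]
    return sum(scores) / len(scores)


# Dummy callback: reports the share of the data held out in each fold.
held_out_share = cross_validate(20, lambda train, test: len(test) / 20)
print(held_out_share)
```

Each sample index appears in exactly one testing fold, so every review contributes to the averaged score exactly once.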
To measure the impact of the domain lexicons and rules on the performance of SVM, and to provide a complete analysis and discussion, these lexicons and rules are incorporated as the input features that represent the reviews. The input for the SVM is represented as a set of rows that denote the interpretation of each review, where each row is represented according to the following vector:

In the above representation, each feature (N, I, D, R, O) is indexed using two variables. The first variable represents the review identifier and the second represents the word index in the review, while m denotes the maximum word count of a review in the given dataset. As illustrated in Table 10, the accuracy achieved with this representation is higher than that of the other SVM models. This underlines the fact that the input dataset has distinct lexicon features that can help in the classification process.
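One possible way to build such a row for each review is sketched below in Python. The layout is an assumption: the features are stored feature by feature (all N values, then all I, D, R and O values), each padded to m positions. The mini-lexicons, the repeated-character pattern and the function names are hypothetical stand-ins for the study's domain lexicons and rules.

```python
import re

# Hypothetical mini-lexicons, for illustration only.
NEGATIONS = {"not", "never", "no"}
INTENSIFIERS = {"very", "extremely"}
DOWNTONERS = {"slightly", "somewhat"}
OPINION_LEXICON = {"good": 1.0, "slow": -0.5, "bad": -1.0}
# Assumed rule: the same character three or more times in a row.
REPEATED = re.compile(r"(.)\1{2,}")


def word_features(word):
    """Return the (N, I, D, R, O) features of a single word."""
    w = word.lower()
    return (
        int(w in NEGATIONS),
        int(w in INTENSIFIERS),
        int(w in DOWNTONERS),
        int(bool(REPEATED.search(w))),
        OPINION_LEXICON.get(w, 0.0),
    )


def review_vector(review, max_words):
    """Build one fixed-length row: N_1..N_m, I_1..I_m, ..., O_1..O_m."""
    rows = [word_features(w) for w in review.split()[:max_words]]
    rows += [(0, 0, 0, 0, 0.0)] * (max_words - len(rows))  # pad to m words
    return [row[f] for f in range(5) for row in rows]


reviews = ["not very good", "sooo slow"]
m = max(len(r.split()) for r in reviews)   # maximum word count in the data
matrix = [review_vector(r, m) for r in reviews]
for row in matrix:
    print(row)
```

Padding every review to m words gives all rows the same length, which a standard SVM implementation requires of its input.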
According to the results illustrated in Table 10, the proposed integrated lexicon and rule-based approach with all rules combined significantly outperformed both the feature-based aspect-based sentiment analysis using SVM and the aspect term extraction using the Gini Index and SVM classifier. One of the main reasons for these results is that the feature-based approach using SVM does not consider implicit aspects. Furthermore, this approach depends on well-structured sentences for accurate POS tagging and parsing. The aspect term extraction using the Gini Index with SVM, on the other hand, achieved higher performance than the feature-based approach because it does not look into the structure of the reviews and relies only on the weights of the terms. However, by adopting the lexical features as inputs for the SVM, the model achieved higher performance than the other SVM models and slightly lower performance than the integrated lexicon and rule-based model. With respect to the Pearson product-moment correlation coefficient, as illustrated in Table 10, the use of the various lexical rules significantly impacts the correlation value. The highest correlation value, 0.81, was achieved when using the combined rules (L + N + I + D + R + S).
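For reference, the Pearson product-moment correlation between predicted and actual values can be computed as in the following Python sketch. The function name and the sample ratings are invented for illustration and are not data from the study.

```python
from math import sqrt


def pearson_r(predicted, actual):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(predicted)
    if n != len(actual) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_p = sum(predicted) / n
    mean_a = sum(actual) / n
    covariance = sum(
        (p - mean_p) * (a - mean_a) for p, a in zip(predicted, actual)
    )
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    if var_p == 0 or var_a == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / sqrt(var_p * var_a)


# Made-up star ratings: predictions that mostly track the actual ratings.
predicted_stars = [4, 2, 5, 3, 1, 4]
actual_stars = [5, 2, 4, 3, 1, 3]
r = pearson_r(predicted_stars, actual_stars)
print(f"r = {r:.2f}")
```

A value near +1 means that the predicted ratings rise and fall with the actual ones, a value near −1 means that they move in opposite directions, and a value near zero indicates no linear relationship.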

Implications
Public opinion is very important for increasing the usage of smart government mobile apps. Government agencies are looking for customer-centric approaches to better understand their consumers' opinions of their services. Reviews are a main source for app owners to understand users' perspectives and hence provide services that meet their customers' expectations. Moreover, people are keen to look at others' opinions before downloading and using any smart app.
Research has made significant progress in natural language processing despite the difficulties faced in such tasks [44]. There is a huge need in industry for sentiment analysis services, as every business wants to know the opinions of its customers. Given these needs, research in the field of sentiment analysis and opinion mining will stay active for years to come [4].
One may argue that current methods and algorithms are sufficient to provide unsupervised solutions for datasets from different domains. However, a fully automated and accurate solution that can be applied to all domains does not yet exist. It is nevertheless possible to develop effective semi-supervised solutions where domain-specific aspects are manually annotated while other tasks are automated. Moreover, cross-domain sentiment analysis can be used to reduce the manual effort of labelling data. In this case, the machine learns from one domain and applies the knowledge to analyse the sentiment of texts in another domain [45].
Therefore, although our study is domain dependent, the methods applied in this research can be extended to other domains concerned with public and private services. Moreover, the methods proposed in this research can be applied to social media, where the textual posts are similar in nature and structure.

Conclusion
Aspect-based sentiment analysis is considered one of the challenging tasks in sentiment analysis research. It is important that all feedback is understood and categorized so that smart governments can rely on this channel to listen to their customers. This feedback can then drive future improvements and optimizations of smart services so that they exceed people's expectations. In this study, an integrated lexicon and rule-based model was employed to extract explicit and implicit aspects and to classify the sentiments toward these aspects. The model combined the lexicons manually generated in this study with hybrid rules to handle some of the key challenges in aspect-based sentiment analysis in particular and sentiment analysis in general. This approach reported high performance results. It confirmed that integrating sentiment and aspect lexicons with rule settings that handle negation, intensification, downtoners, repeated characters, and special cases of negation-opinion rules outperformed the lexicon baseline and the other rule combinations.
The dataset contains a substantial number of comparative sentences. Rules for handling them can be considered as an extension of the proposed model. As [46] pointed out, users may express their opinions by comparing several entities. Such comparative rules are a significant source for determining the strengths and weaknesses of government mobile apps and for highlighting future enhancements and improvement areas. Another direction that can be applied to such a dataset is to identify improvement areas and innovative ideas by analysing customers' reviews, which may include suggestions, complaints, problems, bugs and comparisons, among others.
Owners of government mobile apps should take a step forward and consider how to gather their customers' reviews and feedback. One quick and easy way is to prompt users for a review within the mobile app. A rewards program could also help to increase the number of customer reviews of a particular application.