Erratum: Employing sentiment analysis for gauging perceptions of minorities in multicultural societies: An analysis of Twitter feeds on the Afrikaner community of Orania in South Africa

These corrections do not alter the study’s findings of significance or overall interpretation of the study results. The errors have been corrected in the PDF version of the article. The publisher apologises for any inconvenience caused.

In the version of this article initially published, the text entries in Eqn 1 and Eqn 4 were misspelled, which rendered the equations illegible. The corrected values for Eqn 1 and Eqn 4 are presented here:

Abstract
South Africa is well known as a country characterised by racial and ethnic divisions, particularly for the divisions and conflicts between the white population and black population. This study uses the Twitter platform to analyse the discourse around the controversial town of Orania, a minority Afrikaner community that aims to preserve their Afrikaner culture. In doing so, we make use of sentiment analysis, a subfield of natural language processing (NLP). We follow a lexicon-based approach using four different publicly available data sets to test how the discourse around this minority community can be analysed. We show, based on the discourse on Orania on Twitter, that (1) Orania is mostly depicted in a negative light, (2) Orania is mostly seen as a racist community, and (3) Orania is often mentioned in reference to other issues that affect Afrikaners directly, such as farm attacks, first language education and land expropriation without compensation.
Our study also shows that using lexicons as a sentiment analysis technique was not sufficient in the automatic detection of abusive language, but rather the sentiment of the tweet. Suggestions are made for further research that focuses on the automatic detection of abusive language online.

Introduction
South Africa is a diverse country renowned for its racial and cultural tensions. In recent years, commentators such as Khoza (2017b), Roets (2017), Brink and Mulder (2017) and Steward (2016) have noted an increase in racial tensions as witnessed on social media platforms. Social media platforms are a group of internet-based applications that allow for the creation and exchange of data that are generated by people. Gundecha and Liu (2012) describe the different types of social media applications such as online social networking (Facebook, MySpace, LinkedIn), blogs (Engadget), microblogging (Twitter, Tumblr, Plurk), social news (Digg, Reddit), media sharing (YouTube, Flickr) and wikis (Wikipedia, Wikitravel, Wikihow). The most important social media platforms used to create and exchange user-generated content (UGC) are online social networking, microblogging and blogs (Stieglitz & Dang-Xuan 2013). One of the most prominent microblogging platforms is Twitter.com (http://www.twitter.com/), which has grown steadily in its adoption as the main microblogging platform in South Africa over the last few years. In a recent survey, World Wide Worx (2016) reports that Twitter is used by approximately 7.7 million people in South Africa, making it the third most used social media platform after YouTube (8.74 million) and Facebook (14 million). Some recent important debates under the hashtags #FeesMustFall and #StateCapture have also drawn much attention to how the general public can voice their opinions using social media (Findlay 2015), and one could add the conflict around the hashtag #WinnieMandela as a more recent example of how ethnic tensions play out on social media.
Sentiment analysis, a widely adopted big data analytics technique, is often used to mine customers' views and opinions from online social media platforms (Pang & Lee 2008). Sentiment analysis is a growing focus area of natural language processing (NLP) used to determine whether a text, or part of it, is subjective or not, and if subjective, whether it expresses a positive, negative or neutral view (Taboada 2016). Sentiment analysis of microblogging data, such as Twitter, has attracted much attention in recent years, both in industry and in academia. The reasons are mainly because of the rapid growth in Twitter's popularity as a platform for people to express their opinions and attitudes towards topics of interest. Not only is Twitter popular, the platform also contains an enormous volume of text as well as links to external media, including websites that are visible to subscribed users to that service (Pak & Paroubek 2010). This makes it an invaluable research resource for sentiment analysis, a field associated with extracting sentiments (or opinions) from unstructured text (such as Twitter messages).
The purpose of this study is to investigate the discourse surrounding the community of Orania by applying sentiment analysis to tweets, and to assess whether this approach is effective at gauging public perception of a minority community. The rest of this article is organised as follows. Firstly, we provide some background to the establishment of Orania. Thereafter, we describe business intelligence, sentiment analysis and related work. We then include an outline of the methodology and the data used for analysis, and highlight the results. After this section, we discuss the findings and whether the approach is applicable in gauging sentiment towards a minority community.

Background to the establishment of Orania
Since being united by the British Empire in 1910, when the Union of South Africa was established, South Africa has struggled with accommodating its various ethnic groups. Although best known for the conflict between the black and white populations because of apartheid, these groups are themselves also diverse and have often been in conflict. One solution to the problem of handling such a diverse population, and in following the European example (see Muller 2008), is to divide the country so that each population can achieve self-rule while still maintaining economic and other ties. Muller (2008:27) argues that ethnic conflict was one of the primary causes of the two World Wars and that aligning national borders with populations was proposed by Winston Churchill, Franklin Roosevelt and Joseph Stalin as 'a prerequisite to a stable postwar order'. Muller (2008) also quotes Churchill from a speech to the British parliament in December 1944, where he referred to the forced resettlement of populations: 'Expulsion is the method which, so far as we have been able to see, will be the most satisfactory and lasting. There will be no mixture of populations to cause endless trouble. … A clean sweep will be made' (p. 27). This was the original notion behind apartheid, which was formally introduced in 1948: each population would gain independence and self-rule in its own geographic area. Prime Minister Verwoerd, for instance, phrased this view clearly on 20 May 1959: 'Die een beginsel is die vrymaking van die Bantoe: die ander beginsel is die vrymaking van die blanke' [The one principle is the liberation of the Bantu: the other principle is the liberation of whites] (Pelzer 1966:275). In other words, no population group would rule over another; each would follow the European example where nations ruled themselves in their own geographic area.
To do this, attempts were made to expand the existing homelands and grant them independence, with Transkei being the first to gain independence in 1976 and Bophuthatswana following in 1977. However, by this time it was already apparent that the homelands were not economically or politically viable, and alternative forms of partition were debated by, for instance, Tiryakian (1967), Sulzberger (1977), Von der Ropp (see Blenck & Von der Ropp 1977; Von der Ropp 1979; Von der Ropp & Blenck 1976), Lambsdorff (1986) and Pabst (1996) (see also Geldenhuys 1981:55). Von der Ropp, for instance, proposed that South Africa be divided into a white part and a black part, which would allow each part access to mines, harbours and large metropolitan areas that would give people access to land and economic resources. Partition in the South African case has not gained favour, either nationally or internationally, and in the run-up to South Africa's first inclusive election in 1994, the leadership of most of these communities decided to opt for an inclusive 'Rainbow Nation' where all would integrate and work and live side by side.
One of the communities that challenged this integrative solution was the Afrikaner Nationalist community. Fearing that majority rule would eradicate minority rights, languages and communities (the Afrikaner currently comprises 59.1% of the white population, which in turn comprises around 8% of the total South African population), it was proposed that the Afrikaner establish a volkstaat (a country for the Afrikaner) where they would rule themselves (Hagen 2013; Pienaar 2007; Schönteich & Boshoff 2003). The idea of violent secession was discussed in the early 1990s, but given the cost of a civil war, the leading organisation that proposed this solution, the Freedom Front, reached a settlement with the African National Congress (ANC) that led to the inclusion of Article 235 in the new South African constitution, which guarantees minorities' right to self-determination. No volkstaat was, however, established.
The Afrikaner Vryheidstigting (Afrikaner Freedom Foundation), founded on 21 March 1988 and currently known as the Orania Beweging (Orania Movement), however sought a practical rather than political solution. In 1991, they bought the abandoned town of Orania in the Northern Cape with the goal of establishing an Afrikaner community here. A few hundred Afrikaners moved here, and although initial progress was slow, in recent years this community has grown to around 1400 currently. Orania now has its own bank, the Orania Spaar-en Krediet Koöperatief (Orania Savings and Credit Cooperative); uses its own 'currency', the Ora (although tied to the South African rand); has a fast-growing economy; and exports products globally (see De Beer 2006;Hagen 2013;Kotze 2003;Labuschagne 2008;Pienaar 2007;Steyn 2005).
Part of the reason for Orania's recent growth has been the Afrikaner's increasing sense of marginalisation, as studied, for instance, by Hermann (2006). Affirmative action, the loss of first language education, negligible political power, crime, farm attacks and threats of violence by black political leaders have all contributed to many Afrikaners questioning the viability of the Rainbow Nation. For instance, the leader of the third-largest political party, the Economic Freedom Fighters (EFF), Julius Malema, has repeatedly called for the dispossession of 'whites'' property and urged his followers to kill 'Boers' (Afrikaners). In 2017, Brink and Mulder (2017) compiled a report that shows numerous government officials calling for violence against Afrikaners, while commentators such as Khoza (2017b) and Steward (2016) also note the increasing amount of hate speech directed towards whites on social media. At the time of writing, the South African government is in the process of evaluating how to expropriate land without compensation, and the discourse around the subject is often phrased in ethnic terms: black people are the 'original' owners and white people 'stole' the land and should therefore be dispossessed (see, e.g. Eloff 2017; Osborne 2018). Importantly, a report by the South African Institute of Race Relations, based on a nationwide study, notes that '… 61% of black respondents now agree that South Africa is a country for blacks rather than whites, while only 38% disagree. This suggests that ANC and EFF rhetoric castigating whites and demanding a major shift in the ownership and management of the economy may be having significant impact on black opinion' (Jeffery 2018).
From its inception, Orania has been the target of fierce criticism. Orania is often depicted in the media as a racist town, a leftover of apartheid populated by white people who refuse to abandon their prejudices (see, e.g. Khan 2014; McNally 2010; Ngugi 2017). Focusing on Afrikaner culture, Orania does not exclude anyone based on race, but in reality, the Afrikaner, as a descendant of European settlers since 1652, is a white community and no black people have settled in Orania. This has led to Orania becoming synonymous with racism.

Business intelligence and sentiment analysis
Business intelligence and analytics (BI & A) is becoming increasingly important in analysing UGC, which includes sentiments, images and videos using big data analytics (Chen, Chiang & Storey 2012). Effective BI & A can be used to improve a firm or an organisation's decision-making capabilities. Other uses include improving operations, reducing marketing costs or simply obtaining a better understanding of customer preferences and opinions (Wixom & Watson 2010).
One such example is Twitter Sentiment Analysis (TSA), where text mining techniques are used to mine messages posted on Twitter. Twitter, a microblogging platform, allows users to share short messages, links to other websites, images or videos. A message is written by one person and read by a number of individuals, called followers. Most messages also contain hashtags, which are used to indicate the relevance of a tweet to a certain topic. These hashtags are created using the # character, followed by the name of the topic (#topic). Twitter Sentiment Analysis tends to focus on the sentiment identification or sentiment classification of individual Twitter messages, called tweets. Generally, two main approaches are followed for tweet-level sentiment detection: machine learning and lexicon-based.
The machine learning approach uses a sentiment classifier to determine the polarity of new texts (document, sentence or phrase). This process is referred to as supervised learning and requires training data to teach the sentiment classifier characteristics which distinguish a negative sentiment from a positive one (Pang, Lee & Vaithyanathan 2002). The training data are usually labelled according to the tweet's polarity (positive, negative and neutral) and can be inferred using hashtags and emoticons (Go, Bhayani & Huang 2009), or by means of consensus using results from Twitter sentiment websites (Barbosa & Feng 2010). The sentiment classifier algorithms, using the given training data set, build a predictive model to classify new incoming data. In the absence of sufficient manually labelled data, a semisupervised approach can be followed to extend the existing training data set with newly labelled instances.
Twitter Sentiment Analysis, using a machine learning approach, has been applied extensively. Some of the most applied sentiment classifiers include Naïve Bayes (NB), Support Vector Machines (SVM), Maximum Entropy (MaxEnt), Random Forests and Logistic Regression (da Silva & Hruschka 2014; Go et al. 2009; Pak & Paroubek 2010). An important drawback of supervised learning is that it tends to be domain-dependent and requires the labelling of data in a new domain, or re-training on newly arriving data (Taboada 2016). On the other hand, once a labelled data set is available, that is, where text has been labelled positive, negative or neutral, training is trivial, and a classifier can be built relatively quickly with programming languages such as Python, which supports machine learning algorithms (Pedregosa et al. 2011; Perkins 2014; Sarkar 2016).
Unlike sentiment classifiers, the lexicon-based approach does not require training data, but instead relies on a sentiment lexicon. The lexicon-based approach is a rule-based approach and is used to analyse text at the document or sentence level in conventional texts such as blogs, forums and product reviews (Ding, Liu & Yu 2008; Kim & Hovy 2004; Turney 2002). The lexicon-based approach can be used across different domains without changing the dictionaries, making it an attractive approach for TSA (Taboada et al. 2011). In this approach, sentiment values of text are derived from the sentiment orientation of the individual words using an existing lexicon dictionary. The sentiment values provided by the model's dictionary indicate a word's polarity (e.g. awesome is positive and horrible is negative). When new text is classified, words in the text are matched to words in the dictionary, and using various algorithms, the values are aggregated into a sentiment score for the text. In general, lexicon-based approaches are more intuitive, robust and easier to implement than supervised learning approaches. For example, lexicon-based approaches have been shown to be successful on conventional text, as well as tweets (Mohammad, Kiritchenko & Zhu 2013; Thelwall, Buckley & Paltoglou 2012; Thelwall et al. 2010). However, unlike the machine learning approach, lexicon-based methods are less explored in TSA, mainly because of the uniqueness of tweet messages (words such as gr8 and yolo) and their dynamic nature, with new hashtags emerging daily (Giachanou & Crestani 2016).

Related work
Sentiment analysis is often used in recommendation systems, online advertising systems and question-answering systems (Pang & Lee 2008). More recent studies include businesses and governments mining opinions from human-authored documents to assist with reputation management (Seebach, Beck & Denisova 2012). Other applications include mining Twitter data for opinions and sentiments during political elections (Tumasjan et al. 2010), stock market indicators (Bollen, Mao & Zeng 2011) and identifying social issues during natural disasters (Neppalli et al. 2017). In addition to these, sentiment analysis can also explore how news events affect public opinion. For example, in a study conducted by Wang et al. (2012), a real-time sentiment analysis model was used to evaluate responses to and public opinion regarding the 2012 US presidential election, while a more recent study by Jiang, Lin and Qiang (2016) assessed public opinion during the whole life cycle of a large hydro project. In addition to these studies, real-world applications, such as We Feel, are made available to the public to gauge and explore the real-time signal of the world's emotional state (Milne et al. 2015).

Text preprocessing
Basic linguistic preprocessing is required to prepare the lexical source for sentiment analysis, as most forms of social media (except reviews) are very noisy. Preprocessing includes data cleansing, tokenisation and syntactic parsing (Dey & Haque 2009). Firstly, external links and user names (signified by the @ sign) were masked: we replaced all URLs with the tag ||HTTP_URL|| and targets (e.g. '@John') with the tag ||AT_USER||. Special care was also taken with elongated words. We replaced a sequence of more than two repeated characters by two characters; for example, we converted 'huuuuuungry' to 'huungry'. Special characters ($, % and #) and punctuation marks (full stops, commas, question marks and exclamation marks) were removed, except emoticons, as people often use the latter to express sentiment with tokens such as ':)', ':-)' or ':('. Because of the small corpus size, retweets (tweets that are re-distributed and start with 'RT') were kept in the corpus. We also applied automatic filtering to remove duplicate tweets, and tweets that were not written in English. After cleaning, we performed sentence segmentation, which separates a tweet into individual sentences. As is standard in NLP practice, the sentences were tokenised. Several tokenisers were investigated and the TweetTokenizer, part of the Natural Language Toolkit (NLTK) by Bird, Klein and Loper (2009), was found to be best suited for the study. The tokeniser handles emoticons, HTML tags, URLs, retweets, user mentions and Unicode characters correctly. Finally, all English stop-words (i.e. common words with low discriminating power, e.g. the, is and who) were removed and the remaining tokens were converted to lowercase.
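The cleaning steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the classifier used in the study: the function name preprocess, the regular expressions, the small emoticon pattern and the short stop-word set are all hypothetical, and a plain whitespace split stands in for the NLTK TweetTokenizer so that the sketch needs only the standard library.

```python
import re

# All patterns below are illustrative assumptions, not the study's exact rules.
URL_RE = re.compile(r"(?:https?://|www\.)\S+")
USER_RE = re.compile(r"@\w+")
ELONGATED_RE = re.compile(r"([A-Za-z])\1{2,}")   # a letter repeated 3+ times
EMOTICON_RE = re.compile(r"[:;=]-?[()DPp]")      # e.g. :) :-) :( ;D
PUNCT_TABLE = str.maketrans("", "", "$%#.,?!")   # characters to strip
STOP_WORDS = {"the", "is", "who", "a", "an", "and", "to", "of"}  # toy subset
TAGS = {"||HTTP_URL||", "||AT_USER||"}


def preprocess(tweet):
    """Return a list of cleaned tokens for one tweet."""
    text = URL_RE.sub("||HTTP_URL||", tweet)      # mask links
    text = USER_RE.sub("||AT_USER||", text)       # mask user mentions
    text = ELONGATED_RE.sub(r"\1\1", text)        # 'huuuuuungry' -> 'huungry'
    tokens = []
    for raw in text.split():                      # stand-in for TweetTokenizer
        if EMOTICON_RE.fullmatch(raw):
            tokens.append(raw)                    # keep emoticons untouched
            continue
        token = raw.translate(PUNCT_TABLE)        # drop $ % # . , ? !
        if token in TAGS:
            tokens.append(token)
        elif token and token.lower() not in STOP_WORDS:
            tokens.append(token.lower())
    return tokens


print(preprocess("@John I am sooo huuuuuungry!!! :( see http://t.co/abc"))
```

The sketch only recognises emoticons that stand alone as a token; an emoticon attached to a word would need a tweet-aware tokeniser such as the one named in the text.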

Negation and modifiers handling
Negation refers to the reversal of a word's polarity, from positive to negative or from negative to positive, by special words such as never, no, not and n't. Handling negation is an important step of sentiment analysis, as negation can influence the sentiment of a text. A simple strategy for handling negation was followed in this study: if a negation word is found, the polarity score of the word following the negation is reversed. A similar strategy was followed for handling modifier words such as very, much and really. The polarity of the word following the modifier was multiplied by a factor of 1.3, thus strengthening the polarity of the word in its existing direction.
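A minimal sketch of this rule is given below, assuming that a tweet has already been tokenised and that the lexicon is a plain dictionary mapping words to polarity scores. The function name adjusted_scores and the toy lexicon are hypothetical; only the negation words, the modifier words and the factor of 1.3 come from the text.

```python
NEGATIONS = {"never", "no", "not"}
MODIFIERS = {"very", "much", "really"}
MODIFIER_FACTOR = 1.3  # factor stated in the text


def adjusted_scores(tokens, lexicon):
    """Return the polarity score of each lexicon word, adjusted by the
    token that directly precedes it (negation or modifier)."""
    scores = []
    for i, token in enumerate(tokens):
        if token not in lexicon:
            continue
        score = lexicon[token]
        previous = tokens[i - 1] if i > 0 else ""
        if previous in NEGATIONS or previous.endswith("n't"):
            score = -score                  # negation reverses the polarity
        elif previous in MODIFIERS:
            score = score * MODIFIER_FACTOR  # modifier strengthens it
        scores.append(score)
    return scores


toy_lexicon = {"good": 1.0, "bad": -1.0}    # illustrative values only
print(adjusted_scores(["not", "good"], toy_lexicon))
print(adjusted_scores(["very", "bad"], toy_lexicon))
```

Only the immediately preceding token is inspected, which mirrors the simple strategy described above; longer-range negation ('not a very good town') is not captured by this sketch.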

Sentiment analysis
The design of the sentiment model used in this study followed the lexicon-based approach using two lexicon dictionaries. This approach was chosen because of the lack of sufficient training data and because a lexicon-based approach can function without a training corpus. A Python 2.7 sentiment analysis classifier was developed to handle preprocessing, negation and modifiers, and to score each tweet. The sentiment score of each tweet was derived using the polarity scores of each word found in the lexicon dictionaries. The scores were then used in a classification method to classify the polarity of the tweets into positive, negative or neutral categories.

Lexicons used
The sentiment analyser employed two lexicon-based dictionaries, namely Bing Liu's Opinion Lexicon (Hu & Liu 2004) and the NRC-Canada Hashtag Sentiment Lexicon (Mohammad, Kiritchenko & Zhu 2013).

Score aggregation
Given a tweet t, the sentiment words were first identified by matching with the words in the two sentiment lexicons. We then computed an orientation score for the tweet t. Using, for example, the lexicon of Hu and Liu (2004), a positive word is assigned the semantic orientation score of +1, and a negative word is assigned the semantic orientation score of −1. A similar approach is followed using the lexicon of Mohammad, Kiritchenko and Zhu (2013). The sentiment score of a tweet t is then calculated as the sum of the scores of its sentiment words divided by the number of words with scores, which produces an average score.
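The averaging step can be illustrated with a short sketch. The function name tweet_score and the three-word lexicon are hypothetical; the +1 and −1 values follow the scoring convention described for the lexicon of Hu and Liu (2004), and returning 0.0 for a tweet without any lexicon words is an assumption made here so that such tweets end up neutral.

```python
def tweet_score(tokens, lexicon):
    """Average polarity of the tokens that appear in the lexicon."""
    scores = [lexicon[token] for token in tokens if token in lexicon]
    if not scores:
        return 0.0  # assumption: no sentiment words means no sentiment
    return sum(scores) / len(scores)


# Toy lexicon in the +1 / -1 style; the entries are illustrative only.
toy_lexicon = {"safe": 1, "crime": -1, "appalling": -1}

print(tweet_score(["orania", "safe", "sanctuary", "crime"], toy_lexicon))
print(tweet_score(["appalling"], toy_lexicon))
```

Dividing by the number of scored words rather than by the tweet length keeps the score within the range of the lexicon values regardless of how long the tweet is.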

Evaluation measures
Precision, recall, F-measure and accuracy are evaluation metrics used to evaluate the performance of a sentiment analyser (Go et al. 2009). These metrics are computed from a confusion matrix, which records true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN). True positives or negatives are correctly predicted values, which means the value of the tweet sentiment and the value of the predicted tweet sentiment are the same. False positives or negatives are incorrectly predicted values, which means the value of the tweet sentiment and the value of the predicted tweet sentiment are not the same.
Precision measures the proportion of tweets predicted as belonging to a class (positive or negative) that truly belong to that class. Precision (P) is defined as the number of true positives (TP) over the number of true positives plus the number of false positives (FP). Recall measures the proportion of tweets that truly belong to a class that were predicted as such. Recall (R) is defined as the number of true positives (TP) over the number of true positives plus the number of false negatives (FN). Accuracy is the most intuitive measure among these and measures the ratio of correctly predicted instances among all the tweets. Finally, the F-measure (or F1 score) is the harmonic mean of precision and recall. The formulas for the evaluation measures are given as follows:
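As an illustration of these definitions, the sketch below computes the four measures for one target class from paired lists of annotated and predicted labels. It uses the standard textbook formulas, P = TP / (TP + FP), R = TP / (TP + FN), F1 = 2PR / (P + R) and accuracy = (TP + TN) / (TP + TN + FP + FN); these are assumed to correspond to the equations of the article, whose exact notation and numbering are not reproduced here. The function name evaluate and the label strings are hypothetical.

```python
def evaluate(gold, predicted, target):
    """Precision, recall, F1 and accuracy for one target class,
    treating every other label as 'not target' (one-vs-rest)."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == target and p == target)
    fp = sum(1 for g, p in pairs if g != target and p == target)
    fn = sum(1 for g, p in pairs if g == target and p != target)
    tn = sum(1 for g, p in pairs if g != target and p != target)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


# Made-up labels, for illustration only.
gold = ["neg", "neg", "pos", "neu", "neg"]
pred = ["neg", "pos", "pos", "neg", "neg"]
print(evaluate(gold, pred, target="neg"))
```

Note that the accuracy returned here is the one-vs-rest accuracy for the chosen class; an overall accuracy across all three classes would simply count the tweets whose predicted label equals the annotated label.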
A comparison was made to evaluate the effectiveness of the polarity classification method against other sentiment analysers, namely Pattern (De Smedt & Daelemans 2012) and AFINN (Nielsen 2011). Both sentiment analysers make provision for sentiment classification using a lexicon. The AFINN lexicon consists of 2477 words and was created specifically for sentiment analysis of microblogging messages such as tweets. The Pattern lexicon is a subjectivity lexicon based on English adjectives, where adjectives have a polarity (negative or positive, −1.0 to +1.0) and a subjectivity (objective or subjective, +0.0 to +1.0) score. The results in Table 1 show that the lexicon-based classification method performed similarly to the other sentiment analysers.

Sentiment analysis results
To gain an understanding into the data set, the corpus was first analysed in terms of word frequencies. Thereafter, a time-series analysis was conducted, followed by a sentiment analysis of the Twitter data.
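The word-frequency step can be sketched with the standard library alone. The function name word_frequencies and the three token lists are made up for illustration and are not taken from the corpus; the sketch assumes that tweets have already been cleaned and tokenised.

```python
from collections import Counter


def word_frequencies(tokenised_tweets, top_n=10):
    """Count how often each token occurs across all tweets and
    return the top_n most frequent (token, count) pairs."""
    counts = Counter()
    for tokens in tokenised_tweets:
        counts.update(tokens)
    return counts.most_common(top_n)


# Illustrative input only.
sample = [["orania", "racist"], ["orania", "town"], ["orania", "racist"]]
print(word_frequencies(sample, top_n=2))
```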
The following tweets recorded the highest number of retweets:
• "Another whites only settlement in South Africa Kleinfontein sister Town to Orania" (n = 1348)
• "@EFFSouthAfrica please table a motion to disband Orania, declare #AfriForum as a right wing movement and ensure they feel the heat" (n = 176)
• "South Africa: Orania Schools Bursting at the Seams" (n = 117)
• "AfriForum and Solidarity must go open their university in Orania. #LanguagePolicy" (n = 111)
• "Welcome to Orania a Whites Only Settlement in South Africa" (n = 95)
• "The Orania land was bought in the 80s to prepare for the end of apartheid. The ANC knew about it then. Madiba visited Verwoerd's widow there. They have a museum honouring all Apartheid presidents except de Klerk. Statue of Verwoerd looks over the town. I wrote a paper on this" (n = 75)
• "Paarl is the kind of town white people move to when they miss Apartheid but dont want the commitment Orania requires" (n = 66)
• "You know what's scary? That Orania is protected by the constitution. What's even more scary is that the kids who go to primary and high school in Orania can go to any university in THE COUNTRY! Multiracial and all! And we think racism is going to fall? Funny!" (n = 52)

Tweet time-series analysis
From a preliminary time-series analysis, several target dates showed an above average number of tweets (> 24.81), which would suggest that the hashtag #orania or keyword orania were tweeted frequently that day on Twitter. The content of these tweets referred to specific news events in South Africa during the data collection period, which are given in Table 3.
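One way to flag such target dates is sketched below. The function name busy_days is hypothetical, and the sketch assumes that the average is taken over days on which at least one tweet was collected, which may differ from how the figure of 24.81 was obtained. The dates in the usage lines are invented for illustration.

```python
from collections import Counter
from datetime import date


def busy_days(tweet_dates):
    """Return the mean number of tweets per day and the sorted list of
    days whose tweet count lies above that mean."""
    counts = Counter(tweet_dates)
    if not counts:
        return 0.0, []
    mean = sum(counts.values()) / len(counts)
    above = sorted(day for day, n in counts.items() if n > mean)
    return mean, above


# Invented dates: four tweets on one day, one on each of two others.
dates = [date(2017, 10, 30)] * 4 + [date(2017, 10, 31), date(2017, 11, 1)]
mean, days = busy_days(dates)
print(mean, days)
```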

Tweet trending analysis results
The tweet corpus was analysed for retweet trends (i.e. tweets that are retweeted over a short period of time). Results can be seen in Figure 1.
The tweets that were retweeted the most are presented in Table 4.

Sentiment analysis results
Because the proposed method produced a consistent level of accuracy in comparison with other sentiment analysers, the English tweet corpus was analysed by the polarity classification method. In addition to this, AFINN and Pattern were also used as sentiment analysers. A threshold of zero was used to classify the tweets into positive, neutral or negative groupings. In other words, if the score was 0, which indicates no sentiment value, the tweet was classified as neutral. A score above 0 was considered positive, and a score below 0 was considered negative. The results of the sentiment analyser in terms of polarities are shown in Tables 5 and 6, with some annotations in Table 7.
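The zero-threshold rule amounts to classifying a tweet by the sign of its score, as in the following sketch. The function name classify and the optional threshold parameter are hypothetical; with the default of 0.0 the function mirrors the rule described above.

```python
def classify(score, threshold=0.0):
    """Map a sentiment score to a polarity label using a symmetric threshold."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


print(classify(0.4), classify(-0.2), classify(0.0))
```

Exposing the threshold as a parameter makes it easy to test whether a small neutral band around zero would change the distribution of labels.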
The varied results suggest that a lexicon is dependent on a particular domain. For example, the Opinion Lexicon words were extracted from customer reviews, which are not limited to the 140 characters associated with Twitter messages. The Hashtag Sentiment Lexicon, on the other hand, was generated from tweets with sentiment-word hashtags, and is thus much more closely associated with the corpus used in this study. We were surprised at the results of AFINN, whose lexicon was also generated from tweets. The AFINN lexicon, however, consists of only 2477 words, while the NRC-Canada Hashtag Sentiment Lexicon consists of 54 129 unigrams and would therefore be able to score more words than the AFINN lexicon. For these reasons, the study will use the results of the NRC lexicon in our discussion.

Discussion
The results above clearly show that Orania is associated with racism. Whether or not the community actually see themselves as a 'whites only' community, word frequencies and trending tweets clearly show an association with racism. The NRC lexicon's results, as shown in Tables 5 and 6 above, also indicate that the majority of unique tweets without retweets (63.95%), as well as total tweets (50.47%), have a negative sentiment towards Orania. Twitter is clearly used as a platform to share negative sentiments about Orania. This is to be expected, as an Afrikaner-only community that goes against the dominant ideology and government policy of integration in a majority black country is bound to elicit fierce criticism. Note also that the word racist has a score of −1.377 in the NRC lexicon, while apartheid has a score of −4.999 and racists a score of −2.699; and given the high frequency with which these words occur in the corpus, a substantial part of the total negativity score can be attributed to the frequent occurrence of these negative words. Interestingly though, these negative tweets are all from outside the community: we checked for overtly racist tweets coming from users identified as people living in Orania and did not find a single occurrence. Racial slurs are limited to references to 'white pigs', while no slurs targeting black people occur in this corpus.

Twitter message | Sentiment classification
"I wish all racists people move to Orania" | Negative
RT @user: "The fact that the government has allowed Orania to exist is appalling" | Negative
@user "Orania is basically farm with around 1000 people living there. Hardly a threat. And there are less every year" | Neutral
"Orania offers a safe sanctuary from the crime-ridden neighbourhoods of South Africa, Orania is the first town of …" | Positive
@user "lies are spread about Afrikaners as a whole, nou just about the bunch in Orania. You're not alone on that one." | Positive

The time-series analysis above also shows that Orania is part of the South African political landscape, especially where issues affect Afrikaners. The high number of tweets associated with the Black Monday protests on 30 October 2017, the issue surrounding the language policy at the University of the Free State, the issue around Hoërskool Overvaal and Afrikaans as a medium of instruction, the election of Cyril Ramaphosa as South Africa's new president and the debate around land expropriation are all issues that affect Afrikaners directly. Whenever a major event occurs that affects Afrikaners, mentions of Orania rise. In line with commentators such as Steward (2016) noting that anti-white sentiment on social media platforms is on the increase, negative sentiment towards Orania is also tied to negative sentiments towards Afrikaners. Black Monday is a case in point: the nationwide protest against farm murders was condemned as racist by the ANC, EFF and Black First Land First (BLF) (Khoza 2017a;Mphahlele 2017), and this was accompanied by tweets such as: 'Orania Racists are out in our streets today #BlackMonday'.
'Orania is basically an old apartheid flag, been provoking #BlackMonday'. And: 'that @Username [from Afriforum] is just a piece of racist crap. He's representing a bunch of rightwingers who are bitter, they can go to hell or Orania if they so wish.' However, our classifier had limited success when identifying the most negative tweets. The most negative tweets (−0.9 to −0.999) in this corpus, together with the polarity score of each word, are shown in Table 8. The words that contributed most to a tweet's negative polarity are in bold.
None of these tweets are exceedingly negative, and some are actually positive. Tweet no. 10, for instance, conveys a positive idea. Table 9 shows more examples of positive tweets that received a negative classification.
Conversely, the negativity of some exceedingly negative tweets was underestimated, as shown in Table 10.
Through our analysis, it became clear that more research needs to be performed on identifying hate speech and racist rhetoric.
Numerous tweets about Orania go beyond the sharing of a negative opinion and can be considered examples of abusive language. Future research could follow the line of research conducted by Tulkens et al. (2016) and work towards the automatic detection of abusive language, which is an especially important research area given the recent cases of hate speech and racism in the South African media. Note also that these negative tweets are about Orania: we did not find a single tweet of someone defending Orania using such racist or hateful language.
We should note a few important limitations of the study. Twitter is not representative of the general population: Twitter users tend to be young and urban, and hence, one cannot generalise our results to conclude that the general South African population regards Orania in a negative light. Furthermore, from manually examining the user profile pictures and usernames of the top 50 tweeters (people), we deduce that the vast majority of users are not Afrikaners. Hence, the negative sentiment towards Orania does not include the perspectives of a substantial number of Afrikaners themselves. A future study will investigate the views of Afrikaners, but that will involve using social media platforms other than Twitter, for example Facebook.

Conclusion
The rise of social media brought a wealth of data that can aid in the understanding of social issues. This article showed some of the potential and limitations when using sentiment analysis to extract meaning from unstructured text when trying to gauge the opinions people have of a community.
In the case of Orania, it was shown how negatively this community is portrayed on Twitter. It was also shown how the discourse on this community is tied to the discourse on the Afrikaner in general through the fact that mentions of Orania rise when major issues occur that concern the Afrikaner.
One of the most important avenues for future research that was identified in this study is the need to identify racist speech and hate speech. Some of the tweets clearly showed undertones of hate speech (see example 3 in Table 10). However, the automatic detection of abusive language online is an open challenge for NLP and still an emerging research field. Our study found that using lexicons as a sentiment analysis technique was sufficient for detecting the sentiment of a tweet, but not for the automatic detection of hate speech. Possible future research could consider using collocation extraction, as most words are not offensive in themselves but become offensive in combination with other words. In particular, word embeddings (Mikolov, Yih & Zweig 2013) could be useful. Machine learning, which is a subfield of artificial intelligence, could also be used as a statistical approach to train an automatic detection system to 'learn by example'. Given the continuing and escalating racial divisions in South Africa, this avenue of research can open up new opportunities to gauge the level of tolerance, or lack thereof, that characterises South African society.