Abstract
Supervised sentiment classification approaches require labeled training (source) and testing (target) datasets. Generating such datasets demands substantial time and effort, but cross-domain classification reduces that effort by drawing the source and target datasets from two different domains. In this paper, we propose Cross-D-Vectorizers, a set of three sentiment n-gram feature-spaces (Lexical-TFIDF, Lex-Delta-TFIDF and SEND) for cross-domain analysis. We construct the features by extracting sentiment unigrams, in combination with intensifiers and negations, from the source dataset. Using an existing lexicon, we compute the scores of these features by three different procedures: the sentiment value of each feature is multiplied by its corresponding TFIDF rating, Delta-TFIDF rating, or feature-importance value (FIV), respectively. The importance value of each SEND (Sentiment wEight of N-grams in Dataset) feature is calculated by multiplying the number of times the feature appears in the review by the logarithm of its inverse frequency in the corpus. We experiment with Maximum Entropy, Support Vector Machine and K-Nearest Neighbors classifiers on three benchmark datasets and one proposed dataset for cross-domain classification. The proposed approach shows improved results in comparison with existing methods. An advantage of our approach is that system complexity is reduced by treating only sentiment n-grams, rather than all n-grams, as domain-independent features.
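The SEND scoring described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `send_scores`, the toy lexicon values, and the reading of "inverse frequency in the corpus" as the number of reviews divided by the number of reviews containing the feature (with a natural logarithm) are all assumptions made for illustration. The sketch also assumes that sentiment n-grams (unigrams combined with intensifiers or negations) have already been extracted from each review.

```python
import math
from collections import Counter


def send_scores(reviews, lexicon):
    """Score each review's sentiment n-grams in a SEND-like way.

    reviews: list of reviews, each a list of already-extracted n-grams.
    lexicon: dict mapping a sentiment n-gram to its sentiment value
             (hypothetical values; any existing lexicon could be used).
    Returns one dict per review: n-gram -> sentiment value * FIV.
    """
    n_reviews = len(reviews)

    # Number of reviews in which each lexicon feature occurs at least once.
    doc_freq = Counter()
    for feats in reviews:
        doc_freq.update({f for f in feats if f in lexicon})

    vectors = []
    for feats in reviews:
        counts = Counter(f for f in feats if f in lexicon)
        vec = {}
        for feat, count in counts.items():
            # Assumed FIV: in-review count * log(inverse corpus frequency).
            fiv = count * math.log(n_reviews / doc_freq[feat])
            vec[feat] = lexicon[feat] * fiv
        vectors.append(vec)
    return vectors


# Toy data, invented purely for illustration.
toy_lexicon = {"good": 0.5, "not good": -0.4, "very bad": -0.8}
toy_reviews = [
    ["good", "not good", "good"],
    ["not good"],
    ["very bad"],
]
toy_vectors = send_scores(toy_reviews, toy_lexicon)
for vec in toy_vectors:
    print(vec)
```

Under this reading, a feature that occurs in every review gets an importance value of zero, which mirrors the usual behaviour of an inverse-frequency weight. The Lexical-TFIDF and Lex-Delta-TFIDF variants would replace the FIV term with a TFIDF or Delta-TFIDF rating.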
Acknowledgments
We are grateful for access to the facilities of the “E-Business Centre of Excellence” Lab at the Indian Institute of Technology, Kharagpur. This work is supported by MHRD, Govt. of India [Sanction Letter No.: F.No. 5-5/2014-TS.VII, Dt: 04-09-2014], Dept. of Higher Education, New Delhi, India.
About this article
Cite this article
Dey, A., Jenamani, M. & Thakkar, J.J. Cross-D-vectorizers: a set of feature-spaces for cross-domain sentiment analysis from consumer review. Multimed Tools Appl 78, 23141–23159 (2019). https://doi.org/10.1007/s11042-019-7553-0
DOI: https://doi.org/10.1007/s11042-019-7553-0