Abstract
Two novel Natural Language Processing (NLP) classification techniques are applied to the analysis of corporate annual reports for the task of financial forecasting. The hypothesis is that the textual content of annual reports contains vital information for assessing the performance of a company's stock over the following year. The first method is based on character n-gram profiles, which are generated for each annual report and then labeled using the Common N-Grams (CNG) classification method. The second method draws on a more traditional approach, in which readability scores are combined with performance inputs and supplied to a support vector machine (SVM) for classification. Both methods consistently outperformed a benchmark portfolio, and their combination proved more effective and efficient still: the combined models yielded the highest returns with the fewest trades.
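As an illustration of the first method, the following is a minimal Python sketch of character n-gram profiling with a CNG-style nearest-profile classifier. It assumes the relative-difference dissimilarity commonly associated with the Common N-Grams approach. The n-gram length, the profile size, and all function and label names here are hypothetical choices for illustration, not the settings or code used in the paper.

```python
from collections import Counter


def ngram_profile(text, n=3, size=5000):
    """Map the `size` most frequent character n-grams to normalized frequencies.

    `n` and `size` are illustrative defaults, not values from the paper.
    """
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    total = len(grams)
    if total == 0:
        return {}
    counts = Counter(grams)
    return {g: c / total for g, c in counts.most_common(size)}


def cng_dissimilarity(p1, p2):
    """Sum of squared relative frequency differences over the union of n-grams."""
    total = 0.0
    for g in set(p1) | set(p2):
        f1 = p1.get(g, 0.0)
        f2 = p2.get(g, 0.0)
        # f1 + f2 > 0 because g occurs in at least one profile.
        total += ((f1 - f2) / ((f1 + f2) / 2.0)) ** 2
    return total


def classify(report_text, class_profiles, n=3, size=5000):
    """Return the label whose class profile is least dissimilar to the report."""
    profile = ngram_profile(report_text, n, size)
    return min(
        class_profiles,
        key=lambda label: cng_dissimilarity(profile, class_profiles[label]),
    )


# Toy usage with made-up text; real class profiles would be built from
# the concatenated annual reports of each performance class.
class_profiles = {
    "over": ngram_profile("strong growth and record profit growth"),
    "under": ngram_profile("loss and impairment and further loss"),
}
label = classify("record growth in profit", class_profiles)
```

In this sketch an n-gram that appears in only one profile contributes the maximum value of 4 to the sum, so the measure penalizes missing n-grams as heavily as very large frequency differences.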
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Butler, M., Kešelj, V. (2009). Financial Forecasting Using Character N-Gram Analysis and Readability Scores of Annual Reports. In: Gao, Y., Japkowicz, N. (eds) Advances in Artificial Intelligence. Canadian AI 2009. Lecture Notes in Computer Science, vol 5549. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01818-3_7
DOI: https://doi.org/10.1007/978-3-642-01818-3_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-01817-6
Online ISBN: 978-3-642-01818-3
eBook Packages: Computer Science, Computer Science (R0)