A Gaussian copula regression model for movie box-office revenues prediction

Science China Information Sciences

Abstract

In this article, we revisit the task of predicting movie box-office revenues from multi-type features. Box-office revenues are affected by numerous factors. Previous work with discriminative models assumes that these factors are independent and identically distributed, and rarely considers the correlations between them, which limits the performance of discriminative models on this task. To address these problems, we investigate a novel Gaussian copula regression model, which requires no prior assumptions about the marginal distributions of the features. In particular, we perform cumulative probability estimation on each of the smoothed features. This estimation learns the marginal distributions and maps all features into a uniform vector space. Subsequently, we bridge the marginal distributions with a copula function to construct their joint distribution and learn the dependency structure between them. Moreover, we propose a computationally efficient approximate algorithm for inferring the response variable. Experimental results on two movie datasets, from the Chinese and U.S. markets, show that our approach outperforms strong discriminative regression baselines.
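
The three steps named in the abstract (marginal estimation, copula coupling, response inference) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: a rescaled empirical CDF stands in for the smoothed cumulative probability estimate, the conditional median under the fitted Gaussian copula stands in for the approximate inference algorithm (which the abstract does not specify), and every name (GaussianCopulaRegressor, make_synthetic_movies, budget, buzz) as well as the synthetic data is hypothetical.

```python
import math
import random
from bisect import bisect_right
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for Phi and its inverse


def make_ecdf(sample):
    """Empirical CDF, rescaled so its values stay strictly inside (0, 1)."""
    ordered = sorted(sample)
    n = len(ordered)
    return lambda value: (bisect_right(ordered, value) + 0.5) / (n + 1.0)


def normal_scores(sample):
    """Map a sample to latent Gaussian scores z = Phi^{-1}(F(x))."""
    cdf = make_ecdf(sample)
    return [STD_NORMAL.inv_cdf(cdf(v)) for v in sample]


def correlation(a, b):
    """Pearson correlation of two equal-length sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


class GaussianCopulaRegressor:
    """Hypothetical sketch: ECDF marginals joined by a Gaussian copula."""

    def fit(self, feature_columns, response):
        # Step 1: learn each marginal and move every variable to normal scores.
        self.feature_cdfs = [make_ecdf(col) for col in feature_columns]
        self.sorted_response = sorted(response)
        z_cols = [normal_scores(col) for col in feature_columns]
        z_resp = normal_scores(response)
        # Step 2: the copula's dependency structure is a correlation matrix.
        sigma_xx = [[correlation(a, b) for b in z_cols] for a in z_cols]
        sigma_xy = [correlation(a, z_resp) for a in z_cols]
        self.weights = solve(sigma_xx, sigma_xy)  # Sigma_xx^{-1} Sigma_xy
        return self

    def predict(self, row):
        # Step 3 (assumed): conditional mean of the latent response, mapped
        # back through the empirical quantile function of the response.
        z = [STD_NORMAL.inv_cdf(cdf(v)) for cdf, v in zip(self.feature_cdfs, row)]
        z_y = sum(w * zi for w, zi in zip(self.weights, z))
        u = STD_NORMAL.cdf(z_y)
        n = len(self.sorted_response)
        return self.sorted_response[min(n - 1, int(u * n))]


def make_synthetic_movies(n=200, seed=0):
    """Synthetic, skewed, correlated data; not taken from the paper."""
    rng = random.Random(seed)
    budget = [math.exp(rng.gauss(0, 1)) for _ in range(n)]
    buzz = [b * math.exp(rng.gauss(0, 0.5)) for b in budget]
    revenue = [b ** 2 * z * math.exp(rng.gauss(0, 0.3)) for b, z in zip(budget, buzz)]
    return [budget, buzz], revenue


features, revenue = make_synthetic_movies()
model = GaussianCopulaRegressor().fit(features, revenue)
print(model.predict([0.2, 0.2]), model.predict([5.0, 5.0]))
```

Because the marginals are estimated separately from the dependency structure, the heavy-tailed synthetic revenue needs no log transform or distributional assumption: only the correlation matrix of the normal scores is fitted, and predictions are read back from the response's own empirical quantiles.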

Acknowledgements

This work was supported by the National Basic Research Program of China (Grant No. 2014CB340503) and the National Natural Science Foundation of China (Grant Nos. 71532004, 61133012, 61472107).

Author information

Corresponding author

Correspondence to Ting Liu.

Additional information

Conflict of interest: The authors declare that they have no conflict of interest.

About this article

Cite this article

Duan, J., Ding, X. & Liu, T. A Gaussian copula regression model for movie box-office revenues prediction. Sci. China Inf. Sci. 60, 092103 (2017). https://doi.org/10.1007/s11432-015-0905-6

  • DOI: https://doi.org/10.1007/s11432-015-0905-6
