ABSTRACT
Representation learning models map data instances into a low-dimensional vector space, which facilitates subsequent models such as classifiers and clustering models, and downstream applications such as recommendation and anomaly detection. However, the outcome of representation learning is difficult for users to understand directly, since an individual dimension of the latent space may not carry any specific meaning. Understanding learned representations could benefit many applications. For example, in recommender systems, knowing why a user instance is mapped to a certain position in the latent space may reveal the user's interests and profile. In this paper, we propose an interpretation framework to understand and describe how representation vectors are distributed in the latent space. Specifically, we design a coding scheme that transforms representation instances into spatial codes indicating their locations in the latent space. A multimodal autoencoder is then built to generate the description of a representation instance given its spatial codes. The coding scheme can indicate position at different granularities, and the autoencoder makes the framework capable of handling different types of data. Several metrics are designed to evaluate the interpretation results. Experiments under various application scenarios and with different representation learning models demonstrate the flexibility and effectiveness of the proposed framework.
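The abstract does not say how spatial codes are computed, so the following is only an illustrative sketch of the general idea of indicating position at several granularities. It assumes, purely for illustration, that a code is the path of cluster indices produced by recursively applying k-means to the representation vectors: the first entry locates an instance coarsely, and later entries refine that location. The function names `kmeans` and `spatial_codes` and the parameters `levels`, `k`, and `seed` are hypothetical and are not taken from the paper.

```python
import numpy as np


def kmeans(X, k, rng, iters=20):
    """Plain Lloyd's k-means; returns one cluster label per row of X."""
    k = min(k, len(X))  # cannot have more clusters than points
    centers = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        # Squared Euclidean distance from every point to every center.
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = labels == j
            if np.any(members):
                centers[j] = X[members].mean(axis=0)
    return labels


def spatial_codes(X, levels=2, k=2, seed=0):
    """Assumed coding scheme: recursive k-means over the latent space.

    Returns an (n, levels) integer array. Column 0 is the coarsest
    region index; each further column refines the region before it.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    codes = np.zeros((len(X), levels), dtype=int)
    groups = [np.arange(len(X))]  # index sets still to be split
    for level in range(levels):
        next_groups = []
        for idx in groups:
            labels = kmeans(X[idx], k, rng)
            codes[idx, level] = labels
            for j in range(k):
                sub = idx[labels == j]
                if len(sub) > 0:
                    next_groups.append(sub)
        groups = next_groups
    return codes
```

Under this assumption, calling `spatial_codes(Z, levels=3, k=4)` on an array `Z` of representation vectors gives each instance a three-part code, and truncating the code to its first one or two entries describes the same instance at a coarser granularity.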