Knowledge-Based Systems, Volume 236, 25 January 2022, 107659

Graph Fusion Network for Text Classification

https://doi.org/10.1016/j.knosys.2021.107659

Highlights

  • We transform external knowledge into structural information to build better graphs.

  • Our model can perform inference for new documents without rebuilding the whole graph.

  • We propose a unified Graph Fusion Network (GFN) for text classification.

  • Our model achieves state-of-the-art performance on five benchmark datasets.

Abstract

Text classification is an important and classical problem in natural language processing. Recently, Graph Neural Networks (GNNs) have been widely applied to text classification and have achieved outstanding performance. Despite this success, existing methods are still limited in two main aspects. On the one hand, transductive methods cannot easily adapt to new documents. Since transductive methods incorporate all documents into their text graph, they need to reconstruct the whole graph and retrain their system from scratch when new documents arrive. This is impractical in real-world situations. On the other hand, many state-of-the-art algorithms ignore the quality of text graphs, which may lead to sub-optimal performance. To address these problems, we propose a Graph Fusion Network (GFN), which overcomes these limitations and boosts text classification performance. In detail, in the graph construction stage, we build homogeneous text graphs with word nodes, which makes the learning system capable of performing inference on new documents without rebuilding the whole text graph. Then, we propose to transform external knowledge into structural information and integrate different views of text graphs to capture more structural information. In the graph reasoning stage, we divide the process into three steps: graph learning, graph convolution, and graph fusion. In the graph learning step, we adopt a graph learning layer to further adapt the text graphs. In the graph fusion step, we design a multi-head fusion module to integrate different opinions. Experimental results on five benchmarks demonstrate the superiority of our proposed method.

Introduction

Text classification has been widely studied in many real-world applications, such as sentiment analysis [1], [2], [3], recommendation systems [4], [5], community detection [6], and anomaly detection [7]. Over the years, thanks to their remarkable domain-expert-free feature-learning ability, various neural network models have been successfully applied to the text classification task. Among them, Recurrent Neural Networks (RNNs) [8], Convolutional Neural Networks (CNNs) [9], [10], Transformers [11], and their variants [12] are representative deep learning paradigms. Specifically, powered by a latent memory, RNNs are good at processing sequential data and are widely adopted in natural language processing [13], [14]. CNNs are able to exploit the shift-invariance, local connectivity, and compositionality of data, which makes them popular in computer vision and natural language processing [15], [16], [17]. Other works combine RNNs and CNNs to capture both local and global features [18]. These models have proved effective at capturing hidden patterns in Euclidean space.

Recently, an increasing number of applications generate data from non-Euclidean domains and represent them in the form of graphs. For example, in e-commerce, users and product items can be considered two kinds of nodes; together with the relations between them, they are expressed as a heterogeneous graph. In a citation network, authors and their publications can be regarded as two kinds of nodes; together with the author-to-publication, publication-to-author, publication-to-publication, and author-to-author relations, they are represented as a complex graph. However, previous neural networks cannot be applied to non-Euclidean domains directly. Driven by strong practical needs and the potential research value, graph neural networks (GNNs) have been widely studied and deployed in real-world applications [19], [20], [21], [22]. For example, in community detection, GNNs are adopted to model complex graph structures between user nodes and their relations in an inherently convenient way [6]. In anomaly detection, GNNs are successfully applied to capture graph information that helps identify anomalous graph objects (i.e., nodes, edges, and sub-graphs) [7].

When applied to text classification, GNNs allow the features of nodes at any location to attend directly to each other according to structural information (e.g., the relations between nodes). In detail, GNN-based methods first explicitly introduce relations (i.e., add edges) between words or terms in the graph construction stage, and then encode this structural information into the learning process in the graph reasoning stage. For example, TextGCN [23] treats words and documents as two kinds of nodes to build a corpus-level heterogeneous text graph. [24] extends TextGCN and employs a tensor to model different graphs. This line of work trains in the transductive setting, which means it can exploit the mutual relationships between training and test examples [25]. Nonetheless, these methods must pre-establish all documents (including test documents) in their graphs, so the whole text graph has to be reconstructed and the system retrained from scratch whenever new documents arrive. This is inefficient and impractical in real-world scenarios. To address this problem, [26] adopts the message passing framework [21] and a mini-batch-based feature propagation mechanism. The drawback of this method is that it ignores the quality of the graph, which can result in sub-optimal performance. To summarize, two major limitations remain in GNN methods: (1) they cannot easily adapt to new documents; (2) most of them ignore the quality of text graphs. These limitations impede their wide application in practical scenarios. In this paper, we propose a Graph Fusion Network (GFN), which attempts to overcome these limitations and further boost performance on text classification.

GFN consists of a graph construction stage and a graph reasoning stage. In the graph construction stage, GFN overcomes the two limitations mentioned above. In detail, for the first limitation, instead of pre-defining all documents as nodes in the text graph, GFN discards document-level nodes and builds pure word-level text graphs. To generate document embeddings, GFN fuses word embeddings according to document-level structural information on the fly. In this way, the system does not need to reconstruct the text graphs or retrain the learning system when facing new documents. For the second limitation, constructing an ideal text graph that exactly captures all structural knowledge is a nontrivial task, while text graphs constructed by different methods contain different views of the information [24], [26]. Thus, GFN constructs different corpus-level graphs by transforming external knowledge (i.e., pre-calculated co-occurrence statistics and pre-trained embeddings) into structural information, and integrates them so that they complement each other.
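To make this construction concrete, below is a minimal sketch (our illustration, not the authors' released code) of building the two corpus-level word graphs described above: a co-occurrence view from sliding-window PMI statistics and a semantic view from pre-trained embedding similarity, followed by the on-the-fly document fusion that lets new documents be handled without rebuilding any graph. The toy corpus, window size, random stand-in embeddings, and similarity threshold are all assumptions.

```python
import itertools
from collections import Counter

import numpy as np

# Toy corpus standing in for a real dataset (assumption for illustration).
corpus = ["the movie was great",
          "the plot was dull",
          "great acting saved the movie"]
docs = [d.split() for d in corpus]
vocab = sorted({w for d in docs for w in d})
idx = {w: i for i, w in enumerate(vocab)}
V = len(vocab)

# --- View 1: co-occurrence view from sliding-window PMI statistics ---
window = 3  # assumed window size
n_windows, word_count, pair_count = 0, Counter(), Counter()
for d in docs:
    for s in range(max(1, len(d) - window + 1)):
        win = set(d[s:s + window])
        n_windows += 1
        for w in win:
            word_count[w] += 1
        for a, b in itertools.combinations(sorted(win), 2):
            pair_count[(a, b)] += 1

A_pmi = np.zeros((V, V))
for (a, b), n_ab in pair_count.items():
    pmi = np.log(n_ab * n_windows / (word_count[a] * word_count[b]))
    if pmi > 0:  # keep only positively associated word pairs
        A_pmi[idx[a], idx[b]] = A_pmi[idx[b], idx[a]] = pmi

# --- View 2: semantic view from pre-trained embedding similarity ---
rng = np.random.default_rng(0)
emb = rng.normal(size=(V, 50))  # stand-in for GloVe/word2vec vectors
unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
A_emb = unit @ unit.T           # cosine similarity between word vectors
A_emb[A_emb < 0.5] = 0.0        # sparsify with an assumed threshold
np.fill_diagonal(A_emb, 0.0)

# Because documents are not graph nodes, a new document's representation
# is fused from its word nodes on the fly; no graph rebuilding is needed.
def doc_embedding(tokens, node_feats):
    rows = [idx[t] for t in tokens if t in idx]
    return node_feats[rows].mean(axis=0)
```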

In the graph reasoning process, GFN adopts three steps (i.e., graph learning, graph convolution, and graph fusion) to boost system performance. First, GFN adds a graph learning step that adjusts the initialized edge weights to better serve the task. Then, GFN adopts the message passing mechanism [21] for graph convolution. Finally, GFN applies a late-fusion paradigm (i.e., fusing the logits or final decision results), built around a carefully designed multi-head fusion module, to integrate the results coming from each graph in the graph fusion step. Importantly, compared with the early-fusion paradigm (i.e., feature-level fusion), late fusion brings several advantages. First, it avoids early cross-contamination from the noise and irrelevant information contained in each graph. Second, it inherits the robustness and empirically good performance of ensemble learning. Third, it retains diverse representation abilities. The algorithm flow chart of GFN is shown in Fig. 1.
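The following PyTorch sketch gives one plausible reading of the three reasoning steps; the learnable edge rescaling is a simplified stand-in for the paper's graph learning layer, and all layer sizes and module names are our assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphView(nn.Module):
    """Graph learning + graph convolution for one view of the text graph."""
    def __init__(self, in_dim, hid_dim, n_cls, n_nodes):
        super().__init__()
        # Graph learning step: learnable rescaling of the initial edge weights
        # (a simplification of the paper's graph learning layer).
        self.edge_scale = nn.Parameter(torch.ones(n_nodes, n_nodes))
        self.w1 = nn.Linear(in_dim, hid_dim)
        self.w2 = nn.Linear(hid_dim, n_cls)

    def forward(self, A, X, doc_mask):
        A = F.relu(A * self.edge_scale)              # task-adapted edge weights
        A = A + torch.eye(A.size(0))                 # add self-loops
        d = A.sum(1).clamp(min=1e-6).rsqrt()
        A_hat = d[:, None] * A * d[None, :]          # symmetric normalization
        H = F.relu(A_hat @ self.w1(X))               # message passing, layer 1
        H = A_hat @ H                                # layer 2 propagation
        # Fuse word-node states into document representations on the fly.
        docs = (doc_mask @ H) / doc_mask.sum(1, keepdim=True).clamp(min=1.0)
        return self.w2(docs)                         # per-view logits

class MultiHeadFusion(nn.Module):
    """Late fusion: each head mixes the per-view logits with learned weights."""
    def __init__(self, n_views, n_heads=4):
        super().__init__()
        self.mix = nn.Parameter(torch.ones(n_heads, n_views))

    def forward(self, logits):                       # logits: (n_views, B, C)
        w = self.mix.softmax(dim=-1)                 # one view-weighting per head
        fused = torch.einsum('hv,vbc->hbc', w, logits)
        return fused.mean(dim=0)                     # average over heads
```

Note that only logits cross between views here: this is what the late-fusion paradigm buys, since noise in one graph never contaminates the features of another.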

In summary, our contributions are as follows:

  • To overcome the two limitations of existing GNN-based text classification methods, in the graph construction stage, (1) we propose to exploit different external-knowledge-induced text graphs to better capture the structural information between words and across documents, and (2) we build homogeneous text graphs, which decouple document embedding computation from graph construction, making the system easily adaptable to new documents.

  • To further boost system performance, we adopt three steps in the graph reasoning stage: a graph learning step, a graph convolution step, and a graph fusion step. On the one hand, the graph learning step derives more flexible and task-oriented graphs. On the other hand, integrating multiple views of the graph information in the graph fusion step further boosts performance.

  • This paper presents a unified Graph Fusion Network (GFN) for text classification. Extensive experiments on benchmark datasets validate the superiority of our framework.

The rest of this paper is organized as follows. Section 2 introduces the related work and its relation with our work. Section 3 demonstrates how to construct text graphs by incorporating external knowledge. Section 4 illustrates the reasoning process over the text graphs. Section 5 introduces the experimental datasets and implementation details and then elaborates on the experimental results. Section 6 concludes this paper and discusses future work.

Section snippets

Related work

In this section, we first discuss the role that two kinds of knowledge (i.e., co-occurrence statistics and word embeddings) have played in previous work. Then, we review recent text classification techniques, introduce GNN-based methods, and discuss their relation to our work.

Text graph construction

In this section, we present how to transform the syntactic and semantic information contained in co-occurrence statistics and pre-trained word embeddings into structural information and construct text graphs. In the following, we first present the problem settings and notation, and then describe our proposed graph construction methods.
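For reference, the following is the standard positive-PMI weighting widely used for corpus-level word graphs, together with the cosine weighting for the embedding view; this is our illustration of the general recipe, and the paper's exact weighting scheme is the one defined in this section.

```latex
% Sliding-window PMI between words i and j, where #W is the total number
% of windows, #W(i) the windows containing i, and #W(i,j) the windows
% containing both; only edges with PMI(i,j) > 0 are kept.
\mathrm{PMI}(i,j) = \log \frac{p(i,j)}{p(i)\,p(j)},
\qquad p(i,j) = \frac{\#W(i,j)}{\#W},
\qquad p(i) = \frac{\#W(i)}{\#W}.
% The embedding view instead weights an edge by cosine similarity:
\mathrm{sim}(i,j) =
\frac{\mathbf{w}_i^{\top}\mathbf{w}_j}{\lVert\mathbf{w}_i\rVert\,\lVert\mathbf{w}_j\rVert}.
```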

Graph reasoning

In this section, we show the reasoning process of GFN. GFN reasons over document-level subgraphs through the following three steps: graph learning, graph convolution, and graph fusion.
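Combining the hypothetical sketches from the earlier sections, an end-to-end pass over two graph views might look as follows (all names are the illustrative ones defined in the previous code blocks, not the authors' API):

```python
import torch

X = torch.tensor(emb, dtype=torch.float32)        # word-node features
doc_mask = torch.zeros(len(docs), V)              # document -> word incidence
for i, d in enumerate(docs):
    for t in d:
        doc_mask[i, idx[t]] = 1.0

views = torch.nn.ModuleList(
    GraphView(in_dim=50, hid_dim=64, n_cls=4, n_nodes=V) for _ in range(2))
fusion = MultiHeadFusion(n_views=2, n_heads=4)
logits = torch.stack([
    view(torch.tensor(A, dtype=torch.float32), X, doc_mask)
    for view, A in zip(views, (A_pmi, A_emb))
])                                                # shape: (n_views, B, n_cls)
pred = fusion(logits).argmax(dim=-1)              # fused class predictions
```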

Experiments

In this section, we evaluate GFN on the text classification task and answer the following questions:

  • (1) Can GFN achieve superior performance compared with state-of-the-art models? (Section 5.2)

  • (2) How effective is each text graph? (Section 5.3)

  • (3) How effective is the fusion module? (Section 5.4)

  • (4) How many heads are enough for the system? (Section 5.5)

  • (5) Is our model memory- and time-efficient? (Section 5.6)

  • (6) How does our system make mistakes? (Section 5.7)

Conclusion and future work

In this paper, we proposed a Graph Fusion Network (GFN), which supports efficient inference for new documents without retraining the system and better captures structural information by integrating different views of text graphs. Experimental results demonstrate the superiority of our proposed method. Specifically, the different views of the graphs are complementary, and the carefully designed multi-head fusion module can further boost system performance.

In the future, we think that the

CRediT authorship contribution statement

Yong Dai: Conceptualization, Methodology, Software, Investigation, Writing – original draft, Formal analysis, Validation. Linjun Shou: Supervision, Writing – review & editing, Resources. Ming Gong: Project administration, Funding acquisition. Xiaolin Xia: Software, Investigation, Visualization. Zhao Kang: Writing – review & editing. Zenglin Xu: Supervision, Writing – review & editing. Daxin Jiang: Project administration, Funding acquisition.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This paper was partially supported by the National Key Research and Development Program of China (Nos. 2018YFB1005100, 2018YFB1005104), and a key program of fundamental research from Shenzhen Science and Technology Innovation Commission, China (No. JCYJ20200109113403826).

References (84)

  • L. Zhang, et al., Deep learning for sentiment analysis: A survey, Wiley Interdiscip. Rev.: Data Min. Knowl. Discov. (2018)
  • Y. Dai, et al., Adversarial training based multi-source unsupervised domain adaptation for sentiment analysis
  • M. De Gemmis, et al., Semantics-aware content-based recommender systems
  • H. Bai, et al., Neural relational topic models for scientific article analysis
  • X. Su, et al., A comprehensive survey on community detection with deep learning (2021)
  • X. Ma, et al., A comprehensive survey on graph anomaly detection with deep learning (2021)
  • S. Lai, L. Xu, K. Liu, J. Zhao, Recurrent convolutional neural networks for text classification, in: Twenty-Ninth AAAI...
  • M. Jaderberg, et al., Reading text in the wild with convolutional neural networks, Int. J. Comput. Vis. (2016)
  • A. Vaswani, et al., Attention is all you need
  • K. Kowsari, et al., Text classification algorithms: A survey, Information (2019)
  • C. Yin, et al., A deep learning approach for intrusion detection using recurrent neural networks, IEEE Access (2017)
  • H. Salehinejad, et al., Recent advances in recurrent neural networks (2017)
  • A. Khan, et al., A survey of the recent architectures of deep convolutional neural networks, Artif. Intell. Rev. (2020)
  • S. Khan, et al., A guide to convolutional neural networks for computer vision, Synth. Lect. Comput. Vis. (2018)
  • Y. Zhang, et al., A sensitivity analysis of (and practitioners’ guide to) convolutional neural networks for sentence classification (2015)
  • G. Chen, et al., Ensemble application of convolutional and recurrent neural networks for multi-label text categorization
  • Z. Wu, et al., A comprehensive survey on graph neural networks (2019)
  • T.N. Kipf, et al., Semi-supervised classification with graph convolutional networks (2016)
  • J. Gilmer, et al., Neural message passing for quantum chemistry
  • W. Hamilton, et al., Inductive representation learning on large graphs
  • L. Yao, C. Mao, Y. Luo, Graph convolutional networks for text classification, in: Proceedings of the AAAI Conference on...
  • X. Liu, et al., Tensor graph convolutional networks for text classification (2020)
  • G. Ciano, et al., On inductive-transductive learning with graph neural networks, IEEE Trans. Pattern Anal. Mach. Intell. (2021)
  • L. Huang, et al., Text level graph neural network for text classification
  • Y. Matsuo, et al., Keyword extraction from a single document using word co-occurrence statistical information, Int. J. Artif. Intell. Tools (2004)
  • J.A. Bullinaria, et al., Extracting semantic representations from word co-occurrence statistics: A computational study, Behav. Res. Methods (2007)
  • A. Dhar, et al., CESS: A system to categorize Bangla web text documents, ACM Trans. Asian Low-Resour. Lang. Inf. Process. (TALLIP) (2020)
  • J. Pennington, et al., GloVe: Global vectors for word representation
  • T. Mikolov, et al., Efficient estimation of word representations in vector space (2013)
  • O. Levy, et al., Neural word embedding as implicit matrix factorization
  • M. Lee, et al., Robust spectral inference for joint stochastic matrix factorization
  • S. Arora, R. Ge, Y. Halpern, D. Mimno, A. Moitra, D. Sontag, Y. Wu, M. Zhu, A practical algorithm for topic modeling...