Graph Fusion Network for Text Classification
Introduction
Text classification has been widely studied for many real-world applications, such as sentiment analysis [1], [2], [3], recommendation systems [4], [5], community detection [6], and anomaly detection [7]. Over the years, thanks to their remarkable ability to learn features without domain expertise, various neural network models have been successfully applied to the text classification task. Among them, Recurrent Neural Networks (RNNs) [8], Convolutional Neural Networks (CNNs) [9], [10], Transformers [11] and their variants [12] are representative deep learning paradigms. Specifically, powered by a latent memory, RNNs excel at processing sequential data and are widely adopted in natural language processing [13], [14]. CNNs exploit the shift-invariance, local connectivity, and compositionality of data, which makes them popular in computer vision and natural language processing [15], [16], [17]. Other works combine RNNs and CNNs to capture both local and global features [18]. These models have proved effective at capturing hidden patterns in the Euclidean space.
Recently, an increasing number of applications generate data from non-Euclidean domains, represented in the form of graphs. For example, in e-commerce, users and product items can be considered two kinds of nodes; together with the relations between them, they form a heterogeneous graph. In a citation network, authors and their publications can be regarded as two kinds of nodes; together with the author-to-publication, publication-to-author, publication-to-publication, and author-to-author relations, they form a complex graph. However, the neural networks above cannot be applied to non-Euclidean domains directly. Driven by strong practical needs and their potential research value, graph neural networks (GNNs) have been widely studied and deployed in practice [19], [20], [21], [22]. For example, in community detection, GNNs model complex graph structures between user nodes and their relations in an inherently convenient way [6]. In anomaly detection, GNNs have been successfully applied to capture graph information that helps identify anomalous graph objects (i.e., nodes, edges, and sub-graphs) [7].
When applied to text classification, GNNs allow the features of nodes at any location to attend directly to each other according to structural information (e.g., the relations between nodes). In detail, GNN-based methods first explicitly introduce relations (i.e., add edges) between words or terms in the graph construction stage, and then encode this structural information into the learning process in the graph reasoning stage. For example, TextGCN [23] treats words and documents as two kinds of nodes to build a corpus-level heterogeneous text graph. [24] extends TextGCN and employs a tensor to model different graphs. This line of work trains in the transductive setting, which means it can exploit the mutual relationships between training and test examples [25]. Nonetheless, these methods must pre-establish all documents (including test documents) in their graphs, so when new documents arrive, the whole text graph must be reconstructed and the system retrained from scratch. This is inefficient and unrealistic in real-world scenarios. To address this problem, [26] adopts the message passing framework [21] and a mini-batch-based feature propagation mechanism; its drawback is that it ignores the quality of the graph, which can result in sub-optimal performance. To summarize, two major limitations remain in GNN methods: (1) they cannot easily adapt to new documents; (2) most of them ignore the quality of text graphs. These limitations impede their wide application in practical scenarios. In this paper, we propose a Graph Fusion Network (GFN), which attempts to overcome these limitations and further boost performance on text classification.
GFN consists of a graph construction stage and a graph reasoning stage. In the graph construction stage, GFN addresses the two limitations mentioned above. For the first limitation, instead of pre-defining all documents as nodes in the text graph, GFN discards document-level nodes and builds pure word-level text graphs. To generate document embeddings, GFN fuses word embeddings according to document-level structural information on the fly. In this way, the system need not reconstruct the text graphs or retrain the learning system when facing new documents. For the second limitation, constructing an ideal text graph that exactly captures all structural knowledge is a nontrivial task, while graphs constructed by different methods contain different views of the information [24], [26]. Thus, GFN constructs different corpus-level graphs by transforming external knowledge (i.e., pre-calculated co-occurrence statistics and pre-trained embeddings) into structural information, and integrates them so that they compensate for each other.
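To make the construction stage concrete, the sketch below builds two corpus-level word graphs from the two kinds of external knowledge mentioned above: one from co-occurrence statistics (here, positive pointwise mutual information over a sliding window) and one from the cosine similarity of pre-trained embeddings. This is a minimal illustration under our own assumptions — the function names, window size, thresholds, and the exact normalization are not taken from the paper.

```python
import numpy as np

def pmi_graph(docs, vocab, window=3, threshold=0.0):
    """Word-level graph from co-occurrence statistics: edge weight = positive PMI.
    Assumed formulation: counts from a sliding window, PMI thresholded at zero."""
    idx = {w: i for i, w in enumerate(vocab)}
    n = len(vocab)
    pair = np.zeros((n, n))
    single = np.zeros(n)
    total = 0
    for doc in docs:
        toks = [t for t in doc.split() if t in idx]
        for i, w in enumerate(toks):
            single[idx[w]] += 1
            total += 1
            for u in toks[i + 1 : i + window]:  # co-occurrence within the window
                pair[idx[w], idx[u]] += 1
                pair[idx[u], idx[w]] += 1
    with np.errstate(divide="ignore", invalid="ignore"):
        p_w = single / total
        p_ij = pair / total
        pmi = np.log(p_ij / np.outer(p_w, p_w))
    pmi[~np.isfinite(pmi)] = 0.0          # absent pairs carry no edge
    return np.where(pmi > threshold, pmi, 0.0)

def embedding_graph(emb, threshold=0.5):
    """Word-level graph from pre-trained embeddings: edge weight = cosine similarity,
    sparsified by a similarity threshold (an illustrative choice)."""
    norm = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sim = norm @ norm.T
    np.fill_diagonal(sim, 0.0)            # no self-edges at construction time
    return np.where(sim >= threshold, sim, 0.0)
```

Both functions return symmetric, non-negative adjacency matrices over the same vocabulary, so the two views can later be reasoned over in parallel and fused.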
In the graph reasoning stage, GFN adopts three steps (i.e., graph learning, graph convolution, and graph fusion) to boost system performance. First, a graph learning step adjusts the initialized edge weights to better serve the task. Then, GFN adopts the message passing mechanism [21] for graph convolution. Finally, GFN applies a late-fusion paradigm (i.e., fusing the logits or final decision results), built around an elaborately designed multi-head fusion module, to integrate the results from each graph. Compared with the early-fusion paradigm (i.e., feature-level fusion), late fusion brings several advantages. First, it avoids early cross-contamination from the noise and irrelevant information contained in each graph. Second, it inherits the robustness and empirically good performance of ensemble learning. Third, it retains diverse representation ability. The algorithm flow chart of GFN is shown in Fig. 1.
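The late-fusion idea above can be sketched as follows: each graph produces its own logits, and a set of heads learns a mixture over graphs before the heads are averaged. The parameterization here (one score vector per head) is a simplified assumption of ours — the paper's multi-head fusion module is more elaborate — but it shows why fusion happens at the logit level rather than the feature level.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def late_fusion(logits_per_graph, head_weights):
    """Late fusion over per-graph classifier outputs.

    logits_per_graph: (G, B, C) -- logits from each of G graphs, B documents, C classes.
    head_weights:     (H, G)    -- per-head learnable scores over the G graphs
                                   (hypothetical parameterization, for illustration).
    """
    alpha = softmax(head_weights, axis=1)                    # each head's mixture over graphs
    fused_heads = np.einsum("hg,gbc->hbc", alpha, logits_per_graph)
    return fused_heads.mean(axis=0)                          # average heads -> (B, C)
```

Because each graph's classifier runs to completion before fusion, noise in one graph cannot contaminate the features of another — the ensemble-like robustness the text describes.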
In sum, our contributions can be summarized as follows:
- To overcome the two limitations remaining in GNN-based text classification, in the graph construction stage, (1) we propose to exploit different external-knowledge-induced text graphs to better capture the structural information between words and across documents, and (2) we build homogeneous text graphs, which decouple document-embedding acquisition from the graph construction process, so that the system easily adapts to new documents.
- To further boost system performance, we adopt three steps in the graph reasoning stage: graph learning, graph convolution, and graph fusion. On the one hand, the graph learning step derives more flexible, task-oriented graphs. On the other hand, integrating multiple views of graph information in the graph fusion step further boosts performance.
- This paper presents a unified Graph Fusion Network (GFN) for text classification. Extensive experiments on benchmark datasets validate the superiority of our framework.
The rest of this paper is organized as follows. Section 2 reviews related work and its relation to ours. Section 3 demonstrates how to construct text graphs by incorporating external knowledge. Section 4 illustrates the reasoning process over the text graphs. Section 5 introduces the experimental datasets and implementation details and then elaborates on the experimental results. Section 6 concludes the paper and discusses future work.
Related work
In this section, we first discuss the role that two kinds of knowledge (i.e., co-occurrence statistics and word embeddings) have played in previous works. Then, we review recent text classification techniques, introduce GNN-based methods, and discuss their relation to our work.
Text graph construction
In this section, we present how to transform the syntactic and semantic information contained in co-occurrence statistics and pre-trained word embeddings into structural information and construct text graphs. We first present the problem settings and notation, then describe our proposed graph construction methods.
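Since GFN keeps only word-level nodes, a document is handled by slicing its words' rows and columns out of the corpus-level graph and pooling the resulting word states into a document embedding on the fly. The sketch below uses mean pooling purely for illustration — the paper's fusion of word embeddings according to document-level structure may differ — but it shows why no document node, and hence no graph reconstruction, is needed for a new document.

```python
import numpy as np

def doc_subgraph(adj, word_ids):
    """Slice the corpus-level word graph down to the words of one document."""
    return adj[np.ix_(word_ids, word_ids)]

def doc_embedding(node_states, word_ids):
    """Fuse word states into a document embedding on the fly.
    Mean pooling is an illustrative assumption, not the paper's exact aggregation."""
    return node_states[word_ids].mean(axis=0)
```

A previously unseen document only needs its words mapped to vocabulary indices; the corpus-level graph itself is untouched.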
Graph reasoning
In this section, we show the reasoning process of GFN. GFN reasons over document-level subgraphs through the following three steps: graph learning, graph convolution, and graph fusion.
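The first two of these steps can be sketched as plain matrix operations: a graph learning step that refines the initial edge weights, followed by a standard graph-convolution step over the symmetrically normalized adjacency. Both functions are minimal numpy illustrations under our own assumptions (a learned additive residual for graph learning, a ReLU-activated GCN layer for convolution); the paper's actual parameterization may differ.

```python
import numpy as np

def learn_graph(adj_init, delta):
    """Graph learning step (sketch): refine initial edge weights with a learned
    residual `delta`, keeping weights non-negative. `delta` is a hypothetical
    trainable parameter, not the paper's exact formulation."""
    return np.maximum(adj_init + delta, 0.0)

def gcn_layer(adj, h, w):
    """One graph-convolution step: add self-loops, symmetrically normalize,
    propagate node states h, project with weights w, apply ReLU."""
    a = adj + np.eye(adj.shape[0])                      # self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    a_norm = a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ h @ w, 0.0)              # ReLU activation
```

Running `gcn_layer` over each learned view of the graph yields the per-graph outputs that the third step, graph fusion, then integrates.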
Experiments
In this section, we evaluate GFN on the text classification task and answer the following questions:
- (1) Can GFN achieve superior performance compared with state-of-the-art models? (Section 5.2)
- (2) How effective is each text graph? (Section 5.3)
- (3) How effective is the fusion module? (Section 5.4)
- (4) How many heads are enough for the system? (Section 5.5)
- (5) Is our model memory- and time-efficient? (Section 5.6)
- (6) How does our system make mistakes? (Section 5.7)
Conclusion and future work
In this paper, we proposed a Graph Fusion Network (GFN), which supports efficient inference for new documents without retraining the system and better captures structural information by integrating different views of text graphs. Experimental results illustrate the superiority of our proposed method. In particular, the different views of graphs are complementary, and the elaborately designed multi-head fusion module can further boost system performance.
In the future, we think that the
CRediT authorship contribution statement
Yong Dai: Conceptualization, Methodology, Software, Investigation, Writing – original draft, Formal analysis, Validation. Linjun Shou: Supervision, Writing – review & editing, Resources. Ming Gong: Project administration, Funding acquisition. Xiaolin Xia: Software, Investigation, Visualization. Zhao Kang: Writing – review & editing. Zenglin Xu: Supervision, Writing – review & editing. Daxin Jiang: Project administration, Funding acquisition.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This paper was partially supported by the National Key Research and Development Program of China (Nos. 2018YFB1005100, 2018YFB1005104), and a key program of fundamental research from Shenzhen Science and Technology Innovation Commission, China (No. JCYJ20200109113403826).
References
- et al., Semantic expansion using word embedding clustering and convolutional neural network for improving short text classification, Neurocomputing (2016)
- et al., Word co-occurrence features for text classification, Inf. Syst. (2011)
- et al., Representation learning and NLP
- et al., A dynamic parameter enhanced network for distant supervised relation extraction, Knowl.-Based Syst. (2020)
- et al., Bidirectional LSTM with attention mechanism and convolutional layer for text classification, Neurocomputing (2019)
- et al., Arabic text classification using deep learning models, Inf. Process. Manage. (2020)
- et al., Text classification using capsules, Neurocomputing (2020)
- et al., Learning word dependencies in text by means of a deep recurrent belief network, Knowl.-Based Syst. (2016)
- et al., A generative model for category text generation, Inform. Sci. (2018)
- et al., Recurrent neural network for text classification with multi-task learning (2016)
- Deep learning for sentiment analysis: A survey, Wiley Interdiscip. Rev.: Data Min. Knowl. Dis.
- Adversarial training based multi-source unsupervised domain adaptation for sentiment analysis
- Semantics-aware content-based recommender systems
- Neural relational topic models for scientific article analysis
- A comprehensive survey on community detection with deep learning
- A comprehensive survey on graph anomaly detection with deep learning
- Reading text in the wild with convolutional neural networks, Int. J. Comput. Vis.
- Attention is all you need
- Text classification algorithms: A survey, Information
- A deep learning approach for intrusion detection using recurrent neural networks, IEEE Access
- Recent advances in recurrent neural networks
- A survey of the recent architectures of deep convolutional neural networks, Artif. Intell. Rev.
- A guide to convolutional neural networks for computer vision, Synth. Lect. Comput. Vis.
- A sensitivity analysis of (and practitioners' guide to) convolutional neural networks for sentence classification
- Ensemble application of convolutional and recurrent neural networks for multi-label text categorization
- A comprehensive survey on graph neural networks
- Semi-supervised classification with graph convolutional networks
- Neural message passing for quantum chemistry
- Inductive representation learning on large graphs
- Tensor graph convolutional networks for text classification
- On inductive-transductive learning with graph neural networks, IEEE Trans. Pattern Anal. Mach. Intell.
- Text level graph neural network for text classification
- Keyword extraction from a single document using word co-occurrence statistical information, Int. J. Artif. Intell. Tools
- Extracting semantic representations from word co-occurrence statistics: A computational study, Behav. Res. Methods
- CESS: A system to categorize Bangla web text documents, ACM Trans. Asian Low-Resour. Lang. Inf. Process. (TALLIP)
- GloVe: Global vectors for word representation
- Efficient estimation of word representations in vector space
- Neural word embedding as implicit matrix factorization
- Robust spectral inference for joint stochastic matrix factorization