Knowledge-Based Systems, Volume 236, 25 January 2022, 107659

Graph Fusion Network for Text Classification

https://doi.org/10.1016/j.knosys.2021.107659

Highlights

  • We transform external knowledge into structural information to build better graphs.

  • Our model can perform inference for new documents without rebuilding the whole graph.

  • We propose a unified Graph Fusion Network (GFN) for text classification.

  • Our model achieves state-of-the-art performance on five benchmark datasets.

Abstract

Text classification is an important and classical problem in natural language processing. Recently, Graph Neural Networks (GNNs) have been widely applied to text classification and have achieved outstanding performance. Despite this success, existing methods are still limited in two main aspects. On the one hand, transductive methods cannot easily adapt to new documents. Since transductive methods incorporate all documents into their text graph, they need to reconstruct the whole graph and retrain their system from scratch when new documents arrive. This is impractical in real-world situations. On the other hand, many state-of-the-art algorithms ignore the quality of text graphs, which may lead to sub-optimal performance. To address these problems, we propose a Graph Fusion Network (GFN), which overcomes these limitations and boosts text classification performance. In detail, in the graph construction stage, we build homogeneous text graphs with word nodes, which makes the learning system capable of performing inference on new documents without rebuilding the whole text graph. Then, we propose to transform external knowledge into structural information and integrate different views of text graphs to capture more structural information. In the graph reasoning stage, we divide the process into three steps: graph learning, graph convolution, and graph fusion. In the graph learning step, we adopt a graph learning layer to further adapt the text graphs. In the graph fusion step, we design a multi-head fusion module to integrate different opinions. Experimental results on five benchmarks demonstrate the superiority of our proposed method.

Introduction

Text classification has been widely studied in many real-world applications, such as sentiment analysis [1], [2], [3], recommendation systems [4], [5], community detection [6], and anomaly detection [7]. Over the years, thanks to their remarkable domain-expert-free feature-learning ability, various neural network models have been successfully applied to the text classification task. Among them, Recurrent Neural Networks (RNNs) [8], Convolutional Neural Networks (CNNs) [9], [10], Transformers [11], and their variants [12] are representative deep learning paradigms. Specifically, powered by a latent memory, RNNs are good at processing sequential data and are widely adopted in natural language processing [13], [14]. CNNs are able to exploit the shift-invariance, local connectivity, and compositionality of data, which makes them popular in computer vision and natural language processing [15], [16], [17]. Other works combine RNNs and CNNs to capture both local and global features [18]. These models have proved effective at capturing hidden patterns in Euclidean space.

Recently, an increasing number of applications generate data from non-Euclidean domains and represent them in the form of graphs. For example, in e-commerce, users and product items can be considered two kinds of nodes; together with the relations between them, they are expressed as a heterogeneous graph. In a citation network, authors and their publications can be regarded as two kinds of nodes; together with the author-to-publication, publication-to-author, publication-to-publication, and author-to-author relations, they are represented as a complex graph. However, previous neural networks cannot be applied to non-Euclidean domains directly. Driven by strong practical needs and the potential research value, graph neural networks (GNNs) have been widely studied and deployed in real-world applications [19], [20], [21], [22]. For example, in community detection, GNNs are adopted to model complex graph structures between user nodes and their relations in an inherently convenient way [6]. In anomaly detection, GNNs are successfully applied to capture graph information that helps identify anomalous graph objects (i.e., nodes, edges, and sub-graphs) [7].

When applied to text classification, GNNs allow the features of nodes at any location to attend directly to each other according to structural information (e.g., the relations between nodes). In detail, GNN-based methods first explicitly introduce relations (i.e., add edges) between words or terms in the graph construction stage, and then encode this structural information into the learning process in the graph reasoning stage. For example, TextGCN [23] treats words and documents as two kinds of nodes to build a corpus-level heterogeneous text graph. [24] extends TextGCN and employs a tensor to model different graphs. This line of work trains in the transductive setting, which means it can exploit the mutual relationships between training and test examples [25]. Nonetheless, these methods must pre-establish all documents (including test documents) in their graphs, so the whole text graph has to be reconstructed and the system retrained from scratch whenever new documents arrive. This is inefficient and impractical in real-world scenarios. To address this problem, [26] adopts the message passing framework [21] and a mini-batch-based feature propagation mechanism. The drawback of this method is that it ignores the quality of the graph, which can result in sub-optimal performance. To summarize, two major limitations remain in GNN methods: (1) they cannot easily adapt to new documents; (2) most of them ignore the quality of text graphs. These limitations impede their wide application in practical scenarios. In this paper, we propose a Graph Fusion Network (GFN), which attempts to overcome these limitations and further boost performance on text classification.

GFN consists of a graph construction stage and a graph reasoning stage. In the graph construction stage, GFN overcomes the two limitations mentioned above. In detail, for the first limitation, instead of pre-defining all documents as nodes in the text graph, GFN discards document-level nodes and builds pure word-level text graphs. To generate document embeddings, GFN fuses word embeddings according to document-level structural information on the fly. In this way, the system does not need to reconstruct the text graphs or retrain the learning system when facing new documents. For the second limitation, constructing an ideal text graph that exactly captures all structural knowledge is a nontrivial task, while text graphs constructed by different methods contain different views of the information [24], [26]. Thus, GFN constructs different corpus-level graphs by transforming external knowledge (i.e., pre-calculated co-occurrence statistics and pre-trained embeddings) into structural information, and integrates them so that they complement each other.
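To make this construction concrete, below is a minimal sketch (our illustration, not the authors' released code) of building the two corpus-level word graphs described above: a co-occurrence view from sliding-window PMI statistics and a semantic view from pre-trained embedding similarity, followed by the on-the-fly document fusion that lets new documents be handled without rebuilding any graph. The toy corpus, window size, random stand-in embeddings, and similarity threshold are all assumptions.

```python
import itertools
from collections import Counter

import numpy as np

# Toy corpus standing in for a real dataset (assumption for illustration).
corpus = ["the movie was great",
          "the plot was dull",
          "great acting saved the movie"]
docs = [d.split() for d in corpus]
vocab = sorted({w for d in docs for w in d})
idx = {w: i for i, w in enumerate(vocab)}
V = len(vocab)

# --- View 1: co-occurrence view from sliding-window PMI statistics ---
window = 3  # assumed window size
n_windows, word_count, pair_count = 0, Counter(), Counter()
for d in docs:
    for s in range(max(1, len(d) - window + 1)):
        win = set(d[s:s + window])
        n_windows += 1
        for w in win:
            word_count[w] += 1
        for a, b in itertools.combinations(sorted(win), 2):
            pair_count[(a, b)] += 1

A_pmi = np.zeros((V, V))
for (a, b), n_ab in pair_count.items():
    pmi = np.log(n_ab * n_windows / (word_count[a] * word_count[b]))
    if pmi > 0:  # keep only positively associated word pairs
        A_pmi[idx[a], idx[b]] = A_pmi[idx[b], idx[a]] = pmi

# --- View 2: semantic view from pre-trained embedding similarity ---
rng = np.random.default_rng(0)
emb = rng.normal(size=(V, 50))  # stand-in for GloVe/word2vec vectors
unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
A_emb = unit @ unit.T           # cosine similarity between word vectors
A_emb[A_emb < 0.5] = 0.0        # sparsify with an assumed threshold
np.fill_diagonal(A_emb, 0.0)

# Because documents are not graph nodes, a new document's representation
# is fused from its word nodes on the fly; no graph rebuilding is needed.
def doc_embedding(tokens, node_feats):
    rows = [idx[t] for t in tokens if t in idx]
    return node_feats[rows].mean(axis=0)
```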

In the graph reasoning process, GFN adopts three steps (i.e., graph learning, graph convolution, and graph fusion) to boost system performance. First, GFN adds a graph learning step that adjusts the initialized edge weights to better serve the task. Then, GFN adopts the message passing mechanism [21] for graph convolution. Finally, GFN applies a late-fusion paradigm (i.e., fusing the logits or final decision results), built around a carefully designed multi-head fusion module, to integrate the results coming from each graph in the graph fusion step. Importantly, compared with the early-fusion paradigm (i.e., feature-level fusion), late fusion brings several advantages. First, it avoids early cross-contamination from the noise and irrelevant information contained in each graph. Second, it inherits the robustness and empirically good performance of ensemble learning. Third, it retains diverse representation abilities. The algorithm flow chart of GFN is shown in Fig. 1.
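The following PyTorch sketch gives one plausible reading of the three reasoning steps; the learnable edge rescaling is a simplified stand-in for the paper's graph learning layer, and all layer sizes and module names are our assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphView(nn.Module):
    """Graph learning + graph convolution for one view of the text graph."""
    def __init__(self, in_dim, hid_dim, n_cls, n_nodes):
        super().__init__()
        # Graph learning step: learnable rescaling of the initial edge weights
        # (a simplification of the paper's graph learning layer).
        self.edge_scale = nn.Parameter(torch.ones(n_nodes, n_nodes))
        self.w1 = nn.Linear(in_dim, hid_dim)
        self.w2 = nn.Linear(hid_dim, n_cls)

    def forward(self, A, X, doc_mask):
        A = F.relu(A * self.edge_scale)              # task-adapted edge weights
        A = A + torch.eye(A.size(0))                 # add self-loops
        d = A.sum(1).clamp(min=1e-6).rsqrt()
        A_hat = d[:, None] * A * d[None, :]          # symmetric normalization
        H = F.relu(A_hat @ self.w1(X))               # message passing, layer 1
        H = A_hat @ H                                # layer 2 propagation
        # Fuse word-node states into document representations on the fly.
        docs = (doc_mask @ H) / doc_mask.sum(1, keepdim=True).clamp(min=1.0)
        return self.w2(docs)                         # per-view logits

class MultiHeadFusion(nn.Module):
    """Late fusion: each head mixes the per-view logits with learned weights."""
    def __init__(self, n_views, n_heads=4):
        super().__init__()
        self.mix = nn.Parameter(torch.ones(n_heads, n_views))

    def forward(self, logits):                       # logits: (n_views, B, C)
        w = self.mix.softmax(dim=-1)                 # one view-weighting per head
        fused = torch.einsum('hv,vbc->hbc', w, logits)
        return fused.mean(dim=0)                     # average over heads
```

Note that only logits cross between views here: this is what the late-fusion paradigm buys, since noise in one graph never contaminates the features of another.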

In summary, our contributions are as follows:

  • To overcome the two limitations of existing GNN-based text classification methods, in the graph construction stage, (1) we propose to exploit different external-knowledge-induced text graphs to better capture the structural information between words and across documents, and (2) we build homogeneous text graphs, which decouple document embedding computation from graph construction, making the system easily adaptable to new documents.

  • To further boost system performance, we adopt three steps in the graph reasoning stage: a graph learning step, a graph convolution step, and a graph fusion step. On the one hand, the graph learning step derives more flexible and task-oriented graphs. On the other hand, integrating multiple views of the graph information in the graph fusion step further boosts performance.

  • This paper presents a unified Graph Fusion Network (GFN) for text classification. Extensive experiments on benchmark datasets validate the superiority of our framework.

The rest of this paper is organized as follows. Section 2 introduces the related work and its relation with our work. Section 3 demonstrates how to construct text graphs by incorporating external knowledge. Section 4 illustrates the reasoning process over the text graphs. Section 5 introduces the experimental datasets and implementation details and then elaborates on the experimental results. Section 6 concludes this paper and discusses future work.

Section snippets

Related work

In this section, we first discuss the role that two kinds of knowledge (i.e., co-occurrence statistics and word embeddings) have played in previous work. Then, we review recent text classification techniques, introduce GNN-based methods, and discuss their relation to our work.

Text graph construction

In this section, we present how to transform the syntactic and semantic information contained in co-occurrence statistics and pre-trained word embeddings into structural information and construct text graphs. In the following, we first present the problem settings and notation, and then describe our proposed graph construction methods.
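For reference, the following is the standard positive-PMI weighting widely used for corpus-level word graphs, together with the cosine weighting for the embedding view; this is our illustration of the general recipe, and the paper's exact weighting scheme is the one defined in this section.

```latex
% Sliding-window PMI between words i and j, where #W is the total number
% of windows, #W(i) the windows containing i, and #W(i,j) the windows
% containing both; only edges with PMI(i,j) > 0 are kept.
\mathrm{PMI}(i,j) = \log \frac{p(i,j)}{p(i)\,p(j)},
\qquad p(i,j) = \frac{\#W(i,j)}{\#W},
\qquad p(i) = \frac{\#W(i)}{\#W}.
% The embedding view instead weights an edge by cosine similarity:
\mathrm{sim}(i,j) =
\frac{\mathbf{w}_i^{\top}\mathbf{w}_j}{\lVert\mathbf{w}_i\rVert\,\lVert\mathbf{w}_j\rVert}.
```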

Graph reasoning

In this section, we show the reasoning process of GFN. GFN reasons over document-level subgraphs through the following three steps: graph learning, graph convolution, and graph fusion.
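Combining the hypothetical sketches from the earlier sections, an end-to-end pass over two graph views might look as follows (all names are the illustrative ones defined in the previous code blocks, not the authors' API):

```python
import torch

X = torch.tensor(emb, dtype=torch.float32)        # word-node features
doc_mask = torch.zeros(len(docs), V)              # document -> word incidence
for i, d in enumerate(docs):
    for t in d:
        doc_mask[i, idx[t]] = 1.0

views = torch.nn.ModuleList(
    GraphView(in_dim=50, hid_dim=64, n_cls=4, n_nodes=V) for _ in range(2))
fusion = MultiHeadFusion(n_views=2, n_heads=4)
logits = torch.stack([
    view(torch.tensor(A, dtype=torch.float32), X, doc_mask)
    for view, A in zip(views, (A_pmi, A_emb))
])                                                # shape: (n_views, B, n_cls)
pred = fusion(logits).argmax(dim=-1)              # fused class predictions
```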

Experiments

In this section, we evaluate GFN on the text classification task and answer the following questions:

  • (1) Can GFN achieve superior performance compared with state-of-the-art models? (Section 5.2)

  • (2) How effective is each text graph? (Section 5.3)

  • (3) How effective is the fusion module? (Section 5.4)

  • (4) How many heads are enough for the system? (Section 5.5)

  • (5) Is our model memory- and time-efficient? (Section 5.6)

  • (6) How does our system make mistakes? (Section 5.7)

Conclusion and future work

In this paper, we proposed a Graph Fusion Network (GFN), which supports efficient inference for new documents without retraining the system and better captures structural information by integrating different views of text graphs. Experimental results demonstrate the superiority of our proposed method. Specifically, the different views of the graphs are complementary, and the carefully designed multi-head fusion module can further boost system performance.

In the future, we think that the

CRediT authorship contribution statement

Yong Dai: Conceptualization, Methodology, Software, Investigation, Writing – original draft, Formal analysis, Validation. Linjun Shou: Supervision, Writing – review & editing, Resources. Ming Gong: Project administration, Funding acquisition. Xiaolin Xia: Software, Investigation, Visualization. Zhao Kang: Writing – review & editing. Zenglin Xu: Supervision, Writing – review & editing. Daxin Jiang: Project administration, Funding acquisition.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This paper was partially supported by the National Key Research and Development Program of China (Nos. 2018YFB1005100, 2018YFB1005104), and a key program of fundamental research from Shenzhen Science and Technology Innovation Commission, China (No. JCYJ20200109113403826).

References (84)

  • L. Zhang, et al., Deep learning for sentiment analysis: A survey, Wiley Interdiscip. Rev.: Data Min. Knowl. Discov. (2018)
  • Y. Dai, et al., Adversarial training based multi-source unsupervised domain adaptation for sentiment analysis
  • M. De Gemmis, et al., Semantics-aware content-based recommender systems
  • H. Bai, et al., Neural relational topic models for scientific article analysis
  • X. Su, et al., A comprehensive survey on community detection with deep learning (2021)
  • X. Ma, et al., A comprehensive survey on graph anomaly detection with deep learning (2021)
  • S. Lai, L. Xu, K. Liu, J. Zhao, Recurrent convolutional neural networks for text classification, in: Twenty-Ninth AAAI...
  • M. Jaderberg, et al., Reading text in the wild with convolutional neural networks, Int. J. Comput. Vis. (2016)
  • A. Vaswani, et al., Attention is all you need
  • K. Kowsari, et al., Text classification algorithms: A survey, Information (2019)
  • C. Yin, et al., A deep learning approach for intrusion detection using recurrent neural networks, IEEE Access (2017)
  • H. Salehinejad, et al., Recent advances in recurrent neural networks (2017)
  • A. Khan, et al., A survey of the recent architectures of deep convolutional neural networks, Artif. Intell. Rev. (2020)
  • S. Khan, et al., A guide to convolutional neural networks for computer vision, Synth. Lect. Comput. Vis. (2018)
  • Y. Zhang, et al., A sensitivity analysis of (and practitioners’ guide to) convolutional neural networks for sentence classification (2015)
  • G. Chen, et al., Ensemble application of convolutional and recurrent neural networks for multi-label text categorization
  • Z. Wu, et al., A comprehensive survey on graph neural networks (2019)
  • T.N. Kipf, et al., Semi-supervised classification with graph convolutional networks (2016)
  • J. Gilmer, et al., Neural message passing for quantum chemistry
  • W. Hamilton, et al., Inductive representation learning on large graphs
  • L. Yao, C. Mao, Y. Luo, Graph convolutional networks for text classification, in: Proceedings of the AAAI Conference on...
  • X. Liu, et al., Tensor graph convolutional networks for text classification (2020)
  • G. Ciano, et al., On inductive-transductive learning with graph neural networks, IEEE Trans. Pattern Anal. Mach. Intell. (2021)
  • L. Huang, et al., Text level graph neural network for text classification
  • Y. Matsuo, et al., Keyword extraction from a single document using word co-occurrence statistical information, Int. J. Artif. Intell. Tools (2004)
  • J.A. Bullinaria, et al., Extracting semantic representations from word co-occurrence statistics: A computational study, Behav. Res. Methods (2007)
  • A. Dhar, et al., CESS: A system to categorize Bangla web text documents, ACM Trans. Asian Low-Resour. Lang. Inf. Process. (TALLIP) (2020)
  • J. Pennington, et al., GloVe: Global vectors for word representation
  • T. Mikolov, et al., Efficient estimation of word representations in vector space (2013)
  • O. Levy, et al., Neural word embedding as implicit matrix factorization
  • M. Lee, et al., Robust spectral inference for joint stochastic matrix factorization
  • S. Arora, R. Ge, Y. Halpern, D. Mimno, A. Moitra, D. Sontag, Y. Wu, M. Zhu, A practical algorithm for topic modeling...