MFF-CNER: A Multi-feature Fusion Model for Chinese Named Entity Recognition in Finance Securities

Abstract: The objective of Chinese financial securities named entity recognition is to extract, from unstructured Chinese text such as news, announcements, and research reports, the entities that impact security prices. Recognizing entities in this field is challenging due to the abundance of specialized terms, the diversity of expressions, and the limited feature-extraction capabilities of traditional models. To address this, we propose MFF-CNER, a multi-feature fusion model that improves the effectiveness of Chinese financial securities named entity recognition. MFF-CNER comprises several key steps. First, it leverages a BERT pre-trained model to capture semantic features at the character level. Second, a BiLSTM network captures contextual features specific to financial securities text. We then introduce an Iterated Dilated Convolutional Neural Network (IDCNN) to fuse and extract local features, incorporating an attention mechanism for weighted feature integration. Finally, the predicted sequences are optimized and decoded using a Conditional Random Field (CRF). To validate the state-of-the-art performance of MFF-CNER in this domain, we compare it with five popular methods on a Chinese financial securities dataset annotated with the BIO labeling scheme. Notably, MFF-CNER demonstrates superior performance while maintaining compatibility among its components. Furthermore, we evaluate the applicability of MFF-CNER to the Chinese financial securities domain using public datasets from other domains, including social media (WEIBO) and news (MSRA). This research holds practical significance for downstream applications such as constructing financial securities knowledge graphs and analyzing factors that influence security prices.


Introduction
Identifying entities with specific meanings in unstructured natural language text and classifying them into predefined categories is a critical task in Natural Language Processing, commonly called Named Entity Recognition (NER) [1]. The performance of NER models has a significant impact on various downstream tasks, including Q&A systems [2,3], machine translation [4,5], and knowledge graph construction [6,7]. However, most current NER research focuses on general domains [8,9], biomedical applications [10], clinical records [11], material science [12], and other fields, while paying little attention to NER in the context of Chinese financial securities.
The financial securities domain is a highly specialized field in which many vocabulary terms cannot be understood from their literal meaning alone; instead, they require a deep understanding of the financial background and context [13]. NER in this domain has distinct characteristics compared with general fields. It involves identifying financial entities such as company names, stock names, financial terminology, financial product names, and various abbreviations, as well as general entities such as person names, place names, organizations, and time. The challenges of Chinese NER in the financial securities domain can be summarized as follows.
Non-uniform expression of financial entities: financial organizations often have multiple names. For instance, "网易" (NetEase) is an abbreviation of "网易(杭州)网络有限公司" (NetEase (Hangzhou) Company Limited), but the company is also known as "NTES" and is even referred to as "猪厂" (Pig Factory) on the internet.
Difficulty in recognizing financial terminology: financial texts frequently contain entity names that mix Chinese, English, and numbers. For example, the fund name "易方达中概互联50ETF" combines different languages and alphanumeric characters.
Inadequate availability of well-annotated datasets: The scarcity of properly annotated datasets in the financial securities domain poses challenges, leading to insufficient data support for developing fundamental text processing technologies.
With advances in deep learning technology, traditional rule-based and statistical NER models are gradually being replaced by deep learning models. Mainstream neural networks and pre-trained language models have demonstrated impressive results in NER research across various domains, including some progress on NER tasks in the financial domain [14,15]. However, the financial securities domain presents specific challenges, such as dispersed content, sparse data, and a lack of structured entities, which make it difficult to apply existing NER models directly to Chinese NER in this domain. It therefore becomes necessary to explore novel approaches based on text features.
This study aims to tackle the challenges of NER in the Chinese financial securities domain, catering to the requirements of diverse downstream applications. To this end, we introduce a multi-feature fusion model, MFF-CNER, which takes the specific characteristics of the domain text into account. Empirical findings demonstrate the strong performance of this hybrid feature model, which can extract information at varying levels of granularity.

This paper makes the following contributions:
We propose a multi-feature fusion model, MFF-CNER, built on the traditional BiLSTM-CRF [16] architecture and leveraging pre-trained language models. MFF-CNER accounts for the text features and entity recognition challenges specific to the financial securities domain, enabling the extraction of text features at different granularity levels and improving entity recognition accuracy. Moreover, the components of the model do not conflict with each other.
Considering the lack of available datasets for the financial securities domain, we collected financial securities text from multiple sources and built a domain-specific dataset using domain expertise. We evaluated the performance of different NER models on this dataset.
Experimental results demonstrate that our model achieves the best precision, recall, and F1 score on the constructed dataset, with values of 88.35%, 89.88%, and 89.10%, respectively.
We validated its applicability in the Chinese financial securities domain by applying the MFF-CNER model to publicly available datasets from different domains.

Related Work
Chinese NER has broad application prospects, and many scholars have conducted related research. Research methods mainly fall into three categories: rule-based methods using dictionaries and rules, statistical machine learning, and deep learning.
At the outset, NER relied largely on manually created dictionaries and customized rule-based techniques: pertinent knowledge bases were assigned to general or specific domains, rules were formulated on syntactic-lexical templates, and texts were then scanned for strings satisfying those dictionaries and rules [13]. This approach has high precision but low recall. It depends mainly on domain experts constructing rule templates, which incurs high maintenance costs and is unsuitable for the large data volumes and fast update cycles of the financial securities domain.
In general domains, statistics-based methods have achieved remarkable results on NER tasks, and some scholars have applied this approach to NER tasks in financial domains outside the securities industry [17,18]. This method requires manually selecting text features, which are then fed into a machine learning algorithm that classifies each character in the sequence according to its corresponding label. Although it improved effectiveness compared with earlier methods, it still requires significant feature engineering and involves high time and labor costs. Today, of the statistical machine learning family, only Conditional Random Fields (CRF) remain widely used, as decoders in new models.
Deep learning techniques based on neural networks have become the dominant approach for NER tasks, thanks to their strong feature extraction capabilities, and many scholars have studied their application in the financial domain. Huang et al. [16] pioneered the integration of BiLSTM and CRF models for sequence labeling tasks: BiLSTM effectively captures global contextual features, and CRF uses sentence-level label information. Experiments verified that this model achieved outstanding results on NER tasks, and it has gradually become the mainstream model for them. To address the issue of inadequate labeled data in the financial domain, Liu et al. [14] put forward a semi-supervised model that leverages BERT and Bootstrapping. They used the pre-trained BERT model to generate word vectors that capture semantic relationships between words and employed BiLSTM-CRF as the architecture for training the NER task. The Bootstrapping method further enhanced performance; the experimental results indicated that it effectively identified more hidden entities in the sparsely labeled dataset, and the F1 score of the model with Bootstrapping increased by approximately 0.03 over the one without it. Liu et al. [15] proposed an entity recognition method whose input combines character vectors with Chinese-character Wubi shape embeddings. The architecture was BiLSTM-CRF, and the authors used an iterative learning strategy, embedding the label encoding of the sequence trained in the previous round into the input of the current round to continually improve the model's predictions. The experiment showed that this approach improved overall performance. Thus, for Chinese NER in the financial domain, BiLSTM-CRF remains an effective model architecture [13].
T. Yang et al. [19] proposed the IDCNN-BiLSTM-CRF model, which combines features from news and specialized medical texts to capture more granular feature information, achieving good results on a specialized dataset. Compared with general models, IDCNN [20] has a stronger local feature extraction ability and does not lose information through pooling. Recently, the BERT-IDCNN-BiLSTM-CRF [21-23] multi-feature extraction method has been applied to multiple domains with excellent results. This method adds a BERT pre-trained model to the IDCNN-BiLSTM-CRF model to obtain dynamic word vectors and capture semantic features. The texts in those domains share many features with financial securities texts, offering guidance for entity extraction in the financial securities domain.
To our knowledge, research on NER in Chinese financial securities texts is still lacking. Therefore, considering the features and challenges of Chinese financial securities texts, this paper applies this multi-granularity feature extraction method to the field.

Method
This section describes the architecture of our proposed Chinese financial securities NER model, MFF-CNER. The model consists of four layers. The first is the BERT pre-training layer, which is trained in an unsupervised manner on a large-scale unlabeled dataset to extract rich syntactic and semantic features, producing dynamic word embeddings with abundant language knowledge. The second is the feature extraction and fusion layer, responsible for lower-level feature extraction: BiLSTM extracts sentence-level contextual features, and its output is fed into IDCNN to further extract local features, facilitating feature fusion and extraction. The third is the attention layer, where multi-head attention computes feature weights, focusing on the features that play a crucial role in classification. The fourth is the feature decoding layer, which uses a CRF model to decode the attention output into an optimal label sequence that captures the entities. The Chinese financial securities NER model is illustrated in Figure 1.
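The four-layer pipeline described above can be sketched in PyTorch. The following is a minimal, illustrative sketch rather than the published configuration: the hidden sizes, dilation schedule, label count, and the use of random tensors in place of BERT character embeddings are all assumptions, and the CRF decoder is replaced by a plain emission-score layer.

```python
import torch
import torch.nn as nn

class MFFEncoderSketch(nn.Module):
    """Minimal sketch of the MFF-CNER feature pipeline.
    BERT is stood in by precomputed character embeddings; CRF decoding is omitted."""
    def __init__(self, emb_dim=768, hidden=128, num_labels=13, heads=4):
        super().__init__()
        # Feature extraction/fusion layer, part 1: BiLSTM for contextual features
        self.bilstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        # Part 2: IDCNN-style stack of dilated 1-D convolutions for local features
        self.idcnn = nn.Sequential(
            nn.Conv1d(2 * hidden, 2 * hidden, kernel_size=3, padding=1, dilation=1), nn.ReLU(),
            nn.Conv1d(2 * hidden, 2 * hidden, kernel_size=3, padding=2, dilation=2), nn.ReLU(),
            nn.Conv1d(2 * hidden, 2 * hidden, kernel_size=3, padding=4, dilation=4), nn.ReLU(),
        )
        # Attention layer: multi-head attention weights the fused features
        self.attn = nn.MultiheadAttention(2 * hidden, heads, batch_first=True)
        # Emission scores that a CRF layer would decode into a label sequence
        self.emit = nn.Linear(2 * hidden, num_labels)

    def forward(self, char_embs):                 # (batch, seq_len, emb_dim)
        ctx, _ = self.bilstm(char_embs)           # contextual features
        local = self.idcnn(ctx.transpose(1, 2)).transpose(1, 2)  # local features
        weighted, _ = self.attn(local, local, local)             # weighted fusion
        return self.emit(weighted)                # (batch, seq_len, num_labels)

model = MFFEncoderSketch()
scores = model(torch.randn(2, 20, 768))  # 2 sentences of 20 characters
```

In the full model, `scores` would be passed to a CRF layer as the emission matrix P of equations (14) to (16).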

Pre-training Layer
BERT is a deep bidirectional language representation model that is pre-trained in an unsupervised manner [24]. Conventional word embedding approaches such as Word2Vec [25] and GloVe [26] have the disadvantage that identical words in distinct sentences receive identical embeddings; BERT can tackle this problem. Compared with other pre-trained language models such as ELMo [27] and GPT [28], BERT has attained superior performance on NLP tasks [22].
The structure of BERT is shown in Figure 2. Because the attention mechanism alone fails to capture the positional information of words, the BERT architecture incorporates token embeddings, segment embeddings, and position embeddings in its input. The BERT model is trained on two tasks in parallel: Masked Language Model (MLM) and Next Sentence Prediction (NSP). Through this learning paradigm, the word embeddings can represent character-level, word-level, sentence-level, and inter-sentence relations.
Pre-training on financial securities text with the BERT model provides word embeddings with richer contextual semantic knowledge; these embeddings are fed into the feature extraction and fusion layer to improve the feature extraction performance of BiLSTM and IDCNN.

BiLSTM layer
The BiLSTM consists of two LSTM layers running in opposite directions; its advantage lies in its ability to capture both past and future information. Figure 3 displays its configuration.
The basic building block of BiLSTM is the LSTM, a particular type of RNN. The model employs three gating units, namely the input gate, forget gate, and output gate, to control which information is remembered and which is forgotten at each time step. Because of this gating, LSTM can remember longer sequence features and mitigates the vanishing and exploding gradients that often occur when training traditional RNNs. Figure 4 displays the structure of the model.
In financial securities text, many long passages have a high density of domain-specific terminology. BiLSTM models can capture dependency relationships among words in lengthy texts and extract features of the relationships between words, thereby enhancing the accuracy of entity extraction.
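The gating described above follows the standard LSTM formulation. For reference (in the usual notation, which may differ from Figure 4: $\sigma$ is the sigmoid function and $\odot$ element-wise multiplication):

$$
\begin{aligned}
f_t &= \sigma\!\left(W_f [h_{t-1}, x_t] + b_f\right) \\
i_t &= \sigma\!\left(W_i [h_{t-1}, x_t] + b_i\right) \\
\tilde{c}_t &= \tanh\!\left(W_c [h_{t-1}, x_t] + b_c\right) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
o_t &= \sigma\!\left(W_o [h_{t-1}, x_t] + b_o\right) \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
$$

Here $f_t$, $i_t$, and $o_t$ are the forget, input, and output gates, $c_t$ the cell state, and $h_t$ the hidden state at time step $t$.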

IDCNN Layer
IDCNN stacks several DCNN blocks. A conventional CNN's final neurons can only acquire a limited fraction of knowledge from the input text; acquiring wider context requires adding convolutional layers, leading to deeper networks with more parameters and a higher risk of overfitting. DCNN addresses this issue by introducing a hyperparameter called the dilation rate, which expands the convolutional kernel and enlarges its receptive field. Figure 5 illustrates this concept. The internal structure of DCNN is given in equations (8) to (10):

$c_t^{(0)} = D_1^{(0)} x_t$ (8)
$c_t^{(j)} = r\big(D_{2^{j-1}}^{(j)} c_t^{(j-1)}\big)$ (9)
$c_t^{(L)} = r\big(D_1^{(L)} c_t^{(L-1)}\big)$ (10)

where $D_1^{(0)}$ is the dilated convolution with dilation width 1 in the first layer, $D_\delta^{(j)}$ is the $j$-th layer of dilated convolution with dilation width $\delta$, and $r(\cdot)$ denotes the ReLU activation function. Through experiments, we selected the BiLSTM-IDCNN combination for feature extraction; it outperformed alternatives such as standalone BiLSTM or IDCNN paired with other models, and it better captures long-range dependencies in financial securities text by enlarging the model's receptive field. For the input text, IDCNN outputs the probability of each character belonging to each label.
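The growth of the receptive field under stacked dilated convolutions can be checked with a short calculation. The function below is an illustrative sketch of the standard receptive-field formula for stacked 1-D convolutions, not code from the model:

```python
def receptive_field(kernel_size, dilations):
    # Receptive field of a stack of convolutions:
    # each layer with dilation d widens the field by (kernel_size - 1) * d.
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf

# Kernel size 3 with dilations doubling per layer: exponential context growth
print(receptive_field(3, [1]))        # 3
print(receptive_field(3, [1, 2]))     # 7
print(receptive_field(3, [1, 2, 4]))  # 15
```

Three layers with dilations 1, 2, 4 reach a width-15 receptive field, matching the 15×15 field mentioned for Figure 5, while a plain three-layer CNN would only reach 7.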

Multi-Head Attention Layer
The multi-head attention layer is crucial for enhancing entity recognition accuracy: it assigns weights to the feature vectors extracted by BiLSTM-IDCNN, accentuating the features essential for classification and attenuating less relevant ones. Experimental results demonstrate that incorporating the attention mechanism substantially enhances recognition performance.
The output of the multi-head attention layer is obtained by concatenating the outputs of multiple self-attention heads and applying a linear transformation:

$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)\, W^O$
$\mathrm{head}_i = \mathrm{Attention}(Q W_i^Q, K W_i^K, V W_i^V)$
$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\dfrac{Q K^\top}{\sqrt{d_k}}\right) V$

In these equations, Q, K, and V represent the query, key, and value vectors, $d_k$ is the dimension of the input vector, and $\mathrm{head}_i$ is the output of each self-attention head.
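The scaled dot-product attention at the core of each head can be sketched in a few lines of NumPy; the shapes and random inputs below are illustrative only:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    # Numerically stable softmax over the key axis
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.normal(size=(5, 8))  # 5 positions, dimension d_k = 8
K = rng.normal(size=(5, 8))
V = rng.normal(size=(5, 8))
out, w = scaled_dot_product_attention(Q, K, V)
```

Each row of the weight matrix `w` is a probability distribution over positions; the multi-head layer runs several such heads on learned projections of Q, K, and V and concatenates the results.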

Feature Decoder Layer
CRF is a commonly used model for the classic sequence labeling task. It considers dependencies between adjacent labels and optimizes the label sequence from a global perspective, taking all possible results into account when making predictions and choosing the outcome with the highest likelihood.
The fundamental operating mechanism of CRF is to compute a score function for a candidate label sequence $Y = (y_1, y_2, \ldots, y_n)$ given an input sequence $X = (x_1, x_2, \ldots, x_n)$. By the maximum-likelihood principle, the label sequence with the highest probability is then selected as the output. Equation (14) gives the score function for the predicted sequence Y:

$s(X, Y) = \sum_{i=0}^{n} A_{y_i, y_{i+1}} + \sum_{i=1}^{n} P_{i, y_i}$ (14)

Matrix A denotes the transition scores, where $A_{i,j}$ specifies the likelihood of transitioning from label i to label j. The probability matrix P is the output of the preceding layer, where $P_{i,j}$ indicates the probability that the i-th character is tagged with the j-th label.
The likelihood of the predicted sequence Y is computed by Equation (15):

$p(Y \mid X) = \dfrac{e^{s(X, Y)}}{\sum_{\tilde{Y} \in Y_X} e^{s(X, \tilde{Y})}}$ (15)

where $Y_X$ denotes the set of all possible label sequences for X and $\tilde{Y}$ a candidate label sequence. Taking the logarithm of both sides of Equation (15) yields the log-probability of the predicted sequence, Equation (16):

$\log p(Y \mid X) = s(X, Y) - \log \sum_{\tilde{Y} \in Y_X} e^{s(X, \tilde{Y})}$ (16)
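Equations (14) to (16) can be made concrete with a tiny brute-force sketch. The scores below are toy values, start/stop transition terms are omitted, and a real CRF layer computes the normalizer with dynamic programming (forward algorithm) rather than enumeration:

```python
import math
from itertools import product

def crf_score(P, A, y):
    # s(X, Y) = sum_i P[i][y_i] + sum_i A[y_{i-1}][y_i]
    # (start/stop transition terms omitted for brevity)
    return (sum(P[i][y[i]] for i in range(len(y)))
            + sum(A[y[i - 1]][y[i]] for i in range(1, len(y))))

def crf_log_prob(P, A, y, num_labels):
    # log p(Y|X) = s(X, Y) - log Z, with Z summed over all label sequences
    logZ = math.log(sum(math.exp(crf_score(P, A, cand))
                        for cand in product(range(num_labels), repeat=len(P))))
    return crf_score(P, A, y) - logZ

P = [[1.0, 0.0], [0.0, 2.0]]    # toy emission scores: 2 positions x 2 labels
A = [[0.5, -0.5], [-0.5, 0.4]]  # toy transition scores between the 2 labels
# Decoding picks the sequence with the highest score (Viterbi in practice)
best = max(product(range(2), repeat=2), key=lambda y: crf_score(P, A, y))
```

Because the normalizer sums over every label sequence, the probabilities of all candidate sequences sum to one, which is what lets the CRF optimize the sequence globally rather than per character.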

Dataset Gathering and Preparation
The text in this dataset is sourced from the announcements of Chinese A-share listed companies, industry research reports, and financial news published between 2021 and 2022. The financial news is obtained from Sina Finance (https://finance.sina.com.cn/) and contains a wealth of information on relationships among A-share listed companies, such as related-party transactions and shareholding control. The announcements and research reports of listed companies are sourced from the financial data platform Tushare (https://tushare.pro). These texts serve as crucial references for investor decision-making and include substantial information about entities such as institutions, organizations, and individuals.
Through text clustering and manual screening based on domain expertise, the financial news articles were curated to retain only those relevant to changes in company ownership, executive management, and stock market developments related to A-share listed companies. Similarly, the announcements and research reports were filtered to preserve texts relevant to specific industries and individual stocks. The result is a dataset of 8,535 news texts and 4,474 announcements and research reports.
To improve experimental effectiveness, the text data was preprocessed in two steps. First, special characters lacking semantic meaning, such as "®", "■", and stray spaces, were removed. Second, any traditional Chinese characters in the text were converted to simplified Chinese characters.
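The two steps can be sketched as follows. The character list and mapping table are illustrative stand-ins, not the actual pipeline; in practice, traditional-to-simplified conversion would use a full conversion tool such as OpenCC rather than a hand-written table:

```python
import re

# Hypothetical, minimal versions of the two preprocessing steps.
SPECIAL_CHARS = re.compile(r"[®■\s]+")            # symbols with no semantic content
T2S_TABLE = str.maketrans("網絡華為", "网络华为")  # tiny stand-in for an OpenCC-style table

def preprocess(text):
    text = SPECIAL_CHARS.sub("", text)   # step 1: drop special characters and spaces
    return text.translate(T2S_TABLE)     # step 2: traditional -> simplified

print(preprocess("網 絡® 安全■"))  # -> 网络安全
```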
The Chinese financial securities dataset encompasses six categories of entities that influence security prices. The data was annotated using the BIO [30] labeling method, with the Label-Studio (https://labelstud.io/) platform as the annotation tool. Illustrations of the annotations can be found in Figures 6 and 7. The dataset was divided into training, validation, and test sets with an 8:1:1 ratio. The specific entity categories of the Chinese financial securities dataset are presented in Table 1.
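Under the BIO scheme, the first character of an entity is tagged B-&lt;type&gt;, subsequent characters I-&lt;type&gt;, and all other characters O. A minimal sketch of span-to-BIO conversion follows; the entity type name `COM` is illustrative, not the dataset's actual label set:

```python
def spans_to_bio(text, spans):
    """spans: list of (start, end, type) character offsets, end exclusive."""
    tags = ["O"] * len(text)
    for start, end, etype in spans:
        tags[start] = f"B-{etype}"           # first character of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{etype}"           # remaining characters
    return tags

# "网易发布公告" with the company name "网易" annotated
tags = spans_to_bio("网易发布公告", [(0, 2, "COM")])
print(tags)  # ['B-COM', 'I-COM', 'O', 'O', 'O', 'O']
```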

Other Datasets
To validate the applicability of the MFF-CNER model in the Chinese financial securities domain, we also applied the model and five baseline models to the Weibo [31,32] dataset and the MSRA [33] dataset, representing the social media and news domains, respectively. Since the MSRA dataset has no validation set, its test set was split 5:5 into validation and test portions. Basic information on these two datasets is presented in Table 2.

Experiment Settings
Our model was implemented in the PyTorch framework, and the Adam optimizer was used to tune the parameters.
The weight decay was set to 0.00005, and the GPU employed was an RTX A5000. The specific parameters for each model are presented in Table 3.

Evaluation Indicators
NER performance in this study is assessed with precision (P), recall (R), and F1-score, expressed as follows:

$P = \dfrac{TP}{TP + FP}$, $R = \dfrac{TP}{TP + FN}$, $F1 = \dfrac{2PR}{P + R}$

TP denotes cases in which the predicted and actual values are both positive; FP denotes cases in which the predicted value is positive while the actual value is negative; FN denotes cases in which the predicted value is negative while the actual value is positive.
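The three measures can be computed directly from the TP/FP/FN counts; the counts below are illustrative:

```python
def precision_recall_f1(tp, fp, fn):
    # P = TP / (TP + FP); R = TP / (TP + FN); F1 = harmonic mean of P and R
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    return p, r, 2 * p * r / (p + r)

p, r, f1 = precision_recall_f1(tp=80, fp=20, fn=10)
print(round(p, 3), round(r, 3), round(f1, 3))  # 0.8 0.889 0.842
```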

Baseline Methods
BiLSTM-CRF [13]: This method is a commonly used model architecture for NER in the financial domain.
IDCNN-CRF [20]: The IDCNN network exhibits stronger capabilities in extracting local information than the BiLSTM network. The MFF-CNER method also utilizes this structure.
BERT-CRF: This method employs BERT as a pre-training model and utilizes CRF for entity prediction decoding. It is currently a mainstream model frequently used as a benchmark for comparison.
GlobalPointer [34]: The model proposed by Su J. et al. considers the start and end positions of entities from a global perspective, enabling unbiased recognition of both nested and non-nested entities.
UIE [35]: The model proposed by Lu Y. et al. is a unified text-to-structure generation framework designed for information extraction. It has achieved state-of-the-art entity extraction performance across multiple datasets.

Analysis of Experimental Results
The experimental results of these six models on the different datasets are presented in Table 4. From Figure 8, it can be observed that IDCNN and BiLSTM exhibit similar performance in entity recognition: IDCNN shows a higher recall at 82.87%, while BiLSTM achieves a higher precision at 86.54%. The BERT-CRF model outperforms BiLSTM-CRF and IDCNN-CRF in precision, recall, and F1 score, with a notable increase of 2.78% in recall compared with IDCNN. The reason lies in the fact that financial securities text contains dispersed entity information, long sentences, and complex syntax; compared with BiLSTM and IDCNN, BERT can learn more intricate semantic features.
GlobalPointer performs global normalization on financial securities text to address nested entity problems. Its overall performance is similar to that of the BERT-CRF model, with a slight improvement of 0.76% in precision but a decrease of 0.69% in recall. This indicates that financial semantic information is as important in this domain as nested entity problems.
UIE, fine-tuned on the training data, surpasses the previous four models, achieving precision, recall, and F1 scores of 89.74%, 86.06%, and 87.86%, respectively. Although this model achieves state-of-the-art results on multiple datasets, a gap remains compared with MFF-CNER.
MFF-CNER achieves the best performance, with precision, recall, and F1 scores of 88.92%, 89.72%, and 89.31%, respectively. This demonstrates that the model can capture richer features. In entity extraction, MFF-CNER utilizes the semantic features of the text and considers character-level context features and local text features while incorporating attention mechanisms to enhance feature information. The model aligns with the textual and entity features prevalent in this domain and thus achieves excellent overall effectiveness.
Table 5 presents the results of our approach on the various entity categories in the dataset. The F1 measure of the risk entity (32.78%) is the lowest, significantly inferior to other entities such as company announcement (93.12%), company name (92.97%), and personal name (84.36%). The inferior performance on risk entities is due to their limited annotated samples and the inherent difficulty of determining their boundaries, which complicates segmentation. For instance, the text "流动性紧张及债务风险" (liquidity stress and debt risk) includes two types of risk, liquidity risk and debt risk; the former is difficult to identify because of ambiguous boundaries and significant noise during recognition.

Performance of MFF-CNER
The F1 score of the person position entity (69.77%) is also relatively low, with little difference between precision and recall, due to the uneven distribution of this entity in the dataset. The entity "法定代表人" (legal representative) appears more frequently than others such as "总经理" (general manager) and "董事" (director), making it challenging for the model to fully capture the distinctive features of these entities.
The precision of the person name entity reaches 96.13%, while its recall is only 75.16%, far below the precision. The high precision arises because the features of this entity are relatively obvious and easily captured by the model; however, the same entity rarely recurs in the text, so such entities often go unrecognized when their features are not evident.
To investigate the applicability of the MFF-CNER model in the Chinese financial securities domain, we compared the six models on two other domains: Weibo, representing social media, and MSRA, representing news.

Analysis of domain applicability
On the Weibo dataset, the UIE model achieved the best performance with an F1 score of 72.01%. The GlobalPointer model ranked second with an F1 score of 64.16%, while the F1 score of the MFF-CNER model reached only 64.20%; moreover, both the precision and recall of the MFF-CNER model were lower than those of UIE. On the MSRA dataset, the GlobalPointer model exhibited the best performance with an F1 score of 96.09%, while the F1 score of the MFF-CNER model was 94.35%, a mere 0.24% higher than UIE.
These experimental results indicate that the MFF-CNER model may not transfer well to other domains, and they validate its applicability in the Chinese financial securities domain.
To investigate the contributions of the different components of the MFF-CNER model, we conducted an ablation study on the constructed financial securities dataset. The results are presented in Table 6. Our findings are as follows:

Ablation study
1. Each component in MFF-CNER contributes positively. As shown in Figure 11, the F1 score consistently increases as components are added. However, on the Weibo dataset, BERT-CRF performs better than MFF-CNER (Table 5).
2. Attention plays a crucial role. Figure 11 shows a 2.4% increase in F1 score after incorporating attention, indicating that weighting multiple features through attention is well suited to this domain.
3. BERT significantly enhances the model's ability to capture more entities. Figure 11 shows a marked improvement in recall after incorporating BERT, suggesting that the model learns more semantic features from the text and becomes more sensitive to entities.
4. IDCNN effectively improves the precision of the model. A 2.1% increase in precision demonstrates the beneficial impact of capturing local features on entity recognition.

Conclusion
In this paper, we propose a NER method suited to the Chinese financial securities domain. Because entities in Chinese financial securities texts have longer boundaries and a greater diversity of expression than those in general-domain texts [13], many distinguishing features are not easily captured. We therefore propose a multi-feature fusion method, MFF-CNER, to address this issue. The advantage of this model lies in its ability to extract features of different granularities from the text without conflicts among its components.
First, we conducted comparative experiments between the MFF-CNER model and five baseline models; the results demonstrate that MFF-CNER achieves state-of-the-art performance. Next, we analyzed the recognition performance of MFF-CNER on the different entity types in the domain. Some entities, such as risk entities and position entities, remain challenging: the limited number of position entities collected during dataset construction contributes to the latter, while difficult boundary delineation and abundant noisy information account for the former. Furthermore, we evaluated MFF-CNER on domain-specific datasets for social media and news, comparing it with the other baseline models; the findings indicate that MFF-CNER achieves the best results only in the Chinese financial securities domain, confirming its applicability to this specific domain. Finally, we conducted ablation experiments on MFF-CNER relative to the BiLSTM-CRF architecture to validate each component's positive contribution and to ensure there are no conflicts among the components. Notably, the BERT and attention modules play a crucial role in improving the model's performance.
To further enhance the accuracy of entity recognition in the Chinese financial securities domain and to support applications such as building financial securities knowledge graphs and analyzing factors impacting stock prices, future research will focus on the following aspects: 1) enriching the domain-specific dataset with more refined entity types; 2) optimizing the MFF-CNER model to better delineate the boundaries of certain entities (e.g., risks); and 3) introducing multimodal features, since much of the relevant information in this field appears in pictures and tables, to further improve entity recognition in the Chinese financial securities domain.
Figure 2. Structure of the BERT model. The inputs are encoding representations of characters, obtained by training on a vast quantity of unlabeled data to derive the output word-vector representations T_1, T_2, …, T_n; Trm refers to the Transformer [29] structure.

Figure 5. Comparison of regular and dilated convolution, using a 3×3 kernel as an example. (a) Standard convolution (dilation rate 1) yields a 3×3 receptive field. (b) A dilation rate of 2 increases the receptive field to 5×5. (c) Doubling the dilation rate to 4 expands the receptive field to 15×15.

$D_1^{(0)}$ denotes the dilated convolution with dilation width 1 in the first layer, $D^{(j)}$ the $j$-th layer of dilated convolution, and $r(\cdot)$ the ReLU activation function.

Figure 9. Experiment results of different models on Weibo.

Figure 10. Experiment results of different models on MSRA.

Figure 11. Changes in indicators in the ablation experiments.

Table 1. Statistics of Chinese financial securities entities.

Table 4. Experiment results of NER based on different models on our dataset.