A survey on deep learning for financial risk prediction

The rapid development of financial technology has brought great convenience to people's production and daily life, but it has also introduced many risks to financial security. To prevent financial risks, a better strategy is to build an accurate early-warning model before a risk materializes, rather than to look for a solution after it breaks out. In the past decade, deep learning has achieved remarkable results in fields such as image recognition and natural language processing. Consequently, some researchers have applied deep learning methods to financial risk prediction, and most of the results are satisfactory. The main work of this paper is to review prior studies of deep learning for financial risk prediction according to three prominent characteristics of financial data: heterogeneity, multi-source, and imbalance. We first briefly introduce some classical deep learning models as the basis for financial risk prediction. We then analyze the reasons for these characteristics of financial data and study how commonly used deep learning models differ across them. Finally, we point out some open issues of research significance in this field and suggest future implementations that might be feasible.


Introduction
The 1997 Asian financial crisis, the bursting of the dot-com bubble at the turn of the millennium, and the financial crisis of 2007-2008 caused serious damage to China's economy, directly leading to the bankruptcy of a large number of companies and the unemployment of many workers. Since then, the vigilance of the Chinese government and public against financial crises has reached an unprecedented level. Financial security is the core of national economic security, so we should attach great importance to potential risks in the financial field and firmly hold the bottom line against financial risks (Zhou, 2017).
To prevent financial risk at the least cost, an accurate financial risk prediction model is necessary. However, in the Internet era, financial data is growing explosively and is mixed with all kinds of risk information, which makes it difficult for China's financial risk prevention system to work effectively. For such a huge amount of data, traditional processing technology is costly and time-consuming. Therefore, researchers in financial security have proposed many novel financial risk prevention models based on machine learning (ML) and deep learning (DL). The main purpose of this paper is to review these innovative studies and answer the following research questions:
• Why does financial data develop towards heterogeneity, multi-source, and imbalance?
• What is the research progress of DL models for financial risk prediction?
• Which DL models are more popular for different data characteristics?
• What are the future research directions of DL for financial risk prediction?
An important basis of this work is the three characteristics of financial data, i.e., heterogeneity, multi-source, and imbalance, which also represent the future trend of financial data (Qu et al., 2019). Financial data are becoming increasingly heterogeneous, since there is more and more unstructured data on the Internet, such as blogs, pictures and videos. In fact, the heterogeneity of financial data is often accompanied by multi-source: financial data derive from different sources, such as a person, a company or even an industry. Finally, imbalance refers both to the imbalance between safe data and risky data and to the information asymmetry between companies and stakeholders.
To ensure the comprehensiveness of this survey, we collected and reviewed the literature on deep learning for financial risk prediction from various databases. The literature databases involved cover not only English-language journals and international conferences, but also Chinese journals and graduate theses. In addition, we searched for information on several official websites of research institutes and the Chinese government.
The remainder of this paper is structured as follows. Section 2 reviews related work in this field. In Section 3, we briefly describe the architectures and main working principles of five classical deep learning models, i.e., Multilayer Perceptrons (MLPs), Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), Long Short-Term Memory (LSTM) and Bidirectional Encoder Representations from Transformers (BERT). For comparison with deep learning, Section 3 also mentions some research achievements of machine learning for financial risk prediction. Section 4 is the core of this article: we analyze the reasons for the three prominent characteristics of financial data and review the applications of deep learning for financial risk prediction according to these characteristics. In Section 5, we list some issues of research significance that have not yet been effectively solved, and suggest research ideas that combine deep learning with other technologies. Finally, Section 6 summarizes our findings and draws conclusions.

Related work
This work sits at the intersection of finance and deep learning, a new area of important research significance. Several existing review articles in this field survey prior work using different approaches. Qu et al. (2019) classified the literature according to the experimental models adopted, and then reviewed, category by category, the research achievements of machine learning and deep learning for bankruptcy forecasting. They considered several classical machine learning models, such as Multivariate Discriminant Analysis, Logistic Regression, ensemble methods and Support Vector Machines, as well as major deep learning methods such as the Deep Belief Network and the Convolutional Neural Network (Qu et al., 2019).
A popular topic at the intersection of finance and deep learning is financial time series forecasting. Sezer et al. (2020) systematically reviewed the literature on deep learning-based financial time series prediction published from 2005 to 2019. Because their work focused on deep learning, they only briefly introduced the applications of machine learning to financial time series forecasting. They then described the principles and structures of eight commonly used deep learning models, and surveyed specific cases of financial time series prediction, such as stock price forecasting and index forecasting. In addition, Jiang (2021) studied recent progress in the applications of deep learning to stock market prediction. Unlike general surveys, Jiang classified not only the neural network structures and evaluation metrics, but also the data sources (Jiang, 2021).
In addition to financial time series prediction, there are many other rapidly growing subfields of deep learning in finance. Taking a comprehensive view of deep learning for financial applications, Ozbayoglu et al. (2020) provided readers with a snapshot of the current research status and future research directions of many subfields, including algorithmic trading, risk assessment and fraud detection (Ozbayoglu et al., 2020). Deep learning can be applied not only to specific financial topics, but also to specific industries. For example, Leo et al. (2019) reviewed the available literature on banking risk management and attempted to identify open problems that remain to be solved (Leo et al., 2019).
Through studying the related work in this field, we find that most of the available review papers focus on the models that process financial data. But financial data is not static; on the contrary, it is rapidly developing towards heterogeneity, multi-source and imbalance. Therefore, a survey should consider both the differences between neural network models and the development trend of financial data. According to the literature we gathered, our article may be the first to review the application of deep learning for financial risk prediction from the perspective of the characteristics of financial data. Meanwhile, the literature covered in this work is also grouped by deep learning model. We believe that such an arrangement will help readers understand the current status of this interdisciplinary field more clearly. Moreover, the open issues and corresponding research ideas we propose in Section 5 may give readers some inspiration.

Machine learning
The development of Internet finance in recent years shows that, while financial technology enhances the adaptability and inclusiveness of financial systems, it may also make financial risks spread more rapidly and pose greater challenges to financial security. Therefore, more and more researchers apply machine learning to predict financial risks. Ma and Lv (2019) proposed the MLIA algorithm, an improved machine learning algorithm, for predicting financial credit risks (Ma and Lv, 2019). Yu (2017) constructed an integrated machine learning model based on historical transaction data to predict online lending credit risk. Zhao et al. (2018) used least squares support vector machines (LSSVM) to predict systemic financial risks.
Some researchers believe that machine learning techniques can help address the imbalance of financial data, and most of their studies focus on evaluating the performance of classifiers on imbalanced data. For example, Song and Peng proposed an evaluation method based on multi-criteria decision making (MCDM) for imbalanced classifiers in financial risk prediction (Song and Peng, 2019), and used it to examine the performance of various prediction models on imbalanced data sets. Similarly, Peng et al. defined a performance scoring index to measure the performance of classification algorithms, and integrated three MCDM methods to rank imbalanced classifiers (Peng et al., 2011).

Deep learning models
Owing to their outstanding performance, deep learning research results have been published at a rapid pace over the past ten years in fields such as computer vision, natural language processing and financial risk prediction. Next, we briefly introduce five classical deep learning models.

Multilayer perceptrons
An MLP is a feedforward neural network consisting of layers of perceptrons. As shown in Figure 1, a perceptron receives $n$ features (which must be numeric) as input, each of which corresponds to a weight (Wang et al., 2003). The weighted sum of the input features is calculated by an input function $u(X)$ and then passed to an activation function $f(\cdot)$, which determines the output $y$ of the perceptron:
$$u(X) = \sum_{i=1}^{n} w_i x_i - \theta, \qquad y = f\big(u(X)\big)$$
where $x_i$ is the value of the $i$-th input feature, $w_i$ is the weight parameter corresponding to the $i$-th feature, $n$ is the number of features received by a perceptron, $X = (x_1, \ldots, x_n)$ is the input vector composed of these features, and $\theta$ is a threshold parameter. The structure of an MLP includes an input layer, a hidden part and an output layer. The hidden part is one of the most complex parts of an MLP, since it is designed as one or more layers according to the requirements of the specific scenario. Neurons in the hidden layers apply activation functions, which limit the output values to a certain range and add nonlinearity to the MLP, so that it can handle problems that are difficult for linear models. The sigmoid function is a widely used activation function, since it produces smooth transitions across decision boundaries (Woo and Lee, 2018).
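The perceptron equation above can be sketched in a few lines of Python. This is a minimal illustration, assuming a sigmoid activation for $f(\cdot)$; the feature values, weights and threshold in the example are made up and carry no financial meaning.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic activation f(u) = 1 / (1 + e^(-u)), mapping any real u into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def perceptron(x: list[float], w: list[float], theta: float) -> float:
    """Return y = f(u(X)) with u(X) = sum_i w_i * x_i - theta and f = sigmoid."""
    if len(x) != len(w):
        raise ValueError("each input feature needs exactly one weight")
    u = sum(w_i * x_i for w_i, x_i in zip(w, x)) - theta
    return sigmoid(u)

# Hypothetical example: two numeric features, two weights, threshold 0.5.
print(perceptron([0.8, 0.2], [1.5, -2.0], theta=0.5))  # u = 0.3, so y is about 0.574
```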
MLPs are usually trained by adjusting the weights of the perceptrons through the back propagation algorithm (or other algorithms) so as to reduce the mean squared error (MSE). The back propagation algorithm consists of two phases (Simons, 2000): a forward phase and a backward phase.
Forward phase. During this phase, the weight parameters of the MLP remain unchanged, and the input features flow from the input layer to the output layer. The purpose of this phase is to compute the error signal between the desired output and the actual output.
Backward phase. During this phase, the error signal obtained in the forward phase flows from the output layer back to the input layer. In the process, the weights and biases of the MLP are adjusted to minimize the error signal, with the size of each adjustment determined by the gradient of the cost function.
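To make the two phases concrete, the following is a minimal pure-Python sketch of back propagation for an MLP with one sigmoid hidden layer and a sigmoid output, trained by full-batch gradient descent on the MSE. The hidden-layer size, learning rate, epoch count and the toy AND-style dataset are illustrative assumptions, not settings from the cited work; practical models would normally be built with a deep learning library.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_mlp(data, hidden=3, lr=0.5, epochs=2000, seed=0):
    """Full-batch back propagation for a one-hidden-layer MLP (sigmoid units) on the MSE.

    data: list of (feature_list, target) pairs. Returns (initial_mse, final_mse)."""
    rng = random.Random(seed)
    n_in = len(data[0][0])
    W1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(hidden)]  # input -> hidden
    b1 = [0.0] * hidden
    W2 = [rng.uniform(-1, 1) for _ in range(hidden)]                          # hidden -> output
    b2 = 0.0

    def forward(x):
        h = [sigmoid(sum(w * xi for w, xi in zip(W1[j], x)) + b1[j]) for j in range(hidden)]
        return h, sigmoid(sum(w * hj for w, hj in zip(W2, h)) + b2)

    def mse():
        return sum((forward(x)[1] - t) ** 2 for x, t in data) / len(data)

    initial = mse()
    for _ in range(epochs):
        gW1 = [[0.0] * n_in for _ in range(hidden)]
        gb1 = [0.0] * hidden
        gW2 = [0.0] * hidden
        gb2 = 0.0
        for x, t in data:
            # Forward phase: weights are fixed; compute the output and its error.
            h, y = forward(x)
            # Backward phase: error signal at the output pre-activation (dMSE/dz).
            d_out = 2.0 * (y - t) * y * (1.0 - y) / len(data)
            gb2 += d_out
            for j in range(hidden):
                gW2[j] += d_out * h[j]
                # Propagate the error signal back through W2 to hidden unit j.
                d_h = d_out * W2[j] * h[j] * (1.0 - h[j])
                gb1[j] += d_h
                for i in range(n_in):
                    gW1[j][i] += d_h * x[i]
        # Gradient-descent step: move every weight and bias against its gradient.
        b2 -= lr * gb2
        for j in range(hidden):
            W2[j] -= lr * gW2[j]
            b1[j] -= lr * gb1[j]
            for i in range(n_in):
                W1[j][i] -= lr * gW1[j][i]
    return initial, mse()

# Toy, hypothetical data: label 1 only when both binary indicators are on (logical AND).
AND_DATA = [([0.0, 0.0], 0.0), ([0.0, 1.0], 0.0), ([1.0, 0.0], 0.0), ([1.0, 1.0], 1.0)]
print(train_mlp(AND_DATA))
```

Returning both the initial and the final MSE makes it easy to check that the weight updates actually reduce the error signal.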

Convolutional neural networks
CNNs are one of the earliest deep learning models and have attracted extensive attention due to their excellent performance in computer vision (Reimers and Requena-Mesa, 2020). A CNN is a multi-layer network that includes convolutional layers and pooling (i.e., subsampling) layers (Song and Cai, 2021). CNNs are suitable for situations in which an image is too large to be analyzed directly at the level of individual input pixels (Stanley, 2021). As shown in Figure 2, the structure of a convolutional neural network is complex. Currently, CNNs have achieved remarkable results in many fields, especially image classification (Reimers and Requena-Mesa, 2020). Therefore, some financial prediction models exploit the strong image processing ability of CNNs to process unstructured financial data.
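The two core operations named above, convolution and pooling, can be sketched on plain nested lists. The sketch assumes a single-channel input, stride 1 with no padding, and non-overlapping 2×2 max pooling; like most deep learning libraries, it computes cross-correlation and calls it convolution. The example image and kernel are hypothetical.

```python
def conv2d(image, kernel):
    """'Valid' 2-D convolution with stride 1 (implemented as cross-correlation)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]

def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a whole window are dropped."""
    return [[max(feature_map[r + i][c + j] for i in range(size) for j in range(size))
             for c in range(0, len(feature_map[0]) - size + 1, size)]
            for r in range(0, len(feature_map) - size + 1, size)]

# Hypothetical 5x5 single-channel "image" with pixel values 0..24 and a 2x2 diagonal-difference kernel.
image = [[5 * r + c for c in range(5)] for r in range(5)]
kernel = [[1, 0], [0, -1]]
fmap = conv2d(image, kernel)  # 4x4; each entry is image[r][c] - image[r+1][c+1] = -6
pooled = max_pool2d(fmap)     # 2x2 after pooling
print(fmap)
print(pooled)
```

In a trained CNN the kernel values are learned rather than hand-picked, and many kernels are applied in parallel.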

Recurrent neural networks
In recent years, several studies have shown that Recurrent Neural Networks (RNNs) achieve excellent prediction performance in application scenarios such as stock price prediction (Hsieh et al., 2011). Owing to the recurrent structure of RNNs, the current state contains historical information from previous states during learning. This feature makes RNNs suitable for processing time series data, including stock price data.
As shown in Figure 3, an RNN has a loop for transmitting information (Feng et al., 2017). Like general neural networks, an RNN has three types of layers: an input layer, a hidden layer and an output layer. Unrolled through the loop, an RNN can be viewed as multiple copies of the same structure, each of which passes its hidden state on to the next copy. When an RNN outputs results, it therefore considers not only the information of the current input, but also what it has learned from the inputs it received previously.
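The loop can be illustrated with a minimal Elman-style recurrence, $h_t = \tanh(w_x x_t + w_h h_{t-1} + b)$. For readability, the sketch assumes a scalar input and a scalar hidden state with hand-picked weights; practical RNNs use weight matrices whose values are learned.

```python
import math

def rnn_forward(xs, w_x, w_h, b, h0=0.0):
    """Run an Elman-style RNN with a scalar hidden state over a sequence.

    h_t = tanh(w_x * x_t + w_h * h_{t-1} + b); returns the hidden states h_1..h_T."""
    h = h0
    states = []
    for x in xs:
        h = math.tanh(w_x * x + w_h * h + b)  # the same weights are reused at every step
        states.append(h)
    return states

# Two hypothetical sequences that differ only in their first element.
with_shock = rnn_forward([1.0, 0.0, 0.0], w_x=0.5, w_h=0.8, b=0.0)
no_shock = rnn_forward([0.0, 0.0, 0.0], w_x=0.5, w_h=0.8, b=0.0)
print(with_shock[-1], no_shock[-1])  # the first input still affects the last state
```

Because the first input still shifts the last hidden state, the example shows the "memory" that makes RNNs suitable for time series.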

Long Short-Term memory
Long Short-Term Memory networks were introduced to overcome a serious problem of recurrent neural networks (RNNs): vanishing or exploding gradients, which limit the ability of RNNs to retain long-term memory (Xing et al., 2020). As an improved variant of RNNs, LSTM networks can selectively remember patterns for a long time through an additional memory block (Roshan et al., 2020, Gopika et al., 2020). An LSTM block consists of an input block, an output activation function, a peephole connection, a memory block called the cell, and three gates. Figure 4 illustrates the schematic diagram of the LSTM block (Shajun Nisha et al., 2021), where
• Input gate: adds useful new information to the cell state.
• Forget gate: removes information that is no longer needed from the cell state.
• Output gate: extracts useful information from the current cell state as the output.
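The three gates correspond to the standard LSTM cell update. The following sketch performs one step of a scalar LSTM cell; it assumes the common formulation without peephole connections (so it simplifies the block in Figure 4), and the parameter values are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, params):
    """One time step of a scalar LSTM cell (standard formulation, no peephole connections).

    params maps 'i' (input gate), 'f' (forget gate), 'o' (output gate) and
    'g' (candidate values) to hypothetical (w_x, w_h, b) triples."""
    def pre(name):
        w_x, w_h, b = params[name]
        return w_x * x + w_h * h_prev + b

    i = sigmoid(pre("i"))    # input gate: how much new information enters the cell
    f = sigmoid(pre("f"))    # forget gate: how much of the old cell state is kept
    o = sigmoid(pre("o"))    # output gate: how much of the cell state is exposed
    g = math.tanh(pre("g"))  # candidate information to write into the cell
    c = f * c_prev + i * g   # new cell state (long-term memory)
    h = o * math.tanh(c)     # new hidden state (output)
    return h, c

# Forget gate saturated open, input gate saturated closed: the cell state is carried over.
keep = {"i": (0.0, 0.0, -50.0), "f": (0.0, 0.0, 50.0), "o": (0.0, 0.0, 0.0), "g": (1.0, 0.0, 0.0)}
h, c = 0.0, 0.7
for x in [3.0, -2.0, 5.0]:
    h, c = lstm_step(x, h, c, keep)
print(c)  # stays at about 0.7 despite the new inputs
```

With the forget gate near 1 and the input gate near 0, the cell state passes through almost unchanged, which is how an LSTM can preserve information over many steps; reversing the two biases makes the cell overwrite its memory instead.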

Bidirectional Encoder Representations from Transformers
Bidirectional Encoder Representations from Transformers (BERT) is a pre-trained language model for natural language processing proposed by Google AI Language in 2018. Owing to its multi-head self-attention mechanism and self-supervised learning scheme, BERT can flexibly handle many kinds of corpora, which makes it especially popular in academia and industry (González-Carvajal and Garrido-Merchán, 2020). Therefore, some scholars have applied BERT to process unstructured financial data, such as social media posts, financial tweets and financial news (Hiew et al., 2019, Wang et al., 2019, Sousa MG et al., 2019).
BERT has two working phases: pre-training and fine-tuning (Devlin et al., 2018).
Pre-training phase. During this phase, BERT is trained on two unsupervised prediction tasks: Masked Language Model (MLM) and Next Sentence Prediction (NSP). For MLM, BERT randomly selects 15% of the input token positions as prediction targets; of these, 80% are replaced with a [MASK] token, 10% are replaced with a random token, and 10% keep the original token. Since the encoder does not know which words it will be asked to predict or which have been randomly replaced, it is forced to keep a distributional contextual representation of every input token. For NSP, BERT receives a pair of sentences as input and learns to predict whether the second sentence is the one that follows the first in the original document.
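The MLM corruption rule can be sketched as follows. This is an illustrative simplification, assuming whitespace-split words instead of BERT's WordPiece vocabulary and an independent 15% selection probability per token; the function name and arguments are hypothetical.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, vocab, rng, select_prob=0.15):
    """Simplified BERT-style MLM corruption.

    Each position is selected as a prediction target with probability select_prob.
    A selected token becomes [MASK] with prob. 0.8, a random vocabulary token with
    prob. 0.1, and stays unchanged with prob. 0.1. Returns (corrupted, target_positions)."""
    corrupted = list(tokens)
    targets = []
    for pos in range(len(tokens)):
        if rng.random() >= select_prob:
            continue  # not a prediction target; left untouched
        targets.append(pos)
        r = rng.random()
        if r < 0.8:
            corrupted[pos] = MASK
        elif r < 0.9:
            corrupted[pos] = rng.choice(vocab)
        # else: keep the original token, but the model must still predict it
    return corrupted, targets

rng = random.Random(42)
sentence = "the company reported a sharp rise in short term debt this quarter".split()
corrupted, targets = mask_tokens(sentence, vocab=sentence, rng=rng)
print(corrupted, targets)
```

During pre-training, the loss is computed only at the returned target positions.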
Fine-tuning phase. This phase is straightforward, since the self-attention mechanism in the Transformer allows BERT to model many downstream tasks simply by swapping in the appropriate inputs and outputs. For each task, the task-specific inputs and outputs are plugged into BERT and all parameters are fine-tuned end-to-end.

Advantages of deep learning in financial applications
Compared with traditional data processing methods, deep learning models have several clear advantages in coping with the rapid development of financial data. With the development of computer networks and communications technology, the amount of unstructured financial data is increasing rapidly and now accounts for 80-90% of all financial data (Davis, 2019). Every day, huge amounts of unstructured financial data, such as financial TV programs, posts on stock exchange forums and financial statements issued by listed companies, are added to the Internet. Processing unstructured data with traditional techniques is time-consuming and costly, whereas deep learning is efficient. Unstructured financial data can be mainly divided into textual data and image data, and many studies process these two kinds of data with deep learning.
Natural language processing (NLP) is applied to process textual data. Li et al. (2021) applied financial text sentiment analysis to financial distress prediction. They chose a lexicon-based approach rather than machine learning for sentiment analysis, since it was difficult to gather enough labeled financial training texts. They constructed a Chinese Financial Domain Sentiment Lexicon (CFDSL) using word vectors and three deep learning-based classifiers (i.e., deep neural networks, a multi-head attention-based DNN and bidirectional long short-term memory). The CFDSL was then applied to predict financial distress and was shown to be effective. For financial image information, CNNs and their improved variants are often preferred. For example, Zhao (2020) designed an enterprise financial risk prediction system based on an embedded system and deep learning.
The heterogeneity of financial data generally derives from its multi-source nature, and the two are closely intertwined from the outset. Time series data is an important data type that reflects the multi-source nature of financial data, and some scholars have found that deep learning predicts time series with high accuracy. Yu and Li (2018) applied a GARCH model and an LSTM network to model the volatility of a Chinese stock price index. Comparing the values of four loss functions, they found that the LSTM model had better predictive performance.
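For readers unfamiliar with the GARCH side of this comparison, the following sketch computes the conditional variance path of a standard GARCH(1,1) model, $\sigma^2_t = \omega + \alpha \varepsilon^2_{t-1} + \beta \sigma^2_{t-1}$. It assumes the parameters are already known (in practice they are estimated, typically by maximum likelihood) and that $\varepsilon_t$ is the demeaned return; Yu and Li's exact specification is not given here, and the return values are hypothetical.

```python
def garch11_variance(eps, omega, alpha, beta, sigma2_0=None):
    """Conditional variances of a GARCH(1,1) model with given (not estimated) parameters.

    sigma2[t] = omega + alpha * eps[t-1]**2 + beta * sigma2[t-1], where eps are demeaned returns.
    Returns len(eps) + 1 values; the last one is the one-step-ahead variance forecast."""
    if omega <= 0 or alpha < 0 or beta < 0 or alpha + beta >= 1:
        raise ValueError("need omega > 0, alpha, beta >= 0 and alpha + beta < 1")
    if sigma2_0 is None:
        sigma2_0 = omega / (1.0 - alpha - beta)  # unconditional (long-run) variance
    sigma2 = [sigma2_0]
    for e in eps:
        sigma2.append(omega + alpha * e ** 2 + beta * sigma2[-1])
    return sigma2

# Hypothetical daily return shocks (in percent) with a large move on the second day.
path = garch11_variance([0.2, -3.0, 0.1, 0.0], omega=0.1, alpha=0.1, beta=0.8)
print([round(v, 3) for v in path])  # variance jumps after the -3.0 shock, then decays
```

An LSTM-based volatility model would instead learn the mapping from past returns to future variance from data rather than fixing this linear recursion.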
Another advantage of deep learning in finance is the processing of imbalanced data, since deep learning works well in combination with resampling methods for imbalanced data (i.e., oversampling and undersampling). For example, Chen et al. used the Synthetic Minority Oversampling Technique (SMOTE) to synthesize additional minority-class samples, in order to mitigate the overfitting caused by imbalanced data (Chen et al., 2018).
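The core idea of SMOTE, interpolating between a minority-class sample and one of its nearest minority-class neighbours, can be sketched in pure Python. The parameter names, the default k = 5 and the use of Euclidean distance are assumptions of this sketch; production code would typically rely on a maintained implementation such as the `SMOTE` class in imbalanced-learn.

```python
import math
import random

def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic minority-class samples with the basic SMOTE idea.

    For each new sample: pick a random minority sample, pick one of its k nearest
    minority neighbours (Euclidean distance), and interpolate at a random point
    on the line segment between the two."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        idx = rng.randrange(len(minority))
        base = minority[idx]
        others = [m for j, m in enumerate(minority) if j != idx]
        neighbours = sorted(others, key=lambda m: math.dist(base, m))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic

# Hypothetical minority class ("risky" firms) described by two scaled ratios.
risky = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_samples = smote(risky, n_new=6, k=3)
print(new_samples)
```

Because new points lie between existing minority samples, SMOTE enlarges the minority class without simply duplicating records, which is the property that helps reduce overfitting.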

Financial risk prediction
In this section, we review the applications of deep learning to financial risk prediction from the perspective of the three prominent characteristics of financial data: heterogeneity, multi-source and imbalance. To ensure the comprehensiveness of this survey, we moderately expand the scope of references: apart from the applications of deep learning to financial risk prediction, we also introduce some studies applying deep learning to other topics in financial security. We believe that the articles introduced in this section may offer readers insight into both deep learning for financial risk prediction and methods for processing financial data.

Heterogeneity
As noted above, there is more and more unstructured data on the Internet, such as blogs, pictures and videos. This means that financial data are becoming increasingly heterogeneous, which is likely to be an inevitable trend in this field (Qu et al., 2019). In this section, we review the applications of deep learning to financial risk prediction separately for structured data and unstructured data. Table 1 presents financial risk prediction studies based on DL models from the perspective of financial data heterogeneity.

Structured data
In the financial field, a key source of structured data is business process data, i.e., data generated or collected by businesses and public entities during economic activities. As shown in Figure 5, business process data consist of three main parts: transaction data, corporate data and government agency data. A large part of transaction data, such as credit card data, is generated during the payment process. Corporate data is useful data stored in databases or produced as a by-product of operations, such as bank records, supermarket scanner data and supply chain data. Government agency data is generally led by government departments and generated in cooperation with database companies; an example is the national centralized commercial and consumer credit reporting system established, operated and maintained by the People's Bank of China (Lv, 2019).
In a highly developed commodity economy, every enterprise depends on monetary liquidity, and financial institutions most of all. If a financial institution runs short of capital, it may lose revenue or even go bankrupt within a very short time. Conversely, if it simply hoards reserves, it is likely to miss many investment opportunities with long-term benefits. Therefore, managing fund flows scientifically is a challenge that bears on a financial institution's very survival. Yang et al. (2019) used the transaction data of Yu'e Bao to study a method for predicting financial capital flows. Based on time series theory and LSTM, they proposed a prediction method (YEB_Hybrid) that combines a linear model with a nonlinear model. They argued that the model improved prediction performance and effectively guarded against both liquidity risk and liquidity surplus.
With the E-commerce market worth trillions, ensuring that E-commerce systems operate safely around the clock is an unavoidable problem. Cao et al. focused on the E-commerce industry and established a deep learning-based financial risk early warning system for E-commerce companies (Cao et al., 2021). Mai et al. innovatively combined traditional market-based structured data with unstructured text from 10-K filings (Mai et al., 2019).

Unstructured data
Every day, huge amounts of unstructured financial data are added to the Internet, such as financial TV programs, posts on stock exchange forums, financial statements issued by listed companies, and even satellite images of real estate distribution. In fact, according to several analysts' estimates, most financial data (80 to 90 percent) is unstructured (Davis, 2019). Compared with structured data, unstructured data is more abundant and more complex to process, and it is significant enough to affect all aspects of the financial world.
As shown in Figure 5, we divide most basic financial information into text information and image information according to how the information is stored. These two kinds of information are mined with text processing and image processing technologies, respectively. Many academic studies process these two kinds of unstructured financial data through deep learning.
For financial text information, natural language processing (NLP) is applied to extract information. For example, factors such as event propensity scores, attention indices and risk volatility have been extracted from social media text, including Twitter, Weibo, WeChat official accounts and forum posts (Antweiler and Frank, 2004; Soon-Ho and Dongcheol, 2014; Tsukioka et al., 2018). Unlike the many studies that applied financial sentiment analysis to stock price prediction, Ahmadi et al. applied sentiment analysis to bankruptcy prediction (Ahmadi et al., 2018). They used a well-established data mining method, class-related pattern mining, to filter out data relevant to company failure from a large amount of unstructured text (e.g., business management reports, financial news, real-time streaming tweets, financial statements and financial documents). They then applied Dependency Sensitive Convolutional Neural Networks (DSCNNs) to perform sentiment analysis on the filtered data.
Although textual financial data has become ubiquitous, companies rarely consider it when formulating development strategies, because textual data is difficult to obtain and quantify. Therefore, how to integrate textual data effectively into risk prediction models is still an open issue that needs further investigation (Lang and Stice-Lawrence, 2015). As an important part of financial text and a form of unstructured, qualitative data, textual disclosure builds a bridge for transmitting information between enterprises and the public; an example is the documents that listed companies submit to regulators every year. Mai et al. were the first to use textual disclosure for a large-sample analysis of bankruptcy prediction (Mai et al., 2019). Their experiments showed that, although an average embedding model is simpler than a CNN, it performed better after data preprocessing and feature extraction.
For financial image information, CNNs and their improved variants are often preferred. Zhao designed an enterprise financial risk prediction system based on an embedded system and deep learning (Zhao, 2020), which identifies users by recognizing their faces. The system first collects a dataset of facial images of company employees, which is then encoded and used to train CNNs. Zhao's work may be effective in preventing criminals from using other people's forged information to engage in illegal activities such as usurious online lending. It may also help limit the spending of minors and thus avoid financial losses caused by online scams.
Financial image data consists of original image data and derived data. Hosaka extracted a set of financial ratios from the financial statements of 102 bankrupt companies and represented them as gray-scale images (Hosaka, 2019). In this way, ordinary numerical financial data was transformed into two-dimensional pixel matrices. Hosaka treated bankruptcy prediction as a two-class classification problem and used the transformed images as the feature set to train and test a CNN model.
Extracting timely and effective risk perception factors from multi-source heterogeneous information relies on multi-modal perception and cognition technology. Xiao et al. reviewed perception and cognition technology and its application to financial risk prediction (Xiao et al., 2021). At present, perception technology is mainly used to extract useful information from massive multimodal databases for identity verification, or to extract text information that provides a data basis for cognition.
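Hosaka's exact image layout is not described here, so the following sketch only illustrates the general idea of turning a vector of financial ratios into a gray-scale pixel matrix that a CNN could consume. The scheme is hypothetical: each ratio is min-max scaled across companies to 0-255, and the values are placed row by row in the smallest square grid that fits them, padded with zeros.

```python
import math

def ratios_to_grayscale(companies):
    """Map each company's financial-ratio vector to a square gray-scale pixel matrix.

    Hypothetical scheme: min-max scale every ratio across all companies to 0..255,
    then fill a side x side grid row by row (side = ceil(sqrt(number of ratios))),
    padding unused pixels with 0."""
    n = len(companies[0])
    side = math.ceil(math.sqrt(n))
    lo = [min(c[j] for c in companies) for j in range(n)]
    hi = [max(c[j] for c in companies) for j in range(n)]
    images = []
    for c in companies:
        pixels = [0 if hi[j] == lo[j] else round(255 * (c[j] - lo[j]) / (hi[j] - lo[j]))
                  for j in range(n)]
        pixels += [0] * (side * side - n)
        images.append([pixels[r * side:(r + 1) * side] for r in range(side)])
    return images

# Two hypothetical companies with three ratios each.
imgs = ratios_to_grayscale([[0.1, 2.0, 5.0], [0.3, 4.0, 5.0]])
print(imgs)
```

The resulting matrices can then be labeled as bankrupt or non-bankrupt and fed to an image classifier.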

Multi-source
The multi-source nature of financial data is a challenge for financial risk prediction, since financial data can originate from a person, a company or even an industry. In many cases, the heterogeneity of financial data derives from its multi-source nature, and the two are closely intertwined from the outset. As the sources of financial data keep multiplying, adapting to this trend is a challenging problem. In this section, we review the applications of deep learning from the perspective of industry data and individual data. Table 2 tabulates studies of financial risk prediction based on DL models from the perspective of the multi-source nature of financial data.

Industry data
Although industry data and individual data are difficult to compare in volume, industry data often has a greater social impact. Since the outbreak of COVID-19, many industries (especially in the tertiary sector) have been severely hit (Nicola et al., 2020).
Among the industries affected by the pandemic, the catering industry is a typical example. Becerra-Vicario et al. focused on the Spanish catering industry (Becerra-Vicario et al., 2020), because it is closely tied to people's daily livelihoods and many restaurants and hotels were on the verge of closure due to the pandemic. They applied logistic regression and a Deep Recurrent Convolutional Neural Network (DRCNN), respectively, to predict bankruptcy. Their results showed that the DRCNN was more suitable for this task and that financial variables related to profitability and indebtedness were the best predictors of restaurant bankruptcy.
Another important sector affected by the pandemic is real estate construction. Jang et al. (2020) developed three models based on LSTM-RNN to predict the bankruptcy of construction contractors 1, 2, and 3 years in advance, respectively. The Shapley value from cooperative game theory was used to measure the influence of the input variables and to determine which input variable had the greatest influence. Construction contractors were chosen as the study subject because their long project cycles lead to a higher probability of bankruptcy than in other industries (Tserng et al., 2014). To deal with the extreme class imbalance, the authors combined SMOTE and the Tomek link method to resample and balance the dataset. Their results showed that "housing start" had a great influence on the prediction accuracy of all three models, which may reflect the industry characteristics of construction contractors.
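Tomek links, the undersampling half of the SMOTE-plus-Tomek combination mentioned above, are pairs of samples from different classes that are each other's nearest neighbours; removing the majority-class member of each pair cleans the class boundary. The sketch below assumes binary labels, Euclidean distance and brute-force neighbour search, and the data are hypothetical. A common pairing (for example `SMOTETomek` in imbalanced-learn) applies SMOTE first and then removes Tomek links, although the exact procedure of Jang et al. is not detailed here.

```python
import math

def tomek_links(X, y):
    """Return index pairs (i, j), i < j, that form Tomek links: the two samples have
    different labels and each is the other's nearest neighbour (Euclidean distance)."""
    def nearest(i):
        return min((j for j in range(len(X)) if j != i), key=lambda j: math.dist(X[i], X[j]))
    nn = [nearest(i) for i in range(len(X))]
    return [(i, j) for i, j in enumerate(nn) if i < j and nn[j] == i and y[i] != y[j]]

def drop_majority_in_links(X, y, majority_label):
    """Undersample by removing the majority-class member of every Tomek link (binary labels assumed)."""
    drop = {i if y[i] == majority_label else j for i, j in tomek_links(X, y)}
    keep = [k for k in range(len(X)) if k not in drop]
    return [X[k] for k in keep], [y[k] for k in keep]

# Hypothetical 1-D data: label 0 = non-bankrupt (majority), 1 = bankrupt (minority).
X = [[0.0], [1.0], [1.1], [3.0]]
y = [0, 0, 1, 0]
print(tomek_links(X, y))                # [(1, 2)]
print(drop_majority_in_links(X, y, 0))  # sample 1 is removed
```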
In addition, lending institutions, one of the most important components of the financial world, could apply deep learning models to decide whether to lend to companies in specific industries. Wang et al. (2019) studied whether banks should lend to companies that claim to be "low pollution and low energy consumption", which is known as the problem of bank green credit. "Green credit" refers to bank programs that lend money to environmentally friendly companies at low interest rates, whereas enterprises with high pollution, high energy consumption and overcapacity are financed at punitively high interest rates. Wang et al. used BP (back propagation) neural networks to evaluate the risk of enterprise data and reported that the deep learning algorithm achieved higher predictive ability. Baranoff et al. investigated the impact of the life insurance industry on systemic risk through hedging with derivatives before the 2008 financial crisis (Baranoff et al., 2019).

Individual data
A large part of individual data is generated by shareholders, and many scholars study whether shareholders can make profits over a certain period of time. We believe that this research topic may be viewed as forecasting the financial risks faced by individual stakeholders: if an existing risk breaks out at some point in the future, the profitability of shareholders will decline accordingly.
From the perspective of high-risk retail investors, Kim et al. (2020) framed the question of whether retail investors could avoid risks and earn profits in the stock market as a financial risk prediction problem. They successfully used a DNN to help retail investors predict financial risks in their experiments, and showed that the advantages of deep learning can be generalized to the structured data sets commonly used in retail finance and enterprise decision support.
Investors' profits in the stock market are closely related to the fluctuations of the financial market. Financial market volatility refers to the degree of fluctuation of asset prices; it measures the uncertainty of asset returns and reflects the risk level of financial assets. Because predicting stock market fluctuations and investor returns is a time-series problem, RNNs and LSTMs, which have memory, are often used in this area (Ozbayoglu et al., 2020; Omer et al., 2020). Han Zhang reviewed common volatility prediction methods and then tested volatility prediction based on RNN and LSTM networks on CSI 300 index data (Zhang, 2020). Liu et al. studied risk spillover from the crude oil market to stock markets and found that high volatility in the crude oil market strongly affects portfolio management in the BRICS markets (Liu et al., 2019).
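The definition of volatility above can be made concrete with the usual historical estimator, which is the sample standard deviation of log returns, optionally annualized. The sketch assumes daily prices and the common (conventional, not universal) factor of 252 trading days per year. `rolling_volatility` produces the kind of series an RNN or LSTM forecaster might be trained to predict. It is a generic illustration, not the procedure of any study cited here, and the function names are hypothetical.

```python
import math
import statistics


def log_returns(prices):
    """Period-over-period log returns r_t = ln(P_t / P_{t-1})."""
    return [math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]


def historical_volatility(prices, periods_per_year=252):
    """Sample standard deviation of log returns, annualized by sqrt(periods_per_year)."""
    return statistics.stdev(log_returns(prices)) * math.sqrt(periods_per_year)


def rolling_volatility(prices, window):
    """Un-annualized volatility over each trailing window of `window` returns
    (window must be at least 2 for a sample standard deviation)."""
    r = log_returns(prices)
    return [statistics.stdev(r[i - window:i]) for i in range(window, len(r) + 1)]
```

A price path that grows at a constant rate has (numerically) zero volatility, while a path that oscillates between two levels has volatility proportional to the size of the jump.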
He et al. proposed a new Value at Risk (VaR) estimation method based on a Deep Belief Network ensemble model to forecast risk movements more accurately (He et al., 2018). VaR is a measure of the risk of investment loss: it estimates how much a given portfolio is likely to lose over a set time period, at a specified probability and under given market conditions. The tremendous fluctuations in exchange rates, interest rates, and commodity prices over the past few decades have increased the demand for VaR among investors and companies. The salient advantage of VaR is that it aggregates all the risks in a portfolio into a single number, which can conveniently be reported to a board meeting, to regulators, or in an annual report (Linsmeier and Pearson, 2000). Sukharev carried out a structural analysis of income and risk dynamics in the context of economic growth (Sukharev, 2020).
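Historical simulation is one standard way to compute VaR. It takes the empirical distribution of past portfolio returns and reads off the loss quantile at the chosen confidence level. The sketch below is a minimal illustration of that definition, not He et al.'s Deep Belief Network ensemble. The quantile convention used here (the smallest loss not exceeded in at least a `confidence` share of periods) is one of several in use, and the function name is hypothetical.

```python
import math


def historical_var(returns, confidence=0.95, portfolio_value=1.0):
    """One-period Value at Risk by historical simulation.

    Returns the loss (as a positive number) that was not exceeded in at
    least a `confidence` fraction of the observed periods, scaled by
    portfolio_value.
    """
    if not returns:
        raise ValueError("returns must be non-empty")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    losses = sorted(-r for r in returns)  # ascending: largest loss last
    k = max(math.ceil(confidence * len(losses)) - 1, 0)
    return losses[k] * portfolio_value
```

For example, consider 20 daily returns running evenly from -10% to +9%. On a portfolio of 1,000,000, `historical_var(returns, 0.95, 1_000_000)` gives a one-day 95% VaR of about 90,000, which is the second-largest observed loss.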
In addition to data from stakeholders, individual data can also come from other individuals. Cao et al. designed a deep-learning-based model to distinguish A-BOOK customers (customers posing the greatest risk to market makers) from B-BOOK customers (customers posing less risk) (Cao et al., 2021). Moreover, panel data, sometimes referred to as longitudinal data, is another data type similar to time-series data (Erica, 2019). Panel data contains observations of multiple entities collected chronologically at a regular frequency. Examples of groups that may make up panel data series include countries, firms, individuals, and demographic groups.
When a statistical model fits its training data too closely, it may work well only on the initial dataset and fail to perform accurately on unseen data. This situation is called overfitting. There are two main causes of overfitting: first, the high complexity of the neural network (Sirignano et al., 2016); second, the limited amount of data available for training (Buxton et al., 2019). Many methods can prevent overfitting, such as regularization, dropout, data augmentation, early stopping, reducing network size, and reducing the learning rate (Baek and Kim, 2018). Jiang et al. stacked deep-learning-based base classifiers in the first layer and then used cross-validation to iteratively generate the input of the second-level classifier, which prevents overfitting (Jiang et al., 2020). Sirignano et al. (2016) applied dropout to significantly improve out-of-sample fit. Fischer and Krauss's experimental results showed that randomly dropping some input units during training may avoid overfitting and improve generalization (Fischer and Krauss, 2018).
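Dropout, mentioned above as a remedy for overfitting, randomly silences units during training. The sketch below shows the common "inverted" variant applied to a plain list of activations. The list representation and the function name are assumptions chosen to keep the example self-contained; deep learning frameworks apply the same idea to tensors inside their own layers.

```python
import random


def dropout(activations, p_drop, training=True, rng=random):
    """Inverted dropout.

    During training, each unit is zeroed with probability p_drop and the
    surviving units are scaled by 1 / (1 - p_drop), so the expected value of
    every activation is unchanged. At inference time the input passes through
    untouched, which is why no rescaling is needed after training.
    """
    if not 0.0 <= p_drop < 1.0:
        raise ValueError("p_drop must be in [0, 1)")
    if not training or p_drop == 0.0:
        return list(activations)
    keep = 1.0 - p_drop
    return [a / keep if rng.random() < keep else 0.0 for a in activations]
```

Because a different random subset of units is dropped at every step, the network cannot rely on any single unit. This is the mechanism by which dropout reduces overfitting.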

Imbalance
Imbalance, sometimes called asymmetry, is a very common property of data in financial risk prediction. It has two main causes. First, in a country with a functioning economy, companies with little or no risk far outnumber companies at risk, so there is far less risky data than safe data. Second, management and ordinary stakeholders sometimes understand the actual operation of an enterprise very differently. These two causes create an imbalance between safe data and risky data, as well as information asymmetry between companies and stakeholders. We examine these two contradictions in turn below. In addition, in most instances the short-run and long-run effects of policy uncertainty are also asymmetric (Bahmani-Oskooee and Saha, 2019). Table 3 summarizes studies of DL-based financial risk prediction from the perspective of data imbalance.
As mentioned above, the difference in the number of safe and risky companies results in an imbalance between safe and risky data. Methods for handling this imbalance fall into two opposite families: undersampling and oversampling (Zhou, 2013). Roughly speaking, undersampling removes some samples from the majority (safe) data set, while oversampling adds new samples to the minority (risky) data set. Oversampling is currently favored by academics because it does not discard precious raw data.
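The two families of rebalancing methods can be illustrated in their simplest random forms. In the sketch, `majority` stands for the safe samples and `minority` for the risky ones. Both functions are hypothetical helpers, not taken from any cited study. Random oversampling here simply duplicates existing risky samples, whereas methods such as SMOTE synthesize new ones.

```python
import random


def random_undersample(majority, minority, rng=random):
    """Keep a random subset of the majority class, equal in size to the minority class."""
    kept = rng.sample(list(majority), min(len(majority), len(minority)))
    return kept, list(minority)


def random_oversample(majority, minority, rng=random):
    """Duplicate randomly chosen minority samples until both classes are equally large."""
    missing = max(0, len(majority) - len(minority))
    extra = [rng.choice(minority) for _ in range(missing)]
    return list(majority), list(minority) + extra
```

Undersampling discards majority data, while oversampling keeps all the original data but can encourage overfitting to repeated minority samples. This trade-off motivates the synthetic approaches discussed next.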
Synthetic Minority Oversampling Technique (SMOTE) is one of the best-known oversampling methods (Skryjomski and Krawczyk, 2017). Smiti and Soui combined a Stacked AutoEncoder (SAE) with a Softmax classifier and Borderline SMOTE, a variant of SMOTE (Wang et al., 2015), to predict bankruptcy (Smiti and Soui, 2020). The results showed that the resulting model, BSM-SAES, achieved higher accuracy and precision than other machine learning models. Aljawazneh et al. comprehensively compared three popular deep learning models (Long Short-Term Memory, Deep Belief Network, and a 6-layer Multilayer Perceptron) with three bagging ensemble classifiers (Random Forest, Support Vector Machine, and K-Nearest Neighbor) and two boosting ensemble classifiers (Adaptive Boosting and Extreme Gradient Boosting) for corporate financial distress prediction (Aljawazneh et al., 2021). To counter the data imbalance, they applied five oversampling techniques, two hybrid techniques (combining oversampling and undersampling), and one clustering-based balancing technique.
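SMOTE creates each synthetic minority sample in three steps. It picks a minority sample, chooses one of that sample's k nearest minority neighbours, and interpolates a random point on the segment between the two. The sketch below implements this basic scheme in pure Python for numeric feature tuples. It does not include the Borderline SMOTE refinement, which oversamples only minority samples near the class boundary, or any of the specific pipelines cited above. The function name and parameters are illustrative.

```python
import math
import random


def smote(minority, n_new, k=5, rng=random):
    """Basic SMOTE: return n_new synthetic samples interpolated between
    minority samples and their k nearest minority neighbours (Euclidean)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest neighbours of x among the other minority samples.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from x to nn
        synthetic.append(tuple(xi + gap * (ni - xi) for xi, ni in zip(x, nn)))
    return synthetic
```

Because every synthetic point lies between two real minority samples, SMOTE fills in the minority region instead of stacking duplicates. However, it can also place points in regions occupied by the majority class, which is the noise problem that the kNN-based screening discussed below addresses.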
The distribution of credit card transaction data is also imbalanced: fraudulent transactions make up only a small fraction of all daily transactions. Chen et al. used the SMOTE algorithm to synthesize additional minority samples in order to address the overfitting caused by the imbalanced data (Chen et al., 2018). Because SMOTE can add noisy samples that distort the classification boundary, they applied a kNN-based classifier and LSTM to screen out safe samples. They then proposed the kNN-SMOTE-LSTM credit card fraud detection model and verified its effectiveness and feasibility.
In addition, Wyrobek found that the current advantages of deep learning in bankruptcy prediction were concentrated mainly in cases with moderate data volumes (Wyrobek, 2018). Wyrobek therefore processed a relatively large and representative dataset, a set of Polish commercial companies, to show that deep learning (CNN) can also be used with large data volumes and performs better than machine learning methods (e.g., discriminant analysis, logistic regression, SVM, and random forest). Financial risk management has received increasing attention in the past few years. Although financial risk management is not a core competency of non-financial enterprises, it still affects their business operations and financial performance (Havierniková and Kordoš, 2019).

Between companies and stakeholders
Companies and stakeholders occupy different positions in the data flow. Upstream, companies not only hold first-hand information but can also actively release information to downstream recipients. Downstream, stakeholders receive only data that has been filtered, cleaned, and modified by upstream managers, or released by the managers themselves. There is nothing inherently wrong with this arrangement; after all, the two parties have different rights in managing the company. However, such information imbalance, or asymmetry, leaves stakeholders exposed to serious economic losses, sometimes caused intentionally by the company's management. We now discuss how deep learning has been applied to this imbalance.
A typical example of data asymmetry between a company and its stakeholders is the financial statement, which can make it difficult for stakeholders to understand the company's actual operating status. In view of this asymmetry, Jan used RNN and LSTM to detect fraud in companies' financial statements (Jan, 2021). The sample consisted of data from 153 companies listed on the Taipei Exchange between 2001 and 2019, 51 of which were known to have been involved in financial statement fraud. To ensure the validity of the model, Jan normalized and standardized the original data so that all values lay between 0 and 1. Moreover, to avoid bias due to data duplication, random sampling without replacement was applied, i.e., sampled data were not put back into the data pool.
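The paper does not give Jan's exact preprocessing formulas. One common transformation that maps every feature into [0, 1] is min-max scaling, x' = (x - min) / (max - min). Sampling in which "data were not put back into the data pool" corresponds to sampling without replacement. The sketch below shows both under those assumptions, and the helper names are hypothetical.

```python
import random


def min_max_normalize(column):
    """Rescale values linearly to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def sample_without_replacement(records, n, seed=None):
    """Draw n distinct records; once drawn, a record is not returned to the pool."""
    return random.Random(seed).sample(list(records), n)
```

In practice, the minimum and maximum should be computed on the training split only and then reused for test data, so that information from the test set does not leak into training.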
From a long-term perspective, it seems more straightforward and less time-consuming for stakeholders to predict a company's financial distress than to analyze the financial statements it publishes. Some scholars share this view. For example, Jan built financial distress prediction models based on deep neural networks and convolutional neural networks, aiming to reduce the economic losses that financial information asymmetry causes stakeholders (Jan, 2021). The results showed that using the chi-squared automatic interaction detector (CHAID) to select important variables, followed by CNN modeling, performed best in financial distress prediction.
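CHAID selects splitting variables using chi-squared tests of independence between each categorical predictor and the target. The sketch below shows only that scoring step: it computes Pearson's chi-squared statistic for each candidate variable and ranks the variables by it. Full CHAID also merges similar categories, uses adjusted p-values rather than raw statistics, and grows a tree recursively, none of which is shown here. The toy columns in the usage lines are hypothetical.

```python
from collections import Counter


def chi_squared(feature, target):
    """Pearson chi-squared statistic for the contingency table of two
    equally long lists of categorical values."""
    n = len(feature)
    joint = Counter(zip(feature, target))
    f_counts = Counter(feature)
    t_counts = Counter(target)
    stat = 0.0
    for f, fc in f_counts.items():
        for t, tc in t_counts.items():
            expected = fc * tc / n  # cell count expected under independence
            observed = joint.get((f, t), 0)
            stat += (observed - expected) ** 2 / expected
    return stat


def rank_features(columns, target):
    """Order feature names by chi-squared association with the target, strongest first."""
    scores = {name: chi_squared(col, target) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy data: 1 = financial distress.
target = [1, 1, 0, 0, 1, 0]
columns = {
    "leverage_band": ["hi", "hi", "lo", "lo", "hi", "lo"],
    "region": ["n", "s", "n", "s", "s", "n"],
}
print(rank_features(columns, target))  # prints ['leverage_band', 'region']
```

A variable whose categories separate distressed from healthy firms yields a large statistic, while a variable independent of the target yields a statistic near zero.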

Open issues and future work
In reviewing previous work, we find that the following issues remain to be solved in applying deep learning to financial risk prediction, and we propose a possible research direction for each.
Model optimization. As in other fields that apply deep learning, a common problem here is that deep learning models behave like black boxes, whose internal structure and learned parameters are difficult to explain. This property does free users from having to understand how the model works: they simply hand the data to the model and run it. However, it also poses challenges to researchers, who are often unsure how prediction accuracy relates to the internal structure or particular parameters of the models, which complicates the optimization of financial risk prediction models. This is essentially a model-level black box problem; it stems from the intrinsic characteristics of deep learning and may be difficult to solve in the short term. Nevertheless, some scholars have been working to make models interpretable. In his paper "Can we open the black box of AI?", Castelvecchi introduced several researchers who are committed to cracking the black box (Castelvecchi, 2016). Hence, we believe this issue will be resolved gradually.
Risk spread. Most current research focuses on predicting the financial risk of specific individuals or companies, while how financial risks spread, and how to cut off that spread, has received less attention. Yet from the perspective of national strategy, ensuring financial security requires not only specific, specialized risk response strategies but also a systematic and comprehensive risk prevention system. For this issue, we think future research could draw on methods from the propagation dynamics of social networks, combining complex network theory with deep learning to model and analyze the spreading process of financial risks.
Incomplete dataset. We have noticed that many articles dealing with the heterogeneity of financial data ignored structured data entirely and used only unstructured data. Admittedly, the amount and impact of structured data are far lower than those of unstructured data. For enterprises, however, structured data still occupies an important position, so we believe datasets for enterprise risk prediction (such as bankruptcy prediction and financial distress prediction) should be selected comprehensively. Similarly, some authors were not aware of data imbalance and used neither undersampling nor oversampling to preprocess their data. In fact, the imbalance of financial risk data stems from its inherent composition and is difficult to ignore in many application scenarios; its main causes and typical manifestations have been discussed in subsection 4.3. We therefore suggest that researchers first consider the comprehensiveness of the dataset and then decide whether to adopt corresponding techniques according to whether the data is imbalanced.

Conclusions
The main purpose of this survey was to give readers a snapshot of the current state of research on DL implementations for financial risk prediction by reviewing and summarizing previous work. We reviewed the applications of deep learning to financial risk prediction from the perspective of three prominent characteristics of financial data: heterogeneity, multi-source, and imbalance. Moreover, we analyzed the causes of these characteristics and whether popular models differ across data characteristics. At present, there

Figure 1. A perceptron with n input features.

Figure 5. Specific scope of financial structured data and unstructured data.

Table 1. Financial risk prediction studies based on data heterogeneity.

Table 2. Financial risk prediction studies based on multi-source data.

Table 3. Financial risk prediction studies based on data imbalance.