Big Data Audit of Banks Based on Fuzzy Set Theory to Evaluate Risk Level

The arrival of the big data era has brought new opportunities and challenges to industries across China. The explosive growth of commercial bank data has put great pressure on internal audit: audits restricted to key products in key business areas can no longer meet demand, and sampling and static analysis alone make it difficult to uncover abnormal and exceptional risks. By exploring how big data and bank internal audit can be organically integrated, internal audit work can protect the stable and sustainable development of banks under the new situation. Therefore, based on fuzzy set theory, this paper determines the membership degree of audit data through a membership function, judges the risk level of the audit data, and builds a risk level evaluation system. The main contributions of this paper are as follows. First, it analyzes the necessity of transforming bank auditing in the big data environment. Second, it combines the determination of membership functions in fuzzy set theory with bank audit analysis, using the model to calculate the corresponding parameters and thereby establishing a risk level assessment system. Third, it proposes audit risk assessment recommendations, which may help bank audit risk management in the big data environment. This paper also has some limitations. First, the amount of data acquired is not large. Second, owing to the limits of the authors' knowledge, there remain deficiencies in the analysis of the audit risk of commercial banks.


Introduction
With the rapid development of the Internet of Things, cloud computing, and the mobile Internet, the era of "big data" has arrived. It has penetrated all areas of the national economy and society, bringing new ways of thinking and changes in business and management. Auditing is an activity undertaken by an independent auditing agency and professional auditors to supervise and evaluate, according to law, the authenticity, validity and benefit of the financial revenues and expenditures and other related economic activities of the audited entity. Its aims are to evaluate economic responsibilities, maintain financial discipline, improve operating management and raise economic benefits. As an independent economic monitoring, evaluation, assurance and service activity, auditing ultimately seeks a healthy and stable national economic system. Given the rapid development of information technology and the changing audit environment, big data auditing has become an important means of improving audit efficiency and quality. Big data auditing technology can quickly and flexibly summarize, analyze, evaluate, monitor and give early warnings on all kinds of basic data. Moving audit inspection from sampling to full coverage has become the direction of the comprehensive transformation and deepening of audit work [Zheng (2016)]. The banking industry was one of the earliest industries to adopt computers. Almost all business in China's commercial banks is completed through computer information systems, and a large amount of data is generated and processed every day. However, as a special industry that deals in currency, commercial banks are inherently high-risk, and the risks are increasing. Traditional auditing methods cannot meet the needs of commercial bank audit work in the big data environment, and many violations and potential risks remain hidden in the data [Mo (2018); Tao (2014)].
Therefore, commercial banks urgently need to break through traditional auditing methods and models, which makes big data auditing inevitable. Using big data in audit work is of great significance for banks. First, it helps improve the efficiency and quality of audit work. Audits are no longer limited by time and space, and scarce auditors are freed from cumbersome documentation work. In an auditing system, vouchers and account books are simply databases to be checked in an audit project; querying, sampling, analysis and calculation become convenient, which improves the efficiency of the audit work. Second, it helps make audit work scientific and standardized. Using big data for auditing opens up the audit process and makes the results public, enhancing the transparency of the audit work; aggregating and analyzing the problems found makes it easier to understand which business types and areas are risk-prone and to arrange audit work more specifically. Resource optimization makes the audit plan scientific, while the various kinds of information produced in the audit process are stored in a standardized format. Third, it can help reduce audit risk and improve auditors' abilities. Information systems can calculate and analyze audit data and critically review questionable data. At the same time, various audit models can be used to examine the data, so auditors need to master data analysis, comprehensive judgment and cross-professional knowledge to enhance their own audit ability. Finally, it helps enhance audit effectiveness. A bank's audit department or the China Banking Regulatory Commission (CBRC) can use the audit information system to share audit information centrally, review audit completion status and basic data, and supervise the process. Suspicious data can then be dealt with in time [Zhao (2016); Wang (2004)].
Schönberger pointed out that big data is the source of people's new knowledge and new value, and that its most important use is forecasting [Schönberger (2013)]. Most foreign research on audit risk defines it and establishes measurement models, while Chinese scholars have conducted extensive research on commercial banks' risk-oriented internal audit [Tie, Wu, Luo et al. (2019); Zhang (2016); Wei (2019)]. Meng et al. [Meng and Zhang (2010)] believe that China's financial audit must learn from foreign experience and take corresponding measures to better maintain financial security [Zhang (2013)]. Most internal audits of Chinese commercial banks still focus on regulatory compliance, and there is much to improve in risk identification, internal control, personnel training and risk communication. Fang [Fang (2012)] pointed out that in the information environment, especially with the extensive use of computer technology in the banking industry, fundamental changes have taken place in the auditing environment, audit objectives, audit objects and auditing methods, making the traditional auditing mode of commercial banks unable to meet these changing needs. Shi [Shi (2005)], a former auditor of the National Audit Office, first proposed the concept of data-based auditing, which he regards as the future of computer auditing in the information environment. Wang et al. [Wang and Weng (2006)] systematically compared the stages in the development of commercial banks' auditing models and comprehensively analyzed the audit risks under different auditing modes. They found that the data audit mode carries the largest amount of information and the smallest information asymmetry, and thus potentially the lowest audit risk.
Although research on big data auditing at home and abroad has never stopped, the audit management information systems developed in China have not yet brought powerful functionality to professional audit work in banks, and big data auditing is not yet widely used in the banking industry [Bai (2013)]. The era of big data brings banks opportunities but also increases the difficulty of enterprise development. To keep developing healthily, enterprises should make full use of big data technology to integrate data effectively and improve the effectiveness and quality of business management. Given the problems in enterprise internal audit in the big data context, emphasis should be placed on internal audit, on raising the informatization of the audit process, and on the professional qualities of auditors, so as to improve the quality of internal audit work and promote the sustainable and healthy development of enterprises. With the explosive growth of internal and external data in the banking industry, the contradiction between limited audit resources and the data of the audited entity becomes increasingly prominent; big data auditing is therefore an inevitable outcome. Amid rapid changes in organizational structure and continuous innovation of financial services, commercial banks must establish and improve risk control systems and corresponding internal control systems to avoid risks effectively [Song (2019)]. The fuzzy clustering method applies fuzzy set theory to the analysis problem and has a clear classification effect on two-state or polymorphic data with fuzzy features. It can make the audit more scientific and reduce audit risk.

A summary of fuzzy set theory
Fuzzy clustering analysis is a mathematical analysis method suited to cases where the boundaries between things are relatively fuzzy. In ordinary classification we tend to group things with the same characteristics into one class, but when the boundaries between things are vague, effective classification is often impossible. Fuzzy theory addresses this by softening rigid classification thresholds through a specific membership function, yielding better classification results [Wu (2019)]. Clustering is an important branch of data mining. To account for intermediate grades of membership, fuzzy clustering extends the membership degree of a sample from the two-valued set {0, 1} to the interval [0, 1]. It can objectively reflect the real world and has become the mainstream of cluster analysis research. It is a very common multivariate analysis method in mathematics that uses mathematical methods to determine the relationships between samples. In 1965, Professor Zadeh of the United States first proposed fuzzy set theory. A fuzzy set is defined as follows: let U be the universe; any mapping μ_A: U → [0, 1] determines a fuzzy subset A of U. Here μ_A is the membership function of the fuzzy subset A, and μ_A(u) is the membership degree of u in A. The closer μ_A(u) is to 0, the less u belongs to A, and vice versa. Fuzzy subsets can be represented in the following ways. Zadeh representation:

A = μ_A(u_1)/u_1 + μ_A(u_2)/u_2 + … + μ_A(u_n)/u_n (1)

where μ_A(u_i) represents the membership degree of u_i in the fuzzy set A, and "/" denotes correspondence rather than division.

Ordered-pair representation:

A = {(u_1, μ_A(u_1)), (u_2, μ_A(u_2)), …, (u_n, μ_A(u_n))} (2)

Vector representation:

A = (μ_A(u_1), μ_A(u_2), …, μ_A(u_n)) (3)

If the universe U is an infinite set, the fuzzy set on it can be expressed as A = ∫_U μ_A(u)/u. Since fuzzy sets represent fuzzy concepts well, they have become a practical tool for solving problems involving fuzziness [Tang and Song (2019); Mao (2009); Dou, Yuan and Liu (2011)]. The membership function, also known as the attribute function, is an important concept in fuzzy mathematics and the mathematical tool for expressing fuzzy sets. To obtain the membership of an element of the universe, a suitable value between 0 and 1 is chosen to represent it. The most characteristic feature of membership in a fuzzy set is that it is not expressed directly as 0 or 1, but as a number between 0 and 1; this membership degree reflects well the closeness between the element and the fuzzy set. At present, the common methods of determining the membership degree of an element in a fuzzy set include the intuition method, the fuzzy statistics method and the fuzzy distribution method. These are outlined below [Chen and Liu (2014); Wu (2013); Xing (2013); Li, Li and Dan (2002)]. In the intuition method, people construct a membership function for a fuzzy concept based on their own practical experience or on common sense. It is often used to describe things that people are familiar with in daily life but which are objectively fuzzy, and it also works well when data for the evaluation indicators are hard to obtain. Although the intuition method is simple and direct, the resulting membership degrees are strongly influenced by subjective factors, since different people may establish different membership functions according to their own experience and understanding of things.
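As an illustration, a discrete fuzzy subset and its vector representation (Eq. (3)) can be sketched in Python; the universe and membership values below are purely hypothetical, not data from the paper:

```python
# Hypothetical discrete fuzzy subset A of a finite universe U,
# stored as a mapping u -> mu_A(u) with values in [0, 1].
U = ["u1", "u2", "u3", "u4"]
mu_A = {"u1": 0.0, "u2": 0.3, "u3": 0.7, "u4": 1.0}

def membership(u):
    """Membership degree mu_A(u): 0 means u does not belong to A at all,
    1 means u fully belongs to A; elements outside U default to 0."""
    return mu_A.get(u, 0.0)

# Vector representation (Eq. (3)): the tuple of membership degrees over U.
vector_A = tuple(membership(u) for u in U)
print(vector_A)  # (0.0, 0.3, 0.7, 1.0)
```

The ordered-pair representation (Eq. (2)) is simply `list(mu_A.items())` for the same data; all three representations carry identical information over a finite universe.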
In the fuzzy statistics method, survey statistics are analyzed to count how often an element is judged to belong to the fuzzy set; the quotient of that count and the total number of trials is the membership degree of the element in the fuzzy set. The fuzzy statistics method not only objectively reflects the membership degree of elements in the universe but also has a theoretical basis: when the number of trials is large enough, the frequency for an element stabilizes around a fixed number, so the fuzzy statistics method is a common way of determining membership. The fuzzy distribution method takes the real number set as the universe and uses a membership function defined on it; fuzzy distributions often used in practice include the rectangular distribution, Gaussian distribution, Cauchy distribution and so on.

Implementation steps and case analysis
Big data processing usually includes getting data, cleaning data, managing data, analysis, and visualization (see Fig. 1). Getting data refers to obtaining the data required for the experiment from professional data organizations, the National Bureau of Statistics, the enterprise itself and the Internet. Cleaning data refers to cleansing the acquired data sources and eliminating unnecessary, incorrect and invalid data. Managing data means processing the cleaned data according to business needs, on which basis analysis and research are carried out. Finally, the results are visualized so that they can be displayed more clearly and intuitively, maximizing the value of the data.
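A minimal sketch of the cleaning, management and analysis steps, using hypothetical revenue records and an assumed "reasonable fluctuation" band of ±30% (the paper does not state its threshold; the 16.03% growth figure echoes the case in the conclusion):

```python
# Hypothetical raw records; field names are illustrative assumptions.
raw_records = [
    {"year": 2011, "revenue": 100.0},
    {"year": 2012, "revenue": 116.03},
    {"year": 2013, "revenue": None},   # invalid entry to be cleaned out
]

# Clean: eliminate incorrect and invalid data.
clean = [r for r in raw_records if r["revenue"] is not None]

# Manage: derive the business measure of interest (year-on-year growth).
clean.sort(key=lambda r: r["year"])
growth = [
    (b["year"], (b["revenue"] - a["revenue"]) / a["revenue"])
    for a, b in zip(clean, clean[1:])
]

# Analyze: flag growth rates outside the assumed "reasonable" band.
flags = [(year, g, abs(g) > 0.30) for year, g in growth]
print(flags)  # e.g. [(2012, 0.1603..., False)] -- no risk flag raised
```

In a full implementation the analysis step would replace the hard threshold with a membership function, grading how "reasonable" each fluctuation is rather than making a binary call.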

Conclusion
Based on the collected data, we calculate the membership degree of the volatility of operating revenue to "reasonable fluctuation" and judge whether the fluctuation of operating revenue falls within the "reasonable" range. The more unreasonable the data, the higher the risk, and the more the bank needs to take risk prevention measures. The experimental results show that from 2011 to 2012 the growth rate of operating revenue was 16.03%, with a membership degree of 0.56 for "reasonable fluctuation"; the growth rate is within the range of reasonable fluctuation and does not constitute a risk factor. The experimental data come from 140 bank staff, and the amount of data is relatively small. In future research, the amount of data can be increased appropriately to obtain a more stable membership degree, and the study can extend beyond a single factor to multiple aspects of bank auditing, making the results more credible.