The integration of forensic accounting and big data technology frameworks for internal fraud mitigation in the banking industry

Abstract
The purpose of this study is to investigate the integration of forensic accounting and big data technology frameworks in relation to the mitigation of internal fraud risk in the banking industry. This study employed an explanatory research design involving the use of simulated data to mirror the situation in the banking industry. To this end, the big data analytical approach considered is machine learning, using a two-layer feed-forward neural network with one hidden layer of five neurons created to detect the presence of fraud and classify activities into two classes, namely fraudulent and non-fraudulent. Both the input and output target samples are automatically divided into training, validation, and test datasets, while the confusion matrix is employed to visualise the percentages of correct and incorrect classifications. Furthermore, the fraud indicators were clustered to group them based on their similarities. The results obtained demonstrate the feasibility of neural networks in classifying internal fraud into three levels of risk and in detecting fraud. This is evidenced by the percentages of correct classification (95%) and misclassification (5%) obtained from the confusion matrix. The model also demonstrates the feasibility of clustering the potential red flags of internal fraud. This study provides insight into the attributes of internal fraud and a practical, guided approach to implementing an integrated forensic accounting and big data technology framework for internal fraud mitigation. The forensic accountant should ensure that the machine learning models are regularly updated with new datasets for automatic classification and clustering analysis. There is still scant information regarding the integration of forensic accounting and big data technology for the mitigation of internal fraud risk in the banking industry. Hence, it is envisaged that this study will contribute to the method, theory and practice of internal fraud mitigation.


Introduction
According to the Association of Certified Fraud Examiners (ACFE, 2012), fraud involves the use of one's occupation for personal benefit through a deliberate misappropriation of the organisation's resources. Many reports and authors opined that fraud is a deliberate act marked by deception, violation of trust and concealment with the aim of siphoning an organisation's resources for personal gain. The success of forensic accounting for fraud mitigation is partly a function of the tools employed for data analytics (Hamdan, 2018). Most forensic accountants, especially in emerging economies, face challenges in deploying data analytical tools for fraud investigation owing to a lack of the required skills or expertise. This has slowed down the full emergence of forensic accounting for fraud detection in some countries (Akhidime & Uagbale-Ekatah, 2014). The choice of the forensic accounting investigative tool is partly a function of the skills and expertise of the forensic investigator, and this has a major effect on the outcome of the investigation process. The process of fraud investigation and detection in organisations is becoming increasingly dynamic and complex. The dynamic nature of fraudulent schemes and changes in accounting principles, procedures and policies often complicate the fraud investigation processes (Ozili, 2018). In addition, fraud in this digital era is marked by conspiracy, deception and concealment, which often make it a daunting task to unravel the root cause and identify the perpetrators (Enofe et al., 2017; Ezejiofor et al., 2016). As a result, the combined knowledge, skill and experience of a forensic accountant play a major role in fraud detection, and this necessitates the use of advanced and cognitive analytics.
Forensic accountants need to keep abreast of recent advances in the data analytics domain to avoid absolute reliance on traditional data analytics for uncovering fraud in this digital era. Traditional data analytics techniques might be tedious, time-consuming and less effective in this era of emerging digital technologies. For instance, where there is a vast amount of data of different types garnered from different sources, more time and personnel may be needed to support the workflow in traditional data analytics. Hence, there is a need for big data technology. McKinsey Global Institute (2011) and Warren et al. (2015) opined that the high volume and high variety of big data may be difficult to manage with traditional data analytic techniques. Although traditional data analytic techniques are still useful for the visualisation of datasets, advanced and cognitive analytics offer more flexibility and automation of the data analysis phase, thus reducing the rigour of manual work and the susceptibility to human error. Advanced and cognitive analytics also offer high computational efficiency and data narratives, which can guide forensic accountants to reach informed conclusions about suspected fraud cases.
Forensic accounting implementation frameworks geared towards fraud mitigation have been developed in recent studies. The motivation for this study is the quest to integrate forensic accounting and big data technology, an integration that has not been reported in the existing literature. The aim of this study is to investigate the integration of forensic accounting and big data technology frameworks in relation to the mitigation of internal fraud risk in the banking industry. This study provides insight into the attributes of internal fraud and a practical, guided approach to implementing an integrated forensic accounting and big data technology framework for internal fraud mitigation.
This study finds that the integration of big data technology into the forensic accounting framework can enhance the data analysis phase of that framework. The machine learning approach applied to the simulated data in this study shows feasibility for achieving fraud classification and clustering of the potential red flags for internal fraud perpetration. The rest of the paper is organised as follows: the second section presents an overview of the existing literature, while the third section presents the methodology employed in this study. This is followed by the results and discussion in the fourth section, while the last section presents the conclusion and recommendations drawn from the outcome of the study in relation to the study objectives.

Literature review
The literature review in this section comprises an overview of some existing works on internal fraud, forensic accounting and big data technology.

Internal fraud
Internal fraud has been defined as fraud perpetrated by an organisation's employee (Ezejiofor et al., 2016; Kumar & Sriganga, 2014; Srivastava & Bhatnagar, 2021). It has been indicated that an organisation's employee can take undue advantage of perceived loopholes in the control structure of an organisation, coupled with easy access to the organisation's information, to perpetrate and conceal fraud. Hinde (2003) opined that many organisational security breaches aimed at fraud perpetration can be traced to internal employees, either directly or indirectly through collaboration with people outside the organisation.
According to the Chartered Institute of Management Accountants (CIMA, 2008), internal fraud can be divided into three broad categories: asset misappropriation (cash and non-cash), fraudulent statement (financial and non-financial) and corruption (conflict of interest and bribery or extortion). The most common type of internal fraud is asset misappropriation. This happens when an employee steals or mismanages the organisation's resources. Examples include cash theft, fraudulent billing, inventory, or inflated reimbursement (Agarwal, 2022). On the other hand, financial statement fraud involves intentional error or omission, such as the presentation of false statements, forgery or record alteration, in financial reporting with the purpose of deceiving the financial statement users (Kenyon & Tilton, 2011). In financial statement fraud, perpetrators deliberately present incorrect financial statements to deceive or misinform the users of such information (Rezaee, 2005).
The third category of fraud, corruption, occurs whenever an employee abuses personal influence to impinge on the tenets of business transactions, as in cases of bribery or conflicts of interest (Venegas, 2012). Srivastava and Bhatnagar (2021) identified some factors that promote internal fraud, such as a weak internal control system and a lack of the required expertise and technology to combat internal fraud, amongst others. The authors suggested that a data-driven approach is more effective than the conventional approaches employed by many banks in tackling internal fraud. Kenyon and Tilton (2011) and Clayton (2011) identified some potential red flags, which are indicators of internal fraud perpetration. These include: multiple customers' complaints; unusual reimbursements; duplicate invoices or transactions; non-standard or suspicious journal entries; activities in dormant or controversial accounts; consistent errors, alterations or discrepancies in financial or accounting records; unrecorded or incomplete documentation of transactions; false documentation or forgery; unauthorised transactions; missing documents that could serve as evidence; and the provision of photocopies without the original documents. Kenyon and Tilton (2011) and Clayton (2011) further explain the need for a fraud investigator to have a good understanding of the motivation for fraud perpetration and the nature of the transactions carried out, as this will aid the identification of the areas where fraud is perpetrated. Potential red flags could also indicate the attributes of some employees who commit fraud. A fraud investigator must be able to identify the potential red flags and be aware of the trends and irregularities connected to them, including areas that require further analysis or monitoring. The potential red flags can be detected via physical inspection or observation, or through review or analysis of the dataset (Kenyon & Tilton, 2011). Clayton (2011) indicated that data analytics can aid the identification of high fraud risk, suspicious journal entries and other potential red flags of fraud. The identification of red flags could also aid the process of fraud investigation, detection and prevention (Kenyon & Tilton, 2011).
However, the development of robust management control systems with effective internal control mechanisms that ensure transaction approvals, proper monitoring, and access and staff controls can aid the mitigation of internal fraud perpetration. Venegas (2012) proposed a framework for internal fraud control, which links the control environment to the risk management processes. The risk management processes comprise three major components, namely: control activities, control procedures and monitoring procedures. Pizzi et al. (2021) examined the digital transformation of internal auditing. The study revealed that the digital transformation of the internal auditing process could positively impact the processes of continuous auditing, fraud detection, data analytics and technological innovation (Pizzi et al., 2021). Furthermore, the use of Blockchain technology as a tool for professional auditing to improve business information systems and prevent fraud in a time-effective manner has also been proposed (Lombardi et al., 2021). Although it was reported that Blockchain technology could potentially disrupt the auditing system at the initial stage, its positive effect on auditing tradition and activities was highlighted by Lombardi et al. (2021). In addition, there have been recent discussions about smart contracts enabling Audit 4.0 to promote transparency, reporting and disclosure (Lombardi et al., 2021).

Forensic accounting
Forensic accounting is a technique that integrates the conventional accounting system into the legal framework for the purpose of fraud mitigation (Gerson et al., 2011). Bassey and Ahonkhai (2017) stated that forensic accounting takes into cognisance the principles of accounting and the investigative and legal procedures for tackling fraud. As a fraud mitigating tool, forensic accounting can detect both internal and external fraud schemes. It has a framework for information gathering, fraud investigation, data analytics, risk assessment, fraud detection and litigation (Akinbowale et al., 2020b). There is a consensus among authors that forensic accounting can be employed for fraud detection, investigation and fraud examination or analysis (Akinbowale et al., 2020a; Huber, 2017; Kranacher & Riley, 2019; Liodorova & Fursova, 2018; Perduv et al., 2018; Serhii et al., 2019; Shimoli, 2015). However, the lack of the required expertise, the choice of data analytics techniques and the absence of implementation frameworks are some of the identified challenges militating against the effective implementation of forensic accounting for fraud mitigation.
Some existing studies have highlighted the nature of the skills and expertise required by a professional forensic accountant for fraud investigation. These include accounting, auditing, investigative, legal and data analytics skills, amongst others (Akinbowale et al., 2020b; Ozili, 2015). Some authors have indicated the need to incorporate forensic accounting education into the curricula of academic institutions so that students can acquire basic forensic accounting skills and knowledge during their academic programmes and improve their expertise (Efiong, 2012; Kramer et al., 2017; Rezaee et al., 2016; Seda et al., 2019).
To address these challenges, forensic accounting implementation frameworks geared towards fraud mitigation have been developed (Akinbowale et al., 2020b, 2021). However, the integration of forensic accounting and big data technology has not been reported in the existing literature. The success of forensic accounting implementation for fraud mitigation is partly a function of the tools employed for data analytics (Hamdan, 2018). According to Ozili (2018), the theory of forensic accounting indicates that the decisions after forensic investigation are a reflection of the forensic techniques employed. To promote the reliability and success of forensic accounting investigation, this study attempts to integrate the forensic accounting framework and big data technology.

Big data technology
The term "big data" refers to a vast amount of data collected at high velocity and from diverse sources for processing to make informed decisions (De Dott, 2020). Three key attributes are commonly used to describe "big data", namely: high volume (the vast size or amount of data garnered), high velocity (the speed at which the data are collected) and high variety (the diverse sources from which the data are garnered) (Arnaboldi et al., 2017; De Dott, 2020; Moffitt & Vasarhelyi, 2013; Vasarhelyi et al., 2015; Yoon et al., 2015; Zhang et al., 2015). Thus, big data can be defined as a collection of high-volume data of different types and from different sources. The process of scrutinising, pre-processing and analysing big data to obtain useful information, such as the detection of certain trends or patterns, in order to make an informed decision is usually referred to as big data analytics (Cao et al., 2015). Clayton (2011) as well as Kenyon and Tilton (2011) suggested data analytics involving the use of data mining techniques for the investigation of suspected fraud cases. The authors indicated that the use of data mining techniques could aid pattern recognition (detection of irregular patterns and other anomalies in transactions). It could also provide a summary of activities related to transactions and other internal fraud red flags. Kumar and Sriganga (2014) suggested that by leveraging the power of data analytics, organisations can minimise the occurrence of internal fraud.
In data mining, clustering analysis can aid the grouping of information with similar features, while association rules can assist in establishing the existing relationships within the dataset. Regression analysis can assist in determining the magnitude of changes in the data pattern when certain variables are changed. Hence, the application of data mining techniques can enhance data processing and promote the reliability of the acquired information in the quest to mitigate fraud (Kumar & Sriganga, 2014; Miller & Martson, 2011). Prior work has also pointed out the prevalent frauds perpetrated internally in banks and further classified internal fraud into different types, with a focus on the data mining techniques used for detecting internal frauds. There is a consensus among authors that the implementation of big data technology can assist in uncovering corporate fraud (Yoon et al., 2015; Cao et al., 2015; Vasarhelyi et al., 2015; Rahmawati et al., 2016, 2017; Jans et al., 2011; Baader & Krcmar, 2018; Werner, 2016; Tang & Karim, 2019; Dagilienė & Klovienė, 2019; Chiu et al., 2020; Balios et al., 2020). Nonetheless, the absence of quality data and the lack of the required expertise and implementation frameworks are some of the issues militating against the effective deployment of big data technology for fraud mitigation. Bhasin (2016) stated that data analytics can assist forensic investigators in the process of fraud investigation. Cusack and Ahokov (2016) explained that data analytical techniques can be used for fraud detection via data acquisition and analysis, using specialised software to detect anomalies in the trends or patterns of the acquired data. The big data analytic process can provide vital and credible information to a forensic investigator in detecting certain patterns, trends or anomalies within a dataset (Clayton, 2011; Decker et al., 2011; Kenyon & Tilton, 2011; Miller & Martson, 2011).
The investigative process of forensic accounting comprises forensic analytics involving evidence gathering and data analysis in order to obtain evidence admissible in court (Nigrini, 2011). Many forensic analytic tools, including quantitative methods such as Benford's Law, benchmarking, time-series methods and risk scoring, have been reported (Nigrini, 2011). The forensic analytical tools also range from common software such as MS Excel and MS Access for small datasets to tools that can handle large datasets, such as MS SQL Server and Oracle (Decker et al., 2011). Software packages such as the Statistical Analysis System (SAS) and the Statistical Package for Social Science (SPSS) can also be used for statistical data analysis (Decker et al., 2011). However, the use of big data and machine learning approaches in forensic analytics is still evolving and has not been sufficiently highlighted in the existing literature. Mittal et al. (2021) stated that the integration of big data technologies into the forensic accounting domain can facilitate fraud mitigation. The integration of big data analytics, specifically machine learning, into the forensic analytic framework can assist forensic accountants to quickly and effectively identify and investigate the root causes of fraud incidents and prevent future occurrences. The machine learning technique is also flexible and can be combined with statistical concepts to develop a cognitive analytic framework that forensic accountants can employ to detect the motives and methods of fraudsters, with improved sensing capabilities for large datasets. Data mining techniques can be integrated into the data analysis phase of the forensic accounting framework to achieve fraud detection or prediction, data clustering and classification in order to obtain outputs such as suspicion scores or rules to visualise anomalies in a dataset.
This can enable the generation of association rules, identification of relationships, identification of customers' approval patterns, etc., that can enable easy tracing of trends and suspicious transactions.
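As an illustration of one of the quantitative methods mentioned above, the following Python sketch compares the leading-digit frequencies of a set of amounts with the proportions expected under Benford's Law, P(d) = log10(1 + 1/d). The sample amounts and all function names are assumptions made for illustration and are not taken from this study; a real investigation would use many more records and a formal goodness-of-fit test.

```python
import math
from collections import Counter

def benford_expected():
    """Expected share of each leading digit 1-9 under Benford's Law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def leading_digit(amount):
    """First significant digit of a non-zero amount."""
    # Scientific notation always starts with the first significant digit.
    return int(f"{abs(amount):.15e}"[0])

def leading_digit_shares(amounts):
    """Observed share of each leading digit 1-9 (zero amounts are skipped)."""
    counts = Counter(leading_digit(a) for a in amounts if a != 0)
    total = sum(counts.values())
    return {d: counts.get(d, 0) / total for d in range(1, 10)}

def max_deviation(amounts):
    """Largest absolute gap between observed and expected digit shares."""
    observed = leading_digit_shares(amounts)
    expected = benford_expected()
    return max(abs(observed[d] - expected[d]) for d in range(1, 10))

# Hypothetical reimbursement amounts, for illustration only.
sample_amounts = [120.50, 1340.00, 187.20, 905.10, 2210.00, 151.75, 3300.00, 118.00]
print(max_deviation(sample_amounts))
```

A large deviation does not prove fraud; it only suggests where a forensic accountant might look more closely.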

Forensic big data analytics
From the literature reviewed, it is evident that there is still a dearth of information regarding the integration of forensic accounting and big data technology for the mitigation of internal fraud risk in the banking industry. Hence, it is envisaged that this study will contribute to the method, theory and practice of internal fraud mitigation.

Methodology
This study employed an explanatory research design involving the use of simulated data to mirror the situation in the banking industry. For the purpose of this study, the big data analytical approach considered is machine learning.
Machine learning is a blend of computer algorithms that allows a computer to accomplish a task without being explicitly programmed for it (Raghavan & Gayar, 2019). A machine learning model learns by being trained on datasets, and decisions or predictions can then be made based on the historical data used for training. Fraud is dynamic in nature, and recent technological advancement coupled with the creative methods employed by fraudsters necessitates the use of a versatile algorithm capable of studying historical data and identifying anomalies in the dataset. Compared to the rules-based approach, the machine learning approach was preferred in this study because of its capability to study historical data and establish relationships that are useful for making future predictions. It is also time-effective and can accelerate fraud detection. In addition, it can identify hidden correlations in the data in real time and classify transactions as normal or fraudulent. The rules-based approach is a knowledge-based approach, which employs a series of "IF-THEN" statements to reach a conclusion based on certain rules or logic. The rules-based system applies a set of rules to deal with data or some established facts about a situation (Liu & Cocea, 2015). Since different fraud schemes have different features and peculiarities, different rules must be created for different fraud cases, which might be tedious and time-consuming. With the emerging digital technologies that characterise modern banking operations, a robust algorithm that can handle the complexity and volume of transactions will be more efficient in combating fraud. Machine learning algorithms can trace or detect hidden transactions and update detected patterns in real time. In general, the larger the volume of data fed into the network, the higher its precision for fraud detection tends to be.
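To make the contrast concrete, the following is a minimal Python sketch of a rules-based check. The transaction fields (`kind`, `amount`, `approved_by`, `account_dormant`), the rule names and the threshold are hypothetical and are not part of this study; the sketch only illustrates that each fraud pattern needs its own hand-written IF-THEN rule.

```python
# Hypothetical IF-THEN rules; field names and the 5000 threshold are
# illustrative assumptions, not rules used in the study.
RULES = [
    ("unauthorised transaction",
     lambda t: t["approved_by"] is None),
    ("activity in dormant account",
     lambda t: t["account_dormant"] and t["amount"] > 0),
    ("unusual reimbursement",
     lambda t: t["kind"] == "reimbursement" and t["amount"] > 5000),
]

def apply_rules(transaction):
    """Return the names of all rules that the transaction triggers."""
    return [name for name, condition in RULES if condition(transaction)]

example = {
    "kind": "reimbursement",
    "amount": 7200,
    "approved_by": None,
    "account_dormant": False,
}
flags = apply_rules(example)
print(flags)
```

A scheme that is not covered by any rule passes unnoticed until a new rule is written, which is the maintenance burden described above.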
The following subsections elaborate on these topics. Figure 1 presents the proposed framework for the integration of forensic accounting and big data technology for fraud mitigation. Analysis from big data technology can be combined with other evidence acquired by the forensic accounting investigators to substantiate or refute a suspected fraud case. Besides, forensic accounting investigators can employ big data analytic techniques such as data mining and machine learning to uncover fraud cases rather than relying on traditional data analytics, where a centralized database architecture is employed to store and manage acquired data in a fixed format.

Framework for the integration of forensic accounting and big data technology
A forensic accountant might consider the combination of the data mining and machine learning for investigating suspected fraud cases such as unauthorized changes in the master vendor file, cheque alteration, ghost employees, unauthorized changes in wages or salary falsification, unapproved commissions or corruption schemes, fraudulent expense reimbursements, theft of cash receipts, inventory schemes, as well as fraudulent financial statements amongst others.

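Figure 1. Proposed framework for the integration of forensic accounting and big data technology for fraud mitigation. Source: Authors.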
In the context of fraud investigation by a forensic accountant, big data relates to the acquisition of vast and diverse datasets, which may include structured, semi-structured and unstructured data. The data may be acquired in different volumes, thus making them difficult to process or manage using traditional data analytics. Big data can drive machine learning towards uncovering trends, hidden patterns and relationships in large amounts of raw data in order to make informed decisions (Ngai et al., 2011).
Sometimes, forensic accountants face the challenge of combining pieces of evidence gathered from different sources. Big data analytics can aid the integration of different types of information from multiple sources and transform them into valuable information for deciding suspected fraud cases. In some cases, it can be combined with traditional data analytics and expert opinion to substantiate evidence of fraud incidents.
The following are the requirements for the integration of the forensic accounting and big data technology as shown in Figure 1.
• Establish the capability of both techniques (forensic accounting and big data) and of their investigators, as well as the scope of the suspected fraud case.
• Establish the scope of the data and evidence available for analysis.
• Establish the compatibility of the forensic accounting and big data techniques. The proposed machine learning technique works more effectively when used for the analysis of big data (a vast amount of data of different types and from different sources). Where there is a limited amount of data for analysis, traditional data analytics may be considered.
• Establish the type of analysis to be carried out and select the right big data analytic technique.
For instance, when the focus of the investigator is to detect hidden patterns, trends, anomalies or relationships in the dataset, the data mining approach can be considered. Data mining comprises the following phases: data acquisition, selection of the target data from the data pool, data pre-processing, data transformation and analysis, and pattern identification and evaluation.
On the other hand, machine learning algorithms can be trained on historical data or information that represents the relationships in the dataset to build models that predict future outcomes. Once a model has been trained on the historical data, it can apply the learnt patterns to new datasets to make better future predictions. Depending on the nature of the data available and the scope of the investigation, this can be achieved under supervised, unsupervised, reinforcement or deep learning environments. Under the supervised learning environment, the machine is trained to learn, recognise patterns and make predictions using a labelled dataset (input and output datasets). Conversely, under the unsupervised learning environment, the machine is trained to learn, recognise patterns and make predictions using an unlabelled dataset (input dataset only). Under the reinforcement learning environment, the machine is trained to learn, recognise patterns and make predictions from an unfamiliar dataset using a trial-and-error approach. Deep learning is a subset of machine learning that uses advanced neural networks inspired by biological neural networks. A neural network has interconnected layers of nodes that communicate with each other to analyse high-volume input datasets.
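The difference between the supervised and unsupervised environments can be sketched in a few lines of Python. The example below is a simplified illustration under stated assumptions: the two-feature data points are invented, the supervised learner is a nearest-centroid classifier, and the unsupervised learner is a two-cluster k-means with a deterministic initialisation. Neither is the neural network used in this study.

```python
def centroid(points):
    """Component-wise mean of a list of equal-length points."""
    return [sum(c) / len(points) for c in zip(*points)]

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

# Supervised: labels are supplied with the inputs.
def fit_nearest_centroid(X, y):
    """Learn one centroid per label from labelled data."""
    return {lab: centroid([x for x, l in zip(X, y) if l == lab])
            for lab in sorted(set(y))}

def predict(model, x):
    """Assign x to the label with the closest centroid."""
    return min(model, key=lambda lab: sq_dist(model[lab], x))

# Unsupervised: only the inputs are supplied.
def assign(X, centres):
    """Index of the nearest centre for every point."""
    return [min(range(len(centres)), key=lambda k: sq_dist(x, centres[k]))
            for x in X]

def two_means(X, iterations=10):
    """Two-cluster k-means; initial centres are the first and last points."""
    centres = [list(X[0]), list(X[-1])]
    for _ in range(iterations):
        labels = assign(X, centres)
        for k in range(2):
            members = [x for x, lab in zip(X, labels) if lab == k]
            if members:
                centres[k] = centroid(members)
    return assign(X, centres)

# Invented indicator scores for six activities (illustration only).
X = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.25], [0.8, 0.9], [0.9, 0.7], [0.85, 0.95]]
y = ["normal"] * 3 + ["fraudulent"] * 3

model = fit_nearest_centroid(X, y)
print(predict(model, [0.7, 0.8]))   # uses the labels learnt from y
print(two_means(X))                 # groups X without ever seeing y
```

The supervised model returns a named class because it was given labels, whereas the unsupervised model only returns cluster indices that an investigator must interpret.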
The first step in the implementation of this proposed framework is to establish the capabilities of forensic accounting and big data technology depending on the nature of the suspected fraud case to be investigated. Generally, a forensic accountant must possess strong investigative and analytical capabilities, with a good understanding of accounting and legal principles (Akinbowale et al., 2020b). The big data analytics skills of the fraud investigator must also be established. These include data preparation and exploration, real-time analytics and reporting, and data integration and management skills, amongst others. Since forensic accounting and big data technology offer different techniques for different needs or requirements, there is a need to ensure the compatibility of the forensic accounting and big data techniques geared towards fraud mitigation. The selection of the right technique is crucial to the success of the investigation and analysis (Hamdan, 2018). In the data analysis phase of the forensic accounting implementation, big data techniques can be used depending on the fraud case to be investigated. For instance, data analytic techniques can be used for investigating cases relating to accounts payable, payroll, cash disbursements and reimbursements, journal entries, master vendor lists, accounts receivable and cash receipts, inventory, financial ratios, etc. The use of forensic accounting software and data analytics can allow forensic investigators to carry out multiple tasks on all the identified potential red flags in a fraction of the time.
Data mining techniques with clustering, association or classification rules can be employed for extracting valuable information from large amounts of data. They are suitable for discovering accurate, unique and useful patterns in the data. On the other hand, machine learning can be employed under supervised, unsupervised, reinforcement or deep learning environments for investigation and for making future predictions from historical data.
This study employs simulated data for the implementation of the big data analytics. Specifically, the machine learning approach was used under the supervised learning environment, in which both the input and output data are given and the model is trained on the input dataset to produce the predicted output.

Procedure for the classification analysis
The literature survey highlights 11 potential red flags for internal fraud in the banking industry. A forensic accountant can leverage on the potential red flags identified to detect fraud or monitor employees and transactions. First, a forensic accountant needs to carry out a preliminary analysis of the potential red flags to understand their features, the employees linked to these indicators and scope of occurrence. As it relates to fraud mitigation, the preliminary analysis will provide an insight into the risk levels of the fraud indicators. Based on the risk levels, risk scores can be allocated to the fraud indicators. Thereafter, a forensic accountant may apply machine learning techniques to the dataset for the purpose of fraud detection or prediction, to obtain outputs such as suspicion scores to detect and visualise anomalies in the dataset. Furthermore, a forensic accountant can also use the machine learning technique to identify the relationships and patterns in the dataset as well as the employees involved. This will enable easy tracking of transactions, identification and investigation of suspected fraud cases.
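The allocation of risk scores and the derivation of a suspicion score can be sketched as follows. This is a minimal illustration, assuming hypothetical risk weights for four of the red flags, a weighted-average scoring rule and hypothetical cut-offs for low, medium and high risk; none of these values are taken from this study.

```python
# Hypothetical risk weights per red flag (values are assumptions).
RISK_WEIGHTS = {
    "unauthorised transactions": 0.9,
    "false documentation or forgery": 0.9,
    "duplicate invoices or transactions": 0.6,
    "multiple customer complaints": 0.3,
}

def suspicion_score(observed):
    """Weighted average of observed indicator scores, each in [0, 1].

    Indicators missing from `observed` count as 0 (not observed).
    """
    total_weight = sum(RISK_WEIGHTS.values())
    weighted = sum(w * observed.get(flag, 0.0) for flag, w in RISK_WEIGHTS.items())
    return weighted / total_weight

def risk_level(score, low=0.33, high=0.66):
    """Map a score to a risk level using assumed cut-offs."""
    if score < low:
        return "low"
    if score < high:
        return "medium"
    return "high"

observed = {"unauthorised transactions": 1.0, "multiple customer complaints": 0.5}
score = suspicion_score(observed)
print(round(score, 3), risk_level(score))
```

In practice the weights would come from the preliminary analysis of the red flags described above rather than from fixed constants.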
In this example, the identification of the potential red flags and their risk levels was followed by the allocation of scores to the identified internal fraud indicators. A two-layer feed-forward neural network with one hidden layer of five neurons was created to detect the presence of fraud and classify activities into two classes, namely fraudulent and non-fraudulent. Both the input and output target samples are automatically divided into training, validation and test datasets, while the confusion matrix was employed to visualise the percentages of correct and incorrect classifications. The training dataset is used to fit the model so that it can learn from the input data to make the right classification. The validation dataset is employed for the optimisation of the model parameters, while the test dataset is employed to evaluate the performance of the classifier model.
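The data division and the reading of a confusion matrix can be sketched as follows. The 70/15/15 split ratio, the random seed and the example labels are assumptions made for illustration, since the split ratios are not stated here; the labels are constructed so that one of 20 activities is misclassified and are not the study's results.

```python
import random

def split_indices(n, train=0.70, val=0.15, seed=0):
    """Shuffle indices 0..n-1 and divide them into train/validation/test."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = round(n * train)
    n_val = round(n * val)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

def confusion_matrix(actual, predicted, labels=("fraudulent", "non-fraudulent")):
    """Nested dict: matrix[actual_label][predicted_label] -> count."""
    matrix = {a: {p: 0 for p in labels} for a in labels}
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    return matrix

def accuracy(matrix):
    """Share of samples on the diagonal (correct classifications)."""
    correct = sum(matrix[label][label] for label in matrix)
    total = sum(sum(row.values()) for row in matrix.values())
    return correct / total

train_idx, val_idx, test_idx = split_indices(20)

# Constructed example: 20 activities, one fraudulent case missed.
actual = ["fraudulent"] * 10 + ["non-fraudulent"] * 10
predicted = ["fraudulent"] * 9 + ["non-fraudulent"] * 11
matrix = confusion_matrix(actual, predicted)
print(matrix)
print(f"correct: {accuracy(matrix):.0%}, incorrect: {1 - accuracy(matrix):.0%}")
```

With 20 samples, a single misclassification corresponds to 5% of the data, which shows how coarse the percentages are on a dataset of this size.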
The simulated dataset consists of 20 banks, and the potential red flags for internal fraud established in the literature by Kenyon and Tilton (2011) and Clayton (2011) are used as the input factors. As an example, scores on a probability scale are allocated to the factors based on the frequency of perpetration in the 20 banks (Table 1).
Where A represents multiple customers' complaints, B represents unusual reimbursement, C represents duplicate invoices or transactions, D represents non-standard or suspicious journal entries, E represents activities in dormant or controversial accounts, F represents consistent errors, alterations or discrepancies in financial or accounting records, G represents unrecorded or incomplete documentation of transactions, H represents false documentation or forgery, I represents unauthorised transactions, J represents missing documents that could serve as evidence and K represents provision of photocopies without the original documents.
The first goal is to build a classifier that can differentiate between a normal activity and a fraud-related activity from the simulated data used as an example in this study. This classification analysis is a supervised learning task with input and output variables, in which the classifier learns how to weight multiple features and produces a generalised mapping that is not over-fitted. One of the major limitations of this approach is that it may misclassify activities that are not represented in the historical or input data fed into the network.
The output target variable denoted as "t" has two rows, with the 20 values having either [1;0] for a fraudulent activity or [0;1] for a normal activity (non-fraudulent activity) as shown in Table 2. Having identified the factors, the information is fed into the neural network application in a MATLAB 2020b environment to classify non-fraudulent and fraudulent activities.
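The two-row target encoding can be illustrated with a short sketch. The labels below are hypothetical and do not reproduce Table 2; the helper name `encode_targets` is likewise introduced only for this example.

```python
def encode_targets(is_fraud_labels):
    """Build a 2 x N target matrix: column [1; 0] for fraud, [0; 1] for normal."""
    row_fraud = [1 if is_fraud else 0 for is_fraud in is_fraud_labels]
    row_normal = [0 if is_fraud else 1 for is_fraud in is_fraud_labels]
    return [row_fraud, row_normal]


# Hypothetical labels for 20 samples (True = fraudulent activity)
labels = [True, False, False, True, False] * 4
t = encode_targets(labels)
print(t[0])  # first row: 1 where the activity is fraudulent
print(t[1])  # second row: 1 where the activity is normal
```

Every column contains exactly one 1, so each sample belongs to exactly one of the two classes.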
The neural network is initialised with random initial weights, and a single-hidden-layer feed-forward neural network with five hidden neurons is created and trained using scaled conjugate gradient backpropagation. The input and output target samples are automatically divided into training, validation, and test datasets by the developed neural network. The training set is used to teach the neural network, and the training continues until the performance goal is met. The validation dataset is used to measure network generalisation, and to stop the training when there is no further improvement in the generalisation. The test dataset provides an independent measure of the accuracy of the network. Figure 2 provides the neural network architecture for the fraud detection problem, which comprises 11 inputs (the factors), a hidden layer, and 2 outputs (which represent the outcome: fraudulent activity or non-fraudulent activity). The hidden layer lies between the input and output layers and performs a non-linear transformation of the inputs: its neurons take in a set of weighted inputs and produce an output through an activation function. Training is an iterative process; if the performance goal is not met, the weights and biases are adjusted until the network is adequately trained for predictive purposes. An adequately trained neural network is signalled by a negligible mean square error.
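To make the architecture concrete, the following sketch builds an 11-input, five-hidden-neuron, 2-output network with random initial weights, runs one forward pass and computes the mean square error against a target. It is an illustration, not the MATLAB model used in the study: the tanh hidden activation, the softmax output and the weight range are assumptions, and the scaled conjugate gradient training step is deliberately left out.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Random initial weights and zero biases for one fully connected layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def affine(weights, biases, x):
    """Weighted sum of the inputs plus bias, one value per neuron."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]


def softmax(z):
    """Turn raw output values into class scores that sum to 1."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def forward(net, x):
    """Forward pass: non-linear hidden layer, then the 2-class output layer."""
    (w1, b1), (w2, b2) = net
    hidden = [math.tanh(v) for v in affine(w1, b1, x)]  # assumed activation
    return softmax(affine(w2, b2, hidden))


def mean_squared_error(outputs, targets):
    """Average squared difference between network outputs and targets."""
    errors = [(o - t) ** 2 for out, tgt in zip(outputs, targets) for o, t in zip(out, tgt)]
    return sum(errors) / len(errors)


rng = random.Random(0)
net = (init_layer(11, 5, rng), init_layer(5, 2, rng))  # 11 inputs, 5 hidden, 2 outputs
x = [rng.random() for _ in range(11)]                  # one hypothetical score vector
y = forward(net, x)
mse = mean_squared_error([y], [[1, 0]])                # target [1; 0] = fraudulent
print(y, mse)
```

Before training, the output is close to an even split between the two classes; training adjusts the weights and biases so that the mean square error falls towards zero.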

Procedure for the clustering analysis
The clustering of the fraud indicators presented in Table 1 was also carried out to group the input factors based on their similarities. Clustering analysis, as a form of unsupervised learning, can overcome the limitation of supervised learning by identifying anomalies in transactions with little or no labelled data. The clustering model can continuously process data and incorporate new data and patterns automatically. Figure 3 presents the architecture for the clustering analysis. This is an unsupervised learning task comprising only the inputs (the 11 identified red flags for internal fraud). The aim is to group the red flags based on their similarities. The architecture consists of a self-organising map with a competitive layer that classifies a dataset of vectors with any number of dimensions into as many classes as the layer has neurons. The neurons are arranged in a 2D topology. This permits the layer to form a representation of the distribution and a two-dimensional approximation of the topology of the dataset. The input factors (11 potential red flags for internal fraud perpetration) in Table 1 were fed as a matrix into MATLAB 2020b, and the developed neural network was trained iteratively with the aid of the Self-Organising Map (SOM) batch algorithm. The SOM algorithm was considered for use in this study because its classifications retain topological information about the similarities in the groups.
The fraud attributes in Table 1 act as the inputs into the SOM, which maps them onto a 2-dimensional layer of neurons as shown in Figure 3. The network output is a 100 × 20 matrix, in which the ith column corresponds to the ith input vector and contains a 1 in its jth element to indicate that the vector belongs to the jth cluster.
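The mapping from input vectors to a 100 × 20 output matrix can be sketched with a small self-organising map. This is a simplified illustration under stated assumptions: it uses an online (sample-by-sample) update on a rectangular 10 × 10 grid rather than the batch algorithm and MATLAB topology used in the study, and the input data, learning-rate schedule and neighbourhood radius are hypothetical.

```python
import math
import random


def best_matching_unit(weights, x):
    """Index of the neuron whose weight vector is closest to x."""
    dists = [sum((w - xi) ** 2 for w, xi in zip(row, x)) for row in weights]
    return dists.index(min(dists))


def train_som(data, rows=10, cols=10, epochs=50, seed=0):
    """Online SOM training on a rectangular grid (a simplification)."""
    rng = random.Random(seed)
    dim = len(data[0])
    positions = [(r, c) for r in range(rows) for c in range(cols)]
    weights = [[rng.random() for _ in range(dim)] for _ in positions]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = 0.5 * (1.0 - frac)                                # learning rate decays
        radius = 1.0 + (max(rows, cols) / 2.0) * (1.0 - frac)  # neighbourhood shrinks
        for x in data:
            br, bc = positions[best_matching_unit(weights, x)]
            for k, (r, c) in enumerate(positions):
                d2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-d2 / (2.0 * radius ** 2))        # neighbourhood strength
                weights[k] = [w + lr * h * (xi - w) for w, xi in zip(weights[k], x)]
    return weights


def assignment_matrix(weights, data):
    """Rows = neurons, columns = input vectors; a 1 marks each vector's winner."""
    out = [[0] * len(data) for _ in weights]
    for i, x in enumerate(data):
        out[best_matching_unit(weights, x)][i] = 1
    return out


rng = random.Random(1)
data = [[rng.random() for _ in range(11)] for _ in range(20)]  # 20 hypothetical vectors
som_weights = train_som(data)
output = assignment_matrix(som_weights, data)
print(len(output), len(output[0]))
```

Input vectors that share a winning neuron, or whose winning neurons are neighbours on the grid, are the ones the map treats as similar.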

Results and discussion
This section presents the results obtained for the classification and clustering analyses, respectively. Figure 4 shows the performance of the network measured in terms of mean squared error, shown on a logarithmic scale. Performance is shown for the training, validation, and test sets, and the magnitude of the mean square error decreases as the network is trained. The performance goal was met at the 35th epoch with a negligible mean square error value of 8.4094e-05. The number of epochs represents the number of times the iteration was performed before the performance goal was met. The negligible value of the mean square error shows that the network has been adequately trained for fraud classification. Figure 5 shows the plot of the gradient and validation check after the training of the neural network. The gradient is 5.6935 × 10⁻⁷ at 35 epochs. The training stopped at the 35th iteration, the point at which the network begins to overfit the data. The best validation performance shows that there was zero validation failure at the 35th iteration. After the 35th iteration, the validation may show evidence of failure due to overfitting of data. The small value of the gradient (5.6935 × 10⁻⁷) lends credence to the fact that the error is negligible and that there is a high degree of agreement between the target and the output from the network. Figure 6 shows the error histogram. The error is obtained by finding the difference between the targets and outputs from the network. A large error indicates that the network is not adequately trained and may misclassify the internal fraud indicators. On the other hand, a negligible error is an indication that the network is adequately trained, with minimal chances of misclassification. As shown by the figure, the width of the error corresponds to 0.0942.
Furthermore, the error at the left-hand side of the plot was −0.04996, where the vertical height of the bin for the validation data is 18. This means that 18 samples from the validation dataset have errors that fall within this range. The error range is, therefore, calculated as follows:

Results obtained for the classification analysis
\[ \text{Error range} = \frac{-0.04996 - (-0.0942)}{20};\ \frac{-0.04996 + 0.0942}{20} \]
\[ \text{Error range} = 0.007208;\ 0.002212 \]
The range of error is negligible, thus indicating that there is a high degree of agreement between the targets and the neural network outputs (Daniyan et al., 2020). This also signifies that there is a high probability that the neural network can classify the potential fraud red flags correctly as fraudulent and non-fraudulent activities.
[...] activities (non-fraudulent activities). Figure 8 also shows how the false positive and true positive rates relate to the threshold of the outputs, which is varied from 0 to 1. The farther left and up the line is, the fewer the false positives. The best classifiers have a line going from the bottom-left corner to the top-left corner and then to the top-right corner, as shown in Figure 8. The confusion matrix classifies its output as "True positive" when the model classifies the output correctly as positive. In other words, when there is agreement between the prediction from the neural network model and the actual situation that the activities are not fraudulent. "True negative" stands for a situation whereby the model classifies the output correctly as negative. In other words, when the prediction from the neural network agrees with the actual situation that the activities are fraudulent. "False positive" is a situation whereby the model predicts that there are fraudulent activities, whereas in the actual situation there are no fraudulent activities. Finally, "False negative" is a situation whereby the model prediction states that there are no fraudulent activities, whereas in the actual condition, there are fraudulent activities.
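The percentages read off a confusion matrix can be reproduced with a short tally of (actual, predicted) pairs. The labels below are hypothetical and were chosen only so that one of 20 samples is misclassified; they are not the study's data, and the cells are keyed by class name rather than by positive or negative to avoid assuming which class is treated as positive.

```python
from collections import Counter


def confusion_counts(actual, predicted):
    """Count how often each (actual, predicted) pair of classes occurs."""
    return Counter(zip(actual, predicted))


# Hypothetical ground truth for 20 samples
actual = ["fraud"] * 8 + ["normal"] * 12
predicted = list(actual)
predicted[0] = "normal"  # one hypothetical misclassification

counts = confusion_counts(actual, predicted)
correct = sum(n for (a, p), n in counts.items() if a == p)
accuracy = 100.0 * correct / len(actual)
misclassified = 100.0 - accuracy

for (a, p), n in sorted(counts.items()):
    print(f"actual={a:<6} predicted={p:<6} count={n}")
print(accuracy, misclassified)
```

With 19 of 20 samples on the diagonal, the correct classification rate is 95% and the misclassification rate is 5%.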
The results obtained for the classification analysis demonstrate the feasibility of the developed neural network model in classifying activities into two types, fraudulent and non-fraudulent. This is evidenced by the high percentage of correctly classified activities (95%) and the low percentage of misclassification (5%) obtained from the confusion matrix.
Figure 9 presents the SOM topology comprising 100 neurons positioned in a 10 × 10 hexagonal grid. Each neuron has learned to represent a different set of fraud attributes, with adjacent neurons typically representing similar classes. This implies that the pattern of the fraud attributes can be recognised by the neural network architecture and a forensic accountant can be notified in real time once such a pattern is detected. Likewise, Figure 10 presents the SOM hits, which give the class of each fraud attribute vector and show the number of fraudulent cases in each class. The areas of neurons with large numbers of hits represent classes in highly populated regions of the feature space. The areas with few hits indicate sparsely populated regions of the feature space. Figures 9 and 10 help in understanding which internal fraud red flags have the same or similar attributes: red flags in the same cluster have the same or similar attributes. This implies that there exists a relationship between the fraud red flags; hence, the clustering will enable a proper understanding of the existing relationship and how to tackle it. For instance, when a forensic accountant identifies a fraud cluster, it implies that all the potential red flags in the cluster may lead to a fraud case and as such must be investigated. Figure 11 shows the distance between each neuron's class and its neighbours. The areas marked with bright connections indicate the highly connected areas of the input space.
On the other hand, the areas marked with dark connections indicate classes that represent regions of the feature space that are far apart. For instance, the bright connections in the figure may indicate the presence of possible connections among the fraud attributes, such as the presence of unapproved transactions, thus indicating potential fraud. The areas marked with dark connections, which are far apart, may thus indicate the absence of connections among the attributes, indicating the absence of fraud. The long borders of dark connections that separate the large regions of the input space show that the classes on either side of the border depict fraudulent activities with different features. Figure 12 shows the neuron neighbour connections, which are typically used to classify similar samples. This figure indicates the suitability of the unsupervised machine learning approach for detecting hidden patterns that may not be visible to, or detected by, manual or other examination techniques. Figure 13 presents the SOM weight positions. It can be seen from the figure that the weights cover all parts of the data. Thus, whenever new data is fed as an input, it can easily be assigned to the appropriate cluster. The position of the data points implies a good representation of the dataset. Figure 14 shows the weight plane for each of the 11 input attributes. The figure displays the weights that connect each input to each of the 100 neurons in the 10 × 10 hexagonal grid. The areas with dark patches depict larger weights. Correlation between inputs is signalled by the presence of similar weight planes.

Conclusion and policy implications
The aim of this study was to investigate the integration of forensic accounting and big data technology frameworks in relation to the mitigation of internal fraud risk in the banking industry. This was achieved with the aid of an explanatory research design involving the use of simulated data to mirror the situation in the banking industry. This study contributes to knowledge with the development of a framework for the integration of forensic accounting and big data technology for fraud mitigation. It provides an understanding of the attributes of internal fraud and a practical guided approach to implementing an integrated forensic accounting and big data technology framework for internal fraud mitigation.
Furthermore, neural network analysis involving classification and clustering analyses was performed in the MATLAB 2020b environment. From the literature survey, 11 potential red flags for internal fraud in the banking industry were identified. This was followed by the allocation of scores to the identified internal fraud indicators. The results obtained demonstrate the feasibility of classifying internal fraud into three levels of risks and fraud detection. This is evidenced in the percentages of correctly classified activities (95%) and misclassification (5%) obtained from the confusion matrix. In addition, the clustering analysis shows the link among the potential red flags for internal fraud. The understanding of the relationship among the potential red flags may be necessary for making effective decisions regarding the mitigation of internal fraud.
Big data technology offers a forensic accountant an innovative means of detecting trends and of detecting or preventing suspicious transactions or activities. It is efficient enough to detect and analyse slight differences in transactions and flag them as potential fraud activities.
Hence, banking institutions are encouraged to adjust their business models to incorporate this integrated forensic accounting and big data technology framework for mitigating internal fraud, in order to promote customer satisfaction and reputation while minimising internal fraud-related cases. The integration of big data and machine learning into forensic analytics for fraud mitigation is a promising process that requires further research. It could, therefore, add value and credibility to the forensic accounting profession, and to education and research geared towards fraud mitigation. With the integration of big data technology and machine learning into the forensic accounting framework, there may be a need for the establishment of standard guidelines and procedures for its implementation. Furthermore, there may also be a need for human capacity development through training to upskill forensic accountants in this regard. As demonstrated in this study, at the data analysis phase of forensic accounting implementation, a machine learning approach can be used to classify and detect fraud-related cases by using neural network algorithms that can solve classification, pattern recognition and clustering problems. For the classification problem, activities can be classified as fraudulent or non-fraudulent once the network has been adequately trained on the historical data relating to the activities. On the other hand, once the forensic accountant has the clustering machine learning model, it has to be constantly updated by feeding in new data so that new fraud patterns are detected automatically. This study is limited to the use of simulated data to demonstrate the classification and clustering capacities of the machine learning algorithms under the supervised and unsupervised learning environments.
Hence, future work can consider the validation of the developed machine learning models with actual datasets.