Factors affecting the quality of financial statements from an audit point of view: A machine learning approach

Abstract This study examines the influence and importance of firm characteristics on the quality of financial statements of listed companies in Vietnam’s stock market from the audit point of view. We use regression models and machine learning algorithms to investigate data from 2225 observations of listed companies in the period 2014–2020. We find that business profitability, business size, and the size of the Board of Directors positively correlate with the quality of financial statements. In contrast, dividend policy, state ownership, and enterprise listing time have a negative relationship. Results show that the most critical factors affecting financial statement quality include profitability, profit after tax on total assets, state ownership, and enterprise size. This finding has practical implications for market participants and policymakers in improving financial reporting transparency and quality.


Introduction
Financial statement reporting is among the critical information standards that listed companies are required to prepare for legislative obligations (Diouf & Boiral, 2017), (Criado-Jiménez et al., 2008). Generally, a listed company acts as an information provider so that investors, who are considered the primary information recipients, can utilize the information provided for their decision-making processes. However, a gap remains between expectations and the reality of how financial reports are disclosed to meet users' needs. The quality of financial statements depends on the quality of the creation, presentation, and disclosure process of listed companies, which is influenced by many internal and external factors (Van Beest et al., 2009). It is essential to prepare and provide financial reporting information to a high standard because of its positive effect on the investment and financial decision-making processes of capital providers and other stakeholders, thus enhancing overall market efficiency (IASB, 2013).
Financial reporting quality refers to the quality of the information contained in financial reports, including note disclosures. High-quality reporting, which provides relevant and decision-useful information, objectively represents the economic reality of a company's activities during its reporting period. Investors and stakeholders benefit from high-quality financial statement reporting by having greater confidence, improving liquidity, reducing capital costs, and building fair market prices (Kothari, 2000). Financial statements that are not clear and effective will adversely impact users' interpretation of the current financial health of a company. This leads to auditors playing a critical role, ensuring that the accounting process correctly represents the company's financial status (Kueppers & Sullivan, 2010).
The extant literature highlights different strands of concepts and measures of the quality of financial statements. Most studies measure the quality of financial statements indirectly, such as earnings management, financial restatement, and financial statement fraud (Schipper & Vincent, 2003), (Cohen et al., 2004). In these studies, the quality of financial statements only considers conventional financial information perspectives, while financial statement reports cover a broader range of information, including nonfinancial information, such as analysis reports of managers/executives, auditing reports, and others. Both financial and nonfinancial information play a vital role in the decision-making process of users of financial statements, particularly external stakeholders (e.g., investors and analysts).
The literature also shows that financial statements provide valuable information about financial health, firm performance, cash flows, and other additional information to assist users in their decision making (Epstein & Jermakowicz, 2008); (Mackenzie et al., 2012). The usefulness of information highly depends on the quality of earnings information (Ball & Shivakumar, 2005). Information on profits and their components is important for stakeholders in measuring business performance and forecasting future cash flows (Shubita, 2021). Factors that might influence the quality of financial statement reporting include the quality of government apparatuses, human resource competencies, internal control systems, and information technology (Suwanda, 2015).
There is limited evidence on the quality of financial statement reporting from an audit point of view, with the main focus on the combination of financial statement fraud and audit reports to measure the quality of financial statements (Tang et al., 2016). Studies find that machine learning can significantly improve managerial accounting estimates (e.g., Ding et al., 2020). As such, machine learning tools have been used in earlier studies to predict the quality of accounting numbers (J. L. Perols et al., 2017). For example, (Bertomeu et al., 2021) used a machine learning algorithm to detect misstatements. The authors find that although accounting variables alone do not sufficiently detect misstatements, they become essential when they interact suitably with audit and market variables. To our knowledge, no previous studies on Vietnam have examined the quality of financial statements from an audit perspective using a machine learning algorithm. This study relies on the usefulness of accounting information to highlight the importance of financial statement reporting in assisting information users in making reasonable decisions (Gassen & Schwedler, 2010), (Hitz, 2007). Two key aspects are emphasized: users of financial statements and decision problems. Regarding the former, users of financial statements are primarily investors, managers, policymakers, and the public. The latter refers to the preparation of information that is in demand by users of financial statements. Although each group of users of financial statements (e.g., investors, policymakers, and business managers) has its own demand for information, thorough and correct information preparation will facilitate the decision-making process.
However, conflicts often occur between the government and the community under financial market imperfections, mainly when government officials formulate policies regarding their benefits and concerns instead of looking out for their community's best interests. One efficient way to mitigate this problem is to present financial reports transparently and accountably. As specified in the theory of decision usefulness of accounting information, information prepared for financial statement reporting must contain components that meet the needs of decision-makers and information users. Our study contributes to the literature in several respects, as we examine the quality of financial statements based on a new perspective that differs from previous studies. First, we form an audit opinion based on the misstatement of the audit process and the auditor's opinion of the financial statements. Second, we determine the importance of these factors based on machine learning algorithms. The research results comprehensively and multidimensionally consider the factors affecting the quality of financial statements.
We use data from listed companies in Vietnam's stock market from 2014 to 2020 with 2,225 observations. We collected data from pre- and post-audit reports of listed companies in nine industries: Real Estate and Construction, Technology, Industry, Retail and Services, Consumer Goods, Energy, Agriculture, Materials, and Health. Using regressions and a machine learning approach, we find that a firm's profitability, size, and board size positively relate to the quality of its financial statements. Conversely, dividend policy, state ownership, and listing year negatively affect information reporting. We also demonstrate that the gradient-boosting algorithm exhibits the most effective self-reporting performance. We show that the most critical factors for the quality of financial statements are profitability (measured by the ratio of earnings after tax to total assets), state ownership, and firm size.
The objective of domestic and foreign research is to study the quality of financial statements from an auditing point of view. Very few studies follow a mixed approach (accounting and auditing) to measure the quality of financial statements (Tang et al., 2016). Meanwhile, in Vietnam, no studies have measured the quality of financial statements. At the same time, previous studies did not use machine learning techniques to determine the importance of these factors on the quality of financial statements. Therefore, the implementation of this study has many theoretical and practical applications.
The remainder of this paper is organized as follows. Section 2 presents the literature review. Section 3 outlines the study's research methods and model. Section 4 presents the results and a discussion. Section 5 concludes.

Research on measuring the quality of financial statements
The literature demonstrates several views on measuring the quality of financial statements. From an accounting perspective, the quality of financial statements can be measured based on the quality characteristics and quality of profits. First, the quality of financial statements was evaluated based on scales built on the quality characteristics of the Financial Accounting Standards Board, including the essential characteristics of Relevance and Reliability. The two secondary characteristics are Consistency and Comparability (García Jara et al., 2011). In addition, the quality of financial statements can be measured through earnings such as accrual quality, earnings management, sustainability of profits, and predictability (Summers & Sweeney, 1998), (J. Perols, 2011).
Some studies predict audit reports with unqualified opinions (Pourheydari et al., 2012); (Saif et al., 2013); (Yaşar et al., 2015); (Fernández-Gámez et al., 2016); (Stanišić et al., 2019), (Sánchez-Serrano et al., 2020). business risk, type of audit firm, and the independence of the audit (Dechow et al., 2010). For example, Dechow et al. (2010) review and discuss the causes of variation and consequences of various measures used as indications of "earnings quality," including persistence, accruals, smoothness, timeliness, loss avoidance, investor responsiveness, and external indicators, such as restatements and SEC enforcement releases. No single conclusion is drawn as "quality" is contingent on the decision context. The authors highlight that the "quality" of earnings is subject to a firm's fundamental performance. (Qinghua et al., 2007) examine the relationship between audit committees, board characteristics, and financial statement quality in the Chinese stock market. The authors used the adjusted Jones model to measure the quality of listed companies' financial statements based on the level of earnings management. The study finds no significant impacts of variables capturing board behavior characteristics, including the ratio of shares owned by the board, yearly board meeting frequency, and the number of independent directors holding posts concurrently in the controlling shareholder's company, on the quality of financial reporting. In particular, board meeting frequency has an abnormally negative effect on the quality of financial reporting. (Anichebe, 2019) analyzed the relationship between financial statement fraud and corporate governance elements of agricultural listed companies in Nigeria. Data are collected from annual reports of agricultural firms listed on the Nigerian Stock Exchange during the financial years 2013-2017.
Applying longitudinal design and binary logit regression methods, the authors find that corporate governance variables lead to the probability of financial statement fraud of 53%. The results show statistically positive impacts of audit committees, board independence, board members' financial expertise, and firm size on the likelihood of financial statement fraud. (Alves, 2014) examined the influence of board independence on financial reporting quality in Portugal. The author also analyzes the relationship between other factors, such as financial leverage, net cash flow, investment opportunities, type of audit firm, and firm size. Using ordinary least squares and two-stage least squares techniques, the study finds that only the type of audit firm does not affect the quality of financial statements; the other factors do.
(Van Beest et al., 2009) examined the quality of financial reporting in terms of fundamental and enhancing qualitative characteristics, including relevance and faithful representation, understandability, comparability, verifiability, and timeliness. Data are sourced from 231 annual reports of listed companies in the US, UK, and the Netherlands for the period 2005-2007. The authors confirm the validity and reliability of the compound measurement tool by using Krippendorff's alpha and Cronbach's alpha approaches to assess the quality of financial reporting information. In addition, (Dachi, 2019) examined the determinants of financial statement quality using information technology as a moderating variable in 28 Regional Apparatus Organizations of South Nias. Data were collected from 105 questionnaire samples and processed using SEM. The authors found no impact of human resource competence on the quality of financial statements, while they indicated a significant positive relationship between the internal control system and the quality of financial statements. It is noted that information technology has no real impact on the link between human resource competencies and the quality of financial statements. However, this approach might weaken the effect of the internal control system on the quality of financial statements in the South Nias Regency. (Spathis et al., 2003) use client performance measures to identify pre-engagement factors associated with qualified audit reports in Greece by testing to what extent corporate performance measures can enhance the selection between a qualified or unqualified (clean) audit report. Data are sourced from the financial statements, auditors' opinions, and financial statement notes of Greek companies that received a qualified audit report and those that received an unqualified audit report.
A multi-criteria decision aid classification method (UTADIS-UTilités Additives Discriminates) is used to model the auditor's qualification, which is then compared with other statistical techniques, such as discriminant and logit analysis. The results show that audit firms are more likely to be exposed to the risk of losing a client if they issue a qualification. However, failing to qualify causes the auditor to face potential lawsuits and lose its reputation. The results show that financial ratios and nonfinancial information, such as client litigation, affect qualification decisions, and the developed models classify the total sample with an accuracy of 80 percent.
In a similar study in the Greek context, (Caramanis & Spathis, 2006) aim to analyse the impact of auditee and audit firm characteristics on audit qualification by testing whether combinations of financial and nonfinancial variables can be used to predict qualified and unqualified audit reports. Employing the data of 185 listed companies in the Athens stock market and applying OLS regression models, the authors find no effect of audit fees or the type of audit firm on the propensity for auditors to qualify their opinions. However, audit qualifications are statistically linked with financial metrics, such as the operating margin to total assets ratio and the current ratio. The developed models classified the total sample with an accuracy of 90%.
Machine learning methods in financial reporting quality research have also been used in some studies. (Bertomeu et al., 2021) emphasized the validity of the machine learning approach in detecting and interpreting patterns in ongoing accounting misstatements. The authors used a series of variables extracted from accounting, capital markets, governance, and auditing in the Audit Analytics Non-Reliance Restatement database to detect misstatements. The authors find that accounting variables do not contribute to detecting misstatements on their own; however, they play an essential role when they interact with audit and market variables. The authors also showed the differences between misstatements and irregularities through algorithm comparison and short-term predictions at risk of misstatements.
(Ding et al., 2020) analyze whether machine learning improves accounting estimates using data extracted from the US-based property and insurance companies' annual reports from 1996 to 2007. This study found that machine learning can substantially improve managerial estimates by applying MAE and RMSE metrics to evaluate model performance. The authors show surprising findings that loss estimates generated by machine learning are more accurate than managers' actual estimates of financial reports in four out of the five insurance lines examined. This study discusses how accounting estimates generated by machine learning have multiple uses in practice, particularly in enhancing the use of financial information by stakeholders.
(J. Perols, 2011) analyzed the differences in the performance of six popular statistical and machine learning models in detecting financial statement fraud. Under various assumptions of misclassification costs and ratios of fraud firms to non-fraud firms, the author finds a better performance of logistic regression and support vector machines, as opposed to an artificial neural network, bagging, C4.5, and stacking. Results also highlight the diversity in predictors used across the classification algorithms: only 6 of the 42 examined predictors are selected consistently by the different classification algorithms, namely auditor turnover, total discretionary accruals, Big 4 auditor, accounts receivable, meeting or beating analyst forecasts, and unexpected employee productivity. This study contributes to the literature on financial statement fraud and discusses implications for practitioners and regulators in improving fraud risk models. (Pourheydari et al., 2012) used data-mining methods with a focus on artificial neural networks to develop models for identifying qualified audit opinions. The four data mining classification techniques used in their study include the multi-layer perceptron neural network (MLP), probabilistic neural network (PNN), radial basis function network (RBF), and logistic regression (LR). Both qualitative and quantitative variables are explored, leading to the result that the probabilistic neural network (PNN) is the most balanced model for identifying the type of auditor's opinion. As opposed to others, this technique also has a large amount of error in identifying unqualified (clean) and qualified reports. The radial basis function network, as compared to the remaining techniques, appears to show the highest performance level in identifying qualified opinions, while logistic regression returns the poorest performance. Research implications are discussed with insights into internal and external auditors and the company's decision makers.
Using publicly available information, (Feroz et al., 2000) examine an artificial neural network (ANN) approach to predict SEC investigation targets due to its association with substantial losses in equity value. Using the adaptive learning processes to determine essential factors in predicting targets, the ANNs return results that classify the membership in target (investigated) versus control (non-investigated) firms with an average accuracy of 81%. The results show that the participants in financial reporting frauds have incentives to appear prosperous because of high profitability and that the ANN application is less likely to be affected by accounting manipulations. This study confirms that financial red flags, along with non-financial red flags, retain predictive value.
(Sánchez-Serrano et al., 2020) aimed to provide a new model for predicting the audit opinion of consolidated financial statements by analyzing the variables that affect the probability of obtaining a qualified opinion. The authors applied an artificial neural network technique, the multilayer perceptron (MLP), to a sample of Spanish companies. Results show that the developed method accurately predicted the audit opinions above 86%. Furthermore, the study emphasizes essential differences in the most significant variables representing audit opinion prediction for individual accounts. The variables referring to industry, group size, auditors, and board members were converted into the main explanatory parameters of the prediction when using consolidated financial statements.
Many studies have examined the factors affecting the quality of financial statements. The quality of financial statements has also been measured in different ways; each study often focuses on one or several groups of factors, such as the Board of Directors or company characteristics, and the results are heterogeneous. The overview also shows that legal regulations, the business environment, and stock market development remain incomplete in a developing economy, so studying the factors affecting the quality of financial statements in this setting is worthwhile.

Measuring the quality of financial statements
Based on DeFond and Zhang (2014), we measure the quality of financial statements using two criteria: (1) material misstatements, expressed as the difference in profits before and after the audit; and (2) the auditor's opinion as communicated in the audit report.
Both these factors are expressed through the auditor's audit opinion (Spathis et al., 2003), the approval of the audit report, and the irregularity/fraud situation in the financial statements to assess the quality of financial statement information. Based on these studies, if the audited financial statements are not fully accepted, the quality of the financial statements will be low, and the more significant the difference in profit before and after the audit, the lower the quality of the financial statements. We build the matrix in Table 1 to measure the quality of financial statements based on the audit outputs. Table 1 shows that the smaller the score assigned to the financial statements, the lower their quality. Quality is the lowest if the financial statements have a score of 1 and the highest if the score equals 5.

Assess the importance of factors to the quality of financial statements
To assess the importance of corporate characteristics on the quality of financial statements, we followed two steps:
Step 1: Build a regression model and evaluate the impact of the factors on the quality of financial statements.
Step 2: Machine learning algorithms are used to evaluate the importance of the factors in the quality of financial statements.
In addition to determining the influence of the factors on financial reporting quality (FRQ), this study also considers the importance of the factors to FRQ. The regression method alone is not sufficient for this goal, so we use machine learning (ML) algorithms to assess the importance of the factors to FRQ. When using ML, we take a comprehensive view by using three approaches: (i) Coefficients as Feature Importance, (ii) Tree-Based Feature Importance, and (iii) Permutation Feature Importance.

Linear models
Linear Regression: As the first linear model, linear regression with the ordinary least-squares method was implemented. The aim is to minimize the sum of squared differences between the true and estimated values by fitting the coefficients of the linear model (Pedregosa et al., 2011).
Ridge regression and Lasso regression are two regression models that apply regularization techniques to avoid overfitting. Overfitting is a phenomenon in which the model fits the training dataset well but does not predict well on the test data. This is often the case when training machine learning models. This phenomenon has a negative influence and makes the model inapplicable because the predictions are incorrect when applied in practice. There are several causes of overfitting. One common cause is that the training data and the forecast data have different distributions, so the rules learned from the training data are not valid for the prediction data. Alternatively, the cause can lie in the model itself: it has too many parameters, so what it learns from the training data does not generalize. Regularization avoids overfitting by adding a penalty component to the loss function. Usually, this component is the first- or second-order norm (L1 or L2) of the coefficients. When the second-order (L2) norm is used, the model is called ridge regression; when the first-order (L1) norm is used, it is called lasso regression.
For these regressions, we need to tune the coefficient α to find the best value for each dataset. When the model overfits severely, overfitting is reduced by strengthening the regularization term, that is, by increasing α. If the model does not overfit, α can be chosen close to 0. In the case α = 0, the regression equation is equivalent to multivariable linear regression. Elastic Net (Zou & Hastie, 2005) was used to add explanatory power. It extends the linear regression models by training with both Lasso's L1 and Ridge's L2 penalties. Combining the penalties of both methods in one model produces a regularized, sparse model in which only some of the weights are nonzero (Pedregosa et al., 2011).
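As a minimal sketch of how the penalty strength changes the fitted coefficients, the snippet below fits the four linear models with scikit-learn on synthetic data. The data, the α values, and `l1_ratio` are illustrative assumptions, not the settings or data used in this study.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression, Ridge
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in data (assumption): 200 observations, 6 standardized features.
# Only the first two features drive the hypothetical target.
X = StandardScaler().fit_transform(rng.normal(size=(200, 6)))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.5, size=200)

# alpha values are illustrative; in practice they would be tuned per dataset.
models = {
    "ols": LinearRegression(),
    "ridge": Ridge(alpha=10.0),                           # L2 penalty
    "lasso": Lasso(alpha=0.1),                            # L1 penalty
    "elastic_net": ElasticNet(alpha=0.1, l1_ratio=0.5),   # mix of L1 and L2
}
coefs = {name: model.fit(X, y).coef_ for name, model in models.items()}

for name, c in coefs.items():
    print(f"{name:12s}", np.round(c, 3))
```

Comparing the printed rows shows the shrinkage effect: the ridge and lasso coefficient vectors are smaller in norm than the unpenalized least-squares solution.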

Decision tree
Decision Tree: The decision tree is a classification model introduced by (Belson, 1959) and is widely used in many fields. With the development of machine learning methods, the decision tree was further developed with the C4.5 algorithm by (J. Ross Quinlan, 1996) and the ID3 algorithm by (J. Ross Quinlan, 1986). A Decision Tree is a structured classification tree that classifies objects based on a sequence of rules. Independent variables and attributes can be of different types, such as binary, nominal, ordinal, and quantitative data. Each variable's information weight (entropy) is calculated to determine which variable to use for classification first; the higher the information value, the more useful the variable is for classification.
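The entropy-based choice of the first splitting variable can be illustrated with a small worked example. This is a sketch on toy data; the variable names `state_owned` and `big_board` and the binary quality label are hypothetical and serve only to show the calculation.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of a 1-D array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def information_gain(feature, labels):
    """Reduction in entropy obtained by splitting on a categorical feature."""
    weighted = 0.0
    for value in np.unique(feature):
        mask = feature == value
        weighted += mask.mean() * entropy(labels[mask])
    return entropy(labels) - weighted

# Toy data (hypothetical): 8 firms with a binary quality label.
quality = np.array([1, 1, 1, 1, 0, 0, 0, 0])
state_owned = np.array([0, 0, 0, 0, 1, 1, 1, 1])  # separates the classes perfectly
big_board = np.array([0, 1, 0, 1, 0, 1, 0, 1])    # carries no information

gains = {
    "state_owned": information_gain(state_owned, quality),
    "big_board": information_gain(big_board, quality),
}
print(gains)
```

Here the first variable has an information gain of 1 bit and the second a gain of 0, so a tree built this way would split on the first variable.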

Random Forest: Random Forest is an attribute classification method developed by Leo Breiman at the University of California, Berkeley. Breiman is also a co-author of the Classification and Regression Trees (CART) method, rated as one of the top ten data mining methods. In a random forest, a significant improvement in classification accuracy results from growing a set of trees, each of which "votes" for the most popular class. Typically, random vectors are generated to govern the growth of each tree in the set. For the kth tree, a random vector Vk is generated, independent of the previously generated vectors V1, V2, . . ., Vk-1 but with the same distribution. A tree is grown based on the training set and Vk, resulting in a classifier h(x, Vk), where x is the input vector. After many trees are created, they "vote" for the most popular class.
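The voting scheme described above can be sketched by hand with scikit-learn decision trees. In this sketch the random vector Vk is taken to be a bootstrap sample plus a per-tree random seed; the synthetic data, the number of trees, and `max_features="sqrt"` are assumptions for illustration only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=8, n_informative=4,
                           random_state=0)
rng = np.random.default_rng(0)

trees = []
for k in range(25):  # odd number of trees, so a 0/1 vote cannot tie
    # V_k: a bootstrap sample of row indices plus the seed that controls the
    # random feature subset tried at each split.
    rows = rng.integers(0, len(X), size=len(X))
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=k)
    trees.append(tree.fit(X[rows], y[rows]))

# Each tree h(x, V_k) casts one vote; the forest returns the most popular class.
votes = np.stack([t.predict(X) for t in trees])        # shape (25, 300)
forest_pred = (votes.mean(axis=0) > 0.5).astype(int)   # majority of 0/1 votes
accuracy = (forest_pred == y).mean()
print(f"training accuracy of the voted forest: {accuracy:.3f}")
```

Note that scikit-learn's own `RandomForestClassifier` combines trees by averaging predicted class probabilities rather than by counting hard votes, so its output can differ slightly from this hand-rolled version.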

AdaBoost:
Boosting is a technique of sequentially combining machine learning algorithms on a population of sample spaces, then aggregating the different classification results to obtain an effective classifier. An efficient boosting algorithm is AdaBoost (Adaptive Boosting), which uses error weights assigned to each sample. The original algorithm allocates equal weights to each training sample. In each iteration, the algorithm (i) trains a weak classifier on the samples; (ii) checks whether the classification result for each training sample is correct; and (iii) recalculates the error weight distribution on the samples, increasing the error weight on misclassified samples and decreasing it on correctly classified samples. After completing the loop, the algorithm combines the member classifiers into a composite classifier.

Gradient Boosting:
Gradient Boost is an ensemble algorithm that uses boosting methods to develop an advanced prediction engine. In many ways, Gradient Boost is similar to AdaBoost, but with a few key differences. Unlike AdaBoost, which typically builds decision stumps (trees with a single split), Gradient Boost builds trees that typically have 8-32 leaves. Gradient Boost views the boosting problem as an optimization problem, where it uses a loss function and attempts to minimize the error. It is called gradient boosting because it is inspired by gradient descent. Each tree is used to predict the residuals of the samples (the difference between the observed and predicted values). Gradient Boost starts by building a tree to fit the data, and subsequent trees are built to reduce the residuals (errors). This is done by focusing on areas where the existing learners are underperforming, similar to AdaBoost.
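The residual-fitting idea behind Gradient Boost can be sketched in a few lines for the squared-error loss. The synthetic data, the learning rate, the number of rounds, and the constant starting model are assumptions made for illustration; this is not the configuration used in the study.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.2, size=300)  # synthetic target

learning_rate = 0.1                      # illustrative value
prediction = np.full_like(y, y.mean())   # start from a constant model
mse_history = [np.mean((y - prediction) ** 2)]

for _ in range(50):
    # For squared error, the residuals are the negative gradient of the loss
    # (up to a constant factor), hence the link to gradient descent.
    residuals = y - prediction
    tree = DecisionTreeRegressor(max_leaf_nodes=8, random_state=0)
    tree.fit(X, residuals)
    prediction += learning_rate * tree.predict(X)
    mse_history.append(np.mean((y - prediction) ** 2))

print(f"training MSE: {mse_history[0]:.4f} -> {mse_history[-1]:.4f}")
```

Each round adds a small tree that corrects part of the remaining error, so the training error shrinks from one round to the next.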

SVM and KNN models
The two remaining models are grouped together. Although they are not identical, they use the same method to evaluate the relationship between the explanatory factors and the target variable.

Support Vector Machine (SVM):
SVM is a binary classification algorithm: it takes the input data and classifies them into two classes. To do this, the SVM constructs a hyperplane or a set of hyperplanes in a multidimensional or infinite-dimensional space, which can be used for classification, regression, or other tasks. For the best classification, it is necessary to determine the optimal hyperplane located as far away from the data points of all classes as possible, because, in general, the larger the margin, the lower the generalization error of the algorithm.
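To make the notion of margin concrete, the sketch below fits a linear SVM on two synthetic, well-separated clusters and computes the margin width from the fitted weight vector. The data and the large `C` value (used to approximate a hard margin) are assumptions for illustration.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two well-separated synthetic clusters (hypothetical data).
X = np.vstack([rng.normal(-2, 0.5, size=(40, 2)),
               rng.normal(2, 0.5, size=(40, 2))])
y = np.array([0] * 40 + [1] * 40)

clf = SVC(kernel="linear", C=1000.0).fit(X, y)  # large C approximates a hard margin

w = clf.coef_[0]                                 # normal vector of the hyperplane
margin_width = 2.0 / np.linalg.norm(w)           # distance between the two margin lines
distances = np.abs(clf.decision_function(X)) / np.linalg.norm(w)

print(f"margin width: {margin_width:.3f}")
print(f"closest point to the hyperplane: {distances.min():.3f}")
print(f"number of support vectors: {len(clf.support_)}")
```

The closest training points sit at about half the margin width from the hyperplane; these are the support vectors that determine the separating boundary.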

K-Nearest Neighbors (kNN):
The K-Nearest Neighbors algorithm (K-NN) is commonly used in the field of Data Mining. K-NN is a method to classify objects based on the closest distance between the object to be classified (query point) and all objects in the training data. An object is classified based on its K nearest neighbors. K is a positive integer that is determined before executing the algorithm. The Euclidean distance is often used to calculate the distance between objects. This simple algorithm can also be used for regression problems (Altman, 1992). The nearest neighbor method predicts the output using the k closest training observations.
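A minimal from-scratch sketch of the classification rule described above, using the Euclidean distance and a majority vote among the K nearest training points. The toy coordinates and the function name `knn_predict` are hypothetical.

```python
from collections import Counter

import numpy as np

def knn_predict(X_train, y_train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    distances = np.sqrt(((X_train - query) ** 2).sum(axis=1))  # Euclidean distance
    nearest = np.argsort(distances)[:k]                        # indices of the k closest
    return Counter(y_train[nearest].tolist()).most_common(1)[0][0]

# Toy training data (hypothetical): two small clusters with labels 0 and 1.
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                    [4.0, 4.2], [4.1, 3.9], [3.8, 4.0]])
y_train = np.array([0, 0, 0, 1, 1, 1])

near_first = knn_predict(X_train, y_train, np.array([1.1, 1.0]))
near_second = knn_predict(X_train, y_train, np.array([3.9, 4.1]))
print(near_first, near_second)
```

For regression, the same neighbor search would be followed by averaging the neighbors' target values instead of voting.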

B. The method of assessing the importance of the factors
Assigning scores to the input features in a predictive model is called feature importance. Feature scores are an essential part of predictive modeling because they can be used to enhance the performance of the model and gain insight into the dataset and the model. The relative scores provided can be used to determine the features that are most relevant to the study. There are several types of feature scores in these techniques. Those that are simple to calculate are statistical correlation scores, such as Pearson's correlation and Spearman's rank for linear and nonlinear correlation, respectively.
Three types of more advanced feature importance scores are also implemented: scores from the model coefficients of the linear models, scores from the decision tree-based models, and permutation importance, as described in (Pedregosa et al., 2011). The three approaches are described below:
Coefficients as Feature Importance: After fitting a linear machine-learning model to the dataset, the coefficient of each input variable can be retrieved and stated as a feature importance score. This comparison is possible because the dataset is normalized, and the variables have the same scale. This approach was applied to the linear regression and elastic net models to retrieve feature importance scores.
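The coefficient-based approach can be sketched as follows. The feature names, the synthetic data, and the elastic net settings are hypothetical; the point is only that, after standardization, the absolute coefficients can be ranked directly.

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
names = ["roa", "size", "state_own", "board_size"]  # hypothetical feature names

# Raw features on very different scales (synthetic, for illustration).
X_raw = rng.normal(size=(300, 4)) * np.array([1.0, 50.0, 0.2, 3.0])
y = 3.0 * X_raw[:, 0] - 0.02 * X_raw[:, 1] + rng.normal(scale=0.5, size=300)

X = StandardScaler().fit_transform(X_raw)           # put all features on one scale
model = ElasticNet(alpha=0.05, l1_ratio=0.5).fit(X, y)

importance = np.abs(model.coef_)                    # |coefficient| as the score
ranking = [names[i] for i in np.argsort(importance)[::-1]]
print(dict(zip(names, importance.round(3))))
print("ranking:", ranking)
```

Without the standardization step, the raw coefficient of `size` would look negligible simply because that variable is measured on a much larger scale.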
Tree-Based Feature Importance: Decision tree algorithms, such as the CART algorithm used in this study through its scikit-learn implementation, provide feature importance scores based on the reduction in the criterion used to select the split points. This approach was adopted for the Decision Tree model and all tree-based ensemble methods, such as Random Forest, Gradient Boosting, and AdaBoost.
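The idea of scoring a feature by the reduction in the split criterion can be illustrated for a single root split. The sketch below assumes the Gini impurity as the criterion and uses made-up data; a full tree would accumulate such reductions over all of its nodes, weighted by node size, which is omitted here for brevity.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_gini_reduction(values, labels):
    """Largest weighted Gini decrease over all thresholds on one feature."""
    parent = gini(labels)
    n = len(labels)
    best = 0.0
    for threshold in sorted(set(values))[:-1]:
        left = [y for v, y in zip(values, labels) if v <= threshold]
        right = [y for v, y in zip(values, labels) if v > threshold]
        child = len(left) / n * gini(left) + len(right) / n * gini(right)
        best = max(best, parent - child)
    return best

# Hypothetical data: one feature separates the classes, the other does not.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
informative = [1, 2, 3, 4, 5, 6, 7, 8]
uninformative = [1, 2, 1, 2, 1, 2, 1, 2]

reductions = [
    best_gini_reduction(informative, labels),
    best_gini_reduction(uninformative, labels),
]
print(reductions)
```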
Permutation Feature Importance: This technique computes the relative importance score independently of the model used. After fitting a model to the dataset, the values of one feature are shuffled and predictions are made on the shuffled data; this is repeated five times for each feature in the dataset, and the results are averaged into an importance score for each feature. This technique is suitable for models that do not provide native feature importance scores, such as the k-nearest neighbors and SVM in this study. The importance score is defined as follows.
F: absolute value of the relative importance score of the feature generated by the model.
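The permutation procedure can be sketched without any library support. In the sketch below, the nearest-centroid classifier, the toy data, the accuracy metric, and the seed are illustrative assumptions; only the five repetitions per feature follow the description above. scikit-learn offers a comparable routine, `sklearn.inspection.permutation_importance`, which is not used here.

```python
import random

def fit_nearest_centroid(rows, labels):
    """Return a predict(row) function based on per-class feature means."""
    centroids = {}
    for cls in set(labels):
        members = [r for r, y in zip(rows, labels) if y == cls]
        centroids[cls] = [sum(col) / len(members) for col in zip(*members)]

    def predict(row):
        return min(
            centroids,
            key=lambda cls: sum((a - b) ** 2 for a, b in zip(row, centroids[cls])),
        )
    return predict

def accuracy(predict, rows, labels):
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(labels)

def permutation_importance(predict, rows, labels, n_repeats=5, seed=0):
    """Mean drop in accuracy when one feature column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            shuffled = [r[:j] + (v,) + r[j + 1:] for r, v in zip(rows, column)]
            drops.append(baseline - accuracy(predict, shuffled, labels))
        importances.append(sum(drops) / n_repeats)
    return importances

# Hypothetical data: feature 0 separates the classes; feature 1 has the same
# mean in both classes, so the fitted model effectively ignores it.
rows = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (4.0, 4.0), (5.0, 5.0),
        (11.0, 5.0), (12.0, 4.0), (13.0, 3.0), (14.0, 2.0), (15.0, 1.0)]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

model = fit_nearest_centroid(rows, labels)
importances = permutation_importance(model, rows, labels, n_repeats=5)
f_scores = [abs(v) for v in importances]  # F: absolute value of the score
print(f_scores)
```

For simplicity the sketch scores the model on the same rows it was fitted on; scoring on held-out data is the more common choice in practice.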

Data
Our study investigates the influence of the Board of Directors on the quality of the financial statements of companies listed on the Vietnam Stock Exchange in 2014-2020 with 2,225 observations. Data were collected from the financial statements before and after auditing from the Vietstock database. Table 3 presents the data by year and industry.

Financial reporting quality measurement results
As illustrated in Figure 1, companies whose profit after tax differed by less than 1% before and after the audit accounted for 59.7% of observations. Companies with a small difference in profit after tax, from 1% to 5%, accounted for 19.5%. Notably, companies with a difference in profit after tax from 50% to 100% and over 100% accounted for 2.8% and 2.3%, respectively.
Among the 2,225 observations, 1,420 show a difference in profit after tax before and after the audit, accounting for 63.8%. Table 4 shows that audit adjustments more often decrease profit than increase it, mostly by a small percentage. Table 5 shows that 1,853 observations received a qualified opinion (83.28%), 251 audit reports carried strong opinions (11.28%), and except-for opinions accounted for 5.3%. Rejected audit opinions accounted for 0.09%, and only one financial statement received a negative opinion, accounting for 0.04%.
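The reported shares can be reproduced directly from the counts given in the text and the sample size of 2,225. The sketch below is only an arithmetic check; the dictionary keys are descriptive labels chosen for illustration.

```python
# Counts quoted in the text; total sample size is 2,225 observations.
total = 2225
counts = {
    "profit differs before/after audit": 1420,
    "opinion group reported as 83.28%": 1853,
    "opinion group reported as 11.28%": 251,
    "negative opinion": 1,
}

# Share of each group in percent, rounded to two decimals.
shares = {name: round(100 * n / total, 2) for name, n in counts.items()}
for name, share in shares.items():
    print(f"{name}: {share}%")
```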
When measuring the quality of financial statements from an audit perspective, we rely on two aspects of audit results. First, we use the difference in profit after tax before and after auditing, as presented in Figure 1. Second, we use the audit opinion on the financial statements, as presented in Table 5. Based on these two aspects (measurements are presented in Table 1), Table 6 determines the quality of the financial statements of the enterprise, with level 5 indicating financial reports of good quality and level 1 indicating financial reports of poor quality.
Table 7 reports the descriptive statistics of the variables used in the baseline model. The FRQ variable has a mean value of 4.35, with a standard deviation of 1.13. The firm's profitability is 6.3% on average. The average rates of foreign ownership and state ownership are 13.45% and 67.15%, respectively, and the average number of members of the Board of Directors is 5.71. The proportion of independent non-executive members of the Board of Directors averages 68.2%, with a minimum of 20% and a maximum of 100%. On average, 21.92% of enterprises have a Chairman of the Board of Directors who is also the General Director, enterprises with members of the Board of Directors who are significant shareholders account for 9.8%, and the average listing time of enterprises is 9.5 years.
Figure 2 shows the correlation coefficients between the variables. In general, none of the pairwise correlations is high, which alleviates the concern of multicollinearity in our model.
[...] size of the board of directors (SIZEB), and concentration of ownership (BLOCK) have positive and statistically significant effects on the quality of financial statements.
Our findings are consistent with those of (Xie et al., 2003), (Alves, 2014), (Abed et al., 2012), and (Chalaki et al., 2012).

Regression model results and the importance of independent variables
We find that the dividend payout ratio (DIV), the state ownership rate (SOWN), and the listing time of enterprises have a negative and significant influence on the quality of financial statements. Our findings are similar to those of (Soliman & Ragab, 2014), (H. N. H. n.d.ang et al., 2019), (Hoang et al., 2019), (Hung et al., 2018), (Van Khanh & Hung, 2020), and (Chalaki et al., 2012).
Next, we use the Linear Regression, Lasso, Ridge, and ElasticNet algorithms to determine the coefficient of each financial indicator, as presented in Table 9. Based on the Root Mean Squared Error of the models, we then determine the fused index F of the regression coefficients for each financial indicator.
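The text does not spell out how the fused index is computed. Purely as an assumption for illustration, the sketch below normalizes each model's absolute coefficients to sum to one and averages them with weights proportional to 1/RMSE, so that better-fitting models count more. The coefficient and RMSE values are made up and are not the study's results.

```python
def normalize(scores):
    """Absolute scores rescaled to sum to one."""
    total = sum(abs(s) for s in scores)
    return [abs(s) / total for s in scores]

def fuse(model_scores, model_rmse):
    """Weighted average of normalized scores; weight ~ 1 / RMSE (assumed scheme)."""
    weights = [1.0 / r for r in model_rmse]
    weight_sum = sum(weights)
    normalized = [normalize(s) for s in model_scores]
    n_features = len(model_scores[0])
    return [
        sum(w * s[j] for w, s in zip(weights, normalized)) / weight_sum
        for j in range(n_features)
    ]

# Made-up coefficients for three features from two hypothetical models.
model_scores = [[0.6, -0.3, 0.1], [0.4, -0.4, 0.2]]
model_rmse = [1.0, 2.0]

fused = fuse(model_scores, model_rmse)
print([round(v, 3) for v in fused])
```

Other fusion rules, such as a plain average or rank aggregation, would be equally plausible readings of the description.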
As shown in Figure 3, the most crucial factor is return on assets (ROA), with a value of 0.281, followed by state ownership (SOWN), and the size of the board of directors (SIZEB) with values of 0.140 and 0.119, respectively. The three financial ratios with the lowest level of importance are financial leverage (LV), short-term liquidity (LIQ), and foreign ownership (FOWN), with values of 0.003, 0.005, and 0.008, respectively.
Next, we used four algorithms of the decision tree group: random forest, decision tree, AdaBoost, and gradient boosting. Table 10 shows the results for the financial statements based on the decision tree algorithms. Figure 4 shows the corresponding synthesized importance values of the factors affecting the quality of financial statements.
Among the 11 factors affecting the quality of financial statements, ROA has the highest importance score of 0.424, followed by state ownership (SOWN) and enterprise size by assets (SIZE). In contrast, the three factors with the lowest importance are the Chairman of the Board of Directors factor (0.022), the concentration of significant shareholder ownership (0.026), and the size of the board of directors (0.027). Table 11 lists the initial values provided by the Permutation Feature Importance method. The result is the average weighted score for each financial indicator, as determined in Column [3] F (Fused). The Permutation Feature Importance values are smaller in magnitude than the Feature Importance and Coefficients values; therefore, the scores are not directly comparable across methods.
Figure 5 and Figure 6 show that, under the Permutation Feature Importance index, the three most important factors affecting the quality of financial statements are profit after tax on total assets (0.012), state ownership (0.011), and enterprise size (0.009).
For a comprehensive assessment, we rank the results of the three methods in a bar chart to compare the importance of the 11 financial ratios. Each factor is shown under three groups, "Coefficients as Feature Importance," "Feature Importance," and "Permutation Feature Importance," so that the rankings of the financial ratios can be compared across approaches. We show that the three most important factors affecting the quality of financial statements are profit after tax on total assets (ROA), state ownership (SOWN), and the size of the enterprise (SIZE).

Conclusions and recommendations
Our study used a sample of 2,225 observations of listed companies in the Vietnamese stock market from 2014 to 2020 to examine the role and importance of factors in the quality of financial statements using machine learning methods. The results show that several factors, including corporate profitability, firm size, and size of the Board of Directors, have a positive relationship with the quality of financial statements. By contrast, dividend policy, state ownership, and enterprise listing time negatively affect the quality of financial statements. Our study shows that corporate profitability, state ownership, and firm size are the most critical factors affecting the quality of financial statements from an auditing point of view.
Figure 3. Coefficients as feature importance.
Based on the research results, the authors propose the following policy implications:
-Enterprises that want to attract investors' attention should provide more information related to financial statement quality measurement models and support investors and analysts with more complete information for making decisions. Although the information needed for the measurement models is already provided through financial statements, investors who want it must spend a lot of time synthesizing and processing it. In future annual reports, enterprises should therefore add the information that the measurement models of financial statement quality require. As a result, investors will be more interested and more confident when making investment decisions.
-Corporate profitability has a positive influence on the quality of financial statements; specifically, the higher the ratio of net profit to total assets, the higher the quality of financial statements. The results of this study on the influence of factors related to company efficiency show that companies listed on the Vietnamese stock market tend to have high efficiency, and the higher the business results, the higher the quality of financial statements; however, stakeholders need to be more careful when using information from the financial statements of listed companies before making economic decisions.
Figure 4. Feature importance.
Figure 6. Feature Importance Ranking (groups: Coefficients as Feature Importance, Feature Importance, Permutation Feature Importance).
Our study is limited to the use of basic machine learning algorithms such as random forest, AdaBoost, gradient boosting, and k-nearest neighbors. In addition, the number of indicators is limited to 11. This leaves room for future studies to review and use other algorithms such as neural networks (NN) and support vector machines (SVM). Future studies may also consider examining other attributes of financial statements and governance aspects to evaluate the quality of financial statements more effectively from an audit point of view.