METRICS IN SMALL-SIZED QURAN DATASET FOR BENFORD’S LAW

Received 25 September 2021; Accepted 03 November 2021; Available online 22 November 2021

Benford's law is widely applied to test for anomalies in various datasets, including accounting fraud detection and population numbers. It is a statistical regularity that is said to work better with larger datasets that span several orders of magnitude and are distributed in a non-uniform way. In this study, we examine potential metrics in the small-sized Quran dataset to which Benford's law is applicable. Against our expectations, we find that the Quran dataset conforms to Benford's law. We provide evidence that metrics such as total paragraphs per chapter and total verses per chapter conform to Benford's distribution. However, total verses is closer to the Benford's law prediction than total paragraphs.


INTRODUCTION
The Quran is an Islamic religious text containing God's message delivered to the Prophet Muhammad S.A.W by the angel Gabriel, to be recited, comprehended, and practised as guidance or a way of life for humanity (Oktaviani et al., 2019). The sacred Quran comprises 114 surahs and approximately 6,236 verses incorporating 77,477 terms, and each verse and term is addressed by interpretation, parsing, and explanation (Hegazi et al., 2015). A good understanding of the Quran involves becoming familiar with its passages. Al-Khatib al-Iskafi asserted that only 28, or roughly 25%, of the Quran's 114 chapters do not include identical or repetitive passages (Oktaviani et al., 2019).
Benford's Law (BL), sometimes called the first-digit law, describes how leading digits are distributed in large datasets. The law holds that the frequencies of the first digits of numbers are not uniformly distributed across many naturally occurring datasets. Zipf's law is a similar empirical rule; BL may in fact be viewed as a particular instance of Zipf's law. Zipf verified that, given a corpus of a language's frequent terms, the occurrence of each term is inversely related to its rank in the ordering of term frequency (Melián et al., 2017).

Pioneer
The renowned Benford's Law (BL) was first postulated by the astronomer Simon Newcomb in 1881 and was established by Frank Benford in 1938 (Melita and Miraglia, 2021). Newcomb and subsequently Benford discovered that the incidence of significant digits across large datasets is not homogeneous but instead follows a logarithmic trend, with smaller digits appearing in the leading position more often than larger digits (Ausloos et al., 2014). In 1881, Simon Newcomb noticed that logarithmic tables favour smaller digits in the leading position. He detailed his findings in his publication "Note on the Frequency of Use of the Different Digits in Natural Numbers" (Druică et al., 2018). Frank Benford, a physicist, published "The Law of Anomalous Numbers" in 1938, having noticed, like Newcomb, the worn-out pages of logarithmic tables. Unaware of Newcomb's work, Benford expanded his study by collecting more than 20,000 observations from various sources to evaluate the probability of occurrence of every leading digit. Benford's Law was born from the findings of this research (Chang, 2017).

Definition and Concepts
Benford's Law is a phenomenological rule that describes the probability distribution of the first significant digits of a data collection (Shi et al., 2017). BL states that lower-order values such as 1, 2, and 3 are more common than higher-order values (Máté et al., 2017). The Newcomb-Benford Law, also known as "the first significant digit law" or "the law of anomalous numbers," formalises the observation that the leading significant digits of numeric values are unevenly distributed (Palacios, 2020). It states that, in a large data collection, the frequency of the first significant digit satisfies the following equation:
$$P(d) = \log_{10}\left(1 + \frac{1}{d}\right), \quad d = 1, 2, \ldots, 9$$
where $d$ is a given digit, the number's first non-zero digit, and $P(d)$ is the Benford probability of that digit occurring (Melita and Miraglia, 2021).
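As a minimal sketch of the first-digit probabilities described above, the following Python snippet evaluates the standard Benford formula for each digit from 1 to 9. The function name `benford_probability` and the dictionary `benford_percent` are illustrative names chosen here, not part of the original study, which used Microsoft Excel.

```python
import math


def benford_probability(d: int) -> float:
    """Benford probability that the first significant digit equals d (1..9)."""
    if not 1 <= d <= 9:
        raise ValueError("d must be a digit from 1 to 9")
    return math.log10(1 + 1 / d)


# Expected first-digit percentages under Benford's Law (illustrative name).
benford_percent = {d: 100 * benford_probability(d) for d in range(1, 10)}

for d, pct in benford_percent.items():
    print(f"digit {d}: {pct:.1f}%")
```

Under this formula, digit 1 is expected to lead about 30.1% of the time and digit 9 about 4.6% of the time, and the nine probabilities sum to one.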
Benford showed further that the "first digit law" applies to almost every set of values in a given dataset. Random datasets that have constraints, such as winning digits, contact information, or fuel costs, are examples of exceptions (Chang, 2017). The fundamental features of Benford's Law are scale and base invariance; scale invariance implies that BL remains valid when the units of measurement are changed. That is to say, the degree to which certain data fit Benford's Law is unaffected by the system of measurement (Badal-Valero et al., 2018).
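The scale-invariance property can be illustrated with a small Python sketch. It assumes synthetic data spaced evenly on a logarithmic scale over six orders of magnitude (not Quran data), and an arbitrary conversion factor of 2.54 (as in inches to centimetres). The helper names are hypothetical. The sketch reports the largest gap between the observed first-digit shares and the Benford probabilities, before and after rescaling.

```python
import math
from collections import Counter


def first_digit(x: float) -> int:
    """First significant digit of a positive number, read from scientific notation."""
    return int(f"{x:.15e}"[0])


def first_digit_shares(values):
    """Share of values whose first significant digit is d, for d = 1..9."""
    counts = Counter(first_digit(v) for v in values)
    n = len(values)
    return {d: counts.get(d, 0) / n for d in range(1, 10)}


def max_gap_from_benford(values):
    """Largest absolute difference between observed shares and Benford probabilities."""
    shares = first_digit_shares(values)
    return max(abs(shares[d] - math.log10(1 + 1 / d)) for d in range(1, 10))


# Synthetic data (assumption): evenly spaced on a log scale over six orders of magnitude.
N = 60_000
values = [10 ** (6 * k / N) for k in range(N)]

# Change of measurement unit (assumption): multiply every value by 2.54.
rescaled = [2.54 * v for v in values]

print("max gap, original units:", max_gap_from_benford(values))
print("max gap, rescaled units:", max_gap_from_benford(rescaled))
```

Both gaps should be very small for this synthetic data, which is the behaviour that scale invariance predicts. Real datasets that span fewer orders of magnitude may behave less cleanly.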

Application
Awareness of Benford's Law has increased across a variety of uses, such as fraud detection, system programming, and information extraction, with the evolution of digital technologies and the capacity to analyse enormous datasets (Chang, 2017). It was used to compare the first- and second-digit probabilities of the masses, orbital periods, semimajor axes, eccentricities, and radii of known exoplanets (Melita and Miraglia, 2021). In addition, BL testing was used to check whether the worldwide market prices of certain financial data gathered by the Financial Times Stock Exchange contained any inaccurate readings (Shi et al., 2017). Fraud detection is one of the disciplines that profited enormously from such discoveries, with BL being adopted as a foundation of audits (Druică et al., 2018). To detect academic fraud, co-authors and publishers should introduce an analysis program that uses Benford's Law to identify possible "warning sign" publications, thereby reducing the likelihood of fraud and improving the reputation of academic research papers (Horton et al., 2020). In the context of an actual Spanish legal case, BL and machine learning algorithms were used to identify patterns among tax evasion offenders (Badal-Valero et al., 2018). Furthermore, the adherence of social networking algorithms and Intelligence Analysis acts to Benford's Law was examined, and the findings revealed that bots obey BL, implying that this rule can aid in detecting harmful automated entities and associated behaviours on media platforms (Madahali and Hall, 2020).

Claims of Benford's law applied on large datasets
Benford's Law does not work well with tiny datasets; a sample of 200 or more observations for every order of magnitude is necessary (Melita and Miraglia, 2021). Furthermore, it may be used as a technique for identifying abnormalities in massive datasets, and it can be applied to subgroups of larger sets to narrow the range of potentially unusual records and make the approach easier to manage. The implications of tiny datasets for BL are almost entirely overlooked (Druică et al., 2018). If leading digits were evenly distributed, every digit from 1 to 9 would have an equal chance of appearing as the first digit in any practical random sample. That is not the case: the rule defies intuition yet applies readily to enormous amounts of data (D'Alessandro, 2020).

METHODOLOGY
The Quran contains 114 chapters (surahs), where each chapter is divided into verses (ayats). The chapters are not equal in length. For example, the shortest chapter, Surah Al-Kawthar, has only three verses, while the longest chapter, Al-Baqarah, contains 286 verses.
The Quran datasets in our study are per-chapter totals, categorised by:

• Total verse per Chapter
• Total paragraph per Chapter
This study applied Benford's law to the two Quran datasets mentioned above in Microsoft Excel; the results are shown in Tables 1 and 2 and Figures 1 and 2.
Figure 1: Frequency percentage of the first digit of total verse in the Quran and the frequency percentage of the first digit predicted by Benford's law.
The next table and chart show the results for the total paragraph dataset.
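The first-digit tabulation described in this section was carried out in Microsoft Excel. An equivalent calculation can be sketched in Python as follows. The function names are hypothetical, and the list `placeholder_counts` is made up for illustration only: it is not the study's dataset, and only the values 3 (Al-Kawthar) and 286 (Al-Baqarah) come from the text above.

```python
import math
from collections import Counter


def first_digit(n: int) -> int:
    """Leading digit of a positive integer, e.g. 286 -> 2."""
    if n <= 0:
        raise ValueError("counts must be positive integers")
    return int(str(n)[0])


def first_digit_percentages(counts):
    """Percentage of entries whose leading digit is d, for d = 1..9."""
    tally = Counter(first_digit(n) for n in counts)
    total = len(counts)
    return {d: 100 * tally.get(d, 0) / total for d in range(1, 10)}


def benford_percentages():
    """Percentages predicted by Benford's law for leading digits 1..9."""
    return {d: 100 * math.log10(1 + 1 / d) for d in range(1, 10)}


# Placeholder values for illustration only; NOT the study's dataset.
# Only 3 (Al-Kawthar) and 286 (Al-Baqarah) are taken from the text.
placeholder_counts = [3, 286, 12, 45, 110, 7, 19]

observed = first_digit_percentages(placeholder_counts)
expected = benford_percentages()
for d in range(1, 10):
    print(f"{d}: observed {observed[d]:5.1f}%  Benford {expected[d]:5.1f}%")
```

To reproduce the study's tables, the placeholder list would be replaced by the 114 per-chapter totals for verses or paragraphs.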

DISCUSSION
From Tables 1 and 2 and Figures 1 and 2, the total verse dataset is closer to the Benford's law prediction than the total paragraph dataset. We then quantified this closeness by calculating the sum of the squares of the differences between the frequency percentages of each dataset and the frequency percentages predicted by Benford's law.
Based on Table 1, we calculated the sum of the squares of the differences between the frequency of total verse and the Benford's law prediction, as shown in Table 3. Based on Table 2, we calculated the sum of the squares of the differences between the frequency of total paragraph and the Benford's law prediction, as shown in Table 4.
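The closeness measure used here can be sketched in Python as below. This is a sketch under assumptions: the differences are taken between percentages (rather than fractions), the function names are illustrative, and the `uniform` and `shifted` distributions are hypothetical inputs rather than the values in Tables 3 and 4.

```python
import math


def benford_percentages():
    """Percentages predicted by Benford's law for leading digits 1..9."""
    return {d: 100 * math.log10(1 + 1 / d) for d in range(1, 10)}


def sum_squared_differences(observed_pct, expected_pct):
    """Sum over digits 1..9 of (observed - expected)^2, in squared percentage points."""
    return sum((observed_pct[d] - expected_pct[d]) ** 2 for d in range(1, 10))


expected = benford_percentages()

# Hypothetical input 1: every digit equally likely to lead.
uniform = {d: 100 / 9 for d in range(1, 10)}

# Hypothetical input 2: Benford percentages with 3 points moved from digit 2 to digit 1.
shifted = dict(expected)
shifted[1] += 3
shifted[2] -= 3

print(sum_squared_differences(expected, expected))  # identical distributions
print(sum_squared_differences(shifted, expected))   # two digits off by 3 points each
print(sum_squared_differences(uniform, expected))   # uniform leading digits
```

A value of zero means the observed and predicted percentages coincide, and larger values indicate a poorer fit, so the dataset with the smaller sum is the one closer to Benford's law.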

CONCLUSION
This study shows that total verse and total paragraph in the Quran are suitable metrics for Benford's law. Furthermore, of the two metrics, total verse presents a better fit to Benford's law than total paragraph. This is because its sum of the squares of the differences is closer to zero, which means a smaller gap between the observed and predicted values.