Toward News Authenticity: Synthesizing Natural Language Processing and Human Expert Opinion to Evaluate News

The growing popularity of online news has prompted concerns regarding (i) socio-political influence over news dissemination, (ii) the waning freedom of news media, and (iii) a facile news evaluation process. A piece of news with the power to capture a large audience and sow the seeds of bizarre consequences on a national scale should be prudently evaluated before it reaches the masses. In pursuit of substantial profit, and sometimes because of unavoidable socio-political influence, news with biased headlines floods the mass media, resulting in ambiguity and mass manipulation. In this paper, we propose a blockchain-, smart contract-, and incremental machine learning-based news evaluation procedure for the Bengali language to overcome these challenges. Machine classification and human expert opinion are combined through a weighted synthesis on a decentralized platform to evaluate news. The Natural Language Processing (NLP) model is incrementally trained on continuously arriving data, and the best version of the model is used to detect fake news. In our experiments, the NLP model's training and testing accuracy improved from an initial 84.94% and 84.99% to 93.75% and 93.80% after nine rounds of incremental training. The protocols have been deployed and tested on the Ethereum test network, and the simulation demonstrates a successful implementation of the proposed system.


I. INTRODUCTION
Bengali is the language of more than 310 million people, mostly residing in India and Bangladesh [1]. Bengali, as a distinct language, has some similarities to Sanskrit. It is the national language of Bangladesh and the second most spoken language in India [2]. There are approximately 300 million native speakers and an additional 37 million second-language speakers. Bengali is the fifth most common native language and the seventh most widely used language overall in the world. It is also the world's fifth most widely spoken Indo-European language. A total of 98% of Bangladeshis speak Bengali as their first language, making it the country's official and national language. Bengali is the official language of the Indian states of West Bengal and Tripura and of the Barak Valley region of Assam. It is also the official second language of the Indian state of Jharkhand [3].
The associate editor coordinating the review of this manuscript and approving it for publication was Siddhartha Bhattacharyya.
The Bengali language and its script are complex and varied, and the language has a large lexicon [4], which makes research on Bengali news challenging. Fake news has become a significant problem in this region owing to the rise of social media as a primary news source for the public [5], [6]. Moreover, although both countries are democracies, the freedom of their mainstream media is in jeopardy. In the World Press Freedom Index published by Reporters Without Borders [7], India ranks 150th and Bangladesh 162nd among 180 countries, which indicates that their mainstream media face serious obstacles to publishing unbiased news. While the mainstream media is subject to such pressure [8], online news media, including social media sites, are even more prone to disseminating fake news with malicious motives. Because the impact of fake news can be severe, the news we read or receive must come from a genuine source, and the information it provides must be authentic. Unfortunately, the digitization of news has given ill-intentioned individuals easier ways to spread fake news and misinformation on the internet [6].
VOLUME 11, 2023. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Many news portals and sources use fake news to mislead people or to profit from false information. Factors such as political polarization, motivated reasoning, and social media algorithms contribute to the spread of fake news [9]. Even reputed media outlets, along with minor news sources on social platforms, contribute to a surge of false statements. In one such incident, a popular TV station with over 10 million YouTube subscribers claimed that the Saudi state had approved a draft to redesign its national flag and remove the Kalema (Islamic declaration of faith) [10]. In reality, no such thing occurred. Fake news may sometimes seem harmless, but in South Asian countries like India and Bangladesh, the study of Alkawaz et al. and the survey of Riaz et al. revealed that fake news can cause mob killings and religious extremism and can exert political influence on people [11], [12], [13]. Hameleers found that a defining characteristic of fake news is the presentation of fabricated statistical data or expert claims that exploit pre-existing beliefs [14]. The same study [14] highlights that a scarcity of expert fact-checking allows misinformation to grow into fake news.
To address political influence on news and the absence of a proper news evaluation process, we propose a decentralized Bengali fake news verification platform. It uses the privacy-preservation, immutability, and transparency of blockchain to gather human expert opinions on the authenticity of specific news and combines these opinions with the output of a machine learning classifier (NLP model). To integrate human opinion and machine decision, we propose a novel synthesis mechanism that assigns more weight to human expert opinion and slightly less weight to machine classification. This weighting reflects evidence that human cognitive decision-making can be superior to machine algorithms [15], [16]. Although human decision-making can be manipulated or influenced [17], blockchain technology combined with smart-contract protocols has proven effective for collaborative decision-making [18]. On the other hand, owing to the lack of a representative dataset [19] and its morphological complexity [20], Bengali is a difficult language in which to achieve high accuracy. To assess the veracity of news, this study therefore combines human reasoning capacity with a powerful NLP language model. The study rests on the following basic assumptions:
a) The news media lacks sufficient independence to cover the news in mainstream media.
b) Fake news is spread on social media for monetary and political benefit, and occasionally just to attract more ''views'' on a page.
c) Synthesizing human and machine (NLP) evaluation can provide a suitable and transparent technique for evaluating news.
d) Owing to a lack of reliable verification methods, social media users disseminate news stories without first validating their accuracy.
e) A blockchain-based system with privacy protections enables voting, uploading, and checking without fear of political repression.
Our proposed solution lets any user post news into the system for evaluation. The publisher node first screens the news for nonsensical or irrelevant content and then broadcasts it to the blockchain network. The evaluators vote on whether the news is authentic or fake, and at the same time the news is classified by an NLP model. Because it is powered by blockchain technology, the evaluation process is immutable and transparent. After the evaluation, the human opinion and machine decision are combined using the novel synthesis procedure introduced in this paper, giving participants a unified and reliable prediction about the legitimacy of the news. The NLP model is incrementally trained with new data to keep it up to date as the data evolve. The contributions of this study can be summarized as follows.
• Create an NLP model based on Bangla Bert models using Transfer Learning.
• Propose decentralized news verification (by human experts) using blockchain technology governed by a smart contract.
• Present a novel synthesis procedure combining human opinion and machine decision.
• Improve the performance of the NLP model by incrementally training it with new data.
The main objective of including human voting is to combine human analysis with the predictions produced by the NLP model. Here, the blockchain's consensus mechanism provides the same validation as it does for a regular cryptocurrency transaction: consensus techniques are used to verify transactions and keep the underlying blockchain secure [21]. A blockchain is a distributed, decentralized, and often public ledger used to record transactions. Transactions are grouped into blocks, and each block must be verified by the independent nodes of a peer-to-peer network before it is appended to the chain. Blockchain networks employ consensus mechanisms to ensure that all participants (also known as ''nodes'') agree on a single version of history. In our case, news items play the role of transactions: whenever new news is submitted to our system, the verifier nodes can cast their votes (human input) on it. The verifiers are news media representatives who keep up to date with new stories and their authenticity. Human verification is an additional step in the verification process, and because all of our verifiers are journalists and media representatives, their votes are later merged with the output of an NLP model that achieves high accuracy and precision.
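The block structure described above can be illustrated with a minimal hash-linked ledger. The Python sketch below is only an illustration of the data structure: each block stores a batch of records (news submissions here) and the SHA-256 hash of its predecessor, so altering any stored record breaks the chain. It deliberately omits networking, mining, and consensus, and the `Block` fields and `Ledger` class are assumptions for illustration, not the Ethereum block format.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class Block:
    """A simplified block: an index, a batch of records, and the previous block's hash."""
    index: int
    records: list
    prev_hash: str
    block_hash: str = field(init=False)

    def __post_init__(self) -> None:
        self.block_hash = self.compute_hash()

    def compute_hash(self) -> str:
        # Hash a canonical JSON encoding so identical content always yields the same hash.
        payload = json.dumps(
            {"index": self.index, "records": self.records, "prev_hash": self.prev_hash},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class Ledger:
    def __init__(self) -> None:
        # The genesis block has no predecessor, so its prev_hash is a placeholder of zeros.
        self.chain = [Block(0, [], "0" * 64)]

    def append(self, records: list) -> Block:
        block = Block(len(self.chain), records, self.chain[-1].block_hash)
        self.chain.append(block)
        return block

    def is_valid(self) -> bool:
        # Each block must still match its stored hash and point at its predecessor's hash.
        for prev, cur in zip(self.chain, self.chain[1:]):
            if cur.block_hash != cur.compute_hash() or cur.prev_hash != prev.block_hash:
                return False
        return True


ledger = Ledger()
ledger.append([{"news_id": 1, "headline": "example headline"}])
ledger.append([{"news_id": 2, "headline": "another headline"}])
print(ledger.is_valid())  # True
ledger.chain[1].records[0]["headline"] = "tampered"
print(ledger.is_valid())  # False
```

Tampering with block 1 changes its recomputed hash, which no longer matches the stored one; in a real network, other nodes would also reject the altered history through consensus.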
The rest of the paper is organized as follows. Section II reviews background research on our subject and outlines how expertise in the field has developed over time. Section III covers in detail the selection of the natural language processing model, dataset processing, and evaluation metrics. Section IV presents our system's architecture, along with a thorough examination of its components and associated actors. Section V describes the test environment. Section VI presents the system testing results through simulations of the system, experiments with the algorithms, and tests of the continuous learning of the NLP model. After discussing the system's implications in Section VII, we conclude the paper in Section VIII.

II. BACKGROUND TECHNOLOGIES
Significant progress has been made in research on the English language, but relatively few publications are available for low-resource languages like Bangla. For other such languages, too, the lack of datasets was the initial obstacle. Several Indonesian students demonstrated a hoax detection model for the Indonesian language using a dataset of 250 pages of real and fake news stories [22]. A real-time news certification mechanism on a Chinese microblogging website uses a database of 50K Chinese news articles [23]. Automatic rumor detection on social media has been an active area of recent research. These methods can be divided into two groups: propagation-based and classification-based [23]. Classification-based methods use supervised learning algorithms to determine the credibility of news: a wide range of lexical and semantic features can be extracted from microblog data, and labeled data can be used to train supervised classifiers.
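The classification-based approach can be illustrated with a minimal supervised classifier over lexical (bag-of-words) features. The sketch below uses multinomial Naive Bayes with add-one smoothing as a simple stand-in for the supervised learners cited in this section; the toy headlines and labels are invented for illustration and do not come from any of the cited datasets.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesText:
    """Multinomial Naive Bayes over whitespace tokens with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)  # label -> token frequencies
        for text, label in zip(texts, labels):
            self.token_counts[label].update(text.lower().split())
        self.vocab = {tok for counts in self.token_counts.values() for tok in counts}
        return self

    def predict(self, text):
        tokens = text.lower().split()
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus the smoothed log likelihood of each token under this class.
            score = math.log(n_docs / total_docs)
            score += sum(math.log((counts[tok] + 1) / denom) for tok in tokens)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy examples, for illustration only.
texts = [
    "shocking secret cure doctors hate",
    "you won't believe this shocking claim",
    "ministry publishes annual budget report",
    "court releases official verdict report",
]
labels = ["fake", "fake", "real", "real"]
clf = NaiveBayesText().fit(texts, labels)
print(clf.predict("shocking claim about secret cure"))
print(clf.predict("official budget report"))
```

Real systems would use far richer lexical and semantic features, but the structure (features extracted from text, a model fitted on labeled examples, a predicted credibility label) is the same.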
Additionally, a number of scholars have proposed cutting-edge techniques for identifying implicit patterns. Wu et al. [24] present a hybrid SVM classifier built on a graph kernel to identify high-order message propagation patterns. Propagation-based strategies, such as that of Gupta et al. [25], have been highly successful and have outperformed classification-based approaches.
Furthermore, fake news can sometimes result from constraints on the news media. Several researchers have pointed out that socio-political influence and regulation lead to fabricated news in the mainstream media [71]. At the same time, social media is used to spread fake news that manipulates public opinion [73]. Shannon McKeown [26] examines the connection between Donald Trump's Twitter insults and conventional propaganda techniques to evaluate the impact and reach of his social media presence. Political fake news also played a role in one of the most severe disasters humanity has experienced in recent years: the COVID-19 pandemic. Wallace Chipidza [27] examined the impact of toxicity on the formation of COVID-19 news networks in political subcommunities on Reddit. In that study, COVID-19 news sources were divided into five communities: mainstream, international, right-wing, scientific, and left-wing. Right-wing sources were associated with the most toxicity and scientific sources with the least, and sources with higher toxicity were more popular than those with lower toxicity. He argued that political polarization continues to be the ''greatest obstacle'' to successful COVID-19 mitigation measures in the United States. Social media has been blamed for this divide.

A. STATE OF NLP RESEARCH IN FAKE NEWS DETECTION
Natural language processing (NLP) has recently gained much attention for its ability to represent and interpret human language computationally [28]. It has been applied successfully to many tasks, including machine translation [29], spam mail detection [30], information extraction [31], and medicine, for example in COVID-19 research [32]. BERT (Bidirectional Encoder Representations from Transformers) and its successors have also proven beneficial for NLP [33]. BERT models can solve the elementary NLP tasks above and have been used for other applications such as text categorization [34] and semantic and aspect analysis [35].
Earlier systems detected false news through fact-checking by specialists, but this method was labor-intensive and time-consuming [36]. Since then, natural language processing researchers have developed automated solutions using machine learning and deep learning [37]. Researchers have shown that emotional language cues and emotional patterns differ between true and fake news [39]. Supervised machine learning methods such as Decision Trees, Random Forest (RF), and Support Vector Machines (SVM) have been used extensively to identify fake news [40]. Previous studies have also used neural network ensembles, which combine several neural network designs in a unified framework. To incorporate more contextual information into the final classification, Mikolov et al. [41] fed article representations from CNN and Bi-LSTM models into an NLP model. Advances in NLP such as word embeddings [41], sequence-to-sequence models [42], and the attention mechanism [43] led to today's BERT models, which are now applied to various text classification problems through transfer learning [44]. Aurpa et al. [45] investigated transformer-based deep learning models for automatically identifying threatening or offensive Bangla posts on Facebook and reported the results of their BERT and ELECTRA implementations. In another paper, Alam et al. [46] fine-tuned a multilingual transformer model for Bangla text tasks such as sentiment analysis, emotion detection, authorship attribution, and news categorization, improving accuracy over previous results by 5% to 29% across tasks.

B. BLOCKCHAIN AND SMART CONTRACTS IN NEWS INDUSTRY
In recent years, blockchain has become a popular technology because of its potential to address several network security issues [47]. The first block in a chain is called the genesis block; every subsequent block stores the cryptographic hash of the previous block along with its transaction data [48]. Experts around the world are finding novel uses for blockchain technology beyond online financial transactions [49], [50], [51]. Ochoa et al. [52] discussed a blockchain platform for verifying news sources. Their method uses a consensus algorithm based on data mining to recognize previously unrecognized connections between pieces of information. The system can spot false news, warn readers, and reward the truth finder while taking action against the publisher of the false story. Arquam et al. [53] presented a blockchain-based news tracking system that assigns credibility fairly, in two varieties: regional credibility and international credibility. In this system, when a user validates news, the publisher receives a validation request. Shae and Tsai [54] presented blockchain technology powered by artificial intelligence. Shahbazi and Byun [55] created an integrated blockchain and natural language processing (NLP) system that uses machine learning to detect fake news and more precisely identify fake user accounts and posts. Smart contracts are used in the news industry to prevent the dissemination of false information [56]. In this procedure, the publisher is authenticated before any news release: it is given a public key and verified before any information is shared. Babar et al. [57] leveraged the blockchain's security and immutability to track a news story's original source. The blockchain's immutability ensures that credit for breaking news can never be retracted.
They also created a tool to help news outlets deliver reliable content. Islam et al. [58] presented an anonymous method of exchanging news on the network that relies on blockchain technology. Their mechanism, NEWSTRADCOIN, disseminates information using the anonymizing qualities of the blockchain; it also resists tampering and preserves the immutability of its records.

C. INCREMENTAL MACHINE LEARNING
None of the methods discussed above supports incremental learning, that is, the ability to learn new concepts without retraining the classifier on the entire data collection [59]. A model's performance depends heavily on how its training data were sampled, and overfitting is possible if the distribution of the old data differs from that of the new data. For text categorization, the authors of [60] offer a generic incremental learning strategy that uses deep learning supplemented by a reinforcement learning module; applied to product reviews, their system achieves an F1-score of 0.80. Incremental learning removes the need to retrain on all data as new samples arrive, saving both time and space. For incremental learning to work, the algorithm must make accurate inferences from a steadily growing stream of input. An ideal incremental learning system would absorb new information without requiring access to the whole training set, which keeps time and space complexity manageable. Unfortunately, models trained incrementally tend to lose the knowledge learned in earlier training sessions, a problem known as catastrophic forgetting [61]. In particular, fine-tuning a model on new data typically causes a considerable decline in performance on old data. Our proposed incremental learning technique, INC+, differs from these past efforts: it lets the model retain some training data from earlier samples while learning new ones, and it retrains the portion of the network that incurs the least loss during incremental learning. This study combines natural language processing (NLP), INC+, and blockchain technology (smart contracts) to analyze Bangla news with high accuracy.
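The general idea of keeping part of the earlier training data while learning from new batches, often called rehearsal or replay, can be sketched as follows. This is a generic illustration and not INC+: it does not implement the selective retraining of low-loss network portions, and the `train_step` callback, buffer capacity, and replay ratio are hypothetical parameters chosen for the example.

```python
import random


class ReplayBuffer:
    """Fixed-size memory of past examples, maintained by reservoir sampling."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.items = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, example):
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(example)
        else:
            # Reservoir sampling: every example seen so far is retained
            # with equal probability capacity / seen.
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = example

    def sample(self, k):
        return self.rng.sample(self.items, min(k, len(self.items)))


def incremental_round(new_batch, buffer, train_step, replay_ratio=0.5):
    """Train on new data mixed with replayed old data, then remember the new data."""
    n_replay = int(len(new_batch) * replay_ratio)
    mixed = list(new_batch) + buffer.sample(n_replay)
    train_step(mixed)  # hypothetical callback that updates the model on `mixed`
    for example in new_batch:
        buffer.add(example)
    return len(mixed)


# Nine rounds of incremental training on synthetic (text, label) pairs.
buffer = ReplayBuffer(capacity=100)
seen_batches = []
for round_id in range(9):
    batch = [(f"news {round_id}-{i}", i % 2) for i in range(40)]
    incremental_round(batch, buffer, seen_batches.append)
```

Mixing replayed examples into each update is one common way to reduce catastrophic forgetting without storing or retraining on the full history.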

III. NLP MODEL SELECTION
This study comprises two major segments. In the first, three NLP models are extensively tested to select a suitable model for deployment in the blockchain-based environment. In the second, the system is developed with all the necessary implementation, including deployment of the ML model. While the system is in use, the NLP model is incrementally trained with new data so that it stays aligned with current real-world data trends. This section presents the results and discussion of the initial tests with the three NLP models to illustrate how the NLP model was selected.

A. NLP MODELS
Since this study works with Bangla fake news, we chose three NLP models that are widely used for Bangla text and were trained on large amounts of text, including Bangla. The three models are Bangla-Bert-Base, BanglaBERT, and Bert-base-multilingual-cased, all available on the Hugging Face model hub. These three pretrained BERT-based architectures were fine-tuned on our initial dataset. The three models are described below.

1) BANGLA-BERT-BASE
Bangla-Bert-Base [74] is a pretrained Bengali language model trained with the masked language modeling technique described for BERT in its GitHub repository. It has been applied to sentiment analysis, hate speech detection, and news topic classification. Training information: the currently available model (12 layers, 768 hidden units, 12 attention heads, 110M parameters) uses the bert-base-uncased architecture.
2) BANGLABERT
BanglaBERT [75] is an ELECTRA discriminator model pretrained with the Replaced Token Detection (RTD) objective. ELECTRA (Efficiently Learning an Encoder that Classifies Token Replacements Accurately) is a transformer with a pretraining method that jointly trains two transformer models, a generator and a discriminator. The generator, trained as a masked language model, replaces tokens in the input sequence, and the discriminator tries to determine which tokens were replaced by the generator. Because the model learns to detect replaced tokens rather than only reconstructing masked input, this pretraining task is known as replaced token detection [62].
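The replaced token detection objective can be illustrated by how the discriminator's training targets are formed. In the sketch below, a uniform random draw from a small hypothetical vocabulary stands in for ELECTRA's learned generator (in the real model, a small masked language model); the discriminator's target at each position is 1 if the token was replaced and 0 otherwise.

```python
import random


def corrupt_for_rtd(tokens, vocab, replace_prob=0.15, seed=0):
    """Return (corrupted tokens, labels) for replaced token detection.

    A uniform random draw from `vocab` stands in for ELECTRA's generator.
    """
    rng = random.Random(seed)
    corrupted = list(tokens)
    for i in range(len(tokens)):
        if rng.random() < replace_prob:
            corrupted[i] = rng.choice(vocab)
    # As in ELECTRA, a position counts as "replaced" only if the token actually changed,
    # so a generator that happens to reproduce the original token yields label 0.
    labels = [int(c != o) for c, o in zip(corrupted, tokens)]
    return corrupted, labels


tokens = "the council approved the new budget today".split()
corrupted, labels = corrupt_for_rtd(tokens, vocab=["the", "rain", "market", "river"], replace_prob=0.3)
print(list(zip(corrupted, labels)))
```

Because every position receives a label, the discriminator learns from all tokens in the sequence rather than only from the small masked subset.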
3) BERT-BASE-MULTILINGUAL-CASED
This model [76] was pretrained with a masked language modeling objective on Wikipedia text in 104 languages, those with the largest Wikipedias. Although the model is case-sensitive, casing has no relevance for Bangla, whose script has no letter case. Two objectives guided its pretraining. In masked language modeling (MLM), 15% of the words in a sentence are masked at random, the entire text is passed through the model, and the model predicts the masked words. In next sentence prediction (NSP), the model receives two masked sentences concatenated as input; sometimes they are adjacent in the original text and sometimes they are not, and the model must determine whether the second sentence follows the first. A useful property of this model is that its learned features can feed a conventional classifier: given a dataset of labeled sentences, for example, the features produced by the BERT model can be used as inputs to train a standard classifier.
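The masked language modeling objective can be sketched as an input-corruption step. Following the original BERT recipe, which the description above summarizes as masking 15% of the words, each of the 15% of positions selected for prediction is replaced by `[MASK]` 80% of the time, by a random token 10% of the time, and left unchanged 10% of the time. The toy vocabulary and the use of `None` for positions that are not predicted are assumptions of this sketch.

```python
import random


def mask_for_mlm(tokens, vocab, select_prob=0.15, seed=0):
    """Corrupt tokens BERT-style and return (inputs, targets).

    targets[i] holds the original token at selected positions and None elsewhere,
    so the loss is computed only where the model must recover the original word.
    """
    rng = random.Random(seed)
    inputs, targets = list(tokens), [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() >= select_prob:
            continue  # position not selected for prediction
        targets[i] = tok
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = "[MASK]"           # 80%: replace with the mask token
        elif roll < 0.9:
            inputs[i] = rng.choice(vocab)  # 10%: replace with a random token
        # remaining 10%: keep the original token unchanged
    return inputs, targets


tokens = "the council approved the new budget today".split()
inputs, targets = mask_for_mlm(tokens, vocab=["rain", "market", "river"], select_prob=0.3)
print(inputs)
print(targets)
```

Keeping some selected tokens unchanged or randomly replaced prevents the model from learning that only `[MASK]` positions matter, since `[MASK]` never appears during fine-tuning.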

B. DATASET DESCRIPTION
The dataset was taken from BanFakeNews [42], [63], which collected data from different sources to create a new Bangla news dataset. The data were drawn from twenty-two of Bangladesh's most widely read and reputable news portals. The dataset covers twelve news categories (shown in Table 1). Any news that provides inaccurate information or contains facts that could mislead readers is labeled ''fake''. Beyond the content itself being false, misleading and sensational headlines often create ambiguity about a news item's authenticity. One example is clickbait [64], which is designed to attract readers' interest and encourage clicks, ultimately boosting site traffic and revenue for its creators. Many regional or less popular websites behave this way. Satirical news was gathered from well-known websites that post parody news in Bangla. Because most of these websites publish the same stories, duplicates were removed after scraping. News with misleading or inaccurate context was gathered from www.jaachai.com and www.bdfactcheck.com, two websites that provide reasoned, educational explanations of false information previously posted elsewhere. To minimize duplication, the news items mentioned on these two websites were collected from their original publishing sites. After duplicate data and punctuation were removed, BasicTokenizer was used to clean the dataset further. The dataset of about eighty-four thousand instances was partitioned into several subsets, as shown in Figure 1. We set aside 60% of the examples for model selection, further divided into training and testing sets. The remaining 40% was reserved for testing continuous learning and was separated into nine experimental groups to assess and track the progress of continuous learning.
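The partitioning described above can be sketched as follows: shuffle once, reserve 60% for model selection (split 80:20 into training and testing), and divide the remaining 40% into nine groups for the incremental-learning rounds. The function name, seed, and rounding behavior are assumptions; the exact subset sizes in Figure 1 depend on how the original split handled remainders.

```python
import random


def partition(examples, select_frac=0.6, train_frac=0.8, n_rounds=9, seed=42):
    """Split examples into (train, test, incremental_groups)."""
    data = list(examples)
    random.Random(seed).shuffle(data)
    n_select = round(len(data) * select_frac)
    selection, held_out = data[:n_select], data[n_select:]
    n_train = round(len(selection) * train_frac)
    train, test = selection[:n_train], selection[n_train:]
    # Distribute the held-out data across n_rounds groups of near-equal size.
    groups = [held_out[i::n_rounds] for i in range(n_rounds)]
    return train, test, groups


train, test, groups = partition(range(1000))
print(len(train), len(test), [len(g) for g in groups])
```

Fixing the shuffle seed keeps the model-selection split and the incremental-learning groups reproducible across experiments.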

C. EVALUATION METRICS
Classification on imbalanced data is difficult because a classifier that always predicts the majority class can achieve high accuracy without learning anything useful. In addition to the confusion matrix, several other measures are therefore used to determine whether a classification system has genuinely learned [38], [65]. The metrics used in this study are defined in Equations 1-4.
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1}$$
$$\text{Precision} = \frac{TP}{TP + FP} \tag{2}$$
$$\text{Recall} = \frac{TP}{TP + FN} \tag{3}$$
$$\text{F1-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{4}$$
In the equations above, TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives, respectively, obtained from the confusion matrix of the true and false instances classified by the NLP model.
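Equations 1-4 translate directly into code. The sketch below computes the four metrics from confusion-matrix counts for the positive class; the counts in the usage lines are made up for illustration and are not results from this study. The second call shows why accuracy alone can mislead on imbalanced data: a classifier that labels everything as the majority class still scores 0.9 accuracy.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1-score from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up counts, not results from this study.
print(classification_metrics(tp=40, tn=45, fp=5, fn=10))
# A classifier that labels every item as the majority class:
print(classification_metrics(tp=0, tn=90, fp=0, fn=10))  # accuracy 0.9, recall 0.0, f1 0.0
```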

D. EXPERIMENT AND SELECTION OF NLP MODEL
Sixty percent of the dataset (49,977 instances) was used to train and test the three NLP models, divided into training and testing sets in an 80:20 ratio. Undersampling was applied to both the training and testing sets. At this point, only 60% of the dataset has been used; the remaining 40% is reserved to assess the incremental learning stage. Table 2 shows the metrics for the initial training and testing of the three models. The Bangla-Bert-Base model achieves slightly higher training and testing accuracy than the other two models. However, the Bert-base-multilingual-cased model achieves significantly higher precision, recall, and F1-score, while its training and testing accuracy are close to those of Bangla-Bert-Base. For this reason, we ultimately adopted the Bert-base-multilingual-cased model in our system. Figure 2 displays the Bangla-Bert-Base model's confusion matrix on our testing data (after undersampling).
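Random undersampling, as applied to both sets above, can be sketched as discarding randomly chosen examples from larger classes until every class matches the size of the smallest one. The text does not state which undersampling variant was used, so plain random undersampling with a fixed seed is an assumption here.

```python
import random
from collections import defaultdict


def random_undersample(examples, labels, seed=0):
    """Randomly drop examples so that every class keeps as many as the smallest class."""
    by_class = defaultdict(list)
    for x, y in zip(examples, labels):
        by_class[y].append(x)
    n_min = min(len(items) for items in by_class.values())
    rng = random.Random(seed)
    balanced = [(x, y) for y, items in by_class.items() for x in rng.sample(items, n_min)]
    rng.shuffle(balanced)  # avoid class-ordered output
    return [x for x, _ in balanced], [y for _, y in balanced]


# 90 authentic and 10 fake items reduce to 10 of each.
xs, ys = random_undersample(list(range(100)), ["real"] * 90 + ["fake"] * 10)
print(len(xs), ys.count("real"), ys.count("fake"))
```

Undersampling trades away some majority-class data in exchange for balanced classes, which makes metrics such as accuracy less dominated by the majority class.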

IV. SYSTEM DESIGN
The main objective of this work is to prevent the dissemination of fake news. We propose a blockchain-based decentralized news verification platform. A key issue in the news industry is the lack of authoritative control over social networking sites (SNS) and mainstream media (MSM) [66]: so-called news platforms pay little attention to verifying the authenticity of news. We address the fake news dissemination problem by combining blockchain-based decentralization with natural language processing techniques. The system has three primary actors: the publishers, the evaluators, and the news media. The news publisher initiates the system by performing the required actions. These include posting news to the system, activating the news detection process for a particular news item, authenticating the evaluators, and finalizing the detection process. The second category of actors, the evaluators, consists of experts in the news industry, journalists, and news bloggers. They contribute to the system by voting for or against a news item to determine whether it is authentic. Only evaluators verified by the publishers can cast votes in our system. The third type of actor is the news media: registered (government-licensed) news organizations, which also participate in the evaluation process by voting for or against any news posted to the system.

A. OVERVIEW OF PROPOSED ARCHITECTURE
The system's architecture comprises four distinct components: the actors, the NLP model, the DApp (decentralized application), and the Ethereum blockchain network. The actors are the contributing news agencies, verified evaluators, and publishers who carry out the different tasks executed in our system. The blockchain network is the core component of our system, and different nodes in the network represent different actors, each with a distinct role. When anyone submits a new news item to the system, the publishers are notified. After an initial screening, a publisher either rejects the news outright or submits it to the system for verification. Once the news is posted to the blockchain network, the evaluators and news media nodes cast their votes, and at the same time an NLP model determines whether the news is fake. The final decision on whether the news is fake or legitimate is made using a weighted combination of the human and machine decisions. The newly evaluated news is stored in the system and later used to retrain the model with fresh examples. The proposed system architecture is shown in Figure 3. Altogether, the system's operation can be described in the four phases below.

1) FIRST PHASE
Before the news evaluation process starts, the evaluators must be verified. Evaluators register with our system and provide their identification credentials and the field in which they work. They are first confirmed by the news media nodes and then, as an additional check, authenticated by the news publisher nodes, which thoroughly check each evaluator's credentials and profession. News publishers grant evaluator permission only to journalists and reputable news bloggers. The NLP model is also deployed into the system in this phase.
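The two-step authorization of evaluators can be modelled as a small state machine: a registered evaluator must first be confirmed by a news media node and only then authenticated by a publisher node before gaining voting rights. The sketch below is a hypothetical Python model of that logic for readability, not the authors' Solidity contract; the class, method, and status names are invented for illustration.

```python
from enum import Enum


class Status(Enum):
    REGISTERED = 1
    MEDIA_CONFIRMED = 2
    AUTHENTICATED = 3


class EvaluatorRegistry:
    """Hypothetical model of evaluator onboarding: register, media confirmation, publisher authentication."""

    def __init__(self, media_nodes, publisher_nodes):
        self.media_nodes = set(media_nodes)
        self.publisher_nodes = set(publisher_nodes)
        self.status = {}    # evaluator address -> Status
        self.profiles = {}  # evaluator address -> submitted credentials

    def register(self, evaluator, credentials):
        self.status[evaluator] = Status.REGISTERED
        self.profiles[evaluator] = credentials

    def confirm_by_media(self, media, evaluator):
        if media not in self.media_nodes:
            raise PermissionError("only news media nodes can confirm evaluators")
        if self.status.get(evaluator) is not Status.REGISTERED:
            raise ValueError("evaluator must be registered and not yet confirmed")
        self.status[evaluator] = Status.MEDIA_CONFIRMED

    def authenticate_by_publisher(self, publisher, evaluator):
        if publisher not in self.publisher_nodes:
            raise PermissionError("only publisher nodes can authenticate evaluators")
        if self.status.get(evaluator) is not Status.MEDIA_CONFIRMED:
            raise ValueError("evaluator must be confirmed by a news media node first")
        self.status[evaluator] = Status.AUTHENTICATED

    def can_vote(self, evaluator):
        return self.status.get(evaluator) is Status.AUTHENTICATED


registry = EvaluatorRegistry(media_nodes=["media1"], publisher_nodes=["pub1"])
registry.register("alice", {"profession": "journalist"})
registry.confirm_by_media("media1", "alice")
registry.authenticate_by_publisher("pub1", "alice")
print(registry.can_vote("alice"))  # True
```

Enforcing the order of the two checks in code mirrors how a smart contract would reject out-of-order or unauthorized calls.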

2) SECOND PHASE
In the second phase, whenever news is posted to the DApp by any user node, the news publisher broadcasts it to the blockchain network, beginning the news detection and evaluation process. The evaluators and news media nodes are notified and can cast a vote for or against the news based on their knowledge of it.

3) THIRD PHASE
In the third phase, the news remains live on the network for a set period, awaiting votes from the evaluator and news media nodes. Each evaluator or news media node can vote only once. When all the evaluators have finished verifying the news, the publisher ends the evaluation process and can then view the outcome of the verification procedure. This outcome is the human expert opinion on whether the news is authentic or fake, collected through a decentralized platform. It gives the system the total number of evaluators who judged the news authentic or fake, which the system uses to reach the final decision on that news item.
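The voting rules of the second and third phases (a poll opened by a publisher, one vote per eligible node, a poll closed by the publisher before tallying) can be sketched in Python as below. This is a hypothetical model of the contract logic for readability, not the deployed Solidity code; the names are invented, and eligibility is passed in as a plain set rather than read from an on-chain registry.

```python
class NewsPoll:
    """One poll per news item: opened and closed by a publisher, one vote per voter."""

    def __init__(self, publisher, eligible_voters):
        self.publisher = publisher
        self.eligible = set(eligible_voters)  # authenticated evaluators and news media nodes
        self.votes = {}                       # voter -> True (authentic) / False (fake)
        self.open = True

    def vote(self, voter, is_authentic):
        if not self.open:
            raise RuntimeError("voting has ended")
        if voter not in self.eligible:
            raise PermissionError("voter is not authorized")
        if voter in self.votes:
            raise ValueError("each voter may vote only once")
        self.votes[voter] = bool(is_authentic)

    def close(self, caller):
        if caller != self.publisher:
            raise PermissionError("only the publisher can end the poll")
        self.open = False

    def tally(self):
        authentic = sum(self.votes.values())
        return {"authentic": authentic, "fake": len(self.votes) - authentic}


poll = NewsPoll("pub1", ["eval1", "eval2", "media1"])
poll.vote("eval1", True)
poll.vote("eval2", False)
poll.vote("media1", True)
poll.close("pub1")
print(poll.tally())  # {'authentic': 2, 'fake': 1}
```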

4) FINAL PHASE
In the final phase, the votes (human expert opinion) collected on the blockchain network for a news item and the natural language processing classification are used to produce the outcome. After the decentralized opinion poll (voting) for a particular news item has concluded and the NLP model's prediction is ready, the publisher can start the synthesis process by calling a smart contract function. This function uses equation 5 to combine the two predictions and determine whether the news is authentic or fake.
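Equation 5 itself is not reproduced in this section. As a hedged illustration of the kind of weighted synthesis described in the Introduction, the sketch below combines the fraction of ''authentic'' votes with the NLP model's probability that the news is authentic, giving the human share the larger weight. The weights `w_human = 0.6` and `w_machine = 0.4`, the fallback when no votes are cast, and the 0.5 decision threshold are hypothetical choices for illustration, not the paper's parameters.

```python
def synthesize(authentic_votes, fake_votes, p_authentic_nlp,
               w_human=0.6, w_machine=0.4, threshold=0.5):
    """Weighted combination of human votes and NLP output (hypothetical weights).

    Returns (score, label); the score lies in [0, 1] and higher means more likely authentic.
    """
    total_votes = authentic_votes + fake_votes
    # With no votes cast, fall back to the machine prediction alone.
    human_share = authentic_votes / total_votes if total_votes else p_authentic_nlp
    score = (w_human * human_share + w_machine * p_authentic_nlp) / (w_human + w_machine)
    return score, ("authentic" if score >= threshold else "fake")


# Strong human agreement outweighs a skeptical model ...
print(synthesize(authentic_votes=8, fake_votes=2, p_authentic_nlp=0.3))
# ... and a confident model cannot override a clear human majority.
print(synthesize(authentic_votes=2, fake_votes=8, p_authentic_nlp=0.9))
```

Normalizing by the sum of the weights keeps the score in [0, 1] regardless of the exact weights chosen.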

V. SETTING UP THE TEST ENVIRONMENT
Our system uses blockchain technology as its backbone and natural language processing techniques as the machine detection mechanism. We used several technical components, such as APIs, packages, and libraries, to implement the system and its underlying functionality. Together, these components form the test environment in which we simulated the news evaluation process.

A. DEVELOPING THE BLOCKCHAIN NETWORK USING GANACHE AND METAMASK
As blockchain technology is the foundation of our system, we developed a test network using Ganache, MetaMask, and smart contracts written in the Solidity programming language. The components of the test environment are described below.

1) GANACHE
Ganache is a simulator for developing and running distributed applications on a local blockchain network. This development environment provides a personal Ethereum test network on which custom smart contracts can be deployed and tested, and its personalized workspace allows the blockchain and smart contract components to be customized. Figure 4 shows the test blockchain network running with ten accounts, each holding one hundred test ether. The user interface allows us to investigate the blockchain by inspecting blocks and viewing the blocks being mined and the transactions taking place on the blockchain.

2) METAMASK
We interact with the blockchain through MetaMask, a software cryptocurrency wallet. It allows blockchain participants to access their Ethereum wallets through a mobile app or a browser extension, which can then be used to connect to decentralized applications. By connecting the Ganache client to MetaMask, a participant can transact (operate) on the blockchain network. Figure 5 shows the creation of a MetaMask wallet with 100 test ether to spend.

B. DEVELOPING SMART CONTRACTS
A single smart contract, named NewsDetection, was designed to maintain the functions and actions within the blockchain network in an automated and trustworthy manner. It consists of properties, events, functions, modifiers, and other code constructs. Its functions are the control statements that regulate the interactions among the news publisher, evaluators, and media. The contract stores all news, information about the news authors and evaluators, and all evaluation findings. It also includes the functions required to combine expert human judgment with machine judgment into the final decision. Figure 6 depicts the properties, functions, events, and modifiers of our contract. To write and test the smart contract, we used Remix IDE, an open-source collection of modules for writing, testing, debugging, and deploying smart contracts to local blockchain networks. The 0.9.0 stable Solidity version was used during the development of our proposed architecture.

C. NLP MODEL UPDATE
One key component of our system is the NLP model, which provides the machine decision in news evaluation. Through a continuous machine learning process, we intend to use the latest and best-performing NLP model at any given time: whenever a new model outperforms the current one, it replaces it. To support this mechanism, the system refers to the NLP model through a pointer, and the variable pointing to the NLP model always points to the best available model. Figure 7 depicts the updating of our NLP model, also referred to as continuous machine learning. After the initial preprocessing, the new data is used to partially train the existing NLP model. Partial training [67] is a procedure for incrementally training a machine learning model on new data instances without forgetting or replacing the model's learning history. The goal of incremental training is to keep the model up to date by continuously training it on new data and thereby improving its performance. Although improvement is not guaranteed, in this work we compare the metrics of the newly trained model with those of the current best model on the test set and replace the model only if the new model performs better. As shown in figure 7, whenever any news media submits a new set of data to the system, the data is first saved to the data center in the cloud and then passed to the data preparation section of our system.
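The paper's NLP model is BERT-based, and reproducing its fine-tuning is out of scope here. To illustrate only the pattern described above (partial training, plus a pointer that moves to a candidate model only when it scores better on the test set), the sketch below swaps in a toy incremental naive Bayes classifier written in plain Python. The classifier, the use of accuracy as the single comparison metric, and all names are illustrative assumptions, not the authors' implementation.

```python
import copy
import math
from collections import Counter


class IncrementalNaiveBayes:
    """Toy binary text classifier (1 = authentic, 0 = fake) supporting partial training."""

    def __init__(self):
        self.class_counts = Counter()
        self.word_counts = {0: Counter(), 1: Counter()}
        self.vocab = set()

    def partial_fit(self, docs, labels):
        # Counts are only ever added to, so earlier batches are never forgotten.
        for doc, label in zip(docs, labels):
            self.class_counts[label] += 1
            for token in doc.lower().split():
                self.word_counts[label][token] += 1
                self.vocab.add(token)
        return self

    def predict(self, doc):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = 0, -math.inf
        for label in (0, 1):
            # Laplace-smoothed log prior and log likelihoods; the extra +1 in the
            # denominator reserves probability mass for unseen tokens.
            score = math.log((self.class_counts[label] + 1) / (total_docs + 2))
            n_tokens = sum(self.word_counts[label].values())
            for token in doc.lower().split():
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (n_tokens + vocab_size + 1))
            if score > best_score:  # ties keep the earlier label (0)
                best_label, best_score = label, score
        return best_label


def accuracy(model, test_set):
    return sum(model.predict(doc) == label for doc, label in test_set) / len(test_set)


def incremental_round(best_model, best_score, new_docs, new_labels, test_set):
    """One round of incremental training; returns the (possibly moved) best-model pointer."""
    candidate = copy.deepcopy(best_model)  # never mutate the deployed model in place
    candidate.partial_fit(new_docs, new_labels)
    score = accuracy(candidate, test_set)
    if score > best_score:
        return candidate, score        # repoint to the better model
    return best_model, best_score      # discard this round
```

Copying the deployed model before partial training keeps a rejected round from contaminating it, so the pointer can simply stay put when an increment does not help.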

VI. SYSTEM TESTING AND RESULT ANALYSIS
The preceding sections discuss in detail the selection process of the NLP model as well as the architecture and implementation specifics. In this section, we evaluate our system from the blockchain, smart contract, and natural language processing points of view, and demonstrate our smart contracts and their execution within the blockchain network.

A. ALGORITHMS
Our proposed system implements several algorithms to carry out news evaluation. The three most important are the NLP model update algorithm, the news evaluation score algorithm, and the final integrated outcome algorithm, which are demonstrated in this section.
The first algorithm supports incremental machine learning: it decides whether the currently deployed model should be updated. The metrics of the first model selected for deployment are marked as the best metrics; for example, P_b denotes the best precision. These values are stored as the global best metrics at any given time. Whenever the model is partially trained on new data, the new metrics are compared with the current global best metrics. Only if the newly trained model performs better than the currently deployed model, as stated by the conditionals in execution steps five and six of algorithm 1, do its metrics become the new global best and the deployed model is replaced with the more recent version. If the metrics are not satisfactory, the algorithm discards that iteration of incremental model training.
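Algorithm 1 as printed stops at the Compare_metrics call, so the comparison rule itself is not spelled out. The Python sketch below is one plausible reading: it computes binary precision, recall, and F1 (step 01) and treats the new model as better only if no metric gets worse and at least one strictly improves. That rule, the function names, and the choice of label 1 as the positive class are assumptions, not the paper's exact conditionals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metrics:
    precision: float
    recall: float
    f1: float


def compute_metrics(y_true, y_pred, positive=1):
    """Binary precision, recall and F1 for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return Metrics(precision, recall, f1)


def compare_metrics(current, best):
    """Assumed rule: no metric may get worse and at least one must strictly improve."""
    pairs = [(current.precision, best.precision),
             (current.recall, best.recall),
             (current.f1, best.f1)]
    return all(c >= b for c, b in pairs) and any(c > b for c, b in pairs)


def update_decision(current, best):
    """Return (replace_deployed_model, new_global_best)."""
    if compare_metrics(current, best):
        return True, current
    return False, best
```

A stricter or looser rule (for example, comparing F1 alone) would only change `compare_metrics`; the rest of the update logic stays the same.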
The second algorithm facilitates the news evaluation process within the blockchain network. The evaluation process is initiated from a news media node. Evaluators who are authenticated and verified and have not yet participated can submit their vote within the time limit T. As long as the elapsed time (T_elapsed) is less than the time limit (T), the voting thread stays live on the network; it ends automatically once T has elapsed. After voting closes, the algorithm computes the percentages of authentic and fake votes cast on that news.
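A minimal Python simulation of the voting window in Algorithm 2 follows. It assumes that N_evaluator counts the evaluators who actually voted, that the current time is passed in explicitly as `now` (in the deployed contract it would come from the blockchain), and that violations are reported as Python exceptions where a Solidity contract would revert. All names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class VotingThread:
    """In-memory simulation of one news item's voting window (not the Solidity contract)."""
    opened_at: float              # time at which the news was broadcast
    time_limit: float             # T
    authorized: frozenset         # IDs of authenticated and verified voters
    votes: dict = field(default_factory=dict)  # voter ID -> True (authentic) / False (fake)

    def is_open(self, now):
        # Voting stays live while T_elapsed < T.
        return now - self.opened_at < self.time_limit

    def cast_vote(self, voter, authentic, now):
        if voter not in self.authorized:
            raise PermissionError(f"{voter} is not an authenticated evaluator")
        if voter in self.votes:
            raise ValueError(f"{voter} has already voted")
        if not self.is_open(now):
            raise TimeoutError("the voting window has closed")
        self.votes[voter] = authentic

    def tally(self, now):
        """Return (authentic percent, fake percent) after the window closes."""
        if self.is_open(now):
            raise RuntimeError("voting is still open")
        n_votes = len(self.votes)  # assumed meaning of N_evaluator
        if n_votes == 0:
            return 0.0, 0.0
        n_authentic = sum(self.votes.values())
        n_fake = n_votes - n_authentic
        return n_authentic / n_votes * 100, n_fake / n_votes * 100
```

For example, three authentic votes and one fake vote cast before T expires yield 75% authentic and 25% fake once the window has closed.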
One of this study's significant contributions is combining the human expert evaluation of news with machine detection to determine whether the news is fake. The comparative cognitive ability of human and machine intelligence is perplexing [68]. However, machine intelligence still has limitations that are subject to further exploration [16]. Consequently, when combining human and machine decisions, we introduced α as the human prediction bias (initially set to 0.6) and β as the machine prediction bias (initially set to 0.4).

VOLUME 11, 2023

Algorithm 1 Model Weight Updating
Result: Model weight update decision.
01 Calculate the current model metrics (precision, recall, F1-score).
02 Assign the current values to variables: precision → P_current, recall → R_current, F1-score → F1_current.
03 Import the previous best metrics (P_b, R_b, and F1_b).
04 Compare_metrics([current metrics], [best metrics])

Algorithm 2 News Evaluation Process
Result: Output of the news evaluation process.
01 A new news item is posted to the system.
02 The news publisher broadcasts it to the network.
where authentic percent = (N_authentic / N_evaluator) × 100 and fake percent = (N_fake / N_evaluator) × 100
011 return authentic percent and fake percent
These two variables control the 'importance' of the decisions made by human experts and by the machine. In our current setup, we put more weight on the human decision and slightly less weight on the machine decision. The final evaluation score (FE) follows equation 5, stated below.

$$FE = \alpha \left( \frac{\text{Authentic percentage}}{100} \right) + \beta \, (\text{NLP Prediction}) \qquad (5)$$

The overall process of combining human and machine decisions is shown in algorithm 3.
The final evaluation procedure was chosen empirically, in line with our objective to ''combine while putting slightly more weight on the human decision.'' Table 3 shows cases in which varying human votes and machine decisions are combined into an intelligible final evaluation using equation 5. When both the NLP prediction and the human opinion are at their maximum, the news is evaluated as authentic; if 100% of the human votes support the authenticity of the news, the news is finally evaluated as authentic even if the machine prediction is 0%. Conversely, if all evaluators identify the news as fake while the machine predicts it as authentic, the news is finally evaluated as fake. When both are 50%, the news is assessed as authentic. For a higher contrast, where the human decision is 80% authentic and the machine decision is 20% authentic, the human decision contributes more toward the final evaluation score, and vice versa. The same pattern holds when the human and machine decisions are close: the human decision is weighted more in determining the final status of the news.
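Because Solidity has no native floating-point arithmetic, a contract implementing equation 5 would plausibly work with scaled integers. The Python sketch below does so, with α and β expressed as percentages (60 and 40), the NLP prediction expressed as a percentage, and FE scaled by 10,000. The decision threshold FE ≥ 0.5 (ties counted as authentic) is our assumption; it is not stated explicitly in this section, but it reproduces every case described above for Table 3.

```python
from typing import Tuple

ALPHA_PCT = 60          # α = 0.6: weight of the human expert vote
BETA_PCT = 40           # β = 0.4: weight of the NLP prediction
SCALE = 100 * 100       # FE is represented multiplied by 10,000
THRESHOLD = SCALE // 2  # assumed boundary: FE >= 0.5 means authentic


def final_evaluation(authentic_pct: int, nlp_pct: int,
                     alpha_pct: int = ALPHA_PCT,
                     beta_pct: int = BETA_PCT) -> Tuple[int, str]:
    """Equation (5) in integer arithmetic.

    authentic_pct: percentage of evaluator votes marked authentic (0-100)
    nlp_pct: NLP prediction that the news is authentic, as a percentage (0-100)
    Returns (FE * 10_000, status).
    """
    if not (0 <= authentic_pct <= 100 and 0 <= nlp_pct <= 100):
        raise ValueError("percentages must lie in [0, 100]")
    if alpha_pct + beta_pct != 100:
        raise ValueError("this sketch assumes alpha + beta = 1")
    # alpha * (authentic_pct / 100) + beta * (nlp_pct / 100), all scaled by 10_000
    fe_scaled = alpha_pct * authentic_pct + beta_pct * nlp_pct
    status = "authentic" if fe_scaled >= THRESHOLD else "fake"
    return fe_scaled, status
```

With these weights, a unanimous human verdict always decides the outcome: 100% authentic votes against an NLP prediction of 0 gives FE = 0.6 (authentic), while 0% authentic votes against an NLP prediction of 1.0 gives FE = 0.4 (fake).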

B. SYSTEM SIMULATION
The system architecture implements an interface between the smart contracts, the Ethereum blockchain, and natural language processing. Before the system was deployed, the NLP model was chosen based on the experiments described in section III. After deployment, any connected node or user can post news to the system, and the publisher is notified. After an initial check (whether the submission is a random sentence or legitimate news), the publisher can broadcast the new news to the blockchain network. The dashboard for the news publisher is shown in figure 8.
As shown in figure 8, the news publisher can add new news to the system and authenticate evaluators. The publisher initiates the evaluation process, in which newly submitted news is broadcast to the blockchain. At the same time, the news is passed to the NLP model for classification. Figure 9 shows the publisher dashboard after successful verification on the blockchain network. The main objective of our system is to combine human opinion and machine decisions on a given news item. After the news verification process terminates and the NLP model has classified the news, the publisher node can initiate the synthesis of the machine and human predictions. Figure 10 shows the publisher window after the synthesis process.
The system executes many functions and methods in the background while evaluating news. These functionalities are governed by the smart contract, which runs within the blockchain network. The attributes and functions of the smart contract are shown in table 4.

C. INCREMENTAL LEARNING OF NLP
The initial NLP model was selected by training and experimenting on a batch of data. Over time, however, new and unprecedented ways of disseminating fake news make their way into the news industry [65], [69], and the model has never seen them. To stay up to date and consistent with the current representation of the data, the model needs to learn from new data instances. Retraining in batch mode for every new dataset is complex and computationally costly, and the model also forgets what it learned previously. Incremental learning is a technique that uses partial training to train the model on new data instances while keeping its prior knowledge intact [70].
In our system, connected actors such as news media and news publishers represent the news industry and own the most recent data. They can contribute these data to improve the NLP model's performance incrementally. To demonstrate this incremental learning scenario, we created nine experimental sets from the remaining 40% of the dataset. Each experimental set contains 1550 data instances, which are split into training and test sets. The initial model has 88.7% training and 89.3% testing accuracy and was then trained for nine rounds of incremental training. Table 5 shows the metrics during incremental training, and figure 11 shows the trend in metric change. The results of incremental training in table 5 show that, from initial training and testing accuracies of 84.94% and 84.99%, respectively, only increments 2, 4, and 9 yielded improvement. The greatest improvement was seen for increment 9, which yielded 93.75% training and 93.80% testing accuracy.

D. SYSTEM ANALYSIS
This section contains an extensive analysis of the proposed architecture. The system combines blockchain, smart contract, and NLP techniques; these technologies bring a few disadvantages along with significantly more advantages. The detailed analysis follows.

1) IMPLICATIONS AND ADVANTAGES
By introducing a novel news evaluation procedure, the integration of blockchain and NLP-powered news evaluation seeks to eliminate socio-political power-exertion practices in South Asian nations such as Bangladesh and India. This will help restore the freedom of the news media, which has been severely restricted in this region. With the advancement of electronic devices and the increasing availability of internet service, the ''previously known'' legacy news media (e.g., press, television, and radio) have shifted toward contemporary social media (e.g., blogs, YouTube, and social networking sites) [70]. This shift has also enabled the easy dissemination of fake news, with social media used to deceive large audiences by boosting fake news or misinformation [68], [76]. The lack of governance of these growing alternative news media and the absence of a proper news evaluation system indicate that a new evaluation system is indispensable [10], [18].
Our proposed architecture takes an unconventional approach to news evaluation. We decentralize the collection of human opinion about a given news item using blockchain technology. The primary drivers behind this are privacy protection and objective opinion gathering from a dispersed network of skilled news and media workers. In parallel with human opinion, we use a classification algorithm based on natural language processing. Finally, we proposed a novel synthesis procedure that combines these two predictions and outputs a final status for the news, either fake or authentic. Blockchain's transparency and immutability, together with the rigid governance of smart-contract technology, are the key aspects that make our platform trustworthy to participants by removing the trust barrier [72]. Sociopolitical coercion over the legacy news media is one of the prime reasons news is manipulated and fabricated, and it is also why contemporary news media are used for fake news dissemination [70], [73]. Our method protects everyone's privacy so that everyone can contribute without hesitation. To better demonstrate the contribution of this study, table 6 compares our NLP model output with existing research, and table 7 compares our system with previous research employing blockchain, smart contracts, and natural language processing.
Although some studies in table 6 reported higher accuracies than this study, their datasets, ML algorithms, and research objectives differ vastly. On the dataset used in this study, our proposed model shows performance competitive with the state of the art. Since presenting a highly accurate NLP model is not the objective of this study, the metrics achieved by our model are satisfactory for its purpose. Comparing the proposed architecture with existing research works yields the following findings.
• None of these other works have evaluated news from the perspective of both human experts and machine prediction. Our architecture uniquely combines both of these decisions when evaluating a news item.
• The proposed architecture provides a rigid and trustworthy platform for the news evaluation process to encourage greater adoption of Blockchain and NLP.
• As the trends in fake news might change in the future, our architecture adopts an incremental learning method for the NLP model so that the model does not become ineffective.
• Work on Bangla fake news is scant, even though a relatively large population speaks and writes the language. Our work is entirely dedicated to the evaluation of Bangla news. Compared with other studies, we proposed a unique combination of decentralization and natural language processing for Bengali fake news detection.
The implications of this research can be summarized as follows.
• News agencies can use our architecture to verify a piece of news before they publish it.
• Before using any ''viral'' social media information, our architecture can be used for initial verification.
• If news media personnel are under any threat, they can contribute their news-related knowledge through our system without hesitation.
• Freedom of opinion and freedom of the press are restored, as decentralization ensures the utmost privacy preservation.
• People's trust in news media will increase as they know that news is verified by experts from among the people themselves in an unbiased and privacy-preserving manner.
These implications suggest that the system is a potential solution to the fake news problem while also ensuring freedom of the press. The design offers a plausible approach to news verification free of socio-political influence or bias.

2) CHALLENGES
Although our system uses regulatory smart contracts and an immutable, verifiable ledger (blockchain) to evaluate Bangla news securely and in a tamper-resistant way, some flaws still require further inspection. Blockchain technology is still in its early phases and has seen limited use in practical applications. Using a blockchain network is a trade-off between reliability and computational cost, because execution and mining are costly and time-consuming. Since it is practically impossible to edit or amend data on the blockchain, errors cannot be corrected once something has been recorded in the system. On the one hand, the evaluation procedure is unchangeable and trustworthy; on the other hand, a false negative cannot be amended to its proper form. Additionally, data-driven machine learning is always susceptible to intentionally corrupted data, which can degrade the model's performance or even completely fool it.

VII. CONCLUSION
In southern Asia, Bangla is the spoken, written, and official language of a large population. Studies show that fake news is a social and political contrivance used to alter public opinion in order to conceal socio-political slanders or divert attention from issues. Sometimes the press and media are under intimidation, and this fear leads to fabricated news with distorted information. The recent rise of social media, blogging sites, and other internet-based communication media has worsened the situation, as these media are outside the control of any news evaluation authority. The news media have also joined social media, which has made the problem much worse as more individuals (news consumers) actively use social media. News that goes ''viral'' on social media spreads quickly and reaches millions of people before it can even be assessed. Even if this seems harmless now, unless this uncontrolled and unregulated form of news broadcasting is placed under stringent supervision and regulation, terrible consequences may follow.
Integrating blockchain and natural language processing can be a productive solution for evaluating Bangla news before it reaches the masses. Our architecture uses the potential of decentralization and machine classification and combines both to generate a unified decision on whether the news is fake or authentic. In our system, human experts evaluate news as fake or authentic, and an NLP model also classifies the same news. We proposed a novel synthesis formula that combines human expert opinion and the NLP classification output into a status (authentic or fake). Using both together makes the evaluation process more robust and accurate. The synthesis formula puts slightly more weight on human opinion because of the cognitive superiority of human experts. The whole process is governed by a smart contract, which makes it rigid and authoritative. Once deployed on the blockchain, the contract is immutable, and its execution is also safeguarded within the blockchain network.
The NLP model was chosen by experimenting with three BERT-based models. The Bert-base-multilingual-cased model was chosen among the three competing models, with an initial training accuracy of 87.35% and testing accuracy of 88.63%. The Bangla-Bert-base model scored nominally better training and testing accuracy, but the precision, recall, and F1-score were marginally higher for the Bert-base-multilingual-cased model. Our system employs incremental machine learning for the NLP model so that the model remains effective even if the nature of fake news propagation changes. Our system testing demonstrates a workable prototype exhibiting the major functionalities in action. The results of incremental training show that the model's performance can be improved by partial training: the initial accuracies were improved to 93.75% training and 93.80% testing accuracy. Our proposed news evaluation process using blockchain, smart contracts, and NLP shows great promise for the pre-evaluation of news. With combined human and machine opinion, our system evaluates news without bias or intervention from socio-political influences. The solution is generic enough to be applied to analogous problems. However, the problem of corrupted data misguiding the model remains open to investigation by future researchers.