Neutrosophic Logic-Based Document Summarization

Nowadays, a vast quantity of information is offered on the Internet, which makes it hard for users to find the information they need. Automated techniques are therefore desirable to effectively filter and search for useful data on the Net. The purpose of text summarization is to provide satisfactory content coverage while handling information variety. A key factor in document summarization is the extraction of useful features. In this paper, we extract word features in three groups, called important words, and we extract sentence features that depend on the extracted words. With the growth of knowledge on the Internet, it has become an extremely time-consuming and tedious task to read entire documents and papers in order to find the relevant information on a precise topic.


Introduction
With the growth of knowledge on the Internet, it has become an extremely time-consuming and tedious task to read entire documents and papers to find the relevant information on a precise topic. Text summarization is recognized as a solution to this problem, as it generates an automatic briefing of the data. Text summarization can be defined as producing an abbreviated version of one or several documents without dropping the core content or the intent of the originals: an expressive summary of a given manuscript that covers the most important parts of the content with the smallest redundancy across the different input sources.
There are various types of text summarization, depending on the number of input sources, the technique used to generate the summary, the goal of the summary, and the input and output languages of the summarization process.
Recently, the theory of neutrosophic logic and sets has been introduced. Smarandache [1, 2] presented neutrosophic logic, in which each proposition is valued with three grades: a grade of truth (T), a grade of indeterminacy (I), and a grade of falsity (F). A neutrosophic set is defined as a set in which every component of the universe has a grade of truth, a grade of indeterminacy, and a grade of falsity, each lying in the nonstandard unit interval ]⁻0, 1⁺[ [3-5].
In this paper, we propose a neutrosophic logic-based multidocument summarization procedure that extracts vital sentences to create a nonredundant summary. The proposed approach is an extractive generic summarization system, and a summary in the context of this work is a textual summary created from one or many news-related documents. The paper is structured as follows. In Section 2, we give some basic concepts of text summarization systems. Section 3 introduces the proposed summarization technique. The fundamentals of neutrosophic sets are introduced in Section 4, and the basics of information retrieval based on neutrosophic sets are introduced in Section 5. Section 6 is devoted to presenting our approach to document summarization using the distance between neutrosophic sets. The conclusion of the paper is given in Section 7.

Text Summarization
As previously said, a text summary is a condensed version of a document that retains the major points and ideas of the original material(s). The goal of a summarization system is to offer a concise and fluent overview of a given text by addressing the most important parts of the material while minimizing redundancy across the various input sources.
There exists a range of taxonomies for text summarization [8-12], based on the number of input sources, the manner in which the summary is generated, the purpose of the summary, and the language of the input sources.
Two kinds of algorithms dominate the published work on text summarization: extraction-based summarization and abstraction-based summarization.
The extraction-based technique works by selecting sentences from a document. There is no rephrasing of any kind in this technique: sentences are copied verbatim and combined into a more compact summary.
Abstraction-based summarization, on the other hand, goes further. Instead of copying the most important sentences, it alters the way the text is organized: the retrieved text is regenerated. A summarizer is also categorized as single-document or multidocument depending on the number of input sources considered for generating the summary. When one document is provided as input, the task is single-document summarization, whereas multidocument summarization uses a collection of papers as input to create the summary. A domain-specific summary is generated using domain-specific knowledge, whereas a domain-independent (generic) summary is generated using generic features. Domain-specific approaches have become popular among researchers.
In this research, we offer a document summarization system based on neutrosophic logic for extracting relevant sentences and generating a summary. The proposed approach is an extraction-based generic summarization system, and the summary in this work is a textual summary created from one or more news-related papers.

The Proposed Document Summarization Technique
It is not sufficient for a summary just to reproduce words and phrases that capture the source document; the summary must also be accurate and must read fluently as a new, separate document. Text summarization [3, 13-15] is the task of creating a brief and fluent summary while retaining the overall meaning and information content. The summarization process takes several steps: first, the preprocessing of the data; second, word feature extraction; third, sentence feature extraction; and finally, the organization of the set of documents to produce the summary. In the last step, we use neutrosophic logic, as illustrated later.
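The four steps above can be sketched in code. This is a minimal, self-contained illustration only: it uses a plain word-frequency score in place of the neutrosophic scoring developed in the later sections, and all function and variable names are our own.

```python
import re

STOP = {"a", "an", "the", "of", "in", "and", "to", "is"}

def preprocess(text):
    # step 1: split into sentences, then into lowercase tokens
    # with stop words removed
    sents = [s.strip() for s in re.split(r"[.!?]", text) if s.strip()]
    return [(s, [w for w in re.findall(r"[a-z]+", s.lower()) if w not in STOP])
            for s in sents]

def summarize(text, n=1):
    sents = preprocess(text)
    # step 2: word weight = frequency of each non-stop word in the input
    freq = {}
    for _, toks in sents:
        for w in toks:
            freq[w] = freq.get(w, 0) + 1
    # step 3: sentence score = mean weight of its words (a stand-in for
    # the neutrosophic scoring of the later sections)
    ranked = sorted(sents,
                    key=lambda p: -sum(freq[w] for w in p[1]) / max(len(p[1]), 1))
    chosen = {s for s, _ in ranked[:n]}
    # step 4: emit the selected sentences in their original order
    return [s for s, _ in sents if s in chosen]
```

For instance, `summarize("Green computing reduces energy. Computing matters. Cats sleep.", 1)` selects the sentence whose words are most frequent overall.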

Input Preprocessing.
Some preprocessing activities are required for the set of raw documents before they can be fed into the proposed technique.
(i) Stop Word Removal. The most commonly used terms, such as "a," "an," and "the," do not carry any linguistic information about the text. All of the stop words are predefined and saved in a separate file.
(ii) Stemming. This is the process of converting every word to its root form by eliminating its prefix and suffix. For the stemming procedure, we employed the Porter stemmer.
(iii) Removal of Special Characters. This step removes all special characters from the collection of input documents, including punctuation, question, and exclamation marks.
(iv) Segmentation. This is the method of extracting each sentence from a document independently. All sentences from the documents are retrieved and saved in this manner.
(v) Tokenization. Once a sentence is segmented, the tokenization process is applied to all of the sentences. It is a technique for isolating words from sentences and is used to identify character structures such as dates, times, punctuation, and numbers.
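The five preprocessing steps can be sketched as follows. This is a rough, self-contained illustration, not the authors' implementation: the stop word list is abridged, and a crude suffix stripper stands in for the Porter stemmer the paper actually uses.

```python
import re

STOP_WORDS = {"a", "an", "the", "is", "of", "and", "to", "in"}  # abridged stored list

def strip_special(text):
    # (iii) remove punctuation, question and exclamation marks
    return re.sub(r"[^\w\s]", " ", text)

def segment(text):
    # (iv) split the document into individual sentences
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def tokenize(sentence):
    # (v) isolate words from a sentence
    return sentence.lower().split()

def crude_stem(word):
    # (ii) placeholder for the Porter stemmer: strips a few common suffixes
    for suf in ("ing", "ed", "s"):
        if word.endswith(suf) and len(word) > len(suf) + 2:
            return word[: -len(suf)]
    return word

def preprocess(document):
    result = []
    for sent in segment(document):
        words = [w for w in tokenize(strip_special(sent))
                 if w not in STOP_WORDS]          # (i) stop word removal
        result.append([crude_stem(w) for w in words])
    return result
```

For example, `preprocess("The cats sleep. A dog barked!")` yields the stemmed, stop-word-free token lists `[["cat", "sleep"], ["dog", "bark"]]`.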

Feature Extraction.
To perform efficient document summarization, we rely on feature extraction. Feature extraction is not limited to words; it also covers sentences.
In the following subsections, we illustrate our method for extracting words with different levels of strength; sentence extraction then depends on the word features. In the feature extraction phase, the preprocessed word-level data are used to compute sentence scores. The effectiveness of different sentence evaluation methods depends on the type, genre, language, and structure of the input text. The underlying assumption is that distinct themes exhibit different characteristics, which can be distinguished through a variety of features.
All text features are divided into two categories: word-level and sentence-level features. We have run tests with various combinations of shallow text features on various datasets to find the mix of features that delivers the best results in terms of coverage and relevancy for the news domain. The features used in the proposed strategy are listed below.

Word Features.
Previous methods of text summarization depend on word information from the whole document. Alternatively, we can extract a feature that recognizes topics from words without reading the whole document. For example, the word "algorithm" can indicate the document field "computer science"; the appearance of this word in any sentence suggests that the sentence is important. The term "document field" refers to basic, shared knowledge that is useful in human communication.
A field tree is a visual representation of the relationships between document fields. The field tree's leaf nodes correspond to terminal fields, the nodes connected to the root are super-fields, and the remaining nodes are middle (medium) fields. The field of a text can be identified efficiently if the text contains many important words with high frequency. Therefore, we define three levels of important words (IM-W), which is more effective than using full documents as in traditional methods. The three levels of IM-W are defined as follows:

(IM-W)1. The word appears in the title of the document and in exactly one terminal field. For a super-field F with child field F/c, the following formula is used to decide whether the word w is (IM-W)1:

appearance(w, F) = (number of appearances of the word w in the field F) / (total number of words in the field F).  (1)

(IM-W)2. The word appears in more than one terminal field under a single medium field.
(IM-W)3. The word appears in only one medium field.
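Equation (1) and one possible reading of the three IM-W levels can be sketched in code. The field tree layout, the example fields, and the classification rules below are our own rough interpretation (the title check of (IM-W)1 is omitted), not the paper's exact procedure.

```python
FIELD_TREE = {  # hypothetical fragment of a field tree; names are illustrative
    "mathematics": {"words": ["theorem"],
                    "terminals": {"algebra": ["group", "ring"],
                                  "calculus": ["limit", "group"]}},
    "computer science": {"words": [],
                         "terminals": {"ai": ["agent"]}},
}

def appearance(word, field_words):
    # equation (1): appearances of `word` in field F over total words in F
    return field_words.count(word) / len(field_words) if field_words else 0.0

def imw_level(word, tree=FIELD_TREE):
    terminal_hits, mediums_with_hits, medium_only = 0, 0, 0
    for medium in tree.values():
        hits = sum(1 for ws in medium["terminals"].values() if word in ws)
        terminal_hits += hits
        if hits:
            mediums_with_hits += 1
        elif word in medium["words"]:
            medium_only += 1
    if terminal_hits == 1:
        return 1  # (IM-W)1: exactly one terminal field (title check omitted)
    if terminal_hits > 1 and mediums_with_hits == 1:
        return 2  # (IM-W)2: several terminal fields under one medium field
    if terminal_hits == 0 and medium_only == 1:
        return 3  # (IM-W)3: only one medium field
    return 0      # not an important word under this reading
```

Under this toy tree, "agent" is (IM-W)1, "group" is (IM-W)2, and "theorem" is (IM-W)3.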

Sentence Features.
Sentence features are the most important for constructing the summary. Two sentence features are identified. The first is whether the sentence contains IM-W. The second is sentence length: short sentences do not convey any vital information, so they are not recommended. The sentence length score is computed as follows:

length(S_i) = (number of words occurring in the sentence S_i) / (number of words occurring in the longest sentence).  (2)
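The two sentence features can be combined as in the sketch below. The equal feature weights are an assumption made for illustration; the paper does not specify how the features are weighted.

```python
def length_score(sentence, longest_len):
    # equation (2): words in S_i over words in the longest sentence
    return len(sentence.split()) / longest_len

def sentence_score(sentence, important_words, longest_len,
                   w_imw=0.5, w_len=0.5):
    # combine the IM-W feature with the length feature; the 0.5/0.5
    # weights are an assumption, not values taken from the paper
    tokens = sentence.lower().split()
    imw = sum(1 for t in tokens if t in important_words) / max(len(tokens), 1)
    return w_imw * imw + w_len * length_score(sentence, longest_len)
```

For example, with important words {"green", "computing"} and a longest sentence of 3 words, `sentence_score("green computing helps", ...)` combines an IM-W ratio of 2/3 with a length score of 1.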

Summarization Process.
The summarization process [16-18] is carried out in three steps. First, all the sentences are ranked from the highest to the lowest score achieved using the neutrosophic approach, and sentences are chosen based on their degree of resemblance to the sentences already in the summary; sentence similarity is measured with the Euclidean distance between two neutrosophic sets, which is explained in Section 6. The second step is the optimization process: we delete repeated sentences and, among similar sentences, delete the one that contains the largest number of similar words. The third step is sentence arrangement: sentences are organized in the final summary in the order in which they appeared in the source documents. We lay down the following guidelines:
(1) Sentences are arranged in decreasing order of their importance.
(2) If two sentences have the same score and the same position, the sentence from the earlier document is given priority over the other sentence.
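The three steps can be sketched as follows. For the similarity test we plug in a simple word-overlap measure purely as a placeholder; in the actual method this slot is filled by the Euclidean distance between neutrosophic sets of Section 6. The tuple layout and threshold are our own illustrative choices.

```python
def build_summary(sentences, max_sentences, similarity, threshold=0.8):
    """sentences: list of (score, doc_index, position, text) tuples.
    `similarity` is any callable on two texts."""
    # step 1: rank by score; at equal score and position the earlier
    # document wins, per the tie-breaking rule above
    ranked = sorted(sentences, key=lambda s: (-s[0], s[1], s[2]))
    # step 2: optimization -- drop repeated or near-duplicate sentences
    chosen = []
    for cand in ranked:
        if all(similarity(cand[3], kept[3]) < threshold for kept in chosen):
            chosen.append(cand)
        if len(chosen) == max_sentences:
            break
    # step 3: restore the original document order
    chosen.sort(key=lambda s: (s[1], s[2]))
    return [s[3] for s in chosen]

def jaccard(a, b):
    # simple word-overlap similarity used only for this sketch
    A, B = set(a.split()), set(b.split())
    return len(A & B) / len(A | B)
```

A duplicate high-scoring sentence is skipped by step 2, and the survivors are emitted in source order by step 3.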

Neutrosophic Sets
The neutrosophic set is an influential general framework that was recently proposed by F. Smarandache [1, 2]. In this framework, as recalled in the introduction, every element of the universe carries a grade of truth, a grade of indeterminacy, and a grade of falsity.

Information Retrieval Based on Neutrosophic Sets

Reference [19] discusses the fundamentals of information retrieval using neutrosophic sets as follows. Let D be a finite set of documents, D = {d_1, d_2, ..., d_n}, and let W be a set of words, W = {w_1, w_2, ..., w_j}, with w_j ∈ d_i. A neutrosophic set Ñ in D is characterized by a truth-membership function t_Ñ, an indeterminacy-membership function i_Ñ, and a falsity-membership function f_Ñ, where t_Ñ, i_Ñ, f_Ñ: D → [0, 1] are functions and ∀d ∈ D, d ≡ d(t_Ñd(w), i_Ñd(w), f_Ñd(w)) ∈ Ñ. Such a d is a neutrosophic single-valued element of Ñ.
A single-valued neutrosophic set [8-12, 20] Ñ over the finite universe D = {d_1, d_2, ..., d_n} is characterized as follows:

Ñ = {<d_1, (t_Ñd_1(w_i), i_Ñd_1(w_i), f_Ñd_1(w_i))>, <d_2, (t_Ñd_2(w_i), i_Ñd_2(w_i), f_Ñd_2(w_i))>, ..., <d_n, (t_Ñd_n(w_i), i_Ñd_n(w_i), f_Ñd_n(w_i))>},

where r is the number of appearances of the word w_j in the document d_i, S is the number of appearances of the word w_j in the set D, and M is the number of appearances of the word w_j in the subset Ñ.
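A membership triple built from the counts r, S, and M might look like the sketch below. The paper's exact closed-form membership functions are not reproduced in this excerpt, so the three ratios here are assumptions made purely for demonstration.

```python
def svns_triple(r, S, M):
    """Illustrative (t, i, f) triple from the counts defined above:
    r appearances of w_j in document d_i, S appearances in the whole
    set D, M appearances in the subset N. These formulas are assumed,
    not taken from the paper."""
    t = r / S      # assumed truth: share of the word's occurrences in d_i
    i = 1 - M / S  # assumed indeterminacy: occurrences falling outside N
    f = 1 - t      # assumed falsity: complement of the truth grade
    return (t, i, f)
```

With r = 2, S = 4, M = 3 this gives the triple (0.5, 0.25, 0.5); each component stays in [0, 1] because r ≤ S and M ≤ S.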

Document Summarization Based on Neutrosophic Sets
We use the distance between two neutrosophic sets [21, 22] to create a summary with related and closely related sentences. Single-valued neutrosophic sets [18, 23] are a type of neutrosophic set motivated by practical arguments and can be employed in real-world applications in science and engineering. Distance and similarity are important concepts in a variety of fields, including psychology, linguistics, and computational intelligence.

Neutrosophic Summarization Technique Using Euclidean Distance between Two Neutrosophic Sets.
We introduce the distance between two sentences represented as single-valued neutrosophic sets.

Let S_1 and S_2 be two single-valued neutrosophic sets over the finite universe D = {S_1, S_2, ..., S_n}. The Euclidean distance between S_1 and S_2 is defined as

E(S_1, S_2) = sqrt( Σ_{j=1}^{n} [ (t_S1(w_j) - t_S2(w_j))^2 + (i_S1(w_j) - i_S2(w_j))^2 + (f_S1(w_j) - f_S2(w_j))^2 ] ),

and the normalized Euclidean distance between S_1 and S_2 is defined as

E_N(S_1, S_2) = sqrt( (1 / (3n)) Σ_{j=1}^{n} [ (t_S1(w_j) - t_S2(w_j))^2 + (i_S1(w_j) - i_S2(w_j))^2 + (f_S1(w_j) - f_S2(w_j))^2 ] ).

Example 1. We explain the whole method on one document. Consider a topic called "computer and math"; this topic constitutes a field, and a part of its field tree is shown in Figure 1.
We take an article from the subfield "computer science," under the title "Environmental impact of computation and the future of green computing." Assume that S = {S_1, S_2, S_3, S_4, S_5, S_6} is the set of sentences extracted from the document and that the set of important words is W = {Environmental, impact, computation, future, green, computing}. Ñ = {S_1, S_3, S_5} is a subset of sentences selected according to the occurrence of the keywords in W, where t_Ñ S_i(w_j), i_Ñ S_i(w_j), and f_Ñ S_i(w_j) are the degree of "strong occurrence of important words," the degree of "indeterminacy of important words," and the degree of "poor occurrence of important words," respectively. The next step is to determine the Euclidean distance between two sentences such as S_1 and S_2. The numbers of occurrences of the keywords in the document are as follows: Environmental "7," impact "6," computation "6," future "3," green "4," and computing "13." The single values for the neutrosophic set Ñ are given in Table 1.
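The Euclidean and normalized Euclidean distances between two sentences can be computed as in the sketch below. The sample triples are hypothetical placeholders, since Table 1 is not reproduced here.

```python
from math import sqrt

def euclidean_distance(A, B):
    """Euclidean distance between two single-valued neutrosophic sets,
    each given as a list of (t, i, f) triples over the same keywords."""
    return sqrt(sum((ta - tb) ** 2 + (ia - ib) ** 2 + (fa - fb) ** 2
                    for (ta, ia, fa), (tb, ib, fb) in zip(A, B)))

def normalized_euclidean_distance(A, B):
    # the squared sum is divided by 3n so the result stays in [0, 1]
    s = sum((ta - tb) ** 2 + (ia - ib) ** 2 + (fa - fb) ** 2
            for (ta, ia, fa), (tb, ib, fb) in zip(A, B))
    return sqrt(s / (3 * len(A)))

# hypothetical triples for two sentences (Table 1 is not reproduced here)
S1 = [(0.7, 0.1, 0.2), (0.3, 0.2, 0.5)]
S2 = [(0.5, 0.2, 0.3), (0.4, 0.1, 0.5)]
```

The normalized distance never exceeds the plain Euclidean distance, since the squared sum is divided by 3n ≥ 1 before taking the root.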

Journal of Mathematics
Example 2. From the data of Example 1 and Table 1, the normalized Euclidian distance between S 1 and S 3 is given as follows:

Conclusions and Future Works
The aim of our work is to study another method of text summarization, based on neutrosophic sets. The benefit of using neutrosophic sets is that they serve as a good mathematical tool for document summarization via the distance between two neutrosophic sets. As future work, we plan to compare this method of document summarization with other methods such as fuzzy logic and fuzzy ontology.

Data Availability
No data were used to support this study.

Conflicts of Interest
The authors declare that they have no conflicts of interest.