An Interpretable Fuzzy Graph Learning for Label Propagation Assisting Data Classification

Graph-based semisupervised learning has recently become an indispensable tool for data classification, owing to its innate capability of efficient data structuring and representation. However, its reliance on predefined graphs constrains the efficacy of label propagation (LP) and the interpretability of predictions, especially in high-dimensional feature spaces with limited information. Addressing these challenges, this article employs a fuzzy graph-based label propagation (FGLP) model, which is inherently interpretable in exploring the similarities of the normalized histogram envelope-based scaled features assisting data categorization. FGLP initiates the structuring of an undirected fuzzy-weighted graph using the novel fuzzy distance matrix by exploiting the local data affinity with reduced influence of outliers. The learned information is then optimized using two distinct SoftMax-constrained objective functions coupled with cross-entropy and lasso regularization to construct the similarity and projection matrices in tandem, assisting LP and feature selection in scarcely labeled high-dimensional feature spaces. Performance validation on heterogeneous datasets showcases FGLP's superiority, achieving over 88% accuracy with just 10% labeled data, surpassing prior methods by an average enhancement of 18.29%.

Cherukula Madhu, Member, IEEE, and Sudhakar M.S.

I. INTRODUCTION
THE daily expansion and exponential spread of social media networks across the globe generate tons of data, of which only a tiny percentage is labeled, as labeling is cost intensive and incurs more time with human labor [1], [2], [3]. Consequently, this poses a significant challenge to the machine learning community in performing effective data exploration and structuring supplementing classification [2]. Over the years, semisupervised learning (SSL) has fared handy owing to its simplicity with sophistication and has received significant attention in the area of label propagation (LP). Majorly, the problem with LP is the assignment of labels for the entire set of feature vectors (FVs) with relatively fewer labels available in the FV subset [4], [5], [6]. This elucidates SSL's potency in addressing a wider variety of issues, such as poor model generalization and low classification accuracy concerned with supervised and unsupervised learning [2]. Among the diverse SSL variants, graph-based semisupervised learning (GSSL), in particular, benefits data scientists in managing and analyzing exponentially growing unstructured data due to its inherent convexity, scalability, and unique suitability in detecting inherent connections among data points [7], [8]. Graphs serve as the building blocks in GSSL [9], [10], [11], [12], covering subspace and manifold learning [13], [14], wherein the affinity between FVs is modeled as pairwise edges [15], [16] assisting LP [8]. Generally, GSSL graphs a dataset with each sample denoted as a vertex and their relation defined by an edge [17], with nodal affinity defined by the degree of closeness determined using the distance measure.

The authors are with the School of Electronics Engineering (SENSE), VIT, Vellore 632014, India (e-mail: cherukulamadhu@gmail.com; sudhakar.ms@vit.ac.in).

This article has supplementary material provided by the authors and color versions of one or more figures available at https://doi.org/10.1109/TFUZZ.2023.3323093.

Digital Object Identifier 10.1109/TFUZZ.2023.3323093
Despite the numerous advantages and simplicity of GSSL, its reliability depends on the constructed similarity matrix [18] weighted by the Gaussian (S_ij = e^(−σd²), where d is the Euclidean distance measure on X = {x_i}, i ∈ Z, and σ is the scaling parameter), which in turn relies on σ [12]. GSSL further insists that nearby nodes are strongly related and presumed to have comparable labels. Subsequently, GSSL creates several manifolds that intersect or partially overlap with each other [11] when dealing with the distribution of items in diverse datasets. As the labels are allocated based on their proximity, the class information on the graph gets smoothened by neglecting the dissimilar edges [12]. However, GSSLs fail miserably when the labeled data are highly limited [19], [20] and increase the processing cost while handling large-scale data. Moreover, the graph connectivity built by the Euclidean measure among samples during propagation is unreliable [21]. In addition, the accuracy of data categorization depends critically on the quality of the initialized graph and is questionable, as GSSLs tend to classify the data on a predetermined or fixed graph with subjective label information.
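For concreteness, the conventional Gaussian-weighted affinity that GSSL relies on can be sketched as follows; the toy data and the choice σ = 1 are illustrative assumptions, not values from the article. The sketch also shows why σ matters: it alone decides how fast similarity decays with distance.

```python
import numpy as np

def gaussian_similarity(X, sigma=1.0):
    """Conventional GSSL affinity: S_ij = exp(-sigma * d(x_i, x_j)^2),
    with d the Euclidean distance. sigma is the sensitive scaling parameter
    the article criticizes the method for depending on."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T   # pairwise squared distances
    d2 = np.maximum(d2, 0.0)                          # guard tiny negative round-off
    S = np.exp(-sigma * d2)
    np.fill_diagonal(S, 0.0)                          # no self-loops
    return S

X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])    # two close points, one outlier
S = gaussian_similarity(X, sigma=1.0)
# nearby points (rows 0 and 1) get similarity near 1; the outlier gets ~0
```

Doubling σ here would shrink every off-diagonal entry, illustrating how the whole graph topology hinges on one hand-tuned parameter.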
Inferences from the erstwhile literature on GSSL applied to LP reveal the following challenges.
3) LP efficacy, demanding exhaustive investigations to deal with.

Especially, these challenges are notable when dealing with high-dimensional feature spaces limited by the label information. Accordingly, to address the aforementioned issues, this work proposes fuzzy graph-based label propagation (FGLP) by replacing the conventional crisp graph (CG) with a fuzzy graph (FG) for data representation, offering interpretability, transparency, dynamicity, and adaptivity [22], [23]. To begin with, FGLP initiates graph structuring using a novel fuzzy distance matrix (FDM) instead of the conventional Euclidean, easing computational operations in high-dimensional data spaces with simultaneous reduction of outliers [21], [24]. The fuzzy-natured FDM realizes edges by capturing the likelihood of two nodes sharing the same label, in contrast to the CG, which relies heavily on subjective label information prone to misclassification [25], [26]. Further, the parameter-free FGLP model ensures reliability by adaptive determination of the cross-entropy and lasso regularizers' penalty parameters through recursive tuning of the Lagrangian operator, mitigating biases in label prediction. In addition, FGLP favors lowering the computational complexity with enhanced interpretability using the introduced normalized histogram envelope (NHE)-based fuzzy feature extraction and feature selection (FS), warranting reduced FV dimensions. Moreover, operating entirely in fuzzy space enables reliable prediction of data connections without solely relying on subjective label information, greatly enhancing LP efficiency even under constrained circumstances, with greater interpretability. To elevate the appropriateness and help understand the significance of the proposal, the issues in GSSL are orderly emphasized based on recent contributions in Section I-A.

A. Related Work
The quest for efficient data storage and management systems is on the rise owing to the humongous volume of unstructured data delivered across multiple domains catering to several real-world applications. Scores of trending supervised and unsupervised models necessitate higher computational demands with biased outcomes and false classification, tarnishing data categorization. Therefore, a suitable learning alternative in the form of inductive or transductive GSSL, encompassing convexity with connectivity, is adopted to improve data classification. Induction learns a global representation of the data space from the "unknown" data, whereas transduction propagates labels by learning a local representation utilizing labeled and unlabeled data, thereby making it progressive and hence adopted in this approach. Consequently, the below section outlines the issues related to the recent literature on transductive learning-based GSSLs employed for LP and embedding in the context of dimensionality reduction (DR) and graph structuring.
At the onset, GSSLs such as local-global consistency (LGC) [27] and Gaussian field-based harmonic functions (GFHF) [28] reduced FV dimensions by smoothing the graph's manifold, which closed the gap between the predicted labels. Despite being straightforward and effective, these models gradually lost relevance with the increase in dataset size, which was addressed by marginal Fisher analysis (MFA) [29] by compacting the interclass deviations using an intrinsic graph. However, graph construction and projection learning demanded two different procedures that increased the computational complexity. Likewise, the semisupervised low-rank representation of Li et al. [1] significantly eased DR by maintaining a tradeoff between accuracy and representational complexity. Alternatively, Nie et al. [11] tuned the similarity matrix with fixed and adaptive FV weighting followed by LP, which was quite expensive, and the method failed with fewer labeled samples. The variant of [11] termed semisupervised projection with graph optimization (SPGO) [30] learned the graph similarity in low-dimensional space by neglecting the labeled FVs, which degraded feature projection. Consequently, the multicategory classification of [31], [32] employed a pair of linearly modeled unconstrained minimization criteria for learning the graph structure and predicting labels. The engaged criteria increased the model's complexity proportionally to the FV size and the datasets. The unification of LP with structured graph learning (LPSGL) [33] reduced dataset dimensions using an iterative optimizer that demanded extensive computations and performed poorly when labeled samples were scarce. The probability-tuned similarity graph [2] performed LP with the Euclidean distance measure using iterative optimization that frequently altered the learned labels, which is unsettling. The above discussions outline the need for an effective and adaptive LP mechanism functioning with a minimal number of computations.
Similarly, recent and prevailing graph embedding models are surveyed and chronologically enunciated hereunder to showcase the downfalls addressed by FGLP. The widely popular semisupervised discriminant analysis (SDA) [34] optimized location preservation and discrimination using both labeled and unlabeled samples. However, at higher levels of data nonlinearity, its inherent geometric structure was ignored, while flexible manifold embedding (FME) [35] smoothened the former's manifold for DR. Likewise, L1-Semi [36] constructed graphs for spectral embedding by replacing the Gaussian weighting, as the innate label details were neglected during propagation. Similarly, the margin-based discriminant embedding [37] constructed the nonnegative sparse graph (NNSG) using a Euclidean-tuned Laplacian that achieved discriminative nonlinear data projection. NNSG's increased constructional complexity, coupled with its neglect of local information, led to model failure. A few other embedding models aligned with NNSG, such as FDEFS [38], DLA [39], GCSE [40], and DFEFP [41], efficiently projected FVs into low-dimensional space for data classification. The strength of the available labeled information determined these models' efficacy, while their performance deteriorated when dealing with large datasets. Alternatively, Chen et al. [42] constructed graphs using an energy-based distance metric that neglected the data distribution. However, the highly time-consuming likelihood learning failed to surpass the accuracies registered by recent GSSLs.
As evident from the above discussions, the labeling accuracy of contemporary models relies heavily on the availability of labeled data and on the distance measure utilized for graph structuring. Moreover, the increased propagation complexity is directly related to the dataset size and the complex learning modules incorporated into their structurization. Furthermore, their operation in crisp space lessens the flexibility and adaptability deemed essential here, thereby lowering accuracies. Thus, from the literature reviewed above, the following objectives are formulated to fill the gaps in the usage of GSSL for LP and FS aiding data classification.
1) Design a parameter-free dynamic graph structuring model replacing the widely popular and parametric Gaussian and K-nearest neighbors (KNN) graphs. Also, the intended graph model should be fair, trustworthy, and interpretable, which is lacking in the conventional CG.
2) Develop a new distance measure demonstrating resilience toward the noise and outliers experienced by the traditional Euclidean metric [21], [24].
3) Realize a unique graph regularization model to optimize the similarity matrix, easing LP.
4) Implement a simple LP algorithm performing effective information propagation with limited supervised labels.
5) Adopt an optimal DR technique that lowers the model's computational burden without compromising its performance.

To meet these research objectives, the following contributions are made in this proposal.
1) Introduced a parameter-free FG approach combining scalability, dynamicity, and interpretability into the graph structure, overcoming the limitations of predefined CGs [22], [23].
2) Developed a novel fuzzy distance measure similar to the geodesic distance, enhancing interclass separability with increased intraclass affinity.
3) Improved FG learning using a SoftMax-tuned cross-entropy regularizer bonding intraclass FVs with heightened LP performance.
4) Decomposed the optimal similarity matrix favoring efficient prediction of unknown labels, followed by label learning using a linguistically advanced soft computing technique offering improved classification even with scarcely available labeled FVs.
5) Reduced FV dimensions and realization complexity in two stages:
   a) extraction of scaled NHE-based fuzzy FVs using a probability-tuned membership function retaining desired data variations;
   b) projection matrix learning using a lasso-regularized fuzzy entropy objective function addressing overfitting with improved interpretability.

To realize the aforementioned contributions mathematically, the related notations utilized in their formulation, along with their characteristics, are dealt with below.

B. Mathematical Notations
A summary of mathematical notations denoting the diverse variables used for realizing the proposed model is given in Table I.
The rest of this article is organized as follows. Fuzzy fundamentals in the context of set theory, feature scaling, and FG learning are dealt with in Section II. In Section III, learning validity on real-world datasets is relatively analyzed in detail, followed by complexity and interpretability investigations in Section IV. Finally, Section V concludes this article.

II. METHODOLOGY
To address the research objectives formulated from the shortcomings of standard GSSL elucidated in the literature review, a novel FGLP is introduced in this section, which is expected to be interpretable for enabling downstream applications [42]. Existing research mainly focuses on the inferred graph by circumventing each node's vicinity based on existing metrics [44]. In contrast, the proposed model constructs a graph by defining edges based on membership values generated by the distance matrix in the fuzzy space. As the degree of membership lies in the range [0, 1], the edge formation is completely based on the probability of closeness, thereby making the graph highly interpretable. The graphical representation of the proposed LP is shown in Fig. 1.
At the onset, the scaled fuzzy FVs are extracted by the normalized histogram envelope (NHE) by exploiting the inherent flexibility and adaptability of fuzzy logic. The FG is then structured using the novel FDM to ease data categorization, followed by the construction of the optimal similarity matrix on the resultant data by fitting a simple cost function. Later, the higher dimensional FVs are projected to a lower dimensional space using lasso regularization (LR)-based minimization tuned by fuzzy entropy for optimal FS. Finally, a unified framework is realized addressing similarity learning and FS, followed by LP. To better understand the fuzzy mathematics involved in developing the different modules of the FGLP framework, a brief introduction to fuzzy sets is first provided, followed by discussions on the other modules.

A. Fuzzy Sets
Fuzzy rule-based system modeling bestows easy interpretability [45], though their accuracies diminish relative to their peers. Therefore, fuzzy variables with minimal rules are desired to trade off interpretability against accuracy [46]. Generally, these systems are equivalent to FGs, defined as granular representations of functional dependencies and relations [47]. An FG with fewer nodes represents a fuzzy system with a minimalistic rule set offering improved interpretability while maintaining consistent accuracy. In addition, the highly flexible nature of fuzzy-based learning supplements labeling with reduced computations when compared with its peers [48], [49], [50], thereby motivating the FG's engagement in this learning framework. Accordingly, the following fuzzy definitions are adopted to understand the presented learning model.
Definition 1: A fuzzy set A in a universal set X is mathematically represented as a set of ordered pairs [47], [48] defined in (1)

A = {(x, μ_A(x)) | x ∈ X}, 0 ≤ μ_A(x) ≤ 1 (1)

where μ_A(x) denotes the degree of membership of x in A.

Definition 2: A fuzzy set A is said to be convex if it satisfies the condition in (2)

μ_A(λx_1 + (1 − λ)x_2) ≥ min {μ_A(x_1), μ_A(x_2)}, ∀ x_1, x_2 ∈ X, λ ∈ [0, 1]. (2)

Definition 3: The α-cut of a fuzzy set A is the crisp set given in (3), where the constant α cuts the membership function and is hence termed the α-cut, which is necessary for determining the convexity of the membership function

A_α = {x ∈ X | μ_A(x) ≥ α}. (3)

B. NHE-Based Fuzzy Feature Extraction With Scaling
Large-sized FVs are a serious problem in computer vision, data mining, machine learning, and pattern recognition, requiring greater computational effort and storage space [51], [52]. Therefore, to reduce their space and process complexity, this work prescribes a straightforward approach to capture the highly prioritized local variations. Accordingly, each FV, denoted by f of size 1 × m with levels f_m lying in the range [0, L), L ≪ m, is fabricated into histograms. These histograms represent the probability distribution of levels in terms of N_m, defined by the count of each level in f, to extract its envelope termed the NHE H(f) given in (4). H(f) obtained from (4) formulates the membership function μ_h(x) as in [43] for converting the crisp FVs {f = f_m | 0 ≤ f_m < L} to fuzzy FVs with elements representing degrees of membership [48], [49], [50]. The extracted FVs are constrained by the number of elements in μ_h(x), which is very much less than the number of elements in f, thereby reducing the feature dimensions and the computational cost. Accordingly, the transformed H(f) is given in (5). μ_h(x) constitutes a dataset X = {x_1, x_2, x_3, ..., x_l, x_{l+1}, ..., x_n}, where x_i represents a fuzzy FV. Among the n data points, only the first l samples are labeled as y_l = {1, 2, 3, ..., c}, with the remaining considered as unlabeled (ul) bounded by l ≪ ul and n = l + ul, as defined in (6) and (7). This arrangement facilitates the construction of an n × n matrix defining the affinity between nodes in the graph G. Node affinity is quantified using the initial similarity matrix S = {s_ij | 0 ≤ s_ij ≤ 1} with an appropriate distance measure d on X. According to GSSL, if d(x_i, x_j) is very small, then the labels y_i ≈ y_j, and this assumption holds for both labeled and unlabeled data as defined in (8).
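A minimal sketch of the NHE idea follows. The exact membership function μ_h(x) of [43] is not reproduced here, so the normalized, peak-scaled histogram itself stands in for it; the toy vector f and the level count L = 8 are assumptions.

```python
import numpy as np

def nhe_features(f, L=8):
    """Sketch of NHE-based fuzzy feature extraction (the paper's exact
    membership function is not reproduced): build the histogram of the L
    levels in f, normalize the counts N_m by the vector length so they form
    a probability distribution, then scale by the peak so the envelope reads
    as membership degrees in [0, 1]."""
    f = np.asarray(f)
    counts = np.bincount(f, minlength=L)   # N_m: occurrences of each level in f
    H = counts / f.size                     # normalized histogram envelope
    return H / H.max()                      # peak-scaled membership degrees

f = np.array([0, 1, 1, 2, 2, 2, 3, 7])      # m = 8 samples, levels in [0, 8)
mu_h = nhe_features(f, L=8)
# mu_h has only L entries regardless of m, which is the dimensionality
# reduction the section describes (|mu_h| = L << m for long FVs)
```

For realistic FVs with m in the thousands and a few dozen levels, the same routine shrinks each FV to L membership values, which is where the claimed storage and compute savings come from.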

C. FG Structuring
The relations between the fuzzy FVs μ_h(x) in the dataset X play an important role in LP; to determine these relations, G is constructed with each node representing a fuzzy FV or a fuzzy set μ_h(x), x ∈ X. The FG is formally given in Definition 4.
Definition 4: An FG is a triplet G = (V, α, μ) with vertex membership function α and edge membership function μ satisfying (9)

μ(p, q) ≤ min {α(p), α(q)}, ∀ p, q ∈ V (9)

where α(p) and μ(p, q) are the membership values of the vertex p and edge pq in G, respectively, as shown in Fig. 2.

Definition 5: A CG G* is a special case of an FG with vertices {α | V = 1} and edges {μ | V × V = 1}.
Definition 6: An FG G = (V, α, μ) is said to be a strong graph if it satisfies (10)

μ(p, q) = min {α(p), α(q)}. (10)

The FG adopted in this work corresponds to fuzzy relations with induced interpretability. Also, FG structuring is simpler.

1) Conventional GSSLs consider graphs as stationary observations [25], [26]. When the graph is assumed to be random, additional correlations between data characteristics, labels, and structure may be extracted from the joint distribution. Accordingly, the proposed FGLP considers various forms of uncertainty connected to the data and class/label information. This quality induces dynamicity into the graph for learning the relation between FVs.
2) Likewise, graph regularization using GSSLs demands detailed investigations, particularly in the scaling aspect when the input size is large. Many such models available in the literature either concentrate on reducing the computational complexity or on improving the propagation accuracy. Rather, in the proposed FGLP, almost half of the work is done by graph structuring, as the node affinities are determined by considering the local variations in the data on a probability basis.
3) Also, existing GSSLs completely trust the available subjective label information, which is highly prone to misclassification [25], [26]. Instead, FGLP predicts the probability of two nodes sharing the same label to minimize misclassification.

The aforementioned advantages motivate the replacement of the CG with the FG adopted in this work for label learning. Later, the node affinity of G is determined by constructing the FDM of n × n dimensions from the dataset X, as discussed below.
1) FDM: GSSL accuracy is highly influenced by the local connectivity or affinity between FVs [11]. If there is no local linearity in a neighborhood, then the Euclidean distance-based affinity fails to accurately represent the relationship between data points. Furthermore, the affinity between data points is impacted by the data distribution, which is mandatory while constructing the affinity matrix [21]. To address this issue, a new FDM (d) is proposed fulfilling the properties of conventional geodesic distance measures. Initially, for FDM construction, the Euclidean metric evaluating the node similarity is used by rationalizing the samples with the standard deviation (σ) given in (11) to localize their variations, as they remain highly distinctive. Equation (11) calculates the distance between the FVs in dataset X. The first term in (11) represents the Euclidean distances between the FVs normalized by their respective standard deviations σ_xi, σ_xj. This normalization aids in reducing the effect of outliers [54], [55], while the second term minimizes the intraclass heterogeneity and increases the interclass FV deviations in the dataset X. To constrain the values of the fuzzy geodesic distance obtained in (11), the resultant is then normalized by adopting the fuzzification procedure [56], [57], [58]. A numerical instance of the fuzzification in attaining the FDM is demonstrated in Example 1 of the Appendix by considering the fuzzy features defined in (12), with each row corresponding to the elements of one FV packed in the interval [0, 1]

    0.1  0.0  0.5  0.8  0.3
    0.5  0.8  0.0  0.3  0.1
    0.1  0.0  0.0  0.5  0.9    (12)

The FDM constructed from the FVs X in (12) using (11) is presented in (13). Equations (11) and (13) outline a few mathematical facts related to the FDM, with its corollaries presented below.

Corollary 5: The proposed distance metric is equal to the classical geodesic distance if the vectors a and b are in crisp form.
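The exact form of (11) is not reproduced here, but its ingredients, per-FV standard-deviation rationalization followed by fuzzification into [0, 1], can be sketched as follows; the 1 − exp(−d) fuzzifier is an illustrative assumption, and the data are the three FVs of (12).

```python
import numpy as np

def fuzzy_distance_matrix(X):
    """Sketch of an FDM in the spirit of Section II-C (the article's Eq. (11)
    has additional terms not reproduced here): Euclidean distances are
    rationalized by each FV's standard deviation to damp outliers, then a
    fuzzification step squashes the result into [0, 1]. The 1 - exp(-d)
    fuzzifier below is an assumed, illustrative choice."""
    n = X.shape[0]
    sig = X.std(axis=1) + 1e-12                 # per-FV standard deviations
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            d = np.linalg.norm(X[i] / sig[i] - X[j] / sig[j])
            D[i, j] = 1.0 - np.exp(-d)          # membership-valued distance
    return D

X = np.array([[0.1, 0.0, 0.5, 0.8, 0.3],
              [0.5, 0.8, 0.0, 0.3, 0.1],
              [0.1, 0.0, 0.0, 0.5, 0.9]])       # the three FVs of Eq. (12)
D = fuzzy_distance_matrix(X)
# D is symmetric with a zero diagonal, and every entry lies in [0, 1),
# which is what lets the edge weights be read as membership degrees
```

Because the output is already bounded in [0, 1], no scaling parameter akin to the Gaussian kernel's σ is needed, which is the parameter-free property the section emphasizes.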
Corollaries 1-5 assist in understanding the data affinity captured by the FDM, from which the FG is constructed. The FG's choice is mainly attributed to its ability to deal with the uncertain and ambiguous data existing in many real-world phenomena [59], while the crisp counterparts obscure the ambiguity, interpretability, and validity [25]. The above-demonstrated FG construction process is extended to the FVs in the dataset X, wherein each FV corresponds to a vertex, with d_xij representing the edges.

2) Similarity Learning: The similarity matrix S_ij constructed using the relation (18) should satisfy the positivity, symmetry, and separability properties.
According to GSSL [2], [11], [60], if the distance between the FVs x_i and x_j quantified in d_xij is large, then their similarity in S_ij is reduced. Also, the diagonal elements of S_ij are zeroed to eliminate self-loops in the FG (S_ii = 0). Finally, S_ij is normalized using the SoftMax (s_ij ← e^(s_ij) / Σ_{j=1}^{n} e^(s_ij)) to constrain the rowwise sum to 1, with the resultant equivalently interpreted as a probability distribution matrix. Application of the SoftMax function improves the intraclass firmness and the separability of the interclass data points [61]. To optimize S_ij, the formulated FG and the realized d_xij are combined by tuning the overall cost function (19), covering the aforementioned constraints, where γ is the regularization parameter.
The first term in (19) represents the loss function responsible for determining the similarities between samples, while the second corresponds to the cross-entropy determining the dissimilarity measure between data points. In combination with SoftMax, this cross-entropy term achieves better interclass separability [62], [63]. Minimizing (19) produces (20). Reorganizing (19) in vector form using (20), as given in [33], produces (21). The minimization problem in (21) is solved using the Lagrangian function given in (22), where η, β ≥ 0 are the Lagrangian multipliers. Solving (22) for S_ij, η, and β yields (23)-(25). The optimal S_ij satisfying the Karush-Kuhn-Tucker (KKT) conditions, constrained by the parameters η and γ, is determined using [11], [16], and [33] as in (26). To determine η and γ, a sparse S_ij is constructed using (23)-(26) with the consideration that the u nonzero elements (closest neighbors) of the FG are selected adaptively based on the known labels in the dataset, represented by S_iu > 0, while the remaining n − u elements are denoted as S_iu+1 ≤ 0. Later, S_ij is sorted in ascending order to select the first u elements, and the rest are zeroed. This process is mathematically presented in (27) and (28). Also, from (26), the value of the Lagrangian multiplier η is obtained by exploiting the constraint that the rowwise sum of S_ij is unity, as in (29). Upon simplifying (29), the Lagrangian multiplier η for the u closest neighbors is determined using (30). Substituting (30) in (25) produces (31), from which the regularization parameter γ is calculated as in (32). Finally, by substituting (30) and (32) in (22), S_ij is obtained in (33).
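The Lagrangian closed form of (22)-(33) is not reproduced here, but the constraints the optimal S_ij must satisfy, namely small fuzzy distance mapping to large similarity, a zero diagonal, u adaptively kept neighbors per row, and SoftMax rowwise normalization, can be sketched as follows; the toy distance matrix and u = 2 are assumptions.

```python
import numpy as np

def sparse_softmax_similarity(D, u=2):
    """Sketch of the structure of the optimal similarity matrix (the
    closed-form Lagrangian solution of the article is not reproduced):
    per row, exclude the self-loop, keep only the u closest neighbors,
    and SoftMax-normalize over negative distance so each row is a
    probability distribution summing to 1."""
    n = D.shape[0]
    S = np.zeros((n, n))
    for i in range(n):
        d = D[i].copy()
        d[i] = np.inf                          # exclude self-loop (S_ii = 0)
        nbrs = np.argsort(d)[:u]               # the u closest neighbors
        w = np.exp(-d[nbrs])                   # SoftMax over negative distance
        S[i, nbrs] = w / w.sum()               # rowwise sum constrained to 1
    return S

D = np.array([[0.0, 0.2, 0.9, 0.8],
              [0.2, 0.0, 0.7, 0.9],
              [0.9, 0.7, 0.0, 0.1],
              [0.8, 0.9, 0.1, 0.0]])           # toy fuzzy distance matrix
S = sparse_softmax_similarity(D, u=2)
# each row sums to 1, S_ii = 0, and exactly u entries per row are nonzero
```

The resulting rows can be read directly as "probability of sharing a label with each neighbor", which is the interpretability claim the section attaches to the SoftMax constraint.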

D. FS and Projection Matrix Learning
In the case of high-dimensional FVs, the computational complexity of learning the graph similarity is large; to compensate for this, FS is incorporated into the framework, which additionally improves its interpretability. Also, FS plays an important role in retaining or dropping FV elements based on their significance from the learning perspective. Accordingly, in this contribution, it is highly essential to assess the importance of the data attributes extracted from the heterogeneous datasets and establish FGLP's uniformity in dealing with diverse applications.
Accordingly, FS is performed using a novel fuzzy entropy-based optimization function tuned by LR for constructing the projection matrix w_ij of dimensions m × ω. The rationale is to exploit DR through FS, which is accomplished by the LR's penalty term (responsible for the selection of relevant features) with simultaneous reduction of the FV dataset dimensions from n × m to n × ω, supplemented by the introduced importance score, thereby emphasizing the role of data attributes in classification.
Accordingly, the projection matrix w_ij for minimizing the FV size is recursively tuned using (34), where x_j is the jth FV with importance score v_j, λ is the penalty parameter, and p_j is the predicted label of x_j, determined from the pseudolabel matrix comprising the available and predicted labels represented as P = (P_l; P_ul). To ensure an error-free w_ij, the predicted labels are initially assumed zero (P_ul = 0). The first term in (34) is the cost function approximating w_ij, whereas the second corresponds to the penalty responsible for FS. The first term, adjusted by the importance score v_j, is formulated using the fuzzy entropy to capture the FVs' uncertainty and potentially enhances FS by upsizing or downsizing the FV length during learning. It also effectively distinguishes the supervised and unsupervised samples, thereby offering accurate label prediction. The second term in (34) biases the optimization by effectively removing less informative or redundant features, favoring representational sparsity [64]. λ's magnitude directly influences FS: if it is high, very few features are selected, while lower values result in the selection of more features, leading to underfitting or overfitting, respectively. Therefore, a balance in adaptively fitting λ is struck using the Lagrangian multiplier, which overcomes the aforesaid issues with simultaneous reduction of the FV dimensions. Accordingly, (34) is minimized to produce w_ij in vector form as in (35). The Lagrangian representation of the lasso problem in (34) is simplified in (36) using (35). The positive definite Lagrangian multiplier η_1 > 0 is determined by following the procedure adopted in evaluating S_ij, followed by the construction of w_ij meeting the KKT conditions, as stated in (37). η_1 in (37) is determined utilizing the constraint that the SoftMax-normalized rowwise sum of w_ij is unity for the u closest neighbors, as presented in (38). Finally, the penalty parameter λ is determined upon substituting (38) in (37) with the consideration that the u nonzero elements (closest neighbors) of the FG are selected adaptively based on the unknown labels in the dataset, as stated in (39). Substituting λ and η_1 in (35) produces the optimal projection matrix that helps address the dimensionality issues in GSSL.
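The fuzzy-entropy loss and importance scores v_j of (34) are not reproduced here; the sketch below substitutes a plain squared loss to show how the lasso penalty alone drives FS via ISTA-style soft-thresholding. λ, the learning rate, and the synthetic data are all assumptions.

```python
import numpy as np

def lasso_feature_weights(X, y, lam=0.5, lr=0.01, iters=2000):
    """Sketch of how the lasso penalty in an objective like (34) drives
    feature selection (the article's fuzzy-entropy fit term is replaced
    by a plain squared loss). ISTA: a gradient step on the fit term,
    then soft-thresholding by lr * lam, which zeroes weak features."""
    n, m = X.shape
    w = np.zeros(m)
    for _ in range(iters):
        grad = X.T @ (X @ w - y) / n                    # gradient of the fit term
        w = w - lr * grad
        w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)  # soft-threshold
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1]        # only features 0 and 1 are informative
w = lasso_feature_weights(X, y, lam=0.5)
# w keeps large weights on features 0 and 1 and shrinks the rest toward 0,
# mirroring the text: larger lambda -> fewer selected features
```

Rerunning with a small λ (e.g., 0.01) leaves nearly all weights nonzero, which is the overfitting end of the tradeoff the section describes; the Lagrangian tuning in the article exists precisely to pick λ between these extremes adaptively.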
The aforesaid mathematical process is outlined in Algorithm 1. The FS algorithm introduced above reduces the dimensions of the dataset X, and S_ij is then utilized for label learning, as discussed below.

E. Label Learning
Labeling the unlabeled samples present in dataset X depends on the constructed FG, the information of the labeled nodes, and the membership values of the graph edges capitulated in S_ij. The optimized S_ij upon FS strictly confines the learning process to the u nearest neighbors by eliminating outliers to acquire highly correlated FVs, thereby improving the classification accuracy. Later, to propagate the labels from x_l to x_ul in X, a simple prediction matrix P = (P_l; P_ul) is formulated by decomposing S_ij as in (40), where s_l,l, s_l,ul, s_ul,l, and s_ul,ul are the local similarity features of the labeled-labeled, labeled-unlabeled, unlabeled-labeled, and unlabeled-unlabeled FVs, respectively. Using (40), the predicted labels for the unlabeled FVs are determined using (41). The FG learns the local similarity between the FVs based on the easily accessible label information corresponding to the s_l,l components before LP using (42)

s_l,l = { s_ij, if y_i and y_j are known and i ≠ j; 0, otherwise }. (42)

The labeled-labeled similarity feature component s_l,l stated in (42) helps in grouping S_ij into c block diagonal elements [33], thereby eliminating the need for additional graph cutting and hence boosting LP. Later, the label information P_ul for the unlabeled FVs with u nonzero row elements is predicted using (43), where i ∈ [0, l] and j ∈ [l + 1, n], with u constraining the search range. This FGLP arrangement propagates the label information y_l of s_l,l directly to the first u nonzero elements in each column of s_ul,l. After completing LP for each class, the j value is incremented to j + u to search a new region, and the label information is propagated to the complete dataset using the optimized S_ij. All the operations, namely label learning, similarity learning, and projection matrix learning, are done simultaneously. The mechanism discussed above is outlined in Algorithm 2, and a mathematical explanation for an instance of the FV dataset is provided in Example 2 of the Appendix.
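The decomposition-driven LP of (40)-(43) can be sketched in a one-shot form as follows; the toy similarity matrix, the single labeled sample per class, and the argmax class assignment are simplifying assumptions over the article's iterative, region-by-region update.

```python
import numpy as np

def propagate_labels(S, y_l, l):
    """Sketch of the LP step built on the decomposition in (40): the
    unlabeled-labeled block s_ul,l carries the influence of each known
    label onto an unlabeled node, and each unlabeled FV takes the class
    with the largest accumulated similarity mass (a one-shot variant of
    the article's iterative update)."""
    classes = np.unique(y_l)
    Y = (y_l[:, None] == classes[None, :]).astype(float)  # one-hot P_l
    s_ul_l = S[l:, :l]                          # unlabeled-labeled block of (40)
    P_ul = s_ul_l @ Y                           # class mass per unlabeled node
    return classes[np.argmax(P_ul, axis=1)]

# four nodes: the first l = 2 are labeled, one per class; two are unlabeled
S = np.array([[0.0, 0.9, 0.1, 0.0],
              [0.9, 0.0, 0.0, 0.1],
              [0.1, 0.0, 0.0, 0.9],
              [0.0, 0.1, 0.9, 0.0]])
y_l = np.array([1, 2])                          # labels of the first l nodes
y_ul = propagate_labels(S, y_l, l=2)            # labels for the unlabeled nodes
```

Because each row of the optimized S_ij is already a sparse probability distribution, the matrix product above is cheap, which reflects the section's claim that the decomposition removes the need for extra graph cutting.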
Finally, the rendered FDM is capable of handling multiple labels owing to the compaction of intraclass FVs with simultaneous decorrelation of interclass FVs, an attribute owed to the standard deviation-based normalization of the FVs presented in (11). This discrimination characteristic extends to the construction of the optimal S_ij for handling multiclass data. Also, when combined with LP, it warrants consistent scores for classifying multiclass or multilabel data by exploiting the label information, correlation, and local graph structure, with the criterion that there be at least one labeled FV for each class. To demonstrate the multiclass handling ability of FGLP, it is rigorously validated and comparatively analyzed on the UCI and a few other datasets in Section III.

Algorithm 2: Optimization Algorithm for FGLP.
Input: γ, λ from (32) and (39)
Step 6: Initialize the pseudo-label matrix P with the given labels P_l, treating P_ul as zeros
for all classes in X do
    Step 7: Update S_ij using (33)
    Step 8: Update w_ij using (35)
    Step 9: Update P using (43)
end for
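The alternating structure of Algorithm 2 can be sketched as a skeleton. The closed-form updates (33), (35), and (43) are defined in the paper, so this hypothetical sketch takes them as callables; the function names, the `n_iter` parameter, and the per-class sweep are illustrative assumptions about the control flow, not the authors' implementation.

```python
def fglp_optimize(X, P_l, classes, update_S, update_w, update_P,
                  gamma, lam, n_iter=1):
    """Skeleton of Algorithm 2: update_S, update_w, update_P stand in for
    the closed-form rules (33), (35), and (43) given in the paper."""
    # Step 6: pseudo-label matrix -- known labels kept, unlabeled rows zeroed
    P = [row[:] for row in P_l]
    S, w = None, None
    for _ in range(n_iter):
        for c in classes:                      # per-class sweep over X
            S = update_S(X, P, w, gamma)       # Step 7: similarity matrix (33)
            w = update_w(X, S, lam)            # Step 8: projection matrix (35)
            P = update_P(S, P)                 # Step 9: label matrix (43)
    return S, w, P
```

The point of the skeleton is the coupling: each similarity update sees the latest labels, each projection update sees the latest similarities, so label learning, similarity learning, and FS proceed in tandem as the text describes.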

A. Experimental Settings
The performance of the proposed FGLP on different datasets is investigated against traditional and recent contemporaries using relevant validation metrics and elaborated in the sections below.
1) Dataset Description: FGLP's effectiveness is examined by conducting numerous tests on heterogeneous real-world datasets in the context of LP for data categorization and FS. In particular, the model is evaluated on 23 UCI datasets [65], 14 of which are multiclass and the other nine binary. To analyze the suitability of the proposed model when extended to real-time applications, a few other heterogeneous image, text, and digit datasets are engaged along with the UCI datasets. The details of these datasets, with the number of classes and their sizes, are briefly given in Table II.
The datasets engaged for investigation are composed of diverse entities from different categories, defined with varying numbers of FVs, FV lengths, and numbers of classes, as given in Table II, especially for the face, object, scene, and text datasets. In particular, the data redundancy in the FVs of these datasets is well exploited by the introduced NHE-based feature extraction.

B. Relative Analysis
Apart from the parameter settings, the quantities of randomly selected labeled and unlabeled samples from each dataset remain the same for all methodologies used in the relative analysis. Herein, the mean classification accuracy along with the standard deviation over ten random splits is reported for a uniform comparison. Accordingly, if x_i is an unlabeled FV from the dataset X, and g_i and y_i are its predicted and ground-truth labels, then the labeling outcome is scored as in (44) and the model's accuracy is assessed using (45):

δ_i = { 1, if g_i = y_i
      { 0, elsewhere                          (44)

Accuracy = (Σ_i δ_i / ul) × 100%.             (45)

1) LP Models: To evaluate FGLP's proficiency, tests were conducted on 23 datasets representing binary and multiclass data from the UCI machine learning repository, with the achievements compared against trending and conventional predecessors. Initially, the FGLP model operates on binary datasets, and the achieved outcomes are rated against the graph-based models of [11]. Subsequently, a maximum of 10% of the FVs are selected as seed labels by fixing u as the number of unlabeled images per class. As the FV count differs in every class, the labeled information is fixed for each class and is graphically shown in Fig. 3.
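The evaluation protocol of (44) and (45) over ten random splits can be sketched as follows. This is a hypothetical harness, assuming any classifier exposed as a `predict` callable; the function name and split mechanics are illustrative, not part of the paper.

```python
import random
import statistics

def split_accuracy(labels, predict, label_fraction=0.1, n_splits=10, seed=0):
    """Mean and std of classification accuracy over random labeled/unlabeled
    splits, mirroring the protocol of (44)-(45). `predict` is any function
    mapping a list of labeled indices -> {index: predicted label} for the rest."""
    rng = random.Random(seed)
    n = len(labels)
    accs = []
    for _ in range(n_splits):
        idx = list(range(n))
        rng.shuffle(idx)
        n_lab = max(1, int(label_fraction * n))
        labeled, unlabeled = idx[:n_lab], idx[n_lab:]
        pred = predict(labeled)
        hits = sum(1 for i in unlabeled if pred[i] == labels[i])   # (44)
        accs.append(100.0 * hits / len(unlabeled))                 # (45)
    return statistics.mean(accs), statistics.stdev(accs)
```

Reporting mean ± standard deviation over the splits, rather than a single split, is what makes the comparison across methods uniform despite the random choice of seed labels.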
The accuracy curves shown in Fig. 3 indicate that FGLP outperforms its counterparts, which is owed to the fuzzy-based edge and node characterization ensuring interpretability, flexibility, and adaptability irrespective of the diverse-natured datasets. Moreover, the probability-based edge formation in FGs plays an important role in LP, with the propagation of only the first u elements leading to a reduction in false classifications. To demonstrate the inherent characteristics of FGLP aligning with the aforesaid qualities, this analysis is performed on the multiclass UCI datasets, whose data intricacies are considerably higher than those of their binary counterparts. Accordingly, the efficacy analysis is extended with the labeled information confined between 10% and 70%, and the achieved accuracies are tabulated in Table III alongside the recent semisupervised multicategory classification method [31].
FGLP's accuracies recorded in Table III demonstrate its effectiveness in the context of the available labels. When the label information exceeds 30% of the dataset size, FGLP dominates both the linear and nonlinear versions of SSL, and even with 10% labels its performance matches the nonlinear SSWTDS. This dominance is owed to the FDM, which bounds the data distribution within the standard deviation, thereby reducing outliers. Also, the fuzzy-based characterization of edges and nodes in structuring the FG confines their membership values to the range [0, 1], ensuring interpretability.

TABLE III RELATIVE ANALYSIS OF MULTICLASS CLASSIFICATION RESULTS ATTAINED USING FGLP
FGLP's suitability and applicability to real-world applications are investigated by engaging heterogeneous datasets, namely MSRA25, ORL, Coil-20, and USPS, consisting of face and object images, along with text and digit datasets such as CNAE-9 and Digi-1-10. Specifically, for the text dataset CNAE-9, the FVs are extracted using term frequency-inverse document frequency (TFIDF), followed by NHE-based processing, and later subjected to accuracy assessment. The accomplishments are compared with the recent graph-based LP models [30], [33], [34], [35], [36] and reported in Table IV.
Table IV numerically depicts the dominance of FGLP over the other eight models, while its accuracy in the single-labeled-sample case on Digi-1-10 and Coil-20 declines due to the global scaling nature of NHE, which neglects data intricacies when labels are very few, thereby indicating the need for adaptive FV scaling constrained by local data characteristics. Conversely, FGLP's accuracy surpasses its peers when the labeled samples are increased. Moreover, a steady increase in accuracy is witnessed on the other datasets and remains very high even with little labeled information, which is owed to the FG and the similarity matrix. Also, the tuned similarity matrix combining SoftMax with cross-entropy ensures swift data adaptability, which increases propagation accuracy. As the overall model building and processing is done in the fuzzy domain, the data values are constrained to [0, 1], which makes the prediction and propagation model more flexible by prioritizing edge formation and determining the nearest neighbors at ease.

2) Embedding Models: In addition, the analysis is extended to a few other image datasets, namely 8 Sports, Scene 15, Coil-20, and Extended YALE from the scene, object, and face categories, and compared with the recent embedding models [38], [39], [40], [41] in Table V.
The accuracies registered by the proposed model outperform the others, as given in Table V. This quality is attributed to the standard deviation-based normalization and the max term in (11), which promise better accuracies by reducing outliers and ensuring interclass separability in the FVs. Also, the cross-entropy-based cost function with SoftMax is another major reason for the increased propagation accuracy.

3) Ranking by Statistical Testing:
To further justify these relative analyses, a pairwise analysis using a nonparametric statistical hypothesis test, namely the Wilcoxon signed-rank test, was conducted on all the datasets. To test the hypothesis, a significance level α = 0.05 is assumed, and inferences are then made based on the obtained probability values (p-values) [66]. The choice α = 0.05 is fair for relative analysis; decreasing it demands greater sample strength, which makes the comparison of LP models more involved. If the attained p-value for a pair of models on a dataset is less than α, it indicates a difference in the models' performance, outlining the superiority of one over the other. Table VI presents the Wilcoxon signed-rank results (p-values) corresponding to the datasets and models from Table III.
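The pairwise test above can be sketched in pure Python. This is a minimal implementation of the two-sided Wilcoxon signed-rank test using the normal approximation (appropriate when the number of paired samples is not tiny); in practice a library routine such as `scipy.stats.wilcoxon` would be used, and the exact small-sample tables differ slightly from this approximation.

```python
import math

def wilcoxon_signed_rank(a, b):
    """Two-sided Wilcoxon signed-rank test (normal approximation).
    a, b: paired accuracy lists for two models over the same datasets."""
    d = [x - y for x, y in zip(a, b) if x != y]    # drop zero differences
    n = len(d)
    ranked = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:                                   # average ranks over ties
        j = i
        while j + 1 < n and abs(d[ranked[j + 1]]) == abs(d[ranked[i]]):
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[ranked[k]] = avg
        i = j + 1
    w_plus = sum(r for r, di in zip(ranks, d) if di > 0)
    mu = n * (n + 1) / 4                           # mean of W+ under H0
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mu) / sigma
    return math.erfc(abs(z) / math.sqrt(2))        # two-sided p-value
```

A p-value below α = 0.05 for a model pair on a dataset is then read, exactly as in the text, as evidence that one model's performance differs from the other's.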
The same superiority is observed in Table VIII, with the models ranked as FGLP > GCSE > DFEFP > DLA > FDEFS. From these Wilcoxon tests, it is inferred that the performance of the proposed FGLP is consistent across different types of data, irrespective of the number of classes and the size of the dataset. This consistency is owed to the optimal similarity matrix, which decorrelates the interclass FVs while maintaining intraclass firmness, making LP effective.

C. Effect of Feature Selection
To investigate the strength of the FS introduced in Section II-D, it is specifically validated on the Coil-20, ORL, and YALE image datasets, as their FVs are larger in dimension compared with those of the other datasets. Accordingly, the experiments are conducted by randomly selecting 10% of the FVs from each dataset as labeled, followed by selecting the top {10%, 30%, 50%, 70%, 90%} of features to maintain uniformity with its peers [67], [68], [69]; the results are given in Table IX.
From Table IX, it is evident that the proposed label learning process attains higher accuracies even at reduced dimensions. Irrespective of the FS percentage, FGLP dominates its predecessors on the ORL and YALE datasets, and on Coil-20 it supersedes the others in the 30% case. However, in the other Coil-20 cases, FGLP closely tails the top performers, falling only marginally behind them. On observing the dominators on Coil-20, it can be seen that they are mostly KNN-based FS schemes whose fixed choice of K has influenced the achievement, whereas they largely decline relative to FGLP-based FS in the other cases and datasets, an indication of the optimization introduced at every stage of FS. From these experiments, it is further inferred that, for a good balance between FV size and accuracy, at least 10% of the features should be selected, whereas no such behavior is noticed in its competitors. Overall, it is claimed that the introduced fuzzy entropy clearly distinguishes the labeled FVs from the unlabeled FVs. Also, the introduced lasso penalty tuned by the Lagrangian assists in efficient learning of the projection matrix, thereby effectively minimizing the FV size without compromising accuracy. Further, to justify the model's simplicity and amenability to real-world scenarios, the computational complexities in time and space are dealt with in Section IV.

IV. COMPLEXITY AND INTERPRETABILITY ANALYSIS
The computational complexity of the presented method is described in big-O notation to analyze the incurred time and space. The presented method's complexity accounts for the mathematical steps involved in the different stages, dealing with feature extraction followed by the learning of the similarity, projection, and label matrices. In contrast, most SSL approaches involve individual data training and testing, which highly escalates these complexities. The suggested methodology's complexity is detailed in Sections IV-A and IV-B.

A. Time Complexity
Formulating the time complexity commences with the NHE-based feature extraction, which depends on the data entity being handled. For instance, when dealing with images, the n images of the dataset (face datasets and handwritten numbers) having dimensions p × q incur the time complexity prescribed in (46) to compute the NHE:

O(n(pq + b))                                  (46)

where b is the number of bins. Similarly, for n text documents, the TFIDF features are first extracted in O(ng log(ng)), where g is the length of the sequence or N-gram. Later, NHE scales these TFIDF features, incurring O(n(a + b)) operations, where a is the number of elements in the TFIDF vector. Thus, the total time required for feature scaling of text documents is stated in (47):

O(ng log(ng) + n(a + b)).                     (47)

Thereafter, computing the distance between one FV and all FVs in the dataset requires O(nm) operations, where n is the number of FVs in the data and m the number of features per FV. The complexity rises to O(n²m) when constructing the distance matrix over all FVs. As the proposed work considers only L features for data distinction based on their intensity variations, the computational time for determining the fuzzy distance matrix reduces to O(n²L), as given in (48).
Similarly, learning w_ij with u nearest neighbors requires O(nωu), with n × ω being its dimensions. Further, the incorporated optimization model downscales the computing time, finally yielding the total time (T_total) for LP using FG given in (50). To signify FGLP's reduced realization time, additional LP experiments are performed on MATLAB R2021a running on an Intel Core i3-6100U CPU @ 2.30 GHz with 8 GB RAM, with the outcomes relatively presented in Table X along with the time complexities.
As given in Table X, the complexities of the widely popular and recent methods are considerably higher than that of FGLP. FGLP's reduced complexity greatly benefits implementation, signifying the model's simplicity, an essential quality for real-time extensions.

B. Space Complexity
Similar to the time complexity, the space complexity of the proposal is determined across all stages. In the first stage, the extracted NHE-based features (for n images or text documents) occupy O(nL) space. Later, the FDM requires O(n²) space for storing the similarity matrix. Thus, the overall space occupancy (S_total) is evaluated in (51). The time and space complexities evaluated in (50) and (51) for the proposed model are of quadratic order and are far lower than those of its peers, which are of higher order. Also, it is established that despite the computational reduction, the method does not compromise classification accuracy, making it more favorable for real-time heterogeneous data classification.

C. Interpretability Analysis
The learning of the projection matrix w_ij minimizes the FV size, reducing the computational complexity and thereby increasing FGLP's interpretability, as the two complement each other. Therefore, the mathematical metric defining the interpretability index I of a fuzzy system introduced in [43] is engaged in this work and presented in (52), with Q_F representing the overall fuzzy complexity determined using (53). Herein, Q_F relies completely on the FG structuring, encompassing the number of fuzzy relations for edge formation characterized by the number of fuzzy rules and sets. Based on these considerations, Q_F finally reduces to (54), wherein Q_rules is the complexity related to the number of rules firing at a time and Q_FS is the ratio of the number of fuzzy sets utilized at a time (fs_i) to the total number of fuzzy sets (T_fs), as presented in (55) and (56), respectively. T_rules = n(n−1)/2 corresponds to the total number of rules utilized in deciding the edge formation, and T_fs represents n fuzzy sets, each of length L. By considering these factors and utilizing two fuzzy sets at a time, Q_F is finally reorganized in (57). To better understand FGLP's reliability and expressiveness, the developed interpretability measure is examined on the eight datasets given in Table XI.
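The ingredients of the interpretability index can be sketched numerically. The closed forms of (52)-(57) are given in the paper and not reproduced here, so this is a hypothetical sketch under stated assumptions: Q_rules and Q_FS are taken as the ratios the text describes, Q_F as their product, and I as a monotone-decreasing map of Q_F.

```python
def interpretability_index(rules_fired, n, fuzzy_sets_used, total_fuzzy_sets):
    """Hypothetical sketch of (52)-(57): lower overall fuzzy complexity Q_F
    yields a higher interpretability index I (assumed mapping)."""
    t_rules = n * (n - 1) / 2                    # total edge-formation rules
    q_rules = rules_fired / t_rules              # rules firing at a time (55)
    q_fs = fuzzy_sets_used / total_fuzzy_sets    # fuzzy sets used ratio (56)
    q_f = q_rules * q_fs                         # overall complexity (54)
    return 1.0 / (1.0 + q_f)                     # assumed decreasing map (52)
```

The sketch captures the trend reported in Table XI: shrinking the FV size cuts the number of active rules and fuzzy sets, lowering Q_F and raising I.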
From Table XI, it is evident that the interpretability involved in FG structuring is highly dependent on the dataset dimensions; in particular, the reduction in FV size leads to a rise in interpretability, thereby outlining FGLP's enhanced fairness in LP.

V. CONCLUSION
This research proposed a simple yet versatile FG-based paradigm for LP. The key contribution is the creation of the FDM, which assists in grouping intraclass samples and separating interclass samples, a property that is crucial in designing the FG for LP. To attain the optimal similarity matrix, the newly framed cost function engages the SoftMax function coupled with cross-entropy to offer wider deviations among interclass FVs. The coalescence of these techniques makes FGLP more efficient than its predecessors, as evidenced by accuracy gains of about 20% in the analyses conducted on diverse heterogeneous datasets. Furthermore, the complexity analysis demonstrates the appropriateness of this model for real-time applications without sacrificing performance. Although FGLP showcases high accuracy with and without FS even with little label information, the model's computational complexity grows with the dataset size. It is also observed that the model's accuracy decays when the labeled information in large datasets is scarce. Moreover, the interpretability, flexibility, and adaptability characteristics of the FG require deeper exploration to accelerate simultaneous learning and reduce the model's computational complexity.

Definition 3: If A and B are fuzzy sets over the universes X and Y, then the relation R between the sets A and B is a cartesian product represented as R → A × B, and the membership function of the relation R is given in (3):

μ_R(x, y) = μ_{A×B}(x, y) = min(μ_A(x), μ_B(y)).        (3)
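Definition 3 is directly computable; the following sketch (with illustrative membership dictionaries of our own choosing) evaluates the min-based relation over the cartesian product of two fuzzy sets.

```python
def fuzzy_relation(mu_A, mu_B):
    """Membership of the relation R = A x B per Definition 3:
    mu_R(x, y) = min(mu_A(x), mu_B(y)).
    mu_A, mu_B: dicts mapping universe elements to memberships in [0, 1]."""
    return {(x, y): min(a, b)
            for x, a in mu_A.items()
            for y, b in mu_B.items()}
```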
Input: Dataset X = {X_l, X_ul}, where X_l corresponds to the l labeled FVs, X_ul to the ul unlabeled FVs, and n = l + ul. Determine the similarity matrix S_ij = 1 − d_{x_ij}, with S_ii = 0, and set w_ij = 0_{m×ω}, where ω is the size of an FV after FS.
Step 5: Determine the following parameters by meeting the KKT conditions: η, η_1 from (30) and (38).

Columns 4 and 5, given under "Length of FV" in Table II, demonstrate the potential of the introduced NHE-based feature extraction, which has significantly reduced the FV length.

TABLE IV PERFORMANCE COMPARISON ON FACE, OBJECT, TEXT, AND DIGIT DATASETS

TABLE V PERFORMANCE COMPARISON ON SCENE, OBJECT, AND FACE DATASETS

TABLE VI WILCOXON SIGNED-RANK TESTS WITH α = 0.05 FOR THE METHODS FROM TABLE III

TABLE VII WILCOXON SIGNED-RANK TESTS WITH α = 0.05 FOR THE LP METHODS (IN TABLE IV)

TABLE IX EFFECT OF FS ON PROPAGATION ACCURACY

TABLE X COMPARISON OF THE COMPUTATIONAL COMPLEXITY OF DIFFERENT METHODS

For calculating the similarity matrix, this computational time is again reduced to a very large extent based on the u nearest neighbors, constrained by u ≪ L, and presented in (49): O(n²u).

TABLE XI INTERPRETABILITY ANALYSIS