A kernel-free L1-norm regularized ν-support vector machine model with application

Kernel-free support vector machine (SVM) models have recently been proposed and studied to overcome several shortcomings of kernel-based SVM models. To further improve the classification accuracy of existing kernel-free quadratic surface support vector machine (QSSVM) models while keeping the computational cost low, a kernel-free ν-fuzzy reduced QSSVM with L1-norm regularization is proposed. The model is endowed with sparsity, which helps avoid excessive computation and overfitting, and it reduces to the standard linear model when the data points are (nearly) linearly separable. Computational experiments on several public benchmark data sets show that the proposed model outperforms several well-known binary classification models; the numerical results also indicate higher training efficiency than that of other kernel-free SVM models. Moreover, the proposed model is successfully applied, with good performance, to lung cancer subtype diagnosis on the gene expression RNAseq lung cancer subtype (LUAD/LUSC) data set from the TCGA database.


Introduction
Recently, machine learning has been applied extensively across contemporary science, including computer science, statistics, medicine, economics, engineering, and applied mathematics (Di & Joo, 2007; Mainali et al., 2021). Data classification is one of its most important research directions, with a large body of research and applications. As a computationally powerful tool, the support vector machine (SVM) has proven superior to other classification methods in various applications and continues to gain popularity (Cholette et al., 2017; Danenas & Garsva, 2015). The SVM was originally proposed by Cortes and Vapnik (1995); it has been well developed over the last two decades and applied to numerous practical problems. For a binary classification task, the SVM finds a separating hyperplane with maximal margin by solving a convex quadratic programming problem. Later, Schölkopf et al. (2000) proposed the so-called ν-support vector machine (ν-SVM), which controls the number of support vectors through a single parameter ν. Since ν is bounded, tuning the parameters of the ν-SVM is more convenient.
Many binary data sets arising in practice are not linearly separable. For such data sets, Cortes and Vapnik (1995) proposed an SVM model with kernel functions. By adopting a kernel, the SVM can produce nonlinear separating surfaces: the data are implicitly mapped into a higher-dimensional feature space, where the mapped points are classified linearly. The ν-SVM can likewise be equipped with kernels to handle nonlinear cases (Yan & Xu, 2007). However, it is unclear how to select a suitable kernel function, and tuning the associated hyper-parameters consumes a considerable amount of computational effort (Cristianini & Shawe-Taylor, 2000; Schölkopf & Smola, 2002). It is therefore natural to seek a practical approach that constructs nonlinear classifiers directly in the original input space.
To avoid the difficulties of the nonlinear kernel trick, Dagher (2008) proposed a kernel-free SVM. A kernel-free SVM separates the two classes directly with nonlinear surfaces, without mapping the data into a high-dimensional feature space; it requires no kernel pre-selection or kernel parameter tuning, which saves considerable computational effort and training time. Later, Bai et al. (2015) presented a kernel-free least squares QSSVM for disease diagnosis. Luo et al. (2016) introduced fuzzy techniques and presented a fuzzy kernel-free quadratic surface SVM model. Gao et al. (2019) presented a least squares twin QSSVM that separates the data with two quadratic surfaces. Nevertheless, kernel-free QSSVM models also have drawbacks. First, the number of variables in the SQSSVM grows quadratically with the feature dimension after the equivalent reformulation (Luo et al., 2016; Gao et al., 2022; Mousavi et al., 2019; Yan et al., 2018), which degrades the training efficiency of the SQSSVM. Moreover, the SQSSVM does not reduce to a hyperplane when the given data set is linearly separable (Mousavi et al., 2019). One potential remedy for both issues is to add an L1-norm regularization term to the objective function. This term induces sparsity in the model (Mousavi & Shen, 2019; Mousavi et al., 2019; Shen & Mousavi, 2018). As the penalty parameter of the L1-norm term increases, the decision variables of the quadratic surface become sparser, which helps prevent excessive computation and overfitting; with a large penalty parameter, the quadratic surface becomes nearly a hyperplane (Mousavi et al., 2019). In such circumstances, the model can also capture the sparsity pattern of the coefficient matrix of the true quadratic surface.
In this paper, we propose a kernel-free ν-fuzzy reduced quadratic surface support vector machine with L1-norm regularization for nonlinear binary classification, denoted (L1-ν-FRQSSVM). The proposed model is rigorously derived and studied, and computational experiments are carried out to assess its classification accuracy, training efficiency, and parameter sensitivity. SVM models have been successfully applied to many real-world problems, disease diagnosis among them. Lung carcinoma, a malignant tumor, is one of the most common cancers worldwide, and early diagnosis is important for treatment (Xie et al., 2021). Lung carcinoma has several subtypes, of which lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC) are the most prevalent. LUAD and LUSC differ in clinical presentation, diagnosis, treatment, and prognosis, so an accurate pathological diagnosis of the subtype is very important for patients with lung carcinoma. Several kernel-free QSSVM models (Luo et al., 2016) have been adopted for disease diagnosis; however, because of the vast number of features in gene expression RNAseq lung cancer data sets, the kernel-free QSSVM models in the literature may not be suitable for lung cancer subtype diagnosis. Advanced machine learning models (Malik et al., 2022; Tang et al., 2022) have also been proposed and adopted for this task. In this paper, using the gene expression RNAseq data set of LUAD/LUSC obtained from the TCGA database, the proposed model is shown to handle the subtype diagnosis of lung cancer successfully. The main contributions of this work are summarized as follows: 1. The proposed (L1-ν-FRQSSVM) model introduces an L1-norm regularization term. It promotes the sparsity of the model, so that
excessive computation and overfitting can be avoided. At the same time, the separating surface can approach a hyperplane arbitrarily closely when the data are linearly separable. 2. The proposed (L1-ν-FRQSSVM) model incorporates the ν-SVM idea. Since ν is a bounded parameter, it is selected from a fixed range, which makes parameter tuning more convenient; meanwhile, the model can control the number of support vectors through ν. 3. The separating quadratic surface produced by the proposed (L1-ν-FRQSSVM) model has no cross terms in its quadratic part, so the model is far more efficient in computing time than the other tested kernel-free QSSVM models. Moreover, the model assigns a fuzzy membership to every training data point, which lessens the relative influence of noise or outliers and yields a better separating surface. 4. We apply the proposed (L1-ν-FRQSSVM) model to the diagnosis of lung cancer subtypes (LUAD/LUSC). The data for this problem contain a vast number of features, which limits the usability of the known kernel-free QSSVM models but does not affect the usability of the proposed model. The computational results show good performance for lung cancer subtype diagnosis, indicating that the kernel-free model has good potential for real-world problems whose data have a vast number of features. The remainder of this article is organized as follows. Section 2 reviews several well-established support vector machine models, which form the basis of the proposed model. The proposed (L1-ν-FRQSSVM) model is presented in Section 3.
Computational experiments, performed on several public benchmark data sets and on the gene expression RNAseq data set of LUAD/LUSC from the TCGA database, are described in Section 4. Section 5 concludes the paper.

Preliminaries
This section introduces the notation and assumptions used throughout the paper and briefly reviews several relevant SVM models for binary classification.
Throughout the paper, lowercase letters denote scalars, lowercase bold letters denote vectors, and uppercase bold letters denote matrices. ℝ denotes the set of real numbers, ℝⁿ the n-dimensional real vectors, and ℝⁿ₊ the n-dimensional nonnegative real vectors. Iₙ denotes the n × n identity matrix, and 𝕊ⁿ the set of n × n symmetric matrices. The binary data set is defined as

$$\mathcal{D} = \{(x^{(i)}, y_i)\}_{i=1}^{N}, \quad x^{(i)} \in \mathbb{R}^n, \; y_i \in \{+1, -1\}. \tag{1}$$

Next, several relevant SVM models are briefly reviewed. For a binary classification task, the SVM seeks an optimal hyperplane that maximizes the separation margin while guaranteeing classification accuracy (Cortes & Vapnik, 1995). When the data are not linearly separable, or to handle training data with outliers and noise, a soft margin is used by introducing a slack vector $\xi = [\xi_1, \ldots, \xi_N]^\top \in \mathbb{R}^N_+$. The soft-margin linear SVM is

$$\min_{w, b, \xi} \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{N} \xi_i \quad \text{s.t.} \; y_i\big(w^\top x^{(i)} + b\big) \ge 1 - \xi_i, \; \xi_i \ge 0, \; i = 1, \ldots, N,$$

in which C > 0 is a given parameter and each ξᵢ measures the classification error of x⁽ⁱ⁾. Building on the SVM, the ν-SVM model was proposed by Schölkopf et al. (2000) for binary classification:

$$\min_{w, b, \xi, \rho} \; \frac{1}{2}\|w\|^2 - \nu\rho + \frac{1}{N} \sum_{i=1}^{N} \xi_i \quad \text{s.t.} \; y_i\big(w^\top x^{(i)} + b\big) \ge \rho - \xi_i, \; \xi_i \ge 0, \; \rho \ge 0.$$

To address the linearly inseparable case, Schölkopf et al.
(2000) also proposed ν-SVM models with kernels. The idea is to use a nonlinear map φ: ℝⁿ → ℝˡ (l > n) to send the data points into a higher-dimensional feature space, where the mapped data are classified. This kernel ν-SVM model is

$$\min_{w, b, \xi, \rho} \; \frac{1}{2}\|w\|^2 - \nu\rho + \frac{1}{N} \sum_{i=1}^{N} \xi_i \quad \text{s.t.} \; y_i\big(w^\top \varphi(x^{(i)}) + b\big) \ge \rho - \xi_i, \; \xi_i \ge 0, \; \rho \ge 0.$$

In this model, ν ∈ [0, 1] is the only parameter, and it effectively controls the number of support vectors (Schölkopf et al., 2000). Its dual problem can be written as

$$\min_{\alpha} \; \frac{1}{2} \sum_{i=1}^{N}\sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j K\big(x^{(i)}, x^{(j)}\big) \quad \text{s.t.} \; 0 \le \alpha_i \le \frac{1}{N}, \; \sum_{i=1}^{N} \alpha_i y_i = 0, \; \sum_{i=1}^{N} \alpha_i \ge \nu,$$

in which K(·,·) = φ(·)ᵀφ(·) is the kernel and ν is the given parameter. Several theoretical properties have been studied in (Schölkopf et al., 2000), including the fact that ν is bounded and can be used to control the number of support vectors. Nevertheless, for a given data set there is no general rule for automatically selecting the most appropriate kernel. Furthermore, the performance of a soft SVM model with a kernel depends chiefly on the parameters chosen in the kernel function, and tuning those parameters often takes much effort (Cristianini & Shawe-Taylor, 2000; Schölkopf & Smola, 2002). To avoid the difficulties of the nonlinear kernel trick, several kernel-free SVM models have been proposed and developed for nonlinear classification (Dagher, 2008; Luo et al., 2016; Mousavi et al., 2019). Rather than mapping data points into a higher-dimensional feature space, these models separate the data directly by producing nonlinear separating surfaces in the original space. One representative kernel-free model is the soft quadratic surface SVM (SQSSVM) (Luo et al., 2016), which maximizes the sum of relative geometrical margins of the data points while penalizing the margin errors:

$$\min_{W, b, c, \xi} \; \sum_{i=1}^{N} \big\|W x^{(i)} + b\big\|^2 + C \sum_{i=1}^{N} \xi_i \quad \text{s.t.} \; y_i\!\left(\tfrac{1}{2} x^{(i)\top} W x^{(i)} + b^\top x^{(i)} + c\right) \ge 1 - \xi_i, \; \xi_i \ge 0, \; W \in \mathbb{S}^n,$$

in which C denotes a given parameter. For ease of implementation, the (SQSSVM) model can be reformulated equivalently as a standard convex quadratic program (SQSSVM′) in the stacked variable formed from W and b, in which G and h⁽ⁱ⁾ (∀i = 1, . . .
) are stated in (Luo et al., 2016). The dual problem of (SQSSVM′) is also derived in (Luo et al., 2016). In real-life applications there is often a certain amount of unavoidable noise in the data, and this noise can reduce the classification accuracy of an SVM model. Lin and Wang (2002) therefore proposed an SVM model with a fuzzy property, introducing sᵢ ∈ (0, 1] as a weight indicating the importance of each data point x⁽ⁱ⁾ for classification: the closer sᵢ is to 1, the more important x⁽ⁱ⁾; conversely, the closer sᵢ is to 0, the more x⁽ⁱ⁾ resembles an outlier. With this fuzzy attribute, the SVM model is better able to classify data sets in which noise occurs. For a data set 𝒟 as stated in (1), the fuzzy membership sᵢ of x⁽ⁱ⁾ is defined as

$$s_i = \begin{cases} 1 - \dfrac{\|x^{(i)} - \bar{x}_+\|}{r_+ + \delta}, & y_i = +1, \\[4pt] 1 - \dfrac{\|x^{(i)} - \bar{x}_-\|}{r_- + \delta}, & y_i = -1, \end{cases}$$

in which δ > 0 is a small positive number used to keep the fuzzy membership from being zero, and $\bar{x}_+$ and $\bar{x}_-$ are the means of the positive and negative data points, respectively. The radii of the two classes are

$$r_+ = \max_{y_i = +1} \big\|x^{(i)} - \bar{x}_+\big\|, \qquad r_- = \max_{y_i = -1} \big\|x^{(i)} - \bar{x}_-\big\|.$$
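The membership computation above can be sketched in a few lines of pure Python; the toy data below are hypothetical and serve only to exercise the definition. As expected, the point farthest from its class mean receives the smallest weight.

```python
# Fuzzy membership assignment in the style of Lin & Wang (2002):
# each point's weight shrinks with its distance from its own class mean.
import math

def class_mean(points):
    n = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(n)]

def dist(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def fuzzy_memberships(X, y, delta=1e-3):
    # delta is the small positive constant keeping memberships > 0
    pos = [x for x, yi in zip(X, y) if yi == +1]
    neg = [x for x, yi in zip(X, y) if yi == -1]
    mean_pos, mean_neg = class_mean(pos), class_mean(neg)
    r_pos = max(dist(x, mean_pos) for x in pos)   # radius of positive class
    r_neg = max(dist(x, mean_neg) for x in neg)   # radius of negative class
    s = []
    for x, yi in zip(X, y):
        if yi == +1:
            s.append(1.0 - dist(x, mean_pos) / (r_pos + delta))
        else:
            s.append(1.0 - dist(x, mean_neg) / (r_neg + delta))
    return s

# Hypothetical toy data: the last negative point is an outlier.
X = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [6.0, 5.0], [9.0, 9.0]]
y = [+1, +1, -1, -1, -1]
s = fuzzy_memberships(X, y)
```

All memberships lie in (0, 1], and the outlier-like point gets the smallest one, so its misclassification cost is down-weighted in the fuzzy models below.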
The linear fuzzy SVM (Lin & Wang, 2002) is

$$\min_{w, b, \xi} \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{N} s_i \xi_i \quad \text{s.t.} \; y_i\big(w^\top x^{(i)} + b\big) \ge 1 - \xi_i, \; \xi_i \ge 0, \; i = 1, \ldots, N,$$

in which C > 0 is the given parameter and sᵢ ∈ (0, 1] is the fuzzy membership attached to the data point x⁽ⁱ⁾ (∀i = 1, . . ., N). Through sᵢ, the significance of the data point can be controlled, since a larger value of sᵢ makes x⁽ⁱ⁾ comparatively more important for classification. The dual problem of (FSVM) is

$$\max_{\alpha} \; \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N}\sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j \, x^{(i)\top} x^{(j)} \quad \text{s.t.} \; \sum_{i=1}^{N} \alpha_i y_i = 0, \; 0 \le \alpha_i \le s_i C,$$

in which C is the given parameter. Note that FSVM models can also be equipped with kernels for nonlinear classification. A variety of fuzzy-based SVM models have been presented in the literature (Tian et al., 2018; Luo et al., 2017) to handle different categories of real problems.

The L1 norm regularized ν-fuzzy reduced quadratic surface support vector machine model
The (L1-ν-FRQSSVM) model is presented in this section for binary classification, to improve the computational efficiency and generalization ability of existing approaches. Combining the ν-SVM idea with the SQSSVM gives the (ν-SQSSVM) model (Mousavi et al., 2019):

$$\min_{W, b, c, \xi, \rho} \; \frac{1}{2}\sum_{i=1}^{N} \big\|W x^{(i)} + b\big\|^2 - \nu\rho + \frac{1}{N}\sum_{i=1}^{N} \xi_i \quad \text{s.t.} \; y_i\!\left(\tfrac{1}{2} x^{(i)\top} W x^{(i)} + b^\top x^{(i)} + c\right) \ge \rho - \xi_i, \; \xi_i \ge 0, \; \rho \ge 0, \; W \in \mathbb{S}^n,$$
in which ν is the given parameter. Mousavi et al. (2019) showed that an L1-norm regularization term induces sparsity in the model: as its penalty parameter increases, the decision variables of the quadratic surface become sparser, and with a large penalty parameter the quadratic surface becomes nearly a hyperplane. We therefore add an L1-norm regularization term to (ν-SQSSVM) and propose the (L1-ν-SQSSVM) model below:

$$\min_{W, b, c, \xi, \rho} \; \frac{1}{2}\sum_{i=1}^{N} \big\|W x^{(i)} + b\big\|^2 + \lambda \sum_{j \le k} |W_{jk}| - \nu\rho + \frac{1}{N}\sum_{i=1}^{N} \xi_i \quad \text{s.t.} \; y_i\!\left(\tfrac{1}{2} x^{(i)\top} W x^{(i)} + b^\top x^{(i)} + c\right) \ge \rho - \xi_i, \; \xi_i \ge 0, \; \rho \ge 0, \; W \in \mathbb{S}^n,$$
where λ and ν are the given parameters. As with (SQSSVM′), an equivalent reformulation (L1-ν-SQSSVM′) is obtained by vectorizing the matrix variable W of (L1-ν-SQSSVM) into a vector, giving a convex optimization model in which the number of variables associated with the separating surface coefficients is of order O(n²). Consequently, when the feature dimension of the data set grows, the model becomes very large and its computational efficiency drops rapidly.
To circumvent the computational burden caused by vast numbers of features while still benefiting from the kernel-free QSSVM model, Gao et al. (2022) proposed approximately simplifying the surface: when designing the variable w, only the diagonal entries of the matrix W are kept, and the remaining upper-triangular entries are ignored. For the matrix W, the diagonal entries are the coefficients of the pure quadratic terms of the separating surface, while the off-diagonal entries are the coefficients of the cross terms. In other words, the idea is to separate the two classes with a reduced quadratic surface whose quadratic coefficient matrix is diagonal. Based on this idea, we propose the corresponding simplified (L1-ν-RSQSSVM) model:

$$\min_{W, b, c, \xi, \rho} \; \frac{1}{2}\sum_{i=1}^{N} \big\|W x^{(i)} + b\big\|^2 + \lambda \sum_{j=1}^{n} |W_{jj}| - \nu\rho + \frac{1}{N}\sum_{i=1}^{N} \xi_i \quad \text{s.t.} \; y_i\!\left(\tfrac{1}{2} x^{(i)\top} W x^{(i)} + b^\top x^{(i)} + c\right) \ge \rho - \xi_i, \; \xi_i \ge 0, \; \rho \ge 0, \; W \ \text{diagonal},$$

where ν and λ are the given parameters. By the same reformulation trick, let w ≜ dvec(W) collect the diagonal of W and z⁽ⁱ⁾ ≜ qvec(x⁽ⁱ⁾) collect the corresponding quadratic features of x⁽ⁱ⁾, so that (L1-ν-RSQSSVM) can be expressed equivalently as a reduced model (L1-ν-RSQSSVM′) in the variables (w, b, c, ξ, ρ). Let (W*, b*, c*, ξ*, ρ*) denote the optimal solution; the decision function is f*(x) = ½ xᵀW*x + b*ᵀx + c*. If f*(x) ≥ 0, the data point x belongs to the positive class; otherwise it belongs to the negative class. Finally, assigning a fuzzy membership sᵢ to every data point x⁽ⁱ⁾, the following (L1-ν-FRQSSVM) model is proposed:

$$\min_{W, b, c, \xi, \rho} \; \frac{1}{2}\sum_{i=1}^{N} s_i \big\|W x^{(i)} + b\big\|^2 + \lambda \sum_{j=1}^{n} |W_{jj}| - \nu\rho + \frac{1}{N}\sum_{i=1}^{N} s_i \xi_i \quad \text{s.t.} \; y_i\!\left(\tfrac{1}{2} x^{(i)\top} W x^{(i)} + b^\top x^{(i)} + c\right) \ge \rho - \xi_i, \; \xi_i \ge 0, \; \rho \ge 0, \; W \ \text{diagonal},$$

where the quadratic coefficient matrix W is diagonal (the term λ Σⱼ |Wⱼⱼ| penalizes the surface reduced to the form with the cross terms removed), ν and λ are the given parameters, sᵢ is the fuzzy membership, and ξᵢ is the loss term. The optimal solution of the (L1-ν-FRQSSVM) model is (W*, b*, c*, ξ*, ρ*), and the decision is made in the same way as for (L1-ν-RSQSSVM).
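To make the reduction concrete, the following sketch (with hypothetical coefficient values chosen only for illustration) shows that with a diagonal W = diag(w), the quadratic decision function f(x) = ½ xᵀWx + bᵀx + c is linear in the stacked variables (w, b, c) over an augmented feature vector of length 2n, which is what makes the reduced model so much smaller.

```python
# Reduced quadratic surface: with diagonal W the decision function is a
# linear function of (w, b, c) over qvec(x) = (x_1^2/2, ..., x_n^2/2, x_1, ..., x_n).

def qvec(x):
    # halved squared features followed by the original features
    return [0.5 * xi * xi for xi in x] + list(x)

def decision(w, b, c, x):
    z = qvec(x)
    coeffs = list(w) + list(b)            # (w, b) stacked; c is the offset
    return sum(ci * zi for ci, zi in zip(coeffs, z)) + c

# A hypothetical reduced surface in R^2: f(x) = x1^2 + x2^2 - 4 (a circle)
w, b, c = [2.0, 2.0], [0.0, 0.0], -4.0
inside = decision(w, b, c, [1.0, 1.0])    # negative -> negative class
outside = decision(w, b, c, [3.0, 0.0])   # positive -> positive class
```

The augmented vector has only 2n entries, versus the O(n²) entries needed when the cross terms of W are kept.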
Its equivalent reformulation follows in the same way. Remark 1. The slack variable ξᵢ gauges the misclassification error, while the fuzzy membership sᵢ expresses the relative in-class significance of the data point x⁽ⁱ⁾. Recall that the fuzzy membership appears in two terms of the objective function: each of sᵢ ξᵢ and sᵢ‖Wx⁽ⁱ⁾ + b‖² can be regarded as a gauge of ξᵢ and of ‖Wx⁽ⁱ⁾ + b‖², respectively, with weight sᵢ. When x⁽ⁱ⁾ tends to be an outlier, it should be less essential for classification, so its weight sᵢ is expected to be small, lessening the impact of ξᵢ and Wx⁽ⁱ⁾ + b on the separating surface generated by the proposed model.

Numeric experiment
In this section, computational experiments are carried out to examine the efficiency and effectiveness of the proposed (L1-ν-FRQSSVM) model. The experimental setup is presented first; the proposed model and several binary classification models are then tested on a few public benchmark data sets.

Test settings
We compared the proposed model to several benchmark models, including well-developed SVM models such as SVMs with quadratic and RBF kernels, ν-SVM models with quadratic and RBF kernels, and the fuzzy SVM (Lin & Wang, 2002). We also implemented kernel-free SVM models: the SQSSVM model (Luo et al., 2016) and the ν-SQSSVM and ν-FSQSSVM models (Mousavi et al., 2019). In addition, other typical binary classification methods were tested for comparison: the decision tree, logistic regression, artificial neural network, and Gaussian naive Bayes models. Table 1 lists the abbreviation of each tested model and the commercial solver or package used to run it. Note that (L1-ν-FRQSSVM′) is solved for (L1-ν-FRQSSVM) using the Cplex solver.
All computational tests were run on a computer with two Intel(R) Xeon(R) Gold 5117 CPUs @ 2.00 GHz and 64 GB of memory. A five-fold cross-validation procedure was adopted for every test, and every test was repeated five times so that the results are statistically meaningful. The mean and standard deviation (SD) of the accuracy scores of each experiment were recorded to assess every model. We also recorded the CPU time, i.e., the time for running a model once with fixed parameters.
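The evaluation protocol just described can be sketched as follows; `evaluate` is a hypothetical stand-in for training and testing one model on one train/test split, and the dummy evaluator exists only to exercise the loop.

```python
# Five-fold cross validation, repeated five times, reporting the mean and
# standard deviation of the accuracy scores.
import random
import statistics

def kfold_indices(n, k, seed):
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]   # k disjoint folds covering 0..n-1

def repeated_cv(n, evaluate, k=5, repeats=5):
    scores = []
    for rep in range(repeats):
        for test_fold in kfold_indices(n, k, seed=rep):
            test = set(test_fold)
            train = [i for i in range(n) if i not in test]
            scores.append(evaluate(train, test_fold))
    return statistics.mean(scores), statistics.stdev(scores)

# Dummy evaluator: a hypothetical model whose accuracy depends only on fold size.
mean_acc, sd_acc = repeated_cv(100, lambda tr, te: 0.9 + 0.001 * (len(te) % 3))
```

Each repeat reshuffles the data, so the 25 recorded scores reflect both fold variation and split variation, which is what the reported SD summarizes.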
The parameters were tuned by a grid method, as is common in the literature (Luo et al., 2016; Gao et al., 2021). Initially, a few tests on artificially designed data sets were carried out. We selected a quadratically separable artificial data set to pretest the proposed model and compare it with several selected benchmark SVM models. The classification results are shown in Fig. 1, where the blue and red dots denote the two classes, the hollow dots denote the test data, and the solid dots denote the training data. The five-fold cross-validation procedure was used, with a training-set percentage of 60%. The separating surface of each model is drawn with a different line in the figure, and the accuracy of each model is reported.
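A minimal sketch of the grid search over the two parameters: since ν is bounded in (0, 1], its grid is fixed, while λ is searched over a log-spaced range. The grids and the scoring function here are hypothetical illustrations, not the paper's actual settings.

```python
# Grid search over (nu, lambda): pick the pair maximizing cross-validated
# accuracy.  `cv_accuracy` stands in for running five-fold CV at one setting.
import itertools

nu_grid = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]      # bounded range for nu
lam_grid = [10.0 ** p for p in range(-3, 4)]             # 1e-3 ... 1e3

def grid_search(cv_accuracy):
    return max(itertools.product(nu_grid, lam_grid),
               key=lambda p: cv_accuracy(*p))

# Dummy score peaking at nu = 0.5, lambda = 1.0, just to exercise the search.
score = lambda nu, lam: -((nu - 0.5) ** 2) - abs(lam - 1.0) ** 0.5
best_nu, best_lam = grid_search(score)
```

Because ν lives in a fixed interval, its grid never needs rescaling per data set; only λ's range may need widening.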
Fig. 1 shows that the classification results produced by the proposed model equal, or even exceed, those of the other well-studied models. Indeed, the proposed (L1-ν-FRQSSVM) model produces quadratic separating surfaces for binary classification, so it is expected to classify the quadratically separable data set well. The improved classification accuracy motivated us to carry out further computational tests to compare the performance of the proposed model with that of the others.

Experimented on public benchmark data
(L1-ν-FRQSSVM) was tested on several benchmark data sets, whose basic information is given in Table 2. For each data set, five-fold cross-validation was employed during training, with the parameters tuned by the grid-search approach. The mean and SD of the accuracy scores of each tested model were recorded, as was the training CPU time, for comparing computational efficiency.
From the results in Tables 3 and 4, we make the following observations: • (L1-ν-FRQSSVM) achieves the highest or second highest mean accuracy score on the tested data sets. Note that Table 2 contains both large- and small-scale data sets, which supports the effectiveness of the proposed model on data sets of different scales.
• The CPU time of the proposed (L1-ν-FRQSSVM) model is short on most of the data sets. With the same solver (Cplex), the CPU time consumed by (L1-ν-FRQSSVM) is less than that of the SQSSVM, ν-SQSSVM, and ν-FSQSSVM models. Indeed, (L1-ν-FRQSSVM) has fewer variables than those models, which lowers its computational complexity.
• Note that, apart from the kernel-free QSSVM models, all the other tested models are implemented with professionally maintained Scikit-learn packages. Even where the proposed model consumes a longer CPU time, it remains acceptable. To visualize the differences in performance between our model and the others on different data sets, we selected the Brain tumor, Heart, Ecoli, and Pima data sets from the benchmark and drew the corresponding box plots shown in Fig. 2. Our model performs best among all models, and the standard deviation of its accuracy is small and fluctuates little, indicating good generalization ability. Statistical tests were also applied to show that the proposed model outperforms the other tested models. We first computed the mean rank of the accuracy scores of all tested models on the public benchmark data sets, as reported in Table 5; the proposed L1-ν-FRQSSVM ranks best among all tested models. We then carried out the Friedman test (Zhou, 2021) on all tested models. Let k be the number of models, N the number of data sets, and rᵢ the mean rank of the i-th model (i = 1, . . ., k). The statistics τ_{χ²} and τ_F are

$$\tau_{\chi^2} = \frac{12N}{k(k+1)} \left( \sum_{i=1}^{k} r_i^2 - \frac{k(k+1)^2}{4} \right), \qquad \tau_F = \frac{(N-1)\,\tau_{\chi^2}}{N(k-1) - \tau_{\chi^2}},$$

in which τ_{χ²} follows a χ² distribution with k − 1 degrees of freedom, and τ_F follows an F-distribution with k − 1 and (k − 1)(N − 1) degrees of freedom. With the results in Table 5, we obtain τ_{χ²} = 82.0409 and τ_F = 11.1916 for k = 13 and N = 16. The critical value of the F-distribution at the 95% significance level is 2.4589, which is less than τ_F = 11.1916. We can therefore confidently reject the null hypothesis that the performances of all the tested models are alike; in other words, there is a significant difference among the tested models.
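As a sanity check on the reported statistics, a minimal computation (assuming the standard conversion between the chi-square and F forms of the Friedman statistic) reproduces τ_F from τ_{χ²}:

```python
# Friedman test statistics: convert the chi-square form tau_chi2 to the
# F form via tau_F = (N - 1) * tau_chi2 / (N * (k - 1) - tau_chi2).
# tau_chi2 itself would come from the mean ranks r_i as
# 12*N/(k*(k+1)) * (sum(r_i**2) - k*(k+1)**2/4).

def friedman_F(tau_chi2, k, N):
    return (N - 1) * tau_chi2 / (N * (k - 1) - tau_chi2)

# Values reported in the text: k = 13 models, N = 16 data sets.
tau_F = friedman_F(82.0409, k=13, N=16)
```

Plugging in τ_{χ²} = 82.0409 indeed yields τ_F ≈ 11.19, matching the value quoted above.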
To inspect the pairwise differences between models, the Nemenyi post hoc test is carried out (Gao et al., 2022; Zhou, 2021). The critical difference CD is computed as

$$CD = q_{0.05} \sqrt{\frac{k(k+1)}{6N}}, \tag{5}$$

in which q_{0.05} is the critical value of the Tukey distribution (Gao et al., 2022; Zhou, 2021). Two models differ significantly if the difference of their mean ranks exceeds CD. In our case, q_{0.05} = 3.354 and CD = 4.5933. Accordingly, the Nemenyi test finds significant differences between the proposed (L1-ν-FRQSSVM) model and all the other tested models except the SQSSVM and ν-SQSSVM models. Even so, the mean rank of the proposed model exceeds that of the SQSSVM and ν-SQSSVM models.
Remark 2. From the results in Tables 3 and 4, together with the statistical tests, it can be safely inferred that the proposed (L1-ν-FRQSSVM) model is strongly competitive among all tested models, and it is worthwhile to apply it to real problems.

Application to diagnose the lung cancer subtypes
The proposed (L1-ν-FRQSSVM) model is applied to the diagnosis of lung carcinoma subtypes (LUAD/LUSC) in this section. The background and the data set are introduced briefly first.

Background and data
Lung carcinoma is a category of malignant tumor that usually originates in the cells of the lung tissue and can invade surrounding tissues and spread to other sites. It is one of the most common carcinomas worldwide and one of the leading causes of cancer-related death. The survival rate for patients with lung carcinoma is merely 17.8%, but it is significantly higher when the disease is detected while still localized in the lungs; nevertheless, only 15% of cases are detected at an early stage (Cabrera et al., 2015). Early diagnosis is therefore vital to the therapy of lung carcinoma. There are several types of lung cancer, of which two common types, lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC), make up the major categories (Yu et al., 2020). LUAD and LUSC may differ in clinical presentation, diagnosis, treatment, and prognosis. Clinicians develop individualized treatment plans based on the patient's condition, pathological diagnosis, and stage, to maximize patient outcomes and survival (Hsu & Si, 2018). An accurate pathological diagnosis and classification of the subtype is therefore very important for lung cancer patients.
In fact, machine learning methods are now frequently used in the medical community to subtype cancer from data such as gene expression RNAseq (Yu et al., 2020; Hsu & Si, 2018). In this subsection, the proposed (L1-ν-FRQSSVM) model, along with all the other tested models, is applied to lung cancer subtype prediction for LUAD and LUSC. The gene expression RNAseq data set for LUAD/LUSC used in this study was obtained from the TCGA database. The final data set contains 1129 data points in total: 576 in the LUAD class and 553 in the LUSC class. Each data point has 20531 features giving the transcriptional estimate for each gene. To prevent features with larger values from dominating those with smaller values, we performed maximum-minimum normalization on all data points after transforming the raw gene expression values.
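The maximum-minimum normalization step can be sketched as below; the toy matrix is hypothetical, and each feature (column) is mapped independently into [0, 1].

```python
# Per-feature maximum-minimum normalization: each column is rescaled to
# [0, 1] so that no feature dominates by scale alone.

def min_max_normalize(X):
    n_features = len(X[0])
    lo = [min(row[j] for row in X) for j in range(n_features)]
    hi = [max(row[j] for row in X) for j in range(n_features)]
    out = []
    for row in X:
        # constant columns are mapped to 0.0 to avoid division by zero
        out.append([(v - lo[j]) / (hi[j] - lo[j]) if hi[j] > lo[j] else 0.0
                    for j, v in enumerate(row)])
    return out

# Hypothetical toy values: two features on very different scales.
X = [[2.0, 100.0], [4.0, 300.0], [6.0, 200.0]]
Xn = min_max_normalize(X)
```

After normalization both columns span [0, 1], so distance-based quantities (such as the fuzzy memberships) are not dominated by the larger-scale feature.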

Numerical tests on lung cancer subtype data sets
The number of features in the lung cancer subtype (LUAD/LUSC) data set is n = 20531. Because of this enormous dimensionality, we applied Principal Component Analysis (PCA) to reduce the feature dimension, retaining 90% of the variance and obtaining a reduced data set with n = 497 features. For the SQSSVM models, even with n = 497 the reformulation still has n(n + 1)/2 = 123753 variables, which exceeds the memory and arithmetic power of the computer, so we do not test this data set with the SQSSVM models. In contrast, the proposed (L1-ν-FRQSSVM) model encountered no computational problems. All the results are listed in Table 6, and the corresponding box plots are depicted in Fig. 3. We make the following observations: • From the results in Table 6 and the box plots, the proposed (L1-ν-FRQSSVM) model performs better than the other tested models in terms of classification accuracy. In addition, the SD of its accuracy scores is the smallest among the tested models. In sum, the (L1-ν-FRQSSVM) model offers the steadiest and most accurate classifier among all tested models for diagnosing the lung cancer subtypes.
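The variable counts behind this scalability argument can be sketched directly; the reduced count below assumes the diagonal-plus-linear-plus-offset parameterization described in Section 3.

```python
# Variable counts for the separating surface: a full quadratic surface in
# R^n has n*(n+1)/2 free entries in its symmetric matrix W, while the
# reduced (diagonal) surface keeps only the n diagonal terms plus the n
# linear terms and one offset.

def full_qssvm_vars(n):
    return n * (n + 1) // 2      # upper-triangular entries of W

def reduced_qssvm_vars(n):
    return 2 * n + 1             # diag(W), b, and c

full = full_qssvm_vars(497)      # for the PCA-reduced data, as quoted above
reduced = reduced_qssvm_vars(497)
```

At n = 497 the full model needs 123753 surface variables versus 995 for the reduced model, which is why only the latter fits the available memory.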

•
Note that the representative kernel-free QSSVM models, including SQSSVM, ν-SQSSVM, and ν-FSQSSVM, cannot handle the LUAD/LUSC data set in the computing environment of this article, because of the vast number of features. The proposed model, by contrast, can cope with this problem, and it also performs better than the other tested models in terms of accuracy. The CPU time consumed by the proposed model is acceptable, even though it is longer than that of several of the tested models that run on commonly used solvers. By capitalizing on the reduced quadratic surface for separation, the proposed (L1-ν-FRQSSVM) may handle real problems more effectively than the other kernel-free QSSVM models. • The proposed (L1-ν-FRQSSVM) also outperforms all the other tested well-known models in classification accuracy. Unlike nonlinear SVM models with kernels, the proposed (L1-ν-FRQSSVM) model requires no kernels and tunes no kernel parameters, saving a great deal of effort in practice.

•
By introducing the L1-norm regularization term, the sparsity of the model is promoted, so that excessive computation and overfitting can be avoided; at the same time, the separating surface can approach a hyperplane arbitrarily closely when the data are linearly separable. The proposed model also incorporates the ν-SVM idea: since ν is a bounded parameter selected from a fixed range, parameter tuning is more convenient.

•
Numerical tests were carried out to illustrate the impact of the above parameters on classification accuracy. Furthermore, the promising numerical results on several artificial and public benchmark data sets show the effectiveness of the proposed model in handling real binary classification problems.
Finally, the high efficiency of the proposed algorithm for the (L1-ν-FRQSSVM) model has been validated by its short training CPU time.

•
In particular, the proposed model has shown the anticipated effectiveness and acceptable efficiency on the lung cancer data set. Its gratifying performance for lung cancer subtype diagnosis offers another dependable machine learning tool in the domain of lung cancer subtype diagnosis research.

Fig. 2. Boxplots of the results of the tested models: (a) Brain tumor; (b) Heart; (c) Ecoli; (d) Pima. Subsequent figures show the sensitivity of the parameters λ and ν on the Brain tumor, Heart, Ecoli, and Pima data sets from the benchmark. It can be seen that ν, as a parameter with a fixed range, makes the actual parameter tuning process easier and saves much training effort.

Fig. 4. Boxplots of the results of the tested models on the lung cancer subtype data.

Conclusions
To sum up, an advanced kernel-free nonlinear binary classification model, (L1-ν-FRQSSVM), has been presented. The model has been rigorously derived and investigated, and its classification efficiency and effectiveness have been verified by numerical tests on several public benchmark data sets. Moreover, the model has been employed to diagnose lung carcinoma subtypes on the LUAD/LUSC gene expression RNAseq data set from the TCGA database. The main findings of this research are summarized as follows:

Table 1
Abbreviations and solvers of experimented models.

Table 2
Public benchmark data sets.

Table 3
The results of the tested models on benchmark data.

Table 4
The results of the tested models on benchmark data (continued).

Table 5
Mean rank of the total experimented models on public benchmark data sets.

Table 6
The accuracy score consequences of the experimented models on lung cancer subtype figures.