Proceeding Paper

Factor Implicit Model of Machine Classification Learning Based on Factor Space Theory †

Ying Wang, Fanhui Zeng, Hui Sun, Xiaotong Liu, Kaile Lin and Kaijie Zhang
1 College of Science, Liaoning Technical University, Fuxin 123000, China
2 Institute of Intelligent Engineering and Math, Liaoning Technical University, Fuxin 123000, China
* Authors to whom correspondence should be addressed.
Presented at the 2023 Summit of the International Society for the Study of Information (IS4SI 2023), Beijing, China, 14–16 August 2023.
Comput. Sci. Math. Forum 2023, 8(1), 74; https://doi.org/10.3390/cmsf2023008074
Published: 14 August 2023
(This article belongs to the Proceedings of 2023 International Summit on the Study of Information)

Abstract: The algorithm proposed in this paper solves not only the binary classification problem but also the multi-class classification problem. On the basis of the sweeping serial classification algorithm, the combination of categories is defined, and a merging serial classification method for explicit and implicit factors is proposed. The algorithm steps are given and analyzed on numerical examples. The results show that the proposed merging serial sweeping classification method achieves factor concealment and is feasible and practical. The conclusions of this research on implicit factors in multi-class learning extend the theory and applications of factor space.

1. Introduction

In 1982, the Chinese scholar Wang Peizhuang [1] first proposed the concept of factor space. Factor space theory has become an important theoretical basis for analyzing the causal relationships among things and an indispensable mathematical foundation for mechanism-based artificial intelligence, providing a framework for the generation of concepts, mathematical reasoning, and the distinction and judgment of objects. With the continuous innovation of science and technology and the rapid development of computer network technology in recent years, machine learning [2,3,4,5,6,7] has become a key research direction of artificial intelligence and data science. Classification, a main task of machine learning, is served by an ever richer and more accurate family of algorithms, many of which address the binary classification problem [8]; finding algorithms that solve it more accurately remains an important research direction in artificial intelligence. The idea of implicit factors in factor space theory can be used to solve the classification problem of machine learning. In this paper, a classification algorithm is proposed on the basis of implicit-factor theory, a factor implicit model is constructed with this algorithm, and test classifications are carried out. The feasibility of this algorithm in finding the directions of key implicit factors is studied, so that factor space theory can develop to its next stage and better solve classification and discrimination problems in practice. A sweeping serial classification algorithm is proposed and a factor implicit model is constructed; the results show that the algorithm is feasible and effective.

2. Sweeping Serial Classification Algorithm

2.1. Sweeping Direction

Definition 1.
Given a separable two-class data set
X− = {xi− | i = 1, …, k}, X+ = {xj+ | j = 1, …, l}, k, l > 0,
the centers of the two classes are o− = (x1− + … + xk−)/k and o+ = (x1+ + … + xl+)/l. The vector w = o+ − o− is called the sweep direction.
When people sort objects, the eye sweeps from the center of one class to the center of the other; what is noticed in the overall view is this sweep direction, which is also called the sweep vector.
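To make the construction concrete, here is a minimal Python sketch of Definition 1; the names sweep_direction, X_neg and X_pos are ours, not from the paper:

```python
import numpy as np

def sweep_direction(X_neg: np.ndarray, X_pos: np.ndarray) -> np.ndarray:
    """Definition 1: w = o+ - o-, the vector between the two class centers."""
    o_neg = X_neg.mean(axis=0)  # o-: center of the negative class
    o_pos = X_pos.mean(axis=0)  # o+: center of the positive class
    return o_pos - o_neg
```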
Definition 2.
Given the sweep vector w, let a+ = minj{(xj+, w) | j = 1, …, l} for the positive class and b− = maxi{(xi−, w) | i = 1, …, k} for the negative class, and let J = a+ − b−, called the interval between the two classes with respect to the direction w. If the interval is positive, w is called an explicit factor of classification, and J* = (a+ + b−)/2 is called the partition wall: the two classes are separated according to the projections of the data on w. If the interval is not positive, so that a+ ≤ b−, then the closed interval [a+, b−] is the mixed domain of the projections of the two-class data in the direction w. Sample points whose projections fall in the mixed domain are called mixed points, and h(w) denotes the number of mixed points in the direction w.
The projection mixed domain is empty if and only if the interval is positive.
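Definition 2 admits the same kind of sketch: a+ and b− are extreme projections onto w, and the sign of the interval decides between a partition wall and a mixed domain (function and variable names are again illustrative):

```python
def interval_and_wall(X_neg, X_pos, w):
    """Definition 2: return (J, J*, mixed domain) for the direction w."""
    a_pos = min(x @ w for x in X_pos)   # a+: smallest positive-class projection
    b_neg = max(x @ w for x in X_neg)   # b-: largest negative-class projection
    J = a_pos - b_neg                   # interval between the two classes
    if J > 0:                           # w is an explicit factor
        return J, (a_pos + b_neg) / 2, None
    return J, None, (a_pos, b_neg)      # mixed domain [a+, b-]
```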
Given two-class sample point sets X− = {xi− | i = 1, …, k} and X+ = {xj+ | j = 1, …, l}, if the convex closures they generate in the factor space do not intersect, the pair is called a separable two-class data set.
The sweeping serial classification algorithm: given a separable two-class sample point set X− = {xi− | i = 1, …, k}, X+ = {xj+ | j = 1, …, l}, improve the sweep vector step by step from w0 to the explicit-implicit vector wT.
Step 0: Set t := 0 and take the sweep vector w0 as the initial vector;
Step 1: Project the sample points onto wt and find at+ and bt−. If the interval is positive, the solution is obtained: stop and output "the explicit factor is wt and the partition wall is Jt*". Otherwise, record wt and the projection mixed domain [at+, bt−], delete all non-mixed points to update the two-class data sets, set t := t + 1, and return to Step 1, repeating until an explicit factor is found.
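Putting Steps 0 and 1 together, a possible Python rendering of the whole loop (sweep_direction and interval_and_wall are the sketches above; the degenerate stop when nothing can be removed is our assumption for data containing coincident points of both classes, as happens in Example 3 of Section 3):

```python
def sweep_serial_train(X_neg, X_pos, max_iter=100):
    """Return the criterion sequence [(w_t, [a_t+, b_t-])], plus w_T and J*_T."""
    criteria = []
    for _ in range(max_iter):
        w = sweep_direction(X_neg, X_pos)
        J, wall, mixed = interval_and_wall(X_neg, X_pos, w)
        if mixed is None:                        # positive interval: w is explicit
            return criteria, w, wall
        a_pos, b_neg = mixed
        keep_pos = np.array([x for x in X_pos if a_pos <= x @ w <= b_neg])
        keep_neg = np.array([x for x in X_neg if a_pos <= x @ w <= b_neg])
        if len(keep_pos) == len(X_pos) and len(keep_neg) == len(X_neg):
            # nothing could be removed (e.g. coincident points of both classes):
            # stop with the degenerate wall
            return criteria, w, (a_pos + b_neg) / 2
        criteria.append((w, (a_pos, b_neg)))     # record the mixed domain
        X_pos, X_neg = keep_pos, keep_neg        # keep only the mixed points
    raise RuntimeError("the two classes may not be separable")
```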
Example 1.
X+ = {x1+ = (0, −1), x2+ = (1, 0), x3+ = (0, 2), x4+ = (−4, 2), x5+ = (−5, 5)},
X− = {x1− = (3, −4), x2− = (4, −3), x3− = (0, −3), x4− = (−1, −1), x5− = (0, 0)}.
Find the serial sweep vectors, namely the explicit-implicit vector.
Solution. First compute the two class centers, o+ = (−1.6, 1.6) and o− = (1.2, −2.2), which give the sweep vector w0 = o+ − o− = (−2.8, 3.8). The smallest positive-class projection is a+ = (x1+, w0) = −3.8, the left endpoint of the mixed domain; the largest negative-class projection is b− = (x5−, w0) = 0, its right endpoint.
Since (x1−, w0) = −23.6, (x2−, w0) = −22.6 and (x3−, w0) = −11.4 are all less than −3.8, the points x1−, x2− and x3− are judged as negative-class points, not mixed points.
Since (x3+, w0) = 7.6, (x4+, w0) = 18.8 and (x5+, w0) = 33 are all greater than 0, the points x3+, x4+ and x5+ are judged as positive-class points, not mixed points. Removing the points with a clear category leaves the set of mixed points. The new two-class data set is:
X1− = {x4− = (−1, −1), x5− = (0, 0)};
X1+ = {x1+ = (0, −1), x2+ = (1, 0)}.
First, the centers of the positive and negative classes are found: o1+ = (0.5, −0.5); o1− = (−0.5, −0.5).
Find the sweep vector:
w1 = o1+ − o1− = (1, 0).
The recalculation is:
a1+ = min{(x1+, w1), (x2+, w1)} = min{0, 1} = 0,
b1− = max{(x4−, w1), (x5−, w1)} = max{−1, 0} = 0.
The interval is J1 = a1+ − b1− = 0. Because the interval of w1 is not positive, w1 is still not an explicit factor. Its projection mixed domain is [0, 0].
Continuing this process, the remaining mixed-point data sets are:
X2− = {x5− = (0, 0)}; X2+ = {x1+ = (0, −1)};
o2+ = (0, −1); o2− = (0, 0).
Find the sweep vector:
w2 = o2+ − o2− = (0, −1);
a2+ = min{(x1+, w2)} = 1,
b2− = max{(x5−, w2)} = 0.
The interval is J2 = a2+ − b2− = 1 > 0, so w2 is an explicit factor for X2− and X2+, with partition wall J2* = 0.5.
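Running the sketch above on the data of Example 1 reproduces this sequence: two mixed-domain steps, final direction (0, −1) and wall 0.5 (names as assumed earlier):

```python
X_pos = np.array([[0, -1], [1, 0], [0, 2], [-4, 2], [-5, 5]], dtype=float)
X_neg = np.array([[3, -4], [4, -3], [0, -3], [-1, -1], [0, 0]], dtype=float)
criteria, w_T, wall = sweep_serial_train(X_neg, X_pos)
print(len(criteria), w_T, wall)  # 2 [ 0. -1.] 0.5
```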
2.2. Test Steps
After the explicit-implicit factors are found and the partition wall is calculated, the explicit-implicit model is established to classify test points. The steps are as follows.
Step 1: Input the test data z = (z1, …, zn);
Step 2: Starting from t = 0, calculate the projection (z, wt) and test whether it falls in the mixed domain [at+, bt−];
Step 3: If (z, wt) is not in the mixed domain, then z is a non-mixed point for wt, and its category is determined by its projection: to the left of the mixed domain it is judged negative, and to the right it is judged positive;
Step 4: If (z, wt) is in the mixed domain and t + 1 ≠ T, treat the mixed points as a new two-class data set, set t := t + 1 and go to Step 2. When t = T, wT carries the partition wall J*, and the category of z is determined by the sign of (z, wT) − J*. Output the result and stop.
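In code, the four test steps collapse to a walk along the criterion sequence; this sketch uses the same assumed names as above:

```python
def sweep_serial_classify(z, criteria, w_T, wall):
    """Steps 1-4: classify z against the criterion sequence of SL."""
    for w, (a_pos, b_neg) in criteria:
        p = z @ w
        if p < a_pos:            # left of the mixed domain
            return -1
        if p > b_neg:            # right of the mixed domain
            return +1
        # otherwise z is a mixed point for w_t: fall through to t + 1
    return 1 if (z @ w_T - wall) > 0 else -1   # sign of (z, w_T) - J*
```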
Example 2.
Take the two-class data sets given in Example 1, input the test points z = (2.5, 1.5) and z′ = (2.5, −1.5), and identify their categories.
Solution. Example 1 gives the initial sweep vector w0 = (−2.8, 3.8) for the given sample point sets; it is not an explicit factor, but it has the projection mixed domain [−3.8, 0]. From this, compute the projection (z, w0) to see whether z has a definite category. Since (z, w0) = −1.3 lies in the mixed domain of w0, the category is not yet clear, so go one level further.
From Example 1, the new data sets are:
X1− = {x4− = (−1, −1), x5− = (0, 0)};
X1+ = {x1+ = (0, −1), x2+ = (1, 0)}.
The sweep vector w1 = o1+ − o1− = (1, 0) is obtained, and the projection mixed domain is [0, 0]. Since (z, w1) = 2.5 lies to the right of the mixed domain, z is judged to belong to the positive class.
Similarly, z′ is judged to belong to the negative class, since (z′, w0) = −12.7 lies to the left of the mixed domain of w0.

3. Merging Sweeping Serial Classification Algorithm

Let the data be labeled with K classes, so the whole training data set X can be written as X = X1 + … + XK, where the sign "+" represents the union of pairwise disjoint sets. Combine the last K − 1 categories into one class, writing X1′ = X2 + … + XK, so that X = X1 + X1′; applying the sweeping serial algorithm SL(X1, X1′) yields the classification criterion between X1 and X1′. Enter a test point z: if it is judged to belong to class 1, its class is determined and the procedure stops. If z belongs to X1′, combine the remaining K − 2 categories into one class, writing X2′ = X3 + … + XK, so that X1′ = X2 + X2′; applying SL(X2, X2′) yields the classification criterion between X2 and X2′. If z is judged to belong to X2, its class is determined and the procedure stops. If z belongs to X2′, combine the remaining K − 3 categories into X3′ = X4 + … + XK, and so on, until XK−1′ = XK.
Algorithm steps:
Input the K (>2)-class training data sets X1, …, XK;
Output the multi-class criterion sequence:
{{wkt, Hkt = [akt+, bkt−]} (t = 0, 1, …, T − 1); wkT, Jk*} (k = 1, …, K − 1);
k := 1; t := 1;
Step 1: Take Xk− = Xk as the negative training point set and Xk+ = Xk+1 + … + XK as the positive training point set;
Step 2: Apply SL(Xk−, Xk+) and output {{wkt, Hkt} (t = 0, 1, …, T − 1); wkT, Jk*};
Step 3: If k < K − 1, set k := k + 1 and go to Step 1; otherwise, stop.
Summary: Output the total criterion sequence HSL(X1, …, XK):
{{wkt, Hkt = [akt+, bkt−]} (t = 0, 1, …, T − 1); wkT, Jk*} (k = 1, …, K − 1).
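As a sketch, the merging procedure is a one-vs-rest wrapper around SL, corresponding to what the paper calls HSL; the function names reuse the sketches of Section 2, and the 0-based class index is an implementation convenience of ours:

```python
def merged_sweep_train(classes):
    """HSL: one SL criterion per k, class k vs. the merged classes after it."""
    return [sweep_serial_train(classes[k], np.vstack(classes[k + 1:]))
            for k in range(len(classes) - 1)]

def merged_sweep_classify(z, criteria_seq):
    """Walk k = 1, ..., K-1; the first negative verdict fixes the class."""
    for k, (criteria, w_T, wall) in enumerate(criteria_seq):
        if sweep_serial_classify(z, criteria, w_T, wall) == -1:
            return k                  # z is judged to belong to class X_{k+1}
    return len(criteria_seq)          # fell through every test: last class X_K
```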
Example 3.
Let K = 3, with three categories: lily, chrysanthemum and red rose. Four criterion factors were selected:
I(f1 = petal length) = {short, medium, long} = {1, 2, 3};
I(f2 = petal width) = {narrow, middle, wide} = {1, 2, 3};
I(f3 = petal color) = {light, medium, dark} = {1, 2, 3};
I(f4 = flower fragrance) = {light, medium, strong} = {1, 2, 3};
X1(lily) = {x11(2, 1, 3, 2), x12(1, 2, 3, 2), x13(2, 1, 3, 1), x14(2, 1, 2, 3), x15(1, 1, 3, 1)},
X2(chrysanthemum) = {x21(2, 3, 1, 3), x22(2, 3, 2, 3), x23(1, 2, 3, 2), x24(2, 3, 1, 3), x25(2, 3, 1, 3)},
X3(red rose) = {x31(2, 2, 1, 3), x32(3, 2, 1, 3), x33(3, 1, 1, 3), x34(3, 2, 2, 3), x35(3, 2, 2, 2)}.
The merging serial sweeping algorithm is used to establish the criterion sequence, and the test point x1 = (2, 1, 2, 2) is then classified.
k := 1, SL(X1, X2 + X3)
Step 1 Determine the positive and negative training point sets:
X1− = X1 = {x11, x12, x13, x14, x15};
X1+ = X2 + X3 = {x21, x22, x23, x24, x25, x31, x32, x33, x34, x35};
Step 2 Calculate the two class centers o1+, o1− and the sweep vector:
w1 = o1+ − o1−;
o1+ = (o2 + o3)/2 = (2.3, 2.3, 1.5, 2.8),
o1− = o1 = (1.6, 1.2, 2.8, 1.8),
w1 = (0.7, 1.1, −1.3, 1);
Step 3 Calculate the lower limit of the positive projections a1+ and the upper limit of the negative projections b1−:
(x21, w1) = 6.4, (x22, w1) = 4.1, (x23, w1) = 1, (x24, w1) = 6.4, (x25, w1) = 6.4,
(x31, w1) = 5.1, (x32, w1) = 5.8, (x33, w1) = 6, (x34, w1) = 4.7, (x35, w1) = 3.6,
a1+ = min{(x, w1) | x ∈ X1+} = 1;
(x11, w1) = 1.9, (x12, w1) = 1, (x13, w1) = −0.4, (x14, w1) = 2.9, (x15, w1) = −1.1,
b1− = max{(x, w1) | x ∈ X1−} = 2.9.
Step 4 Determine the projection mixed domain and partition wall.
Because a1+ < b1−, there is a projection mixed domain; record w1 = (0.7, 1.1, −1.3, 1) and H1 = [1, 2.9], then set t := t + 1.
t = 2
X2+ = {x23}, X2− = {x11, x12, x14}
Return to Step 2 and calculate the two class centers o2+, o2− and the sweep vector w2 = o2+ − o2−:
o2+ = x23 = (1, 2, 3, 2),
o2− = (1.7, 1.3, 2.7, 2.3),
w2 = (−0.7, 0.7, 0.3, −0.3).
Step 3 Calculate the lower limit of the positive projections a2+ and the upper limit of the negative projections b2−:
(x23, w2) = 1,
a2+ = min{(x, w2) | x ∈ X2+} = 1;
(x11, w2) = −0.4, (x12, w2) = 1, (x14, w2) = −1,
b2− = max{(x, w2) | x ∈ X2−} = 1.
Step 4 Determine the projection mixed domain and partition wall.
Because a2+ = b2− = 1, the projection mixed domain degenerates into the single point [1, 1], so stop; record T1 = 2 and the partition wall J1* = 1;
Step 5 Sort out the criterion sequence:
t = 1: w1 = (0.7, 1.1, −1.3, 1), H1 = [a1+, b1−] = [1, 2.9];
t = 2 = T1: w2 = (−0.7, 0.7, 0.3, −0.3), J1* = 1.
k := 2, SL(X2, X3)
t = 1
Step 1 Determine the positive and negative training point sets:
X1+ = X3 = {x31, x32, x33, x34, x35},
X1− = X2 = {x21, x22, x23, x24, x25}.
Step 2 Calculate the two class centers o1+, o1− and the sweep vector w1 = o1+ − o1−:
o1+ = (2.8, 1.8, 1.4, 2.8),
o1− = (1.8, 2.8, 1.6, 2.8),
w1 = (1, −1, −0.2, 0).
Step 3 Calculate the lower limit of the positive projections a1+ and the upper limit of the negative projections b1−:
(x31, w1) = −0.2, (x32, w1) = 0.8, (x33, w1) = 1.8, (x34, w1) = 0.6, (x35, w1) = 0.6,
a1+ = min{(x, w1) | x ∈ X1+} = −0.2;
(x21, w1) = −1.2, (x22, w1) = −1.4, (x23, w1) = −1.6, (x24, w1) = −1.2, (x25, w1) = −1.2,
b1− = max{(x, w1) | x ∈ X1−} = −1.2.
Step 4 Determine the projection mixed domain and partition wall.
Since a1+ > b1−, the projection mixed domain is empty, so stop; record T2 = 1 and the partition wall J2* = (a1+ + b1−)/2 = −0.7.
Step 5 Sort out the criterion sequence:
t = 1 = T2: w1 = (1, −1, −0.2, 0), J2* = −0.7.
Summary: Output the criterion sequence for (X1, X2, X3):
k = 1:
t = 1: w11 = (0.7, 1.1, −1.3, 1), H11 = [1, 2.9];
t = 2 = T1: w12 = (−0.7, 0.7, 0.3, −0.3), J1* = 1;
k = 2:
t = 1 = T2: w21 = (1, −1, −0.2, 0), J2* = −0.7.
Input the test point x1 = (2, 1, 2, 2) and calculate (x1, w11) = 1.9. Because this value lies in the mixed-domain interval H11 = [1, 2.9], the category cannot yet be determined, so set t := t + 1 and calculate (x1, w12) = −0.7. Since T1 = 2 and (x1, w12) − J1* = −1.7 < 0, x1 belongs to the negative class. The negative class is the original class X1, so the purpose of discrimination is achieved: x1 is judged to be of class X1, a lily.
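For illustration, the Example 3 data can be pushed through the sketches of Sections 2 and 3; exact arithmetic makes some intermediate projections differ slightly from the rounded worked example, but the test point is again judged a lily:

```python
lily  = np.array([[2,1,3,2], [1,2,3,2], [2,1,3,1], [2,1,2,3], [1,1,3,1]], float)
chrys = np.array([[2,3,1,3], [2,3,2,3], [1,2,3,2], [2,3,1,3], [2,3,1,3]], float)
rose  = np.array([[2,2,1,3], [3,2,1,3], [3,1,1,3], [3,2,2,3], [3,2,2,2]], float)
seq = merged_sweep_train([lily, chrys, rose])
print(merged_sweep_classify(np.array([2, 1, 2, 2], float), seq))  # 0 -> X1, lily
```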

Author Contributions

Y.W. provided the algorithm and software coding for verification analysis and writing of the paper; F.Z. provided the guidance for writing and preparing the first draft; H.S., X.L., K.L., and K.Z. reviewed and edited the paper. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Science and Technology Research Project of the Education Department of Liaoning Province, "Research on Classification Algorithms and Knowledge Representation Based on Factor Space Theory" (project number LJ2019JL019), and by the Basic Scientific Research Project (key project) of colleges and universities of the Liaoning Provincial Department of Education, "Theory and Application Research of Factor Space-Based Intelligent Incubation under the Digital Background" (project number LJKZZ20220047).

Informed Consent Statement

Not applicable; this study did not involve humans.

Data Availability Statement

The data used in this article are all from the UCI dataset.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Wang, P. Fuzzy Sets and Category of Fuzzy Sets. Adv. Math. 1982, 1, 1–18.
2. Guo, M.W.; Zhao, Y.; Xiang, J.; Zhang, C.; Chen, Z. Review of object detection methods based on SVM. Control Decis. 2014, 29, 193–200.
3. Xu, J.H.; Zhang, X.G.; Li, Y.D. Advances in support vector machines. Control Decis. 2004, 19, 481–484.
4. Liu, J.; Cheng, J.; Chen, J. Support vector machine training algorithm. J. Inf. Control 2002, 45–50.
5. Wang, H.-Y.; Li, J.-H.; Yang, F.-L. Overview of support vector machine analysis and algorithm. J. Comput. Appl. Res. 2014, 31, 1281–1286.
6. Li, L.; Lin, H.-T. Ordinal Regression by Extended Binary Classification. In Advances in Neural Information Processing Systems 19: Proceedings of the Twentieth Annual Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 4–7 December 2006; MIT Press: Cambridge, MA, USA, 2006.
7. McKinna, L.I.W.; Furnas, M.J.; Ridd, P.V. A simple, binary classification algorithm for the detection of Trichodesmium spp. within the Great Barrier Reef using MODIS imagery. Limnol. Oceanogr. Methods 2011, 9, 50–66.
8. Yang, J.; Qiao, P.; Li, Y.; Wang, N. Machine learning classification problem and algorithm research review. J. Stat. Decis. 2019, 35, 36–40.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
