Proceeding Paper

Factor Implicit Model of Machine Classification Learning Based on Factor Space Theory †

Ying Wang, Fanhui Zeng, Hui Sun, Xiaotong Liu, Kaile Lin and Kaijie Zhang
1 College of Science, Liaoning Technical University, Fuxin 123000, China
2 Institute of Intelligent Engineering and Math, Liaoning Technical University, Fuxin 123000, China
* Authors to whom correspondence should be addressed.
Presented at the 2023 Summit of the International Society for the Study of Information (IS4SI 2023), Beijing, China, 14–16 August 2023.
Comput. Sci. Math. Forum 2023, 8(1), 74; https://doi.org/10.3390/cmsf2023008074
Published: 14 August 2023
(This article belongs to the Proceedings of 2023 International Summit on the Study of Information)

Abstract: The algorithm proposed in this paper solves not only the binary classification problem but also the multi-class classification problem. On the basis of the sweeping serial classification algorithm, the combination of categories is defined, and a merging serial classification method for explicit and implicit factors is proposed. The algorithm steps are given and analyzed on numerical examples. The results show that the proposed merging serial sweeping classification method achieves factor concealment and is feasible and practical. The conclusions of this research on implicit factors in multi-class learning extend the theory and applications of factor space.

1. Introduction

In 1982, the Chinese scholar Wang Peizhuang [1] first proposed the concept of factor space. Factor space theory has become an important theoretical basis for analyzing the causal relationships among things and an indispensable mathematical foundation for mechanism-based artificial intelligence, providing a framework for the generation of concepts, mathematical reasoning, and the distinction and judgment of objects. With the continuous innovation of science and technology and the rapid development of computer network technology in recent years, machine learning [2,3,4,5,6,7] has become a key research direction of artificial intelligence and data science. Classification, a main task of machine learning, is served by an ever richer and more accurate family of algorithms, many of which address the binary classification problem [8]; finding algorithms that solve it more accurately remains an important research direction in artificial intelligence. The idea of implicit factors in factor space theory can be used to solve the classification problem of machine learning. In this paper, a classification algorithm is proposed on the basis of implicit-factor theory, a factor implicit model is constructed with this algorithm, and test classifications are carried out. The feasibility of this algorithm in finding the directions of key implicit factors is studied, so that factor space theory can develop to its next stage and better solve classification and discrimination problems in practice. A sweeping serial classification algorithm is proposed and a factor implicit model is constructed; the results show that the algorithm is feasible and effective.

2. Sweeping Serial Classification Algorithm

2.1. Sweeping Direction

Definition 1.
Given a separable two-class data set
X− = {xi− | i = 1, …, k}, X+ = {xj+ | j = 1, …, l}, k, l > 0,
the centers of the two classes are o− = (x1− + … + xk−)/k and o+ = (x1+ + … + xl+)/l. The vector w = o+ − o− is called the sweep direction.
When people sort objects, the eye sweeps from the center of one class to the center of the other; what is noticed in the overall view is this sweep direction, which is also called the sweep vector.
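To make the construction concrete, here is a minimal Python sketch of Definition 1; the names sweep_direction, X_neg and X_pos are ours, not from the paper:

```python
import numpy as np

def sweep_direction(X_neg: np.ndarray, X_pos: np.ndarray) -> np.ndarray:
    """Definition 1: w = o+ - o-, the vector between the two class centers."""
    o_neg = X_neg.mean(axis=0)  # o-: center of the negative class
    o_pos = X_pos.mean(axis=0)  # o+: center of the positive class
    return o_pos - o_neg
```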
Definition 2.
Given the sweep vector w, let a+ = minj{(xj+, w) | j = 1, …, l} for the positive class and b− = maxi{(xi−, w) | i = 1, …, k} for the negative class, and let J = a+ − b−, called the interval between the two classes with respect to the direction w. If the interval is positive, w is called an explicit factor of classification, and J* = (a+ + b−)/2 is called the partition wall: the two classes are separated according to the projections of the data on w. If the interval is not positive, so that a+ ≤ b−, then the closed interval [a+, b−] is the mixed domain of the projections of the two-class data in the direction w. Sample points whose projections fall in the mixed domain are called mixed points, and h(w) denotes the number of mixed points in the direction w.
The projection mixed domain is empty if and only if the interval is positive.
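Definition 2 admits the same kind of sketch: a+ and b− are extreme projections onto w, and the sign of the interval decides between a partition wall and a mixed domain (function and variable names are again illustrative):

```python
def interval_and_wall(X_neg, X_pos, w):
    """Definition 2: return (J, J*, mixed domain) for the direction w."""
    a_pos = min(x @ w for x in X_pos)   # a+: smallest positive-class projection
    b_neg = max(x @ w for x in X_neg)   # b-: largest negative-class projection
    J = a_pos - b_neg                   # interval between the two classes
    if J > 0:                           # w is an explicit factor
        return J, (a_pos + b_neg) / 2, None
    return J, None, (a_pos, b_neg)      # mixed domain [a+, b-]
```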
Given two-class sample point sets X− = {xi− | i = 1, …, k} and X+ = {xj+ | j = 1, …, l}, if the convex closures they generate in the factor space do not intersect, the pair is called a separable two-class data set.
The sweeping serial classification algorithm: given a separable two-class sample point set X− = {xi− | i = 1, …, k}, X+ = {xj+ | j = 1, …, l}, improve the sweep vector step by step from w0 to the explicit-implicit vector wT.
Step 0: Set t := 0 and take the sweep vector w0 as the initial vector;
Step 1: Project the sample points onto wt and find at+ and bt−. If the interval is positive, the solution is obtained: stop and output "the explicit factor is wt and the partition wall is Jt*". Otherwise, record wt and the projection mixed domain [at+, bt−], delete all non-mixed points to update the two-class data sets, set t := t + 1, and return to Step 1, repeating until an explicit factor is found.
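Putting Steps 0 and 1 together, a possible Python rendering of the whole loop (sweep_direction and interval_and_wall are the sketches above; the degenerate stop when nothing can be removed is our assumption for data containing coincident points of both classes, as happens in Example 3 of Section 3):

```python
def sweep_serial_train(X_neg, X_pos, max_iter=100):
    """Return the criterion sequence [(w_t, [a_t+, b_t-])], plus w_T and J*_T."""
    criteria = []
    for _ in range(max_iter):
        w = sweep_direction(X_neg, X_pos)
        J, wall, mixed = interval_and_wall(X_neg, X_pos, w)
        if mixed is None:                        # positive interval: w is explicit
            return criteria, w, wall
        a_pos, b_neg = mixed
        keep_pos = np.array([x for x in X_pos if a_pos <= x @ w <= b_neg])
        keep_neg = np.array([x for x in X_neg if a_pos <= x @ w <= b_neg])
        if len(keep_pos) == len(X_pos) and len(keep_neg) == len(X_neg):
            # nothing could be removed (e.g. coincident points of both classes):
            # stop with the degenerate wall
            return criteria, w, (a_pos + b_neg) / 2
        criteria.append((w, (a_pos, b_neg)))     # record the mixed domain
        X_pos, X_neg = keep_pos, keep_neg        # keep only the mixed points
    raise RuntimeError("the two classes may not be separable")
```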
Example 1.
X+ = {x1+ = (0, −1), x2+ = (1, 0), x3+ = (0, 2), x4+ = (−4, 2), x5+ = (−5, 5)},
X− = {x1− = (3, −4), x2− = (4, −3), x3− = (0, −3), x4− = (−1, −1), x5− = (0, 0)}.
Find the serial sweep vectors, namely the explicit-implicit vector.
Solution. First compute the two class centers, o+ = (−1.6, 1.6) and o− = (1.2, −2.2), which give the sweep vector w0 = o+ − o− = (−2.8, 3.8). The smallest positive-class projection is a+ = (x1+, w0) = −3.8, the left endpoint of the mixed domain; the largest negative-class projection is b− = (x5−, w0) = 0, its right endpoint.
Since (x1−, w0) = −23.6, (x2−, w0) = −22.6 and (x3−, w0) = −11.4 are all less than −3.8, the points x1−, x2− and x3− are judged as negative-class points, not mixed points.
Since (x3+, w0) = 7.6, (x4+, w0) = 18.8 and (x5+, w0) = 33 are all greater than 0, the points x3+, x4+ and x5+ are judged as positive-class points, not mixed points. Removing the points with a clear category leaves the set of mixed points. The new two-class data set is:
X1− = {x4− = (−1, −1), x5− = (0, 0)};
X1+ = {x1+ = (0, −1), x2+ = (1, 0)}.
First, the centers of the positive and negative classes are found: o1+ = (0.5, −0.5); o1− = (−0.5, −0.5).
Find the sweep vector:
w1 = o1+ − o1− = (1, 0).
The recalculation is:
a1+ = min{(x1+, w1), (x2+, w1)} = min{0, 1} = 0,
b1− = max{(x4−, w1), (x5−, w1)} = max{−1, 0} = 0.
The interval is J1 = a1+ − b1− = 0. Because the interval of w1 is not positive, w1 is still not an explicit factor. Its projection mixed domain is [0, 0].
Continuing this process, the remaining mixed-point data sets are:
X2− = {x5− = (0, 0)}; X2+ = {x1+ = (0, −1)};
o2+ = (0, −1); o2− = (0, 0).
Find the sweep vector:
w2 = o2+ − o2− = (0, −1);
a2+ = min{(x1+, w2)} = 1,
b2− = max{(x5−, w2)} = 0.
The interval is J2 = a2+ − b2− = 1 > 0, so w2 is an explicit factor for X2− and X2+, with partition wall J2* = 0.5.
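Running the sketch above on the data of Example 1 reproduces this sequence: two mixed-domain steps, final direction (0, −1) and wall 0.5 (names as assumed earlier):

```python
X_pos = np.array([[0, -1], [1, 0], [0, 2], [-4, 2], [-5, 5]], dtype=float)
X_neg = np.array([[3, -4], [4, -3], [0, -3], [-1, -1], [0, 0]], dtype=float)
criteria, w_T, wall = sweep_serial_train(X_neg, X_pos)
print(len(criteria), w_T, wall)  # 2 [ 0. -1.] 0.5
```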
2.2. Test Steps
After the explicit-implicit factors are found and the partition wall is calculated, the explicit-implicit model is established to classify test points. The steps are as follows.
Step 1: Input the test data z = (z1, …, zn);
Step 2: Starting from t = 0, calculate the projection (z, wt) and test whether it falls in the mixed domain [at+, bt−];
Step 3: If (z, wt) is not in the mixed domain, then z is a non-mixed point for wt, and its category is determined by its projection: to the left of the mixed domain it is judged negative, and to the right it is judged positive;
Step 4: If (z, wt) is in the mixed domain and t + 1 ≠ T, treat the mixed points as a new two-class data set, set t := t + 1 and go to Step 2. When t = T, wT carries the partition wall J*, and the category of z is determined by the sign of (z, wT) − J*. Output the result and stop.
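In code, the four test steps collapse to a walk along the criterion sequence; this sketch uses the same assumed names as above:

```python
def sweep_serial_classify(z, criteria, w_T, wall):
    """Steps 1-4: classify z against the criterion sequence of SL."""
    for w, (a_pos, b_neg) in criteria:
        p = z @ w
        if p < a_pos:            # left of the mixed domain
            return -1
        if p > b_neg:            # right of the mixed domain
            return +1
        # otherwise z is a mixed point for w_t: fall through to t + 1
    return 1 if (z @ w_T - wall) > 0 else -1   # sign of (z, w_T) - J*
```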
Example 2.
Take the two-class data sets given in Example 1, input the test points z = (2.5, 1.5) and z′ = (2.5, −1.5), and identify their categories.
Solution. Example 1 gives the initial sweep vector w0 = (−2.8, 3.8) for the given sample point sets; it is not an explicit factor, but it has the projection mixed domain [−3.8, 0]. From this, compute the projection (z, w0) to see whether z has a definite category. Since (z, w0) = −1.3 lies in the mixed domain of w0, the category is not yet clear, so go one level further.
From Example 1, the new data sets are:
X1− = {x4− = (−1, −1), x5− = (0, 0)};
X1+ = {x1+ = (0, −1), x2+ = (1, 0)}.
The sweep vector w1 = o1+ − o1− = (1, 0) is obtained, and the projection mixed domain is [0, 0]. Since (z, w1) = 2.5 lies to the right of the mixed domain, z is judged to belong to the positive class.
Similarly, z′ is judged to belong to the negative class, since (z′, w0) = −12.7 lies to the left of the mixed domain of w0.

3. Merging Sweeping Serial Classification Algorithm

Let the data be labeled with K classes, so the whole training data set X can be written as X = X1 + … + XK, where the sign "+" represents the union of pairwise disjoint sets. Combine the last K − 1 categories into one class, writing X1′ = X2 + … + XK, so that X = X1 + X1′; applying the sweeping serial algorithm SL(X1, X1′) yields the classification criterion between X1 and X1′. Enter a test point z: if it is judged to belong to class 1, its class is determined and the procedure stops. If z belongs to X1′, combine the remaining K − 2 categories into one class, writing X2′ = X3 + … + XK, so that X1′ = X2 + X2′; applying SL(X2, X2′) yields the classification criterion between X2 and X2′. If z is judged to belong to X2, its class is determined and the procedure stops. If z belongs to X2′, combine the remaining K − 3 categories into X3′ = X4 + … + XK, and so on, until XK−1′ = XK.
Algorithm steps:
Input the K (>2)-class training data sets X1, …, XK;
Output the multi-class criterion sequence:
{{wkt, Hkt = [akt+, bkt−]} (t = 0, 1, …, T − 1); wkT, Jk*} (k = 1, …, K − 1);
k := 1; t := 1;
Step 1: Take Xk− = Xk as the negative training point set and Xk+ = Xk+1 + … + XK as the positive training point set;
Step 2: Apply SL(Xk−, Xk+) and output {{wkt, Hkt} (t = 0, 1, …, T − 1); wkT, Jk*};
Step 3: If k < K − 1, set k := k + 1 and go to Step 1; otherwise, stop.
Summary: Output the total criterion sequence HSL(X1, …, XK):
{{wkt, Hkt = [akt+, bkt−]} (t = 0, 1, …, T − 1); wkT, Jk*} (k = 1, …, K − 1).
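As a sketch, the merging procedure is a one-vs-rest wrapper around SL, corresponding to what the paper calls HSL; the function names reuse the sketches of Section 2, and the 0-based class index is an implementation convenience of ours:

```python
def merged_sweep_train(classes):
    """HSL: one SL criterion per k, class k vs. the merged classes after it."""
    return [sweep_serial_train(classes[k], np.vstack(classes[k + 1:]))
            for k in range(len(classes) - 1)]

def merged_sweep_classify(z, criteria_seq):
    """Walk k = 1, ..., K-1; the first negative verdict fixes the class."""
    for k, (criteria, w_T, wall) in enumerate(criteria_seq):
        if sweep_serial_classify(z, criteria, w_T, wall) == -1:
            return k                  # z is judged to belong to class X_{k+1}
    return len(criteria_seq)          # fell through every test: last class X_K
```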
Example 3.
Let K = 3, with three categories: lily, chrysanthemum and red rose. Four criterion factors were selected:
I(f1 = petal length) = {short, medium, long} = {1, 2, 3};
I(f2 = petal width) = {narrow, middle, wide} = {1, 2, 3};
I(f3 = petal color) = {light, medium, dark} = {1, 2, 3};
I(f4 = flower fragrance) = {light, medium, strong} = {1, 2, 3};
X1(lily) = {x11(2, 1, 3, 2), x12(1, 2, 3, 2), x13(2, 1, 3, 1), x14(2, 1, 2, 3), x15(1, 1, 3, 1)},
X2(chrysanthemum) = {x21(2, 3, 1, 3), x22(2, 3, 2, 3), x23(1, 2, 3, 2), x24(2, 3, 1, 3), x25(2, 3, 1, 3)},
X3(red rose) = {x31(2, 2, 1, 3), x32(3, 2, 1, 3), x33(3, 1, 1, 3), x34(3, 2, 2, 3), x35(3, 2, 2, 2)}.
The merging serial sweeping algorithm is used to establish the criterion sequence, and the test point x1 = (2, 1, 2, 2) is then classified.
k := 1, SL(X1, X2 + X3)
Step 1 Determine the positive and negative training point sets:
X1− = X1 = {x11, x12, x13, x14, x15};
X1+ = X2 + X3 = {x21, x22, x23, x24, x25, x31, x32, x33, x34, x35};
Step 2 Calculate the two class centers o1+, o1− and the sweep vector:
w1 = o1+ − o1−;
o1+ = (o2 + o3)/2 = (2.3, 2.3, 1.5, 2.8),
o1− = o1 = (1.6, 1.2, 2.8, 1.8),
w1 = (0.7, 1.1, −1.3, 1);
Step 3 Calculate the lower limit of the positive projections a1+ and the upper limit of the negative projections b1−:
(x21, w1) = 6.4, (x22, w1) = 4.1, (x23, w1) = 1, (x24, w1) = 6.4, (x25, w1) = 6.4,
(x31, w1) = 5.1, (x32, w1) = 5.8, (x33, w1) = 6, (x34, w1) = 4.7, (x35, w1) = 3.6,
a1+ = min{(x, w1) | x ∈ X1+} = 1;
(x11, w1) = 1.9, (x12, w1) = 1, (x13, w1) = −0.4, (x14, w1) = 2.9, (x15, w1) = −1.1,
b1− = max{(x, w1) | x ∈ X1−} = 2.9.
Step 4 Determine the projection mixed domain and partition wall.
Because a1+ < b1−, there is a projection mixed domain; record w1 = (0.7, 1.1, −1.3, 1) and H1 = [1, 2.9], then set t := t + 1.
t = 2
X2+ = {x23}, X2− = {x11, x12, x14}
Return to Step 2 and calculate the two class centers o2+, o2− and the sweep vector w2 = o2+ − o2−:
o2+ = x23 = (1, 2, 3, 2),
o2− = (1.7, 1.3, 2.7, 2.3),
w2 = (−0.7, 0.7, 0.3, −0.3).
Step 3 Calculate the lower limit of the positive projections a2+ and the upper limit of the negative projections b2−:
(x23, w2) = 1,
a2+ = min{(x, w2) | x ∈ X2+} = 1;
(x11, w2) = −0.4, (x12, w2) = 1, (x14, w2) = −1,
b2− = max{(x, w2) | x ∈ X2−} = 1.
Step 4 Determine the projection mixed domain and partition wall.
Because a2+ = b2− = 1, the projection mixed domain degenerates into the single point [1, 1], so stop; record T1 = 2 and the partition wall J1* = 1;
Step 5 Sort out the criterion sequence:
t = 1: w1 = (0.7, 1.1, −1.3, 1), H1 = [a1+, b1−] = [1, 2.9];
t = 2 = T1: w2 = (−0.7, 0.7, 0.3, −0.3), J1* = 1.
k := 2, SL(X2, X3)
t = 1
Step 1 Determine the positive and negative training point sets:
X1+ = X3 = {x31, x32, x33, x34, x35},
X1− = X2 = {x21, x22, x23, x24, x25}.
Step 2 Calculate the two class centers o1+, o1− and the sweep vector w1 = o1+ − o1−:
o1+ = (2.8, 1.8, 1.4, 2.8),
o1− = (1.8, 2.8, 1.6, 2.8),
w1 = (1, −1, −0.2, 0).
Step 3 Calculate the lower limit of the positive projections a1+ and the upper limit of the negative projections b1−:
(x31, w1) = −0.2, (x32, w1) = 0.8, (x33, w1) = 1.8, (x34, w1) = 0.6, (x35, w1) = 0.6,
a1+ = min{(x, w1) | x ∈ X1+} = −0.2;
(x21, w1) = −1.2, (x22, w1) = −1.4, (x23, w1) = −1.6, (x24, w1) = −1.2, (x25, w1) = −1.2,
b1− = max{(x, w1) | x ∈ X1−} = −1.2.
Step 4 Determine the projection mixed domain and partition wall.
Since a1+ > b1−, the projection mixed domain is empty, so stop; record T2 = 1 and the partition wall J2* = (a1+ + b1−)/2 = −0.7.
Step 5 Sort out the criterion sequence:
t = 1 = T2: w1 = (1, −1, −0.2, 0), J2* = −0.7.
Summary: Output the criterion sequence for (X1, X2, X3):
k = 1:
t = 1: w11 = (0.7, 1.1, −1.3, 1), H11 = [1, 2.9];
t = 2 = T1: w12 = (−0.7, 0.7, 0.3, −0.3), J1* = 1;
k = 2:
t = 1 = T2: w21 = (1, −1, −0.2, 0), J2* = −0.7.
Input the test point x1 = (2, 1, 2, 2) and calculate (x1, w11) = 1.9. Because this value lies in the mixed-domain interval H11 = [1, 2.9], the category cannot yet be determined, so set t := t + 1 and calculate (x1, w12) = −0.7. Since T1 = 2 and (x1, w12) − J1* = −1.7 < 0, x1 belongs to the negative class. The negative class is the original class X1, so the purpose of discrimination is achieved: x1 is judged to be of class X1, a lily.
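For illustration, the Example 3 data can be pushed through the sketches of Sections 2 and 3; exact arithmetic makes some intermediate projections differ slightly from the rounded worked example, but the test point is again judged a lily:

```python
lily  = np.array([[2,1,3,2], [1,2,3,2], [2,1,3,1], [2,1,2,3], [1,1,3,1]], float)
chrys = np.array([[2,3,1,3], [2,3,2,3], [1,2,3,2], [2,3,1,3], [2,3,1,3]], float)
rose  = np.array([[2,2,1,3], [3,2,1,3], [3,1,1,3], [3,2,2,3], [3,2,2,2]], float)
seq = merged_sweep_train([lily, chrys, rose])
print(merged_sweep_classify(np.array([2, 1, 2, 2], float), seq))  # 0 -> X1, lily
```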

Author Contributions

Y.W. provided the algorithm and software coding for verification analysis and writing of the paper; F.Z. provided the guidance for writing and preparing the first draft; H.S., X.L., K.L., and K.Z. reviewed and edited the paper. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Science and Technology Research Project of the Education Department of Liaoning Province, "Research on Classification Algorithms and Knowledge Representation Based on Factor Space Theory" (project number LJ2019JL019), and by the Basic Scientific Research Project (key project) of colleges and universities of the Liaoning Provincial Department of Education, "Theory and Application Research of Factor Space-Based Intelligent Incubation under the Digital Background" (project number LJKZZ20220047).

Informed Consent Statement

Not applicable; this study did not involve humans.

Data Availability Statement

The data used in this article are all from the UCI dataset.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Wang, P. Fuzzy Sets and Category of Fuzzy Sets. Adv. Math. 1982, 1, 1–18.
2. Guo, M.W.; Zhao, Y.; Xiang, J.; Zhang, C.; Chen, Z. Review of object detection methods based on SVM. Control Decis. 2014, 29, 193–200.
3. Xu, J.H.; Zhang, X.G.; Li, Y.D. Advances in support vector machines. Control Decis. 2004, 19, 481–484.
4. Liu, J.; Cheng, J.; Chen, J. Support vector machine training algorithm. J. Inf. Control 2002, 45–50.
5. Wang, H.-Y.; Li, J.-H.; Yang, F.-L. Overview of support vector machine analysis and algorithm. J. Comput. Appl. Res. 2014, 31, 1281–1286.
6. Li, L.; Lin, H.-T. Ordinal Regression by Extended Binary Classification. In Advances in Neural Information Processing Systems 19: Proceedings of the Twentieth Annual Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 4–7 December 2006; MIT Press: Cambridge, MA, USA, 2006.
7. McKinna, L.I.W.; Furnas, M.J.; Ridd, P.V. A simple, binary classification algorithm for the detection of Trichodesmium spp. within the Great Barrier Reef using MODIS imagery. Limnol. Oceanogr. Methods 2011, 9, 50–66.
8. Yang, J.; Qiao, P.; Li, Y.; Wang, N. Machine learning classification problem and algorithm research review. J. Stat. Decis. 2019, 35, 36–40.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
