Semi-supervised learning with constrained virtual support vector machines for classification of remote sensing image data

We introduce two semi-supervised models for the classification of remote sensing image data. The models are built upon the framework of Virtual Support Vector Machines (VSVM). Generally, VSVM follow a two-step learning procedure: a Support Vector Machine (SVM) model is learned to determine and extract the labeled samples that constitute the decision boundary with the maximum margin between thematic classes, i.e., the Support Vectors (SVs). The SVs govern the creation of so-called virtual samples. This is done by modifying, i.e., perturbing, the image features to which a decision boundary needs to be invariant. Subsequently, the classification model is learned a second time, using the newly created virtual samples in addition to the SVs, to find a new optimal decision boundary. Here, we extend this concept by (i) integrating a constrained set of semi-labeled samples when establishing the final model. The model constrainment, i.e., the selection mechanism for including solely informative semi-labeled samples, is built upon a self-learning procedure composed of two active learning heuristics. Additionally, (ii) we consecutively deploy semi-labeled samples for the creation of semi-labeled virtual samples by modifying the image features of semi-labeled samples that have become semi-labeled SVs after an initial model run. We present experimental results from classifying two multispectral data sets with a sub-meter geometric resolution. The proposed semi-supervised VSVM models exhibit the most favorable performance compared to related SVM and VSVM-based approaches, as well as (semi-)supervised CNNs, in situations with a very limited amount of available prior knowledge, i


Introduction
Techniques for robustly deriving thematic information from remote sensing image data are of high interest. The past paradigm shift from expert-based systems, which involved implementing dedicated if-then rules to extract thematic information, to machine learning-based methods, which derive rules automatically from empirical observations, remains relevant.
In recent years, deep learning methods in particular have gained increasing popularity (Li et al., 2019). The underlying model structure often exhibits high generalization capabilities when a substantial amount of prior knowledge is available. Models like fully convolutional neural networks (CNNs) (Long et al., 2015) enable the learning of a hierarchy of discriminative feature representations, often resulting in improved accuracy for pixel-level predictions, such as semantic segmentation, when ample training data is available. However, depending on the domain under analysis, collecting the prior knowledge required by machine learning-based methods can remain costly. Consequently, it can be desirable to implement techniques that, for example, facilitate the efficient collection and compilation of prior knowledge, such as cost-sensitive active learning (Persello et al., 2014; Geiß et al., 2018); augment the feature vector with initial model outcomes to enhance estimates with the same level of deployed prior knowledge (Geiß and Taubenböck, 2015; Geiß et al., 2022a; Geiß et al., 2022b; Zhu et al., 2021); or implement a model structure capable of achieving high generalization even when only a small amount of prior knowledge is available for model learning. Regarding the latter, Support Vector Machines (SVM) (Schölkopf and Smola, 2002) represent a framework that has shown excellent performance in situations with (i) a limited amount of prior knowledge, (ii) high-dimensional feature vectors deployed for classification, and (iii) highly complex and intrinsically non-linear class patterns (Volpi et al., 2013).
The SVM algorithm establishes a so-called hyperplane, i.e., a decision boundary, between the patterns of thematic classes. To account for non-linear classification problems, labeled samples can be projected into a higher-dimensional space using a non-linear transformation function. In this space, the hyperplane is positioned to maximize the margin between the patterns of thematic classes (Schölkopf and Smola, 2002). Importantly, this maximum margin is determined by two marginal decision boundaries, one on either side of the hyperplane, which are defined by the labeled samples nearest to the hyperplane: the Support Vectors (SVs) (Burges, 1998). Only these samples are necessary to establish the model and its corresponding class boundaries. This property makes the algorithm particularly well suited for creating accurate models from a limited amount of prior knowledge. Moreover, additional prior knowledge can be efficiently encoded into the model based on the SVs. Specifically, SVs can be strategically employed to make the classification model invariant. Here, invariance refers to model robustness to changes in the representation of objects induced by variations in their shape and size, spatial composition, or signal-to-noise ratio (Camps-Valls et al., 2014). Considering such changes in representation when determining optimal decision boundaries is referred to as model invariance (Izquierdo-Verdiguier et al., 2013). As a result, the learned decision boundaries of a model should enable high accuracy when classifying unlabeled samples, even in the presence of a substantial number of changed object representations (Geiß et al., 2019). Several ways of encoding invariances in an SVM model have been proposed: engineering kernel functions that lead to invariant SVM, generating artificially transformed samples, and hybrid methods that internalize the principles of both approaches (DeCoste & Schölkopf, 2002). In this context, we adopt the idea of generating a set of artificially transformed samples. Such so-called virtual samples are generated by modifying the image features to which the classification model needs to be invariant. Importantly, virtual samples are generated solely from the training samples that have become SVs in the initial learning stage of the model. This selectively enriches the training set deployed for the second learning stage of the model. This type of model belongs to the family of Virtual Support Vector Machine (VSVM)-based methods (DeCoste & Schölkopf, 2002).
Few studies in remote sensing image classification have explored the integration of virtual samples to achieve SVM model invariance. For instance, Izquierdo-Verdiguier et al. (2013) focused on creating invariances related to object rotations and sizes using altered square-shaped image subsets. Nevertheless, creating feasible invariances requires careful pre-engineering by a specialist. In contrast, Geiß et al. (2019) employ a self-learning technique to discard non-informative virtual samples that may be formed during a potentially arbitrary process of creating invariances. The goal of this procedure is to select informative virtual samples that closely resemble their corresponding SVs; otherwise, virtual samples may cause the model to diverge, potentially reducing its generalization capabilities. The self-learning strategy thus serves as a pruning mechanism: artificially transformed samples that exceed empirically determined class-specific distances from their associated SVs, or a specific margin distance, are excluded when the model is retrained. This ensures that the virtual samples employed in the final model closely resemble their corresponding SVs and satisfy margin similarity requirements with respect to previous decision boundaries, which mitigates the risk of model divergence.
Note that the implemented self-learning strategy internalizes principles of active learning (Tuia et al., 2009). Active learning heuristics aim to identify a few relevant unlabeled samples for prioritized labeling by an annotator (Samat et al., 2015). Relevance is, for instance, expressed by uncertainty. This rationale is implemented by selecting uncertain unlabeled samples located close to the hyperplane (Demir et al., 2011). In the self-learning strategy, the iterative human-machine interaction of active learning techniques is transformed into a machine-machine interaction (Geiß et al., 2019).
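As a hedged illustration of the uncertainty criterion described above, the following sketch selects the k unlabeled samples whose SVM decision values are closest to zero, i.e., nearest to the hyperplane. It assumes a binary classifier whose decision values f(x) are already available as an array (for instance from scikit-learn's `SVC.decision_function`); the function name `margin_sampling` is hypothetical.

```python
import numpy as np


def margin_sampling(decision_values: np.ndarray, k: int) -> np.ndarray:
    """Return the indices of the k samples closest to the hyperplane.

    decision_values: shape (n,), binary SVM outputs f(x); |f(x)| grows with
    the distance of a sample from the separating hyperplane.
    """
    uncertainty = np.abs(decision_values)
    # Ascending sort: the smallest |f(x)|, i.e., the most uncertain sample, comes first
    return np.argsort(uncertainty, kind="stable")[:k]


# Hypothetical decision values for five unlabeled samples
f = np.array([2.3, -0.1, 0.8, -1.7, 0.05])
print(margin_sampling(f, 2))  # [4 1]
```

In active learning, the returned samples would be passed to a human annotator; in the self-learning strategy, the same kind of margin criterion is evaluated automatically.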
Related mechanisms have also been followed in semi-supervised learning techniques (Dópido et al., 2013; Lu et al., 2016; Chen et al., 2022; Shu et al., 2022). Semi-supervised methods aim to encode the structural information of the unlabeled samples to better represent the thematic classes and to fit decision boundaries with a better generalization capability than when labeled samples alone are deployed. In a typical semi-supervised learning procedure, unlabeled samples are iteratively labeled with an initially learned model to create semi-labeled samples. Subsequently, the semi-labeled samples are deployed together with the labeled ones to learn a final supervised model. However, improperly constrained semi-labeled samples can lead to model divergence (Li and Zhou, 2015). As a result, query functions in active learning and semi-supervised models often employ opposing criteria. Active learning focuses on identifying the most uncertain unlabeled samples nearest to decision boundaries, since their subsequent labeling by an annotator is considered reliable. In contrast, query functions in semi-supervised learning prioritize the most reliable semi-labeled samples, which are often those farthest from the decision boundary (Persello and Bruzzone, 2014). Here, the VSVM framework with a self-learning strategy provides a unique opportunity to deploy a set of constrained semi-labeled samples for encoding invariances. It allows us to target the augmentation of samples used for model learning by selectively choosing semi-labeled samples that resemble existing SVs and meet margin similarity requirements with respect to previous decision boundaries. Our goal in this work is to improve the generalization capabilities of VSVM models by incorporating a set of constrained semi-labeled samples. In this way, the pool of samples from which the model is learned is enriched by informative semi-labeled samples. Since we utilize SVM/VSVM-specific model heuristics for this task, we naturally extend the VSVM framework in the context of semi-supervised learning.
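The opposing criterion of a classical semi-supervised query function can be sketched as follows. This is a minimal illustration, not the selection rule of the proposed models: it assumes a binary SVM with labels in {-1, +1}, pseudo-labels each unlabeled sample by the sign of f(x), and keeps the k samples per class that lie farthest from the hyperplane. The function name `select_reliable` is hypothetical.

```python
import numpy as np


def select_reliable(decision_values: np.ndarray, k_per_class: int):
    """Pseudo-label samples by the sign of f(x) and keep, per class, the
    k samples with the largest |f(x)|, i.e., farthest from the hyperplane."""
    pseudo = np.where(decision_values >= 0, 1, -1)
    keep = []
    for label in (-1, 1):
        idx = np.flatnonzero(pseudo == label)
        # Sort the candidates of this class by descending |f(x)|
        order = idx[np.argsort(-np.abs(decision_values[idx]), kind="stable")]
        keep.extend(order[:k_per_class].tolist())
    keep = np.array(sorted(keep), dtype=int)
    return keep, pseudo[keep]


f = np.array([2.3, -0.1, 0.8, -1.7, 0.05, -0.6])
idx, labels = select_reliable(f, 1)
print(idx, labels)  # [0 3] [ 1 -1]
```

Compared with margin sampling for active learning, only the sort direction is reversed (largest instead of smallest |f(x)|), which is exactly the tension between both families of query functions discussed above.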
Generally, the discussed concepts are relevant in data settings where the ground sampling distance, i.e., the geometric resolution of the data, is much smaller than the objects to be extracted. Such a relation can emerge in various classification problems; it is particularly characteristic of image data from remote sensing missions with a sub-meter geometric resolution, including WorldView-I-IV or Pléiades. Furthermore, the proposed techniques, including the generation of invariances, are particularly relevant for remote sensing imagery with limited spectral resolution, such as multispectral imagery. This is especially important when discriminative information needs to be generated because of a high level of intra-class variability and low inter-class variability.
We introduce two novel algorithms that enhance the VSVM framework by incorporating constrained semi-labeled samples and virtual semi-labeled samples to encode invariances during the learning process. Our contributions in this paper are as follows:
Contribution 1 (Model 1): Our first model extends the VSVM by incorporating a set of constrained semi-labeled samples to establish decision boundaries. We demonstrate its relevance by applying it to very high geometric resolution multispectral remote sensing imagery.
Contribution 2 (Model 2): Building upon the first model, our second model deploys semi-labeled samples to generate semi-labeled virtual samples. Specifically, we modify the image features of semi-labeled samples that have become semi-labeled SVs after the initial model runs. Constrained semi-labeled virtual samples are additionally integrated when establishing the final model.
This paper is structured as follows: Section 2 describes the two semi-supervised VSVM models. The experimental setup is documented in Section 3, and the corresponding results are provided in Section 4. Concluding remarks are given in Section 5.

Proposed semi-supervised VSVM models
Our models build upon VSVM, which are detailed in Section 2.1. Subsequently, we propose two semi-supervised extensions of the algorithm: (i) we additionally include semi-labeled samples in the model and, before a final relearning step, prune both non-informative virtual samples and non-informative semi-labeled samples with a self-learning strategy (Section 2.2); (ii) we further perturb the image features of semi-labeled samples and include informative virtual semi-labeled samples before a final relearning step (Section 2.3). Section 2.4 documents the procedure to encode invariances from both labeled and semi-labeled data.

Virtual Support Vector Machine
VSVM extend the well-known SVM algorithm. SVM fit a decision boundary that separates different thematic classes with the maximum possible margin (Fig. 1a) (Schölkopf and Smola, 2002). Let $X = \{(x_i, y_i)\}_{i=1}^{l}$ denote the pool of labeled samples, with $x_i \in \mathcal{X}$, and let $\tilde{U} = \{x_i\}_{i=l+1}^{l+u}$, with $u \gg l$, denote the pool of unlabeled samples. In the VSVM approach, we begin by training an SVM model. A significant characteristic of SVM is that only a fraction of the available labeled samples, known as SVs, makes up the actual decision boundary between classes. Accordingly, the trained SVM is used to identify the labeled samples that serve as SVs. The SVs determine the computation of invariances (Section 2.4): only the image features of the identified SVs are modified with respect to the properties to which the model should be invariant. The resulting virtual samples (Fig. 1b) are combined with the SVs to retrain the model, potentially resulting in a new decision boundary (Fig. 1c). In the existing literature (e.g., Izquierdo-Verdiguier et al., 2013), virtual samples derived from SVs have been referred to as virtual Support Vectors (vSVs). To keep the terminology unambiguous, we use the term vSVs only for generated virtual samples that actually turn out to be samples closest to the decision boundary after the relearning stage.
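The two-stage VSVM procedure can be sketched with scikit-learn's `SVC`, whose `support_` attribute holds the indices of the training samples that became SVs. This is a minimal sketch under a loudly stated assumption: the hypothetical `perturb` function adds small Gaussian jitter as a stand-in for invariance encoding, whereas the models in this paper recompute features from alternative segmentations (Section 2.4). Hyperparameter selection is omitted here.

```python
import numpy as np
from sklearn.svm import SVC


def perturb(x_sv: np.ndarray, rng: np.random.Generator, n_variants: int) -> np.ndarray:
    """Hypothetical stand-in for invariance encoding: small Gaussian jitter.
    Returns an array of shape (n_variants, n_sv, d)."""
    return x_sv[None, :, :] + rng.normal(scale=0.05, size=(n_variants, *x_sv.shape))


def vsvm(X, y, C=1.0, gamma="scale", n_variants=3, seed=0):
    rng = np.random.default_rng(seed)
    # Stage 1: learn an SVM and extract the labeled samples that became SVs
    svm = SVC(kernel="rbf", C=C, gamma=gamma).fit(X, y)
    X_sv, y_sv = X[svm.support_], y[svm.support_]
    # Virtual samples are created only from the SVs and inherit their labels
    V = perturb(X_sv, rng, n_variants).reshape(-1, X.shape[1])
    y_v = np.tile(y_sv, n_variants)
    # Stage 2: relearn the model on the SVs plus the virtual samples
    X2 = np.vstack([X_sv, V])
    y2 = np.concatenate([y_sv, y_v])
    model = SVC(kernel="rbf", C=C, gamma=gamma).fit(X2, y2)
    return model, len(X_sv), len(V)


# Illustrative toy data: two Gaussian classes in a 2-D feature space
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (40, 2)), rng.normal(2.5, 1, (40, 2))])
y = np.array([0] * 40 + [1] * 40)
model, n_sv, n_virtual = vsvm(X, y)
```

Note that the second-stage training set is built only from the SVs and their virtual counterparts, so the number of virtual samples is the number of SVs times the number of variants.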
Further, the VSVM hyperparameters need to be re-determined with a validation strategy during the relearning stage, since the number of virtual samples can substantially exceed the number of SVs. The model can then be fitted predominantly to virtual samples, while the actual prior knowledge, i.e., the SVs, has only a marginal effect on model determination. Moreover, virtual samples closely resemble their corresponding SVs. Therefore, when the training data pool includes virtual samples, cross-validation may fail to achieve a rigorous separation between folds and thus a realistic simulation of unseen data. This phenomenon is commonly recognized as data leakage. To learn a valid VSVM model, we therefore implement the holdout method, i.e., we strictly separate the data into a training and a test set (Foody, 2009). To account for both internal and external spatial autocorrelation, we compile the training and test sets to be strictly spatially disjoint, which avoids overoptimistic model accuracy estimates (Geiß et al., 2017a).
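One simple way to obtain spatially disjoint training and test sets is to assign samples by square image blocks, so that all samples from the same block end up in the same set. The checkerboard layout and the name `spatial_block_split` below are assumptions for illustration only; the actual separation used in this paper follows the areas shown in Fig. 6c and Fig. 6f. Objects that straddle a block border would need extra handling (for instance a buffer zone).

```python
import numpy as np


def spatial_block_split(rows, cols, block_size):
    """Assign samples to training/test by a checkerboard of square blocks.

    rows, cols: pixel coordinates of each sample (e.g., of a segment centroid).
    All samples falling into the same block end up in the same set.
    """
    br = np.asarray(rows) // block_size
    bc = np.asarray(cols) // block_size
    is_train = (br + bc) % 2 == 0
    return np.flatnonzero(is_train), np.flatnonzero(~is_train)


rows = np.array([5, 120, 260, 30, 300])
cols = np.array([10, 15, 40, 210, 230])
train, test = spatial_block_split(rows, cols, block_size=100)
print(train, test)  # [0 2 3] [1 4]
```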

VSVM with self-learning constraints and semi-labeled samples
We extend VSVM by also considering unlabeled samples when establishing the model. Fig. 2a shows the corresponding scheme. Consistent with the VSVM approach, an SVM model is derived from $X$ and deployed to extract the labeled samples that represent SVs, i.e., $X_{SV}$. Their features are perturbed to create the corresponding pool of virtual samples $V_{SV}$. In parallel, $n$ unlabeled samples $U$ are randomly selected from the pool of unlabeled samples $\tilde{U}$, i.e., $U \subset \tilde{U}$, and the SVM model is deployed to label them, yielding the semi-labeled samples $U_{\text{semi-labeled}}$. To remove non-informative samples from $V_{SV}$ and $U_{\text{semi-labeled}}$, we implement a self-learning mechanism composed of a similarity constraint and a margin sampling constraint (Fig. 3). The former computes the Euclidean distance in the feature space between each sample of $V_{SV}$ and $U_{\text{semi-labeled}}$ and the nearest SV (Fig. 3a) (Lu et al., 2016; Geiß et al., 2019). We assume that samples in $V_{SV}$ and $U_{\text{semi-labeled}}$ with a large distance from an SV are non-informative and may lead to model divergence. Consequently, samples of $V_{SV}$ and $U_{\text{semi-labeled}}$ that exceed a class-specific distance threshold $\delta$ w.r.t. the SVs are pruned from the model. Additionally, a margin sampling method is implemented (Tuia et al., 2011; Geiß et al., 2017b; Geiß et al., 2018). It removes samples in $V_{SV}$ and $U_{\text{semi-labeled}}$ that are positioned far from the hyperplane and are thus unlikely to turn into a vSV or a semi-labeled SV and to contribute positively to the classification outcome. Only samples from $V_{SV}$ and $U_{\text{semi-labeled}}$ in close proximity to the hyperplane are retained. The tolerated margin distance is specified by the threshold $l$ (Fig. 3b). For multiclass settings, we use a one-against-one SVM: a sample satisfies the criterion if its distance from the hyperplane is less than $l$ for at least one of the class-specific hyperplanes. The optimal model selection for the constrainment mechanism, i.e., determining the class-specific distance to the SVs ($\delta$) and the distance to the margin ($l$), is implemented as an additional minimization task, in which the optimal combination of these parameters is determined based on a classification accuracy metric. After pruning non-informative samples, we concatenate the constrained semi-labeled samples $\hat{U}_{\text{semi-labeled}}$ and the constrained virtual samples $\hat{V}_{SV}$ with the SVs to form the new training set $X_{SV} \cup \hat{V}_{SV} \cup \hat{U}_{\text{semi-labeled}}$. This combined data set is used to retrain the model a second time (Fig. 3c).
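The two self-learning constraints and the search for $\delta$ and $l$ can be sketched in NumPy as follows. This is a minimal sketch, not the authors' implementation: it assumes that decision values are available per pairwise classifier (e.g., from scikit-learn's `SVC` with `decision_function_shape="ovo"`, whose values are scaled so that the margin lies at |f(x)| = 1 rather than being geometric distances), and `evaluate` is a hypothetical callback that retrains on the SVs plus the retained candidates and returns the holdout accuracy. Maximizing accuracy here is equivalent to minimizing the classification error.

```python
import numpy as np


def self_learning_prune(cand_X, cand_y, sv_X, delta, cand_dec, margin_thr):
    """Boolean mask of candidates (virtual or semi-labeled samples) that
    satisfy both the similarity and the margin constraint.

    cand_X: (n, d) candidate features; cand_y: (n,) their (pseudo-)labels
    sv_X: (m, d) features of the SVs of the initial model
    delta: dict, class label -> maximum Euclidean distance to the nearest SV
    cand_dec: decision values, shape (n,) for a binary SVM or (n, p) for
              the p pairwise classifiers of a one-against-one SVM
    margin_thr: tolerated margin distance (the threshold l in the text)
    """
    # Similarity constraint: Euclidean distance to the nearest SV
    diff = cand_X[:, None, :] - sv_X[None, :, :]
    d_nearest = np.sqrt((diff ** 2).sum(axis=2)).min(axis=1)
    similar = d_nearest <= np.array([delta[c] for c in cand_y])
    # Margin constraint: |f(x)| < l for at least one pairwise hyperplane
    dec = np.abs(np.asarray(cand_dec)).reshape(len(cand_X), -1)
    close = (dec < margin_thr).any(axis=1)
    return similar & close


def select_constraints(delta_grid, margin_grid, evaluate, **prune_kwargs):
    """Grid search over (delta, l); keeps the pair with the best accuracy."""
    best = None
    for delta in delta_grid:
        for margin_thr in margin_grid:
            mask = self_learning_prune(delta=delta, margin_thr=margin_thr, **prune_kwargs)
            acc = evaluate(mask)
            if best is None or acc > best[0]:
                best = (acc, delta, margin_thr, mask)
    return best


# Toy example: two SVs, three candidates, binary decision values
sv_X = np.array([[0.0, 0.0], [1.0, 1.0]])
cand_X = np.array([[0.1, 0.0], [3.0, 3.0], [0.9, 1.0]])
cand_y = np.array([0, 1, 1])
dec = np.array([0.4, 0.2, 1.5])
mask = self_learning_prune(cand_X, cand_y, sv_X, {0: 0.5, 1: 0.5}, dec, margin_thr=1.0)
print(mask)  # [ True False False]
```

The retained candidates (`cand_X[mask]`) would then be stacked with the SVs, e.g., via `np.vstack`, to form the training set for the final relearning step.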

VSVM with self-learning constraints and virtual semi-labeled samples
We further extend the semi-supervised procedure proposed above by modifying the image features of semi-labeled samples that have become semi-labeled SVs after the second model run (Fig. 4). We extract the SVs of $\hat{U}_{\text{semi-labeled}}$ and add them to the pool $\hat{U}^{SV}_{\text{semi-labeled}}$. Analogous to Algorithm 1, we use $\hat{U}^{SV}_{\text{semi-labeled}}$ to perturb features and encode invariances for those semi-labeled samples that have become SVs, yielding $V_{U_{\text{semi-labeled}}}$. We employ the self-learning strategy to prune these samples and create a subset of constrained virtual semi-labeled samples, referred to as $\hat{V}_{U_{\text{semi-labeled}}}$ (Fig. 4a and b). In the final step, we concatenate the samples representing SVs in the training set, the constrained virtual samples $\hat{V}_{SV}$, the constrained semi-labeled samples $\hat{U}_{\text{semi-labeled}}$, and the constrained virtual semi-labeled samples, i.e., $X_{SV} \cup \hat{V}_{SV} \cup \hat{U}_{\text{semi-labeled}} \cup \hat{V}_{U_{\text{semi-labeled}}}$. This combined data set is used for relearning the model (Fig. 4c).
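Identifying the semi-labeled samples that became SVs in the second model run only requires tracking the origin of every training sample. The sketch below uses scikit-learn's `SVC` and its `support_` attribute; the function name `fit_and_extract_semi_svs` and the integer source tags are hypothetical choices for illustration. The returned samples would subsequently be perturbed and pruned in the same way as the labeled SVs.

```python
import numpy as np
from sklearn.svm import SVC


def fit_and_extract_semi_svs(X_sv, y_sv, V_hat, yV_hat, U_hat, yU_hat, **svc_kwargs):
    """Fit on SVs + constrained virtual + constrained semi-labeled samples and
    return the semi-labeled samples that became SVs."""
    X = np.vstack([X_sv, V_hat, U_hat])
    y = np.concatenate([y_sv, yV_hat, yU_hat])
    # Source tags: 0 = labeled SV, 1 = constrained virtual, 2 = constrained semi-labeled
    source = np.concatenate([np.zeros(len(X_sv), dtype=int),
                             np.ones(len(V_hat), dtype=int),
                             np.full(len(U_hat), 2)])
    svm = SVC(kernel="rbf", **svc_kwargs).fit(X, y)
    semi_sv = svm.support_[source[svm.support_] == 2]
    return svm, X[semi_sv], y[semi_sv]


# Illustrative toy data only
rng = np.random.default_rng(0)
X_sv = np.vstack([rng.normal(-1, 0.5, (5, 2)), rng.normal(1, 0.5, (5, 2))])
y_sv = np.repeat([0, 1], 5)
V_hat = X_sv + rng.normal(0, 0.05, X_sv.shape)
yV_hat = y_sv.copy()
U_hat = rng.normal(0, 1, (8, 2))
yU_hat = (U_hat[:, 0] > 0).astype(int)
model, X_semi_sv, y_semi_sv = fit_and_extract_semi_svs(
    X_sv, y_sv, V_hat, yV_hat, U_hat, yU_hat, C=1.0)
```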

Encoding of invariances
We encode invariances based on image segmentation techniques, which are an essential constituent of object-based image analysis methods (Blaschke, 2010). The first step is to model the objects to be extracted from the (image) data with a segmentation algorithm. The resulting segments, i.e., super-pixels, are subsequently deployed to characterize the objects of interest. In challenging classification settings where only a very limited number of labeled samples is available, the training data frequently cover only a minor fraction of all object variations in the image domain. Additionally, optimally representing the objects of an image domain by segments remains challenging, despite approaches to compute and select an optimal segmentation automatically (Geiß et al., 2016a). Our strategy to cope with this situation is to vary the parameters of a segmentation algorithm to generate a variety of object representations (details on the deployed segmentation algorithm are provided in the experimental setup, Section 3.2). The parameters are varied w.r.t. both the size (i.e., object scale) and the shape properties of the modelled objects. The technical implementation comprises the following processing steps: (i) a segmentation level with an initial parametrization of the segmentation algorithm is established, object features are calculated, an SVM model is learned, and SVs are extracted (Fig. 2a); (ii) the SVs are located, and object representations from additional segmentation levels with varied parametrization are included in the model if they contain an SV; (iii) the selected segments are deployed to compute features, which are integrated as virtual samples (cf. Fig. 1). Following this procedure, the number of virtual samples equals the number of SVs multiplied by the number of additional segmentation levels with varied parametrization. Likewise, virtual semi-labeled samples are generated by extracting the SVs of the constrained semi-labeled samples, locating these SVs in the image domain, selecting the corresponding segments from the additional segmentation levels with varied parametrization, and computing features that form the virtual semi-labeled samples (Fig. 2b).
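Steps (ii) and (iii) can be sketched with NumPy, representing each segmentation level as a 2-D array of segment ids. As a simplifying assumption of this sketch, each SV is located by one representative pixel, and a segment "contains" the SV if it contains that pixel; only per-band means are computed as features. The helper names are hypothetical.

```python
import numpy as np


def segment_mean(image, labels, seg_id):
    """Per-band mean of the pixels that belong to one segment."""
    return image[labels == seg_id].mean(axis=0)


def virtual_samples_from_levels(image, extra_levels, sv_pixels, sv_labels):
    """For every SV (one representative pixel, an assumption of this sketch)
    and every additional segmentation level, take the segment containing the
    pixel and recompute its features.

    image: (H, W, B) array; extra_levels: list of (H, W) segment-id arrays
    sv_pixels: list of (row, col); sv_labels: class label of each SV
    """
    feats, labels = [], []
    for (r, c), y in zip(sv_pixels, sv_labels):
        for level in extra_levels:
            feats.append(segment_mean(image, level, level[r, c]))
            labels.append(y)  # virtual samples inherit the label of their SV
    return np.array(feats), np.array(labels)


image = np.arange(16, dtype=float).reshape(4, 4, 1)
level_a = np.array([[0, 0, 1, 1]] * 4)       # left/right halves
level_b = np.zeros((4, 4), dtype=int)        # one segment for the whole image
V, yV = virtual_samples_from_levels(image, [level_a, level_b], [(0, 0)], [1])
print(V.ravel(), yV)  # [6.5 7.5] [1 1]
```

With one SV and two additional levels, two virtual samples result, matching the count of SVs multiplied by additional segmentation levels stated above.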
We encode two sorts of invariances for model enhancement: (object) scale and (object) shape. To render the model invariant w.r.t. size, i.e., scale, we establish a set of segmentation levels in a hierarchical way (Geiß et al., 2016b; Aravena Pelizari et al., 2018): each segment of a particular segmentation level must be contained in exactly one segment of the subsequent coarser segmentation level, which guarantees an explicit hierarchy (cf. Geiß et al., 2019). Small scales thereby allow for a valid representation of the smallest objects in the image, while large scales allow the largest objects to be represented properly. To render the model invariant w.r.t. shape, we alter the parameters that constrain the shape-related characteristics of objects while keeping the scale-related parameter unchanged. Analogous to the computational steps for establishing invariance w.r.t. scale, a set of segmentations is computed for an exhaustive description of the image objects. The self-learning mechanism is intended to remove, in an automated fashion, non-informative virtual samples induced by improper segment-based representations of objects from the classification model. Finally, the generated virtual samples are added to the set of virtual samples $V_{SV}$ and of virtual semi-labeled samples $V_{U_{\text{semi-labeled}}}$ (Fig. 5), respectively, which can be used for relearning the model.
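The hierarchy requirement for scale invariance can be checked directly on segment-id arrays: every segment of a finer level must co-occur with exactly one segment id of the next coarser level. The following NumPy sketch is an illustration with hypothetical function names; it assumes the levels are given as equally sized 2-D arrays ordered from finest to coarsest.

```python
import numpy as np


def is_hierarchical(fine, coarse):
    """True if every segment of `fine` lies inside exactly one segment of `coarse`."""
    pairs = np.unique(np.stack([fine.ravel(), coarse.ravel()], axis=1), axis=0)
    # Each fine id must appear together with exactly one coarse id
    _, counts = np.unique(pairs[:, 0], return_counts=True)
    return bool((counts == 1).all())


def check_hierarchy(levels):
    """levels: list of segment-id arrays ordered from finest to coarsest."""
    return all(is_hierarchical(f, c) for f, c in zip(levels, levels[1:]))


fine = np.array([[0, 1], [2, 3]])
coarse = np.array([[0, 0], [1, 1]])
print(is_hierarchical(fine, coarse))  # True
print(is_hierarchical(np.array([[0, 0], [1, 1]]), np.array([[0, 1], [1, 1]])))  # False
```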

Data
We apply the algorithms to classify two data sets of WorldView-II multispectral data with 0.5 m geometric resolution. The first data set covers a small part of the built-up area of Cologne, Germany, and was recorded on January 31, 2014 (Fig. 6a). The imagery was taken from an off-nadir position. Six thematic classes were defined, i.e., "bush/tree", "meadow", "roof", "facade", "shadow", and "other impervious surface" (Fig. 6b). The six thematic classes were assigned by photointerpretation, supported by additional aerial imagery and cadastral information. As mentioned in Section 2.1, we strictly separate the image data spatially into training, test, and validation data (Fig. 6c).
The second data set represents a subset of 2000 × 2000 pixels showing the refugee camp of Hagadera in Kenya.It was acquired on March 01, 2012 (Fig. 6d).We distinguish five thematic classes, i.e., "built-up area", "bush/tree", "bare soil", "fence/wall", and "shadow" (Fig. 6e).The labels were determined with techniques of photointerpretation while integrating a thematic map from UNHCR (UNHCR, 2012).Fig. 6f documents the spatial separation of training, test, and validation data.

Experimental setup
To establish multiple segmentation levels for the generation of invariances, we followed a bottom-up region-growing segmentation procedure (Baatz & Schäpe, 2000), which requires several parameters to be determined. To create an initial segmentation level, we parameterized the algorithm to stress the shape heterogeneity of the generated segments, since objects of the built environment feature distinct shape and size characteristics. The so-called scale parameter of the algorithm, which determines the extent of the modelled objects, was set to 20 for the Cologne data set and to 25 for the Hagadera data set to establish a suitable trade-off between under- and over-segmentation. To establish scale invariance, we created nine additional segmentations with varied scale parameters for the first data set and seven for the second data set. To encode shape invariance, we created eight additional segmentation levels with varying parametrization of the shape properties of modelled objects for both images. A detailed description of the optimization of all parameters can be found in Geiß et al. (2019).
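To make the role of the scale and shape parameters concrete, the sketch below computes the merge cost of two adjacent objects as the multiresolution segmentation of Baatz & Schäpe (2000) is commonly described: the increase in heterogeneity f combines a size-weighted colour term (per-band standard deviations) and a shape term (compactness and smoothness), and two objects are merged only while f stays below the squared scale parameter. This is a hedged sketch from precomputed object statistics; the class and function names are hypothetical, and implementation details in eCognition may differ.

```python
import math
from dataclasses import dataclass


@dataclass
class ObjectStats:
    n: int                 # number of pixels
    std: tuple             # per-band standard deviation of pixel values
    perimeter: float       # border length l
    bbox_perimeter: float  # perimeter b of the bounding box


def fusion_cost(o1, o2, merged, band_weights, w_shape, w_compact):
    """Increase in heterogeneity f caused by merging o1 and o2 into `merged`."""
    h_color = sum(
        w * (merged.n * sm - (o1.n * s1 + o2.n * s2))
        for w, sm, s1, s2 in zip(band_weights, merged.std, o1.std, o2.std)
    )

    def compact(o):  # n * l / sqrt(n)
        return o.n * o.perimeter / math.sqrt(o.n)

    def smooth(o):   # n * l / b
        return o.n * o.perimeter / o.bbox_perimeter

    h_compact = compact(merged) - (compact(o1) + compact(o2))
    h_smooth = smooth(merged) - (smooth(o1) + smooth(o2))
    h_shape = w_compact * h_compact + (1 - w_compact) * h_smooth
    return (1 - w_shape) * h_color + w_shape * h_shape


def should_merge(f, scale):
    """Objects are merged only while the heterogeneity increase stays below scale^2."""
    return f < scale ** 2


# Two single pixels (values 0 and 1, one band) merged into a 1 x 2 object
p1 = ObjectStats(n=1, std=(0.0,), perimeter=4.0, bbox_perimeter=4.0)
p2 = ObjectStats(n=1, std=(0.0,), perimeter=4.0, bbox_perimeter=4.0)
pair = ObjectStats(n=2, std=(0.5,), perimeter=6.0, bbox_perimeter=6.0)
f = fusion_cost(p1, p2, pair, band_weights=(1.0,), w_shape=0.5, w_compact=0.5)
print(round(f, 4), should_merge(f, scale=20))  # 0.6213 True
```

A larger scale parameter raises the merge threshold and thus yields larger objects, which is why varying it produces the coarser and finer segmentation levels used for scale invariance.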
An exhaustive set of features was computed to describe the modelled objects. The features comprise statistical metrics of central tendency (mean) and spread (standard deviation) of the bands of the multispectral imagery, as well as the Normalized Difference Vegetation Index (NDVI). Additionally, rotation-invariant texture metrics based on the grey-level co-occurrence matrix (GLCM) (Guo et al., 2021; Haralick et al., 1973) were computed; we deployed three of these measures, i.e., mean, homogeneity, and dissimilarity. Furthermore, five shape-related features were computed, i.e., the so-called rectangular and elliptic fit, roundness, shape index, and compactness. As for the segmentation procedure, we deployed the software eCognition (Trimble, 2014) to calculate the features based on provided or customized feature computation protocols. Note that the numerical feature values were normalized to a 0-1 interval based on the segmentation with the initial parameterization; feature values induced by segmentations with varied parameterization were subsequently aligned accordingly.
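The spectral part of the feature set and the normalization scheme can be sketched in NumPy: per-segment band means and standard deviations plus an NDVI value, with min-max scaling fitted on the initial segmentation and then applied unchanged to the varied segmentations. The band indices `red` and `nir` are hypothetical parameters that depend on the sensor's band order, and computing the NDVI from the segment means (rather than averaging per-pixel NDVI) is an assumption of this sketch. GLCM and shape features are omitted.

```python
import numpy as np


def segment_features(image, labels, red, nir):
    """Per-segment band means, band standard deviations and NDVI.

    image: (H, W, B) reflectances; labels: (H, W) segment ids;
    red, nir: band indices (hypothetical; depend on the sensor band order).
    """
    ids = np.unique(labels)
    rows = []
    for s in ids:
        px = image[labels == s]  # (n_pixels, B)
        mean, std = px.mean(axis=0), px.std(axis=0)
        # NDVI from the segment means (an assumption of this sketch)
        ndvi = (mean[nir] - mean[red]) / (mean[nir] + mean[red] + 1e-12)
        rows.append(np.concatenate([mean, std, [ndvi]]))
    return ids, np.array(rows)


def fit_minmax(F):
    """Scaling parameters fitted on the initial segmentation only."""
    lo, hi = F.min(axis=0), F.max(axis=0)
    return lo, np.where(hi > lo, hi - lo, 1.0)


def apply_minmax(F, lo, span):
    # Features of varied segmentations may fall slightly outside [0, 1]
    return (F - lo) / span


red = np.array([[0.1, 0.1], [0.2, 0.2]])
nir = np.array([[0.5, 0.5], [0.2, 0.2]])
image = np.stack([red, nir], axis=-1)
labels = np.array([[0, 0], [1, 1]])
ids, F = segment_features(image, labels, red=0, nir=1)
lo, span = fit_minmax(F)
N = apply_minmax(F, lo, span)
```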
We used Gaussian RBF kernels for the models. We determined C and γ for each model individually as follows: …, $2^{3}\}$. We carried out experiments for both binary and multiclass classification. For binary classification, we distinguish the class "bush/tree" from the remaining classes for data set I, and "built-up area" from the remaining classes for data set II. For multiclass classification, we naturally distinguish all six and all five thematic classes present in the Cologne and Hagadera data sets, respectively. We created balanced training and test sets by randomly drawing labeled samples from the training and test data in a stratified way, using the same number of labeled samples per thematic class for learning and selecting a model. To quantify the sensitivity of model accuracy w.r.t. the amount of prior knowledge, the number of labeled samples per thematic class was subsequently varied, i.e., we learn all models with {10, 20, 30, 60, 90, 120, 160, 200} labeled samples per class. We treated the number of semi-labeled samples and virtual semi-labeled samples as a further hyperparameter to be determined for all semi-supervised models considered in this work. Consequently, we additionally deploy up to 100 (20, 40, …, 100) semi-labeled and virtual semi-labeled samples per thematic class for model learning and select the model with the highest estimated generalization capability in terms of overall accuracy. Results for each model, learned with a specific number of labeled samples, are documented as accuracies averaged over 20 fully independent runs. In addition to the SVM/VSVM-based algorithms, we implemented both CNNs (Geiß et al., 2022a) and semi-supervised CNNs to gain further insights into the competitiveness of the proposed models. For the latter, we designed a semi-supervised CNN framework in which multinomial logistic regression is deployed as the classifier, whose scores represent class-conditional probabilities given by the softmax function (Geiß et al., 2022a). The class-conditional probabilities obtained after an initial model run are deployed to determine reliable semi-labeled samples, which are included in a second model run. Note that all methods considered in this paper have had access to exactly the same amount of prior knowledge. This is particularly relevant, since deep learning-based methods are frequently pretrained on additional prior knowledge and subsequently transferred to address small-sample settings, for instance in the context of few-shot learning approaches (Wang et al., 2020). Owing to the computational resources available for this study, we report accuracies averaged over five fully independent runs for the CNN-based techniques. Additionally, classification maps (with additional κ and F1 statistics) from single model runs with 20 labeled samples per thematic class are presented to highlight visual differences in the classification maps for settings with few labeled samples.
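Two selection steps of this protocol can be sketched as follows: balanced per-class sampling combined with a holdout grid search over C and γ (using scikit-learn's `SVC`), and the selection of reliable semi-labeled samples from softmax probabilities for the semi-supervised CNN. All function names are hypothetical; the C and γ grids below are illustrative assumptions rather than the values used in the study, and the probability threshold is an assumed selection rule since the exact criterion is not specified above.

```python
import numpy as np
from sklearn.svm import SVC


def balanced_sample(y, n_per_class, rng):
    """Draw the same number of sample indices from every class, without replacement."""
    return np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n_per_class, replace=False)
        for c in np.unique(y)
    ])


def holdout_grid_search(X_tr, y_tr, X_te, y_te, C_grid, gamma_grid):
    """Pick (C, gamma) by overall accuracy on a separate (spatially disjoint) test set."""
    best = (-1.0, None, None)
    for C in C_grid:
        for gamma in gamma_grid:
            model = SVC(kernel="rbf", C=C, gamma=gamma).fit(X_tr, y_tr)
            oa = float(np.mean(model.predict(X_te) == y_te))
            if oa > best[0]:
                best = (oa, C, gamma)
    return best


def softmax(logits):
    z = logits - logits.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def reliable_pseudo_labels(logits, threshold):
    """Keep samples whose highest class-conditional probability reaches
    `threshold` (an assumed rule) and label them with the arg-max class."""
    p = softmax(logits)
    keep = np.flatnonzero(p.max(axis=1) >= threshold)
    return keep, p[keep].argmax(axis=1)


# Illustrative data only: two Gaussian classes for a training and a test area
rng = np.random.default_rng(0)


def toy(n):
    X = np.vstack([rng.normal(0, 1, (n, 3)), rng.normal(2, 1, (n, 3))])
    return X, np.repeat([0, 1], n)


X_pool, y_pool = toy(100)
X_te, y_te = toy(100)
idx = balanced_sample(y_pool, 20, rng)
oa, C, gamma = holdout_grid_search(X_pool[idx], y_pool[idx], X_te, y_te,
                                   C_grid=[2.0 ** k for k in (-1, 3, 7)],
                                   gamma_grid=[2.0 ** k for k in (-5, -1, 3)])

keep, labels = reliable_pseudo_labels(
    np.array([[4.0, 0.0, 0.0], [1.0, 1.0, 1.0], [0.0, 0.0, 5.0]]), 0.9)
print(keep, labels)  # [0 2] [0 2]
```

In the study, the number of semi-labeled and virtual semi-labeled samples (20, 40, …, 100 per class) is selected in the same holdout manner, which would add one more loop around the grid search; repeating the whole procedure with different random draws and averaging yields the reported mean accuracies.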

Experimental results and discussion
The added value of the semi-supervised models is demonstrated by benchmarking them against an SVM model built from features of the initial segmentation level, i.e., the model actually deployed for generating SVs (SVM-single-level). Moreover, an SVM is trained on features computed from multiple segmentations (SVM-multi-level): here, additional encoded object attributes are represented as extra features rather than as virtual samples as in the VSVM approach (a similar method is described in Bruzzone and Carlin, 2006). Additionally, we include accuracies of the VSVM established with self-learning constraints (VSVM-SL) and without self-learning constraints (VSVM). We also implemented a semi-supervised SVM in which a self-learning strategy was adopted to include solely relevant semi-labeled samples (SVM-semi-labeled-SL). As additional benchmarks, results from the supervised CNNs (CNN) and semi-supervised CNNs (CNN-semi-labeled) are documented. In the following, the two newly proposed models are termed VSVM-SL-semi-labeled and VSVM-SL-virtual-semi-labeled, respectively. The considered classification problems represent situations where solely a few labeled training samples can be deployed for model learning.
Fig. 5. Illustration of the creation of both virtual samples (green) and virtual semi-labeled samples (red) regarding scale and shape properties for a labeled and a semi-labeled sample, respectively. Initially, an image segmentation algorithm with varying parameters is applied to the image to model (varying green/red object outlines) and characterize (compute features) the objects of interest.

Results from data set I (Cologne)
Fig. 7 documents overall accuracies as a function of the number of labeled samples for data set I. For the binary classification problem, both types of invariances yield comparable accuracies (Fig. 7a-b). The two newly proposed semi-supervised VSVM models unambiguously obtain the highest accuracies across the whole spectrum of available prior knowledge and already reach a plateau-like performance pattern with a few labeled samples. A notable competitor is the constrained semi-supervised SVM model (SVM-semi-labeled-SL), with slightly lower performance, which underlines the general usefulness of informative semi-labeled samples. The VSVM-SL approach also achieves high accuracy levels, especially with a few labeled samples. Overall, this stresses the necessity of including solely constrained, i.e., informative, semi-labeled and virtual samples to avoid model divergence. Correspondingly, the unconstrained VSVM, SVM-single-level, and SVM-multi-level show a gap in performance compared to the aforementioned approaches in settings with a few labeled samples. Lastly, both CNN-based approaches require a substantially larger number of labeled samples to accelerate the gain in accuracy and achieve a competitive accuracy level, whereby the CNN-semi-labeled model provides slight but consistent accuracy advantages over the CNN.
The related classification maps obtained with 20 samples per class reflect these relationships (Fig. 8a): the conventional SVM-based methods, i.e., SVM-single-level and SVM-multi-level, suffer from commission errors regarding the class "bush/tree". In contrast, the more advanced methods are able to reduce these errors of commission. This also holds for the CNN-based models: the high level of commission errors regarding the class "bush/tree" produced by the CNN is reduced by the CNN-semi-labeled approach. However, both CNN-based methods exhibit a low level of spatial regularization. The two newly proposed models, in contrast, feature the most favorable tradeoff between errors of commission and omission in this binary classification setting, i.e., the highest F1 values, and they provide spatially smooth classification maps.
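Errors of commission and omission and the F1 score can be derived per class from a confusion matrix. The following minimal sketch uses the common remote sensing definitions (commission error = 1 - user's accuracy, omission error = 1 - producer's accuracy); the confusion matrix and the "other" class are invented for illustration and do not reproduce any result of this study.

```python
import numpy as np

def per_class_scores(conf):
    """conf[i, j] counts test samples of reference class i predicted as class j.

    Assumes every class is present in the reference data and predicted at
    least once; otherwise the divisions below are undefined.
    """
    conf = np.asarray(conf, dtype=float)
    tp = np.diag(conf)
    predicted = conf.sum(axis=0)  # column sums: samples mapped to each class
    reference = conf.sum(axis=1)  # row sums: samples truly belonging to each class
    precision = tp / predicted    # user's accuracy
    recall = tp / reference       # producer's accuracy
    commission = 1.0 - precision  # share of mapped samples that belong elsewhere
    omission = 1.0 - recall       # share of reference samples that were missed
    f1 = 2.0 * precision * recall / (precision + recall)
    overall_accuracy = tp.sum() / conf.sum()
    return commission, omission, f1, overall_accuracy

# Hypothetical binary confusion matrix: class 0 = "bush/tree", class 1 = "other".
conf = [[80, 20],
        [40, 160]]
commission, omission, f1, oa = per_class_scores(conf)
print("commission:", np.round(commission, 3))
print("omission:  ", np.round(omission, 3))
print("F1:        ", np.round(f1, 3))
print("OA:        ", round(oa, 3))
```

Here class 0 has a commission error of 1/3 (40 of its 120 mapped samples belong to class 1) and an omission error of 0.2 (20 of its 100 reference samples are missed), giving F1 = 16/22, or about 0.727. The F1 score penalizes a model that trades one error type for the other.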
A similar accuracy pattern can be inferred from the multiclass classification problem with shape invariance (Fig. 7d). As expected, more labeled samples are required to obtain high accuracy levels in the multiclass setting than in the binary one. Here, too, the semi-supervised methods unambiguously allow for the highest accuracies. The strategy of encoding additional object characteristics as features rather than as virtual samples, i.e., SVM-multi-level, only gains accuracy as the number of labeled samples increases, i.e., as the level of overfitting decreases. Accordingly, the SVM-multi-level models overtake the SVM-single-level models along the spectrum of available prior knowledge in an ideal-typical way. This indicates that the virtual sample-based approaches avoid problems related to the curse of dimensionality: no additional dimensions are added to the feature space, which can be disadvantageous in small sample settings. When encoding scale invariance (Fig. 7c), the two newly proposed semi-supervised VSVM models frequently feature the highest and most consistently increasing accuracies, which further underlines the beneficial performance properties induced by a constrained set of semi-labeled and virtual semi-labeled samples for model generation, respectively. Again, both CNN-based models reach a competitive accuracy level only when learned from a substantial amount of prior knowledge. The corresponding classification maps obtained with 20 samples per class (Fig. 8b) further stress that the SVM/VSVM-based semi-supervised models enable spatially regularized and more precise maps in this very challenging multiclass classification setting with very few labeled samples.
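The dimensionality argument can be made concrete by comparing the shapes of the training matrices produced by the two strategies. The sketch below uses random placeholder features and assumes three segmentation levels with ten features each; for simplicity it derives extra rows from all labeled samples, whereas VSVM derives virtual samples only from the SVs.

```python
import numpy as np

rng = np.random.default_rng(0)
n_labeled, n_features, n_levels = 20, 10, 3

# Placeholder object features of the same 20 labeled objects, computed from
# n_levels segmentations (level 0 = initial segmentation level).
levels = [rng.normal(size=(n_labeled, n_features)) for _ in range(n_levels)]
labels = rng.integers(0, 2, size=n_labeled)

# SVM-multi-level style: place all levels side by side as feature columns.
X_multi = np.hstack(levels)         # shape (20, 30): more dimensions
y_multi = labels

# Virtual-sample style: stack all levels as rows; the extra rows inherit the
# label of their originating object, and the feature space stays 10-dimensional.
X_virtual = np.vstack(levels)       # shape (60, 10): more samples
y_virtual = np.tile(labels, n_levels)

for name, X in [("multi-level", X_multi), ("virtual samples", X_virtual)]:
    print(f"{name:>15}: {X.shape}, samples per dimension = {X.shape[0] / X.shape[1]:.2f}")
```

With 20 labeled objects, the multi-level matrix has fewer samples than dimensions (about 0.67 per dimension), while the virtual-sample matrix has 6 samples per dimension. That is the small-sample situation in which adding dimensions tends to hurt.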

Results from data set II (Hagadera)
We plotted overall accuracies w.r.t. the number of labeled samples for data set II in Fig. 9. In the binary classification context (Fig. 9a-b), the models utilizing constrained semi-labeled samples are consistently beneficial. Notably, regarding scale invariance, the model that also includes virtual semi-labeled samples (VSVM-SL-virtual-semi-labeled) yields additional performance gains with very little prior knowledge, i.e., 10-20 labeled samples per thematic class. Here, too, the constrained VSVM model (VSVM-SL) allows for competitive accuracies, and the unconstrained models, i.e., VSVM, SVM-single-level, and SVM-multi-level, feature the least favorable accuracy patterns of the SVM/VSVM-based methods. Analogous to data set I, the (semi-)supervised CNN algorithms require a substantially larger number of labeled samples to reach a competitive accuracy level. However, while the SVM/VSVM-based methods reach a plateau fairly soon, with only moderate improvements given additional prior knowledge, the (semi-)supervised CNNs improve consistently as the amount of prior knowledge grows. Again, the CNN-semi-labeled approach provides slight but consistent accuracy advantages over the CNN. The associated classification maps mirror these numbers (Fig. 10a). SVM-based maps feature a substantial error of commission regarding "built-up" areas. In contrast, the (semi-supervised) constrained virtual sample-based models produce considerably fewer errors of commission for "built-up" areas while also enabling spatially well-regularized yet fine-grained maps. Conversely, given such a restricted amount of prior knowledge, the CNN-based maps suffer from a high level of omission of "built-up" areas.
Multiclass classification results are documented in Fig. 9c-d. The models that include constrained semi-labeled samples, i.e., SVM-semi-labeled-SL and VSVM-SL-semi-labeled, enable the highest accuracies. In particular, our newly proposed VSVM-SL-semi-labeled model consistently enables the highest accuracy with a few labeled samples. When models are learned with 50 or more labeled samples per class, all SVM/VSVM-based techniques converge to a plateau of maximum accuracy, similar to the binary classification problem. Here, the conventional SVM-based methods, i.e., SVM-single-level and SVM-multi-level, reveal the least favorable performance properties. Also analogous to the binary classification problem, the (semi-)supervised CNN algorithms require a larger number of labeled samples and are not competitive in very small sample settings. For the scale invariance setting, a substantial commission error regarding the class "fence/wall" can be observed (Fig. 10b), especially when relying on the SVM-multi-level approach. However, the other SVM/VSVM-based methods, especially the semi-supervised models, achieve a substantially better tradeoff in this multiclass classification setting. The maps obtained from the scale invariance setting suggest that all methods except the two newly proposed models suffer from a commission error w.r.t. the class "built-up". Compared to the baseline SVM, VSVM-SL-semi-labeled and VSVM-SL-virtual-semi-labeled thus simultaneously reduce commission errors w.r.t. the class "built-up" and omission errors regarding the class "fence/wall", which allows for more accurate and fine-grained mapping results. To sum up, the outcomes for data set II also underline the benefits of the virtual sample-based semi-supervised learning techniques in terms of model accuracy.

Conclusion and outlook
We introduced two semi-supervised learning algorithms based on the VSVM framework. The first model extends VSVM by additionally integrating constrained semi-labeled samples. The model constrainment, i.e., the selection mechanism for including only informative semi-labeled samples, builds upon a self-learning procedure composed of two active learning heuristics. The second model consecutively deploys semi-labeled samples to generate semi-labeled virtual samples by modifying the features of semi-labeled samples that have become semi-labeled SVs after initial model runs. The proposed techniques were deployed to classify two multispectral data sets with a sub-meter geometric resolution. The classification results highlight the effectiveness of the suggested techniques, which provide better accuracy in settings with very few labeled samples than related benchmark methods, including SVM (single-level and multi-level), VSVM with and without self-learning constraints, and (semi-)supervised CNNs.
Future work can naturally adapt the proposed algorithms to a collaborative learning approach, i.e., one combining active learning with semi-supervised learning (e.g., Munoz-Mari et al., 2012; Pan et al., 2018), since the self-learning procedure already internalizes dedicated active learning heuristics for SVM. Beyond that, it would be interesting to substitute the presented superpixel-based invariance generation process with a representation learning approach (Bengio et al., 2013), presumably based on models that aim to generalize well in small sample scenarios, such as contrastive learning (Jaiswal et al., 2021).

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Fig. 1. Functionality of a VSVM: (a) the decision boundary, i.e., the separating hyperplane, is fitted such that it separates the thematic classes with the maximum possible margin; (b) the corresponding SVs are deployed to generate virtual samples; (c) the model is learned a second time using the SVs and virtual samples, which may shift the position of the margin-maximizing decision boundary. Previously unlabeled samples are eventually reassigned to a thematic class w.r.t. the newly found decision boundary.
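The two-step procedure of steps (a)-(c) can be sketched as follows. This is a minimal sketch under stated assumptions: a linear soft-margin SVM trained by batch subgradient descent stands in for the kernel SVM, SVs are approximated as training samples on or inside the margin, and the invariance is simulated by a hypothetical rescaling of feature vectors, whereas the study derives virtual samples from features of alternative segmentations. All function names are illustrative.

```python
import numpy as np

def fit_linear_svm(X, y, C=0.1, lr=1e-3, epochs=2000):
    """Soft-margin linear SVM, labels y in {-1, +1}, trained by batch subgradient
    descent on 0.5 * ||w||^2 + C * sum(max(0, 1 - y * (X @ w + b)))."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        active = y * (X @ w + b) < 1.0  # samples with non-zero hinge loss
        grad_w = w - C * (y[active] @ X[active])
        grad_b = -C * y[active].sum()
        w, b = w - lr * grad_w, b - lr * grad_b
    return w, b

def support_vector_mask(X, y, w, b, tol=1e-3):
    # Samples on or inside the margin (y * f(x) <= 1) determine the SVM solution.
    return y * (X @ w + b) <= 1.0 + tol

def make_virtual_samples(X_sv, y_sv, factors=(0.9, 1.1)):
    # Hypothetical invariance: rescaled feature vectors stand in for features of
    # the same object recomputed from a coarser or finer segmentation.
    X_virtual = np.vstack([X_sv * f for f in factors])
    y_virtual = np.tile(y_sv, len(factors))
    return X_virtual, y_virtual

def fit_vsvm(X, y):
    w, b = fit_linear_svm(X, y)                    # (a) initial model
    sv = support_vector_mask(X, y, w, b)           # (b) extract the SVs ...
    X_v, y_v = make_virtual_samples(X[sv], y[sv])  #     ... and derive virtual samples
    X2 = np.vstack([X[sv], X_v])                   # (c) relearn from SVs + virtual samples
    y2 = np.concatenate([y[sv], y_v])
    return fit_linear_svm(X2, y2), int(sv.sum()), len(y2)

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(1.0, 1.0, (50, 2)), rng.normal(-1.0, 1.0, (50, 2))])
y = np.concatenate([np.ones(50), -np.ones(50)])
X_test = np.vstack([rng.normal(1.0, 1.0, (500, 2)), rng.normal(-1.0, 1.0, (500, 2))])
y_test = np.concatenate([np.ones(500), -np.ones(500)])

(w, b), n_sv, n_train = fit_vsvm(X, y)
accuracy = np.mean(np.sign(X_test @ w + b) == y_test)
print(f"{n_sv} SVs, {n_train} samples in the relearning step, test OA = {accuracy:.3f}")
```

Relearning only on the SVs is sensible because the samples outside the margin do not influence an SVM solution; the virtual samples then add information exactly where the boundary is decided.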

Fig. 2. Block schemes of the proposed semi-supervised VSVM learning strategies: (a) VSVM with self-learning constraints and inclusion of constrained semi-labeled samples; (b) additional module in which virtual semi-labeled samples are also generated and eventually pruned by the self-learning constraints before the relearning stage.

Fig. 4. Extended self-learning strategy in which virtual semi-labeled samples are pruned from the model with (a) the similarity constraint and (b) the margin sampling constraint before (c) relearning the model based on the remaining samples.

Fig. 6. Data set I (Cologne): (a) multispectral imagery, (b) thematic classes, (c) distinction of areas according to training, test, and validation data; data set II (Hagadera): (d) multispectral imagery, (e) thematic classes, (f) distinction of areas according to training, test, and validation data.

Fig. 7. Overall accuracy (%; y-axis), reported as the mean of twenty independent realizations, as a function of the number of labeled samples per thematic class (x-axis) for data set I (Cologne): (a) binary classification problem with encoded scale invariance; (b) binary classification problem with encoded shape invariance; (c) multiclass classification problem with encoded scale invariance; (d) multiclass classification problem with encoded shape invariance.

Fig. 8. Classification maps obtained with the various models for the binary (a) and multiclass (b) classification problem, respectively.

Fig. 9. Overall accuracy (%; y-axis), reported as the mean of twenty independent realizations, as a function of the number of labeled samples per thematic class (x-axis) for data set II (Hagadera): (a) binary classification problem with encoded scale invariance; (b) binary classification problem with encoded shape invariance; (c) multiclass classification problem with encoded scale invariance; (d) multiclass classification problem with encoded shape invariance.
6. Apply the self-learning strategy to $U_{\text{semi-labeled}}$ and $V_{SV}$
7. Establish the subset of constrained semi-labeled samples $\hat{U}_{\text{semi-labeled}}$ and constrained virtual samples $\hat{V}_{SV}$
8. Compile the training set $X = X_{SV} \cup \hat{V}_{SV} \cup \hat{U}_{\text{semi-labeled}}$
9. Learn the SVM model with $X$
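Steps 6-8 can be sketched as follows, with step 9 left to any SVM trainer. This is a minimal sketch under explicit assumptions: the current model is linear with weights `w` and bias `b`; the margin sampling constraint is assumed to keep candidates inside the margin (|f(x)| < 1); the similarity constraint is assumed to keep a virtual sample only if it stays within a Euclidean distance `max_dist` of the SV it was derived from; and semi-labels are assumed to be the sign of the current decision function. The helper names are hypothetical.

```python
import numpy as np

def decision(Z, w, b):
    # Decision function f(x) = w . x + b of the current (assumed linear) SVM.
    return Z @ w + b

def margin_constraint(Z, w, b, margin=1.0):
    # Assumed margin-sampling heuristic: keep candidates inside the margin.
    return np.abs(decision(Z, w, b)) < margin

def similarity_constraint(V, V_origin, max_dist=1.0):
    # Assumed similarity heuristic: a virtual sample must stay close to its SV.
    return np.linalg.norm(V - V_origin, axis=1) <= max_dist

def compile_training_set(X_sv, y_sv, V_sv, y_v, V_origin, U_semi, w, b, max_dist=1.0):
    # Step 6: apply the self-learning constraints to V_SV and U_semi-labeled.
    keep_v = similarity_constraint(V_sv, V_origin, max_dist) & margin_constraint(V_sv, w, b)
    keep_u = margin_constraint(U_semi, w, b)
    # Step 7: constrained subsets; semi-labels are the sign of the current model.
    V_hat, y_v_hat = V_sv[keep_v], y_v[keep_v]
    U_hat = U_semi[keep_u]
    y_u_hat = np.where(decision(U_hat, w, b) >= 0, 1.0, -1.0)
    # Step 8: X = X_SV ∪ V_hat_SV ∪ U_hat_semi-labeled (step 9 fits an SVM on it).
    X = np.vstack([X_sv, V_hat, U_hat])
    y = np.concatenate([y_sv, y_v_hat, y_u_hat])
    return X, y

# Toy example with a fixed linear model f(x) = x_1.
w, b = np.array([1.0, 0.0]), 0.0
X_sv = np.array([[0.5, 0.0], [-0.5, 0.0]])
y_sv = np.array([1.0, -1.0])
V_sv = np.array([[0.55, 0.0], [2.0, 0.0], [-0.45, 3.0]])    # candidate virtual samples
V_origin = np.array([[0.5, 0.0], [0.5, 0.0], [-0.5, 0.0]])  # SVs they were derived from
y_v = np.array([1.0, 1.0, -1.0])
U_semi = np.array([[0.3, 1.0], [-0.8, 0.0], [5.0, 0.0]])    # unlabeled candidates

X, y = compile_training_set(X_sv, y_sv, V_sv, y_v, V_origin, U_semi, w, b)
print(X.shape, y.tolist())
```

In the toy example, the second virtual candidate fails both constraints and the third lies too far from its SV, so only one virtual sample survives. The third unlabeled candidate lies far outside the margin and is dropped, which leaves a training set of five samples.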