Generalized Synchronized Active Learning for Multi-Agent-Based Data Selection on Mobile Robotic Systems

In mobile robotics, perception in uncontrolled environments like autonomous driving is a central hurdle. Existing active learning frameworks can help enhance perception by efficiently selecting data samples for labeling, but they are often constrained by the necessity of full data availability in data centers, hindering real-time, on-field adaptations. To address this, our work unveils a novel active learning formulation optimized for multi-robot settings. It harnesses the collaborative power of several robotic agents, considerably enhancing the data acquisition and synchronization processes. Experimental evidence indicates that our approach surpasses traditional active learning frameworks by up to 2.5 percentage points while requiring 90% fewer data uploads, delivering new possibilities for advancements in the realms of mobile robotics and autonomous systems.

based on the scenario dictating data selection and the specific method employed in choosing the data. According to Settles [5], the existing scenarios are generally classified into stream-based and pool-based active learning. The former presents data as a continuous stream, necessitating immediate decisions on data selection or disposal, whereas the latter assumes the availability of all data at a centralized data center, accessible at any time. Query strategies typically fall into three categories: uncertainty-based, diversity-based, and learning-based, each with unique approaches to data selection and utilization [6]. Whereas uncertainty-based methods estimate the value for each sample individually, diversity-based and learning-based methods utilize both labeled and unlabeled data to either enhance dataset coverage or ascertain the estimated value of a sample using a trained model.
Although active learning holds considerable potential in the field of robotics, it also poses certain challenges in implementation. The prevailing pool-based scenario demands the centralization of the entire dataset to select samples for labeling, a requirement that often proves infeasible due to logistical and financial constraints. Although stream-based active learning is a viable alternative, current research in this domain inadequately addresses the multifaceted needs of robotic operations [7], especially in the context of perception tasks and multi-robot deployments. Furthermore, a significant gap exists in accommodating multiple data streams simultaneously, which is crucial when coordinating multiple robots.
Given the growing deployment of mobile robots in diverse fields such as autonomous driving, supply delivery, and autonomous hazard zone inspections, enhancing robotic perception remains a critical research frontier. Addressing this, our work aims to devise strategies for efficient label selection to facilitate improved perception, focusing particularly on fostering collaboration among multiple robots in exploring and annotating environments within an active learning framework. We introduce a novel hybrid scenario that integrates the strengths of stream-based and pool-based active learning, promoting effective multi-robot label collection and synchronization while enabling multi-stream processing.
Our contributions can be summarized as follows:
• We propose a comprehensive, generalized formulation for active learning scenarios, uniquely adapted to facilitate multi-agent data collection, setting the stage for more collaborative robotics systems.
• Leveraging the aforementioned formulation, we introduce a pioneering framework for Synchronized Multi-Agent-Robotic AcTive Learning (SMARTL). This framework efficiently adapts and fuses data selections from multiple agents, optimizing both resource allocation and operational effectiveness.
• Through rigorous experimentation, we substantiate the versatility and efficacy of our newly devised framework, showcasing its superior performance in advancing multi-robot active learning compared to existing stream- and pool-based methods.
In the subsequent sections, we provide a detailed exposition of these contributions, exploring the potential of the SMARTL framework in a multi-robot environment.

II. RELATED WORK
Research in the field of active learning has primarily focused on pool-based techniques. A limited number of studies have ventured beyond this to explore different facets of active learning, including alternative training strategies [8] and varied scenarios [7].
For sample selection, early adaptations of uncertainty methods utilized sampling techniques such as Monte Carlo dropout [9] and ensembles [10], assessing the uncertainty across multiple forward passes or models using entropy or BALD [11] for each sample individually. To increase the effectiveness of batch selection, Kirsch et al. [12] introduced the joint uncertainty of the batches as a selection metric. Later research directions employ Gaussian mixture models to assess uncertainty [13], [14]. Uncertainty methods have been applied and tailored to the tasks of classification [7], [15], object detection [16], [17], 3D object detection [18], [19], semantic segmentation [20], and graphs [21].
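To make the two per-sample scores mentioned above concrete, the following minimal sketch computes predictive entropy and BALD from a set of stochastic forward passes. It is an illustration under stated assumptions, not the implementation of the cited works: the function names `entropy` and `mc_scores` and the toy probability vectors are hypothetical, and the class probabilities are assumed to come from Monte Carlo dropout passes or ensemble members.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of one categorical distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def mc_scores(passes):
    """Return (predictive entropy, BALD) for a single sample.

    passes: list of class-probability vectors, one per stochastic
    forward pass or ensemble member (hypothetical input format).
    """
    n_pass = len(passes)
    n_cls = len(passes[0])
    # Average the predictions over all passes.
    mean = [sum(p[c] for p in passes) / n_pass for c in range(n_cls)]
    predictive = entropy(mean)
    # Expected entropy of the individual passes.
    expected = sum(entropy(p) for p in passes) / n_pass
    # BALD is the gap between the two (mutual information).
    return predictive, predictive - expected

# Passes that disagree with each other: high entropy and high BALD.
disagree = [[0.9, 0.1], [0.1, 0.9]]
# Passes that agree on an ambiguous prediction: high entropy, zero BALD.
agree = [[0.5, 0.5], [0.5, 0.5]]
```

Both toy samples have the same predictive entropy, but only the first has a positive BALD score, which is why BALD is often read as a measure of model disagreement rather than of label ambiguity.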
In contrast, diversity-based methods aim to represent the dataset distribution more accurately using a limited number of samples. A prominent representative of this category is Core-Set [22]. Subsequent studies have endeavored to integrate diversity and uncertainty metrics into a single cost function [23], [24]. Schmidt et al. [25] leveraged distance gradients for diversity-based selection. Following the combination of uncertainty and diversity, this field has given rise to task-agnostic active learning frameworks specifically designed for 3D object detection [26], [27]. Liang et al. [28] combined diversity metrics computed in latent space with diversity metrics in 3D space and time.
Learning-based methods utilize an auxiliary model or extension to determine the utility of a sample, distinguishing them significantly from the approaches mentioned above. Several approaches have been explored, including the introduction of loss prediction modules for ranking samples [29], the development of variational adversarial active learning [30], [31], [32], as well as teacher-student approaches [33], [34]. Caramalau et al. [35] structured the latent space as a graph and trained a graph convolutional network (GCN) to differentiate labeled and unlabeled samples. Their CoreGCN applies CoreSet on the GCN features, while UncertaintyGCN uses the uncertainty of the GCN prediction.
Despite advancements in the aforementioned categories, stream-based active learning remains relatively unexplored, especially in the realms of perception and robotics. The primary focus in this sector has been on distribution shifts [36], [37], [38], with significant theoretical contributions in submodular optimization [39], [40]. Perception has been approached with Mondrian forests [41]. In deep learning-based perception, uncertainty-based methods have been implemented on a single robot [42] or combined with submodular optimization [43]. In subsequent work, Schmidt and Günnemann [7] exploited the temporal characteristics of streams to enhance data selection. Saran et al. [44] introduced an approximate volume sampling in gradient space.
Beyond these areas of research, the emerging field of federated active learning aims to select data from multiple distributed data clients. On these clients, samples have been selected using probabilities [45] or individual local acquisitions [46]. Ahn et al. [47] compared the scores of the local models for a global data selection. Later, Kim et al. [48] expanded upon this by balancing the data subsets of the different clients. In robotics, active learning can be additionally used for control and reinforcement learning [49], whose methodologies differ substantially from their perception counterparts.
However, a significant research gap remains in tailoring these methods to multi-robot scenarios, where centralized training and data storage are the norm, but decentralized computation and storage are restricted. The current literature scarcely addresses active learning scenarios involving multiple agents outside the context of federated active learning.

III. TWO STAGE ACTIVE LEARNING WITH DISTRIBUTED AGENTS
In this section, we first introduce the setting in which our SMARTL framework operates. By highlighting the limitations of contemporary approaches, we derive a novel generalized active learning formulation. Subsequently, we refine this concept to develop the SMARTL framework.
Preliminaries: Active learning fundamentally involves the selection of a subset $D_L$ of a dataset $D$ that should be labeled. The dataset $D$ is assumed to be a sample from some (unknown) distribution of perception scenarios, denoted as $D \sim \Omega$. Concurrently, an unlabeled subset $D_U$ exists, satisfying the condition
$$D = D_L \cup D_U.$$
The overarching goal of this process is evaluated by the performance of a trained model $F(\omega \mid D_L)$, parameterized by its weights $\omega$, while maintaining a minimal size for $D_L$. The active learning process is iterative, encompassing $N$ cycles, wherein each cycle $i$ selects a new batch $D_l$ of size $b$. A pool-based query $Q_P$ leverages the current labeled set $D_L^i$ and unlabeled set $D_U^i$ as well as the model weights to select the new batch $D_l$ of size $b$ for annotation:
$$D_l = Q_P(D_L^i, D_U^i, \omega).$$
Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.
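The iterative pool-based process described above can be sketched as a short loop. This is a minimal illustration, assuming a hypothetical per-sample `score` function as a stand-in for whatever the trained model provides; the names `pool_query` and `run_pool_cycles` are not from the paper, and retraining is only indicated by a comment.

```python
def pool_query(labeled, unlabeled, score, b):
    """Q_P: rank the whole unlabeled pool and return the b top-scoring samples.

    `labeled` is unused by this simple ranking but kept in the signature,
    since diversity- or learning-based queries would need it.
    """
    return sorted(unlabeled, key=score, reverse=True)[:b]

def run_pool_cycles(unlabeled, score, b, n_cycles):
    """Run N cycles; each cycle moves one batch D_l from D_U to D_L."""
    labeled, unlabeled = [], list(unlabeled)
    for _ in range(n_cycles):
        batch = pool_query(labeled, unlabeled, score, b)
        labeled.extend(batch)  # annotate the batch and add it to D_L
        unlabeled = [x for x in unlabeled if x not in batch]
        # A real loop would retrain F(omega | D_L) here and refresh `score`.
    return labeled, unlabeled
```

For example, `run_pool_cycles(range(10), score=lambda x: x, b=2, n_cycles=3)` labels six samples in three batches and leaves four in the pool. Note that every cycle ranks the complete pool, which is exactly the full-data-availability assumption discussed in the text.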
The pool-based scenario assumes all data to be present in the data center such that a full-scale selection can be conducted, including diversity-based and learning-based methods relying on extensive calculations using all unlabeled and labeled data. In robotics, unlike other applications where data is primarily stored in a data center, data is initially stored on the robot during the collection and operational phase. If the dataset is not considered to be comprehensive before the robot's operation commences, recorded data from an operation $D_O^t$ is added to the data pool, such that the unlabeled pool becomes $D_U^i \cup D_O^t$, where $D_O^t \sim \Omega$. For the sake of simplification, we presuppose that the operations $t$ are synchronized with the cycles $i$, thus $i = t$.
However, this scenario has significant downsides for mobile robotics. It assumes the transfer of all data to the data center, which may not always be feasible, depending on the type of data transfer. Additionally, storing all data during the operation puts additional hardware requirements in place.
In contrast to the pool-based scenario, the stream-based scenario dictates that the samples are not retained. Instead, they are either earmarked within an internal storage of size $b$ in the stream-batch scenario [7], [43] or instantaneously marked for labeling. Since the samples are not stored, the query function $Q_S(D_L^i, \overrightarrow{D}_U^i, \omega)$ is restricted to evaluating each sample once, indicated with an arrow $\overrightarrow{D}_U$. This method has the advantage of facilitating selection online at the mobile operator level, ensuring only necessary data is saved. Nonetheless, this assumes a single agent overseeing data collection and cannot reflect multiple robots.
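The stream-batch variant with an internal storage of size $b$ can be sketched as a single pass over the stream that keeps only the current best candidates. This is a minimal sketch, assuming a hypothetical per-sample `score` function; `stream_batch_query` is an illustrative name, and a bounded min-heap is one possible way to realize the storage, not necessarily the one used in [7] or [43].

```python
import heapq

def stream_batch_query(stream, score, b):
    """Keep the b highest-scoring samples while reading the stream once."""
    if b <= 0:
        return []
    buffer = []  # min-heap of (score, arrival index, sample)
    for idx, sample in enumerate(stream):
        item = (score(sample), idx, sample)  # each sample is scored exactly once
        if len(buffer) < b:
            heapq.heappush(buffer, item)
        elif item[0] > buffer[0][0]:
            # Evict the weakest stored sample; it cannot be revisited later.
            heapq.heapreplace(buffer, item)
    # Return the stored samples from highest to lowest score.
    return [s for _, _, s in sorted(buffer, reverse=True)]
```

Calling `stream_batch_query(iter([0.2, 0.9, 0.1, 0.7, 0.4]), lambda x: x, 2)` returns `[0.9, 0.7]`. Memory stays bounded by $b$ regardless of the stream length, which is the property that makes this scenario attractive on a robot.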
Multi-Agent Formulation: To address the aforementioned challenges, we introduce a hybrid two-stage scenario that scales to multiple agents by effectively preventing information overlap among various agents and simultaneously minimizing the requirements for data storage and uploads. It is crucial to consider that mobile robots frequently operate in non-isolated environments and may receive overlapping information. In this setup, we consider $j$ agents engaged in data collection. Given the impracticability of storing and uploading all the data collected by a large number of agents, a stream-based selection becomes indispensable. Each agent $j$ encounters in each cycle $i$ a sensor stream, denoted as $\overrightarrow{D}_O^{i,j}$, embodying potential candidates for labeling.
To mitigate the data upload problem, a subset $D_c^{i,j} \subseteq \overrightarrow{D}_O^{i,j}$ of size $b_s^j$ for each agent $j$ is selected on the agent, which can be expressed for a single cycle $i$ as follows:
$$D_c^{i,j} = Q_S(D_L^i, \overrightarrow{D}_O^{i,j}, \omega). \quad (6)$$
Upon aggregating the uploaded data $D_C^i = \bigcup_j D_c^{i,j}$, we encounter a predicament where the information content from two distinct agents overlaps, implying $D_c^{i,j} \cap D_c^{i,k} \neq \emptyset$ for $j \neq k$, $\forall i \in N$. To resolve this, we delineate a secondary stage executed at the data center to select $b_p$ samples by defining $D_C^i$ as an addition for the unlabeled pool:
$$D_l = Q_P(D_L^i, D_U^i \cup D_C^i, \omega). \quad (9)$$
Combining the onboard and data center selections given in (6), (9), the full framework composes to:
$$D_l = Q_P\Big(D_L^i, D_U^i \cup \bigcup_j Q_S(D_L^i, \overrightarrow{D}_O^{i,j}, \omega), \omega\Big). \quad (10)$$
Through this formulation, we have created two distinct stages represented by $Q_P$ and $Q_S$. The query function $Q_S$ operates in real-time, independently on each robotic agent. Conversely, $Q_P$ functions centrally at the data center, orchestrating the subsequent layer of data processing and selection.
A visual representation of our framework is shown in Fig. 1. The illustrated two-phase query process begins with the stream-based query $Q_S$ on the agents (1) and the upload of $D_C$ from the first selection phase (2). Afterward, the pool-based query $Q_P$ follows at the data center (3). To close the cycle, the data $D_l$ is sent to a human annotator (4) and added to the labeled pool (5). Subsequently, the network is trained with the updated labeled pool (6), and the agents are synchronized with the newly refined model (7).
Generalized Active Learning Formulation: In our pursuit to establish (10) as a generalized expression of active learning, we introduce a fundamental component, the identity query $Q_I$. This query function is designed to select all data, thereby satisfying $S = Q_I(S)$ for any dataset $S$. Consequently, this identity element permits the representation of both stream-based and pool-based scenarios [5], [7] as special instances encapsulated within (10), justifying its designation as a generalized scenario formulation. To substantiate this claim, we provide the corresponding parameters to generate the various active learning scenarios in Table I.
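How the identity query turns the two-stage composition into the classical scenarios can be illustrated with a toy sketch. All names here (`identity_query`, `top_k`, `two_stage`) and the toy scores are hypothetical; the sketch ignores the existing unlabeled pool from earlier cycles and uses a plain score ranking for both stages, so it illustrates only the composition, not SMARTL's actual queries.

```python
def identity_query(samples, budget=None):
    """Q_I: select everything, so Q_I(S) == S for any S."""
    return list(samples)

def top_k(score):
    """Build a query that keeps the `budget` highest-scoring samples."""
    def query(samples, budget):
        return sorted(samples, key=score, reverse=True)[:budget]
    return query

def two_stage(streams, q_s, q_p, b_s, b_p):
    """Apply q_s per agent stream, pool the uploads, then apply q_p once.

    Returns the selected batch and the number of uploaded samples.
    """
    uploaded = [x for stream in streams for x in q_s(stream, b_s)]
    return q_p(uploaded, b_p), len(uploaded)

streams = [[0.1, 0.8, 0.3], [0.9, 0.2, 0.6]]  # two agents, toy scores
rank = top_k(score=lambda x: x)

# Hybrid: selection on the agents and again at the data center.
hybrid = two_stage(streams, rank, rank, b_s=2, b_p=2)
# Pool-based special case: Q_S = Q_I, so every sample is uploaded.
pool_only = two_stage(streams, identity_query, rank, b_s=None, b_p=2)
# Stream-based special case: Q_P = Q_I, so agent selections pass through.
stream_only = two_stage(streams, rank, identity_query, b_s=1, b_p=None)
```

In this toy run the hybrid and pool-based variants select the same batch, but the hybrid variant uploads four samples instead of six, while the stream-based variant uploads two and performs no cross-agent filtering.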
Framework Definition: Enabled by the generalized active learning formulation, we introduce a comprehensive and adaptable hybrid scenario active learning framework:
Stage 1 - Mobile Operator Data Stream Processing: This stage primarily concerns onboard data streams from sources like cameras and LiDAR sensors on the robots. Each robot or agent acts individually in this stage, only involving the deployed perception neural network. Given the stream-based setup inherent to online robotics selection, the query function at this stage is uncertainty-based.
Stage 2 - Data Center Pool Processing: Subsequent to the mobile operator stage, a post-operation step initiates a pool-based selection and synchronization phase. After the robots upload $b_s$ samples, a diversity-based method is employed to accurately reflect the relationship between the newly acquired data and the data previously uploaded to the system, optimizing coverage over the entire dataset. The diversity-based stage queries $b_p$ samples, so possible information overlaps of samples are mitigated. This can be achieved by minimizing a pairwise distance function $d$ over a set of latent space features $f$. In our approach, we adopt the greedy L2 distance metric of the CoreSet approach [22] and enrich it with metadata, which has been successfully presented by Liang et al. [28]. The metadata contains information about the selecting robot and whether the sample is in the labeled or unlabeled pool. We concatenate the normalized metadata to the feature vectors $f$ to reflect the relationship of samples collected by one robot in contrast to those collected by different agents.
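The data center stage can be sketched as a greedy k-center selection over metadata-augmented features. This is a minimal sketch in the spirit of the greedy CoreSet procedure [22], not the authors' implementation: the normalization in `with_metadata`, the `weight` parameter, and both function names are assumptions made for illustration.

```python
import math

def with_metadata(feature, robot_id, n_robots, is_labeled, weight=1.0):
    """Append normalized metadata (robot index, pool flag) to a feature vector.

    The scaling to [0, 1] and the `weight` factor are assumptions.
    """
    robot = robot_id / max(n_robots - 1, 1)
    return list(feature) + [weight * robot, weight * float(is_labeled)]

def greedy_k_center(labeled, candidates, budget):
    """Repeatedly pick the candidate farthest (L2) from all covered points.

    Returns indices into `candidates` in selection order.
    """
    # Distance of each candidate to its nearest labeled point.
    nearest = [min((math.dist(c, l) for l in labeled), default=math.inf)
               for c in candidates]
    chosen = []
    remaining = set(range(len(candidates)))
    while remaining and len(chosen) < budget:
        idx = max(sorted(remaining), key=nearest.__getitem__)
        chosen.append(idx)
        remaining.discard(idx)
        # The new pick now covers its neighborhood as well.
        for j, c in enumerate(candidates):
            nearest[j] = min(nearest[j], math.dist(c, candidates[idx]))
    return chosen
```

With `labeled = [[0.0, 0.0]]` and `candidates = [[0.1, 0.0], [5.0, 0.0], [5.1, 0.0], [0.0, 3.0]]`, a budget of 2 selects indices 2 and 3: once one of the two near-duplicate candidates is chosen, the other is considered covered, which is how overlapping uploads from different agents are filtered out.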
Application Example: Consider a large-scale active learning scenario with hundreds of autonomous vehicles, representing an exemplary mobile robot type, acting as agents. In this scenario, each vehicle is equipped with a computer vision system comprising a camera sensor and a computing unit running a deep neural network.
Implementation of SMARTL: Importantly, SMARTL is agnostic with respect to the applied uncertainty estimation technique. It allows employing methods based on minor neural network modifications [7], [29], multiple forward passes [9], or temporal network forward passes [17], whichever is most feasible for the existing computing unit and task. We will highlight this flexibility in the experiments and refer to the original works for details on implementation. Additionally, data storage for $b_s$ samples is required.
As SMARTL has no real-time requirements, the samples can be uploaded via any protocol or transferred via physical devices at hubs. This flexibility ensures that SMARTL operates without any latency dependencies. In our example, we assume that collected data is transferred daily. Furthermore, the number of agents can be adjusted dynamically, as data collection is decentralized without inter-robot dependencies, enhancing scalability. Since SMARTL modifies only the data collection phase, it does not affect the data labeling processes used in traditional frameworks.
Relation to existing frameworks: Traditional active learning frameworks primarily assume the presence of all data at a data center, requiring each agent to store all data collected and subsequently transfer it to a centralized data center for selection. Given the use of high-definition cameras, this approach could demand several GB per hour per agent, resulting in substantial data storage and transfer challenges. SMARTL considers a selection and collection outside the data center, enabling data selection at the individual agent level. This approach removes the need to gather all data at a centralized data center before the selection process and eliminates most storage and transfer costs. The second selection phase at the data center synchronizes the selections of the individual agents and removes overlaps, supporting scalability.

IV. EVALUATION OF TWO STAGE ACTIVE LEARNING
In the subsequent section, we evaluate the performance of our hybrid active learning framework. To showcase the versatility of our approach, we conduct experiments involving various tasks and datasets. We evaluate classification with the GTA V streets (GTAVs) dataset [7] and CIFAR-100 [50] to validate our approach's efficacy for environments with a larger number of classes. Furthermore, we validate the semantic segmentation task on A2D2 [51] and CityScapes [1]. Given that distributed multi-agent collection stands as a central pillar of our framework, we concurrently process multiple streams during each active learning cycle as depicted in Fig. 1.
We postulate the existence of a set of mobile robots engaged in data acquisition from multiple streams, with subsequent synchronization occurring at the data center. This scenario is compared to a purely pool-based data center selection, as well as isolated stream-based selection processes undertaken by individual robots. Importantly, we monitor the data saved during the data upload process, as minimizing the volume of uploads constitutes a relevant objective of our methodology.
In our experiments, we compare our method against several baselines. As other methods cannot leverage a two-stage selection, we added the identity query $Q_I$ in the data center phase for stream-based selections and in the agent selection phase for pool-based methods. Note that this grants pool-based methods the considerable advantage of a full data upload. We consider Badge [23], CoreSet [22], CoreGCN [35], and UncertaintyGCN [35] as pool-based baselines. As stream-based baselines, we consider active learning on edge devices (ALED) [43], Monte Carlo dropout Entropy (Ent) [9] and BALD, loss learning (LLoss) [29], and a Random selection.
Classification: For the classification task, we utilize a ResNet18 [52] model and the GTAVs dataset [7], specifically designed for operation domain detection. This dataset encompasses seven distinct routes. In our setup, ten selection agents are deployed, each encountering an alternating segmented subset of the current route. We use the setting and routes presented in [7], but start the early stopping after 50 epochs. Given its unstructured nature, which is atypical in robotic applications, the CIFAR-100 dataset is partitioned randomly. This deliberate selection compensates for the limited class variety encountered in operation domain detection datasets. For CIFAR-100, we followed the parameters in [53] that are provided in the benchmark repository.¹ We additionally used the augmentations from the LLoss guidelines established for CIFAR-10 in [29]. During each cycle, 10000 samples are presented in partitions of 1000 to each agent in a stream-based manner, with a selection size of $b_p = 25\%$. In contrast, the GTAVs dataset saw a more conservative selection of $b_p = 5\%$ per cycle, adhering to the experimental setup described in [7]. We set our mobile operator selection size to $b_s = 2 \cdot b_p$ and choose all $b_s^j$ to be equal. The robot executes an LLoss query with the modifications of [7] for GTAVs and an Ent query for CIFAR-100. We conduct two distinct experiments to simulate the behavior of autonomous agents traversing overlapping routes: one scenario with no overlap (0%) and another with significant overlap, constituting 30% of the routes. For an overlap, we appended samples of the previous and subsequent subsets of one robot to that of another robot, which increased the stream size and, consequently, the number of selected samples.
¹ [Online]. Available: https://github.com/weiaicunzai/pytorch-cifar100
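The budgets stated above translate directly into upload volumes, which the following back-of-the-envelope sketch makes explicit. It rests on two assumptions that are not confirmed by the text: that $b_p$ and $b_s$ are fractions of each agent's stream, and that overlap is ignored. The function name `upload_summary` is hypothetical, and the stream size used for GTAVs below is a placeholder, since only the ratio matters for the saved fraction.

```python
def upload_summary(stream_size, n_agents, b_p, ratio=2.0):
    """Return (uploaded samples, full-upload samples, saved fraction) per cycle.

    Assumes each agent uploads b_s = ratio * b_p of its own stream.
    """
    b_s = ratio * b_p
    uploaded = n_agents * round(stream_size * b_s)
    total = n_agents * stream_size
    return uploaded, total, 1.0 - uploaded / total

# CIFAR-100 setting from the text: 10 agents, 1000 samples each, b_p = 25%.
cifar = upload_summary(stream_size=1000, n_agents=10, b_p=0.25)
# GTAVs setting: b_p = 5%; the stream size of 1000 is a placeholder.
gtavs = upload_summary(stream_size=1000, n_agents=10, b_p=0.05)
```

Under these assumptions, the CIFAR-100 setting uploads 5000 of 10000 samples per cycle, and the GTAVs setting saves a fraction of 0.9, which is consistent with the 90% upload reduction reported for GTAVs.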
The experiments conducted on the GTAVs dataset, as illustrated in Fig. 2, demonstrate that our method achieves stronger performance with fewer queries and matches the performance of the fully trained model earlier. While the LLoss with the modifications of [7] exhibits good performance in the second cycle, the lack of robot synchronization caused the performance to drop again. For the overlap experiment in Fig. 2(b), a noticeable increase in performance is observed, as our method attains a performance level comparable to the CoreGCN approach. Furthermore, a review of the data uploads presented in Table II reveals that SMARTL reduces the number of uploads by 90% for the GTAVs dataset. It should be noted that since all uploaded samples are available for pool-based methods, these techniques can capitalize on interesting scenarios that might be overlooked by methods that select directly on the robot.
As depicted in Fig. 3, SMARTL consistently exhibits robust performance throughout all cycles on CIFAR-100, with only CoreSet marginally outperforming it in the final cycle. Notably, CoreGCN encountered memory issues while handling this dataset, a drawback attributable to its reliance on distance-based calculations, which become intractable for a larger number of uploaded samples. This scenario highlights the versatility of our framework, which is adept at accommodating multi-class datasets, a theoretical exploration that holds promising relevance for diverse and practical applications in robotics.
Our ambition is not to overshadow existing state-of-the-art performances but to align with them, showcasing efficiency through a significant reduction in the number of necessary uploads.In addition to assessing the commonly evaluated metric of performance gain in an active learning setup, we are also keen on scrutinizing the volume of data transfer in order to furnish a comprehensive analysis of our system's efficacy and efficiency.
Semantic Segmentation: To validate our framework on a more challenging task, we use the CityScapes [1] dataset and A2D2 [51]. Both datasets contain different drives in different cities. We assume four and three robots operate concurrently in separate cities or drives for CityScapes and A2D2, respectively. For our experiments, we used a DeepLabV3+ [54] model outfitted with a ResNet34 backbone, utilizing pre-trained weights provided by PyTorch. Regarding the CityScapes dataset, we adhered to the parameter settings outlined in [55], with the exception of the resize crop parameters, where we adjusted the factor to a range of 0.5 to 1 and set the resized dimensions to 256 × 512. For A2D2, we followed the guidelines established in [7] but decreased the learning rate to 0.01 for the backbone. We use an auxiliary loss with an FCN head at the third ResNet34 block to improve the diversity-based selection within our data center query, delivering the diversity calculation features. Our framework utilizes Monte Carlo Dropout Ent for $Q_S$ in the A2D2 dataset and BALD for the CityScapes dataset with max accumulation. We further select $b_p = 20\%$ of each stream with $b_s = 2 \cdot b_p$. As for classification, we conduct two different experiments where we model the behavior of overlapping routes of the autonomous agents: one with 0% overlap and one with 30% overlap between the agents.
In Fig. 5, we present the evaluation of our experiments conducted on the CityScapes dataset. As depicted in Fig. 5(a), our method demonstrates superior performance, improving faster and surpassing all other methods. This is particularly evident when compared to stream-based selections, where our approach excels in effectively synchronizing data. However, in the overlap experiments depicted in Fig. 5(b), the overlap weakens the individual robots' selection process, narrowing the performance gap between our method and the CoreSet approach, which uploads all data.
In our A2D2 experiments, we observed similar trends, substantiating our initial findings. Notably, A2D2 offers a larger pool of selectable samples, 4510, compared to CityScapes' 2245. In Fig. 4, it is evident that while our SMARTL approach starts with a slightly lower performance compared to pool-based methods, it gains momentum over subsequent cycles, ultimately exceeding the performance of all other contenders. Conversely, the stream-based approaches lag, showing a weaker performance trajectory. Since A2D2 recordings around Munich are not overlap-free, SMARTL confirms its robustness in the overlap scenario.
Upload Volume Analysis: To examine the influence of the different query sizes $b_s$ and $b_p$, we compare different ratios in Table IV. Our results indicate that the sizes balance between uncertainty- and diversity-based selection in the different phases. The $b_s$ can be reduced for CityScapes with more diverse and overlap-free routes, while for the overlapping A2D2, a higher $b_s$ leads to a performance increase. Following this, SMARTL can effectively combine the strengths of both uncertainty- and diversity-based strategies.
Our experiments demonstrate that the distributed structure of our approach seamlessly integrates the uncertainty and diversity components of active learning query functions. SMARTL effectively reduces the number of required data uploads, as shown in Table III, while maintaining the highest performance over all experiments. This holds even in scenarios with overlapping operations. Due to the absence of inter-robot dependencies, SMARTL is seamlessly scalable to any number of robots. The openness in the uncertainty stage and the embedding space distance calculation enabled SMARTL to outperform existing methods on various tasks on established benchmark datasets. As these characteristics are shared by all computer vision tasks, SMARTL is not limited to the presented tasks.

V. CONCLUSION
In this study, we have introduced SMARTL, a pioneering active learning framework that facilitates simultaneous data collection from multiple mobile agents, addressing the existing gaps in modeling multi-agent active learning scenarios. By deriving a general active learning formulation, we pave the way for novel active learning scenarios. A central feature of our SMARTL framework is its ability to strategically branch out the active learning query into two components: an uncertainty-based query managed by the robot and a diversity-based query conducted at the data center. This division allows for a scalable integration with an arbitrary number of agents, with minimal adaptation of the perception model. The flexibility in the uncertainty estimation component allows task-specific adaptation, amplifying scalability.
Our extensive experiments underscore the framework's efficacy, demonstrating a significant reduction in data upload volumes while surpassing the label selection performance of conventional pool-based active learning methods, which necessitate complete data transfer to the data center. In future work, we plan to expand the capabilities of the diversity-based query component by incorporating metadata more extensively to further capitalize on the distributed nature of multi-robot setups. In addition, we intend to investigate robot synchronization to tackle overlapping routes and include an unsupervised learning phase in our framework.

Fig. 1. Multi-Robot Active Learning Framework. Left: Selection on the individual robots. Middle: Query to synchronize and filter the uploaded samples. Right: Send selected samples to a human annotator for labeling, update the training set, and train the neural network.

Fig. 2. Comparison of various active learning configurations on GTAVs, highlighting standard errors. Dotted lines represent stream-based selections, while dashed lines indicate pool-based methods that upload all data.

Fig. 3. CIFAR-100 with overlap on ResNet18, highlighting standard errors. Dotted lines represent stream-based selections, while dashed lines indicate pool-based methods that upload all data.

Fig. 5. Comparison of different active learning methods: mIoU over used data with indicated standard errors. Dotted lines represent stream-based selections, while dashed lines indicate pool-based methods that upload all data.

TABLE II
COMPARISON OF UPLOAD FREQUENCIES ACROSS DIFFERENT METHODS FOR CLASSIFICATION EXPERIMENTS REVEALS THAT OUR SMARTL, FUNCTIONING AS A HYBRID MODEL, REQUIRES SIGNIFICANTLY FEWER UPLOADS THAN POOL-BASED METHODS

TABLE III
COMPARISON OF UPLOAD FREQUENCIES ACROSS DIFFERENT METHODS FOR CITYSCAPES (CS) AND A2D2 HIGHLIGHTING THE UPLOAD EFFICIENCY OF SMARTL AGAINST POOL-BASED METHODS

TABLE IV
PERFORMANCE (MIOU) COMPARISON OF DIFFERENT SELECTION BUDGETS FOR CITYSCAPES (CS) AND A2D2. STANDARD ERROR TO THE LAST DIGIT IN BRACKETS