Fast and Adaptive Multi-agent Planning under Collaborative Temporal Logic Tasks via Poset Products

Efficient coordination and planning are essential for large-scale multi-agent systems that collaborate in a shared dynamic environment. Heuristic search methods and learning-based approaches often lack guarantees on correctness and performance. Moreover, when the collaborative tasks contain both spatial and temporal requirements, e.g., specified as Linear Temporal Logic (LTL) formulas, formal methods provide a verifiable framework for task planning. However, since the planning complexity grows exponentially with the number of agents and the length of the task formula, existing studies are mostly limited to small artificial cases. To address this issue, a new planning paradigm is proposed in this work for system-wide temporal task formulas that are released online and continually. It avoids two common bottlenecks of traditional methods, i.e., (i) the direct translation of the complete task formula to the associated Büchi automaton; and (ii) the synchronized product between the Büchi automaton and the transition models of all agents. Instead, an adaptive planning algorithm is proposed that computes the product of relaxed partially-ordered sets (R-posets) on-the-fly, and assigns the resulting subtasks to the agents subject to the ordering constraints. It is shown that the first valid plan can be derived with polynomial time and memory complexity w.r.t. the system size and the formula length. Our method can handle task formulas with a length of more than 400 and fleets with more than 400 agents, while most existing methods fail at a formula length of 25 within a reasonable duration. The proposed method is validated on large fleets of service robots in both simulation and hardware experiments.


I. INTRODUCTION
Recent advances in computation, perception and communication allow the deployment of autonomous robots in large, remote and hazardous environments, e.g., to assist service staff in hospitals [1], to maintain offshore drilling platforms [2], and to monitor and assist construction sites [3]. Furthermore, fleets of heterogeneous robots, such as unmanned ground vehicles and unmanned aerial vehicles, are deployed to accomplish tasks that are otherwise too inefficient or even infeasible for a single robot [4]. Not only can the overall efficiency of the team be significantly improved by allowing the robots to move and act concurrently [5], but the capabilities of the team can also be greatly extended by enabling multiple robots to directly collaborate on a task [6]. Recent works have preliminarily demonstrated such potential for simple tasks such as collaborative exploration [7], formation control [8], object transportation [9] and pursuit-evasion games [10].
The task planning problem for multi-robot systems is in general NP-hard [11], due to the inherent combinatorial nature of robot-task assignment and various constraints such as capabilities and deadlines. The standard approach is to formulate a mixed integer linear program (MILP) over the integer variables and constraints [12]. Although sound and optimal, these methods are applicable only to small-scale systems. Thus, extensive work can be found on designing meta-heuristic algorithms that find sufficiently good solutions in a reasonable time, e.g., genetic algorithms [13], ant colony optimization [14], [15], particle swarm optimization [16], [17], learning-based algorithms [18], [19], [20], or large language models [21], [22]. However, these methods often lack a formal guarantee on the correctness and quality of the planning results.
Moreover, to specify more complex tasks, many recent works propose to use formal languages such as Linear Temporal Logic (LTL) [23], Computation Tree Logic (CTL) [24], and Signal Temporal Logic (STL) [25], as an intuitive yet powerful way to describe both spatial and temporal requirements on global [26] or local [27] behaviors. Notably, the works in [28], [29], [30] formulate MILPs that are solved by a central planning unit for different system models and task constraints; the works in [31], [32], [33], [34], [35] instead propose various search algorithms over the state or solution space of the whole system. However, these planning methods are often executed offline for a set of predefined static tasks. A particularly challenging scenario is when the system operates indefinitely, i.e., new tasks are released or canceled dynamically and continually by external demand [36], or certain target features related to the tasks can change location during run time [10]. This requires the fleet to adapt its task plans online, modifying existing assignments and incorporating new tasks. Thus, the aforementioned methods become inadequate, as the sequence of tasks is infinite and their specifications are unknown beforehand. A naive recursive application of the centralized methods leads not only to intractable computational complexity, but also to inconsistent or even oscillatory assignments. Therefore, an efficient and adaptive planning scheme is essential for multi-robot systems that collaborate in a dynamic environment [37], [38], [39] or an unknown environment [40].

A. Related Work
The standard framework for planning under temporal tasks is based on the model-checking algorithm [23]. First, the task formulas are converted to a Deterministic Rabin Automaton (DRA) or a Nondeterministic Büchi Automaton (NBA) via off-the-shelf tools such as SPIN [41] and LTL2BA [42]. Second, a product automaton is created between the automaton of the formula and the models of all agents, such as weighted finite transition systems (wFTS) [23], Markov decision processes [43] or Petri nets [44]. Last, certain graph search or optimization procedures are applied to find a valid and accepting plan within the product automaton, such as nested-Dijkstra search [32], integer programs [45], auctions [46] or sampling-based methods [33], [47].
Thus, the fundamental step of all aforementioned methods is to translate the task formula into the associated automaton. This translation may lead to an automaton of double-exponential size w.r.t. the formula length, as shown in [42]. The only exceptions are GR(1) formulas [48], whose associated automata can be derived in polynomial time, but which cover only a limited class of tasks. In fact, for general LTL formulas with a length of more than 25, it takes more than 2 hours and 13 GB of memory to compute the associated NBA via LTL2BA. Although recent methods have greatly reduced the planning complexity in other aspects, the length of the considered task specifications remains limited by this translation process. For instance, the sampling-based method [33], [47] avoids creating the complete product automaton via RRT sampling; its largest simulated case has 400 agents and a task formula of maximum length 14. The planning method [32] decomposes the resulting task automaton into independent subtasks for parallel execution; its simulated case scales up to 100 robots and a task formula of maximum length 18. Moreover, other existing works such as [49], [50], [51], [52] mostly consider task formulas of length around 6-10. This limitation hinders their application to more complex robotic missions.
This drawback becomes even more apparent in dynamic scenes, where contingent tasks specified as LTL formulas are triggered by external observations and released to the whole team online. In such cases, most existing approaches compute the automaton associated with the new task and then derive its synchronized product with the current automaton, see e.g., [51], [52]. Thus, the size of the resulting automaton equals the product of the sizes of all automata associated with the contingent tasks, which is clearly a combinatorial blow-up. Consequently, the number of contingent tasks that can be handled by these approaches is limited to handpicked examples.

B. Our Contribution
To overcome this curse of dimensionality in the size of tasks and agents, we propose a new paradigm that is fundamentally different from the model-checking-based methods.First, for a syntactically co-safe LTL (sc-LTL) formula that is a conjunction of numerous sub-formulas, we calculate the R-posets of each sub-formula as a set of subtasks and their partial temporal constraints.Then, an efficient algorithm is proposed to compute the product of R-posets associated with each sub-formula.The resulting product of R-posets is complete in the sense that it retains all subtasks from each R-poset along with their partial orderings and resolves potential conflicts.Given this product, a task assignment algorithm called the time-bound contract net (TBCN) is proposed to assign collaborative subtasks to the agents, subject to the partial ordering constraints.Last but not least, the same algorithm is applied online to dynamic scenes where contingent tasks are triggered and released by online observations.It is shown formally that the proposed method has a polynomial time and memory complexity to derive the first valid plan w.r.t. the system size and formula length.Extensive large-scale simulations and experiments are conducted for a hospital environment where service robots react to online requests such as collaborative transportation, cleaning and monitoring.
The main contribution of this work is threefold: (i) a systematic method is proposed to tackle task formulas with a length of more than 400, which overcomes the limitation of existing translation tools that can only process formulas of length less than 25 in reasonable time; (ii) an efficient algorithm is proposed to decompose and integrate contingent tasks that are released online, which not only avoids a complete re-computation of the task automaton but also ensures a polynomial complexity to derive the first valid plan; (iii) the proposed task assignment algorithm is fully compatible with both static and dynamic scenarios with interactive objects.

C. Problem Statement
Consider a multi-agent system with heterogeneous capabilities, a series of sc-LTL task formulas that are released online, and a set of interactive objects in the dynamic environment. The objective is to generate a task plan for the system online such that these tasks are satisfied with high efficiency.
For instance, as shown in Sec. II-C, a fleet of heterogeneous service robots is deployed in a hospital environment. Different tasks, such as the transportation of goods or patients, cleaning and maintenance, are released online continuously. Many such tasks contain numerous subtasks with ordering constraints that require the direct collaboration of different robots. An efficient coordination algorithm is proposed such that these subtasks are assigned and fulfilled online in a timely manner.

II. RESULTS
In this section, we briefly present the proposed solution: we first define the task specification and introduce the core method of computing the R-poset product. Then, an efficient assignment algorithm is introduced under these posets with temporal and spatial constraints. Simulation and hardware experiment results are presented and compared against several strong baselines for different fleet sizes and task complexities. Technical details and derivations can be found in Sec. IV (Materials and Methods).

A. Task specification and R-poset product
We briefly introduce the syntax of Linear Temporal Logic (LTL) used for task specifications. The basic ingredients of LTL formulas are a set of atomic propositions $AP$ in addition to several Boolean and temporal operators. Atomic propositions are Boolean variables that can be either true or false. The syntax of LTL is defined as:
$$\varphi \triangleq \top \mid p \mid \varphi_1 \wedge \varphi_2 \mid \varphi_1 \vee \varphi_2 \mid \neg\varphi \mid \bigcirc\varphi \mid \varphi_1\, \mathsf{U}\, \varphi_2 \mid \Diamond\varphi,$$
where $\top \triangleq \text{True}$; $p \in AP$ is an atomic proposition; $\wedge$ (conjunction), $\vee$ (disjunction) and $\neg$ (negation) are the logical operators; $\Diamond$ (eventually), $\bigcirc$ (next) and $\mathsf{U}$ (until) are the temporal operators; $\square$ (always) and $\Rightarrow$ (implication) are the derived operators. A special class of LTL formulas, called syntactically co-safe formulas (sc-LTL) [23], [53], only contains the temporal operators $\bigcirc$, $\mathsf{U}$ and $\Diamond$. A complete description of the syntax and semantics of LTL can be found in [23].
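Since the formula length is the key scalability measure in this work, it may help to make it concrete. The sketch below is a hypothetical Python encoding of the sc-LTL fragment above (all class and function names are ours). It assumes that the length counts every atomic proposition and operator once, with a negated proposition counting as two symbols; this is one common convention and not necessarily the exact one used in the experiments.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Prop:
    """Atomic proposition p in AP, optionally negated (positive normal form)."""
    name: str
    negated: bool = False


@dataclass(frozen=True)
class Unary:
    op: str            # "X" for next, "F" for eventually
    arg: Formula


@dataclass(frozen=True)
class Binary:
    op: str            # "and", "or", or "U" for until
    left: Formula
    right: Formula


Formula = Union[Prop, Unary, Binary]

SC_UNARY = {"X", "F"}            # temporal operators allowed in sc-LTL
SC_BINARY = {"and", "or", "U"}


def length(phi: Formula) -> int:
    """Number of symbols in phi (assumed convention: negation counts as one symbol)."""
    if isinstance(phi, Prop):
        return 2 if phi.negated else 1
    if isinstance(phi, Unary):
        return 1 + length(phi.arg)
    return 1 + length(phi.left) + length(phi.right)


def is_sc_ltl(phi: Formula) -> bool:
    """True if phi only uses next, until, eventually and the Boolean operators."""
    if isinstance(phi, Prop):
        return True
    if isinstance(phi, Unary):
        return phi.op in SC_UNARY and is_sc_ltl(phi.arg)
    return phi.op in SC_BINARY and is_sc_ltl(phi.left) and is_sc_ltl(phi.right)


# F(p and F q) and (not r U s)
phi = Binary(
    "and",
    Unary("F", Binary("and", Prop("p"), Unary("F", Prop("q")))),
    Binary("U", Prop("r", negated=True), Prop("s")),
)
print(length(phi), is_sc_ltl(phi))   # 10 True
```

An "always" operator such as `Unary("G", Prop("p"))` is rejected by `is_sc_ltl`, matching the restriction of sc-LTL to next, until and eventually.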
Given a series of sc-LTL formulas that are released online, most existing methods would combine the formulas with $\wedge$ (conjunction) and convert them into a Nondeterministic Büchi Automaton (NBA). This results in a graph structure that consists of states, transitions, guards, initial states and final states. However, since its size is double exponential in the length of the formula $\varphi$, as proven in [42], it quickly becomes intractable with more task formulas. Instead, we compute the product of the R-posets associated with these formulas, called the Poset-prod (denoted by $\otimes$). The R-poset $P = (\Omega, \preceq, \neq)$ is a high-level abstraction of the NBA, proposed in our earlier work [54], which consists of a set of subtasks $\Omega$ and their partial relations, namely the less-equal relation $\preceq$ and the conflict relation $\neq$. It has been proven therein that if the subtasks are executed under these partial relations, the resulting traces are accepted by the NBA. Once a new task formula is released, it is transformed into a new R-poset $P_2$, and its product with the current R-poset $P_1$ is computed. More specifically, given $P_1$ and $P_2$, the Poset-prod returns a new set of R-posets $\mathcal{P} = P_1 \otimes P_2$ that satisfies both $P_1$ and $P_2$, which is computed by iterating the following two procedures: Task Composition and Relation Update. The first procedure creates a group of subtasks as a combination of the subtasks in $P_2$ and the subtasks of $P_1$ that have not yet been executed; a depth-first-search algorithm gradually adds the subtasks along with the corresponding mapping function. The second procedure calculates the partial relations between the subtasks such that the ordering constraints among the subtasks in the composed product are consistent and free of conflicts. Consequently, the overall poset is given by $P_{final} = (\Omega_f, \preceq_f, \neq_f)$, which is updated each time a new poset is added.
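To illustrate the two procedures named above, the Python sketch below builds candidate products of two R-posets lazily. It is a deliberately simplified stand-in, not the Poset-prod algorithm of [54]: Task Composition is reduced to a union of subtasks with unique identifiers (no merging of equivalent subtasks), and Relation Update is reduced to orienting each cross-poset conflict into an ordering, in the spirit of the $\omega_{14}$/$\omega_{15}$ example of Sec. II-C, and discarding orientations that create an ordering cycle. The class `RPoset`, the function `poset_product` and its input `cross_conflicts` are hypothetical names.

```python
from dataclasses import dataclass
from itertools import product as cartesian
from typing import FrozenSet, Iterable, Iterator, Set, Tuple

Pair = Tuple[str, str]


@dataclass(frozen=True)
class RPoset:
    subtasks: FrozenSet[str]                          # Omega (unique subtask ids)
    order: FrozenSet[Pair]                            # (a, b) encodes a ⪯ b
    conflict: FrozenSet[FrozenSet[str]] = frozenset()  # groups that must not co-occur


def is_acyclic(nodes: Iterable[str], edges: Iterable[Pair]) -> bool:
    """Kahn's algorithm: an ordering relation is consistent iff it has no cycle."""
    nodes = set(nodes)
    indeg = {n: 0 for n in nodes}
    succ = {n: [] for n in nodes}
    for a, b in edges:
        succ[a].append(b)
        indeg[b] += 1
    ready = [n for n in nodes if indeg[n] == 0]
    visited = 0
    while ready:
        n = ready.pop()
        visited += 1
        for m in succ[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                ready.append(m)
    return visited == len(nodes)


def poset_product(p1: RPoset, p2: RPoset, cross_conflicts: Set[Pair]) -> Iterator[RPoset]:
    """Lazily yield consistent (simplified) products of p1 and p2."""
    nodes = p1.subtasks | p2.subtasks               # simplified task composition
    base = p1.order | p2.order                      # keep every input ordering
    pairs = sorted(cross_conflicts)
    # Simplified relation update: try both orientations of every cross conflict.
    for choice in cartesian((0, 1), repeat=len(pairs)):
        extra = {(a, b) if c == 0 else (b, a) for (a, b), c in zip(pairs, choice)}
        order = base | extra
        if is_acyclic(nodes, order):
            yield RPoset(nodes, frozenset(order), p1.conflict | p2.conflict)


p1 = RPoset(frozenset({"c14", "m16"}), frozenset({("c14", "m16")}))
p2 = RPoset(frozenset({"d15", "e17"}), frozenset({("d15", "e17")}))
cands = poset_product(p1, p2, {("m16", "d15"), ("e17", "c14")})
first = next(cands)   # anytime use: stop here, or keep iterating for alternatives
print(sorted(first.order))
```

Of the four orientations in this toy example, one closes the cycle c14, m16, d15, e17 and is discarded, leaving three consistent candidates. Enumerating all orientations grows exponentially with the number of cross conflicts, whereas an anytime caller only consumes candidates until the first consistent one.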
The proposed Poset-prod method outperforms the traditional method in both computational efficiency and performance. This improvement becomes even more pronounced as the number and length of sub-formulas increase. This is because the time of generating the NBAs of $\varphi_1, \varphi_2, \cdots$ separately grows linearly, while the time of converting $\varphi_1 \wedge \varphi_2 \wedge \cdots$ into an NBA grows exponentially. Moreover, for a newly added formula, the algorithm updates the final R-poset based on the previous result, which ensures performance in online cases. Finally, it is an anytime algorithm: it can calculate an R-poset for multiple sub-formulas with linear complexity at the expense of optimality. The concrete complexity analysis of Poset-prod is given in Sec. III, and the definitions of R-poset and label are given in Sec. IV.
To give an example, consider four sub-formulas $\varphi_1, \varphi_2, \varphi_3, \varphi_4$ as shown in Fig. 1. Their NBAs $B_1, \cdots, B_4$ can be transformed into four R-posets $P_1, \cdots, P_4$. The first round of Poset-prod is between $P_1$ and $P_2$, yielding $P_1 \otimes P_2 = \{P_{f_1}, P_{f_2}\}$. Then, the following rounds of Poset-prod are performed between the results of the previous round and the next R-poset, e.g., $P_{f_1} \otimes P_3$, of which the first solution is denoted by $P_{f_3}$. As it is an anytime algorithm, it can proceed to the next round before all results of $P_{f_2} \otimes P_3$ are generated, at the expense of some optimality. Finally, we derive $P_{f_4}$ by computing $P_{f_3} \otimes P_4$ as the final R-poset, which satisfies $P_1, \cdots, P_4$. Note that the time to generate the NBAs associated with $\varphi_1$ and $\bigwedge_{i=1}^{k} \varphi_i$ for $k = 2, 3, 4$ via LTL2BA [42] grows significantly, with durations of 0.058s, 0.076s, 1.56s and 25.56s, respectively. By contrast, the time to compute the products $P_1 \otimes P_2$, $P_1 \otimes P_2 \otimes P_3$ and $P_1 \otimes P_2 \otimes P_3 \otimes P_4$ grows much more slowly, with durations of 0.199s, 0.279s and 0.385s. Thus, our method can handle a much larger number of tasks online. Detailed comparisons between our method and traditional methods can be found in Sec. II-D.
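The round structure above is essentially a left fold in which each round's product is consumed lazily. The Python sketch below shows that control flow under stated assumptions: `product` is any function returning an iterator of candidate R-posets (for instance a generator), `cost` is a hypothetical quality measure to be minimized, and `budget` is how many candidates each round may inspect, so `budget=1` corresponds to taking the first solution of every round. The toy product only stands in for Poset-prod so that the example runs.

```python
from itertools import islice
from typing import Callable, Iterator, List, TypeVar

P = TypeVar("P")


def anytime_fold(
    posets: List[P],
    product: Callable[[P, P], Iterator[P]],
    cost: Callable[[P], float],
    budget: int = 1,
) -> P:
    """Fold R-posets left to right, inspecting at most `budget` candidates per round."""
    final = posets[0]
    for nxt in posets[1:]:
        candidates = list(islice(product(final, nxt), budget))  # lazy: stops early
        if not candidates:
            raise ValueError("no consistent product for this round")
        final = min(candidates, key=cost)
    return final


# Toy stand-in for Poset-prod: each round could offer three candidates.
BUILT = {"count": 0}


def toy_product(a: frozenset, b: frozenset) -> Iterator[frozenset]:
    for _ in range(3):
        BUILT["count"] += 1      # counts candidates that are actually generated
        yield a | b


posets = [frozenset({"a"}), frozenset({"b"}), frozenset({"c"}), frozenset({"d"})]
result = anytime_fold(posets, toy_product, cost=len)
print(sorted(result), BUILT["count"])  # ['a', 'b', 'c', 'd'] 3
```

With `budget=1` only one candidate per round is ever generated (3 in total for four R-posets); raising the budget trades computation for the chance of a better intermediate R-poset.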

B. Online Subtask Assignment under Complex Constraints
Once the set of R-posets that satisfies all sub-formulas is generated, the subtasks in the R-poset should be assigned to the agents under several complex constraints, including the temporal constraints from the R-poset, object constraints and cooperative constraints. Similar to the classical Contract Net method [55], we propose an efficient but suboptimal assignment algorithm called the Time Bound Contract Net (TBCN). The main difference is that all these constraints are represented uniformly by time bounds in our method; an example is shown in Fig. 2. The final assignment is a group of timed sequences of robot actions $\mathcal{J} = [J_1, \cdots, J_N]$, where $J_n = [(t_k, \omega_k, a_k), \cdots]$ indicates that agent $n$ executes action $a_k$ at time $t_k$ to satisfy subtask $\omega_k$, such that the newly released tasks are fulfilled.
TBCN consists of three steps: Initialization, Computation of Feasible Subtasks, and Online Bidding. As the partial constraints of the final R-poset $P_{final}$ might change after computing the product, the subtasks that conflict with the updated partial orders are removed in Initialization. Then, the last two steps are iterated: in Computation of Feasible Subtasks, the set of subtasks whose partial orders are satisfied given the current assignment $\mathcal{J}$ is computed; in Online Bidding, a linear program is formulated and solved to choose one subtask from this feasible set. The resulting execution times and action plans are then added to $\mathcal{J}$.
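To make the iteration between Computation of Feasible Subtasks and Online Bidding concrete, the Python sketch below implements a heavily simplified variant. The assumptions are all ours: each subtask has a fixed duration and requires a number of agents of given types; travel times, object constraints and conflict relations are ignored; and the linear program of Online Bidding is replaced by a greedy rule that picks the feasible subtask with the earliest possible finish time. The names `Subtask`, `pick_team` and `tbcn_sketch` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Subtask:
    name: str
    duration: float
    needs: Tuple[Tuple[str, int], ...]   # (agent type, number of agents)


def pick_team(s: Subtask, free_at: Dict[str, float],
              agent_type: Dict[str, str]) -> Optional[List[str]]:
    """Choose the earliest-idle agents of each required type."""
    team: List[str] = []
    for typ, count in s.needs:
        idle = sorted((free_at[a], a) for a in free_at if agent_type[a] == typ)
        if len(idle) < count:
            return None
        team += [a for _, a in idle[:count]]
    return team


def tbcn_sketch(subtasks: List[Subtask], order: Set[Tuple[str, str]],
                agent_type: Dict[str, str]):
    free_at = {a: 0.0 for a in agent_type}           # when each agent becomes idle
    finish: Dict[str, float] = {}                     # finish time of assigned subtasks
    plan: Dict[str, List[Tuple[float, str]]] = {a: [] for a in agent_type}
    preds = {s.name: {a for a, b in order if b == s.name} for s in subtasks}
    pending = {s.name: s for s in subtasks}
    while pending:
        # Computation of feasible subtasks: every predecessor is already assigned.
        feasible = [s for n, s in pending.items() if preds[n].issubset(finish)]
        if not feasible:
            raise ValueError("ordering constraints are cyclic")
        best = None
        for s in feasible:
            team = pick_team(s, free_at, agent_type)
            if team is None:
                raise ValueError(f"not enough agents for {s.name}")
            # Time bound: after all predecessors finish and all team members are idle.
            start = max([finish[p] for p in preds[s.name]] + [free_at[a] for a in team])
            bid = (start + s.duration, s.name, start, team)
            if best is None or bid[:2] < best[:2]:
                best = bid
        end, name, start, team = best                 # greedy stand-in for the LP
        finish[name] = end
        for a in team:
            free_at[a] = end
            plan[a].append((start, name))
        del pending[name]
    return plan, max(finish.values())


agents = {"n1": "nurse", "n2": "nurse", "d1": "doctor"}
tasks = [Subtask("clean", 10, (("nurse", 1),)),
         Subtask("monitor", 5, (("doctor", 1), ("nurse", 1))),
         Subtask("deliver", 8, (("nurse", 1),))]
plan, makespan = tbcn_sketch(tasks, {("clean", "monitor")}, agents)
print(makespan)   # 15.0
```

The three durations sum to 23, so even this toy instance shows how parallel execution under the ordering constraint ("monitor" must wait for "clean") shortens the total completion time.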

C. Numerical Simulation
As shown in Fig. 4, the hospital environment consists of wards, operating rooms, the hall, exits and hallways. The multi-agent system consists of 3 Junior Doctors, 6 Senior Doctors and 8 Nurses, and 4 types of patients are treated as interactive objects, namely Family Visitors, Vomiting Patients, Senior Patients and Junior Patients. The detailed mappings between agents, actions, objects and their labels are shown in Table I. Furthermore, various types of tasks are considered, such as "Go the rounds of the wards and provide medicine", "Check and record the patient status", and "Prepare and perform an operation on a patient". Some tasks are released online under certain conditions. For instance, when a patient vomits in a region, the task "Take the patient into the ward and check his status. Meanwhile, the doctors should not enter this region until it has been cleaned" is released. The sc-LTL formula associated with the complete task is given by $\varphi = \varphi_{b_1} \wedge \cdots \wedge \varphi_{b_6} \wedge \varphi_{VP} \wedge \varphi_{JP} \wedge \varphi^1_{SP} \wedge \varphi^2_{SP} \wedge \varphi_{FV}$, which has a total length of 62. Detailed descriptions of the formulas are given in Table II.
As shown in Fig. 3, each subtask $\omega_i$ consists of the constraints before execution (in blue) and during execution (in green). The directed black arrows denote the ordering constraints, while the bidirectional red arrows denote the conflicting constraints. The mapping from the subtasks within the input R-posets to the subtasks within the final R-poset, e.g., $\omega^1_1 \rightarrow \omega_1$, is shown in brackets. Note that most of the R-posets on the right, such as $P_{b_1}, \cdots, P_{b_6}$ and $P_{u_1}, \cdots, P_{u_5}$, are directly incorporated into the final R-poset on the left. This is because their subtasks are independent without ordering constraints, which allows for parallel execution. Additionally, some subtasks on the left represent the same subtasks on the right. For example, $\omega^1_1$ of $P_{b_2}$ and $\omega^1_1$ of $P_{b_3}$ both represent $\omega_5$, since their ordering constraints (in green) are identical. The same action can thus be performed to satisfy multiple subtasks on the left, which improves the overall efficiency. Furthermore, all subtasks on the right satisfy the partial relations between the corresponding subtasks on the left, while additional constraints are added if there are conflicts among the ordering constraints. For instance, action $D^{H}_{w7,w7}$ of $\omega_{15}$ should not be executed before subtask $\omega_{14}$. Thus, an additional ordering constraint is added such that subtask $\omega_{15}$ is executed after $\omega_{14}$. These properties guarantee that each subtask within the R-posets on the left can be executed, as their partial relations are satisfied. Consequently, the final R-poset satisfies all R-posets on the left. Note that the complete formula has a total length of 62, for which the NBA takes more than 1h to compute. In contrast, the first final R-poset is computed within 10.96s.
The results of task assignment are shown in Fig. 4, including the updates at 40, 180, 215 and 400s, during which 5 additional objects are added. After the current R-posets are modified to accommodate the new formulas, TBCN is executed to assign the new subtasks within the final R-poset. The trajectories of agents and objects are shown, together with certain regions that may not be entered. For instance, when a patient vomits at region h13 in Fig. 4(c), the nurses cannot enter until other agents have cleaned this region. These constraints ensure that both the actions and the trajectories of the agents satisfy the R-poset. In addition, once a new object is added, a new formula is released and the R-poset is updated. All subtasks satisfy the ordering constraints in the R-posets. For instance, as shown in Fig. 4(d), although agents 6 and 14 have arrived in region w7 before 100s to collaborate on the subtask $\sigma_{16} = \{M^{H}_{w7,w7}\}$ (in blue), they have to wait until agent 10 has fulfilled the subtask $\sigma_{14} = \{C^{H}_{w7,w7}\}$, due to the ordering constraints $(\omega_{14}, \omega_{16}) \in\, \preceq$ and $\{\omega_{14}, \omega_{16}\} \in\, \neq$. The constraints introduced by objects are also satisfied, e.g., subtask $\omega_7$ (in green) cannot be executed at 380s before subtask $\omega_6$ (in purple) has been finished, since the required object 3 has not yet been transferred to region $o_4$. Last but not least, most subtasks are executed in parallel, with a total completion time of 815s, which is much shorter than the 2013.5s needed if the subtasks were executed sequentially.

D. Scalability Analysis and Comparisons
First, we show how the computation time of the proposed task assignment method TBCN varies with the number of agents and subtasks. Since the number of subtasks cannot be directly specified, we run TBCN on a large number of formulas with lengths ranging from 20 to 80, and record the number of subtasks and the associated computation time. As shown in Fig. 5, the average computation time increases only slightly as the number of agents grows from 12 to 400, while it increases considerably as the number of subtasks grows from 22 to 34. This is because new subtasks introduce additional temporal constraints into the assignment procedure. The execution efficiency $\eta$ from (6) decreases as the number of agents increases, while $\eta$ increases slightly as the number of subtasks grows. The highest $\eta$ is about 39%, where most subtasks are executed in parallel with only minimal waiting time for task synchronization.
Second, to further validate the scalability of the proposed method against existing methods, we evaluate the time and memory cost to compute both the first R-poset and the complete R-posets by the proposed Poset-prod, given task formulas of different lengths. The conversion from an LTL formula to an NBA is performed via LTL2BA. The following three baselines are considered: (i) the direct translation of the complete formula, i.e., the conjunction of all sub-formulas, into an NBA [42]; (ii) the decomposition-set-based algorithm [56], which decomposes the NBA into independent subtasks; (iii) the sampling-based method [33], [47], which generates a product automaton much smaller than the complete one by sampling the product states of the NBA and the WTSs via RRT. Each method is tested three times with five formulas of the same length, for lengths ranging from 5 to 400. As shown in Fig. 6, both the time and memory cost increase drastically with the formula length for all methods except the proposed Poset-prod for computing the first R-poset. In particular, when the formula length exceeds 15, the decomposition algorithm runs out of memory or time. The sampling-based method cannot generate a solution when the formula length exceeds 20. Since all baseline methods require the translation to an NBA first, they become intractable once the formula length exceeds 25.
In contrast, the proposed Poset-prod can generate all R-posets for formula lengths up to 70, and the first R-poset even when the length reaches 400, which is consistent with our analysis in Sec. III.

E. Hardware Experiment
We further tested the proposed method on hardware in a simplified hospital environment. The multi-agent system consists of 10 differential-drive mobile robots, with 3 junior doctors (JD) in green, 3 senior doctors (SD) in yellow and 4 nurses (Nu) in blue. The required tasks include "prepare and execute an advanced operation for patient 1 at operating room o4". Similar to Sec. II-C, there are event-triggered tasks released online, such as "when a patient vomits, take him to the ward, and do not enter this region until it is cleaned." The exact task descriptions and formulas are shown in Table III.
In summary, the system is assigned 6 sub-formulas, and the final formula is $\varphi = \varphi_{b_1} \wedge \cdots \wedge \varphi_{b_4} \wedge \varphi_{FV} \wedge \varphi_{JP}$, with a total length of 27; thus, it cannot be translated directly into an NBA within reasonable time. The agent trajectories within 185s are shown in Fig. 7. As shown on the right of Fig. 7, the final R-poset consists of 14 subtasks, where the part in red is calculated offline and the part in yellow is generated online. Most subtasks are executed in parallel, meaning that the R-posets are satisfied with high efficiency. The complete R-poset product is derived in 3.1s, and the task assignment method TBCN is activated three times, with an average planning time of 2.7s. As shown on the left of Fig. 8, the Gantt chart is updated twice during execution, at 5s and 100s, and subtasks that are released online are marked by green boxes. It is worth noting that the agent movement during real execution requires more time due to motion uncertainty, communication delay, drifting and collision avoidance. Nonetheless, the proposed method adapts to these fluctuations and still satisfies the specified tasks, as shown in the Gantt chart of the overall execution in Fig. 8. Experiment videos are provided in the supplementary files.

III. DISCUSSION
In this work, we propose Poset-prod, an efficient online task planning algorithm for multi-agent systems where tasks are released dynamically and continually. It consists of a systematic method to convert temporal tasks into their equivalent R-posets, whose products are computed online. Given these R-posets, an anytime task assignment algorithm is proposed to adapt the local plans of the robots online, such that overall safety and correctness are ensured. The overall framework is shown to be fast and efficient, and is thus particularly suitable for large-scale multi-agent systems collaborating in dynamic environments.
The most significant advantage of Poset-prod is that the complexity of obtaining the first solution grows only polynomially with the length of the formulas. Given a set of formulas $\{\varphi_i, i \le m\}$ with $\max_i |\varphi_i| = n$, deriving the first solution has a polynomial complexity of $O(m^2 n^3)$. Although the overall complexity of computing the complete R-posets of $\{\varphi_i\}$ is $O(n^{5m})$, it is still asymptotically much smaller than the complexity of calculating the NBA of the conjunction $\varphi$ via LTL2BA [42], which is $O(mn \cdot 2^{mn})$. Thus, our method can plan for task formulas with a length of about 400 within about 50s, while most existing methods fail at a formula length of 25.
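A quick way to compare the three bounds is to evaluate their base-10 logarithms with constant factors ignored. The helper name `log10_costs` is ours, and the numbers only illustrate the asymptotic bounds; they are not predicted run times.

```python
import math


def log10_costs(m: int, n: int) -> dict:
    """log10 of the complexity bounds quoted above (constants ignored).

    m: number of sub-formulas, n: maximum sub-formula length.
    """
    return {
        "first_rposet": math.log10(m**2 * n**3),                          # O(m^2 n^3)
        "all_rposets": 5 * m * math.log10(n),                             # O(n^(5m))
        "ltl2ba_conjunction": math.log10(m * n) + m * n * math.log10(2),  # O(mn 2^(mn))
    }


for n in (10, 50):
    costs = log10_costs(m=4, n=n)
    print(n, {k: round(v, 1) for k, v in costs.items()})
```

For m = 4, the bound for the complete product exceeds the LTL2BA bound at n = 10 but falls far below it at n = 50: ignoring the polynomial factor mn, the exponent comparison 5m log2 n < mn holds for every n ≥ 23. The first-solution bound stays small in both cases, which is what matters for online planning.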
Future work involves two directions: (i) extending sc-LTL task formulas to general LTL and other languages such as CTL [24] and STL [25]; it remains unclear how general LTL formulas with always operators can be incorporated into the R-poset, especially given their prefix-suffix structure; (ii) considering unknown and uncertain environments modeled as Markov Decision Processes (MDPs), in which case a reactive high-level plan is essential to take into account all possible environment behaviors.

IV. MATERIALS AND METHODS
In this section, we review LTL in Preliminaries, define the alphabet and the objective function in Problem Formulation, and detail the algorithms in Approach.

A. Preliminaries
As mentioned in Sec. II-A, the basic ingredients of Linear Temporal Logic (LTL) formulas are a set of atomic propositions $AP$ in addition to several Boolean and temporal operators. For a given LTL formula $\varphi$, there exists a Nondeterministic Büchi Automaton (NBA) defined as follows.
Definition 1. An NBA $\mathcal{A} \triangleq (S, \Sigma, \delta, (S_0, S_F))$ is a 4-tuple, where $S$ is the set of states; $\Sigma = 2^{AP}$ is the alphabet; $\delta: S \times \Sigma \rightarrow 2^S$ is the transition relation; and $S_0, S_F \subseteq S$ are the sets of initial and accepting states.
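Definition 1 can be encoded directly. In the Python sketch below (class and method names are hypothetical), $\delta$ is stored as guarded edges, so that $s' \in \delta(s, \sigma)$ iff the guard of an edge from $s$ to $s'$ holds on the letter $\sigma$; this avoids enumerating all of $2^{AP}$. The acceptance check tracks all runs on a finite word and assumes, as is typical for automata of sc-LTL formulas, that accepting states carry a True self-loop, so that reaching $S_F$ once suffices. It is not a general Büchi acceptance check for infinite words.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Iterable, List, Set, Tuple

Letter = FrozenSet[str]                  # a letter sigma in 2^AP
Guard = Callable[[Letter], bool]


@dataclass
class NBA:
    states: Set[str]                     # S
    initial: Set[str]                    # S_0
    accepting: Set[str]                  # S_F
    edges: List[Tuple[str, Guard, str]]  # (s, guard, s')

    def delta(self, s: str, sigma: Letter) -> Set[str]:
        """delta(s, sigma): successors of s whose edge guard holds on sigma."""
        return {dst for src, guard, dst in self.edges if src == s and guard(sigma)}

    def accepts_good_prefix(self, word: Iterable[Letter]) -> bool:
        """True if some run on the finite word reaches S_F (sc-LTL assumption)."""
        current = set(self.initial)
        if current & self.accepting:
            return True
        for sigma in word:
            current = set().union(*(self.delta(s, sigma) for s in current))
            if current & self.accepting:
                return True
        return False


def holds(*props: str) -> Guard:
    """Guard that is true when all given propositions are in the letter."""
    return lambda sigma: all(p in sigma for p in props)


# Hand-built NBA for F(p and F q)
nba = NBA(
    states={"s0", "s1", "s2"}, initial={"s0"}, accepting={"s2"},
    edges=[("s0", holds(), "s0"), ("s0", holds("p"), "s1"), ("s0", holds("p", "q"), "s2"),
           ("s1", holds(), "s1"), ("s1", holds("q"), "s2"), ("s2", holds(), "s2")],
)
print(nba.accepts_good_prefix([frozenset(), frozenset({"p"}), frozenset({"q"})]))  # True
```

The word {q}{p} is not accepted, because q has not yet been observed after p.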
An infinite word $w$ over the alphabet $2^{AP}$ is defined as an infinite sequence $w = \sigma_1 \sigma_2 \cdots$, with $\sigma_i \in 2^{AP}$. The language of $\varphi$ is defined as the set of words that satisfy $\varphi$, namely, $L(\varphi) = Words(\varphi) = \{w \mid w \models \varphi\}$, where $\models$ is the satisfaction relation. Additionally, the resulting run of $w$ within $\mathcal{A}$ is an infinite sequence $\rho = s_0 s_1 s_2 \cdots$ such that $s_0 \in S_0$, $s_i \in S$ and $s_{i+1} \in \delta(s_i, \sigma_{i+1})$ hold for all indices $i \ge 0$. A run is called accepting if $\inf(\rho) \cap S_F \neq \emptyset$, where $\inf(\rho)$ is the set of states that appear in $\rho$ infinitely often. Syntactically co-safe formulas (sc-LTL) [23], [53] are a special class of LTL formulas that can be satisfied by finite words. They only contain the temporal operators $\bigcirc$, $\mathsf{U}$ and $\Diamond$, and are written in positive normal form, where the negation operator $\neg$ is not allowed before the temporal operators. A relaxed partially ordered set (R-poset) over an NBA $B_\varphi$ is defined as follows.
Definition 2 (R-poset).
A word is accepting if it satisfies all the constraints imposed by $P_\varphi$. Moreover, every word of the R-poset satisfies the NBA, i.e., $Words(P_\varphi) \subset Words(\varphi)$.
Definition 3 (Language of R-poset) [54]. Given a word $w = \sigma'_1 \sigma'_2 \cdots$ satisfying the R-poset $P_\varphi = (\Omega_\varphi, \preceq_\varphi, \neq_\varphi)$, denoted as $w \models P_\varphi$, it holds that: i) given $\omega_{i_1} = (i_1, \sigma_{i_1}, \sigma^s_{i_1}) \in \Omega_\varphi$, there exist … iii) $\forall (\omega_{i_1}, \cdots, \omega_{i_n}) \in\, \neq_\varphi$, $\exists \ell \le n$, $\sigma_{i_\ell} \not\subseteq \sigma'_{j_1}$. The language of the R-poset $P_\varphi$ is the set of all words $w$ that satisfy $P_\varphi$, denoted by $L(P_\varphi) \triangleq \{w \mid w \models P_\varphi\}$.
Fig. 9. Framework of the proposed method. For a new input formula $\varphi_i$, we first transform it into an R-poset $P_i$ with the method in Sec. IV-A. Then, the Poset-prod between $P_i$ and $P_{final}$ is calculated as in Sec. IV-C1. Third, after the R-poset $P_{final}$ is updated, the TBCN method in Sec. IV-C2 determines and adjusts the task sequence of each agent and object. New tasks are released when the agents execute their local plans and detect new objects online.
Assuming that $\mathcal{P}_{b_i} = \{P^{b_i}_1, P^{b_i}_2, \cdots\}$ is the set of all possible R-posets of the NBA $B_{b_i}$, it is shown in Lemma 3 of our previous work [54] that: (i) $L(P^{b_i}_j) \subseteq L(B_{b_i})$ for all $P^{b_i}_j \in \mathcal{P}_{b_i}$; (ii) $\bigcup_j L(P^{b_i}_j) = L(B_{b_i})$. In other words, the R-posets retain all the information needed for the subsequent steps.
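The following Python sketch checks a simplified reading of Definition 3, and every modelling choice in it is an assumption: each subtask is reduced to its label $\sigma_i$ (the constraint $\sigma^s_i$ is ignored); $(\omega_a, \omega_b) \in \preceq$ is read as $j_a \le j_b$, where $j_i$ is the index at which subtask $\omega_i$ is satisfied; and clause iii) is checked at the index chosen for the first subtask of each conflict tuple. Indices are chosen greedily in topological order as the earliest admissible letter, which is complete for the ordering constraints alone but may reject some words that another choice of indices would accept when conflicts are present. The function names are hypothetical, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter
from typing import Dict, FrozenSet, List, Optional, Sequence, Set, Tuple

Letter = FrozenSet[str]


def witness(word: Sequence[Letter], labels: Dict[str, Letter],
            order: Set[Tuple[str, str]]) -> Optional[Dict[str, int]]:
    """Earliest index j_i where each label sigma_i holds, with j_a <= j_b for a ⪯ b."""
    preds: Dict[str, Set[str]] = {w: set() for w in labels}
    for a, b in order:
        preds[b].add(a)
    idx: Dict[str, int] = {}
    for w in TopologicalSorter(preds).static_order():   # predecessors come first
        lo = max((idx[p] for p in preds[w]), default=0)
        j = next((k for k in range(lo, len(word)) if labels[w] <= word[k]), None)
        if j is None:
            return None
        idx[w] = j
    return idx


def satisfies(word: Sequence[Letter], labels: Dict[str, Letter],
              order: Set[Tuple[str, str]], conflicts: List[Tuple[str, ...]]) -> bool:
    idx = witness(word, labels, order)
    if idx is None:
        return False
    for group in conflicts:
        j1 = idx[group[0]]
        # Clause iii): some member's label must fail at the chosen letter.
        if all(labels[w] <= word[j1] for w in group):
            return False
    return True


labels = {"w14": frozenset({"clean_w7"}), "w16": frozenset({"monitor_w7"})}
order = {("w14", "w16")}
conflicts = [("w14", "w16")]
ok = [frozenset(), frozenset({"clean_w7"}), frozenset({"monitor_w7"})]
clash = [frozenset({"clean_w7", "monitor_w7"})]
print(satisfies(ok, labels, order, conflicts), satisfies(clash, labels, order, conflicts))
```

The second word fails because cleaning and monitoring region w7 would happen in the same letter, mirroring the $\omega_{14}$/$\omega_{16}$ constraints discussed in Sec. II-C.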

B. Problem Formulation
1) Collaborative Multi-agent Systems: Consider a workspace $W \subset \mathbb{R}^2$ with $M$ regions of interest denoted as $\mathcal{W} \triangleq \{W_1, \cdots, W_M\}$, where $W_m \subset W$. Furthermore, there is a group of agents denoted by $\mathcal{N} \triangleq \{1, \cdots, N\}$ with different types $\mathcal{L} \triangleq \{1, \cdots, L\}$. More specifically, each agent $n \in \mathcal{N}$ belongs to exactly one type $l = M_{type}(n)$, where $M_{type}: \mathcal{N} \rightarrow \mathcal{L}$. Each type of agents $l \in \mathcal{L}$ is capable of providing a set of different actions denoted by $\mathcal{A}_l$. The set of all actions is denoted as $\mathcal{A}_a = \bigcup_{l \in \mathcal{L}} \mathcal{A}_l = \{a_1, \cdots, a_{n_C}\}$. Without loss of generality, the agents can navigate between regions via the same transition graph $G = (\mathcal{W}, \rightarrow_G, d_G)$, where $\rightarrow_G \subseteq \mathcal{W} \times \mathcal{W}$ represents the allowed transitions, and $d_G: \rightarrow_G \rightarrow \mathbb{R}^+$ maps each transition to its duration.
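Since $d_G$ assigns a duration to every allowed transition, minimum travel durations between regions follow from a shortest-path search on $G$. The Python sketch below uses Dijkstra's algorithm; treating such durations as inputs to the time bounds used during assignment is our illustrative assumption, and the function name and region labels are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple


def travel_times(edges: Dict[Tuple[str, str], float], source: str) -> Dict[str, float]:
    """Dijkstra over G = (W, ->_G, d_G): minimum travel duration from source to each region."""
    adj: Dict[str, List[Tuple[str, float]]] = {}
    for (u, v), d in edges.items():
        adj.setdefault(u, []).append((v, d))
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                      # stale heap entry
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


# Directed transitions (u, v) with durations d_G(u, v) in seconds
edges = {("h1", "h2"): 5, ("h2", "w7"): 3, ("h1", "w7"): 10, ("w7", "o4"): 4}
print(travel_times(edges, "h1"))  # {'h1': 0.0, 'h2': 5.0, 'w7': 8.0, 'o4': 12.0}
```

Regions not reachable from the source are simply absent from the result, which a planner could treat as an infinite travel time.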
Moreover, there is a set of interactive objects $\mathcal{O} \triangleq \{o_1, \cdots, o_U\}$ of several types $\mathcal{T} \triangleq \{T_1, \cdots, T_H\}$ scattered across the workspace $W$. These objects can be transported by the agents from one region to another. An interactive object $o_u \in \mathcal{O}$ is described by a three-tuple:
$$o_u \triangleq (T_{h_u}, t_u, W^p_u), \qquad (1)$$
where $T_{h_u} \in \mathcal{T}$ is the type of the object; $t_u \in \mathbb{R}^+$ is the time when $o_u$ appears in the workspace $W$; $W^p_u: \mathbb{R}^+ \rightarrow \mathcal{W}$ is a function such that $W^p_u(t)$ returns the region of $o_u$ at time $t \ge t_u$; and $W^p_u(t_u) \subseteq W$ is its initial region. Additionally, new objects appear in $W$ over time and are then added to the set $\mathcal{O}$. With a slight abuse of notation, we denote by $\mathcal{O}_{in}$ the set of initial objects that already exist in $W$ and by $\mathcal{O}_{on}$ the set of online objects that are added during execution, i.e., $\mathcal{O} = \mathcal{O}_{in} \cup \mathcal{O}_{on}$.
To interact with the objects, the agents can provide a series of collaborative behaviors $\mathcal{C} \triangleq \{C_1, \ldots, C_K\}$. A collaborative behavior $C_k \in \mathcal{C}$ is a tuple that specifies the following elements: $o_{u_k} \in \mathcal{O} \cup \{\emptyset\}$ is the interactive object, if any; $a_i \in \mathcal{A}_a$ is a required cooperative action; $0 < n_i \le N$ is the number of agents that must provide the action $a_i$; and $L_k$ is the set of action indices associated with the behavior $C_k$. Also, $d_k$ denotes the execution time of $C_k$.
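Since each agent contributes exactly one action $a_k$ to a behavior, as in the entries $(t_k, C_k, a_k)$ of the agent sequences $J_n$ defined in the task specification, deciding whether a group of agents can jointly provide all pairs $(a_i, n_i)$, $i \in L_k$, is a bipartite matching problem between agents and required action slots. The sketch below checks this with Kuhn's augmenting-path algorithm. It is an illustration under that one-action-per-agent assumption, not part of the proposed method; the function name and the action and agent names are hypothetical.

```python
from typing import Dict, List, Optional, Set


def can_staff(requirements: Dict[str, int],
              agent_actions: Dict[str, Set[str]]) -> Optional[Dict[str, str]]:
    """Return an agent -> action assignment covering every (a_i, n_i), or None.

    requirements:  action a_i -> number n_i of agents that must provide it
    agent_actions: agent n -> actions A_{M_type(n)} of its type
    """
    # One slot per required agent, e.g. {"lift": 2} -> ["lift", "lift"].
    slots: List[str] = [a for a, n in requirements.items() for _ in range(n)]
    owner: List[Optional[str]] = [None] * len(slots)

    def augment(agent: str, seen: Set[int]) -> bool:
        # Try to give `agent` a slot, re-routing the current owner if needed.
        for s, action in enumerate(slots):
            if s in seen or action not in agent_actions[agent]:
                continue
            seen.add(s)
            if owner[s] is None or augment(owner[s], seen):
                owner[s] = agent
                return True
        return False

    for agent in agent_actions:
        augment(agent, set())
    if any(o is None for o in owner):
        return None  # some (a_i, n_i) cannot be covered by distinct agents
    return {agent: slots[s] for s, agent in enumerate(owner)}
```

For example, two lifters and one scanner can be staffed by agents `r1` (lift), `r2` (lift or scan) and `r3` (lift or scan), while a demand for two scanners with only one capable agent returns `None`.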
A behavior can only be executed if the required object is at the desired region.Since objects can only be transported by the agents, it is essential for the planning process to find the correct order of these transportation behaviors.Related works [57], [50] build a transition system to model the interaction between objects and agents, the size of which grows exponentially with the number of agents and objects.
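2) Task Specifications: Consider the following two types of atomic propositions: (i) $p^l_m$ is true when any agent of type $l \in \mathcal{L}$ is in region $W_m \in \mathcal{W}$; $p^r_m$ is true when any object of type $r \in \mathcal{T}$ is in region $W_m$. Given these propositions, the team-wise task specification is given as an sc-LTL formula over $\{p, c\}$: $$\varphi = \bigwedge_{i=1}^{I} \varphi_{b_i} \wedge \bigwedge_{o_u \in \mathcal{O}_{on}} \varphi_{e_u}, \tag{3}$$ where $\{\varphi_{b_i}, i = 1, \cdots, I\}$ and $\{\varphi_{e_u}, o_u \in \mathcal{O}_{on}\}$ are two sets of sc-LTL formulas over $\{p, c\}$. The formulas $\varphi_{b_i}$ are specified in advance, while $\varphi_{e_u}$ is generated online when a new object $o_u$ is added to $\mathcal{O}_{on}$.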
To satisfy the LTL formula $\varphi$, the complete action sequence of all agents is defined as $\mathcal{J} \triangleq \{J_n, n \in \mathcal{N}\}$, where $J_n \in \mathcal{J}$ is a sequence of triples $(t_k, C_k, a_k)$, meaning that agent $n$ executes behavior $C_k$ by providing the collaborative service $a_k \in \mathcal{A}_a$ at time $t_k$. In turn, the action sequences of the interactive objects are defined as $\mathcal{J}_o \triangleq \{J_{o_u}, o_u \in \mathcal{O}\}$, where $J_{o_u} = (t_k, C_k) \cdots$ is the sequence of tasks associated with object $o_u \in \mathcal{O}$. The task pair $(t_k, C_k)$ is added to $J_{o_u}$ if object $o_u$ joins behavior $C_k$ at time $t_k$. Assuming that the duration of formula $\varphi_{b_i}$ or $\varphi_{e_i}$ from being published to being satisfied is given by $D$, the average efficiency $\eta$ is defined as the percentage of time when actions are
Problem 1. Given the sc-LTL formula in (3), synthesize and update the motion and action sequences $\mathcal{J}$ of the agents and $\mathcal{J}_o$ of the objects to satisfy $\varphi$ and maximize the execution efficiency $\eta$.
Although maximizing the task efficiency of a multi-agent system is a classical problem, the combination of interactive objects, long formulas and contingent tasks imposes new challenges in terms of exponential complexity [42], [50] and online adaptation [51].

C. Approach
As shown in Fig. 9, when a new R-poset is generated, the proposed solution realizes the requirement through two main components: i) Product of R-posets, where the product of existing R-posets is computed incrementally; ii) Task assignment, where subtasks are assigned to the agents given the temporal and spatial constraints specified in the R-poset.
1) Product of R-posets: As shown in the first two steps of Fig. 9, when a new formula $\varphi_i \in \Phi_b \cup \Phi_u$ is added, it is first transformed into an NBA. Then, an R-poset $P_i$ is generated by the method proposed in our previous work [54]. The other R-poset involved in the calculation, $P_{final}$, is the previous result of Poset-prod and will be updated after this round of calculation. With $P_{final}$ as $P_1 = (\Omega_1, \preceq_1, \neq_1)$ and $P_i$ as $P_2 = (\Omega_2, \preceq_2, \neq_2)$, we define the product of R-posets as follows.
Definition 4 (Product of R-posets). Given a finite word $w_0$, the product of two R-posets $P_1, P_2$ is defined as a set of R-posets $\mathcal{P}_r = \{P'_1, P'_2, \cdots\}$, denoted as $\mathcal{P}_r = P_1 \otimes P_2$, where the R-posets $P'_i$ satisfy two conditions: (i) if $w_0 w \in L(P_1)$ and $w \in L(P_2)$, then $w_0 w \in \bigcup_{P'_i \in \mathcal{P}_r} L(P'_i)$; (ii) if $w_0 w \in L(P'_i)$ for some $P'_i \in \mathcal{P}_r$, then $w_0 w \in L(P_1)$ and $w \in L(P_2)$.
The word $w_0$ has already been executed and contains the finished subtasks $\Omega_1^{finish}$ of $P_1$. Thus, $w_0$ cannot influence $P_2$, whose subtasks will be executed in the future. In particular, if the new formula is offline, i.e., $\varphi_i \in \Phi_b$, and the agents have not started to execute subtasks, $w_0$ is set to be empty. As shown in Fig. 9, Poset-prod consists of the following two steps.
(i) Task Composition. A depth-first search (DFS) [58] is used to add the subtasks of $\Omega_2$ to $\Omega_1$ in order, and these mapping relations are recorded in $M_\Omega$. The search sequence of DFS is initialized as $que = [(\Omega' = \Omega_1, M_\Omega = \emptyset)]$. Then, in each iteration, we fetch the first node of $que$ as $(\Omega', M'_\Omega)$. The next unmixed subtask of $\Omega_2$, denoted $\omega^2_i$, can be merged into a subtask $\omega'_j$ of $\Omega'$ according to (7), which means that the subtask $\omega'_j$ in $\Omega'$ can satisfy both $\omega'_j$ and $\omega^2_i$. Moreover, for the subtask $\omega^2_i$, we can always create a set of subtasks $\hat{\Omega}'$ and the corresponding mapping function $\hat{M}'_\Omega$ by appending $\omega^2_i$ to $\hat{\Omega}'$ as $\hat{\omega}'_j$ such that $\hat{\omega}'_j \models \omega^2_i$ holds, which means that the subtask $\hat{\omega}'_j$ can be executed to satisfy $\omega^2_i$. This step ends when the time budget $t_b$ or the search sequence $que$ is exhausted. Once $|D(M'_\Omega)| = |\Omega_2|$, a set $\Omega'$ satisfying both $\Omega_1$ and $\Omega_2$ has been found, and the next step is triggered. As shown in Fig. 9, one of the found combinations is $M'_\Omega(1) = 1$, $M'_\Omega(3) = 2$, $M'_\Omega(4) = 3$, where $\omega^f_1$ is created by (7), and $\omega^f_3, \omega^f_4$ are created by (8). (ii) Relation Update. Given the set of subtasks $\Omega'$ and the mapping function $M_\Omega$, we calculate the partial relations among them and build a product R-poset $P$. Firstly, we construct the "less-equal" constraint $\preceq$ as in (9), which inherits both the less-equal relations $\preceq_1$ in $P_1$ and $\preceq_2$ in $P_2$. Then, we update $\Omega'$ to account for the constraints imposed by the self-loop labels of other subtasks. Specifically, if a new relation $(\omega_i, \omega_j)$ is added to $\preceq$ because $(M^{-1}_\Omega(\omega^2_i), M^{-1}_\Omega(\omega^2_j)) \in \preceq_2$ while $(\omega'_i, \omega'_j) \notin \preceq_1$, then $\omega_i$ is required to be executed before $\omega_j$ even though $(\omega'_i, \omega'_j)$ does not belong to $\preceq_1$ of $P_1$. In this case, $\sigma_i$ and $\sigma^s_i$ are updated to guarantee the satisfaction of the self-loop labels $\sigma^s_j$ before executing $\sigma_j$. For each subtask $\omega_i$, the newly added suf-subtasks from $\preceq_1$ and $\preceq_2$ are defined as $S^1_p$ and $S^2_p$, i.e., $$S^1_p = \{\omega_j \mid (\omega_i, \omega_j) \in \preceq,\ (\omega'_i, \omega'_j) \notin \preceq_1\},$$ $$S^2_p = \{M^{-1}_\Omega(\omega_j) \mid (\omega_i, \omega_j) \in \preceq,\ (M^{-1}_\Omega(\omega^2_i), M^{-1}_\Omega(\omega^2_j)) \notin \preceq_2\}, \tag{10}$$ where $S^1_p$ and $S^2_p$ are
the subtasks that should be executed after $\omega_i$. Thus, the action labels $\sigma_i$ and self-loop labels $\sigma^s_i$ in $\omega_i = (i, \sigma_i, \sigma^s_i)$ are updated accordingly as in (11), in which $\sigma^s_i$ and $\sigma_i$ are executed under the additional self-loop labels of the subtasks in $S^1_p$ and $S^2_p$, so that these self-loop labels are satisfied. Finally, we find potential orderings by checking whether the self-loop label $\sigma^s_i$ of a subtask conflicts with the label $\sigma_j$ of another subtask while $(\omega_i, \omega_j) \notin \preceq$. If so, an additional ordering $(\omega_i, \omega_j)$ is added to $\preceq$. Then, $\Omega$ is updated following (10) and (11). For the set of subtasks $\Omega$ that have no conflicts in $\preceq$, the "not-equal" relation $\neq$ is generated by a simple combination as in (13). The resulting poset $P_i = (\Omega, \preceq, \neq)$ is added to $P_{final}$. As shown in Fig. 9, Relation Update yields two R-posets, whose partial relations are inherited from $P_i$ and $P_{final}$. Owing to the anytime property, the two-step procedure can be repeated until all possible R-posets are found, or terminated as soon as the first R-poset is found.
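The two steps of Poset-prod can be sketched as follows. This is a simplified illustration, not the authors' implementation. It assumes that merging a subtask of $\Omega_2$ into a subtask of $\Omega'$ takes the union of their action labels, that a caller-supplied predicate `compatible` (hypothetical) decides whether such a merge is allowed, and that (9) equals $\preceq_1$ together with $\preceq_2$ mapped through $M_\Omega$. Self-loop labels, the label update (11) and the conflict relation are omitted, and $S^2_p$ is returned as indices of $\Omega'$ rather than through $M^{-1}_\Omega$. All function names are hypothetical.

```python
import time
from typing import Callable, Dict, FrozenSet, Iterator, List, Set, Tuple

Label = FrozenSet[str]          # action label sigma of a subtask (self-loops omitted)
Order = Set[Tuple[int, int]]    # "less-equal" pairs over subtask indices


def compose(omega1: List[Label], omega2: List[Label],
            compatible: Callable[[Label, Label], bool],
            budget_s: float = 1.0) -> Iterator[Tuple[List[Label], Dict[int, int]]]:
    """Task Composition: DFS over the ways of mixing omega2 into omega1.

    Each omega2[i] is either merged (label union) into an unused subtask of
    omega1 or appended as a new subtask; m maps an omega2 index to an index
    of the composed list Omega'. Stops when the stack or the budget runs out.
    """
    deadline = time.monotonic() + budget_s
    stack: List[Tuple[List[Label], Dict[int, int]]] = [(list(omega1), {})]
    while stack and time.monotonic() < deadline:
        omega_p, m = stack.pop()
        i = len(m)                          # next unmixed subtask of omega2
        if i == len(omega2):                # |D(M)| == |Omega_2|: complete
            yield omega_p, m
            continue
        used = set(m.values())
        children = []
        for j in range(len(omega1)):        # option 1: merge into omega1[j]
            if j not in used and compatible(omega_p[j], omega2[i]):
                merged = list(omega_p)
                merged[j] = omega_p[j] | omega2[i]
                children.append((merged, {**m, i: j}))
        # Option 2: appending omega2[i] as a fresh subtask is always possible.
        children.append((omega_p + [omega2[i]], {**m, i: len(omega_p)}))
        stack.extend(reversed(children))    # pop children left to right


def relation_update(le1: Order, le2: Order, m: Dict[int, int], n: int):
    """Merge the orders and collect newly added successors per subtask.

    Returns (le, s1, s2): le is le1 plus le2 mapped through m; s1[i] / s2[i]
    hold successors j of i whose pair is new w.r.t. le1 / le2.
    """
    le = set(le1) | {(m[i], m[j]) for (i, j) in le2}
    inv = {v: k for k, v in m.items()}      # Omega' index -> omega2 index
    s1: Dict[int, Set[int]] = {i: set() for i in range(n)}
    s2: Dict[int, Set[int]] = {i: set() for i in range(n)}
    for i, j in le:
        if (i, j) not in le1:
            s1[i].add(j)
        if not (i in inv and j in inv and (inv[i], inv[j]) in le2):
            s2[i].add(j)
    return le, s1, s2


if __name__ == "__main__":
    a, b, c = frozenset({"a"}), frozenset({"b"}), frozenset({"c"})

    def ok(x: Label, y: Label) -> bool:
        # Hypothetical rule: propositions a and b may not hold in one label.
        return not {"a", "b"} <= (x | y)

    print(len(list(compose([a, b], [a, c], ok))))  # 5
    print(relation_update({(0, 1)}, {(0, 1)}, {0: 1, 1: 2}, 3)[1])  # {0: set(), 1: {2}, 2: set()}
```

Because `compose` is a generator, the first complete composition is available as soon as the DFS reaches a leaf, which mirrors the anytime property: the caller can stop after the first result or keep iterating until the budget runs out.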
2) Task Assignment: To satisfy the final R-poset $P_{final} = (\Omega_f, \preceq_f, \neq_f)$, the subtasks $\Omega_f$ should be executed under the partial orders $\preceq_f$ and $\neq_f$. Each subtask $\omega_i \in \Omega_f$ represents a collaborative behavior $C_j$. Thus, we can redefine the action sequence of each agent $J_n \in \mathcal{J}$ as $J_n = [(t_k, \omega_k, a_k), \cdots]$ and the action sequence of each object $J_{o_u} \in \mathcal{J}_o$ as $J_{o_u} = [(t_k, \omega_k), \cdots]$, in which the cooperative behavior $C_k$ is replaced by $\omega_k$ since $C_k \in \sigma_k$.
We propose a sub-optimal algorithm called Time Bound Contract Net (TBCN) to generate and adapt the action sequences of the agents and objects. Compared with the classical Contract Net method [55], the main differences are: (i) the partial orders $\preceq_f, \neq_f$ of the R-poset may change when a new formula is added; (ii) the assigned subtasks should satisfy the partial orders in $\preceq_f, \neq_f$; (iii) cooperative tasks should be fulfilled by multiple agents; and (iv) interactive objects should be considered as additional constraints. TBCN handles these differences in three steps. First, a cancellation mechanism is designed in the initialization to adapt to the changes of the R-poset mentioned in (i). Second, the partial orders in (ii) are guaranteed by assigning only feasible subtasks, rather than all unassigned subtasks, in each loop. Third, the constraints mentioned in (iii) and (iv) are modeled as a time bound $t' \in \mathbb{R}_+$ in the bidding process.
(i) Initialization: Once the R-poset $P_{final}$ is updated by Poset-prod, we first collect the action sequences $\mathcal{J}, \mathcal{J}_o$ of the previous solution and the set of finished subtasks $\Omega_{finish}$ from executing the word $w_0$. In the first round, all of them are empty. Then, a set of essential conflict subtasks $\Omega_{ec}$ is defined to collect the subtasks that might conflict with the updated partial orders $\preceq_f, \neq_f$. Next, we compute the set of subtasks $\Omega_{conf}$ that should be removed from $\mathcal{J}, \mathcal{J}_o$, which consists of the subtasks in $\Omega_{ec}$ and the subtasks whose pre-subtasks will be removed. After removing all the subtasks in $\Omega_{conf}$ from the action sequences $\mathcal{J}, \mathcal{J}_o$, we initialize the set of assigned subtasks as $\Omega_{as} = \{\omega_i \mid (t_i, \omega_i, a_i) \in J_n, n \in \mathcal{N}\}$ and the set of unassigned subtasks as $\Omega_u = \Omega_f \setminus \Omega_{as}$.
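A minimal sketch of this cancellation step, assuming $\Omega_{ec}$ is already given (its defining equation is not reproduced here), subtasks are integer indices, and already finished subtasks are never cancelled. Only the agent sequences $\mathcal{J}$ are shown; the object sequences $\mathcal{J}_o$ would be filtered the same way. The function and parameter names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Entry = Tuple[float, int, str]   # (t_k, omega_k, a_k) in an agent's sequence J_n


def initialize(omega_f: Set[int], le_f: Set[Tuple[int, int]],
               plan: Dict[str, List[Entry]], omega_ec: Set[int],
               finished: Set[int]):
    """Cancel conflicting subtasks and rebuild Omega_as and Omega_u."""
    succ: Dict[int, Set[int]] = {}
    for i, j in le_f:
        succ.setdefault(i, set()).add(j)
    # Omega_conf: Omega_ec plus, transitively, every subtask with a removed
    # pre-subtask. Finished subtasks are assumed never to be cancelled.
    conf = set(omega_ec) - finished
    queue = deque(conf)
    while queue:
        for j in succ.get(queue.popleft(), ()):
            if j not in conf and j not in finished:
                conf.add(j)
                queue.append(j)
    kept = {n: [e for e in seq if e[1] not in conf] for n, seq in plan.items()}
    assigned = {w for seq in kept.values() for (_, w, _) in seq}
    return kept, conf, assigned, omega_f - assigned
```

For instance, with $\preceq_f = \{(1,2), (2,3)\}$ and $\Omega_{ec} = \{2\}$, subtask 3 is cancelled as well because its pre-subtask 2 is removed, and both return to the unassigned set.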
(ii) Computation of Feasible Subtasks: After initialization, the algorithm starts a loop to assign the subtasks in $\Omega_u$: computing a set of feasible subtasks $\Omega_{fe}$, calculating their time bounds, and choosing the best subtask. For the ordering constraints $\preceq_f$, if $(\omega_j, \omega_i) \in \preceq_f$, appending $(t_{k_j}, \omega_j, a_{k_j})$ to a task sequence $J_i = \cdots (t_{k_i}, \omega_i, a_{k_i})$ would violate this constraint. Thus, given the current $\Omega_{as}$, $\Omega_{fe}$ is defined as $$\Omega_{fe} = \{\omega_i \mid \omega_i \in \Omega_u,\ \forall (\omega_j, \omega_i) \in \preceq_f,\ \omega_j \in \Omega_{as}\}, \tag{16}$$ which eliminates the subtasks that may lead to infeasible action sequences.
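Equation (16) translates directly into a set comprehension. The sketch below precomputes the predecessors of each subtask from $\preceq_f$ and ignores reflexive pairs, which is an assumption about how $\preceq_f$ is stored; the names are hypothetical.

```python
from typing import Dict, Set, Tuple


def feasible_subtasks(unassigned: Set[int], assigned: Set[int],
                      le_f: Set[Tuple[int, int]]) -> Set[int]:
    """Eq. (16): unassigned subtasks all of whose predecessors are assigned."""
    preds: Dict[int, Set[int]] = {}
    for j, i in le_f:                  # (omega_j, omega_i): omega_j precedes omega_i
        if j != i:                     # ignore reflexive pairs, if stored
            preds.setdefault(i, set()).add(j)
    return {i for i in unassigned if preds.get(i, set()) <= assigned}
```

With $\Omega_u = \{2, 3, 4\}$, $\Omega_{as} = \{1\}$ and $\preceq_f = \{(1,2), (2,3)\}$, this yields $\{2, 4\}$: subtask 3 becomes feasible only after subtask 2 has been assigned, so recomputing the set after each bid releases successors one layer at a time.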
(iii) Online Bidding: Then, we try assigning each subtask in $\Omega_{fe}$ and choose only the one with the best result. Without loss of generality, we assume that subtask $\omega_i \in \Omega_{fe}$ requires a label $(C_k)^{u_k}_{k_1,k_2}$, which means that the agents need to execute the behavior $C_k$ from region $W_{k_1}$ to region $W_{k_2}$ using object $u_k$. Each constraint mentioned in (ii), (iii) and (iv) can be expressed as a time bound $t' \in \mathbb{R}_+$, meaning that the constraint is satisfied after $t'$. Here, we use three kinds of time bounds: the global time bound $t_{\omega_i}$, the object time bound $t_{o_{u_k}}$, and the set of local time bounds $T_s$. The global time bound $t_{\omega_i}$ is the time instant such that the ordering constraints $\preceq_f$ and the conflict constraints $\neq_f$ are satisfied if behavior $(C_k)^{u_k}_{k_1,k_2}$ is executed after $t_{\omega_i}$: $$t_{\omega_i} \ge t_j,\ \forall (j, i) \in \preceq_f,\ \omega_j \in \Omega_{as}; \qquad t_{\omega_i} \ge t_\ell + d_\ell,\ \forall \{\omega_i, \omega_\ell, \cdots\} \in \neq_f,\ \omega_\ell \in \Omega_{as}. \tag{17}$$ For the required object $u_k$, assuming that the last task it participates in is $J_{o_{u_k}}[-1] = (t_\ell, \omega_\ell)$, the object time bound $t_{o_{u_k}}$ should satisfy $$W^p_{u_k}(t) = W_{k_1},\ t \ge t_\ell + d_\ell,\quad \forall t \ge t_{o_{u_k}}, \tag{18}$$ which means that object $u_k$ will be at the starting region $W_{k_1}$ of the current behavior $(C_k)^{u_k}_{k_1,k_2}$ and ready for it after $t_{o_{u_k}}$. Additionally, we set $t_{o_{u_k}} = \infty$ if the object is not at $W_{k_1}$ after the action sequence $J_{o_{u_k}}$, and $t_{o_{u_k}} = 0$ if the behavior $(C_k)^{u_k}_{k_1,k_2}$ does not require an object, i.e., $u_k = \emptyset$. The set of local time bounds is defined as $$T_s = \{(A_n, t^a_n) \mid A_n = \mathcal{A}_{M_{type}(n)} \cap \mathcal{A}_{C_k},\ t^a_n = t_\ell + d_\ell + d_{\mathcal{G}}(k_\ell, k_1),\ \forall n \in \mathcal{N}\}, \tag{19}$$ where $A_n$ is the set of actions that agent $n$ can provide for behavior $C_k$; $t^a_n$ is the earliest time agent $n$ can arrive at region $W_{k_1}$; $t_\ell + d_\ell$ is the time when its last subtask $\omega_\ell$, given by $J_n[-1]$, is finished; $d_{\mathcal{G}}(k_\ell, k_1)$ is the cost of moving; and $W_{k_\ell}$ is the goal region of $\omega_\ell$. A pair $(A_n, t^a_n)$ means that agent $n$ can begin behavior $(C_k)^{u_k}_{k_1,k_2}$ after time $t^a_n$ by providing one of the actions $a_\ell \in A_n$. Using these time bounds, we can
determine the participating agents and the actions they provide, and generate a new assignment $\mathcal{J}^i, \mathcal{J}^i_o$ that minimizes the ending time of subtask $\omega_i$. The efficiency $\eta$ of each assignment $\mathcal{J}^i, \mathcal{J}^i_o$, $\omega_i \in \Omega_u$, is calculated, and the subtask with the maximum efficiency is chosen. Afterwards, the chosen subtask is removed from $\Omega_u$ and added to $\Omega_{as}$. The action sequences are updated accordingly as $\mathcal{J} = \mathcal{J}^i$, $\mathcal{J}_o = \mathcal{J}^i_o$ for the next iteration.
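The time bounds (17) and (19), and the resulting earliest start of a behavior, can be sketched as follows. Assumptions: subtasks are integer indices; `start` and `dur` hold $t_\ell$ and $d_\ell$ of the assigned subtasks; `free_at[n]` is $t_\ell + d_\ell$ of agent $n$'s last subtask and `region[n]` is its goal region $W_{k_\ell}$; the object bound $t_{o_{u_k}}$ of (18) is passed in as a number; and `travel` could be a shortest-path cost such as $d_{\mathcal{G}}$. The team is chosen greedily per action, which can start later than the best possible team. All names are hypothetical.

```python
import math
from typing import Callable, Dict, List, Set, Tuple

Bounds = Dict[str, Tuple[Set[str], float]]   # agent -> (A_n, t_n^a)


def global_bound(i: int, le_f: Set[Tuple[int, int]], neq_f: List[Set[int]],
                 start: Dict[int, float], dur: Dict[int, float]) -> float:
    """Eq. (17): earliest start of omega_i w.r.t. assigned subtasks in le_f, neq_f."""
    t = 0.0
    for j, k in le_f:
        if k == i and j != i and j in start:     # assigned predecessor omega_j
            t = max(t, start[j])
    for group in neq_f:
        if i in group:
            for other in group - {i}:
                if other in start:               # assigned conflicting subtask
                    t = max(t, start[other] + dur[other])
    return t


def local_bounds(agent_actions: Dict[str, Set[str]], behavior_actions: Set[str],
                 free_at: Dict[str, float], region: Dict[str, str],
                 start_region: str, travel: Callable[[str, str], float]) -> Bounds:
    """Eq. (19): actions A_n each agent offers for C_k and its arrival at W_k1."""
    return {n: (acts & behavior_actions,
                free_at[n] + travel(region[n], start_region))
            for n, acts in agent_actions.items()}


def earliest_start(t_global: float, t_object: float, bounds: Bounds,
                   demand: Dict[str, int]) -> Tuple[float, Dict[str, str]]:
    """Greedy team choice: per required action, take the earliest free agents.

    Returns (start time, agent -> action), or (inf, {}) if demand cannot be met.
    """
    t = max(t_global, t_object)
    team: Dict[str, str] = {}
    for action, count in demand.items():
        ready = sorted((arr, n) for n, (acts, arr) in bounds.items()
                       if action in acts and n not in team)
        if len(ready) < count:
            return math.inf, {}
        for arr, n in ready[:count]:
            team[n] = action
            t = max(t, arr)
    return t, team
```

Adding $d_k$ to the returned start time gives the ending time that the bidding step minimizes. Since the formula for $\eta$ is not reproduced here, scoring candidate subtasks by this ending time would only be a stand-in; replacing the greedy team choice with an exact matching, such as the augmenting-path check used for staffing, would avoid its suboptimal picks.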
Proof. If a word $w = \sigma'_1 \sigma'_2 \cdots$ satisfies $P_j = (\Omega_j, \preceq_j, \neq_j)$, $P_j \in P_{final}$, it satisfies the three conditions in Def. 3. For the first condition, due to the Task Composition step of Poset-prod, we can infer that for any $\omega^1_n = (n, \sigma^1_n, \sigma^{s1}_n) \in \Omega_1$, there exists $\omega_n = (n, \sigma_n, \sigma^s_n) \in \Omega_j$ with $\sigma^1_n \subseteq \sigma_n$ and $\sigma^{s1}_n \subseteq \sigma^s_n$. Thus, we have $\sigma^1_{i_1} \subseteq \sigma_{i_1} \subseteq \sigma'_{\ell_1}$ and $\sigma^{s1}_{i_1} \cap \sigma'_{m_1} = \emptyset$, $\forall m_1 < \ell_1$, which indicates that $w$ satisfies the first condition of $P_1$. For the second condition, due to (9) in the Relation Update step, we have $\preceq_1 \subseteq \preceq_j$. Thus, $w$ satisfies the second condition of $P_1$: for any $(\omega^1_{i_1}, \omega^1_{i_2}) \in \preceq_1$, we have $(\omega_{i_1}, \omega_{i_2}) \in \preceq_j$, hence $\exists \ell_1 \le \ell_2$, $\sigma^1_{i_2} \subseteq \sigma_{i_2} \subseteq \sigma'_{\ell_2}$, and $\forall m_2 < \ell_2$, $\sigma^{s1}_{\ell_2} \subseteq \sigma^s_{\ell_2} \subseteq \sigma'_{m_2}$. Finally, for the last condition, the word $w$ satisfies the order $\neq_j$ of $P_j$, and $\neq_1 \subseteq \neq_j$ due to (13). Thus, $w$ also satisfies the third condition. We conclude that $w$ satisfies $P_1$. In the same way, we can prove that $w$ also satisfies $P_2$. Thus, $L(P_j) \subseteq L(P_1) \cap L(P_2)$.
Theorem 2 (Completeness). Given two R-posets $P_1, P_2$ obtained from $\mathcal{B}_1, \mathcal{B}_2$, with enough time budget, Poset-prod returns a set $P_{final}$ consisting of all final products, and its language $L(P_{final}) = \bigcup_{P_i \in P_{final}} L(P_i)$ is equal to $L(P_1) \cap L(P_2)$.

Fig. 2. The bidding process for assigning subtask 4 under the global time bound (when the partial relations in the R-poset are satisfied), the object time bound (when the object is reachable), and the local time bounds (when the agents are ready).

Fig. 3. Illustration of computing the product posets. Left: the final R-poset computed from the initial sub-formulas and online sub-formulas (dashed lines); Right: $P_{b_1}, \cdots, P_{b_6}$ are the R-posets associated with the initial sub-formulas, and $P_{u_1}, \cdots, P_{u_5}$ are associated with the online sub-formulas.

Fig. 4. Left: snapshots of agent trajectories at 40, 180, 215 and 400 s when new tasks are added online. Right: the Gantt chart of task assignment at these time instants, highlighted in green boxes.

Fig. 5. The computation time (left) and the execution efficiency $\eta$ (right) with respect to different numbers of agents and subtasks.

Fig. 6. Comparison of the computation time (left) and memory (right) of different methods.

Fig. 8. The Gantt chart of the planned execution (left) and of the actual execution (right).