Collaborative Computing and Resource Allocation for LEO Satellite-Assisted Internet of Things

School of Electronic Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China
Key Laboratory of Universal Wireless Communications, Ministry of Education, Beijing University of Posts and Telecommunications, Beijing 100876, China
Science and Technology on Information Transmission and Dissemination in Communication Networks Laboratory, 54th Research Institute of China Electronics Technology Group Corporation, Shijiazhuang, China


Introduction
The Internet of Things (IoT) plays an important role in the future intelligent society, and many techniques have been evaluated and implemented to better serve data transmission in IoT networks. Although fifth generation (5G) wireless systems support massive machine-type communication (mMTC), they mainly focus on terrestrial network-based IoT. For sparsely populated areas that lack telecommunication infrastructure, satellite communication has been adopted as an important component of beyond-5G and sixth generation (6G) wireless systems [1,2]. Meanwhile, edge computing refers to techniques that move computing units to access nodes near the user equipment, or that process data locally at the user equipment [3]. With edge computing, the access delay is reduced and radio resources are used more efficiently. Benefiting from advances in satellite on-board processing [4], edge computing-enhanced satellite networks have also become a hot topic for integrated satellite and terrestrial networks [2,5,6]. Consequently, edge computing-enhanced satellite-assisted IoT (S-IoT) has received considerable attention in both industry and academia [7,8].
In edge computing-enhanced S-IoT networks, both the IoT devices and the satellites are resource-limited. Therefore, joint computing and communication resource allocation for the tasks generated by users (IoT devices) is important for improving system performance [9]. Moreover, the characteristics of satellite networks also affect the mechanisms used for resource management [10]. Existing satellite communication systems can generally be divided into three categories: geosynchronous earth orbit (GEO), medium earth orbit (MEO), and low earth orbit (LEO). Among them, LEO satellites, which have the lowest propagation delay, are emerging as an important component of future integrated satellite and terrestrial networks [11][12][13]. This paper also focuses on edge computing-enhanced LEO satellite networks, and the advantages of LEO satellite-assisted IoT over other satellite systems can be summarized as follows:
(1) The propagation delay introduced by LEO satellites is low. For example, the one-way propagation delay is about 5 ms for LEO satellites at an altitude of 1500 km, whereas it is about 120 ms for GEO satellites.
(2) Although satellite on-board processing capability is usually limited, multiple satellites in LEO networks, especially mega-constellation LEO networks, can form a virtual resource pool that can be used to improve the performance of edge computing-enhanced S-IoT.
(3) Multiple LEO satellites can generate overlapping coverage areas, so communication and computing resources can be allocated flexibly.
From the above analysis, LEO S-IoT benefits from low propagation delay and collaborative processing among multiple satellites. However, edge computing-enhanced LEO S-IoT still faces several problems. First, the time-varying topology of LEO networks makes it difficult to manage the limited communication and computing resources dynamically [14]. Second, resource management should jointly consider multiple types of links, such as satellite-to-ground and satellite-to-satellite links. Last but not least, communication and computing resources should be allocated jointly.
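The delay figures in item (1) follow from distance divided by the speed of light. A minimal Python sketch, assuming a satellite directly overhead (slant paths at lower elevation angles are longer) and the standard GEO altitude of 35,786 km, which the text does not state:

```python
# One-way propagation delay along the nadir path (satellite directly overhead).
SPEED_OF_LIGHT = 299792458.0  # m/s


def one_way_delay_ms(altitude_km: float) -> float:
    """Return the nadir one-way propagation delay in milliseconds."""
    return altitude_km * 1e3 / SPEED_OF_LIGHT * 1e3


leo_ms = one_way_delay_ms(1500.0)     # LEO altitude used in the text
geo_ms = one_way_delay_ms(35786.0)    # standard GEO altitude (assumed)
print(f"LEO (1500 km): {leo_ms:.1f} ms")   # LEO (1500 km): 5.0 ms
print(f"GEO (35786 km): {geo_ms:.1f} ms")  # GEO (35786 km): 119.4 ms
```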
In this paper, edge computing-enhanced LEO S-IoT is considered, where the tasks generated by users are handled either locally or collaboratively via the LEO network. Different from existing studies, collaborative computing among multiple satellites is used to reduce the on-board processing latency. Moreover, satellite-to-ground and satellite-to-satellite links are considered jointly for communication and computing resource allocation. The main contributions are as follows.
(i) A framework for collaborative computing among multiple LEO satellites with time-varying topology is provided, and the effects of satellite-to-ground and satellite-to-satellite links on the processing latency are jointly considered.
(ii) Collaborative computing and resource allocation for user tasks are formulated as a joint task offloading, scheduling, and multidimensional resource allocation problem that maximizes the task completion rate, and this complex problem is divided into two subproblems with low complexity.
(iii) Deep reinforcement learning (DRL) and max-min fairness optimization are adopted to achieve long-term benefits in terms of task completion rate, and simulation results verify the performance of the proposed algorithms.
The remainder of this paper is organized as follows. Section 2 summarizes related works. The system model and problem formulation are described in Section 3. Section 4 analyzes resource allocation based on max-min fairness and task offloading and scheduling based on DRL. Section 5 evaluates the proposed algorithms, and Section 6 concludes this paper.

Related Works
Joint task offloading, scheduling, and resource allocation plays an important role in edge computing-enhanced S-IoT networks. Papa et al. evaluate a reconfigurable software-defined network with a LEO constellation and propose an optimal controller placement and satellite-to-controller assignment method that minimizes the average flow setup time [15]. Liu et al. propose a task-oriented network architecture for edge computing-enhanced space-air-ground-aqua integrated networks [9]. Xie et al. analyze joint caching, communication, and computing resource management for space information networks [2]. Although joint task offloading, scheduling, and resource allocation for edge computing-enhanced S-IoT is highlighted in these works [2,9,15], detailed resource management methods are not given.
Cheng et al. propose a computation offloading method for IoT applications in space-air-ground integrated networks with a fixed data rate [16]. Cao et al. propose an edge-cloud architecture based on software-defined networking and network function virtualization for space-air-ground integrated networks [17]. Wang et al. introduce the hardware and software structure of edge computing-enhanced S-IoT [18]. A fine-grained resource management scheme is introduced by Wang et al. for edge computing-enhanced satellite networks [19]. Yan et al. propose a 5G satellite edge computing framework based on a microservice architecture with an embedded hardware platform [5]. LiWang et al. investigate computation offloading methods with delay and cost constraints for the satellite-ground Internet of vehicles [20]. Jiao et al. analyze a joint network stability and resource allocation optimization problem for high-throughput satellite-based IoT [21]. An orbital edge computing architecture is introduced by Denby and Lucia, who also analyze power and software optimization for the orbital edge [22]. Cui et al. propose a joint offloading and resource allocation scheme for GEO satellite-assisted vehicle-to-vehicle communication [23]. A collaborative computing and resource allocation method among multiple user pairs is given by Zhang et al. for GEO S-IoT [24]. Song et al. propose a mobile edge computing framework for terrestrial-satellite IoT, with an energy-efficient computation offloading and resource allocation method that minimizes the weighted sum energy consumption [25]. A learning-based queue-aware task offloading and resource allocation algorithm is analyzed by Liao et al. for space-air-ground-integrated power IoT [26]. Tang et al. present a hybrid cloud and edge computing LEO satellite network and investigate computation offloading decisions that minimize the sum energy consumption of ground users [27].
Wireless Communications and Mobile Computing
Although some of the existing works listed above investigate task offloading and resource allocation for edge computing-enhanced S-IoT, none of them considers collaborative computing among multiple LEO satellites. Moreover, the joint optimization of satellite-to-ground and satellite-to-satellite links has not been investigated either.

System Model and Problem Formulation
In this paper, we consider a typical scenario, shown in Figure 1, consisting of multiple terrestrial users and a LEO satellite constellation. The constellation is deployed with intersatellite links (ISLs) for cooperative processing, and each satellite can exchange information with its four adjacent satellites through ISLs. In this topology, the tasks generated by users arrive randomly over time, and each task can be processed locally or offloaded to satellites. Although the computing resource of a single satellite is scarce because of the limitations of on-board devices, the computing units on multiple satellites can form a collaborative computing pool, and overloaded satellites can forward pending tasks to lightly loaded satellites. Thus, a task offloaded to the satellites can be handled by its serving satellite or by other satellites reachable via ISLs. The computing resource available at each IoT device is also limited, so tasks processed locally are handled one by one. In contrast, a satellite can handle tasks from multiple users in parallel, and its resources are shared among the tasks of these users. Every satellite maintains a satellite-to-ground transmission queue and an on-board processing queue. For satellite-to-ground transmission, the tasks offloaded to satellites are scheduled slot by slot, and tasks cannot be partitioned. When a task arrives at the satellite that will process it, it enters the processing queue and waits for processing. To guarantee the efficiency and reliability of transmission and processing, the communication/computing resources allocated to a task remain occupied until its transmission/processing ends. After processing, the results are delivered to the users.
During task offloading, both the resource occupancy and the mobility of the LEO satellites affect system performance. For tasks handled locally, the latency mainly consists of the waiting latency in the queue and the processing latency. For tasks handled by satellites, the transmission latency, propagation delay, and processing latency all need to be considered. The adopted system models and the problem formulation are presented in the following subsections.

Satellite Orbit Model.
In the earth-centered inertial (ECI) coordinate system, the position of a satellite can be described by its orbital elements, namely, the eccentricity e, semimajor axis a, inclination i, right ascension of the ascending node (RAAN) Ω, argument of periapsis ω, and initial true anomaly φ. In this paper, we consider circular orbits with e = 0 and ω = 0, so the ECI coordinate of a satellite at time t can be expressed as (1), where R is the Earth radius, h is the satellite altitude, and $\omega_s = \sqrt{G M_e/(R+h)^3}$ denotes the angular velocity of the satellite, which depends on the satellite altitude, the gravitational constant G, and the mass of the Earth $M_e$. $t = \rho(l-1)$ denotes the running time of the satellite at the beginning of time slot l, where ρ is the length of one time slot.
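The orbit model can be sketched in Python as follows. Because equation (1) itself is not reproduced here, the sketch assumes the standard circular-orbit expression obtained by rotating the in-plane position by the inclination i and the RAAN Ω, with argument of latitude u = φ + ω_s t. The Earth radius and gravitational parameter are standard constants rather than values from Table 1, and the inclination in the usage line is hypothetical.

```python
import math

GM_EARTH = 3.986004418e14   # m^3/s^2, G * M_e (standard value)
R_EARTH = 6371e3            # m, mean Earth radius (assumed)


def angular_velocity(h):
    """omega_s = sqrt(G * M_e / (R + h)^3) in rad/s for altitude h in metres."""
    return math.sqrt(GM_EARTH / (R_EARTH + h) ** 3)


def slot_start_time(l, rho):
    """t = rho * (l - 1): running time at the beginning of time slot l."""
    return rho * (l - 1)


def eci_position(raan, inc, phi0, h, t):
    """ECI position (m) on a circular orbit (e = 0, argument of periapsis 0).

    Angles are in radians; the argument of latitude is u = phi0 + omega_s * t.
    """
    r = R_EARTH + h
    u = phi0 + angular_velocity(h) * t
    x = r * (math.cos(raan) * math.cos(u) - math.sin(raan) * math.sin(u) * math.cos(inc))
    y = r * (math.sin(raan) * math.cos(u) + math.cos(raan) * math.sin(u) * math.cos(inc))
    z = r * math.sin(u) * math.sin(inc)
    return x, y, z


# Usage: position at the start of slot l = 11 with 10 s slots (hypothetical values).
pos = eci_position(raan=0.0, inc=math.radians(53), phi0=0.0, h=1500e3,
                   t=slot_start_time(11, 10.0))
```

For h = 1500 km this gives an orbital period 2π/ω_s of roughly 6950 s (about 116 min).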
To obtain the coordinates of satellite n, its RAAN $\Omega_n$ and initial true anomaly $\varphi_n$ also need to be calculated. Since the Walker constellation is symmetric and all satellites adopt circular orbits with the same altitude and inclination, the orbital planes are evenly distributed along the equator, and the satellites are evenly distributed within each orbital plane. The phase relationship of satellites in different orbital planes can be expressed as (2), where N denotes the number of satellites, P denotes the number of orbital planes, and S denotes the number of satellites in each orbital plane, so that N = P × S. $P_n$ is the index of the orbital plane in which satellite n is located, with $P_n = \lfloor n/S \rfloor + 1$. $N_n$ is the index of satellite n within its orbital plane, with $N_n = n - (P_n - 1)S$. F denotes the phase factor of the constellation. With (1) and (2), the ECI coordinate of any satellite in the constellation can be obtained at any time slot.
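The Walker indexing can be sketched as follows. The exact form of (2) is not reproduced here, so the sketch assumes the common Walker-delta rule $\Omega_n = 2\pi(P_n - 1)/P$ and $\varphi_n = 2\pi N_n/S + 2\pi F(P_n - 1)/N$, and it assumes satellites are indexed from n = 0 so that the expressions for $P_n$ and $N_n$ above yield planes 1..P and in-plane slots 0..S-1.

```python
import math


def walker_elements(n, P, S, F):
    """(RAAN, initial argument of latitude) in radians for satellite n.

    Walker-delta pattern with P planes, S satellites per plane and phase
    factor F; satellites are assumed to be indexed n = 0, ..., N - 1.
    """
    N = P * S
    if not 0 <= n < N:
        raise ValueError("satellite index out of range")
    plane = n // S + 1              # P_n = floor(n / S) + 1
    slot = n - (plane - 1) * S      # N_n = n - (P_n - 1) S, in 0..S-1
    raan = 2 * math.pi * (plane - 1) / P
    phi0 = 2 * math.pi * slot / S + 2 * math.pi * F * (plane - 1) / N
    return raan, phi0


# Usage: a hypothetical 24-satellite constellation with 4 planes and F = 1.
elements = [walker_elements(n, P=4, S=6, F=1) for n in range(24)]
```

The phase offset $2\pi F(P_n - 1)/N$ staggers neighbouring planes so that satellites in adjacent planes do not pass the equator at the same time.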

Coverage Model.
Generally, the users' coordinates are expressed as longitude, latitude, and altitude (LLA) in the geographic coordinate system. To determine the coverage of a satellite at time slot l, we first convert the ECI coordinate of the satellite to the earth-centered earth-fixed (ECEF) coordinate with (3), where $\eta_g = \eta_{g0} + \omega_e t$, in which $\eta_{g0}$ denotes the Greenwich hour angle at the beginning of the first time slot and $\omega_e$ denotes the angular velocity of the Earth's rotation. Then, the ECEF coordinate of the satellite is converted to the LLA coordinate (lon, lat, alt) according to (4), where $J = a/\sqrt{1 - e^2 \sin^2 lat}$. Since e = 0, J = a. With (3) and (4), the LLA coordinates of users and satellites are obtained, and the elevation angle σ of a user can be expressed as
$$\sigma = \arctan\frac{\cos\gamma - R/(R+h)}{\sqrt{1-\cos^2\gamma}}, \qquad \cos\gamma = \cos\Delta lon \cos lat_u \cos lat_s + \sin lat_u \sin lat_s, \tag{5}$$
where $\Delta lon = lon_u - lon_s$, $lon_u$ and $lat_u$ denote the user's longitude and latitude, and $lon_s$ and $lat_s$ denote the satellite's longitude and latitude, respectively. User m is considered to be covered by satellite n when $\sigma_n^m$ is larger than the minimum elevation angle $\sigma_{\min}$. At the beginning, user m selects the satellite $n_a$ with the smallest elevation angle for association, and when $\sigma_n^m$ drops below $\sigma_{\min}$ as the associated satellite moves, the user selects a satellite $n_h$ for handover according to the elevation angles at the current time slot.
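The coordinate conversions and the coverage test can be sketched as below. The sketch assumes a spherical Earth (J = a) with mean radius 6371 km, a ground user at zero altitude, and a rotation about the z-axis by $\eta_g$ as the form of (3); these constants and the exact forms of (3) and (4) are assumptions. `math.atan2` is used in place of arctan so that a satellite directly overhead (γ = 0) is handled without division by zero.

```python
import math

R_EARTH = 6371e3             # m, spherical Earth so that J = a (assumed)
OMEGA_EARTH = 7.2921159e-5   # rad/s, Earth rotation rate omega_e


def greenwich_hour_angle(eta_g0, t):
    """eta_g = eta_g0 + omega_e * t, in radians."""
    return eta_g0 + OMEGA_EARTH * t


def eci_to_ecef(x, y, z, eta_g):
    """Rotate ECI coordinates about the z-axis by the Greenwich hour angle."""
    c, s = math.cos(eta_g), math.sin(eta_g)
    return c * x + s * y, -s * x + c * y, z


def ecef_to_lla(x, y, z):
    """Spherical-Earth conversion: (lon, lat) in radians, altitude in metres."""
    lon = math.atan2(y, x)
    lat = math.atan2(z, math.hypot(x, y))
    alt = math.sqrt(x * x + y * y + z * z) - R_EARTH
    return lon, lat, alt


def elevation(lon_u, lat_u, lon_s, lat_s, h):
    """Elevation angle (rad) of a satellite at altitude h seen by a ground user."""
    cos_g = (math.cos(lon_u - lon_s) * math.cos(lat_u) * math.cos(lat_s)
             + math.sin(lat_u) * math.sin(lat_s))
    cos_g = max(-1.0, min(1.0, cos_g))   # clamp rounding error before sqrt
    sin_g = math.sqrt(1.0 - cos_g * cos_g)
    return math.atan2(cos_g - R_EARTH / (R_EARTH + h), sin_g)


def covered(sigma, sigma_min):
    """User m is covered by satellite n when sigma_n^m > sigma_min."""
    return sigma > sigma_min
```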

Channel Model.
In the scenario shown in Figure 1, two kinds of links should be considered: satellite-to-ground (StG) and satellite-to-satellite (StS) links. For StG links, a line-of-sight (LOS) channel is assumed to always exist between satellites and ground users. Since we focus on resource management for satellite systems, only the path loss determined by the distance between satellites and users is considered, and the free space path loss (FSPL) model is adopted. Moreover, the channel quality of ground users, indicated by the signal-to-noise ratio (SNR), is quantized and transmitted to the satellites via control channels. For StS links, point-to-point optical links are assumed, and their channel capacity is assumed to be large enough for data transmission between satellites.
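A link-budget sketch for the FSPL-only StG channel follows. The carrier frequency, EIRP, antenna gain, noise temperature, bandwidth, and the SNR threshold table used for quantization are all hypothetical parameters, since the text does not specify them.

```python
import bisect
import math

SPEED_OF_LIGHT = 299792458.0   # m/s
BOLTZMANN = 1.380649e-23       # J/K


def fspl_db(distance_m, freq_hz):
    """Free space path loss in dB: 20*log10(4*pi*d*f/c)."""
    return 20.0 * math.log10(4.0 * math.pi * distance_m * freq_hz / SPEED_OF_LIGHT)


def noise_power_dbw(temp_k, bandwidth_hz):
    """Thermal noise power k*T*B in dBW."""
    return 10.0 * math.log10(BOLTZMANN * temp_k * bandwidth_hz)


def snr_db(eirp_dbw, rx_gain_dbi, distance_m, freq_hz, temp_k, bandwidth_hz):
    """LOS link-budget SNR in dB with FSPL as the only propagation loss."""
    return (eirp_dbw + rx_gain_dbi - fspl_db(distance_m, freq_hz)
            - noise_power_dbw(temp_k, bandwidth_hz))


def quantize_snr(snr, thresholds_db):
    """Reported SNR level: how many (sorted) thresholds the SNR reaches."""
    return bisect.bisect_right(thresholds_db, snr)


# Hypothetical example: 1500 km slant range, 2 GHz carrier, 10 MHz bandwidth.
snr = snr_db(eirp_dbw=30.0, rx_gain_dbi=0.0, distance_m=1500e3,
             freq_hz=2e9, temp_k=290.0, bandwidth_hz=10e6)
level = quantize_snr(snr, [0.0, 5.0, 10.0, 15.0])
```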
3.4. Task Arrival Model.
Generally, LEO satellite-assisted IoT is suitable for many kinds of services, such as object identification and tracking and asset monitoring. In this paper, the tasks of each user are assumed to arrive continuously according to a Poisson process. The probability that κ tasks arrive in l time slots can be expressed as
$$P(\kappa) = \frac{(\lambda l)^{\kappa}}{\kappa!} e^{-\lambda l}, \tag{6}$$
where λ denotes the task arrival rate, and the interarrival time follows the exponential distribution with parameter λ. Moreover, the tasks of a single user are handled with a first-in-first-out policy, and a task cannot be scheduled until the processing results of its previous task have been sent back to the user. More complicated scenarios, in which multiple tasks of a single user are scheduled simultaneously, will be considered in future work.
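A sketch of the arrival model: the first function evaluates the Poisson probability above, and the second generates arrivals from exponential interarrival times, so the per-slot counts should average λ. The unit slot length and the seeded generator are illustrative choices.

```python
import math
import random


def poisson_pmf(kappa, lam, l):
    """Probability of kappa arrivals in l slots: (lam*l)^kappa * exp(-lam*l) / kappa!."""
    mu = lam * l
    return mu ** kappa * math.exp(-mu) / math.factorial(kappa)


def arrivals_per_slot(lam, num_slots, seed=0):
    """Per-slot arrival counts generated from exponential interarrival times.

    Slots have unit length, so lam is the mean number of arrivals per slot.
    """
    rng = random.Random(seed)
    counts = [0] * num_slots
    t = rng.expovariate(lam)
    while t < num_slots:
        counts[int(t)] += 1
        t += rng.expovariate(lam)
    return counts
```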
3.5. Latency Model.
Since a task can be handled locally or offloaded to satellites, the latency $\tau_k^m$ of task k generated by user m is analyzed for three possible cases.
If task k is handled locally by user m, the latency can be expressed as
$$\tau_k^m = \tau^{\text{wait}}_{k,m} + \tau^{\text{process}}_{k,m}, \tag{7}$$
where $\tau^{\text{wait}}_{k,m}$ denotes the waiting latency, including the time spent in the queue and the time spent waiting while the local resources are occupied by the task being processed. $\tau^{\text{process}}_{k,m} = T_k^m / X_m$ denotes the processing latency, where $T_k^m$ is the number of bits of task k of user m and $X_m$ is the local computing resource of user m.
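The local case can be sketched as a single-server first-in-first-out queue in which each task waits for the previous one to finish. The `cycles_per_bit` factor is an assumption added so that the 1000 cycles/bit simulation setting can be used; with `cycles_per_bit=1` the processing term reduces to $T_k^m / X_m$ as written above.

```python
def local_fifo_latencies(tasks, x_local, cycles_per_bit=1.0):
    """Latency of tasks processed one by one on a user's local processor.

    tasks: list of (arrival_time_s, bits) in arrival order.
    x_local: local computing capacity (cycles/s if cycles_per_bit != 1).
    Returns one (wait, process, total) tuple per task.
    """
    free_at = 0.0          # time at which the local processor becomes idle
    out = []
    for arrival, bits in tasks:
        start = max(arrival, free_at)
        process = bits * cycles_per_bit / x_local
        wait = start - arrival
        free_at = start + process
        out.append((wait, process, wait + process))
    return out


# Usage: two 1.5 Mbit tasks at t = 0 and one at t = 5 s, 1.5 GC/s, 1000 cycles/bit.
latencies = local_fifo_latencies([(0.0, 1.5e6), (0.0, 1.5e6), (5.0, 1.5e6)],
                                 x_local=1.5e9, cycles_per_bit=1000)
```

With these numbers each task needs 1 s of processing, so the second task waits 1 s behind the first, while the third arrives after the processor is idle again.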
If task k is handled by the satellite $n_a$ associated with user m, the latency $\tau_k^m$ can be expressed as (8), where $\tau^{\text{off}}_{k,m}$ is the latency of offloading task k from user m to its associated satellite $n_a$, which includes the waiting latency, and $\tau^{\text{return}}_{k,m}$ is the latency of sending the results back to the user. The transmission latency of the return link is omitted because the computing results are usually very small. If the result can be returned to the user within the coverage time of satellite $n_a$, $\tau^{\text{return}}_{k,m} = d_{n_a,m}/c$, where $d_{n_a,m}$ is the distance from satellite $n_a$ to user m and c is the speed of light; otherwise, $\tau^{\text{return}}_{k,m} = (d_{n_a,n_h} + d_{n_h,m})/c$, where $d_{n_a,n_h}$ denotes the routing distance from satellite $n_a$ to satellite $n_h$ and $d_{n_h,m}$ denotes the distance from satellite $n_h$ to user m.
If task k is offloaded through satellite $n_a$ to satellite $n_p$ for processing, the latency $\tau_k^m$ can be expressed as (9), where $\tau^{\text{off}}_{k,m}$ is the latency of offloading task k from user m to its associated satellite $n_a$, and $\tau^{\text{ISL}}_{k,m}$ is the propagation latency of routing task k from satellite $n_a$ to satellite $n_p$ through ISLs. Since optical links are assumed for intersatellite communication and the ISL data rate is high enough, the transmission latency over ISLs is omitted. Therefore, $\tau^{\text{ISL}}_{k,m} = d_{n_a,n_p}/c$, where $d_{n_a,n_p}$ denotes the routing distance from satellite $n_a$ to satellite $n_p$ under the adopted minimum-distance routing strategy. $\tau^{\text{return}}_{k,m}$ is the propagation latency of returning the result to the user: the result is first routed from satellite $n_p$ to satellite $n_h$, the serving satellite of user m, and then sent by satellite $n_h$ to user m. Thus, $\tau^{\text{return}}_{k,m} = (d_{n_p,n_h} + d_{n_h,m})/c$.
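The propagation terms of the two satellite cases can be sketched as follows. Minimum-distance routing is implemented here with Dijkstra's algorithm over a hypothetical ISL graph whose edge weights are link lengths in metres; the graph, node names, and distances are illustrative, and the offloading and processing terms of (8) and (9) are omitted.

```python
import heapq

SPEED_OF_LIGHT = 299792458.0  # m/s


def min_distance(graph, src, dst):
    """Shortest ISL path length (m) via Dijkstra; graph[u] = {v: length_m}."""
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            return d
        if d > dist[u]:
            continue                  # stale heap entry
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return float("inf")


def return_latency_serving(graph, n_a, n_h, d_na_m, d_nh_m, in_coverage):
    """Return latency when the associated satellite n_a processes the task."""
    if in_coverage:
        return d_na_m / SPEED_OF_LIGHT
    return (min_distance(graph, n_a, n_h) + d_nh_m) / SPEED_OF_LIGHT


def isl_latency(graph, n_a, n_p):
    """tau_ISL = d_{n_a,n_p} / c (ISL transmission latency neglected)."""
    return min_distance(graph, n_a, n_p) / SPEED_OF_LIGHT


def return_latency_collab(graph, n_p, n_h, d_nh_m):
    """tau_return = (d_{n_p,n_h} + d_{n_h,m}) / c for collaborative processing."""
    return (min_distance(graph, n_p, n_h) + d_nh_m) / SPEED_OF_LIGHT
```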

Problem Formulation.
In this paper, we aim to achieve collaborative computing among multiple satellites through ISLs and to maximize the completion rate of user tasks under latency constraints. The problem can thus be formulated as (10), where $w_k^m = 1$ when a new task k arrives at user m, and $v_k^m = 1$ when $\tau_k^m \le \tau_{\max}$, otherwise $v_k^m = 0$. $\tau_{\max}$ is the maximum latency constraint for task k of user m.
$\Psi_l = \{\Psi_{1,l}, \Psi_{2,l}, \dots, \Psi_{M,l}\}$ with $\Psi_{m,l} \in \{1, 2, \dots, N\}$, where $\Psi_{m,l} = n$ denotes that task k of user m will be offloaded to satellite n at time slot l for processing.
$O_l = \{O_{1,l}, O_{2,l}, \dots, O_{M,l}\}$ with $O_{m,l} \in \{0, 1\}$, where $O_{m,l} = 1$ denotes that task k of user m will be scheduled for transmission or processing at time slot l. $C_l = \{C_{1,l}, C_{2,l}, \dots, C_{N,l}\}$ and $C_{n,l} = \{C^1_{n,l}, C^2_{n,l}, \dots, C^M_{n,l}\}$ denote the communication resources allocated by satellite n to task k of each user m at time slot l. $X_l = \{X_{1,l}, X_{2,l}, \dots, X_{N,l}\}$ and $X_{n,l} = \{X^1_{n,l}, X^2_{n,l}, \dots, X^M_{n,l}\}$ denote the computing resources allocated by satellite n to task k of each user m at time slot l. $\bar{C}_{n,l}$ and $\bar{X}_{n,l}$ are the available communication and computing resources of satellite n at time slot l, respectively. $\Phi_{n,l}$ and $\Theta_{n,l}$ denote the tasks scheduled in the transmission queue and processing queue of satellite n at time slot l, respectively.
From (10), the task completion rate is affected by the offloading decision, scheduling decision, and resource allocation at each time slot. In addition, the decisions and resource allocation at the current time slot l affect the state at time slot l + 1. For instance, if a task cannot complete its transmission from user to satellite at time slot l, it cannot be processed at time slot l + 1, and the occupied communication resource cannot be released to other tasks. Thus, problem (10) can be regarded as a dynamic programming problem over joint offloading decisions, scheduling decisions, and resource allocation, which is difficult to solve with traditional methods. To tackle it, we decompose the problem into two subproblems.

Task Completion Rate Optimization Based on Deep Reinforcement Learning
According to Section 3, the task offloading and scheduling decision indicators are discrete, whereas the resource allocation variables are continuous. Therefore, problem (10) is a dynamic mixed-integer problem, which is nonconvex and whose optimal solution is difficult to find. To reduce its complexity, we decompose it into two subproblems. The first subproblem is communication and computing resource allocation with fixed offloading and scheduling decisions, which is solved based on max-min fairness. The second subproblem is the joint offloading and scheduling decision, which is solved with a DRL-based algorithm. The two subproblems are analyzed in the following two subsections, respectively.

Resource Allocation Based on Max-Min Fairness with Fixed Task Assignment.
With the offloading decisions $\Omega_l$ and $\Psi_l$ and the scheduling decision $O_l$ fixed at time slot l, the communication and computing resource allocation of satellite n can be formulated as two separate max-min fairness problems, (11) and (12), where $\tau^{\text{trans}}_{n,m}$ is the transmission latency of a given task of user m associated with satellite n, which can be calculated after the task is scheduled from the transmission queue and assigned communication resources, and $\tau^{\text{pro}}_{n,m}$ is the processing latency of a given task of user m processed at satellite n, which can be obtained after the task is scheduled from the processing queue and assigned computing resources. Minimizing the maximum latency is adopted as the optimization objective because it helps guarantee the latency constraint of every task.
Taking (11) as an example and introducing an auxiliary variable χ, the problem can be rewritten as (13). The objective function and the second constraint of (13) are both convex. Thus, to prove that the problem is convex, we only need to prove that the first constraint is convex.
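Let $F = T_m/(\chi C^m_{n,l})$, so that the constraint can be rewritten as $F \le 1, \forall m$. Taking the second partial derivatives of F with respect to χ and $C^m_{n,l}$, the Hessian matrix of F can be expressed as
$$H_F = \frac{T_m}{\chi C^m_{n,l}} \begin{pmatrix} \frac{2}{\chi^2} & \frac{1}{\chi C^m_{n,l}} \\ \frac{1}{\chi C^m_{n,l}} & \frac{2}{(C^m_{n,l})^2} \end{pmatrix}. \tag{14}$$
For $\chi > 0$ and $C^m_{n,l} > 0$, the diagonal entries of $H_F$ are positive and $\det H_F = 3T_m^2/(\chi^4 (C^m_{n,l})^4) > 0$, so all principal minors of $H_F$ are nonnegative and $H_F$ is positive semidefinite. Thus, the first constraint of (13) is convex, and problem (13) is a convex problem, which can be solved by the dual ascent method.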

The Lagrangian function L of problem (13) is constructed with Lagrange multipliers $\mu_m \ge 0$ and $\nu \ge 0$. The dual function is then $D(\mu_m, \nu) = L(C^{m*}_{n,l}, \mu_m, \nu)$, where $C^{m*}_{n,l} = \arg\min L(C^m_{n,l}, \mu_m, \nu)$. Since problem (13) is convex and strictly feasible (Slater's condition holds), strong duality holds, and the maximum value of D equals the minimum value of problem (13). Thus, the optimal solution of problem (13), which is also the solution of problem (11), can be found via the method in Algorithm 1.
In this paper, $\mu_m$ and ν are considered to have converged when their values change by less than 0.001 over 100 consecutive iterations. By alternately updating the primal variables and the Lagrange multipliers, the optimal communication resource allocation $C^m_{n,l}$ is obtained. The computing resource allocation $X^m_{n,l}$ is obtained similarly.
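Because the objective of (11) is the maximum latency, its optimum can also be written in closed form under the assumption that (11) reads min max over m of $T_m/C^m_{n,l}$ subject to $\sum_m C^m_{n,l} \le \bar{C}_{n,l}$, where $\bar{C}_{n,l}$ is the available communication resource of satellite n. All tasks then end up with the same latency $\chi^* = \sum_m T_m / \bar{C}_{n,l}$: any allocation with a smaller maximum latency would need $C^m_{n,l} > T_m/\chi^*$ for every m and would therefore exceed $\bar{C}_{n,l}$. The sketch below computes this allocation, which can serve as a reference for checking the output of the dual ascent iterations; it is not the paper's Algorithm 1.

```python
def minmax_allocation(bits, capacity):
    """Allocation minimizing max_m bits[m] / alloc[m] s.t. sum(alloc) <= capacity.

    At the optimum every task has the same latency chi = sum(bits) / capacity,
    so alloc[m] = bits[m] / chi is proportional to the task size.
    Returns (alloc, chi).
    """
    if capacity <= 0 or not bits or min(bits) <= 0:
        raise ValueError("need positive capacity and positive task sizes")
    chi = sum(bits) / capacity
    return [b / chi for b in bits], chi


# Usage: three tasks of 1, 2 and 3 Mbit sharing 6 Mbit/s (hypothetical numbers).
alloc, chi = minmax_allocation([1e6, 2e6, 3e6], 6e6)
```

The same function applies to the computing resources of (12) by passing the processing demand of each task and the available computing resource $\bar{X}_{n,l}$.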

Joint Task Offloading, Scheduling, and Resource Allocation.
With Algorithm 1, the resource allocation solution can be obtained for each time slot. However, the joint offloading and scheduling decision is still a nonconvex integer programming problem, which cannot be tackled directly with traditional optimization-based methods. To address it with affordable complexity, we model it as a Markov decision process (MDP) and propose a DRL-based method to achieve long-term rewards in terms of task completion rate.
The MDP corresponding to problem (10) can be expressed as follows.
(1) State. The states are defined per time slot because task scheduling and resource allocation are managed slot by slot. The state at time slot l is defined as $h_l = \{P_U(l), P_S(l), T_l, \Xi_l, \tilde{X}_{U,l}, \tilde{C}_l, \tilde{X}_l, Q_{\text{trans},l}, Q_{\text{pro},l}\}$. $P_U(l)$ and $P_S(l)$ denote the locations of users and satellites, respectively. $T_l = \{T_{1,l}, T_{2,l}, \dots, T_{M,l}\}$ denotes the bits of the tasks currently waiting to be scheduled. $\Xi_l = \{\Xi_{1,l}, \Xi_{2,l}, \dots, \Xi_{M,l}\}$ with $\Xi_{m,l} \in \{1, 2, \dots, N\}$ denotes the satellite associated with user m at time slot l. $\tilde{X}_{U,l} = \{\tilde{X}_{1,l}, \tilde{X}_{2,l}, \dots, \tilde{X}_{M,l}\}$ with $\tilde{X}_{m,l} \in \{0, 1\}$, where $\tilde{X}_{m,l} = 1$ denotes that the local computing resource of user m is occupied at time slot l. $\tilde{C}_l = \{\tilde{C}_{1,l}, \tilde{C}_{2,l}, \dots, \tilde{C}_{N,l}\}$ denotes the communication resources of satellites occupied by users at time slot l. Similarly, $\tilde{X}_l = \{\tilde{X}_{1,l}, \tilde{X}_{2,l}, \dots, \tilde{X}_{N,l}\}$ denotes the computing resources of satellites occupied by users at time slot l. $Q_{\text{trans},l} = \{Q^1_{\text{trans},l}, Q^2_{\text{trans},l}, \dots, Q^N_{\text{trans},l}\}$ and $Q_{\text{pro},l} = \{Q^1_{\text{pro},l}, Q^2_{\text{pro},l}, \dots, Q^N_{\text{pro},l}\}$ denote the total bits of the tasks waiting for transmission and processing in the queues of the satellites, respectively.
(2) Action. At each time slot l, the action consists of the offloading decision, scheduling decision, and resource allocation for the users' current tasks. Since the resource allocation is obtained with Algorithm 1, the action space only needs to cover the offloading and scheduling decisions, which have lower dimensions. The action at time slot l is therefore defined as $a_l = \{A_{1,l}, A_{2,l}, \dots, A_{M,l}\}$ with $A_{m,l} = \{A_{\text{off}}, A_{\text{exe}}\}$, where $A_{\text{off}} \in \{0, 1, \dots, Z\}$ denotes the satellite that will handle the current task of user m at time slot l, and $A_{\text{exe}} \in \{0, 1\}$, where $A_{\text{exe}} = 1$ denotes that the current task of user m will be scheduled from the queue at time slot l; otherwise, the task keeps waiting in the queue. Given an action $a_l$, the offloading and scheduling decisions of all current tasks at time slot l are determined accordingly.
(3) Transition Probability. An MDP requires the transition probability from one state to another under any action $a_l$. However, the exact probabilities for all states $h_l$ and actions $a_l$ are difficult to obtain because the state and action spaces are too large. Therefore, a model-free DRL method is adopted.
(4) Reward. To maximize the task completion rate, the reward $R(h_l, a_l)$ at time slot l with state $h_l$ and action $a_l$ is defined in terms of the following quantities: $\tau_k$ denotes the latency defined in (8) or (9) for task k at time slot l; $R_p$ is a constant that makes $R_p - \tau_k$ positive; $R_d$ is a positive completion reward; and d denotes the number of tasks completed within the latency constraints in time slot l.
Given an action policy π, the value function $V(h \mid \pi)$, which evaluates the long-term performance of π, is defined as the expected discounted cumulative reward, where γ denotes the discount factor; with γ = 1, the value function can be regarded as the expectation of the completion rate defined in (10). The optimal policy $\pi^*$ maximizes this value function, where the next state h′ is obtained from state h under action a. In this paper, a deep Q-network (DQN) [28,29], composed of a target network and a main network, is adopted to obtain the target Q-value $Q^*(h, a)$. The approximated Q-function $Q(h, a; \theta)$ approaches $Q^*(h, a)$ during training by minimizing the loss function $L(\theta) = \mathbb{E}[(Q^*(h, a) - Q(h, a; \theta))^2]$, where θ denotes the network weights. A detailed description and analysis of DQN can be found in [23]. The proposed joint task offloading, scheduling, and resource allocation (JTOSRA) approach for collaborative computing among LEO satellites is shown in Algorithm 2, where G denotes the maximum number of training steps and ζ denotes the experience replay buffer. An ε-greedy policy is used to balance exploration and exploitation [29], and ε decays from 1.0 to 0.001 over 20000 steps.
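Several DQN building blocks of Algorithm 2 can be sketched without a neural-network library: decoding a flat Q-network output index into per-user actions ($A_{\text{off}}$, $A_{\text{exe}}$), the ε schedule, ε-greedy selection, the replay buffer ζ, and the TD target built from the target network. The mixed-radix action encoding and the linear decay shape are assumptions (the text only gives the endpoints 1.0 and 0.001 and the 20000-step horizon), and Q-values are passed in as plain lists instead of coming from a trained network.

```python
import random
from collections import deque


def decode_joint_action(index, num_users, num_off):
    """Split a flat action index into one (A_off, A_exe) pair per user.

    Each user has 2 * num_off choices (A_off in 0..num_off-1, A_exe in {0, 1}),
    encoded in mixed radix with the first user as the least significant digit.
    """
    per_user = 2 * num_off
    if not 0 <= index < per_user ** num_users:
        raise ValueError("action index out of range")
    actions = []
    for _ in range(num_users):
        index, local = divmod(index, per_user)
        actions.append((local // 2, local % 2))   # (A_off, A_exe)
    return actions


def epsilon(step, start=1.0, end=0.001, decay_steps=20000):
    """Linear decay from start to end over decay_steps, then constant (assumed shape)."""
    frac = min(step / decay_steps, 1.0)
    return start + frac * (end - start)


def epsilon_greedy(q_values, eps, rng):
    """Random action with probability eps, otherwise the greedy action."""
    if rng.random() < eps:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def td_target(reward, q_next_target, gamma, done=False):
    """y = r + gamma * max_a' Q_target(h', a'); no bootstrapping after a terminal state."""
    return reward if done else reward + gamma * max(q_next_target)


class ReplayBuffer:
    """Fixed-size experience replay buffer (zeta in Algorithm 2)."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def push(self, h, a, r, h_next):
        self.buffer.append((h, a, r, h_next))

    def sample(self, batch_size, rng):
        return rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


# Usage with hypothetical Q-values for 2 users and Z + 1 = 1 offloading option.
rng = random.Random(0)
q = [0.2, 1.5, -0.3, 0.9]
a = epsilon_greedy(q, epsilon(25000), rng)
joint = decode_joint_action(a, num_users=2, num_off=1)
```

Under this flat encoding the output layer needs $(2(Z+1))^M$ entries, which grows exponentially with the number of users M.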
Generally, it is hard to obtain the exact computational complexity of a DRL-based algorithm. In Algorithm 2, the computational complexity of the DQN mainly depends on the number of users and the network structure.

Simulation Configurations.
Simulation parameters are listed in Table 1. In the simulation, we focus on users located in a specific area covered by the LEO satellites. The simulation starts at 0:00 on October 1, 2020, when the Greenwich hour angle $\eta_{g0}$ is 10.2. The communication capacity of each satellite is set to 10 Gbps. The number of CPU cycles required for processing is set to 1000 cycles/bit [31], and the satellite and local computing capacities are set to 10 GC/s [32] and 1.5 GC/s [33], respectively. Figure 2 shows the convergence of the loss function: the loss L(θ) converges as the number of training steps increases. As shown in Figure 3, the task completion rate also increases during training. Figures 2 and 3 demonstrate that the DQN-based JTOSRA is applicable to the problem formulated in Section 3. Although a large number of training steps is needed to achieve convergence, training only needs to be performed in the initial phase of the LEO network. Once convergence is achieved, joint task offloading and scheduling decisions can be made step by step with low complexity. Moreover, training can be performed offline via pretraining to further reduce the complexity.
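As a rough worked example with the settings above, assuming a hypothetical 1 Mbit task and the conversion processing time = bits × cycles per bit / CPU rate:

```python
CYCLES_PER_BIT = 1000   # cycles/bit
SAT_CPU = 10e9          # cycles/s, satellite computing capacity (10 GC/s)
LOCAL_CPU = 1.5e9       # cycles/s, local computing capacity (1.5 GC/s)
SAT_LINK = 10e9         # bit/s, satellite communication capacity (10 Gbps)


def processing_time(bits, cpu_cycles_per_s):
    """Seconds needed to process `bits` on a CPU with the given cycle rate."""
    return bits * CYCLES_PER_BIT / cpu_cycles_per_s


task_bits = 1e6  # hypothetical 1 Mbit task
t_sat = processing_time(task_bits, SAT_CPU)      # with the full satellite CPU
t_local = processing_time(task_bits, LOCAL_CPU)  # on the IoT device
t_link = task_bits / SAT_LINK                    # with the full satellite link
print(f"satellite {t_sat:.3f} s, local {t_local:.3f} s, link {t_link * 1e3:.2f} ms")
# satellite 0.100 s, local 0.667 s, link 0.10 ms
```

On-board processing is about 6.7 times faster than local processing in this example, although the 10 GC/s on each satellite is shared among all the tasks it handles in parallel.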

Algorithm 2: Joint task offloading, scheduling, and resource allocation based on DQN.
Input:
IoT terminal information: $P_U(l)$, $\tilde{X}_{U,l}$, $T_l$
Satellite information: $P_S(l)$, $\Xi_l$, $\tilde{C}_l$, $\tilde{X}_l$
Queue information: $Q_{\text{trans},l}$, $Q_{\text{pro},l}$
Output:
Offloading and scheduling decisions: $\Omega_l$, $\Psi_l$, $O_l$
Resource allocation: $C_l$, $X_l$
1: Initialize the network with γ, ε, and ζ.
2: Initialize state $h_l$.
3: While l < G
4:   Select an action $a_l$ according to the ε-greedy policy.
5:   Allocate resources according to Algorithm 1.
6:   Calculate reward $R(h_l, a_l)$.
7:   Update the next state $h_{l+1}$.
8:   Save $(h_l, a_l, R(h_l, a_l), h_{l+1})$ and update ζ.
9:   Update θ.
10:  l++.
11: End while
To analyze the performance of the proposed max-min fairness-based resource allocation (MLMRA) with fixed task assignment, two reference schemes are compared in terms of completion rate:
(i) Average Resource Allocation (ARA). Resources are allocated evenly among tasks.
(ii) Resource Allocation Based on Latency Minimization (LMRA). Resources are allocated to tasks by minimizing the sum latency of tasks.
Figure 4 shows the influence of the number of tasks on the completion rate under different resource allocation methods. The task completion rate decreases as the number of tasks increases. Moreover, MLMRA performs better than LMRA, and ARA performs the worst. This shows that MLMRA allocates resources more equitably and minimizes the maximum task latency. This is because LMRA focuses on reducing the sum latency of tasks, whereas MLMRA allocates more resources to tasks that are difficult to complete within the latency limit. In general, the proposed MLMRA algorithm effectively improves the task completion rate while maintaining fairness among tasks.
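The ranking in Figure 4 can be illustrated with a single-slot toy example. Assuming that all three schemes split one capacity budget among tasks whose latency is bits/allocation, minimizing the sum latency (LMRA) gives allocations proportional to $\sqrt{T_m}$ (from the stationarity condition $T_m/C_m^2 = \nu$), whereas minimizing the maximum latency (MLMRA) gives allocations proportional to $T_m$. The task sizes, capacity, and deadline below are hypothetical and are not the settings of Figure 4.

```python
import math


def ara(bits, cap):
    """Average resource allocation: equal share per task."""
    return [cap / len(bits)] * len(bits)


def lmra(bits, cap):
    """Minimize sum(b / c) s.t. sum(c) = cap  ->  c proportional to sqrt(b)."""
    roots = [math.sqrt(b) for b in bits]
    s = sum(roots)
    return [cap * r / s for r in roots]


def mlmra(bits, cap):
    """Minimize max(b / c) s.t. sum(c) = cap  ->  c proportional to b."""
    s = sum(bits)
    return [cap * b / s for b in bits]


def sum_latency(bits, alloc):
    return sum(b / c for b, c in zip(bits, alloc))


def completed(bits, alloc, tau_max, tol=1e-12):
    """Number of tasks whose latency b / c meets the deadline tau_max."""
    return sum(b / c <= tau_max + tol for b, c in zip(bits, alloc))


bits = [1e6, 4e6, 9e6]   # hypothetical task sizes (bits)
cap = 14e6               # hypothetical capacity (bit/s)
for name, scheme in (("ARA", ara), ("LMRA", lmra), ("MLMRA", mlmra)):
    alloc = scheme(bits, cap)
    print(name, completed(bits, alloc, 1.0), round(sum_latency(bits, alloc), 3))
```

With these numbers, ARA and LMRA each complete two of the three tasks within 1 s, while MLMRA completes all three, even though LMRA achieves the lowest sum latency (about 2.57 s versus 3.0 s).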
To evaluate the performance of JTOSRA, the following two algorithms are introduced for task assignment:
(i) Random. Tasks are offloaded and scheduled randomly, and resources are allocated according to LMRA or MLMRA. In Figures 5-7, these methods are labeled random-LMRA and random-MLMRA, respectively.
(ii) Simulated Annealing (SA). Tasks are offloaded and scheduled with the SA algorithm, and resources are allocated by LMRA or MLMRA. In Figures 5-7, these methods are labeled SA-LMRA and SA-MLMRA, respectively.
In addition, to ensure the validity of the simulation results, each point in the figures is obtained by averaging over multiple tests, each lasting 200000 slots. Figure 5 shows the completion rate of the algorithms as the number of users increases. As the number of users grows, the number of tasks waiting for scheduling increases, and the shortage of resources leads to a decline in the task completion rate. The proposed JTOSRA algorithm clearly outperforms the SA algorithm, and its task completion rate decreases more slowly than that of the SA algorithm as the number of users increases. This is because the SA algorithm tends to fall into local optima, resulting in poor performance. Moreover, traditional algorithms such as SA can only optimize the decision for a specific time slot and cannot continuously optimize offloading and scheduling decisions over multiple time slots, and they need to iterate at each step, which incurs a high time cost. In contrast, the proposed JTOSRA algorithm can continue to accumulate experience during decision-making to optimize the …
In Figure 6, the satellite communication capability is varied to investigate the performance of the proposed algorithm.
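For comparison, a generic simulated annealing baseline for a single time slot might look like the following. The objective (the maximum per-satellite processing load), the neighbourhood move, and the geometric cooling schedule are assumptions, since the text does not describe its SA configuration. The sketch also reflects the limitation noted above: it optimizes one slot in isolation and must be rerun at every step.

```python
import math
import random


def max_load(assign, bits, cpu):
    """Largest per-satellite processing time when task j goes to satellite assign[j]."""
    load = [0.0] * len(cpu)
    for task, sat in enumerate(assign):
        load[sat] += bits[task] / cpu[sat]
    return max(load)


def simulated_annealing(bits, cpu, iters=5000, t0=1.0, alpha=0.999, seed=0):
    """Single-slot SA over task-to-satellite assignments (hypothetical baseline)."""
    rng = random.Random(seed)
    cur = [rng.randrange(len(cpu)) for _ in bits]
    cur_cost = max_load(cur, bits, cpu)
    best, best_cost = cur[:], cur_cost
    temp = t0
    for _ in range(iters):
        cand = cur[:]
        cand[rng.randrange(len(bits))] = rng.randrange(len(cpu))  # move one task
        cost = max_load(cand, bits, cpu)
        # Always accept improvements; accept worse moves with Metropolis probability.
        if cost <= cur_cost or rng.random() < math.exp((cur_cost - cost) / temp):
            cur, cur_cost = cand, cost
            if cost < best_cost:
                best, best_cost = cand[:], cost
        temp *= alpha  # geometric cooling
    return best, best_cost


# Usage: six hypothetical tasks on two satellites with equal CPU capacity.
best, best_cost = simulated_annealing([4, 3, 3, 2, 2, 2], [1.0, 1.0], seed=0)
```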
As the satellite communication resources increase, more communication resources can be allocated to tasks, and the completion rate increases. However, the slope of the curve decreases as satellite communication resources grow. This is because communication resources are the main factor affecting task latency when they are scarce; once the communication resources of the satellites reach a certain level, the computing resources of the satellites become the dominant factor affecting task latency. In addition, JTOSRA still outperforms the other two algorithms, and MLMRA still outperforms LMRA. Similarly, the completion rate with respect to satellite computing capability is shown in Figure 7, where each curve follows a trend similar to that in Figure 6.
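The diminishing returns can be illustrated with a hypothetical per-task latency made of a transmission term (bits / rate) and a processing term (bits × cycles per bit / CPU share): as the rate grows, the latency approaches the processing floor. The task size and CPU share are illustrative values, not the settings of Figure 6.

```python
def task_latency(bits, link_rate, cpu_share, cycles_per_bit=1000):
    """Transmission plus processing latency (s) for one task."""
    return bits / link_rate + bits * cycles_per_bit / cpu_share


bits = 1e6        # hypothetical 1 Mbit task
cpu_share = 1e9   # hypothetical 1 GC/s share of on-board computing
rates = (1e6, 1e7, 1e8, 1e9)
latencies = [task_latency(bits, r, cpu_share) for r in rates]
for rate, lat in zip(rates, latencies):
    print(f"{rate:.0e} bit/s -> {lat:.3f} s")
# latencies: 2.000, 1.100, 1.010, 1.001 s (processing floor is 1.0 s)
```

Each tenfold increase in the link rate removes a smaller absolute amount of latency, mirroring the flattening curves in Figure 6.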

Conclusion
In this paper, collaborative computing and resource allocation for LEO satellite networks are investigated. A framework for collaborative computing among LEO satellites with time-varying topology is proposed, and the joint task offloading, scheduling, and multidimensional resource allocation problem is divided into two subproblems with low complexity. JTOSRA, based on DRL and max-min fairness, is proposed to solve these subproblems, and simulation results demonstrate that JTOSRA outperforms the reference schemes in terms of task completion rate.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.