Space Information Network Resource Scheduling for Cloud Computing: A Deep Reinforcement Learning Approach

With the development of satellite technology, space information networks (SINs) have been applied in a wide range of fields. SINs now provide increasingly complex services and receive a growing number of tasks, and existing resource scheduling algorithms struggle to operate efficiently in such a complex environment of resources and tasks. We propose a resource allocation scheme based on reinforcement learning. First, according to the characteristics of SIN resources, we establish a cloud computing architecture for SINs to manage resources uniformly. Next, we adopt a variable granularity resource clustering algorithm based on fuzzy and hierarchical clustering; this algorithm adaptively adjusts the resource granularity and reduces the scheduling range. Finally, we model the resource scheduling process with a reinforcement learning algorithm to solve the joint resource scheduling problem. Simulation results show that the scheme can effectively reduce resource consumption, shorten task execution time, and improve the resource utilization efficiency of SINs.


Introduction
As an essential platform for all kinds of space applications, SINs have the advantage of global coverage. They can provide meteorological, positioning, and communication services and are outstanding in navigation, rescue, and remote sensing applications. SINs can provide collaborative management for all kinds of space equipment and improve its utilization efficiency, and have therefore attracted wide attention from researchers [1][2][3]. After years of SIN development, all kinds of spacecraft have made significant progress, with steadily improving payload capacity and an ever richer set of tasks. Although the diversity of executable missions has increased dramatically, the limited resources on board satellites remain a constraint that is difficult to overcome quickly. Therefore, with the continuous development of technology and the increasing number of tasks, the resource allocation capability of SINs needs to grow, and the current resource allocation technology and network architecture can hardly support resource allocation for the increasingly complex tasks of SINs [4].
SINs need an innovative network architecture and scheduling scheme. In recent years, researchers have done a great deal of work on the architecture of SINs. Yang et al. applied software defined networking (SDN) to satellite networks to solve the problems of isolation and private use of satellite network resources, designing a software defined satellite network (SDSN) architecture [5]. Zhang et al. proposed a software defined air-ground integrated network architecture that lets network resources switch freely [6]. In addition to applying SDN to SINs, edge computing has also drawn researchers' attention. Zhu et al. proposed a satellite edge computing architecture that offloads computing tasks to visible satellites for execution [7]. Wang et al. addressed the capacity limitation of resources by applying a new mobile edge computing satellite network architecture [8]. However, the above studies pay little attention to eliminating resource heterogeneity or to architectures for sharing resources. Because the resources of SINs are limited and dynamic, a new architecture is needed to share them, and cloud computing is a natural reference. This paper proposes a cloud computing architecture for SINs.
In addition to the network architecture, there has been important research on the resource scheduling algorithms of SINs. Early researchers mainly applied dynamic programming and search algorithms to model and complete satellite resource scheduling. Bianchessi et al. studied the management of multiple satellites through a tabu search heuristic algorithm [9]. Liu et al. modeled the multisatellite scheduling problem as a dynamic constraint problem and used a search algorithm to handle newly arriving tasks [10]. Habet et al. modeled satellite scheduling with a hybrid systematic search and tabu search algorithm constrained by visible time windows [11]. Lemaître et al. studied satellite resource scheduling through a dynamic programming algorithm and analyzed its advantages by comparison with greedy algorithms [12]. Wang et al. designed a scheduling decision system to manage satellite resources through a nonlinear model and applied priority heuristics and conflict avoidance in the system [13]. Dilkina and Havens improved the simulated annealing algorithm, folded the observation time windows of satellite resource scheduling into the local search process, and carried out satellite resource scheduling with the improved algorithm [14]. Li et al. studied satellite resource scheduling through a combinatorial genetic algorithm [15]. Geng et al. applied the genetic algorithm to satellite resource scheduling by improving its coding mode, using a mixed coding of binary and integer representations [16]. Marinelli et al. studied the scheduling problem of satellite resources through a heuristic search algorithm [17,18].
Many single-objective optimization algorithms appear in the work above. With the growing business of SINs, satellite resource scheduling needs to consider multiple objectives, and many researchers have studied multiobjective satellite resource scheduling algorithms. Tangpattanakul et al. treated the satellite user demand target with local search and optimized total target revenue and user fairness in the algorithm design [19]. Sun et al. applied four objectives during optimization to complete task planning so that the scheduling results better meet users' needs [20]. In Li et al.'s study, multiple objectives such as profit and the number of observation tasks are considered in the algorithm design, and a coding and decoding method based on heuristic rules is proposed; the results show that the method performs remarkably well [21].
Heuristic algorithms are often used for multiobjective optimization in the above research, but they have several problems. The operators and parameter settings of a heuristic algorithm depend on expert experience, and results differ considerably under different parameters [22]. As the environment grows more complex and constraints multiply, the search space of a heuristic algorithm expands sharply and solving becomes harder [23]. Recently, reinforcement learning, an essential branch of machine learning, has proved well suited to dynamic decision-making problems, which offers a way around these issues. There are now many reinforcement learning algorithms, such as the Deep Q-Learning Network (DQN) [24] and Deep Deterministic Policy Gradient (DDPG) [25]. Reinforcement learning has been widely used for resource allocation in the Internet of Things and cloud environments [26][27][28][29][30]. Rjoub et al. proposed a variety of reinforcement-learning-based scheduling algorithms for cloud computing; experiments show that deep reinforcement learning combined with LSTM (Long Short-Term Memory) performs well [31]. Arisdakessian et al. studied the integration of cloud computing resources in the IoT and proposed a multicriteria intelligent scheduling method using game theory, which performs well in service execution time and resource integration [32]. Cheng et al. studied the impact of resource provisioning and task scheduling on energy in cloud computing and proposed a DRL-Cloud algorithm that effectively reduces energy cost [33]. In recent years, reinforcement learning has also attracted much attention for resource scheduling in SINs. Qiu et al. jointly managed the networking, caching, and computing resources of a satellite network through deep reinforcement learning [34]. Deng et al. applied deep reinforcement learning to model the allocation process of satellite network resources, showing its concrete role in satellite resource allocation [35]. Liao et al. proposed a cooperative multiagent deep reinforcement learning framework to manage wireless resources in satellite networks [36]. In addition, other researchers have applied reinforcement learning algorithms to satellite network resource scheduling [37][38][39][40].
The reinforcement learning algorithm can overcome the shortcomings of heuristic algorithms. Still, in most studies of SINs, few multiobjective optimization studies aim to reduce resource usage directly, and there is little research on the diversity of SIN resources. Because SINs contain a variety of resources, dividing and matching them in detail is also a problem that SIN resource scheduling must consider. To address these issues, this paper proposes the DQN Resource Scheduling (DQNRS) algorithm. Built on DQN, it analyzes SIN resources in detail and focuses on their usage and types. The main contributions of this paper are summarized as follows: (1) We designed the cloud computing architecture of SINs and divided SINs into space service facilities, space IoT facilities, and ground service facilities. The SIN architecture is the first step of space resource scheduling, and a well-designed architecture can improve the efficiency of resource scheduling. In the traditional satellite architecture, because of insufficient on-board processing capacity, the data a satellite obtains can only be transmitted to the ground for processing and then forwarded to users, which causes problems such as waiting for the downlink in dynamic space networks. To solve these issues, we designed the SIN cloud computing architecture, which can share the resources of SINs and process tasks in various situations. As shown in Figure 1, the left half of the figure is the infrastructure part of the cloud computing architecture, and the right half shows its layered functions.
The architecture realizes efficient utilization of SIN resources through the cloud data center. It applies cloud computing ideas to integrate the heterogeneous computing, storage, and perception resources in SINs, enabling resource sharing and collaborative service. The cloud computing infrastructure layer of SINs includes high orbit satellite networks based on geosynchronous satellites, near-earth satellite networks of medium and low orbit satellites, and ground servers. As shown in Figure 1, it is divided into space service facilities, space IoT facilities, and ground service facilities. The space service facility consists of high orbit satellites, including multiple geosynchronous satellite clusters. It can stay connected with near-earth satellites and ground servers, provide various types of services, and analyze and process tasks without returning data to the ground; it is the space service center of the cloud computing architecture. The space IoT facility is composed of near-earth satellites, including all kinds of satellites orbiting the Earth. It can provide high-quality earth observation services, collect various signals in real time, and transmit information and data to space or ground servers. The ground service facility is composed of several ground service centers capable of large-scale computing and storage. When the service capacity of the space segment is insufficient, tasks can be transferred back to the ground service facilities, which can quickly execute various SIN tasks; it is the ground service center of the cloud computing architecture. Finally, the hardware of the infrastructure layer is virtualized to generate a virtual resource pool, ready for scheduling by the cloud computing management layer.
For the infrastructure of space networks, SDN technology is used to manage the physical nodes of space-dispersed satellites. This realizes a data processing platform that is "physically distributed and logically unified." The satellite server nodes and the general control center are virtualized with virtualization technology to provide services such as computing, storage, and data forwarding for service requestors. Through virtualization we can achieve collaborative management, flexible scheduling, and on-demand expansion of resources.
The cloud computing management layer of SINs can provide functions such as resource monitoring, task receiving, and task decision-making distribution. This layer can access the resource pool of the infrastructure layer. At the same time, this layer can provide services for the application layer, including navigation, communication, and remote sensing. After the application layer user submits the task requirements, the management layer makes decisions through the scheduling algorithm according to the task requirements and resource environment. It sends the tasks to various resources in the resource pool.
The cloud computing application layer of SINs offers the services that the management layer can provide through resource scheduling decisions. Relying on the facilities and data in SINs and on current satellite processing technology, it can provide remote sensing, navigation, and communication services. The application layer returns the task results processed by the management layer to users; at the same time, it receives the task requirements submitted by users and forwards them to the management layer for decision-algorithm resource scheduling.
The cloud computing architecture of SINs can share all kinds of resources, realize the interconnection between resources, realize large-scale task processing through the cooperation between resources, and realize the operation of data processing, acquisition, and transmission without returning information to the ground.

Space Cloud Environment Infrastructure Layer Resource Processing

The infrastructure in the space cloud environment includes many resources of many kinds, and the scale of resources is complex. Task requirements in the space cloud environment are highly changeable, and a single resource dimension can hardly meet them. Resource forms with different granularity are an effective way to execute tasks quickly. We therefore propose a variable granularity adaptive clustering algorithm: first, particles are obtained through a clustering algorithm; then the particles are transformed through the aggregation network; finally, the appropriate particle layer and clustering results are obtained.
In SINs, resources are usually heterogeneous, and different resources provide different capabilities, such as computing, storage, communication, and perception. Most scheduling studies do not distinguish between resource capabilities, which increases scheduling cost and may lead to unreasonable scheduling; these problems affect task completion time and resource utilization. In SINs, some tasks involve a large amount of data but very little computation, such as storing captured images; others require heavy computation, such as various decision tasks; and still others need neither extensive data nor computation but demand real-time processing, such as displaying video data. Therefore, we use a variable granularity clustering algorithm to cluster the resources in SINs, fully considering the characteristics of SIN resources. We distinguish resources as follows: computing resources, storage resources, communication resources, navigation resources, and perception resources. These resource attributes abstractly represent the functions of various resources and jointly define the resource capabilities of the space cloud environment.
The resource model of SINs can be represented by $R = \{r_{cal}, r_{stor}, r_{tran}, r_{nav}, r_{per}\}$, where $r_{cal}$ represents the computing power of the resource, measured by the number of instructions executed per second; $r_{stor}$ represents the storage capacity, measured by the resource's storage volume; $r_{tran}$ represents the transmission capacity, measured by the resource's bandwidth; $r_{nav}$ represents the navigation capability, measured by the resource's coverage; and $r_{per}$ represents the perception capability, measured by imaging resolution.
Multiple resources can form a resource cluster, represented by $R_{total} = \{R_1, R_2, \dots, R_n\}$. A resource cluster is a collection of virtual resources in the virtual resource pool. The result of resource clustering is a set of virtual resource sets at different levels, i.e., resource clusters, found by the clustering algorithm. The granularity of each virtual resource set differs, so accurate services can be provided for different task requirements. The scheduling algorithm requests resources of different granularity according to the needs of tasks, which effectively reduces the resource search space and accelerates resource scheduling.
The resource variable granularity clustering algorithm is completed by two rounds of clustering. The first round is fine-grained clustering, used to analyze the internal characteristics and structure of resources in detail; the different results of this round yield the particles of the aggregation network. Next, granular clustering is completed: particles are aggregated under different similarity thresholds, and the threshold is continuously adjusted to find appropriate clustering particles and particle layers. If the clustering particles are too fine, the aggregation particle layer becomes dense and scheduling slows down; if the particles are too coarse, the aggregation layer is sparse and scheduling is inaccurate.
After determining the resource model and characteristics, according to variable granularity clustering, the resource clustering of SINs is divided into the following stages: data standardization, resource classification, cluster particle generation, and cluster particle layer generation. The clustering process is discussed in the steps below.
2.2.1. Establishing Data Matrix. According to the five attributes of the resource attribute set $R = \{r_{cal}, r_{stor}, r_{tran}, r_{nav}, r_{per}\}$, the original data matrix of SIN resources can be listed.

2.2.2. Data Standardization. It is necessary to normalize the values of the various resources to a common range, eliminating the differences caused by large gaps between value types. The first step is to standardize the data by computing the average value and standard deviation of each resource capacity. The standardized data are

$$r'_{ik} = \frac{r_{ik} - \bar{r}_k}{s_k},$$

where $\bar{r}_k = \frac{1}{n}\sum_{i=1}^{n} r_{ik}$ is the average value of the $k$th attribute of the resources and $s_k = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (r_{ik} - \bar{r}_k)^2}$ is the standard deviation of that attribute. In addition, a dimensional elimination (range normalization) calculation is required:

$$r''_{ik} = \frac{r'_{ik} - \min_i r'_{ik}}{\max_i r'_{ik} - \min_i r'_{ik}}.$$

2.2.3. Resource Classification. Calculate the proportion of each attribute capability value for each resource, and take the attribute with the highest proportion as the resource's category. The classification formula is

$$C_{ik} = \frac{r''_{ik}}{\sum_{k=1}^{m} r''_{ik}}.$$

The resource category is obtained from $C_{ik}$: the largest attribute capability proportion in each resource determines its category. First, the resources of one category are grouped into a resource cluster; in this way $m$ resource clusters are obtained. Finally, each resource cluster is fuzzy clustered.
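The standardization and classification steps above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names and the guard against zero-variance attributes are our own assumptions.

```python
import numpy as np

def standardize(R):
    """Z-score each attribute column, then min-max scale to [0, 1]
    (the dimensional elimination step described in the text)."""
    mean = R.mean(axis=0)
    std = R.std(axis=0)
    std[std == 0] = 1.0                      # guard: constant attributes
    Z = (R - mean) / std
    lo, hi = Z.min(axis=0), Z.max(axis=0)
    rng = np.where(hi > lo, hi - lo, 1.0)    # guard: degenerate range
    return (Z - lo) / rng

def classify(R_norm):
    """Assign each resource to the attribute holding the largest share
    of its normalized capability vector (the C_ik proportion rule)."""
    share = R_norm / np.maximum(R_norm.sum(axis=1, keepdims=True), 1e-12)
    return share.argmax(axis=1)
```

For example, a resource whose dominant normalized attribute is computing power is placed in the computing cluster, and so on for the other four attribute classes.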

2.2.4. Cluster Particle Generation.
Particles are computed from a fuzzy matrix. In the first step, the fuzzy similarity matrix is calculated, measured by the similarity between resources. The similarity of resources $R_i$ and $R_j$ is denoted $p_{ij}$, and we use the exponential similarity coefficient method to compute it:

$$p_{ij} = \frac{1}{m}\sum_{k=1}^{m} \exp\left(-\frac{3}{4}\cdot\frac{(r''_{ik} - r''_{jk})^2}{s_k^2}\right),$$

where $m$ is the number of resource attribute capability indicators and $r''_{ik}$ is the $k$th performance indicator of resource $R_i$. From $p_{ij}$ we construct the fuzzy similarity matrix $P = (p_{ij})_{n \times n}$ between resources.
Transitivity is required when deriving clustering results, but the fuzzy similarity matrix does not have this property, so the fuzzy equivalence matrix must be computed via the transitive closure method. The closure $P'$ is obtained by repeatedly composing the similarity matrix, $P \to P \circ P \to P^4 \to \cdots$, until $p_{ij}$ no longer changes; at that point the fuzzy equivalence matrix $P'$ is obtained. Cutting $P'$ at different thresholds yields different clusters, and at this stage the particles are obtained. The performance of the particles differs, but only slightly.
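The similarity matrix, its transitive closure, and the threshold cut can be sketched as below. The exponential similarity form and the max-min composition are the standard constructions we assume the paper uses; `s` is the per-attribute spread vector.

```python
import numpy as np

def exp_similarity(X, s):
    """Exponential similarity coefficient p_ij between resource rows of X."""
    n, m = X.shape
    P = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            P[i, j] = np.mean(np.exp(-0.75 * (X[i] - X[j]) ** 2 / s ** 2))
    return P

def transitive_closure(P):
    """Iterate P -> P∘P (max-min composition) until it stabilizes,
    yielding the fuzzy equivalence matrix P'."""
    while True:
        # (P∘P)[i,j] = max_k min(P[i,k], P[k,j])
        Q = np.max(np.minimum(P[:, :, None], P[None, :, :]), axis=1)
        if np.allclose(Q, P):
            return Q
        P = Q

def cut(P_eq, lam):
    """λ-cut: resources i, j fall in one cluster when P'[i,j] >= λ."""
    return (P_eq >= lam).astype(int)
```

Raising the threshold λ produces finer particles; lowering it merges resources into coarser ones, which is exactly the knob the variable granularity algorithm adjusts.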
2.2.5. Cluster Granular Generation. The obtained particles are hierarchically clustered, and the particle layer is obtained by average group proximity or single linkage. Single linkage calculates the distance between the two most similar samples in each pair of clusters and merges the pair of clusters containing the two nearest samples: for clusters composed of one or more samples in each particle, the single-linkage method finds the minimum inter-cluster sample distance and merges the two closest clusters.
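A minimal single-linkage merge loop matching the description above can be written as follows; the function name, the plain distance-matrix input, and the cluster-count stopping criterion are illustrative choices, not taken from the paper.

```python
def single_linkage(dist, n_clusters):
    """Agglomerative single linkage: repeatedly merge the two clusters
    whose closest members are nearest, until n_clusters remain."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # single linkage = minimum pairwise member distance
                d = min(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] += clusters.pop(b)
    return clusters
```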
The resource clusters obtained by variable granularity clustering can be expressed as $R_{i,j}^{m}$, where $i$ denotes the granular layer of resources after clustering, $j$ denotes the classified resource attribute in the resource cluster, and $m$ denotes the number of resource clusters under this granular layer.

Wireless Communications and Mobile Computing
According to the analysis of variable granularity clustering of resources, the clustering algorithm is designed in detail in Algorithm 1. First, the fuzzy clustering algorithm clusters the resource dataset to obtain the cluster particles. Then, the hierarchical clustering algorithm clusters the dataset a second time to form the resource particle layer. If the particles are larger than the task demand, the particle clustering is not yet satisfactory; particle clustering is repeated until the mean value of the particles is less than the task demand.
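The stopping rule of Algorithm 1 can be sketched as a walk over precomputed granule layers, fine to coarse. This is a stand-in illustration: `choose_granularity`, the list-of-lists layer encoding, and the mean-capacity comparison are our assumptions about the rule described in the text.

```python
def choose_granularity(granule_layers, task_demand):
    """Return the first (finest) layer whose mean granule capacity does
    not exceed the task demand, mirroring the repeat-until rule above.
    Each layer is a list of granules; each granule a list of capacities."""
    for layer in granule_layers:
        mean_cap = sum(sum(g) / len(g) for g in layer) / len(layer)
        if mean_cap <= task_demand:
            return layer
    return granule_layers[-1]   # fall back to the coarsest layer
```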

Space Cloud Environment Management Layer Resource Scheduling Algorithm

After the infrastructure layer resources of the space cloud environment are processed, task-to-resource matching decisions must be completed at the management layer. To schedule resources efficiently, in this section we design a resource scheduling algorithm for the space cloud environment based on DQN, under the constraints of task execution time, resource usage, and location coverage in SINs. We describe the resource scheduling of SINs as follows: after the cloud environment application layer receives a task, the management layer decides which infrastructure layer resources will execute it. Our goal is to make the task demand location and resource coverage location overlap, minimize the execution time of the task, and occupy as few resources of the information network as possible.

Problem Constraints.
According to the resource scheduling characteristics of SINs, the constraints of the model are as follows.
(1) Time Consumption Model. The total time required for task $T_i$ to execute on virtual resource cluster $R_j$ is denoted $totalTime_{ij}$, where $T_i$ and $R_j$ are the comprehensive representations of the task and resource models, respectively. The total time consists of the waiting time before execution and the working time during execution:

$$totalTime_{ij} = waitTime_{ij} + workTime_{ij},$$

where $waitTime_{ij}$ is the time the task must wait for the resource cluster before executing on it, and $workTime_{ij}$ is the execution time after the task is deployed on the cluster:

$$workTime_{ij} = \begin{cases} t_{worktime}, & t_{worktime} > 0, \\ T_{need} / R_{ability}, & t_{worktime} = 0. \end{cases}$$

Here $t_{worktime}$ is the execution time attribute of the task. If the task is continuous, its execution time equals this attribute directly; if the task is timely, the execution time must be calculated from the resource amount $T_{need}$ required by the task and the resource amount $R_{ability}$ that the cluster can provide.
(2) Resource Occupancy Model. The resources of cluster $R_j$ are occupied during task execution, and the capacity of the cluster is denoted $R_j^{ability}$. The resources occupied during execution are determined by the execution time $workTime_{ij}$ and the capacity $R_j^{ability}$, so the total resource usage of the tasks under deployment plan $P$ is

$$TotalUse = \sum_{(i,j) \in P} workTime_{ij} \times R_j^{ability}.$$

Once a resource cluster is determined, its capability when performing tasks is fixed, so the resource use per unit time can be determined.
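The two cost models above reduce to a few lines of arithmetic. The sketch below follows the case split for $workTime_{ij}$ and the time-times-capacity sum for $TotalUse$; the function and parameter names are ours.

```python
def work_time(t_worktime, T_need, R_ability):
    """Execution time: use the task's fixed duration when it is given,
    otherwise divide the demanded amount by the cluster's capacity."""
    return t_worktime if t_worktime > 0 else T_need / R_ability

def total_time(wait_time, t_worktime, T_need, R_ability):
    """totalTime_ij = waitTime_ij + workTime_ij."""
    return wait_time + work_time(t_worktime, T_need, R_ability)

def total_use(plan):
    """Total resource occupation of a deployment plan: each entry is
    (execution time on a cluster, that cluster's capacity)."""
    return sum(wt * cap for wt, cap in plan)
```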
(3) Location Coverage Model. The request area of task $T_i$ and the coverage area of virtual resource cluster $R_j$ sometimes do not coincide, so we must determine the visibility between the satellite and the ground request area. Assume the Earth is a sphere of uniform mass with radius $r_e$, the height of resource cluster $R_j$ is $h$, and the subsatellite point on the Earth's surface is $R'$. The surface region bounded by the lines tangent to the Earth from the satellite (tangent point $M$) is called the instantaneous maximum coverage area of the satellite. As shown in Figure 2, $\beta = \angle R'OM$ is called the coverage angle of the satellite to the ground, and $\alpha = \angle OR_jM$ is called the field angle of view of the satellite toward the ground.
Due to the occlusion of objects on the Earth's surface, the effect of satellite observation and communication near the edge of the largest coverage area of the Earth's surface is usually not very good. To improve the communication and observation effect of the satellite on the Earth's surface area, it is stipulated that the included angle between the satellite line of sight and the tangent plane passing through M on the ground shall not be lower than a certain angle, which is called the minimum observation angle. The coverage angle corresponding to the minimum observation angle is β σ , and the corresponding satellite field angle is α σ . The above analysis shows that the larger the minimum observation angle σ, the smaller the corresponding β σ , and the better the communication and observation effect of the satellite on the Earth's surface area.
The longitude and latitude of the subsatellite point of resource $R_i$ are $(longitude_i, latitude_i)$, and those of the task request area are $(longitude_j, latitude_j)$. The geocentric angle between them is

$$\beta_{ij} = \arccos\big(\sin latitude_i \sin latitude_j + \cos latitude_i \cos latitude_j \cos(longitude_i - longitude_j)\big).$$

The smaller the value of $\beta_{ij}$, the closer the request area lies to the subsatellite point and the better the communication and observation effect in the corresponding coverage area. Therefore, in selecting resources, a task chooses resources with smaller $\beta_{ij}$ as far as possible.
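The geocentric angle is the standard spherical law-of-cosines central angle between two latitude/longitude points; a small sketch (degrees in, radians out, with a clamp against floating-point overshoot) is:

```python
import math

def geocentric_angle(lon_i, lat_i, lon_j, lat_j):
    """Central angle β_ij between the subsatellite point and the task
    request area, via the spherical law of cosines."""
    phi_i, phi_j = math.radians(lat_i), math.radians(lat_j)
    dlon = math.radians(lon_i - lon_j)
    c = (math.sin(phi_i) * math.sin(phi_j)
         + math.cos(phi_i) * math.cos(phi_j) * math.cos(dlon))
    return math.acos(max(-1.0, min(1.0, c)))   # clamp rounding error
```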

Resource Scheduling Algorithm based on DQN.
Aiming at the problem model and resource preprocessing above, we express the resource scheduling problem of SINs as a DQN process, adapting the state space, action space, and reward design to the resource scheduling process. Q-learning is one of the classical reinforcement learning algorithms. It stores states and actions through a Q-table $Q(s, a)$, and the expected Q value of the current state can be expressed as

$$Q(s, a) = r + \gamma \max_{a'} Q(s', a'),$$

where $r$ is the reward obtained through strategy $\pi$, $\gamma$ is the discount factor, and $(s', a')$ are the next state and the corresponding action, respectively. The Q-table is continually updated during execution, and the environment state $s$ is initialized at the start. The execution process is organized in episodes; each episode computes the corresponding reward $r$ and the next state. (Algorithm 1 takes as input the resource dataset $R_{total} = \{R_1, R_2, \dots, R_n\}$ and the task required resources $R_T$, and outputs the clustering cluster $C_{cluster}$ and the clustering layer $C_{layer}$.) By replacing the Q-table with a neural network, reinforcement learning can be applied to complex state spaces. The detailed idea of the DQN algorithm is to generate action $a$ and reward value $r$ with a neural network model, training the network weight vector $\theta$ and bias $b$; the input and output of the network are the state $s$ and the action values $a$, respectively. The specific process is shown in Figure 3. After the network outputs the Q values, we should explore the action space sufficiently and learn as many state-action pairs as possible. Researchers generally use the ε-greedy strategy for action selection, but ε-greedy explores all actions with equal probability, so in a given state it may not explore the action space adequately.
In the case of a large action space with multiple suboptimal solutions, insufficient exploration may trap the network in a local optimum that is difficult to escape. To address this, we use Boltzmann exploration [41] for action selection. This method avoids completely random action selection; the action selection probability is

$$P(a \mid s) = \frac{\exp(\beta Q(s, a))}{\sum_{a'} \exp(\beta Q(s, a'))},$$

where $\beta$ is the exploration coefficient: the larger the coefficient, the smaller the exploration range. This method uses the Q value as the probability basis, which ensures that when several suboptimal solutions have close Q values, each has a high probability of being explored, while clearly poor choices are explored as little as possible. This paper combines the ε-greedy and Boltzmann exploration methods into ε-Boltzmann action selection: with probability ε we explore an action according to the formula above; otherwise we choose the best action.
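The ε-Boltzmann rule can be sketched in a few lines. The max-shift inside the softmax is a numerical-stability detail of ours, not something the paper specifies.

```python
import numpy as np

def eps_boltzmann(q_values, eps, beta, rng):
    """ε-Boltzmann selection: with probability ε sample from the
    Boltzmann distribution over Q values, otherwise act greedily."""
    if rng.random() < eps:
        z = beta * (q_values - q_values.max())   # shift for stability
        p = np.exp(z) / np.exp(z).sum()
        return int(rng.choice(len(q_values), p=p))
    return int(q_values.argmax())
```

With a large β the Boltzmann draw itself concentrates on the best actions, so ε and β together control how aggressively near-optimal alternatives are probed.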
To better predict the Q value and the next action, it is necessary to continuously train the parameters of the neural network. The parameters of the neural network will be updated by backpropagation and gradient descent. The goal of DQN is to keep the Q value close to the target-Q-value.

The target-Q-value can be calculated by the Q-learning rule. In this paper, the mean square error is used as the loss function to update the parameters of the DQN neural network:

$$L(\theta) = \mathbb{E}\big[(r + \gamma \max_{a'} Q(s', a'; \theta^-) - Q(s, a; \theta))^2\big],$$

where $r + \gamma \max_{a'} Q(s', a'; \theta^-)$ is the target-Q-value obtained from the target network, $s'$ is the state at the next time, $a'$ is the action selected under the greedy strategy, and $\theta^-$ is the parameter of the target network. The parameter of the evaluation network is $\theta$, with corresponding estimated value $Q(s, a; \theta)$. As shown in Figure 3, the DQN continuously updates $\theta$ and $\theta^-$ through the loss function; the training goal is to adjust the network parameters to minimize $L(\theta)$, where $\theta$ is updated at every step and $\theta^-$ is updated only after multiple steps to reduce the correlation between the networks. Because the task request location and the required resources change dynamically, resource scheduling must consider the resource environment of the whole SIN. In this paper, a resource scheduling algorithm, DQNRS, based on the DQN framework is proposed to minimize task execution time and resource occupation. The algorithm ensures that the resource scheduling requirements of SINs are fully considered through three parts.
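The target computation and loss above can be sketched batchwise; the `dones` mask, which zeroes the bootstrap term on terminal transitions, is a standard detail we have assumed rather than one stated in the paper.

```python
import numpy as np

def td_targets(rewards, next_q, gamma, dones):
    """Target-Q-values r + γ max_a' Q(s', a'; θ⁻); next_q holds the
    target network's Q values for each next state."""
    return rewards + gamma * next_q.max(axis=1) * (1.0 - dones)

def mse_loss(pred_q, targets):
    """Mean squared error L(θ) between evaluated and target Q values."""
    return float(np.mean((targets - pred_q) ** 2))
```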
(1) State. The state consists of the task execution time, resource consumption, and coverage angle. Therefore, the state can be expressed as s = {TotalTime, TotalUse, β}, and the task status is determined by the task's execution time, the occupation of resources, and the coverage angle, respectively.
(2) Action. In this algorithm, the action is composed of T and R, and the action vector can be given as a = {a_t^r, ∀t ∈ T, ∀r ∈ R}, where a_t^r indicates that the agent schedules resource r ∈ R for task t ∈ T.
(3) Reward. After selecting the action in each step, the agent obtains a certain return. By calculating this return, the algorithm can achieve the expected result of minimizing the time and resource occupation of task execution. At the same time, location coverage is crucial in task execution, so the return should also make the location coverage as large as possible. Therefore, these three objectives should be reflected in the reward based on environmental feedback. We express the DRL agent's return from the environment as a function of three quantities: TotalTime, the total time of task execution; TotalUse, the total resources occupied during task execution; and β, the location coverage angle.
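A minimal sketch of such a reward is given below. The weighted-sum form and the weights w_time, w_use, and w_cov are our assumptions for illustration; the paper's exact reward formula may differ:

```python
def reward(total_time, total_use, coverage_angle,
           w_time=1.0, w_use=1.0, w_cov=1.0):
    """Illustrative reward: penalize task execution time (TotalTime) and
    resource occupation (TotalUse), and reward the coverage angle (beta).
    The weights are assumptions, not values from the paper."""
    return -w_time * total_time - w_use * total_use + w_cov * coverage_angle
```

With this shape, shorter execution, lower resource usage, and larger coverage all push the return upward, which is the behavior the three optimization objectives require.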
As shown in Figure 3 and Algorithm 2, given the state, action, and reward, we first establish the evaluation and target networks with random parameters θ and θ⁻ and build an experience replay pool to store state s and action a. In the process of resource allocation for each task k, we first initialize the state. Then, for each step i, state s is fed into the evaluation network, and action a is selected from the evaluation network's output through the ε-Boltzmann strategy. We obtain reward r and the next state s′ by executing action a. To reduce the correlation of samples, we store the obtained (s, a, r, s′) in a replay pool D used to update the evaluation network.
The parameters of the evaluation network are updated using the loss function, and the parameters of the target network are updated after multiple steps. Resource scheduling aims to make resources cover the required locations of tasks while minimizing task execution time and resource usage; these goals are achieved through our predefined reward. In the process of SIN resource scheduling, if the task demand location is closer to the resource service location or the resource usage is smaller, the agent obtains positive feedback. With the continuous accumulation of experience, tasks in the space cloud environment can be allocated to more appropriate resources.
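The training procedure just described (replay pool, evaluation/target networks, periodic target sync) can be sketched end to end on a toy problem. Everything here is a simplified illustration: the one-layer "network," the toy environment interface `env_step`, and all hyperparameters are our assumptions, and the ε part of ε-Boltzmann is reduced to uniform sampling for brevity:

```python
import random
from collections import deque
import numpy as np

class TinyDQN:
    """Minimal tabular-style Q 'network' (illustrative, not the paper's architecture)."""
    def __init__(self, n_states, n_actions, lr=0.1, rng=None):
        self.rng = rng if rng is not None else np.random.default_rng(0)
        self.W = self.rng.normal(scale=0.1, size=(n_states, n_actions))
        self.lr = lr
    def q(self, s):                     # Q values for a discrete state index
        return self.W[s]
    def update(self, s, a, target):     # gradient step on (target - Q(s, a))^2
        self.W[s, a] += self.lr * (target - self.W[s, a])

def train(env_step, n_states, n_actions, episodes=50, gamma=0.9,
          batch=8, target_sync=20, eps=0.3):
    random.seed(0)                      # deterministic replay sampling
    rng = np.random.default_rng(0)
    eval_net = TinyDQN(n_states, n_actions, rng=rng)
    target_net = TinyDQN(n_states, n_actions, rng=rng)
    target_net.W = eval_net.W.copy()
    D = deque(maxlen=500)               # experience replay pool
    step_count = 0
    for _ in range(episodes):
        s = 0
        for _ in range(20):
            # exploration simplified to uniform random with probability eps
            a = int(rng.integers(n_actions)) if rng.random() < eps \
                else int(np.argmax(eval_net.q(s)))
            s2, r, done = env_step(s, a)
            D.append((s, a, r, s2, done))
            if len(D) >= batch:
                for (si, ai, ri, s2i, di) in random.sample(list(D), batch):
                    # target from the target network, per the loss function
                    y = ri + (0.0 if di else gamma * target_net.q(s2i).max())
                    eval_net.update(si, ai, y)
            step_count += 1
            if step_count % target_sync == 0:   # periodic target-network sync
                target_net.W = eval_net.W.copy()
            s = s2
            if done:
                break
    return eval_net
```

On a three-state chain where action 1 advances toward a rewarded terminal state, the learned greedy policy prefers advancing, illustrating that the replay-plus-target-network loop converges.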

Experimental Settings.
We simulate the cloud computing architecture of SINs through STK, from which we obtain visibility and location information between satellites and resources. The main simulation parameters are listed in Table 1.
We record the satellite position in one cycle of the simulation model. We divide the period into multiple time slices and disregard the change in the satellite position in each time slice.
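The time-slice discretization can be expressed as a small helper; the period and slice count below are hypothetical values for illustration only:

```python
def slice_index(t, period, n_slices):
    """Map a time t (same units as period) to its time-slice index,
    treating the satellite position as fixed within each slice."""
    return int((t % period) / (period / n_slices))
```

For example, with an assumed orbital period of 5400 s split into 90 slices, every 60 s window maps to one index, so position lookups within a window return the same recorded value.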
The DQNRS neural network has 3 layers, and the reward discount factor is γ = 0.9. When applying the ε-Boltzmann strategy, β = 1; we set the initial value of the exploration probability P_ε to 1 and gradually decrease it at the rate α = 0.8 during training. When P_ε reaches 0.3, its value remains constant. We mainly compare against the latest SIN scheduling algorithm, DRL-VNE [42]; a classical heuristic scheduling strategy, the ACO algorithm [43]; a first-come-first-serve (FCFO) scheduling strategy [44], which is commonly employed in cloud computing scheduling; and a random scheduling algorithm.
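The exploration-probability schedule can be sketched as follows; the initial value, decay rate, and floor match the settings above, but the per-step decay granularity is our assumption:

```python
def epsilon_schedule(step, p0=1.0, alpha=0.8, floor=0.3):
    """Exploration probability P_eps: start at p0, multiply by alpha at each
    decay step, and hold constant once the floor of 0.3 is reached."""
    return max(floor, p0 * alpha ** step)
```

This geometric decay front-loads exploration early in training and fixes a residual exploration rate of 0.3 thereafter.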

Simulation Results. Next, we analyze the impact of tasks and resources on task execution time, resource usage, and task execution success rate. Figures 4-6 show the performance changes when the number of satellite nodes is 100 and the number of tasks increases. Figures 7-9 show the performance changes when the number of satellite nodes increases and the number of tasks is 500. One satellite node can generate multiple virtual machines in the cloud environment. Figure 4 shows the corresponding task execution time as tasks are added. As the number of task requests in the SINs increases, the task execution time rises. When there are few tasks, the task execution time of our algorithm differs little from that of the other algorithms; as the tasks increase, our algorithm shows more significant advantages. The main reason is that it is easier for any algorithm to identify relevant resources when there are few tasks, so the execution times of the algorithms are similar. The DQNRS
algorithm can fit the Q-table through the neural network and performs better in a large environmental state space. The algorithm optimizes the task execution time so that DQNRS can identify an appropriate task-resource matching strategy after the number of tasks increases. Simultaneously, the DQNRS algorithm clusters resources in the resource preprocessing stage, so it can match resources faster and achieve a better execution time than other algorithms, such as DRL-VNE.

Figure 5 shows the change in the task execution success rate when the tasks of the SINs increase. Task execution failure is mainly attributed to the invisible waiting time and the location coverage of multiple tasks after tasks are allocated to resources. Among the algorithms, DRL-VNE performs best, while the success rate of the random algorithm decreases significantly as the number of tasks increases. Although the success rate of the DQNRS algorithm also decreases, it maintains the success rate better because it treats task execution time, resource usage, and coverage area as dynamic optimization objectives; the agent dynamically adjusts these objectives according to changes in the resource pool so that more tasks can be executed successfully.
As shown in Figure 6, the increase in tasks in SINs increases the demand for resources. The DQNRS algorithm proposed in this paper still has advantages compared with the ACO, random, FCFO, and DRL-VNE algorithms. The DQNRS algorithm performs variable granularity clustering of resources at the beginning of scheduling, aggregating resources into different types of resource clusters, which helps the scheduling algorithm quickly search for appropriate resources. Different resource granularities can provide different capabilities. The agent of the algorithm can match more appropriate resources for the tasks through continuous dynamic optimization. The algorithm improves the efficiency of resource use and reduces the amount of resource use.
As shown in Figure 7, the task execution time decreases with an increase in the number of satellite nodes in the SINs. With this increase, virtual resource clusters in the cloud environment continue to increase so that tasks have more choices and it is easier to identify appropriate resources that can reduce the execution time of tasks. The random algorithm and FCFO algorithm consider the overall situation of the tasks to a lesser extent, so the resource utilization efficiency is low. Although the task execution time can be reduced after the number of resources is increased, the effect
is not optimal. The DQNRS, DRL-VNE, and ACO algorithms can better optimize the task execution time as resources expand. Both DQNRS and DRL-VNE can search the environment and perform well in a complex environment, but DQNRS makes better dynamic adjustments according to environmental changes, so it achieves better performance in a highly dynamic space environment.
As shown in Figure 8, the task completion rate increases with the number of resources. When the number of tasks remains unchanged, the capacity and quantity provided by the cloud environment grow as resources increase, so the task success rate rises. With fewer resources, the task completion rate of the DQNRS algorithm is second only to that of the DRL-VNE algorithm, because DQNRS makes dynamic adjustment decisions according to the optimization objectives; this process continuously optimizes the resource utilization rate, and the more reasonable the resource allocation, the higher the task completion rate. As resources increase, although the completion rate is not as high as that of DRL-VNE, the DQNRS algorithm still performs well: it applies a neural network to adapt to the growing environmental state, and its resource preprocessing stage keeps the search space matched to the search ability of the algorithm, improving resource utilization and increasing the task completion rate.

Figure 9 illustrates the resource usage of the different algorithms as resources grow. As seen in the figure, increasing resources reduces resource usage: with more resources available, tasks have more candidates to match, and the decision-making process can select more appropriate resources. The figure also shows that the random algorithm performs resource matching randomly, so its resource utilization is low and its resource usage increases significantly compared with the other algorithms. As resources increase, the DQNRS algorithm maintains its advantage over the other algorithms.
The algorithm aggregates resources into resource clusters with different capabilities through variable granularity clustering, which effectively improves the ability to search resources and the probability of finding more suitable resources. The multiobjective optimization of resource usage and location can effectively improve the resource utilization rate and thus reduce resource usage.

Conclusions
This paper investigates the resource scheduling problem in the SIN environment. We propose an SIN cloud computing architecture to share multisource, heterogeneous SIN resources, which provides a new idea for resource management. To reduce the complexity and diversity of cloud environment resources, we propose a variable granularity clustering algorithm, which reduces the resource search space in the cloud environment and improves resource utilization efficiency. We analyze the scheduling problem with regard to task execution time, resource usage, and coverage location and propose an intelligent resource allocation algorithm, DQNRS, based on DQN. Owing to DQN's adaptive learning and decision-making capabilities in the environment, our algorithm can better adapt to SINs with dynamic changes and complex resources.
We conducted several simulations to analyze the performance of DQNRS. The experimental results show that although the completion rate of the proposed algorithm is lower than that of the DRL-VNE algorithm, it performs better with regard to task execution time and resource usage, and its performance exceeds that of the ACO, random, and FCFO algorithms. In future work, we plan to design a more complete network architecture and new scheduling algorithms to further improve the resource utilization of SINs.

Data Availability
The data used to support the findings of this study are included within the article.