Resource- and Neighbor-Aware Observation Transmission Scheme in Satellite Networks

The observation satellite can exploit its own storage and computational resources to reduce the transmission delay. However, too excessive usage of these resources can have negative effects on the queuing delay at the relay satellite and/or on conducting other tasks at each observation satellite. In this paper, we proposed a new resource- and neighbor-aware observation transmission scheme (RNA-OTS). In RNA-OTS, each observation satellite decides whether to use its resources and the resources of the relay satellite at each time epoch by considering its resource utilization and transmission policies of neighbor observation satellites. For the optimal decision of each observation satellite in a distributed manner, the operation of observation satellites is modeled by means of a constrained stochastic game, and a best-response-dynamics-based algorithm is devised to find the Nash equilibrium. The evaluation results demonstrate that RNA-OTS can decrease the delay to deliver the observation to the destination by up to 87% compared to a relay-satellite-based scheme while guaranteeing a sufficiently low average utilization of the resources of the observation satellite.


Introduction
Observation satellites are commonly used to watch particular targets for a variety of tasks, such as environmental monitoring and catastrophe surveillance due to their overflight capability [1,2]. However, wireless resources and the transmission power of observation satellites are limited, and thus sufficiently high transmission throughput cannot be achieved. Moreover, the opportunities to transmit the observations to the ground station are quite restricted [3]. That is, as shown in Figure 1, even though the observation satellite observes the target, it cannot immediately transmit the observation. It is only after the observation satellite contacts the ground station that it can transmit the observation to the ground station. As a real example, within a system period of around 100 min, a typical observation satellite may contact a specific ground station in less than 10 min [4]. These limitations cause the increased delay in delivering the observation to the destination. To mitigate this problem, we can exploit the resources of two types of satellites: (1) resources of the relay satellite; and (2) resources (i.e., storage and computational resources) of the observation satellite. When using the resources of the relay satellite, the observation satellites can forward their observations to the relay satellite. Then, the relay satellite can relay them to the destination. However, when lots of observation satellites simultaneously forward the observations to the relay satellite, the delay can increase due to the queuing in the relay satellite. To avoid the excessive simultaneous transmissions, each observation satellite does not forward the observation when lots of neighbor observation satellites are forwarding the observations to the relay satellite. Instead of that, the observation satellite can store its observation (i.e., exploit the storage resources of the observation satellite) until the connection to the ground station becomes available.
Then, the observation satellite can compress the observation to reduce the transmission delay by using its computational resources. However, when using excessive storage and computational resources, the observation satellite cannot store and conduct other tasks (e.g., image processing for environmental monitoring) in a timely manner. To sum up, the resources of two types of satellites should be appropriately utilized. Even though a centralized approach may aid in coordinating the utilization of two types of resources, a significant signaling overhead can arise and a centralized entity can become overburdened owing to a high computation overhead. In this paper, we propose a resource-and neighbor-aware observation transmission scheme (RNA-OTS). In RNA-OTS, to avoid the excessive simultaneous transmissions to the relay satellite (i.e., to avoid the increased queuing delay at the relay satellite), each observation satellite decides whether or not to transmit its observation to the relay satellite (or to the ground station) (i.e., whether or not to use the resources of the relay satellite) by considering the neighbor observation satellites' transmission policies. Additionally, to conduct other tasks (e.g., image processing for environmental monitoring) in a timely manner, each observation satellite decides whether or not to delay the transmission and to compress the observation before the transmission (i.e., whether or not to use its own resources) with a consideration of its current resource utilization. For the distributed optimal decision of each observation satellite by considering the transmission policies of neighbor observation satellites and its resource utilization, the operation of observation satellites is modeled by means of a constrained stochastic game. Then, a best-responsedynamics-based algorithm is devised to find the Nash equilibrium of the game. The Nash equilibrium indicates the status where the best performance (i.e., minimized delay) can be achieved and no players can profit from arbitrarily deviating from the equilibrium [5,6]. The evaluation results demonstrate that RNA-OTS can decrease the delay to deliver the observation to the destination by up to 87% compared to a relay-satellite-based scheme while guaranteeing a sufficiently low average utilization of the resources of the observation satellite. Moreover, it can be found that RNA-OTS can adaptively operate according to its operating environments.
The contributions of this paper can be summarized as follows: (1) by means of a constrained stochastic game model, the transmission policy of each observation satellite can be optimized by considering the other observation satellites' transmission policies in a distributed manner; (2) since the optimal transmission policy can be attained with little overhead, the proposed scheme can readily achieve the best performance in real systems; and (3) we evaluate RNA-OTS in a variety of situations, and the results may be used as design guides for observation satellite systems.
The remainder of this paper is constructed as follows. First, we describe related works in Section 2 and elaborate RNA-OTS in Section 3. The constrained stochastic game model is defined in Section 4. Then, the evaluation results are given in Section 5. The conclusion is presented in Section 6.
Gooley et al. [7] firstly introduced a satellite range scheduling problem to schedule the satellite transmissions with the ground stations within the contact time between them. A mixed-integer programming (MIP) problem is formulated to maximize the total number of satellite transmissions with ground stations and solved by a heuristic method. Meng-Gérard et al. [8] presented a problem for scheduling satellite tasks to maximize the profit by processing the tasks. Marinelli et al. [9] formulated a multiprocessor task scheduling problem to maximize the number of satellite missions with ground stations and developed a Lagrangian-based heuristic method to solve the problem. Spangelo et al. [10] provided a MIP model and algorithm for the communication scheduling problem considering singlesatellite and multi-ground stations to maximize the total amount of transmitted data. Cho et al. [11] proposed a binary integer linear programming formulation for the satellite task scheduling to maximize the overall mission performance metric. Lin et al. [12] formulated a non-convex optimization problem to maximize the energy efficiency of satellite-terrestrial integrated networks and proposed two beam-forming schemes with low complexity to solve the formulated problem. Lin et al. [18] formulated an optimization problem to maximize the secrecy energy efficiency while maintaining the total transmission power below a certain level, and decomposed the original optimization problem into sub-problems to obtain the near-optimal solution.
Note that the above-mentioned previous works focused on the direct transmission between observation satellites and ground stations; however, the opportunities for this direct transmission are limited due to the short connection time between observation satellites and ground stations. To alleviate this problem, we can introduce relay satellites that always have connection to ground stations.
Jia et al. [13] provided a collaborative data download scheme to schedule communications links both between satellites and between satellite and ground stations to maximize the throughput. Zhou et al. [14] presented a dynamic programming framework based on the Markov decision process (MDP) to optimize the transmission scheduling by considering the stochastic data arrival characteristics, the inter-satellite contact model, and the energy dynamics model. Deng et al. [15] proposed a two-phase task scheduling method to dynamically adjust the scheduling according to the frequent task variation in relay-satellite-based systems. Wu et al. [16] introduced a priority-based task scheduling algorithm considering the different requirements of emergency and normal tasks. Chen et al. [17] proposed a data breakpoint transmission method which can divide a single task into multiple subtasks and schedule them in multiple contact time periods to maximize the task completion rate. Table 1 summarizes the comparison of related collaborative approaches. Even though these works focused on the transmission scheduling or task scheduling with the relay satellites (i.e., how to use the resources of the relay satellites), it was not investigated how to utilize the storage and computational resources of the observation satellites.

Ref Objective
Decision Element Method [13] Maximize the throughput Transmission schedule Algorithm [14] Maximize the throughput Transmission schedule MDP [15] Maximize the number of completed tasks & minimize the total power consumption Task schedule Algorithm [16] Maximize the number of completed tasks Task schedule Algorithm [17] Maximize the task completion rate Task schedule Algorithm Figure 2 shows the system model in this paper. There are N observation satellites. These observation satellites can observe the target when the target is in their coverage. After the observation, each observation satellite needs to deliver the observation to the destination at the ground. For this, the observation satellite can directly transmit the observation to the ground station; however, the observation satellite can transmit the observation to the ground station only when it can have a connection with the ground station, and the opportunities for the connection with the ground station are quite limited [4]. Moreover, even if the observation satellite has a connection with the ground station, the transmission power of the observation satellite is limited. Therefore, a large delay to deliver the observation to the destination can be expected. To reduce the delay to deliver the observation to the destination, RNA-OTS periodically decides whether to exploit the resources of relay and observation satellites. Specifically, regarding the usage of the resources of the relay satellite, each observation satellite periodically decides whether to transmit its observation to the relay satellite (or to the ground station) (i.e., whether to use the resources of the relay satellite). When the observation satellite decides to use the resources of the relay satellite, it forwards the observation to the relay satellite at geostationary orbit that always has connection to the ground station.

Resource-and Neighbor-Aware Observation Transmission Scheme in Satellite Networks
Then, the relay satellite can relay them to the destination. However, when lots of observation satellites simultaneously forward the observations to the relay satellite (i.e., when excessively using the resource of the relay satellite), the queuing delay at the relay satellite can increase. Therefore, each observation satellite should consider neighbor observation satellites' actions and decide whether to forward its observation to the relay satellite.
On the other hand, regarding the usage of the resources of the observation satellite, each observation satellite periodically decides whether or not to delay the transmission and to compress the observation before the transmission (i.e., whether to use the storage and computational resources of the observation satellite). However, when excessively using the resources of the observation satellites, the observation satellite cannot conduct its own important tasks (e.g., image processing for environmental monitoring) in a timely manner.
To sum up, it should be appropriately decided whether to use the resources of two types of satellites by considering their operating environments. Especially for the decision on the usage of the resources of the relay satellite, each observation satellite should consider the transmission policies of neighbor observation satellites to avoid the increased queuing delay at the relay satellite. In addition, for the decision on the usage of its own resources, each observation satellite should consider the average utilization of its resources. For the distributed optimal decision of each observation satellite by considering the transmission policies of neighbor observation satellites and their resource utilization, a constrained stochastic game is developed in the next section.

Constrained Stochastic Game
When using appropriately the resources of two types of satellites by considering the operating environments, the delay to deliver the observation to the destination can be minimized and the average utilization of resources of the observation satellite can be kept below certain levels, which are the objectives of the formulated constrained stochastic game model [19]. In our model, observation satellites are the players. Thus, observation satellite i and player i are interchangeable in the formulation. The important notations are summarized in Table 2 (in this paper, the bold notation indicates the space).
where M i and G i denote the observation state space of observation satellite i and the ground station state space of observation satellite i, respectively. Since M i can represent whether observation satellite i observes a target or not and the elapsed time from the observation, M i can be represented as where M i (∈ M i ) = 0 denotes that observation satellite i does not observe any target. In addition, M i (> 0) represents that observation satellite i observes a target and the elapsed time from the observation is M i . M max is the maximum elapsed time in our system model. Meanwhile, G i can be defined as where G i (∈ G i ) denotes whether observation satellite i can connect with the ground station or not. That is, if G i = 0, observation satellite i cannot connect with the ground station. Otherwise, it can connect with the ground station.

Action Space
where A i (∈ A i ) = 0 represents the situation where observation satellite i delays its transmission. In addition, A i = 1 and A i = 2 denote the situation where observation satellite i transmits its observation to the relay satellite and ground station without the compression, respectively. Moreover, A i = 3 and A i = 4 denote the situation where observation satellite i transmits its observation to the relay satellite and ground station after the compression, respectively (any kind of compression method can be exploited). Note that A i = 2 and A i = 4 can be conducted only when observation satellite i has a connection with the ground station (i.e., G i = 1).

Transition Probability
The transition probability P[S i | S i , A i ] means the probability of transiting to the next state S i , given the current state S i and action A i of observation satellite i. The state M i is affected by the chosen action (e.g., when observation satellite i transmits its observation, M i becomes 0), whereas the state G i is not influenced by the chosen action. In addition, these states change independently of each other. Therefore, P[S i | S i , A i ] between the current state S i = [M i , G i ] and the next state S i = [M i , G i ] can be described as When it is assumed that the inter-observation time of observation satellite i follows the exponential distribution and its mean is 1/λ M i , the probability that observation satellite i observes the target during the decision epoch τ can be calculated as λ M i τ [20,21]. Thus, Meanwhile, if observation satellite i does not transmit its observation (i.e., A i = 0), the elapsed time from the observation increases one by one until M i reaches the maximum elapsed time. Therefore, the corresponding transition probabilities can be described as and When observation satellite i transmits its observation (i.e., A i = 0), M i will be always 0. Therefore, P[M i | M i = 0, A i = 0] can be represented as We assume that the duration of the disconnection (connection) between the ground station and observation satellite i follows the exponential distribution with mean 1/λ D i (1/λ C i ). Thus, the probability that observation satellite i connects with the ground station during τ is λ C i τ [20,21]. In addition, the probability that observation satellite i loses the connection with the ground station during τ is λ D i τ [20,21]. Therefore, and

Cost Function
The delay to deliver the observation to the destination is used as the cost function r(S i , A i ). Because the delay is affected by the chosen action, we can consider the following four cases: (1) observation satellite i transmits its observation to the relay satellite without the compression (i.e., A i = 1); (2) observation satellite i transmits its observation to the ground station without the compression (i.e., A i = 2); (3) observation satellite i transmits its observation to the relay satellite after the compression (i.e., A i = 3); and (4) observation satellite i transmits its observation to the ground station after the compression (i.e., A i = 4).
For each case, the delay to deliver the observation to the destination can be calculated. For the first and third cases, because the observation is delivered through the relay satellite, we consider the latency between observation satellite i and the relay satellite and the latency between the relay satellite and the the ground station. In addition, the average queuing delay in the relay satellite should be considered. To sum up, the delay L 1 and L 3 to deliver the observation to the destination for the first and third cases can be obtained as (L W T R,G ) and L C T O,R (L C T R,G ) denote the latency between observation satellite i (relay satellite) and the relay satellite (ground station) without and with the compression, respectively. In addition, E[Q] and L Q are the expected queue length when the observation arrives at the relay satellite and the unit latency to handle one observation in the relay satellite, respectively. Note that the expected queue length E[Q] depends on the transmission policies of observation satellites. On the other hand, for the second and fourth cases, because the observation is directly delivered to the ground station, we only consider the latency between observation satellite i and the ground station. Thus, the delay L 2 and L 4 to deliver the observation to the destination for the second and fourth cases can be calculated as M i + L W T O,G and M i + L C T O,G , respectively, where L W T O,G and L C T O,G represent the latency between observation satellite i and the ground station without and with the compression, respectively. Note that observation satellite i can directly transmit the observation to the ground station only when there is the connection between observation satellite i and the ground station (i.e., G i = 1). To sum up, r(S i , A i ) can be defined as (12).

Constraint Function
To keep the usage of the computational resource of observation satellite i below a specific threshold, the constraint function c C (S i , A i ) is defined to represent whether or not to utilize the computational resource of observation satellite i. When observation satellite i compresses the observation (i.e., A i = 3 and A i = 4), it consumes the computational resource. Therefore, c C (S i , A i ) can be defined as To keep the average utilization of the storage resources of observation satellite i below a certain level, we define the constraint function c S (S i , A i ) to represent whether or not to utilize the storage resources of observation satellite i. To delay the transmission of the observation (i.e., A i = 0), observation satellite i should store the observation in its storage. Thus, c S (S i , A i ) can be defined as

Optimization Formulation
If a stationary multipolicy of all observation satellites π is given, a long-term average delay ζ D to deliver the observation to the destination can be defined as where S t and A t are the global state and the action at time t, respectively. Meanwhile, observation satellite i tries to keep the average utilization of its computational and storage resources below certain levels. The constraints for the average utilization of computational and storage resources, ξ C and ξ S , can be represented as and where θ C and θ S denote the target utilization of the computational and storage resources, respectively. Let π * i and π * −i denote the optimal policies (i.e., Nash equilibrium) of observation satellite i and all observation satellites except observation satellite i, respectively. Consequently, the optimal multipolicy (i.e., constrained Nash equilibrium) can be described by π * = π * i , π * −i . (When a stochastic game is constructed with a finite number of players, states, and actions, there is always Nash equilibrium (i.e., optimal solution) in the game [19].) Note that, for any other stationary policy π i of observation satellite i, while satisfying all constraints. Meanwhile, when the policies of all observation satellites except observation satellite i (i.e., π −i ) are given, the optimal policy π * i of observation satellite i (i.e., best-response policy) should satisfy the inequality ζ D π * i , π −i ≤ ζ D π i , π −i . Let φ i,π −i (S i , A i ) represent the stationary probability that observation satellite i chooses the action A i in local state S i when the stationary policies of other observation satellites π −i are given. Then, we can formulate the equivalent linear programming (LP) model and its solution φ * i,π −i (S i , A i ) can be interpreted as the optimal policy of the formulated game [22][23][24]. The LP model can be formulated by In order to minimize the delay to deliver the observation to the destination from observation satellite i, the objective function in (18) is designed. Meanwhile, the average utilization of the computational and storage resources can be maintained below the desired levels θ C and θ S by the constraints in (19) and (20), respectively. The constraint in (21) is for the Chapman-Kolmogorov equation derived by the chain rule for the probability and Markov property. The basic probability properties can be constrained by (22) and (23).
If any feasible solution of the LP problem exists, the stationary best-response policy of observation satellite i can be obtained as [25].
To obtain the optimal policies (i.e., best-response policies) of observation satellites, we develop a best-response-dynamics-based algorithm as in Algorithm 1. Each observation satellite first initializes its policy (line 1 in Algorithm 1). Then, each observation satellite transmits its current policy to other observation satellites for the interaction (line 4 in Algorithm 1). Based on the policies of other observation satellites, each observation satellite can calculate the expected queue length E[Q] when the observation arrives at the relay satellite (line 5 in Algorithm 1). Specifically, E[Q] can be calculated as . After that, each observation satellite solves the LP problem to obtain the optimal policy π * i (line 6 in Algorithm 1). The algorithm is finished when the policies of all observation satellites converge (line 7 in Algorithm 1).
1: Initialize the policies π i for ∀i. 2: repeat 3: for All observation satellites i do 4: Transmit the policy π i to other observation satellites 5: Calculate the expected queue length E[Q] when the observation is arrived at the relay satellite 6: Solve the LP problem to obtain the optimal policy π * i 7: end for 8: until Stationary policies of all observation satellites converge The complexity of solving the LP problem is relatively low [26]. For example, the complexity of Vaidya's algorithm, which is a representative LP-solving algorithm, is a polynomial time (i.e., O(|S i | · |A i | 3 ) for our problem). In addition, the stationary policies can be obtained within few iterations (see Section 5.1). Therefore, the proposed algorithm can be implemented without significant overhead.

Evaluation Results
For performance evaluation, we compare the proposed scheme, RNA-OTS, with the following three schemes: (1) GROUND, where the observation satellite directly transmits its observation to the ground station with/without compressing the observation; (2) RELAY, where the observation satellite always exploits the relay satellite with/without compressing the observation; and (3) RAND, where the observation satellite randomly chooses the action. The performance metrics are the average delay ζ D to deliver the observation to the destination, the average utilization ξ C of the computational resource, and the average utilization ξ S of the storage resource.
The default parameter settings are as follows: The number of observation satellites is 5. The average inter-observation time 1/λ M is set to [2,5] (s), where [a, b] is a random value between a and b. The average duration of the disconnection (connection) between the ground station and the observation satellite is set to [5,20]. The target utilization, θ C and θ S , for the computational and storage resources is set 0.05 and 0.01, respectively. The latency L W T O,G (L C T O,G ) between the observation satellite and the ground station without the compression (with the compression) is set to 1 (0.5) (s). The latency L W T R,G (L C T R,G ) between the relay satellite and the ground station without the compression (with the compression) is set to 1 (0.5) (s). The latency L W T O,R (L C T O,R ) between the observation satellite and the relay satellite without the compression (with the compression) is set to 0.1 (0.05) (s). The unit latency L Q to handle one observation in the relay satellite is set to 0.01 (s).

Convergence to Nash Equilibrium
The process by which the policies of observation satellites are converged to the Nash equilibrium (i.e., best-response policy) is illustrated in Figure 3. From Figure 3, it can be seen that the proposed algorithm can find the Nash equilibrium within few iterations. (Note that each observation satellite randomly set its initial policy.) This indicates that our algorithm may be implemented in satellite networks without incurring significant signaling overhead. Meanwhile, from Figure 3, it can be seen that each observation satellite decides its action with the consideration of the other observation satellites' actions at the best response to reduce the queuing delay at the relay satellite. Specifically, in this result, after the convergence, the observation satellites 1, 2, and 3 transmit their observations to the relay satellite with relatively high probability. In this situation, if other observation satellites (i.e., satellites 4 and 5) transmit their observations to the relay satellite, the queuing delay at the relay satellite can increase significantly. To realize this fact, the observation satellites 4 and 5 maintain lower transmission probability to the relay satellite. Figure 4 shows the effect of the target utilization of the computational resource θ C . From Figure 4, it can be found that RNA-OTS can achieve the lowest average delay ζ D while maintaining acceptable average utilizations of computational and storage resources, ξ C and ξ S (i.e., maintaining sufficiently low average utilization of computational and storage resources). This is because observation satellites in RNA-OTS efficiently exploit given computational and storage resources and decide their transmission policies by considering other observation satellites' policies. For example, if observation satellites in RNA-OTS are expected to have a connection with the ground station soon, observation satellites use their storage resources to store the observations. Then, after connecting to the ground station, observation satellites directly transmit the observation to the ground station. In doing so, observation satellites in RNA-OTS can effectively utilize their storage resource (i.e., the storage resource can be used only during a short time (when some observation satellites which are expected to have a connection with the ground station soon delay their transmission, the queuing delay at the relay satellite can naturally be maintained at a low level). Additionally, since observation satellites in RNA-OTS have more computational resources, they compress the observation (i.e., A i = 3 or A i = 4) to reduce the delay ζ D . Moreover, a certain observation satellite does not transmit its observation to the relay satellite when many other observation satellites transmit their observations to the relay satellite.

Effect of θ C
Meanwhile, from Figure 4a, it can be found that the average delay of RNA-OTS decreases with the increase in the target utilization θ C of the computational resource. This is because observation satellites in RNA-OTS can aggressively compress the observation by using more computational resources under higher θ C , which can be observed in Figure 4b. Therefore, the observation can have a smaller size, and thus be averagely delivered within a short duration. On the other hand, other comparison schemes do not change their policies regardless of θ C , and therefore their average delay and average utilization of computational resources do not change (see Figure 4a,b). Figure 5a,b show the effect of the target utilization θ S for the storage resource on the average delay ζ D and the average utilization of the storage resource ξ S , respectively. As shown in Figure 5a, the average delay of RNA-OTS decreases with the increase in θ S . This can be explained as follows: Larger θ S means that observation satellites can store their observations for longer time, which implies that observation satellites can have more chance to transmit their observation without going through the relay satellite. This can lead shorter average delay. Meanwhile, since the other comparison schemes operate with the fixed policies, their average delay and average utilization of the storage resource do not change (see Figure 5a,b). This indicates that RNA-OTS can achieve better performance gain (i.e., lower average delay) compared to the other schemes when higher θ S is given. Specifically, with θ C = 0.5, the average delay of RNA-OTS is just 13% of that of RELAY. Figure 6 shows the effect of the average duration 1/λ C of the connection between the ground station and the observation satellite on the average delay ζ D to deliver the observation to the destination. From Figure 6, it can be observed that the average delay of RNA-OTS decreases as 1/λ C increases. This is explained in the following: A large duration 1/λ C of the connection between the ground station and the observation satellite indicates that the observation satellites can transmit the observations to the ground station most of the time without conducting the delay action. In addition, because a large duration 1/λ C indicates that observation satellites can transmit the observations to the ground station most of the time, the load of the relay satellite does not increase significantly (i.e., short queuing delay in the relay satellite), which leads the decreased delay ζ D in delivering the observation to the destination.     Figure 7 shows the effect of the unit latency L Q when handling one observation in the relay satellite on the average delay ζ D . From Figure 7, it can be observed that the average delay of all schemes except GROUND increases as L Q increases. This is because the queuing delay at the relay satellite increases as L Q increases. However, because the observation satellites in RNA-OTS maintain a small length of queue in the relay satellite by considering the other observation satellites' actions, its increasing rate is smallest. Note that, because GROUND does not use the relay satellite, the average delay ζ D is constant regardless of L Q .

Conclusions
This paper proposes a resource-and neighbor-aware observation transmission scheme (RNA-OTS) where each observation satellite periodically decides whether or not to exploit its own resource and the resource of the relay satellite. For the optimal decision of each observation satellite in a distributed manner, the operation of observation satellites is modeled by means of a constrained stochastic game, and a best-response-dynamics-based algorithm is devised to find the Nash equilibrium. The evaluation results demonstrate that RNA-OTS can decrease the delay to deliver the observation to the destination by up to 87% compared to a relay-satellite-based scheme while providing a sufficient low average utilization of computational and storage resources. Moreover, it can be observed that RNA-OTS optimizes its transmission policy by considering its operating environment (e.g., connection duration between the ground station and the observation satellite). In our future work, we will extend the proposed scheme in order to learn the optimal policy without prior information on the operational environment. In addition, we will conduct the route optimization of the observation satellites to further reduce the observation delivery delay.