A Satellite Observation Data Transmission Scheduling Algorithm Oriented to Data Topics



Introduction
In recent years, the spatial information obtained by earth observation satellites has played an essential role in geographic mapping, disaster prevention, resource surveillance, maritime search and rescue, and so on [1]. When a satellite flies into the visibility of a ground station antenna, it can transmit the observation data stored in its onboard memory to the ground via the communication link. This transmission is called a visibility service. When several satellites pass over a ground station simultaneously, or when a satellite enters the overlapping visibility of different stations, visibility conflicts arise.
With the increasing number of earth observation satellites in recent years and the centralized distribution of ground stations (all located within China's territory), the conflicts among the transmission windows of different satellites have become more and more apparent [2][3][4]. Furthermore, with the development of remote sensing applications and the technological progress of satellite onboard storage file systems, a new satellite data transmission requirement, named data transmission oriented to topics, has appeared. This requirement has the following characteristics:
(i) There are different users from different departments with different priorities. Each user (or department) has unique data topic requirements. Each data topic requirement consists of various observation data (e.g., remote sensing images) provided by different satellites. The observation data belonging to one specific topic are strongly correlated with each other. Examples include a stereoscopic photography data topic that needs observation data taken by different satellites from different observation angles, a multisource information fusion data topic that needs observation data obtained by different types of satellite sensors (such as visible light, infrared, and synthetic aperture radar), and a continuous observation data topic that requires observation data of the same ground target every several hours.
(ii) The data observation tasks for a topic cannot be completed until all the observation data belonging to the topic have been transmitted to the ground, and the value of the observation data decays sharply if the data belonging to the topic are incomplete. The topics required by users with high-level priorities should be prioritized.
(iii) Some observation data topics on disaster damage assessment or emergency rescue have high timeliness requirements. All the observation data belonging to such a data topic should be transmitted to the ground before the expected time or deadline. If the timeliness requirement is not satisfied, or only incomplete observation data are transmitted before the deadline, the data transmission obtains only a partial reward (or even no reward).
How to guarantee that the observation data topics required by high-priority users are transmitted to the ground station as completely and as promptly as possible has become an urgent new problem in aerospace scheduling.
Conventional satellite data transmission scheduling is one of the satellite range scheduling (SRS) problems. This highly constrained problem has been proven to be NP-hard [5,6]. Because of this complexity, many heuristic solution strategies have been proposed in the literature [7]. According to the deconflicting strategy used to resolve the visibility conflicts, these studies can be divided into the following three categories.
The research work belonging to the first category employs mathematical scheduling methods with heuristic rules to solve the satellite range scheduling problem. Lee et al. [8] proposed a heuristic algorithm named "first finished first serviced" to handle COMS scheduling of Korea. Xhafa et al. [9] adopted a heuristic hill climbing algorithm based on the characteristics of a satellite data downlink process. Vazquez et al. [10,11] employed heuristic deconflicting operations when conflicts were found (first, moving passes to a different antenna at the same site or at another site; second, shortening the passes' time slot on the antenna) and then formulated a global integer linear programming model that simultaneously contained all these deconflicting operations. Karapetyan et al. [12] studied the downlink scheduling problem for Canada's earth observing SAR satellite RADARSAT-2 and proposed a flexible solution approach separating the details of the problem-specific constraints from the optimization mechanism. Liang et al. [13] proposed a greedy strategy-based heuristic data download scheduling algorithm which considered trade-offs among throughput, QoS, and fairness. Ling et al. [14] calculated the priority value of each data download task and then adopted a resource competition algorithm, which can be considered a ranking problem, to select a data download task with high priority. Xiao et al. [15] modeled the multisatellite observation and data-downlink scheduling problem as a two-stage flow shop scheme. They employed mixed-integer linear programming to optimize the observation scheduling (at stage 1) and the data-downlink scheduling (at stage 2) concurrently. Liu et al. [16] decomposed the satellite range scheduling problem into a multistage decision process. An approach based on dynamic programming with a multilevel route reduction strategy was leveraged to optimize the result.
The second category employs nature-inspired metaheuristic algorithms [17,18] to cope with the computational complexity of the satellite data transmission scheduling problem. Feng and Lining [19] proposed a learnable ant colony optimization approach, which combines an intelligent optimization model with a knowledge model. Xhafa et al. [20,21] proposed two different algorithms, a simulated annealing algorithm and a genetic algorithm, to solve the ground station scheduling problem, considering a combined objective function that includes access window fitness, communication clashes fitness, communication time requirement fitness, and ground station usage fitness. Li et al. [22] focused on reducing data transmission time for rapid-response earth-observing operations. A data transmission topological model was constructed in which satellites and ground stations were mapped as vertexes, while the visibility services between them were mapped as edges. Then, a top-k shortest path-searching algorithm based on the genetic algorithm was proposed. Tsatsoulis and van Dyne [23] introduced some artificial intelligence techniques, including case-based reasoning, rule-based systems, and generate-and-test techniques, into the scheduling problem of the AFSCN (Air Force Satellite Control Network) [6]. Chen et al. [2,24,25] employed an improved genetic algorithm and a particle swarm optimization algorithm to handle the satellite data transmission scheduling problem with some specific requirements: incremental observation tasks, member satellites of the same cluster, and real-time and playback data transmission modes. Addressing the satellite image downlink scheduling problem, Yao et al. [26] and Song et al. [27] proposed a heuristic genetic algorithm.
The third category formulates the satellite data transmission scheduling problem as a multiobjective optimization problem and adopts improved MOEAs (multiobjective evolutionary algorithms) to generate a Pareto front as the solution to the problem [3,28,29]. Although satellite data transmission scheduling is essentially a multiobjective optimization problem and MOEAs can obtain good results, the computational cost of MOEAs is much heavier than that of the algorithms belonging to the former two categories. This cost cannot be afforded in practical engineering at present.
The aforementioned research work treats all the observation data obtained by different satellites as independent and neglects the correlations between observation data. The observation data belonging to a specific topic usually are strongly correlated with each other. The value of a topic's observation data may decay sharply even if only a small part of the observation data belonging to the topic cannot be transmitted to the ground in time. Therefore, the current models and algorithms can hardly meet the requirements of the satellite observation data topic transmission scheduling (SDTTS) problem.
International Journal of Aerospace Engineering
The contributions of this paper are as follows:
(i) To the best of our knowledge, this is the first work to formulate the SDTTS problem. We propose the timeliness reward function and the completeness reward function as criteria to estimate the performance of scheduling results. A constraint satisfaction problem model is then proposed.
(ii) A novel algorithm combining a particle swarm optimization algorithm and a genetic algorithm is designed. After fully considering the characteristics of the problem, a novel heuristic rule-based mutation operation is employed to improve the performance of our algorithm further.
(iii) We provide a quantitative analysis showing the effectiveness and efficiency of the proposed model.
The rest of the paper is organized as follows. Firstly, we introduce some preliminaries on satellite data transmission. Secondly, we describe the problem and formulate it. Thirdly, we provide a novel hybrid scheduling algorithm based on evolutionary computation. Fourthly, we design a domain-knowledge-based mutation operator to enhance the performance of our algorithm further. Fifthly, experiment results are provided to validate the effectiveness of our algorithm. Finally, a conclusion is drawn based on the whole research work.

Preliminary on Satellite Data Transmission
2.1. Visibility Conflict. When a satellite is within the line-of-sight of an antenna of a ground station, the satellite and the ground station can communicate with each other; this is called a visibility service. During the visibility service, the observation data can be transmitted. The visibility service start time is called acquisition of signal (AOS), and the end time is called loss of signal (LOS). A visibility conflict appears when two or more satellites pass over a ground antenna and the duration between the former LOS and the latter AOS cannot satisfy the reconfiguration time requirement. In Figure 1, there is a visibility conflict among SAT-1, SAT-2, and SAT-3 in the visibility service provided by GS-1.
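The conflict condition above can be illustrated with a minimal sketch. The `Visibility` class, the function names, and the numeric pass times are hypothetical and chosen only for illustration; they are not taken from the paper or from Figure 1.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Visibility:
    """One pass of a satellite over a single ground antenna (hypothetical type)."""
    satellite: str
    aos: float  # acquisition of signal, in seconds
    los: float  # loss of signal, in seconds


def in_conflict(a: Visibility, b: Visibility, reconfig: float) -> bool:
    """True if the gap between the former LOS and the latter AOS is shorter
    than the antenna reconfiguration time (overlapping passes give a negative gap)."""
    first, second = sorted((a, b), key=lambda v: v.aos)
    return second.aos - first.los < reconfig


def conflicts(passes, reconfig):
    """List every conflicting pair of passes on the same antenna."""
    return [(a.satellite, b.satellite)
            for a, b in combinations(passes, 2)
            if in_conflict(a, b, reconfig)]


# Illustrative pass times only.
passes = [
    Visibility("SAT-1", 0, 600),
    Visibility("SAT-2", 500, 1100),
    Visibility("SAT-3", 1150, 1700),
]
print(conflicts(passes, reconfig=120))
```

With these made-up values, SAT-1 and SAT-2 overlap directly, while SAT-2 and SAT-3 conflict only because the 50 s gap is shorter than the assumed 120 s reconfiguration time.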

Onboard Storage Data Transmission Model
Nowadays, a new onboard file management system [30,31] has been developed, and the observation data is stored as files on onboard storage. With this file management system, the files can be randomly accessed and managed in the downlink data transmission process. Even if the data files in onboard storage cannot all be transmitted to the ground station in one visibility service, the files containing the most important part of the data (represented by dots and cruciform patterns in Figure 2) can be selected and transmitted to the ground first. The remaining data files that are not downlinked to ground station 1 in the current visibility service can be transmitted later, in the next candidate visibility service.
Furthermore, even in one visibility service, the observation data that is stored on the onboard storage of different satellites but belongs to the same topic can be easily accessed, organized, and transmitted to the ground. As illustrated in Figure 3, there are 5 different data topics, which are represented by different patterns (e.g., dots or cruciform). Three satellites hold observation data belonging to the 5 topics. Firstly, ground station 1 provides a data downlink service to satellite 1. Then, at time $t_1$, ground station 1 reconfigures its antenna and downloads data from satellite 2. Lastly, the antenna of station 1 is reconfigured again at time $t_2$, and the observation data of satellite 3 is received. Within the visibility service, each satellite can select a part of the observation data belonging to different topics and download it to ground station 1. Therefore, this process is very different from the conventional satellite data transmission scheduling problem and can greatly enhance the efficiency with which visibility services are used in support of particular data topics.

Problem Formulation
Based on the analysis above, the essence of the SDTTS problem can be described as deciding where and when to transmit which part(s) of the data in the onboard storage. The problem is a typical multidimensional combinatorial optimization problem. In this section, a constraint satisfaction problem model is established.

Symbols and Descriptions
(1) $S_{set} = \{S_1, S_2, \cdots\}$ is the satellite set, and $\phi_j$ is the minimal data transmission duration for satellite $j$. If satellite $j$ starts to transmit data to a ground station, the duration of the transmission must be no shorter than $\phi_j$.
(2) $G_{set} = \{G_1, G_2, \cdots\}$ is the ground station set, and $\varphi_k$ is the minimal reconfiguration time for ground station $k$.
(3) $OBSet = \{ob_1, ob_2, \cdots\}$ is the set of observation activities performed by the satellites. A satellite performs an observation activity on a target and transmits the resulting observation data (image) to the ground. The data obtained in one observation activity is called an observation data unit and is defined as $ODU_n = \langle t_g, ob_n \rangle$, in which $t_g$ represents the generation time of the observation data and $ob_n$ is the observation activity that produces the observation data.
(4) One or more observation data units make up a data file. A data file is the minimal unit to be stored in satellite onboard solid memory and to be transmitted to the ground. A data file is represented as $DF_{mj} = \{od_1, od_2, \cdots\}$, where $m$ and $j$ indicate the $m$th data file stored in the onboard solid memory of satellite $j$. The number of observation data units within a data file depends on the characteristics of the satellite constraints and has been determined before satellite data transmission scheduling.
(5) For data topic $DS_i$, $\theta_i$ and $\vartheta_i$ represent the expected time and the deadline, respectively, for the data unit set of the data topic to be transmitted to the ground station, as shown in Figure 4. $\Omega_i$ is the observation data unit set that belongs to data topic $i$; $ODU_n \in \Omega_i$ means that the observation data unit $ODU_n$ is a part of the data topic $DS_i$. If $\|\Omega_i\| = 1$, only one observation is needed to complete the data topic. A data topic usually consists of more than one observation data unit obtained by different satellites, while each observation data unit usually belongs to one or more data topics.
(6) $VSset = \{VS_1, VS_2, \cdots\}$ is the visibility service set, and each visibility service is formulated as $VS_l = \langle l, AOS^l_{jk}, LOS^l_{jk} \mid j \in S_{set}, k \in G_{set} \rangle$, in which $AOS^l_{jk}$ and $LOS^l_{jk}$ are the start time and end time of the visibility service, respectively.
(7) $VSS_j$ is the subset of $VSset$ that provides visibility services for satellite $j$. Similarly, $VSG_k$ is the subset of $VSset$ that is provided by ground station $k$, in which $j \in S_{set}$ and $k \in G_{set}$.
The decision variable is defined as $\pi^l_{mj}$. $\pi^l_{mj} = 1$ represents that the data file $DF_{mj}$ is transmitted to the ground station during the visibility service $VS_l$, and $ts^l_{mj}$, $te^l_{mj}$ represent the start time and the end time of the transmission; otherwise, $\pi^l_{mj} = 0$.

Assumptions Made
Some assumptions are made to simplify the satellite data transmission scheduling problem for the purposes of this paper:
(i) Neither satellites nor ground stations have priorities. Apart from the consideration of satellite data, when two or more satellites pass over a ground station simultaneously, these satellites are equivalent; likewise, when a satellite passes over different ground stations one by one, there is no preference among these ground stations.
(ii) A ground station has one and only one antenna. If a ground station has more than one antenna, we consider it as $N$ ground stations located in the same position, each of which has only one antenna, where $N$ is the number of antennas.
(iii) A ground station can serve only one satellite at a time. When a transmission starts, a bidirectional communication connection is built between the ground station and the satellite. This connection is exclusive until the transmission is completed.
(iv) There is no extra reward for repeated transmission of the same data and no extra reward for oversatisfied transmission.
(v) A data file is the minimal unit to be transmitted to the ground, and there is no reward for a partial transmission. A data file includes not only the satellite observation data but also other information on calibration and redundancy. If a data file is only partially transmitted, it can hardly be recovered.

Constraints
(i) Any data file stored on a satellite can be transmitted to an available ground station at most once, namely,
$$\sum_{l} \pi^l_{mj} \le 1, \quad \forall m, j \tag{1}$$
(ii) A ground station can serve only one satellite at a time; that is, if two data files $DF_{m_1 j_1}$ and $DF_{m_2 j_2}$ are transmitted to the same ground station sequentially, their transmissions must satisfy the temporal constraints.
(iii) Any data file must be transmitted within its validity.
(iv) For any transmission, the total duration must be no less than the minimal data transmission duration.
3.4. Objective Function. The scheduling objective is to maximize the reward of data transmission activities. The observation data units belonging to high-priority data topics should be transmitted to the ground station as completely and as fast as possible. Therefore, the objective function of our problem can be divided into a timeliness reward function and a completeness reward function.
3.4.1. Timeliness Reward Function. The timeliness reward of each data topic is not constant; it decays as time goes on. The timeliness reward can be expressed as a continuous piecewise function.
(i) If a data topic is transmitted to the ground station before the expected time $\theta_i$, it obtains the full reward, which is simply set to 1.
(ii) If a data topic is transmitted to the ground station between the expected time $\theta_i$ and the deadline $\vartheta_i$, the reward decays; that is, it obtains less reward as time goes on. This can be expressed by a monotonically decreasing continuous function, such as an exponential function, a logarithmic function, or even a linear function. In our model, the exponential function is adopted.
(iii) If the transmission is completed beyond the deadline, it still gains a small reward. For simplification, this is defined as a small constant greater than 0.
The timeliness reward function is then defined piecewise over these three cases, in which the parameter $t_i$ is the download time of the last observation data unit belonging to data topic $DS_i$ and $\rho = -\ln \delta / (\vartheta_i - \theta_i)$. Figure 4 is an example of the timeliness reward function.
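The three cases above can be sketched in a few lines. This is one form consistent with the text, under two assumptions that are not stated explicitly in it: the small beyond-deadline constant is taken to be the same $\delta$ that appears in $\rho$, and the decay between $\theta_i$ and $\vartheta_i$ is $e^{-\rho (t_i - \theta_i)}$, which makes the function continuous at both breakpoints. The function and argument names are hypothetical.

```python
import math


def timeliness_reward(t, expected, deadline, delta=0.05):
    """Piecewise timeliness reward of a data topic (illustrative sketch).

    t        -- download time of the last observation data unit of the topic
    expected -- expected time theta_i
    deadline -- deadline vartheta_i (must be later than `expected`)
    delta    -- assumed small constant reward for beyond-deadline transmission
    """
    if t <= expected:
        return 1.0  # case (i): full reward
    if t <= deadline:
        # case (ii): exponential decay with rho = -ln(delta) / (deadline - expected)
        rho = -math.log(delta) / (deadline - expected)
        return math.exp(-rho * (t - expected))
    return delta  # case (iii): small constant reward


# Hypothetical topic with expected time 10 h and deadline 20 h.
for t in (5, 10, 15, 20, 25):
    print(t, round(timeliness_reward(t, expected=10, deadline=20), 4))
```

With this choice the reward equals 1 at the expected time and exactly $\delta$ at the deadline, so no jump occurs where the pieces meet.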

Completeness Reward Function.
Each observation data unit that belongs to a topic contributes to the topic when it is transmitted to the ground. In conventional satellite data transmission scheduling models, the reward of an observation data unit is usually related only to the importance of the target to be observed. In our model, by contrast, the reward of an observation data unit depends not only on the data itself but also on the completeness of the topic to which it belongs. The completeness reward function depends on the correlation among the observation data units belonging to the same topic, which can be divided into three types, namely, no relation, strong relation, and weak relation.
(i) No relation: this is the same as in a conventional satellite data transmission scheduling problem. The completeness reward of a topic increases linearly with the proportion of transmitted observation data units belonging to the data topic. In this case, the completeness reward function can be expressed as a linear function.
(ii) Strong relation: no observation data can obtain a reward until all the observation data units belonging to the data topic have been transmitted to the ground station. In this case, the observation data in different onboard data files are strongly related to each other, and the data product for the end user is useless unless the data topic is complete (the data topic requirements are fully satisfied). If the requirements are not satisfied, the actual reward of the data topic remains 0, no matter what proportion of the data topic has been transmitted to the ground. Intuitively, this can be expressed as a step function.
(iii) Weak relation: this is a kind of correlation between no relation and strong relation, in which more transmitted observation data units obtain more reward and vice versa; the reward is no longer a linear function but, in general, a typical convex function.
The completeness reward function is then formulated as a polynomial function of $p_i$, in which $q$ and $\xi_q$ are the order and coefficient of each polynomial term, with $\sum_q \xi_q = 1$, and $p_i$ is the downloaded data proportion of data topic $DS_i$, which can be formulated as $p_i = \|\Omega_i^{done}\| / \|\Omega_i\|$, in which $\Omega_i^{done} \subseteq \Omega_i$ is the set of downloaded observation data units belonging to topic $DS_i$. Figure 5 illustrates an example of the completeness reward function.
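The three relation types can be illustrated with a short sketch. It assumes the polynomial has the form $\sum_q \xi_q p_i^q$ and treats the strong relation as a separate step function; the coefficient choices, function names, and this split are illustrative assumptions, not the paper's exact formulation.

```python
def completeness_reward(p, coefficients):
    """Polynomial completeness reward sum_q xi_q * p**q (assumed form).

    p            -- downloaded proportion of the topic, in [0, 1]
    coefficients -- mapping from order q to coefficient xi_q, summing to 1
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if abs(sum(coefficients.values()) - 1.0) > 1e-9:
        raise ValueError("coefficients must sum to 1")
    return sum(xi * p ** q for q, xi in coefficients.items())


def strong_relation_reward(p):
    """Step function: no reward until the topic is complete."""
    return 1.0 if p >= 1.0 else 0.0


NO_RELATION = {1: 1.0}            # linear in p
WEAK_RELATION = {1: 0.2, 3: 0.8}  # hypothetical convex mix of orders 1 and 3

for p in (0.0, 0.5, 0.9, 1.0):
    print(p,
          completeness_reward(p, NO_RELATION),
          round(completeness_reward(p, WEAK_RELATION), 4),
          strong_relation_reward(p))
```

Because the coefficients sum to 1, every polynomial variant reaches a reward of 1 when the topic is fully downloaded, which keeps the three types comparable.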
Therefore, the actual reward of each data topic $DS_i$ obtained from the transmitted data is defined from its timeliness reward and its completeness reward, and the objective function of the proposed model maximizes the total actual reward over all data topics.

Model Solution
Since the satellite data transmission scheduling problem has been proven to be NP-hard, exact search algorithms such as exhaustive search and branch-and-bound are unable to obtain satisfactory results in a short time. Intelligent optimization algorithms, one kind of stochastic search algorithm, have been widely adopted for many NP-hard problems [27], such as 0-1 knapsack problems, workshop scheduling problems, and traveling salesman problems. The particle swarm optimization algorithm (PSO) and the genetic algorithm (GA) are two important intelligent optimization algorithms. PSO [4] is easy to implement and has few parameters to adjust, but it is easily trapped in local optima. The main reason for the premature convergence of PSO is that the diversity of the population gradually decays. GA [32] can maintain the population diversity through genetic operators, including crossover and mutation, which simulate the evolution process in a natural environment. This paper proposes a hybrid algorithm combining PSO and GA for satellite observation data topic transmission scheduling (HPG4TT). It takes advantage of both PSO and GA, not only to avoid prematurely falling into a local optimal solution but also to achieve fast convergence.
4.1. The Process of HPG4TT. PSO [4] is a population-based stochastic evolutionary algorithm. Each particle has its own position and current velocity and moves in a multidimensional search space according to its own experience and the experience of the neighboring particles. At the beginning of the search process, the particles of PSO are initialized randomly, and the optimal solution is then sought by iteration. During the movement in the search space, the position and velocity of each particle are updated by learning from two optima. The first one is the historical best position of the particle itself, called the individual extreme $p^i_{best}$. The second one is the best position found so far among the entire population of particles, called the global extreme $g_{best}$. The particle status updating formula is as follows:
$$v \leftarrow w v + c_1 r_1 (p^i_{best} - x) + c_2 r_2 (g_{best} - x), \qquad x \leftarrow x + v \tag{8}$$
where $v$, $x$ represent the velocity and position of each particle, $w$ is the inertia weight, $c_1$, $c_2$ are the learning coefficients for the individual and global extremes, and $r_1$, $r_2$ are random floating-point numbers in the range of 0 to 1.
GA [17,18,32] is a widely used stochastic search algorithm based on Darwinian evolution and Mendelian genetic theories, proposed by the American scholar Holland. GA initializes a population in which each individual represents a solution to the problem and then improves it through repeated application of the mutation, crossover, and selection operators.
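As a minimal sketch of the PSO status update described above, the following assumes the textbook velocity-position form on a continuous search space. The function name `pso_step` and the default values of `w`, `c1`, and `c2` are hypothetical, and this is plain PSO rather than the hybrid HPG4TT update.

```python
import random


def pso_step(x, v, p_best, g_best, w=0.7, c1=1.5, c2=1.5, rng=random):
    """One velocity and position update for a single particle.

    x, v   -- current position and velocity (lists of floats)
    p_best -- best position found so far by this particle
    g_best -- best position found so far by the whole swarm
    """
    r1, r2 = rng.random(), rng.random()  # random numbers in [0, 1)
    new_v = [w * vi + c1 * r1 * (pb - xi) + c2 * r2 * (gb - xi)
             for xi, vi, pb, gb in zip(x, v, p_best, g_best)]
    new_x = [xi + vi for xi, vi in zip(x, new_v)]
    return new_x, new_v


# A particle is pulled toward its individual and global extremes.
x, v = pso_step([0.0, 0.0], [0.1, -0.1],
                p_best=[1.0, 1.0], g_best=[2.0, 0.5],
                rng=random.Random(0))
print(x, v)
```

A particle that already sits on both extremes with zero velocity stays where it is, which is one way to see why swarm diversity, and with it exploration, decays over time.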
From the PSO particle status updating formula, we can see that the particles learn from the local optimal particle and the global optimal particle. In the combination with GA, the crossover operator of GA is employed to implement this learning process, while the mutation operator of GA is adopted to maintain the diversity of the population. Therefore, the HPG4TT update formula is expressed with these two operators, where $\Gamma^i_j$ is the encoding of particle $j$, the functions $Crossover(\cdot)$ and $Mutation(\cdot)$ represent the processes of crossover and mutation, and the other parameters are the same as in Equation (8). The main steps of the HPG4TT algorithm are illustrated in Figure 6.
Figure 5: Blue illustrates the data topic of no relation type, which is a linear function; this is the case of a conventional satellite data transmission scheduling problem. Green illustrates the data topic of weak relation type, which is a typical convex function. Red illustrates the data topic of strong relation type, which is a step function. All three kinds of data topic requirements are considered in our problem.

Encoding.
Binary encoding is a commonly used encoding method in PSO and GA, which uses 0 and 1 to encode a problem solution. Since the decision variable $\pi^l_{mj} \in \{0, 1\}$, binary encoding can be applied to the proposed model directly, and individuals in binary encoding are easy to operate on in crossover and mutation operations. However, binary encoding has some obvious weaknesses. If the binary encoding method is applied to the above model, the encoding length of each particle is $L_1 = M \times J \times L$, and the solution space is $\Omega_1 = 2^{M \times J \times L}$, in which $M$ is the total number of data files, $J$ is the number of satellites, and $L$ is the total number of visibility services. For instance, with $M = 10$, $J = 6$, $L = 10$, the solution space $\Omega_1 \approx 4 \times 10^{180}$ becomes tremendously large, and a lot of the solutions are illegal. The proposed algorithm would need to make great efforts to deal with the illegal solutions, which may slow down the convergence rate of the algorithm.
Another encoding method is natural number encoding. For natural number encoding, the encoding length of each particle is $L_2 = M \times J$, and the solution space is $\Omega_2 = (M \times J)^{L+1}$. For the same problem scale (e.g., $M = 10$, $J = 6$, $L = 10$), $\Omega_2 = 60^{10+1} \approx 3.6 \times 10^{19} \ll \Omega_1$, so the solution space of natural number encoding decreases greatly. However, it is not obvious that natural number encoding is complete for the proposed problem. Binary encoding can express every solution of the proposed problem and is therefore a complete encoding method. If we can prove that, for each valid solution of the proposed problem, the binary code and the natural number code of the solution can be transformed into each other, we can conclude that natural number encoding is also complete.
Figure 6: The main steps of the HPG4TT algorithm.
Theorem 1. For each valid solution of the proposed problem, the binary code and the natural number code of the solution can be transformed into each other.
Proof. Let the binary encoding be $\Pi = \{\pi^1_{11}, \cdots, \pi^l_{11}, \cdots, \pi^1_{mj}, \cdots, \pi^l_{mj}\}$ and the natural number encoding be $Y = \{\varrho_{11}, \cdots, \varrho_{mj}\}$. Take any valid binary encoding $\Pi_q$. From inequality (1), we can see that for any data file $DF_{mj}$, $\sum_l \pi^l_{mj} \le 1$. If $\sum_l \pi^l_{mj} = 0$, no visibility service supports the data transmission of $DF_{mj}$, so let $\varrho_{mj} = 0$. If $\sum_l \pi^l_{mj} = 1$, there is exactly one visibility service $x$ that supports the data file $DF_{mj}$, so let $\varrho_{mj} = x$. Thus, a unique natural number encoding $Y_q$ can be generated from a valid binary encoding $\Pi_q$.
Conversely, take any valid natural number encoding $Y_q$; for any data file $DF_{mj}$, $\varrho_{mj} \ge 0$. If $\varrho_{mj} = 0$, let $\pi^l_{mj} = 0$ for all $l$. If $\varrho_{mj} = x > 0$, the data file $DF_{mj}$ is transmitted to the ground station during the visibility service $x$ and, according to inequality (1), will not be transmitted in any other visibility service, so let $\pi^l_{mj} = 1$ if $l = x$ and $\pi^l_{mj} = 0$ otherwise. Thus, a unique binary encoding $\Pi_q$ can be generated from a valid natural number encoding $Y_q$. In summary, Theorem 1 is proved.
From Theorem 1, we can see that natural number encoding is complete for the proposed problem, and its solution space is much smaller than that of binary encoding. Therefore, we employ natural number encoding as the basic encoding method for the proposed algorithm.
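The constructive mapping used in the proof, together with the two solution-space sizes quoted in the text, can be checked with a small sketch. The dictionary layout (keys `(m, j)`, flag lists indexed by visibility service 1..L) and the function names are illustrative assumptions; the size formulas are evaluated exactly as stated in the text.

```python
def binary_to_natural(pi):
    """Map pi[(m, j)] = [0/1 flag per visibility service 1..L] to natural codes."""
    natural = {}
    for key, flags in pi.items():
        if sum(flags) > 1:
            raise ValueError(f"data file {key} violates inequality (1)")
        # 0 means "not transmitted"; otherwise the 1-based service index.
        natural[key] = flags.index(1) + 1 if 1 in flags else 0
    return natural


def natural_to_binary(natural, num_services):
    """Inverse mapping: set exactly one flag for a positive natural code."""
    return {key: [1 if l == x else 0 for l in range(1, num_services + 1)]
            for key, x in natural.items()}


# Round trip on a toy solution with two data files and three services.
pi = {(1, 1): [0, 1, 0], (2, 1): [0, 0, 0]}
natural = binary_to_natural(pi)
print(natural)
print(natural_to_binary(natural, num_services=3) == pi)

# Solution-space sizes for M = 10, J = 6, L = 10, using the text's formulas.
M, J, L = 10, 6, 10
binary_space = 2 ** (M * J * L)
natural_space = (M * J) ** (L + 1)
print(f"{binary_space:.2e}", f"{natural_space:.2e}")
```

The round trip works only for binary codes that satisfy inequality (1), which is why the sketch rejects any data file with more than one flag set.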
4.3. Crossover. In the particle status updating formula, the learning process is guided by the local and global optima. The particles take advantage of these two extremes to update their own status, which is similar to the concept of crossover in GA. The particles therefore update their status by a probabilistic crossover operation, learning from the local and global optima separately.
The crossover operator is applied probabilistically: a random number $\beta \in (0, 1)$ is compared with the crossover rate $\omega_c$ to decide whether the operation $Cross(T, S)$ is carried out. The operation $Cross(T, S)$ proceeds as follows: (1) generate two random numbers $a, b \in [0, N - 1]$ and (2) replace the fragment of data between $a$ and $b$ on $T$ with the corresponding fragment on $S$, as shown in Algorithm 1.
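The two steps of $Cross(T, S)$ can be sketched as follows. The sketch assumes that the fragment between $a$ and $b$ includes both endpoints and that crossover is applied when the random number falls below the crossover rate; both are assumptions, since Algorithm 1 is not reproduced here, and the function names are hypothetical.

```python
import random


def cross(t, s, rng=random):
    """Replace the fragment of t between two random indexes with that of s."""
    n = len(t)
    a, b = sorted((rng.randrange(n), rng.randrange(n)))  # a, b in [0, N - 1]
    return t[:a] + s[a:b + 1] + t[b + 1:]


def crossover(t, s, rate, rng=random):
    """Apply cross(t, s) with probability `rate`; otherwise return a copy of t."""
    return cross(t, s, rng) if rng.random() < rate else list(t)


# Natural number codes: each gene is a visibility service index (0 = none).
particle = [0, 3, 1, 0, 2, 5]
g_best = [4, 3, 2, 1, 2, 0]
print(crossover(particle, g_best, rate=0.8, rng=random.Random(7)))
```

Since each gene is the service index of one data file, copying a fragment from the guiding particle never makes a file use two services, so inequality (1) is preserved.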

Mutation.
In order to keep the diversity of the particle population, a mutation operation is designed. The diversity of the population helps to prevent the algorithm from falling into the local optimal solution. According to the characteristics of the proposed problem, in order to search and exploit the most promising region in the solution space, we design a mutation operation based on two heuristic rules described below.

Stochastic Greedy Heuristic Rule.
In this operation, we adopt a heuristic rule: earlier visibility services and visibility services with fewer conflicts are more likely to be used. For each ground station, if an earlier visibility service is assigned, more visibility services may remain available afterwards. If visibility services with fewer conflicts are used, more visibility services can be assigned. Therefore, more data can be downloaded to the ground stations. These are two greedy heuristic rules. In order to alleviate the premature convergence brought about by greedy rules, we also introduce a probability-based stochastic strategy. Suppose the particle position $\gamma_i$ is selected, which represents the data file $DF_i$ stored on satellite $S_j$.
Let $L = \{l \mid VS_l \in VSS_j\}$ indicate the indexes of the potential visibility services.
The temporal selection probability is described with the function $f_{rf}$, which is the same as the timeliness reward function defined in Section 3.4.1.
Let $VSC_l = \{VS_k \mid VS_k \text{ conflicts with } VS_l\}$ denote all visibility services that conflict with $VS_l$.
The conflict probability is described in terms of $\|VSC_l\|$, the number of visibility services that conflict with $VS_l$. Then, for the visibility service $VS_l$, its selection probability is obtained from the temporal selection probability and the conflict probability. There is still an important but easily neglected mutation value, 0, which represents nontransmission of the data file $DF_i$. The coefficient $\eta$, called the Do Not Transmit Probability (DNTP) coefficient, is introduced to indicate the selection probability of 0. The pseudocode of the stochastic greedy heuristic operator on a particle $\gamma_i$ is shown in Algorithm 2.
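A rough sketch of a stochastic greedy selection in the spirit of this rule is given below. The exact probability formulas are not reproduced here, so the sketch makes its own assumptions: each candidate service is weighted by a temporal score divided by one plus its number of conflicts, the weights are normalised to share a total probability of $1 - \eta$, and the value 0 receives probability $\eta$. All names and numbers are hypothetical.

```python
import random


def selection_probabilities(temporal, n_conflicts, eta):
    """Selection probability of every candidate value, including 0.

    temporal    -- {l: timeliness-style score of visibility service l}, l >= 1
    n_conflicts -- {l: number of services conflicting with service l}
    eta         -- assumed probability of choosing 0 (do not transmit)
    """
    weights = {l: temporal[l] / (1 + n_conflicts[l]) for l in temporal}
    total = sum(weights.values())
    if total == 0:
        return {0: 1.0}  # no usable service: the file is not transmitted
    probs = {0: eta}
    for l, weight in weights.items():
        probs[l] = (1 - eta) * weight / total
    return probs


def sg_mut(temporal, n_conflicts, eta, rng=random):
    """Draw a new gene value (service index or 0) from those probabilities."""
    probs = selection_probabilities(temporal, n_conflicts, eta)
    values = list(probs)
    return rng.choices(values, weights=[probs[v] for v in values], k=1)[0]


# Service 1 is early and conflict-free; service 2 is later with one conflict.
temporal = {1: 1.0, 2: 0.5}
n_conflicts = {1: 0, 2: 1}
print(selection_probabilities(temporal, n_conflicts, eta=0.1))
print(sg_mut(temporal, n_conflicts, eta=0.1, rng=random.Random(42)))
```

Drawing from a distribution instead of always taking the best-scored service keeps the greedy bias toward early, low-conflict services while leaving room for exploration.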

The Matthew Effect Heuristic
Rule. Since we have made the condition that all the observation data belonging to the same data topic has to be transmitted to ground before the deadline or expected time, then if most of the correlated observation data units are selected, the selection probability of the remainder should be increased; otherwise, the selection probability will be decreased. Just like the Matthew effect International Journal of Aerospace Engineering which in sociology is the phenomenon where "the rich get richer and the poor get poorer." Supposing the particle position γ a is selected, which represents the data file DF a ∈ DS a . Transmission proportion p a can be calculated by its definition. There are three cases for this: (i) p a is greater than the upper bound μ 1 . In order to obtain more reward, we consider all the unselected data files which belong to DS a . The particle positions which represent these unselected data files will change to be assigned with a larger probability (1 − η). Those selected will have a smaller probability (η).
(ii) $p_a$ is less than the lower bound $\mu_2$. This is the opposite of the former case. In order to leave the visibility services fully available to other data topics, the particle positions will change to a very small value greater than 0 with the large probability $(1 - \eta)$.
(iii) $p_a$ is between $\mu_2$ and $\mu_1$. In this case, only the particle position $\gamma_a$ is handled by the mutation operation.
The pseudocode of the mutation, which combines the stochastic greedy heuristic and the Matthew effect heuristic on $\Gamma$, is shown in Algorithm 3, where $\omega_m$ is the mutation rate.

Simulation
There is still no acknowledged benchmark test data set for the SDTTS problem. The experiment simulates a scenario with 18 satellites and 5 ground stations. The ephemeris data of the satellites is obtained from NORAD (North American Aerospace Defense Command). All ground stations are located in mainland China, for example, in Beijing, Sanya, and Kashi. The time windows of visibility services between satellites and ground stations are calculated with STK (Satellite Tool Kit). A total of 120 ground targets are randomly distributed in the area between −30° and 150° E and between −30° and 60° N. There are several different data topics, each with one or more observation data units. The satellite observation target scheduling problem is not studied in this paper, so the observation scheduling results are directly derived from the method proposed in literature [1], as shown in Table 1.
After the satellite observation task scheduling is done, experiment scenarios of different scales, totalling 10 groups, are given in Table 2. The scheduling horizon of all the scenarios is one day (24 hours).
As indicated in Table 2, ID denotes the sequence number of the experiment scenario; TPC_N, TPC_W, and TPC_S represent the number of data topics that consist of no-relation, weak-relation, and strong-relation observation data units, respectively; TPC = TPC_N + TPC_W + TPC_S is the total number of data topics; OBV is the total number of observation data units; DF is the number of data files, each of which contains one or more observation data units; VS is the number of visibility services; and CD is the conflict degree of the scenario, as defined in literature [25]. The higher the conflict degree, the greater the number of conflicting visibility services in the scenario. The scale of scenarios s6-s10 is larger than that of s1-s5, and the data topics of s6-s10 do not include no-relation observation data units.

Algorithm 2: The stochastic greedy heuristic operation
function SG_Mut(γ_i, η)
    L′ ← L ∪ {0}
    for all l ∈ L′ do
        if l = 0 then
            g′(l) ← η
        else
            g′(l) ← g(l) · (1 − η)
        end if
    end for
    use the roulette wheel strategy to select vs_j ∈ L′ based on {g′(l)}
    set γ_i to be transmitted in vs_j
end function

Baselines.
To verify the effectiveness of the HPG4TT algorithm proposed in this paper, we compare it with three heuristic algorithms: first finished first service (F3S) [8], real-time scheduling and proactive data download algorithm (RSPD2) [14], and route reduction-based dynamic programming (R2DP) [16], as well as with three nature-inspired metaheuristic algorithms: simulated annealing algorithm for ground station scheduling (SAA4GS) [20], quantum discrete particle swarm optimization for data transmission (QPSO4DT) [2], and hybrid genetic algorithm combined with neighborhood search for data transmission (HGANS) [26].
Since the SDTTS problem has a single objective, approaches based on multiobjective evolutionary algorithms (MOEAs) are not suitable as baselines for our algorithm. Each algorithm is run 50 times, and the averaged, normalized results are recorded.
The hyperparameters of the baselines are set following their authors' descriptions. The hyperparameter settings of HPG4TT are shown in Table 3.
From Figure 7, we can see that HPG4TT obtains the best results among all of the algorithms, especially in the experiment scenarios s6-s10, in which the data topics contain only weak-relation and strong-relation observation data units. That is because HPG4TT takes the data topics fully into account. Compared with binary encoding, the natural number encoding of HPG4TT keeps the search from wasting time in invalid regions of the solution space. The breadth of the search is enlarged by the hybrid mechanism of PSO and GA. Furthermore, using the Matthew effect heuristic rule in the mutation of HPG4TT ensures that as many data topics as possible are completed in order to obtain additional reward. Thus, the exploration and exploitation abilities of HPG4TT are enhanced. Because of its well-designed operators, the performance of HPG4TT stands out especially when transmission windows are heavily conflicted (such as the cases s5, s6, and s7) and when the scale of the problem is large (such as the cases s8-s10).
F3S and RSPD2 are based on certain heuristic rules that are not suitable for the SDTTS problem. R2DP is based on dynamic programming with heuristic rules, and its global searching capacity is better than that of F3S and RSPD2. SAA4GS, QPSO4DT, and HGANS are all nature-inspired metaheuristic algorithms and perform well on their own problems. Although the data topic metrics have been added to their objectives, they do not perform as well as HPG4TT, since they have no operator specially designed for the SDTTS problem.
Since the proposed algorithm HPG4TT falls into the category of nature-inspired metaheuristic algorithms, we compare the number of iterations of HPG4TT with that of the other nature-inspired metaheuristic algorithms among the baselines. Figure 8 shows the comparison of the number of iterations in all experiment scenarios s1-s10. The statistics are collected as follows: each of the four algorithms exits when there is no performance improvement for 100 iterations. We run each algorithm 50 times and take the average number of iterations.
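The stopping rule and averaging procedure described above can be sketched as follows. This is a minimal sketch under stated assumptions: the names `run_until_stalled`, `average_convergence`, `step`, and `make_step` are hypothetical, fitness is assumed to be maximized, and the sketch reports the iteration at which the last improvement occurred, since the paper does not state exactly which iteration count is recorded.

```python
from statistics import mean


def run_until_stalled(step, patience=100, max_iters=100_000):
    """Iterate until the best fitness has not improved for `patience` steps.

    step -- callable performing one iteration of some metaheuristic and
            returning its current best fitness (assumed: higher is better)
    Returns (best fitness, iteration of last improvement, total iterations).
    """
    best = float("-inf")
    best_iter = 0
    it = 0
    while it < max_iters and it - best_iter < patience:
        it += 1
        value = step()
        if value > best:
            best, best_iter = value, it
    return best, best_iter, it


def average_convergence(make_step, runs=50, patience=100):
    """Average the convergence iteration over independent runs.

    make_step -- callable taking a run index (e.g. a random seed) and
                 returning a fresh `step` callable for that run
    """
    results = [run_until_stalled(make_step(seed), patience)
               for seed in range(runs)]
    return mean(r[1] for r in results)
```

The `max_iters` cap is only a safety net for the sketch; the paper mentions no such limit.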
As shown in Figure 8, HPG4TT needs clearly fewer iterations than the other three algorithms, especially when the scale of the problem is large (such as s8-s10) and the percentage of strong-relation observation data units is high (such as s6-s10). Owing to the natural number encoding, HPG4TT does not need to search invalid or redundant regions of the solution space. Furthermore, with the help of an effective mutation operator based on domain knowledge, HPG4TT can not only explore a large range of the solution space and step out of local optima but also exploit the most promising regions of the solution space. It can be concluded from Figure 8 that HPG4TT converges faster than the other three algorithms on the SDTTS problem because of its well-designed components.

Comparison of Different Mutation Methods.
To test the influence of different mutation methods in the HPG4TT algorithm, three mutation operators are compared: randomicity, precedence, and consistency. The randomicity mutation operator randomly selects another visibility service or another data file to download; every visibility service and data file has the same chance of being selected. The precedence mutation operator is based on the stochastic greedy heuristic rule described in Algorithm 2. The consistency mutation operator is the one presented in Algorithm 3 and used in HPG4TT. We run the HPG4TT architecture with the three different mutation operators on 4 medium-scale scenarios (s6-s9). The fitness convergence curves are shown in Figure 9, and the best fitness values and iterations of convergence are shown in Table 4.
We introduce the reward ratio $\tau_i = g_{best}^{i} / \max_{j} g_{best}^{j}$ to compare the performance of different parameters or algorithms for each test case. Figure 9 and Table 4 show that, among the randomicity, precedence, and consistency mutation operators employed in HPG4TT, the consistency mutation operator has the best performance. Thus, the consistency mutation operator is more suitable for the SDTTS problem, since it combines both the Matthew effect heuristic rule and the stochastic greedy heuristic rule.
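The reward ratio can be computed per test case as in the short sketch below. The function name `reward_ratios` and the configuration labels are hypothetical, and the fitness values in the usage note are made-up numbers for illustration, not results from the paper.

```python
def reward_ratios(best_fitness):
    """Normalize the best fitness of each configuration by the best one.

    best_fitness -- dict {configuration name: g_best} for ONE test case
    Returns a dict {configuration name: tau}; the top configuration gets 1.0.
    """
    top = max(best_fitness.values())
    if top <= 0:
        raise ValueError("reward ratio needs a positive maximum fitness")
    return {name: g / top for name, g in best_fitness.items()}
```

For example, with the illustrative values `{"randomicity": 50.0, "precedence": 75.0, "consistency": 100.0}`, the ratios are 0.5, 0.75, and 1.0, so each configuration is expressed as a fraction of the best one in that test case.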
The average convergence iterations follow the same trend for all test cases, that is, randomicity > consistency > precedence. The randomicity mutation operator assigns another visibility service or another data file to be downloaded at random, without using any domain knowledge from the SDTTS problem itself, so it takes many more steps to find its optima (or local optima) and is more likely to be trapped in local optima. The precedence mutation operator always changes the status of particles towards the current optima with a higher probability, so it speeds up the convergence process, but it is also easily trapped in local optima because of the greedy strategy it uses. The consistency mutation operator takes advantage of the characteristics of the SDTTS problem and employs the Matthew effect heuristic rule to enhance the completion probability of data topic transmission jobs. With the help of the consistency mutation operator, HPG4TT can explore more regions of the solution space than with the precedence mutation operator. Although the consistency mutation operator needs slightly more iterations than the precedence mutation operator, its performance is much better.
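To make the behaviour of the consistency operator more concrete, the three cases of the Matthew effect heuristic rule can be sketched as below. This is only an illustrative sketch under several assumptions that are not taken from the paper: the transmission proportion $p_a$ is assumed to be the fraction of the topic's data files currently assigned to a visibility service, case (ii) is interpreted as releasing a data file by setting its position to 0, already selected files are left unchanged in case (i), and the names `matthew_mutation`, `positions`, `topic_files`, and `pick` are hypothetical.

```python
import random


def matthew_mutation(positions, topic_files, a, pick, mu1, mu2, eta,
                     rng=random):
    """Apply a Matthew-effect style mutation around data file `a`, in place.

    positions   -- dict {data file: visibility-service index, 0 = not sent}
    topic_files -- data files of the topic DS_a that contains `a`
    pick        -- callable(data file) -> new visibility-service index,
                   e.g. a stochastic greedy selection (assumed given)
    mu1, mu2    -- upper and lower bounds on the transmission proportion
    eta         -- DNTP coefficient
    Returns the transmission proportion p_a computed before mutating.
    """
    assigned = [f for f in topic_files if positions[f] != 0]
    p_a = len(assigned) / len(topic_files)  # assumed definition of p_a

    if p_a > mu1:
        # Case (i): the topic is nearly complete, so try to finish it.
        for f in topic_files:
            if positions[f] == 0 and rng.random() < 1.0 - eta:
                positions[f] = pick(f)
    elif p_a < mu2:
        # Case (ii): the topic is far from complete, so free its services
        # for other topics (interpretation: position 0 = not transmitted).
        for f in topic_files:
            if positions[f] != 0 and rng.random() < 1.0 - eta:
                positions[f] = 0
    else:
        # Case (iii): only the selected particle position is mutated.
        positions[a] = pick(a)
    return p_a
```

Passing the selection step in as `pick` keeps the sketch independent of how candidate visibility services are scored; in HPG4TT this role is played by the stochastic greedy heuristic.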

Comparison of Different Crossover Probabilities.
To verify the influence of different crossover probabilities on the HPG4TT algorithm, five parameter values are tested.
In each test case, the crossover probability $\omega_c$ varies from 0.2 to 0.8 in steps of 0.15. The fitness convergence curves are shown in Figure 10, and the best fitness and the iterations of convergence are shown in Table 5. From Figure 10, we can see that HPG4TT performs best when $\omega_c = 0.5$. From Table 5, it is apparent that the experiment results can be divided into three parts: $P_1$ ($\omega_c = 0.8, 0.65$), $P_2$ ($\omega_c = 0.5$), and $P_3$ ($\omega_c = 0.35, 0.2$). For $P_1$, the average number of iterations to convergence is greater than 130 and the average reward ratio only reaches 86.5% (88.7% at best and 84.4% at worst); for $P_3$, the average number of iterations to convergence is less than 88 and the average reward ratio reaches 90.5% (92.4% at best and 89% at worst); for $P_2$, the algorithm obtains the best $g_{best}$ in each test case and the average number of iterations to convergence is around 107.

Iterations of convergence (part of Table 5), for $\omega_c$ = 0.2, 0.35, 0.5, 0.65, and 0.8 from left to right:
s9    76    87   100   120   120
s8    91   104   119   148   147
s7    90   105   119   147   146
s6    68    79    90   109   110

The explanation of this phenomenon can be found in the particle status updating process. The crossover probability of GA has an effect similar to that of the social-learning coefficient of PSO. If the value of $\omega_c$ is too large, the particles rarely take advantage of the two extremes to update their status, so the algorithm not only converges slowly but also performs poorly. On the contrary, if the value of $\omega_c$ is too small, the particles depend almost entirely on the two extremes, which makes the particle population lose diversity; although the algorithm converges faster, it easily falls into a local optimum and cannot escape.
Therefore, it is important to choose an appropriate value: on the one hand, to take advantage of the two extremes with a proper probability when updating the status of particles, and on the other hand, to maintain the diversity of the particle population. The results indicate that moderate values achieve better results for the SDTTS problem.

Conclusion
The SDTTS problem is a new problem arising in aerospace resource scheduling. To address the specific requirements of data topic transmission scheduling, we establish a constraint satisfaction optimization model and design a hybrid algorithm that combines PSO and GA. A heuristic rule-based mutation operator is proposed to further enhance the performance and speed up the convergence of the proposed algorithm. Experiments on simulated data show that the proposed algorithm achieves better searching efficiency and a better convergence rate than the state-of-the-art approaches on the SDTTS problem.
In future work, we would like to explore more effective algorithms with learnable parameters to obtain better results on the SDTTS problem.

Data Availability
The experiment data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest regarding this work, and that they have no commercial or associative interest that represents a conflict of interest in connection with the work submitted.