Aircraft Tracking Based on an Antidrift Multifilter Tracker in Satellite Video Data

Using remote sensing video to monitor aircraft dynamics is significant for military applications, airport management, and aircraft rescue. Aircraft have a fixed size and distinctive characteristics, which makes them well suited to correlation filtering. Correlation filtering algorithms extract features from the input data to predict motion trajectories, and they are computationally fast. Hence, such algorithms are advantageous for tracking targets in remote sensing images. In this article, we propose an antidrift multifilter tracker based on a correlation filter and the Kalman filter. Within it, a temporal consistency-constrained background-aware correlation filter based on temporal regularization resists the model drift caused by clouds, and motion information is used to correct the drift. Experimental results show that, compared with other advanced tracking methods, the proposed method has improved antidrift performance under cloud occlusion and stable performance under other complex conditions. We believe that our model will be helpful for researchers interested in object tracking in satellite video, especially for processing satellite video data with cloud occlusion.


I. INTRODUCTION
WITH the continuous development of video satellite technology in recent years, many video satellites (constellations) have been successfully launched worldwide. Video satellites can continuously observe dynamic changes on the Earth's surface, enabling long-term, dynamic, real-time monitoring of targets through remote sensing technology. At present, the Jilin-1 satellite constellation has 31 satellites in orbit, 12 of which have video imaging capabilities, as follows. The first-generation color video satellites include the Jilin-1 SP-01, SP-02, and LQ satellites. The Jilin-1 SP-03 satellite is a second-generation color video satellite. The Jilin-1 SP-04 to SP-08 satellites are third-generation dual-mode push-broom and gaze imaging video satellites. The fourth generation of small-batch-production video satellites includes the Jilin-1 GF-03C01 to GF-03C03 satellites. These satellites can provide color videos at ten frames per second (fps) for up to 180 s with a spatial resolution of approximately 1 m. These remote sensing videos provide a basis for developing more diverse and convenient applications.
The development of high-resolution remote sensing video satellites has greatly promoted and enriched modern monitoring technologies and methods. The suitability of satellite data for change detection and monitoring applications depends on the data characteristics. Examples of satellite video applications include oil and gas exploration [1], disaster monitoring [2], marine monitoring [3], monitoring of ecosystem changes and disturbances [4], traffic monitoring [5], change detection [6], and recognizing and monitoring military objects [7], [8]. Object tracking is a core step of such remote sensing data applications. To date, studies on tools for satellite video tracking, such as the video background extractor algorithm [9], have focused on the detection and tracking of moving targets. These algorithms use pretrained object detection modules to find targets in each frame and track them. Nevertheless, it is difficult for such models to distinguish among objects of the same class and to acquire moving targets precisely.
Some methods based on deep learning have also been proposed [10], [11], but in our tests their data processing time was far from satisfying the needs of practical applications. Several correlation-filter-based methods for remote sensing target tracking have also been presented, and they serve as a reference for our work. However, existing methods are often oriented toward a single application scenario, whereas stable imaging conditions cannot always be guaranteed in practice. The existing methods [7], [12], [13], [14] focus on simple video scenes and have difficulty with complex conditions such as smoke, clouds, and light spots caused by changes in illumination. These algorithms also struggle when the target moves slowly. Moreover, research on the detection and tracking of moving targets has so far focused mainly on ground vehicles [13], and research on other vehicles, such as aircraft and ships, is insufficient. Aircraft are a primary means of transportation and are widely used by the military. Therefore, this article focuses on developing a correlation-filter-based algorithm that can track aircraft quickly and accurately.
The main problems encountered when tracking objects in remote sensing video are as follows:
1) Targets can be obscured by clouds.
2) Fast movement (large displacement) changes the background around the target, which makes tracking difficult.
3) Rotation-induced deformation can change the target appearance.
4) Because of the large data size, tracking can be time-consuming.
Because of these widespread problems, much data of ordinary quality has not been used thus far. In particular, data in which targets are obscured by clouds are often not well utilized because of model drift and related problems.
The main contributions of this article are as follows: 1) By considering the potential motion relationships of a moving target over a certain period of time, we introduce a temporal consistency constraint into the background-aware correlation filter (BACF) algorithm. Extensive experiments show that this method can effectively mitigate model drift. The model is solved efficiently in the frequency domain by the alternating direction method of multipliers (ADMM). 2) We use the Kalman filter (KF) to estimate the current location of the target from its visual observations and then predict its future position from the observation sequence. By analyzing and comparing the average peak-to-correlation energy (APCE) in each frame, we estimate how likely it is that occlusion has occurred. For occluded frames, a corrected fusion strategy based on weighting the outputs of multiple trackers is proposed.

A. Satellite Video Data and Preprocessing
The video data selected for this study include multiple videos acquired by the Jilin-1 No. 3 satellite and provided by ChangGuang Satellite Co., Ltd. The dataset contains nine targets in normal motion, three targets with complex background changes, four rotating targets, and four targets obscured by clouds. The original satellite videos are longer than 30 s, with a frame rate of 10 fps and a spatial resolution of 0.92 m. Each frame of the true-color (RGB) video is 12 000 × 5 000 pixels, and each video contains more than 300 frames. To facilitate our experiments, we do not use the full-length videos, and for convenience in labeling, we clip some of the data. Based on actual measurements and statistics, the size of an aircraft in the remote sensing video data is relatively stable. Apparent size changes under a remote sensing lens are caused by rotation and occlusion; however, an aircraft has distinctive features and its bounding box is close to square, so size changes caused by rotation can be ignored. We therefore label the targets with a fixed-size rectangle. Moreover, we consider cloud occlusion only for targets that remain discernible, and partially obscured targets are not within the scope of this study.
We collected diverse data under four conditions: normal flight, complex background change, target rotation, and cloud occlusion. Testing on targets under these different conditions allows us to verify that our model can handle aircraft tracking in complex situations. The dataset includes four video sequences with different cloud conditions, which we use to verify the antidrift ability of our model. Table I lists the appearance characteristics of the tracked targets, and Fig. 1 shows the motion trajectories of the 20 target sequences. The four sequences with occlusion are the main objects of study; frames from these cloudy sequences are shown in Fig. 2.

B. Moving Object Tracking in Satellite Videos
Existing methods for object tracking in remote sensing videos include foreground detection methods, correlation filter methods, and deep learning methods. A common approach is to use temporal information (as in background subtraction, optical flow, and interframe differencing) to highlight the areas that change between consecutive frames and to start tracking from these areas without using prior information about the existing targets. With this approach, moving targets cannot be reliably detected in every frame under noise, clouds, and illumination interference. Deep learning methods, in turn, are rarely selected for remote sensing video applications, mainly because their speed has difficulty meeting the requirements of real applications. Another obstacle is that existing video data are insufficient and labeling is expensive, which makes it difficult to meet training needs. Therefore, we instead choose the correlation filtering approach to solve the problem of aircraft tracking.
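To make this failure mode concrete, here is a minimal interframe-differencing sketch in Python (NumPy). The threshold of 25 grey levels and the toy 10 × 10 frames are illustrative assumptions, not values from this article; the point is that a brightening cloud produces change pixels just as a moving aircraft does.

```python
import numpy as np

def frame_difference_mask(prev, curr, thresh=25):
    """Interframe differencing: flag pixels whose intensity changed by more than thresh."""
    # Cast to a signed type so the subtraction cannot wrap around in uint8.
    diff = np.abs(curr.astype(np.int16) - prev.astype(np.int16))
    return diff > thresh

# Toy grayscale frames: a bright 2x2 "aircraft" moves 3 pixels to the right.
prev = np.zeros((10, 10), dtype=np.uint8)
curr = np.zeros((10, 10), dtype=np.uint8)
prev[2:4, 2:4] = 200
curr[2:4, 5:7] = 200
print(frame_difference_mask(prev, curr).sum())    # 8: old and new positions both flagged

# A thin "cloud" that brightens rows 6-8 by 40 grey levels is also flagged as motion.
cloudy = curr.copy()
cloudy[6:9, :] += 40
print(frame_difference_mask(prev, cloudy).sum())  # 38 = 8 target pixels + 30 cloud pixels
```

The detector has no notion of what the target looks like, so every change above the threshold, including the cloud, becomes a candidate.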
For learning from grayscale images, Bolme et al. [15] proposed the minimum output sum of squared error (MOSSE) filter, which is learned by minimizing the sum of squared errors between the actual and desired correlation outputs, computed in the frequency domain. This method requires only simple calculations and can track objects quickly, but it cannot guarantee accurate tracking when the appearance of a moving target changes. Later, Henriques et al. [16] proposed training a correlation filter in kernel space by exploiting the circulant structure of the training patches. In 2014, Henriques et al. [17] proposed kernelized correlation filters (KCF), extending single-channel features to multichannel features, and color name (CN) features have also been introduced for tracking. The CN feature improves the discriminative ability of a tracker. However, the adaptability of these trackers to rotation and fast motion still requires improvement. Subsequently, Danelljan et al. [18], [19] proposed the discriminative scale space tracker (DSST), which uses a scale pyramid to handle multiscale changes, and later presented an improved DSST algorithm. With the rapid development of deep learning, the continuous convolution operator tracker (C-COT) [20] emerged as a combination of correlation filtering and a convolutional neural network (CNN), in which spatial location information is represented by features from shallow CNN layers. This algorithm won the 2016 Visual Object Tracking (VOT) challenge. Similar to C-COT, the discriminative correlation filter with channel and spatial reliability (CSR-DCF) [21] also combines CNN features with a correlation filter, and the use of CNN features improves its robustness. Tang and Feng [22] proposed multiple kernelized correlation filters (MKCFs) in 2015. MKCF achieves stronger discrimination than KCF by introducing multikernel learning (MKL) into KCF.
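A minimal NumPy sketch of the closed-form MOSSE filter described above follows. For illustration it is trained on a single random patch; Bolme et al. average over many augmented patches and update the filter online, and the Gaussian width σ = 2 and regularizer ε = 10^-2 here are our assumptions. It also shows the property that makes correlation filters useful for tracking: a circular shift of the input shifts the response peak by the same amount.

```python
import numpy as np

def gaussian_response(shape, sigma=2.0):
    """Desired correlation output: a 2-D Gaussian peaked at the patch center."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((ys - h // 2) ** 2 + (xs - w // 2) ** 2) / (2.0 * sigma ** 2))

def train_mosse(patches, g, eps=1e-2):
    """Closed-form MOSSE filter H* = sum(G . conj(F)) / (sum(F . conj(F)) + eps)."""
    G = np.fft.fft2(g)
    num = np.zeros_like(G)          # complex accumulator
    den = np.zeros(G.shape)         # real accumulator of |F|^2
    for f in patches:
        F = np.fft.fft2(f)
        num += G * np.conj(F)
        den += np.abs(F) ** 2
    return num / (den + eps)

def respond(h_conj, patch):
    """Correlation response of a patch: inverse FFT of F . H*."""
    return np.real(np.fft.ifft2(np.fft.fft2(patch) * h_conj))

def peak_location(response):
    return tuple(int(i) for i in np.unravel_index(np.argmax(response), response.shape))

rng = np.random.default_rng(0)
patch = rng.standard_normal((32, 32))
H = train_mosse([patch], gaussian_response(patch.shape))
moved = np.roll(patch, shift=(3, 5), axis=(0, 1))   # target displaced by (3, 5)

print(peak_location(respond(H, patch)))   # (16, 16)
print(peak_location(respond(H, moved)))   # (19, 21)
```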
In 2018, the MKL-based tracker MKCFup [23] was proposed, which reconstructs the objective function of the correlation filter. This improvement significantly reduced the detrimental mutual interference among different kernels. In the learning process of spatially regularized discriminative correlation filters (SRDCF) [24], a spatial regularization component penalizes the correlation filter coefficients according to their spatial positions. Later, Li et al. [25] proposed spatial-temporal regularized correlation filters (STRCF), which combine temporal and spatial regularization constraints and outperform SRDCF in both accuracy and speed.
The efficient convolution operator (ECO) [26] introduced a novel formulation for training and applying continuous convolution filters, using an implicit interpolation model to pose the learning problem in the continuous spatial domain. However, the correlation-filter-based trackers above are sensitive to boundary effects: the circularly shifted training samples near the boundary are not true negative samples from the real scene, which degrades tracking performance. In contrast, the BACF [27] learns and updates its filter using negative samples extracted from the real background in real time rather than focusing solely on the moving foreground patch. Before this article, several studies addressed tracking under occlusion. Note that the visibility of the target under cloud varies with cloud thickness, even when the target is completely covered by thin cloud. In CFME, proposed by Xuan et al. [12], occlusion is handled through the update strategy. Recently, convolutional regression networks and motion features [28], [29], [30], [31] have been integrated for final target location prediction. The SenseTime intelligent video team has published a series of works on Siamese networks, including the Siamese region proposal network (SiamRPN) [32], the first high-performance Siamese tracking algorithm, which introduced detection into tracking. Dilated (atrous) convolution is used in the Siamese box adaptive network (SiamBAN) [33]; experiments show that it enlarges the receptive field and improves tracking performance. The anchor-free design of SiamBAN removes predefined anchors, which reduces the number of model parameters and further improves speed. Siamese fully convolutional classification and regression (SiamCAR) [34] adds a centerness branch to better locate the target center point.
With the anchor-free strategy, the regression output of the network is the distance from each feature map point on the search patch to the four sides of the ground-truth box. However, such methods are not suitable for all complex environments: target features change under cloud cover, and accounting for this change in the feature representation enables better tracking in cloudy environments.
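The anchor-free regression target can be written down in a few lines. The sketch below computes the four side distances for one image point and the FCOS-style centerness score; we assume SiamCAR's centerness branch follows this definition, and the mapping from feature-map cells to image coordinates (stride and offset) is omitted.

```python
import math

def ltrb_targets(px, py, box):
    """Regression target for an image point (px, py) inside box = (x0, y0, x1, y1):
    distances to the left, top, right, and bottom sides."""
    x0, y0, x1, y1 = box
    return px - x0, py - y0, x1 - px, y1 - py

def centerness(l, t, r, b):
    """FCOS-style centerness: 1 at the box center, decaying toward the edges."""
    return math.sqrt((min(l, r) / max(l, r)) * (min(t, b) / max(t, b)))

box = (0.0, 0.0, 10.0, 10.0)
print(ltrb_targets(5.0, 5.0, box))               # (5.0, 5.0, 5.0, 5.0)
print(centerness(*ltrb_targets(2.0, 5.0, box)))  # 0.5
```

No anchor sizes appear anywhere, which is why the predicted box has more freedom than an anchor-based one.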
For object tracking in remote sensing video, the tracker's updates drift continuously when occlusion occurs. Under such conditions, the principle of the antidrift multifilter tracker (ADMFT) is to learn a relatively stable model over a certain period of time. However, this regularization strategy imposes unequal penalties on the filter coefficients, causing the filter to learn the appearance features of the deformed target. The ADMFT uses the BACF to handle complex background changes, and the BACF copes with rotation by learning from true negative samples in the real scene. Because it is difficult to estimate position solely from appearance features, and because the motion state of an aircraft is generally stable, we also exploit motion information. However, prediction from the motion state alone cannot handle complex trajectories. Therefore, the motion-state prediction is used to correct the position predicted by the correlation filter.

C. Development of an Antidrift BACF via the Introduction of Temporal Regularization
First, we briefly revisit the BACF formulation. The correlation filter h is learned by minimizing

$$E(h) = \frac{1}{2}\sum_{i=1}^{T}\Big( v(i) - \sum_{k=1}^{K} h_k^{\top} P\, u_k[\Delta\tau_i] \Big)^2 + \frac{\lambda}{2}\sum_{k=1}^{K}\|h_k\|_2^2$$

where P is a D×T binary cropping matrix, T is the number of pixels in the training sample, and K is the number of feature channels. u_k ∈ R^T denotes the kth channel of a training image sample, v ∈ R^T denotes the corresponding desired output centered on the peak at the target, and h_k ∈ R^D denotes the kth channel of the correlation filter. u_k[Δτ_i] denotes the circular shift of u_k by Δτ_i, the superscript ⊤ denotes the conjugate transpose, and λ is a regularization parameter. Applying the circular shift operator increases the number of training samples. To improve speed, we express the above formula in the frequency domain as follows:

$$E(h, \hat{g}) = \frac{1}{2}\|\hat{v} - \hat{U}\hat{g}\|_2^2 + \frac{\lambda}{2}\|h\|_2^2 \quad \text{s.t.} \quad \hat{g} = \sqrt{T}\,(F P^{\top} \otimes I_K)\, h$$

Here, the hat (ˆ) denotes the discrete Fourier transform, ⊗ denotes the Kronecker product, Û = [diag(û_1)^⊤, …, diag(û_K)^⊤], h = [h_1^⊤, …, h_K^⊤]^⊤, and ĝ is an auxiliary variable. F denotes the orthonormal T×T matrix of complex basis vectors that maps any T-dimensional vectorized signal to the Fourier domain.

Deformation, occlusion, or a complex background will degrade tracking performance. For example, if occlusion occurs, the BACF tracker loses the target, and even if the occlusion disappears in subsequent frames, the tracker cannot relocate the target. In previous studies, stopping the model update was often used to overcome occlusion; many methods of resisting model drift are based on ceasing to update the model when occlusion occurs. We believe that the main difference between cloud occlusion and other types of occlusion is that clouds are partially transparent. Although the apparent features of the target change under cloud occlusion, these changes should not be neglected. Moreover, a moving target has a latent motion relationship between consecutive frames.
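The role of the D×T cropping matrix P can be illustrated with 1-D lists. In this toy sketch (not the BACF implementation; the function names are ours), the search region has T = 12 pixels and the filter support has D = 4. Shifting the small patch directly produces wrapped, synthetic samples, whereas shifting the large region and then cropping with P yields real background windows.

```python
def circular_shift(x, s):
    """Circular shift x[Δτ] of a list by s positions to the left."""
    s %= len(x)
    return x[s:] + x[:s]

def crop(x, start, size):
    """Apply a binary cropping matrix P: keep `size` consecutive entries from `start`."""
    return x[start:start + size]

def is_real_window(w):
    """True if the window is a contiguous piece of the signal (no wrap-around)."""
    return all(b - a == 1 for a, b in zip(w, w[1:]))

T, D, start = 12, 4, 4
signal = list(range(T))            # search region (target + background), length T
target = crop(signal, start, D)    # [4, 5, 6, 7]

# Conventional CF: shift the D-sized patch itself; pixel 4 wraps around.
print(circular_shift(target, 1))                   # [5, 6, 7, 4]
# BACF-style: shift the T-sized region, then crop; pixel 8 is real background.
print(crop(circular_shift(signal, 1), start, D))   # [5, 6, 7, 8]

real = [s for s in range(T) if is_real_window(crop(circular_shift(signal, s), start, D))]
print(len(real))                                   # 9 = T - D + 1 genuine windows
```

Only one of the D shifts of the small patch is a genuine image window, while T − D + 1 of the cropped large-region shifts are, which is how the BACF obtains real negative samples.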
Considering the motion relationship of a moving target within a certain period, we introduce an L2 regularization term and propose minimizing the following objective function to train the improved BACF algorithm: where η is a regularization parameter (λ, η ≥ 0) and µ > 0 is the corresponding penalty factor, which adjusts the influence of the target in the previous frame on model training in the current frame. The last term of the objective is a global temporal consistency constraint. To improve computational efficiency, the correlation filter is converted into the frequency domain by means of the Fourier transform, where the proposed filter can be represented as follows: To solve this problem, we form its augmented Lagrangian L(h, ĝ, ζ̂) [35], where ζ̂ denotes a complex Lagrange multiplier. The problem can be solved iteratively with the ADMM, and each subproblem, ĝ and h, has a closed-form solution.
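The effect of a temporal consistency term can be seen in a simplified setting. Assume, for illustration only, a single-channel filter without the cropping matrix P (so no ADMM is needed) and the per-frequency objective ½|v̂ − ĥ⊙û|² + (λ/2)|ĥ|² + (η/2)|ĥ − ĥ_{t−1}|². Setting the derivative to zero gives ĥ = (conj(û)⊙v̂ + η ĥ_{t−1}) / (|û|² + λ + η). This STRCF-like simplification is our assumption and is not the full ADMFT objective; the defaults λ = 0.01 and η = 10^-4 only mirror the values reported later in the article.

```python
import numpy as np

def temporal_cf(u, v, h_prev_hat, lam=0.01, eta=1e-4):
    """Per-frequency minimizer of
    0.5|v^ - h^ . u^|^2 + 0.5*lam*|h^|^2 + 0.5*eta*|h^ - h_prev^|^2
    (single channel, no cropping matrix; a simplified illustration)."""
    U = np.fft.fft(u)
    V = np.fft.fft(v)
    return (np.conj(U) * V + eta * h_prev_hat) / (np.abs(U) ** 2 + lam + eta)

rng = np.random.default_rng(1)
T = 64
u = rng.standard_normal(T)                                 # current training sample
v = np.exp(-0.5 * ((np.arange(T) - T // 2) / 2.0) ** 2)    # desired Gaussian output
h_prev = np.fft.fft(rng.standard_normal(T))                # filter from the previous frame

h_free = temporal_cf(u, v, h_prev, eta=0.0)     # no temporal term: plain ridge solution
h_mid = temporal_cf(u, v, h_prev, eta=10.0)     # moderate temporal consistency
h_stiff = temporal_cf(u, v, h_prev, eta=1e12)   # huge eta: the filter barely changes
print(np.allclose(h_stiff, h_prev))             # True
```

Increasing η pulls the new filter toward the previous one at every frequency, so a frame corrupted by cloud changes the model less.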
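In the closed-form solution of subproblem h, g and ζ are defined as $g = \frac{1}{\sqrt{T}}(F^{\top}\otimes I_K)\,\hat{g}$ and $\zeta = \frac{1}{\sqrt{T}}(F^{\top}\otimes I_K)\,\hat{\zeta}$. Subproblem ĝ decomposes into independent per-pixel problems, so its solution (7) is obtained directly. The Lagrange multiplier is updated as

$$\hat{\zeta}^{(i+1)} = \hat{\zeta}^{(i)} + \mu\left(\hat{g}^{(i+1)} - \hat{h}^{(i+1)}\right)$$

where $\hat{h} = \sqrt{T}\,(F P^{\top}\otimes I_K)\,h$ and μ is a penalty factor, which is increased at each ADMM iteration as

$$\mu^{(i+1)} = \min\left(\mu_{\max}, \beta\mu^{(i)}\right)$$

where μ_max denotes the maximum value of μ and β is a scale factor.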
We extract feature maps using histograms of oriented gradients (HOG) and CN features. HOG is a gradient feature and CN is a color feature, so the two complement each other and together better serve the tracking objective.

D. Motion Estimator
The KF needs only the current measurement and the estimate from the previous sampling period to estimate the state, so it requires little storage. Each step involves few calculations and the steps are clearly defined, making this filter well suited to computer processing. The KF can estimate the positions and velocities of moving targets; however, its parameters are difficult to determine. To this end, we use a frame-based parameter selection strategy: when the frame number exceeds a certain threshold, we use the expectation maximization (EM) algorithm to estimate the parameters [36]. In that case, the dynamics and observation models can be written as follows:

$$x_t = A x_{t-1} + w_t, \qquad w_t \sim \mathcal{N}(0, Q_t)$$
$$y_t = C x_t + r_t, \qquad r_t \sim \mathcal{N}(0, R_t)$$

where x_t and x_{t-1} are the state vectors of the system at times t and t-1, respectively, and y_t is the observation at time t. In this article, we choose the state vector x_t = [xs_t, ys_t, xv_t, yv_t]^⊤, where xs_t and ys_t are the horizontal and vertical positions of the target at time t, and xv_t and yv_t are its horizontal and vertical velocities. w_t and r_t are Gaussian noise terms with covariance matrices Q_t and R_t, respectively. Since the time between two consecutive frames is short, moving targets such as vehicles can be assumed to move in uniform linear motion. When occlusion occurs, we use the previous motion state to estimate the motion state under occlusion. Assume that x_t and y_t are given for 0 ≤ t ≤ T_occ (the time at which occlusion occurs); then, the likelihood of A, C, Q, and R can be written as follows: This equation can be expanded as follows: where Tr is the trace of a matrix and β is a constant. By maximizing l(A, C, Q, R | x, y) with respect to A, C, Q, and R in turn, we obtain the parameter estimates. We represent the motion state estimates as follows: where x_{t+1} is the optimal state estimate and K is the KF gain matrix.
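A minimal constant-velocity KF over the state x_t = [xs_t, ys_t, xv_t, yv_t]^⊤ may look as follows. A and C follow the uniform linear motion assumption with one frame as the time step; the noise levels q and r, the initial covariance p0, and the class name are illustrative assumptions rather than the parameters estimated by EM in this article. During occlusion only predict() is called, so the filter coasts on the last estimated velocity. Note that the only inverse needed is of the 2×2 innovation covariance.

```python
import numpy as np

# Uniform linear motion, one time step = one frame.
A = np.array([[1.0, 0.0, 1.0, 0.0],
              [0.0, 1.0, 0.0, 1.0],
              [0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0]])
# Only the position (xs, ys) is observed.
C = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0]])

class ConstantVelocityKF:
    def __init__(self, first_pos, q=1e-2, r=1.0, p0=100.0):
        self.x = np.array([first_pos[0], first_pos[1], 0.0, 0.0])
        self.P = np.eye(4) * p0   # large initial uncertainty
        self.Q = np.eye(4) * q    # process noise covariance
        self.R = np.eye(2) * r    # measurement noise covariance

    def predict(self):
        """Time update; used alone while the target is occluded."""
        self.x = A @ self.x
        self.P = A @ self.P @ A.T + self.Q
        return self.x[:2].copy()

    def update(self, z):
        """Measurement update with an observed position z (e.g., from the correlation filter)."""
        S = C @ self.P @ C.T + self.R          # 2x2 innovation covariance
        K = self.P @ C.T @ np.linalg.inv(S)    # 4x2 Kalman gain
        self.x = self.x + K @ (np.asarray(z, dtype=float) - C @ self.x)
        self.P = (np.eye(4) - K @ C) @ self.P
        return self.x.copy()

kf = ConstantVelocityKF((0.0, 0.0))
for t in range(1, 50):            # target moves (+2, +1) pixels per frame
    kf.predict()
    kf.update((2.0 * t, 1.0 * t))
for _ in range(5):                # five occluded frames: prediction only
    coasted = kf.predict()
print(coasted)                    # close to [108, 54]
```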
In the inference stage, each KF step requires only 10 matrix multiplications, 5 matrix additions, and one inversion of a 2×2 matrix.
Compared with the computational cost of the correlation filter, this increase is very small. The KF estimates the target motion state accurately, but estimating its parameters is complex, and the filter converges only when sufficient frames have been used to update it. To estimate the motion of a moving target before the KF converges, we approximate the real motion state with an assumed one: over a short time, we assume that the target moves in a straight line at uniform speed, even if it is turning, stopping suddenly, or accelerating. Under this assumption, the speed of the moving target in the current frame can be estimated from its average displacement over the previous frames, and its position in the current frame can be estimated from its speed and position in the previous frame. The values are therefore estimated as follows: where S_{t-1} is the state vector of the target at time t-1, y_t is the position vector of the target at time t, φ is a transfer matrix, which can be written as follows: and n is the number of frames used for estimation. If n > Num_confident, the parameters are estimated by the EM algorithm. This frame-based parameter selection strategy helps us obtain the most suitable parameters.

E. Tracker Fusion Based on the APCE
To combine the motion state prediction with the correlation filter result, we propose a combination strategy based on multipeak matching (Fig. 3). First, to judge whether the target is occluded, we calculate the APCE, which reflects the degree of fluctuation of the response map and the confidence level of the detected target:

$$\mathrm{APCE} = \frac{\left|F_{\max} - F_{\min}\right|^{2}}{\operatorname{mean}_{w,h}\left(\left(F_{w,h} - F_{\min}\right)^{2}\right)}$$

where F_max, F_min, and F_{w,h} represent the maximum response value, the minimum response value, and the response at position (w, h), respectively. When the moving target is blocked, changed, blurred, or lost, the current APCE value drops significantly relative to its historical mean, and the response map oscillates and exhibits multiple peaks. At this time, confidence in the target center position is considered low. Generally, when multipeak oscillation occurs, the response value at the target center also decreases significantly; that is, the peak F_max is generally lower than the peak without interference. Thus, F_max reflects the confidence in the target center position from the local part of the response map, whereas the APCE reflects it from the overall response map. Accordingly, higher confidence is achieved by combining the two: the target center position in the current frame t is considered to have high confidence only when F_max and the APCE each exceed a certain proportion of their historical means, represented by α and β; that is, both conditions must be met simultaneously. In this article, we set α to 0.5 and β to 0.3. These values can be adjusted in practice; for example, if the movement is complex, the KF should be suppressed and β should be lowered, and if the object has confusing features, α should be lowered.
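The APCE and the two-condition confidence check can be sketched in NumPy as follows. We assume that α scales the historical mean of F_max and β the historical mean of the APCE, following the order in which they are introduced; the handling of the first frame (no history yet) is also our assumption.

```python
import numpy as np

def apce(response):
    """Average peak-to-correlation energy of a response map."""
    r = np.asarray(response, dtype=float)
    f_max, f_min = r.max(), r.min()
    return (f_max - f_min) ** 2 / np.mean((r - f_min) ** 2)

def is_high_confidence(response, fmax_history, apce_history, alpha=0.5, beta=0.3):
    """Both F_max and APCE must exceed a fraction of their historical means.
    Assumption: alpha applies to F_max and beta to APCE."""
    if not fmax_history:             # first frame: nothing to compare with
        return True
    f_max = float(np.max(response))
    return bool(f_max >= alpha * np.mean(fmax_history)
                and apce(response) >= beta * np.mean(apce_history))

sharp = [[0.0, 0.0], [0.0, 1.0]]       # single clean peak
two_peaks = [[1.0, 0.0], [0.0, 1.0]]   # multipeak response
print(apce(sharp), apce(two_peaks))    # 4.0 2.0
```

A second peak of the same height halves the APCE in this toy case, which is the drop the tracker watches for.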
Each time F_max and the APCE are calculated, the values are saved in the corresponding sets P_y and P_E as a pair of historical values for subsequent judgments. To reduce computation, we do not correct the position in every frame. Only when the current frame t shows multimodal oscillation and the target center position may be misjudged, that is, when F_max and the APCE do not meet the high-confidence conditions, do we introduce motion information to correct the position. In that case, we fuse the motion information with the correlation filter output in a weighted manner, replace the current prediction with the fused result, and update the filter model with this result.
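The fusion step can be sketched as follows. The article does not state the fusion weight, so the weight w_cf = 0.5 and the function names are hypothetical; the dictionary keys mirror the sets P_y and P_E named in the text.

```python
def record_history(history, f_max, apce_value):
    """Store the (F_max, APCE) pair for later confidence checks (sets P_y and P_E)."""
    history["P_y"].append(f_max)
    history["P_E"].append(apce_value)

def fuse_position(cf_pos, motion_pos, confident, w_cf=0.5):
    """Keep the correlation-filter position when confident; otherwise blend it with the
    motion-model prediction. w_cf is a hypothetical weight, not a value from the article."""
    if confident:
        return tuple(cf_pos)
    return tuple(w_cf * c + (1.0 - w_cf) * m for c, m in zip(cf_pos, motion_pos))

history = {"P_y": [], "P_E": []}
record_history(history, 0.21, 1.7)
print(fuse_position((10.0, 10.0), (20.0, 30.0), confident=False))  # (15.0, 20.0)
print(fuse_position((10.0, 10.0), (20.0, 30.0), confident=True))   # (10.0, 10.0)
```

Skipping the fusion in confident frames is what keeps the extra cost low: the motion model only intervenes when the response map looks unreliable.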

A. Performance Measures
To evaluate the performance of our proposed algorithm, we use one-pass evaluation (OPE), the protocol proposed for the OTB-2013 benchmark [37]. OPE relies on two plots: the accuracy (precision) plot and the success plot. The accuracy plot shows, for a range of thresholds, the percentage of frames in which the distance between the predicted and ground-truth target centers is within the threshold. The success plot represents an average overlap measure [38]. Given the result bounding box b_r and the ground-truth bounding box b_g, the success score is calculated as follows:

$$S = \frac{s(b_r \cap b_g)}{s(b_r \cup b_g)}$$

where ∩ represents the intersection of two regions, ∪ represents their union, and s(·) represents the area of a region. The success plot shows the fraction of frames whose score exceeds each overlap threshold.
The AUC is defined as the area under the success plot curve.
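The success score and its AUC can be computed as below. We assume boxes in (x, y, w, h) format and the common OTB convention of 21 evenly spaced overlap thresholds 0, 0.05, …, 1 with a strict ">" comparison, under which the AUC equals the mean of the success curve; other toolkits may differ slightly.

```python
def iou(box_a, box_b):
    """Overlap score s(a ∩ b) / s(a ∪ b) for boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def success_auc(overlaps, num_thresholds=21):
    """Mean of the success curve: fraction of frames with overlap > threshold,
    over evenly spaced thresholds in [0, 1]."""
    thresholds = [i / (num_thresholds - 1) for i in range(num_thresholds)]
    curve = [sum(o > t for o in overlaps) / len(overlaps) for t in thresholds]
    return sum(curve) / len(curve)

print(iou((0, 0, 2, 2), (1, 1, 2, 2)))   # 1/7, about 0.1429
print(success_auc([1.0, 0.0]))           # 10/21, about 0.4762
```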
To evaluate the performance of our proposed tracker, we also adopt the center location error (CLE), which is the average Euclidean distance between the center location of the estimated target and the ground-truth target center location.
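The CLE and the accuracy (precision) score derived from it can be sketched as follows, again assuming (x, y, w, h) boxes. The 20-pixel default is the threshold commonly used for OTB precision plots and is an assumption here, since the article does not state one.

```python
import math

def center(box):
    x, y, w, h = box
    return x + w / 2.0, y + h / 2.0

def center_errors(pred_boxes, gt_boxes):
    """Per-frame Euclidean distance between predicted and ground-truth centers."""
    return [math.dist(center(p), center(g)) for p, g in zip(pred_boxes, gt_boxes)]

def mean_cle(pred_boxes, gt_boxes):
    errs = center_errors(pred_boxes, gt_boxes)
    return sum(errs) / len(errs)

def precision_at(pred_boxes, gt_boxes, threshold=20.0):
    """Fraction of frames whose center error is within `threshold` pixels."""
    errs = center_errors(pred_boxes, gt_boxes)
    return sum(e <= threshold for e in errs) / len(errs)

pred = [(0, 0, 10, 10), (3, 4, 10, 10)]
gt = [(0, 0, 10, 10), (0, 0, 10, 10)]
print(mean_cle(pred, gt))            # 2.5
print(precision_at(pred, gt, 4.0))   # 0.5
```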
Following BACF, we set the regularization factor λ to 0.01 and set η to 10^-4 empirically (Fig. 4). For the ADMM optimization, the number of iterations and the initial penalty factor μ are set to 2 and 1, respectively. The penalty factor at iteration i+1 is updated as μ^(i+1) = min(μ_max, βμ^(i)). For cloudy data, drift often occurs in the final frames; therefore, our evaluation focuses on the trajectories and the degree of drift.

B. Quantitative Evaluation
In our experiment, to ensure fair comparisons, we compiled a list of state-of-the-art trackers of the same type, that is, top-performing trackers that function similarly to the proposed ADMFT. Among trackers based on handcrafted features, the efficient convolution operator with handcrafted features (ECO) was selected as a representative model with good performance. The circulant structure of tracking-by-detection with kernels (CSK) algorithm also performs well by introducing the kernel trick and ridge regression into MOSSE. The CN approach is an effective way to obtain color features and performs well on images with obvious color contrast. The CSR-DCF algorithm uses spatial and channel reliability, derived through image segmentation, to select the effective target tracking area more accurately. MKCFup significantly reduces the detrimental mutual interference among different kernels. The STRCF method constrains the effective scope of the filter template to overcome the boundary effect. The BACF is the baseline on which our improved algorithm is built. SiamRPN combines a Siamese network for tracking with a region proposal network for detection: the Siamese network uses information about the tracked target to initialize the detector, and the region proposal network allows the target location to be predicted more accurately. SiamBAN adopts an anchor-free strategy that does not preset anchor box sizes, giving the predicted box more degrees of freedom. SiamCAR adds a centerness branch to better locate the target center point. We compare our improved method with these methods.
Our model achieves high accuracy and good antidrift ability while maintaining high efficiency. The purpose of our experiments is to verify that, under a variety of target states, it achieves a higher AUC than the other trackers and adapts to complex conditions. The performance and programming language of each tracker are given in the following table.
Moreover, the frame rate of our method reaches 83.84 fps, only about 13% slower than the original BACF. The experiments are run on Windows 10 with an Intel(R) Core(TM) i7-9700 CPU and 16 GB of RAM. The performance is given in Table I. Table III shows the CLE results (in pixels) of our proposed tracker and the other approaches on the 20 sequences; our model achieves a small CLE among the compared methods. The results show that the ADMFT is robust on video sequences with fast motion, occlusion, and deformation of the tracked targets.
The ADMFT uses a correlation filter for position estimation, and when occlusion occurs, its influence is corrected based on the motion state. Its average CLE is 4.363 pixels, considerably lower than those of the other correlation-filter-based trackers (CFTs). These results show that the proposed combination of the KF and a correlation filter is effective for position estimation.
Furthermore, a frame-by-frame comparison of the CLEs on the 20 sequences is shown in Fig. 5, where the vertical axis represents the CLE and the horizontal axis represents the frame number. The proposed ADMFT produces favorable results for the 20 targets and, compared with the other eight methods, handles cloud occlusion well. In the complex_background_01-03 videos, the targets suffer from background clutter, and the ADMFT achieves better performance. On videos of aircraft moving at regular and slow speeds, almost all trackers perform well. In videos with aircraft rotation (rotation_01-04), the target appearance changes significantly owing to deformation and illumination variation, and most trackers fail at the beginning of the sequences, whereas our method still estimates the target position successfully. These findings show that our algorithm tracks targets more accurately under normal conditions than the original BACF. Moreover, the ADMFT has improved tracking ability under occlusion, and in particular it balances resistance to occlusion with good performance in general situations. Consequently, our method outperforms the compared methods on these sequences. Occlusion is widely encountered in remote sensing images. Because general methods cannot handle cloud occlusion well, cloud-occluded data have often been discarded in previous practice even though the targets can be observed by the human eye, and the quantity of data discarded for this reason is considerable. The ability to use these data effectively is therefore expected to be of great significance in research and practical applications.

C. Antidrift Ability Evaluation
Comparing the CLEs for cloudy_little_01 and cloudy_little_02 in Fig. 1(b) and (c) shows that the ADMFT tracks stably under thin clouds. To see more directly whether the target position drifts under cloud occlusion, we jointly consider the accuracy and success rate results for the cloudy_little_01(b) video and the corresponding trajectories in Figs. 6 and 7. Under thin cloud occlusion, the original method is only slightly disturbed, but our model resists the interference more effectively and achieves better performance.
We likewise consider the precision and success rate results for the cloudy_little_02 video and examine the corresponding trajectories in Figs. 8 and 9. Again, under thin cloud occlusion the original BACF method is only slightly disturbed, while our model performs better. However, the Siam-tracker drifts in this sequence because it cannot cope with sudden changes between two adjacent frames.
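The success rate considered here alongside precision is conventionally based on the intersection over union (IoU) between the predicted and ground-truth boxes: for each overlap threshold, it is the fraction of frames whose IoU exceeds that threshold, and the area under this curve (AUC) summarizes it in one number. A minimal sketch follows; the (x, y, w, h) box convention, the 21 evenly spaced thresholds, and the function names are assumptions rather than details restated in this section.

```python
import numpy as np


def iou(box_a, box_b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))   # overlap width
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))   # overlap height
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def success_curve(pred_boxes, gt_boxes, thresholds=None):
    """Success rate at each IoU threshold and the mean of the curve (AUC)."""
    if thresholds is None:
        thresholds = np.linspace(0.0, 1.0, 21)
    overlaps = np.array([iou(p, g) for p, g in zip(pred_boxes, gt_boxes)])
    rates = np.array([np.mean(overlaps > t) for t in thresholds])
    return thresholds, rates, float(rates.mean())


if __name__ == "__main__":
    gt = [[0, 0, 10, 10], [0, 0, 10, 10]]
    pred = [[0, 0, 10, 10], [5, 0, 10, 10]]      # second box is half shifted: IoU = 1/3
    _, rates, auc = success_curve(pred, gt)
    print(rates, auc)
```

A tracker that drifts under cloud, as the Siam-tracker does here, shows up as overlaps collapsing toward zero over the occluded frames, which lowers the whole success curve.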
In the case of medium cloud cover, some methods fail badly, with severe trajectory offsets, and all other compared methods show performance fluctuations to varying degrees, whereas our method remains stable. The CN approach uses color features, the CSR-DCF algorithm combines CNN features to better express the target characteristics, and MKCFup uses multiple kernel learning (MKL) to achieve stronger discriminative ability than KCF. The results of these trackers show slight deviations. When the target is first occluded, these methods can still extract features from its unoccluded part and continue tracking. However, when the target emerges from occlusion, the previously occluded part is often exposed first, and because its features have not been learned for some time, they cannot represent the target well. Improving the feature extraction capability is therefore effective against transient cloud occlusion but cannot overcome occlusion by dense clouds, whereas learning features over a period of time can mitigate this problem. In the cloudy_less_01(a) video, the STRCF results drift because of rotation; unlike BACF, STRCF does not densely extract real negative examples from the background in real time but focuses on the foreground patch. Introducing the temporal consistency constraint into the BACF algorithm therefore yields a tracker that adapts well to complex scenes. We jointly consider the precision and success rate results for cloudy_less_01(a) and examine the corresponding trajectories in Figs. 10-12. These figures show that, under occlusion, our method limits model drift more effectively than the original BACF algorithm, and that incorporating motion information improves the prediction of the target position.
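To illustrate what a temporal consistency constraint does to a correlation filter, the sketch below adds an STRCF-style term mu * |H - H_prev|^2 to a plain single-channel filter and solves it in closed form per frequency: H = (conj(X) Y + mu H_prev) / (|X|^2 + lam + mu). This is a deliberate simplification, not the ADMFT: it omits BACF's background cropping and the ADMM solver the article uses, and the names and the values of `lam` and `mu` are assumptions.

```python
import numpy as np


def gaussian_label(shape, sigma=2.0):
    """Desired correlation output: a Gaussian peak at the patch center."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((ys - h // 2) ** 2 + (xs - w // 2) ** 2) / (2.0 * sigma ** 2))


def train_filter(patch, label, lam=1e-2, mu=0.0, prev_filter_hat=None):
    """Per-frequency closed-form solution of
        min_H |X H - Y|^2 + lam |H|^2 + mu |H - H_prev|^2.
    Without a previous filter, the temporal term is dropped (mu = 0).
    """
    X = np.fft.fft2(patch)
    Y = np.fft.fft2(label)
    if prev_filter_hat is None:
        prev_filter_hat, mu = np.zeros_like(X), 0.0
    return (np.conj(X) * Y + mu * prev_filter_hat) / (np.abs(X) ** 2 + lam + mu)


def response(filter_hat, patch):
    """Spatial correlation response of the filter on a (new) patch."""
    return np.real(np.fft.ifft2(np.fft.fft2(patch) * filter_hat))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x1, x2 = rng.random((64, 64)), rng.random((64, 64))
    y = gaussian_label(x1.shape)
    h1 = train_filter(x1, y)
    # The response peak follows a circular shift of the target by (3, -5).
    r = response(h1, np.roll(x1, (3, -5), axis=(0, 1)))
    print(tuple(int(i) for i in np.unravel_index(np.argmax(r), r.shape)))  # (35, 27)
    # A new frame x2 (e.g. a cloud-covered patch): the temporal term keeps
    # the updated filter closer to h1 than an unconstrained retrain does.
    h2_free = train_filter(x2, y)
    h2_temp = train_filter(x2, y, mu=100.0, prev_filter_hat=h1)
    print(np.linalg.norm(h2_free - h1), np.linalg.norm(h2_temp - h1))
```

Algebraically, each frequency of `h2_temp - h1` equals that of `h2_free - h1` scaled by (|X|^2 + lam) / (|X|^2 + lam + mu), so a single disturbed frame can move the filter only part of the way, which is the antidrift effect the constraint is meant to provide.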
In the case of thick cloud occlusion, the model update process gradually causes the model to favor cloud features over the real target features, whereas our model is less contaminated when the target enters a cloud. In this case, the target becomes seriously occluded within five frames, and the other methods suffer from model drift, resulting in a gradually increasing target offset. Owing to improper updates, the CN, CSR-DCF, and MKCFup methods also exhibit model drift. In contrast, the correction process based on the motion state and temporal regularization helps our method predict the target position more accurately. As shown in Fig. 13, even when the clouds are thick and the aircraft body is severely blocked, our method still closely tracks the target, while the other methods drift away from it.
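Many correlation filter trackers update their model by linear interpolation, m_t = (1 - η) m_{t-1} + η m̂_t. Under that rule, after n consecutive occluded frames a share 1 - (1 - η)^n of the model comes from those frames, which is how cloud features gradually take over. The sketch below uses a scalar stand-in for the model and a hypothetical learning rate η = 0.1 over the five occluded frames mentioned above; the ADMFT's own update is instead restricted by its temporal (L2) regularization.

```python
def interpolate_model(model, new_estimate, eta):
    """Plain linear-interpolation model update (generic CF-tracker rule)."""
    return (1.0 - eta) * model + eta * new_estimate


def occluded_share(eta, n_frames):
    """Share of the model contributed by the last n_frames of updates."""
    return 1.0 - (1.0 - eta) ** n_frames


if __name__ == "__main__":
    eta = 0.1            # hypothetical learning rate
    model = 1.0          # scalar stand-in: 1.0 = pure target appearance
    for _ in range(5):   # five frames of thick cloud
        model = interpolate_model(model, 0.0, eta)   # 0.0 = cloud appearance
    print(model, occluded_share(eta, 5))   # about 0.59 target vs 0.41 cloud
```

The same function works unchanged on NumPy arrays of filter coefficients, since it only uses scalar multiplication and addition.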
To further verify the performance of the ADMFT, we used white blocks to simulate clouds of different transparencies and tested the trackers on the simulated data. For each tracker, we selected several targets that it tracked well and occluded half of each target with white blocks of transparency 0.2, 0.5, and 0.8. As shown in Fig. 14, our method still tracks stably under occlusion with a transparency value of 0.8.
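The simulated occlusion can be reproduced with simple alpha blending. The sketch below assumes that the "transparency value" is the blending weight of the white block (0 leaves the target untouched, 1 hides it completely) and that the covered half is the left half of the box; both readings, and the function name, are assumptions rather than details given in the text.

```python
import numpy as np


def occlude_half(frame, box, alpha):
    """Blend a white block over the left half of a target box.

    frame: uint8 image (H x W or H x W x C); box: (x, y, w, h) in pixels.
    alpha: weight of the white block (assumed reading of "transparency value").
    """
    x, y, w, h = box
    out = frame.astype(np.float64)        # float copy; the input frame is untouched
    rows = slice(y, y + h)
    cols = slice(x, x + w // 2)           # left half of the box
    out[rows, cols] = (1.0 - alpha) * out[rows, cols] + alpha * 255.0
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)


if __name__ == "__main__":
    frame = np.full((64, 64), 80, dtype=np.uint8)
    for alpha in (0.2, 0.5, 0.8):         # the three levels used in the paper's test
        out = occlude_half(frame, (20, 20, 16, 16), alpha)
        print(alpha, out[25, 22], out[25, 30])   # covered pixel vs uncovered pixel
```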
CFME proposes replacing the tracker's prediction with a short-term motion state prediction when model drift occurs. In tests on the four cloud occlusion videos, CFME proves to be an efficient tracking method that performs well when the motion state is simple. However, as shown in Fig. 15, it cannot track the target normally under occlusion when the motion state is complex.
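A short-term motion prediction of this kind can be as simple as extrapolating the recent trajectory. The sketch below fits a straight line to the last k target centers by least squares and extrapolates one frame ahead; it is a swapped-in illustration, not CFME's actual predictor, and the window length k = 5 and the function name are assumptions. It also shows why such a predictor struggles with complex motion: on a turning path, the straight-line guess leaves the curve.

```python
import numpy as np


def extrapolate_position(history, k=5):
    """Predict the next (x, y) center from the last k centers (oldest first)."""
    pts = np.asarray(history[-k:], dtype=float)
    if len(pts) < 2:                      # not enough points to estimate a velocity
        return float(pts[-1, 0]), float(pts[-1, 1])
    t = np.arange(len(pts), dtype=float)
    vx, x0 = np.polyfit(t, pts[:, 0], 1)  # least-squares line x(t) = vx * t + x0
    vy, y0 = np.polyfit(t, pts[:, 1], 1)
    t_next = float(len(pts))
    return float(vx * t_next + x0), float(vy * t_next + y0)


if __name__ == "__main__":
    straight = [(0, 0), (2, 1), (4, 2), (6, 3)]
    print(extrapolate_position(straight))          # continues the line to about (8, 4)
    turning = [(50 * np.cos(a), 50 * np.sin(a)) for a in np.linspace(0, 1.2, 6)]
    print(extrapolate_position(turning[:-1]), turning[-1])  # linear guess misses the arc
```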

D. Ablation Experiment
The temporal regularization strategy improves the antidrift ability of a tracker because features extracted over a period of time are more stable than those extracted from a single frame. Here, we compare the distributions of the response peak before and after introducing temporal regularization. As the peak diagram in Fig. 16 shows, the peak fluctuates only slightly with our method, which results in a better anti-interference effect.
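Peak fluctuation can be quantified with the average peak-to-correlation energy (APCE), which the article uses (see the Conclusion) to judge whether the trajectory needs correction: APCE = |F_max - F_min|^2 / mean((F - F_min)^2). A single sharp peak gives a high value, while a flat or multi-peaked response, as under cloud occlusion, gives a low one. Whether Fig. 16 plots APCE or the raw peak value is not stated here; the sketch below simply computes APCE for a response map.

```python
import numpy as np


def apce(response):
    """Average peak-to-correlation energy of a 2-D response map."""
    r = np.asarray(response, dtype=float)
    f_max, f_min = r.max(), r.min()
    energy = np.mean((r - f_min) ** 2)
    if energy == 0.0:                 # perfectly flat map: no usable peak
        return 0.0
    return float((f_max - f_min) ** 2 / energy)


if __name__ == "__main__":
    sharp = np.zeros((10, 10))
    sharp[5, 5] = 1.0                 # one clean peak
    split = np.zeros((10, 10))
    split[2, 2] = split[7, 7] = 1.0   # two competing peaks (ambiguous response)
    print(apce(sharp), apce(split))   # the ambiguous map scores lower
```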
The motion estimator also helps to correct the tracking results. To test its contribution, we ran only the antidrift BACF with temporal regularization, without the motion estimator, on cloudy_less_01(a). The tracking result is shown in the left part of Fig. 17, and the result of the fused tracker is shown in the right part.
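The Conclusion names a Kalman filter (KF) as the motion component. A constant-velocity KF over the state [x, y, vx, vy] is a common choice for aircraft moving along smooth paths; the sketch below is one such filter. The state layout, the noise levels `q` and `r`, the initial covariance, and the class name are assumptions, since this section does not restate the ADMFT's motion model.

```python
import numpy as np


class ConstantVelocityKF:
    """2-D constant-velocity Kalman filter over the state [x, y, vx, vy]."""

    def __init__(self, x0, y0, q=1e-2, r=1.0):
        self.x = np.array([x0, y0, 0.0, 0.0])     # start at rest at (x0, y0)
        self.P = np.eye(4) * 10.0                 # broad initial uncertainty
        self.F = np.array([[1, 0, 1, 0],
                           [0, 1, 0, 1],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], dtype=float)  # one-frame motion model
        self.H = np.array([[1, 0, 0, 0],
                           [0, 1, 0, 0]], dtype=float)  # only the center is measured
        self.Q = np.eye(4) * q                    # process noise
        self.R = np.eye(2) * r                    # measurement noise

    def predict(self):
        """Propagate the state one frame ahead and return the predicted center."""
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2].copy()

    def update(self, zx, zy):
        """Correct the state with a measured center (e.g. the CF output)."""
        innovation = np.array([zx, zy]) - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)  # Kalman gain
        self.x = self.x + K @ innovation
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x[:2].copy()


if __name__ == "__main__":
    kf = ConstantVelocityKF(0.0, 0.0)
    for t in range(30):                  # aircraft moving at 2 px/frame in x, 1 px/frame in y
        kf.predict()
        kf.update(2.0 * t, 1.0 * t)
    print(kf.predict())                  # close to (60, 30) for the next frame
```

During occlusion, `predict` can be called without `update`, so the estimate keeps following the learned velocity while the appearance model is unreliable.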
These comparisons of the motion estimator and the temporal regularization strategy show that both components improve the tracker. The motion estimator incorporates motion information, which is consistently stable in aircraft tracking, and the temporal regularization strategy helps the tracker learn features that remain stable over a period of time.
The features of the target change when it enters a cloud, and during this process the target states serve as essential reference values for updating the model. Model drift often occurs just before and after the target enters the cloud, and the greater the degree of occlusion, the more pronounced the drift. Some tracker models drift during the update process; our method restricts the model update by means of L2 regularization and uses the motion state to correct the trajectory, thereby avoiding tracking failure caused by drift.
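The trajectory-correction side of this mechanism can be sketched as a simple decision rule. The Conclusion states that the APCE is used to judge whether the trajectory needs correction; the sketch below assumes one plausible form, in which the correlation filter result is kept while the current APCE stays above a fraction of the mean APCE of earlier frames, and the motion prediction is used otherwise. The ratio 0.5 and the function name are hypothetical, and the L2-regularized model update is not shown.

```python
import numpy as np


def fuse_position(cf_pos, kf_pos, apce_now, apce_history, ratio=0.5):
    """Pick this frame's target position from the CF result and the motion estimate.

    Returns (position, corrected), where corrected is True if the motion
    prediction replaced the correlation filter result.
    """
    if len(apce_history) == 0:            # nothing to compare against yet
        return cf_pos, False
    if apce_now >= ratio * float(np.mean(apce_history)):
        return cf_pos, False              # response is sharp enough: trust the CF
    return kf_pos, True                   # response collapsed (e.g. cloud): use motion


if __name__ == "__main__":
    history = [120.0, 110.0, 115.0]       # hypothetical APCE values of clear frames
    print(fuse_position((100, 50), (103, 51), 112.0, history))  # keeps the CF result
    print(fuse_position((140, 70), (106, 52), 30.0, history))   # uses the motion estimate
```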

IV. DISCUSSION
The ADMFT was implemented in Python and run on a PC with a 3.00 GHz CPU and 16 GB of memory. Our experiments show that the ADMFT can process video data at more than 83 fps. In summary, the target features in remote sensing images remain largely unchanged, and the main cause of model drift is cloud occlusion. We address this type of drift by learning features over a time series and correcting the target model. Moreover, during ordinary aircraft flight, the proposed method effectively extracts the features of the target aircraft, addressing the aircraft tracking problem in complex scenes. The ADMFT is therefore broadly applicable as an aircraft tracker for satellite video. However, the ADMFT is designed only for aircraft, whose size is fixed, so we did not consider the scaling problem or target labeling with rotated rectangles. In addition, the tracking efficiency could be further improved by implementing the algorithm on a parallel processing system.
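As a rough check on real-time capability, the Jilin-1 video parameters given in the Introduction (10 fps, clips of up to 180 s) can be combined with the reported speed. This back-of-the-envelope calculation assumes the reported rate holds over a whole clip and ignores video decoding and I/O:

$$
N = 180\,\mathrm{s} \times 10\,\mathrm{fps} = 1800 \text{ frames}, \qquad
t_{\mathrm{proc}} \le \frac{1800 \text{ frames}}{83\,\mathrm{fps}} \approx 21.7\,\mathrm{s}, \qquad
\frac{83\,\mathrm{fps}}{10\,\mathrm{fps}} = 8.3 .
$$

Under these assumptions, a full-length clip would be processed in well under its own 180 s duration, at no less than 8.3 times the video frame rate.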

V. CONCLUSION
This article describes an effective method for tracking moving aircraft in satellite videos. We design a BACF based on temporal regularization to address the model drift caused by cloud occlusion and use the ADMM to speed up the solution process. In addition, the KF helps improve tracking accuracy, and the APCE is used to judge whether the trajectory needs to be corrected. The motion information corrects object trajectories in simple environments, and temporal regularization helps resist the influence of occlusion. We tested our method on satellite videos by tracking 20 moving aircraft of different sizes and in different dynamic states. The proposed ADMFT algorithm achieved better tracking accuracy than the advanced existing algorithms it was compared with; in particular, it tracks targets under cloud cover better than the other methods. The ADMFT exhibits good robustness and can handle various remote sensing video environments for aircraft. In future work, we will further test its tracking capabilities on other remote sensing targets, such as ships and ground vehicles.