GDDS: Python software for GNSS data download

With the rapid development of global navigation satellite systems (GNSS), GNSS data products have been widely used in high-precision positioning and navigation applications. They are typically downloaded from the international GNSS service (IGS) analysis centers and continuously operating reference stations (CORS). However, the conventional GNSS data download method is cumbersome, repetitive, and time-consuming, and it struggles to meet the demand for rapid acquisition of multi-source data products. Therefore, we have developed GNSS data download software in Python, which provides an interactive interface on Windows and Linux operating systems for efficient and stable downloading of large amounts of GNSS data. The software includes five main function modules: Global IGS Data, Post-Processing Product, Regional CORS Data, Custom Download, and Data Decompression. It offers diverse data products, map interaction support, and station information retrieval, which can meet the needs of different users.


Introduction
Global navigation satellite system (GNSS) data products are the fundamental prerequisite for high-precision positioning and navigation applications. With the rapid development of high-precision data processing, GNSS data products are becoming increasingly important (Montenbruck et al. 2017; Bruyninx et al. 2019). On the one hand, the scope of GNSS services continues to expand, and GNSS has become an essential alternative to traditional technical means (Farrell and Wendel 2017). On the other hand, the development of GNSS itself drives rapid growth in the variety and quantity of GNSS data products to meet the research needs of atmospheric sounding, reference frames, Earth rotation, and crustal movement (Beutler et al. 2009; Yu et al. 2019; Vaquero-Martínez et al. 2020; Hu et al. 2020).
The international GNSS service (IGS) analysis centers and continuously operating reference stations (CORS) provide GNSS data products free of charge to users worldwide. Users typically visit the target uniform resource locator (URL), navigate through different file paths to find the corresponding data products, and finally click each target filename to download the files one by one. This conventional mode is cumbersome, repetitive, and time-consuming, especially for long-term, multi-station download tasks. To improve the efficiency and flexibility of download services, three kinds of download methods are commonly used: scripts, tools, and webpages. Download scripts are batch files written in scripting languages such as Shell; for example, GAMIT (Herring et al. 2018) and PRIDE PPP-AR (Geng et al. 2019) implement their download functions this way. However, the script file must be modified whenever the download target changes. Download tools are software packages developed in languages such as C/C++, including RTKLIB/RTKGET (Takasu and Yasuda 2009) and GAMP II-GOOD (Zhou 2022). However, RTKLIB/RTKGET only supports downloading IGS data products, and GAMP II-GOOD lacks a visual interface. Download webpages are … , and other research institutions provide download service pages on their official websites, but most web-based download services only support data products released by their own institution.

The GPS Toolbox is a topical collection dedicated to highlighting algorithms and source code utilized by GNSS engineers and scientists. If you have a program or software package you would like to share with our readers, please submit a paper to the GPS Toolbox collection or email ngs.gps.toolbox@noaa.gov for more information. To download source code from this or any GPS Toolbox paper, visit our website at http://geodesy.noaa.gov/gps-toolbox.
Considering the shortcomings of the GNSS data download methods mentioned above, we developed easy-to-use GNSS data download software to obtain various data products quickly. It is designed as a multi-platform application and provides an interactive interface on Windows and Linux operating systems. The following sections first introduce the GNSS-data URL, which is essential for data download. Afterward, we describe the software design in detail and then present the software features. Furthermore, we evaluate the download performance of the software from different perspectives. Finally, conclusions are drawn.

GNSS-data URL
The URL is the basis of data download: it determines whether a download task is correct and directly affects the final download result. We therefore take the GNSS-data URL as the starting point and clarify its structure to support automatic URL configuration.
A GNSS-data URL identifies the location of a GNSS data file, and each address it represents is unique. A complete URL consists of the protocol, domain name, file directory, and file name, as shown in Fig. 1.
• A protocol describes the transfer method used to download GNSS data products from the server to the local machine. Standard protocols are FTP, HTTP, and HTTPS. Since different GNSS data download sources use different protocols, the appropriate protocol must be adopted in the download process.
• A domain name is the name or IP address of the server where the target file is stored, which depends on the download source. GNSS data download sources are divided into global and regional sources according to their service scope: global sources are provided by IGS, whereas regional sources come from CORS networks.
• A file directory is the path where the target GNSS data are stored on the server. Taking the IGS data center of Wuhan University (WHU) as an example, observation data, meteorological data, and broadcast ephemerides are stored under /pub/gps/data and subdivided by year, day of year (DOY), and file type, whereas post-processing products are mainly stored under /pub/gps/products and classified by GPS week.
• A file name is the specific name of the target data. Although GNSS data products come from different CORS sites and processing agencies, they generally follow common file naming conventions. For example, in the RINEX 2 file name "abmf3000.21o.gz", "abmf" is the station name, "300" is the DOY (the 300th day of the year), and the final "0" is the session number (0 denotes a daily file); "21" is the abbreviation of 2021, "o" indicates an observation file, and "gz" is the extension of the compressed file.
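The URL structure above can be illustrated with a short Python sketch that splits a RINEX 2 short file name into the fields just described and joins the four URL parts. It is a minimal illustration, not GDDS code: the function names are hypothetical, the host name `igs.gnsswhu.cn` and the `daily` sub-directory in the usage lines are assumptions about the WHU archive layout, and the regular expression covers only RINEX 2 short names (RINEX 3 long names follow a different convention).

```python
import re
from dataclasses import dataclass
from typing import Optional

# RINEX 2 short file name: ssssdddf.yyt[.ext]
#   ssss = station, ddd = day of year, f = session (0 = daily file, a-x = hourly),
#   yy = two-digit year, t = file type (o = observation, n = GPS nav, d = CRINEX, ...)
RINEX2_NAME = re.compile(
    r"^(?P<station>[a-z0-9]{4})(?P<doy>\d{3})(?P<session>[0-9a-x])"
    r"\.(?P<yy>\d{2})(?P<type>[a-z])(?:\.(?P<ext>gz|Z))?$",
    re.IGNORECASE,
)


@dataclass
class Rinex2Name:
    station: str
    year: int
    doy: int
    session: str
    file_type: str
    compression: Optional[str]


def parse_rinex2_name(name: str) -> Rinex2Name:
    """Split a RINEX 2 short file name into its fields."""
    m = RINEX2_NAME.match(name)
    if m is None:
        raise ValueError(f"not a RINEX 2 short file name: {name!r}")
    yy = int(m["yy"])
    year = 1900 + yy if yy >= 80 else 2000 + yy  # RINEX 2 convention: 80-99 -> 19xx
    return Rinex2Name(m["station"].lower(), year, int(m["doy"]),
                      m["session"].lower(), m["type"].lower(), m["ext"])


def build_url(protocol: str, domain: str, directory: str, filename: str) -> str:
    """Join protocol, domain name, file directory, and file name into one URL."""
    return f"{protocol}://{domain}/{directory.strip('/')}/{filename}"


print(parse_rinex2_name("abmf3000.21o.gz"))
# Assumed host and directory layout, for illustration only:
print(build_url("ftp", "igs.gnsswhu.cn", "/pub/gps/data/daily/2021/300/21o/", "abmf3000.21o.gz"))
# ftp://igs.gnsswhu.cn/pub/gps/data/daily/2021/300/21o/abmf3000.21o.gz
```

Keeping the parser and the URL builder separate mirrors the four-part URL structure: the same file name can be combined with a different protocol, domain, or directory when the download source changes.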

Software development
Since the GNSS data download process revolves around the GNSS-data URL, we developed the GNSS data download software (GDDS), which integrates the configuration, request, connection, and transmission of GNSS-data URLs. We first designed the data download process of the software and then implemented the software based on this process.

Software design
We use Python to design GDDS, which automatically configures the URL of the target product according to user demands, sends the URL request within a secure network session, and downloads the GNSS data. The GDDS design process consists of user interaction, URL configuration, simulated login, and file download, as illustrated in Fig. 2.
• The user interaction part provides the interface between user and software. Users can set various download options on this interface, such as download source, time range, and file type; if the file type is observation data, the observation stations must also be added. The settings are then passed to the back end through the Qt signal-slot mechanism to complete the interaction.
• The URL configuration part processes the download settings and formulates the corresponding URLs. The matching protocol and domain name are looked up according to the selected download source. Then, using the file storage structure and file naming rules of the target server, the file directory and file name are derived from the time range, file type, and station name. Finally, these parts are merged into the complete URL.
• The simulated login part automatically performs user authentication with the server and creates a secure and reliable network session. It determines the URL protocol type, creates the corresponding session, submits the user credentials (account and password) stored in the software to the server using the appropriate protocol, and completes the identity verification.
• The file download part downloads the target files to the local disk. Based on the authenticated session, it sends download requests to the target server, establishes the data transmission channel, writes the remote files to the local disk, and displays and records the process with a progress bar and a log. In addition, multi-thread download, breakpoint resume, and exception handling are integrated to ensure the efficiency and stability of the download process.
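The breakpoint resume and exception handling mentioned above can be sketched with the standard library alone. This is an assumed design, not the GDDS implementation: it uses an HTTP `Range` header to request only the missing bytes of a partially downloaded file and retries network errors with exponential back-off. The function names are hypothetical, the sketch covers HTTP(S) only (for FTP sources, the standard `ftplib.FTP.retrbinary` method accepts a `rest` offset for the same purpose), and the simulated login step is omitted.

```python
import os
import time
import urllib.request
from urllib.error import HTTPError


def build_resume_request(url: str, local_path: str) -> urllib.request.Request:
    """Create a GET request; if part of the file is already on disk, ask only for the rest."""
    req = urllib.request.Request(url)
    if os.path.exists(local_path):
        offset = os.path.getsize(local_path)
        if offset > 0:
            req.add_header("Range", f"bytes={offset}-")  # HTTP range request
    return req


def download_with_resume(url: str, local_path: str, retries: int = 3,
                         backoff: float = 2.0, timeout: float = 30.0) -> str:
    """Download url to local_path, resuming partial files and retrying on network errors."""
    if retries < 1:
        raise ValueError("retries must be at least 1")
    for attempt in range(1, retries + 1):
        try:
            req = build_resume_request(url, local_path)
            with urllib.request.urlopen(req, timeout=timeout) as resp:
                # 206 Partial Content: the server honoured Range, so append.
                # Anything else (e.g. 200): the whole file is coming, so overwrite.
                mode = "ab" if getattr(resp, "status", None) == 206 else "wb"
                with open(local_path, mode) as fh:
                    while chunk := resp.read(64 * 1024):
                        fh.write(chunk)
            return local_path
        except HTTPError as exc:
            if exc.code == 416:  # range not satisfiable: assume the file is already complete
                return local_path
            if attempt == retries:
                raise
        except OSError:  # URLError, timeouts, dropped connections
            if attempt == retries:
                raise
        time.sleep(backoff * 2 ** (attempt - 1))  # exponential back-off before retrying
```

Because a server may ignore `Range` and return the whole file with status 200, the sketch overwrites rather than appends in that case, which avoids duplicating bytes in the local file.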

Software implementation
Based on the above design process, we use the Python GUI library PyQt5 to develop the GDDS function modules, i.e., Global IGS Data, Post-Processing Product, Regional CORS Data, Custom Download, and Data Decompression.
• Global IGS Data: downloads various data provided by the IGS data centers, such as station observation data, meteorological data, and broadcast ephemerides;
• Post-Processing Product: downloads precise products released by the IGS analysis centers and other relevant institutions, such as precise ephemerides and precise clock offsets;
• Regional CORS Data: downloads CORS data, mainly station observation data and, for some networks, also meteorological data and post-processing products;
• Custom Download: downloads GNSS data products from the target GNSS-data URLs configured by the user;
• Data Decompression: batch-decompresses UNIX compress and gzip files (with Z and gz extensions) and CRINEX files (with d and crx extensions).
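Batch decompression of gzip files, as in the Data Decompression module, can be sketched with Python's `gzip` and `shutil` modules. This is an illustrative sketch rather than the GDDS code, and the function name `gunzip_all` is hypothetical. It handles only `.gz` files: the standard library has no decoder for the LZW format of UNIX `.Z` files (the `gzip -d` command-line tool can decompress them), and CRINEX (`d`/`crx`) files must be converted with Hatanaka's CRX2RNX program.

```python
import gzip
import shutil
from pathlib import Path


def gunzip_all(folder: str, remove_source: bool = False) -> list:
    """Batch-decompress every *.gz file in `folder`; return the paths of the outputs."""
    outputs = []
    for gz_path in sorted(Path(folder).glob("*.gz")):
        out_path = gz_path.with_suffix("")  # abmf3000.21o.gz -> abmf3000.21o
        with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
            shutil.copyfileobj(src, dst)  # stream in chunks, so large files need little memory
        if remove_source:
            gz_path.unlink()
        outputs.append(out_path)
    return outputs
```

Streaming with `shutil.copyfileobj` keeps memory use flat even for multi-megabyte clock or troposphere files.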

Software features
GDDS has advantages that help users search and obtain target data conveniently and quickly to improve the quality of download services. The following three representative features are described in turn.

Diverse data products
To better support multi-source GNSS data product download, GDDS provides four data download modules: Global IGS Data, Post-Processing Product, Regional CORS Data, and Custom Download. Each module distinguishes between various download sources and file types, so the available data products are rich and diverse. The URLs of the different download sources (global IGS or regional CORS) are listed in Table 1.
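Table 1 is not reproduced here, but a lookup table of this kind combined with a date range is enough to generate download URLs automatically. In the sketch below, the registry entries (protocol, host, and directory template for WHU and CDDIS daily observation files) are assumptions for illustration and should be checked against Table 1 and each server's current layout; CDDIS, for example, additionally requires an Earthdata login. The function name `observation_urls` is hypothetical.

```python
from datetime import date, timedelta

# Illustrative registry (assumed values, not copied from Table 1):
# source -> (protocol, domain name, directory template for daily observation files)
SOURCES = {
    "WHU": ("ftp", "igs.gnsswhu.cn", "pub/gps/data/daily/{year}/{doy:03d}/{yy:02d}o"),
    "CDDIS": ("https", "cddis.nasa.gov", "archive/gnss/data/daily/{year}/{doy:03d}/{yy:02d}o"),
}


def observation_urls(source, stations, start, end, ext="gz"):
    """Expand a source, a station list, and a date range into daily RINEX 2 observation URLs."""
    protocol, domain, dir_template = SOURCES[source]
    urls = []
    day = start
    while day <= end:
        doy = day.timetuple().tm_yday  # day of year, 1-366
        yy = day.year % 100
        directory = dir_template.format(year=day.year, doy=doy, yy=yy)
        for station in stations:
            filename = f"{station.lower()}{doy:03d}0.{yy:02d}o.{ext}"  # session 0 = daily file
            urls.append(f"{protocol}://{domain}/{directory}/{filename}")
        day += timedelta(days=1)
    return urls


for url in observation_urls("WHU", ["ABMF", "ALGO"], date(2021, 10, 27), date(2021, 10, 28)):
    print(url)
```

Adding a source then means adding one registry entry rather than writing new download code, which is the practical benefit of keeping the source URLs in a table.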

Map interaction support
To show the distribution of stations intuitively, the Baidu Map API is integrated into GDDS to mark the locations of all stations on a map. This station visualization function has a wide range of applications: it can be used to study the vertical motion of stations at different geographical locations (Fuhrmann et al. 2014) and to support research on how station distribution affects Earth rotation parameter estimation, satellite orbit determination, water vapor tomography, and reference frame transformation (Ferland and Piraszewski 2009; Pedro et al. 2018; Zajdel et al. 2019). To meet the needs of different users, the map also supports interactive functions such as distance measurement, frame selection, and area calculation. Distance measurement gives the straight-line distance between any two stations on the map, which is useful for studying the optimal observation duration and solution strategy for different baseline lengths (Amiri and Tiberius 2007). Frame selection quickly selects all stations in a target area with three options (rectangle, polyline, and circle), which suits regional ionospheric modeling and crustal movement research (Hu et al. 2020; Kaftan et al. 2021). Area calculation computes the actual area of a selected region, which supports the analysis of how station density affects deformation monitoring (Turen and Sanli 2019).
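The distance measurement and circular frame selection described above reduce to a great-circle distance computation. The sketch below is independent of the Baidu Map API, which provides its own measuring tools and coordinate system; it swaps in the haversine formula on a sphere of radius 6371 km, which differs from ellipsoidal distances by up to about 0.5%. Function names and the station coordinates in the usage example are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))  # clamp rounding error


def stations_in_circle(stations, center_lat, center_lon, radius_km):
    """Circle frame selection: names of stations within radius_km of the centre."""
    return sorted(name for name, (lat, lon) in stations.items()
                  if great_circle_km(center_lat, center_lon, lat, lon) <= radius_km)
```

For example, `stations_in_circle({"A": (0.0, 0.0), "B": (0.0, 2.0), "C": (0.0, 0.5)}, 0.0, 0.0, 100.0)` returns `["A", "C"]`, because station B lies about 222 km from the centre.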

Station information retrieval
GDDS uses a web crawler to obtain observation files in batches according to the webpage structure of each download source. It then extracts the station coordinates and the receiver and antenna types of each station and summarizes them in a station information database. Based on this database, users can quickly obtain basic information about a target station. In addition, stations with specific attributes can be retrieved and matched, for example all stations with the same receiver type, which is useful for studying the relationship between receiver type and the pseudorange biases of satellite navigation systems (Choi and Lee 2018; Zhang et al. 2021).
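The article does not detail how the crawler builds the station database, but the extraction and matching steps can be sketched from the fixed-column RINEX 2 header, where each record's label occupies columns 61-80. The function names and dictionary layout below are assumptions for illustration; real headers may contain non-standard spacing that a production parser would have to tolerate.

```python
def parse_rinex2_header(lines):
    """Pull station metadata out of RINEX 2 header lines (label in columns 61-80)."""
    info = {}
    for line in lines:
        label = line[60:80].strip()
        if label == "MARKER NAME":
            info["marker"] = line[0:60].strip()
        elif label == "REC # / TYPE / VERS":
            info["receiver"] = line[20:40].strip()  # receiver type: columns 21-40
        elif label == "ANT # / TYPE":
            info["antenna"] = line[20:40].strip()  # antenna type (+ radome): columns 21-40
        elif label == "APPROX POSITION XYZ":
            info["xyz"] = tuple(float(line[i:i + 14]) for i in (0, 14, 28))  # ECEF, metres
        elif label == "END OF HEADER":
            break
    return info


def stations_with_receiver(database, receiver_type):
    """Return the names of all stations whose receiver type matches exactly."""
    return sorted(name for name, info in database.items()
                  if info.get("receiver") == receiver_type)
```

Applying `parse_rinex2_header` to the first lines of each downloaded observation file and storing the results in a dictionary keyed by station name gives a database that `stations_with_receiver` can query.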

Software performance evaluation
To further evaluate the data download performance of GDDS, the three modules "Global IGS Data," "Post-Processing Product," and "Regional CORS Data" are tested and analyzed. The test environment is: ① location: Nanchang, Jiangxi, China; ② computer: Intel Core i5-7200U CPU @ 2.50 GHz, 8.00 GB memory, Windows 10 64-bit; ③ network: East China University of Technology (ECUT) campus network, with an average speed of about 100 Mb/s (12.5 MB/s).

File download for different sources
In the "Global IGS Data" module, we compare the download efficiency of identical files from different IGS data centers so that users can choose a suitable download source. Because of current network restrictions, not all IGS global data centers could be accessed; six were tested: China's WHU, the US CDDIS, France's IGN, Europe's ESA, the US SIO, and South Korea's KASI. For each data center, the download time of three months of GPS broadcast ephemeris files (abbreviated as N, with a mean size of 66 KB/day) was averaged over ten tests, and the results are illustrated in Fig. 3.
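The averaging over repeated tests can be reproduced with a small timing harness. In this sketch, `download` is any zero-argument callable that performs one complete download task (a hypothetical placeholder, not a GDDS function); the harness reports the mean and sample standard deviation of the wall-clock times, and the spread helps judge whether differences between data centers exceed run-to-run noise.

```python
import statistics
import time


def mean_download_time(download, repeats=10):
    """Call download() `repeats` times; return (mean, sample standard deviation) in seconds."""
    if repeats < 2:
        raise ValueError("need at least two runs to estimate the spread")
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()  # monotonic, high-resolution clock
        download()
        durations.append(time.perf_counter() - start)
    return statistics.mean(durations), statistics.stdev(durations)
```

If the downloader skips files that already exist locally, the files should be deleted between runs so that every run performs the full download.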
As shown in Fig. 3, service capability differs noticeably among the IGS data centers. In the ECUT campus network test environment, WHU performs best, followed by CDDIS, IGN, and ESA, and finally SIO and KASI. One reason is that the WHU server and the test environment are in the same country, so network latency is low. In addition, the WHU server has a high hardware configuration, supports sufficient concurrent connections, and is suitable for multi-thread downloads. The other servers, however, are far from the IP address that sends the requests, so the requests must pass through many intermediate network nodes before reaching the server, which delays the responses. Furthermore, the hardware configuration of each server also limits the overall access speed. When the number of connections reaches the upper limit allowed by a server, the server becomes overloaded and temporarily stops responding; subsequent download requests are not answered until the current ones have been processed.

File download for different sizes
In the "Post-Processing Product" module, we download GNSS products of different types and time spans. Because WHU performed best on the campus network, it is used as the download source here. The selected files are BIA files (OSB Bias-SINEX, 10 KB/day), SP3 files (precise ephemeris, 93 KB/day), I files (global ionosphere map, 144 KB/day), SNX files (weekly solution, 1.63 MB/day), CLK files (precise clock, 3.05 MB/day), and TRO files (troposphere delay, 3.16 MB/day). The download period is from January to May, and the average time of ten tests is recorded. The statistical results are shown in Fig. 4.
As depicted in Fig. 4, GDDS needs only a short time to download KB-sized files: even quarterly download tasks are completed within tens of seconds. Downloading MB-sized files takes longer, but monthly download tasks still take only a few minutes. Comparing the file sizes, the download efficiency of GDDS decreases as file size increases. The main reason is that the download speed of KB-sized files is limited mostly by I/O throughput, whereas for MB-sized files the I/O limit is weak and the download speed is governed mainly by the network transmission rate. When the network transmission rate is stable, multi-thread download effectively improves I/O throughput, so GDDS achieves good download efficiency, especially for small files. Since GNSS data products are generally KB-sized or MB-sized files, the overall download efficiency of GDDS is high.
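A simple first-order model, which is our own assumption rather than the authors' analysis, makes the two regimes explicit. Suppose each of N files of mean size s costs a fixed per-request overhead t_0 (connection set-up, server response, disk write) plus its transfer time over an available bandwidth B, and that k threads overlap the overheads perfectly while sharing B:

```latex
% Illustrative first-order model (assumption, not from the article):
%   N   number of files         s   mean file size
%   t_0 per-file overhead       B   available bandwidth     k   parallel threads
t_{\mathrm{total}} \approx \frac{N\,t_0}{k} + \frac{N\,s}{B},
\qquad
\frac{N\,s}{B} = \frac{30 \times 3.05\ \mathrm{MB}}{12.5\ \mathrm{MB/s}} \approx 7.3\ \mathrm{s}
```

When s/B is small compared with t_0 (KB-sized files), the first term dominates and adding threads helps; when s/B is large (MB-sized files), the second term dominates and the total time approaches the bandwidth bound N s/B. For one month of CLK files on the 12.5 MB/s test network, that bound is about 7.3 s, so measured times well above it would point to per-request overhead or an effective throughput below the nominal network rate.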

File download for different stations
In the "Regional CORS Data" module, we download one day of observation data for different numbers of stations. The download objects are compressed observation files (RINEX 2.11 d files) of the US CORS network on January 1, 2021. As in the previous tests, the average download time is computed for each number of station files, and the download speed is recorded at the same time. As Fig. 5 shows, as the number of files increases, the download speed of GDDS gradually increases and finally stabilizes. This behavior is closely related to multi-thread download, in which concurrent threads handle different download tasks: while one thread waits for I/O, the CPU switches to other threads, which significantly improves efficiency. In other words, multi-threading is most advantageous when there are many tasks. However, once the number of tasks exceeds a certain limit, the benefit of multi-threading reaches a bottleneck; the download speed is then limited mainly by the network transmission rate and changes only gradually.
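A thread pool is one common way to implement the multi-thread download described above. The sketch below uses `concurrent.futures.ThreadPoolExecutor`; `fetch` is a hypothetical callable that downloads one URL, and `max_workers=8` is an arbitrary default rather than a value from the article. Python threads suit this I/O-bound workload because blocking socket operations release the global interpreter lock, and capping the pool size also limits concurrent connections, which matters for servers that stop responding when overloaded.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def download_many(urls, fetch, max_workers=8):
    """Run fetch(url) for every URL on a thread pool; return {url: result or exception}."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, url): url for url in urls}
        for future in as_completed(futures):  # yields futures as they finish
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:  # record the failure and keep the other downloads going
                results[url] = exc
    return results
```

Returning exceptions instead of raising them lets a caller collect the failed URLs and schedule a second, retry-only pass once the batch is done.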

Conclusions
The open-source software GDDS overcomes the complexity and tedium of conventional website-based downloading and addresses the shortcomings of existing download methods. It can quickly obtain large volumes of multi-source GNSS data products. The software offers diverse data products, map interaction support, station information retrieval, and a visual interactive interface to satisfy the data download needs of individual users. Overall, it is highly practical for data download in GNSS scientific research and engineering applications. Nevertheless, some bugs and omissions may remain in GDDS, and the software will be continuously updated. The authors welcome comments and suggestions from readers and users.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third-party material in this article are included in the article's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit https://creativecommons.org/licenses/by/4.0/.

Code availability
The software is available on the GPS Toolbox website at: https://geodesy.noaa.gov/gps-toolbox.