The Inconsistent Pairs Between In Situ Observations of Near Surface Salinity and Multiple Remotely Sensed Salinity Data

This study employs three remotely sensed Sea Surface Salinity products to diagnose the “inconsistent pairs” between in situ observations of Near Surface Salinity from delayed‐mode tropical moored buoys and Argo floats and the satellite salinity over April 2015–December 2018. Using an adapted 3‐Sigma criterion and a unanimous voting strategy, we find that 11 (636) moored buoys (Argo floats) have at least one inconsistent observation pair in their time series and 1 (41) have more than five. In addition, the time series of 1 (25) moored buoy (Argo floats) are diagnosed as inconsistent series because of the large bias of the whole series. The continuous inconsistent values of moored buoy 8n38w correspond to a long run of shifted observations and can be flagged as bad observations. For Argo, the combined analysis of time series, trajectories, profiles, and analyzed fields implies that the inconsistent pairs between Argo and satellite products are closely related to mesoscale motions. The results suggest that sub‐footprint variability plays a dominant role in the inconsistent pairs of Argo, as most inconsistent profiles show a near‐surface mixed layer rather than near‐surface stratification. Furthermore, the continuous positive inconsistencies of Argo 4901466 highlight the temporal under‐sampling of the existing satellite salinity products.

Unlike in situ salinity, which is typically inferred from conductivity, remotely sensed SSS carries larger uncertainties owing to, for example, the low sensitivity of brightness temperature (TB) to SSS and contamination by radio frequency interference (RFI) (Reul et al., 2020). Therefore, near-surface in situ salinity is mostly taken as the reference in the retrieval or correction of SSS (Mu et al., 2019; Vernieres et al., 2014) and in the evaluation of SSS products (Tang et al., 2017). However, there can be large differences between remotely sensed SSS and in situ near surface salinity (NSS) (Lee, 2016; Yan et al., 2019), which are mostly dropped as outliers in the evaluation. These large differences are not necessarily induced by erroneous satellite observations. After all, the microwave sensor senses only the top ∼cm of the ocean, and the satellite salinity is in essence the average over the satellite footprint. Boutin et al. (2016) summarized these factors as sub-footprint variability and near-surface stratification. Nevertheless, the signatures of these two factors in the time series of in situ observations have not been systematically investigated. It remains unclear which floats are significantly affected by sub-footprint variability or near-surface stratification, and when and where this happened.
Besides, it is seldom considered that the differences may result from problematic measurements of in situ NSS. In fact, the comparison between the time series of TAO and SMAP SSS has revealed obvious shifts in several real-time TAO buoys (Tang et al., 2017). These erroneous observations have inevitably propagated into in situ-based data products (e.g., the EN4 products, as revealed in Bao et al., 2019), potentially harming scientific research. Thus Tang et al. (2017) suggested that satellite SSS could be used to perform real-time quality control (QC) of mooring salinity data.
Thanks to the efforts of scientists in improving the retrieval and debiasing algorithms, remotely sensed SSS has become increasingly reliable and presents consistent patterns in the analysis of ocean phenomena (see Reul et al., 2020 and the references therein). This paper therefore takes the time series of remotely sensed SSS as the reference and systematically diagnoses and analyzes the inconsistent pairs between in situ NSS and satellite SSS. Two kinds of in situ data are considered: the Global Tropical Moored Buoy Array (GTMBA) buoys and the Argo array. It must be noted that, unlike Guinehut et al. (2009), who use altimeter data in Argo quality control, we do not aim to perform QC on dubious in situ NSS based on satellite SSS. As revealed by Boutin et al. (2016), there are innate differences between the two kinds of measurements. Hereafter, the collocated inconsistent observations of in situ data and satellite data are defined as the "inconsistent pairs." Our main objective is to tabulate these inconsistent pairs and analyze their potential causes.
The paper is organized as follows. Section 2 introduces the data and the method used to diagnose inconsistent pairs. Sections 3 and 4 present the inconsistent observations of GTMBA and Argo NSS, respectively. Section 5 discusses and explains the collocated inconsistent pairs. Finally, Section 6 summarizes the results. The tabulated inconsistent pairs and the trajectories of the inconsistent series of Argo can be found in Appendix A.

Remotely Sensed SSS Data
Three remotely sensed SSS products are employed. a) The 8-day running SMAP V3.0 L3 data set (70-km version) on a 0.25° × 0.25° grid. The SMAP SSS product has two versions, namely the 40-km and 70-km versions. The 40-km version is directly retrieved by the geophysical model adapted from the Aquarius V5.0 retrieval algorithm. The 70-km version is obtained by smoothing the 40-km version and is recommended as the official version. Note that SMAP is more reliable in RFI-contaminated regions such as the Gulf of Mexico (Fournier et al., 2016), benefiting from its onboard RFI filtering device. b) The 9-day running SMOS Barcelona Expert Center (BEC) V2.0 L3 data set with a spatial resolution of 0.25° × 0.25°. It is retrieved by a non-Bayesian algorithm, which can effectively correct the spatial biases (Olmedo et al., 2017). The "low-resolution" data are used in this study; they are first binned onto a 0.25° rectangular grid and then smoothed with a 50 km Gaussian filter. c) The 9-day 25 × 25 km² debiased L3 v4 SMOS SSS of the Center of Expertise for CATDS (CEC)-Locean. This data set blends the RE05 version L2 data produced at the Data Production Center (CPDC) of the Center Aval de Traitement des Données SMOS (CATDS) (Vergely & Boutin, 2017) and introduces new corrections, including improved filtering and the correction of seasonally varying latitudinal systematic errors (Boutin et al., 2018). All three SSS products span April 2015 to December 2018.
Although the three selected data sets are sampled on the same grid resolution, the footprint (namely the ideal resolution determined by the satellite antenna) of SMOS (∼43 km on average) differs from that of SMAP (∼40 km). Besides, despite the different retrieval and correction algorithms used to produce the three data sets, the common auxiliary data (e.g., wind speed, salinity climatology) applied in their retrieval could result in common problems. Note that Aquarius data are not employed because of the abrupt breakdown of the instrument in 2015.

Floats of GTMBA and Argo
The GTMBA buoys provide moored salinity at a shallowest depth of 1 m. A total of 110 buoys are available in the selected temporal range. The salinity observations of GTMBA are extensively quality-controlled to ensure their accuracy (see https://www.pmel.noaa.gov/gtmba/data-quality-control for details). Only delayed-mode data are considered.
The Argo Program is part of the Global Ocean Observing System. Argo data were collected and made freely available by the International Argo Program and the national programs that contribute to it (https://argo.ucsd.edu, https://www.ocean-ops.org). A number of data centers work on the QC of Argo data and submit them to the official Global Data Assembly Center (GDAC) (Argo, 2000). However, the quality-controlled data assembled from different centers are in practice of uneven quality. We use the Global Observational Argo Data Set (V3.0) from the China Argo Real-time Data Center (Z. Li et al., 2019; Z. Q. Li et al., 2020). This data set applies a series of post-quality-control steps to all of the real-time and delayed-mode data retrieved from GDAC to improve the uniformity of quality. The post-quality-control comprises 15 QC processes. In addition to the 13 standard automatic QC processes in Wong et al. (2020), the automatic Racape spike tests and a delayed-time manual check are also carried out to ensure the high quality of this "quality re-controlled" Argo data set. Because the floats do not measure SSS in the strict sense, the shallowest measurements (<10 m) of Argo are taken as the NSS. A total of 6,812 floats (including those that were formerly active) provide NSS in the selected temporal range. Only the salinity observations whose quality is flagged "good" after the second-run QC (namely the variable "psal_adj" with "psal_qc" = 1) are used.
For comparison, we bilinearly interpolate the satellite SSS products onto the locations of the in situ arrays and form the collocated time series of GTMBA (Eulerian), Argo (Lagrangian), and the corresponding remotely sensed SSS. Since the CEC data cannot always provide a field on the exact date of a GTMBA or Argo observation (it is not a daily running data set), we use the field whose date is closest to that of the float. The risk of this approximation is quite small considering the unanimous voting diagnosis strategy in Section 2.2.
Note that one time series corresponds to one given float, which can eliminate the differences among platforms and can thus highlight the inconsistent observations of each float (Kennedy et al., 2011).
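The collocation step described above can be sketched as follows. This is a minimal illustration rather than the processing code used in the study: it assumes a regular latitude-longitude grid stored as a 2-D array indexed as field[lat, lon], and the function names bilinear_interp and nearest_date_index are hypothetical.

```python
import numpy as np


def bilinear_interp(lons, lats, field, lon, lat):
    """Bilinearly interpolate field[lat, lon] on a regular grid to one point.

    Assumes 1-D ascending coordinate arrays and a point inside the grid.
    A NaN in any of the four surrounding grid nodes gives a NaN result.
    """
    # Lower-left node of the grid cell that contains the point.
    i = int(np.clip(np.searchsorted(lons, lon, side="right") - 1, 0, len(lons) - 2))
    j = int(np.clip(np.searchsorted(lats, lat, side="right") - 1, 0, len(lats) - 2))
    # Fractional position of the point inside the cell (0..1 in each direction).
    tx = (lon - lons[i]) / (lons[i + 1] - lons[i])
    ty = (lat - lats[j]) / (lats[j + 1] - lats[j])
    return ((1 - tx) * (1 - ty) * field[j, i]
            + tx * (1 - ty) * field[j, i + 1]
            + (1 - tx) * ty * field[j + 1, i]
            + tx * ty * field[j + 1, i + 1])


def nearest_date_index(product_dates, obs_date):
    """Index of the product date closest to the observation date.

    Intended for a product without a field on every day, where the
    closest available date is used instead of an exact match.
    """
    dates = np.asarray(product_dates, dtype="datetime64[D]")
    gaps = np.abs(dates - np.datetime64(obs_date, "D"))
    return int(np.argmin(gaps))
```

For a daily running product the date lookup reduces to an exact match; the nearest-date fallback only matters for a product such as CEC that is not available on every day.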

Other Gridded Data
Two other satellite data sets are employed to show the horizontal structure. a) The Data Unification and Altimeter Combination System (DUACS) L4 reprocessed gridded global Sea Level Anomaly (SLA) (ID: SEALEVEL_GLO_PHY_L4_REP_OBSERVATIONS_008_047). This data set merges a number of altimetric observations to ensure its quality (Taburet & Pujol, 2020). The nominal resolution is 0.25° and daily. b) The Multi-scale Ultra-high Resolution (MUR) SST (Chin et al., 2017). This data set synthesizes SST observations from multiple platforms (in particular, the 1 km infrared data) through a multi-resolution variational analysis method. The nominal resolution is 0.01° and daily.
Two eddy-resolving (1/12°) (re-)analysis data sets are utilized to show the three-dimensional (3D) pattern. a) The HYbrid Coordinate Ocean Model/Navy Coupled Ocean Data Assimilation (HYCOM/NCODA) analysis (Chassignet et al., 2009). The 0 m temperature/salinity data are available. b) The GLORYS12V1 reanalysis (ID: GLOBAL_REANALYSIS_PHY_001_030) (Fernandez & Lellouche, 2018). The shallowest layer at 0.51 m is taken as the approximate surface for comparison.
The WOA13 monthly climatology, gridded on a 0.25° regular grid, is also utilized (Locarnini et al., 2012). This data set provides 3D temperature/salinity from 0 to 1,500 m.

Method for Diagnosing the Inconsistent Pairs
The main idea of the diagnosis is to flag the collocated NSS observations that are outliers w.r.t all three satellite products as "collocated inconsistent observation pairs" (CIOPs). More specifically, we first calculate the statistics of the difference between the in situ NSS and the collocated SMAP/BEC/CEC SSS time series and detect the outliers against each satellite product, and then take the intersection of the three outlier sets as the CIOPs. This is a strict unanimous voting strategy for combining the diagnoses from the individual satellite products. It may miss some inconsistent values, but in return it focuses on the most inconsistent pairs.
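The unanimous voting step amounts to a logical AND over the per-product outlier flags. A minimal sketch, assuming each product's outliers are already available as a boolean array aligned with the in situ time series (the function name unanimous_vote is hypothetical):

```python
import numpy as np


def unanimous_vote(outlier_masks):
    """Flag an observation only if every product marks it as an outlier.

    outlier_masks is a sequence of equal-length boolean arrays, one per
    satellite product; the result is their element-wise intersection.
    """
    stacked = np.vstack([np.asarray(mask, dtype=bool) for mask in outlier_masks])
    return stacked.all(axis=0)


# Illustrative flags for four observations (not real data).
smap_flags = [True, True, False, False]
bec_flags = [True, False, True, False]
cec_flags = [True, True, True, False]
ciops = unanimous_vote([smap_flags, bec_flags, cec_flags])
# Only the first observation is flagged by all three products.
```

The second and third observations illustrate the cost of the strict rule: each is flagged by two products but is not counted as a CIOP.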
The problem then lies in how to find the outliers for each product. The 3-Sigma criterion is a classical method for detecting outliers in in situ data (Wang et al., 2012). However, this method is often unsuitable when the series contains a long run of shifted values. Figure 1 presents such a run of shifted values revealed by Tang et al. (2017), which most likely results from the malfunction of the mooring sensor. The original statistics are subject to heavy bias (0.26 for CEC) and large standard deviation (STD) (0.81 for SMAP and CEC). If we divide the time series into three segments, we find that the outliers of GTMBA accumulate in the first segment, resulting in an extremely large mean and STD for not only the first segment but also the whole series. In this case, only a small part of the suspicious observations can be detected by the traditional 3-Sigma method (the filled light blue points). A simple adaptation mitigates this problem. Instead of directly using the statistics of the whole series, we divide the series equally into three segments and calculate the statistics of each segment. For each satellite product, we define the minimum STD of the three segments and the corresponding mean as the adjusted STD/mean. The adjusted STD of SMAP/BEC/CEC is thus 0.12/0.16/0.23. Using these adjusted statistics, we can diagnose all of the inconsistent observation pairs from March 2016 to August 2016 and the other three pairs in September 2015.
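The adapted criterion can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: the series is split in time order with numpy's array_split, the population STD (ddof=0) is used, missing values are ignored, and the function names are hypothetical. The synthetic series at the end is invented purely to show the effect of a long run of shifted values.

```python
import numpy as np


def three_sigma(diff, k=3.0):
    """Conventional criterion: mean and STD of the whole difference series."""
    diff = np.asarray(diff, dtype=float)
    return np.abs(diff - np.nanmean(diff)) > k * np.nanstd(diff)


def adapted_three_sigma(diff, n_segments=3, k=3.0):
    """Adapted criterion: use the statistics of the quietest segment.

    The series is divided into n_segments nearly equal parts; the segment
    with the minimum STD supplies the adjusted mean and adjusted STD, which
    are then applied to the whole series.
    """
    diff = np.asarray(diff, dtype=float)
    segments = np.array_split(diff, n_segments)
    stds = [np.nanstd(segment) for segment in segments]
    best = int(np.nanargmin(stds))
    mean_adj = float(np.nanmean(segments[best]))
    std_adj = float(stds[best])
    outliers = np.abs(diff - mean_adj) > k * std_adj
    return outliers, mean_adj, std_adj


# Synthetic in situ minus satellite differences (not real data):
# 90 samples of small fluctuations, 15 of which are shifted by +2.
diff = 0.1 * np.sin(np.arange(90.0))
diff[10:25] += 2.0

conventional = three_sigma(diff)
adapted, mean_adj, std_adj = adapted_three_sigma(diff)
print(int(conventional.sum()), int(adapted.sum()))  # prints: 0 15
```

In this synthetic case the shifted run inflates the whole-series STD so much that the conventional test flags nothing, whereas the adjusted statistics from an uncontaminated segment recover all 15 shifted samples. As noted in the text, the approach fails if every segment is contaminated.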
It is possible that all three segments are contaminated by long runs of shifted values. In this case, neither the conventional nor the adapted 3-Sigma criterion can identify all of the outliers. As a result, the in situ time series will be biased against the satellite products. Based on the series after the removal of the outliers diagnosed by the adapted 3-Sigma method, we quantitatively define the "collocated inconsistent series pair" (CISP) as one where the absolute bias of the whole in situ NSS time series (after the removal of CIOPs) exceeds a threshold w.r.t all three satellite products. Note that one CIOP corresponds to one NSS observation along with the collocated satellite SSS observation, while one CISP corresponds to the whole NSS series of one float and the collocated satellite SSS series. Considering that all three products focus on debiasing, we choose a threshold of 0.1 for GTMBA and 0.2 for Argo in this paper.
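The CISP test can be written as a short function. The sketch below assumes the NSS series, the collocated SSS series, and the CIOP mask are aligned arrays of equal length; the name is_cisp and the dictionary layout are hypothetical, while the thresholds of 0.1 (GTMBA) and 0.2 (Argo) are the ones stated above.

```python
import numpy as np

GTMBA_THRESHOLD = 0.1  # thresholds as chosen in the text
ARGO_THRESHOLD = 0.2


def is_cisp(nss, sss_by_product, ciop_mask, threshold):
    """Return (flag, biases) for one float.

    The bias against each product is the mean of (NSS - SSS) after the
    CIOPs are removed. The series is a CISP only if the absolute bias
    exceeds the threshold for every product, whatever the sign.
    """
    nss = np.asarray(nss, dtype=float)
    keep = ~np.asarray(ciop_mask, dtype=bool)
    biases = {
        name: float(np.nanmean(nss[keep] - np.asarray(sss, dtype=float)[keep]))
        for name, sss in sss_by_product.items()
    }
    flag = all(abs(bias) > threshold for bias in biases.values())
    return flag, biases


# Illustrative series (not real data): one CIOP at index 2 is excluded first.
nss = np.array([35.5, 35.5, 38.0, 35.5, 35.5])
ciop_mask = np.array([False, False, True, False, False])
sss = {
    "SMAP": np.full(5, 35.2),
    "BEC": np.full(5, 35.25),
    "CEC": np.full(5, 35.75),
}
flag, biases = is_cisp(nss, sss, ciop_mask, ARGO_THRESHOLD)
```

Because the test uses the absolute value, a series can qualify as a CISP even when the biases against the three products have different signs, as in this example.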
We can see from Figure 1 that after removing CIOPs, the quality of the GTMBA series is significantly improved and its biases w.r.t all three products are no larger than 0.1. This suggests that the CIOPs here come from bad in situ observations. Interestingly, among the three products, either the bias or the STD must be sacrificed to improve the other one. For example, the CEC has the lowest bias but the highest STD (see the overall "after" statistics in Figure 1). In Section 4, we further investigate whether this problem is occasional.
Figure 1. [...] The mean (μ) and STD (σ) of the difference between GTMBA and three satellite products are calculated and labeled for the whole series and each segment. The dubious GTMBA observations are marked as light blue filled circles for the unanimous voting of the conventional 3-Sigma diagnosis and pink circles for the adapted 3-Sigma diagnosis. The statistics labeled "after" are calculated after removing the CIOPs diagnosed by the unanimous voting of the adapted 3-Sigma method. Note that this series is the real-time data in order to make comparison with the results of Tang et al. (2017).

Diagnosing the Inconsistent Pairs of GTMBA
Based on the adapted 3-Sigma criterion and the unanimous voting strategy, 11 buoys contain inconsistent observation pairs, of which only buoy 8n38w has more than five CIOPs. It is encouraging that the suspicious GTMBA observations reported in Tang et al. (2017) and Bao et al. (2019) have been removed by the delayed-mode QC procedures. We therefore focus on buoy 8n38w in Figure 2. The series is clearly characterized by shifted values, and we can attribute the inconsistencies to suspicious measurements of GTMBA NSS. Note that the original QC flag of this buoy is 3 (adjusted); based on our diagnosis, however, it can be re-labeled as 4 (bad quality).

Diagnosis of Argo Floats
Following the same method as in Section 3, we detect the CIOPs of Argo NSS between 60°S and 60°N. Of the 6,812 Argo floats in total, 636 are diagnosed to have at least one inconsistent observation pair. Most (445) of the diagnosed floats have just one CIOP and 41 of them have more than five CIOPs (see Table A1 in Appendix A). The locations of the CIOPs and the differences between Argo NSS and satellite SSS are plotted in Figure 3. The CIOPs accumulate in regions with large SSS uncertainties (Kao et al., 2018), for example, the river plumes, the Gulf of Mexico, the Bay of Bengal (BoB), and the strong boundary currents. Figure 3 also resembles the pattern of sub-footprint variability estimated from TSG data in Boutin et al. (2016), which suggests that the CIOPs also reveal the sub-footprint variability. Furthermore, Figure 3 delineates a more complete quasi-global map than Boutin et al. (2016) and highlights the sign (positive/negative) of the differences. Interestingly, there are a large number of observations with negative differences in surface-freshening regions, for example, the Amazon plume and the BoB, which contradicts the positive values expected from surface-diluting stratification. Besides, the western boundary has many more CIOPs than the eastern boundary, although the remote sensing of SSS in both regions is affected by RFI. This is not surprising since sub-footprint variability is usually related to meso- or submeso-scale motions. Taking the Gulf Stream (GS) as an example, a number of eddies dominate the cross-shelf exchange along the southeastern American seaboard, and various rings, filaments, and streamers detach from the meander or "the north wall" (Klymak et al., 2016; Mcwilliams et al., 2019). We must fully consider the time series, trajectories, and profile shapes to uncover what the CIOPs imply.
We focus on the six Argo floats with the most inconsistent observation pairs. Their time series are presented in Figure 4. The series of Argo and the satellite products are generally consistent when the series vary smoothly, except for the continuous CIOPs in panels (a) and (e). However, when Argo varies sharply to a local extreme value, for example, the spikes of low salinity in panel (d), the satellite products readily depart from the Argo observations. It must be noted that Argo NSS can be significantly fresher than satellite SSS in a good number of CIOPs. Although there can be a salty skin on the sea surface due to evaporation, the magnitude of the difference is estimated to be no larger than 0.15, and thus the salty skin cannot be a major source of error (Yu, 2010). Therefore, near-surface stratification cannot reasonably explain why most negative anomalies exceed 0.5 in magnitude.
In order to confirm whether the CIOPs are spatially related, we plot the trajectories of those six floats in Figure 5. [...] moving southward (panels (a) and (c)) and westward (panel (b)), indicating that they are not driven by the GS but by the intrusion of the Labrador Current or the recirculation of the GS. A common feature of these four floats is that all of the CIOPs are north of 36°N. As for the two floats near the Amazon, the CIOPs are distributed near the land. It seems that observations in low salinity regions are more easily diagnosed as inconsistent pairs. However, no definite relation between CIOPs and the distance to land can be derived, especially for the spikes of Figure [...].
We also look into the relation between the salinity stratification and CIOPs, as shown in Figure 6. Most profiles present a mixed layer of 10 m or deeper. In this case, surface stratification is not the dominant cause of the inconsistencies. Besides, for a given Argo float, the negative (positive) anomalies tend to arise at profiles with low (high) near surface salinity, and the number of positive anomalies is almost equal to that of negative anomalies (except panel (d)). These features are consistent with sub-footprint variability, which is also reflected by the eddy-like trajectories in Figures 5a-5f. Taking Figure 5b as a typical example, the trajectories lingering in an eddy-like pattern correspond to the clustered anomalies in Figure 6b.
Another possibility is that all three satellite products perform poorly in capturing sharply fluctuating salinity values due to, for example, their method of debiasing or calibration against monthly or climatological fields (Boutin et al., 2018; Meissner et al., 2018). For instance, the ubiquitous spikes in Figure [...] stratification usually have a shorter lifespan. In order to investigate the reason behind the inconsistency, we present the horizontal patterns of various products, as shown in Figure 7.
In Figure 7, the Argo NSS is significantly higher than the satellite SSS in panels (a)-(c). One possible cause is a fresh sheet over the salty water. However, in Figure 6e, the upper-mixed-layer shape of most profiles with positive anomalies does not conform to near-surface stratification. Although the <5 m salinity is missing in Figure 6e, the salinity profiles of GLORYS and HYCOM show that the difference between the surface and 5 m cannot exceed 0.2 psu (details not shown). Thus, near-surface stratification cannot be the dominant factor. Besides, HYCOM and GLORYS (Figures 7e and 7f) also present an SSS pattern similar to that of the three satellite SSS products in the vicinity of the Argo float. This suggests that the satellite SSS products are not subject to serious systematic errors. Figure 7d indicates that the Argo float is caught by a mesoscale eddy, which is also supported by the trajectories in Figure 5e. Sub-footprint variability therefore appears responsible for the inconsistent pairs. In the SSS maps of panels (a)-(c), the salinity surrounding Argo 4901466 presents large horizontal gradients, indicating that the footprint-averaged salinity can reasonably be inconsistent with pointwise salinity. However, no salinity value in the domain is comparable with that of Argo. Furthermore, the Argo float is located inside a quite homogeneous eddy and its near surface temperature agrees well with the satellite SST in Figure 7g. The puzzle is why SSS should be subject to sub-footprint variability while SST is not. One possible explanation is that the MUR SST has a much higher spatial resolution than the three satellite SSS products. Nevertheless, the much coarser (∼25 km) Operational Sea Surface Temperature and Sea Ice Analysis (OSTIA) (Donlon et al., 2012) also agrees with the Argo temperature (details not shown). This implies that the footprint of the remote sensor cannot fully explain the discrepancy.
The malfunction of Argo sensors is also considered. Unlike for GTMBA mooring buoys, a malfunction of an Argo sensor means that the whole profile would depart from the climatology. In Figure 8b, a significant and continuous positive salinity anomaly w.r.t WOA13 can be seen from August 2017 to mid-September 2017, corresponding to the salty discrepancy in panel (a). The salinity anomaly is much larger than the seasonal variability and presents a surface-intensified pattern. The temperature anomaly in panel (c) resembles the salinity anomaly in panel (b) but is subsurface intensified. That is, the center of the temperature anomaly is located at tens of meters instead of at the surface during August to mid-September 2017. Therefore, the Argo near surface temperature is close to the climatology. Combined with the consistency of Argo temperature with satellite temperature, this indicates that the Argo temperature sensor is functioning normally. In this case, the subsurface temperature anomaly most likely corresponds to a reasonable mesoscale structure. Thus, the Argo salinity must be of good quality, as panels (b) and (c) depict very similar structures.
Based on the above analysis, the continuous inconsistencies of Argo 4901466 are most likely due to the innate difference between satellite SSS and Argo NSS. We further investigate the inconsistency through the zonal sections of HYCOM and GLORYS, as shown in Figure 9. Although the detailed structures of the two data sets differ, both products depict a mixed layer overlying an anticyclonic eddy. The main difference between surface temperature and surface salinity is that the temperature is quite homogeneous in the mixed layer, while the salinity presents a noticeable horizontal gradient. In particular, the Argo float is located exactly at the dividing line between the salty and fresh surface water. This mismatch between temperature and salinity explains why the salinity inconsistency, unlike temperature, corresponds to sub-footprint variability. However, sub-footprint variability is still not a perfect explanation, because the Argo NSS is significantly higher than all of the surrounding satellite SSS.
Note that the satellite SSS products are quasi-weekly running products, as only SMOS and SMAP can currently measure salinity from space. In contrast, a great number of missions observe SST from space. The temporal under-sampling of satellite SSS thus turns out to be the most plausible culprit for the continuous inconsistencies of Argo 4901466.
The collocated inconsistent series pairs (CISPs), namely the Argo floats whose entire NSS time series (after the removal of CIOPs) is biased by >0.2 psu against all three remotely sensed products, are listed in Table 1. Twenty-five CISPs are diagnosed. It must be noted that for over half of the floats, the signs of the bias w.r.t the three SSS products are not consistent, which suggests that at least one SSS product is of poor quality at the locations of the Argo float. The CISPs of Argo are mainly distributed in several regions, including the Mediterranean Sea, the western boundary currents, and the Antarctic Circumpolar Current (see Figure A1 in Appendix A). Note that these regions are either contaminated by RFI or rich in eddies. Generally speaking, the positive anomalies caused by sub-footprint variability would usually be balanced by the negative anomalies. Even the Argo 4901466 analyzed above is not biased by as much as 0.2 psu. This implies that the inconsistent series pairs in Table 1 are probably a combination of qualified Argo measurements and biased satellite products. As for the floats whose NSS biases are of the same sign against the three products, they possibly reflect systematic errors of the salinity retrieval.
The bias (STD) of the difference between Argo and SMAP/BEC/CEC is 0.04/0.00/-0.03 (0.33/0.42/0.38) for the original series and 0.04/0.00/-0.03 (0.33/0.42/0.37) after the removal of inconsistent pairs. It is reasonable that the statistics are almost unchanged because the CIOPs (1,110) account for a very small fraction (about 0.01%) of all Argo observations (9,339,252). From the perspective of time series, 6,812 floats are considered, of which 1,164/709/1,082 (17.1%/10.4%/15.9%) w.r.t SMAP/BEC/CEC have the lowest bias and STD at the same time, while 1,317/1,040/1,132 (19.3%/15.3%/16.6%) have the lowest bias (STD) but the highest STD (bias). Although the overall STD implies the best quality of SMAP, no specific data set is significantly superior to the other two considering the tradeoff between STD and bias. More efforts are needed to reduce bias and STD at the same time.
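The per-float counting behind these percentages can be illustrated with a small classifier. This is a sketch under assumptions: "lowest bias" is taken to mean lowest absolute bias, ties are resolved in favor of the first product listed, and the function name classify_products, the label strings, and the numbers in the example are hypothetical.

```python
def classify_products(bias, std):
    """Label each product for one float from its bias and STD.

    bias and std map product name -> statistic of (NSS - SSS).
    "best_both": lowest absolute bias and lowest STD together.
    "tradeoff":  lowest absolute bias with highest STD, or
                 lowest STD with highest absolute bias.
    "other":     anything else.
    """
    abs_bias = {name: abs(value) for name, value in bias.items()}
    lowest_bias = min(abs_bias, key=abs_bias.get)
    highest_bias = max(abs_bias, key=abs_bias.get)
    lowest_std = min(std, key=std.get)
    highest_std = max(std, key=std.get)

    labels = {}
    for name in bias:
        if name == lowest_bias and name == lowest_std:
            labels[name] = "best_both"
        elif (name == lowest_bias and name == highest_std) or (
            name == lowest_std and name == highest_bias
        ):
            labels[name] = "tradeoff"
        else:
            labels[name] = "other"
    return labels


# Illustrative statistics for one float (not values from the paper).
labels = classify_products(
    bias={"SMAP": 0.05, "BEC": 0.10, "CEC": -0.01},
    std={"SMAP": 0.20, "BEC": 0.25, "CEC": 0.30},
)
# CEC has the lowest absolute bias but the highest STD, so it is a "tradeoff".
```

Counting the labels over all floats and dividing by the number of floats gives percentages of the kind quoted above, for example 1,164 of 6,812 floats is about 17.1%.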

Discussion
From the perspective of time series, we use an adapted 3-Sigma criterion to detect the inconsistencies between Argo NSS and three satellite salinity products. Although we also calculate the bias of the whole series (i.e., the diagnosis of CISP) to compensate for the shortcoming of the adapted 3-Sigma criterion in handling series where all three segments contain long runs of shifted values, the adapted method can still lose effectiveness in high-STD series. The conventional 3-Sigma criterion cannot solve this problem either. We cannot guarantee that all of the inconsistencies are detected, but our adaptation (and the unanimous voting strategy) improves the accuracy of the diagnosis.
Statistically speaking, the inconsistent pairs diagnosed in this paper can serve as a reference for bad in situ NSS observations as long as the time series of at least one SSS product is reasonable (see Figure 1). The shifted values of GTMBA are thus easily detected, and Figure 2 confirms that our results are justified.
Unlike the tropical GTMBA buoys, most of the Argo observations with inconsistent observation pairs (see Figure 3) are located in the seas (e.g., the Gulf of Mexico), the regions where fresh water processes are active (e.g., the river plumes), and the strong boundary currents (e.g., the Gulf Stream). Due to RFI contamination, SSS retrieval in these regions is challenging and the quality of the satellite products cannot be guaranteed. Besides, these regions have typical mesoscale motions or near-surface processes, which could lead to significant differences between Argo NSS and satellite SSS.
Most time series in Figure 4 present an abrupt occurrence of inconsistencies and approximately equal numbers of positive and negative differences. These characteristics are in accordance with sub-footprint variability, which is also supported by the trajectories in Figure 5. The majority of profiles in Figure 6 present near-surface mixed layers, even in the Amazon plume where fresh water processes are expected to be dominant. Near-surface stratification accounts for a negligible part of the inconsistencies of Argo. This conclusion does not contradict the results of Boutin et al. (2016), as surface freshening is short-lived and the shallowest Argo measurement usually cannot reach the very surface. As stated in Boutin et al. (2016), the minimum in situ sampling depth required to resolve the near surface salinity needs to be further investigated.
The continuous discrepancies of Argo 4901466 (Figure 4e) do not fit the typical circumstances of sub-footprint variability. Based on the comparison with multiple data sets, the inconsistencies are closely related to an anticyclonic eddy, and a malfunction of the Argo sensors can be excluded. Considering that the Argo NSS is far higher than the surrounding satellite SSS, the temporal under-sampling of the salinity satellites is the most likely culprit. However, it is still not fully explained why the inconsistencies could stay positive for about two months.
The biased series (the "CISPs") in Table 1 partly reveal the problems in the existing satellite products. Over half of the Argo NSS series diagnosed in Table 1 have biases of different signs w.r.t the three SSS products. This disagreement demonstrates that at least one satellite product is problematic, while for the remaining series the agreement of signs leaves open the possibility of systematic errors in all three SSS products. It is recommended that SSS retrievals be compared against the series of these floats to ensure their quality. Besides, the comparison of the bias and STD of the differences between in situ NSS and the three SSS products (see the last paragraph in Section 4) suggests that no satellite product stands out in reducing both bias and STD at the same time. Much more effort must be devoted to improving the quality of remotely sensed SSS.

Summary
To summarize, the collocated inconsistent observation pairs reveal the problematic salinity measurements of mooring buoy 8n38w and the dominance of sub-footprint variability in the inconsistent observations of Argo NSS in Table A1. The continuous salty discrepancy of Argo 4901466 indicates that temporal under-sampling is also an important source of inconsistencies. The inconsistent series diagnosed in Table 1 identify the Argo floats whose NSS is biased by >0.2 against all three satellite products. The series whose biases are of different signs against the three satellite products suggest that at least one product is erroneous. These Argo floats could serve as a touchstone for qualifying satellite products. Additionally, the analysis of the STD and bias of the three satellite products shows that no single satellite product stands out in significantly reducing STD and bias at the same time. More studies are needed to improve the quality of satellite data and to evaluate the salinity differences resulting from the meso- and submeso-scale eddies within the footprint.
Moreover, the results support the high quality of the delayed-mode GTMBA data set and the Global Argo Observational Data Set (V3.0). However, there are some circumstances in which the salinity observations are labeled "adjusted" in the quality control. Comparison with multiple satellite data sets could help to assess the reliability of these adjusted observations.