Modelling 85Kr datasets with python for applications in tracer hydrology and to investigate atmospheric circulation

We present a model written in python to evaluate data from comprehensive 85Kr collection schemes comprising 11 datasets from different monitoring stations around the globe. The model is designed to (1) calculate atmospheric input functions for the application of 85Kr as a dating tracer and (2) to investigate atmospheric circulation based on a two-box model of the atmosphere. Different functions were implemented, to (1) filter the data, (2) fit polynomials and running means, (3) extrapolate fits from the northern to the southern hemisphere, (4) calculate interhemispheric exchange times and 85Kr emission rates and (5) export data to a csv file. Although the model is designed to evaluate atmospheric 85Kr datasets, some functionality and basic concepts can be applied to other dating tracers, like tritium and SF6.• Standardized method to systematically analyse atmospheric 85Kr activity concentration time series for dating water and ice and to investigate atmospheric circulation.• Easily modifiable python script to adapt functions for similar data analysis procedures.


Specifications
Resource availability To reproduce the fits and models, the data collection published in the corresponding Data in Brief paper is required [1] .

Method details
With its ideal geochemical and geophysical properties as a noble gas, 85 Kr is a valuable tracer for various applications in Earth sciences [2 , 3] . For dating groundwater, oceanwater and ice with 85 Kr, a good knowledge of the atmospheric concentration during the infiltration or formation process of the investigated water or ice body is crucial. Over the past 60 years, the origin of 85 Kr in the atmosphere can be considered purely anthropogenic, as its main source are nuclear reprocessing plants. The activity and location of nuclear reprocessing plants varied strongly over this time period, resulting in large temporal and regional changes of atmospheric 85 Kr concentrations [4] . These trends and fluctuations are clearly detectable in the 85 Kr datasets at the monitoring stations, which form the basis of the modelling tool that is presented here. This data analysis and plotting tool is written in python 3.8 and is designed to evaluate a comprehensive 85 Kr dataset [1] for dating water and ice and to investigate atmospheric circulation [5] . However, certain functions of this tool can be applied to similar datasets as well.
For a better understanding of the individual functions of the model, they are applied to the 85 Kr data collection in [1] . The main script functions.py as well as an example script example.py together with a requirements.txt and a readme.txt file can be found in the supplementary material.

The 85 Kr data collection
The python model has been written for a specific 85 Kr data collection which comprises 11 individual datasets from various monitoring stations around the globe ( Fig. 1 ). In total, it contains about 80 0 0 atmospheric 85 Kr activity concentrations measured over the past 60 years. The typical sample collection and preparation process happens in two steps. First, about 5 mL of krypton from 10 m ³ of air is collected over the course of one week, by trapping noble gases, including krypton, on liquid nitrogen cooled activated charcoal [6] . The krypton is then released from the charcoal trap and the enriched gas sample is sent to the laboratories of the Bundesamt für Strahlenschutz (Federal Office for Radiation Protection), in Freiburg, Germany. Here, the krypton fraction is separated by gas chromatography and transferred into gas proportional counters to determine the 85 Kr activity of the sample via decay counting [7] . The measurement uncertainty of this 85 Kr analysis method is about 3%.

The model
In Fig. 2 , the complete data collection is plotted. The northern hemisphere data is represented by two distinct models: First, a polynomial fit of degree 5 (red line) is used to fit the 85 Kr data of the 3 northern hemispheric monitoring stations (Freiburg, Schauinsland, Jungfraujoch). This fit represents the average trend of 85 Kr emissions in central Europe. Secondly, a 7th degree polynomial is fitted to the minima of the northern hemisphere data within a certain time step, representing the northern hemispheric 85 Kr activity concentration undisturbed of local 85 Kr sources (a baseline) and used for calculating interhemispheric exchange times. The southern hemisphere data is represented by a polynomial fit to the data of 7th degree.
In the following section, the term 'dataset' refers to three arrays with n entries, with the first array being the sampling date, the second array the corresponding 85 Kr activity concentration and the third array being the measurement uncertainty of these concentrations. The number n refers to the total number of measurements in a single dataset.
The functions of the model that are used to generate the fits presented in Fig. 2 are minDays, poly and errPoly . The values for the baseline of the northern hemispheric dataset are generated with the minDays function, which takes a data set and the integer days as parameter. The given dataset is reduced to one entry for each time interval, which represents the minimum 85 Kr activity concentration of the given period of days.
The poly function takes a dataset and an integer polyDegree as parameters and returns the polynomial fit of degree ' polyDegree ' to the given dataset.
The errPoly function estimates the corresponding uncertainties of the polynomial fits based on Monte Carlo simulations. It takes a dataset, the corresponding integer polyDegree and the integer iteration as parameters. An iteration of 10 0 0 means that 10 0 0 synthetic datasets based on the original dataset are generated. A function of the python numpy module is used, which generates a synthetic data point for each real data point based on a gaussian distribution around the measured 85 Kr activity concentration with the error of the measurement as the standard deviation. A polynomial fit of degree polyDegree is then fitted to each synthetic dataset resulting in 10 0 0 possible fitted values for each point in time. The standard deviation of these 10 0 0 data points is calculated and taken as the error of the fit for this specific time.
The fit (black line) shown in Fig. 2 represents a polynomial fit of degree 7 to all southern hemispheric datasets. The dashed grey line shortly above and below is the error of that fit generated by adding or subtracting the standard deviation calculated by the errPoly function.
The orange fit represents the baseline of the northern hemispheric datasets. It was generated by first filtering the northern hemispheric datasets with the minDays function with the parameter ' days' set to 90. Then a polynomial fit of degree 7 ( poly ) as well as the error of that fit ( errPoly ) was fitted to the resulting dataset. The red line as well as the corresponding dashed grey line was generated by applying the poly and the errPoly function to all northern hemispheric data. For illustrative purposes, a polyDegree of 5 is taken here, resulting in a smoother fit.
The three fits displayed in Fig. 2 represent atmospheric 85 Kr input functions which are essential for the application of 85 Kr as a dating tracer for water and ice. More details and further implications of the fits shown in Figs. 2 and 3 are discussed in the original research paper [5] .
The main sources of atmospheric 85 Kr are nuclear reprocessing plants, which are mostly located in the northern hemisphere. Therefore, many features of the python script are based on a simple atmospheric two box model, in which the northern and the southern hemispheres are treated as two well mixed boxes with specific 85 Kr activity concentrations C NH and C SH , time independent air mass flux across the equator (1/ τ ex ) and a source term S NH applied only to the northern hemisphere. With the decay constant λ of 85 Kr, the differential equations that govern the hemispheric 85 Kr activity concentrations can be written as [8] : The interhemispheric exchange time can then be calculated as: Furthermore, the source term is given by: When only the trend of the northern hemispheric 85 Kr activity concentration is known and an interhemispheric exchange time is given, the southern hemispheric concentration can be modelled: To better analyse short term trends within the data, a running mean can be calculated. This is done with the runningMean function, which takes a dataset and the integer runningMeanCycles as parameters. For each cycle, the input data is smoothened by a 3rd order low pass filter of the form: In Fig. 3 a two features of the model are presented. First, the black line represents the extrapolation of the northern hemispheric baseline (green line) to the southern hemisphere according to Eq. (5) . This is done with the extrapolate function which takes a dataset and a float value for the interhemispheric exchange time in years (here 1.25) as input and returns the extrapolated dataset. The dashed lines in the figure are the decay corrected activity concentrations relative to a given date (here 1st of August 2020). The corresponding python function is the decay function, which takes a dataset and the reference date as input. The decay constant is set to 15.5 years for 85 Kr and can easily be replaced to apply this functionality to other radioactive tracers, like tritium.
The interhemispheric exchange time ( Fig. 3 b) is estimated according to Eq. (3) . The required function for this procedure is the interhem function, which takes the polynomial fit to the southern hemispheric datasets, as well as the polynomial fit to the baseline of the northern hemispheric datasets including their fitting errors as parameters. It then calculates the interhemispheric exchange time on an annual basis and returns the three arrays date, interhemispheric exchange time and error of the interhemispheric exchange time. The error is calculated according to the Gaussian error propagation from the uncertainties of the fits.
The trends within the Adelaide dataset are analysed in Fig. 3 c by fitting 5 and 50 iterations of the running mean to the original data, according to Eq. (6) .
The source term is calculated on an annual basis ( Fig. 3 d). The function emissions is written analogously to the interhem function and is based on Eq. (4) .
Two additional functions of the python model are not represented in the figures. The gofNH_SH function calculates how well the extrapolated northern hemispheric fit describes the southern hemispheric data. This is done by calculating the χ² value of the extrapolated fit. The export function writes the modelled dates, values and errors of a polynomial fit to a specific dataset to a csv file.