Objective Estimation of Sensory Thresholds Based on Neurophysiological Parameters

Reliable determination of sensory thresholds is the holy grail of signal detection theory. However, there exists no assumption-independent gold standard for the estimation of thresholds based on neurophysiological parameters, although a reliable estimation method is crucial for both scientific investigations and clinical diagnosis. Whenever it is impossible to communicate with the subjects, as in studies with animals or neonates, thresholds have to be derived from neural recordings or by indirect behavioral tests. Whenever the threshold is estimated based on such measures, the standard approach until now is the subjective setting—either by eye or by statistical means—of the threshold to the value where at least a “clear” signal is detectable. These measures are highly subjective, strongly depend on the noise, and fluctuate due to the low signal-to-noise ratio near the threshold. Here we show a novel method to reliably estimate physiological thresholds based on neurophysiological parameters. Using surrogate data we demonstrate that fitting the responses to different stimulus intensities with a hard sigmoid function, in combination with subsampling, provides a robust threshold value as well as an accurate uncertainty estimate. This method has no systematic dependence on the noise and does not even require samples in the full dynamic range of the sensory system. We prove that this method is universally applicable to all types of sensory systems, ranging from somatosensory stimulus processing in the cortex to auditory processing in the brain stem.


INTRODUCTION
Objective, reliable, and reproducible estimation of sensory thresholds is a fundamental problem in neuroscience as well as clinical diagnostics. For example, hearing thresholds must be determined as objectively and precisely as possible in patients with hearing loss, especially in those who cannot report their hearing, e.g., babies (Luts et al., 2004), to provide them with the optimal type of hearing aid and to adjust the operating parameters of the device. Similarly, the determination of thresholds from behavioral or physiological measurements in animals is a challenging task (Yu et al., 2015;Deuis et al., 2017) as these thresholds are usually obtained by fitting various sigmoidshaped functions to the data using a specific threshold-criterion (Jiang et al., 2015;Salthammer et al., 2016).
A fundamental problem with common methods for threshold estimation is that these usually use responses just above threshold, and trivially these responses are small and thereby strongly affected by noise. In other words, common methods for threshold estimation are based on data with low signalto-noise ratio (S/N), where responses may be hard to detect or cannot be detected at all (miss) so that thresholds are often assessed too large. In particular, automated methods that use a statistical criterion to define the threshold, e.g., the signal amplitude several standard deviations above background noise (Ozdamar et al., 1994;Luts et al., 2004;Walter et al., 2012), are prone to such threshold overestimation. On the other hand, also false positive ratings (false alarms) may occur, especially if data are evaluated subjectively by human observers. A striking example for such severe uncertainty is the evaluation of auditory brainstem responses (ABR) by clinical professionals, where threshold estimates have been demonstrated to differ by up to 60 dB between evaluators (Vidler and Parkert, 2004). Finally, as responses are repeatedly measured and then averaged, the S/N strongly depends on the number of repetitions, which introduces a further source of error to the threshold estimation. Obviously, a fully automated method for threshold estimation which is robust against low S/N would be preferable to guarantee objectivity and reproducibility. This challenging problem has also been tackled by the implementation of machine learning algorithms such as support vector machines (Acir et al., 2006). However, machine learning approaches generate black-box systems with complex internal decision criteria that are not comprehensive to the users (Castelvecchi, 2016). In addition, they require huge data sets for training and hence are neither feasible nor accepted for medical diagnostic purposes (Kononenko, 2001).
We here introduce a method for threshold estimation that solves the problems mentioned above and that can be applied to any type of neuronal data for threshold estimation, i.e., to any data of response strength as a function of stimulus intensity. Our method is robust against low S/N, as no measurements of responses close to threshold have to be included into the analysis.
We used simulated data to evaluate the objectivity, reproducibility and robustness of the method. In addition, we demonstrate the method's feasibility with real ABR data and with cortical neuronal responses [local field potentials (LFP) and single neuron spiking responses] from different sensory modalities (auditory and somatosensory) in an animal model.
We show that the fitting of stimulus-response functions to neural responses can be significantly improved by taking into account the spontaneous neural background activity under nonstimulus conditions. Furthermore, we demonstrate that any threshold criterion based on the fit function should not depend on the spontaneous activity amplitude as such criteria lead to a monotonic decrease (divergence to −∞) of the determined threshold with decreasing noise amplitude or increasing number of measurement repetitions. Thus, a generalized hard sigmoid function was chosen to fit the data, where the lower knee of the function defines the sensory threshold, resulting in a minimum set of three free parameters and a completely objectified method for threshold estimation based on neurophysiological data.

Animals
Mongolian gerbils (Meriones unguiculatus) were housed in standard animal racks (Bio A.S. Vent Light, Ehret Labor-und Pharmatechnik, Emmendingen, Germany) in groups of 2 to 3 animals per cage with free access to water and food at 20 to 24 • C room temperature under 12/12 h dark/light cycle. The use and care of animals was approved by the state of Bavaria (Regierungspräsidium Mittelfranken, Ansbach, Germany, No. 54-2532.1-02/13). All methods were performed in accordance with the relevant guidelines and regulations of NIH. A total of 11 male gerbils aged 10-12 weeks purchased from Janvier Laboratories Inc. were used in this study.

Generation of Artificial Test Data Sets
In order to evaluate our algorithm, artificial data sets were generated, to mimic far field ABR recordings (cf. Figure 1). We assume that the response amplitude (root-mean-squared, RMS) as an answer to the stimulus intensity can be described by a sigmoid function (in this case a generalized logistic function): Where a describes the maximal response, b defines the stimulus intensity where the response is exactly half the maximal response, and the factor c defines the slope of the sigmoidal shape. We assign the parameters with a = 10 mV, b = 60 dB, c = 11.89 dB to obtain values in a realistic range and sample the function at 22 equally distributed points from −30 dB to 130 dB ( Figure 1A). A template amplitude response curve was used to generate the artificial field potential ( Figure 1A). For each of those supporting points we generate a sinusoidal signal with a duration of 10 ms and an amplitude given by Equation (1), to imitate the physiological response, e.g., the auditory brainstem activity ( Figure 1B). For each of these artificial responses, a measurement noise being Gaussian distributed noise (µ = 0, σ = 40 mV, S/N = 1/4) is created (Figures 1C,D) and added to the pure response ( Figure 1D).
For every stimulus intensity N = 200 such responses are created, to obtain N measurement repetitions (trials). The values of the parameters a, b, c as well as the noise amplitudes are chosen in accordance to real ABR measurements, however the sigmoid shape of the underlying stimulus response function is universal and thus the results can be applied to any kind of threshold determination tasks based on the interpretation of neural responses.
For each single trial, new noise is sampled, but with the same average background noise amplitude. This background noise reflects a combination of spontaneous neuronal activity and measurement noise. This physical noise being a superposition of several noise sources can be approximated with a Gaussian distributed noise of a certain amplitude.
From these (artificial) single trial responses, all N simulated repetitions for each stimulus intensity are averaged (mean simulated ABR responses, averaging 200 single trials as in Figure 1E). The obtained averaged responses mimic the processes during a real measurement to provide data where the real underlying response amplitude function is known (cf. Figure 1F, ground truth: Figure 1A).

ABR Recordings
ABR measurements were recorded using a custom made setup. Pure tone stimuli of different frequencies ranging from 1 to 8 kHz were generated by a custom-made Matlab program and presented at different, pseudorandomized intensities ranging from 40 to 110 dB SPL in 5 dB steps. Stimulation was free-field to one ear at a time via a speaker (Sinus Live NEO) corrected for its frequency transfer function to be flat within +/-1 dB at a distance of ∼3 cm from the animal's pinna while the contralateral ear was tamped with an ear plug. To compensate for speaker artifacts stimuli were presented in double trials consisting of two 6 ms stimuli (including 2 ms sine square rise and fall ramps) of the same amplitude but opposite phase, separated by 100 ms of silence. One hundred and twenty to five hundred double trials (N: Number of double trials) of each combination of intensity and frequency were presented pseudorandomly at an inter-stimulus interval of 500 ms. For the measurements the Mongolian gerbils were kept under deep anesthesia. Anesthesia was induced by an initial dose of 0.3 ml of a ketamine-xylazine-mixture (mixture of ketamine hydrochloride: 96 mg/kg BW; xylacin hydrochloride: 4 mg/kg BW; atropine sulfate: 1 mg/kg BW), and maintained by continuous application of that mixture at a rate of 0.2 to 0.3 ml/h. As has been demonstrated previously, such ketamin-xylazine anesthesia has only little effect on ABR signals compared to awake animals (Smith and Mills, 1989). During measurements, animals were placed on a feedback-controlled heating pad at 37 • C to maintain body temperature. Data were recorded using three silver electrodes positioned subcutaneously, one for grounding at the back of the animals, one reference electrode at the forehead, and the measuring electrode infra-auricular overlying the bulla contralateral to the stimulation side. The potential difference between the reference and measuring electrode was amplified by a low noise amplifier (JHM NeuroAmp 401, J. Helbig Messtechnik, Mainaschaff, Germany; amplification 10,000; bandpass filter 400 to 2,000 Hz and 50 Hz notch filter). Note that for further analysis the amplified signal was used, that is, amplitudes are given in mV whereas the actual neuronal signals were in µV-range. The output signal of the amplifier was digitalized and recorded by an analog-digital converter card (National Instruments Corporation, Austin, TX, USA) with a sampling rate of 20 kHz and synchronized with the stimulation via the trigger signal from the stimulation computer. Raw data of N double trials per sound level for one stimulus frequency were averaged. Finally, these averaged responses of the two single, The stimulus response function without background noise is approximated using a generalized logistic function (dark green). The fit function (light green) is the square root of the sum of the squared "pure" neuronal signal and the squared noise amplitude (including neural noise and measurement noise, for the derivation cf. Supplementary 2). (B) The estimated threshold for the 2σ-criterion as a function of the number of applied data samples decreases monotonically with rising sample size and diverges to −∞ for an infinite number of samples. (C) The estimated threshold for the 5% criterion is independent of the sample size, only the uncertainty decreases (estimated by subsampling, cf. Methods). (D) Different choices of the threshold parameter p for this criterion result in different threshold estimates. (E) To get rid of the arbitrary parameter p, the sigmoid shape of the stimulus response function can be approximated with a hard sigmoid function (dark blue), where the threshold is defined as the lower knee. In analogy to the logistic function, for fitting the responses the function is superimposed with a noise offset sigma. (F) Estimated threshold for the knee criterion as a function of the sample size. The small constant offset compared to the analysis in (C) is caused by the arbitrary parameter p used for the 5% criterion. The analysis in (B,C,F) was performed by stepwise subsampling with increasing sample size N, with 100 subsamples for each size.
phase inverted stimuli within one double trial were averaged to eliminate stimulus artifacts (Figures 4A,B). From these averaged, artifact-corrected data the root mean square (RMS) values from 0 to 10 ms after stimulus onset were calculated to obtain a measure of response amplitude for each stimulus intensity presented (cf. Figures 4C,D).

Neuronal Recordings in Auditory and Somatosensory Cortex
For neuronal recordings in the primary sensory cortices a craniotomy was performed on Mongolian gerbils under deep ketamine-xylazine anesthesia as described above. During the complete surgery the animal was kept on a feedback-controlled heating pad at 37 • C to maintain body temperature, and the paw withdrawal reflex was checked periodically to ensure sufficient depth of anesthesia. A screw was fixed to the skull of the animal using instant glue and dental cement to provide a fixation during neuronal recordings. The neuronal recordings were performed directly after surgery. For recordings in auditory cortex Tziridis et al., 2015), the animal was placed in an anechoic chamber on a heating pad, the head was fixed, and anesthesia was continued. Then a 16 electrode Pt-Ir array (Clunburry Scientific, 4x4 array, spacing 500 µm, Bloomfield Hills, MI 48304 USA) was inserted into the auditory cortex for LFP and single unit spike recording. For auditory threshold estimation 200 ms pure tone stimuli (2 kHz, 5 ms ramp) of Stepwise removal of the supporting points starting at the sub-threshold range (starting at low stimulus intensities) for the logistic function fit with the 5%-criterion (stepwise deletion of supporting points from yellow to dark green) (A) and the hard sigmoid fit with the knee criterion (B), from cyan to dark blue. (C) The determined threshold is very robust against deletion of supporting points near the subthreshold range where the S/N ratio is poor. (D,E) Stepwise removal of supporting points near the saturation range [starting at high stimulus intensities, yellow to green) for the logistic function fit (D) and the hard sigmoid fit (E), from cyan to dark blue]. (F) The hard sigmoid fit is very robust against missing supporting points near the saturation range. Note that even though the used artificial data is generated using an underlying generalized logistic function the hard sigmoid fitting procedure is more stable against missing supporting points near the saturation range. This fact is important e.g., for far field potential measurements as the dynamic range of such measurements typically is very large due to the different thresholds of the involved neurons are distributed over a wide range (cf. Nizami and Schneider, 1999).
varying sound pressure level (30-90 dB SPL) were presented. The different sound pressure levels were presented pseudorandomly with 300 repetitions each.
Recordings in somatosensory cortex were performed as described above. For threshold measurements a linear resonant actuator specifically designed for haptic feedback application (LRA, C10-100 Precision Microdrives) was used to apply vibrotactile stimuli to the hind limb of the animal (frequency: 175 Hz, range: ∼0.9-15 µm).

General Computation
All simulations and evaluation algorithms were run on a standard desktop PC and were written in Python using the Anaconda bundle. For mathematical operations the NumPy (Walt et al., 2011) and SciPy (Oliphant, 2007) library were used. The plots were created using the Matplotlib (Hunter, 2007) library combined with the Pylustrator add-on (Gerum, 2018) for plotting style editing.

General Fitting and Subsampling of the Data
In general, monotonous rate-intensity functions in neuronal systems (based on physiological data) follow a sigmoid function and, thus, are often described by a logistic function (Berkson, 1944;Bi and Ennis, 1998). However, the superposition of measurement noise and stimulus induced neural signal can lead to slight changes in the shape of the stimulus response function. In the following, the effect of measurement noise on the curve shape as well as different threshold criteria based on the fit function are described in detail.
For all following analysis steps in addition to the fitting procedure a subsampling technique is applied. In all cases 100 different subsamples were generated. For each subsample N-d trials were randomly drawn, where d > √ N (delete-djackkife criterion, cf. Shao and Tu, 1995;Bertail et al., 1999), without returning from the complete set of measured trials (N). The reduced set of measurement repetitions is used as base for the fitting procedure. This procedure is done for each stimulus intensity separately, so that each subsample contains N-d measured trials for each stimulus intensity.
The procedure is applied for two different reasons. First, the sample size is systematically altered to analyze the effect of different number of applied measurement repetitions on the determined threshold (in the following sample size is used synonymic to number of measurement repetitions), on the other hand for real neural data the method is used similarly to bootstrapping methods to estimate confidence intervals.

Artificial Test Data
The major problem for the verification of any method for threshold estimation lies in the fact that net stimulus response amplitudes are unknown due to variable amounts of noise. To test our new approach for reliable threshold estimation we generate an artificial test data set (Figure 1; for details cf. Methods). This allows for the verification of estimated thresholds by comparing them to the thresholds on which the artificial data is based. To this end, a stimulus-response function was defined for the ideal case of no measurement noise ( Figure 1A). This stimulus response function is then translated to artificial neuronal responses (raw signal) approximated by a 1,000 Hz sine wave (Figure 1B). The addition of Gaussian distributed background noise ( Figure 1C) simulating real measurement noise results in realistic raw data ( Figure 1D).
For each stimulus intensity x, we generate 200 independent samples of such artificial raw neural recordings. As in a real data evaluation, these recordings are then averaged to reduce the noise (Figure 1E). Based on the resulting average signal, the root-mean-square (RMS) amplitude is computed. When plotted as a function of the stimulus intensity x, we obtain a re-constructed stimulus-response function (Figure 1F), which in general has an altered sigmoidal shape f (x) = f 0 (x) + σ . This re-constructed function f (x) is then used to test our new method of threshold estimation.

Threshold Criteria
To estimate sensory thresholds the standard procedure is to measure the response amplitude as a function of stimulus intensities. The resulting stimulus response function typically follows a sigmoid shape, (cf. Winter et al., 1990;Nizami and Schneider, 1999) and can be fitted using a generalized logistic function f 0 with an offset for the background noise: where a refers to the maximum response, b, the location of the inflection point, c, an indirect measure for the slope at the inflection point, and σ the noise level (cf. Equation 1). Depending on the experiment, the noise either is added directly to the response (Equation 2a; e.g., for spike rates), or the response is a RMS value where the noise is added according to equation (Equation 2b, cf. Figure 2A; e.g., for RMS values of field potentials; cf. Supplementary 2 for the derivation). If the noise would also be treated as an additive term in the case of a RMS value, the signal and noise would not be decoupled correctly and the obtained thresholds would be noise dependent (see Supplementary 1).
In principle all four parameters could be fitted (Nizami and Schneider, 1999;Nizami, 2002), but fitting the noise level from the response function results in highly unstable threshold estimates (cf. Supplementary 4). Therefore, we estimate the background noise level by analyzing the response without stimulation and hold the background noise level σ constant for the fitting procedure. To extract a threshold from the fitted stimulus response functions, a threshold criterion has to be defined.
There exist multiple approaches how to define a threshold criterion for a given sigmoidal function. However, defining a reliable threshold criterion is anything but trivial. In many studies threshold criteria are used that depend on the background noise, e.g., the 2σ-criterion, where the threshold is set to the point where the function exceeds two times the standard deviation of the noise, i.e., two times the RMS of the noise . Another common approach is to define the threshold as a constant fraction p of the dynamic range a, e.g., the 5%-criterion, where the threshold is set to the point where the sigmoid function exceeds 0.05a (Nizami and Schneider, 1999). As we will show in the following, both approaches to define a threshold criterion have fundamental drawbacks.
The threshold based on the 2σ-criterion can be expressed as a function of the fit parameters a, b, c, and the constant parameter Thresholds based on the 5% criterion (p = 0.05) can be calculated from the fit parameters c and b of the generalized logistic function: The parameter a cancels out when transforming the equations, but still has an indirect influence on the threshold estimate as it influences the other parameters of the fit.
For both criteria, we analyze how the sample size (mimicking number of measurement repetitions), and thus the effective background noise, influences the obtained threshold values.
For a systematic analysis we use the artificial data set as described above (cf. Figure 1), and evaluate the threshold for different sample sizes (Figures 2B,C,F). As the background noise is Gaussian distributed and the data is averaged across all samples, the effective noise amplitude σ scales indirectly proportional to the square root of the sample size N: Hence, for a measurement with lower background noise the number of samples across which has to be averaged to get a satisfying signal to noise ratio can be reduced. As mathematically the effective noise level decreases with the number of samples, it is trivial to see that the 2σcriterion, being directly noise dependent, results in a decrease of the threshold value with increasing number of samples ( Figure 2B). The fit parameters a, b, c approach a constant value for increasing number of samples and σ approaches 0, the threshold estimate t σ diverges to −∞ (cf. Figure 2B), i.e., for an infinite number of samples the threshold becomes infinitesimal. In real measurement situations such behavior leads to poor comparableness of the results measured in different laboratories as the threshold estimate is systematically influenced by the number of measurement repetitions.
In contrast to the 2σ-criterion, for the 5% criterion the threshold is set to the stimulus intensity where f 0 (x) exceeds p · a (p = 5%), i.e., the threshold is set to the value where the net amplitude response function (with no background noise) exceeds the fraction p of the dynamic range (cf. Equation 4). This modified procedure leads to highly reproducible threshold estimates that are independent of the number of measurement repetitions ( Figure 2C).
The median threshold for increasing sample size converges to the threshold obtained from the net response function without noise (cf. Figure 1A). The 5% criterion overcomes the limitation of a systematic dependency of the estimated threshold on the number of measurement repetitions. This behavior means in detail that the systematic error of threshold estimates based on the 2σ-criterion is replaced by a stochastic error and thus more measurement repetitions lead to more reliable threshold estimates but do not cause a systematic bias of the determined values.
However, p = 5% is an arbitrary parameter on which the threshold depends and the choice of p can significantly influence the value obtained for the threshold (cf. Figure 2D).
This fitting approach has a second major shortcoming: The generalized logistic function is inadequate for the analysis of most real data as the function is symmetric around the inflection point and missing supporting points in the saturation range lead to unstable fitting of the sigmoid function, as we will show in detail below. These supporting points are often missing in measured ABR responses, as the response of bigger clusters of neurons often saturate for very high stimulus intensities (Nizami and Schneider, 1999) (exemplarily shown in Supplementary 5).

Parameter Reduction Using a Hard Sigmoid Based Fit Function
To overcome these problems we chose a generalized hard sigmoid function-often used for artificial intelligence approaches (t: lower knee = sensory threshold, h: saturation value, s: slope, cf. Figure 2E) The lower knee of this function can be defined as the sensory threshold in analogy to the procedure used for the logistic function. When noise is added, this knee is smoothed according to Equation 2b (Figure 2E, for derivation of the smoothening of the lower knee cf. Supplementary 3).
When this approach is applied to the data set shown in Figure 1, in analogy to the approach described before (cf. Figure 2C, for sigmoid function), the determined threshold is independent of the number of measurement repetitions (cf. Figure 2F).
Taken together, this procedure further reduces fit parameters as the arbitrary parameter p is eliminated. Additionally, the novel fit function is more robust against missing data (supporting points) in the saturation range, as will be discussed in detail in the following section.

Effect of Removal of Data Supporting Points
Naturally, when attempting to measure sensory or behavioral thresholds, the actual threshold for a given stimulus as well as the dynamic range are both unknown. Depending on the chosen intensity range of presented stimuli, response amplitudes may lie (as preferred) within the dynamic range, but stimulus intensities may also lie below threshold or above saturation levels. If too few data points are available to sample the dynamic range of the sensory response, standard fitting procedures often fail to yield meaningful results. Therefore, to further validate the robustness of our new approach we test the effect of removal of data points on the estimated threshold. We use the same simulated data set with the logistic function as underlying stimulus response function (cf. Figure 1) and then stepwise reduce the number of supporting data points.
The determined threshold for both fit functions are independent of deletion of the subthreshold supporting points (Figures 3A-C). As σ can easily be calculated from the raw data measurement under the non-stimulus condition (which is mandatory) and is used in our approach as a fixed value for fitting, subthreshold data points are redundant. The shape of the curve can even be estimated if approximately half of the dynamic range supporting points are deleted (cf. Figure 3C). Consequently, data points close to the threshold and hence with low signal-to-noise ratio do not have to be measured anymore.
Likewise, a stepwise removal of supporting points within the saturation range has little effect on the estimated threshold for the hard sigmoid fit (cf. Figures 3E,F). In contrast to the logistic function fit, which is less robust (cf. Figures 3D,F). This is particularly important as very high stimulus intensities for practical reasons often cannot be presented without causing potential damage to the sensory system. Furthermore, in far field recordings like ABR measurements does not always follow an idealized sigmoid function in the complete stimulus range, as it is a sum potential out of huge populations of neurons with various stimulus-response characteristics. However, the novel fitting approach is also reliable if the stimulus response function does not exactly follow a sigmoid shape and still provides reproducible threshold estimates.
Taken together, our approach to use a fitting procedure with a hard sigmoid function provides high robustness against deletion of supporting points in subthreshold and threshold as well as saturation range, reduction of the number of applied measurement repetitions (sample size), and does not depend on an arbitrary threshold parameter p.

Application of the Method to Different Types of Neurosensory Data
So far we have developed and tested our new approach for threshold estimation based on artificial data. In the following section, we demonstrate that this approach is universally applicable and that the underlying principles can be applied to a number of different neurophysiological parameters. We performed stimulus response measurements in different brain regions of Mongolian gerbils (brainstem, cortex), different sensory modalities (auditory, somatosensory system) and different measures of evoked neurophysiological activity (far field potentials, spiking activity).

Auditory Brainstem Responses (ABR)
ABR waves in response to pure tone stimuli of 6 ms duration with onset and offset ramps (2 ms) and four different stimulus frequencies (1, 2, 4, 8 kHz) were measured.
The response amplitude is quantified by the calculation of the RMS over the signal in the 10 ms time interval after stimulus onset (Figures 4A,B), resulting in a typical stimulus response relationship (Figures 4C,D). This relationship can be welldescribed by the hard sigmoid fit, yielding threshold estimates with the knee criterion.
To estimate the confidence of these obtained sensory thresholds, we use the subsampling method (200 out of 240 trials), which prove that the obtained thresholds are robust against outliers (Figure 4E). The method reliably shows different thresholds for the different stimulus frequencies, indicating, that the hearing ability of this animal is best in the range between 2 and 4 kHz (Figure 4E), which corresponds to audiograms from the literature measured via behavioral paradigms (Ryan, 1976).

Cortical Local Field Potential (LFP) Data From Different Sensory Modalities
The described fitting procedure can also be applied to other kinds of electrophysiological measures like cortical LFP data. Here we applied our new threshold estimation procedure to LFP recordings from the auditory and somatosensory cortex. Analogously to the processing of the ABR data, the RMS of the LFP data was used to estimate level response functions by fitting to the hard sigmoid function. For auditory stimulation puretone stimuli of 2 kHz were used, and for vibro-tactile stimulation 175 Hz stimuli were applied to the contralateral hind-limb of the animal via a Linear Resonant Actuator (cf. Methods).
For both auditory and somatosensory stimulation the stimulus response relationship can be described with the hard sigmoid function (Figures 5B,E), yielding a threshold value through the knee criterion. Interestingly this method works equally well for sound stimuli sensed by the animals ear as well as for vibrational stimuli sensed via the animal's paw. The analysis of LFP data recorded from both sensory modalities again demonstrates that our approach is robust against missing of supporting points near the threshold (cf. Figures 5B,E) and the number of applied measurement repetitions (Figures 5C,F).

Spiking Data From Auditory Cortex
Finally, we apply our approach for threshold estimation to single neuron spiking data with just a few modifications, exemplarily shown for pure tone responses of auditory cortex neurons ( Figure 6). As spikes are detected by multi thresholding the raw signal, spike rates are not superposed by measurement noise. Instead, σ here is defined as the spontaneous activity, i.e., the spike rate in the absence of stimulation. As in contrast to LFP data no RMS values are calculated and spike rates can in first order be added, the fit function is the sum of hard sigmoid function and spontaneous activity (cf. Equation 2a, where f 0 (x) is the hard sigmoid function).
The measured spike rates show a clear dependency on the sound pressure level (Figure 6A), which again can be described by a hard sigmoid function (cf. Figure 6B). As already shown for the LFP data (Figure 5), our novel fitting approach again needs a minimum of fitting parameters and is robust against the number of measurement repetitions (cf. Figure 6C).

DISCUSSION
In this article we present a novel and robust method for threshold estimation based on neurophysiological data. The robustness and objectiveness of the method is based on three main principles.
First, the sensory threshold is determined by the analysis of the complete amplitude response function and not only by analyzing the data near the threshold where the S/N ratio is worst. This advantage is achieved by applying a fitting procedure, where the measurement noise level σ is kept constant and is not treated as a free parameter. Thus, it is possible to estimate the shape of the amplitude response function by exclusively analyzing supporting points above the threshold. The validity of this procedure was systematically analyzed using artificial data (cf. Figure 2) and verified by the analysis of different neurophysiological parameters. Taken together, the approach enables us to obtain an objective automated measurement of a threshold, eliminating the need of investigating "just-above" threshold responses within a noisy signal by visual threshold estimations by experimenters or clinical professionals (Zheng et al., 1999;Casper et al., 2003;Vidler and Parkert, 2004;Zaitoun et al., 2016). The second principle is that the threshold estimation should not contain an explicit dependency on the measurement noise, i.e., the definition of the threshold as the minimum signal exceeding a certain fraction of the background noise amplitude (cf. 2σ criterion) leads to one major problem: These criteria depend systematically on the number of measurement repetitions, meaning that an increase of the S/N ratio of the measurement procedure leads to systematic decrease of the determined threshold. We have shown that the threshold as a function of the number of applied measurement repetitions does not asymptotically approach a constant value but diverges. To this end, we deleted any discrete dependency of the threshold criteria on the background noise.
The third major principle is the reduction of free selectable parameters. We have shown that fitting of an extended generalized logistic function could be used for threshold estimation, but needs one further free parameter defining the threshold. In contrast to that approach, we chose a hard sigmoid function with a clear defined knee (cf. Figure 2E), which can be used to estimate the threshold without a pre-defined threshold criterion. Furthermore, the fit of a hard sigmoid function is more robust against missing points from the saturation range, thus the experiment does not need to cover the whole dynamic range of the sensory system to be investigated.
The method was evaluated on different sensory modalities (auditory and somatosensory), from different brain regions (cortex and brain stem), to demonstrate its wide and general applicability.
Though the method provides the possibility to estimate highly reproducible and plausible threshold estimates based on electrophysiological measurements it has to be considered that the determined threshold are only correlates of the thresholds estimated using psychophysical methods.
Thus, thresholds based on electrophysiological measures led to frequency specific thresholds higher to what is known from the literature (Ryan, 1976). The reasons for this discrepancy may lie in the fact that the electrophysiological sum potentials are produced by several thousands of neurons, whereas a behavioral response can be evoked by the activity of a far smaller amount of firing neurons.
However, it has to be assumed that also psychophysical thresholds are only a man-made measure to quantify sensory sensitivity. Thus, in signal detection theory the sensory threshold is set to the inflection point of a fitted sigmoid function to the fractions of the correctly perceived stimuli, which depend on the presented stimulus intensities. This fit function is called psychometric function being typically a logistic or Weibull function (Treutwein and Strasburger, 1999;Tyler and Chen, 2000;Strasburger, 2001).
For mathematical and symmetry considerations the inflection point of the psychometric function is a sophisticated choice for the threshold estimate, however, it is more a consensus than biologically motivated. This threshold estimation technique based on psychometric function fitting and inflection point determination cannot easily be translated to neurophysiological data, because there-as described above-saturation range points are often difficult to measure. Nevertheless, the here described method for the determination of sensory thresholds is based on similar considerations as numerical stability and reproducibility. This means the method provides a stable correlate of the psychophysically determined threshold, however, a direct translation of the two measures can only achieved by reference experiments in identical subjects.
One further reason for the threshold discrepancy could lie in the effects of anesthesia, though it could be shown that the ketamine/xylazine anesthesia has no effect on ABR thresholds (Smith and Mills, 1989;Ruebhausen et al., 2012) in gerbils and rats in contrast to mice (Van Looij et al., 2004) where an effect is measurable.
Despite these limitations, a reproducible correlate for sensory thresholds is essential for scientific purposes as especially the objectiveness prevents the experimenter from systematic errors and reproducibility is a core concept of scientific studies.

DATA AVAILABILITY
The data and software for all figures is provided on Dryad (doi: 10.5061/dryad.121s23b).

ETHICS STATEMENT
Mongolian gerbils (Meriones unguiculatus) were housed in standard animal racks (Bio A.S. Vent Light, Ehret Labor-und Pharmatechnik, Emmendingen, Germany) in groups of 2 to 3 animals per cage with free access to water and food at 20 to 24 • C room temperature under 12/12 h dark/light cycle. The use and care of animals was approved by the state of Bavaria (Regierungspräsidium Mittelfranken, Ansbach, Germany, No. 54-2532.1-02/13). All methods were performed in accordance with the relevant guidelines and regulations of NIH. A total of 11 male gerbils aged 10-12 weeks purchased from Janvier Laboratories Inc. were used in this study.

AUTHOR CONTRIBUTIONS
AS and HS led the project. AS, PK, RG, and HS developed the method, to which CM made critical contributions. AS and RG programmed the evaluation software. AS and PK conducted the measurements. AS, RG, PK, CM, KT, and HS discussed the results. AS, RG, and HS wrote the manuscript. CM and KT provided advice and article revisions. All authors approved the final version of the article.