Making do with what we have: use your bootstraps

Key Points 

A jackknife is a pocket knife that is put to many tasks because it is ready to hand. There is often a better tool for the job, such as a screwdriver, a scraper, or a can-opener, but these are not usually pocket items. In statistical terms, the expression implies making do with

deviation). After taking a sample, we can calculate its mean and, by assuming a specific form for the population distribution, estimate a confidence interval for the population mean (Figure 1).
In practice, most researchers would be more reassured to know that the mean and its confidence interval could be reliably estimated directly from the sample they had taken, without making assumptions about the characteristics of the population. An initial test of a sample for 'normal distribution' can be unhelpful or misleading, often because there are too few data to give a convincing result: another occasion when absence of proof is not proof of absence.
The bootstrap process is a way of working with the data we have. As long as we have a sample of sufficient size, we can use it to estimate the probability distribution, and hence other population measures such as confidence intervals. We do not need to assume anything else about the population from which the sample was taken, other than that it was randomly sampled. The saying 'a bird in the hand is worth two in the bush' summarizes this philosophy: although the data in our sample may be insufficient to define an adequate probability distribution directly, we can take repeated further random samples from the sample we already have. The sample contains values that reflect the original population, and random sampling from these values allows characteristics of that population to be inferred. The principle of the bootstrap is that we use our sample, which can be repeatedly randomly sampled, to estimate features of a source population that is inaccessible.
These procedures have become popular only since computers have been used for statistical tests (5), because the calculations are tedious and must be repeated many times. A famous photograph of R. A. Fisher (who devised a permutation test that also required repeated calculations) shows him keying data into a mechanical calculator. Even a moderate bootstrap test can involve thousands of 'resamples' and would probably wear out both calculator and statistician long before it was completed! The concept is that after a random sample has been taken, the values in this sample are repeatedly and randomly 'resampled' to generate a large series of new sets of values that we shall call 'pseudo-samples'. From each of these pseudo-samples, we can calculate values that characterize the source population. For example, in Figure 2 we derive mean values. In other words, we are using the original sample as a substitute for the original population, to provide further samples. In this case we use these further samples to estimate the sampling distribution of the mean, but the same process can be used to obtain other features of the population from the data.
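The resampling loop described above is short enough to write by hand. The sketch below assumes Python 3.8+ and uses only the standard library (`random.Random.choices` for drawing with replacement, `statistics.fmean` for the mean); the function names `pseudo_sample` and `bootstrap_means` and the jump distances are hypothetical, for illustration only, and are not the authors' code or data.

```python
import random
import statistics

def pseudo_sample(sample, rng):
    """Draw one pseudo-sample: same size as the original, drawn with replacement."""
    return rng.choices(sample, k=len(sample))

def bootstrap_means(sample, n_resamples, seed=None):
    """Return the mean of each of n_resamples pseudo-samples (hypothetical helper)."""
    rng = random.Random(seed)
    return [statistics.fmean(pseudo_sample(sample, rng)) for _ in range(n_resamples)]

# Hypothetical jump distances in metres (illustrative only, not the study's data).
sample = [1.2, 0.9, 1.5, 1.1, 0.8, 1.3, 1.0, 1.4]
means = bootstrap_means(sample, n_resamples=1000, seed=1)
print(len(means), round(statistics.fmean(means), 3))
```

Each pseudo-sample mean must lie between the smallest and largest original values, and the average of many such means sits close to the original sample mean (1.15 m here).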
The original sample contains within it the features of the population from which it was drawn. We then apply to this sample the same principles of inference and sampling that we applied when we took the original sample from the population: the bootstrap samples are to the original sample as the original sample was to the population. Although this process yields an approximate result, it is likely to give more accurate estimates of population parameters than calculations based on an incorrect initial assumption about the form of the distribution. The estimates may be inaccurate if the original sample is small, because limited data cannot be sufficiently representative of the population. The bootstrap process is particularly suited to describing populations; it can also be used for comparisons, but other tests, such as permutation tests, may be more appropriate for this purpose.
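The analogy "bootstrap samples are to the original sample as the original sample was to the population" can be checked by simulation in the one situation where the population is, unusually, available. This sketch assumes a hypothetical skewed population (exponential, mean 1, SD 1), chosen only for illustration. It compares the spread of means from fresh population samples with the spread of means from pseudo-samples of a single sample; both should be close to 1/√100 = 0.1.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical skewed population: exponential with rate 1 (mean 1, SD 1).
def draw_from_population(n):
    return [rng.expovariate(1.0) for _ in range(n)]

n = 100
original = draw_from_population(n)

# Spread of means when we CAN return to the population (normally impossible).
true_means = [statistics.fmean(draw_from_population(n)) for _ in range(2000)]

# Spread of means when we resample our one original sample instead.
boot_means = [statistics.fmean(rng.choices(original, k=n)) for _ in range(2000)]

se_true = statistics.stdev(true_means)
se_boot = statistics.stdev(boot_means)
print(f"SE from population: {se_true:.3f}, SE from bootstrap: {se_boot:.3f}")
```

The agreement depends on how representative `original` is; with a much smaller `n`, the bootstrap estimate wanders further from the true value, as the text warns.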
Let us compare a sample of frogs from the north of California with a group sampled near Calaveras, where escapees from the jumping competition have interbred with native frogs. We wish to compare estimates for the two populations from which these samples were drawn. We generate repeated random pseudo-samples from each sample. Each time a value is drawn, it can be any of the values in the original sample, so each value may be chosen more than once within a pseudo-sample, and from one pseudo-sample to the next. (This is sampling 'with replacement': after a value has been chosen, it is returned to the original stock of values.) Figure 2 shows a simplified version of the process.
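Replacement matters. If a pseudo-sample of full size were drawn without replacement, it would simply be a reshuffle of the original values, with an identical mean every time, and would tell us nothing about variability. This sketch uses hypothetical values and Python's standard `random` module to contrast the two. With six distinct values, only 6!/6⁶ (about 1.5%) of with-replacement pseudo-samples contain no repeats.

```python
import random

rng = random.Random(7)
# Hypothetical jump distances in metres (illustrative only, not the study's data).
sample = [0.9, 1.1, 1.2, 1.3, 1.5, 1.8]

# With replacement: each draw picks from the full original stock, so repeats occur.
with_repl = rng.choices(sample, k=len(sample))

# Without replacement at the same size: merely a reshuffle of the original values.
without_repl = rng.sample(sample, k=len(sample))
assert sorted(without_repl) == sorted(sample)

# How often does a with-replacement pseudo-sample contain at least one repeated value?
trials = 1000
repeats = sum(len(set(rng.choices(sample, k=len(sample)))) < len(sample)
              for _ in range(trials))
print(with_repl, repeats / trials)
```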
The mean of each pseudo-sample is calculated. Figure 2 shows the first six of these pseudo-samples as dot plots. If we continue until we have 20 pseudo-samples and calculate the mean of each, we can arrange these means as a distribution histogram, as in the bottom panels of Figure 2. The central 18 of these 20 means (discarding the smallest and the largest) define the 90% confidence limits for the mean.
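The "central 18 of 20" rule translates directly into code: sort the 20 means, drop one from each end, and read off the extremes of what remains. This is a sketch using hypothetical data and Python's standard library; it is not the analysis behind Figure 2.

```python
import random
import statistics

rng = random.Random(3)
# Hypothetical jump distances in metres (illustrative only, not the study's data).
sample = [1.2, 0.9, 1.5, 1.1, 0.8, 1.3, 1.0, 1.4]

# Means of 20 pseudo-samples, sorted from smallest to largest.
means = sorted(statistics.fmean(rng.choices(sample, k=len(sample))) for _ in range(20))

# Dropping one mean from each end leaves the central 18 of 20 (90%);
# the smallest and largest of those 18 are the 90% limits.
central = means[1:-1]
lower, upper = central[0], central[-1]
print(f"90% limits: {lower:.3f} to {upper:.3f}")
```

With only 20 pseudo-samples these limits shift noticeably from run to run, which is why far more resamples are used in practice.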
In practice, the process is repeated many more times than this. Typically, we might generate 10,000 pseudo-samples to obtain a range of mean values, and use these means to set the confidence limits. This is referred to as the bootstrap percentile method. The distribution of the means from this number of pseudo-samples is shown in Figure 3.
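The percentile method generalizes the 20-sample example. Collect many pseudo-sample statistics, sort them, and discard the fraction (1 − level)/2 at each end. The helper `percentile_ci` below is a hypothetical sketch built on Python's standard library. It applies no bias correction or other refinements, and any function of a list (the mean, the median, and so on) can be passed as `stat`. With `n_boot=20, level=0.90` it discards one value at each end, matching the previous example. With the defaults it discards 250 at each end.

```python
import random
import statistics

def percentile_ci(sample, stat=statistics.fmean, n_boot=10_000, level=0.95, seed=None):
    """Bootstrap percentile interval for `stat` (hypothetical helper; no bias correction)."""
    rng = random.Random(seed)
    boot = sorted(stat(rng.choices(sample, k=len(sample))) for _ in range(n_boot))
    tail = round(n_boot * (1 - level) / 2)   # number of values discarded at each end
    return boot[tail], boot[n_boot - tail - 1]

# Hypothetical jump distances in metres (illustrative only, not the study's data).
sample = [1.2, 0.9, 1.5, 1.1, 0.8, 1.3, 1.0, 1.4]
lo, hi = percentile_ci(sample, seed=1)
print(f"mean {statistics.fmean(sample):.3f} m, 95% CI ({lo:.3f}, {hi:.3f})")
```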
Further analysis of the data can be done with the same basic method, including comparisons. In our example, the mean jump distance of the Calaveras sample is 0.44 m greater: what is the 95% confidence interval of this estimate? We approach this by independently drawing a random pseudo-sample, with replacement, from each group (as in Figure 2, B and C), and calculating the mean of each pseudo-sample. The difference between these means is an estimate of the difference between the groups. We repeat the process of drawing pseudo-samples from each group and calculating the difference between their means. Figure 4 shows the distribution of differences obtained after 10,000 repetitions. Counting in 250 values from each extreme (2.5% of 10,000 at each end) defines the 95% confidence interval for the observed difference: 0.44 m (0.039, 0.847). This interval does not include zero, so we conclude that the observed difference in jump distance between the two groups of frogs is unlikely to be a result of chance, and we also gain an indication of the likely size of the difference.
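The comparison follows the same pattern, except that each group is resampled independently and the statistic is the difference between the two pseudo-sample means. The sketch below uses a hypothetical helper, `bootstrap_diff_ci`, and made-up jump distances. It will not reproduce the 0.44 m (0.039, 0.847) result, which comes from the authors' data.

```python
import random
import statistics

def bootstrap_diff_ci(a, b, n_boot=10_000, level=0.95, seed=None):
    """Percentile interval for mean(b) - mean(a), resampling each group independently."""
    rng = random.Random(seed)
    diffs = sorted(
        statistics.fmean(rng.choices(b, k=len(b))) - statistics.fmean(rng.choices(a, k=len(a)))
        for _ in range(n_boot)
    )
    tail = round(n_boot * (1 - level) / 2)   # 250 at each end for 10,000 and 95%
    return diffs[tail], diffs[n_boot - tail - 1]

# Hypothetical jump distances in metres (illustrative only, not the study's data).
north = [1.0, 1.2, 0.9, 1.1, 1.3, 0.8, 1.0, 1.2]
calaveras = [1.5, 1.7, 1.2, 1.6, 1.4, 1.9, 1.3, 1.8]

observed = statistics.fmean(calaveras) - statistics.fmean(north)
lo, hi = bootstrap_diff_ci(north, calaveras, seed=2)
print(f"difference {observed:.2f} m, 95% CI ({lo:.3f}, {hi:.3f})")
print("interval excludes zero:", lo > 0 or hi < 0)
```

Resampling the groups separately keeps each pseudo-sample the same size as its own group, which mirrors how the two original samples were collected.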
The bootstrap method is flexible and robust, and well suited to analysing data whose population distribution is uncertain, as is often the case in biological studies (6). For example, assumptions about distribution can be avoided when comparing quanta at synapses (8). There are occasions when the bootstrap can fail: for example, it performs poorly with extreme distributions, and when estimating statistics, such as the maximum, that depend on very small features of the data. Modern computers make the tedious procedure of repeated sampling straightforward. However, standard textbooks and the standard statistics packages have failed to acknowledge the value of the bootstrap approach. Curran-Everett describes how to use the statistical software package R for bootstrap methods (3), an add-on facility exists for SPSS and Excel, and Cole has described a macro for use with SAS (2). Most current packages lack standard facilities for these procedures; these useful and powerful methods should gradually become more common in standard statistical software.
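Why the maximum is a problem can be seen directly: no pseudo-sample can contain a value larger than the largest value already in the sample. Moreover, a pseudo-sample of size n includes that largest value with probability 1 − (1 − 1/n)ⁿ, about 64% for n = 30. The sketch below assumes a hypothetical uniform population on [0, 1), whose true maximum of 1.0 the bootstrap distribution can therefore never reach.

```python
import random

rng = random.Random(5)
# Hypothetical sample: 30 draws from a uniform population on [0, 1),
# whose true upper limit is 1.0.
n = 30
sample = [rng.random() for _ in range(n)]
sample_max = max(sample)

boot_max = [max(rng.choices(sample, k=n)) for _ in range(10_000)]

# A pseudo-sample can never exceed the original sample's maximum...
print(all(m <= sample_max for m in boot_max))
# ...and it reproduces that exact value in roughly 1 - (1 - 1/n)**n of resamples.
frac_at_max = sum(m == sample_max for m in boot_max) / len(boot_max)
print(round(frac_at_max, 3), round(1 - (1 - 1 / n) ** n, 3))
```

The resulting bootstrap distribution is largely a spike at the sample maximum, so a percentile interval built from it says little about the population maximum.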
Note: Those with some programming skills can set up basic approaches themselves. Good's book (7) gives useful insights into how to implement different bootstrap methods in various programming languages. For users of the Python programming language, the data used in this paper and the code used for its analysis are available at http://bit.ly/KJ67RW (1).

DISCLOSURES
No conflicts of interest, financial or otherwise, are declared by the author(s).