Estimation of effective number of breeders from molecular coancestry of single cohort sample

The effective population size, Ne, is an important parameter in population genetics and conservation biology. It is, however, difficult to directly estimate Ne from demographic data in many wild species. Alternatively, the use of genetic data has received much attention in recent years. In the present study, I propose a new method for estimating the effective number of breeders Neb from a parameter of allele sharing (molecular coancestry) among sampled progeny. The bias and confidence interval of the new estimator are compared with those from a published method, i.e. the heterozygote-excess method, using computer simulation. Two population models are simulated; the noninbred population that consists of noninbred and nonrelated parents and the inbred population that is composed of inbred and related parents. Both methods give essentially unbiased estimates of Neb when applied to the noninbred population. In the inbred population, the proposed method gives a downward biased estimate, but the confidence interval is remarkably narrowed compared with that in the noninbred population. Estimate from the heterozygote-excess method is nearly unbiased in the inbred population, but suffers from a larger confidence interval. By combining the estimates from the two methods as a harmonic mean, the reliability is remarkably improved.


Introduction
The effective population size, $N_e$, is one of the most important parameters in population genetics and conservation biology, because this parameter determines both the amount of genetic drift and the rate of inbreeding (Crow and Kimura 1970; Falconer and Mackay 1996). $N_e$ can be estimated from demographic data such as the number of parents and the variance in their progeny number (Caballero 1994). However, the demographic data needed to estimate $N_e$ are often unavailable for wild species. As an alternative, methods for estimating $N_e$ from genetic data have been developed (for reviews, see Waples 1991; Schwartz et al. 1999; Beaumont 2003; Leberg 2005; Wang 2005). These methods differ in the time scale on which $N_e$ is measured. Some of them infer the long-term $N_e$ in the past on an evolutionary time scale, and others estimate the current or short-term $N_e$ (Waples 1991; Wang 2005). For solving practical issues such as managing a small population of an endangered species, an accurate estimate of the current or short-term $N_e$ is of special importance, and it is the major concern of this study.
To date, three methods are available for this purpose: the temporal method (Nei and Tajima 1981; Pollak 1983; Waples 1989), the linkage disequilibrium method (Hill 1981) and the heterozygote-excess method (Pudovkin et al. 1996; Luikart and Cornuet 1999). These methods actually assess the effective number of breeders ($N_{eb}$) of a cohort from which a sample is obtained. If the sample consists of reproductive adults, $N_{eb}$ is nearly equivalent to $N_e$ in populations with nonoverlapping generations (Schwartz et al. 1999; and as will be discussed later). $N_e$ can be estimated from $N_{eb}$ in populations with overlapping generations, if the age structure is known (Waples 1991).
Keywords
effective number of breeders, effective population size, genetic estimate, molecular coancestry, single cohort sample.

Correspondence
Tetsuro Nomura, Department of Biotechnology, Faculty of Engineering, Kyoto Sangyo University, Kyoto 603-8555, Japan. Tel.: +81 75 705 1929; fax: +81 75 705 1914; e-mail: nomurat@cc.kyoto-su.ac.jp

The logic behind the temporal method is that the change of allele frequency in samples separated in time is a reflection of genetic drift. This method is the most tested of the genetic $N_{eb}$ estimators and has been used to estimate $N_{eb}$ of various species (Schwartz et al. 1999). The primary weakness of this method is that two or more samples separated in time are necessary (Schwartz et al. 1999). This can be expensive and, by nature, time-consuming. The linkage disequilibrium method is based on the fact that genetic drift generates nonrandom association among alleles at different loci. Despite the obvious advantage that this method can be used to estimate $N_{eb}$ from a single cohort sample, there are several drawbacks (Schwartz et al. 1999; Wang 2005). Perhaps the most critical one is that the estimator assumes an isolated equilibrium population with a constant effective size, which may not be tenable for natural populations of endangered species. The heterozygote-excess method is based on the fact that when the breeding population is small, binomial sampling error produces allele frequency differences between male and female breeders, resulting in an excess of heterozygotes in their progeny (Robertson 1965). As in the linkage disequilibrium method, this method has the advantage that only a single cohort sample is required. Further, this method is appealing because the estimate is easily computed. However, there are few applications of this method, presumably because of its low precision, as empirically shown by Luikart and Cornuet (1999).
Several authors (Waples 1991; Pudovkin et al. 1996; Luikart and Cornuet 1999) emphasized the importance of exploring a method that gives an estimate independent of those from existing methods, because combining several independent estimates is expected to give higher precision than any separate estimate. In the present study, a novel method for estimating $N_{eb}$ from genetic data of a single cohort sample is proposed. The estimator is obtained from a simple parameter of allele sharing among sampled individuals (molecular coancestry). The reliability of the new estimator is compared with that of the heterozygote-excess method using computer simulation. The improvement in reliability attained by combining the two methods is also examined.

Methods
Estimation of $N_{eb}$ from parent-based coancestry
Although a monoecious diploid population is assumed throughout the following derivation, the extension to dioecious diploid species is straightforward and the same estimation method is applicable to such a population.
Let $f_t$ be the coancestry between two randomly sampled individuals in generation $t$, and $P$ be the probability that two randomly sampled alleles, each from a different individual in generation $t$, come from the same individual in generation $t-1$. The recurrence equation for the coancestry is given by

$$f_t = P\,\frac{1 + F_{t-1}}{2} + (1 - P)\,f_{t-1} \tag{1}$$

(Crow and Kimura 1970, p. 102), where $F_{t-1}$ is the inbreeding coefficient of individuals in generation $t-1$. Following the definition by Crow and Kimura (1970, p. 347), we define the effective number of breeders ($N_{eb}$), or strictly the inbreeding effective number, as

$$N_{eb} = \frac{1}{P}. \tag{2}$$

We set the base population of $f_t$ at the population of generation $t-1$ by assuming $F_{t-1} = f_{t-1} = 0$. Putting $t - 1 = 0$ in (1), we obtain from (1) and (2) $f_1 = P/2$ and

$$N_{eb} = \frac{1}{2 f_1}. \tag{3}$$

This means that an estimate of $N_{eb}$ can be obtained if the parent-based coancestry ($f_1$) among individuals in one cohort is estimated.

Molecular coancestry
For locus $l$, the molecular coancestry $f_{M,xy,l}$ (frequently called the 'molecular similarity index') between individual $x$ having alleles $a$ and $b$ and individual $y$ having alleles $c$ and $d$ is defined as (Malécot 1948)

$$f_{M,xy,l} = \frac{1}{4}\left(I_{ac} + I_{ad} + I_{bc} + I_{bd}\right), \tag{4}$$

where the indicator $I_{ac}$ is one when allele $a$ of individual $x$ is identical to allele $c$ of individual $y$, and zero otherwise, etc. When there are $L$ marker loci, the molecular coancestry $f_{M,xy}$ is the average molecular coancestry over all loci (Toro et al. 2002, 2003):

$$f_{M,xy} = \frac{1}{L}\sum_{l=1}^{L} f_{M,xy,l}.$$

Molecular coancestry arises not only from alleles that are identical by descent but also from alleles that are alike in state (AIS). Molecular coancestry is, therefore, an upward biased estimator of the coancestry relative to an arbitrary base population. When $s_l$ denotes the probability that two alleles at locus $l$ are AIS in the base population, the expected molecular coancestry between individuals $x$ and $y$ at locus $l$ satisfies (Oliehoek et al. 2006)

$$1 - E\left[f_{M,xy,l}\right] = \left(1 - f_{xy}\right)\left(1 - s_l\right), \tag{5}$$

where $f_{xy}$ is the coancestry between individuals $x$ and $y$ expressed relative to the base population.
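The definition of molecular coancestry can be illustrated with a minimal Python sketch. It assumes that a single-locus genotype is stored as a pair of allele labels and that an individual is a list of such pairs, one per locus; this representation and the function names are hypothetical and are not taken from the original study.

```python
from typing import Sequence, Tuple

Genotype = Tuple[str, str]  # the two alleles carried at one locus (assumed layout)


def locus_coancestry(x: Genotype, y: Genotype) -> float:
    """Molecular coancestry at one locus: mean of the four identity indicators."""
    a, b = x
    c, d = y
    return ((a == c) + (a == d) + (b == c) + (b == d)) / 4.0


def molecular_coancestry(x: Sequence[Genotype], y: Sequence[Genotype]) -> float:
    """Average of the single-locus molecular coancestries over all marker loci."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("both individuals need the same, non-zero number of loci")
    return sum(locus_coancestry(gx, gy) for gx, gy in zip(x, y)) / len(x)


if __name__ == "__main__":
    ind_x = [("A", "B"), ("C", "C")]
    ind_y = [("A", "B"), ("C", "D")]
    print(molecular_coancestry(ind_x, ind_y))
```

Two identical heterozygotes share two of the four allele comparisons, so their single-locus value is 0.5 even when they are unrelated; this is the alike-in-state contribution that has to be removed before the value can be read as a coancestry.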
Equation (5) shows that a value for $s_l$ is needed for each locus to obtain $f_{xy}$. If allele frequencies in the base population are known without error, $s_l$ is computed as $s_l = \sum_{i=1}^{n_l} p_i^2$, where $n_l$ is the number of alleles at locus $l$ and $p_i$ is the frequency of the $i$th allele at locus $l$ in the base population. Because allele frequencies in the base population are usually unknown, however, $s_l$ needs to be estimated. A similar problem arises in estimating any relatedness from molecular markers. In most of the published works (e.g. Ritland 1996; Lynch and Ritland 1999), allele frequencies have been estimated from the current population for which relatedness is estimated, meaning that the base population is set equal to the current population. For our purpose, this approximation leads to an apparent contradiction, because it implicitly assumes no drift in allele frequencies between the parent and progeny generations (i.e. $N_{eb} = \infty$).
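A small numerical sketch may help to see how $s_l$ enters the correction. It assumes that equation (5) has the form $1 - E[f_{M,xy,l}] = (1 - f_{xy})(1 - s_l)$, so that the coancestry is recovered as $(f_{M,xy,l} - s_l)/(1 - s_l)$; the function names are hypothetical.

```python
from typing import Sequence


def ais_probability(freqs: Sequence[float]) -> float:
    """Probability that two alleles drawn from the base population are alike in state."""
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to one")
    return sum(p * p for p in freqs)


def coancestry_from_molecular(f_m: float, s: float) -> float:
    """Remove the alike-in-state part from a molecular coancestry value."""
    if s >= 1.0:
        raise ValueError("a monomorphic locus (s = 1) carries no information")
    return (f_m - s) / (1.0 - s)


if __name__ == "__main__":
    s_two = ais_probability([0.5, 0.5])   # biallelic locus, equal frequencies
    s_ten = ais_probability([0.1] * 10)   # ten alleles, equal frequencies
    print(s_two, s_ten)
    # Expected molecular coancestry of full sibs (f = 0.25) at the biallelic locus
    expected_fm = 0.25 + (1.0 - 0.25) * s_two
    print(coancestry_from_molecular(expected_fm, s_two))
```

With two equally frequent alleles half of the observed similarity is alike in state, whereas with ten equally frequent alleles only a tenth is, which is why highly polymorphic loci are more informative.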
Estimation of $f_1$ from $f_{M,xy}$
Irrespective of the upward bias, simulations suggest that molecular coancestry can be a good indicator of the coancestry relative to an arbitrary base population (e.g. Toro et al. 2003; Oliehoek et al. 2006). We take advantage of this property to convert the molecular coancestry to the parent-based coancestry ($f_1$).
Suppose that $n$ individuals are sampled from the progeny in a given generation, for which $f_1$ is estimated. We assume that the sample consists of at least two nonsib families. This assumption will be satisfied except for a population with an extremely small number of parents, such as a population with only one male parent in a polygynous species. Thus, for a given individual in the sample, at least one nonsib pair should be included among the $n - 1$ possible pairs with other sampled members. The underlying concept of our estimation is that the nonsib pairs can be inferred from molecular coancestry. Fernández and Toro (2006) showed that a sibship can be reconstructed from molecular coancestry with high accuracy, suggesting that the inference of nonsib pairs based on molecular coancestry has a fairly high precision.
We assume that pairs inferred to be nonsibs (putative nonsibs) are true nonsibs (i.e. $f_{xy} = 0$). Thus, substituting the average molecular coancestry ($\bar{f}_{M,l}$) for locus $l$ over all pairs of putative nonsibs into (5) gives an estimate of $s_l$:

$$\hat{s}_l = \bar{f}_{M,l}. \tag{6}$$

With the weight $w_l$ to optimize the contributions of loci to the estimate of coancestry, suggested by Oliehoek et al. (2006), the parent-based coancestry between individuals $x$ and $y$, $f_{1,xy}$, is estimated as

$$\hat{f}_{1,xy} = \frac{1}{W}\sum_{l=1}^{L} w_l\,\frac{f_{M,xy,l} - \hat{s}_l}{1 - \hat{s}_l},$$

where $W = \sum_{l=1}^{L} w_l$, the weight $w_l$ is that of Oliehoek et al. (2006), and $\hat{p}_i$, which enters $w_l$, is the estimated frequency of allele $i$ at locus $l$ in the sampled individuals. Note that $w_l$ puts more weight on loci with small $s_l$ and with many alleles at nearly equal frequencies. The estimate of $f_1$ is simply obtained by averaging $\hat{f}_{1,xy}$ over the $n_P = n(n-1)/2$ pairs:

$$\hat{f}_1 = \frac{1}{n_P}\sum_{x<y} \hat{f}_{1,xy}.$$

From (3), $N_{eb}$ is then estimated by

$$\hat{N}_{eb} = \frac{1}{2\hat{f}_1}. \tag{7}$$

Selection method for putative nonsib pairs
The simplest method for selecting putative nonsibs from all the possible pairs is to select a given number ($n_0$) of pairs with the smallest molecular coancestry. However, this method leads to an underestimation of $s_l$, because of the positive correlation between $f_{M,xy}$ and $f_{M,xy,l}$ due to the finite number of marker loci ($L$). For example, in an extreme case where only one marker locus is available ($L = 1$), the selection of the smallest $f_{M,xy}$ automatically results in the selection of pairs with the smallest $f_{M,xy,l}$. When the number of selected pairs ($n_0$) is much smaller than the number of nonsib pairs that actually exist, the average of $f_{M,xy,l}$ over the selected $n_0$ pairs is expected to be lower than that over all the actual nonsib pairs, leading to an underestimation of $s_l$ [cf. equation (6)]. In a strictly statistical sense, the selection of putative nonsibs for the estimation of $s_l$ should be based on data independent of the sample from which $s_l$ is estimated.
This problem can be largely solved by excluding the information on locus $l$ when selecting putative nonsib pairs for the estimation of $s_l$. Denoting the molecular coancestry between individuals $x$ and $y$ excluding the information on locus $l$ by $f_{M,xy,/l}$, we can compute it as

$$f_{M,xy,/l} = \frac{L\,f_{M,xy} - f_{M,xy,l}}{L - 1}. \tag{8}$$

For estimating $s_l$, the selection of the $n_0$ pairs with the smallest coancestry is based on this partial molecular coancestry.
In the present study, the following selection method was applied: (i) Assign sequential numbers ($i = 1, 2, \ldots, n$) to the $n$ sampled individuals. (ii) For the first individual ($i = 1$), the pair with the smallest $f_{M,xy,/l}$ [computed from (8)] is selected from the $n - 1$ pairs with other members. (iii) For each succeeding individual ($i \geq 2$), the pair with the smallest $f_{M,xy,/l}$ is selected in the same manner. If pairs already selected in a previous step are included among the $n - 1$ candidate pairs, they are excluded from the candidates to avoid selecting the same pair twice. (iv) As a result, we obtain $n_0$ ($= n$) pairs with the smallest $f_{M,xy,/l}$. (v) $f_{M,xy,l}$ [computed from (4)] is averaged over the $n_0$ pairs. The average ($\bar{f}_{M,l}$) is the estimate of $s_l$ [cf. equation (6)]. (vi) Steps (ii)-(v) are repeated until estimates of $s_l$ are obtained for all marker loci.
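The selection steps and the resulting estimator can be sketched in Python as follows. This is an illustration under stated assumptions, not the program used in the study: all loci receive equal weight ($w_l = 1$) instead of the weights of Oliehoek et al. (2006), ties in the partial coancestry are broken by pair index, a nonpositive estimate of $f_1$ is returned as an infinite $N_{eb}$, and the genotype layout and function names are hypothetical.

```python
from itertools import combinations
from math import inf


def locus_coancestry(x, y):
    """Single-locus molecular coancestry, as in equation (4)."""
    (a, b), (c, d) = x, y
    return ((a == c) + (a == d) + (b == c) + (b == d)) / 4.0


def estimate_neb(genotypes):
    """genotypes[i][l] is the pair of alleles of individual i at locus l."""
    n, n_loci = len(genotypes), len(genotypes[0])
    if n < 3 or n_loci < 2:
        raise ValueError("need at least 3 individuals and 2 loci")
    pairs = list(combinations(range(n), 2))
    # Single-locus molecular coancestry of every pair, and its sum over loci.
    fm = {p: [locus_coancestry(genotypes[p[0]][l], genotypes[p[1]][l])
              for l in range(n_loci)] for p in pairs}
    total = {p: sum(values) for p, values in fm.items()}

    f1_sum = 0.0
    for l in range(n_loci):
        # Partial molecular coancestry: locus l is left out, as in equation (8).
        partial = {p: (total[p] - fm[p][l]) / (n_loci - 1) for p in pairs}
        selected = set()
        for i in range(n):
            candidates = [tuple(sorted((i, j))) for j in range(n) if j != i]
            candidates = [p for p in candidates if p not in selected]
            if candidates:  # empty only if all pairs of i were already selected
                selected.add(min(candidates, key=lambda p: (partial[p], p)))
        # Estimate of s_l: mean locus-l coancestry over putative nonsibs.
        s_hat = sum(fm[p][l] for p in selected) / len(selected)
        if s_hat >= 1.0:
            raise ValueError("locus %d is uninformative (s_hat = 1)" % l)
        mean_fm = sum(fm[p][l] for p in pairs) / len(pairs)
        f1_sum += (mean_fm - s_hat) / (1.0 - s_hat)

    f1_hat = f1_sum / n_loci  # equal weights across loci (assumption)
    return 1.0 / (2.0 * f1_hat) if f1_hat > 0 else inf


if __name__ == "__main__":
    toy = [[("A", "A"), ("A", "A")], [("A", "A"), ("A", "A")],
           [("B", "B"), ("B", "B")], [("B", "B"), ("B", "B")]]
    print(estimate_neb(toy))
```

In the toy sample of two pairs of identical individuals, the third individual has no unselected pair left except the one with its own sib, so a sib pair enters the set of putative nonsibs. This shows how forcing one pair per individual can inflate the estimate of $s_l$ when the sample is very small.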

Computer simulation
Computer simulation was carried out to evaluate the reliability of the presented method. Genotypes of individuals in the initial population were generated by assigning alleles randomly sampled from an infinite (conceptual) gene pool with a uniform allele frequency distribution, with two alleles for the 'low-polymorphic' marker loci case or 10 alleles for the 'high-polymorphic' marker loci case. The number of loci was 80 in both cases. Prior to progeny sampling for the estimation of $N_{eb}$, eight generations of random mating with a breeding system defined below were simulated to accumulate inbreeding and relationship. Two breeding systems, monogamy and polygyny, were modeled. Under the monogamy model, an equal number of male and female parents ($N/2$) were randomly paired to form $N/2$ permanent couples. A progeny (a parent of the next generation) was produced from a randomly sampled couple, and the sampling of a couple and the reproduction were repeated until $N/2$ replacements of each sex had been obtained. Under the polygyny model, $N_m$ males and $N_f$ ($> N_m$) females were generated, and each female was mated with a randomly sampled male (thus, there are $N_f$ fixed matings). A progeny was produced from a randomly sampled mating, and this was repeated to obtain $N_m$ males and $N_f$ females as the parents of the next generation. In the final generation, a sample of $n$ progeny was obtained by the same manner of reproduction as in the respective breeding system. From the loci each with at least two segregating alleles in the sampled progeny, $L$ = 5-30 loci were randomly chosen as marker loci. As the standard parental population size, $N = 10$ in monogamy, and $N_m = 5$ males and $N_f = 20$ females in polygyny, were used. The sample size of progeny ($n$) in the final generation was 100 for the two breeding systems.
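The reproduction step under monogamy can be sketched as below. The sketch assumes equal allele frequencies in the gene pool, independent (unlinked) loci, and a progeny sample drawn without regard to sex; these choices and all function names are illustrative assumptions rather than details of the original program.

```python
import random


def random_genotypes(n_ind, n_loci, n_alleles, rng):
    """Founders: both alleles at each locus drawn from equally frequent alleles."""
    return [[(rng.randrange(n_alleles), rng.randrange(n_alleles))
             for _ in range(n_loci)] for _ in range(n_ind)]


def offspring(sire, dam, rng):
    """Mendelian sampling: one allele from each parent, independently per locus."""
    return [(rng.choice(s), rng.choice(d)) for s, d in zip(sire, dam)]


def monogamy_progeny(males, females, n_progeny, rng):
    """Pair males and females at random once, then sample couples with replacement."""
    dams = females[:]
    rng.shuffle(dams)
    couples = list(zip(males, dams))  # permanent couples
    return [offspring(*rng.choice(couples), rng) for _ in range(n_progeny)]


if __name__ == "__main__":
    rng = random.Random(2024)
    males = random_genotypes(5, 80, 10, rng)
    females = random_genotypes(5, 80, 10, rng)
    sample = monogamy_progeny(males, females, 100, rng)
    print(len(sample), len(sample[0]))
```

Because couples are sampled with replacement, family sizes in the sample follow a multinomial distribution, which is the source of the variance in progeny number under this model.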
In the low-polymorphic marker loci case, all the marker loci have exactly two alleles ($n_l = 2$), as in single nucleotide polymorphisms, but the allele frequency distribution varies among the loci. In the high-polymorphic marker loci case, not only the allele frequency distribution but also the number of alleles varies among the loci. With the above standard population sizes, the average number of alleles per marker locus was 3.83 in monogamy and 5.31 in polygyny, which would be comparable with the allele number of microsatellite markers in a practical survey. This type of data generation is referred to as the 'inbred population' model, in the sense that the parental population of the sampled progeny consists of inbred and related individuals, which will be the general situation in populations of endangered species.
As another type of data generation, the 'noninbred population' model was also simulated. The assignment of initial genotypes and the simulation of preliminary generations were exactly the same as in the inbred population, except that the number of preliminary generations was seven. At the final generation, the allele frequency distribution at each locus was recorded. Then, genotypes of parents were regenerated by assigning alleles randomly sampled from an infinite gene pool with the recorded allele frequency distribution. The sampling of progeny and the choice of marker loci were the same as in the inbred population. These procedures produce a parental population consisting of noninbred and nonrelated individuals but having the same quality of molecular information as in the corresponding inbred population. This type of data generation can be regarded as an approximation of a recently recolonized population in an ephemeral habitat.
In additional computations, different sizes of parental population and progeny sample were examined. The effect of unequal contribution of parents on the estimates was also evaluated under monogamy with N = 10, by considering the following two patterns of unequal contributions of N/2 = 5 couples: (0.4, 0.3, 0.1, 0.1, 0.1) and (0.6, 0.1, 0.1, 0.1, 0.1). The number of replicated runs for each combination of population model, breeding system and variables was 5000.
The demographic effective number of breeders ($N_{eb,demo}$) under the monogamy model was computed from the standard formula of the inbreeding effective size (Caballero 1994):

$$N_{eb,demo} = \frac{N\mu_k - 2}{\mu_k - 1 + \sigma_k^2/\mu_k},$$

where $\mu_k = n/(N/2)$ and $\sigma_k^2$ are the mean and variance of the number of progeny of couples, respectively. The expression of $\sigma_k^2$ under the simulated condition is given in Appendix A. $N_{eb,demo}$ under polygyny is computed as

$$N_{eb,demo} = \frac{4 N_m N_f}{2 N_m + N_f - 1}.$$

The derivation of this equation is shown in Appendix B. $N_{eb}$ from pedigree coancestry was also computed, simply by substituting the average parent-based pedigree coancestry of the sampled progeny into (7). The computed $N_{eb}$ agreed well with $N_{eb,demo}$. Thus, only the value of $N_{eb,demo}$ is presented in the results, and it is referred to as the true value of the simulation. In addition to the estimate obtained from (7) (denoted as $\hat{N}_{eb,fmol}$ hereafter), the estimate from the heterozygote-excess method ($\hat{N}_{eb,he}$; Pudovkin et al. 1996) was computed for comparison. The locus-specific $\hat{N}_{eb,he,l}$ is estimated as

$$\hat{N}_{eb,he,l} = \frac{1}{2 D_l} + \frac{1}{2\,(D_l + 1)},$$

where $D_l$ is the relative excess of heterozygotes at locus $l$, obtained from $H_{obs,i}$ and $H_{exp,i}$, the observed and expected proportions of heterozygotes having allele $i$, respectively. The multilocus estimate was simply computed as the harmonic mean of $\hat{N}_{eb,he,l}$ over the marker loci, following the previous simulation studies (Pudovkin et al. 1996; Luikart and Cornuet 1999). In both methods, when a negative estimate was obtained, the estimate was regarded as infinite ($\hat{N}_{eb} = \infty$).
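As a check on the demographic values used as the true $N_{eb}$, the following sketch computes $N_{eb}$ from sibship probabilities through $N_{eb} = 1/(2 f_1)$. It is not the derivation given in the appendices. It assumes that progeny are drawn independently from the matings, that the parents are noninbred and unrelated (so that full sibs and half sibs have parent-based coancestries of 1/4 and 1/8), and that under polygyny each female is assigned a male at random; the function names are hypothetical.

```python
def neb_from_sib_probabilities(p_full_sib, p_half_sib):
    """N_eb = 1 / (2 f_1), with f_1 built from full-sib and half-sib probabilities."""
    f1 = p_full_sib / 4.0 + p_half_sib / 8.0
    return 1.0 / (2.0 * f1)


def neb_monogamy(contributions):
    """contributions[i]: expected share of the progeny produced by couple i."""
    if abs(sum(contributions) - 1.0) > 1e-9:
        raise ValueError("contributions must sum to one")
    p_full_sib = sum(c * c for c in contributions)  # two progeny from one couple
    return neb_from_sib_probabilities(p_full_sib, 0.0)


def neb_polygyny(n_males, n_females):
    """Each female has one randomly assigned mate; matings contribute equally."""
    p_full_sib = 1.0 / n_females                 # same dam implies same sire
    p_half_sib = (1.0 - p_full_sib) / n_males    # different dams, same sire
    return neb_from_sib_probabilities(p_full_sib, p_half_sib)


if __name__ == "__main__":
    print(round(neb_monogamy([0.2] * 5), 2))   # five couples, equal contributions
    print(round(neb_polygyny(5, 20), 2))       # 5 males and 20 females
```

Under these assumptions the sketch gives 10 for monogamy with $N = 10$ and approximately 13.79 for polygyny with $N_m = 5$ and $N_f = 20$, in agreement with the $N_{eb,demo}$ values quoted for the standard population sizes.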
As a criterion of evaluation, the harmonic mean of estimates over 5000 replicates was computed. Furthermore, to characterize the variation and distribution of estimates, the 10th, 50th and 90th percentiles in replicates were calculated. The $x$th percentile was obtained as the $5000 \times (x/100)$th smallest estimate among the 5000 replicated estimates.
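These summary statistics can be sketched as follows. The sketch assumes that an infinite estimate contributes a reciprocal of zero to the harmonic mean, and it rounds the rank up when the number of replicates times $x/100$ is not an integer (with 5000 replicates the rank is always an integer); both conventions and the function names are assumptions made for illustration.

```python
import math


def clip_negative(estimate):
    """A negative estimate is regarded as infinite."""
    return math.inf if estimate < 0 else estimate


def harmonic_mean(estimates):
    """Harmonic mean in which infinite estimates add zero to the reciprocal sum."""
    reciprocal_sum = sum(0.0 if math.isinf(e) else 1.0 / e for e in estimates)
    return len(estimates) / reciprocal_sum if reciprocal_sum > 0 else math.inf


def percentile(estimates, x):
    """The R * (x / 100)-th smallest of R estimates (rank rounded up)."""
    rank = max(math.ceil(len(estimates) * x / 100.0), 1)
    return sorted(estimates)[rank - 1]


if __name__ == "__main__":
    replicates = [clip_negative(e) for e in (5.0, 10.0, -3.2, 20.0)]
    print(harmonic_mean(replicates))
    print(percentile(replicates, 50), percentile(replicates, 90))
```

The harmonic mean stays finite as long as at least one replicate is finite, whereas an upper percentile becomes infinite as soon as enough replicates are infinite, which is why both summaries are informative.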

Results and discussion
The left and middle panels in Fig. 1 (A: monogamy and B: polygyny) illustrate the 10th, 50th and 90th percentiles, and the harmonic mean, of 5000 replicated estimates of the effective number of breeders ($N_{eb}$) from the heterozygote-excess and molecular coancestry methods applied to the noninbred population with $L$ = 5-20 high-polymorphic marker loci. The three percentiles indicate that the distributions of estimates from both methods are skewed upward. The 50th percentile and harmonic mean were, however, close to $N_{eb,demo}$ (10 for monogamy and 13.79 for polygyny) in both methods. Under monogamy, the interval between the 10th and 90th percentiles in $\hat{N}_{eb,he}$ tended to be wider than that in $\hat{N}_{eb,fmol}$, whereas the reverse tendency was observed under polygyny.
The corresponding simulation results in the inbred population are shown in Fig. 2. Although the 50th percentile and harmonic mean show that the heterozygote-excess method gives an essentially unbiased estimate of $N_{eb}$, the estimate from the molecular coancestry method tends to be biased downward. The degree of bias became larger as the number of marker loci increased. Inbreeding and relationship in the parental population had quite different impacts on the confidence interval in the two methods. The interval between the 10th and 90th percentiles in $\hat{N}_{eb,he}$ was widened in the inbred population, compared with that in the noninbred population (Fig. 1). The increase of the confidence interval was more remarkable under monogamy. In fact, the 90th percentile under monogamy was infinite even with $L = 20$ marker loci. In contrast, the interval in $\hat{N}_{eb,fmol}$ was remarkably narrowed in the inbred population. For example, the 10th and 90th percentiles in $\hat{N}_{eb,fmol}$ under monogamy with $L = 20$ marker loci were 3.75 and 12.93, respectively.
In a strict sense, the heterozygote-excess method is valid only when the progeny are produced by random union of gametes (Pudovkin et al. 1996; Luikart and Cornuet 1999). When the progeny are produced by individual-based pairwise matings such as monogamy and polygyny, the sample of progeny is family-structured. In such a sample, heterozygote deficiency generated by the interfamily Wahlund effect may mask the heterozygote excess, reducing the usefulness of the heterozygote-excess method (Luikart and Cornuet 1999). Using computer simulation, Luikart and Cornuet (1999) examined the effect of a family-structured sample on the reliability of the heterozygote-excess method. They found that the heterozygote-excess method gives an essentially unbiased estimate even with a family-structured sample. However, the existence of family structure in the sampled progeny substantially increased the variance of estimates under monogamy. The simulation data of Luikart and Cornuet (1999) were generated in the same manner as the noninbred population of the present study. Thus, their sample of progeny contains only sib families. On the other hand, the sample of progeny from the inbred population consists of families with various degrees of relationship (e.g. cousins). The increased confidence interval observed in Fig. 2 indicates that the application of the heterozygote-excess method to such a sample reduces the reliability, although the method still gives an unbiased estimate. The reduction of reliability will be more serious under monogamy (Fig. 2).
As detailed information on the estimation process in the molecular coancestry method, Table 1 gives the observed and estimated [from equation (6)] AIS probability ($s_l$) in the parental population, and the average estimated parent-based coancestry among actual nonsibs (NS), actual half-sibs (HS), actual full-sibs (FS) and all pairs of sampled progeny, for the case of monogamy and polygyny with $L = 15$ high-polymorphic marker loci. All the values are shown as the average over 5000 replicates (and over 15 marker loci for $s_l$). In the noninbred population, the estimated AIS probability was close to the observed value, giving average estimates of the parent-based coancestries in the three categories (NS, HS and FS) close to the pedigree coancestries, i.e. 0, 0.125 and 0.25 for NS, HS and FS, respectively. Thus, the molecular coancestry method gives an essentially unbiased estimate of $N_{eb}$ for the noninbred population (Fig. 1). However, the process of selecting putative nonsibs in the molecular coancestry method causes a problem when applied to the inbred population. The selection method may select the actual nonsibs with a reasonably high probability. But the putative nonsibs selected from the inbred population may be less related, with regard to more remote ancestral relationships, than the average nonsibs among the sampled progeny. As seen from Table 1, this causes an underestimation of the AIS probability, implying that the base population for coancestry is set at a generation further back than the parental generation. This overrun in setting the base population results in an overestimation of the parent-based coancestry, leading to the downward bias of $\hat{N}_{eb,fmol}$ observed in Fig. 2. Irrespective of this drawback, the narrow confidence interval of $\hat{N}_{eb,fmol}$ in the inbred population is attractive for practical use. Although the molecular coancestry method will be less useful for a point estimate of $N_{eb}$ in inbred populations, it will be useful for detecting a small $N_{eb}$.
The simulation results for the estimation with the low-polymorphic marker loci are shown in the left and middle panels in Fig. 3(A) for noninbred and Fig. 3(B) for inbred populations in monogamy. Results in polygyny (data not shown) were essentially similar to those in monogamy. As seen from the 10th and 90th percentiles in $\hat{N}_{eb,he}$, the heterozygote-excess method suffers from a larger confidence interval. In fact, even with $L = 30$ marker loci, the 90th percentile in $\hat{N}_{eb,he}$ was still infinite in both noninbred and inbred populations. In contrast, the molecular coancestry method gave an estimate with a practically acceptable confidence interval when $L = 30$ marker loci were available.
Table 2 shows the results from simulation runs with additional combinations of the number of parents and sample size, for the case of $L = 15$ high-polymorphic marker loci. As the harmonic mean of replicated estimates agreed well with the 50th percentile, it is not shown in the table. The general properties of estimates, e.g. a small bias of estimation from both methods in the noninbred population and a downward bias of $\hat{N}_{eb,fmol}$ in the inbred population, were similar to those observed in Figs 1-3. A remarkable point in Table 2 is the narrower confidence interval of $\hat{N}_{eb,fmol}$ in a small sample of progeny from a small inbred population. For example, under monogamy with $N = 10$ parents, the 90th percentile of $\hat{N}_{eb,fmol}$ from $n = 10$ progeny was 38.2, while the corresponding percentile of $\hat{N}_{eb,he}$ was infinite. In most practical situations of conservation biology, the population in question will be small and inbred, and may suffer from a low reproductive ability. The molecular coancestry method could significantly contribute to the detection of a small $N_{eb}$ in such populations. The magnitude of the downward bias of $\hat{N}_{eb,fmol}$ increased in a larger inbred population, as seen from the 50th percentiles in monogamy with $N = 50$ and polygyny with $N_m = 20$ and $N_f = 80$, which may limit the usefulness of the molecular coancestry method. However, even in these populations, the narrow confidence interval of $\hat{N}_{eb,fmol}$ would be of practical significance for obtaining a conservative estimate of $N_{eb}$.

Figure 1 Harmonic mean (marked by open circle), and 10th, 50th and 90th percentiles (marked by bar) of 5000 estimated effective numbers of breeders in the noninbred population under (A) monogamy with $N = 10$ (half of each sex) parents and (B) polygyny with $N_m = 5$ male and $N_f = 20$ female parents, for the case of high-polymorphic marker loci. The sample size of progeny is $n = 100$. $\hat{N}_{eb,he}$ is the estimate from the heterozygote-excess method (Pudovkin et al. 1996), $\hat{N}_{eb,fmol}$ the estimate from equation (7) and $\hat{N}_{eb,comb}$ the estimate by the harmonic mean of $\hat{N}_{eb,he}$ and $\hat{N}_{eb,fmol}$. The value at the top of each graph is the clipped 90th percentile, and the value in parentheses is the percentage of replicates with $\hat{N}_{eb} = \infty$. The dashed line shows the effective number of breeders expected from demographic parameters ($N_{eb,demo}$ = 10 under monogamy and 13.79 under polygyny, respectively).

Figure 2 Harmonic mean (marked by open circle), and 10th, 50th and 90th percentiles (marked by bar) of 5000 estimated effective numbers of breeders in the inbred population under (A) monogamy with $N = 10$ (half of each sex) parents and (B) polygyny with $N_m = 5$ male and $N_f = 20$ female parents, for the case of high-polymorphic marker loci. The sample size of progeny is $n = 100$. $\hat{N}_{eb,he}$ is the estimate from the heterozygote-excess method (Pudovkin et al. 1996), $\hat{N}_{eb,fmol}$ the estimate from equation (7) and $\hat{N}_{eb,comb}$ the estimate by the harmonic mean of $\hat{N}_{eb,he}$ and $\hat{N}_{eb,fmol}$. The value at the top of each graph is the clipped 90th percentile, and the value in parentheses is the percentage of replicates with $\hat{N}_{eb} = \infty$. The dashed line shows the effective number of breeders expected from demographic parameters ($N_{eb,demo}$ = 10 under monogamy and 13.79 under polygyny, respectively).

Figure 3 Harmonic mean (marked by open circle), and 10th, 50th and 90th percentiles (marked by bar) of 5000 estimated effective numbers of breeders in the (A) noninbred and (B) inbred populations under monogamy with $N = 10$ (half of each sex) parents, for the case of low-polymorphic marker loci. The sample size of progeny is $n = 100$. $\hat{N}_{eb,he}$ is the estimate from the heterozygote-excess method (Pudovkin et al. 1996), $\hat{N}_{eb,fmol}$ the estimate from equation (7) and $\hat{N}_{eb,comb}$ the estimate by the harmonic mean of $\hat{N}_{eb,he}$ and $\hat{N}_{eb,fmol}$. The value at the top of each graph is the clipped 90th percentile, and the value in parentheses is the percentage of replicates with $\hat{N}_{eb} = \infty$. The dashed line shows the effective number of breeders expected from demographic parameters ($N_{eb,demo}$ = 10).

Table 1. Observed and estimated AIS probability, and estimated parent-based coancestries among actual nonsibs (NS), actual half-sibs (HS), actual full-sibs (FS) and all pairs of sampled progeny from the noninbred and inbred parental populations under monogamy with $N = 10$ parents or polygyny with $N_m = 5$ male and $N_f = 20$ female parents, for the case of $L = 15$ high-polymorphic marker loci and the sample size of $n = 100$. The AIS probability is the average over 5000 replicates and 15 marker loci, and the coancestry is the average over 5000 replicates.
The effect of unequal contributions of parents on estimates of $N_{eb}$ is shown in Table 3, in which monogamy with $N = 10$ (half of each sex) parents and a sample size of $n = 100$ offspring was assumed. In all the cases computed, the 90th percentile in the molecular coancestry method was much smaller than in the heterozygote-excess method. As unequal contribution of parents is an important factor for a smaller $N_e$ than the census number of breeders (Frankham 1995), the higher accuracy of the present method observed in Table 3 will be a practically appealing point.

Table 2. Percentiles (10th, 50th and 90th) of estimated effective number of breeders for 5000 replicated simulation runs in the noninbred and inbred populations with several additional combinations of the number of parents and sample size. Fifteen ($L = 15$) high-polymorphic marker loci were assumed. $N$, the number of parents (half of each sex) in monogamy; $N_m$, the number of male parents; $N_f$, the number of female parents in polygyny; $N_{eb,demo}$, effective number of breeders expected from demographic parameters; $\hat{N}_{eb,he}$, estimated $N_{eb}$ from the heterozygote-excess method; $\hat{N}_{eb,fmol}$, estimated $N_{eb}$ from equation (7); $\hat{N}_{eb,comb}$, harmonic mean of $\hat{N}_{eb,he}$ and $\hat{N}_{eb,fmol}$. Figures in parentheses are the percentage of replicates with $\hat{N}_{eb} = \infty$.

Table 3. Percentiles (10th, 50th and 90th) of estimated effective number of breeders for 5000 replicated simulation runs with unequal contribution of parents under monogamy in the noninbred and inbred populations with $N = 10$ (half of each sex) parents and the sample size of $n = 100$. Fifteen ($L = 15$) high-polymorphic marker loci were assumed. Contribution: expected contributions of $N/2 = 5$ couples to the sample. $N_{eb,demo}$, effective number of breeders expected from demographic parameters; $\hat{N}_{eb,he}$, estimated $N_{eb}$ from the heterozygote-excess method; $\hat{N}_{eb,fmol}$, estimated $N_{eb}$ from equation (7); $\hat{N}_{eb,comb}$, harmonic mean of $\hat{N}_{eb,he}$ and $\hat{N}_{eb,fmol}$. Figures in parentheses are the percentage of replicates with $\hat{N}_{eb} = \infty$.

Figure 4 represents the joint distribution of estimates from the heterozygote-excess and molecular coancestry methods applied to the inbred populations under polygyny with $N_m = 5$ and $N_f = 20$ parents and $L = 15$ high-polymorphic marker loci. The product-moment and Spearman's rank correlations, excluding the pairs with an infinite estimate, were −0.003 and −0.164, respectively. In all other cases simulated, correlations of similar magnitude were obtained. An interesting point in Fig. 4 is that the incidence of overestimation in the two methods tends to be mutually exclusive. At present, it is not theoretically obvious how to combine several estimates of $N_{eb}$ optimally to give a single best estimate (Wang 2005). As a tentative method, I combined the two estimates as the harmonic mean, according to the suggestion of Waples (1991):

$$\hat{N}_{eb,comb} = \frac{2}{1/\hat{N}_{eb,he} + 1/\hat{N}_{eb,fmol}}.$$
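The harmonic-mean combination of the two estimates suggested by Waples (1991) can be sketched as below. The treatment of an infinite estimate as a reciprocal of zero is an assumption of this sketch, and the function name is hypothetical.

```python
import math


def combine_harmonic(neb_he, neb_fmol):
    """Harmonic mean of two positive (possibly infinite) estimates of N_eb."""
    if neb_he <= 0 or neb_fmol <= 0:
        raise ValueError("map negative estimates to infinity before combining")
    reciprocal_sum = sum(0.0 if math.isinf(v) else 1.0 / v
                         for v in (neb_he, neb_fmol))
    return 2.0 / reciprocal_sum if reciprocal_sum > 0 else math.inf


if __name__ == "__main__":
    print(combine_harmonic(10.0, 10.0))
    print(combine_harmonic(math.inf, 10.0))
    print(combine_harmonic(1000.0, 10.0))
```

Under this convention a gross overestimate from one method can at most double the estimate from the other, so the combined value never exceeds twice the smaller of the two estimates.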
The harmonic mean is expected to work well in the present case, because overestimation rarely occurs in both methods at once; an overestimated N eb returned by one method is filtered out and the combined estimate is largely determined by the estimate from the other method. The property of the combined estimate is shown in the right panels in Figs 1-3 and the column of $\hat{N}_{eb,comb}$ in Tables 2 and 3. The combined estimate in the inbred population was biased downward because of the downward bias of $\hat{N}_{eb,fmol}$. However, as expected, the confidence interval of the estimate was substantially narrowed compared with the separate estimates. It is notable that the improvement is larger for lower marker quality, i.e. for a smaller number of marker loci and/or a smaller number of alleles in each locus (Figs 1-3), and for a smaller sample size (Table 2). Although the development of an optimal method for combining separate estimates into a single estimate deserves further investigation with sophisticated statistical tools, the above results strongly suggest that a highly reliable estimate can be obtained from the optimal combination.

Some of the limitations of the method proposed in this study are shared by most of the published methods: marker alleles are assumed to be selectively neutral, mating within the population is assumed to be random and immigration from other populations is assumed to be absent (Leberg 2005). In addition, the present method involves a problem associated with age at sampling. Estimation of N e from the recurrence equation (1) is based on the assumption that the average coancestries in two successive generations are measured at the same age stage. In fact, the application of the present method to a sample of juveniles gives an estimate of 'the effective number of breeders'. But even in a population with nonoverlapping generations, this estimate can differ greatly from N e , depending on the survival pattern of juveniles to adults.
Following Crow and Morton (1955), we consider two extreme patterns of survival: (i) random survival and (ii) survival of the family as a unit. In the random survival model, survival from juvenile to adult is randomly determined with the expected survival rate s. Under this pattern of survival, the average coancestry among adults is expected to be unchanged from that among the juveniles. Thus, if the present method is applied to a population with nonoverlapping generations, $N_e = E[\hat{N}_{eb,fmol}]$. Under the survival of the family as a unit, all juveniles in a family either survive or do not. With the average survival rate s in the population, $\hat{N}_{eb,fmol}$ obtained from a sample of juveniles is related to N e as $N_e = sE[\hat{N}_{eb,fmol}]$ (for the theoretical aspect of the above consideration, see Appendix C). Although this model describes an extreme pattern of survival, $\hat{N}_{eb,fmol}$ of animals with low fecundity and high survival rate, such as mammals and birds in which parental nursing for their brood is generally observed, should be cautiously interpreted. On the other hand, $\hat{N}_{eb,fmol}$ will give an appropriate estimate of N e when the method is applied to animals with high fecundity and low survival rate, such as marine invertebrates and fishes, whose survival seems to be essentially random.
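The contrast between the two survival patterns can be illustrated with a small simulation. The sketch below is not the derivation in Appendix C: it uses the fraction of pairs that are full sibs as a crude stand-in for the average coancestry, which is an assumption of this example rather than equation (7). The function names, the family numbers and the choice to let exactly round(s x families) families survive are likewise illustrative assumptions.

```python
import random
from collections import Counter

def sib_pair_fraction(family_ids):
    """Fraction of all pairs of individuals that belong to the same family."""
    n = len(family_ids)
    sib_pairs = sum(c * (c - 1) // 2 for c in Counter(family_ids).values())
    return sib_pairs / (n * (n - 1) / 2)

def survival_ratios(n_families=50, juveniles_per_family=40, s=0.5, seed=1):
    """Sib-pair fraction among adults relative to juveniles, for both patterns."""
    rng = random.Random(seed)
    juveniles = [fam for fam in range(n_families)
                 for _ in range(juveniles_per_family)]
    base = sib_pair_fraction(juveniles)

    # (i) random survival: each juvenile survives independently with probability s
    random_adults = [fam for fam in juveniles if rng.random() < s]

    # (ii) survival of the family as a unit: whole families survive or die
    kept = set(rng.sample(range(n_families), round(s * n_families)))
    family_adults = [fam for fam in juveniles if fam in kept]

    return (sib_pair_fraction(random_adults) / base,
            sib_pair_fraction(family_adults) / base)

ratio_random, ratio_family = survival_ratios()
print(ratio_random, ratio_family)
```

Under random survival the ratio stays close to 1, whereas under family survival it is close to 1/s. If an estimate of N eb is taken to be roughly inversely proportional to the sib-pair fraction (again an assumption of this sketch), this corresponds to the factor s relating the juvenile-based estimate to N e under family survival.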
The present method involves additional problems associated with the selection method for putative nonsibs. One is the choice of the number (n 0 ) of pairs selected as putative nonsibs. Although the selection method applied in the present study automatically sets n 0 to the number (n) of sampled progeny, this is an arbitrary choice. With a smaller n 0 , it is more likely that the selected pairs are actually nonsibs, but the coancestry among them will underestimate the AIS probability, and vice versa. Another problem is the drift-induced linkage disequilibrium among marker loci. In small populations, the drift-induced linkage disequilibrium may be an important factor (Hill 1981) and reduce the degree to which loci provide independent information about coancestry. This may reduce the effectiveness of the selection criterion of putative nonsibs defined by equation (8).

One potential approach to solving these problems and improving the estimates of N eb from molecular coancestry is the use of a sib-ship reconstruction technique. To date, several methods for sib-ship reconstruction from molecular markers have been developed using different algorithms, such as the Markov Chain Monte Carlo (MCMC) algorithm (Almudevar and Field 1999;Thomas and Hill 2002;Wang 2004) and simulated annealing (Almudevar 2003;Fernández and Toro 2006), and have been reviewed by Blouin (2003) and Butler et al. (2004). I here take the method proposed by Fernández and Toro (2006) as a trial example of the use of a sib-ship reconstruction technique for estimating N eb . By the use of their method, we can find the sib-ships among sampled individuals that yield a parent-based coancestry matrix with the highest correlation with the molecular coancestry matrix. A notable feature of their method is that it is free from the assumption of linkage equilibrium among marker loci.
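The criterion described above, the correlation between a parent-based coancestry matrix and the molecular coancestry matrix, can be sketched as a scoring function for a candidate sib-ship. This is an illustration only and not the implementation of Fernández and Toro (2006): the search step (e.g. simulated annealing) is omitted, the coancestry values of 0.25 for full sibs and 0.125 for half sibs assume noninbred, unrelated parents, using only off-diagonal elements is an assumption, and all function names and the toy matrix are hypothetical.

```python
from math import sqrt

def parent_based_coancestry(sires, dams):
    """Coancestry matrix implied by candidate parent labels.

    Assumes noninbred, unrelated parents: 0.5 with self, 0.125 per shared parent.
    """
    n = len(sires)
    f = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                f[i][j] = 0.5
            else:
                shared = int(sires[i] == sires[j]) + int(dams[i] == dams[j])
                f[i][j] = 0.125 * shared
    return f

def offdiag_correlation(a, b):
    """Pearson correlation between the off-diagonal elements of two matrices."""
    n = len(a)
    xs = [a[i][j] for i in range(n) for j in range(i + 1, n)]
    ys = [b[i][j] for i in range(n) for j in range(i + 1, n)]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0.0 or syy == 0.0:   # no variation, correlation undefined
        return 0.0
    return sxy / sqrt(sxx * syy)

def score(sires, dams, molecular):
    """Score of a candidate sib-ship: higher means a better match."""
    return offdiag_correlation(parent_based_coancestry(sires, dams), molecular)

# Toy molecular coancestry matrix for four progeny (made-up values):
# individuals 0-1 and 2-3 share more alleles than the other pairs.
molecular = [[0.55, 0.30, 0.10, 0.12],
             [0.30, 0.52, 0.08, 0.11],
             [0.10, 0.08, 0.50, 0.33],
             [0.12, 0.11, 0.33, 0.54]]
good = score([0, 0, 1, 1], [0, 0, 1, 1], molecular)   # families {0,1} and {2,3}
bad = score([0, 1, 0, 1], [0, 1, 0, 1], molecular)    # families {0,2} and {1,3}
print(good > bad)
```

A reconstruction algorithm would search over candidate parent labels to maximize such a score; the sketch only shows how one candidate is evaluated.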
Two methods for the use of the reconstructed sib-ships were examined. In the first method (SR1), the reconstructed sib-ships were directly used for computing f 1 in equation (7). In the second method (SR2), the average locus-specific coancestry among the inferred nonsib pairs was used for estimating s l as in equation (6). A simulation with 200 replicates was run for the case of polygyny in the inbred population with N m = 5 and N f = 20 parents, a sample of n = 100 progeny and L = 15 high-polymorphic marker loci. The results are summarized in Table 4. The two methods with sib-ship reconstruction worked quite well; they gave nearly unbiased estimates and narrower confidence intervals. Although further evaluations including other published methods for sib-ship reconstruction should be carried out under a wide range of scenarios, the results in Table 4 suggest the potential for improving the molecular coancestry method.

Table 4 notes. The corresponding values from the heterozygote-excess ($\hat{N}_{eb,he}$) and molecular coancestry ($\hat{N}_{eb,fmol}$) methods are also presented. Polygyny with N m = 5 male and N f = 20 female parents in the inbred population with L = 15 high-polymorphic marker loci and the sample size of n = 100 was assumed. The effective number of breeders expected from demographic parameters is 13.79. Figures in parentheses are the percentage of replicates with $\hat{N}_{eb} = \infty$.
Appendix C - Effect of age at sampling on relation between N e and N eb

For simplicity, consider a population of monogamous species with an equal number (N/2 = N m = N f ) of male and female parents. Generations are assumed to be discrete (nonoverlapping). Let k ei be the number of offspring at the early age stage (juveniles) contributed by family (couple) i, and k ai be the number of offspring at the later age stage (reproductive adults) contributed by family i. The average survival rate from juvenile to adult is s. According to the standard formula of effective population size (Caballero 1994), the effective number of breeders of juveniles N eb and the effective population size N e (or equivalently the effective number of breeders of adults) are expressed as

$$N_{eb} = \frac{N\mu_{ke} - 1}{\mu_{ke} - 1 + \sigma^2_{ke}/\mu_{ke}}$$

and