Degree of adaptive response in urban tolerant birds shows influence of habitat-of-origin

Urban exploiters and adapters are often grouped together under the convenient term ‘urban tolerant’. This useful but simplistic characterisation masks a more nuanced interplay between and within assemblages of birds that are more or less well adapted to a range of urban habitats. I test two hypotheses: that objectively defined urban exploiter and suburban adapter assemblages within the broad urban tolerant grouping in Melbourne differ in their responses to predictor variables, and that the most explanatory predictor variables differ between the two assemblages. A paired, partitioned analysis of exploiter and adapter preferences for points along the urban–rural gradient was undertaken to decompose the overall trend into diagnosable parts for each assemblage. Just as time since establishment has been found to be related to high urban densities of some bird species, and biogeographic origin to be predictive of the extent of urban adaptation, the habitat origins of members of bird assemblages influence the degree to which they become urban tolerant. Bird species that objectively classify as urban tolerant further classify as either exploiters or adapters according to the degree of openness of their habitats-of-origin.

Each survey represents a list of species for a defined area and time (ranging from 20 minutes to one month), with geographic co-ordinates. All data were collected between 1998 and 2002. Data initially extracted for this study included 4,221 2-ha searches, 4,993 small area searches, 793 large area searches, and 1,427 incidental observations, and were compiled in a matrix as species and their relative abundance (number of surveys in which a species was recorded in a cell divided by the total number of surveys conducted in the cell) by site.
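The relative abundance measure described above can be sketched in base R as follows. This is a minimal illustration rather than the script used in the study: the data frame 'surveys', its column names (survey_id, cell, species), and all values are hypothetical.

```r
# Hypothetical survey records: one row per species recorded in a survey.
surveys <- data.frame(
  survey_id = c(1, 1, 2, 3, 3, 4),
  cell      = c("A", "A", "A", "B", "B", "B"),
  species   = c("sp1", "sp2", "sp1", "sp2", "sp3", "sp2")
)

# Total number of surveys conducted in each cell.
n_surveys <- tapply(surveys$survey_id, surveys$cell,
                    function(x) length(unique(x)))

# Number of surveys in which each species was recorded, by cell.
# unique() guards against a species being listed twice in one survey.
recs <- unique(surveys[, c("survey_id", "cell", "species")])
n_recorded <- table(recs$cell, recs$species)

# Relative abundance: cells (sites) as rows, species as columns.
rel_abund <- sweep(n_recorded, 1,
                   as.numeric(n_surveys[rownames(n_recorded)]), "/")
print(rel_abund)
```

Each value is the fraction of a cell's surveys in which the species was recorded, so it is bounded between 0 and 1 regardless of how many surveys a cell received.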
Using ArcMap GIS, a 1 x 1 km grid based on that developed by the Australian Research Centre for Urban Ecology (ARCUE) (Hahs & McDonnell, 2006) was intersected with Atlas records to produce a matrix of grid cells by species presence/absence. All surveys were assigned to the grid cell in which the central geographic coordinates fell, regardless of survey spatial or temporal scale. It was assumed that most large area searches (6.9% of the surveys in the unfiltered data set) referred to areas of between 500–2,000 m diameter, and therefore could reasonably be assigned to 1 x 1 km grid cells within which the central coordinates fell.
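The assignment of surveys to grid cells was done in ArcMap; the underlying rule (a survey belongs to the cell containing its central coordinates) can nonetheless be illustrated with a short base R sketch. The function name 'assign_cell', the grid origin, and the assumption that coordinates are projected eastings and northings in metres are all hypothetical and are not taken from the study.

```r
# Assign surveys to 1 x 1 km grid cells from their central coordinates.
# Assumes projected coordinates in metres; the origin (x0, y0) is made up.
assign_cell <- function(easting, northing, x0 = 0, y0 = 0, size = 1000) {
  col <- floor((easting - x0) / size) + 1    # column index, west to east
  row <- floor((northing - y0) / size) + 1   # row index, south to north
  paste(col, row, sep = "_")                 # cell label "column_row"
}

# Three hypothetical survey centroids.
cells <- assign_cell(c(250, 1500, 65999), c(999, 1000, 64999))
print(cells)
```

Because only the centroid is used, a large area search is attributed to a single cell even if its footprint overlaps neighbouring cells, which is the simplification acknowledged in the text.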

Estimated sampling completeness
Because less abundant species are likely to be missed where sampling effort is lower, leading to uneven representation of species (Watson, 2003), a measure of estimated sampling completeness was calculated for each grid cell. This allowed the evenness of sampling to be assessed, and unrepresentative samples to be removed from the data before analysis.
First, the predicted number of species (SChao2) was calculated for each cell in a 66 km x 65 km grid, using the Chao2 formula (Chao, 1987) (Supplementary Formula 1), where Sobs equals the number of species observed, Q1 the number of unique records (species observed only once at a site during surveys), and Q2 the number of doubletons (species observed only twice). SChao2 is the estimated total number of species present at survey sites, including those not found during surveys.

Supplementary Formula 1: Chao2
From these calculations a standardized measure of sampling completeness (%Completeness) was also calculated for each grid cell, with observed species richness (Sobs) as a proportion of predicted species richness (SChao2) (Peterson & Slade, 1998).
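As an illustration of the two quantities defined above, the following base R sketch computes SChao2 and %Completeness for a single grid cell. It assumes the classic form of the estimator, SChao2 = Sobs + Q1^2 / (2 x Q2); the function name 'chao2' and the example detection matrix are hypothetical.

```r
# Classic Chao2 estimator (assumed form): S_obs + Q1^2 / (2 * Q2).
# 'incidence' is a surveys x species matrix of 0/1 detections for one cell.
chao2 <- function(incidence) {
  freq  <- colSums(incidence > 0)  # number of surveys detecting each species
  s_obs <- sum(freq > 0)           # observed species richness
  q1    <- sum(freq == 1)          # uniques: species found in one survey
  q2    <- sum(freq == 2)          # doubletons: species found in two surveys
  # Undefined without doubletons; such cells are returned as NA.
  s_est <- if (q2 > 0) s_obs + q1^2 / (2 * q2) else NA_real_
  c(S_obs = s_obs, Q1 = q1, Q2 = q2, S_Chao2 = s_est,
    completeness = 100 * s_obs / s_est)
}

# Hypothetical cell: 4 surveys (rows) by 5 candidate species (columns).
inc <- matrix(c(1, 1, 1, 1,
                1, 0, 0, 0,
                0, 1, 0, 0,
                1, 1, 0, 0,
                0, 0, 0, 0), nrow = 4)
res <- chao2(inc)
print(res)
```

In this example four species are observed, two of them only once and one exactly twice, so the estimate is 4 + 4/2 = 6 species and the cell is about 67% complete.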

Data organisation
Several assumptions were made about which species to exclude. Species were not included in grid cell totals and were eliminated from further analyses if they met any of the following exclusion criteria: (i) fewer than five records in the total dataset; (ii) an irregular or vagrant species in the area, or a feral species not yet naturalised, as determined from the literature (Barrett et al., 2003); or (iii) a seabird, waterbird, or nocturnal species, except the Tawny Frogmouth Podargus strigoides, from the orders or families Anseriformes, Podicipediformes, Strigiformes, Eurostopodidae, Aegothelidae, Procellariiformes, Sphenisciformes, Phalacrocoraciformes, Ciconiiformes, Gruiformes, Charadriiformes (sensu Christidis & Boles, 2008). A final list of 141 species (hereafter 'the filtered species list') was retained for further analysis.
Grid cells were eliminated from further analyses if any of the following exclusion criteria were met: (i) ≤ 1 survey in the cell; (ii) %Completeness < 50%; (iii) land area < 25% of the cell; or (iv) a high proportion of singleton records (>50%) and/or no doubleton records (indicating skewed data collection, e.g. single-species or other narrowly targeted surveys). A final list of 390 grid cells was retained for further analysis.
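The four grid cell exclusion criteria can be expressed as a single logical filter. The sketch below uses base R; the data frame 'cells', its column names, and its values are hypothetical and serve only to show how the criteria combine.

```r
# Hypothetical per-cell summary table; all names and values are made up.
cells <- data.frame(
  cell           = c("A", "B", "C", "D", "E"),
  n_surveys      = c(12, 1, 8, 20, 6),
  completeness   = c(78, 60, 42, 85, 70),       # %Completeness
  land_prop      = c(0.9, 1.0, 0.8, 0.1, 0.7),  # proportion of cell that is land
  singleton_prop = c(0.2, 0.3, 0.4, 0.1, 0.6),  # proportion of singleton records
  n_doubletons   = c(5, 2, 3, 6, 0)
)

# A cell is dropped if it meets ANY of the exclusion criteria (i) to (iv).
drop <- with(cells,
  n_surveys <= 1 |                               # (i)
  completeness < 50 |                            # (ii)
  land_prop < 0.25 |                             # (iii)
  singleton_prop > 0.5 | n_doubletons == 0       # (iv)
)

retained <- cells[!drop, ]
print(retained)
```

Here only cell A survives: B has a single survey, C is under 50% complete, D is mostly water, and E is dominated by singletons with no doubletons.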
Spatial data on the degree of urbanisation of the study area employed in this study were developed at
The data for cluster analysis consisted of a standard row-by-column ('r x c') array, with sites as rows, species as columns, and relative abundance (% incidence in surveys conducted in each cell) for species occurring in each grid cell. A Bray-Curtis distance matrix was prepared, and species were grouped by hierarchical agglomerative clustering of the distance matrix using Ward's algorithm, according to their similarity in distribution and relative abundance. Following González-Oreja et al. (2007), an assemblage is a cluster of species separated from all other such clusters by an ecological distance greater than the greatest distance between the two most disparate members of the clade. Where significant sub-structuring in the dendrogram coincided with diagnosable trends in the environmental and demographic data, sub-assemblages were recognized. Assemblages were named using Blair's (1996) standard nomenclature, in keeping with its wide use in the urban bird ecology literature (Chace & Walsh, 2006).
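The clustering step can be sketched in base R as follows. This is an illustration under stated assumptions, not the study's script: the helper 'bray_curtis' is a hypothetical hand-written implementation, the abundance values are made up, and the 'ward.D2' variant of Ward's method in hclust() is an assumption, since the text does not say which variant was used.

```r
# Bray-Curtis dissimilarity between the rows of a non-negative matrix.
bray_curtis <- function(x) {
  n <- nrow(x)
  d <- matrix(0, n, n, dimnames = list(rownames(x), rownames(x)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      d[i, j] <- sum(abs(x[i, ] - x[j, ])) / sum(x[i, ] + x[j, ])
    }
  }
  as.dist(d)
}

# Hypothetical relative abundances: sites as rows, species as columns.
abund <- matrix(c(0.9, 0.8, 0.1, 0.0,
                  0.8, 0.9, 0.0, 0.1,
                  0.1, 0.0, 0.7, 0.9,
                  0.0, 0.1, 0.9, 0.8),
                nrow = 4, byrow = TRUE,
                dimnames = list(paste0("cell", 1:4),
                                c("spA", "spB", "spC", "spD")))

# Species are the objects being clustered, so transpose the array first.
d_species <- bray_curtis(t(abund))
tree      <- hclust(d_species, method = "ward.D2")  # assumed Ward variant
groups    <- cutree(tree, k = 2)                    # two candidate assemblages
print(groups)
```

Species with similar distributions across cells (spA with spB, spC with spD) fall into the same group; in a real analysis the number of groups would be read from the dendrogram rather than fixed in advance.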
The species and grid cells were ordinated by global non-metric multidimensional scaling (NMDS) methods, using the 'vegan' package (Oksanen et al., 2013). A two-dimensional solution using the Wisconsin square-root transformation and Bray-Curtis coefficients as a measure of dissimilarity in species composition between the sample plots was chosen. Vectors for seven variables were fitted to both the species and grid cell two-dimensional ordination space using the procedure, 'envfit', in 'vegan', and the species ordination space was plotted in an ordination graphic.
Each grid cell was attributed to the bird cluster that had the highest proportion of its total number of species within it, except for 13 cells out of 390 (3.3%), which had equal numbers of cluster 2a and 2b species present.
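One reading of this attribution rule is that, for each grid cell, the proportion of each cluster's species present in the cell is calculated and the cell is given to the cluster with the highest proportion, with ties set aside. The base R sketch below follows that reading, which is an interpretation of the text; the presence matrix, cluster memberships, and the label "tied" are hypothetical.

```r
# Hypothetical presence/absence: cells as rows, species as columns.
pres <- matrix(c(1, 1, 1, 0, 0, 0,
                 0, 1, 0, 1, 1, 0,
                 1, 0, 0, 1, 0, 0),
               nrow = 3, byrow = TRUE,
               dimnames = list(paste0("cell", 1:3), paste0("sp", 1:6)))

# Hypothetical cluster membership of each species.
cluster <- c(sp1 = "2a", sp2 = "2a", sp3 = "2a",
             sp4 = "2b", sp5 = "2b", sp6 = "2b")

# Number of species from each cluster present in each cell (cells x clusters).
counts <- t(rowsum(t(pres), group = cluster[colnames(pres)]))

# Express counts as a proportion of each cluster's total species.
sizes <- table(cluster)
prop  <- sweep(counts, 2, as.numeric(sizes[colnames(counts)]), "/")

# Attribute each cell to the best-represented cluster; flag ties.
assigned <- apply(prop, 1, function(p) {
  best <- names(p)[p == max(p)]
  if (length(best) == 1) best else "tied"
})
print(assigned)
```

The third cell holds one species from each cluster and is therefore flagged as tied, mirroring the 13 cells that could not be attributed to either 2a or 2b.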
A Kruskal-Wallis test was used to test whether these five groups of grid cells differed in a simple measure of urbanisation intensity, People/km². Kruskal-Wallis tests were also used to test whether the five groups of grid cells differed in longitude (indicating their position on a west to east environmental gradient in Melbourne), and whether bird mass differed significantly between the five groups. The Mann-Whitney U-test was used to determine which groups differed significantly from one another. As we were principally interested in the 'comparisonwise error rate' rather than the 'experimentwise error rate', an α correction (such as Bonferroni) for multiple comparison testing was judged to be unnecessary (Bender & Lange, 2001).
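The testing sequence described here (an overall Kruskal-Wallis test followed by uncorrected pairwise Mann-Whitney U-tests) can be sketched with functions from base R's 'stats' package. The data are hypothetical, and three groups are used instead of five to keep the example short.

```r
# Hypothetical population densities (People/km^2) for three groups of cells.
dens <- data.frame(
  group      = factor(rep(c("g1", "g2", "g3"), each = 8)),
  people_km2 = c(seq(100, 450, by = 50),
                 seq(1000, 2400, by = 200),
                 seq(2500, 3900, by = 200))
)

# Overall test: do the groups differ in population density?
kw <- kruskal.test(people_km2 ~ group, data = dens)

# Pairwise Mann-Whitney U-tests with no adjustment for multiple comparisons,
# i.e. controlling the comparisonwise rather than experimentwise error rate.
pw <- pairwise.wilcox.test(dens$people_km2, dens$group,
                           p.adjust.method = "none")

print(kw)
print(pw)
```

Setting p.adjust.method = "none" is what leaves the pairwise p-values uncorrected; replacing it with "bonferroni" would apply the correction that was judged unnecessary here.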
The assemblage members were then allocated to the categories of urban exploiter, suburban adapter, or urban avoider (Blair, 1996).

In the data frame 'clade2a', columns are species and rows are sites. The values in this 'r x c' frame are the relative abundance of each species at each site.
This script runs a three-dimensional NMDS ordination.

library(vegan)

# run NMDS 3D
clade2a.mds <- metaMDS(clade2a, distance = "bray", k = 3, zerodist = "add", autotransform = TRUE, noshare = 0.

Plotting the first two axes.

# plot 2D NMDS - first 2 axes
ordiplot(clade2a.mds, type = "none", main = "Urban adapter birds - assemblage 2a")
points(clade2a.mds, "sites", pch = 21, col = "black", bg = "black")
text(clade2a.mds, "species", col = "blue", cex = 0.5)

A suite of environmental factors was fitted to the ordination. Columns in 'envar.clade2a' represent parameter names and rows represent sites. The values represented by column names are as follows:

The matrix of vectors, and their significance in explaining the ordination, is shown above.

In the data frame 'fg', column headings and the data to which they refer are as follows:
• pop indicates population; in this case referring to 7 binned intervals of Frequency Greenspace in the larger data set
• adapter indicates species richness of urban adapter birds at a given site within the Frequency Greenspace bin
• exploiter indicates species richness of urban exploiter birds at a given site within the Frequency Greenspace bin

In this example I have set the number of MCMC steps at 10,000 and burn-in generations to 1,000, as for the analysis shown in the manuscript.

Output truncated at step 2,500 (of 10,000)

Setting the proposal distance in the MCMC (dirvar) to the default of 2 resulted in poor mixing of MCMC chains, and so after a number of trials satisfactory mixing of chains was achieved by setting dirvar=20.
The plot shown below is a representation of chain mixing from plotting MCMC steps against population preference. An identical workflow was created and run for the intensity of urbanisation (Combined Index) analysis.

Plotting Frequency Greenspace preference data from 'bayespref' analysis
This is an R Markdown file which contains a 'ggplot2' script for plotting urban tolerant bird preference for Frequency Greenspace bins from a 'bayespref' analysis that I ran in: Conole, L. E. (2013

In the data frame 'fgprefs', column headings and the data to which they refer are as follows:
• bin refers to 7 binned intervals of Frequency Greenspace within the larger data set,
• urban refers to the two urban tolerant bird assemblages of Adapter and Exploiter,
• median.preference is the median population preference for that Frequency Greenspace bin,
• low indicates the lower 95% confidence interval around the median,
• up indicates the upper 95% confidence interval around the median.

Using 'ggplot2' to plot the habitat preferences at landscape scale for urban adapter and exploiter birds is achieved with the following script:

library(ggplot2)

p = ggplot(fgprefs, aes(x = bin, y = median.pref, shape = urban))
p = p + geom_pointrange(aes(ymin = low, ymax = up), size = 1.5, xlim = c(1:6))
p = p + labs(x = "Frequency Greenspace", y = "Median preference")
print(p)

In the data frame 'eac', column headings and the data to which they refer are as follows:
• first column = species
• columns 2–10 = habitats-of-origin

Create a distance matrix from the data frame, using the Manhattan distance measure.
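A minimal sketch of this step using dist() from base R is given below. The stand-in for 'eac' has only three hypothetical habitat columns instead of nine, and the species names, habitat names, and values are made up for illustration.

```r
# Hypothetical stand-in for 'eac': first column species, the rest habitats.
eac <- data.frame(
  species   = c("sp1", "sp2", "sp3"),
  forest    = c(1, 1, 0),
  woodland  = c(1, 0, 0),
  grassland = c(0, 0, 1)
)

# Drop the species column and carry the names over as row labels.
hab <- as.matrix(eac[, -1])
rownames(hab) <- eac$species

# Manhattan (city-block) distance between species' habitat profiles.
eac.dist <- dist(hab, method = "manhattan")
print(eac.dist)
```

With 0/1 habitat scores the Manhattan distance is simply the number of habitats in which two species differ, so sp1 and sp3, which share no habitat state, are three units apart.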