Land-use history impacts spatial patterns and composition of woody plant species across a 35-hectare temperate forest plot

Land-use history is the template upon which contemporary plant and tree populations establish and interact with one another and exerts a legacy on the structure and dynamics of species assemblages and ecosystems. We use the first census (2010–2014) of a 35-ha forest-dynamics plot at the Harvard Forest in central Massachusetts to describe the composition and structure of the woody plants in this plot, assess their spatial associations within and among the dominant species using univariate and bivariate spatial point-pattern analysis, and examine the interactions between land-use history and ecological processes. The plot includes 108,632 live stems ≥ 1 cm in diameter (2,215 individuals/ha) and 7,595 standing dead stems ≥ 5 cm in diameter. Live tree basal area averaged 42.25 m2/ha, of which 84% was represented by Tsuga canadensis (14.0 m2/ ha), Quercus rubra (northern red oak; 9.6 m2/ ha), Acer rubrum (7.2 m2/ ha) and Pinus strobus (eastern white pine; 4.4 m2/ ha). These same four species also comprised 78% of the live aboveground biomass, which averaged 245.2 Mg/ ha. Across all species and size classes, the forest contains a preponderance (> 80,000) of small stems (<10-cm diameter) that exhibit a reverse-J size distribution. Significant spatial clustering of abundant overstory species was observed at all spatial scales examined. Spatial distributions of A. rubrum and Q. rubra showed negative intraspecific correlations in diameters up to at least a 150-m spatial lag, likely indicative of crowding effects in dense forest patches following intensive past land use. Bivariate marked point-pattern analysis, showed that T. canadensis and Q. rubra diameters were negatively associated with one another, indicating resource competition for light. Distribution and abundance of the common overstory species are predicted best by soil type, tree neighborhood effects, and two aspects of land-use history: when fields were abandoned in the late 19th century and the succeeding forest types recorded in 1908. In contrast, a history of intensive logging prior to 1950 and a damaging hurricane in 1938 appear to have had little effect on the distribution and abundance of present-day tree species. Our findings suggest that current day composition and structure are still being influenced by anthropogenic disturbances that occurred over a century ago.


INTRODUCTION
In forested landscapes around the world, legacies of human activities have shaped the composition, size structure, and spatial patterns of trees, understory vegetation, and associated ecosystem processes (Birks et al., 1988;Turner et al., 1990;Russell, 1997;Foster & Aber, 2004;Ellison et al., 2014). The extent of the interactions between anthropogenic effects and abiotic factors such as climate, soils, and episodic disturbances in shaping vegetation patterns depends on the intensity of the effects and the spatial scale of analysis (Rackham, 1986;Glitzenstein et al., 1990;Zimmerman et al., 1995). A complex interplay of succession, competition, disturbance, environment, and land use shape dynamics and patterns of forests at local-to-regional scales (Condit et al., 2000;Thompson et al., 2002;Chazdon, 2003;Van Gemerden et al., 2003).
By further examining the spatial patterns of trees within a forest, ecologists can begin to uncover the underlying processes and mechanisms that led to those patterns (e.g., are species randomly distributed, aggregated, or dispersed in space and why? (Wiegand, He & Hubbell, 2013)). A growing number of studies have used point pattern analysis to examine the spatial structure of forests by using fully mapped plots, as each tree, or point, has a mapped location (Zhang et al., 2010;Wang et al., 2011;Lutz et al., 2013;Fibich et al., 2016;Nguyen, Uria-Diez & Wiegand, 2016). A variety of univariate and bivariate point-pattern analysis methods and summary characteristics have been used to characterize the spatial patterning of trees (Wiegand & Moloney, 2004;Wiegand & Moloney, 2014). Since each method tells something different about the spatial structure of the data within a forest, it is more desirable to use multiple summary characteristics to better describe the patterns of tree species and among species associations (Illian et al., 2008;Wiegand, He & Hubbell, 2013).
The forests of New England, USA have been shaped by a variety of natural and anthropogenic factors. As in other forests, the geology and climate of New England define the broad patterns of current forest composition (Foster et al., 1992;Hall et al., 2002), but the shifts in species abundance and distribution patterns that have occurred since Europeans colonized New England more than 400 years ago have resulted in a relatively homogenous assemblage of young, even-aged stands with fewer late-successional species (Thompson et al., 2013). In Massachusetts, modern vegetation exhibits only weak relationships to broad climatic gradients because of the overwhelming influence of past land use (Foster, Motzkin & Slater, 1998). An increasing emphasis in ecological studies is evaluating the relative importance of historic land-clearing, agriculture, intensive harvesting (Foster, 1992;Thompson et al., 2002;Rhemtulla, Mladenoff & Clayton, 2009;Hogan et al., 2016), and natural episodic storms (Foster & Boose, 1992;Zimmerman et al., 1995) on current-day structure and species composition of forest stands (Motzkin et al., 1996;Motzkin et al., 1999;Bürgi, Östlund & Mladenoff, 2017). Although evaluating which of these variables are most important in shaping current day structure and composition is challenging, the development and use of statistical approaches like recursive partitioning and conditional inference trees has aided the interpretation and prediction of these types of analyses (Hothorn, Hornik & Zeileis, 2006).
Harvard Forest is an ideal location to investigate how spatial patterns and composition of woody plants are influenced by land-use history. For more than a century, Harvard Forest (HF) researchers have investigated and recorded impacts of land use on forests and how New England's forests are continuing to change as the regional climate changes, populations of large herbivores wax and wane, and nonnative insects and pathogens establish, irrupt, and kill tree species (Foster & Aber, 2004).
Here, we describe the results of the first census of a fully mapped 35-ha forest-dynamics plot at the Harvard Forest and examine how its structure and composition relates to interactions between land-use history and ecological processes. We first describe the composition and structure of the woody plants in this plot and assess spatial associations within and among the dominant species using univariate and bivariate spatial point-pattern analysis. Second, we uncover the influence of historical land use and natural disturbances on the current-day structure and composition of this forest plot. We pay particular attention to patterns of distribution and abundance of Tsuga canadensis (eastern hemlock) and its relationship to other species in the plot because previous work has shown it to be a foundation species in this forest (e.g., a species that defines ecosystems, controlling the biological diversity of associated species and modulating critical ecosystem processes; sensu (Ellison, 2019)). Tsuga canadensis is currently threatened and declining throughout much of the HF plot and its range due to a nonnative insect, Adelges tsugae (hemlock woolly adelgid; HWA) and its decline and loss are likely to have profound impacts on forest structure and composition (Orwig et al., 2013;Foster, 2014).

Site description
The 35-ha (500 × 700 m) HF forest-dynamics plot is part of a global network of Forest Global Earth Observatory (ForestGEO) plots established to monitor, understand, and predict forest dynamics and responses to global change (Anderson-Teixeira et al., 2015). The HF ForestGEO plot (southwest corner at 42.5386 • N, 72.1798 • W) is located within the 380-ha HF Prospect Hill tract in Petersham, Massachusetts, USA within the Worcester/Monadnock Plateau ecoregion (Griffith, Omernik & Kiilsgaard, 1994) of Transition Hardwoods-White Pine-Hemlock forests (Westveld, 1956) (Fig. 1). Elevations in the plot range from 340.2 to 367.8 m a.s.l. Soils include Gloucester stony loam, Acton stony loam and Whitman very stony silt loams, all of which are gravelly and fine sandy loam soils that developed in glacial tills overlying gneiss and schist bedrock (Simmons, 1941). The north-central portion of the plot contains a 3-ha peat swamp with muck soils that has been colonized at intervals by Castor canadensis (beaver). Average  annual temperature at the site is 7.9 • C and the annual precipitation of 1090 mm is distributed evenly throughout the year (Boose & Gould, 2019).

Land-use history
We examined the influence of past land-use history (derived from forest stand descriptions of dates of field abandonment, areas used as woodlot, pasture, or cultivation; presence of distinct plow horizon; silviculture treatments; and salvage operations); historical events (e.g., insect outbreaks, storms and associated degree of forest damage Rowlands, 1941); and biophysical attributes (roads, soil type, slope, aspect, elevation, and distance to streams) on current forest composition and species distribution within the plot by using data from the document archives at HF (http://harvardforest.fas.harvard.edu/document-archive).
Original maps of activity were manually transcribed to standardized base maps and then scanned and digitized as shapefiles in ArcView GIS 3.2. The shapefiles were then transformed to Massachusetts State Plane Meters (NAD83 projection) in ArcGIS to align better with aerial photographs and linear features (trails, stonewalls, etc.) downloaded from MassGIS (Hall, 2005) and used in spatial analyses (see below). Pollen evidence suggests that prior to European settlement, Prospect Hill was a mixture of old-growth northern hardwoods, T. canadensis, and Pinus strobus (eastern white pine). Following European arrival, the site then experienced complex ownership and intensive land-use over the next few centuries, both of which are largely representative of the New England region (Ellison et al., 2014). Forest clearing began in 1750 and reached a maximum in the 1840s, by which time close to 80% of the original forests had been cleared for agriculture (Fisher, 1933;Raup & Carlson, 1941). Field abandonment began in 1850 and continued through 1905 in the southern half of the plot ( Fig. 2A). Reforestation of those fields continued through the 20th century (Foster, 1992). The western, northern, and northeastern areas of the plot remained permanently wooded, but experienced various types of selective cutting in the 1790s and 1870s (Foster, 1992). The first maps characterizing forest types of individual stands were completed in 1908 and classified the permanent woodlots in the western third of the plot as being comprised of hardwoods, white pine-hardwoods, hemlock, and red maple (Fig. 2B). Many Castanea dentata (American chestnut) died in 1912-1914 from infection by Endothia parasitica (chestnut blight) (McLachlan, Foster & Menalled, 2000) and forests were damaged by natural disturbances including an ice storm in 1921 and one of the most damaging hurricanes to hit New England in 1938. The hurricane and subsequent salvage logging resulted in the loss of as much as 70% of the standing timber on HF properties (Foster & Boose, 1992).
The central sections of the plot, containing mostly stony loam soils and no visible signs of a plow layer, were unimproved pastures abandoned in the mid-19th century (Motzkin et al., 1999) (Fig. 2C). These areas reforested and were classified as cordwood (poor hardwood) in 1908 (Fig. 2B), except for an area classified as open, which is the beaver swamp. Much of the cordwood section was subsequently clear-cut in the 1920s and then thinned or salvaged in the late 1940s following the 1938 hurricane. Pinus resinosa (red pine) and Picea abies (Norway spruce) plantations were established in portions of these abandoned pastures in the mid-1920s and early 1930s. The southcentral area of the plot contained areas of improved pasture and cultivation and was classified as containing white pine in 1908. This area was clear-cut in the 1920s and a portion of it was clear-cut again in 1980, resulting in many small diameter, multi-stemmed trees. Additional biotic changes that impacted the plot included the exotic Lymantria dispar (gypsy moth), which lead to widespread defoliation of hardwoods during 1944-45 and 1981; Cryptococcus fagisuga (beech scale insect) combined with Neonectria fungal spp. (beech-bark disease), which has led to the decline and death of larger Fagus grandifolia (American beech); and Adelges tsugae, which was first observed in the plot in 2008 and then rapidly spread throughout the plot, subsequently killing hundreds of T. canadensis stems and threatening the rest.

Plot establishment and woody stem census
During March 2010, professional surveyors delineated the plot boundaries, established a continuous grid of 20 × 20-m quadrats, and measured the elevation at each post using a Sokkia SET600 Total Station (Olathe, Kansas, USA). During the summers of 2010 and 2011, all woody stems ≥ 1 cm in diameter at breast height (dbh; 1.3 m above the ground level) were uniquely tagged, identified (nomenclature follows Haines, 2011), and measured to the nearest 0.1 cm dbh (Condit, 1998). All dead stems ≥ 5 cm diameter that were standing and >45 degrees from horizontal also were tagged, identified, and measured. The swamp located in the center of the plot was sampled when the ground was frozen during the winter months of 2012-2014. Each tagged stem was mapped within one of four 10 × 10 m subquadrats within each quadrat on a scale-drawn map data sheet. Each map was then scanned and individual stems were digitized using the ImageJ processing program (Rasband 2012), and converted to local (x, y) coordinates within a quadrat using R (v.3.6.1) (R Core Team, 2013) and the CTFS R package (Condit, 2014).

Forest species composition and stand structure
Estimates of stem densities were derived from total counts in which multi-stemmed individuals were considered as a single stem, whereas estimates of basal area and biomass were derived from the sum of all stems ≥ 1 cm dbh (Gilbert et al., 2010). Biomass of living woody stems was estimated from dbh using allometric equations (Table S1).

Spatial analysis
We assessed the spatial patterns of all stems of the seven most abundant overstory tree species across the entire plot using the pair-correlation function (g (r); Wiegand & Moloney, 2014), for which the value of the function represents the degree of clustering (g (r) > 1) or overdispersion (g (r) < 1) at a given spatial lag (distance between neighboring trees). We compared the observed pair-correlation statistic to that expected if trees were distributed randomly (g (r) = 1) within the plot using 199 Monte Carlo CSR (complete spatial randomness) simulations of the tree map for each species.
To test for the effects of intraspecific competition we used the univariate markcorrelation function (kmm(r); Wiegand & Moloney, 2004;Wiegand & Moloney, 2014) to test whether the size (dbh) of each of the seven most abundant tree species depended on its proximity to neighbors of its own species. The value of kmm(r) represents the relative sizes of trees at a given spatial lag and indicates if trees are larger or smaller than expected at a given spatial lag. We compared the observed univariate mark-correlation function statistic to that expected if the sizes of trees were randomly assigned across individuals using 199 Monte Carlo simulations for each species, i.e., the spatial pattern of the trees remained the same, but their sizes were shuffled (Jacquemyn et al., 2010). Spatial analyses were not conducted on shrub species as many only occurred in the central swamp area.
Prior work has shown that the shade-tolerant T. canadensis is an important foundation tree species, creating and strongly controlling the microenvironment, understory vegetation, and ecosystem dynamics (Ellison et al., 2005;Orwig et al., 2013). Thus, we assessed the potential influence of T. canadensis on the sizes of each of the other most common tree species in the plot using a bivariate marked point-pattern analysis (Schlather's version of Moran's I mark-correlation function (Im1m2(r); Wiegand & Moloney, 2014). This statistic determines if tree sizes are spatially correlated: individuals are smaller or larger than expected at various distances from a neighbor. We compared the observed Im1m2(r) to that expected if the sizes of trees were randomly assigned across individuals using 199 Monte Carlo simulations for each species (Jacquemyn et al., 2010). All spatial pattern analyses were performed using the 2018 version of the software Programita (Wiegand & Moloney, 2004;Wiegand & Moloney, 2014).
GIS overlays of past land use, historical events, and biophysical attributes were used as covariates in a conditional inference regression-tree model to predict diameter and abundance of the most common overstory species in the plot (Table 1). Species-specific abundances or sizes were predicted for each of the seven most abundant overstory species conditional on their observed locations. A mean relative abundance (stems/ha) associated with each tree location was calculated using raster-based tools within the GIS. First, a 3×3 cell moving focal window analysis was used to generate a surface of mean tree abundances across the plot at a 20-m cell resolution. Subsequently, to associate a mean relative abundance value with each tree, this generated surface was sampled in the GIS at the location of each tree. Using the 'cforest' function in the R package 'party' (Version 1.3-5) (Hothorn, Hornik & Zeileis, 2006), the outcomes of 500 conditional inference tree models (Hothorn, Hornik & Zeileis, 2006) were compiled and the relative importance of explanatory variables were ranked across all models. The conditional inference algorithm is based on a random forest machine-learning algorithm (Breiman, 2001) used in many ecological modeling contexts (e.g., Fox et al., 2017;Mi et al., 2017;Mohapatra et al., 2019;Shearman et al., 2019). The conditional inference method improves on the variable ranking methodology by applying a permutation importance algorithm that corrects for variable selection bias resulting from a mix of categorical and continuous explanatory variables that are correlated to varying degrees or that have complex interactions (Strobl, Boulesteix & Hothorn, 2007). Variable importance scores are calculated by determining the marginal loss of prediction accuracy from any given model iteration after removing each explanatory variable. Overall variable importance is determined by averaging the variable-wise decrease in accuracy scores over all 500 model iterations to rank the overall importance of each variable across all models.

Composition and stand structure
Within the 35-ha HF ForestGEO plot, we identified 108,632 live stems ≥ 1 cm dbh, representing 77,536 individuals (2215 ha −1 ) of 51 woody species in 17 families (Table S2). Table 1 Description of land-use history, disturbance, stand, and biophysical predictor variables converted to GIS shapefiles and used to predict current tree species abundance and dbh values across the Harvard Forest ForestGEO plot.

Predictor Description
Land use Stand type -1908Stand type - , 1947Stand type - , 1986 Early forest stand descriptions in plot recorded by forest type and year Allen Land Use Land-use descriptions derived from degree of soil disturbance, including plow (Ap) horizon presence and depth, recorded by previous HF soil scientist, Arthur Allen.
Field Abandonment Years since the date of field abandonment 20th C. Salvage cutting Areas that experienced cutting following wind damage or other natural disturbance in the early to mid-1900s 20th C. intensive cutting Areas that experienced clearcut, shelterwood or reproduction cuts during the early to mid-1900s

Natural disturbance
Hurricane damage Data collected between 1939-1941 on degree of overstory trees uprooted, leaning or broken following 1938 hurricane (Rowlands, 1941).
In contrast, 73% of tagged stems and 69% of live individuals within the plot were <10-cm dbh (Fig. 4). These same stems comprised only 5% of the total live plot basal area and 3% of the total live plot biomass (Table 2). Shrub species made up many of these stems with reverse-J size distributions and included I. verticillata, Vaccinium corymbosum (highbush blueberry), and Kalmia latifolia (mountain laurel). Nonnative species in the plot included 1687 stems of Picea abies (Norway spruce) and Pinus resinosa (red pine) that remained from early 20th-century conifer plantings and three stems of Frangula alnus (glossy false buckthorn). Ten species had only one or two stems within the plot (Table  2). Finally, there were 7595 dead stems ≥ 5 cm dbh within the plot, >50% of which were T. canadensis, P. strobus, or A. rubrum. Dead tree basal area was 4.18 m 2 ha −1 and dead aboveground biomass was 17.53 Mg ha −1 .

Spatial structure related to past land-use impacts
The spatial distributions of the seven most common species varied across the plot (Fig. 5). Pinus strobus was common throughout the plot. Tsuga canadensis was most abundant in the western, northern, and eastern portions of the plot, whereas Q. rubra and A. rubrum dominated the central and southern areas. Both Betula species were most abundant in the central and eastern sections, and F. grandifolia was most common in the southeastern section.
Shrubs were also often spatially aggregated with respect to hydrology and topography. Ilex verticillata V. corymbosum, Viburnum nudum (withe-rod), and Lyonia ligustrina (maleberry) dominated the poorly drained beaver swamp (Fig. 6). Hamamelis virginiana (witch-hazel) was found in a narrow elevational band (342-346 m) surrounding the swamp and a dense patch of K. latifolia was located in the northwest corner of the plot. The seven most abundant canopy tree species were significantly clustered in the plot at all spatial lags up to 50 m relative to a CSR null expectation (Fig. 7). The effect of intraspecific competition also was apparent for these seven species as smaller than expected diameters were observed when nearby individuals of the same species. For example, spatial distributions of A. rubrum, Q. rubra, and F. grandifolia showed negative intraspecific correlations in diameters up to at least a 150-m spatial lag, whereas the other species had intraspecific negative correlations at ≤ 50-m spatial lags (Fig. 8). Tsuga canadensis, B. alleghaniensis, and P. strobus had positive spatial correlations (larger diameters than expected among dbhs at spatial lags >150 m. Interspecific correlations in diameters between species suggest that the impact of T. canadensis on Q. rubra was negative at intermediate spatial lags (25-75 m) but positive between T. canadensis and the other five species at most spatial scales up to 150 m (Fig. 9). The abundances and sizes of the most common overstory species were predicted best by a variety of historical factors and competitive interactions. Conditional inference random-forest modeling revealed that the abundances of T. canadensis, P. strobus, Q. rubra, A. rubrum and F. grandifolia were strongly associated with neighborhood effects (size of neighboring trees within 10 m; Fig. 10). The date of field abandonment was a strong predictor of Q. rubra, P. strobus, and B. lenta abundance, whereas the forest type in 1908 was the best predictor of B. alleghaniensis and A. rubrum abundance. Betula species also were strongly associated with Simmons soil type. Overstory species diameters were best predicted by neighborhood effects for T. canadensis, B. lenta, and F. grandifolia; date of field abandonment for P. strobus and B. alleghaniensis; and the 1947 stand type for Q. rubra and A. rubrum (Fig. 11). The predictive power of the conditional inference forest model regressions was much higher (R 2 = 0.79-0.95) for species abundance in the plot compared to species size (R 2 = 0.11-0.53).

DISCUSSION
We censused all woody stems ≥1-cm dbh within a 35-ha forest-dynamics plot in northcentral Massachusetts to examine the spatial patterns of trees and shrubs at a scale rarely attempted in temperate forests. We have shown that broad patterns in land use and historical disturbance that occurred up to a century ago remain dominant controls on present-day spatial distribution and structure of overstory species. Tree species were significantly clumped within the plot and T. canadensis affected the distribution of other dominant canopy species in different ways. Topography and hydrology also affected the distribution and abundance of understory stems. Detailed abundance and species distribution data provided in this study will provide invaluable information on forest dynamics in the future as the currently most abundant species-Tsuga canadensis-is declining because of a non-native insect (Orwig et al., 2018).

Forest structure is contingent on past land use
The forest canopy within the HF ForestGEO plot, dominated by T. canadensis, Q. rubra, A. rubrum, and P. strobus, is representative of many central New England forests. Like other temperate ForestGEO plots, a relatively small number of species dominated the HF plot (13 species were represented by over 1000 stems). However, this number was higher than the 5-10 species that reached this abundance in other temperate ForestGEO plots (Wang et al., 2010;Wang et al., 2011;Lutz et al., 2012;Bourg et al., 2013;Lutz et al., 2013) and likely reflects the varied habitats, high intensity of prior land use, and early stages of stand development at HF. Although we have much historical knowledge regarding land-use change at HF, the conditional regression random-forest modeling enabled us to explore more quantitatively how patterns of tree size and stem density for the seven most abundant species have been affected by tradeoffs between legacy effects of past land uses, management interventions, disturbances, and local-scale variation in stand structure and environmental conditions. This combination of quantitative modeling with historical knowledge contributes to a deeper understanding of historical human impacts on current forest structure. For example, our modeling results suggested that T. canadensis diameters and stem densities across the full plot are most strongly associated with local stand structural characteristics and neighborhood effects, whereas stem densities are only moderately associated with land-use history. This result is consistent with the relatively undisturbed appearance of the older portion of the HF plot where T. canadensis is most common, has persisted through time for thousands of years (Foster & Zebryk, 1993), and excluded of other species under its canopy. Tsuga canadensis is most abundant on land that was consistently used as a woodlot but never completely cleared for agriculture. The western portion of the plot was one of the few locations at HF that was mapped as T. canadensis forest in 1908 (Spurr, 1956). The high abundance of T. canadensis is the result of its shade tolerance and The significance of this effect was evaluated by comparing the calculated Schlather's I (Im1m2(r)) bivariate correlation statistic against values simulated under a null expectation, where nonfocal species' tree sizes were randomly shuffled over trees for each of 199 simulations. The blue line indicates calculated Im1m2(r) values, whereas the black lines demark the 95% confidence envelope around simulated Im1m2(r) values under the null model. A blue line falling below, within, or above the upper confidence limit indicates significant negative, independent, or positive correlations of dbh marks of the given species with the dbh of T. canadensis individuals found at a range of distances, respectively.
Full-size DOI: 10.7717/peerj.12693/ fig-9 deep crowns, which enable it to persist for decades, modify the understory environment by transmitting very little light, prevent other species from getting established (Canham et al., 1994), and gain dominance following partial cuttings, the death and subsequent salvage of C. dentata and F. grandifolia, and moderate damage from the 1938 hurricane (Foster et al., 1992;Motzkin et al., 1999;McLachlan, Foster & Menalled, 2000). These same disturbances also likely led to growth increases and additional establishment of P. strobus (Hibbs, 1982b), as the largest pine stems also occur on the western edge of the HF plot. In contrast, modeling revealed stronger effects of both land-use history and stand structural variables on the sizes and stem densities of the other six dominant species. Field abandonment date and stand types present in the early-and mid-20th century are particularly strong predictors of diameters and densities of these species. This is consistent with recorded historical knowledge. For example, Pinus strobus and Q. rubra are most abundant on areas that were formerly pasture or fields in the mid-to late-1800s that also experienced intensive past silvicultural cuts, thinning, and weeding in the 1920s-1940s, and more severe damage from the 1938 hurricane (Motzkin et al., 1999;Hall, 2005). Quercus rubra trees had larger mean diameters and crown sizes than F. grandifolia or A. rubrum, consistent with past investigations that highlighted the ability of Q. rubra to overtop canopy associates and rapidly expand laterally into gaps (Oliver, 1978;Hibbs, 1982a). Acer rubrum and B. alleghaniensis are more closely associated with mesic locations such as swamp borders with silt loam soils and low-lying sites with peaty soils in the northeast corner of the plot; indeed, random-forest models supported the relatively strong importance of soil type for these species and B. lenta relative to the other species. The south-central portion of the plot experienced the most intensive land use. It was the only area that experienced historical cultivation and multiple periods of subsequent clear-cutting, including a harvest in 1980. This area is dominated by smaller, multi-stemmed A. rubrum, Q. rubra, B. populifolia and B. papyrifera (grey and paper birch), and Prunus (cherry) species, which are much more common in forests that have experienced intense human impacts (Del Tredici, 2001). The relationship between current stem-density patterns for A. rubrum and these two Betula species and intensive historical land-use activities can be explained by the ability of these species to sprout following cutting and take advantage of high-light environments (Burns & Honkala, 1990).
Understory composition, dominated by woody shrubs, appears to be determined by soil drainage and the ability of individual species to tolerate standing water, poorly drained soils, or subtle topographic variation. Historically, the swamp contained pasture on its western edge and a woodlot in the remaining portion. Today, the wetland shrubs I. verticillata, Va. corymbosum, L. ligustrina, and Vi. nudum are found in high abundance in the central beaver swamp, which otherwise is devoid of trees. The northwest section of the plot has the highest elevation and is dominated by K. latifolia. Hamamelis virginiana appears to be restricted to a narrow elevation west of the swamp and in the southeast corner of the plot. Previous work at HF related K. latifolia abundance to nitrogen-poor sites and H. virginiana to continuously forested sites (Motzkin et al., 1999), which is consistent with our findings.
Across all species and size classes, the forest contains a preponderance (>80,000) of small stems (<10-cm dbh) that exhibit a reverse-J size distribution. The high abundance of stems in this size class (e.g., several shrub species, T. canadensis, and A. rubrum) is in contrast to several other temperate forest plots (Lutz et al., 2012;Bourg et al., 2013;Lutz et al., 2013), and is more similar to results from tropical evergreen (Memiaghe et al., 2016) or Mediterranean forests (Gilbert et al., 2010). Most of the abundant overstory and all the abundant shrub species also have reverse-J distributions, indicative of stable populations and adequate regeneration. For overstory species, this likely is a result of the mix of even-age and varying-aged cohorts and single trees establishing following anthropogenic disturbances and natural gap-phase dynamics that are frequent in this region (Oliver & Stephens, 1977;Hibbs, 1982a;Pederson, 2005). The greater ages of the shade-tolerant T. canadensis that occur on primary woodland are approaching a structure and diameter distribution that resembles old-growth forest (D'Amato, Orwig & Foster, 2008;Janowiak, Nagel & Webster, 2008). In contrast, A. rubrum and Q. rubra had skewed unimodal size distributions more indicative of managed forests (Janowiak, Nagel & Webster, 2008).

Overstory spatial patterns
We observed significant spatial clustering among abundant overstory species at all spatial scales examined. Aggregated species distribution patterns are common in both temperate (Hou et al., 2004;Hao et al., 2007;Wang et al., 2011) and tropical forests (Condit et al., 2000;Plotkin et al., 2000;Réjou-Méchain et al. et al. 2011, Nguyen, Uria-Diez & Wiegand, 2016. Both external factors (habitat heterogeneity) and internal factors (dispersal limitation, succession, gap dynamics) can lead to clumped distributions at various spatial scales (Getzin et al., 2008;Réjou-Méchain et al. et al. 2011). Within the HF ForestGEO plot, high habitat heterogeneity caused by complex past land use (differing field abandonment dates followed by repeated cutting and thinning; Motzkin et al., 1999) has likely led to high densities of A. rubrum and Q. rubra stems in the central portion of the plot. These non-random patches of individuals with lower than average dbh (as seen in the mark correlation analysis) may reflect strong competition for light as seen elsewhere (Fibich et al., 2016). Similar patterns seen in B. alleghaniensis, B. lenta, P. strobus, and T. canadensis in close proximity to other conspecifics (0-20-m scale) likely reflect crowding effects, and for T. canadensis, the ability of thousands of small stems to persist in the understory for decades (Marshall, 1927). These effects disappear at intermediate scales and even become positive at distances >100 m, indicating that trees greater than the mean dbh are more broadly distributed. The negative correlation observed for F. grandifolia at most spatial lags ≤ 150 m may be more reflective of its overall size distribution with most of its stems <10-cm dbh. Beech-bark disease is present at HF, and has likely contributed, along with past cutting, to the absence of large F. grandifolia in the plot.
Bivariate mark correlation functions have been underused in large, stem-mapped plots but hold great promise in ecological research (Velázquez et al., 2016). We used this method to examine the relationship between the size of individuals of T. canadensis, an important foundation species within the plot, with the size of six other important canopy species some distance away. Apart from Q. rubra, diameters of the other five species were positively correlated with the diameters of T. canadensis at all spatial scales. This pattern is consistent with T. canadensis being a foundation species in this forest (Buckley, Case & Ellison, 2016;Ellison et al., 2019), but it also simply could indicate a ''habitat'' effect: all these species are growing well everywhere and are found at a wide range of sizes. This effect was particularly strong for B. lenta and P. strobus, but weaker for A. rubrum, B. alleghaniensis, and F. grandifolia and disappeared after 100-150 m. Diameter of Q. rubra was on average smaller than expected by chance when within 20-80 m of T. canadensis. Historical factors play a role here, as the spatial distribution of these species highlight that oak abundance is the lowest within the T. canadensis-dominated portions of the plot that were woodlots and suggest that T. canadensis and the dense shade cast by their crowns limited establishment of the more intolerant Q. rubra.

CONCLUSIONS
The HF ForestGEO plot is the largest mapped temperate-forest plot in North America and joins the growing array of temperate forest-dynamics plots worldwide (Anderson-Teixeira et al., 2015). The species composition and aggregated spatial patterns within the plot are still being influenced by a land-use legacy of anthropogenic and natural disturbances that occurred decades to over a century ago. Despite extensive 20th-century harvesting, silvicultural thinning, and salvage operations following the 1938 hurricane, the most common overstory species in the HF ForestGEO plot today can best be predicted by longerterm land-use legacies represented by the 1908 forest type and the date of late 19th-century field abandonment, and tree neighborhood effects. At smaller scales, there is evidence of crowding effects of many common species, likely a result of successional dynamics of these aggrading forests following intensive land use. The increasing importance of T. canadensis during the last century across the plot negatively affected the distribution of Q. rubra. Its location and five-year schedule of plot sampling highlight the plot as valuable long-term infrastructure that will complement Harvard Forest, LTER, NEON, and ForestGEO research efforts (Orwig et al., 2018). Because all woody stems ≥ 1-cm dbh are mapped and measured, the data have been used in a variety of complementary ways, including to examine species codispersion patterns and spatial patterns of species co-occurrence (Buckley, Case & Ellison, 2016;Case et al., 2016), help inform a simulation model of forest dynamics (SORTIE;Case et al., 2017), assist with investigating crown allometry (Sullivan et al., 2017) and crown mapping (Hastings et al., 2020), develop maps of tree-mycorrhizal associations (Sousa et al., 2021) and aid in identifying statistical fingerprints of foundation species (Ellison et al., 2019). In addition, the data enable us to document changing species distribution patterns at an uncommonly large scale, while focusing on elements of the landscape that are often ignored, like beaver swamps and shrub thickets, and examine their contribution to overall forest structure and composition. data questions, and many plot logistics. Thanks to Jeannette Bowlen for administrative assistance, Emery Boose and Paul Siqueira for help with plot coordinates, Brian Hall for map and GIS layer production, and Matthew Duveneck and Danelle Laflower for assistance with figures. We also thank David Foster for his support and assistance with plot design, location, and integration with other long-term studies at HF. We thank three anonymous reviewers as well as Jonathan Thompson, Neil Pederson, Audrey Barker-Plotkin and lab group members for providing critical comments on earlier versions of the manuscript. There were no conflicts of interest.

ADDITIONAL INFORMATION AND DECLARATIONS Funding
Funding for the Harvard ForestGEO Forest Dynamics plot was provided by the Center for Tropical Forest Science and Smithsonian Institute's Forest Global Earth Observatory (CTFS-ForestGEO), the National Science Foundation's LTER program (DEB 06-20443, DEB 12-37491, DEB 18-32210) and Harvard University. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Grant Disclosures
The following grant information was disclosed by the authors: The Center for Tropical Forest Science and Smithsonian Institute's Forest Global Earth Observatory (CTFS-ForestGEO). The National Science Foundation's LTER program: DEB 06-20443, DEB 12-37491, DEB 18-32210. Harvard University.