Modifiable reporting unit problems and time series of long-term human activity

This paper responds to a resurgence of interest in constructing long-term time proxies of human activity, especially but not limited to models of population change over the Pleistocene and/or Holocene. While very much agreeing with the need for this increased attention, we emphasize three important issues that can all be thought of as modifiable reporting unit problems: the impact of (i) archaeological periodization, (ii) uneven event durations and (iii) geographical nucleation-dispersal phenomena. Drawing inspiration from real-world examples from prehistoric Britain, Greece and Japan, we explore their consequences and possible mitigation via a reproducible set of tactical simulations. This article is part of the theme issue ‘Cross-disciplinary approaches to prehistoric demography’.


The Effect of Periodisation
We emulate archaeological periodisation by replacing the time-stamp of each event with its membership of an archaeological phase. For example, suppose phase A spans 800 to 701 BC, phase B spans 700 to 401 BC, and phase C spans 400 to 300 BC.

    # labels are in reverse order since dates are in BC
    phases = cut(ss, breaks = c(800, 700, 400, 300), labels = c("C", "B", "A"), include.lowest = TRUE)

In this case, 36 events are assigned to phase A, 585 to phase B, and 379 to phase C.

Aoristic Analysis + Monte-Carlo Approach
A number of closely related methods have been developed in the last decade to analyse frequency data based on archaeological periodisation. One such approach assigns probabilistic weights to individual events for a given temporal interval. This is the basis of aoristic analysis, where time is divided into equally sized blocks and weights are computed under the assumption of a uniform probability distribution within the assigned phase(s). Thus, if we use blocks of 50 years, we would obtain the following:
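As an illustration of the weighting described above, the following is a minimal sketch rather than the authors' own code. It reuses the phase counts quoted earlier (36, 585 and 379 events) and assumes, for simplicity, that the phases are contiguous intervals meeting at 700 and 400 BC. All object names are hypothetical.

```r
# Phase boundaries (years BC) and event counts taken from the text.
# Assumption: phases are treated as contiguous intervals (A: 800-700, B: 700-400, C: 400-300).
phase_start <- c(A = 800, B = 700, C = 400)  # older boundary
phase_end   <- c(A = 700, B = 400, C = 300)  # younger boundary
n_events    <- c(A = 36,  B = 585, C = 379)

# Equally sized 50-year blocks covering 800-300 BC
block_start <- seq(800, 350, by = -50)
block_end   <- block_start - 50

# Aoristic weight: share of each phase that falls inside each block,
# assuming a uniform probability distribution within the phase
aoristic_weights <- sapply(names(n_events), function(p) {
  overlap <- pmin(block_start, phase_start[p]) - pmax(block_end, phase_end[p])
  pmax(overlap, 0) / (phase_start[p] - phase_end[p])
})

# Aoristic sum: expected number of events per block
aoristic_sum <- as.numeric(aoristic_weights %*% n_events)
names(aoristic_sum) <- paste(block_start, block_end, sep = "-")
print(aoristic_sum)
```

Because each phase's weights sum to one, the aoristic sums add up to the total number of events. Within a phase every block receives the same value, which is why any change in frequency inside a phase is invisible to this method.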

[Figure: time series of event frequencies; y-axis: Number of Events]
The average time series extracted from the Monte-Carlo simulations is comparable to the result of the aoristic analysis, but it also reveals the extent of chronological uncertainty between 700 and 400 BC (i.e. phase B). More importantly, because both methods assume a uniform probability distribution within each phase, neither correctly identifies the major population growth event between 600 and 500 BC. The extent of this bias depends on the resolution of the archaeological periodisation relative to the scale of the population dynamic of interest, and on the extent to which shifts in frequencies coincide with changes in archaeological phase.
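A Monte-Carlo counterpart can be sketched as follows. This is an illustrative assumption about the procedure, not the authors' implementation: each event receives a random date drawn uniformly from within its phase, the dates are counted in 50-year blocks, and the process is repeated many times. The phase boundaries, the number of repetitions (`nsim`) and all object names are hypothetical choices.

```r
set.seed(1)

# Phase boundaries (years BC) and event counts taken from the text
phase_start <- c(A = 800, B = 700, C = 400)
phase_end   <- c(A = 700, B = 400, C = 300)
n_events    <- c(A = 36,  B = 585, C = 379)
phases      <- rep(names(n_events), times = n_events)  # one phase label per event

block_breaks <- seq(300, 800, by = 50)  # ten 50-year blocks, youngest first
nsim <- 200                             # hypothetical number of repetitions

# Each column holds the block counts of one simulated set of dates
sim_counts <- replicate(nsim, {
  dates <- runif(length(phases), min = phase_end[phases], max = phase_start[phases])
  as.numeric(table(cut(dates, breaks = block_breaks, include.lowest = TRUE)))
})

# Average time series and a 95% simulation envelope per block
mc_mean  <- rowMeans(sim_counts)
mc_lower <- apply(sim_counts, 1, quantile, probs = 0.025)
mc_upper <- apply(sim_counts, 1, quantile, probs = 0.975)
print(round(cbind(mc_lower, mc_mean, mc_upper), 1))
```

The mean should sit close to the aoristic sum, while the envelope shows how much individual realisations can differ. Neither output can recover a trend that occurs inside a single phase, since the uniform assumption spreads events evenly across it.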

Simulation 2: Duration
Consider a time series recording the number of simultaneously occupied mines over a 1,000-year interval (1750-750 BC). Suppose this number is constant (n = 100), but the duration of occupation of each mine is a linear function of time. More formally, we model the duration of each mine as a random draw from a negative binomial distribution with dispersion parameter α equal to 1 and mean µ given by a linear function of t, with t between -1750 and -750. Here we limit µ to be between 10 and 200 (i.e. if the result of the equation is below 10, µ is set to 10; if above 200, µ is set to 200). The figure below compares the life-span of a sample of 1000 sites from one simulation (top panel), along with µ(t) (dashed orange line), against the number of those occupied at a given moment in time (middle panel) and the frequency of mid-points (lower panel).

    par(mfrow = c(3, 1), mar = c(0, 4, 3, 4))
    # draw a random subset of 1000 mines for plotting
    plotSUB = minedf[sample(1:nrow(minedf), size = 1000), ]
    plot(0, 0, type = "n", xlim = c(-1800, -700), ylim = c(0, nrow(plotSUB) + 1), axes = FALSE, ylab = "", xlab = "Year BC")
    for (i in 1:nrow(plotSUB)) {
      # ...
    }
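The logic of this experiment can be sketched in a few lines. What follows is an assumption-laden illustration, not the authors' simulation. The slope and intercept of `mu_fun()` are hypothetical, as the text only specifies a linear trend clamped to 10-200. Constant occupancy is obtained by letting each of 100 "slots" be reoccupied as soon as a mine is abandoned. One year is added to every draw so that durations are strictly positive. The names `mu_fun`, `sim_slot` and `mines` are invented for this sketch.

```r
set.seed(1)

# Hypothetical linear mean duration, clamped to [10, 200] as described in the text;
# the intercept and slope are illustrative assumptions only
mu_fun <- function(t) pmin(pmax(200 - 0.19 * (t + 1750), 10), 200)

# One slot: a sequence of mines, each opening when the previous one closes
sim_slot <- function(t_start = -1750, t_end = -750) {
  starts <- numeric(0)
  ends <- numeric(0)
  t <- t_start
  while (t < t_end) {
    # negative binomial draw (size = alpha = 1), shifted by one year to avoid zero durations
    d <- rnbinom(1, size = 1, mu = mu_fun(t)) + 1
    starts <- c(starts, t)
    ends <- c(ends, t + d)
    t <- t + d
  }
  data.frame(start = starts, end = ends)
}

# 100 slots give 100 simultaneously occupied mines at any moment
mines <- do.call(rbind, replicate(100, sim_slot(), simplify = FALSE))
mines$mid <- (mines$start + mines$end) / 2

# Number of mines occupied at selected years: constant by construction
years <- seq(-1750, -751, by = 50)
occupied <- sapply(years, function(y) sum(mines$start <= y & mines$end > y))

# Frequency of mid-points per 50-year bin: varies with mean duration
mid_counts <- table(cut(mines$mid, breaks = seq(-1750, -700, by = 50)))
print(occupied)
print(mid_counts)
```

Under these assumptions the occupancy stays at 100 throughout, yet the mid-point counts rise wherever mean duration is short. A frequency series built from mid-points can therefore suggest growth that never occurred.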

Simulation 3: Nucleation/Dispersal Bias

Setup
Consider two hypothetical archaeological periods, α and β, of equal duration. Our objective is to investigate the percentage change in the number of residential units across the two periods. More formally, we are interested in estimating 100 × (N_β − N_α)/N_β, where N_α and N_β are the total number of residential units in each period. Our residential units are, however, spatially organised into sites (i.e. settlements) of different sizes, and our sampling is conditioned by this structure. More specifically, we assume that sampling occurs at the level of the site rather than the individual residential unit, and that we are able to recover only a fraction r = k/K of sites, where k is the observed number of sites across the two periods in our sample and K is the number of sites across the two periods in the population. Finally, we assume that the probability of each site being sampled is defined by the following equation:

$$\pi_i = \frac{S_i^{b}}{\sum_{j=1}^{K} S_j^{b}}$$

where π_i is the probability of selecting a site with size S_i, K is the total number of sites, and 0 ≤ b ≤ 1. The exponent b is a bias parameter that conditions the probability of a site being sampled as a function of its size. When b = 0, all sites have the same chance of being included in the sample, but when b > 0 larger sites have a higher probability of being selected.
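To make the role of b concrete, the short sketch below computes selection probabilities proportional to site size raised to the power b, and draws a size-biased sample. The site sizes and the helper name `site_prob()` are hypothetical. Note also that `sample()` without replacement draws sites sequentially, so the realised inclusion probabilities only approximate π_i.

```r
set.seed(1)

# Hypothetical site sizes (residential units per site); illustrative values only
S <- c(1, 2, 5, 10, 50)

# Selection probability proportional to size^b, normalised over all sites
site_prob <- function(S, b) S^b / sum(S^b)

p_unbiased <- site_prob(S, b = 0)  # every site equally likely
p_biased   <- site_prob(S, b = 1)  # probability proportional to size
print(round(rbind(p_unbiased, p_biased), 3))

# Draw k = 2 sites (r = k/K = 0.4) under the size-biased regime
sampled <- sample(seq_along(S), size = 2, prob = p_biased)
print(S[sampled])
```

With b = 0 each of the five sites has probability 0.2. With b = 1 the largest site alone accounts for 50/68 of the probability mass, so it is rarely missed.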

Simulation Experiment
What is the combined impact of the non-random sampling regime described above when the two periods α and β are characterised by different settlement size distributions, as we might expect in the case of nucleation/dispersal shifts? Here we employ a simple tactical simulation where we: 1) generate artificial settlements for two hypothetical archaeological periods; 2) sample a fraction r of settlements using different degrees of site-size bias b; and 3) compute the observed percentage change in the number of residential units. For period α the site size distribution is approximately log-normal with µ = 3 and σ = 1, whilst for period β the size distribution is approximately uniform. The function sim.settlement() generates the artificial settlements, ensuring that the total number of residential units is the same for the two periods.
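The three steps can be sketched as below. This is not the authors' sim.settlement(). The helper names `sketch_settlements()` and `observed_change()`, the number of period-α sites, the range of the uniform distribution and the number of repetitions are all hypothetical choices made for illustration. The percentage change follows the expression given in the setup.

```r
set.seed(1)

# Step 1: artificial settlements with equal total residential units in both periods
sketch_settlements <- function(n_alpha = 200, meanlog = 3, sdlog = 1, beta_range = c(1, 50)) {
  alpha <- pmax(round(rlnorm(n_alpha, meanlog, sdlog)), 1)  # approx. log-normal sizes
  target <- sum(alpha)
  beta <- numeric(0)
  while (sum(beta) < target) {                              # approx. uniform sizes
    beta <- c(beta, sample(beta_range[1]:beta_range[2], 1))
  }
  # trim the last site so that both periods hold the same number of units
  beta[length(beta)] <- beta[length(beta)] - (sum(beta) - target)
  data.frame(period = rep(c("alpha", "beta"), c(length(alpha), length(beta))),
             size = c(alpha, beta))
}

# Steps 2 and 3: size-biased sample of a fraction r of sites, then observed % change
observed_change <- function(sites, r = 0.2, b = 0) {
  k <- round(r * nrow(sites))
  p <- sites$size^b / sum(sites$size^b)
  idx <- sample(nrow(sites), size = k, prob = p)
  n_alpha <- sum(sites$size[idx][sites$period[idx] == "alpha"])
  n_beta  <- sum(sites$size[idx][sites$period[idx] == "beta"])
  100 * (n_beta - n_alpha) / n_beta
}

sites <- sketch_settlements()
b_values <- c(0, 0.5, 1)
mean_change <- sapply(b_values, function(b) {
  mean(replicate(100, observed_change(sites, r = 0.2, b = b)))
})
print(data.frame(b = b_values, mean_change = round(mean_change, 1)))
```

The true change is zero by construction, so any systematic departure from zero in `mean_change` reflects the sampling regime rather than demography. Setting r = 1 recovers every site and returns exactly zero, which is a useful sanity check.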