The timing and tempo of the Neolithic expansion across the Central Balkans in the light of the new radiocarbon evidence

The new set of radiocarbon dates was used to explore the timing and tempo of the Neolithic expansion across the Central Balkans. Our results suggest that the first farmers arrived in this region around or few decades before 6200 cal BC. The observed spatio-temporal pattern based on the radiocarbon data suggests that the general direction of the expansion was along the south-north axis. The regression analysis (arrival time vs. distance from the origin of expansion in northern Greece) was used to estimate the Neolithic front speed. The results of this analysis suggest that there is a moderate fit of the linear model. Most of the front speed estimates based on the Central Balkan data are between 1 and 2.5 km/year (depending on the data subset and the statistical technique) which is mostly above the expected range (around 1 km/year) for the standard wave of advance model and the empirically determined continental averages. We conclude that the spatio-temporal pattern of the Neolithic expansion in the Central Balkans is broadly consistent with the predictions of the wave of advance model, with the possibility of sporadic leapfrog migration events. The speed of the expansion seems to have been faster in the Central Balkans compared to the continental average.


Introduction
The Central Balkans was the bridge for the spread of farming from its initial foothold in Greece to the mainland of Europe. The earliest European Neolithic is found in the Aegean on the sites such as Knossos on the island of Crete, and Franchti cave on the Peloponnese, both dated to the first quarter of the 7th millennium (Perlès et al., 2013;Perlès, 2001;Douka et al., 2017). In the northern Greece, the Neolithic started slightly later, around 6600-6500 cal BC (Karamitrou-Mentessidi et al., 2015;Reingruber et al., 2017), from where it spread further to the Central Balkans. The aDNA research indicates that this was mainly a demographic process with populations from the Anatolia and the Aegean migrating north (Mathieson et al., 2018;Hofmanová et al., 2016).
In culture-historical terms, the earliest Central Balkan Neolithic is represented by the Starčevo culture, a part of the wider Early Neolithic cultural Starčevo-Körös-Criş complex of the late 7th and 6th millennium BC, with characteristic globular painted pottery, clay figurines, pit houses and small settlements (Garašanin, 1982;Gatsov and Boyadzhiev, 2009;Luca and Suciu, 2011;Anders and Siklósi, 2012). The presence of the Mesolithic in the Balkans seems to be limited only to certain microregional pockets (Gurova and Bonsall, 2014). In the Central Balkans, the Mesolithic communities lived in the microregion of the Danube Gorges from the beginning of the Holocene. Upon the arrival of farmers in the region, the Mesolithic and Neolithic communities came into contact which included the exchange of knowledge, animals and people (Borić, 2011). Both strontium and aDNA evidence suggest that Neolithic women of Anatolian genetic ancestry came to live in the Mesolithic communities of the Gorges (Borić and Price, 2013;Mathieson et al., 2018).
Previous research on the timing, tempo and mode of the Neolithic expansion across the Central Balkans was based on a relatively low number of radiocarbon dates. The most recent comprehensive study was by Whittle et al. (2002) who concluded that the earliest farmers reached the Central Balkans around 6200 cal BC. As for the mode of expansion, Whittle et al. (2002) rejected the Wave of Advance (WoA) model Cavalli-Sforza, 1973, 1984) and suggested that the directed pioneer colonization was more probable.
As for the tempo of the expansion, in the pioneering study on determining the rate of the Neolithic expansion in Europe by using regression analysis with radiocarbon dates and distances from the assumed origin of the Neolithic expansion as key variables, Ammerman and Cavalli-Sforza (1971) found a good fit of the linear model (with correlation coefficients ranging from 0.83 to 0.89, depending on the assumed origin of expansion), and estimated that the average continental rate was 1.08 km/year. Gkiasta et al. (2003) found that the overall continental average was~1.3 km/year, also with a relatively good fit of the linear model (correlation coefficient between dates and distances from the origin was 0.74). Pinhasi et al. (2005) estimated that the continental average rate of expansion was in the range between 0.6 and 1.3 km/year and concluded that these values perfectly fit the predictions of the WoA model both in terms of the actual estimated values, as well as the correlation coefficients measuring the goodness of fit of dates vs. distance regressions (~0.8 in their case). Bocquet-Appel et al. (2012) calculated the overall continental average rate to be 1.09 km/year. Similar value was obtained in the latest study by Henderson et al. (2014) who found 0.96 ± 0.04 km/year to be the continental average. Estimated regional expansion rates vary from 0.7 km/year, for the Balkans, to 5.59 km/year for the LBK expansion area in the seminal study of Ammerman and Cavalli-Sforza (1971). Bocquet-Appel et al. (2012) regional estimates for the Balkans is 0.788 km/year, close to the original Ammerman and Cavalli-Sforza's (1971) estimate. Biagi et al. (2005) estimated 3.64 km/year for the front speed across the Balkans, which is very high, comparable to Dolukhanov et al. (2005) estimated speed of the expansion of the LBK Neolithic across the central Europe (> 4 km/year), and to Zilhão's (2003) estimate for the Mediterranean maritime route of expansion ranging from 3.8 and 5 km/year.
In summary, the results of previous studies established the general outlines of the Neolithization in the Central Balkans, but many important aspects of the process remain unknown mainly due to the low number of radiocarbon dates. For the purposes of the ERC BIRTH project a new set of 300 AMS radiocarbon samples was collected and dated (Porčić et al., 2020). This resulted in an unprecedented data set of new absolute dates for the Early Neolithic in the Central Balkans. In this study we combine this new high-resolution radiocarbon data set with legacy dates of sufficient quality to answer three major questions: 1. When did the Neolithic arrive to the Central Balkans? We use the new radiocarbon evidence to update and revise previous knowledge about the arrival of farming to this region. 2. What was the route of the expansion? The aim is to reconstruct a spatio-temporal pattern of the spread in terms of geographic specifics of the process. 3. What was the speed of the expansion?
The questions of speed and spatial patterns bear important implications for the reconstruction of the social, demographic, and economic aspects of the Neolithic expansion, as they figure prominently in theories and models regarding the spread of the farming populations into and across Europe (Ammerman and Cavalli-Sforza, 1971, 1973, 1984Zvelebil, 2001;Hazelwood and Steele, 2004;Pinhasi et al., 2005;Robb and Miracle, 2007;Fort et al., 2012;Fort, 2012;Bocquet-Appel et al., 2012;Shennan, 2018). In the standard WoA model, the speed of the expansion depends on the migration distance and the intrinsic population growth rate (Ammerman and Cavalli-Sforza: 68, 1984;Hazelwood and Steele, 2004). Therefore, knowing the speed of the expansion gives us insight into the range of possible combinations of migration distances and growth rate values (Ammerman and Cavalli-Sforza, 1984: 81). Admittedly, this is a rather limited information about the details of the expansion mechanism, but if the estimated regional front speed is outside the limits predicted by theory and/or empirically established continental averages, this may signal that there may be regional specifics in the way that the farming was spreading.

Materials and methods
As the core of the Central Balkan region almost completely overlaps with the boundaries of the Republic of Serbia, for practical purposes we limited our study to the territory of Serbia and Kosovo. First, we collected all existing radiocarbon dates from the literature, and the most recent dates from the BIRTH project for the Early Neolithic sites in Serbia into a single database. In the second step, we filtered out all dates with standard error greater than 100 radiocarbon years. In this way, we assure that only dates with acceptable precision are used for the analysis.
To answer the first and second research questions, regarding the timing and spatial pattern of the expansion, we try to estimate which are the earliest dates from the earliest sites. As many of the Early Neolithic Starčevo sites were founded long after the Neolithic had expanded into the particular region, we specified the following criteria for the inclusion of sites as the earliest. As the main direction of the spread of the Neolithic front was to the north, we assume that once the Neolithic front had passed through further to the north, the regions behind the front are considered as Neolithic, even though individual settlements might have been founded centuries afterwards. For example, if we know that there are Neolithic sites north of the Sava and Danube river lines dated to~6000 cal BC, then we would exclude from our data a site that is situated south of this line if its start is dated tõ 5600 cal BC, as this date is not informative on the arrival of the Neolithic. In order to solve this problem, we first look at the earliest Neolithic radiocarbon dates in Hungary which is immediately to the north of our study area. The earliest date in Hungary has a median of 5999 cal BC when calibrated. Therefore, we only included sites for which the medians of the calibrated distributions of their respective earliest radiocarbon dates are not later than 5999 cal BC. The more detailed description of the protocol for the selection of the sample is presented in Supplementary material 1 file. This sample consists of 26 radiocarbon dates from 26 sites (Fig. 1, Table 1). Fourteen of these dates are the new radiocarbon dates generated by the BIRTH project, and the remaining 12 are the dates from the literature. It should be emphasized that the selection criterion, as defined here, cannot guarantee that sites which in reality were not truly the earliest (we will refer to them as false earliest) in their respective microregions will not be included in the set. In practice, it is not possible to get a pure set of the truly earliest sites regardless of the selection criterion, as these sites might have never been archaeologically detected. The imposition of the threshold only reduces the probability of including the false earliest sites, therefore our selection would be our estimate about which sites are the earliest in their microregions. That is why we refer to this sample as the estimated earliest sites/dates set.
We calibrated the radiocarbon dates from this set in the OxCal software v4.3.2 (Bronk Ramsey, 2009). In order to explore the spatial patterns of the spread, we divided the study area into 10x10km quadratic grid. Each cell in this grid is associated with the earliest date from that cell. The cells are classified into 50 years wide temporal intervals on a calendric timescale, based on the means of the calibrated distributions of the earliest dates from each cell. The cells belonging to the same temporal interval are plotted in the same color of the color gradient spectrum in order to facilitate the identification of patterns on the map.
As for the third research question, we estimated the average speed of the expansion of the Neolithic across the Central Balkans by using a regression analysis that involves the great circle distance from the origin of the expansion and the time of arrival of the Neolithic front (Ammerman and Cavalli-Sforza, 1971;Gkiasta et al., 2003;Pinhasi et al., 2005;Steele, 2010;Brami and Zanotti, 2015). The time variable is measured for each site as the earliest date associated with the site. As for the distance, the first step was to determine the location of the origin of the expansion. Until recently, it was considered that the origin of the spread of the Neolithic from Greece to the rest of the Balkans was the region of Thessaly where the Neolithic settlements appeared 6500 cal BC (Reingruber et al., 2017), but in the light of new radiocarbon evidence from the sites Paliambela and Mavropigi, both in Greek Macedonia, dated to start between 6600 and 6500 cal BC, it now seems that the earliest Neolithic sites are located further to the north (Kotsakis, 2014;Maniatis, 2014;Karamitrou-Mentessidi et al., 2015). As Mavropigi is closer to the Central Balkans than Paliambela, and the earliest dates from both sites overlap substantially, Mavropigi is taken as the point of origin of the expansion.
The choice of the regression technique for this kind of analysis is not straightforward and needs to be elaborated. At first glance, the solution would be to apply the ordinary least squares (OLS) regression with  arrival time as the dependent variable (as it contains a considerable amount of error due to measurement and calibration uncertainty) and distance from the origin as the independent variable. The expansion speed estimate is equal to the reciprocal of the estimated slope of the best fitting regression line. Pinhasi et al. (2005) used the OLS technique, they provided an interval estimate of the expansion front speed with the lower boundary defined by the slope of the distance vs. time regression, and the upper boundary defined by the reciprocal of the slope of the time vs. distance regression. The main problem for the use of the OLS is that both time and distance variables contain error. The OLS regression is applied under the assumption that only the dependent variable contains error, whereas the independent variable is measured without error. The error in the time variable is due to uncertainty of sampling (if the actual earliest date has been sampled at the site), uncertainty of the laboratory measurement, and the uncertainty of the calibration process. The error in the distance stems from the fact that we do not know the true point of origin. Distances measured from an arbitrarily chosen site are approximations in the sense that the location of a particular site is taken as a proxy for the actual point of origin. For this reason, Steele (2010) suggested that the Reduced Major Axis (RMA) regression (Legendre and Legendre, 1998: 510-517) is the most suitable regression technique for the front speed expansion estimation, as it takes into account error in both variables. Estimates in Gkiasta et al. (2003) are based on the RMA technique. Although Steele's suggestion may seem like the best solution, Smith (2009) has shown that things are not that simple, as the choice of RMA over OLS cannot be justified only by the fact that both variables contain error. Smith (2009) identified two key issues: 1) the structure of the error (how error is distributed between the two variables) and 2) is the relationship between the variables asymmetrical (e.g. one is the cause of the other) or symmetrical (no clear causal relation). Smith (2009) suggests that if the error in the independent variable is small compared to the error in the dependent variable, the OLS is superior to RMA for the estimation of the slope. Likewise, if the relationship between variables is asymmetrical, again OLS is a better choice over RMA. This led Buchanan et al. (2011) to conclude that the OLS is more appropriate than the RMA as the front speed estimation technique. Intuitively, it can be argued that the measurement error in the distance from origin is indeed smaller than the measurement error associated with arrival times, because the point of origin is confined to the relatively small area, so choosing different points within this area should not change the distance values significantly. But it is difficult to precisely estimate the ratio of these two errors. The issue of symmetry is also complex. We disagree with Buchanan et al. (2011) that the direction of causality is clear regarding the distance and time -it does not seem right to argue that changes in arrival times are caused by changes in distance. Both distance and arrival time are changing as a result of an underlying migration process, so they seem to be symmetrical in this respect. As this particular problem seems to be the borderline case for the choice between RMA and OLS, we choose to report the results of both techniques, with arrival time as dependent and distance as the independent variable. The RMA estimates can be considered as the lower boundary for the speed estimate, and the OLS as the upper boundary, as RMA best fitting lines have steeper slopes than OLS lines for the same data (we remind the reader that the speed is calculated as the reciprocal of the slope, therefore higher slopes translate into lower speeds and vice versa) (Smith, 2009). The accurate value of the front speed should be seen as being somewhere in between the RMA and OLS estimates, probably closer to the OLS estimate, given the assumption that the error in arrival time is higher than error in distance. Calibrated radiocarbon dates are not point estimates but probability distributions, therefore we performed a series of regressions with different possible point estimates of calendar dates following Steele's (2010) and Hamilton and Buchanan's (2007) Monte Carlo resampling approach. In addition to performing series of regressions based on resampling, we also performed and reported the results of regressions where expected values (means) of radiocarbon calibrated probability distributions are used as point estimates of the time variable. The regression analyses are implemented in R (R Core Team, 2019). Specifically for the RMA regression we used the lmodel2 package (Legendre, 2018). Detailed description of the statistical analysis with the R code and the spreadsheet with data used for the analysis can be found in the online Supplementary material 1 and Supplementary material 2 files, respectively.
An additional complication is that the choice of the data selection protocol can influence the results of the regression. Ideally, we would only use the set of the truly earliest sites, but this is not possible as the best we can get is an estimate of this set. Even though it seems intuitive to use set of the estimated earliest sites/dates, this approach may potentially bias the regression analysis. The crux of the problem is that if we set the terminus ante quem threshold for the selection of sites as the earliest, as we did here, it is more likely that the false earliest sites will be selected in the south (closer to the origin of expansion) than in the north. This is so because the time window for the inclusion of sites into the earliest group is much smaller in the north, where the threshold value is closer to the true arrival time. The uneven spatial distribution of the false earliest sites might influence the slope of the regression line (Fig. 2).
The alternative is to use the earliest radiocarbon dates from all Early Neolithic sites with available radiocarbon evidence between 6200 and 5400 BCE (the time span of the Early Neolithic Starčevo culture) without an attempt to filter out the false earliest sites. In this case, all other things being equal, the probability of including the false earliest sites will not change with the distance from the origin -it would be more or less constant throughout the study area. The slope of the resulting regression line would be similar to the slope of the line based only on the truly earliest dates from different regions (Fig. 2). However, the intercept based on the full data set would be shifted to the more recent date in comparison with the intercept value based on the truly earliest dates. Counterintuitively, we should get more accurate front speed estimates by using the non-filtered data set, where the false earliest sites are more common, than by using the filtered data set where the probability of including the false earliest dates is smaller but it is unequally distributed in space. This kind of estimate is directly comparable to most of the earlier studies that used the same method. But it comes with a price, as the use of the full data set will reduce the goodness of fit of the linear model due to the noise generated by the false earliest sites in the sample. In theory, front speed estimates based on the full set of sites should be accurate, but only if there are no other biasing factors like the differential intensity of research and differences in regional demographic histories along the gradient of expansion.
The third option is to define an even stricter criterion for the detection of the earliest dates, which is to consider only the earliest date within a certain distance bin (Hamilton and Buchanan, 2007). The downsides of this approach are that it results in a reduction of the sample size (and increase in the uncertainty of estimates) and the loss of spatial resolution as it would smooth the expansion pattern in space and increase the fit of the linear model. The potential patterns due to nondirectional or leapfrog migration would be deleted from the picture. It also introduces an element of arbitrariness in choosing the bin width.
As all these methods have their strengths and weaknesses, the safest solution is to "triangulate" the front speed estimates by trying out all three approaches. Therefore, we perform regression analyses on the three data sets (Supplementary material 2): 1. The estimated earliest sites/dates set. This is the same set of dates and sites used to answer the first two questions in this study (described above). 2. The full data set. The set of earliest radiocarbon dates from each Early Neolithic site that were radiocarbon dated within the study area. This sample consists of 47 dates from 47 sites. 26 out of 47 are the new dates generated by the BIRTH project (Supplementary material 2). 3. The binned data set consists of the 8 earliest radiocarbon dates from 8 distance bins (6 are the new BIRTH dates). This is based on the method from Hamilton and Buchanan (2007). We first grouped the sites from the estimated earliest sites/dates set into 50 km distance bins (Supplementary material 2). These bins should be imagined as concentric 50 km wide annuli emanating from the assumed origin of expansion. We then selected the earliest date from each bin as the arrival time of the Neolithic population to the particular spatial area defined by the bin.
Therefore, the estimated earliest sites/dates set is a subset of the full data set, and the binned set is a subset of the estimated earliest sites/ dates set.

Results
The calibration of the radiocarbon dates from the estimated earliest sites set is presented in Fig. 3. The oldest date comes from the site of Miokovci-Crkvine in Western Serbia (6362-6098 cal BC at 95%, expected value 6238 BCE), but it is statistically indistinguishable (p = 0.572, based on the chi-square test performed using the Combine function in OxCal) from the second oldest date from Rudnik Kosovski (6325-6088 cal BC at 95%, expected value 6185 cal BC) in Kosovo, which is the southernmost early site in our sample. These two dates are followed by a cluster of dates from the Central and Southern Serbia with the 95% confidence intervals between~6200 and~6000 cal BC. Ajmana and Lepenski Vir from the Danube Gorges region (Eastern Serbia) also fall into this interval, although means of their probability distributions are closer to 6100 cal BC. Most of the latest dates (with expected values of~6000 cal BC) all come from the north. The spatiotemporal pattern of the expansion is clearly revealed in the map based on the expected values of the earliest radiocarbon dates for each grid cell (Fig. 4). There is a general south-north gradient suggesting that the expansion followed the expected route, along the South Morava and Great Morava river valleys. However, this gradient is not perfect, it is a statistical trend, as there are sites to the north with earlier dates than sites to the south.
The mean front speed estimates of the RMA regression analyses (the summary of all regression analyses is given in Table 2) based on 10,000 resampled calendar date configurations for the estimated earliest sites/ dates, the full and the binned data sets are 1.22 km/year, 0.43 km/year, and 1.43 km/year, respectively (full distributions in Fig. 5). The mean front speed estimates of the 10,000 OLS regressions based on the resampled calendar date configurations for the estimated earliest sites/ dates, the full and the binned data sets are 2.18 km/year, 0.97 km/year, and 2.44 km/year, respectively (Fig. 5). The mean goodness of fit of the linear model is measured by the mean value of Pearson's correlation coefficient across different realizations of the resampled calendar dates. It is the highest for the binned data set (r = 0.67), followed by the earliest sites set (r = 0.55), and it is the lowest for the full data set (r = 0.44).
If we only use the means of calibrated probability distributions of individual radiocarbon dates as point estimates of the Neolithic arrival time, the results of the RMA regression are as follows (Table 3, Fig. 6). For the estimated earliest sites/dates set, the estimated front speed is 1.57 km/year, for the full data set it is 0.39 km/year, and for the binned data set it is 1.64 km/year. For the OLS regressions based on point estimates of arrival time the front speeds are 2.24 km/year, 0.95 km/ year, and 2.04 km/year, for the same three data sets, respectively. The linear correlation coefficient is the highest for the binned data set (r = 0.81); it is slightly lower for the estimated earliest sites/dates set (r = 0.7); and the lowest value is associated with the full data set (r = 0.41). For all three data sets with point estimates the regression models are statistically significant at the 0.05 level.

Discussion
Before evaluating the time of the arrival of the Neolithic to the Central Balkans, a note needs to be made about estimating the earliest occupation in a region or on a site. The probability of actually sampling the earliest date from a site will depend on the population dynamics and duration of a particular settlement as well as on the sample size (Perreault, 2011). For example, if we assume, in accord with the theoretical and empirical results regarding the Neolithic Demographic Transition, that the Neolithic population size was increasing through time (Bocquet-Appel and Bar-Yosef, 2008;Shennan, 2018), then the most probable date to sample will be a date from the period when the population size was at its peak. The implication is that it is highly Fig. 2. Schematic representation of the effects of different kinds of data selection on the estimated regression curve in time vs. distance analysis: a) The earliest radiocarbon dates are included from all sites, b) The earliest radiocarbon dates included only from the sites with the earliest dates older than the defined threshold.
M. Porčić, et al. Journal of Archaeological Science: Reports 33 (2020) 102528 unlikely to sample the earliest dates from any site. It is difficult to precisely calculate the expected offset without assuming a population dynamics model and knowing the duration of each settlement, but a conservative educated guess would be that the actual earliest dates are usually few decades earlier than the sampled ones. Having this fact in mind, we tentatively conclude that it is most probable that the Neolithic arrived at the Central Balkans between 6250 and 6200 cal BC. It is interesting to note that the oldest dates from Serbia are older than most of the radiocarbon dates for the Early Neolithic in the Republic of North Macedonia, which is immediately to the south of Serbia. There are pre-6250 cal BC radiocarbon dates coming from the North Macedonian sites Amzabegovo and Čuka-Topolčani Fidanoski, 2009), but the accuracy and precision of these dates are problematic as these are conventional radiocarbon dates made on charcoal with very large standard errors (over 150 radiocarbon years). It is theoretically possible that the Central Balkans was populated by a long-distance migration from northern Greece to southern or even central Serbia~6250 cal BC, "jumping over" the territory of North Macedonia or sweeping across North Macedonia and Serbia in a relatively short time, in circumstances related to the 8.2 ky event (Weninger et al., 2006;Berger and Guilaine, 2009;Pross et al., 2009;Krauß et al., 2018). This would result in the observed pattern where the earliest dates from Serbia and North Macedonia are almost the same, but until more dates from the North Macedonia become available, it is more parsimonious to explain this pattern as a result of an extremely small number of the Early Neolithic dates (N = 29) from a small number of sites (N = 8) in North Macedonia (for a full list of published North Macedonian dates see Fidanoski, 2009;. In order to resolve this issue, new research programmes (e.g. such as Horejs et al., 2019) need to be funded in order to fill the large gap in the quantity and quality of archaeological information related to the spread of the Neolithic between Northern Greece and Central Serbia.
As for the front speed estimates, first we will discuss the large discrepancy in the estimated front speeds between the full data set on one side, and the estimated earliest and binned sets on the other (Table 2). It is absolutely certain that the average speed of expansion must have been higher than the RMA estimates of~0.4 km/year, because at such rate, it would take more than thousand years for the Neolithic to cross   ~450 km from the southernmost to the northernmost part of the study area, which cannot possibly be true. The OLS estimates based on the full set are at first glance realistic, as they are consistent with the 1 km/year prediction of the standard WoA model and estimates from previous studies (e.g. Bocquet-Appel et al., 2012;Gkiasta et al., 2003;Pinhasi et al., 2005), but on closer analysis the OLS estimates seem to be underestimates, as well. The only way to get a value that is comparably low as the OLS estimate is to assume that the Neolithic arrived to the southernmost point of our study area as early as 6362 cal BC (the upper limit of the 95% CI of the oldest date in the earliest sites/dates set), and that it reached the northernmost point as late as 5879 cal BC (the lower limit of the 95% CI of the oldest date in the earliest sites/dates set). In this case the average speed would be~1.1 km year, which is still higher than the OLS estimate. But this scenario is highly unlikely and borders on the impossible, as the lower boundary of the 95% confidence intervals for the oldest date from a site which is situated~130 km to the north of the study area is 5883 cal BC (see Supplementary material 1). This leads us to conclude that the full set of sites is biased. Fig. 6 suggests that there are far more sites in the north than in the south, especially sites with later founding dates. This is tilting the regression line towards lower speeds. This difference in the number of sites between the south and the north is most probably a consequence of the differential intensity of research (the number of investigated Early Neolithic sites with radiocarbon dates is lower in the south than in the north, see Fig. 1) and/or potentially different regional demographic histories. Therefore, front speed estimates based on the full set of sites can safely be rejected as unrealistically low. We proceed with the discussion of the front speed estimates based only on the estimated earliest sites and binned data sets. The front speed estimates based on the Monte Carlo resampling are either broadly consistent (RMA estimate on the earliest sites/dates set) with, slightly higher (RMA estimate on the binned data set) than, or significantly above (OLS estimates) the continental front speed estimate ranges from previous studies (e.g. Gkiasta et al., 2003;Pinhasi et al., 2005;Bocquet-Appel et al., 2012). However, the estimates from the previous studies were based on the point estimates of arrival times. Strictly speaking, we should only compare them to the theoretical expectations for the WoA model. In this case we can conclude that they are considerably higher than the often-cited value of~1 km/year predicted by the standard parametrization of the WoA model (Ammerman and Cavalli-Sforza: 80, 1984;Hazelwood and Steele, 2004). On the other hand, point estimates are comparable to the empirical estimates from other studies, and they unequivocally suggest that the Neolithic front expansion speed in the Balkans was around 1.5-2 times higher than the continental average of~1 km/year, although not as high as suggested by Biagi et al. (2005).
The fact that the estimated speed is higher than the standard 1 km/ year does not mean that is not consistent with the WoA model. The speed of the expansion is proportional to the square root of the product between the migratory activity and the intrinsic growth rate (Ammerman and Cavalli-Sforza: 68, 1984; Hazelwood and Steele, 2004), therefore different combinations of these two parameters can result in different speed values. This means that the front expansion speeds around 1.5-2 km/year are possible within the limits of the parameter values of the WoA model that are considered to be possible and realistic (Ammerman and Cavalli-Sforza, 1984: 81).
The more recent formulations of the WoA model that include cultural diffusion suggest that the front speed is also proportional to the degree of cultural diffusion (Fort, 2012). Therefore, relatively high estimates of the front speed in the Central Balkans could have arisen from the increased conversion of the local hunter-gatherers to farmers as a result of the cultural diffusion, but the problem with this interpretation is that there is no secure evidence of the Mesolithic presence in the Central Balkans outside the microregion of the Danube Gorges (Gurova and Bonsall, 2014).
With these results we can only make general remarks about the mode of expansion, i.e. the underlying demographic, economic and social processes driving the expansion. The moderately good fit of the linear model suggests that the WoA is a satisfactory general model of the expansion in the Central Balkans. At the lower spatio-temporal scales the process seems to carry significant noise which reduces the goodness of fit of the model. This can be interpreted in two mutually non-exclusive ways. The first possibility is that the earliest farming communities sometimes practiced relatively long-distance migrations (e.g.~100 km opposed to the maximum of~50 km assumed in the standard WoA model) corresponding to the leapfrog colonization model (van Andel and Runnels, 1995;Zvelebil, 2001). The second possibility is that these discrepancies reflect the sampling effects and differential regional intensity of research. In conjunction with the fact that the earliest dates from sites are unlikely to be sampled a priori, this might have blurred the south-north gradient. As already mentioned, the inclusion of the false earliest sites would also decrease the goodness of fit, which is apparent here with the full data set where the correlation between the time of arrival and distance is the lowest in this study.
Under certain conditions, intercept values of the best-fitting regression line can be used to evaluate the accuracy of the model. The intercept value can be interpreted as the estimated date for the start of the expansion which can be compared to the actual dates associated with the assumed point of origin. However, this would require us to make some very strong assumptions that would make the entire effort speculative, therefore we decided not to follow this path at this moment (see Supplementary material 1 for a detailed explanation).

Conclusion
The first farmers arrived at the Central Balkans as early as 6250-6200 cal BC and reached the Great Pannonian Plain by 6000 cal BC. The spread of the Neolithic across this region generally unfolded in the south-north direction, following the major river courses. Our results indicate that, in general, the standard WoA model of expansion is acceptable as the first approximation of the process of expansion, with the possibility of sporadic leapfrog migration events. The Neolithic  Porčić, et al. Journal of Archaeological Science: Reports 33 (2020) 102528 expansion in the Central Balkans was faster compared to the continental average.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.