North and South in the ancient Central Andes: Contextualizing the archaeological record with evidence from linguistics and molecular anthropology

The Central Andes are characterized by the early emergence of complex societies and a chequered yet continuous cultural tradition. However, at least for certain points of time in the cultural development, the overall cohesiveness of this ‘culture area’ has been called into question, favoring an alternative perspective that emphasizes the existence of several relatively independent nuclei of development on the North Coast, the southern Peruvian Highlands and the Titicaca basin, with distinct cultural expressions and political organization. Here, we engage archaeological evidence and its interpretation with newly emerging perspectives from linguistics and genetics (modern and ancient DNA), including new targeted genetic analysis, to add fresh evidence to the question of the internal structure and cohesiveness of the ancient Central Andes as a culture area. The double cultural/biological approach points at a North vs. South structure bisecting the Central Andes that becomes appreciable ~2,000 years ago; however, as the evidence from all three disciplines indicates, too, the spheres have remained connected and hence maintained an overall cohesiveness. Our analysis suggests that demographic population structure precedes the constitution of distinct cultural domains, a pattern which is to be verified in other chronological transects in South America and at a global scale.


Introduction
Compared to other areas of the world, South America has a relatively short history of human occupation. The Pacific coast provided an important point of reference for early humans, as the coastal route was a first viable access to the North American continent whose interior was largely covered by a glacial ice sheet (Braje et al., 2017). Indeed, early coastal settlements in South America show a clear adaptation to the coastal ecological niche (Fix, 2005; Rothhammer and Dillehay, 2009). In addition, however, in spite of the formidable challenges which the high-altitude environments of the Andes in western South America pose for survival and subsistence, humans began to exploit these extreme ecozones and perhaps even established permanent residence there very soon (Rademaker et al., 2014, cf. Capriles et al., 2016). In one region of South America, the continuity between lowland coastal environments touched by the nutrient-rich Pacific Ocean (emphasized e.g. by Moseley, 1975) and the highly variable environments at higher altitudes (emphasized e.g. by Burger, 1992) created the setting for the emergence of cultural developments not found elsewhere in the continent. This region, known as the Central Andes, is usually considered the only area of South America, at the present state of research, which witnessed an indigenous cultural trajectory that led to the "pristine" emergence of state-level societies (e.g. Haas et al., 1987; Lumbreras, 1999; Stanish, 2001). While the incipience of cultural complexity is highly localized within the region, at later stages of development cultural commonalities across geographical space emerged. These have led to the idea of a Central Andean cultural co-tradition, i.e. an "over-all unit of cultural history ... within which the component cultures have been interrelated over a period of time" (Bennett, 1948: 1; for overviews on the cultural chronology of the Central Andes, see e.g. Moseley, 2001; Leon, 2014; Quilter, 2014).
Here, we adopt the definition of Stanish (2001) to describe the geographical extent of this co-tradition or culture area. According to this definition, the Central Andes span latitudinally from the modern Ecuadorian-Peruvian border up to and including the greater Lake Titicaca basin in Bolivia. Even though the term 'Andes' may evoke primarily images of snowcapped mountain peaks, the geographic domain covered by the term 'Central Andes,' precisely because of its hybrid cultural-geographical nature, is not limited to the highlands, but comprises three different ecodomains: the Pacific Coast, the Andean highlands, and the cloud forests at the eastern slopes of the Andes. 1 More than a century of intensive archaeological research on the Central Andes has led to the establishment of a chequered and multifarious cultural chronology, which leads from the first beginnings of cultural complexity to its culmination point, the short-lived Inca empire (Fig. 1). Due to a rapid expansion that involved military conquest as well as co-option of preexisting polities (e.g. D'Altroy, 2014; Rostworowski de Diez Canseco, 1999), the Inca managed to create the largest empire that ever existed in the New World before European conquest (and that transcended, if only for a few decades, the geographical limits of the Central Andes as defined here).
https://doi.org/10.1016/j.jaa.2020.101233. Received 2 April 2020; Received in revised form 10 September 2020.
When speaking of a chequered and multifarious cultural chronology, we have in mind the fact that in the Central Andes periods of cultural integration (so-called "Horizons", the last of which is associated with Inca hegemony) characteristically alternated with periods characterized more by the absence of centralized power and more local developments, the so-called "Intermediate Periods" (Rowe, 1956, 1962; Rowe and Menzel, 1967; see Fig. 1). While the latter label may suggest periods of decline, in a salient number of cases these Intermediate Periods are in fact periods during which political integration is clearly in evidence on local scales and during which interregional exchange and artistic production flourished under the aegis of local rulers. 2 Even in the so-called Intermediate Periods, local cultural developments have a notable persistence in geographical space and follow a kind of longue durée pattern of continuity. One example is that elements of the Moche culture of the Early Intermediate Period persist, although transformed, through the Middle Horizon into the Late Intermediate Period Lambayeque and Chimú cultures, with heartlands in the same area in which the Moche culture had flourished centuries earlier. Also, demographic and cultural continuity of the common population is demonstrable through the Early Intermediate-Middle Horizon-Late Intermediate transition (Klaus, 2014), with change primarily affecting elite styles.
On the basis of such considerations, at least for certain points of time of their cultural development, the overall cohesiveness of the Central Andean culture area has been called into question in favor of a perspective that emphasizes the existence of several relatively independent nuclei of development, with distinct cultural expressions and political organization. In particular, a distinction has been suggested between one nucleus in Northern Peru, represented prominently by the maritimely adapted cultures of the coast, and the highland-based agropastoral societies of the highlands of South Peru on the one hand and the Titicaca basin on the other (Stanish, 2001).
Here, we examine the question of cultural cohesiveness vs. nuclearization in the ancient Central Andes (which, naturally enough, has been discussed primarily by archaeologists on the basis of the evidence available to them) in an interdisciplinary perspective. Concretely, we engage the archaeological evidence and its interpretation with the latest advances in other fields of inquiry that are relevant for elucidating the human past in the same region, namely (historical) linguistics and molecular anthropology (human genetics). We proceed by providing more detail on the relevant archaeological evidence and the interpretations to which archaeologists have subjected it in Section 2, being aware of the limitations of such a broad review: we can only provide a synthesis of the archaeological evidence at a very high level of abstraction and have to gloss over many details. Then, in Section 3, we offer some conceptual and methodological details on our approach to contextualizing the archaeological evidence in a broader interdisciplinary framework, thereby paving the way to our discussion of the linguistic (Section 4) and genetic evidence (Section 5) as it is relevant to the topic of the article. Finally, we pull the strands together and discuss and interpret the three lines of evidence in Section 6.

Central Andean archaeology: North and South. 3
In a review of the archaeological evidence for state-formation in the Central Andes available at the onset of the 21st century, Stanish (2001: 43) argues that "the idea that the Central Andes is culturally unified and homogenous has been a subtext in anthropological and historical studies since at least the European conquest," and that much of this perception reflects propagandistic efforts of the Inca and later the Spanish, who both had an interest in promoting a picture of imperial unity. Indeed, as Stanish goes on, one millennium before the Inca, during the Middle Horizon, distinct cultures would have prevailed in different parts of the Central Andes which, in a commonly held opinion, at the same time engendered the first state-level societies. The first clear hallmarks of the societal complexity associated with states in the arid deserts of the North Coast become appreciable during the Early Intermediate Period with the Moche culture (see Fig. 1). The political landscape of the North Coast in Moche times, however, is a matter of current debate; it is considered possible that polities with characteristics of states emerged separately in different valleys (cf. Chapdelaine, 2011 for review). The North Coast would have been involved in early exchanges with the North Highlands (Isbell and Silverman, 2008: 506) that are also appreciable in early colonial times, where they surface in the ethnohistoric record in the form of resource sharing between rulers from coast and highland (Ramírez, 1995).
Slightly later, but with roots in times that are coincidental with the late phases of Moche, "leadership in complexity within the Central Andes shifted from northern Peru and the Pacific coast … to south central Peru, northwestern Bolivia and the Andean highlands" (Isbell, 2008: 731) with the rise of the Ayacucho-based Wari and the Tiwanaku polity in the Titicaca basin (Fig. 1). From their eponymous urban centers, iconographic and stylistic canons that were novel yet rooted in previous traditions radiated out to eventually reach also the geographic peripheries of the Central Andes, including the North Coast, to yield the so-called Middle Horizon in the cultural periodization of the Andes. While there is considerable similarity that is likely to reflect shared cultural and religious beliefs, there are also differences between the material culture associated with Wari on the one hand and Tiwanaku on the other. These make it possible to delimit Wari- and Tiwanaku-influenced parts of the Central Andes relatively clearly in the archaeological record, with a border zone in the Moquegua valley of what is today Southern Peru (cf. e.g. Isbell, 2008; Williams, 2001 for review).
Based on this cultural tripartition – Moche in the northern part of the Central Andes on the North Coast, Wari in the central part (Southern Peru), and Tiwanaku in the south dominated by the Titicaca basin (Fig. 1) – Stanish (2001) ponders the idea that the three implicated regions constituted largely independent, rather than interrelated, nuclei of state formation. 4 As Stanish (2001: 60) says, "[t]here is virtually no evidence for any direct links between Tiwanaku and Moche, except for the most superficial of iconographic data. There are greater links between Moche and Wari, but these are largely iconographic as well…" 5 However, the close entanglement between Wari and Tiwanaku is among the principal reasons why Stanish's picture of three independent traditions and pathways towards societal complexity has been criticized. Isbell and Silverman (2008: 508) "find the evidence convincing that after AD 600, the rise of Wari and Tiwanaku was so profoundly interrelated that one cannot be understood without considering the other. From that time, and probably throughout the remainder of the Middle Horizon, the two were probably no longer independent evolutionary trajectories, but part of a southern co-tradition of imperialism." In a related vein, Shimada (1999: 486) points out that "north-south differences in artistic and technological styles became evident during this period and persisted until the end of prehistory," mentioning in particular the "contrasting emphasis placed on two-dimensional (including geoglyphs) and polychrome expressions in southern Peru and the South-Central Andes, as opposed to the northern preference for a limited color palette and three-dimensional, sculptural expression." The fact that, thus, in the Early Intermediate Period "the Central Andes can be bipartitioned into (overlapping) northern and southern cultural spheres," as Shimada (1999: 487) goes on to state, underscores that the differences are not just a matter of state formation, but that there are broader and long-standing cultural differences between North and South.
1 While the geographical limitation is useful and grounded in facts, reification must be avoided: for one, the fringes and peripheries of the Central Andes arguably shifted northward through time, with Far Northern Peru only brought into the confines of the Central Andean culture area relatively late (Hocquenghem, 1998; Richardson et al., 1990). On the other hand, peripheries, in the Andean context e.g. at Vicus in what is now Northern Peru (e.g. Makowski, 1994; Kaulicke, 2006), are the site of the complex confluence of people and ideas, and it is unlikely that attempts to draw strictly sharp borders in geographical space are feasible.
2 Needless to say, in particular archaeologists working in regional contexts whose trajectories differ significantly from the Ica valley on which the chronology is based have expressed dissatisfaction with it (e.g. Pozorski and Pozorski, 1987), and in fact, the horizon concept is viewed with increasing skepticism among archaeologists themselves, see Swenson and Roddick (2018) for a recent review.
3 The title of this section is reminiscent of that of Isbell and Silverman (2008), the contributions in which provide rich discussion of the issues involved and to which the reader is referred for further details.
To what extent the archaeological cultures in the northern, central, and southern subregions of the Central Andes were interrelated is a question that archaeologists need to discuss and ultimately answer on the basis of the evidence from their discipline. Here, instead, we wish to put the question of the nuclearization of the Central Andean culture area that is at the heart of Stanish's review of state formation into a broader, interdisciplinary, perspective. Even though state formation is likely a significant component of the broader issues, our discussion is not tied particularly to the question of the origins of societal complexity in the Central Andes. Instead, we would like to investigate further the broader question to what extent the cultural expressions of the Central Andean co-tradition are unified rather than an epiphenomenon that emerges as the mere sum of developments in more regionalized hotspots of interaction and development.

From archaeology to a broad anthropological perspective: Conceptualization and limitations
As a vehicle to bring the abovementioned disciplines to bear upon one another, we borrow the notion of "interaction sphere," which is already commonly used in Central Andean archaeology (e.g. Burger, 2013; MacNeish et al., 1975; Quilter, 2014: 152-157). Directly relevant to the present concerns, Isbell and Silverman (2008: 506) speak of a "southern sphere of interaction" emerging from the meeting of Wari and Tiwanaku. However, unlike its original definition (Caldwell, 1964), we do not emphasize the distribution of elite goods nor use it with exclusive reference to redistribution, exchange, or commercial activities (though these may well be implied). Instead we use this term in a broad sense (which is already foreshadowed in Andean archaeology, see Lau, 2008: 145-146) to describe the geographical extent of regionalized similarities in (i) material archaeological culture, (ii) shared lexical and grammatical patterns in language (cf. Rehg, 1995 and, for South America specifically, Jolkesky, 2016), and (iii) population structure, all elements that must reflect repeated interactions between people. Given that, just like archaeological cultures, languages and genomes also "were independently in existence before the onset of the new configuration brought about by interaction mechanisms" (Masry, 1997: 120), the concept is well-suited for our intent. That the notion of the interaction sphere emphasizes synchronic distributions over developments through time (Lau, 2008: 146) is likewise fitting for our purposes because the linguistic and genomic data we will be discussing are largely synchronic rather than diachronic. However, given that the question we pose is strongly linked to questions of cultural trajectories through time, and that parts of our genomic data pertain to ancient DNA (see below), we believe it is appropriate to subsequently dynamicize our view into a broader and partially diachronic perspective.
Some highly influential models at the interface of linguistics, archaeology, and genetics, such as Renfrew's (1992) Language/Farming Dispersal Hypothesis, operate on the basis of specific theories and frameworks which shape expectations as to how the data from the three different disciplines should be relatable. For the purpose of our study we work in a data-driven rather than theory-driven manner, with no strong predefined assumptions as to how the interface between the disciplines should be theorized. Most relevantly, in fact, our approach does not rest on potentially problematic attempts to correlate historically attested languages of the Central Andes with archaeological cultures, and either of these with contemporaneous or prehistoric "people" (cf. Quilter 2010: 228-229) that could be studied by means of molecular anthropology. Instead, we investigate whether more longstanding patterns of nuclearization of cultural developments correspond to linguistic and genomic divides within the Central Andes. We also do not necessarily expect that the evidence from the individual disciplines converges on the same picture. Rather, we believe in evaluating the evidence from each discipline in its own right, without attempting to fit one into the Procrustean bed that an interpretative framework based on another would constitute (cf. Denham and Donohue, 2012). Where the evidence paints a disparate picture, the interdisciplinary non-congruence itself is what is informative and in need of explanation (Pakendorf, 2014, 2015).
It lies in the nature of the question asked that the perspective we take is a broad one, concerned with macro-level analysis of spatial patterns in the data. We do not wish to conceal that this entails glossing over significant details in the record of each discipline, which add important facets to our understanding of the cultural development of the Central Andes. We consider it important to take these into account as well, although we do not discuss them here at the high level of abstraction that the principal question we pose demands.

Perspectives on North-South structure in the Central Andes from 16th century language geography
One of the ways in which Stanish (2001: 44) illustrates the distinctiveness between the North Coast, the Central Highlands, and the Lake Titicaca basin is by pointing to linguistic differences between these regions. 6 Linguistic history indeed already has played an unusually large role in considerations of the pre-Columbian Central Andes, though the dialogue between linguistics and archaeology has so far placed an emphasis on questions revolving around cultural triggers for language spread (Kaulicke et al., 2010; Heggarty and Beresford-Jones, 2012). Yet, linguistic evidence is relevant to theorizing Andean prehistory also beyond these. For the question of the nuclearization of the Central Andean co-tradition it is vital to be cognizant of the fact that the linguistic landscape of the Central Andes underwent drastic changes in the historical period, especially in Northern Peru. The linguistic situation that can be observed today is heavily influenced by the European impact, which involved language shift from indigenous languages to Spanish in a process that was quicker in some regions and slower in others (and that is still ongoing today). As we are essentially concerned with the dynamics of nuclearization and unity in the pre-Columbian cultural chronology of the Central Andes rather than the present, the baseline and point of departure for inferences in this regard should be the earliest recoverable situation in the early 16th century, when the first Spaniards set foot into the region. Fig. 2 shows one possible reconstruction of the distribution of indigenous languages in the Central Andes in the early 16th century, based on the work of Torero (1986, 1990, 1993), Cerrón-Palomino, and Urban (2019a). These reconstructions, in turn, are based on a bouquet of sometimes mutually reinforcing, but sometimes also contradictory primary evidence.
It includes passages in ethnohistorical documents (for instance, early chronicles and lawsuit protocols from colonial times), toponymic evidence (place names, which often bear recurrent endings, such as -ing or -shire in England, that make it possible to track the former distribution of the language that gave rise to them), and sometimes also statements from colonial grammarians who actually described the languages in question.
As the map shows, a "quilt of socially stable linguistic differentiation" (Mannheim, 1991: 51) characterized Southern Peru and the Titicaca basin. This "quilt" involved Quechua, Aymara, and Puquina as widely spoken languages, often interspersed among one another in discontinuous areas (see Urban, in press-c for discussion). The social roles and identities associated with the linguistic differences cannot be recovered, but scattered ethnohistoric evidence suggests that the Quechua and Aymara languages were regionally acting as indexes of distinct social identities within an overarching society (de Ulloa Mogollón, 1965, cf. Mannheim, 2018: 513 and Urban, in press-c). 7 A further, comparatively minor player alongside the much more widespread Quechua and Aymara varieties and Puquina in the Southern Sphere were the Uru and Chipaya languages, spoken by people adapted to the lacustrine and riverine environments of the greater Titicaca region (Wachtel, 1986). 8

Fig. 2. A reconstruction of the linguistic situation in the Central Andes at the point of European contact on the basis of Urban (2018) with minor modifications. The figure, produced using the R packages "ggplot2" (Wickham, 2016) and "maps" (Becker et al., 2018), is for illustration purposes only; all linguistic boundaries shown are approximate and lowland languages further east are not shown.

6 The idea that "some form of proto-Aymara" (as initially posited by Stanish 2001: 44, cf. also Plourde and Stanish, 2008: 237) was spoken in the altiplano in Tiwanaku times is now disregarded (cf. Cerrón-Palomino, 2000). Instead the Puquina language, also mentioned by Stanish, is commonly thought to have been associated with Tiwanaku (e.g. Cerrón-Palomino, 2016b).
7 Ethnographic data from Bolivia reported in Bastien (1985) indicate that this was a configuration of broader relevance at least in the Southern Central Andes, cf. Urban (to appear c) for further discussion.
8 The history of the term "Uru" is complex and not entirely resolvable. Among other issues associated with the term, there is good evidence that in colonial, and likely Incaic times, rather than designating a language, "Uru" was an ethnic and especially socioeconomic category that was relevant for taxation (cf. Bouysse-Cassagne, 1975; Julien, 1987; Torero, 1987; Mannheim, 1991: 50). Here, we follow current practice in linguistics and use the term unambiguously with reference to the language described in Hannß (2008) and Cerrón-Palomino et al. (2016) on the basis of earlier sources. Many, but far from all, people classified as "Uru" in the colonial records would have been speakers of the language we refer to as Uru. However, some people categorized as "Uru" in the 16th century were Puquina speakers undergoing language shift to Aymara (Julien, 1987: 54). Here, we do not use the term for the unrelated Puquina

A quite different linguistic ecology obtained in the North-Central Andes. Whereas the South is characterized by closely related dialects pertaining to widespread families and genealogical diversity was relatively modest, in the North, if one includes the eastern slopes, we have knowledge of as many as twelve distinct indigenous languages that were spoken alongside Quechua varieties (Urban, in press-c). While their geographical extent probably ranged from truly local languages to ones with somewhat wider significance, the sheer geographical extent of Quechua and Aymara languages dwarfs all of them. On the North Coast, we know of languages most commonly known as Tallán, Sechura, Mochica, and Quingnam. Of these, Mochica survived longest and was also attributed the greatest significance in colonial times, as evidenced by the number of materials produced (see Urban, 2019a for review) and the fact that it is at least once mentioned among the lenguas generales of Peru (cf. López, 1889: 549). The highlands of Cajamarca and the northernmost provinces of Ancash were, alongside Quechua, the domain of the Culli language; toponymy betrays the presence of one or more further indigenous languages (Torero, 1989) which may have been relatives of Cholón (Urban, in press-a).
Cholón was probably a major language on the eastern slopes of the Andes, which can in Northern Peru be conveniently separated from the highlands by the steeply incised valley of the Marañón river. To the north of Cholón, its putative sister language Hibito was spoken, and still further north, there was a truly multilingual region involving languages known as Chirino, Patagón, Bagua, Copallín, and Sacata. Linguistic divisions across the coast-to-highland transitory ecotones and across the Marañón valley were not neat, however. "Coastal" languages like Mochica were present at least parochially in the highlands of Cajamarca (De la Carrera, 1644; Rostworowski de Diez Canseco, 1985), and conversely, toponymy betrays Culli outliers near the coast (Adelaar, 1988; Urban 2019a: 71-71; 2019b), so that in outline a picture of zones of different speech deeply interpenetrating each other emerges (Urban, in press-c).
Importantly, the languages of Northern Peru, and likewise the extinct Puquina language in Southern Peru and Bolivia, cannot be related successfully to the known language families of the Central Andes and indeed of South America as a whole. In particular, there is no evidence for a genealogical connection of any of these languages to Quechua or Aymara. While Hibito and Cholón might have pertained to a larger language family of Northern Peru (Urban, in press-a), otherwise there is no conclusive evidence to show a genealogical relationship of the non-Quechua, non-Aymara languages among themselves. That is why, for practical purposes, they must be considered isolates, i.e. pertaining to language families the only known members of which are these languages themselves. Therefore, another contrast between the North and South of the Central Andes is that the former is characterized by a situation of higher genealogical diversity, with many local languages that, as far as we can know, were not related to one another and small language families of modest extension, while in the latter, genealogical diversity was lower and the linguistic landscape was dominated by the larger Quechua and Aymara language families. The extension of the isolate Puquina in the South also exceeds that of the language isolates in the North.

Perspectives on North-South structure in the Central Andes from language contact patterns and areal convergence in language structure
While there is still much work to do in descriptive Andean linguistics, a surge of documentary efforts in the 1960s and 1970s as well as more recent work has led to a situation in which we have a reasonably good picture of the major Andean language families, Quechua and Aymara, as well as their historical interrelations. Also Chipaya, an unrelated language still spoken today in the town of Santa Ana de Chipaya in the Bolivian Andes, is now blessed with a modern reference grammar and dictionary (Cerrón-Palomino, 2006; Cerrón-Palomino and Ballón Aguirre, 2011).
In conjunction, these materials clearly demonstrate that the languages once and presently spoken in Southern Peru and Bolivia show the hallmarks of language contact, and hence, interaction between speakers of different languages on a sustained basis. Quechua and Aymara are well-known to be intimately tied to one another in lexical and grammatical structure: a first wave of convergence due to intensive language contact must be posited before the split-up of the ancestors of the lineages when they were still spoken in or close to the homeland (theorized to be in Central Peru), and secondary convergences are visible between daughter languages (Emlen and Adelaar, 2017;Emlen, in press). One of the sites of such secondary convergence, and a particularly strongly visible one, is found in Southern Peru and Bolivia and involves the closely related Quechua varieties of Cuzco and Bolivia and the Aymara language proper (e.g. Adelaar, 1987;Mannheim, 1991).
The Uru language appears to have been influenced structurally by Aymara (Muysken, 2000), and also Puquina is involved as both donor and receiver of linguistic material from Quechua and Aymara (e.g. Adelaar and van de Kerke, 2009;Cerrón-Palomino, 2016a).
All the non-Quechua languages of Northern Peru, however, are extinct and insufficiently documented. In the best of cases, a colonial grammarian has produced a full grammar of the language and documentary work was also carried out in the heyday of European naturalist-explorers in the late 19th and early 20th centuries. Colonial grammars, in spite of the problems associated with the limitations of the theoretical framework of description they employed and the absence of a principled theory of phonetics and phonology, can be interpreted in terms of modern linguistic descriptive practices. Such work has been carried out for the Mochica language of the North Coast as well as for Cholón, once spoken on the eastern slopes at roughly the same latitude, on the basis of colonial grammars (De la Carrera, 1644; De la Mata, 2007) and, in the case of Mochica, also more recent data (Middendorf, 1892; Brüning, 2004). Unfortunately, many remaining languages, especially within the Northern Sphere, are even less well known, since data are restricted to short, sometimes minimal, wordlists, as is the case for Tallán, Sechura, Quingnam, Culli, Hibito, and the languages of the Jaén region (Rivet, 1949; Martínez Compañón, 1985; Torero, 1993; Quilter, 2010; Urban, 2015a). Documentation is likewise scanty for Puquina in the South: in this case we do not have a wordlist, but a restricted set of uncommented translations of Christian materials into Puquina (de Oré, 1607). Even such materials can make it possible to infer some aspects of phonology, word structure, and sometimes even patterns of word-formation and affixation (Urban, 2019b; Emlen et al., in press); in the case of the languages of the Jaén region (Bagua, Xoroca, Chirino, Patagón, Copallín, and Sácata), however, the available material is restricted to three to five words per language, which virtually bars any effort at analysis. Finally, there are languages that must have been spoken in Northern Peru, but for which no documentation at all is known.
This does not mean that all is lost: as for the other poorly documented languages, ancillary sources betray their existence. These include toponymy, anthroponymy, and isolated indigenous words from colonial works produced in the region where the individual languages were once spoken, as well as "substrate" words, i.e. borrowings from these original languages into their successors (cf. e.g. Cerrón-Palomino, 2016; Urban, in press-b).
(footnote continued) language, as early theories to their (diachronic) identity (Créqui-Montfort and Rivet, 1925, 1926, 1927) are now thoroughly discredited (Torero, 1992).
While it goes without saying that this data situation leaves many aspects of the languages of Northern Peru in the dark, linguistics can nevertheless contribute to the understanding of pre-Columbian Northern Peru. In fact, the linguistic record available for the northern languages, sparse as it may be, has been the subject of focused attention, especially in recent years (Andrade Ciudad, 1995; Cerrón-Palomino, 2004; Taylor, 1990; Torero, 1986, 1989, 1993). The state of research is now mature enough to develop comparative and historical perspectives which can shed light on the relations between former speakers (Urban, 2017, 2019a, 2019b); at the same time, given the data situation, these perspectives will only be able to touch on selected aspects of the structure of the languages, and they will always remain incomplete and partial. As Lass (1997: 184-185) puts it, "[t]he palimpsest that makes up the observable surface of a language is rarely (if ever) entirely the result of its own internal history. At least part, either superficial like lexis, or 'deeper' in structure will likely be the scars of encounters with other languages." Such traces of interaction are saliently visible in the linguistic record of Northern Peru, both across the coast-highland divide and across the Marañón river valley, in spite of the fact that it formed and forms "a formidable obstacle to communications" (Adelaar, 1988: 121). Concretely, there are two main lines of linguistic evidence that suggest that Northern Peru formed a self-contained linguistic interaction sphere that was, nevertheless, connected to the broader Central Andean linguistic ecology (Urban, 2019a). One concerns lexical borrowing. 
Qualitatively, the data include ample evidence for shared lexical material even in what linguists call "basic vocabulary": words for concepts that are so essential that they are highly stable through time and rarely replaced by borrowings from other languages, and if so only in contact situations that are characterized by prolonged intensive linguistic interaction. In Northern Peru, these include words for 'to eat,' 'to drink,' 'bird,' and even 'water' or 'lake' (Urban, 2017). Also, there may have been shared practices of counting with small objects that are reflected in the grammaticalization of numeral classifiers from words such as 'stone' or 'seed' (Urban, 2015b; Rojas-Berscia and Eloranta, 2019). Furthermore, the Northern Peruvian Quechua varieties have been enriched lexically with loanwords from the local languages (Urban, in press-b). The second line of evidence is structural: Northern Peru's extinct languages also show commonalities among themselves that contrast with the Quechua and Aymara languages. The main relevant features are (i) a mix of monosyllabic and disyllabic roots which (ii) feature root-final plosives without restrictions and (iii) often occur in a characteristically reduplicated form (Urban, 2019a, 2019b). Together, these characteristics give the lexical roots of Northern Peru a distinctive shape. Monosyllabic roots with final plosives are, to be sure, not a typologically unusual configuration: Mayan languages, for instance, depend on such roots to an even higher degree than the languages of Northern Peru. However, in the Andean context, this shared profile is notable, for it contrasts considerably with Quechua and Aymara. Quechua languages strongly prefer disyllabic roots, and in Aymara, roots are generally di- or trisyllabic (Hardman, 2001: 24). Likewise, Quechua restricts at least final /t/, but allows /k/ and, where retained, /q/ (final /p/ also exists, but is thought to be due to fossilized suffixation). 
In Aymara, again, restrictions are stronger, and roots as a rule end in a vowel (Hardman, 2001: 24). 9 It is also highly notable that two Northern Quechua varieties, Ferreñafe and Chachapoyas, go furthest in eliding unstressed vowels, thereby creating heavier syllables even if the material involved is of Quechua vintage (Escribens Trisano, 1977;Valqui, 2018). This can be interpreted as evidence for the integration of Northern Peruvian Quechua varieties into the linguistic ecology of Northern Peru.
The boundary might be located in what is today northern Ancash, where the northernmost varieties of the Quechua I branch are spoken.
However, some of these bear a quite visible Culli substrate in that they have acquired lexical items ending in -t. This ending likely reflects a derivational marker of this shape that can plausibly be theorized to derive from Culli, especially because it violates the usual Quechua phonotactic constraints (Adelaar, 1988).
The data also show some borrowing from Quechua (Cerrón-Palomino, 1989; Urban, 2019a). In addition, the presence of some "pan-Andean" forms (Torero, 2002: 29), i.e. similar forms for similar concepts that occur throughout the Central Andes, but whose correspondences are nonsystematic and whose origins are hard to pin down, shows at the same time that the Northern Peruvian linguistic ecology was not self-contained, but integrated as one constituent component into broader Central Andean linguistic interaction spheres (Urban, 2019a: 208). What linguistic ideologies and real-world patterns of interaction underlay and supported the linguistic contact zone into which the northern Peruvian languages amalgamated is not recoverable from the linguistic data themselves (though see Urban, in press-c). In some cases, the presence of languages like Mochica in places like Balsas appears to be due to Inca-instigated resettlement (cf. Church and Von Hagen, 2008: 908); in others, however, it may rather reflect autochthonous practices of "resource sharing" (on which see Ramírez, 1995; Topic, 2013).
In spite of the thorough philological analysis of extant linguistic documentation of Northern Peru that has been carried out in the last few decades, the evidence could be questioned on grounds of the poor state of documentation. In this context it is interesting to note that broader perspectives on the areal typology of the Andes, based on systematically collated data analyzed with advanced statistical methods, point to similar conclusions. Michael et al. (2014: 53) are concerned with the phonological typology of the languages of the Andes and, applying Bayesian statistics, find support for a division of Andean languages into northern and southern ones, with the dividing line running through Central and Southern Peru. The overall typological data from Urban et al. (2019) paint a smoother transition, but still with a clear geographical basis along a North vs. South axis for the typological clustering of Andean languages.

Perspectives on North-South structure in the Central Andes from studies of uniparental markers
The first genetic studies on the subcontinent observed a split between Andean and Amazonian populations which was contextualized with different geographic and ecological features (Fuselli et al., 2003; Tarazona-Santos et al., 2001). Subsequent studies, which increased the number of population samples available, confirmed in particular the presence of an unspecific Andean "core" of genetic homogeneity, without identifying its boundaries. These first results are based on the study of uniparental markers (Y chromosome and mitochondrial DNA, or mtDNA). These are widely studied genetic markers which have helped to generate broad comparative data for the continent. Their analytical power, though, is limited: each individual carries only one type of mtDNA and, in the case of males, one type of Y chromosome. Geneticists study specific types which have a characteristic distribution and varying frequency in different populations. These types are also referred to as haplogroups: more technically, these are major lineages (or branches) within the phylogenetic trees of both mtDNA and Y chromosome, named with capital letters.
Mitochondrial haplogroups in the Native American populations are represented by four predominant groups (A2, B2, C1 and D1; Torroni and Wallace, 1995). Haplogroup B2 is of particular interest for our purposes here, as it is characteristically frequent in the Andean region (Bisso-Machado et al., 2012). Nevertheless, the sole presence of B2 should not be seen as a straightforward indicator of Andean origin: while it reaches the highest frequencies in Quechua- and Aymara-speaking populations of the surroundings of Lake Titicaca (Sandoval et al., 2013a,b), like the other three founding haplogroups, it can be found all over the continent. In particular, B2 is also found at relevant frequencies as far north as Ecuador (Rickards et al., 1999) and sporadically in pockets of the Amazonian region, e.g. among the Xavante (Ward et al., 1996). Specific sublineages of haplogroup B2, however, can be linked more robustly to a characteristic Andean/highland profile, and are particularly frequent in the Southern Sphere of our target region: B2ai is found from Central Peru to Bolivia and Northern Chile, while B2aj is present at high frequencies around Lake Titicaca, in Bolivia, and Northern Chile (Gómez-Carballa et al., 2018).
9 One can even observe how northern vs. southern canons in root structure clash. In the list of numerals from an indigenous language of the North Coast that was found in the ruins of a colonial church at Magdalena de Cao, two forms are borrowed from a Quechua II variety. The source forms are tawa 'four' and suqta 'six,' which, however, are phonologically adapted to the North Coast language (probably Quingnam) as <tau> and <sot> respectively. This creates the typical North Coast monosyllabic root shape from the typical Quechua disyllabic root shape by eliminating the unstressed final vowel. The resulting form <sot>, in fact, would actually be impossible in Quechua because of the final alveolar stop.
For the Y chromosome, haplogroups alone do not allow the detection of any relevant regional structure, not even the broad distinction between the unspecific Andean core and neighboring regions; there is only one predominant Native American haplogroup in all of South America, Q-M3 (Bisso-Machado et al., 2012; Pinotti et al., 2019). Higher resolution can be achieved through the analysis of shared haplotypes: these are genetic profiles characterized by the same set of variants, possibly inherited from a common ancestor, and belonging to the same haplogroup. Two individuals who share the same Y chromosome haplotype are likely to have inherited it on the direct paternal line from the same ancestor, who would have lived within a time range expressed in generations ago (the more generations pass by, the higher the chance for mutations to appear and for the two haplotypes to diverge from each other; see review in Calafell and Larmuseau, 2017). By identifying such connections between individuals of different populations, it is possible to reconstruct events of contact and exchange, assuming that descendants of the same individual migrated (or just traveled) from one place to another and left descendants there. This scenario is directly relatable to the concept of the interaction sphere as we employ it.
The connections that we are able to reconstruct using Y chromosomal data, however, are only meaningful for relatively short time frames: looking a couple of millennia into the past, the chances of having a shared ancestor are too high and such connections become less informative (Rohde et al., 2004; see the analysis discussed in Section 5.3.3). Conveniently, the data for these profiles reported in the literature cover an appreciable number of populations and geographical areas. In our region of interest, genetically homogeneous populations correspond to a dense network of shared Y chromosome haplotypes. The geographical limits of this sharing extend from the Cuzco region to Lake Titicaca and to Bolivian populations as far south as Potosí (Barbieri et al., 2017); this network also includes a few groups from the Eastern slopes of Central Peru (Barbieri et al., 2014). Like the characteristic mtDNA haplogroup presence noted before, this network of gene flow can also be considered specific to the Southern Sphere: it crucially does not extend to Northern Peru, which looks partially separated by a gap roughly positioned in Central Peru, as explored with a focus on the Northern Central Andes in Barbieri et al. (2017).
The limited availability of sampled populations constrains our ability to provide further perspectives on the demographic structure of the Central Andes. The coastal region is underrepresented in the genetic literature, with the first genetic samples (from Peru's South Coast) coming from aDNA (ancient DNA) studies (Fehren-Schmitz et al., 2010). Only in recent years have sampling efforts extended to northern regions of Peru, with populations from both the highlands and the coast analyzed for both uniparental and genomic markers (Barbieri et al., 2017; Cabana et al., 2015; Guevara et al., 2016; Harris et al., 2018; Sandoval et al., 2013a,b; Sandoval et al., 2016), but the coverage here still remains poorer (as discussed in the analysis in Section 5.3.3).

Perspectives on North-South structure in the Central Andes from full genome analyses
As mentioned before, uniparental analysis focuses only on one line of ancestry among the many that contribute to each individual's history. To obtain a more complete picture of the relatedness between individuals and populations, one must consider genomic data, either thousands of markers across all the chromosomes (Single Nucleotide Polymorphisms, SNPs) or even full genome data. These analyses are usually more complex and cost-intensive, but recent studies are increasing the data available within South America as well. Here we review the results of a recent study which provides genomic data for 19 living Peruvian populations, including both the North Coast and the Southern Highlands (Barbieri et al., 2019). In this study, the coast is represented by a high number of new samples, many of which have a predominant Native American genetic ancestry. 10 The study also features new samples from the North Highlands of Peru in the region of Chachapoyas, thus significantly contributing to a more even coverage of the genomic landscapes of the Central Andes.
The newly available samples are compared with new and available samples from Bolivia, Ecuador and Colombia on the basis of ~600,000 SNP variants across the genome. A popular way to decompose the variance of thousands of independent SNPs is cluster analysis performed with the software ADMIXTURE (Alexander et al., 2009; Patterson et al., 2012). ADMIXTURE reduces the total diversity into a given number of "ancestry components," and reports their frequencies in each individual of the dataset. In Barbieri et al. (2019), the populations from the Central Andes all harbor the same broad regional ancestry, which is present at high frequency in all individuals. This result is in line with previously detected homogeneity in western South America, which was explained as a common origin from the same major migration event (Raghavan et al., 2015). At a more fine-grained scale of analysis, regional differences relevant for our research question can, however, be appreciated: ADMIXTURE analysis at K = 8 (i.e. with eight ancestry components recognized by the algorithm) separates an ancestry component characteristic of the Southern Highlands (Figure S2 of Barbieri et al., 2019). This component also appears sporadically and at low proportions in some individuals of the Chachapoyas region and of the population sample from Magdalena de Cao on the coast of Northern Peru. It is important to notice that this high value of K (i.e. the number of ancestry components that we force onto the existing genetic diversity) is not strongly supported by the algorithm, so the effect described is indeed subtle. Nevertheless, this faint North-South divide is consistently shown also by other independent analyses: another commonly performed descriptive analysis is Principal Component Analysis (PCA), which decomposes the variance into major axes of variation that can then be plotted in a two-dimensional space; in this way, the distance between individuals on the plot is roughly proportional to their genetic distance. 
In the PCA of Barbieri et al. (2019), the three groups overlap in the three main principal components, but on the fourth component the Southern Highlands are differentiated from the coast and the Northern Highlands. This level of population structure detectable with ADMIXTURE and PCA is influenced by events that occurred at early stages of human occupation of the continent (see the following section for further exploration of the temporal dimension). By investigating shared haplotypes, instead, it is possible to reconstruct more recent cases of gene flow which overlay the previously described ancient components. In Figure 4 of Barbieri et al. (2019), the heaviest load of shared haplotypes is found among the populations of the North Coast, which share a large number of long segments that are indicative of recent and possibly ongoing contact that resulted in shared recent ancestors. The North Coast is more isolated from the rest of the dataset: sampled coastal populations have 13 shared haplotypes with four populations from the neighboring Northern Andes (Chachapoyas region), and only six shared haplotypes with six populations from the Southern Andes (from the highlands near Lima to Lake Titicaca). The six populations from the Southern Sphere are also interconnected by shared haplotypes of slightly smaller size than the ones shared within the coast and within the Northern Andes. This possibly indicates that the interactions underlying the genesis of shared haplotypes have come to a halt.
One drawback of Barbieri et al.'s dataset is that it lacks samples from the South Coast, which would be important to better define patterns of interaction in Southern Peru. Our only chance so far to include the South Coast is represented by studies of ancient DNA. A previous study employed numerous aDNA mitochondrial samples from Nazca and neighboring cultures to infer the timing of the split between South Coast and South Highlands with demographic simulations (Fehren-Schmitz et al., 2014). The scenario which best matched the observed genetic variation was one designed with a coast/highland split at 1175 BCE, a moderate-to-high emigration (10%) from the coast to the highlands, followed by a massive reimmigration (25%) of people from the highlands to the coast in the Late Intermediate Period. These dynamics confirm a higher level of connection for the Southern regions, not only within the highlands, but also with the coast.
In sum, mitochondrial and Y chromosome data identify a region of similar genetic background and intense sharing in Southern Peru and the Titicaca basin, which includes the South Coast of Peru and extends into Northern Chile. Genomic SNP data demonstrate that this southern network of shared genomic material is, on a relatively subtle but clearly appreciable level of analysis, set apart from the genomic profiles found among populations of the northern half of the Central Andes, which show slightly weaker but likewise appreciable links between the major ecozones (coast, highland, eastern slopes), and a high level of sharing within such ecozones. We can therefore conclude that both uniparental and genomic data converge in depicting a subtle structural divide between the proposed Northern and Southern interaction spheres, with slightly distinct ancestry profiles (or gene pools), and a high level of sharing within the spheres.

Introduction
If we wish to meaningfully compare the archaeological record with the one from linguistics and population genetics, it is not sufficient to compare distributions in space. This is especially so because the linguistic and current genomic perspectives are by themselves not directly interpretable historically, but represent snapshots of patterns of diversity that are or were in place at certain points of time. However, the question of nuclearization as opposed to unification in the Central Andes as a culture area is intimately linked with the emergence of state-level societies and cultural and political trajectories through time. Hence, we must attempt to dynamize our notion of interaction spheres as evidenced in the linguistic and genomic evidence, and attempt to provide a chronology of their genesis. Torero (1990: 244) opines that the major Andean languages were in place by 500 CE, but does not mention any factual evidence in support. Nevertheless, a variety of considerations indeed point to an overall stability of the linguistic differentiation in parts of Northern Peru (Urban, 2019a: 225-231, 2019b), i.e. a situation of linguistic maintenance in a multilingual contact setting. 11 For instance, the Mochica-Quingnam dualism on the North Coast corresponds to slightly different cultural trajectories, going back to at least Moche times, in the same regions where these languages are attested in early historical times; likewise, the clear presence of the Mochica language in the upper Piura valley, where a Moche presence is attested (though its nature is not quite clear), is compatible with the assumption of an in situ development of the languages of Northern Peru with linguistically stable division through time from the Early Intermediate Period onward (Urban, 2019a). This is not tantamount to saying that no changes in the linguistic ecology occurred, and in fact we cannot simply project the landscape of the early 16th century back in time with any security. 
Indeed, to what extent the linguistic mosaic of Northern Peru really was stable through time (rather than just compatible with such a scenario) is difficult to say on the basis of the linguistic evidence alone, as linguistics generally is hard-pressed when it comes to absolute chronologies of language-internal developments. This is only exacerbated in the case of linguistic isolates such as the ones that were spoken in Northern Peru. Therefore, while bearing in mind some indications for a stable linguistic ecology in Northern Peru from Moche times onward, especially the North Coast, our inferences regarding the chronology of the emergence of population structure will mainly come from the discipline that is better suited to provide absolute chronologies, i.e. genetics.
In particular, with the use of ancient DNA, genetic continuity of specific regions can be reconstructed through longitudinal time transects. Ancient DNA anchors genetic variation to radiocarbon-dated human remains which are directly associated with archaeological contexts. In the Central Andes, aDNA analysis suggests a strong genetic continuity which persists over the past 8,000 years: this means that genetic profiles from the same regions are similar through time, resisting major population replacements (Baca et al., 2014; Fehren-Schmitz et al., 2014). A recent study brings together new and published ancient data from 89 individuals, mostly from Peru and Bolivia, dating between ~9,000 and 500 years ago (Nakatsuka et al., 2020). According to the radiocarbon-dated sites available and the ADMIXTURE analysis performed, an appreciable discontinuity between North and South in the Central Andes was already present by at least 5,800 years ago. Then, a genetic divide between North and South which resembles the one observed today appeared within the last 2,000 years and survived the rise and fall of major complex societies. Before 2,000 years ago, gene flow was more intense between Northern and Southern Highlands.
In light of these results from aDNA analysis, we wanted to quantify the magnitude of shared genetic material between and within the two proposed interaction spheres in the Central Andes (Northern and Southern), and evaluate the time windows of these exchanges, thus supplementing the still relatively sparse aDNA record with the larger database available for living populations. We therefore performed new genetic analyses of published data with a two-pronged approach, involving (i) an analysis of different time windows of haplotype exchanges, based on the broad Y chromosome data available, and (ii) a demographic simulation to reconstruct divergence time and migration for the published genomic dataset of Barbieri et al. (2019).

Methods
For the Y chromosome haplotype analysis, we considered part of the dataset assembled and analyzed in Barbieri et al. (2017) with a focus on western South America. The dataset used here includes 36 populations from key regions of the Northern and Southern cultural spheres from Peru and Bolivia (plus Kichwa from Ecuador). Data for Short Tandem Repeat (STR) haplotypes are taken from the published literature (Gayà-Vidal et al., 2011; Roewer et al., 2013; Sandoval et al., 2013a,b; Guevara et al., 2016; Sandoval et al., 2016; Barbieri et al., 2017), and 15 stable STR loci are considered. STR Y chromosome data are subject to a rapid mutation rate, and are therefore able to discriminate between changes occurring in the past centuries. Following Barbieri et al. (2017), pairwise haplotype similarity was adjusted for the mutation rate of each locus as reported in the Y STR haplotype reference database (https://yhrd.org/), using the Average Square Distance formula (ASD, Goldstein and Pollock, 1997). ASD is commonly used to calculate the divergence age between populations from their STR haplotypes, and corresponds to the average variance divided by the mutation rate at each locus. Here, we use ASD to approximate the divergence time between pairs of sequences, with greater confidence in the relative degree of similarity than in any exact divergence time estimates. The distance between non-identical (similar) pairs of haplotypes is transformed into divergence time by relating the variance per locus to the mutation rate associated with each locus. We then binned the divergence time frames in five clusters: identical haplotypes, and similar haplotype pairs which harbor a set of differences that could have accumulated within the last 500, 1,000, 2,000, and 3,000 years. 
The sum of sharing events between each population pair (pop1 and pop2) is transformed into frequency values by dividing this number by the product of the two population sizes in each comparison (number of individuals in pop1 × number of individuals in pop2). Visualization and analysis are performed in R (R Core Team, 2020). In Fig. 3A, a map with the approximate location of the populations included in this analysis is color-coded to distinguish the two groups. Fig. 3B shows the distribution of the frequency of identical haplotypes, and all the haplotypes that could have diverged within the last 500, 1,000, 2,000, or 3,000 years, for pairs of populations within the North, within the South, and between the North and the South.
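The procedure just described can be illustrated with a minimal Python sketch. It is not the code used for the analysis (which was run in R), and several choices are assumptions made only for the example: the conversion assumes a stepwise mutation model in which the expected squared difference at a locus is 2μt, the generation time is set to 30 years, the time windows are treated as non-cumulative, and the function names, toy haplotypes and mutation rates are hypothetical.

```python
from itertools import product

GENERATION_YEARS = 30.0               # assumed generation time, not from the paper
BIN_LIMITS = (500, 1000, 2000, 3000)  # upper limits of the time windows, in years


def divergence_years(hap_a, hap_b, mutation_rates):
    """Approximate divergence time (years) between two STR haplotypes.

    Assumes E[(a - b)^2] = 2 * mu * t per locus (stepwise mutation model),
    so each locus contributes (a - b)^2 / (2 * mu) generations.
    """
    per_locus = [
        (a - b) ** 2 / (2.0 * mu)
        for a, b, mu in zip(hap_a, hap_b, mutation_rates)
    ]
    generations = sum(per_locus) / len(per_locus)
    return generations * GENERATION_YEARS


def time_bin(years):
    """Return 'identical', the smallest window containing `years`, or None."""
    if years == 0:
        return "identical"
    for limit in BIN_LIMITS:
        if years <= limit:
            return limit
    return None  # older than the largest window considered


def sharing_frequencies(pop1, pop2, mutation_rates):
    """Count sharing events per window, divided by len(pop1) * len(pop2)."""
    counts = {"identical": 0}
    counts.update({limit: 0 for limit in BIN_LIMITS})
    for hap_a, hap_b in product(pop1, pop2):
        label = time_bin(divergence_years(hap_a, hap_b, mutation_rates))
        if label is not None:
            counts[label] += 1
    n_pairs = len(pop1) * len(pop2)
    return {label: count / n_pairs for label, count in counts.items()}


# Toy data: four loci with a hypothetical mutation rate per generation.
rates = [0.002, 0.002, 0.002, 0.002]
north = [(13, 29, 24, 11), (14, 29, 24, 11)]
south = [(13, 29, 24, 11), (14, 30, 24, 11)]
frequencies = sharing_frequencies(north, south, rates)
print(frequencies)
```

Cumulative values (for instance, all pairs that could have diverged within the last 2,000 years) can be obtained by summing the frequencies of the identical class and of all windows up to the chosen limit.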
As Y chromosome data allow for relatedness reconstructions only for the paternal line, it is important to consider whole genome data for more complete perspectives on the populations. For the whole genome analysis, we used the dataset of Barbieri et al. (2019) and focused on the divergence time between target populations with demographic simulations. The method used is a coalescent simulation with an Approximate Bayesian Computation statistical framework, on a dataset of ~2,500 SNPs ascertained to be variable in Native American populations, from the Affymetrix Human Origins Array (Patterson et al., 2012). Details on the simulations can be found in Barbieri et al. (2019). To design our demographic model, we chose three viable proxy populations from the dataset, with low levels of European admixture and appreciable sample size, to represent the North Coast, the Northern Central Andes around Chachapoyas, and the southern part of the Central Andes. Our population proxies are (i) Sechura_Tallan, a mixed sample of two coastal locations which are genetically homogeneous and have a high percentage of Native American ancestry, for the North Coast; (ii) La Jalca, the sample in the province of Chachapoyas with the least European admixture, for the Northeastern Andes; and (iii) Puno, a sample from the Titicaca lakeshore with a high percentage of Native American ancestry, comprising Quechua- and Aymara-speaking individuals, for the southern part of the Central Andes. We modeled a demographic scenario with an Amazonian group as outlier (the Karitiana, which are included also to adjust for the ascertainment bias of the SNP subset described in Panel 7 of Patterson et al., 2012; see Methods in Barbieri et al., 2019), and assigned broad, overlapping time split priors between Sechura_Tallan, Puno and La Jalca. We allowed for continuous migration between the three target populations. 
The simulations are then compared to the summary statistics of our observed data (the actual genomic diversity of the chosen populations) to obtain posterior curves indicating the most probable time split and migration rate. Fig. 4 shows a summary of the demographic model and the posterior probabilities associated with our actual genomic data for the target populations.
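The logic of comparing simulations to observed summary statistics can be sketched with a rejection-sampling form of Approximate Bayesian Computation. The Python example below is only a toy illustration under stated assumptions: the simulator is a hypothetical stand-in for a coalescent simulation, a single parameter (the split time) is estimated rather than split times and migration rates together, and the number of loci, the rate, the prior bounds, the observed value and the tolerance are invented for the example.

```python
import math
import random
import statistics

N_LOCI = 200                   # toy number of independent loci (assumption)
RATE_PER_YEAR = 1e-4           # toy per-locus rate of divergence (assumption)
PRIOR_YEARS = (500.0, 6000.0)  # broad uniform prior on the split time (assumption)


def simulate_differences(split_years, rng):
    """Toy stand-in for a coalescent simulator.

    Returns the number of loci at which two populations differ, where each
    locus differs with probability 1 - exp(-rate * split time).
    """
    p_differ = 1.0 - math.exp(-RATE_PER_YEAR * split_years)
    return sum(rng.random() < p_differ for _ in range(N_LOCI))


def abc_rejection(observed, n_draws, tolerance, seed):
    """Keep prior draws whose simulated summary is close to the observed one."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        split_years = rng.uniform(*PRIOR_YEARS)             # draw from the prior
        simulated = simulate_differences(split_years, rng)  # simulate a summary
        if abs(simulated - observed) <= tolerance:          # compare to the data
            accepted.append(split_years)
    return accepted  # sample from the approximate posterior


posterior = abc_rejection(observed=44, n_draws=2000, tolerance=4, seed=1)
print(len(posterior), round(statistics.median(posterior)))
```

The accepted values approximate the posterior distribution of the split time; a histogram of them corresponds, in spirit, to a posterior curve of the kind summarized in Fig. 4.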

Results
(figure caption continued) and similar haplotype pairs. Map produced using the R package "maps" (Becker et al., 2018), graphics produced using the R package "ggplot2" (Wickham, 2016).
The Y chromosome analysis offers a glimpse into the paternal relationships between populations of the targeted region. Haplotype sharing was binned into different divergence time windows to appreciate the variation in the intensity of sharing through time. Southern regions consistently showed a dense network of sharing, as seen also in published studies (see Section 5.1). In our new analysis (Fig. 3), we see that the network of the Southern Sphere includes haplotypes that diverged in ancient as well as in recent times, even in the 500-1,000 years timeframe. On the other hand, the connections within the North become appreciable (with a higher mean, closer to the one within the South) only with a longer time frame of relatedness, namely when we take together all the events that might have occurred in the last 2,000 years. Connections between the North and the South are less intense but appreciable over all the time slots considered, especially again within the last 2,000 years (or more): this suggests that North and South exhibited lower exchange rates in the last 2,000 years. With this analysis, however, we cannot distinguish whether the situation before 2,000 years ago was one of common origin (ancestors of the Northern and Southern Sphere belonging to the same overall populations) or one of stronger networks of gene flow between two already distinct populations, which became less intense in the last millennia. One reason for relatively recent connections could be traced back to "mitma" (i.e. forced resettlement of groups of people, a common practice perpetrated by the Inca empire, which is known to have taken place over long distances; see D'Altroy, 2014); nevertheless, such events are not captured by our genetic dataset, which shows very limited north-south connections in the most recent time layers. 
Another important caveat relates to data coverage: as stated before, genetic studies have been focusing more intensely on the Southern half of the Peruvian Andes, while the Northern regions would benefit from denser sampling. This means that connections in the North might be slightly underrepresented, even though in our analysis the sharing frequency is adjusted for sample size to partially account for the uneven number of samples per population.
The new demographic simulations based on genomic data (Fig. 4) return even more explicit results, as they are capable of modeling both the time of divergence and the rate of gene flow between groups. For these simulations we designed a tripartite model which represents the North Coast with a good target population sample and takes into account the slight differences in the genetic makeup of the Northern Coast and the Northern Highlands, as described in the original publication (Barbieri et al., 2019). On a broad continental level, the observed genetic profiles are compatible with a divergence of the three Peruvian target populations from the Amazonian outgroup at 6,000 years ago, confirming the results of Barbieri et al. (2019). When looking at the diversity within Peru, we now observe that the divergence time is around 2,500-3,500 years ago between Southern Highlands (Puno) and North Coast (Sechura_Tallan), and around 2,000-3,000 years ago between Southern Highlands and Northern Highlands (La Jalca). The migration rate after the population split was very high between the two Andean populations (0.8-1% of migrant exchanges per generation), lower between the two populations of the Northern Sphere (representing highlands and coast), and very low between the Northern Coast and the Southern Highlands. The connection between the Andean populations of Puno and La Jalca therefore appears stronger than the one between Puno and the coastal Sechura_Tallan, given the more recent divergence time and the much higher inferred migration rate. The results are broadly in agreement with the population structure uncovered by the analysis of ancient DNA from Peru (Nakatsuka et al., 2020), and point to the existence of ecogeographic divides within the Northern Sphere, and between the Northern and Southern Sphere.
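To make the role of the priors in such simulations more concrete, the following minimal sketch draws parameter sets from the ranges reported in the Fig. 4 caption and applies the topology rule stated there. It is not the simulation pipeline of Barbieri et al. (2019): the uniform shape of the priors, the function names, and the number of draws are assumptions made for illustration only.

```python
import random

# Prior ranges as given in the Fig. 4 caption; the uniform shape is an assumption.
PRIORS = {
    "t_kya": (0.0, 5.0),  # split times among target populations (kya)
    "m": (0.0, 0.1),      # migration rate per generation
    "ne": (500, 8000),    # effective population size (individuals)
}


def draw_parameters(rng):
    """Draw one hypothetical parameter set from the prior ranges."""
    return {
        "t_Pu_ST": rng.uniform(*PRIORS["t_kya"]),
        "t_Pu_Ka": rng.uniform(*PRIORS["t_kya"]),
        "m": rng.uniform(*PRIORS["m"]),
        "ne": rng.uniform(*PRIORS["ne"]),
    }


def puno_sechura_monophyletic(params):
    """Topology rule from the caption: monophyly when tPu-Ka > tPu-ST."""
    return params["t_Pu_Ka"] > params["t_Pu_ST"]


rng = random.Random(42)  # fixed seed so the sketch is reproducible
draws = [draw_parameters(rng) for _ in range(10_000)]
share = sum(puno_sechura_monophyletic(d) for d in draws) / len(draws)
print(f"Prior share of draws with Puno + Sechura-Tallan monophyly: {share:.2f}")
```

Because both split times are drawn from the same range, the priors alone favor neither topology. Any preference for one ordering in the posteriors must therefore come from the genomic data.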
In summary, the two new analyses performed (Y chromosomal and genomic data) both return a North-South structure appreciable from 2,000 years ago onward and possibly continuing until recently. This level of structure is compatible with a high level of migration within regions, especially in the South, where the dense network of Y chromosome haplotype exchange continues into the last 500-1,000 years. We can conclude that the demographic divide between the Northern and Southern Sphere is well supported by genetic evidence, and relatively ancient. The level of gene flow between the two spheres, which possibly decreased in intensity in recent times, is not strong enough to confound this demographic divide.

Discussion
In this article, we take a broad (and hence, necessarily relatively abstract) perspective on the cultural and societal development of the Central Andes and what have been identified as nuclei of the overarching culture area on the North Coast, in the Southern Highlands, and in the Titicaca basin, where the first undisputed examples of state-level societies are in evidence.
We have sketched how, on a likewise broad level of analysis, archaeologists have raised the question whether the Central Andean culture area can be conceived of as an overarching whole, or whether it would be better regarded as an emergent phenomenon from relatively independent developments in three, or, given the intimate intertwinement between Wari and Tiwanaku, rather two nuclear regions.
Here, we have explored if and how the linguistic situation in the earliest attested historical period and the genomic profile of indigenous people of the Central Andes differ between North and South, and we suggest the existence of relatively self-contained interaction spheres in these regions that are supported by both types of evidence. Both linguistic and genetic perspectives have only become available in the last few years thanks to advances in the comparative analysis of past linguistic ecologies, especially in Northern Peru, and the newly available genomic data from the North Coast and the eastern slopes in the Chachapoyas region. On the basis of these intradisciplinary advances, we are now in a position to incorporate Northern Peru into an interdisciplinary dialogue in ways that were not attainable before. We have shown that recent work in linguistics and genetics indeed suggests, independently from one another and from the archaeological evidence, the existence of interlocking Northern and Southern interaction spheres in the Central Andes, reflecting more intense interaction within these regions than between them.
From the linguistic point of view, we can observe different linguistic ecologies in the North and the South of the Central Andes as documented by the historical records of the 16th century. The North was characterized by a mosaic of interacting languages and a high level of linguistic diversity, whereas the South showed a fabric of fewer languages (varieties of Quechua and Aymara, together with Puquina and Uru-Chipaya) interspersed with one another locally (see Mannheim, 1991: 49-53). Both regions have in common that bi- or multilingualism and language contact must have been a significant factor (Urban, 2017), which led to the convergence of the languages with regard to the lexicon and some structural features: clear hallmarks of interaction spheres.
Casual, but also relatively intensive levels of language contact are conceivable without any gene flow occurring, as shown by Pakendorf (2015: 631) in work on Siberia. On the other hand, situations of language maintenance in a multilingual setting are hard to correlate linearly with particular genetic scenarios (Pakendorf, 2015: 633). In our case, one observes high levels of both linguistic homogeneity and shared haplotypes in what is today Southern Peru and the Lake Titicaca region. Specifically, a noticeable characteristic of the Southern Interaction Sphere as viewed from the genomic point of view is the higher level of gene flow, which suggests an extensive and dense network of contact throughout the last millennia.

[Fig. 4 caption, fragment: "... would split between an African population (Yoruba) and a remaining group in the Out of Africa scenario, at time tOOA (prior: 40-120 thousand years ago, kya). The latter group (NeOOA) would be colonizing the rest of the continents and split somewhere in Siberia between a local group (NeC) and the founders of the American continent (NeSA1) at time tSA (prior: 12-30 kya). Within the Americas, the target populations (effective population size NeKa, NeST, NeLJ, NePu; priors: 500-8,000 individuals) would split with broad overlapping priors (t: time, priors 0-5 kya) and exchange migration rates (m: migration rate, priors 0-0.1). Bottom: distribution of posteriors for m and t between the target populations. Note that the topology is determined by relative values of tPu-ST and tPu-Ka: Puno and Sechura-Tallan form a monophyletic group when tPu-Ka > tPu-ST. For details on the methods used in this paper, see the methods section of Barbieri et al. (2019) and Figure S4 in that article."]
As aDNA in particular shows, the affinities at times partially transcend the coast-highlands divide and link coastal regions of Nazca with the highlands of Ayacucho. 12 In contrast, genetic diversity is higher within the Northern Sphere, with some structure in particular between the highland and coastal domains, a situation that might reflect the fact that, while language boundaries were probably not sharp, largely different languages were spoken on the coast and in the highlands.
Thus, the tripartite division of the Central Andes evoked by the seminal work of Stanish (2001), with a distinction drawn between Southern Peru, where Wari developed, and the Titicaca basin with the Tiwanaku civilization, is less well supported, at least from the demographic/genetic evidence available. However, distinct trajectories within the Southern Sphere could have persisted without a marked demographic effect. It is also possible that we do not have enough relevant population samples to appreciate this divide, in particular for living populations in the core regions of the Wari domain. The most parsimonious interpretation that emerges from the interdisciplinary contextualization of the archaeological evidence, nevertheless, confirms the view that "there was a vast southern cultural region whose internal interactions were much greater (more intense, frequent, significant) than the region's communication with Andean societies farther north" (Isbell and Silverman, 2008: 507).
At the same time, the evidence from archaeology, linguistics and genetics is also consistent in that it does not suggest completely separated spheres. Rather, there is clear cultural contact and influence, language contact and gene flow that linked North and South together without obliterating their clearly recognizable distinct identities. For instance, the "Southern Andean Iconographic Series" (Isbell and Knobloch, 2006; Isbell, 2008; Isbell and Knobloch, 2009) amalgamates components inherited from the Chavín phenomenon, and Wari did have an impact on the highlands and, to a lesser degree, the coast of Northern Peru (Lange Topic, 1991; Lau, 2012). We find the linguistic and genomic evidence in agreement with Isbell and Silverman's (2008: 512) summarizing assessment of the archaeological record to the effect that "[t]he north coast and the Lake Titicaca basin definitely have a longue durée kind of cultural consistency and essentially unbroken trajectories of increasing social, political and cultural complexity. However, these regions did not develop in isolation from the other areas of the Central Andes …" What we are showing here for the first time is that northern and southern archaeological cultures, languages, and peoples were, within the two spheres, in relationships that led them to develop in recognizably differentiated ways.
We also explore the temporal dimension in order to fit the interdisciplinary evidence that shapes the spheres of influence into a coherent chronology. The new genomic analysis suggests that internal structure between the Northern and Southern Highlands of Peru and the Titicaca basin appeared at around 2,000 years ago, and when the North Coast is factored in, the estimated time frame suggests a range of 2,000-3,000 BP. 13 Likewise, the genomic profiles crystallize and stabilize at around 2,000 years before present, and show long-term continuity from then on, as does, slightly later, the archaeological record (a continuity that, in this case, even persists to the present day; see Nakatsuka et al., 2020). The emergence of population structure apparently predates the establishment of the major archaeological cultural complexes. The two developments appear chronologically shifted, suggesting that complex cultural identities and patterns of political organization developed over groups that had already formed cohesive internal relatedness profiles. Following this scenario, a demographic process of nuclearization created distinct demographic substrates on which societal complexity began to flourish. 14 The notion of "cultural cohesiveness" can also apply exclusively to dominant groups and political elites over a patchwork of "populations" characterized by varying degrees of internal relatedness. It is therefore meaningful that there is also a signal of correspondence between the demographic history, the languages in which local people communicated, and the described archaeological trajectories.
This strengthens the correspondence between cultural features, demographic connections, and interactions leading to linguistic convergence as vectors behind the development of complex societies, which in their expansive phases would propagate over existing cultural and demographic realities, adding their influence in terms of exchanges and connection but not leading to substantial population replacement (cf. Klaus, 2014 for the North Coast). Certainly, some level of displacement did occur through the centuries: for instance, long-distance relocations of some magnitude are clearly in evidence for the later phases of the Inca empire (D'Altroy, 2014) and, more recently, resulted from the policies of the Spanish secular and ecclesiastic authorities, but their effects appear not to have been strong enough to perturb the demographic continuities recorded through the last millennia (and anchored through evidence from ancient DNA). This has allowed us to meaningfully engage macro-patterns observable in the demographic population structure of the Central Andes with similar macro-structure in language geography and areal typology, and with the archaeological record.
To conclude our discussion, we reiterate that our study has several limitations. One is the coarse nature of our analyses, which clearly require subsequent work at finer spatial and temporal levels of resolution. Another limitation is the available data on which our analyses and interpretations are based. The records are still incomplete, and are likely to remain so as far as linguistics is concerned; were such data available, the picture might change in unpredictable ways. The same is true should future research in molecular anthropology be able to draw on more present-day and ancient DNA samples, for which there are still gaps in coverage, especially in the highlands of Cajamarca and in Central Peru.

CRediT authorship contribution statement
Matthias Urban: Conceptualization, Investigation, Writing - original draft, Writing - review & editing, Visualization. Chiara Barbieri: Conceptualization, Formal analysis, Investigation, Writing - original draft, Writing - review & editing, Visualization.

12 These demographic developments, in turn, may have an archaeologically visible correlate in iconographic similarities between the Nazca and Wari cultures as well as the latter's local Ayacuchan predecessor, Huarpa (cf. e.g. Silverman, 2002: 178).

13 In spite of recent computational approaches that hold promise for opening new possibilities, reliable absolute dating is at present still a desideratum of historical linguistics that has not yet been achieved (Maurits et al., 2020). By calibrating against the internal diversity of linguistic groupings whose history is well known, however, it is possible to make at least approximate inferences as to the likely time depths of language families. The internal diversity of the Quechua family, for one, can be likened to that of the Romance languages, perhaps minus the divergent French (Heggarty, 2007), which suggests situating the breakup within a time frame of perhaps 1,500-2,000 years. However, the extremely blunt nature of this approximation makes it impossible to draw firm inferences as to whether this split predates or is concomitant with state formation in the Early Intermediate Period.

14 Note, though, that Shimada (1999: 487) considers the emergence of north-south cultural spheres in the Early Intermediate Period as an "accentuation of cultural choices harking back to the Formative Era."