The Nordic research infrastructure for syntactic variation: Possibilities, limitations and achievements

Øystein Alexander Vangsnes1,2 and Janne Bondi Johannessen3
1 CASTL and AcqVA, Department of Language and Culture, UiT The Arctic University of Norway, Langnes, 9037 Tromsø, NO
2 Department of Language, Literature, Mathematics and Interpreting, Western Norway University of Applied Sciences, NO
3 MultiLing, ILN, University of Oslo, Blindern N-0317, Oslo, NO
Corresponding author: Øystein Alexander Vangsnes (oystein.vangsnes@uit.no)


Introduction
Since 2003 and for a period of about ten years a network of ten research groups in the five Nordic countries worked within the Scandinavian Dialect Syntax project (ScanDiaSyn) towards mapping syntactic variation across the North Germanic dialect continuum. Two major research tools grew out of the collaboration: the Nordic Dialect Corpus (NDC) and the Nordic Syntax Database (NSD), see Section 2.
In 2014 The Nordic Atlas of Language Structures Online (NALS) was launched with some 50 papers based on data from the two databases organised under the following six general topics: (i) Noun phrases, (ii) Verb phrase: Argument structure and verb particles, (iii) Verb placement, (iv) Middle field/TP: Subject placement, object shift, auxiliaries and tense marking, (v) Left and right periphery: complementisers, questions, extractions etc., and (vi) Binding and co-reference. NALS is formally organised as a journal, and has continued afterwards to publish papers that map variation across the North Germanic dialect continuum.
In this paper we will discuss two syntactic phenomena on the basis of data available from the dual NDC/NSD research infrastructure: (i) the relative placement of adverbs and infinitive markers (±split infinitive), and (ii) non-V2 in matrix wh-questions across Norwegian dialects. For each of these we discuss the possibilities, limitations and achievements of the research infrastructure, as promised by the title of the paper. Section 2 presents the two types of research infrastructure. Section 3 presents the investigation of placement of adverbs and infinitives, and also illustrates how the corpus and database can [...]

Glossa: a journal of general linguistics

[...] above) only included material from rural and semi-rural locations, and hence no Swedish (and Finnish) cities are represented in the infrastructure. Since geographical distribution was prioritised, the locations were not balanced for population size, hence there are both "small" and "big" dialects in the sample.
The fieldworkers were a mix of senior and junior researchers and student assistants who travelled to the locations of the informants to carry out the data collection. In most cases two fieldworkers would do the data sampling together. In some cases the fieldworkers spoke a dialect similar to that of the informants, but in other cases not.
The test sentences in the questionnaire were pre-recorded by a speaker of the same regional variety so that the pronunciation would be the same as or similar to that of the informants. This was done to avoid the sentences being deemed unacceptable because of pronunciation. The informants were instructed to judge the sentences on a Likert scale from 1 (bad) to 5 (good) according to their own dialect intuitions, and they would give their judgments after hearing each pre-recorded sentence. These questionnaire sessions lasted about one to one and a half hours. The recording sessions consisted of an interview of about 15-20 minutes with one of the fieldworkers and a conversation with another informant of about 30-45 minutes. Sampling data from four informants at one location normally took a full work day.
The informants were typically recruited through a local contact person according to a set of criteria targeting traditional dialect speakers. Beyond age and gender, no sociolinguistic information about the informants was recorded. The informants were not paid, but were given a symbolic gift as a token of gratitude, and they were also served coffee, tea and soft drinks as well as fruit and (non-crunchy) candy at the sessions. For further details about the project logistics, including methodologies, see Vangsnes (2007a;b), Johannessen et al. (2008), Lindstad et al. (2009), Johannessen et al. (2014).
The papers in the NALS Journal, which typically exploit both the NDC (corpus) and the NSD (judgment database), show that the dialect infrastructure developed under the ScanDiaSyn umbrella does indeed allow researchers to investigate new isoglosses and dialect phenomena across the Nordic countries. The two case studies to be presented in this paper should serve to make the same point.

Case study 1: Relative placement of infinitival marker and negation (±split infinitives)
The differing relative placement of infinitival markers and adverbs across the Scandinavian written languages is a well-known issue (see for example Hulthén 1947; Faarlund et al. 1997). The received wisdom is that Danish requires the adverb (especially the negative adverb) to be placed before the infinitival marker, as in (1), whereas Swedish requires the adverb to follow the infinitival marker, as in (2), and Norwegian is supposed to accept both orders. The "Danish pattern" is what often is referred to as "unsplit infinitives" since the adverb does not split the infinitival marker from the verb, and conversely the "Swedish pattern" is generally referred to as "split infinitives" precisely since the adverb separates the infinitival marker from the verb. A third pattern, given in (3), is the standard word order in Icelandic infinitivals (with an infinitival marker), and this pattern was also tested for the mainland languages (and Faroese). The sentences are all given here with Bokmål Norwegian words and orthography.

(1) Kjell hadde lenge prøvd ikke å komme for sent på jobb.
    kjell had long tried not to come too late at work
    'Kjell had for a long time tried not to come too late for work.'

(2) Kjell hadde lenge prøvd å ikke komme for sent på jobb.

Vangsnes and Johannessen: The Nordic research infrastructure for syntactic variation Art. 26, page 4 of 23
The issue of "split infinitive" is well-known also from English grammar and is a source of controversy where prescriptivists favour the use of unsplit over split infinitives (see Huddleston & Pullum 2002: 581f, for references). This same prescriptivism can be witnessed also in the context of Norwegian as unsplit infinitives have traditionally been the recommended word order. Furthermore, the Norwegian reference grammar (Faarlund et al. 1997: 997) claims that while both orders are possible in Norwegian, the unsplit pattern is the most natural for many language users. We will see below that the NSD database does not support this claim. Figure 1 is a screenshot from the NSD of sentence (1) as it was presented to the informants, and we see that the Swedish test sentence is worded differently but nevertheless probes the same word order as its Norwegian and Danish counterparts. Similar adjustments across the language-specific questionnaires were made in several cases, but each "bundle of test sentences" probing a specific phenomenon is always given a unique identity in the database, in this case the number 143.
Searching the database gives a long list with all the answers of all the informants, divided over several results pages. Each informant would grade this sentence (and all the others presented to them) on a scale from 1 to 5. These results are rendered as colour codes in the database, to enable the researcher to assess the results at a glance, see Figure 2.
Even more visually illustrative are the maps that can be generated from the result page. The results from the sentence evaluations performed across the three countries Denmark, Norway and Sweden show very clear and different results, see Maps 1, 2 and 3. A white marker means that a sentence has a mean score of 4 or higher at that geographical location, whereas a black marker means that it has a mean score of 2 or lower. In other words, a white marker indicates acceptability, a black one non-acceptability.
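The colour coding of the map markers can be illustrated with a small sketch. This is not the NSD implementation: the function name, the data layout and the example judgments are hypothetical, and the label "grey" for mean scores between 2 and 4 is an assumption. Only the thresholds for white and black markers are taken from the description above.

```python
from statistics import mean

def marker_colour(scores):
    """Map one location's Likert judgments (1-5) to a marker colour."""
    m = mean(scores)
    if m >= 4:
        return "white"  # mean of 4 or higher: accepted
    if m <= 2:
        return "black"  # mean of 2 or lower: not accepted
    return "grey"       # in between (label assumed for this sketch)

# Hypothetical judgments from four informants at three locations
judgments = {
    "location_A": [5, 4, 5, 4],
    "location_B": [1, 2, 1, 2],
    "location_C": [3, 4, 2, 3],
}
colours = {loc: marker_colour(s) for loc, s in judgments.items()}
print(colours)
```

Averaging per location is what allows a single marker to summarise several informants, at the cost of hiding disagreement between individuals at the same location.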
There is a clear acceptance of the unsplit, "Danish pattern" in Denmark. Furthermore, in Sweden the unsplit pattern is quite clearly not accepted, but more strikingly, at the great majority of Norwegian locations the pattern is also dismissed. The split, "Swedish pattern", on the other hand, is accepted not just in Sweden, but also in most of Norway, although there is an enclave in the central part of the country (the Trøndelag area) where the split pattern seems to be rejected at several locations. In turn, as Map 3 makes evident, the "Icelandic pattern" with the verb preceding negation is not accepted by anyone in the mainland countries (and not in the Faroe Islands either).

Map 1: Unsplit infinitive: neg - C - Vinf.

A striking result from the database data, evident from the maps, is that the "Danish pattern" is hardly accepted at all by the informants from the Norwegian measure points. Only at ten locations in Norway have the informants accepted it, and these are spread quite evenly across the country.
The general picture thus is very clear. Norwegian dialects generally follow the "Swedish pattern", with negation between the infinitival marker and the infinitive. The "Danish pattern" is rejected in most cases, and there is only one place in which the "Danish pattern" gets a higher score than the Swedish one. The claim by Faarlund et al. (1997: 997) that the "Danish pattern" is more natural for many Norwegian language users is therefore severely weakened by the data in the NSD.
A note on the data from Denmark is in order. First, at one location in the North of Jutland ("Vendsyssel") both the split and the unsplit patterns are accepted. Pedersen (2017: 44ff) has looked more closely at these data, and she finds that four informants at this location accept the test sentence. Furthermore, she also points out that there is at least one informant at all of the other locations in Jutland that accepts the sentence, and at the measure point Eastern Jutland ("Østjylland") three informants do so. Second, Pedersen (op. cit.) shows that the existence of the "Swedish pattern" has been mentioned and documented in the dialectological literature also for the insular parts of Denmark. In other words, even in Danish dialects, the "Danish pattern" does not seem to be as obligatory as one might think.

On the basis of these considerations, it is worth investigating to what extent data from the corpus of spontaneous speech corroborate the results from the database. Defining a search for the "Danish pattern" is, however, not trivial since a negation preceding the infinitival marker may belong to the matrix predicate rather than to the infinitival clause: the example in (4) is ambiguous between a high and low attachment for the negation. This is not the case with the "Swedish pattern", see (5), repeated from (2).
(4) Kjell tries not to come too late at work
    i. 'Kjell does not try to come too late for work.'
    ii. 'Kjell tries not to come too late for work.'

(5) Kjell hadde lenge prøvd å ikke komme for sent på jobb.
    Kjell had long tried to not come too late at work
    'Kjell had for a long time tried not to come too late for work.'

A search for the string [negation]+[infinitival marker], tailored to the "Danish pattern", is formulated as in Figure 3. The search specifies that the first word should not be a verb in the past or present tense (to try to avoid the ambiguous pattern exemplified in (4)), followed by the negation word and the infinitival marker.
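The logic of this three-word search can be sketched over a list of tagged tokens. This is only an illustration and does not reproduce the NDC search interface or its query syntax: the token format, the tag names ("verb", "pres", "past", "ptcp") and the small word lists are assumptions made for the example, and the second test sentence is constructed.

```python
# Hypothetical token format: (word form, part of speech, tense or None)
NEGATION = {"ikke", "ikkje", "itt"}  # illustrative, not exhaustive
INF_MARKER = {"å"}

def find_unsplit(tokens):
    """Return the indices of words that are not finite (past/present) verbs
    and are directly followed by a negation and the infinitival marker."""
    hits = []
    for i in range(len(tokens) - 2):
        _, pos, tense = tokens[i]
        is_finite_verb = pos == "verb" and tense in {"pres", "past"}
        if (not is_finite_verb
                and tokens[i + 1][0] in NEGATION
                and tokens[i + 2][0] in INF_MARKER):
            hits.append(i)
    return hits

# Sentence (1): the participle 'prøvd' precedes 'ikke å', so it is a hit
sentence_1 = [("kjell", "noun", None), ("hadde", "verb", "past"),
              ("lenge", "adv", None), ("prøvd", "verb", "ptcp"),
              ("ikke", "adv", None), ("å", "inf", None),
              ("komme", "verb", None)]

# Constructed example: a present-tense verb before the negation is skipped,
# since the negation could belong to the matrix clause
constructed = [("kjell", "noun", None), ("prøver", "verb", "pres"),
               ("ikke", "adv", None), ("å", "inf", None),
               ("komme", "verb", None)]

print(find_unsplit(sentence_1), find_unsplit(constructed))
```

Excluding finite verbs in first position trades recall for precision: some genuine unsplit infinitives after a finite verb will be missed, but the remaining hits need less manual disambiguation.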
This search, when limited to just the Norwegian part of the corpus, gives 13 relevant results. All of them turn out to involve the idiomatic phrase for ikke å 'in order not to/to not even', i.e. only with this preposition and only in this meaning. An example is given in (6).

Map 3: The "Icelandic pattern": C - Vinf - neg.
(6) for itt å snakk om syklinga her
    for not to speak about cycling.def here
    'to not even talk about the cycling around here' (bjugn_19)

Other than these 13, there are zero hits for unsplit infinitives (the "Danish pattern") in Norwegian dialects.
The search for the split "Swedish pattern", on the other hand, gives 60 hits from 41 different locations across all of Norway. All of the hits are relevant, and an example is given in (7).

(7) Han sku laer oss å ikkje ver redd uveret.
    he should teach us to not be afraid storm.def
    'He was going to teach us not to be afraid of the storm.' (stamsund_03gm)

The NDC has a map function which allows the user to generate a map to show which locations the hits are from, and Map 4 shows the distribution of the 60 split infinitives found in Norwegian dialects.
There are some important differences between the maps generated by the NSD, Maps 1-3, and the NDC, Map 4. The former only generates maps on the basis of sentence evaluations, while the latter generates hits from spontaneously produced speech. This means that while the locations in Map 4 show where the split infinitive (the "Swedish pattern") has been attested, we cannot draw the conclusion that the missing points on the map are places where it could not occur. We already know that the unsplit infinitive (the "Danish pattern") has only been attested in one sub-construction out of all the logically possible ones, and it is therefore not a pattern that would account for the unmarked places on the map. When a corpus does not attest a certain usage, it may be because the informants simply did not use that construction during the recorded conversation session.
The maps illustrate clearly why language research benefits from both a database and a corpus kind of infrastructure. The database data from the NSD has many more hits than the corpus data from NDC. However, a database based on evaluations of sentences can only answer questions that have been asked, and databases will therefore contain only a subset of variations of a construction. A corpus, on the other hand, where informants speak freely, will exemplify many different constructions, even ones that the researcher has not thought of beforehand. At the same time, it is to some extent arbitrary what constructions conversation partners use in recordings. This is exemplified in Map 4, which has far fewer locations than Maps 1-3. What this investigation shows is that when the researcher is fortunate enough to have a database of evaluations for a particular structure, there will be hits for all the locations investigated. A corpus does not necessarily cover all locations if a construction is not among the most common ones. Still, the corpus can be used to check whether informants in the database have given answers that are compatible with their own language production. Our investigation shows that in this particular case they have. The placement of the adverb with respect to the infinitival marker in Norwegian turns out to follow the Swedish pattern, illustrated in Maps 1-3, and the production data from the corpus show the same, even more convincingly, in Map 4. And the whole infrastructure together shows that the claim made in the Norwegian reference grammar (Faarlund et al. 1997) is not supported by our empirical investigations.
We now turn to look at a different and far more complex issue, namely the lack of Verb Second in matrix wh-questions in Norwegian dialects.

Previous knowledge
The traditional portrayal of Norwegian and the North Germanic languages in general is that they are well-behaved Verb Second languages, i.e. with the finite verb in a fronted Wackernagel position in matrix clauses, always occurring before the subject whenever a non-subject introduces the clause. This is exemplified by the declarative clause in (8), and the comparison with the idiomatic English translation serves to make the point.
(8) a. I morgon skal studentane ta eksamen.
       tomorrow will students.def take exam
       'Tomorrow the students will take the exam.'
    b. *I morgon studentane skal ta eksamen

In Lohndal et al. (forthcoming) an overview of exceptions to the V2 requirement in Norwegian is given. The exceptions are more numerous than one tends to acknowledge in general descriptions of the language. In short, the main message in the paper is that Verb Second in Norwegian cannot be an effect of a single macro-parameter, but is rather due to several minor rules which conspire to give the impression that V2 is almost omnipresent (cf. also Weerman 1989).
(9) a. Tromsø dialect (Iversen 1918 [...]

At the outset of the ScanDiaSyn network it was already well established that this lack of Verb Second is widespread across Norwegian dialects. The only area for which no instance of the phenomenon had been reported seemed to be Central Eastern Norway around the capital Oslo and the adjacent coastal areas to the south. It had also been established that there is considerable variation across the dialects. The first four points below are basic and shared characteristics of the phenomenon.
a. The finite verb is in a position to the right of sentence adverbs (i.e. has not moved to C).
b. In subject wh-questions the complementiser som appears in second position (and the finite verb in a position to the right of sentence adverbs).
c. All dialects that allow non-V2 in wh-questions also allow V2: the choice of ±V2 appears to be governed by information structure in that V2 is preferred when the subject is given information and non-V2 when the subject is new information.
d. Dialects that allow non-V2 in matrix wh-questions also allow the complementiser som before the trace position of an extracted wh-subject, hence seemingly violating a COMP trace effect.
Points a. and b. entail that there is a strong parallelism between matrix wh-questions with non-V2 and embedded wh-questions: in embedded wh-questions the finite verb also appears to the right of a sentence adverb and the appearance of the complementiser som is obligatory in subject questions.
    who som actually decides who decides actually
    'I wonder who actually decides.'

The complementiser som has no one-to-one English equivalent: it also appears in relative clauses and clefts (corresponding to that) and in small clauses and comparatives (corresponding to as) (see Stroh-Wollin 2002; Vangsnes 2004: 22f for details).
The insight in point c. is due to Westergaard (2003;, who studied V2 vs. non-V2 quantitatively in a corpus of the Tromsø dialect. Point d. can be credited to Nordgård (1985) and is illustrated by the example in (12).

(12) Kem du trur som ___ e i baren?
     who you think som is in bar.def
     'Who do you think is in the bar?'

The following four points concern issues of variation across the dialects that allow non-V2 in matrix wh-questions in the first place.
e. Many dialects allow non-V2 only with the short, monosyllabic items kven 'who', kva 'what' and kor 'where'.
f. Some dialects also allow complex wh-items and wh-phrases with non-V2.
g. Some dialects allow non-V2 only in subject wh-questions, i.e. with som in second position.
h. Some dialects allow complex wh-subjects but only simple non-subject wh-constituents, hence make a ±subject distinction with regard to the complexity of the wh-constituent.
Point e. was noted for the Tromsø city dialect already by Iversen (1918: 37), and stated more broadly as a trait of Northern Norwegian dialects by Elstad (1982). Point f. was shown by Nordgård (1985) and Åfarli (1986) for Northwestern Norwegian dialects. Point g. is due to Lie (1992: 66) who noted that some Western Norwegian dialects seem to only allow non-V2 with wh-subjects and insertion of som. Point h. can be attributed to Fiva (1995) who reported that in a survey of the Tromsø dialect many informants found complex wh-subjects acceptable followed by som but would still only accept short wh-constituents in non-subject questions with non-V2.

The questionnaire data
These various bits of knowledge informed the development of test sentences for the questionnaire to be used in the project. Since the questionnaire was to probe a long list of different constructions, some corners inevitably had to be cut. One thing that had to be left out was checking whether the informants allowed both V2 and non-V2 in matrix wh-questions, and along with that, to what extent the choice was governed by information structure. In the Norwegian version of the questionnaire, which at the outset had about 130 sentences, we ended up with the following four sentences regarding non-V2 in matrix wh-questions. (13) has a short wh-predicative, (14) has a complex wh-adverb, (15) has a short wh-subject and (16) has a complex wh-subject, and in sum these sentences should serve to probe the issue of short versus complex wh-constituents, the ±subject condition and, furthermore, whether subject and non-subject questions differ with respect to allowing complex wh-constituents with non-V2. However, the sentences would not serve to detect whether there would be a difference between arguments and non-arguments, or if different wh-adverbs would give different results. After the data collection had begun, an additional test sentence with a different wh-adverb was added to the questionnaire in order to possibly obtain more information about other relevant factors.

(17) Koffer han va så sur, egentli?
     what-for he was so grumpy actually
     'Why was he so grumpy?' ('What was the actual reason for his grumpiness?')

The database results for the four initial non-V2 wh-questions have recently been presented in Westergaard et al. (2012;. The results by and large confirm the findings reported in the earlier literature, including statements e. to h. above, but now on the basis of a much more comprehensive and systematic data collection (four informants from 107 locations spread out across Norway). Four types of dialects allowing non-V2 emerge from the NSD data:
A. Complex wh-constituents are allowed with non-V2 in both subject and non-subject questions (i.e. all four test sentences).
B. Only short wh-constituents are allowed with non-V2 in subject and non-subject questions alike (i.e. (13) and (15)).
C. Complex wh-subjects are allowed with non-V2, but only short wh-constituents are allowed in non-subject questions (i.e. all but (14) are accepted).
D. Only wh-subjects are allowed with non-V2 (i.e. (15) and (16) are accepted).
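The four dialect types amount to a mapping from the set of accepted test sentences to a label, which can be sketched as follows. This is a hypothetical illustration, not the procedure used by the authors: the function name is invented, the step from Likert scores to "accepted" is left out, and the labels "*" (no sentence accepted) and "?" (any other combination) are fallback labels assumed for this sketch.

```python
def dialect_type(accepted):
    """Classify a location from the set of accepted non-V2 test sentences.

    Sentence numbers follow the text: (13) short non-subject,
    (14) complex non-subject, (15) short subject, (16) complex subject.
    """
    patterns = {
        frozenset({13, 14, 15, 16}): "A",  # complex allowed everywhere
        frozenset({13, 15}): "B",          # only short wh-constituents
        frozenset({13, 15, 16}): "C",      # all but (14)
        frozenset({15, 16}): "D",          # only wh-subjects
        frozenset(): "*",                  # non-V2 not accepted (assumed label)
    }
    return patterns.get(frozenset(accepted), "?")  # no clear pattern

print(dialect_type({13, 15, 16}))  # type C
```

Treating every unlisted combination as "?" keeps the classification conservative: a location is only assigned to a type when its acceptance pattern matches one of the four definitions exactly.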
The distribution of these dialect types is given in Map 5 which is also published in Westergaard et al. (2017: 26).
In the area labeled with '*' non-V2 is dismissed by the informants consulted, and for the area labeled with '?' no clear pattern emerges as far as the authors can see (see also Westendorp 2017; 2018, for an assessment of the data from this area).

Vangsnes & Westergaard (2014) present data regarding the phenomenon based on searches in the Nordic Dialect Corpus (NDC). The searches were optimised to target matrix wh-questions, and a gross total of 2273 hits was trimmed down to 1332 relevant ones after fragments, exclamatives and embedded clauses had been sorted out.

The spontaneous speech data
The distribution of the remaining relevant examples over different wh-items and phrases was as given in Table 1. [...] a middle position with about every fourth instance having non-V2. Third, for the other adverbial wh-items and the wh-phrases, very few appear with non-V2.
Concerning the first observation, the distribution of V2 versus non-V2 varies across different parts of the country. Vangsnes & Westergaard (2014: 143) show that for the three short wh-items 'what', 'who', and 'where' non-V2 is far more frequent than V2 in Northern Norwegian, and that the picture gradually shifts to the opposite when one moves through Central Norwegian and Western Norwegian to Eastern Norwegian. The figures they provide can be summarised as in Table 2. On this issue the corpus data complement the questionnaire data in NSD as the latter only provide information about the acceptance of non-V2: the informants were never asked to judge matrix wh-questions with V2. Although we therefore do not know the relative preference of V2 versus non-V2, the production data from the corpus suggest that non-V2 is the unmarked option in Northern Norwegian dialects for the three short items 'what', 'who' and 'where' and that there is a gradual shift in preference as we move south.
Concerning the second observation, there exist both short and long variants for 'when' in Norwegian dialects. Some dialects use the monosyllabic variant når, which is the one used in the standard varieties, but the complex hva tid, literally 'what time' is widespread, and the variants når tid 'when time' and hvor tid 'where time' are also found. We should therefore consider what variants are used in the 21 instances of 'when'-questions with non-V2. In this case we are also in the fortunate situation that one of the four wh-questions probing non-V2 in NSD is a 'when'-clause (see above), thus allowing us to compare production and judgments directly at the level of the individual.
In Table 3 each informant is listed with whichever 'when'-variant they used and how they judged the NSD 'when'-question (#33 in the questionnaire). As we see, there are five instances with the short, monosyllabic item når in non-V2 matrix wh-questions, produced by five different informants from three different locations in Central Norway. The informants' judgments of the NSD test sentence vary, but crucially the complex variant hva tid (adjusted for local pronunciation) was used during data collection also in this area, and the lesser acceptance of the test sentence may be due to the fact that the wh-expression used in the test is not the short variant the informants spontaneously use themselves. There are, furthermore, 14 examples produced by three informants from two locations in Northwestern Norway, all of whom give the NSD test sentence the highest score. This is an area known for allowing complex wh-items with non-V2, and also here the production and judgment data are in harmony. The two remaining examples both involve complex wh-expressions. They are uttered by two informants from two places in Southwestern Norway: Hjelmeland and Bergen. The Hjelmeland informant gives the test sentence a high score, whereas the Bergen informant gives it a low score.
Of the 21 'when'-clauses there is therefore only one case where there is a clear incompatibility between production and judgment. The seemingly intermediate position of 'when'-clauses is thus partly due to the fact that some of them involve a short, monosyllabic variant of the wh-expression and partly to the fact that only three informants produced most of the non-V2 cases (14 of 21).
The simple~complex issue also plays a role for the manner 'how'-questions with non-V2 found in the corpus. The form of manner 'how' varies to a considerable extent across Norwegian dialects (see Vangsnes 2008), and Vangsnes & Westergaard (2014: 145) report that eight of the nine examples of non-V2 involve monosyllabic variants (korr, koss, høss). Only the ninth example has a disyllabic variant (kelles).
In other words, the number of non-V2 questions with complex wh-expressions is even lower than it seems at first sight in Table 3. The single 'why'-clause involves a disyllabic wh-item (as is always the case in Norwegian dialects), and if we put together the 'when' (21), '(manner) how' (8), 'why' (1), and wh-phrases (9), which amount to 39, and subtract the ones with monosyllabic wh-expressions (5 when and 7 how), we are left with 27 non-V2 matrix questions with a complex wh-expression out of a total of 539, in other words 5%. Map 6 indicates the locations of all of the 27 examples, including the six mismatches (see Vangsnes & Westergaard 2014: 145ff, for details).
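The arithmetic behind the 5% figure can be retraced in a few lines. The counts are the ones given in the paragraph above; the variable names are only illustrative.

```python
# Non-V2 matrix questions per wh-category, as given in the text
non_v2_counts = {"when": 21, "how (manner)": 8, "why": 1, "wh-phrases": 9}
# Of these, the ones with a monosyllabic wh-expression
monosyllabic = {"when": 5, "how (manner)": 7}
# Total number of matrix questions in these categories (V2 and non-V2)
total_questions = 539

all_non_v2 = sum(non_v2_counts.values())
complex_non_v2 = all_non_v2 - sum(monosyllabic.values())
share = complex_non_v2 / total_questions

print(all_non_v2, complex_non_v2, f"{share:.1%}")
```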
19 of the 27 examples are from four locations in Northwestern Norway, indicated by the light blue markers. Three of the cases are from three locations in Southwestern Norway, indicated by purple markers. Both of these areas are roughly the ones indicated by the letter A in Map 5 above, hence where the NSD data suggests that informants by and large accept complex wh-phrases in matrix non-V2 questions.
The yellow icon marks the single example from a location in the northwest corner of the Eastern Norwegian dialect area, more specifically from the place Lom, which by the NSD data is part of the northwestern A-area: both of the test sentences with a complex wh-phrase (wh-subject and when) receive a high score at this location. The remaining three cases run counter to the data in NSD. Two of them are uttered by informants at two locations in Central Norway, Oppdal and Røros, indicated by green markers, and at both locations the relevant test sentences receive a low score both in general and by the two specific individuals who uttered the corpus sentences in particular. The same holds for the final example from Bergen in Western Norway, indicated by a dark blue marker. Accordingly, these three examples constitute noise in the data that it would be worth following up in future studies of the topic.
Despite this slight discrepancy (3 out of 27 cases) and despite the low total number of complex wh-questions with non-V2, the overall picture we are left with when scrutinising the corpus data in the Nordic Dialect Corpus versus the judgment data in the Nordic Syntax Database is that there is a very good match between the two sources. Attempts at analysing these data from a more formal, generative perspective can be found in Westergaard et al. (2017). (See also Rognes 2011, for a study focusing on one particular dialect area; and Vangsnes 2005;Westergaard & Vangsnes 2005; for slightly older theoretical approaches.)

Subject wh-extraction and Nordgård's Generalisation
In the background section above we noted that Nordgård (1985) established a correlation between dialects that allow non-V2 in matrix wh-questions and dialects that allow the insertion of the complementiser som before the trace of an extracted wh-subject. The questionnaire based data in the Nordic Syntax Database offers information on this issue.
As detailed in Bentzen (2014), eight test sentences for wh-extraction were also included in the questionnaire, probing both subject and object extraction, with and without the presence of either of the complementisers som and at 'that' and also with a resumptive subject pronoun. The following three sentences tap into Nordgård's (1985) generalisation and more broadly the so-called that-trace effect or COMP trace effect (see Pesetsky 2016 for an overview).
(18) a. Hvem tror du har gjort det?
        who think you has done it
     b. Hvem tror du som har gjort det?
        who think you som has done it
     c. Hvem tror du at har gjort det?
        who think you that has done it
     All: 'Who do you think has done it?'

The following three maps (Map 7) detail the acceptance of the three sentences in Central, Western and Eastern Norway, with white markers indicating high average scores, grey markers medium average scores, and black markers low average scores (cf. section 2). Map 7a (the leftmost one) clearly indicates that the sentence with no overt complementiser is accepted as good by everyone. Furthermore, the map in the middle shows that som-insertion mostly gets a high or medium score in Western and Central Norway but is largely rejected in Eastern Norway. The sentence with at-insertion is in contrast only fully accepted at some measure points in Eastern Norway with some medium scores further to the North in Central Norway.
The data visualised here partly support Nordgård's (1985) generalisation insofar as sentence (18b) is only accepted in those parts of the country where non-V2 is accepted. At the same time, it is also clear that far from all speakers who allow non-V2 allow som-insertion with extraction of a wh-subject. Still, the preference for no complementiser before a subject trace position over versions with an overt complementiser is a quite well-known fact from previous studies of Germanic languages, and it is also documented for object extraction (see Cowart 1997; Hawkins 2004; Bentzen 2014; Schippers 2017). On the presumption that the informants in the NSD survey were able to contrast the cases with and without complementiser when consulted, the overall lower acceptance for the COMP trace sentences should therefore come as no great surprise. Furthermore, although the complementarity between som-insertion and at-insertion is not perfect given the many medium score locations in Central Norway in particular, Map 8 shows that when we just compare measure points with a high score, the complementarity is quite clear: the grey dots mark locations with a high score for som-insertion and the blue ones a high score for at-insertion.
This map also shows that som-insertion is widely accepted in Northern Norway and that at-insertion is widely accepted in Finland Swedish.
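The comparison behind Map 8 can be illustrated with a minimal sketch. It assumes that each measure point has an average score for som-insertion and one for at-insertion on a numeric scale, and that a "high" score is defined by a cut-off. The location names, the scores and the cut-off value below are all invented for illustration; they are not NSD data, and the function name classify_locations is hypothetical.

```python
# Illustrative sketch only: locations, scores and the threshold are invented
# and do not come from the Nordic Syntax Database.
HIGH_THRESHOLD = 4.0  # assumed cut-off for a "high" average score


def classify_locations(scores, threshold=HIGH_THRESHOLD):
    """Group locations by which complementiser variant gets a high score.

    `scores` maps a location name to a dict with the average judgment
    for som-insertion ("som") and at-insertion ("at").
    """
    groups = {"som_only": [], "at_only": [], "both": [], "neither": []}
    for location, avg in sorted(scores.items()):
        som_high = avg["som"] >= threshold
        at_high = avg["at"] >= threshold
        if som_high and at_high:
            groups["both"].append(location)
        elif som_high:
            groups["som_only"].append(location)
        elif at_high:
            groups["at_only"].append(location)
        else:
            groups["neither"].append(location)
    return groups


# Hypothetical toy input: three made-up measure points.
toy_scores = {
    "LocA": {"som": 4.5, "at": 1.5},
    "LocB": {"som": 2.0, "at": 4.2},
    "LocC": {"som": 3.0, "at": 3.0},
}

groups = classify_locations(toy_scores)
print(groups)
```

On this view, complementarity among high-score locations corresponds to an empty "both" group, while medium-score locations such as the made-up LocC fall outside the comparison altogether.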

Discussion
This exposition of how the issue of non-V2 in Norwegian matrix wh-questions was researched in the ScanDiaSyn project should have revealed some significant achievements and also some limitations. In many ways the data confirm what had already been established if one pieces together information from various sources in the existing literature, but they now establish this in a much more systematic and complete way. Furthermore, the questionnaire and corpus data by and large confirm each other and thus strengthen the empirical basis.
One very clear limitation with the questionnaire data is that the number of test sentences is low. Relevant additional variables may not have been detected because of this. In particular, given the scarcity of non-V2 questions with complex wh-constituents in the corpus compared to the abundance of non-V2 with short wh-constituents, it would have been desirable to test a broader range of complex wh-constituents so as to compare different kinds of wh-adverbials, or wh-adverbials versus complex wh-arguments.
Furthermore, as pointed out above, there are quite clearly some cases of discrepancy between production and judgment data. The positive side of this is that such discrepancies identify areas and/or locations that need to be studied more carefully, and the northern part of Eastern Norway, marked as "?" in Map 5, in particular stands out as an area with an unclear pattern.
In this paper we have done little to put the data to statistical scrutiny. With data from over 100 locations and almost 400 individuals on a number of test sentences, the possibilities for doing so are certainly there, and one study which has approached the phenomenon in a systematic way by employing statistical methods is Westendorp (2017; 2018). Without going into details, she argues that the statistics do not support all of the diachronic speculations put forth in Westergaard et al. (2017) as to how Norwegian dialects, as the only ones across North Germanic, have developed this particular violation of Verb Second. Westendorp's statistical findings do however support the general idea that the phenomenon started with short wh-constituents and later spread to questions with complex wh-expressions.

Conclusion
In this paper we hope to have demonstrated the assets of having access to both a database of syntactic judgments and a searchable corpus of free speech when researching topics in dialect syntax. In the case of the two Nordic dialect infrastructures, the Nordic Syntax Database and the Nordic Dialect Corpus, the data have been collected from a well-distributed set of locations, and crucially both kinds of data have been collected from largely the same set of informants. The well-known shortcoming of a corpus, that particular constructions or variables may be scarce, is counterbalanced by the way a questionnaire can ensure information from all participants on the selected constructions/variables. On the other hand, the closed nature of a questionnaire, where topics must be decided beforehand, can be contrasted with the more dynamic nature of a corpus, in which data one had not thought of in advance may occur. Moreover, data from the two sources (judgment versus production) for the same individuals may confirm each other, but they may also be contradictory. The latter kind of situation may help to identify issues that need to be further investigated.
For the concrete topics that we have chosen to base our exposition on, we have seen that the data from dialect infrastructures may represent both a correction of received knowledge and a strengthened confirmation of existing knowledge. In the case of split infinitives both the judgment data and the corpus data clearly show that in spoken Norwegian placing a sentence adverb between the infinitival marker and the infinitive (split version) is clearly preferred to, and much more frequently used than, placing it before the infinitival marker (unsplit version). This runs counter to the received knowledge that Norwegian allows both structures and in fact prefers the unsplit version: the data show that in Norwegian dialects the split version is both preferred and most commonly used, placing these dialects more in line with Swedish than with Danish.
In the case of matrix wh-questions with non-V2 word order in Norwegian, the judgment data serve to confirm the rather complex pattern of variation that can be pieced together on the basis of the existing literature going back to the early 20th century. Furthermore, the corpus data serve to complement the questionnaire data insofar as the abundance of non-V2 wh-questions in all dialect regions but Eastern Norwegian confirms that the phenomenon is widespread, and we also see an increase in the use of it as we move northwards through the country. However, the abundance of examples applies just to questions with short wh-constituents: only 5% of the non-V2 wh-questions in the corpus contain complex wh-constituents, and they are also very few compared to V2 wh-questions with the same kind of constituents. To some extent this finding squares with the insight that complex non-V2 wh-questions are judged acceptable in two rather restricted areas (northwestern dialects and southwestern dialects), and almost all of the cases are indeed produced by speakers from these areas. But even within these restricted areas the complex non-V2 cases are fewer than complex V2 cases, and this entails that weight or complexity plays a role in production even in these dialects.
One clear shortcoming of the questionnaire data regarding the issue of wh-questions and V2 is that the number of test sentences was low. This is a direct effect of the topic being part of a general questionnaire that probed a wide range of topics. When administering data collection by questionnaires there is a limit to how long one can keep the informants' attention and willingness to respond. Since the questionnaire and production data were collected at the same time by fieldworkers visiting the various locations, this is a limitation that is hard to get around unless one sets up ways of getting back to each individual informant at later points in time. In turn, administering a system for that is laborious and resource-demanding and was not viable in the case of the ScanDiaSyn project.
In any event, it seems quite clear that the establishment of the Nordic research infrastructure for syntactic variation has moved the field of North Germanic dialect syntax several steps forward, and although others may be in a better position to judge it objectively, we also believe that the output of the research collaboration has served to revitalise the field of Nordic dialectology.

Funding Information
The infrastructure was built with support from national and Nordic funding bodies: The Research Council of Norway, The Swedish Research Council, The Danish Research Council for Culture and Communication, and The Icelandic Research Fund, as well as NordForsk and NOS-HS, which also provided important funding for network activities.