Culture, norms, and the provision of training by employers: Evidence from the Swiss language border ✩

Apprenticeships are the core track of the Swiss educational system at the upper-secondary level, made possible by the fact that many Swiss firms voluntarily provide appropriate training positions. However, firms' training provision differs substantively between the language-cultural regions within Switzerland. This feature of the Swiss apprenticeship system is hard to explain using conventional explanations of firm-provided training. In this paper, we argue that there are cultural differences in the norms favoring private over state provision of goods, which influence firms' provision of training positions. Exploiting national referenda, we first show that, within a narrow band around the language border, voters in German-speaking municipalities value private over public provision of certain goods more than their French-speaking counterparts. We then document a higher share of training firms on the German-speaking side of the language border of 4.4 percentage points, or roughly 13%. This estimate is robust across different sets of controls, alternative specifications, and various subsamples. Our results suggest an interplay between regional norms and local firms' training behavior.


Introduction
Private firms provide and finance apprenticeship places in large numbers on a voluntary basis in many countries, without or with little financial support from the state. Although there are economic explanations for this behavior, existing differences in the provision of apprenticeships across and within states call for additional, complementary explanations. In this paper, we study the case of Switzerland and argue that culturally determined regional norms are important for Swiss firms' propensity to train apprentices.
Swiss firms operate in different regional cultures and are confronted with different norms. Apprenticeships are clearly most prevalent in the German language region. They are still important in the French and Italian language regions, but to a substantially lesser extent. This feature of the Swiss apprenticeship system is hard to explain using conventional explanations, because the institution of apprenticeship training is regulated at the federal level and economic shocks do not stop at language borders. Instead, we argue that the population in the German language region in Switzerland has a relatively stronger preference for the private, as opposed to public, provision of goods than the population in the French language region. Accordingly, people in the German language region have more pronounced expectations of firms to provide training places for their youth. Firms react to these expectations by training more apprentices, either because they conform to the local population's expectations for profit motives, or because firm owners and employees agree with the underlying values. Kuhn et al. (2019) show that Swiss firms are more prone to train apprentices if they are situated in municipalities whose population opposed new laws which would have accorded more responsibility to the state in providing apprenticeships. A major concern in all empirical analyses on the effects of values and norms on economic agents is reverse causality. It is a priori not clear whether norms influence firm behavior or whether firm behavior influences the values and expectations of the local population. However, controlled trials are infeasible because norms cannot be randomized. Therefore, we profit from the fact that reverse causality is less of a concern when analyzing the effect of stable, slowly changing norms that are deeply rooted in culture. Becker (1996) highlights that culture is mostly inherited and individuals do not have much control over their culture. In a similar vein, Guiso et al. (2006) recommend using cultural determinants that hardly change over individuals' lifetimes, e.g. ethnicity, as instrumental variables for their beliefs and preferences. We will follow this approach by using the dominant language spoken in the municipalities at the French-German language border as a cultural determinant of the local population's norms towards the public or private provision of goods, which in turn affects firms' provision of training. This approach and the firm population data we use are novel and completely distinct from those in Kuhn et al. (2019), who use data from a cost-benefit survey and analyze norms and training in all of Switzerland. Despite different methods and data, both papers conclude that local norms affect firm training.
✩ We thank Patrick Emmenegger, Michael Gerfin, Samuel Mühlemann, two anonymous referees as well as seminar participants in Bern and Zurich for helpful comments and suggestions. Financial support by the State Secretariat for Education, Research and Innovation (SERI) is gratefully acknowledged (contract number 1315000817).
* Corresponding author. E-mail address: manuel.aepli@ehb.swiss (M. Aepli).
Identifying the effects of norms by comparing two culturally diverse regions within Switzerland may be problematic because cantons 1 shape institutions in many policy areas. Therefore, we will focus on a language border that separates two cultural areas, but cuts through bilingual cantons in most cases. This circumstance allows us to compare the training behavior of firms that operate largely within the same political, institutional, and economic framework, but are embedded in different cultures. In a first step, we will show that norms towards the private or public provision of goods vary strongly between language regions. German language voters regularly oppose state interventions and value the private provision of goods, whereas French language voters tend to favor state engagement in the provision of the same goods. In a second step, we will show that firms' training provision differs at the language border as well. Our estimates imply a relative difference in the proportion of training firms of about 13%. This finding remains largely unaffected when controlling for canton fixed effects, firm and location characteristics, and the demand for apprenticeship training; moreover, it is robust throughout various specifications and subsample estimations.
Our contribution to the literature is twofold. Firstly, we contribute to the literature on firm-based training by investigating a novel mechanism favoring the provision of training. The dominating explanations why firms offer training are based on direct financial returns, which accrue from trainees' productive work combined with low pay ( Becker, 1962;Lindley, 1975 ), and from rents generated from retaining trained workers in monopsonistic labor markets ( Acemoglu and Pischke, 1998;1999;Stevens, 1994 ), among others. We propose a complementary mechanism based on local norms favoring private over state provision of goods, which puts pressure on firms to provide apprenticeships. Such a mechanism is also important for strengthening apprenticeship systems in other countries, a goal declared by the EU with its "Alliance for apprenticeships " and the US alike. 2 We argue that encompassing apprenticeship systems do not only rely on institutions that allow firms to train in a profitable way, but also on a favorable societal context that prompts firms to train and to collaborate in developing the whole system. Secondly, our paper contributes to the literature on the effects of culture and culturally embedded values and norms on agents' behavior. There are a large number of empirical studies analyzing effects of norms on individuals, but studies which focus on the effects of regional culture and norms on firm behavior are still scant. In a recent study, Erhardt and Haenni (2018) analyze the influence of culture on becoming an entrepreneur. Moreover, large firms' organization, that is their degree of decentralization, has been shown to vary with the level of trust in the region where the corporate headquarters are situated ( Bloom et al., 2012 ). In a similar vein, Bassanini et al. (2017) show that French firms react to social pressure and try to avoid firing employees in the region where their headquarters are situated. 
Gender wage gaps may be larger within firms in regions where prevailing gender norms are less egalitarian (Janssen et al., 2016). Our paper provides evidence that regional culture and norms also affect firms' training provision.
The rest of the paper is organized as follows. Section 2 provides some background information on the Swiss apprenticeship system and on the French- and German-speaking language-cultural regions. Section 3 presents the main data sources and discusses the construction of the key variables, and Section 4 sets out our empirical strategy. Section 5 discusses our main results, including several robustness checks, as well as several pieces of complementary evidence. Section 6 concludes.

Firm-provided apprenticeships
Firm-based apprenticeship training depends crucially on the willingness of mostly private firms to provide training positions. 3 About 70,000 firms train apprentices in federally regulated apprenticeship tracks every year in Switzerland. This ensures that 60 percent of a Swiss youth cohort are able to attend and complete firm-based apprenticeships, which makes it the most popular education track at the upper-secondary level (BFS, 2018). Firms offer apprenticeship positions voluntarily, and youngsters typically begin immediately after finishing compulsory school at around the age of 16. Firms and apprentices (or the parents of minors) sign an apprenticeship contract for two, three, or four years, depending on the apprenticeship occupation. Apprentices work three or four days per week within the training firm, spending the other one or two days at public vocational school.
The institution of apprenticeships enjoys high support and trust in the Swiss population, not least because of its centuries-old history dating back to the Middle Ages. An important turning point was the period at the end of the 19th century, when employer associations deplored the decreasing training quality in the old, unregulated model of apprenticeships with a master craftsman, at a time of increasing international competition and industrialization (Bauder, 2008). After intense political debate, a first federal law on vocational education and training came into force in 1930, which defined complementary roles for employers and state authorities: the state sets a regulatory framework and operates part-time vocational schools in order to increase the quality of training, but companies retain a lot of leeway and responsibility for operating the system and providing training (Bonoli and Schweri, 2019). 4 This division of tasks is still valid in today's dual system of apprenticeship training.
Firms incur costs for training that amount to roughly five billion Swiss francs per year and consist primarily of apprentice wages, wages for part- and sometimes full-time trainers, and training equipment (Gehret et al., 2019). At the same time, participating firms also profit from their apprentices, who carry out productive work during the training period. According to cost-benefit surveys among training firms, it is estimated that about two-thirds of all training firms in Switzerland realize net benefits from training (Wolter et al., 2006), but these benefits are often small and hard to predict at the beginning of the training period.
The training literature developed models that explain why firms train even if they incur net costs during training, which is the case for the remaining third of Swiss training firms. These models assume frictional labor markets that lead to monopsony power for training firms. They can retain their apprentices as trained workers and acquire a part of their increased productivity, thus earning rents to recoup training costs ( Acemoglu and Pischke, 1998;1999;Leuven, 2005;Stevens, 1994 ). Indeed, empirical studies for Switzerland show that training firms realize benefits from retaining apprentices (e.g. Blatter et al., 2016 ). At the same time, however, these benefits are limited by the threat of poaching by other companies ( Mühlemann and Wolter, 2011 ).
Extant models of firm training go a long way towards explaining Swiss firms' training behavior. However, these models cannot explain the differences in training incidence within Switzerland, especially between language regions. The regulatory framework for apprenticeships is set at the national level, and cost-benefit surveys show that firms in the French language regions do not incur higher training costs than firms in the German language region ( Wolter et al., 2006 ). Nonetheless, 72 percent of every cohort of young people under age 25 attend and complete an apprenticeship in the German language region, while the proportions of young people under age 25 completing apprenticeships are 49 percent and 57 percent in the French and Italian language regions respectively ( BFS, 2018 ). We thus put forward an explanation for this observation based on local norms rooted in history and regional culture.
The perceived long-run success and importance of apprenticeships has led to a common understanding in Switzerland that apprenticeships are the norm at upper-secondary level education. Since firms and employer associations have committed themselves to collective action in the creation of the dual apprenticeship system, as described in the previous section, it is also considered self-evident that firms should provide training positions for young people. Such public expectations are relevant for firms when they are reflected in the behavior of clients and employees. For example, a local baker may win clients if she trains local youth, or lose clients to nearby competitors if she does not. The baker may then decide to train as long as the full net cost of training does not exceed the profit lost due to lost clients. Alternatively, the baker may have internalized the norm and feel obliged to train apprentices even at additional cost. She would then value training in itself and maximize a function reflecting her preferences and profits, instead of profits alone.
Yet, the population's overall trust in the beneficial outcomes of collective action by private actors is not the same in all language regions. This leads to variation in the strength of norms at the language border, such that the federal institution of apprenticeships may operate differently in different cultural regions.
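The two motives in the baker example can be written as simple decision rules. The following is a minimal sketch under our own simplifying assumptions, not a model estimated in this paper: the function names, the additive form of the preference term, and all numbers are hypothetical and serve only to illustrate the argument.

```python
def trains_for_profit(net_training_cost: float, profit_lost_if_not_training: float) -> bool:
    """Profit motive: train as long as the net cost of training does not
    exceed the profit that would be lost to competitors otherwise."""
    return net_training_cost <= profit_lost_if_not_training


def trains_with_internalized_norm(
    net_training_cost: float,
    profit_lost_if_not_training: float,
    value_of_training: float,
) -> bool:
    """Internalized norm: the owner maximizes profit plus a (hypothetical,
    additive) value she attaches to training in itself, so she accepts a
    higher net cost than a pure profit maximizer would."""
    return net_training_cost <= profit_lost_if_not_training + value_of_training


# Hypothetical numbers in Swiss francs per year.
pure_profit_choice = trains_for_profit(2000.0, 1500.0)
norm_choice = trains_with_internalized_norm(2000.0, 1500.0, 800.0)
```

In this illustration, a stronger local norm can raise either the profit lost from not training or the value the owner attaches to training; both shift the threshold in the same direction.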

History, culture, language, and norms
The roles of history, culture, and norms for economic behavior and cooperation have attracted considerable interest in the literature. North (1990) describes norms as culture-specific informal constraints of behavior and stresses their role for continuity, in that "(...) the informal solution to exchange problems in the past carries over into the present and makes those informal constraints important sources of continuity in long-run societal change " (p. 37). This continuity is achieved by the transmission of cultural values and norms to new generations by parents, peers, and the media, among others ( Bisin and Verdier, 2001;Guiso et al., 2016;Tabellini, 2008 ). Thus, culture is a key mechanism for explaining how history affects current economic development ( Tabellini, 2010 ). In our case, cultural differences between language regions may affect the training behavior of firms. The temporal continuity of culture also offers an opportunity for the causal estimation of the effects of culture on economic outcomes. The fact that mother tongue and cultural roots cannot be changed intentionally reduces the potential for reverse causality ( Guiso et al., 2006 ).
The French-German language border in Switzerland has been a cultural border for centuries. At the end of the Roman empire, Latin was the dominant language in the area now known as Switzerland. The language border emerged when the Germanic Alemanni tribe immigrated into the Northern and Eastern regions. In the West, this immigration stopped at the three lakes region, where Romanic settlement was denser than in other regions. Although the resulting Franco-German language border experienced some minor shifts (for details, see Haas (2000)), it has remained largely stable since about 1000 CE. Language is still a major attribute of regional identity and culture, as the difficult founding of the canton of Jura in recent decades has shown. 5 Another visible sign of difference is the segregated media of the three major language regions, which "reinforce the linguistic and cultural homogeneity of each" (Rash, 2002).
There is considerable empirical evidence showing that the Swiss cultural-linguistic borders result in differences of attitudes and norms. Eugster et al. (2017) and Cottier (2018) find diverging work attitudes at the Swiss language border that translate into differences in job search and retirement decisions. Different norms regarding working mothers along the border also correspond to varying work participation rates and fertility among women ( Steinhauer, 2018 ). Likewise, deviating preferences for tax competition affect local tax rates, although tax competition prevents discontinuities at the language border ( Eugster and Parchet, 2019 ). Cultural effects have also been reported concerning financial literacy ( Brown et al., 2018 ) and trade ( Egger and Lassmann, 2015 ). Very much in line with our argument, Rustagi and Veronesi (2016) show with an online survey experiment that cooperative attitudes and the propensity for conditional cooperation are markedly higher in the German-speaking part of Switzerland than in the French-speaking part.
For our empirical analysis, we will use municipalities' main language as an indicator of culture 6 because most social interactions are based on language, which makes language a carrier of culture and attribute of cultural identity (Clots-Figueras and Masella, 2013). Today, Switzerland is a multilingual country with four official languages. According to the 2000 Population Census, German is the first language for a majority of its resident population (63.7%). In the western part of Switzerland, bordering France, French is the official language and at 20.4% the second most prevalent first language in Switzerland. The Italian language (6.5%) is the sole official language in the Canton of Ticino in the south of Switzerland. The fourth official language, Romansh (0.5%), is spoken in parts of the Canton of Grisons. The remaining 8.9% of residents are immigrants who speak a native language other than one of the four official languages. Though the following analysis focuses on the French-German language border in the western part of Switzerland, Fig. 1 shows all four language regions along with the cantonal borders. Cantons are Switzerland's most important institutional entities, enjoying a considerable amount of autonomy, for instance over taxation and the educational system (with the exception of apprenticeships). The French-German language border runs largely through the multilingual Cantons of Bern, Fribourg, and Valais, and mostly does not coincide with cantonal borders. This course of the language border is crucial for our research design because it allows us to net out institutional differences by including fixed effects at the cantonal level. 7 In addition, as pointed out by Eugster et al.
(2017), neither important geographical territories nor economic hubs spread homogeneously across one or the other language region.
5 A part of the French-speaking "Bernese Jura" split off from the bilingual, but mostly German-speaking canton of Bern after a series of popular votes on national, cantonal and municipality levels, which followed serious social unrest in the seventies in the Jura region. This identity-finding process is strongly linked to language (see Siroky et al., 2017).
6 A municipality is considered a German (French) municipality when the majority of its inhabitants speaks German (French). Eugster et al. (2011), Eugster et al. (2017), and Cottier (2018) proceeded similarly.
7 Appendix Fig. B.1 shows that the jump in the share of German speakers is strong. This shows that locals rarely move across the language border to nearby municipalities, which preserves cultural differences between municipalities.

Firm-level data from the Swiss Business Census
Our main data source consists of the four most recent waves (1998, 2001, 2005, and 2008) of the Swiss Business Census (BFS, 2008), which covers all Swiss firms operating in the second or third sector. 8 There are two reasons for using several waves. First, our regional analysis requires many observations because it focuses solely on firms close to the language border. Second, the period 1998-2008 approximately overlaps with the time period of the referenda we use to measure local norms (see Section 3.2 below). We impose two sample restrictions. First, we exclude 172,432 observations at the firm × census-year level of not-for-profit organizations. These organizations are arguably less bound by people's expectations and financial restrictions, and thus less suited to study firm behavior. Second, we exclude very small firms with fewer than three employees because only a tiny fraction of these firms train apprentices. We will, however, show in Section 5.3 that our results are robust to this sample restriction, which reduces our sample to 842,146 observations at the firm × census-year level.
Our main dependent variable is a dummy variable $T_{it}$, indicating whether firm $i$ trains at least one apprentice in census year $t$. 9 Table 1 shows descriptives for the full sample and for a subsample of firms located within 20 km of the language border, henceforth referred to as the local language border contrast (LLBC) sample. Most of our empirical analysis focuses on this subsample of 50,333 firms with a share of training firms of 34.9%, slightly higher than in the full sample (see also appendix Table A.1, which shows descriptives by language region for the LLBC sample).

Municipality-level voting results
We further use municipal-level voting results from several national referenda to measure local norms favoring private solutions over state responsibility in collective action problems. 10 More specifically, we aggregate municipality results from eight national-level referenda that took place in Switzerland between 1986 and 2014 (appendix Table A.2 contains some additional information regarding these referenda). Although dealing with quite distinct substantive issues, these eight referenda share the broader question whether citizens want to confer more responsibility on the state. For example, one popular initiative (vote 503; cf. Table A.2 ) called for more state action in the provision of apprenticeship positions, another one for a single public health insurance fund (vote 528).
In what follows, $N_j$ will refer to the mean share of votes in support of more private engagement, and thus less engagement by public authorities, in municipality $j$. Note that there is no temporal variation in this measure because we aggregate voting results from the different years. As also shown in panel (a) of Table 1, the mean share of votes in support of more private engagement equals about 54% in the LLBC sample and 58.5% in the full sample at the firm level. 11
10 The literature often uses survey items to measure regional norms (see Alesina and Giuliano, 2015). The advantages of referenda results over surveys are that referenda are incentivized, address the full population, and are collected by the state (e.g. Stutzer and Lalive, 2004).
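The construction of the norm measure can be sketched in a few lines. This is a minimal illustration under stated assumptions rather than our actual data pipeline: the function names, the municipality identifiers, and the yes-shares are hypothetical, and we assume that for each referendum it is known whether a "yes" vote supports more state responsibility.

```python
from collections import defaultdict
from statistics import mean


def private_share(yes_share: float, yes_means_more_state: bool) -> float:
    """Recode a municipality's yes-share so that higher values always
    mean more support for private engagement (assumed coding rule)."""
    return 1.0 - yes_share if yes_means_more_state else yes_share


def mean_private_share(results):
    """Average the recoded shares over all referenda, by municipality.

    results: list of (municipality_id, vote_id, yes_share, yes_means_more_state).
    Returns a dict with one time-invariant value per municipality.
    """
    by_municipality = defaultdict(list)
    for municipality, _vote, yes_share, more_state in results:
        by_municipality[municipality].append(private_share(yes_share, more_state))
    return {m: mean(shares) for m, shares in by_municipality.items()}


# Hypothetical yes-shares for two of the referenda (votes 503 and 528),
# both of which asked for more state responsibility.
example_results = [
    ("A", 503, 0.40, True),
    ("A", 528, 0.30, True),
    ("B", 503, 0.60, True),
    ("B", 528, 0.50, True),
]
norms = mean_private_share(example_results)
```

Averaging over referenda from different years is what removes temporal variation from the measure.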

Control variables
In most estimations shown later, we include control variables to account for the relatively small differences in firm and location characteristics that remain even in our LLBC sample. We focus on variables that are arguably either predetermined or unrelated to our treatments, that is, the emergence of language regions and a local norm favoring private provision of goods, respectively. Panels (b) to (d) in Table 1 display these variables separately for the LLBC and the full sample.

Firm-level characteristics
Firm-level characteristics included in the empirical analysis below are a firm's number of employees and its square, and a dummy equal to one for corporations and zero otherwise. All these firm characteristics are taken directly from the Business Census.

Municipality characteristics
Because our treatments are long lasting cultural spaces and attitudes, respectively, we lack many truly predetermined control variables. We nevertheless include two municipality characteristics that we believe fulfill this requirement: municipalities' mean elevation above sea level and their share of unproductive land, which mostly consists of water and areas in the mountains not suited for cultivation, and only to a smaller extent of uncultivated vegetated areas. We thus consider these variables as good proxies for exogenous local conditions which affect local economic activities (e.g. industry structure).
Due to the spatial nature of our empirical design, we also add municipality-level characteristics to account for differences in the composition of the population and the local economy that are arguably exogenous to firms' training activities. These include municipalities' log number of residents and population density, the share of the resident population that is employed, and the average taxable income per capita. Finally, to account for varying geographical spaces along the language border, we add eight local border region (LBR) dummies (see Section 4 for details) to the municipality characteristics. We compile these municipality-level controls from various data sources indicated in Table 1.
11 Note that all statistics in Table 1 are calculated at the firm level. The municipality average of the mean share of votes in support of more private engagement is 57% in the LLBC sample and 60% in the full sample, respectively.

Demand for apprenticeship training
One concern with our identification strategy is that language border differences might also exist in the demand for apprenticeship training. To study the association between local norms and firms' provision of apprenticeship positions, we ideally want to observe a firm's training intention (or more precisely, the supply curve) no matter whether it ultimately employs an apprentice or not. In contrast, the Business Census as our main data source only captures the outcome on the apprenticeship market, i.e. a firm's training status. We are thus not able to exclude apprenticeship demand side differences between language regions with certainty. To nevertheless tackle this issue we control for a firm's potential number of apprenticeship applicants by including the proportion of people between the ages of 15 and 21 at the municipality level in most of our estimations. Moreover, some of the robustness controls (see below) also address concerns towards a concurrent jump in the demand for apprenticeship training at the language border and Section 5.5 provides additional sensitivity checks dealing with this issue.

Robustness controls
Appendix Table A.3 shows two panels of additional variables (or robustness controls as we call them) that potentially interact with firms' provision of apprenticeship positions. First, on the firm level, we consider a detailed set of 81 industry dummies (NACE 2-digit) as such robustness controls (to save space, appendix Table A.3 displays an aggregated set of 19 NACE 1-digit dummies). Because apprenticeship occupations are not equally distributed across industries, regional industry clusters could affect the spatial distribution of apprenticeship positions. Second, as stated above, the local demand for apprenticeship training potentially affects whether a firm finally hires an apprentice or not. To partially map the local demand for apprenticeships, we calculate three minimum distances between the municipality in which a firm is located and the nearest municipality hosting the respective upper-secondary schooling facility (i.e. vocational school, full-time vocational school, and baccalaureate school).
However, including these robustness control variables involves a trade-off. On the one hand, we aim to show that our results are neither driven by spatially clustered industries nor by firms' closeness to different upper-secondary schooling types that potentially affect the demand for apprenticeship training. On the other hand, these variables are at least partially endogenous to our treatment and thus qualify as "bad controls" in the sense of Angrist and Pischke (2009). We therefore only include them in a specific estimation (column (10) of Table 4) and show that our main results are not driven by these additional firm characteristics and apprenticeship demand controls.
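The three minimum-distance controls can be illustrated as follows. This is a sketch under our own assumptions: it uses straight-line distances between hypothetical municipality coordinates on a planar grid in kilometres, which need not coincide with the distance concept used in the estimations, and all names and coordinates are made up for the example.

```python
from math import hypot


def min_distance_km(firm_xy, school_xys):
    """Straight-line distance from a firm's municipality to the nearest
    municipality hosting a given school type (planar km coordinates)."""
    return min(hypot(firm_xy[0] - x, firm_xy[1] - y) for x, y in school_xys)


def school_distances(firm_xy, schools_by_type):
    """One minimum distance per upper-secondary school type."""
    return {
        school_type: min_distance_km(firm_xy, locations)
        for school_type, locations in schools_by_type.items()
    }


# Hypothetical coordinates of municipalities hosting each school type.
schools = {
    "vocational": [(3.0, 4.0), (10.0, 0.0)],
    "full_time_vocational": [(0.0, 2.0)],
    "baccalaureate": [(6.0, 8.0)],
}
distances = school_distances((0.0, 0.0), schools)
```

A firm located in a municipality that hosts a given school type receives a distance of zero for that type.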

Empirical design
In our empirical analysis, we primarily focus on the estimation of the reduced-form effect of the cultural background, against which a particular firm is operating, on its decision to train apprentices. Moreover, following our discussion from Section 2.2 above, we conceptualize a firm's cultural background by the language region within which it is located. We focus on the reduced-form effect because the existing empirical literature documents differences in various outcomes at the Swiss language border; perhaps not too surprisingly, given that "culture " is, by its very nature, a broad and not uniquely defined concept. From a methodological viewpoint, however, this situation should make us somewhat cautious in arguing that there is only one specific mechanism linking culture and firm behavior (e.g. Gallen and Raymond, 2020;Heath et al., 2020 ). This also means that a reduced-form approach would not pin down a single mechanism.
For that reason, we will also provide estimates that relate the cultural background of a region with the prevailing norms towards private engagement, which we hypothesize is the most relevant aspect of culture affecting firms' training behavior. While we are not able to show that this is the only dimension of culture relevant to the question at hand, we firmly believe that we can plausibly rule out that this channel is irrelevant in explaining the difference in training behavior between the different language regions.

Basic setup
A natural starting point to relate a firm's training behavior with the cultural background, against which it is operating, is the following reduced-form regression:

$$T_{it} = \beta_0 + \beta_1 G_{j(i)} + x_{it}'\gamma + \lambda_t + \kappa_{c(i)} + \varepsilon_{it} \qquad (1)$$

where the binary dependent variable $T_{it}$ indicates whether firm $i$ trains apprentices or not in census year $t$, and the binary variable $G_{j(i)}$ indicates whether a majority of people speak German in the municipality $j(i)$ in which firm $i$ is located. We also include various firm- and municipality-level controls, collected in the vector $x_{it}$, such as the number of employees or the mean taxable income in a municipality (see Table 1 for the full list of controls). We further include a set of census-year dummies, denoted by $\lambda_t$, in most of our specifications because we pool data from several waves of the Business Census. The census-year fixed effects pick up any shifts in the mean training incidence among firms over time. More importantly, we also include a set of cantonal fixed effects, denoted by $\kappa_{c(i)}$. These allow us to control for institutional differences between the cantons (note that it is possible to include these fixed effects because there are several bilingual cantons, as evident from Fig. 1). The inclusion of the cantonal fixed effects may be important because cantons have considerable leeway in both educational and financial policy. Moreover, because our main regressor, the cultural-language region dummy $G_{j(i)}$, varies at the municipality level only, we show standard errors that are clustered at the municipality level throughout in the regressions below (e.g. Moulton, 1990).
Parameter $\beta_1$, the coefficient on the German language dummy in equation (1), is of main interest because it quantifies the difference in the training incidence in firms located on different sides of the language border and therefore exposed to different norms regarding firms' role in the provision of training. For that reason, and assuming that individuals in the German language regions are more favorable towards private engagement, we expect the training probability among firms to be higher on the German language side of the language border, and thus we expect to find that $\beta_1 > 0$. The main problem with equation (1), however, is that it may be implausible to assume that the language dummy $G_{j(i)}$ and the error term $\varepsilon_{it}$ are uncorrelated, even when controlling for many observable characteristics of both firms and municipalities (cf. Keele and Titiunik, 2016). For example, otherwise comparable firms may be faced with different product demand, depending on whether they are located closer or farther away from a large metropolitan area. Similarly, firms located closer to the country border may face different labor supply restrictions than those located in the centre of the country (Aepli and Kuhn, 2021).
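To make the role of the language dummy and of municipality-level clustering concrete, the following sketch computes the reduced-form coefficient and a cluster-robust standard error for the simplest possible case. It is an illustration under strong assumptions, not our estimation code: it omits all controls and fixed effects, in which case the OLS coefficient on a binary regressor equals the difference in mean training incidence, and it uses the basic sandwich formula without the small-sample corrections that statistical packages typically apply. The function name and the data are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def reduced_form_gap(rows):
    """rows: list of (municipality_id, german, trains), with german and
    trains coded 0/1 and german constant within a municipality.

    Returns (gap, se): the difference in training incidence between German
    and French language municipalities, and its standard error clustered
    at the municipality level.
    """
    n = {0: 0, 1: 0}
    total = {0: 0, 1: 0}
    for _municipality, german, trains in rows:
        n[german] += 1
        total[german] += trains
    share = {g: total[g] / n[g] for g in (0, 1)}
    gap = share[1] - share[0]

    # Sum residuals within each municipality (the cluster).
    residual_sum = defaultdict(float)
    side = {}
    for municipality, german, trains in rows:
        residual_sum[municipality] += trains - share[german]
        side[municipality] = german

    # Sandwich variance of the gap for a single binary regressor:
    # squared cluster residual sums, scaled by the squared group size.
    variance = sum(s ** 2 / n[side[m]] ** 2 for m, s in residual_sum.items())
    return gap, sqrt(variance)


# Hypothetical firms in two German (A, B) and two French (C, D) municipalities.
example_rows = [
    ("A", 1, 1), ("A", 1, 1), ("A", 1, 0),
    ("B", 1, 1), ("B", 1, 0),
    ("C", 0, 0), ("C", 0, 0), ("C", 0, 1),
    ("D", 0, 1), ("D", 0, 0),
]
gap, se = reduced_form_gap(example_rows)
```

Clustering matters here because the regressor takes the same value for every firm in a municipality, so firms within a municipality do not provide independent information about the gap.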

Focusing on locally adjacent regions along the language border
This is where the geographic quasi-experiment provided by the German-French language border comes into play.¹² As argued by Keele and Titiunik (2016), it may be possible to improve over the simple approach along equation (1) by focusing on observations that are geographically close to each other, if there is simultaneously some feature that induces a relatively sharp and strong contrast in the endogenous variable of interest. In our specific setting, for example, firms located close to the language border are arguably subject to many unobservable factors to the same extent. At the same time, they may differ strongly in their cultural background, depending on which side of the language border they are located on. Thus, in our own analysis we focus on a much smaller subset of regions and firms, respectively, to estimate the parameter of interest:

$$T_{imt} = \beta_0 + \beta_1 G_m + x_{imt}'\gamma + \delta_t + \kappa_c + \varepsilon_{imt} \quad \text{if } |d_m| \le b, \tag{2}$$

with $d_m$ denoting the approximate travelling distance from municipality $m$, within which a firm is located, to the language border (see appendix B.1 for details concerning the construction of the distance variable). By choosing a relatively narrow bandwidth $b$, we effectively enforce a comparison between firms from either side of, yet close to, the language border. To reiterate, the main substantive argument for this approach is that it becomes more plausible to assume that firms located on either side of the language border are comparable in relation to unobserved factors influencing their training decision. While it is impossible to test this presumption directly, we are able to provide some indirect evidence in support of this assumption (for example, we can show that firms become more balanced along observable variables; see appendix B.3 for details). Moreover, it will also be important to show that firms located on different sides of the language border do in fact differ in their cultural background (see Section 4.2 below on this issue).
Choosing the bandwidth is crucial in this setup, and we discuss this decision in some detail in appendix B.3. Based on these additional analyses, we use a baseline bandwidth of 20 km around the language border in most of our specifications (LLBC sample). This implies that our analysis focuses on 302 municipalities located along the language-cultural border (about 14% of all municipalities), which provide about 7.5% of all firm × census-year observations (cf. Table 1). Fig. 2 shows a map of the selected municipalities, which cluster closely along the French-German language border (cf. Fig. 1 above). When estimated using only a subset of locally adjacent municipalities and firms, respectively, we will refer to $\beta_1$, the locally estimated reduced-form effect of a firm's cultural background, as the local language border contrast (LLBC for short) in what follows.¹³

Moreover, note that we deliberately decided not to estimate a spatial or geographic regression discontinuity design (e.g. Keele and Titiunik, 2015). There are several, partially related, reasons for this decision. A first reason is that there is measurement error in the variables that we use to approximate a firm's distance to the language border (see Appendix B for details), which prevents identification exactly at the border. A second reason is that there is some inherent heaping of observations as we select different bandwidths (cf. Barreca et al., 2016).¹⁴ Our approach of comparing the conditional training incidence between firms from the two language regions is arguably more robust to this data structure than a geographic discontinuity design, which focuses exclusively on the boundary point.

¹² The term geographic quasi-experiment is borrowed from Galiani et al. (2017). See also Titiunik (2021) on the potential benefits and pitfalls of letting "natural" experiments guide the empirical analysis.
Third, and perhaps most importantly, it is implausible to assume that firms are "as good as randomly" located on one or the other side of the language border, even if we focus on a very narrow bandwidth around the language border, as assumed by the canonical geographic regression discontinuity design. We therefore argue that it is not necessarily a compelling strategy to identify the effect at the cutoff point (here, the boundary line) in our specific setting. Rather, the main argument leading our analysis is the goal of eliminating or at least mitigating the impact of unobservable confounders by focusing on locally adjacent firms.

¹³ However, this also implies that we prima facie estimate a parameter which applies to this specific subsample only, paralleling the local interpretation in other, more popular designs (Abadie and Cattaneo, 2018; Athey and Imbens, 2017).

¹⁴ For example, the city of Bern is located in the German-speaking part and about 30 km away from the language border. Setting the bandwidth marginally larger than 30 km will lead to a substantive heaping of observations at that distance from the language border.
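The sample restriction by bandwidth and the heaping problem described above can be sketched as follows. This is an illustration under stated assumptions only: the municipalities, distances, and firm counts are invented, and the helper names `signed_distance` and `llbc_sample` are hypothetical. Distances on the French-speaking side are given a negative sign, mirroring the convention used for the figures.

```python
def signed_distance(distance_km, german):
    """Travelling distance to the border; French-side distances are negative."""
    return distance_km if german else -distance_km


def llbc_sample(municipalities, bandwidth_km):
    """Keep municipalities whose distance to the border is within the bandwidth."""
    return [m for m in municipalities if abs(m["signed_km"]) <= bandwidth_km]


# Hypothetical toy data: (name, distance in km, German-speaking?, number of firms).
raw = [
    ("F1", 4.0, False, 120),
    ("F2", 18.0, False, 80),
    ("G1", 6.0, True, 150),
    ("G2", 19.5, True, 60),
    ("G_city", 30.0, True, 5000),  # a large city just outside the baseline bandwidth
]
municipalities = [
    {"name": n, "signed_km": signed_distance(d, g), "german": g, "firms": f}
    for n, d, g, f in raw
]

# The number of firms is flat between 20 and 29 km and jumps once the
# bandwidth crosses the distance at which the large city is located.
firms_by_bandwidth = {
    b: sum(m["firms"] for m in llbc_sample(municipalities, b))
    for b in (20, 29, 31)
}
print(firms_by_bandwidth)
```

In this toy example, a marginal change in the bandwidth around 30 km changes the composition of the sample drastically, which is the kind of heaping that makes an estimate at the exact boundary point fragile.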

Controlling for local border regions
One remaining issue with equation (2), however, is that we are still comparing firms from different locations, even if we are focusing on a narrow bandwidth around the language border. The reason for this is that the language border stretches from the northern to the southern country border: from the cantonal border between the French-speaking canton of Jura and the German-speaking cantons of Basel-Landschaft and Solothurn, respectively, to the bilingual canton of Valais (again, see Fig. 1). The inclusion of cantonal dummies does take account of this issue, but only partially. We therefore also constructed a set of what we call local border regions (LBRs), representing groups of locally adjacent municipalities from both sides of the language border clustered around municipalities that provide direct access to the other side of the language border (see appendix B.1 for additional details). Thus our baseline regression model becomes:

$$T_{imt} = \beta_0 + \beta_1 G_m + x_{imt}'\gamma + \delta_t + \kappa_c + \lambda_r + \varepsilon_{imt} \quad \text{if } |d_m| \le b, \tag{3}$$

with $\lambda_r$ denoting a set of additional dummy variables indicating whether a municipality belongs to a particular LBR $r$. In our baseline specification, we use a set of eight distinct LBRs, but our estimates are robust to a change in the number of LBRs (cf. column (6) of Table 4).
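The effect of adding LBR dummies can be illustrated with a small sketch. It relies on the standard result that including a full set of group dummies is numerically equivalent to subtracting group means from the outcome and the regressor before running the regression. The toy data and the function names `within_slope` and `pooled_difference` are hypothetical, and all other controls are omitted.

```python
from collections import defaultdict


def pooled_difference(y, g):
    """Raw difference in means between g == 1 and g == 0 (no fixed effects)."""
    y1 = [yi for yi, gi in zip(y, g) if gi == 1]
    y0 = [yi for yi, gi in zip(y, g) if gi == 0]
    return sum(y1) / len(y1) - sum(y0) / len(y0)


def within_slope(y, g, group):
    """Slope on g after removing group means from y and g (group fixed effects)."""
    totals = defaultdict(lambda: [0.0, 0.0, 0])  # sum of y, sum of g, count
    for yi, gi, r in zip(y, g, group):
        totals[r][0] += yi
        totals[r][1] += gi
        totals[r][2] += 1
    num = den = 0.0
    for yi, gi, r in zip(y, g, group):
        y_dm = yi - totals[r][0] / totals[r][2]  # demeaned outcome
        g_dm = gi - totals[r][1] / totals[r][2]  # demeaned regressor
        num += g_dm * y_dm
        den += g_dm * g_dm
    return num / den


# Hypothetical toy data: two local border regions with different training levels
# and different shares of German-side firms.
y = [1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0]
g = [1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0]
lbr = ["r1"] * 6 + ["r2"] * 6

pooled = pooled_difference(y, g)
within = within_slope(y, g, lbr)
print(f"pooled contrast = {pooled:.3f}, within-LBR contrast = {within:.3f}")
```

In this constructed example the contrast is 0.5 within each region, but the pooled comparison overstates it because German-side firms are concentrated in the region with the higher overall training incidence. Comparing firms within the same LBR removes this kind of location-driven difference.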
The baseline set of LBRs is graphically illustrated in Fig. 3. As evident from this figure, the LBRs stretch across the language border as well as across cantonal borders in most cases. This allows us to control for both cantons and LBRs in the regressions, with the exception of the canton of Valais, located in the southern part of the country (in that case, the cantonal and the LBR dummy are collinear). Fig. 3 also graphically illustrates our general approach in estimating the reduced-form effects of a firm's cultural background: we focus on a comparison of firms located close to the language border, both along and away from the language border.

Fig. 3. Notes: The figure illustrates the location of the eight local border regions used in our baseline specifications. Note that these regions do not represent existing administrative units (see appendix B.1 on the construction of these regions). The figure zooms in on the western part of the country (cf. Fig. 1 for an overall map of Switzerland).

Estimating the local language border contrast in norms
We also show estimates of an LLBC in local norms towards private engagement at the language border to substantiate our claim that the two cultural regions in Switzerland differ in how they view the proper role of the state versus the role of private actors, such as employers. In this case, we focus on regressions that take the following form, paralleling our discussion from Section 4.1 above:

$$N_m = \beta_0 + \beta_1 G_m + z_m'\gamma + \kappa_c + \lambda_r + \varepsilon_m, \tag{4}$$

where $G_m$ again denotes whether municipality $m$ belongs to the German language part of the country. The dependent variable $N_m$ reflects the local norm towards private engagement prevailing in municipality $m$. Because these regressions are based on municipality-level data, we only include municipality-level characteristics $z_m$, a full set of cantonal dummies $\kappa_c$, and a set of LBR dummies $\lambda_r$ as additional controls. As discussed in Section 3, $N_m$ is constructed in such a way that higher values are associated with stronger support of private engagement; i.e. $N_m$ reflects the share of votes rejecting more/additional government intervention. Therefore, we expect that $\beta_1 > 0$ because we expect individuals living in the German language regions to show stronger support for private engagement.¹⁵

The local language border contrast in norms
We start by documenting the existence of an LLBC in the norm favoring private engagement. As expected, and consistent with the results from Eugster et al. (2011), among others, we find that individuals in the German language regions of Switzerland are much more in favor of private engagement than those from the French language regions.
First, Fig. 4 shows that there is a clear and sizable difference in the norm favoring private engagement between the two language regions. On average, individuals living in the German language municipalities are much more supportive of private engagement than those living in the French language parts of the country. Also note that the mean of the norm measure $N_m$ is relatively constant on both sides of the language border, consistent with the idea that the two language regions are characterized by different, relatively persistent norms regarding the question of how responsibilities should be shared between private and public actors (as discussed in appendix B.3, this holds locally but not globally and, in fact, the LLBC sample is chosen such that this pattern holds locally).
Next, Table 2 shows estimates of parameter $\beta_1$ from equation (4), using the LLBC sample only. The specification shown in the first column does not include any controls and yields an estimate of $\hat{\beta}_1 = 0.150$, with a robust standard error of 0.006. This estimate implies that there is a large relative difference in the norm favoring private engagement between the two language regions of about 32% ($= 100\% \cdot (0.150/0.464)$, where 0.464 corresponds to the mean of the norm measure $N_m$ among the French language municipalities in the LLBC sample). The second column adds cantonal dummies, which yields a virtually identical estimate of $\hat{\beta}_1 = 0.151$. Adding further regional-level controls does not have much impact on the estimate of $\beta_1$. In the final two columns of Table 2, we show estimates using different parameterizations of the norm towards private engagement. Specifically, in column (5), we only use the results from the two referenda related to apprenticeship training (cf. appendix Table A.2). This also yields a highly significant, but substantively smaller point estimate of $\hat{\beta}_1 = 0.087$. In contrast, in the last column of Table 2, we measure the norm towards private engagement using the mean share of opposing votes across the six votes unrelated to educational issues. In this case we get a point estimate of $\hat{\beta}_1 = 0.166$. Overall, the estimates from Table 2 unambiguously show that there exists a large and robust LLBC in the norm towards private, rather than public, engagement.

Fig. 4. Municipality-level norms by travelling distance to the language border. Notes: The figure plots our municipality-level measure of the norm towards private engagement, $N_m$, against the travelling distance to the language border. The two dashed horizontal lines correspond to the respective mean of $N_m$ on either side of the language border (and within the bandwidth of 20 km). To draw the figure, travelling distances from French-speaking municipalities are multiplied by -1.

Table 2. Notes: The table shows estimates using municipality-level data. ⋆⋆⋆, ⋆⋆, and ⋆ denote statistical significance at the 1%, 5%, and 10% level, respectively. Robust standard errors are given in parentheses.

The local language border contrast in the proportion of training firms
Focusing on our variable of main interest next, Fig. 5 shows the municipality-level training probability among firms located at a given distance from the language border. A first thing to note is that, in the LLBC sample shown here, the share of training firms amounts to 32.9% in the French language regions and to 36.9% in the German language regions (cf. appendix Table A.1). Another feature that is clearly evident from the figure is the huge variation in the mean training incidence across different municipalities, reflecting the fact that there are many, potentially unobservable, factors influencing a firm's decision to train apprentices. Table 3 presents the local reduced-form estimates based on comparing the share of training firms on either side of the language border, i.e. the table shows estimates of parameter $\beta_1$ from equations (2) and (3), respectively. Column (1) of Table 3 shows that the share of training firms is about four percentage points higher on the German side of the language border. This implies a relative difference in the probability of training apprentices of about 12% at the language border (i.e. $100\% \cdot (0.04/0.329) \approx 12\%$, where 0.329 corresponds to the proportion of training firms among the firms in the LLBC sample located in the French language part of the country, see Table A.1). Thus, the difference in the training probability is smaller than the corresponding difference in the voting results, but remains both economically and statistically significant.
The remaining columns of Table 3 show that this estimate is robust to the inclusion of several sets of controls. Column (2) and column (3) add dummies for canton and census year, respectively. In both cases, the resulting point estimate is virtually identical to the one from the first column. Column (4) adds firm-level controls, such as the number of employees. The inclusion of the firm covariates hardly changes the point estimate of $\beta_1$, which remains of similar size and statistically significant. In column (5), we add several municipality-level controls. Again, this has no substantive impact on the size or the precision of the associated point estimate. Finally, column (6) adds the share of adolescents aged between 15 and 21. As outlined in Section 3, fully disentangling firms' supply of apprenticeships from youngsters' demand for apprenticeships is an issue, and including this variable is meant to mitigate this concern. Compared to the previous specifications, the point estimate remains stable at 0.044. This baseline estimate suggests a 13% difference in the share of training firms between the language regions (i.e. $100\% \cdot (0.044/0.329) \approx 13\%$, where 0.329 again corresponds to the proportion of training firms among the firms in the LLBC sample located in the French language part of the country, see Table A.1). All specifications considered, the estimates presented in Table 3 consistently point to a higher training propensity of about 4.0 to 4.4 percentage points among firms in the German language part of Switzerland, and thus to a higher training propensity in regions where the norm favoring private provision of goods is more prevalent.

Fig. 5. Municipality-level training incidence by distance to the language border. Notes: The figure plots the municipality-level incidence of apprenticeship training among privately-owned firms against the travelling distance to the language border. The circles are proportional to the number of firms located in a given municipality. The two dashed horizontal lines correspond to the respective mean training incidence on either side of the language border (and within the bandwidth of 20 km). To draw the figure, travelling distances from French-speaking municipalities are multiplied by -1.

Table 3
Estimating the LLBC in the proportion of training firms.
Notes: The table shows estimates using firm-level data. ⋆⋆⋆, ⋆⋆, and ⋆ denote statistical significance at the 1%, 5%, and 10% level, respectively. Robust standard errors are given in parentheses and are clustered by municipality.

Table 4. Notes: The table shows estimates using firm-level data. ⋆⋆⋆, ⋆⋆, and ⋆ denote statistical significance at the 1%, 5%, and 10% level, respectively. Robust standard errors are given in parentheses and are clustered by municipality. In column (5) the labor market dummies substitute the cantonal dummies. In columns (7) and (8) the coordinates substitute the LBR dummies.
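The relative differences quoted in this section all follow the same arithmetic: the estimated contrast is divided by the mean outcome on the French language side of the LLBC sample. The short sketch below reproduces that calculation from the figures given in the text; the function name `relative_difference` is a hypothetical helper introduced here for illustration.

```python
def relative_difference(contrast, french_mean):
    """Contrast expressed as a percentage of the French-side mean."""
    return 100.0 * contrast / french_mean


# Point estimates and French-side means as reported in the text.
norms = relative_difference(0.150, 0.464)           # norm towards private engagement
training_raw = relative_difference(0.04, 0.329)     # Table 3, column (1), rounded estimate
training_base = relative_difference(0.044, 0.329)   # Table 3, column (6), baseline

for label, value in [
    ("norms", norms),
    ("training, no controls", training_raw),
    ("training, baseline", training_base),
]:
    print(f"{label}: {value:.1f}%")
```

Expressing both contrasts relative to the French-side mean puts them on a common scale, which is what allows the statement that the difference in training is smaller than the difference in voting results.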

Robustness
We next provide several robustness checks.

Alternative specifications
We start with some alternative specifications in Table 4 . For ease of comparison, the first column replicates the baseline specification from column (6) of Table 3 .
As a first robustness check, we slightly decrease and increase, respectively, the bandwidth around the language border. Specifically, we use a smaller bandwidth of 15 km in column (2) and a larger bandwidth of 25 km in column (3).¹⁶ The two resulting point estimates, $\hat{\beta}_1 = 0.056$ and $\hat{\beta}_1 = 0.028$, are somewhat different from each other as well as from our baseline estimate (i.e. $\hat{\beta}_1 = 0.044$). However, the two estimates are not statistically different from our baseline estimate, in part because they are not very precisely estimated. Next, column (4) reports the estimate of $\beta_1$ when we also include firms with fewer than three employees in our sample. This leads to a large increase in the sample size and, because these small firms are much less likely to train any apprentices, to a reduction in the point estimate of $\beta_1$ ($\hat{\beta}_1 = 0.020$). The relative difference is also somewhat smaller, but remains comparable to the baseline specification (about 10% for the specification from column (4) versus about 13% in the baseline specification). As a next robustness check, the specification shown in column (5) includes fixed effects at the level of local labor market regions (LM regions) instead of cantonal-level fixed effects.¹⁷ The resulting estimate of $\hat{\beta}_1 = 0.035$ is again fairly close to the baseline estimate, suggesting that our results are not driven by differences in product demand or labor supply that the firms face at the level of local labor markets. In the specification shown in column (6), we include a set of dummy variables representing 24 distinct LBRs. Again, this yields a comparable, if somewhat larger, point estimate of $\hat{\beta}_1 = 0.052$. Next, in columns (7) and (8), respectively, we add the coordinates of a municipality (i.e. the coordinates of the centroid of the municipality) or the coordinates of the firm building as additional controls. Note that, in these two specifications, we do not simultaneously include the set of LBR dummies.
Once again, in both cases, the resulting point estimate of parameter $\beta_1$ is similar to our baseline estimate from the first column ($\hat{\beta}_1 = 0.036$ and $\hat{\beta}_1 = 0.035$, respectively). The next column shows that our estimate is also robust to a specification that excludes those firms located closest to the language border from the estimation sample ($\hat{\beta}_1 = 0.047$). In the final column of Table 4, we include some additional, potentially endogenous controls, such as a set of dummies controlling for the detailed industrial affiliation of a firm.¹⁸ Once again, the resulting point estimate of $\hat{\beta}_1 = 0.047$ remains robust to this change in the regression specification.

Using the geocoded information from the Business Census
In a further robustness check, we use the linear distances based on firms' geocoded locations, instead of the effective travelling distances between municipalities, to approximate a firm's distance to the language border (see Appendix B for additional details and a comparison between the two distance measures).
In this case, we first choose a bandwidth of 10 km because this bandwidth is associated with a sample of approximately the same size as in our baseline estimate from column (6) of Table 3.¹⁹ In the first column of Table 5, we show the resulting estimate without any additional controls. This specification yields an estimate of $\hat{\beta}_1 = 0.035$, which is again close to our baseline estimate. The second column includes the full set of control variables, yielding an estimate of $\hat{\beta}_1 = 0.038$. In the remaining two columns of Table 5, we again show estimates based on a somewhat smaller and larger bandwidth, respectively. Specifically, setting the bandwidth to 5 (15) kilometers yields an estimate of $\hat{\beta}_1 = 0.062$ ($\hat{\beta}_1 = 0.027$). Paralleling the pattern from Table 3 above, changing the bandwidth appears to have more impact on the point estimate than changing the set of control variables.

¹⁶ Changing the bandwidth leads to relatively large changes in the sample size. Decreasing (increasing) the bandwidth to 15 (25) kilometers results in a sample of 34,108 (66,612) firm-level observations.

¹⁷ While cantons are arguably Switzerland's most important institutional entities, labor markets often do not coincide with these institutional borders. LM regions are therefore defined by the FSO as regions with common commuting patterns towards their centers. We thus interpret the coefficient from column (5) as the correlation between the local norm regarding private engagement in the provision of goods and the training behavior of firms that operate in the same market in terms of the products they sell and the labor supply they face.

¹⁸ See Section 3.3 and appendix Table A.3.

¹⁹ As expected, the resulting sample of firms is similar, but not identical, to our LLBC sample. Specifically, 43,656 firms are included in both samples (87% of the original LLBC sample); the remaining 6,677 firms from the original LLBC sample are further away than 10 km of linear distance from the language border, while 2,415 firms are located within 10 km of linear distance but not within 20 km of travelling distance from the language border.

Notes: The table shows estimates using firm-level data. ⋆⋆⋆, ⋆⋆, and ⋆ denote statistical significance at the 1%, 5%, and 10% level, respectively. Robust standard errors are given in parentheses and are clustered by municipality.

Focusing on firms located in one of the three bilingual cantons
In a final robustness check, reported in Table 6, we focus exclusively on firms that are located in one of the three bilingual cantons along the German-French language border (i.e. the cantons of Bern, Fribourg, and Valais). As discussed in Section 2.2, the language border partly coincides with cantonal borders. Due to the extensive political autonomy Swiss cantons enjoy, this may threaten our identification strategy.²⁰ Restricting the sample to the bilingual cantons reduces the sample size to 28,612 firms for a bandwidth of 20 km. This large reduction in the number of observations is due to the fact that a large number of firms are located in the small, but densely populated area in the northern part of the language border where the language border coincides with cantonal borders.
In column (1), we first replicate our baseline estimate using the full set of controls and a bandwidth of 20 km. This yields an estimate of $\hat{\beta}_1 = 0.054$, which is somewhat larger than, but not statistically different from, our baseline estimate from column (6) of Table 3. The next two columns again change the bandwidth to either 15 or 25 km, respectively. Similar to Table 4, changing the bandwidth does lead to somewhat different point estimates. In the remaining three columns of Table 6, we use an alternative parameterization of a firm's distance to the language border. Specifically, we compute a firm's travelling distance to the nearest municipality on the other side of the language border, but located in the same canton. Thus, in these three columns, we strictly compare firms located in the same canton, but on different sides of the language border, with each other. Using the same baseline bandwidth of 20 km and the full set of controls, this yields a point estimate of $\hat{\beta}_1 = 0.046$, which also remains close to our baseline estimate. The final two columns again set the bandwidth to either 15 or 25 km, also yielding comparable estimates (i.e. $\hat{\beta}_1 = 0.041$ and $\hat{\beta}_1 = 0.033$).

Complementary evidence from survey data
In a next step, we provide some complementary evidence based on additional survey data.

Reverse causality reflected in the norm measurement
A potential concern is that the local level of support for the private provision of a good in a given municipality is endogenous to the actual provided level of this good in the municipality. In particular, people living in regions where firms provide many apprenticeship positions may not feel any need to increase the number of apprenticeship positions and therefore disapprove of state interventions to promote apprenticeships. As noted in Section 2.2, this is less of a concern for norms that are rooted in slowly changing culture (Guiso et al., 2006). Nonetheless, institutions and current practices will also change culture in the long run (Alesina and Giuliano, 2015).

Notes: The table shows estimates using firm-level data. ⋆⋆⋆, ⋆⋆, and ⋆ denote statistical significance at the 1%, 5%, and 10% level, respectively. Robust standard errors are given in parentheses and are clustered by municipality.
To dispel these concerns, we exploit additional data from an exit poll of the "apprenticeship initiative " (vote 503, cf. appendix Table A.2 ), which explicitly demanded that the state should take more responsibility within the apprenticeship system. In the exit poll, a small but representative sample of voters were asked for the motives behind their vote. Panel (a) of Table 7 shows on the one hand that only 30% of all voters opposed to the initiative in French language regions state that the provision of apprenticeships is not a task the state is responsible for, compared to 46% in German language regions. On the other hand, it yields no significant language region difference in the share of voters in favor of the initiative agreeing with the pro argument that there should be enough apprenticeship positions; despite the fact that actual apprenticeship provision by firms is lower in French language regions. Taken together, this suggests that the higher numbers of "yes " votes in the "apprenticeship initiative " among voters in the French language area are not triggered by low levels of apprenticeship provision by firms, but that French speaking voters are indeed less critical of state engagement in the provision of apprenticeships, as argued in Section 5.1 .

Individuals' attitudes and behavior
Moreover, a norm favoring private engagement should not only be visible in voting results, but also in other aspects of people's lives and particularly their behavior. In panels (b) and (c) of Table 7 we use additional survey data (details are given in the table notes) and show the contrast between the two language regions for each outcome considered, both without and with additional controls. First, panel (b) of Table 7 shows that the share of individuals who prefer state over private ownership of businesses is significantly lower in German language regions than in French language regions. This difference is robust to the inclusion of various individual- and municipality-level controls. This is in line with a norm that values private over public provision of goods. Next, panel (c) of Table 7 compares three measures of individuals' private engagement across the two language regions. These three items ask whether people (i) actively participate in associations (sports, social, or cultural), (ii) worked voluntarily during the last four weeks, and (iii) donated during the last year. All three indicators show higher values for individuals in German language regions and are again robust to the inclusion of covariates. Panel (d) is restricted to the LLBC sample and complements this evidence by showing a higher share of people who provide work for which they are not or only partially compensated in German compared to French language areas. Overall, unpaid private engagement seems to be more widespread among individuals in German language regions, consistent with the divergence in local norms documented in Section 5.1 above.

Demand for apprenticeship training
Another major concern for our identification strategy is that the demand for apprenticeship training may differ between language regions, which would lead to differences in the number of apprenticeship contracts and thus to differences in the incidence of firm training. In order to analyze the demand by apprenticeship candidates, we use additional data from the fourth survey on the costs and benefits of apprenticeship training (Gehret et al., 2019). In this survey, employers were asked about the number of applications they receive for each open apprenticeship position. Table 8 shows that employers in the French language regions receive more applications on average than those in the German language regions. The difference in the number of applications is large and statistically significant, both in the full and the LLBC sample. In the full sample, the number of applications received is about 21% to about 52% lower among employers located on the German side of the language border, depending on whether controls are taken into account or not. Similarly, in the LLBC sample, the number of applications for each open position is about 44% to 63% lower among firms located in the German language part of the country. Assuming that there is no difference in search behavior across the two language regions, these estimates suggest that the demand for apprenticeship training (relative to the supply) is actually higher, not lower, in the French language part of the country. This is probably the most compelling piece of empirical evidence in support of our claim that the difference in the training probability among employers is not (mainly) driven by differential demand for apprenticeship training.²¹ In the rest of this subsection, we look directly at alternative options for pupils after compulsory schooling. Besides apprenticeships, baccalaureate schools are the most relevant track at the upper-secondary level, with roughly 20% of all adolescents entering this track annually.
Firms recruiting apprentices after compulsory schooling thus compete to some extent with baccalaureate schools for high-ability pupils. Baccalaureate school rates at the upper-secondary level vary substantially across Swiss cantons and tend to be higher in French language cantons.²² Therefore, one may conclude that firms in French language regions simply do not train because they lack potential apprentices. However, this subsection examines two alternative tracks for pupils after compulsory school that could provide firms with apprenticeship applicants. Both these tracks are more common among French language pupils. Instead of applying for a training place at a firm, pupils opting for an apprenticeship after compulsory schooling may enter school-based apprenticeship programs. These programs equip youngsters with vocational skills similar to those provided during a firm-based apprenticeship program. Firm- and school-based apprenticeships are thus close substitutes from the viewpoint of pupils. Panel (a) of Table 9 shows that school-based apprenticeship programs as a proportion of the total apprenticeship cohort are more frequent in French language regions. This finding is ambiguous. It could mean that youth and parents in French language regions prefer school-based over firm-based apprenticeships; however, the finding is also consistent with a lower training commitment by firms in the French language part of the country, which forces more young people to attend school-based apprenticeship programs to acquire upper-secondary education.

²¹ We provide additional evidence in appendix Table A.4. Firstly, selection into apprenticeships might differ because surveys (e.g. Busemeyer et al., 2011) show that they enjoy a less favorable reputation among French than among German speakers. In column (1), we show that the difference in the PISA scores between those starting an apprenticeship and those opting for general education programs, i.e. baccalaureate schools, at the upper-secondary level is the same in the two language regions within the bilingual canton of Bern (coefficient of the interaction between the German and the apprenticeship dummy). This suggests that there is no differential selection of apprentices between the French language part and the German language part of the canton. Secondly, labor market perspectives could be better for apprenticeship graduates from German regions than for those from French regions. Using data from the Swiss Labor Force Surveys 2010-2014, we find that labor market outcomes (wages, unemployment) for apprenticeship graduates compared to graduates from baccalaureate schools are at least as good for French speakers as for German speakers (interaction term in columns 2 to 5). Finally, there is no language area difference in the apprentice wage conditional on controls (columns 6 and 7).

Table 7. Notes: Panel (d) is restricted to observations within 20 km of the language border; due to a lack of municipality identifiers, this restriction was not possible in the other panels. OLS regression of answers on a German language region dummy and demographic controls (i.e. age, gender, civic status, and nationality). Panel (a) "Reason for No" refers to a dummy being 1 if respondents indicate as reason for their individual no-vote in vote 503 (see Table A.2 in the appendix) either "economy should react by itself" or "firms not responsible for (apprenticeship-)market" or "can't be forced by the state" or "other statement concerning the responsibility of the economy" or "self-responsible" and 0 otherwise. Panel (a) "Reason for Yes" refers to a dummy being 1 if respondents indicate as reason for their individual yes-vote in vote 503 (see Table A.2 in the appendix) either "pupils need apprenticeship positions" or "not enough apprenticeship positions" or "right to get an apprenticeship positions" or "more opportunities for apprentices" and 0 otherwise. Panel (b), question (1) refers to a ten-point scale ranging from 1: "Ownership of business should be private" to 10: "Ownership of business should be public". Answers displayed in panels (c) and (

Table 9. Notes: The table shows the proportion of school-based apprenticeship programs of the total apprenticeship cohort (panel a) and employment rates for different age groups (panel b) in the full and the LLBC sample (20 km around the language border), respectively. The square brackets contain the number of observations in the respective cohort. ⋆⋆⋆, ⋆⋆, and ⋆ denote statistical significance at the 1%, 5%, and 10% level, respectively. Sources: Federal Statistical Office, Vocational Education and Training (VET), Apprenticeships 2013 (school-based apprenticeship programs) and Population Census 2000 (employment rates).
To shed more light on this issue, panel (b) of Table 9 looks at pupils' options for entering the labor market directly after compulsory schooling. Indeed, labor market participation for youth between age 16 and 18 is higher in French language regions. This is somewhat surprising, considering that this is reversed for older cohorts, as the rest of panel (b) displays. Pupils entering the labor market directly after compulsory schooling represent potential apprenticeship applicants for firms, suggesting that the bottleneck in apprenticeships in French language regions is on the supply rather than on the demand side.

22 The share of pupils opting for baccalaureate schools after compulsory schooling varies from the Cantons of Obwalden (11.0%), Glarus (12.2%), and Schaffhausen (13.0%) at the lower end to Ticino (27.3%), Geneva (29.4%), and Basel-Stadt (29.6%) at the upper end in 2016; data are taken from the FSO ( https://www.bfs.admin.ch/bfs/en/home/statistics/cataloguesdatabases/tables.assetdetail.2421478.html ).

Conclusions
In this paper, we show that norms rooted in culture affect firms' provision of training. We contrast firms within a small band around the inner-Swiss French-German language border, where voting results reveal substantial differences in norms favoring private engagement and the private instead of public provision of goods. Our empirical analysis shows that firms on both sides of the language border that are otherwise similar differ in their probability of providing apprenticeship training. Our preferred estimate yields a higher share of training firms on the German side of the language border of 4.4 percentage points compared to the French side of the language border. Considering firms' average training propensity of roughly 30% across Switzerland, this difference is not only statistically significant but also economically relevant. This result is robust across different specifications and subsamples. Moreover, our estimates are well in line with the estimates of Kuhn et al. (2019) , who estimate the same parameter, but with an entirely different empirical strategy and different firm data.
We discuss additional evidence for different norms and behavior in the populations in the two language regions: German-speaking individuals show considerably higher private engagement, e.g. in their propensity for voluntary work, and they have a stronger preference for private- over state-owned business than their French-speaking counterparts. While it is not possible to completely rule out differences in the demand for apprenticeship training at the language border, we show that training firms receive more, not fewer, applications for every vacant apprenticeship position in French language regions. These findings strongly suggest that lower training in French language municipalities is not due to lower demand but due to lower provision of training by firms.
Our results are also in line with a small literature that found effects of norms on other kinds of firm behavior, such as dismissals and gender pay gaps (Bassanini et al., 2017; Janssen et al., 2016). For the Swiss case, our findings contribute to understanding local and regional differences in training provision, despite federal regulation of the institution of apprenticeships. Looking at international differences in the prevalence of firm-based training for youth, our findings suggest that historically grown dual apprenticeship systems are embedded in a culture and societal norms that favor private engagement, which might limit the implementation of firm-based apprenticeships in other contexts, independently of potential economic returns for firms.

Notes: The table shows mean values across the LLBC sample (bandwidth of 20 km around the language border). All sample statistics are calculated at the firm level. Estimated differences of variables that are collected at the municipality level (local norm, distance to the language border, location and apprenticeship demand characteristics) are clustered at the municipality level. ⋆⋆⋆, ⋆⋆, and ⋆ denote statistical significance at the 1%, 5%, and 10% level, respectively. The two bilingual municipalities of Biel/Bienne and Fribourg/Freiburg are excluded; moreover, we do not observe at least one firm in every municipality. For data sources, see Table 1.

Appendix A. Additional tables and figures
Notes: The table shows estimates using individual data. Dependent variables are test scores (average of math, reading, and science) in column (1), log monthly earnings per full-time equivalent in columns (2) and (3), an unemployment dummy in columns (4) and (5), and log apprentice wage in columns (6) and (7). The German dummy is 1 for individuals living in German speaking regions and 0 for individuals living in French speaking regions. The apprenticeship dummy takes the value 1 for pupils planning to start an apprenticeship after compulsory schooling (in column 1) and for people who entered an apprenticeship after compulsory schooling (in columns 2-5), respectively, and 0 otherwise. ⋆⋆⋆, ⋆⋆, and ⋆ denote statistical significance at the 1%, 5%, and 10% level, respectively. Robust standard errors are given in parentheses. Data sources: Programme for International Student Assessment 2003; Swiss Labor Force Survey 1999-2014. We additionally restrict the samples in columns (2) to (5) to individuals who opted for one of the two main educational tracks at the upper-secondary level (i.e. apprenticeship or baccalaureate school).

Appendix B. Setting up the empirical analysis
In this appendix we discuss in some detail our approximation of a firm's distance to the language border, using two different distance measures, as well as our choice of a baseline bandwidth of 20 km around the language border.

B1. Travelling distances between municipalities
In a first step, however, we briefly describe the data that we use to determine firms' position relative to the French-German language border and ultimately to select the sample of municipalities and firms used in the empirical analysis. The data are from the Federal Office for Spatial Development ("Bundesamt für Raumentwicklung") and contain an estimate of the effective travelling distances between the Swiss municipalities, based on various data sources such as mobile data (e.g. ARE, 2010). Specifically, we use data on average effective travelling distances between all possible pairs of municipalities. 23 That is, the data do not simply represent Euclidean distances between pairs of municipalities, but the distances covered when actually travelling from one municipality to another along the available infrastructure (i.e. along roads and highways).
Using these data has both pros and cons. On the positive side, note that these distances implicitly take geographic features into account because they reflect distances travelled along the available infrastructure. For example, in steep terrain, travelling distances will be larger than linear distances because roads will run in a serpentine line in such terrain. Moreover, they naturally map an otherwise two-dimensional setup (i.e. coordinates in a plane) on a single dimension, which is conceptually and practically easier to work with (however, see Section 5.3 in the main text on the remaining issue that the effect might vary along the language border, which stretches from the northern to the southern country border). On the other hand, there are also some weaknesses. First, the data are aggregated at the municipality level, which means that we must assign the same travelling distance to the language border to all firms located in the same municipality. Second, we can only approximate the distance to the language border with the minimum travelling distance (as discussed in more detail below). This implies that there is necessarily measurement error in our variable used to determine a firm's distance to the language border; moreover, the measurement error is systematic because the distance to the language border is always, and necessarily, shorter than the distance to the nearest municipality located on the opposing side of the language border. See also appendix B.2 below for an alternative distance measure with somewhat different (dis)advantages.

B1.1. Approximating the distance to the language border
We proceed as follows to approximate the distance of a particular firm in the Business Census to the language border. First, we define the majority language for all 2352 Swiss municipalities based on the Population Census from the year 2000 and assign every municipality to the French or German language region. 24 As Fig. B.1 shows, this yields quite a sharp classification into either the French or the German language region. Nonetheless, close to the language border, there are some municipalities with a significant minority of individuals not speaking the dominant language.

23 More specifically, we have access to mean travelling distances by motorized individual traffic (i.e. distances covered when traveling by car, bike, or truck) on a weekday. Also note that the data are implicitly weighted by locations of different size within the same municipality in so far as there are more individual trips from and to larger locations than from and to smaller locations in a given municipality. Similarly, if there are several ways to travel from one to the other municipality, this is also reflected in our distance measure (again, weighted by the relative frequency of the different routes).

24 Like Eugster et al. (2017), we exclude the two bilingual cities of Biel/Bienne and Freiburg/Fribourg because these cannot be assigned unambiguously to a language-cultural region.
Then, for every French language municipality, we keep the shortest travel distance to any German language municipality, and vice versa for every German language municipality. In the full sample, travel distances for French language municipalities range from 1.4 to 141.9 km, for German language municipalities from 1.4 to 308.1 km (reflecting the fact that the German speaking regions are spread over a relatively larger part of the country; cf. Fig. 1 in the main text).
A graphical representation of this variable is shown in Fig. B.2 . Note that, broadly, the minimum travelling distance to the (other side of the) language border closely matches the language border, identified by the proportion of first-language speakers, as shown in Fig. 1 in the main text.
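The approximation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `min_distance_to_other_side`, the data layout, and all distances in the example are hypothetical and are not taken from the ARE data; only the municipality names come from the text.

```python
def min_distance_to_other_side(language, travel_km):
    """Shortest travelling distance from each municipality to any
    municipality whose majority language differs.

    language:  dict municipality -> "de" or "fr" (majority language)
    travel_km: dict (municipality_a, municipality_b) -> travelling distance,
               one entry per unordered pair (hypothetical layout)
    """
    nearest = {}
    for (a, b), km in travel_km.items():
        if language[a] == language[b]:
            continue  # same language region: not a border crossing
        for m in (a, b):
            nearest[m] = min(km, nearest.get(m, float("inf")))
    return nearest


# Hypothetical example values, for illustration only.
language = {"Saanen": "de", "Zweisimmen": "de",
            "Rougemont": "fr", "Château-d'Oex": "fr"}
travel_km = {
    ("Saanen", "Rougemont"): 7.0,
    ("Saanen", "Château-d'Oex"): 14.0,
    ("Zweisimmen", "Rougemont"): 22.0,
    ("Zweisimmen", "Château-d'Oex"): 29.0,
    ("Saanen", "Zweisimmen"): 15.0,
    ("Rougemont", "Château-d'Oex"): 7.0,
}

nearest_km = min_distance_to_other_side(language, travel_km)

# Signed version used for plotting: French-side distances are multiplied by -1.
signed_km = {m: (km if language[m] == "de" else -km)
             for m, km in nearest_km.items()}
```

Every firm in a municipality would then inherit that municipality's value, which is the source of the measurement error discussed in appendix B.1.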

B1.2. Defining local border regions
We also use these travelling distances to define a set of local border regions (LBRs). We use the LBRs in the empirical analysis to take into account that the German-French language border stretches from the northern to the southern country border, along a length of approximately 250 km. Including the LBRs as additional controls will make sure that we compare firms that are relatively close to each other along the north-south extension of the language border.
In a preliminary step, we identify those municipalities that are located directly at the language border (i.e. we identify those municipalities that show up as nearest municipality for one or more municipalities from the other side of the language border; in most cases, this should reflect that there is a direct possibility, i.e. a road, to cross the language border within the borders of this specific municipality). 25 In a second step, we then assign to the same LBR all neighboring municipalities that are located within the selected baseline bandwidth of 20 km (see appendix B.3 below) but that do not show up in the preceding step. There is no obvious way to implement this step in a purely data-driven way, and thus we mainly rely on the existing administrative borders and the topographical map of Switzerland in this step of the analysis. 26 In our baseline estimates, we use a set of eight distinct LBRs in most specifications (but the estimates are robust to changes in the number of LBRs, cf. column (6) of Table 4 in the main text). The baseline set of these eight LBRs is graphically illustrated in Fig. 3 in the main text.
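The preliminary step can be illustrated with a short sketch: a municipality counts as located directly at the border if it is the nearest other-side municipality for at least one municipality. The function name, the data layout, and the distances are hypothetical and serve illustration only. The second step (assigning the remaining municipalities to an LBR) relies on administrative borders and maps and is therefore not reproduced here.

```python
def border_municipalities(language, travel_km):
    """Municipalities that are the nearest other-side neighbour of at
    least one municipality (hypothetical helper, illustration only)."""
    nearest = {}  # municipality -> (distance, nearest municipality on the other side)
    for (a, b), km in travel_km.items():
        if language[a] == language[b]:
            continue  # pair lies within one language region
        for here, there in ((a, b), (b, a)):
            if here not in nearest or km < nearest[here][0]:
                nearest[here] = (km, there)
    return {there for _, there in nearest.values()}


# Hypothetical distances; names follow the example in footnotes 25 and 26.
language = {"Saanen": "de", "Zweisimmen": "de",
            "Rougemont": "fr", "Château-d'Oex": "fr"}
travel_km = {
    ("Saanen", "Rougemont"): 7.0,
    ("Saanen", "Château-d'Oex"): 14.0,
    ("Zweisimmen", "Rougemont"): 22.0,
    ("Zweisimmen", "Château-d'Oex"): 29.0,
}

at_border = border_municipalities(language, travel_km)
```

With these made-up distances, only Saanen and Rougemont are flagged as border municipalities, while Zweisimmen and Château-d'Oex would be candidates for the manual second step.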

B2. Geocoded locational information from the Business Census
Moreover, the Swiss Business Census itself also provides (restricted access to) spatial information, in the form of the exact coordinates of the physical location of a company ( BFS, 2009 ). 27 This information can also be used to construct an alternative measure of a firm's distance to the language border, i.e. by constructing the linear distance from a firm located on one side of the language border to the nearest firm located on the other side.
Because both available distance measures are subject to measurement error, it is not obvious whether this, supposedly more accurate, information is preferable over the travelling distances discussed in appendix B.1 above. As discussed above, when using the effective travelling distances, we must necessarily ignore the distance between a firm's location and the center of a municipality because the data are only available at the municipality level. In that case, the resulting measurement error can be either positive or negative because firms located in the same municipality can be either closer to or farther away from the language border, depending on their exact location within the municipality borders. In contrast, when using firm-based linear distances, we must ignore the fact that effective travelling distances are in large part dictated by geographic features of the landscape. Thus in this case the measurement error is expected to be systematically positive. Indeed, for our LLBC sample of firms, a simple regression of the travelling on the linear distance to the language border shows that, as expected, effective travelling distances tend to be substantively higher than linear distances. 28 Based on this direct comparison between the two available distance measures, we decided to use the effective travelling distances as our main distance variable. However, we also show a few estimation results that use this information instead of the effective travelling distances (i.e. we show estimates that include the firm-level coordinates as additional controls in column (8) of Table 4 in the main text, and we also show some specifications where the selection of the analysis sample is based on these linear distances rather than on effective travelling distances; see Table 5 in the main text).

25 For example, the two municipalities of "Saanen" (located in the German-speaking part of the canton of Bern) and "Rougemont" (located in the French-speaking canton of Vaud) are neighboring municipalities, located on either side of the language border. One can drive by car or bike from one municipality directly to the other, and vice versa, while passing the language border at the same time.

26 Using the same example as in footnote 25, we then add, for example, the municipalities of "Zweisimmen" (located to the east of "Saanen") and of "Château-d'Oex" (located to the west of "Rougemont") to the same LBR. Neither of these two municipalities provides a direct access to the other side of the language border, but both are located close to the language border.

27 In most cases, these coordinates point to the physical building or grounds in which the company is located. In cases where the data do not contain the exact location, they refer to the center of a grid (of the size of 1 × 1 km) within which the firm is located.

28 Specifically, we find that effective = 5.457 + 1.004 × linear, with an R-squared of 0.617. Thus the correlation between the two measures equals 0.786, implying that the two measures are strongly (but far from perfectly) correlated with each other. Note that in the LLBC sample, the relative difference between the two measures is largest for those firms located closest to the border.

Fig. B.1. Share of German-speakers around the language border. Notes: The figure plots the share of German-speaking individuals against the travelling distance to the language border (the data are aggregated by bins of travelling distances, in steps of one kilometer). To draw the figure, travelling distances from French-speaking municipalities are multiplied by -1.

Fig. B.2. Minimum travelling distance to the language border. Notes: The figure shows the minimum effective travelling distance from a municipality to the language border. The figure zooms in on the western part of the country (cf. Fig. 1 in the main text for an overall map of Switzerland).

Fig. B.3. Municipality-level norms by minimum distance to the language border. Notes: The figure plots our municipality-level measure of the norm towards private engagement against the travelling distance to the language border (the data are aggregated by bins of travelling distances, in steps of one kilometer). To draw the figure, travelling distances from French-speaking municipalities are multiplied by -1.
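The figures reported in footnote 28 can be cross-checked with a few lines of arithmetic. In a simple regression with an intercept, the R-squared equals the squared correlation between the two variables, so the correlation follows from the square root of the R-squared (with a positive sign, since the slope is positive). The variable names below are hypothetical, and the 10 km input is an arbitrary illustration.

```python
import math

# Values reported in footnote 28.
intercept, slope, r_squared = 5.457, 1.004, 0.617

# Simple regression with intercept: R-squared equals the squared correlation.
correlation = math.sqrt(r_squared)

# Fitted effective travelling distance for an (arbitrary) linear distance of 10 km.
linear_km = 10.0
predicted_effective_km = intercept + slope * linear_km

print(f"implied correlation: {correlation:.4f}")
print(f"fitted travelling distance at {linear_km} km: {predicted_effective_km:.3f} km")
```

The implied correlation of about 0.7855 agrees with the reported 0.786 up to the rounding of the R-squared. The fitted line also illustrates why the relative gap between the two measures is largest close to the border: the intercept of roughly 5.5 km matters most when the linear distance is small.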

B3. Selecting the appropriate bandwidth
We next discuss in some detail how we use these data to determine the baseline bandwidth around the language border. This is, besides the definition of a firm's distance to the language border, the key ancillary parameter of our empirical analysis because it determines both the subset of municipalities and of firms that enter our estimation sample. Note that, in all the analyses presented below, we use the travelling distances from appendix B.1 to measure a firm's distance to the language border.

B3.1. Searching for a bandwidth where norms are constant within each language region
In a first step, we use our measure of the local norm towards private engagement, $N_m$, to determine a bandwidth such that the norm is approximately constant within each of the two regions, albeit different across the two regions. As a feature that is interesting in its own right, the data clearly show a global, highly nonlinear trend in $N_m$ when travelling closer to or farther away from the German-French language border, as is evident from Fig. B.3. This pattern suggests that it may be unreasonable to argue that we can conceptualize a firm's cultural background simply by a binary variable indicating whether it belongs to the French- or the German-speaking part of the country.

Thus our aim in this first ancillary step of the analysis is to select a bandwidth around the language border such that $N_m$ is approximately constant on either side of the language border. If it is possible to find such a bandwidth, we would argue that this supports our conceptualization of culture as a binary variable, as well as the argument that the assumption of unconfoundedness between the language-region dummy $G_m$ and the error term should become more plausible when focusing on firms located closer to the language border (cf. Keele and Titiunik, 2016).

To search for such a bandwidth in a data-driven way, we run a series of ancillary regressions of the following form:

$$N_m = \beta_0 + \beta_1 G_m + \beta_2 D_m + \beta_3 (G_m \times D_m) + \epsilon_m \quad \forall\, m \in \{m : D_m \le b\}, \tag{B.1}$$

with $N_m$ denoting our baseline measure of the local norm towards the role of the state in municipality $m$. Again, $G_m$ is a dummy variable indicating whether municipality $m$ is a German-speaking region (in which case $G_m = 1$, and 0 otherwise), and $D_m$ corresponds to the (minimum) travelling distance, in kilometers, to the other side of the language border (as explained in appendix B.1 above). In estimating equation (B.1), we let the bandwidth $b$ vary from 140 km (the maximum distance that covers firms from both sides of the language border) down to 5 km, in steps of one kilometer, i.e. we estimate equation (B.1) for an ever narrower bandwidth around the language border (in the figure, however, we only show the test statistics for bandwidths in the range of 5 to 50 km). For each of these regressions, we test the null hypothesis that $\beta_2 = \beta_3 = 0$, i.e. that the two coefficients associated with $D_m$ are simultaneously equal to zero. Fig. B.4 plots the resulting F-statistic against the corresponding bandwidth and also indicates the associated p-value (i.e. the hollow (filled) circles denote test statistics that are associated with a p-value equal to or larger than (smaller than) 0.1). Consistent with Fig. B.3 above, the associated F-statistic is very large for broader bandwidths. At the same time, the figure also shows that it is possible to select a bandwidth such that $N_m$ becomes approximately constant on either side of the language border. Specifically, the figure suggests that norms towards private engagement become approximately constant within each of the two language regions once the bandwidth is set to about 45 km or lower.
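The bandwidth search can be sketched as follows. This is a minimal illustration, not the estimation code behind Fig. B.4: it assumes a classical (homoskedastic) F-test, uses only the Python standard library, and runs on simulated municipalities in which the norm is flat within 25 km of the border and trends upwards beyond it. All names (`ols_rss`, `f_stat_distance_terms`, `rows`) and all numbers in the simulation are hypothetical.

```python
import random


def ols_rss(X, y):
    """Residual sum of squares from an OLS fit of y on the columns of X."""
    k = len(X[0])
    # Augmented normal equations [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    return sum((yi - sum(b * x for b, x in zip(beta, r))) ** 2
               for r, yi in zip(X, y))


def f_stat_distance_terms(rows, bandwidth):
    """F-statistic for H0: the coefficients on distance and on the
    German x distance interaction are both zero, within the bandwidth."""
    sample = [(n, g, d) for n, g, d in rows if d <= bandwidth]
    y = [n for n, _, _ in sample]
    unrestricted = [[1.0, g, d, g * d] for _, g, d in sample]
    restricted = [[1.0, g] for _, g, _ in sample]
    rss_u, rss_r = ols_rss(unrestricted, y), ols_rss(restricted, y)
    q, dof = 2, len(sample) - 4  # two restrictions, four estimated parameters
    return ((rss_r - rss_u) / q) / (rss_u / dof)


# Simulated municipalities: (norm, German dummy, distance in km).
rng = random.Random(0)
rows = []
for _ in range(600):
    g = float(rng.randint(0, 1))
    d = rng.uniform(1.0, 60.0)
    norm = 0.40 + 0.15 * g + 0.01 * max(d - 25.0, 0.0) + rng.gauss(0.0, 0.02)
    rows.append((norm, g, d))

for b in (50, 40, 30, 20, 15):
    print(b, round(f_stat_distance_terms(rows, b), 2))
```

In the simulated data the F-statistic is very large for wide bandwidths and drops sharply once the bandwidth falls below the point where the trend sets in, which mirrors the qualitative pattern described for Fig. B.4.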
At the same time, however, we also want to show that the language regions do not become balanced with regard to the prevailing local norm towards private engagement, as this would undermine our substantive argument concerning the hypothesized underlying mechanism. We explicitly take up this issue in Section 5.1 in the main text.

B3.2. Smoothness in the density of observations around the language border
In a second step, we look at the density of both firms and municipalities within different bandwidths around the language border (again restricted to a maximum bandwidth of 50 km around the language border, consistent with the preceding step of the analysis). In our specific context, it is the natural clustering of observations, for example due to the location of larger metropolitan areas at a specific distance to the language border, that may be of potential concern since larger urban areas have a substantively lower training incidence than smaller localities (see Barreca et al., 2016, who discuss the same issue in the context of the regression discontinuity design). Fig. B.5 thus simply plots the cumulative density of firms, starting at and then moving away from the language border. While there are some regional differences in the density of firms across the full range considered, these differences become distinctly more important at distances of about 30 km or more. 29 Thus Fig. B.5 suggests setting the bandwidth to 30 km at most.

Fig. B.4. Testing for stable norms within the two language regions. Notes: The figure plots the F-statistic associated with the null hypothesis that the norm towards private engagement is approximately constant on either side of the language border (cf. Eq. B.1). The hollow (filled) circles denote that the corresponding F-statistic has a p-value equal to or larger than (smaller than) 0.1.

Fig. B.5. Smoothness in the density of firms around the language border. Notes: The figure shows the cumulative number of firm-level observations in a given distance away from the language border. The two sharp increases in the number of firms on the German-speaking side of the language border coincide with the position of the cities of Bern (about 30 km away from the language border) and Basel (about 37 km away from the language border).
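A cumulative count of the kind plotted in Fig. B.5 can be computed as in the following sketch. The firm distances are made up for illustration, with a cluster on the German-speaking side at 30 and 37 km mimicking the jumps attributed to Bern and Basel; the function name `cumulative_counts` is likewise hypothetical.

```python
from bisect import bisect_right


def cumulative_counts(distances_km, grid_km):
    """Number of firms located within each distance on the grid."""
    ordered = sorted(distances_km)
    return [bisect_right(ordered, g) for g in grid_km]


# Hypothetical firm-level distances to the language border (km).
german_side = [2.0, 4.5, 8.0, 12.0, 19.0, 30.0, 30.0, 30.0, 30.0, 37.0, 37.0, 37.0]
french_side = [1.5, 5.0, 9.0, 14.0, 21.0, 26.0, 33.0, 41.0]
grid = [10, 20, 30, 40]

german_cumulative = cumulative_counts(german_side, grid)
french_cumulative = cumulative_counts(french_side, grid)
print(german_cumulative, french_cumulative)
```

In this toy example the two sides accumulate firms at a similar pace up to 20 km, after which the German-speaking side jumps, which is the kind of discontinuity that motivates capping the bandwidth.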

B3.3. Balancing of covariates
In a complementary step, we use a set of both firm- and municipality-level control variables and check whether narrowing the bandwidth leads to a better balancing of the covariates across the two language regions (cf. Keele and Titiunik, 2016). Fig. B.6 shows the robust F-statistic from testing the overall null hypothesis that, in a regression of the German language region dummy $G_i$ on the full set of controls and for a chosen bandwidth $b$, all coefficients are simultaneously equal to zero, i.e. we estimate:

$$G_i = \gamma_0 + X_i' \gamma_1 + Z_{m(i)}' \gamma_2 + u_i \quad \forall\, i \in \{i : D_{m(i)} \le b\}, \tag{B.2}$$

for different bandwidths $b$ and test whether $\gamma_1 = \gamma_2 = 0$, where $X_i$ and $Z_{m(i)}$ denote the firm- and municipality-level controls of firm $i$ located in municipality $m(i)$, and $D_{m(i)}$ denotes the travelling distance to the language border. The pattern from Fig. B.6 shows that the resulting F-statistic is high for larger bandwidths, reflecting the fact that, in general, the firms from the two language regions differ substantively from each other. However, and as expected, the figure also shows that the two subsamples become more balanced when we focus on a narrow bandwidth. Based on this figure, a bandwidth in the range of about 10 to 30 km appears most appropriate, in the sense that the two subsamples are relatively well balanced for these bandwidths. See also

29 It turns out that this pattern is indeed driven by metropolitan areas, which are situated at specific distances away from the language border. Specifically, Bern is located 30 km away from the language border, Basel 37 km (both cities are located in the German-speaking part of the country).

B3.4. Determining a minimum bandwidth
Finally, we discuss the issue that it may be necessary and reasonable, in our specific context, to determine a minimum bandwidth as well. A first reason is the measurement error in our distance measures, as discussed above. A second and more important reason is based on the fact that, when using effective travelling distances, changing the bandwidth is associated with a municipality-wise inclusion or exclusion of observations; moreover, for small bandwidths, this may imply a rather haphazard selection of particular municipalities on this or that side of the language border. Fig. B.7 illustrates the second issue very clearly: once we set the bandwidth to 10 or 5 km, we effectively select a small set of municipalities that appears to be scattered quite haphazardly along either side of the language border. In effect, then, this implies that we are comparing firms from different locations along the language border with each other once the bandwidth becomes too small (this may also explain why the covariates become less well balanced again when choosing too small a bandwidth; see Fig. B.6 above). We would argue that this undermines rather than strengthens our empirical approach.

Fig. B.6. Testing for covariate balance across the two language regions. Notes: The figure plots the F-statistic associated with the null hypothesis that the control variables are approximately balanced across the two language regions (cf. equation (B.2)). The hollow (filled) circles denote that the corresponding F-statistic has a p-value equal to or larger than (smaller than) 0.1.

Fig. B.7. Determining a minimum bandwidth. Notes: The figure maps the set of municipalities located on either side of the language border within a given bandwidth (bw), with lighter (darker) shaded areas denoting municipalities located on the German-speaking (French-speaking) side of the language border. The figures zoom in on the western part of the country (cf. Fig. 1 in the main text for an overall map of Switzerland).

B3.5. Baseline bandwidth
Summing up our discussion related to the choice of the appropriate bandwidth, we have first determined a maximum bandwidth of about 45 to 50 km. Then, focusing on both covariate balance and smoothness in the density of firms and municipalities, respectively, we have further narrowed down the bandwidth to a range of about 10 to 25 km. We have also shown that using a bandwidth of 10 km or less may have undesirable side effects, mainly because the selection of municipalities (and thus firms) becomes haphazard, in the sense that the set of municipalities, and thus firms, no longer represents the full language border; this, in turn, also reflects the fact that firms are selected in blocks because our baseline distance measure is defined at the municipality level only.
In our empirical analysis presented in the main text, we will therefore set the baseline bandwidth to 20 km, but we will also show estimates based on a slightly smaller/larger bandwidth (15 and 25 km, respectively).