China's domestic production networks

This paper examines China’s domestic production networks. It uses VAT invoices to build interprovincial input-output tables for 2002 and 2012. These are combined with population censuses to determine the location of workers involved in production. We document i) increased trade in intermediate inputs between provinces; ii) inter-provincial production fragmentation that differs by product; iii) substitution of domestic for foreign intermediates, resulting in increased domestic value added in exports. Information about the occupations of workers suggests that iv) richer coastal areas such as Beijing, Tianjin, and Shanghai specialize in R&D and marketing activities, whereas v) inland provinces specialize in production activities.


Introduction
Trade agreements and better communication and management systems have contributed to the fragmentation of production. Nowadays, firms source inputs from a variety of suppliers in far-away places. A rapidly growing body of literature studies the causes and consequences of this slicing up of global value chains (Baldwin, 2016). The focus of this literature is on the effects of global sourcing on industries, firms and workers, treating countries as points in space with little regard for the economic linkages between firms within economies.
Measuring the activities performed by China, the world's largest manufacturer, and understanding how these activities shape global trade are important topics for academics and policy makers. Yet who does what, and where? Much remains unknown about how production is fragmented and where economic activities are located within China.
The main contribution of this paper is to build a new database and document stylized facts on the evolution of China's domestic […] provincial governments set up the policy of geben qiancheng (everyone runs their own race) or gexian shentong (everyone shows their particular achievements), which resulted in a situation of gezi weizheng (everyone acts consciously without regard for the general interest)." One of the economic reforms was fiscal decentralization. It gave local governments an incentive to shield resident firms from interprovincial competition in order to maintain their tax base. In addition, local governments protected state-owned enterprises under their administration, as these enterprises provided a base for political power, private benefits, and fiscal revenue, and sustained local employment (Bai, Du, Tao, & Tong, 2004; Naughton, 2003). Local governments imposed non-tariff barriers on non-local firms, such as administrative charges, biased technical standards, and bureaucratic red tape (Poncet, 2004). This increased the cost of cross-border trade and investment. The central authority did little to prevent this, apart from introducing the value added tax system and reassigning central and local taxes in 1994.
The barriers to internal trade eventually culminated in inter-provincial conflicts over the supply of resources. Such clashes occurred for various products, including intermediate inputs for manufactures (e.g. cotton, silk, and wool), foodstuffs (e.g. eggs and pork), and local spices (e.g. mint oil and aniseed) (Poncet, 2004). For example, the southwestern province Sichuan is the main supplier of cocoons used for silk, whereas the main producers of silk are located in coastal provinces, in particular Zhejiang (textile production in Zhejiang is further discussed in section 5). To stimulate local silk production, Sichuan restricted exports of silkworms to coastal provinces. In 1988, it went as far as deploying armed forces and militia on the provincial borders to enforce these export restrictions (Poncet, 2004). 1 Thus, local governments had incentives to interfere in local economic activities, as well as power over production and trade. As the silk dispute illustrates, local interventions often involved both sectors that provided raw material inputs (intermediate inputs) and sectors that produced final manufacturing goods. Typically, prices of raw materials were kept low to encourage the development of manufacturing industries that made intensive use of energy and other basic materials. That is, local governments intervened in industries that provided intermediate inputs to ease constraints on final goods producers, and they intervened in final goods producers to lay a claim on industries providing raw material inputs (Naughton, 2003; Young, 2000).
In addition, China's geographic area is vast and its topography includes much rugged, mountainous terrain. For example, Fujian is geographically separated from other provinces by mountains and, until the early 1990s, had few transport connections (Naughton, 2003).
One might expect that economic development would result in integration of hitherto little connected provinces. Yet, this did not materialize due to another prominent reform after 1978, namely the proliferation of Export Processing Zones (EPZs). The EPZs were one of the stepping stones China used to cross the river towards a modern economy. The International Labour Organization (ILO) defines an EPZ as a designated geographic area with export-oriented firms that are attracted to the location by favorable investment and trade conditions. However, in China, EPZs are typically jurisdictions rather than physical zones (Fu & Gao, 2007). Furthermore, these zones take many forms, which can be grouped into special economic zones and development zones. The latter include, among others, economic and technology development zones, high-tech industrial development zones, logistics parks, industrial parks, and border economic cooperation zones. The first EPZ in China was set up in Shenzhen in 1979. It was followed by several other EPZs in the years thereafter. Major experimentation with a broader set of special economic zones and development zones was initiated in the 1990s, starting in the Shanghai Pudong New Area. After China's accession to the WTO in 2001 another wave of EPZs followed, which provide employment for millions of workers. Most of these EPZs are concentrated in the coastal provinces of China, with the exception of border economic cooperation zones (Fu & Gao, 2007).
Because of EPZs, coastal provinces developed stronger trade linkages with the global market than with the rest of China. Indeed, geography and limited transport infrastructure until the early 1990s hindered inter-provincial trade and encouraged an outward orientation of coastal provinces (Lemoine, Poncet, & Ünal, 2015).
Some forms of export processing involve manufacturing services for foreign companies and only require importing inputs from abroad (van Assche & van Biesebroeck, 2018). Yet many other activities in EPZs involve domestic intermediate inputs, which were mainly sourced locally (intra-province). The concentration of EPZs in coastal provinces facilitated the development of these provinces. Also, coastal provinces were able to retain the major part of their foreign currency income and accounted for most infrastructure investment (Poncet, 2004). This widened the development gap between the east and the west. Central authorities were aware of this situation and stimulated cooperation between provinces by setting up multi-regional economic cooperation zones, aimed at developing inter-provincial linkages, alongside fiscal transfer payments between provinces. However, few such zones were created and their economic activity was limited (Poncet, 2004). Core-periphery thinking was prevalent, and thus differing development speeds among regions were accepted in national policy discussions.
After the mid-1990s, the central authority recognized the widening income gap between provinces as a major issue. Moreover, with China's entry into the WTO in 2001, the central government faced substantial challenges in complying with WTO rules to promote an integrated domestic market. Hence, the government needed to strike a balance between further economic reforms and the resulting adjustment costs, while limiting protectionism by local governments (Poncet, 2005). It therefore implemented policies to liberalize the flow of goods, services and capital within China, and restructured agriculture and state-owned enterprises. Starting in the 2000s, regional development policies were launched, such as "the development of the West" and "revitalization of the Northeast". Effectively, this implied a shift from core-periphery thinking towards a more balanced growth strategy.
Poncet (2004, 2005) uses data extracted from provincial input-output tables to examine internal and international trade for the years 1987, 1992, and 1997. She examines the share of local goods (intra-provincial), the share of goods imported from other provinces (inter-provincial), and the share of international goods in provinces' GDP. No distinction is made between goods for intermediate and final use. However, the commodity composition of inter-provincial trade reveals that it is dominated by intra-industry trade in final manufacturing goods rather than intermediate inputs (Naughton, 2003). Poncet (2004) documents a substantial decline in the share of inter-provincial trade between 1987 and 1997. On average, the share of inter-provincial imports in GDP declined from 53.7 to 38.1%. In contrast, the share of international imports in GDP increased, in particular in coastal provinces, where it rose from 5.4 to 24.9% between 1987 and 1997. The findings by Poncet (2004, 2005) align with patterns documented by Young (2000) and indicate segmentation of the Chinese economy extending into the 1990s. Li (2011) and Xing et al. (2015) examine inter-provincial trade during the period from 1997 to 2009. 2 They observe a substantial internal home (i.e. province) trade bias, reflecting the long-lasting effects of local protectionism. Yet they also find that inter-provincial trade expanded rapidly. Much of this trade is with neighboring provinces, and they document three clusters, namely in the Yangtze river delta, the Pearl river delta, and the Beijing-Tianjin-Hebei region. Expanding inter-provincial trade indicates closer economic cooperation between provinces since the 2000s. Xing et al. (2015) also document substantial intra-industry trade between provinces. This suggests a certain degree of product variety and/or input sourcing. 3 Often, this trade is dominated by specific industries. For example, Jilin province is home to China's largest truck producer. Truck producers in Jilin account for a substantial share of inter-provincial trade in intermediate inputs of transport equipment between Jilin and other provinces.
1 Other examples include the wool dispute between provinces in the north-west and the rest of the country; the rice dispute between Hunan (rice grower) and Guangdong; and the cereal dispute between Fujian and Hubei, Zhejiang, Jiangxi, Anhui, and Henan (cereal growers). All these 'disputes' took place in the late 1980s; see Poncet (2004).

Review of empirical evidence on trade linkages
Rising labor and land costs, as well as shifts in government policies, pushed part of the production activities to northern and western provinces (Naughton, 2007). These less industrialized and less urbanized provinces had a substantial surplus labor force in rural areas. This shift in the location of production activities was facilitated by rapid improvements in transport infrastructure. China carried out massive investments in roads: the expressway network grew from 147 km in 1988 to 98,000 km by 2012, and more than half of these expressways are located in central and western provinces (Li, Wu, & Chen, 2017). Infrastructure development also included the expansion of railways and the introduction of bullet trains. In the 1990s, the average speed of Chinese trains was below 60 km/h. By 2010, bullet trains were running at 350 km/h on a network of over 8,000 km (Zheng & Kahn, 2013). These investments lowered transportation costs. They also enabled firms to keep their headquarters in megacities, such as Beijing, Shanghai, and Guangzhou, while transferring capital and technology for production activities to the interior (Lemoine et al., 2015).
Overall, this suggests that a process of restructuring is underway, whereby regionalism gives way to inter-provincial production fragmentation. Increasingly, industries in central and western provinces appear to participate in China's domestic production network by providing intermediate inputs to exporting firms in coastal provinces (Meng et al., 2017; Meng, Wang, & Koopman, 2013; Pei et al., 2017). The implications for task specialization in China's domestic production network are an empirical matter, which we explore in subsequent sections.

Data
This section presents the data to study China's domestic production networks. We first discuss the construction of Inter-Provincial Input-Output tables. Next, we describe the use of census data. Details are relegated to Appendix A.

Inter-provincial input-output tables
Inter-provincial input-output tables describe economic flows between province-industry pairs. Such tables are often used to examine the interdependence of subnational areas (Mi et al.; van Assche & van Biesebroeck, 2018). Yet a key difficulty in the construction of IPIO tables is the measurement of inter-industry, inter-provincial transactions. Occasionally, detailed survey data are available to measure internal trade. For example, the Japanese Ministry of Economy, Trade and Industry reports internal trade statistics from local customs offices based on production and distribution surveys (Ikeuchi, Belderbos, Fukao, Kim, & Kwon, 2015). However, most countries do not measure internal trade flows using surveys. Instead, scholars estimate internal trade using imputation methods. This is also the case for existing inter-provincial input-output tables for China, where internal trade flows are estimated using a variety of econometric approaches. 4
This paper uses VAT invoices, collected by the China General State Administration of Taxation, which provide a direct measurement of inter-firm transactions. VAT invoices report the sales value and the location of the firm, and provide a 4-digit industry classification of the firm. Inter-provincial trade is observed if the selling and buying firms are located in different provinces. If the firms are located in the same province, the transaction is intra-provincial trade. By aggregating transaction values, inter- and intra-provincial trade flows by province-industry pairs are calculated for 2003 and 2012. 5 A firm is required to pay VAT if it sells a product to another firm. The tax administration has implemented a rigorous tax-collection system that inspects and audits the VAT paid by taxpayers throughout China. There is also a built-in mechanism that motivates firms to report the value of their transactions accurately, because a firm's VAT liability equals the VAT in its sales invoices minus the VAT in its purchase invoices. VAT invoices therefore appear to provide a reasonable measure of internal trade. However, VAT data are not without limitations. In particular, the tax administration applies a size threshold to the VAT invoices it provided to us: only data of firms with annual merchandise transactions above 5 million yuan are included. As a result, many small firms that pay VAT are not included, but their share in production is limited. For example, in 2004 large firms accounted for about 90% of total manufacturing sales. Furthermore, small firms typically sell their products locally. Hence, excluding small firms is unlikely to have a substantial effect on our measurement of inter-regional production networks.
2 Xing et al. (2015) use existing inter-provincial input-output tables for 1997, 2002, and 2007. These tables are further discussed in section 3. The tax invoices Xing et al. (2015) use are from 2003 to 2009. Xing et al. (2015) do not combine tax invoices and input-output tables to examine the evolution of China's production network.
3 Input sourcing is defined in a narrow sense as taking place within industries across provinces.
Another limitation of using VAT invoices is that intra-firm transactions (deliveries between branches of multi-plant firms) are not covered by VAT invoices. These transactions have likely grown in importance, implying that our analysis may underestimate the expansion of China's domestic production network.
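The aggregation of invoice values into intra- and inter-provincial trade flows by province-industry pair can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' processing code. The `Invoice` record layout, its field names, and the toy values are hypothetical, and each invoice is assumed to already carry the province and industry of both the selling and the buying firm.

```python
from collections import defaultdict
from typing import NamedTuple


class Invoice(NamedTuple):
    """One VAT invoice (hypothetical record layout for illustration)."""
    seller_province: str
    seller_industry: str
    buyer_province: str
    buyer_industry: str
    value: float  # sales value reported on the invoice


def aggregate_flows(invoices):
    """Sum invoice values by (seller pair, buyer pair) and split the total
    into intra-provincial and inter-provincial trade."""
    flows = defaultdict(float)
    intra, inter = 0.0, 0.0
    for inv in invoices:
        seller = (inv.seller_province, inv.seller_industry)
        buyer = (inv.buyer_province, inv.buyer_industry)
        flows[(seller, buyer)] += inv.value
        if inv.seller_province == inv.buyer_province:
            intra += inv.value   # both firms in the same province
        else:
            inter += inv.value   # firms in different provinces
    return dict(flows), intra, inter


# Toy invoices (made-up values)
invoices = [
    Invoice("Sichuan", "silk", "Zhejiang", "textiles", 40.0),
    Invoice("Zhejiang", "silk", "Zhejiang", "textiles", 60.0),
    Invoice("Sichuan", "silk", "Zhejiang", "textiles", 10.0),
]
flows, intra, inter = aggregate_flows(invoices)
print(flows[(("Sichuan", "silk"), ("Zhejiang", "textiles"))])  # 50.0
print(intra, inter)  # 60.0 50.0
```

With the actual data, flows aggregated this way would presumably still need to be mapped from 4-digit industries to the IPIO industries and reconciled with the provincial SUTs, as described in Appendix A.3.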
We obtain input-output tables for 31 provinces from the National Bureau of Statistics (NBS). 6 These are available for 2002 and 2012. We use these tables to derive Supply and Use Tables (SUTs) for each province, as described in Appendix A.2. SUTs have products in their rows and industries in their columns. They can thus easily be combined with the VAT trade flows, which are product-based, and with the worker data (discussed below), which are industry-based.
The VAT inter-firm transactions are combined with the provincial SUTs to create the Inter-Provincial Input-Output tables for 2002 and 2012. The compilation of the IPIOs is described in Appendix A.3. The IPIOs show how the output of a given industry in a given province is divided between intermediate use and final consumption by all other province-industry pairs in China. The IPIOs provide data on $n_i = 39$ industries and $n_p = 31$ provinces, and also include a column with international exports by each province-industry pair. The basic structure of the IPIO for a given year is shown in Fig. A1.

Wages and occupations of workers
We use the 2000 and 2010 China Population Censuses. We obtained access to the 0.1% samples of the censuses, with approximately 1.2 million observations for 2000 and 1.3 million for 2010. We observe where an individual is located, their occupation, and their industry of employment. We use this information to measure occupational employment shares by province-industry pairs.
We follow  and map occupations to four so-called business functions: production, R&D and technology development (abbreviated R&D), sales, marketing and distribution activities (Marketing), and other support activities (Other). 7 This constitutes a relevant level of analysis as firms tend to organize their activities this way due to internal economies of scale (Porter, 1985).
We could measure business function labor income shares for each of the 1209 (39 industries times 31 provinces) unique province-industry pairs in the IPIOs. However, given the sample size, the number of observations available to estimate each share would be limited. We therefore measure occupational employment shares for each of the $n_p = 31$ provinces by 9 broad sectors, and assume these shares are equal for the more disaggregated sub-sectors. To measure relative wages by occupation, we use the 2002 and 2013 China Household Income Project (CHIP) surveys. We then combine the relative wages with the occupational employment shares by province-sector pair to derive labor income shares by province, sector, year, and business function. 8 This is described in greater detail in Appendix A.4.
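Appendix A.4 is not reproduced here, so the following is only a sketch of one plausible way to combine the two sources, under an assumed formula. The labor income share of function k in a province-sector cell is taken to be its employment share weighted by its relative wage, normalized so that the shares sum to one. The function name, the sector mapping, and all numbers are hypothetical.

```python
def labor_income_shares(emp_shares, rel_wages):
    """Labor income share by business function for one province-sector cell.

    emp_shares: function -> occupational employment share (census)
    rel_wages:  function -> relative wage of that function (household survey)
    Assumed formula: share_k = emp_k * w_k / sum_j (emp_j * w_j)
    """
    income = {k: emp_shares[k] * rel_wages[k] for k in emp_shares}
    total = sum(income.values())
    return {k: value / total for k, value in income.items()}


# Toy province-sector cell (hypothetical numbers)
emp = {"Production": 0.70, "R&D": 0.05, "Marketing": 0.15, "Other": 0.10}
wage = {"Production": 1.0, "R&D": 2.0, "Marketing": 1.5, "Other": 1.2}
shares = labor_income_shares(emp, wage)
for function, share in shares.items():
    print(f"{function}: {share:.3f}")

# Broad-sector shares are assigned unchanged to every sub-sector of that sector
# (hypothetical sector mapping).
sector_of = {"textiles": "manufacturing", "electronics": "manufacturing"}
cell_shares = {"manufacturing": shares}
industry_shares = {ind: cell_shares[sec] for ind, sec in sector_of.items()}
```

Functions with above-average relative wages (here R&D and Marketing) receive a larger share of labor income than of employment, which is why both inputs are needed.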

Methodology
This section describes the methods used to measure inter-provincial sourcing of intermediate inputs (subsection 4.1), domestic value added in exports (4.2), and functional specialization in international trade (4.3). We first define the variables used. A product is either exported, domestically consumed, or used as an intermediate input. Let $e$ be an $n_i n_p \times 1$ vector of exports, where $i$ denotes products and $p$ denotes provinces. Let $f$ be an $n_i n_p \times 1$ vector denoting domestic consumption (final consumption plus capital formation). Let $z$ be an $n_i n_p \times 1$ vector denoting the use of product $i$ of province $p$ as an intermediate input. The $n_i n_p \times 1$ vector of output $s$ is thus split between exports, domestic consumption, and intermediate inputs:

$$s = e + f + z \tag{1}$$

Next, note that:

$$z = A s, \qquad A = Z \hat{s}^{-1} \tag{2}$$

where $A$ is the $n_i n_p \times n_i n_p$ domestic intermediate input coefficients matrix, with typical element $A_{pp'}(a,b)$ denoting the amount of intermediate input $a$ made in province $p$ used to produce one unit of good $b$ made in province $p'$, where $p$ and $p'$ can be any province. The symbol '^' indicates that the output vector $s$ is put on the main diagonal of an $n_i n_p \times n_i n_p$ matrix with zeros otherwise. $Z$ is the $n_i n_p \times n_i n_p$ matrix of intermediate input flows between province-industry pairs, which is estimated using the VAT invoices. 9 Let $m$ be the $n_i n_p \times 1$ vector of imported intermediate inputs from abroad. Then, let $v$ be an $n_i n_p \times 1$ vector of value added, which is output minus domestic and imported intermediate inputs. Value added $v$ is the sum of capital income, $v_c$, and labor income, $v_l$.
4 A selective overview: a 1997 inter-regional input-output table was the result of a joint research project by IDE-JETRO and the China State Information Center (Meng et al., 2017). A table for 2002 was independently produced by the China State Information Center. Inter-regional input-output tables for 1997, 2002, and 2007 were used by Xing et al. (2015), a 2002 and 2007 […]
Let $B$ be a matrix of dimension $k \times n_i n_p$, where $k$ is the number of different business functions. A typical element of this matrix, $b_{kip}$, denotes the labor income share of workers performing business function $k$ for product $i$ of province $p$.
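A small numerical sketch may help fix ideas. It shows how output splits into exports, domestic consumption and intermediate use, how the coefficients matrix $A$ is obtained by dividing each column of $Z$ by the corresponding element of $s$, and how value added follows as output minus domestic and imported intermediates. It assumes a hypothetical table with only two province-industry pairs and made-up values, and it uses plain Python lists for self-containment. In the actual IPIOs the vectors have $n_i n_p = 1209$ elements.

```python
def coefficients(Z, s):
    """Domestic input coefficients: divide column j of Z by gross output s[j]."""
    n = len(s)
    return [[Z[r][c] / s[c] for c in range(n)] for r in range(n)]


def matvec(M, x):
    """Matrix-vector product for lists of lists."""
    return [sum(M[r][c] * x[c] for c in range(len(x))) for r in range(len(M))]


# Hypothetical toy table with two province-industry pairs
Z = [[10.0, 20.0],   # Z[r][c]: intermediate deliveries from pair r to pair c
     [30.0, 5.0]]
s = [100.0, 50.0]    # gross output
e = [40.0, 5.0]      # exports
m = [20.0, 5.0]      # imported intermediates used by each pair

A = coefficients(Z, s)
z = matvec(A, s)     # intermediate use of each pair's output (= row sums of Z)
f = [s[j] - e[j] - z[j] for j in range(2)]   # domestic consumption as residual of s = e + f + z
v = [s[j] - sum(Z[r][j] for r in range(2)) - m[j] for j in range(2)]  # value added
print(A, z, f, v)
```

In a balanced table $f$ is observed rather than computed as a residual. Treating it as a residual here simply makes the output identity visible.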

Inter-provincial sourcing of intermediate inputs
Without introducing further notation, we note that the IPIO tables can be used to measure the sourcing of intermediate inputs from other provinces and from abroad. That is, domestic inter-provincial sourcing is the sum of intermediate inputs sourced from all other provinces in China as a share in total intermediate use in that province. Similarly, the share of intermediate inputs imported from abroad can be measured by dividing imported intermediates by total intermediate use.
These shares are descriptive, but relate to earlier studies of inter-provincial trade in China (Poncet, 2004). Furthermore, the share of imported intermediates closely relates to what is commonly known as the broad measure of offshoring (Feenstra & Hanson, 1999). In a similar vein, the share of intermediates from other provinces can be interpreted as a broad measure of inter-provincial offshoring. Trends for these measures are presented in subsection 5.1.
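The sourcing shares described above can be computed directly from the intermediate flows. The sketch assumes a hypothetical layout. `Z` is a square list-of-lists of domestic intermediate flows whose rows (suppliers) and columns (users) are province-industry pairs, `m` lists imported intermediates by using pair, and `province_of` gives each pair's province. The toy numbers are made up.

```python
def sourcing_shares(Z, m, province_of, p):
    """Shares of province p's intermediate use sourced locally, from other
    provinces, and from abroad."""
    n = len(Z)
    users = [c for c in range(n) if province_of[c] == p]   # pairs located in p
    local = sum(Z[r][c] for c in users for r in range(n) if province_of[r] == p)
    other = sum(Z[r][c] for c in users for r in range(n) if province_of[r] != p)
    foreign = sum(m[c] for c in users)
    total = local + other + foreign          # total intermediate use in p
    return local / total, other / total, foreign / total


province_of = ["Zhejiang", "Zhejiang", "Anhui"]
Z = [[20.0, 10.0, 5.0],
     [10.0, 20.0, 5.0],
     [15.0, 5.0, 30.0]]
m = [5.0, 5.0, 0.0]
local, other, foreign = sourcing_shares(Z, m, province_of, "Zhejiang")
print(f"local {local:.2f}, other provinces {other:.2f}, abroad {foreign:.2f}")
# local 0.67, other provinces 0.22, abroad 0.11
```

The middle value corresponds to the broad measure of inter-provincial offshoring, and the last one to the broad measure of offshoring. Restricting `users` to manufacturing pairs would give the manufacturing versions of these measures.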

Domestic value added in exports
We trace domestic value added in exports to examine where value is added in China's domestic production network. The domestic contribution is the value that is added by the industry that exports the product, but it also involves value added contributions of other domestic province-industry pairs that contribute indirectly through the delivery of intermediate inputs. Accounting for these indirect contributions requires the use of inter-provincial input-output tables.
The $n_i n_p \times 1$ vector $s_e$ represents the total gross output produced in each province-industry pair for exports. It is measured as:

$$s_e = (I - A)^{-1} e \tag{3}$$

where $I$ is an $n_i n_p \times n_i n_p$ identity matrix with ones on the diagonal and zeros elsewhere, and $e$ and $A$ are gross exports and the domestic intermediate input coefficients matrix. $(I - A)^{-1}$ is the well-known Leontief inverse, which ensures that all output related to exports, direct and indirect, is taken into account. Let the $n_i n_p \times 1$ vector $d_e$ be the amount of domestic value added from each province-industry pair embodied in exports. Following Los, Timmer, and de Vries (2016), it is derived by pre-multiplying $s_e$ by $V$:

$$d_e = V s_e = V (I - A)^{-1} e \tag{4}$$

where $V$ is the $n_i n_p \times n_i n_p$ matrix with diagonal elements $v_{ip}$ representing the ratio of value added to gross output for industry $i$ in province $p$, and zeros on the off-diagonal elements. Note that the vector $d_e$ contains value added generated in industries that export, as well as in non-exporting industries that deliver intermediate inputs. Summing the elements of $d_e$ gives China's domestic value added embodied in its exports. Proper adjustment of $e$ in (3) allows us to measure characteristics of the domestic inter-provincial production network. In particular, if we keep the export value of $i$ by province $p$ and set all other elements of $e$ to zero, the vector $d_e$ returns the domestic value added from each province-industry pair embodied in that export value. This allows us, for example, to obtain the value added of province $p'$ embodied in the exports of $i$ by province $p$, which we label $\mathrm{VAX\_D}^{i}_{p'p}$. 10 Total domestic value added in exports of $i$ from province $p$ is the sum of value added by industries in all provinces, denoted $\mathrm{VAX\_D}^{i}_{p}$.
9 Note that in input-output analysis, products and industries are used interchangeably.
10 VAX_D is the term used by Los and Timmer (2020), who aim to clarify terminology in the use of various global value chain measures.
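To make the calculation concrete, here is a minimal pure-Python sketch of $s_e = (I - A)^{-1} e$ and $d_e = V s_e$ on a hypothetical two-pair economy, with one industry in each of two provinces. Rather than inverting $I - A$ explicitly, it solves the linear system by Gaussian elimination, which yields the same $s_e$. The coefficients, import ratios and export values are made up. The last lines illustrate how a VAX_D share is obtained by keeping one exporter's exports and setting the rest to zero.

```python
def solve(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        rest = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - rest) / aug[r][r]
    return x


def domestic_va_in_exports(A, v_ratio, e):
    """Return (s_e, d_e) with s_e = (I - A)^{-1} e and d_e = V s_e."""
    n = len(e)
    i_minus_a = [[(1.0 if r == c else 0.0) - A[r][c] for c in range(n)]
                 for r in range(n)]
    s_e = solve(i_minus_a, e)
    d_e = [v_ratio[j] * s_e[j] for j in range(n)]  # V is diagonal
    return s_e, d_e


# Hypothetical two-pair economy: pair 0 in Zhejiang, pair 1 in Anhui
A = [[0.1, 0.4],
     [0.3, 0.1]]
import_ratio = [0.2, 0.1]   # imported intermediates per unit of output
v_ratio = [1.0 - sum(A[r][j] for r in range(2)) - import_ratio[j] for j in range(2)]
e = [40.0, 5.0]

s_e, d_e = domestic_va_in_exports(A, v_ratio, e)
foreign = sum(import_ratio[j] * s_e[j] for j in range(2))
print(sum(d_e), foreign)  # these two numbers sum to total gross exports, 45

# VAX_D: keep only Zhejiang's exports and trace where their value added originates
_, vax_d = domestic_va_in_exports(A, v_ratio, [40.0, 0.0])
print(vax_d[1] / sum(vax_d))  # Anhui's share in Zhejiang's VAX_D, approximately 0.25
```

Because each column's value-added ratio, domestic input coefficients and import ratio sum to one, domestic value added plus imported content exactly exhausts gross exports. This is a handy check on any implementation.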
To analyze inter-provincial production fragmentation, we use VAX_D in network analysis, as in Amador and Cabral (2017). Network analysis requires determining the nodes and setting a condition on domestic value added flows that defines the links (edges) between the nodes. In this paper, the nodes are the 31 provinces. Links between provinces are based on a condition that aims to identify only those provinces that supply a substantial share of domestic value added in the exports of a province. The condition is set such that the network can be visualized and interpreted, while still capturing relevant economic relations between provinces. We experimented with several thresholds and set it at 1.5%. 11 The orientation of the links (edges) is based on the shares of provinces in the VAX_D of other provinces, which implies that we examine directed networks. That is:

$$a_{p'p} = \begin{cases} 1 & \text{if } \mathrm{VAX\_D}_{p'p} / \mathrm{VAX\_D}_{p} \geq 0.015 \\ 0 & \text{otherwise} \end{cases} \qquad p' \neq p$$

where $\mathrm{VAX\_D}_{p'p}$ is the value added of province $p'$ embodied in the exports of province $p$, and $\mathrm{VAX\_D}_{p}$ is the total domestic value added in the exports of province $p$.
Combining the binary values $a_{p'p}$ generates an $n_p \times n_p$ connectivity matrix. 12 Since this matrix is binary, we examine an unweighted network and thus focus on the extensive margin of domestic value added flows between provinces embodied in exports. Results of the network analysis are presented in subsection 5.2.
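The construction of the binary connectivity matrix can be sketched as below. The sketch makes several assumptions. `W` is a hypothetical matrix of VAX_D values aggregated over products, with `W[q][p]` the value added of supplying province q embodied in the exports of province p, so that column sums give each province's total VAX_D. Links are oriented from supplier to exporter. Self-links are excluded, matching the $n_p(n_p - 1)$ potential links. The toy values for three provinces are made up.

```python
def connectivity(W, threshold=0.015):
    """Binary adjacency: adj[q][p] = 1 if province q supplies at least
    `threshold` of the domestic value added in province p's exports."""
    n = len(W)
    adj = [[0] * n for _ in range(n)]
    for p in range(n):                          # exporting province
        total = sum(W[q][p] for q in range(n))  # VAX_D of province p
        for q in range(n):                      # supplying province
            if q != p and total > 0 and W[q][p] / total >= threshold:
                adj[q][p] = 1
    return adj


def density(adj):
    """Share of the n(n - 1) potential directed links that are present."""
    n = len(adj)
    return sum(map(sum, adj)) / (n * (n - 1))


W = [[90.0, 1.0, 20.0],
     [5.0, 80.0, 1.0],
     [1.0, 19.0, 79.0]]
adj = connectivity(W)
print(adj)           # [[0, 0, 1], [1, 0, 0], [0, 1, 0]]
print(density(adj))  # 0.5
```

Raising `threshold` removes weaker links and yields a sparser network, consistent with the sensitivity checks at 1% and 2% mentioned in footnote 11.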

Functional specialization in international trade
In a final step, we trace the type of activities that contribute to domestic value added in exports using the methodology introduced by Timmer et al. (2019). This requires pre-multiplying $d_e$ in (4), put on a diagonal, by matrix $B$:

$$G = B \hat{d}_e$$

where $\hat{d}_e$ is the vector $d_e$ put on the main diagonal of an $n_i n_p \times n_i n_p$ matrix with zeros otherwise. Matrix $G$ is of dimension $k \times n_i n_p$, and its typical element $g_{kip}$ represents domestic value added by function $k$ in product $i$ of province $p$ embodied in exports. 13 We adapt the Balassa index (Balassa, 1965) to examine the functional specialization of provinces in China. Originally, the Balassa index measures relative trade performance by comparing a country's share in world exports of a product to the country's share in overall world exports. We follow Koopman, Wang, and Wei (2014) and examine trade performance on the basis of VAX_D. Let $G_{kp} = \sum_i g_{kip}$ denote the income from function $k$ in province $p$ embodied in exports. The functional specialization (FS) index for function $k$ in province $p$ is then defined as:

$$FS_{kp} = \frac{G_{kp} \big/ \sum_{k'} G_{k'p}}{\sum_{p'} G_{kp'} \big/ \sum_{k'} \sum_{p'} G_{k'p'}}$$

The numerator measures the share of function $k$ in the overall functional income of province $p$ embodied in exports. The denominator measures the share of this function in the functional income of all provinces embodied in exports. If the index is above one, the province is said to be specialized in that function. The results are presented in subsection 5.3.
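The two steps above can be sketched in a few lines: attributing domestic value added in exports to business functions, then computing the FS index. This is an illustration under stated assumptions. The toy $B$ has only two functions (R&D and Production) rather than four, and each province has a single product. The entries in each column of $B$ sum to less than one because the remainder of value added is capital income. All names and numbers are hypothetical.

```python
def functional_income(B, d_e, province_of, provinces):
    """G_kp: sum over pairs ip located in p of b_kip * d_e[ip]."""
    col = {p: j for j, p in enumerate(provinces)}
    G = [[0.0] * len(provinces) for _ in B]
    for k, row in enumerate(B):
        for ip, p in enumerate(province_of):
            G[k][col[p]] += row[ip] * d_e[ip]   # g_kip from G = B diag(d_e)
    return G


def fs_index(G):
    """FS_kp = (G_kp / sum_k' G_k'p) / (sum_p' G_kp' / sum_k' sum_p' G_k'p')."""
    K, P = len(G), len(G[0])
    grand = sum(map(sum, G))
    prov_total = [sum(G[k][p] for k in range(K)) for p in range(P)]
    func_total = [sum(G[k]) for k in range(K)]
    return [[(G[k][p] / prov_total[p]) / (func_total[k] / grand)
             for p in range(P)] for k in range(K)]


provinces = ["Shanghai", "Anhui"]
province_of = ["Shanghai", "Anhui"]   # one product per province in this toy
B = [[0.30, 0.05],    # R&D labor income as a share of value added
     [0.20, 0.45]]    # Production labor income as a share of value added
d_e = [100.0, 100.0]  # domestic value added in exports by pair

G = functional_income(B, d_e, province_of, provinces)
fs = fs_index(G)
for name, row in zip(["R&D", "Production"], fs):
    print(name, [round(x, 2) for x in row])
# R&D index is above one for Shanghai; Production index is above one for Anhui
```

Because the index is a ratio of shares, it is unaffected by the overall scale of exports. A province can be specialized in R&D even if it exports less in total than another province.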

China's domestic production networks
Subsection 5.1 examines the sourcing of intermediates from other provinces. Next, we employ input-output techniques to account for inter-provincial flows of domestic value added. These are used in subsection 5.2 to analyze the formation of production networks within China. Using network tools, we find evidence of cross-provincial production fragmentation during the 2000s. Finally, subsection 5.3 examines the location of activities in production networks. We document that richer (coastal) provinces orchestrate production networks, specializing in R&D, logistics and marketing, while production activities increasingly occur inland. Table 1 shows estimates for the sourcing of intermediates in 2002 and 2012. Columns 1 and 4 show the share of intermediates sourced from within the province.
11 Lower (higher) values result in a denser (sparser) network. Amador and Cabral (2017) set the threshold at 1%. Trends in the network metrics discussed in section 5.2 are qualitatively similar when we set the threshold at 1 or 2%.
12 The connectivity matrix has 31 × 31 − 31 = 930 potential links (edges).
13 Domestic value added in exports by business function is measured by the costs of the workers that carry it out. The sum across all functions thus equals the overall wage bill in gross exports. Hence, domestic value added from activities in gross exports is the labor income that accrues to the provinces' workers. This is our preferred unit of analysis because employees tend to work and live in the same geographical area. Capital income, which is the remainder when wages are subtracted from value added, is often hard to trace to its ultimate recipients. Clearly, due to cross-border investments, the location of the assets used in production need not equal the location of their owners. Also, assets are hard to allocate to a particular activity; for example, computers are used in many business functions.
Local sourcing of intermediates appears low in Beijing and Hainan, where about a quarter of inputs are sourced locally. This contrasts with Hubei and Shandong, where over three quarters of inputs are sourced locally.

Inter-provincial trade in intermediate inputs
A broad measure of inter-provincial offshoring is the share of intermediates from other provinces used in the production of manufactures in a province. This measure is shown in columns 2 and 5 of Table 1 for 2002 and 2012. It varies considerably across provinces, ranging from as low as 8% in Shandong to as high as 46% in Anhui in 2012.
In 18 out of 31 provinces, the share of intermediates sourced from other provinces increased. Between 2002 and 2012, it increased substantially in several north-eastern provinces, such as Jilin (from 6% to 18%) and Heilongjiang (from 12% to 28%). Several central provinces, such as Hubei and Shaanxi, also report substantial increases. 14 The increase is also notable in coastal provinces such as Guangdong and Zhejiang, which account for a major share of exports. In Zhejiang, the share increased from 4% in 2002 to 25% by 2012. The trend in Guangdong, China's main exporting province, closely follows the aggregate trend. The aggregate weighted share of inter-provincial trade (see the top row of Table 1) increased from 15 to 21% between 2002 and 2012.
The policy reforms and infrastructure investments (discussed in section 2) may have reduced fragmentation costs and thereby increased inter-provincial trade. As discussed, this pattern is the opposite of that observed in the 1990s. However, it is not obvious that production would fragment across provincial borders during the 2000s. Bai and Liu (2019) examine a 2004 policy reform regarding the financing of VAT rebates. Due to fiscal pressure, the central government shifted part of the VAT rebate burden to provincial governments. Under the new arrangement, local governments have to finance 25% of the VAT rebates. This also applies to any non-local goods that are bought and subsequently exported by local trading companies, for which the local government does not collect VAT revenue in the first place. Local governments therefore have an incentive to discourage trading companies from sourcing from non-local manufacturers, for example by delaying refunds. Bai and Liu (2019) find evidence of rising local protectionism due to the policy reform. However, our findings suggest that this is not reflected in aggregate trends.
Inter-provincial trade in intermediate inputs varies by product. This is illustrated in Fig. 1, which shows the share of intermediates from other provinces for two major manufacturing products, namely textiles and automotives (all products that can be distinguished are shown in Table 2). The size of the bubbles is proportional to the product's gross output in the province in the initial year 2002. The bubbles for textiles and automotives reveal variation in the spatial location of production. For textiles, the bubbles indicate that several coastal provinces, notably Jiangsu, Zhejiang, and Shandong, account for the majority of output. In fact, cities in Zhejiang are known by the textile product in which they specialize, such as Datang, known as 'sock city', and Shengzhou, the 'necktie capital of the world'.
The geographical concentration of textile production contrasts with what is observed for automotive production. Fig. 1 suggests sizable car production in many provinces, pointing to a more even distribution of production across provinces. This may relate to efforts by provincial governments to establish local firms in the automotive industry (discussed in section 2). Still, provinces such as Hubei, Shanghai, and Jilin, which form the heartland of China's automotive industry, clearly have the highest output of automotive products.
A diagonal line is added to show whether sourcing from other provinces was higher in 2012 than in 2002. Sourcing of intermediates from other provinces went up for both textiles and automotives: most observations in Fig. 1 lie above the diagonal. For Zhejiang, we observe a noticeable increase in inter-provincial sourcing of inputs for textile products.
However, the size of the increase in inter-provincial trade in intermediates varies by product. For textiles the weighted average share went up slightly, from 13% to 14%. In contrast, for automotives it doubled, from 14% to 28%. This suggests an increase in cross-provincial suppliers that provide inputs for cars and other vehicles. Table 2 shows the sourcing of inputs by product for 2002 and 2012. The top row provides the weighted average share for all manufactures combined; subsequent rows present shares by product.
Between 2002 and 2012, the share of intermediates sourced from within the province decreased from 74% to 67%. This is partly due to more imports of intermediates from abroad, whose share went up from 11% to 13%. But it is mainly due to the sourcing of intermediates from other provinces, which increased from 15% to 21%. The subsequent rows show substantial variation across products. The share of intermediates imported from abroad fell for many 'traditional' manufactures, such as textiles, leather, paper, and wood products. For leather, for example, it fell from 11% to 5%. A decline is also observed for transport products (from 12% to 11%). Yet for several other products the share of intermediates from abroad increased, most notably for electronics, where it rose from 16% to 27%.
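The aggregate shares quoted above are weighted averages over provinces. A minimal sketch of that aggregation, assuming the weights are each province's total intermediate use, which makes the weighted average identical to the ratio of summed flows. The class `Sourcing`, the function `weighted_shares`, and the two provinces with their numbers are hypothetical illustrations, not data or code from the paper.

```python
from dataclasses import dataclass


@dataclass
class Sourcing:
    """Intermediate inputs used by one province, by origin (hypothetical units)."""
    own_province: float
    other_provinces: float
    abroad: float

    @property
    def total(self) -> float:
        return self.own_province + self.other_provinces + self.abroad


def weighted_shares(data: dict[str, Sourcing]) -> dict[str, float]:
    """Sourcing shares averaged over provinces, weighted by total intermediate use.

    Weighting each province's share by its intermediate use is the same as
    dividing the summed flows by the summed intermediate use.
    """
    total = sum(s.total for s in data.values())
    return {
        "own_province": sum(s.own_province for s in data.values()) / total,
        "other_provinces": sum(s.other_provinces for s in data.values()) / total,
        "abroad": sum(s.abroad for s in data.values()) / total,
    }


# Hypothetical example: province B uses three times as many intermediates as A
example = {
    "A": Sourcing(own_province=70.0, other_provinces=20.0, abroad=10.0),
    "B": Sourcing(own_province=240.0, other_provinces=40.0, abroad=20.0),
}
shares = weighted_shares(example)
```

Province A sources 70% locally and B 80%; because B carries three quarters of the weight, the aggregate local share is 77.5%, illustrating how large provinces such as Guangdong can dominate the aggregate trend.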
The share of intermediate inputs sourced from other Chinese provinces increased for all products distinguished. However, as illustrated in Fig. 1, the increase differs substantially across products. It rose fastest for automotives and slowest for textiles. Other products with a fast increase in sourcing of intermediates from other provinces are machinery equipment (from 16% to 23%), electric equipment (from 15% to 23%), and fabricated metal products (from 14% to 22%).
These descriptive statistics suggest an expansion of domestic production networks that varies by product. Note, however, that this subsection examines the share of imported intermediate inputs in total intermediate use. It measures the direct inputs in production, that is, the first stage of production. Yet the production of these intermediates itself requires additional production activities, which take place both across Chinese provinces and outside the country. Given the rapid increase in trade in intermediates, such indirect effects are likely sizeable. The next subsection uses the inter-provincial input-output tables and network tools to examine inter-provincial fragmentation of production.
14 In several central and western provinces, such as Sichuan and Qinghai, the share of intermediates sourced from other provinces decreased between 2002 and 2012. Yet these provinces had a high share of intermediate inputs sourced from other provinces in the initial year. They are also poorer on average. In the early 2000s, the ratio of average real GDP per worker in the five provinces with the highest income to that in the five central and western provinces with the lowest income was almost 4:1 (Tombe & Zhu, 2019). Over time, incomes converged. The observed changes in inter-provincial trade for several inland provinces may thus reflect improvements in the capabilities of local firms, reducing the need to source intermediates from other provinces.
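Why direct shares understate indirect linkages can be seen in a hypothetical two-province, one-product economy. A minimal sketch using the standard Leontief inverse, where total requirements are (I − A)⁻¹ = I + A + A² + …; the coefficient matrix `A` is made up, and the paper's own value-added measures may differ in detail.

```python
import numpy as np

# Hypothetical input coefficients: A[a, b] = input from province a per unit of output in b.
# Index 0 = province P, index 1 = province Q.
A = np.array([
    [0.30, 0.10],   # inputs supplied by P
    [0.20, 0.40],   # inputs supplied by Q
])

# Direct requirement: first-stage inputs from Q per unit of P's output.
direct_from_Q_into_P = A[1, 0]

# Total (direct + indirect) requirement via the Leontief inverse: Q's output
# needed per unit of P's final output, counting inputs used to make inputs.
L = np.linalg.inv(np.eye(2) - A)
total_from_Q_into_P = L[1, 0]
```

Here Q supplies 0.2 units directly per unit of P's output, but 0.5 units once every upstream stage is counted, which is the kind of indirect effect the direct shares above do not capture.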

Inter-provincial production fragmentation: Network properties
This subsection examines properties of China's domestic production network based on inter-provincial flows of domestic value added. The production network for automotives is illustrated in Fig. 2, which shows the development of directed binary networks between 2002 and 2012. Two provinces are connected ($a_{pp'} = 1$) when the share of value added embodied surpasses the threshold defined in (5). A visual inspection reveals that the networks have become denser. This increase is clearly more pronounced for automotives than for textiles (see Fig. B1 for textiles).
Hubei is one of the major automotive-producing provinces (as the size of its bubble in Fig. 1 illustrates). Yet in 2012 Hubei appears to have no major linkages through which it sources intermediate inputs from other provinces. This is not because such inter-provincial trade flows are absent; rather, they fall below the 1.5% threshold because Hubei's own value added makes up such a large share of its automotive exports. The graph alone is therefore not sufficient to understand the evolution of China's production network. It should be complemented by absolute values of inter-provincial trade (discussed in the previous subsection) and by other network metrics, which we discuss below.
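Equation (5) is not reproduced in this section, so the following is only a sketch of how such a binary network could be built. It assumes that a link from supplier province a to exporting province b exists when a's value added embodied in b's exports exceeds 1.5% of b's gross exports; the function `binary_network` and all numbers are hypothetical.

```python
import numpy as np

THRESHOLD = 0.015  # 1.5% cut-off mentioned in the text


def binary_network(va_embodied: np.ndarray, exports: np.ndarray,
                   threshold: float = THRESHOLD) -> np.ndarray:
    """Directed 0/1 adjacency matrix of provinces (assumed link rule).

    va_embodied[a, b]: value added from province a embodied in province b's exports.
    exports[b]: gross exports of province b.
    A link a -> b is recorded when a's share in b's exports exceeds the threshold;
    own-province value added (the diagonal) is excluded.
    """
    shares = va_embodied / exports[np.newaxis, :]
    adj = (shares > threshold).astype(int)
    np.fill_diagonal(adj, 0)
    return adj


# Hypothetical 3-province example (billion yuan)
va = np.array([
    [80.0, 3.0, 0.5],
    [1.0, 40.0, 2.0],
    [0.2, 0.4, 60.0],
])
gross_exports = np.array([100.0, 50.0, 80.0])
adj = binary_network(va, gross_exports)
```

Province 0 exports mostly its own value added, so the 1% contributed by province 1 does not register as a link, mirroring the Hubei case described above.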
We discuss four network metrics that are often used to characterize production fragmentation (Amador & Cabral, 2017). These are shown in Fig. 3. As before, we show the measures for textile and automotive products, but we also discuss production fragmentation for fabricated metal and electronics products as well as for aggregate manufacturing. Table B1 provides the metric values for each product.
First, consider the average network degree in the top left panel of Fig. 3. It measures the average number of client/supplier relations between provinces for a given product. For each product, the average degree increased between 2002 and 2012. Hence, with the intensification of trade in intermediates among provinces, networks have become more complex as more provinces are involved in each other's exports. Both the level and the change in the average degree indicate that this trend is more pronounced for automotives and electronics than for textiles and fabricated metals.
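The exact degree formula is not given in this section. A minimal sketch, assuming the average degree of a directed binary network is the number of links L divided by the number of provinces N; the function name and both 4-province networks are hypothetical.

```python
import numpy as np


def average_degree(adj: np.ndarray) -> float:
    """Average number of links per province in a directed 0/1 network (zero diagonal).

    Each link joins one supplying and one client province, so the mean out-degree
    equals the mean in-degree, L / N. Counting in- and out-links together gives 2L / N.
    """
    n_provinces = adj.shape[0]
    n_links = int(adj.sum())
    return n_links / n_provinces


# Hypothetical 4-province networks for two years
adj_2002 = np.array([[0, 1, 0, 0],
                     [0, 0, 1, 0],
                     [0, 0, 0, 0],
                     [1, 0, 0, 0]])
adj_2012 = np.array([[0, 1, 1, 0],
                     [1, 0, 1, 0],
                     [0, 1, 0, 1],
                     [1, 0, 0, 0]])
```

The denser 2012 network has 7 links instead of 3, raising the average degree from 0.75 to 1.75.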
The average geodesic distance (top right panel of Fig. 3) measures how close provinces are to each other: it is the average number of links on the shortest path between two provinces. It can be interpreted as a measure of production fragmentation, because a decrease indicates that the 'path' between provinces has shortened. Over time, we indeed observe a declining trend, pointing to increased inter-provincial production fragmentation.
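The average geodesic distance can be computed with breadth-first search. A minimal sketch, assuming the average is taken over ordered province pairs (i, j) with j reachable from i, which is one common convention for directed networks that are not strongly connected; the function names and the example networks are hypothetical.

```python
from collections import deque


def shortest_path_lengths(adj: list[list[int]], source: int) -> dict[int, int]:
    """Breadth-first search: number of links from source to every reachable province."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v, linked in enumerate(adj[u]):
            if linked and v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def average_geodesic_distance(adj: list[list[int]]) -> float:
    """Mean shortest-path length over ordered pairs (i, j), i != j, with j reachable from i."""
    total, pairs = 0, 0
    for i in range(len(adj)):
        for j, d in shortest_path_lengths(adj, i).items():
            if j != i:
                total += d
                pairs += 1
    return total / pairs if pairs else float("inf")


# Hypothetical networks: a directed cycle 0->1->2->3->0, then the same cycle plus 0->2 and 2->0
cycle = [[0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [1, 0, 0, 0]]
with_shortcuts = [[0, 1, 1, 0], [0, 0, 1, 0], [1, 0, 0, 1], [1, 0, 0, 0]]
```

Adding two shortcut links lowers the average distance from 2.0 to 1.5, the kind of decline read here as growing inter-provincial fragmentation.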
The bottom panels of Fig. 3 show two measures that aim to capture how important specific provinces are in the networks. The reciprocity correlation (bottom left panel) indicates to what extent ties are reciprocated. A predominance of asymmetric relations would point to a hierarchical structure and would be reflected in negative values. This is not the case: values are positive, pointing to reciprocal relations. The reciprocity correlation starts at low levels but increases over time. The degree of assortativity is shown in the bottom right panel. Studies of international trade flows typically find disassortative mixing (i.e. negative values for the degree of assortativity), because a few large countries are central to the global economy and act as hubs for many smaller countries. In contrast, we obtain positive values (except for electronics in 2002). For some products, such as electronics and fabricated metal products, the degree of assortativity increased, suggesting that the involvement of provinces in their production became more evenly spread over time. For other products, such as textiles and automotives, it decreased, pointing to agglomeration of production stages in specific provinces. Despite this decline, the degree of assortativity for textiles and automotives remains positive.
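The formulas behind these two measures are not spelled out in this section. A minimal sketch under stated assumptions: the reciprocity correlation is taken in the common form ρ = (r − ā)/(1 − ā), with r the fraction of reciprocated links and ā the network density, and assortativity is the Pearson correlation of total (in plus out) degrees at both ends of each directed link. All function names and networks are hypothetical.

```python
import numpy as np


def reciprocity_correlation(adj: np.ndarray) -> float:
    """Reciprocity correlation rho of a directed 0/1 network (assumed definition).

    rho > 0: links are reciprocated more often than in a random network of equal density;
    rho < 0: asymmetric (hierarchical) relations dominate.
    """
    n = adj.shape[0]
    links = adj.sum()
    reciprocated = (adj * adj.T).sum()   # links i->j whose reverse j->i also exists
    r = reciprocated / links
    density = links / (n * (n - 1))
    return float((r - density) / (1 - density))


def degree_assortativity(adj: np.ndarray) -> float:
    """Pearson correlation between total degrees at the two ends of each directed link."""
    degree = adj.sum(axis=0) + adj.sum(axis=1)   # in-degree + out-degree
    src, dst = np.nonzero(adj)
    return float(np.corrcoef(degree[src], degree[dst])[0, 1])


# Hypothetical networks
small = np.array([[0, 1, 0], [1, 0, 1], [0, 0, 0]])            # 0<->1, 1->2
star = np.array([[0, 1, 1, 1], [1, 0, 0, 0],
                 [1, 0, 0, 0], [1, 0, 0, 0]])                  # hub 0 linked both ways
clusters = np.zeros((5, 5), dtype=int)
clusters[:3, :3] = 1 - np.eye(3, dtype=int)                    # 0, 1, 2 fully linked
clusters[3, 4] = clusters[4, 3] = 1                            # 3<->4
```

The hub-and-spoke `star` yields an assortativity of −1, the disassortative pattern typical of international trade, while linking provinces of similar degree in `clusters` yields +1.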
These metrics suggest that the structure of China's inter-provincial trade is less concentrated than that of international trade, in which a few central countries dominate. Supply or demand shocks to critical countries in the production network may then trigger cascade effects and propagate to the rest of the global economy (Carvalho & Tahbaz-Salehi, 2019). Within China this appears to be a less salient feature: the reciprocity correlation and degree of assortativity suggest a less centralized inter-provincial network, and hence one that might be more resilient to asymmetric shocks.

Exploring the nature of China's domestic value-added in exports
The previous subsections documented the increase in intermediate inputs sourced from other provinces and the evolution of production networks within China. They also documented patterns that differ by product, likely reflecting differences in the potential for production fragmentation. This subsection examines whether fragmentation resulted in functional specialization. Table 3 provides aggregate statistics on gross exports and their domestic value added content. The first row shows gross exports, which increased more than fivefold, from 2,783 billion to 14,510 billion yuan, between 2002 and 2012. Domestic value added in exports, VAX_D (the second row), also increased, and at a faster pace than gross exports. As a consequence, the domestic value added content of exports increased from 70% to 72% (see the bottom row of Table 3). This is consistent with other studies that document an increase in the domestic content of exports during a comparable period (Kee & Tang, 2016; Koopman et al., 2012).
The other rows of Table 3 decompose labor income from exports into income by activity. Most labor income is from production activities, which accounted for about 70% (588/839 × 100%) in 2002. Most of the absolute increase in domestic value added is also from production activities, whose income rose by more than 2,000 billion yuan between 2002 and 2012. Yet income from other activities increased at a faster pace, so that production activities accounted for 61% of labor income by 2012 (2,654/4,344 × 100%). The income shares of R&D and marketing increased from 7% to 12% and from 18% to 23%, respectively, between 2002 and 2012. Table 4 examines changes in domestic value added in exports by activity for each of China's 31 provinces separately between 2002 and 2012. Provinces are ranked by the change in value added from production activities.
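The production shares above, and the growth factors by business function cited later in this section, can be checked from the rounded Table 3 figures. A small check in Python; because R&D and marketing are reported here only as rounded shares of labor income, their implied levels and growth factors are approximations, and the helper `fold_increase` is a hypothetical name.

```python
# Labor income from exports, billion yuan (figures quoted in the text)
labor_income = {2002: 839.0, 2012: 4344.0}
production = {2002: 588.0, 2012: 2654.0}
# Only rounded shares are quoted for R&D and marketing, so implied levels are approximate
rd_share = {2002: 0.07, 2012: 0.12}
marketing_share = {2002: 0.18, 2012: 0.23}

production_share = {y: production[y] / labor_income[y] for y in labor_income}
production_increase = production[2012] - production[2002]


def fold_increase(share: dict[int, float]) -> float:
    """Ratio of 2012 to 2002 income for a function, derived from its labor-income share."""
    return (share[2012] * labor_income[2012]) / (share[2002] * labor_income[2002])


folds = {
    "R&D": fold_increase(rd_share),
    "marketing": fold_increase(marketing_share),
    "production": production[2012] / production[2002],
}
```

The result is roughly 8.9 for R&D, 6.6 for marketing, and 4.5 for production, in line with the almost ninefold, more than sixfold, and fourfold increases described in the text.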
The top ten contributors to the increase in aggregate domestic value added are, in order, Guangdong, Jiangsu, Zhejiang, Shanghai, Beijing, Fujian, Shandong, Hebei, Liaoning, and Henan (shown in bold in Table 4). Most of these provinces are also among the top ten contributors of value added from production activities, but not all, as we discuss below. The top ten contributors from production activities account for about 75% of the increase in domestic value added from production activities in exports. Guangdong alone accounts for almost one third of this increase.
Guangdong is also a top ten contributor to the increase in domestic value added from R&D, marketing, and other activities.
Notes: Table B1 provides values of the network metrics by product.
However, a top ten contributor in terms of production activities is not always a top ten contributor for other activities, and vice versa. For example, Henan is a top ten contributor (ranked #8) to the increase in domestic value added from production, but not in terms of R&D or marketing activities (ranked #12 for both). Beijing is one of the main contributors to the increase in value added from R&D (ranked #3) and marketing activities (#2), yet it does not account for much of the increase in production activities (#14). The same holds for Tianjin, which is a top ten contributor to the increase in domestic value added from R&D and marketing activities, but not from production activities.
Table 3 notes: Gross exports of China (row 1) are the sum of foreign and domestic value added in exports. Domestic value added in exports (row 2) is the sum of income for capital and labor. Labor income (row 3) is split into income from R&D (row 4), production (row 5), marketing (row 6), and other activities (row 7).
In sum, the increase in domestic value added from production activities is largely accounted for by a few provinces, most notably Guangdong, Jiangsu, and Zhejiang. Other provinces, such as Beijing and Tianjin, appear to be relatively larger contributors to the increase in value from R&D and marketing activities in exports.15 The largest absolute increase in income originates from production activities. However, the findings in Table 3 suggest that income from R&D activities increased almost ninefold and income from marketing more than sixfold, compared with a more than fourfold increase in income from production activities between 2002 and 2012. This suggests a shift in relative income across business functions towards R&D and marketing, and away from production. Fig. 4 identifies province-specific specialization patterns. We plot GDP per capita of each province against its Functional Specialization (FS) index in 2012. Horizontal lines separate observations with FS indices above and below 1. Values of the FS index by business function and province are given in Table B2.
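The FS index is defined in the main text and not reproduced in this section. A minimal sketch, assuming a Balassa-type ratio: a function's share in a province's export labor income divided by that function's share for all provinces combined, so that values above 1 indicate specialization. The function `fs_index`, the province labels, and the figures are hypothetical.

```python
FUNCTIONS = ("R&D", "production", "marketing", "other")


def fs_index(income: dict[str, dict[str, float]]) -> dict[str, dict[str, float]]:
    """Assumed Balassa-type functional specialization index.

    income[province][function]: labor income from exports generated by that function.
    FS = (function's share in the province) / (function's share in all provinces combined).
    """
    national = {f: sum(p[f] for p in income.values()) for f in FUNCTIONS}
    national_total = sum(national.values())
    result = {}
    for province, by_function in income.items():
        province_total = sum(by_function.values())
        result[province] = {
            f: (by_function[f] / province_total) / (national[f] / national_total)
            for f in FUNCTIONS
        }
    return result


# Hypothetical labor income (billion yuan) for a coastal and an inland province
income = {
    "Coastal": {"R&D": 30.0, "production": 40.0, "marketing": 25.0, "other": 5.0},
    "Inland": {"R&D": 5.0, "production": 80.0, "marketing": 10.0, "other": 5.0},
}
fs = fs_index(income)
```

In this made-up case the coastal province has an R&D index of about 1.71 and the inland province a production index of about 1.33, the kind of split that Fig. 4 maps against income levels.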
Panel (a) of Fig. 4 shows that richer provinces tend to have a higher FS index in R&D. Put differently, there is a positive correlation between the level of economic development and functional specialization in R&D. Beijing, Shanghai, and Tianjin have particularly high FS indices for R&D. This suggests that specialization in R&D is common among richer provinces in China.
Note, however, that this is not a uniform pattern. At a given level of income, one province can be specialized in an activity while another is not. For example, Jiangxi has an FS index in R&D activities well above 1, but Guangxi does not, even though the two have similar levels of income per capita. Panels (b) and (c) show FS indices for marketing and other activities, respectively. Some provinces, like Liaoning and Guangdong, specialize in both activities. Others specialize in only one of them. For example, Shanxi and Tibet specialize in marketing activities but not in other support activities, whereas Zhejiang and Qinghai specialize in other activities but not in marketing. These heterogeneous patterns suggest that there are many idiosyncratic determinants of a province's functional specialization. Specialization patterns also vary widely among lower-income provinces. As expected, most of them are specialized in production activities; they are mapped into the north-west quadrant of panel (d). Many provinces in or close to the Pearl and Yangtze river deltas in the (south-)east of China appear to specialize in production activities.
Table B2 provides the FS index by business function in 2002 and 2012. Several provinces, such as Henan, Liaoning, Shanxi, Sichuan, Qinghai, and Jilin, appear specialized in R&D activities in 2002. Their initial specialization may relate to the distortive protectionist provincial policies that were prevalent for many decades (see section 2). By 2012, however, these provinces were no longer specialized in R&D but in production. We conjecture that falling fragmentation costs may have driven specialization according to comparative advantage in performing production tasks.
Overall, the FS indices are suggestive of a regional division of labor within China, with richer (coastal) regions orchestrating production networks that reach deep into the inland regions. Understanding what causes these differences in functional specialization in trade is an important and interesting avenue for future research. One might speculate that they are driven by differential patterns of domestic value chain fragmentation due to spatial heterogeneity in transportation and communication links. But differences in specialization could also be driven by the size of the province, its attractiveness as a location for (multi)national headquarters, geographical characteristics and infrastructure, as well as the historical build-up of capabilities and networks.

Concluding remarks
This paper developed new data to analyze China's domestic production network. It used VAT invoices to measure internal trade flows and build inter-provincial input-output tables for 2002 and 2012. Based on the IPIO tables, we document a rapid increase in inter-provincial trade and the substitution of domestic for imported inputs. These patterns are not obvious; indeed, they mark a deviation from historical trends. Historically, inter-provincial trade was hampered by a combination of local protectionism, rugged geography, and limited transport infrastructure. As a result, the post-1978 reforms and opening up of China encouraged an outward orientation of coastal provinces: coastal provinces developed stronger trade linkages with the global market than with the rest of China. This lasted well into the 1990s (Poncet, 2005). Our findings show that these patterns reversed thereafter: fragmentation costs fell as local protectionism waned and infrastructure developed, which encouraged the formation of inter-provincial production networks.
This paper then used information on the occupational structure of the labor force from population censuses to characterize the activities that add value to China's domestic production network involved in exporting. We measure functional specialization in domestic trade between provinces based on occupational labor income. Our findings suggest that richer areas such as Beijing, Tianjin, and Shanghai specialize in R&D and marketing activities and inland provinces specialize in production activities. These findings speak to an important debate about the transition of China from assembly to knowledge-intensive innovation activities: within China this process is already taking place.
The analysis presented in this paper is only a first step towards understanding China's development and its changing position in global value chains. Further research is needed to address pertinent questions, such as how the development of China's domestic production networks has enhanced China's position and performance in global value chains.
The new database was used to document stylized facts on the evolution of China's production network. The unit of analysis is the province. Various official data collection efforts follow this administrative division, and borders between provinces have been shown to have a substantial impact on trade (Lemoine et al., 2015). Yet activities are often geographically concentrated within provinces, such as the electronics hardware cluster in the city of Shenzhen in Guangdong province. Studying specialization in these clusters requires more granular data and alternative methods, for example examining the location and characteristics recorded in detailed customs trade data (van Assche & van Biesebroeck, 2018; Luck, 2019) or using linked employer-employee data (Cheng, Fan, Hoshi, & Hu, 2019).
15 The substantial decline in domestic value added in exports in Hainan is likely due to measurement error.
The new data are made public to encourage follow-up research and the development of complementary datasets. A wide range of applications is feasible, from examining border effects (Poncet, 2005) to assessing welfare implications (Caliendo & Parro, 2015). Future research should update the inter-provincial input-output tables to study the evolution of trade and production within China. Furthermore, the IPIO tables can be embedded in global input-output tables following procedures developed by Meng et al. (2013). This will enable computation of production length and other indexes to examine the linkages between China's domestic production networks and its position in global value chains.
Geographical areas in China appear to differ in their comparative advantage in functions. This has implications for the impact of internal production fragmentation on external trade, and relates to studies on the uneven distribution of factors of production within a country (Brakman & Van Marrewijk, 2013). Clearly, changes in the production structure of each province may also change the production structure of the economy as a whole and its integration in the global economy.
Finally, the data and network tools will prove relevant for studying how shocks propagate in an economy (Carvalho et al., 2017; Tokui et al., 2012). Our findings for the reciprocity correlation and degree of assortativity suggest an inter-provincial network that is not strongly centralized, and hence one that might be more resilient to asymmetric shocks. How resilient was China's production network to the strict lockdowns that were implemented following the outbreak of Covid-19? Lockdowns implemented by local policymakers varied in timing and duration until the virus was contained. Pei, de Vries, and Zhang (2021) exploit this variation in a difference-in-differences analysis and find that products that relied more on imported (domestic) intermediates experienced a sharper (flatter) slowdown in export growth. We hope the measurement advances in this paper on China's activities in its domestic production network will contribute to a better understanding of China's economic development and its position in the global economy.

Appendix A. Data Appendix
This appendix describes the construction of inter-provincial input-output tables and the measurement of activities performed by workers. Subsection A1 outlines the VAT data to measure firm-to-firm transactions in intermediate inputs. Subsection A2 describes the Input-Output tables and the estimation of Supply and Use tables for China's provinces. The VAT data and Supply and Use Tables are important building blocks for the inter-provincial input-output tables described in subsection A3. In subsection A4 we outline the occupational employment and wage data to measure where labor income is generated and the nature of economic activities.

A.1. Value Added Tax data
The Value Added Tax (VAT) invoices we use are census data collected by the China General State Administration of Taxation (SAT). A firm is required to pay VAT if it sells agricultural or manufacturing products to other firms.16 A typical VAT invoice includes information on the location and 4-digit industry classification of the selling and purchasing firms, the invoice date, the transaction value, and the VAT. The VAT data of SAT report the annual sales value of deliveries from every VAT-registered business to any other VAT affiliate.
The national VAT collection system was established by SAT as part of the Golden Tax Project in 1994. In 2003, the Golden Tax Project entered a new phase when the electronic tax management system was completed, covering the VAT and transaction values of almost all inter-firm trade. SAT has implemented a rigorous tax-collection system that inspects and audits the VAT paid by taxpayers throughout China. Its pairwise auditing of VAT invoices detects and rejects fake invoices, and VAT evaders must then pay the unpaid VAT and a fine. SAT has been very successful in reducing fake VAT invoices (Winn and Zhang, 2013). There is also a built-in mechanism that motivates firms to report the value of their transactions accurately: a firm's VAT liability equals the VAT on its sales invoices minus the VAT on its purchase invoices. Hence, firms have an incentive to ask for purchase invoices that reflect the true transaction value and VAT, as well as to provide accurate sales invoices to their buyers.
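The self-enforcing incentive described above can be illustrated with a small numerical sketch. The 17% rate (the standard rate on most goods during this period) is used for illustration only, and the firm's sales and purchases are hypothetical.

```python
def vat_liability(output_vat: float, input_vat: float) -> float:
    """VAT owed: VAT charged on sales invoices minus VAT paid on purchase invoices."""
    return output_vat - input_vat


RATE = 0.17          # illustrative rate, applied to both sales and purchases here
sales = 1_000.0      # hypothetical sales, before VAT
purchases = 600.0    # hypothetical purchased inputs, before VAT

liability = vat_liability(RATE * sales, RATE * purchases)  # equals RATE * value added

# If the purchase invoice understated the transaction by 100, the buying firm
# could deduct less input VAT and its own liability would rise by 17.
liability_underreported = vat_liability(RATE * sales, RATE * (purchases - 100.0))
```

The buyer loses deductible input VAT whenever a purchase invoice understates the transaction, which is why it has a reason to demand accurate invoices from its suppliers.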
The VAT data provided by SAT only include transactions of firms whose annual transactions are greater than or equal to 5 million yuan for merchandise producers or above 8 million yuan for other businesses (Gao et al. 2020). Hence, small firms below the annual sales threshold are not included. Many small firms pay VAT, but their share in production is limited. For example, in 2004 large firms accounted for about 90% of total manufacturing sales. Furthermore, small firms typically sell their products mainly locally. Hence, excluding small firms is unlikely to have a substantial effect on our measurement of inter-regional production networks. The VAT data we use include transactions for firms that sell agricultural and manufacturing products.
The VAT data we use report the annual sales value from firms in an industry to firms in the same or another industry. If the selling and buying firms are registered in different provinces, the transaction is inter-provincial trade; if they are located in the same province, it is internal trade of that province. By aggregating transaction values over firms, we calculate inter-provincial and intra-provincial trade. We use the VAT data for the years 2003 and 2012.17 The transaction data are used to build a matrix of inter- and intra-provincial trade flows in which each row and each column is a province-industry. This matrix of trade flows is suitable for analyzing the organization of production networks within China, but it has several limitations. First, we have no information on what is traded between firms. We are therefore not able to determine whether inter-firm trade is in intermediate products or investment inputs. For that, the provincial input-output tables are used. For each province-industry pair, the shares of intermediate use, capital formation, and final consumption are calculated. These ratios are put on the main diagonal of a matrix with zeros elsewhere, which is then multiplied with the matrix of trade flows (Gao et al. 2020). Second, transactions by wholesale and retail trade firms are reported, not the value added they generate. Hence, these firms appear large in the trade flow matrix. Most of their transactions are with final consumers and are therefore less likely to affect our analysis of production fragmentation. Third, the VAT data do not include intra-firm trade. Intermediate flows between branches of multi-plant firms expanded during the period considered (see section 2, main text). We therefore likely underestimate the expansion of China's domestic production network using VAT invoices.
16 In the 1980s, China replaced the product tax with a value added tax on manufactured goods and imports and a business tax on services. After January 2012, providers of services also started to pay VAT. Initially this was limited to Shanghai. By August 2013, a nationwide VAT was levied on several services. By May 2016, it encompassed all services (Lardy, 2019).
17 Ideally we would use VAT data for 2002, since the input-output data (described in the next subsection) are for 2002. However, VAT data are only available from 2003 onwards, after SAT completed its electronic tax management system.
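The aggregation and use-split steps described in this subsection can be sketched as follows. The text does not state whether the use ratios refer to the selling or the buying province-industry; this sketch assumes the seller's row (left-multiplying by the diagonal matrix), and right-multiplying would apply buyer ratios instead. The records, the indexing, and the ratios are hypothetical.

```python
import numpy as np

# Hypothetical VAT records: (seller province-industry, buyer province-industry, value).
# Province-industry pairs are indexed 0..n-1; here n = 3 for illustration.
records = [
    (0, 1, 50.0),
    (0, 1, 20.0),   # several firm pairs can map to the same cell; values are summed
    (1, 2, 30.0),
    (2, 0, 10.0),
    (2, 2, 40.0),   # seller and buyer in the same province-industry cell
]
n = 3

# Step 1: aggregate firm-to-firm transactions into a province-industry flow matrix T.
T = np.zeros((n, n))
for seller, buyer, value in records:
    T[seller, buyer] += value

# Step 2: keep only the intermediate-use part of each flow. Assumption: the
# intermediate-use ratio of the selling province-industry (from the provincial
# IO tables) scales its row of T.
intermediate_ratio = np.array([0.8, 0.6, 0.9])   # hypothetical
Z = np.diag(intermediate_ratio) @ T
```

Replacing `intermediate_ratio` with capital-formation or final-consumption ratios would split the same flows into the other uses in the same way.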

A.2. Input-output tables and supply and use tables by province
We obtain input-output tables at producer prices for each of China's 31 provinces from China's National Bureau of Statistics (NBS) for 2002 and 2012.18 For each province we obtain a product-by-product Input-Output Table (IOT).19 We use these IOTs to derive province-specific Supply and Use Tables (SUTs).20 SUTs can be easily combined with trade flows that are product-based (discussed in subsection A1) and with statistics on economic activities of workers that are industry-based (discussed in A4). The provincial IOTs and SUTs are used in subsection A3 to construct the inter-provincial input-output tables.
SUTs have products in their rows and industries in their columns. In a first step, we generate a supply table for each province. To do so, the column sums are estimated using industrial structure information from the annual survey of industrial production (ASIP) and provincial gross output, and the row sums of the supply table are taken from the province-specific product-by-product table. The internal structure of the supply table for each province is unknown, but NBS publishes a national supply table for 2002 and 2012. We use the structure of the national supply table to obtain an initial estimate of the internal structure and then use RAS to reconcile values with the province-specific column and row sums.21 In a second step, we derive the use table for each province from the estimated supply table and the product-by-product input-output table. The use table is obtained under the assumption that a given product is made with the same inputs, no matter which industry makes it. This is commonly known as the product-technology assumption (Miller and Blair, 2009) and is the preferred approach in the System of National Accounts 1993 (SNA 1993).
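The two steps can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: `ras` is a plain biproportional scaling loop (assuming a strictly positive initial matrix and targets with equal totals), and the 2 × 2 national structure, provincial totals, and coefficient matrix `A` are hypothetical. Under the product-technology assumption, industry j's use of product i is the sum over products k of A[i, k] times the supply V[k, j].

```python
import numpy as np


def ras(initial: np.ndarray, row_targets: np.ndarray, col_targets: np.ndarray,
        tol: float = 1e-10, max_iter: int = 1_000) -> np.ndarray:
    """Biproportional (RAS) scaling of a positive matrix to target row and column sums.

    Rows and columns are rescaled alternately until both sets of sums match.
    Row and column targets must have the same grand total.
    """
    X = initial.astype(float).copy()
    for _ in range(max_iter):
        X *= (row_targets / X.sum(axis=1))[:, np.newaxis]   # R step: match row sums
        X *= (col_targets / X.sum(axis=0))[np.newaxis, :]   # S step: match column sums
        if np.allclose(X.sum(axis=1), row_targets, rtol=0.0, atol=tol):
            return X
    raise RuntimeError("RAS did not converge")


# Step 1 (hypothetical): national supply structure, products x industries
national_supply = np.array([[90.0, 10.0],
                            [5.0, 95.0]])
product_totals = np.array([60.0, 40.0])    # row sums: province's output by product
industry_output = np.array([55.0, 45.0])   # column sums: province's output by industry
V = ras(national_supply, product_totals, industry_output)   # provincial supply table

# Step 2: product-technology assumption. A[i, k] = input of product i per unit of
# product k, regardless of which industry makes k.
A = np.array([[0.20, 0.10],
              [0.30, 0.25]])                # hypothetical product-by-product coefficients
U = A @ V                                    # provincial use table, products x industries
```

Because each product carries its own input recipe, the row sums of `U` equal `A @ product_totals`: total use of each input depends only on how much of each product the province makes, not on which industry makes it.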
China's national statistical office has been actively engaged in capacity building at provincial offices. This has helped harmonize the compilation and improve the quality of provincial IOTs. The public availability of these tables signals that NBS is more confident in their reliability. This contrasts with concerns raised about the provincial IOTs from the 1990s, which were not made publicly available (see e.g. Naughton, 2003).
A well-known issue with China's statistics is that the sum of provincial GDP exceeds the GDP of China; in 2012 it was 8.1% higher. We adjust provincial GDP proportionally so that its sum equals the GDP of China reported in the China Statistical Yearbook 2017 (NBS SY, 2017).22 We adjust province-industry gross output at producer prices by the same proportion, so value added to gross output ratios do not change. Keeping this ratio intact is relevant for measuring the domestic value added of economic activities (see eq. 6 in the main text). Final expenditure categories of provinces are also adjusted proportionally so that they match the numbers reported in the national accounts. We do not adjust import and export data, as discussed further below.23
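A minimal sketch of the proportional adjustment, assuming a single common scaling factor applied to both GDP and gross output (shown at province level for brevity; applying it to every province-industry works the same way). The provinces and figures are hypothetical.

```python
# Hypothetical figures (billion yuan); provincial GDP sums to 8% more than national GDP
provincial_gdp = {"A": 500.0, "B": 300.0, "C": 280.0}
gross_output = {"A": 1500.0, "B": 1000.0, "C": 700.0}
national_gdp = 1000.0

# One common factor, applied to both GDP (value added) and gross output
factor = national_gdp / sum(provincial_gdp.values())
adjusted_gdp = {p: x * factor for p, x in provincial_gdp.items()}
adjusted_output = {p: x * factor for p, x in gross_output.items()}


def va_ratio(gdp: dict[str, float], output: dict[str, float]) -> dict[str, float]:
    """Value added to gross output ratio by province."""
    return {p: gdp[p] / output[p] for p in gdp}
```

Because numerator and denominator are scaled by the same factor, the value added to gross output ratios are unchanged, which is the property the text relies on for measuring domestic value added.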

A.3. Inter-provincial input-output tables
The Inter-Provincial Input-Output Tables (IPIOs) developed here show how the output of a given industry in a given province is divided between intermediate use and final consumption by all other province-industries within China's domestic economy.
The IPIOs we develop for 2002 and 2012 provide data on $n_i = 39$ industries and $n_p = 31$ provinces, and a column with international exports. The basic structure of the IPIO for a given year is given in Fig. A1. The units of observation are the $n_i n_p = 1{,}209$ unique province-industry pairs. The $n_i n_p \times n_i n_p$ matrix $\mathbf{Z}$ records the flows of output for intermediate use between province-industries: the entry in row $a$ and column $b$ equals the use (in million yuan) by province-industry $b$ of intermediate inputs provided by $a$. The $n_i n_p \times 1$ vector $\mathbf{f}$ contains, for each province-industry, the output for final use in China, and the $n_i n_p \times 1$ vector $\mathbf{e}$ contains its exports abroad. Gross output for each province-industry pair is given by the $n_i n_p \times 1$ supply vector $\mathbf{s}$. Because total supply necessarily equals total intermediate and final use, the following equation has to hold:
$$\mathbf{s} = \mathbf{Z}\mathbf{1} + \mathbf{f} + \mathbf{e},$$
where $\mathbf{1}$ is an $n_i n_p \times 1$ summation vector of ones. Similarly, when we sum the intermediate inputs used in a given province-industry (the columns of $\mathbf{Z}$) and add its imports of intermediates (an element of the $1 \times n_i n_p$ vector $\mathbf{m}'$) and value added (an element of the $1 \times n_i n_p$ vector $\mathbf{v}'$), we also arrive at total output of this province-industry (an element of the $1 \times n_i n_p$ vector $\mathbf{s}'$).24 In matrix notation:
$$\mathbf{s}' = \mathbf{1}'\mathbf{Z} + \mathbf{m}' + \mathbf{v}'.$$
Value added is the sum of capital income and labor income, $\mathbf{v}' = \mathbf{v}_c' + \mathbf{v}_l'$. Labor income is split by income from business functions, which is discussed in A4. The remainder of this section describes in detail the construction of the Inter-Provincial Input-Output Tables for China in 2002 and 2012, following in chronological order the steps that were taken.
… value added by industry and final expenditures from the regional national account and industry yearbook. We use the structure of the 2012 IOTs for Tibet to obtain an initial estimate of the internal structure for 2002 and then use RAS to reconcile values with the column and row sums.
20 Many statistical offices produce IOTs on the basis of SUTs. Our approach thus appears to be a reverse-engineering process. Note, however, that …
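The two accounting identities can be checked mechanically once a table is assembled. A minimal sketch with made-up numbers for two province-industries; in the actual tables the vectors have length n_i n_p = 1,209, and the function name `ipio_gross_output` is hypothetical.

```python
import numpy as np


def ipio_gross_output(Z, f, e, m, v, tol=1e-9):
    """Gross output from both sides of an IPIO and whether the two sides agree.

    Row side:    s  = Z 1 + f + e   (intermediate use + domestic final use + exports)
    Column side: s' = 1'Z + m' + v'  (domestic inputs + imported inputs + value added)
    """
    s_rows = Z.sum(axis=1) + f + e
    s_cols = Z.sum(axis=0) + m + v
    return s_rows, np.allclose(s_rows, s_cols, rtol=0.0, atol=tol)


# Hypothetical table for 2 province-industries (million yuan)
Z = np.array([[10.0, 20.0],
              [5.0, 15.0]])
f = np.array([50.0, 30.0])      # final use in China
e = np.array([20.0, 10.0])      # exports abroad
m = np.array([10.0, 5.0])       # imported intermediates
v_c = np.array([30.0, 8.0])     # capital income
v_l = np.array([45.0, 12.0])    # labor income
v = v_c + v_l                   # value added = capital income + labor income

s, balanced = ipio_gross_output(Z, f, e, m, v)
```

Both sides give gross output of 100 and 60 here; an imbalance would show up as a gap between `s_rows` and `s_cols`, similar to the 'error' column discussed in step 2.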
Step 1. We obtain IPIOs from the China State Information Center for 2002 and 2012. These tables serve as the starting point and are further developed as described below. The construction of the 2002 IPIO is described in . The 2012 IPIO was received via personal communication and will be officially released in the near future. The internal structure of these IPIOs, the matrix $Z$ (see Fig. A1), is estimated from the coefficients of gravity models for eight commodities based on rail transportation data . We replace the matrix $Z$ with VAT data that directly measure inter-firm transactions, as described in Step 5. The vector of exports $e$ (dimension $n_i n_p \times 1$) for 2002 is compiled by the China State Information Center using customs data. The export vector for 2012 is taken directly from the provincial Input-Output Tables (see subsection A2). The export data in the provincial IOTs follow the guidelines of the System of National Accounts 1993 and include exports that do not change ownership. 25 These pure processing exports are relevant for provinces with export processing zones, and including them allows us to analyze the domestic production network for all of China's exports. In the steps that follow, we do not adjust the vector of exports $e$.
Step 2. The IPIOs from the China State Information Center have a column called 'error', which arises from an imbalance between supply and demand. 26 We distribute the error across the final demand categories, except for exports, in proportion to the share of each final demand category in total final demand. 27 If final demand is zero or becomes negative, the error is allocated to changes in inventories and valuables. This affects final demand $f$ but not exports $e$, and is therefore inconsequential for our analysis of China's domestic production network engaged in exporting.
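A minimal sketch of the Step 2 allocation for a single row of the table, assuming the row's domestic final demand is held in a dictionary keyed by category (a hypothetical layout; exports are simply left out of it so they stay fixed). It also assumes one reading of the fallback rule: the whole error goes to inventories when total final demand is zero or when the proportional split would turn any category negative.

```python
def distribute_error(final_demand, error, inventory_key="inventories"):
    """Spread one row's 'error' over its domestic final-demand categories.

    final_demand: dict category -> value for one IPIO row, exports excluded.
    Returns a new dict; the row total rises by exactly `error`.
    """
    total = sum(final_demand.values())
    if total > 0:
        # Proportional allocation by each category's share of final demand.
        adjusted = {k: v + error * v / total for k, v in final_demand.items()}
        if all(v >= 0 for v in adjusted.values()):
            return adjusted
    # Fallback (assumed reading): final demand is zero or would turn
    # negative, so the full error is booked as a change in inventories.
    adjusted = dict(final_demand)
    adjusted[inventory_key] = adjusted.get(inventory_key, 0.0) + error
    return adjusted


row = {"households": 60.0, "government": 20.0, "investment": 20.0, "inventories": 0.0}
print(distribute_error(row, 10.0))
# {'households': 66.0, 'government': 22.0, 'investment': 22.0, 'inventories': 0.0}
```

Because exports never enter the dictionary, the vector $e$ is untouched, which is the property the analysis of the export-related production network relies on.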
The 2002 IPIO distinguishes transport services, warehousing services, and wholesale and retail trade, but groups all other services together. Hence, it provides data for 29 sectors of the economy, compared with 42 sectors in the IPIO for 2012. We disaggregated the sector 'other services' in the 2002 IPIO in order to distinguish 42 sectors. For the disaggregation we used the shares from the provincial IOTs and assumed no trade in intermediates in these services sectors.
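The share-based split of 'other services' can be sketched as below. This is a minimal illustration, assuming each entry is divided by the sub-sector shares that the province's own IOT reports for 'other services'; the function name and the sub-sector labels are hypothetical.

```python
def split_other_services(value, provincial_shares, tol=1e-9):
    """Split one 'other services' entry of the 2002 IPIO into sub-sectors.

    provincial_shares: hypothetical mapping sub-sector -> share of that
    sub-sector in 'other services' in the province's own IOT.
    """
    if abs(sum(provincial_shares.values()) - 1.0) > tol:
        raise ValueError("sub-sector shares must sum to one")
    # Each sub-sector receives its share of the aggregate entry.
    return {sector: value * share for sector, share in provincial_shares.items()}


shares = {"education": 0.25, "health": 0.25, "public administration": 0.5}
print(split_other_services(200.0, shares))
# {'education': 50.0, 'health': 50.0, 'public administration': 100.0}
```

Checking that the shares sum to one guarantees that the split leaves the aggregate total unchanged.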
Step 3. The IPIOs from the China State Information Center are product-by-product tables. We transform them into industry-by-industry tables. This is needed because the inter-firm transactions (see subsection A1) and the location of economic activities (see subsection A4) are both recorded by industry.
We have the supply table for each province and use it to calculate the market share matrix. Consider the transposed supply table for province $p$, $(S^p)'$, in which columns refer to products and rows to industries. Each row shows the product-specific output of an industry, and each column shows the output of a specific product by each industry. Summing each column gives a row vector $s^p$ whose element $s^p_i$ is the output of good $i$ produced in province $p$. We obtain the market share matrix $K^p$ for province $p$ as follows:
$$K^p = (S^p)'\,(\hat{s}^p)^{-1}$$
where $\hat{s}^p$ is the diagonal matrix with the elements of $s^p$ on its diagonal, so that element $(j, i)$ of $K^p$ is the share of industry $j$ in the province's supply of product $i$.
In a first transformation we use the fixed product sales structure assumption, which assumes that each product has its own sales structure irrespective of the industry where it is produced. The product-by-product intermediate delivery matrix between any two provinces $p$ and $q$, $Z^{pq}_{\text{prod}\times\text{prod}}$, can then be transformed into an industry-by-product matrix:
$$Z^{pq}_{\text{ind}\times\text{prod}} = K^p\, Z^{pq}_{\text{prod}\times\text{prod}}$$
Each column of $Z^{pq}_{\text{ind}\times\text{prod}}$ gives the intermediate inputs from industries in province $p$ used in the production of a product in province $q$.
24 A prime ' denotes transposition.
25 Note that exports in the national IOTs for China exclude exports that do not change ownership. The national IOTs follow the guidelines of the System of National Accounts 2008. This was confirmed in personal communication with the China State Information Center. For 2012, the sum of exports from the provincial IOTs, which includes the exports that do not change ownership, is 6.5% higher than total exports reported in the national IOT.
26 The tables are balanced by: Intermediate Inputs + Final Demand + Error = Gross Output + Imports.
27 We work with a final demand matrix $F$ of dimension $n_i n_p \times n_p$. In the final step, we sum over its columns to arrive at the final demand vector $f$ of dimension $n_i n_p \times 1$.
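The market share matrix and the first (fixed product sales structure) transformation can be sketched on a toy two-product, two-industry province. This is a sketch, assuming $K^p$ is obtained by dividing each column of the transposed supply table by that product's total output, so that $K^p[j][i]$ is industry $j$'s share in the supply of product $i$; the numbers are hypothetical.

```python
def matmul(A, B):
    """Plain-Python matrix product A @ B."""
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def market_share_matrix(supply_t):
    """K^p from the transposed supply table (S^p)' (rows: industries, cols: products).

    Each column is divided by its total, the output of that product in the
    province, so column k of K^p holds the industries' shares in product k.
    """
    n_ind, n_prod = len(supply_t), len(supply_t[0])
    col_tot = [sum(supply_t[j][k] for j in range(n_ind)) for k in range(n_prod)]
    return [[supply_t[j][k] / col_tot[k] if col_tot[k] else 0.0 for k in range(n_prod)]
            for j in range(n_ind)]


# Toy province p: industry 0 makes mostly product 0, industry 1 mostly product 1.
K_p = market_share_matrix([[80.0, 10.0], [20.0, 90.0]])
Z_prod_prod = [[10.0, 20.0], [30.0, 40.0]]   # deliveries p -> q, product by product
Z_ind_prod = matmul(K_p, Z_prod_prod)        # rows re-expressed by supplying industry
print(K_p)         # approx. [[0.8, 0.1], [0.2, 0.9]]
print(Z_ind_prod)  # approx. [[11.0, 20.0], [29.0, 40.0]]
```

Since every column of $K^p$ sums to one, the column totals of the delivery matrix (here 40 and 60) are unchanged by the transformation.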
In a second transformation we use the product technology assumption, which assumes that each product is produced in its own specific way irrespective of the industry where it is produced. Under this assumption, the intermediate inputs of a specific industry are a weighted sum of the intermediate inputs used in the production of each product, with weights equal to the market shares of that industry in the supply of each product. The industry-by-product intermediate delivery matrix between any two provinces $p$ and $q$ can then be transformed into an industry-by-industry intermediate delivery matrix:
$$Z^{pq}_{\text{ind}\times\text{ind}} = Z^{pq}_{\text{ind}\times\text{prod}}\,(K^q)'$$
We do this for each of the intermediate delivery matrices between province pairs. The same approach is applied to transform final demand and value added.
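The second transformation can be sketched the same way: post-multiplying by the transposed market share matrix of the using province $q$ turns product columns into industry columns. This is a minimal illustration with hypothetical numbers, assuming $K^q$ is built like $K^p$ (columns summing to one).

```python
def matmul(A, B):
    """Plain-Python matrix product A @ B."""
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(col) for col in zip(*A)]


K_q = [[0.8, 0.1], [0.2, 0.9]]          # market shares in using province q
Z_ind_prod = [[11.0, 20.0], [29.0, 40.0]]
# Column j of the result is sum_i K_q[j][i] * (inputs into product i).
Z_ind_ind = matmul(Z_ind_prod, transpose(K_q))
print(Z_ind_ind)  # approx. [[10.8, 20.2], [27.2, 41.8]]
```

Because each product's output is fully allocated across industries, row totals (31 and 69 here) are preserved, so the deliveries of each supplying industry are unchanged.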
Step 4. For each IPIO, we replace the vectors of value added and gross output, $v$ and $s$, with values from the provincial IOTs that are consistent with GDP reported in the NBS Statistical Yearbook 2017 (see subsection A2). We also replace the totals of the final demand categories with those from the provincial IOTs, except for exports. We then balance the IPIOs using the bi-proportional RAS technique, keeping the export vector $e$ fixed.
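A generic bi-proportional (RAS) routine is sketched below. It is a textbook version, not the authors' code: which blocks of the IPIO are scaled is an assumption left to the caller, and keeping $e$ fixed would mean leaving exports out of the scaled matrix and netting them out of the row targets beforehand.

```python
def ras(Z, row_targets, col_targets, tol=1e-10, max_iter=1000):
    """Bi-proportional (RAS) balancing of a non-negative matrix.

    Alternately rescales rows (R step) and columns (S step) of a copy of Z until
    its row sums match row_targets and its column sums match col_targets.
    The two target vectors must have the same total.
    """
    if abs(sum(row_targets) - sum(col_targets)) > 1e-6 * max(1.0, abs(sum(row_targets))):
        raise ValueError("row and column targets must have equal totals")
    X = [[float(x) for x in row] for row in Z]
    n, m = len(X), len(X[0])
    for _ in range(max_iter):
        for i in range(n):                      # R step: match row sums
            rs = sum(X[i])
            if rs > 0:
                X[i] = [x * row_targets[i] / rs for x in X[i]]
        for j in range(m):                      # S step: match column sums
            cs = sum(X[i][j] for i in range(n))
            if cs > 0:
                for i in range(n):
                    X[i][j] *= col_targets[j] / cs
        # Columns are exact after the S step, so convergence is checked on rows.
        if all(abs(sum(X[i]) - row_targets[i]) <= tol for i in range(n)):
            return X
    raise RuntimeError("RAS did not converge")


X = ras([[10.0, 20.0], [30.0, 40.0]], row_targets=[40.0, 60.0], col_targets=[50.0, 50.0])
print([round(sum(r), 6) for r in X])  # [40.0, 60.0]
```

RAS preserves zeros and signs of the initial matrix, which is why it is a common choice for reconciling a prior structure with new margins.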
Step 5. Subsection A1 describes the VAT data used to measure inter- and intra-provincial trade flows. We use the VAT data to replace the matrix $Z$ in the IPIOs and again balance using the bi-proportional RAS technique, keeping the export vector $e$ fixed.
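When the VAT-based matrix is rebalanced, the margins it must hit follow from the two accounting identities $s = Z\mathbf{1} + f + e$ and $s' = \mathbf{1}'Z + m' + v'$. The sketch below derives them, assuming (hypothetically) that $f$, $m$ and $v$ are held at their Step 4 values so that only $Z$ is rescaled; the function name is illustrative.

```python
def z_targets(s, f, e, m, v, tol=1e-6):
    """Row and column targets for the intermediate block Z, with exports e fixed.

    rows:    Z 1 = s - f - e     (output not going to final use or exports)
    columns: 1'Z = s' - m' - v'  (output not paid to imports or value added)
    """
    rows = [si - fi - ei for si, fi, ei in zip(s, f, e)]
    cols = [si - mi - vi for si, mi, vi in zip(s, m, v)]
    # Both margins sum to total domestic intermediate use, which requires
    # total final use + exports == total imports + value added.
    if abs(sum(rows) - sum(cols)) > tol:
        raise ValueError("targets inconsistent: f + e totals differ from m + v totals")
    return rows, cols


rows, cols = z_targets(s=[100.0, 60.0], f=[50.0, 30.0], e=[20.0, 10.0],
                       m=[5.0, 5.0], v=[80.0, 20.0])
print(rows, cols)  # [30.0, 20.0] [15.0, 35.0]
```

These two vectors, together with the VAT-based $Z$ as the starting matrix, are what a RAS routine would take as input.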
Step 6. The sector classification in the IPIO 2002 differs slightly from that in the IPIO 2012. We aggregate the IPIOs to a common classification of 39 industries, reported in Table A1. In the previous steps, we worked with a final demand matrix $F$ of dimension $n_i n_p \times n_p$. We sum over its columns to arrive at the final demand vector $f$ of dimension $n_i n_p \times 1$.
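Step 6 can be sketched with a concordance that maps each detailed sector to a common sector: summing flows over all detailed pairs that map to the same common pair is equivalent to $C Z C'$ for a 0/1 concordance matrix $C$. The mapping below is hypothetical (the actual one is in Table A1), and for simplicity it acts on a single block rather than on the full province-industry matrix.

```python
def aggregate(Z, mapping, n_common):
    """Aggregate a square flow matrix to a common classification (C Z C').

    mapping: list where mapping[d] is the common sector of detailed sector d.
    """
    A = [[0.0] * n_common for _ in range(n_common)]
    for a, row in enumerate(Z):
        for b, x in enumerate(row):
            A[mapping[a]][mapping[b]] += x
    return A


def sum_columns(F):
    """Collapse the final demand matrix F (rows: province-industries,
    columns: destination provinces) into the vector f."""
    return [sum(row) for row in F]


Z = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(aggregate(Z, mapping=[0, 0, 1], n_common=2))  # [[12.0, 9.0], [15.0, 9.0]]
print(sum_columns([[1.0, 2.0], [3.0, 4.0]]))        # [3.0, 7.0]
```

Because every detailed sector maps to exactly one common sector, aggregation preserves the grand total of the flows.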
This completes the development of China's Inter-Provincial Input-Output Tables for 2002 and 2012. Key characteristics of the tables are that they are consistent with the national accounts, are in industry-by-industry format, and record deliveries of intermediates between province-industries based on VAT data.

Table A1
Common industry classification for China's Inter-Provincial Input-Output Tables

A.4. Labor income shares by activity
Labor income, the vector $v_l'$, is split into labor income by economic activity. Our approach closely follows . We distinguish between four activities or functions: production, R&D, marketing, and other. The labor income of a particular function is measured by the income of the domestic workers who carry out this function. We characterize the workers involved in a function by their occupation.
Our primary data source for information on the type of workers and their distribution over province-industry pairs is the 2000 and 2010 China Population Census. We obtained access to the 0.1% samples of the censuses, with approximately 1.2 million observations for 2000 and 1.3 million for 2010. 28 For each observation we know where the individual lives, her occupation, and the industry in which she is employed. We use this information to measure occupational employment shares by province-industry pair. We could measure these shares for all 1,209 unique province-industry pairs in the IPIOs, but this would leave few observations per cell. 29 We therefore measure occupational employment shares for each of the $n_p = 31$ provinces by 9 broad sectors. 30 Shares are assumed to be equal across the more disaggregated sub-sectors within a broad sector. 31 To measure wages by occupation, we use the 2002 and 2013 China Household Income Project (CHIP) surveys (Li and Sicular, 2014). The survey is collected and compiled by the Chinese Academy of Social Sciences based on a representative sample provided by NBS. The survey data include a range of individual and household characteristics and information on income. We use information on relative income by broad (1-digit) occupational group. For example, in 2013 the income of managers (census codes 101-105, see column 2 in Table A2) is on average 57% higher than that of production workers. We combine relative wages with occupational employment shares by province-industry to derive labor income shares by province-industry-year-occupation. 32 We map the labor income shares of the 66 occupations to four business functions: production, R&D and technology development (abbreviated R&D), sales and distribution activities (Marketing), and other support activities. 33 The mapping is exhaustive and shown in column 4 of Table A2.
28 Sampling weights are not provided. This may introduce bias in the business function shares. From NBS, China's statistical office, we obtained industry-by-occupation data for China as a whole (not by province), which NBS tabulated from 10% samples of the 2000 and 2010 population censuses. We compared the shares from the data provided by NBS to the shares based on the 0.1% sample. The shares for China as a whole are very similar, and the correlation at the industry-business function level is also high, ranging from 0.73 (employment share in other activities by industry in 2010) to 0.995 (employment share in marketing activities by industry in 2000).
29 1.3 million observations divided by 1,209 province-industries times 66 occupations is about 16 observations per cell.
30 The 9 broad sectors and their mapping to the 39 industries in the IPIOs (see Appendix Table A1) are: Agriculture (IPIO industry number 1); Mining (IPIO numbers 2-5); Light manufacturing (6-10); Chemicals, metals, and other manufacturing (11-16; 21-22); Machinery, electronics and transport equipment manufacturing (17-20); Utilities and construction (23-26); Hotels, restaurants, and distributive trade (27-29); Finance and business services (30-34); Other services (35-39).
31 The approach to measuring shares at a more aggregated level is similar to that described in O'Mahony and Timmer (2009), who used labor force surveys to infer skill shares by country-industry.
Classification of Occupations (ISCO), which categorizes workers by level of skill. Because workers are categorized by area of expertise, we cannot distinguish between skilled and unskilled production activities (e.g. between assemblers and machine engineers of electronic products, see census code 608 in Appendix Table A2).
Notes: This table presents network metrics for all manufactured products combined, and by product.
The metrics are based on inter-provincial domestic value added flows related to China's exports, using the inter-provincial input-output tables for 2002 and 2012.
Notes: Values of 1 or higher indicate specialization in a function (FS ≥ 1). Calculated according to Eq. (7), comparing functional income shares in exports of all goods and services by a province to the same shares for all provinces in China. RD refers to R&D; FAB to production; MAR to sales and marketing; OTH to other support activities. Source: Authors' calculations.
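The labor income split described in subsection A.4 can be sketched for one province-industry cell: each occupation's wage bill is proportional to its employment share times its relative wage, and the resulting income shares are summed by business function. The occupation names, relative wages and occupation-to-function mapping below are hypothetical stand-ins for the 66 census occupations and column 4 of Table A2. The second function is only a reading of the table note (a province's functional shares divided by China's), since Eq. (7) itself is not reproduced here.

```python
# Hypothetical occupation -> business function mapping (the paper's mapping
# of 66 census occupations is in column 4 of Table A2).
FUNCTION_OF = {
    "assemblers": "production",
    "engineers": "R&D",
    "salespersons": "marketing",
    "clerks": "other",
}


def function_income(v_l, emp_shares, rel_wages, function_of=FUNCTION_OF):
    """Split labor income v_l of one province-industry cell by business function.

    emp_shares: occupation -> employment share in the cell (census-based).
    rel_wages:  occupation -> wage relative to a base group (CHIP-based).
    """
    # Wage bill of each occupation, up to a common scale factor.
    bill = {o: emp_shares[o] * rel_wages[o] for o in emp_shares}
    total = sum(bill.values())
    out = {}
    for o, b in bill.items():
        f = function_of[o]
        out[f] = out.get(f, 0.0) + v_l * b / total
    return out


def functional_specialization(province_shares, china_shares):
    """Ratio of a province's functional income shares to China's (>= 1: specialized)."""
    return {f: province_shares[f] / china_shares[f] for f in province_shares}


income = function_income(
    120.0,
    emp_shares={"assemblers": 0.5, "engineers": 0.1, "salespersons": 0.2, "clerks": 0.2},
    rel_wages={"assemblers": 1.0, "engineers": 2.0, "salespersons": 1.5, "clerks": 1.0},
)
print({k: round(x, 6) for k, x in income.items()})
# {'production': 50.0, 'R&D': 20.0, 'marketing': 30.0, 'other': 20.0}
```

Applying the same shares to every sub-sector of a broad sector mirrors the assumption that occupational shares are equal within the 9 broad sectors.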