Fiscal decentralization in Poland: 2004-2019 municipal and city dataset

This dataset covers 2476–2479 Polish municipalities and cities (depending on the year) over the period from 2004, when Poland joined the EU, to 2019, the last year before the COVID-19 pandemic. The 113 yearly panel variables include budgetary, electoral competitiveness, and European Union funded investment drive data. While the dataset has been created from publicly available sources, their use requires advanced knowledge of budgetary data and their classification, as well as data gathering, merging, and cleaning, which required many hours of work over a year. Fiscal variables were created from raw data comprising over 25 million subcentral government records. They were sourced from Rb27s (revenue), Rb28s (expenditure), RbNDS (balance), and RbZtd (debt) forms, which are reported quarterly by all subcentral governments to the Ministry of Finance. These data were aggregated according to the governmental budgetary classification keys into ready-to-use variables. Furthermore, these data were used to create original proxy variables for EU-financed local investment drives, based on large investments in general and in sports facilities in particular. Moreover, subcentral electoral data from 2002, 2006, 2010, 2014, and 2018 were sourced from the National Electoral Commission, mapped, cleaned, merged, and used to create original electoral competitiveness variables. This dataset can be used to model different aspects of fiscal decentralization, political budget cycles, and EU-funded investment in a large sample of local government units.



Value of the Data
• Revenue decentralization is the share of public revenues collected by subcentral governments. It plays a key role in the design of fiscal frameworks. It has been shown to affect public expenditure [1], fiscal balance [3], indebtedness [5], public sector productivity [4], political budget cycles [7], and GDP per capita [2]. Despite their usefulness, individual municipal and city level data are not readily available for econometric analysis. Their compilation required a complex database construction procedure, which took many work hours over a year. The resulting database consists of ready-to-use variables, e.g., fiscal data aggregated according to the Ministry of Finance aggregation keys, as well as original indicators of European-Union-financed public investment drives and electoral competitiveness proxies.
• Poland is a unitary country, which makes its local governments institutionally very similar (there are no major differences like those between American states or German Länder). As a result, Polish data make it possible to study fiscal decentralization in a large sample of local units.
• Central European countries like Poland are unique in that they have a markedly different model of revenue decentralization than most OECD member states, relying much less on local taxes and more on taxes shared with the central government. Our data thus allow the study of a much discussed question in the literature, namely whether shared taxes should be treated as own revenue or as vertical transfers [6, 8].
• These data can be used by academic economists, non-governmental organizations, and governmental agencies to inform research and policy regarding local government revenue autonomy, fiscal discipline, European-Union-financed investment drives, political budget cycles, and more. The dataset will also be of interest to international researchers who want to diversify their country sample but find it difficult to access national Polish data.
• This dataset can be used to develop and analyze revenue decentralization reforms in Poland and countries with similar fiscal frameworks, as well as draw new insights from the interplay of fiscal decentralization, fiscal sustainability, EU-financed investment, public investment drives, and political budget cycles.

Objective
While Poland is regarded as a successful case of fiscal decentralization [10, 11], OECD Tax Autonomy Indicators [9] show that it is lagging behind other OECD economies in terms of sub-central government revenue autonomy. National data were collected to better understand this phenomenon, as well as to draw general conclusions on the impact of local government revenue autonomy on fiscal sustainability, European-Union-financed investment drives, and political budget cycles. For this reason, data collection has been limited to municipalities and cities, as these are the only local units with popularly elected executives and at least some discretion over tax rates and reliefs. The time frame has been chosen to avoid institutional breakpoints, starting in 2004, when Poland joined the European Union, and ending in 2019, before the COVID-19 pandemic outbreak.

Data Description
The dataset is available in a single Excel spreadsheet (PL_localgov_2004-2019.xlsx). The file includes three sheets that contain: (1) the dataset, which comprises 113 variables with almost 40 thousand unit-year observations, (2) descriptions of each variable in English and Polish, including MF budgetary classification codes and sources, and (3) the official Ministry of Finance budgetary data aggregation key. These sheets are summarized in Table 1. The first sheet in the Excel file, Variables, contains 113 variables, which are described below. They are clustered in Table 3 into 11 groups.
First, the IDENTIFIERS group contains unique unit identifiers, unit names, unit types, year, and other variables that are helpful for categorizing observations.
Second, the REVENUE group consists of revenues aggregated according to the Ministry of Finance methodology. They disaggregate revenue into own revenue, earmarked grants, and the general grant, along with their main categories.
Third, the EXPENDITURE group comprises expenditures aggregated according to the Ministry of Finance methodology. They disaggregate expenditures into capital and current expenditure. The former includes investment expenditure, while the latter consists of wages and salaries, subsidies, debt servicing expenses, sureties and guarantees, social benefits, and other expenses.
Fourth, the DEBT group includes fiscal balance and debt variables. Debt is disaggregated into securities, credits and loans, deposits, and matured liabilities.
Fifth, the PLAN group encompasses PIT and CIT revenues for the entire year as planned in Q1. These can be compared with actually executed PIT and CIT revenues, e.g., as a gauge of uncertainty.
Sixth, the LOCAL TAXES group includes variables for the nine local taxes whose rates are determined by municipalities and cities (up to a maximum level set by the central government), namely revenues, effects of lowering the top rate, and effects of granting additional reliefs and exemptions. These can be helpful when studying revenue decentralization or tax competition.
Seventh, the FAMILY 500+ group consists of central government grants and local government expenses for the "Family 500+" program, a child benefit that was introduced in 2016 and gradually expanded. The program was introduced in April 2016 as a monthly benefit of 500 PLN for the second, third, and each subsequent child; in families with incomes per person below 800 PLN it was paid for the first child as well. In the following year its fiscal cost grew, because it was paid out for the whole year. In July 2019 the income criterion for the first child was removed. The program is so large in fiscal terms that it may substantially affect, e.g., shares of own revenue in total revenue. For this reason it may be necessary to control for it in regressions, although these data can also be used to study in which regions the program is concentrated.
Eighth, the NON-FISCAL group contains non-fiscal variables from Statistics Poland, which include population, population density, age group shares, unemployment, employment, and apartment stock. These are typically used as control variables when working with Polish local data.
Ninth, the PRICES group includes only one variable, the GDP deflator. It can be used for converting fiscal variables from current prices in national currency into constant 2015 prices.
Tenth, the ELECTORAL group comprises variables constructed by the authors for measuring incumbency and electoral competitiveness in elections of local executives.
Eleventh, the INV_DRIVES group consists of variables constructed by the authors for measuring local public investment drives funded with European Union grants.
Most of the dataset consists of fiscal data sourced from the Ministry of Finance databases listed in Table 2. These data are provided in national currency in current prices. They can, however, easily be converted into, e.g., constant prices, per capita values, or percentage points of revenue. They are given on a cash basis; unfortunately, there is no way to convert them to an accrual basis. Table 4 lists the main fiscal variables and their aggregation keys. Table 5 provides basic descriptive statistics for selected fiscal variables. As an example, Fig. 1 shows the shares of the main municipal and city revenue categories for 2019.
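The conversions mentioned above can be sketched in a few lines of Python. This is a minimal illustration only: it assumes the GDP deflator is expressed as an index with 2015 = 100, and the field names and numbers in the example record are hypothetical rather than taken from the dataset.

```python
def to_constant_prices(nominal, deflator, base_deflator=100.0):
    """Convert a current-price amount into base-year prices.

    Assumes the deflator is an index equal to base_deflator in the base year
    (here: 2015 = 100, which is an assumption about the index scale).
    """
    return nominal * base_deflator / deflator


def per_capita(amount, population):
    """Express an amount per inhabitant."""
    return amount / population


def share_of_revenue(amount, total_revenue):
    """Express an amount in percent of total revenue."""
    return 100.0 * amount / total_revenue


# Hypothetical unit-year record; names and values are illustrative only.
row = {
    "own_revenue": 12_000_000.0,
    "total_revenue": 30_000_000.0,
    "population": 15_000,
    "deflator": 104.0,
}

real_own_revenue = to_constant_prices(row["own_revenue"], row["deflator"])
own_revenue_share = share_of_revenue(row["own_revenue"], row["total_revenue"])
revenue_per_capita = per_capita(row["total_revenue"], row["population"])
```

Shares of revenue do not need deflating, as numerator and denominator are in the same year's prices; only level variables compared across years require the deflator.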
Electoral variables are constructed based on National Electoral Commission data for the 2006, 2010, 2014, and 2018 local elections. They include a subcentral election year dummy and a set of electoral competitiveness proxies. These proxies are the number of candidates running for office, whether the incumbent runs for re-election, whether the incumbent runs for re-election in alignment with the central government, whether the incumbent is the only candidate, and whether the incumbent wins in the first round of voting. Electoral competitiveness variables take the same values for entire electoral terms, e.g., if the incumbent runs for re-election in 2010, the dummy takes the value of 1 in the years 2007-2010. Variables are described in Table 6.
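The term-wide coding rule can be illustrated with a short Python sketch that maps each panel year to the election closing its term. The function names and the example dictionary are hypothetical; the treatment of 2019, for which no subsequent election is covered, is an assumption (the sketch returns None there).

```python
from bisect import bisect_left

# Local election years covered by the electoral variables.
ELECTION_YEARS = [2006, 2010, 2014, 2018]


def next_election(year, election_years=ELECTION_YEARS):
    """Return the first election year >= year, or None if none is covered."""
    i = bisect_left(election_years, year)
    return election_years[i] if i < len(election_years) else None


def term_dummy(year, incumbent_runs):
    """Value of a term-wide dummy in a given panel year.

    incumbent_runs maps an election year to True/False for one unit.
    Returns 1 or 0, or None when the term's closing election is not covered
    (an assumption of this sketch, not a documented rule of the dataset).
    """
    election = next_election(year)
    if election is None or election not in incumbent_runs:
        return None
    return int(incumbent_runs[election])


# Hypothetical unit: the incumbent ran in 2010 and 2014 only.
runs = {2006: False, 2010: True, 2014: True, 2018: False}
dummies_2007_2010 = [term_dummy(y, runs) for y in range(2007, 2011)]
```

With this mapping, all four years 2007-2010 inherit the value observed at the 2010 election, which matches the example given in the text.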
Local investment drive variables are constructed based on Ministry of Finance data. Unfortunately, the budgetary classification does not provide the standard statistical office division of gross fixed capital formation into buildings, vehicles, intangible assets, etc. Instead, the constructed variables measure European Union funded investment drives in general and in sports facilities in particular, as the latter is a category often invoked in popular discourse as an example of local malinvestment. Each variable is designed so that it is triggered by particularly large EU-funded investments (in general or in sports facilities), accumulates with subsequent investments, and continues to weigh on the current-year budget in the following years. In particular:
1. The variable is triggered for a municipality or city when in year t investments from EU funds exceed X percent of revenue. In year t the variable takes the value of EU-funded investment from year t in percentage points of revenue from year t.
2. In year t + 1 the variable takes the value of EU-funded investments from years t and t + 1 in percentage points of revenue from year t + 1.
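The two steps listed above can be sketched as follows. This is a hedged illustration rather than the authors' exact procedure: the threshold X is left as a parameter, the variable is assumed to equal zero before it is triggered, and accumulation is assumed to continue in all later years without a cutoff, because only the first two steps are given here.

```python
def investment_drive(eu_investment, revenue, threshold_pct):
    """Sketch of an EU-funded investment drive variable for one unit.

    eu_investment, revenue: lists of yearly amounts in the same year order.
    threshold_pct: the trigger level X, in percent of revenue (hypothetical).

    Assumptions of this sketch: the value is 0.0 before the trigger, and
    after the trigger all EU-funded investment keeps accumulating.
    """
    drive = []
    cumulative = 0.0
    triggered = False
    for inv, rev in zip(eu_investment, revenue):
        # Step 1: trigger when EU-funded investment exceeds X percent of revenue.
        if not triggered and 100.0 * inv / rev > threshold_pct:
            triggered = True
        if triggered:
            # Step 2: accumulate investment since the trigger year and
            # express it relative to the current year's revenue.
            cumulative += inv
            drive.append(100.0 * cumulative / rev)
        else:
            drive.append(0.0)
    return drive


# Hypothetical three-year series for a single municipality.
example = investment_drive([1.0, 20.0, 5.0], [100.0, 100.0, 125.0], threshold_pct=10.0)
```

In the example, the first year stays at zero (1 percent of revenue is below the threshold), the second year triggers the variable at 20, and the third year divides the accumulated 25 by that year's revenue of 125.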

Experimental Design, Materials and Methods
The data were compiled following a review of local fiscal data provided by the Ministry of Finance, local election data provided by the National Electoral Commission, and general local data provided by the Statistics Poland Local Data Bank (Bank Danych Lokalnych, BDL). Below, the characteristics of the primary data sources are described first, and the procedure for constructing the dataset from these sources is outlined second.

Primary Data Sources Characteristics
Forms Rb27s (revenue), Rb28s (expenditure), RbNDS (balance), and RbZtd (debt), which all local governments report quarterly to the Ministry of Finance. Subcentral governments of various tiers submit many more forms to the Ministry of Finance, with different frequencies and periods of availability, so all of these had to be mapped before the four forms finally used were identified. Ministry of Finance employees were consulted for assistance, as little explanatory documentation is publicly available. Data were sourced from the DBF databases published quarterly on the Ministry of Finance website. Databases from these forms contain very granular fiscal data with roughly 237 types of revenues (e.g., personal income tax, general grant, exchange rate gains) and 282 kinds of expenditures (e.g., various investment, compensation, and other expenditures), classified into 10 types of domestic and foreign funding sources, 34 broad sectors (e.g., tourism, public administration, family), and 878 subsectors (e.g., foreign aid, Family 500+ childcare benefit, National Science Center). The challenge is the correct aggregation of these data, especially as their classification changes constantly, even several times a year.
These data have some drawbacks, however. First, all data are presented on a pure cash basis, are not consolidated, and are not in line with ESA2010 methodology. Statistics Poland converts these to an accrual basis with the algorithm $\text{Claims}_t - \text{Total outstanding claims}_{t-1} - \text{Overpayments}_{t-1}$, but the necessary data were not publicly available, so the constructed dataset remains on a pure cash basis. Therefore the fiscal balance provided by these data is referred to as the "working balance" in ESA2010 documents.
Second, these data include local government budgetary institutions only, which leaves out roughly 59,307 other units that are included in the ESA2010 local government sector in Poland. These units are 2 :
National Electoral Commission data. Local election candidates and results, published (and partly provided on request) in spreadsheets by the National Electoral Commission. The challenge is connecting data from subsequent elections, as each is presented in a different spreadsheet format. There tend to be 1-3 spreadsheets per election, with electoral rounds presented in separate files and layouts. Typos and other small errors in the results also need to be accounted for.
Statistics Poland data. Population, employment, housing, and other data from the Statistics Poland BDL online database. This is a relatively easy-to-use database that can export many units and years of data into the same spreadsheet. Three major drawbacks remain, however. First, Statistics Poland does not seem to account well for internal and even external migration, which could skew the population data. Second, employment and wage data do not cover the entire labor market: employment covers 10 million people in the public sector and in enterprises with 10 or more workers, out of roughly 16 million workers. Third, unemployment data cover only persons who register with the state as unemployed, instead of the better and more commonly used Labour Force Survey data, which are not available at the local level.
Constant municipal and city IDs from the BeSTi@ API of the Ministry of Finance. While each quarterly set of DBF databases published on the Ministry of Finance website contains a file listing all units, these can change from year to year with border changes or reclassifications, so constant identifiers were necessary. This is a good source, as it contains all revisions of subcentral government units ("Jednostki Samorządu Terytorialnego") with their constant IDs.
Budgetary classification codes from the BeSTi@ API of the Ministry of Finance. This is a good source, as it takes into account all the past methodology revisions. In particular, budget classification paragraphs ("Paragrafy klasyfikacji budżetowej", e.g., PIT revenue), financing sources of budget classification paragraphs ("Finansowania Paragrafu klasyfikacji budżetowej", e.g., non-returnable sources from European Union programs), and budget classification subsections ("Rozdziały klasyfikacji budżetowej", e.g., the "Rodzina 500+" child benefit) were downloaded.
Keys for fiscal data aggregation provided on request by the Ministry of Finance. These were necessary in order to follow the official classification of investment, debt servicing costs, and other aggregates.
Eurostat data. This is a very transparent and easy-to-use database, but it contains only budgetary data aggregated into general government and its subsectors in accordance with ESA2010 methodology, rather than data for individual units, so only GDP deflators were used.

Dataset Construction Procedure
The specific dataset construction procedure is outlined in this section and in Fig. 3. Yearly databases for each form category were merged using DBF Viewer 2000 (Rb28s databases were so large that they had to be merged into two separate files for the 2004-2013 and 2014-2019 periods). As these observations have no unique identifiers, unit and unit-year IDs were created in Microsoft Access by generating a new variable based on the existing variables WK (voivodeship code), PK (county code), GK (municipality code), GT (municipality type), PT (county type), and ROK (year). Subsequently, the DBF files were imported into Microsoft Excel as data models (in order to avoid the limit on the number of rows) for filtering and aggregation in pivot tables. Revenue and expenditure aggregates (e.g., own-source revenue, investment expenditure, debt-servicing expenditure) were constructed according to the definitions used in quarterly Ministry of Finance statements on local government budget execution, the aggregation keys to which were provided on request. This aggregation was streamlined by using pivot tables, but only to a degree: often many paragraphs had to be selected to create a single aggregate (e.g., investment expenditure). Other variables were constructed using the budget classification provided through the Ministry of Finance BeSTi@ Application Programming Interface (API) for budget classification paragraphs ("Paragrafy klasyfikacji budżetowej", e.g., PIT revenue), financing sources of budget classification paragraphs ("Finansowania Paragrafu klasyfikacji budżetowej", e.g., non-returnable sources from European Union programs), and budget classification subsections ("Rozdziały klasyfikacji budżetowej", e.g., the "Rodzina 500+" child benefit). This API was also used to download a list of local government units ("Jednostki Samorządu Terytorialnego") with constant unit identifiers, which served as the basis of the dataset that aggregates variables from all the databases used.
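The identifier and aggregation steps were carried out in Microsoft Access and Excel; the Python sketch below only illustrates their logic. The field names WK, PK, GK, GT, PT, and ROK come from the source files, whereas the "paragraph" and "amount" field names, the hyphen separator, and the placeholder codes P1, P2, and P9 in the aggregation key are hypothetical.

```python
from collections import defaultdict

# Territorial code fields that jointly identify a unit in the DBF files.
ID_FIELDS = ("WK", "PK", "GK", "GT", "PT")


def unit_id(record):
    """Concatenate territorial codes into a unit ID (separator is an assumption)."""
    return "-".join(str(record[field]) for field in ID_FIELDS)


def unit_year_id(record):
    """Append the year (ROK) to obtain a unit-year ID."""
    return f"{unit_id(record)}-{record['ROK']}"


def aggregate(records, key):
    """Sum amounts per (unit-year ID, aggregate name).

    key maps a budget classification paragraph to the aggregate it belongs to;
    records whose paragraph is not in the key are skipped.
    """
    totals = defaultdict(float)
    for rec in records:
        aggregate_name = key.get(rec["paragraph"])
        if aggregate_name is not None:
            totals[(unit_year_id(rec), aggregate_name)] += rec["amount"]
    return dict(totals)


# Hypothetical raw records for one unit-year; codes are placeholders.
base = {"WK": "02", "PK": "01", "GK": "01", "GT": "1", "PT": "0", "ROK": 2019}
records = [
    {**base, "paragraph": "P1", "amount": 100.0},
    {**base, "paragraph": "P2", "amount": 50.0},
    {**base, "paragraph": "P9", "amount": 7.0},  # not part of any aggregate here
]
key = {"P1": "own_revenue", "P2": "own_revenue"}
totals = aggregate(records, key)
```

Keeping the aggregation key as a separate mapping mirrors the role of the Ministry of Finance keys: when the classification changes, only the mapping needs to be revised, not the summation logic.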
Electoral data were sourced primarily from spreadsheets published on the National Electoral Commission website. These were supplemented by spreadsheets and explanations gathered through direct email enquiries. Microsoft Excel was used to clean, process, and join data from the 2002, 2006, 2010, 2014, and 2018 municipal and city elections. Some additional web searches were necessary in order to identify incumbents when standard data (name, sex, age, etc.) were unclear due to database mistakes (e.g., misspellings) or midterm changes of local executives. The NEC data made it possible to create electoral competitiveness variables based on incumbency for each municipality and city. Variable values are defined by the subsequent election, e.g., the variable that shows whether the incumbent is on the ballot takes the value of 1 for 2012 if the incumbent was on the ballot in the 2014 election and 0 otherwise. The assumption is that the incumbent already expected to be on the ballot in the future and conducted policy accordingly throughout the preceding term. This

Ethics statements
No ethical issues are associated with this work.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.