Innovative Applications of O.R.
Let’s meet as usual: Do games played on non-frequent days differ? Evidence from top European soccer leagues

Balancing the allocation of games in sports competitions is an important organizational task that can have serious financial consequences. In this paper, we examine data from 10,142 soccer games played in the top German, Spanish, French, and English soccer leagues between 2007/2008 and 2016/2017. Using a machine learning technique for variable selection and applying a semi-parametric analysis of radius matching on the propensity score, we find that, in all four leagues, attendance is lower in games played on the four non-frequently played days than in games played on the three frequently played days. We also find that, in all leagues, there is a significantly lower home advantage for the underdog teams on non-frequent days. Our findings suggest that the current schedule favors underdog teams with fewer home games on non-frequent days. Therefore, to increase the fairness of the competitions, it is necessary to adjust the allocation of the home games on non-frequent days in a way that eliminates any advantage driven by the schedule. These findings have implications for the stakeholders of the leagues, referees' and calendar committees, as well as for coaches and players.


Introduction
In recent decades, top European soccer leagues have become large business corporations. Each of the top leagues receives more than 1 billion Euros from television revenues alone. 1 A large part of these amounts is redistributed to teams based on their performance. In addition, the highest ranked teams of the top leagues earn the right to participate in the UEFA Champions League and Europa League. According to UEFA (the governing body of soccer in Europe), in the 2016/2017 season, more than 1.3 billion Euros was shared among the clubs in the Champions League and almost 400 million Euros among the clubs in the Europa League. 2 Since an unbalanced schedule may have serious financial consequences, the leagues face an important organizational task in creating a schedule that will not discriminate against or favor specific teams.
Top European soccer leagues use a double round-robin structure, where each team competes against each other team twice during the season. Operational research literature has intensively investigated different issues of round-robin structures, such as balanced distribution of home and away matches (Della Croce & Oliveri, 2006; Durán, Guajardo & Sauré, 2017), break optimization (Ribeiro & Urrutia, 2007), police requirements (Kendall, Knust, Ribeiro & Urrutia, 2010), stakeholders' requirements (Goossens & Spieksma, 2009), minimizing traveling distance (Durán, Durán, Marenco, Mascialino & Rey, 2019; Kendall, 2008), optimizing the number of prizes (Krumer, Megidish & Sela, 2019), and allocation of rescheduled games (Yi, Goossens & Nobibon, 2020). 3 However, the operational research literature has neglected another important issue: the allocation of (non-rescheduled) games between days that are not the usual days in a league's calendar. This may play an important role, because fans may have different preferences toward certain days of the week (Wang, Goossens & Vandebroek, 2018) or even the timings of the game (Krumer, 2020). For example, if fans are not used to attending games on a certain day, their routine may prevent them from attending games on those days. In such cases, lower attendance may be expected on these days, which may reduce the home advantage (Downward & Jones, 2007; Nevill, Balmer & Williams, 2002; Page & Page, 2010; Pettersson-Lidbom & Priks, 2010). In the 2017/2018 season, for example, the decision to schedule Monday games in the German Bundesliga 1 led to large protests. Moreover, German team 1.FSV Mainz 05 officially complained to the German Football League (DFL) since it had to play eight non-weekend games, six of which were at home. The DFL finally decided to abolish Monday games. 4
The present paper is closely related to the study of Krumer and Lechner (2018), who investigated games in the German Bundesliga 1 and found a significantly lower attendance and also a lower home advantage on midweek days compared to weekend days (Friday, Saturday, and Sunday, the most frequently played days in this league). In other leagues, however, the three most frequent days, which account for approximately 90% of all matches, differ from those in Bundesliga 1. For example, in England and France, the three most frequent days for games are Saturday, Sunday, and Wednesday, whereas the respective days in Spain are Saturday, Sunday, and Monday. 5 In this paper, we ask a simple question: Does playing on non-frequent days have any effect on the various aspects of soccer games? More specifically, using different definitions of non-frequent days and applying data from the four above-mentioned European soccer leagues between 2007/2008 and 2016/2017, we compare the games that were played on frequent and on non-frequent days with regard to their attendance and home advantage. Moreover, unlike Krumer and Lechner (2018), we separately investigated games with a home advantage for the favorite team and games with a home advantage for the underdog team. This is another contribution to the operational research literature, which has not yet considered the possibility of such a heterogeneous effect. 6 To the best of our knowledge, the only paper to study such an effect in the context of scheduling is Krumer (2020), which investigated the UEFA Europa League games with kick-off times at 19:00 CET and 21:05 CET. That paper documented a lower attendance in games that started at 21:05 CET and a significantly lower home advantage for the underdog teams in these later games.
It is important to note that the allocation of the match days is not entirely random, and might be based on different schedule-related features such as public holidays, international breaks, European tournaments, police requirements, broadcasters' and clubs' interests, months of the year, and even teams' values. Therefore, we need to control for these deviations from random selection into treatment (that is, non-frequent days) using a selection-on-observables approach. Specifically, we estimated the average treatment effect of playing on the non-frequent days by using the distance-weighted radius matching approach with bias adjustment suggested by Lechner, Miquel and Wunsch (2011). This estimator is constructed to be more robust than other matching-type estimators, as it combines the features of distance-weighted radius matching with a bias adjustment to remove sample biases due to mismatches (Huber, Lechner & Wunsch, 2013). In addition, having a rich database in terms of potential confounding variables, we use a machine learning technique for variable selection as proposed by Belloni, Chernozhukov and Hansen (2014).
4 For additional information, see: https://www.mainz05.de/news/brief-an-die-dfl-kritik-an-terminierungen/ (in German) and https://www.dw.com/en/bundesliga-monday-games-to-be-discontinued-as-fan-protests-persist/a-46390559. Last accessed on 03.11.2019.
5 According to the UEFA association club coefficients, these are four of the five most successful leagues in Europe. We do not use data on the fifth (the Italian Serie A) since it suffered from various scandals and club insolvencies in the underlying period. See, for example, Buraimo, Migali and Simmons (2016), who found a significantly lower crowd attendance after the Calciopoli scandal in the 2005/2006 season.
6 We discuss this type of heterogeneity in detail in the data section.
Based on analysis of 10,142 games from the top four European leagues over 10 seasons, we found a significantly lower attendance on four non-frequent days in all four leagues. In addition, all of the leagues had a reduced home advantage on four non-frequent days for the underdog teams, which is in line with Krumer (2020) . Our results suggest that the difference in the number of points between the favorite and the underdog teams, when the game takes place on non-frequent days compared to frequent days, is 0.49 in Ligue 1, 0.43 in La Liga, 0.42 in the Premier League, and 0.63 in Bundesliga 1. To put these numbers into perspective, a favorite with home advantage gains about 1.1 points more than the underdog, on average. 7 , 8 Such a reduced home advantage for weaker teams in games with a lower attendance is in line with the literature on the effect of the density of the crowd and its noise on referees' bias in favor of the home team. For example, using laboratory settings, Nevill et al. (2002) determined that crowd noise had a significant effect on the probability of a referee issuing a yellow card against a home team. Downward and Jones (2007) showed a positive relationship between the size of the crowd and the likelihood of a player receiving a yellow card in the English FA Cup. Similarly, Pettersson-Lidbom and Priks (2010) found a significant home bias of referees in games in which spectators were present compared to games with no spectators at all in the Italian Serie A. In addition, Ponzo and Scoppa (2018) showed a significantly larger number of cards against away teams in Serie A games between the teams from the same city that shared the same stadium. Finally, Page and Page (2010) found that the home advantage effect differs significantly among referees, and that this relationship is moderated by the size of the crowd. 
Therefore, a possible mediator of the difference in the home advantage of the underdog teams in games that take place on non-frequent days is lower crowd noise compared to games on frequent days. 9 However, except for the case of Bundesliga 1, in which favorite teams playing at home experience the largest drop in attendance on non-frequent days (almost 16% smaller crowds), we found no difference in home advantage between different days when favorite teams play at home. As Krumer (2020) proposed, it is possible that the underdog teams depend more on crowd support or even pre-performance routine than the favorite teams, because the latter are likely to win due to their higher ability regardless of home support or other psychological factors. Therefore, underdog teams seem to lose more points in games with lower crowd density compared to the favorite teams.
Our results suggest that since some underdog teams play more home games on non-frequent days than other underdog teams, the current structure favors underdogs that play fewer home games on non-frequent days and favorites that play more away games on non-frequent days. To illustrate a possible relationship between an unbalanced schedule and the resulting monetary rewards, Krumer and Lechner (2018) gave an example of SC Paderborn 07, which was relegated from the Bundesliga 1 in the 2014/2015 season. This team played more home games on non-frequent days than its closest rival until the very last game in the relegation fight, Hamburger SV, which eventually remained in the top division. Moreover, one of these games was against Hamburger SV, which the latter won. According to Krumer and Lechner (2018), if SC Paderborn 07 had survived in the Bundesliga 1, its additional revenue from TV alone would have been at least 10.3 million Euros (not counting all other revenues from ticketing, advertising, and so on).
7 Note that the winning team receives three points, while the losing team gets no points. In case of a draw, each team gets one point.
8 We also investigated the effect of playing on the two most frequent days, which are always Saturday and Sunday, versus five non-frequent days (5/2 days split). We found that the third most frequent day is more similar to weekend games than to the other midweek games, confirming our choice to refer to the 4/3 days split as our preferred specification. For additional details, see the results section.
9 One additional explanation may be related to the usual pre-performance routine that has been found to be positively associated with performance in sports (see, for example, Lonsdale and Tam, 2008, and Mullane-Grant, 2010). We discuss this possibility in the results section.
The remainder of the paper is organized as follows. Section 2 describes the schedule of the different leagues. The data and some descriptive results are presented in Section 3 . Section 4 presents the empirical strategy. The results are contained in Section 5 and we offer concluding remarks in Section 6 .

General structure of the leagues
While there are specific features for different leagues, the structure of all four leagues we investigate is largely similar. The leagues are organized as double round-robin tournaments, with each round consisting of $n/2$ games, where $n$ is the number of teams in the league. Each team plays each other team twice, once at its home field in the first half of the season, and once away in the second half of the season (or vice versa). In total, every team has $2(n-1)$ games. In the French, Spanish, and English leagues, there are 20 teams, resulting in 38 games for each team. In the German Bundesliga 1, there are 18 teams, resulting in 34 games for each team. In addition, except for the English Premier League, the leagues have a winter break of several weeks without games.
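The season arithmetic above can be verified with a short sketch. The function name `double_round_robin` and the dictionary keys are ours and serve illustration only; the team counts and the ten-season window are taken from the text.

```python
def double_round_robin(n_teams: int) -> dict:
    """Basic counts for a double round-robin league with an even number of teams."""
    if n_teams < 2 or n_teams % 2:
        raise ValueError("expects an even number of teams (at least 2)")
    games_per_round = n_teams // 2           # n/2 games in each round
    games_per_team = 2 * (n_teams - 1)       # every opponent once at home, once away
    rounds = games_per_team                  # each team plays exactly once per round
    total_games = games_per_round * rounds   # n(n - 1) games per season
    return {
        "games_per_round": games_per_round,
        "games_per_team": games_per_team,
        "rounds": rounds,
        "total_games": total_games,
    }


# Ligue 1, La Liga, and the Premier League have 20 teams; Bundesliga 1 has 18.
league_of_20 = double_round_robin(20)
league_of_18 = double_round_robin(18)
seasons = 10  # 2007/2008 to 2016/2017
all_games = seasons * (3 * league_of_20["total_games"] + league_of_18["total_games"])
print(league_of_20["games_per_team"], league_of_18["games_per_team"], all_games)
# prints: 38 34 14460
```

The last figure agrees with the total number of games reported in the Database section before any exclusions.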
The schedule of the leagues should also take into account international tournaments between nations, with the requirement to release participating players earlier and allow them a longer vacation. The main tournaments are the FIFA World Cup and the UEFA European Championship (held alternately every two years in June and July). Other tournaments that have the requirement to release players are the African Cup of Nations and the Asian Cup. Those take place during wintertime in parallel to the European leagues' matches.
League games usually take place on weekends, but since there are not enough weekends in the season, some rounds take place on other days.
At the end of a season, the final table determines which teams participate in the following season's European club tournaments; these include the Champions League, which is the most prestigious club tournament in Europe, and the Europa League, which also yields significant monetary rewards. In addition, the two or three worst-ranked clubs are relegated to the second division, implying that the different outcomes have substantial financial consequences for the clubs.
Following Krumer and Lechner (2018), who investigated the effect of playing on Fridays, Saturdays, and Sundays (the three most frequently played days in the Bundesliga 1), we identified the three most frequently played days separately for each league, as depicted in Table 1. Although less pronounced than in the Bundesliga 1, the three most frequent days for Ligue 1 and the Premier League are quite clear. For La Liga, however, the drop in frequency between the third and fourth days is less pronounced. We follow the choice of Krumer and Lechner (2018) by taking the three most frequent days, also because we need the remaining four days, which are defined as non-frequent, to contain enough observations. Nevertheless, we also investigate the effect of playing on the two most frequent days, which are always Saturday and Sunday, and compare the results to assess whether the third most frequent day is more similar to weekend games or to the other midweek games. In the following, we describe the special settings and unique scheduling features of each league separately.
Table 1 notes: The total numbers of matches on the specific weekdays, including matches that were excluded from the final analysis. Bold numbers represent the days of the "non-frequent" specification for each league.

The French Ligue 1
The three most frequently played match days are Saturday, Sunday, and Wednesday. The seasonal tournament in France takes place from August to the beginning of May. The top three teams advance to the Champions League (or to the Champions League playoffs). Teams in the fourth to sixth positions play in the Europa League (this may also depend on the outcome of an elimination French Cup tournament, called the Coupe de France). In addition, the two worst-ranked clubs are relegated to the lower division and the 18th-ranked team has to participate in a relegation playoff against the team that won the second division playoff for the right to play in the Ligue 1 in the following year. 10

The Spanish La Liga
The three most frequently played match days are Saturday, Sunday, and Monday. The seasonal tournament in Spain runs from the end of August or beginning of September until May of the following year. The top four teams advance to the Champions League (or the playoffs). Teams finishing fifth to seventh play in the Europa League (this may also depend on the outcome of an elimination Spanish Cup tournament, called the Copa del Rey ). In addition, the three worst-ranked clubs are relegated to the lower division.

The German Bundesliga 1
The three most frequently played match days are Saturday, Sunday, and Friday. The seasonal tournament in Germany takes place from August to May. The top four teams advance to the Champions League (or to the playoffs). Teams finishing fifth to seventh play in the Europa League (this may also depend on the outcome of an elimination German Cup tournament, called the DFB-Pokal ). In addition, the two worst-ranked clubs are relegated to the lower division and the 16th-ranked team must participate in the relegation playoffs against the third-ranked team in the Bundesliga 2 for the right to play in the Bundesliga 1 in the following year. 11

The English Premier League
The three most frequently played match days are Saturday, Sunday, and Wednesday. The seasonal tournament in England takes place from August until May. This is the only one of the four leagues discussed here that does not have a long winter break. Several rounds take place during the Christmas holidays, usually involving local derbies to avoid fans having to travel long distances on those days ( Kendall, 2008 ). The best-known round takes place on Boxing Day, which is a part of the Commonwealth tradition. We expect this to play a role in the scheduling process in the underlying period and account for this issue, as described in the next section.
The top four teams advance to the Champions League (or the playoffs). Teams finishing fifth to seventh play in the Europa League (this may also depend on the outcome of two elimination English Cup tournaments: the FA Cup and the League Cup). In addition, the three worst-ranked clubs are relegated to the lower division.
Compared to the other three leagues, the Premier League has the highest number of rescheduled games, because its clubs potentially have the highest number of games to play in their national cups (the FA Cup and the League Cup). The reason for this is that, in most stages of these competitions, a drawn match necessitated a replay. 12 This partly interfered with the initial schedule proposed by the calendar committee.

Database
We used data on four major European football leagues: the French Ligue 1, the German Bundesliga 1, the Spanish La Liga, and the English Premier League. 13 For each of the leagues, we collected data on all the games from the start of the 2007/2008 season until the end of the 2016/2017 season. This represents a total of 14,460 games. However, we disregarded games in which a home team did not play at its usual home stadium. For example, Bayer Leverkusen from Germany did not play the second half of the 2008/2009 season at its home stadium due to reconstruction. RC Lens from France experienced a similar situation in the 2014/2015 season. In addition, Montpellier, Caen (both 2014/2015), and Lille (2007/2008 and 2008/2009) from France played some home games in alternative stadiums. We also removed matches in which one of the teams had already been relegated or had already won the championship title. 14 In addition, teams that play in the Champions League or Europa League may strategically adjust their squads in the domestic league games that take place just before or after the European cups (for example, they may save their best players before the European games to avoid a risk of injury or let them rest after). Therefore, we also removed games that involved teams playing just before or just after the continental competitions. 15 Finally, we also removed rescheduled games, since those may differ with regard to media attention as they are detached from the rest of the matches. 16 Removing those games left 10,142 matches, 9010 of which took place on frequent days and 1132 on non-frequent days.
For every game, we collected information on the identity of teams, referees, exact day, attendance, distance between the cities, and the final score. We also used data from the Transfermarkt website to proxy the market value of each player of each team in every season. These data also include personal information on each player, such as his age, height, and preferred foot. Finally, we have data on the dates of the beginning and the end of each coach's tenure, as well as data on the capacity of each stadium.
12 In the case of a draw after the second game, overtime is played and, if needed, penalty shootouts determine the winner.
13 See Appendix C for the full list of sources.
14 For example, in the 2013/2014 season Bayern Munich from Germany had won the Bundesliga 1 title after 27 rounds. However, in the next three games they only gained one point out of nine and were accused of lacking motivation. See Kendall and Lenten (2017) for additional discussion on the usage of squads in remaining games after winning a title.
15 See Rohde and Breuer (2017), who showed that teams adjust their efforts in the domestic league just before or after games in European tournaments.
16 For additional discussion on rescheduled games, see Yi, Goossens, and Nobibon (2020).

Definition of heterogeneity
There can be different types of heterogeneity in sports competition, such as home versus away or the favorite versus underdog teams. 17 We choose the favorite-underdog type of heterogeneity because it is intuitive that probabilities of winning (or the expected number of points) are largely driven by the differences in the teams' abilities, whereas the home-away factor plays a secondary role in increasing or decreasing the gap between the teams' probabilities of winning. While home advantage is a well-established phenomenon, the literature has largely neglected the heterogeneous effect for favorites versus underdogs. More importantly, beyond the above-mentioned intuition, standard economic theory predicts probabilities of winning based on contestants' innate abilities. For example, the Tullock contest (Tullock, 1980) is a well-known model in economic theory that has been applied in many fields, from political races (e.g., Klumpp & Polborn, 2006) to sports tournaments (e.g., Szymanski, 2003). The most popular versions of this model are the lottery and the all-pay contest. In the lottery version, a contestant with a lower effort still has a positive probability of winning, whereas an all-pay contest is fully discriminatory, where a contestant with a lower effort is certain to lose. Now, assume a contest between two heterogeneous contestants 1 and 2, whose values (or ability types) are $V_1 > V_2$, implying that contestant 1 is a stronger (or a higher-ranked) contestant. In the lottery model, contestants' efforts ($x_i$) are given by $x_1 = \frac{V_1^2 V_2}{(V_1 + V_2)^2}$ and $x_2 = \frac{V_1 V_2^2}{(V_1 + V_2)^2}$, and their probabilities of winning ($p_i$) are given by $p_1 = \frac{V_1}{V_1 + V_2}$ and $p_2 = \frac{V_2}{V_1 + V_2}$. In the all-pay case, contestants' expected efforts are given by $x_1 = \frac{V_2}{2}$ and $x_2 = \frac{V_2^2}{2 V_1}$, and their probabilities of winning are given by $p_1 = 1 - \frac{V_2}{2 V_1}$ and $p_2 = \frac{V_2}{2 V_1}$. 18 We can see that these probabilities are derived from contestants' ability types. 19
Therefore, the favorite-underdog type of heterogeneity is the one that fits the economic theory when investigating probabilities of winning (or the number of the gained points per game, in the case of soccer).
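To make the dependence of winning probabilities on ability types concrete, the following minimal sketch evaluates the standard textbook equilibrium of the two-player Tullock lottery (linear effort costs) and of the two-player all-pay contest under complete information. The function names and the example values $V_1 = 2$, $V_2 = 1$ are our own illustrative assumptions and are not taken from the data.

```python
def lottery_contest(v1: float, v2: float):
    """Equilibrium efforts and win probabilities in a two-player Tullock lottery."""
    if v1 <= 0 or v2 <= 0:
        raise ValueError("values must be positive")
    total = v1 + v2
    x1 = v1 ** 2 * v2 / total ** 2   # effort of contestant 1
    x2 = v1 * v2 ** 2 / total ** 2   # effort of contestant 2
    p1 = x1 / (x1 + x2)              # simplifies to v1 / (v1 + v2)
    return (x1, x2), (p1, 1.0 - p1)


def all_pay_contest(v1: float, v2: float):
    """Expected efforts and win probabilities in a two-player all-pay contest.

    Assumes complete information and v1 >= v2 > 0 (contestant 1 is the favorite).
    """
    if not v1 >= v2 > 0:
        raise ValueError("requires v1 >= v2 > 0")
    x1 = v2 / 2                      # expected effort of the favorite
    x2 = v2 ** 2 / (2 * v1)          # expected effort of the underdog
    p2 = v2 / (2 * v1)               # underdog's probability of winning
    return (x1, x2), (1.0 - p2, p2)


# Hypothetical ability types: the favorite is valued twice as high as the underdog.
for name, model in (("lottery", lottery_contest), ("all-pay", all_pay_contest)):
    efforts, probs = model(2.0, 1.0)
    print(name, efforts, probs)
```

In both versions the favorite's probability of winning rises with the ratio $V_1 / V_2$, and the all-pay contest discriminates more strongly in favor of the stronger contestant (here 0.75 versus 2/3).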

Variables and descriptive statistics
To estimate the effect of playing on non-frequent days on attendance and the number of points gained by the teams, we coded a dummy variable that equals 1 if a match was played on a non-frequent day in a certain league, and zero otherwise. We also used a rich set of variables that characterize team value and players' ability, game attendance, and the international and national schedule. In the following, we present some of the most important measures (a more comprehensive list of variables appears in Appendix A).
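The coding of the treatment dummy can be sketched as follows. The three frequent days per league are those listed in the league descriptions; the dictionary layout, the league labels used as keys, and the function name `non_frequent_dummy` are illustrative assumptions rather than the authors' actual code.

```python
from datetime import date

# Weekday codes follow date.weekday(): Monday = 0, ..., Sunday = 6.
MON, TUE, WED, THU, FRI, SAT, SUN = range(7)

# Three most frequently played days per league (4/3 days split).
FREQUENT_DAYS = {
    "Ligue 1": {SAT, SUN, WED},
    "La Liga": {SAT, SUN, MON},
    "Bundesliga 1": {FRI, SAT, SUN},
    "Premier League": {SAT, SUN, WED},
}


def non_frequent_dummy(league: str, match_day: date) -> int:
    """Return 1 if the match day is a non-frequent day in the given league, else 0."""
    return int(match_day.weekday() not in FREQUENT_DAYS[league])


# 22 May 2017 was a Monday: a frequent day in La Liga, non-frequent in Bundesliga 1.
monday = date(2017, 5, 22)
print(non_frequent_dummy("La Liga", monday), non_frequent_dummy("Bundesliga 1", monday))
# prints: 0 1
```

The alternative 5/2 days split would only require replacing each set with `{SAT, SUN}`.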
Our approach is closely related to that of Krumer and Lechner (2018). Following their study, we used data on players' values from a popular soccer website, Transfermarkt, which are supposed to reflect teams' abilities. Since these values increase every season, we standardized them for each league and season so that they take the within-season variation into account. 20 The teams' values measure strongly correlates with teams' performance, suggesting that we have measured teams' abilities quite well. 21 For each game, the favorite is defined as the team with the higher standardized Transfermarkt value and the underdog is the team with the lower standardized Transfermarkt value. Unlike with betting odds, where favorite and underdog can be a function of the day of the week and the home advantage, the Transfermarkt values are determined without considering those factors. Therefore, these definitions are exogenous.
Following Krumer (2020), we divided the data into games that take place at the favorite's and the underdog's home fields. In Table 2, which presents descriptive statistics for the pooled data, we can see that when a favorite plays at its home stadium, the average number of points it gains on frequent days is 1.93. When the game is on a non-frequent day, the favorite team gains a very similar number of points (1.98). However, when an underdog team hosts the game, it gains an average of 1.39 points on frequent days and 1.14 points on non-frequent days, suggesting a lower home advantage on non-frequent days for the underdog teams only. Table 3 presents the descriptive statistics divided into four different leagues, where we can see a similar pattern. However, we also observe that when an underdog team plays at home, in all leagues except La Liga, there are much stronger favorites on non-frequent days compared to frequent days. The direction is the same in La Liga, but the difference between standardized values of favorites on frequent and non-frequent days is less pronounced. This descriptive evidence indicates that there is non-random selection into treatment; that is, non-frequent days. We will discuss how to solve this issue in the next section.
17 Other types of heterogeneities might be found in teams that replaced their coach, matches played on artificial versus natural grass, televised versus non-televised matches, traditional rivalry versus non-rivalry matches, etc.
18 For the lottery case, see, for example, Megidish and Sela (2014), who studied two-stage contests that are frequently used in sports competitions. For the all-pay case, see, for example, Krumer and Lechner (2017), who showed that in six out of seven possible cases in Olympic wrestling competitions, the all-pay model predicted correctly the identity of a wrestler with a higher probability of winning.
19 Note that those probabilities can be easily adjusted for the home advantage. See, for example, Krumer (2013), who provided a theoretical explanation for the empirical finding of Page and Page (2007) on second-leg home advantage in the UEFA European Cups, by using the all-pay model adjusted by home and away games.
20 According to Bryson, Frick and Simmons (2013), the coverage of Transfermarkt is quite "impressive with information on 190,000 players across 330 football competitions" (p. 611). Players' values are estimated by industry experts and take into account salaries, signing fees, bonuses, and transfer fees. Franck and Nüesch (2012) found that the correlation between values evaluated by Transfermarkt and Kicker, another highly-respected sport magazine in Germany, is as high as 0.89.
21 The results of the relevant regression analysis are available upon request from the authors.
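The favorite-underdog definition can be illustrated with a minimal sketch. The text does not state the exact standardization, so we assume a z-score within each league-season based on the population standard deviation; the team labels, the values, and both function names are hypothetical.

```python
from statistics import mean, pstdev


def standardize(team_values: dict) -> dict:
    """Z-scores of team values within a single league-season (assumed standardization)."""
    mu = mean(team_values.values())
    sigma = pstdev(team_values.values())
    if sigma == 0:
        raise ValueError("all team values are identical; z-scores are undefined")
    return {team: (value - mu) / sigma for team, value in team_values.items()}


def favorite_and_underdog(z_values: dict, home: str, away: str) -> tuple:
    """Return (favorite, underdog); the favorite has the higher standardized value.

    Ties are resolved arbitrarily in favor of the away team in this sketch.
    """
    return (home, away) if z_values[home] > z_values[away] else (away, home)


# Hypothetical team values (in million Euros) for one league-season.
values = {"Team A": 300.0, "Team B": 100.0, "Team C": 50.0}
z = standardize(values)
print(favorite_and_underdog(z, home="Team B", away="Team A"))
# prints: ('Team A', 'Team B')
```

Because the labels depend only on pre-determined team values, and not on the match day or venue, a game in which `favorite_and_underdog` returns the away team first is classified as a game at the underdog's home field.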
The players' values are used to create additional measures such as the distribution of values between and within teams. More specifically, for each team we compute the standard deviation of players' values and the Herfindahl-Hirschman Index (HHI), which is defined as the sum of the squares of the value shares of each player within the team. We also created other within-team inequality-related variables such as the ratio of different players' values according to their ranking order in the team. For example, one measure is the ratio between the values of the top three players and those of the players ranked 9-11 according to their values within a team. 22 In addition to players' values, we also use several other variables that may reflect the level of ability, such as a dummy variable for a team's first season in the top division after being promoted from the lower division, whether a team dismissed its coach during a season, and the age of the coach. 23 We also use data on the size of the squad, share of foreign players in the squad, height of the players, share of left-footed players, age of oldest/youngest players, etc.
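A short sketch of the two within-team inequality measures follows. The HHI implements the definition given above; for the rank-based ratio we assume that the summed values of the top three players are divided by the summed values of the players ranked 9 to 11, which the text does not spell out. Function names and squad values are hypothetical.

```python
def hhi(player_values) -> float:
    """Herfindahl-Hirschman Index: sum of squared value shares within a team."""
    total = sum(player_values)
    if total <= 0:
        raise ValueError("total squad value must be positive")
    return sum((value / total) ** 2 for value in player_values)


def top3_to_rank9_11_ratio(player_values) -> float:
    """Ratio of the summed values of the top three players to those ranked 9-11."""
    ranked = sorted(player_values, reverse=True)
    if len(ranked) < 11:
        raise ValueError("needs at least 11 players")
    denominator = sum(ranked[8:11])   # zero-based slice for ranks 9, 10, 11
    if denominator <= 0:
        raise ValueError("values of players ranked 9-11 must be positive")
    return sum(ranked[:3]) / denominator


# Hypothetical squad with values 11, 10, ..., 1 (million Euros).
squad = list(range(11, 0, -1))
print(round(hhi(squad), 4), top3_to_rank9_11_ratio(squad))
```

A perfectly equal squad of k players has an HHI of 1/k, and the index approaches 1 as the value becomes concentrated in a single player.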
Based on the large body of the literature on the effect that the crowd has on home advantage, we created measures to reflect the attendance in a match. Our first measure, attendance as share of the capacity of the stadium, is the ratio between the number of spectators in a match and the maximal possible capacity of the respective stadium. We also applied a different measure of attendance, namely the natural logarithm of attendance (Ln(Attendance)), which is also used in the literature on attendance demand (Buraimo & Simmons, 2015; Buraimo, Tena & de la Piedra, 2018; Krumer, 2020). In addition, following the studies of Boyko, Boyko and Boyko (2007) and Page and Page (2010), who described the existence of individual differences among referees in terms of the home advantage in different soccer leagues, we created dummy variables for each individual referee in our data. Further, we have information on the distance between the cities, measured as the shortest traveling distance in kilometers.
22 See Coates, Frick, and Jewell (2016) for discussion on the relationship between players' inequality in salaries and teams' performance.
23 See Tena and Forrest (2007), and Flores, Forrest, and Tena (2012) for discussions on the effects of coach dismissals on team performance.
We also obtained information on other schedule-related variables in international competitions, such as the two pre- and post-World Cup and European Championship months, as well as the months in which the African Cup of Nations and the Asian Cup took place. We also took different months of the season and public holidays into account.

Selection into treatment
We studied the effect of playing on a non-frequent day compared to a frequent day on the performance of a team. Here, the challenge for identifying a causal effect lies in the non-random determination of the teams that play at home on non-frequent days. In order to obtain an unbiased causal effect, it is essential to disentangle the effect coming with the selection from the effect caused by playing on non-frequent days. In other words, there is the need to take selection effects into account. Decisions regarding which teams play on which days are made by the calendar committees of the respective leagues and might be driven by teams' characteristics and other schedule-related features, such as public holidays, international breaks, TV broadcasters' interests, European association tournaments, etc. 24 The rich database presented in the previous section enabled us to opt for a selection-on-observables approach; that is, controlling for the reasons for the deviations from random treatment assignment. Having information on teams and game characteristics, European cups scheduling, national teams' tournaments, etc., enabled us to capture all confounding factors related to team, location, and timing to create a quasi-experimental setup. This allows us to identify the causal effect of playing on non-frequent days on performance if there are no unobserved characteristics that simultaneously affect both the probability of playing on a non-frequent day and the outcome.
An important issue that is worth a separate discussion relates to broadcasting, which may affect the allocation of games to frequent and non-frequent days. For instance, a broadcaster may influence the decision to strategically allocate more or less attractive games to different days. However, including a dummy variable for whether a game was broadcast is problematic, since broadcasting is potentially part of our treatment. We address this issue by capturing the selection effects, which might be partially due to broadcasters' interests, with a very large set of control variables. Among those are the approximated strength of a team and its popularity (in the form of teams' values and teams' dummies, among many other things), as well as controls for other obligations of the teams, such as whether they play in the European tournaments and are therefore shifted to non-frequent days. Therefore, if a broadcaster has a preference for certain teams to play on frequent or non-frequent days (or even if a team itself has such a preference), these teams' dummy variables will capture this potential issue. Thus, it is very likely that selection into treatment is captured by our very large set of potential control variables, satisfying the conditional independence assumption needed for credibly estimating a causal effect in our study.
Finally, it is also important to note that the share of capacity (or Ln(Attendance) ) is an endogenous variable since it is an outcome variable that depends on the day of the game. Therefore, we do not include it as a covariate in our estimation, but only use it as an outcome variable.

Estimator
In order to have a flexible approach and overcome the restrictive assumptions of classical statistical linear models, we used a statistical matching approach. More specifically, we applied the radius-matching-on-the-propensity score estimator with bias adjustment ( Lechner et al., 2011 ). 25 Not only was this estimator found to be very competitive among a range of matching-type estimators, but Huber et al. (2013) also showed its superior finite sample and robustness properties in a large-scale Empirical Monte Carlo Study. This estimator combines the features of distance-weighted radius matching with a bias adjustment, which removes potential biases due to mismatches. 26 Control observations, which are close to the treated unit in terms of the confounding influences, can be compared to the latter to obtain the treatment effect as if treated and control units were in an experimental setting. Therefore, it is crucial to capture all confounding influences; we explain how we do this in more detail below.
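The core idea of distance-weighted radius matching can be illustrated with a deliberately simplified sketch. It is not the estimator of Lechner et al. (2011): it omits the bias adjustment and the data-driven choice of the radius, and for brevity it computes the effect for the treated games only. The function name, the fixed radius, and the small constant that avoids division by zero are all our own assumptions.

```python
def radius_match_effect(ps, treated, outcome, radius=0.05, eps=1e-9):
    """Effect on the treated by distance-weighted radius matching.

    Returns (effect, share of treated units on common support). Treated
    units with no control inside the radius are dropped.
    """
    controls = [(p, y) for p, d, y in zip(ps, treated, outcome) if d == 0]
    n_treated = sum(1 for d in treated if d == 1)
    effects = []
    for p_t, d, y_t in zip(ps, treated, outcome):
        if d != 1:
            continue
        near = [(abs(p_c - p_t), y_c) for p_c, y_c in controls
                if abs(p_c - p_t) <= radius]
        if not near:
            continue  # off common support
        # Weights are inversely proportional to the propensity score distance.
        weights = [1.0 / (dist + eps) for dist, _ in near]
        counterfactual = sum(w * y for w, (_, y) in zip(weights, near)) / sum(weights)
        effects.append(y_t - counterfactual)
    if not effects:
        raise ValueError("no treated unit has a control within the radius")
    return sum(effects) / len(effects), len(effects) / n_treated


# Invented toy data: propensity scores, treatment indicators, points won.
ps = [0.10, 0.12, 0.11, 0.50, 0.52, 0.90]
treated = [1, 0, 0, 1, 0, 1]
outcome = [1.0, 2.0, 2.0, 0.0, 1.0, 3.0]
effect, support = radius_match_effect(ps, treated, outcome)
```

In this toy example the treated game with a propensity score of 0.90 has no control game within the radius, so only two of the three treated games remain on common support.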

Propensity score
The propensity score, which is the probability of playing on a non-frequent day, condenses the information from all relevant confounding variables to a one-dimensional score, determining which observations are similar in terms of confounding influences. In their pioneering work, Rosenbaum and Rubin (1983) showed that controlling for the propensity score removes selection bias. Therefore, treated and non-treated observations with similar propensity scores are compared to each other in the matching estimator.
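In standard notation (the symbols are ours and are introduced only for illustration), let D indicate a game on a non-frequent day, X the confounding variables, and Y(0), Y(1) the potential outcomes. The propensity score and the result of Rosenbaum and Rubin (1983) can then be written as:

```latex
p(x) = \Pr(D = 1 \mid X = x), \qquad
\bigl(Y(0), Y(1)\bigr) \perp D \mid X
\ \text{ and } \ 0 < p(X) < 1
\;\Longrightarrow\;
\bigl(Y(0), Y(1)\bigr) \perp D \mid p(X).
```

That is, if conditioning on X removes selection bias, conditioning on the one-dimensional score p(X) does so as well.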
If the exact relation between the confounding variables and the treatment assignment is known, the variables to include in the propensity score estimation can be specified ad hoc. In our case, we have a set of 385 potentially confounding variables, in addition to the referees' and teams' dummies, as described in the previous section. Despite prior knowledge about the selection process, we cannot specify ad hoc exactly which of the many potential confounders to use in the propensity score estimation. Further, including everything would lead to an infeasible estimation or to less precise or unstable estimates. Therefore, for the specification we rely on a machine learning algorithm.
Using machine learning for causal inference is not a trivial exercise and the literature is still under development. Since those algorithms are designed for prediction and not for doing inference in treatment effects estimation, we follow the approach of Belloni et al. (2014) to make machine learning algorithms useful in this setup. Those authors suggested using the LASSO procedure developed by Tibshirani (1996) as a variable selection tool, twice. 27, 28 In the first step, we selected a set of variables confounding the treatment. In the second step, we selected those variables correlated with the respective outcome. 29 The reason for this double-selection procedure, as opposed to only looking at the treatment selection equation, is to additionally capture variables that are highly correlated to the outcome and mildly related to the treatment selection. The same line of argumentation holds for only looking at the outcome equation. Ignoring those kinds of variables would lead to potentially biased results, as described in Belloni et al. (2014). The union of variables selected by the two separate LASSO procedures is our final set of variables for the propensity score estimation. We repeat this selection procedure for all of the estimations presented in the next section.
25 The variance is estimated as weight-based variance as described in Huber, Lechner and Steinmayr (2015), which Bodory et al. (2020) showed to lead to conservative standard errors.
26 Distance weighting leads to a weighting of non-treated observations within the radius inversely proportional to their distance to the respective treated unit.
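The double-selection logic described above can be sketched in a few lines. This is an illustration under stated assumptions, not the implementation used in the paper: both steps use a linear LASSO fitted by coordinate descent, the penalties are fixed by hand rather than chosen by cross-validation, and all function names and data are invented.

```python
def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; return 0.0 inside the band."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_cd(columns, y, penalty, n_iter=100):
    """Linear LASSO by cyclic coordinate descent.

    Minimises (1/2n) * sum((y - Xb)^2) + penalty * sum(|b|) after centring
    y and every column, so no intercept is needed.
    """
    n = len(y)
    y_mean = sum(y) / n
    resid = [v - y_mean for v in y]  # residual, starts at centred y
    cols = []
    for col in columns:
        mean = sum(col) / n
        cols.append([v - mean for v in col])
    beta = [0.0] * len(cols)
    for _ in range(n_iter):
        for j, col in enumerate(cols):
            scale = sum(c * c for c in col) / n
            if scale == 0.0:
                continue  # a constant column carries no information
            # Covariance of column j with the residual excluding its own fit.
            rho = sum(c * (r + c * beta[j]) for c, r in zip(col, resid)) / n
            updated = soft_threshold(rho, penalty) / scale
            delta = updated - beta[j]
            if delta != 0.0:
                resid = [r - c * delta for r, c in zip(resid, col)]
                beta[j] = updated
    return beta


def double_selection(columns, treatment, outcome, penalty_d, penalty_y):
    """Indices of covariates kept by the treatment LASSO or the outcome LASSO."""
    keep_d = {j for j, b in enumerate(lasso_cd(columns, treatment, penalty_d)) if b != 0.0}
    keep_y = {j for j, b in enumerate(lasso_cd(columns, outcome, penalty_y)) if b != 0.0}
    return sorted(keep_d | keep_y)


# Toy design with three mutually orthogonal covariates (all values invented).
x0 = [1, 1, 1, 1, -1, -1, -1, -1]  # drives the treatment only
x1 = [1, 1, -1, -1, 1, 1, -1, -1]  # drives the outcome only
x2 = [1, -1, 1, -1, 1, -1, 1, -1]  # related to neither
treatment = [1, 1, 1, 1, 0, 0, 0, 0]
outcome = [2.0 * v for v in x1]

selected = double_selection([x0, x1, x2], treatment, outcome,
                            penalty_d=0.1, penalty_y=0.1)
```

Here the treatment step keeps only the first covariate and the outcome step keeps only the second, so their union contains both, while the unrelated third covariate is dropped.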

Pooled data
First, our aim was to study whether playing on non-frequent days has an effect on performance when using the data on all the leagues together. To accomplish this goal, we first estimated the propensity score, based on variables that were chosen in the double-selection LASSO procedure described in the previous section. It is important to note that the purpose of the propensity score estimation is purely technical, to allow the easy purging of the results from the selection effects. Therefore, the respective marginal effects of the propensity score estimation cannot be interpreted in a causal sense, but rather in their contribution to the probability of being treated. Appendix B provides an example of one of the propensity score estimations we had to execute, which is the propensity score estimation for the number of favorite's points. Generally, as is already apparent from Tables 2 and 3 , selection effects are driven by team values as well as by schedule-related features such as public holidays and international tournaments. In addition, several individual referees and teams were picked by the double-selection LASSO procedure. While Appendix B only presents the results of the propensity score estimation for the number of points of a favorite team, a separate propensity score estimation for each matching estimation presented in the paper is available upon request.
In addition, we wish to show the sensitivity of our results to the presence or absence of referees' dummies. This is motivated by the findings of Boyko et al. (2007) and Page and Page (2010), who showed that some referees might be more affected by the home crowd and may therefore serve as a possible mediator of a difference in home advantage. Therefore, we present our results with and without the inclusion of referees' dummies. More specifically, in the specification that includes referees' dummies, we only use the dummies that were chosen in the double-selection LASSO procedure.
Panel A of Table 4 shows the effect of playing on the four non-frequent days compared to the three frequent days by pooling all the leagues together. We can see that when a favorite team plays at home, the effect of playing on non-frequent days on the number of points is very close to zero (−0.029) and highly insignificant (p-val = 0.64). However, when testing the effect of playing on non-frequent days on the number of points of the favorite when an underdog plays at home, we find that a favorite gains significantly more points (0.17) on non-frequent days than on frequent days. An underdog gains about 0.13 points less when hosting a game on non-frequent compared to frequent days, making the difference between favorite and underdog about 0.30 points. The share of capacity on non-frequent days is 3.8 percentage points lower when a favorite team plays at home and 1.8 percentage points lower when an underdog team hosts the game. Similarly, Ln(Attendance) is significantly lower on non-frequent days. Panel B of Table 4, where we present the results without referees' dummies, shows a similar pattern for all the outcome variables.
One possible concern is that our results are driven by the definition of the frequent days. As presented in Table 1, Saturdays and Sundays are the most frequently played days in all the leagues. Therefore, an alternative comparison could be between the five midweek days and the two weekend days. In Table 5, we present the effect of playing on the five non-frequent days compared to the two frequent days (defined as 3-7 vs. 1,2 in Table 5). We can see that, as previously, there is no significant effect on teams' points when a favorite team plays at home. However, when an underdog team plays at home, the effect on teams' points is lower than in the case of the 4/3 days split as presented in Table 4 (also defined as 4-7 vs. 1,2,3 in Table 5). In addition, we excluded the third frequent day and estimated the effect of playing on the four remaining weekdays compared to the two weekend days (defined as 4-7 vs. 1,2 in Table 5). In the case of the underdog's home advantage, we can see that the effects are much closer to those of the 4/3 days split than to those of the 5/2 days split.
27 The LASSO procedure is a shrinkage estimator, which works like an OLS estimator with penalized coefficients. Penalizing the coefficients leads to variable selection, as the coefficients of less informative covariates are forced to zero.
28 Goller et al. (2019) compared different (machine learning and "classical" probit) estimation procedures for the propensity score in matching estimation and found that the LASSO delivered the most credible results in a setup that is comparable to ours, with many potentially confounding variables and a low share of treated units.
29 Using the LASSO method requires a penalty term, which is determined in a data-driven way using 10-fold cross-validation. For the current analysis, we chose the penalty term that minimized the mean squared error.
We also show that the third most frequent day is significantly different from days 4-7 in terms of teams' points (defined as 4-7 vs. 3 in Table 5). Namely, underdog teams that play at home on the four non-frequent days gain significantly fewer points than when they play at home on the third most frequent day. Finally, we show that there is no significant difference in terms of points between the two weekend days and the third frequent day (defined as 3 vs. 1,2 in Table 5). These results suggest that the third frequent day is more similar to the weekend days than to the other weekdays.
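The day-split definitions used in Table 5 can be made concrete with a short sketch. It assumes that days are ranked by how often games are played on them within a league (1 = most frequent, 7 = least frequent); the helper names and the toy schedule are invented for illustration.

```python
from collections import Counter


def frequency_ranks(weekdays):
    """Map each weekday to its frequency rank (1 = most frequently played)."""
    counts = Counter(weekdays)
    ordered = sorted(counts, key=lambda day: (-counts[day], day))
    return {day: rank for rank, day in enumerate(ordered, start=1)}


def treatment_indicator(rank, treated_ranks, control_ranks):
    """1 for the treated group, 0 for the comparison group, None if the
    day is excluded from this comparison."""
    if rank in treated_ranks:
        return 1
    if rank in control_ranks:
        return 0
    return None


# (treated ranks, comparison ranks) for the comparisons named in Table 5.
SPLITS = {
    "4-7 vs 1,2,3": ({4, 5, 6, 7}, {1, 2, 3}),
    "3-7 vs 1,2": ({3, 4, 5, 6, 7}, {1, 2}),
    "4-7 vs 1,2": ({4, 5, 6, 7}, {1, 2}),
    "4-7 vs 3": ({4, 5, 6, 7}, {3}),
    "3 vs 1,2": ({3}, {1, 2}),
}

# Invented toy schedule of game days for one league.
games = ["Sat"] * 5 + ["Sun"] * 3 + ["Fri"] * 2 + ["Wed"]
ranks = frequency_ranks(games)
friday_in_4_3_split = treatment_indicator(ranks["Fri"], *SPLITS["4-7 vs 1,2,3"])
friday_in_5_2_split = treatment_indicator(ranks["Fri"], *SPLITS["3-7 vs 1,2"])
```

In the toy schedule, Friday is the third most frequent day, so it belongs to the frequent days under the 4/3 split, to the non-frequent days under the 5/2 split, and is excluded from the "4-7 vs 1,2" comparison.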

Individual leagues
To investigate whether our results are driven by pooling the data, in Table 6, we present the results for each league separately by using the 4/3 days split. When including referees' dummies, as presented in Panel A, we can see that, for all the leagues and for both cases of home advantage, we find a lower attendance as a share of capacity and a lower Ln(Attendance) on non-frequent days compared to frequent days. When a favorite team hosts the game, the effect of playing on non-frequent days on the share of capacity ranges from 1.6 percentage points less in the English Premier League to 7.4 percentage points less in the German Bundesliga 1. When looking at Ln(Attendance), we see that the reduction in attendance on non-frequent days ranges from 6.5% in the English Premier League to 15.9% in the German Bundesliga 1. When an underdog team hosts the game, the effect of playing on non-frequent days on the share of capacity ranges from 3.4 percentage points less in the English Premier League to 5.9 percentage points less in the Spanish La Liga. When looking at Ln(Attendance), we see that the reduction in attendance ranges from 6.0% in the English Premier League and the French Ligue 1 to 10.2% in the German Bundesliga 1. Panel B of Table 6 shows a very similar pattern. These results replicate the finding of Krumer and Lechner (2018), who also found a lower attendance in the German Bundesliga 1.

Notes:
Columns (1) and (2) represent the expected values for the four most non-frequent and the three most frequent days, respectively. Columns (3) and (4) report the average treatment effect and the respective p-value. Standard errors are calculated as weight-based standard errors and clustered at the season per league level. Column (5) states the share of observations in common support in the radius matching. ** and *** denote the 5% and 1% significance levels, respectively.

Notes:
The results represent the effects of playing on different definitions of non-frequent days for all the data. The most frequent day is defined as day 1 and the least frequent day is defined as day 7. P-values of the effects are presented in parentheses. Standard errors are calculated as weight-based standard errors and clustered at the seasonal level. *, **, and *** denote the 10%, 5%, and 1% significance levels, respectively.
When using referees' dummies, as presented in Panel A of Table 6, in all the leagues apart from Bundesliga 1, playing on non-frequent days has no effect on the home advantage of the favorite teams. However, in all four leagues there is a reduced home advantage on non-frequent days for the underdog teams. Our results suggest that the difference in the number of points between the favorite and the underdog teams, when the game takes place on non-frequent days compared to frequent days, is 0.49 in Ligue 1, 0.43 in La Liga, 0.42 in the Premier League, and 0.63 in Bundesliga 1. This difference is quite large given that, in our dataset, a favorite with home advantage gains on average about 1.1 points more than the underdog. In addition, in a tight league, one point could make the difference between relegation and survival or between qualification to the UEFA Champions League and the less prestigious UEFA Europa League.
Interestingly, when excluding referees' dummies, as presented in Panel B of Table 6, we find no significant effect of playing on non-frequent days in the Premier League. This result may suggest that certain referees may serve as possible mediators of a reduced home advantage of an underdog team on non-frequent days, which is in line with the findings in Boyko et al. (2007) and Page and Page (2010), who described the existence of individual differences among referees in terms of the home advantage. This is also in line with the recent speculations that some referees have a reputation for being biased in favor of or against specific teams in the English Premier League. 30
30 See, for example, https://www.sportskeeda.com/slideshow/football-most-biased-referees-top-teams-premier-league. Last accessed on 02.11.2019.

Table 6
Levels and effects of playing on four non-frequent days for each league separately.

Notes:
The results represent the effects of playing on four non-frequent days compared to the three most frequent days for each league separately. P-values of the effects are presented in parentheses. Standard errors are calculated as weight-based standard errors and clustered at the seasonal level. *, **, and *** denote the 10%, 5%, and 1% significance levels, respectively. Common support for each of the matching estimations is at least 88.1% (Ligue 1), 92% (La Liga), 91.4% (Premier League), and 83.7% (Bundesliga).
Furthermore, we conducted a Wald test for the equality of each estimate obtained in the analyses for the single leagues. More specifically, we tested the hypothesis that the effects are equal for all the leagues and reported the p-values from the resulting chi-square test statistic in the last column of Table 6. We found that the effects are not statistically different between the leagues for the favorites' and underdogs' points (except for underdog points without referees' dummies), or for Ln(Attendance). Although the only robust significant difference between the leagues is for the share of capacity in the case when a favorite team plays at home, this emphasizes the need to conduct the analysis for each league separately.
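The exact construction of the test is not spelled out here. One standard way to test the equality of four independently estimated effects is sketched below, purely as an assumption: the statistic is the precision-weighted sum of squared deviations from the pooled estimate, which is chi-square distributed with three degrees of freedom under the null hypothesis. All function names and numbers are invented.

```python
import math


def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom."""
    if x <= 0:
        return 1.0
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


def wald_equality_test(estimates, std_errors):
    """Wald test of equal effects across four independent estimates.

    Returns (statistic, p-value). Independence across leagues is assumed.
    """
    if len(estimates) != 4 or len(std_errors) != 4:
        raise ValueError("this sketch handles exactly four estimates")
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * b for w, b in zip(weights, estimates)) / sum(weights)
    statistic = sum(w * (b - pooled) ** 2 for w, b in zip(weights, estimates))
    return statistic, chi2_sf_df3(statistic)


# Invented effects and standard errors for four leagues.
stat_equal, p_equal = wald_equality_test([0.2, 0.2, 0.2, 0.2], [0.1, 0.1, 0.1, 0.1])
stat_diff, p_diff = wald_equality_test([0.1, 0.2, 0.3, 0.8], [0.1, 0.1, 0.1, 0.1])
```

Identical estimates give a statistic of zero and a p-value of one, while the strongly dispersed second set of invented estimates rejects equality.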

Robustness tests
Despite our findings presented in Table 5 that, on the aggregate level, the third frequent day is likely to resemble the two weekend days, we nevertheless conduct analyses for the 5/2 days split for each league separately. The results in Table 7 show a similar pattern as with the 4/3 days split presented in Table 6, though not always significant at conventional levels. More specifically, we find that the effect on the share of capacity in the case when an underdog plays at home is always negative and significant for all the leagues with and without referees' dummies. These results are in line with the findings of Buraimo (2008), Buraimo and Simmons (2015), and Forrest and Simmons (2006), all of whom reported that weekend games attract larger crowds and larger TV ratings in the English Premier League. However, the effect on Ln(Attendance) is sensitive to the inclusion of referees' dummies in Ligue 1 and is not significant in three out of four cases in Bundesliga 1, but it confirms the general picture. In addition, we can see a higher number of points for the favorite team (lower number of points for the underdog team) when the underdog team plays at home, although with a lower magnitude compared to the results in Table 6. When including referees' dummies, the results are significant for La Liga and Bundesliga 1, and are in a similar direction as with the 4/3 days split, but with p-values in the range between 0.14 and 0.17 in Ligue 1 and the Premier League. When excluding the referees' dummies, as presented in Panel B of Table 7, the Premier League has a zero effect, which is in line with the results presented in Panel B of Table 6. The results in Ligue 1 and La Liga are just insignificant, with p-values in the range between 0.11 and 0.14.
Finally, as previously, we excluded the third frequent day and tested the effect of playing on the four non-frequent days versus the two most frequently played days. Table 8 presents the results, where we can see that the effect on the share of capacity is always negative and significant, and the effect on Ln(Attendance) is not significant for Ligue 1 without referees' dummies and for one case in the Premier League. When excluding the third most frequent day in Bundesliga 1 (Friday), the effect on Ln(Attendance) became significant again, suggesting that Friday is much more similar to Saturday and Sunday than to the other weekdays. Finally, the effects on teams' points when the underdog team plays at home are much larger and more significant than in the 5/2 days split presented in Table 7. As already discussed, these results suggest again that the third most frequent day is more closely associated with the weekend games than with the midweek games, and therefore we refer to the 4/3 days split as our preferred specification.

Discussion
Our finding regarding the lower home advantage of the underdog on non-frequent days is in line with the literature on the effect of the density of the crowd and its noise on referees' bias in favor of the home team (Downward & Jones, 2007; Nevill et al., 2002; Page & Page, 2010; Pettersson-Lidbom & Priks, 2010). Therefore, a possible mediator of the difference in the home advantage of the underdog teams in games that take place on non-frequent days is lower crowd noise compared to games on frequent days. Crowd noise might be lower on non-frequent days not only due to lower attendance, but also magnified by a lower amount of alcohol consumption on non-frequent days. In addition, if a crowd is less passionate and supportive on non-frequent days, this may also negatively affect players' motivation, which may result in lower performance.

Table 7
Levels and effects of playing on five non-frequent days for each league separately.

Notes:
The results represent the effects of playing on five non-frequent days compared to the two most frequent days for each league separately. P-values of the effects are presented in parentheses. Standard errors are calculated as weight-based standard errors and clustered at the seasonal level. *, **, and *** denote the 10%, 5%, and 1% significance levels, respectively. Common support for each of the matching estimations is at least 92.8% (Ligue 1), 93.4% (La Liga), 89.5% (Premier League), and 94.6% (Bundesliga).
Another possible explanation for the lower home advantage of the underdog on non-frequent days relates to the literature on pre-performance routine. Previous research has suggested that having the usual pre-performance routine has a positive relationship with performance in sports. This may be due to lowered anxiety, increased task-relevant focus or reduced negative self-assessment. For example, Mesagno and Mullane-Grant (2010) showed that the usual pre-performance routine helped Australian football players perform better under increased pressure. Similarly, Lonsdale and Tam (2008) demonstrated that NBA basketball players' free throw shooting was better if they followed their dominant behavioral sequence. 31 In that spirit, it is possible that players of the home team have a different pre-performance routine on non-frequent days (for example, a different family routine, different traffic before the game, different media coverage, etc.), which may negatively affect their anxiety and self-efficacy levels. In that regard, away teams are less exposed to the negative effects of deviating from their routine, since they are not in their usual settings anyway and therefore prepare for the game with their usual away game routine.
One possible explanation for the finding that the home advantage of the favorite teams is not affected by the day of the game is that these teams are likely to win because of their higher abilities, regardless of home support or pre-performance routine. This would suggest that underdog teams depend more on crowd support or some other psychological factors than favorite teams. The negative effect on the home advantage of the favorite in Bundesliga 1 may be explained by the fact that this league differs from the other leagues as it has 18 teams, not 20. In addition, according to Yi et al. (2020), Bundesliga 1 has the most reliable league schedule, with the lowest number of rescheduled games and a higher quality realized schedule than the other leagues. Indeed, as Table 1 shows, Bundesliga 1 has the lowest number of games on non-frequent days. In addition, as already mentioned, Bundesliga 1 has the highest reduction in attendance among all the leagues in the 4/3 days split when a favorite team plays at home (15.9% less attendance). This suggests that fans are not used to going to the stadium on non-frequent days and also that the players of the favorite teams may be less motivated to play in front of smaller crowds that are also likely to be less supportive on non-frequent days.

Conclusion
According to Wright (2014), the main objective of his survey on operational research in sports was fairness, which is probably one of the most important features in sports competitions. In the context of scheduling of the soccer leagues, a schedule would be considered fair if ex-ante all teams have the same probability to convert the home advantage into success, given their individual characteristics, regardless of the day of the game. In this regard, our findings suggest that in all four leagues that we investigated (Bundesliga 1, La Liga, Premier League and Ligue 1), the current schedule structure favors underdog teams that play fewer home games on non-frequent days and favorite teams that play more away games on these days. However, at least in the Premier League, the possible mediator may be a referee bias in favor of or against certain teams. Therefore, our results may be of interest to the referees' committee, whose goal is to reduce any bias.
31 For additional references on the effect of pre-performance routine on performance, see a recent study of Wergin et al. (2020).
Our results also suggest that all four leagues suffer from lower attendance rates on non-frequent days. Therefore, our findings may also be of interest to the calendar committees of the relevant leagues, whose task is to allocate games in a way that eliminates any advantage driven by the schedule. More specifically, it is important that home games on non-frequent days are allocated evenly at the level of a single team per season. Having said that, we are aware that the calendar committees have to deal with a large number of constraints, such that it is sometimes almost impossible to satisfy all of them. Moreover, schedule effects that might drive a referee bias should be a concern for the referees' committees rather than for the calendar committees. However, the case of 1.FSV Mainz 05 that was mentioned in the introduction provides an example where the calendar committee disregarded the issue of fair allocation between different days. Furthermore, our findings might justify an additional restriction, not only in constructing the initial schedule, but also in implementing proactive or reactive strategies for rescheduled games, as Yi et al. (2020) recently proposed.
In addition, the results of this paper may also help coaches and players prepare to play on different days. According to our results, underdog teams may be expected to have a lower home advantage on non-frequent days and should therefore consider adjusting their preparation to these games. Furthermore, teams may adjust their ticket sales strategy. For example, tickets for games on non-frequent days, for which there is less demand, could be sold for a lower price to attract larger crowds and increase home advantage.
Finally, we call for additional empirical research on different schedule effects in sports leagues that may potentially affect on-pitch performance as well as financial outcomes.

Appendix B: Propensity score estimation

Notes:
Dependent variable is whether a game is played on one of the four non-frequent days. Probit average marginal effects are presented. The results are based on the union of variables selected by the two-step LASSO variable selection for playing on non-frequent days and the number of favorites' points. Und. and Fav. represent the underdog and favorite teams, respectively. *, **, and *** represent the 10%, 5%, and 1% significance levels, respectively.