Detecting coverage bias in user-generated content

ABSTRACT The importance of user-generated content is growing as media consumption is moving online; yet, investigations of media bias on user-generated content platforms are rare. We develop a novel procedure to detect coverage bias – i.e., bias in the amount of coverage certain topics or issues receive – on user-generated content platforms. We proceed in two steps. First, we focus on a sample of homogeneous observations and control for observable differences. Second, we compare the coverage of our observations between different language versions of the same platform in a difference-in-differences framework, which allows us to disentangle coverage bias from unobserved heterogeneity between observations. We apply our procedure to Wikipedia and examine whether it has a coverage bias in its biographies of German (and French) Members of Parliament (MPs). Our analysis reveals a small to medium-sized coverage bias against MPs from the center-left parties in Germany and in France. A plausible explanation is partisan contributions to the Wikipedia biographies, as we show by analyzing patterns of authorship and Wikipedia's talk pages for the German case. Practical implications of our results include raising users' awareness of coverage bias when searching for and processing information obtained on user-generated content platforms.


Introduction
User-generated content is becoming more and more important for modern media markets. As media consumption is moving online, consumers turn to Yelp to find a restaurant, to TripAdvisor to plan a vacation, and to Wikipedia to search for information (Luca, 2016). The reach of such user-generated content platforms is tremendous: as of June 2020, more than a billion hours of YouTube videos are watched every day, Facebook attracts 2.6 billion active users every month, and 1.5 billion users per month access the 53 million Wikipedia articles. Despite the significance of user-generated content for the online media landscape, examinations of media bias on user-generated content platforms are rare. In particular, while it is comparatively easy to disclose fake news, the detection of coverage bias (i.e., bias in the amount of coverage certain topics or issues receive) is challenging.
Research on coverage bias has a long history in media economics and dates back to early studies on the agenda-setting power of the mass media (McCombs & Shaw, 1972). More recently, Puglisi and Snyder (2016) and Gentzkow et al. (2016) characterize coverage bias as insidious and more prevalent than media bias through the distortion of information (p. 651 and p. 626, respectively), calling for close attention of researchers and policy-makers. Considering coverage bias in the context of user-generated content platforms adds two dimensions to the existing literature. First, being [...]
As argued, any investigation of coverage bias requires the researcher to disentangle the effect of the newsworthiness of a subject from the bias itself. In our application, we must differentiate the effect of an MP's individual characteristics on her biography length from the effect of party affiliation. In other words, it is possible that MPs about whom there is more to write accumulate within particular parties; the resulting differences in Wikipedia coverage between MPs would, according to our definition, not be classified as bias. If, in contrast, partisan contributors manipulate the biographies of MPs from a specific party, we would observe a coverage bias as defined above. Since many MP characteristics are unobserved (e.g., ability, wittiness, or looks), we cannot easily distinguish their impact on biography length from the impact of the MPs' party affiliation.
Following the outline from above, we address this issue in two steps. First, we study a sample of relatively homogeneous observations. We start by considering the 18th German Bundestag (2013 to 2017) and focus on MPs from Germany's two biggest political parties, the center-right CDU/CSU and the center-left SPD, who jointly comprised more than three quarters of all MPs and formed a coalition government. This is particularly convenient for our identification strategy: when we compare the biographies of MPs from the CDU/CSU and the SPD, differences in length cannot originate from differences in government versus opposition parties, centrist versus more extreme parties, or big versus fringe parties. 10 To further increase comparability, we exclude MPs in distinguished offices such as Chancellor Angela Merkel, ministers, and party heads from the analysis, and we control for observable characteristics such as gender, political experience, and constituency demographics.
As a second step, we compare the length of the MPs' German and English Wikipedia biographies in a difference-in-differences framework. Partisan contributors are less likely to amplify the English biographies, since German voters are unlikely to read them. Assuming that unobserved MP characteristics affect the German and the English biography length equivalently, a difference-in-differences estimation using language as a first, and party affiliation as a second difference, yields estimates of the effects of party affiliation on biography length that are less prone to omitted variable bias. Since English biographies are only available for about a quarter of our observations, we also take potential selection effects into account.
We find that biographies of MPs from the SPD are, on average, about 1,200 characters shorter than biographies of MPs from the CDU/CSU, which roughly corresponds to half a DIN-A4 page. 11 These differences remain after controlling for gender, political experience, outside earnings, education, and MPs' constituency demographics. To put the effect size into perspective, note that the average biography length is about 5,900 characters (2.33 pages), the median length is about 4,500 characters (1.66 pages), and the standard deviation in biography length is about 4,900 characters (nearly two pages). The difference-in-differences estimate for coverage bias, i.e., the effect of party affiliation on biography length, is about twice as large as the estimate based on German biography length alone, confirming that the unequal coverage between MPs from CDU/CSU and SPD is not driven by unobserved MP characteristics.
We also investigate several further dimensions of coverage and find that biographies of MPs from the SPD exhibit fewer images, fewer adjectives, a lower adjective-to-word ratio, and a smaller number of links to external websites under the control of the MP or her party than biographies of MPs from the CDU/CSU. Images and adjectives brighten texts and contribute to a more positive coverage, and a high number of weblinks under party control indicates that Wikipedia is used more extensively for election campaigns. Biographies of MPs from the SPD are also assigned to a lower number of Wikipedia categories, which makes them harder to find.
We stress that our approach is widely applicable. To demonstrate this point, we study whether there is a coverage bias in the biographies of French Members of the National Assembly in a second application. Using the same procedure as above, we find that biographies of MPs from the center-right LR are around half a page longer than biographies of the center-left PS and its allies; this difference corresponds to around 35% of a standard deviation in biography length. The finding is supported by the difference-in-differences estimates, whose magnitude and effect size are similar to the results based on French biography length alone.
10 The literature on the news coverage of politicians has found a clear incumbency effect on coverage (Vos, 2014).
11 Roughly 2,500 characters correspond to one DIN-A4 page of the "Printable version" of a Wikipedia biography.
We also provide a brief theoretical framework on the emergence of coverage bias in user-generated content, where we highlight the role of a platform's users. In particular, we argue that differences in the users' preferences, characteristics, and ulterior marketing aspirations can result in unbalanced coverage.
Regarding Wikipedia, we show that differences in the users' partisan contributions are a likely driver of the coverage bias between center-left and center-right parties. Focusing on German MPs, we check the plausibility of this explanation in three ways. First, we identify all anonymous edits conducted from the Bundestag network by tracking the users' IP addresses. Consistent with our explanation, we find that biographies of MPs from the CDU/CSU are edited nearly 50% more often from the Bundestag building than biographies of MPs from the SPD. Second, we document that there are fewer authors who repetitively contribute to SPD biographies only. Finally, if partisan contributions drive the differences in coverage, they should generate debates within the Wikipedia community. A testable implication is that Wikipedia's talk pages for MPs from the SPD should be shorter. Indeed, there are fewer talk pages for MPs from the SPD, and these pages are also shorter on average.
Our paper makes three contributions to research in media economics. First, we present a novel empirical strategy to detect coverage biases in user-generated content. While existing approaches use variation in institutional features across or within online platforms (e.g., Anderson & Magruder, 2012; Mayzlin et al., 2014), we exploit variation between different language versions of a platform. The procedure is applicable beyond the context of our paper and may be used as a basis for further research on coverage bias in the online media landscape, e.g., to study coverage bias in consumer reviews on Amazon, Yelp, or TripAdvisor, or to study the coverage of companies, celebrities, and certain (types of) products on information sharing websites like YouTube or Tumblr. 12 Second, while media biases in the offline media landscape are well understood (see Puglisi & Snyder, 2016, for a survey), our results add to the scarce knowledge on media bias in user-generated content. As far as we know, we are the first to unveil a partisan coverage bias on Wikipedia, whereby we complement the findings by Greenstein and Zhu (2012), who measure political slant in Wikipedia articles on political issues and conclude that the English language version of Wikipedia is slanted toward the Democratic party in the USA. In contrast to that, our measure is based on the amount of coverage of German (and French) MPs from different political parties.
Third, we provide evidence of a partisan coverage bias in the world's largest online encyclopedia, which (given that Wikipedia serves as an information source for billions of users) is a relevant result in itself and sounds a general note of caution regarding the use of Wikipedia. If users are unaware of this type of bias during their online searches, they may be especially susceptible to it.
The paper proceeds as follows. Section 2 discusses the related literature. Section 3 describes our dataset. The empirical strategy and the results with respect to German MPs are presented in Section 4. Section 5 demonstrates the applicability of our approach beyond the German context. In Section 6, we propose a brief theoretical framework and show that partisan contributors are a likely driver of coverage bias. Section 7 discusses the practical implications, the external validity, and limitations of our analysis.

Related literature
Our paper contributes to three overlapping strands of literature, summarized below and in Appendix D.
12 See Section 7 for further discussion.

Media bias
First, we contribute to the large literature on media bias. Our paper is especially close to studies that develop empirical measures for media bias; see D'Alessio and Allen (2000), Groeling (2013), and Puglisi and Snyder (2016) for extensive reviews. 13 Manifestations of media bias typically fall into one of two categories: the systematic selection of news items (Which news items are covered by the media?), and the systematic distortion of news items (How do the media report about news items?). Following D'Alessio and Allen (2000), we classify the former manifestation as coverage bias and the latter as statement bias. 14 As our paper presents a novel approach to detect coverage bias in user-generated content, we focus on reviewing papers that present alternative measures for coverage bias, but we also include the most important papers detecting statement bias in the overview we provide in Appendix D.
As argued in Section 1, detecting coverage bias is a challenge, because it is difficult to determine whether the amount of coverage certain topics receive corresponds to their newsworthiness. A straightforward procedure is to control for observable differences between observations as in Schiffer (2006), whereby any remaining differences in coverage can be ascribed to coverage bias. However, many determinants of a topic's newsworthiness are unobserved (e.g., the ability of a politician or the quality of a product), whereby the estimates from this approach are prone to omitted variable bias.
Procedures that either compare the coverage of homogeneous observations within the same media outlet or the coverage of the same observations across different media outlets are better able to take unobserved heterogeneity between observations into account. Niven (2003), Puglisi (2011), Brandenburg (2005), and Morris and Francia (2010), for instance, pursue the former approach, arguing that differences in the coverage of comparable observations are unlikely to result from differences in the observations' newsworthiness. 15 Conceptually similar, Barrett and Peake (2007), Groeling and Baum (2008), Larcinese et al. (2011), Puglisi and, Lott and Hassett (2014), and Galvis et al. (2016) pursue the latter approach, where comparing the coverage of the same observations across media outlets rules out that differences in coverage are driven by unobserved heterogeneity between observations. 16 Yet, estimates from comparing the coverage of homogeneous observations within the same media outlet may still be affected by unobserved differences in newsworthiness between observations, and estimates from comparing the same observations across different media outlets may be confounded by heterogeneity between news outlets.
A final approach to study coverage bias is to focus on observations with an observable population, either because it is observable per se as in Groeling (2008), Aday (2010), Soroka (2012), Garz (2014), and Heinz and Swinnen (2015), or because the authors could create it as in Butler and Schofield (2010) and Dertwinkel-Kalt et al. (2019). This procedure, however, is often not feasible for the particular research question at hand. Our application, for instance, would require an observable population of all MP characteristics that determine MPs' newsworthiness.
Our paper presents a novel approach to detect coverage bias in user-generated content that improves upon the shortcomings of the existing procedures. As outlined in Section 1, we proceed in two steps. First, similar to Schiffer (2006), Niven (2003), Puglisi (2011), Brandenburg (2005), and Morris and Francia (2010), we focus on a sample of homogeneous observations and control for observable differences. Second, we compare the coverage of our observations between different language versions of the same platform in a difference-in-differences framework, whereby we can disentangle coverage bias from unobserved heterogeneity between observations. In particular, if unobserved heterogeneity between observations affects the amount of coverage in either language version equivalently, the difference-in-differences estimates are less prone to omitted variable bias than the estimates from a comparison of coverage in one language alone. Moreover, comparing two language versions of the same platform prevents omitted variable bias stemming from unobserved heterogeneity between media outlets. We perceive this novel procedure to detect coverage bias as the major contribution of our paper.
13 The theoretical literature on media bias is surveyed by Gentzkow et al. (2016).
14 Puglisi and Snyder (2016) refer to "implicit" and "explicit bias"; Gentzkow et al. (2016) speak of "filtering" and "distortion bias."
15 Puglisi (2011), for instance, analyses the New York Times' coverage of several topics from public policy and finds that during a presidential campaign, The New York Times gives more emphasis to topics on which the Democratic party is perceived as more competent (civil rights, health care, labor and social welfare) when the incumbent president is a Republican.
16 E.g., Larcinese et al. (2011) consider coverage on unemployment numbers and show that newspapers with pro-Democratic endorsement patterns systematically give more coverage to high unemployment when the incumbent president is a Republican than when the president is Democratic, compared to newspapers with pro-Republican endorsement patterns.
Most of the research on media bias has focused on traditional media, such as newspapers and television, while comparatively little is known about biases in user-generated content. A notable exception is Greenstein and Zhu (2012), who apply automated text analysis to measure the slant in Wikipedia articles on political issues. In their paper, slant is an intrinsic property of an article, measuring whether its language is more typical for the Republican or Democratic party in the USA. In contrast to that, our measure of coverage bias is based on the overall coverage of German and French MPs from different political parties. In addition, research on media bias has mainly focused on media in the USA, while research on biases of media in other languages is rare. The political systems and the media systems of the USA, Germany and France differ along many dimensions (Hallin & Mancini, 2004; Persson & Tabellini, 2005), and it is a priori unclear whether results obtained in the US will hold in the European cases. Indeed, a naive extrapolation of the result that the English Wikipedia has a pro-liberal bias (Greenstein & Zhu, 2012) might suggest that, when comparing the two biggest German parties, the German Wikipedia is biased in favor of the center-left SPD relative to the center-right CDU/CSU. Our results, however, show the exact opposite.

User-generated content
Our paper also adds to the growing research on user-generated content and social media (see Luca, 2016, for a survey), where it is particularly close to the literature on (promotional) consumer reviews. These papers typically develop procedures to detect promotional or fake reviews as in Mayzlin et al. (2014) and Luca and Zervas (2016), or they assess the effect of promotional reviews on prices (Melnik & Alm, 2002), sales (Anderson & Magruder, 2012; Chevalier & Mayzlin, 2006; Lu et al., 2013), or product growth (Clemons et al., 2006). 17 Promotional reviews are conceptually similar to ulterior marketing aspirations that lead to coverage bias on user-generated content platforms (see also Section 6.1 on this), whereby the former strand of research is particularly close to our paper. Luca and Zervas (2016) use the results of Yelp's filtering algorithm as a proxy for fake reviews and find that a restaurant is more likely to commit review fraud on Yelp when its reputation is weak. Mayzlin et al. (2014) study promotional content in hotel reviews and show that hotels with a high incentive to fake have more positive reviews on TripAdvisor (where it is less costly to leave a review) relative to Expedia; the neighbors of hotels with a high incentive to fake, in contrast, have more negative reviews on TripAdvisor than on Expedia. In contrast to that, our approach does not compare equivalent observations across platforms, but within two language versions of the same platform. Our empirical strategy hinges on the assumption that the difference in the coverage of two subjects in two different language versions of the same platform would be identical in the absence of bias, which might be more plausible than the assumption that the difference in the coverage of two subjects between different platforms would be identical.
Moreover, our data requirements are more modest: our procedure does not require an observation to be covered by two independent platforms, but in two language versions of the same platform instead. Since many relevant user-generated content platforms such as Wikipedia, YouTube, TripAdvisor, Twitter, Yelp, and Facebook are available in several languages, we consider the latter condition as easier to meet. In contrast to Luca and Zervas (2016), our approach does not rely on the availability of pre-specified promotional content.
A further group of papers on user-generated content examines the individual motives for users to contribute to platforms in general and to Wikipedia in particular; the results from these papers link to the theoretical framework from Section 6.1. Wang (2010) shows that a positive reputation of a user increases her productivity, while Chen et al. (2010) demonstrate that users become more productive when they realize that others are more productive than they are. In the context of Wikipedia, Zhang and Zhu (2011) show that users' activity increases in community size, i.e., users contribute more if the number of potential readers grows.
17 Theoretical contributions are provided by Mayzlin (2006) and Dellarocas (2006).
Several analyses, surveyed by Mesgari et al. (2015), are linked to our paper, because they examine contributions to Wikipedia or Wikipedia itself. For instance, it is often claimed that Wikipedia suffers from systemic coverage biases induced by its users' demographics (a majority is English-speaking, white, male, and Internet-affine; Halavais & Lackaff, 2008), leading, e.g., to the under-representation of women (Hinnosaar, 2019; Reagle & Rhue, 2011) as well as to the over-representation of Western culture (Callahan & Herring, 2011) and other "male" topics such as naval sciences, military, science fiction, and fantasy (Rosenzweig, 2006). Similarly, while several studies confirm that Wikipedia exhibits only few factual errors (Mesgari et al., 2015), Brown (2011) unveils frequent errors of omission, which motivates our focus on coverage.

Social media and political outcomes
Third, our analysis is linked to research on social media and political outcomes (see Goldfarb & Tucker, 2019, for a survey), in particular to papers that examine how broadband diffusion (e.g., Campante et al., 2018; Falck et al., 2014; Gavazza et al., 2019) and social media (e.g., Bond et al., 2012; Enikolopov et al., 2020) affect voting behavior and political participation (Zhuravskaya et al., 2020, provide a survey). Since coverage bias on Wikipedia that stems from partisan contributions is a type of political advertising, the paper by Liberini et al. (2020) is especially close to our study. The authors show that micro-targeted political advertising through social media has a significant effect on voters during the 2016 US presidential elections: exposure to these ads made individuals less likely to change their initial voting intentions, particularly among those who had expressed an intention to vote for Donald Trump.

Data
Kürschner (2015) provides a list of all members of the 18th German Bundestag, including information on their education, political experience, and party affiliation. Data on the MPs' offices during the 18th Bundestag stems from bundestag.de. Information on the MPs' ancillary incomes is collected from abgeordnetenwatch.de: German MPs are obliged to declare their ancillary income by means of ten different categories; following the literature, we use these categories' mean values in our analysis (e.g., Becker et al., 2009). Data on constituency demographics stems from the electoral management body. Data on the French MPs in the 14th legislature stems from the National Assembly's homepage assembleenationale.fr.
From the initial list of German MPs, we exclude Chancellor Angela Merkel, 35 MPs in distinguished offices (party heads and ministers from the 18th or a preceding Bundestag), and nine MPs who had already left the 18th Bundestag before we started our data collection.
In Germany, each voter casts two votes in elections to the Bundestag. The first vote decides which local candidate from each of Germany's 299 constituencies will be sent to the Bundestag. The second vote is cast for a party list and determines the parties' relative strength in the Bundestag. The Bundestag has a minimum number of 598 seats. In its 18th election term, it was enlarged by four additional "overhang" seats, since the CDU won more constituency seats than it would have been entitled to based on the second-vote share. A further 29 "balance seats" preserve the parties' relative strengths, leading to a total number of 631 seats.
German Wikipedia biographies exist for all MPs in our dataset; English biographies are available for 138 German MPs. Similarly, French Wikipedia biographies exist for all MPs in our dataset; English biographies are available for 296 French MPs. The numbers of characters, words, adjectives, images, and categories per biography are obtained via Wikipedia's API. In addition, each biography links to a background site that provides a list of unique authors and the number of edits. All Wikipedia data were collected on 12 October 2015. Information on the number of weblinks under party control, translation indicators, MPs' English homepages, and criticism is hand coded. Table 1 provides summary statistics of all the variables used in our analysis.

Evidence from the German Bundestag
Our goal is to determine whether there is a coverage bias on Wikipedia, i.e., whether comparable MPs from different political parties are covered differently in terms of their biography length. We proceed in two steps. First, we focus on the biographies of a relatively homogeneous group of backbenchers from CDU/CSU and SPD. Any differences in biography length can therefore not originate from differences in government versus opposition parties, centrist versus more extreme parties, or big versus fringe parties. Moreover, the personalities of single prominent MPs cannot affect our results. Second, to disentangle the effect of party affiliation from unobserved MP characteristics, we compare the MPs' German and English biography length in a difference-in-differences framework; we consider this our main regression. Figure 1 shows the average biography length per party in characters. Since 2,500 characters roughly correspond to one DIN-A4 page of a biography's PDF print version, biographies of MPs from the CDU/CSU and the Greens are around two and a half pages, biographies of MPs from the SPD are around two pages, and biographies of MPs from the Left nearly three pages long. 22 To put these numbers into perspective, note that the average biography length across all MPs is around 2.33 pages (5901.6 characters) and the median around 1.66 pages (4487.5 characters), with a standard deviation of nearly two pages (4890.87 characters). Hence, the difference in average biography length between the CDU/CSU and the SPD corresponds to roughly 25% of a standard deviation.
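The effect-size arithmetic in this paragraph can be checked with a few lines of Python. This is a minimal sketch that only reuses figures quoted in the text (2,500 characters per page and the standard deviation of 4890.87 characters); the gap of about 1,200 characters is the approximate CDU/CSU versus SPD difference reported in the introduction, and the function names are our own.

```python
# Figures quoted in the text; function and variable names are illustrative.
CHARS_PER_PAGE = 2500        # roughly one DIN-A4 page of the print version
SD_GERMAN_LENGTH = 4890.87   # standard deviation of German biography length


def chars_to_pages(chars: float) -> float:
    """Convert a character count into DIN-A4 pages."""
    return chars / CHARS_PER_PAGE


def share_of_sd(diff_chars: float, sd_chars: float = SD_GERMAN_LENGTH) -> float:
    """Express a raw difference in characters as a fraction of one standard deviation."""
    return diff_chars / sd_chars


gap_chars = 1200.0  # approximate CDU/CSU minus SPD gap from the introduction
gap_pages = chars_to_pages(gap_chars)
gap_in_sd = share_of_sd(gap_chars)
print(f"{gap_pages:.2f} pages, {gap_in_sd:.0%} of a standard deviation")
```

The script reproduces the two statements made in the text: the gap is about half a page and roughly a quarter of a standard deviation.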

Choose homogeneous observations and control for observable differences
To control for observable MP characteristics, we further estimate Equation (1) by OLS, where $\text{length}^G_i$ corresponds to the German biography length of MP $i$, $P_i$ is a vector of party dummies with the SPD as omitted category, and $X_i$ is a vector of control variables including MP $i$'s gender, political experience, education, ancillary income, and (if MP $i$ is directly elected) constituency demographics. 23 The parameter of interest is $\beta^G_1$, as it measures the effect of party affiliation on biography length relative to the omitted category SPD. In other words, $\beta^G_1$ corresponds to the average coverage bias relative to the SPD. Table 2 shows the results. Column 1 replicates the party averages from Figure 1 relative to the omitted category SPD, i.e., the CDU/CSU coefficient corresponds to around half a DIN-A4 page.
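To illustrate how a party dummy with an omitted category is read in such a regression, the following sketch fits an OLS model on simulated data. Everything in it is an assumption made for illustration: the data are randomly generated, the single control (`terms`, a hypothetical count of election terms) stands in for the vector of controls, and the coefficient values are arbitrary rather than estimates from the paper. The solver uses the normal equations, which is adequate for a small, well-conditioned example.

```python
import random


def ols(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    for col in range(k):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Simulated data: SPD is the omitted category, so the coefficient on the
# CDU/CSU dummy is the average length difference relative to the SPD.
rng = random.Random(1)
X, y = [], []
for _ in range(5000):
    cdu = 1.0 if rng.random() < 0.6 else 0.0
    terms = float(rng.randint(1, 6))  # hypothetical control variable
    X.append([1.0, cdu, terms])
    y.append(4000.0 + 1200.0 * cdu + 500.0 * terms + rng.gauss(0.0, 2000.0))

intercept, beta_cdu, beta_terms = ols(X, y)
print(f"CDU/CSU coefficient relative to SPD: {beta_cdu:.0f} characters")
```

With the SPD as baseline, a positive coefficient on the CDU/CSU dummy means longer biographies for CDU/CSU MPs after holding the control fixed.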

22 There are three potential explanations for the significantly longer Wikipedia biographies of Left MPs. First, of the parties we consider, the Left are positioned furthest away from the center of the political spectrum and may therefore have worse access to alternative media channels, such as newspaper or television interviews, to communicate with their voters. Second, Forschungsgruppe Wahlen (2014) has shown that voters of the Left are more Internet-affine than, for instance, voters of CDU/CSU and SPD. Finally, MPs from the Left may consider an extensive web presence as more important than MPs from other parties. Freitag et al. (2020), for instance, find that 74% of the MPs from the Left have used Twitter in the six months following the constitution of the 19th Bundestag in 2017, while only 55.6% of the MPs from the SPD, 36% of the MPs from the CDU and 30.4% of the MPs from the CSU have done so.
23 Several large cities such as Cologne and Berlin are divided into several electoral districts. The data on those cities are aggregated and therefore not independent. We account for that by clustering the standard errors at the city level, and obtain 249 clusters.
When we control for gender (column 2), political experience (column 3), and doctoral degrees as well as ancillary incomes (column 4), the size of the CDU/CSU coefficient decreases to a third of a DIN-A4 page (which corresponds to 15% of a standard deviation in the dependent variable), but remains statistically significant. 24
24 We obtain similar estimates when we also consider more prominent political figures such as ministers and party heads (see Appendix B).
In column 5, we control for the demographics of an MP's constituency. Since only about 50% of the MPs are directly elected in a constituency and the other 50% enter the Parliament via state lists and thus lack a clear constituency affiliation, we consider only 266 observations here. Plausibly, population density has a highly significant effect on biography length: broadband connections in urban areas are usually better than in rural areas, which facilitates the use of Wikipedia and increases the biographies' likelihood of being read. In contrast to that, education or a constituency's share of voters aged 18 to 35 (those who are most prone to use the Internet as a source of political information) have no statistically significant effect. The CDU/CSU coefficient is statistically significant despite the decreased sample size and corresponds to around 30% of a standard deviation in the dependent variable.
In addition, we perform several two-sided t-tests to check whether differences in coverage between other parties are statistically significant. In columns (1) and (2), no other differences in biography length between any two parties are significant, while in columns (3) and (4), the difference between CDU/CSU and Left is significant at the 5%-level. Moreover, Section 6.4 demonstrates that negative coverage plays only a minor role and does not affect our results.

Difference-in-differences estimation
If party affiliation is correlated with unobservable MP characteristics (e.g., if more salient politicians accumulate within particular parties), an OLS estimation of Equation (1) might suffer from an omitted variable bias. To disentangle the effect of party affiliation on biography length from the effect that unobserved MP characteristics could have, we exploit variation between the MPs' biographies in the German and in the English Wikipedia. Under the identifying assumption that unobserved MP characteristics affect the German and the English biography length equivalently, a difference-in-differences estimation using language as a first, and party affiliation as a second difference, yields estimates of the effects of party affiliation on biography length that are less prone to omitted variable bias.
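The logic of this identifying assumption can be illustrated with a small simulation. The sketch below is not the authors' estimation: all numbers are invented, the unobserved characteristic `u` is deliberately correlated with party, and it enters the German and English length identically, which is exactly the assumption stated above. Under these conditions a German-only comparison is distorted by `u`, whereas the difference-in-differences comparison recovers the simulated bias.

```python
import random
from statistics import mean


def simulate(n=20000, true_bias=1200.0, seed=0):
    """Simulate (is_cdu, german_length, english_length) for n hypothetical MPs."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        cdu = rng.random() < 0.5
        # Unobserved characteristic, correlated with party by construction.
        u = rng.gauss(-800.0 if cdu else 0.0, 1000.0)
        # u affects both language versions equally; the bias only the German one.
        german = 5000.0 + u + (true_bias if cdu else 0.0) + rng.gauss(0.0, 500.0)
        english = 3000.0 + u + rng.gauss(0.0, 500.0)
        rows.append((cdu, german, english))
    return rows


def diff_in_means(rows, value):
    """Mean of value(row) for CDU/CSU rows minus the mean for SPD rows."""
    treated = [value(r) for r in rows if r[0]]
    control = [value(r) for r in rows if not r[0]]
    return mean(treated) - mean(control)


rows = simulate()
naive = diff_in_means(rows, lambda r: r[1])        # German length only
did = diff_in_means(rows, lambda r: r[1] - r[2])   # German minus English length
print(f"German-only estimate: {naive:.0f}, difference-in-differences: {did:.0f}")
```

In this setup the German-only estimate lands near 400 characters because the unobserved characteristic offsets part of the simulated bias, while the difference-in-differences estimate lands near the simulated value of 1,200.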
English Wikipedia biographies exist for 138 MPs in our sample; Figure 2 shows their average length. English biographies of MPs from both the CDU/CSU and the SPD are nearly two pages long, English biographies of MPs from the Greens are around three pages long, and English biographies of MPs from the Left are around one and a third pages long. To put these numbers into perspective, note that the average English biography length across all MPs is around two pages (4909.5 characters) and the median around one and a half pages (3817 characters), with a standard deviation of 1.66 pages (4119.6 characters). Suppose that the MPs' English biography length is determined analogously to Equation (1), which gives Equation (2). Then, differencing Equations (1) and (2) yields Equation (3). 26 The parameter of interest in Equation (3) is β_1, whose interpretation hinges on further assumptions on β^E_1. As argued, biography length may serve as a positive signal about MPs' valence characteristics (see Appendix A), so partisan contributors have an incentive to amplify the Wikipedia biographies of MPs from one specific party. While partisan contributions may lead to a coverage bias in the German Wikipedia, this is unlikely to occur in the English Wikipedia version, since German voters are unlikely to read their MPs' English biographies. Thus, we assume that β^E_1 = 0; see Appendix C for further discussion. If party affiliation has no effect on the MPs' English biography length, β_1 corresponds to β^G_1 in Equation (1). Hence, under the identifying assumption that unobserved MP characteristics affect the German and the English biography length equivalently and thereby cancel out, Equation (3) yields an unbiased estimate for β^G_1, the average coverage bias relative to the SPD.
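The differencing step can be written out as a short sketch. The exact form of Equations (1) to (3) is not reproduced in this section, so the notation below is an assumption built from the symbols used in the text: P_i is the party dummy, X_i the observable controls, u_i an unobserved MP characteristic that enters both language versions identically, and ε the error terms.

```latex
\begin{align*}
\text{length}^G_i &= \beta^G_0 + \beta^G_1 P_i + \beta^G_2 X_i + u_i + \varepsilon^G_i
  && \text{(German biography)}\\
\text{length}^E_i &= \beta^E_0 + \beta^E_1 P_i + \beta^E_2 X_i + u_i + \varepsilon^E_i
  && \text{(English biography)}\\
\text{length}^G_i - \text{length}^E_i &= \beta_0 + \beta_1 P_i + \beta_2 X_i + \varepsilon_i,
  \qquad \beta_k = \beta^G_k - \beta^E_k
  && \text{(difference)}
\end{align*}
```

In this sketch, u_i drops out of the difference because it enters both equations with the same weight, while the coefficients on the observable controls may differ across languages. Setting β^E_1 = 0 then gives β_1 = β^G_1.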
Naively estimating Equation (3) on the subsample of MPs for whom we observe English biographies may lead to sample selection bias, though. Thus, we also consider a selection model consisting of Equation (3) and a selection equation, Equation (4), whose dependent variable d_i indicates whether an English biography exists for MP i (see also Greene, 2003). For the selection model to work well, Z_i has to include additional variables that determine whether or not there exists an English Wikipedia biography for MP i, but do not affect the dependent variable in Equation (3). We use Z_i = (P_i, X_i, I_i), where I_i is a vector of variables that determine the international political relevance of MP i: the number of election terms in the European Parliament and the number of international offices during the 18th Bundestag (Commission of Foreign Affairs, Commission of European Affairs, and being head of an international parliamentary group). While international offices plausibly increase the probability that an English biography will be set up, they are not very salient, and hardly anything is written on them. We estimate the selection model by Heckman two-step and by maximum likelihood. 27 Table 3 shows the results. Column 1 shows the results of an OLS estimation of Equation (3). Although the size of the CDU/CSU coefficient is considerable (it corresponds to nearly a DIN-A4 page, which is nearly 60% of a standard deviation in the dependent variable and around twice as much as in Table 2), it is not statistically significant. Columns 2 and 4 show the results from a two-step and a maximum likelihood estimation of Equations (3) and (4), respectively. Since the magnitude of the coefficients is not directly comparable to the OLS estimates in column 1 of Table 3, we also display their marginal effects at the mean (MEM) in columns 3 and 5, which are similar to the OLS estimates in column 1. 28 In contrast to the OLS regression, the CDU/CSU coefficient is statistically significant at the 10%-level in the two-step, and at the 1%-level in the maximum likelihood specification. 29 Hence, the difference-in-differences estimation provides even stronger evidence of a coverage bias against MPs from the SPD than our results from Section 4.1.
26 While we assume that differencing Equations (1) and (2) cancels out unobserved heterogeneity between our observations, our approach allows the observable control variables to have a different impact on German and English Wikipedia biographies.
27 We do not believe that the probability of having an English Wikipedia biography is affected by MP preferences: only five of the 138 MPs with an English Wikipedia biography even provide their personal homepage in English.
28 MEMs measure the effect of a change in one of the regressors on the conditional mean of the difference in biography length, evaluated at the mean values of all other covariates, and given that this difference is observed. MEMs for dummy variables show how the biography length changes as the dummy changes from 0 to 1, holding all other covariates at their mean values.

Further dimensions of coverage
This section shows that our results on coverage bias from Section 4.1 hold for alternative dependent variables in Equation (1): adjectives, images, categories, and weblinks under party control. First, we study the occurrence of adjectives in the biographies. Adjectives can make a text more lively and colorful. Moreover, the literature on sentiment analysis shows a correlation between the presence of adjectives and the subjectivity of a sentence, i.e., the degree to which opinions are expressed (e.g., Bruce & Wiebe, 1999; Pang & Lee, 2008; Wiebe et al., 2004). We find that biographies on MPs from the CDU/CSU have around ten more adjectives and a higher adjective-to-word ratio than biographies on MPs from the SPD; the former difference corresponds to around 18% of a standard deviation in the dependent variable and is statistically significant at the 5%-level (Table 4, columns 1 and 2).
Similarly, images make biographies more lively and attractive. We find that biographies of MPs from the CDU/CSU contain on average 0.33 images (30% of a standard deviation) more than biographies of MPs from the SPD; the difference is statistically significant at the 1%-level (Table 4, column 3).
Notes to Table 3: The dependent variable in columns (1), (2), and (4) is length^G_i − length^E_i, which measures the difference in Wikipedia coverage of MP i in terms of her biography length in characters. CDU/CSU_i, Left_i, and Greens_i are dummy variables equal to 1 if MP i is affiliated to that party; SPD is the omitted category. Female_i is equal to 1 if MP i is a woman. Former periods in BT_i counts the election terms that MP i has been in parliament. Ancillary income_i is the mean ancillary income of MP i during the 18th election term in 1,000 Euros based on the estimation of abgeordnetenwatch.de. PhD_i is equal to 1 if MP i has a PhD. Population density_i, Fraction pop. 18−35_i, and Fraction pop. with Abitur_i refer to i's constituency demography. The MEMs in columns (3) and (5) show the coefficients' marginal effect at the mean, i.e., the change in the difference in biography length, given that it is observed, and holding all other factors at their mean. MEMs for dummy variables show the effect in the dependent variable for a discrete change from 0 to 1. * p < 0.1, ** p < 0.05, *** p < 0.01.
29 By using the likelihood function rather than the method of moments, the latter estimation procedure is more efficient and yields smaller standard errors; see Greene (2003) for details.
Next, Wikipedia articles are usually assigned to categories that "help readers to find, and navigate around, a subject area, to see pages sorted by title, and to thus find article relationships." 30 Thus, assigning a biography to many different categories (examples include "Member of Parliament", "Politician", or "German") enhances its chances of being found by readers. Biographies on MPs from the CDU/CSU are assigned to 0.65 more categories than biographies on MPs from the SPD; this difference corresponds to 25% of a standard deviation in the dependent variable and is statistically significant at the 1%-level (Table 4, column 4).
Finally, many Wikipedia biographies provide weblinks to external websites. We classify a link as "under party control" if it directs to a website that is under the obvious and substantial influence of an MP's party (e.g., MPs' personal or party homepages). Weblinks under party control facilitate the use of Wikipedia as a platform for political campaigns. We find that biographies on MPs from the CDU/CSU contain about half a weblink more than biographies on MPs from the SPD (60% of a standard deviation); the difference is statistically significant at the 1%-level (Table 4, column 5). 31

Evidence from the French National Assembly
We argue that our approach to detect coverage bias in user-generated content is widely applicable. Considering Wikipedia, it can be applied to all settings where a sufficient number of Wikipedia articles exist in the native and in at least one foreign language. To demonstrate the point, this section studies coverage bias in the Wikipedia biographies of French MPs as a second application of our approach. Moreover, as we find a coverage bias in favor of the center-right Les Républicains, we also show that the results from above are not a country-specific phenomenon. We proceed analogously to our main analysis. First, we focus on a relatively homogeneous group of backbenchers. To this end, we limit our attention to the National Assembly of the 14th legislature of the French Fifth Republic (2012 to 2017), where the majority of French MPs were affiliated to the center-right Les Républicains (LR) or the center-left Parti Socialiste and its allies (SER). Then, in our main specification, we compare the coverage of our observations in the French vs. the English language version of Wikipedia.
Notes to Table 4 (fragment): … The dependent variable in column (2) is Adjectives/Words_i, which measures the adjective-to-word ratio of MP i. The dependent variable in column (3) is Images_i, which measures the number of images in the biography of MP i. The dependent variable in column (4) is Categories_i, which measures the number of categories that the biography of MP i is assigned to. The dependent variable in column (5) …
31 The finding is supported by one particular incident. According to the Wikipedia talk pages, it is difficult to incorporate weblinks into articles, partly because of the German umlauts (ä, ö, ü). There exists, however, a user called "Cducsu" who has written a program to facilitate the procedure and has used it to install weblinks underneath biographies of MPs affiliated to the CDU/CSU that redirect to the homepage of the CDU/CSU parliamentary group.
We retrieve a list of all MPs in the 14th legislature from the National Assembly's homepage assemblee-nationale.fr. Next, we use the Wikipedia API to collect data on all MPs' French and English biography length in characters, the number of images, external weblinks, and categories. To facilitate a comparison with the results from the German MPs, we retrieved this information as of 12 October 2015, just as we did for the main sample. Figures 3 and 4 illustrate our data. Figure 3 shows that French biographies of MPs from the SER are about 1,500 characters shorter than biographies of MPs from the LR, which is slightly more than half a DIN-A4 page. To put the numbers into perspective, note that the average biography for a French MP is around three and a half pages (8850.3 characters) and the median biography is around three pages (7687 characters) long, with a standard deviation of 4310.3 characters. The sign of the effect reverses in Figure 4: English biographies of French MPs from the SER are on average around 200 characters longer than English biographies of MPs from the LR. Table 5 shows the results from running regression Equations (1), (3), and (4) on the sample of French MPs, including only party dummies, and using the SER as baseline. In Column 1, we estimate Equation (1) and use length^F_i, i.e., the length of the French biography of MP i, as dependent variable. In line with Figure 3, the estimate for LR is positive and highly statistically significant; the difference in coverage between MPs from the LR and the SER corresponds to around 35% of a standard deviation in the dependent variable. Thus, just as in the German case, French conservatives receive more coverage than MPs from the center-left. The pattern is confirmed by further dimensions of coverage. In columns 2 to 4, we use the number of images, external weblinks, and categories as dependent variables.
The estimates in columns 3 and 4 are positive and significant at the 1% level; the effect sizes correspond to 24% and 44% of a standard deviation in the dependent variable, respectively. The estimate in column 2 is negative, but not statistically significant.
Since the difference in coverage could also be driven by unobserved heterogeneity, we use our difference-in-differences approach from Equation (3) in Columns 5 and 6, where length^F_i − length^E_i is the dependent variable. In Column 5, we do not take potential selection into having an English Wikipedia biography into account. The estimate for LR is around 50% smaller than in Column 1 and only weakly statistically significant. Column 6 shows the ML results from estimating the selection model consisting of Equations (3) and (4), using indicators for membership in the Committee on Foreign Relations and the Committee on European Affairs for identification. The estimate for LR is positive, too, and highly statistically significant. To compare its magnitude with the OLS estimates, Column 7 provides the MEMs for the coefficients. The MEM for LR is very similar to the OLS estimate from Column 1 and confirms the existence of a coverage bias against MPs from the SER. We conclude that the application of our approach to French MPs yields comparable results to our main analysis in Section 4, although the coverage bias is smaller in absolute terms for French MPs.

Partisan contributions
How does coverage bias on user-generated content platforms emerge? This section develops a brief theoretical framework that highlights the role of a platform's users. In particular, we argue that differences in the users' preferences, characteristics, and ulterior marketing aspirations may result in the unbalanced coverage of certain topics. Coming back to the German Wikipedia, we demonstrate that differences in users' partisan contributions are a likely driver of the unbalanced coverage of MPs from CDU/CSU and SPD.

Theoretical considerations
In contrast to traditional media outlets, coverage bias on user-generated content platforms accrues from the contributions of individual users and not necessarily from the intentions of the platform itself. For instance, if two topics are equivalently newsworthy, but a larger number of users contributes to one of them, coverage bias may arise. Similarly, if the users' contributions to a specific topic are relatively large, the overall coverage may become unbalanced. Figure 5 illustrates our theoretical considerations. A user-generated content platform covers two topics, A and B, which are equivalently newsworthy. Yet, the amount of coverage on topic B exceeds the amount of coverage on topic A; in other words, there is a coverage bias against A. There are two potential drivers of this bias: First, a larger number of users could contribute to topic B; second, the individual contributions to B could be relatively larger than the contributions to A. These drivers are not mutually exclusive. Figure 5 also shows that the emerging coverage bias may, in turn, affect political outcomes, sales, and users' well-being (see also Section 1 on the potential effects of coverage bias).
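The two drivers can be separated in a simple accounting identity. The notation is an assumption introduced only for illustration: n_t is the number of users contributing to topic t, c̄_t is their average contribution, and C_t is the resulting total coverage.

```latex
\begin{align*}
C_t &= n_t\,\bar{c}_t, \qquad t \in \{A, B\},\\
C_B - C_A &= \underbrace{(n_B - n_A)\,\bar{c}_A}_{\text{more contributors}}
           + \underbrace{n_B\,(\bar{c}_B - \bar{c}_A)}_{\text{larger contributions}}.
\end{align*}
```

Either term alone can produce a coverage bias against A, and both can be positive at the same time, which is one way to read the statement that the drivers are not mutually exclusive.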
There are many potential reasons for differences in user contributions to the platform. On the one hand, individual characteristics, preferences, and interests determine how much and to which topics the users contribute. E.g., as discussed in Section 2.2, Wikipedia's users are predominantly white, male, English-speaking, and Internet-affine (Halavais & Lackaff, 2008), which has induced some structural biases on the platform. Reagle and Rhue (2011) and Hinnosaar (2019), for instance, show that women are underrepresented on Wikipedia. Halavais and Lackaff (2008) document a coverage bias toward naval sciences and military, whereas Rosenzweig (2006) speaks of "geek priorities" (p. 127) that manifest, e.g., in a large number of articles on obscure characters from science fiction and fantasy.
On the other hand, it is possible that users' contributions are driven by ulterior marketing aspirations, i.e., intentional causation of coverage bias (in contrast to the unintentional causation of bias through the channels outlined above). Users could, for instance, raise the amount of coverage of a certain politician or of a certain product on purpose to increase the vote share of the politician or the sales of the product. E.g., Mayzlin et al. (2014) unveil the existence of promotional reviews on TripAdvisor; Lu et al. (2013) provide similar evidence from online restaurant reviews. 32 Likewise, the coverage of politicians may be influenced by hidden marketing aspirations (i.e., partisan contributions) of the users.
Notes to Table 5 (fragment): … The dependent variable in column (2) is Images_i, which measures the number of images in the biography of MP i. The dependent variable in column (3) is Weblinks_i, which measures the number of external weblinks underneath the biography of MP i. The dependent variable in column (4) is Categories_i, which measures the number of categories that the biography of MP i is assigned to. LR, UDI, GDR, RRDP, and NI are dummy variables equal to 1 if MP i is affiliated to that party; SER is the omitted category. * p < 0.1, ** p < 0.05, *** p < 0.01.
Theoretically, coverage bias on Wikipedia may thus be driven by structural biases of the platform (e.g., if there are more users who are interested in MPs from the center-right parties) or by partisan contributions (e.g., if supporters of the center-right parties manipulate the biographies of associated MPs). In the former case, differential coverage is likely to accrue in all language versions; in the latter case, it is more likely to occur in the language version that matters to the MPs' potential voters. Since differential coverage that occurs both on the German (French) and the English biographies cancels out in a difference-in-differences estimation, the results from Sections 4 and 5 support the role of ulterior marketing aspirations. To further endorse this claim, the remainder of this section shows that partisan contributions are a likely driver of our main results. 33

Authorship patterns
To support the idea that partisan contributions drive the coverage bias on Wikipedia, this section shows that there are more Wikipedia authors who repetitively contribute to the biographies of MPs from the CDU/CSU than to biographies of MPs from the SPD. Moreover, we show that biographies of MPs from the CDU/CSU are edited more often from the Bundestag building.
First, we check if there are authors who repetitively amplify the biographies of MPs from one specific party. For each article, Wikipedia displays either the authors' user name or, in case of anonymous contributions, their IP address. We identify all authors who contribute to at least 10% of the biographies of MPs from the CDU/CSU or from the SPD and classify them as "repetitive contributors." Next, we check which of these repetitive contributors amplify the biographies of just one specific party and classify them as "party-specific repetitive contributors." We find that there exist 37 repetitive and three party-specific repetitive contributors for the SPD. Moreover, there exist 42 repetitive and five party-specific repetitive contributors for the CDU/CSU.
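The classification rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure actually used for the paper: the input format (pairs of author and biography identifier), the function name, and the reading of "party-specific" as "all of an author's edits fall on biographies of a single party" are assumptions.

```python
from collections import defaultdict


def classify_contributors(edits, party_of, threshold=0.10):
    """Return (repetitive, party_specific) contributor mappings.

    edits:     iterable of (author, biography_id) pairs (hypothetical input format)
    party_of:  dict mapping biography_id -> party label
    threshold: minimum share of a party's biographies an author must have edited
    """
    # Number of biographies per party (denominator of the 10% rule).
    n_bios = defaultdict(int)
    for party in party_of.values():
        n_bios[party] += 1

    # Distinct biographies each author has touched, grouped by party.
    touched = defaultdict(lambda: defaultdict(set))
    for author, bio in edits:
        touched[author][party_of[bio]].add(bio)

    # Repetitive contributors: reach the threshold for at least one party.
    repetitive = {}
    for author, by_party in touched.items():
        parties = {p for p, bios in by_party.items()
                   if len(bios) >= threshold * n_bios[p]}
        if parties:
            repetitive[author] = parties

    # Assumption: "party-specific" means the author edited one party's MPs only.
    party_specific = {a: next(iter(ps)) for a, ps in repetitive.items()
                      if len(touched[a]) == 1}
    return repetitive, party_specific
```

Counting distinct biographies rather than individual edits keeps an author who edits a single page many times from being flagged as repetitive.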
Next, we track all contributions of anonymous users whose IP addresses are displayed. Building on a study by Bayerischer Rundfunk (2017), we consider all IP addresses that can be linked to the Bundestag building. 34 We find that 50.3% of the biographies of MPs from the CDU/CSU, and 52.2% of the biographies of MPs from the SPD were edited at least once from the Bundestag. Moreover, biographies of MPs from the CDU/CSU were edited on average 5.2 times, while biographies of MPs from the SPD were edited 3.8 times on average. There also exist differences in terms of the number of characters added or deleted: while on average 627.02 characters are added and 482.20 characters deleted from biographies of MPs from the CDU/CSU by Bundestag contributors, only 232.49 characters are added and 157.65 characters deleted from biographies of MPs from the SPD. Note that the difference in the net number of characters added between the CDU/CSU and the SPD (around thirty) is much smaller than the effect we find in Section 4. Hence, edits by anonymous users from the Bundestag alone cannot explain the coverage bias documented above.
Figure 5. Theoretical considerations of coverage bias.
32 See also Mayzlin (2006) for a game theoretic model on promotional chat on the Internet.
33 For the sake of conciseness, we focus on the German case.
34 According to bundesedit.de (June 2018), the digits "193.17." at the beginning of an IP address indicate the Bundestag network.
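The filtering of Bundestag edits described in this subsection can be sketched as follows. The prefix "193.17." is the one named in the text; everything else (the revision tuple format, the function names, and averaging over all biographies rather than only over edited ones) is an assumption made for illustration.

```python
BUNDESTAG_PREFIX = "193.17."  # prefix named in the text (per bundesedit.de)


def is_bundestag_ip(author):
    """True if an anonymous author's displayed IP starts with the Bundestag prefix."""
    return author.startswith(BUNDESTAG_PREFIX)


def summarize_bundestag_edits(revisions, biographies):
    """Summarize Bundestag edits per group of biographies.

    revisions:   iterable of (biography_id, author, char_delta); hypothetical format,
                 where char_delta > 0 means characters added and < 0 means deleted
    biographies: all biography ids in the group (e.g., all MPs of one party)
    """
    stats = {bio: {"edits": 0, "added": 0, "deleted": 0} for bio in biographies}
    for bio, author, delta in revisions:
        if not is_bundestag_ip(author):
            continue  # keep only edits from the Bundestag network
        entry = stats[bio]
        entry["edits"] += 1
        if delta >= 0:
            entry["added"] += delta
        else:
            entry["deleted"] -= delta  # store deletions as a positive count

    n = len(stats)
    # Assumption: averages are taken over all biographies in the group.
    return {
        "share_edited": sum(s["edits"] > 0 for s in stats.values()) / n,
        "mean_edits": sum(s["edits"] for s in stats.values()) / n,
        "mean_added": sum(s["added"] for s in stats.values()) / n,
        "mean_deleted": sum(s["deleted"] for s in stats.values()) / n,
    }
```

The trailing dot in the prefix matters: it keeps addresses such as 193.170.x.x from being counted as Bundestag edits.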

Talk pages
Partisan contributions are likely to entail discussions about whether additional content should be included. The purpose of a Wikipedia talk page "is to provide space for editors to discuss changes to its associated article or project page." 35 Hence, the existence and length of a talk page can indicate the occurrence of partisan contributions. 36 We find that talk pages exist for 132 MPs from the CDU/CSU (44.9%) and for 79 MPs from the SPD (42.9%). Moreover, talk pages on MPs from the SPD are on average half a page shorter than talk pages on MPs from the CDU/CSU (3589.52 versus 5029.30 characters), which corresponds to 10% of a standard deviation in talk page length.

Negative coverage
Finally, we show that negative coverage does not drive our results on coverage bias. Our analysis assumes that Wikipedia coverage is beneficial for MPs. Media coverage is generally beneficial for politicians, e.g., because it increases name recognition (Burden, 2002); this applies in particular to less well-known politicians such as the MPs in our dataset. In addition, we have conducted a classroom survey that shows that, for younger audiences, a longer biography signals knowledge, strength as a public servant, and the ability to inspire people (see Appendix A). Wikipedia coverage could hurt MPs, however, if the biographies contained large amounts of criticism. To resolve such concerns, this section shows that negative coverage is a minor concern.
To explore the extent of negative coverage, we systematically identify negative sentences in the biographies. In a first step, we search each biography for sentences that contain the word stems of "Kritik" ("criticism"), "Diskussion" ("discussion"), "Rück-/Austritt" ("resignation"), "Skandal" ("scandal"), and "Affaire" ("affair"). Next, we determine if these sentences actually criticize the MP. We find that negative coverage is a minor issue: only 7% of the biographies contain more than one sentence of negative coverage, and 90% do not contain any negative coverage at all.
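The first, automated step of this procedure can be illustrated with a short Python sketch. It is an approximation under stated assumptions: the keyword list uses the lowercased words from the text as crude substrings instead of a proper stemmer, the sentence splitter is a simple regular expression, and the function name is hypothetical. The second step, deciding whether a flagged sentence actually criticizes the MP, remains a manual judgment and is not modeled here.

```python
import re

# Lowercased keywords from the text, used as crude substring "stems" (assumption).
STEMS = ("kritik", "diskussion", "rücktritt", "austritt", "skandal", "affaire")


def candidate_sentences(text):
    """Return sentences that contain at least one keyword stem.

    The split on '.', '!' or '?' followed by whitespace is a simplification and
    will mis-split German abbreviations such as 'Dr.'.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences
            if any(stem in s.lower() for stem in STEMS)]
```

A biography would then count as containing negative coverage only if a human coder confirms that at least one of the returned sentences is critical of the MP.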
To confirm that the results from Section 4 are not driven by different amounts of negative coverage, we estimate Equations (1), (3) and (4) on the subsample of MPs whose biographies do not contain any criticism at all. Table 6 shows the results. Although the estimates are smaller and not as statistically significant as in Tables 2 and 3 …
36 To support this presumption, we let a Research Assistant code whether the talk pages contain "criticism on the biography's length", "criticism on the biography's content", or neither. She found that eighteen talk pages for MPs from the SPD exhibit criticism on length (22.8% of the existing talk pages), and four talk pages exhibit criticism on content (5% of the existing talk pages). Moreover, 27 talk pages for MPs from the CDU/CSU exhibit criticism on length (20.5% of the existing talk pages), and four talk pages exhibit some criticism on content (3% of the existing talk pages).
37 We also perceive pure vandalism as a minor issue: false statements are quickly detected by Wikipedia's control mechanisms and are thereupon erased. Moreover, if MPs are involved in scandals such as plagiarism or the consumption of illegal drugs, they usually resign, and these observations are excluded from our analysis.

Conclusion
This paper presents a novel approach to detect coverage bias in user-generated content. The procedure involves two steps: First, we focus on a sample of homogeneous observations and control for observable differences; second, we compare the coverage of our observations between different language versions of the same user-generated content platform in a difference-in-differences framework. As opposed to existing approaches to detect coverage bias in the media (such as the comparison of similar cases within the same media outlet or the comparison of equivalent cases across media outlets), the estimates from our approach are less prone to omitted variable bias.
An application of our procedure to Wikipedia unveils a coverage bias against MPs from the center-left relative to MPs from the center-right in Germany and in France. We also present a brief theoretical framework on the driving forces behind our results and, focusing on the German case, provide empirical evidence that supports the role of partisan contributions as a potential driver of the emergence of coverage bias on Wikipedia.
Our study is relevant for media researchers, policy makers, and practitioners for two broad reasons. First, our approach to detect coverage bias in user-generated content is widely applicable and could be used to unveil further biases on Wikipedia or on alternative user-generated content platforms. One could, for instance, use the procedure to study coverage bias in consumer reviews on Amazon, Yelp, or TripAdvisor. Similarly, one could examine the coverage of companies, celebrities, and certain (types of) products on information sharing websites like YouTube or Tumblr. Extending our analysis to further Members of Parliament, Acts of Parliament, or political initiatives such as Fridays For Future or Occupy Wall Street would be feasible as well. Practitioners could employ our approach to monitor the coverage of their own business relative to their competitors.
Notes to Table 6 (fragment): … (6) is length^G_i − length^E_i, which measures the difference in Wikipedia coverage of MP i in terms of her biography length in characters. CDU/CSU_i, Left_i, and Greens_i are dummy variables equal to 1 if MP i is affiliated to that party; SPD is the omitted category. Female_i is equal to 1 if MP i is a woman. Former periods in BT_i counts the election terms that MP i has been in parliament. Ancillary income_i is the mean ancillary income of MP i during the 18th election term in 1,000 Euros based on the estimation of abgeordnetenwatch.de. PhD_i is equal to 1 if MP i has a PhD. Population density_i, Fraction pop. 18−35_i, and Fraction pop. with Abitur_i refer to i's constituency demography. * p < 0.1, ** p < 0.05, *** p < 0.01.
Second, the results of our application provide the basis for a general debate on how coverage bias on user-generated content platforms could be counteracted. As argued in Section 6.1 and illustrated in Figure 5, coverage bias on user-generated content platforms stems from the individual contributions of its users and not necessarily from the intentions of the platform itself. Hence, any media policy that aims to diminish coverage biases must address the users' actions. While a full-fledged discussion is beyond the scope of this paper, we suggest two general approaches. On the one hand, one could raise the platform's and the platform's users' awareness of coverage bias. E.g., while warning templates for poor quality articles on Wikipedia already exist, a similar type of template could be introduced, stating that not only the information itself, but also the amount of information matters and should be processed accordingly. 38 On the other hand, user-generated content platforms could encourage users from underrepresented groups to make more and larger contributions. Wikipedia could, for instance, embolden women and non-English speakers to set up new articles or to extend existing ones. Similarly, user review websites like Yelp or TripAdvisor could use vouchers or discounts to give users from underrepresented groups a stronger incentive to contribute.
Our empirical approach to detect coverage bias on user-generated content platforms is limited in two ways. First, while we show that marketing aspirations are one likely driver of unbalanced coverage, we cannot fully rule out other potential causes. That is, we cannot determine precisely whether and to what extent the coverage bias is driven by user preferences, characteristics, or ulterior marketing aspirations. Regarding the latter, we are also unable to distinguish between the demand and the supply side as the driving force behind this channel. Thus, in terms of our application, we cannot say with certainty why there exists a coverage bias against MPs from the center-left in Germany and France. 39 The SPD in Germany has fewer potential voters than the CDU/CSU; hence, there may be relatively less demand for Wikipedia biographies of MPs from the SPD, such that partisan contributions have a comparatively smaller payoff. Although voters of CDU/CSU and SPD are equally Internet affine (Forschungsgruppe Wahlen, 2014), potential voters of the SPD may also perceive the new media as a less relevant information source. On the other hand, it is possible that the differences in partisan activity reflect the parties' perceptions of how important an extensive Internet presence is and to what extent they are aware of the potential of user-generated content. For instance, Peter Tauber, who was secretary general during our observation period, provides a social media compendium that also points to the importance of Wikipedia (Tauber, 2013, p. 12), while nothing comparable exists for the SPD; in other words, the CDU/CSU may be more successful in their ulterior marketing activities.
Second, the approach is not applicable to unilingual platforms. We consider this to be a minor disadvantage, though. Many relevant user-generated content platforms such as Wikipedia, YouTube, TripAdvisor, Twitter, Yelp, and Facebook are available in several languages; notable exceptions are the English-only Reddit 40 and the Chinese-only Sina Weibo. 41 Similarly, different language versions must not be mere translations of each other, and one language version must be expected to be less biased.
One further limitation of our paper is specific to our application to Wikipedia. To disentangle the effect of party affiliation on coverage from the effect that MP characteristics may have, we focus on MPs from just one election period, which raises concerns about the generalizability of our results. The external validity of our study is, however, supported by similar patterns of Wikipedia coverage of members of the 16 German State Parliaments and of German Members of the European Parliament. 42 Biographies of CDU/CSU affiliates in a State Parliament are on average about a quarter page (or about 0.15 standard deviations) longer than biographies of SPD affiliates. Similarly, biographies of CDU/CSU affiliates in the European Parliament are on average about half a page (or about 0.25 standard deviations) longer than biographies of SPD affiliates. These numbers suggest that our main results reflect a general pattern, rather than being specific to the sample at hand. Interestingly, when we compare the Wikipedia coverage of judges in the German Constitutional Court, who are usually nominated by a particular party, we do not find differences in coverage. These judges are, however, elected for a lifetime and not by the public, and thus have no incentive to amplify their Wikipedia biographies. These observations therefore also fit our hypothesis about partisan contributions as the main driver of our results.
38 Although Wikipedia's author guidelines stress that any content must be written from a "neutral point of view" (see https://en.wikipedia.org/wiki/Wikipedia:Neutral_point_of_view (June 2020)), which includes assigning proper weights to topics, the average user is unlikely to be aware of the issue and its pitfalls.
39 We contacted the German parties' press offices to inquire whether there are coordinated party activities in Wikipedia. According to all replies, there exist no official guidelines for the handling of Wikipedia; every MP is responsible herself for her Wikipedia biography.
The implications and limitations of our analysis highlight several avenues for future research. First, it would be interesting to study the effects of different remedies against coverage bias. It is, for instance, unclear by which means underrepresented user groups can best be encouraged to contribute. Moreover, since the literature on fact-checking measures documents that factual information has ambiguous effects on individuals' misperceptions (Jerit & Zhao, 2020), warning templates for articles that cover sensitive topics need to be well-designed to be beneficial rather than harmful. Second, a comprehensive analysis of the channels that ultimately lead to unbalanced coverage on user-generated content platforms promises valuable insights. If these channels were better understood and distinguishable from each other, more concerted remedies against coverage bias would be feasible.

Appendix A. Survey evidence
To investigate whether voters perceive an extensive Wikipedia biography as a positive signal, we conducted a classroom survey at the University of Cologne and a representative online survey. The goal was to study whether participants rate unknown MPs with a longer Wikipedia biography better in terms of their valence characteristics, i.e., qualities of a politician on which all voters agree (Stokes, 1963). We used nine valence characteristics that are frequently discussed in the political science literature (e.g., Funk, 1999; Kinder et al., 1980; Stone & Simas, 2010).
We randomized participants into two groups. Participants in group 1 received the instruction: "Consider a politician from your preferred party. The Wikipedia biography of this politician (politician A) is three pages long. Consider another politician from the same party. The Wikipedia biography of this politician is one page long. Please answer the following questions." Participants in group 2 received the same instructions, except that the biography of politician A was one page long and that of politician B was three pages long. We did not show actual (fake) biographies to the participants, because adding text, which inevitably provides further information on the politician, would prevent us from disentangling the effect of biography length as such from the effect of providing a particular piece of additional information.
Next, we asked the participants which politician would probably score better with respect to each of the nine valence characteristics. They could either reply "Politician A", "Politician B", or "Don't know." We considered participants' replies if they answered all nine questions.
A.1. Classroom survey
The classroom survey was conducted among sixty undergraduate students of economics at the University of Cologne. 43 Table A1 shows the results. Columns 1 to 3 show the shares of students who opted for the politician with the three-page biography ($s_3$), the politician with the one-page biography ($s_1$), and "Don't know", respectively. Column 4 displays the difference between $s_3$ and $s_1$, which is positive for all valence characteristics except for intelligence and honesty. We test the statistical significance of this difference against the null hypothesis that $s_3 = s_1$ by means of a likelihood ratio test as in Giordan and Diana (2008) and find that the difference between $s_3$ and $s_1$ is statistically significant for knowledge, strength as a public servant, and inspiring ($p < 0.01$), and weakly statistically significant for empathy ($p < 0.1$).
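As a rough illustration of the kind of test described above, the following Python sketch computes a likelihood ratio statistic for the null hypothesis $s_3 = s_1$ in a three-category multinomial model. It is a generic textbook construction and not necessarily the exact procedure of Giordan and Diana (2008). The function name and the example counts are hypothetical, and the p-value relies on the asymptotic chi-squared approximation with one degree of freedom.

```python
import math
from typing import Sequence, Tuple


def _loglik(counts: Sequence[int], probs: Sequence[float]) -> float:
    """Multinomial log-likelihood up to a constant; empty cells contribute 0."""
    return sum(c * math.log(p) for c, p in zip(counts, probs) if c > 0)


def lr_test_equal_shares(n3: int, n1: int, n_dk: int) -> Tuple[float, float]:
    """LR test of H0: s_3 = s_1 (hypothetical helper, not from the paper).

    n3   -- respondents choosing the politician with the three-page biography
    n1   -- respondents choosing the politician with the one-page biography
    n_dk -- respondents answering "Don't know"
    Returns the LR statistic and its asymptotic chi-squared(1) p-value.
    """
    counts = (n3, n1, n_dk)
    total = sum(counts)
    if total == 0 or n3 + n1 == 0:
        return 0.0, 1.0

    # Unrestricted MLE: observed shares.
    unrestricted = [c / total for c in counts]
    # Restricted MLE under H0: both shares equal the pooled average.
    pooled = (n3 + n1) / (2 * total)
    restricted = [pooled, pooled, n_dk / total]

    stat = 2.0 * (_loglik(counts, unrestricted) - _loglik(counts, restricted))
    stat = max(stat, 0.0)  # guard against tiny negative rounding errors

    # Survival function of chi-squared with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


if __name__ == "__main__":
    # Made-up counts for 60 respondents, purely for illustration.
    statistic, p = lr_test_equal_shares(n3=30, n1=15, n_dk=15)
    print(f"LR statistic = {statistic:.3f}, p-value = {p:.3f}")
```

Note that the "Don't know" count cancels out of the statistic, so the test effectively compares only the two substantive answers.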

A.2. Representative survey
To explore whether the results from the classroom survey are specific to the particular age group under consideration, we repeated the survey among a representative sample of participants ($N = 500$). 44 Columns 5 to 20 of Table A1 show the results. To compare the results to the classroom survey while still maintaining a reasonable sample size, we partitioned the sample into terciles based on the participants' age in Columns 5 to 16; Columns 17 to 20 show the results for the entire sample.
In line with the results from Section A.1, we find that the difference between $s_3$ and $s_1$ for respondents younger than 37 years is positive and statistically significant for knowledge, strength as a public servant, inspiring, and empathy ($p < 0.05$, Column 8). This does not hold for respondents older than 37, though. In contrast to Columns 4 and 8, the difference between $s_3$ and $s_1$ is negative for many valence characteristics in Columns 12 and 16, which means that older people do not perceive a relatively longer biography as a positive signal.
To sum up, we find that younger subjects perceive a longer Wikipedia biography as a positive signal about the politician's valence characteristics, whereas it makes older subjects more critical. In his widely cited paper, Converse (1969) argues that voters' party identification becomes stronger over time; Falter (2010) provides empirical evidence from Germany (p. 14, Figure 15). Thus, young individuals are more likely to be swing voters and thereby comprise the relevant target group for (partisan) coverage bias on Wikipedia. Moreover, young individuals are far more likely to use the Internet in general and in particular to obtain political information. 45

43 The survey was carried out via classEx, a free software for interactive classroom experiments (Giamattei & Lambsdorff, 2019).
44 The survey was conducted online by respondi in December 2020. The sample of participants was representative of the German population between 18 and 74 years.
45 See https://de.statista.com/infografik/20068/informationsquellen-fuer-politische-nachrichten-bei-jungen-und-alten/ (Dec 2020), and https://www.destatis.de/DE/Themen/Gesellschaft-Umwelt/Einkommen-Konsum-Lebensbedingungen/IT-Nutzung/Tabellen/durchschnittl-nutzung-alter-ikt.html (Dec 2020).

Notes to Table A1: Column (2) shows the share of participants that opted for the politician with the three-page biography. Column (3) shows the share of participants that opted for the politician with the one-page biography. Column (4) shows the share of participants that opted for "Don't know." Column (5) gives the difference between columns (2) and (3) along with its standard error. We test against the null hypothesis that (2)

Appendix B. Robustness checks
In our main analysis, we excluded 35 MPs in distinguished positions (ministers and party heads) and nine further MPs who had already left the Bundestag. Table B1 shows that including these observations does not affect our results. Moreover, the estimates that were added (dummies for early resignations and distinguished offices) are very large and statistically significant, supporting the presumption that these MPs are not comparable to the other MPs in the sample. In addition, we check the robustness of the results in Table 3 by taking into account that English texts are about a fourth to a fifth shorter than German texts. 46 To this end, we scale the MPs' German biography length in Equation (3) with the factors 0.6, 0.75, 0.8, and 0.9, respectively. Moreover, we use the difference in logs of the biography lengths as a dependent variable. Table B2 shows that our results are qualitatively unaffected.

Notes to Table B1: Robust standard errors in parentheses. The dependent variable $\text{length}^G_i$ measures Wikipedia coverage of MP $i$ in terms of her biography length in characters. $\text{CDU/CSU}_i$, $\text{Left}_i$, and $\text{Greens}_i$ are dummy variables equal to 1 if MP $i$ is affiliated to that party; SPD is the omitted category. $\text{Female}_i$ is equal to 1 if MP $i$ is a woman. $\text{Former periods in BT}_i$ counts the election terms that MP $i$ has been in parliament. $\text{Ancillary income}_i$ is the mean ancillary income of MP $i$ during the 18th election term in 1,000 Euros based on the estimation of abgeordnetenwatch.de. $\text{PhD}_i$ is equal to 1 if MP $i$ has a PhD. $\text{Population density}_i$, $\text{Fraction pop. 18-35}_i$, and $\text{Fraction pop. with Abitur}_i$ refer to $i$'s constituency demography. $\text{Party head}_i$ is equal to 1 if MP $i$ is chairman of her party or its parliamentary group. $\text{Minister during 18th BT}_i$ is equal to 1 if $i$ was a minister in the 18th election term. $\text{Former periods as minister}_i$ counts the election terms during which MP $i$ was minister before the 18th election term. $\text{Early resign}_i$ is equal to 1 if MP $i$ left the Bundestag early. * $p < 0.1$, ** $p < 0.05$, *** $p < 0.01$.

46 See, e.g., www.orbis-uebersetzungen.de (Feb 2016).
In Section 4, we argue that it is plausible to assume that there are no effects of party affiliation on the English biography length, because partisan contributors have no incentive to contribute to the English Wikipedia. In this section, we perform four additional plausibility checks. First, only thirteen of 598 MPs in our dataset provide more than a short CV in English on their personal homepage, suggesting that they do not consider an English web presence important. Second, only eight of 138 English biographies (5.8%) were edited from the Bundestag building; with one exception, only small changes were undertaken. Third, the lion's share of the English biographies is not translated from their German counterparts. Translated articles have to be marked by a translation template on the article's talk page and by a link to the source article; only ten out of 138 English biographies are marked like this, and no biography is translated from a foreign language into German. In addition, Wikipedia advises against one-to-one translation. 47 Finally, while the assumption of no party effects may fail for foreign languages that are spoken in countries adjacent to Germany or by large minorities, Germany has no direct border with any English-speaking country and has a low number of immigrants whose native language is English (Statistisches Bundesamt, 2017).

Notes to Table B2: Robust standard errors in parentheses. The results are based on an ML estimation of the DiD selection model. The dependent variable is $\text{length}^G_i - \text{length}^E_i$, which measures the differences in Wikipedia coverage of MP $i$ in terms of her biography length in characters. In columns (1) to (4) the German length $\text{length}^G_i$ was scaled with the factors 0.6, 0.75, 0.8, and 0.9, respectively, before taking the difference. $\text{CDU/CSU}_i$, $\text{Left}_i$, and $\text{Greens}_i$ are dummy variables equal to 1 if MP $i$ is affiliated to that party; SPD is the omitted category. $\text{Female}_i$ is equal to 1 if MP $i$ is a woman. $\text{Former periods in BT}_i$ counts the election terms that MP $i$ has been in parliament. $\text{Ancillary income}_i$ is the mean ancillary income of MP $i$ during the 18th election term in 1,000 Euros based on the estimation of abgeordnetenwatch.de. $\text{PhD}_i$ is equal to 1 if MP $i$ has a PhD. $\text{Population density}_i$, $\text{Fraction pop. 18-35}_i$, and $\text{Fraction pop. with Abitur}_i$ refer to $i$'s constituency demography. * $p < 0.1$, ** $p < 0.05$, *** $p < 0.01$.

47 See https://en.wikipedia.org/wiki/Wikipedia:Translation (Dec 2018).
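The rescaling behind Table B2 can be sketched in a few lines of Python. The sketch only constructs the alternative dependent variables (the scaled German length minus the English length, and the difference in logs); it does not reproduce the ML estimation of the DiD selection model. The function names and the example lengths are hypothetical.

```python
import math
from typing import Dict, Sequence

# Scaling factors applied to the German biography length, as listed in the text.
SCALING_FACTORS = (0.6, 0.75, 0.8, 0.9)


def scaled_length_differences(
    length_german: float,
    length_english: float,
    factors: Sequence[float] = SCALING_FACTORS,
) -> Dict[float, float]:
    """Return factor -> (factor * German length - English length), in characters."""
    return {f: f * length_german - length_english for f in factors}


def log_length_difference(length_german: float, length_english: float) -> float:
    """Difference in logs of the two biography lengths (both must be positive)."""
    if length_german <= 0 or length_english <= 0:
        raise ValueError("biography lengths must be positive to take logs")
    return math.log(length_german) - math.log(length_english)


if __name__ == "__main__":
    # Made-up lengths in characters for a single MP, purely for illustration.
    german, english = 8000.0, 5000.0
    for factor, diff in scaled_length_differences(german, english).items():
        print(f"factor {factor}: scaled difference = {diff:.1f}")
    print(f"log difference = {log_length_difference(german, english):.3f}")
```

The log difference is unaffected by any constant scaling of the German length beyond a shift in the intercept, which is one reason it is a natural complement to the scaled differences.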