Data Sources

Relevant data for clinical data science can originate from many sources. The brief overview in this chapter highlights the most frequent ones but is not exhaustive. Its goal is to introduce the most common data sources and to familiarize the reader with basic terminology, so that discussions in the following chapters and in the literature are easier to follow.

across national urban systems (16) or by taking into account smaller population agglomerations (18,25). Table S1 gives a summary of fundamental urban input variables and of key, theoretically derived, urban parameters. Table S2 shows the explicit dependence of the scaling pre-factors and exponents on these parameters, as derived in the main text and presented in Table 1. Table S3 provides an annotated synthesis of data sources and of the existing literature on empirical urban scaling relations (including new results reported here). Table S3 also provides a general overview of estimated values for exponents and their uncertainty intervals, summarized in Table 1. Additional discussion of data sources, the nature of proxy quantities and specific issues relating to each scaling relation is provided in Supplementary Text, below.

Supplementary Text

From Social Interaction Networks to Average Socioeconomic Rates
In general, I describe social interactions in a city in terms of a generalized graph, $F^k_{ij}$ (a graph between elements $i$ and $j$, mediated by a set of different interaction types, such as friendship, employment or acquaintance, indexed by $k$), as

$$Y = \sum_{k} g_k \sum_{i,j} F^k_{ij}, \qquad \text{(S1)}$$

where $g_k$ is the strength per link of the interaction of type $k$ to generate the total output of the city, $Y$. Note that the couplings, $g_k$, can be either positive (attractive, expressing a social benefit, e.g. mutually beneficial economic relations) or negative (repulsive, expressing a social cost, e.g. crime), though the balance must be positive for the city to exist, see below. The couplings $g_k$ have dimensions of $Y$ per interaction, for example units of money or energy per unit time, per interaction. In a city there are many forms of interactions. For example, economic transactions contribute to economic output in terms of wages, profits, and many other quantities.
Crime, in contrast, may be the output of non-economic interactions such as those between the perpetrator and the victim as well as those mediated by law enforcement and by citizens themselves. Likewise, the interactions that lead to the spread of a contagious disease will be mediated by their specific types of encounters. The urban environment affects its citizens across all these dimensions so that a theory of cities must take them into account together.
The essential point I make here is that all these processes share the same average underlying dynamics of social encounters in space and time, against the background of the city and its infrastructure networks. To see this more explicitly, first consider the number of interactions, $I_{i,k}$, of a specific individual $i$, across all modes, $k$. I consider the situation where the strength of the interaction $k$ is statistically independent of the specific pair $i, j$, so that I can write $F^k_{ij} = p(k|ij)\, F_{ij} = p(k)\, F_{ij}$, where $p(k)$ is the probability of different interaction modes, $k$, per link and $F_{ij}$ is the social network across all interaction types. Now consider that these interactions take place in space and time. Each individual is characterized by an interaction area, $a_0$ (a cross section in the language of physics), and by a length traveled in the network, $\ell$. This spans a worldsheet, which is a fraction of the total public space volume, $V_n$ (or area, in 2D networks, $A_n$, which is the notation used in the main text), of the city. Because both $a_0$ and $\ell$ are intrinsic properties of individuals, I take these two parameters as independent of the type of interaction $k$.
Taking all people to be homogeneously distributed in this volume (the mean field assumption), the average total interactions experienced by our test individual, $\bar{I}_{i,k}$, are given by the ratio of the two volumes times the total number of (other) individuals, i.e.

$$\bar{I}_{i,k} = p(k)\, \frac{a_0 \ell}{V_n}\, (N - 1) \simeq p(k)\, a_0 \ell\, \frac{N}{V_n} = p(k)\, a_0 \ell\, n, \qquad \text{(S3)}$$

with $\rho(x)$ the density in public networks (and $n = N/V_n$ its average value), where interactions take place. In this last expression I wrote $N - 1 \simeq N$, for large $N$ (33). Note that any other sufficiently short-range interaction potential, not necessarily a $\delta$-function, would lead to the same result, Eq. (S3), up to a dimensionless multiplicative constant, independent of $N$. Then, I can finally write

$$\sum_k g_k\, \bar{I}_{i,k} = \bar{g}\, a_0 \ell\, \frac{N}{V_n}, \qquad \bar{g} = \sum_k g_k\, p(k),$$

which are the total interactions experienced on average by an individual $i$, in a city of population $N$ and public volume $V_n$. The coupling $\bar{g}$ is the average strength per link of interactions over all modes. In the main text, I take the volume of public space of the city to scale like that of its infrastructure networks, $A_n$. In the two dimensional case ($D = 2$) the cross sectional area $a_0$ takes the dimension of a transverse length, so that the ratio of (2-dimensional) volumes remains a pure dimensionless number. Thus, we obtain

$$Y = \bar{g}\, a_0 \ell\, \frac{N^2}{A_n} = G\, \frac{N^2}{A_n}, \qquad G \equiv \bar{g}\, a_0 \ell.$$

It is important to stress that although social interactions are local and take place at the most microscopic level between two individuals, Eq. (S1) leads nevertheless to effective interactions between individuals that are not directly connected, through chains of people between them, and between individuals and institutions (firms, public administration) as well as between institutions themselves. These effective interactions are obtained via the appropriate groupings of individuals in social or economic organizations and by the consideration of the resulting coarse-grained interactions between such entities (which are always ultimately mediated by people).
Institutions and industries that benefit from strong mutual interactions may aggregate in space and time within the city in order to maximize their $Y - W$ (see below), a point first made by Marshall (23) in the context of industrial districts. Other organizations may benefit primarily from the mean-field effects that result from being in the wider city and collecting a diversity of interactions, an argument often attributed to Jane Jacobs (22). This analysis of the finer structure of more heterogeneous interactions, which requires considerations beyond the average behavior derived here, will be considered elsewhere. Likewise, the analysis of the fine structure of types of functions and interactions in cities, for example in terms of professions, and their connection to superlinear increases in socioeconomic productivity is developed in greater detail in (34).
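The mean-field argument of this section can be illustrated numerically. The following is a minimal sketch, not the author's code: the function names and the input values are hypothetical, and it assumes the combination $G = \bar{g}\, a_0 \ell$, the same $G$ that appears in the per capita budget $y = G N / A_n$ used later in the text.

```python
def interactions_per_person(a0, ell, N, A_n):
    """Mean-field number of encounters of one individual.

    Ratio of the swept area (a0 * ell) to the public network area A_n,
    times the number of other people (N - 1 is approximated by N).
    """
    return a0 * ell * N / A_n


def total_output(g_bar, a0, ell, N, A_n):
    """City-wide output Y = G * N**2 / A_n, with G = g_bar * a0 * ell."""
    G = g_bar * a0 * ell
    return G * N ** 2 / A_n


if __name__ == "__main__":
    # Arbitrary illustrative values, not measured data.
    print(interactions_per_person(a0=2.0, ell=3.0, N=100, A_n=50.0))
    print(total_output(g_bar=1.0, a0=2.0, ell=3.0, N=100, A_n=50.0))
```

At fixed $A_n$ the output is quadratic in $N$; the superlinear but sub-quadratic scaling discussed in the text only appears once $A_n$ itself grows with $N$.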

Mixing, Exploration of Space and Fractal Dimension
Here, I develop more detailed considerations about the exploration of space by individuals that may take place in cities and the necessary conditions for a mixing population. The general idea is that, to benefit from their integration in the city, individuals explore different locations at different times but must be able, on their most basic budget, to explore the city fully. I parameterize this general behavior in Eq. (S6) by $H$, the fractal (Hausdorff) dimension of a path in space. $T$ is the (energetic) cost associated with such a path, which is written in terms of the city's land area, $A$, as

$$T = \epsilon\, A^{H/D}, \qquad \text{(S6)}$$

in $D$ general dimensions and where $\epsilon$ is a cost per unit length (see main text). The minimum budget that a new citizen may naturally muster is $y_{\min} = G N / A$, which is much smaller than the average budget, $y = G N / A_n$, because $A \gg A_n$. Thus, this can be seen as an entry condition into the city: a new citizen, perceiving the city only in an unstructured way, before knowing its networks and public spaces, should be able to reach anyone else in the city. Equating $T$ to $y_{\min}$ leads to a generalized relationship between population and area of the form

$$A = a\, N^{\alpha},$$

with the exponent $\alpha = D/(D+H) \simeq 2/3$, for $D = 2$, $H = 1$, and the baseline area

$$a = \left( G / \epsilon \right)^{\alpha}.$$

Note that $a$ is a rising function of $G$, which controls the average strength (and productivity) of social interactions, and of decreasing $\epsilon$, the cost of transport per unit length.
Thus, increases in human capital, mobility and the diversity of social interactions, if expressed in increasing values of $G$, and increases in transportation efficiency (decreases in $\epsilon$) lead to a larger $a$ and an overall less dense city, while preserving the scaling relation. This is consistent with the observed trend in modern cities over time to become less dense (35).
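The entry condition can be checked numerically. This is a minimal sketch under stated assumptions: it takes the transport cost to be $T = \epsilon A^{H/D}$ and the minimum budget to be $y_{\min} = G N / A$, solves $T = y_{\min}$ for $A$, and confirms that the result has the form $A = a N^{\alpha}$ with $a = (G/\epsilon)^{\alpha}$. The function names and numerical values are hypothetical.

```python
def area_exponent(D=2, H=1):
    """alpha = D / (D + H)."""
    return D / (D + H)


def baseline_area(G, eps, D=2, H=1):
    """Baseline area a = (G / eps)**alpha."""
    return (G / eps) ** area_exponent(D, H)


def land_area(N, G, eps, D=2, H=1):
    """Land area A = a * N**alpha that balances T and y_min."""
    return baseline_area(G, eps, D, H) * N ** area_exponent(D, H)


if __name__ == "__main__":
    N, G, eps, D, H = 1e6, 3.0, 0.5, 2, 1   # arbitrary illustrative values
    A = land_area(N, G, eps, D, H)
    cost = eps * A ** (H / D)               # T = eps * A**(H/D)
    budget = G * N / A                      # y_min = G * N / A
    print(A, cost, budget)                  # cost and budget should agree
```

Raising `G` or lowering `eps` increases `baseline_area`, which is the less dense city described in the paragraph above.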
The parameters $G$ and $\epsilon$ are generally time dependent and may also show some (small) city size dependences, a subject that I explore in the main text. Note that $G$ can be measured from $N$, as is done in Figure 1B (inset). The constancy of the average $G$, measured in this way, gives direct evidence for the empirical validity of the assumptions made in the main text and can be interpreted as a "conservation law", since $dG/dN = 0$, or equivalently $d \ln G / d \ln N = 0$.
I introduced a Hausdorff dimension $H$ to characterize paths through the city because I considered it too strong an assumption to take their geometry to be known. It is interesting to discuss the meaning of the several values of $H$ further. $H = 1$ corresponds to the most natural assumption, that these costs are proportional to the linear extent (diameter) of the city, and is clearly sufficient for an individual to reach any location in the city by himself. This is the assumption that is made (often implicitly) in urban economic models of land use, due to Alonso, Muth, Mills and others (36,37). The exponent $\alpha = 2/3$ was derived long ago by Nordbeck (18), who first observed it for Swedish cities. He used an allometric argument that total city population, $N$, should scale like a 3D volume due to its spatial profile of density change, which implies $N \sim A^{3/2}$. This is indeed the case for $D = 2$, $H = 1$ in the argument given here. However, the social interaction picture, constrained by transportation costs across the city, is more fundamental because it can be fulfilled by individuals, appeals directly to function and social dynamics and does not require global optimization.
H < 1 corresponds to a trajectory with a volume less than linear and is in practice a series of separate spatial clusters. This means that an individual cannot reach the entire city by himself, though the city may still stay connected via a chain of local interactions. While a city can exist as such, cities would potentially become more and more disconnected as they grow, requiring a larger number of overlapping zones and interpersonal contacts to be available to each citizen.
In this regime a city would then behave more like a series of separate interacting communities rather than a whole mixing population, a characteristic that is often used to define the absence of a bona fide functional city. Note that in the limit $H \to 0$, the exponent $\delta = \frac{H}{D(D+H)} \to 0$, and urban agglomeration effects (superlinearity of socioeconomic outputs and sublinearity of infrastructure) vanish altogether. Given available data, which typically provide only the total administrative unit area for a city or metropolitan area, this $H \lesssim 1$ regime seems to be sometimes observed, see discussion below and Table S3.
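Since the exponents depend only on $D$ and $H$, they are easy to tabulate exactly. The sketch below assumes $\alpha = D/(D+H)$ and $\delta = H/(D(D+H))$, and derives from them the exponents $1+\delta$ (socioeconomic outputs), $1-\delta$ (infrastructure network area) and $1-\alpha+\delta$ (land rents) quoted elsewhere in the text. The function name and dictionary keys are illustrative only.

```python
from fractions import Fraction


def scaling_exponents(D, H):
    """Exact scaling exponents implied by space dimension D and path dimension H."""
    D, H = Fraction(D), Fraction(H)
    alpha = D / (D + H)              # land area: A ~ N**alpha
    delta = H / (D * (D + H))        # agglomeration exponent
    return {
        "alpha": alpha,
        "delta": delta,
        "social_output": 1 + delta,      # superlinear socioeconomic rates
        "infrastructure": 1 - delta,     # sublinear network area A_n
        "land_rent": 1 - alpha + delta,  # average land rents
    }


if __name__ == "__main__":
    for H in (0, Fraction(1, 2), 1, 2):
        print(H, scaling_exponents(2, H))
```

For $D = 2$ the case $H = 0$ removes agglomeration effects ($\delta = 0$), while $H = 2$ gives $\alpha = 1/2$ and $\delta = 1/4$.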
Conversely, H > 1 means that the length of trajectories scales faster than a linear volume, and in particular for H = 2 they would scale as an area (and for H = 3 as a 3D volume).
Because cities are approximately two dimensional we may expect $H \le 2$ to be an absolute upper bound, which leads necessarily to $\alpha \ge 1/2$ and $\delta \le 1/4$. It is important to stress that although individuals may explore the city in a way that is area filling locally, this does not imply that $H = 2$ in general. This is because the characteristic length is measured in terms of the area of the city, and consequently $H = 2$ would mean that they would have to cover the entire land area over a given time period. This seems counterfactual, certainly for large cities. For all these reasons, while I leave $H$ as a parameter in the main text, I expect that it would naturally be of

can be reached by people, goods and information traveling over infrastructure networks. The technology involved in these networks varies enormously with level of urban development but I assume here that the geometry of the networks does not. Figure 2A (main text) illustrates this situation for a regular grid. In this case the total length of the network can be derived easily, see Figure 2A (main text), as

$$L_n = 2\,(n_b + 1)\, n_b\, l \simeq 2\, n_b^2\, l,$$

where $l$ is the average block length (the minimum separation along the network), $n_b$ is the (linear) number of blocks across the city, and $L = n_b l$. The factor of 2, in the first term of $L_n$ above, accounts for vertical plus horizontal network segments, and the factor of $n_b + 1$ counts the number of segments across the city, including one at the edge, each with length $L = n_b l$.
The factor of $n_b^2$ that results is then identified with the area $A$, up to a multiplicative constant.
For networks that are not, on the average, square grids, the constants multiplying the factors of area, $A$, will differ, but not the space filling character of the network, expressed as $L_n \simeq A/l$.
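The grid count described above can be verified directly. This is a small sketch that assumes a square grid of $n_b \times n_b$ blocks of side $l$, so that the closed form is $L_n = 2 (n_b + 1) n_b l$; the function names are illustrative.

```python
def grid_network_length(n_b, l):
    """Closed form: 2 directions, (n_b + 1) lines each, every line of length n_b * l."""
    return 2 * (n_b + 1) * n_b * l


def grid_network_length_by_counting(n_b, l):
    """Same quantity, obtained by summing every block-length segment."""
    total = 0
    for _line in range(n_b + 1):        # horizontal lines, including the far edge
        for _segment in range(n_b):
            total += l
    for _line in range(n_b + 1):        # vertical lines, including the far edge
        for _segment in range(n_b):
            total += l
    return total


def length_to_area_ratio(n_b, l):
    """L_n * l / A, which tends to the constant 2 for large grids."""
    area = (n_b * l) ** 2
    return grid_network_length(n_b, l) * l / area


if __name__ == "__main__":
    print(grid_network_length(10, 100), grid_network_length_by_counting(10, 100))
    print(length_to_area_ratio(1000, 100))
```

The ratio $L_n l / A = 2 (1 + 1/n_b)$ approaches a constant, which is the space filling property $L_n \simeq A/l$ up to a multiplicative factor.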

Boundary Conditions and Scaling of Currents
Here I show more explicitly the effect of the choice of boundary conditions on network model variables and the introduction of certain city size independent individual and network properties.
This choice is important because it sets the scaling behavior of energy dissipation in the network due to transportation processes. I have assumed that the width of the smallest network units, $s_*$, is a constant, independent of city size. Although seemingly an abstract assumption, this means in practice something quite intuitive: house doors, water faucets and electrical outlets, for example, each have a common cross section in all cities that does not vary with city population size. This sets the scaling of width across network levels, $s_i$, and implies that the width is largest at the highest level ($i = 0$: root, "highways"), $s_0 = s_* N^{1-\delta}$. This condition may apply only statistically (38) for a network that is not a (balanced) tree, as, for example, would happen in a semi-lattice (26), where branches at the same level are connected, or upper branches can converge on the same lower site. This condition leads to a scaling relation for the current density, $\rho_i v_i$, across levels. This relationship is not fully specified until I prescribe its boundary conditions. I can place a limit on the current density at the root, $\rho_0 v_0$, or at the leaves, $\rho_* v_*$. In both cases the total current, $J_i$, is independent of level $i$, a necessary consequence of total current conservation, but the two choices scale with population size in different ways. Specifically, a boundary condition at the root gives $J_i = J = s_* \rho_0 v_0 N^{1-\delta}$, while the boundary condition at the leaves leads to $J_i = J = s_* \rho_* v_* N$. Note that the latter is the expected current for a population of individuals, in terms of their intrinsic "individual needs", and is therefore the natural boundary condition. It means, in intuitive terms, that the flow of people through doorways in their homes is similar across cities of different sizes and that the consumption of water, electricity, etc., per capita in households is an invariant of city size, as observed (12).
Thus, the differences between cities arise at larger scales, where social interactions are more common and population-wide constraints apply. Life at home in cities of any size remains in many ways the same; it is only in public interaction spaces that the more urban character of larger cities manifests itself.

Dissipation on Infrastructure Networks
There are many dissipative processes (costs) that can take place in a city and that can lead to situations in which increasing social interactions and their products may be more than overcome by their associated costs. In the main text I assume that the resistance at each level of the network is that of all branches taken in parallel (c.f. (4)), that is, $R_i = r_i / N_i$ as usual, if all $N_i$ branches at level $i$ have the same resistance $r_i$. The resistance of each branch is a purely geometric property of the network times a resistance, $r$, per unit length and transverse area, which increases with level $i$, and therefore is larger in the smallest branches than at the root.
From (S14) this leads to a resistance per level, $R_i$, which decreases with $i$ and is therefore larger at the root (highways) than at the leaves (narrow local paths). This is a direct result of the assumed parallelism of the branches at each level. If they are not strictly operating in parallel then the total resistance will decrease more slowly from the root to the leaves of the network, and be larger in total, leading to higher dissipation than estimated here. We can put the conditions on the current and resistance together to obtain the total power dissipated, $W$, as

$$W = R\, J^2 = W_0\, N^{1+\delta}, \qquad \text{(S18)}$$

which scales superlinearly, with an exponent $1 + \delta \simeq 7/6$ ($D = 2$, $H = 1$). Here $R$ is the total resistance of the network and the pre-factor $W_0$ is independent of $N$. We see that the dissipative behavior of the network is set by the current squared, $J^2$, multiplied by the resistance at the root, $R_0 = \frac{a r}{l s_*} N^{\delta - 1}$. The current, in turn, is set by conditions at the smallest branches, that is, by the fundamental properties of people and their behavior. Thus, the main overall contribution to these dissipative processes results from people, energy, information, etc., being channeled through a network with many levels, and from the constraints that occur at its largest scales. Remarkably, this result ties together the most microscopic needs and behaviors of individuals anywhere to the most macroscopic aspects of the urban infrastructure.
Another way to see this is to rearrange terms in Eq. (S18) to write it in terms of $A_n$. This shows that the dissipation term can be made smaller by increasing the infrastructure network's total volume, $A_n$. In contrast, as we have seen above, making $A_n$ smaller increases the social outputs of cities. Thus, we may expect an equilibrium between the detailed consequences of these two effects that leads to an optimal allocation of infrastructure to social interactions as a function of population size (and level of technology).
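The superlinear exponent of the dissipated power can be reproduced with a toy hierarchical network. This is only a sketch under explicitly assumed forms, not a transcription of the derivation: it assumes a balanced tree with branching ratio `b` and `h` levels ($N = b^h$), $N_i = b^i$ branches at level $i$, branch lengths $l_i \propto b^{(\alpha - 1) i}$ and branch widths $s_i \propto b^{(1-\delta)(h-i)}$, with all dimensional constants set to 1. These choices are consistent with the root width $s_0 = s_* N^{1-\delta}$ and the root resistance $R_0 \propto N^{\delta - 1}$ quoted above. All names are hypothetical.

```python
import math


def dissipated_power(h, b=2, D=2, H=1):
    """Return (N, W) for a balanced tree with h levels below the root.

    Assumed forms (constants set to 1): N_i = b**i branches, length
    l_i = b**((alpha - 1) * i), width s_i = b**((1 - delta) * (h - i)).
    """
    alpha = D / (D + H)
    delta = H / (D * (D + H))
    N = b ** h
    J = float(N)                              # leaf boundary condition: J ~ N
    R = 0.0
    for i in range(h + 1):
        n_i = b ** i                          # number of parallel branches
        l_i = b ** ((alpha - 1) * i)          # branch length
        s_i = b ** ((1 - delta) * (h - i))    # branch width
        R += (l_i / s_i) / n_i                # R_i = r_i / N_i with r = 1
    return N, J * J * R


def local_exponent(h1, h2, **kwargs):
    """Slope of log W against log N between two network depths."""
    N1, W1 = dissipated_power(h1, **kwargs)
    N2, W2 = dissipated_power(h2, **kwargs)
    return (math.log(W2) - math.log(W1)) / (math.log(N2) - math.log(N1))


if __name__ == "__main__":
    print(local_exponent(20, 30))   # close to 1 + delta = 7/6 for D = 2, H = 1
```

In this toy model the sum over levels converges to a constant for large networks, so the scaling of $W$ is controlled by $J^2 R_0$, in line with the statement that dissipation is set by the current squared times the resistance at the root.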

Global Optimization
Here I show that the principles discussed in the main text can be formulated in terms of a constrained optimization problem, where each individual maximizes the outcome of his/her interactions minus costs, subject to the general infrastructural and size constraints posed by the city, and where city infrastructure can be managed so as to maximize collective welfare. I write the objective function, $\mathcal{L}$, for this problem, Eq. (S20), as the net output, $Y - W$, plus constraint terms, where $c = A_0 a^{-1/D}$ is a constant in $N$ that follows from Eqs. 2 and 4, $d = (A/N)^{1/D}$ (see main text) and $\lambda_1$, $\lambda_2$ are Lagrange multipliers. From their point of view, individuals can structure their interactions in space and time so as to maximize the benefit of being in the city, while minimizing costs. This is expressed primarily in terms of the factors that enter $G$. In turn, city authorities should provide organizations (such as police, which affect social interaction modes) and infrastructure so that general urban socioeconomic benefits are maximized. This can be expressed in terms of the variation of $A_n$ (and of the factors that make it). Varying (S20) relative to $A$ and $A_n$ imposes the dependences on $N$ of $A$ and $A_n$ discussed in the main text, together with their consequences for social outputs and network dissipation.

Now observe that the problem of matching the sum total of social interactions to costs has two solutions in terms of values of $G$, Eq. (S22). The first solution, at $G = 0$, means that for a city to exist it needs to have some level of net positive social interactions, $G > 0$. The second solution is the point at which network dissipation costs overwhelm the social benefits of the city, beyond which the city becomes too expensive to exist as a whole and may break up into disconnected areas. In between these two extremes, there is a special value of the coupling, $G = G^*$, for which the balance is positive and largest. We can determine this point by taking the variation of the net benefits $\mathcal{L}$ relative to $G$. This condition implies that there is an optimal $G$ to which any city should converge in order to maximize the difference between its net social output and associated dissipation costs. Note that the city can only exist if social outputs are larger than dissipation and that, starting with small $G > 0$, it pays to increase the coupling for a while. However, increasing it beyond $G^*$ leads to dissipation rising faster than social outputs, reducing the net difference between the two and ultimately canceling it altogether.
Finally, evaluating $\mathcal{L}$ at $G^*$, we see that the optimization that is achieved in the city is open-ended relative to population size, $N$, as long as both individual choices and infrastructure can be mutually adapted to (close to) their optimal values. This emphasizes the interplay between individual and social behavior, which constitutes the necessary condition for the city to exist, and the role of infrastructure and policy in creating the conditions that promote the benefits and reduce the costs of human social behavior.
In practice, these conditions predict that the most likely value of $G$ from a sample of many cities should correspond to $G^*$. Therefore, we can use an estimate of the mode of the distribution of $G$ (Fig. 1B inset) to obtain $G^*$, shown as the solid yellow line. Thus, monitoring relative changes in $Y$ (e.g. via GDP) versus those in $W$ (the dissipative costs of transportation) allows cities to judge how close to their optimal net output they are. More importantly, it also provides a practical and quantitative basis for iterative city management and policy as well as for benchmarking urban areas relative to each other, see Fig. 1B (inset).

Empirical Support for Urban Scaling Laws Predicted by Theory
Table S3 (and Table 1) summarizes the empirical literature on estimates of scaling relations for cities, including a few new ones introduced here. I now discuss some of the current evidence for the ranges of exponents given in Tables 1 and S3 and present a few additional examples that emphasize the consistency of scaling exponents with data from many different urban systems and over time. I also discuss issues related to currently available data for specific urban indicators and directions for future empirical research.
There have been over 50 years of research characterizing many urban properties in terms of allometric, or power law, relations, especially in geography, sociology and urban economics.
The use of these types of scale-invariant relations is dictated by the general, but often unstated, assumption that human settlement properties, from the smaller towns to the largest cities, vary continuously and that there is no particular population or length scale at which they change radically. This is supported by a vast body of empirical evidence, only a fraction of which is

Table 1 (see Table S3) is a synthesis of these results and reflects this larger variance. It would be desirable in the future to develop a more consistent approach to measuring this scaling relation, hopefully motivated by the theoretical framework developed here.
Urban Scaling of Paved Areas: Several of the more modern empirical studies aiming at capturing the spatial extent of cities use satellite imagery as a means to measure the built area of settlements. These data allow, in principle, for simultaneous and consistent measurements over many cities. Pioneering studies by Batty, Longley and collaborators in the early 1990s (49), analyzing specific regions of Britain (such as Norfolk), found no (strong) agglomeration effects in this regional context, and developed null models to account for such behavior. However, these techniques have since developed further and been carefully calibrated and refined over the last 20 years, see e.g. discussion in (16). As a consequence, remote measurements of built area have probably become reliable in the last few years thanks to new satellite imagery and more thorough comparative analyses. When applied to large cities they measure primarily the area of paved infrastructure, thus providing a measure of $A_n$, not $A$. A variety of recent results for large cities in Europe (50), China (51) and more than 3,600 cities of over 100,000 people worldwide (16) establish the scaling $A_n \sim N^{\nu}$, with $\nu \sim 5/6$ (see Table S3), in general agreement with the theoretical arguments developed in the main text. The value of the exponent $\nu$ shown in Table 1 is a synthesis of direct measurements, such as those shown in Fig. 1A (main text), and of these recent remote sensing estimates, see Table S3. More systematic measurements of $A$ and $A_n$ and corresponding population sizes would be desirable, and new remote sensing datasets, properly calibrated and expanded to smaller settlements, may provide the best candidate empirical approach to this end (16).
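Scaling exponents such as $\nu$ are commonly estimated by a linear fit on logarithmic axes. The sketch below shows one such estimator, ordinary least squares on $\ln A_n$ versus $\ln N$; the studies cited here may use different estimators, and the input data are synthetic and noise-free, generated only to check that the fit recovers a known exponent. The function name is hypothetical.

```python
import math


def fit_power_law(populations, values):
    """Least-squares fit of values = prefactor * population**exponent on log-log axes."""
    xs = [math.log(n) for n in populations]
    ys = [math.log(v) for v in values]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    exponent = sxy / sxx
    prefactor = math.exp(mean_y - exponent * mean_x)
    return prefactor, exponent


if __name__ == "__main__":
    # Synthetic cities following A_n = 2 * N**(5/6) exactly (not real data).
    pops = [1e5, 3e5, 1e6, 3e6, 1e7]
    areas = [2.0 * n ** (5 / 6) for n in pops]
    print(fit_power_law(pops, areas))
```

With real data the residuals around the fitted line, and the confidence interval of the exponent, matter as much as the point estimate.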
Urban Scaling of Social Outputs: Perhaps an even richer literature addresses the effects of city population size on urban socioeconomic quantities. General qualitative arguments for the advantages of cities in human social life are very old and date back to at least Aristotle (52) in his Politics, where he discusses the greater scope of human sociality in cities (polis) when compared to animal societies. In terms of inspiring economic theory, the work of Alfred Marshall about industrial districts (23) and, more recently, of Jane Jacobs (22) and Allan Pred (53), suggested general qualitative mechanisms for why larger cities can generate increases in innovation and economic production rates and inspired many subsequent empirical and theoretical studies. The main arguments about these dynamical (functional) advantages of larger cities rely on aspects of their internal socioeconomic structure, which allow larger cities to provide new and better services in the context of a (national or even international) system of cities (22,53). These ideas, which together with concepts from location and central place theory (54,55) constitute the basis for modern economic geography (29), require progress in the current quantitative understanding of urban social and infrastructural networks as the basis for predicting agglomeration effects. This is the objective of the theoretical framework developed in the main text.
Historically, it was only in the 1970s, to my knowledge, that direct empirical analyses of socioeconomic output rates across cities were first carried out. Sveikauskas (9)

and their construction is complex but consistent, see http://www.bea.gov/regional/methods.cfm. For other nations these procedures tend to be analogous and some detail is available in the original sources' online materials, see Materials and Methods above.
Analyses of other quantities mediated through social interactions, such as crime, the incidence of contagious diseases, innovation, etc provide opportunities to measure the effects of cities in accelerating human social contact rates. Generally, in modern societies, levels of person-on-person crime (such as homicides) increase superlinearly with city size (10,12,25).
However, this may not always have been the case as it is the net result of superlinear opportunities for violence and social measures to combat crime (metropolitan police), which historically are first developed systematically in larger cities (1), (56). It has been argued for example that in Medieval times, cities were safer than the countryside (and both were very violent by modern standards) (1), and also that pre-1940s larger cities in the USA were safer (57) than smaller places, but quantitative data in support of these statements are weak. At present, urban systems that present higher levels of violence, especially in Latin America and parts of Africa, also suffer from severe issues affecting the availability and reliability of reported data (25). Nevertheless, where data are systematically collected exponents are in the expected ranges (25).
These reporting biases also affect measurements of public health data, especially dealing with the incidence of contagious diseases. In developed nations, access to modern public health in large cities has all but stemmed the worst impact of most contagious diseases.
An exception may be HIV/AIDS in the early years of the epidemic (12) (which showed superlinear rates of incidence with city size), as no crowd immunity effect or clinical intervention could then reduce death rates. The effect of antiretroviral treatment for the disease has sharply reduced new cases of the disease (as well as deaths) from the date of its introduction in the US in 1996. But because this introduction seems not to have suffered from substantial city size biases (at least among large cities for which there are data), it has not affected scaling exponents significantly. Data from the US Centers for Disease Control (CDC) for the earliest years of the epidemic are sparser and concentrated on higher risk population segments, see http://www.cdc.gov/hiv/topics/surveillance/resources/reports/past.htm#supplemental, and the specific urban areas where the epidemic first took root, such as San Francisco and New York City. Later years provide a more general picture of urban contacts leading to the spread of the disease. It would be interesting to perform more thorough analyses of the epidemic's history, with these events and data limitations in mind.
Other contagious diseases, such as diarrheal diseases, have historically contributed to much urban mortality, especially among children. Where data are available, the incidence of these diseases is clearly correlated to urbanization, including in 19th century Britain (58). At present, where these diseases remain a major burden, data are usually lacking or are unreliable. This situation may soon change, due to new measurement opportunities in developing nations resulting from portable device technologies. These opportunities will provide important tests and applications of the theoretical framework developed here in environments where understanding and managing urbanization are most critical.
Urban Scaling of Invention and Innovation: While very important for growth and development, innovation is difficult to measure unambiguously. Patents (11) have provided a proxy for technological innovation, as has employment in 'creative' sectors (59). Patents tend to show scaling exponents slightly larger than most other socioeconomic rates, in the range 1.2 to 1.3, possibly because they rely on interactions between individuals that are already disproportionately present in larger cities. For example, the number of supercreative professionals (59) scales with city size in the US with an exponent $\simeq 1.15$ (11).
The exponent ranges for socioeconomic rates in Table 1 are obtained from references (9,10,11,12,43), without controlling for co-varying factors, see Table S3. This is the range shown in Table 1.
Urban Scaling of Land Rents: An estimate for land rents across the city, given in Tables 1 and S3, follows from considering the total income $\sim N^{1+\delta}$ divided by the total land $\sim N^{\alpha}$, which results in average land rents (measured in units of money per unit area and unit time) scaling as $P_L \sim N^{1-\alpha+\delta}$, with $1 - \alpha + \delta \simeq 1/2$ in $D = 2$, $H = 1$. Thus, land rents scale faster with population than incomes or wages. This is offset, in part, by smaller per capita use of land, achieved primarily by increasing the floor area of buildings relative to their land footprint by building taller multi-floor units.
Data on land rents across cities are fairly sparse. Ranges given in Table 1 are obtained by adapting the results from (24) on housing values as a proxy. Note that in their Figure 3 these authors estimate the scaling of personal income per capita on median house value by metropolitan area. They find an exponent of $0.34 \pm 0.02$. Thus, with income per capita scaling with exponent $\delta$, we obtain that $P_L \sim N^{\delta_L}$, with $\delta_L = \delta / 0.34 = 2.94\, \delta \simeq 1 - \alpha + \delta$. In the estimates given in Table 1 I used $\delta = 1/6$, but a slightly lower value of this exponent (often associated with income, but not wages or GDP, because of national transfers) will naturally result in a lower estimate of $\delta_L$.
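The arithmetic behind the housing-value proxy can be made explicit. The sketch below assumes that if income per capita scales with house value with exponent $e$, then house value scales with income with exponent $1/e$, so that with income per capita $\sim N^{\delta}$ the land rent exponent is $\delta_L = \delta / e$. The function name is hypothetical; the inputs $e = 0.34 \pm 0.02$ and $\delta = 1/6$ are the values quoted in the text.

```python
def land_rent_exponent(income_on_value_exponent, delta):
    """Land rent exponent delta_L = delta / e implied by a housing-value regression."""
    return delta / income_on_value_exponent


if __name__ == "__main__":
    delta = 1 / 6
    central = land_rent_exponent(0.34, delta)        # about 0.49
    low = land_rent_exponent(0.34 + 0.02, delta)     # larger e gives a smaller delta_L
    high = land_rent_exponent(0.34 - 0.02, delta)
    theory = 1 - 2 / 3 + 1 / 6                       # 1 - alpha + delta for D = 2, H = 1
    print(low, central, high, theory)
```

Under these assumptions the theoretical value of 1/2 falls inside the interval spanned by the quoted uncertainty on the regression exponent.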
New Evidence for Predicted Scaling Relations for Different Nations and Time: Figs. S1-3 show how the scaling of superlinear socioeconomic rates is a property of many urban systems worldwide, across several continents (Americas, Asia, Europe) and levels of development (e.g. China or the USA). Fig. S1 also shows the sublinear scaling of infrastructural quantities, see also (12), in relation to corresponding socioeconomic rates for the metropolitan areas of Japan.
Finally, Fig. S3 shows the extraordinary consistency of superlinear scaling of socioeconomic rates (wages) in US Metropolitan Statistical Areas over the last 40 years; see also (43). This confirms the theoretical prediction that scaling exponents are independent of time and of levels of socioeconomic development, which vary considerably over this period; see Fig. S3d.
All this evidence, across time and nations all over the world, establishes some of the general properties of cities that can explain their universal role in the socioeconomic development of human societies (1,9,22). Current data gaps, uncertainties and potential biases also stress the need for theory that can establish key quantities to measure and provide quantitative hypotheses for their magnitude and behavior. The framework developed here seeks to explain and provide predictions for urban data anywhere, through an interdisciplinary quantitative synthesis from geography, sociology, urban economics, planning and complex systems. It establishes the fundamental nature of cities in terms of network theories of people and infrastructure.

The estimate of the exponent is brought down slightly by the two largest cities, the Ruhr Valley and Berlin, respectively. In their absence the estimated scaling exponent agrees perfectly with the simplest theoretical expectation of $\beta = 7/6$. The Ruhr Valley is an industrial area composed of several distinctly recognizable cities. Berlin is a special large city, having been largely destroyed in World War II and still experiencing, at this time, the integration of its Western and Eastern parts in the aftermath of German reunification. Its current population of about 3.5 million remains substantially smaller than at its height (in 1939 it was estimated at 4.3 million).

Table S2: Summary of the dependence of scaling parameters on input variables (Tables 1 and S1). Note that exponents depend only on the dimensionless parameters $H$ and $D$, and are in general independent of network parameters or of details of individual behavior. In this sense we may expect exponents to be largely invariant in time, population size or levels of socioeconomic development. Nevertheless, $H$ is a means of measuring how connected (inclusive) a city is and may change slowly over time.
The prefactors of the scaling relations depend on the remaining input parameters and do change over time (e.g., see Fig. S3d), reflecting socioeconomic development and changes in the properties of infrastructure and individual behavior.

NR = not reported. Error, in order of availability from the source, is given by: 95% confidence intervals (square brackets), ranges, or $R^2$ values. Note: Average quantities are the simple (unweighted) averages across rows. Corresponding error intervals are the union of those from individual studies. * This estimate of Average land area includes all 12 rows above; it mixes explicit measurements of built area with others. † This estimate was obtained by the author through visual inspection of Fig. 1 in Ref. (39).

Table S3: Summary of empirical evidence for predicted urban scaling exponents. Note that Land area has been measured in a variety of ways: those that account only for built area (at different levels of resolution) and those that compute circumscribing area. The latter scale with a smaller exponent $\alpha \sim 2/3$, while the former should approach the exponent $\nu \sim 5/6$, rather than $\alpha$, as their resolution improves, as observed. Additional measurements, discussion and sources are given in the original references and in the Supplementary Text.