The inheritance of social status: England, 1600 to 2022

Significance
There is widespread belief across the social sciences that social interventions and social institutions can significantly influence rates of social mobility. In England between 1600 and 2022, social institutions changed considerably. Half the population was illiterate in 1800, and compulsory primary education was not introduced until 1880. Thereafter, educational provision and other social supports for poorer families expanded greatly. The paper shows, however, that these interventions did not change in any measurable way the strong familial persistence of social status across generations.


Estimations of the Fisher relationships
Notes: "-rem" indicates "once removed". Cousin2, Cousin3 and Cousin4 indicate second, third and fourth cousins. Modstat is a PCA index that combines House Value, IMD and the Company Director indicator (CoDir). "Occstat" is occupational status, and "HighEd" is an indicator for higher education. "Pairs Observed" is the total number of pairs of relatives used in estimating the parameters of equation (3). Table S2 reports the details of the estimates of b, m and h² from equation (3).

Figure S1. Modstat Correlations and Implied Shared Genotype
Notes: The dashed line shows the OLS fit to this data. The R² reported is for this fitted line. The child and grandchild correlations potentially deviate from this fitted line.
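The figure notes refer to an OLS line fitted through the correlations for different types of relatives. Each correlation is plotted against the genotype share that the relationship implies, and the fit is reported with its R². The minimal sketch below shows such a fit. The correlation values are invented for illustration. The x values are the standard coefficients of relationship, which may not match the exact "implied shared genotype" used in the figures.

```python
def ols_fit(x, y):
    """Return (intercept, slope, r_squared) for a simple OLS regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - residual sum of squares / total sum of squares
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return intercept, slope, 1 - ss_res / ss_tot

# Coefficients of relationship: parent-child and siblings 1/2,
# grandparent and uncle/aunt 1/4, first cousins 1/8.
shared = [0.5, 0.5, 0.25, 0.25, 0.125]
# Hypothetical correlations, for illustration only (not estimates from the paper).
corr = [0.60, 0.58, 0.36, 0.33, 0.18]
intercept, slope, r2 = ols_fit(shared, corr)
print(f"intercept={intercept:.3f} slope={slope:.3f} R2={r2:.3f}")
```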

Figure S2. IMD Correlations and Implied Shared Genotype
Notes: The dashed line shows the OLS fit to this data. The R² reported is for this fitted line. The child and grandchild correlations potentially deviate from this fitted line.

Figure S3: Company Director Correlations and Implied Shared Genotype
Notes: The dashed line shows the OLS fit to this data. The R² reported is for this fitted line. The child and grandchild correlations potentially deviate from this fitted line.

Figure S4: Literacy, births 1725-1862, and Implied Shared Genotype
Notes: The dashed line shows the OLS fit to this data. The R² reported is for this fitted line. The child and grandchild correlations potentially deviate from this fitted line.

Table 4 reports the details of the estimate of the correlation, r, in latent social status between marital partners by period for marriages 1837-2022. In total there were 1,014,299 marriages with information on the occupational status of the groom, his father, and his father-in-law. Bride occupational status was not used to estimate r because for most of this interval only a small minority of brides had a listed occupation.

Marital Assortment
We can estimate latent marital assortment for the years before 1889 using bride literacy, indicated by the bride signing the marriage register. This estimate is simply the ratio of the correlation of bride literacy with her father-in-law to its correlation with her own father. This gives similar results to those reported in
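The ratio estimator described above can be checked on simulated data. This sketch rests on assumptions that the text does not state:

- Latent statuses are standard normal.
- The groom's latent status correlates r with the bride's.
- Each father's latent status correlates b with his own child's.
- Literacy is a noisy threshold on the bride's latent status.
- Both fathers' measured status carries the same measurement noise.

Under these assumptions, b and the literacy-to-latent link cancel in the ratio, leaving r. `simulate` and its parameters are hypothetical.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def assortment_ratio(bride_literate, father_status, father_in_law_status):
    """Ratio estimate of r: corr(literacy, father-in-law) / corr(literacy, own father)."""
    return (pearson(bride_literate, father_in_law_status)
            / pearson(bride_literate, father_status))

def simulate(n, r, b, seed=0):
    """Hypothetical data under an assumed latent-status model (illustration only)."""
    rng = random.Random(seed)
    lit, fa, fil = [], [], []
    for _ in range(n):
        bride = rng.gauss(0.0, 1.0)
        # Groom's latent status correlates r with the bride's.
        groom = r * bride + math.sqrt(1 - r * r) * rng.gauss(0.0, 1.0)
        # Each father's latent status correlates b with his own child's.
        father = b * bride + math.sqrt(1 - b * b) * rng.gauss(0.0, 1.0)
        father_in_law = b * groom + math.sqrt(1 - b * b) * rng.gauss(0.0, 1.0)
        # Bride signs the register if her latent status plus noise exceeds 0.
        lit.append(1.0 if bride + rng.gauss(0.0, 1.0) > 0 else 0.0)
        # Both fathers' status is measured with the same amount of noise.
        fa.append(father + rng.gauss(0.0, 0.5))
        fil.append(father_in_law + rng.gauss(0.0, 0.5))
    return lit, fa, fil

lit, fa, fil = simulate(100_000, r=0.6, b=0.7, seed=1)
r_hat = assortment_ratio(lit, fa, fil)
print(f"estimated r = {r_hat:.3f} (true r = 0.6)")
```

If the two fathers' status measures differed in reliability, the ratio would no longer cancel cleanly. The sketch holds that constant by design.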

The FOE Database
The materials for this study are a database of 422,374 individuals, linked to their parents and spouses, who lived, or had ancestors living, in England and Wales 1600-2022 (the Families of England (FOE) database). This database has two components. The majority of the data is from a set of lineages of persons with rare surnames created by members of the Guild of One-Name Studies.[4] These lineages incorporate everyone with a rare surname of interest, wherever they reside, as well as spelling variants of the surname. Thus the Mitchelmore lineage, for example, incorporates the surnames Michelmore, Mitchelmore, Mitchamore, Mitchmore, Mouchemore, Muchamore, and Muchmore.[5] Similarly, the Auty lineage encompasses Auty, Autey, Awty, Otty, and Ottey.[6] Where we had access only to the published lineages, these typically did not contain details of any living holders of the surname. In these cases we added that information ourselves from public records of births, marriages and addresses. Lineages were chosen for inclusion based on their completeness, and on either the public posting of the lineages or their creators' willingness to share the data with us for inclusion in the study.

In addition to these existing lineages, we created a set of lineages for rare surnames that were high wealth for people in the lineage born 1780-1850. The purpose was to estimate social mobility rates better by having more variance in social outcomes in the earlier generations. For the estimates based on the residence addresses in the electoral rolls and on …

Table S3 outlines the sources of the data, and their distribution across time and between general and elite lineages. Table S4 shows the numbers of relationship pairs in the data, again by lineage type. The reason for the extraordinarily large numbers of pairs of 2nd-4th cousins in Table S4 can be seen in Figure S8, which shows an illustrative fragment of the genealogy database.
Average completed family size in England in the nineteenth century was around 3 adult children. This varied enormously across families, however, and the bulk of adults in each generation came from larger than average families, so that average sibship size then was 6. Such demographic processes ensured large numbers of cousins, 2nd cousins, etc. for adults in each subsequent generation.

Table S5 shows the social outcomes that are available by gender and lineage type. The numbers of any social outcome are much less than the numbers of people in the database because: (1) some outcomes are available for men only, (2) before 1914 a significant number of children died before reaching age 21, and (3) for births before 1780 and after 1920 many social outcomes are not observable. But, as …

[Figure S9: wealth relative to average, by year of death]
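The gap between average family size (around 3) and average sibship size (6) follows from weighting families by their number of children. An average taken over children is E[N²]/E[N] rather than E[N]. The small sketch below uses a hypothetical distribution of completed family sizes, chosen so that the two averages come out at 3 and 6. It is not the observed nineteenth-century distribution.

```python
def mean_family_size(dist):
    """Mean completed family size, averaging over families.

    dist maps number of adult children -> share of families with that many.
    """
    return sum(k * p for k, p in dist.items())

def mean_sibship_size(dist):
    """Mean sibship size, averaging over children: E[N^2] / E[N].

    A family with k children contributes k children, each in a sibship of
    size k (counting themselves), so large families are over-represented.
    """
    return sum(k * k * p for k, p in dist.items()) / mean_family_size(dist)

# Hypothetical distribution: 40% childless, 50% with 4 children, 10% with 10.
dist = {0: 0.4, 4: 0.5, 10: 0.1}
print(mean_family_size(dist), mean_sibship_size(dist))  # 3.0 6.0
```

Childless families drop out of the child-weighted average entirely, which is one reason the two means can differ so much.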
How representative of the general population of England and Wales are the lineages in the FOE database? One test is to compare average wealth at death in the general lineages 1858-1996 with average wealth of all deaths in England and Wales in the same years. Figure S9 shows this ratio by decade, 1860-1990. For the death decades 1920 and later, and thus the birth decades 1860 and later, the Families of England general lineages appear representative of the general population in terms of wealth, and thus also in terms of other aspects of social status. For deaths before 1920, and thus births before 1860, average wealth in the general lineages is typically 50% higher than for the general population. The most likely explanation is that the processes that generated the rarer surnames used in these lineages were associated with somewhat higher status families in earlier centuries. Slow but steady social mobility then brought these surnames to average social status by the time of births in the 1860s and later. It is also possible that lower status holders of the lineage surnames are less likely to appear in the records with a surname recognizable as belonging to these lineages.

Will the modestly elite status of the FOE general lineages for births 1780-1859 bias the estimates of social mobility rates in this period? If we draw a sample from the population in which the variance of outcomes differs from that of the population as a whole, we could get a biased estimate of the heritability of traits, h². If the variance of the sample outcomes is higher, we will also get a more precise estimate of persistence. But whatever sample of the population we start from, estimates of the level of persistence should be unaffected.

… here, in terms of geography and house values, versus the general population.
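The claim that persistence estimates do not depend on the starting sample can be illustrated under two simplifying assumptions. First, persistence is the slope b of a linear regression of child on parent latent status. Second, the sample is selected on the parent's status only. Selecting on the regressor raises its variance but leaves the OLS slope unbiased and makes it more precise. Selecting on the child's outcome would not have this property. The sketch and its selection rule are hypothetical, not the paper's estimation procedure.

```python
import random
from statistics import pvariance

def ols_slope(x, y):
    """OLS slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sum((a - mx) ** 2 for a in x)

def simulate_families(n, b, seed=0):
    """Standardized parent and child latent status with persistence b (hypothetical)."""
    rng = random.Random(seed)
    parents, children = [], []
    for _ in range(n):
        p = rng.gauss(0.0, 1.0)
        c = b * p + (1 - b * b) ** 0.5 * rng.gauss(0.0, 1.0)
        parents.append(p)
        children.append(c)
    return parents, children

def oversample_high_status(parents, children, seed=0):
    """Keep every family with parent status above 1, but only 20% of the others.

    Selection depends on the parent's status alone, never on the child's outcome.
    """
    rng = random.Random(seed)
    keep = [i for i, p in enumerate(parents) if p > 1.0 or rng.random() < 0.2]
    return [parents[i] for i in keep], [children[i] for i in keep]

parents, children = simulate_families(50_000, b=0.7, seed=2)
sel_p, sel_c = oversample_high_status(parents, children, seed=3)
print("parent variance:", pvariance(parents), "vs", pvariance(sel_p))
print("slope:", ols_slope(parents, children), "vs", ols_slope(sel_p, sel_c))
```

The parent-child correlation in the selected sample does change, which is consistent with the text's caveat about h².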
The FOE dataset for individuals observed 1999-2022 has a geographic distribution that largely echoes that of the general population, except that it is less concentrated in London. But the FOE dataset is composed, by design, of long-established English family lineages, and London is the area of England where more recent immigrants make up the largest share of the population. So the frequency of FOE families is expected to be lower in London than for the general population.
The estimated house values in the FOE database, adjusted to 2017 prices, are again close to the regional averages observed nationally in sales in 2017. The only region with a substantial difference is London, where FOE house values are higher. But, as noted, London is the city with the largest share of population from recent immigration to England. There is thus no reason to expect FOE house values there to match those of the London population as a whole. Overall, house values in the FOE database are 6.6% higher than for England and Wales as a whole in 2017, a modest difference.
The paper utilizes seven social outcome variables. These were constructed as follows.
1. Literacy. This is inferred for marriages 1754-1889 from whether the bride or groom signed the marriage register. Signature records extend beyond 1889, but by then signature rates were very high, so the measure carries little information about status. Only a subset of county record offices in England have made images of marriage registers available online or through Ancestry.com. Literacy is therefore available only for a subset of men and women marrying in these years.

… large set of occupation description strings were first assigned to one of 442 categories. Using 1.6 million marriage records 1837-1939 that give occupations for grooms, their fathers, and their fathers-in-law, an occupational status score of 0-100 was derived using Goodman's association methodology. As can be seen in Table 2, the occupational status index derived in this way shows strong parent-child correlations in both 1780-1859 and 1860-1919.

4. House Value. This is estimated from the addresses recorded in the electoral roll for people alive 1999 and later. Since the roll released in 2002 and later records only those who consented to the release of their address, there are potential issues of selectivity. However, as Table S6 above shows, the average house value recorded using the electoral register addresses closely approximates that of England as a whole. There is thus no sign that higher status individuals are less likely to permit publication of their addresses in the electoral roll.
The Land Registry shows house prices for all property sales 1995 and later. From this we construct an average dwelling value, normed to 2017 prices, for each postcode. British postcodes on average cover only 40 houses, so this gives a good estimate of local house values for the person. Where no sale was recorded for a postcode, we use the Council Tax Band to estimate the property value. Empirically, the log of average house values produces higher correlations between relatives, so we use this measure.
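The sketch below shows the house-value construction under three assumptions:

- Sales arrive as (postcode, year, price) tuples.
- A year-by-year price index is available for norming to 2017 prices.
- Each Council Tax Band maps to a single imputed value.

All inputs, names, postcodes and band values here are hypothetical. The text does not specify the deflator or the band-to-value mapping.

```python
import math
from collections import defaultdict

def postcode_log_values(sales, price_index, base_year=2017):
    """Log of the average sale price per postcode, in base-year prices.

    sales: iterable of (postcode, year, price); price_index: year -> index level.
    Both are hypothetical stand-ins for Land Registry data and a deflator.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for postcode, year, price in sales:
        # Re-express each sale in base-year prices before averaging.
        totals[postcode] += price * price_index[base_year] / price_index[year]
        counts[postcode] += 1
    # Average first, then take the log (the log of average house values).
    return {pc: math.log(totals[pc] / counts[pc]) for pc in totals}

def person_log_value(postcode, tax_band, pc_values, band_values):
    """Postcode average if any sale exists, else a value imputed from the Council Tax Band."""
    if postcode in pc_values:
        return pc_values[postcode]
    return math.log(band_values[tax_band])

# Hypothetical inputs: two sales in one postcode, index with 2017 = 100.
sales = [("AB1 2CD", 2017, 200_000.0), ("AB1 2CD", 2007, 100_000.0)]
values = postcode_log_values(sales, {2007: 50.0, 2017: 100.0})
print(person_log_value("AB1 2CD", "D", values, {"D": 150_000.0}))  # from sales
print(person_log_value("ZZ9 9ZZ", "D", values, {"D": 150_000.0}))  # band fallback
```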
We employ this measure only for men and women aged 24 or above who live at a different address from their parents. Property values differ substantially by region in England and Wales, as Table S6 shows. London house values, for example, are more than 4 times those in the North of England. Since people show strong persistence by region across generations, we normalize all house values to remove regional effects, dividing England and Wales into 6 regions for this purpose.

5. Index of Multiple Deprivation. This index ranks Lower Layer Super Output Areas (LSOAs), each typically with a population of around 1,500, by a weighted average of measures of social deprivation. The index is available by postcode.[11] The 2019 index, used here, is a weighted average of measures of: Income (22.5%), Employment (22.5%), Education (13.5%), Health (13.5%), Crime (9.3%), Barriers to Housing and Services (9.3%), and Living Environment (9.3%). Since the index for Wales is constructed in a non-comparable way, we set the Welsh level at the average of the English IMD for the North of England.
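The sketch below shows one possible form of the age and address restriction and the regional normalization of house values. Subtracting each region's mean log value is an assumption, since the text says only that regional effects are removed across 6 regions. The region labels, ids and values are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def eligible(age, address, parent_addresses):
    """Aged 24 or above and not living at a parent's address."""
    return age >= 24 and address not in parent_addresses

def regionally_normed(records):
    """Subtract the regional mean from each log house value.

    records: list of (person_id, region, log_value). Demeaning by region is an
    assumed normalization; the text says only that regional effects are removed.
    """
    by_region = defaultdict(list)
    for _, region, value in records:
        by_region[region].append(value)
    region_mean = {region: mean(values) for region, values in by_region.items()}
    return {pid: value - region_mean[region] for pid, region, value in records}

records = [(1, "London", 13.0), (2, "London", 12.0), (3, "North", 11.0), (4, "North", 12.0)]
print(regionally_normed(records))  # {1: 0.5, 2: -0.5, 3: -0.5, 4: 0.5}
```

Whether values were also rescaled by a regional standard deviation is not stated. This sketch only removes regional differences in level.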
6. Company Director. Companies House in the UK maintains a register of the directors of limited companies.[12] Limited companies include commercial enterprises, but also management companies for housing associations, as well as medical and legal practices. The register also includes people who subsequently resigned the position. We classify anyone alive 2002 and later, and aged 24 and above in 2022, as either a company director or not.

7. Status Modern. This is an index of social status which combines the three previous measures, using Principal Component Analysis, into an omnibus modern social status index. The correlation of normed log house value with the index of multiple deprivation is 0.53, and with being a company director 0.24. The correlation between the index of multiple deprivation and being a company director is 0.14. The correlations of the Status Modern index with its three components are: normed log house value, 0.86; index of multiple deprivation, 0.82; company director, 0.49.
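The Status Modern index can be sketched as the first principal component of the three standardized measures. The sketch uses NumPy's `numpy.linalg.eigh` on their correlation matrix. The data are simulated from one hypothetical latent status, not drawn from the FOE database. The text does not say how missing measures were handled or how the index was scaled, so those choices here are assumptions.

```python
import numpy as np

def pca_status_index(house, imd, director):
    """First principal component of three standardized status measures.

    Rows are people with all three measures observed (an assumption; the
    handling of missing values is not described in the text).
    """
    X = np.column_stack([house, imd, director]).astype(float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    corr = np.corrcoef(Z, rowvar=False)
    _, eigvecs = np.linalg.eigh(corr)  # eigenvalues in ascending order
    w = eigvecs[:, -1]                 # loadings of the largest component
    if w.sum() < 0:                    # eigenvector sign is arbitrary
        w = -w
    score = Z @ w
    return score / score.std(), w

# Hypothetical data: one latent status drives all three measures.
rng = np.random.default_rng(0)
latent = rng.normal(size=5_000)
house = latent + rng.normal(size=5_000)
imd = latent + rng.normal(size=5_000)
director = (latent + rng.normal(scale=2.0, size=5_000) > 1.0).astype(float)

index, loadings = pca_status_index(house, imd, director)
for name, col in [("house", house), ("imd", imd), ("director", director)]:
    print(name, round(float(np.corrcoef(index, col)[0, 1]), 2))
```

The printed correlations of the index with its components are the sketch's analogue of the 0.86, 0.82 and 0.49 reported above. With simulated data they will not reproduce those values.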

Replication Files
Included in the supporting information are two Excel files with the data required to replicate the results in Figures 1-4 and S1-S7, and in Tables 2-4 and S2. The file Clark-Replication Data contains … The file Clark-FOE Raw Data contains all the persons in the FOE database born 1600 and later, with the links to their father and mother, and with the 7 social outcomes used in this study. From this file the 11 relationship combinations and their status outcomes can be constructed.
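Because the raw data link each person to a father and mother, relationship pairs can be derived by walking those links. The sketch below covers four of the 11 relationship types. It uses a hypothetical in-memory dict rather than the actual column layout of the Excel file, which is not described here. Half-siblings are ignored, which may differ from the paper's definitions.

```python
from collections import defaultdict
from itertools import combinations

def relationship_pairs(parents):
    """Derive some relative pairs from parent links.

    parents: hypothetical dict person_id -> (father_id, mother_id), None if unknown.
    Returns a dict mapping relationship name -> set of id pairs.
    """
    pairs = {name: set() for name in
             ("parent-child", "siblings", "grandparent-grandchild", "first cousins")}
    children = defaultdict(set)   # parent id -> ids of that parent's children
    by_couple = defaultdict(set)  # (father, mother) -> their children
    for child, (father, mother) in parents.items():
        for p in (father, mother):
            if p is not None:
                pairs["parent-child"].add((p, child))
                children[p].add(child)
        if father is not None and mother is not None:
            by_couple[(father, mother)].add(child)

    # Full siblings share both known parents (half-siblings are ignored here).
    for kids in by_couple.values():
        for a, b in combinations(sorted(kids), 2):
            pairs["siblings"].add((a, b))

    # Grandparent-grandchild: a known parent of a parent.
    for parent, kids in children.items():
        for grandparent in parents.get(parent, (None, None)):
            if grandparent is not None:
                for kid in kids:
                    pairs["grandparent-grandchild"].add((grandparent, kid))

    # First cousins: one child of each member of a sibling pair.
    for a, b in pairs["siblings"]:
        for ca in children.get(a, ()):
            for cb in children.get(b, ()):
                pairs["first cousins"].add(tuple(sorted((ca, cb))))
    return pairs

# Tiny hypothetical genealogy: A and B are siblings; a1 and b1 are their children.
family = {"A": ("GF", "GM"), "B": ("GF", "GM"), "a1": ("A", "SA"), "b1": ("SB", "B")}
for name, found in relationship_pairs(family).items():
    print(name, sorted(found))
```

More distant relations, such as 2nd to 4th cousins or the "once removed" pairs, can be built the same way by chaining these steps. The pair counts then grow quickly, as Table S4 illustrates.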