Estimating eviction prevalence across the United States

Significance
Several negative effects of forced displacement have been well documented, yet we lack reliable measurement of eviction risk at the national level. This prevents accurate estimation of the scope and geography of the problem as well as evaluation of policies to reduce housing loss. We construct a nationwide database of eviction filings in the United States. Doing so reveals that 2.7 million households, on average, are threatened with eviction each year; that the highest eviction filing rates are not concentrated solely in high-cost urban areas; and that state-level housing policies are strongly associated with county-level eviction filing risk. These data facilitate an expanded research agenda on the causes and consequences of eviction lawsuits in the United States.

requests between June 2016 and January 2019. In areas with available data, many of the requests were fulfilled near the beginning of this timeframe, resulting in greater representation of these records in 2016 and earlier years. We refer to these data as court-issued individual records.
We were not able to collect data via bulk records requests from all states for two primary reasons. First, not all states maintain statewide electronic case management systems that compile records across courts. In some states, these systems are just beginning to be implemented (e.g., the Maryland Electronic Courts [MDEC] system) and do not include digitized historical records. Second, some states have policies prohibiting bulk collection of court records by third parties, including researchers. For example, Kansas Supreme Court Rule 196 prohibits bulk records requests in District Courts (2).
Second, we also requested annual, aggregated counts of eviction filings at the county level from state and county courts. We began making these requests in October 2017 and continued through April 2020. While lacking in case-specific information, these aggregated filing counts served two purposes. First, we used these data to validate the case counts aggregated from individual records. Second, these data provided information about the filing volume in areas with limited or no individual records data. We requested county-level filings because this is the smallest areal unit in which states consistently tabulate and release these data. We were able to collect at least one year of aggregated filing counts from 2,204 counties across 46 states (Table S2). Throughout the text, we refer to these data as court-issued aggregated data.
Third, we purchased proprietary individual records data from LexisNexis Risk Solutions (LexisNexis). LexisNexis conducts automated and in-person bulk collection of records from local civil and housing courts in most states. Automated collection typically involves bulk transfer of electronic records from the courts directly to LexisNexis. In-person collection, on the other hand, requires the manual entry of case information from on-site records in courthouses, which is much more labor intensive and time consuming. Some courts have restrictions on bulk requests of paper files or store records off-site, which increase the difficulty of in-person data collection. Due to the sheer volume of eviction cases filed annually and lack of available electronic bulk records collection across courts, it is not realistic to expect that the proprietary data will include the universe of eviction filings. Even so, the proprietary data provided an important signal of eviction case activity in areas where we were not able to obtain records directly from the courts, such as states that do not have unified case management systems or prohibit bulk requests of electronic court records.
To be clear then, both the court-issued and proprietary data were built on case records held by local and state courts. We were only able to request electronic individual records or aggregated counts of case filings, while LexisNexis collects records from both electronic and paper case management systems. In county-years in which we were able to obtain both court-issued and proprietary records, there was significant overlap in case representation across these data sources, as expected.
Although we did not place time restrictions on our data requests of the courts or LexisNexis, we restricted the analytic sample for this project to residential eviction cases filed between 2000 and 2018. Many court systems lacked consistent digitization of case records before this period (Table S1), which limited both the representativeness of our data and the ability to validate filing counts across data sources before this period.

Data Cleaning
Court records are created for administrative rather than research purposes, and the format, quantity, and quality of case information vary substantially across sources. We cleaned and standardized all individual court records, regardless of whether we obtained them directly from the courts or LexisNexis. We received records electronically, usually in the form of text-delimited files. The files were structured (i.e., the information was separated into labeled fields as opposed to unformatted text or document images) but did not share a standardized format across sources. Information was recorded at different levels across files. Some files had all case information contained within one observation. Other data files contained separate observations for each party (plaintiff or defendant) on a case. Still others had multiple observations for separate actions (e.g., filings, judgments) associated with a case. In the latter two formats, then, some cases were represented by multiple observations. Unique cases were identified by the county name (or numeric county id), court id, and case number.
The available data elements for cases varied across sources as well. Most files included a basic set of information: filing date, landlord-plaintiff name(s), tenant-defendant name(s), defendant or property address, judgments entered on the case, corresponding judgment dates, and monetary amounts defendants were ordered to pay the plaintiff (if applicable). Some data files included additional pieces of information, including whether a writ was issued for the sheriff's office to remove tenants from the property (Virginia), names of judges assigned to the case (Philadelphia County), or itemized monetary judgment amounts (Pennsylvania). Given our goal of creating comparable eviction metrics across states, however, we were only able to fully utilize the data elements that were consistently available in most files for this project.
We can gain important indicators of eviction prevalence from these records. At the most foundational level, we can count how many residential eviction cases were filed within a particular place in a selected time period. Case-level address information allows aggregation of filings in smaller areal units, including census tracts, than are available in annual reports of case filings released by courts. We can then use the combination of tenant names and addresses in these records to move beyond total filings to distinguish how many unique households have experienced an eviction filing over time. This constitutes a separate measure from the overall number of filings as some households receive multiple eviction filings at the same address. The total number of filings represents the burden of eviction on the legal system within a jurisdiction, while the number of households receiving at least one eviction filing is a more direct measure of the burden of eviction on renters. The difference between these numbers can also impart important information on how landlords use the legal system to manage rental properties and variation in landlord behavior across jurisdictions (3,4).
These records do not allow us to measure how many households were displaced following the case filing or how often households were forcibly removed from rental properties. Courts do not track whether tenants remain at disputed properties throughout or following an eviction case. Default judgments may be issued against tenants who have already vacated the property. Landlords and tenants may resolve the dispute without the tenant leaving the property even after a judgment in favor of the landlord has been returned on the case. Most courts do not track whether cases result in the issuance of a writ of restitution for the disputed property, and even those that do may not reliably update records when the writ is cancelled or unexecuted. While many court records contain some form of judgment information, the quality and type of information vary significantly across data sources, making it difficult to create comparative metrics. For this reason, we cannot create a nationally comprehensive estimate of eviction judgments or households displaced due to eviction. We discuss these limitations in greater detail in Section 7.

Case Location
We needed to assign each eviction case to a precise geographic location to aggregate filings into standardized areal units (e.g., Census tract). For the data files that included a designated field for the address of the disputed property, we used that as the case address (5). For the remaining data files we used the defendant address as the case address under the assumption that most cases were initiated while the tenant resided at the disputed property (6). We cleaned and geocoded addresses to obtain latitude and longitude coordinates.

Address Cleaning
We cleaned the case addresses prior to geocoding to increase the likelihood they would be matched to a known street address. We did this in several steps:
1. We capitalized all alphabetic characters and removed excess white space in all fields containing address information.
2. Some data files included all address information in a single field, while others contained separate fields for street address, city name, state, or zip code. In some cases, data files with separate address fields were incorrectly or incompletely split. We ensured that addresses were accurately split into five distinct fields:
• street address
• unit designation
• city name
• state abbreviation
• 5-digit zip code
We used regular expressions to separate city names, state abbreviations, and 5- or 9-digit zip codes. We removed punctuation (with the exception of dashes and apostrophes) and other non-alphabetic characters from city names. We standardized state abbreviations to valid two-letter strings representing US states. We verified that the zip code field contained only numbers and shortened 9-digit zip codes to five digits. We left partial 3- or 4-digit zip codes unchanged.
3. We removed business, individual, or care-of (C/O) names from street addresses by searching for and removing strings of alphabetical characters that appeared before street numbers.
4. We removed post office box information from street addresses to prevent erroneous matches to street names during geocoding by searching for and removing common variations of this information (e.g., "PO BOX", "POB", "PMB").
5. We separated apartment designations from street addresses by searching for common unit signifiers (e.g., "APT", "UNIT", "SUITE", "NO", "#") and transferring these elements into a separate field.
6. We corrected any cases where street address and apartment designations had been inverted in the original data by searching for cases in which the street address field did not begin with numeric characters (i.e., the street number), but the apartment (or second street address) field did.
7. We standardized street type abbreviations (e.g., "AVE" to "AVENUE", "ST" to "STREET", "DR" to "DRIVE") and numeric designations (e.g., "1st" to "First", "2nd" to "Second", "ONE" to "1").
8. We cleaned city names, state abbreviations, and 5-digit zip codes via automated comparison to a standardized listing of city names and state abbreviations associated with US zip codes (7). We constructed this standardized comparison listing using two sources: Zip Codes To Go and US Census Zip Code Tabulation Areas. Zip Codes To Go (www.zipcodestogo.com) maintains an up-to-date listing of all zip codes used by the United States Postal Service (USPS), along with the primary city name and state abbreviation associated with each zip code. The US Census provides Zip Code Tabulation Areas (ZCTAs) that roughly align with USPS zip codes (8). We combined the listings of city names, state abbreviations, and zip codes from both sources into one master file of valid city name, state abbreviation, and zip code combinations.
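A few of the cleaning steps above can be sketched in Python. This is an illustrative sketch only, not the code used for the project: the function names `clean_street` and `clean_zip` and the regular expressions are assumptions, and the patterns cover only the example tokens named in the steps (a street whose name is itself a unit signifier would be split incorrectly).

```python
import re

# Hypothetical patterns covering only the example tokens listed in the steps.
PO_BOX = re.compile(r"\b(?:P\.?\s*O\.?\s*BOX|POB|PMB)\s*\w*", re.IGNORECASE)
UNIT = re.compile(r"\s*(?:\b(?:APT|UNIT|SUITE|NO)\b\.?|#)\s*([A-Z0-9-]+)\s*$")
ZIP9 = re.compile(r"^(\d{5})-?\d{4}$")


def clean_street(raw):
    """Return (street, unit) from a raw street-address string."""
    # Step 1: capitalize and collapse excess white space.
    text = re.sub(r"\s+", " ", raw.upper()).strip()
    # Step 4: drop post office box information.
    text = PO_BOX.sub("", text).strip(" ,")
    # Step 5: move a trailing unit designation into its own field.
    unit = ""
    match = UNIT.search(text)
    if match:
        unit = match.group(1)
        text = text[:match.start()].strip(" ,")
    return text, unit


def clean_zip(raw):
    """Shorten 9-digit zip codes to five digits; leave other values unchanged."""
    value = raw.strip()
    match = ZIP9.match(value)
    return match.group(1) if match else value
```

For example, `clean_street("  123 main st   apt 2b ")` separates the street from the unit, and `clean_zip("2322")` leaves a partial zip code untouched, as described in step 2.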
We then compared each city name, state abbreviation, and zip code in the eviction records to the master file. We used the matchit program in Stata, which performs fuzzy string comparison, to find the best standardized match for each record using bigram vectoral decomposition distance. Bigram vectoral decomposition breaks each string into two-letter segments and then calculates a similarity score recording the proportion of these segments that align across two strings. The highest possible similarity score was 1, which indicated that a city name, state abbreviation, and zip code from a case address perfectly matched a city, state, and zip code listing in the master file. Most case addresses (95.6%) were matched with a similarity score of 1.
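To make the idea of a bigram similarity score concrete, the sketch below splits each string into two-character segments and compares the resulting count vectors with a cosine-style ratio. This is a simplified stand-in written for illustration; it is an assumption that it approximates the scoring used by matchit, and the function names are hypothetical.

```python
from collections import Counter
from math import sqrt


def bigrams(text):
    """Count the overlapping two-character segments of a string."""
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def bigram_similarity(a, b):
    """Cosine-style similarity of bigram count vectors, between 0 and 1.

    Assumption: a stand-in for the package's score, not its exact formula.
    """
    va, vb = bigrams(a), bigrams(b)
    if not va or not vb:
        return 0.0
    dot = sum(count * vb[gram] for gram, count in va.items())
    norm = sqrt(sum(c * c for c in va.values()) * sum(c * c for c in vb.values()))
    return dot / norm
```

Under this sketch, identical strings score 1, strings with no shared bigrams score 0, and a single dropped letter in a city name lowers the score only modestly.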
Some of the matches with similarity scores below 1 were corrected versions of the city name, state, and zip code in the case addresses. We marked matches with similarity scores below 1 as correct in the following situations:
• Similarity scores greater than 0.70 with a perfect match for the city name and zip code. In these cases the state abbreviation in the eviction record was incorrectly entered, as zip codes do not repeat across states.
• Similarity scores greater than 0.70 with a perfect match for the state abbreviation and zip code. These matches represented misspellings in city names (9).
• Similarity scores greater than 0.70 with a perfect zip code match. These matches represented minor misspellings in city names and incorrect state abbreviations.
• Similarity scores greater than 0.75 with a perfect match for city name, state abbreviation, and only one unmatched digit in the first four digits of the zip code. Zip codes are assigned in a consistent pattern across states such that one unmatched digit (excluding the final digit) is very likely to represent an error.
• Similarity scores greater than 0.75 with a perfect match for city name and state abbreviation, and with two consecutive digits of the zip code inverted. Again, if two digits are simply inverted, this is more likely to represent a data entry error than a distinct, unknown location.
If the matches between the case addresses and master listing were marked as correct, we updated the cleaned address fields in the case address with the standardized city names, state abbreviations, and zip codes in the master file. City names, state abbreviations, and zip codes were updated for 1.7% of case addresses using these criteria. Suggested matches that did not meet any of the above criteria were not used to update the case addresses (2.7% of addresses) (10).
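The acceptance criteria for matches with similarity scores below 1 can be written as a small decision function. The sketch below is one reading of the listed rules, with hypothetical names; it takes the similarity score as given and treats each record as a (city, state, zip) tuple.

```python
def zip_one_digit_off(z1, z2):
    """True if exactly one digit differs and it is among the first four."""
    if len(z1) != 5 or len(z2) != 5:
        return False
    diffs = [i for i in range(5) if z1[i] != z2[i]]
    return len(diffs) == 1 and diffs[0] < 4


def zip_adjacent_swap(z1, z2):
    """True if swapping two consecutive digits of z1 yields z2."""
    if len(z1) != 5 or len(z2) != 5 or z1 == z2:
        return False
    for i in range(4):
        if z1[:i] + z1[i + 1] + z1[i] + z1[i + 2:] == z2:
            return True
    return False


def accept_match(score, rec, cand):
    """Decide whether a suggested (city, state, zip) match is marked correct."""
    city_ok, state_ok, zip_ok = (r == c for r, c in zip(rec, cand))
    if score == 1:
        return True
    if score > 0.70 and zip_ok:
        # Covers the three 0.70 rules: wrong state, misspelled city, or both.
        return True
    if score > 0.75 and city_ok and state_ok:
        return zip_one_digit_off(rec[2], cand[2]) or zip_adjacent_swap(rec[2], cand[2])
    return False
```

Note that, as written in the list, the rule requiring only a perfect zip code match subsumes the two narrower 0.70 rules, so the sketch collapses them into one branch.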
These were the general rules used for cleaning addresses, but we checked corrections as we worked and developed more specific rules for addressing particular variations of these problems. More specific examples are discussed elsewhere (11).

Geocoding
We geocoded cleaned case addresses to obtain two additional pieces of information:
1. A standardized representation of the address that is consistent across records.
2. The latitude and longitude coordinates that can be used to assign cases to areal units.
The case addresses were geocoded using the 2016 Environmental Systems Research Institute (ESRI) USA Street Address Locator Files. In total, 93.7% of addresses could be geocoded at the point- or street-address level. The point-address level assigns coordinates based on the house or building address. The street-address level assigns coordinates based on a street number falling within a specific range on a particular street (e.g., "123 Main Street" would be assigned to the 100-200 block of Main Street). Both the point- and street-address levels represent locations with a high degree of geographic precision. For addresses geocoded at these levels, we updated the case address fields to reflect the standardized addresses returned during the geocode. For addresses geocoded at less precise levels, we did not update the case addresses. When this was the case, we continued to use the cleaned address as it was before geocoding (6.3% of cases). We retained all returned latitude and longitude coordinates for case addresses but created an indicator for addresses geocoded at the point- or street-address level so that we would be able to restrict the sample when aggregating in small areal units that require a high degree of precision for assignment, such as Census tracts.
1. The street address, state, and zip code matched, but one or more entries had a different city name, e.g., "123 MAIN STREET, HENRICO, VA 55555" and "123 MAIN STREET, RICHMOND, VA 55555." These were usually neighboring cities. We updated all entries to share the same city name, with preference given according to the following ordered criteria:
(a) The version of the address geocoded at the point or street level (the most precise levels of geography).
(b) The version of the address that could be geocoded at any level of geography.
(c) The version of the address that was verified as a valid city name, state, and zip code combination (Item 8 in Section 1.3.2).
(d) The version of the address that appeared most frequently in the records.
2. The street address, state, and city name matched, but one or more entries had a different zip code. As with variations in city names, this commonly occurred near zip code borders and these addresses were updated to share the same zip code. We used the same set of criteria listed above.
3. The apartment designation was appended to the street number, e.g., "123 MAIN STREET, APT 2B, ANYWHERE, US 55555" and "1232B MAIN STREET, ANYWHERE, US 55555". We identified these cases by searching for instances in which both the street and apartment numbers associated with one address appeared within the street number of another address. It was important to identify these cases because addresses with apartment designations appended to the street number often could not be geocoded. We updated concatenated versions with the correct separation of street number and unit designation.
4. Addresses shared the same street number, and their street names either (1) had a Levenshtein string edit distance of one character or less or (2) were such that one street name was contained within the other (e.g., "MAIN" and "MAINE"). We again updated addresses using the same ordered criteria specified for city names (12).
5. For addresses in DC, we extracted street directions (e.g., NE, NW, SE, SW), standardized the format to the two-letter abbreviation, and placed them uniformly at the end of the street address.
For addresses updated during this process, we also updated the standardized representation of the address and geographic coordinates obtained from the geocode.
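The street-name comparison in item 4 can be illustrated with a short sketch. The Levenshtein routine below is the standard dynamic-programming version; the wrapper `same_street` and its argument names are hypothetical, and the guard against empty names is an added assumption.

```python
def levenshtein(a, b):
    """Standard edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def same_street(num_a, name_a, num_b, name_b):
    """Flag two addresses as likely variants of the same street address."""
    if num_a != num_b or not name_a or not name_b:
        return False
    close = levenshtein(name_a, name_b) <= 1
    contained = name_a in name_b or name_b in name_a
    return close or contained
```

Pairs flagged this way would still need to be resolved to a single version using the ordered criteria given for city names.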

Multiple Addresses
Some data files without designated property addresses had cases that listed multiple defendant addresses. In some instances, different addresses were associated with different defendants. In others, multiple addresses were listed for the same defendant, which may have resulted from defendants leaving the disputed property while the case was ongoing. Less than 10% of cases across data files had multiple defendant addresses, indicating that most defendant addresses likely represented the property address.
If multiple addresses appeared on a case, we selected the address with the highest likelihood of representing the disputed property using the following criteria:
1. The address associated with the original case filing.
2. The address associated with a forcible detainer judgment or dismissal (if present). A forcible detainer judgment restores possession of the property to the landlord and is likely to reflect the location of the property at stake.
3. The modal case address.
4. The address associated with the earliest case action.
5. The in-state address (if in-state and out-of-state addresses are listed).
6. The address that could be geocoded at the point or street level.
7. If more than one candidate address remained for a case, we randomly selected an address.
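One way to read the criteria above is as a sequence of filters applied in order until a single address remains. The sketch below follows that reading, which is an assumption; the dictionary keys and the function name are hypothetical, and the input list must be non-empty.

```python
import random
from collections import Counter


def pick_address(candidates, rng=None):
    """Choose one address from a non-empty list of candidate records.

    Each candidate is a dict with hypothetical keys: address,
    on_original_filing, on_detainer_or_dismissal, action_date,
    in_state, precise_geocode.
    """
    rng = rng or random.Random(0)
    pool = list(candidates)
    counts = Counter(c["address"] for c in pool)
    top = max(counts.values())
    earliest = min(c["action_date"] for c in pool)
    rules = [
        lambda c: c["on_original_filing"],        # criterion 1
        lambda c: c["on_detainer_or_dismissal"],  # criterion 2
        lambda c: counts[c["address"]] == top,    # criterion 3 (modal address)
        lambda c: c["action_date"] == earliest,   # criterion 4
        lambda c: c["in_state"],                  # criterion 5
        lambda c: c["precise_geocode"],           # criterion 6
    ]
    for rule in rules:
        kept = [c for c in pool if rule(c)]
        if kept:  # a rule that matches nothing is skipped
            pool = kept
        if len({c["address"] for c in pool}) == 1:
            return pool[0]["address"]
    # Criterion 7: random choice among the remaining addresses.
    return rng.choice(sorted({c["address"] for c in pool}))
```

Passing a seeded `random.Random` keeps the final random selection reproducible across runs.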

Commercial Defendants
As we are studying housing eviction, we excluded cases with commercial defendants. The proprietary data already included an indicator for commercial defendants. To identify commercial defendants in the court-issued data, we developed a list of key words commonly associated with business entities and used regular expressions to identify defendant names that included these keywords (13).
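The keyword approach can be illustrated as follows. The keyword list here is a short hypothetical sample chosen for the example; it is not the list used for the project (13).

```python
import re

# Hypothetical sample of business-related keywords, for illustration only.
BUSINESS_TERMS = ["LLC", "INC", "CORP", "CORPORATION", "COMPANY", "LTD",
                  "RESTAURANT", "SALON"]
BUSINESS_RE = re.compile(r"\b(?:" + "|".join(BUSINESS_TERMS) + r")\b",
                         re.IGNORECASE)


def is_commercial(defendant_name):
    """Flag a defendant name that contains a business keyword as a whole word."""
    return bool(BUSINESS_RE.search(defendant_name))
```

Matching on word boundaries avoids flagging personal names that merely contain a keyword as a substring, such as "VINCENT" (which contains "INC").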

Unique Defendant Identifiers
Administrative records may contain multiple representations of the same defendant name due to data entry errors or variations in how the name was listed on multiple case filings. Variations in names for the same person across multiple records impede our ability to identify duplicate records or multiple filings against the same household. We need to be able to identify multiple filings against the same household to estimate the number of unique households threatened with eviction (Section 4). We defined a unique defendant as the same individual residing at the same address. To find instances in which variations of the same defendant name and address appeared across multiple records, we used the fastLink probabilistic record linkage program in R (14). The program calculated similarity scores by first name, last name, and street address using Jaro-Winkler string distance (15,16). We considered a full match to be a similarity score of at least 0.95 and a partial match to be a similarity score of at least 0.92 (for reference, an exact match would have a similarity score of 1). We marked observations as representing the same defendant if all three fields were full matches or if the last name was a partial match and the first name and street address were full matches.
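The decision rule stated above can be expressed compactly. The sketch below takes the three similarity scores as inputs rather than computing Jaro-Winkler distances, and it is not a reimplementation of fastLink; the function and constant names are hypothetical.

```python
FULL_MATCH = 0.95     # threshold for a full match
PARTIAL_MATCH = 0.92  # threshold for a partial match


def same_defendant(first_sim, last_sim, street_sim):
    """Apply the stated rule to three field-level similarity scores."""
    first_full = first_sim >= FULL_MATCH
    street_full = street_sim >= FULL_MATCH
    # A full match on last name also satisfies the partial threshold, so
    # "all three full" and "last partial, others full" reduce to one test.
    last_ok = last_sim >= PARTIAL_MATCH
    return first_full and street_full and last_ok
```

In effect, only the last name is allowed to fall into the partial-match band between 0.92 and 0.95.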
We created unique defendant identifiers by treating the fastLink matches as undirected edges of a network graph. In this case, each raw name and address observation was a node in the network, with matched combinations representing ties between nodes. Each connected component created by the ties represented a unique defendant. We used the pooh package in R (17) to calculate the equivalence classes using weak tie transitive closure for the components in the network. Each resulting component was assigned a sequential number, which then served as the unique defendant identifier.
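Treating matches as undirected edges and numbering the connected components can be sketched with a union-find structure. This is an illustrative Python equivalent of the idea, not the pooh-based R code; the function name and the input format (record indices and index pairs) are assumptions.

```python
def component_ids(n_records, matches):
    """Assign a sequential identifier to each connected component.

    n_records: number of raw name-and-address observations (nodes 0..n-1).
    matches: iterable of (i, j) index pairs marked as the same defendant.
    """
    parent = list(range(n_records))

    def find(x):
        # Follow parent links to the root, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in matches:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    ids, labels = {}, []
    for rec in range(n_records):
        root = find(rec)
        ids.setdefault(root, len(ids) + 1)
        labels.append(ids[root])
    return labels
```

Because components are transitive, two observations that were never directly matched still share an identifier if a chain of matches connects them.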

Household Identifiers
Landlords may not list all persons residing in the household as defendants on an eviction case. If multiple cases are filed against the same household over time, there may be variations in defendants listed on each filing. We created unique household identifiers by grouping together cases located at the same case address that shared at least one defendant name in common. If multiple cases were listed at the same address but shared no defendants in common, they were considered cases against separate households.
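The household grouping rule can be sketched as follows. The input format and function name are hypothetical, and the sketch assumes that sharing is transitive within an address: if case A shares a defendant with case B, and B shares one with C, all three are treated as one household.

```python
from collections import defaultdict


def household_ids(cases):
    """Map each case id to a household id.

    cases: list of (case_id, address, defendant_ids) tuples, where
    defendant_ids is a set of unique defendant identifiers.
    """
    by_address = defaultdict(list)
    for case_id, address, defendants in cases:
        by_address[address].append((case_id, set(defendants)))

    result, next_id = {}, 1
    for items in by_address.values():
        groups = []  # each entry: (set of defendants, list of case ids)
        for case_id, defendants in items:
            overlapping = [g for g in groups if g[0] & defendants]
            groups = [g for g in groups if not (g[0] & defendants)]
            merged_defs, merged_cases = set(defendants), [case_id]
            for defs, case_ids in overlapping:
                merged_defs |= defs
                merged_cases += case_ids
            groups.append((merged_defs, merged_cases))
        for _, case_ids in groups:
            for cid in case_ids:
                result[cid] = next_id
            next_id += 1
    return result
```

Cases at the same address with no defendants in common end up in separate groups, matching the rule that they are treated as separate households.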

Duplicate Records
Data entry errors can lead to the same case being entered into the case management system multiple times. Failing to identify and exclude these records can inflate estimates of the number of case filings. We searched for duplicate records by the unique defendant identifier (same defendant name and address), date, case action, and judgment amount (if a monetary judgment was issued on that date). It is important to note that duplicate records do not represent multiple, distinct filings against the same households, which we discuss in Section 1.3.10. These are records that duplicate the entry of a filing that has already been captured in the records (e.g., same filing date, same outcome). We marked duplicated records both within and across case numbers. The former represent duplicated actions on the same case while the latter represent duplicated cases within the court system. Duplicated cases often appeared to be the product of small variations in the case numbers. To avoid over-counting cases, we linked duplicated records (both within and across cases) under one updated case number. We also checked for duplicated records across the same defendant and case action date. Multiple, separate actions could appear on the same date for a case; we ensured that these records were linked under the same case number.
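A minimal sketch of the duplicate search, assuming each record carries the four fields named above under hypothetical keys. It maps every case number to the first case number seen with the same key; it does not resolve chains of relabeled cases, which a fuller implementation would need to handle.

```python
def link_duplicates(records):
    """Map each case number to the case number it should be linked under.

    records: list of dicts with hypothetical keys defendant_id, date,
    action, case_number, and (optionally) amount.
    """
    first_case = {}  # duplicate key -> first case number seen with it
    relabel = {}     # original case number -> updated case number
    for rec in records:
        key = (rec["defendant_id"], rec["date"], rec["action"], rec.get("amount"))
        canonical = first_case.setdefault(key, rec["case_number"])
        relabel.setdefault(rec["case_number"], canonical)
    return relabel
```

Two records with slightly different case numbers but the same defendant, date, action, and judgment amount would then be counted as a single case.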

Identifying Case Series
Case series refer to a set of cases filed against the same household. Identifying repeated case filings against tenants at the same residence challenges the assumption that the intention of an eviction filing is to remove a tenant from the property and illuminates other ways landlords use the legal system to manage rental properties (3,4). It also helps us distinguish the total volume of case filings from the number of households threatened with eviction, which are two distinct measures of eviction prevalence. We used the household identifiers (Section 1.3.8) to identify case series.

County-level Aggregation
We summed the number of eviction filings and unique households threatened with eviction for all US counties annually for 2000-2018. We defined our geography according to the 2010 Census (N=3,143 counties). Each case was assigned to the county and calendar year in which it was filed (N=59,717 total county-years). All data received directly from the courts included filing dates (or filing year, in the case of court-reported aggregated filing counts). When filing dates were not available in the proprietary data (approximately 32% of cases), we assigned the case to the calendar year of the earliest recorded action. For eviction filings, we simply summed all cases by county and calendar year. To measure unique households threatened with eviction, we again summed cases by county and calendar year but only counted the first (or only) appearance of a household in that county-year.
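The two county-year measures can be sketched as follows, assuming each case has already been reduced to a county identifier, a filing year, and a household identifier (the tuple layout and function name are hypothetical).

```python
from collections import Counter


def aggregate(cases):
    """Return (filings, households) counters keyed by (county, year).

    cases: iterable of (county_fips, year, household_id) tuples.
    """
    filings = Counter()
    households = Counter()
    seen = set()
    for county, year, household in cases:
        filings[(county, year)] += 1
        # Count a household only on its first appearance in the county-year.
        if (county, year, household) not in seen:
            seen.add((county, year, household))
            households[(county, year)] += 1
    return filings, households
```

A household with three filings in one county-year therefore adds three to the filing count but only one to the household count. The reported total of 59,717 county-years corresponds to 3,143 counties over the 19 years from 2000 to 2018.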

Covariates
To better understand how eviction prevalence varies across counties, we also collected a set of demographic and court characteristics for each county-year. The demographic covariates include the number of renting households, total population, population density, household density, percent urban population, percent renting households (of total households), share of population by race/ethnicity (Black/African American, Hispanic/Latino, Native American, Asian, Pacific Islander/Native Hawaiian, other race/ethnicity, two or more race/ethnicities, and non-Hispanic white), median income, median property value, median rent, mean rent burden (defined as the average percent of income spent on housing), share of population living at or below the poverty line, and unemployment rate. The court characteristics include the cost to file an eviction case, availability of a public access terminal in courts, and the number of courts that hear eviction cases in the county.
We estimated the number of renting households using linear interpolation of block group-level data from the 2000 and 2010 Censuses and 2016 ESRI Business Analyst. We obtained renting household data for the 2000 Census in 2010 Census geography from the IPUMS National Historical Geographic Information System (18). We extrapolated the linear interpolation for 2017-2018. We then aggregated block groups to create counts of renting households at the county level. We calculated household density by dividing the number of renting households in the county by the total land area reported in the 2010 Census.
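The interpolation and extrapolation of renting household counts can be sketched as a piecewise-linear function through the observed years. The sketch assumes that extrapolation beyond the last observed year continues the slope of the final segment; the function name and input format are hypothetical.

```python
def interpolate_counts(anchors, years):
    """Piecewise-linear estimates for the requested years.

    anchors: dict mapping observed year -> count (at least two years).
    Years after the last anchor reuse the slope of the final segment
    (an assumption about how the extrapolation was done).
    """
    known = sorted(anchors)
    estimates = {}
    for year in years:
        lo, hi = known[0], known[1]
        for left, right in zip(known, known[1:]):
            if year >= left:
                lo, hi = left, right
        slope = (anchors[hi] - anchors[lo]) / (hi - lo)
        estimates[year] = anchors[lo] + slope * (year - lo)
    return estimates
```

For a block group observed in 2000, 2010, and 2016, calling the function with `range(2000, 2019)` yields one estimate per year, which can then be summed across block groups within each county.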
We downloaded Census measures of percent county population residing in urban areas from the Missouri Census Data Center (19,20). These measures were only available for 2000 and 2010. We used linear interpolation to create percentages for the intervening years and extrapolated estimates to 2018.
We used measures of total population, share of population by race/ethnicity, median income, median property value, median rent, mean rent burden, and share of population living at or below the poverty line from the 2000 and 2010 Censuses and four waves of the 5-year American Community Survey (ACS): 2005-2009, 2008-2012, 2011-2015, and 2014-2018. We used 5-year rather than 1-year ACS estimates as many counties had fewer than 65,000 residents, thus not meeting the population threshold for inclusion in 1-year ACS data. We assigned yearly data as shown in Table S10. To calculate population density, we divided total population by the total land area reported in the 2010 Census. We obtained the annual unemployment rate from the US Bureau of Labor Statistics. All variables were measured at the county level.
Information on the availability of public access terminals and the number of courts that hear eviction cases in each county was collected from court information provided by the Public Record Research System (21). Public access terminals allow the public to view electronic case records in the courthouse (as opposed to having to request paper files from the court clerks) and may increase ease of access to records. The availability of public access terminals was used as a binary measure, with "1" indicating that courts had public access terminals. The data also included an indicator of whether a court heard eviction cases. To measure the number of courts that hear eviction cases, we summed this number within counties.
We calculated several additional data collection measures directly from the proprietary data to estimate how well these data should be expected to compare to the court-issued case counts. First, we calculated the percentage of records collected through an automated source (rather than in-person collection). We expected automated collection to increase data coverage as it should be less time and resource intensive than in-person collection, although challenges with consistent automated collection in some systems may also result in decreased coverage. We marked cases with records collected via automation through the collector id included in the data file. Second, we calculated the total number of data uploads that occurred in each county-year by counting the unique upload dates appearing on records; a greater number of uploads may represent increased effort to collect records from that area. Third, we might expect data coverage to differ by record collectors; some collectors may systematically collect a higher volume of cases than others. While we did not know the identity of the record collectors, we were able to identify the collector that contributed the majority of records to the data in a county-year and calculate the proportion of records contributed by that collector. Fourth, we calculated the percent of cases that were dismissed or appeared unresolved. Collection of dismissed cases may be a lower priority (or prevented by some states), rendering these cases more likely to be systematically missing. If records of the case filing were collected prior to the case being dismissed, it may appear in our data but without judgment information. Lack of dismissed (or unresolved) cases in the data may indicate an incomplete set of filings for that county-year.

Eviction Case Filing Fees
Landlords (or their agents) are required to pay an administrative fee when filing an eviction case in court. Some states set state-wide fee schedules, while others allow filing fees to be determined by counties (or the type of court). We manually collected data on the cost to file an eviction case. Research interns recorded the total minimum dollar amount required to file an eviction case in each county using official fee schedules or case information posted on the court website, or by contacting court clerks by phone or email (22). We were able to successfully collect current fee schedules in 2018 for 3,116 of 3,143 counties (99.1%). In 28 states, filing fees were consistent throughout the state (i.e., the same across all counties). Even in states without fixed state-wide fees, the cost to file a case tended to be very similar across counties (Table S6). We confirmed that the within-state variance was significantly less than the between-state variance by fitting a fixed-effects model at the state level. The fraction of variance attributable to the state-level intercepts was 0.953, with results from an F-test of the null hypothesis that all intercepts were indistinguishable from 0 significant at p < 0.001 (F(50, 3065) = 1023.82).
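The comparison of within-state and between-state variation can be illustrated with a one-way decomposition of sums of squares. The sketch below is a simplified analogue and may not reproduce the exact statistics reported by the fixed-effects routine used for the project; the function name and input format are hypothetical, and it assumes at least some within-state variation.

```python
def between_share_and_f(groups):
    """Between-group share of variation and one-way F statistic.

    groups: dict mapping state -> list of county filing fees.
    Returns (share, f_stat, (df_between, df_within)).
    """
    values = [v for fees in groups.values() for v in fees]
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    ss_between = 0.0
    ss_within = 0.0
    for fees in groups.values():
        mean = sum(fees) / len(fees)
        ss_between += len(fees) * (mean - grand_mean) ** 2
        ss_within += sum((v - mean) ** 2 for v in fees)
    share = ss_between / (ss_between + ss_within)
    f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
    return share, f_stat, (k - 1, n - k)
```

With 3,116 counties and 51 state-level groups (assuming the 50 states plus the District of Columbia), the degrees of freedom are 50 and 3,065, which matches the reported F(50, 3065).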

Regional and Urban-Rural Classification of Counties
To examine eviction prevalence across different areas, we classified counties by level of urbanization and region. We used the 2013 National Center for Health Statistics (NCHS) urban-rural classification scheme (23) to separate counties into six levels of urbanization:

Court-Issued Data
Although court-issued data were expected to represent all eviction cases that were processed through the courts' case management systems, staged implementation of new case management systems, changes in record-keeping, and inconsistent reporting by local courts can result in incomplete representation of eviction filings. After cleaning the individual records data and aggregating filings by county-year, we validated our court-issued county-year filing counts to ensure we had complete and consistent data coverage across counties. We did this in two steps. First, we compared filing counts in county-years in which we had both court-issued individual records and aggregated filing counts. We expected these filing counts to be very similar as we received both directly from the courts rather than through secondary data collection. We marked instances in which the filing counts differed by more than 10% and investigated discrepancies. Oregon was the only state with consistent, significant discrepancies. We determined that Oregon's court-issued individual records were not consistently inclusive of cases over time. For this reason, we excluded individual court-issued records from Oregon from the analyses. Second, we marked yearly fluctuations in the filing counts from the court-issued data within each county. We did this for both the court-issued individual records and aggregated filing counts. We flagged years in which the filing counts increased or decreased by 50% or more (25). We also flagged county-years that had court-issued filing counts that were less than 50% of filing counts observed in the proprietary data. We reviewed all instances that were flagged by one or more of these criteria and excluded filing counts that appeared incomplete. We identified the following issues:
Court-issued individual records
1. We excluded most county-years for Indiana due to inconsistent implementation of the state's electronic case management system.

2. South Carolina had 49 county-years from 11 counties excluded due to inconsistent data reporting.
Court-issued aggregated filing counts
1. We excluded county-years in Arkansas that also allowed eviction complaints to be filed as criminal "Failure to Vacate" cases (1). The data we received for Arkansas included only civil cases, resulting in an undercount of eviction filings in these county-years.
2. Filing counts for Georgia and Texas were reported at the court level, but not all courts consistently reported filings. We excluded county-years with one or more missing filing counts from courts that heard eviction cases in the previous year.
3. The filing counts we received for New York counties in 2017-2018 (excluding New York City) did not include cases heard in local or district courts, likely undercounting case filings. We excluded these county-years.
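The flagging rules used in the two validation steps can be sketched as follows. This is a minimal illustration, assuming counts are keyed by (county, year) and that the 10% comparison is taken relative to the aggregated count; the function and flag names are hypothetical, and a flag only marks a county-year for review rather than excluding it.

```python
def pct_change(new, old):
    """Relative change from old to new; None when old is zero."""
    if old == 0:
        return None
    return (new - old) / old

def flag_county_years(individual, aggregated, proprietary):
    """Each argument maps (county, year) -> filing count.

    Returns a dict mapping (county, year) -> set of flag names.
    """
    flags = {}

    def add(key, name):
        flags.setdefault(key, set()).add(name)

    # Step 1: the two court-issued sources differ by more than 10%.
    for key in individual.keys() & aggregated.keys():
        diff = pct_change(individual[key], aggregated[key])
        if diff is not None and abs(diff) > 0.10:
            add(key, "source_mismatch")

    # Step 2: within each court-issued series, flag swings of 50% or more
    # from the previous year, and counts below half the proprietary count.
    for name, series in (("individual", individual), ("aggregated", aggregated)):
        for (county, year), count in series.items():
            prev = series.get((county, year - 1))
            if prev is not None:
                change = pct_change(count, prev)
                if change is not None and abs(change) >= 0.50:
                    add((county, year), f"yoy_{name}")
            prop = proprietary.get((county, year))
            if prop is not None and count < 0.5 * prop:
                add((county, year), f"below_proprietary_{name}")

    return flags
```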
We excluded a small number of additional county-years of court-issued data (both individual records and aggregated filing counts) due to inconsistencies that did not appear to be part of a systematic or long-term pattern. These may represent more isolated data entry, data collection, or case management system errors. Table S3 summarizes availability and exclusion of court-issued data across states.
Our goal was to produce a comprehensive set of eviction filing estimates across all 3,143 US counties from 2000-2018 (N=59,717 county-years). After validating the court-issued data, we had 9,432 county-years (in 935 counties across 20 states) with reliable filing counts from individual records and 29,545 county-years (in 1,930 counties across 44 states) of reliable aggregated filing counts. We used the filing counts from the court-issued individual records in the available county-years (N=9,432). We used filing counts from the court-issued aggregated data in county-years with reliable aggregated data but missing or unreliable court-issued individual records (N=23,815). We gave preference to the individual records when both sources of court-issued data were available as we were able to review these data and remove commercial and duplicated records. Again, we removed only duplicated records that represented re-entries of cases already included in the data, not multiple, distinct filings against the same households. For the 2000-2018 period, we were able to observe filing counts from court-issued data for 33,247 county-years (55.7% of the total 59,717 county-years).
Filing counts from court-issued individual records and aggregated data were very similar in county-years in which we reliably observed both (N=5,730 county-years in 681 counties across 13 states). Filing counts reported in the court-issued aggregates were 1.7% higher, on average, than filing counts generated from the individual records when there were at least 20 case filings reported in the individual records. When including all county-years with reliable court-issued individual records and aggregated data, the mean difference was 2.4%, but this reflected the fact that very small differences of one or two records can result in large percentage differences when the total number of filings is small. There was some variation in this difference across states, making it difficult to make blanket assumptions about whether or how to adjust court-issued aggregated filing counts. In 2,971 of these county-years (51.8%), there was no difference between the individual records and aggregated filing counts. When there was a difference, the aggregated filing counts were more likely to be higher (N=2,244, 39.2% of county-years) than lower (N=515, 9.0% of county-years) relative to filing counts generated from the individual records, as expected. Due to the variation in the difference and the relatively small overall discrepancies, we decided not to make a blanket adjustment to the court-issued aggregated filing counts. Had we made a blanket adjustment (1.7% for court-issued aggregates with 20 or more filings and 3.5% for aggregates with fewer than 20 filings, the average difference between individual records and court-issued aggregates when fewer than 20 filings were reported in the individual records), it would have had very little impact on estimated filing counts. The national number of filings would have been reduced by 32,472, less than 0.9% of total filings, on average, annually.
As we cannot assess the full distribution of possible differences between court-issued individual records and aggregated filing counts across counties and states, we decided against adjusting the court-issued aggregates before incorporating them into the national eviction filing estimates or Bayesian model (as we wanted the Bayesian model to preserve county-level variation in case filings as much as possible). The rough calculations presented above suggest that the decision not to adjust court-issued aggregated filing counts should have made very little difference for estimates.

Court-Issued and Proprietary Data
For county-years without reliable court-issued data (N=26,470), we needed a novel method to estimate the number of eviction filings. On one hand, we had at least one year of validated, court-issued case filings in 2,272 counties in 49 states in the 2000-2018 period; on the other, we were missing court-issued data for at least one year in a similar number of areas (2,673 counties in 49 states). This motivated the development of a hierarchical model that allowed us to borrow information about filing volume within counties and states across years in which we were missing court-issued filing counts, when possible. We incorporated the proprietary data as a secondary measure of court-issued filing counts. We did not use proprietary filing counts as a direct measure of filing volume due to concerns about the inability to capture all case filings, as discussed in Section 1.2.
We observed at least one filing in the proprietary data in 75.3% of county-years (N=44,953 of 59,717 county-years). Of these 44,953 county-years, 25,715 (57.2%) had validated court-issued filing counts. The coverage of filings in the proprietary data relative to the court-issued filing counts varied considerably across these county-years (Table S11). In very few county-years were the court-issued and proprietary filing counts an exact match (4.3% of county-years with both validated court-issued data and at least one filing in the proprietary data). In 38.7% of these county-years, the proprietary data were a fairly reliable approximation of court-issued filing counts, showing a discrepancy of fewer than 10 cases or less than 5% of court-issued filing volume. In over half the county-years (52.5%), however, the proprietary data undercounted the court-issued filings by more than 10 cases or 5% of total filing volume. In only 4.6% of county-years did the proprietary data overcount filings by more than 10 cases or 5% of total volume relative to the court-issued data.
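The four coverage categories described above can be expressed as a small classifier. This is a sketch under stated assumptions: a county-year counts as "approximate" when the discrepancy is under 10 cases or under 5% of the court-issued count, and everything beyond both thresholds is an under- or overcount. The handling of values exactly at a threshold is not specified in the text, and the function name is hypothetical.

```python
def classify_coverage(court, proprietary, max_cases=10, max_share=0.05):
    """Compare a proprietary filing count with the court-issued count."""
    diff = proprietary - court
    if diff == 0:
        return "exact"
    # Small discrepancy in absolute terms or relative to court-issued volume.
    if abs(diff) < max_cases or abs(diff) < max_share * court:
        return "approximate"
    return "undercount" if diff < 0 else "overcount"
```

For example, a county-year with 1,000 court-issued filings and 960 proprietary filings falls in the approximate group (a gap of 40 cases is below 5% of 1,000), while 700 proprietary filings would be an undercount.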
On average, the proprietary data undercounted the court-issued filings by 29.8%. In all states, the average difference revealed the proprietary data to be an undercount of the court-issued filings, except for Louisiana. In Louisiana, the proprietary data overcounted court-issued filings in one county but, because we had very few county-years in Louisiana with both court-issued and proprietary data, the overall mean difference was positive. The undercounts of filings in the proprietary data relative to court-issued data were likely the result of the inability to collect the universe of cases due to limitations discussed in Section 1.2. Small discrepancies in filing counts (both over- and undercounts) between the court-issued and proprietary data could also be due to differences in the dating of some cases discussed in Section 1.3.11. For county-years in which proprietary data were unavailable, it is not possible to determine whether it was because there were no eviction filings in that county-year or cases were filed but not collected. Due to the variation in the differences in court-issued and proprietary filing counts across counties and states, we incorporated these counts as a secondary measurement of filing volume in the Bayesian model rather than applying a standard adjustment to the proprietary data across all areas. We discuss how these data were entered into the model in more detail in the next section.
Finally, we were unable to obtain court-issued or proprietary filing counts in 7,232 county-years (in 1,331 counties across 41 states). These county-years needed to be estimated directly from the Bayesian model. Fig. S14 shows the distribution of data sources underlying county-level estimates across years. Although the same number of counties are represented in national estimates of eviction filings and households threatened with eviction each year (N=3,143), the data sources underlying county-level estimates shift across years.
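The county-year counts reported in this section partition the full panel, which can be checked directly. All figures below are taken from the text; only the variable names are introduced here.

```python
n_counties = 3143
n_years = len(range(2000, 2019))          # 2000 through 2018
total = n_counties * n_years              # all county-years

individual = 9432                         # court-issued individual records
aggregated_only = 23815                   # court-issued aggregates used instead
court = individual + aggregated_only      # any validated court-issued count
no_court = total - court                  # must be estimated by the model

proprietary_any = 44953                   # at least one proprietary filing
proprietary_and_court = 25715             # of those, also court-validated
proprietary_only = proprietary_any - proprietary_and_court
neither = no_court - proprietary_only     # no court-issued or proprietary data

assert total == 59717
assert court == 33247
assert no_court == 26470
assert neither == 7232
print(round(100 * court / total, 1))      # share observed from court data
```

The last assertion reproduces the 7,232 county-years that had to be estimated directly from the Bayesian model.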

Model Introduction
We considered the proprietary data to be a secondary measure of court-issued eviction filing volume. That is, we considered the validated court-issued data to represent complete annual case filing volume and the proprietary data to be an imperfect measurement of it. As shown in Table S11, the proprietary data typically undercounted filing volume relative to court-issued data, although the variability of the measurement error differed considerably across states. In some states there was very good agreement between proprietary and court-issued filing counts, whereas in others the proprietary filing counts were consistently lower than court-issued filing counts or fluctuated more.
We developed a joint Bayesian model for the court-issued and proprietary filing counts in response to these challenges. The court-issued filing counts were modeled as a function of demographic and court characteristics with an expected association with eviction case prevalence. The proprietary filing counts were modeled as a function of the court-issued filing counts and data collection conditions. This joint model was then used to generate 25,000 imputed datasets for county-years in which we were missing court-issued filing counts. The multiple imputation of these missing data was generated from the posterior predictive distribution of the joint model. The posterior predictive distribution is the distribution of values for the unobserved court-issued counts given the observed court-issued and proprietary counts, accounting for both the true variability of filing counts and our uncertainty regarding the parameters in the Bayesian model. To perform inference tasks, such as estimating the filing counts for a larger geographical area, the inference task was performed on each imputed dataset and the results from these analyses were combined to provide point estimates as well as measures of uncertainty.
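The "analyze each imputed dataset, then combine" step can be sketched as follows. This is a minimal illustration, assuming each imputed dataset maps a county identifier to a filing count and that results are pooled with a mean and a central 95% interval; the text does not state which summaries were used, and all names are hypothetical.

```python
import statistics

def combine_imputations(imputed_datasets, task):
    """Run `task` on every imputed dataset and pool the results."""
    results = [task(dataset) for dataset in imputed_datasets]
    # 39 cut points at 2.5% steps: first is the 2.5th percentile,
    # last is the 97.5th percentile.
    cuts = statistics.quantiles(results, n=40, method="inclusive")
    return {
        "estimate": statistics.fmean(results),
        "lower": cuts[0],
        "upper": cuts[-1],
    }

def national_total(dataset):
    """Example inference task: total filings across all counties."""
    return sum(dataset.values())
```

Because the task is applied to whole imputed datasets, the pooled interval for a state or national total reflects the joint uncertainty across counties rather than a sum of county-level intervals.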

Data and Notation
Let $Y_i$ be a random variable with realization $y_i$ indicating the court-issued filing count for year $t_i \in \{2000, \ldots, 2018\}$ and county $c_i \in \{1, \ldots, n_c\}$, where $n_c$ is the total number of counties and $i$ indexes all county-year combinations. Further, let $Z_i$ be a random variable with realization $z_i$ indicating the county-year filing count from the proprietary data, $s_i$ be the county's state, and $r_i \in \{1, \ldots, 4\}$ its Census region (Northeast, Midwest, South, West).
In some counties all landlord-tenant cases were included in the reported filing count, whereas in others, only eviction-specific case types were reported. Let $l_i \in \{0, 1\}$ be 1 if all landlord-tenant cases were reported and 0 otherwise. As most landlord-tenant disputes are eviction cases, the discrepancies should be small, but as some court-issued data and all the proprietary data contain landlord-tenant cases, we included this term to adjust for these differences and ensure an apples-to-apples comparison in the joint model.
We included several measures of demographic and court characteristics with an expected association with eviction filing volume in the model. The following county-level covariates were considered:
For each covariate, we investigated potential model inclusion by calculating the residuals of the currently fit model at the posterior mode and then investigating the relationship between those residuals and the variable. If a relationship existed, we transformed the variable to render the relationship approximately linear and standardized the variable. The transformations were chosen to make the relationship between the covariate and $\log(y_i + 10)$ approximately linear, and a second transformation was added for three variables in which additional non-linearity in the relationship remained after the initial transformation. Table S12 shows all covariate transformations. Once a variable was included in the model, we investigated the posterior distribution of the parameter to ensure that it was substantively different from zero, and that its inclusion improved model fit. The resulting matrix was denoted by $X$. We similarly investigated interactions between covariates and potential variability of the covariate relationships with filing counts by state. We included random slope parameters for household density, median rent, and unemployment rate. We denote the transformed scaled covariates as $x^{hd}_i$ for household density, $x^{mr}_i$ for median rent, and $x^{u}_i$ for unemployment. We investigated using splines and tensors to model the non-linearity in covariate relationships; however, this added computational expense and did not improve model fit. Additionally, these methods would have greatly complicated utilization of random slopes to model any interactions between covariates and state.
As noted in Section 1.3.12, many of the covariates, including total population, share of population by race/ethnicity, median income, median property value, median rent, mean rent burden, and share of population living at or below the poverty line, were calculated from decennial census data in some years and 5-year ACS data in others (Table S10). Due to sampling differences, ACS data are measured with greater error than the decennial census (26). The presence of measurement error in the ACS could lead the model to understate the true covariate effects and produce less precise filing estimates than would be obtained from the true values of the covariates. We investigated including ACS margin of error estimates directly in the model; however, we determined that it added too much computational complexity to be practical.
There were several measures from the proprietary data relevant to estimating how well $z$ (proprietary) filing counts should correspond to $y$ (court-issued) filing counts. First, the share of records collected from automated sources was dichotomized into ≤ 50% and > 50% ($auto_i \in \{0, 1\}$). Second, the total number of data uploads that occurred in that county-year was dichotomized into ≤ 10 and > 10 ($nups_i \in \{0, 1\}$). Third, we included the collector tasked with the majority of record collection in that county-year ($col_i$). Fourth, we included an indicator for whether the county-year did not include any dismissed cases ($cds_i \in \{0, 1\}$). Finally, we included an indicator for whether the county-year did not include any unresolved cases ($bns_i \in \{0, 1\}$). Table S13 presents descriptive statistics for all variables included in the model.
Only some of the $y_i$ and $z_i$ values were observed. The indices where $y$ was observed are $o^y$ and the indices where $z$ was observed are $o^z$. We denote the indices where they were not observed as the complements, $(o^y)^C$ and $(o^z)^C$.

Probability Model for Y
Filing counts were transformed using a log-plus-ten transform, log(y + 10), and modeled using a hierarchical normal model. This transformation and distribution were chosen for several reasons. First, the resulting random variable was approximately normal. Second, the residuals of the fit model were approximately normal. Third, the residual variance was stable across predicted count levels. Fourth, the variance in the proprietary filing counts depended greatly on a number of covariates, which would have been challenging to model using some other distribution classes. Fifth, using a normal model admits closed form representations for the marginalized distribution of the proprietary filing counts when the court-issued counts are missing (see Section 3.4) and for generating posterior predictive values from the posterior via closed form representation (see Section 3.6). Finally, the model fitting process required extensive computing time, and many of the other potential formulations would have greatly increased the computational burden of the Hamiltonian Monte Carlo algorithm.
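The transform and its inverse are simple to state in code. The sketch below is only illustrative; the function names are hypothetical.

```python
import math

OFFSET = 10  # the "plus ten" in log(y + 10)

def to_model_scale(count):
    """Map a filing count to the scale on which the normal model is fit."""
    return math.log(count + OFFSET)

def to_count_scale(value):
    """Map a model-scale value back to a filing count."""
    return math.exp(value) - OFFSET
```

The offset keeps the transform defined for county-years with zero filings. One consequence worth noting is that a model-scale draw below log(10) maps back to a negative count; how such draws were handled is not stated in this section, so any clipping or rounding rule applied downstream is an assumption.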
We also investigated using a hierarchical Poisson model for the raw filing counts but determined that the variability of the outcome (after adjusting for covariates) did not follow the expected relationship under the Poisson assumption. Additionally, a different model class would have been required to handle the variance model for Z. Along a similar line, we investigated utilizing a binomial model with the number of renting households as the binomial sample size, which was rejected for the same reasons as the Poisson model.
The model utilized a hierarchical specification with counties nested within states nested within regions. We added additional terms for the effect of the covariates, year, and landlord-tenant indicator and state-varying parameters for the effect of household density, median rent, and unemployment rate. We modeled state and year effects as normally distributed. County effects and covariate-state random slopes exhibited heavier than expected posterior tails when modeled as normal, so these utilized Student's t-distributions. We set the degrees of freedom for these at 1, except for the county-level effects, as there was enough information in the data to include degrees of freedom as a parameter with a half-normal prior. The scale parameters for all random effects were given half-Cauchy priors following the recommendation of Gelman (2006) (27). The covariate effect parameters were given normal priors with large spreads. All priors were chosen to be weakly informative or non-informative. We assessed this by expanding the scale of the priors and checking that little appreciable change was observed in the posterior.
The probability model for Y was specified as and is displayed graphically in Fig. S15.
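One way to write a specification that matches the verbal description above is sketched below. It is not a transcription of the authors' equation. Only $Y_i$, $X$, $l_i$, $\beta_l$, $x^{hd}_i$, $x^{mr}_i$, $x^{u}_i$, and the indices $t_i$, $c_i$, $s_i$, $r_i$ come from the text; the remaining symbols ($\beta_0$, $\beta$, $\gamma$, $\delta$, $\tau$, $\nu$, $\sigma$), the centering of each level on its parent, and the zero-centered slopes are assumptions.

```latex
% Hedged sketch; symbols other than those named in the text are assumed.
\begin{aligned}
\log(Y_i + 10) &\sim \mathrm{normal}(\mu_i,\ \sigma) \\
\mu_i &= \beta_0 + X_i \beta + \beta_l\, l_i + \gamma^{year}_{t_i} + \gamma^{county}_{c_i}
        + \delta^{hd}_{s_i} x^{hd}_i + \delta^{mr}_{s_i} x^{mr}_i + \delta^{u}_{s_i} x^{u}_i \\
% state and year effects are normal (from the text)
\gamma^{year}_{t} &\sim \mathrm{normal}(0,\ \tau^{year}) \\
\gamma^{state}_{s} &\sim \mathrm{normal}(\gamma^{region}_{r(s)},\ \tau^{state}) \\
% county effects: Student-t with estimated degrees of freedom (from the text)
\gamma^{county}_{c} &\sim \mathrm{student\_t}(\nu,\ \gamma^{state}_{s(c)},\ \tau^{county}) \\
% state-varying slopes: Student-t with degrees of freedom fixed at 1 (from the text)
\delta^{j}_{s} &\sim \mathrm{student\_t}(1,\ 0,\ \tau^{j}), \qquad j \in \{hd,\ mr,\ u\} \\
\nu &\sim \mathrm{half\text{-}normal}, \qquad \tau^{(\cdot)} \sim \mathrm{half\text{-}Cauchy}
\end{aligned}
```

Centering each county effect on its state effect, and each state effect on its region effect, is one conventional way to encode "counties nested within states nested within regions"; an additive formulation with separate zero-centered effects would describe the same nesting.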

Probability Model for Z
The differences between court-issued and proprietary filing counts varied significantly between states. This was true both in terms of the bias of log(z + 10) compared to log(y + 10) and its spread.
Thus it was important to be able to model both the mean ($\lambda^z_i$) and standard deviation ($\sigma^z_i$) of the measurement difference with state-varying parameters. Fig. S16 shows the graphical representation of the measurement error model. This model consists of a bias component, which represents the deviation of the proprietary filing counts from the court-issued counts and can vary by state. A variability component for measurement error is also included, which can also vary by state. The model is expressed mathematically as
When the court-issued filing count is missing, it can be marginalized out via integration, resulting in
Fig. S17 shows the details of the bias portion of the model. Hierarchical parameters for the effect of state, the effect of each covariate by state, and the effect of the majority record collector on bias are all modeled as normal. The hyper-parameter priors were chosen to be normally distributed for central tendency measures and half-Cauchy distributed for the scale parameters. The intercept and landlord-tenant parameters were given flat improper priors, though these were so well determined by the data that any weakly informative prior has little effect on the posterior. All priors were inspected and determined to be weakly informative or non-informative.
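The marginalization step relies on a standard property of normal distributions, which the simulation below illustrates. It is a sketch under an assumed form of the measurement model: log(z + 10) given log(y + 10) is normal with mean log(y + 10) + λ and standard deviation σ_z, and log(y + 10) is normal with mean μ and standard deviation σ. Under that assumption, the marginal of log(z + 10) is normal with mean μ + λ and standard deviation sqrt(σ² + σ_z²). The parameter values and function names are hypothetical.

```python
import math
import random
import statistics

def marginal_z_params(mu, sigma, lam, sigma_z):
    """Mean and sd of log(z + 10) after integrating out log(y + 10)."""
    return mu + lam, math.sqrt(sigma ** 2 + sigma_z ** 2)

def simulate_log_z(mu, sigma, lam, sigma_z, n, seed=0):
    """Draw log(z + 10) by first drawing the unobserved log(y + 10)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        log_y = rng.gauss(mu, sigma)                   # court-issued scale
        draws.append(rng.gauss(log_y + lam, sigma_z))  # add bias and noise
    return draws

# Illustrative values only: a negative lam means the proprietary data undercount.
mean, sd = marginal_z_params(mu=5.0, sigma=0.8, lam=-0.3, sigma_z=0.6)
draws = simulate_log_z(5.0, 0.8, -0.3, 0.6, n=100_000)
print(mean, statistics.fmean(draws))   # closed form vs. simulated mean
print(sd, statistics.pstdev(draws))    # closed form vs. simulated sd
```

Having the marginal in closed form means county-years with a proprietary count but no court-issued count can still contribute to the likelihood without sampling the missing value as an extra parameter.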
The bias of the proprietary filing counts is then expressed mathematically as
$$\alpha_{bns,cds} \sim \mathrm{normal}(\mu_{bns,cds},\ \kappa_{bns,cds})$$
Fig. S18 provides a graphical representation of the variance portion of the measurement error model. We modeled the measurement error variance using a log-normal distribution. There is, to our knowledge, no standard guidance suggesting a particular distributional form for a hierarchically specified scale parameter. Log-normal was chosen for two reasons. First, it is restricted to the positive domain and can take any mean and variance value. Second, the hierarchy for log variance can be expressed as a familiar hierarchical normal model. One disadvantage of this distribution is that the density goes to 0 as the variance goes to 0; thus, if any of the measurement error variance was at or near 0, the model might have difficulty converging. We inspected the observed measurement errors broken down by all covariates and states and determined that there was appreciable variability at all disaggregation levels.

Covariate effects were modeled as normal with weakly-informative normal priors for the central tendency parameters and half-Cauchy priors for the spread parameters. The intercept was chosen to have a weakly-informative normal prior.

Inference
The missing values of Y and Z were assumed to be missing at random, so the missingness process could be factored out of the likelihood for Bayesian inference. The likelihood for observation $i$ can be expressed in terms of the probability models outlined in the previous sections:
The posterior distribution for this likelihood was constructed using the Stan probabilistic programming language (28).

Imputing Missing Data Using the Posterior Predictive Distribution
Given samples of the model parameters from the posterior distribution, we drew imputed samples from the posterior predictive distribution for $y_i$ in cases where it was missing. If we did not observe $z_i$, then the $k$th sample, $(Y_i)^{(k)}$, from the posterior predictive distribution was:
Here the superscript $(k)$ marks the $k$th parameter samples from the posterior distribution. If we did observe $z_i$ but not $y_i$, then the $k$th sample, $(Y_i)^{(k)}$, from the posterior predictive distribution was:
After the imputation was complete, we adjusted down filing counts in counties where all landlord-tenant cases (rather than only eviction-specific case types) were reported by subtracting $(\beta_l)^{(k)}$ from the $\log(y_i + 10)$ filing counts. Parameters estimated from these models are reported in Section 11. We do not report county-level hierarchical parameters from the primary model or majority collector identifiers from the secondary (proprietary filings) model due to space considerations, but the full set of parameter estimates can be downloaded at [url].
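The two imputation cases can be illustrated with the usual normal-normal update. This is a sketch, not the authors' expressions: it assumes log(y + 10) is normal with mean μ and standard deviation σ, and that log(z + 10) given log(y + 10) is normal with mean log(y + 10) + λ and standard deviation σ_z. The dictionary keys `mu`, `sigma`, `lam`, and `sigma_z` are hypothetical names for one posterior parameter draw.

```python
import math
import random

def conditional_log_y(mu, sigma, log_z, lam, sigma_z):
    """Mean and sd of log(y + 10) given an observed log(z + 10)."""
    prior_prec = 1.0 / sigma ** 2     # precision of the model for y
    obs_prec = 1.0 / sigma_z ** 2     # precision of the measurement model
    post_var = 1.0 / (prior_prec + obs_prec)
    # Precision-weighted average of the model mean and the bias-corrected z.
    post_mean = post_var * (prior_prec * mu + obs_prec * (log_z - lam))
    return post_mean, math.sqrt(post_var)

def impute_log_y(param_draws, log_z=None, seed=0):
    """One imputed log(y + 10) per posterior parameter draw."""
    rng = random.Random(seed)
    imputed = []
    for p in param_draws:
        if log_z is None:
            # z not observed: draw from the model for y alone.
            mean, sd = p["mu"], p["sigma"]
        else:
            # z observed: condition on it.
            mean, sd = conditional_log_y(
                p["mu"], p["sigma"], log_z, p["lam"], p["sigma_z"]
            )
        imputed.append(rng.gauss(mean, sd))
    return imputed
```

When the measurement error is small relative to the model's residual spread, the conditional mean sits close to the bias-corrected proprietary count; when it is large, the draw falls back toward the covariate-based prediction.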
Some tenants are taken to court repeatedly by their landlord. Therefore, each filing does not represent a unique household threatened with eviction. Setting aside repeated filings against the same households is important for accurately assessing the number of households threatened with eviction each year. The frequency of these repeated filings varies across counties and states. For this reason, we specified a secondary Bayesian model to estimate the number of unique households receiving an eviction filing, which allowed us to preserve this geographic variation.
For county-years in which we had reliable court-issued individual records, we could directly observe the number of households threatened with eviction. We did this by creating household identifiers (Section 1.3.8) and aggregating unique instances of these identifiers annually (Section 1.3.11). We were able to measure the number of households threatened with eviction from court-issued individual records for 4,711 county-years.
There were several states in which we had court-issued individual records but could not measure households threatened with eviction directly from the data. Connecticut lacked tenant names for cases heard in housing courts, which account for approximately 50% of records. North Carolina was also missing tenant addresses for approximately 50% of cases. Data from DC and South Dakota did not include tenant addresses. Johnson and Wyandotte Counties in Kansas did not include tenant names or addresses. Virginia data included only the city, state, and zip code of a tenant's address. South Carolina included names and addresses, but because the original dataset was at the case level (i.e., one observation per case) the tenant name and address variables were often concatenated and difficult to parse. Without individual tenant names and addresses, we were unable to consistently group filings by household. We were not able to use the court-issued aggregated filing counts to observe or estimate the number of households threatened with eviction as there was no information on who appeared on the case filings or how often.
For states without court-issued individual records with name and address information, we estimated the number of households threatened with eviction from the Bayesian model. We calculated the number of unique households threatened with eviction from the proprietary data in the same manner as we did for the court-issued individual records. We only used proprietary data in county-years with filing coverage of at least 70% of that reported in the court-issued data to ensure that we had adequate representation of the proportion of repeated filings against the same household in the total case volume (29). To assess the agreement between the two data sources, we compared the proportion of filings that represented a unique household threatened with eviction using the court-issued and proprietary individual records. We calculated the correlation between these two proportions for the 390 county-years with validated data from both sources and at least 400 filings. The correlation was 0.87, indicating strong agreement.
We then leveraged hierarchical Bayesian models to estimate the number of unique households threatened with eviction for all county-years ($threat_i$). The number of households threatened with eviction given the number of eviction filings was modeled as binomial, with hierarchical normal effects for state ($s$), county within state ($c$), and year within county within state ($cy$). Demographic predictors were also added for the median rent ($medrent_i$), population size ($pop_i$), percentage of families living in poverty ($perpov_i$), percent unemployment rate ($unemp_i$), and median income ($medinc_i$). Additional predictors for the logged number of filings ($y_i$), year ($t_i$), and whether the filings represented all landlord-tenant cases ($l_i$) were also included.
Flat priors were used for all population-level predictors and weakly informative half Student-t distribution priors were used for the hierarchical parameters. The full model specification was
$$\begin{aligned}
\mathrm{logit}(p_i) ={}& \kappa_{intercept} + \kappa_{y}\log(y_i + 10) + \kappa_{medrent}\log(medrent_i + 10) + \kappa_{pop}\log(pop_i + 10) \\
&+ \kappa_{perpov}\log(perpov_i + 10) + \kappa_{unemp}\log(unemp_i + 1) + \kappa_{medinc}\log(medinc_i + 10) + \cdots
\end{aligned}$$
We considered modeling the number of households threatened with eviction jointly with the number of filings by incorporating the binomial model into the larger eviction filing model. This was rejected for two reasons. First, we could only observe the number of households threatened with eviction directly in counties with court-issued filing counts generated from individual records. Second, it added significant computational complexity that led to very long run times.
To generate an imputation sample from the posterior predictive distribution for a county-year, we first imputed a value for the number of case filings ($y_i$) from the posterior predictive distribution of the eviction filings model, and then we sampled a value from the posterior predictive distribution of the households-threatened-with-eviction model using this imputed filings value.
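The two-stage draw described above can be sketched as follows. This is a minimal illustration, assuming the filings draw arrives on the log(y + 10) scale and the household model supplies a draw of logit(p); rounding the back-transformed count and clipping it at zero are assumptions made here so the binomial sample size is a non-negative integer, and the function names are hypothetical.

```python
import math
import random

def inv_logit(x):
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-x))

def draw_filings_and_households(log_y_draw, logit_p_draw, rng):
    """One joint draw of filings and unique households for a county-year."""
    # Stage 1: back-transform the imputed filing count (rounding and
    # clipping at zero are assumptions of this sketch).
    filings = max(0, round(math.exp(log_y_draw) - 10))
    # Stage 2: households ~ Binomial(filings, p), drawn as Bernoulli trials.
    p = inv_logit(logit_p_draw)
    households = sum(rng.random() < p for _ in range(filings))
    return filings, households

rng = random.Random(1)
filings, households = draw_filings_and_households(math.log(510), 1.5, rng)
```

Drawing the household count conditional on each imputed filing count carries the uncertainty in filings through to the household estimates, and the binomial form guarantees that households never exceed filings.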

We performed several robustness analyses on the estimates of eviction filings and households threatened with eviction to investigate how sensitive our results were to alternative specifications of data sources and measurement. First, we assessed the correspondence between the court-issued data and estimates produced by the Bayesian models. Second, we investigated how incorporating the proprietary data as a secondary measure of filing volume affected national and state-level filing estimates. Third, we evaluated how Maryland's uniquely high filing counts affect national filing estimates.

Correspondence between Court-Issued Data and Bayesian Estimates
In a perfect world (of eviction data transparency), all states would have complete, electronic repositories of eviction case records. From these records, it would be possible to directly calculate the number of eviction filings and households threatened with eviction annually. Throughout the main text and in our discussion of data collection (Section 1.2), we have discussed the practical, legal, and technological barriers that make this impossible.
Collecting available court-issued filings data (either in individual record or aggregated filing count form) is an important first step in generating estimates of filings and households threatened with eviction; however, relying solely on these data is problematic for two reasons. First, as we have demonstrated in this paper, rates of filings and households threatened with eviction differ across states. We cannot assume that the filing rate observed in one or two states in any given year can be directly transferred to other states for which we were unable to obtain court-issued data. Second, the composition of states and counties represented in court-issued data changes over time. This occurs for many reasons. Courts implement electronic case management systems at different times; some courts digitize historical records and incorporate them into the system, others do not. Changes in record management systems may create inconsistencies in filing counts or delays in producing updated filing counts. Court case management systems are also vulnerable to external threats. For example, we were unable to obtain court-issued aggregated filing counts for the post-2016 period from Georgia due to a malware attack that affected the court system there. Making comparisons across years when there are fluctuations in the states or counties with available court-issued data is problematic because changes in filing counts may be due to sample composition rather than actual changes in the prevalence of filings nationally. For this reason it is important to have a balanced panel of counties included in estimates across years; however, the sources of available data on which to base these estimates may shift over time (Fig. S14).
Due to the fluctuations in availability of court-issued data over time, the composition of the national filing estimates (the proportion of county-years with court-issued filing counts relative to county-years with filing counts generated from the Bayesian posterior distribution) also fluctuated across years (Fig. S19). Relying solely on court-issued data would make it appear as though there have been drastic reductions in eviction filings in recent years, though this would be primarily a reflection of court-issued data availability (Fig. S14) rather than precipitous declines in eviction filings reported in court-issued individual records or aggregated filing counts. Even with the inclusion of Bayesian posterior estimates in county-years in which we were unable to collect court-issued data, there has still been a decline in eviction filings in recent years (Fig. 1), albeit much smaller than that produced by omitting county-years missing in court-issued data (light blue bars in Fig. S19).
Court-issued eviction filing rates were more similar within than across states over time (Fig. S20), which affirms the across-state differences identified using the national estimates in the main text. Fig. S20 also shows the variation in availability of court-issued data across states over time. Fluctuations in filing rates estimated using only court-issued data reflect two possible sources of variation over time: (1) true changes in the prevalence of eviction filings and (2) changes in the states that had available court-issued data. For example, Maryland has a substantially higher filing rate per renting household than all other states; we would therefore expect a higher average filing rate in the court-issued data in years in which we had these data available for Maryland than in years when we did not, even if the true prevalence of eviction filings in Maryland (or other states) did not change. This makes direct comparisons over time between filing rates based only on the court-issued data and the full set of national estimates, which is a balanced panel of all counties in the United States across years, very difficult (and possibly misleading).
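The composition effect described above can be illustrated with a small sketch. Only Maryland's 83.3% average filing rate is taken from this document; the two other states, their rates, and the availability pattern are hypothetical, and an unweighted mean is used purely for simplicity.

```python
# Filing rates (%) are held constant across years, so any change in the
# average is due to sample composition alone. "State A" and "State B" are
# hypothetical; Maryland's rate is the average reported in this document.
RATES = {"State A": 5.0, "State B": 8.0, "Maryland": 83.3}

# Hypothetical availability of court-issued data in two years.
AVAILABLE = {
    "year_1": ["State A", "State B", "Maryland"],
    "year_2": ["State A", "State B"],  # Maryland data missing
}


def mean_rate(states):
    """Unweighted mean filing rate over the states with available data."""
    return sum(RATES[s] for s in states) / len(states)


mean_by_year = {year: mean_rate(states) for year, states in AVAILABLE.items()}
```

In this toy setup the average falls sharply between the two years even though no state's filing rate changed, which is why the court-issued series should not be read as a national trend.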
To investigate the validity of the longitudinal trend in national filing rates reported in the main text then, we examined changes in average filing rates across three time periods in court-issued data, where robust state-level coverage of court-issued filing counts was available. The national estimates presented in the main text showed an increasing filing rate from 2000 to 2008 before declining in recent years. To increase the number of comparison periods in the court-issued data, we split the national trend into three periods: 2000-2003 (period 1), 2006-2010 (period 2) and 2015-2018 (period 3). We calculated the national average filing rate within each of these three periods: 9.1%, 9.6%, 8.1%, respectively. The percentage change in the filing rate between these periods was an increase of 5.6% between periods 1 and 2 and a decrease of 15.1% between periods 2 and 3.
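The percentage changes above are relative differences between period averages. A minimal sketch, using the rounded period averages reported in this paragraph (the function name is ours, for illustration only). Because the inputs are rounded to one decimal place, the results are close to, but not identical with, the published figures, which were presumably computed from unrounded rates.

```python
def pct_change(old: float, new: float) -> float:
    """Relative change from `old` to `new`, expressed in percent."""
    return (new - old) / old * 100.0


# Rounded national average filing rates (%) for the three periods.
period_rates = {1: 9.1, 2: 9.6, 3: 8.1}

change_1_to_2 = pct_change(period_rates[1], period_rates[2])  # positive: increase
change_2_to_3 = pct_change(period_rates[2], period_rates[3])  # negative: decrease
```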
We then created comparable metrics across states with court-issued data across these three periods (Table S14). We only calculated state metrics if we had validated court-issued data for at least 50% of counties in a given year to increase the likelihood that the changes we calculated reflected true state-level trends in filing rates. We calculated the percentage changes in court-issued state-level filing rates analogous to those calculated for the national filing rate. Looking across states, many displayed similar longitudinal trends to those observed in the estimated national filing rate. On average, court-issued state-level filing rates increased by 3.3% between periods 1 and 2; however, after weighting by renting households, the average increase was 5.1%, very similar to the change observed in the national filing rate between the same periods. Likewise, court-issued state-level filing rates decreased by an average of 10.1% between periods 2 and 3, 18.7% when weighted by renting households, again comparable to the change observed in the national filing rate between the same periods. This suggests that the longitudinal trends present in the national filing estimates are reflective of the changes observed in court-issued data.
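The difference between the unweighted and renter-weighted averages can be sketched as follows. The state values below are hypothetical and chosen only to show how weighting by renting households shifts the average toward larger states; they are not taken from Table S14.

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean; weights must sum to a positive number."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total


# Hypothetical percentage changes in state filing rates between two periods
# and hypothetical counts of renting households in the same states.
pct_changes = [2.0, 6.0, 4.0]
renting_households = [100_000, 900_000, 200_000]

unweighted = sum(pct_changes) / len(pct_changes)
weighted = weighted_mean(pct_changes, renting_households)
```

Here the weighted average exceeds the unweighted one because the largest hypothetical state also has the largest change.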
We also compared the filing estimates produced by the Bayesian model to those reported in the court-issued data for the county-years in which we had validated court-issued data (N=33,247). As shown in Table S15, 60.5% of the county-year Bayesian estimates differed by less than 10 filings or 5% of court-issued filings, demonstrating fairly good agreement between court-issued data and estimates. The remaining county-years were evenly split between overestimates (20.4%) and underestimates (19.2%) by the Bayesian model. The average percentage difference between court-issued filing counts and those predicted from the Bayesian posterior distribution was 10.0%; however, this difference was inflated by minor discrepancies in counties with very low filing counts. When restricting to county-years in which the court-issued data reported at least 10 filings, the mean percentage difference between court-issued filing counts and those predicted in the Bayesian posterior distribution was only 1.8%.
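One way to read the agreement rule above is sketched below. The thresholds (10 filings, 5%) come from the text; the strict inequalities, the sign convention, and the function name are our assumptions, and the example values are hypothetical.

```python
def classify_estimate(court_count: float, model_estimate: float) -> str:
    """Label a county-year estimate relative to the court-issued count.

    Assumed rule: the estimate "agrees" if it differs from the court-issued
    count by fewer than 10 filings OR by less than 5% of that count.
    """
    diff = model_estimate - court_count
    if abs(diff) < 10 or abs(diff) < 0.05 * court_count:
        return "agree"
    return "overestimate" if diff > 0 else "underestimate"


# Hypothetical (court-issued count, model estimate) pairs.
examples = [(4, 9), (1000, 1040), (1000, 1100), (200, 150)]
labels = [classify_estimate(c, m) for c, m in examples]
```

The absolute threshold matters mainly for small counties, where a difference of a few filings is a large percentage of the count.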
To evaluate the fit of the Bayesian model relative to the court-issued filing counts, we re-fit the Bayesian model with a holdout set of 5% (N=1,668) randomly selected county-years. Fig. S21 shows the court-issued filing counts plotted against the posterior mean predicted values from this model. The figure shows good agreement and no appreciable bias in the estimates. The holdout set relative bias in the transformed outcome ($\log(y_i + 10)$, where $y_i$ = court-issued filing counts) was less than 1%. Additionally, the variance explained in the model for the court-issued filing counts was high (98.0%). Explained variance was slightly higher (99.0%) for the transformed court-issued filing counts ($\log(y_i + 10)$). As might be expected, the number of renting households in a county accounted for a significant portion of the explained variance in number of eviction filings. Adjusting for renting households ($\log(y_i + 10) - \log(\mathit{rh}_i + 10)$), the variance in filing counts explained by the model was 95.0%.
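The holdout diagnostics can be sketched as follows. The exact formulas used by the authors are not given here, so this sketch assumes that relative bias is the mean prediction error divided by the mean observed value, and that explained variance is one minus the ratio of residual to total sum of squares, both on the log(y + 10) scale. The counts are hypothetical.

```python
import math


def transform(count: float, offset: float = 10.0) -> float:
    """Transformed outcome log(y + 10) described in the text."""
    return math.log(count + offset)


def relative_bias(observed, predicted):
    """Mean prediction error divided by the mean observed value (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    mean_err = sum(p - o for o, p in zip(observed, predicted)) / len(observed)
    return mean_err / mean_obs


def explained_variance(observed, predicted):
    """1 - SS_residual / SS_total (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical holdout county-years: court-issued counts and posterior means.
court_counts = [0, 12, 150, 2300]
posterior_means = [1.0, 10.0, 160.0, 2250.0]

t_obs = [transform(y) for y in court_counts]
t_pred = [transform(y) for y in posterior_means]

bias = relative_bias(t_obs, t_pred)
r2 = explained_variance(t_obs, t_pred)
```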
The full set of county-level eviction filing rate estimates in 2018 generated from the Bayesian posterior mean predicted values are shown in the map in Fig. S22. The distribution of predicted filing rates across the US is almost indistinguishable from that presented in the main text, which combined both the court-issued and Bayesian estimated filing counts, for 2018 (Fig. 3A). The full set of posterior mean predictions for all years, including credible interval and coefficient estimates, can be downloaded alongside the court-issued counts of filings and households threatened with eviction at [url].
Finally, Table S16 presents state-level average filing rates across data sources underlying the county-level estimates. States were assigned to data sources based on the predominant underlying data source for county estimates. In some states, county-level filing estimates were drawn from different data sources based on availability of court-issued individual records or aggregated filing counts. In five states, no single data source covered 90% of counties (final column in Table S16). We presented average filing rates from 2015 in Table S16 as this was the year with the best representation of court-issued data (Fig. S14). The assignment of states to data sources varied across years with changes in availability of court-issued data. There was significant variation in the magnitude of average filing rates across states in all data sources. Although it appears that the mean filing rate among states drawing filing counts from court-issued aggregated data was significantly higher than that in other data sources, this was due to the inclusion of Maryland, which has an abnormally high filing rate. Removing Maryland from the court-issued aggregated filing counts reduced the overall mean to 7.4%, which was almost identical to that observed for state-level filing rates generated from court-issued individual records (7.3%). Although the mean filing rates for states with estimates generated from the Bayesian model or with no majority data source are lower than those with court-issued data in 2015, many of these states have very small renter populations. The mean filing rates across data sources fluctuated significantly across years as state-level filing rates are highly correlated over time (Fig. S12) but the underlying sources of data often fluctuated (Fig. S14).
We also plotted the annual percent eviction filings representing unique households for counties in which we observed data on households threatened with an eviction filing (see Section 4 for a discussion of the observed data for households threatened with eviction) along with rates estimated from the Bayesian model for these same county-years (N=12,870). The rates show very good correspondence across all years (Fig. S23). These proportions are used to calculate the number of households threatened with an eviction filing each year. Again, as with eviction filings, these percentages of non-repeated filings against the same household should not be directly compared to the full set of national estimates for households threatened with eviction (Fig. 1B) as the counties and states with available observed data fluctuated across years. It may appear as though the percentage of unique households represented in eviction filings decreased from 2009-2013 before returning to previous levels in 2018; however, this decrease is due in large part to the availability of observed data in South Carolina and Virginia for 2010-2016. South Carolina and Virginia, which both have higher than average rates of repeated filings against the same households (Fig. S11), only had data on households threatened with an eviction filing available in these years, which had a noticeable effect on the average percentage of filings representing unique households.

Eviction Filing Estimates without Proprietary Data
We also assessed how the inclusion of proprietary data in the Bayesian model affected the estimated number of filings. As discussed above, there are many structural challenges to collecting a complete set of eviction records from courts. Limitations imposed on record collection and changes in collection volume may create trends in the proprietary data that do not reflect true fluctuations in case filings. As the proprietary data entered the Bayesian model as a secondary, imperfect measure of court-issued filings, it was also possible to estimate the model without these data. Structurally, the model omitting the proprietary data was very similar to the model described in Section 3. The model included the same covariates and used filing counts from the court-issued data as the primary outcome.
As shown in Fig. S24, filing rates were estimated to be 4.7% higher, on average, when excluding proprietary data from the model. The discrepancy was greatest in 2000 before stabilizing at a smaller margin for the remaining years. The estimates from the full model incorporating proprietary data were contained within the 95% credible intervals of the model omitting the proprietary data in many years between 2009 and 2016, albeit near the lower bound. While the estimates from the model without proprietary data were consistently higher, we could not conclude that the estimates differ significantly in these years. The shape of the longitudinal trend for filings was almost identical regardless of whether proprietary data were included in the model. The curvilinear trend in eviction filings, then, is not attributable to fluctuations in secondary collection volume over time. Additionally, the 95% credible intervals were much wider when proprietary data were not included in the model. Even if sometimes incomplete, the proprietary data provided an important signal about how many filings should be expected in areas without court-issued data. In particular, the proprietary estimates were a valuable source of information when we lacked court-issued data for entire states. Filing rates differ substantially across states, even when populations are demographically similar (Fig. S10). For this reason, observing data from as many states as possible is crucial to producing accurate estimates of the national prevalence of eviction.
The importance of access to data in all states is clear when comparing predicted state-level filings with and without proprietary data. In the Northeast, the availability of court-issued data increased the reliability of estimates over time (Fig. S25); however, the inclusion of proprietary data helped reduce uncertainty in states with long-term limited data, particularly New York. In the Midwest, we observed similar reductions in uncertainty in Illinois and Kansas (Fig. S26). Reductions in uncertainty were even more striking in the South (Fig. S27), where lack of court-issued data in Louisiana, Mississippi, Oklahoma, Tennessee, and West Virginia made it particularly difficult to estimate filings in this region. This pattern was repeated in Idaho and Montana in the West (Fig. S28). Incorporating proprietary data into the analyses was key to producing more precise estimates, without creating artificial patterns in long-term filing trends.

Maryland Estimates
Maryland is an outlier in the number of eviction filings. An average of 589,064 cases (95% credible interval: 579,879-597,957) were filed annually between 2000 and 2018. With a renting population of approximately 707,000 households, this is an average annual filing rate of 83.3%. In some areas of the state, including Baltimore City, the annual number of filings regularly exceeds the number of renting households (30). The unusually high filing rate is produced by the state's landlord-tenant policy. In Maryland, the filing serves as the initial eviction notice and can be filed immediately following nonpayment (31); however, if the tenant pays the balance of the rent due, plus any incurred late fees or court costs, the complaint is considered satisfied and the tenant remains in place (32,33). For this reason, many households receive multiple filings before an actual eviction occurs; in fact, repeated filings against the same household constituted 57.4% of all case filings in Maryland.
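The Maryland filing rate follows directly from the two figures reported above; the worked arithmetic is shown here for reference, with the renting population taken as the approximate value given in the text.

```latex
\[
\text{filing rate}
  = \frac{\text{annual filings}}{\text{renting households}}
  \approx \frac{589{,}064}{707{,}000}
  \approx 0.833 = 83.3\%
\]
```

By the same token, if repeated filings make up 57.4% of all case filings, the remaining share is $1 - 0.574 = 0.426$, or 42.6% of filings.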
We have annual county-level, court-issued aggregated landlord-tenant filing counts in Maryland for 2000-2017. These data are released in annual reports by the Maryland Judiciary (34). The only source of uncertainty in our filing counts then is adjusting for the portion of eviction-specific cases out of the total landlord-tenant case volume (see Section 3) in 2017 and estimates for 2018. As aggregated filing counts do not include case-specific information, we had to estimate households threatened with eviction from the Bayesian model (Section 5). The uniquely high prevalence of filings in Maryland resulted in higher estimated rates of households threatened with eviction, as we might expect.
To understand how the high prevalence of filings in Maryland affected national estimates, we produced an additional set of numbers that excluded Maryland. Excluding Maryland reduced the number of filings by approximately 500,000 cases annually. This reduced the average national filing rate by 14.5%, from 9.0% to 7.7% (Panel A, Fig. S29). Although the filing prevalence was lower when excluding Maryland, the overall longitudinal trend in the filing rate remained unchanged, indicating that, while Maryland does contribute a large number of filings in the national perspective, it is not the primary driver of changes in the trajectory of eviction prevalence.
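The 14.5% figure is a relative reduction in the national filing rate, not a difference in percentage points. Using the rounded rates reported above:

```latex
\[
\frac{9.0 - 7.7}{9.0} \approx 0.144,
\qquad
9.0\% - 7.7\% = 1.3 \text{ percentage points}
\]
```

The rounded inputs give roughly 14.4%; the small gap from the reported 14.5% is presumably due to the published value being computed from unrounded rates.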
Maryland's effect on the rates of households threatened with eviction was less substantial (Panel B, Fig. S29). When excluding Maryland, the average rate of households threatened with eviction was reduced by 7.7%. The relatively smaller magnitude of this reduction is likely attributable to the higher prevalence of repeat filings against the same households. The "pay to stay" condition allows renters to remain in the residence if financial obligations of the eviction complaint are satisfied, yet the use of the filing as the eviction notice and low filing fees incentivize the landlords to repeatedly file cases if rent payments are late in subsequent months. In this way, unique households may represent a smaller portion of the annual filings. Renting households in Maryland are at higher risk of filings, independent of household characteristics typically associated with eviction (Fig. S10). They are also at higher risk of receiving repeated filings (Fig. S11), which is why excluding Maryland results in a larger reduction to the filing rate than to the rate of households threatened with eviction.
The process for notifying a tenant of an eviction action, including how and when the landlord is required to provide notice, differs significantly across states but is not included in many classifications of state-level landlord-tenant policy regimes. The frequent classification of Maryland as a more tenant-friendly state (e.g., 35) may reflect important legal protections afforded to renters but does not capture key aspects of policies that create increased risk of landlords threatening tenants with court-ordered eviction. These differences contribute significantly to disparities in eviction

Eviction Notice Requirements
We investigated the association between state-level eviction notice requirements for filing nonpayment of rent cases and filing rates by downloading the publicly available Eviction Laws Database released by the Legal Services Corporation (LSC) (36). We used two measures of eviction notice requirements from the data. The first was an indicator for whether landlords were required to provide notice to tenants before filing an eviction case for nonpayment of rent. All 50 states and DC require landlords to provide notice for eviction filings for reasons other than nonpayment, but, as nonpayment cases constitute a substantial majority of filings, the lack of notice for nonpayment cases is more consequential for overall filing volume. The second was the number of days' notice, if any, a landlord was required to provide prior to filing a nonpayment case.
The LSC Eviction Laws Database was assembled in 2021, after the period for which we had collected data on eviction filings. To verify that the same notice requirements were in place for the 2000-2018 period, we checked the legal code history using the Westlaw Next Campus Research database (accessed through the Princeton University Library). We searched for archived copies of the legal codes cited by LSC for eviction notice requirements. When possible, we verified the consistency of notice requirements for the full period of study. We examined the text of the most recent archived legal code prior to 2000 and any amendments enacted between 2000 and 2018 (Table S7). The verification year listed in Table S7 does not represent the date the law(s) governing notice requirements were implemented or changed; instead, it represents the most recent year prior to the start of the study period (2000) that archived enacted legal code was available. Most notice requirements were initially adopted well before the year used to verify the legal code as of 2000. Westlaw served as the verification source, except when noted otherwise in Table S7. We were able to verify notice requirements for the full study period in 36 states. For the remaining 15 states, we were able to verify requirements for 13 (out of 19 possible) years, on average. Only 11 of these 15 states had counties included in the analysis. The lack of verification for the full study period does not mean that notice requirements changed during this time, only that archived legal code was not readily available pre-dating 2000. In fact, we observed no changes to notice requirements during the 2000-2018 period. This indicates that policies governing notice requirements tend to be fairly stable over time. As shown in Table S7, we did find 8 discrepancies between notice requirements in the LSC Eviction Laws Database and the archived legal code, all owing to changes in the law that occurred post-2018.
In all cases, we were able to observe the changes directly in archived legal code. Some of these changes were due to the COVID-19 pandemic (e.g., in DC), while others may have been attributable to increasing attention to eviction as an important housing issue in the US.
The lack of changes in eviction notice requirements during our study period prohibited us from examining the associations between these requirements and filing rates longitudinally within states. If we included required days notice in models with state-level fixed effects (which help account for time-invariant factors that may be associated with both housing policy regimes and eviction filing frequency), the effects of notice requirements would be absorbed into the (unobserved) state-specific intercepts. To overcome these difficulties, we exploited state variation occurring in core-based statistical areas (CBSAs) that cross state borders. CBSAs are designed to capture an urban area and the surrounding counties that are socioeconomically interdependent with that urban area. Unlike many Census areal units that are fully contained within states, CBSAs are defined as groupings of counties that are not bound by state borders. CBSAs constitute areas expected to share similar socio-demographic and economic conditions; however, eviction filings are governed by the landlord-tenant code of the state in which the county is located. This variation allowed us to examine differences in eviction notice requirements for nonpayment cases in otherwise comparable areas.
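The absorption of time-invariant notice requirements into state fixed effects can be written out explicitly. The notation in this sketch is ours, not the authors': $y_{st}$ is the logged filing rate in state $s$ and year $t$, $N_s$ is the required days' notice (constant over the study period), and $\alpha_s$ is a state-specific intercept.

```latex
\[
y_{st} = \alpha_s + \beta N_s + \varepsilon_{st}
       = \underbrace{\left(\alpha_s + \beta N_s\right)}_{\tilde{\alpha}_s} + \varepsilon_{st}
\]
```

Because $N_s$ does not vary within a state, only the combined intercept $\tilde{\alpha}_s$ is identified, and $\beta$ cannot be separated from $\alpha_s$. Comparing counties within the same cross-state CBSA restores variation in $N_s$ among otherwise similar units.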
We used a regression discontinuity design to detect the associations between notice requirements and filing rates across state borders. We placed two restrictions on our analytic sample. First, we included only counties with validated court-issued eviction filing counts. We restricted the sample in this way to eliminate concerns that the covariates used to predict filing counts in the Bayesian model may be correlated with state policy conditions, thus inflating the association between notice requirements and filing rates. Second, the county had to be located in a CBSA that crossed state borders. We included cross-state CBSAs in the sample regardless of whether notice requirements changed at the state border; CBSAs without variation in notice requirements were necessary to help estimate the associations (37). Our sample included 230 counties in 39 CBSAs that met these criteria. Fig. S13 shows the counties included in the sample and the distribution of notice requirements across states. Table S8 provides a list of the CBSAs included in the sample.
We fit a longitudinal linear regression model with random effects at the county level and fixed effects for year and CBSA. The random effects were appropriate for repeated observations within counties across time. Including fixed effects at the year level helped account for any exogenous variation in filing rates over time. CBSA fixed effects helped control for local time-invariant conditions that may be associated with the likelihood of eviction filings. The outcome was the logged eviction filing rate. We calculated this by dividing the number of court-issued filings by the total number of renting households in the county and then taking the natural log. We added a small constant (0.01) to avoid dropping the few county-years in which the filing rate was zero. Excluding these counties did not affect the results. We logged the filing rate to adjust for skew (i.e., the majority of counties had very low filing rates, while a few consistently had more filings than renting households).
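One way to write the model described above is sketched below. The notation is ours, and the sketch assumes that the county-level random effects are random intercepts, which the text does not state explicitly. Here $F_{ct}$ and $RH_{ct}$ are court-issued filings and renting households in county $c$ and year $t$, $m(c)$ is the CBSA containing county $c$, and $\mathbf{x}_{ct}$ collects the notice-requirement indicators and controls.

```latex
\[
\log\!\left(\frac{F_{ct}}{RH_{ct}} + 0.01\right)
  = \mathbf{x}_{ct}^{\top}\boldsymbol{\beta}
  + \gamma_{m(c)} + \delta_t + u_c + \varepsilon_{ct},
\qquad
u_c \sim \mathcal{N}(0, \sigma_u^2), \quad
\varepsilon_{ct} \sim \mathcal{N}(0, \sigma^2)
\]
```

In this form $\gamma_{m(c)}$ and $\delta_t$ are the CBSA and year fixed effects, and $u_c$ captures the dependence among repeated observations of the same county.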
The primary covariate of interest was a categorical measure of required days notice for nonpayment filings. Although the original measure was interval (number of days), we created a categorical measure due to concerns that the association between days required notice and eviction filings may not be linear; for example, additional days' notice near the beginning of the interval (e.g., one to three days) may be more consequential to the filing rate than additional days near the end of the interval (e.g., 12-14 days). We separated days' notice into four categories: We tested alternative specifications of these categories (e.g., 0 days, 1-6 days, 7-13 days, and 14 or more days), but chose the shorter windows due to the lack of states requiring more than 14 days' notice prior to nonpayment filings. We entered this measure into the model as a series of indicators to reflect the assumed non-linearity in association between notice period and filing rate. Descriptive statistics for this measure (and all others included in the analysis) are shown in Table S17. The models contained several additional control variables. First, we included the county-level socio-demographic variables from the Bayesian model of case filings: percent renting households, share African American population, household density, median household income, median rent, and unemployment rate. We logged household density, median household income, and median rent to adjust for skew (Model 1 in Table S9). We also fit a model that included the number of courts that hear eviction cases, but this did not substantively change the association between notice requirements and filing rate (Model 2 in Table S9). We did not include this model in Fig. 4 due to space considerations.
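The conversion of days' notice into a series of indicators can be sketched as follows. Because the four categories used in the final models are not listed in this text, the sketch uses the alternative specification that is mentioned (0 days, 1-6 days, 7-13 days, 14 or more days). The function names and the choice of "0 days" as the reference category are our assumptions.

```python
# Alternative category specification mentioned in the text.
CATEGORIES = ["0 days", "1-6 days", "7-13 days", "14+ days"]


def notice_category(days: int) -> str:
    """Map required days' notice to one of the four alternative categories."""
    if days < 0:
        raise ValueError("days must be non-negative")
    if days == 0:
        return "0 days"
    if days <= 6:
        return "1-6 days"
    if days <= 13:
        return "7-13 days"
    return "14+ days"


def notice_indicators(days: int, reference: str = "0 days") -> dict:
    """Return 0/1 indicators for every category except the reference."""
    category = notice_category(days)
    return {c: int(c == category) for c in CATEGORIES if c != reference}


example = notice_indicators(7)
```

Entering the indicators rather than the raw number of days lets each category have its own coefficient, so the model does not force a linear relationship between notice period and filing rate.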
We then included a series of policy measures capturing other aspects of the landlord-tenant legal environment that may affect how often landlords file eviction cases. First, we used an indicator for whether a state requires that landlords have a just cause to terminate tenancy at the end of a lease. This measure was also taken from the LSC Eviction Laws Database (36). Although this requirement itself may not substantially affect the number of eviction filings, it may help capture the general character of the landlord-tenant legal regime (35). We also included the fee required to file an eviction case (see Section 1.3.13) as a measure of the relative ease (or difficulty) of case filing. Filing fees were consistent across all counties in 28 states and tended to be very similar across counties in states that did not have fixed fee schedules (Table S6). In general, lower filing fees were associated with higher eviction filing rates (Fig. S30). Finally, we included a state-level indicator for whether filings were eviction-specific or represented all landlord-tenant cases. In practice, discrepancies between these counts are very minor as most landlord-tenant filings are eviction cases (see Section 3.2), but this indicator also helped to verify that any associations with filing rates were not due to the inclusion of non-eviction complaints in the filing counts. While filing fee and landlord-tenant indicators were both significantly associated with filing volume, they did not explain the association between notice requirements and filing rates (Model 3 in Table S9).
Finally, we fit two additional models with more stringent restrictions on sample inclusion to test the robustness of these associations. These additional restrictions imposed criteria expected to increase comparability of the counties included within CBSAs. First, we restricted the sample to counties that were located directly along the state border. This restricted our sample to 168 counties in the 39 CBSAs but did not substantively alter the estimated associations between notice requirements and filing rates (Model 4 in Table S9). Second, we restricted the sample to counties that had population centers located within 25 miles of the state border (38). Again, this resulted in a smaller number of counties in the sample (N=185) but no substantive change in the findings (Model 5 in Table S9).
We also tested the robustness of the specification of notice requirements. We fit an identical set of models that used a binary indicator of required notice (1=landlords required to provide notice to tenants before filing a nonpayment case, 0=no notice required); these results are shown in Fig. S31. As expected, the indicator was negative and significant across all models.
In practice, the number of days required notice before filing usually approximates the minimum number of days a tenant can be late on rent before the landlord may initiate an eviction lawsuit. To demonstrate this, we created an equivalent categorical measure of the number of days a tenant can be late on rent before having a case filed against them from the LSC Eviction Laws Database (36). We updated this measure for any changes in legal code that occurred between our study period and the creation of the LSC dataset (Table S7). This measure produced almost identical findings (Fig. S32), suggesting that policies governing the timing of eviction filings by landlords, including grace periods for late rent payments, are significantly associated with how frequently eviction cases are filed, even after accounting for local demographic and socioeconomic conditions.

These analyses do not establish the causal effect of notice requirements for nonpayment of rent on eviction filings, but they suggest that state-level landlord-tenant legal code can play an important role in shaping risk of an eviction filing. The estimated associations are also substantial. This is likely due to the high filing rates observed in states that do not require prior notice before filing eviction cases for nonpayment of rent or do not have a minimum grace period during which tenants can pay past-due rent before receiving an eviction lawsuit. This has important practical implications. Landlord-tenant policies that incentivize landlords to legally document that tenants are past-due on rent or to use the courts to enforce rent collection (4) can put tenants at higher risk of receiving an eviction filing. This complicates commonly held beliefs that eviction filing prevalence is a straightforward reflection of unpaid rent. Two states could have similar proportions of tenants with outstanding rent balances but very different rates of eviction filings. In states that require landlords to provide a minimum period of notice (or a grace period for unpaid rent), tenants may be less at risk of receiving the permanent mark of an eviction filing in their housing history. As mentioned in the main text, eviction filings, not just orders to vacate rental properties or executed evictions by the local sheriff, are recorded in housing histories, limiting future access to rental housing (39).
Additional research is necessary to establish the causal effects of specific landlord-tenant policies on frequency of eviction filings and risk of displacement due to eviction. Eviction notice requirements may be associated with other aspects of state-level landlord-tenant legal code that shape risk of being threatened with eviction; these policy regimes may affect this risk in ways that extend far beyond the demographic or socio-economic factors that have been identified as increasing risk of receiving an eviction filing or forced displacement due to eviction. Maryland and New Jersey, states that have been frequently categorized as "renter-friendly" in previous studies of landlord-tenant legal regimes (e.g., 35), consistently report higher than average filing rates and do not require landlords to provide tenants with notice prior to nonpayment filings (nor do they specify a minimum number of days a tenant can be late on rent before receiving an eviction filing), demonstrating that there is much work to be done in understanding the characteristics and effects of landlord-tenant policy regimes.

Eviction Judgments
In this paper, we measured eviction prevalence as the number of eviction filings and households threatened with eviction annually. While these numbers illuminate the burden of eviction on the court system and the threat eviction poses to housing security, they do not tell us how many households are displaced by eviction. This number is more difficult to estimate for several reasons. First, court records do not provide information about whether tenants vacated the rental property before, during, or after an eviction filing. When multiple cases were filed against the same household, we knew that tenants remained at the property after the previous filing but did not know whether the case series ultimately resulted in displacement. For households who only appeared once in the data, we had no means of assessing whether the single filing resulted in displacement. Second, courts do not uniformly record whether writs of restitution were issued for the disputed property following a judgment in favor of the landlord. Forcible removal of tenants by the sheriff or marshal is perhaps the strictest definition of an eviction (31), yet counting only these events as evictions ignores the many tenants who vacate properties as a direct result of legal eviction action before a writ of restitution is granted or executed. Third, while a judgment in favor of the landlord restores legal authority over the property and/or grants the ability to extract past-due rent and fees from tenants, court records often contain unclear information about the terms of the judgment and whether landlords ultimately sought restitution of the property. Furthermore, the quality of this information and how it is recorded vary across court records, making direct comparisons of the volume of eviction judgments very difficult.
Most of the individual records data (both court-issued and proprietary) contained some information about how cases were resolved. The court-issued data often contained the judgment codes used by the courts; some were detailed enough to determine whether restitution of property to the landlord was included in the judgment, while others used only broad categories of judgments (e.g., bench verdict, consent judgment, default judgment, mediated settlement). Only a few states directly recorded which party the judgment favored. In some states, codes were too uninformative or ambiguous to determine whether the judgment was in favor of the landlord or tenant. The proprietary data, on the other hand, did not contain court judgment codes but marked judgments as small claims, civil judgments, or forcible detainer judgments in favor of the plaintiff. Small claims and civil judgments typically indicate that defendants were ordered to pay the plaintiff a monetary settlement, while forcible detainers involve restitution of property to the landlord. It was not possible to determine whether restitution of property was included in the small claims or civil judgments, which constituted the overwhelming majority of judgments in the data. Furthermore, 32.1% of cases in the proprietary data did not contain judgment information. This judgment information was not missing at random. Some dispositions, including dismissals, were less likely to be recorded in the data. Additionally, some states were less likely to have judgment information included for cases, including Georgia, New Jersey, and Utah.
We attempted to develop a systematic marking and tabulation of eviction judgments across data sources and states but found little agreement between court-issued and proprietary data when comparing aggregated judgment counts. There could be several reasons for these discrepancies. Lack of clear judgment information in some of the court-issued data made it difficult to mark eviction judgments in a way that was consistent with the judgment categories included in the proprietary data. The lack of original court judgment codes in the proprietary data made it impossible to reconstruct the information available in the court-issued data. To be clear, neither the court-issued nor the proprietary data were created with the purpose of tracking displacement due to eviction, underscoring the limitations of using administrative data to answer this type of research question. Furthermore, the proprietary data recorded the current case status at the point of data collection. If judgments or other actions occurred on the case following the date of collection, the information may not have been updated in these data. This may explain why some cases lack judgment information: when the case information was collected, the case was still pending. The court-issued data would likely reflect the most up-to-date information on case status, although some court-issued data files also contained non-trivial numbers of cases missing judgment information.
Although we were unable to create a comprehensive measure of eviction judgments, we calculated this measure for the court-issued individual records to provide a general sense of how many case filings resulted in eviction-related judgments. We first provide details about how we marked eviction judgments in the court-issued data. We then compare aggregated eviction judgments across 16 states. Although the court-issued data have more complete judgment information than the proprietary data, missing and ambiguous judgment information in the court-issued data still limits the ability to accurately determine how cases were resolved.

Marking Eviction Judgments
When possible, we marked cases as resulting in an eviction judgment if the final action recorded for a case was a judgment in favor of the plaintiff. Judgment information varied significantly across the court-issued data files. Some files included information on which party the judgment favored or included straightforward judgment codes indicating unlawful detainer or monetary judgments. When this information was not included in the data, we marked judgment types likely to be in favor of the plaintiff (e.g., default, consent, and uncontested) as eviction judgments. For cases without clear judgment information, it was impossible to discern how the case was resolved. These cases could represent unresolved cases or cases in which there was a judgment (for eviction or otherwise) that was not entered into the case management system. We did not count cases lacking judgment information as resulting in eviction judgments. We were able to observe case filings and some form of judgment information from court-issued data for 7,002 county years in the following states and counties:
• Alabama
In some instances, we observed an eviction judgment issued against a household but knew it did not result in displacement because the same household was observed at the same property on a subsequent case filing. In states and counties with complete name and address information for records, we could adjust for this. We created a secondary measure of eviction judgments in which we counted a case as having an eviction judgment only if it was the final case observed in the series of filings against the same household. We did not count eviction judgments on the non-final cases in the series, as we knew the previous judgments did not result in displacement. This adjustment did not affect households that appeared once in the data, as one case represented their first and last filing.
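The marking rule and the repeated-filing adjustment described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used for this paper: the field names (`case_id`, `name`, `address`, `filed`, `judgment`), the set of plaintiff-favoring codes, and the sample records are all hypothetical. It also assumes that each record carries a single final judgment code and that a household is identified by an exact match on name and address.

```python
# Hypothetical judgment codes treated as favoring the plaintiff.
# Real court files use jurisdiction-specific codes.
PLAINTIFF_CODES = {"plaintiff", "default", "consent", "uncontested"}


def mark_judgments(cases):
    """Primary measure: flag a case if its final recorded judgment
    favors the plaintiff. Missing judgments are never flagged."""
    return {c["case_id"]: c.get("judgment") in PLAINTIFF_CODES for c in cases}


def adjusted_judgments(cases):
    """Secondary measure: keep a flag only on the last case observed
    in the series of filings against the same household."""
    last_case = {}
    # ISO date strings sort chronologically, so later filings overwrite earlier ones.
    for c in sorted(cases, key=lambda c: c["filed"]):
        last_case[(c["name"], c["address"])] = c["case_id"]
    final_ids = set(last_case.values())
    primary = mark_judgments(cases)
    return {cid: flag and cid in final_ids for cid, flag in primary.items()}


# Hypothetical sample records for illustration only.
sample_cases = [
    {"case_id": "A1", "name": "J. Doe", "address": "1 Elm St",
     "filed": "2016-01-05", "judgment": "default"},
    {"case_id": "A2", "name": "J. Doe", "address": "1 Elm St",
     "filed": "2016-03-02", "judgment": "default"},
    {"case_id": "B1", "name": "R. Roe", "address": "2 Oak St",
     "filed": "2016-02-01", "judgment": None},
    {"case_id": "C1", "name": "P. Poe", "address": "3 Pine St",
     "filed": "2016-04-01", "judgment": "consent"},
]

primary = mark_judgments(sample_cases)
adjusted = adjusted_judgments(sample_cases)
```

In this toy example the first filing against the repeated household (A1) is flagged under the primary measure but not under the adjusted measure, while the household with a single filing (C1) is unaffected by the adjustment.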
We were able to calculate this secondary measure for 6,549 county years in the following states and counties:
We could not create this secondary measure for San Francisco County, Connecticut, DC, Kansas, or South Dakota due to missing name and address information. Although included in the adjusted measures, partial name and address missingness in Maricopa County, Arizona and Virginia may have resulted in incomplete adjustment for repeated filings against tenants.
Fig. S33 shows the average rate of case filings ending in eviction judgments across states. In seven states, more than 50% of the cases filed ended in eviction judgments, while eviction judgments were less common in other jurisdictions. For example, fewer than 25% of cases filed in DC ended in eviction judgments, suggesting that more cases may be dismissed, judged in favor of the defendant, or remain unresolved. Adjusting for repeated filings against the same household also produced larger reductions in the eviction judgment rate in some jurisdictions, including New York City and Virginia, relative to others. This suggests that filings result in different rates of eviction judgments across states and that some repeated filings result in eviction judgments that do not routinely lead to displacement of tenants.
Fig. S34 highlights these differences by plotting comparisons between average rates of case filings, households threatened with eviction, and adjusted eviction judgments. Filings in North Dakota and Hawaii are more likely than those in other states to represent cases against unique households and, in the case of North Dakota, to end in eviction judgments. New York and North Carolina have high filing rates, yet comparatively average eviction judgment rates. It is therefore difficult to make assumptions about the number of eviction judgments or displacement due to eviction based on the number of filings or unique households threatened with eviction in a state.
Certainly, higher rates of filings are often associated with repeating cases against the same households, but some states still have relatively higher percentages of filings ending in eviction judgments than others, even after accounting for repeated filings. The differences in filing behavior by landlords across states are not clearly communicated in eviction records, leaving tenants in states with high filing rates with relatively inflated numbers of eviction cases in their housing histories.

Eviction Judgment Prevalence
Although these numbers are restricted in scope, these 11 states and 11 counties recorded an average of 342,139 eviction judgments annually. This is an average of 53.7% of filings resulting in eviction judgments (40). If these states are representative of the rate of filings resulting in eviction judgments in the national perspective, more than 1.9 million eviction judgments would be issued on eviction cases, on average, in the US each year. The magnitude of case filings, households receiving at least one filing, and eviction judgments annually demands better accounting of displacement due to eviction across the US, including work that acknowledges that filing rates differ systematically across states and do not accurately capture households displaced due to eviction.
Note: "Court = Prop." = exact match of court-issued and proprietary data; "Court ≈ Prop." = court-issued and proprietary data not an exact match but difference was less than 10 cases or 5% of filings reported in court-issued data; "Court < Prop." = court-issued data reported fewer cases than proprietary data (difference was more than 10 cases or 5% of filings in court-issued data); "Court > Prop." = court-issued data reported more cases than proprietary data (difference was more than 10 cases or 5% of filings in court-issued data); Mean % diff is the average percentage difference in court-issued and proprietary filing counts for county-years with more than 10 filings reported in the court-issued data.
Note: "Court ≈ Ests." = difference between court-issued filing counts and Bayesian estimates ("Ests.") was less than 10 cases or 5% of filings reported in court-issued data; "Court < Ests." = court-issued data reported fewer cases than Bayesian estimates (difference was 10 or more cases or greater than 5% of court-issued filings); "Court > Ests." = court-issued data reported more cases than Bayesian estimates (difference was 10 or more cases or greater than 5% of court-issued filings); Mean % diff is the average percentage difference in court-issued filing counts and Bayesian estimates.
Blue lines show linear, bivariate association; shaded gray areas show 95% confidence intervals. N=980 counties with court-issued filing counts and complete ACS data on population, median rent, poverty rate, and rent burden. We did not include counties with estimated filing counts from the Bayesian posterior distribution as the model included median rent and number of renting households, which would be expected to be strongly correlated with population.

Supplementary Tables
Fig. S10. Estimated eviction filings in demographically identical counties, by state, 2018. All covariates (number of renting households, household density, percent African American population, median income, median rent, unemployment rate, and number of courts that hear eviction cases) are set equal to overall means and county and regional variation are marginalized over. Error bars show 95% credible intervals.
Filing rates for each state are plotted against one another for the years indicated on the axes. Filing rates calculated as predicted number of filings divided by number of renting households. The black line marks the diagonal. States falling close to the diagonal have roughly the same case rate in both years. Although some states shift above or below the diagonal, the relative disparities between states do not shift substantially.

Covariate-State Interactions
Error Scale
Rates of repeated filings against households were calculated by dividing the number of filings against households that had already experienced at least one prior case filing within the same calendar year by the total number of filings. Collection of case filing fees is discussed in Section 1.3.13; filing fees were averaged across counties within states. Labels indicate state abbreviation. Solid blue line shows linear association. There is a negative relationship between filing fees and the percent of cases that represent repeat filings against the same household. Cost of filing is not a direct measure of the difficulty or ease of filing eviction cases across states; however, lower fees may reflect other aspects of the filing process that create lower barriers to entry for landlords using the court system. For example, very low filing fees in Maryland likely reflect the use of case filings as the initial eviction notice to tenants, resulting in substantially higher landlord filing volume.
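The repeated-filing rate defined in this note can be illustrated with a short sketch. It is not the authors' code: the input layout (one `(household_id, year)` pair per filing) and the household identifiers are hypothetical, and it assumes households have already been matched across filings.

```python
from collections import Counter


def repeat_filing_rate(filings):
    """Share of filings made against a household that already had at
    least one prior filing in the same calendar year.

    `filings` is an iterable of (household_id, year) pairs, one per filing.
    """
    seen = Counter()  # filings observed so far per (household, year)
    repeats = 0
    total = 0
    for household, year in filings:
        if seen[(household, year)] > 0:
            repeats += 1  # this household already had a filing this year
        seen[(household, year)] += 1
        total += 1
    return repeats / total if total else 0.0


# Hypothetical example: household "h1" is filed against three times in 2016
# and once in 2017; household "h2" is filed against once in 2016.
example_filings = [
    ("h1", 2016), ("h1", 2016), ("h1", 2016),
    ("h2", 2016),
    ("h1", 2017),
]
example_rate = repeat_filing_rate(example_filings)
```

Because the count resets each calendar year, the 2017 filing against "h1" is not treated as a repeat, so two of the five example filings count as repeated.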
Rates calculated using total renting households as denominator and averaged across years.