Shift in house price estimates during COVID-19 reveals effect of crisis on collective speculation

We exploit a city-level panel composed of individual house price estimates to estimate the impact of COVID-19 on both small and big real-estate markets in California, USA. Descriptive analysis of spot house price estimates, together with the contemporaneous price uncertainty and 30-day price change for individual properties listed on the online real-estate platform Zillow.com, facilitates quantifying both the excess valuation and the valuation confidence attributable to this global socio-economic shock. Our quasi-experimental pre-/post-COVID-19 design spans several years around 2020 and leverages contemporaneous price estimates of rental properties - i.e., real estate entering the habitation market, just not for purchase (off-market) and hence free of speculation - as an appropriate counterfactual to properties listed for sale, which are subject to on-market speculation. Combining unit-level matching and multivariate difference-in-difference regression approaches, we obtain consistent estimates regarding the sign and magnitude of excess price growth observed after the pandemic onset. Specifically, our results indicate that properties listed for sale appreciated an additional 1% per month above what would be expected in the absence of the pandemic. This corresponds to an excess annual price growth of roughly 12.7 percentage points, which accounts for more than half of the actual annual price growth in 2021 observed across the studied regions. Simultaneously, uncertainty in price estimates decreased, signaling the irrational confidence characteristic of prior asset bubbles. We explore how these two trends are related to market size, local market supply and borrowing costs, which altogether lend support for the counterintuitive roles of uncertainty and interruptions in decision-making.
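As a quick check on the arithmetic, and assuming that the 1% monthly excess compounds at a constant rate over 12 months (an assumption made here for illustration, not a description of the estimation procedure), the annual figure follows as:

```latex
\[
(1 + 0.01)^{12} - 1 \approx 0.127 ,
\]
```

i.e., roughly 12.7 percentage points, slightly above the 12 points obtained by simple (non-compounded) addition.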

One of the most impactful financial life-course events that individuals may encounter is buying a house, and in the United States (US) this fundamental decision is increasingly facilitated by online real-estate platforms such as Zillow.com, Trulia.com and Redfin.com. These marketplace service platforms aggregate available property information into virtual marketplaces, thereby facilitating the rapid and remote comparison of individual candidate houses, estimation of mortgage repayment schedules, and assessment of the overall real-estate market. Their user bases are broad, including professional investors, traditional homeowners and sellers, and casual browsers alike [1]. Consequently, the inflow of high-frequency market information that is aggregated by online real-estate platforms informs potential buyer and seller speculation, defined as near-term expectations of price and price movements [2], which is invariably conditioned by individuals' varying and acutely sensitive perceptions of uncertainty.
Against this backdrop, one of the many perplexing outcomes of the COVID-19 pandemic was the emergence of exuberant markets in the US after the dust settled from the first shock wave. This was particularly evident in the housing market, as illustrated in Fig. 1(A), which shows the official US government All-transactions House Price Index for several regions in California (CA), where average home sale prices grew by up to 23% in 2021. Similar levels of price appreciation occurred in metropolitan areas across the US.
The initial reaction of US financial and housing markets to the COVID-19 outbreak was sharply negative, as this pervasive shock disrupted the health and security of individuals, thereby extending to entire socio-economic systems [3-6]. So why the rapid turnaround in these markets in the second half of 2020? Prior empirical and theoretical work on real-estate markets establishes various factors underlying market volatility, but distinct differences in situational context make it challenging to infer whether or not the prior insights readily extend to the events defining 2020-2021. One particular factor unique to the pandemic period was the stay-at-home orders that promptly and unexpectedly thrust an entire society into interpersonal interaction and information consumption modes that were entirely mediated by the internet and electronic displays, the impacts of which are only now beginning to be understood [7]. This situational context is relevant to our study given the prevalence and behavioral impact of online real-estate platforms in the US [1,8-10], which establishes the conditions for multi-scale correlated phenomena that underlie collective herding behavior [11-16]. Other relevant factors include the rapid deployment of work-from-home accommodations that decreased the demand for metropolitan amenities [17], and also shifted perspectives on work-life balance and associated household expenditures [18,19].
Understanding the housing market's response to macroeconomic shock is critical to understanding the resilience of this fundamental global market. However, unlike stock markets, where an abundance of high-frequency data provides a clear avenue for analyzing market response to both anticipated and surprise news [20,21], there are scant high-frequency data sources for operationalizing such research on the real-estate market, even during 'normal' market periods. In this regard, our data collection approach exhibits the utility of novel high-resolution and real-time altmetrics for research at the intersection of real estate and urban development [22-28].
In particular, we contribute to the literature on real-estate market dynamics and speculation by tracking individual property valuations for nearly two years before and two years after the onset of the pandemic in January 2020, which we hereafter denote by "1/2020". A distinguishing feature of our study is the construction of a high-resolution property-level dataset that captures two specific elements necessary for analyzing price speculation: (a) the 30-day change in estimated house price, which measures near-term price movements; and (b) the high-low range in estimated house price, which quantifies uncertainty in price expectations.
As such, our multi-year analysis leverages the sudden emergence of widespread uncertainty as an instrument for analyzing the impact of collective speculation. We leverage this systematic market shift by implementing a difference-in-difference research design that compares price dynamics for properties listed for sale (on-market) to properties listed for rent (off-market) from the same neighborhood. These matched rental properties that were simultaneously available - just not for sale, and thus transparent to speculation deriving from short-term expectations of returns via resale - provide a counterfactual baseline for estimating our main result: to what degree was excess real-estate price growth attributable to COVID-19 pandemic uncertainty?
In what follows we address this question by way of the following three research questions. First, what are the characteristics of high-frequency real estate price dynamics at the 1-month resolution, and to what degree did they change after the COVID-19 pandemic? Second, to what degree did the pandemic shock to market uncertainty affect collective speculation - namely, in house price estimates and certainty in those estimates? And third, how did shifts in speculation relate to fundamental market factors, such as market size, supply, and benchmark borrowing rates? While our results are based upon regions in California, they are likely generalizable to similar US regions featuring speculative growth and subsequent price relaxation (see "Shift in home prices since their respective 2022 peak" (L. Lambert, Fortune.com); corresponding full article published by Fortune.com) given the ubiquity of the underlying market factors (low interest rates, high uncertainty, supply constraints) during our sample period.
A common methodology in the real-estate literature is hedonic regression modeling, applied to identify attributes associated with a given property and neighborhood that are positively and negatively correlated with property valuations. Hedonic factors include property-level features such as building type, materials and floor area, combined with important local amenities [23] such as access to public transportation [40,41], and security of clean tap water [42-44]. Other studies identify externalities that are pervasive, such as climate change impacts on tree shade coverage [45].
We do not employ hedonic factor analysis in this work, because our data source lacks consistent property-level features. Moreover, we do not model the estimated property valuation nor its final sale price. Instead, we take estimated property valuations as a given, and then analyze how valuation changes are correlated with micro-economic factors such as market size, local housing supply, and benchmark borrowing rates.
According to established economic theory, lower mortgage rates contribute to increased housing demand [30,46]. Yet few housing market analyses are performed over periods featuring systematic urban-to-rural migration, as observed in the US during the pandemic [47], because most studies focus principally on select large metropolitan markets. Hence, there is scant research comparing urban and rural markets within the same region and period. As such, a distinction of the present work is the construction of a balanced panel of multiple neighboring regions, for both large and small market size, over a significant time horizon. We selected the 10 locations analyzed hereafter based upon the accessibility of consistent property-level data from Zillow.com, familiarity with the region, and most importantly, regional context. Namely, California has been affected for decades by an affordable housing crisis that is concentrated in regions with high wealth inequality, the Bay Area mega-region being a case example [48,49].
Another issue that has limited research on real-estate market dynamics is the scant availability of high-resolution data at the property level. Instead, market research commonly employs annualized property sales data that are aggregated at the regional level, which fails to capture market dynamics of individual properties. This choice follows from the technical challenges associated with assembling a balanced panel composed of data with high spatiotemporal resolution by sourcing data from online real-estate platforms, with few examples [25,26]. Instead, much of the related micro-economic literature uses house sales transaction data aggregated as mean values over sizable regions such as US ZIP codes or census tracts [17,19,30,32,34,50]. One example is the recent work by Mondragon & Wieland [51], who use house transaction data aggregated across US counties over the period 12/2018-11/2021, reporting that a 1% increase in a region's share of remote work explains a 0.93% increase in average house prices across the US, which accounts for roughly half of the price growth over the period analyzed.
As the unit of analysis in our study is an individual property, as opposed to the median or average property within a specific zip code or other regional unit, we also contribute to empirical research on micro-level asset price dynamics [22]. Various asset classes, such as stock prices, firm sizes and human productivity, are amenable to analysis over variable time windows ranging from intraday, to monthly, to intra-annual and decadal scales [12,52-54]. The most relevant study of real-estate market dynamics is by [30], who analyze capital gains on sold properties over a 5-year horizon for the specific region of San Diego, CA. We are unaware of research analyzing the dynamics of individual real-estate valuations at the 1-month frequency, which is a unique feature of our data source.
A final consideration regarding the extant COVID-19 research is the predominant focus on the short-term market decline in real estate markets immediately following the onset of the pandemic [17,34,35,55]. This focus neglects the overwhelming market reversal that followed the initial negative market reaction. Such a narrow window also disregards the pre-existing trends in market appreciation that preceded the pandemic in California, USA and elsewhere.

Methods
Data source. We constructed a balanced 10-region panel with four notable features. First, we collected property-level data at high spatiotemporal resolution from a prominent online real-estate platform (Zillow.com). As such, the fundamental unit of observation in this balanced panel is the individual house listing, which distinguishes our study from much of the prior literature. In total, the dataset is composed of 57,414 individual property listings from 10 regions spanning a nearly 4-year time period (2018-2021) [56]. Notably, we do not include off-market properties (those that are not listed either for sale or rent on Zillow.com), even though Zillow Inc. produces and updates property valuation estimates for all on- and off-market properties within its massive and near comprehensive real-estate data for the US market.
Figure 1(B) shows the location of the 10 regions, which are official administrative units in CA. Individual house-level data were collected from the official Zillow Inc. application programming interface (API). For each month (from March 2018 to September 2021) and each region, we used the open-access Zillow Inc. GetSearchResults API to collect comprehensive data on all on-market properties belonging to either of two property categories: "For Sale" and "Rent". For further elaboration on the available house-level data see the official Zillow API page [57].
As such, because our panel includes high variation in region sizes and population density, these data can be used to compare market dynamics according to housing market size. Three regions are associated with big (urban) markets (San Jose, Modesto, Fresno), and the remaining seven are associated with small (rural) markets, as proxied by the principal city population for each region. Because these regions all belong to the Bay Area mega-region, connected together by a major public highway, we are able to estimate the differential impact of the pandemic on urban versus rural settings within the same macro-region backdrop. This approach distinguishes our study from other studies that focus on just the largest metropolitan real-estate markets.
Second, as the top real-estate website in the U.S. in 2021 with roughly 36 million visits per month [8], Zillow Inc. is a leading real-estate platform in an increasingly ubiquitous IT service sector [58]. By maintaining a nearly real-time catalogue of available listings and estimated valuations, Zillow facilitates comprehensive market assessment in addition to mediating buyer-seller interactions. Consequently, data obtained from the Zillow API are algorithmically consistent, which is critical for analyzing simultaneous snapshots of entire regional housing markets. Alternative methods collecting ask and sales prices from regional multiple listing services (MLS) involve data collected from different brokers, realtors and sellers, and do not satisfy this consistency criterion.
Third, our dataset includes quantitative measures of speculation and uncertainty within the real-estate asset class, for which little is known. Specifically, Zillow collects, integrates and calculates real-time house price estimates, including a 30-day price estimate change, along with high and low price estimates for each property. These property valuations derive from a proprietary in-house algorithm that estimates individual house prices based upon a massive and near comprehensive historical database extending back to the mid 2000s, including ask prices elected by the sellers and subsequent sale prices. These primary source data are readily available to the public and have fostered data science education and research by way of open competitions [57,59].
Zillow house price estimates are not only calculated at the point of market entry (typically when the seller declares a public ask price), but are also interpolated between prior listing and future price updates in real time. As such, even though Zillow price estimates are algorithmically determined, they integrate contemporaneous macroeconomic, regional, neighborhood and house-specific factors, rendering the estimates consistent and robust. Moreover, price estimates are rapidly calibrated to property sale events - not only of the individual property itself, but also its neighbors, which contributes to a collective mode of price formation and speculation [26]. This is in contrast to non-centralized data sources such as Multiple Listing Service (MLS) databases, which aggregate listing information that may depend upon realtors' and owners' idiosyncratic understanding of price formation and speculation.
A fourth feature of the data source is the consistent property value estimation for properties listed for sale and for rent. In the present study, rental properties entering the habitation market played an important role in accommodating the desire to escape high population density and/or to take advantage of remote work opportunity - two factors associated with the pandemic housing market. Hence, in what follows we juxtapose the price dynamics for these two distinct classes of available real estate to estimate the impact of pandemic speculation on the housing market. The key distinction is that buyer-seller interactions implicitly incorporate speculation on future price movements. By contrast, rental property owners instead opt for a revenue strategy based upon cash flow derived from future rents, which is less dependent on property and real-estate market speculation. To be clear, data obtained for rental listings are not monthly rent estimates, but are estimated valuations of the rental property, i.e., deriving from the same algorithm as those properties that are listed for sale, rendering these distinct property classes directly comparable.
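The comparison between the two property classes can be sketched as a simple 2x2 difference-in-difference of group means. This is a minimal illustration only: the function name `did_estimate` and all numbers are hypothetical, and the sketch omits the unit-level matching and multivariate regression controls used in the actual analysis.

```python
from statistics import mean


def did_estimate(sale_before, sale_after, rent_before, rent_after):
    """2x2 difference-in-difference of group means.

    Treated group: properties listed for sale (exposed to speculation).
    Control group: properties listed for rent (counterfactual baseline).
    """
    treated_shift = mean(sale_after) - mean(sale_before)
    control_shift = mean(rent_after) - mean(rent_before)
    # The control shift proxies the change in fundamentals; subtracting it
    # leaves the shift attributed to speculation.
    return treated_shift - control_shift


# Made-up 30-day percent price changes, for illustration only.
sale_before = [0.5, 1.0, 1.5]
sale_after = [2.0, 2.5, 3.0]
rent_before = [0.4, 0.8, 1.2]
rent_after = [0.9, 1.3, 1.7]

excess = did_estimate(sale_before, sale_after, rent_before, rent_after)
print(f"excess monthly price growth: {excess:.2f} percentage points")
```

In this toy example the for-sale group shifts by 1.5 points and the rental group by 0.5 points, so the estimated excess is about 1 point per month.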
Data Collection. We obtained data for 10 proximal CA cities and their surrounding regions belonging to the Bay Area mega-region shown in Fig. 1(B). The largest principal city by population is San Jose (∼1 million inhabitants in 2021); and by area is Fresno (116 square miles); the smallest city by population is Mariposa (∼1500 inhabitants) and by area is Livingston (3.7 square miles). For spatiotemporal context, the distance separating San Jose and Fresno is roughly 150 driving miles (240 km) corresponding to 2.5 driving hours. Despite a wide variation in size, location and socio-economic backdrop, these 10 regions all feature shortages in affordable housing, a longstanding problem plaguing California and various other metropolitan areas in the United States [48,49]. Seven of the principal cities are located along a major industrial and commuter transportation highway (CA 99), and are within the 3-hour super-commuter travel-time from the greater Bay Area, thereby qualifying as bedroom communities. Conversely, two regions (Mariposa and Oakhurst) are oriented around recreational tourism in and around Yosemite National Park. Altogether, these municipalities span a wide range of house prices, market size and turnover to support within and across-city analysis at high geo-temporal resolution.
In the remainder of the analysis, we denote data sampled between March 2018 and May 2019 as "before 1/2020", and data sampled between May 2020 and September 2021 as "after 1/2020". See Fig. S2(A,B) for monthly sample sizes for data grouped by property type ("For Sale" and "Rent"); and Fig. S2(C,D) for sample sizes grouped by 6-month non-overlapping periods that facilitate a visual comparison of average-property trends before and after 1/2020.
Each month we first obtained a set of unique listing identifiers (ZPID) by manually scanning across the entire Zillow.com directory for a given region and property type. This sampling frequency is sufficient to collect data for the majority of listings made within a monthly time window, as the average property during this period was on the market for 44 days [34]. To ensure sample time consistency and to also be in accordance with daily API call limits [57], we limited sampling to just these 10 regions. Consequently, API requests spanned just a couple of days each month, and are thus contemporaneously consistent. Notably, a listing is featured in either of the property type catalogues at the owner's or realtor's discretion, and so we do not capture hidden or private listings, which is a limitation of our approach. However, given the prominence of Zillow.com in the US [8], we believe this sampling bias is very limited in scope.
Property-level metrics. The primary data used in this study come from two open data sources: the US Federal Reserve Bank of St. Louis and Zillow Inc. From the US Federal Reserve we collected monthly data compiled by Freddie Mac® for the average US 30-year fixed rate mortgage, denoted by $M_m$, which provides a macro-economic indicator of borrowing costs. From Zillow Inc. we exploit their internal system of unique property identifiers (ZPID) that facilitate property disambiguation to assemble a city-level panel of property-level data. Specifically, for each unique property $h$ in sampling month $m$, we obtained the following data from the Zillow GetSearchResults API: 1. the official address (including zip code and city name); 2. the longitude and latitude (centroid of the property);
The price estimates ($P_{h,m}$ and $\delta P_{h,m}$) are calculated by Zillow Inc. based upon their proprietary in-house algorithm that incorporates a battery of hedonic factors. For example, inputs used to estimate $P_{h,m}$ include macro-economic market data (such as mortgage rates), regional and neighborhood data (such as schools and similar houses), house-specific data provided by the seller and from external sources (habitation area, number of floors, construction materials and date, pool and yard dimensions, garage capacity, school district, neighborhood amenities, and other web-metrics such as house-views), and other properties in the neighborhood of $h$ that are either contemporaneously for sale or were listed in the past (i.e., within the near-comprehensive Zillow Inc. property database).
Note that $P_{h,m}$ is not the asking price set by the listing agent, but rather an estimate of the property's market value. It is common for Zillow.com property profiles to feature up to 10 years of historical price estimates as a time series, also annotated by point events corresponding to prior ask and sales prices, which together inform buyer and seller speculation. Manual inspection of 10-year Zestimate® time series indicates that new listings and updated ask prices are rapidly incorporated into the Zestimate® algorithm [26]. This rapid information collection is a critical feature that facilitates collective co-production of market speculation deriving from individual seller and online platform service user activity. In this regard, $P_{h,m}$ represents a dynamically updated estimate of the fair market value based upon real-time, localized and comprehensive market information.
Notably, the Zestimate® error rate, measured as the percent difference between $P_{h,m}$ and the property's actual sale price, has decreased over time as their proprietary algorithm becomes more accurate. According to Zillow Inc., the median error rate (such that 50% of property valuation errors are less than this value) for on-market homes was 3.2% during our sampling period, and has since decreased to 2.4% [60].
These unique features of Zillow property data - namely, the comprehensiveness, consistency, dynamics and accuracy - facilitate analyzing the evolution of the housing market in specific regions at high geographic and temporal resolution. Without this rich data source, the next best alternative would be to pool records of seller ask prices. However, such data would not be consistent and would not include dynamics, as the ask price occurs at a fixed date and does not tend to change over a 30-day time window. Instead, the Zestimate® is updated in real time. Also, seller ask prices do not include a price range, and so they do not permit analysis of valuation uncertainty.
We constructed our panel of Zillow property estimates by sampling Zillow.com monthly for nearly 4 years. As such, price values were obtained in nominal US$ at the sampling month $m$. Hence, in what follows, we deflated all price values to 1/1/2018 US$. We control for the data sampling (calendar) month in our statistical analysis to account for well-known intra-annual housing market activity cycles [61].
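The deflation step can be sketched as a rescaling by the ratio of a price index at the base month to its value at the sampling month. The text does not state which index was used, so the index values below are hypothetical placeholders and the helper name `deflate` is introduced here for illustration only.

```python
def deflate(nominal_price, index_sampling_month, index_base_month):
    """Convert a nominal price into base-month dollars using a price index."""
    return nominal_price * index_base_month / index_sampling_month


# Hypothetical index values (NOT real data): base month 1/2018 set to 100.
PRICE_INDEX = {"2018-01": 100.0, "2021-06": 108.0}

nominal = 540_000.0  # made-up price estimate sampled in 6/2021, nominal US$
real = deflate(nominal, PRICE_INDEX["2021-06"], PRICE_INDEX["2018-01"])
print(f"{nominal:,.0f} nominal US$ -> {real:,.0f} US$ of 1/2018")
```

With these placeholder values, an 8% rise in the index converts 540,000 nominal US$ into 500,000 US$ of 1/2018.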
Based upon the primary data from Zillow.com, we also computed three additional metrics. First, we calculated the price change as a percent of the initial price, denoted by $\Delta P_{h,m}$. See Fig. S2(E,F) for the mean and standard deviation of $\Delta P_{h,m}$, grouped by period and property type. Second, we calculated the spot price uncertainty, denoted by $U_{h,m}$. See Fig. S2(G,H) for the mean and standard deviation of $U_{h,m}$, grouped by period and property type. And third, we estimated the neighborhood housing market activity $A_{h,m}$ of a particular listing $h$ by counting the total number of properties within a 0.5 mile (0.8 km) radius, and within the contemporaneous three-month period $\{m_h, m_h - 1, m_h - 2\}$ including the listing month $m_h$.
FIG. 2. (A) In addition to contemporaneous valuation estimates, users are also confronted with longitudinal $P_h(t)$ histories extending up to a decade, which includes actual sales events indicated in the "Price History" section of each listing page. (B) Our quasi-experimental design leverages the algorithmically consistent data ($P_h$, $\delta P_h$, $P^+_h$ and $P^-_h$) available for on-market properties listed for sale (which are sensitive to market speculation) as well as off-market properties listed for rent. Rental properties represent appropriate counterfactuals in that while they are available for habitation, they are off-market, meaning that they are neutral to short-term market speculation (since the time horizon for entering the market is well beyond the horizon for contemporaneous speculation). Consequently, whereas price changes for on-market properties depend on shifts in the valuation of fundamentals in addition to market speculation, price changes for rental properties primarily reflect shifts in the valuation of fundamentals (e.g., the incremental value of an additional bedroom). Hence, this study applies a difference-in-difference (DiD) design to net out shifts in the valuation of fundamentals in order to isolate shifts attributable to speculation - see Eq. (4). Moreover, by comparing shifts after versus before 1/2020, we estimate the effect of market speculation deriving from COVID-19 uncertainty on the real-estate market.
Pruned data sample optimized for unit-level matching. Our quantitative analysis focuses on typical property listings for which there is sufficient neighborhood activity to support unit-level matching. For this reason we exclude observations from our raw data sample according to four criteria.
First, we excluded property listings featuring extreme price change or price uncertainty values. Specifically, we only include properties with $|\Delta P_{h,m}| \leq 40\%$ and $U_{h,m} \leq 40\%$, which together reduced the original dataset from 133,668 observations to 110,530 listings (a 17% reduction).
Second, to ensure properties have sufficient real-estate activity in the neighboring vicinity that offers alternative buyer options, we excluded properties with fewer than 4 listings within the local neighborhood, defined as a 0.5 mile radius around $h$ - which, for example, corresponds to 10 New York City blocks. This choice ensures that comparable properties used in our unit-level matching approach can be reached by walking, and so in principle have the same neighborhood amenities as the central property.
Third, we only consider alternative property listings from the same calendar phase, defined as a three-month window prior to and including the central property's listing month. That is, if a property was listed in April 2021, we only consider candidate matches in the before 1/2020 period that were listed in February, March and April. And fourth, we exclude listings outside of the active CA real-estate period, which is March through October [61].
Together, the second, third and fourth stages of pruning further reduced the sample size from 110,530 to 57,414 listings, corresponding to a 48% reduction, largely attributable to the second criterion regarding neighborhood activity. Together, these latter three criteria eliminated many listings corresponding to empty lots and other under-developed properties located beyond the principal city limits associated with each region.
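The derived metrics and pruning criteria described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the `Listing` fields and all function names are hypothetical; the formulas for the percent price change and the uncertainty are assumed readings of "percent of the initial price" and "high-low range"; the activity count is assumed to exclude the listing itself; and the cross-period matching of candidate properties is not shown.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


@dataclass
class Listing:
    zpid: str          # listing identifier
    lat: float         # latitude, degrees
    lon: float         # longitude, degrees
    month_index: int   # 12 * year + (month - 1)
    price: float       # spot price estimate P
    change_30d: float  # 30-day change in the estimate, delta P
    high: float        # high estimate P+
    low: float         # low estimate P-


def haversine_miles(a, b):
    """Great-circle distance between two listings, in miles."""
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlam = math.radians(b.lon - a.lon)
    s = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(s))


def pct_change(h):
    # ASSUMED form: 30-day change as a percent of the price 30 days earlier.
    return 100.0 * h.change_30d / (h.price - h.change_30d)


def uncertainty(h):
    # ASSUMED form: high-low range as a percent of the spot estimate.
    return 100.0 * (h.high - h.low) / h.price


def activity(h, listings, radius_miles=0.5):
    """Count other listings within the radius and the window {m, m-1, m-2}."""
    return sum(
        1
        for other in listings
        if other.zpid != h.zpid
        and 0 <= h.month_index - other.month_index <= 2
        and haversine_miles(h, other) <= radius_miles
    )


def keep(h, listings):
    calendar_month = h.month_index % 12 + 1
    return (
        abs(pct_change(h)) <= 40.0          # no extreme price change
        and uncertainty(h) <= 40.0          # no extreme uncertainty
        and activity(h, listings) >= 4      # sufficient neighborhood activity
        and 3 <= calendar_month <= 10       # active period, March through October
    )


def make(zpid, lat, change=3_000.0):
    # Made-up listing in April 2021, for illustration only.
    return Listing(zpid, lat, -120.48, 12 * 2021 + 3, 300_000.0, change, 315_000.0, 285_000.0)


cluster = [make(f"c{i}", 37.300 + 0.001 * i) for i in range(5)]
outlier = make("x", 37.305, change=150_000.0)  # extreme 30-day change
isolated = make("iso", 37.500)                 # no neighbors within 0.5 miles
sample = cluster + [outlier, isolated]
kept = [h.zpid for h in sample if keep(h, sample)]
print(kept)
```

In this toy sample only the five clustered listings survive: the outlier fails the price-change bound and the isolated listing fails the neighborhood-activity bound.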

Results
Descriptive statistics grouped by region and period. The distribution of house prices $P_{h,m}$ is approximately log-normal - see Fig. S3. This feature is consistent with the Gibrat proportional growth model [12,54].
Figure 3(A,B) shows the distributions of $\Delta P_h$ and $U_h$, respectively. 30-day price-change fluctuations ($\Delta P_h$) feature high levels of variance around the roughly 1-2% average price growth levels observed during the sample period. The frequency distribution $P(\Delta P_h)$ is asymmetric and leptokurtic, being wider in the bulk than the Laplace (double-exponential) tent-shaped growth distributions observed in other relevant empirical studies of economic growth [52-54,62]. Both the positive and negative tails of $P(\Delta P_h)$ are heavy, extending well beyond the values of ±40% used to truncate our data sample (a sampling choice used so that parameter estimates in our regression model are not biased by extreme outliers).
FIG. 3. Systematic increase in property valuation and confidence in the after-1/2020 housing market. Kernel density estimate of the probability density function (PDF) calculated for (A) 30-day price change, $\Delta P_{h,m}$, including the best-fit Cauchy PDF calculated using both the big and small market data combined; and (B) PDF calculated for price uncertainty, $U_{h,m}$. Data shown are calculated using properties listed "For Sale"; see Fig. S4 for PDF conditioned on market size, period and property type. (C-F) Mean (⟨•⟩) and standard deviation (STD, σ[•]) calculated for $\Delta P_{h,m}$ and $U_{h,m}$ conditional on spot price $P_{h,m}$. Together, these two variables show how the after-1/2020 CA housing market features excess valuation growth and increasing valuation confidence (i.e., decreased uncertainty), patterns that are common to both the big and small markets, and appear to be even stronger for the small market. These effects manifest as systematic shifts in the first and second moments - i.e., the characteristic location (C,D) and characteristic fluctuation scale (E,F) - of the underlying data distributions, and are robust across the entire range of house listing price estimates.
We estimated a best-fit model for the $P(\Delta P_h)$ distribution using the maximum likelihood method. The resulting best-fit probability density function (PDF) is the Cauchy-Lorentz distribution, which has asymptotic power-law tail behavior $P(\Delta P) \sim (\Delta P)^{-2}$ for $|\Delta P - x_0| \gg \gamma$. The two Cauchy-Lorentz PDF parameters estimated using both big and small market data pooled together are $x_0 = 0.2$ (location) and $\gamma = 2.0$ (scale).
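For reference, the standard form of the Cauchy-Lorentz density, its large-deviation limit, and the log-likelihood maximized over $(x_0, \gamma)$ for a sample of $N$ observations $\Delta P_i$ are written out below. These are textbook expressions; the fitting procedure actually used is not detailed in the text beyond "maximum likelihood".

```latex
\[
P(\Delta P) = \frac{1}{\pi\gamma}
  \left[ 1 + \left( \frac{\Delta P - x_0}{\gamma} \right)^{2} \right]^{-1},
\qquad
P(\Delta P) \approx \frac{\gamma}{\pi\,(\Delta P - x_0)^{2}}
\quad \text{for } |\Delta P - x_0| \gg \gamma ,
\]
\[
\ln \mathcal{L}(x_0, \gamma) = -N \ln(\pi\gamma)
  - \sum_{i=1}^{N} \ln\!\left[ 1 + \left( \frac{\Delta P_i - x_0}{\gamma} \right)^{2} \right].
\]
```

The second expression makes the inverse-square tail explicit: far from the location $x_0$, the density decays as the inverse square of the deviation, so the variance of the distribution is not finite.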
As illustrated in Fig. 3 Nevertheless, the distribution tails extend well beyond 10%, indicating that fluctuations in this real estate asset class are more similar to the heavy-tailed price fluctuation distributions observed for the equity asset class [63]. One explanation for the heavy tails is the large scale of real estate depreciation that can occur over the lifetime of ownership, balanced on the other side by relatively sudden appreciation attributable to renovations. Put another way, when a property enters the real-estate market, there is a rapid update in asset valuation that incorporates information that had accrued over wide-ranging time scales. This is of course not dissimilar from stock markets, where the periodic release of earnings and other news is rapidly absorbed into stock prices [21].
The price uncertainty $U_h$ is a unique feature of the Zillow API data. It also shows considerable variation, and is narrowly centered around the 10% level, but with significant right-skew. By way of comparison, consider the distribution $P(\Delta P_h)$ calculated for properties listed for sale, for which we observe a systematic shift towards an excess frequency of $\Delta P_h > 0$ values after 1/2020 relative to before. Conversely, in the case of $P(U_h)$ we observe the opposite trend, signaling increased valuation confidence after 1/2020 relative to before. Interestingly, in the case of rental properties, we observe no shift in $P(\Delta P_h)$ comparing before and after, whereas the frequency of larger $U_h$ values post-1/2020 increases dramatically, possibly reflecting the COVID-19 eviction moratorium policy rapidly implemented in the US [64][65][66]. See Fig. S4 for complementary distributions conditioned on market size, period and property type.
Prominent shifts in real-estate valuation during COVID-19. Using the CA real estate market before 1/2020 as a comparative baseline, Fig. 3 shows that the post-1/2020 market features hallmarks of a speculative bubble, namely (a) accelerated valuation growth net of change in fundamentals and (b) increased confidence in excess valuation. Somewhat ironically, these characteristics may have emerged by way of contagious spreading of 'irrational exuberance' among market agents [13,15,16].
One explanation for the enhanced real-estate speculation derives from the global COVID-19 uncertainty shock, which muddled global expectations for investment returns. This global shock resulted in a confounding and non-uniform impact on the public, as indicated by a diverging "K-shaped" recovery in the US population [67]. The shock was also followed by profound policy interventions, such as the sudden reduction of the US federal funds target rate taking the form of a long-lasting financial-quake [20,21], which among other immediate effects promoted aggressive household borrowing that boosted home-purchasing power and home-improvement activity [68]. This also triggered a sudden housing supply-demand imbalance exacerbated by the rapid expansion of remote work-from-home policy [17,51], in particular in the IT sector that is concentrated in the Bay Area mega-region. While these factors primarily affect the house purchase market, they also augmented uncertainty and speculation levels in the rental market, given the coincident increased demand for rent combined with sudden rent protection policy that together shifted risk-levels for both tenants and rental property owners [65].
Combined, these factors are reflected by significant systematic shifts in the characteristic levels of speculation ($\Delta P_{h,m}$) and uncertainty ($U_{h,m}$) across the entire range of $P_{h,m}$, for both small and big markets. Notably, we observe higher average $\Delta P_{h,m}$ in small markets than in big markets, consistent with nationwide analysis of the impact of state-level shutdowns on price changes in the months before and after their implementation, which were found to be mediated by differences in population and structural density between urban and rural markets [34].
Compared with recent work analyzing the real estate market in southern CA that finds a negative relation between price growth and price [30], a relation that is consistent with other asset classes such as firms and stocks [54], we instead observe an increasing trend in $\langle \Delta P_{h,m} \rangle$ with $P_{h,m}$ after 1/2020, indicative of accelerated speculation (see Fig. 3(A)). This shift is also readily apparent in the higher levels of price-growth variation ($\sigma[\Delta P_{h,m}]$) observed after 1/2020 (see Fig. 3(B)). Again, this pattern deviates from the well-established decreasing size-variance relationship found for other asset classes [12,52,54,62,63,69]. Contrariwise, Fig. 3(C,D) indicates a reduction in the mean and standard deviation of price uncertainty after 1/2020, also consistent with the conditions of a speculative bubble.
Quantifying the effect of COVID-19 on speculative valuation in a CA real-estate market. We use the rapid onset of the pandemic as an exogenous shock to uncertainty, which thereby facilitates estimating the degree to which shifts in property valuation and valuation confidence during the pandemic were attributable to collective speculation. Our approach contributes to a growing body of quasi-experimental COVID-19 research in the social sciences [39].
As a consistency check, we implemented two complementary quasi-experimental methods: (a) unit-level matching and (b) multivariate regression. Unit-level matching of individual properties leverages the granularity of our data sample to estimate treatment effects manifesting at high spatiotemporal resolution. In contrast, multivariate regression yields inferences based upon differences in group-level averages, with the notable advantage that additional regressors can be included in order to control for micro-level covariates (e.g., number of neighboring properties listed for sale, $A_{h,m}$) and macro-level covariates (e.g., contemporaneous mortgage rates, $M_m$).
Fundamental to both methods is identifying a counterfactual baseline to net out differences pre-existing before the pandemic. To this end, both approaches utilize the rental market as a counterfactual baseline for comparison, since it is comprised of properties that satisfy the same demand for housing, but were just not available for sale and thus were neutral to contemporaneous speculation. Accordingly, both approaches rely on the parallel trend assumption between on-market (denoted by "For Sale", $FS$) and off-market ("Rent", $R$) property types, which we confirm and exhibit in Figure S7.
The logic underpinning this counterfactual approach is as follows. Whereas shifts in the valuation of on-market properties depend on shifts in the valuation of fundamentals in addition to market speculation, shifts in the valuation of off-market properties primarily reflect shifts in the valuation of fundamentals. Hence, we can estimate the impact of speculation on a given quantity $Y$ by way of a difference-in-difference (DiD) strategy denoted by

$$\Delta\Delta_Y = \Delta_{Y,FS} - \Delta_{Y,R}, \tag{4}$$

as illustrated in Fig. 2(B). For example, in the case of $Y = \Delta P$ with $\Delta F$ denoting the shift in price ($\Delta P$) associated with fundamentals and $\Delta S$ denoting the shift in price associated with speculation, then the DiD corresponds to

$$\begin{aligned} \Delta\Delta_{\Delta P} &= \Delta_{\Delta P,FS} - \Delta_{\Delta P,R} \\ &= (\Delta F_{FS} + \Delta S) - \Delta F_{R} \\ &\approx \Delta S, \end{aligned} \tag{5}$$

where the last line follows if $\Delta F_{FS} \approx \Delta F_{R}$, reflecting the assumption that there was no systematic shift in the value associated with changes in fundamentals-oriented valuation (e.g. renovation and maintenance costs) between the two property classes before and after 1/2020. We apply this strategy to estimate the effect of the COVID-19 pandemic on two quantities that are sensitive to uncertainty: $Y = \Delta P_h$ and $Y = U_h$. Note that Eq. (4), which we further specify in the following section, inherently incorporates a temporal difference between the before and after 1/2020 periods. This second difference implies that the DiD $\Delta\Delta_Y$ is net of the baseline level of the market before 1/2020, meaning that this estimator quantifies the magnitude of price shifts specifically attributable to the speculation in the CA real-estate market deriving from COVID-19 uncertainty.
Method 1: Unit-level matching. The quasi-experimental matching design implemented in this subsection does not conform to a traditional treatment-control setting [70], as the pandemic was perniciously pervasive, i.e., there is no untreated group in the period after 1/2020. Yet our approach still incorporates notable advantages of matching designs. Foremost, this approach accounts for unobserved covariates that are nonetheless correlated with the available matching variables. That is, while we do not explicitly incorporate house-specific features (such as vicinity to shopping and schools, backyard size and other physical amenities such as a pool and garage), these and many other variables are implicitly incorporated into each property valuation $P_{h,m}$, which we use in the counterfactual matching stage. Moreover, by virtue of Zillow's design as a leading e-platform [71] that derives value by aggregating comprehensive and contemporaneous local and national house listings, $P_{h,m}$ values are believed to be consistent and thus well-suited for the purpose of unit-level matching.
Our matching design also exploits the high geo-temporal resolution of the listing data to match properties listed after 1/2020 with similar properties listed before 2020, thereby optimizing measurement precision in the evaluation of market shifts due to pandemic uncertainty. An advantage of this approach is addressing the high degree of price and price change variation that exists even within a single region, as illustrated in Fig. 1(C). To be specific, we account for unobserved unit-level features [70] by strictly matching houses according to three listing features: (a) price strata, (b) calendar month, and (c) geographic location. These variables are only weakly related to the variables of primary interest, namely $\Delta P_{h,m}$ and $U_{h,m}$.
We match on price strata by first calculating an intensive variable $Q_c(P_{h,m}) \in \{1, 2, \ldots, 10\}$, with 1 (respectively, 10) representing the lowest (highest) price decile that is specific to a particular city $c$ and before/after period. Assuming that potential buyers would be open to a range of house prices in excess of a single decile, we then allow for matches within ±1 decile group from $Q_c(P_{h,m})$. We constrained matches temporally by requiring matched houses to be from the same calendar month as the central house or from 1 calendar month prior, which accounts for intra-year housing market cycles. For example, if a property was listed in June, then we only accept properties listed in May or June as candidate matches. And we constrained matches geographically by requiring matched houses to be within a 0.5 mile (0.8 km) radius of the central house.
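The three matching criteria can be sketched as a single predicate. The Python snippet below is a minimal illustration under stated assumptions: the `Listing` record, its field names, the haversine great-circle distance and the January-to-December wrap-around are all hypothetical choices made for this sketch, not details taken from the data pipeline described here.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius, used by the haversine formula


@dataclass
class Listing:
    """Hypothetical listing record; field names are illustrative only."""
    lat: float    # latitude in degrees
    lon: float    # longitude in degrees
    month: int    # calendar month of the listing, 1-12
    decile: int   # city- and period-specific price decile Q_c, 1-10


def haversine_miles(a, b):
    """Great-circle distance between two listings, in miles."""
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlam = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))


def is_candidate_match(central, other, max_miles=0.5):
    """Apply the price-strata, calendar-month and distance criteria."""
    # (a) price strata: within +/- 1 decile group of the central house
    same_strata = abs(central.decile - other.decile) <= 1
    # (b) calendar month: same month or one month prior (wrap-around is an assumption)
    prior_month = 12 if central.month == 1 else central.month - 1
    same_season = other.month in (central.month, prior_month)
    # (c) location: within the 0.5 mile radius of the central house
    nearby = haversine_miles(central, other) <= max_miles
    return same_strata and same_season and nearby


def match_set(central, before_pool):
    """Collect the before-2020 listings that qualify as matches for `central`."""
    return [other for other in before_pool if is_candidate_match(central, other)]
```

For a central house listed in June in decile 5, this predicate accepts a neighbor listed in May in decile 6 roughly 0.1 miles away, and rejects listings from April, from decile 7, or from several miles away.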
By way of example, Fig. 1(G,H) illustrates the matching procedure using a property from San Jose listed after 1/2020, which also highlights the reduction in market supply after 1/2020 relative to before. Note that not all houses within the specified radius are candidate matches because the price variations in a single neighborhood can span several $Q_c(P_{h,m})$ strata. In Fig. 1(G) we denote the set of matched houses in the same neighborhood of a given central house $h$ by $\{N_h\}_{\mathrm{Bef}}$.
More specifically, for each property $h$ listed after 1/2020, we identify the match set $\{N_h\}_{\mathrm{Bef}}$ from the pool of similar properties listed before 2020. We then construct a hypothetical property listed before 2020 that is very similar to $h$. Ideally, the counterfactual property would be the same property $h$ using data sampled from before 2020. Unfortunately, the Zillow API only returns data contemporaneous to the data download date, and so we are unable to back-sample prior valuation data for any given property $h$. In order to overcome this challenge, a more sophisticated research design would need to identify a repeated sampling procedure to obtain balanced Zillow estimates for the same set of properties over time, which was beyond the scope of our data collection capability, and is a limitation shared by most real-estate analyses using data (aggregated or not) for on-market properties.
The characteristics of the counterfactual property are given by the average value $\langle Y \rangle_{\{N_h\}_{\mathrm{Bef}}}$ calculated across the match set $\{N_h\}_{\mathrm{Bef}}$, where $Y$ represents either $P_{h,m}$, $\Delta P_{h,m}$ or $U_{h,m}$. We then compute for each $h$ the counterfactual difference

$$\Delta_{Y,h} = Y_h - \langle Y \rangle_{\{N_h\}_{\mathrm{Bef}}}, \tag{6}$$

which estimates the shift in $Y$ associated with the two time periods. In a companion study, we perform a similar analysis by instead matching first across property types within each time period, and then computing a temporal difference. This approach is more constrained by smaller $R$ sample sizes for the period after 1/2020, yet we obtain largely consistent results [36].
From the set of $\Delta_{Y,h}$ values collected for each region and property type, we then calculate the average difference

$$\Delta_Y = \langle \Delta_{Y,h} \rangle, \tag{7}$$

where we denote the property type in subscript, e.g. $\Delta_{Y,FS}$ and $\Delta_{Y,R}$. The impact of the COVID-19 pandemic on the variable $Y$ is then estimated according to the magnitude and statistical significance of $\Delta_Y$. We evaluate the latter using a one-sample Student T-test to estimate the likelihood of the null hypothesis $\Delta_Y = 0$ representing no pandemic effect. Figure 4(A-C) shows the sign, magnitude and statistical significance of $\Delta_Y$ calculated for the three property-level variables $P_{h,m}$, $\Delta P_{h,m}$ and $U_{h,m}$. See Fig. S5 for the distribution of individual $\Delta_{Y,h}$ values from which $\Delta_Y$ are calculated; and see Fig. S6 for $\Delta_{Y,c}$ calculated at the city level as a demonstration of robustness over down-scaled regions.

Hence, the difference in difference $\Delta\Delta_Y$ defined in Eq. (4) nets out the overall market shifts that may bias interpretation of $\Delta_{Y,FS}$ when considered alone. What remains after subtracting our speculation-neutral baseline for comparison $\Delta_{Y,R}$ is the excess impact attributable to speculation implicit in property sales. We evaluate the statistical significance of the null hypothesis $\Delta\Delta_Y = 0$ using the two-sample Student T-test with Welch correction that accounts for varying sample size and variance between the $FS$ and $R$ samples.

We begin by considering $Y$ values corresponding to the absolute price change, which we report primarily for the purpose of demonstrating that the magnitude of price shifts we encountered are not incremental. Figure 4(A) shows $\Delta\Delta_P = \Delta_{P,FS} - \Delta_{P,R}$ of roughly 8,000 US$ for both market sizes. This result indicates that the same property $h$ listed for sale is valued 8,000 US$ more than if it was listed as available for rent, a result which is significant at the p < 0.001 level.
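The matching estimator and its Welch test statistic can be sketched in a few lines of Python. The snippet below is an illustrative sketch under stated assumptions: the function names and the toy numbers are hypothetical, and it returns only the DiD, the Welch t-statistic and the Welch-Satterthwaite degrees of freedom (converting these to a p-value requires a Student-t CDF, which is omitted here).

```python
import math
import statistics


def counterfactual_difference(y_h, matched_before):
    """Per-property difference: Y for house h minus the mean Y over its match set."""
    return y_h - statistics.fmean(matched_before)


def welch_did(deltas_fs, deltas_r):
    """DiD of mean differences (For Sale minus Rent) with Welch's t-statistic.

    Returns (did, t_statistic, degrees_of_freedom).
    """
    n1, n2 = len(deltas_fs), len(deltas_r)
    did = statistics.fmean(deltas_fs) - statistics.fmean(deltas_r)
    # Per-sample variance of the mean; Welch does not pool the two variances.
    s1 = statistics.variance(deltas_fs) / n1
    s2 = statistics.variance(deltas_r) / n2
    t_stat = did / math.sqrt(s1 + s2)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    dof = (s1 + s2) ** 2 / (s1 ** 2 / (n1 - 1) + s2 ** 2 / (n2 - 1))
    return did, t_stat, dof


if __name__ == "__main__":
    # Toy values only; these are not estimates from the listing data.
    fs = [counterfactual_difference(y, m) for y, m in [(2.1, [0.9, 1.1]), (1.8, [0.6, 0.8])]]
    rent = [counterfactual_difference(y, m) for y, m in [(0.5, [0.2, 0.4]), (0.1, [0.0, 0.2])]]
    print(welch_did(fs, rent))
```

Keeping the per-property differences as explicit lists mirrors the two-stage logic of the method: first a temporal difference for each property type, then a difference between property types.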
In what follows, we focus on the relative quantities $\Delta P_h$ and $U_h$, because intensive quantities (i.e., percentages) are more directly comparable, while also being less sensitive to the matching variable $Q_c(P_{h,m})$. In particular, Fig. 4(B) indicates excess valuation growth over a 30-day period of $\Delta\Delta_{\Delta P} = \Delta_{\Delta P,FS} - \Delta_{\Delta P,R} = 1.36 - 0.26 = 1.1$ percentage points for the average property in the big market, and $1.47 - (-0.53) = 2.0$ percentage points for the small market. Both DiD values are significant at the p < 0.001 level. This result suggests that the valuation of the same property $h$ would appreciate an additional 2 percentage points if it were listed for sale, as opposed to if it were instead listed as available for rent. In terms of the magnitude of this effect on properties listed for sale, the increase in $\Delta P_{h,m}$ is more than double the characteristic levels observed prior to the pandemic (see Fig. 3(A)).
Regarding percent price uncertainty, we calculate $\Delta\Delta_U = \Delta_{U,FS} - \Delta_{U,R} = -2.4$ percentage points for the big market, and $\Delta\Delta_U = -7.2$ percentage points for the small market. Both DiD values are significant at the p < 0.001 level. This result suggests that the certainty in the valuation of a property would be higher if it were listed for sale than if it were listed as available for rent.
Method 2: Multivariate regression. We complement the matching method with multiple regression, which affords estimating marginal relationships with temporal and spatial covariates. In what follows we implement a two-period difference-in-difference (DiD) model for three regions (San Jose, Fresno, Merced) for which sufficient rental property data is available to serve as the before- and after-1/2020 control group. In short, we apply ordinary least squares (OLS) regression using STATA 13.0 software to estimate the following model for a specific region,

$$Y_{h,m} = \beta_0 + \beta_{FS}\, I_{h,\mathrm{ForSale}} + \beta_T\, T_m + \delta_{TE}\,(I_{h,\mathrm{ForSale}} \times T_m) + \vec{\beta}\cdot\vec{X} + \vec{\gamma}\cdot\vec{I} + \epsilon_{h,m},$$

where $\vec{X}$ (respectively, $\vec{I}$) represents a battery of continuous (respectively, factor) controls, and the DiD interaction term $\delta_{TE}\,(I_{h,\mathrm{ForSale}} \times T_m)$ captures the difference between the two property types (specified by the binary indicator variable $I_{h,\mathrm{ForSale}}$) across the two periods (specified by the binary indicator $T_m$). Figure S7 shows that the conditions of the DiD parallel trend assumption in the period before 2020 are sufficiently satisfied for both $\Delta P_{h,m}$ and $U_{h,m}$. For additional cross-validation, see the study in Ref. [17] analyzing repeated-transaction home price data within and across the 25 largest metropolitan statistical areas during the before-2020 period. Regarding the exclusion restriction on the treatment, one can verify this assumption by using Zillow.com to manually inspect properties listed for rent, and compare them to those that are listed for sale to see that there are no a priori systematic differences between the two property types.
More specifically, we apply this canonical two-period DiD specification to model two different dependent variables: $Y \equiv \Delta P_{h,m}$ and $Y \equiv U_{h,m}$. For each model we implement fixed effects to account for time-independent factors associated with the calendar month $m$ of the listing ($C_m$), and region-specific price strata $Q_c(P_{h,m})$, where both quantities are encoded as categorical variables. Hence, the treatment effect $\delta_{TE}$ is the direct analog to $\Delta\Delta_Y$, and estimates the excess shift in $Y$ attributable to collective speculation deriving from COVID-19 uncertainty.
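To make the link between the regression treatment effect and the matching DiD concrete, the following Python sketch fits a stripped-down two-group, two-period model by OLS. It is a minimal illustration, assuming no continuous covariates or fixed effects (unlike the full specification), and it uses a hand-written normal-equations solver in place of STATA; all function names and data values are hypothetical.

```python
def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear(xtx, xty)


def did_treatment_effect(records):
    """records: list of (y, for_sale, after) tuples with 0/1 indicators.

    Fits y = b0 + b1*for_sale + b2*after + delta*(for_sale*after)
    and returns delta, the coefficient on the interaction term.
    """
    rows = [[1.0, float(fs), float(t), float(fs * t)] for _, fs, t in records]
    y = [rec[0] for rec in records]
    return ols(rows, y)[3]


if __name__ == "__main__":
    # Toy data only: (outcome, listed for sale?, after 1/2020?)
    toy = [
        (1.0, 0, 0), (1.2, 0, 0), (1.3, 0, 1), (1.5, 0, 1),
        (1.0, 1, 0), (1.4, 1, 0), (2.4, 1, 1), (2.6, 1, 1),
    ]
    print(did_treatment_effect(toy))
```

In this saturated form the interaction coefficient equals the difference of cell-mean differences, (For Sale after minus before) minus (Rent after minus before), which is why the regression estimate is directly comparable to the matching-based DiD.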
In the first scenario, the dependent variable is the 30-day percent price change, $Y = \Delta P_{h,m}$. The continuous covariates $\vec{X}$ include terms interacted with $I_{h,\mathrm{ForSale}}$ as well as the price terms $\beta_P \ln P_{h,m} + \beta_{P2} \ln^2 P_{h,m}$, and the factor variables $\vec{I}$ are the fixed effects for calendar month $C_m$ and price strata $Q_c(P_{h,m})$. The interactions between $I_{h,\mathrm{ForSale}}$ and several control variables differentiate responses conditional on property type. Full model estimates are elaborated in Table S1.
Similarly, in the second scenario the dependent variable is the percent price uncertainty, $Y = U_{h,m}$. Full model estimates are elaborated in Table S2.

Figure 4(D) shows the estimated treatment effect $\delta_{TE}$ for each model and city. Results indicate an excess 30-day percent price change of $\delta_{TE,\Delta P}$ = 0.85 (Fresno), 1.13 (Merced) and 1.21 (San Jose) percentage points. These values are consistent in sign, magnitude and statistical significance with the corresponding market-level DiD values $\Delta\Delta_{\Delta P}$ estimated using the matching method. Both methods indicate excess valuation, or higher valuations than there would have been in the absence of the COVID-19 market shock, which is consistent with prior theory of housing-market speculation [2,15].
At the same time, we observe declines in price uncertainties, corresponding to increases in valuation confidence, attributable to the pandemic: $\delta_{TE,U}$ = -3.1 (San Jose), -3.6 (Fresno) and -8.9 (Merced) percentage points. As a robustness check, we confirm that each point estimate $\delta_{TE,U}$ is consistent in sign, magnitude and statistical significance when compared with the corresponding market-level DiD values $\Delta\Delta_U$ estimated using the matching method. See [36] for an additional consistency check, in which an analog to Method 1 instead matches for-sale and rental properties within each period and city, and then computes a pre-post difference as a DiD estimate.
A clear limitation of our data sample is the lack of additional property-level feature data. As such, unobserved factors may bias our $\delta_{TE,\Delta P}$ and $\delta_{TE,U}$ estimates, including construction supply constraints [49,72], the regulatory environment for affordable housing construction [34], shifts in demand for amenity density [17], and remote-work and associated migration [32,51].
Marginal effects of market supply and mortgage rates. To further explore the relative impact on price change and uncertainty, Fig. 5 shows the margins associated with (a) neighborhood market activity $A_{h,m}$, a micro-level indicator of housing supply measured as the number of potentially competing listings in the immediate vicinity of each listing; and (b) the average 30-year fixed-rate mortgage $M_m$ obtained from Freddie Mac®, which is an inverse measure of homeowner borrowing power.
The specification used to estimate these marginal effects is nearly identical to the DiD models described above. The main difference is that we do not include the DiD term ($I_{h,\mathrm{ForSale}} \times T_m$). Instead, this model includes an interaction $S_h \times A_{h,m} \times T_m$ in order to quantify the marginal effect of neighborhood market activity $A_{h,m}$ associated with $Y = \{\Delta P_{h,m}$ or $U_{h,m}\}$, while accounting for differences in period and market size. Full model estimates are elaborated in Table S3.
Figures 5(A-D) provide an estimate of the semi-elasticity of price with supply, and are consistent in magnitude with prior empirical work in Ref. [29] on the full elasticity of housing supply conditioned by land development constraints. For example, an additional 10 local listings (i.e. $A_{h,m}$ shifting from 10 to 20) corresponds to a reduction in price change of roughly 0.6 (resp. 0.7) percentage points for the small (resp. big) market before 2020; however, after 1/2020 this reduction increased in magnitude by roughly 0.1 percentage points for both markets, as indicated by the increasingly steep slope after 1/2020.
Another factor explaining price gains during this period is the lower interest rates that directly affect buyer purchasing power and builder construction costs [46]. The slopes of the lines shown in Fig. 5(C,D) provide an estimate of the mortgage rate semi-elasticity, indicating a roughly 0.7 percent price increase for a 1 point reduction in $M_m$, which is on the lower side but consistent with estimation based upon a wide range of approaches [31]. The discrepancy may be attributable to the relatively low range of $M_m$ and relatively high monthly price changes encountered during our sample period. Note that the estimates for smaller (larger) interest rates for before (after) 1/2020 are extrapolations into out-of-sample $M_m$ regimes, as indicated by the larger standard errors in the regression fit.
Another relevant analysis for comparison is one based upon the San Diego housing market from 1997-2008, which attributes higher price gains for houses at the lower end of the price distribution to cheaper credit [30]. While we do not explicitly explore the interaction between $M_m$ and $\Delta P$ conditional on $P$, we do not see evidence of differential price gains by price segment over this period for big vis-a-vis small markets, as also indicated by Fig. 3(A).
Figure 5(E-H) shows the analogous response margins associated with price uncertainty $U_h$. For both big and small markets, uncertainty levels tempered after 1/2020 relative to before, corresponding to higher levels of valuation confidence for the same levels of neighborhood supply. Counterintuitively, this result indicates more efficient price discovery [73], despite greatly heightened socio-economic uncertainty. Interestingly, the informational signal captured by $A_{h,m}$ diminished during the pandemic in the small market, as indicated by the relatively flat profile in Fig. 5(F).

Discussion
The rapid emergence of the global pandemic, followed by pervasive mitigation policy, had broad yet uneven impacts across society [3,4,6,64-67]. Against this backdrop, we seek to contribute to the rich literature emerging from this global crisis [39] by utilizing this sudden uncertainty shock to analyze the collective dynamics of real-estate price formation.
As all markets are highly correlated systems [11][12][13], the COVID-19 pandemic perturbed the housing market in several critical ways. First, the pandemic shifted social interactions towards virtual modes, which increased the importance of online real-estate platforms as decision-making tools. The subsequent interruption to everyday life had immediate effects, as documented in research showing that US counties featuring stay-at-home orders also had higher property sale prices [34]. Other perturbations include global supply chain disruptions [72] that negatively impacted building costs and exacerbated supply inelasticity [32], two features that are central to the theory of emergent housing bubbles [2,15]. These supply factors were complemented by the expansion of remote-work options, which effectively increased the search radius of buyers, and decreased the overall demand for amenity density [17]. Another pertinent factor in California is the set of pervasive regulations regarding real-estate development and new home construction [49].
Viewed from a longer perspective, the US real-estate market has been steadily transforming since the housing boom leading into the bust of 2007-2008. In particular, the growth of the IT service economy [58,71] has brought online real-estate platforms to ubiquity [8], with roughly 110 million distinct properties tracked by Zillow Inc. [9], corresponding to roughly 3 out of every 4 of the 142 million housing units tracked by the US Census Bureau in 2021. In addition to updating on-market and off-market property data, Zillow also calculates algorithmically consistent property valuations that are increasingly relevant to price formation in the US real-estate market.
The utility of such comprehensive and rapidly-updated market data extends far beyond active buyers and sellers. According to a recent industry survey [1], 75% of the respondents classify their time casually browsing real-estate platforms as an imagination outlet, with only 17% claiming to search listings with serious home-purchase motivations. This statistic suggests that, in addition to fundamental shifts in supply and demand, the extreme levels of price growth during the pandemic may be attributable to behavioral phenomena related to heightened levels of life-course uncertainty, and an increased prevalence of naive speculators that are important contributors to bubble formation [10,14]. Hence, inasmuch as real-estate platform service providers facilitate crowd-sourcing, browsing, and market-making, they also facilitate analyzing the dynamics of speculation at high resolution and vast scale.
As such, this work consists of both methodological and empirical contributions to the real-estate market literature. In order to address our three research questions, we first constructed a high-resolution multi-region balanced panel comprised of individual property valuation estimates, which thereby facilitates inferential econometric analysis. Our main result is an estimate of the excess price growth attributable to the COVID-19 pandemic, obtained by way of two complementary econometric DiD approaches: unit-level matching and multivariate regression.
Our property-level dataset combined with a pre-post model design leverages the systematic comparison of price estimates for properties listed for sale versus those listed for rent, the difference corresponding to the effect of pandemic uncertainty on price speculation. Another unique feature of our panel is its regional composition, including both big (urban) and small (rural) real-estate markets. In our first DiD approach, we matched house listings based upon the set of available characteristics (listing month, price strata, longitude-latitude of the property) to optimize around precision in the calculation of the effect size [70]. In the second DiD approach, we implemented a canonical 2-period and 2-group model that incorporates additional covariates while also exploiting the different valuation and socio-economic features of renting versus buying that were exacerbated during the pandemic. Both approaches yield consistent results, as summarized in Fig. 4. Because the 10 regions analyzed capture a relatively wide variation in size, location and socio-economic backdrop, there is reason to believe our results are generalizable to other US regions with housing markets similar to the Bay Area mega-region.
Limitations: Our data and methods are characterized by various limitations. One limitation of our data sample is the lack of additional property-level feature data. As such, unobserved factors may bias the $\delta_{TE,\Delta P}$ and $\delta_{TE,U}$ estimates produced by the multivariate regression method. Relevant omitted variables include construction supply constraints [49,72], the regulatory environment for affordable housing construction [34], shifts in demand for amenity density [17], and remote-work and associated migration [32,51]. These estimates may be further biased by spatial autocorrelation, which may call for more advanced econometric methods employing spatial lag variables. However, we do note that our matching method accounts for time-independent spatial autocorrelations, which are neutralized in the first difference applied in Eq. (6).
For this reason, we complemented the regression method with a matching method, which constructs a hypothetical counterfactual property according to three matching factors: price, location and calendar listing month. In particular, we assume that the estimated price $P_{h,m}$ incorporates all the omitted variables in a consistent way. Hence, in matching properties according to price and location, we are able to factor out the missing idiosyncratic property details that contributed to each property's valuation.
Another notable limitation of our study is the inability to account for two complementary demand-side factors, namely the shift towards remote work and the coincident emergence of online market intermediaries, or iBuyers. Regarding the former, recent work shows that an increasing prevalence of remote work, and subsequent housing demand shifts associated with migration, explains roughly half of the aggregate price changes over 2019-2021 [51]. Meanwhile, recent analysis on the emerging paradigm of instant-offer iBuyer platforms finds that the profitability of this emerging industry is highly impacted by valuation uncertainty [74]. Consequently, despite our analysis subsuming these factors, we are unable to cross-validate or contribute additional insights regarding their role in market speculation.

Conclusion
We analyzed the impact of the COVID-19 pandemic shock using a property-level dataset including unique measures of uncertainty and speculation. Despite the drastically increased levels of uncertainty surrounding the scope and duration of the global pandemic, our results indicate a counterintuitive decrease in property-level price uncertainty ($U_{h,m}$). At the same time, we employ two complementary methods to estimate $\Delta\Delta_{\Delta P}$ and $\delta_{TE,\Delta P}$, respectively, which quantify the excess price growth attributable to heightened levels of pandemic speculation. Both methods yield consistent estimates, on the order of 1% per month excess price growth, i.e. above the levels of growth that would be expected in the absence of the pandemic, corresponding to roughly +12.7 percentage points when integrated across an entire year. For context, this effect size accounts for more than half of the actual annual growth observed across these same regions in 2021. The coincidence of accelerating price growth and valuation confidence is a hallmark of a speculative bubble, which we found to be stronger in the smaller housing markets, and likely reflects their greater susceptibility to sudden supply contraction.
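The annualized figure quoted above can be checked with a short calculation. The Python sketch below assumes that "integrated across an entire year" means monthly compounding of the roughly 1% excess rate; this interpretation and the function name are assumptions made for illustration.

```python
def annualized_excess_growth(monthly_rate, months=12):
    """Compound a constant monthly excess growth rate over `months` months."""
    return (1.0 + monthly_rate) ** months - 1.0


if __name__ == "__main__":
    # 1% per month compounds to roughly 12.7% over twelve months.
    print(f"{100 * annualized_excess_growth(0.01):.1f} percentage points")
```

A simple (non-compounded) sum of twelve monthly increments would instead give 12 percentage points, so the difference between the two conventions is small at this rate.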
Considered together, these results are harbingers of 'irrational exuberance' [16] in response to the sudden shock to long-term certainty that augmented the dynamics and scale of collective speculation. These findings, when contextualized against the backdrop of major life-course decision-making, are reconciled by behavioral theory regarding the persuasive power of uncertainty [37] and sudden unexpected interruptions [38]. Considered in this light, while also accounting for the magnitude of severity and surprise of this global shock, we speculate that the response to COVID-19 uncertainty and subsequent daily life interruptions combined with the real-time inflow of market information collected by online real-estate platforms may have contributed to collective herding behavior that is central to speculative bubble formation in complex socio-economic systems [11][12][13][14][15][16].

Consequently, this restricted our ability to obtain the house-level identifiers (ZPIDs) which were the principal input for the Zillow API for harvesting data; data collection re-commenced in Feb. 2020, and since there is a 2-month padding to identify matched houses, this results in the data gap ending in May 2020. This data gap does not affect our ability to perform a pre/post analysis, as it falls principally during the housing market off-season of November through February, which is excluded from our analysis in any case. Critical changes to the entire Zillow API platform in October 2021 halted data collection entirely.
FIG. S3. Distribution of house price estimates grouped by city, period and property type. Price distributions are organized by city in two sets of columns according to housing market size (Big and Small). For each city we show the smooth kernel density estimate of the conditional price distribution, $P(P_h \mid \text{period}, \text{unit type})$, calculated for four non-overlapping data samples: two periods (before 2020, gray dashed curve; after 1/2020, colored solid curve) and two property types (For Sale and Rent). To facilitate comparing $P(P_h \mid \text{period}, \text{unit type})$ across periods for a given property type, vertical bars indicate the mean price of the corresponding distribution. Note that all x-axes are shown on a logarithmic scale, visually indicating that many of the distributions are approximately log-normal. Also note that sufficient house rental data were available through the Zillow API for only 4 regions, with only one of these (Merced) belonging to the small market group.

FIG. 1. Schematic of data sampling and before- and after-1/2020 matching design. (A) All-Transactions House Price Index data by region, obtained from the US Federal Reserve Bank of St. Louis (www.fred.stlouisfed.org). Annual percent increases from Oct. 2020-2021 are listed in the legend (2021 data not yet available for Mariposa; for more details see Fig. S1). (B) Longitudinal panel of Zillow Inc. house listings across 10 regions in northern California, USA, constructed over the 4-year period 2018-2021 (see Fig. S2 for sample size information). Shown are the locations and names of the 10 principal cities, separated into big market (magenta) and small market (green) groups based upon 2021 population sizes, which are proportional to each circle radius. (C) Spatial distribution of the mean house price estimate calculated for properties listed for sale in San Jose before 2020; each grid cell is color-coded according to its corresponding distribution quintile. (D) Mean 30-day price changes; the color scale corresponds to distribution quintiles. (E) Mean price estimate after 1/2020 using values deflated to 1/1/2018 US$. (F) Percent difference between grid values in panels B and D. (G,H) Schematic of house matching design. For each house listed after 1/2020 (denoted by the index $h$), we identified two sets of similar houses, denoted by $\{N_h\}_{\mathrm{Bef}}$ and $\{N_h\}_{\mathrm{Aft}}$, based upon three criteria. Matched houses must be listed for sale in the same calendar month phase (e.g., if $h$ is from July then matches must be from May, June or July), in the same price stratum (i.e., matches must be within ±1 price decile of $h$), and within a 1/2 mile radius of the central house. The set of matches $\{N_h\}_{\mathrm{Bef}}$ is used for causal inference by way of a difference-in-difference identification strategy. The set $\{N_h\}_{\mathrm{Aft}}$ is only used to estimate the contemporaneous neighborhood housing supply, denoted by the activity $A_{h,m} = |\{N_h\}_{\mathrm{Aft}}|$. (G) Candidate matches before 2020 (10 matches indicated by orange dots); and (H) after 1/2020 (8 matches). Candidate houses within the same period not meeting these criteria are indicated by blue dots.
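The three matching criteria in panels (G,H) can be sketched as a simple filter. This is a minimal illustration under stated assumptions: the `Listing` record and its field names are hypothetical, the "calendar month phase" is read as the month of $h$ or either of the two preceding months (following the July example), and distance is computed with the haversine formula; none of these details are confirmed by the text.

```python
import math
from dataclasses import dataclass


@dataclass
class Listing:
    """Hypothetical record for one listed house (field names are illustrative)."""
    lat: float
    lon: float
    month: int         # calendar month of the listing, 1-12
    price_decile: int  # price stratum, 1-10
    after_2020: bool   # True if listed after 1/2020


def miles_between(a, b):
    """Great-circle (haversine) distance in miles between two listings."""
    r_miles = 3958.8  # mean Earth radius in miles
    p1, p2 = math.radians(a.lat), math.radians(b.lat)
    dphi = p2 - p1
    dlam = math.radians(b.lon - a.lon)
    s = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2 * r_miles * math.asin(math.sqrt(s))


def is_match(h, c, radius_miles=0.5):
    """Apply the three caption criteria to candidate c for central house h."""
    same_phase = 0 <= h.month - c.month <= 2  # assumed: month of h or the two before
    same_stratum = abs(h.price_decile - c.price_decile) <= 1
    nearby = miles_between(h, c) <= radius_miles
    return same_phase and same_stratum and nearby


def match_sets(h, candidates):
    """Return ({N_h}_Bef, {N_h}_Aft, A_h) for a house h listed after 1/2020."""
    matched = [c for c in candidates if c is not h and is_match(h, c)]
    bef = [c for c in matched if not c.after_2020]
    aft = [c for c in matched if c.after_2020]
    return bef, aft, len(aft)  # A_h = |{N_h}_Aft|
```

No wrap-around across the new year is handled here, on the assumption that only listings within a single in-season calendar window are compared.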

FIG. 4. Estimation of housing market valuation shifts attributable to COVID-19. (A-C) $\Delta_Y$ is the distribution average of the unit-level difference $\Delta_{Y,h} = Y_{h,\mathrm{Aft}} - \langle Y \rangle_{\{N_h\}_{\mathrm{Bef}}}$ calculated for the variable $Y$ across properties listed after 1/2020. The counterfactual baseline $\langle Y \rangle_{\{N_h\}_{\mathrm{Bef}}}$ is calculated using the set of matched properties that were listed before 2020 (denoted by $\{N_h\}_{\mathrm{Bef}}$). In this way, matching facilitates a more precise estimation of the impact of COVID-19 on individual properties. Error bars indicate the standard error of the mean and stars indicate the significance level of a t-test of the null hypothesis $\Delta_Y = 0$. Each gray bar represents the difference-in-difference $\Delta\Delta_Y = \Delta_{Y,FS} - \Delta_{Y,R}$, which is an estimator for the effect of COVID-19 speculation on $Y$. Note that each market-level $\Delta\Delta_Y$ is directly comparable and consistent with the corresponding city-level treatment effect $\delta_{TE,Y}$ shown in panel (D), where San Jose and Fresno are big markets, and Merced is a small market. (A) The difference in the price estimate ($Y = P_{h,m}$; all values deflated to 1/2018 US$) shows the average price change for listings after 1/2020. (B) The difference in price change ($Y = \Delta P_{h,m}$) measures shifts in price valuations at high temporal resolution (30-day), and shows that properties listed for sale had excess price valuation relative to those listed for rent. (C) The difference in price uncertainty ($Y = U_{h,m}$) is inversely related to valuation confidence. In the case of properties listed for sale, we observe a 1-percentage-point reduction in price uncertainty, i.e., higher valuation confidence; conversely, we observe drastic price uncertainty increases for rental properties. (D,E) Summary of the COVID-19 treatment effect $\delta_{TE,Y}$ on properties listed for sale, based upon results from a two-period difference-in-difference multivariate regression model. To summarize, average percent price change values increased between 0.85 and 1.21 percentage points, and price uncertainties declined between 3 and 9 percentage points, relative to the baseline levels they plausibly would have maintained in the absence of the pandemic. Note that in both cases, this treatment effect corresponds to properties listed for sale. Error bars represent the 95% confidence interval for each point estimate; full tables of parameter estimates are reported in Tables S1-S2. Significance levels are indicated by asterisks: * p < 0.05, ** p < 0.01, *** p < 0.001.
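The matched difference-in-difference estimator described in panels (A-C) can be sketched as follows. This is an illustrative reading of the caption's definitions, not the authors' implementation: the function names are hypothetical, and the input is assumed to be a list of pairs, each holding the value of $Y$ for a house listed after 1/2020 and the values of $Y$ for its matched houses listed before 2020.

```python
from math import sqrt
from statistics import mean, stdev


def unit_differences(pairs):
    """Delta_{Y,h} = Y_{h,Aft} - <Y>_{{N_h}_Bef} for each (y_after, matched_before) pair.

    Houses with no matched pre-2020 properties are skipped (an assumption).
    """
    return [y_aft - mean(y_bef) for y_aft, y_bef in pairs if y_bef]


def mean_and_sem(values):
    """Distribution average Delta_Y and its standard error of the mean."""
    return mean(values), stdev(values) / sqrt(len(values))


def diff_in_diff(for_sale_pairs, rent_pairs):
    """Delta-Delta_Y = Delta_{Y,FS} - Delta_{Y,R}."""
    d_fs, _ = mean_and_sem(unit_differences(for_sale_pairs))
    d_r, _ = mean_and_sem(unit_differences(rent_pairs))
    return d_fs - d_r


if __name__ == "__main__":
    # Made-up toy values, for illustration only (not data from the study).
    for_sale = [(2.0, [1.0, 1.0]), (3.0, [1.0, 2.0, 3.0])]
    rent = [(1.0, [1.0]), (1.5, [1.0, 1.0])]
    print(diff_in_diff(for_sale, rent))
```

Subtracting the rental difference removes any shift common to both property types, so that the remainder is attributed to on-market speculation under the design's assumptions.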

FIG. 5. Marginal effects of local market supply and mortgage rate on price change and uncertainty. (A,B) Predictions of the relationship between the supply of alternative houses (defined as the number of matched houses within the same period as the central house listing, $A_{h,m}$) and price change $\Delta P_h$. There is a positive shift in $\Delta P_h$ of roughly 0.5 percent after 1/2020 relative to before, which diminishes at higher levels of market supply for both small and big markets. (C,D) Predictions of the relationship between the average 30-year US mortgage rate (fixed rate, shown as percentage) and $\Delta P_h$. There is a positive shift on the order of 0.4 percent for both small and big markets. (E-H) Similar to panels (A-D) but showing the OLS model predictions for price uncertainty. As expected, the uncertainty associated with COVID-19 is more clearly manifest in the market valuation uncertainty than in the price dynamics. Counterintuitively, the increased levels of uncertainty associated with the pandemic appear to have reduced uncertainty in price estimates, which points to the amplification of market speculation during this period of global stress. Shaded areas indicate the 95% confidence interval around the predicted margins of response indicated by the dashed line. All marginal effects are calculated using covariates maintained at their mean values.

FIG. S1. Quarterly price indices for several CA housing markets. Aggregate region-level house price indices produced by the Federal Reserve Bank of St. Louis (quantities are "Not Seasonally Adjusted" and "Estimated using sales prices and appraisal data."). (A) Official Price Index data available for San Jose-Sunnyvale-Santa Clara; Modesto; Merced; Fresno; Mariposa County. (B,C) Two-period percent change, $\Delta_t(\%) = 100\,(x_t - x_{t-1})/x_{t-1}$, calculated for the price indices ($x_t$) shown in panel A. Note that data for Mariposa are estimated at the annual frequency, not at the quarterly frequency as in the other cases; because the range of values is substantially higher, we show Mariposa County in a separate panel.
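The two-period percent change defined in the caption can be computed in one line. A minimal sketch follows; the function name and the index values in the usage example are hypothetical and are not taken from the FRED series.

```python
def two_period_pct_change(index):
    """Delta_t(%) = 100 * (x_t - x_{t-1}) / x_{t-1} for consecutive index values."""
    return [100.0 * (cur - prev) / prev for prev, cur in zip(index, index[1:])]


if __name__ == "__main__":
    # Made-up index levels for illustration only.
    print(two_period_pct_change([200.0, 210.0, 220.5]))
```

The output has one fewer entry than the input, since the first period has no predecessor.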

FIG. S2. Sample size and summary statistics grouped by period and property type. Observations are separated into non-overlapping subsets according to listing period and property type (For Sale and Rent). (A,B) Sample size by month. (C) Sample size as number of houses by period and type. (D) Sample size as fraction of all houses belonging to a period grouped by type, which shows common market size trends despite differences in absolute numbers. (E,F) Mean and standard deviation of ZEst house price, $P_h$. (G,H) Mean and standard deviation of ZEst house price uncertainty, $U_h$. There is a gap in data collection for month numbers 18 (June 2019) through 27 (March 2020) due to changes in the Zillow website. These changes restricted our ability to obtain the house-level identifiers (ZPIDs), which were the principal input to the Zillow API for harvesting data. Data collection recommenced in Feb. 2020, and since there is a 2-month padding to identify matched houses, this results in the data gap ending in May 2020. This data gap does not affect our ability to perform a pre/post analysis, as it falls principally during the housing market off-season of November through February, months that are excluded from our analysis. Critical changes to the entire Zillow API platform in October 2021 halted data collection entirely.
FIG. S4. Distribution of price change $\Delta P_{h,m}$ and uncertainty $U_{h,m}$ grouped by market size, period and property type. Distributions of 30-day price change, $\Delta P_{h,m}$ (A-C), and price uncertainty, $U_{h,m}$ (D-F). For each data distribution we calculated the smooth kernel density estimate by collecting data into non-overlapping subsets based upon market size (Big and Small) and sampling period (before 2020 and after 1/2020). (A) The aggregate price change distribution exhibits a leptokurtic shape (i.e., heavier-tailed than the benchmark Normal distribution), with the best-fit distribution model identified as the Cauchy-Lorentz probability density function (PDF), $P(x) \sim 1/x^2$ for $|(x - x_0)/\gamma| \gg 1$. Each dashed line corresponds to the distribution by market size: for properties in big markets, the location is $x_0 = 0.15$ and the scale is $\gamma = 2.2$; similarly, for properties in small markets, $x_0 = 0.34$ and $\gamma = 1.5$. The bulk of the data are captured by $|\Delta P_h| \leq 10$, with mean distribution values around 1.25% and 1.5% indicated by the solid vertical lines. (Inset) The price change distribution shown on log-linear axes over the full range $|\Delta P_h| \leq 40$%, with the line corresponding to the fit based upon pooled big and small market data: $x_0 = 0.2$ and $\gamma = 2.0$. The empirical data distribution is asymmetric, with empirical frequencies in excess of (less than) the best-fit Cauchy distribution for relatively large $\Delta P_h > 0$ values ($\Delta P_h < 0$ values). (B) Price change distributions for houses listed for sale, by market and period, showing excess frequency for $\Delta P_h > 0$ comparing after to before 2020, but not for $\Delta P_h < 0$. (C) Price change distributions for houses listed for rent, by market and period, where the main difference between the plots is associated with market size. Comparing panels (B) and (C), the rent distribution is less leptokurtic in the bulk and also decays faster in both the positive and negative tails. (D-F) Distributions of price uncertainty, $U_{h,m}$, indicate a skewed distribution closely centered around 10%, with mean values closer to 11% in panels D and E, which are dominated by properties listed for sale, and more variable in panel F, which represents rental properties.
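The heavy-tailed behavior of the Cauchy-Lorentz model in panel (A) can be illustrated numerically. The sketch below uses the standard Cauchy density with the location and scale values quoted in the caption; the function names and the Normal benchmark with standard deviation set equal to $\gamma$ are illustrative assumptions, chosen only to show the contrast in tail decay.

```python
import math


def cauchy_pdf(x, x0, gamma):
    """Cauchy-Lorentz density with location x0 and scale gamma."""
    return 1.0 / (math.pi * gamma * (1.0 + ((x - x0) / gamma) ** 2))


def normal_pdf(x, mu, sigma):
    """Normal density, used here only as a thin-tailed benchmark."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


# Best-fit parameters quoted in the caption.
BIG = {"x0": 0.15, "gamma": 2.2}
SMALL = {"x0": 0.34, "gamma": 1.5}

if __name__ == "__main__":
    for name, p in (("big", BIG), ("small", SMALL)):
        # For |x - x0| >> gamma the density decays as 1/x^2, so doubling x
        # should reduce the density by a factor close to 4.
        ratio = cauchy_pdf(20.0, **p) / cauchy_pdf(40.0, **p)
        print(name, round(ratio, 2))
```

By comparison, a Normal density with a similar width is vanishingly small at a 20% price change, which is why the Cauchy-Lorentz form captures the large price changes observed in the data better than a Normal benchmark.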

FIG. S5. Distributions of differences in matched-house price change and price uncertainty grouped by market size and property type. Shown are the full distributions of match differences to supplement the mean match difference values ($\Delta_Y$) reported in Fig. 4. For each house $h$ listed after 1/2020 we calculate the difference $\Delta_{Y,h} = Y_h - \langle Y \rangle_{\{N_h\}}$, where the second term is the average value of $Y$ calculated across the set of matched houses $\{N_h\}$ that were listed before 2020.
Schematic of quasi-experimental design for estimating the magnitude of price shifts attributable to COVID-19 market speculation. (A) Shown is a Zillow webpage for an actual on-market property listed for sale. Red highlights indicate the primary source data obtained from the open-access Zillow Inc. GetSearchResults API; yellow highlights indicate additional standardized data that feed into the proprietary Zillow Inc. algorithm that yields real-time estimates for $P_h$, $\delta P_h$, $P^{+}_h$ and $P^{-}_h$.

TABLE S1. Two-period Difference-in-Difference model with dependent variable $\Delta P_{h,m}$. Ordinary Least Squares (OLS) model implemented separately for the three regions with sufficient rental data, which serve as the comparative DiD group, corresponding to the $I_{h,\mathrm{ForSale}}$ baseline: if $h$ is listed for sale, then $I_{h,\mathrm{ForSale}} = 0$, and $= 1$ otherwise. Coefficients estimated with property type interaction are indicated by [For Sale] and [Rent]. The coefficient $\delta_{TE,\Delta P}$ corresponds to the COVID-19 treatment effect on 30-day percent price changes, $\Delta P_{h,m}$, and is visualized in Fig. 4(D). Note that $T_m$ is the time period variable, taking the value 1 if the listing occurred after 1/2020, and 0 if before 2020. Parameter estimate p-values are shown in parentheses below each point estimate. OLS regression implemented in STATA 13 using "reg", calculated with robust standard errors. Factor variables included but not reported in the table below: ZEst price decile $D_{h,m}$, which ranges from 1 to 10; dummy variable for calendar month, $C_m$, which ranges from 3 (March) to 10 (October), capturing the intra-annual housing market cycle. See Fig. S8 for the cross-correlation matrix across the principal model covariates.
Treatment effect: $\delta_{TE,\Delta P}$ ($I_{h,\mathrm{ForSale}} \times T_m$). p-values in parentheses; * p < 0.05, ** p < 0.01, *** p < 0.001.

TABLE S2. Two-period Difference-in-Difference model with dependent variable $U_{h,m}$. Ordinary Least Squares (OLS) model implemented separately for the three regions with sufficient rental data, which serve as the comparative DiD group, corresponding to the $I_{h,\mathrm{ForSale}}$ baseline: if $h$ is listed for sale, then $I_{h,\mathrm{ForSale}} = 0$, and $= 1$ otherwise. Coefficients estimated with property type interaction are indicated by [For Sale] and [Rent]. The coefficient $\delta_{TE,U}$ corresponds to the COVID-19 treatment effect on the percent price uncertainty, $U_{h,m}$, and is visualized in Fig. 4(D). Note that $T_m$ is the time period variable, taking the value 1 if the listing occurred after 1/2020, and 0 if before 2020. Parameter estimate p-values are shown in parentheses below each point estimate. OLS regression implemented in STATA 13 using "reg", calculated with robust standard errors. Factor variables included but not reported in the table below: ZEst price decile $D_{h,m}$, which ranges from 1 to 10; dummy variable for calendar month, $C_m$, which ranges from 3 (March) to 10 (October), capturing the intra-annual housing market cycle. See Fig. S8 for the cross-correlation matrix across the principal model covariates.
Treatment effect: $\delta_{TE,U}$ ($I_{h,\mathrm{ForSale}} \times T_m$). p-values in parentheses; * p < 0.05, ** p < 0.01, *** p < 0.001.

TABLE S3. Aggregate model of properties listed 'For Sale' with city fixed effects. Parameter estimates for the model yielding the marginal effects plotted in Fig. 5. $S_h$ is a binary indicator variable coding the market size of each city (big or small). Ordinary Least Squares (OLS) model implemented in STATA 13 using "areg" with city-level fixed effects, and calculated with robust standard errors. Parameter estimate p-values are shown in parentheses below each point estimate.
Dependent variables (columns): $\Delta P_{h,m}$ (%) and $U_{h,m}$ (%). Reported covariate: average US 30-year fixed mortgage rate, $\beta(M_m)$. p-values in parentheses; * p < 0.05, ** p < 0.01, *** p < 0.001.