Determinants of rural creative microclustering: Evidence from web-scraped data for England

This study aims to compare the drivers of clustering of rural and urban creative industries in England, UK. We use pre-pandemic web-scraped data from 154,618 creative industry organisations in England, and use a novel technique to identify 71 distinct rural creative ‘microclusters’ of geographically proximate creative firms. We then consider the role of place-based assets and agglomeration in the presence of microclusters at a micro-level geography and find that the determinants of microclustering are generally consistent between rural and urban areas. On that basis we argue that policies to support creative clusters may drive rural regional development.


Introduction
The creative industries (CIs) play an increasingly vital role in the global economy. Prior to the COVID-19 pandemic, they were the fastest-growing industry sector (UNCTAD, 2018); in the EU, cultural enterprises had a combined turnover of over 401 billion euros in 2019 (Eurostat, 2021). The sectors that make up the CIs (and related terms and framings such as 'creative and cultural industries' and 'creative economy') have multiple, contested definitions 1 but are generally characterised by having high levels of human creativity as inputs, containing symbolic meaning for users, and potentially containing an element of intellectual property (Throsby, 2008). From a regional policy perspective, CIs are particularly interesting because they are characterised by high levels of agglomeration and clustering (Berg & Hassink, 2014; Bloom et al., 2020; Gong & Hassink, 2017), which can help to drive regional economic performance (Boix et al., 2015; Boix-Domenech & Soler-Marco, 2017; Crociata et al., 2018; OECD, 2019). In line with the broader literature on agglomeration and cities (Duranton & Puga, 2004), creative clusters are most often associated with urban areas (e.g. Berg & Hassink, 2014; Boix et al., 2015; Coll-Martinez et al., 2019; Lazzeretti et al., 2008; Lorenzen & Frederiksen, 2008). Despite this urban focus, there is also a growing body of evidence that rural areas can benefit from the presence of CI activities (Bell & Jayne, 2010; Hill et al., 2021; Mahon et al., 2018; Townsend et al., 2017).
Rural firms are impacted by their locations: rural areas are characterised by a lower population, business and infrastructure density, which affects access to local customers, appropriately skilled staff, and business support (Bosworth & Turner, 2018; Lee & Cowling, 2015). In light of these challenges, neo-endogenous development models in rural areas (e.g. Ray, 2001, 2006) have argued that local place-based assets have the potential to drive the development of rural areas (Naldi et al., 2021) and 'culture economies' more broadly (Argent, 2019; Phillip & Williams, 2019). The potential contribution of rural CIs to economic development has prompted calls to rethink our conceptualisation of creative clusters to include rural, as well as urban, areas (Escalona-Orcao et al., 2016; Harvey et al., 2012).
This article aims to investigate the nature of creative clustering in rural areas. It does so by adopting the concept of the 'microcluster' (Boix et al., 2015;Siepel et al., 2020), which proposes that agglomerations can manifest on a smaller scale of 50 or more businesses in a proximate area. We apply the microcluster concept to rural CIs in England and use the resulting clusters to compare the determinants of clustering in rural and urban settings, with particular attention to the role of local assets as the basis for clustering.
We identified these microclusters using a novel technique whereby we used data scraped from 154,618 CI websites in England, UK, which we then mapped down to the street level. We used a density-based clustering method to identify clusters 2 of geographically close rural creative firms. We then aggregated the number of firms in microclusters in fine, granular geographies and estimated a series of regression models to identify the determinants of firms' being based within a microcluster. We identified 71 rural microclusters, representing 38% of all rural organisations in our sample. Our analysis shows that the determinants of microclustering are broadly consistent between rural and urban areas. We observe strong associations between cultural amenities and clustering, but find limited and subsector-specific association between natural amenities and clustering at the fine geographical level. We also find a weaker association between networking activities and clustering in rural areas than in urban areas.
This article makes two contributions to the literature: first, it presents a novel way of mapping microclusters using scraped web data, which we employ on a larger geographical area (e.g. an entire nation) than has been done previously to our knowledge. Second, using the clusters derived from this analysis we show that drivers of clustering are generally consistent between rural and urban areas. We use our findings to argue that policies to support creative clusters should include rural areas, and that efforts to support formation of rural creative networks might pay dividends.
The article has the following structure. In Section 2 we provide some previous evidence. Section 3 briefly explains the data and the methodology. Section 4 presents and discusses the estimation results. Finally, in Section 5 we conclude.

Creative clusters, rural spaces and agglomeration
The CIs are widely recognised to be highly clustered, and companies in these sectors are bound to the locations in which they operate (Berg & Hassink, 2014; Boal & Herrero, 2017; Bloom et al., 2020; Domenech et al., 2011; Lazzeretti et al., 2008; Mateos-Garcia & Bakhshi, 2016). The large body of literature on creative clusters (see reviews in Berg & Hassink, 2014; Bloom et al., 2020; and Gong & Hassink, 2017) generally characterises creative clusters as effectively urban phenomena, in line with broader literature identifying agglomeration as largely taking place in cities (see Duranton & Puga, 2004).
An implicitly urban-focused conceptualisation of creative clusters neglects the possibility that CIs, and creative clusters specifically, may play a role in driving rural economic development (Bell & Jayne, 2010; Darchen, 2016; Harvey et al., 2012). A growing literature shows that creative businesses play a significant role in rural economies and societies (Balfour et al., 2018; Hill et al., 2021; Mahon et al., 2018; Townsend et al., 2017). Moreover, there is also evidence that creative clustering can be important in rural settings (Escalona-Orcao et al., 2016; Harvey et al., 2012; Roberts & Townsend, 2016) and that interventions to co-locate creative activity in rural areas, such as creative hubs, can play an important role in mimicking the spatial clustering seen in urban spaces (Hill, 2021; Hill et al., 2021; Merrell et al., 2021; Pratt, 2021).
If rural areas are to benefit from creative clusters, it is also important to recognise the distinctive socio-spatial context in rural areas (Bosworth & Turner, 2018) to which firms must adapt. Rural areas have a lower population density, and hence smaller local/regional numbers of consumers and choice of appropriately skilled employees (Lee & Cowling, 2015). Similarly, the lower density of businesses means less local competition but also fewer business customers, which then necessitates an earlier engagement with exporting activities out of the region and internationally (Dubois et al., 2012; Lee & Cowling, 2015; Mole et al., 2022). Rural businesses face a business support infrastructure that is less dense, poorer access to public transport (Bosworth & Turner, 2018) and inconsistent internet connectivity (Whitacre et al., 2014), all of which can impact firm performance and contribute to, or motivate, clustering in rural areas. We aim here to explore how local place-based assets, as well as agglomeration spillover effects, impact creative clustering in rural areas.

Place-based assets, creative clustering and agglomeration spillover effects
There is established evidence on the importance of institutional factors (cultural and human capital) in the formation of creative firms and the location of clusters (e.g. Boix et al., 2013; Cerisola & Panzera, 2022; Cooke & Lazzeretti, 2008; Lazzeretti et al., 2012), specifically in the rural context (McGranahan et al., 2010; Naldi et al., 2015). Moreover, there is also evidence suggesting that local factors, such as public transport access, infrastructure, and access to natural spaces, make locations more attractive (Gottlieb, 1995; Naldi et al., 2021) and that these location-specific factors may play a stronger role in driving rural economic development than they do in urban areas (Naldi et al., 2021). This points to a way in which local rural assets may prove to be a basis for growth. These rural assets may serve as an alternative basis for economic development instead of more traditional endogenous growth factors in urban regions (e.g. Duranton & Puga, 2004), as well as Jacobs-style economic diversity. Location-specific factors may then provide a basis for attracting and absorbing outside knowledge (Trippl et al., 2015) that might otherwise be lacking. All of these factors are particularly
H1: Creative clustering is likely to be associated with presence of cultural amenities in both rural and urban areas.

Local social capital
Another potential but distinct factor that could drive clustering is local social capital, proxied by the existence of community organisations and venues that facilitate informal social interactions and networking, such as community centres in a neighbourhood, meetup groups, co-working spaces and coffee shops (as used in Hoyman & Faricy, 2009). These organisations, while being location-specific factors, represent an alternate means of building and maintaining creative and cultural networks (Harvey et al., 2012), while also generating a sense of community belonging (Andres & Round, 2015). Hoyman and Faricy (2009) found this type of social capital to be closely associated with other measures of human and intellectual capital, as well as wage growth in US metropolitan areas. Given the association between social capital and economic development (e.g. Iyer et al., 2005; Woodman, Sawyer and Griffin, 2006), we would expect local social capital to be associated with creative clustering in both rural and urban settings.
H2: Creative clustering is likely to be associated with presence of venues facilitating informal social networking in both rural and urban areas.

Natural amenities
There is a mixed set of evidence about the importance of natural amenities (or nature-based amenities) for rural economic development (Power, 2005). Deller et al. (2008) found a positive relationship between amenities and regional economic growth, and Naldi et al. (2021) found a positive relationship between natural amenities and new firm formation in Sweden. In contrast, using UK data, Agarwal et al. (2009) did not find significant relationships between a natural beauty index and the economic performance of English rural local authorities. However, given the importance placed on outdoor amenities as a quality-of-life attribute for the rural creative class (McGranahan & Wojan, 2007; McGranahan et al., 2010), we would expect natural amenities to be associated with the presence of creative clusters. In contrast, we would not expect natural amenities to be associated with urban clusters (following, for instance, the finding in Naldi et al., 2021).
H3: Creative clustering is likely to be associated with natural amenities in rural areas, but not urban areas.

Local knowledge environment
Many universities and colleges cooperate with local businesses to ensure their offerings meet the skills needs of their respective region. Valero and Van Reenen (2019) also show a positive spillover effect from universities to their closest neighbouring regions. These institutions may also be a source of ideas or cultural amenities (Combes, Duranton & Gobillon, 2011), although companies in clusters may have a negative or ambivalent view about the impact of universities in supporting their activities (see also Chapain et al., 2010). Despite this, there is evidence that universities can be a source of knowledge spillovers in urban as well as rural settings, although in the latter case the spillover effects may be more localised (Andersson et al., 2009). Agglomeration benefits from universities may be generated by infrastructure such as innovation parks (Rosenthal & Strange, 2020), as well as entrepreneurship from graduates (Kitagawa et al., 2022). In non-urban areas, Kitagawa et al. (2022) found that social sciences and humanities graduates are more likely to start businesses near their universities. It is not unreasonable, therefore, to presume that this could be a factor associated with creative microclustering.
H4: Creative clustering is likely to be positively associated with proximity to universities in urban areas and rural areas.

Agglomeration spillovers
Beyond the structural factors above, there is some limited evidence about the nature of agglomeration economies in rural settings. Two crucial aspects of agglomeration economies relate to industry specialisation and diversity and, more recently, to the concepts of related diversity and unrelated diversity (Frenken et al., 2007). Related diversity in the regional context refers to the presence of sectors in a region that have related or complementary capabilities, assets or knowledge. Unrelated diversity, on the other hand, refers to industries that are not related or complementary to each other. The general argument is that knowledge spillovers depend on firms being in close cognitive proximity or relatedness (manifested by the homogeneity of capabilities, skills, and knowledge base); that is, similar sectors are more likely to have higher knowledge spillovers. This type of proximity is assumed to generate an interactive learning environment where firms can discover, interact, learn and innovate (Boschma, 2005; Boschma, 2017; Boschma et al., 2015). Related diversity and clustering occur when different industries within a region are complementary and supportive of each other. For instance, the presence of a strong technology sector can support the growth of other industries such as finance and professional services, leading to a related diversity of industries. This related diversity creates a supportive ecosystem that enables knowledge spillovers and economies of scope, leading to increased competitiveness and innovation.
Unrelated diversity and clustering, on the other hand, occur when different industries within a region are not complementary or supportive of each other. For instance, the presence of a large manufacturing sector in a region may not support the growth of a biotechnology or financial services sector, leading to unrelated diversity. Similarly, the clustering of unrelated industries can lead to competition for resources, rather than collaboration and sharing of knowledge, and thus to reduced innovation and competitiveness. The impact of related and unrelated diversity in rural (as opposed to urban) settings is still emerging in the literature. In the context of rural areas, Naldi et al. (2021) recently found that both related and unrelated diversity were associated with new firm formation. We would therefore anticipate a similar positive relationship between both related and unrelated diversity and the presence of creative clusters.
H5: Creative clustering is likely to be positively associated with related diversity in urban and rural areas.
H6: Creative clustering is likely to be positively associated with unrelated diversity in urban and rural areas.

Methodology and data
Identifying clusters is not a trivial task (see Bergman & Feser, 1999), with methods including qualitative identification alongside index-based indicators such as location quotients (LQs), concentration indexes and input-output analyses. 3 More recent applications use spatial statistics, in which the analysis of agglomeration puts a great deal of emphasis on space, distance and spatial dependence (van Oort, 2017). Our approach relies on the use of spatial statistics, as these statistics offer some added advantages compared to other methods. First, they induce measurement improvements in the exact definition of agglomeration as the distance becomes more functional in character. Second, they offer a finer spatial scale than metropolitan areas (cities, commuting zones, local authorities), shedding light on intra-urban dependency (van Oort, 2017; Wallsten, 2001) and, for our purposes, allowing us to elucidate levels of clustering in rural areas, where skewness of population centres means that average measures such as LQs may smooth out activity hotspots.
Our approach in this paper draws upon this spatial approach to clusters by adopting the concept of the creative 'microcluster' (Boix et al., 2016; Siepel et al., 2020) as the basis for our analysis. The 'microcluster' concept is based on the idea that agglomeration dynamics in the CIs occur within the first kilometre of a business's location (Arzaghi & Henderson, 2008; Coll-Martinez, 2019; Coll-Martinez et al., 2019). On this basis, examining microclusters, which have previously been defined as consisting of 50 or more proximate businesses (Boix et al., 2016; Siepel et al., 2020), has the potential to provide a granular means of identifying localisation economies at a fine level of geography. This approach can therefore allow us to capture smaller agglomerations, not just in rural areas (where these might be smoothed out using LQs) but also in urban areas, where microclusters can identify specific neighbourhood-level agglomerations.
Our approach to identifying these microclusters draws upon recent studies that have explored clustering at a fine geographical level using data scraped from company websites (e.g. Papagiannidis et al., 2018; Rammer et al., 2020; Siepel et al., 2020; Stich et al., 2022). These studies generally use the addresses provided on company websites to identify and inductively map clusters of activity, often in a way that differs from, or is more insightful than, standard SIC codes (as in Papagiannidis et al., 2018). Our approach differs from previous studies in that, whereas other studies looking at clustering using web data have looked at clustering in an individual neighbourhood or city (Rammer et al., 2020; Stich et al., 2022) or one region (Papagiannidis et al., 2018), we attempt to map across the whole of England, and we try to use this data to identify clusters in rural areas. To our knowledge, neither of these has been done previously.

Data
The data we used was collected by the analytics company Glass.ai in 2019. From a sample of all UK-based websites (approximately 2,690,395 in total), we sought to identify CI websites that provided physical addresses, and from those to identify those in rural areas. We then used very fine-grained geographical units to identify local assets and to model the relationship between creative microclustering and the assets discussed above. Figure 1 shows our empirical approach.

Defining the creative industries
There are a number of terms that are widely used as underlying creative clusters: creative industries, cultural and creative industries and creative economy, among others, can all be used in association with the concept of creative clusters, and all have slightly distinct meanings (see for instance Higgs & Cunningham, 2008). Our interest, for the purposes of the analysis we conducted, was a bespoke extension of the UK creative industries definition by the Department for Culture, Media and Sport (DCMS), which identifies nine subsectors: advertising and marketing; architecture; crafts; design; film, TV and radio; IT and software; publishing; museums and libraries; and music and performing arts (DCMS, 2016), which is the de facto definition used in the UK. Given the nature of the questions we are asking, particularly relating to local cultural amenities, a definition of CIs that includes cultural organisations risks double-counting these organisations on both sides of any model. Therefore, to address this issue, we adopted the DCMS definition but excluded organisations in the museums and libraries, music and performing arts sectors. This selection means that we excluded some key cultural elements in the 'creative and cultural industries' definition; however, the remaining sectors allowed us to explore clustering in related sectors.
Operationalising this definition is also potentially a challenge, as the DCMS definition is based upon four-digit Standard Industry Classification (SIC) codes, and our data is SIC code-agnostic. The scraped web data that we used is inductively classified into 109 broad sectors. Our definition of 'creative industries' is, therefore, all of the broad sectors from the web data that map onto one of the seven subsectors from the DCMS definition: advertising and marketing (the 'marketing and advertising' and 'public relations and communications' categories in the scraped data); architecture ('architecture and planning'); crafts ('arts and crafts'); design ('design' and 'apparel and fashion' (excluding fashion retail)); film, TV and radio ('animation', 'broadcast media', 'media production', 'motion picture and film' and 'photography'); IT and software ('computer games', 'computer software'); publishing ('newspapers and magazines', 'online media', 'publishing', 'translation' and 'writing and editing').

Identifying websites with addresses
Applying this definition to our data resulted in a sample of 361,459 websites that had text indicating participation in one of the sectors named above. But, importantly, not all websites had addresses on them. Businesses may choose to list their address on their website if they want their customers to find them easily, but equally businesses may choose not to list their address (for instance if they do not have an office, or work from home, or do not wish to be visited by members of the public). When we removed the websites that did not list an address, we were left with a working sample of 154,618 creative organisations with postcodes in England. This meant that 42.8% of websites listed an address. While this is considerably higher than the 24% of websites containing addresses in Papagiannidis et al.'s (2018) study, it is possible that this represents a bias in our results, and to this end Table 1 summarises the count of businesses in the overall sample as well as the sample with addresses. Generally speaking, the percentage composition of the samples is remarkably consistent between the two samples. Where there is variation (notably design, film and TV, although some other subsectors as well), we hypothesise that these are in sectors characterised by high levels of self-employment and freelance work, in which workers might not keep consistent offices. Measuring the location and activities of freelancers is a substantial problem for any effort to map CIs (see Paneels et al., 2021, for a summary of the issues). While it would obviously be preferable for all websites to list an address, the coverage we have in this dataset is clearly unique and is likely superior in terms of coverage of freelance workers compared to other forms of business registration.

Identifying rural businesses
We identified 21,124 creative companies located in rural settings as characterised by the UK Office for National Statistics definition of rural areas for England (ONS, 2016), based on the postcodes provided in the analysis. These companies represented 14% of the total sample. Figure 2 demonstrates that about half of the sample firms were located in rural villages and dispersed, and 43% operated within rural towns and the periphery in England.

Identifying microclusters
From the geo-located data, we determined whether a firm was in a microcluster (i.e. a small concentration or group of firms that are close to each other). We implemented a self-adjusting (HDBSCAN) clustering method (Campello et al., 2013; McInnes et al., 2017) to detect areas where companies were concentrated, as distinct from locations in sparsely populated or empty areas. The clustering method employs an unsupervised machine-learning clustering algorithm to identify a range of distances to separate clusters of varying densities from sparser noise. The algorithm computes hierarchical estimates and scores the outlierness of each data object, extracting local clusters based on a cluster tree. 4 The algorithm creates the most stable clusters that incorporate as many firms as possible without incorporating noise. Appendix 1 gives further details behind the clustering method.
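As an illustration only, the sketch below shows the first step of HDBSCAN as described by Campello et al. (2013): each point receives a core distance (the distance to its k-th nearest neighbour), and pairwise distances are replaced by the mutual reachability distance, which pushes sparse points away from dense groups. It is not the authors' implementation. The coordinates, the function names, and the convention of excluding the point itself when counting neighbours are all assumptions made for this example.

```python
import math

def euclidean(p, q):
    """Straight-line distance between two points in projected coordinates (metres)."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def core_distances(points, k):
    """Distance from each point to its k-th nearest neighbour (the point itself excluded)."""
    cores = []
    for i, p in enumerate(points):
        dists = sorted(euclidean(p, q) for j, q in enumerate(points) if j != i)
        cores.append(dists[k - 1])
    return cores

def mutual_reachability(points, k):
    """Symmetric matrix of max(core(a), core(b), d(a, b)); the diagonal is left at 0."""
    cores = core_distances(points, k)
    n = len(points)
    matrix = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(a + 1, n):
            d = max(cores[a], cores[b], euclidean(points[a], points[b]))
            matrix[a][b] = matrix[b][a] = d
    return matrix

# Hypothetical firm locations in metres: three firms close together, one isolated.
firms = [(0.0, 0.0), (100.0, 0.0), (0.0, 100.0), (5000.0, 5000.0)]
mr = mutual_reachability(firms, k=2)
```

In this toy example the three nearby firms keep small mutual reachability distances, while every distance involving the isolated firm stays in the thousands of metres, which is what lets the later hierarchical step treat it as noise.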
To identify the threshold of values of what constitutes a 'microcluster', the algorithm requires only one input: the minimum size per cluster. We allowed the number of firms in a cluster to range from a minimum cluster size of 50 firms up to N. The threshold of a minimum of 50 firms has been used in previous microcluster studies, including Boix et al. (2016) and Siepel et al. (2020). This threshold could reasonably capture effects at an immediately proximate area as being creative clusters. Looking at the number of neighbours at different radii (Table 2), we see that, up to one kilometre, the average number of neighbours was 14 firms. Up to five kilometres, the average number of neighbours was 44. Our threshold of 50 firms per cluster is, therefore, a relatively conservative measure in capturing hotspots of rural firms in a radius of about one to five kilometres. Previous evidence for the CIs has shown that agglomeration dynamics in the CIs occur within the first 500 metres to one kilometre (Coll-Martinez, 2019; Coll-Martinez et al., 2019). Tables A1a and A1b in the appendix show a summary of a sensitivity analysis that we carried out, testing different threshold measures. We see that within the same radius the median and the average number of microclustered firms remain relatively consistent even as the minimum size of the cluster changes.
Through the application of the density-based clustering method, we identified 71 rural creative microclusters across England. Overall, about 38% of rural firms in the sample were in a microcluster (see Table 3). The fraction of firms in microclusters by subsector in Table 3 varied across sectors, with the IT/software sector the most likely to be located in a microcluster. We explore potential explanations for this finding in Section 4.2. Figure 2 displays the clusters identified. Looking at the microclusters map, we can see that some clusters were on the periphery of large cities, such as those surrounding London, Manchester or Birmingham, while others, such as some in the north-west of England, were near national parks like the Lake District National Park.
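The neighbour counts at different radii can be illustrated with a minimal sketch. It assumes firm locations are already in a projected coordinate system measured in metres, so that Euclidean distance is adequate; the coordinates and the function name are hypothetical and do not come from the study's data.

```python
import math

def mean_neighbours(points, radius_m):
    """Average number of other firms lying within radius_m of each firm."""
    total = 0
    for i, p in enumerate(points):
        for j, q in enumerate(points):
            if i != j and math.hypot(p[0] - q[0], p[1] - q[1]) <= radius_m:
                total += 1
    return total / len(points)

# Hypothetical firm coordinates in metres (not taken from the study).
firms = [(0, 0), (400, 0), (0, 900), (3000, 0), (20000, 20000)]

within_1km = mean_neighbours(firms, 1_000)
within_5km = mean_neighbours(firms, 5_000)
```

The pairwise loop is quadratic in the number of firms, which is fine for a toy example; for a sample of the size used here a spatial index would be the more practical choice.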

Regression analysis
With the mapping complete, the next step was to identify the determinants of clustering. We estimated a set of regression models, intending to analyse how certain local factors were associated with the location of rural-urban creative microclusters. We explain the selection of the variables in more detail below.

Dependent variable
We summed the number of creative firms in microclusters across a Lower Super Output Area (LSOA). The sum corresponds to the total number of firms in microclusters at each LSOA within one kilometre of the LSOA centroid. An LSOA is a census dissemination unit that represents homogeneous neighbourhoods of 1,500 residents on average, and it is the smallest geographical unit used by the UK Office for National Statistics. Using LSOAs as a measure has the advantage of capturing microclustering dynamics at a very granular level (i.e. close to a neighbourhood), while offering the opportunity to use centroids to aggregate headcount information at a radius of one kilometre of the LSOA centroid to construct our control variables. 5 Previous empirical analyses have also shown that creative firms only benefit from localisation economies within the first kilometre (Arzaghi & Henderson, 2008; Coll-Martinez, 2019; Coll-Martinez et al., 2019).
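A minimal sketch of this aggregation step is given below, assuming that firms and centroids are stored as latitude/longitude pairs and that great-circle (haversine) distance is an acceptable approximation at a one-kilometre scale. The data layout, function names and coordinates are assumptions for illustration rather than the authors' procedure.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def clustered_firms_near(centroid, firms, radius_km=1.0):
    """Count firms flagged as microclustered within radius_km of a centroid."""
    lat0, lon0 = centroid
    return sum(
        1
        for lat, lon, in_cluster in firms
        if in_cluster and haversine_km(lat0, lon0, lat, lon) <= radius_km
    )

# Hypothetical LSOA centroid and firms as (lat, lon, in_microcluster).
centroid = (54.0, -2.0)
firms = [
    (54.0, -2.0, True),     # at the centroid
    (54.005, -2.0, True),   # roughly 0.56 km north
    (54.005, -2.0, False),  # same place, but not in a microcluster
    (54.02, -2.0, True),    # roughly 2.2 km north, outside the radius
]
count = clustered_firms_near(centroid, firms)
```

The same counting function could serve for the amenity variables by passing amenity locations instead of firms, with the flag set to true for every record.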

Explanatory variables
Our interest in this empirical exercise is to test the role of different types of location-based amenities and the learning aspect of agglomeration economies, with a focus on rural creative microclusters. Our selection of explanatory variables drew on prior studies of CI location and general firm location studies. We geocoded data from several databases to identify the following set of regressors.
Neighbourhood supply of cultural amenities. We defined cultural amenities as including museums, public galleries, heritage sites, libraries, archives and science centres. We calculated the number of cultural institutions within a one-kilometre radius of the centroid of each LSOA (ln_cultural_inst). The data used was based on the comprehensive Culture24 database 6 of UK cultural organisations, which covered 11,304 listings, for which 10,571 places were geocoded and merged to our main data.
Local cultural and creative industry-related networks. This variable captures opportunities for networking through informal social connections via social networks, fairs and venues. To account for this, we used the Culture24 dataset to identify the number of organisations dedicated to exhibitions, campaigns and initiatives, festivals, cultural and scientific meetings within a one-kilometre radius of the centroid of each LSOA 7 (ln_cultural_network).
Neighbourhood supply of nature-based amenities. Place-bound resources may be exploited by firms if they locate nearby. Naldi et al. (2021), for instance, showed that both urban and rural firms derive a positive benefit from natural amenities such as natural areas and parks. Our measure of natural amenities (ln_nature) summarises the number of gardens, environmental and ecological centres, national parks and areas of outstanding natural beauty within a one-kilometre radius of the centroid of each LSOA.
Local knowledge environment and local labour pool. To proxy the local knowledge environment, we controlled for the distance (in kilometres) from the LSOA centroid to the nearest higher education institution (universities and further education colleges) (ln_distance_HEI). 8
5 There are 34,753 LSOAs in England and Wales. While the full dataset contains this number of LSOAs, some of the crucial control variables that we employ in our models are not available for all LSOAs. Therefore, in subsequent analyses we present data on the sample of LSOAs for which information on all relevant variables is available.
6 Data was drawn from Culture24, a private organization that operates in the UK and has the rights to the most complete data of cultural amenities available in the country. We thank Culture24 for providing an API to access the data. Culture24 does not bear any responsibility for the analysis of the data.
7 Data also collected from Culture24. See footnote 6 for further details.
8 Data retrieved from the Historic England 'Register of Historic Parks and Gardens of special historic interest in England' available at: https://www.data.gov.uk/dataset/88cfe0de-85cd-431f-9836-2bee841d8165/registeredparks-and-gardens-gis-data
Agglomeration spillovers. Evolutionary economic geography argues that both cognitive proximity and geographical proximity are important in the flow of knowledge through regions (Boschma & Martin, 2007). Drawing on this notion, we introduced three measures usually applied in the empirical literature on industrial location. First, we controlled for industry composition by computing two diversity indexes of all industry sectors at the LSOA level, following Frenken et al.'s (2007) and Wixe and Andersson's (2017) approach. The first index refers to unrelated diversity (UD), which captures the extent to which firms operate in different industries that share few or limited similarities within the local area. Operationally, UD measures the distribution of employees in the neighbourhood between two-digit industries. 9 The second index, related diversity (RD), captures the extent to which firms operate in different industries that share a number of similarities. The index measures the distribution of employees between five-digit industries within each two-digit sector. The concepts of related diversity and unrelated diversity reflect the level of regional/local specialisation: a low level of regional specialisation could be an indication of a high level of related or unrelated industrial diversity (Aarstad et al., 2016). 10 The third measure captures neighbourhood specialisation using employment-based location quotients (LQs) at the LSOA level. LQs are computed for CIs, manufacturing, services-based activities and knowledge-based activities (following the use of similar measures in Arauzo et al., 2010;Cruz & Teixeira, 2021;Lazzeretti et al., 2012).
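The three measures described above can be illustrated with a short sketch. It assumes the entropy formulation commonly associated with Frenken et al. (2007), with base-2 logarithms, and a location quotient benchmarked against a national total; the function names, the industry codes, the employment figures and the choice of benchmark are hypothetical and may differ from what the authors computed.

```python
import math
from collections import defaultdict

def entropy(shares):
    """Shannon entropy (base 2) of shares that sum to 1; zero shares add nothing."""
    return sum(p * math.log2(1.0 / p) for p in shares if p > 0)

def diversity_indexes(employment):
    """Return (UD, RD) for one area with positive total employment.

    employment maps a five-digit industry code (string) to an employee count.
    Assumption: the first two characters of the code give the two-digit sector.
    """
    total = sum(employment.values())
    by_sector = defaultdict(dict)
    for code, n in employment.items():
        if n > 0:
            by_sector[code[:2]][code] = n

    sector_shares = []
    rd = 0.0
    for codes in by_sector.values():
        sector_total = sum(codes.values())
        p_sector = sector_total / total
        sector_shares.append(p_sector)
        # RD: entropy within each two-digit sector, weighted by the sector share.
        rd += p_sector * entropy([n / sector_total for n in codes.values()])
    # UD: entropy of employment across two-digit sectors.
    return entropy(sector_shares), rd

def location_quotient(local_sector, local_total, national_sector, national_total):
    """Local employment share of a sector relative to its national share."""
    return (local_sector / local_total) / (national_sector / national_total)

# Hypothetical area: two five-digit industries in sector 62, one in sector 90.
ud, rd = diversity_indexes({"62011": 50, "62012": 50, "90010": 100})
lq = location_quotient(20, 200, 5, 100)
```

In this toy area, employment is split evenly across two two-digit sectors (UD of one bit), and only one of those sectors is itself split across five-digit industries, which is what RD picks up. An LQ above 1 indicates that the area is relatively specialised in the sector.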
We also controlled for the presence of agglomeration and urbanisation economies. We first used population density (ln_pop), as densely populated areas display more interactions between economic agents (Rodriguez-Pose & Hardy, 2015). We measured population density as the number of inhabitants per square kilometre at the LSOA level. 11 Second, we controlled for the lack of affordability of local offices and spaces, which is also considered a proxy for agglomeration economies (Andres & Round, 2015;Cruz & Teixeira, 2021;Drennan & Kelly, 2011). 12 We proxied this by including rateable value per square metre at the LSOA level as a control variable (ln_rate). Rateable value is a measure collected by the UK Valuation Office Agency as an indicator of the rental value of business premises, used for the purposes of assessing business tax. We also included the squared term of ln_rate in our models to account for non-linearity. Another control is the distance of each LSOA to the main city (distance2city). For this purpose, we used the major towns and cities statistical geography, which provides a precise definition of the most important cities and towns in England. 13 Being closer to a core city or area may bring potential economic size benefits (Hanson, 2001). Furthermore, to account for regional economic aspects that can drive clustering, we controlled for the level of unemployment at the district level. 14 As shown by Duranton and Puga (2004) and Glaeser et al. (2015), firms prefer to locate in areas with enough workers. 15 We also controlled for the geography of relative affluence in different LSOAs (rel_affluence), using the UK Index of Multiple Deprivation, which ranks all LSOAs by relative affluence and attributes each LSOA to the decile corresponding to its rank. We use this decile measure for our affluence control. 16 Finally, we controlled for the presence of connectivity infrastructure by including broadband speed at the local level (ln_internet_speed).
In the estimated model, the dependent variable (mc_stock) is the number of creative firms in microclusters in area i. Two samples were used to estimate this model, using the same set of explanatory variables and including dummies for the commuting hinterland where most people work (also known as travel-to-work areas, TTWAs), represented by α. The first sample corresponds to 4,702 rural areas located in England. The second sample covers 18,765 urban areas. We separated these two samples to investigate the reasons for apparent differences in creative microcluster locations in rural and urban areas. For estimation purposes, the reference period for our dependent variable is the year 2019, whereas all explanatory variables are measured in 2018 or 2017, where possible, to avoid problems of simultaneity. The data we used was mainly cross-sectional, limiting our ability to control for sources of endogeneity. For instance, the location patterns of creative microclusters could be explained by the innovativeness embedded in places and regions. As we discuss above, we controlled for the presence of universities and colleges as a means of controlling for possible knowledge or innovation spillovers arising from universities. Despite this, knowledge spillovers can come from different sources, for instance dominant technologies being developed in the region, or innovation hubs that attract and support cultural organisations or creative businesses. Nevertheless, the nature of our key variables makes the use of time variation redundant to some extent, as natural and culture-based amenities seldom change over time (and so are effectively time-invariant).
Table 4 displays the variables used and summary statistics. The distribution of our dependent variable (mc_stock) has two features that are worthy of attention.
First, the variance is larger than the mean, implying that the data is over-dispersed. In addition, the variable refers to the number (or count) of firms in microclusters. For these reasons, we estimated a negative binomial regression (NBR), which can model the dispersion by adding an extra parameter into the model. The NBR is a generalisation of Poisson regression. Table A3 in the appendix reports a correlation matrix. None of the correlations reported appears to be particularly high. 17
14 There are a total of 309 districts in England. They are a level of subnational division of England and determine the structure of local government.
15 Data obtained from the Annual Population Survey (2018); population data corresponds to 2018 mid-year population estimates by the ONS.
16 Details on the methodology behind the index of multiple deprivation are available at https://www.gov.uk/government/statistics/english-indices-of-deprivation-2019
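The over-dispersion argument for the NBR can be illustrated with a small sketch. It assumes the common NB2 parameterisation, in which the variance is mu + alpha * mu^2 (the text does not say which parameterisation was used), and it uses a simple method-of-moments value for alpha rather than the maximum-likelihood estimate a regression package would report. The counts are hypothetical.

```python
from statistics import mean, variance

def overdispersion_summary(counts):
    """Compare sample mean and variance of a count variable.

    alpha_mom is a method-of-moments dispersion value under the assumed
    NB2 variance function var = mu + alpha * mu**2. A Poisson model
    corresponds to alpha = 0.
    """
    m = mean(counts)
    v = variance(counts)  # sample variance (n - 1 denominator)
    alpha = (v - m) / (m * m) if m > 0 else float("nan")
    return {"mean": m, "variance": v, "alpha_mom": alpha}

def nb2_variance(mu, alpha):
    """Variance implied by the NB2 model for a given mean and dispersion."""
    return mu + alpha * mu ** 2

# Hypothetical counts of microclustered firms per area: many zeros, a long tail.
counts = [0, 0, 0, 0, 0, 0, 1, 2, 5, 12]
summary = overdispersion_summary(counts)
implied = nb2_variance(summary["mean"], summary["alpha_mom"])
```

A variance well above the mean, and hence a positive alpha, is the pattern that motivates moving from Poisson to negative binomial regression.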

Rural vs urban determinants of microclustering
We present the results of estimating models by using two different subsamples, split by rural and urban location (Table 5). For each sample, we evaluated four separate models, testing different specifications. Overall, the results are generally robust across all estimations (i.e. coefficients do not change markedly when adding additional controls). We also tested non-linear effects for the office rateable value per square metre (ln_rate). The interpretation that follows is based on our preferred specification, displayed in columns 4 and 8. All models have mean variance inflation factors below 4, and the Hausman test on TTWA fixed effects is also statistically significant (i.e. the model with TTWA fixed effects is preferred to the one without TTWA fixed effects).
Regarding our main explanatory variables, the regressions show a positive sign for cultural amenities in both subsamples. The effect size of cultural amenities is appreciably larger for rural areas than for urban areas (0.45 vs 0.28). 18 This result corroborates the view that the provision of culture-associated activities is an essential aspect in the dynamics of creative microclusters in urban and rural areas. In other words, the accumulation of culturally led facilities could stimulate clustering, providing a common resource base that brings identity and aesthetic values (Throsby, 2008). These results parallel those of Lazzeretti et al. (2012) for Italian urban areas, where the presence of cultural and artistic heritage influences the presence of heritage-dependent CIs. On this basis, we find H1 to be supported.
Concerning the role of creative and social networks, the results show that this variable is positively associated with the number of firms in microclusters in local urban and rural areas. However, the results are highly significant for urban areas and only weakly significant for rural areas, providing only partial support for H2. The effect size is larger in urban areas than in rural areas (0.19 vs 0.11). This finding suggests that these relationships are important, but the density and proximity of informal social networks make this effect stronger in core urban zones, while the relationship with cluster activity in rural areas is potentially weaker owing to smaller and more dispersed networks. Even though the variable used in our analysis is an imperfect proxy of creative networking, this result opens an avenue for policy to encourage the formation of informal networks that connect businesses and people to places, in both rural and urban areas.
We find a complex picture regarding the association of nature-based amenities and clustering. On the surface we do not see a significant association between immediate proximity to nature-based amenities and creative microclustering in either the urban or rural samples. While this is consistent with previous results for urban areas (Naldi et al., 2021), the lack of a result may partly be explained by the limited variation in the natural amenity measure, especially in urban areas (Table 4 shows that the mean count of natural amenities in urban LSOAs is 0.08, and 93% of urban LSOAs have no natural amenities). For rural areas, one possible explanation is that places with high numbers of gardens and other natural amenities are unlikely to also have any substantial (non-agricultural) business activity within a one-kilometre radius. In this case, our results would suggest that these natural amenities do not in themselves attract co-location of creative businesses, and a negative association between natural amenities and microclusters would be expected. However, it is possible that this represents a composition issue in our sample, with effects varying across subsectors, which we explore in detail in the following section. On this basis, we consider H3 to be only partially supported.
Universities play an important role in microclusters in rural and urban areas. The coefficient associated with the log of the distance to the nearest university is negative and statistically significant. This suggests that universities, including those in rural regions, act as a pole of attraction that promotes the agglomeration of CIs. It may be that staff and students at these institutions are active in businesses offering creative services, or that graduates of these universities are more likely to start businesses in proximity to them, following Kitagawa et al. (2022). We explore this issue in more detail in the next section. On this basis we consider H4 to be supported.
When examining the relationship between agglomeration, spillovers and the number of microcluster firms in the local area, the coefficients obtained suggest a strong and positive association of both related diversity and unrelated diversity with microclustering in urban and rural areas. In the case of related diversity, which measures the distribution of employees between five-digit industries within each two-digit sector, regions with a high degree of related activities are more likely to have companies located as part of a microcluster. 19 Another important finding is that the coefficient of unrelated diversity, which captures the degree to which firms operate in different industries with few similarities, is also positive and statistically significant across models for both samples. This finding, while preliminary, supports the idea that regions that host a highly diverse productive structure (including creative and non-creative industries) are also more likely to host microclusters. In other words, local interactions and spillovers outside the industry may drive agglomeration processes even in rural areas and offer regional resilience to sectoral economic shocks, unlike in single-sector regions. That said, the effect size for related diversity (0.21) is very similar to the effect size for unrelated diversity (0.22). This combination of findings supports the conceptual premise that the process of agglomeration in rural areas is driven both by the co-agglomeration of similar industries (Marshall externalities) and by a diverse set of unrelated industries (Jacobs externalities). We therefore consider both H5 and H6 to be supported.
We also observe some differences between rural and urban areas regarding industry specialisation, measured by LQs, and the presence of firms in creative microclusters. For rural areas, we find a weakly statistically significant negative association between concentration in services and creative microclustering, and no significant association with manufacturing, CIs, or knowledge-intensive sectors. While CI businesses are often in the supply chain of non-tradable sectors like services, it may be that non-tradable sectors reliant on local demand do not find it advantageous to locate in rural settings where the demand they can pull is too low (see Goffette-Nagot & Schmitt, 1999). For urban areas, we find that the presence of creative microclusters is negatively associated with the specialisation of manufacturing and knowledge-based firms. The negative association with manufacturing firms in urban areas may reflect the relative concentration of manufacturing in industrial parks, which we would not expect to also host creative microclusters. The negative association with knowledge-based firms is more perplexing, although, as with manufacturing, this may also reflect differences between businesses requiring specialised facilities (e.g. R&D labs) that in urban areas might not be co-located with creative businesses.
We find that the area's population density is associated with the number of microclustered firms in both rural and urban areas (see coefficients of ln_pop in columns 1-3 and 5-7 in Table 5). However, this effect is absorbed when the entropy measures (related and unrelated diversity) are included (see coefficients in columns 4 and 8). Table 5 indicates that the coefficient on distance to the city is negative for rural areas and positive for urban areas, and it is statistically significant in both cases. This suggests that creative microclusters tend to thrive best in areas close to cities but not necessarily in the city centre, such as suburbs or exurbs. It is possible that costs may be too high for creative microclusters to thrive within cities, while demand may be too low for them to be successful in more remote areas. When we consider unemployment and microclustering, we find a negative and statistically significant result, in line with previous research showing that CI firms are less likely to be located in more deprived places with higher rates of unemployment. This finding is also supported by our measure of relative affluence, which finds that creative rural firms in affluent areas are more likely to be associated with microclustering, though as we explore below this has some sectoral composition effects. It is worth noting that, in urban settings, we observe the opposite result, where greater affluence is linked to a lower number of creative microclusters. It is likely that this is the result of a complex interplay of factors, including higher rent and property costs and greater competition, which may mean that consumer-facing creative businesses are successful but which may discourage businesses selling to other businesses, thus deterring the formation of creative microclusters. 
Another possible explanation is that, as seen in the burgeoning literature on gentrification (Behrens et al., 2022), less-affluent, low-cost locations may be more attractive to creative businesses and thus may be more likely to be the centres of creative microclusters.
The regression results also show a surprising negative association of creative microclusters with internet speed in urban areas. This could be due to a number of factors, including urban areas already having relatively high-speed internet, thus negating further locational pull when even faster internet is made available (Tranos & Mack, 2016); the rise of remote working (even before COVID-19; see also Florida, 2017), which could potentially weaken drivers of physical co-location; or the possibility that higher speeds are correlated with areas (such as industrial parks) where CIs are unlikely to be located.
Finally, our findings show that the relationship between creative microclusters and office space value is decreasing at a decreasing rate. This is evident in the negative coefficient associated with rateable value per square metre and the positive coefficient of rateable value squared, which together mean that the slope of the relationship becomes less negative as the rateable value increases. The F-test also provides evidence that the quadratic term is not equal to zero. However, the minimum of the quadratic relationship occurs outside the range of values in the data, suggesting a monotonically decreasing relationship over the observed range.
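The shape of this relationship can be made explicit with a short derivation. This is a sketch under assumed notation: x stands for ln_rate, and \beta_1 and \beta_2 stand for the coefficients on ln_rate and its square (symbols chosen here for illustration, not taken from the paper), with all other terms in the linear predictor held fixed.

```latex
\begin{aligned}
f(x)   &= \beta_1 x + \beta_2 x^2, \qquad \beta_1 < 0,\ \beta_2 > 0 \\
f'(x)  &= \beta_1 + 2\beta_2 x, \qquad f''(x) = 2\beta_2 > 0 \\
x^{*}  &= -\frac{\beta_1}{2\beta_2} \quad \text{(turning point, a minimum)}
\end{aligned}
```

If x^{*} lies above the largest observed value of ln_rate, then f'(x) < 0 throughout the sample, so the fitted relationship is decreasing everywhere in the data; because f''(x) > 0, the slope moves towards zero as x rises.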
Overall, these results may reflect general trends across the CIs, such as the increasing prevalence of remote work and digitalisation, and the growing demand for affordable places to do business (Florida, 2017).

Sector-specific determinants of microclustering in rural areas
As shown in Table 3, the percentage of firms in microclusters varies across subsectors. To explore this variation in more detail, we ran separate models for each of the seven DCMS creative subsectors we were considering. All models are estimated using an NBR as in the previous subsection (Table 6). The specification corresponds to models shown in columns 4 and 8 in Table 5, in which all key explanatory variables and controls are included. The dependent variable (mc_stock) corresponds to the number of firms in microclusters that belong to each of the subsectors.
When we consider individual sectors as opposed to 'creative industries' as a group, we find relatively few differences, but some that prove to be insightful. Several key variables, namely cultural institutions, proximity to HE institutions, and the related and unrelated diversity measures, are significant across each of the subsectors, indicating that these effects appear to be consistent. But the natural amenities variable shows substantial variation: architecture and crafts are highly significantly associated with natural amenities, and design and film and TV are weakly significant. By contrast, advertising and marketing, IT and software, and publishing do not have an association with natural amenities. The cultural networks variable, which is weakly significant across the whole sample, is only significant, albeit weakly, for the film, TV and radio subsector.
The results in Table 6 also show interesting subsectoral variation around agglomeration and affluence, with advertising and marketing, architecture, and IT and software being more likely to be clustered in more affluent areas. This appears to drive our results across the whole rural sample, and also shows how IT and software (which is associated with other IT businesses but only weakly associated with cultural amenities) and advertising and marketing (which is associated with population density) may have different spatial configurations from other creative sectors (such as crafts, where clustering is associated with lower broadband speed and greater distance to nearest cities).

Robustness checks
The clustering algorithm used in this paper requires the analyst to select the minimum number of firms in a microcluster. As discussed in Section 3.1, we selected 50 firms as a minimum threshold, following previous empirical evidence. To check that our results are not dependent on the selected minimum number of firms in a microcluster, we re-estimated our regression model using 25 firms instead of 50 (resulting in 216 rather than 71 rural microclusters, according to Table A1). We find broadly similar estimates under this alternative threshold (Table A4 in the appendix).

Changing the geographical unit of analysis
In our previous regressions, we used a fine geographical area, the LSOA, which contains 1,500 residents on average. However, at this level of granularity we could potentially ignore dynamics that extend beyond the geographical unit. To test the sensitivity of our results to this choice, we now change the unit of analysis to a higher level of geographical aggregation, corresponding to Middle Layer Super Output Areas (MSOAs). MSOAs are built from groups of contiguous LSOAs, capturing 7,200 inhabitants on average. Table A5 in the appendix reports the regression estimates from Table 5 but now using MSOAs as the unit of analysis. All variables measure the natural logarithm of X in the MSOAs, where X corresponds to firms or amenities depending on the variable type. The estimates obtained broadly confirm our previous results, once a higher level of geographical aggregation is accounted for. Note that the coefficient of unrelated diversity (UD) becomes statistically insignificant. One possible interpretation is that this factor may not influence the level of microclustering beyond a certain geographical distance. We also note that the negative coefficient for LQ for knowledge-intensive sectors (LQ_know) in rural areas becomes statistically significant in this specification. This negative relationship could be due to the relative absence of specialist (e.g. R&D) facilities in larger rural areas in places where there are otherwise creative microclusters, or possibly different labour markets or limited connectivity to knowledge networks.

Controlling for spatial dependence
Given that the dynamics of microclustering could extend beyond the local area, we needed to control for the potential influence of neighbours on the location of creative microclusters. In other words, one could expect that geographical areas hosting microclusters could exercise influence on their neighbours (spillovers across geographical units). This type of influence generates spatial dependence across geographical units, which could cause an omitted variable bias (Paelinck, 2000). To correct for this, we estimate four spatial regression models. The first model (Equation [2]) is a spatial autoregressive model (SAR). 20 This model is estimated using a contiguity-based spatial weights matrix (W), which identifies which areas are neighbours. It includes the vector of explanatory variables and residuals that are assumed to be independent and identically distributed (i.i.d.). We also estimate a spatial error model, in which the dependent variable is regressed on a vector of explanatory variables, a spatially correlated error term and an i.i.d. error term (Equation [3]). The third model (Equation [4]) combines the spatial lag and spatial error specifications. Finally, we estimate a spatial Durbin model to account for both direct and indirect effects of the explanatory variables on the dependent variable, where the indirect effects are spatially dependent. In this case, the dependent variable is regressed on its spatial lag, the explanatory variables, and the spatial lags of the explanatory variables (Equation [5]).
20 We use a contiguous spatial weight matrix. Before estimating models, we checked global spatial dependence by means of Geary's c and Getis and Ord's G tests. The null hypothesis of no spatial dependence was rejected (p-value = 0.000).
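For orientation, the four specifications named in the text (Equations [2] to [5]) are usually written as follows in the spatial econometrics literature. This is a sketch in standard textbook notation, which is an assumption here and not necessarily the authors' exact specification: y is the dependent variable, W the spatial weights matrix, X the matrix of explanatory variables, \rho and \lambda the spatial lag and spatial error parameters, \theta the coefficients on the spatially lagged regressors, and \varepsilon an i.i.d. error.

```latex
\begin{align}
y &= \rho W y + X\beta + \varepsilon
  && \text{(spatial autoregressive, SAR)} \\
y &= X\beta + u, \qquad u = \lambda W u + \varepsilon
  && \text{(spatial error)} \\
y &= \rho W y + X\beta + u, \qquad u = \lambda W u + \varepsilon
  && \text{(combined spatial lag and error)} \\
y &= \rho W y + X\beta + W X \theta + \varepsilon
  && \text{(spatial Durbin)}
\end{align}
```

Setting \theta = 0 in the spatial Durbin model recovers the SAR model, and setting \lambda = 0 in the combined model does the same, which is why the SAR model is a natural starting point for the comparison.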
We used as many of the covariates from the previous regressions as we could, although we had to drop some variables (ln_nature, LQ_manuf, unemp, and ln_rate2) due to high levels of correlation. The dependent variable corresponds to the number of microcluster firms in the MSOA (mc_stock). This methodology has been previously employed in a recent study by Arauzo-Carod et al. (2023). Table A6 in the appendix reports the main results for both samples (rural and urban locations). Columns 1 and 5 report regression estimates for a spatial lag autoregressive model (i.e. the spatial lag of the dependent variable enters as an explanatory variable). We can see that the coefficient is statistically significant, confirming that the number of creative firms in microclusters exhibits a spatial structure. The models in columns 2 and 6 include a spatially autocorrelated error term (e.mc_stock). A model that combines the spatial lag and spatial error specifications is estimated in columns 3 and 7. The final model (in columns 4 and 8) corresponds to a mixed regressive-spatial autoregressive model with spatial autocorrelation in the independent variables (spatial Durbin model). The spatially autocorrelated error term (e.mc_stock) is statistically significant across these models. Considering the final model (columns 4 and 8), spatial autocorrelation in the independent variables is statistically significant only for urban areas. Overall, the results from the spatial models support our general findings. Specifically, the coefficients for cultural institutions, cultural networks, and distance to universities exhibit the same direction and a similar level of statistical significance in both samples. However, it should be noted that the coefficient for social capital is only weakly significant in the rural sample. 
Additionally, the coefficient for unrelated diversity is statistically insignificant in the rural sample. The LQ for services businesses in urban areas switches sign to become positive and becomes significant in the model, which could potentially reflect co-location that is captured at this scale because urban MSOAs are comparatively small in area. The LQ for rural knowledge-intensive businesses becomes weakly significant in this specification.
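The footnote on the spatial weights matrix mentions a global check for spatial dependence using Geary's c. A minimal sketch of that statistic with binary contiguity weights is given below. The function name, the toy neighbour structure and the values are hypothetical, and the sketch computes only the statistic itself, not the significance test reported in the footnote.

```python
def gearys_c(values, neighbours):
    """Geary's c with binary contiguity weights.

    values: list of observations, one per area (not all equal).
    neighbours: dict mapping each area index to a list of neighbouring
    indices (assumed symmetric, so each pair appears in both directions).
    Values below 1 suggest positive spatial autocorrelation, values above 1
    negative spatial autocorrelation.
    """
    n = len(values)
    mean = sum(values) / n
    total_ss = sum((v - mean) ** 2 for v in values)
    weight_sum = 0.0   # sum of all weights
    cross = 0.0        # weighted squared differences between neighbours
    for i, js in neighbours.items():
        for j in js:
            weight_sum += 1.0
            cross += (values[i] - values[j]) ** 2
    return (n - 1) * cross / (2.0 * weight_sum * total_ss)

# Four hypothetical areas arranged in a line: 0 - 1 - 2 - 3.
line = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
c_smooth = gearys_c([1, 2, 3, 4], line)       # similar values sit side by side
c_alternating = gearys_c([1, 4, 1, 4], line)  # dissimilar values sit side by side
```

In practice the statistic would be computed on the observed counts of microcluster firms with the same contiguity matrix used in the spatial models, and its significance assessed against a permutation or normal-approximation reference distribution.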

Conclusions
It is well established that businesses and workers in the creative industries benefit from agglomeration and co-location. But, to date, most research on creative clusters has assumed this co-location to be a largely urban phenomenon. This paper draws upon growing evidence that clustering can be equally important in rural settings (Harvey et al., 2012;Merrell et al., 2021) to capture the extent of 'microclustering', that is, smaller spatial agglomerations (Boix et al., 2016;Siepel et al., 2020), of creative firms in rural areas. Using a novel mapping technique based on spatial analysis of scraped company website data for the whole of England, this study has mapped these microclusters in rural and urban settings. It then aimed to understand whether the determinants of these microclusters are different in rural and urban areas, particularly with respect to local assets such as cultural institutions and networks, natural amenities and agglomeration economies. We answered this question using geocoded data from 154,618 websites of CI businesses and organisations in England. We used a clustering algorithm to identify 71 rural 'microclusters', each of which had 50 or more creative businesses and which collectively made up 38% of the rural creative industry firms in our sample. We then explored the determinants of clustering and analysed the differences between rural and urban microclusters, using fine geographies to identify these determinants. Our primary finding is that determinants of clustering appear to generally be similar between rural and urban areas. For instance, in both settings we find that microclustering is associated with heritage and culture-led facilities and a diverse set of local industries that share a number of similarities (that is, related diversity). Where we find differences (for instance relating to the role of natural amenities), this appears to be driven by variation between the creative subsectors in our sample. 
We find that creative microclustering is associated with proximity to universities in both urban and rural contexts, and is positively associated with related and unrelated diversification. This paper makes two main contributions to the literature. First, it introduces a novel technique for identifying 'microclusters' that uses scraped web data and an inductive clustering algorithm that allows us to identify clusters across and distinct from geographical boundaries. This method is distinct from the location quotients widely used elsewhere in the literature (Escalona-Orcao et al., 2016). Second, drawing upon this technique, we identify rural creative microclusters and then compare the determinants of clustering in these microclusters to those in urban areas. We therefore contribute to the longstanding literature on firm location by addressing not only the determinants of location for creative firms (Coll-Martinez et al., 2019), but how these vary in different contexts, specifically rural (Naldi et al., 2021) contexts. Our finding that rural determinants of microclustering are very similar to those in urban areas, applying a high level of territorial disaggregation, suggests that the sectoral trends outweigh the urban/rural distinction. Bringing these insights together then allows us to draw greater insights about drivers of agglomeration more broadly, comparing rural and urban areas.
The policy implications are twofold. First, our findings regarding the association between cultural institutions and associated creative clustering support the possibility of culture-led regeneration and placemaking in rural areas, creating what Ray (2001) refers to as 'culture economies' and Naldi et al. (2015) discuss as 'smart rural development', where CIs are used as a basis for local revitalisation (Wolman & Hincapie, 2015). We have shown factors that may promote such microclusters but not why that matters or should matter to policymakers. Future research can take up that challenge.
Second, this research suggests that policymakers would benefit from recognising the distinctive features of rural CIs and considering a more nuanced place-based approach. In particular, efforts to support CIs should not overlook or otherwise exclude rural clusters in favour of cities. Indeed, targeted support to develop firms in microclusters and clusters wherever they are will help to unlock the potential of the CIs, in urban and rural contexts. Where rural-specific interventions are designed, efforts that support the development of informal networks between businesses and between rural people and organisations that stimulate local demand seem to be one promising approach.
To conclude, as with all research, our analysis has some limitations. First, although we use a novel data set, its structure is cross-sectional, giving little room to control for potential sources of endogeneity coming from omitted variables and reverse causality. As a result, our findings only demonstrate partial associations and cannot establish causal relationships. Nonetheless, we took several measures to address potential factors that could undermine causal interpretations. For example, we used earlier time periods for all explanatory variables, excluded certain creative sectors from the definition of microclusters to prevent inducing correlations between key explanatory variables and the dependent variables, employed TTWA fixed effects to control for unobserved differences across labour markets affecting outcomes, and controlled for a significant number of confounding variables. These steps helped to strengthen the validity of our results and provide a more robust foundation for future research on this topic. Second, our data reflects spatial distribution, but we cannot control for firm-level characteristics such as age, size and type of organisation, or for the presence of actors such as freelancers who may not have a web presence. These issues deserve further research. Third, because we are capturing spatial clustering, our findings in this article do not allow us to make a statement about the existence of agglomeration economies per se in the identified rural microclusters. Finally, the relationships we identify are based on pre-COVID-19 data. The impact of COVID-19 on rural microclusters is important in multiple ways: in terms of both the changing spatial distributions of business activities as a result of the pandemic (for instance through relocations as creative workers move from urban to rural areas) and the resilience of rural microcluster businesses. 
Further research needs to establish the impact of COVID-19 on these clustering patterns. In addition, further evidence about the nature and impact of agglomeration in rural creative clusters could be very helpful in setting policy and research agendas to help rural areas to unlock the socio-economic potential of creative clusters.