A two-stage clustering approach to investigate lifestyle carbon footprints in two Australian cities

Given the key role of households in driving global emissions and resource use, a change in their consumption behaviours towards more sustainable levels is essential to reduce worldwide adverse environmental impacts. Thereby, focusing on cities is especially important because of today’s large share of the global population living in cities and because local authorities are close to the needs of their residents. However, devising targeted and effective policy measures implies a thorough understanding of prevailing consumption patterns and associated environmental consequences. The goal of this article is to investigate and compare household behaviours and lifestyle-induced carbon footprints in Sydney and Melbourne in order to enhance today’s understanding of household consumption in cities of a high-income, high-emission country. For this purpose, we employed a two-stage clustering approach with a Self-Organising Map and a subsequent Ward-clustering. This allowed for including expenditure data along with socio-economic attributes and thus for recognising lifestyle-archetypes. These emerging archetypes represent households with similar characteristics and comparable consumption patterns. Analysing the archetypes in detail and performing a city-comparison based on multi-dimensional scaling revealed similarities and dissimilarities between the two metropoles. ‘Older’ archetypes seem to behave more alike across cities but show different carbon footprints emphasising the importance of regionalised environmental assessments and of city-specific supply chains. Distinct patterns especially emerged in the high- and low-income segments highlighting the different importance of different lifestyles in each city. Socio-economically similar family-archetypes were found in both cities, but some of them showed diverging consumption behaviours. 
This article showed that studying household-induced environmental impacts in cities should not rely on macro-trends but should rather be based on city-specific analyses that capture local peculiarities and consider socio-economic characteristics and consumption data simultaneously.

A novel approach for investigating household behaviours in a certain region and thus for delivering data in support of environmental policymaking was developed by Froemelt and colleagues [39]. Using data mining techniques these authors exploited the Swiss household budget survey and were able to group households according to their socio-economic characteristics and consumption behaviours. Previous household consumption studies (e.g. [25,27,38,40–48]) have also been able to provide insightful and highly interesting findings, but they are often based on pre-defined socio-economic household segments. However, several articles point out that there is still a large variability of behaviour present within socio-economic cohorts [9,25,40–42]. Froemelt et al's [39] approach attempts to advance current methodologies in this regard and thus also to respond to recent calls for a better exploration of lifestyles and the inclusion of behavioural anomalies [9,10,20,34]. By letting consumption archetypes emerge from consumer expenditure data, the proposed two-stage clustering has the potential to recognise important behavioural patterns and thus to efficiently focus on a set of household groups that represent distinct behaviours and socio-economic circumstances occurring in a certain city. Furthermore, Froemelt et al's approach [39] has also been successfully integrated in the context of large-scale models [49].
However, despite the importance of cities, Froemelt et al's methodology [39] has never been applied in a city-context. Moreover, to understand household consumption behaviour more broadly, it is essential to apply this methodology also in an international context and not only to Swiss data.
The two Australian metropoles Sydney and Melbourne provide interesting case studies for these purposes [5,14,50–52]. Both cities show commitment to reduce their carbon footprints to some extent [5,50]. However, together, they are currently contributing about 30% to the total carbon footprint (TCF) of Australia [5], which, from a per-capita perspective, is among the highest in the world [19,53].
The goal of this article is thus to investigate consumption patterns in Sydney and Melbourne in order to learn, understand, and compare different lifestyles in two cities of a high-income, high-emission country and thus to improve today's knowledge of household consumption behaviours.
By analysing two cities simultaneously, we also aim at providing answers to important questions that might further our understanding of household consumption; e.g. Do similar household segments behave similarly across the cities or do we find city-specific behaviour? These questions are particularly interesting in the context of Melbourne and Sydney since both cities show (except for being located in somewhat different climate zones) many similarities at first glance: same nation, similar population size, high economic importance within the country [5], and both rank among the most liveable cities in the world [54–56].

Methods
In order to analyse different lifestyles and to assess their induced environmental consequences in Sydney and Melbourne, we proceeded in a similar way as suggested by Froemelt et al [39]. Their two-stage clustering approach revealed interesting household consumption patterns for Switzerland and can be considered a well-suited procedure for the current study. Furthermore, these kinds of pattern recognition techniques have also proven useful in other research fields and perform especially well and robustly on high-dimensional and noisy data [57–60].
A synopsis of our applied methodology is provided as a flow scheme in figure 1 and is detailed in the following sub-sections. The supplementary information (SI) (available online at stacks.iop.org/ERL/15/104096/mmedia) holds more information to enhance the reproducibility of our study.

Data description and preparation
The Australian household expenditure survey (HES) conducted by the Australian Bureau of Statistics (ABS) in the 2015-16 cycle represents the main dataset for our study [62,63]. In the HES, detailed data on expenditures, household and personal characteristics, sources of income, as well as net wealth and possession of properties of 10 046 individual households is collected. The full dataset comprises about 400 attributes (mainly characteristics, income categories and possession) for each household and each household member, respectively, and subdivides expenditure into 709 consumption categories at the most detailed level of the Australian household expenditure classification system (HEC) [62,63]. Note that data accessible for research is partially confidentialised [63]. In our case, this especially affected the geographical resolution of the data and set the city boundaries of Sydney and Melbourne according to the 'Greater Capital City Statistical Area' (GCCSA) defined by ABS [64].
[Figure 1 abbreviations: IOA = input-output analysis; AusIELab is the implementation of IELab for Australia [61].]
In a first step, the HES microdata needed to undergo basic pre-processing to enable the subsequent application of data mining techniques. This included the aggregation of person-specific data at the household level and the preparation of categorical variables of household characteristics.
As outlined in [39], the clustering of households is only meaningful if it is based on attributes that enable the detection of similar characteristics and behaviour. Therefore, the HES dataset was filtered for such variables and various highly detailed expenditure categories were aggregated to broader consumption areas. Subsequently, all 108 variables destined for the clustering process were put on an even footing by standardising and by correcting for seasonality in cases where the survey time period was found to be statistically significant by one-way ANOVA tests [65].
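The two operations named in this step, standardisation and a one-way ANOVA test on the survey period, can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in the study: the spending values and the grouping by survey quarter are hypothetical, the function names are invented for this sketch, and the seasonality correction itself (described in the SI) is not reproduced.

```python
from statistics import mean, pstdev

def standardise(values):
    """Return z-scores: subtract the mean and divide by the population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

def anova_f(groups):
    """One-way ANOVA F statistic for a list of groups (each a list of numbers)."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical spending on one category, grouped by survey period (assumption).
spending_by_period = [[10.0, 12.0, 11.0], [20.0, 22.0, 21.0], [15.0, 14.0, 16.0]]
f_stat = anova_f(spending_by_period)
z_scores = standardise([v for g in spending_by_period for v in g])
```

A large F statistic (compared against the F distribution with k-1 and n-k degrees of freedom) would flag a variable whose values depend on the survey period.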
Besides a more detailed description of this preparatory step, the SI provides also a tabular overview of all HES categories and the variables used in the next steps.

Two-stage pattern recognition
In the first stage of this two-tiered approach, a Self-Organising Map (SOM) is trained [66,67]. SOMs belong to the class of unsupervised artificial neural networks and are well-suited to enable the efficient subsequent application of clustering techniques by preconditioning noisy and high-dimensional datasets [58–60]. The SOM usually consists of a 2D-lattice whose nodes represent the neurons. Attached to each neuron is a prototype vector with the same dimension as the original dataset. During the training phase, the data points of the original dataset are exposed to the SOM. The nearest neuron as well as adjacent neurons are activated and move closer to the data point under consideration. The underlying learning procedure allows for preserving the data topology (neighbouring neurons have similar prototype vectors) and for projecting the original dataset to a 2D-map. After several repetitions of the learning process, the prototype vectors approximate an optimised set of substitutes that are representative of the original data points. The resulting vectors are not only reduced in number compared to the original data points, but are also smoothed with regard to noise and thus enable enhanced pattern recognition for clustering algorithms and even for the human eye [66–68].
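The learning procedure described above can be illustrated with a minimal online SOM in plain Python. This is a sketch under assumptions, not the configuration used in the study: the rectangular grid, the Gaussian neighbourhood, the linear decay schedules, the function name `train_som` and the synthetic data are all choices made for this example only.

```python
import math
import random

def train_som(data, rows, cols, epochs=20, lr0=0.5, sigma0=None, seed=0):
    """Train a small SOM; returns a dict mapping grid position (row, col) to a prototype vector."""
    rng = random.Random(seed)
    dim = len(data[0])
    sigma0 = sigma0 or max(rows, cols) / 2.0
    # One prototype vector per neuron, initialised from randomly chosen data points.
    protos = {(r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)}
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = data[:]
        rng.shuffle(order)
        for x in order:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                      # learning rate decays to zero
            sigma = max(sigma0 * (1.0 - frac), 0.5)      # neighbourhood radius shrinks
            # Best-matching unit: the neuron whose prototype is nearest to x.
            bmu = min(protos, key=lambda n: sum((a - b) ** 2 for a, b in zip(protos[n], x)))
            for node, w in protos.items():
                grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma ** 2))  # activation of neighbouring neurons
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])       # move the prototype towards x
            step += 1
    return protos

# Synthetic two-group data standing in for standardised household attributes (assumption).
rng = random.Random(1)
data = [[rng.gauss(0, 0.3), rng.gauss(0, 0.3)] for _ in range(30)]
data += [[rng.gauss(5, 0.3), rng.gauss(5, 0.3)] for _ in range(30)]
som = train_som(data, rows=3, cols=4)
```

Because every update is a convex combination of a prototype and a data point, the prototypes stay inside the range spanned by the data while neighbouring neurons end up with similar vectors.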
After extracting the households living in Sydney (1778 households) and Melbourne (2004 households) from the prepared dataset, we trained an SOM separately for each city. The training followed literature recommendations [66–70] and the tuning parameters were selected based on the resulting topographic error and quantization error (see SI for details) [69]. The final map for Sydney comprises 420 neurons in a 14:30-arrangement, while 434 neurons on a 14:31-grid were considered optimal for Melbourne.
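The two quality measures used for parameter selection can be computed as in the following sketch. It assumes a rectangular grid on which the eight surrounding positions count as adjacent; the study's own lattice type and neighbourhood definition are given in the SI and may differ, and the tiny one-dimensional example map is hypothetical.

```python
import math

def euclid(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def som_errors(data, protos):
    """Return (quantization error, topographic error) for a trained map.

    protos maps grid positions (row, col) to prototype vectors.
    """
    qe_sum, te_count = 0.0, 0
    for x in data:
        ranked = sorted(protos, key=lambda n: euclid(protos[n], x))
        best, second = ranked[0], ranked[1]
        # Quantization error: distance from the data point to its best-matching prototype.
        qe_sum += euclid(protos[best], x)
        # Topographic error: best and second-best units are not neighbours on the grid.
        if max(abs(best[0] - second[0]), abs(best[1] - second[1])) > 1:
            te_count += 1
    return qe_sum / len(data), te_count / len(data)

# Hypothetical 1x3 map whose middle neuron breaks the topology.
example_map = {(0, 0): [0.0], (0, 1): [10.0], (0, 2): [1.0]}
qe, te = som_errors([[0.4], [10.0]], example_map)
```

A low quantization error indicates that the prototypes represent the data closely, while a low topographic error indicates that the map preserves neighbourhood relations.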
For the clustering of neurons in the second step, a hierarchical agglomerative approach [71,72] was deployed on top of the two SOMs. This type of clustering was chosen mainly because it makes it possible to retain the connectivity among the neurons [58,59,73] and because of its good performance in other studies [39,57]. However, to find an optimal clustering, we systematically evaluated different numbers of clusters as well as different affinity metrics and linkage criteria. To judge the performance of clustering parameter combinations, we relied on the Silhouette coefficient [74] as well as on the representation of the so-called U-Matrix, which visually summarises an SOM by showing the distances between the neurons and which is well-suited to assess different clusterings [59,75,76]. Based on these considerations and including ANOVA-tests [65] to obtain a sense of the statistical importance of the attributes for the clustering, we fixed the clustering parameters (Ward-linkage with Euclidean distances [77] for both cities) and decided on an upper and a lower number of clusters. In a last step, we consulted the so-called Hits-Map and included all clusters that would be formed by the hierarchical clustering between the upper and the lower number of clusters and that do not violate 'zero-hits'-boundaries [58].
Among the formed household groups, we excluded clusters with fewer than 50 households due to concerns of representativeness [78]. The whole pattern recognition procedure described above and in the SI finally found 13 clusters for Sydney and 14 for Melbourne.
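The second stage can be illustrated with a naive Ward clustering of prototype vectors, scored with the Silhouette coefficient for several cluster numbers. The sketch below rests on simplifying assumptions: it ignores the connectivity constraint among neurons, the U-Matrix and the Hits-Map that the study also consults, and the six prototype vectors are hypothetical.

```python
import math

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points):
    return [sum(col) / len(points) for col in zip(*points)]

def ward_clustering(points, n_clusters):
    """Naive agglomerative clustering with Ward's criterion; returns one label per point."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                ca = centroid([points[i] for i in clusters[a]])
                cb = centroid([points[i] for i in clusters[b]])
                na, nb = len(clusters[a]), len(clusters[b])
                # Increase in within-cluster sum of squares if a and b were merged.
                cost = na * nb / (na + nb) * sq_dist(ca, cb)
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels

def silhouette(points, labels):
    """Mean Silhouette coefficient with Euclidean distances (needs at least two clusters)."""
    scores = []
    for i, p in enumerate(points):
        own = [math.sqrt(sq_dist(p, q)) for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        if not own:                       # singleton clusters score zero by convention
            scores.append(0.0)
            continue
        a = sum(own) / len(own)
        b = min(
            sum(math.sqrt(sq_dist(p, q)) for j, q in enumerate(points) if labels[j] == other)
            / labels.count(other)
            for other in set(labels) if other != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Hypothetical prototype vectors forming two well-separated groups.
prototypes = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
scores = {k: silhouette(prototypes, ward_clustering(prototypes, k)) for k in (2, 3, 4)}
best_k = max(scores, key=scores.get)
```

In practice, a library implementation with a connectivity matrix would be preferable; the loop above only serves to make the Ward merge criterion explicit.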

Computation of centroids and uncertainty analysis
For computing a representative vector for the different household clusters, all variables from the HES dataset (also those excluded during the clustering) were averaged according to the recommendations of ABS [62]. These guidelines suggest considering the representativeness-weight given by ABS for each household [63].
As a valuable addition to the approach of [39], we developed an uncertainty analysis framework by introducing a bootstrap-sampling [79]. The latter randomly re-samples the households in each cluster and, by applying above weighted-average procedure, is able to estimate the uncertainty of the cluster's centroid [79]. In the present case, 1000 vectors were computed for each cluster.
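The weighted averaging and the bootstrap described in this sub-section can be sketched as follows. The household rows and weights are hypothetical, the function names are invented for this example, and the percentile interval at the end is an assumption about how the resampled centroids might be summarised; the text itself only states that 1000 vectors were computed per cluster.

```python
import random

def weighted_centroid(rows, weights):
    """Weighted average of each variable, using one representativeness-weight per household."""
    total = sum(weights)
    return [sum(w * r[i] for r, w in zip(rows, weights)) / total for i in range(len(rows[0]))]

def bootstrap_centroids(rows, weights, n_boot=1000, seed=0):
    """Re-sample households with replacement and recompute the weighted centroid each time."""
    rng = random.Random(seed)
    n = len(rows)
    centroids = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        centroids.append(weighted_centroid([rows[i] for i in idx], [weights[i] for i in idx]))
    return centroids

def percentile_interval(samples, lower=0.025, upper=0.975):
    """Simple percentile interval (assumed summary of the bootstrap distribution)."""
    s = sorted(samples)
    return s[int(lower * (len(s) - 1))], s[int(upper * (len(s) - 1))]

# Hypothetical cluster: columns are weekly income (AUD) and household size.
rows = [[100.0, 2.0], [200.0, 4.0], [300.0, 6.0]]
weights = [1.0, 1.0, 2.0]
centre = weighted_centroid(rows, weights)
boot = bootstrap_centroids(rows, weights)
income_low, income_high = percentile_interval([c[0] for c in boot])
```

The spread of the 1000 resampled centroids gives a sense of how strongly a cluster's representative vector depends on the particular households sampled.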

Environmental assessment
For assessing the carbon footprint of the different household behaviours, we set the temporal boundaries to 1 yr of household consumption (the reference year corresponds to the financial year 2015-16 in accordance with the HES dataset [63]).
For computing indirect, upstream life-cycle greenhouse gas (GHG) emissions caused by households, the IELab [61,80,81] was used to build a highly detailed multi-region-supply-use-table for the year 2015 with corresponding GHG-satellites (taken from [82]). Our table distinguishes ten regions (among them also the GCCSAs of Sydney and Melbourne) and details 797 economic sectors in each region. Together with an m:n-correspondence table established for [83], the CO2-eq-intensity-factors for 593 consumption categories and thus for the highest possible HEC-resolution could be computed. Thanks to the regionalisation, different factors apply for Melbourne and Sydney (see SI for details).
The inclusion of a constraint for the HES in the reconciliation process of the table ensured good correspondence of the HES data with the final demand vectors (12.9% and 24.2% deviation of the resulting table for Sydney and Melbourne respectively). Furthermore, reasonable agreement of the resulting carbon footprints with existing studies was found (deviations are between 2.2% and 13.3%; see SI).
Direct household emissions from housing (heating and cooking) and private transport were retrieved from [82,84] for Victoria and New South Wales. The respective state-wide household expenditures [63] for residential energy and transport fuels allowed for estimating intensity factors to compute direct emissions caused by the clusters based on their spending. The TCF of households is the sum of life-cycle and direct emissions.
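The accounting logic of this sub-section, expenditure multiplied by intensity factors plus direct emissions, can be written out as a short sketch. All numbers, category names and units (kg CO2-eq per AUD) below are hypothetical placeholders; the actual factors come from the multi-region table and the state-level data cited above.

```python
def direct_intensity(state_emissions_kg, state_expenditure_aud):
    """Direct-emission factor: state-wide household emissions per AUD spent on the energy carrier."""
    return state_emissions_kg / state_expenditure_aud

def total_carbon_footprint(expenditure, lifecycle_factors, direct_factors):
    """TCF = life-cycle (upstream) emissions + direct emissions, both driven by spending."""
    lifecycle = sum(expenditure[c] * lifecycle_factors[c] for c in expenditure)
    direct = sum(expenditure[c] * direct_factors.get(c, 0.0) for c in expenditure)
    return lifecycle + direct

# Hypothetical annual spending of one archetype in AUD.
expenditure = {"electricity": 1000.0, "transport fuels": 2000.0, "food": 5000.0}
# Hypothetical upstream intensity factors in kg CO2-eq per AUD.
lifecycle_factors = {"electricity": 1.0, "transport fuels": 0.5, "food": 0.4}
# Direct factor derived from hypothetical state totals (3.0e9 kg emitted, 2.0e9 AUD spent).
direct_factors = {"transport fuels": direct_intensity(3.0e9, 2.0e9)}

tcf_kg = total_carbon_footprint(expenditure, lifecycle_factors, direct_factors)
```

Only categories with combustion at the household (here transport fuels) carry a direct factor; all other categories contribute through their upstream intensity alone.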

Post-analysis
The SI discloses fully detailed consumption profiles of all archetypes and contains various plots and heatmaps to characterise and study the household consumption behaviours found in Sydney and Melbourne. In addition, a simplified, but intuitive characterisation of these archetypes is provided in figure 2, and further described in tables 1 and 2. The indicative titles for each cluster in these tables are based on a semi-automated analysis that detects expenditure behaviour of individual archetypes that is most distinct from other archetypes in the same city (based on the average distance of the expenditure under consideration from the archetype under consideration to all other archetypes in the respective city; see SI).
Finally, in order to investigate and compare consumption behaviours and carbon footprint compositions in the two cities, metric multi-dimensional scaling (MDS) was applied [73,85]. The employed technique projects the high-dimensional dataset to a 2D-plane and attempts to simultaneously preserve the original distances between the data points in the high-dimensional space. This results in a 2D-plot that reveals which consumption patterns in Sydney and Melbourne are similar within, but also across these two cities. Note that we deliberately chose to apply MDS only to the 20 main consumption categories of the HEC-classification and thus to exclude socio-economic aspects and more detailed spending. This allows for comparing more generic consumption behaviours across the cities rather than socio-economic differences and city-specific purchases. Likewise, when comparing carbon footprint composition with MDS, we focused on eight main categories.
Figure 2, tables 1 and 2 reveal a certain correlation between household size and income in Sydney, while in Melbourne, a similar trend is observed, but two archetypes (MEL-XIII and MEL-VI) emerge with low income but comparably large size. In general, the archetypes in Melbourne seem to be socio-economically more diverse (from very high to very low income and different combinations of income and household sizes). In Sydney, the very-low-income segment is subdivided into more archetypes, suggesting that the behaviours in this income group might be more distinct than in Melbourne.
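The metric MDS projection described in the post-analysis can be illustrated with a small stress-minimising routine. The sketch uses SMACOF iterations (the Guttman transform) with unit weights, which is one common way to fit metric MDS and is an assumption here, since the solver used in the study is not named in this section. The four-dimensional spending profiles are hypothetical.

```python
import math
import random

def pairwise(points):
    """Matrix of Euclidean distances between all pairs of points."""
    n = len(points)
    return [[math.sqrt(sum((a - b) ** 2 for a, b in zip(points[i], points[j])))
             for j in range(n)] for i in range(n)]

def stress(target, coords):
    """Raw stress: squared mismatch between target distances and embedded distances."""
    d = pairwise(coords)
    n = len(coords)
    return sum((target[i][j] - d[i][j]) ** 2 for i in range(n) for j in range(i + 1, n))

def metric_mds(target, n_iter=200, seed=0):
    """2D embedding via SMACOF; n_iter=0 returns the random starting configuration."""
    rng = random.Random(seed)
    n = len(target)
    x = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n)]
    for _ in range(n_iter):
        d = pairwise(x)
        new_x = []
        for i in range(n):
            acc = [0.0, 0.0]
            for j in range(n):
                if i == j or d[i][j] == 0.0:
                    continue
                ratio = target[i][j] / d[i][j]
                acc[0] += ratio * (x[i][0] - x[j][0])
                acc[1] += ratio * (x[i][1] - x[j][1])
            # Guttman transform with unit weights: X_new = (1/n) * B(X) * X.
            new_x.append([acc[0] / n, acc[1] / n])
        x = new_x
    return x

# Hypothetical per-capita spending profiles of five archetypes over four categories.
profiles = [
    [0.30, 0.25, 0.20, 0.25],
    [0.28, 0.27, 0.22, 0.23],
    [0.10, 0.40, 0.30, 0.20],
    [0.50, 0.10, 0.10, 0.30],
    [0.12, 0.38, 0.28, 0.22],
]
target = pairwise(profiles)
embedding = metric_mds(target)
```

Archetypes with similar profiles end up close together in the resulting plane; the axes themselves carry no meaning, only the distances between points do.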

Consumption archetypes and environmental impacts in Sydney and Melbourne
The per-capita carbon footprint ranking of consumption archetypes in figure 3(b) shows an intuitively logical order for Sydney by starting from high-income/small households to two-person/middle-aged households, then families, low-income single persons and low-income families. An interesting exception is cluster SYD-I (just retired, one-person households) with rather high per-capita footprint along with low per-capita income. However, the 95% confidence interval for this archetype is large and these households have rather high net wealth. SYD-I is also the least prevalent archetype¹ and therefore only shows a minor contribution to the TCF of Sydney. This is depicted in figure 3(a) that relates the archetypes' per-capita footprints to their shares in Sydney's prevalence-weighted TCF to express the archetypes' importance in the collective. This figure can also help to identify archetypes that might be promising targets to reduce the city's overall footprint (high per-capita footprint/high contribution to overall footprint). The highest shares in Sydney's carbon footprint originate from SYD-D and SYD-B, both high-income large families whose contributions are high due to their sheer abundance and household size (cf figure 3(a)). Yet, the per-capita footprint of these clusters ranks medium compared to the smaller, high-income households (e.g. SYD-J and SYD-A) that also contribute significantly to the overall carbon footprint of Sydney.
Table 1. Consumption archetypes found in Sydney. The different archetypes are randomly labelled with letters. The simplified subdivision into household sizes, income classes as well as the summarising titles help to better understand the nature of the archetypes and shall support the visualisation of socio-economic characteristics in figure 2. Refer to the SI for a detailed characterisation of the archetypes. This table also provides the following information for each archetype: average age of adults (persons aged over 18 yr), number of children (aged below 15 yr), income in AUD per week; income per capita (p.c.); total carbon footprint (TCF) in tCO2-eq per year; and TCF per capita. To aid the comparison, the latter three parameters show a colour scaling from blue (low) to red (high).
Table 2. Consumption archetypes found in Melbourne. The different archetypes are randomly labelled with roman numbers. The simplified subdivision into household sizes, income classes as well as the summarising titles help to better understand the nature of the archetypes and shall support the visualisation of socio-economic characteristics in figure 2. Refer to the SI for a detailed characterisation of the archetypes. This table also provides the following information for each archetype: average age of adults (persons aged over 18 yr), number of children (aged below 15 yr), income in AUD per week; income per capita (p.c.); total carbon footprint (TCF) in tCO2-eq per year; and TCF per capita. To enhance the comparison, the latter three parameters show a colour scaling from blue (low) to red (high).
Figure 2 (caption, in part). Further characterisation of these archetypes can be found in a simplified overview in tables 1 (Sydney) and 2 (Melbourne) and in full detail in the SI. The colours of the markers indicate the average age of adults in the archetype (all persons aged older than 18 yr), while the size represents the prevalence of the clusters. In this article, prevalence is computed as the sum of the respective cluster's household-representativeness-weights provided by ABS [63]. Since the resulting absolute prevalence-values do not have a clear meaning, we reduced the legend to a relative low/high-comparison.
¹ Prevalence in this article is approximated by the sum of all representativeness-weights provided by ABS [63] and is visualised in figure 2.
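The relation between prevalence and an archetype's share in the city-wide footprint can be made concrete with a short sketch. It assumes that an archetype's contribution is its prevalence (the sum of ABS representativeness-weights) multiplied by its average TCF per household; this is an interpretation of the prevalence-weighted TCF, and the archetype names, weights and footprints are hypothetical.

```python
def prevalence(weights):
    """Prevalence of a cluster: sum of its households' representativeness-weights."""
    return sum(weights)

def tcf_shares(archetypes):
    """Share of each archetype in the prevalence-weighted total carbon footprint."""
    contributions = {
        name: prevalence(a["weights"]) * a["tcf_per_household"]
        for name, a in archetypes.items()
    }
    total = sum(contributions.values())
    return {name: c / total for name, c in contributions.items()}

# Hypothetical archetypes: weights per surveyed household, TCF in tCO2-eq per year.
archetypes = {
    "large family": {"weights": [100.0, 200.0], "tcf_per_household": 20.0},
    "small high-income": {"weights": [100.0], "tcf_per_household": 40.0},
}
shares = tcf_shares(archetypes)
```

The example mirrors the pattern discussed for Sydney: a prevalent archetype with a moderate footprint can contribute more to the city total than a rarer archetype with a high footprint.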
In Melbourne, the per-capita ranking (figure 3(d)) starts with very-high-income households, proceeds to high-income/small households, single persons, medium-income/small families and high-income/big families, and ends with low-income families. The most obvious difference from Sydney pertains to single-person households, which rank higher in the per-capita footprint list in Melbourne. This is likely due to their generally higher per-capita income and thus probably higher consumption demand. Interestingly, archetypes with older household members, such as MEL-I, MEL-X, SYD-M and SYD-F, show relatively higher impacts in Melbourne than in Sydney (see also discussion below).
As a side note, we would like to highlight the per-capita carbon footprints of MEL-X (old, retired couple), MEL-XI (young adults) and MEL-VII (large families), which have different compositions but add up to about the same total. This shows that households in different circumstances and with different consumption profiles might induce a similar amount of life-cycle GHG per person.
In addition to family-archetypes (such as MEL-VII), young-adult-households (MEL-XI) also show large contributions to Melbourne's overall footprint (cf figure 3(c)). In both cities, small high-to-very-high-income households (MEL-II, SYD-A, SYD-J) have high shares in the overall footprints and simultaneously high per-capita footprints. This is in contrast to Froemelt et al's findings for Switzerland where very-high-income households had the highest per-capita footprints but minor shares in the nationwide TCF [39].
Sydney's and Melbourne's prevalence-weighted overall footprints are comparable and in line with other studies [5,14,50]. For both cities, food (both: 25% contribution), housing (Sydney: 23%; Melbourne: 27%) and transport (Sydney: 22%; Melbourne: 23%) are the environmentally most important consumption domains, although with a slightly different ranking in the two cities. While the most important contributions to the overall food impacts (expenditures in restaurants, for fast food, for dairy products, and more) and transport impacts (mainly direct emissions from car driving) are similar in both cases, the contributions to housing emissions differ. In Sydney, the use of electricity at home is by far the most important factor. In Melbourne, direct emissions from burning natural gas are of similar importance to electricity. The differences are likely due to climatic conditions with more air conditioning required in Sydney in summer and more heating in winter in Melbourne.
Already in this overview perspective, it is apparent that different archetypes are of different importance in each city. Socio-economically, this manifests itself in small households with comparably lower incomes in Sydney and the archetypes of older population segments with relatively higher footprints in Melbourne. Furthermore, no very-high-income archetype was found in Sydney. Generally, the non-emergence of archetypes can be explained by a less distinct consumption behaviour of these households (becoming thus part of other archetypes) or a lower abundance (being either non-existent or less important and thus removed from consideration because the cluster was not regarded representative). Figure 4 presents the results of the metric MDS and allows studying the similarities and dissimilarities between the two cities in more detail. While detailed data for all archetypes are provided in the SI, only a selection of the most interesting comparisons is presented here. The choice of examples was based on often-investigated aspects like household age, income and size.

Similarities and dissimilarities between the cities
SYD-M and MEL-I are socio-demographically similar and constitute consumption patterns for elderly single-person households with a slightly higher income in Sydney. Figures 4(a) and (b) clearly show that these two clusters are close, revealing similar consumption behaviours across cities. Both have a rather low transport demand, but a relatively high demand for health services, just as other 'old-person' archetypes. Melburnians have somewhat higher expenditures on housing, while Sydneysiders spend a bit more on communications. SYD-F and MEL-X are even more alike from a socio-economic viewpoint. Both represent very old couples in similar financial situations. In figures 4(a) and (b), these clusters are again very close in both total and per-capita terms. However, MEL-X is still closer to MEL-V. The latter refers to younger people than MEL-X and SYD-F. The Melburnian consumption patterns being closer to each other than to the Sydney counter-part could be a hint that these consumption behaviours are city-specific to some extent. In any case, figures 4(c) and (d) show clear differences among these four clusters with regard to footprint composition, especially in a per-capita view. The obvious segregation between Sydney clusters and Melbourne clusters in the MDS-footprint graphs (figures 4(c) and (d)) leads to the conclusion that this is explainable by the different GHG-factors for Sydney and Melbourne. The carbon intensities tend to be higher for Melbourne, which is at least partly attributable to the coal-based electricity production in Victoria.
Figure 4. Results of the metric multi-dimensional scaling (MDS) of the archetypes' consumption behaviours and footprints in Sydney and Melbourne. The size of the markers visualises the average household size in the clusters, while the income is indicated by a colour shading scale in the respective city's colours (blue for Sydney, red for Melbourne). Note that MDS is a visualisation technique for multi-dimensional datasets and attempts to preserve the distances between the data points of the original, high-dimensional, data space in a 2D-plane. Therefore, the x- and y-axes do not have a specific meaning and do not correspond to each other in the different sub-plots. Remember that we excluded socio-economic aspects in these analyses to concentrate on investigating similar consumption profiles and footprints. Two clusters being close in these plots means that they show similar consumption behaviour or footprint compositions independent of their socio-economic circumstances. (a) Consumption behaviour per capita (shading follows per-capita income); (b) consumption behaviour per cluster (shading follows total income); (c) carbon footprint composition per capita (shading follows per-capita income); (d) carbon footprint composition per cluster (shading follows total income).
In all four sub-plots of figure 4, MEL-II (very-high-income, small families) is very distant from all other clusters. This is in line with the findings of Froemelt and colleagues [39]. In their study, the highest income segment had a special status and highest per-capita carbon footprints too. Transport demand causes the largest life-cycle GHG for this group, followed by food and housing. MEL-II's high expenditures on recreational activities also induce large environmental impacts. The most similar consumption behaviour is shown by SYD-J. Even though MEL-II represents younger adults and children together with higher incomes, SYD-J seems to be in a similarly comfortable financial situation. On the other end of the per-capita income distribution are SYD-L and MEL-VI. These archetypes are in a comparable occupational and social situation since many households belonging to these groups are unemployed and/or might be single-parents with low net wealth. The Sydney equivalent includes people of slightly higher age and households of a slightly lower size. However, in consumption terms, these archetypes are not as similar as could be expected. SYD-L behaves like small, low-income households (mostly single-person households such as MEL-I, MEL-III, SYD-C and SYD-I) and like poor families (MEL-XIII and SYD-G) in a per-capita view.
Even though MEL-VI is not the closest cluster for SYD-L, they are obviously related in figure 4(a). However, from a total consumption view, many other consumption patterns are between the two archetypes with unemployed persons. Further investigations would be needed to clarify if these different behaviours are a consequence of living in two different cities.
The family-archetypes in Sydney could be interpreted as a sequence from SYD-G to SYD-B and finally SYD-D with increasing age of children. These three archetypes spread over the whole space in figures 4(a) and (b), suggesting very different consumption behaviours of families as children get older. Among many different consumption categories, this is most clearly expressed by decreasing spending on baby food and child care but increasing expenditures for education fees and recreation along this succession (see SI). However, it has to be pointed out that these clusters are also in different financial situations, which additionally affects their consumption patterns. With a very low per-capita income, SYD-G has the lowest per-capita footprint in Sydney, while SYD-B and SYD-D are placed in the mid-range (cf figure 3(b)). Due to their prevalence and household size, all three clusters show substantial contributions to Sydney's overall footprint (figure 3(a)).
A similar sequence also emerges in Melbourne with MEL-XIII, MEL-XIV, MEL-VIII and MEL-VII, whereby MEL-XIII and MEL-XIV are close in a socio-demographic view but differ in their income situation with MEL-XIV having a higher income. In many aspects, it seems as if SYD-G is a mix of MEL-XIV and MEL-XIII. It is interesting to see that the behaviours of these family types (young adults with babies/toddlers) led to two distinct archetypes in Melbourne but to one big archetype in Sydney. SYD-G being in-between the two corresponding Melburnian clusters could also explain the differences in behaviours observed in figures 4(a) and (b). While MEL-XIII and SYD-G are similar in a per-capita view, MEL-XIII behaves even more similarly to SYD-L (the 'unemployed' cluster above). MEL-XIV in this perspective is in transition from SYD-G to MEL-VII/SYD-D, which seems intuitively correct. However, in a total perspective SYD-G has some proximity to MEL-XIV, but is closer to old two-person households, while MEL-XIII is surrounded by many different clusters. This leaves us with the conclusion that looking at socio-economic segments alone might fall short of supporting the development of targeted measures and highlights the importance of our approach that also takes the specific consumption behaviours of these segments into account.
Archetypes SYD-D and MEL-VII both represent families with children over 15 and act as a certain counter-example. They are not only socio-economically close but figures 4(a) and (b) suggest also similar consumption behaviours in both cities. However, in turn, this is not the case for SYD-B and MEL-VIII both with children at school age. While MEL-VIII is understandably close to SYD-D in a total perspective, SYD-B is separated and takes a special status. The distance from SYD-B to other family-archetypes is even larger in a per-capita view. Despite its socio-economic counter-part in Melbourne, SYD-B obviously behaves very differently. This might be due to its high net wealth which also could explain a certain closeness to SYD-I (wealthy one-person households) in a per-capita consumption behaviour. Interestingly, the city-specific GHG-factors draw SYD-D and SYD-B closer together in the footprint-plots in figure 4, while SYD-D and MEL-VII are pulled apart despite their similar expenditure patterns.
All these examples show that some archetypes only emerge in one of the cities, but there are also others that can be found in similar forms in both cities. However, the analyses reveal that seemingly similar archetypes, especially in a socio-economic view, might behave very differently in different cities or might even show unexpected similarities with clusters in other socio-economic circumstances. Furthermore, this study also emphasises the importance of regionalised GHG-factors. The examples of similar consumption patterns but with differing environmental consequences suggest that city-specific supply chains need to be taken into consideration, as was done in this study, when planning policy interventions.

Limitations of the study
While the formation of the clusters and the life cycle assessment of their consumption behaviour were done with due care and appraised with different performance metrics and evaluation techniques, our approach is still based on assumptions and simplifications that possibly influence the final results. An important issue in this regard pertains to the representativeness of the archetypes. The HES is not based on a minimum number of households and thus no official recommendation from the ABS exists to compile a representative sample [62]. To adequately respond to this and similar issues, we developed an uncertainty framework.
Furthermore, previous studies point out that HESs tend to underreport effective spending [42,86,87]. In addition, the applied environmental assessment concentrates solely on expenditures and neglects the consequences of long-term investments. This limitation is often encountered in input-output frameworks, as is the reliance on purely monetary flows [27,88-91]. The latter can lead to an overestimation of GHG emissions when purchasers opt for higher-quality and thus higher-priced products rather than larger quantities [40]. Finally, by applying a multi-region supply-use table without a rest-of-world region, we implicitly assume that domestic production technology and carbon intensity also apply to household expenditures outside Australia [92].
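The overestimation caused by purely monetary flows can be illustrated with a small numerical sketch. All values below are hypothetical and are not taken from this study; the function names and emission factors are assumptions introduced for illustration only. The sketch further assumes that a premium product causes the same emissions per kilogram as a standard one, which need not hold in practice.

```python
# Illustrative sketch only: all numbers are hypothetical, not results of the study.

def expenditure_footprint(spend_aud, intensity_kg_per_aud):
    """Footprint estimated by a purely monetary (expenditure-based) model."""
    return spend_aud * intensity_kg_per_aud


def physical_footprint(quantity_kg, intensity_kg_per_kg):
    """Footprint estimated from the physical quantity purchased."""
    return quantity_kg * intensity_kg_per_kg


# Assumed sector-average factors (hypothetical values).
MONETARY_FACTOR = 0.5  # kg CO2e per AUD spent
PHYSICAL_FACTOR = 2.0  # kg CO2e per kg of product (assumed equal for both price tiers)

# Two households buy the same quantity; one pays a premium price.
QUANTITY_KG = 10.0
standard_spend = QUANTITY_KG * 4.0  # 4 AUD per kg
premium_spend = QUANTITY_KG * 8.0   # 8 AUD per kg

standard_monetary = expenditure_footprint(standard_spend, MONETARY_FACTOR)
premium_monetary = expenditure_footprint(premium_spend, MONETARY_FACTOR)
physical = physical_footprint(QUANTITY_KG, PHYSICAL_FACTOR)

# Ratio above 1 means the monetary model overestimates the premium purchase.
overestimation_ratio = premium_monetary / physical

print(f"standard (monetary): {standard_monetary} kg CO2e")
print(f"premium  (monetary): {premium_monetary} kg CO2e")
print(f"physical estimate:   {physical} kg CO2e")
print(f"overestimation ratio for premium purchase: {overestimation_ratio}")
```

Under these assumed numbers, the monetary model attributes twice the emissions to the premium purchase although the purchased quantity is identical, which is the effect described in [40].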
Finally, the MDS is purposely based on aggregated categories, but different aggregations could lead to different pictures. Goodness-of-fit evaluations of the MDS are presented in the SI. It should be noted that figure 4(a) does not show an optimal representation of the distances, but it can be considered suitable.
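For readers unfamiliar with MDS goodness-of-fit, the following sketch shows one commonly used measure, Kruskal's stress-1, in its metric form (raw dissimilarities are used as disparities). It is not necessarily the measure reported in the SI; the function names and the three-point example are hypothetical and serve only to illustrate how well a low-dimensional map preserves the original distances.

```python
import math


def pairwise_distances(points):
    """Euclidean distance matrix of the embedded points."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]


def kruskal_stress1(dissimilarities, embedding):
    """Metric stress-1: sqrt(sum (delta_ij - d_ij)^2 / sum d_ij^2) over i < j.

    delta_ij are the original dissimilarities, d_ij the distances in the map.
    """
    embedded = pairwise_distances(embedding)
    n = len(embedding)
    numerator = 0.0
    denominator = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            numerator += (dissimilarities[i][j] - embedded[i][j]) ** 2
            denominator += embedded[i][j] ** 2
    if denominator == 0.0:
        raise ValueError("all embedded points coincide; stress is undefined")
    return math.sqrt(numerator / denominator)


# Hypothetical dissimilarities between three archetypes (a 3-4-5 triangle).
delta = [
    [0.0, 3.0, 4.0],
    [3.0, 0.0, 5.0],
    [4.0, 5.0, 0.0],
]

# A 2-D map that reproduces the dissimilarities exactly.
perfect_map = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]
# A map that squeezes all points onto one line and distorts one distance.
distorted_map = [(0.0, 0.0), (3.0, 0.0), (4.0, 0.0)]

stress_perfect = kruskal_stress1(delta, perfect_map)
stress_distorted = kruskal_stress1(delta, distorted_map)

print(f"stress of perfect map:   {stress_perfect:.3f}")
print(f"stress of distorted map: {stress_distorted:.3f}")
```

A stress of zero means the map reproduces all dissimilarities exactly; larger values indicate that distances in the plot should be interpreted with more caution.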

Conclusions and outlook
We derived distinctly separate lifestyle-archetypes for Sydney and Melbourne to study consumption behaviour patterns and associated environmental impacts within, but also across cities. The application of clustering techniques allowed for including expenditure data along with socio-economic parameters and hence for building household groups that share both comparable living conditions and similar behaviour.
In our study, different archetypes emerged in the two cities and were of different importance. While household groups with older members were generally more consistent across cities, the lowest- and highest-income segments featured more distinct behaviours. Some socio-economically comparable groupings behaved differently in the two cities. For instance, family-archetypes with similar household-member compositions could be found in both cities. This suggests that the age of children affects the expenditure patterns of households. However, some of those family-archetypes showed similar consumption patterns across cities, while others behaved completely differently.
In conclusion, we observed both similar behaviour of related socio-economic groups and presumably city-specific behaviour. Further investigations are needed to find the causes of varying behaviours in different cities. While the search for such causal drivers is beyond the scope of the present article, the applied data-driven approach is able to quantify different lifestyles and associated environmental impacts in individual cities. Our study thereby revealed that a detailed planning base for targeted measures in cities should not rely on general trends. A city-specific differentiation is important for the analysis of consumption behaviour as well as for GHG-factors; the latter also emphasises the importance of improving supply chains to cities. Nevertheless, some similar behaviours could be found, suggesting that once an effective policy action has been designed for an individual archetype in a city, this measure could be used to target other household groups and even be applied across cities. Verifying this requires further research, however, and psychological aspects and behavioural economics theories ought to be considered [16,35].
The application of the consumption-archetypes idea [39] to Sydney and Melbourne confirmed the importance of breaking up traditional socio-economic household segments by combining observed consumption data with socio-economic aspects when studying environmental impacts from household consumption patterns. Furthermore, we were able to advance Froemelt et al's approach [39] by developing an uncertainty framework. Our study might help to generate new insights into household consumption in general, and particularly in cities. We also hope that this article makes a small contribution to supporting Australia on its way towards the 'Sustainable Transition' strongly suggested by Allen et al [93]. We will further pursue the archetype-framework and aim for more international studies.