Measuring human wellbeing: A protocol for selecting local indicators

Improving human wellbeing is a major focus of international environmental and sustainable development policy. However, clearly defined measures of wellbeing are needed as an empirical base for the formulation and evaluation of policies. Despite conceptual progress towards agreement on universally relevant dimensions of wellbeing, consensus is still lacking on how to translate these dimensions into locally appropriate indicators to measure wellbeing in different contexts. This paper focuses on three interrelated challenges associated with this knowledge gap: (1) navigating trade-offs between complexity and simplicity of concept; (2) integrating top-down and bottom-up perspectives; (3) ensuring a cost-effective and flexible approach suitable for different policy contexts. We contribute to filling this gap by developing a step-by-step Wellbeing Indicator Selection Protocol (WISP) for measuring wellbeing. The protocol integrates perspectives through an interdisciplinary mixed methods design that includes cross-validation between quantitative approaches of redundancy analysis and statistical modelling and qualitative approaches of focus groups and thematic analysis. In this way we promote a pragmatic approach suitable for a range of social and environmental contexts. We tested WISP in rural Tanzania, identifying 111 candidate wellbeing indicators. This list was simplified to a subset of 19 indicators that retained 91 % of measured variation among all wellbeing indicators. The simplified list was representative of both a multidimensional concept of wellbeing and the diversity of opinions sampled. We conclude that the protocol provides practical, statistically validated guidance to support the design of wellbeing assessments, maintaining coherence between universal theory and local realities.


Introduction
Improving human wellbeing has become a major goal of international environmental and sustainable development policy (UNDP (United Nations Development Programme), 2015; CBD (Convention on Biological Diversity), 2016). However, there remains ongoing debate about how wellbeing should be conceptualised and measured (Dasgupta, 2001; OECD (Organisation for Economic Cooperation and Development), 2013). Meanwhile, these high-level policy goals have largely fallen short in terms of the persistence of extreme poverty, increasing inequality and environmental degradation (Fehling et al., 2013; Allen et al., 2018; McGregor, 2018). Clearly defined, measurable indicators of wellbeing are needed to improve achievement of policy goals by (1) providing an evidence base to track progress towards a more inclusive society (Brende and Bent, 2015; Costanza et al., 2014; Hicks et al., 2016), and (2) highlighting social issues requiring attention and adaptive action (Brown and Westaway, 2011).
The rising popularity of human wellbeing as a measure of development stems from growing recognition of the failures of economic indicators to adequately represent non-economic aspects of people's lives (Klugman et al., 2011; Haq, 1996). For example, education can be a stronger predictor of health than income (Sen, 1999; Herd, Goesling and House, 2007). In contrast, the concept of wellbeing encompasses a broader notion of multidimensional development, building on an understanding of what people need to participate and flourish in society (Max-Neef et al., 1989; Alkire and Foster, 2011). Various definitions of wellbeing exist, though none is unanimously accepted (Brown and Westaway, 2011). Here we adopt a definition developed by the Wellbeing in Developing Countries research group, which defines wellbeing as 'a state of being with others, which arises where human needs are met, where one can act meaningfully to pursue one's goals and where one can enjoy a satisfactory quality of life' (Gough and McGregor, 2007).
Three distinct dimensions of wellbeing have been identified, for which there is growing theoretical consensus: objective, subjective and relational wellbeing (Boarini et al., 2014). These form the beginnings of a unified theory of wellbeing, with contributions from diverse disciplines of philosophy, psychology, economics, and more recently, the natural sciences (Schleicher et al., 2017). Objective wellbeing is concerned with the material conditions of a person's life, often represented by wealth indicators of poverty (McGregor and Sumner, 2010). Subjective wellbeing is concerned with self-evaluation of personal circumstances (Vanhoutte, 2015). Examples of subjective wellbeing measurement include the Satisfaction with Life Scale, a five-question research instrument where respondents self-report their satisfaction with life as a whole (Pavot et al., 1991). Thirdly, relational wellbeing, based on the capabilities approach of economist Amartya Sen (Sen, 1999), concerns the opportunities available to a person, recognising that individual wellbeing is pursued in relation to other people (Gough and McGregor, 2007; Woodhouse et al., 2015).
Progress towards operationalising wellbeing has been made through increasing theoretical convergence towards breaking down these broad conceptual dimensions into more specific but still universally relevant domains of wellbeing (McGregor, 2018). Alternative lists of domains have been suggested (for a review see King et al., 2014). However, all build on a human, rather than purely economy-centred, conception of development and cover similar aspects of people's lives, with relabelling of alternative lists largely reflecting the specific purpose or disciplinary approach (McGregor, 2018). Here we take an interdisciplinary approach to wellbeing for use across social and environmental contexts. We therefore adopt the domains put forward by the Millennium Ecosystem Assessment (MEA (Millennium Ecosystem Assessment), 2005), which explicitly uses a socioecological systems approach and defines five domains: (1) Basic material for a good life (hereafter referred to as material wellbeing), (2) Health, (3) Social relations, (4) Security, (5) Freedom of choice and action (hereafter referred to as freedom; Narayan et al., 2000; Supplementary material).
There is also growing methodological agreement on a general approach for measuring wellbeing. Conceptions of wellbeing are socially constructed and, since communities are not homogeneous, there is a need to consider how understandings of wellbeing differ between actors and contexts (Martin et al., 2014; Wood et al., 2018). Therefore, participatory methods should be used to include the views of those individuals whose wellbeing is being assessed (Camfield et al., 2009; Sterling et al., 2017). Furthermore, heterogeneity may exist within households (de Lange et al., 2016). Therefore, individuals should be the unit of measurement, rather than households as a whole (Fry et al., 2015).
Despite these advances towards measuring wellbeing, a remaining knowledge gap concerns how to effectively translate universally relevant wellbeing domains into local indicators (McGregor, 2018; Sterling et al., 2017). We use 'local indicators' to mean indicators that incorporate context-specific values (Caillon et al., 2017; Sterling et al., 2017). Here we focus on three interrelated challenges associated with selecting local indicators, which we refer to as (1) complexity-simplicity, (2) integrating perspectives and (3) practical utility.
Firstly, given the multidimensional nature of wellbeing, thousands of potentially relevant indicators exist (Breslow et al., 2016; Corrigan et al., 2017). Previous studies have identified correlations between different social indicators (McGillivray, 1991; Supplementary material S7). For example, there is a strong correlation between literacy and income (Qizilbash, 2001). The inclusion of highly inter-correlated indicators provides little additional information about variation in wellbeing, suggesting a level of redundancy and the potential to use fewer indicators for concise communication of wellbeing assessments to policymakers. Furthermore, lengthy questionnaires may cause respondent fatigue (Ben-Nun, 2008), which has ethical and data quality implications. Yet oversimplification risks losing the rich description intended by the wellbeing concept. We refer to this as the 'complexity-simplicity problem'.
We suggest that introducing statistical approaches for variable reduction may help to navigate the complexity-simplicity challenge. Breslow et al. (2016) identify the need to select parsimonious sets of indicators for wellbeing assessment, i.e. reducing the number of indicators without loss of the complexity required to adequately describe wellbeing. However, we are not aware of any wellbeing indicator selection methods that utilise statistical approaches to guide the process of reducing the number of indicators. Introducing statistical methods provides several benefits (Murtaugh, 2009). The removal of numerically correlated indicators creates an orthogonal (uncorrelated) set of indicators (Crawley, 2007). Orthogonality among indicators is an assumption of many statistical analyses and is required to avoid erroneous results in any subsequent analysis of wellbeing data (Zuur et al., 2010). Furthermore, statistical methods can be exactly repeated between sites, minimising the introduction of human bias and supporting comparison between wellbeing assessments.
A second challenge is to determine how best to integrate top-down perspectives (i.e. from wellbeing theory) and bottom-up perspectives (i.e. local knowledge of study participants; Boarini et al., 2014). We refer to this as the 'integrating perspectives challenge', which can be understood in terms of how contrasting disciplines seek to maximize different aspects of scientific validity. Top-down perspectives are common in the natural sciences and quantitative social sciences and tend to prioritise 'external validity', i.e. the ability to generalise findings to different contexts and populations (Campbell and Stanley, 1963). For example, top-down selection of wellbeing indicators may take place through a combination of literature review and expert opinion (Biedenweg et al., 2016; Breslow et al., 2017). This approach promotes external validity through strong relation to theory, but may marginalise the perspectives of those people whose wellbeing is to be assessed, thereby lacking local relevance (Grillo and Stirrat, 1997; Woodhouse et al., 2016).
Conversely, bottom-up perspectives emphasise the need for contextual understanding and 'ecological validity', defined by Yue (2012) in relation to case study research as the extent to which the researchers' findings reflect the lived experience of those whom the researchers are studying. Ecological validity ensures that local relevance is retained, promoting rather than marginalising the needs of study participants (Howard et al., 2016). Efforts to prioritise bottom-up perspectives in conceptualizations of wellbeing have been undertaken through anthropological and in-depth qualitative research approaches (Beauchamp et al., 2018; Woodhouse and McCabe, 2018). However, if an exclusively bottom-up perspective is followed, some important issues may go unreported due to the adaptive preferences of survey respondents (Sen, 1999; Mitra et al., 2013).
The Basic Necessities Survey is a quantitative social assessment tool that builds on this bottom-up perspective, prioritizing locally defined indicators by combining focus group consultations followed by a household questionnaire (Davies, 2007). An issue with this approach is that it does not organize indicators in relation to a conceptual framework (Schreckenberg et al., 2010). This risks overlooking subjective indicators that are less easily articulated through participatory discussions, thereby invalidating conclusions about the overall wellbeing of respondents if one dimension of wellbeing is missed (Woodhouse et al., 2015). These tensions between the strengths and weaknesses of top-down and bottom-up perspectives should be carefully considered (Poteete et al., 2010). Each perspective illuminates important aspects of wellbeing, but in doing so, prioritises contrasting forms of validity, which need to be integrated to gain a well-rounded understanding of human wellbeing (Fig. 1).
The third challenge, practical utility, is that the process of selecting local indicators must be cost-effective and adaptable in order to mainstream the process into different policy contexts (Rasmussen et al., 2017).
Here, we contribute a Wellbeing Indicator Selection Protocol (WISP) that aims to operationalise measurement of human wellbeing in different contexts. The protocol provides a generalised, step-by-step method to help researchers and practitioners translate universal wellbeing domains into locally appropriate indicators. To address the complexity-simplicity challenge, we introduce the use of statistical methods to remove redundant indicators. To address the integrating perspectives challenge, the protocol employs a mixed methods design to balance external and ecological validity (Fig. 2). To assess the practical utility of the protocol, we provide an example of its use in rural Tanzania (Supplementary material S1). We critically evaluate the protocol's effectiveness in addressing these three challenges.
WISP is intended to be used in the scoping phase of projects operating at landscape or regional sub-national scales to support the design and testing of wellbeing questionnaires prior to implementation of the survey instrument. Potential applications include exploratory use to identify local priorities in order to support policy formulation. The protocol can also be used to support context-specific wellbeing impact evaluations of conservation and development projects (for an overview of wellbeing impact design considerations see Woodhouse et al., 2015). We highlight common design considerations for wellbeing assessments and discuss implications for the protocol's wider use.

Generalised overview of WISP
Sample selection. Before undertaking a wellbeing assessment, the diversity of community actors present within the intended study area should be identified and particular consideration given to ensure participation of marginalised groups in a way that is culturally sensitive in the local context (Franks and Small, 2016). A minimum of two contrasting sites (e.g. villages) should be visited in order to sample variation across the study area. Selected sites should be representative of the study area and encompass variation in key socio-economic and environmental variables of relevance to the local context (PEN, 2007). Common criteria for consideration include economic drivers of wellbeing, such as proximity to local markets (Helliwell and Putnam, 2005), and environmental drivers, such as topography influencing farming and other livelihood practices (Boarini et al., 2014).
Step 1. WISP uses a stratified random sampling design to identify an unbiased sample of local actors within villages (for an overview of sampling approaches see Angelsen et al., 2011). Stratification should use participatory wealth ranking or other criteria relevant to the specific context of the study, such as gender, age or livelihood (Supplementary material S1).
Exploratory focus groups are undertaken to identify candidate wellbeing indicators, with each focus group comprising a single community actor group (e.g. divided by gender) to reduce within-group variation, thereby encouraging uninhibited discussion and cross-validation of ideas between participants (Kitzinger, 1994; Macnaghten and Myers, 2011).
It is important to frame focus group discussions around a sufficiently broad conception of wellbeing and to be careful about how this is communicated when translating between languages (OECD, 2013). An open questioning style should be used to help participants develop a locally understood conception of wellbeing, thereby promoting ecological validity (Supplementary material S2). Thematic analysis of focus group transcripts is then used to identify candidate wellbeing indicators in relation to the five domains of wellbeing (Supplementary material S2, S3), noting indicators that are specific to a particular village or community actor group and local priority indicators, which we define as those indicators discussed in all focus groups. If fewer than five indicators are suggested per wellbeing domain, then additional indicators should be added from relevant frameworks (Supplementary material S1).
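The bookkeeping at the end of step 1 can be sketched in a few lines of Python. This is a minimal illustration rather than part of the protocol: the focus group names, indicator labels and function names are hypothetical, and the sketch assumes that the thematic analysis has already been coded into one set of indicators per focus group.

```python
# Hypothetical coded output of the thematic analysis: the candidate
# indicators raised in each focus group (toy labels, not study data).
focus_groups = {
    "site_1_women": {"land area", "building materials", "lending resources"},
    "site_1_men": {"land area", "building materials", "voice in meetings"},
    "site_2_women": {"land area", "building materials", "lending resources",
                     "local medicines"},
    "site_2_men": {"land area", "building materials", "local medicines"},
}


def local_priority_indicators(groups):
    """Indicators discussed in all focus groups (the definition used in step 1)."""
    return set.intersection(*groups.values())


def group_specific_indicators(groups):
    """Indicators raised in some groups only, mapped to the groups that raised them."""
    shared = local_priority_indicators(groups)
    partial = set.union(*groups.values()) - shared
    return {ind: sorted(g for g, raised in groups.items() if ind in raised)
            for ind in sorted(partial)}


def domains_needing_top_up(indicator_domain, domains, minimum=5):
    """Domains with fewer than `minimum` candidate indicators."""
    counts = {d: 0 for d in domains}
    for domain in indicator_domain.values():
        counts[domain] += 1
    return [d for d in domains if counts[d] < minimum]


priorities = local_priority_indicators(focus_groups)
specific = group_specific_indicators(focus_groups)
```

Domains returned by `domains_needing_top_up` are those for which additional indicators would be drawn from relevant frameworks.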
Step 2. All identified candidate indicators are used to develop a quantitative instrument (questionnaire), which is implemented with a stratified random sample of respondents (Supplementary material S4; Creswell et al., 2004). This is done to trial the wellbeing questionnaire and gain sample data on how candidate indicators vary across the study area.
Step 3. The spread of responses for each indicator is assessed to eliminate indicators with zero or uneven spread, which would give no helpful information on the variation of wellbeing present within communities. For an overview of data exploration in relation to common statistical problems see Zuur et al. (2010).
Step 4. A human wellbeing index (HWI) is calculated to represent all indicators in a single, standardised index following principles of the Human Development Index (UNDP (United Nations Development Programme), 2017; Eq.1).
Eq.1. Human Wellbeing Index (HWI), where x is the mean value of standardised indicators from each wellbeing domain.
HWI is used as a continuous response variable to inform further reduction in the number of candidate indicators, using high covariance between indicators to infer statistical redundancy. We define high covariance as an absolute Pearson correlation coefficient |r| ≥ 0.7, and/or Variance Inflation Factors (VIF) ≥ 3 across all indicators (Dormann et al., 2013; Zuur et al., 2010). In the event of high covariance, the indicator that has the strongest relationship with HWI is retained.
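As a rough illustration of step 4, the Python sketch below builds an index and then filters correlated indicators. Several choices here are assumptions rather than the protocol's specification: indicators are standardised by min-max scaling, domain means are combined by a simple arithmetic mean (Eq. 1 may aggregate differently), only the Pearson criterion is applied (the VIF check is omitted), and all function and variable names are hypothetical.

```python
from math import sqrt
from statistics import mean


def min_max(values):
    """Rescale to [0, 1]; assumes step 3 already removed constant indicators."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def wellbeing_index(data, domains):
    """One index value per respondent: the mean of the domain means.

    data: {indicator: [value per respondent]}
    domains: {domain: [indicator names]}
    The arithmetic aggregation is an assumption made for this sketch.
    """
    scaled = {name: min_max(values) for name, values in data.items()}
    n = len(next(iter(scaled.values())))
    return [mean(mean(scaled[k][i] for k in names) for names in domains.values())
            for i in range(n)]


def drop_redundant(data, index, threshold=0.7):
    """Keep an indicator only if |r| < threshold with every indicator already kept.

    Indicators are visited from the strongest to the weakest relationship with
    the index, so within a correlated pair the stronger one is retained.
    """
    ranked = sorted(data, key=lambda k: abs(pearson(data[k], index)), reverse=True)
    kept = []
    for name in ranked:
        if all(abs(pearson(data[name], data[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Toy responses from six respondents (invented values, not study data).
data = {
    "land": [1, 2, 3, 4, 5, 6],
    "income": [2, 4, 6, 8, 10, 13],   # nearly collinear with "land"
    "health": [3, 1, 4, 1, 5, 2],
    "safety": [3, 1, 2, 3, 1, 2],
}
domains = {"material": ["land", "income"], "health": ["health"], "security": ["safety"]}
index = wellbeing_index(data, domains)
kept = drop_redundant(data, index)
```

In this toy data only one of `land` and `income` survives the filter, while `health` and `safety` are both retained.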
Step 5. Statistical modelling using an information-theoretic approach (Burnham and Anderson, 2002; Burnham et al., 2011) is employed to achieve statistical parsimony, i.e. reduction of indicators without loss of the complexity needed to adequately describe wellbeing. The uncorrelated indicators from step four are used as predictor variables of the HWI response variable in a Generalised Linear Model (GLM) with Gaussian error, suitable for continuous variables with approximately normal distribution. Then, to reduce the number of indicators to only those making the strongest contributions to overall wellbeing, we used backwards-forwards stepwise model selection (Venables and Ripley, 2002; Murtaugh, 2009). Stepwise selection is based on Akaike's Information Criterion (AIC; Akaike, 1973) to avoid consequences of frequentist approaches such as F statistics (Whittingham et al., 2006).
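The logic of AIC-based stepwise selection can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the implementation used in the case study: a Gaussian GLM with identity link is fitted as ordinary least squares, AIC is computed only up to an additive constant (which does not affect model comparison), the search starts from the full model, and the function names and toy data are hypothetical.

```python
from math import log


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[pivot] = m[pivot], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def aic(data, y, names):
    """AIC (up to a constant) of the Gaussian linear model y ~ 1 + names."""
    n = len(y)
    rows = [[1.0] + [data[k][i] for k in names] for i in range(n)]
    p = len(names) + 1
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(p)] for a in range(p)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(p)]
    beta = solve(xtx, xty)
    rss = sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
              for r, yi in zip(rows, y))
    return n * log(rss / n) + 2 * (p + 1)   # +1 counts the residual variance


def stepwise(data, y):
    """Backward-forward search: apply the single drop or add that lowers AIC most."""
    current = sorted(data)
    best = aic(data, y, current)
    while True:
        moves = [[k for k in current if k != d] for d in current]
        moves += [current + [a] for a in sorted(data) if a not in current]
        if not moves:
            return current
        score, move = min(((aic(data, y, m), m) for m in moves), key=lambda t: t[0])
        if score >= best:
            return current
        best, current = score, move


# Toy data: the response depends on "land" only; "noise" is unrelated.
data = {
    "land": [1, 2, 3, 4, 5, 6, 7, 8],
    "noise": [1, -1, -1, 1, 1, -1, -1, 1],
}
y = [2.5, 3.5, 6.5, 7.5, 9.5, 12.5, 13.5, 16.5]
selected = stepwise(data, y)
```

The sketch assumes the predictors are not perfectly collinear and that the fit is not exact; either condition would make the linear solve or the logarithm fail.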
Step 6. Finally, it is important that the process of selecting a reduced set of indicators is not blindly automated without critical review and validation checks (Burnham et al., 2011). The reduced indicators are checked to ensure that each wellbeing domain contains at least two indicators, to promote external validity consistent with wellbeing theory, and that local priority indicators identified in step 1 are retained, to promote ecological validity. Once the indicator selection process has been completed, the retained indicators can be used individually, aggregated to indices of each wellbeing domain, or combined into an overall wellbeing index using the HWI equation listed above, in support of different policy applications.
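The two checks in step 6 lend themselves to a short helper that flags problems for the analyst to review. The Python sketch below is illustrative only: the function name, argument names and toy labels are hypothetical, and deciding which indicator to reinsert remains a qualitative judgement that the code does not attempt.

```python
def validation_report(selected, indicator_domain, local_priorities, min_per_domain=2):
    """Return (domains with too few indicators, local priorities that were dropped).

    selected: indicators retained after step 5
    indicator_domain: {indicator: wellbeing domain} for all candidate indicators
    local_priorities: indicators discussed in all focus groups (step 1)
    """
    counts = {domain: 0 for domain in set(indicator_domain.values())}
    for name in selected:
        counts[indicator_domain[name]] += 1
    thin_domains = sorted(d for d, c in counts.items() if c < min_per_domain)
    dropped_priorities = sorted(set(local_priorities) - set(selected))
    return thin_domains, dropped_priorities


# Toy example (invented labels, not study data).
indicator_domain = {
    "land area": "material", "livestock": "material", "income": "material",
    "lending resources": "social relations", "voice in meetings": "social relations",
    "clinic access": "health", "food sufficiency": "health",
}
selected = ["land area", "income", "lending resources", "clinic access", "food sufficiency"]
thin, dropped = validation_report(selected, indicator_domain, ["land area", "livestock"])
```

Here the report would flag the social relations domain as under-represented and livestock as a dropped local priority, prompting the reviewer to consider reinserting indicators.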

Study region
WISP was tested in Tanzania within a protected-area-dominated landscape of 10,000 km² in Morogoro and Iringa regions, where landscape planning interventions aimed to deliver improvements in wellbeing (SAGCOT, 2011; Fig. 3). The protocol was used to develop a context-specific wellbeing questionnaire to be implemented in a further 20 villages in order to evaluate multidimensional wellbeing impacts of protected areas in the landscape using a site matching design (for an overview of statistical matching see Schleicher et al., 2020).
Two contrasting villages were selected to test WISP. Mang'ula B (Kilombero district, Morogoro region, elevation 306 m) had a population density of 23.6 people per km², annual population growth of 2.29 % (close to the national average of 2.7 %) and was located adjacent to a road. In contrast, the mountain community of Udekwa (Kilolo district, Iringa region, elevation 1611 m) had a population density one tenth of that in Mang'ula at 2.4 people per km², with slower annual population growth of 0.72 % (Tanzania National Bureau of Statistics, 2012) and poor road access. Two focus groups stratified by gender were undertaken in each village to identify candidate indicators. Participatory wealth ranking with village leaders was used to identify a random sample of 90 questionnaire respondents stratified by gender and socioeconomic status (Supplementary material S1).

Analysis
We undertook analyses to understand how well WISP overcame the three challenges outlined in the introduction. To analyse the complexity-simplicity challenge, we evaluated the value of introducing statistical modelling (step five) by comparing the two lists of indicators selected by steps four and five, treating indicator lists as equivalent if within two AIC (Burnham and Anderson, 2002). The potential for further simplification beyond the final indicator list was evaluated by further indicator removal. This was done by plotting the sequential loss of deviance explained following consecutive removal of the least contributory indicator (Crawley, 2007). These analyses served as both numeric and visual tools for evaluating how conservative the stepwise model selection process was in terms of simplifying the number of indicators in relation to loss of explained variation.
To analyse the integrating perspectives challenge, we assessed how well the final indicator list retained site- and gender-specific indicators identified during the thematic analysis of step one (Silverman, 2011).
To analyse the practical utility challenge, we retrospectively assessed the minimum number of questionnaire replicates needed to reach our statistical conclusions using the 'pwr' power analysis package in R (Cohen, 1988; Champely, 2020). For step 4, we evaluated the sample size needed to detect a correlation between indicators at a correlation coefficient of 0.7. For step 5, we evaluated the sample size required to achieve the same effect size as the GLM in the Tanzania case study (Cohen, 1988; Champely, 2020).
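For readers who want a feel for the step 4 calculation, the Python sketch below uses the textbook Fisher z approximation for the sample size needed to detect a correlation. It is not the routine implemented in the 'pwr' package, so its answer can differ slightly from figures obtained with that package. The significance level (0.05, two-sided) and power (0.8) are assumed defaults, not values stated in the text, and the function name is hypothetical.

```python
from math import atanh, ceil
from statistics import NormalDist


def n_for_correlation(r, alpha=0.05, power=0.8):
    """Approximate sample size to detect correlation r (two-sided test).

    Uses the Fisher z transformation: atanh(r) is approximately normal with
    standard error 1 / sqrt(n - 3). alpha and power are assumed defaults.
    """
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)
    z_power = standard_normal.inv_cdf(power)
    return ceil(((z_alpha + z_power) / atanh(r)) ** 2 + 3)


n_needed = n_for_correlation(0.7)
```

Weaker correlations require larger samples, so lowering the redundancy threshold in step 4 would raise the number of questionnaires needed for pre-testing.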

Results
From step one, 111 candidate wellbeing indicators were identified by focus groups and literature review. These were grouped into 62 questions included in the wellbeing questionnaire (step 2). Removal of indicators with little variation reduced this list to 56 indicators (step 3; Supplementary material S6). Removal of correlated indicators reduced the list further to 30 indicators (step 4; Supplementary material S5). Statistical modelling (step 5) then reduced the list to 17 indicators. The qualitative validation step reintroduced two indicators, resulting in a final list of 19 indicators (Table 1).

Fig. 3. Study region in Tanzania (grey rectangle on small map) detailing study villages (black squares) in relation to major towns (grey circles), major roads (black lines), protected areas (light grey polygons) and elevation (meters above sea level; dotted contour lines).

Complexity-simplicity
The final indicator list (Table 1) explained 91 % of deviance in the human wellbeing index. Stepwise GLM reduction of the indicator list from step four (30 indicators) to step five (17 indicators) led to a reduction in AIC from -397 to -412 (ΔAIC = 15), with only marginal loss of deviance explained (95 % to 91 %). Therefore, the reduced indicator list was more parsimonious. Sequential removal of indicators revealed a pattern of increasing loss of deviance explained per indicator removed (Fig. 4). The loss of deviance explained from 20 to 15 indicators was only 1 %, but a 15 % loss of deviance explained was observed between 10 and 5 indicators.

Integrating perspectives
In both the candidate and final indicator lists, material wellbeing had the largest percentage of indicators (Fig. 5). The final list had a slightly lower percentage of material indicators and a slightly higher percentage of security and freedom indicators (Fig. 5).
The validation checks (step 6) revealed that the social relations domain of wellbeing had been reduced to a single indicator. Therefore, an additional indicator, recognition in the village, was reinserted to improve balance between domains. We identified one disparity between prioritisation of indicators through quantitative analyses versus qualitative assessment of local priorities. Livestock ownership was identified as a local priority indicator. However, this indicator was removed by statistical stepwise selection (step 5). To ensure WISP integrated top-down and bottom-up perspectives, livestock ownership was reinserted in the finalised list (step 6).
Whether candidate indicators were considered universally important across the study area, or site- or gender-specific, depended on the wellbeing domain. Indicators relating to the material domain of wellbeing were strongly corroborated between focus groups. Plant-based agriculture was the dominant livelihood at both sites and so the area of agricultural land owned was identified early on in all focus groups. Other material indicators, such as household building materials and livestock ownership, were also identified in all focus groups. Some gender-specific differences were noted in terms of the social relations domain of wellbeing. All-female focus groups in both villages identified the importance of mutual reliance within communities, defined as the ability to lend resources. However, this informal interdependence was not discussed in all-male focus groups. Instead, discussion highlighted the importance of recognition from peers in the village, indicated by a felt sense that their voice was heard in village meetings. These gender-specific indicators were included in the final indicator list (Table 1).
Finally, some differences were noted between villages in relation to the location and extent of remoteness and self-reliance versus connectedness of villages to urban centres. In Udekwa, the village located further from major transport routes and urban centres, candidate indicators within the health and freedom domains of wellbeing included knowledge of and access to local medicines and producing enough food to eat. However, in Mang'ula B, the village located close to a major road with direct transport links to urban centres, health insurance and access to formal banking facilities were also discussed. Site-specific indicators were retained in the final indicator list (Table 1).

Practical utility
Power analyses showed that a sample size of 13 questionnaire repeats would be needed to evaluate correlations between indicators in step 4 and 33 replicates to provide sufficient power to undertake statistical modelling in step 5.

Discussion

Complexity-simplicity
WISP resulted in relatively little loss of information concerning variation in wellbeing, while substantially reducing the number of indicators. Reintroduction of local priority indicators contributed a small amount of statistical redundancy, exemplified by the minimal reduction in deviance explained when additional indicators were removed beyond the final list (Fig. 4). However, it is vital to characterise wellbeing in accordance with place-based values to avoid unintended harmful consequences of policies for local residents (Sterling et al., 2017). We therefore suggest that WISP remains sufficiently conservative to retain a rich description of wellbeing that balances the trade-off between complex local realities and statistical parsimony.
A comparable alternative social assessment approach is the Basic Necessities Survey (BNS), which in previous studies has identified between 20 and 25 local indicators (Schreckenberg et al., 2010; Davies, 2007). BNS creates an index of poverty, providing a narrower conception of wellbeing focused on the material domain. We therefore conclude that in terms of the complexity-simplicity problem, the protocol performed well in relation to BNS, creating a more concise list of indicators, yet more representative of multidimensional wellbeing.

Integrating perspectives
Mixed methods provide an opportunity to identify and address tensions between qualitative and quantitative methods (Denscombe, 2008). In our study the statistically led simplification step reduced the number of candidate indicators and in the process removed livestock ownership, which had been frequently mentioned in focus group discussions. This highlights a tension between external and ecological validity and emphasises the importance of integrating perspectives in order to navigate this trade-off (Fig. 1). Indeed, we suggest that there is an inherent tension in translating a felt sense of wellbeing into numeric values, with potential to compromise ecological validity through over-reliance on quantitative approaches. However, in our Tanzanian example, the protocol helped to reconcile this tension through step six, which ensured the final indicator list was aligned to local priorities.
Future users might consider adapting the protocol steps and indicator inclusion criteria depending on the intended application. For example, if used to evaluate the impact of a specific intervention, such as a water security program, it might be important to use focus groups to explore locally relevant indicators for the intervention, such as clean water access, and prioritise retention of intervention-specific indicators (Jensen and Wu, 2018). Alternatively, if used to evaluate change in wellbeing through time (Sayer et al., 2007), then researchers might choose to be more conservative in retaining indicators that have little variation at the time of the first survey (step 3), but for which variation is expected to increase, either as a result of increasing inequality (Martin et al., 2014), or as a result of the intervention targeting a subsection of the population. Where there is doubt, we recommend using clear hypotheses to justify additional inclusions and using locally stated priorities and ecological validity as a guiding principle to determine inclusion.
Our observation that the material domain of wellbeing comprised more indicators than other domains concurs with observations from other developing and developed countries. Namely, material wellbeing may be distributed among a number of different sources (DFID, 2000; Goodwin, 2003). In our study, the dominant sources of material wellbeing were financial, land and livestock. This pattern of spreading material wellbeing among a number of capital sources can be interpreted as a strategy for enhancing the resilience of individuals, i.e. the ability to cope with and overcome shocks (Folke et al., 2002; Walker et al., 2006). As such, we suggest that rich descriptions of material wellbeing that include multiple indicators are also applicable for evaluating the related concept of resilience (Cinner et al., 2009; Hoque et al., 2017).
Various cultural, socio-economic and livelihood characteristics of individuals influence which sources of material wellbeing are invested in (Miller and Hajjar, 2019; Sunderlin et al., 2005). For example, pastoralists may invest far more in livestock than individuals whose livelihood depends more on crop-based agriculture and who invest more in land. Therefore, to accurately compare material wellbeing in heterogeneous communities, we suggest that a larger number of indicators may be needed for this domain than others to provide an accurate summative measure that accounts for differential capital investment patterns.
An alternative explanation for the large number of material indicators relates to the methods used in this study. Contrasting qualitative and quantitative methods are better suited to identifying different social phenomena (Braun and Clarke, 2013; Bull et al., 2015). For example, ethnographic approaches are tailored to the identification of in-depth personal narratives and socially constructed themes (Atkinson and Silverman, 1997). In contrast, WISP uses a more rapid approach to identifying candidate wellbeing indicators. As a result, the less tangible aspects of wellbeing, such as social relations, were relatively under-represented among candidate indicators. Instead, objective indicators that were more easily observed and articulated were better represented (Schreckenberg et al., 2010). Future adoption of ethnographic approaches to complement the protocol might facilitate exploration of the less tangible aspects of wellbeing. Another approach to promote broader representation of indicators would be to structure focus group discussion topics around the five wellbeing domains. However, we preferred a more open questioning style to encourage study participants to lead discussions, rather than be confined by wellbeing theory, thereby promoting ecological validity.

Practical utility
The introduction of statistical techniques to the process of selecting wellbeing indicators may pose a technical challenge for researchers with more qualitative backgrounds. However, there is also potential for greater expansion of this element of WISP. Future studies might consider multi-model averaging approaches (Burnham and Anderson, 2002) to determine additional, more subtle contributions to wellbeing from indicators dropped from our models. However, step 4 and Fig. 4 show that any changes to the indicators selected would make only a marginal difference to the variation explained in wellbeing. Therefore, we prefer our simpler and more accessible statistical approach.
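As an illustration of the multi-model averaging idea mentioned above, the following minimal Python sketch computes Akaike weights from a set of candidate models and sums them per indicator to give a relative importance score. It is a sketch under stated assumptions, not the analysis used in this study: the indicator names, the AIC values and the function names `akaike_weights` and `relative_importance` are all hypothetical.

```python
import math


def akaike_weights(aic_values):
    """Convert AIC scores into Akaike weights that sum to 1."""
    best = min(aic_values)
    # Relative likelihood of each model given its AIC difference from the best.
    rel = [math.exp(-0.5 * (aic - best)) for aic in aic_values]
    total = sum(rel)
    return [r / total for r in rel]


def relative_importance(model_indicators, weights):
    """Sum the weights of every model in which each indicator appears."""
    importance = {}
    for indicators, weight in zip(model_indicators, weights):
        for name in indicators:
            importance[name] = importance.get(name, 0.0) + weight
    return importance


# Hypothetical candidate models: (indicators included, AIC score).
candidate_models = [
    ({"livestock", "land"}, 210.4),
    ({"livestock", "land", "savings"}, 211.1),
    ({"livestock"}, 215.9),
]

weights = akaike_weights([aic for _, aic in candidate_models])
importance = relative_importance(
    [indicators for indicators, _ in candidate_models], weights
)

for name, score in sorted(importance.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.3f}")
```

An indicator that is dropped from the single best model can still receive a non-zero importance score if it appears in other well-supported models, which is the kind of subtle contribution that averaging is intended to reveal.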
Decisions regarding appropriate sample sizes of villages, the diversity of actors and questionnaire replicates in future studies will depend in part on a practical trade-off between exhaustive sampling and resource constraints. Here we sampled two villages, though we stress that this figure should be used as a guide only and larger scale studies may require additional sampling. The Poverty Environment Network guidance suggests that questionnaire pre-testing should include seven draft questionnaire trials before commencing the main survey (PEN, 2007). However, in our Tanzanian example we estimated that 33 questionnaire repeats were required for statistical analyses. Therefore, we recommend that a conservative minimum of 40 questionnaires be undertaken in future applications of the protocol to allow for context-specific differences in wellbeing indicators. We suggest that increased pre-testing investment is a necessary consequence of moving away from simpler conceptions of wellbeing or poverty, towards robust measurement of a more complex conception of multidimensional wellbeing. As the number of indicators increases, there will be greater potential for correlation and violations of statistical assumptions. Consequently, the introduced orthogonality checks are necessary to promote external validity and robust analysis. An additional benefit of investing in questionnaire simplification at the beginning of a wellbeing assessment is that this shortens the questionnaire; in our case study, to less than a third of its original length. This saves time and resources and reduces respondent fatigue during survey implementation (Trochim, 2006).
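To make the concern about correlated indicators concrete, the sketch below screens pre-test responses for highly correlated indicator pairs using Pearson's correlation coefficient. It is a simplified stand-in for illustration only and is not the orthogonality check used in the protocol: the indicator names, the response values, the function name `correlated_pairs` and the 0.7 threshold are all assumptions.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def correlated_pairs(responses, threshold=0.7):
    """Return indicator pairs whose absolute correlation meets the threshold.

    The threshold of 0.7 is an assumed rule of thumb, not a value from the study.
    """
    flagged = []
    for a, b in combinations(sorted(responses), 2):
        r = pearson(responses[a], responses[b])
        if abs(r) >= threshold:
            flagged.append((a, b, round(r, 3)))
    return flagged


# Hypothetical pre-test responses: one value per respondent for each indicator.
responses = {
    "cattle_owned": [0, 2, 5, 1, 8, 3],
    "goats_owned": [1, 4, 9, 2, 15, 6],
    "years_schooling": [7, 4, 7, 2, 5, 9],
}

flagged = correlated_pairs(responses)
for a, b, r in flagged:
    print(f"{a} and {b} are strongly correlated (r = {r})")
```

Pairs flagged in this way are candidates for merging or for dropping one member, which is one route by which a questionnaire can be shortened without losing much measured variation.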

Conclusions
We have demonstrated that WISP makes progress in addressing three interrelated challenges to measuring wellbeing in different local contexts. We therefore recommend the protocol as practical and statistically validated step-by-step guidance to support the design of multidimensional wellbeing assessments, maintaining coherence between universal theory and local realities. In this way, the protocol contributes to a research agenda seeking to support policy makers in advancing a holistic notion of social progress. Future contributions to this field might explore how to integrate local and national scale wellbeing assessments. They might also explore how to integrate local perspectives with those of actors operating at larger scales, such as national policy makers, in order to advance transparent and equitable policy decision-making.

Declaration of Competing Interest
The authors report no declarations of interest.