Perceived value interviews and socio-economic survey data for communities in rural Uganda

This article describes a dataset of perceived values and socioeconomic indicators collected in rural Ugandan communities. The data were collected through interviews that employed: (1) the User-Perceived Value game, which solicits verbal data using graphical prompts and ‘why’-probing; and (2) socio-economic surveys, which collected demographic data. The dataset comprises 119 interviews conducted between 2014 and 2015 in seven rural Ugandan villages. Interviews were conducted in various settings (e.g. individual/group, women/men/mixed) and in seven different local languages (which were subsequently translated into English). These interviews were part of a research project aiming to better understand what is important to rural communities in Uganda, and to investigate decision-making as a function of different demographics. This dataset can be used by researchers and practitioners in various fields such as sustainable development (e.g. to analyze how development initiatives may be designed to match community values) and natural language processing (e.g. to automatically perform perceived value classification from the expert-annotated interviews).



Value of the Data
• These data can be used to better understand what is important to rural communities in Uganda and identify potential factors for decision-making amongst varying demographics (e.g. gender, age, income levels).
• These data can be useful to practitioners working on projects with rural and remote communities in Uganda (e.g. project developers, development workers), as well as to the academic and scientific communities.
• Furthermore, these data can be used to analyze the relationship between the perceived values and a number of other dimensions, including education, gender, interview setting, and many others, as done by Hirmer et al. [2].
• Finally, the data can be used to train classifiers, such as deep learning-based models, to automatically perform the task of perceived value classification from the expert-annotated interviews, in a similar way as done by Conforti et al. [3].

Data Description
The dataset described in this article was collected in seven communities across Northern, Eastern, South-western, and Western Uganda between 2014 and 2015. It comprises information about household characteristics (including education and employment; living situation; infrastructure; household shocks, borrowing and household debt; subjective wellbeing and social attitudes; and religion and exposure), and personal values. The information on household characteristics was obtained from socio-economic surveys, and the data on values by interviewing community members using the UPV game in different settings. A codebook describing value labels for variables, and the questionnaires that were used for data collection, are also available to accompany this dataset. However, identifying variables such as names, GPS coordinates, and village names are anonymised. The dataset is composed of two main parts: the qualitative UPV interviews and the socioeconomic surveys. In the sections below, first we describe how the dataset was built, and then we describe each of these two main parts in detail.

Dataset Building
The dataset was collected in three main stages, summarised in Table 1.

Table 1
Setting specification at different stages of the dataset construction.

Data Collection
Timeline: Between 2014 and 2015
Location: 7 rural villages in different areas of Uganda
Collectors: Ugandan citizens fluent in the different local languages, who participated in a training workshop
Methodology: The interviews were conducted in locations familiar to the interviewees, mostly in open air. The interviews were conducted both individually and in groups of six following standard focus group methods. To avoid direct inquiry, the interviews were conducted by means of the UPV game, which is described in detail in [1], resulting in semi-structured interactions. Each interviewee also completed a survey.

Data Translation
Timeline: 2015
Location: Various villages in rural Uganda
Translators: Ugandan citizens from the same areas where the interviews were collected. All translators spoke Uglish (the Ugandan variety of English) fluently as a second language.

Data Annotation
Timeline: 2019
Location: United Kingdom
Annotators: A 30-year-old Bavarian woman, a native German speaker fluent in English, who had worked and lived in rural Uganda prior to that work, and a 3rd year Ph.D. student in Natural Language Processing.
Methodology: Each utterance was separately annotated. In case of disagreement, the final labels were decided through discussion.

Data format
The survey data is provided as a comma-separated values file (CSV), which reports the speakers' answers to a set of 49 questions. In the file, each column represents a question, and each row represents a single speaker's answers. The questions are recorded in the CSV header (i.e. the first row).
To ease analysis, a mapping between the questions in the CSV and all possible answers is provided as a single file in JavaScript Object Notation (JSON) format. The JSON file contains a dictionary of dictionaries, in which each dictionary represents a question and its possible values. This is illustrated by the example below, which reports the JSON entry for the question on 'Education Status'.
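As a rough illustration of how the CSV and its JSON mapping might be used together, the Python sketch below replaces coded answers with their labels. It is not the dataset's actual 'Education Status' entry: the inline file contents, the `speaker_id` column, the answer codes, the placeholder labels and the helper name `decode_surveys` are all assumptions made for this example, and the sketch further assumes that the mapping has the shape question -> {code: label}.

```python
import csv
import io
import json

# Hypothetical stand-ins for the survey CSV and its JSON answer mapping.
# The column names, codes and labels are invented for illustration only.
SURVEY_CSV = """speaker_id,Education Status
1,0
2,2
3,1
"""

ANSWER_MAP_JSON = """
{
  "Education Status": {
    "0": "example label A",
    "1": "example label B",
    "2": "example label C"
  }
}
"""


def decode_surveys(csv_text, mapping_text):
    """Replace coded answers with labels wherever the mapping covers them."""
    mapping = json.loads(mapping_text)
    decoded = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        decoded_row = {}
        for question, answer in row.items():
            options = mapping.get(question, {})
            # Fall back to the raw value for unmapped questions or answers.
            decoded_row[question] = options.get(answer, answer)
        decoded.append(decoded_row)
    return decoded


rows = decode_surveys(SURVEY_CSV, ANSWER_MAP_JSON)
```

With the real files, the two inline strings would be replaced by the contents of the released CSV and JSON; questions without an entry in the mapping are passed through unchanged.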

Data Description
The interviews and surveys were collected orally in the native language of the speakers. A complete list of survey questions is provided in Table 2 . Speaker demographics are reported in Table 3 . The data comes in an anonymized, disaggregated form.

Data format
The perceived values dataset drawn from UPV game interviews is provided as a single file in JavaScript Object Notation (JSON) format. The file contains a list of 2,341 entries. Each entry contains the interview excerpt discussing one specific item, as part of the UPV game described in Section 2, and is encoded as a dictionary with the following specifications (Table 4).
The following entry exemplifies this data format:
{
  "item": "computer",
  "item_order": 9,
  "speaker_id": 71,
  "village_id": "South-Western Uganda",
  "interview_setting": "single woman",
  "utterances": [
    "And it again helps in making work easy for example me; I take a knife to be a computer anywhere and anything that makes work easy I refer it as a computer.",
    "If I have a computer haaa I would be making money every minute because everyone here in <PLACE> wants to type work or print has to go <PLACE> for that computer work and if I had it I would do it from her not even putting transport and time.

Data Description
The number of utterances is fairly balanced between men and women and across the seven villages. The distribution of utterances per interview setting and per village in the dataset is reported in Fig. 1a and b. Fig. 2 reports the distribution of speakers over some of the considered variables (source of employment, and distance to a source of water).
While English is the national language in Uganda, the written English of translators was poor at times. As a consequence, the dataset contains some grammatical errors. Given that the interviews were manually transcribed, some utterances contain typos and spelling errors.
To protect the participants' identity, the exact names of the villages are not released. All proper nouns (people, tribes, locations, ...) in the utterances are anonymized with special tags (such as <PERSON> or <LOCATION>). On average, utterances are 18.5 words long (Fig. 3a), without any major difference between interview setting types (Fig. 3b).
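The descriptive statistics mentioned in this section (utterances per interview setting and per village, and average utterance length) could be recomputed along the following lines. This is a minimal Python sketch under stated assumptions: the two entries are invented, the setting value "group men" is hypothetical, and counting whitespace-separated tokens is only one possible definition of utterance length, which may differ from the one used to obtain the reported figure.

```python
from collections import Counter

# Two hypothetical entries that reuse the field names of the example entry;
# the values are invented and are not taken from the dataset.
entries = [
    {
        "item": "computer",
        "interview_setting": "single woman",
        "village_id": "South-Western Uganda",
        "utterances": ["It makes work easy.", "I would earn money with it."],
    },
    {
        "item": "radio",
        "interview_setting": "group men",
        "village_id": "Northern Uganda",
        "utterances": ["We hear the news from <PLACE> on it."],
    },
]


def utterances_per(entries, field):
    """Count utterances grouped by a metadata field such as 'interview_setting'."""
    counts = Counter()
    for entry in entries:
        counts[entry[field]] += len(entry["utterances"])
    return counts


def mean_utterance_length(entries):
    """Average number of whitespace-separated tokens per utterance."""
    lengths = [len(u.split()) for e in entries for u in e["utterances"]]
    return sum(lengths) / len(lengths) if lengths else 0.0


per_setting = utterances_per(entries, "interview_setting")
per_village = utterances_per(entries, "village_id")
avg_len = mean_utterance_length(entries)
```

In this sketch an anonymization tag counts as a single token; whether tags should be included in length statistics is a choice left to the analyst.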

Sampling
In this section, the specific sampling considerations are described. A number of samples from specific populations were taken at different stages of the research for case selection, focus group members, and interviewees.

Case selection
Case villages were selected with the aim of providing a representative sample of the rural Ugandan population that could be interviewed. As these data were originally collected as part of a research project focusing on off-grid energy access, selected case villages had to have benefited from off-grid energy access initiatives. Seven case study villages located in four regions of Uganda were selected. The regions were: Northern, Eastern, South Western, and Western Uganda.

Participant sampling
In each of the seven villages, 17 interviews were conducted with 12 participants (i.e. a total of 119 interviews and 84 participants). The sampling criteria listed in Table 5 were applied for both individual and focus group interviews. Note that the same people participating in the UPV game were also asked to take part in the socio-economic survey.
Five focus group meetings were held in each of the seven case villages with six participants in each group, giving a total of 35 focus groups. Each village focus group was initially separated into male and female participants to facilitate equal gender participation (as female participants in the rural Ugandan village setting may be less likely to voice opinions in the presence of male participants [4]). Members were selected to represent the diverse interests of the village, e.g. large family, farmer or female head of household [5]. The focus groups followed the standard focus group method of a group interview consisting of six participants [5][6][7].
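The sampling figures above can be cross-checked with a few lines of arithmetic. The Python sketch below uses only the numbers stated in the text; the variable names are illustrative, and the reading that non-focus-group interviews are individual interviews is an inference from the per-village totals.

```python
# Per-village design figures as stated in the text.
VILLAGES = 7
INTERVIEWS_PER_VILLAGE = 17
PARTICIPANTS_PER_VILLAGE = 12
FOCUS_GROUPS_PER_VILLAGE = 5
FOCUS_GROUP_SIZE = 6

total_interviews = VILLAGES * INTERVIEWS_PER_VILLAGE        # 119
total_participants = VILLAGES * PARTICIPANTS_PER_VILLAGE    # 84
total_focus_groups = VILLAGES * FOCUS_GROUPS_PER_VILLAGE    # 35

# Assumption: interviews that are not focus groups are individual interviews.
individual_per_village = INTERVIEWS_PER_VILLAGE - FOCUS_GROUPS_PER_VILLAGE

# Seats across all focus groups in one village. This exceeds the number of
# participants, which suggests that participants sat in more than one group.
group_seats_per_village = FOCUS_GROUPS_PER_VILLAGE * FOCUS_GROUP_SIZE
```

The derived value of 12 individual interviews per village matches the number of participants per village, which is consistent with each participant being interviewed individually once.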

Setup
Translators were employed to undertake the data collection with rural villagers in Uganda. This was necessary as in the four different regions containing the case villages, seven different languages were spoken. To ensure consistency in the collected data, a two-day training session was held with the translators prior to the fieldwork. During field interviews, each translator took notes and audio recordings were also made. To further ensure consistency across all villages, a local research assistant was hired to accompany and oversee all fieldwork.
Each participant (i.e. interviewee) received a small financial compensation of UGX 5,000 (£1.25). This roughly corresponds to the daily salary in case villages at the time of the interviews.

Narrative UPV game
The UPV game can help to better understand what users of development initiatives find to be of value. It requires participants to select items based on which are most important to them. Participants also have the option to identify new items or matters of value to them. This is followed by inquiry as to why the selected items were important. This approach is based on methods commonly used in market research and product design.
This method is designed to bypass interviewee predispositions and preconceptions by redirecting focus to the game itself. This focus helps prevent interviewees from trying to assume or second-guess responses. It encourages semi-structured storytelling and discussion of topics which resonated with each participant's experience.
The game seeks to identify user-perceived values (UPVs) which illuminate the underlying reasoning as to why selected items are important to beneficiaries. Such values are insufficiently captured by other needs assessment tools.
To gather the data described in this paper, the UPV game was played with participants in each of the seven villages. Participants were equally split between men and women and came from a variety of backgrounds and ages. The interviews were conducted in a variety of settings to obtain results representative of the community given the complex nature of influences on decision-making. The values, opinions, and preferences people express may vary in different group settings [8]. In light of this, there were 12 individual interviews and five group interviews per village. Group interviews included six participants each, and were held for:
• Men;
• Women;
• Mixed-gender (with the three most active participants from the previous discussions);
• Men discussing solutions proposed by women;
• Women discussing solutions proposed by men.
For each round of interviews (except when men or women were discussing the women's or men's choices), participants were asked to (individually or as a group):
• Select 20 out of the 46 presented items based on what is important to them. Items included everyday products or services found in rural Uganda, such as livestock (e.g. cow, chicken), basic electronic gadgets (e.g. mobile phone, television, radio), household goods (e.g. dishes, soap, blanket), and horticultural items (e.g. plough, hoe). Participants could also name additional items they perceived as important. Items were depicted graphically to account for the low level of literacy across developing countries [9] like Uganda, where 43% are illiterate and rural areas are the worst affected [10].
• Rank their selection in order of importance.
• Give reasons as to why these items are most important to them personally. At this stage, participants were encouraged to give reasons ("why is this important to you?") that reflected their personal lives. This method is called "why-probing". Answers were in the form of storytelling.

Narrative socio-economic survey
To contextualise the data gathered from the UPV game, a socio-economic survey was also conducted. This data can be used to understand the participants' stories and value perceptions within the broader context of their lives, demographics, and status.
The socio-economic survey was conducted in a semi-structured way to allow for flexible dialogue rather than rigid questioning [5]. As with the UPV game, the interviews were conducted in local languages with the help of translators.
The socio-economic survey was conducted during the same visit as the UPV game with the same 12 participants in each village. It collected data within the following categories: general information; education and employment; living situation; infrastructure; household shocks, borrowing and household debt; subjective wellbeing and social attitudes; and religion and exposure (see Table 2). In addition to providing background information, the survey gathered information on life stage, external influences, and social, economic and cultural predispositions and perceptions.

Survey questions (excerpt from Table 2):
Highest grade completed in general education (# of years completed)
7 Does/Did attend any technical/vocational school or college, either privately run or publicly run?
8 What is the employment status of the participant?
9 What are the sources of income of your household? (In cash and in kind; can write multiple & role HH member i.e. son, husband/wife)

Table 3
Demographic characteristics of the speakers.

Ethics Statement
To ensure the study's integrity, a risk and ethics assessment was completed under the Cambridge School of Technology Research Ethics Committee at the University of Cambridge, in accordance with the procedures laid down by the University for ethical approval of all research involving human participants. The assessment was approved with Reference: R68195/RE001. To protect the participants' identity, all names were removed.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships which have or could be perceived to have influenced the work reported in this article.