Dataset on cigarette smokers in six South African townships

A total of 2453 smokers were interviewed in townships over two rounds of data collection. Townships are low-income, urban areas characterised by overpopulation, poor service delivery, crime, and poor socioeconomic outcomes. Township residents typically live in poverty. Data were collected from six townships in four of South Africa's nine provinces, namely Gauteng (Eldorado Park and Ivory Park), Western Cape (Khayelitsha and Mitchell's Plain), Free State (Thabong) and KwaZulu-Natal (Umlazi). These townships were chosen to represent both the geographical and racial spread of low socioeconomic areas in South Africa. Round 1 data (n = 1260) were collected from October to November 2017, and round 2 data (n = 1193) were collected from July to August 2018. The sample includes two of South Africa's four population groups: African and mixed race (locally referred to as “Coloured”, which describes people of mixed Khoisan, Malay, European, and black African ancestry). Since few Whites and Asians live in townships, they were not sampled. Households were selected via a random walk through each township. One smoker per household was interviewed (if a household contained at least one available smoker). We aimed to interview 200 adult smokers (aged 18+ years) per township per round. If a household had more than one smoker, a random selection determined which smoker to interview. Respondents were asked about their most recent cigarette purchase, specifically packaging type (single stick, pack, or carton), number of items purchased, brand, type of outlet where the cigarettes were bought, and the total amount paid for cigarettes. Respondents were also asked about other tobacco use in the household, and about their perceptions regarding illegal cigarettes. Socioeconomic and demographic information was collected at the individual and household level. 
The data has been used to estimate illicit trade (https://tobaccocontrol.bmj.com/content/early/2020/03/10/tobaccocontrol-2019-055136.info), and to analyse the determinants of smoking intensity (https://www.sciencedirect.com/science/article/pii/S2211335520300590).

© 2020 The Author(s). Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

Specifications Table
Subject: Public Health and Health Policy
Specific subject area: Tobacco control
Type of data: Excel file
How data were acquired: Questionnaires completed by interviewers using electronic devices
Data format: Raw
Parameters for data collection: Data were collected from low socio-economic areas in South Africa. The sample consisted of households with at least one cigarette smoker. If a selected household had more than one smoker, a random selection determined which smoker to interview.
Description of data collection: Random walk.
Data source location: The data were collected from six townships in four of South Africa's nine provinces: Gauteng (Eldorado Park and Ivory Park), Western Cape (Khayelitsha and Mitchell's Plain), Free State (Thabong) and KwaZulu-Natal (Umlazi). These townships were chosen to represent both the geographical and racial spread of low socioeconomic areas in South Africa.

Data accessibility
The data is available on a public repository.

Value of the Data
• This dataset provides detailed information about tobacco use in South African townships. Smokers were asked about their smoking behaviour, including initiation age, number of cigarettes smoked per day, purchasing behaviour, brand choice, where they bought cigarettes, and quit attempts. Smokers were also asked about their use of other tobacco products and their perceptions regarding illicit cigarettes. The dataset also provides detailed information on township smokers' demographic and socio-economic characteristics (including gender, population group, age, and education).
• Researchers and policymakers who are interested in tobacco use in townships will benefit from this data.
• Researchers can investigate popular brands by township and province, as well as price differentials across brands/producers. Researchers can also compare prices charged by various types of retailers (foreign-owned spaza shops, South African-owned spaza shops, large retail stores, street stands, vending machines, house shops, the internet, and family/friends).

Data Description
The data consists of two repeated cross sections. The first round of data (n = 1260) was collected from October to November 2017. A similar-sized sample (n = 1193) was collected the following year, from July to August 2018. Approximately 200 smokers per township were interviewed in each round (Table 1). In round 2, the goal of 200 smokers per township was not achieved in Eldorado Park (Gauteng) because of safety concerns. To compensate for the smaller number in Eldorado Park (n = 133), more smokers were interviewed in Ivory Park (n = 263), which is also in Gauteng. Table 1 below provides a description of the raw data for each round and overall.
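As a quick consistency check on the figures quoted above, the following Python sketch adds up the two round totals and compares the round 2 Gauteng interviews with the target of 200 per township. It uses only numbers stated in the text; the constant names are illustrative and do not come from the dataset.

```python
# Figures quoted in the text. Counts for the other townships are not
# reproduced here, so only the totals and the round 2 Gauteng split are checked.
ROUND_TOTALS = {1: 1260, 2: 1193}
TARGET_PER_TOWNSHIP = 200
ROUND2_GAUTENG = {"Eldorado Park": 133, "Ivory Park": 263}

# Overall sample size across both rounds.
total_sample = sum(ROUND_TOTALS.values())

# Round 2 interviews in the two Gauteng townships versus the combined target.
gauteng_round2 = sum(ROUND2_GAUTENG.values())
gauteng_target = TARGET_PER_TOWNSHIP * len(ROUND2_GAUTENG)
shortfall = gauteng_target - gauteng_round2

print(total_sample)    # 2453
print(gauteng_round2)  # 396
print(shortfall)       # 4
```

The extra interviews in Ivory Park therefore bring the round 2 Gauteng total to within four interviews of the combined target of 400.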

Background and other similar surveys
Dedicated surveys of smoking behaviour have been conducted in many countries to monitor the tobacco epidemic. The Global Adult Tobacco Survey (GATS) is a nationally representative survey that has been conducted in more than 25 low- and middle-income countries. GATS enables countries to monitor adult tobacco use and assess key tobacco control measures, and is comparable across countries. [1] The Global Youth Tobacco Survey (GYTS), launched in 1999, is a school-based survey that monitors tobacco use amongst students aged 13 to 15. The GYTS was conducted in South Africa in 1999, 2002, and 2011. [2,3] The International Tobacco Control Evaluation Project (ITC) is a system for evaluating the impact of tobacco control measures, particularly national policies of the World Health Organization's Framework Convention on Tobacco Control. ITC has been conducted in 29 countries and is also designed to be comparable across countries. [4] To date, neither GATS nor ITC has been conducted in South Africa.
Besides tobacco-specific surveys, national South African household surveys provide insight into tobacco use. South Africa has several nationally representative datasets that include questions on smoking: the National Income Dynamics Study, the Demographic and Health Survey, and the South African Health and Nutrition Examination Survey. [5-7] However, these surveys cover a range of topics; the focus is not on tobacco use and therefore questions on tobacco use are few. Although these surveys are nationally representative, they cannot be used to analyse tobacco use at a smaller geographical level, such as townships. For this reason, it was necessary to run a dedicated survey.

Survey design, sample selection, and data collection
South Africa has nine provinces. Data were collected from four provinces (Gauteng, Western Cape, KwaZulu-Natal, and the Free State). Since the Western Cape and Gauteng are highly populated, we selected two townships (one predominantly mixed race, the other predominantly African) in each of these provinces.
Interviewers walked through the selected townships and approached households. Questionnaires were completed using survey software (SurveyCTO) on a handheld device. At each intersection, enumerators entered the number of directions it was possible to take, so that the system could randomly select the direction to take, the side of the road to walk along, and the house to approach. Counting started from the enumerator's left side. Because of safety concerns, some round 1 fieldworkers deviated from this approach and went to areas that were perceived as safer.
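The random choice made at each intersection can be illustrated with a short Python sketch. This is not the SurveyCTO form logic used in the field. It assumes that each option is drawn uniformly at random, and the `n_houses` parameter (the number of candidate houses along the chosen side) is hypothetical, because the text does not describe how the house was picked.

```python
import random
from typing import NamedTuple


class WalkStep(NamedTuple):
    direction: int  # 1-based, counted from the enumerator's left
    side: str       # side of the road to walk along
    house: int      # 1-based position of the house to approach


def choose_next_step(n_directions: int, n_houses: int,
                     rng: random.Random) -> WalkStep:
    """Draw one walk step uniformly at random (an assumed rule)."""
    if n_directions < 1 or n_houses < 1:
        raise ValueError("need at least one direction and one house")
    direction = rng.randint(1, n_directions)  # inclusive on both ends
    side = rng.choice(("left", "right"))
    house = rng.randint(1, n_houses)
    return WalkStep(direction, side, house)


# Example: an intersection with three possible directions.
rng = random.Random(2017)  # fixed seed so the example is repeatable
step = choose_next_step(n_directions=3, n_houses=10, rng=rng)
print(step)
```

Letting the device make the draw, rather than the enumerator, is what keeps the walk random in principle. The deviations reported for round 1 are cases where this rule was overridden.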
Fieldworkers compiled a roster of all adults (18+ years old) living in the household. The device randomly selected one smoker in the household to participate in the survey. Respondents had to be 18 years or older to be selected. If the selected smoker was unavailable to participate at the time of the visit, a second smoker was randomly selected.
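The within-household selection described above can be sketched in Python as follows. The roster fields (`name`, `age`, `smokes`, `available`) and the `max_draws` parameter are hypothetical names chosen for illustration. The default of two draws mirrors the text, which mentions only a second random selection and does not say what happened if that person was also unavailable.

```python
import random
from typing import Optional

ADULT_AGE = 18


def select_respondent(roster: list[dict], rng: random.Random,
                      max_draws: int = 2) -> Optional[dict]:
    """Randomly pick one available adult smoker from a household roster."""
    eligible = [p for p in roster if p["age"] >= ADULT_AGE and p["smokes"]]
    # Draw without replacement, so the second draw is a different smoker.
    draws = rng.sample(eligible, k=min(max_draws, len(eligible)))
    for person in draws:
        if person["available"]:
            return person
    return None  # no interview in this household


# Illustrative household, not taken from the dataset.
roster = [
    {"name": "A", "age": 16, "smokes": True, "available": True},
    {"name": "B", "age": 40, "smokes": False, "available": True},
    {"name": "C", "age": 35, "smokes": True, "available": False},
    {"name": "D", "age": 52, "smokes": True, "available": True},
]
chosen = select_respondent(roster, random.Random(1))
print(chosen["name"])  # D
```

In this example only C and D are eligible. Whichever of them is drawn first, D is interviewed, because C is unavailable.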
This random walk methodology, while not rigorous, is the most affordable sampling method.

Variable: Description
Respondent uses cigarettes daily or less than daily (round 1 only)
use_cigarettes_r2: Respondent uses cigarettes (round 2 only)
use_rolled_r2: Respondent uses roll your own (round 2 only)
use_cigars_r2: Respondent uses cigars (round 2 only)
use_pipes_r2: Respondent uses pipe tobacco (round 2 only)
use_snuff_r2: Respondent uses snuff (round 2 only)
use_chew_r2: Respondent uses chewing tobacco (round 2 only)
use_ecigarettes_r2: Respondent uses e-cigarettes (round 2 only)
use_water_r2: Respondent uses waterpipe (round 2 only)
freq_daily_cigarettes: Daily cigarette consumption (daily smokers only)
freq_daily_rolled: Daily use of roll your own by respondent
freq_daily_cigars: Daily use of cigars by respondent
freq_daily_pipes: Daily use of pipe tobacco by respondent
freq_daily_snuff: Daily use of snuff by respondent
freq_daily_chew: Daily use of chewing tobacco by respondent
freq_daily_ecigarettes: Daily use of e-cigarettes by respondent
freq_daily_water: Daily use of waterpipe by respondent
freq_weekly_cigarettes: Weekly cigarette consumption (less than daily smokers only)
freq_weekly_rolled: Weekly use of roll your own by respondent
freq_weekly_cigars: Weekly use of cigars by respondent
freq_weekly_pipes: Weekly use of pipe tobacco by respondent
freq_weekly_snuff: Weekly use of snuff by respondent
freq_weekly_chew: Weekly use of chewing tobacco by respondent
freq_weekly_ecigarettes: Weekly use of e-cigarettes by respondent
freq_weekly_water: Weekly use of waterpipe by respondent
start_age_r2: Age started smoking (round 2 only)
start_age_daily: Age started smoking daily
start_years_r2: How many years ago did you first start smoking cigarettes?
start_years_daily: How many years ago did you first start smoking cigarettes daily?
start_age_lessdaily_r1: Less than daily smokers only: Age started smoking (round 1 only)
start_years_lessdaily_r1: Less than daily smokers only: how many years ago start smoking (round 1 only)
cigcons_wkly_pd: Weekly cig consumption converted to daily (derived)
cigcons: Daily cig consumption, including daily and weekly smokers (derived)
At the beginning of this year (Jan), did you smoke cigarettes? (round 2 only)
compare_brands_r2: In January, what brand of cigarettes did you smoke most often? (round 2 only)
compare_freq_r2: In January, how many cigarettes PER DAY did you smoke on average? (round 2 only)
compare_purchase_r2: In Jan., did you usually buy cartons, packs or loose cigarettes? (round 2 only)
compare_change_r2: Changed your cigarette smoking behaviour between Jan. & now? (round 2 only)
compare_reason_r2: Reason for change in smoking behaviour between January and now (round 2 only)
compare_reason_1_r2: I am trying to reduce the health impact of smoking (round 2 only)
compare_reason_2_r2: Cigarettes are becoming more expensive (round 2 only)
compare_reason_3_r2: The quality of cigarettes decreased (round 2 only)
compare_reason_4_r2: The quality of cigarettes increased (round 2 only)
compare_reason_5_r2: I can afford more cigarettes (round 2 only)
compare_reason_6_r2: I enjoy smoking more cigarettes (round 2 only)
compare_reason_7_r2: Pressure from family or friends (round 2 only)
compare_reason_6666_r2: Reason for change: Other (round 2 only)
compare_reason_7777_r2: Reason for change: None (round 2 only)
compare_reason_9999_r2: Reason for change: Don't know (round 2 only)
compare_reason_8888_r2: Reason for change: Refuse (round 2 only)
compare_reasonother_r2: Reas. for change in smoking behaviour betw. January-now, other (round 2 only)
Note: Spaza shops are informal convenience shops.
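The two derived consumption variables listed above (cigcons_wkly_pd and cigcons) could be reconstructed along the following lines. This is a sketch under stated assumptions: the weekly figure is assumed to be converted to a daily figure by dividing by 7, and questions that were not asked of a respondent are assumed to be missing (`None`). The article does not state the exact rule, and the helper function names are illustrative.

```python
from typing import Optional


def weekly_to_daily(freq_weekly_cigarettes: Optional[float]) -> Optional[float]:
    """Convert weekly consumption to a daily average (assumes weekly / 7)."""
    if freq_weekly_cigarettes is None:
        return None
    return freq_weekly_cigarettes / 7


def daily_consumption(freq_daily_cigarettes: Optional[float],
                      freq_weekly_cigarettes: Optional[float]) -> Optional[float]:
    """Daily consumption for daily smokers and less-than-daily smokers alike."""
    if freq_daily_cigarettes is not None:
        return freq_daily_cigarettes
    return weekly_to_daily(freq_weekly_cigarettes)


# Two illustrative respondents: a daily smoker and a less-than-daily smoker.
records = [
    {"freq_daily_cigarettes": 10, "freq_weekly_cigarettes": None},
    {"freq_daily_cigarettes": None, "freq_weekly_cigarettes": 14},
]
for record in records:
    record["cigcons_wkly_pd"] = weekly_to_daily(record["freq_weekly_cigarettes"])
    record["cigcons"] = daily_consumption(record["freq_daily_cigarettes"],
                                          record["freq_weekly_cigarettes"])

print(records[0]["cigcons"])  # 10
print(records[1]["cigcons"])  # 2.0
```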

Variable definition
There are 157 variables in the final dataset (Table 2). The variables for the two rounds are standardised, and a variable labelled "round" indicates when the data were collected (round 1 or round 2). Observations from the two rounds are appended to produce the final dataset. Three variables appear only in round 1 and 35 variables appear only in round 2 (these variables are suffixed with "_r1" and "_r2", respectively). These differences across the rounds are due to minor edits to the questionnaire.
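The appending step can be illustrated with a minimal Python sketch. The variable names "round", cigcons, start_age_lessdaily_r1 and start_age_r2 are taken from the dataset, but the values below are invented for illustration. Filling round-specific variables with `None` for the other round is an assumption about how missing values are coded, not a description of the published file.

```python
def append_rounds(round1_rows: list[dict], round2_rows: list[dict]) -> list[dict]:
    """Stack two rounds into one list of records and tag each with its round."""
    # Collect every variable name seen in either round, in first-seen order.
    columns: list[str] = []
    for row in round1_rows + round2_rows:
        for name in row:
            if name not in columns:
                columns.append(name)

    combined = []
    for round_number, rows in ((1, round1_rows), (2, round2_rows)):
        for row in rows:
            record = {"round": round_number}
            for name in columns:
                # Variables not asked in this round are left missing.
                record[name] = row.get(name)
            combined.append(record)
    return combined


# Illustrative records with one shared and one round-specific variable each.
round1 = [{"cigcons": 10, "start_age_lessdaily_r1": 19}]
round2 = [{"cigcons": 6, "start_age_r2": 17}]

combined = append_rounds(round1, round2)
for record in combined:
    print(record)
```

When working with the real file, filtering on "round" before using any "_r1" or "_r2" variable avoids mistaking a question that was not asked for a non-response.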

Ethics statement
The University of Cape Town's Research in Ethics Committee (Faculty of Commerce) approved this research (REC2017/10/011). Informed consent was obtained from all participants.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships which have, or could be perceived to have, influenced the work reported in this article.