Impact of 10-valent Pneumococcal Conjugate Vaccine (PCV10) on nasopharyngeal carriage in children 2 years of age: Data from a four-year time series cross-sectional study from Pakistan

The dataset described in this paper was collected for a time-series cross-sectional study exploring the impact of 10-valent Pneumococcal Conjugate Vaccine (PCV10) on nasopharyngeal (NP) carriage in children under 2 years of age from a rural population in Sindh, Pakistan. The study was carried out in two union councils of Matiari - Khyber and Shah Alam Shah Jee Wasi (Latitude 25.680298 / Longitude 68.502711). Data was collected on socio-demographics, clinical characteristics and vaccination status using android phone-based application. NP samples were collected using standard World Health Organisation (WHO) techniques, culture and serotyping was done using sequential Multiplex PCR described by Centre for Disease Control, USA. We looked at the carriage rate of vaccine type (VT) and non-vaccine type (NVT) serotypes over time in vaccinated and unvaccinated children. We additionally looked at the predictors for pneumococcal carriage. The uploaded dataset, available on Mendeley data repository (Nisar, Muhammad Imran (2021), “Impact of PCV10 on nasopharyngeal carriage in children in Pakistan”, Mendeley Data, V1, doi:10.17632/t79h6g97gr.1), has 3140 observations in CSV format. Additional files uploaded include a data dictionary and the set of questionnaires. The dataset and accompanying files can be used by other interested researchers to replicate our analysis, carry similar analysis under varying set of assumptions or perform additional exploratory or metanalysis

of vaccine type (VT) and non-vaccine type (NVT) serotypes over time in vaccinated and unvaccinated children. We additionally looked at the predictors for pneumococcal carriage. The uploaded dataset, available on Mendeley data repository (Nisar, Muhammad Imran (2021), "Impact of PCV10 on nasopharyngeal carriage in children in Pakistan", Mendeley Data, V1, doi: 10.17632/t79h6g97gr.1 ), has 3140 observations in CSV format. Additional files uploaded include a data dictionary and the set of questionnaires. The dataset and accompanying files can be used by other interested researchers to replicate our analysis, carry similar analysis under varying set of assumptions or perform additional exploratory or metanalysis © 2021 The Author(s

Value of the Data
• This is a large data set (n = 3140) of children under 2 years age from a low income, rural south Asian setting from where data on pneumococcal epidemiology is limited. • The dataset and accompanying files can be used by other researchers interested in the field of pneumococcal epidemiology, public health scientists and mathematical modellers. • Data can be used to replicate the current analysis, perform same analysis under varying set of assumptions, carry additional exploratory analysis, mathematical modelling and individual level meta-analysis.

Data Description
Uploaded dataset includes observation on 3140 children under 2 years of age enrolled from October 2014 to September 2018. There are four files which are uploaded.
1. "Raw_dataset.csv 2. "Codebook.xls" 3. "DIB_questionaire.docx" 4. "Lab_questionnaire.docx" The first two files are to be read in conjunction. There are a total of 92 fields in the "Raw_dataset.csv". The field (variable) names, variable label, variable type and codes are provided in in "Codebook.xls". "DIB_questionaire.docx" and "Lab_questionnaire.docx" are the two questionnaires that were used to collect this data. Section A in "DIB_questionaire.docx" documents the eligibility of the study participants and consent given. Section B collects information on sociodemographic variables, past clinical history, and a brief clinical exam. Section C collects information on vaccination history of the child. "Lab_questionnaire.docx" documents information on time of collection and receipt as well as condition of the samples upon receipt at the Infectious Disease Research Laboratory. Fields q12, q13, q14 and q19 in "Raw_dataset.csv" describe the results of the laboratory analysis.

Study design and Study Area
This was a time series cross sectional study in which was carried out in two Union Councils (Khyber and Shah Alam Shah Jee Wasi Seekhat) of Matiari in Sindh, Pakistan, with a total population of around 88,739. Matiari is a rural agricultural district in Sindh province located around 180 km from Karachi. Its total population is 0.7 million residing in more than 90,0 0 0 households in 1,600 villages (2018 census). The household density is 6.7. Only 33% of the population is literate. This site was chosen because of its rural settings with minimum access to health and a high burden of maternal, neonatal and post neonatal illnesses.

Target population
Target population was all children under the age of 2 years residing in the study area. Children were randomly selected from a continuously updated line listing every week. Fifteen children who met the inclusion criterion were enrolled per week from October 2014 to September 2018. Children with nose and throat abnormalities or with a serious illness requiring hospitalization were not enrolled. Fig. 1 describes the study area along with GIS coordinates mapped for the enrolled children.

Collection of data
Data was collected by trained study personnel on a custom-made data capturing application for android phones and tablets. Data was collected on household demographics, recent clinical history including hospitalization and outpatient visits, and vaccination status based on caregiver report or vaccination cards wherever available. The online form was created in .xml format using Open Data Kit (ODK) software. To prevent data entry errors, checks were built for consistency, range and logic where applicable.
Nasopharyngeal specimens were collected using established World Health Organization's (WHO) consensus methods and transported at 2-8 °C from the field site to the Infectious Disease Research Laboratory (IDRL) in Karachi within 8 hours of collection [2] . In the lab, samples were vortexed for 10-20 s to disperse the organism and then frozen at -70 °C in an upright position till time of further processing. For culture, samples were thawed, vortexed and 200 μL of a sample was added to a mixture of 1 mL rabbit serum, 5 mL Todd Hewitt broth with 0 • 5% yeast extract and incubated for 6 h at 37 °C. After this, one loop full (10 ul) was inoculated onto bilayer sheep blood and colistin-nalidixic-acid-agar and streaked for isolation of streptococci. After 18-24 h, plates were examined for the appearance of alpha-hemolytic colonies and susceptibility to optochin and bile solubility. Serotypes were detected using the conventional sequential multiplex PCR assay and further confirmation was done by monoplex PCR [3,4] .

Data storage
Data was stored in a secure server at the Aga Khan University and updated daily. An additional backup was maintained at the Data Monitoring Unit (DMU) of Department of Paediatrics and child Health. Only study investigators and data manager had access to the data.

Data analysis
For the purpose of analysis, epidemiological data was merged with the laboratory data and analysed using Stata version 15 and MS Excel. Variable labels and value labels were created; all categorical variables like gender, illness of past two weeks, vaccination status etc. were presented as frequencies and percentages while continuous variables like age were presented as means and standard deviation. Logistic regression analysis was performed to identify predictors of colonization with a PCV10 serotype. All variables with a p-value less than 0.25 in the bivariate analysis were used to build a multivariable model. A backward selection procedure was used to derive a parsimonious model for retaining only variables significant at p -value ≤ 0.05. Unless otherwise indicated, p < 0.05 was considered statistically significant.

Ethics Considerations
Ethical approval was obtained from Aga Khan University's Ethical Review Committee (3181-Ped-ERC-14). Written informed consent from caretakers of the children was obtained before they were enrolled in the study.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships which have or could be perceived to have influenced the work reported in this article.