Cancer in Kenya: types and infection-attributable. Data from the adult population of two National referral hospitals (2008-2012)

Background: Cancer in Africa is an emerging health problem. In Kenya it ranks third as a cause of death after infectious and cardiovascular diseases. Nearly 31% of the total cancer burden in sub-Saharan Africa is attributable to infectious agents. Information on the cancer burden in Kenya is scanty, and this study aimed to provide comprehensive hospital-based data to inform policies. Method: A cross-sectional retrospective survey was conducted at Kenyatta National Hospital (KNH) and Moi Teaching and Referral Hospital (MTRH) from January 2008 to December 2012. Data were obtained from the patients' files, and the study was approved by the KNH/University of Nairobi and MTRH Ethics and Research Committees. Results: In KNH, the top five cancers were: cervical (62, 12.4%), breast (59, 11.8%), colorectal (31, 6.2%), chronic leukemia (27, 5.4%) and stomach cancer (26, 5.2%). Some 154 (30.8%) of these cancers were associated with infectious agents, while an estimated 138 (27.6%) were attributable to infections. Cancers of the cervix (62, 12.4%), stomach (26, 5.2%) and nasopharynx (17, 3.4%) were the commonest infection-associated cancers. In MTRH, the most common types of cancer were Kaposi's sarcoma (93, 18.6%), breast (77, 15.4%), cervical (41, 8.2%), non-Hodgkin's lymphoma (37, 7.4%) and colorectal, chronic leukemia and esophageal cancer, each with 27 (5.4%). Some 241 (48.2%) of these cancers were associated with infectious agents, while an estimated 222 (44.4%) were attributable to infections. Kaposi's sarcoma (93, 18.6%), cancer of the cervix (41, 8.2%) and non-Hodgkin's lymphoma (37, 7.4%) were the commonest infection-associated cancers. Conclusion: Our results suggest that 30.8% and 48.2% of the total cancer cases sampled in KNH and MTRH respectively were associated with infectious agents, while 27.6% and 44.4% respectively were attributable to infections. Reducing the burden of infection-attributable cancers can translate to a reduction of the overall cancer burden.


Amendments from Version 4
We have modified Table 2, Table 5 and Table 6. This is because some of the rows in Table 5 and Table 6 had been interchanged unintentionally. We are thankful to the reviewer for highlighting this observation.

Background
Cancer in Africa is an emerging health problem: about 847,000 new cancer cases and 591,000 deaths occurred in 2012, with about three quarters of these occurring in the sub-Saharan region [1]. In Kenya, cancer ranks third as a cause of death, after infectious and cardiovascular diseases, and in 2012 there were an estimated 37,000 new cancer cases and 28,500 cancer deaths [2]. Infectious agents are an important cause of cancer, particularly in less developed countries. According to the International Agency for Research on Cancer (IARC), 11 infectious agents have been classified and established as carcinogenic agents in humans, namely: Helicobacter pylori, hepatitis B virus (HBV), hepatitis C virus (HCV), human immunodeficiency virus type 1 (HIV-1), human papillomavirus (HPV), Epstein-Barr virus (EBV), human herpes virus type 8 (HHV-8; also known as Kaposi's sarcoma herpes virus), human T-cell lymphotropic virus type 1 (HTLV-1), Opisthorchis viverrini, Clonorchis sinensis, and Schistosoma haematobium [3].
Nearly 31% of the total cancer burden in sub-Saharan Africa is attributable to infections [4]. Specifically, H. pylori, HPV, HBV, and HCV are the leading infectious agents contributing to the global cancer burden. Together they account for 92% of all infection-attributable cancers worldwide, with 35.4%, 29.5%, 19.2%, and 7.8% respectively [4]. The rise of the HIV epidemic, concentrated in low- and middle-income countries, has resulted in an increase in HIV-associated malignancies [5]. In Kenya, an HIV prevalence of 5.6% (95% CI: 4.9 to 6.3) and an HIV incidence of 0.5% (95% CI: 0.2 to 0.9), corresponding to an annual HIV transmission rate of 8.9 per 100 HIV-infected persons, have been reported [6]. This population is at greater risk of acquiring HIV-associated cancers such as Kaposi's sarcoma (KS), non-Hodgkin lymphoma (NHL) and invasive cancer of the cervix (ICC) [7]. According to a recent study, KS was the second largest contributor to the cancer burden in sub-Saharan Africa [4], while NHL is the second most common malignant disorder associated with HIV infection worldwide [7]. Previous studies done in Africa are discussed in detail in the Methods section.
Information on the burden of cancer, and especially the burden attributable to infections, is sparse in Kenya. In this study, we highlight the results from the adult population of two national referral hospitals in Kenya for the five-year period from January 2008 to December 2012.

Methods
This was a retrospective cross-sectional study conducted at Kenyatta National Hospital (KNH) and Moi Teaching and Referral Hospital (MTRH) in Kenya. Initially the study targeted four teaching and referral hospitals located in the former Nairobi, Rift Valley, Coast and Nyanza Provinces, but authorization to access medical records was granted only by KNH and MTRH. Before the new constitution of Kenya came into force in 2013, the country was divided into eight provinces (see map from the Kenya bureau of statistics [8]).
KNH is located in the former Nairobi Province, which is the capital and the largest city of Kenya. According to the last official census, taken in 2009, it had a population of 3,138,369, which has since grown to approximately 3.5 million people. KNH has a capacity of 2,000 beds and attends to an annual average of 70,000 inpatients and 600,000 outpatients. The hospital has 50 wards, 24 operating theatres and 24 consultant clinics, with over 6,000 staff members. As a referral hospital, KNH offers specialized quality health care to patients from all over the nation and from the East and Central Africa region.
MTRH is located in the former Rift Valley Province (Kenya's largest province), which has a population of 10,006,805 [8]. MTRH has a capacity of about 991 beds, an average of 1,200 patients at any one time and about 1,500 outpatients per day. The hospital serves residents of the Western Kenya region (representing at least 22 counties) and parts of Eastern Uganda and Southern Sudan. The AMPATH-Oncology centre, located at MTRH, evolved from an HIV program into a pediatric cancer program and later into a cancer care program. It is integrated into the MTRH department of Hematology and Oncology [9]. At the AMPATH-Oncology centre, about 45 oncology specialists treat more than 1,000 cancer patients each month.
According to the Kenyan network of cancer organizations, KNH and MTRH are among the oldest and largest public referral hospitals with cancer treatment services. Both hospitals offer screening, diagnosis and treatment services. Radiation machines are few: according to Strother et al. (2013), there were two cobalt-60 radiation oncology machines, both housed in KNH [9]. The two national hospitals are also the largest source of data for the two main cancer registries in Kenya. KNH provides data to the Nairobi cancer registry, located in Nairobi, while MTRH provides data to the Eldoret cancer registry, located in Rift Valley. Recently, with the aim of decongesting the national hospitals, more health facilities (private, mission and public) have been equipped with cancer services. However, because public services are more affordable, many patients opt for the public health facilities.
Data source
Data for this study were obtained from the patients' hospital records, as this was the most convenient data source for all the information targeted.

Inclusion criteria
• Hospital records of patients diagnosed with cancer during the period January 2008 to December 2012.
• Records of patients above the age of 18 at the time of diagnosis and with a confirmed diagnosis either by histology, radiology or haematology.

Exclusion criteria
• Hospital records with incomplete data or not meeting the above criteria.

Sample size calculation
The sample size (n) was calculated according to the guidelines for calculating sample sizes for cross-sectional studies (qualitative variable) as explained by [10-12] and as shown below:

$$n = \frac{z^{2}\,p\,(1-p)}{d^{2}}$$

Where:
p = the expected prevalence (or estimated proportion) of the condition, health state or disease
d = degree of precision of the estimate, or the absolute error
z = Z statistic for the level of confidence, i.e. the normal distribution critical value for a probability of α/2 in each tail. For a 95% CI, z=1.96.
A 95% level of confidence and a ±5% (0.05) degree of precision were considered.
The prevalence of cancer in Kenya was not known at the time of the study, and therefore a prevalence of 50% (0.5) was used in calculating the sample size. Elsewhere, it has been highlighted that when d=0.05 and z=1.96, using a p of 0.5 (50%) yields the highest estimate for n (sample size) [13].
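As a rough check of the arithmetic, the calculation can be sketched in Python. The sketch assumes that the formula the text refers to is the standard single-proportion formula n = z²p(1−p)/d²; the function name `sample_size` is an illustrative choice and is not taken from the study. The inputs are the values stated in the text (z = 1.96, p = 0.5, d = 0.05). A scan over candidate prevalences illustrates why p = 0.5 gives the largest n.

```python
def sample_size(p: float, d: float, z: float = 1.96) -> float:
    """Unrounded sample size for estimating a proportion p to within +/- d.

    Assumes the standard formula n = z^2 * p * (1 - p) / d^2.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if d <= 0.0:
        raise ValueError("d must be positive")
    return (z ** 2) * p * (1.0 - p) / (d ** 2)


# Values stated in the text: 95% confidence, 50% prevalence, 5% precision.
n_exact = sample_size(p=0.5, d=0.05)  # roughly 384.16
n_reported = round(n_exact)           # the study reports 384

# p * (1 - p) peaks at p = 0.5, so that choice gives the largest n.
grid = [i / 100 for i in range(1, 100)]
p_at_max = max(grid, key=lambda p: sample_size(p, d=0.05))
```

Some texts round such a result up rather than to the nearest integer; the sketch follows the figure of 384 reported here.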
A sample size of 384 was estimated as the minimum necessary to achieve the required power of the study. As the data from this study were collected at a single point in time, a 30% (116) non-response allowance was factored in, resulting in a final sample size of 500. This allowance provided a buffer in case any of the 384 files was found to be incomplete after the data collection period. In KNH an estimated 17,584 (inpatient and outpatient) cancer files were reported, while in MTRH 4,304 (inpatient) cancer cases were reported during the five-year period. Due to cost and time constraints, it was only feasible to study 500 files, as calculated from the sample size.
To obtain the final number of 500 files from the totals in each hospital, a proportional stratified sampling method was used as described by [11] (see Table 1). To randomly select the calculated proportion of files for each year obtained in the previous step, a systematic random sampling method was used as described by [11]. At KNH, the files were all available at the health information department and in the databases, where systematic random sampling was an automated process.
At the time of data collection, MTRH was in the process of updating its database and only the 2012 files were available at the health information department. The files for 2008 to 2011 were obtained from the oncology centre at the Academic Model for the Prevention and Treatment of HIV/AIDS (AMPATH), and convenience sampling was used to achieve the required number of files. Convenience sampling was chosen because we could not establish the total sampling frame needed for systematic randomization. However, an indirect systematic randomization was applied to select the files needed from the total files accessed per year rather than from the sampling frame. By the end of the data collection period, all the targeted 500 files were available from both hospitals.
If a selected file was found to be incomplete during the data collection period, it was replaced by the next file following it after randomization.
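The allowance arithmetic and the two sampling steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the per-year file counts are hypothetical round numbers (the real ones are in Table 1), the function names are invented for this example, and the largest-remainder rounding rule is an assumption, since the text does not say how fractional allocations were rounded.

```python
import math
import random

# Allowance arithmetic from the text: 384 plus a 30% allowance.
base_n = 384
allowance = math.ceil(0.30 * base_n)  # 30% of 384 is 115.2, taken up to 116
target = base_n + allowance           # 500 files


def proportional_allocation(stratum_sizes, total_sample):
    """Split total_sample across strata in proportion to stratum size.

    Uses the largest-remainder rule (an assumption) so that the
    allocations sum exactly to total_sample.
    """
    population = sum(stratum_sizes.values())
    exact = {k: total_sample * v / population for k, v in stratum_sizes.items()}
    alloc = {k: math.floor(x) for k, x in exact.items()}
    shortfall = total_sample - sum(alloc.values())
    # Give the leftover units to the strata with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda k: exact[k] - alloc[k], reverse=True)
    for k in by_remainder[:shortfall]:
        alloc[k] += 1
    return alloc


def systematic_sample(items, n, rng):
    """Pick n items at a fixed interval after a random start."""
    if not 0 < n <= len(items):
        raise ValueError("n must be between 1 and len(items)")
    step = len(items) // n        # sampling interval
    start = rng.randrange(step)   # random start within the first interval
    return [items[start + i * step] for i in range(n)]


# Hypothetical yearly file counts, NOT the study's data (see Table 1).
files_per_year = {2008: 800, 2009: 900, 2010: 1000, 2011: 1100, 2012: 1200}
allocation = proportional_allocation(files_per_year, target)

# Systematic selection within one stratum, using file indices as stand-ins.
files_2008 = list(range(files_per_year[2008]))
sample_2008 = systematic_sample(files_2008, allocation[2008], random.Random(2008))
```

With a known sampling frame, each year's files are selected at a constant interval. Where the frame is unknown, as for the 2008 to 2011 MTRH files, the same interval logic can only be applied to the files actually accessed.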

Data collection
A pre-designed questionnaire was used to abstract the information (Supplementary File 1). The information abstracted ...
Although the study aimed to conduct the research at four referral hospitals in Kenya, only two hospitals granted permission to access the patient files; the other two refused, and the reasons for refusal were unknown. The study was a minimal-risk study, and patient consent was not sought since there was no direct patient involvement, only a retrospective review of patients' files. Patient-identifying information was not included in the data collection forms.

Data handling and analysis
Data were entered into the Statistical Package for the Social Sciences (IBM SPSS) version 23, with fields labelled exactly as in the questionnaires and the Excel files. Quality control checks were performed to prevent double entry and to ensure accurate entry of the data. The proportions of cancer cases were analyzed with reference to the study site, sex and age group. GraphPad Prism 6 (GraphPad Software Inc., San Diego, CA, USA) was used to draw the figures.
... (Table 2).
In South Africa, ova of Schistosoma haematobium were seen in microscopic sections of bladder tumours in 85% of the patients with squamous cell carcinoma, in 50% of those with undifferentiated tumours and adenocarcinoma, in 17% of those with mixed tumours or sarcoma, and in only 10% of the patients with transitional cell carcinoma (all classifications of bladder tumours) [36]. However, an attributable fraction (AF) of 41%, derived from endemic areas in Africa, was used [4].
We did not come across any studies in Africa showing the prevalence of Opisthorchis viverrini and Clonorchis sinensis in cancer of the bile duct, and the corresponding AF could likewise not be obtained. Burkitt's lymphoma (BL) was first described in Eastern Africa, where the highest incidence and mortality rates are seen. It has been associated with EBV and affects mainly children, with boys more susceptible than girls [31]. However, we did not come across any cases of BL or adult T-cell leukaemia/lymphoma (ATLL) in our study.
... (Table 3).
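The way an attributable fraction (AF) is applied can be illustrated with a short sketch. It assumes, as the text implies but does not spell out, that the attributable count for a cancer site is the observed count multiplied by that site's AF. The function names and the bladder case count of 10 are hypothetical. The AF of 0.41 and the hospital totals (138 and 222 attributable cases, reported as 27.6% and 44.4% of the sampled files) are the figures given in the paper.

```python
def attributable_cases(cases: float, af: float) -> float:
    """Estimated number of cases attributable to an infection.

    Assumes attributable cases = observed cases * attributable fraction.
    """
    if not 0.0 <= af <= 1.0:
        raise ValueError("af must be a fraction between 0 and 1")
    return cases * af


def percent(part: float, whole: float) -> float:
    """Express part as a percentage of whole."""
    return 100.0 * part / whole


# AF for S. haematobium and bladder cancer as stated in the text (41%).
# The case count of 10 is hypothetical and used only for illustration.
bladder_attributable = attributable_cases(cases=10, af=0.41)  # about 4.1 cases

# Totals reported in the paper: attributable cases out of 500 files per hospital.
knh_pct = percent(138, 500)
mtrh_pct = percent(222, 500)
pooled_pct = percent(138 + 222, 500 + 500)
```

Summing the per-site attributable counts over all infection-associated sites gives a hospital's total, and pooling the two hospitals gives the overall share of about 36% mentioned in the review reports.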

Cancer cases by age groups
The results from both hospitals suggested that some cancers were predominant in specific age groups. Acute leukemia, NHL, HL, and cancers of the bone, genitalia and nasopharynx were predominant in the age group of 24 years and below. Cancer of the cervix was predominant in the age group of 22 to 44 years, while breast and bile duct cancers were predominant between 45 and 64 years. Cancers of the esophagus, prostate, larynx, endometrium, and lip and oral cavity predominated in the age group of 65 to 84 years. Colorectal, stomach, ovarian, lung and bronchus, liver and bladder cancers and multiple myeloma predominated in a broader age range of 45 to 65 years (Table 5 and Table 6). This information is particularly important for estimating the age at which to start cancer check-ups.
... (Table 7).

Discussion
In females (n=300) the most common cancers in KNH were cervical, breast, ovarian, chronic leukemia, endometrial and stomach, while in MTRH (n=282) they were breast, cervical, Kaposi's sarcoma, non-Hodgkin's lymphoma and cancer of the ovary. These results are comparable with data from a previous retrospective study conducted at Tenwek Hospital, in Bomet District, western Kenya, from 1999 to 2007, which showed that the common types of cancer in women were cervical, breast, stomach, uterine and esophageal [37]. Similarly, an incidence rates study using the Nairobi cancer registry data found breast, cervical, esophageal, large bowel, stomach and ovarian cancers to be the most incident [38]. The high number of cervical cancer cases could reflect a potentially higher prevalence of HPV infection, low screening rates or late detection of the disease. Most patients with cervical cancer were in the age groups of 24 to 44 and 45 to 54 ...
... showed that the most common cancers in men were esophageal, stomach, prostate, colorectal and non-Hodgkin's lymphoma (NHL) [37]. Similarly, an incidence rates study using the Nairobi cancer registry data found prostate, esophageal, large bowel, stomach, oral and liver cancers to be the most incident [38]. Other studies show that esophageal cancer is the leading cause of death among both men and women in East Africa [37,39,41]. The majority of the esophageal cancers were in patients aged 65 to 84 years, followed by those aged 45 to 64 years, and they were more common in men than in women. Some of the risk factors independently associated with esophageal cancer (P < 0.05), identified in a study conducted at MTRH, were low socio-economic status, smoking, alcohol consumption, tooth loss, cooking with charcoal and firewood, consumption of hot beverages and use of a traditional fermented milk referred to as mursik [41].
Elsewhere, a study that used data from the Eldoret cancer registry from 1999 to 2006 to determine the burden and pattern of cancer in Western Kenya found that about 21% of the patients had haematological malignancies, of which lymphomas were the most common (11.9%), followed by acute and chronic leukemia with 4.0% and 3.2% respectively. Esophageal cancer (10.5%), breast cancer (6.2%) and Kaposi's sarcoma (5.9%) were the leading non-haematological cancers. In our study, KS, NHL and chronic leukemia were high, especially in MTRH [42]. Chronic leukemia was the 4th most common type of cancer in KNH and the 5th most common in MTRH, while acute leukemia was also high, and highest in the age group of 24 years and below. The results suggested that prostate cancer commonly affected males in the age group of 65 to 84 years in both hospitals, with a notably high occurrence also in the age group of 45 to 64 years. These differences could possibly be attributed to lifestyle choices and family history of the disease, which are among the risk factors associated elsewhere with the cancer [43].
... [5,7]. The high numbers of Kaposi's sarcoma, non-Hodgkin's lymphoma and cervical cancer at MTRH could have been influenced by the source of the data, which were obtained from the AMPATH-Oncology centre that evolved from an existing HIV program [9]. Information on the HIV status of the patients whose files were used was not collected, so it was not possible to accurately link a cancer site to HIV infection. In Kenya, an HIV prevalence of 5.6% (95% CI: 4.9 to 6.3) and an HIV incidence of 0.5% (95% CI: 0.2 to 0.9), corresponding to an annual HIV transmission rate of 8.9 per 100 HIV-infected persons, have been reported [6]. This HIV pandemic could have influenced the high number of HIV-associated malignancies, as documented by [4,7,46].
More liver cancer cases were observed in males than in females. This can be explained by a higher prevalence of risk factors for liver cancer in men than in women, such as higher alcohol consumption and infection with HBV or HCV [14].

Limitations
Our study had several limitations. First, causality could not be proven, and we could not ascertain that a given cancer was actually caused by an infectious agent; this includes the high number of HIV-associated malignancies at MTRH. Therefore, we used AFs generated from other studies. Information on associations is sparse in Kenya, which opens up new avenues for future research. Our study used a sampling fraction of the total number of files available at the two hospitals rather than all the files, which could have influenced our results. Because the whole sampling frame was not known, the use of the convenience sampling method to select the files at MTRH may have introduced selection bias. This could result in over- or under-representation of the cases. Future studies should focus more on population-based data or use proportions of individual cancers to calculate the sample size. Population-based data would be more reflective of the overall population. Our choice of study population was influenced by the fact that cancer registration in Kenya was in its fairly early stages of development at the time the study was done, and the use of hospital-based data seemed suitable. The percentage attributable to infections for certain cancers, such as BL, adult T-cell leukemia or bile duct cancers, could not be calculated because we did not come across any cases of the first two cancers, while the AF for bile duct cancer was unobtainable.

Conclusion
Our study presented a picture of the burden of cancer and infection-attributable cancer from a hospital point of view. Despite the limitations, the role played by infectious agents in contributing to the overall cancer burden was highlighted.
Controlling the infectious agents could translate to a significant reduction in the cancer burden. Further research is warranted to prove causality between infection-attributable cancers and the infectious agents in Kenya, as this may provide new avenues for effective cancer prevention.

Supplementary File 1: Data collection form

Data availability
The data underlying this study is available from the Open Science Framework (OSF) ...
The issue on the data collection and availability of the HIV status of the patients should be also mentioned in the Methods.
The sentence below in the overall Discussion is not clear: "The information on the HIV status of the patients from whose files were used, was not obtained to accurately link the associations." Could you rephrase it? Here is a suggestion: "Unfortunately, the information on the HIV status of the patients from whose files were used was not collected and therefore it was not possible to accurately link a cancer site to HIV infection."

Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Cancer epidemiology and infections.

© 2019 Casabonne D. This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Delphine Casabonne
Unit
The authors addressed most of the issues. However, further points should be corrected and/or specified:
Main comments:
The authors mentioned that "The files for 2008 to 2011 were obtained from the oncology centre at the Academic Model for the Prevention and Treatment of HIV/ AIDS (AMPATH)". The authors should clarify if all patients selected in this center for these years (81% of the sample for MTRH) were HIV+ or the authors should report if the HIV status of these patients was unknown. This information should be added to the manuscript to help interpret the higher and different cancer proportions in MTRH and KNH.
In Table 2 (Previous prevalence studies on infectious agents and cancers): The virus HTLV should be associated with T-cell lymphoma and not Burkitt lymphoma. Burkitt lymphoma should be associated with EBV.
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Cancer epidemiology and infections.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.
Author Response 02 Nov 2019
Lucy Wanjiku Macharia, University of Nairobi, Nairobi, Kenya
Dear Dr. Delphine Casabonne,
The authors are thankful to you for reviewing our manuscript and for the suggestions. We have modified the manuscript and double-checked the tables to incorporate the suggestions. Kindly find below the point-by-point response to the queries.

The authors mentioned that "The files for 2008 to 2011 were obtained from the oncology centre at the Academic Model for the Prevention and Treatment of HIV/ AIDS (AMPATH)". The authors should clarify if all patients selected in this center for these years (81% of the sample for MTRH) were HIV+ or the authors should report if the HIV status of these patients was unknown. This information should be added to the manuscript to help interpret the higher and different cancer proportions in MTRH and KNH.
We thank the reviewer for the question and the suggestion. We do not know the HIV information of the patients whose files were obtained from AMPATH. We have highlighted the suggestion in the discussion and limitations section.

In Table 2 (Previous prevalence studies on infectious agents and cancers): The virus HTLV should be associated with T-cell lymphoma and not Burkitt lymphoma. Burkitt lymphoma should be associated with EBV.
We agree with the reviewer's observation. We have modified Table 2 and corrected the misclassification. We are thankful for the observation.

Table 6 (Cancer cases by age groups in Moi Teaching and Referral Hospital (MTRH) from 2008-2012) should be checked again. For instance the total number of KS across the age for MTRH is not correct (N=2) and it should be N=93. The authors should check the tables again.
We are grateful to the reviewer for the keen observation. We have checked the tables and made the necessary modifications to Tables 5 and 6. We are thankful for the suggestion.

Minor comments:
In the abstract: Replace "stomach cancer 26 (5.2%)" by "stomach cancer (26, 5.2%)".
We thank the reviewer for the observation. We have modified the statement highlighted above in the abstract section.

In the methods section: You note there are "few" radiation machines but more may be coming. It would help the context here to report the number and perhaps remove the bit about more coming unless it can be cited in some way? Or if you have provide more specificity about how and where those machines are intended to go, that will be helpful.
Also in Methods: you introduce AMPATH without clarifying what it is. I would recommend providing a sentence of background of their role at MTRH.
In Discussion: At KNH, there is a variability in breast cancer proportions that may be interesting to look at. From 2008-2011, the proportion was between 10-15%. In 2012, it drops to below 5%. What happened that year? Were there really fewer cases? Was there an issue with data quality (more incomplete records, etc.)?
In Discussion: You have this line: "The high number of Kaposi's sarcoma, non-Hodgkin's lymphoma and cervical cancer could have been influenced by the association of the cancers with HIV" regarding the high number of HIV-related cases at MTRH. I think this actually requires further discussion, as it pushes the infection-related rates way above average. The presence of the AMPATH program, and its specialization in treating patients with HIV should be noted in discussion, as explanation. Because these are hospital-based data, this is especially important to note as this hospital attracts many HIV positive patients, most likely.
I would consider an examination and a citation to the article listed below, "Viral-associated malignancies in Africa: are viruses 'infectious traces' or 'dominant drivers'?".

Are sufficient details of methods and analysis provided to allow replication by others? Yes
If applicable, is the statistical analysis and its interpretation appropriate? I cannot comment. A qualified statistician is required.

Are the conclusions drawn adequately supported by the results? Partly
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: My areas of expertise include evaluative research methods, cancer control, data analysis, and health systems. I have noted that I would prefer if a qualified statistician or epidemiologist confirmed the analysis and conclusions section of this paper. While it reads as very acceptable to me, I do not consider myself an expert in this area.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.
Author Response 03 Oct 2019
Lucy Wanjiku Macharia, University of Nairobi, Nairobi, Kenya
Dear Dr. Kalina Duncan,
Thank you for reading our manuscript, for the suggestions and the approval status. Kindly find below the response to the queries.

Comments
In the methods section: You note there are "few" radiation machines but more may be coming. It would help the context here to report the number and perhaps remove the bit about more coming unless it can be cited in some way? Or if you have provide more specificity about how and where those machines are intended to go, that will be helpful.
We thank the reviewer for the comment. We have edited the statement, included the number of radiation machines and removed the "more are coming" bit due to lack of a proper citation.

Also in Methods: you introduce AMPATH without clarifying what it is. I would recommend providing a sentence of background of their role at MTRH.
We are thankful for the suggestion. We have included a sentence to connect AMPATH and MTRH.
In Discussion: At KNH, there is variability in breast cancer proportions that may be interesting to look at. From 2008-2011, the proportion was between 10-15%. In 2012, it drops to below 5%. What happened that year? Were there really fewer cases? Was there an issue with data quality (more incomplete records, etc.)?
We agree with the reviewer's concern. The variability in the proportions of breast cancer at KNH could have been influenced by various factors, including the sample size used as compared to sampling the whole frame, which was not feasible due to cost constraints. Similarly, there may have been higher numbers of patients with breast cancer visiting the hospital in certain years than in others. We have removed the proportion section from the results and discussion, in line with the suggestion from the previous reviewer. We are thankful for the keen observation.

In Discussion: You have this line: "The high number of Kaposi's sarcoma, non-Hodgkin's lymphoma and cervical cancer could have been influenced by the association of the cancers with HIV" regarding the high number of HIV-related cases at MTRH. I think this actually requires further discussion, as it pushes the infection-related rates way above average. The presence of the AMPATH program, and its specialization in treating patients with HIV, should be noted in discussion, as explanation. Because these are hospital-based data, this is especially important to note as this hospital attracts many HIV positive patients, most likely. I would consider an examination and a citation to the article listed below, "Viral-associated malignancies in Africa: are viruses 'infectious traces' or 'dominant drivers'?".

The details of the calculations are not needed in the text. Please delete these two sections of the text (page 4): a) "Therefore, n=1.96^2..." to "n=384". b) In the text from "N1= 3168..." to "... ∑ni=500". This information is provided in Table 1.
More details should be given in relation to the selection of the records. What is the "non-response allowance" in this cross-sectional design? Were all 500 selected records available for both hospitals? If a selected record had missing data (based on the pre-designed questionnaire) was it replaced by another randomly selected record? This part is unclear; please clarify.
For MTRH, the origin of the 2008-2011 files is unclear. Which "convenient sampling" was used for the 390 files of MTRH? Could the authors explain better the procedure of selection for this hospital? How could biases be introduced for interpretability? How is representativeness kept?
Please modify the methods section accordingly.

Results:
As mentioned by previous reviewers, the higher prevalence of KS and NHL in MTRH might reflect the tight relationship of this hospital with AMPATH/HIV. Also, the convenience sampling method used for the selection of the records for this hospital might accentuate the number of KS and NHL cases observed in this hospital. Please add to the current Table 7 the estimates for the two hospitals combined and mention the overall 36% of cancers attributable to infectious agents.

Figures 2 & 3:
The titles and the figures are unclear and inconsistent with the text. From the results section, in KNH hospital, the commonest cancers in males were prostate (11.5%) and laryngeal (9.5%) cancers. Why is colorectal cancer plotted in Figure 2? In page 9, why do the authors mention access to treatment in relation to the histograms in the result section? Also, the different units in the y-axis are misleading. Finally, due to the small sample size by year and the use of convenience sampling for MTRH, I would suggest not to show Figures 2 and 3. Please delete the text in the results section associated with these Figures (page 9, "Proportions of the top most cancer cases").

Minor comments:
Since penile cancer is one of the cancer sites associated with infections please provide the number of cases of penile cancers that were observed in each hospital. If none were observed please provide this information in the text.
The authors mentioned 11 infections in page 9 but there are less infections in Table 2. Please modify accordingly.
Can the referral hospital for KNH be MTRH? And vice versa?

Discussion:
As mentioned by previous reviewers, the authors gave too much emphasis on comparing the two hospital results. The presence of KS and NHL in the top four cancers for MTRH should be put in perspective with the hospital origin of the patients (HIV center). The authors should discuss this point in more detail. Given the method of selection of the patients, KNH might reflect better the overall prevalence.
In the limitations, the representativeness of the selected patients should be addressed. How complete are the hospital registration of the patients? Could any biases be introduced at this stage? Are some cancer subtypes less likely to be recorded in these hospitals? Is misdiagnosis a potential issue?

Minor comments:
In page 13, "Similarly an incidence using the Nairobi cancer registry...", please correct this sentence. Do you mean incidence rates?
Please delete all references to the result part (Tables X, figures Y, ...) in the Discussion section.

Summary:
This article describes the cancer prevalence and cancers attributable to infections in two hospitals from Kenya. Due to the selection methods there are important issues to take into account when interpreting the results -in particular in relation to HIV-associated cancers -that have not been addressed adequately.

If applicable, is the statistical analysis and its interpretation appropriate? Partly
Are all the source data underlying the results available to ensure full reproducibility? Partly

Are the conclusions drawn adequately supported by the results? No
Competing Interests: No competing interests were disclosed.

Reviewer Expertise: Cancer epidemiology and infections.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.
Author Response 03 Oct 2019
Lucy Wanjiku Macharia, University of Nairobi, Nairobi, Kenya

Dear Dr. Delphine Casabonne, We are thankful to you for reading our manuscript, for the positive criticism and the suggestions. Kindly find below the answers to the questions highlighted.

Title: Please add in the title the words "adult" and the years "2008-2012".
We have added the words "adult" and years to the title. Thank you for the suggestion.

Abstract: Please indicate in the abstract and methods not only the selected years (2008-2012) but also the months.
We have modified the abstract and method section to include the years and months. We are grateful for the suggestion.

Methods: Please indicate in the abstract and methods not only the selected years (2008-2012) but also the months.
We have included the new suggestions to the abstract and methods section.
Sample size calculation: This is a case-study and the prevalence of cancers is not the prevalence of cancers in Kenya but the proportion of a specific cancer subtypes within the total number of cancer cases in the hospitals. Proportions estimated in Kenya in different published reports could have been used. For example, in https://gco.iarc.fr/today, the cancer prevalence by subtypes in Kenya for 2018 were estimated to be: breast: 13.6%, cervix: 11.5%, oesophagus: 8.4%, prostate: 5.8%, colorectum: 4.8%, etc. It might therefore be more informative to give the number of patients needed from different cancer subtype proportions (p=0.01, 0.05, 0.10, 0.15 and 0.20).
We are thankful for the comment. Our objective was to sample 500 files from each hospital and identify how many cancer types were in the 500 selected files as a representative sample. This is because we could not afford to sample the whole population. For this, the p we used in the sample size calculation was the expected prevalence, or estimated proportion, of a disease (cancer), as explained by references 9 to 11. We assumed a p of 50% because the prevalence was unknown at the time, which is also acceptable as explained by Arya et al., 2012 (Reference 12). This is the sample size that was also approved by the ethical committees. To apply the proportions of individual cancer types, we would need to know the total numbers of the specific cancers in both hospitals, information that was unavailable. Lastly, as we cannot change the sample size calculation strategy at this stage, we can only highlight how we arrived at the sample size of 500 files and note the new suggestion as a potential limitation. We are thankful for the suggestion.
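To illustrate the reviewer's suggestion, the sketch below evaluates the usual sample size expression for estimating a single proportion, n = z^2 p(1-p)/d^2, at the proportions listed in the comment. It is a minimal sketch, not the authors' calculation: the function name is hypothetical, and the defaults z = 1.96 and d = 0.05 are assumed from the values quoted elsewhere in this correspondence.

```python
import math


def sample_size_for_proportion(p, d=0.05, z=1.96):
    """Return the unrounded n = z^2 * p * (1 - p) / d^2.

    p: expected proportion (assumed known in advance)
    d: absolute margin of error (0.05 is an assumption taken from the response)
    z: normal quantile for the confidence level (1.96 for 95%)
    """
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    return (z ** 2) * p * (1 - p) / (d ** 2)


if __name__ == "__main__":
    # Proportions suggested by the reviewer, plus the 0.50 used by the authors.
    for p in (0.01, 0.05, 0.10, 0.15, 0.20, 0.50):
        n = sample_size_for_proportion(p)
        print(f"p = {p:.2f}: n = {n:.2f} (rounded up: {math.ceil(n)})")
```

Because p(1-p) is largest at p = 0.5, the last row gives the largest n of the set, which is why a 50% assumption is the conservative choice when the true proportion is unknown.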
The details of the calculations are not needed in the text. Please delete these two sections of the text (page 4): a) "Therefore, n=1.96^2..." to "n=384". b) In the text from "N1= 3168..." to "... Σni=500". This information is provided in Table 1.
The calculation details highlighted above have been removed from the text. We are thankful for the suggestion.
More details should be given in relation to the selection of the records. What is the "non-response allowance" in this cross-sectional design?
We have inserted a statement to clarify the meaning of the "non-response allowance" in the context of our study. We appreciate the comment.

Were all 500 selected records available for both hospitals? If a selected record had missing data (based on the pre-designed questionnaire) was it replaced by another randomly selected record? This part is unclear; please clarify. Yes. All the targeted 500 files were available from both hospitals by the end of the data collection period. In case a selected file was found incomplete, it was replaced by the next file following it after randomization. We thank the reviewer for the question.
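The replacement rule described in this response can be sketched as follows: put the file numbers in a random order, then walk through that order and skip any incomplete file, so that each incomplete file is in effect replaced by the next one in the randomized sequence. This is only an illustration under stated assumptions; `select_files`, `is_complete` and the seed are hypothetical names, and the response does not describe the software actually used.

```python
import random


def select_files(file_ids, is_complete, target, seed=None):
    """Pick `target` complete files from a randomly ordered list of file IDs.

    file_ids:    iterable of hospital file identifiers (hypothetical)
    is_complete: callable returning True when a file has all required fields
    target:      number of files wanted (500 per hospital in the study)
    seed:        optional seed so the random order can be reproduced
    """
    rng = random.Random(seed)
    order = list(file_ids)
    rng.shuffle(order)  # randomize the order of all candidate files

    selected = []
    for file_id in order:
        if len(selected) == target:
            break
        if is_complete(file_id):
            selected.append(file_id)
        # An incomplete file is skipped; the next file in the random
        # order takes its place.
    return selected
```

If fewer than `target` complete files exist, the function returns a shorter list, so a caller would need to check the length of the result.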
For MTRH, the origin of the 2008-2011 files is unclear. Which "convenient sampling" was used for the 390 files of MTRH? Could the authors explain better the procedure of selection for this hospital? How could biases be introduced for interpretability? How is representativeness kept? Please modify the methods section accordingly. It was difficult to apply probability randomization as the total sampling frame could not be established. But from the total number of files that we could access from each year, the study files were chosen in a "semi-randomized" manner based on the total number of files that we could access each year and not the sampling frame. Being a non-probability sampling method, bias could be introduced. The bias will be mentioned in the limitations section. We are thankful for the comment.

Results: As mentioned by previous reviewers, the higher prevalence of KS and NHL in MTRH might reflect the tight relationship of this hospital with AMPATH/HIV. Also, the convenience sampling method used for the selection of the records for this hospital might accentuate the number of KS and NHL cases observed in this hospital.
We agree with the reviewer's concern. We have highlighted this as a possible source of the study's limitation and modified the explanation in the discussion section as suggested also by reviewer 4. We appreciate the observation.
Please add to the current Table 7 the estimates of the 2 hospitals altogether and mention the overall 36% of cancers attributable to infectious agents.
We have added the estimates of the 2 hospitals to Table 7. We appreciate the suggestion.

Figures 2 & 3:
The titles and the figures are unclear and inconsistent with the text. From the results section, in KNH hospital, the first commonest cancers in males were prostate (11.5%) and laryngeal (9.5%) cancers. Why is colorectal cancer plotted in Figure 2? In page 9, why do the authors mention access to treatment in relation to the histograms in the result section? Also, the different units in the y-axis are misleading. Finally, due to the small sample size by year and the use of convenience sampling for MTRH, I would suggest not to show Figures 2 and 3. Please delete the text in the results section associated with these Figures (page 9, "Proportions of the top most cancer cases"). We have deleted Figures 2 and 3 together with their associated texts in the manuscript document. We thank the reviewer for the suggestion.

Minor comments:
Since penile cancer is one of the cancer sites associated with infections please provide the number of cases of penile cancers that were observed in each hospital. If none were observed please provide this information in the text. There were two penile cases in MTRH. The numbers have been included in table 7 in the males' section. Thank you for the suggestion.
The authors mentioned 11 infections in page 9 but there are less infections in Table 2. Please modify accordingly. We have modified Table 2 accordingly. Some infectious agents have been listed together under one cancer type. We appreciate the observation.
Can the referral hospital for KNH be MTRH? And vice versa? I strongly believe that both hospitals can receive patients referred from the other. However, I lack the evidence to support my opinion.

Discussion:
As mentioned by previous reviewers, the authors gave too much emphasis on comparing the two hospital results. The presence of KS and NHL in the top four cancers for MTRH should be put in perspective with the hospital origin of the patients (HIV center). The authors should discuss this point in more detail. Given the method of selection of the patients, KNH might reflect better the overall prevalence. We thank the reviewer for the comment. We have modified the infectious agents section where we have highlighted that the high numbers of KS, NHL and cervical cancer could also have been influenced by AMPATH-Oncology as the source of data for MTRH. Selection bias has been addressed in the limitation section.

In the limitations, the representativeness of the selected patients should be addressed. How complete are the hospital registration of the patients? Could any biases be introduced at this stage? Are some cancer subtypes less likely to be recorded in these hospitals? Is misdiagnosis a potential issue?
We thank the reviewer for the concern. We have addressed the selection bias in the limitation section. The 500 files that we collected all had the information we were aiming to get. During the data collection period, if a file had missing information, it was immediately excluded and replaced with the one following it after randomization. Convenience sampling could have led to under-representation or over-representation, which we have included in the limitations section. We came across files with incomplete or missing information, or even an unclear diagnosis, but these were excluded from the study.

Minor comments:
In page 13, "Similarly an incidence using the Nairobi cancer registry...", please correct this sentence. Do you mean incidence rates? We are thankful for the observation. We meant incidence rates and have modified the text.
Please delete all references to the result part (Tables X, figures Y, ...) in the Discussion section.

We thank the reviewer for having reviewed our manuscript, for the suggestions and for the positive criticism. We have revised the manuscript in agreement with the suggestions. The discussion section has been edited to shorten the section. We hope that this new version of the manuscript can be considered for approval. Kindly find below point-by-point response to the comments and queries. Sincerely,

Response:
We have modified the title to make the aim clear and have removed the "trends" from the title. We are thankful for the suggestions.
2: Abstract "No comments".

Response:
We are thankful for the positive feedback.

Responses:
We are thankful for the questions. There was no objective exclusion of patients under the age of 18 years. It was mainly to circumvent possible ethical debates revolving around the use of data from the under-age group. All the cases had a clinical diagnosis, but for them to be included in the study, a confirmatory diagnosis besides the clinical one was a prerequisite. We agree with the reviewer's concern. An improved description of the hospital facilities has been provided in the methodology section of the new manuscript. However, to get the exact information, additional permission would need to be sought from the hospitals.

Responses:
We are thankful for the suggestions. The period 2008-2012 sample (500/500) has been added in Figure 1. We agree with the reviewer's concern and have modified Fig. 2 and Fig. 3 into a histogram and changed their titles. We used two cancer sites as they are the only ones that gave permission to access their data among all the four referral hospitals targeted at the time of the study. The reasons for refusal by the other hospitals are unknown.
6: Discussion I was expecting a comparison of your data with Eldoret and Nairobi cancer registry. The discussion is long for the data, please make it shorter. ASR are only possible to generate if you have a population based cancer registry database.
We are grateful for the suggestion. The discussion section has been modified to shorten the section. There are limited published studies on "Cancer in Kenya" and the few mentioned were among the few obtainable. Specifically:
Reference 37 used the Nairobi Cancer Registry data.
Reference 41 used the Eldoret Cancer Registry data.
Reference 40: the study was done at MTRH.
Reference 39 is a Kenya National cancer control strategy report.
No competing interests were disclosed.

Introduction
The introduction is repetitive and should be edited. Separate paragraphs should be devoted to those infections that are established as causes of cancer while those with weaker associations should be in a separate paragraph. The role of HIV infection in cancer etiology, prevalence and incidence in Kenya deserves more discussion than just mentioning HIV in a list of pathogens. Previous studies of infection-associated cancers in Kenya, East Africa or Sub-Saharan Africa should be discussed briefly to provide more introduction to the field and establish why this study was necessary and what it hopes to accomplish.

Methods
It would be more informative to describe the demographic characteristics of the catchment areas, the pattern of healthcare services including other hospitals, and referral systems for the two hospitals included in this study. While these are the biggest hospitals, they may not see the most cancer patients if specialized cancer hospitals are located in the same region or city.
The rationale, application, and choice of sample size calculation are unclear. This is a retrospective case series. The researchers should analyze all the case records that they can find irrespective of sample size calculation. Further, it is highly unlikely that the prevalence of cancer in adult Kenyan population is 50%.
The justification for randomly selecting case notes to review is not clear. Rather than statistically estimate the number of files, why not obtain the number of patients seen and the number who have a cancer diagnosis directly from the institutions? If this is not possible, the authors should say so.
The randomization procedure is not clear. The authors state that "the first record was selected randomly every year". What is the "first record"? Record of the first patient seen in the institution within a specific year? The first record selected from among all the patients' records for a specific year? If the latter how was the selection done? Were all the records assigned numbers and random number generators used to select records?
Do the authors have an objective assessment of the completeness of the records of these two institutions?
What is the relationship of the two institutions that refused to grant permission for this study and their locations with the institutions that gave permission for the study? A map showing this information would be informative for international readers.
The analytic methods used in the paper and described in the data analysis section should be described better.

Results
The authors should be precise in reporting results. "Around 4304 inpatient cancer cases were available in MTRH" is imprecise and unacceptable.
There is a clear difference in the cases of cancer at the KNH compared to MTRH, most probably because one site hosts AMPATH HIV treatment and prevention programs. It is therefore not justifiable to make too much of the differences in the proportion of cancers seen in these two institutions and attempt to extrapolate that to the general population.

Discussion
The interpretation of the result is not justifiable based on the data, methods, and analyses.
Proportions of cancers presenting in specific institutions may not reflect population level epidemiology of the cancers. See Chapter 1 of the Cancer in Africa (Ed) Max Parkin et al.
The authors suggest that their data shows the incidence of the cancers that they described but this is not correct. The authors can describe the pattern of cancers presenting to each of these hospitals over a period of time, but this is not "incidence" of those cancers in the population. The pattern of presentation of cancers to specific hospitals is influenced by many factors. An example is illustrated in their paper where the pattern of cancers presenting at the AMPATH related hospital reflects the focus on HIV treatment and prevention at the institution.

General comments
There are several grammatical and typographical errors in the paper. For example in the Discussion section, Paragraph 3 "Elsewhere, a study aiming determine the burden and pattern of cancer in ……." There is inconsistency in the format of data presentation. For example, age groups were written as 22 to 44 years, and in another 45-65 years and elsewhere 45-64 years. Some sentences are too long. E.g. "Nearly 15% of the global cancer burden is attributable to infectious agents, with two-thirds of infection attributable cancers occurring in the less developed countries and in which infections accounts for nearly one in four cancers."

Summary
In this paper, the authors described the prevalence of cancers and the proportion of cancers that were associated with infections in two referral hospitals in Kenya. The data was over-interpreted. The methods are inadequately described and the results are poorly presented. The conclusions drawn from the data are not justifiable.
Sincerely, L.W.M

Hospital (MTRH), Kenya over a 5 year period from 2008 to 2012. These 2 centers are the oldest and largest national referral hospitals in Kenya. The authors randomly selected some cancer cases reported at these centers within the study period and reviewed the charts for cancer types and numbers of cancers attributable to infections. The study reported that 40.0% to 53.2% of cancers that were seen in these referral hospitals were attributable to infections."

Response:
We are thankful to the reviewers for revising our manuscript, for the positive criticism and for the suggestions. We have revised the manuscript text, tables, figures and reference list as described below and in the revised manuscript. We have also updated our results and discussions section together with additional analysis as suggested.

Response:
We agree with the reviewers' concern. We have applied the PAF/AF standard formula in estimating the number of cancer cases attributable to infections. We are thankful for the suggestion.
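The response does not say which form of the population attributable fraction (PAF) was applied. One widely used form is Levin's formula, PAF = p(RR - 1) / (1 + p(RR - 1)), where p is the prevalence of the infection and RR the relative risk of the cancer given infection. The sketch below assumes that form; the function names and the input values in the usage note are hypothetical and are not taken from the manuscript.

```python
def levin_paf(prevalence, relative_risk):
    """Population attributable fraction under Levin's formula (an assumption).

    prevalence:    proportion of the population carrying the infection (0 to 1)
    relative_risk: risk of the cancer in infected vs. uninfected people (> 0)
    """
    if not 0 <= prevalence <= 1:
        raise ValueError("prevalence must be between 0 and 1")
    if relative_risk <= 0:
        raise ValueError("relative_risk must be positive")
    excess = prevalence * (relative_risk - 1)
    return excess / (1 + excess)


def attributable_cases(n_cases, paf):
    """Estimated number of cases attributable to the infection."""
    return n_cases * paf
```

For example, with an illustrative prevalence of 0.2 and a relative risk of 5, `levin_paf(0.2, 5.0)` gives 0.8 / 1.8, and multiplying that fraction by the number of observed cases of a cancer type gives the estimated attributable cases.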

Response:
We are grateful for the insightful suggestions. We have edited the introduction as recommended in the new manuscript.
We have included a new paragraph on HIV prevalence and incidence in Kenya together with its associated cancers.
Previous studies on infection-associated cancers in sub-Saharan Africa have been expounded in detail in the data analysis section.
Comment 3: Methods
3.1 "It would be more informative to describe the demographic characteristics of the catchment areas, the pattern of healthcare services including other hospitals, and referral systems for the two hospitals included in this study. While these are the biggest hospitals, they may not see the most cancer patients if specialized cancer hospitals are located in the same region or city."
3.2 "The rationale, application, and choice of sample size calculation are unclear. This is a retrospective case series. The researchers should analyze all the case records that they can find irrespective of sample size calculation. Further, it is highly unlikely that the prevalence of cancer in adult Kenyan population is 50%."
3.3 "The justification for randomly selecting case notes to review is not clear. Rather than statistically estimate the number of files, why not obtain the number of patients seen and the number who have a cancer diagnosis directly from the institutions? If this is not possible, the authors should say so."
3.4 "The randomization procedure is not clear. The authors state that "the first record was selected randomly every year". What is the "first record"? Record of the first patient seen in the institution within a specific year? The first record selected from among all the patients' records for a specific year? If the latter how was the selection done? Were all the records assigned numbers and random number generators used to select records?"
3.5 "Do the authors have an objective assessment of the completeness of the records of these two institutions?"
3.6 "What is the relationship of the two institutions that refused to grant permission for this study and their locations with the institutions that gave permission for the study? A map showing this information would be informative for international readers"
3.7 "The analytic methods used in the paper and described in the data analysis section should be described better."

Response:
3.1. We are thankful to the reviewers for the suggestions. We have included the demographic characteristics of the catchment areas together with the pattern of healthcare services in the study site section of the new manuscript. To clarify the reviewers' concern, there are 12 health facilities in Kenya that offer cancer services; seven private hospitals, two mission hospitals and three public facilities (KNH, MTRH and Coast Province General Hospital). Because of the affordability of the cancer services, most patients opt for the public facilities resulting in congestion of the facilities and long waiting times of the patients. Patients with private insurance and the government-sponsored scheme, National Health Insurance Fund, are more likely to undergo treatment than those without insurance.
3.2. We have edited the sample size calculation section and included detailed explanations with references in the new manuscript. We do agree with the reviewers' concern of analyzing all the chart reviews. However, due to cost and time constraints, we were only able to analyze 500 files in each hospital, as this was the minimum necessary sample size needed to achieve the required power of the study. We have also come across an article showing that sampling is also applicable in retrospective chart review studies [1], but to avoid any misunderstanding we have included this as one of the limitations of the study. Regarding the prevalence, we stated in the sample size calculation section that the prevalence of cancer in Kenya was unknown at the time of conducting the study. For a better clarification, a 50% prevalence (p) was used in calculating the sample size since it is known that when d = 0.05 and z = 1.96, using a p of 0.5 (50%) yields the highest sample size required for cross-sectional studies [2]. A detailed explanation has been included in the new manuscript.
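The claim that p = 0.5 gives the largest sample size can be checked with a short worked calculation. The block below assumes the usual single-proportion expression, which is consistent with the "n=1.96^2..." to "n=384" passage quoted by the reviewer; the symbols n, z, p and d follow that usage.

```latex
n = \frac{z^{2}\,p(1-p)}{d^{2}}, \qquad
\frac{\mathrm{d}}{\mathrm{d}p}\bigl[p(1-p)\bigr] = 1 - 2p = 0
\;\Longrightarrow\; p = 0.5

n_{\max} = \frac{1.96^{2} \times 0.5 \times 0.5}{0.05^{2}}
         = \frac{3.8416 \times 0.25}{0.0025}
         = 384.16 \approx 384
```

Any other value of p between 0 and 1 makes p(1-p) smaller than 0.25 and therefore gives a smaller n for the same z and d.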
3.3. We thank the reviewers for the positive criticism. For clarification, knowing the number of patients with cancer was not our only objective. We also aimed at knowing the cancer types, the sex and age of the patients under study, their origin by birth, the method of cancer diagnosis used, the year of diagnosis and whether the patient was referred from another health facility. At the time of the study, the hospital databases could not provide all the information needed and therefore we had to use the files. Randomization was done because of the cost and time constraints. In a facility like KNH that had 17,584 cancer files, abstracting all the information needed from the patient files to the data collection form would cost more and need a longer time than the time allocated to us by the ethics committee.
3.4. We agree with the reviewer's concern. We have modified the randomization section. For clarification, the files in KNH were captured in a database and randomization was purely an automated process since all files have a hospital number (that cannot be disclosed to protect the privacy of the patients). However, in MTRH, the records department was in the middle of updating their databases when this study was being done. Because of this, we were only given an estimate of the number of files, and to achieve the required sample size we used a convenience sampling method.
3.5. Yes, the authors had an objective assessment of the completeness of the records at the two institutions. First, randomization depended on the total number given to us of the number of cancer files available for the five year period. Secondly, to obtain all the data for the questions we had, we needed to use a patient file with complete information.
3.6. We thank the reviewers for the question. The four hospitals initially selected for the study were Kenyatta National Hospital (KNH), Moi Teaching and Referral Hospital (MTRH), Jaramogi Oginga Odinga Teaching and Referral Hospital (JOOTRH) and Coast General Hospital previously known as Coast Province General Hospital (CPGH). At the time, they were all teaching and referral hospitals. KNH and MTRH are national hospitals (level 6) while JOOTRH and CPGH are level 5 hospitals. KNH is located in the former Nairobi province, MTRH is located in the former Rift Valley province, JOOTRH is located in the former Nyanza province and CPGH is located in the former Coast Province. Although we had ethical approval together with a letter of authority from the Ministry of Health giving us the authorization to conduct research in the four facilities, permission to access the files was only granted by the National hospitals. We have included a map in the new manuscript to guide international readers. We are very grateful for the suggestion.
3.7. We are thankful for the suggestion. We have edited the "data analysis" section in the new manuscript.
Comment 4: Results "The authors should be precise in reporting results. "Around 4304 inpatient cancer cases were available in MTRH" is imprecise and unacceptable." "There is a clear difference in the cases of cancer at the KNH compared to MTRH, most probably because one site hosts AMPATH HIV treatment and prevention programs. It is therefore not justifiable to make too much of the differences in the proportion of cancers seen in these two institutions and attempt to extrapolate that to the general population."

Response:
We appreciate the suggestion. We have improved the precision of reporting the results.
We thank the reviewers for the suggestion. We have edited the results section and minimized extrapolating the differences from the two facilities to the general population.

The authors can describe the pattern of cancers presenting to each of these hospitals over a period of time, but this is not "incidence" of those cancers in the population. The pattern of presentation of cancers to specific hospitals is influenced by many factors. An example is illustrated in their paper where the pattern of cancers presenting at the AMPATH related hospital reflects the focus on HIV treatment and prevention at the institution."
Response: 5.1. We thank the reviewers for the positive criticism. We have modified the analysis and improved on the interpretation of the data. We have emphasized where necessary that we were comparing our hospital-based results to other results obtained from population-based studies.
5.2. We agree with the reviewer's concern that the proportions of cancers presenting in specific institutions may not be reflective at a population level. We have highlighted this as a limitation of this study.
5.3. The use of the term "Incidence" was an unintended mistake. We have edited the discussion