A Statistical Analysis of Risk Factors and Biological Behavior in Canine Mammary Tumors: A Multicenter Study

Simple Summary: The increasing incidence of neoplastic disease represents a relentless challenge in veterinary medicine, and many efforts have been made to improve early diagnosis and life expectancy. Canine mammary tumors are the most common neoplasm and one of the leading causes of death in female dogs. Using a large dataset from three academic institutions, we found that dogs with malignant tumors were significantly older than dogs harboring benign tumors and that malignant tumors were significantly larger than their benign counterparts. Moreover, a substantial fraction of malignant tumors was smaller than 1 cm, providing compelling evidence that the size of mammary tumors is a critical, easily detectable, and indirectly prognosis-related clinical factor. We suggest that controlling cancer-related risk factors represents one of the most compelling prevention strategies and that these findings pave the way for further investigations.
Abstract: Canine mammary tumors (CMTs) represent a serious issue in veterinary practice worldwide, and several risk factors are variably implicated in their biology. The present study examines the relationship between risk factors and histological diagnosis in a large CMT dataset from three academic institutions, using classical statistical analysis and supervised machine learning methods. Epidemiological, clinical, and histopathological data of 1866 CMTs were included. Dogs with malignant tumors were significantly older than dogs with benign tumors (9.6 versus 8.7 years, p < 0.001). Malignant tumors were significantly larger than their benign counterparts (2.69 versus 1.7 cm, p < 0.001). Interestingly, 18% of malignant tumors were smaller than 1 cm in diameter, providing compelling evidence that tumor size should be reconsidered during the assessment of TNM-WHO clinical staging. Logistic regression and the machine learning model identified age and tumor size as the best predictors, with an overall diagnostic accuracy of 0.63, suggesting that these risk factors are sufficient but not exhaustive indicators of CMT malignancy. This multicenter study increases the general knowledge of the main epidemiological and clinical risk factors involved in the onset of CMTs and paves the way for further investigations of these factors in association with CMTs and for the application of machine learning technology.

represents stage IV disease, regardless of tumor size, and distant metastasis constitutes stage V. Notably, tumor size is a critical parameter in stages I, II, and III and strongly affects CMT prognosis and outcome. According to MacEwen et al. (1985) [25], dogs with tumors larger than 3.4 cm in diameter have a significantly worse outcome than dogs with smaller tumors, in terms of both remission and survival. Other authors, however, have found a change in prognosis only when tumors are larger than 5 cm [21]. In one study, tumor size was not prognostic when lymph node involvement was detected [24]. Despite these discrepancies, tumor size remains a biologically plausible prognostic factor, since more aggressive tumors grow faster and are therefore larger and more likely to harbor metastatic subclones [8]. Hence, staging systems integrating different clinical parameters provide specific recommendations to guide clinicians' treatment decisions [1,26,27].
In this study, we performed a large retrospective statistical analysis of 1866 CMTs collected from the Departments of Veterinary Medicine of the Universities of Sassari (UNISS), Padua (UNIPD), and Perugia (UNIPG), evaluating breed, spay status, and age as epidemiological risk factors and tumor size as a clinical prognosis-related feature. We analyzed the relationship between these epidemiological-clinical risk factors and the histological diagnosis to test the ability of readily available clinical data to predict the diagnosis and, indirectly, the prognostic outcome. A supervised machine learning technique was compared with classical statistical analysis and used to investigate the ability to predict the diagnosis of CMTs (malignant versus benign).

Materials and Methods
This retrospective study reviewed CMT data from three different tumor databases (UNISS, UNIPD, UNIPG). Approval from the University's Animal Care Ethics Committee was not required because all samples were retrieved from the archives of the pathology laboratories and had originally been collected for diagnostic purposes.
The inclusion criteria for data selection were: dogs with a single mammary neoplasm; an available documented medical history including breed, age, and macroscopic tumor size (as indicated either by the clinician or by the histology laboratory); and a histopathological diagnosis of the neoplasm.
All previous histological diagnoses were updated and classified according to the recent publication of Surgical Pathology of Tumors of Domestic Animals, Volume 2: Mammary tumors [28].

Statistical Analysis-Descriptive Statistics and Univariate Analysis
To determine whether epidemiological (age, breed, spay status) and clinical (tumor size) characteristics were associated with tumor diagnosis, each of these variables was examined in relation to the histological diagnosis. For statistical purposes, breed was classified as pure breed or mixed breed, and age was either treated as a numerical variable (in years) or categorized into four classes (0-4 years, 5-8 years, 9-12 years, and >12 years).
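The two recodings described above can be sketched as small helper functions. This is a minimal Python illustration, not the authors' Stata workflow; the function names, the assumption that age is recorded in whole years, the treatment of the oldest class as covering every age above 12 years, and the set of labels used to recognize mixed-breed records are all hypothetical.

```python
def age_class(age_years: int) -> str:
    """Map an age in whole years to one of the four age classes."""
    if age_years < 0:
        raise ValueError("age must be non-negative")
    if age_years <= 4:
        return "0-4"
    if age_years <= 8:
        return "5-8"
    if age_years <= 12:
        return "9-12"
    return ">12"  # assumption: the oldest class covers every age above 12 years


# Hypothetical labels used to recognize mixed-breed records in the raw data.
MIXED_LABELS = {"mixed", "mixed breed", "crossbreed"}


def breed_class(breed: str) -> str:
    """Collapse a recorded breed name into 'mixed' or 'pure'."""
    return "mixed" if breed.strip().lower() in MIXED_LABELS else "pure"


print([age_class(a) for a in (2, 5, 9, 13)])
print(breed_class("Yorkshire Terrier"), breed_class("Mixed breed"))
```

Using integer cut points guarantees that every recorded age falls into exactly one class, which avoids the gap a literal reading of non-contiguous class limits would leave.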
Following Pena et al., 2013 (and references therein), malignant tumors were grouped into three histological categories (HD3 categories) based on morphological features and biological behavior: group I included in situ carcinoma, simple carcinoma, carcinoma arising in a mixed tumor, complex carcinoma, mixed-type carcinoma, ductal carcinoma, and adenosquamous carcinoma; group II included solid carcinoma, comedocarcinoma, carcinoma and malignant myoepithelioma, and anaplastic carcinoma; group III included the other histological types [24].
Statistical analysis was carried out using Student's t-test for continuous normally distributed variables, and the chi-square (χ²) test and the nonparametric Kruskal-Wallis ANOVA followed by Dunn's post hoc test for comparisons involving categorical variables. Data were analyzed with Stata version 11.2 (StataCorp, 2009), and results were considered significant when p ≤ 0.05.
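As an illustration of the chi-square test used for categorical associations, the sketch below computes Pearson's χ² statistic, its degrees of freedom, and the p-value for an r × c contingency table using only the Python standard library. It is not the Stata implementation used in the study: the p-value comes from a power-series evaluation of the regularized incomplete gamma function, and the example table counts are hypothetical.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with df degrees of freedom.

    Computed as 1 - P(df/2, x/2), where P is the regularized lower incomplete
    gamma function evaluated by its power series (adequate for small tables).
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 0
    while term > total * 1e-16:
        n += 1
        term *= z / (a + n)
        total += term
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def pearson_chi2(table):
    """Pearson chi-square test of independence for an r x c table of counts.

    Returns (statistic, degrees of freedom, p-value).
    """
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(row_tot) - 1) * (len(col_tot) - 1)
    return stat, df, chi2_sf(stat, df)


# Hypothetical 2 x 2 table: rows = pure breed / mixed breed, columns = benign / malignant.
print(pearson_chi2([[10, 20], [20, 10]]))
```

With 2 degrees of freedom the χ² upper tail reduces to e^(-x/2), so `chi2_sf(0.909, 2)` returns approximately 0.635, the p-value reported in the Results for the three Pena categories.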

Statistical Analysis-Multivariate Analysis and Machine Learning Model
Logistic regression analysis was performed to evaluate the influence of the different covariates (age, tumor size, spay status, and breed) on tumor diagnosis. Covariates were selected through a nested likelihood ratio test (Table 1 and Supplementary Materials).
Notes to Table 1: (1) Models are built so that the smaller models are special cases of the larger ones; equivalently, the smaller models are obtained by sequentially setting to 0 the coefficients of the full model (IV). The general form is $\log(\mathrm{odds}) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n$, where $\beta_1, \beta_2, \dots, \beta_n$ are the coefficients of the independent variables (covariates) $X_1, X_2, \dots, X_n$ included in the model, and the odds are calculated as $\mathrm{odds} = \exp(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n)$. (2) Covariates included in the model; intercept not reported. (3) Likelihood ratio test statistic: deviance and p value. (4) Wald test statistic: z and p value. (5) Model parameters $\beta_n$. (6) Exponentiated model parameters $e^{\beta_n}$. (7) Wald 95% confidence interval for an exponentiated model parameter.
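To make the model in the Table 1 notes concrete, the sketch below fits the log-odds model by Newton-Raphson and compares two nested models with a likelihood ratio test, referring the deviance difference D = 2(ℓ_full − ℓ_reduced) to a χ² distribution with one degree of freedom (one coefficient set to 0). It is a minimal pure-Python illustration, not the authors' Stata or R code; the data are hypothetical, with a single binary covariate chosen because the fitted slope must then equal the log odds ratio of the corresponding 2 × 2 table, which gives an easy correctness check.

```python
import math


def sigmoid(t: float) -> float:
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def log1pexp(t: float) -> float:
    """Numerically stable log(1 + exp(t))."""
    return t + math.log1p(math.exp(-t)) if t > 0 else math.log1p(math.exp(t))


def solve(a, b):
    """Solve the small linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def fit_logistic(X, y, max_iter=50, tol=1e-10):
    """Fit log-odds = b0 + b1*x1 + ... + bn*xn by Newton-Raphson.

    Returns (coefficients with the intercept first, maximized log-likelihood).
    """
    rows = [[1.0] + list(x) for x in X]  # intercept column first
    p = len(rows[0])
    beta = [0.0] * p
    for _ in range(max_iter):
        grad = [0.0] * p                      # score vector X'(y - mu)
        info = [[0.0] * p for _ in range(p)]  # information matrix X'WX
        for xi, yi in zip(rows, y):
            mu = sigmoid(sum(b * v for b, v in zip(beta, xi)))
            w = mu * (1.0 - mu)
            for j in range(p):
                grad[j] += (yi - mu) * xi[j]
                for k in range(p):
                    info[j][k] += w * xi[j] * xi[k]
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    loglik = 0.0
    for xi, yi in zip(rows, y):
        eta = sum(b * v for b, v in zip(beta, xi))
        loglik += yi * eta - log1pexp(eta)  # y*log(mu) + (1-y)*log(1-mu)
    return beta, loglik


def lr_test_1df(loglik_reduced, loglik_full):
    """Likelihood ratio test for nested models differing by one coefficient."""
    deviance = 2.0 * (loglik_full - loglik_reduced)
    p_value = math.erfc(math.sqrt(max(deviance, 0.0) / 2.0))  # chi-square(1) upper tail
    return deviance, p_value


# Hypothetical data: one binary covariate; y = 1 for malignant, 0 for benign.
X_full = [[0]] * 50 + [[1]] * 40
y = [1] * 20 + [0] * 30 + [1] * 30 + [0] * 10
X_reduced = [[] for _ in y]  # intercept-only model (coefficient of x set to 0)

beta_full, ll_full = fit_logistic(X_full, y)
beta_red, ll_red = fit_logistic(X_reduced, y)
print("odds ratio for x:", round(math.exp(beta_full[1]), 3))
print("deviance, p:", lr_test_1df(ll_red, ll_full))
```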
The selected continuous covariates were then converted into categorical covariates according to the previously described schemes, generating two further models: the IC model, in which tumor size was encoded according to the WHO TNM system, and the IIC model, in which tumor size was split into five categories as previously reported by Sorenmo et al., 2009 [1,8].
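The two tumor-size encodings can be sketched as follows. The WHO T cut points (T1 <3 cm, T2 3-5 cm, T3 >5 cm) follow the Results; for the five-category scheme only the 0-1 cm, 1-2 cm, and >5 cm groups are named in the text, so the inner cut points at 2 and 3 cm are assumptions, as is the boundary handling (values of exactly 1, 2, or 3 cm go to the upper group and 5 cm to the 3-5 cm group, to match the WHO boundaries). Function names are hypothetical.

```python
def who_t_category(size_cm: float) -> str:
    """WHO T category from the maximal tumor diameter in cm (T1 <3, T2 3-5, T3 >5)."""
    if size_cm < 0:
        raise ValueError("size must be non-negative")
    if size_cm < 3:
        return "T1"
    if size_cm <= 5:
        return "T2"
    return "T3"


def five_group_category(size_cm: float) -> str:
    """Five size groups; the inner cut points at 2 and 3 cm are assumptions."""
    if size_cm < 0:
        raise ValueError("size must be non-negative")
    for upper, label in ((1.0, "0-1 cm"), (2.0, "1-2 cm"), (3.0, "2-3 cm")):
        if size_cm < upper:
            return label
    return "3-5 cm" if size_cm <= 5 else ">5 cm"


sizes = [0.8, 1.5, 2.5, 3.0, 5.0, 6.2]
print([(s, who_t_category(s), five_group_category(s)) for s in sizes])
```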
Machine learning was used to investigate whether the diagnosis of canine mammary neoplasms (malignant versus benign) could be predicted from the recorded epidemiological (breed, spay status, and age) and clinical (tumor size) factors. Models were built in the R programming language with the caret package, using the algorithms provided by the GLM (logistic regression) and GBM (stochastic gradient boosting) libraries [29][30][31][32][33][34][35] (see Supplementary Materials for details). The supervised machine learning technique employed is stochastic gradient boosting, a powerful learning method based on the combination of many simple models. The basic idea is to apply a "weak" learner (here, a decision tree) sequentially to modified versions of the initial data. In the classical boosting formulation, each time a tree is built the data are reweighted to increase the influence of misclassified observations, and the final classification is obtained through a weighted majority vote; gradient boosting generalizes this idea by fitting each new tree to the negative gradient of the loss function of the current ensemble and adding the trees together, and its stochastic variant fits each tree on a random subsample of the training data [36][37][38][39]. To assess the predictive performance of logistic regression (GLM) and stochastic gradient boosting (GBM), a nested cross-validation was performed [39]. The dataset was split into five outer folds, so that in each outer iteration 80% of the cases were used for training and the remaining, non-overlapping 20% for testing. The split was performed randomly within each of the two outcome classes to preserve the overall class distribution of the data. For each of the two classifiers (even if not required for GLM, using the same procedure allows for an easier comparison), hyperparameters were tuned through 10-fold cross-validation repeated five times [34,36]. Continuous features were centered and scaled. The best setup was chosen by maximizing the area under the receiver operating characteristic (ROC) curve [40], and, with these parameters, a final fit was performed on the entire training set.
The final result was obtained by repeating the procedure for each outer split and averaging over the test sets.
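The outer/inner structure of the nested cross-validation can be sketched with the Python standard library alone. This is an illustration of the resampling scheme under stated assumptions, not caret's implementation (in caret, the inner loop roughly corresponds to `trainControl(method = "repeatedcv", number = 10, repeats = 5)`); the round-robin assignment within each outcome class is one simple way to preserve the class distribution, the helper names are hypothetical, and the model fitting and ROC-based tuning steps are left as a comment.

```python
import random


def stratified_kfold(labels, k, seed=0):
    """Yield (train_idx, test_idx) for k stratified folds.

    Cases are shuffled within each outcome class and dealt round-robin into
    folds, so every fold keeps (to within one case) the overall class mix.
    """
    rng = random.Random(seed)
    fold_of = [0] * len(labels)
    for cls in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            fold_of[i] = pos % k
    for f in range(k):
        test = [i for i, g in enumerate(fold_of) if g == f]
        train = [i for i, g in enumerate(fold_of) if g != f]
        yield train, test


def repeated_inner_splits(train_idx, labels, k=10, repeats=5, seed=0):
    """Repeated stratified k-fold built only from the outer training cases (for tuning)."""
    sub_labels = [labels[i] for i in train_idx]
    for r in range(repeats):
        for tr, va in stratified_kfold(sub_labels, k, seed=seed + 1 + r):
            # map positions inside the training subset back to original case indices
            yield [train_idx[i] for i in tr], [train_idx[i] for i in va]


# Class sizes as reported in the Results: 867 benign (B) and 999 malignant (M) tumors.
labels = ["B"] * 867 + ["M"] * 999
for train, test in stratified_kfold(labels, k=5, seed=42):
    inner = list(repeated_inner_splits(train, labels))
    # Here one would tune hyperparameters on `inner` (e.g., by mean ROC AUC),
    # refit on `train`, and score on `test`; the outer scores are then averaged.
    print(len(train), len(test), sum(labels[i] == "M" for i in test), len(inner))
```

Because the inner splits are drawn only from each outer training set, the outer test cases never influence tuning, which is what keeps the averaged outer scores an honest estimate of performance on unseen tumors.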

Descriptive Data and Histological Diagnosis
The databases account for 1866 single mammary neoplasms. According to the histological classification, 867/1866 (46.5%) were benign tumors (BTs) and 999/1866 (53.5%) malignant tumors (MTs). According to the applied classification [28] […] Although BTs occurred predominantly in small-breed dogs, with the Yorkshire Terrier being the most represented breed (64/867; 7.4%), and MTs occurred mostly in German Shepherd dogs (79/999; 7.91%), no statistically significant association was observed between breed and the prevalence of BTs and MTs. Similar results were observed for the three histological malignant categories proposed by Pena (χ²(2) = 0.909, p = 0.635).
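The reported figures can be checked by hand: the percentages follow directly from the counts, and for a χ² statistic with 2 degrees of freedom the upper-tail probability has the closed form e^(-x/2).

```latex
\frac{867}{1866} \approx 0.465, \qquad \frac{999}{1866} \approx 0.535, \qquad
\frac{64}{867} \approx 0.074, \qquad \frac{79}{999} \approx 0.079
p = \Pr\left(\chi^2_{2} > 0.909\right) = e^{-0.909/2} = e^{-0.4545} \approx 0.635
```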

Multivariate Analysis and Machine Learning Model
According to the likelihood ratio and Wald tests performed on the logistic regression, tumor size and age were significantly related to the histological diagnosis, whereas spay status and breed were not (Table 1).
Based on the exponentiated coefficients, with age and tumor size treated as continuous variables (model II), the odds of a malignant tumor increased by 25% per 1 cm increase in tumor size, adjusting for age. Similarly, the logistic regression model estimates a 12% increase in the odds of a malignant tumor per 1-year increase in age, adjusting for tumor size.
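Because the model is additive on the log-odds scale, per-unit odds ratios compound multiplicatively. The short sketch below uses the two odds ratios stated above (1.25 per cm and 1.12 per year) to recover the corresponding coefficients β = ln(OR) and to show how odds ratios combine over larger changes. The baseline probability of 0.5 is purely hypothetical, and the translation back to a probability depends strongly on that baseline.

```python
import math

# Per-unit odds ratios stated in the text for model II.
OR_SIZE_PER_CM = 1.25   # +25% odds of malignancy per 1 cm, adjusted for age
OR_AGE_PER_YEAR = 1.12  # +12% odds of malignancy per 1 year, adjusted for size


def combined_odds_ratio(delta_cm: float, delta_years: float) -> float:
    """Odds ratio for simultaneous changes in size and age (log-odds are additive)."""
    return OR_SIZE_PER_CM ** delta_cm * OR_AGE_PER_YEAR ** delta_years


def updated_probability(baseline_p: float, odds_ratio: float) -> float:
    """Probability after multiplying the baseline odds by odds_ratio."""
    odds = baseline_p / (1.0 - baseline_p) * odds_ratio
    return odds / (1.0 + odds)


print("coefficients:", round(math.log(OR_SIZE_PER_CM), 3), round(math.log(OR_AGE_PER_YEAR), 3))
print("OR for +3 cm:", round(combined_odds_ratio(3, 0), 3))
print("OR for +5 years:", round(combined_odds_ratio(0, 5), 3))
# Hypothetical baseline probability of 0.5; not an estimate from the study.
print("p after +3 cm and +5 years:", round(updated_probability(0.5, combined_odds_ratio(3, 5)), 3))
```

Note that a 25% increase in the odds is not a 25% increase in the probability of malignancy; the conversion in `updated_probability` makes that distinction explicit.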
Furthermore, when the continuous variables were converted into categorical covariates (Table 3), the IC model showed a 2.3- and a 3.6-fold increase in the odds of a malignant tumor when passing from T1 (<3 cm) to T2 (3 to 5 cm) and from T1 (<3 cm) to T3 (>5 cm), respectively (p < 0.05). A similar pattern was present in the IIC model: compared with the reference level (0-1 cm), the odds ratios of all other tumor size groups were larger than 1, progressing from 1.3 (tumor size 1 to 2 cm) to 4.9 (neoplasms larger than 5 cm) (Table 3). In both models, only animals older than 12 years had more than a 2-fold increase in the odds of an MT compared with the baseline age class of 0-4 years.
The predictive performance of the logistic regression in terms of overall accuracy, positive predictive value (PPV), and negative predictive value (NPV) (i.e., the proportions of tumors predicted as malignant (PPV) and as benign (NPV) that were correctly diagnosed) was 0.63 (CI 0.60-0.65), 0.65 (CI 0.63-0.67), and 0.61 (CI 0.57-0.64), respectively. The GBM machine learning model had a predictive performance similar to that of the logistic model (Table 4), probably as a consequence of the small number of predictors in the dataset, which does not allow such a technique to fully exploit its ability to model complex nonlinear relationships possibly present in the data [41,42]. Interestingly, tumor size and age had a relative influence of ~69% and ~30%, respectively, while breed (<1%) and spay status (<1%) were negligible in the gradient boosting model. R code and corresponding output can be found in the Supplementary Material.
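Because PPV and NPV are easily confused with sensitivity and specificity, the sketch below computes all of them from one confusion matrix, taking malignant as the positive class. The counts are hypothetical; in the study, such values would be computed on each outer test set and then averaged, as described in the Methods.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Accuracy, PPV, NPV, sensitivity and specificity from a 2 x 2 confusion matrix.

    Positive class = malignant tumor.
    """
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "ppv": tp / (tp + fp),          # predicted malignant that are truly malignant
        "npv": tn / (tn + fn),          # predicted benign that are truly benign
        "sensitivity": tp / (tp + fn),  # truly malignant that are predicted malignant
        "specificity": tn / (tn + fp),  # truly benign that are predicted benign
    }


# Hypothetical counts for one test set (not taken from the study).
print(classification_metrics(tp=60, fp=30, tn=50, fn=40))
```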

Discussion
In veterinary medicine, the increasing incidence of neoplastic disease represents a relentless challenge for veterinary oncology specialists. Consequently, ongoing research has made many efforts to improve early diagnosis and life expectancy in dogs harboring mammary tumors, and in this context cancer research is mainly focused on the discovery and control of cancer-related risk factors [43,44]. However, a large retrospective statistical analysis relating breed, hormonal status, age, and tumor size to the histological diagnosis and, consequently, to the possible behavior of CMTs had not previously been performed.
In this work, approximately equal proportions of benign (46.5%) and malignant (53.5%) tumors were observed, and benign mixed tumors accounted for the highest number of cases. Mixed neoplasms are the most frequent mammary neoplasms in female dogs and are characterized by the proliferation of both luminal epithelial and interstitial myoepithelial elements admixed with foci of mesenchymal tissues such as cartilage, bone, and fat [28,45]. The most frequent MT was simple tubular or tubulopapillary carcinoma (26.1%), followed by complex carcinoma (13.3%), confirming what has been reported in the literature [10] (and references therein).
In our study, 61% of CMTs were observed in pure-breed dogs, suggesting, as previously described by Sorenmo and colleagues [6], that breed could be a putative risk factor and that certain breeds, such as Miniature Toy, Shih Tzu, and German Shepherd, are prone to developing mammary neoplasms [1,2,6,10,11]. Interestingly, in our study, benign tumors occurred predominantly in small-breed dogs, particularly Yorkshire Terriers, while malignant ones were detected more frequently in German Shepherd dogs. A better prognosis for small breeds has been previously reported in a retrospective multivariate survival analysis [46]. However, given the increasing prevalence of CMTs in small breeds, it is uncertain whether small body size represents a reliable risk factor or whether these data reflect the greater veterinary care given to small breeds compared with larger dogs [10]. In agreement with Salas and collaborators [7], no significant association was observed between breed and the development of BTs or MTs, or with the malignant carcinoma categories proposed by Pena et al., 2013 [24]. Similarly, breed had a negligible influence in the logistic and GBM machine learning models (<1%), consistent with the considerable divergence between studies regarding breed as a CMT risk factor. Moreover, considering that mutations in the BRCA1 and BRCA2 genes and their protein products have been variably associated with the development of CMTs, a definitive conclusion about breed-related CMT risk should be sought in the context of genetic research [12][13][14][15].
Age is considered one of the most important risk factors for developing mammary tumors, with a peak incidence between 8 and 11 years and younger dogs more prone to BTs [6][7][8][9]. These data appear to be confirmed by our study, in full agreement with what was reported by Sorenmo [6].
Notably, simple MTs occurred at an older age than nonsimple ones. According to different authors [47,48], simple carcinomas have a poorer prognosis than complex ones, confirming, as proposed by Pena and collaborators [24], that age should be considered an indirect but strong prognostic factor. These data are further supported by the multivariate analysis, in which a 12% increase in the odds of an MT per 1-year increase in age was observed.
Hormonal exposure is a well-documented risk factor for canine mammary tumors, and steroid hormones, mainly 17β-estradiol (E2), are involved in cell proliferation and exert an antiapoptotic effect that favors the neoplastic process [10,49]. Furthermore, the landmark publication by Schneider et al. in 1969 reported that mammary tumors occurred in 0.05% of females spayed before the first heat cycle, and that this incidence increased to 8% and 26% when the animals were spayed after the first and second heat, respectively [16]. As a consequence, reproductive health policies promoting spaying at a very early stage of life have had a double beneficial effect, contributing to the reduction in the number of stray dogs and preventing mammary neoplasm development. Likewise, in our study, 83% of mammary neoplasms were diagnosed in unspayed dogs, substantiating the protective effect of ovariohysterectomy described by several authors [16,18,19].
However, the lack of a significant association between spay status (spayed versus unspayed) and tumor malignancy (BTs versus MTs) could suggest that hormonal influence exerts an effect unrelated to CMT malignancy, although 39% of the tumors observed in our cohort of spayed dogs were simple MTs, which are generally associated with an overall poorer prognosis than complex tumors [47,48]. Nevertheless, considering that our dataset lacks information on the age of dogs at spaying and that most of the tumors occurred in unspayed dogs, probably as a consequence of ethical concerns about gonadectomy in Mediterranean countries, caution is warranted in generalizing the role of hormonal status in the onset of CMTs.
Tumor size is considered one of the main macroscopic findings related to CMT behavior. In the present study, we also considered the role of tumor size as a clinical, prognosis-related CMT factor, demonstrating that BTs were smaller than MTs, as previously reported by Sorenmo et al., 2009 [8]. Furthermore, tumor diameter was related to the histological malignant categories proposed by Pena [24], with smaller neoplasms associated with a better prognosis than larger ones. However, when tumor size was considered according to the WHO classification and the five categories proposed by Sorenmo [8], 62.5% of carcinomas were smaller than 3 cm and 18% were smaller than 1 cm. Interestingly, these data conflict with those described by Sorenmo et al., 2009 [8], who reported that only 3% of MTs were smaller than 1 cm, providing compelling evidence that tumor size should be carefully evaluated during the assessment of TNM-WHO clinical staging, as previously suggested by Pena [24].
Supporting these data, the logistic regression identified age and size as the best predictors, with an overall diagnostic accuracy of 0.63 and low positive and negative predictive values. This level of accuracy is probably related to the small number of predictors used in our model. A similar predictive performance was observed with one of the most powerful machine learning models, suggesting that age and size are sufficient but not exhaustive parameters for the diagnosis of CMTs. Given the dramatic breakthroughs in artificial intelligence and machine learning and their growing role in cancer research, it is highly plausible that these technologies will transform the way cancer risk factors and their broader impact are investigated, toward a tailor-made, personalized approach to animal patients.

Conclusions
In conclusion, this multicenter retrospective study, including a large number of CMTs from different academic institutions, offers a unique opportunity to increase the overall knowledge of the main factors involved in canine mammary tumor onset. In our study, the observation that a substantial proportion of MTs are smaller than 1 cm suggests the need to reconsider the size (T) parameter in the TNM system and paves the way for the development of tools to investigate and control clinical risk factors for small tumors.
Funding: This research was funded by "Università degli Studi di Sassari, fondo di Ateneo per la ricerca 2019".