Abstract
Background: Recurrence is a cornerstone of breast cancer behavior, intrinsically related to mortality. Despite its relevance, it is rarely recorded in most breast cancer datasets, which hinders research on its prediction. Objectives: To evaluate the performance of machine learning techniques applied to the prediction of breast cancer recurrence. Material and Methods: Review of published works that applied machine learning techniques to local and open-source databases between 1997 and 2014. Results: The review showed that it is difficult to obtain a representative dataset for breast cancer recurrence and that there is no consensus on the best set of predictors for this disease. High accuracy is often achieved at the expense of sensitivity. The missing-data and class-imbalance problems are rarely addressed, and the chosen performance metrics are often inappropriate for the context. Discussion and Conclusions: Although many different techniques have been applied, prediction of breast cancer recurrence remains an open problem. Combining different machine learning techniques and defining standard predictors for breast cancer recurrence appear to be the main directions for obtaining better results.
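The abstract's point that high accuracy can mask poor sensitivity is easy to demonstrate. The following minimal sketch (illustrative only, not code or data from the reviewed works; the 9:1 class ratio is a hypothetical imbalance) shows that a trivial classifier which always predicts "no recurrence" reaches 90% accuracy while detecting zero recurrence cases:

```python
# Illustration of the accuracy-vs-sensitivity trade-off on imbalanced data.
# The class ratio and "classifier" below are hypothetical, for exposition only.

def evaluate(y_true, y_pred):
    """Return (accuracy, sensitivity) for binary labels, 1 = recurrence."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, sensitivity

# 90 non-recurrence (0) vs. 10 recurrence (1) cases: a 9:1 imbalance.
y_true = [0] * 90 + [1] * 10
y_pred = [0] * 100  # majority-class "classifier": always predicts no recurrence

acc, sens = evaluate(y_true, y_pred)
print(f"accuracy={acc:.2f} sensitivity={sens:.2f}")  # accuracy=0.90 sensitivity=0.00
```

This is why the review argues that accuracy alone is an inappropriate metric in this context: on imbalanced recurrence data, sensitivity (recall on the recurrence class) must be reported alongside it.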
Index Terms
- Predicting Breast Cancer Recurrence Using Machine Learning Techniques: A Systematic Review