Abstract
The aim of this study is to explore trends in retracted publications in the life sciences and biomedical sciences along several axes: time, countries, journals and impact factors, and topics. The dataset comprises nearly seven thousand publications, the entirety of retractions visible through PubMed as of August 2019. Data were collected from PubMed, Wikipedia, and WikiData and analyzed with respect to the above-mentioned axes. Importantly, I employ state-of-the-art analysis and visualization techniques from natural language processing (NLP) to understand the topics in the retracted literature. To highlight a few results, the analyses demonstrate an increasing rate of retraction over time and noticeable differences in publication quality (as measured by journal impact factor) among the top countries. Moreover, while molecular biology and cancer dominate retractions, a number of retractions are not related to biology. The methods and results of this study can be applied to track the nature and evolution of retractions in the life sciences on an ongoing basis, thus contributing to the health of this research ecosystem.
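As a rough illustration of the time axis mentioned in the abstract, the sketch below counts retractions per year and fits a least-squares slope to those counts. It is not the author's pipeline: the record layout, the function names, and the toy values are all hypothetical and chosen only for the example. Raw yearly counts are also not a rate; a true retraction rate would additionally divide by the number of publications in each year.

```python
from collections import Counter

# Hypothetical toy records: (identifier, retraction year). Not real data.
TOY_RECORDS = [
    ("id-001", 2015),
    ("id-002", 2016),
    ("id-003", 2016),
    ("id-004", 2017),
    ("id-005", 2017),
    ("id-006", 2017),
]


def retractions_per_year(records):
    """Count retractions for each year, returned as a year-sorted dict."""
    counts = Counter(year for _, year in records)
    return dict(sorted(counts.items()))


def trend_slope(counts_by_year):
    """Ordinary least-squares slope of yearly count against year."""
    years = list(counts_by_year)
    counts = list(counts_by_year.values())
    n = len(years)
    if n < 2:
        raise ValueError("need at least two distinct years")
    mean_x = sum(years) / n
    mean_y = sum(counts) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, counts))
    sxx = sum((x - mean_x) ** 2 for x in years)
    return sxy / sxx


counts = retractions_per_year(TOY_RECORDS)
print(counts)
print(trend_slope(counts))
```

A positive slope on real data would be consistent with a growing number of retractions, but it says nothing about the cause, and normalizing by publication volume is needed before speaking of a rate.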
Data availability
The dataset is available in the GitHub repository (https://github.com/bbhatt001/Retracted_Life_Sciences_Literature).
Author information
Contributions
Bhatt designed the study, collected and analyzed the data and wrote the paper.
Ethics declarations
Conflict of interest
This research did not receive any funding. The author declares no conflict of interest.
About this article
Cite this article
Bhatt, B. A multi-perspective analysis of retractions in life sciences. Scientometrics 126, 4039–4054 (2021). https://doi.org/10.1007/s11192-021-03907-0
DOI: https://doi.org/10.1007/s11192-021-03907-0