Handling Imbalanced Data With Weighted Logistic Regression and Propensity Score Matching methods: The Case of P2P Money Transfers

Lavlin Agrawal, Pavankumar Mulgund, Raj Sharman

Source Title: Journal of Database Management (JDM) 35(1)

ISSN: 1063-8016 | EISSN: 1533-8010 | EISBN13: 9798369324301 | DOI: 10.4018/JDM.335888

MLA

Agrawal, Lavlin, et al. "Handling Imbalanced Data With Weighted Logistic Regression and Propensity Score Matching methods: The Case of P2P Money Transfers." JDM vol.35, no.1 2024: pp.1-37. http://doi.org/10.4018/JDM.335888

APA

Agrawal, L., Mulgund, P., & Sharman, R. (2024). Handling Imbalanced Data With Weighted Logistic Regression and Propensity Score Matching methods: The Case of P2P Money Transfers. Journal of Database Management (JDM), 35(1), 1-37. http://doi.org/10.4018/JDM.335888

Chicago

Agrawal, Lavlin, Pavankumar Mulgund, and Raj Sharman. "Handling Imbalanced Data With Weighted Logistic Regression and Propensity Score Matching methods: The Case of P2P Money Transfers," Journal of Database Management (JDM) 35, no.1: 1-37. http://doi.org/10.4018/JDM.335888

Abstract

The adoption of empirical methods for secondary data analysis has surged in IS research. However, secondary data is often incomplete, skewed, and imbalanced. Consequently, there is growing recognition of the importance of the empirical techniques and methodological decisions used to navigate such issues. Yet methodological guidance remains scarce, especially in the form of a worked case study that demonstrates the challenges of imbalanced datasets and offers prescriptive guidance on how to deal with them. Using data on P2P money transfer services, this article presents a running example that analyzes the same dataset with several different methods. It then compares the outcomes of these choices and explicates the rationale behind decisions such as the inclusion and categorization of variables, parameter setting, and model selection. Finally, the article discusses certain regression models, such as weighted logistic regression and propensity score matching, and when they should be used.
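
As a rough illustration of the weighted logistic regression mentioned in the abstract, the sketch below fits a one-feature logistic model by gradient descent, once with equal observation weights and once with class weights. It is not the authors' implementation and does not use their data: the synthetic sample (190 majority cases, 10 minority cases), the weighting rule n / (2 * n_class), the learning rate, and every function name are assumptions made for this example.

```python
import math
import random


def balanced_class_weights(labels):
    """Assumed heuristic: weight each class by n / (2 * n_class),
    so that both classes carry the same total weight."""
    n = len(labels)
    n_pos = sum(labels)
    n_neg = n - n_pos
    return {0: n / (2 * n_neg), 1: n / (2 * n_pos)}


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def weighted_log_loss(xs, ys, ws, slope, intercept):
    """Weighted negative log-likelihood, normalized by the total weight."""
    total = 0.0
    for x, y, w in zip(xs, ys, ws):
        p = sigmoid(slope * x + intercept)
        p = min(max(p, 1e-12), 1 - 1e-12)  # guard against log(0)
        total -= w * (y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / sum(ws)


def fit_logistic(xs, ys, ws, lr=0.1, steps=1000):
    """Plain gradient descent on the weighted log-loss, starting from zero."""
    slope, intercept = 0.0, 0.0
    w_sum = sum(ws)
    for _ in range(steps):
        g_slope = g_int = 0.0
        for x, y, w in zip(xs, ys, ws):
            err = w * (sigmoid(slope * x + intercept) - y)
            g_slope += err * x
            g_int += err
        slope -= lr * g_slope / w_sum
        intercept -= lr * g_int / w_sum
    return slope, intercept


def minority_recall(xs, ys, slope, intercept, threshold=0.5):
    """Share of true minority cases predicted as minority at the threshold."""
    hits = sum(1 for x, y in zip(xs, ys)
               if y == 1 and sigmoid(slope * x + intercept) >= threshold)
    return hits / sum(ys)


# Synthetic, imbalanced sample (assumption): 5% minority class.
random.seed(0)
xs = ([random.gauss(0.0, 1.0) for _ in range(190)]
      + [random.gauss(1.5, 1.0) for _ in range(10)])
ys = [0] * 190 + [1] * 10

class_w = balanced_class_weights(ys)
ws_equal = [1.0] * len(ys)
ws_balanced = [class_w[y] for y in ys]

fit_equal = fit_logistic(xs, ys, ws_equal)
fit_balanced = fit_logistic(xs, ys, ws_balanced)

print("class weights:", class_w)
print("unweighted (slope, intercept):", fit_equal)
print("weighted   (slope, intercept):", fit_balanced)
print("minority recall, unweighted:", minority_recall(xs, ys, *fit_equal))
print("minority recall, weighted:  ", minority_recall(xs, ys, *fit_balanced))
```

Comparing the two fitted intercepts and the minority-class recall at a 0.5 threshold shows how reweighting changes the fitted model on the same data. Which weighting rule and threshold suit a real dataset is a modelling decision that this sketch does not settle.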