ABSTRACT
Diagnosis prediction is becoming crucial for developing healthcare plans for patients based on Electronic Health Records (EHRs). Existing works usually improve diagnosis prediction by learning accurate disease representations, and many of them try to capture inclusive relations based on the hierarchical structures of existing disease ontologies, such as the one defined by ICD-9 codes. However, they overlook exclusive relations, which reflect different and complementary perspectives of the ICD-9 structure, and thus fail to accurately represent relations among diseases and ICD-9 codes. To this end, we propose to project disease embeddings and ICD-9 code embeddings into boxes, where a box is an axis-aligned hyperrectangle that occupies a geometric region, so that two boxes can clearly "include" or "exclude" each other. On top of the box embeddings, we further obtain patient embeddings by aggregating the disease representations for diagnosis prediction. Extensive experiments on two real-world EHR datasets show significant performance gains from our proposed framework, with average improvements of 6.04% for diagnosis prediction over state-of-the-art competitors.
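To make the "include" and "exclude" relations between boxes concrete, the following is a minimal sketch in plain Python. It is not the paper's implementation: the `Box` class, the helper names, the hard (non-differentiable) geometric tests, and the mean-of-centers patient aggregation are all assumptions chosen for illustration, and the coordinates are made up rather than learned.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Box:
    """Axis-aligned hyperrectangle given by its lower and upper corners (hypothetical type)."""
    lower: Tuple[float, ...]
    upper: Tuple[float, ...]

    def __post_init__(self) -> None:
        if len(self.lower) != len(self.upper):
            raise ValueError("lower and upper corners must have the same dimension")
        if any(lo > hi for lo, hi in zip(self.lower, self.upper)):
            raise ValueError("each lower coordinate must not exceed the upper one")


def intersection_volume(a: Box, b: Box) -> float:
    """Volume of the overlap of two boxes; 0.0 when they do not overlap."""
    volume = 1.0
    for lo_a, hi_a, lo_b, hi_b in zip(a.lower, a.upper, b.lower, b.upper):
        side = min(hi_a, hi_b) - max(lo_a, lo_b)
        if side <= 0:
            return 0.0
        volume *= side
    return volume


def includes(outer: Box, inner: Box) -> bool:
    """True when `inner` lies entirely inside `outer` (inclusive relation)."""
    return all(
        lo_o <= lo_i and hi_i <= hi_o
        for lo_o, hi_o, lo_i, hi_i in zip(outer.lower, outer.upper, inner.lower, inner.upper)
    )


def excludes(a: Box, b: Box) -> bool:
    """True when the two boxes share no volume (exclusive relation)."""
    return intersection_volume(a, b) == 0.0


def mean_center(boxes: Sequence[Box]) -> List[float]:
    """Assumed aggregation: average the box centers to get one patient vector."""
    if not boxes:
        raise ValueError("need at least one box")
    centers = [[(lo + hi) / 2 for lo, hi in zip(b.lower, b.upper)] for b in boxes]
    return [sum(c[d] for c in centers) / len(centers) for d in range(len(centers[0]))]


# Made-up 2-D coordinates: a parent code box, a disease inside it, and an unrelated disease.
parent_code = Box((0.0, 0.0), (4.0, 4.0))
disease_a = Box((1.0, 1.0), (2.0, 2.0))
disease_b = Box((5.0, 5.0), (6.0, 6.0))

print(includes(parent_code, disease_a))   # True
print(excludes(disease_a, disease_b))     # True
print(mean_center([disease_a, disease_b]))  # [3.5, 3.5]
```

A trainable model would typically replace these hard tests with smooth volume-based scores so that gradients can flow; the sketch only illustrates the geometry that the abstract describes.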
Index Terms
- BoxCare: A Box Embedding Model for Disease Representation and Diagnosis Prediction in Healthcare Data