March 2023 Probabilistic HIV recency classification—a logistic regression without labeled individual level training data
Ben Sheng, Changcheng Li, Le Bao, Runze Li
Ann. Appl. Stat. 17(1): 108-129 (March 2023). DOI: 10.1214/22-AOAS1618

Abstract

Accurate HIV incidence estimation, based on individual recent infection status (recent vs. long-term infection), is important for monitoring the epidemic, targeting interventions to those at greatest risk of new infection, and evaluating existing prevention and treatment programs. Since 2015, the population-based HIV impact assessment (PHIA) individual-level surveys have been implemented in the most-affected countries in sub-Saharan Africa. PHIA is a nationally representative, HIV-focused survey that combines household visits with key questions and cutting-edge technologies, such as biomarker tests for HIV antibody and HIV viral load. These tests offer a unique opportunity to distinguish recent infection from long-term infection and to provide relevant HIV information by age, gender, and location. In this article we propose a semisupervised logistic regression model for estimating individual-level HIV recency status. It incorporates information from multiple data sources: the PHIA survey, where the true HIV recency status is unknown, and cohort studies from the literature, where the relationship between HIV recency status and the covariates is presented in the form of a contingency table. It also uses national-level HIV incidence estimates from the epidemiology model. Applying the proposed model to Malawi PHIA data, we demonstrate that our approach is more accurate for individual-level estimation and more appropriate for estimating HIV recency rates at aggregated levels than the current practice, the binary classification tree (BCT).

Funding Statement

Le Bao, Ben Sheng, and Changcheng Li are supported by NIH/NIAID 5-R01-AI136664. Runze Li and Changcheng Li are supported by National Science Foundation, DMS 1820702, DMS 1953196, and DMS 2015539.

Acknowledgments

All authors contributed equally and are listed in order of seniority.

We thank the PHIA teams for making the data publicly available for research purposes. The authors are also thankful to the Editor, Associate Editor, and two anonymous reviewers for their constructive suggestions and comments. An early version of this work was completed while C. Li worked at The Pennsylvania State University as a postdoctoral fellow.

To whom correspondence should be addressed: E-mail: lebao@psu.edu.

Citation

Ben Sheng. Changcheng Li. Le Bao. Runze Li. "Probabilistic HIV recency classification—a logistic regression without labeled individual level training data." Ann. Appl. Stat. 17 (1) 108 - 129, March 2023. https://doi.org/10.1214/22-AOAS1618

Information

Received: 1 March 2021; Revised: 1 January 2022; Published: March 2023
First available in Project Euclid: 24 January 2023

MathSciNet: MR4539024
zbMATH: 07656969
Digital Object Identifier: 10.1214/22-AOAS1618

Keywords: contingency table, HIV incidence, HIV recency, logistic regression, weakly supervised learning

Rights: Copyright © 2023 Institute of Mathematical Statistics

JOURNAL ARTICLE
22 PAGES

