Southern African Treatment Resistance Network (SATuRN) RegaDB HIV drug resistance and clinical management database: supporting patient management, surveillance and research in southern Africa

Abstract
Substantial amounts of data have been generated from patient management and academic exercises designed to better understand the human immunodeficiency virus (HIV) epidemic and design interventions to control it. A number of specialized databases have been designed to manage huge data sets from HIV cohort, vaccine, host genomic and drug resistance studies. Besides databases from cohort studies, most of the online databases contain limited curated data and are thus sequence repositories. HIV drug resistance has been shown to have a great potential to derail the progress made thus far through antiretroviral therapy. Thus, substantial resources have been invested in generating drug resistance data for patient management and surveillance purposes. Unfortunately, most of the data currently available relate to subtype B even though >60% of the epidemic is caused by HIV-1 subtype C. A consortium of clinicians, scientists, public health experts and policy makers working in southern Africa came together and formed a network, the Southern African Treatment and Resistance Network (SATuRN), with the aim of increasing curated HIV-1 subtype C and tuberculosis drug resistance data. This article describes the HIV-1 data curation process using the SATuRN Rega database. The data curation is a manual and time-consuming process done by clinical, laboratory and data curation specialists. Access to the highly curated data sets is through applications that are reviewed by the SATuRN executive committee. Examples of research outputs from the analysis of the curated data include trends in the level of transmitted drug resistance in South Africa, analysis of the levels of acquired resistance among patients failing therapy and factors associated with the absence of genotypic evidence of drug resistance among patients failing therapy. All these studies have been important for informing first- and second-line therapy.
This database is a free password-protected open source database available on www.bioafrica.net. Database URL: http://www.bioafrica.net/regadb/


Introduction
The international response to the human immunodeficiency virus (HIV) pandemic has been characterized by unprecedented speed and depth, made possible only through collaboration among clinicians, scientists and civil society across many disciplines and research groups. As a result of this process, the study of the HIV epidemic has generated a substantial amount of data. In addition, because HIV is one of the first organisms for which genomic data have been used to identify resistance to drugs and to trace its origin, hundreds of thousands of subgenomic regions have been sequenced both from the management of patients infected by the virus and from academic endeavors to try to understand the epidemic and to discover and develop interventions.
Numerous public databases, such as the Los Alamos HIV Database and the Stanford HIV Drug Resistance Database, have been created to manage the burgeoning number of HIV genomic data sets (1,2). These databases provide platforms for academics to share and compare data as well as to answer new research questions not envisioned by the original investigators. However, a limitation of these public databases is that they do not store and curate data before publication. Moreover, these databases are primarily sequence repositories and contain limited clinical data. Therefore, there is a need for databases to manipulate and curate primary data that include both sequence data and associated clinical, treatment and monitoring data.
In this article, we describe the online database of the Southern African Treatment and Resistance Network (SATuRN). The network is a consortium of clinicians, scientists, public health experts and policy makers (3). The network has 24 member institutions working in southern Africa at the epicenter of the HIV and tuberculosis (TB) epidemics. To foster collaboration among members and to curate primary data, the SATuRN RegaDB HIV Drug Resistance and Clinical Management Database (http://www.bioafrica.net/regadb/index.html) was established. RegaDB is an integrated open source relational database for the management and analysis of HIV treatment, monitoring and resistance data (4). The database is designed to facilitate individual patient management and also to enable real-time surveillance and research, ultimately to inform public health policies in the region.
SATuRN member sites are encouraged to use RegaDB for real-time management of patients failing antiretroviral therapy (ART). It is configured to incorporate a number of online analytic tools such as the Rega HIV Subtyping tool (5) and drug resistance interpretation tools such as HIVDB (6), REGA (7) and ANRS (8,9). The database is used to produce genotypic resistance reports with specialist advice on therapeutic options tailored to the clinical and treatment history data provided by the clinicians. As a consequence, medical officers and nurses are impelled to provide detailed data to receive robust advice, and this cycle drives quality data curation.
In this article, we first describe the data collection methods followed by a description of the data curation process. We also provide information on database users, data sharing and access policy. We conclude the article with examples of how our curated data have been used for biological discovery.

Data collection and curation process: primary data
A clinical case report form was developed for the collection of demographic and clinical information, treatment data and laboratory monitoring data for input into the SATuRN RegaDB (Supplementary Information). All data are anonymized at the point of entry to the database, but secure records held separately allow linkage to patient care and access to follow-up data. The medical officers or nurses managing the patient send the completed form together with a blood sample submitted for genotypic resistance testing to a central laboratory. The sample is used to produce an HIV-1 DNA sequence of the protease (PR) and reverse transcriptase (RT) genes, which encode the HIV-1 proteins targeted by first- and second-line ART regimens in southern Africa. Genotypic resistance testing is performed using the in-house SATuRN/Life Technologies method (10). The genotype is translated, and drug resistance mutations are characterized with the use of the drug resistance algorithms contained within RegaDB.
The demographic data collected include age and sex. Patient identifiers such as names and national identity (ID) numbers are not stored in the RegaDB database. All the stored data are anonymized, and it is the responsibility of the medical officers to link the data to the patient for clinical management. Laboratory data include the patient's viral load and CD4+ cell count results; in addition, hepatitis B virus surface antigen (HBsAg), creatinine clearance, hemoglobin and alanine aminotransferase results, pertinent to future treatment decisions, are also included.
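The translation and mutation-listing step can be illustrated with a minimal Python sketch. This is not the RegaDB implementation: the function names, the toy reference and sample fragments, and the position numbering (relative to the start of the fragment, not to HIV-1 gene coordinates) are all assumptions made for illustration. Real resistance interpretation also requires alignment to a reference and an algorithm such as HIVDB, REGA or ANRS, which the sketch does not attempt.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(nucleotides: str) -> str:
    """Translate an in-frame nucleotide string; unknown codons become 'X'."""
    seq = nucleotides.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    codons = (seq[i:i + 3] for i in range(0, usable, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def list_mutations(reference_nt: str, sample_nt: str) -> list[str]:
    """Return amino acid differences as '<ref><position><sample>' strings.

    Assumes both sequences are already aligned and start in the same frame.
    """
    ref_aa, sample_aa = translate(reference_nt), translate(sample_nt)
    return [
        f"{ref}{pos}{obs}"
        for pos, (ref, obs) in enumerate(zip(ref_aa, sample_aa), start=1)
        if ref != obs
    ]


# Hypothetical toy fragments, not real patient or reference data.
REFERENCE = "CAGTATATGGATGAT"  # translates to QYMDD
SAMPLE = "CAGTATGTGGATGAT"     # translates to QYVDD

print(list_mutations(REFERENCE, SAMPLE))  # ['M3V']
```

Reporting differences against a reference in this way gives the list of mutations that an interpretation algorithm would then score.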
The treatment information focuses on the ART history as well as treatments for comorbid conditions, in particular TB and hepatitis B virus, as these influence the selection of ART regimens. The start and end dates of specific ART regimens are recorded, with dosages and reasons for any substitution or switch of ART. The clinical form also includes a series of questions relating to adherence, which are based on the assessment tools within the South African national guidelines, and social factors that may influence adherence such as alcohol intake.
Once the HIV genomic data are generated and uploaded, they pass through a quality control step, which includes analysis for deletions, insertions and frame shifts, as well as for contamination using the Basic Local Alignment Search Tool (BLAST) (11) and phylogenetic methods. The genomic sequence is also subtyped using phylogenetic methods that can identify recombinants (5). The PR and RT proteins are analyzed by a pre-selected drug resistance algorithm, such as the Stanford HIVDB, REGA or ANRS algorithm, to identify drug resistance mutations and provide an assessment of the level of resistance. The drug resistance interpretation, together with a graphical history of the patient's ART and laboratory monitoring history, is presented in the form of a report in MS Word (.doc) and rich text format (.rtf) (Figure 1).
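Part of such a quality control step can be sketched in Python. The checks below are simplified assumptions, not the checks RegaDB actually runs: the function name and flag wording are hypothetical, frame shifts are only inferred from sequence length and stop codons, and contamination screening with BLAST or phylogenetic methods is outside the scope of the sketch.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
UNAMBIGUOUS_BASES = set("ACGT")


def quality_flags(sequence: str) -> list[str]:
    """Return human-readable warnings for a coding nucleotide sequence.

    An empty list means none of these simple checks raised a concern.
    """
    seq = "".join(sequence.upper().split())  # drop whitespace and newlines
    if not seq:
        return ["empty sequence"]

    flags = []

    # Characters other than A, C, G, T (including IUPAC mixture codes)
    # are flagged for review rather than rejected outright.
    unexpected = sorted(set(seq) - UNAMBIGUOUS_BASES)
    if unexpected:
        flags.append("non-ACGT characters: " + "".join(unexpected))

    # A length that is not a multiple of three suggests an insertion or
    # deletion that shifts the reading frame.
    if len(seq) % 3 != 0:
        flags.append("length not a multiple of 3 (possible frame shift)")

    # Stop codons inside PR or RT are not expected in a viable sequence.
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    stops = [pos for pos, codon in enumerate(codons, start=1)
             if codon in STOP_CODONS]
    if stops:
        flags.append(f"stop codon at codon position(s) {stops}")

    return flags


print(quality_flags("ATGGCCAAA"))  # []
print(quality_flags("ATGTAAGCC"))  # ['stop codon at codon position(s) [2]']
```

Returning a list of flags, rather than a pass or fail value, leaves the final judgment to the laboratory staff reviewing the sequence.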
All SATuRN member institutions are encouraged to use SATuRN's RegaDB database reports for the management of patients failing ART. Specialist clinician advice in the report adheres, wherever possible, to regimens included within national treatment guidelines. Non-standard regimens are only recommended when there is a strong justification, and any request for ARTs outside the national guidelines requires further approval by the Department of Health. The public health approach to ART is of crucial importance to southern Africa, as >4 million patients are on treatment in the region and the great majority of patients receive treatment at primary health care clinics, with limited pharmaceutical and medical support.
The data are reviewed at many stages to seek out inconsistency and improve quality. Figure 2 shows a model example of the data sources and the review and curation steps. In this model, data from the clinical case report form are added to the database by a team of data curators. Two independent data clerks are involved in the data entry process: one enters the data and the other reviews the data to ensure accurate data entry. In addition, the clinical form is sent to the laboratory staff and the specialist clinicians. The laboratory staff are trained to interpret the clinical chart and to ensure that it makes biological sense (for example, a suppressed viral load result is not usually plausible unless the patient was on treatment at the time). The specialist clinicians ensure that the resistance levels are consistent with the drug regimens received by the patient, and write a detailed therapy recommendation (Figure 1) that is added to the database.
The whole process described in Figure 2 takes, on average, 10-14 days. This process is part of the HIV Treatment Failure Clinic (HIV-TFC) model (12). It is important that the data curation is expedited as the patient follow-up visit is normally scheduled for 3-4 weeks after the initial visit at which the blood sample was taken. To receive a detailed report including specialist clinicians' comments, the medical officers managing the patients need to complete the clinical form accurately. This motivates them to supply accurate information. Although the processes of data cleaning, validation and curation described here might seem laborious and time-consuming, the quality data obtained from this process are worth the investment. This is especially true in the context of reports of significant weaknesses of the general public health program data in the region (13,14). Furthermore, the use of a specialized physician to interpret drug resistance results is the norm in South Africa and Botswana public health HIV treatment programs (15). However, a process evaluation has been performed to review the whole system and identify areas that still need further optimization so as to deliver the most cost-efficient system while maintaining quality.
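The plausibility example given above (a suppressed viral load outside any treatment period) lends itself to an automated pre-check. The Python sketch below is illustrative only: in the process described here this review is done by trained staff, and the record types, field names and the 50 copies/mL suppression threshold are assumptions, not part of RegaDB.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Assumed threshold for a "suppressed" viral load, in copies/mL.
SUPPRESSED_BELOW = 50.0


@dataclass
class Regimen:
    start: date
    end: Optional[date] = None  # None means the regimen is ongoing


@dataclass
class ViralLoad:
    taken: date
    copies_per_ml: float


def on_treatment(regimens: list[Regimen], day: date) -> bool:
    """True if any recorded regimen covers the given day."""
    return any(
        r.start <= day and (r.end is None or day <= r.end)
        for r in regimens
    )


def implausible_results(regimens: list[Regimen],
                        viral_loads: list[ViralLoad]) -> list[ViralLoad]:
    """Return suppressed viral loads taken while no regimen was recorded."""
    return [
        vl for vl in viral_loads
        if vl.copies_per_ml < SUPPRESSED_BELOW
        and not on_treatment(regimens, vl.taken)
    ]


# Hypothetical chart: one regimen and two suppressed results.
history = [Regimen(start=date(2009, 3, 1), end=date(2011, 6, 30))]
results = [
    ViralLoad(date(2008, 11, 15), 40.0),  # before ART started: flagged
    ViralLoad(date(2010, 1, 20), 40.0),   # during ART: plausible
]
for vl in implausible_results(history, results):
    print("review:", vl.taken.isoformat(), vl.copies_per_ml)
```

A flagged result is not necessarily wrong (the text notes only that it is "not usually plausible"), so the output is best treated as a prompt for manual review of the chart.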

Database users and data access policies
SATuRN RegaDB genotypes are deposited in GenBank after publication. The data deposited in GenBank are limited to basic demographic information (age and sex), country of origin and isolation year. Genomic data are also deposited in the Stanford HIVDB, complemented with a list of antiretroviral drugs received before the genotype. Furthermore, RegaDB has an automatic export function that is compatible with GenBank and Stanford HIVDB. This is an important process for SATuRN, as one of its aims is to increase the amount of public genotypic drug resistance data in Africa. As part of this process, we have also installed and made publicly available the first mirror of the Stanford HIVDB in Africa (3).

Examples of biological discoveries
A literature review and data analysis was performed to review the temporal trends of HIV-1 transmitted drug resistance (TDR) in South Africa (16). Publicly available data were retrieved either from GenBank or by direct request to original authors. Ten data sets with 1618 sequences collected between 2000 and 2010 were pooled, with 72 sequences from recent sero-converters from the Africa Centre's (AC) 2010 HIV survey in KwaZulu-Natal, South Africa. All of the data were curated and stored in SATuRN RegaDB and were analyzed using the Calibrated Population Resistance Program (17). There was no evidence of TDR from the AC samples. The temporal analysis for South Africa showed that 2002 was the year with the highest TDR rate (6.67%, 95% confidence interval: 3.09-13.79%). After 2002, TDR levels decreased to <5% (WHO low-level TDR threshold). There was no statistically significant increase in the interval between 2002 and 2010. These results were published and discussed with the National Department of Health, as they conflicted with a recent publication (18) that pointed to an increase in TDR in KwaZulu-Natal. A large collaborative national survey involving SATuRN investigators and the National Department of Health is currently in progress. Continuous representative TDR surveys are needed to ensure that current first-line regimens remain effective, as an increase in TDR could reverse the gains of ARV rollout.
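A prevalence estimate with a 95% confidence interval, such as the 2002 TDR rate above, can be computed from counts as in the following Python sketch. The Wilson score interval is used here as one common choice; the method actually used in the cited analysis is not stated in this article, and the counts (6 resistant sequences out of 90) are an assumption chosen only because they give a rate of 6.67%.

```python
from math import sqrt


def wilson_interval(successes: int, total: int,
                    z: float = 1.959964) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    z defaults to the normal quantile for a two-sided 95% interval.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= successes <= total:
        raise ValueError("successes must lie between 0 and total")

    p = successes / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half_width = z * sqrt(p * (1 - p) / total
                          + z * z / (4 * total * total)) / denom
    return centre - half_width, centre + half_width


# Assumed counts for illustration; the source reports only percentages.
resistant, sequenced = 6, 90
low, high = wilson_interval(resistant, sequenced)
print(f"TDR {100 * resistant / sequenced:.2f}% "
      f"(95% CI {100 * low:.2f}-{100 * high:.2f}%)")
```

With these assumed counts the interval comes out at approximately 3.09-13.79%, in line with the reported figures, although that agreement does not confirm the underlying counts or method.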
Results from an analysis of data on SATuRN RegaDB for patients with first-line antiretroviral treatment failure in a rural primary health care program in KwaZulu-Natal were recently published (19). A secondary analysis was also done (20) to identify factors associated with the absence of HIV drug resistance in patients with failure of first-line ART, in order to inform adherence strategies and to determine whether unnecessary switches to second-line therapy could be avoided. In total, 243 patients were included in the final analysis, and detailed adherence and clinical information was curated from clinics in the Hlabisa Treatment and Care Programme, South Africa (Appendix). The genotypes were linked to 38 other adherence and clinical variables. This information was reviewed by two data clerks and a medical officer and added to RegaDB. Predictors associated with the absence of drug resistance were analyzed by univariable and multivariable logistic regression methods. This data curation and analysis showed that a number of factors are associated with the absence of drug resistance following ART failure, one of which (baseline CD4+ count) is strongly associated with poor adherence, which can lead to significantly higher levels of immunological failure, putting these patients at increased risk of mortality.
The data stored in SATuRN RegaDB are currently being used in many research projects. This article also serves to provide details of the existence of these longitudinal data sets, which researchers can apply to analyze as described in the SATuRN Manual of Operation. In addition, the HIV drug resistance data curated in the SATuRN RegaDB have provided the information and tools to enable the education and training of health care workers and patients. At the time of writing this publication, clinical cases from the database have been presented to 2050 medical officers and nurses throughout Africa at the annual SATuRN conferences and workshops. Moreover, 15 cases stored within RegaDB, which highlight some of the major challenges involved in managing patients failing ART in the public sector in southern Africa, were collated and published recently as an open access book (21), of which 10 000 copies were freely distributed by SATuRN, Keth'impilo, Médecins Sans Frontières (MSF) and the Southern African HIV Clinicians Society to their medical officer members.