Defining blood hematology reference values in female pig-tailed macaques (Macaca nemestrina) using the Isolation Forest algorithm

Background: Pig-tailed macaques (PTMs) are commonly used as preclinical models to assess antiretroviral drugs for HIV prevention research. Drug toxicities and disease pathologies are often preceded by changes in blood hematology. To better assess the safety profile of pharmaceuticals, we defined normal ranges of hematological values in PTMs using an Isolation Forest (iForest) algorithm. Methods: Eighteen female PTMs were evaluated. Blood was collected 1–24 times per animal for a total of 159 samples. Complete blood counts were performed, and iForest was used to analyze the hematology data to detect outliers. Results: Median, IQR, and ranges were calculated for 13 hematology parameters. From all samples, 22 outliers were detected. These outliers were excluded from the reference index. Conclusions: Using iForest, we defined a normal range for hematology parameters in female PTMs. This reference index can be a valuable tool for future studies evaluating drug toxicities in PTMs.

Macaque models are critical for evaluating pre-exposure prophylaxis (PrEP) modalities for preventing human immunodeficiency virus (HIV) infection.[2][3] The translatability of macaque models for studying HIV pathogenesis and prevention is due in part to the close phylogeny and physiology shared between macaques and humans.[6][7][8] Pig-tailed macaques (PTMs; Macaca nemestrina) are the preferred species for modeling vaginal HIV infection because of features shared with the human female reproductive tract; moreover, unlike rhesus macaques (Macaca mulatta), which are seasonal breeders, female PTMs experience a lunar menstrual cycle.[9][10] Hormonal fluctuations associated with the menstrual cycle have been shown to affect SIV/SHIV susceptibility, which should be carefully considered when evaluating the efficacy of prevention products.[11][12][14][15] The combination of extensive historical data and continued refinements makes the PTM model extremely valuable for assessing the PrEP efficacy of novel HIV prevention products.[16][17]

Safety and efficacy evaluations are priorities in preclinical studies, and as long-acting PrEP products become more prevalent, these studies can require lengthy assessments. Blood hematology is a common gauge of safety, particularly for drug toxicities and disease pathologies, which are often preceded by changes in complete blood counts (CBCs). As such, animal research facilities monitor the CBCs of animals to appraise health and aid in diagnosing illness.[19][20][21][22] However, published hematology data for southern PTMs (Macaca nemestrina) are limited to wild-caught or infant animals.[23][24] One recent study published values for a single adult PTM, but this analysis was limited to white blood cells (WBC) and did not include subsets such as lymphocytes (LY), monocytes (MO), or granulocytes (GR).[25]

Thus, we aimed to characterize the hematology of female PTMs, as these animals are important for preclinical HIV research, and defining baseline ranges for healthy animals is imperative for interpreting study outcomes. To create a reference index of hematology parameters, we evaluated a large sample set consisting of 159 CBCs from 18 female PTMs. We identified statistical outliers in our data set using the Isolation Forest (iForest) model, an unsupervised algorithm that builds an ensemble of isolation trees (iTrees), each of which repeatedly partitions the data using randomly selected features (i.e., hematology parameters) and randomly selected split values.[26] The underlying assumption is that outliers, being few and different, are isolated in fewer partitions (i.e., shorter paths from the root of a tree) than normal data points. Using iForest, we established a reference index of blood hematology parameters in female PTMs. This reference will be an asset for preclinical research studies involving PTMs, particularly those evaluating drug toxicities and other disease pathologies.
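For reference, the anomaly score from the original iForest formulation makes the "fewer partitions" assumption concrete. The notation below is ours rather than the study's: h(x) is the path length of a point x in one iTree, E[h(x)] is its mean over all iTrees, and c(ψ) is the average path length of an unsuccessful binary-search-tree search over a subsample of size ψ, used as a normalizer:

```latex
s(x,\psi) = 2^{-\frac{E[h(x)]}{c(\psi)}}, \qquad
c(\psi) = 2H(\psi - 1) - \frac{2(\psi - 1)}{\psi}, \qquad
H(i) \approx \ln(i) + 0.5772156649
```

Scores approaching 1 correspond to very short average paths and indicate likely outliers, whereas scores well below 0.5 indicate normal observations.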

| Humane care guidelines
The authors confirm that the ethical policies of the journal, as noted on the journal's author guidelines page, have been adhered to and that the appropriate ethical review committee approval has been received. This research was conducted under a Centers for Disease Control and Prevention Institutional Animal Care and Use Committee-approved protocol in compliance with the Animal Welfare Act, PHS Policy, and other federal statutes and regulations relating to the use of animals in research. Animals were housed in an AAALAC International-accredited facility that adheres to the principles stated in the Guide for the Care and Use of Laboratory Animals.[27]

| Study animals
Eighteen female PTMs with a median [range] age of 9 [7-17] years and a median [range] weight of 6.72 [5.65-12.05] kg were evaluated. Animals were not undergoing any clinical or protocol-related treatment with antiretrovirals or other drugs, except for sedatives required for procedures. Blood was collected between 1 and 24 times per animal, for a total of 159 samples. All macaques were purpose-bred and were confirmed by the vendors to be negative for Mycobacterium tuberculosis, Trichuris spp., Shigella, Campylobacter, Salmonella, Yersinia, simian retroviruses (SRVs), SIV, and simian T-lymphotropic viruses. The macaques were not screened for herpes B virus.

| Statistical analyses
Outliers were detected using the iForest model as implemented in the scikit-learn package (v1.0.2) in Python (v3.9.13).[28] This algorithm builds an ensemble of isolation trees (iTrees), which consist of nodes and branches. The n_estimators parameter (the number of iTrees) was tuned by identifying the point of convergence (n_estimators = 1000) in plots of the average path length against the number of iTrees for randomly selected sets of data points. The max_features parameter was set equal to the total number of hematology parameters (max_features = 13). Histograms of anomaly scores were examined to assess the presence of outliers, and the contamination parameter was set to 'auto'. All statistical analyses were conducted in Python (v3.9.13) and R (v4.3.1).
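The convergence check described above can be sketched as follows. This is a minimal illustration on simulated data, not the authors' script. scikit-learn does not report path lengths directly, so the helper mean_path_lengths (a hypothetical name) recovers E[h(x)] by inverting the anomaly score returned by IsolationForest.score_samples, using s(x) = 2^(-E[h(x)]/c(ψ)) with ψ = max_samples_. The grid of n_estimators values, the five sampled points, and the random seeds are arbitrary assumptions.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

EULER_GAMMA = 0.5772156649


def c(n: int) -> float:
    """Average path length of an unsuccessful BST search over n points (iForest normalizer)."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (np.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def mean_path_lengths(X: np.ndarray, idx: np.ndarray, n_estimators: int, seed: int = 0) -> np.ndarray:
    """Hypothetical helper: average iTree path length E[h(x)] for the rows X[idx]."""
    forest = IsolationForest(
        n_estimators=n_estimators,
        max_features=X.shape[1],  # each tree may split on all parameters
        contamination="auto",
        random_state=seed,
    ).fit(X)
    s = -forest.score_samples(X[idx])              # anomaly score s(x) in (0, 1)
    return -np.log2(s) * c(forest.max_samples_)    # invert s = 2 ** (-E[h(x)] / c(psi))


# Simulated stand-in for a 159 x 13 CBC matrix (NOT the study data).
rng = np.random.default_rng(42)
X = rng.normal(size=(159, 13))
idx = rng.choice(len(X), size=5, replace=False)   # randomly selected data points

grid = [10, 50, 100, 250, 500, 1000]
curves = np.array([mean_path_lengths(X, idx, n) for n in grid])  # shape (len(grid), 5)
for n, row in zip(grid, curves):
    print(n, np.round(row, 2))
```

Plotting each column of curves against grid (for example, with matplotlib) gives a path-length-versus-iTrees plot of the kind used to choose n_estimators; the value beyond which the curves stop changing appreciably is taken as the convergence point.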

| Complete blood counts
A total of 159 CBCs were analyzed from a cohort of 18 female PTMs. As outlined in Table 1, blood was collected between 1 and 24 times from each animal, with a median [range] of 5 [1-24] collections per animal. Macaques had blood collected weekly for up to 24 weeks; the number of collections differed according to each animal's allocation to other studies. Thirteen hematology parameters were analyzed.

| Outlier detection
Outliers were detected using the iForest algorithm with the following settings: max_features = 13, contamination = "auto", and n_estimators = 1000. In total, 22 outliers from 10 of the 18 animals were detected in the initial set of 159 samples (Table 1).
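A minimal sketch of this configuration in scikit-learn follows. It uses simulated data with hypothetical column names (param_0 to param_12) and hypothetical animal IDs, so it illustrates the setup rather than reproducing the study's code. It also shows what contamination="auto" means in scikit-learn: the decision threshold (offset_) is fixed at -0.5, so a sample is labeled an outlier exactly when its anomaly score exceeds 0.5. The final tally is laid out like Table 1.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

# Simulated stand-in data (NOT the study data): 159 CBCs x 13 parameters,
# plus a hypothetical "animal_id" column standing in for the macaque IDs.
rng = np.random.default_rng(0)
params = [f"param_{i}" for i in range(13)]  # hypothetical parameter names
cbc = pd.DataFrame(rng.normal(size=(159, 13)), columns=params)
cbc["animal_id"] = rng.choice([f"A{i:02d}" for i in range(18)], size=159)

forest = IsolationForest(n_estimators=1000, max_features=13,
                         contamination="auto", random_state=0)
labels = forest.fit_predict(cbc[params])  # -1 = outlier, 1 = inlier
cbc["outlier"] = labels == -1

# With contamination="auto", offset_ is fixed at -0.5, so a sample is flagged
# exactly when its anomaly score s(x) = -score_samples(x) is greater than 0.5.
s = -forest.score_samples(cbc[params])
assert forest.offset_ == -0.5
assert np.array_equal(cbc["outlier"].to_numpy(), s > 0.5)

# Per-animal tally of samples and flagged outliers.
tally = cbc.groupby("animal_id")["outlier"].agg(samples="count", outliers="sum")
print(tally)
print("total outliers:", int(cbc["outlier"].sum()))
```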

| Blood hematology reference index
Each hematology parameter was plotted to visualize the effect of outlier removal on the distribution of the data set (Figure 1). As expected, removing outliers narrowed the ranges, whereas the medians remained similar.
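The comparison summarized in Figure 1 can also be expressed numerically with a small helper that reports the median, IQR, and range of each parameter. The sketch below is illustrative only: the data are simulated, the parameter names are hypothetical, and the two appended extreme rows simply stand in for samples flagged by iForest.

```python
import numpy as np
import pandas as pd


def reference_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Median, IQR (Q1 and Q3), and range for each column of a CBC table."""
    return pd.DataFrame({
        "median": df.median(),
        "q1": df.quantile(0.25),
        "q3": df.quantile(0.75),
        "min": df.min(),
        "max": df.max(),
    })


# Simulated stand-in (NOT the study data): two hypothetical parameters, with two
# extreme rows appended to mimic samples flagged as outliers.
rng = np.random.default_rng(1)
normal = pd.DataFrame({"param_1": rng.normal(8, 2, 137), "param_2": rng.normal(4, 1, 137)})
extreme = pd.DataFrame({"param_1": [25.0, 30.0], "param_2": [12.0, 0.2]})
all_samples = pd.concat([normal, extreme], ignore_index=True)
is_outlier = np.r_[np.zeros(137, dtype=bool), np.ones(2, dtype=bool)]

before = reference_summary(all_samples)
after = reference_summary(all_samples[~is_outlier])  # reference index from retained samples
print(pd.concat({"all samples": before, "outliers removed": after}, axis=1).round(2))
```

Because the reference index is computed from a subset of the samples, its minimum and maximum can only tighten, whereas the median typically shifts little when a small fraction of samples is removed.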
Age and weight are important factors to consider when allocating animals to research study groups.[29] Therefore, we sought to determine whether age and weight are determinants of hematological outliers. The median [range] age for outlier observations (n = 22) was 16.27 [6.63-17.31] years, whereas that for non-outlier observations (n = 137) was 8.67 [6.75-17.35] years. Similarly, the median [range] weight for outlier observations (n = 22) was 7.19 [6.01-12.05] kg, whereas that for non-outlier observations (n = 137) was 6.68 [5.65-10.63] kg (Figure 2). After outliers were excluded, four of the initial 18 PTMs had no remaining samples and were removed from further analysis (Table 1: BB125, Z15211, Z14140, and Z15336). A reference index of 13 blood hematology parameters was then created from 137 CBCs from 14 PTMs, with a median [range] of 7 [1-24] collections per animal (Table 2).
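The age and weight comparison above, reported as median [range] per group with one row per CBC observation, can be sketched with pandas as follows. The values are simulated placeholders that only match the group sizes (22 outlier and 137 non-outlier observations); they are not the study data, and the column names are hypothetical.

```python
import numpy as np
import pandas as pd

# Simulated stand-in (NOT the study data): one row per CBC with the animal's
# age and weight and the iForest outlier flag.
rng = np.random.default_rng(2)
obs = pd.DataFrame({
    "age_years": rng.uniform(6.5, 17.5, 159),
    "weight_kg": rng.uniform(5.5, 12.0, 159),
    "outlier": np.r_[np.ones(22, dtype=bool), np.zeros(137, dtype=bool)],
})


def median_range(x: pd.Series) -> str:
    """Format a column as 'median [min-max]', the convention used in the text."""
    return f"{x.median():.2f} [{x.min():.2f}-{x.max():.2f}]"


rows = {}
for is_out, grp in obs.groupby("outlier"):
    label = "outliers" if is_out else "non-outliers"
    rows[label] = {
        "n": len(grp),
        "age (years)": median_range(grp["age_years"]),
        "weight (kg)": median_range(grp["weight_kg"]),
    }
summary = pd.DataFrame(rows).T
print(summary)
```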
Here, we report a reference index of blood hematology parameters in female PTMs, an important animal model for HIV research. These values provide an understanding of natural and expected variation in PTM hematology and can serve as a baseline for future studies evaluating drug safety and toxicity. The importance of such reference values was recently highlighted when clinical trials of the anti-HIV drug islatravir (ISL) were placed on clinical hold by the FDA after lymphopenia was observed in participants.[30] Before its clinical hold, ISL was an attractive candidate for long-acting PrEP because of its high antiviral potency, long half-life, and predicted efficacy in macaque models.[31][32] Unfortunately, the effect of ISL on LY was not fully investigated during early preclinical development. By providing an index of PTM reference values, our study may help determine whether the PTM model can detect lymphocyte toxicity caused by ISL or other novel anti-HIV drugs before clinical advancement.
Hematology reference values have been reported for rhesus macaques (Macaca mulatta),[18][19] Japanese macaques (Macaca fuscata),[20] and cynomolgus macaques (Macaca fascicularis).[21] Comparisons of their CBC values with ours indicate many similarities across species and captivity status. However, there are some differences: WBC counts were 2-fold greater in rhesus, Japanese, and cynomolgus macaques than in PTMs, and LY and GR counts were also 2-fold greater in rhesus and Japanese macaques than in PTMs. Interestingly, MO values were similar across all the studies. These variations may, however, also reflect differences in the hematology analyzer used or in other factors such as sedation and captivity status, which make direct comparisons across studies difficult. Despite these caveats, the comparisons suggest that each species may have distinct hematological characteristics that cannot be predicted from related species, underscoring the need for a reference index specific to PTMs.
For nonhuman primate research, the Animal Research: Reporting of In Vivo Experiments (ARRIVE) guidelines are followed when reporting study groups and results.[34][35] Thus, we sought to determine whether age and/or weight affected the likelihood that an animal would have an outlier value in its hematology. The median age and weight of the outlier observations were higher than those of the non-outlier observations, suggesting that outliers were concentrated among the oldest and heaviest animals in our cohort, although the IQRs of the two groups overlapped to some extent. These data reinforce the importance of age- and weight-matched randomization in macaque research. Aging animals also have less regular hormonal cycles and low progesterone levels.[36][37] We did not assess the menstrual cycle, but future studies evaluating hematology in cycling female macaques could help clarify the effects of age and cycle phase on SHIV susceptibility.
To create a comprehensive reference index from this large data set, we used the iForest algorithm to detect outliers in place of classic statistical methods. We considered several other outlier detection algorithms, including Mahalanobis distance and one-class Support Vector Machine (SVM). The Mahalanobis distance method can identify outliers in multivariate normal data sets.[38] However, 10 of the 13 hematology parameters were not normally distributed, so parametric methods such as Mahalanobis distance were not appropriate. One-class SVM is a non-parametric method that is appropriate for multivariate data. However, the iForest method was selected over one-class SVM because of its faster run times and reduced sensitivity to outliers, which allowed the model to be trained on contaminated data (data containing both normal and outlier data points). Additionally, the iForest method has several adjustable model parameters, including contamination, max_features, and n_estimators.
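To illustrate why a parametric method was ruled out, the sketch below checks each parameter for normality and, for contrast, computes the squared Mahalanobis distance with a chi-square cutoff. The study does not state which normality test it used, so the Shapiro-Wilk test (scipy.stats.shapiro), the 0.05 threshold, and the 97.5% chi-square quantile are all assumptions. The data are simulated, with ten skewed (log-normal) columns that mimic non-normal parameters.

```python
import numpy as np
from scipy import stats

# Simulated stand-in (NOT the study data): 159 samples x 13 parameters,
# 3 normal columns and 10 skewed (log-normal) columns.
rng = np.random.default_rng(3)
X = np.column_stack(
    [rng.normal(size=159) for _ in range(3)]
    + [rng.lognormal(sigma=0.8, size=159) for _ in range(10)]
)

# Per-parameter Shapiro-Wilk test; p < 0.05 is taken here as evidence of non-normality.
pvals = np.array([stats.shapiro(X[:, j]).pvalue for j in range(X.shape[1])])
print("non-normal parameters:", int((pvals < 0.05).sum()), "of", X.shape[1])

# Squared Mahalanobis distance D^2 = (x - mu)^T S^-1 (x - mu). Under multivariate
# normality D^2 follows a chi-square distribution with p degrees of freedom, which
# justifies the cutoff below; with skewed parameters the cutoff is no longer calibrated.
mu = X.mean(axis=0)
S_inv = np.linalg.inv(np.cov(X, rowvar=False))
diff = X - mu
d2 = np.einsum("ij,jk,ik->i", diff, S_inv, diff)
cutoff = stats.chi2.ppf(0.975, df=X.shape[1])
print("Mahalanobis flags:", int((d2 > cutoff).sum()))
```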
In this study, we analyzed 159 CBCs from 18 female PTMs, with a median of five blood collections per animal. One limitation of this data set is that sampling frequency differed among animals, which may bias the results toward animals that were sampled more often. However, longitudinal sampling is also important for understanding how factors such as weight gain or loss and aging alter intra-animal variability. In addition, other factors, such as infection, may induce immune responses that cause substantial changes in hematology. In this study, all animals were confirmed to be free of Mycobacterium tuberculosis, Trichuris, Shigella, Campylobacter, Salmonella, Yersinia, and SRVs through initial and routine health screenings. However, the macaques were not screened for ubiquitous pathogens such as herpes B virus or for other active infections that could alter CBCs.[39] Sedative medications and the frequency of sedation can also affect hematology. Previous studies have observed decreases in LY, hemoglobin (HGB), hematocrit (HCT), and other parameters in rhesus macaques following sedation.[40] All blood samples in this study were collected from anesthetized animals, providing consistency across the samples tested. However, the effects of sedation should be considered when comparing these results with those of studies that do not sedate animals for blood collection. Our results were obtained from whole blood collected in K2EDTA BD Vacutainer® tubes and analyzed with the Beckman Coulter AcT diff2 Hematology Analyzer.[41] Other analyzers and alternative blood diluents may yield different results. Lastly, another potential limitation of our study is the iForest contamination parameter, which is tunable. Setting it to a fixed value requires a prior estimate of the proportion of outliers in the data. Because of the paucity of historical data on which to base such an estimate, the automatic ('auto') setting was used instead for outlier analysis.
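The dependence on a prior outlier estimate can be seen by refitting the same forest with different contamination values. In scikit-learn, a numeric contamination places the threshold at that percentile of the training anomaly scores, so roughly that fraction of samples is always flagged, whereas "auto" uses the fixed threshold from the original iForest paper. The sketch below uses simulated data and arbitrary contamination values; it is not part of the study's analysis.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Simulated stand-in (NOT the study data): 159 samples x 13 parameters.
rng = np.random.default_rng(4)
X = rng.normal(size=(159, 13))

flagged = {}
for contamination in ["auto", 0.05, 0.10, 0.15]:
    forest = IsolationForest(n_estimators=1000, max_features=13,
                             contamination=contamination, random_state=0)
    # Number of samples labeled -1 (outlier) under this contamination setting.
    flagged[contamination] = int((forest.fit_predict(X) == -1).sum())
print(flagged)
```

With a numeric value, the flagged count tracks the chosen fraction almost by construction, which is why that setting is only as good as the prior estimate behind it.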
In conclusion, we defined a reference index of 13 hematology parameters from 137 CBCs from 14 female PTMs, using an iForest algorithm to detect and remove outliers from the total data set. After removing outliers, we characterized the expected values and ranges for each hematology parameter. This reference index will be useful for monitoring the health of PTMs and will provide important baseline data for future studies that use this animal model for preclinical research.

TABLE 1. Sampling frequency and the number of outliers for each animal. One hundred and fifty-nine blood samples were collected from 18 female pig-tailed macaques for hematology analysis. The Isolation Forest algorithm was applied to detect statistical outliers.

FIGURE 2. Macaque age and weight distribution. The data distributions of all observations (blue), normal observations (green), and outliers (red) are shown for (A) age and (B) weight. Box-and-whisker plots show the interquartile range (IQR) as the box and the minimum and maximum as the whiskers; the line in each box represents the median.