Soy Milk Consumption in the United States of America: An NHANES Data Report

With the increasing adoption of plant-based diets in the United States, more and more individuals are replacing cow milk with plant-based milk alternatives. Soy milk is a commonly used cow milk substitute characterized by a higher content of polyunsaturated fatty acids and fibers. Despite these favorable characteristics, little is known about the current prevalence of soy milk consumption in the United States. We used data from the National Health and Nutrition Examination Survey (NHANES) to assess soy milk consumption in the United States and to identify potential predictors of its consumption in the US general population. The weighted proportion of individuals reporting soy milk consumption was 2% in the NHANES 2015–2016 cycle and 1.54% in the NHANES 2017–2020 cycle. Non-Hispanic Asian and Black ethnicities (as well as Other Hispanic and Mexican American ethnicities in the 2017–2020 cycle) were associated with significantly higher odds of soy milk consumption. While a college degree and weekly moderate physical activity were associated with significantly higher odds of consuming soy milk (OR: 2.21 and 2.36, respectively), sex was not an important predictor. In light of the putative health benefits of soy milk and its more favorable environmental impact compared with cow milk, future investigations should attempt to identify strategies that may help promote its consumption in selected populations.


Introduction
The plant-based diet is increasingly adopted by the general population in Western countries and has also attracted the interest of the scientific community and the food industry [1][2][3]. As a result, the market has expanded its range of innovative plant-based foods to meet this growing demand [4,5]. The interest in switching to plant-based alternatives frequently derives from ethical considerations and health benefits [6,7] and, more recently, from growing awareness of the environmental issues highlighted in the scientific literature [8][9][10].
Adoption of vegetarian and vegan diets has shown a beneficial effect on cancer incidence [6], and has been associated with a reduction in cardiovascular morbidity and mortality in recent clinical studies [7,11]. These aspects are particularly relevant considering that around one third of cardiovascular and neoplastic diseases in the world could be prevented by increasing fruit and vegetable intake, according to the World Health Organization and the World Cancer Research Fund [12].
With the increasing demand for plant-based foods, the consumption of alternatives to cow milk has also risen, with a forecast global increase of over 10% from 2000 to 2024 and the strongest growth observed in the Asia-Pacific region [13]. At the same time, research has sought to bridge the gap between consumer needs (milk allergy, lactose intolerance, or vegan diets) and commercial options [14][15][16]. The Food and Drug Administration (FDA) and the European Union had previously restricted the term "milk" to the mammary secretion of cows and other mammals [17,18]. However, the FDA recently issued a recommendation on the labelling of plant-based dairy alternatives that addresses the lawful use of the term "milk" [19]. Because consumers are now thoroughly familiar with these foods, the previous terminological restrictions are no longer considered necessary; instead, clear labelling of the products' nutritional properties is recommended. Accurate labelling and fortification of the plant-based products already on the market would allow consumers to assess their content of vitamins and other micronutrients that are usually lacking in these products compared with cow milk [20].
Among plant-based drinks, soy milk is one of the most commonly used substitutes for cow milk [21], and soy is a widely used food in vegetarian diets [22]. Nutritionally, soy milk is the only plant-based alternative to cow milk with a similar protein content [23]. Furthermore, it has a comparable Digestible Indispensable Amino Acid Score, indicating good protein quality [24]. Soy milk is also characterized by a higher content of polyunsaturated fatty acids and fibers and by the absence of cholesterol [25]. These features may help reduce LDL cholesterol levels [26]. In vegetarian diets, replacing cow milk with soy milk could also be advantageous with regard to iron intake, given the virtual absence of iron in cow milk and the possible presence of plant ferritin in soy milk [27].
Soybean crops have a considerable environmental impact, with effects on eutrophication, acidification, and global warming that vary between countries [28], as well as negative social impacts [29]. Notably, soybean is the main source of animal feed [30]: almost 80% of the world's soy production is destined for livestock, including animals used for milk and dairy production [31], whereas only about 2% is used to produce soy milk for human consumption [32].
Used as an alternative to cow milk, soy milk is a more sustainable solution in terms of environmental impact and can be consistent with food security objectives [33]. Although its isoflavone content has raised health concerns, isoflavones may help mitigate menopausal symptoms without critical hormonal or fertility disturbances [34,35]. Moreover, soy milk has shown beneficial antioxidant effects, mainly attributable to its isoflavone content [36].
Based on comments submitted to the FDA, dietitians appear to have a more accurate understanding of plant-based substitutes than other healthcare professionals [37]. More than half of consumers do not believe that dairy products are nutritionally better than plant-based alternatives and think that the latter can be part of a healthy diet [37]. In a sensory evaluation study, soy milk was shown to be the most popular milk alternative across various groups of participants, including omnivores and vegans [38].
Soy milk is one of the most common plant-based alternatives to cow milk and the only plant-based dairy substitute included in the Dietary Guidelines for Americans [39]. Yet, data on its consumption in the US are sparse. This cross-sectional study sought to investigate the prevalence of soy milk consumption in a large, nationally representative sample of American adults (NHANES, National Health and Nutrition Examination Survey) and to better understand its sociodemographic correlates.

Study Population and Design
This analysis is based on data from the NHANES, an ongoing program of studies by the Centers for Disease Control and Prevention designed to comprehensively assess the health and nutritional status of the non-institutionalized U.S. population [40,41]. The NHANES' complex, multistage, stratified, clustered probability sampling design allows for nationally representative assessments of health and nutritional status. Key program characteristics (including recruitment methods, study size, and study execution) have been described in detail elsewhere [39,40]. NHANES was approved by the National Center for Health Statistics (NCHS), and all study participants gave written and oral consent [42].
For this analysis, we used data from two NHANES cycles: (I) the NHANES 2015-2016 cycle and (II) the NHANES 2017-2020 cycle (also called the pre-pandemic cycle) [43,44]. The two cycles were analyzed separately for methodological reasons, because some important variables included in the 2015-2016 cycle were no longer available in the pre-pandemic cycle.

Primary Outcome Variable
Data on soy milk consumption were obtained from the NHANES Diet Behavior and Nutrition questionnaire. This module provides personal interview data on various topics related to dietary behavior and nutrition. Among other items, it includes one question on milk consumption in the past 30 days, which reads: "In the past 30 days, how often did you have milk to drink or on your cereal?" Participants were instructed to include chocolate and other flavored milks as well as hot cocoa made with milk, but not to count small amounts of milk added to coffee or tea. The question did not cover milk used in cooking. Answer options were "never", "rarely (less than once a week)", "sometimes (once a week or more, but less than once a day)", "often (once a day or more)", and "varied". All participants who reported at least occasional milk consumption were further asked: "What type of milk was it? Was it usually...?"
Subsequently, the NHANES inquired about several milk types, including (but not limited to) whole milk, 1% fat milk, skim milk, and soy milk. Participants who indicated soy milk consumption at any frequency, that is, at least "rarely (less than once a week)", were considered soy milk consumers. Those who denied soy milk consumption were considered non-consumers.
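To make the outcome definition concrete, the following Python sketch classifies one respondent from the two interview items described above. It is illustrative only. The numeric response codes and argument names are hypothetical placeholders, not the actual NHANES variable names or codes, which must be taken from the Diet Behavior and Nutrition codebook. Treating "varied" as occasional milk consumption is also our assumption.

```python
from typing import Optional

# Hypothetical codes for the 30-day milk frequency item (NOT the official
# NHANES codes; check the Diet Behavior and Nutrition codebook).
NEVER = 0
RARELY = 1      # less than once a week
SOMETIMES = 2   # once a week or more, but less than once a day
OFTEN = 3       # once a day or more
VARIED = 4      # assumed here to count as occasional milk consumption


def soy_milk_status(milk_freq: Optional[int], named_soy: Optional[bool]) -> Optional[int]:
    """Classify one respondent: 1 = soy milk consumer, 0 = non-consumer, None = missing.

    milk_freq: answer to the past-30-day milk frequency question.
    named_soy: whether soy milk was reported as a usual milk type; this
               follow-up is only asked of people who drink milk at all.
    """
    if milk_freq is None:
        return None          # frequency unknown: cannot be classified
    if milk_freq == NEVER:
        return 0             # no milk of any kind, hence no soy milk
    if named_soy is None:
        return None          # milk drinker whose milk type is missing
    return 1 if named_soy else 0
```

For example, `soy_milk_status(RARELY, True)` yields 1 and `soy_milk_status(NEVER, None)` yields 0. Under this sketch, the "never" answer resolves the outcome without the follow-up item.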

Covariates
Covariates for this analysis included sociodemographic data (gender, race/ethnicity, age, marital status, educational level, annual household income, household size, number of persons in the household, household food security category) as well as self-perceived general health status. Moreover, we included diabetes status (as assessed by the question: "Have you ever been told by a doctor or health professional that you have diabetes or sugar diabetes?"), smoking status (as assessed by the question "Have you smoked at least 100 cigarettes in your entire life?"), and physical activity (as assessed by the question "In a typical week do you do any moderate-intensity sports, fitness, or recreational activities that cause a small increase in breathing or heart rate such as brisk walking, bicycling, swimming, or volleyball for at least 10 min continuously?"). Apart from age, which was treated as a continuous variable, all variables were treated as categorical.

Inclusion and Exclusion Criteria
We included all participants who met the following criteria: age ≥ 20 years, available demographic data, and available milk intake data. Individuals with incomplete or missing data were excluded from this study.
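The inclusion criteria can be expressed as a per-respondent indicator, as in the minimal sketch below. The field names are hypothetical. The sketch returns a 0/1 flag rather than deleting rows. This mirrors common survey-analysis practice, such as Stata's `subpop()` option, which keeps the full design information for variance estimation. This section does not state whether subpopulation estimation or row deletion was used, so that choice is an assumption here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Respondent:
    age: Optional[int]            # age in years at interview
    demographics_complete: bool   # hypothetical flag: all demographic covariates present
    soy_status: Optional[int]     # 1 = consumer, 0 = non-consumer, None = no milk data


def in_analytic_sample(r: Respondent) -> bool:
    """Inclusion rule: adult (>= 20 years) with complete demographic and milk intake data."""
    return (
        r.age is not None
        and r.age >= 20
        and r.demographics_complete
        and r.soy_status is not None
    )


def subpop_flags(respondents: List[Respondent]) -> List[int]:
    """0/1 indicator per respondent, suitable for a subpopulation analysis."""
    return [int(in_analytic_sample(r)) for r in respondents]
```

For example, a 45-year-old with complete data is flagged 1. A 19-year-old, or anyone with missing milk data, is flagged 0 but stays in the file for the design-based variance calculation.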

Statistical Analysis
Statistical analysis was performed with Stata 14 (StataCorp. 2015. Stata Statistical Software: Release 14. College Station, TX, USA: StataCorp LP). Each analysis used the primary sampling unit (PSU) variable for variance estimation and the pseudo-stratum variable for stratification, both of which are provided with the two NHANES cycles. To avoid missing standard errors caused by strata containing a single PSU, we used Stata's "singleunit(scaled)" option. This option is a scaled version of singleunit(certainty). It introduces a scaling factor derived from using the average of the variances from the strata with multiple PSUs for each stratum with a singleton PSU [45].
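The sketch below illustrates our reading of the singleunit(scaled) rule for a linearized variance. Strata with a single PSU contribute nothing directly, as under singleunit(certainty). Each of them is then credited with the average variance of the multi-PSU strata. This amounts to multiplying the sum by H/(H - H_single), where H is the total number of strata and H_single is the number of singleton strata. The input structure, a mapping from stratum to PSU-level totals of the linearized variable, is an assumption for illustration. Finite population corrections are ignored, and Stata's internal implementation may differ in detail.

```python
from typing import Dict, List


def stratum_variance(psu_totals: List[float]) -> float:
    """With-replacement variance contribution of one stratum.

    psu_totals: weighted PSU totals of the linearized variable in this stratum.
    """
    n = len(psu_totals)
    if n < 2:
        raise ValueError("a stratum needs at least two PSUs")
    mean = sum(psu_totals) / n
    return n / (n - 1) * sum((z - mean) ** 2 for z in psu_totals)


def variance_singleunit_scaled(strata: Dict[int, List[float]]) -> float:
    """Total variance with singleton strata handled in the 'scaled' manner.

    Multi-PSU strata are summed as usual; each singleton stratum is credited
    with their average variance, i.e. the sum is scaled by H / (H - H_single).
    """
    multi = [totals for totals in strata.values() if len(totals) >= 2]
    if not multi:
        raise ValueError("no stratum has more than one PSU")
    base = sum(stratum_variance(totals) for totals in multi)
    return base * len(strata) / len(multi)
```

For example, with `{1: [10.0, 14.0], 2: [3.0, 5.0], 3: [7.0]}`, strata 1 and 2 contribute 16 and 4. The singleton stratum 3 is credited with their average of 10, giving a total of 30.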
We used histograms and subpopulation summary statistics to check the data for normality. Categorical variables were described with their weighted proportions and standard errors in parentheses; normally distributed variables were described with their means and standard errors in parentheses. All standard errors were estimated using Taylor series linearization to account for the complex NHANES sampling design. All weighting procedures followed the most recent applied survey data analysis techniques described by Heeringa, West, and Berglund [46] and complied with the current National Center for Health Statistics (NCHS) data presentation standards for proportions [47]. All weighted proportions were screened for reliability using the user-written Stata post-estimation command "kg_nchs" [48]. Proportions that did not meet the NCHS presentation standards were flagged as potentially unreliable and clearly marked with superscript letters.
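Taylor series linearization for a weighted proportion p = Σ w·y / Σ w can be sketched as follows. The method builds a linearized score for each respondent and sums it within PSUs. It then takes the with-replacement variance of the PSU totals within each stratum and adds these up across strata. This is a minimal standard-library sketch under simplifying assumptions: PSUs are sampled with replacement within strata, there is no finite population correction, and singleton strata are skipped (certainty-style handling). The record layout is hypothetical. The sketch does not implement the NCHS reliability criteria, which additionally rely on the effective sample size and Korn-Graubard confidence intervals.

```python
import math
from collections import defaultdict
from typing import Iterable, Tuple

# (stratum, psu, survey weight, y) with y = 1 for soy milk consumers, else 0
Record = Tuple[int, int, float, int]


def weighted_proportion_se(records: Iterable[Record]) -> Tuple[float, float]:
    """Return (weighted proportion, Taylor-linearized standard error)."""
    recs = list(records)
    total_w = sum(w for _, _, w, _ in recs)
    p = sum(w * y for _, _, w, y in recs) / total_w

    # Linearized score of the ratio estimator, summed within each PSU
    psu_totals = defaultdict(float)
    for stratum, psu, w, y in recs:
        psu_totals[(stratum, psu)] += w * (y - p) / total_w

    by_stratum = defaultdict(list)
    for (stratum, _), z in psu_totals.items():
        by_stratum[stratum].append(z)

    var = 0.0
    for zs in by_stratum.values():
        n = len(zs)
        if n < 2:
            continue  # singleton stratum skipped (certainty-style handling)
        mean = sum(zs) / n
        var += n / (n - 1) * sum((z - mean) ** 2 for z in zs)
    return p, math.sqrt(var)
```

With two strata of two PSUs each, `[(1, 1, 1.0, 1), (1, 2, 1.0, 0), (2, 1, 2.0, 1), (2, 2, 2.0, 0)]`, the sketch returns p = 0.5 and a standard error of √5/6, about 0.373.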
Stata's design-adjusted Rao-Scott test and multivariate logistic regression models were used to examine potential associations between self-reported soy milk intake and various predictor variables. Logistic regression models were constructed following the recommendations of Heeringa, West, and Berglund [46]. First, we conducted exploratory bivariate analyses to check the eligibility of potential candidate predictors of soy milk intake. Candidate predictors of scientific interest that showed a bivariate association with the response variable at p < 0.25 were included in the multivariate logistic models. Subsequently, we evaluated the contribution of each predictor to the multivariate model using Wald tests. All variables except age were entered into the regression models as categorical variables. At least two models were constructed for each cycle, based on the available cycle-specific predictors. A p-value < 0.05 was considered statistically significant.
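The two mechanical steps of this model-building strategy are shown below: screening candidates at p < 0.25, and converting logit coefficients into the odds ratios reported in the Results. The predictor names and p-values in the example are made up for illustration. The critical value is left to the caller. For NHANES, it is commonly taken from a t distribution with design degrees of freedom equal to the number of PSUs minus the number of strata. The Rao-Scott tests and the survey-weighted model fits themselves are not reproduced here.

```python
import math
from typing import Dict, List, Tuple


def screen_candidates(bivariate_p: Dict[str, float], threshold: float = 0.25) -> List[str]:
    """Keep candidate predictors whose bivariate test p-value is below the threshold."""
    return sorted(name for name, p in bivariate_p.items() if p < threshold)


def odds_ratio_ci(beta: float, se: float, crit: float) -> Tuple[float, float, float]:
    """Convert a logit coefficient and its standard error into an OR with a Wald-type CI.

    crit: critical value, e.g. a t quantile with the design degrees of freedom.
    Returns (odds ratio, lower bound, upper bound) on the odds-ratio scale.
    """
    return math.exp(beta), math.exp(beta - crit * se), math.exp(beta + crit * se)
```

For instance, hypothetical p-values of 0.40 for sex, 0.20 for household size, 0.03 for education, and 0.001 for race/ethnicity would retain education, household size, and race/ethnicity. An OR of 2.36 corresponds to a logit coefficient of ln(2.36), about 0.86.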

Results
The NHANES 2015-2016 analytic sample comprised n = 5264 participants with a full data set, of whom n = 132 reported soy milk consumption; this may be extrapolated to an estimated 4,427,078 US adults. The NHANES 2017-2020 pre-pandemic cycle included n = 8511 participants with a full data set, of whom n = 187 reported soy milk consumption; this may be extrapolated to an estimated 3,460,784 US adults. Figure 1 shows the participant inclusion flow chart for the 2015-2016 cycle (left) and for the NHANES 2017-2020 pre-pandemic cycle (right).
The weighted proportion of individuals reporting soy milk consumption in the 2015-2016 cycle was 2%, whereas it was 1.54% in the NHANES 2017-2020 pre-pandemic cycle.
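Both the extrapolated counts and the weighted proportions come from the survey weights. The estimated number of consumers is the sum of the weights of the sample consumers, and the weighted proportion divides that sum by the sum of all weights in the analytic sample. The minimal sketch below uses made-up weights. This section does not name the weight variable used in each cycle, so the input is simply assumed to be the appropriate interview weight. As a rough arithmetic check, 4,427,078 / 0.02 ≈ 221 million and 3,460,784 / 0.0154 ≈ 225 million are the weighted denominators these figures imply. These values are approximate because 2% is rounded.

```python
from typing import List, Tuple


def population_estimates(sample: List[Tuple[float, int]]) -> Tuple[float, float]:
    """Return (estimated number of consumers, weighted proportion of consumers).

    sample: (survey weight, soy_status) pairs for the analytic sample,
            with soy_status 1 = consumer and 0 = non-consumer.
    """
    total = sum(w for w, _ in sample)
    consumers = sum(w for w, s in sample if s == 1)
    return consumers, consumers / total
```

With weights of 1000 for one consumer and 3000 and 4000 for two non-consumers, the sketch estimates 1000 consumers and a weighted proportion of 0.125.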

NHANES 2015-2016
The sample characteristics of participants reporting soy milk consumption are shown in Table 1. Among soy milk consumers, the weighted proportion of females tended to be higher than that of males (Table 1); however, the difference was not statistically significant. Almost 43% (weighted proportion) of soy milk consumers were of Non-Hispanic White origin, while Non-Hispanic Blacks and Non-Hispanic Asians each accounted for more than 17%.
Soy milk consumers and non-consumers differed significantly in educational level: a significantly higher weighted proportion of soy milk consumers had a college degree or higher (46.96% vs. 32.18%, p = 0.03). No significant intergroup differences were found in household size, household food security level, or annual income. A significantly higher proportion of soy milk consumers than non-consumers reported moderate recreational activities.
Next, we used multivariate logistic regression models to examine potential associations between soy milk intake status (dependent variable) and various predictor variables (Table 2). In model 1, female sex was not associated with higher odds of soy milk consumption, whereas Non-Hispanic Black and Non-Hispanic Asian ethnicities were associated with significantly higher odds (OR: 2.51 and 4.87, respectively). In a second model (model 2), households with six or more persons had significantly lower odds of soy milk consumption (Table 2); notably, this model as a whole was no longer statistically significant. When physical activity was added in model 3, statistical significance was retained. Participants who engaged in moderate-intensity sports and recreational activities had significantly higher odds of soy milk consumption (OR: 2.36).

NHANES 2017-2020
Sample characteristics of participants reporting soy milk consumption in the NHANES pre-pandemic cycle are shown in Table 3. In the NHANES 2017-2020 cycle, the weighted proportion of females among soy milk consumers was significantly higher than that of males (63.45% vs. 36.55%). Only 34.55% (weighted proportion) of soy milk consumers were of Non-Hispanic White origin, whereas 18.52% were Non-Hispanic Asian. Significant differences between the two groups were also found in educational level: the weighted proportion of individuals with a high school degree was substantially lower among soy milk consumers (16.01% vs. 27.10%, p = 0.006), while the weighted proportion of participants with (some) college education tended to be higher. No significant differences were found in household food security level, self-perceived general health, or annual income. A significantly higher proportion of soy milk consumers than non-consumers reported moderate recreational activities, and the weighted proportion of smokers also differed significantly between groups.
Table 3 notes: Weighted proportions; total number of unweighted observations: n = 8511. Continuous variables are shown as mean (standard error); categorical variables are shown as weighted proportion (standard error). a = includes multi-racial; b = based on Stata's design-adjusted Rao-Scott test; c = based on regression analyses followed by adjusted Wald tests; d = or equivalent; e = indicates significant differences in the weighted proportions; f = weighted proportions to be considered unreliable as per recent NCHS guidelines. Column percentages may not sum to 100% due to rounding.
Again, we used multivariate logistic regression models to examine potential associations between soy milk intake status and various predictor variables (Table 4). After adjustment for race/ethnicity and educational level, female sex was not associated with higher odds of soy milk consumption. Notably, Mexican American and Other Hispanic ethnicities were associated with significantly higher odds (OR: 4.26 and 3.21, respectively). The same applied to Non-Hispanic Black and Non-Hispanic Asian ethnicities (OR: 2.62 and 5.60, respectively) in a second model adjusted for smoking status and moderate-intensity activity. In both models, college graduates had significantly higher odds of soy milk consumption (Table 4). The additional adjustment for physical activity did not materially alter the findings from model 1. Participants who engaged in moderate-intensity sports and recreational activities had significantly higher odds of soy milk consumption (OR: 1.65).

Discussion
We used NHANES data to assess the prevalence of soy milk consumption in the United States and to identify potential sociodemographic predictors of its use. The weighted proportion of individuals reporting soy milk intake was 2% in the NHANES 2015-2016 cycle and decreased slightly to 1.54% in the NHANES 2017-2020 pre-pandemic cycle. Non-Hispanic Asian and Black ethnicities (as well as Other Hispanic and Mexican American ethnicities in the 2017-2020 cycle) were associated with significantly higher odds of soy milk consumption. College graduates also had significantly higher odds of consuming soy milk (OR: 2.14) in the pre-pandemic NHANES cycle. Our results further suggest that sex is not an important predictor of soy milk consumption in this cross-sectional sample, whereas moderate physical activity was associated with higher odds.
Soy milk is one of the fastest growing categories in the U.S. plant-based non-dairy functional beverage market [49,50]. Cow milk allergies, lactose intolerance, calorie concerns, an unfavorable lipid profile, and a preference towards vegan diets for health and ethical reasons (including aspects such as environmental concerns and animal welfare) have increasingly influenced consumers across the globe towards choosing cow milk alternatives [50,51].
In addition, individuals are increasingly concerned about potential negative health effects of dairy products [52], including their high saturated fat content, potential hormonal contamination [53], and, above all, their potential association with several diseases, including various types of cancer [54][55][56]. However, recent systematic evidence has highlighted some beneficial effects of cow milk consumption on osteoporosis, cardiovascular diseases, and metabolic syndrome at various stages of life [57,58]. Nevertheless, concerns remain about associations between cow milk consumption and acne, iron-deficiency anemia in infants, prostate, colorectal, and bladder cancers, and Parkinson's disease.
For the aforementioned reasons, soy milk has emerged as a rapidly growing competitor to dairy milk [49]. With regard to its nutritional profile, a 2018 review suggested that soy milk is the best alternative for replacing cow milk in the human diet [16]. Soy milk may also favorably affect circulating estrogen levels in premenopausal women, which could reduce the risk of breast cancer [59]. In men, soy milk consumption has been associated with a reduced risk of prostate cancer [60].
Despite these putative benefits, data on soy milk intake are scarce, and sociodemographic predictors and drivers of soy milk consumption have rarely been investigated. A study by Dharmasena and Capps suggested that age, employment status, education level, race, ethnicity, region, and the presence of children in a household are significant drivers of the demand for soy milk [49]. Although based on a larger sample, their study dates back to 2008 [49]. Using more recent NHANES data, we were able to confirm some of these previously identified sociodemographic predictors.
Our findings provide information about soy milk consumers that could inform public health strategies to increase soy milk consumption. Marketing of soy products is said to require careful consumer segmentation in order to develop food products that appeal to populations with different opinions and tastes [61,62]. Based on our results, Non-Hispanic White individuals could be one such target group, as could individuals with lower educational attainment. Targeted marketing that improves nutritional knowledge about soy milk as a potential dairy substitute could increase consumption among these prospective buyers.

Strengths and Limitations
The present study has several strengths and limitations that merit discussion. One major limitation is the cross-sectional nature of this analysis, which does not allow for causal inference. Although we used a nationally representative sample of US adults, the number of soy milk consumers was modest, and some estimated proportions must be considered unreliable as per recent NCHS guidelines. We transparently flagged these proportions in the results section and clearly acknowledge this limitation. Furthermore, this analysis relied solely on data from the NHANES Diet Behavior & Nutrition module; it is not based on 24-h dietary recalls and does not capture reasons for (or barriers to) soy milk consumption. Such variables were unavailable in the employed NHANES cycles but would have substantially enriched our analysis. Finally, the NHANES only asked about (soy) milk consumed as a drink or with cereal. This excludes cooking and therefore some classic (vegan) dishes that include soy milk, including but not limited to dairy-free macaroni and cheese, dairy-free lasagna, soy milk shakes, and dairy-free pies, desserts, and cookies. As such, we may have underestimated the true prevalence of soy milk consumption. Nevertheless, we believe our data are valuable and call for additional studies in this field to enhance our understanding of soy milk consumption.

Conclusions
The weighted proportion of individuals reporting soy milk consumption ranged from approximately 1.54% to 2.0% in the two recent NHANES cycles analyzed. Several sociodemographic predictors of soy milk consumption (including race/ethnicity, household size, and educational level) were identified. Nevertheless, additional studies are warranted to gain a better understanding of drivers of (and barriers to) soy milk consumption in the United States. In light of the putative health benefits of soy milk and its more favorable environmental impact compared with cow milk, future investigations should attempt to identify strategies that help promote its consumption.

Institutional Review Board Statement:
The present study is negligible-risk research involving existing collections of data that contain only non-identifiable information about human beings; it is a secondary analysis of freely available, deidentified data. The research was performed in accordance with the Declaration of Helsinki and approved by the NCHS Research Ethics Review Board (https://www.cdc.gov/nchs/nhanes/irba98.htm; accessed on 15 May 2022; Protocol #2011-17 and Protocol #2018-01). NHANES was approved by the National Center for Health Statistics research ethics review board, and informed consent was obtained from all participants.
Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.
Data Availability Statement: Data are publicly available online (https://wwwn.cdc.gov/nchs/nhanes/Default.aspx; accessed on 2 July 2022). The datasets used and analyzed during the current study are available from the corresponding author upon reasonable request.

Conflicts of Interest:
The authors declare no conflict of interest.