COVID and nutrition: A machine learning perspective

A self-report questionnaire survey was conducted online to collect big data from over 16000 Iranian families (who were the residents of 1000 urban and rural areas of Iran). The resulting data storage contained over 1 M records of data and over 1G records of automatically inferred information. Based on this data storage, a series of machine learning experiments was conducted to investigate the relationship between nutrition and the risk of contracting COVID-19. With highly accurate scores, the findings strongly suggest that foods and water sources containing certain natural bioactive and phytochemical agents may help to reduce the risk of apparent COVID-19 infection.


Introduction
The SARS-CoV-2 pandemic (COVID-19) is a global crisis that has caused widespread devastation. Numerous researchers have attempted to address its various facets since it first surfaced. In computer engineering, machine learning is a prominent method of providing data-driven insights into newly emerging diseases such as COVID-19.
An observational study was conducted to ascertain the relationship between families' dietary nutrition regimens and their risk of contracting COVID-19 [15]. To this end, an online self-report questionnaire survey was conducted to collect data from over 16000 Iranian families (residents of 1000 urban and rural areas of Iran). The resulting data storage contained over 1 M records of data and over 1G records of automatically inferred information. Based on this data storage, a series of machine learning experiments was conducted to investigate the relationship between nutrition and the risk of contracting COVID-19.

Data collection
The resulting data storage includes some records regarding the effects of lifestyle factors (e.g., nutrition, water consumption sources, physical activity, smoking, age, gender, ethnic origin, health and disease factors, and a variety of other factors) on COVID-19 infection status in families (i.e., the residents of a home). These items combine to form a collection of 125 features (84 features for the nutrition state of the family). Phase 1 collected 11K completed questionnaires until the end of Mordad (July-August). Following that, an additional 5K completed questionnaires were added until Day (December), bringing the total to over 16K completed questionnaires in Phase 2. A subset of the research data is available in Ref. [16].

Data preprocessing
All incomplete or blank records were discarded (less than 3% of the total data). An object-oriented model for data processing was designed and implemented in Java. This Java code generated the required CSV tables for machine learning experiments.
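The authors' Java code is not reproduced in the source. As a minimal illustration of the step described above (discard incomplete records, then emit a CSV table), the following Python sketch may help. The field names and sample rows are hypothetical and are not taken from the survey.

```python
import csv
import io

def drop_incomplete(rows, required):
    """Keep only rows in which every required field is present and non-blank."""
    return [r for r in rows if all((r.get(k) or "").strip() for k in required)]

def to_csv(rows, fieldnames):
    """Serialize the cleaned rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

# Hypothetical fields and records, for illustration only.
FIELDS = ["family_id", "daily_coffee", "infected"]
raw = [
    {"family_id": "1", "daily_coffee": "yes", "infected": "no"},
    {"family_id": "2", "daily_coffee": "", "infected": "yes"},  # blank answer
    {"family_id": "3", "daily_coffee": "no", "infected": "yes"},
]
clean = drop_incomplete(raw, FIELDS)
csv_text = to_csv(clean, FIELDS)
```

The resulting text can be written to a file and loaded into a tool such as Weka, which reads CSV input.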

Hyperparameter optimization
A greedy parameter optimization algorithm was used to calculate the best window size for running averages (Fig. 1). Running averages let us transform discrete data to continuous space data for micro-communities [24] (Fig. 2).
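The source does not state the objective that the greedy search optimized. The Python sketch below therefore only illustrates the general idea under stated assumptions: a trailing running average converts a discrete 0/1 series into continuous values, and a greedy search enlarges the window until a caller-supplied score stops improving. The function names, the toy series, and the toy score are all hypothetical.

```python
def running_average(values, window):
    """Trailing mean over the last `window` values (shorter at the start)."""
    out = []
    total = 0.0
    for i, v in enumerate(values):
        total += v
        if i >= window:
            total -= values[i - window]  # drop the value leaving the window
        out.append(total / min(i + 1, window))
    return out

def greedy_window(values, score, max_window):
    """Grow the window step by step; stop at the first step that does not improve."""
    best_w = 1
    best_s = score(running_average(values, 1))
    for w in range(2, max_window + 1):
        s = score(running_average(values, w))
        if s <= best_s:
            break  # greedy: accept the first local optimum
        best_w, best_s = w, s
    return best_w

# Toy data: alternating 0/1 outcomes whose underlying rate is assumed to be 0.5.
series = [0, 1, 0, 1, 0, 1, 0, 1]

def toy_score(smoothed):
    """Hypothetical score: negative absolute deviation from the assumed rate."""
    return -sum(abs(x - 0.5) for x in smoothed)

best = greedy_window(series, toy_score, max_window=6)
```

Because the search stops at the first non-improving step, it can miss a better window further along; this is the usual trade-off of a greedy strategy against an exhaustive scan.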

Experiments and results
Weka was used as the primary platform, running on a Core i7-equipped PC. The results of twenty experiments (Tables I-II) indicated that the accuracy rate was acceptable. Numerous classification algorithms were evaluated. The random forest algorithm [17] and the multilayer perceptron algorithm [18] both outperformed the others in terms of accuracy. According to calculations on billions of permutations of nutrition conditions and dietary regime items, using data on people's diets and infection status, many dietary conditions significantly reduced the risk of apparent COVID-19 infection by 90%. In comparison, certain dietary factors increased risk by a factor of three or more. The findings indicate that certain diets may have a protective effect against COVID-19-related death (Fig. 3; see Table 3).
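The source does not specify how the risk changes were computed. One common way to express statements such as "reduced by 90%" or "increased by a factor of three" is the relative risk, i.e., the infection rate among families meeting a dietary condition divided by the rate among those who do not. The sketch below assumes that measure; the counts are hypothetical and are not results from the study.

```python
def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Infection rate in the exposed group divided by the rate in the unexposed group."""
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    return risk_exposed / risk_unexposed

# Hypothetical counts: 5 of 500 infected with the condition, 50 of 500 without.
rr_protective = relative_risk(5, 500, 50, 500)   # about 0.1, i.e. a 90% reduction

# Hypothetical counts: 150 of 500 infected with the condition, 50 of 500 without.
rr_harmful = relative_risk(150, 500, 50, 500)    # about 3, i.e. a threefold increase

risk_reduction_percent = (1 - rr_protective) * 100
```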
An ID3 algorithm [19] (with 2540 instances of data and 9 features) was executed on Colab, and a decision tree was developed for several essential features with a Gini coefficient of 0.5 (Fig. 4).
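For readers unfamiliar with the Gini measure, the sketch below computes the Gini impurity of a set of class labels. It assumes that the reported value of 0.5 refers to this impurity, which the source does not state explicitly; for two classes, 0.5 is the maximum and corresponds to an evenly mixed node. Note that classic ID3 splits on information gain, so the Gini figure may reflect the tree implementation used on Colab. The labels are hypothetical.

```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

# Hypothetical node with an even two-class split: impurity is 0.5.
balanced = gini_impurity(["infected", "not_infected"] * 5)

# Hypothetical pure node: impurity is 0.0.
pure = gini_impurity(["infected"] * 10)
```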
The Appendix contains some of the observed results (for Phase 1, until Mordad, for 11000 families). Researchers can obtain additional information about the data from Ref. [16] or by submitting a request.

Metabolites experiments
Nutrition and lifestyle factors can affect the blood serum metabolite profile. Thus, metabolite analysis is a technique for examining the relationship between nutrition and COVID-19. This section analyzed metabolomics data from a Chinese study (in Wuhan) [20], which included 430 metabolite features for 96 blood tests on 44 samples (including healthy, moderate, severe, and fatal COVID-19 cases). As a result, 96 instances with 430 features were available to analyze the relationship between blood metabolites and the status and severity of COVID-19 infection. Five data experiments were conducted in this section (with 10-fold cross-validation). The results indicated that precision and accuracy were nearly 90%, and the ROC was approximately 0.99 (see Table 3).
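To make the 10-fold cross-validation setup concrete, the Python sketch below splits 96 instance indices into ten folds of near-equal size, each serving once as the test set. It is a simplified illustration with contiguous folds and without shuffling or stratification; the exact splitting procedure used in the experiments is not described in the source, and the function name is hypothetical.

```python
def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k contiguous folds of near-equal size."""
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)  # first `extra` folds get one more
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size

# 96 instances, 10 folds: six folds of 10 instances and four folds of 9.
folds = list(k_fold_indices(96, 10))
fold_sizes = [len(test) for _, test in folds]
```

In each round, a model would be trained on the `train` indices and evaluated on the `test` indices, and the reported metrics would be aggregated over the ten rounds.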
The J48 algorithm's decision tree indicated that the key control variable for "death" versus "survival" in severe COVID-19 cases was the blood level of T3 thyroid hormone (see Fig. 5). This finding corroborates the results of several previous studies [21,22].

Dietary experiments of countries
On a broader scale, differences exist between countries regarding nutrition diets and COVID-19 statistics. This study conducted some classification experiments using the dataset provided by Ref. [23]. The first 99 countries with a high COVID prevalence were classified into 46 ... (Table 4).
Fig. 3. The diagram was plotted for the citizens of Tehran in the research dataset for 330K dietary conditions associated with a reduction in the risk of COVID-19. Each point represents a distinct group of dietary conditions, and each condition is further subdivided into four subparts (e.g., daily coffee consumption, daily dairy consumption, weekly consumption of fish, and high consumption of fast foods).

Conclusion
A comprehensive questionnaire survey was conducted with over 16000 Iranian families (the residents of more than 1000 urban and rural areas of Iran) to collect data. The survey resulted in a big data set on COVID-19 and lifestyle (with more than 1 M data records and more than 1G items collected by acquiring semantic entailment rules; for a digest report, see Table 5). The resulting data set included records about the effect of lifestyle factors (nutrition, water sources, physical activity, smoking, age, gender, health and disease factors, and a variety of other factors) on COVID-19 infection status in families (i.e., the residents of a home). The findings strongly indicated that foods and water sources containing several naturally occurring hypomethylating agents significantly reduced the risk of apparent COVID-19 infection. Overall, the experimental data indicated an acceptable level of accuracy for the relationship between nutrition and SARS-CoV-2 infection. Moreover, computations on billions of combinations of nutrition conditions and dietary regime items indicated that several dietary conditions mitigated the risk of apparent COVID-19 infection.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Fig. 5. The J48 algorithm's decision tree suggests that the key control variables for "death" and "survival" in severe COVID-19 cases were the level of T3 thyroid hormone in the blood.