Reviewing Multimodal Machine Learning and Its Use in Cardiovascular Diseases Detection

Machine Learning (ML) and Deep Learning (DL) are derivatives of Artificial Intelligence (AI) that have already demonstrated their effectiveness in a variety of domains, including healthcare, where they are now routinely integrated into patients’ daily activities. On the other hand, data heterogeneity has long been a key obstacle in AI, ML and DL. Here, Multimodal Machine Learning (Multimodal ML) has emerged as a method that enables the training of complex ML and DL models that use heterogeneous data in their learning process. In addition, Multimodal ML enables the integration of multiple models in the search for a single, comprehensive solution to a complex problem. In this review, the technical aspects of Multimodal ML are discussed, including a definition of the technology and its technical underpinnings, especially data fusion. It also outlines the differences between this technology and others, such as Ensemble Learning, as well as the various workflows that can be followed in Multimodal ML. In addition, this article examines in depth the use of Multimodal ML in the detection and prediction of Cardiovascular Diseases, highlighting the results obtained so far and the possible starting points for improving its use in the aforementioned field. Finally, a number of the most common problems hindering the development of this technology and potential solutions that could be pursued in future studies are outlined.


Introduction
Artificial Intelligence (AI) has experienced rapid growth over the past two decades. The concept of AI has been around since 1950, and the term itself was coined in 1956 at the Dartmouth Summer Workshop, which is considered the founding event of AI as a field [1]. However, the growth in Information and Communication Technologies (ICTs) and the increasing power of computers have contributed significantly to the increasing feasibility and adoption of AI [2]. AI technologies are becoming more advanced and are capable of analyzing enormous amounts of data, learning from past experiences, and making predictions based on patterns and trends [3]. Despite the popularity of AI, there is no single definition for this technology. The researchers in [4], for example, defined it as a set of tools and techniques that use principles and devices from various fields, such as computation, mathematics, logic, and biology, to address the problem of realizing, modeling, and mimicking human intelligence and cognitive processes. Furthermore, the authors of [5] define AI as the study of "Intelligent Agents", i.e., machines that are able to recognize and understand their environment and consequently take appropriate actions to increase their chances of achieving their goals. In an attempt to unify definitions, the authors of [6] defined AI as a program that can cope in an arbitrary world no worse than a human. These different definitions reflect the different competencies of AI, which explains the diversity of AI implementations in our daily lives.
Machine Learning (ML) [7], Deep Learning (DL) [8], Federated Machine Learning (FL) [9], and Multimodal Machine Learning [10] are all well-known and popular derivatives of the AI concept that have been adopted by users and applied in various aspects of our daily lives. These different branches of AI are depicted in Figure 1. In this context, Machine Learning is defined as a field of study that focuses on the development of algorithms and statistical models that enable computer systems to learn from data and make predictions or decisions without being explicitly programmed. It involves the application of various approaches, such as supervised and unsupervised learning, Reinforcement Learning, and Deep Learning, that allow computers to automatically improve their performance on a given task through experience [7]. Machine Learning has demonstrated high efficiency in solving classification and regression problems. Its ability to extract meaningful insights and patterns from vast and complicated datasets and use this knowledge to make accurate predictions, automate decision making, and enable intelligent systems to learn and adapt in real-time is fundamental to its success. This success has led researchers from different fields to implement ML algorithms, and their efficiency can be observed in various fields, such as:
• Healthcare services [11–13];
• Image, speech and pattern recognition [14,15];
• Internet of Things (IoT) and smart cities [14,16];
• Cybersecurity and threat intelligence [17];
• Natural language processing and sentiment analysis [18];
• User behavior analytics and context-aware smartphone applications [14,15];
• E-commerce and product recommendations [14,15];
• Sustainable agriculture [19];
• Industrial applications [20].

Machine Learning Domain Challenges
The great success of Machine Learning is not magic but the result of its ability to analyze large amounts of data at high speed and with high accuracy. However, the field of ML still suffers from various challenges and obstacles arising from different problems. Table 1 below summarizes the Machine Learning challenges and categorizes them based on their source. These challenges have been extensively studied in the literature, and more details can be found in several articles, such as [9,21–23].
Table 1. Machine Learning domain common challenges.

Heterogeneity: Motivation(s) behind Multimodal ML
Advances in sensor technologies, storage concepts, communication networks, and other tools have driven data collection [28]. According to recent figures from Statista [29], the total amount of data generated worldwide reached 64.2 zettabytes, or 6.42 × 10^16 megabytes, in 2020. This increase exceeded predictions due to increasing demand as a result of the COVID-19 pandemic, as more individuals worked and studied from home and increasingly used home entertainment alternatives. For the above reasons, data volumes are expected to reach 180 zettabytes by 2025.
However, these data differ in type, structure, format, usability, lifespan, and other aspects. This heterogeneity poses several challenges in Machine Learning, as it can make it difficult to use data from different resources to gain useful insights or build accurate models. There are many types of heterogeneity, the most common of which are listed below [21,30,31]:
• Structured vs. unstructured data: Structured data are highly organized and usually follow a specific schema, while unstructured data have no predefined structure or organization;
• Numeric vs. categorical data: Numeric data are quantitative and can be expressed as numbers, while categorical data are qualitative and represent discrete values, such as colors, types, or labels;
• Temporal data: This type of data contains time-stamped information and can be used to analyze patterns and trends over time;
• Multimodal data: This type of data combines different types of information, such as text, audio, images, and videos.
Thus, dealing with heterogeneous data requires careful processing and feature engineering to put the data into the form required for a single Machine Learning model [31]. In addition, multiple preprocessing steps may be required to analyze heterogeneous data, such as normalization, scaling, or other steps. In some cases, however, it may seem impossible to analyze heterogeneous data, even though training the model with this variety of resources improves its feasibility and increases confidence in its predictions.
For example, Magnetic Resonance Imaging (MRI) analysis using ML models has shown high efficiency in predicting Cardiovascular Diseases (CVDs), as shown in [32]. Smart wearables equipped with ML models are also highly feasible in predicting cardiac disease, as shown in [33]. Likewise, Electronic Health Records (EHRs) collected from various health centers such as clinics, hospitals, or smart homes are a good source for Cardiovascular Disease prediction using ML algorithms [34]. However, merging these three types of data is technically difficult with a single conventional model because MRI images are stored in the form of medical electronic image files, the data collected by wearables are structured data, and EHRs can be a collection of both structured and unstructured data, free text reports, medical examination data, or other formats. In the real world, a physician may analyze all of these data to make a more accurate diagnosis, though it is not easy to analyze these data sets simultaneously using the same model. This case is illustrated in Figure 2 below. In this context, Multimodal Machine Learning is proposed as a solution to the challenge of data heterogeneity in ML. Multimodal ML gives models the ability to analyze different data within the same ML workflow, whether by merging different datasets, by merging different models, or both, to arrive at a single result, such as the diagnosis of CVDs in the example mentioned above [10]. However, the multiple views offered by these heterogeneous data can be of varying importance to a learning task. Therefore, merging all of these data sets and treating them with equal importance is unlikely to lead to optimal learning outcomes [30].

Machine Learning and Healthcare
The importance of health to human life cannot be overstated, as it is essential for meeting basic needs, pursuing goals, maintaining relationships, and having an adequate quality of life, and poor health can have significant financial and societal consequences. Therefore, researchers are constantly striving to improve the quality of healthcare services. In this context, Artificial Intelligence and its branches, such as Machine Learning and Deep Learning, have been incorporated into healthcare services due to their high feasibility and usability in this field. Machine learning, in particular, is a powerful tool that has the potential to revolutionize healthcare in many ways [35]. ML has made remarkable progress in healthcare, not because of any mystical powers, but because of its superior data processing capabilities compared to those of humans. Because of its speed and precision, thousands of AI applications have already been developed for healthcare, making it a potentially revolutionary tool for solving a wide range of healthcare problems [36].
Machine Learning has been used in various areas of healthcare, where it has proven to be very useful for both diagnosing and predicting diseases. Moreover, the development of communication tools, such as smart wearables equipped with Machine Learning and Deep Learning models, has opened the door to real-time continuous monitoring. In this context, smart wearables have shown high feasibility in predicting various diseases such as Cardiovascular Diseases [33], diabetes [37], liver disease [38], fatigue and stress [39], mental illness [40], and many other diseases [41]. ML models have also been used to increase the efficiency of healthcare decision systems [42] and in the field of genomic medicine [43]. Overall, ML has succeeded in transforming health services and creating personalized digital health services that support physicians and improve the overall quality of public health [44].
Therefore, considering the importance of healthcare, it is urgent to improve the efficiency of ML. The use of state-of-the-art methods and the removal of obstacles to progress are essential to improving performance. The challenges described previously are reflected in the barriers to expanding the use of ML in healthcare, which are common to all ML implementations across all diseases. With this in mind, new solutions that could help promote the use of ML will lead to improved applications in a variety of settings.

Review Framework: Scope, Outline and Main Contributions
In this article, Multimodal Machine Learning is explored, and its role as a solution to the challenge of heterogeneity is detailed. In addition, the use of Multimodal ML in Cardiovascular Disease detection and prediction is technically reviewed to support its use in this field.

Scope of Research
To achieve the objectives of the study, Multimodal Machine Learning is explored, along with the data fusion concept, which is the basis of the technology under study. In addition, the technical perspectives of Multimodal ML are studied, and the workflows related to it are examined. Furthermore, Multimodal ML is compared with other known techniques in order to distinguish between them. The distinct areas where Multimodal ML is used are then inspected, and a comprehensive overview of its application in Cardiovascular Diseases, including the state of the art, is provided. These implementations are analyzed from different perspectives to understand the limitations and future areas of research. Finally, the challenges and future recommendations associated with advancing this field are reviewed.

Research Questions
The scope of the article defined in the previous section is summarized by the research questions that this article attempts to answer. To answer these questions, the article is outlined as follows. In Section 2, Multimodal ML is reviewed from various angles, including technical definition(s), differences from other domains, such as classical ML, ensemble ML and others, available frameworks, and other details. Then, in Section 3, the use of Multimodal ML technology in CVD detection and prediction is presented by listing the state of the art in this field and discussing the technical details of the implementations mentioned in the literature. Later, in Section 4, the challenges that hinder progress in this field are discussed, and some future perspectives that could help in overcoming these challenges are proposed.

Comparison with Previous Review Frameworks
Multimodal ML has been a trending topic in recent years. As a result, numerous studies have already addressed it, with a large proportion of these studies reviewing Multimodal ML. However, this article proposes several new ideas that add to the knowledge of Multimodal ML. First, this study proposes a technical study of Multimodal ML that helps to understand this technology and distinguish it from other existing AI techniques. Second, none of the previous studies proposed a technical review of the use of this technology in CVD detection and prediction. Finally, this review discusses in detail the challenges and future ideas in this field to help future researchers select the most relevant ideas on which to build their work.

Materials and Methods: What Is Multimodal ML?
The human mind processes information from multiple senses simultaneously. Sometimes it is not enough to just hear about a problem; individuals need to see it for themselves in order to make an informed judgment. For Artificial Intelligence to expand its knowledge of the world, it must be able to process a variety of information sources that may contradict each other. This principle also applies to the field of AI known as Machine Learning (ML), where Multimodal Machine Learning focuses on using numerous data sources to achieve a single goal by leveraging complementary information in a unified computational framework. The ability to explore diverse data increases predictive power and leads to more accurate and reliable results, making Multimodal Machine Learning a multidisciplinary topic with tremendous efficiency and amazing potential [5,10].

Overview and Definition(s)
Although Multimodal Machine Learning is a popular research area that has received much attention, it is still in its infancy [4–6,45]. As a result, there is no single and universally accepted definition. Nevertheless, all definitions lead to the same concept: the ability to analyze different data sets to reach a single conclusion. For example, the authors of [4] describe Multimodal ML as the ability to evaluate data from Multimodal datasets, identify a common phenomenon, and use complementary knowledge to learn a complex task. Multimodal datasets are described in this way as data seen with many sensors, where the output of each sensor is called a modality and can be associated with a dataset. Similarly, the authors of [5] describe Multimodal ML as the integration of multiple data sources collected by different instruments, devices, or techniques, followed by the analysis of these merged data using different ML architectures. In addition, Multimodal Machine Learning is described in [10] as an area that aims to develop intelligent models that can process and link data from many sources.

Multimodal ML and Data Fusion
Multimodal ML brings together data from multiple and disparate modalities to solve a single task. The discipline behind merging data from multiple sources is called data fusion. More specifically, data fusion is defined as "the process of combining data to refine state estimates and predictions" [5]. According to the Joint Directors of Laboratories Data Fusion Subpanel (JDL), the technique of "data fusion" is a must for processing more than one type of data [46]. The authors in [46] support this definition by explaining that any process that deals with associating, correlating, or combining data from one or more sources to obtain enriched information is called a process that uses data fusion. In data fusion, given the novelty of the literature, there is no consensus on how best to combine different data, especially since there are four different techniques for implementing data fusion, which may have many names depending on the context and research area [5,46,47]. These different approaches are illustrated in Figure 3:
• Early Fusion: also called Low-Level Fusion, is the simplest form of data fusion in which disparate data sources are merged into a single feature vector before being used by a single Machine Learning algorithm. Therefore, it can be referred to as a multiple-data, single-algorithm technique.
• Intermediate Fusion: also referred to as Medium-Level Fusion, joint fusion, or Feature-Level Fusion, occurs in the intermediate phase between the input and output of an ML architecture when all data sources have the same representation format. In this phase, features are combined to perform various tasks such as feature selection, decision-making, or predictions based on historical data.
• Late Fusion: also known as decision-level fusion, defines the aggregation of decisions from multiple ML algorithms, each trained with different data sources. In addition, various rules are used to determine how decisions from different classifiers are combined, e.g., rules learned using a metaclassifier.
• Hybrid Fusion: defines the use of more than one fusion discipline in a single deep algorithm.
Based on the information in [4,5], early fusion is the most common form of fusion, which has the advantage of converting all data into the same representation that can be classified using robust classical models, such as Support Vector Machines or Logistic Regression. However, when the input modalities are particularly uncorrelated and have widely varying dimensionality and sampling rate, it is easier to use a late fusion approach. In addition, both early and late fusion offer the most flexibility in terms of the number of models that can be used to analyze the data, but there is no conclusive evidence that late fusion is better than early fusion because its performance is highly problem dependent. Alternatively, intermediate fusion provides more flexibility in terms of how and when representations learned from Multimodal data are fused. Table 2 discusses the different features of each approach.
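The contrast between early and late fusion can be sketched in a few lines of Python. This is a minimal illustration and not an implementation taken from the cited works: the `linear_score` function is a hypothetical stand-in for a trained model, the weights and feature values are invented, and averaging and maximum are assumed here as two simple decision rules for late fusion.

```python
from statistics import mean


def linear_score(features, weights):
    """Stand-in for a trained model: a weighted sum clipped to [0, 1]."""
    score = sum(f * w for f, w in zip(features, weights))
    return max(0.0, min(1.0, score))


def early_fusion(modalities, weights):
    """Merge all modalities into one feature vector, then apply a single model."""
    fused_vector = [value for modality in modalities for value in modality]
    return linear_score(fused_vector, weights)


def late_fusion(modalities, weights_per_model, rule=mean):
    """Apply one model per modality, then aggregate the decisions with a rule."""
    decisions = [linear_score(m, w) for m, w in zip(modalities, weights_per_model)]
    return rule(decisions)


# Hypothetical, already-scaled features for one patient (illustrative values only).
wearable = [0.8, 0.6]
ehr = [0.7, 0.2, 0.9]

early = early_fusion([wearable, ehr], weights=[0.2] * 5)
late_avg = late_fusion([wearable, ehr], [[0.5, 0.5], [0.3, 0.3, 0.3]])
late_max = late_fusion([wearable, ehr], [[0.5, 0.5], [0.3, 0.3, 0.3]], rule=max)
print(round(early, 2), round(late_avg, 2), round(late_max, 2))
```

In this sketch, early fusion needs a single set of weights over the concatenated vector, whereas late fusion keeps one model per modality and only exposes the choice of aggregation rule, which is where a learned metaclassifier could be substituted.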

Multimodal ML: Technical Perspectives
The goal of Multimodal Machine Learning, also known as Multimodal Deep Learning, is to develop algorithms and models that can interpret and learn from data across multiple modalities, such as text, audio, images, and video. Multimodal ML is a thriving research area with the potential to transform a wide range of applications, from speech recognition and language translation to autonomous cars and medical imaging, among many other areas. Multimodal ML, from a technical perspective, encompasses the various approaches, algorithms, and architectures used in creating and evaluating these models. Data preprocessing, feature extraction, model architecture, training methods, evaluation criteria, generalization, interpretability, and scalability are the most common possible viewpoints. Understanding the technical aspects of Multimodal ML is essential for developing efficient models that can leverage complementary instances across many modalities and make more accurate and robust predictions in the real world. Therefore, the technical perspectives of Multimodal ML are described below.

Data Preparation
Because Multimodal data are often complex and heterogeneous, they must be thoroughly processed before they can be used to train the model. The first step is to recognize the many modalities in action, then learn how to preprocess them, and finally, merge them into a single representation that can be fed into the model [4,5,10].
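As a rough illustration of these three steps, the sketch below identifies two modalities of different type (one numeric, one categorical), preprocesses each in its own way, and merges them into one representation per record. The field names and values are hypothetical, and min-max scaling and one-hot encoding are assumed here only as common preprocessing choices.

```python
def min_max_scale(column):
    """Rescale a list of numbers to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


def one_hot(values):
    """Encode categorical values as one-hot vectors over the sorted vocabulary."""
    vocabulary = sorted(set(values))
    return [[1.0 if v == term else 0.0 for term in vocabulary] for v in values]


# Hypothetical per-patient data from two sources (illustrative values only).
heart_rate = [60, 75, 90]          # numeric modality, e.g., from a wearable
smoker = ["no", "yes", "no"]       # categorical modality, e.g., from an EHR

scaled_hr = min_max_scale(heart_rate)
encoded_smoker = one_hot(smoker)

# Merge both modalities into a single representation per patient.
merged = [[hr] + flags for hr, flags in zip(scaled_hr, encoded_smoker)]
print(merged)
```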

Model Architecture
Multimodal data can be represented in a variety of ways, including concatenation, fusion, and attention mechanisms. Depending on the data and the task to be solved, it is crucial to choose an architecture that can handle the multiple modalities and learn a combined representation [46,47].

Training Strategies
Pretraining individual modalities, joint training of all modalities, and training individual models and combining them at the time of inference are all viable options for training Multimodal ML models. Selecting the right training methods is a crucial step in achieving the desired goal [4,5,10].

Evaluation Metrics
As with classical ML algorithms, accuracy, precision, recall (also called sensitivity), specificity, F1 score, and area under the curve (AUC) are just some of the measures that can be used to evaluate Multimodal ML algorithms. However, whether these measures are sufficient when applied to Multimodal data is debated. As a result, the use of evaluation criteria that consider the success of each modality and the overall performance of the model is essential [21][22][23].
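The threshold-based measures listed above follow directly from the confusion counts, and reporting them per modality as well as for the fused model takes little extra effort. The sketch below assumes binary labels (1 = disease present); the model names and predictions are invented for illustration, and AUC is left out because it requires ranked scores rather than hard predictions.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = disease)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def safe_div(num, den):
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def evaluate(y_true, y_pred):
    """Compute the threshold-based metrics from the confusion counts."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)  # recall is also called sensitivity
    return {
        "accuracy": safe_div(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "specificity": safe_div(tn, tn + fp),
        "f1": safe_div(2 * precision * recall, precision + recall),
    }


# Hypothetical predictions from one unimodal model and one fused model.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
predictions = {
    "ecg_only": [1, 1, 0, 0, 0, 0, 1, 0],
    "fused": [1, 1, 1, 1, 0, 0, 0, 1],
}
report = {name: evaluate(y_true, y_pred) for name, y_pred in predictions.items()}
for name, scores in report.items():
    print(name, {metric: round(value, 3) for metric, value in scores.items()})
```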

Generalization
Multimodal models are often trained on a specific collection of data and may not generalize well to new data. To assess how well the model can be generalized, it should be tested and validated with data that are very different from the training data [21][22][23].

Interpretability
Because of their complexity and the relationships between multiple modalities, Multimodal ML algorithms can be difficult to understand and even more difficult to explain and interpret. To decipher the decision process of the model, tools such as attention mechanisms and visualization can be used [21–23,48].

Scalability
In Multimodal Machine Learning, scalability is critical because it enables models to deal with real-world situations where datasets are large and complex, and the amount of data is constantly growing. To ensure that the models can cope with the increase in data volume and complexity in the future, it is necessary to develop scalable models that enable effective training and deployment and reduce computational costs [25–27,48].

Multimodal ML and Other Technologies: Borderlines
Multimodal Machine Learning is a new and rapidly growing discipline that focuses on building models that can learn from a variety of data sources. To distinguish Multimodal ML from other areas of Machine Learning, its characteristic aspects should be highlighted, such as the use of many modalities and the need for effective integration of these modalities. Establishing precise terminology and creating an understandable description of the field will help to differentiate it from other techniques. However, because it is a relatively new field, there may be an overlap with other areas of Machine Learning, and it will be critical to accurately define the boundaries of Multimodal ML as the topic evolves.

Multimodal ML vs. Multimodal Datasets
Multimodal datasets are datasets acquired with different sensors, instruments, technologies, or devices to observe a common phenomenon, where the acquired data are considered complementary [49]. Consequently, multimodal datasets define the data itself, regardless of the identity of the algorithms used to analyze the data, whether they have a multimodal or unimodal architecture. However, merging multimodal datasets and unifying their representation into a single vector and then analyzing them with an ML model is considered an early fusion approach that is a type of Multimodal ML.

Multimodal ML vs. Multilabel Models
Multilabel Machine Learning algorithms are used to analyze datasets with more than one target variable. For example, the output of multilabel classification models consists of multiple classification labels, so a given input may be assigned more than one label. For example, predicting the category of a movie may result in horror, action, science fiction, drama, or some or all of these categories simultaneously. In other words, multilabel classification associates data with a set of labels. By contrast, single-label classification involves learning from a set of examples, each associated with a single label "l" from a set of disjoint labels "L", where |L| > 1. When |L| = 2, the learning problem is called a binary classification problem, and when |L| > 2, it is called a multiclass classification problem [50,51]. Thus, Multimodal ML and multilabel learning differ in the data structure itself, where the former uses data from multiple or different sources to obtain a single result, while the latter typically uses data from only one source and assigns each input a set of labels rather than a single one.
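The difference between a single-label (multiclass) target and a multilabel target can be made concrete with the movie example. The sketch below is only an illustration of the two target encodings; the label set and the helper names are assumptions introduced here.

```python
# Hypothetical label set L taken from the movie example.
LABELS = ["horror", "action", "science fiction", "drama"]


def multiclass_target(label):
    """Single-label target: exactly one class index out of |L| classes."""
    return LABELS.index(label)


def multilabel_target(labels):
    """Multilabel target: one binary indicator per label, several may be 1."""
    return [1 if name in labels else 0 for name in LABELS]


print(multiclass_target("action"))
print(multilabel_target({"horror", "drama"}))
print(multilabel_target(set(LABELS)))
```

In both cases the input comes from a single source; only the shape of the target changes, which is what separates these settings from Multimodal ML.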

Multimodal ML vs. Ensemble Learning
The goal of ensemble Machine Learning is to improve performance and accuracy by combining numerous models into a single prediction. When making predictions, ensemble learning uses multiple interconnected models rather than a single model. Ensemble learning combines the predictions of many models with the goal that the combined predictions are more accurate and robust than any single model. There are several types of ensemble learning techniques, including [52,53]:
• Bagging (Bootstrap Aggregating) is the process of training several models using random subsets of the training data to minimize overfitting;
• Boosting is a technique in which models are trained progressively, and the weights of misclassified data points are raised to enhance performance;
• Stacking is the process of training many models and combining their predictions with another model to obtain the final forecast.
Ensemble Learning has proven useful in a variety of applications, including classification, regression, and anomaly detection. Following this, although Ensemble Learning uses multiple ML models to solve one task, the main difference between these two technologies is that Multimodal ML is able to analyze more than one dataset with more than one model to solve a task, while Ensemble Learning uses multiple models for the same dataset to solve a task. Therefore, unlike Multimodal ML, Ensemble Learning does not perform data fusion to solve the task. Table 3 below summarizes the comparison between Multimodal ML and other technologies.
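To make the contrast concrete, the sketch below shows bagging on a single one-dimensional dataset: every base model is trained on a bootstrap sample of the same data, and no fusion of different sources takes place. The threshold "stump" is a deliberately simple, hypothetical base learner chosen for brevity, and the data values are invented.

```python
import random
from collections import Counter


def fit_stump(samples):
    """Fit a 1-D threshold: the midpoint between the two class means."""
    pos = [x for x, y in samples if y == 1]
    neg = [x for x, y in samples if y == 0]
    if not pos or not neg:  # degenerate bootstrap sample: use the overall mean
        return sum(x for x, _ in samples) / len(samples)
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def bagging_fit(samples, n_models=5, seed=0):
    """Train each stump on a bootstrap sample drawn from the SAME dataset."""
    rng = random.Random(seed)
    return [
        fit_stump([rng.choice(samples) for _ in samples])
        for _ in range(n_models)
    ]


def bagging_predict(thresholds, x):
    """Combine the base models' predictions by majority vote."""
    votes = Counter(int(x > t) for t in thresholds)
    return votes.most_common(1)[0][0]


# One dataset, one modality: (feature value, label). Illustrative values only.
data = [(1.0, 0), (2.0, 0), (3.0, 0), (7.0, 1), (8.0, 1), (9.0, 1)]
models = bagging_fit(data)
print(bagging_predict(models, 0.5), bagging_predict(models, 9.5))
```

A late-fusion Multimodal model would also aggregate several predictions, but each base model would be trained on a different data source rather than on resampled copies of one dataset.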

Multimodal ML Available Frameworks
Multimodal Machine Learning frameworks provide a systematic approach for developing models that can learn and integrate information from multiple modalities such as text, audio, images, and other data types. As more and more data are created across multiple modalities, multimodal frameworks for Machine Learning are becoming increasingly important. These frameworks enable the integration of diverse information, allowing for a more comprehensive understanding of complicated events. They are used in everything from speech recognition and natural language processing to image and video analysis. Some of the existing and commonly used Multimodal ML frameworks are:
• MMF (a framework for multimodal AI models) [54]: a PyTorch-based modular framework. MMF comes with cutting-edge vision and language pretrained models, a number of ready-to-use standard datasets, common layers and model components, and training and inference utilities. MMF is also utilized by various Facebook product teams for multimodal understanding use cases, allowing them to swiftly put research into production;
• TinyM2Net (a flexible system, algorithm co-designed multimodal learning framework for tiny devices) [55]: a multimodal learning framework that can handle multimodal inputs of images and audio and can be re-configured for individual application needs. TinyM2Net also enables the system and algorithm to incorporate fresh sensor data that are tailored to a variety of real-world settings. The suggested framework is built on a convolutional neural network, which has previously been recognized as one of the most promising methodologies for audio and visual data classification;
• [59]: a technique for addressing multimodal analytics within a single data processing approach in order to obtain a streamlined architecture that can fully use the potential of Big Data infrastructures' parallel processing.

Training and Evaluation of Multimodal ML Algorithms
Multimodal Machine Learning is a technique that combines different modalities in an attempt to solve a complex task. Given that Multimodal ML is based on the concept of data fusion [46], the training process of a multimodal model may differ depending on the type of fusion (early, intermediate, or late fusion). Because it is a Machine Learning concept, it follows the familiar ML workflow of data preprocessing, model selection, model training, evaluation, fine-tuning, and deployment, but the steps may differ depending on the fusion stage.
First, in the case of early fusion, after preprocessing, the different datasets can be combined and merged into one modality. Once the data are ready and fused, it can be fed into the model to be trained, and then the other steps can be performed. In the second case, called intermediate fusion, the data passed to the same model are merged after preprocessing, then a single model is trained on the fused dataset, and later, the result of the refined model is fused with other models if they exist. Finally, in the late fusion approach, each dataset is passed to a different model after preprocessing, then the models are trained, evaluated, and fine-tuned, and later, the results are merged into a single result. The three approaches are shown in Figure 4 below.
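The intermediate case is the least intuitive of the three, so a small sketch may help. It assumes one common reading of intermediate fusion, in which each modality is first mapped to a representation of the same size and the representations are then combined before a shared prediction step. The projection matrices, head weights, and feature values below are hypothetical placeholders for parameters that would normally be learned.

```python
def encode(features, projection):
    """Map one modality to a shared-size representation (matrix-vector product)."""
    return [sum(f * w for f, w in zip(features, row)) for row in projection]


def intermediate_fusion(modalities, projections, head_weights):
    """Encode each modality, fuse the representations, then apply a shared head."""
    representations = [encode(m, p) for m, p in zip(modalities, projections)]
    # Fuse by element-wise averaging; all representations have the same length.
    fused = [sum(values) / len(values) for values in zip(*representations)]
    score = sum(r * w for r, w in zip(fused, head_weights))
    return score, fused


# Hypothetical inputs with different dimensionality (illustrative values only).
image_features = [1.0, 2.0, 3.0]
signal_features = [4.0, 2.0]

# Hypothetical encoders: each projects its modality to a 2-value representation.
image_projection = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
signal_projection = [[0.5, 0.0], [0.0, 0.5]]

score, fused = intermediate_fusion(
    [image_features, signal_features],
    [image_projection, signal_projection],
    head_weights=[1.0, 1.0],
)
print(fused, score)
```

Compared with early fusion, the raw inputs are never concatenated; compared with late fusion, the modalities are combined before any decision is produced.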
The evaluation of a Multimodal ML model is also influenced by the chosen data fusion approach. Since early fusion applies a single model to the fused data sources, only a single evaluation is required. In the other two approaches, intermediate and late fusion, each individual model must be evaluated first, and the final model that merges the different models must be evaluated afterwards. The performance measures used to evaluate Multimodal ML models are those commonly used in the classical ML domain, such as accuracy, precision, recall (sensitivity), specificity, F1 Score, Area Under Curve (AUC), and others [44]. The evaluation step is also shown in Figure 4.
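As a reminder of how the measures listed above are computed for a binary classifier, the following is a minimal sketch in plain Python. The labels and scores are invented, the 0.5 decision threshold is an assumption, and the AUC is computed with the pairwise-ranking definition (the fraction of positive/negative pairs in which the positive case receives the higher score).

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall (sensitivity), specificity and F1 from binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }


def roc_auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented ground truth and model scores for eight cases.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

In an intermediate or late fusion setting, the same functions would be applied once per individual model and once more to the fused output.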

Results: Multimodal ML in Action
Multimodal Machine Learning is a rapidly growing research area that involves the use of many modalities to evaluate and interpret complicated data, such as images, audio, and text [5,47]. Numerous real-world applications, including self-driving vehicles, voice recognition software, and medical imaging, require the ability to integrate and analyze data from multiple sources. Multimodal ML is based on the notion that multiple modalities provide complementary information and that merging these modalities can lead to more accurate and robust models. Multimodal ML has been a hot topic in the scientific community in recent years, and researchers have been striving to develop new algorithms and strategies to improve its performance [5,60–62].

Multimodal ML: Fields of Implementation
The ability to analyze diverse and complementary data increases the success of Machine Learning algorithms in solving more complex problems. In this context, Multimodal ML has proven its success in a variety of domains. Some of the most promising application areas include [5,60–64]:
• Healthcare: in medical imaging, Multimodal ML can be used to integrate information from different imaging modalities such as MRI, CT, and PET scans to improve diagnosis and treatment planning. It can also be used to classify and predict disease based on a mix of clinical, genetic, and imaging data;
• Autonomous Vehicles: by combining data from numerous sensors, Multimodal ML can help self-driving vehicles better understand their surroundings. This has the potential to improve object recognition, navigation, and safety;
• Natural Language Processing: by blending audio and text data, Multimodal ML can improve speech recognition and natural language comprehension. This can help voice assistants, chatbots, and other applications improve their accuracy;
• Robotics: by combining inputs from sensors such as cameras, microphones, and touch sensors, Multimodal ML can be used to improve robot perception and interaction. This has the potential to improve navigation, object recognition, and human-robot interaction;
• Education: this technology is used in education to analyze student data from numerous sources, such as exams, quizzes, and essays, to make individualized learning suggestions and improve student performance;
• Agriculture: this technology can revolutionize agriculture by enabling the optimization of farming processes. It can be used for crop yield prediction, pest and disease detection, precision agriculture, and crop optimization by combining data from multiple sources, such as satellite imagery, weather data, and soil moisture sensors;
• Internet of Things (IoT): this technology can be used in the context of the Internet of Things to make better use of data provided by networked devices. Multimodal ML can enable more accurate and robust models for predicting, monitoring, and managing IoT systems by incorporating data from many sources, such as sensors, cameras, and audio recordings, leading to advances in areas such as energy management, transportation, and smart cities.

Multimodal ML in Healthcare
Multimodal ML is still in its infancy but has been studied and applied in many areas of life, including healthcare. Because health information is inherently heterogeneous, Multimodal ML is an effective method for assessing health data from multiple sources and improving predictive ability [5,62,64]. The review in [5] reports 128 applications of Multimodal ML in healthcare, with neurology and cancer being the most prevalent. Multimodal ML has shown promising results in various medical areas, as illustrated in Figure 5. While the areas depicted in the figure are the most commonly studied to date, the potential applications of Multimodal ML extend beyond these domains.

Multimodal ML and Cardiovascular Diseases: State-of-the-Art
Cardiovascular Disease (CVD), the leading cause of death worldwide, is a topic of interest for Multimodal ML implementations. For example, in [65], the authors developed a multimodal data fusion ML model to predict hypertension. Using a Convolutional Neural Network (CNN)-based model, they analyzed different Electronic Health Records (EHRs) that were merged with the multimodal data fusion approach. Their model reached an accuracy of 94%. In a similar approach, the authors in [66] created a multimodal data fusion model to predict 30-day hospital readmission of patients with heart failure. For this purpose, they developed Deep Unified Networks (DUNs) trained with EHRs from the Enterprise Data Warehouse (EDW) and the Research Patient Data Repository (RPDR). Their model achieved an accuracy of 76.4%. The study [67] also implemented a data fusion model, in this case to cluster patients with hypertension. The authors proposed a novel model based on Hybrid Non-Negative Matrix Factorization (HNMF), trained with phenotype and genotype information from the HyperGen dataset [68]. The accuracy of their proposed model reached up to 96%. A data fusion model was also developed in [69] with the goal of classifying different CVDs. The authors developed and trained a Text-Image Embedding network (TieNet) model with chest X-rays and free-text radiology reports extracted from the ChestX-Ray14 [70] and OpenI [71] chest X-ray datasets. The proposed model had a reported Area Under Curve (AUC) of 0.9. In the same context, the solution proposed in [72] is a data fusion model developed to classify patients at potential cardiovascular risk. The model was based on Recurrent Neural Networks and trained on EHR data, achieving 96% accuracy.
Other implementations proposed model fusion or hybrid Multimodal ML architectures to solve their problems. For example, in [73], the authors proposed a hybrid fusion Multimodal ML approach to predict various cardiothoracic conditions such as atelectasis, pleural effusion, cardiomegaly, and edema. They created several ML models to analyze radiographs and associated reports obtained from the MIMIC-CXR [74] and OpenI [71] chest X-ray datasets. Their proposed solution proved to be more accurate than earlier implementations. Similarly, in [75], a multimodal unsupervised learning approach was proposed for Cardiometabolic Syndrome detection. The authors applied multimodal hybrid fusion by combining unsupervised ML models to analyze fused data from metabolome, microbiome, genetics, and advanced imaging. Furthermore, in [76], the authors proposed a multimodal fusion-based ML model for stroke prediction. They fused a 3D Convolutional Neural Network and a Multilayer Perceptron to analyze neuroimaging information and clinical metadata extracted from the Hotter [77] dataset, achieving an AUC of 0.90. In addition, the solution proposed in [78] was used to predict Pulmonary Embolism (PE) by fusing multiple ML models trained with Computed Tomography Pulmonary Angiography scans and EHRs. Their model recorded an AUC of 0.947. Finally, in [79], the authors developed a Recurrent Neural Network model with Bidirectional Long Short-Term Memory (BiLSTM) to predict cardiovascular risk. Their model was trained with EHR data extracted from the Second Manifestations of ARTerial Disease (SMART) Study [80] and recorded an AUC of 0.847.
In [81], the authors developed a data fusion model to predict Acute Ischemic Stroke. They used a series of cardiac CT images with EHR recordings to train a Gradient Boosting classifier that achieved an AUC of 0.856. Similarly, the study [82] proposed a Deep Convolutional Neural Network (DCNN) data fusion model to analyze Electrocardiogram (ECG) and chest X-ray images to predict Accessory Pathways (APs) syndrome. Finally, in [83], the authors proposed a novel tensor-based dimensionality reduction method combined with Naive Bayes, SVM, Random Forest, AdaBoost, and LUCCK models. The models were trained with fused data composed of salient physiological signals and EHR data. Their solution was able to predict hemodynamic decompensation with an AUC of 0.89. Table 4 summarizes the Multimodal ML implementations in CVDs.

Multimodal ML and CVDs: Discussion
Multimodal ML is a method for training models on heterogeneous data that do not share the single structure, format, or type expected by traditional ML algorithms. In the field of disease diagnosis, Multimodal ML could be used to train models on a large distributed dataset of patient data from different hospitals or clinics. This method allows information and knowledge to be fused to solve complex problems. Using a larger, more diverse dataset also allows for more accurate and robust models. The implementation of Multimodal ML for disease prediction, especially for Cardiovascular Disease, can be discussed from different angles, which are detailed in this section.

Models Performance: Competition between Multimodal and Classical ML
Data collection is the starting point of the classical ML pipeline. It is generally accepted that more data can increase the accuracy of a Machine Learning model. By extension, because Multimodal ML can analyze more, and more heterogeneous, data simultaneously, its models are expected to be more accurate than typical ML models.
In this context, the results presented in Table 4 reflect the feasibility and accuracy with which Multimodal ML copes with the diagnosis and prediction of Cardiovascular Disease. For example, the studies [65,67,72] achieved high accuracy, with the first recording 94.8% and the other two 96%. These results compare favorably with the state of the art of conventional ML models used for the detection and prediction of CVDs and cerebrovascular events, where the highest recorded accuracy reaches 91.80%, as shown in [84]. In addition, the studies [69,76,78] recorded high values for Area Under Curve (AUC), with the first and second reaching 0.9 and the third up to 0.95. These values are consistent with, and in some cases exceed, those of conventional ML algorithms. Moreover, the authors of [73] report improved classification accuracy compared to conventional ML algorithms.
On the other hand, the results in [66] failed to outperform or even match conventional ML algorithms: the recorded accuracy was 76.4%, which is lower than the values obtained by the latest ML algorithms in predicting CVDs [84]. In addition, the studies [79,81,83] obtained AUC values of 0.85, 0.86, and 0.89, respectively. These values are high, and they come close to the best results obtained with classical ML models without exceeding them. Finally, the studies [75,82] did not report the results obtained, which makes it impossible to compare them with the classical ML models in the field of CVD diagnosis.
Overall, of the thirteen studies presented in Table 4, seven exceeded the results of classical ML, three matched those results, one was clearly lower, and the remaining two are not comparable because they did not report their results. These figures support the hypothesis that the ability to analyze heterogeneous data increases the performance and accuracy of the models, which is a major strength of Multimodal ML: more than three-quarters of the Multimodal ML algorithms either match or exceed the results of classical ML in the diagnosis of Cardiovascular Disease.
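The proportion quoted above can be checked with a short tally. The grouping of reference numbers below follows the comparison made in this subsection; the category labels are chosen here only for illustration.

```python
# Reference numbers of the thirteen studies, grouped as in the discussion above.
outcomes = {
    "exceeded": [65, 67, 69, 72, 73, 76, 78],
    "matched": [79, 81, 83],
    "lower": [66],
    "not reported": [75, 82],
}

total = sum(len(refs) for refs in outcomes.values())
match_or_exceed = len(outcomes["exceeded"]) + len(outcomes["matched"])
share = match_or_exceed / total

print(f"{match_or_exceed} of {total} studies ({share:.1%}) match or exceed classical ML")
```

Ten of thirteen studies corresponds to roughly 77%, which is the basis for the "more than three-quarters" statement.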

Real World vs. Research Implementations
The concept of Multimodal ML can be traced back to the early 2000s in the technology field, where authors in [85] suggested using this concept because the combination of communication modalities and acquisition devices can produce a wide range of unimodal and multimodal interface techniques. However, advances in computer technologies, data transmission, communication techniques, and other aspects have helped to increase the efficiency of Multimodal ML technology.
Among the reviewed implementations, studies [65,75,85] used their own data. Although these datasets are not publicly available, the authors state that they are real datasets collected from various health centers in compliance with medical standards and norms. These studies can therefore be classified as real-world studies. The same is true for [66,72,78,81,83], where each study used a dataset collected in different medical facilities in compliance with standard medical norms, making these studies real-world implementations.
On the other hand, the studies [67,69,73,76,79] used publicly available datasets, which are listed in Table 4. Although these datasets were collected under real-world conditions and obtained from patients, the studies themselves cannot be described as real-world implementations. Real-world use of Multimodal ML models in healthcare can provide a number of significant benefits, including:
• Improved diagnostic accuracy: Multimodal ML models can evaluate multiple sources of patient data, such as medical imaging, electronic health records, and genetic information, to make more accurate and thorough diagnoses. This can help physicians identify diseases and conditions at an early stage when they are more curable;
• Personalized treatment: Multimodal ML models can be trained on large datasets to identify trends and predict outcomes for individual patients. This can help physicians tailor treatments and therapies to the unique needs of each patient, leading to better outcomes and fewer side effects;
• Efficient resource allocation: Multimodal ML models can help physicians allocate resources more efficiently by identifying patients who are at higher risk for poor outcomes or need more intensive care. This has the potential to reduce healthcare costs while improving overall system efficiency;
• Improved patient experience: Multimodal ML models can help clinicians identify patients who need more individualized care or are at risk for problems or adverse events. This can help improve patient satisfaction and overall quality of care.
Overall, real-world adoption of Multimodal ML models in healthcare has the potential to enhance patient outcomes, lower costs, and improve healthcare delivery efficiency. However, it is critical that these models be created and used in an ethical manner, with proper protections for patient privacy and data security. That being said, the progress of Multimodal ML implementations and their real-world execution is promising: most of the reviewed applications were carried out outside of labs with real data, which enhances trust in this technology and supports its adoption in production.

Use of Smart Wearables and IoTs
Continuous monitoring of patients' heart rate, blood pressure and other biometric data through smart wearables and Internet of Things devices could revolutionize medical treatment. This has the potential to enable earlier detection of medical problems, more accurate diagnosis, and more personalized treatment approaches. Wearable technologies that can monitor and interact with the user's health could enable individuals to participate more fully in their treatment. In addition, Internet of Things (IoT) devices can enable physicians to monitor patients remotely and deliver treatments more effectively, reducing demand on healthcare systems and improving access to care for people in underserved or extremely remote and isolated areas. Smart wearables and Internet of Things (IoT) devices could increase hospital efficiency, save costs, and improve patient outcomes [86,87].
However, only studies [67,75] considered the use of smart wearables or IoT devices in their implementations. The other studies used data collected with other devices. Therefore, there is a lot of catching up to do in the implementation of Multimodal ML in wearables and IoT for CVD detection and prediction. Considering that these technologies can revolutionize healthcare, as mentioned earlier, there is a great need to increase the use of wearables and IoT in this field. Table 5 summarizes, for the state of the art in predicting CVDs with Multimodal ML, the comparison between the performance of Multimodal ML and classical ML, the validation in practice, and the use of smart wearables and IoT.

Limitations in the Use of Multimodal ML for Disease Prediction
From this perspective, the use of Multimodal Machine Learning for the diagnosis and prognosis of CVDs is still in its infancy. Not all implementations of Multimodal ML are superior to traditional ML models, and not all of them have been validated in real-world settings. Moreover, Multimodal ML researchers have rarely used smart wearables or IoT devices in their experiments. This highlights the need to further investigate the use of such technologies, given their high degree of practicality and applicability in the field. Other limitations and difficulties encountered in the field of Multimodal ML and its applications in disease prediction are discussed in Section 4.1.

Multimodal ML in CVDs: A Technical Overview
In Multimodal Machine Learning technology, the main goal is to analyze different data with different structures, such as merging EHR data with medical images to predict the occurrence of Cardiovascular Disease. In this context, each Multimodal ML implementation follows its own workflow and goes through its own steps to achieve its goal. In the aforementioned implementations of Cardiovascular Disease detection using Multimodal ML, different workflows, model structures, and hyperparameters were used for different implementations. All the related data provided by the authors are listed in Table 6 below.

Discussion: Challenges and Future Perspectives
Recently, Multimodal Machine Learning (Multimodal ML) has emerged as an effective method for studying and analyzing complex data from multiple sources and modalities. However, dealing with diverse data presents researchers with unique challenges that must be overcome for efficient analysis and interpretation and to increase the feasibility and usability of Multimodal ML [10,48,49,62]. Unifying and standardizing multiple data sources and establishing links between them are significant obstacles. In addition, data must be normalized and preprocessed to ensure reliability and accuracy. Future research could take several approaches to mitigate these challenges. This section addresses these issues and identifies the future perspectives needed to overcome them and improve Multimodal ML.

Challenges
Multimodal Machine Learning still struggles with various challenges arising from the use of heterogeneous data with different structures and formats. Moreover, the fusion process, whether applied to the data itself or to different trained models in order to produce a single result, is challenging and requires further research. The most common challenges can be summarized in the following points [10,48,49,62].

Data Availability and Quality
To efficiently train multimodal ML models, large amounts of high-quality data are needed. However, collecting and processing large amounts of high-quality data in healthcare can be challenging, especially for rare or complex diseases. Data scarcity or poor data quality can lead to biased or unreliable models, compromising the accuracy of predictions and treatment decisions. To develop more robust and effective multimodal ML models for healthcare, researchers must seek to identify and address data quality and quantity issues.

Data Representation
Multimodal ML promotes the use of data from multiple sources for representation. As a result, there is a high likelihood of dealing with heterogeneous data, which presents a number of problems. For example, it may be difficult to merge heterogeneous data that do not overlap in common characteristics or overlap only in a very limited area. In addition, data from different sources may need to be processed to different extents, especially with respect to noise reduction and missing data management. This hurdle is clearly reflected in the fact that, until recently, most multimodal representations were simply the concatenation of unimodal ones [88]. Smoothness, temporal and spatial coherence, sparsity, and natural grouping have been cited by the authors in [89] as qualities of a good data representation.

Data Integration and Interoperability
Multimodal Machine Learning models are used to integrate and analyze data from multiple sources, such as electronic health records, medical imaging, and genetic data. However, data from different sources may use different formats, standards, or terminologies, posing significant challenges for data integration and interoperability. Medical images, for example, may use different file formats or imaging techniques, making it difficult to compare and analyze data from different studies or sources.
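To make the interoperability problem concrete, the sketch below maps two records that encode the same measurements with different field names and units onto one common schema. Everything in it is hypothetical: the site names, field names, and values are invented, and the target schema is an assumption rather than an existing standard. Only the unit conversion factors are fixed physical constants.

```python
KPA_TO_MMHG = 7.50062   # 1 kilopascal expressed in millimetres of mercury
LB_TO_KG = 0.45359237   # 1 pound expressed in kilograms

# Hypothetical records from two sites describing the same kind of measurement.
site_a_record = {"patient": "A-001", "sbp_mmHg": 142, "weight_kg": 80.0}
site_b_record = {"id": "B-17", "systolic_kPa": 18.0, "weight_lb": 176.0}


def harmonize_site_a(record):
    """Site A already uses the target units; only the field names change."""
    return {
        "patient_id": record["patient"],
        "systolic_mmHg": float(record["sbp_mmHg"]),
        "weight_kg": float(record["weight_kg"]),
    }


def harmonize_site_b(record):
    """Site B needs both renamed fields and unit conversion."""
    return {
        "patient_id": record["id"],
        "systolic_mmHg": record["systolic_kPa"] * KPA_TO_MMHG,
        "weight_kg": record["weight_lb"] * LB_TO_KG,
    }


harmonized = [harmonize_site_a(site_a_record), harmonize_site_b(site_b_record)]
for row in harmonized:
    print(row)
```

Once every source has such a mapping, downstream fusion code can rely on a single set of field names and units.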

Fusion
Learning how to merge information from two or more modalities and determining the optimal fusion strategy is not easy. This is due to the different predictive capacities and noise structures of the information coming from different sources. In addition, the ability to deal with missing data at different levels has a significant impact on the ability to perform fusion tasks.
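One simple way to cope with a missing modality at prediction time, shown here only as an illustrative sketch, is to fuse the predictions that are available and renormalize the weights. The modality names, weights, and scores are invented for the example, and the weighted-average rule is an assumption, not a method taken from the cited studies.

```python
def fuse_available(predictions, weights):
    """Weighted average over the modalities that produced a prediction.

    A value of None marks a missing modality; its weight is dropped and the
    remaining weights are renormalized.
    """
    available = {name: p for name, p in predictions.items() if p is not None}
    if not available:
        raise ValueError("no modality produced a prediction")
    total_weight = sum(weights[name] for name in available)
    return sum(weights[name] * p for name, p in available.items()) / total_weight


# Hypothetical per-modality weights (e.g. reflecting validation performance).
weights = {"ehr": 0.5, "image": 0.3, "ecg": 0.2}

complete = fuse_available({"ehr": 0.8, "image": 0.6, "ecg": 0.4}, weights)
image_missing = fuse_available({"ehr": 0.8, "image": None, "ecg": 0.4}, weights)
print(complete, image_missing)
```

The weights encode the different predictive capacities mentioned above; how to choose them well is part of the open problem.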

Translation
The challenge in translation is not only the heterogeneity of data but also the relationships between modalities. The translation or mapping of data is subjective; for example, two models may describe the same image in more than one correct way, and a perfect or uniform translation or mapping may not exist. Several studies argue that while translations can be quite broad and modality-specific, they still have a number of unifying features. Accordingly, there are two forms of translation, namely the "Example-Based" and the "Generative" models. The former relies on a dictionary to translate data across modalities, while the latter relies on the creation of a model that manages translation according to uniform or at least explicit standards.
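The example-based form of translation can be sketched as a nearest-neighbor lookup in a dictionary of paired examples. In this illustration, the two-dimensional "image feature" vectors and the report phrases are invented placeholders, and Euclidean distance is an assumed similarity measure.

```python
import math

# Hypothetical dictionary of paired examples: (image feature vector, report phrase).
paired_examples = [
    ([0.9, 0.1], "enlarged cardiac silhouette"),
    ([0.1, 0.9], "clear lung fields"),
    ([0.5, 0.5], "no acute findings"),
]


def translate_by_example(query, dictionary):
    """Return the target-modality entry paired with the closest source example."""
    _, phrase = min(dictionary, key=lambda pair: math.dist(query, pair[0]))
    return phrase


translation = translate_by_example([0.8, 0.2], paired_examples)
print(translation)
```

A generative approach would instead train a model that produces the target modality directly, so its output is not limited to phrases already stored in the dictionary.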

Alignment
Finding connections and correspondences between subelements from two or more different modalities is called multimodal alignment. This also involves distinguishing between these linear connections rather than just recognizing them. In this context, there are few data sets with obvious and identifiable correlations. Therefore, it is challenging to perform similarity measurements across modalities. Moreover, there may be numerous alignments without being able to select the optimal one, and not all components in one modality may match in another.
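For sequential modalities, one widely used way to find such correspondences is dynamic time warping (DTW). The text above does not name a specific technique, so DTW is used here purely as an illustration, and the two short sequences are invented. The sketch returns both the alignment cost and the matched index pairs.

```python
def dtw_alignment(a, b):
    """Dynamic time warping between two 1-D sequences.

    Returns (total cost, list of (index in a, index in b) pairs).
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])

    # Backtrack from the end, always moving to the cheapest predecessor cell.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return cost[n][m], path


# The second sequence repeats its first sample, as if sampled at a different rate.
seq_a = [0, 1, 2, 1, 0]
seq_b = [0, 0, 1, 2, 1, 0]
total_cost, pairs = dtw_alignment(seq_a, seq_b)
print(total_cost, pairs)
```

Note that one element of `seq_a` is matched to two elements of `seq_b`, which illustrates the point that components in one modality do not always have a one-to-one counterpart in another.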

Explainability and Interpretability
Multimodal Machine Learning (ML) models have shown great promise in healthcare by enabling more accurate and tailored diagnosis and treatment recommendations. However, these models can be very complex, making it difficult for physicians to understand how they arrived at a particular decision or recommendation. The lack of interpretability and transparency of these models can affect their clinical acceptance and the confidence placed in them.

Co-Learning
Merging different modalities, such as images, text, and sensor data, can increase model performance and enable more comprehensive analysis of complicated data in Multimodal Machine Learning. However, there are significant hurdles to this fusion, including the difficulty of transferring knowledge, representation, and predictive models across modalities. Each modality has its own characteristics and advantages, and it can be difficult to successfully integrate these aspects into a coherent representation. In addition, different modalities may require different strategies for feature engineering, preprocessing, and modeling.

Increased Computation Cost
When multiple modalities and features are introduced into a Multimodal Machine Learning model, the complexity of the model may increase, and the performance of the model may degrade due to the increased difficulty in computing the desired outcome. Complex models have higher processing requirements, which can increase inference times and memory consumption. The complexity of a model also makes it more difficult to optimize, which can lead to an increased risk of over- or under-fitting the data.

Regulatory and Ethical Considerations
Apart from the technical hurdles in developing and implementing Multimodal ML models in healthcare, there are also legal and ethical factors to consider. Depending on their intended use, these models may be subject to regulatory restrictions, such as the European Union's General Data Protection Regulation (GDPR) [90], the Cyber Security Law of the People's Republic of China [91], the General Principles of the Civil Law of the People's Republic of China [92], Singapore's Personal Data Protection Act (PDPA) [93], and numerous other principles that apply around the world. In addition, researchers and clinicians must ensure that these models are created and used in an ethical manner and that patient privacy and data security are adequately protected. For example, patient data must be de-identified and protected from unauthorized access or disclosure. Maintaining the fairness and transparency of these models is also critical to minimize bias and discrimination. Responsible development and adoption of Multimodal ML models therefore require careful evaluation of these legal and ethical factors to ensure that they deliver safe, effective, and fair outcomes for patients.
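As a rough illustration of one building block of de-identification, the sketch below drops direct identifiers from a record and replaces the patient identifier with a keyed hash. The field names, the record, and the key are hypothetical. This is pseudonymization only: it is an assumption-laden example and should not be read as sufficient for compliance with any of the regulations listed above.

```python
import hashlib
import hmac

# Hypothetical names of fields that identify a person directly.
DIRECT_IDENTIFIERS = {"name", "address", "phone"}


def pseudonymize(record, secret_key, id_field="patient_id"):
    """Drop direct identifiers and replace the patient ID with a keyed SHA-256 hash."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    digest = hmac.new(secret_key, str(record[id_field]).encode("utf-8"), hashlib.sha256)
    cleaned[id_field] = digest.hexdigest()
    return cleaned


# Invented record and key, for illustration only.
record = {"patient_id": "12345", "name": "Jane Doe", "phone": "555-0100", "sbp_mmHg": 142}
key = b"replace-with-a-secret-key-kept-outside-the-dataset"

safe_record = pseudonymize(record, key)
print(safe_record)
```

Because the same key always yields the same pseudonym, records of one patient can still be linked across modalities before fusion, while the original identifier is not stored in the fused dataset.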

Implementation and Adoption
To fully deliver on their promise to improve healthcare, Multimodal Machine Learning (ML) models must be integrated into current healthcare processes and systems. However, several technological, organizational, and cultural barriers stand in the way of this integration. In addition to the technical challenges mentioned above, resistance to change, lack of stakeholder participation, and concerns about accountability and obligations are all examples of organizational and cultural hurdles that may arise.
These challenges give rise to the study questions in the list below (the abbreviation RQ in the list below refers to the term "research question"):

Future Perspectives
The challenges faced in Multimodal Machine Learning can be addressed through different approaches and perspectives. Some of these solutions have already been considered but should be more widely used in the field of Cardiovascular Disease prediction to improve their usability and feasibility. In this context, the following solutions can serve as future recommendations.

Use Convenient Tools to Collect More Data
Modern technology has changed the method of data collection and analysis. The use of smart wearables and Internet-of-Things (IoT) devices has enabled the real-time collection of vast amounts of data [33,39,86,87]. These data can provide useful insights in a variety of areas, particularly in healthcare. In addition to these new data sources, current data sources should be used to create more complete databases. Researchers can gain access to larger and more diverse data sets by collaborating with other institutions, which can help them identify patterns and correlations that would not be obvious with smaller data sets. Collaboration between different institutions could be achieved using a variety of techniques such as Federated Machine Learning technology, which can help train Machine Learning models by sharing parameters rather than the data itself [9].
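The idea of sharing parameters rather than data can be sketched with federated averaging, one common aggregation rule in Federated Learning. The text does not prescribe a specific rule, so this choice is an assumption, and the two "hospitals" with their parameter vectors and sample counts are invented.

```python
def federated_average(client_updates):
    """Sample-weighted average of client parameter vectors.

    client_updates is a list of (parameters, number_of_local_samples) pairs.
    Only these pairs leave each institution; the raw patient data never do.
    """
    total_samples = sum(n for _, n in client_updates)
    dim = len(client_updates[0][0])
    return [
        sum(params[i] * n for params, n in client_updates) / total_samples
        for i in range(dim)
    ]


# Hypothetical locally trained parameters from two institutions.
hospital_a = ([1.0, 2.0], 100)   # trained on 100 local samples
hospital_b = ([3.0, 4.0], 300)   # trained on 300 local samples

global_params = federated_average([hospital_a, hospital_b])
print(global_params)
```

In a full system this averaging step would be repeated over many rounds, with the averaged parameters sent back to each institution for further local training.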

Automate and Boost Data Preprocessing
Creating larger and more comprehensive datasets could help improve the quality of Machine Learning models but is not sufficient on its own. To gain valuable insights, data must be processed and analyzed using advanced techniques. These techniques include automated artifact and noise removal, as performed in [94,95]. In addition, it may be necessary to use techniques such as data augmentation [96], data normalization [97], and data resampling [98] to ensure that the data are balanced and ready for model training and to improve the quality of the overall process.
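Two of the preprocessing steps named above, normalization and resampling, can be illustrated with a minimal sketch. Z-score scaling and random oversampling of the minority class are used here as assumed, simple representatives of each family; the cited works may use different variants, and the data are invented.

```python
import random
from statistics import mean, pstdev


def z_score(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen minority-class samples until all classes are equal in size."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    target = max(len(items) for items in by_class.values())
    out_samples, out_labels = [], []
    for label, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        out_samples.extend(items + extra)
        out_labels.extend([label] * target)
    return out_samples, out_labels


# Invented example: one numeric feature and an imbalanced binary label.
normalized = z_score([1, 2, 3])
balanced_x, balanced_y = random_oversample([10, 11, 12, 13, 99], [0, 0, 0, 0, 1])
print(normalized, balanced_y.count(0), balanced_y.count(1))
```

In a multimodal pipeline, normalization would typically be applied per modality before fusion, so that no modality dominates simply because of its numeric scale.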

Employment of Advanced Data Integration Tools
To address the problems posed by the diversity of data formats and structures, improved methods for data harmonization [99], standardization [100], and normalization [97] need to be developed, as well as the use of AI and ML algorithms to automate these processes. Multimodal ML has the potential to revolutionize healthcare by enabling thorough and tailored analysis of patient data from numerous sources if these barriers are overcome.

Embedding Modern Techniques to Enhance Explainability
To address the problems associated with the black-box nature of Multimodal ML models, more explainable and interpretable models are needed that give healthcare professionals insight into how the models arrive at their judgments. Approaches such as feature relevance ranking [101], model visualization [102], decision rules [103], probabilistic [104] and neuro-fuzzy approaches [105], and many others can improve the interpretability of Multimodal ML models so that interested parties can make more informed and confident treatment decisions. A brief definition of each of these tools is presented in the list below:
• Feature relevance ranking: includes methods such as permutation importance and partial dependence plots, which give insights into the importance and correlations of input variables, allowing for a better understanding of the model's decision-making process and boosting transparency and interpretability in healthcare applications;
• Model visualization: includes tools such as decision trees and heatmaps that provide a graphical representation of the model's decision-making process, allowing for a better understanding of the factors that influence the model's predictions and increasing the transparency and interpretability of the technology;
• Decision rules: specify explicit decision criteria based on the input data and thereby provide clear and understandable rationales for the model's predictions, improving the interpretability and transparency of Machine Learning models in healthcare;
• Probabilistic approaches: employ probabilistic reasoning to represent and manage the uncertainty inherent in medical data, allowing for transparent decision-making that can be easily understood by healthcare practitioners;
• Neuro-fuzzy techniques: combine the benefits of neural networks and fuzzy logic to generate more interpretable models that can deal with imprecise and uncertain inputs.
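Of the tools listed above, permutation importance is simple enough to sketch without any library. The idea is to shuffle one input column at a time and measure how much the model's accuracy drops. The toy model, data, and number of repeats below are invented for illustration; the toy model deliberately ignores its second input so that the expected ranking is obvious.

```python
import random


def accuracy(model, X, y):
    """Share of rows for which the model's prediction equals the label."""
    return sum(model(row) == target for row, target in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=20, seed=0):
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for col in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            shuffled = [row[col] for row in X]
            rng.shuffle(shuffled)
            X_perm = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, shuffled)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


def toy_model(row):
    """Hypothetical classifier that uses only the first feature."""
    return 1 if row[0] > 0.5 else 0


X = [[0.9, 0.3], [0.8, 0.7], [0.7, 0.1], [0.6, 0.9],
     [0.4, 0.2], [0.3, 0.8], [0.2, 0.5], [0.1, 0.6]]
y = [toy_model(row) for row in X]

importances = permutation_importance(toy_model, X, y)
print(importances)
```

In a multimodal setting the same procedure could be applied to whole groups of columns, one group per modality, to estimate how much each modality contributes to the fused prediction.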

Implementing Necessary Methods to Guarantee Knowledge Transfer
The diversity of datasets and models in the field of multimodal ML can lead to knowledge transfer problems. Therefore, researchers need to develop novel strategies for multimodal feature selection [106], fusion [46], and modeling that can capture complementary information from many modalities while minimizing redundancy or overfitting. Overcoming these obstacles will allow for more robust and accurate multimodal ML models that will lead to improved diagnosis, treatment, and patient outcomes in healthcare settings.

Reducing Computation Cost
Reducing computational costs in multimodal ML is a critical issue. Therefore, researchers need to explore methods for model compression [107] and optimization [108] that can reduce the computational complexity of the model without compromising its performance. Multimodal ML can also benefit from efficient hardware and software implementations, such as specialized hardware accelerators and distributed computing frameworks, that reduce the computational load. The use of such techniques can help build multimodal ML models that are more robust, efficient, and scalable, and therefore applicable to a wider variety of health problems, leading to faster and more accurate solutions.
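One simple family of model compression methods is magnitude pruning, in which the weights with the smallest absolute values are set to zero. The Python sketch below shows the idea on a single weight matrix. It is an assumption-laden illustration rather than the method of [107]: the function name `magnitude_prune`, the matrix shape, and the sparsity level are hypothetical, and in practice a pruned model is usually fine-tuned afterwards to recover accuracy.

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude.

    Returns the pruned weights and the boolean mask of kept entries.
    """
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    k = int(sparsity * weights.size)             # number of weights to remove
    if k == 0:
        return weights.copy(), np.ones_like(weights, dtype=bool)
    # Magnitude of the k-th smallest weight serves as the pruning threshold
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    mask = np.abs(weights) > threshold           # keep strictly larger weights
    return weights * mask, mask

# Hypothetical dense layer of a fusion network: 64 inputs, 32 outputs
rng = np.random.default_rng(0)
W = rng.normal(size=(64, 32))

W_pruned, mask = magnitude_prune(W, sparsity=0.75)
print(f"kept {mask.sum()} of {W.size} weights")
```

The zeroed entries reduce storage and computation only when sparse formats or hardware that exploits sparsity are used, which is one reason compression and efficient implementations are best considered together.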

Increasing Trust and Feasibility to Raise Technology Adoption
Researchers, clinicians, information technology experts, and healthcare administrators must work together to increase confidence in multimodal ML technology. In addition, cultural and organizational barriers can be reduced by promoting trust and transparency through open dialog and training. The best way to improve patient outcomes and revolutionize healthcare delivery is to properly integrate multimodal ML models into current healthcare delivery processes and systems.
The results of the mapping of challenges to solutions can be summarized by topic, where the symbol TR refers to the term "Trending Research Topic". The challenges that hinder the progress of Multimodal Machine Learning techniques, along with the solutions and future perspectives that could be pursued, are presented in Figure 6.

Conclusions
In summary, Multimodal ML is a new technique that enables the simultaneous use of multiple models and data types in the creation of complex ML and DL models. By addressing the problem of data heterogeneity, Multimodal ML has the potential to significantly improve the accuracy and effectiveness of AI applications, especially in healthcare, where it has already become an important part of everyday patient care. This review covered the technical features of Multimodal ML, such as data fusion and workflows, and highlighted its differences from other technologies, such as Ensemble Learning. In addition, an overview of the application of Multimodal ML in the detection and prediction of Cardiovascular Diseases was provided, highlighting the encouraging results to date and the room for growth in this area. As with any rapidly evolving technology, difficulties remain to be addressed, among them privacy, bias, and the interpretability of results. However, it is likely that these obstacles can be overcome through further research and development and that Multimodal ML will continue to play an important role in the development of AI applications in a variety of sectors, particularly healthcare.

Data Availability Statement:
The study did not report any data.