Machine-Learning Methods for Speech and Handwriting Detection Using Neural Signals: A Review

Brain–Computer Interfaces (BCIs) have become increasingly popular in recent years due to their potential applications in diverse fields, ranging from the medical sector (people with motor and/or communication disabilities) to cognitive training, gaming, and Augmented Reality/Virtual Reality (AR/VR), among other areas. BCIs that can decode and recognize neural signals involved in speech and handwriting have the potential to greatly assist individuals with severe motor impairments in their communication and interaction needs. Innovative and cutting-edge advancements in this field can lead to a highly accessible and interactive communication platform for these individuals. The purpose of this review paper is to analyze the existing research on handwriting and speech recognition from neural signals, so that new researchers interested in this field can gain a thorough knowledge of this research area. The current research on neural signal-based recognition of handwriting and speech is categorized into two main types: invasive and non-invasive studies. We have examined the latest papers on converting speech-activity-based and handwriting-activity-based neural signals into text data, and the methods of extracting data from the brain are also discussed in this review. Additionally, this review includes a brief summary of the datasets, preprocessing techniques, and methods used in these studies, which were published between 2014 and 2022. This review aims to provide a comprehensive summary of the methodologies used in the current literature on neural signal-based recognition of handwriting and speech. In essence, this article is intended to serve as a valuable resource for future researchers who wish to investigate neural signal-based machine-learning methods in their work.


Introduction
Acquiring and analyzing neural signals can greatly benefit individuals who have limitations in their movement and communication. Neurological disorders, such as Parkinson's disease, multiple sclerosis, infectious diseases, stroke, injuries of the central nervous system, developmental disorders, locked-in syndrome [1], and cancer, often lead to physical activity impairments [2]. The acquisition of neural signals, along with stimulation and/or neuromodulation using BCIs [3], aims to alleviate some of these conditions. In addition, neural signals have been utilized in various fields such as security and privacy, cognitive training, imaginary or silent speech recognition [4,5], emotion recognition [6,7], mental state recognition [8], human identification [9,10], speech communication [11], synthesized speech communication [12], gaming [13], Internet of Things (IoT) applications [14], Brain Machine Interface (BMI) applications [15][16][17], neuroscience research [18,19], speech activity detection [20,21], and more. The first step involves collecting neural signals from patients, which are then processed and analyzed. The processed signals are then used to operate assistive devices, which help patients with their movements and communication. Neural signals can also be utilized to gauge the mental state of the general population.

Regions of the Brain Responsible for Handwriting and Speech Production
The production of speech involves several stages in the brain, including the translation of thoughts into words, the construction of sentences, and the physical articulation of sounds [38]. Three key areas of the brain are directly involved in speech production: the primary motor cortex, Broca's area, and Wernicke's area [39]. Wernicke's area is primarily responsible for producing coherent speech that conveys meaningful information. Damage to Wernicke's area causes a condition known as Wernicke's aphasia, or fluent aphasia, which impairs comprehension and can result in fluent but meaningless sentences [40]. Broca's area aids in generating smooth speech and constructing sentences before speaking. Damage to Broca's area results in a condition known as Broca's aphasia, or non-fluent aphasia, which can cause the person to lose the ability to produce speech sounds altogether or to speak only slowly and in short sentences [39]. Finally, the motor cortex plays a role in planning and executing the muscle movements necessary for speech production, including the movement of the mouth, lips, tongue, and vocal cords. Damage to the primary motor cortex can cause paralysis of the muscles used for speaking; however, therapy and repetition can help improve these impairments [41].
When writing is initiated, our ideas are first organized in our mind, and the physical act of writing is facilitated by our brain, which controls the movements of our hands, arms, and fingers [42]. This process is initiated by the cingulate cortex of the brain. The visual cortex then creates an internal picture of what the writing will look like. Next, the left angular gyrus [43] converts the visual cortex signal into a comprehension of words, a process that also involves Wernicke's area. Finally, the parietal lobe and the primary motor cortex work together to coordinate all of these signals and produce the motor signals that control the movements of the hand, arm, and fingers required for writing [42]. In a related study, Willett et al. [44] proposed a discrete BCI capable of accurately decoding limb movements, including those of all four limbs, from the hand knob [45,46]. Figure 1 shows the regions of the brain that are primarily responsible for speech production and motor movements for handwriting.

Figure 1. Key regions of the brain that are fundamentally responsible for speech production and initiating motor movements for generating handwriting. Wernicke's area is responsible for speech production. The parietal lobe, visual cortex, and cingulate cortex are responsible for handwriting generation. The primary motor cortex and Broca's area are responsible for both speech production and handwriting generation.

Methods of Collecting Data from Brain
The primary objective of many BCIs is to capture neural signals in a manner that allows external computer software or devices to interpret them with ease. Neural signals can be obtained from the brain through various methods such as EEG sensors, microelectrode arrays, or ECoG arrays. As shown in Figure 2, EEG signals can be extracted non-invasively from the scalp, which is why they typically have lower magnitudes compared to other neural signals. On the other hand, ECoG arrays can produce signals of higher magnitude since they are implanted invasively in the brain; however, because of their physical dimensions, their spatial resolution is still limited. Finally, microelectrode arrays can acquire high-frequency spikes with much improved spatial resolution [47]. In all of these methods, the signals must be processed in a way that enables the BCI software or devices to effectively decipher them [48].
Figure 2. Existing technologies such as EEG sensors, ECoG arrays, and microelectrode arrays that are used to acquire neural signals, with the characteristics of the acquired signals, including amplitude and frequency bands [47]. The amplitudes of neural signals acquired from ECoG arrays and the frequencies of neural signals acquired from microelectrode arrays are typically higher than those of other existing technologies.

Invasive Methods
In invasive methods, the neural signals are collected directly using electrodes placed inside the skull; brain surgery is needed to implant the electrodes into the grey matter of the brain. As the signals come directly from the grey matter, this technique provides high-quality signals with a better SNR [49]. However, since it requires surgery to implant the electrodes inside the skull, invasive methods carry a high risk of brain infection. Additionally, the brain reacts with a process called gliosis that creates scar tissue around the foreign object (the electrode), so over time the electrodes can hardly collect neural signals [50]. Most of the papers included in this review that utilize invasive methods have extracted the signals from the primary motor cortex area of the brain [51].
Invasive methods of collecting neural signals are mostly used in medical applications in a hospital setting. As the signals are more accurate, they can be used to help paralyzed patients move or issue commands through computers. Since there is direct contact with neurons at the time of signal collection, these methods provide more information even if the signals come from only a few neurons. Such signals can be used to control artificial arms [52], speech decoding [53], TVs, lights, brain-to-text implementations [54], speech recognition [55,56], and other software applications [57]. Figure 3a shows the invasive process of collecting neural signals from the brain.
Figure 3. Existing methods of collecting neural signals from the brain. (a) Data processing flow diagram, advantages, and disadvantages of the invasive process of collecting neural signals from the brain. Though the invasive process requires surgery and high cost, neural signals extracted invasively provide accurate results and higher SNR. (b) Data processing flow diagram, advantages, and disadvantages of the non-invasive process of collecting neural signals from the brain. The non-invasive process requires no surgery and low cost, but the neural signals it acquires provide less accurate results and lower SNR.

Non-Invasive Methods
In the non-invasive way of collecting neural signals, the electrodes are placed on the scalp/skin to measure and collect neural signals. This technique has been used widely because it is easier to use and does not require surgery, as the neural signals are acquired using external sensors or electrodes. Hence, it is cheaper, more comfortable for the person, and less risky.
However, as the signals are collected at a larger distance from the actual neurons, this method provides noisier data and worse signal resolution; it is therefore less effective than the invasive methods in terms of SNR. Most of the non-invasive approaches focus on collecting EEG signals, as this is easier and cheaper. However, EEG signals can vary from person to person, and even within the same subject from time to time [58]. Therefore, it is very difficult for a model trained on a previously recorded EEG dataset to cope with real-time experiments [59].
In the non-invasive techniques, the neural signals can also be sent back into the brain using transcranial magnetic stimulation (TMS) which has already been used by medics [49]. EEG signals are also used to recognize unspoken [60] and imagined speech from individuals [61,62]. Examples of non-invasive techniques are EEG [63], magnetoencephalography (MEG) [64], functional magnetic resonance imaging (fMRI) [65], and near-infrared spectroscopy (NIRS) [66]. Figure 3b shows the non-invasive process of collecting EEG data from the brain.

Speech Recognition Using Non-Invasive Neural Datasets
In 2017, Kumar et al. [67] proposed a Random Forest (RF) based silent speech recognition system utilizing EEG signals. They introduced a coarse-to-fine-level envisioned speech recognition model using EEG signals, where the coarse level predicts the category of the envisioned speech, and the finer-level classification predicts the actual class of the expected category. The model performed three types of classification: digits, characters, and images. The EEG dataset comprised 30 text and non-text class objects that were imagined by multiple users. After performing the coarse-level classification, a fine-level classification accuracy of 57.11% was achieved using the Random Forest classifier. The study also examined the impact of aging and the time elapsed since the EEG signal was recorded.
In 2017, Rosinová et al. [68] proposed a voice command recognition system using EEG signals. EEG data were collected from 20 participants aged 18 to 28 years, consisting of 13 females and 7 males. The EEG data of 50 voice commands were recorded 5 times during the training phase. The proposed model was tested on a 23-year-old participant, whose EEG data were collected while speaking the 50 voice commands 30 times. A hidden Markov model (HMM) and a Gaussian mixture model (GMM) were used to train and test the proposed model. The authors claim that the highest classification accuracy was achieved on the alpha, beta, and theta frequencies. However, the recorded data were insufficient and the accuracy was very low.
In 2019, Krishna et al. [69] presented a method for automatic speech recognition from EEG signals based on Gated Recurrent Units (GRUs). Their proposed method was trained on only four English words ("yes", "no", "left", and "right") spoken by four different individuals. The proposed method can effectively detect speech in the presence of background noise, with a 60 dB noise level used in the research. The paper reported a high recognition accuracy of 99.38% even in the presence of background noise.
In 2020, Kapur et al. [35] proposed a silent speech recognition system based on Convolutional Neural Network (CNN) utilizing neuromuscular signals. This research marks the first non-invasive real-time silent speech recognition system. The dataset used comprised 10 trials of 15 sentences from three multiple sclerosis (MS) patients. The research obtained 81% accuracy, and an information transfer rate of 203.73 bits per minute was recorded.
In 2021, Vorontsova et al. [2] proposed a silent speech recognition system based on ResNet-18 and GRU models that uses EEG signals. The researchers collected EEG data from 268 healthy participants who varied in age, gender, education, and occupation. The study focused on the classification of nine Russian words as silent speech. The dataset consists of a 40-channel EEG signal recorded at a 500 Hz sampling frequency. The results showed an 85% accuracy rate for the classification of the nine words. Interestingly, the authors found that a smaller dataset collected from a single participant can provide higher accuracy than a larger dataset collected from a group of people. However, the out-of-sample accuracy is relatively low in this study.

Speech Recognition Using Invasive Neural Datasets
In 2014, Mugler et al. [70] published the first research article on decoding the entire set of phonemes of American English. In linguistics, a phoneme refers to the smallest distinctive unit of sound in a language, which can be used to differentiate one word from another [71]. The authors used ECoG signals from four individuals. In this study, a high-density (1-2 mm) electrode array covering 4 cm of the speech motor cortex was used to decode speech. The researchers achieved 36% accuracy in classifying phonemes from ECoG signals using Linear Discriminant Analysis (LDA). However, the accuracy in word identification from phonemic analysis alone was only 18.8%, which falls well short of practical use.
In 2019, Anumanchipalli et al. [72] proposed a speech restoration technique that converts brain impulses into understandable synthesized speech at the rate of a fluent speaker. Bidirectional long short-term memory (BLSTM) was used to decode kinematic representations of articulation from high-density ECoG signals collected from 5 individuals.
In 2019, Moses et al. [73] proposed a real-time question-and-answer decoding method using ECoG recordings. The authors used the Viterbi decoding algorithm, the most commonly used decoding algorithm for HMMs. High gamma activity was extracted from the ECoG signals in real time. The authors achieved 61% decoding accuracy for produced utterances and 76% decoding accuracy for perceived utterances.
In 2020, Makin et al. [74] published an article on machine translation of cortical activity to text using ECoG signals. The authors trained a Recurrent Neural Network (RNN) to encode each sentence-length sequence of neural activity. The encoder-decoder framework was employed for machine translation. The authors decoded cortical activity to text based on words, as they are more distinguishable than phonemes. For training purposes, 30-50 sentences of data were used.
In 2022, Metzger et al. [75] proposed an Artificial Neural Network (ANN) based model, built mainly on GRU layers, for recognizing attempted silent speech. ECoG activity, along with a speech detection model, was used for spelling sentences. Only code words from the North Atlantic Treaty Organization (NATO) phonetic alphabet [76] were used during spelling to improve the neural discriminability of one word from another. In online mode, a 1152-word vocabulary model was used, achieving a 6.13% character error rate and 29.4 characters per minute. A beam search technique was used to spell the most plausible sentences. However, only one participant was involved in this training and spelling process.
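To illustrate the beam search idea mentioned above, the sketch below keeps the top few candidate character sequences at each timestep of a grid of per-step log-probabilities. It is a minimal illustration on random probabilities, without the language-model rescoring a real speller would use, and is not the decoder of [75].

```python
import numpy as np

def beam_search(log_probs, beam_width=3):
    """Tiny beam search over a (time, vocab) grid of per-step log-probabilities.
    Keeps the beam_width best partial sequences at every step."""
    beams = [((), 0.0)]                        # (sequence, cumulative log-prob)
    for step in log_probs:
        candidates = [(seq + (v,), lp + step[v])
                      for seq, lp in beams for v in range(len(step))]
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams[0]

# Random per-step character distributions: 6 timesteps, 5-symbol vocabulary
rng = np.random.default_rng(3)
best_seq, best_lp = beam_search(np.log(rng.dirichlet(np.ones(5), size=6)))
print(best_seq, best_lp)
```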

Handwritten Character Recognition Using Non-Invasive Neural Datasets
In September 2015, Chen et al. [77] proposed a BCI speller using EEG. The study implemented a steady-state visual evoked potential (SSVEP) speller based on joint frequency-phase modulation (JFPM) to achieve high-speed spelling. Eighteen participants took part in the study, and six blocks of 40 characters were used for training, with 40 trials in each block in random order. The study found a spelling rate of up to 60 characters per minute and an information transfer rate of up to 5.32 bits per second.
Saini et al. [78] presented a method for identifying and verifying individuals using their signature and EEG signals in 2017. The study involved collecting signatures and EEG signals from 70 individuals between the ages of 15 and 55. Each participant provided 10 signature samples, and EEG signals were captured using an Emotiv Epoc+ neuro headset. The researchers used 1400 samples of signature and EEG signals for user identification, and an equal number of samples for user verification. They evaluated the performance of the method using three types of tests: using only signatures, using only EEG signals, and using signature-EEG fusion. The results showed that the signature-EEG fusion data achieved the highest accuracy of 98.24% for person identification. For user verification, the EEG-based model performed better than the signature-based model and the signature-EEG fusion data. The authors also found that individuals between the ages of 15 and 25 had higher identification accuracy than others, and males had higher identification accuracy than females.
In 2019, Kumar et al. [79] proposed a novel user authentication system that utilizes both dynamic signatures and EEG signals. The study involved collecting signatures and EEG signals simultaneously from 58 individuals who signed on their mobile phones. A total of 1980 samples of dynamic signatures and EEG signals were collected, with EEG signals being recorded using an Emotiv EPOC+ device and signatures being written on the mobile screen. To train the system, a BLSTM neural network-based classifier was utilized for both dynamic signatures and EEG signals. The results showed that the signature-EEG fusion data using the Borda count fusion technique achieved an accuracy of 98.78%. The Borda count decision fusion verification model was used for user verification, which resulted in a false acceptance rate of 3.75%.
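For readers unfamiliar with Borda count fusion, the sketch below shows the basic rank-sum idea on synthetic scores; the score values and the two-modality setup are illustrative assumptions, not the configuration of [79].

```python
import numpy as np

def borda_fusion(score_lists):
    """Borda-count decision fusion: each modality ranks the candidate
    identities by score, ranks are converted to points and summed, and the
    identity with the most points wins. Input shape: (modalities, candidates),
    higher score = more likely."""
    scores = np.asarray(score_lists)
    # Double argsort converts scores to ranks 0..n-1 per modality
    points = scores.argsort(axis=1).argsort(axis=1)
    return int(points.sum(axis=0).argmax())

# Example: a signature model and an EEG model each score 4 enrolled users
winner = borda_fusion([[0.1, 0.7, 0.5, 0.2],   # signature scores (assumed)
                       [0.3, 0.9, 0.4, 0.1]])  # EEG scores (assumed)
print(winner)  # 1: user index with the highest summed rank
```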
In 2021, Pei et al. [80] proposed a method for mapping scalp-recorded brain activity to handwritten character recognition using EEG signals. In the study, five participants provided neural signal data while writing the phrase "HELLO, WORLD!". CNN-based classifiers were employed for the analysis. The accuracy of handwritten character recognition varied among participants, ranging from 76.8% to 97%, while cross-participant recognition accuracy ranged from 11.1% to 60%.

Handwritten Character Recognition Using Invasive Neural Datasets
In 2021, Willett et al. [81] proposed a brain-to-text communication method using neural signals from the motor cortex. The authors employed an RNN for decoding text from the neural activity. The proposed model decoded 90 characters per minute with 94.1% raw accuracy in real time and greater than 99% accuracy offline using a language model. Sentence labeling was performed using an HMM, and the Viterbi search technique was employed for offline language modeling. The authors also demonstrated that handwritten letters are easier to distinguish from neural activity than point-to-point movements. Figure 4 shows the overall summary of the speech and handwritten character recognition articles with invasive and non-invasive neural signal acquisition.

General Principle of Using Machine Learning Methods for Neural Signals
The research conducted on neural signals typically follows a standardized flowchart. It begins with the acquisition of neural signals and concludes with the identification of these signals using the most efficient methods. In this context, we focus on research conducted using machine learning and classical techniques. Figure 5 depicts a step-by-step diagram commonly utilized in existing research articles that work with neural signals. To begin, invasive or non-invasive processes are used to collect, digitize, and store neural signals. These signals then undergo a series of preprocessing techniques to enhance their quality. Next, meaningful features are extracted from the processed signals. Finally, machine learning methods are employed to accurately decode the signals. The various steps involved in the research articles are summarized in the following subsections.

Figure 5. Diagram of data processing and machine learning methods used for decoding neural signals: collection of brain signals, preprocessing, feature extraction, and machine learning methods used for decoding (each block corresponds to one step of the whole process).
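As a concrete, end-to-end illustration of this four-step pipeline, the sketch below runs on synthetic data; the sampling rate, channel count, band-power features, and logistic-regression decoder are all illustrative assumptions rather than choices from any reviewed study.

```python
import numpy as np
from scipy.signal import butter, filtfilt
from sklearn.linear_model import LogisticRegression

# --- 1. Acquisition (stand-in): synthetic multichannel recording ---
fs = 500                                     # sampling rate in Hz (assumed)
rng = np.random.default_rng(0)
X_raw = rng.standard_normal((200, 32, fs))   # 200 trials, 32 channels, 1 s each
y = rng.integers(0, 2, size=200)             # binary labels, e.g., two imagined words

# --- 2. Preprocessing: zero-phase band-pass filter each trial ---
b, a = butter(4, [1, 45], btype="bandpass", fs=fs)
X_filt = filtfilt(b, a, X_raw, axis=-1)

# --- 3. Feature extraction: per-channel log-variance (band-power proxy) ---
features = np.log(X_filt.var(axis=-1))       # shape: (trials, channels)

# --- 4. Decoding: any off-the-shelf classifier ---
clf = LogisticRegression(max_iter=1000).fit(features[:150], y[:150])
print("held-out accuracy:", clf.score(features[150:], y[150:]))
```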

Preprocessing Techniques and Feature Extraction Methods
Most of the papers used independent component analysis and principal component analysis in their preprocessing stages. For extracting meaningful features from the raw data, most of the papers on recognizing speech or silent speech used Mel Frequency Cepstrum Coefficients (MFCCs). In [81], the authors labeled the sentences using a hidden Markov model. They provided a neural representation of the attempted handwriting using principal component analysis and time-warping of the neural activity. Additionally, they showed a 2D visualization of the neural activity using t-distributed stochastic neighbor embedding. In [80], the EEG signals were first downsampled to 250 Hz and bandpass filtered between 1 and 45 Hz. Additionally, silent parts of the signals were removed. Next, Independent Component Analysis (ICA) was applied to extract meaningful features. In [78], the raw EEG signals were smoothed using a Moving Average (MA) filter, and then Discrete Wavelet Transform (DWT) analysis was applied to decompose the signals. Furthermore, features from the gamma frequency band were measured from the EEG signals. In a separate article [79], DFT features were extracted from EEG signals for use in user authentication. In the case of dynamic signatures, the feature generation process involved combining the signature trajectory and writing direction, which were both measured. In [73], several preprocessing techniques such as amplification, quantization, noise removal, and sampling were performed on the raw ECoG data. A PCA-LDA model was also used there to extract principal components.
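The downsample/band-pass/ICA chain described for [80] can be sketched as follows. The code uses SciPy and scikit-learn's FastICA on synthetic data as stand-ins for the authors' actual tooling; the original sampling rate, channel count, and the criterion for rejecting artifactual components are assumptions.

```python
import numpy as np
from scipy.signal import butter, filtfilt, decimate
from sklearn.decomposition import FastICA

fs_orig, fs_new = 1000, 250                    # assumed original rate; target 250 Hz
rng = np.random.default_rng(1)
eeg = rng.standard_normal((32, 60 * fs_orig))  # 32 channels, 60 s of synthetic EEG

# Downsample to 250 Hz (decimate applies an anti-aliasing filter first)
eeg = decimate(eeg, fs_orig // fs_new, axis=-1)

# Band-pass 1-45 Hz, zero-phase
b, a = butter(4, [1, 45], btype="bandpass", fs=fs_new)
eeg = filtfilt(b, a, eeg, axis=-1)

# ICA: unmix into independent components; artifactual components would be
# zeroed here before projecting back (the selection criterion is study-specific)
ica = FastICA(n_components=32, whiten="unit-variance", random_state=0, max_iter=1000)
sources = ica.fit_transform(eeg.T)             # (samples, components)
eeg_clean = ica.inverse_transform(sources).T   # back to (channels, samples)
```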
In [75], the neural signals were first digitized using a percutaneous pedestal connector. Next, noise cancellation and anti-aliasing filters were applied to the signals, which were streamed at 1 kHz. In [72], Dynamic Time Warping (DTW) was used in conjunction with Mel Frequency Cepstral Coefficients to extract important features from silent speech. In [70], the ECoG signals were marked according to the onset of phoneme time, and a Fast Fourier Transform (FFT) was performed on the ECoG signal. This was done to convert the signals into meaningful features by combining FFT coefficients to form each frequency band of interest. In [69], the first and most important 13 MFCC features were extracted, and first- and second-order differentials were computed. This resulted in a total of 39 MFCC features, which were sampled at 100 Hz and mainly used for training purposes. In [67], the raw EEG signals were first processed using a moving average filter to remove various types of noise, trends, and artifacts. Next, the standard deviation, root mean square, sum of values, and energy of the signals were computed to extract features. In [35], heartbeat artifacts and high-frequency noise were removed from the Surface Electromyography (sEMG) signals, which were then sampled at 1 kHz. In [68], the raw EEG signals were first normalized, and then 2nd-order Butterworth band-stop, low-pass, and high-pass filters were used to remove muscular artifacts and random noise.
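The statistical feature set reported for [67] (standard deviation, root mean square, sum of values, and energy) is simple to compute per window; the sketch below shows one way to do so, with the window length and sampling rate as assumptions.

```python
import numpy as np

def window_features(signal, fs=250, win_s=1.0):
    """Per-window statistical features of one EEG channel, following the
    feature set reported in [67]: standard deviation, root mean square,
    sum of values, and energy. Window length is an assumption."""
    win = int(fs * win_s)
    n = len(signal) // win
    segs = signal[: n * win].reshape(n, win)
    return np.column_stack([
        segs.std(axis=1),                    # standard deviation
        np.sqrt((segs ** 2).mean(axis=1)),   # root mean square
        segs.sum(axis=1),                    # sum of values
        (segs ** 2).sum(axis=1),             # energy
    ])

feats = window_features(np.random.default_rng(2).standard_normal(2500))
print(feats.shape)  # (10, 4): 10 one-second windows x 4 features
```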

Machine Learning Methods Used for Training Neural Signals
The machine learning methods used for training on neural signals have been divided into two categories: classical classification methods and deep learning methods. We summarize the methods used in the existing research on neural signals. Figure 6 shows this division of methods.

Classical Classification Methods
Several studies have utilized classical models to train on neural signals for recognizing both speech and handwriting activities. The majority of these studies have employed HMMs and GMMs to model brain activity. One article [68] used an HMM and a GMM to train and test EEG signals obtained from the brain. In another study, the authors of [78] employed a sequential HMM for evaluating three types of testing: testing with only signatures, testing with only EEG signals, and testing with signature-EEG fusion. In [73], the authors used the Viterbi decoding algorithm, one of the most useful and commonly used decoding algorithms for HMMs.
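For reference, the core of Viterbi decoding is a short dynamic program; the log-space sketch below is illustrative and is not the exact decoder used in [73].

```python
import numpy as np

def viterbi(log_A, log_B, log_pi, obs):
    """Most likely hidden-state path of an HMM, computed in log space.
    log_A: (S, S) transition log-probs; log_B: (S, V) emission log-probs;
    log_pi: (S,) initial log-probs; obs: sequence of observation indices."""
    S, T = log_A.shape[0], len(obs)
    delta = np.empty((T, S))             # best log-prob ending in each state
    psi = np.empty((T, S), dtype=int)    # backpointers
    delta[0] = log_pi + log_B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_A     # (from-state, to-state)
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_B[:, obs[t]]
    path = [int(delta[-1].argmax())]     # backtrack from the best final state
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t][path[-1]]))
    return path[::-1]
```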
In addition to HMMs, LDA was used in [70] to train on the entire set of American English phonemes from ECoG signals. Lastly, JFPM, together with a decoding algorithm utilizing SSVEPs [36,77], has been used to implement an EEG-based BCI speller.
In [67], the authors proposed a classifier based on RF that operates at both a coarse and a fine level. To identify three distinct levels of classes, three RF classifiers were run in parallel. The authors stated that the RF classifier is superior to SVM- and ANN-based classifiers because it employs bagging (bootstrap aggregation) to create multiple models whose outputs are combined to yield greater accuracy. Figure 7a shows the overall accuracy of the classical classification methods used to date with neural signals.
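A coarse-to-fine random forest cascade of the kind described for [67] can be sketched as below: one forest predicts the category, and a per-category forest then predicts the fine class. The synthetic data, the 3x10 class layout, and all hyperparameters are assumptions for illustration only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 3 coarse categories x 10 fine classes each
X, y_fine = make_classification(n_samples=3000, n_features=40,
                                n_informative=20, n_classes=30,
                                n_clusters_per_class=1, random_state=0)
y_coarse = y_fine // 10

X_tr, X_te, yf_tr, yf_te, yc_tr, yc_te = train_test_split(
    X, y_fine, y_coarse, test_size=0.25, random_state=0)

# Stage 1: coarse classifier predicts the category
coarse = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, yc_tr)

# Stage 2: one fine classifier per category, trained on that category's trials
fine = {c: RandomForestClassifier(n_estimators=200, random_state=0)
           .fit(X_tr[yc_tr == c], yf_tr[yc_tr == c])
        for c in np.unique(y_coarse)}

# Inference: route each trial through its predicted category's fine model
pred = np.array([fine[c].predict(x[None])[0]
                 for x, c in zip(X_te, coarse.predict(X_te))])
print("fine-level accuracy:", (pred == yf_te).mean())
```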

Deep Learning Methods
Most recent articles have employed machine learning techniques to decode EEG signals from the brain [84], and machine learning methods have been used in the training phase of most of the papers, where the neural data are trained and tested using various models. Most researchers use RNNs for developing their models because RNNs are better suited to processing time-series data. However, in certain scenarios, a CNN is used when training the model on a neural dataset.
In [81], RNNs were used to convert the neural activity into probabilities describing the likelihood of the characters being written; the probabilities were then thresholded to identify the actual character. For decoding sentences word by word from ECoG signals, an encoder RNN was used in [74] to encode each sentence-length span of neural signal into a conceptual expression; a decoder RNN was then used to decode this expression into words and English sentences.
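In the spirit of the per-timestep character probabilities described for [81], the sketch below shows a small GRU decoder in PyTorch. The channel count, layer sizes, vocabulary size, and the naive probability threshold are illustrative assumptions, not the published architecture.

```python
import torch
import torch.nn as nn

class CharDecoder(nn.Module):
    """Sketch of an RNN mapping multichannel neural activity to per-timestep
    character probabilities; all sizes are illustrative assumptions."""
    def __init__(self, n_channels=192, n_chars=31, hidden=256):
        super().__init__()
        self.rnn = nn.GRU(n_channels, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, n_chars)

    def forward(self, x):                  # x: (batch, time, channels)
        h, _ = self.rnn(x)
        return self.head(h)                # logits: (batch, time, chars)

model = CharDecoder()
logits = model(torch.randn(4, 500, 192))   # 4 trials, 500 timesteps
probs = logits.softmax(dim=-1)
# Simple thresholded readout: emit a character whenever its probability
# crosses a confidence threshold (the real system is more elaborate)
emitted = probs.max(dim=-1).values > 0.9
```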
The GRU is also commonly used in silent speech recognition tasks that involve non-invasive neural signals. In [69], a GRU-based deep learning model was trained using three different feature sets: only EEG features, only acoustic features, and the concatenation of acoustic and EEG features. In [2], the authors achieved their best results using a ResNet18 + 2GRU neural network. They did not use any dropout, and the Adam optimizer was employed with a mini-batch size of 16 and a 0.01 learning rate.
BLSTM neural network-based models have also been utilized for a variety of tasks, including speech and handwriting recognition from neural signals. In the article previously discussed [79], a BLSTM neural network-based classifier was employed for both dynamic signatures and EEG signals, both individually and in combination. In [85], a deep Long Short-Term Memory (LSTM) network was used to recognize imagined speech from EEG data. In another article [72], a BLSTM was utilized for decoding kinematic representations of articulation from ECoG signals.
CNN models were also used in the training process. In [35], a CNN model with 5-fold repeated stratified cross-validation was trained using the Adam optimizer and a batch size of 50 to minimize the cross-entropy loss on the spoken dataset. A CNN with cross-validation has also been used to recognize imagined speech from EEG data [4]. In [80], the 2D event-related potential (ERP) pattern segments are processed and identified as images, which are then used to train a CNN model to achieve higher accuracy. In [86], a densely connected 3D CNN was used for speech synthesis from ECoG signals.
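A minimal PyTorch training loop in the style reported for [35] (Adam optimizer, batch size 50, cross-entropy loss) might look as follows; the network layout, channel count, and synthetic data are assumptions, and the repeated stratified cross-validation is omitted for brevity.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for windowed sEMG/EEG trials: (trials, channels, samples)
X = torch.randn(500, 8, 250)
y = torch.randint(0, 15, (500,))           # e.g., 15 sentence classes, as in [35]

model = nn.Sequential(                     # illustrative 1D CNN, not the paper's
    nn.Conv1d(8, 32, kernel_size=7, padding=3), nn.ReLU(), nn.MaxPool1d(2),
    nn.Conv1d(32, 64, kernel_size=5, padding=2), nn.ReLU(),
    nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(64, 15),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

loader = DataLoader(TensorDataset(X, y), batch_size=50, shuffle=True)
for epoch in range(5):
    for xb, yb in loader:
        opt.zero_grad()
        loss = loss_fn(model(xb), yb)      # cross-entropy on class logits
        loss.backward()
        opt.step()
```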
The authors of [75] developed an artificial neural network for speech detection and letter classification. The neural network includes a 1D CNN input layer, followed by two layers of bidirectional GRUs; this configuration was chosen to optimize accuracy on these tasks. The authors of [72] employed a BLSTM to convert recorded cortical activity into articulatory movement representations, and then converted those representations into speech acoustics during the training process. This approach was utilized to decode cortical activity and improve the accuracy of speech representation. In a study, Hinton et al. [87] proposed a deep neural network-based speech recognition system that outperforms GMMs on a variety of speech recognition benchmarks. Figure 7b shows the distribution of the deep learning methods used in speech and handwriting recognition from neural signals.
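The 1D-CNN-into-bidirectional-GRU arrangement described for [75] can be sketched as below; every size (channels, hidden units, classes) is an assumption, and the readout from the final timestep is a simplification of the published model.

```python
import torch
import torch.nn as nn

class SpeechDetector(nn.Module):
    """Sketch of the architecture style described in [75]: a 1D CNN input
    layer followed by two bidirectional GRU layers. All sizes are assumed."""
    def __init__(self, n_channels=128, n_classes=27, hidden=128):
        super().__init__()
        self.conv = nn.Conv1d(n_channels, hidden, kernel_size=5, padding=2)
        self.gru = nn.GRU(hidden, hidden, num_layers=2,
                          bidirectional=True, batch_first=True)
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):                  # x: (batch, time, channels)
        h = torch.relu(self.conv(x.transpose(1, 2))).transpose(1, 2)
        h, _ = self.gru(h)                 # (batch, time, 2*hidden)
        return self.head(h[:, -1])         # classify from the final timestep

out = SpeechDetector()(torch.randn(2, 400, 128))
print(out.shape)                           # torch.Size([2, 27])
```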

Discussion
Previous studies have shown that neural signals can assist individuals with disabilities in their communication and movement. Moreover, neural signals have been applied in a variety of fields such as security and privacy, emotion recognition, mental state recognition [88], user verification, gaming, IoT applications, and others. As a result, the research on neural signals is steadily increasing. Although classical methods were once widely used, machine-learning techniques have yielded promising results in recent years.
When working with neural signals, collecting and processing them can be one of the most challenging tasks. As a result, much of the research in this field has been conducted using non-invasive neural signals, which are easier to collect and process, although some research has also been done on invasive neural signals. Table 2 summarizes the existing research by presenting the dataset, methods, and other important features of each corresponding study. In [89], Nieto et al. also proposed an EEG-based dataset for inner speech recognition. The use of neural signals to recognize a person's handwriting and speech has received significant attention in recent times. According to a study conducted by the authors of [81], identifying letters through neural activity is more practical than point-to-point movements. Inner speech recognition through neural signals is also becoming more popular in research [89].
Most studies on speech-based BCIs have used acute or short-term ECoG recordings, but in the future, the potential of long-term ECoG recordings and their applications could be explored further [23]. Currently, the development of high-speed BCI spellers is one of the most popular research directions. Ongoing innovations aim to increase electrode counts by at least an order of magnitude to improve the accuracy of extracting neural signals. Multimodal approaches using simultaneous EEG or ECoG signals to identify individuals have also gained considerable attention in recent years [79]. The performance of BCI communication can be enhanced by applying modern machine learning models to large, accurate, and user-friendly datasets. In the future, more robust features may be extracted from EEG or ECoG signals to improve system recognition performance. EEG, used to monitor the electrical activity of the brain, is an invaluable tool for investigating disease pathologies. It involves analyzing the numerical distribution of data and establishing connections between brain signals (EEG) and other biomedical signals, including the electrical activity of the heart measured by electrocardiogram (ECG), heart rate monitored using photoplethysmography (PPG), and the electrical activity generated by muscles recorded through electromyography (EMG) [90][91][92]. The integration of neural signals with other biomedical signals has led to diverse applications, such as emotion detection through eye tracking [93], video gaming and game research [94], epilepsy detection [95], and motion classification utilizing sEMG-EEG signals [96,97], among others [98,99].
One other extremely important consideration is the ability to detect and analyze neural signals in real time for the production of speech and handwriting. To develop real-time BCI applications, several issues and challenges have to be addressed. The neural data collection methods need to become faster as well as more accurate. The preprocessing techniques for the neural signals should also be improved in terms of their latency and efficiency. At the same time, the decoding and classification methods used on these processed data should work with good accuracy and low latency. Moreover, for developing real-time BCIs, certain features of the neural signals should be extracted from the processed data within a short time. Intraoperative mapping using high-resolution ECoG can produce results within minutes, but more work needs to be done to perform this in real time [100]. The amplitude of the neural signals should remain high, and the latency should remain low. For real-time speech detection from ECoG signals, the high gamma activity feature was used in [73]. Kinematic features have likewise been used with the neural data in [81] as well as with EEG data [80]. Real-time functional cortical mapping may be used for detecting handwriting and speech from ECoG recordings in real time [101]. Pyramid Histogram of Orientation Gradients features extracted from signature images can be used for fast signature detection from EEG data. Event-related desynchronization/synchronization features from EEG data may be used for handwriting detection when an individual thinks about writing a character, as shown in [102].
As these technologies target providing access to the signals generated by the brain, ethical issues have emerged regarding the use of BCIs to detect speech and handwriting from neural signals. It is important to consider individuals' freedom of thought in BCI communication, as modern BCI communication techniques raise concerns about the potential for private thoughts to be read [5]. Key concerns involve the invasion of privacy and the risk of unauthorized access to one's thoughts. To address these concerns, solutions may include the implementation of regulations, acquiring informed consent, and implementing strong data protection measures. Furthermore, advancements in encryption and anonymization techniques play a crucial role in ensuring the privacy and confidentiality of individuals. Ongoing research endeavors focus on enhancing BCI accuracy and dependability through the development of signal processing algorithms and machine learning models [103,104].
The future of BCI research in detecting handwriting and speech from neural signals shows immense potential. It offers the possibility of improving the lives of individuals with speech or motor impairments by providing alternative communication options. However, there are challenges that need to be overcome, including improving the accuracy and reliability of BCI systems, developing effective algorithms for decoding neural signals, and addressing ethical concerns such as privacy protection. Moving forward, new researchers in this field should focus on refining signal processing techniques, exploring novel approaches to recording neural activity, and advancing machine learning algorithms [105,106]. Another direction of current research is the collection of signals using distributed implants [107][108][109][110][111], which can provide simultaneous recording from multiple sites scattered throughout the brain. Such technologies hold immense promise for providing more information from the various regions that potentially produce correlated neural activity during the generation of speech and handwriting.

Conclusions
The future of research in BCIs focusing on the detection of handwriting and speech from neural signals holds significant promise. Innovative advancements in this field have the potential to create a user-friendly and interactive platform that facilitates communication for individuals who experience disabilities related to their mobility, speech, or ability to communicate effectively. In this review paper, we have investigated how brain signals are generated during speech and handwriting, as well as the strategies for collecting these signals from the brain. We have gathered the existing machine learning methods and decoding techniques used to detect speech and handwriting from neural signals, and we have also investigated which features of the neural signals are most important for recognition purposes. To enhance accuracy in this field, researchers should strive to identify effective signal processing techniques, employ appropriate data collection methods, and select precise machine learning and decoding algorithms suitable for analyzing neural signals.
As non-invasive BCIs carry less risk than invasive BCIs, research on non-invasive BCIs is growing steadily. However, the signals received from non-invasive BCIs are weak and prone to interference. Additionally, measuring neural signals is a challenging task, and BCI systems are generally much more complicated than other systems. Collecting neural signals depends entirely on the individual, so users must remain very active during signal collection [105]. Nevertheless, there are now more studies focusing on neural signal processing to help paralyzed patients. Silent speech and handwriting recognition with the help of neural signals can be very useful for individuals with limitations in their speech and handwriting. Furthermore, these neural signals have the potential to pave the way for the development of advanced AR/VR applications in the near future. This review can be a great help to those interested in speech and handwriting recognition using neural signals.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: