Deep learning models and data-driven intelligent analytics are widely used components of artificial intelligence. Deep learning models discover features through automatic feature or representation learning and process them through artificial neural networks to produce the desired results. There are several base types of deep learning models, such as radial basis function networks (RBFNs), recurrent neural networks (RNNs), generative adversarial networks (GANs), long short-term memory networks (LSTMs), convolutional neural networks (CNNs), self-organizing maps (SOMs), restricted Boltzmann machines (RBMs), autoencoders, and multilayer perceptrons (MLPs). The choice among these types depends on the requirement; for example, the autoencoder is designed to transform input data into a different representation, such as regenerating or reconstructing an image. Similarly, self-organizing maps are designed for high-dimensional data in which the number of features exceeds the number of observations, and they use a winner-takes-all weight-update technique to identify distinctive features in such complex data. Industrial multimedia data, which include hypermedia, hypertext, 2D and 3D graphics, 3D animation, audio, and video, are fragile and complex, and given the variety of base deep learning models, it is difficult to know which type to use for a specific multimedia data problem.
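To make the winner-takes-all idea behind self-organizing maps concrete, here is a minimal Python sketch; the map size, learning rate, and the omission of the neighborhood update are simplifying assumptions for illustration only:

```python
import numpy as np

def som_step(weights, x, lr=0.1):
    """One simplified SOM update: select the best-matching unit
    (the 'winner') and move its weights toward the input x."""
    dists = np.linalg.norm(weights - x, axis=1)    # distance of each unit to x
    winner = int(np.argmin(dists))                 # winner-takes-all selection
    weights[winner] += lr * (x - weights[winner])  # pull only the winner toward x
    return winner

rng = np.random.default_rng(0)
weights = rng.random((4, 3))       # 4 map units, 3 input features (toy sizes)
x = np.array([0.5, 0.5, 0.5])
winner = som_step(weights, x)      # index of the unit that won this step
```

A full SOM also updates the winner's neighbors with a decaying radius; this sketch keeps only the selection step relevant to the discussion above.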

We observed recent research contributions and recognized the need for deep learning models in the industrial multimedia environment. We therefore carefully sought submissions on deep learning models scoped to multimedia data formats. In response, we received numerous submissions, of which thirteen papers were accepted after rigorous review. Below, we summarize these contributions from different parts of the world.

The paper by Tiago do Carmo Nogueira et al. proposes a novel image-captioning approach that uses an encoder–decoder structure to extract features from reference images and gated recurrent units (GRUs) to generate descriptions, with part-of-speech (PoS) analysis used to generate weights. They evaluated their technique on the MS-COCO and Flickr30k datasets, producing more descriptive captions for both predicted and KNN-selected captions.

The paper by Ahmed Barnawi et al. presents a new method of detecting COVID-19 using emergency services such as UAVs. They designed a transfer-learning-based deep CNN architecture to categorize patients into positive, negative, and null (pneumonia) categories. Using the developed model, they evaluated their technique through time-bounded services and achieved 94.92% accuracy.

The paper by Faria Nazir et al. proposes a deep learning model that addresses language pronunciation mistakes through speech-mistake analysis. They divide the problem into phonemic errors (confused phonemes) and prosodic errors (partially modified pronunciation variants of phones). They use a CNN-based clustering technique to identify the faults and categorize phonemes with the K-nearest-neighbors technique, a naïve Bayes classifier, and a support vector machine (SVM). They evaluated the model on an Arabic dataset of 28 individuals and achieved an accuracy of 97%, outperforming traditional models.
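As a hedged illustration of the K-nearest-neighbors step mentioned above, here is a minimal majority-vote classifier; the 2D feature points and labels are toy data, not drawn from the paper's Arabic phoneme dataset:

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    """Classify x by majority vote among its k nearest training samples."""
    dists = np.linalg.norm(X_train - x, axis=1)   # distance to every sample
    nearest = np.argsort(dists)[:k]               # indices of k closest
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return int(labels[np.argmax(counts)])         # most common label wins

# Toy "phoneme feature" points: class 0 near the origin, class 1 near (1, 1)
X_train = np.array([[0.0, 0.1], [0.1, 0.0], [0.9, 1.0], [1.0, 0.9]])
y_train = np.array([0, 0, 1, 1])
knn_predict(X_train, y_train, np.array([0.05, 0.05]))  # → 0
```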

The paper by Linbo Wang et al. presents a collaborative transformational–spatial clustering model that identifies inliers with two-way proximities. In their technique, a generalized match is first transformed into a collaborative transformational–spatial space; a collaborative kernel density estimator then maps objects to images; finally, matching proximities are refined to improve applicability across different images. Their experiments achieve superior performance on feature-matching tasks, such as multi-object matching, duplicate-object matching, and object retrieval.

The paper by Loveleen Gaur et al. discusses a deep learning model that detects COVID-19 using autonomous deep convolutional neural networks. Using transfer learning on chest X-rays, they evaluate three pre-trained CNN models: EfficientNetB0, VGG16, and InceptionV3. They assess the technique with performance metrics such as accuracy, recall, precision, and F1 score, achieving an overall accuracy of 92.92% with a sensitivity of 94.79%.

The paper by Asma Kausar et al. proposes a deep learning model that automates left-atrium segmentation on magnetic resonance imaging (MRI) to assist diagnosis and treatment in cardiac surgery. They discuss a 3D multi-scale residual-learning-based model that preserves both fine-grained and high-level features throughout the network. They evaluated their model against the award-winning left-atrial-segmentation technique with fewer constraints, and note that no extensive pre-processing of the input data is required for the task.

The paper by Jimmy Ming-Tai Wu et al. proposes a graph-based CNN-LSTM deep learning model that predicts stock prices using leading indicators. They feed a financial time-series dataset into a joint convolutional neural network (CNN) and long short-term memory (LSTM) network and construct sequence arrays of historical data with leading indicators. They evaluated their model on US and Taiwan stock datasets and achieved better results than existing approaches.
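One common way to build such sequence arrays from a historical series is a sliding window; the window length and toy price series below are illustrative assumptions, not details from the paper:

```python
import numpy as np

def make_sequences(series, window=5):
    """Slide a fixed-length window over a series to build
    (samples, window) inputs and next-step targets."""
    X, y = [], []
    for i in range(len(series) - window):
        X.append(series[i:i + window])   # past `window` values as input
        y.append(series[i + window])     # next value as prediction target
    return np.array(X), np.array(y)

prices = np.arange(10, dtype=float)      # toy price series 0.0 .. 9.0
X, y = make_sequences(prices, window=3)
# X[0] == [0., 1., 2.] and y[0] == 3.0
```

Each row of `X` would then be fed to the CNN-LSTM as one historical sequence, optionally stacked with leading-indicator columns.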

The paper by Gengsheng Xie et al. presents a re-identification (Re-ID) technique that focuses on a deep metric representation for extracting features from a dataset. They discuss a pose-guided feature region-based fusion network (PFRFN) that uses pose landmarks as local features. They evaluate the technique on several datasets, such as Market-1501, DukeMTMC, and CUHK03, and achieve improvements over traditional models.

The paper by Sumit Pundir et al. proposes an intelligent machine learning model that handles botnet attacks through malware detection in the IoT-enabled industrial multimedia environment. They use four methods to detect malware: naïve Bayes, logistic regression, artificial neural networks (ANNs), and random forests. They evaluate the idea and achieve a 99.5% detection rate with a 0.5% false-positive rate.

The paper by Akshi Kumar presents a model that draws on crowd knowledge to answer how-to questions on Q&A websites. For this, he develops a Siamese neural architecture that extracts similarity-matching features; training is then performed with a multilayer perceptron for prediction, and semantically matched questions are grouped to identify experts. He evaluated the technique by combining the multilayer perceptron with the Manhattan distance function and compared the results with existing models.
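Manhattan-distance Siamese setups commonly score two question embeddings as the exponential of the negative L1 distance, so identical embeddings score 1.0; a minimal sketch (the embedding vectors are made up for illustration and are not from the paper):

```python
import numpy as np

def manhattan_similarity(a, b):
    """Similarity from the Manhattan (L1) distance between two
    embeddings: exp(-|a - b|_1), in (0, 1], with 1.0 for identical inputs."""
    return float(np.exp(-np.abs(a - b).sum()))

q1 = np.array([0.2, 0.7, 0.1])   # embedding of question 1 (illustrative)
q2 = np.array([0.2, 0.7, 0.1])   # a duplicate question
q3 = np.array([0.9, 0.0, 0.4])   # an unrelated question
manhattan_similarity(q1, q2)     # → 1.0
manhattan_similarity(q1, q3)     # lower score for dissimilar embeddings
```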

Another paper, by Ranran Lou et al., proposes a model to protect the ocean environment, predict unknown elements, and support deep-sea resource monitoring. They use a data-driven analytics approach to analyze ocean data, including sound-source identification, element prediction, and physical constraints. They evaluated the model on standard ocean datasets and compared the results with existing approaches.

The paper by Mohib Ullah Khan et al. presents a technique focusing on social media reviews for the restaurant industry. They use a novel convolutional attention-based bidirectional modified LSTM over words, successive sequences, and patterns, with aspect category detection (ACD). They extract features of public reviews as entities and attributes to further develop sequences and patterns. They compare the technique on the SemEval-2015, SemEval-2016, and SentiHood datasets and achieve an average improvement of 79% over traditional models.

Finally, the paper by Celestine Iwendi et al. presents an experimental analysis of four deep learning models, namely recurrent neural networks (RNNs), bidirectional long short-term memory (BLSTM), long short-term memory (LSTM), and gated recurrent units (GRUs), for detecting insults in social media commentary. They develop a pipeline of text cleaning, tokenization, stemming, lemmatization, and stop-word removal, and then perform prediction using the models. They evaluate the deep learning models and report findings in comparison with existing models.
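The preprocessing steps named above can be sketched as follows; the tiny stop-word list and suffix-stripping "stemmer" are deliberately crude stand-ins for real components (a proper stemmer and lemmatizer), kept minimal for illustration:

```python
import re

STOP_WORDS = {"the", "a", "an", "is", "are", "to", "of"}  # tiny illustrative list

def crude_stem(token):
    """Very crude suffix stripping, standing in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def preprocess(text):
    text = re.sub(r"[^a-z\s]", " ", text.lower())        # cleaning
    tokens = text.split()                                 # tokenization
    tokens = [t for t in tokens if t not in STOP_WORDS]   # stop-word removal
    return [crude_stem(t) for t in tokens]                # stemming

preprocess("The trolls are posting INSULTING comments!!")
# → ['troll', 'post', 'insult', 'comment']
```

The resulting token lists are what would be encoded and fed to the RNN/BLSTM/LSTM/GRU models for prediction.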

We are excited to share these contributions and hope that the research community will find the articles highly interesting and relevant to multimedia-based deep learning models. We thank Editor-in-Chief Prof. Changsheng Xu and the editorial staff, especially Senior Publisher Garth Haller, for their support and collaboration in producing this special issue of the Multimedia Systems Journal.