A novel sampling-based visual topic models with computational intelligence for big social health data clustering

Narasimhulu, K.; Abarna, K. T. Meena; Kumar, B. Siva; Suresh, T.

doi:10.1007/s11227-021-04300-7


Published: 19 January 2022

Volume 78, pages 9619–9641, (2022)

The Journal of Supercomputing


K. Narasimhulu ORCID: orcid.org/0000-0003-0756-379X¹,
K. T. Meena Abarna¹,
B. Siva Kumar^1,2 &
…
T. Suresh¹


Abstract

Twitter is a popular social network on which people share views and opinions on various topics. Many people search for health topics through Twitter; thus, a vast amount of social health data can be obtained from it. Topic models are widely used for social health-care data clustering. These models require prior knowledge about the clustering tendency. Determining the number of clusters in given social health data is known as the health cluster tendency problem. Visual techniques, including visual assessment of the cluster tendency (VAT), cosine-based VAT (cVAT), and multiviewpoint-based cosine similarity features VAT (MVCS-VAT), are used to identify social health cluster tendencies. The recent MVCS-VAT technique is superior to the others; however, it is the most expensive technique for big social health data cluster assessment. Thus, this paper aims to enhance the MVCS-VAT using a sampling technique to address the big social health data assessment problem. Experiments are conducted on different health datasets to demonstrate the efficiency of the proposed work. The accuracy of social health data clustering improves by 5 to 10% in the proposed S-MVCS-VAT compared with the MVCS-VAT. The results also show that the proposed S-MVCS-VAT is faster and more memory efficient for discovering social health data clusters.


1 Introduction

Twitter is one of the platforms commonly used by social users across the world to share opinions and experiences through tweets. Health-care data are an emerging need for society, and the analysis of health tweet data must be automated to identify major health problems in society. Health-care tweet data are usually extensive and need to be assessed to find knowledge about significant health problems (or health clusters). This is the crucial motivation for addressing the health cluster tendency problem. Visual techniques, such as VAT [1], cVAT [2], and MVCS-VAT [3], can be used to assess the number of clusters in tweet health data (or social health data). Popular topic models, including nonnegative matrix factorization (NMF) [4], latent semantic indexing (LSI) [5], probabilistic LSI (PLSI) [6], and latent Dirichlet allocation (LDA) [7], are used to extract the topic features of tweet data. The topic-tweet document matrix is created using the topic models for the set of tweet documents. An alternative is the TF-IDF matrix [8], which describes tweet document features based on term analysis. Tweet document analysis using topics is more practical than using the TF-IDF matrix because the TF-IDF matrix suffers from data sparsity.

The topic-document matrix (TDM) is the most recommended representation in text clustering applications [9] [25]. In the VAT, dissimilarity features are derived using the Euclidean distance. In the cVAT, they are derived using the cosine distance metric. In the majority of text clustering applications [10] [23] [26], cosine-based cluster assessment has been shown to be more informative than the standard Euclidean distance. In the cVAT, the cosine-based similarity is measured from a single reference viewpoint, i.e., the origin. The MVCS-VAT [3] extends the cVAT by deriving the cosine-based similarity values from multiple viewpoints, which is more accurate than the single-viewpoint approach and gives a better-justified cluster assessment. The recent MVCS-VAT method performs the cluster assessment of health data very well [27] [31]. Each cluster represents a health cluster, i.e., a group of tweets that discuss the same health topic. The tweets are categorized into health clusters based on the similarities among tweet documents. The problem with the MVCS-VAT is that assessing health clusters using multiple viewpoints takes more computational time and memory. For example, the similarity between two tweet documents t1 and t2 among n documents is computed using n-2 viewpoints: every tweet among the n tweets except t1 and t2 is taken as a viewpoint. The cosine similarity between the two tweet documents is computed for each of the n-2 viewpoints. This computation is repeated for all n(n-1)/2 pairs of documents, so the total number of similarity computations is n(n-1)(n-2)/2. Therefore, the MVCS-VAT is an expensive cluster assessment model for a large number of tweet documents. The proposed work extends the MVCS-VAT with an effective sampling procedure [28]. In the proposed sampling-based MVCS-VAT (S-MVCS-VAT) algorithm, a constant number of sample viewpoints is used instead of all n-2 viewpoints. The algorithm and experimental details are presented in the following sections.
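To make the cost argument concrete, the following minimal sketch counts cosine evaluations for the full multiviewpoint scheme and for a fixed-size viewpoint sample. The function names and the sample size of 50 are illustrative assumptions, not values taken from the paper.

```python
def full_viewpoint_cost(n: int) -> int:
    """Cosine evaluations when every other document acts as a viewpoint."""
    pairs = n * (n - 1) // 2          # number of document pairs
    return pairs * (n - 2)            # n-2 viewpoints per pair


def sampled_viewpoint_cost(n: int, m: int) -> int:
    """Cosine evaluations with a constant sample of m viewpoints (m is hypothetical)."""
    pairs = n * (n - 1) // 2
    return pairs * m


if __name__ == "__main__":
    for n in (5, 1_000, 100_000):
        print(n, full_viewpoint_cost(n), sampled_viewpoint_cost(n, 50))
```

The full scheme grows cubically in n, whereas the sampled scheme grows quadratically for a constant sample size, which is the saving the sampling idea relies on.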

The key contributions of the paper are summarized as follows:

1. Health clusters from big social data are assessed.
2. A sampling-based visual technique for determining the health clusters in a visual form is proposed.
3. Crisp partitions are derived from the visual images from the proposed S-MVCS-VAT.
4. Significant social health data cluster results are derived.
5. The performance of visual techniques for social and benchmark health data is empirically demonstrated.

The remaining sections are summarized as follows: Sect. 2 presents the literature on visual techniques for precluster assessment; Sect. 3 introduces the proposed sampling-based MVCS-VAT; Sect. 4 illustrates the experimental study; and, finally, Sect. 5 provides the conclusion and future scope of the work.

2 Literature of visual techniques for precluster assessment

Popular clustering methods, such as k-means [11] and hierarchical clustering, are widely used in clustering-related applications [12]. The data clustering process depends on two crucial steps: determining the number of clusters and partitioning the data. Determining the number of clusters is known as the cluster tendency problem. Social health data are the health-related opinions or views that social users tweet on Twitter. Finding the clusters of social health data based on health topics is known as finding the health cluster tendency [29]. The preassessment of the number of health topics in social data is a challenging problem. With this motivation, visual techniques for the precluster assessment of social health data are surveyed. Bezdek et al. [1] proposed a basic model, the visual assessment of (cluster) tendency (VAT), for determining the number of clusters in numerical data. Its algorithm is shown in the following.

Because the VAT operates on numerical data, social data are initially preprocessed into the topic-document matrix using various topic models [13]. This is a better representation of social data than the TF-IDF matrix. Four topic models, latent Dirichlet allocation (LDA), latent semantic indexing (LSI), probabilistic latent semantic indexing (PLSI), and nonnegative matrix factorization (NMF), are recommended in text clustering-related applications. These models convert the social data into a numeric topic-document matrix. In the VAT [14], the social health topic-document matrix is used to compute the dissimilarity features as a Euclidean distance matrix. The reordered dissimilarity matrix (RDM) [15] is derived according to the steps of the VAT and is then displayed as an image. The number of health clusters (or health cluster tendency) is the number of square-shaped dark blocks along the diagonal of the RDM image (also known as the VAT image). A cosine metric compares the directions of two vectors, normalizing by their magnitudes, whereas the Euclidean distance depends only on the distance between them. Therefore, in text clustering applications, cosine-based cluster assessment is more successful than Euclidean distance assessment. Following the cosine metric, another visual technique, the cosine-based VAT (cVAT), was developed in [12] for the precluster assessment problem.
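The reordering step of the VAT can be sketched as follows. This is a minimal pure-Python illustration of the ordering described by Bezdek and Hathaway [1] (start at one end of the largest dissimilarity, then repeatedly append the unordered object closest to the ordered set); the function names and the toy data are assumptions made for the example.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dissimilarity_matrix(rows, dist=euclidean):
    """Full pairwise dissimilarity matrix for a list of feature vectors."""
    n = len(rows)
    return [[dist(rows[i], rows[j]) for j in range(n)] for i in range(n)]


def vat_order(R):
    """Return the VAT ordering of object indices for a square matrix R."""
    n = len(R)
    # Start from an object that participates in the largest dissimilarity.
    start = max(range(n), key=lambda i: max(R[i]))
    order = [start]
    remaining = set(range(n)) - {start}
    # best[j] is the distance from j to its nearest already-ordered object.
    best = {j: R[start][j] for j in remaining}
    while remaining:
        # Append the unordered object closest to the ordered set.
        j = min(remaining, key=lambda k: (best[k], k))
        order.append(j)
        remaining.remove(j)
        for k in remaining:
            best[k] = min(best[k], R[j][k])
    return order


def reorder(R, order):
    """Permute rows and columns of R to obtain the reordered matrix (RDM)."""
    return [[R[i][j] for j in order] for i in order]


if __name__ == "__main__":
    # Two well-separated groups, deliberately interleaved in the input.
    rows = [[0, 0], [10, 10], [0, 1], [10, 11], [1, 0], [11, 10]]
    R = dissimilarity_matrix(rows)
    order = vat_order(R)
    for line in reorder(R, order):
        print(" ".join(f"{v:5.1f}" for v in line))
```

In the printed matrix the small values gather in two blocks along the diagonal, which is what appears as dark squares when the RDM is rendered as an image.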

In the cVAT, the similarity (or dissimilarity) features between two data objects are derived using a single viewpoint, i.e., the origin. Similarity features computed from a single viewpoint give a less informative assessment. Thus, multiple viewpoints are used in later visual techniques, such as the multiviewpoint-based cosine similarity features VAT (MVCS-VAT) [3]. The MVCS-VAT is the most recommended visual technique for acquiring accurate similarity features because it uses multiple viewpoints instead of a single one. For n tweet documents, the MVCS-VAT needs n-2 viewpoint computations to find the cosine-based similarity between any two tweet documents. The average of the n-2 similarity values over the n-2 viewpoints is then taken as the similarity between the two tweet documents. This method is the most accurate for visualizing the number of clusters in a set of n tweet documents [30]. The similarity feature computation between any two documents for a set of five tweet documents is shown in Fig. 1.
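The multiviewpoint computation for one pair of documents can be sketched as below. The sketch assumes that the similarity seen from a viewpoint h is the cosine of the angle between the difference vectors (d_i - d_h) and (d_j - d_h); the text does not spell out this formula, so it should be read as one plausible interpretation. The function names and the five toy topic vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (nu * nv)


def sub(u, v):
    """Element-wise difference u - v."""
    return [x - y for x, y in zip(u, v)]


def mvcs(docs, i, j, viewpoints=None):
    """Average cosine similarity of docs[i] and docs[j] over the viewpoints.

    By default every document except i and j is a viewpoint (n-2 in total).
    Passing a list of indices restricts the average to those viewpoints.
    """
    if viewpoints is None:
        viewpoints = range(len(docs))
    views = [h for h in viewpoints if h != i and h != j]
    if not views:
        raise ValueError("need at least one viewpoint other than i and j")
    total = sum(
        cosine(sub(docs[i], docs[h]), sub(docs[j], docs[h])) for h in views
    )
    return total / len(views)


if __name__ == "__main__":
    # Five toy documents described by three topic weights each.
    docs = [
        [0.8, 0.1, 0.1],
        [0.7, 0.2, 0.1],
        [0.1, 0.8, 0.1],
        [0.1, 0.7, 0.2],
        [0.3, 0.3, 0.4],
    ]
    print("same topic     :", round(mvcs(docs, 0, 1), 3))
    print("different topic:", round(mvcs(docs, 0, 2), 3))
```

Documents that share a dominant topic receive a high average similarity from the three remaining viewpoints, while documents with different dominant topics receive a low one.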

The key limitation of the MVCS-VAT is that it demands more computational time and memory to find the social data clustering results for a large set of tweet documents. The proposed method presents a sampling-based MVCS-VAT for the scalable computation of social health data clustering results.

The proposed extension finds the similarity features between tweet documents using sample viewpoints instead of all n-2 viewpoints. Social data are big data; thus, this sampling idea reduces the time and memory required to find health cluster tendencies. This optimized approach to finding the health cluster tendency from social data is derived in the next section.

3 Proposed sampling-based MVCS-VAT (S-MVCS-VAT)

The clustering of social data (tweet health data) depends on the similarity features of the data objects. Cosine-based similarity features are very successful in text data clustering applications. In the cVAT, the similarity features are derived with respect to a single reference viewpoint, the origin. The MVCS-VAT uses multiple viewpoints to find more accurate similarity features among the tweet documents. Because the MVCS-VAT is expensive, the proposed work uses sample viewpoints to determine the social health data clustering results. Algorithm 1 illustrates the procedural steps of the proposed work.

The proposed algorithm uses topic models, such as LDA, LSI, PLSI, and NMF, to extract the features of health tweets in topic-document matrix form, which reduces the sparsity problem of tweet data. The topic-document matrix is then converted into a bag-of-features representation of the tweet data. The features of the tweets are denoted by the vectors {TF₁, TF₂, …, TF_N}.

In Step 1 and Step 2, the r^th tweet document feature TF_r is selected at random, the distances between TF_r and {TF₁, TF₂, …, TF_N} are computed and saved into ‘Dist_Start,’ and the tweet document at the maximum distance is determined using the argmax function; its document number is saved into the variable ‘index.’ In Step 3, the distance array Dist_I is updated according to the explored tweet documents. The tweet document with the largest deviation is again selected by applying the argmax function to Dist_I, and the index found by the argmax is another centroid of the tweet dataset. The same steps are repeated to find the remaining expected number of cluster centroids. After the centroids are selected, the remaining tweet documents are assigned to their nearest centroids based on the distances measured in Step 4. The distances are measured using the cosine distance metric of the sample viewpoints. The number of sample viewpoints is determined by a percentage s, and this percentage of samples is drawn equally from every cluster (except TF₁ and TF₂). The similarity features with respect to the sample viewpoints are then computed, as shown in the C_MVCS computational statement. The dissimilarity values are stored in DM, and the normalized matrix values are stored in NormS.
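The centroid selection and viewpoint sampling steps described above can be sketched as follows. This is a hedged reading of the description (a random seed document, repeated argmax over a running minimum-distance array, nearest-centroid assignment, then s percent drawn from each cluster); the names `maximin_centroids` and `sample_viewpoints`, the seed handling, and the rounding rule are assumptions rather than details of Algorithm 1.

```python
import math
import random


def cosine_distance(u, v):
    """1 - cosine similarity; 1.0 if either vector has zero length."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0.0 or nv == 0.0:
        return 1.0
    return 1.0 - sum(x * y for x, y in zip(u, v)) / (nu * nv)


def maximin_centroids(feats, k, rng):
    """Pick k mutually distant documents as centroids (farthest-first)."""
    n = len(feats)
    r = rng.randrange(n)                       # random starting document
    # First centroid: the document farthest from the random start.
    index = max(range(n), key=lambda i: cosine_distance(feats[r], feats[i]))
    centroids = [index]
    dist = [cosine_distance(feats[index], f) for f in feats]
    while len(centroids) < k:
        # Next centroid: the document farthest from all chosen centroids.
        index = max(range(n), key=dist.__getitem__)
        centroids.append(index)
        dist = [
            min(d, cosine_distance(feats[index], f))
            for d, f in zip(dist, feats)
        ]
    return centroids


def sample_viewpoints(feats, k, s, seed=0):
    """Return indices of sample viewpoints: s percent from each cluster."""
    rng = random.Random(seed)
    centroids = maximin_centroids(feats, k, rng)
    clusters = {c: [] for c in centroids}
    for i, f in enumerate(feats):
        nearest = min(centroids, key=lambda c: cosine_distance(feats[c], f))
        clusters[nearest].append(i)
    sample = []
    for members in clusters.values():
        if not members:                        # possible with duplicate centroids
            continue
        m = max(1, round(len(members) * s / 100))
        sample.extend(rng.sample(members, m))
    return sorted(sample)


if __name__ == "__main__":
    feats = [
        [0.90, 0.05, 0.05], [0.80, 0.10, 0.10],
        [0.85, 0.10, 0.05], [0.90, 0.10, 0.00],
        [0.05, 0.90, 0.05], [0.10, 0.80, 0.10],
        [0.10, 0.85, 0.05], [0.00, 0.90, 0.10],
    ]
    print(sample_viewpoints(feats, k=2, s=50))
```

Drawing the same percentage from every cluster keeps small health topics represented among the viewpoints, which a purely uniform random sample would not guarantee.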

The reordered dissimilarity matrix (RDM) is computed by applying the visual assessment of tendency (VAT) to NormS, as shown in Step 7. The RDM image is visualized to assess the number of clusters by counting the square-shaped dark blocks that appear along the diagonal. The crisp partitions of the RDM image give the predicted cluster labels of the health tweets and thus the health data clustering results; these steps are illustrated in Step 8 and Step 9.

In the proposed algorithm, the similarity features for a pair of tweet documents are derived using every sample viewpoint, and the average of the obtained similarity values is taken as the similarity between the two tweet documents. The similarity feature computation is less expensive because sample viewpoints are used instead of all viewpoints. This provides a considerable improvement in finding the social data clustering results compared with the state-of-the-art visual topic models.
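The averaging over sample viewpoints can be written compactly. The notation below is an assumption introduced for illustration: V is the set of sample viewpoints, V_ij excludes the two documents being compared, and D is one possible rescaling of the similarity into a dissimilarity in [0, 1]. The source does not state how DM or NormS is actually computed.

```latex
\[
\mathrm{Sim}(t_i, t_j) \;=\; \frac{1}{|V_{ij}|} \sum_{v \in V_{ij}}
\cos\bigl(t_i - v,\; t_j - v\bigr),
\qquad V_{ij} = V \setminus \{t_i, t_j\}
\]
\[
D(t_i, t_j) \;=\; \frac{1 - \mathrm{Sim}(t_i, t_j)}{2} \;\in\; [0, 1]
\]
```

When V contains all n documents, the first expression reduces to the n-2 viewpoint average used by the MVCS-VAT.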

In the recent MVCS-VAT technique, effective social health data clustering results are derived using all given viewpoints. For small datasets, the MVCS-VAT is very effective at determining the clustering tendency and the social data clustering results. However, the amount of social data is massive; therefore, the MVCS-VAT must use many viewpoints to find the social health data clustering results, which demands large computational and memory costs. The MVCS-VAT is suitable for finding social data clustering results, but it is expensive for big social data. The proposed S-MVCS-VAT uses the sampling scheme to perform scalable computations for big social data clustering. The experimental demonstrations are presented in the following section.

4 Experimental study

Tweet data [2] are collected on different health topics to assess health data clustering results. Each subset of data is created with specific health topics. Table 1 presents the details of the social health data in terms of the number of health topics [18], the names of the health diseases, and the size of the datasets.

Table 1 Social health datasets topics description


Benchmarked health datasets are retrieved based on the health keywords provided by TREC [16] [17], which are mentioned in the same table.

After the tweet features are extracted in the form of a bag-of-features, various big social data visual clustering methods are tested in the experimental study. Three existing visual methods, the VAT, cVAT, and MVCS-VAT, and the proposed S-MVCS-VAT are applied to the big social data. Both the S-MVCS-VAT and MVCS-VAT provide visual images with greater clarity than the other visual methods. The notable improvement of the proposed method is that it derives health data clustering results faster than the MVCS-VAT.

The crisp partitions are derived based on the diagonal and nondiagonal pixel intensity values. The cluster labels of data objects are derived based on these cluster partitions, and the results are shown in Fig. 5b for three data topics.

Tweet document features are extracted through the four different topic models: LDA, LSI, PLSI, and NMF. Figures 2, 3, 4, and 5a show the results of visual health data clustering for these topic models. In these results, the S-MVCS-VAT shows the visual clusters in the form of diagonal square-shaped dark blocks with outstanding clarity under all four topic models.

The clarity of the proposed work with sampling viewpoints is the best among the compared methods, and it is almost the same as the clarity obtained without sampling.

Crisp partitions and the consequent quality of the clustering results depend on the clarity of the visual image clusters. The S-MVCS-VAT obtains social health data clustering results with reduced time and space costs. Four variants of the proposed method are developed with the four specified topic models: the LDA-S-MVCS-VAT, LSI-S-MVCS-VAT, PLSI-S-MVCS-VAT, and NMF-S-MVCS-VAT. Figures 6, 7 and 8 compare the four variants of the existing and proposed models in terms of the time (speed) and memory space parameters. Empirical analysis of the time and space costs shows that the proposed S-MVCS-VAT is faster and more memory efficient, and therefore more scalable, than the other visual health data clustering models.

The performance or quality of the visual data clustering models is evaluated using four parameters: the cluster accuracy (CA) [19], normalized mutual information (NMI) [20], precision [21], and recall [21]. These values are given in Tables 2, 3, 4, and 5, respectively.

Table 2 Cluster Accuracy (CA) for the visual health data cluster models


Table 3 Normalized mutual information (NMI) for the visual health data cluster models


Table 4 Precision (P) for the visual health data cluster models


Table 5 Recall (R) for the visual health data cluster models


From the crisp partitions, the data object labels are predicted, and the performance of the visual health cluster models is evaluated by matching the predicted cluster labels against the ground truth labels using CA, NMI, precision, and recall.
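Two of these measures can be sketched in a few lines. The definitions below are common textbook forms and are assumptions here, since the exact formulas used in the paper are not given: cluster accuracy is taken as the best agreement over one-to-one mappings between predicted and true labels (brute force, so suitable only for a small number of clusters), and NMI is normalized by the geometric mean of the two entropies.

```python
import math
from collections import Counter
from itertools import permutations


def cluster_accuracy(truth, pred):
    """Best-match accuracy over one-to-one label mappings (small k only)."""
    t_labels = sorted(set(truth))
    p_labels = sorted(set(pred))
    # Pad the true labels so every predicted label can be mapped somewhere.
    size = max(len(t_labels), len(p_labels))
    t_labels = t_labels + [None] * (size - len(t_labels))
    counts = Counter(zip(pred, truth))
    best = 0
    for perm in permutations(t_labels, len(p_labels)):
        hits = sum(counts[(p, t)] for p, t in zip(p_labels, perm))
        best = max(best, hits)
    return best / len(truth)


def entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def nmi(truth, pred):
    """Normalized mutual information with geometric-mean normalization."""
    n = len(truth)
    ct, cp = Counter(truth), Counter(pred)
    joint = Counter(zip(truth, pred))
    mi = sum(
        (c / n) * math.log(c * n / (ct[t] * cp[p]))
        for (t, p), c in joint.items()
    )
    denom = math.sqrt(entropy(truth) * entropy(pred))
    if denom == 0.0:
        # Both labelings trivial: identical partitions; otherwise no information.
        return 1.0 if len(ct) == 1 and len(cp) == 1 else 0.0
    return mi / denom


if __name__ == "__main__":
    truth = [0, 0, 0, 1, 1, 1]
    pred = [1, 1, 1, 0, 0, 1]   # cluster ids are arbitrary; one tweet misplaced
    print("CA :", round(cluster_accuracy(truth, pred), 3))
    print("NMI:", round(nmi(truth, pred), 3))
```

Both measures are invariant to how the clusters are numbered, which matters because the labels read off a crisp partition carry no inherent correspondence to the ground truth topic names.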

4.1 Critical observations

The proposed method uses only the sample viewpoints to assess the cluster tendency and data clustering results. Thus, the proposed method is faster than the MVCS-VAT. Crisp partition images with the best clarity and goodness-of-fit are obtained with the proposed method. The proposed work is able to discover quality clustering results for large social health data.

Table 6 presents the goodness-of-fit of the existing and proposed visual images and shows that the S-MVCS-VAT scored higher than the other methods under the four topic models.

Table 6 Goodness-of-fit of the visual images


The overall experimental analysis shows that the accuracy improves by 5 to 10% in the proposed S-MVCS-VAT method under the four topic models NMF, LDA, LSI, and PLSI for big social health data.

5 Conclusion and future scope

Health data assessment is an emerging need in society. Twitter is one of the rich social sources on which people exchange views or opinions on any topic. Big social data are extracted from Twitter as lakhs (hundreds of thousands) of tweets, and finding social health data clusters for so many tweets is very expensive. The recent visual technique, the MVCS-VAT, effectively conducts social health data cluster assessment with n-2 multiple viewpoints. The proposed work uses an efficient sampling strategy and four topic models to enhance the MVCS-VAT. Experiments are carried out on 18 different case studies, i.e., 18 different subsets of health datasets. Overall, these experiments show that the proposed S-MVCS-VAT improves the quality of social health data clusters by 5 to 10%. The goodness-of-fit of the visual cluster images is much improved in the S-MVCS-VAT for all these datasets. Two scalability parameters, computational time and memory, are measured for the proposed S-MVCS-VAT and the existing MVCS-VAT under the different topic models for all 18 case studies (i.e., 2 topics to 15 topics; 2 topics to 5 topics in TREC 2018). The results show that the proposed S-MVCS-VAT is more scalable with respect to computational time and memory allocation. Future work can be extended to develop scalable ailment visual techniques for health analysis and socially recommended solutions.

References

1. Bezdek JC, Hathaway RJ (2002) VAT: a tool for visual assessment of (cluster) tendency. In: Proceedings of the 2002 International Joint Conference on Neural Networks. IJCNN’02, 2002, p 2225–2230
2. Vijeya Kaveri V, Maheswari V (2019) A framework for recommending health-related topics based on topic modeling in conversational data (Twitter). Cluster Computing.
3. Narasimhulu K, Meena Abarna KT, Sivakumar B (2021) An enhanced cosine-based visual technique for the robust tweets data clustering. Int J Intell Comp Cybern. 14(2):170–184. https://doi.org/10.1108/IJICC-10-2020-0151
4. Lee D, Seung H (2000) Algorithms for non-negative matrix factorization. In: Advances in Neural Information Processing Systems 13, NIPS, Denver, CO, USA p 556–562
5. Deerwester S, Dumais ST, Furnas GW, Landauer TK, Harshman R (1990) Indexing by latent semantic analysis. J Am Soc Inf Sci 41(6):391–407
6. Hofmann T (1999) Probabilistic latent semantic indexing. SIGIR. ACM, New York, pp 50–57
7. Blei DM, Ng AY, Jordan MI (2003) Latent Dirichlet allocation. J Mach Learn Res 3:993–1022
8. Wuhan (2018) TF-IDF based feature words extraction and topic modeling for short text. In: ICMSS2018
9. Wu X, Kumar V, Quinlan JR et al (2008) Top 10 algorithms in data mining, knowledge information system, vol 14. Springer, Heidelberg, pp 1–37
10. Suleman Basha M, Mouleeswaran SK, Rajendra Prasad K (2019) Cluster tendency methods for visualizing the data partitions. Int J Innovative Technol Explor Eng (IJITEE). https://doi.org/10.35940/ijitee.K2285.0981119
11. Wang J, Su X (2011) An improved K-Means clustering algorithm. In: IEEE 3rd International Conference on Communication Software and Networks, p 44–46. https://doi.org/10.1109/ICCSN.2011.6014384
12. Rajendra Prasad K, Mohammed M, Noorullah RM (2019) Hybrid topic cluster models for social healthcare data. Int J Adv Comput Sci Appl 10(11):490–506. https://doi.org/10.14569/IJACSA.2019.0101168
13. Suleman Basha M, Mouleeswaran SK, Prasad KR (2021) Sampling-based visual assessment computing techniques for an efficient social data clustering. J Supercomput 77:8013–8037. https://doi.org/10.1007/s11227-021-03618-6
14. Kumar D, Bezdek JC, Palaniswami M, Rajasegarar S, Leckie C, Havens TC (2016) A hybrid approach to clustering in big data. IEEE Trans Cybern 46(10):2372–2385
15. Shirkhorshidi AS, Aghabozorgi S, Wah TY (2015) A comparison study on similarity and dissimilarity measures in clustering continuous data. PLoS ONE 10(12):1–20
16. https://trec.nist.gov/data/web2014.html
17. https://trec.nist.gov/data/microblog2015.h
18. https://www.webmd.com/
19. Pattanodom et al. (2016) Clustering data with the presence of missing values by ensemble approach. In: Second Asian Conference on Defense Technology
20. Amelio A, Pizzuti C (2015) Is normalized mutual information a fair measure for comparing community detection methods. In: IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining
21. Bhatnagar V, Majhi R, Jena PR (2018) Comparative performance evaluation of clustering algorithms for grouping manufacturing firms. Arab J Sci Eng 43:4071–4083
22. Rajendra Prasad K, Mohammed M, Noorullah RM (2019) Visual topic models for healthcare data clustering. Evol Intel. https://doi.org/10.1007/s12065-019-00300-y
23. Basha S (2020) Comparison of real datasets characteristics by using clustering approaches. J Mech Cont Math Sci. https://doi.org/10.26782/jmcms.2020.08.00061
24. Gamblin T, de Supinski BR, Schulz M, Fowler R, Reed DA (2010) Clustering performance data efficiently at massive scales. In: ICS '10 Proceedings of the 24th ACM International Conference on Supercomputing, p 243–252. https://doi.org/10.1145/1810085.1810119
25. Surya Bhupal Rao, S. Rahamat Basha, G. Ravi Kumar (2020) A comparative approach of text mining: classification, clustering and extraction techniques. J Mech Continua Math Sci. (5)120–131
26. Shafqat S, Kishwer S, Rasool RU et al (2020) Big data analytics enhanced healthcare systems: a review. J Supercomput 76:1754–1799. https://doi.org/10.1007/s11227-017-2222-4
27. Vidhya K, Shanmugalakshmi R (2020) Modified adaptive neuro-fuzzy inference system (M-ANFIS) based multi-disease analysis of healthcare Big Data. J Supercomput 76:8657–8678. https://doi.org/10.1007/s11227-019-03132-w
28. Hashimoto T, Shepard DL, Kuboyama T et al (2021) Analyzing temporal patterns of topic diversity using graph clustering. J Supercomput 77:4375–4388. https://doi.org/10.1007/s11227-020-03433-5
29. AlZubi AA (2020) Big data analytic diabetics using map reduce and classification techniques. J Supercomput 76:4328–4337. https://doi.org/10.1007/s11227-018-2362-1
30. Doghri W, Saddoud A, Chaari Fourati L (2021) Cyber-physical systems for structural health monitoring: sensing technologies and intelligent computing. J Supercomput. https://doi.org/10.1007/s11227-021-03875-5
31. Krishnaraj N, Bellam K (2020) Improved Distributed Frameworks to Incorporate Big Data through Deep Learning. Journal of Advanced Research in Dynamical & Control Systems 12:332–338


Author information

Authors and Affiliations

Annamalai University, Chidambaram, Tamilnadu, India
K. Narasimhulu, K. T. Meena Abarna, B. Siva Kumar & T. Suresh
Department of CSE, Rajeev Gandhi Memorial College of Engineering & Technology, Nandyal, Andhra Pradesh, India
B. Siva Kumar


Corresponding author

Correspondence to K. Narasimhulu.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Narasimhulu, K., Abarna, K.T.M., Kumar, B.S. et al. A novel sampling-based visual topic models with computational intelligence for big social health data clustering. J Supercomput 78, 9619–9641 (2022). https://doi.org/10.1007/s11227-021-04300-7


Accepted: 28 December 2021
Published: 19 January 2022
Issue Date: May 2022
DOI: https://doi.org/10.1007/s11227-021-04300-7
