Computer Science and Information Systems 2023 Volume 20, Issue 4, Pages: 1591-1638
https://doi.org/10.2298/CSIS230331062B


DG_summ: A schema-driven approach for personalized summarizing heterogeneous data graphs

Beldi Amal (Tunis El Manar University, Faculty of Mathematical Physical and Natural Sciences of Tunis, SERCOM Laboratory, Tunis, Tunisia + University Pau & Pays Adour, LIUPPA, Anglet, France)
Sassi Salma (University Pau & Pays Adour, LIUPPA, Anglet, France), salma.tissaoui@univ-pau.fr
Chbeir Richard (University Pau & Pays Adour, LIUPPA, Anglet, France), richard.chbeir@univ-pau.fr
Jemai Abderrazek (Tunis El Manar University, Faculty of Mathematical Physical and Natural Sciences of Tunis, SERCOM Laboratory, Tunis, Tunisia + Carthage University, Polytechnic School of Tunisia, SERCOM Laboratory, INSAT, Tunis, Tunisia), abderrazekjai@yahoo.co.uk

Advances in computing resources have enabled the processing of vast amounts of data. However, identifying trends in such data remains challenging for humans, especially in fields like medicine and social networks, which makes the data difficult to process, analyze, and visualize. In this context, graph summarization has emerged as an effective framework for identifying structure and meaning in data. The problem has been studied in the literature, and many approaches have been proposed for summarizing graphs in static contexts. These approaches provide a compressed version of the graph that removes many details while retaining its essential structure. However, they are computationally prohibitive and do not scale to large graphs in terms of both structure and content. Additionally, no framework summarizes mixed sources with the goal of creating a dynamic, syntactic, and semantic data summary. In this paper, our key contribution focuses on modeling data graphs, summarizing data from multiple sources using a schema-driven approach, and visualizing a version of the graph summary tailored to the needs of each user. We demonstrate this approach through a case study in the e-health domain.

Keywords: Heterogeneous data, labeled graph, graph summarization, operation, structure, content, versioning