Editorial

We are in the era of “Big Data”. As data and knowledge volumes keep increasing and global means for information dissemination continue to diversify, new methods, modeling paradigms and structures are needed to efficiently meet scalability requirements [1]. In the last few years, we have seen the proliferation of heterogeneous distributed systems, ranging from simple networks of workstations to highly complex grid computing environments. Such computational paradigms have been preferred for their reduced costs and inherent scalability, but they pose many challenges to scalable systems and applications in terms of information access, storage and retrieval. Cluster computing [2], cloud computing technology [3], data and knowledge bases, distributed information retrieval technology [4] and networking technology [5] should all converge to address the scalability concern. Furthermore, with the advent of emerging computing architectures (e.g., SMTs, GPUs, and multicores), designing techniques that explicitly target these systems is becoming increasingly important.

The 5th International Conference on Scalable Information Systems (InfoScale) focused on a wide array of scalability issues and investigated new approaches to tackle problems arising from the ever-growing size and complexity of information of all kinds.

This special issue features six high-quality papers selected from InfoScale, held in Seoul, Korea, on September 25–26, 2014. The first paper, entitled “Real-time Event Detection on Social Data Stream”, presents an interesting framework for processing big social data. In particular, by analyzing data streams from social media, useful events can be detected and used to provide users with intelligent context-aware services.

The second paper, “Similarity Searching for the Big Data” by Pavel Zezula, considers a scalable framework for big data processing. The paper first identifies the need to discover descriptive information about complex and heterogeneous objects in order to make them accessible. Second, it argues that multimodal search structures are required to efficiently execute complex similarity queries, possibly in outsourced environments, while preserving privacy.

The third paper, “Weighted Similarity Schemes for High Scalability in User-Based Collaborative Filtering” by Pirasteh et al., presents a novel recommender system (RecSys) based on a similarity integration method. Unlike conventional RecSys approaches, which rely on a single similarity measure between a pair of users, the proposed system collects all possible similarities between two users and integrates them to best represent the relationship between them.

The fourth paper, “A Novel Ranking Model for a Large-Scale Scientific Publication”, investigates how to rank academic publications (e.g., research papers, conference proceedings, and journals), which are modeled as highly complex networks. To process a large amount of publication data, the authors propose a multi-layered network model (called the N-star model).

The fifth paper, “Multi-modal Similarity Retrieval with Distributed Key-value Store” by David Novak, studies a distributed system architecture built on a key-value store. In particular, by designing several search aspects (called modalities), the study demonstrates efficient information retrieval performance on large amounts of data (e.g., the CoPhIR benchmark dataset).

The last paper, “Big Bibliographic Data Analytics by Random Walk Model” by Jason J. Jung, presents an interesting probabilistic approach to analyzing a large amount of bibliographic data in order to provide users with various publication (and citation) services. In particular, the proposed method is based on a random walk for measuring citation networks among research papers.

The guest editors are grateful to our reviewers for their effort in reviewing the manuscripts. We also thank the Editor-in-Chief, Dr. Imrich Chlamtac, for his supportive guidance throughout the entire process.