Efficient XML Data Processing Based on MapReduce Framework

Chen, Shih-Ying; Chen, Hung-Ming; Zeng, Wei-Chen

doi:10.1007/978-94-017-9558-6_12


Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 329)

Abstract

Advances in information technology have led new devices to generate large amounts of data. XML in particular is a standard format for data exchange, so processing big XML data is an important topic. We propose an efficient XML data processing mechanism, which includes a design of an XMLInputFormat class, MapReduce modules, and an HBase schema. The mechanism scans an XML document to reconstruct the parent-child relationships in the document. It generates deserialized paths, which are stored in HBase.
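The abstract names the components but not their internals. The following is a minimal, hypothetical Python sketch of the general idea, not the authors' XMLInputFormat or MapReduce code. `split_records` mimics a record reader that scans for a start tag and its matching end tag (assuming records of that tag are neither nested nor self-closing), and `emit_paths` plays the role of a map function that walks one record and emits each node's root-to-node path together with its parent's path, which recovers the parent-child relationships. The function names, the `[index]` path syntax and the tuple layout are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def split_records(text, tag):
    """Yield each <tag ...>...</tag> fragment from a larger XML string.

    Hypothetical stand-in for a start/end-tag scanning input format.
    Assumes records with this tag are not nested and not self-closing.
    """
    start_open = "<" + tag
    end_tag = "</" + tag + ">"
    pos = 0
    while True:
        start = text.find(start_open, pos)
        if start == -1:
            return
        # Reject partial tag-name matches such as <bookshelf> when tag is "book".
        after = start + len(start_open)
        nxt = text[after] if after < len(text) else ""
        if nxt not in (">", " ", "\t", "\n", "\r"):
            pos = start + 1
            continue
        end = text.find(end_tag, start)
        if end == -1:
            return  # unterminated record: stop scanning
        end += len(end_tag)
        yield text[start:end]
        pos = end


def emit_paths(record):
    """Map-style step: return (path, parent_path, text) for every element.

    Sibling elements with the same tag get a 1-based position index,
    so every path is unique within the record. The root's parent is None.
    """
    root = ET.fromstring(record)
    out = []

    def walk(elem, parent_path, index):
        path = f"{parent_path}/{elem.tag}[{index}]"
        text = (elem.text or "").strip()
        out.append((path, parent_path or None, text))
        seen = {}  # per-parent counter of child tags
        for child in elem:
            seen[child.tag] = seen.get(child.tag, 0) + 1
            walk(child, path, seen[child.tag])

    walk(root, "", 1)
    return out
```

For example, splitting `<library><book id='1'><title>A</title><author>X</author><author>Y</author></book></library>` on `book` yields one record, and `emit_paths` on it returns rows such as `("/book[1]/author[2]", "/book[1]", "Y")`. Paths here are local to each record; in a real job one would presumably prefix them with a record or document identifier before using them as, say, HBase row keys, but the chapter's actual schema is not described in this preview.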

Acknowledgments

This work is supported in part by NSC in Taiwan, R.O.C. under Grant No. NSC-100-2221-E-025-014. The authors are also grateful to the reviewers for their helpful comments.

Author information

Authors and Affiliations

Department of Computer Science and Information Engineering, National Taichung University of Science and Technology, Taichung, Taiwan
Shih-Ying Chen, Hung-Ming Chen & Wei-Chen Zeng

Corresponding author

Correspondence to Shih-Ying Chen.

Editor information

Editors and Affiliations

Seoul National University of Science and Technology (SeoulTech), Seoul, Korea, Republic of (South Korea)
James J. (Jong Hyuk) Park
Department of Computer Science, Georgia State University, Atlanta, Georgia, USA
Yi Pan
Digital Media Engineering, Anyang University, Anyang, Korea, Republic of (South Korea)
Cheonshik Kim
Information & Communication Technologies, Swinburne University of Technology, Melbourne, Victoria, Australia
Yun Yang

About this paper

Cite this paper

Chen, SY., Chen, HM., Zeng, WC. (2015). Efficient XML Data Processing Based on MapReduce Framework. In: Park, J., Pan, Y., Kim, C., Yang, Y. (eds) Future Information Technology - II. Lecture Notes in Electrical Engineering, vol 329. Springer, Dordrecht. https://doi.org/10.1007/978-94-017-9558-6_12

DOI: https://doi.org/10.1007/978-94-017-9558-6_12
Published: 30 January 2015
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-017-9557-9
Online ISBN: 978-94-017-9558-6
eBook Packages: Engineering, Engineering (R0)