Abstract
Advances in information technology have led new devices to generate large amounts of data. XML, in particular, is a standard format for data exchange, so processing big XML data is an important topic. We propose an efficient XML data processing mechanism that comprises an XMLInputFormat class, MapReduce modules, and an HBase schema. The mechanism scans an XML document to reconstruct the parent-child relationships in the document and generates deserialized paths, which are stored in HBase.
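The abstract does not spell out how the scan works, so the following is only a minimal single-machine sketch of the general idea: read an XML document once, keep a stack of open elements to recover parent-child relationships, and emit one root-to-node path per element. It is not the authors' XMLInputFormat or MapReduce code. The function name `extract_paths`, the `tag[n]` path notation, and the `rel:parent` / `data:text` column names are all hypothetical and are not taken from the paper's HBase schema.

```python
import io
import xml.etree.ElementTree as ET


def extract_paths(xml_text):
    """Scan an XML document once and return one record per element.

    Each record holds the element's root-to-node path, its parent's path
    (None for the root), and its stripped leading text.
    """
    records = []
    open_elems = []    # stack of (record, child tag counts) for open elements
    root_counts = {}   # tag counts at the top level (only the root lives here)
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            if open_elems:
                parent_record, counts = open_elems[-1]
                parent_path = parent_record["path"]
            else:
                counts, parent_path = root_counts, None
            # Number siblings with the same tag so every path is unique.
            counts[elem.tag] = counts.get(elem.tag, 0) + 1
            path = f"{parent_path or ''}/{elem.tag}[{counts[elem.tag]}]"
            record = {"path": path, "parent": parent_path, "text": ""}
            records.append(record)
            open_elems.append((record, {}))
        else:
            # Element text is only guaranteed to be complete at the "end" event.
            record, _ = open_elems.pop()
            record["text"] = (elem.text or "").strip()
    return records


def to_rows(records):
    """Map records to a dict keyed by path, standing in for a key-value table.

    The column names are hypothetical placeholders, not the paper's schema.
    """
    return {
        r["path"]: {"rel:parent": r["parent"], "data:text": r["text"]}
        for r in records
    }


DOC = "<library><book><title>A</title></book><book><title>B</title></book></library>"

if __name__ == "__main__":
    for row_key, columns in to_rows(extract_paths(DOC)).items():
        print(row_key, columns)
```

Keying each row by its full path keeps an element next to its descendants when rows are sorted by key, which is one plausible reason to store paths rather than bare element names in a sorted store; whether the paper makes the same choice is not stated in the abstract.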
Acknowledgments
This work is supported in part by NSC in Taiwan, R.O.C. under Grant No. NSC-100-2221-E-025-014. The authors are also grateful to the reviewers for their helpful comments.
Copyright information
© 2015 Springer Science+Business Media Dordrecht
About this paper
Cite this paper
Chen, SY., Chen, HM., Zeng, WC. (2015). Efficient XML Data Processing Based on MapReduce Framework. In: Park, J., Pan, Y., Kim, C., Yang, Y. (eds) Future Information Technology - II. Lecture Notes in Electrical Engineering, vol 329. Springer, Dordrecht. https://doi.org/10.1007/978-94-017-9558-6_12
DOI: https://doi.org/10.1007/978-94-017-9558-6_12
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-017-9557-9
Online ISBN: 978-94-017-9558-6
eBook Packages: Engineering, Engineering (R0)