Efficient XML Data Processing Based on MapReduce Framework

  • Conference paper
Future Information Technology - II

Part of the book series: Lecture Notes in Electrical Engineering ((LNEE,volume 329))

Abstract

Advances in information technology mean that new devices generate large amounts of data. XML in particular is a standard format for data exchange, so processing large XML datasets is an important topic. We propose an efficient XML data processing mechanism that comprises a design for an XMLInputFormat class, MapReduce modules, and an HBase schema. The mechanism scans an XML document to reconstruct the parent-child relationships in the document and generates deserialized paths, which are stored in HBase.
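The scanning step can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the XMLInputFormat design or the HBase schema, so the function name `extract_paths`, the sample document, and the (path, text) row layout are all assumptions made for illustration. The sketch streams through a document, keeps a stack of open elements to recover parent-child relationships, and emits one root-to-node path per element.

```python
import io
import xml.etree.ElementTree as ET


def extract_paths(xml_text):
    """Return (path, text) pairs for every element, in document order.

    Hypothetical helper: a stack of open tags stands in for the
    parent-child reconstruction described in the abstract.
    """
    stack = []      # tags of the currently open elements (the ancestors)
    rows = []       # output rows as [path, text], filled in document order
    open_rows = []  # indices into rows for elements that are still open
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            stack.append(elem.tag)
            rows.append(["/" + "/".join(stack), ""])
            open_rows.append(len(rows) - 1)
        else:
            # Element text is only guaranteed to be complete at the end event.
            rows[open_rows.pop()][1] = (elem.text or "").strip()
            stack.pop()
    return [tuple(row) for row in rows]


# Assumed sample input, not taken from the paper.
SAMPLE = (
    "<library><book><title>A</title><author>B</author></book>"
    "<book><title>C</title></book></library>"
)

for path, text in extract_paths(SAMPLE):
    print(path, repr(text))
```

Each pair could then be written as one cell in a path-keyed table such as an HBase table; how the paper actually lays out row keys and columns is not stated in the abstract.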

Acknowledgments

This work is supported in part by NSC in Taiwan, R.O.C. under Grant No. NSC-100-2221-E-025-014. The authors are also grateful to the reviewers for their helpful comments.

Author information

Corresponding author

Correspondence to Shih-Ying Chen.

Copyright information

© 2015 Springer Science+Business Media Dordrecht

About this paper

Cite this paper

Chen, SY., Chen, HM., Zeng, WC. (2015). Efficient XML Data Processing Based on MapReduce Framework. In: Park, J., Pan, Y., Kim, C., Yang, Y. (eds) Future Information Technology - II. Lecture Notes in Electrical Engineering, vol 329. Springer, Dordrecht. https://doi.org/10.1007/978-94-017-9558-6_12

  • DOI: https://doi.org/10.1007/978-94-017-9558-6_12

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-94-017-9557-9

  • Online ISBN: 978-94-017-9558-6

  • eBook Packages: Engineering, Engineering (R0)