Processing Data in R and Python

Lebanon, Guy; El-Geish, Mohamed

doi:10.1007/978-3-319-98149-9_9

Abstract

There is no shortcut to knowledge, and there are no worthwhile data without preprocessing. In the first three sections of this chapter, we discuss situations that necessitate data preprocessing and how to handle them. In the final section, we discuss data manipulation in general and, specifically, how to manipulate data in R using the reshape2 and plyr packages and in Python using the pandas module.
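
As a rough illustration of the kind of manipulation the final section refers to, the sketch below reshapes a small table and then aggregates it by group with pandas. The data frame, its column names, and the choice of a per-group mean are hypothetical and are not taken from the chapter; the comparisons to reshape2 and plyr in the comments are loose analogies, not a claim about how the chapter presents them.

```python
import pandas as pd

# Hypothetical toy data: one row per subject, one column per measurement.
wide = pd.DataFrame({
    "subject": ["a", "b", "c"],
    "group": ["x", "x", "y"],
    "height": [1.0, 2.0, 3.0],
    "weight": [10.0, 20.0, 30.0],
})

# Reshape wide -> long: one row per (subject, variable) pair.
# Loosely comparable to melting a data frame with reshape2 in R.
long_df = wide.melt(
    id_vars=["subject", "group"],
    var_name="variable",
    value_name="value",
)

# Split-apply-combine: split by group and variable, apply the mean,
# and combine the results into one data frame (the pattern plyr supports in R).
means = long_df.groupby(["group", "variable"])["value"].mean().reset_index()

# Reshape long -> wide again: one row per group, one column per variable.
table = means.pivot(index="group", columns="variable", values="value")
```

Keeping the intermediate long-format table makes the grouping step independent of how many measurement columns the original data had, which is one common reason to reshape before aggregating.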

Notes

1. We use the asterisk in a*ply and elsewhere to indicate a collection of functions obtained by substituting the asterisk with other characters.

Author information

Authors and Affiliations

Amazon, Menlo Park, CA, USA
Guy Lebanon
Voicera, Santa Clara, CA, USA
Mohamed El-Geish

About this chapter

Cite this chapter

Lebanon, G., El-Geish, M. (2018). Processing Data in R and Python. In: Computing with Data. Springer, Cham. https://doi.org/10.1007/978-3-319-98149-9_9

DOI: https://doi.org/10.1007/978-3-319-98149-9_9
Published: 29 November 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-98148-2
Online ISBN: 978-3-319-98149-9
eBook Packages: Computer Science; Computer Science (R0)