2. pandas: Structuring Data For Analysis
Objectives
Create pandas data frames.
Sort and access data stored in data frames.
pandas is a Python module that provides functionality for working with data organized in tabular form (like a spreadsheet). The tabular data layout, called a data frame, is the bread and butter of data science, and likewise pandas is central to working with tabular data in Python.
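As a first taste of the "Create pandas data frames" objective, here is a minimal sketch that builds a data frame in two common ways: from a dict of columns and from a list of rows. The item names, prices, and quantities are made up for illustration.

```python
import pandas as pd

# Made-up example data: each dict key becomes a column name and each
# list holds that column's values, one entry per row.
df = pd.DataFrame({
    "item": ["apple", "bread", "milk"],
    "price": [0.5, 2.25, 1.1],
    "quantity": [6, 1, 2],
})

print(df)         # the table, with a default integer index 0, 1, 2
print(df.shape)   # (number of rows, number of columns)
print(df.dtypes)  # each column has its own data type

# The same kind of table can also be built row by row,
# supplying the column names separately.
rows = [("apple", 0.5, 6), ("bread", 2.25, 1)]
df_rows = pd.DataFrame(rows, columns=["item", "price", "quantity"])
print(df_rows)
```

Building from a dict is convenient when you already have whole columns; building from rows fits data that arrives one record at a time.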
If you are eager to see what pandas can do, have a look at the Pandas Cookbook for a crash course. For a gentler but deeper experience, skip to the selections from the official pandas documentation.
Optional Crash Course: Pandas Cookbook
If you are ready to dive in, the Pandas Cookbook provides a broad overview of common tasks in pandas. The cookbook includes data files and a collection of IPython notebooks organized into chapters. Just open each notebook in Jupyter and work through Chapters 1-4 and 6-8 in order. Expect each chapter to take 5 to 20 minutes.
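Since the cookbook works from data files, a typical first step is reading a file into a data frame. Here is a minimal sketch of reading a small semicolon-separated file with `pd.read_csv` and writing it back out with `to_csv`. The CSV content is made up and kept in a string (wrapped in `io.StringIO`) so the example runs without any file on disk; with a real file you would pass its path instead.

```python
import io

import pandas as pd

# A tiny made-up CSV "file" that uses semicolons as separators.
csv_text = """date;station;riders
2012-01-01;A;35
2012-01-02;A;83
2012-01-01;B;22
"""

# sep=";" tells pandas how fields are separated; parse_dates converts
# the "date" column from strings into datetime values.
df = pd.read_csv(io.StringIO(csv_text), sep=";", parse_dates=["date"])
print(df.dtypes)

# Writing goes the other way: with no path given, to_csv returns the
# CSV text as a string. index=False leaves out the row labels.
out = df.to_csv(index=False)
print(out)
```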
The cookbook was written for Python 2 and older versions of pandas, so I found a few things that did not quite work on my system. Try debugging them as practice in interpreting Python error messages!
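Much of that debugging comes down to APIs that changed after the cookbook was written. Two removals that commonly break older pandas code are the `.ix` indexer (removed in pandas 1.0) and `DataFrame.append` (removed in pandas 2.0); Python 2's `print df` statement is also a syntax error in Python 3. These exact calls may or may not be the ones you hit in the cookbook. The sketch below only shows their modern replacements on a small made-up frame.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]}, index=["x", "y", "z"])

# Old code: df.ix["y", "a"] mixed label- and position-based lookup.
# Modern code is explicit: .loc uses labels, .iloc uses integer positions.
by_label = df.loc["y", "a"]
by_position = df.iloc[1, 0]

# Old code: df.append(extra) to add rows.
# Modern code: pass a list of frames to pd.concat.
extra = pd.DataFrame({"a": [4], "b": [7.0]}, index=["w"])
df2 = pd.concat([df, extra])

# Old code (Python 2): print df2
print(df2)
```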
I ran into big problems in two places: Chapters 5 and 9. Chapter 5 relies on a web data source that has either moved or changed its interface, so the code no longer works. You can read through the text in Chapter 5 if it sounds interesting. Chapter 9, an example of connecting to databases, is incomplete. You don't need that skill yet, and there are good resources elsewhere. Skip it.
Deeper Intro: 10 minutes to pandas
The pandas documentation also includes a quick-start called 10 minutes to pandas. It has grown considerably longer than ten minutes (it took me about 4 hours to get through). It is a rather technical gallery of concise examples showing how to do many simple tasks.
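For a flavor of those concise examples, here is a short sketch aimed at this chapter's second objective, sorting and accessing data. The names and scores are made up; the calls used (`sort_values`, `head`, column selection, a boolean mask, and `.loc`) are standard pandas, and the guide covers each in more depth.

```python
import pandas as pd

# Made-up scores for illustration.
df = pd.DataFrame({
    "name": ["Ana", "Ben", "Cy", "Dee"],
    "score": [88, 95, 72, 95],
})

# Sort by score (highest first), breaking ties by name (A to Z).
ranked = df.sort_values(["score", "name"], ascending=[False, True])

top_two = ranked.head(2)         # first two rows of the sorted frame
scores = df["score"]             # a single column, returned as a Series
passed = df[df["score"] >= 80]   # a boolean mask keeps only matching rows
first_name = df.loc[0, "name"]   # one value, looked up by row and column label

print(ranked)
```

Note that sorting keeps each row's original index label, so `ranked` is indexed 1, 3, 0, 2 rather than renumbered.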
Selections From The pandas Documentation
The full pandas documentation gives detailed example walkthroughs. I recommend the following progression:
pandas Data Structures: Intro to data structures
Essentials: Essential basic functionality
Reading And Writing Data: How do I read and write tabular data?
Subsetting: How do I select a subset of a DataFrame?
Deriving Features: How to create new columns derived from existing columns?
Rearranging a DataFrame: How to reshape the layout of tables?
Combining More Than One DataFrame: How to combine data from multiple tables?
Aggregating And Summarizing: How to calculate summary statistics?
Dates And Times: How to handle time series data with ease?
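To see several topics from this progression side by side, here is a compact sketch on a small made-up sales table: subsetting, deriving a column, aggregating, reshaping, combining, and resampling by date. The column names and values are invented for illustration. Each step uses one standard pandas call (`.loc`, `groupby().agg`, `pivot`, `merge`, `resample`), and the linked pages cover many more options for each.

```python
import pandas as pd

# Made-up daily sales records for two stores.
sales = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-01", "2024-01-01", "2024-01-02",
                            "2024-01-02", "2024-01-08", "2024-01-08"]),
    "store": ["A", "B", "A", "B", "A", "B"],
    "units": [3, 5, 2, 7, 4, 1],
    "price": [2.0, 2.0, 2.5, 2.5, 2.0, 2.0],
})

# Subsetting: rows for store A, keeping only two columns.
store_a = sales.loc[sales["store"] == "A", ["date", "units"]]

# Deriving features: a new column computed from existing ones.
sales["revenue"] = sales["units"] * sales["price"]

# Aggregating: total units and mean price per store (named aggregation).
summary = sales.groupby("store").agg(total_units=("units", "sum"),
                                     mean_price=("price", "mean"))

# Reshaping: one row per date, one column per store.
wide = sales.pivot(index="date", columns="store", values="units")

# Combining: attach made-up store metadata by joining on "store".
stores = pd.DataFrame({"store": ["A", "B"], "region": ["North", "South"]})
merged = sales.merge(stores, on="store", how="left")

# Dates and times: weekly revenue totals ("W" bins end on Sundays by default).
weekly = sales.set_index("date")["revenue"].resample("W").sum()

print(summary)
print(wide)
print(weekly)
```

Here 2024-01-01 is a Monday, so the first weekly bin ends on Sunday 2024-01-07 and the second on 2024-01-14.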