2. pandas: Structuring Data For Analysis

Objectives

  • Create pandas data frames.

  • Sort and access data stored in data frames.

pandas is a Python module that provides functionality for data organized in tabular form (like a spreadsheet). The tabular data layout, called a data frame, is the bread and butter of data science, and likewise pandas is central to working with tabular data in Python.
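As a minimal sketch of the two objectives above, the snippet below builds a small data frame, sorts it, and pulls out a column, a row, and a filtered subset. The column names and values are made up for illustration.

```python
import pandas as pd

# Hypothetical example data: column names and values are invented.
df = pd.DataFrame(
    {
        "city": ["Oslo", "Lima", "Cairo"],
        "population_millions": [0.7, 10.1, 9.5],
    }
)

# Sort rows by a column, largest first (returns a new data frame).
by_size = df.sort_values("population_millions", ascending=False)

# Access a single column (a Series) by name and a single row by position.
cities = df["city"]
first_row = df.iloc[0]

# Keep only the rows that satisfy a condition.
large = df[df["population_millions"] > 5]

print(by_size)
print(large)
```

Note that `sort_values` leaves `df` itself unchanged; assign the result to a name if you want to keep the sorted version.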

If you are eager to see what pandas can do, have a look at the Pandas Cookbook for a crash course. For a gentler but deeper experience, skip to the selections from the official pandas documentation.

Optional Crash Course: Pandas Cookbook

If you are ready to dive in, the Pandas Cookbook provides a broad overview of common tasks in pandas. The cookbook includes data files and a collection of IPython notebooks organized into chapters. Just open each notebook in Jupyter and work through Chapters 1-4 and 6-8 in order. Expect each chapter to take 5 to 20 minutes.

The cookbook was written for Python 2 and older versions of pandas, so I found a few things that did not quite work on my system. Try debugging them as practice interpreting Python error messages!
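To give a feel for this kind of debugging, here is a hedged sketch of one failure that code written for old pandas versions can produce. It assumes a recent pandas release, where the long-removed `DataFrame.sort` method raises an `AttributeError` and `sort_values` is the replacement. Whether the cookbook itself hits this exact call is an assumption, and the data is invented.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({"name": ["b", "a"], "score": [2, 1]})

message = None
try:
    # Old-style call: DataFrame.sort no longer exists in recent pandas.
    ordered = df.sort("score")
except AttributeError as err:
    # The error message names the missing attribute, which is the clue
    # to search the pandas documentation for its replacement.
    message = str(err)
    ordered = df.sort_values("score")

print(message)
print(ordered)
```

Reading the last line of the traceback first, then looking up the named method in the current pandas documentation, is usually the quickest route to a fix.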

The two places I encountered big problems were in Chapters 5 and 9. Chapter 5 relies on a web data source that has either moved or changed its interface so that the code no longer works. You can read through the text in Chapter 5 if it sounds interesting. Chapter 9 is an example of connecting to databases, and is incomplete. You don't need that skill yet and there are good resources elsewhere. Skip it.

Deeper Intro: 10 minutes to pandas

The pandas documentation also includes a quick-start called 10 minutes to pandas. It has grown considerably longer than ten minutes (it took me about 4 hours to get through). It is a rather technical gallery of concise examples showing how to do many simple tasks.

Selections From The pandas Documentation

The full pandas documentation gives detailed example walkthroughs. I recommend the following progression.

  1. pandas Data Structures: Intro to data structures

  2. Reading And Writing Data: How do I read and write tabular data?

  3. Rearranging a DataFrame: How to reshape the layout of tables?

  4. Combining more than one DataFrame: How to combine data from multiple tables?

  5. Aggregating And Summarizing: How to calculate summary statistics?
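For orientation before reading, the sketch below touches each of the five topics once. It is not taken from the pandas documentation: the table contents, column names, and the in-memory buffer standing in for a CSV file are all assumptions chosen to keep the example self-contained.

```python
import io

import pandas as pd

# 1. Data structures: a DataFrame is a table whose columns are Series.
temps = pd.DataFrame(
    {
        "station": ["A", "A", "B", "B"],
        "month": ["Jan", "Feb", "Jan", "Feb"],
        "temp_c": [1.0, 3.0, 10.0, 12.0],
    }
)

# 2. Reading and writing: round-trip through CSV text.
#    An in-memory buffer stands in for a file on disk.
buffer = io.StringIO()
temps.to_csv(buffer, index=False)
buffer.seek(0)
reloaded = pd.read_csv(buffer)

# 3. Reshaping: one row per station, one column per month.
wide = reloaded.pivot(index="station", columns="month", values="temp_c")

# 4. Combining: attach station metadata by joining on a shared column.
stations = pd.DataFrame({"station": ["A", "B"], "country": ["Norway", "Peru"]})
merged = reloaded.merge(stations, on="station", how="left")

# 5. Aggregating: mean temperature per country.
mean_by_country = merged.groupby("country")["temp_c"].mean()

print(wide)
print(mean_by_country)
```

Each numbered comment corresponds to the documentation page with the same number in the list above, which covers the topic in far more depth.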
