# 2. pandas: Structuring Data For Analysis

## Objectives

* Create `pandas` data frames.
* Sort and access data stored in data frames.

`pandas` is a Python module that provides functionality for data organized in tabular form (like a spreadsheet). The tabular data layout, called a data frame, is the bread and butter of data science, and likewise `pandas` is central to working with tabular data in Python.
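As a first taste of the two objectives, here is a minimal sketch that builds a data frame, sorts it, and pulls out a column and a row. The column names and values are made up for illustration and do not come from any of the linked materials.

```python
import pandas as pd

# Hypothetical data invented for this example.
df = pd.DataFrame(
    {
        "name": ["apple", "banana", "cherry"],
        "count": [3, 12, 7],
        "price": [0.50, 0.25, 0.10],
    }
)

# Sort the rows by one column, largest value first.
by_count = df.sort_values("count", ascending=False)

# Access one column (returned as a Series) by its label.
prices = df["price"]

# Access one row by its integer position.
first_row = df.iloc[0]

print(by_count)
```

Note that `sort_values` returns a new data frame and leaves `df` unchanged, so the result has to be assigned to a name if you want to keep it.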

If you are eager to see what `pandas` can do, have a look at the Pandas Cookbook for a crash course. For a gentler but deeper experience, skip to the selections from the official `pandas` documentation.

## Optional Crash Course: Pandas Cookbook

If you are ready to dive in, the [Pandas Cookbook](https://github.com/jvns/pandas-cookbook) provides a broad overview of common tasks in `pandas`. The cookbook includes data files and a collection of IPython notebooks organized into chapters. Just open each notebook in Jupyter and work through Chapters 1-4 and 6-8 in order. Expect each chapter to take 5 to 20 minutes.

The cookbook was written for Python 2 and older versions of `pandas`, so I found a few things that did not quite work on my system. Try debugging them as practice interpreting Python error messages!

The two places I encountered big problems were in Chapters 5 and 9. Chapter 5 relies on a web data source that has either moved or changed its interface so that the code no longer works. You can read through the text in Chapter 5 if it sounds interesting. Chapter 9 is an example of connecting to databases, and is incomplete. You don't need that skill yet and there are good resources elsewhere. Skip it.

## Deeper Intro: 10 minutes to pandas

The `pandas` documentation also includes a quick-start called [10 minutes to `pandas`](https://pandas.pydata.org/docs/getting_started/10min.html). It has grown considerably longer than ten minutes (it took me about 4 hours to get through). It is a rather technical gallery of concise examples showing how to do many simple tasks.

## Selections From The `pandas` Documentation

The full `pandas` documentation gives detailed example walkthroughs. I recommend the following progression.

1. `pandas` Data Structures: [Intro to data structures](https://pandas.pydata.org/docs/getting_started/dsintro.html)
2. Essentials: [Essential basic functionality](https://pandas.pydata.org/docs/getting_started/basics.html)
3. Reading And Writing Data: [How do I read and write tabular data?](https://pandas.pydata.org/docs/getting_started/intro_tutorials/02_read_write.html)
4. Subsetting: [How do I select a subset of a `DataFrame`?](https://pandas.pydata.org/docs/getting_started/intro_tutorials/03_subset_data.html)
5. Deriving Features: [How to create new columns derived from existing columns?](https://pandas.pydata.org/docs/getting_started/intro_tutorials/05_add_columns.html)
6. Rearranging a `DataFrame`: [How to reshape the layout of tables?](https://pandas.pydata.org/docs/getting_started/intro_tutorials/07_reshape_table_layout.html)
7. Combining more than one `DataFrame`: [How to combine data from multiple tables?](https://pandas.pydata.org/docs/getting_started/intro_tutorials/08_combine_dataframes.html)
8. Aggregating And Summarizing: [How to calculate summary statistics?](https://pandas.pydata.org/docs/getting_started/intro_tutorials/06_calculate_statistics.html)
9. Dates And Times: [How to handle time series data with ease?](https://pandas.pydata.org/docs/getting_started/intro_tutorials/09_timeseries.html)
10. Text: [How to manipulate textual data?](https://pandas.pydata.org/docs/getting_started/intro_tutorials/10_text_data.html)
11. Plots: [How to create plots in `pandas`?](https://pandas.pydata.org/docs/getting_started/intro_tutorials/04_plotting.html)
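To give a sense of where this progression leads, the sketch below touches four of the topics above: subsetting, deriving a column, aggregating, and combining tables. The `sales` and `managers` tables and all their values are hypothetical, invented only for this example. The linked tutorials cover each step properly.

```python
import pandas as pd

# Hypothetical data invented for this example.
sales = pd.DataFrame(
    {
        "store": ["north", "north", "south", "south"],
        "units": [10, 4, 7, 12],
        "unit_price": [2.5, 2.5, 3.0, 3.0],
    }
)

# Subsetting: keep only the rows where a condition is true.
big_orders = sales[sales["units"] > 5]

# Deriving features: compute a new column from existing columns.
sales["revenue"] = sales["units"] * sales["unit_price"]

# Aggregating: total revenue for each store.
revenue_by_store = sales.groupby("store")["revenue"].sum()

# Combining: attach store-level information using a shared column.
managers = pd.DataFrame(
    {"store": ["north", "south"], "manager": ["Ana", "Ben"]}
)
combined = sales.merge(managers, on="store", how="left")

print(revenue_by_store)
print(combined)
```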

