1. Hello World: Python Setup/How Computers Run Code

Objectives

  • Get Python running.

  • Use the fundamental Python variable types and operations.

  • Load Python modules.

  • Understand how computers go from code to results.

Getting Python Onto Your Computer

Before you can practice your hands-on Python talents, you need a computer with a Python system on it. A Python system consists of [1] the core Python interpreter, [2] modules that give Python the functionality you need, and [3] software for editing Python code.

If that sounds complicated, don't worry because there are some solutions that will package everything together for you. We will circle back to those later, but first let us go over the individual components of the system.

Python core

Python is both a language and a software program. You write code in the Python language. Then the code is interpreted and executed by the Python program.

Python can run code in two ways. First, it can run code saved in a script file, executing it from start to finish automatically. Second, it can provide an interactive shell where you enter code one line at a time and each line runs immediately. Interactive mode is how data scientists do exploration and development, and it is how you should learn the language.

There are two major versions of Python: Python 2 and the newer Python 3. Make sure you get Python 3 for this course. Python 2 reached end of life in January 2020 and is no longer supported, but you will still occasionally encounter Python 2 code that needs some updating before Python 3 can run it. Depending on how Python gets set up on your computer, you may need to change some commands to specify version 3, e.g. using the commands python3 and pip3 instead of python and pip.
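If you are unsure which version a given command starts, you can ask the interpreter itself. The following is a minimal sketch using the standard sys module; the wording of the messages is only illustrative.

```python
import sys

# sys.version_info describes the interpreter that is running this code.
major, minor = sys.version_info.major, sys.version_info.minor

# str.format is used instead of an f-string so that this check can
# also run (and report the problem) under an old Python 2 interpreter.
print("Running Python {}.{}".format(major, minor))

# Stop early with a clear message if this is not Python 3.
if major < 3:
    raise SystemExit("This course needs Python 3; try the python3 command.")
```

Paste this into the shell, or save it in a script file and run it with the same command you plan to use for the course.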

You can install Python from the Python website or from your operating system's package manager.

Python modules

Python itself includes only the fundamental tools for instructing the computer to do things. More complicated functionality, like graphics, networking, and advanced math, is provided by add-on modules. You will need several modules for this course.

  • pandas provides tabular data organization similar to R, most statistics software, and database tables. It is the most important Python tool for data scientists.

  • numpy provides Matlab-like mathematical functionality. Many other math, statistics, and data science modules are built upon it.

  • statsmodels provides standard statistical modeling tools.

  • matplotlib and seaborn provide functions to visualize data.

Modules are installed using Python's companion program pip. You can use the pip command on the terminal like so:

python -m pip install <modulename>

For example, to install pandas:

python -m pip install pandas
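Once modules are installed, Python code loads them with the import statement. As a rough sketch, the snippet below checks which of the course modules are available without assuming that any of them is installed, then imports numpy only if it was found. The variable names are illustrative.

```python
import importlib.util

# Module names taken from the list of course modules.
course_modules = ["pandas", "numpy", "statsmodels", "matplotlib", "seaborn"]

# find_spec returns None when a top-level module cannot be found,
# so this checks availability without importing anything yet.
installed = {
    name: importlib.util.find_spec(name) is not None
    for name in course_modules
}

for name, found in installed.items():
    print(f"{name:12s} {'installed' if found else 'missing'}")

# Loading a module: "import ... as ..." gives it a short alias.
if installed["numpy"]:
    import numpy as np
    print(np.mean([1, 2, 3]))
```

Any module reported as missing can be installed with the pip command shown above.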

Software workspace

The standard way to communicate Python ideas is via Jupyter notebooks, documents that combine formatted text with chunks of code and output. Jupyter is a project to promote interactive and accessible scientific computing. Its software provides the notebook functionality and a code editing environment that runs right in your web browser. Other tools that use notebooks use Jupyter behind the scenes, so you will need Jupyter installed as part of your Python system.

If you're like me and you dislike things running in browsers, you may want to look at other environments. For learning the language, you can do quite a bit by pasting code directly into the python or ipython shell. But when you are working on projects and need to save and revise code, it is best to use an integrated development environment (IDE) that provides a code editor and shell.

A perennially popular Python-specific IDE is PyCharm, which also has an educational edition. If you are a new or hobbyist programmer, it is worth a look.

One of the best IDEs currently available (and the one I use most frequently with Python) is Microsoft's Visual Studio Code (VSCode). It is modular, highly customizable, and works with practically every language you will ever need. Once you have VSCode installed, just go to the extensions pane and search for python. Install the Microsoft-developed Python extension and you will be able to run Python code and work with Jupyter notebooks.

Complete Python systems

The minimum you need to start working with Python is: [1] Python itself, [2] an assortment of modules, and [3] Jupyter. You can optionally add another IDE like VSCode if you desire it. If you are not up for piecing together your system, there are some options that combine everything together for you.

The Anaconda distribution is a software collection focused on data science. It provides Python, Jupyter, popular modules for data science, and additional package management. For most people reading this, Anaconda is exactly what you need.

If you want to avoid installing anything on your own computer, check out Google Colaboratory (Colab). This is basically Google Docs for Python notebooks, and as a bonus you can offload the computation from your computer to Google's servers! It includes an extensive library of modules and you can install additional modules via pip. The basic functionality is free; you can pay for heavier-duty computing resources if you need them. The downside is that data and user-installed modules are not saved from session to session, but you can mount your Google Drive and save data there.

You've read about the options for procuring a Python system. You can now pick one and get started!

Getting Started Coding

DataCamp has a fantastic free course to get you going with using the interactive Python shell. This will introduce you to variables, arithmetic, and the core datatypes.

If DataCamp isn't your style, check out the free textbook A Byte of Python. An afternoon with this book will give you a solid start in coding. Start with the Basics chapter, then go through Operators and Expressions, Control Flow, Functions, Modules, and Data Structures.
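To give a flavor of what those resources cover, here is a small sketch touching variables, arithmetic, the core datatypes, control flow, and functions. The names and values are made up for illustration; try entering the lines one at a time in the interactive shell.

```python
# Variables: a name bound to a value; the type comes from the value.
count = 7            # int
price = 2.5          # float
label = "widgets"    # str
in_stock = True      # bool

# Arithmetic operators
total = count * price        # 17.5
halves = count / 2           # true division gives a float: 3.5
whole_halves = count // 2    # floor division gives an int: 3
remainder = count % 2        # 1
squared = count ** 2         # 49

# Core containers
sizes = [3, 1, 2]                     # list: ordered, mutable
point = (4.0, 5.0)                    # tuple: ordered, immutable
stock = {"widgets": 7, "gears": 0}    # dict: key -> value lookup

# A function with simple control flow
def describe(name, quantity):
    """Return a short sentence about how many items are in stock."""
    if quantity > 0:
        return f"{quantity} {name} in stock"
    return f"no {name} in stock"

for name, quantity in stock.items():
    print(describe(name, quantity))
```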

How Computers Execute Code

In pursuing scientific computing, I have heard many conflicting ideas from people with diverse backgrounds who don't know what they don't know. A scientist's only "coding" training may be being handed an example script by a professor, TA, or PI and figuring out by trial and error how to make the computer do something resembling what they want it to do. On the other hand, computer scientists often take everything to the most formal, technical, and automated level without appreciating scientific iteration or the theoretical and mathematical considerations behind the methods they are implementing.

To sort everything out, it helps to understand the relationship between programmer, language, and computer. Traditionally, software developers write code and then compile it into a program in the computer's native binary language. The program runs start-to-finish; its code is fixed and it accepts only the inputs that it was programmed to pay attention to. This is how computer scientists are trained because it is the most natural way for computers to work. However, many newer languages, Python included, are translated (interpreted) on the fly instead of compiled before execution. Interpreted languages can be run interactively, with the user (not "programmer") examining the result of one operation, then considering several possible alternative next steps without committing anything to machine code. This interactive style is how modern data scientists explore data and build models.
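The interactive style can be imitated in a few lines. This is only a toy sketch, not how the real shell is implemented: each string stands in for one line typed at the prompt, the built-in exec function runs it immediately, and a dictionary (here called session, a made-up name) plays the role of the shell's memory.

```python
# Each string stands in for one line typed at the interactive prompt.
typed_lines = [
    "values = [4, 8, 15, 16, 23, 42]",
    "total = sum(values)",
    "mean = total / len(values)",
]

session = {}  # plays the role of the shell's memory between lines

for line in typed_lines:
    exec(line, session)  # interpret and run this one line right away
    # Show the variables defined so far, hiding Python's internal entries.
    visible = {k: v for k, v in session.items() if not k.startswith("__")}
    print(f">>> {line}")
    print(f"    state: {visible}")
```

Because the results of each line stay available, the user can inspect them and decide what to type next, which a compiled start-to-finish program does not allow.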

Whatever workflow humans use to complete their tasks, the interaction between human and computer is similar. The following steps are from a C++ programming textbook (Gaddis 2005, Figure 1-8).

  1. Clearly define what the program is to do.

  2. Visualize the program running on the computer.

  3. Use design tools such as a hierarchy chart, flowcharts, or pseudocode to create a model of the program.

  4. Check the model for logical errors.

  5. Type the code, save it, and compile it.

  6. Correct any errors found during compilation. Repeat steps 5 and 6 as many times as necessary.

  7. Run the program with test data for input.

  8. Correct any errors found while running the program. Repeat steps 5 through 8 as many times as necessary.

  9. Validate the results of the program.

Some slight modifications are needed for interpreted languages like Python. The "compile it" part of step 5 does not apply to most things you do in Python, and you will typically combine steps 5 and 7 into "Type the code, save it, and run it with test data for input." Steps 6 and 8 also occur at the same time, but are conceptually different. Step 6 is about making sure your code makes sense to the computer; step 8 is about making sure the computer interprets your code the way you intended.
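The difference between steps 6 and 8 can be sketched with two kinds of mistakes. In this illustrative example (all function names are made up), the first snippet is rejected before it runs because of a missing colon, while the second runs without complaint but returns the wrong answer, which only test data reveals.

```python
# Step 6: code the computer cannot make sense of (missing colon after the
# parameter list). The built-in compile function checks the syntax only.
broken_source = "def average(values)\n    return sum(values) / len(values)\n"

try:
    compile(broken_source, "<example>", "exec")
    syntax_ok = True
except SyntaxError as err:
    syntax_ok = False
    print(f"Step 6 problem: {err.msg} (line {err.lineno})")

# Step 8: code that runs, but does not do what was intended.
def average_wrong(values):
    return sum(values) / 2  # bug: divides by 2 instead of len(values)

def average(values):
    return sum(values) / len(values)

# Test data with a known correct answer.
test_data = [2, 4, 6]
expected = 4

print("wrong version passes test:", average_wrong(test_data) == expected)
print("fixed version passes test:", average(test_data) == expected)
```

The exact wording of the syntax error message depends on the Python version.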

Python occupies a strange position in that it is used in both workflows. It can combine access to low-level functionality of the computer with high-level routines for advanced math. Data scientists using their workflow can create solutions that are implemented verbatim inside software developed in the computer science paradigm.

Read The Documentation!

This final heading speaks for itself. Learning to read language references is the most useful skill I picked up in my programming courses. Get familiar with the official documentation of the language and modules to save yourself from hours of hopeless Googling.

In Python, you can read about most functions by typing help(<function name>). This opens a brief help page in the Python terminal; press q to close it. The documentation website has lots of detail and is worth getting to know, in particular the language reference.
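The text that help displays comes largely from docstrings, which you can also read programmatically. Here is a small sketch using the standard inspect module; the temperature function is a made-up example showing that your own functions can be documented the same way.

```python
import inspect

# In an interactive session, help(len) shows this docstring along with
# the function's signature. inspect.getdoc returns the docstring as a string.
doc = inspect.getdoc(len)
print(doc)

# Your own functions get the same treatment if you write a docstring.
def fahrenheit_to_celsius(temp_f):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32) * 5 / 9

print(inspect.getdoc(fahrenheit_to_celsius))
```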

References

Gaddis, T. 2005. Standard Version of Starting Out with C++, 4th Ed., 2005 Update. Addison-Wesley Longman, Incorporated.
