Different tutorials suggest or illustrate specific setups. Either choose a setup and then pick a matching tutorial, or choose a tutorial and follow its setup instructions. Many tutorials include step-by-step instructions.
You can download and install just Python (from Python.org) or get a different distribution that bundles additional packages and interfaces for Steps 2 and 3 by default (you can also download and install these yourself). The most popular distributions include:
Python comes with a base set of functions, but most functionality is provided through packages, each containing many functions. Each package focuses on a different kind of task. Packages are often also called libraries; strictly speaking, a library is a collection of packages.
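To illustrate the difference between base functions and functions that come from a package, here is a minimal sketch. It uses the statistics module, which ships with Python, so nothing has to be installed; the list named values is made up for the example.

```python
# Built-in functions such as len() and sum() are available without any import.
values = [2, 4, 6, 8]  # illustrative data, not from the guide
print(len(values), sum(values))

# Other functionality lives in modules and packages that must be imported first.
# "statistics" ships with Python itself, so no installation is needed.
import statistics

print(statistics.mean(values))    # mean() comes from the statistics module
print(statistics.median(values))  # so does median()
```

Third-party packages such as numpy or pandas are used the same way, but they have to be installed before the import will work.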
The Anaconda distribution (above) comes with the most commonly used data science packages already installed, so if you chose it you can skip this step.
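If you are not sure whether your distribution already includes a package, you can ask Python directly. The sketch below is one possible check, not an official procedure; the helper name is_installed and the PACKAGES mapping are made up for this example. Note that the name you install and the name you import can differ: scikit-learn is imported as sklearn.

```python
import importlib.util

# Install name -> import name. These differ for some packages,
# for example "scikit-learn" is imported as "sklearn".
PACKAGES = {
    "numpy": "numpy",
    "pandas": "pandas",
    "scikit-learn": "sklearn",
    "scipy": "scipy",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
}


def is_installed(import_name):
    """Return True if Python can locate the top-level package (hypothetical helper)."""
    return importlib.util.find_spec(import_name) is not None


for install_name, import_name in PACKAGES.items():
    status = "installed" if is_installed(import_name) else "missing"
    print(f"{install_name}: {status}")
```

Anything reported as missing can be installed with one of the package managers described in this guide.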
See the Visualization and Data Collection tabs for lists of relevant packages.
There are two package managers: pip and conda (the package manager that comes with Anaconda). If you installed the Anaconda or Miniconda distribution, follow the instructions on the conda tab. Otherwise, or if conda doesn't work, follow the instructions on the pip tab, since pip comes with every Python installation.
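If you do not remember which distribution you installed, a rough check like the following may help. It is only a heuristic sketch: it assumes that a conda environment contains a conda-meta folder and that the conda command is on your PATH. The function name suggested_package_manager is made up for this example.

```python
import os
import shutil
import sys


def suggested_package_manager():
    """Guess which package manager fits the running Python (heuristic only)."""
    # conda environments contain a "conda-meta" directory under their prefix.
    in_conda_env = os.path.isdir(os.path.join(sys.prefix, "conda-meta"))
    # shutil.which() returns None when no "conda" executable is found on PATH.
    conda_on_path = shutil.which("conda") is not None
    return "conda" if in_conda_env and conda_on_path else "pip"


print(suggested_package_manager())
```

If the result is pip, the pip instructions apply; if it is conda, start with the conda instructions and fall back to pip only if conda fails.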
See the pip and conda tabs above for more information.
First, open your terminal. On Windows, this is Command Prompt by default; on OS X and Linux, it is Terminal. Assuming Python was added to your system's PATH variable when you installed it, you can install packages with pip using the install command:
pip install numpy
pip install pandas
pip install scikit-learn
pip install scipy
pip install matplotlib
pip install seaborn
It is sometimes recommended to add the --upgrade option to pip install. This (1) ensures you are installing the latest version of the package and (2) updates an already installed package to the newest version:
pip install --upgrade numpy
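Before upgrading, it can be useful to see which version you currently have. Here is a minimal sketch using importlib.metadata from the standard library (Python 3.8 or later); the helper name installed_version is made up for this example.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Use the install name here (e.g. "scikit-learn", not "sklearn").
print(installed_version("numpy"))
```

A result of None means the package is not installed in the Python environment that ran the script.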
When you have several packages to install at once, such as the six data science packages in our example, installing them one at a time is inefficient. To speed this up, list all of the packages in a single install call:
pip install --upgrade numpy pandas scikit-learn scipy matplotlib seaborn
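The same bulk install can also be launched from Python, which is handy when the pip command is not on your PATH. The sketch below builds the command with sys.executable -m pip so that pip runs for the same Python that runs the script. The function name pip_install_command is made up for this example, and the line that actually runs the install is commented out so that nothing is installed by accident.

```python
import subprocess
import sys


def pip_install_command(packages, upgrade=False):
    """Build a pip install command for the Python running this script."""
    command = [sys.executable, "-m", "pip", "install"]
    if upgrade:
        command.append("--upgrade")
    command.extend(packages)
    return command


command = pip_install_command(
    ["numpy", "pandas", "scikit-learn", "scipy", "matplotlib", "seaborn"],
    upgrade=True,
)
print(" ".join(command))

# Uncomment the next line to actually run the installation:
# subprocess.run(command, check=True)
```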
For a full list of pip commands, reference the pip documentation.
Installing packages with conda is very similar to installing them with pip, though with some caveats. First, call conda from the command line to start an install:
conda install numpy
However, due to how conda distributes packages, it is often better to set a specific channel when installing with conda:
conda install -c conda-forge numpy
We can tell conda we want to prioritize the conda-forge channel this way:
conda config --add channels conda-forge
conda config --set channel_priority strict
# Now we can simply call:
conda install <name_of_package>
We can also install packages in bulk:
conda install numpy pandas scikit-learn scipy matplotlib seaborn
Reference the conda documentation for more commands and advanced features.
Some Python distributions come with one or more of these, so check to see if it is already installed. These are just the most popular, but there are others.
Copyright © George Mason University