George Mason University InfoGuides

Software: Learn Python for Data

Resources for learning and using Python, an open-source programming language, for data science.

Almost all tutorials on data science, statistical analysis, or machine learning in Python assume that you already know how to use Python.
Some also assume knowledge of the pandas library, so be sure to check before you start.

  • NumPy
  • SciPy
  • pandas
  • statsmodels
  • scikit-learn
"R is a language dedicated to statistics. Python is a general-purpose language with statistics modules. R has more statistical analysis features than Python, and specialized syntaxes. However, when it comes to building complex analysis pipelines that mix statistics with e.g. image analysis, text mining, or control of a physical experiment, the richness of Python is an invaluable asset." - Gaël Varoquaux

Statistics

These are resources for using Python to do basic descriptive and inferential statistics as used by academic researchers and statisticians. For additional information on statistical modeling, see the materials on Machine Learning. 
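As a rough illustration of what "descriptive and inferential statistics" looks like in practice, the sketch below summarizes two small samples with NumPy and then compares them with a Welch's t-test from SciPy. The scores and the variable names are made up for this example and do not come from any of the listed resources.

```python
import numpy as np
from scipy import stats

# Hypothetical data: invented exam scores for two small groups.
group_a = np.array([72.0, 85.0, 90.0, 66.0, 78.0, 88.0])
group_b = np.array([64.0, 70.0, 81.0, 59.0, 73.0, 68.0])

# Descriptive statistics: mean and sample standard deviation (ddof=1).
mean_a, sd_a = group_a.mean(), group_a.std(ddof=1)
mean_b, sd_b = group_b.mean(), group_b.std(ddof=1)
print(f"Group A: mean={mean_a:.2f}, sd={sd_a:.2f}")
print(f"Group B: mean={mean_b:.2f}, sd={sd_b:.2f}")

# Inferential statistics: Welch's t-test, which does not assume
# that the two groups have equal variances.
result = stats.ttest_ind(group_a, group_b, equal_var=False)
t_stat, p_value = result.statistic, result.pvalue
print(f"t={t_stat:.3f}, p={p_value:.3f}")
```

The statsmodels library offers a more model-oriented interface (for example, regression summaries) if you need output closer to what R or SPSS produce.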

Python for Biostatistics

Scikit-Learn

Website: https://scikit-learn.org/

  • Excellent machine learning library containing a huge catalogue of models and algorithms.
  • Supports classification, regression, clustering, data cleaning, and feature engineering.
  • Used by both machine learning beginners and experts.
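To give a feel for the library, here is a minimal sketch of a typical Scikit-Learn workflow: split a dataset, chain a preprocessing step with a classifier, fit, and score. The choice of the bundled iris dataset, the logistic regression model, and the 25% test split are assumptions made for this example, not recommendations from the resources listed here.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Load a small example dataset that ships with scikit-learn.
X, y = load_iris(return_X_y=True)

# Hold out 25% of the rows for testing; stratify keeps class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Chain feature scaling and a classifier into a single estimator.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Mean accuracy on the held-out test set.
accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")
```

Most Scikit-Learn estimators share the same fit, predict, and score methods, so swapping in a different model usually means changing only the line that builds the pipeline.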

Recommended: Scikit-Learn Coding Examples & A Gentle Introduction to Scikit-Learn

MOOC Course Materials: Scikit-learn Course by the developers

Videos: Introduction to Machine Learning with Scikit-Learn (DataSchool) - Free with registration, or watch on YouTube

TensorFlow & Keras

TensorFlow

Website: https://www.tensorflow.org/

  • Open source Python library for deep learning and neural networks from Google.
  • Supports more complex models than Scikit-Learn, but has a complicated base syntax and a steeper learning curve.
  • Supports transfer learning and access to pre-trained models through TensorFlow Hub.

Recommended: TensorFlow Coding Examples

Keras

Website: https://keras.io/

  • Deep learning API that interfaces with TensorFlow.
  • Offers a much more Pythonic syntax that makes programming deep neural networks easier in TensorFlow.
  • Extremely popular: second only to Scikit-Learn, according to Keras.

Recommended: Keras Coding Examples

PyTorch

Website: https://pytorch.org/

  • Open source Python library for deep learning and neural networks, originally developed by Facebook (now Meta).
  • Supports building complex models. There is a learning curve, but the syntax is more Pythonic than base TensorFlow.
  • Supports access to pre-trained models, extensions, and modules via the PyTorch Ecosystem.

Recommended: Practical Deep Learning for Coders Course by fast.ai (Free!)

Books on Machine Learning, including Deep Learning

These books cover TensorFlow and Keras. 

Does not require prior knowledge of machine learning. Offers hands-on experience with machine learning along with the concepts behind the algorithms, how to use them, and how to avoid common pitfalls. Covers classification and regression, data pre-processing, applications of machine learning, and neural networks.

For people who are comfortable with Python and machine learning and need a quick reference for the code to use. Covers loading and wrangling data, preparing different data types, and analyses from linear regression through neural networks.