InfoGuides: Software: Learn Python for Data: Data Analysis & Machine Learning

First

Almost all tutorials on doing Data Science, Statistical Analysis, or Machine Learning in Python assume that you already know how to use Python.
Some also assume knowledge of the pandas library, so be sure to check.

NumPy
SciPy
pandas
statsmodels
scikit-learn

"R is a language dedicated to statistics. Python is a general-purpose language with statistics modules. R has more statistical analysis features than Python, and specialized syntaxes. However, when it comes to building complex analysis pipelines that mix statistics with e.g. image analysis, text mining, or control of a physical experiment, the richness of Python is an invaluable asset." - Gaël Varoquaux

Statistics

These are resources for using Python to do basic descriptive and inferential statistics as used by academic researchers and statisticians. For additional information on statistical modeling, see the materials on Machine Learning.
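As a small illustration of what "descriptive and inferential statistics" can look like in Python, here is a minimal sketch using only the standard library. The sample values and the hypothesized mean `mu0` are made up for the example; in practice most of the resources listed here would use NumPy, pandas, SciPy, or statsmodels instead (for example, `scipy.stats.ttest_1samp` also reports a p-value).

```python
import math
import statistics

# Hypothetical sample: eight made-up measurements (illustration only).
sample = [312, 298, 305, 321, 290, 310, 300, 315]

# Descriptive statistics: summarize the sample itself.
mean = statistics.mean(sample)
median = statistics.median(sample)
sd = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)

# Inferential statistics: one-sample t statistic against an assumed
# population mean. mu0 is an assumption chosen for this example.
mu0 = 300
n = len(sample)
t_stat = (mean - mu0) / (sd / math.sqrt(n))

print(f"mean={mean}, median={median}, sd={sd:.2f}, t={t_stat:.2f}")
```

The t statistic alone does not give a p-value; turning it into one requires the t distribution with `n - 1` degrees of freedom, which is where a library such as SciPy becomes convenient.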

Python for Biostatistics

BioPython
Biopython is a set of freely available tools for biological computation written in Python. For additional details, see the Biopython Tutorial and Cookbook.

Bioinformatics with Python Cookbook by Tiago Antao
Call Number: Available ONLINE through Mason

ISBN: 9781782175117

Publication Date: 2015-06-25

If you have intermediate-level knowledge of Python and are well aware of the main research and vocabulary in your bioinformatics topic of interest, this book will help you develop your knowledge further.

Illustrating Python via Examples from Bioinformatics
This page is hosted by the Center for Biomedical Computing at the Simula Research Laboratory, which is a Norwegian non-profit research organization.

Popular Machine Learning Libraries

Scikit-Learn

https://scikit-learn.org/

Excellent machine learning library containing a huge catalogue of models and algorithms.
Supports classification, regression, clustering, data cleaning, and feature engineering.
Used by both machine learning beginners and experts.
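To give a feel for the workflow these points describe, here is a minimal classification sketch. It assumes scikit-learn is installed and uses the small iris dataset that ships with the library; the choice of model (logistic regression), the 25% test split, and `random_state=0` are arbitrary choices for illustration, not recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Load a small built-in dataset: 150 flowers, 4 numeric features, 3 classes.
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the rows for testing; stratify keeps class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Chain a preprocessing step (feature scaling) with a classifier.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# score() reports mean accuracy on the held-out data.
accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")
```

Most scikit-learn estimators follow the same `fit` / `predict` / `score` pattern, so swapping in a different model usually means changing only the line that builds the pipeline.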

MOOC Course Materials: Scikit-learn Course by the developers

Videos: Introduction to Machine Learning with SciKit-learn (DataSchool) - Free registration or watch on YouTube

From Scikit-Learn:

Machine Learning with Scikit-Learn Quick Start Guide by Kevin Jolly
Deploy supervised and unsupervised machine learning algorithms using scikit-learn to perform classification, regression, and clustering.

Key Features
Build your first machine learning model using scikit-learn
Train supervised and unsupervised models using popular techniques such as classification, regression and clustering
Understand how scikit-learn can be applied to different types of machine learning problems

Book Description
Scikit-learn is a robust machine learning library for the Python programming language. It provides a set of supervised and unsupervised learning algorithms. This book is the easiest way to learn how to deploy, optimize, and evaluate all of the important machine learning algorithms that scikit-learn provides. This book teaches you how to use scikit-learn for machine learning. You will start by setting up and configuring your machine learning environment with scikit-learn. To put scikit-learn to use, you will learn how to implement various supervised and unsupervised machine learning models. You will learn classification, regression, and clustering techniques to work with different types of datasets and train your models. Finally, you will learn about an effective pipeline to help you build a machine learning project from scratch. By the end of this book, you will be confident in building your own machine learning models for accurate predictions.

What you will learn
Learn how to work with all scikit-learn's machine learning algorithms
Install and set up scikit-learn to build your first machine learning model
Employ Unsupervised Machine Learning Algorithms to cluster unlabelled data into groups
Perform classification and regression machine learning
Use an effective pipeline to build a machine learning project from scratch

Who this book is for
This book is for aspiring machine learning developers who want to get started with scikit-learn. Intermediate knowledge of Python programming and some fundamental knowledge of linear algebra and probability will help.
Call Number: Available ONLINE through Mason Libraries

ISBN: 1789343704

Publication Date: 2018-10-30
Hands-On Machine Learning with Scikit-learn and Scientific Python Toolkits by Tarek Amr
Integrate scikit-learn with various tools such as NumPy, pandas, imbalanced-learn, and scikit-surprise and use it to solve real-world machine learning problems.

Key Features
Delve into machine learning with this comprehensive guide to scikit-learn and scientific Python
Master the art of data-driven problem-solving with hands-on examples
Foster your theoretical and practical knowledge of supervised and unsupervised machine learning algorithms

Book Description
Machine learning is applied everywhere, from business to research and academia, while scikit-learn is a versatile library that is popular among machine learning practitioners. This book serves as a practical guide for anyone looking to provide hands-on machine learning solutions with scikit-learn and Python toolkits. The book begins with an explanation of machine learning concepts and fundamentals, and strikes a balance between theoretical concepts and their applications. Each chapter covers a different set of algorithms, and shows you how to use them to solve real-life problems. You'll also learn about various key supervised and unsupervised machine learning algorithms using practical examples. Whether it is an instance-based learning algorithm, Bayesian estimation, a deep neural network, a tree-based ensemble, or a recommendation system, you'll gain a thorough understanding of its theory and learn when to apply it. As you advance, you'll learn how to deal with unlabeled data and when to use different clustering and anomaly detection algorithms. By the end of this machine learning book, you'll have learned how to take a data-driven approach to provide end-to-end machine learning solutions. You'll also have discovered how to formulate the problem at hand, prepare required data, and evaluate and deploy models in production.

What you will learn
Understand when to use supervised, unsupervised, or reinforcement learning algorithms
Find out how to collect and prepare your data for machine learning tasks
Tackle imbalanced data and optimize your algorithm for a bias or variance tradeoff
Apply supervised and unsupervised algorithms to overcome various machine learning challenges
Employ best practices for tuning your algorithm's hyper parameters
Discover how to use neural networks for classification and regression
Build, evaluate, and deploy your machine learning solutions to production

Who this book is for
This book is for data scientists, machine learning practitioners, and anyone who wants to learn how machine learning algorithms work and to build different machine learning models using the Python ecosystem. The book will help you take your knowledge of machine learning to the next level by grasping its ins and outs and tailoring it to your needs. Working knowledge of Python and a basic understanding of underlying mathematical and statistical concepts is required.
Call Number: Available ONLINE through Mason Libraries

ISBN: 1838826041

Publication Date: 2020-07-24

TensorFlow

Website: https://www.tensorflow.org/

Open source Python library for deep learning and neural networks from Google.
More complex models, a complicated base syntax, and a steeper learning curve.
Supports transfer learning and access to pre-trained models through TensorFlow Hub.

Recommended: TensorFlow Coding Examples

Keras

Website: https://keras.io/

Deep learning API that interfaces with TensorFlow.
Offers a much more Pythonic syntax that makes programming deep neural networks easier in TensorFlow.
Extremely popular—second only to Scikit-Learn, according to Keras.

Recommended: Keras Coding Examples

Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow by Aurélien Géron
Through a series of recent breakthroughs, deep learning has boosted the entire field of machine learning. Now, even programmers who know close to nothing about this technology can use simple, efficient tools to implement programs capable of learning from data. This practical book shows you how. By using concrete examples, minimal theory, and two production-ready Python frameworks--Scikit-Learn and TensorFlow--author Aurélien Géron helps you gain an intuitive understanding of the concepts and tools for building intelligent systems. You'll learn a range of techniques, starting with simple linear regression and progressing to deep neural networks. With exercises in each chapter to help you apply what you've learned, all you need is programming experience to get started.

Explore the machine learning landscape, particularly neural nets
Use Scikit-Learn to track an example machine-learning project end-to-end
Explore several training models, including support vector machines, decision trees, random forests, and ensemble methods
Use the TensorFlow library to build and train neural nets
Dive into neural net architectures, including convolutional nets, recurrent nets, and deep reinforcement learning
Learn techniques for training and scaling deep neural nets
Call Number: Available ONLINE through Mason Libraries

ISBN: 1492032646

Publication Date: 2019-10-22
TensorFlow 2 Pocket Reference by K. C. Tung
This easy-to-use reference for TensorFlow 2 design patterns in Python will help you make informed decisions for various use cases. Author KC Tung addresses common topics and tasks in enterprise data science and machine learning practices rather than focusing on TensorFlow itself. When and why would you feed training data using NumPy or a streaming dataset? How would you set up cross-validations in the training process? How do you leverage a pretrained model using transfer learning? How do you perform hyperparameter tuning? Pick up this pocket reference and reduce the time you spend searching through options for your TensorFlow use cases.

Understand best practices in TensorFlow model patterns and ML workflows
Use code snippets as templates in building TensorFlow models and workflows
Save development time by integrating prebuilt models in TensorFlow Hub
Make informed design choices about data ingestion, training paradigms, model saving, and inferencing
Address common scenarios such as model design style, data ingestion workflow, model training, and tuning
Call Number: Available ONLINE through Mason Libraries

ISBN: 1492089184

Publication Date: 2021-08-10
Applied Deep Learning with Python: Use Scikit-Learn, TensorFlow, and Keras to Create Intelligent Systems and Machine Learning Solutions by Alex Galea; Luis Capelo
A hands-on guide to deep learning that's filled with intuitive explanations and engaging practical examples.

Key Features
Designed to iteratively develop the skills of Python users who don't have a data science background
Covers the key foundational concepts you'll need to know when building deep learning systems
Full of step-by-step exercises and activities to help build the skills that you need for the real-world

Book Description
Taking an approach that uses the latest developments in the Python ecosystem, you'll first be guided through the Jupyter ecosystem, key visualization libraries and powerful data sanitization techniques before we train our first predictive model. We'll explore a variety of approaches to classification like support vector networks, random decision forests and k-nearest neighbours to build out your understanding before we move into more complex territory. It's okay if these terms seem overwhelming; we'll show you how to put them to work. We'll build upon our classification coverage by taking a quick look at ethical web scraping and interactive visualizations to help you professionally gather and present your analysis. It's after this that we start building out our keystone deep learning application, one that aims to predict the future price of Bitcoin based on historical public data. By guiding you through a trained neural network, we'll explore common deep learning network architectures (convolutional, recurrent, generative adversarial) and branch out into deep reinforcement learning before we dive into model optimization and evaluation. We'll do all of this whilst working on a production-ready web application that combines Tensorflow and Keras to produce a meaningful user-friendly result, leaving you with all the skills you need to tackle and develop your own real-world deep learning projects confidently and effectively.

What you will learn
Discover how you can assemble and clean your very own datasets
Develop a tailored machine learning classification strategy
Build, train and enhance your own models to solve unique problems
Work with production-ready frameworks like Tensorflow and Keras
Explain how neural networks operate in clear and simple terms
Understand how to deploy your predictions to the web

Who this book is for
If you're a Python programmer stepping into the world of data science, this is the ideal way to get started.
Call Number: Available ONLINE through Mason Libraries

ISBN: 1789804744

Publication Date: 2018-08-31

PyTorch

Website: https://pytorch.org/

Open source Python library for deep learning and neural networks, originally developed by Facebook (now Meta).
Allows for building complex models and does have a bit of a learning curve, but the syntax is more Pythonic than base TensorFlow.
Supports access to pre-trained models, extensions, and modules via the PyTorch Ecosystem.

Recommended: Practical Deep Learning for Coders Course by fast.ai (Free!)

Deep Learning for Coders with Fastai and PyTorch by Jeremy Howard; Sylvain Gugger
Deep learning is often viewed as the exclusive domain of math PhDs and big tech companies. But as this hands-on guide demonstrates, programmers comfortable with Python can achieve impressive results in deep learning with little math background, small amounts of data, and minimal code. How? With fastai, the first library to provide a consistent interface to the most frequently used deep learning applications. Authors Jeremy Howard and Sylvain Gugger, the creators of fastai, show you how to train a model on a wide range of tasks using fastai and PyTorch. You'll also dive progressively further into deep learning theory to gain a complete understanding of the algorithms behind the scenes.

Train models in computer vision, natural language processing, tabular data, and collaborative filtering
Learn the latest deep learning techniques that matter most in practice
Improve accuracy, speed, and reliability by understanding how deep learning models work
Discover how to turn your models into web applications
Implement deep learning algorithms from scratch
Consider the ethical implications of your work
Gain insight from the foreword by PyTorch cofounder, Soumith Chintala
Call Number: Available ONLINE through Mason Libraries

ISBN: 9781492045496

Publication Date: 2020-06-29
PyTorch Pocket Reference by Joe Papa
This concise, easy-to-use reference puts one of the most popular frameworks for deep learning research and development at your fingertips. Author Joe Papa provides instant access to syntax, design patterns, and code examples to accelerate your development and reduce the time you spend searching for answers. Research scientists, machine learning engineers, and software developers will find clear, structured PyTorch code that covers every step of neural network development, from loading data to customizing training loops to model optimization and GPU/TPU acceleration. Quickly learn how to deploy your code to production using AWS, Google Cloud, or Azure and deploy your ML models to mobile and edge devices.

Learn basic PyTorch syntax and design patterns
Create custom models and data transforms
Train and deploy models using a GPU and TPU
Train and test a deep learning classifier
Accelerate training using optimization and distributed training
Access useful PyTorch libraries and the PyTorch ecosystem
Call Number: Available ONLINE through Mason Libraries

ISBN: 149209000X

Publication Date: 2021-06-01

Books

Data Science

These books focus on data management, and sometimes analysis.

Python Data Science Handbook by Jake VanderPlas
Python is a first-class tool for many researchers, primarily because of its libraries for storing, manipulating, and gaining insight from data. Several resources exist for individual pieces of this data science stack, but only with the new edition of Python Data Science Handbook do you get them all--IPython, NumPy, pandas, Matplotlib, scikit-learn, and other related tools. Working scientists and data crunchers familiar with reading and writing Python code will find the second edition of this comprehensive desk reference ideal for tackling day-to-day issues: manipulating, transforming, and cleaning data; visualizing different types of data; and using data to build statistical or machine learning models. Quite simply, this is the must-have reference for scientific computing in Python.

With this handbook, you'll learn how:
IPython and Jupyter provide computational environments for scientists using Python
NumPy includes the ndarray for efficient storage and manipulation of dense data arrays
Pandas contains the DataFrame for efficient storage and manipulation of labeled/columnar data
Matplotlib includes capabilities for a flexible range of data visualizations
Scikit-learn helps you build efficient and clean Python implementations of the most important and established machine learning algorithms
Call Number: Available for FREE Online through Mason or from the author

ISBN: 9781098121228

Publication Date: 2023-01-31

Has sections on NumPy, pandas, and Matplotlib. Covers data management and exploration (not statistical modeling or testing). The Jupyter notebooks that make up the entire book are available at the author's GitHub repository.

Think Stats by Allen B. Downey
If you know how to program, you have the skills to turn data into knowledge, using tools of probability and statistics. This concise introduction shows you how to perform statistical analysis computationally, rather than mathematically, with programs written in Python. By working with a single case study throughout this thoroughly revised book, you'll learn the entire process of exploratory data analysis--from collecting data and generating statistics to identifying patterns and testing hypotheses. You'll explore distributions, rules of probability, visualization, and many other tools and concepts. New chapters on regression, time series analysis, survival analysis, and analytic methods will enrich your discoveries.

Develop an understanding of probability and statistics by writing and testing code
Run experiments to test statistical behavior, such as generating samples from several distributions
Use simulations to understand concepts that are hard to grasp mathematically
Import data from most sources with Python, rather than rely on data that's cleaned and formatted for statistics tools
Use statistical inference to answer questions about real-world data
Call Number: Available for FREE through Mason or the author

ISBN: 1491907339

Publication Date: 2014-11-11

Introduction to exploratory data analysis, as tends to be done in the social and health sciences. Covers NumPy, pandas, SciPy, Matplotlib, and some statsmodels for regression and time series.

Practical Statistics for Data Scientists by Peter Bruce; Andrew Bruce; Peter Gedeck
Statistical methods are a key part of data science, yet few data scientists have formal statistical training. Courses and books on basic statistics rarely cover the topic from a data science perspective. The second edition of this popular guide adds comprehensive examples in Python, provides practical guidance on applying statistical methods to data science, tells you how to avoid their misuse, and gives you advice on what's important and what's not. Many data science resources incorporate statistical methods but lack a deeper statistical perspective. If you're familiar with the R or Python programming languages and have some exposure to statistics, this quick reference bridges the gap in an accessible, readable format.

With this book, you'll learn:
Why exploratory data analysis is a key preliminary step in data science
How random sampling can reduce bias and yield a higher-quality dataset, even with big data
How the principles of experimental design yield definitive answers to questions
How to use regression to estimate outcomes and detect anomalies
Key classification techniques for predicting which categories a record belongs to
Statistical machine learning methods that "learn" from data
Unsupervised learning methods for extracting meaning from unlabeled data
Call Number: Available for free ONLINE through Mason

ISBN: 9781492072942

Publication Date: 2020-06-02

Code in both R and Python, available on GitHub. Assumes familiarity with Python and statistics. Covers exploratory data analysis, sampling distributions, significance testing, regression, classification, and both supervised and unsupervised learning. Uses scikit-learn, supplemented by statsmodels.

Machine Learning, including Deep Learning

These books cover TensorFlow and Keras.

Python Machine Learning by Sebastian Raschka; Vahid Mirjalili
Applied machine learning with a solid foundation in theory. Revised and expanded for TensorFlow 2, GANs, and reinforcement learning.

Key Features
Third edition of the bestselling, widely acclaimed Python machine learning book
Clear and intuitive explanations take you deep into the theory and practice of Python machine learning
Fully updated and expanded to cover TensorFlow 2, Generative Adversarial Network models, reinforcement learning, and best practices

Book Description
Python Machine Learning, Third Edition is a comprehensive guide to machine learning and deep learning with Python. It acts as both a step-by-step tutorial, and a reference you'll keep coming back to as you build your machine learning systems. Packed with clear explanations, visualizations, and working examples, the book covers all the essential machine learning techniques in depth. While some books teach you only to follow instructions, with this machine learning book, Raschka and Mirjalili teach the principles behind machine learning, allowing you to build models and applications for yourself. Updated for TensorFlow 2.0, this new third edition introduces readers to its new Keras API features, as well as the latest additions to scikit-learn. It's also expanded to cover cutting-edge reinforcement learning techniques based on deep learning, as well as an introduction to GANs. Finally, this book also explores a subfield of natural language processing (NLP) called sentiment analysis, helping you learn how to use machine learning algorithms to classify documents. This book is your companion to machine learning with Python, whether you're a Python developer new to machine learning or want to deepen your knowledge of the latest developments.

What you will learn
Master the frameworks, models, and techniques that enable machines to 'learn' from data
Use scikit-learn for machine learning and TensorFlow for deep learning
Apply machine learning to image classification, sentiment analysis, intelligent web applications, and more
Build and train neural networks, GANs, and other models
Discover best practices for evaluating and tuning models
Predict continuous target outcomes using regression analysis
Dig deeper into textual and social media data using sentiment analysis

Who This Book Is For
If you know some Python and you want to use machine learning and deep learning, pick up this book. Whether you want to start from scratch or extend your machine learning knowledge, this is an essential resource. Written for developers and data scientists who want to create practical machine learning and deep learning code, this book is ideal for anyone who wants to teach computers how to learn from data.
Call Number: Available ONLINE through Mason Libraries

ISBN: 9781789955750

Publication Date: 2019-12-12

Does not require prior knowledge of machine learning. Offers both hands-on experience with machine learning as well as the concepts behind the algorithms, how to use them, and how to avoid common pitfalls. Covers classification and regression, data pre-processing, applications of machine learning, and neural networks.

Machine Learning with Python Cookbook by Chris Albon
This practical guide provides nearly 200 self-contained recipes to help you solve machine learning challenges you may encounter in your daily work. If you're comfortable with Python and its libraries, including pandas and scikit-learn, you'll be able to address specific problems such as loading data, handling text or numerical data, model selection, and dimensionality reduction and many other topics. Each recipe includes code that you can copy and paste into a toy dataset to ensure that it actually works. From there, you can insert, combine, or adapt the code to help construct your application. Recipes also include a discussion that explains the solution and provides meaningful context. This cookbook takes you beyond theory and concepts by providing the nuts and bolts you need to construct working machine learning applications.

You'll find recipes for:
Vectors, matrices, and arrays
Handling numerical and categorical data, text, images, and dates and times
Dimensionality reduction using feature extraction or feature selection
Model evaluation and selection
Linear and logistic regression, trees and forests, and k-nearest neighbors
Support vector machines (SVM), naïve Bayes, clustering, and neural networks
Saving and loading trained models
Call Number: Available ONLINE through Mason Libraries

ISBN: 9781491989388

Publication Date: 2018-04-17

For people who are comfortable with Python and Machine learning, and need a quick reference for the code to use. Covers loading and wrangling data, preparing different data types, and analyses from linear regression through neural networks.
