Data Science
These books focus on data management and, in some cases, analysis.
-
Python Data Science Handbook
by
Jake VanderPlas
Python is a first-class tool for many researchers, primarily because of its libraries for storing, manipulating, and gaining insight from data. Several resources exist for individual pieces of this data science stack, but only with the new edition of Python Data Science Handbook do you get them all: IPython, NumPy, pandas, Matplotlib, scikit-learn, and other related tools. Working scientists and data crunchers familiar with reading and writing Python code will find the second edition of this comprehensive desk reference ideal for tackling day-to-day issues: manipulating, transforming, and cleaning data; visualizing different types of data; and using data to build statistical or machine learning models. Quite simply, this is the must-have reference for scientific computing in Python. With this handbook, you'll learn how: IPython and Jupyter provide computational environments for scientists using Python; NumPy includes the ndarray for efficient storage and manipulation of dense data arrays; pandas contains the DataFrame for efficient storage and manipulation of labeled/columnar data; Matplotlib includes capabilities for a flexible range of data visualizations; and scikit-learn helps you build efficient and clean Python implementations of the most important and established machine learning algorithms.
ISBN: 9781098121228
Publication Date: 2023-01-31
Has sections on NumPy, pandas, and Matplotlib. Covers data management and exploration (not statistical modeling or testing). The Jupyter notebooks that make up the entire book are available at the author's GitHub repository.
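For a sense of the two core data structures this handbook covers, here is a minimal sketch of a NumPy ndarray and a pandas DataFrame. It is not taken from the book; the temperature readings, day labels, and column names are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical readings, made up for this sketch (not data from the book).
temps_c = np.array([18.5, 21.0, 19.5, 23.0])

# ndarray arithmetic is vectorized: it applies to every element at once.
temps_f = temps_c * 9 / 5 + 32

# A DataFrame attaches labels (column names and an index) to the same values.
df = pd.DataFrame(
    {"celsius": temps_c, "fahrenheit": temps_f},
    index=["Mon", "Tue", "Wed", "Thu"],
)

# Boolean filtering: keep only the days warmer than 20 degrees Celsius.
warm_days = df[df["celsius"] > 20]
print(warm_days)
```

The ndarray holds unlabeled numeric data, and the DataFrame wraps the same values with row and column labels, which is the distinction the book description draws.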
-
Think Stats
by
Allen B. Downey
If you know how to program, you have the skills to turn data into knowledge, using tools of probability and statistics. This concise introduction shows you how to perform statistical analysis computationally, rather than mathematically, with programs written in Python. By working with a single case study throughout this thoroughly revised book, you'll learn the entire process of exploratory data analysis, from collecting data and generating statistics to identifying patterns and testing hypotheses. You'll explore distributions, rules of probability, visualization, and many other tools and concepts. New chapters on regression, time series analysis, survival analysis, and analytic methods will enrich your discoveries. Develop an understanding of probability and statistics by writing and testing code. Run experiments to test statistical behavior, such as generating samples from several distributions. Use simulations to understand concepts that are hard to grasp mathematically. Import data from most sources with Python, rather than rely on data that's cleaned and formatted for statistics tools. Use statistical inference to answer questions about real-world data.
ISBN: 1491907339
Publication Date: 2014-11-11
Introduction to exploratory data analysis, as it tends to be done in the social and health sciences. Covers NumPy, pandas, SciPy, Matplotlib, and some statsmodels for regression and time series.
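The computational approach this book takes can be illustrated with a short simulation. The sketch below is not from the book. It uses only Python's standard library, and the distribution, sample sizes, and seed are arbitrary choices made for illustration. It draws repeated samples from an exponential distribution and compares the spread of the sample means with the theoretical standard error.

```python
import random
import statistics

random.seed(42)  # arbitrary seed so the run is repeatable

TRUE_MEAN = 2.0      # mean of the exponential distribution (assumed for this sketch)
SAMPLE_SIZE = 50     # observations per sample
NUM_SAMPLES = 2000   # number of repeated samples


def sample_mean(n):
    """Draw n values from an exponential distribution and return their mean."""
    # random.expovariate takes the rate (1 / mean), not the mean itself.
    return statistics.fmean(random.expovariate(1 / TRUE_MEAN) for _ in range(n))


means = [sample_mean(SAMPLE_SIZE) for _ in range(NUM_SAMPLES)]

# The average of the sample means should be close to TRUE_MEAN, and their
# spread should be close to the standard error, sigma / sqrt(n). For an
# exponential distribution, sigma equals the mean.
print("mean of sample means: ", round(statistics.fmean(means), 3))
print("std of sample means:  ", round(statistics.stdev(means), 3))
print("theoretical std error:", round(TRUE_MEAN / SAMPLE_SIZE ** 0.5, 3))
```

Running an experiment like this and comparing the simulated spread with the formula is one way to check a statistical result without working through the derivation.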
-
Practical Statistics for Data Scientists
by
Peter Bruce; Andrew Bruce; Peter Gedeck
Statistical methods are a key part of data science, yet few data scientists have formal statistical training. Courses and books on basic statistics rarely cover the topic from a data science perspective. The second edition of this popular guide adds comprehensive examples in Python, provides practical guidance on applying statistical methods to data science, tells you how to avoid their misuse, and gives you advice on what's important and what's not. Many data science resources incorporate statistical methods but lack a deeper statistical perspective. If you're familiar with the R or Python programming languages and have some exposure to statistics, this quick reference bridges the gap in an accessible, readable format. With this book, you'll learn: why exploratory data analysis is a key preliminary step in data science; how random sampling can reduce bias and yield a higher-quality dataset, even with big data; how the principles of experimental design yield definitive answers to questions; how to use regression to estimate outcomes and detect anomalies; key classification techniques for predicting which categories a record belongs to; statistical machine learning methods that "learn" from data; and unsupervised learning methods for extracting meaning from unlabeled data.
Call Number: Available for free online through
Mason
ISBN: 9781492072942
Publication Date: 2020-06-02
Code in both R and Python, available on GitHub. Assumes familiarity with Python and statistics. Covers exploratory data analysis, sampling distributions, significance testing, regression, classification, and both supervised and unsupervised learning. Uses scikit-learn, supplemented by statsmodels.