A blog post with animated graphics that walks through the multi-step process of removing extra formatting and fixing values.
Managing Data Using Excel by Mark Gardener
Microsoft Excel is a powerful tool that can transform the way you use data. This book explains in comprehensive and user-friendly detail how to manage, make sense of, explore and share data, giving scientists at all levels the skills they need to maximize the usefulness of their data. Readers will learn how to use Excel to:
* Build a dataset - how to handle variables and notes, rearrangements and edits to data.
* Check datasets - dealing with typographic errors, data validation and numerical errors.
* Make sense of data - including datasets for regression and correlation; summarizing data with averages and variability; and visualizing data with graphs, pivot charts and sparklines.
* Explore regression data - finding, highlighting and visualizing correlations.
* Explore time-related data - using pivot tables, sparklines and line plots.
* Explore association data - creating and visualizing contingency tables.
* Explore differences - pivot tables and data visualizations including box-whisker plots.
* Share data - methods for exporting and sharing your datasets, summaries and graphs.
Alongside the text, Have a Go exercises, Tips and Notes give readers practical experience and highlight important points, and helpful self-assessment exercises and summary tables can be found at the end of each chapter. Supplementary material can also be downloaded on the companion website. Managing Data Using Excel is an essential book for all scientists and students who use data and are seeking to manage data more effectively. It is aimed at scientists at all levels but it is especially useful for university-level research, from undergraduates to postdoctoral researchers.
Call Number: Available ONLINE through Mason Libraries
Publication Date: 2015-03-16
Data Cleaning by Venkatesh Ganti; Anish Das Sarma
Data warehouses consolidate various activities of a business and often form the backbone for generating reports that support important business decisions. Errors in data tend to creep in for a variety of reasons. Some of these reasons include errors during input data collection and errors while merging data collected independently across different databases. These errors in data warehouses often result in erroneous upstream reports, and could impact business decisions negatively. Therefore, one of the critical challenges while maintaining large data warehouses is that of ensuring the quality of data in the data warehouse remains high. The process of maintaining high data quality is commonly referred to as data cleaning. In this book, we first discuss the goals of data cleaning. Often, the goals of data cleaning are not well defined and could mean different solutions in different scenarios. Toward clarifying these goals, we abstract out a common set of data cleaning tasks that often need to be addressed. This abstraction allows us to develop solutions for these common data cleaning tasks. We then discuss a few popular approaches for developing such solutions. In particular, we focus on an operator-centric approach for developing a data cleaning platform. The operator-centric approach involves the development of customizable operators that could be used as building blocks for developing common solutions. This is similar to the approach of relational algebra for query processing. The basic set of operators can be put together to build complex queries. Finally, we discuss the development of custom scripts which leverage the basic data cleaning operators along with relational operators to implement effective solutions for data cleaning tasks.
Table of Contents: Preface / Acknowledgments / Introduction / Technological Approaches / Similarity Functions / Operator: Similarity Join / Operator: Clustering / Operator: Parsing / Task: Record Matching / Task: Deduplication / Data Cleaning Scripts / Conclusion / Bibliography / Authors' Biographies
Call Number: Available ONLINE through Mason
Publication Date: 2011-12-15
Best Practices in Data Cleaning by Jason W. Osborne
Many researchers jump from data collection directly into testing hypotheses without realizing these tests can go profoundly wrong without clean data. This book provides a clear, accessible, step-by-step process of important best practices in preparing for data collection, testing assumptions, and examining and cleaning data in order to decrease error rates and increase both the power and replicability of results. Jason W. Osborne, author of the handbook Best Practices in Quantitative Methods (SAGE, 2008), provides easily implemented suggestions that are evidence-based and will motivate change in practice by empirically demonstrating, for each topic, the benefits of following best practices and the potential consequences of not following these guidelines.