George Mason University | University Libraries
Software: Learn Python for Data

Resources for learning and using Python, an open-source programming environment, for data science.

Web Scraping

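The third-party requests library retrieves web pages or API content. It is not part of the Python standard library and must be installed separately (for example, with pip); the built-in urllib.request module covers similar ground.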

Web Scraping

  • Beautiful Soup - the easiest package to start with, but it only parses HTML; it does not retrieve pages itself.
    • Slides - This session covers: what web scraping is and why it is needed; how to parse a web page to extract data; how to save parsed data in CSV format; terms of service and legal issues involved in website scraping; and more.
  • Scrapy - a more comprehensive package that can retrieve data asynchronously
  • Selenium - automates a browser, allowing interaction with JavaScript on pages
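
The parse-then-save workflow these packages support can be sketched using only the Python standard library. This is a minimal illustration, not how Beautiful Soup itself works: the sample HTML, the LinkCollector class, and the links_to_csv function are all hypothetical names made up for this example, and a dedicated package such as Beautiful Soup would likely replace the parser class with a few shorter calls.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical sample page; a real script would download this HTML first.
SAMPLE_HTML = """
<html><body>
  <ul>
    <li><a href="https://example.org/a">First link</a></li>
    <li><a href="https://example.org/b">Second link</a></li>
  </ul>
</body></html>
"""


class LinkCollector(HTMLParser):
    """Collect (text, href) pairs for every <a> tag that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []     # finished (text, href) rows
        self._href = None   # href of the <a> tag currently open, if any
        self._text = []     # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def links_to_csv(html):
    """Parse the HTML and return the extracted links as CSV text."""
    parser = LinkCollector()
    parser.feed(html)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["text", "href"])   # header row
    writer.writerows(parser.links)
    return buffer.getvalue()


csv_text = links_to_csv(SAMPLE_HTML)
print(csv_text)
```

To write a file instead of a string, the same csv.writer can be pointed at a file opened with open("links.csv", "w", newline=""). Before scraping a real site, check its terms of service.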

APIs

  • json - built-in library that encodes and decodes JSON, the standard format of many APIs
  • tweepy - library optimized for accessing the Twitter API.
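
As a small illustration of the json library listed above, the sketch below decodes a JSON string into Python objects and encodes a dictionary back to JSON. The response body and field names are invented for the example and do not come from any real API.

```python
import json

# Hypothetical API response body (invented for illustration).
response_body = '{"user": "example", "followers": 42, "tags": ["python", "data"]}'

# Decode: JSON objects become dicts, arrays become lists.
record = json.loads(response_body)
follower_count = record["followers"]
first_tag = record["tags"][0]

# Encode: turn a Python dict into a JSON string, e.g. for a request body.
encoded = json.dumps({"query": "python", "limit": 10}, sort_keys=True)

print(follower_count, first_tag)
print(encoded)
```

The sort_keys=True argument is optional; it only makes the encoded output order predictable.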

Tutorials

Experiments

Collecting Data with Experiments

These are stand-alone software packages for running experiments. They do not require programming, but knowing Python can be useful when working with them.