George Mason University | University Libraries

Learn Python for Data

Resources for learning and using Python, an open-source programming language, for data science.

Web Scraping

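The requests library, a third-party package installed with pip, is the most common way to retrieve web pages or API content in Python. The standard library's urllib.request module covers the same basic task without any installation.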
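Retrieving a page can be sketched as follows. This is a minimal example that uses only the standard library's urllib.request so it runs without installing anything; the function name fetch_text, the timeout default, and the User-Agent string are illustrative assumptions, not part of any library.

```python
import urllib.request


def fetch_text(url, timeout=10):
    """Return the body of `url` decoded as text."""
    # Some sites block the default Python-urllib User-Agent, so send our own.
    # The value below is a placeholder; use one that identifies your project.
    request = urllib.request.Request(
        url, headers={"User-Agent": "learn-python-example/0.1"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Use the charset declared by the server, falling back to UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)
```

Calling fetch_text("https://example.com") should return that page's HTML as a string. With requests installed, the rough equivalent is requests.get(url, timeout=10).text, usually preceded by a call to raise_for_status() on the response to catch HTTP errors.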

  • Beautiful Soup - the easiest package to start with, but it only parses HTML; it does not retrieve pages.
    • Slides - This session covers: what web scraping is and why it is needed; how to parse a web page to extract data and save it in CSV format; terms of service and legal issues involved in website scraping; and more.
  • Scrapy - a more comprehensive framework that both retrieves and parses pages, and can retrieve data asynchronously.
  • Selenium - automates a browser, allowing interaction with JavaScript on pages.
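The parse-then-save workflow these tools support can be sketched as below. To stay runnable without extra packages, this example swaps in the standard library's html.parser in place of Beautiful Soup; the class LinkCollector, the function links_to_csv, and the sample HTML are all hypothetical names and data made up for illustration.

```python
import csv
import io
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (link text, href) pairs from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []     # finished (text, href) rows
        self._href = None   # href of the <a> tag currently open, if any
        self._text = []     # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def links_to_csv(html):
    """Parse `html` and return its links as CSV text with a header row."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["text", "href"])
    writer.writerows(parser.links)
    return buffer.getvalue()


# Made-up page fragment standing in for a downloaded web page.
SAMPLE_HTML = """
<ul>
  <li><a href="https://example.org/a">First page</a></li>
  <li><a href="https://example.org/b">Second <b>page</b></a></li>
</ul>
"""

print(links_to_csv(SAMPLE_HTML))
```

With Beautiful Soup installed, the collector class shrinks to roughly BeautifulSoup(html, "html.parser").find_all("a"), reading each tag's get_text() and get("href"). To write to disk instead of a string, pass a file opened with newline="" to csv.writer.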


  • json - built-in library that encodes and decodes JSON, the standard format of many APIs
  • tweepy - library optimized for accessing the Twitter API.
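Decoding an API response with the json module might look like the sketch below. The response body and its field names (results, title, next_page) are invented for illustration; real APIs define their own structure, and the helper extract_titles is hypothetical.

```python
import json

# Hypothetical API response body; real APIs define their own field names.
RESPONSE_BODY = (
    '{"results": [{"id": 1, "title": "Intro"}, '
    '{"id": 2, "title": "Pandas"}], "next_page": null}'
)


def extract_titles(body):
    """Decode a JSON document and return the title of each result."""
    payload = json.loads(body)  # JSON text becomes dicts, lists, str, int, None
    return [item["title"] for item in payload["results"]]


titles = extract_titles(RESPONSE_BODY)
print(titles)

# Encoding goes the other way: Python objects back to JSON text.
print(json.dumps({"titles": titles}, indent=2))
```

Note that JSON null decodes to Python's None, and JSON objects decode to ordinary dictionaries, so the decoded data can be passed straight to other Python code.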



Collecting Data with Experiments

These are stand-alone software packages for running experiments. They do not require programming, but knowing Python can be useful when working with them.