Learn Python for Data

Resources for learning and using Python, an open-source programming language, for data science.

Web Scraping

The third-party requests library (installed separately, for example with pip) retrieves web pages or API content. Python's built-in alternative is urllib.request.

  • Beautiful Soup - the easiest package to start with, but it only parses HTML and XML; it does not retrieve pages itself.
    • slides - This session covers: what web scraping is and why we need it; how to parse a web page to extract data; how to save parsed data in CSV format; terms of service and legal issues involved in website scraping; and more.
  • Scrapy - a more comprehensive package that can retrieve data asynchronously
  • Selenium - automates a browser, allowing interaction with JavaScript on pages
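
The parse-then-save workflow mentioned above (extract data from a page, then write it out as CSV) can be sketched as follows. This is a minimal illustration, not the approach used in the linked session: it swaps in Python's built-in html.parser and csv modules instead of Beautiful Soup so that it runs without installing anything, and the HTML snippet, its values, and the names TableParser and html_table_to_csv are made up for the example. In practice the HTML would come from a retrieved page.

```python
import csv
import io
from html.parser import HTMLParser

# Made-up page content for illustration; a real script would use
# the text of a page it has retrieved.
SAMPLE_HTML = """
<table>
  <tr><th>name</th><th>price</th></tr>
  <tr><td>apple</td><td>1.20</td></tr>
  <tr><td>pear</td><td>0.80</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []          # finished rows, each a list of cell strings
        self._row = None        # row currently being built, or None
        self._in_cell = False   # True while inside a <td> or <th>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._in_cell:
            self._row[-1] += data.strip()

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def html_table_to_csv(html):
    """Parse table rows out of an HTML string and return them as CSV text."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(parser.rows)
    return buffer.getvalue()


csv_text = html_table_to_csv(SAMPLE_HTML)
print(csv_text)
```

To write a file instead of a string, the same csv.writer call can be pointed at a file opened with open("output.csv", "w", newline=""); the file name here is only an example.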


  • json - built-in library that encodes and decodes JSON, the standard format of many APIs
  • tweepy - library optimized for accessing the Twitter API.
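
As a small illustration of the json entry above, the sketch below decodes a JSON string into Python objects and encodes it back. The response body, its field names, and its values are invented for the example and do not come from any real API.

```python
import json

# Hypothetical API response body (made-up fields and values).
response_body = '{"user": "example", "followers": 42, "tags": ["python", "data"]}'

# Decode: JSON text -> Python dict, list, str, int, ...
record = json.loads(response_body)
followers = record["followers"]

# Modify the Python object, then encode it back to JSON text.
record["followers"] = followers + 1
encoded = json.dumps(record, sort_keys=True)

print(record["tags"])
print(encoded)
```

json.load and json.dump do the same work against open file objects, which is convenient when saving API results to disk.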



Collecting Data with Experiments

These are stand-alone software packages for running experiments. They do not require programming, but knowing Python can be useful when working with them.