The third-party requests library for Python (installed separately, for example with pip) retrieves web pages or API content.
Web Scraping
- Beautiful Soup - the easiest package to start with, but it only parses HTML and XML; it does not download pages itself.
- Slides - This session covers: what web scraping is and why we need it; how to parse a web page to extract data; how to save parsed data in CSV format; the terms of service and legal issues involved in website scraping; and more.
- Scrapy - a more comprehensive package that can retrieve data asynchronously
- Selenium - automates a browser, allowing interaction with JavaScript on pages
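The parse-then-save workflow mentioned above can be sketched roughly as follows. To keep the example runnable without installing anything, it uses the standard library's html.parser as a stand-in for Beautiful Soup, and it works on a hard-coded page rather than a downloaded one. The HTML snippet and the names LinkCollector and links_to_csv are made up for illustration.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical page content; in practice this would be the text of a page
# you downloaded (after checking the site's terms of service).
HTML = """
<html><body>
  <ul>
    <li><a href="/a">First item</a></li>
    <li><a href="/b">Second item</a></li>
  </ul>
</body></html>
"""


class LinkCollector(HTMLParser):
    """Collects a (text, href) pair for every <a> tag that has an href."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._href = None   # href of the link currently being read, if any
        self._text = []     # text fragments seen inside that link

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.rows.append(("".join(self._text).strip(), self._href))
            self._href = None


def links_to_csv(html):
    """Parse the HTML and return the extracted links as CSV text."""
    parser = LinkCollector()
    parser.feed(html)
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["text", "href"])   # header row
    writer.writerows(parser.rows)
    return buffer.getvalue()


csv_text = links_to_csv(HTML)
print(csv_text)
```

To write a file instead of a string, the same csv.writer can be pointed at a file opened with open("links.csv", "w", newline="").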
APIs
- json - built-in library that encodes and decodes JSON, the standard format of many APIs
- tweepy - library optimized for accessing the Twitter API.
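As a small illustration of what the json library does, the sketch below decodes a JSON string into Python objects and encodes a dictionary back into JSON text. The payload and field names are invented for the example and do not come from any real API.

```python
import json

# Hypothetical response body, shaped like what many APIs return.
response_body = '{"user": "example", "followers": 42, "tags": ["python", "api"]}'

# Decode: JSON text becomes Python dicts, lists, strings, and numbers.
data = json.loads(response_body)
followers = data["followers"]
first_tag = data["tags"][0]

# Encode: a Python dict becomes JSON text, e.g. for a request body.
payload = json.dumps({"query": "web scraping", "limit": 10}, sort_keys=True)

print(followers, first_tag)
print(payload)
```

Passing sort_keys=True is optional; it is used here only so that the encoded text has a predictable key order.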
Tutorials