Web Scraping

Web Scraping means getting a program to extract data from the internet in a fraction of the time it would take you to do it manually. It can be used to compare product reviews and prices across e-commerce sites, scrape job-hosting sites for openings in your area, or monitor social media for the latest trends and hashtags. You can also automate your browser to do tasks such as buying your favourite band's concert tickets as soon as they go on sale, notifying you when your exam results are available, and much more.
This article covers Web Scraping in Python, one of the most popular languages for the purpose.

Prerequisites

A basic knowledge of HTML and Python. Check out our Absolute Newbie and First Programming Language guides for more.

Resources

  • This is an amazing tutorial to help you get started. Brownie points for doing the Practice Projects mentioned at the end.
  • Beautiful Soup is a Python library that makes it easy to parse HTML.
  • Check out the excellent unofficial documentation for Selenium (a Python library used for browser automation).
  • This link contains an extensive list of tools and libraries used for browser automation and web scraping in Python. You can also check out the original repository for information about the tools and libraries used for web scraping in other languages.
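
To give a feel for what parsing HTML involves, here is a minimal sketch that pulls the links out of a page. It deliberately uses Python's built-in html.parser module instead of Beautiful Soup so that it runs without installing anything; Beautiful Soup offers a friendlier interface for the same job. The LinkCollector class and the SAMPLE_HTML string are made up for illustration, and a real scraper would download the page first.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href and visible text of every <a> tag that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []      # list of (href, text) pairs found so far
        self._href = None    # href of the <a> tag currently open, if any
        self._text = []      # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside a link.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


# Hypothetical page content, standing in for a downloaded page.
SAMPLE_HTML = """
<html><body>
  <h1>Job listings</h1>
  <a href="/jobs/1">Python developer</a>
  <a href="/jobs/2">Data <b>analyst</b></a>
  <a>No link here</a>
</body></html>
"""

parser = LinkCollector()
parser.feed(SAMPLE_HTML)
parser.close()
links = parser.links

for href, text in links:
    print(href, "->", text)
```

The anchor without an href is skipped, and text nested in other tags (the bold "analyst") is still collected as part of the link text.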

Disclaimer

  • Some websites prohibit the use of robots (i.e., web scrapers) to gather information from them, so it is best to read a website's user agreement before scraping it.
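
Many sites also publish a robots.txt file stating which paths automated clients may visit. The sketch below shows one way a scraper might check it, using the standard library's urllib.robotparser. The rules, the "my-scraper" user agent, the example.com URLs and the may_fetch helper are all hypothetical, and the rules are parsed from a string so the example runs offline. Checking robots.txt does not replace reading the user agreement.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real scraper would load the site's
# own file, for example with set_url() followed by read().
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

rules = RobotFileParser()
rules.parse(SAMPLE_ROBOTS_TXT.splitlines())


def may_fetch(url, user_agent="my-scraper"):
    """Return True if the parsed rules allow user_agent to fetch url."""
    return rules.can_fetch(user_agent, url)


print(may_fetch("https://example.com/jobs/1"))        # path is not disallowed
print(may_fetch("https://example.com/private/data"))  # path is under /private/
```

A scraper could call may_fetch before each request and skip any URL for which it returns False.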

See Also