Use Proxy Servers for Anonymous Web Scraping
Thursday, September 22, 2016 2:03 AM
Getting your IP address blocked is one of the problems you may face when scraping websites. A proxy server helps you avoid this, which is why proxies are widely used for anonymous web scraping.
Basically, a proxy or proxy server is another computer that serves as a hub through which internet requests are processed.
When you scrape data from a website too frequently, you repeatedly access its remote web servers, and there is a chance that they will block your computer’s IP address from further web scraping. Sometimes you may also not want to reveal your identity (network details) to web servers while scraping data.
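Outside of Octoparse, the idea of sending requests through a proxy can be sketched in a few lines of Python. This is only an illustration using the standard library; the function name build_proxy_opener and the proxy address are assumptions made for the example, and the address comes from a range reserved for documentation, so it will not work as a real proxy.

```python
import urllib.request


def build_proxy_opener(proxy_url):
    """Return an opener that sends HTTP and HTTPS requests through proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


# Hypothetical proxy address (reserved documentation range); replace with a real one.
opener = build_proxy_opener("http://203.0.113.10:8080")

# Calling opener.open("http://example.com") would then reach the site via the
# proxy, so the site sees the proxy's IP address rather than yours.
```

No request is sent until opener.open() is called, so the sketch can be read and adapted without a working proxy at hand.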
Using Proxy Servers for Anonymous Web Scraping
1. Scrape Websites on Cloud Platform
Octoparse’s cloud platform provides many rotating anonymous proxy servers for web scraping, so you don’t have to set up connections to different proxy servers manually.
If you choose “Cloud Extraction”, the software automatically rotates through these proxy servers. The cloud platform accesses and scrapes web data through the provided proxies, which keeps your web scraping anonymous.
To use this feature, just run your extraction task on the cloud platform by selecting “Cloud Extraction”.
Or you can select a specific task in "My Task", right-click it and then select "Cloud Extraction".
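To make the idea of rotating proxies concrete, here is a minimal Python sketch of round-robin rotation. It does not describe how Octoparse’s cloud platform works internally, which this tutorial does not cover; the class name ProxyRotator and the addresses are hypothetical.

```python
from itertools import cycle


class ProxyRotator:
    """Hand out proxies from a fixed list in round-robin order."""

    def __init__(self, proxies):
        if not proxies:
            raise ValueError("at least one proxy is required")
        self._pool = cycle(proxies)

    def next_proxy(self):
        """Return the next proxy, wrapping around at the end of the list."""
        return next(self._pool)


# Hypothetical addresses from a range reserved for documentation.
rotator = ProxyRotator(["203.0.113.10:8080", "203.0.113.11:8080"])
for page in ["page-1", "page-2", "page-3"]:
    # Each page would be requested through a different proxy than the previous one.
    print(page, "via", rotator.next_proxy())
```

Because consecutive requests leave from different addresses, no single IP address hits the target site as often, which lowers the chance of it being blocked.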
2. Manually Set up IP Addresses
Octoparse also allows you to do web scraping through proxy servers that you supply yourself, such as free proxy servers. Either a single proxy server or a list of proxy servers can be used for anonymous web scraping.
Many free and paid proxy servers are available around the web. Free proxies are usually slow and unreliable, and they may cause your web scraping process to terminate. So we recommend using Octoparse’s cloud platform, which has an automatic IP rotation feature.
Nevertheless, if you want to configure proxies manually, you can add a single proxy or a list of proxies. Enter one proxy per line.
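If you keep your proxies in a text file, a short Python sketch like the following can check that the list really has one valid entry per line before you paste it in. The host:port format is an assumption, since this tutorial does not state the exact format Octoparse accepts, and parse_proxy_list is a hypothetical helper.

```python
def parse_proxy_list(text):
    """Parse 'host:port' entries, one per line, skipping blank lines."""
    proxies = []
    for line_number, raw in enumerate(text.splitlines(), start=1):
        entry = raw.strip()
        if not entry:
            continue
        # Split on the last colon so the port is everything after it.
        host, sep, port = entry.rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"line {line_number}: expected host:port, got {entry!r}")
        proxies.append((host, int(port)))
    return proxies


# Hypothetical addresses from a range reserved for documentation.
sample = """
203.0.113.10:8080
203.0.113.11:3128
"""
print(parse_proxy_list(sample))
```

A malformed line raises a ValueError that names the offending line number, which makes typos in a long list easier to find.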
3. Use a VPN for Web Scraping
You can also use a VPN instead of proxies to hide your scraping activities. Many free and paid VPN services are available around the web, and you can find them by googling.
Please contact our support in case you need assistance or have any questions.
Happy Data Hunting!
Author: The Octoparse Team