Set up proxies
Thursday, August 16, 2018
Some websites are very sensitive to web scraping and take serious anti-scraping measures, such as IP blocking, to stop it. Setting up proxies manually in Octoparse is particularly useful if you want to access a website through external proxies (for example, from a specific country), or if you prefer to use your own proxies instead of the automatic IP rotation of cloud extraction.
Unlike other scraping utilities that charge for setting up external proxies, Octoparse allows both free and premium users to add custom proxies for IP rotation. Getting your IP address blocked is one of the problems you may face when scraping websites, so a proxy server is an essential part of web scraping and is widely used for anonymous scraping.
To use external proxies for rotation:
Click "Setting" above the workflow once you've finished configuration.
(The "Setting" option is available only when there is an "Extract data" step in the workflow.)
Select "Use proxies" and click "Settings" to add custom proxies. Currently Octoparse only supports HTTP proxies. Separate the IP address of the proxy server and the port number with a colon, e.g. 184.108.40.206:2318.
If you have a list of IPs, add each proxy on a new line in "IP Proxies".
Click "OK" and "Save" to save your changes. Octoparse will rotate proxies according to your settings when running the task locally.
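Before pasting a long list into "IP Proxies", it can help to check that every entry follows the IP:port format described above. The following is a minimal Python sketch that runs outside Octoparse; the function name `parse_proxy_list` is hypothetical and is not part of any Octoparse API.

```python
import ipaddress


def parse_proxy_list(text):
    """Split a block of 'IP:port' lines into (valid, rejected) lists."""
    valid, rejected = [], []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        # Split on the last colon so the port is always the final field.
        host, sep, port = line.rpartition(":")
        try:
            if not sep:
                raise ValueError("missing colon")
            ipaddress.ip_address(host)  # raises ValueError if not an IP address
            port_number = int(port)     # raises ValueError if not a number
            if not 1 <= port_number <= 65535:
                raise ValueError("port out of range")
        except ValueError:
            rejected.append(line)
            continue
        valid.append((host, port_number))
    return valid, rejected


if __name__ == "__main__":
    good, bad = parse_proxy_list("184.108.40.206:2318\nnot-a-proxy\n")
    print("valid:", good)
    print("rejected:", bad)
```

Entries that end up in the rejected list can be fixed by hand before the list is added to the task settings.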
Customizing proxies for rotation is only available for local extraction. (Please note that Octoparse does not currently provide proxies for IP rotation in local extraction. To obtain external proxies, you can choose from the many free and paid proxy servers available on the web.)
With the Octoparse Standard or Professional plan, a task executed with cloud extraction runs on a cloud platform supported by thousands of cloud servers, each with a unique IP address. Between 6 and 20 servers are assigned simultaneously and requests are made through various IPs, minimizing the chance of being traced or blacklisted.
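Since external proxies come from third parties, you may want to confirm that one responds before adding it. Below is a rough Python sketch using only the standard library. It is independent of Octoparse; `proxy_url`, `check_proxy`, and the test URL `http://example.com/` are illustrative assumptions.

```python
import http.client
import urllib.request


def proxy_url(host, port):
    """Build the URL form of an HTTP proxy address."""
    return f"http://{host}:{port}"


def check_proxy(host, port, test_url="http://example.com/", timeout=10):
    """Return True if an HTTP request routed through the proxy succeeds."""
    handler = urllib.request.ProxyHandler({"http": proxy_url(host, port)})
    opener = urllib.request.build_opener(handler)
    try:
        with opener.open(test_url, timeout=timeout) as response:
            return response.status < 400
    except (OSError, http.client.HTTPException):
        # Covers connection errors, timeouts, and HTTP error responses.
        return False


if __name__ == "__main__":
    print(check_proxy("184.108.40.206", 2318))
```

A `False` result only means the request failed under these settings; the proxy may still be temporarily busy or restricted to certain sites.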
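To illustrate the general idea of IP rotation, the sketch below spreads a list of requests across a pool of addresses in round-robin order. This is only one simple rotation strategy, not a description of how Octoparse schedules requests internally; the name `assign_proxies` and the sample values are hypothetical.

```python
from itertools import cycle


def assign_proxies(urls, proxies):
    """Pair each URL with the next proxy in round-robin order."""
    if not proxies:
        raise ValueError("at least one proxy is required")
    rotation = cycle(proxies)  # restarts from the first proxy when exhausted
    return [(url, next(rotation)) for url in urls]


if __name__ == "__main__":
    pages = ["page-1", "page-2", "page-3"]
    pool = ["184.108.40.206:2318", "10.0.0.1:8080"]
    for url, proxy in assign_proxies(pages, pool):
        print(url, "via", proxy)
```

With more addresses in the pool, each one carries a smaller share of the requests, which is the reason rotation reduces the chance of any single IP being blocked.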
Use a proxy to change the IP address for logging in to Octoparse - If you cannot log in to Octoparse because your school or company intranet restricts some external requests, use a proxy to log in.
To do this, click "Use IP Proxy" and enter the requested information.
Click the "Test" button to check whether the connection works. If it succeeds, Octoparse displays a confirmation prompt.