Octoparse can still extract data that sits behind a login. This tutorial shows how to sign in as part of a task and how to save cookies so that later runs of the task can skip the login steps.
1. Enter your login information to sign in
Click on the textbox for username input on the web page
Select Enter text from the Tips panel
Enter the username in Textbox 1 and click Confirm; Octoparse fills the username into the textbox on the web page
Click Continue on the web page and select Click button from the Tips panel (skip this step if the password box is directly below the Email/username box)
Follow the same steps to enter the password
Click the Sign In button on the page and select Click button from the Tips panel
Octoparse has now logged into the website successfully!
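For readers who think in code, the login steps above amount to submitting the site's login form. The sketch below illustrates that idea in Python; it is not how Octoparse works internally, since Octoparse drives a browser for you. The URL, the credentials, and the form field names username and password are assumptions made for illustration, and a real site may use different names or extra hidden fields.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_login_request(login_url: str, username: str, password: str) -> Request:
    """Build (but do not send) a form POST carrying the login credentials."""
    # Hypothetical field names; inspect the real login form to find the actual ones.
    form = {"username": username, "password": password}
    body = urlencode(form).encode("utf-8")
    return Request(
        login_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Placeholder URL and credentials, for illustration only.
request = build_login_request("https://example.com/login", "alice", "s3cret")
```

The request is only constructed here, never sent, so the sketch can be run without touching any website.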
Note: Clear cookies
Websites handle cookies differently, so to keep the workflow consistent, have the task start with the login steps on every run. To do this, clear any saved cookies before the login page loads. The target website will then always "forget" you and show the login page, where the task can enter the login information.
Click the Go to Web Page action and select Options
Select Clear cache before opening the web page
Click Apply to save
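The effect of clearing cookies can be illustrated with Python's standard http.cookiejar module. This is only a sketch of the idea: the cookie name, value, and domain are made-up placeholders, and make_session_cookie is a hypothetical helper written for this example. In Octoparse itself, the option described above does this for you.

```python
from http.cookiejar import Cookie, CookieJar


def make_session_cookie(name: str, value: str, domain: str) -> Cookie:
    """Create a simple session cookie (hypothetical helper for this example)."""
    return Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True, domain_initial_dot=False,
        path="/", path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


jar = CookieJar()
jar.set_cookie(make_session_cookie("sessionid", "abc123", "example.com"))
cookies_before = len(jar)  # the jar currently holds one login cookie

# Clearing the jar before loading the page means no cookie is sent,
# so the site treats the visit as new and shows the login page.
jar.clear()
cookies_after = len(jar)
```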
2. Use cookies to optimize the workflow
Most of the time, you can optimize the workflow by saving the cookies in the task after login. Octoparse then sends the saved cookies to the website when the page loads, and there's a good chance the website will remember "you" and skip the login steps.
Switch to Browser mode
Log in to the website just as you would in a regular browser.
After logging in, open the Options settings of the Go to Web Page action, tick Use Cookie, and click Use cookie from the current page.
Click Apply to save the settings
The website should now "remember" the login and skip the login steps the next time the task runs.
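Conceptually, saving a cookie means keeping the name/value pairs the site set at login and sending them back on later requests. The following minimal Python sketch uses the standard http.cookies module to show this. The Set-Cookie strings and cookie names are invented examples, the two function names are hypothetical, and nothing here describes Octoparse's internal implementation.

```python
from http.cookies import SimpleCookie
from typing import Dict, List


def capture_cookies(set_cookie_headers: List[str]) -> Dict[str, str]:
    """Collect name/value pairs from Set-Cookie header values."""
    saved: Dict[str, str] = {}
    for header in set_cookie_headers:
        parsed = SimpleCookie()
        parsed.load(header)
        for name, morsel in parsed.items():
            saved[name] = morsel.value
    return saved


def build_cookie_header(saved: Dict[str, str]) -> str:
    """Format saved cookies as the value of a Cookie request header."""
    return "; ".join(f"{name}={value}" for name, value in saved.items())


# Invented headers, as a site might send them after a successful login.
saved = capture_cookies([
    "sessionid=abc123; Path=/; HttpOnly",
    "theme=dark; Path=/",
])
cookie_header = build_cookie_header(saved)
```

Sending cookie_header with the next request is what gives the site a chance to recognize the earlier login.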
Note:
1. A saved cookie works only until it expires
Cookies come in many different forms. Some have a specific expiration time; others expire as soon as the browser is closed. In Octoparse, a saved cookie stops working once it expires. To resolve this, go through the login steps again in Browser mode to obtain and save an updated cookie.
2. Your password is well-protected
In Octoparse, when you enter your password, it is only accessible to your own account. When a task is exported, the password saved in the task gets removed automatically.
Any login information saved will be removed from your account permanently as soon as the task is deleted.
3. Entering the captcha manually while running local extraction
If the website shows a captcha, you can enter it manually while the task runs locally.
Octoparse can also handle certain types of captcha automatically; see Resolve Captcha for details.
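Note 1 above distinguishes cookies with a fixed expiration time from cookies that vanish when the browser closes. That rule can be sketched in Python as follows. The SavedCookie type and the is_usable function are hypothetical names introduced for this example, and the timestamps are arbitrary; Octoparse's actual storage format is not documented here.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class SavedCookie:
    name: str
    value: str
    expires: Optional[float]  # Unix timestamp; None marks a session cookie


def is_usable(cookie: SavedCookie,
              now: Optional[float] = None,
              browser_closed: bool = False) -> bool:
    """Return True if the saved cookie can still be expected to work."""
    if now is None:
        now = time.time()
    if cookie.expires is None:
        # Session cookie: valid only until the browser is closed.
        return not browser_closed
    # Persistent cookie: valid strictly before its expiration time.
    return now < cookie.expires


persistent = SavedCookie("sessionid", "abc123", expires=1000.0)
session_only = SavedCookie("tmp", "x", expires=None)
```

When is_usable returns False, the remedy is the one described in note 1: log in again and save the fresh cookie.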