Web Scraping Tutorial: Scrape Websites That Require Login
Wednesday, September 28, 2016 6:13 AM
In many cases, you need to log in before you can access the data you want. In this tutorial, I will use eBay as an example to show you how to scrape websites that require login.
Step 1: Navigate to the target URL
- Enter the URL into the built-in browser (URL for the example: https://signin.ebay.com/ws/eBayISAPI.dll?SignIn&ru=http%3A%2F%2Fwww.ebay.com%2F)
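If you are curious what the example URL contains, the short Python sketch below splits it into its parts using only the standard library. The function name `describe_sign_in_url` is made up for illustration. Judging by its value, the `ru` parameter looks like the page to return to after signing in, but that reading is an assumption and is not stated in this tutorial.

```python
from urllib.parse import urlsplit, parse_qs

SIGN_IN_URL = "https://signin.ebay.com/ws/eBayISAPI.dll?SignIn&ru=http%3A%2F%2Fwww.ebay.com%2F"


def describe_sign_in_url(url):
    """Split a sign-in URL into its host, path, and decoded query parameters."""
    parts = urlsplit(url)
    # keep_blank_values=True keeps the bare "SignIn" flag, which has no "=value" part.
    params = parse_qs(parts.query, keep_blank_values=True)
    return {
        "host": parts.netloc,
        "path": parts.path,
        "params": {name: values[0] for name, values in params.items()},
    }


info = describe_sign_in_url(SIGN_IN_URL)
print(info["host"])          # the sign-in host
print(info["params"]["ru"])  # the percent-decoded value of "ru"
```

The percent-encoded sequences (`%3A`, `%2F`) are decoded automatically, so the `ru` value prints as a normal URL.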
Step 2: Enter username and password
- Click anywhere in the text box for email/username and, when prompted, select “Enter text value”
- Type your account information into the “Enter text” box
- Click “Save” (you will now see your account information filled into the text box on the webpage)
- Enter the password by following the same steps
Step 3: Sign in
- Click “Sign in” and, when prompted, select “Click an item”
(Now you are logged in and can proceed to scrape the data you need)
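For readers who want to see what Steps 2 and 3 amount to outside of Octoparse, here is a minimal Python sketch of a form-based login using only the standard library. It is a generic illustration, not how Octoparse or eBay works internally: the URL `https://example.com/login` and the field names `userid` and `pass` are placeholders, and real sites often add hidden tokens, JavaScript, or captchas that this sketch does not handle.

```python
from http.cookiejar import CookieJar
from urllib.parse import urlencode
from urllib.request import HTTPCookieProcessor, Request, build_opener


def build_login_request(login_url, username, password):
    """Build (but do not send) a POST request carrying the login form fields."""
    # "userid" and "pass" are hypothetical field names; inspect the real form to find its own.
    form = urlencode({"userid": username, "pass": password}).encode("utf-8")
    return Request(login_url, data=form, method="POST")


def make_session():
    """Return an opener that stores cookies, so later requests can stay logged in."""
    return build_opener(HTTPCookieProcessor(CookieJar()))


request = build_login_request("https://example.com/login", "me@example.com", "s3cret")
session = make_session()
# session.open(request) would submit the form; it is left out so the sketch runs offline.
print(request.get_method(), request.data)
```

The cookie jar plays the same role as the built-in browser staying signed in: once the login response sets a session cookie, every later request made through the same opener sends it back.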
Step 4: Create a list for the items to be extracted
- Click on the first item of the list and, when prompted, select "Create a list of items"
- Select "Add current item to the list"
(Now the first item has been added to the list; we still need to add the remaining items)
- Click "Continue to edit the list"
- Click on the second item with similar layout
- Select "Add current item to the list"
(Now you should have all items added to the list)
- Click "Finish Creating List"
- Select "loop" to have Octoparse click on each item in the list one by one
(Now that the detail page for the first item in the list is open, we can extract the detailed information about that item)
- Click on the desired text and, when prompted, select "Extract Text"
- Repeat for all the data you need
- Rename the fields if necessary
- Click "Save"
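The idea behind Step 4 (find the items that share a layout, then visit them one by one) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample HTML, the CSS class `item`, and the class name `ItemLinkParser` are all made up, and Octoparse's own matching logic is not documented here.

```python
from html.parser import HTMLParser


class ItemLinkParser(HTMLParser):
    """Collect the href of every <a> tag that carries a given CSS class."""

    def __init__(self, item_class):
        super().__init__()
        self.item_class = item_class
        self.links = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        if tag == "a" and self.item_class in classes and attributes.get("href"):
            self.links.append(attributes["href"])


# Made-up listing page: three items share the same layout, one link does not.
SAMPLE_PAGE = """
<ul>
  <li><a class="item" href="/itm/1">First item</a></li>
  <li><a class="item" href="/itm/2">Second item</a></li>
  <li><a class="item featured" href="/itm/3">Third item</a></li>
</ul>
<a class="nav" href="/help">Help</a>
"""

parser = ItemLinkParser("item")
parser.feed(SAMPLE_PAGE)

# The loop mirrors "click each item one by one"; a real run would fetch each URL here.
for link in parser.links:
    print("would visit:", link)
```

Adding two or three example items in Octoparse serves the same purpose as choosing the class name here: it tells the tool which elements count as "similar".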
Step 5: Start running your task
- Click “Next”
- Select “Local Extraction”
- Click “OK” to run the task on your computer.
(Octoparse will automatically extract all the selected data. Check the "Data Extracted" pane for the extraction progress)
- Click “Export” to export the extracted data to a format of your choice or to a database
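To give a feel for what renaming fields and exporting produce, here is a small Python sketch that writes extracted rows to CSV with the standard `csv` module. The default names `Field1` and `Field2`, the sample rows, and the helper `export_rows` are assumptions for illustration only; the export file Octoparse itself generates may differ.

```python
import csv
import io


def export_rows(rows, renames, out):
    """Write rows as CSV, renaming columns according to the renames mapping."""
    renamed = [
        {renames.get(key, key): value for key, value in row.items()}
        for row in rows
    ]
    fieldnames = list(renamed[0]) if renamed else []
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(renamed)


# Made-up extraction results with generic column names.
rows = [
    {"Field1": "Vintage camera", "Field2": "45.00"},
    {"Field1": "Desk lamp", "Field2": "12.50"},
]

buffer = io.StringIO()
export_rows(rows, {"Field1": "title", "Field2": "price"}, buffer)
print(buffer.getvalue())
```

Writing to an in-memory buffer keeps the sketch self-contained; passing a file opened with `open("items.csv", "w", newline="")` instead would save the same content to disk.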
Author: The Octoparse Team