Scrape Websites with Infinite Scrolling (Quora, Facebook, Twitter)
Thursday, April 14, 2016 4:54 AM
Many websites, such as Quora, Facebook, and Twitter, load more content as you scroll down (infinite scrolling).
When configuring extraction, you usually need to handle pagination, because the first screen of data is rarely enough. So you need to add an action that loads more content. I'll use Quora as an example.
1. Navigate to the target URL by entering it in the built-in browser.
2. We're now on the search results page (I searched for the topic “web scraping”).
Wait until the page has loaded, then select “Advanced Options”.
3. Choose “scroll down to page bottom when finished loading”. ➜ Then enter how many times you want to scroll.
Select the interval time and the scroll method. ➜ I chose “scroll down for one screen”. You can also choose “scroll to the end of the page”.
Now we're done configuring pagination.
4. Next, select the first answer. ➜ Create a list of items ➜ Add current item to the list ➜ Continue to edit the list
5. Then select the second answer. ➜ Add current item to the list ➜ Finish creating list ➜ Loop to process the list.
The task will now automatically repeat these actions for every answer in the list.
6. Now you can extract whatever you want from each answer. Click the title ➜ Choose “Extract text”. Do the same for the views, answer, and time.
7. All the extracted content appears under Data Fields. ➜ Click a "Field Name" to rename it.
8. Once you have finished configuring the extraction rule, click “Next”.
9. You can choose not to load images to speed up the extraction. However, this can cause problems on certain websites.
10. The task is now complete! Choose “Local extraction” to run the task on your computer.
11. The extracted data will be shown in the "Data Extracted" pane. Click the export button to export the results to an Excel file, a database, or another format and save the file to your computer. You can watch the built-in browser to check whether the task runs as expected.
The result looks pretty good.
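To make the logic of steps 3 to 7 concrete, here is a toy Python model of what such a task does: scroll a set number of times with a wait between scrolls, stop early when nothing new loads, loop over the loaded items, keep only the chosen fields, and export them as CSV. This is a sketch under stated assumptions, not how Octoparse works internally: the page is simulated by a hypothetical `FakeInfinitePage` class that reveals a fixed batch of items per scroll, and all function names are illustrative. A real scraper would drive an actual browser instead.

```python
# Toy model of the scroll-then-loop workflow (not Octoparse's implementation).
# Assumption: a "page" is a list of answer records, and each scroll reveals
# one more fixed-size batch of them, like an infinite-scrolling feed.
import csv
import io
import time
from dataclasses import dataclass


@dataclass
class FakeInfinitePage:
    """Hypothetical stand-in for a page that loads more items on scroll."""
    all_items: list
    batch_size: int = 3
    loaded: int = 0

    def __post_init__(self):
        # Before any scrolling, only the first batch is visible.
        self.loaded = min(self.batch_size, len(self.all_items))

    def scroll_one_screen(self):
        # Each scroll reveals one more batch until the feed runs out.
        self.loaded = min(self.loaded + self.batch_size, len(self.all_items))

    def visible_items(self):
        return self.all_items[: self.loaded]


def scroll_page(page, times, interval_s=0.0):
    """Scroll up to `times` times, waiting `interval_s` seconds after each scroll.

    Stops early if a scroll loads nothing new (the end of the feed).
    """
    for _ in range(times):
        before = len(page.visible_items())
        page.scroll_one_screen()
        time.sleep(interval_s)  # give newly loaded content time to appear
        if len(page.visible_items()) == before:
            break


def extract_fields(page, field_names):
    """Loop over every loaded item and keep only the chosen fields."""
    return [{name: item.get(name, "") for name in field_names}
            for item in page.visible_items()]


def to_csv(rows, field_names):
    """Serialize the extracted rows as CSV text (which Excel can open)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=field_names)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    answers = [{"title": f"Answer {i}", "views": str(i * 10), "time": "2016"}
               for i in range(1, 11)]
    page = FakeInfinitePage(answers, batch_size=3)
    scroll_page(page, times=2)  # 3 items on load + 2 scrolls of 3 = 9 items
    rows = extract_fields(page, ["title", "views", "time"])
    print(len(rows))  # 9
    print(to_csv(rows, ["title", "views", "time"]))
```

In this model the scroll count caps how many items get loaded, and the early stop avoids pointless waiting once the feed is exhausted.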
Author: The Octoparse Team