Web Scraping - How to Scrape Data That Appears on Hover
Monday, March 6, 2017 11:14 PM
Sometimes the data you want to pull out of a web page appears only when you hover the mouse over an element. Octoparse provides a "Hover Over" feature (the "Cursor Over" action used in the Workflow below) that lets you extract data that is visible only while the mouse is over it.
In this web scraping tutorial, we will show you how to use Octoparse to extract data that appears only on hover from Twitter. Follow the steps below to build a scraping task (see "What is an .otd file?") that collects information from Twitter.
Step 1. Set up basic information.
Click "Quick Start" ➜ Choose "New Task (Advanced Mode)" ➜ Fill in the basic information ➜ Click "Next".
Step 2. Enter www.twitter.com in the built-in browser ➜ Click the "Go" icon to open the web page.
Step 3. Enter your login information (username and password) and store the cookie in Octoparse.
Follow the steps in the tutorial "Web Scraping - How to Store Cookies in Octoparse", then delete any unnecessary actions from the Workflow. We will skip those steps here and continue with a logged-in twitter.com page.
Step 4. Extract the items from this web page.
On some websites, you need to right-click the items when creating the list, so that the click does not trigger the items' hyperlinks.
Right-click the first item ➜ Click "Create a list of items" (sections with similar layout) ➜ Click "Add current item to the list".
The first item is now added to the list ➜ Click "Continue to edit the list".
Click the second item ➜ Click "Add current item to the list" again (now all items with a similar layout are selected) ➜ Click "Finish Creating List" ➜ Click "loop" to process the list and extract the detailed information.
Once the loop that processes the list has been created, replace the "Extract Data" action inside it with a "Cursor Over" action.
Right-click the "Extract Data" action inside the Loop ➜ Choose "Delete" ➜ Drag a "Cursor Over" action into this Loop ➜ Under Advanced Options, check the "Use Loop" option and the "AJAX Load" option ➜ Set an AJAX timeout of 5 seconds (or longer if needed) ➜ Click "Save".
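The AJAX timeout gives the page a limited time to load the hover content before the task moves on. As a rough illustration only (this is not Octoparse's internal code), the idea can be sketched in Python as polling a condition until it holds or the timeout expires; the function name `wait_for` and the simulated `tooltip_loaded` check are hypothetical.

```python
import time
from typing import Callable


def wait_for(condition: Callable[[], bool], timeout: float = 5.0, interval: float = 0.1) -> bool:
    """Poll `condition` until it returns True or `timeout` seconds pass.

    Hypothetical sketch of an AJAX timeout: after hovering, the page gets
    up to `timeout` seconds to load the hidden content.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True          # content loaded in time
        if time.monotonic() >= deadline:
            return False         # gave up after the timeout
        time.sleep(interval)     # wait a little before checking again


# Simulated hover content that "loads" on the third check.
calls = {"n": 0}


def tooltip_loaded() -> bool:
    calls["n"] += 1
    return calls["n"] >= 3


print(wait_for(tooltip_loaded, timeout=5.0, interval=0.01))  # True
```

A longer timeout makes the task more tolerant of slow pages at the cost of waiting longer on items whose hover content never appears.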
Now walk through the task to check it:
Go to the webpage ➜ The Loop Item box ➜ Cursor Over.
The data that appears on hover is now visible on the web page.
Step 5. Extract the data that appears on hover from these items.
Click the Loop Item box ➜ Select the first item of the list ➜ Click "Cursor Over" ➜ Click the data that appears on hover ➜ Select "Extract text".
The extracted data appears under Data Fields, and an "Extract Data" action is added to the Workflow. Other content can be extracted the same way.
Click a "Field Name" to rename it, then click "Save".
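The Workflow built so far is Loop Item ➜ Cursor Over ➜ Extract Data. As a simplified model only (a simulation in plain Python, not Octoparse's API), the loop can be sketched like this: each hypothetical `Item` exposes its hidden text only after `cursor_over()` is called, and `field_name` plays the role of the renamed field.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Item:
    """Hypothetical stand-in for one list item on the page."""
    visible_text: str
    hover_text: str       # hidden until the mouse is over the item
    hovered: bool = False

    def cursor_over(self) -> None:
        # Simulates the "Cursor Over" action revealing hidden content.
        self.hovered = True

    def popup_text(self) -> Optional[str]:
        # Hidden content is readable only after a hover.
        return self.hover_text if self.hovered else None


def scrape(items: List[Item], field_name: str = "hover_data") -> List[Dict[str, Optional[str]]]:
    rows = []
    for item in items:                 # "Loop Item"
        item.cursor_over()             # "Cursor Over"
        rows.append({                  # "Extract Data"
            "item": item.visible_text,
            field_name: item.popup_text(),
        })
    return rows


page = [Item("tweet 1", "profile card 1"), Item("tweet 2", "profile card 2")]
print(scrape(page, field_name="profile"))
```

Without the hover step, `popup_text()` returns `None`, which mirrors why the "Cursor Over" action has to run before "Extract Data" inside the loop.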
Step 6. Click "Save" to save your configuration. Then click "Next" ➜ Click "Next" ➜ Click "Local Extraction" to run the task on your computer. Octoparse will then extract all the selected data automatically.
Step 7. All extracted data is shown in the "Data Extracted" pane. Click the "Export" button to export the results to an Excel file, a database, or another format, and save the file to your computer.
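To picture what an export contains, here is a small sketch using Python's standard `csv` module. It is only an illustration of the row-and-column shape (one row per looped item, one column per field), not how Octoparse performs its own export; `rows_to_csv` is a hypothetical helper.

```python
import csv
import io
from typing import Dict, List, Optional


def rows_to_csv(rows: List[Dict[str, Optional[str]]]) -> str:
    """Turn extracted rows (one dict per item) into CSV text."""
    if not rows:
        return ""
    buffer = io.StringIO()
    # Column headers come from the field names of the first row.
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


print(rows_to_csv([{"item": "tweet 1", "profile": "profile card 1"}]))
```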
Tips:
1. If the scraping task gets stuck, run through the Workflow again from the beginning, or refresh it by reopening the task.
2. If you have trouble creating a list to extract the items from the web page, try closing the scraping task and reopening it.
Author: The Octoparse Team