Web Scraping Feature Study | How to Scrape Data by Searching Multiple Keywords on A Website?
Wednesday, April 27, 2016 8:05 AM
What is it?
Sometimes a search has to be repeated for a whole series of keywords. In this case, we define a loop over the keywords that we want to search for. We call this loop mode "Text list".
When do you want to use it?
It can be used whenever we want to loop over a list of pre-defined, specific text values.
How to use it?
Step 1. First, to define a loop for a list, drop a "Loop" action into the Workflow designer.
Step 2. Next, go to "Loop Mode" and select "Text list", since the loop items in this loop are all text values entered in advance.
Enter the keywords you want to search (one keyword per row). Click "OK" and then click "Save". In this way, you can search different keywords one by one.
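To make the "one keyword per row" idea concrete, here is a minimal Python sketch of how a text list could be turned into loop items. The function name `parse_text_list` is hypothetical, and skipping blank rows and trimming spaces are assumptions of this sketch, not documented Octoparse behavior.

```python
def parse_text_list(raw: str) -> list[str]:
    """Split a multi-line text box into loop items, one keyword per row.

    Assumption: blank rows are skipped and surrounding spaces are trimmed.
    """
    return [line.strip() for line in raw.splitlines() if line.strip()]


# Example input as it might be typed into the text list box.
keywords = parse_text_list("laptop\nwireless mouse\n\nusb hub\n")

# Each loop iteration then works with exactly one keyword.
for keyword in keywords:
    print(keyword)
```

Each item printed here corresponds to one pass through the loop in the workflow.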
Step 3. Next, click on the search bar of the website in the built-in browser. Choose "Enter text value" to enter the search keywords.
Step 4. Now, drag "Enter text value" into the "Loop Item" box so that the program enters the keywords one after another and automatically searches for each of them in the search bar.
Step 5. Then select "Use current loop text to fill the text box" and click "Save".
Step 6. To search for the keywords on the target website, click the "Search" button of the website and choose "Click an item".
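Steps 3 to 6 amount to "for each keyword, fill the search box, then submit". On many sites the same effect can be approximated by building one search URL per keyword. The sketch below assumes a hypothetical site at `https://example.com/search` that takes the keyword in a query parameter named `q`; neither comes from the tutorial, and the real site may work differently.

```python
from urllib.parse import urlencode


def build_search_url(base_url: str, keyword: str, param: str = "q") -> str:
    """Return the search URL for one keyword.

    Assumption: the site accepts the keyword as a query-string parameter.
    """
    return f"{base_url}?{urlencode({param: keyword})}"


# Equivalent of "use current loop text to fill the text box", once per item.
for keyword in ["laptop", "usb hub"]:
    print(build_search_url("https://example.com/search", keyword))
```

Note that `urlencode` escapes spaces and special characters, which the browser does for us when we type into the search bar.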
Step 7. Check the workflow
Now we need to check the workflow by clicking through the actions one by one, starting from the beginning of the rule.
Now we are going to scrape the search results!
Step 8. Set up pagination
Click the next page button and choose "Loop click the element".
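The pagination loop keeps clicking the next page button until there is no further page. A rough Python sketch of that control flow is shown below. The `fake_site` dictionary and the `iterate_pages` helper are made up for illustration and stand in for real result pages.

```python
def iterate_pages(pages: dict, start: str):
    """Yield the items of each page, following 'next' until none remains."""
    current = start
    seen = set()  # guard against a site that links back to an earlier page
    while current is not None and current not in seen:
        seen.add(current)
        page = pages[current]
        yield page["items"]
        current = page.get("next")


# Hypothetical two-page result set.
fake_site = {
    "page1": {"items": ["A", "B"], "next": "page2"},
    "page2": {"items": ["C"], "next": None},
}

all_items = [item for items in iterate_pages(fake_site, "page1") for item in items]
print(all_items)
```

The loop ends on the last page because that page has no next link, which mirrors the next page button no longer being available.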
Step 9. Create a list of items
- Click the first product link (make sure you select the A tag)
- Choose "Create a list of items" and choose "Add the current item into the list"
We will see the first link being added into the list. Then we need to continue to add items.
- Click "Continue to add items into the list"
- Click the second product link
- Choose "Add the current item into the list"
All the remaining items are added to the list as soon as you add the second one.
- Click "Finish the loop"
- Click "Loop"
Octoparse goes to the first link after you click "Loop".
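Selecting two product links is enough for the tool to pick up all similar links. How Octoparse infers the pattern is not described here, so the sketch below simply assumes the product links are A tags sharing a CSS class (the class name `product` and the sample HTML are invented). It uses Python's standard `html.parser` to collect every matching link.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag that carries a given CSS class."""

    def __init__(self, css_class: str):
        super().__init__()
        self.css_class = css_class
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return  # only A tags count, as in the first bullet of Step 9
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        if self.css_class in classes and attr_map.get("href"):
            self.links.append(attr_map["href"])


# Invented sample of a search result page.
sample_html = """
<ul>
  <li><a class="product" href="/item/1">First</a></li>
  <li><a class="product" href="/item/2">Second</a></li>
  <li><a class="product" href="/item/3">Third</a></li>
  <li><a class="nav" href="/help">Help</a></li>
</ul>
"""

collector = LinkCollector("product")
collector.feed(sample_html)
print(collector.links)
```

The navigation link is left out because it does not match the pattern shared by the product links.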
Step 10. Extract data
Just click on the data you need and choose "Extract Text". Rename the data field if you need to.
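Renaming data fields is just a mapping from the default names to more meaningful ones. As a small illustration, assuming default names such as `Field1` and `Field2` (placeholders, not necessarily what the tool generates), it could look like this in Python:

```python
def rename_fields(record: dict, new_names: dict) -> dict:
    """Return a copy of record with keys renamed; unknown keys are kept."""
    return {new_names.get(key, key): value for key, value in record.items()}


# One extracted row with placeholder field names and made-up values.
row = {"Field1": "USB hub", "Field2": "$12.99"}
renamed = rename_fields(row, {"Field1": "title", "Field2": "price"})
print(renamed)
```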
If this video tutorial is not available for you, you can click here to see the corresponding graphic tutorial.