How to Scrape Data by Searching Multiple Keywords on a Website?
Wednesday, April 27, 2016 8:05 AM
Download the .otd file before you get started. The extraction rule of this task is stored in this .otd file.
Welcome to Octoparse’s tutorial.
Sometimes you may want to scrape the search results on websites such as eBay or Amazon. In this video, I’m going to show you how to scrape data by searching multiple keywords on a website. I will use www.capterra.com as an example.
Step 1. Set basic information
Enter the task name and save your task to a category. Then click “Next” to go to the second step.
Step 2. Design Workflow
Copy the target URL, then enter it in the built-in browser.
Wait until the page has loaded, then drag and drop a “Loop” action into the Workflow Designer.
Then select a loop mode > Choose “Text list”.
In this way, you can search different keywords one by one. Enter the keywords you want to search; here I use “data” and “travel” as examples. Then click “OK” and click “Save”.
Next, click on the search bar of the website in the built-in browser. Choose “Enter text value”.
Now drag “Enter text value” into the “Loop Item” box so that the program enters each keyword in turn and searches for it automatically in the search bar.
Then select “Use current loop text to fill the text box” and click “Save”.
Click the “Search” button of the website > Choose “Click an item”.
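If you ever need to reproduce this loop in a script, entering each keyword in the search bar and clicking “Search” usually amounts to one GET request per keyword. The minimal Python sketch below only builds those request URLs and fetches nothing. It assumes, hypothetically, that the site’s search form submits to /search/ with a field named query; check the real form in your browser before relying on either name.

```python
from urllib.parse import urlencode, urljoin

# Hypothetical values: confirm the real form action and field name on the site.
BASE_URL = "https://www.capterra.com"
SEARCH_PATH = "/search/"
QUERY_FIELD = "query"


def search_urls(keywords):
    """Return one search URL per keyword, mirroring the Text list loop."""
    urls = []
    for keyword in keywords:
        # Counterpart of "Use current loop text to fill the text box"
        query = urlencode({QUERY_FIELD: keyword})
        # Counterpart of clicking "Search": submit the form as a GET request
        urls.append(urljoin(BASE_URL, SEARCH_PATH) + "?" + query)
    return urls


for url in search_urls(["data", "travel"]):
    print(url)
```

Using urlencode rather than plain string concatenation means a multi-word keyword such as “project management” is escaped correctly (it becomes project+management).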
Now check the workflow by clicking each action in order, starting from the beginning of the rule.
Go to the webpage > Loop Item > Enter text > Click Item.
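The order in which you click through the actions mirrors the nesting of the workflow. As a purely illustrative model (not Octoparse’s .otd format), the rule can be written as nested steps and walked depth-first. It assumes that both “Enter text” and “Click Item” sit inside “Loop Item”, so that the search runs once per keyword.

```python
# Hypothetical model: a step is either an action name or (name, [child steps]).
WORKFLOW = [
    "Go to the webpage",
    ("Loop Item", [
        "Enter text",
        "Click Item",
    ]),
]


def walk(steps, depth=0):
    """Yield (depth, action name) in the order the actions should be checked."""
    for step in steps:
        if isinstance(step, tuple):
            name, children = step
            yield depth, name
            # Actions inside a loop are checked right after the loop itself
            yield from walk(children, depth + 1)
        else:
            yield depth, step


for depth, name in walk(WORKFLOW):
    print("  " * depth + name)
```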
Now we are on the search result page. These are the products related to data or with the word “data” in their titles.
The information I want is on the product page, so I need to create a list of items to open each product page. Click on the first product title > Select “Create a list of items” > “Add current item to the list” > “Continue to edit the list”.
Click on the second product title > “Add current item to the list” again. Now all the products about data have been selected. Finish creating the list > then click “Loop”.
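Clicking two product titles lets Octoparse generalize from two examples to every similar title on the page. In a hand-written scraper, the same idea is usually expressed as a pattern that all the title links share. The sketch below uses Python’s built-in html.parser on a small made-up results page; the product-title class and the link paths are hypothetical placeholders, not Capterra’s real markup.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ProductLinkParser(HTMLParser):
    """Collect href values of <a> tags that carry a given CSS class."""

    def __init__(self, link_class):
        super().__init__()
        self.link_class = link_class
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        # Attributes without a value come through as None, so guard with "or"
        classes = (attributes.get("class") or "").split()
        if self.link_class in classes and attributes.get("href"):
            self.links.append(attributes["href"])


def product_links(html, base_url, link_class="product-title"):
    parser = ProductLinkParser(link_class)
    parser.feed(html)
    parser.close()
    # Resolve relative links so each one can be opened, like "Click Item" in the loop
    return [urljoin(base_url, href) for href in parser.links]


# Made-up sample page for illustration only
SAMPLE_RESULTS = """
<div class="result"><a class="product-title" href="/p/1/alpha/">Alpha Data</a></div>
<div class="result"><a class="product-title" href="/p/2/beta/">Beta Data</a></div>
<a class="ad" href="/sponsored/">Sponsored</a>
"""

for link in product_links(SAMPLE_RESULTS, "https://www.capterra.com/search/?query=data"):
    print(link)
```

Links without the shared class (the sponsored one here) are skipped, which is the scripted equivalent of selecting only the titles you added to the list.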
Now we’re on the product page. Next, extract the data. Click the product title and select “Extract text”. Then extract the company of the product in the same way.
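Each “Extract text” action corresponds to reading the text of one element on the product page. The sketch below pulls a title and a company name from a made-up HTML snippet using Python’s html.parser. The product-name and vendor-name classes are hypothetical, and the parser only captures the first matching element for each field.

```python
from html.parser import HTMLParser


class FieldExtractor(HTMLParser):
    """Collect the text of the first element carrying each field's CSS class."""

    def __init__(self, fields):
        super().__init__()
        self.fields = fields                     # output name -> CSS class (hypothetical)
        self.result = {name: None for name in fields}
        self._field = None                       # field being read right now
        self._tag = None                         # tag that opened that field
        self._depth = 0                          # nesting of that same tag
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if self._field is not None:
            if tag == self._tag:
                self._depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        for name, css_class in self.fields.items():
            if css_class in classes and self.result[name] is None:
                self._field, self._tag, self._depth = name, tag, 1
                self._chunks = []
                break

    def handle_endtag(self, tag):
        if self._field is not None and tag == self._tag:
            self._depth -= 1
            if self._depth == 0:
                # Join nested text and collapse runs of whitespace
                self.result[self._field] = " ".join("".join(self._chunks).split())
                self._field = None

    def handle_data(self, data):
        if self._field is not None:
            self._chunks.append(data)


def extract_product(html):
    # Hypothetical class names; inspect the real product page to find the actual ones.
    parser = FieldExtractor({"title": "product-name", "company": "vendor-name"})
    parser.feed(html)
    parser.close()
    return parser.result


# Made-up sample page for illustration only
SAMPLE_PRODUCT = """
<h1 class="product-name">Alpha <span>Data</span></h1>
<p>by <span class="vendor-name">Alpha Corp</span></p>
"""

print(extract_product(SAMPLE_PRODUCT))
```

A field that is missing from the page stays None, so a later export step can tell “not found” apart from an empty string.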
Now we’re done configuring the extraction rule. You can choose not to load images to speed up the extraction. Then click “Next”.
Choose “Local Extraction” to run the task on your computer.
The scraped data will be shown in this pane, along with the configured rule of the task. You can also check the built-in browser to see whether the task runs as expected.
Export the results to an Excel file or another format, and save the file to your computer.
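Octoparse handles the export itself. If you post-process the data in a script instead, Python’s standard csv module writes a file that Excel can open. The sketch below uses a placeholder row rather than real scraped results, and its column names are illustrative.

```python
import csv
import io

# Illustrative column names, one per extracted field plus the search keyword
FIELDNAMES = ["keyword", "title", "company"]


def write_rows(rows, file_obj):
    """Write extracted records as CSV, one row per product."""
    writer = csv.DictWriter(file_obj, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)


def export_csv(rows, path):
    # "utf-8-sig" adds a byte-order mark, which helps Excel detect UTF-8 text;
    # newline="" is what the csv module expects for files it writes.
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        write_rows(rows, f)


# Placeholder row for illustration, not real results
rows = [
    {"keyword": "data", "title": "Alpha Data", "company": "Alpha Corp"},
]
buffer = io.StringIO()
write_rows(rows, buffer)
print(buffer.getvalue())
```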
This is the data extracted.
You’ve seen how to extract data from the website quickly and effectively. Download Octoparse now and try to extract data yourself!