Thursday, August 16, 2018
Sometimes you may need to interact with a web page while extracting data. For example:
· You want to scrape data from a website that requires you to log in first, so you need to enter your username and password before you can access the data you want.
· You have a list of keywords to search for in a search box, but you don't want to enter them one by one.
In this tutorial, we will show you how to enter a single text value or multiple keywords on a web page with Octoparse.
1）Input a single keyword into the textbox
Entering text or a keyword in Octoparse is easy. With the built-in browser, you can interact with the web page by simply pointing and clicking, just as you would in any normal browser.
Let's go through the basic steps for entering text in Octoparse.
1. Select the input field on the page in the built-in browser
When you click on the input field in the built-in browser, Octoparse detects that you have selected a text box. The "Enter text" action will automatically appear in "Action Tips".
2. Select "Enter text"
Once you click on "Enter text", a text box appears in "Action Tips".
3. Enter the text/keyword
Type the text or keyword into the text box and click "OK".
The text you just entered also appears in the input field on the page in the built-in browser.
Octoparse confirms this with "Input Text Saved" in "Action Tips", and the "Enter text" action is added to the workflow.
2）Input multiple keywords into a search box
If you have a series of predefined text values, you can add them to a "Text list" to create a search loop. Octoparse will automatically enter every item in the list into the search box, one at a time.
Let's see how to create a loop in "Text list" mode to scrape data by searching for multiple keywords on a website.
The "Text list" mode is used for loops whose items are all text values to be typed into an input field. Octoparse has five loop modes: Variable List, Single Element, Fixed List, List of URLs, and Text List.
If you want to know more about these loop modes, see the Octoparse tutorials that cover each of them.
1. Drop a "Loop item" action into the Workflow designer
2. Go to "Loop Mode" and select "Text list"
3. Go to "Text list" below and click "A" to enter the keywords you want to search for in the text box
Click "OK" when you finish entering them. You can then see your keywords in the "Loop Item" box.
4. Click on the search box on the page in the built-in browser and select "Enter text" in "Action Tips"
5. Input the first keyword in your "Text list" in the text box
6. Drag the "Enter Text" action into the "Loop Item" in the Workflow designer
7. Click on the "Enter Text" action in the Workflow designer
Go to "Loop Text" and select "Use the text in Loop Item to fill in the text box"
8. Click the search button on the web page and select "Click button" in "Action Tips"
After you click "Click button", the "Click Item" action is added to the workflow.
9. Click "Save" to finish creating the "Text list" search loop.
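Octoparse builds this loop through its interface, not through code, but the behavior of the finished workflow can be sketched in a few lines of Python. This is only a conceptual model, not Octoparse's API or internal implementation: `FakePage`, `enter_text`, `click_search`, and `run_text_list_loop` are hypothetical names, and the keywords are made-up examples.

```python
from dataclasses import dataclass, field


@dataclass
class FakePage:
    """Hypothetical stand-in for a web page with a search box and a search button."""
    search_box: str = ""
    searches: list = field(default_factory=list)

    def enter_text(self, text: str) -> None:
        # Models the "Enter Text" action: replace the contents of the search box.
        self.search_box = text

    def click_search(self) -> None:
        # Models the "Click Item" action: submit whatever is in the search box.
        self.searches.append(self.search_box)


def run_text_list_loop(page: FakePage, text_list: list) -> list:
    """Run "Enter Text" and then "Click Item" once for each item in the text list."""
    for keyword in text_list:      # "Loop Item" in "Text list" mode
        page.enter_text(keyword)   # fill the text box with the current loop item
        page.click_search()        # click the search button
    return page.searches


if __name__ == "__main__":
    keywords = ["laptop", "tablet", "phone"]  # example keywords (assumption)
    print(run_text_list_loop(FakePage(), keywords))
```

The point of the sketch is the nesting: both actions sit inside the loop, so each keyword is entered and searched before the next one is picked up, which mirrors dragging "Enter Text" into the "Loop Item" box in step 6.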
Finally, don't forget to check the workflow.
Let's see how Octoparse enters these keywords into the search box and interacts with the website.
1. Click on the "Loop Item" box
You can see the keywords that you've just entered displayed in "Loop Item".
2. Select one keyword, and click on the "Enter Text" action
In the built-in browser, you can see that the selected word is entered in the search box.
3. Click on "Click Item"
Octoparse simulates real browsing activity as it clicks the search button. You can see the search results for the selected word on the web page in the built-in browser.