Step-by-step tutorials for you to get started with web scraping
Lesson 2: Getting to know Octoparse
Thursday, August 16, 2018
In this tutorial, we’ll introduce the user interface of Octoparse Version 7.X. By the end, you should know exactly where to start a new task, where to check your data when the extraction is done and, most importantly, where to get help when you need it. Getting familiar with the Octoparse UI is an essential first step toward a successful scraping experience. Let’s take a quick tour of Octoparse V7.0!
The Octoparse user interface has two main parts: the sidebar navigation and the main screen. Clicking any item in the sidebar navigation menu opens a new tab on the main screen.
Dashboard is the main console where you’ll manage all your tasks: starting or stopping them and setting their schedules. You can also see the progress of any running task and easily access the extracted data here.
1. Click the rename icon to rename the task.
2. Start, delete or export tasks in batches using the batch task management controls at the bottom.
Tools provides extra help with XPath generation, regular expressions, export to database and the data API.
Tutorials includes a wealth of learning material covering all features in Octoparse, as well as many step-by-step tutorials for scraping high-profile websites.
Data Service takes care of your data scraping requests if you are looking for additional help such as task configuration service or data delivery service.
Contact support with any questions about getting data with Octoparse or any other data scraping inquiries.
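For readers new to the two techniques named under Tools, here is a minimal Python sketch of what XPath and regular expressions do in a scraping context. It is not Octoparse's own tooling: the HTML fragment, the `extract_items` function and the field names are all hypothetical, and it uses only the Python standard library, whose XPath support is a limited subset.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, well-formed HTML fragment standing in for a scraped page.
PAGE = """
<html><body>
  <div class="item"><h2>Blue Widget</h2><span class="price">Price: $19.99</span></div>
  <div class="item"><h2>Red Widget</h2><span class="price">Price: $24.50</span></div>
</body></html>
"""


def extract_items(page: str) -> list[dict]:
    """Locate elements with XPath, then clean their text with a regex."""
    root = ET.fromstring(page)
    items = []
    # XPath: select every <div> whose class attribute is exactly "item".
    for div in root.findall(".//div[@class='item']"):
        name = div.findtext("h2")
        price_text = div.findtext("span[@class='price']")
        # Regular expression: pull the numeric part out of "Price: $19.99".
        match = re.search(r"\d+(?:\.\d+)?", price_text or "")
        price = float(match.group()) if match else None
        items.append({"name": name, "price": price})
    return items


if __name__ == "__main__":
    for item in extract_items(PAGE):
        print(item)
```

The XPath expression picks out which elements to capture, while the regular expression trims the captured text down to the part you actually want.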
1. To check your account status and expiration date, just hover over your account username.
2. Right below the account username are two handy icons: click one to start a new task, or the other to modify account settings.
3. Click the collapse icon to collapse the side menu.
4. You can set Workflow Mode as the default mode when starting a new task in the account settings.
Now, let’s quickly start a new task and check out the task configuration interface.
1) The Select Mode
The Octoparse Select Mode is new in Version 7.0 and is specifically designed for capturing any web data easily with simple clicks. All you need to do is click on the data field you want to capture and select the appropriate action from the Action panel, whether it is capturing the text or building a list. Once you’ve clicked on an element on the page, Octoparse predicts the data you might want to capture and provides you with all the available actions to choose from.
1. Click the minimize icon to minimize the Action panel.
The Octoparse Select Mode gives you an easy start to any web scraping job, but what if you want to see how the task was set up from the beginning, or check whether the previous step was added correctly? You can do this by switching to the Workflow Mode.
2. Switch between the Select Mode and the Workflow Mode using the on/off toggle located at the upper right corner.
2) The Workflow Mode
The Workflow Mode offers far more flexibility: each step in the workflow can be further customized to accomplish its action, for example by adding wait time or adjusting for AJAX.
The Workflow Designer shows explicitly how one action is connected to the next. All extraction actions can be dragged and added to the workflow manually. By clicking through each step in the workflow, you can easily see how Octoparse interacts with the website and whether the target data fields can be extracted as expected.
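To make the idea of a workflow more concrete, the sketch below models it in Python as an ordered list of steps, each with optional wait time and AJAX settings. This is only an illustration of the concept. It does not reflect how Octoparse stores or runs tasks, and every name in it (`Step`, `Workflow`, `wait_seconds`, `ajax_timeout`, the action labels and the example URL) is an assumption made up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One action in a workflow (hypothetical model, not Octoparse's format)."""
    action: str                 # e.g. "go_to_page", "click", "extract"
    target: str = ""            # URL or a description of the page element
    wait_seconds: float = 0.0   # pause before the step runs
    ajax_timeout: float = 0.0   # how long to wait for AJAX content; 0 = none


@dataclass
class Workflow:
    """An ordered chain of steps; each step runs after the previous one."""
    steps: list[Step] = field(default_factory=list)

    def add(self, step: Step) -> "Workflow":
        self.steps.append(step)
        return self  # returning self allows chained .add(...) calls

    def describe(self) -> list[str]:
        lines = []
        for number, step in enumerate(self.steps, start=1):
            line = f"{number}. {step.action} {step.target}".rstrip()
            if step.wait_seconds:
                line += f" (wait {step.wait_seconds}s)"
            if step.ajax_timeout:
                line += f" (AJAX timeout {step.ajax_timeout}s)"
            lines.append(line)
        return lines


if __name__ == "__main__":
    workflow = (
        Workflow()
        .add(Step("go_to_page", "https://example.com/list"))
        .add(Step("click", "Next page button", ajax_timeout=5.0))
        .add(Step("extract", "title", wait_seconds=2.0))
    )
    print("\n".join(workflow.describe()))
```

Reading the printed steps top to bottom mirrors clicking through each step in the Workflow Designer to check what happens at each stage.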
Now you are all set to start getting some data with Octoparse.
Most popular tutorials
- Extract multiple pages through pagination
- Scraping info from Craigslist
- Scraping search results from Google Scholar
- Scraping restaurant info from Grubhub
- Scrape product images from eBay