
What’s New in Octoparse 7.1

We’re pleased to announce the release of Octoparse version 7.1.0!

This release introduces a brand-new feature, Task Templates: ready-to-use tasks for extracting data from popular websites such as Amazon, Yelp, and Tripadvisor. It also includes three major updates, covering the dashboard, the URL input features, and the anti-blocking settings.

New

· Task Templates

Octoparse’s new Task Templates are designed to make web scraping easier and more accessible for anyone. With pre-built task templates, there’s no need to configure the scraping tasks. The ready-to-use task templates will shorten your learning curve and help you quickly get on board.

– How does it facilitate scraping?

With Task Templates, anyone with little or no programming knowledge can scrape the web easily. All you need to do is enter a few parameters (target page URL, search keywords, etc.), then sit back and relax!

1. Dozens of ready-to-use templates covering the most popular websites across different industries

2. Rich built-in data fields

3. Sample output preview

– How does it work?

After selecting a template, you will be prompted to enter the required parameters, such as the search keywords or the target URLs. The scraper then collects the data from the website on its own.

Updates

· Dashboard upgraded

Compared to the Dashboard in version 7.0, the new Dashboard layout is more informative, customizable, and efficient.

In version 7.1, you can completely change the look of your dashboard and the display order of your tasks.

1. Customizable information columns

A selection of columns is provided so you can decide which task information you’d like to see.

2. Two default view modes

By default, tasks are sorted by group on the dashboard. By switching the view mode, you can sort tasks by last executed time in descending order.

3. Efficient custom filters

With the upgraded custom filters, you can build your own dashboard view with very little effort, or narrow it down to a single task or a specific cluster of tasks.

· URL input upgraded

We’ve raised the URL input limit from 20,000 to 1,000,000 and introduced two new input methods for large-scale data extraction projects.

1. Increased maximum input quota of URLs

The maximum number of URLs that can be input at once has been raised significantly. Compared to the previous limit of 20K URLs, Octoparse now supports adding up to 1 million URLs to any single task/crawler.

  • Tip: Note that the paste-in method for inputting URLs is limited to 10K URLs.

2. Batch import URLs from files or another task

– Import URLs from files

In version 7.1, you can import a CSV, TXT, or Excel file, and Octoparse will intelligently read the URL data from the file.
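To give a rough idea of what reading URL data from a file involves, here is a minimal Python sketch. It is not Octoparse's implementation: the function names `is_url` and `read_urls` are hypothetical, and the sketch only handles TXT and CSV files (Excel would need a third-party library).

```python
import csv
from pathlib import Path
from urllib.parse import urlparse


def is_url(value: str) -> bool:
    """Return True if value looks like an absolute http(s) URL."""
    parsed = urlparse(value.strip())
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def read_urls(path: str) -> list[str]:
    """Collect unique URLs from a TXT file (one per line) or a CSV file (any column)."""
    file_path = Path(path)
    urls: list[str] = []
    seen: set[str] = set()
    with file_path.open(newline="", encoding="utf-8") as handle:
        if file_path.suffix.lower() == ".csv":
            # Scan every cell, so the URL column does not need to be named.
            cells = (cell for row in csv.reader(handle) for cell in row)
        else:
            cells = (line for line in handle)
        for cell in cells:
            candidate = cell.strip()
            if is_url(candidate) and candidate not in seen:
                seen.add(candidate)
                urls.append(candidate)
    return urls
```

Scanning every cell rather than a fixed column is one simple way to tolerate header rows and mixed content; a real importer would likely be more selective.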

– Import URLs from tasks

Two options are supported. One is simple import, which imports URLs directly from a completed task. The other is advanced import, which “transfers” URLs from a parent task into a child task when the two tasks run in association.

When two tasks are associated, Octoparse provides four execution options. For example, if you select “Run task as soon as its parent task starts”, then as soon as Octoparse reads a URL extracted by the parent task, it automatically transfers the URL to the child task and sets the child task to execute.

Tips:

  • 2. If the parent task has not extracted any data yet, you’ll need to manually paste in one URL before you can start configuring the child task.
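The “Run task as soon as its parent task starts” behavior can be pictured as a producer/consumer pipeline. The sketch below is only an illustration of that idea in Python, not how Octoparse works internally; `parent_task`, `child_task`, and `run_associated` are hypothetical names, and the "scraping" is replaced by a placeholder string.

```python
import queue
import threading

_DONE = object()  # sentinel that marks the end of the parent task


def parent_task(pages, out_queue):
    """Hypothetical parent task: hands over each extracted URL as soon as it is found."""
    for page in pages:
        for url in page:
            out_queue.put(url)
    out_queue.put(_DONE)


def child_task(in_queue, results):
    """Hypothetical child task: processes URLs while the parent is still running."""
    while True:
        url = in_queue.get()
        if url is _DONE:
            break
        results.append(f"scraped {url}")  # placeholder for real extraction


def run_associated(pages):
    """Start the child first, then run the parent, so URLs flow across immediately."""
    urls = queue.Queue()
    results = []
    child = threading.Thread(target=child_task, args=(urls, results))
    child.start()
    parent_task(pages, urls)
    child.join()
    return results
```

Because the child starts before the parent finishes, it never needs the full URL list up front, which is the point of associating the two tasks.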

3. Batch generate URLs based on a pre-defined pattern

This feature lets you modify one or more parameters in a given URL to generate a list of URLs that follow the same pattern.

Highlight the parameter you want to vary, click “Add parameter”, and select one of the four options to define the pattern you need.
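For readers who want to see the idea in code, here is a minimal Python sketch of pattern-based URL generation. It is an illustration under assumptions: the `generate_urls` function, the `{param}` placeholder, and the example.com URL are all hypothetical, and it shows only a numeric range and a custom value list rather than Octoparse's actual four options.

```python
def generate_urls(template: str, values) -> list[str]:
    """Substitute each value for the {param} placeholder in the template URL."""
    return [template.format(param=value) for value in values]


# A numeric range, e.g. result pages 1 to 3 (assumed pattern type).
pages = generate_urls("https://example.com/search?page={param}", range(1, 4))

# A custom list of values, e.g. search keywords (assumed pattern type).
keywords = generate_urls("https://example.com/search?q={param}", ["laptop", "phone"])
```

The template must not contain other curly braces, since `str.format` would treat them as placeholders.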

· Anti-blocking settings upgraded

We have added two options to help reduce the chance of getting blocked by scraping-sensitive websites. In version 7.1, Octoparse can automatically switch the user agent (UA) and clear cookies for you.

1. Auto-switch browser (User agent)

2. Auto-clear cookies
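As a rough illustration of what these two options mean at the HTTP level, the Python sketch below rotates the User-Agent header and clears stored cookies before each request. It uses only the standard library and is not Octoparse's implementation: the `RotatingSession` class and the user agent strings are hypothetical, and switching on every request is an assumption made for simplicity.

```python
import itertools
import urllib.request
from http.cookiejar import CookieJar

# Placeholder user agent strings, not real browser identifiers.
USER_AGENTS = ["ExampleBrowser/1.0", "ExampleBrowser/2.0"]


class RotatingSession:
    """Hypothetical helper that switches user agent and clears cookies per request."""

    def __init__(self, user_agents):
        self._agents = itertools.cycle(user_agents)
        self.cookies = CookieJar()
        self._opener = urllib.request.build_opener(
            urllib.request.HTTPCookieProcessor(self.cookies)
        )

    def build_request(self, url):
        """Clear stored cookies and attach the next user agent in the rotation."""
        self.cookies.clear()
        headers = {"User-Agent": next(self._agents)}
        return urllib.request.Request(url, headers=headers)

    def fetch(self, url, timeout=10):
        """Send the request through the cookie-aware opener and return the body."""
        with self._opener.open(self.build_request(url), timeout=timeout) as response:
            return response.read()
```

A site that fingerprints visitors by cookie or user agent then sees each request as coming from a fresh browser, which is the effect the two settings aim for.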
