
Conglomerate Data in Octoparse

Wednesday, August 30, 2017 10:00 AM

The "Merge multiple rows" feature combines data from several rows into a single row.

Suppose you need to extract an article from a blog. Because the article is made up of separate paragraphs, you may not be able to select it as one element, and the extracted paragraphs end up in different rows. In most cases, though, you want all the paragraphs in one single row.


This is where the "Merge multiple rows" feature helps: it combines the extracted data into a single row. Let's walk through an example.


Here we use the blog post at https://philipyancey.com/a-view-from-abroad as an example.


1) Select the desired data to extract

1. Click the first paragraph of the article and choose "Select all" in the Tips panel. A Loop Item will be created to extract every paragraph of the post.

2. Select "Extract text of the selected elements".
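To picture what the Loop Item produces, here is a minimal Python sketch (not Octoparse's implementation) that pulls the text of every <p> element out of an HTML snippet and emits one row per paragraph. The sample HTML and the names ParagraphCollector and extract_paragraph_rows are hypothetical; only the field name "Field 1" comes from this tutorial.

```python
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Collects the text content of every <p> element, in document order."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._in_p = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "p" and self._in_p:
            text = "".join(self._buffer).strip()
            if text:  # skip paragraphs that contain only whitespace
                self.paragraphs.append(text)
            self._in_p = False

    def handle_data(self, data):
        # Text inside nested tags such as <b> is still collected for the paragraph.
        if self._in_p:
            self._buffer.append(data)


def extract_paragraph_rows(html, field="Field 1"):
    """Return one row (a dict) per paragraph, the way a Loop Item yields one row per item."""
    collector = ParagraphCollector()
    collector.feed(html)
    collector.close()
    return [{field: text} for text in collector.paragraphs]


sample = "<article><p>First paragraph.</p><p>Second <b>bold</b> one.</p></article>"
for row in extract_paragraph_rows(sample):
    print(row)
```

Each paragraph becomes its own row, which is exactly the situation the merge step below is meant to fix.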

2) Merge the extracted data

1. Click the Extract Data action and go to the Data Preview panel.

2. Click the field's options icon and select "Merge multiple rows of data into one".
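Conceptually, "Merge multiple rows of data into one" collapses the per-paragraph rows into a single row, field by field. The Python sketch below is an illustration only: merge_rows is a hypothetical helper, and it assumes the values are joined with a newline, since the separator Octoparse actually uses is not stated in this tutorial.

```python
def merge_rows(rows, fields, sep="\n"):
    """Collapse many rows into one, joining each field's values in row order.

    Rows that lack a field (or hold an empty value) are skipped for that field.
    `sep` is an assumed separator, not a documented Octoparse setting.
    """
    merged = {}
    for field in fields:
        values = [row[field] for row in rows if row.get(field)]
        merged[field] = sep.join(values)
    return merged


rows = [
    {"Field 1": "First paragraph."},
    {"Field 1": "Second paragraph."},
    {"Field 1": "Third paragraph."},
]
print(merge_rows(rows, ["Field 1"]))
```

The fields parameter mirrors the fact that the merge setting is applied per field.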

You are all set! Run the task and check the exported data: the paragraphs captured in "Field 1" are now merged into a single row as one big chunk.
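If a run was already exported without this setting, a similar result can be produced afterwards. The Python sketch below assumes a CSV export whose header row contains a "Field 1" column and joins the values with a space; the function name merge_csv_column, the file name export.csv, and the separator are hypothetical choices, and an in-memory string stands in for the real file.

```python
import csv
import io


def merge_csv_column(csv_file, column="Field 1", sep=" "):
    """Read a CSV export and merge every non-empty value of `column` into one string."""
    reader = csv.DictReader(csv_file)
    values = []
    for row in reader:
        # Short rows get None for missing columns, so guard before stripping.
        value = (row.get(column) or "").strip()
        if value:
            values.append(value)
    return sep.join(values)


# Stand-in for an exported file; with a real (hypothetical) export you would use
# open("export.csv", newline="", encoding="utf-8") instead.
export = io.StringIO('Field 1\n"First paragraph."\n"Second paragraph."\n\n"Third paragraph."\n')
print(merge_csv_column(export))
```

Merging inside Octoparse is usually simpler; this is only a fallback for data you have already exported.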

Tip!

1. "Merge multiple rows of data into one" is especially useful for extracting articles from websites: you get the article as one whole chunk, without other elements such as blank lines, comments, or images.

2. Once the data are merged into one big chunk, you can use the Data reformat tools to add a prefix or suffix, such as "|" or "\", to reformat the data.

3. If there are multiple fields to extract, you need to set up "Merge multiple rows of data into one" for each field separately.

4. This feature can also merge two fields into one. Add two Extract Data actions to the workflow, each extracting one field, give both fields the same name, and enable "Merge multiple rows of data into one" for them. The data scraped into the two fields will then be merged into one cell.
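Tip 4 can be pictured with a short Python sketch. It is an illustration under stated assumptions, not how Octoparse works internally: merge_same_named_fields is a hypothetical helper, each Extract Data action is modeled as a list of row dicts, values are assumed to be joined in action order (first action first), and the newline separator is an assumption.

```python
from collections import defaultdict


def merge_same_named_fields(*actions, sep="\n"):
    """Merge the rows of several Extract Data actions, keyed by field name.

    Values of fields that share a name are joined, in order, into a single cell.
    Empty values are skipped. The ordering and `sep` are assumptions.
    """
    cells = defaultdict(list)
    for rows in actions:
        for row in rows:
            for field, value in row.items():
                if value:
                    cells[field].append(value)
    return {field: sep.join(values) for field, values in cells.items()}


# Two hypothetical Extract Data actions whose fields were both named "Content".
intro = [{"Content": "Intro text."}]
body = [{"Content": "Body paragraph 1."}, {"Content": "Body paragraph 2."}]
print(merge_same_named_fields(intro, body))
```

Because the grouping is by field name, any fields with different names stay in separate cells, which matches tip 3: each field is merged on its own.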

Happy Data Hunting!

Author: The Octoparse Team
