Conglomerate Data in Octoparse
Wednesday, August 30, 2017 10:00 AM
The "Merge multiple rows" feature can be used to easily combine data of different rows into ONE single row.
Suppose you need to extract an article from a blog. In some cases, you cannot select the entire article at once because it is split into separate paragraphs, yet you still want all the paragraphs in one single row rather than each paragraph in its own row.
This is the perfect time to take advantage of the "Merge multiple rows" feature for combining the extracted data into one single row of data. Let's see how to get this done with an example.
Here we use blog content from https://philipyancey.com/a-view-from-abroad to demonstrate.
1) Select the desired data to extract
1. Click on the first paragraph of the article and choose "Select all" on the Tips panel. A Loop Item will be created to extract every paragraph of the post.
2. Select "Extract text of the selected elements"
2) Merge the extracted data
1. Click on the Extract Data action and go to the Data Preview panel
2. Click on , and select Merge multiple rows of data into one
You are all set! Let's run the task and see what the actual exported data looks like. You can see that paragraphs captured in "Field 1" are now merged into a single row as one big chunk.
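Conceptually, the merge amounts to joining the per-row values of one field into a single string. The Python sketch below only illustrates that idea outside of Octoparse. The function name merge_rows, the sample rows, and the newline separator are all assumptions made for illustration; they are not Octoparse's actual implementation or export format.

```python
def merge_rows(rows, field, separator="\n"):
    """Join the values of `field` from every row into one single-row result.

    Hypothetical helper: it mimics the effect of "Merge multiple rows of
    data into one" on a list of dicts. The separator is an assumption.
    """
    # Skip rows where the field is missing or empty.
    values = [row[field] for row in rows if row.get(field)]
    return [{field: separator.join(values)}]


# Made-up sample: one extracted paragraph per row, all in "Field 1".
rows = [
    {"Field 1": "First paragraph."},
    {"Field 1": "Second paragraph."},
    {"Field 1": "Third paragraph."},
]

merged = merge_rows(rows, "Field 1")
print(len(merged))           # a single row remains
print(merged[0]["Field 1"])  # all paragraphs in one cell
```

The input has one row per paragraph, while the result has exactly one row whose "Field 1" cell holds every paragraph in its original order.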
Tip!
1. "Merge multiple rows of data into one" is especially useful for extracting articles from any website. You can extract the article as one whole chunk with no other elements like blank lines, comments, or images.
2. When the data are conglomerated as one big chunk, you can further use Data reformat tools.
3. If there are multiple fields to extract, you would need to set up "Merge multiple rows of data into one" for every field.
4. This feature can also be used to merge two fields. Use two Extract Data actions in the workflow, with one field in each Extract Data action, then give the fields the same name and set "Merge multiple rows of data into one" for both. As a result, the data scraped in the two fields will be merged into one cell.
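The last tip, merging two fields by giving them the same name, can be pictured the same way. This is a minimal Python sketch under stated assumptions: the helper merge_same_named_field, the field name "Article", the sample values, and the newline separator are hypothetical and only model the described outcome, not how Octoparse works internally.

```python
def merge_same_named_field(extractions, field, separator="\n"):
    """Merge one field across several extraction results into a single cell.

    Hypothetical helper: `extractions` is a list of row lists, one per
    Extract Data action, and every row is a dict keyed by field name.
    """
    values = []
    for rows in extractions:
        # Collect non-empty values in the order the actions appear.
        values.extend(row[field] for row in rows if row.get(field))
    return {field: separator.join(values)}


# Made-up sample: two Extract Data actions that share the field name "Article".
first_action_rows = [{"Article": "Post title"}]
second_action_rows = [
    {"Article": "First paragraph."},
    {"Article": "Second paragraph."},
]

cell = merge_same_named_field([first_action_rows, second_action_rows], "Article")
print(cell["Article"])
```

Because both actions write to a field with the same name, their values end up in one cell, in the order the actions appear in the workflow.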
Happy Data Hunting!
Author: The Octoparse Team