
Making a Simple Web Scraper with Octoparse

Tuesday, February 18, 2020

To make good use of content published on the web, we can extract it, legally and for legitimate purposes. This process is called web scraping, and the tool that performs the extraction is called a web scraper.

If we don’t know how to program, we usually copy and paste web content by hand. This traditional method is extremely time-consuming and inefficient. Besides, the information on a web page is stored in different places in the markup: some of it within an HTML tag, some of it within an HTML attribute. It is therefore better for non-programmers to use web scraping software that can grab exactly the content you want from a website and feed the data into your own system or database.
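To make the difference between tag content and attribute content concrete, here is a minimal Python sketch using only the standard library. The sample markup, the LinkCollector class, and the extract_links helper are all made up for illustration; this is not how Octoparse works internally, just one way to see where a title (tag text) and a URL (the href attribute) live in a page.

```python
from html.parser import HTMLParser

# Hypothetical sample markup, not taken from any real page.
SAMPLE_HTML = """
<ul>
  <li><a href="/tutorial/first">First tutorial</a></li>
  <li><a href="/tutorial/second">Second tutorial</a></li>
</ul>
"""


class LinkCollector(HTMLParser):
    """Collect (text, href) pairs from every <a> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._in_link = False
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # The URL lives in an attribute of the tag.
            self._in_link = True
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # The title lives in the text between the opening and closing tag.
        if self._in_link:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_link:
            self.links.append(("".join(self._text).strip(), self._href))
            self._in_link = False


def extract_links(html):
    """Return a list of (title, url) tuples found in the given HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    return parser.links


links = extract_links(SAMPLE_HTML)
for title, url in links:
    print(title, url)
```

A point-and-click tool saves you from writing this kind of parser yourself, but the data it picks up comes from the same two places.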

(picture from neerajkumar.name)

 

I assume you are reading this article because you currently extract data from websites manually and are thinking of making a simple web scraper. In fact, it’s easy to make one with automated web data extraction software, and you don't even need to know how to write code. All you need is to pick the right tool. So with so many web data extraction tools available, how do you choose the best one?

 

What is the first thing that comes to mind? Ideally, the software should be free. In that case, Octoparse is a great option: it is a powerful automated data extraction tool that offers advanced features to help you extract the text in HTML documents.

 

It is easier to understand how a web scraper works if you know the structure of a web page. Let’s make a simple web scraper using an older version of Octoparse, extracting the titles and URLs of all the case tutorials from octoparse.com.

 

Check out the latest version of this article with Octoparse 7.X: How to Build a Web Crawler – A Guide for Beginners

 

Step 1. Download Octoparse and launch it. Choose the Wizard Mode and click on the “Start” button.

 

Step 2. Click on the “Create” button under “List and Detail Extraction”, then enter the basic information for the web scraper.

 

Step 3. Enter the URL from which we want to pull data.

Step 4. Click any two items on the web page, then click on the “Next” button.

 

Step 5. Check the “Enable pagination” option, go to the bottom of the web page, and click the “Next Page” link four times, then click on the “Next” button. Octoparse will take you to the tutorial detail page.

 

 

Step 6. Click the content you want from the tutorial, and click on the “Next” button.

 

Step 7. Now you are done making a simple web scraper! Click “Local Extraction” to begin extracting data from octoparse.com.

 

The extraction results appear in the Data Extracted pane. You can export the data if needed.
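For readers curious about what a list-and-detail task with pagination amounts to, the steps above can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the LIST_PAGES and DETAIL_PAGES dictionaries stand in for real web pages, the paths are invented, and the scrape and to_csv helpers are hypothetical names. It is not Octoparse's implementation, only a rough model of "follow the next-page link, open each item, collect a row, export".

```python
import csv
import io

# Hypothetical in-memory stand-in for a paginated tutorial list.
# Each list page holds detail-page URLs plus a link to the next list page.
LIST_PAGES = {
    "/list/1": {"details": ["/t/a", "/t/b"], "next": "/list/2"},
    "/list/2": {"details": ["/t/c"], "next": None},
}

# Hypothetical detail pages, reduced to the one field we want (the title).
DETAIL_PAGES = {
    "/t/a": "Tutorial A",
    "/t/b": "Tutorial B",
    "/t/c": "Tutorial C",
}


def scrape(start_url, max_pages=5):
    """Walk list pages through their next link and visit each detail page."""
    rows = []
    url = start_url
    pages_seen = 0
    # Stop when there is no next page or the page limit is reached.
    while url is not None and pages_seen < max_pages:
        page = LIST_PAGES[url]
        for detail_url in page["details"]:
            rows.append({"title": DETAIL_PAGES[detail_url], "url": detail_url})
        url = page["next"]
        pages_seen += 1
    return rows


def to_csv(rows):
    """Serialize the collected rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["title", "url"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = scrape("/list/1")
csv_text = to_csv(rows)
print(csv_text)
```

The max_pages limit plays the same role as capping the number of "Next Page" clicks in Step 5: it keeps the task from running indefinitely on a long list.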

 

Conclusion

In this tutorial, we made a simple web scraper with Octoparse in a few minutes. Since much of the data that can bring valuable insight sits on complex websites, you can explore Octoparse further: try making a web scraper that collects semi-structured data and converts it into structured data for further processing. Happy scraping!

 

 

 

 Author: The Octoparse Team 


 

More Resources

 

Top 20 Web Scraping Tools to Scrape the Websites Quickly

Top 30 Big Data Tools for Data Analysis

Web Scraping Templates Take Away

How to Build a Web Crawler - A Guide for Beginners

Video: Create Your First Scraper with Octoparse 7.X

 

 

 
