Extract Data from Websites in Seconds with Smart Mode: No Coding, No Training

9/13/2016 1:30:32 AM

Extract data from websites in seconds? Sounds impossible? Octoparse’s Smart Mode lets you do it with just one click.

It’s very simple: paste the URL of the page you want to scrape and click the “SMART” button. Smart Mode’s algorithms do the rest automatically, with no extra input, training, or annotation required from the user.

Introduction to Smart Mode

Let’s take Booking as an example.

Step 1: Paste the URL of the page you want to extract data from into the text box and click “SMART”.

Step 2: The extracted information now appears in a table. You can remove the rows you don’t need or rename the headers to make the data easier to read.

Step 3: If you need data from more than one page, click “Next Page”. You can then export the extracted data in Excel format for further analysis.

Step 4: To collect more data from the website, switch to Local Extraction or Cloud Extraction. Smart Mode creates the extraction rule automatically, and you can edit that rule in Wizard Mode or Advanced Mode.

Pros and Cons of Smart Mode

Pros

  • Free

  • Faster and easier for basic scraping work, and the data can be exported in a structured format.

  • It requires no coding or training.

  • It can handle pages that rely on JavaScript.

  • You can switch to Wizard Mode or Advanced Mode for further data extraction.

Cons

  • It works best on list or table pages with more than one row of data, such as search results pages and category pages.

  • It can’t deal with complex websites.

  • Extraction is limited to 3 pages.
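
Octoparse doesn’t describe how Smart Mode detects the data on a page. Purely as an illustration, the hypothetical Python sketch below uses one common heuristic (an assumption, not necessarily Octoparse’s method): find the element whose direct children most often repeat the same tag and class. It also shows why list pages suit automatic detection: a single-record page has no repeated pattern to latch onto. The function names, class names and sample HTML are all made up for this example.

```python
from collections import Counter
from html.parser import HTMLParser

# Elements that never have a closing tag in HTML.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class RepeatedBlockFinder(HTMLParser):
    """Hypothetical helper: counts how often each (tag, class) signature
    appears as a direct child of each element."""

    def __init__(self):
        super().__init__()
        self.stack = [(0, None)]   # open elements as (element id, tag); 0 is a virtual root
        self.labels = {0: "root"}  # element id -> readable label such as "div.results"
        self.counts = Counter()    # (parent id, child signature) -> occurrences
        self.next_id = 1

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class") or ""
        parent_id = self.stack[-1][0]
        self.counts[(parent_id, (tag, cls))] += 1
        if tag in VOID_TAGS:
            return  # void elements have no children, so never push them
        node_id = self.next_id
        self.next_id += 1
        self.labels[node_id] = f"{tag}.{cls}" if cls else tag
        self.stack.append((node_id, tag))

    def handle_endtag(self, tag):
        # Pop back to the most recent matching open tag; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i][1] == tag:
                del self.stack[i:]
                break


def find_record_list(html, min_rows=2):
    """Return (container label, row signature, row count), or None if nothing repeats."""
    finder = RepeatedBlockFinder()
    finder.feed(html)
    finder.close()
    if not finder.counts:
        return None
    (parent_id, signature), rows = finder.counts.most_common(1)[0]
    if rows < min_rows:
        return None  # no repeated pattern, e.g. a single-record detail page
    return finder.labels[parent_id], signature, rows


# Made-up markup loosely resembling a search results page.
listing = """
<div class="results">
  <div class="hotel"><h3>Hotel A</h3><span class="price">$120</span></div>
  <div class="hotel"><h3>Hotel B</h3><span class="price">$95</span></div>
  <div class="hotel"><h3>Hotel C</h3><span class="price">$140</span></div>
</div>
"""
# Made-up markup for a single-record detail page.
detail = "<div class='hotel'><h1>Hotel A</h1><p>Free parking</p></div>"

print(find_record_list(listing))  # ('div.results', ('div', 'hotel'), 3)
print(find_record_list(detail))   # None
```

On the made-up results snippet the sketch picks the `div.results` container with three `div.hotel` rows; on the single-hotel page nothing repeats, so it returns `None`. A real tool would also have to deal with pagination, JavaScript rendering and mapping rows to fields, which this sketch ignores.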

Conclusion

Although there is still plenty of room for improvement in simplifying data extraction from complicated websites, Smart Mode is a good data extraction technique.