Extract Data from Websites in Seconds -- Smart Mode, No Coding, No Training
Tuesday, September 13, 2016
Extracting data from websites in seconds may sound impossible, but Octoparse’s Smart Mode lets you extract data with just one click.
It’s very simple. Just paste the URL of the site you want to scrape and click the “SMART” button. The algorithms do the rest automatically, without any extra input, training or annotation from the user.
Introduction to Smart Mode
Let’s take Booking as an example.
Step 1: Paste the URL of the site you want to extract data from into the text box and click “SMART”.
Step 2: The information you want now appears in a table. You can remove the rows you don’t want or modify the headers to make the table easier to read.
Step 3: If you want more than one page, click “Next Page”. You can then export the extracted data in Excel format for easy analysis.
Step 4: If you want to collect more data from the website, switch to Local Extraction or Cloud Extraction. Smart Mode creates the extraction rule automatically, and the rule can also be edited in Wizard Mode or Advanced Mode.
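For readers who prefer to tidy the table after export rather than in the tool, the clean-up from Step 2 (dropping unwanted rows and renaming headers) could be sketched in Python as follows. This is only an illustration under stated assumptions: it assumes the exported table has been saved as CSV text (the steps above only mention Excel export), and the function name clean_table and the column names Field1, Field2, Hotel and Price are hypothetical.

```python
import csv
import io


def clean_table(csv_text, rename=None, drop_if_empty=None):
    """Return the rows of a CSV table as dicts, with optional clean-up.

    rename        -- mapping of old header name -> new header name (hypothetical names)
    drop_if_empty -- original header name; rows with no value in it are skipped
    """
    rename = rename or {}
    reader = csv.DictReader(io.StringIO(csv_text))
    cleaned = []
    for row in reader:
        # Skip rows that have no value in the chosen column.
        if drop_if_empty and not (row.get(drop_if_empty) or "").strip():
            continue
        # Apply the header renames, leaving other headers unchanged.
        cleaned.append({rename.get(key, key): value for key, value in row.items()})
    return cleaned


# Made-up sample data standing in for an exported table.
sample = "Field1,Field2\nHotel A,120\n,95\nHotel B,80\n"
rows = clean_table(
    sample,
    rename={"Field1": "Hotel", "Field2": "Price"},
    drop_if_empty="Field1",
)
```

Here rows keeps only the two entries that have a hotel name, with the generic headers replaced by readable ones.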
Pros and Cons of Smart Mode
Pros:
It is faster and easier for basic scraping work, and the data can be exported in a structured format.
It doesn’t need any coding or training.
It can switch to Wizard Mode or Advanced Mode for further data extraction.
It works best on list/table pages with more than one row of data, such as search results pages and category pages.
Cons:
It can’t deal with complex websites.
It is limited to 3 pages.
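To illustrate why list and table pages with several rows suit this kind of automatic extraction, here is a minimal Python sketch of one possible heuristic: find the group of sibling elements that repeats most often under a single parent. This is not Octoparse’s actual algorithm, which this article does not describe. The class RepeatedRowFinder, the function find_repeated_rows and the sample HTML are hypothetical, and the sketch assumes well-formed HTML with matching closing tags.

```python
from collections import Counter
from html.parser import HTMLParser

# Elements that never have a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class RepeatedRowFinder(HTMLParser):
    """Counts, per parent element, how many direct children share a (tag, class) signature."""

    def __init__(self):
        super().__init__()
        self.stack = []          # (element_id, tag) for each currently open element
        self.next_id = 0         # running id so every element is a distinct parent
        self.counts = Counter()  # (parent_id, signature) -> number of children

    def handle_starttag(self, tag, attrs):
        signature = (tag, dict(attrs).get("class") or "")
        parent_id = self.stack[-1][0] if self.stack else 0
        self.counts[(parent_id, signature)] += 1
        if tag not in VOID_TAGS:
            self.next_id += 1
            self.stack.append((self.next_id, tag))

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1][1] == tag:
            self.stack.pop()


def find_repeated_rows(html):
    """Return ((tag, class), count) for the most repeated sibling group, or (None, 0)."""
    finder = RepeatedRowFinder()
    finder.feed(html)
    if not finder.counts:
        return None, 0
    (_, signature), count = finder.counts.most_common(1)[0]
    return signature, count


# Made-up listing page with three repeated rows.
sample_page = (
    '<div><h1>Hotels</h1><ul>'
    '<li class="hotel">A</li><li class="hotel">B</li><li class="hotel">C</li>'
    '</ul></div>'
)
signature, count = find_repeated_rows(sample_page)
```

On a page with a single row of data, no group repeats more than once, so a heuristic like this has nothing to lock onto. That is consistent with Smart Mode working best on pages with more than one row.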
Although there is still a lot of room to improve in simplifying data extraction from complicated websites, Smart Mode is a good data extraction technique.