URLs - Advanced Mode
Thursday, March 24, 2016 6:04 AM
Extract Data from a List of URLs with Similar Web Content Layouts
Step 1. Download and install Octoparse. Register a new account at www.octoparse.com, or click the “Sign up” option on the login interface.
Step 2. Advanced Mode: Go to “Advanced Mode” ➜ “List of URL” ➜ “Start”.
Step 3. Complete the basic information. ➜ Click “Continue” ➜ Click “Next”.
Step 4. Drag a “Loop Item” action and drop it into the Workflow Designer.
Click “Copy URLs”. ➜ Paste a list of URLs that share a similar page structure into the textbox. ➜ Click “save”.
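If the URL list comes from a spreadsheet or a text dump, it can help to tidy it before pasting it into the textbox. The following is a minimal sketch of that preparation step, not part of Octoparse; the function name `clean_url_list` and the sample URLs are hypothetical. It drops blank lines, duplicates, and entries that are not http(s) URLs.

```python
from urllib.parse import urlparse


def clean_url_list(raw_text):
    """Return unique http(s) URLs from raw_text, one per line, in original order."""
    seen = set()
    cleaned = []
    for line in raw_text.splitlines():
        url = line.strip()
        if not url:
            continue  # skip blank lines
        parts = urlparse(url)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            continue  # skip anything that is not an absolute http(s) URL
        if url in seen:
            continue  # skip duplicates
        seen.add(url)
        cleaned.append(url)
    return cleaned


if __name__ == "__main__":
    # Hypothetical sample input; replace with your own list.
    sample = """
    https://example.com/articles/1
    https://example.com/articles/2
    https://example.com/articles/1
    not-a-url
    """
    print("\n".join(clean_url_list(sample)))
```

The printed lines can then be copied into the textbox as they are.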
Step 5. Wait until the page has loaded, then extract the title and content of the first page. ➜ Click these two elements. ➜ Select “Extract the text”.
After you extract the elements on the first page, Octoparse will extract the data with the same layout from the other pages.
All the selected content will be shown in Data Fields. ➜ Click the “Field Name” to modify it. ➜ Click “Next” ➜ Click “Next”.
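The idea behind this step is that one extraction rule, defined on the first page, is reused on every page in the loop. The sketch below illustrates that idea only; it is not how Octoparse works internally. It assumes, purely for illustration, that the title sits in an `<h1>` element and the content in `<p>` elements, and it takes already-downloaded HTML strings so that it runs without network access. The names `TitleAndContentParser`, `extract_fields`, and `extract_all` are hypothetical.

```python
from html.parser import HTMLParser


class TitleAndContentParser(HTMLParser):
    """Collects text inside <h1> as the title and text inside <p> as the content."""

    def __init__(self):
        super().__init__()
        self._current = None  # "h1", "p", or None when outside both
        self.title_parts = []
        self.content_parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "p"):
            self._current = tag

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._current == "h1":
            self.title_parts.append(text)
        elif self._current == "p":
            self.content_parts.append(text)


def extract_fields(html):
    """Apply the same rule to one page and return its title and content."""
    parser = TitleAndContentParser()
    parser.feed(html)
    parser.close()
    return {
        "title": " ".join(parser.title_parts),
        "content": " ".join(parser.content_parts),
    }


def extract_all(pages):
    """pages maps URL -> HTML string; returns one row per page."""
    return [dict(url=url, **extract_fields(html)) for url, html in pages.items()]
```

Because every page shares the same layout, the rule never needs to change between pages; only the input HTML does.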
Step 6. Click “Local Extraction”. ➜ Click “OK” to run the task on your computer. Octoparse will automatically extract all the selected data.
The extracted data will be shown in the “Data Extracted” pane. Click the export button to export the results to an Excel file, a database, or another format, and save the file to your computer.
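Once the file is saved, it can be loaded for further processing. The following is a minimal sketch, assuming the results were exported as a CSV file with a header row; the field names `title` and `content` are hypothetical and should match whatever names were set in Data Fields. It reads the rows and drops any that are missing a required field.

```python
import csv


def load_exported_rows(path):
    """Read an exported CSV file into a list of dicts keyed by the header row."""
    # "utf-8-sig" reads UTF-8 files with or without a byte-order mark.
    with open(path, newline="", encoding="utf-8-sig") as handle:
        return list(csv.DictReader(handle))


def drop_empty_rows(rows, required=("title", "content")):
    """Keep only rows where every required field is present and non-blank."""
    return [
        row
        for row in rows
        if all((row.get(name) or "").strip() for name in required)
    ]
```

A typical use would be `drop_empty_rows(load_exported_rows("results.csv"))`, where `results.csv` is a placeholder for the exported file's path.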
Happy Data Hunting!
Author: The Octoparse Team