Scrape Content Details from Freelancer.com
Tuesday, September 13, 2016 11:41 PM
(Download the extraction task used in this tutorial HERE in case you need it.)
In this tutorial, I will use www.freelancer.com as an example to show you how to scrape content details from websites, that is, how to extract data from list pages that contain more than one row of data (such as a search results page) together with the details of each item.
Step 1. Advanced Mode: Choose “Advanced Mode” ➜ Click “List & Detail” ➜ Click “Start”.
Step 2. Complete the basic information. ➜ Click “Next”.
Step 3. Enter the target URL in the built-in browser. ➜ Click the “Go” icon to open the webpage.
(URL of the example: https://www.freelancer.com/freelancers/? )
Step 4. Click on the “Next” pagination link. ➜ Choose “Loop click Next Page”.
(Note: If you want to extract information from every page of the search results, you need to add a page navigation action.)
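If you are curious what “Loop click Next Page” does conceptually, here is a minimal Python sketch of the idea (not Octoparse's implementation): keep following the “Next” link until there is none. The helpers `fetch` and `find_next` and the `max_pages` cap are hypothetical, introduced only for illustration.

```python
from typing import Callable, Iterator, Optional


def follow_pagination(
    start_url: str,
    fetch: Callable[[str], str],
    find_next: Callable[[str], Optional[str]],
    max_pages: int = 100,
) -> Iterator[tuple[str, str]]:
    """Yield (url, html) for each results page, following "Next" links.

    `fetch` downloads a page and `find_next` returns the URL behind its
    "Next" link (or None on the last page); both are hypothetical helpers.
    The loop also stops on a repeated URL or after `max_pages` pages.
    """
    seen: set[str] = set()
    url: Optional[str] = start_url
    while url is not None and url not in seen and len(seen) < max_pages:
        seen.add(url)
        html = fetch(url)
        yield url, html
        url = find_next(html)  # None means there is no "Next" link
```

The repeated-URL check is a defensive choice: a “Next” link that points back to a page already visited would otherwise loop forever.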
Step 5. Move the cursor over the sections with a similar layout that contain the data you want to extract.
Click the first highlighted link ➜ Click “create a list of items” (sections with similar layout) ➜ Click “Add current item to the list”.
The first highlighted link has now been added to the list. ➜ Click “Continue to edit the list” ➜ Click the second highlighted link ➜ Click “Add current item to the list” again.
Now all the links with a similar layout are in the list. ➜ Click “Finish Creating List” ➜ Click “loop” to process the list and extract the elements on each page.
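A list of “sections with similar layout” is essentially a group of elements that share the same markup pattern. The minimal sketch below, using Python's standard `html.parser`, collects the `href` of every link that carries a shared CSS class. The class name `profile-link` used in the usage note is hypothetical and does not reflect Freelancer.com's actual HTML.

```python
from html.parser import HTMLParser


class SimilarLinkCollector(HTMLParser):
    """Collect href values of <a> tags that carry a given CSS class."""

    def __init__(self, css_class: str) -> None:
        super().__init__()
        self.css_class = css_class
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        # Attribute values can be None (e.g. boolean attributes), hence `or ""`.
        classes = (attr_map.get("class") or "").split()
        if self.css_class in classes and attr_map.get("href"):
            self.links.append(attr_map["href"])


def collect_item_links(html: str, css_class: str) -> list[str]:
    """Return the hrefs of all links sharing `css_class`, in page order."""
    parser = SimilarLinkCollector(css_class)
    parser.feed(html)
    parser.close()
    return parser.links
```

For example, `collect_item_links(page_html, "profile-link")` would return the link targets of every matching item, in the order they appear on the page.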
Step 6. Extract the search results.
Extract the title of the first section. ➜ Click the title ➜ Select “Extract text”. Other content can be extracted in the same way.
Step 7. All the selected content will appear in Data Fields. ➜ Click a “Field Name” to rename it.
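Steps 6 and 7 boil down to reading the visible text of a few elements in each item and storing it under a field name you choose. The sketch below shows that idea with Python's standard `html.parser`; matching elements by CSS class is an assumption made for illustration, and the class names and field names in the usage note are hypothetical.

```python
from html.parser import HTMLParser
from typing import Optional

# Void elements never get an end tag, so they must not change the depth count.
VOID_ELEMENTS = {"area", "br", "hr", "img", "input", "link", "meta", "wbr"}


class FieldExtractor(HTMLParser):
    """Collect the text of elements whose CSS class maps to a field name."""

    def __init__(self, class_to_field: dict[str, str]) -> None:
        super().__init__()
        self.class_to_field = class_to_field
        self.record: dict[str, str] = {}
        self._field: Optional[str] = None  # field currently being captured
        self._depth = 0                    # nesting depth inside that element
        self._parts: list[str] = []        # text chunks of the current field

    def handle_starttag(self, tag, attrs):
        if self._field is not None:
            if tag not in VOID_ELEMENTS:
                self._depth += 1
            return
        if tag in VOID_ELEMENTS:  # void elements hold no text to capture
            return
        classes = (dict(attrs).get("class") or "").split()
        for css_class in classes:
            if css_class in self.class_to_field:
                self._field = self.class_to_field[css_class]
                self._depth = 1
                self._parts = []
                break

    def handle_endtag(self, tag):
        if self._field is None or tag in VOID_ELEMENTS:
            return
        self._depth -= 1
        if self._depth == 0:
            # Collapse runs of whitespace, as rendered page text would.
            self.record[self._field] = " ".join("".join(self._parts).split())
            self._field = None

    def handle_data(self, data):
        if self._field is not None:
            self._parts.append(data)


def extract_fields(item_html: str, class_to_field: dict[str, str]) -> dict[str, str]:
    """Return {field name: text} for one item."""
    parser = FieldExtractor(class_to_field)
    parser.feed(item_html)
    parser.close()
    return parser.record
```

A call such as `extract_fields(item_html, {"name": "Freelancer Name", "rate": "Hourly Rate"})` returns a dict keyed by the chosen names, much like renaming a “Field Name” in Data Fields. If several elements share a class, the last one wins.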
Step 9. In the Workflow Designer, drag the second “Loop Item” in front of the “Click to paginate” action so that the elements of every section are extracted from all pages.
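The reason for this step is loop order: the item loop has to run on each page before the task moves on to the next page. As a rough Python analogy (hypothetical helper names, not Octoparse's internals), the resulting workflow behaves like a nested loop:

```python
from typing import Callable, Iterable, Optional, TypeVar

Page = TypeVar("Page")
Item = TypeVar("Item")
Row = TypeVar("Row")


def scrape_all_pages(
    first_page: Page,
    get_items: Callable[[Page], Iterable[Item]],
    get_next_page: Callable[[Page], Optional[Page]],
    extract: Callable[[Item], Row],
) -> list[Row]:
    """Outer loop = pagination; inner loop = the list of items on each page.

    The item loop runs before moving to the next page, mirroring a workflow
    where the item "Loop Item" sits in front of "Click to paginate".
    Pages must be hashable so revisits can be detected.
    """
    rows: list[Row] = []
    visited = set()  # guards against a "Next" link that points back
    page: Optional[Page] = first_page
    while page is not None and page not in visited:
        visited.add(page)
        for item in get_items(page):    # inner loop over the list items
            rows.append(extract(item))  # extract the fields of one item
        page = get_next_page(page)      # only then "click" Next
    return rows
```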
Step 10. Extract the details from each item's detail page.
Click the first highlighted link of the first item ➜ Click “Click an item”; the built-in browser will automatically open the detail page. ➜ Click “Advanced Options” ➜ Click “Click items in Loop Item” ➜ Extract the data you want (see Step 6 and Step 7).
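Conceptually, “Click items in Loop Item” visits the detail page of every item in the list and extracts fields there. Here is a minimal Python sketch of that list-and-detail pattern; `fetch`, `extract_detail`, the `Detail URL` column and the optional `delay_seconds` pause are assumptions added for illustration.

```python
import time
from typing import Callable


def scrape_list_and_detail(
    item_links: list[str],
    fetch: Callable[[str], str],
    extract_detail: Callable[[str], dict[str, str]],
    delay_seconds: float = 0.0,
) -> list[dict[str, str]]:
    """Visit each item's detail page and return one row per item.

    `fetch` and `extract_detail` are hypothetical helpers; `delay_seconds`
    is an optional pause between requests to keep the load on the site low.
    """
    rows = []
    for link in item_links:
        detail_html = fetch(link)                # like "Click an item"
        row = {"Detail URL": link}               # remember where the row came from
        row.update(extract_detail(detail_html))  # fields from the detail page
        rows.append(row)
        if delay_seconds > 0:
            time.sleep(delay_seconds)
    return rows
```

The `item_links` input would typically be the links gathered from the list page, one per item.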
(Note: You can edit the XPath to locate exactly the data you want to extract. Click HERE to learn more about XPath.)
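To get a feel for how a more specific XPath narrows a match, here is a small example using Python's built-in `xml.etree.ElementTree`, which supports a limited subset of XPath. The markup is a hypothetical, well-formed fragment; real web pages are often not valid XML, so for full XPath 1.0 on HTML a library such as lxml is commonly used instead. Note that ElementTree paths are relative, so `//span` is written as `.//span`.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed fragment of a detail page.
SNIPPET = """
<div class="profile">
  <h1>Alice Smith</h1>
  <span class="rate">$25 USD/hr</span>
  <span class="country">Canada</span>
</div>
"""

root = ET.fromstring(SNIPPET.strip())

# A loose path matches every <span>, so it cannot tell the fields apart.
loose_matches = [span.text for span in root.findall(".//span")]

# An attribute predicate narrows the match to exactly one element.
rate = root.find(".//span[@class='rate']").text

# A positional predicate also works: the second <span> child of the <div>.
country = root.find("span[2]").text
```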
Step 11. Click “Next” ➜ Click “Next” ➜ Click “Local Extraction” ➜ “OK” to run the task on your computer. Octoparse will automatically extract all the selected data.
Step 12. The extracted data will be shown in the “Data Extracted” pane. Click the “Export” button to export the results to an Excel file, a database or another format and save the file to your computer.
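Octoparse's Export button handles this for you. If you post-process extracted rows yourself, a minimal sketch with Python's standard `csv` module looks like this (CSV files open directly in Excel); ordering the columns by first-seen field name is an assumption of this sketch.

```python
import csv
import io


def rows_to_csv(rows: list[dict[str, str]]) -> str:
    """Serialize extracted rows to CSV text; missing fields become empty cells."""
    fieldnames: list[str] = []
    for row in rows:  # columns follow the order in which fields first appear
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

To save to disk, write the returned text to a file opened with `open(path, "w", newline="", encoding="utf-8")`. Values containing commas are quoted automatically.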
Author: The Octoparse Team