Scraping info from Craigslist
Monday, February 25, 2019
In this tutorial, we are going to show you how to scrape information from Craigslist.
To follow along, you may want to use the same URL as this tutorial:
Here are the main steps in this tutorial: [Download task file here]
1) "Go To Web Page" - to open the targeted web page
- Create the task with "Advanced Mode".
- Paste the URL into the "Extraction URL" box and click "Save URL" to move on.
2) Create a pagination loop - to scrape all the results from multiple pages
- Click the "Next>" button on the webpage
- Click "Loop click the selected link" on "Action Tips"
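For readers who want to see the logic behind a pagination loop, here is a minimal Python sketch of the same idea: load a page, look for the "next" link, and repeat until there is none. This is not how Octoparse is implemented. The `next` class name, the `fetch` callable, and the `max_pages` limit are assumptions made for illustration, and the real page markup may differ.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class NextLinkFinder(HTMLParser):
    """Records the href of the first <a> whose class list contains 'next'.

    The 'next' class name is a hypothetical choice for this sketch.
    """

    def __init__(self):
        super().__init__()
        self.next_href = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        if tag == "a" and "next" in classes and self.next_href is None:
            self.next_href = attr.get("href")


def iter_pages(start_url, fetch, max_pages=50):
    """Yield (url, html) for each results page by following the 'next' link.

    `fetch` is any callable that takes a URL and returns the page HTML.
    The loop stops when there is no next link, when a URL repeats,
    or after `max_pages` pages.
    """
    url, seen = start_url, set()
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)
        html = fetch(url)
        yield url, html
        finder = NextLinkFinder()
        finder.feed(html)
        # Resolve relative links such as "/jobs?page=2" against the current URL.
        url = urljoin(url, finder.next_href) if finder.next_href else None
```

Passing `fetch` in as a parameter keeps the loop independent of any particular HTTP library, and the `seen` set guards against a "next" link that points back to a page already visited.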
3) Create a "Loop Item" - to click into each item in the list on every page
We are now on the second page. A "Loop Item" should always start from the first item on the first page, so we need to go back to the first page.
- Click "Go To Web Page" in the workflow.
- Select the pagination loop in the workflow
By doing this, we can help Octoparse decide the execution order and generate the Loop Item at the appropriate position in the workflow.
- Click the title of the first item
The first item is highlighted in green while the others are highlighted in red
- Click "Select All" on "Action Tips"
All of the items are highlighted in green
- Select "Loop click each URL"
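Conceptually, "Select All" followed by "Loop click each URL" gathers the link of every item on the list page so that each one can be visited in turn. A rough Python sketch of that step is shown here. The `result-title` class name and the example URLs are hypothetical, and the code illustrates the idea rather than Octoparse's own behavior.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ItemLinkCollector(HTMLParser):
    """Collects the href of every <a> that carries the given class."""

    def __init__(self, base_url, link_class="result-title"):
        super().__init__()
        self.base_url = base_url
        self.link_class = link_class  # hypothetical class for item titles
        self.links = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        href = attr.get("href")
        if tag == "a" and self.link_class in classes and href:
            # Turn relative links into absolute URLs.
            self.links.append(urljoin(self.base_url, href))


def item_links(list_html, base_url):
    """Return the item URLs found on one list page, in page order."""
    collector = ItemLinkCollector(base_url)
    collector.feed(list_html)
    return collector.links
```

Each URL returned by `item_links` corresponds to one pass through the Loop Item: open the item page, extract the fields, and move on to the next.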
4) Extract data - to select data you need to scrape
- On the item page, select the data you want to scrape, such as the title, compensation, and employment type.
- Select "Extract text of the selected element" and rename the "Field name" column if necessary.
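"Extract text of the selected element" amounts to reading the visible text of a chosen element and storing it under a field name. The sketch below shows one way to express that in Python with the standard library. The `FIELDS` mapping and its class names are assumptions made up for this example; the real item page may use different markup.

```python
from html.parser import HTMLParser

# Hypothetical mapping: output field name -> CSS class of the element to read.
FIELDS = {
    "title": "posting-title",
    "compensation": "compensation",
    "employment_type": "employment-type",
}


class FieldExtractor(HTMLParser):
    """Captures the text inside the first element matching each field's class."""

    VOID = {"br", "img", "hr", "input", "meta", "link"}  # tags with no end tag

    def __init__(self, fields):
        super().__init__()
        self.class_to_field = {cls: name for name, cls in fields.items()}
        self.current = None  # field currently being captured, if any
        self.depth = 0       # nesting depth inside the captured element
        self.parts = []
        self.record = {}

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID:
            return
        if self.current is not None:
            self.depth += 1  # nested element inside the one being captured
            return
        for cls in (dict(attrs).get("class") or "").split():
            if cls in self.class_to_field:
                self.current = self.class_to_field[cls]
                self.depth, self.parts = 1, []
                break

    def handle_endtag(self, tag):
        if self.current is None or tag in self.VOID:
            return
        self.depth -= 1
        if self.depth == 0:
            # Collapse runs of whitespace; keep the first match per field.
            text = " ".join("".join(self.parts).split())
            self.record.setdefault(self.current, text)
            self.current = None

    def handle_data(self, data):
        if self.current is not None:
            self.parts.append(data)


def extract_fields(item_html, fields=FIELDS):
    """Return a dict of field name -> extracted text for one item page."""
    parser = FieldExtractor(fields)
    parser.feed(item_html)
    parser.close()
    return parser.record
```

Renaming a field in Octoparse corresponds to changing a key in `FIELDS`: the element being read stays the same, and only the column name in the output changes.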
5) Run extraction - to run your task and get data
- Click "Save"
- Click "Start Extraction", then choose "Local Extraction"
Here is the sample output: