Scraping info from Craigslist

Monday, February 25, 2019

In this tutorial, we will show you how to scrape information from Craigslist with Octoparse.

To follow along, you can use the example URL below:

https://newyork.craigslist.org/d/accounting-finance/search/acc

 

Here are the main steps in this tutorial: [Download task file here]

1. "Go To Web Page" - to open the targeted web page

2. Create a pagination loop - to scrape all the results from multiple pages

3. Create a "Loop Item" - to loop click into each item on each list

4. Extract data - to select data you need to scrape

5. Run extraction - to run your task and get data

1) "Go To Web Page" - to open the targeted web page

  • Create the task with "Advanced Mode".
  • Paste the URL into the "Extraction URL" box and click "Save URL" to move on.
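
For readers who want to see what opening the page looks like outside Octoparse, here is a minimal Python sketch using only the standard library. It is not how Octoparse works internally; the helper names `build_request` and `fetch` and the User-Agent string are our own assumptions, and the live site may return a page that differs from what you see in the Octoparse browser.

```python
from urllib.request import Request, urlopen

# The example URL used in this tutorial.
START_URL = "https://newyork.craigslist.org/d/accounting-finance/search/acc"


def build_request(url, user_agent="tutorial-sketch/0.1"):
    """Build a GET request for the page (the User-Agent value is hypothetical)."""
    return Request(url, headers={"User-Agent": user_agent})


def fetch(url, timeout=10):
    """Download the page and return its HTML as text."""
    with urlopen(build_request(url), timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Calling `fetch(START_URL)` would perform the actual download; `build_request` is kept separate so the request can be inspected without touching the network.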

 

 

 

 

2) Create a pagination loop - to scrape all the results from multiple pages

  • Click the "Next>" button on the webpage
  • Click "Loop click the selected link" on "Action Tips"
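
The pagination loop above can be sketched in Python as follows. This is only an illustration of the idea, not Octoparse's implementation. It assumes that the "Next>" button is an `<a>` element whose class list contains `next` (a guess about the markup), and that `fetch` is any function you supply that returns the HTML of a URL.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class NextLinkParser(HTMLParser):
    """Records the href of the first <a> whose class list contains 'next'."""

    def __init__(self):
        super().__init__()
        self.next_href = None

    def handle_starttag(self, tag, attrs):
        if tag != "a" or self.next_href is not None:
            return
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        if "next" in classes and attr_map.get("href"):
            self.next_href = attr_map["href"]


def iter_pages(start_url, fetch, max_pages=50):
    """Yield (url, html) for each results page, following the 'next' link."""
    url = start_url
    seen = set()
    # Stop when there is no next link, a page repeats, or the cap is reached.
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)
        html = fetch(url)
        yield url, html
        parser = NextLinkParser()
        parser.feed(html)
        url = urljoin(url, parser.next_href) if parser.next_href else None
```

The `seen` set and `max_pages` cap are there so the loop cannot run forever if the last page links back to an earlier one.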

 

 

 

 

 

3) Create a "Loop Item" - to loop click into each item on each list

We are now on the second page. A "Loop Item" should always start from the first item on the first page, so we need to go back to the first page before creating it.

  • Click "Go To Web Page" in the workflow.
  • Select the pagination loop in the workflow

By doing this, we can help Octoparse decide the execution order and generate the Loop Item at the appropriate position in the workflow.

  • Click the title of the first item

  The first item is highlighted in green while the others are highlighted in red

  • Click "Select All" on "Action Tips"

  All of the items are highlighted in green

  • Select "Loop click each URL"
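
As a rough code equivalent of "Select All" followed by "Loop click each URL", the sketch below collects every item link on a list page so that each one can be visited in turn. The class name `result-title` is a hypothetical placeholder for whatever marks the item titles on the page, and `extract_item_urls` is our own helper name.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ItemLinkParser(HTMLParser):
    """Collects hrefs of <a> elements that carry the given class."""

    def __init__(self, link_class):
        super().__init__()
        self.link_class = link_class
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        href = attr_map.get("href")
        if self.link_class in classes and href:
            self.hrefs.append(href)


def extract_item_urls(html, base_url, link_class="result-title"):
    """Return absolute item URLs in page order, without duplicates."""
    parser = ItemLinkParser(link_class)
    parser.feed(html)
    urls = []
    for href in parser.hrefs:
        absolute = urljoin(base_url, href)
        if absolute not in urls:
            urls.append(absolute)
    return urls
```

Looping over the returned list and opening each URL corresponds to Octoparse clicking into every item of the list.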

 

 

 

 

4) Extract data - to select data you need to scrape

  • Select the data you want to scrape on the item page, such as compensation, employment type, and title.
  • Select "Extract text of the selected element" and rename the "Field name" column if necessary.
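
To illustrate what "Extract text of the selected element" amounts to, here is a small sketch that pulls the text out of chosen elements and stores it under field names of your choice. The class names in the usage example (`postingtitle`, `compensation`, `employment-type`) are hypothetical, and the real item page may be structured differently.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class FieldTextParser(HTMLParser):
    """Collects the text inside the first element carrying each target class."""

    def __init__(self, class_to_field):
        super().__init__()
        self.class_to_field = class_to_field
        self.fields = {}
        self._current = None  # field currently being captured
        self._depth = 0       # nesting depth inside the captured element
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._current is not None:
            self._depth += 1
            return
        for cls in (dict(attrs).get("class") or "").split():
            field = self.class_to_field.get(cls)
            if field and field not in self.fields:
                self._current, self._depth, self._chunks = field, 1, []
                return

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self._current is None:
            return
        self._depth -= 1
        if self._depth == 0:
            # Join the pieces and collapse runs of whitespace.
            self.fields[self._current] = " ".join("".join(self._chunks).split())
            self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self._chunks.append(data)


def extract_fields(html, class_to_field):
    """Return {field name: text}; fields that are not found map to ''."""
    parser = FieldTextParser(class_to_field)
    parser.feed(html)
    parser.close()
    return {field: parser.fields.get(field, "") for field in class_to_field.values()}
```

For example, `extract_fields(item_html, {"postingtitle": "title", "compensation": "compensation", "employment-type": "employment_type"})` returns one dictionary per item page, where the keys play the role of the "Field name" column.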

 

 

 

 

5) Run extraction - to run your task and get data

  • Click "Save"
  • Click "Start Extraction", then choose "Local Extraction"
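
If you collect the data with your own script instead, the last step is to save the rows somewhere. The sketch below turns a list of dictionaries into CSV text, assuming that CSV is the format you want; the field names are examples only.

```python
import csv
import io


def rows_to_csv(rows, fieldnames):
    """Serialize scraped rows (dicts) to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        # Missing fields become empty cells; unknown keys are dropped.
        writer.writerow({name: row.get(name, "") for name in fieldnames})
    return buffer.getvalue()
```

Writing the returned text to a file, for example with `open("jobs.csv", "w", newline="")`, gives a spreadsheet-friendly copy of the data.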

 

 

Here is the sample output:

 


 

 

 

Author: Momo 

Editor: Suire

 

 
