Scrape product information from Tokopedia

You are browsing a tutorial guide for the latest Octoparse version. If you are running an older version of Octoparse, we strongly recommend you upgrade because it is faster, easier, and more robust! Download and upgrade here if you haven't already done so!

Tokopedia is an Indonesian technology company specializing in e-commerce. In this tutorial, we are going to show you how to scrape product information from Tokopedia.

For Tokopedia scraping, you can use our ready-to-use Task Template available on the home page or follow this tutorial to build the task from scratch.

To demonstrate, we will use the URL below as an example: https://www.tokopedia.com/search?st=product&q=usb
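The example URL is a search query with two parameters, st=product and q=usb. If you want to scrape other keywords, a minimal Python sketch for building the same kind of URL could look like this. It assumes the parameters work exactly as they appear in the example; the function name build_search_url is hypothetical.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.tokopedia.com/search"


def build_search_url(keyword: str) -> str:
    """Return a product-search URL for the given keyword."""
    # st=product and q=<keyword> are copied from the tutorial's example URL;
    # any other parameters the site may accept are not covered here.
    params = {"st": "product", "q": keyword}
    return f"{BASE_URL}?{urlencode(params)}"


print(build_search_url("usb"))
```

The resulting URL can be pasted on the home screen in step 1.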

The main steps are outlined below. A sample task file is also available for download.


1. Create a Go to Web Page step - to open the web page

  • Paste the URL on the home screen and click Start

2. Auto-detect web page data - to create a workflow

  • Select Auto-detect web page data on the Tips panel

  • After the auto-detection finishes, select Edit under Add a page scroll

  • Select Scroll for one screen, set Repeats to 20, click Confirm, then click Create workflow

  • Go to Data Preview and delete all fields except the page URL by clicking ... (more) next to each field header
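The scroll setting above amounts to a simple loop: scroll down by one screen, pause so newly loaded items can appear, and repeat 20 times. Here is a rough sketch of that logic, not Octoparse's internal implementation. The scroll_by callback, the viewport height, and the wait time are hypothetical stand-ins.

```python
import time
from typing import Callable


def scroll_page(scroll_by: Callable[[int], None],
                viewport_height: int,
                repeats: int = 20,
                wait_seconds: float = 0.0) -> int:
    """Scroll one screen at a time, `repeats` times; return total pixels scrolled."""
    total = 0
    for _ in range(repeats):
        scroll_by(viewport_height)  # move down by one screen
        total += viewport_height
        time.sleep(wait_seconds)    # pause so late-loading items can appear
    return total


# Usage with a stand-in callback that only records each scroll call.
calls = []
total = scroll_page(calls.append, viewport_height=800)
print(len(calls), total)
```

With a real browser driver, scroll_by would wrap whatever scrolling call that driver provides.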


3. Create Pagination - to scrape from multiple pages

  • Click the Next page button on the Tips panel

  • Scroll down and click the next page button on the page

  • After the Button XPath is auto-filled, click Confirm
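To illustrate what a button XPath does, the sketch below locates a next-page button in a small piece of markup using Python's standard library. Both the markup and the XPath are made up for illustration; the XPath Octoparse fills in for Tokopedia will be different, and the real page structure is not shown here.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; the real page structure will differ.
SAMPLE = """
<nav>
  <button aria-label="Previous page">prev</button>
  <button aria-label="Next page">next</button>
</nav>
"""

# Hypothetical XPath, written in the limited subset ElementTree supports.
NEXT_BUTTON_XPATH = ".//button[@aria-label='Next page']"


def find_next_button(markup: str):
    """Return the next-page button element, or None if there is none."""
    root = ET.fromstring(markup.strip())
    return root.find(NEXT_BUTTON_XPATH)


button = find_next_button(SAMPLE)
print(button.text if button is not None else "no next page")
```

A pagination loop keeps clicking the matched element and stops when the lookup returns nothing, which is how the last page is recognized.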


4. Adjust the Workflow

  • To prevent data loss, hover over the Loop Item step and drag it below the Scroll Page step, so that extraction starts only after the page has been scrolled

  • Below is what the final workflow looks like. If everything is in place, you can continue to run the task.
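The reason for placing Loop Item after the scroll step can be shown with a toy model. The sketch assumes the page reveals more products as you scroll, which is the likely reason the scroll step is needed. The LazyPage class and all of its numbers are invented for illustration.

```python
class LazyPage:
    """Toy page that reveals `per_scroll` more items after each scroll."""

    def __init__(self, items, initially_visible=4, per_scroll=4):
        self._items = list(items)
        self._visible = initially_visible
        self._per_scroll = per_scroll

    def scroll(self):
        # Reveal more items, never exceeding the total number available.
        self._visible = min(len(self._items), self._visible + self._per_scroll)

    def visible_items(self):
        return self._items[:self._visible]


def scrape(page, repeats, scroll_first):
    """Extract the visible items, optionally scrolling `repeats` times first."""
    if scroll_first:
        for _ in range(repeats):
            page.scroll()
    return page.visible_items()


products = [f"product-{i}" for i in range(20)]
after_scroll = scrape(LazyPage(products), repeats=20, scroll_first=True)
without_scroll = scrape(LazyPage(products), repeats=20, scroll_first=False)
print(len(after_scroll), len(without_scroll))
```

In this model, extracting before scrolling sees only the initially visible items, while extracting after scrolling sees all of them.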


5. Run Task - to get data

  • Run the task from the top right corner

  • Select Run task on your device to run it locally (note that a cloud run may not work for this website because it is sensitive to scrapers)
