Lesson 4: Test-run the task

You are about to finish your first scraping task. Before running it, there is one more thing you should do: test your workflow step by step to make sure everything works as expected. A test run shows whether you need to adjust the task settings so that the data is captured accurately.

To demonstrate the process, we'll keep using the test site as an example: http://test-sites.octoparse.com/?product_cat=e-commerce-category-1

Test-run workflow steps

Remember that workflow steps are read from top to bottom, and from the inside out for nested steps.

So for our example, we should test the steps in this order:

  1. Go to Web Page → test if the web page loads properly

  2. Pagination → test if the Next Page button is located correctly

  3. Click to Paginate → test if the web page paginates properly

  4. Loop Item → test if the list of items is complete and correct

  5. Extract Data → test if the data is selected and extracted correctly


Note: Most workflows have only one Pagination step. If your workflow contains several Pagination steps, double-check and test each of them. To delete one, first drag the steps inside it out, then delete the Pagination step itself.

Not all tasks are created the same, and you may have a completely different task to test, but the testing method generally extends to tasks of all kinds. Let's get started!


1. Click on "Go to Web Page"

Once you click on the step, it should load the web page in the built-in browser. If the web page loads well, you can move on to the next step. There are two cases, however, that need extra attention.

1.1 If the web page uses infinite scrolling → you need to select "Scroll down the page after it is loaded" and complete the proper settings.


1.2 If the web page is taking longer than usual to load → you may want to increase the page timeout. Click "General" → "Timeout" to pick an appropriate timeout.


2. Click the "Pagination" box

In order for pagination to work consistently, there are two things we need to check:

  • Whether the Next Page button/arrow is located correctly.

  • Whether pagination works on every page: it needs to go correctly from page 1 to page 2, page 2 to page 3, page 3 to page 4, and so on.

After you click on the "Pagination" box, look at the highlighted element on the web page and confirm that it is the correct Next Page button. If the wrong element is highlighted, you may need to fix it manually by editing the corresponding XPath.
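To illustrate why an XPath can highlight the wrong element, here is a minimal Python sketch. It is not Octoparse's own mechanism: the markup is a hypothetical, simplified pagination bar (not copied from the test site), and Python's standard library supports only a subset of XPath, so the expressions are kept deliberately simple.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified pagination markup (an assumption for illustration).
SAMPLE_HTML = """
<nav class="pagination">
  <a class="page-numbers" href="?page=1">1</a>
  <a class="page-numbers" href="?page=2">2</a>
  <a class="next page-numbers" href="?page=2">Next</a>
</nav>
"""

def find_matches(markup, xpath):
    """Return the text of every element the XPath matches."""
    root = ET.fromstring(markup)
    return [element.text for element in root.findall(xpath)]

# Too broad: matches every link in the pagination bar.
broad = find_matches(SAMPLE_HTML, ".//a")

# Narrow: matches only the link whose class attribute marks it as "next".
narrow = find_matches(SAMPLE_HTML, ".//a[@class='next page-numbers']")

print(broad)   # ['1', '2', 'Next']
print(narrow)  # ['Next']
```

An XPath that matches more than one element, or none at all, is a likely reason for the wrong button being highlighted.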


3. Click on "Click to Paginate"

When you click on "Click to Paginate", you are instructing Octoparse to click the Next Page button defined in Step 2. If things are working correctly, the browser should go from page 1 to page 2. Repeat this two-step process (click the "Pagination" box, then click "Click to Paginate") as many times as needed to make sure pagination works on all consecutive pages. If the web page fails to paginate on any page, fix the element XPath in Step 2 and test again.
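The reason for repeating the test on several pages can be sketched in Python. This is only a model of the idea, not how Octoparse works internally: the hypothetical `next_page` mapping stands in for "the page that the located Next button leads to", and the page numbers are made up.

```python
def walk_pages(next_page, start, max_pages=100):
    """Follow Next links from `start` and return the pages visited in order.

    `next_page` maps each page to the page its Next button leads to,
    or to None when there is no Next button (the last page).
    """
    visited = []
    current = start
    while current is not None and len(visited) < max_pages:
        if current in visited:
            raise ValueError(f"pagination loops back to page {current}")
        visited.append(current)
        current = next_page.get(current)
    return visited

# Hypothetical site with four pages where every Next button works.
healthy = {1: 2, 2: 3, 3: 4, 4: None}

# Hypothetical broken case: on page 2 the located element leads back to page 1.
broken = {1: 2, 2: 1}

print(walk_pages(healthy, 1))  # [1, 2, 3, 4]

try:
    walk_pages(broken, 1)
except ValueError as error:
    print(error)  # pagination loops back to page 1
```

In the broken case the first click looks fine (page 1 to page 2), and the problem only shows up on the second click, which is why a single test is not enough.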


4. Click on the "Loop Item" box

Testing "Loop Item" confirms whether all the desired items have been selected correctly.

Once clicked, go to the web page in the built-in browser and make sure all the items you need are being highlighted.
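If the list is long, comparing it by eye can be error-prone. As a rough sketch of the same check in Python, assuming you have written down the items you expect and the items that were highlighted (the product names here are hypothetical):

```python
def missing_items(expected, highlighted):
    """Return the expected items that are absent from the highlighted list."""
    seen = set(highlighted)
    return [item for item in expected if item not in seen]

# Hypothetical product names; in practice, read them off the page.
expected = ["Product 1", "Product 2", "Product 3", "Product 4"]
highlighted = ["Product 1", "Product 2", "Product 4"]

print(missing_items(expected, highlighted))  # ['Product 3']
```

An empty result means every expected item was picked up by the loop.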


Tip: If your list is not complete upon testing, you can check out the troubleshooting ideas below:


5. Click on "Extract Data"

This is the final step: check whether the data is being extracted as needed.

Once clicked, check the data in the preview section and confirm if this is the data that you need.
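Blank fields are easy to miss in a long preview. The following Python sketch shows one way to scan rows for them; it assumes the preview data has been copied into a list of dictionaries, and the field names `title` and `price` are hypothetical.

```python
def find_blank_fields(rows):
    """Return (row_index, field_name) for every empty or whitespace-only value."""
    problems = []
    for index, row in enumerate(rows):
        for field, value in row.items():
            if value is None or not str(value).strip():
                problems.append((index, field))
    return problems

# Hypothetical preview rows; the second and third each have a blank field.
rows = [
    {"title": "Product A", "price": "$10.00"},
    {"title": "Product B", "price": ""},
    {"title": None, "price": "$7.50"},
]

print(find_blank_fields(rows))  # [(1, 'price'), (2, 'title')]
```

This catches blank values only; misplaced data (for example, a price appearing in the title column) still needs a visual check.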


Tip: If you see any blank fields or if you find misplaced data, you can check out these troubleshooting ideas:


Perform a test run

After you have gone through each step in the task workflow, it is a good time to perform a test run on your local device. Click "Run" and select "Run task on your device".


Now watch your data get extracted live!

  • Show Browser: click it to open a built-in browser and watch the web pages as they are opened.

  • Task Overview: shows the start time and end time of the running process.

  • Pause: pauses the process so that you can get past a login or CAPTCHA on the web pages.

  • Data List: gives you a preview of the data scraped.

  • Event Log: shows every action Octoparse executes during the scraping. You can easily find errors from the log.

