Check The Extraction Rule When Errors Occur
Wednesday, July 20, 2016 10:52 PM
Isn't it exciting that you are about to finish your first scraping task? There is just one more thing you should do before running your task: test your workflow step by step to make sure everything works as expected. A test run shows whether you need to adjust your task settings to capture the data accurately.
To demonstrate the process, we'll keep using the test site as an example: http://test-sites.octoparse.com/?product_cat=e-commerce-category-1
Test-run workflow steps
Remember that the steps of the workflow should always be read from top to bottom, and from inside to outside for nested steps.
So for our example, we should test the steps in this order:
- "Go to Web Page" → test if the web page loads properly
- "Pagination" → test if the Next Page button is located correctly
- "Click to Paginate" → test if the web page paginates properly
- "Loop Item" → test if the list of items is complete and correct
- "Extract Data" → test if the data is selected and extracted correctly
Not all tasks are created the same, and you may have a completely different task to test, but the testing methodology generally extends to tasks of all kinds. Let's get started!
1. Click on "Go to Web Page"
Once you click on the step, it should load the web page in the built-in browser. If the web page loads well, there's nothing to worry about; however, there are a few things you should always pay extra attention to.
1.1 If the web page uses infinite scrolling → you need to select "Scroll down the page after it is loaded" and complete the proper settings.
1.2 If the web page is taking longer than usual to load → you may want to increase the page timeout. Click "General" → "Timeout" to pick an appropriate timeout value.
2. Click the "Pagination" box
In order for pagination to work consistently, there are two things we need to check:
- Whether the Next Page button/arrow is located correctly.
- Whether pagination works on all pages. For instance, it needs to paginate correctly going from page 1 to page 2, page 2 to page 3, page 3 to page 4, etc.
After you click on the pagination box, go to the highlighted element on the web page and confirm that it is the correct Next Page button. If a different element is highlighted, you may need to fix it manually by editing the corresponding XPath.
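To make the XPath check more concrete, here is a minimal Python sketch of the difference between an XPath that is too broad and one that selects only the Next Page button. The markup, the class names and the helper match_links are assumptions made up for illustration; they are not taken from the test site or from Octoparse. The standard library's ElementTree supports only a subset of XPath and needs well-formed markup, so treat this as a way to reason about selectors rather than a replacement for checking the highlighted element in the built-in browser.

```python
import xml.etree.ElementTree as ET

# Hypothetical pagination markup, invented for this example.
SAMPLE_NAV = """
<nav class="pagination">
  <a class="page" href="?page=1">1</a>
  <a class="page" href="?page=2">2</a>
  <a class="next" href="?page=2">Next</a>
</nav>
"""


def match_links(markup, xpath):
    """Return the elements that the XPath selects in the given markup."""
    root = ET.fromstring(markup)
    return root.findall(xpath)


# Too broad: selects every link in the pagination bar.
too_broad = match_links(SAMPLE_NAV, ".//a")

# Specific: selects only the link marked as the Next Page button.
next_only = match_links(SAMPLE_NAV, ".//a[@class='next']")

print(len(too_broad))                 # 3
print(len(next_only))                 # 1
print(next_only[0].get("href"))       # ?page=2
```

A good Next Page XPath should match exactly one element on every page of the listing.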
3. Click on "Click to Paginate"
When you click on "Click to Paginate", you are instructing Octoparse to click on the Next Page button defined in Step 2. If things are working correctly, it should go from page 1 to page 2. Repeat this two-step process (click the "Pagination" box, then click "Click to Paginate") as many times as needed to make sure pagination works correctly on all sequential pages. If the web page does not paginate properly on any of the pages, fix the element XPath in step 2 and test again.
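The repeated check described above can be pictured as following the Next Page button from page to page and confirming that each page is visited once, in order. The sketch below is only a model of that idea, assuming a hypothetical dictionary that maps each page to the page its Next button leads to (None on the last page); walk_pages is an illustrative helper, not an Octoparse function.

```python
def walk_pages(next_page_of, start, max_pages=50):
    """Follow the 'next' links from start and return the pages in visit order."""
    visited = []
    current = start
    while current is not None and len(visited) < max_pages:
        if current in visited:
            # The Next button led back to a page we already saw.
            raise ValueError(f"pagination loops back to {current!r}")
        visited.append(current)
        current = next_page_of[current]
    return visited


# Hypothetical site: each page maps to the page its Next button opens.
working = {"page-1": "page-2", "page-2": "page-3", "page-3": None}
print(walk_pages(working, "page-1"))  # ['page-1', 'page-2', 'page-3']

# A wrongly located button (for example a "Previous" link) causes a loop.
broken = {"page-1": "page-2", "page-2": "page-1"}
try:
    walk_pages(broken, "page-1")
    loop_detected = False
except ValueError:
    loop_detected = True
print(loop_detected)  # True
```

If the visit order skips a page or loops back, the element selected in step 2 is probably not the real Next Page button on every page.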
Check out these pagination troubleshooting ideas:
4. Click on the "Loop Item" box
Testing the "Loop Item" essentially means confirming that all the desired items have been selected correctly.
Once you click it, go to the web page in the built-in browser and make sure all the items you need are highlighted.
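One common way a list ends up incomplete is a selector that is stricter than the page's markup. The following Python sketch illustrates this with made-up product markup, where one item carries an extra class; the class names and titles are assumptions for the example only, and the actual cause on your page may be different.

```python
import xml.etree.ElementTree as ET

# Hypothetical product listing, invented for this example.
LISTING = """
<ul class="products">
  <li class="product"><h2>Item A</h2></li>
  <li class="product"><h2>Item B</h2></li>
  <li class="product featured"><h2>Item C</h2></li>
</ul>
"""

root = ET.fromstring(LISTING)

# Exact class match: misses the item whose class attribute has an extra value.
strict_items = root.findall(".//li[@class='product']")

# Structural match: selects every list item under the product list.
all_items = root.findall(".//li")

strict_titles = [li.find("h2").text for li in strict_items]
all_titles = [li.find("h2").text for li in all_items]

print(strict_titles)  # ['Item A', 'Item B']
print(all_titles)     # ['Item A', 'Item B', 'Item C']
```

Comparing the number of highlighted items with the number of items visible on the page is a quick way to spot this kind of mismatch.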
If your list is not complete upon testing, you can check out the troubleshooting ideas below:
1. Loop Item
5. Click on "Extract Data"
Here is the final step: check that the data is being extracted as needed.
Once you click it, check the data in the preview section and confirm that it is the data you need.
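For larger previews or exported files, the same check can be sketched in a few lines of Python. The example assumes the extracted data is available as a list of dictionaries, one per row; the field names and the helper find_blank_fields are hypothetical and only illustrate the idea of scanning for empty values.

```python
def find_blank_fields(rows):
    """Return (row_index, field_name) pairs for values that are missing or blank."""
    problems = []
    for index, row in enumerate(rows):
        for field, value in row.items():
            # Treat None, empty strings and whitespace-only strings as blank.
            if value is None or not str(value).strip():
                problems.append((index, field))
    return problems


# Hypothetical extracted rows.
rows = [
    {"title": "Item A", "price": "$10.00"},
    {"title": "Item B", "price": ""},
    {"title": "   ", "price": "$7.50"},
]

problems = find_blank_fields(rows)
print(problems)  # [(1, 'price'), (2, 'title')]
```

A check like this finds blank fields only; misplaced data, such as a price that lands in the title column, still needs a visual review.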
If you see any blank fields or if you find misplaced data, you can check out these troubleshooting ideas:
Perform a test run
After you have gone through each step in the task workflow, it is the perfect time to perform a test run on your local device. Click "Run" and select "Run task on your device".
Now watch your data get extracted live!
Check out the FAQs below for why you are not getting the data you need.
If none of these solves the problem, you can contact us for assistance.
Now that you know your task is working correctly, it's time to get data for real!
Happy Data Hunting!
Author: The Octoparse Team