Web Scraping Case Study | Crawling flight information from ticket websites
Friday, May 12, 2017 2:43 AM
Welcome to our scraping case study!
In this tutorial, we will show you how to crawl flight information from a ticket website, Ctrip.com.
List of features covered
Now, let's get started!
Step 1. Set up basic information
Step 2. Navigate to the target website
First, open the webpage in Octoparse's built-in browser.
Step 3. Create a list of items
Once the webpage finishes loading, notice how the data we want is neatly arranged in similar sections. We will build a loop list of all these similar sections so that we can then capture specific data from each of them.
Note: If the selection was not identified properly at first, click "Expand the selection area" until the targeted section is outlined properly.
Once done, continue to build the list.
The first item has been added to the list. Next, we need to finish adding all the remaining items.
Now, we have successfully added all similar sections to the list.
Step 4. Select the data to be extracted and rename data fields
Click the first item, or any other item, in the list under "Loop Items". Notice that the selected section is now outlined.
Say we would like to capture the airline, departure time, etc.
Note that the extraction action we are setting up now will also apply to the other sections in the list.
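For readers who want to see the same loop-then-extract idea in code, here is a minimal Python sketch. It is only an illustration built on the standard library and a made-up, well-formed HTML fragment. The class names (`flight`, `airline`, `depart`, `arrive`) and the sample values are assumptions; they do not reflect Ctrip.com's actual markup or how Octoparse works internally.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: two similar sections, one per flight.
SAMPLE = """
<div>
  <div class="flight">
    <span class="airline">Example Air</span>
    <span class="depart">08:00</span>
    <span class="arrive">10:15</span>
  </div>
  <div class="flight">
    <span class="airline">Sample Airways</span>
    <span class="depart">13:30</span>
    <span class="arrive">15:40</span>
  </div>
</div>
"""

def extract_flights(markup):
    root = ET.fromstring(markup)
    rows = []
    # The "loop list": every element that shares the same structure.
    for item in root.findall(".//div[@class='flight']"):
        # The same extraction rule is applied to each item in the loop.
        rows.append({
            "airline": item.findtext("span[@class='airline']"),
            "departure": item.findtext("span[@class='depart']"),
            "arrival": item.findtext("span[@class='arrive']"),
        })
    return rows

flights = extract_flights(SAMPLE)
```

The point of the sketch is that the field rules are written once, for a single item, and the loop applies them to every similar section.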
Step 5. Re-format extracted data
If we look closely at the sample data extracted for departure time and arrival time, the format is messy, with too many blank spaces. To fix this, we need to re-format these data fields.
All redundant spaces should now be removed, and the data looks right.
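If you ever need to do the same clean-up outside Octoparse, for example on data that has already been exported, a rough Python equivalent could look like the sketch below. The function name `clean_whitespace` and the sample value are made up for illustration; this is not a description of Octoparse's own re-format options.

```python
import re

def clean_whitespace(value):
    # Collapse every run of whitespace (spaces, tabs, newlines) into a
    # single space, then trim the leading and trailing space.
    return re.sub(r"\s+", " ", value).strip()

# Hypothetical messy value, similar in spirit to the sample data.
raw_departure = "  08:00   \n   PEK  T3 "
cleaned = clean_whitespace(raw_departure)
```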
Once we have finished re-formatting all the data fields needed, click "Save".
Step 6. Start running your task
Now that we are done configuring the task, it's time to run it and get the data we want.
There are two options: Local Extraction and Cloud Extraction (premium plan). With Local Extraction, the task runs on your own machine. With Cloud Extraction, the task runs on the Octoparse Cloud platform, so you can set it up, turn off your desktop or laptop, and the data will be extracted and saved to the cloud automatically. Features such as scheduled extraction, IP rotation, and the API are also supported in the Cloud. Find out more about Octoparse Cloud here.
Step 7. Check the data and export
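As a rough illustration of what "check, then export" can mean in code, the Python sketch below refuses to export rows with missing fields and then writes the rest as CSV text. The field names and the sample row are assumptions made for this example, and the helper `to_csv` is hypothetical; it is not part of Octoparse.

```python
import csv
import io

def to_csv(rows, fieldnames):
    # Check step: reject any row with a missing or empty field.
    for i, row in enumerate(rows):
        missing = [f for f in fieldnames if not row.get(f)]
        if missing:
            raise ValueError(f"row {i} is missing {missing}")
    # Export step: write a header line followed by one line per row.
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

# Hypothetical extracted row.
rows = [{"airline": "Example Air", "departure": "08:00", "arrival": "10:15"}]
csv_text = to_csv(rows, ["airline", "departure", "arrival"])
```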
Good job completing this tutorial! Check out more related case studies:
Or learn more about how Octoparse can help you get the data you want: