Web Scraping Case Study | Scraping Data from YouTube
Monday, April 24, 2017 5:01 AM
In this tutorial, I will show you, step by step, how to scrape data from YouTube.com with Octoparse.
Features covered:
- Build a list
- Expand Selection Area
- Local Extraction
Now, let's get started!
Step 1. Set up basic information
- Click "New Task" to start a new task, then fill in the basic information
Step 2. Navigate to the target website
- Enter the target URL in the built-in browser (example URL: https://www.youtube.com/feed/trending)
- Click the "Go" icon to open the webpage
Step 3. Create a list of items
To proceed with the extraction, we'll first need to build a list of items to extract from.
Move your cursor over the YouTube listings that share a similar layout; these are the items you will extract data from.
- Click anywhere on the first listing section
Notice that the first click does not select the whole section. We therefore need to expand the selection so that the entire section is captured.
- Click “Expand the selection area” until the outlined box includes all the content you want to scrape.
- When prompted, click “Create a list of items” (sections with similar layout)
- Click “Add current item to the list”
The first item is now in the list. Next, we need to add the remaining items.
- Click “Continue to edit the list”
- Click a second section with a similar layout and, as before, expand the selection to include the whole section
- Click “Add current item to the list” again
All the sections are now added to the list.
- Click “Finish Creating List”
- Click “loop”. This tells Octoparse to click on each section in the list and extract the selected data
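Octoparse handles this entirely through its point-and-click interface, so no code is required. For readers curious about what "Create a list of items" plus "loop" amount to, here is a minimal sketch of the same idea using Python's standard `html.parser`: find every block that shares a layout, then extract the text of chosen fields from each block. It is not how Octoparse works internally. The class names `video-item`, `title` and `channel`, and the sample markup itself, are hypothetical placeholders rather than YouTube's real HTML, and the sketch parses a small inline sample instead of fetching the live page.

```python
from html.parser import HTMLParser

# Hypothetical sample markup: class names are placeholders, not YouTube's real HTML.
SAMPLE_HTML = """
<div class="video-item"><a class="title">How Tall is Giant</a><span class="channel">Channel A</span></div>
<div class="video-item"><a class="title">Second video</a><span class="channel">Channel B</span></div>
"""


class ListingParser(HTMLParser):
    """Collects one record per repeated item block (the "list of items").

    Simplifying assumptions: no void elements (e.g. <img>, <br>) inside items,
    and no nested tags inside a field element.
    """

    def __init__(self, item_class, field_classes):
        super().__init__()
        self.item_class = item_class          # class shared by every listing block
        self.field_classes = set(field_classes)
        self.rows = []                        # one dict per listing, like a loop's output
        self._depth = 0                       # nesting depth inside the current item
        self._field = None                    # field whose text is being captured

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self._depth == 0 and self.item_class in classes:
            # Entering a new listing: start a new record.
            self.rows.append({})
            self._depth = 1
            return
        if self._depth > 0:
            self._depth += 1
            for cls in classes:
                if cls in self.field_classes:
                    self._field = cls

    def handle_endtag(self, tag):
        if self._depth > 0:
            self._depth -= 1
            self._field = None

    def handle_data(self, data):
        # The "Extract text" step: store visible text for the active field.
        if self._field and data.strip():
            row = self.rows[-1]
            row[self._field] = row.get(self._field, "") + data.strip()


parser = ListingParser("video-item", ["title", "channel"])
parser.feed(SAMPLE_HTML)
for row in parser.rows:
    print(row)
```

After `feed`, `parser.rows` holds one dictionary per listing, which mirrors the one-row-per-item table Octoparse builds when it loops over the list.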
Step 4. Select the data to be extracted
We have now arrived at the detail page we want to capture data from. Click the specific data you want to extract.
- Click the video info data field “How Tall is Giant”
- Select “Extract text”
- Repeat the same steps for each of the other data fields you need.
- Click "Save"
Step 5. Rename Data Fields
Rename any fields if necessary.
Step 6. Start running your task
- After saving your extraction configuration, click “Next”
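In Octoparse, renaming is done directly in the field list. If you later post-process exported data in code, the same step is just a key mapping. A minimal sketch, where the original names `Field1` and `Field2` are hypothetical placeholders rather than Octoparse's guaranteed defaults:

```python
def rename_fields(rows, mapping):
    """Return copies of rows with keys renamed per mapping; unmapped keys are kept."""
    return [{mapping.get(key, key): value for key, value in row.items()} for row in rows]


# Hypothetical field names, used only for illustration.
rows = [{"Field1": "How Tall is Giant", "Field2": "Channel A"}]
renamed = rename_fields(rows, {"Field1": "Title", "Field2": "Channel"})
print(renamed)  # [{'Title': 'How Tall is Giant', 'Channel': 'Channel A'}]
```

Keys missing from `mapping` pass through unchanged, so you only need to list the fields you actually want to rename.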
- Select “Local Extraction”
- Click “OK” to run the task on your computer.
Octoparse will automatically extract all the data selected. Check the "Data Extracted" pane for the extraction progress.
Step 7. Check the data and export
- Check the extracted data
- Click the "Export" button to export the results to an Excel file, a database, or another format, and save the file to your computer
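Octoparse's "Export" button does this for you. As a rough illustration of what a flat export looks like, here is a minimal sketch that writes extracted rows to CSV (a format Excel can open) with Python's standard `csv` module. The field names and row values are hypothetical sample data, not real results.

```python
import csv
import io


def export_csv(rows, fieldnames, stream):
    """Write rows as CSV: one header line, then one line per extracted item."""
    writer = csv.DictWriter(stream, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)


# Hypothetical extracted rows, used only for illustration.
rows = [
    {"Title": "How Tall is Giant", "Channel": "Channel A"},
    {"Title": "Second video", "Channel": "Channel B"},
]
buffer = io.StringIO()
export_csv(rows, ["Title", "Channel"], buffer)
print(buffer.getvalue())
```

To save to disk instead, pass a file opened with `open("youtube_trending.csv", "w", newline="", encoding="utf-8")` (the filename is only an example). `newline=""` is what the `csv` documentation recommends, so the writer controls line endings itself.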
Good job completing this tutorial! You can download this example and run it yourself.
Now check out similar case studies:
- Web Scraping Case Study | Security System News
- How to Scrape WordPress Posts
- Web Scraping - Scraping Facebook That Required Login with Octoparse