Scraping Data from Fifa.com
Monday, January 02, 2017 9:56 PM
In this web scraping tutorial we will scrape the men's football rankings from fifa.com with Octoparse.
The website URL we will use is http://www.fifa.com/fifa-world-ranking/ranking-table/men/index.html?intcmp=fifacom_hp_module_associations.
The data fields include the team name, the team's ranking, total points, previous points, Avg(2016-100%), AVG WGT(2016-100%), Avg(2015-50%), AVG WGT(2015-50%), Avg(2014-30%), AVG WGT(2014-30%), Avg(2013-20%), and AVG WGT(2013-20%).
You can download the task (the .otd file) directly and begin collecting the data, or you can follow the steps below to build a task that scrapes the men's football rankings from fifa.com. (Download my extraction task for this tutorial HERE in case you need it.)
Step 1. Set up basic information.
Click "Quick Start" ➜ Choose "New Task (Advanced Mode)" ➜ Complete the basic information ➜ Click "Next".
Step 2. Enter the target URL in the built-in browser ➜ Click the "Go" icon to open the webpage.
Step 3. Click the "More" button to reveal more information.
We need to keep clicking on the "More" button at the bottom of the web page to reveal more data about men's football teams.
Click on the "More" button ➜ Click the "Expand the selected area" button ➜ Choose "Loop click in the element" to create a loop automatically ➜ Click "Save".
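For readers who want to see the idea behind "Loop click in the element" in code, here is a minimal Python sketch. It is not how Octoparse works internally. `click_until_exhausted` and `FakePage` are hypothetical names, and the row counts (120 rows revealed 50 at a time) are made-up numbers used only to drive the example.

```python
from typing import Callable


def click_until_exhausted(click_more: Callable[[], bool], max_clicks: int = 100) -> int:
    """Keep calling click_more() until it reports that the button is gone.

    click_more is a hypothetical callback: it clicks the "More" button and
    returns True, or returns False when there is no button left to click.
    max_clicks is a safety cap so a misbehaving page cannot loop forever.
    """
    clicks = 0
    while clicks < max_clicks and click_more():
        clicks += 1
    return clicks


class FakePage:
    """Stand-in for a page that reveals table rows in batches (illustrative only)."""

    def __init__(self, total_rows: int, batch_size: int) -> None:
        self.total_rows = total_rows
        self.batch_size = batch_size
        self.visible_rows = min(batch_size, total_rows)

    def click_more(self) -> bool:
        if self.visible_rows >= self.total_rows:
            return False  # every row is visible, so there is nothing to click
        self.visible_rows = min(self.visible_rows + self.batch_size, self.total_rows)
        return True


# Made-up numbers: 120 rows in total, 50 revealed per click.
page = FakePage(total_rows=120, batch_size=50)
clicks = click_until_exhausted(page.click_more)
print(f"clicked {clicks} times, {page.visible_rows} rows visible")
```

The cap on the number of clicks is a design choice for the sketch: without it, a button that never disappears would keep the loop running indefinitely.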
Step 4. Move your cursor over the section of the table from which you want to extract data about the football teams.
Click the first football team ➜ Click the "Expand the selected area" button to select the whole row ➜ Click "Create a list of items" to create a list of sections with a similar layout ➜ Click "Add current item to the list".
The first football team has now been added to the list. ➜ Click "Continue to edit the list".
Click the second football team ➜ Click the "Expand the selected area" button to select the whole row ➜ Click "Add current item to the list" again. At this point the list contains only the first 50 teams, which share a similar layout. ➜ Locate the team ranked 51st ➜ Click the 51st football team ➜ Click the "Expand the selected area" button to select the whole row. The list now contains all the teams in the table. ➜ Click "Finish Creating List" ➜ Click "loop" to process the list and extract the elements in each row.
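The list built in Step 4 is, in effect, a selection of every row in the table. A rough Python equivalent, using only the standard library, might look like the sketch below. The markup in `SAMPLE` is hypothetical and heavily simplified (the real page's tags and attributes may differ), and the team names are placeholders rather than real ranking data.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; placeholder values, not real ranking data.
SAMPLE = """
<table>
  <thead><tr><th>Rank</th><th>Team</th></tr></thead>
  <tbody>
    <tr><td>1</td><td>Team A</td></tr>
    <tr><td>2</td><td>Team B</td></tr>
    <tr><td>3</td><td>Team C</td></tr>
  </tbody>
</table>
"""


def select_rows(markup: str) -> list[ET.Element]:
    """Return every body row of the table, i.e. the items a loop would visit."""
    root = ET.fromstring(markup.strip())
    # The path is relative to the <table> root: its <tbody>, then each <tr>.
    return root.findall("./tbody/tr")


rows = select_rows(SAMPLE)
for row in rows:
    # Collect the text of each cell in the current row.
    print([cell.text for cell in row.findall("td")])
```

`xml.etree.ElementTree` only accepts well-formed markup, so this works on the tidy sample above; real-world HTML usually needs a more forgiving parser.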
Step 5. Extract the rankings information from the table.
Click the team name ➜ Select "Extract text". The other fields can be extracted in the same way.
All the extracted content will be listed under Data Fields. ➜ Click a "Field Name" to rename it. Then click "Save".
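Extracting text and then naming the fields amounts to pairing each cell's text with a label. The following Python sketch illustrates that pairing under stated assumptions: `FIELD_NAMES` covers only the first four fields and uses names chosen for this example, `name_fields` is a hypothetical helper, and the cell values are placeholders.

```python
# Field names chosen for this example; rename them to suit your own task.
FIELD_NAMES = ["rank", "team", "total_points", "previous_points"]


def name_fields(cell_texts: list[str], field_names: list[str] = FIELD_NAMES) -> dict[str, str]:
    """Pair each cell's text with a field name, trimming stray whitespace."""
    if len(cell_texts) != len(field_names):
        raise ValueError(
            f"expected {len(field_names)} cells, got {len(cell_texts)}"
        )
    return {name: text.strip() for name, text in zip(field_names, cell_texts)}


# Placeholder values, not real ranking data.
record = name_fields([" 1 ", "Team A", "1000", "990"])
print(record)
```

Raising an error on a length mismatch is deliberate: a row with a missing or extra cell would otherwise be silently mislabelled.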
Step 6. Check the workflow.
Now check the workflow by clicking each action in turn, starting from the beginning.
Go to the webpage ➜ The first Cycle Pages box (the "More" button has been selected correctly) ➜ Click to Paginate (tick the "AJAX Load" checkbox and set a timeout of 4 seconds) ➜ The first Loop Item box (all the teams have been selected) ➜ Extract Data (all the data fields are extracted correctly).
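The 4-second AJAX timeout gives the page time to load new rows after each click. One common way to implement such a wait is to poll a condition until it holds or a deadline passes, as in this Python sketch. It makes no claim about how Octoparse implements its own timeout; `wait_for` is a hypothetical helper, and the 0.3-second "load time" is simulated.

```python
import time
from typing import Callable


def wait_for(condition: Callable[[], bool], timeout: float = 4.0, interval: float = 0.1) -> bool:
    """Poll condition() until it returns True or `timeout` seconds elapse.

    Returns True if the condition was met in time, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Simulated load that "finishes" 0.3 seconds after we start waiting.
start = time.monotonic()
loaded = wait_for(lambda: time.monotonic() - start >= 0.3, timeout=4.0)

# A condition that never becomes true runs into the (shortened) timeout.
timed_out = wait_for(lambda: False, timeout=0.2, interval=0.05)

print(loaded, timed_out)
```

Polling returns as soon as the content is ready, so a generous timeout costs little on fast connections while still protecting against slow ones.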
Step 7. Click "Save" to save your configuration. Then click "Next" ➜ Click "Next" ➜ Click "Local Extraction" to run the task on your computer. Octoparse will automatically extract all the data selected.
Step 8. The extracted data will be shown in the "Data Extracted" pane. Click the "Export" button to export the results to an Excel file, a database or another format, and save the file to your computer.
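If you later want to post-process the rows yourself, writing them to CSV (which Excel can open) takes only a few lines of Python. This is a minimal sketch using the standard `csv` module. The field names and records are placeholders invented for the example, and `export_csv` is a hypothetical helper, not part of Octoparse.

```python
import csv
import io
from typing import TextIO

# Placeholder field names and values, not real ranking data.
FIELDS = ["rank", "team", "total_points"]
RECORDS = [
    {"rank": "1", "team": "Team A", "total_points": "1000"},
    {"rank": "2", "team": "Team B", "total_points": "990"},
]


def export_csv(records: list[dict[str, str]], field_names: list[str], out: TextIO) -> None:
    """Write a header row followed by one CSV line per record."""
    writer = csv.DictWriter(out, fieldnames=field_names)
    writer.writeheader()
    writer.writerows(records)


# Write to an in-memory buffer so the example runs without touching the disk.
buffer = io.StringIO()
export_csv(RECORDS, FIELDS, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write a real file instead, pass a handle opened with `open("rankings.csv", "w", newline="", encoding="utf-8")`; the file name here is only an example.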
Author: The Octoparse Team