
Scrape cryptocurrencies information from Yahoo Finance

Thursday, January 5, 2017 3:07 AM


 

A cryptocurrency is a digital or virtual currency that is secured by cryptography, which makes it nearly impossible to counterfeit or double-spend. Many cryptocurrencies are decentralized networks based on blockchain technology—a distributed ledger enforced by a disparate network of computers. 

Cryptocurrency traders need to monitor price fluctuations because prices can change within seconds. Octoparse can run a scraping task on a schedule so that the data stays up to date.

In this tutorial, we are going to show you how to scrape cryptocurrency info from Yahoo Finance.

 

For Yahoo Finance, you can also use the easy-to-use "Task Template" on the main screen of the Octoparse scraping tool. All you need to do is enter a few parameters and the task is ready to go. For further details, see: Task Templates

 

To follow along, you can use this URL:

https://finance.yahoo.com/cryptocurrencies?count=50&offset=0
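
The URL carries two query parameters, count and offset. The sketch below shows one way to read them, assuming (this is not confirmed by the tutorial) that count is the number of rows per page and offset is the index of the first row. The helper name page_url is hypothetical, and the tutorial itself paginates with the Next button rather than by editing the URL.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit

START_URL = "https://finance.yahoo.com/cryptocurrencies?count=50&offset=0"


def page_url(url, page_index):
    """Build the URL for a zero-based page index.

    Assumption: offset advances by count rows per page.
    """
    parts = urlsplit(url)
    # parse_qs returns lists of values; keep the first value of each key.
    query = {key: values[0] for key, values in parse_qs(parts.query).items()}
    count = int(query["count"])
    query["offset"] = str(page_index * count)
    return urlunsplit(parts._replace(query=urlencode(query)))


print(page_url(START_URL, 0))
print(page_url(START_URL, 2))
```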


We will scrape data such as the Symbol and Name from the cryptocurrency chart with Octoparse.

[Image: Cryptocurrency chart]
Here are the main steps in this tutorial:  [Download task file here]

1. Go to Web Page - to open the targeted web page

2. Auto-detect web page data - to create the workflow

3. Extract data - to refine the data fields

4. Modify XPath of Pagination - to fix endless scraping

5. Start extraction - to run the task and get the data

 

1. Go to Web Page - to open the targeted web page

  • Enter the page URL on the home screen and click Start to create a new task

 

2. Auto-detect web page data - to create the workflow

  • Choose Auto-detect web page data and wait for detection to complete
  • Click Switch auto-detect results on the Tips panel until the table data is selected
  • Uncheck Add a page scroll

 

  • Click Create workflow
  • Click on Click to Paginate action
  • Extend the AJAX timeout to 7-10s
  • Click Apply to save

 

3. Extract data - to refine the data fields

  • Switch to vertical view
  • Rename fields by double-clicking each field name
  • Delete unwanted fields by clicking the delete icon next to each field

 

Tip!

A field name can only include letters, numbers, and "_". Also, it must not start with a number.

 

We need to modify the XPath for some fields to make the data scraping more precise.

(1) Price: //fin-streamer[@data-field="regularMarketPrice"]

(2) Marketcap: //fin-streamer[@data-field="marketCap"]

  • Click ... -> Customize XPath
  • Paste the XPath provided above and click Apply to save
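
To see why these XPaths are more precise, here is a minimal sketch run outside Octoparse. Each expression picks a cell by its data-field attribute rather than by its column position. The sample markup, the symbols and the function name extract_fields are made up for illustration. Python's built-in ElementTree only handles well-formed markup and a subset of XPath, so a real page would likely need a full HTML parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup for two table rows. It is not copied from
# Yahoo Finance; only the fin-streamer tag and the data-field values come
# from the XPaths in this tutorial.
SAMPLE_TABLE = """
<table>
  <tr>
    <td>AAA-USD</td>
    <td><fin-streamer data-field="regularMarketPrice">101.50</fin-streamer></td>
    <td><fin-streamer data-field="regularMarketChange">-1.25</fin-streamer></td>
    <td><fin-streamer data-field="marketCap">2.5B</fin-streamer></td>
  </tr>
  <tr>
    <td>BBB-USD</td>
    <td><fin-streamer data-field="regularMarketPrice">0.75</fin-streamer></td>
    <td><fin-streamer data-field="regularMarketChange">0.02</fin-streamer></td>
    <td><fin-streamer data-field="marketCap">310M</fin-streamer></td>
  </tr>
</table>
"""


def extract_fields(markup):
    """Return one dict per table row, selecting cells by data-field."""
    root = ET.fromstring(markup.strip())
    rows = []
    for tr in root.iter("tr"):
        # ".//" limits the search to the current row.
        price = tr.find('.//fin-streamer[@data-field="regularMarketPrice"]')
        cap = tr.find('.//fin-streamer[@data-field="marketCap"]')
        rows.append({"Price": price.text, "Marketcap": cap.text})
    return rows


for row in extract_fields(SAMPLE_TABLE):
    print(row)
```

Because the match is on the attribute value, the regularMarketChange cell is skipped even though it uses the same tag.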

 

 

4. Modify XPath of Pagination - to fix endless scraping

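The auto-generated XPath of Pagination needs to be modified; otherwise the task will not stop, because Octoparse keeps scraping the last page. The XPath below matches the Next button only while it is not disabled, so the loop ends on the last page. Check out details about this issue here.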

  • Click on Pagination
  • Input the new XPath //button[not(@disabled)]//span[text()="Next"]
  • Click Apply to confirm
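
The effect of the not(@disabled) condition can be sketched outside Octoparse. The markup below is hypothetical and assumes the last page marks its Next button with a disabled attribute. Python's built-in ElementTree does not support not(), so the function has_clickable_next (an illustrative name) applies the same three conditions by hand.

```python
import xml.etree.ElementTree as ET

# Hypothetical pagination markup, not copied from Yahoo Finance.
MIDDLE_PAGE = (
    "<div>"
    "<button><span>Prev</span></button>"
    "<button><span>Next</span></button>"
    "</div>"
)
LAST_PAGE = (
    "<div>"
    "<button><span>Prev</span></button>"
    '<button disabled="disabled"><span>Next</span></button>'
    "</div>"
)


def has_clickable_next(markup):
    """Mimic //button[not(@disabled)]//span[text()="Next"]."""
    root = ET.fromstring(markup)
    for button in root.iter("button"):
        if "disabled" in button.attrib:    # not(@disabled)
            continue
        for span in button.iter("span"):   # //span inside the button
            if span.text == "Next":        # text()="Next"
                return True
    return False


print(has_clickable_next(MIDDLE_PAGE))
print(has_clickable_next(LAST_PAGE))
```

Without the disabled check, both pages would match, which is the endless-scraping case described above.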

 

5. Start extraction - to run the task and get the data

  • Click Save
  • Click Run on the upper left side
  • Select Run on your device to run the task on your computer, or select Run in the Cloud to run the task in the Cloud (for premium users only). You can also schedule the task so that the data is updated regularly

 

You can export the result data in the provided formats, such as Excel, CSV, or JSON, or send it to your database.
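
Once exported, the data can be read with ordinary tools. The sketch below loads a CSV export with Python's standard library. The column names follow the fields named in this tutorial, but the sample rows and the function name load_export are made up for illustration.

```python
import csv
import io

# Hypothetical export content; a real export may use different columns.
SAMPLE_EXPORT = (
    "Symbol,Name,Price,Marketcap\n"
    "AAA-USD,Alpha USD,101.50,2.5B\n"
    "BBB-USD,Beta USD,0.75,310M\n"
)


def load_export(fileobj):
    """Read exported rows and convert Price to a float."""
    rows = []
    for record in csv.DictReader(fileobj):
        # Drop thousands separators before converting, e.g. "1,234.5".
        record["Price"] = float(record["Price"].replace(",", ""))
        rows.append(record)
    return rows


rows = load_export(io.StringIO(SAMPLE_EXPORT))
for row in rows:
    print(row["Symbol"], row["Price"])
```

For a real file, pass open("export.csv", newline="", encoding="utf-8") instead of the StringIO object; the file name here is only a placeholder.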

 

Here is the sample output.

[Image: Yahoo Finance demo data]

 


 

Happy Data Hunting!

Author: The Octoparse Team
