
Stock Market Analysis using Web Scraping in 2020

Wednesday, January 13, 2021

Stock Market Data Entry

Investment firms are racing to develop sophisticated algorithms for stock trading. Whether the goal is stock price prediction, stock market sentiment analysis, or equity research, they need a large volume of accurate data, and they often have the capital to hire a team of developers to collect it. Independent researchers who want to predict the stock market have an affordable alternative: web scraping, which can collect the data at scale with little effort.

 

Table of Contents

Stock Market Data Entry

Data Scraping Preparation

Data Scraping Case Study

 

In this tutorial, I will show you how to extract up-to-date stock data for further actions. 

 

Data Scraping Preparation:

  1. This method doesn’t require coding. You can extract valuable information from stock market websites without a technical background.
  2. We need a web scraping tool. This tutorial uses Octoparse, so it is best to have it installed on your computer.
  3. For a step-by-step guide to fetching finance data, read the article “Scrape information from Yahoo Finance”.

 


 

 

 

Case Study: Stock Market Data Scraping

Let’s dive right into it.

 


 

 

As an example, we will scrape the balance sheet for Bank of America (BAC) stock from Yahoo! Finance. With the balance sheet in hand, you can build a database that combines it with the historical stock price. From that data, you could go on to build algorithms or machine learning models that correlate balance sheet figures with the price of a stock. As you scale up the number of stocks, you get a bigger pipeline of data to train your model.

The URL we are going to need is https://finance.yahoo.com/quote/BAC/balance-sheet?p=BAC
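
If you later want to repeat the task for other stocks, the URL can be generated from the ticker symbol. The snippet below is a minimal sketch, assuming other tickers follow the same pattern as the BAC URL above. The function name `balance_sheet_url` is hypothetical, and the site may change its URL scheme at any time.

```python
from urllib.parse import quote, urlencode


def balance_sheet_url(ticker: str) -> str:
    """Build a balance-sheet URL that mirrors the BAC example in this post.

    Assumption: every ticker uses the same /quote/<TICKER>/balance-sheet?p=<TICKER>
    pattern. Only the BAC URL is confirmed by the article.
    """
    symbol = ticker.strip().upper()
    if not symbol:
        raise ValueError("ticker must not be empty")
    # Escape the symbol so unusual characters cannot break the path.
    path = f"https://finance.yahoo.com/quote/{quote(symbol, safe='')}/balance-sheet"
    return f"{path}?{urlencode({'p': symbol})}"


print(balance_sheet_url("bac"))
# → https://finance.yahoo.com/quote/BAC/balance-sheet?p=BAC
```

Each generated URL can then be entered into the tool in the same way as the one above.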

 

Step 1: Create a new project

Click “+ Task” under Advanced Mode. Enter the URL into the box and click “Save URL”.

This opens the Bank of America balance sheet page in Octoparse’s built-in browser.

The data is presented in table cells, so the bot needs to scrape it row by row. To see what I mean, open Chrome developer tools and inspect the page source. The whole table is built from <tr> elements, and each <tr> contains multiple <td> elements that hold the data for one row. The data we are going to extract is stored inside each <td>. It makes sense for the bot to follow the logic of the source code and extract the information by rows.

 


Web Data HTML Structure
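
For readers who are curious what “extracting by rows” means in code, here is a minimal sketch using only Python’s standard library. It is not how Octoparse works internally, and the `SAMPLE_HTML` table is a made-up placeholder rather than the real Yahoo! Finance markup. It only illustrates the idea of walking each <tr> and collecting the text of its <td> cells.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Placeholder table: labels and numbers are invented for illustration.
SAMPLE_HTML = """
<table>
  <tr><td>Label A</td><td>100</td><td>90</td></tr>
  <tr><td>Label B</td><td>40</td><td>35</td></tr>
</table>
"""

parser = TableRowParser()
parser.feed(SAMPLE_HTML)
for row in parser.rows:
    print(row)
# → ['Label A', '100', '90']
# → ['Label B', '40', '35']
```

Each printed list corresponds to one <tr>, which is the same unit the bot extracts in the steps that follow.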

 

 

Step 2: Select data you want to scrape

Next, we need to tell the bot what data we want to obtain. Click any number in a table cell, and the bot finds the other numbers in the same column. As mentioned earlier, we need to follow the logic of the source code and extract by rows, so click “TR” at the bottom of the Action Panel. Octoparse now selects the first row. Choose “Select All Sub-Element”, then choose “Select All” to proceed.

 


Octoparse Selecting Web Data

 

Step 3: Confirm your selection

Now all elements have been selected successfully. Choose the “Extract Data in the loop” command to continue.

 

Step 4: Start Scraping

Now we have finished building the crawler! Click “Start Extraction” and choose “Local Extraction” to run the task. “Local Extraction” runs the crawler on your own computer. Unlike Cloud Extraction, which distributes multiple parallel extractions across different servers, Local Extraction uses only local resources, so its speed depends on your internet connection and hardware, and it is likely to get overloaded if you run tasks concurrently. Cloud Extraction is therefore the better choice for large-scale extractions.

 

Step 5: Check the data you scraped

The scraped data should look like this. You can export it in your preferred format.

 


Financial Data Scraped by Octoparse
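
Once the data is exported, it can be loaded for analysis. The snippet below is a minimal sketch, assuming you exported a CSV whose first column is the line-item label and whose remaining columns are reporting periods. The column names and values in `EXPORTED_CSV` are invented placeholders, and `parse_amount` and `load_rows` are hypothetical helper names, so adjust them to match your actual export.

```python
import csv
import io

# Placeholder export: column names and numbers are invented for illustration.
EXPORTED_CSV = """Item,Period 1,Period 2
Example asset line,"1,200","1,100"
Example liability line,800,-
"""


def parse_amount(text):
    """Turn a cell such as '1,200' into a float; empty or '-' cells become None."""
    cleaned = text.strip().replace(",", "")
    if cleaned in ("", "-"):
        return None
    return float(cleaned)


def load_rows(handle):
    """Read the export into {label: {period: amount}}."""
    reader = csv.reader(handle)
    header = next(reader)
    periods = header[1:]
    table = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        label, values = row[0], row[1:]
        table[label] = {p: parse_amount(v) for p, v in zip(periods, values)}
    return table


# With a real export, open the file instead: open("export.csv", newline="")
table = load_rows(io.StringIO(EXPORTED_CSV))
print(table["Example asset line"]["Period 1"])
# → 1200.0
```

Converting the cells to numbers up front makes it easier to combine the balance sheet with price data later.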

 

 

Data Scraping for Market Analysis

Now we have the Bank of America balance sheet from 2015 to 2018, but how can you use it in a market analysis?

 

I am not an expert in financial investment, and this blog does not provide financial advice. Hopefully, it can give you an idea of how to research companies worth investing in.

 

When evaluating an investment opportunity, a fundamental step is to analyze how a company performs by examining its balance sheet, because the balance sheet is the financial statement of a company’s assets, liabilities, and shareholders’ equity. If current assets are greater than current liabilities, the company can cover its short-term debts and is likely in a favorable position. When a company grows at a steady pace over the years, it is more likely that your investment is in good hands. However, the balance sheet of a bank is much more complicated than that of a typical company. Bank investments tend to be riskier yet lucrative, so it is reasonable to keep tabs on the bank’s financial performance to make a well-informed decision.
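
The comparison of current assets with current liabilities is commonly expressed as the current ratio. The snippet below is a minimal sketch of that calculation. The function name `current_ratio` is hypothetical, and the input figures are placeholders, not Bank of America’s actual numbers. As noted above, a bank’s balance sheet is more complicated, so this generic check may not translate directly to banks.

```python
def current_ratio(current_assets, current_liabilities):
    """Return current assets divided by current liabilities.

    A value above 1 means current assets exceed current liabilities.
    """
    if current_liabilities <= 0:
        raise ValueError("current_liabilities must be positive")
    return current_assets / current_liabilities


# Placeholder figures, invented for illustration only.
ratio = current_ratio(current_assets=150.0, current_liabilities=100.0)
print(f"current ratio: {ratio:.2f}")
# → current ratio: 1.50
```

Running the same calculation for each year in the scraped balance sheet shows whether the ratio is improving or deteriorating over time.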

 

Author: Ashley

Ashley is a data enthusiast and passionate blogger with hands-on experience in web scraping. She focuses on capturing web data and analyzing it in a way that empowers companies and businesses with actionable insights. Read her blog to discover practical tips and applications of web data extraction.


  
