Over 2.5 quintillion (2.5 × 10^18) bytes of data are generated every day. With that much data, the need for data analysis has never been clearer, and demand for data analysts has grown with it. Indeed.com reported that the growth rate for this profession had reached more than 4,000 percent. This guide shows how to get started and get familiar with data, especially if you want to pursue a career in data analytics.
What is a data analyst?
Data analytics is a fundamental part of every industry. As such, a data analyst has a broad career path in different industries.
The following roles are in high demand:
Market research analyst: conducts research to analyze the current market landscape. They collect data on consumer behavior, buying habits, and so on, then estimate product demand to help companies optimize sales. The entry-level salary is $51,000 to $65,000.
Financial analyst: works with financial data to provide models and forecasts. Investment industries, such as investment banking, rely heavily on data to explore investment opportunities. The entry-level salary is $54,700 to $69,000.
Business analyst: turns data into actionable business insights. The role requires extensive skills in Excel, Power BI, and SQL. The entry-level salary is $52,700 to $66,000.
What tech skills do you need to succeed?
- SQL: Structured Query Language is designed to access, manage, and manipulate databases. It is a basic requirement for a data analyst.
- Excel: For lighter, quicker analysis, advanced Excel skills such as writing macros in VBA and using lookup functions are required.
- Statistical programming: R, MATLAB, and SAS are statistical languages for exploring large datasets and presenting them in graphs that make the results easier to understand.
- Data visualization: The ability to present and explain results is just as essential. Power BI and Tableau are considered standard analytics tools.
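To give a feel for the kind of SQL an analyst writes every day, here is a minimal sketch that runs a query through Python's built-in sqlite3 module. The "sales" table, its columns, and the values in it are made up for illustration and are not part of this guide's dataset.

```python
import sqlite3

# In-memory database with a hypothetical "sales" table (placeholder values).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("North", 120.0), ("South", 80.0), ("North", 60.0)],
)

# Aggregate the rows: total amount per region, largest total first.
totals = conn.execute(
    "SELECT region, SUM(amount) AS total FROM sales "
    "GROUP BY region ORDER BY total DESC"
).fetchall()
conn.close()

for region, total in totals:
    print(region, total)
```

The same SELECT ... GROUP BY pattern carries over to other SQL databases, although each product has its own dialect details.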
On top of all these, you need to build a pool of data to analyze.
Web scraping can't replace analytical skills, but it complements them. Most of the time, data analysts have to cope with messy data unless they know a better way to locate and extract structured data. Luckily, there is a quick way to get started: a web scraping tool like Octoparse. There are many other options as well.
A scraping example
Let's walk through an example that combines web scraping, Excel, and Tableau. The end goal is to examine the relationship between a country's GDP per capita and its internet user growth rate.
To do this, we need data from two sources:
- GDP per capita (https://www.cia.gov/library/publications/the-world-factbook/)
- Internet user growth rate (https://www.internetworldstats.com/top20.htm)
This is a preview of the complete workflow. Octoparse lets you interact with the webpage and extract the desired information by pointing and clicking. The workflow is visualized, and you can edit it by dragging and dropping.
After you finish setting up the crawler, click the "start extraction" button. Octoparse then retrieves the data for you. The best part is that the extracted data is structured, which saves you much of the time you would otherwise spend cleaning it.
I scraped the data and put it into spreadsheets. You are welcome to practice with them.
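If you prefer code to a point-and-click tool, the same idea of turning an HTML table into structured rows can be sketched with Python's standard library. This is not how Octoparse works internally; it is only a rough illustration. The TableParser class and the SAMPLE_HTML snippet are hypothetical, and the values are placeholders rather than figures from the sources above.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Keep text only while inside a cell; whitespace between tags is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Hypothetical snippet standing in for a downloaded page (placeholder values).
SAMPLE_HTML = """
<table>
  <tr><th>Country</th><th>Growth</th></tr>
  <tr><td>Country A</td><td>10 %</td></tr>
  <tr><td>Country B</td><td>25 %</td></tr>
</table>
"""

parser = TableParser()
parser.feed(SAMPLE_HTML)
header, *records = parser.rows
print(header)
print(records)
```

Real pages are rarely this tidy, which is one reason a dedicated scraping tool or library usually saves time.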
Using Excel to look up the values
Next, we use the INDEX and MATCH functions to join each country with its corresponding values (internet user growth rate and GDP per capita) from two separate spreadsheets.
=INDEX(column to return a value from, MATCH(lookup value, column to look up against, 0))
First, the MATCH function takes the "country" from Sheet 2 and returns its position in the lookup range.
Then, the INDEX function takes that position and returns the corresponding value from Sheet 1.
DATA1 and DATA2 are named lookup ranges that I defined in Sheet1. Because we are cross-referencing two sheets, it is easier to refer to a range by name than to type out its cell range.
With this formula, MATCH finds the position of the country in the country range (DATA2), and INDEX returns the corresponding value from the GDP_per_capita range (DATA1). After you type in the formula, drag the plus sign at the bottom-right corner of the cell to fill it down.
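For readers who think more easily in code, the INDEX/MATCH logic can be mimicked in a few lines of Python. This is a sketch only: the index_match helper is a hypothetical name, data1 and data2 stand in for the DATA1 and DATA2 named ranges, and the country names and numbers are placeholders, not the scraped values.

```python
# Placeholder values, not real figures.
data2 = ["Country A", "Country B", "Country C"]  # country column (lookup range)
data1 = [1000, 2000, 3000]                        # GDP per capita (return range)


def index_match(lookup_value, lookup_range, return_range):
    """Mimic =INDEX(return_range, MATCH(lookup_value, lookup_range, 0))."""
    try:
        # MATCH with 0 means exact match. Note that list.index is 0-based,
        # while Excel's MATCH returns a 1-based position.
        position = lookup_range.index(lookup_value)
    except ValueError:
        return None  # Excel would show #N/A here
    return return_range[position]  # INDEX: value at the matched position


# Countries as they appear in the second sheet, in a different order.
sheet2_countries = ["Country C", "Country A", "Country X"]
joined = [(c, index_match(c, data2, data1)) for c in sheet2_countries]
print(joined)
```

A country that is missing from the lookup range comes back as None, which is the point at which Excel would display #N/A.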
Once the values are matched, we can visualize the data. Tableau is easy to pick up: we just drag the desired values onto the dashboard. The result looks like the following chart.
You can interpret the result as follows:
There is a strong negative correlation between a country's internet user growth rate (%) and its GDP per capita: the faster the number of internet users grows, the lower GDP per capita tends to be. This makes sense, as high-GDP countries are usually more developed and have limited room to grow, whereas lower-GDP countries still have plenty of room to expand their internet infrastructure. As a result, their internet user growth rates are higher than those of advanced nations.
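To put a number on "negative correlation", one common choice is the Pearson correlation coefficient, which ranges from -1 (perfect negative) to +1 (perfect positive). The sketch below computes it by hand. The pearson function is a hypothetical helper, and the two lists are invented placeholder values, not the scraped data, so the result says nothing about real countries.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Invented placeholder values: higher "GDP" paired with lower "growth".
gdp_per_capita = [1, 2, 3, 4, 5]
growth_rate = [10, 7, 6, 4, 3]

r = pearson(gdp_per_capita, growth_rate)
print(round(r, 3))
```

A value close to -1 indicates a strong negative linear relationship. Keep in mind that correlation alone does not show that one variable causes the other.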
In conclusion, if you plan to pursue a career in data analysis, plan out your career path first, as each industry defines the job title differently. Next, hone the basic skills mentioned above; abundant free resources are available online. In addition, web scraping can be a bright spot on your resume, because it makes data analysis significantly more efficient by saving time on data collection and data cleaning.
Ashley is a data enthusiast and passionate blogger with hands-on experience in web scraping. She focuses on capturing web data and analyzing it in a way that empowers companies and businesses with actionable insights. Read her blog to discover practical tips and applications of web data extraction.