How Web Scraping Helps Hedge Funds Gain Competitive Edge

Data that was once hidden is becoming impossible to keep hidden. Advanced tools can now extract new data or scrape it from a wide range of sources on the Internet, and more in-depth analytics have enabled hedge funds to exploit new and increasingly important sources of alpha.

Around the beginning of the year, Greenwich Associates and Thomson Reuters collaborated on a study of the sweeping changes in the investment research landscape. The study, titled “The Future of Investment Research,” sets out a number of factors behind this qualitative change and makes several particularly informative observations about alternative data.

The significance of alternative datasets, such as geolocation data and satellite imagery, has been reviewed before. These datasets are showing hedge funds that there is a great deal of untapped alpha available to institutions willing to pay to acquire them, giving those institutions a vital informational advantage over their competitors.

According to the Greenwich/Thomson Reuters study, the average investment firm spends around $900,000 a year on alternative data, and annual industry budgets for alternative data are estimated at about $300 million, nearly double the previous year’s figure. The study also identifies web-scraped data as the most popular type of alternative data among investment professionals.

In web scraping (also called ‘data scraping’, ‘spidering’, or ‘automated data collection’), software is used to pull potentially valuable data from online sources. Hedge funds pay companies for this data because it can help them make smarter, better-informed investment decisions before their competitors do.
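To make the idea of pulling data from a page concrete, here is a minimal sketch using only Python's standard library. The HTML fragment, the `price` class name, and the `PriceParser` and `extract_prices` names are all assumptions made up for illustration; real sites use their own markup, and a production scraper would also need to download pages and respect each site's terms.

```python
from html.parser import HTMLParser


class PriceParser(HTMLParser):
    """Collect the text of every element whose class attribute includes "price"."""

    def __init__(self):
        super().__init__()
        self._in_price = False
        self.prices = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value can be None.
        classes = (dict(attrs).get("class") or "").split()
        if "price" in classes:
            self._in_price = True

    def handle_endtag(self, tag):
        self._in_price = False

    def handle_data(self, data):
        if self._in_price and data.strip():
            # Drop the currency symbol and thousands separators before converting.
            cleaned = data.strip().lstrip("$").replace(",", "")
            self.prices.append(float(cleaned))


def extract_prices(html):
    parser = PriceParser()
    parser.feed(html)
    return parser.prices


# Hypothetical page fragment; a real scraper would fetch this over HTTP.
SAMPLE_HTML = """
<ul>
  <li><span class="name">Camera A</span> <span class="price">$399.99</span></li>
  <li><span class="name">Camera B</span> <span class="price">$1,249.00</span></li>
</ul>
"""

prices = extract_prices(SAMPLE_HTML)
print(prices)
```

Collected day after day across many products, values like these are the raw material for the pricing datasets that data vendors sell.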

Quandl is one such company and is now at the center of the alternative data revolution. This Canadian company scrapes the web or collaborates with domain experts to compile datasets, then sells the data to hedge funds and other interested customers.

Greenwich reports that web-scraped data comes in many forms, including insights from expert networks, product pricing, web traffic data, and search trends.

For instance, Goldman Sachs Asset Management scraped web traffic data from Alexa.com and was able to spot a sharp increase in visits to the HomeDepot.com website. The asset manager was able to buy the stock before the company raised its outlook and to reap the benefits when the stock eventually appreciated.
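One simple way to flag this kind of jump in scraped traffic figures is to compare the latest reading against the recent baseline. The sketch below is only an illustration of that idea, not the method any asset manager actually used; the `is_traffic_spike` function, the threshold of three standard deviations, and the visit counts are all assumptions.

```python
from statistics import mean, stdev


def is_traffic_spike(weekly_visits, threshold=3.0):
    """Return True if the latest value sits more than `threshold`
    standard deviations above the mean of the earlier values."""
    *history, latest = weekly_visits
    if len(history) < 2:
        raise ValueError("need at least two earlier observations")
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        # Perfectly flat history: any increase counts as a spike.
        return latest > baseline
    return (latest - baseline) / spread > threshold


# Made-up weekly visit counts (in thousands), not real Alexa figures.
visits = [100, 104, 98, 101, 103, 99, 140]
print(is_traffic_spike(visits))
```

A flag like this is only a starting point: an analyst would still need to check whether the spike reflects real customer demand rather than a promotion, an outage elsewhere, or bot traffic.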

Among its various strategies, the alternative data company Eagle Alpha scrapes pricing data from big retailers, which has proven valuable as a directional indicator for consumer product sales. For example, using data scraped from electronics websites in the United States, the company observed that demand for GoPro products was decreasing and correctly concluded that the action camera manufacturer would probably miss its 2015 Q3 targets. Two days before GoPro’s underperformance was publicly announced, over 68 percent of recommendations on the stock were still to buy.

The value of social media data cannot be overstated. It is the largest dataset available for understanding social behavior, and companies are actively scraping it to uncover its hidden value.

According to a recent report by Bloomberg, “Twitter stream provides very large and wholesome alternative data sets, particularly for alpha-seeking researchers.” Bloomberg’s newly launched news service takes in finance-related Twitter feeds and scans them for news tweets that offer investible insights. To further emphasize the value of social media data, it was found that “movements of Dow Jones can be predicted by collective mood states obtained directly from the large-scale feeds from Twitter, with an accuracy of about 87.6 percent.”

A survey released by EY in November 2017 found that more than a quarter of hedge funds were using social media data in their investment strategies or expected to do so within 6-12 months. Providers get the data directly from sources like Facebook, YouTube, and Twitter, or sometimes via a web scraping tool such as Octoparse.

As popular, easily accessible websites like Amazon and Twitter become heavily scraped, hedge funds will be pushed to keep seeking new and unique data sources that reveal precise trading signals and keep them ahead of the competition. For this reason, there is no limit to how deeply firms may delve. Even the dark web may be included.

Scraped data may even include customer or individual data, especially data that can be scraped from sources like criminal records, flight logs, phone directories, and electoral registries. Given the debate over personal data that gained traction this year, particularly after the Cambridge Analytica scandal at Facebook, scrapers will no doubt soon meet stiffer resistance from advocates of more stringent data privacy laws.

Tammer Kamel, CEO and Founder of Quandl, has recently stated that there is a “healthy paranoia” among organizations about removing personal information before his company’s alternative datasets are sold, and that a misstep in this area can have severe consequences for a fund operating in the space. In any case, adequate regulatory protection is paramount at this level. The data acquired by hedge funds is not always free of personal details, which means that too much information about an individual can be compiled, as we don’t yet have a set of governing standards in place.
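As a rough illustration of what removing personal information from scraped text can involve, the sketch below masks email addresses and US-style phone numbers with regular expressions. It is not a description of Quandl's process; the patterns, the `redact_pii` name, and the sample sentence are assumptions, and simple patterns like these miss many kinds of personal data, including names.

```python
import re

# Deliberately simple patterns; real PII detection needs far broader coverage.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b")


def redact_pii(text):
    """Replace email addresses and US-style phone numbers with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)


# Invented sample record, not real data.
sample = "Contact Jane at jane.doe@example.com or 555-010-1234 for details."
print(redact_pii(sample))
```

Note that the name in the sample survives redaction, which is exactly the kind of gap that makes a “healthy paranoia” about personal details sensible.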

Last year, Hedge Fund Law Report stated that, even though e-commerce is relatively mature, the legality of automated data collection is still unsettled: although many cases have examined scraping disputes under federal and various state statutes, no single law governs the practice, and previous decisions are considered fact-specific. Realistically, a few complicated legal cases support the scrapers…

Moreover, the United States federal Computer Fraud and Abuse Act (CFAA) is a statute that imposes liability on those who deliberately access computers without authorization, or exceed their authorized access, to retrieve… information from protected computers. For this reason, many companies explicitly prohibit third parties from gathering their data. In a well-known 2017 case, HiQ Labs vs LinkedIn, LinkedIn invoked the CFAA to argue that HiQ had broken its terms of use by using bots to scrape data from public user profiles. Ultimately, LinkedIn was ordered to remove the technology that was preventing HiQ Labs from scraping, because authorization is not required to access publicly available profile pages.

It must also be mentioned that web scraping is a double-edged sword and is not always employed for the greater good. Cybercriminals can damage a company’s reputation by using it, for instance, to pilfer copyrighted content. Since there is no way to know the intentions of those behind the deployed bots, telling malicious bots apart from benign ones can be very difficult.

Furthermore, as web-scraping bots become more sophisticated, they will be able to work their way even deeper into web applications and APIs. The use of proxy IPs, for instance, makes a malicious attack even more likely to succeed.

Even with these issues emerging, hedge funds will probably not stop adopting web scraping, particularly while it keeps opening up fresh and more profitable investment opportunities and regulations are still being put in place. In fact, according to one statistic, about 46 percent of Internet traffic comes from web-scraping bots. By scraping the web for mentions of a certain company, hedge funds can get a very clear idea of how customers perceive it and of its outlook.
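Counting mentions is the simplest first step in that kind of analysis. The sketch below tallies case-insensitive, whole-word mentions of company names across a set of already-scraped texts. The `count_mentions` function and the snippets are invented for illustration; judging perception would additionally require some form of sentiment analysis, which is not shown here.

```python
import re
from collections import Counter


def count_mentions(documents, company_names):
    """Count case-insensitive, whole-word mentions of each name across documents."""
    counts = Counter({name: 0 for name in company_names})
    for doc in documents:
        for name in company_names:
            pattern = r"\b" + re.escape(name) + r"\b"
            counts[name] += len(re.findall(pattern, doc, flags=re.IGNORECASE))
    return counts


# Invented snippets standing in for scraped posts or reviews.
docs = [
    "Just bought a GoPro, love it.",
    "Returned my gopro after a week. GoPro support was slow.",
    "Home Depot had everything I needed.",
]
mentions = count_mentions(docs, ["GoPro", "Home Depot"])
print(mentions)
```

Tracked over time, a rising or falling mention count gives a crude signal of attention, which can then be combined with other datasets before any investment conclusion is drawn.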

With growing evidence of how important web scraping is to the hedge fund industry, whether used legitimately or not, it appears that our online world is set to be analyzed more regularly and more closely than ever.
