Thursday, April 21, 2016


Popular social networking sites, most of which have emerged in the last decade, generate enormous amounts of valuable data that companies are keen to use. The number of active social media users around the world, and their interactions with companies on these platforms, have a growing impact on how companies operate and develop. By analyzing this data, companies gain insight into their customers' behavior and opinions.


Before identifying which pieces of data matter, companies need to pull the data out of social media sites. However, obtaining social media data is not easy, as most of these sites have complex structures and anti-scraping mechanisms that protect user profiles and other personal data from being harvested. For example, a user may be blocked from a site if he or she accesses it and requests data too frequently. The right tool can make a hard job easy. Many web data extraction tools can gather site data, but not all of them can deal with social media sites.




Here, we recommend that you try Octoparse, a social media data extractor. It can mimic human browsing behavior and collect social media data by rotating anonymous HTTP proxy servers. The Octoparse cloud service uses more than 500 third-party proxies for automatic IP rotation. In the free edition, you can enter several IP addresses into Octoparse manually.
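
To make the idea of IP rotation concrete, here is a minimal Python sketch of round-robin proxy selection in general. It is not Octoparse's internal implementation, which this post does not describe. The `ProxyRotator` class is a hypothetical name chosen for illustration, and the proxy addresses are placeholders from an address range reserved for documentation.

```python
import itertools
import urllib.request


class ProxyRotator:
    """Hand out proxies from a fixed list in round-robin order (illustrative only)."""

    def __init__(self, proxies):
        if not proxies:
            raise ValueError("at least one proxy is required")
        self._cycle = itertools.cycle(proxies)

    def next_proxy(self):
        """Return the next proxy URL, wrapping around at the end of the list."""
        return next(self._cycle)

    def build_opener(self):
        """Return a urllib opener that routes HTTP(S) traffic through the next proxy."""
        proxy = self.next_proxy()
        handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
        return urllib.request.build_opener(handler)


# Placeholder addresses: 192.0.2.0/24 is reserved for documentation.
rotator = ProxyRotator(["http://192.0.2.10:8080", "http://192.0.2.11:8080"])
first = rotator.next_proxy()
second = rotator.next_proxy()
third = rotator.next_proxy()  # wraps around to the first proxy again
```

Each request would then go out through a different address, so no single IP address accounts for all of the traffic.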



In addition, you can set a time interval between requests so that data collection runs steadily without being blocked by sites that restrict access when one IP address visits them too frequently. To extract exactly the data you want, you may need two tools, XPath and RegEx, to match specific HTML elements.
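
As a rough illustration of how XPath and RegEx complement each other, the Python sketch below uses an XPath expression to locate elements and a regular expression to pull a number out of their text. The markup, class names, and the `extract_posts` function are invented for this example and do not come from any real site or from Octoparse. The standard-library parser used here supports only a subset of XPath and requires well-formed markup, so real pages would usually need a dedicated HTML parser.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, well-formed snippet standing in for a page of posts.
SNIPPET = """
<div>
  <div class="post"><span class="author">alice</span><span class="likes">Likes: 1,204</span></div>
  <div class="post"><span class="author">bob</span><span class="likes">Likes: 87</span></div>
</div>
"""

# RegEx: capture the digits (with optional thousands separators) after "Likes:".
LIKES_RE = re.compile(r"Likes:\s*([\d,]+)")


def extract_posts(markup):
    """Return one dict per post, with the author name and like count."""
    root = ET.fromstring(markup.strip())
    rows = []
    # XPath: select every <div> whose class attribute is exactly "post".
    for post in root.findall(".//div[@class='post']"):
        author = post.findtext("span[@class='author']")
        likes_text = post.findtext("span[@class='likes']", default="")
        match = LIKES_RE.search(likes_text)
        likes = int(match.group(1).replace(",", "")) if match else None
        rows.append({"author": author, "likes": likes})
    return rows


posts = extract_posts(SNIPPET)
```

XPath narrows the page down to the right elements, and the regular expression then cleans up the text inside them.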




Several export options are available for storing the extracted data: on your own machine (Excel, TXT, HTML, or your database) or in the cloud. You can also retrieve the extracted data through our APIs.









 Author: The Octoparse Team




