
7 Web Mining Tools Around the Web

Wednesday, September 07, 2016

A web mining tool is computer software that uses data mining techniques to identify or discover patterns in large data sets. Data is money in today’s world, but the information available is huge, diverse and redundant. The right mining tools help you get to the information you actually need.


In this post, I’m going to compile a list of some of the popular web mining tools around the web.
There are three areas of web mining: web content mining, web usage mining and web structure mining.

. Web Content Mining
Web content mining is a process of collecting useful data from websites. This content includes news, comments, company information, product catalogs, etc.

. Web Usage Mining
Web usage mining is the process of identifying or discovering patterns in large sets of usage data. These patterns let you predict user behavior. There are two types of tools for working with patterns: pattern analysis tools and pattern discovery tools.

. Web Structure Mining
Web structure mining is also known as link mining. It is the process of discovering relationships between web pages that are connected by shared information or by direct links.
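
To make the link mining idea a bit more concrete, here is a minimal Python sketch that pulls the direct links out of a few pages and records which page points to which. It uses only the standard library. The page URLs and HTML snippets are made up for illustration, and the names `LinkCollector` and `build_link_graph` are my own, not part of any tool on this list.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the absolute target URL of every <a href="..."> in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, value))


def build_link_graph(pages):
    """Map each page URL to the sorted list of distinct URLs it links to."""
    graph = {}
    for url, html in pages.items():
        collector = LinkCollector(url)
        collector.feed(html)
        graph[url] = sorted(set(collector.links))
    return graph


# Hypothetical pages, inlined so the example runs without network access.
SAMPLE_PAGES = {
    "http://example.com/": '<a href="/about">About</a> <a href="/blog">Blog</a>',
    "http://example.com/about": '<a href="/">Home</a>',
    "http://example.com/blog": '<a href="/about">About</a> <a href="/">Home</a>',
}

link_graph = build_link_graph(SAMPLE_PAGES)
for page, targets in link_graph.items():
    print(page, "->", targets)
```

A graph in this shape (page mapped to the pages it links to) is the kind of input that link analysis algorithms work on.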


7 Web Mining Tools Around the Web

1. R

R is a language and free environment for statistical computing and graphics. It can also be accessed from scripting languages such as Python, Ruby and Perl.

Supported Operating Systems: UNIX platforms, Windows, MacOS
Area of Web Mining: Web Usage Mining


2. Octoparse

Octoparse is a simple but powerful web data mining tool that automates web data extraction. It lets you create highly accurate extraction rules. (You knew I would mention our own tool.) Crawlers that run in Octoparse are driven by the rules you configure. An extraction rule tells Octoparse which website to open, where the data you plan to crawl is located, what kind of data you want, and so on.

Supported Operating Systems: Windows XP/7/8/10
Area of Web Mining: Web Content Mining


3. Oracle Data Mining (ODM)

Oracle Data Mining is data mining software from Oracle. It is implemented in the Oracle Database kernel, and mining models are first-class database objects. Oracle Data Mining processes use built-in features of Oracle Database to maximize scalability and make efficient use of system resources.

Supported Operating Systems: Microsoft Windows
Area of Web Mining: Web Usage Mining


4. Tableau

Tableau offers a family of interactive data visualization products focused on business intelligence. It turns data into visually appealing, interactive visualizations called dashboards. Building one takes seconds or minutes rather than months or years, thanks to an easy-to-use drag-and-drop interface.

Supported Operating Systems: Mac, Microsoft Windows
Area of Web Mining: Web Usage Mining


5. Scrapy

Scrapy is an open source framework for collecting data from websites. It is written in Python, and you write your own rules to extract web data.

Supported Operating Systems: Linux, Windows, Mac and BSD
Area of Web Mining: Web Content Mining


6. HITS algorithm

HITS, short for Hyperlink-Induced Topic Search and also known as hubs and authorities, is a link analysis algorithm that rates web pages.

In the HITS algorithm, the first step is to retrieve the most relevant pages to the search query. This set is called the root set and can be obtained by taking the top pages returned by a text-based search algorithm. A base set is generated by augmenting the root set with all the web pages that are linked from it and some of the pages that link to it. The web pages in the base set and all hyperlinks among those pages form a focused subgraph.
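
The steps above can be sketched in Python roughly as follows. This is a minimal sketch and not a reference implementation: the tiny `WEB` graph, the page names, the `max_in_links` cap and the fixed iteration count are all assumptions for illustration. The `hits` function adds the usual hub and authority scoring pass that runs on the focused subgraph, which the paragraph above does not spell out.

```python
import math


def build_base_set(root_set, graph, max_in_links=50):
    """Expand the root set with pages it links to and some pages linking to it."""
    base = set(root_set)
    for page in root_set:
        # All pages linked from the root set.
        base.update(graph.get(page, []))
        # Only some of the pages that link to the root set (capped).
        linking = sorted(src for src, targets in graph.items() if page in targets)
        base.update(linking[:max_in_links])
    return base


def focused_subgraph(base_set, graph):
    """Keep only the base set pages and the hyperlinks among them."""
    return {p: [t for t in graph.get(p, []) if t in base_set] for p in base_set}


def hits(subgraph, iterations=50):
    """Return (hubs, authorities) score dicts for a {page: [targets]} graph."""
    pages = list(subgraph)
    hubs = {p: 1.0 for p in pages}
    auths = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority score: sum of hub scores of the pages linking in.
        auths = {p: 0.0 for p in pages}
        for src, targets in subgraph.items():
            for t in targets:
                auths[t] += hubs[src]
        norm = math.sqrt(sum(v * v for v in auths.values())) or 1.0
        auths = {p: v / norm for p, v in auths.items()}
        # Hub score: sum of authority scores of the pages linked to.
        hubs = {p: sum(auths[t] for t in subgraph[p]) for p in pages}
        norm = math.sqrt(sum(v * v for v in hubs.values())) or 1.0
        hubs = {p: v / norm for p, v in hubs.items()}
    return hubs, auths


# Hypothetical web graph: page -> pages it links to.
WEB = {
    "hub1": ["auth1", "auth2"],
    "hub2": ["auth1"],
    "auth1": [],
    "auth2": [],
    "unrelated": ["elsewhere"],
}

# Pretend a text search returned these two pages as the root set.
root_set = {"auth1", "hub1"}
base_set = build_base_set(root_set, WEB)
hub_scores, auth_scores = hits(focused_subgraph(base_set, WEB))

print("best hub:", max(hub_scores, key=hub_scores.get))
print("best authority:", max(auth_scores, key=auth_scores.get))
```

In this toy graph, pages that never link to or from the root set stay out of the base set, so they get no score at all.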

Area of Web Mining: Web Structure Mining


7. PageRank Algorithm

PageRank is a popular web structure mining algorithm.

PageRank is a link analysis algorithm that assigns a numerical weight to each element of a hyperlinked set of documents, such as the World Wide Web, with the purpose of "measuring" its relative importance within the set. The algorithm may be applied to any collection of entities with reciprocal quotations and references.
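
As a rough illustration of how those weights can be computed, here is a minimal power-iteration sketch in Python. The three-page graph is made up, and the damping factor of 0.85 and the fixed number of iterations are assumptions on my part (0.85 is a commonly used value). Pages with no outgoing links have their weight spread evenly over all pages, which is one common way to handle them.

```python
def pagerank(graph, damping=0.85, iterations=100):
    """Return a {page: weight} dict for a {page: [linked pages]} graph."""
    pages = set(graph)
    for targets in graph.values():
        pages.update(targets)
    n = len(pages)
    ranks = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Weight held by pages with no out-links is shared by every page.
        dangling = sum(ranks[p] for p in pages if not graph.get(p))
        new_ranks = {p: (1 - damping) / n + damping * dangling / n for p in pages}
        # Each page passes its weight on equally to the pages it links to.
        for src, targets in graph.items():
            for t in targets:
                new_ranks[t] += damping * ranks[src] / len(targets)
        ranks = new_ranks
    return ranks


# Hypothetical site: page -> pages it links to.
SITE = {
    "home": ["about", "blog"],
    "about": ["home"],
    "blog": ["home", "about"],
}

page_ranks = pagerank(SITE)
for page, weight in sorted(page_ranks.items(), key=lambda item: -item[1]):
    print(f"{page}: {weight:.3f}")
```

The weights always sum to 1, so each one can be read as a page's share of the total importance in the set.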

Area of Web Mining: Web Structure Mining


Any tips for me?

If you have tips for me about this list, please drop me a message HERE.

Thank you in advance for your contribution to this list!


Download Octoparse to start web scraping, or contact us with any questions about web scraping!
