
Scraping Online Dictionary - Merriam-Webster.com

Wednesday, December 28, 2016 2:40 AM

 

Octoparse lets you scrape an online dictionary into an organized list by entering a list of words. Using a loop in "Text list" mode, it looks up each word you enter and extracts the definition and examples for that word.

 

In this tutorial, I will show you how to scrape the definitions of some words from merriam-webster.com.

The website URL we will use is www.merriam-webster.com.

The data fields include the word, its characteristic, its definition and example.

 

You can download the task (the OTD file) directly to begin collecting the data, or you can follow the steps below to build a scraping task that extracts word definitions.

(Download my extraction task of this tutorial HERE just in case you need it.)

 

Step 1. Set up basic information.

Click “Quick Start” ➜ Choose "New Task (Advanced Mode)" ➜ Complete the basic information ➜ Click “Next”.

 

Step 2. Enter the target URL in the built-in browser. ➜ Click the “Go” icon to open the webpage.

(URL of the example: https://www.merriam-webster.com)

 

Note: If the page keeps loading after its content has fully loaded, you can click the stop button (×) to stop it from loading.

 

Step 3. Create a loop for entering texts.

Drag a "Loop Item" into the Workflow Designer and then choose "Text list" in the "Loop mode".

Enter the word or the list of words you want to look up in the "Text list" box and click "Save".

The list will then be shown in the “Loop Item” box.

 

Next, click the search bar in the built-in browser and choose the “Enter text value” option.

 

Drag the "Enter Text" box into the "Loop Item" box in the Workflow Designer, then tick "Use the text in Loop Item to fill in the text box" and click "Save". The program will now enter the words from the list one by one.
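
For readers who think in code, the "Text list" loop behaves much like a plain for-loop over the words. The sketch below is only an analogy, not how Octoparse works internally: Octoparse types each word into the search box, whereas this example builds one lookup URL per word. The URL pattern in `BASE_URL` and the helper name `build_lookup_urls` are assumptions made for illustration; check the pattern against the site before relying on it.

```python
from urllib.parse import quote

# Assumed URL pattern for an entry page; verify it before relying on it.
BASE_URL = "https://www.merriam-webster.com/dictionary/"


def build_lookup_urls(words):
    """Return one lookup URL per non-empty word, mirroring the "Text list" loop."""
    urls = []
    for word in words:
        cleaned = word.strip()
        if not cleaned:
            continue  # skip blank lines in the text list
        # quote() percent-encodes spaces and other unsafe characters
        urls.append(BASE_URL + quote(cleaned))
    return urls


text_list = ["socialism", " ice cream ", ""]
for url in build_lookup_urls(text_list):
    print(url)
```

Blank entries are skipped so that an empty line in the list does not trigger a pointless lookup.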

 

Step 4. Get the search results.

Click the “Search” button of the website ➜ Choose “Click an item”.

 

Step 5. Extract the words’ definitions.

Now we are on the search result page for the first word, “socialism”.

To extract the word, click it and select “Extract text”. The other fields can be extracted in the same way.

All the extracted content will be listed in Data Fields. ➜ Click a “Field Name” to rename it. Then click “Save”.
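
Conceptually, each "Extract text" action maps one element on the page to one named data field. The following is a minimal sketch of that idea using only the Python standard library. The markup in `SAMPLE_HTML`, its class names, and the placeholder texts are invented for illustration and are not taken from merriam-webster.com; the real page structure will differ.

```python
from html.parser import HTMLParser

# Hypothetical markup: class names and texts are placeholders for illustration.
SAMPLE_HTML = """
<div class="entry">
  <h1 class="word">socialism</h1>
  <span class="pos">noun</span>
  <p class="definition">a sample definition</p>
  <p class="example">a sample example sentence</p>
</div>
"""


class FieldExtractor(HTMLParser):
    """Collect the text of elements whose class attribute names a wanted field."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)
        self.current = None  # field currently being read, if any
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if cls in self.wanted:
            self.current = cls
            self.fields[cls] = ""

    def handle_data(self, data):
        if self.current is not None:
            self.fields[self.current] += data

    def handle_endtag(self, tag):
        # Any closing tag ends the field being read (fields are not nested here).
        self.current = None


def extract_fields(html, wanted=("word", "pos", "definition", "example")):
    parser = FieldExtractor(wanted)
    parser.feed(html)
    return {name: text.strip() for name, text in parser.fields.items()}


print(extract_fields(SAMPLE_HTML))
```

The dictionary keys play the role of the field names you edit in Data Fields.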

 

Step 6. Check the workflow.

Now we need to check the workflow by clicking each action in order, starting from the beginning of the workflow.

Go to the webpage ➜ Loop Item box ➜ Enter text ➜ Click Item ➜ Extract Data.

 

Step 7. Click “Save” to save your configuration. Then click “Next” ➜ Click “Next” ➜ Click “Local Extraction” to run the task on your computer. Octoparse will automatically extract all the data selected.

 

Step 8. The extracted data will be shown in the "Data Extracted" pane. Click the "Export" button to export the results to an Excel file, a database, or another format, and save the file to your computer.
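
If you want to picture what an exported file looks like, the sketch below writes rows with this tutorial's four fields as CSV, a format Excel can open. It is an illustration only and says nothing about Octoparse's own export code. The column names in `FIELD_NAMES`, the helper `rows_to_csv`, and the placeholder row values are assumptions.

```python
import csv
import io

# Assumed column names, following the data fields listed in this tutorial.
FIELD_NAMES = ["word", "characteristic", "definition", "example"]


def rows_to_csv(rows):
    """Serialize extracted rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELD_NAMES, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Placeholder values, not real extracted data.
rows = [
    {
        "word": "socialism",
        "characteristic": "noun",
        "definition": "placeholder definition",
        "example": "placeholder example",
    },
]
csv_text = rows_to_csv(rows)
print(csv_text)

# To save the result as a file:
# with open("definitions.csv", "w", newline="", encoding="utf-8") as f:
#     f.write(csv_text)
```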

Note:

Using XPath correctly is the key to extracting data in Octoparse. If you find missing values, go back to your workflow, step through it from the beginning, and modify the XPath expressions for the affected data fields. Check out this article to modify the XPath expressions of a data field.

A little knowledge of XPath can help you solve many problems when using Octoparse. The tutorials and FAQs below can help you pick up XPath quickly.

How to use Firebug and Firepath?

Getting started with XPath 1

Getting Started With XPath 2

Modify XPath Manually in Octoparse
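
To get a feel for how an XPath expression selects a field, and why a wrong expression yields a missing value, here is a small sketch using the limited XPath subset in Python's standard `xml.etree.ElementTree` module. The markup in `SAMPLE`, its class names, and the helper `select_text` are hypothetical; the expressions you need in Octoparse depend on the real page.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup; class names and texts are placeholders.
SAMPLE = """
<div class="entry">
  <h1 class="word">socialism</h1>
  <p class="definition">placeholder definition one</p>
  <p class="definition">placeholder definition two</p>
</div>
""".strip()

root = ET.fromstring(SAMPLE)


def select_text(node, xpath):
    """Return the text of every element matched by the XPath expression."""
    return [element.text for element in node.findall(xpath)]


# An expression that matches exactly one element.
print(select_text(root, ".//h1[@class='word']"))

# An expression that matches several elements.
print(select_text(root, ".//p[@class='definition']"))

# An expression that matches nothing: this is what a missing value looks like.
print(select_text(root, ".//p[@class='example']"))
```

When a field comes back empty, the usual cause is an expression like the last one, which no longer matches anything on the page.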

 

You can use Cloud Extraction to speed up the extraction; our cloud servers will collect the data for you within a short time. Go to the pricing page for more information about our subscription plans and extraction services: http://www.octoparse.com/pricing

 

Author: The Octoparse Team

 

 
