Nerdabout: the art and craft of technology

User Agents

Simple Scraping With Lynx

November 24, 2009

A scraper is simply an automated Web browser.

By automated, we mean that a scraper can be driven either by you directly (usually via the command line) or by a script (or batch file). Scraping, at its most basic, is simply the act of running a program that automatically retrieves information from a remote Web page or Web service.

If we accept that simple definition of scraping, then the very simplest of all scrapers is Lynx.
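To make "driven by a script" concrete, here is a minimal sketch of a shell function that fetches several pages in a row. The function name scrape_all and the FETCH variable are assumptions made for illustration; FETCH is not a Lynx feature, only a hook so the loop can be tried without network access.

```shell
# Fetch each URL given as an argument and print its text rendering.
# FETCH is a hypothetical override hook; by default the function
# runs `lynx --dump`, the option described in the posts on this page.
scrape_all() {
  for url in "$@"; do
    printf '=== %s ===\n' "$url"
    # Left unquoted on purpose so "lynx --dump" splits into command + option.
    ${FETCH:-lynx --dump} "$url"
  done
}

# Hypothetical usage (requires lynx and network access):
#   scrape_all "http://example.com/" "http://example.org/"
```

Setting FETCH=echo before calling scrape_all prints each URL instead of fetching it, which is a cheap way to check the loop before pointing it at real pages.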

Canada Lynx (Lynx canadensis)

Continue reading >

How To Use Lynx to Quickly List All the URLs On A Web Page

November 17, 2009

When used with the --dump option, Lynx parses an HTML document and builds a list of hyperlinks in that document.

If you have tried either of our previous Lynx recipes, then you may have noticed that, in addition to parsing out the textual content from Web pages, Lynx prints all of the URLs referenced in a document as a numbered list at the bottom of the output.
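As a rough sketch of how that numbered list could be pulled out on its own, the function below filters dump output with awk. It assumes the list follows a line reading exactly "References" and that each entry looks like "1. http://..."; the name extract_links is made up for this example, and the layout may differ between Lynx versions.

```shell
# Print the URLs from the numbered "References" list at the end of
# `lynx --dump` output. Reads the dump from standard input.
# Assumption: the list starts after a line that is exactly "References".
extract_links() {
  awk '
    /^References$/ { in_refs = 1; next }       # the list starts after this heading
    in_refs && $1 ~ /^[0-9]+\.$/ { print $2 }  # "  1. http://..." -> URL only
  '
}

# Hypothetical usage (requires lynx and network access):
#   lynx --dump "http://example.com/" | extract_links | sort -u
```

A page whose own text contains a line reading just "References" would confuse this filter, so treat it as a starting point. Lynx also documents a -listonly option that limits a dump to the link list.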

How to use Lynx to list all the URLs in an HTML page

Continue reading >

Lynx As A Simple (but Fast!) HTML Parser

November 10, 2009

There are numerous useful things that can be done with Lynx. For instance, you can quickly scrape a weather report for any US zip code.

Another little-known use for Lynx is to preview HTML documents.

How to preview HTML from Maruku, using Lynx

Continue reading >

Using Lynx to Scrape a Weather Report

November 03, 2009

Lynx As A Simple HTML Parser

One great and often-overlooked feature of Lynx is its --dump option.

lynx --dump <url> dumps the text content of the Web page at <url>, followed by a numbered list of the URLs referenced in that page.
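When only the page text is wanted, for example a forecast, the trailing list of URLs can be cut off. The sketch below assumes that list begins at a line reading exactly "References"; the function name dump_text_only is hypothetical.

```shell
# Print only the page text from `lynx --dump` output, dropping the
# trailing numbered list of URLs. Reads the dump from standard input.
# Assumption: the list begins at a line that is exactly "References".
dump_text_only() {
  awk '/^References$/ { exit } { print }'
}

# Hypothetical usage (requires lynx and network access):
#   lynx --dump "http://example.com/" | dump_text_only
```

Lynx also documents a -nolist option that leaves the link list out of a dump, which may be simpler when the list is never needed.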

how to use Lynx to scrape a weather forecast

Continue reading >

The Nerdabout bloggers are (from left to right) Elizabeth Suman, John Son, Heather Quinlan, Joanna Burgess, Noah Sussman and Dave Caputo.
nerdabout group photo
