4/29 I need help getting information off a web site. A page presents
information about an item in locations spread throughout the page.
Each page presents information about one item. What is a quick and
easy way to go through several pages, capture all the information
related to each item, and put them into a spreadsheet with a unique
index? I think this might be possible by scraping the screen, but how
does one go about this from a Windows workstation (with no app
servers)? Would it be easier to record a bunch of copy and paste
actions with automation / macro recording software and replay the
macro?
\_ On a Windows machine with .NET you can simply write
the whole thing in a couple lines of C#. They've even
got a snarf util in the O'Reilly book.
\_ perl. -tom
\_ Typical Tom answer. Tom, when you don't know much about
something, why don't you leave it for others to answer?
\_ what do you mean? perl is a fine solution. -tom
\_ WWW::Mechanize is a valid suggestion. "Use perl" is a
step away from "write a program." Sad you can't see
this.
\_ If you know anything about perl, you know
there's more than one way to do it. I wouldn't
use WWW::Mechanize, though that's certainly a
reasonable approach. -tom
\_ more specifically, WWW::Mechanize is useful -dwc
\_ Python's urllib module also does this quite easily. -scottyg
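\_ For what it's worth, here's a rough sketch of the urllib approach
in Python (urllib.request in Python 3; the URL pattern, item ids,
and field regexes below are made-up placeholders, so substitute
the real ones for the site in question). It fetches each item
page, pulls the fields out with regexes, and writes one CSV row
per item with a running index:

    # Sketch only: ITEM_IDS, URL, and FIELDS are placeholders for
    # the actual pages being scraped.
    import csv
    import re
    import urllib.request

    ITEM_IDS = [101, 102, 103]                    # placeholder item ids
    URL = "http://example.com/item?id=%d"         # placeholder URL pattern
    FIELDS = {                                    # placeholder field regexes
        "name":  re.compile(r"<h1>(.*?)</h1>", re.S),
        "price": re.compile(r"Price:\s*\$([\d.]+)"),
    }

    with open("items.csv", "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["index"] + list(FIELDS))
        for idx, item_id in enumerate(ITEM_IDS, start=1):
            page = urllib.request.urlopen(URL % item_id)
            html = page.read().decode("utf-8", "replace")
            row = [idx]
            for pattern in FIELDS.values():
                m = pattern.search(html)
                row.append(m.group(1).strip() if m else "")
            writer.writerow(row)

Running it produces items.csv with a header row plus one indexed
row per item, which opens directly in a spreadsheet.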