Entry 36115 (Berkeley CSUA MOTD)

2025/07/02 [General] UID:1000 Activity:popular

7/2

2005/2/9-10 [Computer/SW/Unix] UID:36115 Activity:low

2/9     What are some good OSS search engines that can parse an HTML page
        and spit out the top X relevant keywords?  TIA.
        \_ lynx -dump "$URL" | sed '/^References$/,$d' | perl -ne\
            'while(s/([a-z]+)//i){print lc($1),"\n";}' | sort | uniq -c | sort -rn
            \_ Pretty cool.  But I think the OP was thinking of something
               more like Google.
               \_ Yeah, I figured as much, but Google or anything remotely
                  of the sort relies on _multiple_ documents linking to each
                  other to establish relevance/importance/etc. If all you have
                  to work with is a single document with no context, there's
                  rather little you can do unless you want to get neck-deep
                  in natural-language issues (well, knee-deep if you hack up
                  something to figure out which words are "unusually" common
                  in this document compared to the language at large, but
                  any serious solution would require some amount of parsing
                  and language understanding). Hence the above silly hack,
                  which I meant largely as a joke.  -alexf
                  \_ What if you can assume that the page authors aren't
                     trying to game the system with off-topic keywords, etc?
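
The "knee-deep" idea alexf mentions, scoring words that are unusually common in this one document compared to the language at large, can be sketched roughly as follows. This is only an illustration, not anything posted in the thread: the function name top_keywords, the background frequency table, and the default_rate fallback for unknown words are all assumptions, and you would have to supply your own word-frequency data for the language.

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> bodies."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def top_keywords(html, background, x=10, default_rate=1e-6):
    """Return the x words most over-represented relative to `background`.

    `background` (hypothetical data) maps a lowercase word to its relative
    frequency in the language at large. Words missing from the table are
    assumed to occur at `default_rate`, which is an arbitrary guess.
    """
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()

    # Same tokenization as the one-liner above: runs of ASCII letters.
    words = re.findall(r"[a-z]+", " ".join(parser.chunks).lower())
    counts = Counter(words)
    total = sum(counts.values())
    if total == 0:
        return []

    def score(word):
        # Count weighted by the log-ratio of document rate to background rate.
        rate = counts[word] / total
        return counts[word] * math.log(rate / background.get(word, default_rate))

    # Highest score first; ties broken alphabetically for stable output.
    return sorted(counts, key=lambda w: (-score(w), w))[:x]
```

With a background table that lists common words such as "the" and "and", those words sink down the ranking even when they are the most frequent tokens on the page. The obvious weakness is that any word absent from the table (including typos) looks unusually common, so the result is only as good as the frequency data behind it.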

