Berkeley CSUA MOTD:Entry 41375

2006/1/14-17 [Computer/SW/Unix] UID:41375 Activity:low
1/13    Does anyone know a free server side HTTP parser? (In C/C++)
        \_ http://www.google.com/search?q=HTTP+parser
           \_ Yeah, none of those seem to do what I need.
        \_ speaking of which, does anyone know of a command-line unix util
           that will strip all html from a text file?
           \_ lynx
           \_ sed
           \_ outsource it.
           \_ Try one of these (html2text):
              Perl: http://www.greenend.org.uk/rjk/2000/10/html2text.html
              C++: http://www.mbayer.de/html2text
           \_ http://www.zazzybob.com/bin/striptags.sed.html
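
For the tag-stripping question, the general idea behind the sed suggestions can be sketched in a few lines of C++. This is only a minimal illustration, not what lynx or either html2text does: the function name strip_tags is made up here, and the approach simply drops everything between a '<' and the next '>'. It assumes reasonably clean markup and makes no attempt to decode entities or to skip script and comment bodies.

```cpp
#include <cassert>  // only needed for the check shown in the usage note
#include <string>

// Hypothetical helper: drop everything between '<' and the next '>'.
// Naive on purpose: it does not decode entities such as &amp;, and it is
// fooled by '>' inside attribute values, comments, and <script> bodies.
std::string strip_tags(const std::string& html) {
    std::string text;
    bool in_tag = false;
    for (char c : html) {
        if (c == '<') {
            in_tag = true;            // start of a tag: stop copying
        } else if (c == '>' && in_tag) {
            in_tag = false;           // end of a tag: resume copying
        } else if (!in_tag) {
            text += c;                // ordinary text, including a stray '>'
        }
    }
    return text;
}
```

Because the state carries across newlines, a tag split over two lines is removed as well, which is the case the sed script handles by pulling the next line into the pattern space. A quick check: assert(strip_tags("<b>hi</b>") == "hi"); holds. To use it as a filter, read all of standard input into a string and print the result.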
        \_ What do you mean by HTTP parser? Are you looking for a request
           parser or something else? Maybe libcurl will work for you.
           \_ Yes, a request parser.  Looking at libcurl now, it may work
              for me, but I haven't yet found exactly the right code.
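
For the request-parser question: libcurl is a client-side transfer library, so it may not be a natural fit for parsing incoming requests on a server. As a rough sketch of what a request parser has to do, the C++ below splits the head of a request into the request line (method, target, version) and the header fields. The Request struct and the parse_request_head function are hypothetical names chosen for this example, not part of any library, and the code assumes the whole head is already in memory. It ignores the body, chunked encoding, and obsolete line folding.

```cpp
#include <cassert>  // only needed for the check shown in the usage note
#include <cctype>
#include <map>
#include <sstream>
#include <string>

// Hypothetical result type; the field names are illustrative.
struct Request {
    std::string method, target, version;
    std::map<std::string, std::string> headers;  // names lower-cased
};

// Trim spaces and tabs from both ends of a header value.
static std::string trim(const std::string& s) {
    const std::string::size_type b = s.find_first_not_of(" \t");
    if (b == std::string::npos) return "";
    const std::string::size_type e = s.find_last_not_of(" \t");
    return s.substr(b, e - b + 1);
}

// Parse the request line and headers, up to the blank line that ends the head.
// Returns false on malformed or incomplete input; `out` may then be partly
// filled. A repeated header name overwrites the earlier value.
bool parse_request_head(const std::string& raw, Request& out) {
    std::istringstream in(raw);
    std::string line;
    if (!std::getline(in, line)) return false;
    if (!line.empty() && line.back() == '\r') line.pop_back();

    // Request line: exactly three whitespace-separated tokens.
    std::istringstream first(line);
    std::string extra;
    if (!(first >> out.method >> out.target >> out.version)) return false;
    if (first >> extra) return false;
    if (out.version.compare(0, 5, "HTTP/") != 0) return false;

    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();
        if (line.empty()) return true;  // blank line: head is complete
        const std::string::size_type colon = line.find(':');
        if (colon == std::string::npos || colon == 0) return false;
        std::string name = line.substr(0, colon);
        for (char& c : name)
            c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
        out.headers[name] = trim(line.substr(colon + 1));
    }
    return false;  // input ended before the blank line
}
```

Header names are lower-cased because they are case-insensitive in HTTP, so a lookup such as r.headers["host"] works however the client spelled it. For example, after Request r; the check assert(parse_request_head("GET / HTTP/1.1\r\nHost: x\r\n\r\n", r)); holds. A real server would also need size limits and incremental parsing of partial reads.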
Cache (360 bytes)
www.google.com/search?q=HTTP+parser
Tuning the HTTP parser - The Polipo Manual 33 Tuning the HTTP parser. As a number of HTTP servers and CGI scripts serve incorrect HTTP headers, Polipo uses a lax parser by default, meaning that ... Home page of XML Pull Parser (XPP) A fast parser using a pull-based approach instead of SAX. It does not read DTDs and is optimized for small documents like SOAP.
Cache (546 bytes)
www.greenend.org.uk/rjk/2000/10/html2text.html
Bugs I've only bothered to get this program working as far as is necessary for the documents I currently want to convert: therefore it's entirely possible that it won't work very well for your documents. either version 2 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. You should have received a copy of the GNU General Public License along with this program;
Cache (916 bytes)
www.mbayer.de/html2text -> www.mbayer.de/html2text/
html2text is a command line utility, written in C++, that converts HTML documents into plain text. Each HTML document is loaded from a location indicated by a URI or read from standard input, and formatted into a stream of plain text characters that is written to standard output or into an output file. The input URI may specify a remote site, from which the documents are loaded via the Hypertext Transfer Protocol (HTTP). The program is able to preserve the original positions of table fields, allows you to set the screen width (to a given number of output characters), and also accepts syntactically incorrect input (attempting to interpret it "reasonably"). Boldface and underlined text is rendered by default with backspace sequences (which is particularly useful when piping the program's output into "less" or another pager). All rendering properties can largely be customised through an RC file.
Cache (207 bytes)
www.zazzybob.com/bin/striptags.sed.html
HTML tag, remove it
s/< *>//g
#branch if a successful substitution was made
t loop
}
/</ {
#if just an opening tag is found, append the
#next line of input into the pattern space
N
b loop
}
# print the rest!