5/24
2006/1/14-17 [Computer/SW/Unix] UID:41375 Activity:low
1/13 Does anyone know of a free server-side HTTP parser? (In C/C++)
\_ http://www.google.com/search?q=HTTP+parser
\_ Yeah, none of those seem to do what I need.
\_ Speaking of which, does anyone know of a command-line Unix utility that will strip all HTML from a text file?
\_ lynx
\_ sed
\_ Outsource it.
\_ Try one of these (html2text):
   Perl: http://www.greenend.org.uk/rjk/2000/10/html2text.html
   C++: http://www.mbayer.de/html2text
\_ http://www.zazzybob.com/bin/striptags.sed.html
\_ What do you mean by HTTP parser? Are you looking for a request parser or something else? Maybe libcurl will work for you.
\_ Yes, a request parser. I'm looking at libcurl now and it may work for me, but I haven't yet found exactly the right code.
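The original poster wanted a server-side request parser, while libcurl is built as a client-side transfer library, so it may not cover that case. To show what the simplest piece involves, here is a minimal sketch in C that splits an HTTP/1.x request line ("METHOD SP target SP version", ended by CRLF or a bare LF, as in RFC 7230) into three slices of the caller's buffer. The names `parse_request_line`, `struct span` and `struct request_line` are hypothetical, not from libcurl or any other library; headers, bodies and percent-decoding are not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A slice of the caller's buffer: pointer plus length, no copy, no NUL. */
struct span {
    const char *p;
    size_t len;
};

/* The three fields of an HTTP/1.x request line. */
struct request_line {
    struct span method;   /* e.g. "GET"         */
    struct span target;   /* e.g. "/index.html" */
    struct span version;  /* e.g. "HTTP/1.1"    */
};

/* Parses "METHOD SP TARGET SP VERSION" from the first line of buf[0..n).
   The line may end in CRLF or a bare LF. Returns 0 on success and -1 if
   the line is malformed or not yet terminated. */
int parse_request_line(const char *buf, size_t n, struct request_line *out)
{
    assert(buf != NULL && out != NULL);

    const char *eol = memchr(buf, '\n', n);
    if (eol == NULL)
        return -1;                      /* no line terminator yet */
    const char *end = eol;
    if (end > buf && end[-1] == '\r')
        end--;                          /* drop the optional CR */

    struct span *fields[3] = { &out->method, &out->target, &out->version };
    const char *p = buf;
    for (int i = 0; i < 3; i++) {
        const char *start = p;
        while (p < end && *p != ' ')
            p++;
        if (p == start)
            return -1;                  /* empty field (e.g. doubled space) */
        fields[i]->p = start;
        fields[i]->len = (size_t)(p - start);
        if (i < 2) {
            if (p == end)
                return -1;              /* missing SP separator */
            p++;                        /* skip exactly one SP */
        }
    }
    if (p != end)
        return -1;                      /* trailing data after the version */
    if (out->version.len < 5 || memcmp(out->version.p, "HTTP/", 5) != 0)
        return -1;                      /* version must start with "HTTP/" */
    return 0;
}
```

Given "GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n", this yields method "GET", target "/index.html" and version "HTTP/1.1". Because an unterminated line and a malformed one both return -1, a real server would probably want separate codes so it can wait for more bytes instead of rejecting the request.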
www.google.com/search?q=HTTP+parser
  Tuning the HTTP parser - The Polipo Manual: As a number of HTTP servers and CGI scripts serve incorrect HTTP headers, Polipo uses a lax parser by default, meaning that ...
  Home page of XML Pull Parser (XPP): A fast parser using a pull-based approach instead of SAX. It does not read DTDs and is optimized for small documents like SOAP.
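The Polipo snippet mentions a lax parser that tolerates incorrect headers. Polipo's own code is not shown here; the following is only a hypothetical sketch in C of what "lax" can mean for a single header line. RFC 7230 forbids whitespace between a field name and its colon, so the strict mode below rejects "Content-Type : text/html" while the lax mode accepts it. `parse_header_line` and its `lax` flag are illustrative names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Splits one header line "Name: value" (line terminator already removed)
   into name and value slices of the input. Spaces/tabs around the value
   are trimmed. When lax is 0, whitespace between the name and the colon
   is rejected (strict RFC 7230 behaviour); when lax is nonzero, it is
   silently dropped. Returns 0 on success, -1 on a malformed line. */
int parse_header_line(const char *line, size_t n, int lax,
                      const char **name, size_t *name_len,
                      const char **value, size_t *value_len)
{
    assert(line != NULL && name != NULL && value != NULL);
    assert(name_len != NULL && value_len != NULL);

    const char *colon = memchr(line, ':', n);
    if (colon == NULL || colon == line)
        return -1;                      /* no colon, or empty name */

    /* Walk back over any whitespace sitting just before the colon. */
    const char *name_end = colon;
    while (name_end > line && (name_end[-1] == ' ' || name_end[-1] == '\t'))
        name_end--;
    if (name_end != colon && !lax)
        return -1;                      /* "Name :" is invalid when strict */
    if (name_end == line)
        return -1;                      /* name was only whitespace */

    const char *v = colon + 1;
    const char *end = line + n;
    while (v < end && (*v == ' ' || *v == '\t'))
        v++;                            /* skip leading whitespace */
    while (end > v && (end[-1] == ' ' || end[-1] == '\t'))
        end--;                          /* drop trailing whitespace */

    *name = line;
    *name_len = (size_t)(name_end - line);
    *value = v;
    *value_len = (size_t)(end - v);
    return 0;
}
```

The design choice here is to make leniency an explicit flag rather than the only behaviour, since accepting malformed input is exactly what the RFC asks servers not to do.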
www.greenend.org.uk/rjk/2000/10/html2text.html
  Bugs: I've only bothered to get this program working as far as is necessary for the documents I currently want to convert, so it's entirely possible that it won't work very well for your documents. (Distributed under the GNU General Public License, version 2 or, at your option, any later version.)
www.mbayer.de/html2text/
  html2text is a command-line utility, written in C++, that converts HTML documents into plain text. Each HTML document is loaded from a location indicated by a URI or read from standard input, and formatted into a stream of plain-text characters that is written to standard output or to an output file. The input URI may specify a remote site, from which the documents are loaded via the Hypertext Transfer Protocol (HTTP). The program is able to preserve the original positions of table fields, allows you to set the screen width (to a given number of output characters), and also accepts syntactically incorrect input (attempting to interpret it "reasonably"). Boldface and underlined text is rendered by default with backspace sequences (which is particularly useful when piping the program's output into "less" or another pager). All rendering properties can largely be customized through an RC file.
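The backspace sequences mentioned in the html2text description follow the old typewriter-style overstrike convention that less renders: a character, a backspace, and the same character again for bold; an underscore, a backspace, and the character for underline. The sketch below assumes that convention and is not taken from html2text's source; `overstrike` and `enum style` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

enum style { STYLE_BOLD, STYLE_UNDERLINE };

/* Renders text in overstrike form:
     bold:      c BS c   (the character struck twice)
     underline: _ BS c   (an underscore, then the character)
   Writes at most cap-1 bytes plus a terminating NUL into out and returns
   the number of bytes the full rendering needs (excluding the NUL), in
   the same spirit as snprintf. */
size_t overstrike(const char *text, enum style st, char *out, size_t cap)
{
    assert(text != NULL && (out != NULL || cap == 0));

    size_t need = 0;
    for (const char *p = text; *p != '\0'; p++) {
        char seq[3] = { st == STYLE_BOLD ? *p : '_', '\b', *p };
        for (int i = 0; i < 3; i++, need++)
            if (need + 1 < cap)         /* leave room for the NUL */
                out[need] = seq[i];
    }
    if (cap > 0)
        out[need < cap ? need : cap - 1] = '\0';
    return need;
}
```

Each input character becomes three output bytes, so "hi" in bold is "h\bhi\bi"; piped into less, that shows as a bold "hi", while in a plain text editor the backspaces are visible.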
www.zazzybob.com/bin/striptags.sed.html
  ... HTML tag, remove it
  s/< *>//g
  #branch if a successful substitution was made
  t loop
  }
  /</ {
  #if just an opening tag is found, append the
  #next line of input into the pattern space
  N
  b loop
  }
  # print the rest!
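The sed script's trick is to keep appending the next line (N) whenever an opening < has no closing > yet, so tags split across lines are still removed. A streaming program can get the same effect by carrying an "inside a tag" flag from one chunk of input to the next. The C sketch below does that; `strip_tags` and `struct tag_stripper` are hypothetical names, and like the sed approach it is naive: it does not understand comments, quoted attribute values containing >, or a literal < in text (which would swallow everything up to the next >).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* State carried between calls, so a tag that opens on one line and
   closes on a later one is still removed. */
struct tag_stripper {
    bool in_tag;
};

/* Copies in[0..n) to out, dropping everything from '<' up to and
   including the next '>'. Returns the number of bytes written; out must
   have room for n bytes, since output never exceeds input. */
size_t strip_tags(struct tag_stripper *st, const char *in, size_t n, char *out)
{
    assert(st != NULL && (n == 0 || (in != NULL && out != NULL)));

    size_t w = 0;
    for (size_t i = 0; i < n; i++) {
        if (st->in_tag) {
            if (in[i] == '>')
                st->in_tag = false;     /* tag closed: resume copying */
        } else if (in[i] == '<') {
            st->in_tag = true;          /* tag opened: stop copying */
        } else {
            out[w++] = in[i];           /* ordinary text: keep it */
        }
    }
    return w;
}
```

Feeding "<p>Hello <a\n" and then "href=\"x\">world</a></p>\n" through the same `struct tag_stripper` produces "Hello world\n"; between the two calls the flag is still set, which plays the role of the sed script's N/b loop.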