Berkeley CSUA MOTD:Entry 21708

2001/7/3-4 [Computer/SW/Unix] UID:21708 Activity:high
7/3     Isn't fgrep supposed to be faster than grep?  On SunOS 5.7, I did some
        timing by using grep and fgrep to look for a non-existing string in
        /usr/include/*.h, and fgrep seems to take twice as long as grep.
        \_ http://www.von-bassewitz.de/uz/unixhier.html
        \_ fgrep is just grep with string searches instead of regexp searches.
           I would be surprised if they didn't use the exact same code path.
                \_ I would be very surprised if they did.
        \_ ``In  addition,  two  variant  programs  egrep and fgrep are
           available.  egrep is the same as grep -E.   fgrep  is  the
           same as grep -F.  zgrep is the same as grep -Z.''
                \_ This is GNU grep. Don't erase my comments chump.
        \_ It's faster for very large strings or large numbers of strings.
           For normal usage, it's not much different:
                http://www.elementkjournals.com/sun/9704/sun9742.htm
           Once upon a time agrep was fastest, but GNU grep may have caught
           up since then:
                http://www.tgries.de/agrep
                http://webglimpse.org/pubs/TR94-17.pdf
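        \_ For anyone repeating the timing: a single run over /usr/include
           is dominated by process start-up and disk cache state, so it is
           worth repeating each command many times. A minimal sketch; the
           helper name run_n is made up for illustration, and "grep -F" is
           assumed to be available (on systems where it is not, substitute
           fgrep):

```sh
# run_n N CMD [ARGS...]: run a command N times, discarding its output.
# Repeating the run keeps start-up cost and a cold disk cache from
# dominating the measurement.
run_n() {
    n=$1; shift
    i=0
    while [ "$i" -lt "$n" ]; do
        "$@" >/dev/null 2>&1 || :   # grep exits 1 when nothing matches; ignore that
        i=$((i + 1))
    done
}

# Example (in bash or ksh, where `time` can wrap a shell function):
#   time run_n 20 grep    no_such_string /usr/include/*.h
#   time run_n 20 grep -F no_such_string /usr/include/*.h
```

           Compare the two totals only after a warm-up run, so both
           commands read the headers from cache.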

You may also be interested in these entries...
2011/10/26-12/6 [Computer/SW/Unix] UID:54202 Activity:nil
10/24  What's an easy way to see if say column 3 of a file matches a list of
       expressions in a file? Basically I want to combine "grep -f <file>"
       to store the patterns and awk's $3 ~ /(AAA|BBB|CCC)/ ... I realize
       I can do this with "egrep -f " and use regexp instead of strings, but
       was wondering if there was some magic way to do this.
       \_ UNIX has no magic. Make a shell script to produce the awk or egrep
	...
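       \_ One way to combine the two ideas, as a sketch rather than the
          answer the truncated reply above was heading toward: let awk read
          the pattern file first, then test field 3 of each data line
          against every pattern. The function name match_col3 is
          hypothetical; the patterns are assumed to be extended regular
          expressions, one per line, and the pattern file is assumed to be
          non-empty.

```sh
# match_col3 PATTERN_FILE DATA_FILE
# Print lines of DATA_FILE whose third whitespace-separated field matches
# any regular expression listed (one per line) in PATTERN_FILE.
match_col3() {
    awk '
        # First file: remember each non-empty line as a pattern.
        NR == FNR { if ($0 != "") pats[++n] = $0; next }
        # Second file: print the line on the first pattern that matches $3.
        {
            for (i = 1; i <= n; i++)
                if ($3 ~ pats[i]) { print; next }
        }
    ' "$1" "$2"
}
```

          Usage: match_col3 patterns.txt data.txt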
2011/3/12-4/20 [Consumer/CellPhone, Computer/HW/Laptop] UID:54057 Activity:nil
3/12    I am curious what others think of tablets like iPad. They don't seem
        useful to me, but I use my computer for more than web browsing,
        Facebook, and Twitter. Why would I buy one instead of a laptop?
        They seem like a disabled laptop to me, but at a higher price.
        \_ You are most likely a coder.  iPad is not for coders.  They are
           what you get your non-technical friends.  Or musicians.  Look at
	...
2011/2/6-19 [Computer/Networking] UID:54028 Activity:nil
2/5     hmm.
$netstat -at | grep LISTEN
tcp        0      0 *:43300                 *:*                     LISTEN
        \_ this is an sshd
tcp        0      0 *:49416                 *:*                     LISTEN
tcp        0      0 *:36201                 *:*                     LISTEN
	...
2009/1/25-29 [Computer/SW/Unix] UID:52456 Activity:low
1/23    may the awesome rootstaff please apt-get install:
        colorgcc, colordiff, colormake
        thanks!
        \_ Done.  In the future email such requests to root@csua for
           faster response
           \_ totally, understood.  Altho yeah I do like asking for
	...
2008/10/14-20 [Computer/SW/Languages/Misc, Computer/SW/Languages/Web] UID:51527 Activity:nil
10/14   2 apache 2.0.52 servers running on Linux boxes.  Identical httpd.conf
        files (except for ServerName).  But on one, if a CGI script takes
        longer than 300 seconds, it times out.  The other, not.  Why is that?
        \_ Perhaps network equipment configuration. Or try comparing settings
           in /proc/sys/net.
           \_ I ran /sbin/sysctl -a | grep tcp, all settings are the same.
	...
2008/9/3 [Computer/SW/Unix] UID:51030 Activity:nil
9/3     Okay, my sed and awk skills are obviously not up to par here.
        I want to only see the "500's" in my apache error log, how do I
        do that? I want to see the whole line, not just the 500 error code.
        Never mind, grep " 500 " is close enough.
	...
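        \_ grep " 500 " also matches lines where 500 happens to be some
           other field. A sketch that tests only the status field,
           assuming the log is in Apache's common or combined access-log
           format, where the status code is the 9th whitespace-separated
           field (the function name status_lines is made up here):

```sh
# status_lines CODE [FILE...]
# Print whole log lines whose 9th field (the HTTP status in common/combined
# access-log format) equals CODE. Reads standard input if no FILE is given.
status_lines() {
    code=$1; shift
    awk -v code="$code" '$9 == code' "$@"
}
```

           Usage: status_lines 500 access_log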
2008/7/14-16 [Computer/SW/Languages/Perl, Computer/SW/Unix] UID:50557 Activity:moderate
7/14    Shell Programming question: I want to call a script with 1 arg
        and have it figure out whether $1 is a MAC address or an IP address
        and then call the appropriate function.  What is the best way
        to do this, given that sh/bash/ksh do not have something like
        the =~ in perl.  Check for exit status of grep, or is there
        a better way?  For the moment, let's just say the two tests are:
	...
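        \_ Checking grep's exit status works fine in plain sh. A minimal
           sketch; the two regular expressions below are assumptions for
           illustration (the poster's own tests are cut off above), and
           they check only the shape of the argument, not that each IP
           octet is in the range 0-255:

```sh
# classify_addr ARG: print "mac", "ip", or "unknown" for a single-line ARG.
# Exit status is 0 for mac/ip and 1 for unknown.
classify_addr() {
    if printf '%s\n' "$1" | grep -Eq '^([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$'; then
        echo mac        # six colon-separated hex pairs
    elif printf '%s\n' "$1" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}$'; then
        echo ip         # four dot-separated groups of 1-3 digits
    else
        echo unknown
        return 1
    fi
}
```

           A caller can then dispatch with
           case $(classify_addr "$1") in mac) ... ;; ip) ... ;; esac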
2007/9/4-6 [Computer/Rants] UID:47882 Activity:moderate
9/4     Happy Labor Day! Thank you, Labor, for weekends, mandatory breaks,
        and a generally un-serf-like workplace.
        \_ There are a few books coming out about how horrible the
           New Deal was for America.  I love it.
           \_ There are also books out about how much the Jews control
	...
2007/8/18 [Computer/SW/Unix] UID:47649 Activity:nil
8/17    How do I grep to exclude all lines with more than 1 / ?
        \_ After reading the grep man page:
           grep -v -E '/[^/]*/' myFile
	...
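        \_ An alternative sketch that counts the slashes instead of
           pattern-matching them: with "/" as awk's field separator, a
           line containing k slashes has k+1 fields, so keeping lines with
           at most two fields keeps lines with at most one slash. The
           function name is hypothetical.

```sh
# at_most_one_slash [FILE...]
# Print lines that contain zero or one "/" characters.
at_most_one_slash() {
    awk -F/ 'NF <= 2' "$@"
}
```

           Usage: at_most_one_slash myFile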
2007/6/12-14 [Computer/SW/Unix] UID:46925 Activity:high
6/12    Inside of a C++ program, I do a "ps | grep username" for logging
        purposes.  where username = getenv("USER");  Doing this directly is a
        gigantic security hole because someone could set $USER to some command
        line and execute arbitrary code.  What's the best way to make this
        safe?  Is there some standard way to check the input in a case like
        this?
	...
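        \_ One common approach is an allow-list check: reject the value
           unless every character is one you expect in a login name. The
           sketch below shows the check in sh; the function name and the
           exact character set are assumptions, and a C++ program would
           apply the same test to the string before building any command
           line (or avoid $USER altogether and look the name up with
           getpwuid(getuid())).

```sh
# is_safe_username NAME: succeed only if NAME is non-empty, does not start
# with "-", and contains nothing but letters, digits, ".", "_" and "-".
is_safe_username() {
    case $1 in
        '')                   return 1 ;;   # empty
        -*)                   return 1 ;;   # could be mistaken for an option
        *[!A-Za-z0-9._-]*)    return 1 ;;   # contains a character outside the allow-list
        *)                    return 0 ;;
    esac
}
```

           Usage: is_safe_username "$USER" && ps | grep -F -- "$USER"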
Cache (110 bytes)
www.von-bassewitz.de/uz/unixhier.html
I found the text in some other places without giving an author name, so it's probably ok to put it on the web.
Cache (8192 bytes)
www.tgries.de/agrep -> www.tgries.de/agrep/
From the authors' notes: AGREP is a powerful tool for fast searching a file or many files for a string or regular expression, with approximate matching capabilities and user-definable records. AGREP is similar to egrep (or grep or fgrep), but it is much more general and usually faster. It also supports many kinds of queries including arbitrary wild cards, sets of patterns, and in general, regular expressions. It supports most of the options supported by the GREP family plus several more (but it is not 100% compatible with grep).
AGREP is the search engine and part of the GLIMPSE tool for searching and indexing whole file systems. GLIMPSE stands for Global Implicit Search and is part of the HARVEST Information Discovery and Access System. AGREP belongs to the University of Arizona, which licenses it (see copyright). It is not public-domain, but free for non-commercial use.
But running this, you need one, two, or three additional files, depending on your operating system (see next section). EXE All these Zip files include the COPYRIGHT and README files. OUT Remark: this does not search the C:\MAIL directory itself. Another method to search multiple files and/or subdirectories: Use AGREP's built-in @reponsefile (listfile) option ! LST Remember, that AGREP is faster when you load it once and let it search a bunch of files: AGREP needle C:\MAIL\*
AGREP by Udi Manber, University of Arizona, Sun Wu, Thomas Gries. The DPMI DOS extender RSX allows to run 32-bit programs (like AGREP) in a DOS box of several operation systems.
More information on codepages can be found here: codepage 437 (US) and 850 (Latin-1) (list of pointers), ISO 8859-1 National Caracter Set FAQ (FAQ; Examples: "" matches "" "" matches "" but these do not match "a" or "" ! This simply allows to force case-sensitive searches in case that you also use the environment variable AGREPOPTS saying to search case-insensitive.
SYS) SET AGREPOPTS=-i -V4
        make AGREP search case-insensitive by default, verbose level 4
Different levels of verbose option:
        -V      version information only
        -V0     no diagnostic messages at all (use the -V0 option together with -s option to avoid any output)
        -V1     shows Grand Total (= count of records having matches;
When calling AGREP, the return code reflects to the number of matches:
        return code of AGREP    meaning
        >= 0                    the total number of matches (zero means no match)
        < 0                     syntax error/s or inaccessible file/s
There was a problem of infinite loops for older AGREP versions. The following can cause an infinite loop: agrep pattern * > output_file. If the number of matches is high, they may be deposited in output_file before it is completely read leading to more matches of the pattern within output_file (the matches are against the whole directory). It's not clear whether this is a "bug" (grep will do the same), but be warned.
EXE for OS/2 cannot be run in a DOS box of Windows or OS/2. There are a few restrictions regarding the possible combinations of search options. AGREP will display a message, when it does not support the requested combination of options. As later versions could do, please check regularly this page for amendments. EXE does not allow to search long file or directory names under Windows 95, Windows NT. But AGREP does not fail to find your needle, it's only a problem of presenting this record. When using AGREP and pipes, there could be some problems like the message "no target files found".
The program itself shows six pages of on-line help - when you call it without any parameters. Visit the help pages for AGREP here. There is a list of all options of AGREP and a lot of examples. If you find a problem, please send me your bug report.
The Rexx API - An Introduction to Extending the Rexx Language (contents) (by Bill Potvin). CMD - Examples taken from \EMX\SAMPLES of emx (by Eberhard Mattes).
Return codes: If you intend to call this version of AGREP from a PERL script, you probably want to avoid any output while keeping AGREPs return code. In this case, please use options -s (almost silent) together with -V0 (verbose nothing) to avoid the output of the Grand Total number of matches.
Retrieval in context (RIC) = focus function
One of the most powerful features is already the -d option allowing user-definable records. The proposed extension of this option would be very useful when the target files do not have a certain record structure. In this case, one would prefer to run AGREP line oriented (default), but giving -dn would allow a range of n target lines to be displayed around the line with the needle. Preferable in combination with highlight function:
Highlight (mark, tag) the matches in the output record
Allow user-definable prefix- and suffix strings to mark the needle in the output record. The strings could be composed of ANSI strings to select/deselect colours, or they could be used to generate HTML links.
Implementation of Sunday's Optimal Mismatch Algorithm (for exact pattern searches)
New metasymbols for three predefined sets of characters @ all letters % all digits all the rest Examples: search for a car plate number which starts with "ABC" followed by 4 digits: "ABC%%%%" or to search for US patents "US%%%%%%%" or for European patents "EP%%%%%%%"
Dynamic Metasymbol Assignment DMSA
In its current implementation, AGREP needs sixteen characters from the character set internally. The graphic characters, which are not common to text files, cannot be searched at the moment and must therefore not appear in the needle string. It is planned to remove that restriction of AGREP in a later version.
Except for exact matching of simple patterns, for which we use a simple variation of the Boyer-Moore algorithm, all the algorithms (listed below) were designed by Sun Wu and Udi Manber.
It supports many extensions such as approximate regular expression pattern matching, non-uniform costs, simultaneous matching of multiple patterns, mixed exact/approximate matching, etc.
It assumes that the set of patterns contains k patterns, and that the shortest pattern is of size m. Let b = log_c (2*m), where c is the size of alphabet set. In the preprocessing, a table is built to determine whether a given substring of size b is in the pattern.
Suppose we are looking for matches with at most k errors. The search is done in two passes: In the first pass (the filtering pass), the areas in the text that have a possibility to contain the matches are marked. The second pass finds the matches in those marked areas. The search in the first pass is done in the following way. Suppose the end position of the pattern is currently aligned with position tx in the text. The algorithm scans backward from tx until either (k+1) blocks that do not occur in the pattern have been scanned, or the scan has passed position (tx-m+k). In the former case, pattern is shifted forward to align the beginning position of the pattern with one character after the position in the text where the scan was stopped. In the latter case, we marked tx-m to tx+m as a candidate area. For ASCII text and pattern, this algorithm is faster than amonkey.
If we partition A into (k+1) blocks, then the distance between A and B is > k if none of the blocks of A occur in B. This implies that to match A with no more than k errors, B has to contain a substring that matches exactly one block of A.
Permission is granted to copy this software, to redistribute it on a nonprofit basis, and to use it for any purpose, subject to the following restrictions and understandings. Any copy made of this software must include this copyright notice in full.
All materials developed as a consequence of the use of this software shall duly acknowledge such use, in accordance with the usual standards of acknowledging credit in academic research. The authors have made no warranty or representation that the operation of this software will be error-free or suitable for any application, and they are under no obligation to provide any services, by way of maintenance, update, or otherwise. The software is an experimental prototype offered on an as-is basis. Redistribution for profit requires the express, w...