Entry 27621 (Berkeley CSUA MOTD)

2003/3/7-10 [Computer/SW/SpamAssassin] UID:27621 Activity:high

3/7     SpamAssassin is working great. However, I'm still getting the
        occasional spam in my Inbox, and they all hover around 4.7. I know
        you can adjust scores for certain tests.  What are some of the score
        adjustments you guys have made?
        \_ score HTML_90_100 2.0
           score BIG_FONT 1.0
           score CLICK_BELOW 1.0
           I almost never get personal email that is 1. in HTML
           2. in big fonts, so I cranked up those scores a bit.
           The rest is personal preference.  For example, I tone down
           the non-ASCII and 8-bit subject scores because I do get
           mail in Chinese.
           \_ How do I do this? Type that into my user_prefs file?
              \_ Yes, in ~/.spamassassin/user_prefs
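
To make the arithmetic behind those overrides concrete, here is a minimal Python sketch. It assumes the stock threshold of 5.0 and that a message's total is simply the sum of the scores of the rules that fired. The "default" scores below are invented placeholders, not the values SpamAssassin ships, and OTHER_RULES is a hypothetical stand-in for everything else that matched. The parser only understands the simple `score RULE VALUE` form quoted in the thread.

```python
# Assumption: 5.0 is the stock SpamAssassin threshold.
THRESHOLD = 5.0

# Hypothetical stock scores, invented for illustration only. OTHER_RULES is a
# stand-in for every other rule that fired on the message.
HYPOTHETICAL_DEFAULTS = {
    "HTML_90_100": 1.2,
    "BIG_FONT": 0.3,
    "CLICK_BELOW": 0.2,
    "OTHER_RULES": 3.0,
}

# Overrides quoted from the thread, as they would appear in user_prefs.
USER_PREFS = """\
score HTML_90_100 2.0
score BIG_FONT 1.0
score CLICK_BELOW 1.0
"""


def parse_scores(prefs_text):
    """Read simple 'score RULE VALUE' lines; skip comments and other directives."""
    overrides = {}
    for line in prefs_text.splitlines():
        parts = line.split("#", 1)[0].split()
        if len(parts) == 3 and parts[0] == "score":
            overrides[parts[1]] = float(parts[2])
    return overrides


def total_score(hits, defaults, overrides):
    """Sum the score of each rule that fired, preferring the user's override."""
    return sum(overrides.get(rule, defaults[rule]) for rule in hits)


hits = ["HTML_90_100", "BIG_FONT", "CLICK_BELOW", "OTHER_RULES"]
before = total_score(hits, HYPOTHETICAL_DEFAULTS, {})
after = total_score(hits, HYPOTHETICAL_DEFAULTS, parse_scores(USER_PREFS))
print(f"before: {before:.1f}  after: {after:.1f}  threshold: {THRESHOLD}")
```

With these made-up defaults, the message sits just under the threshold before the overrides and clears it afterwards, which is the effect the poster is after for the spam hovering around 4.7.
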
        \_ ...just started using this as well. I'm pretty happy with the
           score settings, but a couple have slipped through both ways
           (spams not caught, and vice-versa). What's the best way to do
           whitelists and blacklists? ... just with basic procmail stuff?
           \_ you can manually add whitelist entries to your user_prefs
              file.  it even contains samples for you to follow.  what I
              did was write a shell script to convert my pine address book
              into the proper format and dump it into the user_prefs file
              \_ perl -pi -e's/\t\n/\t/;s/.*<(.*)>/$1/;s/.*\t.*\t(.*)/$1/' .add
                 at least for me, this handles pine's weird handling of long
                 names as well as addresses that have <email> in there.
           \_ not procmail, but look in ~/.spamassassin/user_prefs
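
The address-book conversion described above can also be sketched in Python rather than shell or perl. This is only an illustration under simplifying assumptions: each entry is taken to be a single tab-separated line of nickname, full name, and address, with the address optionally wrapped as `Name <user@host>`. Wrapped continuation lines, which the perl one-liner tries to handle, are skipped here. `whitelist_from` is the user_prefs directive for whitelisting a sender. The function name and sample data are hypothetical.

```python
import re


def whitelist_lines(addressbook_text):
    """Turn 'nick<TAB>name<TAB>address' lines into whitelist_from directives."""
    lines = []
    for row in addressbook_text.splitlines():
        fields = row.split("\t")
        if len(fields) < 3:
            continue  # continuation or malformed line; ignored in this sketch
        address = fields[2].strip()
        match = re.search(r"<([^>]+)>", address)  # unwrap "Name <user@host>"
        if match:
            address = match.group(1)
        if "@" in address:
            lines.append(f"whitelist_from {address}")
    return lines


# Hypothetical sample entries, not taken from a real address book.
sample = (
    "al\tAl Example\tal@example.com\n"
    "bo\tBo Example\tBo <bo@example.org>\n"
)
print("\n".join(whitelist_lines(sample)))
```

The printed lines could then be appended to ~/.spamassassin/user_prefs by hand, after checking them against the samples already in that file.
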
        \_ 2.50 has bayesian filtering. if you have a recent spam corpus,
           you can train it really easily. otherwise it will train based
           on stuff that scores high, or things you submit as spam --aaron
           \_ can't wait until 2.5 comes out.  Does anyone know whether
              people have tried to use bayesian filtering techniques on
              Chinese email?  it won't work natively cuz Chinese words
              are not separated by spaces
              \_ 2.50 *is* out. so don't wait longer, that'd be silly. also,
                  bayesian will still work on Chinese as long as it knows how
                 to use individual characters as tokens, which is only an
                 issue of being charset aware. --aaron
                 \_ Soda God, please install version 2.5
                 \_ a bit more complicated than that in terms of dealing
                    with Chinese.  The real issue is... well,
                    imagineYouWriteASentenseWithoutSpaceCharacterOr
                    punchuation...
                    \_ someonedoesntseemtoundertandinghowbayesiananalysisworks
                       \_ If you write like this, and have spelling errors,
                          this becomes very hard -- the same problem as
                          breaking up a sequence that looks like GTGTTTAGG ...
                          into meaningful bits.
                          \_ i would try to explain if you weren't anon --aaron
                          \_ i'm not as nice as aaron.  you're clueless and
                             need to go look up how it really works, not come
                             to the motd and pretend you're smart.
                             \_ It's naive bayes.  It assumes features are
                                independent given the hypothesis.  Or, to put it
                                in slightly less snobbish terms, it's counting
                                with a fancy name.  You don't
                                 want to be lecturing me on naive bayes.  While
                                 one really always knows where word breaks are
                                 in Chinese, if you have no word breaks in
                                 English, and have misspellings, this is a hard
                                 problem, and you can't use naive bayes to
                                 solve it -- you need something like HMMs.
                                \_ And it'll still be caught by spamassassin.
                                   \_ With no spaces and enough misspellings it
                                      will not.  Sorry.
                                   \_ Spamassassin uses silly regexps.  The
                                      poster above called me clueless because
                                      he thought naive bayes could handle this
                                      problem; I think the irony of the
                                      situation is quite dead to him.
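
One way to read the suggestion about using individual characters as tokens is that the classifier never needs to find word breaks, only to count whatever tokens it is given. The Python sketch below is a toy naive Bayes over overlapping character bigrams with Laplace smoothing. It is not SpamAssassin's implementation, the training strings are invented, and it does not settle the disagreement above: it only shows that classification (as opposed to segmentation) can run on text with no spaces and a misspelling.

```python
import math
from collections import Counter


def char_ngrams(text, n=2):
    """Tokenize without relying on spaces: overlapping character n-grams."""
    text = "".join(text.lower().split())
    return [text[i:i + n] for i in range(len(text) - n + 1)]


class NaiveBayes:
    """Toy two-class naive Bayes over character n-gram counts."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.docs = Counter()

    def train(self, label, text):
        self.counts[label].update(char_ngrams(text))
        self.docs[label] += 1

    def log_score(self, label, text):
        counts = self.counts[label]
        vocab = len(set(self.counts["spam"]) | set(self.counts["ham"]))
        total = sum(counts.values())
        score = math.log(self.docs[label] / sum(self.docs.values()))
        for tok in char_ngrams(text):
            # Laplace smoothing so unseen n-grams do not zero out the product.
            score += math.log((counts[tok] + 1) / (total + vocab))
        return score

    def classify(self, text):
        return max(("spam", "ham"), key=lambda label: self.log_score(label, text))


# Invented training data, written without spaces on purpose.
nb = NaiveBayes()
nb.train("spam", "buycheappillsnow clickbelowforfreeoffer")
nb.train("spam", "freeofferclickherenow cheappillsonline")
nb.train("ham", "lunchtomorrowatnoon seeyouatthemeeting")
nb.train("ham", "meetingmovedtofriday bringyournotes")

print(nb.classify("cheepfreepillsclicknow"))     # misspelled, no spaces
print(nb.classify("notesfromthefridaymeeting"))
```

On this toy data the first message is labelled spam and the second ham. Recovering the actual word boundaries from such text is a separate problem, which is where the HMM comment comes in.
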
