3/7 SpamAssassin is working great. However, I'm still getting the
occasional spam in my Inbox, and they all hover around 4.7. I know
you can adjust scores for certain tests. What are some of the score
adjustments you guys have made?
\_ score HTML_90_100 2.0
score BIG_FONT 1.0
score CLICK_BELOW 1.0
I almost never get personal email that is (1) in HTML or
(2) in big fonts, so I cranked up those scores a bit.
The rest of it is personal. For example, I tone down the
non-ASCII and 8-bit subject tests because I do get mail in
Chinese.
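To see why bumping a few rules helps, here is a minimal Python sketch of how the scoring works: SpamAssassin adds up the scores of every rule a message hits and tags it as spam once the total reaches required_hits (5.0 by default in 2.x). The default scores and the OTHER_RULE name below are hypothetical, picked so the message lands at 4.7; only the override values come from the user_prefs lines above.

```python
# Sketch of additive rule scoring, assuming the 2.x default threshold of 5.0.
# DEFAULT_SCORES values and OTHER_RULE are hypothetical, not SpamAssassin's.
REQUIRED_HITS = 5.0

# Hypothetical default scores for the rules one message triggered.
DEFAULT_SCORES = {
    "HTML_90_100": 1.2,
    "BIG_FONT": 0.5,
    "CLICK_BELOW": 0.6,
    "OTHER_RULE": 2.4,  # hypothetical stand-in for everything else it hit
}

# Per-user overrides, taken from the user_prefs lines quoted above.
USER_SCORES = {"HTML_90_100": 2.0, "BIG_FONT": 1.0, "CLICK_BELOW": 1.0}


def total_score(hits, defaults, overrides):
    """Sum rule scores, letting a user override replace a rule's default."""
    return sum(overrides.get(rule, defaults[rule]) for rule in hits)


hits = list(DEFAULT_SCORES)
for label, overrides in (("defaults", {}), ("user_prefs", USER_SCORES)):
    score = total_score(hits, DEFAULT_SCORES, overrides)
    print(f"{label}: {score:.1f} spam={score >= REQUIRED_HITS}")
```

This prints `defaults: 4.7 spam=False` and then `user_prefs: 6.4 spam=True`: the same message crosses the line only because of the overrides.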
\_ How do I do this? Type that into my user_prefs file?
\_ Yes, in ~/.spamassassin/user_prefs
\_ ...just started using this as well. I'm pretty happy with the
score settings, but a couple of messages have slipped through both ways
(spam not caught, and vice versa). What's the best way to do
whitelists and blacklists? Just with basic procmail stuff?
\_ You can manually add whitelist entries in your user_prefs file;
it even contains samples for you to follow. What I did
was write a shell script to convert my Pine address book
into the proper format and dump it into the user_prefs file.
\_ perl -pi -e's/\t\n/\t/;s/.*<(.*)>/$1/;s/.*\t.*\t(.*)/$1/' .add
At least for me, this copes with Pine's weird handling of long
names, as well as addresses that have <email> in them.
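For comparison, here is a Python sketch of the same conversion that writes whitelist_from lines (the user_prefs directive for whitelisting a sender). The address-book layout it assumes (tab-separated nickname, full name and address; wrapped entries continued on a following line that starts with spaces; distribution lists written as (a,b)) is an assumption about Pine's format, not something checked against Pine's documentation, and the sample entries are made up.

```python
import re

ANGLE_ADDR = re.compile(r"<([^>]+)>")


def unwrap(lines):
    """Join continuation lines (assumed to start with a space) onto the
    previous entry so each address-book entry becomes one string."""
    entries = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(" ") and entries:
            entries[-1] += line.lstrip()
        elif line:
            entries.append(line)
    return entries


def whitelist_lines(lines):
    """Turn address-book lines into user_prefs whitelist_from lines."""
    out = []
    for entry in unwrap(lines):
        fields = entry.split("\t")
        if len(fields) < 3:
            continue  # no address field
        address = fields[2].strip()
        if address.startswith("("):
            continue  # assumed distribution list such as (bob,alice): skip
        match = ANGLE_ADDR.search(address)
        if match:
            address = match.group(1)  # "Name <addr>" -> "addr"
        if "@" in address:
            out.append(f"whitelist_from {address}")
    return out


sample = [
    "bob\tBob Smith\tbob@example.com\n",
    "alice\tAlice Jones\t\n",
    "   Alice Jones <alice@example.org>\n",
    "team\tTeam\t(bob,alice)\n",
]
print("\n".join(whitelist_lines(sample)))
```

The output can be appended to ~/.spamassassin/user_prefs; blacklist_from takes the same form for senders you always want flagged.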
\_ Not procmail; look in ~/.spamassassin/user_prefs.
\_ 2.50 has Bayesian filtering. If you have a recent spam corpus,
you can train it really easily. Otherwise it will train itself on
mail that scores high, or on things you submit as spam. --aaron
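A toy Python sketch of the two training paths aaron describes: explicit training on a corpus you have already sorted, and auto-learning from messages whose rule score is decisive. The BayesDB class and both thresholds are hypothetical illustrations, not SpamAssassin's code or its configured defaults, and learning low-scoring mail as ham is an extra assumption of this sketch. With the real thing you would train using the sa-learn tool that ships with 2.50.

```python
from collections import Counter

# Hypothetical auto-learn thresholds (not SpamAssassin's defaults): mail at or
# above SPAM_AUTO is learned as spam, at or below HAM_AUTO as ham (an extra
# assumption of this sketch), and anything in between is left alone.
SPAM_AUTO = 12.0
HAM_AUTO = 0.1


class BayesDB:
    """Toy token database: per-token message counts for spam and ham."""

    def __init__(self):
        self.spam_tokens = Counter()
        self.ham_tokens = Counter()
        self.nspam = 0
        self.nham = 0

    def learn(self, tokens, is_spam):
        """Explicit training, e.g. from a corpus you have already sorted."""
        unique = set(tokens)  # count each token at most once per message
        if is_spam:
            self.spam_tokens.update(unique)
            self.nspam += 1
        else:
            self.ham_tokens.update(unique)
            self.nham += 1

    def auto_learn(self, tokens, rule_score):
        """Learn from a message only when its rule score is decisive."""
        if rule_score >= SPAM_AUTO:
            self.learn(tokens, is_spam=True)
            return "spam"
        if rule_score <= HAM_AUTO:
            self.learn(tokens, is_spam=False)
            return "ham"
        return None


db = BayesDB()
db.learn("cheap pills click below".split(), is_spam=True)
db.learn("lunch at noon tomorrow".split(), is_spam=False)
print(db.auto_learn("click below for free money".split(), 14.2))  # spam
print(db.auto_learn("meeting notes attached".split(), 4.7))       # None
print(db.nspam, db.nham, db.spam_tokens["click"])                   # 2 1 2
```

A message sitting at 4.7, like the ones in the original question, is never auto-learned in this sketch; that borderline mail is exactly what you would feed it by hand.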
\_ Can't wait until 2.5 comes out. Does anyone know whether
people have tried using Bayesian filtering techniques on
Chinese email? It won't work natively because Chinese words
are not separated by spaces.
\_ 2.50 *is* out, so don't wait any longer; that'd be silly. Also,
Bayesian filtering will still work on Chinese as long as it knows how
to use individual characters as tokens, which is only a matter of
being charset-aware. --aaron
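A minimal Python sketch of what "charset-aware" means here: decode the raw bytes with the message's declared charset, then emit each CJK ideograph as its own token and runs of ASCII letters and digits as ordinary word tokens. The regex covers only the basic CJK Unified Ideographs block (U+4E00 to U+9FFF), which is a simplification, and the gb2312 sample text is made up.

```python
import re

# Each CJK ideograph (basic block U+4E00..U+9FFF only; a simplification) is
# its own token; runs of ASCII letters/digits form ordinary word tokens.
TOKEN = re.compile(r"[\u4e00-\u9fff]|[A-Za-z0-9]+")


def tokenize(raw, charset):
    """Decode bytes with the message's declared charset, then tokenize.
    Undecodable bytes become U+FFFD, which the token pattern skips."""
    text = raw.decode(charset, errors="replace")
    return [tok.lower() for tok in TOKEN.findall(text)]


sample = "免费发票 Call NOW".encode("gb2312")
print(tokenize(sample, "gb2312"))
```

This prints `['免', '费', '发', '票', 'call', 'now']`. Decoding the same bytes as Latin-1 instead would yield accented-letter junk that this pattern drops, which is why knowing the charset matters.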
\_ Soda God, please install version 2.5
\_ It's a bit more complicated than that when it comes
to Chinese. The real issue is... well,
imagineYouWriteASentenceWithoutSpaceCharactersOrPunctuation...
\_ someonedoesntseemtounderstandhowbayesiananalysisworks
\_ If you write like this, and have spelling errors,
this becomes very hard: it's the same problem as
breaking up a sequence that looks like GTGTTTAGG...
into meaningful bits.
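To make the "breaking it into meaningful bits" problem concrete, here is a small Python sketch of dictionary-based segmentation: a dynamic-programming (Viterbi-style) search for the most probable split under a unigram word model. This is a simpler technique than the HMMs mentioned later in the thread, and the word list and probabilities are hypothetical. It recovers the words when everything is spelled correctly, but a single misspelling leaves it with no valid split at all.

```python
import math
from functools import lru_cache

# Hypothetical unigram word probabilities; a real segmenter would estimate
# these from a large corpus.
WORD_P = {
    "you": 0.05, "write": 0.01, "a": 0.06, "sentence": 0.005,
    "without": 0.01, "spaces": 0.002, "sent": 0.003, "ence": 1e-6,
}
MAX_LEN = max(map(len, WORD_P))


def segment(text):
    """Most probable split of text into dictionary words under the unigram
    model, or None if no split exists."""

    @lru_cache(maxsize=None)
    def best(i):
        # (log-probability, words) for the best split of text[i:], or None.
        if i == len(text):
            return 0.0, ()
        options = []
        for j in range(i + 1, min(len(text), i + MAX_LEN) + 1):
            word = text[i:j]
            if word in WORD_P:
                rest = best(j)
                if rest is not None:
                    options.append((math.log(WORD_P[word]) + rest[0], (word,) + rest[1]))
        return max(options) if options else None

    result = best(0)
    return list(result[1]) if result else None


print(segment("youwriteasentencewithoutspaces"))
print(segment("youwriteasentensewithoutspaces"))  # misspelled: None
```

The first call prints `['you', 'write', 'a', 'sentence', 'without', 'spaces']` (the "sent" + "ence" split loses on probability); the second prints `None`. Handling misspellings needs a model that allows for noisy observations, which is where something like an HMM comes in.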
\_ I would try to explain if you weren't anon. --aaron
\_ I'm not as nice as aaron. You're clueless and
need to go look up how it really works, not come
to the motd and pretend you're smart.
\_ It's naive Bayes. It assumes features are
independent given the hypothesis. Or, to put it
in slightly less snobbish terms, it's counting
with a fancy name. You don't want to be
lecturing me on naive Bayes. While one really
does always know where the word breaks are
in Chinese, if you have no word breaks in
English and have misspellings, this is a hard
problem, and you can't use naive Bayes to
solve it; you need something like HMMs.
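Since naive Bayes really is "counting with a fancy name", here is a textbook multinomial naive Bayes scorer in Python to show it: per-token counts from a labeled corpus, Laplace smoothing, and a log-odds sum in which each token contributes independently given the class. This is not SpamAssassin's own Bayes code, which may combine token probabilities differently, and the tiny corpus is made up.

```python
import math
from collections import Counter


def train(messages):
    """messages: iterable of (tokens, is_spam). Returns token and doc counts."""
    counts = {True: Counter(), False: Counter()}
    docs = Counter()
    for tokens, is_spam in messages:
        counts[is_spam].update(tokens)
        docs[is_spam] += 1
    return counts, docs


def log_odds_spam(tokens, counts, docs, alpha=1.0):
    """log P(spam|tokens) - log P(ham|tokens) under the naive assumption that
    tokens are independent given the class. Laplace smoothing (alpha) keeps an
    unseen token from zeroing out either class's product."""
    vocab = len(set(counts[True]) | set(counts[False]))
    total = {c: sum(counts[c].values()) for c in (True, False)}
    score = math.log(docs[True] / docs[False])  # prior odds
    for tok in tokens:
        p_spam = (counts[True][tok] + alpha) / (total[True] + alpha * vocab)
        p_ham = (counts[False][tok] + alpha) / (total[False] + alpha * vocab)
        score += math.log(p_spam / p_ham)  # each token adds independently
    return score


corpus = [
    ("cheap pills click below".split(), True),
    ("click below to win".split(), True),
    ("lunch at noon".split(), False),
    ("notes from the meeting".split(), False),
]
counts, docs = train(corpus)
print(log_odds_spam("click below now".split(), counts, docs) > 0)  # True
print(log_odds_spam("lunch meeting".split(), counts, docs) > 0)    # False
```

A positive score leans spam and a negative one leans ham. Because the per-token terms simply add, a run-together or misspelled "word" that never appeared in training contributes only a small smoothed term, which is the weakness being argued about here.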
\_ And it'll still be caught by SpamAssassin.
\_ With no spaces and enough misspellings it
will not. Sorry.
\_ SpamAssassin uses silly regexps. The
poster above called me clueless because
he thought naive Bayes could handle this
problem; I think the irony of the
situation is quite lost on him.