As the saying goes, I'm passing this on exactly as I received it:
A new beta build of the anti-spam plugin BayesIt! is available.
The changes are:
+ One more filtering method: a "hybrid" of Paul Graham's and
Gary Robinson's methods.
+ The "white list" of kludges now accepts incomplete strings. For
example, you can enter "x-spam-...", which will match any kludge
beginning with "x-spam-" (such as "x-spam-level", "x-spam-grade"
and so on).
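The prefix matching described above can be sketched as follows. This is a hypothetical helper, not BayesIt! code; it assumes that an entry ending in "..." is a prefix pattern, that any other entry must match the whole kludge name, and that comparison is case-insensitive (header field names are case-insensitive in RFC 5322).

```python
def kludge_whitelisted(header_name: str, whitelist: list[str]) -> bool:
    """Return True if header_name matches any white-list entry.

    Assumption: an entry ending in "..." matches any name with that prefix;
    other entries must match the full name. Case is ignored.
    """
    name = header_name.lower()
    for entry in whitelist:
        entry = entry.lower()
        if entry.endswith("..."):
            # Strip the "..." marker and compare as a prefix.
            if name.startswith(entry[:-3]):
                return True
        elif name == entry:
            return True
    return False


wl = ["x-spam-...", "x-mailer"]
print(kludge_whitelisted("X-Spam-Level", wl))    # True
print(kludge_whitelisted("X-Spamd-Result", wl))  # False: "x-spamd-" is not "x-spam-"
```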
- One more bug fixed (thanks to Alexander V. Hramov): the filter
and the manager hung if the very last kludge in the letter header
had an empty value.
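To illustrate the kind of input that triggered the bug, here is a hypothetical header splitter (not the plugin's parser) that treats a trailing "Name:" with nothing after the colon as an ordinary kludge with an empty value instead of an error. It also assumes standard header folding, where a line starting with a space or tab continues the previous value.

```python
def parse_kludges(header_block: str) -> list[tuple[str, str]]:
    """Split a raw header block into (name, value) pairs.

    Hypothetical helper: "X-Empty:" yields ("X-Empty", ""), even when it is
    the very last line of the header.
    """
    kludges: list[tuple[str, str]] = []
    for line in header_block.splitlines():
        if not line.strip():
            break  # a blank line ends the header section
        if line[0] in " \t" and kludges:
            # Folded continuation line: append it to the previous value.
            name, value = kludges[-1]
            kludges[-1] = (name, (value + " " + line.strip()).strip())
            continue
        name, sep, value = line.partition(":")
        if sep:  # ignore lines that contain no colon at all
            kludges.append((name.strip(), value.strip()))
    return kludges


print(parse_kludges("Subject: hello\nX-Empty:"))
# [('Subject', 'hello'), ('X-Empty', '')]
```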
+ Autotraining finally works! How it works is described
below.
+ If you run the BayesIt! manager (learnengine.exe) with the path
to a letter saved as .eml (or .msg) as a command-line parameter,
the manager will invert the letter's classification in the
corresponding base.
This is needed precisely when autotraining makes a mistake. In
that case, find the wrongly trained letter in The Bat! by its
MSID (which can be found in the log file), save it to disk as,
for example, wrong.eml, and run "learnengine wrong.eml".
The letter will be moved to the opposite category.
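Inverting a letter amounts to removing it from one corpus and adding it to the other. The sketch below is a hypothetical model, not the actual learnengine.exe format: it assumes the base stores, per corpus, a letter count and the number of letters each word appears in (each word counted once per letter).

```python
from collections import Counter


class ClassificationBase:
    """Hypothetical two-corpus base: "spam" and "ham" (non-spam)."""

    def __init__(self) -> None:
        self.tokens = {"spam": Counter(), "ham": Counter()}
        self.letters = {"spam": 0, "ham": 0}

    def train(self, words: list[str], category: str) -> None:
        # Count each distinct word once per letter.
        self.tokens[category].update(set(words))
        self.letters[category] += 1

    def untrain(self, words: list[str], category: str) -> None:
        self.tokens[category].subtract(set(words))
        self.tokens[category] += Counter()  # in-place add drops counts <= 0
        self.letters[category] -= 1

    def invert(self, words: list[str], category: str) -> str:
        """Move a letter from `category` to the opposite corpus."""
        other = "ham" if category == "spam" else "spam"
        self.untrain(words, category)
        self.train(words, other)
        return other


base = ClassificationBase()
words = ["cheap", "pills", "cheap"]
base.train(words, "ham")           # the wrong autotraining decision
print(base.invert(words, "ham"))   # spam
print(base.letters)                # {'spam': 1, 'ham': 0}
```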
Now a few words about autotraining.
A letter is autotrained only if both of these conditions hold:
a) Autotraining is turned on in the options.
b) The number of letters in the classification base (the "power"
of the base) in each of its two parts (spam and non-spam) is
greater than the number defined in the training option "Size of
base to autotrain".
If both conditions are met, the autotraining idle process runs
alongside The Bat!
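The two start-up conditions combine as a simple AND. In this sketch the function and parameter names are hypothetical; "greater than" is read as strictly greater, following the wording of condition b).

```python
def autotraining_active(enabled: bool, spam_letters: int, ham_letters: int,
                        size_to_autotrain: int) -> bool:
    """Return True if the autotraining idle process should run.

    Hypothetical names; mirrors conditions a) and b) above.
    """
    return (enabled                                  # a) switched on in options
            and spam_letters > size_to_autotrain     # b) spam part is large enough
            and ham_letters > size_to_autotrain)     # b) non-spam part too


print(autotraining_active(True, 501, 600, 500))  # True
print(autotraining_active(True, 500, 600, 500))  # False: spam part not above limit
```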
Then, for every previously classified letter whose grade is
greater than or equal to "Minimal SPAM grade to autotrain", or
less than or equal to "Maximal NON-SPAM grade to autotrain", the
filter treats the letter as classified with certainty, confirms
it into the appropriate corpus (spam or non-spam, depending on
the classification), then recalculates and saves the
classification base. Note that this grade is measured on a scale
of 0..1000000, where 0 means non-spam and 1000000 means spam.
Also note that Gary Robinson's method produces softer values, so
if you use it, autotraining will happen quite rarely. The safest
values (in terms of the chance of a mistake) are exactly 0 and
1000000.
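The threshold test can be written directly. This sketch assumes the grade is a spam probability in [0, 1] mapped linearly onto the 0..1000000 scale; the helper names are hypothetical. With the safest settings (1000000 and 0), a softer grade such as 0.97 is left for manual training.

```python
from typing import Optional

SCALE = 1_000_000  # 0 = non-spam, 1000000 = spam


def to_grade(spam_probability: float) -> int:
    """Map a probability in [0, 1] onto the 0..1000000 scale (assumed linear)."""
    return round(spam_probability * SCALE)


def autotrain_category(grade: int, min_spam_grade: int,
                       max_nonspam_grade: int) -> Optional[str]:
    """Return "spam", "ham", or None if the letter is not "sure" enough."""
    if grade >= min_spam_grade:
        return "spam"
    if grade <= max_nonspam_grade:
        return "ham"
    return None  # leave the letter for manual training


print(autotrain_category(to_grade(1.0), SCALE, 0))   # spam
print(autotrain_category(to_grade(0.97), SCALE, 0))  # None
```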
This is repeated until there are no more letters that are
"surely" classified and can be confirmed automatically. Once all
such letters have been processed, the autotraining thread goes to
sleep.
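A plausible reason for repeating the operation is that each confirmation recalculates the base, which can push other letters over a threshold. The sketch below assumes exactly that: `grade_of` and `confirm` are hypothetical callbacks standing in for grading and for "confirm, recalculate, save", and the loop stops after a full pass that confirms nothing.

```python
from typing import Callable, Optional


def sure_category(grade: int, min_spam: int, max_ham: int) -> Optional[str]:
    if grade >= min_spam:
        return "spam"
    if grade <= max_ham:
        return "ham"
    return None


def autotrain(pending: list[str],
              grade_of: Callable[[str], int],
              confirm: Callable[[str, str], None],
              min_spam: int = 1_000_000,
              max_ham: int = 0) -> int:
    """Confirm "sure" letters until a full pass confirms nothing.

    Returns the number of letters confirmed; confirmed letters are
    removed from `pending`.
    """
    total = 0
    while True:
        confirmed = 0
        for letter in list(pending):  # iterate over a snapshot
            category = sure_category(grade_of(letter), min_spam, max_ham)
            if category is not None:
                confirm(letter, category)  # assumed to recalculate the base
                pending.remove(letter)
                confirmed += 1
        if confirmed == 0:
            return total  # nothing left to confirm: the thread would sleep here
        total += confirmed
```

For example, if confirming letter "a" raises letter "b" to 1000000, then "b" is confirmed on the second pass even though it was checked first.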
When new letters arrive, it wakes up again about 10 seconds
after the last letter in the session is received, and autotrains
the appropriate letters again.
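The wake-up rule behaves like a debounce: each arriving letter restarts a roughly 10-second quiet period, and the thread wakes only once a session goes quiet. This hypothetical function computes the wake-up times for a list of arrival times (in seconds), assuming a letter arriving before the quiet period expires postpones the wake-up.

```python
def wake_times(arrivals: list[float], quiet: float = 10.0) -> list[float]:
    """Return the times at which the autotraining thread would wake up.

    Hypothetical model: the thread wakes `quiet` seconds after the last
    letter of each burst, i.e. when no further letter arrives in time.
    """
    ordered = sorted(arrivals)
    wakes = []
    for i, t in enumerate(ordered):
        nxt = ordered[i + 1] if i + 1 < len(ordered) else None
        if nxt is None or nxt > t + quiet:
            wakes.append(t + quiet)
    return wakes


# Two sessions: letters at 0, 3, 5 s and at 30, 31 s.
print(wake_times([0, 3, 5, 30, 31]))  # [15.0, 41.0]
```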
When you exit The Bat!, the filter does a final check of how
many letters remain untrained, and if this number is greater than
the parameter defined in "Autostart manual training...", it asks
you to run the autotraining process.
The near-term to-do list now includes .bye export/import
features and, of course, fixing bugs if anyone finds
them.
As usual, this version is available here:
http://klirik.narod.ru/arc/bayesit02c.zip (95kb).