8. Various

8.1 Process/Prefix		A convention used by many treatment commands.
8.2 Interactive		Making Gnus ask you many questions.
8.3 Symbolic Prefixes		How to supply some Gnus functions with options.
8.4 Formatting Variables		You can specify what buffers should look like.
8.5 Window Layout		Configuring the Gnus buffer windows.
8.6 Faces and Fonts		How to change how faces look.
8.7 Compilation		How to speed Gnus up.
8.8 Mode Lines		Displaying information in the mode lines.
8.9 Highlighting and Menus		Making buffers look all nice and cozy.
8.10 Buttons		Get tendinitis in ten easy steps!
8.11 Daemons		Gnus can do things behind your back.
8.12 NoCeM		How to avoid spam and other fatty foods.
8.13 Undo		Some actions can be undone.
8.14 Predicate Specifiers		Specifying predicates.
8.15 Moderation		What to do if you're a moderator.
8.16 Fetching a Group		Starting Gnus just to read a group.
8.17 Image Enhancements		Modern versions of Emacs/XEmacs can display images.
8.18 Fuzzy Matching		What's the big fuzz?
8.19 Thwarting Email Spam		A how-to on avoiding unsolicited commercial email.
8.20 Interaction with other modes
8.21 Various Various		Things that are really various.

8.1 Process/Prefix

Many functions, among them functions for moving, decoding and saving articles, use what is known as the Process/Prefix convention.

This is a method for figuring out what articles the user wants the command to be performed on.

It goes like this:

If the numeric prefix is N, perform the operation on the next N articles, starting with the current one. If the numeric prefix is negative, perform the operation on the previous N articles, starting with the current one.

If transient-mark-mode is non-nil and the region is active, all articles in the region will be worked upon.

If there is no numeric prefix, but some articles are marked with the process mark, perform the operation on the articles marked with the process mark.

If there is neither a numeric prefix nor any articles marked with the process mark, just perform the operation on the current article.

Quite simple, really, but it needs to be made clear so that surprises are avoided.

Commands that react to the process mark will push the current list of process marked articles onto a stack and will then clear all process marked articles. You can restore the previous configuration with the M P y command (see section 3.7.6 Setting Process Marks).

One thing that seems to shock & horrify lots of people is that, for instance, 3 d does exactly the same as d d d. Since each d (which marks the current article as read) by default goes to the next unread article after marking, this means that 3 d will mark the next three unread articles as read, no matter what the summary buffer looks like. Set gnus-summary-goto-unread to nil for a more straightforward action.

Many commands do not use the process/prefix convention. All commands that do explicitly say so in this manual. To apply the process/prefix convention to commands that do not use it, you can use the M-& command. For instance, to mark all the articles in the group as expirable, you could say M P b M-& E.
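
To see which articles the convention picks in a given situation, you could define a small diagnostic command like the one below. This is a sketch. my-gnus-summary-show-targets and the C-c w binding are hypothetical, and it assumes that gnus-summary-work-articles (the Gnus function that applies the numeric prefix, region, process mark and current article rules) behaves as described above. Asking it about process-marked articles may also push them onto the process mark stack.

```elisp
;; Hypothetical diagnostic command: report which articles a
;; process/prefix command would act on, without touching them.
(defun my-gnus-summary-show-targets (n)
  "Show the articles the process/prefix convention selects for prefix N."
  (interactive "P")                     ; raw prefix, as Gnus commands take it
  (message "Would act on: %S" (gnus-summary-work-articles n)))

;; Hypothetical binding in the summary buffer.
(with-eval-after-load 'gnus-sum
  (define-key gnus-summary-mode-map (kbd "C-c w")
              #'my-gnus-summary-show-targets))

;; For the more straightforward `3 d' behavior mentioned above.
(setq gnus-summary-goto-unread nil)
```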

8.2 Interactive

gnus-novice-user: If this variable is non-nil, you are either a newcomer to the World of Usenet, or you are very cautious, which is a nice thing to be, really. You will be given questions of the type "Are you sure you want to do this?" before doing anything dangerous. This is t by default.
gnus-expert-user: If this variable is non-nil, you will seldom be asked any questions by Gnus. It will simply assume you know what you're doing, no matter how strange.
gnus-interactive-catchup: Require confirmation before catching up a group if non-nil. It is t by default.
gnus-interactive-exit: Require confirmation before exiting Gnus. This variable is t by default.
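
A `~/.gnus.el' fragment for someone who wants Gnus to ask as few questions as possible might look like this. It is only a sketch; keep the settings you actually want.

```elisp
;; Ask as few questions as possible.
(setq gnus-novice-user nil           ; no "Are you sure?" questions
      gnus-expert-user t             ; assume I know what I'm doing
      gnus-interactive-catchup nil   ; catch up groups without confirmation
      gnus-interactive-exit nil)     ; exit Gnus without confirmation
```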

8.3 Symbolic Prefixes

Quite a lot of Emacs commands react to the (numeric) prefix. For instance, C-u 4 C-f moves point four characters forward, and C-u 9 0 0 I s s p adds a permanent Subject substring score rule of 900 to the current article.

This is all nice and well, but what if you want to give a command some additional information? Well, what most commands do is interpret the "raw" prefix in some special way. C-u 0 C-x C-s means that one doesn't want a backup file to be created when saving the current buffer, for instance. But what if you want to save without making a backup file, and you want Emacs to flash lights and play a nice tune at the same time? You can't, and you're probably perfectly happy that way.

I'm not, so I've added a second prefix--the symbolic prefix. The prefix key is M-i (gnus-symbolic-argument), and the next character typed in is the value. You can stack as many M-i prefixes as you want. M-i a C-M-u means "feed the C-M-u command the symbolic prefix a". M-i a M-i b C-M-u means "feed the C-M-u command the symbolic prefixes a and b". You get the drift.

Typing in symbolic prefixes to commands that don't accept them doesn't hurt, but it doesn't do any good either. Currently not many Gnus functions make use of the symbolic prefix.

If you're interested in how Gnus implements this, see section 10.7.7 Extended Interactive.
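
A command of your own can pick up the symbolic prefix through the extended interactive specification described in that section. The sketch below assumes the gnus-interactive function and its `y' spec (the current symbolic prefix) work as section 10.7.7 describes; my-gnus-show-prefixes is a hypothetical name.

```elisp
;; Hypothetical command that just reports the prefixes it was given.
(defun my-gnus-show-prefixes (&optional n symp)
  "Display the raw numeric prefix N and the symbolic prefix SYMP."
  ;; "P" reads the raw numeric prefix; "y" reads the current symbolic prefix.
  (interactive (gnus-interactive "P\ny"))
  (message "Numeric prefix: %S, symbolic prefix: %S" n symp))
```

Bound to a key in the summary buffer, typing M-i a followed by that key should report the symbolic prefix a.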

8.4 Formatting Variables

Throughout this manual you've probably noticed lots of variables called things like gnus-group-line-format and gnus-summary-mode-line-format. These control how Gnus is to output lines in the various buffers. There's quite a lot of them. Fortunately, they all use the same syntax, so there's not that much to be annoyed by.

Here's an example format spec (from the group buffer): `%M%S%5y: %(%g%)\n'. We see that it is indeed extremely ugly, and that there are lots of percentages everywhere.

8.4.1 Formatting Basics		A formatting variable is basically a format string.
8.4.2 Mode Line Formatting		Some rules about mode line formatting variables.
8.4.3 Advanced Formatting		Modifying output in various ways.
8.4.4 User-Defined Specs		Having Gnus call your own functions.
8.4.5 Formatting Fonts		Making the formatting look colorful and nice.
8.4.6 Positioning Point		Moving point to a position after an operation.
8.4.7 Tabulation		Tabulating your output.
8.4.8 Wide Characters		Dealing with wide characters.

Currently Gnus uses the following formatting variables: gnus-group-line-format, gnus-summary-line-format, gnus-server-line-format, gnus-topic-line-format, gnus-group-mode-line-format, gnus-summary-mode-line-format, gnus-article-mode-line-format, gnus-server-mode-line-format, and gnus-summary-pick-line-format.

All these format variables can also be arbitrary elisp forms. In that case, they will be evaled to insert the required lines.

Gnus includes a command to help you while creating your own format specs. M-x gnus-update-format will eval the current form, update the spec in question and pop you to a buffer where you can examine the resulting Lisp code to be run to generate the line.

8.4.1 Formatting Basics

Each `%' element will be replaced by some string or other when the buffer in question is generated. `%5y' means "insert the `y' spec, and pad with spaces to get a 5-character field".

As with normal C and Emacs Lisp formatting strings, the numerical modifier between the `%' and the formatting type character will pad the output so that it is always at least that long. `%5y' will make the field always (at least) five characters wide by padding with spaces to the left. If you say `%-5y', it will pad to the right instead.

You may also wish to limit the length of the field to protect against particularly wide values. For that you can say `%4,6y', which means that the field will never be more than 6 characters wide and never less than 4 characters wide.

Also Gnus supports some extended format specifications, such as `%&user-date;'.
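
For instance, a summary line format along these lines (a sketch built from summary specs described elsewhere in this manual) uses both kinds of modifier. `%4L' pads the line count to at least four characters on the left, and `%-23,23f' pads the sender on the right and also cuts it off at 23 characters, so the subject column stays aligned.

```elisp
;; %4L      : line count, at least 4 wide, padded on the left.
;; %-23,23f : sender, padded on the right and never wider than 23.
(setq gnus-summary-line-format "%U%R%z%I%(%[%4L: %-23,23f%]%) %s\n")
```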

8.4.2 Mode Line Formatting

Mode line formatting variables (e.g., gnus-summary-mode-line-format) follow the same rules as other, buffer line oriented formatting variables (see section 8.4.1 Formatting Basics) with the following two differences:

There must be no newline (`\n') at the end.
The special `%%b' spec can be used to display the buffer name. Well, it's no spec at all, really---`%%' is just a way to quote `%' to allow it to pass through the formatting machinery unmangled, so that Emacs receives `%b', which is something the Emacs mode line display interprets to mean "show the buffer name". For a full list of mode line specs Emacs understands, see the documentation of the mode-line-format variable.
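
A summary mode line format that obeys both rules might look like the following. This is a sketch; `%A' and `%Z' are assumed to be the summary mode line specs for the current article number and the unread/unselected article count.

```elisp
;; No trailing "\n", and %%b reaches Emacs as %b (the buffer name).
(setq gnus-summary-mode-line-format "Gnus: %%b [%A] %Z")
```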

8.4.3 Advanced Formatting

It is frequently useful to post-process the fields in some way. Padding, limiting, cutting off parts and suppressing certain values can be achieved by using tilde modifiers. A typical tilde spec might look like `%~(cut 3)~(ignore "0")y'.

These are the valid modifiers:

pad, pad-left: Pad the field to the left with spaces until it reaches the required length.
pad-right: Pad the field to the right with spaces until it reaches the required length.
max, max-left: Cut off characters from the left until it reaches the specified length.
max-right: Cut off characters from the right until it reaches the specified length.
cut, cut-left: Cut off the specified number of characters from the left.
cut-right: Cut off the specified number of characters from the right.
ignore: Return an empty string if the field is equal to the specified value.
form: Use the specified form as the field value when the `@' spec is used.

Here's an example:

"%~(form (current-time-string))@"

Let's take an example. The `%o' spec in the summary mode lines will return a date in compact ISO8601 format---`19960809T230410'. This is quite a mouthful, so we want to shave off the century number and the time, leaving us with a six-character date. That would be `%~(cut-left 2)~(max-right 6)~(pad 6)o'. (Cutting is done before maxing, and we need the padding to ensure that the date is never less than 6 characters to make it look nice in columns.)

Ignoring is done first; then cutting; then maxing; and then as the very last operation, padding.
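
To make the processing order concrete, here is a small stand-alone function that mimics four of the modifiers on a plain string. It is a hypothetical illustration (my-gnus-tilde-sketch is not part of Gnus, which compiles tilde specs quite differently), but it applies the steps in the order just described and reproduces the date example.

```elisp
(defun my-gnus-tilde-sketch (value &optional ignore cut-left max-right pad)
  "Apply simplified tilde modifiers to the string VALUE.
The steps run in the documented order: IGNORE, CUT-LEFT, MAX-RIGHT, PAD.
A hypothetical illustration, not Gnus's own implementation."
  ;; ignore: a value equal to IGNORE becomes the empty string.
  (when (and ignore (equal value ignore))
    (setq value ""))
  ;; cut-left: drop CUT-LEFT characters from the start.
  (when cut-left
    (setq value (substring value (min cut-left (length value)))))
  ;; max-right: cut characters off the right until MAX-RIGHT remain.
  (when (and max-right (> (length value) max-right))
    (setq value (substring value 0 max-right)))
  ;; pad (same as pad-left): add spaces on the left up to PAD characters.
  (when pad
    (setq value (format (format "%%%ds" pad) value)))
  value)
```

Evaluating (my-gnus-tilde-sketch "19960809T230410" nil 2 6 6) returns "960809", the same six-character date as `%~(cut-left 2)~(max-right 6)~(pad 6)o', while a short value such as "1996" comes back as "    96", padded to six characters.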

If you use lots of these advanced thingies, you'll find that Gnus gets quite slow. This can be helped enormously by running M-x gnus-compile when you are satisfied with the look of your lines. See section 8.7 Compilation.

8.4.4 User-Defined Specs

All the specs allow for inserting user defined specifiers---`u'. The next character in the format string should be a letter. Gnus will call the function gnus-user-format-function-`X', where `X' is the letter following `%u'. The function will be passed a single parameter--what the parameter means depends on what buffer it's being called from. The function should return a string, which will be inserted into the buffer just like information from any other specifier. This function may also be called with dummy values, so it should protect against that.
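
As an illustration, here's a user-defined spec for the summary buffer, where the parameter is the article header. The sketch assumes the header accessor mail-header-chars (the article size in characters) and uses the hypothetical letter `k'; the vectorp and integerp checks are there to cope with the dummy values mentioned above.

```elisp
;; Hypothetical %uk spec: article size in kilobytes.
(defun gnus-user-format-function-k (header)
  "Return the size of the article described by HEADER, e.g. \"12k\"."
  (let ((chars (and (vectorp header) (mail-header-chars header))))
    (if (and (integerp chars) (> chars 0))
        (format "%dk" (max 1 (/ chars 1024)))
      "?")))                            ; unknown size or dummy header

;; Use it in the summary line like any other spec.
(setq gnus-summary-line-format "%U%R%z %uk %(%-20,20f%) %s\n")
```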

Also Gnus supports extended user-defined specs, such as `%u&foo;'. Gnus will call the function gnus-user-format-function-`foo'.

You can also use tilde modifiers (see section 8.4.3 Advanced Formatting) to achieve much the same without defining new functions. Here's an example: `%~(form (count-lines (point-min) (point)))@'. The form given here will be evaluated to yield the current line number, and then inserted.

8.4.5 Formatting Fonts

There are specs for highlighting, and these are shared by all the format variables. Text inside the `%(' and `%)' specifiers will get the special mouse-face property set, which means that it will be highlighted (with gnus-mouse-face) when you put the mouse pointer over it.

Text inside the `%{' and `%}' specifiers will have their normal faces set using gnus-face-0, which is bold by default. If you say `%1{', you'll get gnus-face-1 instead, and so on. Create as many faces as you wish. The same goes for the mouse-face specs--you can say `%3(hello%)' to have `hello' mouse-highlighted with gnus-mouse-face-3.

Text inside the `%<<' and `%>>' specifiers will get the special balloon-help property set to gnus-balloon-face-0. If you say `%1<<', you'll get gnus-balloon-face-1 and so on. The gnus-balloon-face-* variables should be either strings or symbols naming functions that return a string. When the mouse passes over text with this property set, a balloon window will appear and display the string. Please refer to section `Tooltips' in The Emacs Manual, (in GNU Emacs) or the doc string of balloon-help-mode (in XEmacs) for more information on this. (For technical reasons, the guillemets have been approximated as `<<' and `>>' in this paragraph.)

Here's an alternative recipe for the group buffer:

;; Create three face types.
(setq gnus-face-1 'bold)
(setq gnus-face-3 'italic)

;; We want the article count to be in
;; a bold and green face.  So we create
;; a new face called my-green-bold.
(copy-face 'bold 'my-green-bold)
;; Set the color.
(set-face-foreground 'my-green-bold "ForestGreen")
(setq gnus-face-2 'my-green-bold)

;; Set the new & fancy format.
(setq gnus-group-line-format
      "%M%S%2{%5y%}%3{:%} %(%1{%g%}%)\n")

I'm sure you'll be able to use this scheme to create totally unreadable and extremely vulgar displays. Have fun!

Note that the `%(' specs (and friends) do not make any sense on the mode-line variables.

8.4.6 Positioning Point

Gnus usually moves point to a pre-defined place on each line in most buffers. By default, point moves to the first colon character on the line. You can customize this behavior in three different ways.

You can move the colon character to somewhere else on the line.

You can redefine the function that moves the point to the colon. The function is called gnus-goto-colon.

But perhaps the most convenient way to deal with this, if you don't want to have a colon in your line, is to use the `%*' specifier. If you put a `%*' somewhere in your format line definition, Gnus will place point there.
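
For example, this group line format (a sketch built from the group buffer specs shown earlier) contains no colon at all and uses `%*' to put point right before the group name:

```elisp
;; No colon in the line; %* marks where point should go.
(setq gnus-group-line-format "%M%S%5y %*%(%g%)\n")
```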

8.4.7 Tabulation

You can usually line up your displays by padding and cutting your strings. However, when combining various strings of different size, it can often be more convenient to just output the strings, and then worry about lining up the following text afterwards.

To do that, Gnus supplies tabulator specs---`%='. There are two different types---hard tabulators and soft tabulators.

`%50=' will insert space characters to pad the line up to column 50. If the text is already past column 50, nothing will be inserted. This is the soft tabulator.

`%-50=' will insert space characters to pad the line up to column 50. If the text is already past column 50, the excess text past column 50 will be removed. This is the hard tabulator.
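
A sketch of the hard tabulator in a summary line: however long the subject is, the date (`%d') always starts just after column 60, because a longer subject is cut off there. The choice of specs and of column 60 is only an example.

```elisp
;; %-60= pads the line to column 60, or removes everything past it.
(setq gnus-summary-line-format "%U%R%z %(%-20,20f%) %s%-60= %d\n")
```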

8.4.8 Wide Characters

Fixed width fonts in most countries have characters of the same width. Some countries, however, use Latin characters mixed with wider characters--most notably East Asian countries.

The problem is that when formatting, Gnus assumes that if a string is 10 characters wide, it'll be 10 Latin characters wide on the screen. In these countries, that's not true.

To help fix this, you can set gnus-use-correct-string-widths to t. This makes buffer generation slower, but the results will be prettier. The default value under XEmacs is t but nil for Emacs.
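
If you only need this in some language environments, a conditional setting is one option. This is a sketch: the list of environment names to test for is an assumption you should adapt to your own setup.

```elisp
;; Turn on correct widths only in some East Asian language environments.
(when (member current-language-environment '("Japanese" "Korean" "Chinese-GB"))
  (setq gnus-use-correct-string-widths t))
```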

8.5 Window Layout

No, there's nothing here about X, so be quiet.

If gnus-use-full-window is non-nil, Gnus will delete all other windows and occupy the entire Emacs screen by itself. It is t by default.

Setting this variable to nil kinda works, but there are glitches. Use at your own peril.

gnus-buffer-configuration describes how much space each Gnus buffer should be given. Here's an excerpt of this variable:

((group (vertical 1.0 (group 1.0 point)
                      (if gnus-carpal (group-carpal 4))))
 (article (vertical 1.0 (summary 0.25 point)
                        (article 1.0))))

This is an alist. The key is a symbol that names some action or other. For instance, when displaying the group buffer, the window configuration function will use group as the key. A full list of possible names is listed below.

The value (i.e., the split) says how much space each buffer should occupy. To take the article split as an example:

(article (vertical 1.0 (summary 0.25 point)
                       (article 1.0)))

This split says that the summary buffer should occupy the top 25% of the screen, and that it is placed over the article buffer. As you may have noticed, 100% + 25% is actually 125% (yup, I saw y'all reaching for that calculator there). However, the special number 1.0 is used to signal that this buffer should soak up all the rest of the space available after the rest of the buffers have taken whatever they need. There should be only one buffer with the 1.0 size spec per split.

Point will be put in the buffer that has the optional third element point. In a frame split, the last subsplit having a leaf split where the tag frame-focus is a member (i.e. is the third or fourth element in the list, depending on whether the point tag is present) gets focus.

Here's a more complicated example:

(article (vertical 1.0 (group 4)
                       (summary 0.25 point)
                       (if gnus-carpal (summary-carpal 4))
                       (article 1.0)))

If the size spec is an integer instead of a floating point number, then that number will be used to say how many lines a buffer should occupy, not a percentage.

If the split looks like something that can be evaled (to be precise--if the car of the split is a function or a subr), this split will be evaled. If the result is non-nil, it will be used as a split. This means that there will be three buffers if gnus-carpal is nil, and four buffers if gnus-carpal is non-nil.

Not complicated enough for you? Well, try this on for size:

(article (horizontal 1.0
             (vertical 0.5
                 (group 1.0)
                 (gnus-carpal 4))
             (vertical 1.0
                 (summary 0.25 point)
                 (summary-carpal 4)
                 (article 1.0))))

Whoops. Two buffers with the mystery 100% tag. And what's that horizontal thingie?

If the first element in one of the splits is horizontal, Gnus will split the window horizontally, giving you two windows side-by-side. Inside each of these strips you may carry on all you like in the normal fashion. The number following horizontal says what percentage of the screen is to be given to this strip.

For each split, there must be one element that has the 100% tag. The splitting is never accurate, and this buffer will eat any leftover lines from the splits.

To be slightly more formal, here's a definition of what a valid split may look like:

split      = frame | horizontal | vertical | buffer | form
frame      = "(frame " size *split ")"
horizontal = "(horizontal " size *split ")"
vertical   = "(vertical " size *split ")"
buffer     = "(" buf-name " " size *[ "point" ] *[ "frame-focus"] ")"
size       = number | frame-params
buf-name   = group | article | summary ...

The limitations are that the frame split can only appear as the top-level split. form should be an Emacs Lisp form that should return a valid split. We see that each split is fully recursive, and may contain any number of vertical and horizontal splits.

Finding the right sizes can be a bit complicated. No window may be less than gnus-window-min-height (default 1) characters high, and all windows must be at least gnus-window-min-width (default 1) characters wide. Gnus will try to enforce this before applying the splits. If you want to use the normal Emacs window width/height limit, you can just set these two variables to nil.

If you're not familiar with Emacs terminology, horizontal and vertical splits may work the opposite way of what you'd expect. Windows inside a horizontal split are shown side-by-side, and windows within a vertical split are shown above each other.

If you want to experiment with window placement, a good tip is to call gnus-configure-frame directly with a split. This is the function that does all the real work when splitting buffers. Below is a pretty nonsensical configuration with 5 windows; two for the group buffer and three for the article buffer. (I said it was nonsensical.) If you eval the statement below, you can get an idea of how that would look straight away, without going through the normal Gnus channels. Play with it until you're satisfied, and then use gnus-add-configuration to add your new creation to the buffer configuration list.

(gnus-configure-frame
 '(horizontal 1.0
    (vertical 10
      (group 1.0)
      (article 0.3 point))
    (vertical 1.0
      (article 1.0)
      (horizontal 4
        (group 1.0)
        (article 10)))))

You might want to have several frames as well. No prob--just use the frame split:

(gnus-configure-frame
 '(frame 1.0
         (vertical 1.0
                   (summary 0.25 point frame-focus)
                   (article 1.0))
         (vertical ((height . 5) (width . 15)
                    (user-position . t)
                    (left . -1) (top . 1))
                   (picon 1.0))))

This split will result in the familiar summary/article window configuration in the first (or "main") frame, while a small additional frame will be created where picons will be shown. As you can see, instead of the normal 1.0 top-level spec, each additional split should have a frame parameter alist as the size spec. See section `Frame Parameters' in The GNU Emacs Lisp Reference Manual. Under XEmacs, a frame property list will be accepted, too--for instance, (height 5 width 15 left -1 top 1) is such a plist. The list of all possible keys for gnus-buffer-configuration can be found in its default value.

Note that the message key is used for both gnus-group-mail and gnus-summary-mail-other-window. If it is desirable to distinguish between the two, something like this might be used:

(message (horizontal 1.0
                     (vertical 1.0 (message 1.0 point))
                     (vertical 0.24
                               (if (buffer-live-p gnus-summary-buffer)
                                   '(summary 0.5))
                               (group 1.0))))

One common desire for a multiple frame split is to have a separate frame for composing mail and news while leaving the original frame intact. To accomplish that, something like the following can be done:

(message
  (frame 1.0
         (if (not (buffer-live-p gnus-summary-buffer))
             (car (cdr (assoc 'group gnus-buffer-configuration)))
           (car (cdr (assoc 'summary gnus-buffer-configuration))))
         (vertical ((user-position . t) (top . 1) (left . 1)
                    (name . "Message"))
                   (message 1.0 point))))

Since the gnus-buffer-configuration variable is so long and complicated, there's a function you can use to ease changing the config of a single setting: gnus-add-configuration. If, for instance, you want to change the article setting, you could say:

(gnus-add-configuration
 '(article (vertical 1.0
               (group 4)
               (summary .25 point)
               (article 1.0))))

You'd typically stick these gnus-add-configuration calls in your `~/.gnus.el' file or in some startup hook--they should be run after Gnus has been loaded.

If all windows mentioned in the configuration are already visible, Gnus won't change the window configuration. If you always want to force the "right" window configuration, you can set gnus-always-force-window-configuration to non-nil.

If you're using tree displays (see section 3.24 Tree Display), and the tree window is displayed vertically next to another window, you may also want to fiddle with gnus-tree-minimize-window to avoid having the windows resized.

8.5.1 Example Window Configurations

Narrow left hand side occupied by group buffer. Right hand side split between summary buffer (top one-sixth) and article buffer (bottom).

+---+---------+
| G | Summary |
| r +---------+
| o |         |
| u | Article |
| p |         |
+---+---------+

(gnus-add-configuration
 '(article
   (horizontal 1.0
               (vertical 25 (group 1.0))
               (vertical 1.0
                         (summary 0.16 point)
                         (article 1.0)))))

(gnus-add-configuration
 '(summary
   (horizontal 1.0
               (vertical 25 (group 1.0))
               (vertical 1.0 (summary 1.0 point)))))

8.6 Faces and Fonts

Fiddling with fonts and faces used to be very difficult, but these days it is very simple. You simply say M-x customize-face, pick out the face you want to alter, and alter it via the standard Customize interface.
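
If you'd rather do it in your `~/.gnus.el' file, the standard Emacs function set-face-attribute works too. The sketch below assumes the gnus-summary-selected face (used for the selected article in the summary buffer) and waits until the summary code is loaded, so that the face exists.

```elisp
;; Make the selected summary line bold and dark green.
(with-eval-after-load 'gnus-sum
  (set-face-attribute 'gnus-summary-selected nil
                      :weight 'bold :foreground "DarkGreen"))
```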

8.7 Compilation

Remember all those line format specification variables? gnus-summary-line-format, gnus-group-line-format, and so on. Now, Gnus will of course heed whatever these variables are, but, unfortunately, changing them will mean a quite significant slow-down. (The default values of these variables have byte-compiled functions associated with them, while the user-generated versions do not, of course.)

To help with this, you can run M-x gnus-compile after you've fiddled around with the variables and feel that you're (kind of) satisfied. This will result in the new specs being byte-compiled, and you'll get top speed again. Gnus will save these compiled specs in the `.newsrc.eld' file. (User-defined functions aren't compiled by this function, though--you should compile them yourself by sticking them into the `~/.gnus.el' file and byte-compiling that file.)
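
In practice the two steps might look like this, evaluated once Gnus is running and your formats look right. This is a sketch that assumes your own functions live in `~/.gnus.el'.

```elisp
;; Byte-compile the current format specs (saved in .newsrc.eld).
(gnus-compile)
;; gnus-compile doesn't touch your own functions; compile the init file.
(byte-compile-file (expand-file-name "~/.gnus.el"))
```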

8.8 Mode Lines

gnus-updated-mode-lines says what buffers should keep their mode lines updated. It is a list of symbols. Supported symbols include group, article, summary, server, browse, and tree. If the corresponding symbol is present, Gnus will keep that mode line updated with information that may be pertinent. If this variable is nil, screen refresh may be quicker.
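
For instance, to keep only the group, summary and article mode lines up to date (a sketch; drop any symbol whose mode line you don't care about):

```elisp
;; Only these three buffers get their mode lines refreshed.
(setq gnus-updated-mode-lines '(group summary article))
```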

By default, Gnus displays information on the current article in the mode lines of the summary and article buffers. The information Gnus wishes to display (e.g. the subject of the article) is often longer than the mode lines, and therefore has to be cut off at some point. The gnus-mode-non-string-length variable says how long the other elements on the line are (i.e., the non-info part). If you put additional elements on the mode line (e.g. a clock), you should modify this variable:

(add-hook 'display-time-hook
          (lambda () (setq gnus-mode-non-string-length
                           (+ 21
                              (if line-number-mode 5 0)
                              (if column-number-mode 4 0)
                              (length display-time-string)))))

If this variable is nil (which is the default), the mode line strings won't be chopped off, and they won't be padded either. Note that the default is unlikely to be desirable, as even the percentage complete in the buffer may be crowded off the mode line; the user should configure this variable appropriately for her configuration.

8.9 Highlighting and Menus

The gnus-visual variable controls most of the Gnus-prettifying aspects. If nil, Gnus won't attempt to create menus or use fancy colors or fonts. This will also inhibit loading the `gnus-vis.el' file.

This variable can be a list of visual properties that are enabled. The following elements are valid, and are all included by default:

group-highlight: Do highlights in the group buffer.
summary-highlight: Do highlights in the summary buffer.
article-highlight: Do highlights in the article buffer.
highlight: Turn on highlighting in all buffers.
group-menu: Create menus in the group buffer.
summary-menu: Create menus in the summary buffers.
article-menu: Create menus in the article buffer.
browse-menu: Create menus in the browse buffer.
server-menu: Create menus in the server buffer.
score-menu: Create menus in the score buffers.
menu: Create menus in all buffers.

So if you only want highlighting in the article buffer and menus in all buffers, you could say something like:

(setq gnus-visual '(article-highlight menu))

If you want highlighting only and no menus whatsoever, you'd say:

(setq gnus-visual '(highlight))

If gnus-visual is t, highlighting and menus will be used in all Gnus buffers.

Other general variables that influence the look of all buffers include:

gnus-mouse-face: This is the face (i.e., font) used for mouse highlighting in Gnus. No mouse highlights will be done if gnus-visual is nil.

There are hooks associated with the creation of all the different menus:

gnus-article-menu-hook: Hook called after creating the article mode menu.
gnus-group-menu-hook: Hook called after creating the group mode menu.
gnus-summary-menu-hook: Hook called after creating the summary mode menu.
gnus-server-menu-hook: Hook called after creating the server mode menu.
gnus-browse-menu-hook: Hook called after creating the browse mode menu.
gnus-score-menu-hook: Hook called after creating the score mode menu.

8.10 Buttons

Those new-fangled mouse contraptions is very popular with the young, hep kids who don't want to learn the proper way to do things these days. Why, I remember way back in the summer of '89, when I was using Emacs on a Tops 20 system. Three hundred users on one single machine, and every user was running Simula compilers. Bah!

Right.

Well, you can make Gnus display bufferfuls of buttons you can click to do anything by setting gnus-carpal to t. Pretty simple, really. Tell the chiropractor I sent you.

gnus-carpal-mode-hook: Hook run in all carpal mode buffers.
gnus-carpal-button-face: Face used on buttons.
gnus-carpal-header-face: Face used on carpal buffer headers.
gnus-carpal-group-buffer-buttons: Buttons in the group buffer.
gnus-carpal-summary-buffer-buttons: Buttons in the summary buffer.
gnus-carpal-server-buffer-buttons: Buttons in the server buffer.
gnus-carpal-browse-buffer-buttons: Buttons in the browse buffer.

All the button variables are lists. The elements in these lists are either cons cells where the car contains a text to be displayed and the cdr contains a function symbol, or a simple string.
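
A minimal `~/.gnus.el' sketch that turns the buttons on and replaces the group buffer buttons with a shorter list. The labels and the leading string are arbitrary choices; the commands are ordinary group buffer commands.

```elisp
(setq gnus-carpal t)
;; Each element is ("label" . command), or a plain string.
(setq gnus-carpal-group-buffer-buttons
      '("Groups:"
        ("next" . gnus-group-next-unread-group)
        ("prev" . gnus-group-prev-unread-group)
        ("read" . gnus-group-read-group)))
```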

8.11 Daemons

Gnus, being larger than any program ever written (allegedly), does lots of strange stuff that you may wish to have done while you're not present. For instance, you may want it to check for new mail once in a while. Or you may want it to close down all connections to all servers when you leave Emacs idle. And stuff like that.

Gnus will let you do stuff like that by defining various handlers. Each handler consists of three elements: A function, a time, and an idle parameter.

Here's an example of a handler that closes connections when Emacs has been idle for thirty minutes:

(gnus-demon-close-connections nil 30)

Here's a handler that scans for PGP headers every hour when Emacs is idle:

(gnus-demon-scan-pgp 60 t)

This time parameter and that idle parameter work together in a strange, but wonderful fashion. Basically, if idle is nil, then the function will be called every time minutes.

If idle is t, then the function will be called after time minutes only if Emacs is idle. So if Emacs is never idle, the function will never be called. But once Emacs goes idle, the function will be called every time minutes.

If idle is a number and time is a number, the function will be called every time minutes only when Emacs has been idle for idle minutes.

If idle is a number and time is nil, the function will be called once every time Emacs has been idle for idle minutes.

And if time is a string, it should look like `07:31', and the function will then be called once every day somewhere near that time. Modified by the idle parameter, of course.

(When I say "minute" here, I really mean gnus-demon-timestep seconds. This is 60 by default. If you change that variable, all the timings in the handlers will be affected.)
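The time/idle rules above are easy to mix up. The helper below is hypothetical (it is not part of Gnus and schedules nothing); it just restates those rules in code, converting Gnus "minutes" into seconds with a timestep that defaults to 60, the default of gnus-demon-timestep.

```elisp
;; Hypothetical helper, not part of Gnus: describe in words when a
;; daemon handler (FUNCTION TIME IDLE) would run, following the rules
;; in this section.  TIMESTEP stands in for `gnus-demon-timestep'.
(defun my-demon-describe (time idle &optional timestep)
  "Return a string describing when a handler with TIME and IDLE runs."
  (let ((step (or timestep 60)))        ; seconds per Gnus "minute"
    (cond
     ;; A string such as "07:31": once a day near that time
     ;; (the modifying effect of IDLE is ignored in this sketch).
     ((stringp time)
      (format "daily near %s" time))
     ;; IDLE nil: every TIME units, idle or not.
     ((and (numberp time) (null idle))
      (format "every %d s" (* time step)))
     ;; IDLE t: every TIME units, but only while Emacs is idle.
     ((and (numberp time) (eq idle t))
      (format "every %d s while idle" (* time step)))
     ;; Both numbers: every TIME units once idle for IDLE units.
     ((and (numberp time) (numberp idle))
      (format "every %d s after %d s idle" (* time step) (* idle step)))
     ;; TIME nil: once each time Emacs has been idle for IDLE units.
     ((and (null time) (numberp idle))
      (format "once per %d s idle period" (* idle step)))
     (t "not a valid combination"))))
```

For instance, (my-demon-describe nil 30) describes the connection-closing handler (gnus-demon-close-connections nil 30) shown earlier.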

So, if you want to add a handler, you could put something like this in your `~/.gnus.el' file:

(gnus-demon-add-handler 'gnus-demon-close-connections 30 t)

Some ready-made functions to do this have been created: gnus-demon-add-nocem, gnus-demon-add-disconnection, gnus-demon-add-nntp-close-connection, gnus-demon-add-scan-timestamps, gnus-demon-add-rescan, and gnus-demon-add-scanmail. Just put those functions in your `~/.gnus.el' if you want those abilities.

If you add handlers to gnus-demon-handlers directly, you should run gnus-demon-init to make the changes take hold. To cancel all daemons, you can use the gnus-demon-cancel function.
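Both ways of adding handlers might look like this in `~/.gnus.el'. The require line makes sure gnus-demon-handlers is defined before it is modified; the important part is the gnus-demon-init call after editing the list directly.

```elisp
(require 'gnus-demon)

;; Ready-made: register one of the canned handlers.
(gnus-demon-add-rescan)

;; By hand: each entry is (FUNCTION TIME IDLE).  This one closes
;; server connections after 30 "minutes" of idleness.
(add-to-list 'gnus-demon-handlers '(gnus-demon-close-connections nil 30))
(gnus-demon-init)   ; needed after editing the list directly

;; Later, to stop all daemons:
;; (gnus-demon-cancel)
```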

Note that adding daemons can be pretty naughty if you overdo it. Adding functions that scan all news and mail from all servers every two seconds is a sure-fire way of getting booted off any respectable system. So behave.

8.12 NoCeM

Spamming is posting the same article lots and lots of times. Spamming is bad. Spamming is evil.

Spamming is usually canceled within a day or so by various anti-spamming agencies. These agencies usually also send out NoCeM messages. NoCeM is pronounced "no see-'em", and means what the name implies--these are messages that make the offending articles, like, go away.

What use are these NoCeM messages if the articles are canceled anyway? Some sites do not honor cancel messages, and some sites only honor cancels from a select few people. Then you may wish to make use of the NoCeM messages, which are distributed in the `alt.nocem.misc' newsgroup.

Gnus can read and parse the messages in this group automatically, and this will make spam disappear.

There are some variables to customize, of course:

gnus-use-nocem

Set this variable to t to set the ball rolling. It is nil by default.

You can also set this variable to a positive number, which is treated as a group level. In that case, when you give one of the commands that check for new news (gnus, gnus-group-get-new-news, etc.) a group level as the prefix argument, Gnus scans NoCeM messages only if this value does not exceed that level. For example, if your mail groups are on levels 1 or 2 and your news groups remain at the default levels, 3 is the best choice.

gnus-nocem-groups

Gnus will look for NoCeM messages in the groups in this list. The default is

("news.lists.filters" "news.admin.net-abuse.bulletins"
 "alt.nocem.misc" "news.admin.net-abuse.announce")

gnus-nocem-issuers

There are many people issuing NoCeM messages. This list says which people you want to listen to. The default is

("Automoose-1" "clewis@ferret.ocunix.on.ca"
 "cosmo.roadkill" "SpamHippo" "hweede@snafu.de")

fine, upstanding citizens all of them.

Known despammers that you can put in this list are listed at
http://www.xs4all.nl/~rosalind/nocemreg/nocemreg.html.

You do not have to heed NoCeM messages from all these people--just the ones you want to listen to. You also don't have to accept all NoCeM messages from the people you like. Each NoCeM message has a type header that gives the message a (more or less, usually less) rigorous definition. Common types are `spam', `spew', `mmf', `binary', and `troll'. To specify this, you have to use (issuer conditions ...) elements in the list. Each condition is either a string (which is a regexp that matches types you want to use) or a list of the form (not string), where string is a regexp that matches types you don't want to use.

For instance, if you want all NoCeM messages from Chris Lewis except his `troll' messages, you'd say:

("clewis@ferret.ocunix.on.ca" ".*" (not "troll"))

On the other hand, if you want nothing but his `spam' and `spew' messages, you'd say:

("clewis@ferret.ocunix.on.ca" (not ".*") "spew" "spam")

The specs are applied left-to-right.
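To make the left-to-right rule concrete, here is a hypothetical helper (not the code Gnus uses) that decides whether a message type is wanted, given the conditions that follow an issuer. It assumes a type starts out accepted and that each matching condition overrides the verdict of earlier ones; both examples above behave as described under that assumption.

```elisp
;; Hypothetical sketch of left-to-right NoCeM condition evaluation.
(defun my-nocem-type-wanted-p (type conditions)
  "Return non-nil if NoCeM message TYPE passes CONDITIONS.
Regexp strings accept matching types; (not REGEXP) forms reject
them.  Later matching conditions override earlier ones."
  (let ((wanted t))                     ; assumption: accepted by default
    (dolist (condition conditions wanted)
      (cond
       ((stringp condition)
        (when (string-match-p condition type)
          (setq wanted t)))
       ((and (eq (car-safe condition) 'not)
             (stringp (cadr condition)))
        (when (string-match-p (cadr condition) type)
          (setq wanted nil)))
       (t (error "Invalid NoCeM condition: %S" condition))))))
```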

gnus-nocem-verifyer

This should be a function for verifying that the NoCeM issuer is who she says she is. The default is pgg-verify, which returns non-nil if the verification is successful and nil otherwise (including the case where the NoCeM message was not signed). If this is too slow and you don't care for verification (which may be dangerous), you can set this variable to nil.

Formerly the default was mc-verify, which is a Mailcrypt function. You can still use it, but you can switch to pgg-verify running with GnuPG if you are willing to add the PGP public keys to GnuPG's keyring.

gnus-nocem-directory

This is where Gnus will store its NoCeM cache files. The default is
`~/News/NoCeM/'.

gnus-nocem-expiry-wait

The number of days before removing old NoCeM entries from the cache. The default is 15. If you make it shorter Gnus will be faster, but you might then see old spam.

gnus-nocem-check-from

Non-nil means check for valid issuers in message bodies. Otherwise don't bother fetching articles unless their author matches a valid issuer; that is much faster if you are selective about the issuers.

gnus-nocem-check-article-limit

If non-nil, the maximum number of articles to check in any NoCeM group. NoCeM groups can be huge and very slow to process.
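Putting some of these variables together, a `~/.gnus.el' setup might look like the following. The level, the chosen issuers and the article limit are example values, not recommendations.

```elisp
(setq gnus-use-nocem 3                  ; scan unless a level below 3 is given
      gnus-nocem-issuers
      '(("clewis@ferret.ocunix.on.ca" ".*" (not "troll"))
        "SpamHippo")
      gnus-nocem-expiry-wait 15         ; days to keep cache entries
      gnus-nocem-check-article-limit 500)
```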

Using NoCeM could potentially be a memory hog. If you have many living groups (i.e., subscribed or unsubscribed groups), your Emacs process will grow big. If this is a problem, you should kill off all (or most) of your unsubscribed groups (see section 2.4 Subscription Commands).

8.13 Undo

It is very useful to be able to undo actions one has done. In normal Emacs buffers, it's easy enough--you just push the undo button. In Gnus buffers, however, it isn't that simple.

The things Gnus displays in its buffer are of no value whatsoever to Gnus--it's all just data designed to look nice to the user. Killing a group in the group buffer with C-k makes the line disappear, but that's just a side-effect of the real action--the removal of the group in question from the internal Gnus structures. Undoing something like that can't be done by the normal Emacs undo function.

Gnus tries to remedy this somewhat by keeping track of what the user does and coming up with actions that would reverse the actions the user takes. When the user then presses the undo key, Gnus will run the code to reverse the previous action, or the previous actions. However, not all actions are easily reversible, so Gnus currently makes only a few key functions undoable. These include killing groups, yanking groups, and changing the list of read articles of groups. That's it, really. More functions may be added in the future, but each added function means an increase in data to be stored, so Gnus will never be totally undoable.

The undoability is provided by the gnus-undo-mode minor mode. It is used if gnus-use-undo is non-nil, which is the default. The C-M-_ key performs the gnus-undo command, which should feel kinda like the normal Emacs undo command.
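The idea of recording reversing actions can be illustrated with a toy model. Everything below is hypothetical and far simpler than gnus-undo-mode: a list stands in for Gnus's internal group data, and each undoable command pushes a (FUNCTION . ARGS) entry that would reverse it.

```elisp
;; Toy model of action-level undo; all names are hypothetical.
(defvar my-groups (list "gnu.emacs.help" "comp.lang.lisp")
  "Stand-in for Gnus's internal list of groups.")
(defvar my-undo-stack nil
  "Stack of (FUNCTION . ARGS) entries that reverse earlier actions.")

(defun my-insert-group (group pos)
  "Put GROUP back into `my-groups' at index POS."
  (setq my-groups (append (butlast my-groups (- (length my-groups) pos))
                          (list group)
                          (nthcdr pos my-groups))))

(defun my-kill-group (group)
  "Remove GROUP, recording how to reinsert it where it was."
  (let ((pos (- (length my-groups) (length (member group my-groups)))))
    (setq my-groups (remove group my-groups))
    (push (list #'my-insert-group group pos) my-undo-stack)))

(defun my-undo ()
  "Run the most recently recorded reversing action, if any."
  (let ((entry (pop my-undo-stack)))
    (when entry
      (apply (car entry) (cdr entry)))))
```

The displayed lines play no part here: undoing restores the data, and a real buffer would simply be redrawn from it.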

8.14 Predicate Specifiers

Some Gnus variables are predicate specifiers. This is a special form that allows flexible specification of predicates without having to type all that much.

These specifiers are lists consisting of functions, symbols and lists.

Here's an example:

(or gnus-article-unseen-p
    gnus-article-unread-p)

The available symbols are or, and, and not. The functions all take one parameter.

Internally, Gnus calls gnus-make-predicate on these specifiers to create a function that can be called. The input parameter to this function will be passed along to all the functions in the predicate specifier.
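As a rough sketch of the semantics only, the hypothetical function below interprets a specifier directly against one argument instead of building a function the way gnus-make-predicate does. It assumes not takes exactly one sub-specifier.

```elisp
;; Hypothetical interpreter for predicate specifiers; not Gnus code.
(defun my-predicate-eval (spec arg)
  "Evaluate predicate specifier SPEC on ARG."
  (let ((op (car-safe spec)))
    (cond
     ;; (or A B ...): true as soon as one sub-specifier is true.
     ((eq op 'or)
      (let ((subs (cdr spec)) (result nil))
        (while (and subs (not result))
          (setq result (my-predicate-eval (pop subs) arg)))
        result))
     ;; (and A B ...): false as soon as one sub-specifier is false.
     ((eq op 'and)
      (let ((subs (cdr spec)) (result t))
        (while (and subs result)
          (setq result (my-predicate-eval (pop subs) arg)))
        result))
     ;; (not A): negate a single sub-specifier.
     ((eq op 'not)
      (not (my-predicate-eval (cadr spec) arg)))
     ;; Anything else must be a one-argument function.
     ((functionp spec)
      (funcall spec arg))
     (t (error "Invalid predicate specifier: %S" spec)))))
```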

8.15 Moderation

If you are a moderator, you can use the `gnus-mdrtn.el' package. It is not included in the standard Gnus package. Write a mail to `larsi@gnus.org' and state what group you moderate, and you'll get a copy.

The moderation package is implemented as a minor mode for summary buffers. Put

(add-hook 'gnus-summary-mode-hook 'gnus-moderate)

in your `~/.gnus.el' file.

If you are the moderator of `rec.zoofle', this is how it's supposed to work:

You split your incoming mail by matching on `Newsgroups:.*rec.zoofle', which will put all the to-be-posted articles in some mail group--for instance, `nnml:rec.zoofle'.
You enter that group once in a while and post articles using the e (edit-and-post) or s (just send unedited) commands.
If, while reading the `rec.zoofle' newsgroup, you happen upon some articles that weren't approved by you, you can cancel them with the c command.

To use moderation mode in these two groups, say:

(setq gnus-moderated-list
      "^nnml:rec\\.zoofle$\\|^rec\\.zoofle$")

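For the splitting step in the workflow above, a plain nnmail-split-methods setup might look like this sketch. It assumes your mail is split into nnml groups (so the first rule fills `nnml:rec.zoofle') and uses a catch-all `misc' group as a stand-in for the rules you already have.

```elisp
(setq nnmail-split-methods
      '(;; Submissions to the moderated group.
        ("rec.zoofle" "^Newsgroups:.*rec\\.zoofle")
        ;; ... your other split rules go here ...
        ;; Catch-all: an empty regexp matches every message.
        ("misc" "")))
```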
8.16 Fetching a Group

It is sometimes convenient to be able to just say "I want to read this group and I don't care whether Gnus has been started or not". This is perhaps more useful for people who write code than for users, but the command gnus-fetch-group provides this functionality in any case. It takes the group name as a parameter.
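For instance, a small command could wrap it so that one key takes you straight to a favorite group. The command name and the group are only examples.

```elisp
;; Hypothetical command: read one group, starting Gnus if needed.
(defun my-read-emacs-help ()
  "Jump straight to gnu.emacs.help via `gnus-fetch-group'."
  (interactive)
  (gnus-fetch-group "gnu.emacs.help"))
```

You could then bind it with something like (global-set-key (kbd "C-c n") 'my-read-emacs-help); C-c n is just an example key.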

8.17 Image Enhancements

XEmacs, as well as Emacs 21(2), is able to display pictures and stuff, so Gnus has taken advantage of that.

8.17.1 X-Face		Display a funky, teensy black-and-white image.
8.17.2 Face		Display a funkier, teensier colored image.
8.17.3 Smileys		Show all those happy faces the way they were meant to be shown.
8.17.4 Picons		How to display pictures of what you're reading.
8.17.5 Various XEmacs Variables		Other XEmacsy Gnusey variables.

8.17.1 X-Face

X-Face headers describe a 48x48 pixel black-and-white (1 bit depth) image that's supposed to represent the author of the message. It seems to be supported by an ever-growing number of mail and news readers.

Viewing an X-Face header requires either an Emacs that has `compface' support (which most XEmacs versions have) or suitable conversion or display programs. If your Emacs has image support, the default action is to display the face before the From header. If there's no native X-Face support, Gnus will try to convert the X-Face header using external programs from the pbmplus package and friends. For XEmacs it's faster if XEmacs has been compiled with X-Face support. The default action under Emacs without image support is to fork off the display program.

On a GNU/Linux system, the display program is from the ImageMagick package. For external conversion programs look for packages with names like netpbm, libgr-progs and compface.

The variable that controls this is gnus-article-x-face-command. If this variable is a string, the string will be executed in a sub-shell. If it is a function, the function will be called with the face as the argument. If gnus-article-x-face-too-ugly (a regexp) matches the From header, the face will not be shown.

(Note: x-face is used in the variable/function names, not xface).
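For example, to suppress X-Faces from particular senders, set gnus-article-x-face-too-ugly to a regexp matching their From headers. The addresses below are placeholders.

```elisp
;; Placeholder addresses: faces from these senders are not shown.
(setq gnus-article-x-face-too-ugly
      "joe@example\\.com\\|noise@example\\.org")
```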

Face and variable:

gnus-x-face: Face to show X-Face. The colors from this face are used as the foreground and background colors of the displayed X-Faces. The default colors are black and white.

If you use posting styles, you can use an x-face-file entry in gnus-posting-styles; see section 5.6 Posting Styles. If you don't, Gnus provides a few convenience functions and variables to allow easier insertion of X-Face headers in outgoing messages. For these functions you also need the ImageMagick, netpbm or other image conversion packages mentioned above (depending on the values of the variables below).

gnus-random-x-face goes through all the `pbm' files in gnus-x-face-directory and picks one at random, and then converts it to the X-Face format by using the gnus-convert-pbm-to-x-face-command shell command. The `pbm' files should be 48x48 pixels big. It returns the X-Face header data as a string.

gnus-insert-random-x-face-header calls gnus-random-x-face and inserts an `X-Face' header with the randomly generated data.
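If you would rather see the header while composing, instead of relying on message-required-news-headers, one option is to run this function from message-setup-hook, which affects mail as well as news. The directory name is an example, and the sketch assumes it holds 48x48 `pbm' files.

```elisp
;; Example directory of 48x48 .pbm files; adjust to taste.
(setq gnus-x-face-directory "~/.xfaces/")
;; Insert a random X-Face header into every new message buffer.
(add-hook 'message-setup-hook 'gnus-insert-random-x-face-header)
```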

gnus-x-face-from-file takes a GIF file as the parameter, and then converts the file to X-Face format by using the gnus-convert-image-to-x-face-command shell command.

Here's how you would typically use the first function. Put something like the following in your `~/.gnus.el' file:

(setq message-required-news-headers
      (nconc message-required-news-headers
             (list '(X-Face . gnus-random-x-face))))

Using the last function would be something like this:

(setq message-required-news-headers
      (nconc message-required-news-headers
             (list '(X-Face . (lambda ()
                                (gnus-x-face-from-file
                                 "~/My-face.gif"))))))

8.17.2 Face

Face headers are essentially a funkier version of X-Face ones. They describe a 48x48 pixel colored image that's supposed to represent the author of the message.

The contents of a Face header must be a base64 encoded PNG image. See http://quimby.gnus.org/circus/face/ for the precise specifications.

Viewing a Face header requires an Emacs that is able to display PNG images.

Gnus provides a few convenience functions and variables to allow easier insertion of Face headers in outgoing messages.

gnus-convert-png-to-face takes a 48x48 PNG image, no larger than 726 bytes, and converts it to a face.

gnus-face-from-file takes a JPEG file as the parameter, and then converts the file to Face format by using the gnus-convert-image-to-face-command shell command.

Here's how you would typically use this function. Put something like the following in your `~/.gnus.el' file:

(setq message-required-news-headers
      (nconc message-required-news-headers
             (list '(Face . (lambda ()
                              (gnus-face-from-file "~/face.jpg"))))))
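Since gnus-convert-png-to-face expects a 48x48 PNG of at most 726 bytes, a preflight check can catch a wrong image before you send anything. The helpers below are hypothetical; they rely on the PNG format storing width and height as big-endian 32-bit integers at byte offsets 16 and 20 (inside the IHDR chunk).

```elisp
;; Hypothetical helpers, not part of Gnus.
(defun my-png-uint32 (data offset)
  "Read a big-endian 32-bit integer from unibyte string DATA at OFFSET."
  (logior (ash (aref data offset) 24)
          (ash (aref data (+ offset 1)) 16)
          (ash (aref data (+ offset 2)) 8)
          (aref data (+ offset 3))))

(defun my-face-png-ok-p (file)
  "Return non-nil if FILE is a 48x48 PNG of at most 726 bytes."
  (with-temp-buffer
    (set-buffer-multibyte nil)          ; work on raw bytes
    (insert-file-contents-literally file)
    (let ((data (buffer-string)))
      (and (<= 24 (length data) 726)
           ;; The eight-byte PNG signature.
           (string= (substring data 0 8) "\211PNG\r\n\032\n")
           ;; Width and height from the IHDR chunk.
           (= (my-png-uint32 data 16) 48)
           (= (my-png-uint32 data 20) 48)))))
```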

8.17.3 Smileys

Smiley is a package separate from Gnus, but since Gnus is currently the only package that uses Smiley, it is documented here.

In short--to use Smiley in Gnus, put the following in your `~/.gnus.el' file:

(setq gnus-treat-display-smileys t)

Smiley maps text smiley faces (`:-)', `8-)', `:-(' and the like) to pictures and displays those instead of the text smiley faces. The conversion is controlled by a list of regexps that match text and map it to file names.

The alist used is specified by the smiley-regexp-alist variable. The first item in each element is the regexp to be matched; the second element is the regexp match group that is to be replaced by the picture; and the third element is the name of the file to be displayed.

The following variables customize where Smiley will look for these files:

smiley-data-directory: Where Smiley will look for smiley faces files.
gnus-smiley-file-types: List of suffixes on smiley file names to try.
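As an illustration of the element format, the following adds a rule for `8-)'. The file name "cool" is hypothetical: the rule only works if a matching picture, with one of the suffixes in gnus-smiley-file-types, exists in smiley-data-directory. customize-set-variable is used so that any :set function attached to the variable runs.

```elisp
(require 'smiley)
;; Elements are (REGEXP MATCH-GROUP FILE).  Match group 1, the "8-)"
;; itself, is replaced by the picture named "cool" (a hypothetical
;; file name).  The trailing \W demands a non-word character after it.
(customize-set-variable
 'smiley-regexp-alist
 (cons '("\\(8-)\\)\\W" 1 "cool") smiley-regexp-alist))
```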

8.17.4 Picons

So... You want to slow down your news reader even more! This is a good way to do so. It's also a great way to impress people staring over your shoulder as you read news.

What are Picons? To quote directly from the Picons Web site:

Picons is short for "personal icons". They're small, constrained images used to represent users and domains on the net, organized into databases so that the appropriate image for a given e-mail address can be found. Besides users and domains, there are picon databases for Usenet newsgroups and weather forecasts. The picons are in either monochrome XBM format or color XPM and GIF formats.

For instructions on obtaining and installing the picons databases, point your Web browser at http://www.cs.indiana.edu/picons/ftp/index.html.

If you are using Debian GNU/Linux, saying `apt-get install picons.*' will install the picons where Gnus can find them.

To enable displaying picons, simply make sure that gnus-picon-databases points to the directory containing the Picons databases.

The following variables offer control over where things are located.

gnus-picon-databases: The location of the picons database. This is a list of directories containing the `news', `domains', `users' (and so on) subdirectories. Defaults to ("/usr/lib/picon" "/usr/local/faces").
gnus-picon-news-directories: List of subdirectories to search in gnus-picon-databases for newsgroups faces. ("news") is the default.
gnus-picon-user-directories: List of subdirectories to search in gnus-picon-databases for user faces. ("users" "usenix" "local" "misc") is the default.
gnus-picon-domain-directories: List of subdirectories to search in gnus-picon-databases for domain name faces. Defaults to ("domains"). Some people may want to add `"unknown"' to this list.
gnus-picon-file-types: Ordered list of suffixes on picon file names to try. Defaults to ("xpm" "gif" "xbm") minus those not built into your Emacs.
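For example, if your picons live under /usr/share/picons (an assumed path; use whatever your installation or package actually provides), you might say:

```elisp
(require 'gnus-picon)
;; Assumed location first, then one of the stock defaults.
(setq gnus-picon-databases '("/usr/share/picons" "/usr/local/faces"))
;; Also search the "unknown" domain directory, as suggested above.
(add-to-list 'gnus-picon-domain-directories "unknown" t)
```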

8.17.5 Various XEmacs Variables

gnus-xmas-glyph-directory: This is where Gnus will look for pictures. Gnus will normally auto-detect this directory, but you may set it manually if you have an unusual directory structure.
gnus-xmas-modeline-glyph: A glyph displayed in all Gnus mode lines. It is a tiny gnu head by default.

8.17.5.1 Toolbar

gnus-use-toolbar: This variable specifies where to display the toolbar. If nil, don't display toolbars. If it is non-nil, it should be one of the symbols default, top, bottom, right, and left. default means to use the default toolbar; the others mean to display the toolbar in the position the symbol names. The default is default.
gnus-toolbar-thickness: Cons of the height and the width specifying the thickness of a toolbar. The height is used for the toolbar displayed on the top or the bottom, the width is used for the toolbar displayed on the right or the left. The default is that of the default toolbar.
gnus-group-toolbar: The toolbar in the group buffer.
gnus-summary-toolbar: The toolbar in the summary buffer.
gnus-summary-mail-toolbar: The toolbar in the summary buffer of mail groups.
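For example, under XEmacs you could move the Gnus toolbars to the left, or turn them off:

```elisp
;; XEmacs only: put the Gnus toolbars on the left...
(setq gnus-use-toolbar 'left)
;; ...or switch them off entirely.
;; (setq gnus-use-toolbar nil)
```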

8.18 Fuzzy Matching

Gnus provides fuzzy matching of Subject lines when doing things like scoring, thread gathering and thread comparison.

As opposed to regular expression matching, fuzzy matching is very fuzzy. It's so fuzzy that there's not even a definition of what fuzziness means, and the implementation has changed over time.

Basically, it tries to remove all noise from lines before comparing. `Re: ', parenthetical remarks, white space, and so on, are filtered out of the strings before comparing the results. This often leads to adequate results--even when faced with strings generated by text manglers masquerading as newsreaders.
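The following hypothetical sketch shows the flavor of this noise removal. It is much cruder than what Gnus really does (see gnus-simplify-subject-fuzzy) and only strips leading `Re:' markers, parenthetical remarks and white space.

```elisp
;; Hypothetical helpers, not Gnus's own implementation.
(defun my-fuzzy-simplify (subject)
  "Strip noise from SUBJECT before comparison."
  (let ((s subject))
    ;; Leading "Re:" markers, however many there are.
    (while (string-match "\\`[ \t]*[Rr][Ee]:[ \t]*" s)
      (setq s (substring s (match-end 0))))
    ;; Parenthetical remarks such as "(was: ...)".
    (setq s (replace-regexp-in-string "([^()]*)" "" s))
    ;; All remaining white space.
    (replace-regexp-in-string "[ \t\n]+" "" s)))

(defun my-fuzzy-equal-p (a b)
  "Return non-nil if subjects A and B match after simplification."
  (string= (my-fuzzy-simplify a) (my-fuzzy-simplify b)))
```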

8.19 Thwarting Email Spam

In these last days of the Usenet, commercial vultures are hanging about and grepping through news like crazy to find email addresses they can foist off their scams and products to. As a reaction to this, many people have started putting nonsense addresses into their From lines. I think this is counterproductive--it makes it difficult for people to send you legitimate mail in response to things you write, as well as making it difficult to see who wrote what. This rewriting may perhaps be a bigger menace than the unsolicited commercial email itself in the end.

The biggest problem I have with email spam is that it comes in under false pretenses. I press g and Gnus merrily informs me that I have 10 new emails. I say "Golly gee! Happy is me!" and select the mail group, only to find two pyramid schemes, seven advertisements ("New! Miracle tonic for growing full, lustrous hair on your toes!") and one mail asking me to repent and find some god.

This is annoying. Here's what you can do about it.

8.19.1 The problem of spam		Some background, and some solutions
8.19.2 Anti-Spam Basics		Simple steps to reduce the amount of spam.
8.19.3 SpamAssassin, Vipul's Razor, DCC, etc		How to use external anti-spam tools.
8.19.4 Hashcash		Reduce spam by burning CPU time.
8.19.5 Filtering Spam Using The Spam ELisp Package
8.19.6 Filtering Spam Using Statistics with spam-stat

8.19.1 The problem of spam

First, some background on spam.

If you have access to e-mail, you are familiar with spam (technically termed UCE, Unsolicited Commercial E-mail). Simply put, it exists because e-mail delivery is very cheap compared to paper mail, so only a very small percentage of people need to respond to a UCE to make it worthwhile to the advertiser. Ironically, one of the most common spams is the one offering a database of e-mail addresses for further spamming. Senders of spam are usually called spammers, but terms like vermin, scum, sociopaths, and morons are in common use as well.

Spam comes from a wide variety of sources. It is simply impossible to dispose of all spam without discarding useful messages. A good example is the TMDA system, which requires senders unknown to you to confirm themselves as legitimate senders before their e-mail can reach you. Without getting into the technical side of TMDA, a downside is clearly that e-mail from legitimate sources may be discarded if those sources can't or won't confirm themselves through the TMDA system. Another problem with TMDA is that it requires its users to have a basic understanding of e-mail delivery and processing.

The simplest approach to fighting spam is direct filtering, either at the mail server or when you sort through incoming mail. If you get 200 spam messages per day from `random-address@vmadmin.com', you block `vmadmin.com'. If you get 200 messages about `VIAGRA', you discard all messages with `VIAGRA' in the message. If you get lots of spam from Bulgaria, for example, you try to filter all mail from Bulgarian IPs.

This, unfortunately, is a great way to discard legitimate e-mail. The risks of blocking a whole country (Bulgaria, Norway, Nigeria, China, etc.) or even a continent (Asia, Africa, Europe, etc.) from contacting you should be obvious, so don't do it if you have the choice.

In another instance, the very informative and useful RISKS digest has been blocked by overzealous mail filters because it contained words that were common in spam messages. Nevertheless, in isolated cases, with great care, direct filtering of mail can be useful.

Another approach to filtering e-mail is distributed spam processing; DCC, for instance, implements such a system. In essence, N systems around the world agree that a machine X in Ghana, Estonia, or California is sending out spam e-mail, and these N systems enter X or the spam e-mail from X into a database. The criteria for spam detection vary--it may be the number of messages sent, the content of the messages, and so on. When a user of the distributed processing system wants to find out if a message is spam, he consults one of those N systems.

Distributed spam processing works very well against spammers that send a large number of messages at once, but it requires the user to set up fairly complicated checks. There are commercial and free distributed spam processing systems. Distributed spam processing has its risks as well. For instance legitimate e-mail senders have been accused of sending spam, and their web sites and mailing lists have been shut down for some time because of the incident.

The statistical approach to spam filtering is also popular. It is based on a statistical analysis of previous spam messages. Usually the analysis is a simple word frequency count, with perhaps pairs of words or 3-word combinations thrown into the mix. Statistical analysis of spam works very well in most of the cases, but it can classify legitimate e-mail as spam in some cases. It takes time to run the analysis, the full message must be analyzed, and the user has to store the database of spam analyses. Statistical analysis on the server is gaining popularity. This has the advantage of letting the user Just Read Mail, but has the disadvantage that it's harder to tell the server that it has misclassified mail.
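To make "word frequency count" concrete, here is a deliberately naive, hypothetical sketch: it counts words in known spam and ham, then scores a new message by the fraction of its known words that were seen more often in spam. Real statistical filters, including spam-stat, combine per-word probabilities far more carefully.

```elisp
;; Naive, hypothetical word-frequency spam scoring.
(defun my-word-counts (texts)
  "Return a hash table mapping each lower-cased word in TEXTS to a count."
  (let ((table (make-hash-table :test #'equal)))
    (dolist (text texts table)
      (dolist (word (split-string (downcase text) "[^[:alnum:]]+" t))
        (puthash word (1+ (gethash word table 0)) table)))))

(defun my-spamminess (text spam-table ham-table)
  "Fraction of the known words in TEXT seen more often in spam than ham."
  (let ((known 0) (spammy 0))
    (dolist (word (split-string (downcase text) "[^[:alnum:]]+" t))
      (let ((s (gethash word spam-table 0))
            (h (gethash word ham-table 0)))
        (when (> (+ s h) 0)              ; ignore words never seen
          (setq known (1+ known))
          (when (> s h)
            (setq spammy (1+ spammy))))))
    (if (zerop known) 0.0 (/ (float spammy) known))))
```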

Fighting spam is not easy, no matter what anyone says. There is no magic switch that will distinguish Viagra ads from Mom's e-mails. Even people are having a hard time telling spam apart from non-spam, because spammers are actively looking to fool us into thinking they are Mom, essentially. Spamming is irritating, irresponsible, and idiotic behavior from a bunch of people who think the world owes them a favor. We hope the following sections will help you in fighting the spam plague.

8.19.2 Anti-Spam Basics

One way of dealing with spam is having Gnus split out all spam into a `spam' mail group (see section 6.3.3 Splitting Mail).

First, pick one (1) valid mail address that you can be reached at, and put it in the From header of all your news articles. (I've chosen `larsi@trym.ifi.uio.no', but for many people an address of the form `larsi+usenet@ifi.uio.no' will be a better choice. Ask your sysadmin whether your sendmail installation accepts keywords in the local part of the mail address.)

(setq message-default-news-headers
      "From: Lars Magne Ingebrigtsen <larsi@trym.ifi.uio.no>\n")

Then put the following split rule in nnmail-split-fancy (see section 6.3.6 Fancy Mail Splitting):

(...
 (to "larsi@trym.ifi.uio.no"
     (| ("subject" "re:.*" "misc")
        ("references" ".*@.*" "misc")
        "spam"))
 ...)

This says that all mail to this address is suspect, but if it has a Subject that starts with a `Re:' or has a References header, it's probably ok. All the rest goes to the `spam' group. (This idea probably comes from Tim Pierce.)

In addition, many mail spammers talk directly to your SMTP server and do not include your email address explicitly in the To header. Why they do this is unknown--perhaps it's to thwart this thwarting scheme? In any case, this is trivial to deal with--you just put anything not addressed to you in the `spam' group by ending your fancy split rule in this way:

(
 ...
 (to "larsi" "misc")
 "spam")

In my experience, this will sort virtually everything into the right group. You still have to check the `spam' group from time to time to check for legitimate mail, though. If you feel like being a good net citizen, you can even send off complaints to the proper authorities on each unsolicited commercial email--at your leisure.
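Putting the two fragments together, a complete setup might look like the sketch below. Replace the addresses with your own, and put the rules you already use for expected mail where the comment indicates.

```elisp
(setq nnmail-split-methods 'nnmail-split-fancy
      nnmail-split-fancy
      '(|
        ;; Mail to the address used in news articles is suspect
        ;; unless it looks like a reply.
        (to "larsi@trym.ifi.uio.no"
            (| ("subject" "re:.*" "misc")
               ("references" ".*@.*" "misc")
               "spam"))
        ;; ... your other split rules ...
        ;; Anything else addressed to me is fine; the rest is spam.
        (to "larsi" "misc")
        "spam"))
```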

This works for me. It allows people an easy way to contact me (they can just press r in the usual way), and I'm not bothered at all with spam. It's a win-win situation. Forging From headers to point to non-existent domains is yucky, in my opinion.

Be careful with this approach. Spammers are wise to it.

8.19.3 SpamAssassin, Vipul's Razor, DCC, etc

The days when the hints in the previous section were sufficient to avoid spam are coming to an end. There are many tools out there that claim to reduce the amount of spam you get. This section could quickly become outdated as new products replace old ones, but fortunately most of these tools seem to have similar interfaces. Even though this section uses SpamAssassin as an example, it should be easy to adapt it to most other tools.

Note that this section does not involve the spam.el package, which is discussed in the next section. If you don't care for all the features of spam.el, you can make do with these simple recipes.

If the tool you are using is not installed on the mail server, you need to invoke it yourself. Here are some ideas on how to use the :prescript and :postscript mail source parameters (see section 6.3.4.1 Mail Source Specifiers):

(setq mail-sources
      '((file :prescript "formail -bs spamassassin < /var/mail/%u")
        (pop :user "jrl"
             :server "pophost"
             :postscript
             "mv %t /tmp/foo; formail -bs spamc < /tmp/foo > %t")))

Once you manage to process your incoming spool somehow, thus making the mail contain e.g. a header indicating it is spam, you are ready to filter it out. Using normal split methods (see section 6.3.3 Splitting Mail):

(setq nnmail-split-methods '(("spam"  "^X-Spam-Flag: YES")
                             ...))

Or using fancy split methods (see section 6.3.6 Fancy Mail Splitting):

(setq nnmail-split-methods 'nnmail-split-fancy
      nnmail-split-fancy '(| ("X-Spam-Flag" "YES" "spam")
                             ...))

Some people might not like the idea of piping the mail through various programs using a :prescript (if some program is buggy, you might lose all mail). If you are one of them, another solution is to call the external tools during splitting. Example fancy split method:

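(setq nnmail-split-fancy '(| (: kevin-spamassassin)
                             ...))
(defun kevin-spamassassin ()
  "Return \"spam\" if spamc classifies the current message as spam."
  (save-excursion
    (save-restriction
      (widen)
      ;; spamc -c only checks the message: it exits with status 1
      ;; for spam and 0 otherwise.
      (if (eq 1 (call-process-region (point-min) (point-max)
                                     "spamc" nil nil nil "-c"))
          "spam"))))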

Note that with the nnimap backend, message bodies will not be downloaded by default. You need to set nnimap-split-download-body to t to do that (see section 6.5.1 Splitting in IMAP).

That is about it. As some spam is likely to get through anyway, you might want to have a nifty function to call when you happen to read spam. And here is the nifty function:

(defun my-gnus-raze-spam ()
  "Submit SPAM to Vipul's Razor, then mark it as expirable."
  (interactive)
  ;; Show the raw article so the unmodified message is what gets piped.
  (gnus-summary-show-raw-article)
  (gnus-summary-save-in-pipe "razor-report -f -d")
  (gnus-summary-mark-as-expirable 1))
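To make it handy, you could bind it to a key in the summary buffer. The key C-c r below is only an example; C-c followed by a letter is reserved for users.

```elisp
(with-eval-after-load 'gnus-sum
  ;; Example key; pick any free one.
  (define-key gnus-summary-mode-map (kbd "C-c r") 'my-gnus-raze-spam))
```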

8.19.4 Hashcash

A novel technique to fight spam is to require senders to do something costly for each message they send. This has the obvious drawback that you cannot rely on everyone in the world using this technique, since it is not part of the Internet standards, but it may be useful in smaller communities.

While the tools in the previous section work well in practice, they work only because the tools are constantly maintained and updated as new forms of spam appear. This means that a small percentage of spam will always get through. It also means that somewhere, someone needs to read lots of spam to update these tools. Hashcash avoids that, but instead prefers that everyone you contact through e-mail supports the scheme. You can view the two approaches as pragmatic vs dogmatic. The approaches have their own advantages and disadvantages, but as often in the real world, a combination of them is stronger than either one of them separately.

The "something costly" is to burn CPU time, more specifically to compute a hash collision up to a certain number of bits. The resulting hashcash cookie is inserted into an `X-Hashcash:' header. For more details, and for the external application hashcash you need to install to use this feature, see http://www.cypherspace.org/~adam/hashcash/. Even more information can be found at http://www.camram.org/.

If you wish to call hashcash for each message you send, say something like:

(require 'hashcash)
(add-hook 'message-send-hook 'mail-add-payment)

The `hashcash.el' library can be found in the Gnus development contrib directory or at http://users.actrix.gen.nz/mycroft/hashcash.el.

You will need to set up some additional variables as well:

hashcash-default-payment: This variable indicates the default number of bits the hash collision should consist of. By default this is 0, meaning nothing will be done. Suggested useful values include 17 to 29.
hashcash-payment-alist: Some receivers may require you to burn more CPU time than the default. This variable contains a list of `(addr amount)' cells, where addr is the receiver (email address or newsgroup) and amount is the number of bits in the collision that is needed. It can also contain `(addr string amount)' cells, where the string is the string to use (normally the email address or newsgroup name is used).
hashcash: Where the hashcash binary is installed.
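
Put together, a configuration using these variables might look like the following. This is a sketch only: the addresses and bit counts are hypothetical, the second cell merely illustrates the `(addr string amount)' shape, and it assumes hashcash.el can find the hashcash binary.

```elisp
(require 'hashcash)
(setq hashcash-default-payment 20          ; bits for every recipient
      hashcash-payment-alist
      '(("friend@example.com" 24)          ; (ADDR AMOUNT): hypothetical
        ;; (ADDR STRING AMOUNT): mint the token for STRING instead of ADDR
        ("alias@example.org" "real@example.org" 22)))
;; Add a payment to each outgoing message, as shown earlier.
(add-hook 'message-send-hook 'mail-add-payment)
```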

Currently there is no built-in functionality in Gnus to verify hashcash cookies; it is expected that this is performed by your hand-customized mail filtering scripts. Improvements in this area would be a useful contribution, however.

8.19.5 Filtering Spam Using The Spam ELisp Package

The idea behind `spam.el' is to have a control center for spam detection and filtering in Gnus. To that end, `spam.el' does two things: it filters new mail, and it analyzes mail known to be spam or ham. Ham is the name used throughout `spam.el' to indicate non-spam messages.

First of all, you must run the function spam-initialize to autoload spam.el and to install the spam.el hooks. There is one exception: if you use the spam-use-stat (see section 8.19.5.13 spam-stat spam filtering) setting, you should turn it on before spam-initialize:

(setq spam-use-stat t) ;; if needed
(spam-initialize)

So, what happens when you load `spam.el'?

First, some hooks will get installed by spam-initialize. There are some hooks for spam-stat so it can save its databases, and there are hooks so interesting things will happen when you enter and leave a group. More on the sequence of events later (see section 8.19.5.1 Spam ELisp Package Sequence of Events).

You get the following keyboard commands:

M-d, M s x, S x (gnus-summary-mark-as-spam)

Mark the current article as spam, showing it with the `$' mark. Whenever you see a spam article, make sure to mark its summary line with M-d before leaving the group. This is done automatically for unread articles in spam groups.

M s t, S t (spam-bogofilter-score)

You must have Bogofilter installed for this command to work properly. See section 8.19.5.11 Bogofilter.

Also, when you load `spam.el', you will be able to customize its variables. Try customize-group on the `spam' variable group.

8.19.5.1 Spam ELisp Package Sequence of Events
8.19.5.2 Spam ELisp Package Filtering of Incoming Mail
8.19.5.3 Spam ELisp Package Global Variables
8.19.5.4 Spam ELisp Package Configuration Examples
8.19.5.5 Blacklists and Whitelists
8.19.5.6 BBDB Whitelists
8.19.5.7 Gmane Spam Reporting
8.19.5.8 Anti-spam Hashcash Payments
8.19.5.9 Blackholes
8.19.5.10 Regular Expressions Header Matching
8.19.5.11 Bogofilter
8.19.5.12 ifile spam filtering
8.19.5.13 spam-stat spam filtering
8.19.5.14 Using SpamOracle with Gnus
8.19.5.15 Extending the Spam ELisp package

8.19.5.1 Spam ELisp Package Sequence of Events

You must read this section to understand how spam.el works. Do not skip, speed-read, or glance through this section.

There are two contact points, if you will, between spam.el and the rest of Gnus: checking new mail for spam, and leaving a group.

Getting new mail is done in one of two ways. You can either split your incoming mail or you can classify new articles as ham or spam when you enter the group.

Splitting incoming mail is better suited to mail backends such as nnml or nnimap where new mail appears in a single file called a Spool File. See section 8.19.5.2 Spam ELisp Package Filtering of Incoming Mail.

For backends such as nntp there is no incoming mail spool, so an alternate mechanism must be used. This may also happen for backends where the server is in charge of splitting incoming mail, and Gnus does not do further splitting. For these cases, use the spam-autodetect and spam-autodetect-methods group parameters (accessible with G c and G p as usual), or the corresponding variables gnus-spam-autodetect and gnus-spam-autodetect-methods (accessible with M-x customize-variable as usual).
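
As a sketch, the same group parameters can also be given through gnus-parameters. The group regexp below is hypothetical (adjust it to your own NNTP server), the methods mirror the NNTP example in section 8.19.5.4, and note that this setq replaces any existing value of gnus-parameters.

```elisp
(setq gnus-parameters
      '(("^nntp\\+news\\.example\\.com:"    ; hypothetical server/groups
         ;; classify articles when the group is entered
         (spam-autodetect . t)
         ;; blacklist for spam, BBDB for ham
         (spam-autodetect-methods spam-use-blacklist spam-use-BBDB))))
```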

When spam-autodetect is used, it hooks into the process of entering a group. Thus, entering a group with unseen or unread articles becomes the substitute for checking incoming mail. Whether only unseen articles or all unread articles will be processed is determined by the spam-autodetect-recheck-messages variable. When it is set to t, all unread messages will be rechecked.

spam-autodetect grants the user at once more and less control of spam filtering. The user will have more control over each group's spam methods, so for instance the `ding' group may have spam-use-BBDB as the autodetection method, while the `suspect' group may have the spam-use-blacklist and spam-use-bogofilter methods enabled. Every article detected to be spam will be marked with the spam mark `$' and processed on exit from the group as normal spam. The user has less control over the sequence of checks than he would have with spam-split.

When the newly split mail goes into groups, or messages are autodetected to be ham or spam, those groups must be exited (after entering, if needed) for further spam processing to happen. It matters whether the group is considered a ham group, a spam group, or is unclassified, based on its spam-content parameter (see section 8.19.5.3 Spam ELisp Package Global Variables). Spam groups have the additional characteristic that, when entered, any unseen or unread articles (depending on the spam-mark-only-unseen-as-spam variable) will be marked as spam. Thus, mail split into a spam group gets automatically marked as spam when you enter the group.

So, when you exit a group, the spam-processors are applied, if any are set, and the processed mail is moved to the ham-process-destination or the spam-process-destination depending on the article's classification. If the ham-process-destination or the spam-process-destination, whichever is appropriate, is nil, the article is left in the current group.

If a spam is found in any group (this can be changed to only non-spam groups with spam-move-spam-nonspam-groups-only), it is processed by the active spam-processors (see section 8.19.5.3 Spam ELisp Package Global Variables) when the group is exited. Furthermore, the spam is moved to the spam-process-destination (see section 8.19.5.3 Spam ELisp Package Global Variables) for further training or deletion. You have to load the gnus-registry.el package and enable the spam-log-to-registry variable if you want spam to be processed no more than once. Thus, spam is detected and processed everywhere, which is what most people want. If the spam-process-destination is nil, the spam is marked as expired, which is usually the right thing to do.

If spam cannot be moved (because of a read-only backend such as NNTP, for example), it will be copied.

If a ham mail is found in a ham group, as determined by the ham-marks parameter, it is processed as ham by the active ham processor when the group is exited. With the variables spam-process-ham-in-spam-groups and spam-process-ham-in-nonham-groups the behavior can be further altered so ham found anywhere can be processed. You have to load the gnus-registry.el package and enable the spam-log-to-registry variable if you want ham to be processed no more than once. Thus, ham is detected and processed only when necessary, which is what most people want. More on this in section 8.19.5.4 Spam ELisp Package Configuration Examples.
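
A minimal sketch of the registry-based setup described above, modeled on the start of Ted's configuration in section 8.19.5.4. Enabling both ham variables is an illustrative choice, not a recommendation; both default to nil.

```elisp
;; Remember which articles were already processed, so each one is
;; handled as spam or ham at most once.
(require 'gnus-registry)
(gnus-registry-initialize)
(setq spam-log-to-registry t
      ;; optional: also process ham found outside ham groups
      spam-process-ham-in-spam-groups t
      spam-process-ham-in-nonham-groups t)
(spam-initialize)
```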

If ham cannot be moved (because of a read-only backend such as NNTP, for example), it will be copied.

If all this seems confusing, don't worry. Soon it will be as natural as typing Lisp one-liners on a neural interface... err, sorry, that's 50 years in the future yet. Just trust us, it's not so bad.

8.19.5.2 Spam ELisp Package Filtering of Incoming Mail

To use the `spam.el' facilities for incoming mail filtering, you must add the following to your fancy split list nnmail-split-fancy or nnimap-split-fancy:

(: spam-split)

Note that the fancy split may be called nnmail-split-fancy or nnimap-split-fancy, depending on whether you use the nnmail or nnimap back ends to retrieve your mail.

Also, spam-split will not modify incoming mail in any way.

The spam-split function will process incoming mail and send the mail considered to be spam into the group name given by the variable spam-split-group. By default that group name is `spam', but you can customize spam-split-group. Make sure the value of spam-split-group is an unqualified group name; for instance, on an nnimap server `your-server', the value `spam' will turn out to be `nnimap+your-server:spam'. The value `nnimap+server:spam', therefore, is wrong and will actually give you the group `nnimap+your-server:nnimap+server:spam', which may or may not work depending on your server's tolerance for strange group names.
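
For an nnmail back end, a minimal fancy split using spam-split might look like the sketch below. The fallback group `mail' is an arbitrary example, and the setq of nnmail-split-fancy replaces any split rules you already have.

```elisp
;; Unqualified group name: on nnimap this would become
;; nnimap+your-server:spam, so never write "nnimap+server:spam" here.
(setq spam-split-group "spam")
(setq nnmail-split-methods 'nnmail-split-fancy
      nnmail-split-fancy '(| (: spam-split)   ; spam goes to spam-split-group
                             "mail"))         ; everything else
```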

You can also give spam-split a parameter, e.g. spam-use-regex-headers or "maybe-spam". Why is this useful?

Take these split rules (with spam-use-regex-headers and spam-use-blackholes set):

(setq nnimap-split-fancy '(|
                           (any "ding" "ding")
                           (: spam-split)
                           ;; default mailbox
                           "mail"))

Now, the problem is that you want all ding messages to make it to the ding folder. But that rule will also let obvious spam (for example, spam detected by SpamAssassin and caught by spam-use-regex-headers) through when it is sent to the ding list. On the other hand, some messages to the ding list are from a mail server in the blackhole list, so the invocation of spam-split can't be before the ding rule.

You can let SpamAssassin headers supersede ding rules, but all other spam-split rules (including a second invocation of the regex-headers check) will be after the ding rule:

(setq nnimap-split-fancy
      '(|
        ;; all spam detected by spam-use-regex-headers goes to `regex-spam'
        (: spam-split "regex-spam" 'spam-use-regex-headers)
        (any "ding" "ding")
        ;; all other spam detected by spam-split goes to spam-split-group
        (: spam-split)
        ;; default mailbox
        "mail"))

This lets you invoke specific spam-split checks depending on your particular needs, and to target the results of those checks to a particular spam group. You don't have to throw all mail into all the spam tests. Another reason why this is nice is that messages to mailing lists you have rules for don't have to have resource-intensive blackhole checks performed on them. You could also specify different spam checks for your nnmail split vs. your nnimap split. Go crazy.

You should still have specific checks such as spam-use-regex-headers set to t, even if you specifically invoke spam-split with the check. The reason is that when loading `spam.el', some conditional loading is done depending on what spam-use-xyz variables you have set. This is usually not critical, though.

Note for IMAP users

The boolean variable nnimap-split-download-body needs to be set, if you want to split based on the whole message instead of just the headers. By default, the nnimap back end will only retrieve the message headers. If you use spam-check-bogofilter, spam-check-ifile, or spam-check-stat (the splitters that can benefit from the full message body), you should set this variable. It is not set by default because it will slow IMAP down, and that is not an appropriate decision to make on behalf of the user.

See section 6.5.1 Splitting in IMAP.
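
A sketch of an IMAP setup where a body-based check can see the whole message. It assumes nnimap splitting (for example nnimap-split-inbox) is already configured, and uses Bogofilter only as an example of a body-based splitter.

```elisp
(setq nnimap-split-download-body t   ; fetch full messages: slower IMAP
      spam-use-bogofilter t          ; a check that benefits from the body
      nnimap-split-rule 'nnimap-split-fancy
      nnimap-split-fancy '(| (: spam-split)
                             "mail"))
```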

TODO: spam.el needs to provide a uniform way of training all the statistical databases. Some have that functionality built-in, others don't.

8.19.5.3 Spam ELisp Package Global Variables

The concepts of ham processors and spam processors are very important. Ham processors and spam processors for a group can be set with the spam-process group parameter, or the gnus-spam-process-newsgroups variable. Ham processors take mail known to be non-spam (ham) and process it in some way so that later similar mail will also be considered non-spam. Spam processors take mail known to be spam and process it so similar spam will be detected later.

The format of the spam or ham processor entry used to be a symbol, but now it is a cons cell. See the individual spam processor entries for more information.

Gnus learns from the spam you get. You have to collect your spam in one or more spam groups, and set or customize the variable spam-junk-mailgroups as appropriate. You can also declare groups to contain spam by setting their group parameter spam-contents to gnus-group-spam-classification-spam, or by customizing the corresponding variable gnus-spam-newsgroup-contents. The spam-contents group parameter and the gnus-spam-newsgroup-contents variable can also be used to declare groups as ham groups if you set their classification to gnus-group-spam-classification-ham. If groups are not classified by means of spam-junk-mailgroups, spam-contents, or gnus-spam-newsgroup-contents, they are considered unclassified. All groups are unclassified by default.
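
For example, the following sketch declares spam and ham groups by name. The first entry is taken from Ted's setup in section 8.19.5.4; the second regexp is a hypothetical stand-in for your own ham groups.

```elisp
(setq gnus-spam-newsgroup-contents
      '(;; any group with `spam' in its name contains spam
        ("spam" gnus-group-spam-classification-spam)
        ;; hypothetical: nnml groups under mail.* contain ham
        ("^nnml:mail\\." gnus-group-spam-classification-ham)))
```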

In spam groups, all messages are considered to be spam by default: they get the `$' mark (gnus-spam-mark) when you enter the group. If you have seen a message, had it marked as spam, then unmarked it, it won't be marked as spam when you enter the group thereafter. You can disable that behavior, so all unread messages will get the `$' mark, if you set the spam-mark-only-unseen-as-spam parameter to nil. You should remove the `$' mark when you are in the group summary buffer for every message that is not spam after all. To remove the `$' mark, you can use M-u to "unread" the article, or d for declaring it read the non-spam way. When you leave a group, all spam-marked (`$') articles are sent to a spam processor which will study them as spam samples.

Messages may also be deleted in various other ways. Unless the ham-marks group parameter is overridden (see below), marks `R' and `r' for default read or explicit delete, marks `X' and `K' for automatic or explicit kills, as well as mark `Y' for low scores, are all considered to be associated with articles which are not spam. This assumption might be false, in particular if you use kill files or score files as a means of detecting genuine spam; in that case, you should adjust the ham-marks group parameter.

Variable: ham-marks: You can customize this group or topic parameter to be the list of marks you want to consider ham. By default, the list contains the deleted, read, killed, kill-filed, and low-score marks (the idea is that these articles have been read, but are not spam). It can be useful to also include the tick mark in the ham marks. It is not recommended to make the unread mark a ham mark, because it normally indicates a lack of classification. But you can do it, and we'll be happy for you.

Variable: spam-marks: You can customize this group or topic parameter to be the list of marks you want to consider spam. By default, the list contains only the spam mark. It is not recommended to change that, but you can if you really want to.
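
As a sketch, here is how a set of groups could add the tick mark to the default ham marks. The group regexp is hypothetical, the mark variables correspond to the `r', `R', `K', `X', `Y', `!' and `$' marks mentioned above, and the parameter shape follows the ham-marks entry in Ted's setup (section 8.19.5.4). This setq replaces any existing gnus-parameters.

```elisp
(setq gnus-parameters
      '(("^nnml:lists\\."                        ; hypothetical groups
         ;; default ham marks plus the tick mark `!'
         (ham-marks (gnus-del-mark gnus-read-mark gnus-killed-mark
                     gnus-kill-file-mark gnus-low-score-mark
                     gnus-ticked-mark))
         ;; the default: only the spam mark `$'
         (spam-marks (gnus-spam-mark)))))
```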

When you leave any group, regardless of its spam-contents classification, all spam-marked articles are sent to a spam processor, which will study these as spam samples. If you explicitly kill a lot, you might sometimes end up with articles marked `K' which you never saw, and which might accidentally contain spam. It is best to make sure that real spam is marked with `$', and nothing else.

When you leave a spam group, all spam-marked articles are marked as expired after processing with the spam processor. This is not done for unclassified or ham groups. Also, any ham articles in a spam group will be moved to a location determined by either the ham-process-destination group parameter or a match in the gnus-ham-process-destinations variable, which is a list of regular expressions matched with group names (it's easiest to customize this variable with M-x customize-variable RET gnus-ham-process-destinations). Each group name list is a standard Lisp list, if you prefer to customize the variable manually. If the ham-process-destination parameter is not set, ham articles are left in place. If the spam-mark-ham-unread-before-move-from-spam-group parameter is set, the ham articles are marked as unread before being moved.

If ham cannot be moved (because of a read-only backend such as NNTP, for example), it will be copied.

Note that you can use multiple destinations per group or regular expression! This enables you to send your ham to a regular mail group and to a ham training group.

When you leave a ham group, all ham-marked articles are sent to a ham processor, which will study these as non-spam samples.

By default the variable spam-process-ham-in-spam-groups is nil. Set it to t if you want ham found in spam groups to be processed. Normally this is not done; you are expected instead to send your ham to a ham group and process it there.

By default the variable spam-process-ham-in-nonham-groups is nil. Set it to t if you want ham found in non-ham (spam or unclassified) groups to be processed. Normally this is not done; you are expected instead to send your ham to a ham group and process it there.

When you leave a ham or unclassified group, all spam articles are moved to a location determined by either the spam-process-destination group parameter or a match in the gnus-spam-process-destinations variable, which is a list of regular expressions matched with group names (it's easiest to customize this variable with M-x customize-variable RET gnus-spam-process-destinations). Each group name list is a standard Lisp list, if you prefer to customize the variable manually. If the spam-process-destination parameter is not set, the spam articles are only expired. The group name is fully qualified, meaning that if you see `nntp:servername' before the group name in the group buffer then you need it here as well.

If spam cannot be moved (because of a read-only backend such as NNTP, for example), it will be copied.

Note that you can use multiple destinations per group or regular expression! This enables you to send your spam to multiple spam training groups.

The problem with processing ham and spam is that Gnus doesn't track this processing by default. Enable the spam-log-to-registry variable so spam.el will use gnus-registry.el to track what articles have been processed, and avoid processing articles multiple times. Keep in mind that if you limit the number of registry entries, this won't work as well as it does without a limit.

Variable: spam-mark-only-unseen-as-spam

Set this variable if you want only unseen articles in spam groups to be marked as spam. By default, it is set. If you set it to nil, unread articles will also be marked as spam.

Variable: spam-mark-ham-unread-before-move-from-spam-group

Set this variable if you want ham to be unmarked before it is moved out of the spam group. This is very useful when you use something like the tick mark `!' to mark ham: the article will be placed in your ham-process-destination, unmarked as if it came fresh from the mail server.

Variable: spam-autodetect-recheck-messages

When autodetecting spam, this variable tells spam.el whether only unseen articles or all unread articles should be checked for spam. It is recommended that you leave it off.

8.19.5.4 Spam ELisp Package Configuration Examples

Ted's setup

From Ted Zlatanov <tzz@lifelogs.com>.

;; for gnus-registry-split-fancy-with-parent and spam autodetection
;; see `gnus-registry.el' for more information
(gnus-registry-initialize)
(spam-initialize)

;; I like C-s for marking spam
(define-key gnus-summary-mode-map "\C-s" 'gnus-summary-mark-as-spam)

(setq
 spam-log-to-registry t     ; for spam autodetection
 spam-use-BBDB t
 spam-use-regex-headers t   ; catch X-Spam-Flag (SpamAssassin)
 ;; all groups with `spam' in the name contain spam
 gnus-spam-newsgroup-contents
  '(("spam" gnus-group-spam-classification-spam))
 ;; see documentation for these
 spam-move-spam-nonspam-groups-only nil
 spam-mark-only-unseen-as-spam t
 spam-mark-ham-unread-before-move-from-spam-group t
 nnimap-split-rule 'nnimap-split-fancy
 ;; understand what this does before you copy it to your own setup!
 nnimap-split-fancy '(|
                      ;; trace references to parents and put in their group
                      (: gnus-registry-split-fancy-with-parent)
                      ;; this will catch server-side SpamAssassin tags
                      (: spam-split 'spam-use-regex-headers)
                      (any "ding" "ding")
                      ;; note that spam by default will go to `spam'
                      (: spam-split)
                      ;; default mailbox
                      "mail"))

;; my parameters, set with G p

;; all nnml groups, and all nnimap groups except
;; `nnimap+mail.lifelogs.com:train' and
;; `nnimap+mail.lifelogs.com:spam': any spam goes to nnimap training,
;; because it must have been detected manually

((spam-process-destination . "nnimap+mail.lifelogs.com:train"))

;; all NNTP groups
;; autodetect spam with the blacklist and ham with the BBDB
((spam-autodetect-methods spam-use-blacklist spam-use-BBDB)
 ;; send all spam to the training group
 (spam-process-destination . "nnimap+mail.lifelogs.com:train"))

;; only some NNTP groups, where I want to autodetect spam
((spam-autodetect . t))

;; my nnimap `nnimap+mail.lifelogs.com:spam' group

;; this is a spam group
((spam-contents gnus-group-spam-classification-spam)

 ;; any spam (which happens when I enter for all unseen messages,
 ;; because of the gnus-spam-newsgroup-contents setting above), goes to
 ;; `nnimap+mail.lifelogs.com:train' unless I mark it as ham

 (spam-process-destination "nnimap+mail.lifelogs.com:train")

 ;; any ham goes to my `nnimap+mail.lifelogs.com:mail' folder, but
 ;; also to my `nnimap+mail.lifelogs.com:trainham' folder for training

 (ham-process-destination "nnimap+mail.lifelogs.com:mail"
                          "nnimap+mail.lifelogs.com:trainham")
 ;; in this group, only `!' marks are ham
 (ham-marks
  (gnus-ticked-mark))
 ;; remembers senders in the blacklist on the way out---this is
 ;; definitely not needed, it just makes me feel better
 (spam-process (gnus-group-spam-exit-processor-blacklist)))

;; Later, on the IMAP server I use the `train' group for training
;; SpamAssassin to recognize spam, and the `trainham' group for
;; recognizing ham---but Gnus has nothing to do with it.

Using `spam.el' on an IMAP server with a statistical filter on the server

From Reiner Steib <reiner.steib@gmx.de>.

My provider has set up bogofilter (in combination with DCC) on the mail server (IMAP). Recognized spam goes to `spam.detected', the rest goes through the normal filter rules, i.e. to `some.folder' or to `INBOX'. Training on false positives or negatives is done by copying or moving the article to `training.ham' or `training.spam' respectively. A cron job on the server feeds those to bogofilter with the suitable ham or spam options and deletes them from the `training.ham' and `training.spam' folders.

With the following entries in gnus-parameters, spam.el does most of the job for me:

   ("nnimap:spam\\.detected"
    (gnus-article-sort-functions '(gnus-article-sort-by-chars))
    (ham-process-destination "nnimap:INBOX" "nnimap:training.ham")
    (spam-contents gnus-group-spam-classification-spam))
   ("nnimap:\\(INBOX\\|other-folders\\)"
    (spam-process-destination . "nnimap:training.spam")
    (spam-contents gnus-group-spam-classification-ham))

The Spam folder:
In the folder `spam.detected', I have to check for false positives (i.e. legitimate mails, that were wrongly judged as spam by bogofilter or DCC).

Because of the gnus-group-spam-classification-spam entry, all messages are marked as spam (with $). When I find a false positive, I mark the message with some other ham mark (ham-marks, 8.19.5.3 Spam ELisp Package Global Variables). On group exit, those messages are copied to both groups, `INBOX' (where I want to have the article) and `training.ham' (for training bogofilter) and deleted from the `spam.detected' folder.

The gnus-article-sort-by-chars entry simplifies detection of false positives for me. I receive lots of worms (sweN, ...), which all have a similar size. Grouping them by size (i.e. chars) makes finding other false positives easier. (Of course, strictly speaking, worms aren't spam (UCE, UBE). Anyhow, bogofilter is an excellent tool for filtering those unwanted mails for me.)

Ham folders:
In my ham folders, I just hit S x (gnus-summary-mark-as-spam) whenever I see an unrecognized spam mail (false negative). On group exit, those messages are moved to `training.spam'.

Reporting spam articles in Gmane groups with `spam-report.el'

From Reiner Steib <reiner.steib@gmx.de>.

With the following entry in gnus-parameters, S x (gnus-summary-mark-as-spam) marks articles in gmane.* groups as spam and reports them to Gmane at group exit:

   ("^gmane\\."
    (spam-process (gnus-group-spam-exit-processor-report-gmane)))

Additionally, I use (setq spam-report-gmane-use-article-number nil) because I don't read the groups directly from news.gmane.org, but through my local news server (leafnode). I.e. the article numbers are not the same as on news.gmane.org, thus spam-report.el has to check the X-Report-Spam header to find the correct number.

8.19.5.5 Blacklists and Whitelists

Variable: spam-use-blacklist

Set this variable to t if you want to use blacklists when splitting incoming mail. Messages whose senders are in the blacklist will be sent to the spam-split-group. This is an explicit filter, meaning that it acts only on mail senders declared to be spammers.

Variable: spam-use-whitelist

Set this variable to t if you want to use whitelists when splitting incoming mail. Messages whose senders are not in the whitelist will be sent to the next spam-split rule. This is an explicit filter, meaning that unless someone is in the whitelist, their messages are not assumed to be spam or ham.

Variable: spam-use-whitelist-exclusive

Set this variable to t if you want to use whitelists as an implicit filter, meaning that every message will be considered spam unless the sender is in the whitelist. Use with care.

Variable: gnus-group-spam-exit-processor-blacklist

Add this symbol to a group's spam-process parameter by customizing the group parameters or the gnus-spam-process-newsgroups variable. When this symbol is added to a group's spam-process parameter, the senders of spam-marked articles will be added to the blacklist.

WARNING

Instead of the obsolete gnus-group-spam-exit-processor-blacklist, it is recommended that you use '(spam spam-use-blacklist). Everything will work the same way, we promise.

Variable: gnus-group-ham-exit-processor-whitelist

Add this symbol to a group's spam-process parameter by customizing the group parameters or the gnus-spam-process-newsgroups variable. When this symbol is added to a group's spam-process parameter, the senders of ham-marked articles in ham groups will be added to the whitelist. Note that this ham processor has no effect in spam or unclassified groups.

WARNING

Instead of the obsolete gnus-group-ham-exit-processor-whitelist, it is recommended that you use '(ham spam-use-whitelist). Everything will work the same way, we promise.

Blacklists are lists of regular expressions matching addresses you consider to be spam senders. For instance, to block mail from any sender at `vmadmin.com', you can put `vmadmin.com' in your blacklist. You start out with an empty blacklist. Blacklist entries use the Emacs regular expression syntax.

Conversely, whitelists tell Gnus what addresses are considered legitimate. All messages from whitelisted addresses are considered non-spam. Also see 8.19.5.6 BBDB Whitelists. Whitelist entries use the Emacs regular expression syntax.

The blacklist and whitelist file locations can be customized with the spam-directory variable (`~/News/spam' by default), or the spam-whitelist and spam-blacklist variables directly. The whitelist and blacklist files will by default be in the spam-directory directory, named `whitelist' and `blacklist' respectively.
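
A sketch of enabling both lists for spam-split (which must also appear in your fancy split). The spam-use-* variables are set before spam-initialize because, as noted in section 8.19.5.2, spam.el does some conditional loading based on them. Setting the file names explicitly is optional; the values shown correspond to the defaults when spam-directory is left at `~/News/spam'.

```elisp
(setq spam-use-blacklist t      ; senders matching the blacklist are spam
      spam-use-whitelist t      ; senders matching the whitelist are ham
      spam-blacklist (expand-file-name "blacklist" "~/News/spam/")
      spam-whitelist (expand-file-name "whitelist" "~/News/spam/"))
(spam-initialize)
```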

8.19.5.6 BBDB Whitelists

Variable: spam-use-BBDB

Analogous to spam-use-whitelist (see section 8.19.5.5 Blacklists and Whitelists), but uses the BBDB as the source of whitelisted addresses, without regular expressions. You must have the BBDB loaded for spam-use-BBDB to work properly. Messages whose senders are not in the BBDB will be sent to the next spam-split rule. This is an explicit filter, meaning that unless someone is in the BBDB, their messages are not assumed to be spam or ham.

Variable: spam-use-BBDB-exclusive

Set this variable to t if you want to use the BBDB as an implicit filter, meaning that every message will be considered spam unless the sender is in the BBDB. Use with care. Only sender addresses in the BBDB will be allowed through; all others will be classified as spammers.
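
A minimal sketch, assuming BBDB is installed and already set up for use with Gnus. The exclusive variant is left commented out because it treats every sender outside the BBDB as a spammer.

```elisp
(require 'bbdb)                      ; BBDB must be loaded
(setq spam-use-BBDB t)               ; BBDB senders are whitelisted
;; Stricter alternative: everyone not in the BBDB is spam.
;; (setq spam-use-BBDB-exclusive t)
(spam-initialize)
```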

Variable: gnus-group-ham-exit-processor-BBDB

Add this symbol to a group's spam-process parameter by customizing the group parameters or the gnus-spam-process-newsgroups variable. When this symbol is added to a group's spam-process parameter, the senders of ham-marked articles in ham groups will be added to the BBDB. Note that this ham processor has no effect in spam or unclassified groups.

WARNING

Instead of the obsolete gnus-group-ham-exit-processor-BBDB, it is recommended that you use '(ham spam-use-BBDB). Everything will work the same way, we promise.

8.19.5.7 Gmane Spam Reporting

Variable: gnus-group-spam-exit-processor-report-gmane

Add this symbol to a group's spam-process parameter by customizing the group parameters or the gnus-spam-process-newsgroups variable. When this symbol is added to a group's spam-process parameter, the spam-marked articles in that group will be reported to the Gmane administrators via an HTTP request.

Gmane can be found at http://gmane.org.

WARNING

Instead of the obsolete gnus-group-spam-exit-processor-report-gmane, it is recommended that you use '(spam spam-use-gmane). Everything will work the same way, we promise.

Variable: spam-report-gmane-use-article-number

This variable is t by default. Set it to nil if you are running your own news server, for instance, and the local article numbers don't correspond to the Gmane article numbers. When spam-report-gmane-use-article-number is nil, spam-report.el will use the X-Report-Spam header that Gmane provides.

8.19.5.8 Anti-spam Hashcash Payments

Variable: spam-use-hashcash

Similar to spam-use-whitelist (see section 8.19.5.5 Blacklists and Whitelists), but uses hashcash tokens for whitelisting messages instead of the sender address. You must have the hashcash.el package loaded for spam-use-hashcash to work properly. Messages without a hashcash payment token will be sent to the next spam-split rule. This is an explicit filter, meaning that unless a hashcash token is found, the messages are not assumed to be spam or ham.

8.19.5.9 Blackholes

Variable: spam-use-blackholes

This option is disabled by default. You can let Gnus consult the blackhole-type distributed spam processing systems (DCC, for instance) when you set this option. The variable spam-blackhole-servers holds the list of blackhole servers Gnus will consult. The current list is fairly comprehensive, but make sure to let us know if it contains outdated servers.

The blackhole check uses the dig.el package, but you can tell `spam.el' to use dns.el instead for better performance if you set spam-use-dig to nil. Setting spam-use-dig to nil is not recommended at this time, despite the possible performance improvements, because some users may be unable to use dns.el. You can still try it and see if it works for you.

Variable: spam-blackhole-servers

The list of servers to consult for blackhole checks.

Variable: spam-blackhole-good-server-regex

A regular expression for IPs that should not be checked against the blackhole server list. When set to nil, it has no effect.

Variable: spam-use-dig

Use the dig.el package instead of the dns.el package. The default setting of t is recommended.

Blackhole checks are done only on incoming mail. There is no spam or ham processor for blackholes.
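The following sketch turns on blackhole checks for incoming mail. The good-server regexp is a hypothetical example that exempts addresses on a private 192.168.x.x network. Leave that variable nil if you do not need it.

```elisp
;; Consult the servers in spam-blackhole-servers for incoming mail.
(setq spam-use-blackholes t)

;; Never look up relays on this (hypothetical) private network.
(setq spam-blackhole-good-server-regex "^192\\.168\\.")

;; Keep the recommended dig.el lookups rather than dns.el.
(setq spam-use-dig t)
```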

8.19.5.10 Regular Expressions Header Matching

Variable: spam-use-regex-headers

This option is disabled by default. You can let Gnus check the message headers against lists of regular expressions when you set this option. The variables spam-regex-headers-spam and spam-regex-headers-ham hold the lists of regular expressions that Gnus matches against the message headers to determine whether the message is spam or ham, respectively.

Variable: spam-regex-headers-spam

The list of regular expressions that, when matched in the headers of the message, positively identify it as spam.

Variable: spam-regex-headers-ham

The list of regular expressions that, when matched in the headers of the message, positively identify it as ham.

Regular expression header checks are done only on incoming mail. There is no specific spam or ham processor for regular expressions.
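Suppose a filter upstream of Gnus already adds an X-Spam-Flag header. A sketch like the following would let spam-split act on that header. The header name and values are assumptions about your setup, not defaults quoted from spam.el.

```elisp
;; Each list holds regexps matched against the message headers.
(setq spam-use-regex-headers t
      spam-regex-headers-spam '("^X-Spam-Flag: YES")
      spam-regex-headers-ham  '("^X-Spam-Flag: NO"))
```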

8.19.5.11 Bogofilter

Variable: spam-use-bogofilter

Set this variable if you want spam-split to use Eric Raymond's speedy Bogofilter.

As long as you take care to use the `$' mark for spam articles only, Bogofilter training becomes fairly automatic. Keep doing this until you have a few hundred articles in each category, spam and non-spam. The command S t in summary mode shows the spamicity score of the current article (between 0.0 and 1.0), which is useful for debugging or out of curiosity.

Bogofilter determines whether a message is spam based on a specific threshold. That threshold can be customized; consult the Bogofilter documentation.

If the bogofilter executable is not in your path, Bogofilter processing will be turned off.

You should not enable this if you use spam-use-bogofilter-headers.

Variable: spam-use-bogofilter-headers

Set this variable if you want spam-split to use Eric Raymond's speedy Bogofilter, looking only at the message headers. It works similarly to spam-use-bogofilter, but the X-Bogosity header must be in the message already. Normally you would do this with a procmail recipe or something similar; consult the Bogofilter installation documents for details.

You should not enable this if you use spam-use-bogofilter.
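The following sketch uses Bogofilter from spam-split on incoming mail. The group names are hypothetical. Only one of the two Bogofilter variables should be enabled.

```elisp
;; Run bogofilter on each incoming message, not just its headers.
(setq spam-use-bogofilter t
      spam-use-bogofilter-headers nil)

;; Spam goes to spam-split-group; everything else goes to mail.misc.
(setq spam-split-group "mail.spam"
      nnmail-split-fancy
      '(| (: spam-split)
          "mail.misc"))
```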

Variable: gnus-group-spam-exit-processor-bogofilter

Add this symbol to a group's spam-process parameter by customizing the group parameters or the gnus-spam-process-newsgroups variable. When this symbol is added to a group's spam-process parameter, spam-marked articles will be added to the Bogofilter spam database.

WARNING

Instead of the obsolete gnus-group-spam-exit-processor-bogofilter, it is recommended that you use '(spam spam-use-bogofilter). Everything will work the same way, we promise.

Variable: gnus-group-ham-exit-processor-bogofilter

Add this symbol to a group's spam-process parameter by customizing the group parameters or the gnus-spam-process-newsgroups variable. When this symbol is added to a group's spam-process parameter, the ham-marked articles in ham groups will be added to the Bogofilter database of non-spam messages. Note that this ham processor has no effect in spam or unclassified groups.

WARNING

Instead of the obsolete gnus-group-ham-exit-processor-bogofilter, it is recommended that you use '(ham spam-use-bogofilter). Everything will work the same way, we promise.
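To train Bogofilter from your marks with the recommended forms, you could give a ham group both processors. This follows the layout of the SpamOracle group-parameter example in section 8.19.5.14. It is a sketch only, and the group regexp is a hypothetical example.

```elisp
;; On exit from mail.misc, ham-marked articles train the non-spam
;; database and spam-marked articles train the spam database.
(add-to-list 'gnus-parameters
             '("mail\\.misc\\'"
               (spam-contents gnus-group-spam-classification-ham)
               (spam-process ((ham spam-use-bogofilter)
                              (spam spam-use-bogofilter)))))
```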

Variable: spam-bogofilter-database-directory

This is the directory where Bogofilter will store its databases. It is not specified by default, so Bogofilter will use its own default database directory.

The Bogofilter mail classifier is similar to ifile in intent and purpose. A ham and a spam processor are provided, plus the spam-use-bogofilter and spam-use-bogofilter-headers variables to indicate to spam-split that Bogofilter should either be used, or has already been used on the article. The 0.9.2.1 version of Bogofilter was used to test this functionality.

8.19.5.12 ifile spam filtering

Variable: spam-use-ifile

Enable this variable if you want spam-split to use ifile, a statistical analyzer similar to Bogofilter.

Variable: spam-ifile-all-categories

Enable this variable if you want spam-use-ifile to give you all the ifile categories, not just spam/non-spam. If you use this, make sure you train ifile as described in its documentation.

Variable: spam-ifile-spam-category

This is the category of spam messages as far as ifile is concerned. The actual string used is irrelevant, but you probably want to leave the default value of `spam'.

Variable: spam-ifile-database-path

This is the filename for the ifile database. It is not specified by default, so ifile will use its own default database name.

The ifile mail classifier is similar to Bogofilter in intent and purpose. A ham and a spam processor are provided, plus the spam-use-ifile variable to indicate to spam-split that ifile should be used. The 1.2.1 version of ifile was used to test this functionality.
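Here is a minimal sketch of an ifile setup. The database file name is a hypothetical example. Leave spam-ifile-database-path unset to use ifile's own default.

```elisp
(setq spam-use-ifile t
      ;; Only spam/non-spam, not all of ifile's categories.
      spam-ifile-all-categories nil
      ;; The string itself is irrelevant; `spam' is the usual choice.
      spam-ifile-spam-category "spam"
      ;; Hypothetical location for the ifile database.
      spam-ifile-database-path "~/Mail/ifile.idx")
```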

8.19.5.13 spam-stat spam filtering

See section 8.19.6 Filtering Spam Using Statistics with spam-stat.

Variable: spam-use-stat

Enable this variable if you want spam-split to use spam-stat.el, an Emacs Lisp statistical analyzer.

Variable: gnus-group-spam-exit-processor-stat

Add this symbol to a group's spam-process parameter by customizing the group parameters or the gnus-spam-process-newsgroups variable. When this symbol is added to a group's spam-process parameter, the spam-marked articles will be added to the spam-stat database of spam messages.

WARNING

Instead of the obsolete gnus-group-spam-exit-processor-stat, it is recommended that you use '(spam spam-use-stat). Everything will work the same way, we promise.

Variable: gnus-group-ham-exit-processor-stat

Add this symbol to a group's spam-process parameter by customizing the group parameters or the gnus-spam-process-newsgroups variable. When this symbol is added to a group's spam-process parameter, the ham-marked articles in ham groups will be added to the spam-stat database of non-spam messages. Note that this ham processor has no effect in spam or unclassified groups.

WARNING

Instead of the obsolete gnus-group-ham-exit-processor-stat, it is recommended that you use '(ham spam-use-stat). Everything will work the same way, we promise.

This enables `spam.el' to cooperate with `spam-stat.el'. `spam-stat.el' provides an internal (Lisp-only) spam database, which, unlike ifile or Bogofilter, does not require external programs. A spam processor, a ham processor, and the spam-use-stat variable for spam-split are provided.
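The following sketch combines these pieces. It assumes a spam-stat dictionary has already been created and loaded (sections 8.19.6.1 and 8.19.6.2) and uses a hypothetical ham group name.

```elisp
;; Let spam-split consult the spam-stat dictionary.
(setq spam-use-stat t)

;; Update the spam-stat database from your marks when leaving mail.misc.
(add-to-list 'gnus-parameters
             '("mail\\.misc\\'"
               (spam-contents gnus-group-spam-classification-ham)
               (spam-process ((ham spam-use-stat)
                              (spam spam-use-stat)))))
```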

8.19.5.14 Using SpamOracle with Gnus

An easy way to filter out spam is to use SpamOracle. SpamOracle is a statistical mail filtering tool written by Xavier Leroy and needs to be installed separately.

There are several ways to use SpamOracle with Gnus. In all cases, your mail is piped through SpamOracle in its mark mode. SpamOracle will then add an `X-Spam' header indicating whether it regards the mail as spam or not.

One possibility is to run SpamOracle as a :prescript from the mail source (see section 6.3.4.1 Mail Source Specifiers, and section 8.19.3 SpamAssassin, Vipul's Razor, DCC, etc). This method has the advantage that the user can see the X-Spam headers.

The easiest method is to make `spam.el' (see section 8.19.5 Filtering Spam Using The Spam ELisp Package) call SpamOracle.

To enable SpamOracle usage by `spam.el', set the variable spam-use-spamoracle to t and configure nnmail-split-fancy or nnimap-split-fancy as described in section 8.19.5 Filtering Spam Using The Spam ELisp Package. In this example the `INBOX' of an nnimap server is filtered using SpamOracle. Mails recognized as spam will be moved to spam-split-group, `Junk' in this case. Ham messages stay in `INBOX':

(setq spam-use-spamoracle t
      spam-split-group "Junk"
      nnimap-split-inbox '("INBOX")
      nnimap-split-rule 'nnimap-split-fancy
      nnimap-split-fancy '(| (: spam-split) "INBOX"))

Variable: spam-use-spamoracle: Set to t if you want Gnus to enable spam filtering using SpamOracle.

Variable: spam-spamoracle-binary: By default, Gnus uses the SpamOracle binary called `spamoracle' found in the user's PATH. Set spam-spamoracle-binary to use a different binary.

Variable: spam-spamoracle-database: By default, SpamOracle uses the file `~/.spamoracle.db' as a database to store its analyses. This is controlled by the variable spam-spamoracle-database which defaults to nil. That means the default SpamOracle database will be used. In case you want your database to live somewhere special, set spam-spamoracle-database to this path.
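For example, a sketch that points Gnus at a SpamOracle binary and database outside the defaults might look like this. Both paths are hypothetical examples.

```elisp
(setq spam-use-spamoracle t
      ;; Hypothetical install location, used instead of `spamoracle' on PATH.
      spam-spamoracle-binary "/usr/local/bin/spamoracle"
      ;; Hypothetical database file; nil means SpamOracle's own default.
      spam-spamoracle-database "~/Mail/spamoracle.db")
```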

SpamOracle employs a statistical algorithm to determine whether a message is spam or ham. In order to get good results, meaning few false hits or misses, SpamOracle needs training. SpamOracle learns the characteristics of your spam mails. Using the add mode (training mode), one has to feed good (ham) and spam mails to SpamOracle. This can be done by pressing | in the Summary buffer and piping the mail to a SpamOracle process, or by using `spam.el''s spam- and ham-processors, which is much more convenient. For a detailed description of spam- and ham-processors, see section 8.19.5 Filtering Spam Using The Spam ELisp Package.

Variable: gnus-group-spam-exit-processor-spamoracle

Add this symbol to a group's spam-process parameter by customizing the group parameter or the gnus-spam-process-newsgroups variable. When this symbol is added to a group's spam-process parameter, spam-marked articles will be sent to SpamOracle as spam samples.

WARNING

Instead of the obsolete gnus-group-spam-exit-processor-spamoracle, it is recommended that you use '(spam spam-use-spamoracle). Everything will work the same way, we promise.

Variable: gnus-group-ham-exit-processor-spamoracle

Add this symbol to a group's spam-process parameter by customizing the group parameter or the gnus-spam-process-newsgroups variable. When this symbol is added to a group's spam-process parameter, the ham-marked articles in ham groups will be sent to SpamOracle as samples of ham messages. Note that this ham processor has no effect in spam or unclassified groups.

WARNING

Instead of the obsolete gnus-group-ham-exit-processor-spamoracle, it is recommended that you use '(ham spam-use-spamoracle). Everything will work the same way, we promise.

Example: These are the Group Parameters of a group that has been classified as a ham group, meaning that it should only contain ham messages.

 ((spam-contents gnus-group-spam-classification-ham)
  (spam-process ((ham spam-use-spamoracle)
                 (spam spam-use-spamoracle))))

For this group, spam-use-spamoracle is installed for both ham and spam processing. Suppose the group contains spam messages (e.g. because SpamOracle has not had enough sample messages yet) and the user marks some of them as spam. These messages will then be processed by SpamOracle. The processor sends them to SpamOracle as new spam samples.

8.19.5.15 Extending the Spam ELisp package

Say you want to add a new back end called blackbox. For filtering incoming mail, provide the following:

Code

(defvar spam-use-blackbox nil
  "True if blackbox should be used.")

Add

(spam-use-blackbox   . spam-check-blackbox)

to spam-list-of-checks.

Add

(gnus-group-ham-exit-processor-blackbox  ham spam-use-blackbox)
(gnus-group-spam-exit-processor-blackbox spam spam-use-blackbox)

to spam-list-of-processors.

Add

(spam-use-blackbox spam-blackbox-register-routine
                   nil
                   spam-blackbox-unregister-routine
                   nil)

to spam-registration-functions. Write the register/unregister routines using the bogofilter register/unregister routines as a start, or other register/unregister routines more appropriate to Blackbox.

Functionality
Write the spam-check-blackbox function. It should return `nil' or spam-split-group, observing the other conventions. See the existing spam-check-* functions for examples of what you can do, and stick to the template unless you fully understand the reasons why you aren't.

Make sure to add spam-use-blackbox to spam-list-of-statistical-checks if Blackbox is a statistical mail analyzer that needs the full message body to operate.
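The following is only a sketch of what spam-check-blackbox might look like. It assumes a hypothetical `blackbox' command that reads a message on standard input and exits with status 0 when it judges the message to be spam. Both that program and the variable spam-blackbox-program are inventions for illustration. It runs on whatever text the current buffer holds, which is why statistical checks that need the body belong in spam-list-of-statistical-checks. For calling conventions in your version of spam.el, the existing spam-check-* functions remain the authority.

```elisp
(defvar spam-blackbox-program "blackbox"
  "Hypothetical Blackbox executable used by `spam-check-blackbox'.")

(defun spam-check-blackbox ()
  "Return `spam-split-group' if Blackbox says the current message is spam.
Return nil otherwise, so that spam-split can go on to its next check.
Assumes the hypothetical Blackbox program exits with status 0 for spam."
  (let ((status (call-process-region (point-min) (point-max)
                                     spam-blackbox-program
                                     nil    ; keep the buffer text
                                     nil    ; discard the program's output
                                     nil))) ; no redisplay while it runs
    (when (eq status 0)
      spam-split-group)))
```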

For processing spam and ham messages, provide the following:

Code

Note you don't have to provide a spam or a ham processor. Only provide them if Blackbox supports spam or ham processing.

Also, ham and spam processors are being phased out as single variables. Instead the form '(spam spam-use-blackbox) or '(ham spam-use-blackbox) is favored. For now, spam/ham processor variables are still around but they won't be for long.

(defvar gnus-group-spam-exit-processor-blackbox "blackbox-spam"
  "The Blackbox summary exit spam processor.
Only applicable to spam groups.")

(defvar gnus-group-ham-exit-processor-blackbox "blackbox-ham"
  "The Blackbox summary exit ham processor.
Only applicable to non-spam (unclassified and ham) groups.")

Gnus parameters
Add

(const :tag "Spam: Blackbox" (spam spam-use-blackbox))
(const :tag "Ham: Blackbox" (ham spam-use-blackbox))

to the spam-process group parameter in gnus.el. Make sure you do it twice, once for the parameter and once for the variable customization.

Add

(variable-item spam-use-blackbox)

to the spam-autodetect-methods group parameter in gnus.el.

8.19.6 Filtering Spam Using Statistics with spam-stat

Paul Graham has written an excellent essay about spam filtering using statistics: A Plan for Spam. In it he describes the inherent deficiency of rule-based filtering as used by SpamAssassin, for example: Somebody has to write the rules, and everybody else has to install these rules. You are always late. It would be much better, he argues, to filter mail based on whether it somehow resembles spam or non-spam. One way to measure this is word distribution. He then goes on to describe a solution that checks whether a new mail resembles any of your other spam mails or not.

The basic idea is this: Create two collections of your mail, one with spam, one with non-spam. Count how often each word appears in either collection, weight this by the total number of mails in the collections, and store this information in a dictionary. For every word in a new mail, determine its probability to belong to a spam or a non-spam mail. Using the 15 most conspicuous words, compute the total probability of the mail being spam. If this probability is higher than a certain threshold, the mail is considered to be spam.
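Written out, the combination step from Graham's essay looks like this, where p_1, ..., p_15 are the spam probabilities of the 15 most conspicuous words. Treat it as an illustration of the idea rather than a statement of the exact arithmetic in spam-stat.el.

```latex
P(\mathrm{spam}) =
  \frac{\prod_{i=1}^{15} p_i}
       {\prod_{i=1}^{15} p_i + \prod_{i=1}^{15} \left(1 - p_i\right)}
```

The mail is then treated as spam when P(spam) exceeds the threshold.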

Gnus supports this kind of filtering. But it needs some setting up. First, you need two collections of your mail, one with spam, one with non-spam. Then you need to create a dictionary using these two collections, and save it. And last but not least, you need to use this dictionary in your fancy mail splitting rules.

8.19.6.1 Creating a spam-stat dictionary
8.19.6.2 Splitting mail using spam-stat
8.19.6.3 Low-level interface to the spam-stat dictionary

8.19.6.1 Creating a spam-stat dictionary

Before you can begin to filter spam based on statistics, you must create these statistics based on two mail collections, one with spam, one with non-spam. These statistics are then stored in a dictionary for later use. In order for these statistics to be meaningful, you need several hundred emails in both collections.

Gnus currently supports only the nnml back end for automated dictionary creation. The nnml back end stores all mails in a directory, one file per mail. Use the following:

Function: spam-stat-process-spam-directory: Create spam statistics for every file in this directory. Every file is treated as one spam mail.

Function: spam-stat-process-non-spam-directory: Create non-spam statistics for every file in this directory. Every file is treated as one non-spam mail.

Usually you would call spam-stat-process-spam-directory on a directory such as `~/Mail/mail/spam' (this usually corresponds to the group `nnml:mail.spam'), and you would call spam-stat-process-non-spam-directory on a directory such as `~/Mail/mail/misc' (this usually corresponds to the group `nnml:mail.misc').

When you are using IMAP, you won't have the mails available locally, so that will not work. One solution is to use the Gnus Agent to cache the articles. Then you can use directories such as `"~/News/agent/nnimap/mail.yourisp.com/personal_spam"' for spam-stat-process-spam-directory. See section 6.8.5 Agent as Cache.

Variable: spam-stat: This variable holds the hash-table with all the statistics, that is, the dictionary we have been talking about. For every word in either collection, this hash-table stores a vector describing how often the word appeared in spam mails and how often it appeared in non-spam mails.

If you want to regenerate the statistics from scratch, you need to reset the dictionary.

Function: spam-stat-reset: Reset the spam-stat hash-table, deleting all the statistics.

When you are done, you must save the dictionary. The dictionary may be rather large. If you will not update the dictionary incrementally (instead, you will recreate it once a month, for example), then you can reduce the size of the dictionary by deleting all words that did not appear often enough or that do not clearly belong to only spam or only non-spam mails.

Function: spam-stat-reduce-size: Reduce the size of the dictionary. Use this only if you do not want to update the dictionary incrementally.

Function: spam-stat-save: Save the dictionary.

Variable: spam-stat-file: The filename used to store the dictionary. This defaults to `~/.spam-stat.el'.
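Putting these functions together, here is a sketch of building the dictionary from Agent-cached IMAP articles. The spam directory is the example path given above. The non-spam directory is a hypothetical sibling.

```elisp
(require 'spam-stat)

;; Start from an empty dictionary.
(spam-stat-reset)

;; Every file in these directories counts as one mail.
(spam-stat-process-spam-directory
 "~/News/agent/nnimap/mail.yourisp.com/personal_spam")
(spam-stat-process-non-spam-directory
 "~/News/agent/nnimap/mail.yourisp.com/INBOX")   ; hypothetical

;; Only if you will not update the dictionary incrementally.
(spam-stat-reduce-size)

;; Writes the dictionary to spam-stat-file.
(spam-stat-save)
```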

8.19.6.2 Splitting mail using spam-stat

In order to use spam-stat to split your mail, you need to add the following to your `~/.gnus.el' file:

(require 'spam-stat)
(spam-stat-load)

This will load the necessary Gnus code, and the dictionary you created.

Next, you need to adapt your fancy splitting rules: You need to determine how to use spam-stat. The following examples are for the nnml back end. Using the nnimap back end works just as well. Just use nnimap-split-fancy instead of nnmail-split-fancy.

In the simplest case, you only have two groups, `mail.misc' and `mail.spam'. The following expression says that mail is either spam or it should go into `mail.misc'. If it is spam, then spam-stat-split-fancy will return `mail.spam'.

(setq nnmail-split-fancy
      `(| (: spam-stat-split-fancy)
          "mail.misc"))

Variable: spam-stat-split-fancy-spam-group: The group to use for spam. Default is `mail.spam'.
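For the nnimap back end, the simplest case might be sketched as follows. It reuses the nnimap settings from the SpamOracle example in section 8.19.5.14. The spam group name `Junk' is an example.

```elisp
(setq spam-stat-split-fancy-spam-group "Junk"
      nnimap-split-inbox '("INBOX")
      nnimap-split-rule 'nnimap-split-fancy
      ;; Spam goes to Junk; everything else stays in INBOX.
      nnimap-split-fancy '(| (: spam-stat-split-fancy)
                             "INBOX"))
```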

If you also filter mail with specific subjects into other groups, use the following expression. Only mails not matching the regular expression are considered potential spam.

(setq nnmail-split-fancy
      `(| ("Subject" "\\bspam-stat\\b" "mail.emacs")
          (: spam-stat-split-fancy)
          "mail.misc"))

If you want to filter for spam first, then you must be careful when creating the dictionary. Note that spam-stat-split-fancy must consider both mails in `mail.emacs' and in `mail.misc' as non-spam, therefore both should be in your collection of non-spam mails, when creating the dictionary!

(setq nnmail-split-fancy
      `(| (: spam-stat-split-fancy)
          ("Subject" "\\bspam-stat\\b" "mail.emacs")
          "mail.misc"))

You can combine this with traditional filtering. Here, we move all HTML-only mails into the `mail.spam.filtered' group. Note that since spam-stat-split-fancy will never see them, the mails in `mail.spam.filtered' should be neither in your collection of spam mails, nor in your collection of non-spam mails, when creating the dictionary!

(setq nnmail-split-fancy
      `(| ("Content-Type" "text/html" "mail.spam.filtered")
          (: spam-stat-split-fancy)
          ("Subject" "\\bspam-stat\\b" "mail.emacs")
          "mail.misc"))

8.19.6.3 Low-level interface to the spam-stat dictionary

The main interface to spam-stat consists of the following functions:

Function: spam-stat-buffer-is-spam: Called in a buffer, that buffer is considered to be a new spam mail. Use this for new mail that has not been processed before.

Function: spam-stat-buffer-is-no-spam: Called in a buffer, that buffer is considered to be a new non-spam mail. Use this for new mail that has not been processed before.

Function: spam-stat-buffer-change-to-spam: Called in a buffer, that buffer is no longer considered to be normal mail but spam. Use this to change the status of a mail that has already been processed as non-spam.

Function: spam-stat-buffer-change-to-non-spam: Called in a buffer, that buffer is no longer considered to be spam but normal mail. Use this to change the status of a mail that has already been processed as spam.

Function: spam-stat-save: Save the hash table to the file. The filename used is stored in the variable spam-stat-file.

Function: spam-stat-load: Load the hash table from a file. The filename used is stored in the variable spam-stat-file.

Function: spam-stat-score-word: Return the spam score for a word.

Function: spam-stat-score-buffer: Return the spam score for a buffer.

Function: spam-stat-split-fancy: Use this function for fancy mail splitting. Add the rule `(: spam-stat-split-fancy)' to nnmail-split-fancy.

Make sure you load the dictionary before using it. This requires the following in your `~/.gnus.el' file:

(require 'spam-stat)
(spam-stat-load)
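The following sketch shows the buffer-level functions in use. It scores one stored message, re-classifies it as spam, and saves the result. The article file name is a hypothetical nnml path, and the example assumes the message was previously counted as non-spam.

```elisp
(with-temp-buffer
  ;; Hypothetical nnml article file.
  (insert-file-contents "~/Mail/mail/misc/42")
  ;; Spam score of the whole buffer.
  (message "Buffer score: %s" (spam-stat-score-buffer))
  ;; Counted as non-spam earlier, but it really is spam.
  (spam-stat-buffer-change-to-spam))

;; Score a single word from the dictionary.
(message "Word score: %s" (spam-stat-score-word "free"))

(spam-stat-save)
```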

A typical test will involve calls to the following functions:

Reset: (setq spam-stat (make-hash-table :test 'equal))
Learn spam: (spam-stat-process-spam-directory "~/Mail/mail/spam")
Learn non-spam: (spam-stat-process-non-spam-directory "~/Mail/mail/misc")
Save table: (spam-stat-save)
File size: (nth 7 (file-attributes spam-stat-file))
Number of words: (hash-table-count spam-stat)
Test spam: (spam-stat-test-directory "~/Mail/mail/spam")
Test non-spam: (spam-stat-test-directory "~/Mail/mail/misc")
Reduce table size: (spam-stat-reduce-size)
Save table: (spam-stat-save)
File size: (nth 7 (file-attributes spam-stat-file))
Number of words: (hash-table-count spam-stat)
Test spam: (spam-stat-test-directory "~/Mail/mail/spam")
Test non-spam: (spam-stat-test-directory "~/Mail/mail/misc")

Here is how you would create your dictionary:

Reset: (setq spam-stat (make-hash-table :test 'equal))
Learn spam: (spam-stat-process-spam-directory "~/Mail/mail/spam")
Learn non-spam: (spam-stat-process-non-spam-directory "~/Mail/mail/misc")
Repeat for any other non-spam group you need...
Reduce table size: (spam-stat-reduce-size)
Save table: (spam-stat-save)

8.20 Interaction with other modes

8.20.1 Dired

gnus-dired-minor-mode provides some useful functions for dired buffers. It is enabled with

(add-hook 'dired-mode-hook 'turn-on-gnus-dired-mode)

C-c C-m C-a: Send dired's marked files as an attachment (gnus-dired-attach). You will be prompted for a message buffer.
C-c C-m C-l: Visit a file according to the appropriate mailcap entry (gnus-dired-find-file-mailcap). With prefix, open file in a new buffer.
C-c C-m C-p: Print file according to the mailcap entry (gnus-dired-print). If there is no print command, print in a PostScript image.

8.21 Various Various

gnus-home-directory

All Gnus file and directory variables will be initialized from this variable, which defaults to `~/'.

gnus-directory

Most Gnus storage file and directory variables will be initialized from this variable, which defaults to the SAVEDIR environment variable, or `~/News/' if that variable isn't set.

Note that Gnus is mostly loaded when the `~/.gnus.el' file is read. This means that other directory variables that are initialized from this variable won't be set properly if you set this variable in `~/.gnus.el'. Set this variable in `.emacs' instead.
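For example, the following line belongs in `.emacs', not in `~/.gnus.el'. The directory name is a hypothetical example.

```elisp
;; Set before Gnus loads, so that variables derived from it are correct.
(setq gnus-directory "~/gnus-data/")
```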

gnus-default-directory

Not related to the above variable at all--this variable says what the default directory of all Gnus buffers should be. If you issue commands like C-x C-f, the prompt you'll get starts in the current buffer's default directory. If this variable is nil (which is the default), the default directory will be the default directory of the buffer you were in when you started Gnus.

gnus-verbose

This variable is an integer between zero and ten. The higher the value, the more messages will be displayed. If this variable is zero, Gnus will never flash any messages; if it is seven (the default), most important messages will be shown; and if it is ten, Gnus won't ever shut up, but will flash so many messages it will make your head swim.

gnus-verbose-backends

This variable works the same way as gnus-verbose, but it applies to the Gnus back ends instead of Gnus proper.
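For example, you could keep Gnus itself fairly quiet while letting the back ends report everything as you chase a problem. The values below are only examples.

```elisp
(setq gnus-verbose 5            ; fewer messages from Gnus proper
      gnus-verbose-backends 10) ; everything from the back ends
```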

nnheader-max-head-length

When the back ends read straight heads of articles, they all try to read as little as possible. This variable (default 8192) specifies the absolute max length the back ends will try to read before giving up on finding a separator line between the head and the body. If this variable is nil, there is no upper read bound. If it is t, the back ends won't try to read the articles piece by piece, but read the entire articles. This makes sense with some versions of ange-ftp or efs.

nnheader-head-chop-length

This variable (default 2048) says how big a piece of each article to read when doing the operation described above.

nnheader-file-name-translation-alist

This is an alist that says how to translate characters in file names. For instance, if `:' is invalid as a file character in file names on your system (you OS/2 user you), you could say something like:

(setq nnheader-file-name-translation-alist
      '((?: . ?_)))

In fact, this is the default value for this variable on OS/2 and MS Windows (phooey) systems.

gnus-hidden-properties

This is a list of properties to use to hide "invisible" text. It is (invisible t intangible t) by default on most systems, which makes invisible text invisible and intangible.

gnus-parse-headers-hook

A hook called before parsing headers. It can be used, for instance, to gather statistics on the headers fetched, or perhaps you'd like to prune some headers. I don't see why you'd want that, though.

gnus-shell-command-separator

String used to separate two shell commands. The default is `;'.

gnus-invalid-group-regexp

Regexp to match "invalid" group names when querying the user for a group name. The default value catches some really invalid group names that could possibly mess up Gnus internally (like allowing `:' in a group name, which is normally used to delimit method and group).

IMAP users might want to allow `/' in group names though.
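The following sketch is for IMAP users. It assumes the default value is "[: `'\"/]\\|^$", so check the value in your Gnus with C-h v before copying it. The only change is dropping `/' from the bracket expression.

```elisp
;; Assumed default: "[: `'\"/]\\|^$".  Without "/" in the bracket
;; expression, names such as "INBOX/lists" are no longer rejected.
(setq gnus-invalid-group-regexp "[: `'\"]\\|^$")
```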

This document was generated by XEmacs shared group account on December, 19 2009 using texi2html 1.65.

8. Various

8.1 Process/Prefix

8.2 Interactive

8.3 Symbolic Prefixes

8.4 Formatting Variables

8.4.1 Formatting Basics

8.4.2 Mode Line Formatting

8.4.3 Advanced Formatting

8.4.4 User-Defined Specs

8.4.5 Formatting Fonts

8.4.6 Positioning Point

8.4.7 Tabulation

8.4.8 Wide Characters

8.5 Window Layout

8.5.1 Example Window Configurations

8.6 Faces and Fonts

8.7 Compilation

8.8 Mode Lines

8.9 Highlighting and Menus

8.10 Buttons

8.11 Daemons

8.12 NoCeM

8.13 Undo

8.14 Predicate Specifiers

8.15 Moderation

8.16 Fetching a Group

8.17 Image Enhancements

8.17.1 X-Face

8.17.2 Face

8.17.3 Smileys

8.17.4 Picons

8.17.5 Various XEmacs Variables

8.17.5.1 Toolbar

8.18 Fuzzy Matching

8.19 Thwarting Email Spam

8.19.1 The problem of spam

8.19.2 Anti-Spam Basics

8.19.3 SpamAssassin, Vipul's Razor, DCC, etc

8.19.4 Hashcash

8.19.5 Filtering Spam Using The Spam ELisp Package

8.19.5.1 Spam ELisp Package Sequence of Events

8.19.5.2 Spam ELisp Package Filtering of Incoming Mail

8.19.5.3 Spam ELisp Package Global Variables

8.19.5.4 Spam ELisp Package Configuration Examples

Ted's setup

Using `spam.el' on an IMAP server with a statistical filter on the server

Reporting spam articles in Gmane groups with spam-report.el

8.19.5.5 Blacklists and Whitelists

8.19.5.6 BBDB Whitelists

8.19.5.7 Gmane Spam Reporting

8.19.5.8 Anti-spam Hashcash Payments

8.19.5.9 Blackholes

8.19.5.10 Regular Expressions Header Matching

8.19.5.11 Bogofilter

8.19.5.12 ifile spam filtering

8.19.5.13 spam-stat spam filtering

8.19.5.14 Using SpamOracle with Gnus

8.19.5.15 Extending the Spam ELisp package

8.19.6 Filtering Spam Using Statistics with spam-stat

8.19.6.1 Creating a spam-stat dictionary

8.19.6.2 Splitting mail using spam-stat

8.19.6.3 Low-level interface to the spam-stat dictionary

8.20 Interaction with other modes

8.20.1 Dired

8.21 Various Various
