
8. Configuring the Citation Engine

At the heart of Supercite is a regular expression interpreting engine called Regi. Regi operates by interpreting a data structure called a Regi-frame (or just frame), which is a list of Regi-entries (or just entries). Each entry contains a predicate, typically a regular expression, which is matched against a line of text in the current buffer. If the predicate matches, an associated expression is evaluated. In this way, an entire region of text can be transformed in an awk-like manner. Regi is used throughout Supercite, from mail header information extraction, to header nuking, to citing text.

8.1 Using Regi  
8.2 Frames You Can Customize  

While the details of Regi are discussed below (see section 8.1 Using Regi), only those who wish to customize certain aspects of Supercite need concern themselves with it. It is important to understand, though, that any citation style that can be described by a regular expression can be recognized by Supercite. This leads to some interesting applications. For example, if you regularly receive email from a co-worker who uses an uncommon citation style (say, one that puts a `|' or `}' character at the front of the line), Supercite can be made to recognize it and coerce the citation to your preferred style, for consistency. In theory, Supercite could also recognize such things as uuencoded messages or C code, and cite or fill them differently from normal text. None of this is currently part of Supercite, but contributions are welcome!



8.1 Using Regi

Regi works by interpreting frames with the function regi-interpret. A frame is a list of arbitrary size where each element is an entry of the following form:

 
(pred func [negate-p [case-fold-search]])

Regi starts with the first entry in a frame, evaluating the pred of that entry against the beginning of the line that `point' is on. If the pred evaluates to true (or to false, if the optional negate-p is non-nil), the func for that entry is evaluated. How processing then continues is determined by the return value of func, as described below. If pred is false, the next entry in the frame is checked, and so on until every entry has been tried against the current line. If no entry matches, `point' is moved forward one line and the frame is reset to the first entry.
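As a minimal sketch of these rules (not code from Supercite), the frame below cites every non-blank line with `> '. It assumes regi.el can be loaded with (require 'regi) and that regi-interpret takes the region to process as optional START and END arguments, as in the copy shipped with GNU Emacs. The names my-simple-cite-frame and my-simple-cite-buffer are hypothetical.

```elisp
;; Sketch assuming regi.el's (regi-interpret FRAME &optional START END).
(require 'regi)

(defvar my-simple-cite-frame
  '(;; Blank line: func is nil, so Regi just moves on to the next line.
    ("[ \t]*$" nil)
    ;; Any other line: insert "> " at its start.  The func returns nil
    ;; explicitly so Regi resets the frame and advances one line.
    (t (progn (insert "> ") nil)))
  "Hypothetical frame that cites every non-blank line.")

(defun my-simple-cite-buffer ()
  "Cite every non-blank line in the current buffer using Regi."
  (save-excursion
    (save-restriction
      (goto-char (point-min))
      (regi-interpret my-simple-cite-frame (point-min) (point-max)))))
```

Entry order matters here: because the t entry always matches, it has to come after the blank-line entry, or blank lines would be cited too.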

pred can be a string, a variable, a list, or one of the following symbols: t, begin, end, or every. If pred is a string, or a variable or list that evaluates to a string, it is interpreted as a regular expression. This regexp is matched against the current line, from the beginning, using looking-at. The match folds case if the optional case-fold-search is non-nil. If pred is not a string and does not evaluate to one, its value is treated as a truth value: nil is false, and anything else is true.
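The hypothetical frame below exercises each kind of pred on a small buffer: a string matched with case folding, a variable whose value is a regexp, a list that evaluates to a regexp, and a string with negate-p set. Each func records which entry handled the line. All my- names are illustrative only, and the sketch assumes the GNU Emacs regi.el described in the previous example.

```elisp
(require 'regi)

(defvar my-quote-regexp ">"
  "Hypothetical variable whose value serves as a regexp pred.")

(defvar my-tally nil
  "Hypothetical list recording which entry handled each line.")

(defvar my-pred-frame
  '(;; String pred; the fourth element (case-fold-search) is non-nil,
    ;; so this matches lines beginning with subject: in any case.
    ("Subject:" (progn (push 'subject my-tally) nil) nil t)
    ;; Variable pred: its value, ">", is used as the regexp.
    (my-quote-regexp (progn (push 'cited my-tally) nil))
    ;; List pred: it is evaluated and the resulting string is the regexp.
    ((concat "[ \t]*" "$") (progn (push 'blank my-tally) nil))
    ;; negate-p is non-nil: true for any line that is NOT blank.
    ("[ \t]*$" (progn (push 'text my-tally) nil) t))
  "Hypothetical frame showing the different kinds of pred.")

(defun my-classify-lines ()
  "Return a list naming the entry that handled each buffer line."
  (setq my-tally nil)
  (save-excursion
    (save-restriction
      (goto-char (point-min))
      (regi-interpret my-pred-frame (point-min) (point-max))))
  (nreverse my-tally))
```

On a buffer containing the lines `subject: hi', `> quoted', an empty line, and `plain', my-classify-lines should return (subject cited blank text).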

Four special symbol values for pred are recognized:

t
Always produces a true outcome.
begin
Always executed before the frame is interpreted. This can be used, for example, to initialize some global variables.
end
Always executed after frame interpreting is completed. This can be used to perform any necessary post-processing.
every
Executes whenever the frame is reset, usually after the entire frame has been matched against the current line.

Note that negate-p and case-fold-search are ignored if pred is one of these special symbols. Only the first occurrence of each symbol in a frame is used; any duplicates are ignored. Also note that, for performance reasons, the entries associated with these symbols are removed from the frame during the main interpreting loop.
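For instance, a begin entry can reset a counter and an end entry can report on it. The frame and variable names below are hypothetical, and the sketch assumes the GNU Emacs regi.el.

```elisp
(require 'regi)

(defvar my-blank-count 0
  "Hypothetical counter of blank lines, reset by the begin entry.")

(defvar my-report nil
  "Hypothetical summary string, set by the end entry.")

(defvar my-report-frame
  '(;; Runs once, before any line is examined.
    (begin (setq my-blank-count 0))
    ;; Count blank lines; return nil to move on to the next line.
    ("[ \t]*$" (progn (setq my-blank-count (1+ my-blank-count)) nil))
    ;; Runs once, after the last line has been processed.
    (end (setq my-report (format "%d blank line(s)" my-blank-count))))
  "Hypothetical frame using begin and end entries.")

(defun my-report-blank-lines ()
  "Return a summary of the blank lines in the current buffer."
  (save-excursion
    (save-restriction
      (goto-char (point-min))
      (regi-interpret my-report-frame (point-min) (point-max))))
  my-report)
```

Because the begin entry resets my-blank-count on every call, the summary does not depend on any count left over from a previous run.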

Your func can return certain values which control continued Regi processing. If your func returns nil (and it should take care to return nil explicitly), Regi resets the frame to the first entry and advances `point' to the beginning of the next line. If your func returns a list, the list can contain any combination of the following elements:

the symbol continue
This tells Regi to continue processing entries after a match, instead of resetting the frame and moving `point'. In this way, lines of text can have multiple matches, but you have to be careful to avoid entering infinite loops.

the symbol abort
This tells Regi to terminate frame processing. However, any end entry is still processed.

the list (frame . newframe)
This tells Regi to substitute newframe as the frame it is interpreting. In other words, your func can modify the Regi frame on the fly. newframe can be a variable containing a frame, or it can be the frame in-lined.

the list (step . n)
Tells Regi to move n lines forward as it continues processing. By default, Regi moves forward one line. n can be zero or negative, but watch out for infinite loops.
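The sketch below uses all three kinds of control value while scanning a mail header block: (continue) lets a From: line also reach the counting entry, (step . 2) skips a line, and (abort) stops at the blank line that separates headers from body. The frame, the variables, and the X-Skip-Next: header are all hypothetical. Note that each func returns a quoted list, since the func is an expression that Regi evaluates rather than a function it calls.

```elisp
(require 'regi)

(defvar my-header-lines 0 "Hypothetical count of header lines seen.")
(defvar my-saw-from nil "Hypothetical flag set when a From: line is seen.")

(defvar my-header-frame
  '((begin (setq my-header-lines 0 my-saw-from nil))
    ;; A blank line ends the headers: abort, so the body is never scanned.
    ("[ \t]*$" '(abort))
    ;; Note a From: line (case-insensitively), then return (continue)
    ;; so the remaining entries are also tried on this same line.
    ("From:" (progn (setq my-saw-from t) '(continue)) nil t)
    ;; Return (step . 2) to skip this line and the one after it.
    ("X-Skip-Next:" '((step . 2)))
    ;; Count every line that reaches this entry.
    (t (progn (setq my-header-lines (1+ my-header-lines)) nil)))
  "Hypothetical frame illustrating continue, step and abort.")

(defun my-scan-headers ()
  "Return (HEADER-LINES . SAW-FROM) for the header block at buffer start."
  (save-excursion
    (save-restriction
      (goto-char (point-min))
      (regi-interpret my-header-frame (point-min) (point-max))))
  (cons my-header-lines my-saw-from))
```

On a header block of From:, X-Skip-Next:, one further header, and Subject:, followed by a blank line and a body, my-scan-headers should return (2 . t): the From: and Subject: lines are counted, the X-Skip-Next: line and the line after it are skipped, and the body is never reached.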

During execution of your func, the following variables will be temporarily bound to some useful information:

curline
The current line in the buffer that Regi is looking-at, as a string.
curframe
The current frame being interpreted.
curentry
The current frame entry being interpreted.
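For example, a func can read curline instead of extracting the line text itself. The hypothetical frame below collects every line that starts with `>'. It assumes, as in the GNU Emacs regi.el, that curline holds the line without its trailing newline.

```elisp
(require 'regi)

(defvar my-collected nil
  "Hypothetical list of cited lines collected by `my-collect-frame'.")

(defvar my-collect-frame
  '((begin (setq my-collected nil))
    ;; curline is bound by Regi to the text of the matched line.
    (">" (progn (push curline my-collected) nil))
    ;; push builds the list backwards, so restore buffer order at the end.
    (end (setq my-collected (nreverse my-collected))))
  "Hypothetical frame that collects lines beginning with `>'.")

(defun my-collect-cited-lines ()
  "Return the lines of the current buffer that begin with `>'."
  (save-excursion
    (save-restriction
      (goto-char (point-min))
      (regi-interpret my-collect-frame (point-min) (point-max))))
  my-collected)
```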



8.2 Frames You Can Customize

As mentioned earlier, Supercite uses various frames to perform certain jobs such as mail header information extraction and mail header nuking. However, these frames are not available for you to customize, except through abstract interfaces such as sc-nuke-mail-header, et al.

The citation frames Supercite uses, on the other hand, provide a lot of customizing power and are available for you to change to suit your needs. The workhorse of citation is the frame contained in the variable sc-default-cite-frame. This frame recognizes many situations, such as blank lines, which it interprets as paragraph separators. It also recognizes previously cited nested and non-nested citations in the original message. By default it coerces non-nested citations into your preferred citation style and adds a level of citation to nested citations. It also cites uncited lines in your preferred style.

In a similar vein, there are default frames for unciting and reciting, contained in the variables sc-default-uncite-frame and sc-default-recite-frame respectively.

As mentioned earlier (see section 2.2 Recognizing Citations), citations are recognized through the values of the regular expressions sc-citation-root-regexp, et al. To recognize odd styles, you could modify these variables, or you could modify the default citing frame. Alternatively, you could set up association lists of frames for recognizing specific alternative forms.

For each of the actions (citing, unciting, and reciting) an alist is consulted to find the frame to use: sc-cite-frame-alist, sc-uncite-frame-alist, and sc-recite-frame-alist respectively. Each of these alists has the form:

 
((infokey (regexp . frame) (regexp . frame) ...)
 (infokey (regexp . frame) (regexp . frame) ...)
 (...))

Here infokey is a key suitable for sc-mail-field, regexp is a regular expression that is string-match'd against the value of that sc-mail-field key, and frame is the frame to use if a match occurs. frame can be a variable containing a frame, or an inlined frame.
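As a hypothetical example of the co-worker scenario described at the start of this chapter, the sketch below defines a frame that strips a leading `|' and then lets the entries of sc-default-cite-frame cite the line in your preferred style, and selects it for mail whose From: field matches a made-up address. It assumes the From: field is available under the infokey "from"; my-pipe-cite-frame and the address are illustrative only, and the sketch makes no attempt to preserve nesting.

```elisp
(require 'regi)
(require 'supercite)

(defvar my-pipe-cite-frame
  (cons
   ;; New first entry: delete a leading "|" plus one optional space or
   ;; tab, then return (continue) so the default entries that follow
   ;; see, and cite, the now-uncited line.
   '("|[ \t]?" (progn (when (looking-at "|[ \t]?")
                        (replace-match ""))
                      '(continue)))
   ;; The rest of the frame is Supercite's default citing frame.
   sc-default-cite-frame)
  "Hypothetical citing frame that coerces `|'-style citations.")

;; Use that frame for mail whose From: field matches the address below.
(setq sc-cite-frame-alist
      '(("from" ("coworker@example\\.com" . my-pipe-cite-frame))))
```

Since the alist names the variable my-pipe-cite-frame rather than inlining the frame, you can redefine the frame later without touching sc-cite-frame-alist.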

When Supercite is about to cite, uncite, or recite a region, it consults the appropriate alist and attempts to find a frame to use. If no frame is found in the alist, the appropriate default frame is used.
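The lookup described above can be modelled in a few lines. The function below is a hypothetical re-implementation written for illustration, not Supercite's own code. It takes the field lookup as a function argument (standing in for sc-mail-field) so it can be tried without a mail buffer, and it resolves a frame given as a variable name to that variable's value.

```elisp
(defun my-select-frame (alist field-fn default)
  "Return the frame that ALIST selects, or DEFAULT if none matches.
ALIST has the form ((INFOKEY (REGEXP . FRAME) ...) ...).  FIELD-FN is
called with each INFOKEY and should return the field's value as a
string, or nil.  A FRAME that is a symbol is treated as a variable
holding the frame.  Hypothetical model, not Supercite's code."
  (let ((found nil))
    ;; Try each infokey in order until some regexp matches its value.
    (while (and alist (not found))
      (let ((value (funcall field-fn (car (car alist))))
            (pairs (cdr (car alist))))
        (while (and value pairs (not found))
          (when (string-match (car (car pairs)) value)
            (setq found (cdr (car pairs))))
          (setq pairs (cdr pairs))))
      (setq alist (cdr alist)))
    (cond ((null found) default)                   ; no match: default frame
          ((symbolp found) (symbol-value found))   ; variable holding a frame
          (t found))))                             ; frame given inline
```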



This document was generated by XEmacs shared group account on December 19, 2009 using texi2html 1.65.