12. What's MIME?

Traditional messages, more precisely RFC822 messages, cannot contain objects other than text. MIME (Multipurpose Internet Mail Extensions) extends RFC822 so that messages can carry such objects.

A MIME message has the following field in its header:

 
MIME-Version: 1.0

Without this field, a message is a plain RFC822 message. In MIME, the important fields are Content-Type:, which indicates the data type, and Content-Transfer-Encoding:, which specifies the encoding. The following sections describe these fields and other features of MIME.

12.1 Labeling data type  
12.2 Encoding for transport-safe  
12.3 Multipart structure  
12.4 Header extensions  


12.1 Labeling data type

With MIME, the data type is specified in the Content-Type: (CT:) field. The following is an example message whose body is US-ASCII text.

 
MIME-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Subject: hello
From: Kazu

Hi all,

If CT: is omitted, the content is treated as "Text/Plain; charset=us-ascii". Likewise, if CT: is "Text/Plain" and no charset is specified, the charset is assumed to be US-ASCII.
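The default described above can be observed with a mail parser. The following is a minimal sketch using Python's standard `email` package; the sample message is hypothetical, and the "us-ascii" fallback is passed in explicitly by this example rather than supplied by the library.

```python
import email

# A hypothetical message that has no Content-Type: field at all.
raw = "Subject: hello\nFrom: Kazu\n\nHi all,\n"
msg = email.message_from_string(raw)

# get_content_type() falls back to the default type when CT: is missing.
content_type = msg.get_content_type()

# No charset parameter exists, so the fallback given here is returned.
charset = msg.get_content_charset("us-ascii")

print(content_type, charset)
```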

When CT: is a text type, the charset can be specified with the charset parameter. For Japanese, ISO-2022-JP is used.

MIME can embed multiple objects in a message body. This is called multipart. Each part of a multipart consists of a content-header and a content-body. CT: appears in a content-header as well as in the header. Conversely, you can regard the header as a special kind of content-header.

For more information, see section 12.3 Multipart structure.

Important CT: values are listed below.

`Text/Plain'
Text
`Message/Rfc822'
A message (possibly a MIME message) which has a header and a body
`Multipart/Mixed'
Multipart
`Application/Postscript'
PostScript
`Application/Octet-stream'
Binary stream. Can be considered as a binary file
`Image/Gif'
GIF
`Image/Jpeg'
JPEG
`Audio/Basic'
Audio file in AU format
`Video/Mpeg'
MPEG
`Message/External-body'
A phantom object whose real object exists outside of the message


12.2 Encoding for transport-safe

"uuencode" has long been used to transport binary data. It encodes three 8-bit characters into four 6-bit characters, but the result contains many kinds of symbols. Some of them have special meanings in a header, so uuencode cannot be used to extend header functionality.

The space character also troubles some transport systems. On BITNET, the file system cannot store a space character at the end of a line. Suppose that an object encoded with uuencode contains a space character at the end of a line. When a BITNET message gateway receives such a message, it removes the trailing space. As a result, receivers cannot decode and extract the original object.
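To see how a uuencoded line can end in spaces, the sketch below encodes three zero bytes with Python's `binascii.b2a_uu` and compares the result with Base64 output for the same input. The input bytes are a hypothetical sample chosen to trigger the problem; the sketch only shows the encoded form and does not simulate a BITNET gateway.

```python
import base64
import binascii

data = b"\x00\x00\x00"            # hypothetical sample input: three zero bytes

uu_line = binascii.b2a_uu(data)   # one uuencoded line, terminated by "\n"
b64_line = base64.b64encode(data)

# Traditional uuencode maps the 6-bit value 0 to a space character,
# so this line ends in spaces that a gateway might strip.
ends_with_space = uu_line.rstrip(b"\n").endswith(b" ")

print(repr(uu_line), ends_with_space)
print(repr(b64_line), b" " in b64_line)
```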

MIME specifies two encoding methods for the body.

Base64 encoding
Encodes three 8-bit characters into four 6-bit characters using 64 letters, "A-Za-z0-9+/". It originates from PEM.
Quoted-Printable encoding
Represents non-printable characters in hexadecimal preceded by "=".
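Both encodings can be tried with Python's standard `base64` and `quopri` modules. This is a small illustrative sketch; the input strings are arbitrary samples and not taken from any real message.

```python
import base64
import quopri

# Base64: three 8-bit characters become four letters of the 64-letter alphabet.
b64 = base64.b64encode(b"Man")

# Quoted-Printable: the non-ASCII byte 0xE9 is written as "=" plus two hex digits,
# while the printable ASCII letters stay readable.
qp = quopri.encodestring(b"caf\xe9")

print(b64, qp)
```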

The encoding is specified by the Content-Transfer-Encoding: (CTE:) field in a content-header. The possible values are as follows:

7bit
No encoding is applied. The content consists of 7 bit lines.
8bit
No encoding is applied. The content consists of 8 bit lines.
binary
No encoding is applied. The content is 8 bit stream.
base64
Encoded with Base64. The content consists of 7 bit lines.
quoted-printable
Encoded with Quoted-Printable. The content consists of 7 bit lines.

If CTE: is omitted, it is treated as `7bit'.
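A receiver uses CTE: to decide how to recover the original bytes. The following sketch relies on Python's standard `email` package; the sample message is hypothetical, and the "7bit" default is applied explicitly by the example code when the field is absent.

```python
import email

# A hypothetical message whose body is Base64-encoded.
raw = (
    "MIME-Version: 1.0\n"
    "Content-Type: Application/Octet-stream\n"
    "Content-Transfer-Encoding: base64\n"
    "\n"
    "SGVsbG8=\n"
)
msg = email.message_from_string(raw)

# Treat a missing CTE: as 7bit, as described above.
cte = msg.get("Content-Transfer-Encoding", "7bit").lower()

# decode=True undoes base64 or quoted-printable according to CTE:.
payload = msg.get_payload(decode=True)

print(cte, payload)
```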

Since ISO-2022-JP is a 7-bit character set, its CTE: is 7bit; that is, CTE: can be omitted. You may of course encode it with base64 or quoted-printable. However, messages encoded this way cannot be read directly in a folder, so I do not recommend it.
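The 7-bit property can be checked directly. The sketch below encodes a short Japanese string (an arbitrary sample) with Python's built-in `iso-2022-jp` codec and verifies that every resulting byte is below 0x80.

```python
text = "日本語"  # arbitrary sample text

encoded = text.encode("iso-2022-jp")

# Every byte fits in 7 bits: the kanji are represented by escape
# sequences followed by pairs of ordinary ASCII-range bytes.
is_seven_bit = all(byte < 0x80 for byte in encoded)

print(encoded, is_seven_bit)
```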


12.3 Multipart structure

If CT: is multipart, its content-body contains multiple objects. They are separated by a string specified in the "boundary" parameter. Let's look at an example.

 
Message-Id: <13060.789566615@iijlab.net>
From: Kazuhiko Yamamoto =?ISO-2022-JP?B?GyRCOzNLXE9CSScbKEI=?=
        <kazu@iijlab.net>
Subject: =?ISO-2022-JP?B?GyRCPC8kTjMoGyhC?=
To: m-sakura@ccs.mt.nec.co.jp
Mime-Version: 1.0
Content-Type: Multipart/Mixed; boundary=simple
Content-Transfer-Encoding: 7bit

--simple
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

Here is a picture of deer.

--Kazu
    
--simple
Content-Type: Image/Gif
Content-Transfer-Encoding: base64
Content-Description: "Deer on the Nara park"
    
R0lGODdhFwG8ANUAABETDCoYDC8lFi4dJxcnKTMwLkUUC04uG2opEkgeJ04yMWg4Ly1FLVJG
NWdSMywyTks1Tmc3RjdRVjNcalRMUG9UU1xbY051eG9pcIcxEp5bM8d1NI1VSJhrVrRwUpR0
cKZ1dcN9WXuHOWmHc7WJN6yLbcyEWNCZdDZjjml0i5t7im+TmGeRonWly5aLlrCLlK+arJmn
pbettMabktWumM+zsrnCrtTLua21ycq6x6/J3NbQ1+bk29na5dzp8+7w8ywAAAAAFwG8AAAG
/8CLcPhYtVgNyirWasZYEgDhIWGxRiXWcTIATHS/Hs6K2+1wt59azYtdJnBhKrVaWYcp7==
    
--simple--

In this case, the string "simple" is used. Each occurrence of the string specified in the "boundary" parameter is preceded by "--". The last one is also followed by "--".

Each part consists of a content-header and a content-body, separated by a null line just as a header and a body are. From another point of view, the header and the body are a special content-header and a special content-body, respectively.
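The structure can be inspected by parsing a small multipart message. This is a sketch using Python's standard `email` package; the message is a shortened, hypothetical variant of the example, and the Base64 line holds only the six-byte GIF signature rather than a real image.

```python
import email

raw = """\
MIME-Version: 1.0
Content-Type: Multipart/Mixed; boundary=simple

--simple
Content-Type: Text/Plain; charset=us-ascii

Here is a picture.

--simple
Content-Type: Image/Gif
Content-Transfer-Encoding: base64

R0lGODlh

--simple--
"""
msg = email.message_from_string(raw)

# For a multipart message, get_payload() returns the list of parts.
parts = msg.get_payload()

# Each part has its own content-header, including its own CT:.
types = [part.get_content_type() for part in parts]

# The second part is decoded according to its own CTE:.
image_bytes = parts[1].get_payload(decode=True)

print(types, image_bytes)
```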

When you send objects other than text, you should use multipart. It is of course not illegal to put, for example, Audio/Basic directly in the body, but receivers would be quite confused. It is kinder to put descriptive text in the first part and embed the Audio/Basic object in the second part.
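The recommended layout, text first and the object second, might be composed as follows. This is a sketch with Python's `email.message.EmailMessage`, not how Mew itself composes messages; the subject, file name, and audio bytes are hypothetical placeholders.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "A sound clip"

# First part: text describing the attached object.
msg.set_content("The attached file is a short audio clip.\n")

# Second part: the object itself. These bytes are a placeholder, not a real AU file.
audio_bytes = b"\x2e\x73\x6e\x64"
msg.add_attachment(audio_bytes, maintype="audio", subtype="basic", filename="clip.au")

types = [part.get_content_type() for part in msg.iter_parts()]
print(msg.get_content_type(), types)
```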

Multipart can be nested recursively, so a multipart can contain another multipart.
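A nested structure can be built and traversed as in the sketch below, which uses the `email.mime` classes from Python's standard library. The text contents are arbitrary samples.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Inner multipart with a single text part.
inner = MIMEMultipart("mixed")
inner.attach(MIMEText("inner text"))

# Outer multipart holding a text part and the inner multipart.
outer = MIMEMultipart("mixed")
outer.attach(MIMEText("outer text"))
outer.attach(inner)

# walk() visits the message itself and then every nested part, depth first.
types = [part.get_content_type() for part in outer.walk()]
print(types)
```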

Note that the preceding CRLF is considered part of a boundary. In the example above, the boundary is "CRLF--simpleCRLF".


12.4 Header extensions

The header contains information used for transport, so inserting improper characters that could make transport agents malfunction must be strictly avoided. With MIME, non-ASCII characters are encoded into transport-safe characters and then stored in a field value in the following format.

 
=?<charset>?<encoding>?<encoded-string>?=

<charset> is the same as the charset parameter of CT: Text/Plain. <encoding> is either `B' or `Q'. The former is exactly base64 and the latter is a variant of Quoted-Printable.

For ISO-2022-JP, `B' is encouraged. `Q' is also acceptable, but few message interfaces support it (Mew does, of course).

For instance, the author's Japanese name in Subject: is encoded as follows:

 
Subject: =?ISO-2022-JP?B?GyRCOzNLXE9CSScbKEI=?=
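The encoded field value above can be decoded by splitting it into its charset, encoding, and encoded string. The sketch below uses `email.header.decode_header` from Python's standard library; it illustrates the format and is not Mew's own decoder.

```python
from email.header import decode_header

field_value = "=?ISO-2022-JP?B?GyRCOzNLXE9CSScbKEI=?="

# decode_header() undoes the B (base64) encoding and reports the charset.
decoded = decode_header(field_value)
raw_bytes, charset = decoded[0]

# The raw bytes are ISO-2022-JP; convert them to a string with that charset.
text = raw_bytes.decode(charset)

print(charset, raw_bytes, text)
```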

This format can handle field values but not parameter values. One reason it must not be applied to parameter values is that the "=" character conflicts with the separator between parameter names and parameter values. To encode non-ASCII characters in a parameter value, another format (defined in RFC 2231) should be used. The following example shows the difference:

 
Content-Disposition: attachment;
 filename*=iso-2022-jp''%1B%24BF%7CK%5C8l%24N%25U%25%21%25%24%25k%1B%28B
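The parameter value above has the form charset, language, and percent-encoded bytes, separated by single quotes. The following sketch decodes it by hand with Python's `urllib.parse`; the variable names are chosen for this illustration only.

```python
from urllib.parse import unquote_to_bytes

# The value of the filename* parameter from the example above.
param_value = "iso-2022-jp''%1B%24BF%7CK%5C8l%24N%25U%25%21%25%24%25k%1B%28B"

# Split into charset, language (empty here), and the percent-encoded string.
charset, language, encoded = param_value.split("'", 2)

# Undo the percent-encoding, then interpret the bytes in the given charset.
raw_bytes = unquote_to_bytes(encoded)
filename = raw_bytes.decode(charset)

print(charset, repr(language), filename)
```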


This document was generated by XEmacs shared group account on December, 19 2009 using texi2html 1.65.