Email attachments are encoded into text format before sending to ensure that control characters aren’t sent over the Internet.
The text of your email is stored as a series of readable alphanumeric characters. However, graphics, spreadsheets, video files, and even word processing documents can contain characters that may be stored in any of the 256 different combinations of 1’s and 0’s that make up an 8-bit byte. For example, three different byte combinations are shown below:
01000010
01101101
01011011
Many byte combinations are readable, such as the ones above, which represent the characters “B”, “m”, and “[” respectively. Fewer than 100 of the 256 possible byte combinations represent the standard printable characters, including capital letters, lowercase letters, numbers, punctuation marks, and other characters found on most computer keyboards. Beyond these older codes, the Unicode Consortium maintains a standard that assigns codes consistently to a much wider range of characters across platforms, programs, and languages.
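As a small illustration, the three byte patterns shown above can be converted back to characters in Python. This is only a sketch; the variable names are chosen for the example, and the count of printable characters assumes the plain ASCII range from the space character (code 32) through the tilde (code 126).

```python
# The three 8-bit patterns from the text, written as strings of 1s and 0s.
bit_patterns = ["01000010", "01101101", "01011011"]

# int(bits, 2) reads the pattern as a base-2 number; chr() maps it to a character.
characters = [chr(int(bits, 2)) for bits in bit_patterns]
print(characters)  # ['B', 'm', '[']

# Printable ASCII runs from code 32 (space) to code 126 (tilde).
printable_codes = list(range(32, 127))
print(len(printable_codes))  # 95, well short of the 256 possible byte values
```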
However, many of the byte combinations that don’t represent readable characters are also used as instructions. If bytes containing these instruction codes were transmitted over the Internet, at a minimum the message would be broken into separate pieces, and at worst the instruction bytes would unintentionally tell the routers, switches, and other components that handled your email to take all sorts of unpredictable actions. In practice, the software systems between your email program and the Internet (email client, operating system, networking software) might catch any strange byte combinations before they hit the wider network, but since this is not a standard case, the results can be unpredictable. For example, this writer once sent an attachment with an experimental email system that didn’t encode attachments properly, and the attachment was received as a zero-length file, truncated of all content by an intervening software layer that presumably stopped reading as soon as it saw an unconverted control character.
To protect against this problem, email programs routinely encode attached files before mailing them, converting any non-readable bytes into readable characters in a predictable, reversible way. When the recipient’s email program receives the attachment and it is downloaded onto their machine, their email program decodes the attachment according to a standard procedure to reconstruct the original file.
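The encode-then-decode round trip can be sketched with Python’s standard `email` package. This is an illustration only, not how any particular email program is implemented; the subject line, body text, and filename `data.bin` are made up for the example, and the payload is simply every possible byte value so that all the control codes are included.

```python
from email.message import EmailMessage

# A stand-in "file" containing all 256 byte values, control codes included.
original = bytes(range(256))

msg = EmailMessage()
msg["Subject"] = "Report attached"          # hypothetical subject
msg.set_content("The file is attached.")    # readable body text
msg.add_attachment(
    original,
    maintype="application",
    subtype="octet-stream",
    filename="data.bin",                    # hypothetical filename
)

# Sender side: the attachment part now carries text only, plus a header
# naming the encoding that was applied.
part = next(msg.iter_attachments())
encoding_label = part["Content-Transfer-Encoding"]
wire_text = part.get_payload()
print(encoding_label)
print(wire_text[:40])

# Recipient side: decoding reconstructs the original bytes exactly.
recovered = part.get_content()
print(recovered == original)
```

The header printed first is what tells the receiving program which decoder to apply.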
Each encoded file includes an instruction that tells the email recipient what type of encoding program was used. There are a number of more or less standard encoding methods, including MIME, uuencode, BinHex, and AppleDouble (Mac version of MIME). Most email programs can decode most of the common standards. Two of the most common standards are described below:
- MIME. The Base64 encoding used by MIME was first defined in paragraph 4.3 of RFC 989, updated by paragraph 4.3.2.4 of RFC 1421, and has become the most common standard used for email encoding. MIME encodes a file into the following 64 text characters:
ABCDEFGHIJKLMNOPQRSTUVWXYZ
abcdefghijklmnopqrstuvwxyz0123456789+/
- Uuencode. Uuencode was one of the earliest encoding standards, first developed on the Unix BSD operating system. For many years there was no formal standards document defining uuencode, which led to incompatible implementations until later versions were generally built to comply with the POSIX standard P1003.2b/D11, later IEEE Std 1003.1-2001. These later versions incorporated the MIME encoding as an option. Most of the earlier versions encoded a file into the following 64 text characters:
`!"#$%&'()*+,-./0123456789:;<=>?@
ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_
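Both methods work the same basic way: three 8-bit bytes (24 bits) are regrouped into four 6-bit values, and each value selects one of 64 text characters. The sketch below shows this for a single 3-byte group and checks the result against Python’s standard `base64` module. The helper `encode_three_bytes` is a hypothetical name written for this example; it ignores padding and line wrapping, which real encoders must handle. The uuencode line is produced with `binascii.b2a_uu`, using the backtick variant of the character set.

```python
import base64
import binascii

B64_ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz0123456789+/"
)

def encode_three_bytes(chunk: bytes) -> str:
    """Map exactly 3 bytes (24 bits) to 4 characters (6 bits each)."""
    if len(chunk) != 3:
        raise ValueError("this sketch handles full 3-byte groups only")
    value = int.from_bytes(chunk, "big")
    # Take the 24-bit number six bits at a time, most significant first.
    return "".join(
        B64_ALPHABET[(value >> shift) & 0x3F] for shift in (18, 12, 6, 0)
    )

sample = b"Bm["  # the three example bytes from earlier in the article
manual = encode_three_bytes(sample)
library = base64.b64encode(sample).decode("ascii")
print(manual, library)  # Qm1b Qm1b

# Uuencode character set: ASCII codes 33..95, with a backtick standing in
# for the space character (code 32).
UU_ALPHABET = "`" + "".join(chr(code) for code in range(33, 96))

# One uuencoded line: a length character, the encoded data, and a newline.
uu_line = binascii.b2a_uu(sample, backtick=True)
print(uu_line)
print(binascii.a2b_uu(uu_line) == sample)
```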
You are sometimes given a choice in your email application settings for which encoding method your program should use. The best choice is usually MIME, which almost all email programs support. However, if somebody can’t read an attachment that you send them, try setting the encoding method to AppleDouble (same as MIME on a Mac), BinHex, or Uuencode in that order. Remember to change your settings back to MIME for sending to everyone else.
Resources. The following RFC provides a good description of some current encoding standards, including Base64, the encoding that MIME uses for attachments.
- RFC 3548; S. Josefsson, Ed.; The Base16, Base32, and Base64 Data Encodings; July 2003.