Mbox Mailbox Format¶
For information on how to configure mbox in Dovecot, see Mbox Configuration.
Warning
Mbox format is considered deprecated and is maintained primarily for backwards compatibility and utility purposes (specifically for archival purposes, as mbox allows multiple messages to be natively stored in a single file).
mbox is not being maintained for write fixes or general feature or optimization improvements, so using it to actively store production data is not advised.
In a production system, a more modern mailbox format should be used, e.g., dbox Mailbox Format (or Maildir Mailbox Format).
Overview¶
Usually UNIX systems are configured by default to deliver mails to /var/mail/username or /var/spool/mail/username mboxes. In the IMAP world, these files are called INBOX mailboxes. IMAP protocol supports multiple mailboxes, so there needs to be a place for them as well. Typically they’re stored in ~/mail/ or ~/Mail/ directories.
The mbox file contains all the messages of a single mailbox. Because of this, the mbox format is typically thought of as a slow format. However, with Dovecot’s indexing this isn’t true: only expunging messages from the beginning of a large mbox file is slow, and most other operations should be fast. Also, because all the mails are in a single file, searching is much faster than with Maildir Mailbox Format.
Modifications to mbox may require moving data around within the file, so interruptions (e.g. power failures) can cause the mbox to break more or less badly. Although Dovecot tries to minimize the damage by moving the data in a way that data should never get lost (only duplicated), mboxes still aren’t recommended for important data.
History¶
The history of mbox format, and a discussion of its historical use and generally agreed-upon conventions, can be found in RFC 4155.
Additionally, the mbox Wikipedia page also is a good resource.
Locking¶
Locking is a mess with mboxes. There are multiple different ways to lock a mbox, and software often uses incompatible locking. See Mbox Locking for how to check what locking methods some commonly used programs use.
There are at least four different ways to lock a mbox:
Method | Description |
---|---|
dotlock | |
flock | |
fcntl | Very similar to flock, also commonly used by software. In some systems this |
lockf | POSIX |
Dotlock¶
Another problem with dotlocks is that if the mailboxes exist in /var/mail/, the user may not have write access to the directory, so the dotlock file can’t be created. There are a couple of ways to work around this:
Give a mail group write access to the directory and then make sure that all software requiring access to the directory runs with the group’s privileges. This may mean making the binary itself setgid-mail, or using a separate dotlock helper program which is setgid-mail. With Dovecot this can be done by setting mail_privileged_group = mail.
Set the sticky bit on the directory (chmod +t /var/mail). This makes it somewhat safe to use, because users can’t delete each other’s mailboxes, but they can still create new files (the dotlock files). The downside is that users can create whatever files they wish in there, such as a mbox for a newly created user who hasn’t yet received mail.
Deadlocks¶
If multiple lock methods are used, which is usually the case since dotlocks aren’t typically used for read locking, the order in which the locking is done is important. Consider two programs running at the same time, both using dotlock and fcntl locking but in a different order:
Program A: fcntl locks the mbox
Program B at the same time: dotlocks the mbox
Program A continues: tries to dotlock the mbox, but since it’s already dotlocked by B, it starts waiting
Program B continues: tries to fcntl lock the mbox, but since it’s already fcntl locked by A, it starts waiting
Now both of them are waiting for each other’s locks. Finally, after a couple of minutes, they time out and fail the operation.
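The deadlock above goes away when every program takes its locks in the same order. The following is a minimal Python sketch of that idea, not Dovecot’s implementation: locked_mbox is a hypothetical helper, the dotlock-then-fcntl order is simply the convention chosen for this example, and the .lock suffix for the dotlock file is an assumption. It uses Python’s fcntl.lockf(), which is implemented with fcntl() locks, and only runs on UNIX-like systems.

```python
import fcntl
import os
import time
from contextlib import contextmanager


def _wait_until(deadline, poll, what):
    """Sleep for one poll interval, or give up once the deadline has passed."""
    if time.monotonic() >= deadline:
        raise TimeoutError("timed out waiting for " + what)
    time.sleep(poll)


@contextmanager
def locked_mbox(path, timeout=5.0, poll=0.1):
    """Hypothetical helper: take the dotlock first, the fcntl lock second.

    The order itself is this sketch's choice. What prevents the deadlock
    is that all cooperating programs use the same order.
    """
    dotlock = path + ".lock"  # assumed dotlock file name
    deadline = time.monotonic() + timeout

    # Step 1: dotlock. O_EXCL makes creation fail if the lock file exists.
    while True:
        try:
            os.close(os.open(dotlock, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644))
            break
        except FileExistsError:
            _wait_until(deadline, poll, "dotlock")

    try:
        with open(path, "r+b") as f:
            # Step 2: fcntl lock, polled so that this wait is bounded too.
            while True:
                try:
                    fcntl.lockf(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
                    break
                except (BlockingIOError, PermissionError):
                    _wait_until(deadline, poll, "fcntl lock")
            try:
                yield f
            finally:
                fcntl.lockf(f, fcntl.LOCK_UN)
    finally:
        # Release in reverse order: the dotlock is removed last.
        os.unlink(dotlock)
```

A caller would wrap every write in "with locked_mbox(path) as f:". A second program using the same helper waits at the dotlock step and never holds the fcntl lock while it waits, so the circular wait described above cannot form.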
Directory Structure¶
By default, when listing mailboxes, Dovecot simply assumes that all files it sees are mboxes and all directories mean that they contain sub-mailboxes. There are two special cases however which aren’t listed:
.subscriptions file contains IMAP’s mailbox subscriptions.
.imap/ directory contains Dovecot’s index files.
Because it’s not possible to have a file which is also a directory, it’s not normally possible to create a mailbox and child mailboxes under it.
However if you really want to be able to have mailboxes containing both messages and child mailboxes under mbox, then Dovecot can be configured to do this, subject to certain provisos; see Mbox Child Folders Configuration.
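The default listing rule described in this section can be sketched in a few lines of Python. This is an illustration only: list_mailboxes is a hypothetical function, and using "/" as the hierarchy separator is an assumption of the example rather than something stated here.

```python
import os

# Names that the text says are never listed as mailboxes.
SPECIAL_NAMES = {".subscriptions", ".imap"}


def list_mailboxes(root, prefix=""):
    """Return mailbox names found under root.

    Every file is treated as an mbox and every directory as a container
    of child mailboxes. "/" as the separator is an assumption.
    """
    names = []
    for name in sorted(os.listdir(root)):
        if name in SPECIAL_NAMES:
            continue
        path = os.path.join(root, name)
        if os.path.isdir(path):
            # A directory only holds children; it is not a mailbox itself.
            names.extend(list_mailboxes(path, prefix + name + "/"))
        else:
            names.append(prefix + name)
    return names
```

Because a directory is never returned as a mailbox in this sketch, it also shows why a name cannot normally hold both messages and child mailboxes.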
Dovecot’s Metadata¶
Dovecot uses c-client (i.e. UW-IMAP, Pine) compatible headers in mbox messages to store metadata. These headers are:
Header | Description |
---|---|
X-IMAPbase | Contains UIDVALIDITY, last used UID, and list of used keywords |
X-IMAP | Same as X-IMAPbase but also specifies that the message is a “pseudo-message” |
X-UID | Message’s allocated UID |
Status | R (Seen) and O (non-Recent) flags |
X-Status | A (Answered), F (Flagged), T (Draft), and D (Deleted) flags |
X-Keywords | Message’s keywords |
Content-Length | Length of the message body in bytes |
Whenever any of these headers exist, Dovecot treats them as its own private metadata. It does sanity checks for them, so the headers may also be modified or removed completely. None of these headers are sent to IMAP/POP3 clients when they read the mail.
The MTA, MDA or LDA should strip all these headers case-insensitively before writing the mail to the mbox.
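As a rough illustration of that stripping step, the Python sketch below removes the headers from the table above, matching names case-insensitively and dropping folded continuation lines along with their header. strip_private_headers is a hypothetical helper for this example, not part of any MTA, MDA or LDA.

```python
# Header names from the metadata table, lowercased for comparison.
PRIVATE_HEADERS = {
    "x-imapbase", "x-imap", "x-uid", "status",
    "x-status", "x-keywords", "content-length",
}


def strip_private_headers(header_block):
    """Return the header block without Dovecot's private metadata headers."""
    kept = []
    dropping = False
    for line in header_block.splitlines(keepends=True):
        if line[:1] in (" ", "\t"):
            # Folded continuation line: shares the fate of its header.
            if not dropping:
                kept.append(line)
            continue
        name = line.split(":", 1)[0].strip().lower()
        dropping = name in PRIVATE_HEADERS
        if not dropping:
            kept.append(line)
    return "".join(kept)
```

For example, a block containing "x-uid: 99" and "STATUS: RO" loses both lines regardless of capitalization, while Subject and From pass through unchanged.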
Only the first message contains the X-IMAP or X-IMAPbase header. The difference is that when all the messages are deleted from the mbox file, a pseudo-message is written to the mbox which contains the X-IMAP header. This is the DON'T DELETE THIS MESSAGE -- FOLDER INTERNAL DATA message which you hate seeing when using non-c-client and non-Dovecot software. It is important for preventing abuse, however: otherwise the first mail received could contain a faked X-IMAPbase header, which could cause trouble.
If a message contains an X-Keywords header, it contains a space-separated list of keywords for the mail. Since the same header can come from the mail’s sender, only the keywords that are listed in the X-IMAP header are used.
The UID for a new message is calculated from the last used UID in the X-IMAP header + 1. This is always done, so fake X-UID headers don’t really matter. This is also why the pseudo-message is important. Otherwise the UIDs could easily grow over 2^31, which some clients start treating as negative numbers, which then causes all kinds of problems. Also, when 2^32 is exceeded, Dovecot itself will start having some problems.
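The keyword and UID rules above can be sketched in Python as follows. The exact layout of the X-IMAPbase value is not spelled out here, so the sketch assumes whitespace-separated fields in the order UIDVALIDITY, last used UID, keywords. All three function names are hypothetical.

```python
def parse_imapbase(value):
    """Split an X-IMAPbase / X-IMAP value into its parts.

    Assumed layout: "<uidvalidity> <last uid> [keyword ...]".
    """
    fields = value.split()
    return int(fields[0]), int(fields[1]), fields[2:]


def next_uid(last_uid):
    """A new message always gets last used UID + 1; X-UID is not consulted."""
    uid = last_uid + 1
    if uid >= 2**32:
        raise OverflowError("UID does not fit in 32 bits")
    return uid


def filter_keywords(x_keywords, allowed):
    """Keep only keywords that are also listed in the X-IMAP header."""
    return [kw for kw in x_keywords.split() if kw in allowed]


def may_look_negative(uid):
    """True once a UID reaches 2^31, where a signed 32-bit client wraps."""
    return uid >= 2**31
```

With a header value of "1700000000 0000000041 $Forwarded Work", the next UID is 42, and an X-Keywords value of "Work Spam" is reduced to just Work because Spam is not in the allowed list.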
Content-Length is used as long as another valid mail starts after that many bytes. Because the byte count must be exact, it’s quite unlikely that abusing it can cause messages to be skipped (or rather appended to the previous message’s body).
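A minimal Python sketch of that check is shown below. It is not Dovecot’s code: content_length_is_valid is a hypothetical function, and treating end of file, a “From “ line, or a blank line followed by a “From “ line as acceptable is an assumption of this example.

```python
def content_length_is_valid(mbox, body_start, content_length):
    """Decide whether a Content-Length value can be trusted.

    mbox is the whole file as bytes and body_start is the offset of the
    first body byte. The claimed length is accepted only if the data
    right after the body is the end of the file or the start of another
    message (assumption: optionally preceded by one blank line).
    """
    end = body_start + content_length
    if content_length < 0 or end > len(mbox):
        return False
    rest = mbox[end:]
    if rest in (b"", b"\n"):
        return True
    return rest.startswith(b"From ") or rest.startswith(b"\nFrom ")
```

A value that is off by even a few bytes lands in the middle of a line and is rejected, which is why a forged Content-Length is unlikely to swallow the next message.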
Status and X-Status headers are trusted completely, so it’s a pretty good idea to filter them in the LDA if possible.
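To show why trusting these two headers matters, here is a small Python sketch that maps their letters to IMAP flags using the table earlier in this section. imap_flags is a hypothetical function, and deriving \Recent from the absence of O is this example’s reading of “O (non-Recent)”.

```python
# Letter-to-flag mapping taken from the metadata table.
STATUS_FLAGS = {"R": "\\Seen"}
X_STATUS_FLAGS = {
    "A": "\\Answered",
    "F": "\\Flagged",
    "T": "\\Draft",
    "D": "\\Deleted",
}


def imap_flags(status="", x_status=""):
    """Return the set of IMAP flags implied by Status and X-Status values."""
    flags = {STATUS_FLAGS[c] for c in status if c in STATUS_FLAGS}
    flags |= {X_STATUS_FLAGS[c] for c in x_status if c in X_STATUS_FLAGS}
    # "O" marks the message as non-Recent, so its absence implies \Recent.
    if "O" not in status:
        flags.add("\\Recent")
    return flags
```

A sender who includes "X-Status: D" in a message would have it show up as deleted if the header reached the mbox unfiltered.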
Dovecot’s Speed Optimizations¶
Updating messages’ flags and keywords can be a slow operation since you may have to insert a new header (Status, X-Status, X-Keywords) or at least insert data in the header’s value. Some mbox MUAs do this simply by rewriting all of the mbox after the inserted data. If the mbox is large, this can be very slow. Dovecot optimizes this by always leaving some space characters after some of its internal headers. It can use this space to move only the minimal amount of data necessary to get the new data inserted. Also, if data is removed, it just grows these space areas.
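The padding idea can be illustrated with a short Python sketch. It only covers the simplest case, where the new value fits entirely inside the existing value plus its trailing spaces. update_padded_header and its parameters are hypothetical names for this example.

```python
def update_padded_header(buf, value_start, width, new_value):
    """Overwrite a header value in place when it fits in the padded field.

    buf is a bytearray holding the mbox, value_start is the offset of the
    header value, and width is the length of the current value plus its
    trailing space padding. Returns False when the value does not fit, in
    which case the caller would have to move data around (the slow path).
    """
    if len(new_value) > width:
        return False
    # Same number of bytes in and out, so nothing after the field moves.
    buf[value_start:value_start + width] = new_value.ljust(width, b" ")
    return True
```

Shrinking a value works the same way: the shorter value is written and the freed bytes become extra padding, so the file size never changes on this path.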
Several configuration options affect these optimizations; see Mbox Configuration.
From Escaping¶
In mboxes a new mail always begins with a “From “ line, commonly referred to as a From_-line. To avoid confusion, lines beginning with “From “ in message bodies are usually prefixed with a ‘>’ character while the message is being written to the mbox.
Dovecot doesn’t currently do this escaping however. Instead it prevents this confusion by adding Content-Length headers so it knows later where the next message begins. Dovecot also doesn’t remove the ‘>’ characters before sending the data to clients.
Mbox Variants¶
There are a few minor variants of this format:
Name | Description |
---|---|
mboxo | The original mbox format, which originated with Unix System V. Messages are stored in a single file, with each message beginning with a line containing “From SENDER DATE”. If “From “ (case-sensitive, with the space) occurs at the beginning of a line anywhere in the email, it is escaped with a greater-than sign (to “>From “). Lines already quoted as such, for example “>From “ or “>>>From ”, are not quoted again, which leads to irrecoverable corruption of the message content. |
mboxrd | Named for Rahul Dhesi in June 1995, though several people came up with the same idea around the same time. An issue with the mboxo format was that if the text “>From “ appeared in the body of an email (such as from a reply quote), it was not possible to distinguish this from the mailbox format’s quoted “>From “. mboxrd fixes this by always quoting already quoted “From “ lines (e.g. “>From “, “>>From “, “>>>From “, etc.) as well, so readers can just remove the first “>” character. This format is used by qmail and getmail (>=4.35.0). |
mboxcl | Originated with Unix System V Release 4 mail tools. It adds a Content-Length field which indicates the number of bytes in the message. This is used to determine message boundaries. It still quotes “From “ as the original mboxo format does (and not as mboxrd does it). |
mboxcl2 | Like mboxcl but does away with the “From “ quoting. Dovecot uses this format internally. |
MMDF | (Multi-channel Memorandum Distribution Facility mailbox format) originated with the MMDF daemon. The format surrounds each message with lines containing four control-A’s. This eliminates the need to escape “From “ lines. |
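The difference between mboxo and mboxrd quoting is easiest to see side by side. The Python sketch below implements both rules as described in the table. The function names are made up for this example.

```python
import re


def mboxo_quote(body):
    """mboxo: quote only unquoted "From " lines."""
    return re.sub(r"^From ", ">From ", body, flags=re.M)


def mboxrd_quote(body):
    """mboxrd: add one ">" to "From " lines at any existing quoting depth."""
    return re.sub(r"^(>*From )", r">\1", body, flags=re.M)


def mboxrd_unquote(body):
    """mboxrd reader: remove exactly one ">" from quoted "From " lines."""
    return re.sub(r"^>(>*From )", r"\1", body, flags=re.M)
```

Given a body with the two lines “From here” and “>From there”, mboxo_quote produces “>From here” and “>From there”, so the original can no longer be told apart. mboxrd_quote produces “>From here” and “>>From there”, and mboxrd_unquote restores the input exactly.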
How a message stored in an mbox file is read¶
An email client reader scans through the mbox file looking for From_ lines.
Any From_ line marks the beginning of a message.
Once the reader finds a message, it extracts a (possibly corrupted) envelope sender and delivery date out of the From_ line.
It then reads until the next From_ line or until the end of the file, whichever comes first.
It removes the last blank line and deletes the quoting of >From_ lines, >>From_ lines and so on.