Transaction log¶
See Mail index file format for an overview of what the transaction log is and what its benefits are.
File format¶
The transaction log is usually appended to. Once in a while the log gets rotated into .log.2 and a new .log is created. During this rotation the main index is also recreated.
The log begins with a header (struct mail_transaction_log_header) followed by transaction records. Each transaction record begins with a header (struct mail_transaction_header) followed by transaction type specific content. The transaction types are described in enum mail_transaction_type.
A single transaction record may contain multiple changes of the same type, although some types don’t allow this. Because the size of the transaction record for each type is known (or can be determined from the type-specific record contents), the mail_transaction_header.size field can be used to figure out how many changes there are. For example, a transaction record that appends two mails:
struct mail_transaction_header {
.type = MAIL_TRANSACTION_APPEND,
.size = sizeof(struct mail_index_record) * 2
}
struct mail_index_record { .uid = 1, .flags = 0 }
struct mail_index_record { .uid = 2, .flags = 0 }
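The counting rule described above can be sketched as follows. This is a minimal illustration, not Dovecot's code: the sketch_ structs and sketch_change_count() are hypothetical, simplified stand-ins, and the real layout and size encoding are defined in lib-index/mail-transaction-log.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the real structs; the field
   layout here is illustrative only. */
struct sketch_transaction_header {
	uint32_t type;
	uint32_t size; /* bytes of type-specific content, as in the example above */
};

struct sketch_index_record {
	uint32_t uid;
	uint8_t flags;
};

/* Returns the number of fixed-size changes in a record, or -1 if the
   content size is not a multiple of the per-change size, which would
   suggest a corrupted record. */
static long sketch_change_count(const struct sketch_transaction_header *hdr,
				size_t change_size)
{
	assert(hdr != NULL && change_size > 0);
	if (hdr->size % change_size != 0)
		return -1;
	return (long)(hdr->size / change_size);
}
```

For the append example above, passing sizeof(struct sketch_index_record) as the change size yields two changes.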
To support multi-record transactions, there’s a MAIL_TRANSACTION_BOUNDARY type. The transactions inside a boundary are always processed at the same time; if the write was interrupted before all of them were written, the partial transaction is truncated away from the log file. However, there’s no further guarantee that the subsequent transaction processing will either succeed or fail as a whole. Generally it should, but the mailbox syncing could fail partially, for example due to storage problems.
See lib-index/mail-transaction-log.h in the source code for details.
Internal vs. external¶
Transactions are either internal or external. The difference is that external transactions describe changes that were already made to the mailbox, while internal transactions are commands to do something to the mailbox. This is especially relevant with mailbox formats that support changes to them done outside Dovecot, like mbox or Maildir. When beginning to synchronize such a mailbox with index files, the index file is first updated with all the external changes, and the uncommitted internal transactions are applied on top of them.
Synchronizing the mailbox through the synchronization transaction writes only external transactions. Likewise, if the index file is updated when saving new mails to the mailbox, the append transactions must be external, because the changes are already in the mailbox at the time the transaction is read.
Reading and writing¶
Appending to an existing dovecot.index.log file locks it exclusively using the index files’ default lock method (lock_method setting). The transaction log files are opened with the O_APPEND flag, which usually makes the writes appear atomic (this doesn’t seem to be actually guaranteed, but in practice it seems to happen).
Reading transaction logs doesn’t require any locking at all. Due to the O_APPEND behavior the reads typically don’t even see partially written transactions (although they are also handled properly).
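As a rough sketch of the write path described above, the following appends one record under an exclusive lock using a single write() on an O_APPEND descriptor. It assumes fcntl() locking; the actual method depends on the lock_method setting. The function name sketch_log_append() is hypothetical and the record is treated as an opaque byte buffer.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Appends one already-serialized transaction record to the log at `path`.
   Returns 0 on success, -1 on failure. Assumes fcntl() locking. */
static int sketch_log_append(const char *path, const void *rec, size_t len)
{
	struct flock lock = { .l_type = F_WRLCK, .l_whence = SEEK_SET };
	int fd, ret = -1;

	assert(rec != NULL && len > 0);
	fd = open(path, O_WRONLY | O_APPEND);
	if (fd == -1)
		return -1;
	/* Exclusive lock for writers; l_start = 0 and l_len = 0 cover the
	   whole file. Readers take no lock at all. */
	if (fcntl(fd, F_SETLKW, &lock) == 0) {
		/* One write() call, so the record lands at the end of the
		   file in one piece in the common case. */
		if (write(fd, rec, len) == (ssize_t)len)
			ret = 0;
		lock.l_type = F_UNLCK;
		(void)fcntl(fd, F_SETLK, &lock);
	}
	close(fd);
	return ret;
}
```

A short write is reported as a failure here; a real writer would also have to deal with the partially written record that is left behind.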
A new log is created by first creating a dovecot.index.log.newlock dotlock file, which guarantees that the process itself is the only one creating a new log. After this, the process checks whether another process had just recreated the dovecot.index.log. If it had, there’s no need to recreate it again. If not, the process finishes writing the log header to the newlock file and finally rename()s it to dovecot.index.log.
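The creation sequence above might look roughly like this. It is a simplified sketch: the newlock file is created with O_EXCL instead of Dovecot's dotlock code (which also handles stale locks and timeouts), the "was it just recreated" check is reduced to an existence test, and sketch_log_create() with its parameters is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if this process created the log, 0 if another process holds
   the newlock file or has already created the log, -1 on error. */
static int sketch_log_create(const char *log_path, const char *newlock_path,
			     const void *hdr, size_t hdr_len)
{
	int fd;

	assert(hdr != NULL && hdr_len > 0);
	/* O_EXCL: only one process can win the creation of the newlock file */
	fd = open(newlock_path, O_WRONLY | O_CREAT | O_EXCL, 0600);
	if (fd == -1)
		return errno == EEXIST ? 0 : -1;
	/* Simplified check: someone else already (re)created the log */
	if (access(log_path, F_OK) == 0) {
		close(fd);
		unlink(newlock_path);
		return 0;
	}
	/* Write the header to the newlock file, then move it into place */
	if (write(fd, hdr, hdr_len) != (ssize_t)hdr_len ||
	    rename(newlock_path, log_path) == -1) {
		close(fd);
		unlink(newlock_path);
		return -1;
	}
	close(fd);
	return 1;
}
```

Because rename() replaces the target atomically, readers see either no log or a log with a complete header, never a half-written one.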
Transaction logs are always only appended to, with one exception: If a partially written transaction is found, it’ll be truncated away and the new transaction is written on top of the old truncated data. FIXME: This should be avoided, rather rotate the transaction log instead.
UIDs¶
Many record types contain uint32_t uid1, uid2 fields. This means that the changes apply to all the messages in the uid1..uid2 range. Dovecot used to optimize these so that if, for example, the first 3 messages in a mailbox were 1,100,1000, these could have been referred to with { uid1=1, uid2=1000 }. However, this is no longer done for a few reasons:
Another session might still have the same mailbox open without having synced away the expunged messages (IMAP sessions can do this by not sending any commands that allow syncing expunges). For example it might still see that the 2nd mail is uid=50. It would be confusing that the already expunged mail gets flag updates.
It makes it more difficult to debug problems when it’s not clear which messages exactly were intended to be referred to.
There are likely also some issues related to dsync.
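To make the first reason concrete, here is a small sketch of applying a uid1..uid2 range to the messages a session currently sees. The function sketch_apply_uid_range() and its array-based mailbox view are hypothetical and are not how Dovecot stores messages.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sets `flag` on every message whose UID is in uid1..uid2 (inclusive).
   `uids` must be sorted ascending. Returns the number of messages changed. */
static size_t sketch_apply_uid_range(const uint32_t *uids, uint8_t *flags,
				     size_t count, uint32_t uid1,
				     uint32_t uid2, uint8_t flag)
{
	size_t i, changed = 0;

	assert(uid1 <= uid2);
	for (i = 0; i < count && uids[i] <= uid2; i++) {
		if (uids[i] >= uid1) {
			flags[i] |= flag;
			changed++;
		}
	}
	return changed;
}
```

In a session that still sees UIDs 1, 50, 100 and 1000, the wide range { uid1=1, uid2=1000 } also touches the already expunged UID 50, whereas exact ranges such as { uid1=100, uid2=100 } touch only the intended message.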
Expunges¶
Because expunges actually destroy messages, they deserve some extra protection to make it less likely to accidentally expunge wrong messages, for example in case of file corruption. The expunge transactions must have MAIL_TRANSACTION_EXPUNGE_PROT ORed to the transaction type field. If an expunge type is found without it, assume a corrupted transaction log.
MAIL_TRANSACTION_EXPUNGE_GUID is preferred for expunging messages over MAIL_TRANSACTION_EXPUNGE, which just lists the UIDs. The mailbox syncing looks up the actual GUID for the referred mail, and verifies that it matches the GUID in the expunge request. If they don’t match, something’s corrupted and the mail won’t be expunged. The expunge GUID records also improve dsync behavior.
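Both checks can be sketched as below. The SKETCH_ bit values are placeholders chosen for illustration (the real values are in enum mail_transaction_type), the 16-byte GUID size is an assumption, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Placeholder bit values for illustration only */
#define SKETCH_EXPUNGE      0x0001u
#define SKETCH_EXPUNGE_GUID 0x0002u
#define SKETCH_EXPUNGE_PROT 0x0f00u
#define SKETCH_GUID_SIZE    16 /* assumed GUID size in bytes */

/* Returns false if the type is an expunge without the protection bits,
   in which case the log should be treated as corrupted. */
static bool sketch_expunge_type_valid(uint32_t type)
{
	bool is_expunge = (type & (SKETCH_EXPUNGE | SKETCH_EXPUNGE_GUID)) != 0;
	bool has_prot = (type & SKETCH_EXPUNGE_PROT) == SKETCH_EXPUNGE_PROT;

	return !is_expunge || has_prot;
}

/* Returns true only if the mail's actual GUID equals the GUID in the
   expunge request; on a mismatch the mail must not be expunged. */
static bool sketch_expunge_guid_matches(const uint8_t *mail_guid,
					const uint8_t *req_guid)
{
	assert(mail_guid != NULL && req_guid != NULL);
	return memcmp(mail_guid, req_guid, SKETCH_GUID_SIZE) == 0;
}
```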
Flags and keywords¶
The IMAP protocol supports changing both flags and keywords with the same STORE command, but in Dovecot index fields they are handled separately.
Flags can be added/removed with MAIL_TRANSACTION_FLAG_UPDATE while keywords can be added/removed with MAIL_TRANSACTION_KEYWORD_UPDATE. Keywords can be completely cleared out with MAIL_TRANSACTION_KEYWORD_RESET.
To completely replace all flags and keywords with wanted ones, set:
MAIL_TRANSACTION_FLAG_UPDATE: Set mail_transaction_flag_update.add_flags to the wanted system flags and .remove_flags = 0xff.
MAIL_TRANSACTION_KEYWORD_RESET to remove all keywords.
MAIL_TRANSACTION_KEYWORD_UPDATE to set back the wanted keywords.
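The flag part of this replacement can be illustrated with a small sketch. It assumes that removals are applied before additions, which is what makes remove_flags = 0xff combined with add_flags behave as a full replacement. The struct is a simplified, hypothetical version of struct mail_transaction_flag_update.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical flag update record */
struct sketch_flag_update {
	uint32_t uid1, uid2;
	uint8_t add_flags;
	uint8_t remove_flags;
};

/* Applies one update to a message's flags: clear remove_flags first,
   then set add_flags (the ordering is an assumption of this sketch). */
static uint8_t sketch_apply_flag_update(uint8_t old_flags,
					const struct sketch_flag_update *u)
{
	assert(u != NULL);
	return (uint8_t)((old_flags & ~u->remove_flags) | u->add_flags);
}
```

With remove_flags = 0xff every old flag bit is cleared, so the result is exactly add_flags regardless of the previous value.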
Extensions¶
Extension records allow creating and updating extension-specific header and message record data. For example messages’ offsets to cache file or mbox file are stored in extensions.
Whenever using an extension, you’ll need to first write a MAIL_TRANSACTION_EXT_INTRO record. This is a bit kludgy and hopefully will be replaced by something better in the future. The intro contains:
struct mail_transaction_ext_intro {
uint32_t ext_id;
uint32_t reset_id;
uint32_t hdr_size;
uint16_t record_size;
uint16_t record_align;
uint16_t flags;
uint16_t name_size;
/* unsigned char name[]; */
};
If the extension already exists in the index file, ext_id can be set to it directly (extensions can’t be removed from an existing index). For adding a new extension, specify the extension name instead and use ext_id=(uint32_t)-1. It’s always possible to just give the name if you don’t know the existing extension ID, but this uses more disk space.
reset_id is a kind of “transaction validity” field. It’s updated with a MAIL_TRANSACTION_EXT_RESET record, which (optionally) causes the extension records’ contents to be zeroed. If an introduction’s reset_id doesn’t match the last EXT_RESET, it means that the extension changes are stale and they must be ignored. For example:
dovecot.index.cache file’s file_seq header is used as a reset_id. Initially it’s 1.
Process A: Begins a cache transaction, updating some fields in it.
Process B: Decides to compress the cache file, and issues a reset_id = 2 change.
Process A: Commits the transaction with reset_id = 1, but the cache file offsets point to the old file, so the changes must be ignored.
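The staleness rule from this example can be sketched as a simple comparison. The sketch_ext_state struct and both helpers are hypothetical; they only model the reset_id bookkeeping, not the zeroing of record contents.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-extension state tracked while reading the log */
struct sketch_ext_state {
	uint32_t reset_id; /* value set by the last EXT_RESET seen */
};

/* Models an EXT_RESET record: later intros must carry the new reset_id */
static void sketch_ext_reset(struct sketch_ext_state *ext,
			     uint32_t new_reset_id)
{
	assert(ext != NULL);
	ext->reset_id = new_reset_id;
}

/* Returns true if the changes following an intro may be applied, false
   if the intro's reset_id is stale and its changes must be ignored. */
static bool sketch_ext_intro_is_current(const struct sketch_ext_state *ext,
					uint32_t intro_reset_id)
{
	assert(ext != NULL);
	return ext->reset_id == intro_reset_id;
}
```

In the scenario above, Process A's intro still carries reset_id = 1 after Process B's reset to 2, so the check fails and A's cache offsets are dropped.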
hdr_size specifies the number of bytes the extension wants to have in the index file’s header. record_size specifies the number of bytes it wants to use for each record. The sizes may grow or shrink at any time. record_align contains the required alignment for the field. For example, if the extension contains a 32bit integer, the alignment should be 32bit so that the process won’t crash on CPUs which require proper alignment. Of course, if the field is accessed only as 4 individual bytes, the alignment can be 1.
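As an illustration of what record_align implies, the helper below rounds an offset up to the next multiple of the alignment. It assumes the alignment is expressed in bytes and is only a sketch of the arithmetic; how Dovecot actually lays out extension fields inside a record is not shown here, and sketch_align_offset() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Rounds `offset` up to the next multiple of `align` bytes */
static size_t sketch_align_offset(size_t offset, size_t align)
{
	assert(align > 0);
	return (offset + align - 1) / align * align;
}
```

For instance, a field needing 4-byte alignment that would otherwise start at offset 5 is moved to offset 8, while an alignment of 1 leaves any offset unchanged.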
Extension record updates typically are message-specific, so the changes must be done for each message separately rather than for a UID range. For example:
struct mail_transaction_ext_rec_update {
uint32_t uid; // instead of uid1, uid2
/* unsigned char data[]; */
};
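A per-message update like this could be serialized as the UID followed by record_size bytes of extension data. The sketch below assumes host byte order and ignores any padding the real format may require; sketch_ext_rec_update_write() is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Writes one extension record update (uid followed by record_size bytes
   of data) into buf. Returns the number of bytes written, or 0 if the
   buffer is too small. Padding rules of the real format are not modeled. */
static size_t sketch_ext_rec_update_write(uint8_t *buf, size_t buf_size,
					  uint32_t uid, const void *data,
					  size_t record_size)
{
	size_t total = sizeof(uid) + record_size;

	assert(buf != NULL && data != NULL);
	if (total > buf_size)
		return 0;
	memcpy(buf, &uid, sizeof(uid));
	memcpy(buf + sizeof(uid), data, record_size);
	return total;
}
```

Updating the same extension for several messages means writing one such entry per UID, since there is no uid2 to cover a range.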