Dovecot Flatcurve FTS Plugin
Added: 2.4.0
This is a Dovecot FTS plugin to enable message indexing using the Xapian Open Source Search Engine Library.
Note
Requires Xapian 1.4+.
The plugin relies on Dovecot to do the necessary stemming. It is intended to act as a simple interface to the Xapian storage/search query functionality.
This driver supports match scoring and substring matches, which means it can be configured to be RFC 3501 (IMAP4rev1) compliant. Substring matching is off by default (see fts_flatcurve_substring_search). This driver does not support fuzzy searches.
The driver passes all of the ImapTest search tests.
Note
This plugin requires the fts plugin to be activated and configured.
Enabling flatcurve is designed to be as easy as adding this to the configuration:
mail_plugins = $mail_plugins fts fts_flatcurve
plugin {
  fts = flatcurve
}
Settings
Note
The default settings should be fine in most scenarios.
fts_flatcurve_commit_limit
Default | 500 |
---|---|
Value | unsigned integer |
Advanced Setting; this should not normally be changed. |
Commit database changes after this many documents are updated. Higher commit limits result in faster indexing for large transactions (e.g. indexing a large mailbox) at the expense of higher memory usage. The default value should be sufficient to allow indexing in a process limited to 256 MB.
Set to 0 to use the Xapian default.
fts_flatcurve_max_term_size
Default | 30 |
---|---|
Value | unsigned integer |
Advanced Setting; this should not normally be changed. |
The maximum number of characters in a term to index.
The maximum value for this setting is 200.
fts_flatcurve_min_term_size
Default | 2 |
---|---|
Value | unsigned integer |
Advanced Setting; this should not normally be changed. |
The minimum number of characters in a term to index.
fts_flatcurve_optimize_limit
Default | 10 |
---|---|
Value | unsigned integer |
Advanced Setting; this should not normally be changed. |
Once the database reaches this number of shards, automatically optimize the DB at shutdown.
Set to 0 to disable auto-optimization.
fts_flatcurve_rotate_count
Default | 5000 |
---|---|
Value | unsigned integer |
Advanced Setting; this should not normally be changed. |
When the "current" FTS database reaches this number of messages, it is rotated to a read-only database and replaced by a new write DB. Most people should not change this setting.
Set to 0 to disable rotation.
fts_flatcurve_rotate_time
Default | 5000 |
---|---|
Value | time (milliseconds) |
Advanced Setting; this should not normally be changed. |
When committing changes to the "current" FTS database takes longer than this amount of time (in milliseconds), it is rotated to a read-only database and replaced by a new write DB. Most people should not change this setting.
Set to 0 to disable rotation.
fts_flatcurve_substring_search
Default | no |
---|---|
Value | boolean |
If enabled, allows substring searches (RFC 3501 compliant). However, this requires significant additional storage space. Many users today expect "Google-like" behavior, which is prefix searching, so substring searching is arguably not what users expect anyway. Therefore, even though it is not strictly RFC compliant, prefix (non-substring) searching is the default.
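As a minimal sketch, opting in to substring matching could look like the following. It reuses the plugin block from the basic setup; only the setting name and its boolean value are taken from this page.

```
plugin {
  fts = flatcurve
  # Opt in to RFC 3501 substring matching instead of the default
  # prefix-only matching. Expect noticeably larger indexes.
  fts_flatcurve_substring_search = yes
}
```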
Configuration Example
mail_plugins = $mail_plugins fts fts_flatcurve
plugin {
  fts = flatcurve

  # Maximum email-address token size (254) is larger than Xapian can handle,
  # so we will need to truncate at some level. It is doubtful that large
  # email-addresses are useful for search purposes, so this optional config
  # will prevent these large addresses (more than 100 bytes) from being
  # stored.
  #fts_tokenizers = generic email-address
  #fts_tokenizer_email_address = maxlen=100

  # All of these are optional, and indicate the default values.
  # They are listed here for documentation purposes; most people should
  # not need to define/override in their config.
  fts_flatcurve_commit_limit = 500
  fts_flatcurve_max_term_size = 30
  fts_flatcurve_min_term_size = 2
  fts_flatcurve_optimize_limit = 10
  fts_flatcurve_rotate_count = 5000
  fts_flatcurve_rotate_time = 5000
  fts_flatcurve_substring_search = no
}
Data Storage
Xapian search data is stored separately for each mailbox.
The data is stored under a 'fts-flatcurve' directory in the Dovecot index file location for the mailbox. The Xapian library is responsible for all data stored in that directory - no Dovecot code directly writes to any file.
Logging/Events
INFO
This plugin emits events with category fts-flatcurve, a child of the category fts (see events design).
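One way to make use of these events is to count them with Dovecot's statistics metrics. The sketch below assumes the general `metric` block and `event=` filter syntax from Dovecot's statistics configuration, which is not described on this page; the metric names are arbitrary labels chosen for illustration.

```
# Count indexing and query events emitted by the plugin.
# The metric names are hypothetical; the event names are those listed below.
metric fts_flatcurve_index {
  filter = event=fts_flatcurve_index
}
metric fts_flatcurve_query {
  filter = event=fts_flatcurve_query
}
```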
fts_flatcurve_expunge
Added: 2.4.0
Emitted when a message is expunged from a mailbox.
Field List
Field | Description |
---|---|
duration | Duration of the event (in microseconds) |
reason_code | List of reason code strings why the event happened. See event reasons for possible values. |
mailbox | The mailbox name |
uid | The UID that was expunged from FTS index |
fts_flatcurve_index
Added: 2.4.0
Emitted when a message is indexed.
Field List
Field | Description |
---|---|
duration | Duration of the event (in microseconds) |
reason_code | List of reason code strings why the event happened. See event reasons for possible values. |
mailbox | The mailbox name |
uid | The UID that was added to the FTS index |
fts_flatcurve_last_uid
Added: 2.4.0
Emitted when the system queries for the last UID indexed.
Field List
Field | Description |
---|---|
duration | Duration of the event (in microseconds) |
reason_code | List of reason code strings why the event happened. See event reasons for possible values. |
mailbox | The mailbox name |
uid | The last UID contained in the FTS index |
fts_flatcurve_optimize
Added: 2.4.0
Emitted when a mailbox is optimized.
Field List
Field | Description |
---|---|
duration | Duration of the event (in microseconds) |
reason_code | List of reason code strings why the event happened. See event reasons for possible values. |
mailbox | The mailbox name |
fts_flatcurve_query
Added: 2.4.0
Emitted when a query is completed.
Field List
Field | Description |
---|---|
duration | Duration of the event (in microseconds) |
reason_code | List of reason code strings why the event happened. See event reasons for possible values. |
count | The number of messages matched |
mailbox | The mailbox name |
maybe | Are the results uncertain? (yes or no) |
query | The query text sent to Xapian |
uids | The list of UIDs returned by the query |
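The fields above can be used to narrow a filter to a subset of queries. As a hedged sketch, the following counts only queries whose results were flagged as uncertain. It assumes that Dovecot's event filter language accepts `field=value` comparisons joined with `AND`; the metric name is a hypothetical label.

```
# Count queries where Xapian could not give a definite answer.
metric fts_flatcurve_query_maybe {
  filter = event=fts_flatcurve_query AND maybe=yes
}
```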
fts_flatcurve_rescan
Added: 2.4.0
Emitted when a rescan is completed.
Field List
Field | Description |
---|---|
duration | Duration of the event (in microseconds) |
reason_code | List of reason code strings why the event happened. See event reasons for possible values. |
expunged | The list of UIDs that were expunged during rescan |
mailbox | The mailbox name |
status | Status of the rescan (expunge_msgs, missing_msgs, or ok) |
uids | The list of UIDs that triggered a non-ok status response |
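Since a non-ok status means the index and the mailbox had drifted apart, it can be useful to track those rescans separately. The sketch below assumes the event filter language supports `NOT`; the metric name is a hypothetical label, and the status values are those listed in the table above.

```
# Count rescans that found expunged or missing messages.
metric fts_flatcurve_rescan_problems {
  filter = event=fts_flatcurve_rescan AND NOT status=ok
}
```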
fts_flatcurve_rotate
Added: 2.4.0
Emitted when a mailbox has its underlying Xapian DB rotated.
Field List
Field | Description |
---|---|
duration | Duration of the event (in microseconds) |
reason_code | List of reason code strings why the event happened. See event reasons for possible values. |
mailbox | The mailbox name |