Restarting obox backend with minimal user impact

This is a generic procedure for performing backend maintenance with minimal user impact.

Note

The best way to avoid any user impact is to avoid having to use this procedure in the first place:

  • Upgrading can be done simply by installing the new Dovecot deb/rpm packages.

  • Configuration changes can be made by modifying the config file and running doveadm reload. If the configuration contains mistakes, the reload fails and the original configuration is preserved. This only applies to syntax mistakes and other mistakes that doveconf can catch, not to mistakes that are detected only at runtime.
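
As a minimal sketch of the reload path described in the note, the wrapper below runs doveconf first and calls doveadm reload only when that check passes. The function name reload_if_valid is hypothetical and exists only for illustration. The sketch assumes that doveconf exits with a non-zero status when it cannot parse the configuration. Mistakes that are detected only at runtime are not caught by this check.

```shell
# Hypothetical helper: validate the configuration, then reload.
# Assumption: doveconf returns non-zero when the configuration is invalid.
reload_if_valid() {
    if doveconf -n > /dev/null; then
        # The configuration parsed cleanly, so ask Dovecot to reload it.
        doveadm reload
    else
        echo "configuration check failed; not reloading" >&2
        return 1
    fi
}
```

Calling reload_if_valid after editing the config file leaves the running configuration untouched if the check fails.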

Shutting down backend

  1. Perform this procedure on one backend at a time.

    This is to minimize the user impact.

  2. On the director, set the vhost count of that backend to 0.

    This stops the director process from mapping new user sessions to this backend and notifies dovemon that the backend is under maintenance.

    doveadm director update <backend ip> 0
    
  3. Wait 15 minutes for the session hashes of disconnected users to expire.

  4. On the director, check how many users are still mapped to the backend.

    doveadm director status
    
  5. Disable the lmtp service on the selected backend.

    This minimizes metacache changes while the metacache is being flushed.

    doveadm service stop lmtp
    
  6. Shut down Dovecot on the selected backend.

    This also flushes the metacache, as long as the dovecot-metacache-flush service is not disabled.

    systemctl stop dovecot
    
  7. On the director, flush all user sessions on the backend.

    The backend is now shut down, so the director layer must be told to rehash the sessions to the remaining backends.

    doveadm director down <backend ip>
    doveadm director flush <backend ip>
    
  8. The backend has now been removed from the director ring and all user sessions have been rehashed to the remaining backends.

All sessions are now gone and the backend is ready for an upgrade or a major configuration change.
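
The director-side commands from steps 2 and 7 can be grouped into small helpers, sketched below. The names run, DRY_RUN, director_drain and director_remove are hypothetical and exist only for illustration. The doveadm commands themselves are the ones listed in the steps above. The 15-minute wait and the backend-side steps (stopping lmtp and Dovecot) are deliberately left out, because they run at a different time or on a different host.

```shell
# Hypothetical wrapper: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Step 2: stop the director from mapping new sessions to the backend.
director_drain() {
    run doveadm director update "$1" 0
}

# Step 7: mark the stopped backend as down, then rehash its user sessions.
director_remove() {
    run doveadm director down "$1"
    run doveadm director flush "$1"
}
```

With DRY_RUN=1 set, calling director_drain or director_remove with a backend IP only prints the commands, which makes it possible to review the sequence before running it against a live director ring.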

Starting up backend

  1. Synchronize metacache

    The metacache database may not be fully synchronized with the index files that actually exist on the filesystem. At this stage it is recommended to either delete the metacache or rescan it.

    • Rescan metacache:

      New in version v2.3.7.

      doveadm metacache rescan
      
    • Delete metacache:

      1. Remove old metacache database files.

        Because the metacache service now uses just one file, the four old files need to be removed.

        rm -f /var/lib/dovecot/metacache/metacache-users*
        
      2. Remove the metacache from the filesystem:

        rm -rf /var/dovecot/vmail/*
        
  2. Start Dovecot again.

    systemctl start dovecot
    
  3. Verify with a test user that the backend is usable.

    doveadm mailbox list -u <uid>
    doveadm mailbox status -u <uid> messages "*"
    doveadm fetch -u <uid> text all > /dev/null
    
    • The first command fetches the mailbox list from metacache. Because the metacache has been reset, the list is now fetched from storage.

    • The second command fetches more information from metacache.

    • The last command verifies that Dovecot can fetch mail objects from storage.

  4. If all of the above commands succeed, the backend can be put back into production.

  5. In the director ring, update the backend status.

    doveadm director update <backend ip> 100
    doveadm director up <backend ip>
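
The verification in step 3 can be wrapped so that it stops at the first failing command, as in the sketch below. The function name verify_backend and its return codes are hypothetical and exist only for illustration. The three doveadm commands are the ones from step 3, with their output discarded.

```shell
# Hypothetical helper: run the three checks from step 3 for one test user.
# Returns 0 if all succeed, or 1, 2 or 3 to show which check failed.
verify_backend() {
    uid=$1
    # Mailbox list (fetched from storage after a metacache reset).
    doveadm mailbox list -u "$uid" > /dev/null || return 1
    # Message counts for all mailboxes.
    doveadm mailbox status -u "$uid" messages "*" > /dev/null || return 2
    # Fetch all mail bodies to confirm storage access.
    doveadm fetch -u "$uid" text all > /dev/null || return 3
}
```

A zero exit status from verify_backend corresponds to the condition in step 4, after which the director commands in step 5 can be run.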