How to diagnose and fix email delivery failures after DNS changes (Exim + BIND)

10/26/2025 · 2 min · Email

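DNS changes can break inbound mail while local sending still appears normal. This runbook was used to recover delivery by validating each layer in order (zone file, local authoritative answers, public delegation, then the Exim queue) and confirming each one with command output before moving on.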

1) Incident pattern

  1. internal send works
  2. external inbound fails
  3. bounces or missing messages
  4. Exim queue accumulates defers/frozen entries

Log watch:

tail -f /var/log/exim_mainlog | grep -Ei "defer|frozen|dns|host lookup|retry"
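
When the live tail scrolls too fast, a per-keyword count over a recent slice of the log can show which symptom dominates. The sketch below is a hypothetical helper (the name summarize_mail_log is not from the original runbook); it reads log lines on stdin and counts matches for the same keywords used in the log watch.

```shell
# Hypothetical helper: count log lines per failure keyword (case-insensitive).
# Reads Exim main log lines on stdin; a line may count under several keywords.
summarize_mail_log() {
  awk '
    { line = tolower($0) }
    line ~ /defer/       { n["defer"]++ }
    line ~ /frozen/      { n["frozen"]++ }
    line ~ /dns/         { n["dns"]++ }
    line ~ /host lookup/ { n["host lookup"]++ }
    line ~ /retry/       { n["retry"]++ }
    END { for (k in n) printf "%s %d\n", k, n[k] }
  ' | sort
}
```

Possible usage, assuming the same log path as above: tail -n 5000 /var/log/exim_mainlog | summarize_mail_log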

2) Validate authoritative BIND zone first

named-checkzone domain.com.br /var/named/domain.com.br.db

Common faults:

Apply and verify:

rndc reload
rndc status
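
To avoid reloading a zone that fails validation, the check and the reload can be chained. This is a minimal sketch, not the procedure used in the incident: safe_reload is a hypothetical name, and the CHECKZONE and RNDC variables are assumptions added only so the commands can be swapped out for testing.

```shell
# Hypothetical wrapper: reload a zone only if named-checkzone accepts it.
# Arguments: zone name, zone file path.
# CHECKZONE and RNDC are illustrative overrides; they default to the real tools.
safe_reload() {
  zone=$1
  file=$2
  if ${CHECKZONE:-named-checkzone} "$zone" "$file"; then
    ${RNDC:-rndc} reload "$zone"
  else
    echo "zone check failed for $zone; not reloading" >&2
    return 1
  fi
}
```

Possible usage: safe_reload domain.com.br /var/named/domain.com.br.db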

3) Validate local authoritative answers

dig @127.0.0.1 MX domain.com.br +short
dig @127.0.0.1 NS domain.com.br +short
dig @127.0.0.1 A mail.domain.com.br +short
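
The three queries above can be tied together so that every MX target is checked for an address record, not only mail.domain.com.br. A sketch under stated assumptions: mx_targets and check_mx_addresses are hypothetical helper names, and the default server 127.0.0.1 mirrors the commands above.

```shell
# Print MX target hostnames from `dig +short MX` output on stdin,
# ordered by preference, with the trailing dot removed.
mx_targets() {
  sort -n | awk '{ sub(/\.$/, "", $2); print $2 }'
}

# For each MX target of a domain, report whether the given server
# (default 127.0.0.1) returns an A or AAAA answer for it.
check_mx_addresses() {
  domain=$1
  server=${2:-127.0.0.1}
  dig @"$server" MX "$domain" +short | mx_targets | while read -r host; do
    if [ -n "$(dig @"$server" A "$host" +short)" ] ||
       [ -n "$(dig @"$server" AAAA "$host" +short)" ]; then
      echo "ok $host"
    else
      echo "missing-address $host"
    fi
  done
}
```

Possible usage: check_mx_addresses domain.com.br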

4) Trace external delegation and propagation

dig MX domain.com.br +trace
dig NS domain.com.br +trace

Cross-check public resolvers:

dig @8.8.8.8 MX domain.com.br +short
dig @1.1.1.1 MX domain.com.br +short
dig @9.9.9.9 MX domain.com.br +short
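
Comparing each resolver by eye gets tedious during a propagation window. The sketch below compares each public answer with the answer served locally; compare_resolvers is a hypothetical name, and it assumes the local BIND instance at 127.0.0.1 already serves the intended records.

```shell
# Hypothetical helper: compare the MX answer from each public resolver
# with the authoritative answer served locally at 127.0.0.1.
# Returns non-zero if any resolver still gives a different answer.
compare_resolvers() {
  domain=$1
  expected=$(dig @127.0.0.1 MX "$domain" +short | sort)
  rc=0
  for r in 8.8.8.8 1.1.1.1 9.9.9.9; do
    got=$(dig @"$r" MX "$domain" +short | sort)
    if [ "$got" = "$expected" ]; then
      echo "$r converged"
    else
      echo "$r differs"
      rc=1
    fi
  done
  return $rc
}
```

Possible usage: compare_resolvers domain.com.br. A resolver reported as "differs" may simply be serving a cached answer until the old TTL expires.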

5) Audit Exim queue during transition window

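exiqgrep -r "@domain.com.br"       # list queued messages with recipients in the domain
exiqgrep -r "@domain.com.br" -c    # count the matching messages
exim -Mvh MESSAGE_ID               # show the message headers
exim -Mvl MESSAGE_ID               # show the message's delivery log
exim -Mt MESSAGE_ID                # thaw a frozen message
exim -qff                          # force a queue run, including frozen messages
exiqgrep -z -i                     # list IDs of messages that are still frozen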
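
Before thawing anything, it helps to see why each message froze. This sketch loops over the frozen IDs and prints the last line of each delivery log; frozen_report is a hypothetical name, and treating the last log line as the most recent failure reason is an assumption.

```shell
# Hypothetical helper: for each frozen message ID, print the ID and the
# last line of its delivery log (assumed to hold the latest failure reason).
frozen_report() {
  exiqgrep -z -i | while read -r id; do
    # </dev/null keeps exim from consuming the ID list on stdin.
    printf '%s: %s\n' "$id" "$(exim -Mvl "$id" </dev/null | tail -n 1)"
  done
}
```

Possible usage: run frozen_report, fix the DNS cause it reveals, then thaw with exim -Mt and force a queue run with exim -qff.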

6) Frequent root causes in this incident class

  1. new MX configured, but the registrar still delegates to the old NS
  2. MX target exists but A/AAAA missing
  3. SMTP 25/tcp blocked on new host
  4. retry/frozen queue items from transition timing

Port check:

nc -vz mail.domain.com.br 25
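
When a domain has more than one MX host, the same port check can be repeated for each. A minimal sketch: check_smtp is a hypothetical name, and the -w 5 connect timeout is an assumption about the installed nc variant (OpenBSD netcat and recent ncat accept it).

```shell
# Hypothetical helper: test 25/tcp on each host given as an argument.
# Returns non-zero if any host is unreachable.
check_smtp() {
  rc=0
  for host in "$@"; do
    if nc -z -w 5 "$host" 25 >/dev/null 2>&1; then
      echo "$host 25/tcp open"
    else
      echo "$host 25/tcp unreachable"
      rc=1
    fi
  done
  return $rc
}
```

Possible usage: check_smtp mail.domain.com.br. Run it from a host outside the network, since a firewall may allow port 25 locally while blocking external senders.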

7) Recovery acceptance criteria

  1. zone check passes
  2. authoritative local answers are correct
  3. public resolvers converge to new MX
  4. Exim queue drains without new DNS defers
  5. external inbox tests succeed (at least two providers)

8) Prevention actions

  1. run DNS pre-check before production changes
  2. document TTL and expected propagation window
  3. monitor queue by domain after cutover
  4. include external inbound test in change checklist
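
For the TTL item, the value to document can be read from the authoritative answer before the change. The sketch below extracts the largest TTL from dig output; max_ttl is a hypothetical name, and it assumes the record format printed by dig +noall +answer (name, TTL, class, type, data).

```shell
# Hypothetical helper: print the largest TTL in seconds among answer
# records read from stdin, in `dig +noall +answer` format.
max_ttl() {
  awk '$2 ~ /^[0-9]+$/ && $2 + 0 > max { max = $2 + 0 } END { print max + 0 }'
}
```

Possible usage: dig @127.0.0.1 MX domain.com.br +noall +answer | max_ttl. Queried against the authoritative server before the cutover, the result is a rough upper bound on how long resolvers may keep serving the old record.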

Mail failures after a DNS change usually come down to stale delegation plus leftover queue state, not random behavior. The BIND -> trace -> Exim sequence provides fast, controlled recovery.

This post is licensed under CC BY-NC.
