DNS changes can break inbound mail while local sending still appears normal. This runbook was used to recover delivery by validating each layer in order (authoritative zone, delegation and propagation, Exim queue), with a concrete check at each step.
1) Incident pattern
- internal send works
- external inbound fails
- bounces or missing messages
- Exim queue accumulates defers/frozen entries
Log watch:
tail -f /var/log/exim_mainlog | grep -Ei "defer|frozen|dns|host lookup|retry"
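To put a number on the log watch, here is a minimal sketch that counts deferred deliveries for one domain. It assumes Exim's default main log layout, in which a deferred delivery line carries the ` == ` flag. The function name `count_defers` is hypothetical and not part of Exim.

```shell
# Hypothetical helper: count deferred deliveries for one recipient domain.
# Assumes default Exim main log lines, where " == " marks a deferred delivery.
count_defers() {
  # $1: recipient domain; log lines are read from stdin
  awk -v dom="@$1" '
    index($0, " == ") && index(tolower($0), tolower(dom)) { n++ }
    END { print n + 0 }
  '
}
```

A possible invocation is `count_defers domain.com.br < /var/log/exim_mainlog`. Running it before and after the fix shows whether new defers are still being logged.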
2) Validate authoritative BIND zone first
named-checkzone domain.com.br /var/named/domain.com.br.db
Common faults:
- SOA serial not incremented
- missing trailing dot in FQDN
- MX host without valid A/AAAA
- NS mismatch against registrar delegation
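The missing trailing dot is easy to overlook, because BIND silently appends the zone origin to any name that does not end in a dot. The sketch below is a rough heuristic, not a zone parser: it flags MX, NS and CNAME records whose target contains a dot but does not end with one. The function name `find_missing_trailing_dot` is hypothetical.

```shell
# Hypothetical helper: list MX/NS/CNAME records whose target looks like an
# FQDN (contains a dot) but lacks the trailing dot.
find_missing_trailing_dot() {
  # $1: path to a zone file
  awk '
    { sub(/;.*/, "") }                       # drop comments (naive: ignores quoting)
    {
      for (i = 1; i < NF; i++) {
        if ($i == "MX" || $i == "NS" || $i == "CNAME") {
          if ($NF ~ /\./ && $NF !~ /\.$/) print FILENAME ":" FNR ": " $0
          break
        }
      }
    }
  ' "$1"
}
```

For example, `find_missing_trailing_dot /var/named/domain.com.br.db` prints each suspicious line with its line number. Treat the output as candidates to review, since a relative name containing a dot can be intentional.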
Apply and verify:
rndc reload
rndc status
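After the reload, it can help to confirm that the server is actually answering with the serial from the file on disk. The sketch below assumes that `named-checkzone` reports a line containing `loaded serial N` and that the serial is the third field of the `dig +short` SOA answer. The function name `serial_matches` is hypothetical.

```shell
# Hypothetical helper: compare the SOA serial in the zone file with the
# serial the local authoritative server is answering with.
serial_matches() {
  # $1: zone name, $2: zone file path
  file_serial=$(named-checkzone "$1" "$2" |
    sed -n 's/.*loaded serial \([0-9][0-9]*\).*/\1/p')
  live_serial=$(dig @127.0.0.1 SOA "$1" +short | awk '{ print $3 }')
  echo "file=$file_serial live=$live_serial"
  # Succeeds only when a file serial was found and both serials are equal.
  [ -n "$file_serial" ] && [ "$file_serial" = "$live_serial" ]
}
```

A call such as `serial_matches domain.com.br /var/named/domain.com.br.db` returns non-zero when the two serials differ, which usually means the reload did not pick up the edited zone.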
3) Validate local authoritative answers
dig @127.0.0.1 MX domain.com.br +short
dig @127.0.0.1 NS domain.com.br +short
dig @127.0.0.1 A mail.domain.com.br +short
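The three queries above can be chained so that every MX target is checked for an address record, not only `mail.domain.com.br`. This is a minimal sketch. The function name `check_mx_targets` is hypothetical, and the default server of 127.0.0.1 assumes the authoritative BIND instance runs on the same host.

```shell
# Hypothetical helper: confirm that every MX target of a domain has an
# A or AAAA record on the given server.
check_mx_targets() {
  # $1: domain, $2: server to query (defaults to the local authoritative one)
  domain=$1
  server=${2:-127.0.0.1}
  hosts=$(dig @"$server" MX "$domain" +short | awk '{ print $2 }')
  if [ -z "$hosts" ]; then
    echo "NO MX: $domain"
    return 1
  fi
  rc=0
  for host in $hosts; do
    a=$(dig @"$server" A "$host" +short)
    aaaa=$(dig @"$server" AAAA "$host" +short)
    if [ -z "$a" ] && [ -z "$aaaa" ]; then
      echo "MISSING A/AAAA: $host"
      rc=1
    else
      echo "OK: $host"
    fi
  done
  return "$rc"
}
```

Running `check_mx_targets domain.com.br` prints one line per MX target and returns non-zero if any target has no address record or if no MX is returned at all.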
4) Trace external delegation and propagation
dig MX domain.com.br +trace
dig NS domain.com.br +trace
Cross-check public resolvers:
dig @8.8.8.8 MX domain.com.br +short
dig @1.1.1.1 MX domain.com.br +short
dig @9.9.9.9 MX domain.com.br +short
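The cross-check can be turned into a pass/fail test by comparing each resolver's answer against the MX target you expect after the change. The sketch below uses the same three public resolvers. The function name `mx_converged` is hypothetical, and "converged" here only means the expected target is present in the answer, not that old targets are gone.

```shell
# Hypothetical helper: report whether each public resolver already returns
# the expected MX target for a domain.
mx_converged() {
  # $1: domain, $2: expected MX target with trailing dot, as dig prints it
  domain=$1
  expected=$2
  rc=0
  for resolver in 8.8.8.8 1.1.1.1 9.9.9.9; do
    # Collect the MX targets on one space-separated line.
    answer=$(dig @"$resolver" MX "$domain" +short |
      awk '{ print tolower($2) }' | sort | tr '\n' ' ')
    case " $answer" in
      *" $expected "*) echo "$resolver: converged ($answer)" ;;
      *) echo "$resolver: stale or missing ($answer)"; rc=1 ;;
    esac
  done
  return "$rc"
}
```

For example, `mx_converged domain.com.br mail.domain.com.br.` returns zero only when all three resolvers include the expected target.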
5) Audit Exim queue during transition window
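exiqgrep -r "@domain.com.br"        # list queued messages for recipients in the domain
exiqgrep -r "@domain.com.br" -c     # count them
exim -Mvh MESSAGE_ID                # view the message headers
exim -Mvl MESSAGE_ID                # view the per-message delivery log
exim -Mt MESSAGE_ID                 # thaw a frozen message
exim -qff                           # force a queue run, including frozen messages
exiqgrep -z -i                      # list the IDs of messages that are still frozen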
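When many messages froze during the transition, thawing them one by one is slow. The sketch below combines the `exiqgrep` filters and `exim -Mt` shown above and limits the action to one recipient domain. The function name `thaw_frozen_for_domain` and the `apply` argument are hypothetical. It defaults to a dry run, because frozen messages can also include bounces you do not want to retry.

```shell
# Hypothetical helper: thaw frozen messages addressed to one domain.
thaw_frozen_for_domain() {
  # $1: domain, $2: "apply" to actually thaw; anything else is a dry run
  for id in $(exiqgrep -z -i -r "@$1"); do
    if [ "${2:-}" = "apply" ]; then
      exim -Mt "$id"
    else
      echo "would thaw: $id"
    fi
  done
}
```

Run `thaw_frozen_for_domain domain.com.br` first to review the IDs, then repeat with `apply` as the second argument and follow up with a queue run.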
6) Frequent root causes in this incident class
- new MX configured but registrar still delegates old NS
- MX target exists but A/AAAA missing
- SMTP 25/tcp blocked on new host
- retry/frozen queue items from transition timing
Port check:
nc -vz mail.domain.com.br 25
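If the domain has more than one MX, each target needs the same port check. The sketch below loops over the MX answer and probes 25/tcp with a 5 second timeout. The function name `check_smtp_ports` is hypothetical, and it assumes an `nc` build that supports `-z` and `-w`. Run it from a host outside your own network, since a check from the mail server itself does not prove external reachability.

```shell
# Hypothetical helper: probe 25/tcp on every MX target of a domain.
check_smtp_ports() {
  # $1: domain
  rc=0
  for host in $(dig MX "$1" +short | awk '{ sub(/\.$/, "", $2); print $2 }'); do
    if nc -z -w 5 "$host" 25 >/dev/null 2>&1; then
      echo "open: $host:25"
    else
      echo "blocked or unreachable: $host:25"
      rc=1
    fi
  done
  return "$rc"
}
```

`check_smtp_ports domain.com.br` returns non-zero if any MX target refuses or drops the connection.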
7) Recovery acceptance criteria
- zone check passes
- authoritative local answers are correct
- public resolvers converge to new MX
- Exim queue drains without new DNS defers
- external inbox tests succeed (at least two providers)
8) Prevention actions
- run DNS pre-check before production changes
- document TTL and expected propagation window
- monitor queue by domain after cutover
- include external inbound test in change checklist
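For the TTL item, the value worth documenting is the TTL served before the change, because resolvers may keep the old answer cached for up to that long. The sketch below reads it from the authoritative server. The function name `max_ttl` is hypothetical, and the default server of 127.0.0.1 assumes BIND runs on the same host.

```shell
# Hypothetical helper: print the largest TTL (in seconds) among a domain's
# records of one type, as served by the authoritative server.
max_ttl() {
  # $1: domain, $2: record type (default MX), $3: server (default 127.0.0.1)
  rtype=${2:-MX}
  dig @"${3:-127.0.0.1}" "$rtype" "$1" +noall +answer |
    awk -v t="$rtype" '
      $4 == t && ($2 + 0) > max { max = $2 + 0 }
      END { print max + 0 }
    '
}
```

Recording `max_ttl domain.com.br MX` and `max_ttl domain.com.br NS` in the change ticket gives a rough upper bound for how long stale answers can persist for records that were already cached.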
Mail failures after a DNS change usually come down to stale delegation combined with leftover queue state, not random behavior. Working through BIND, then the delegation trace, then the Exim queue gives a fast, controlled recovery.
This post is licensed under CC BY-NC.