Building resilient SIP routing with Kamailio dispatcher
Most Kamailio deployments that start with a round-robin dispatcher don't stay round-robin for long. When a carrier route degrades — increasing packet loss, rising PDD, occasional 5xx storms — you need Kamailio to detect it and route around it without human intervention. That's what the dispatcher module with active probing gives you.
This post covers the configuration in enough detail to ship a production setup, not just a working demo.
dispatcher.list: the basics
The dispatcher module reads destinations from a file or database. The file format is one destination per line:
# /etc/kamailio/dispatcher.list
# setid destination flags priority
1 sip:carrier-a.example:5060 0 10
1 sip:carrier-b.example:5060 0 10
2 sip:backup.example:5060 0 5
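- setid groups destinations. ds_select_dst(1, 4) picks from setid 1 using algorithm 4 (round-robin). For weight-based distribution, use algorithm 9.
- flags mark destination state as a bitmask: 0 = active, 1 = inactive, 2 = trying, 4 = administratively disabled, 8 = probing. Kamailio updates these at runtime through probing, through ds_mark_dst() in the config, or through the dispatcher.set_state RPC command.
- priority sets the ordering of destinations within a set; higher values come first. It matters most with priority-based selection (algorithm 8).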
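To make the field layout concrete, here is a minimal Python sketch that parses lines in the file format above and decodes the flags bitmask. The names `Destination`, `FLAG_NAMES`, and `parse_dispatcher_line` are illustrative and not part of Kamailio. The bit values follow the dispatcher module documentation as we understand it, so verify them against your version. The optional attributes column is ignored.

```python
from dataclasses import dataclass

# Flag bits as documented for the dispatcher module (assumption: unchanged in your version).
FLAG_NAMES = {1: "inactive", 2: "trying", 4: "disabled", 8: "probing"}


@dataclass
class Destination:
    setid: int
    uri: str
    flags: int = 0
    priority: int = 0

    def states(self):
        """Return the names of the flag bits that are set; empty means plain active."""
        return [name for bit, name in FLAG_NAMES.items() if self.flags & bit]


def parse_dispatcher_line(line):
    """Parse one dispatcher.list line; return None for blank lines and comments."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    fields = line.split()
    if len(fields) < 2:
        raise ValueError(f"expected at least setid and destination: {line!r}")
    flags = int(fields[2]) if len(fields) > 2 else 0
    priority = int(fields[3]) if len(fields) > 3 else 0
    return Destination(int(fields[0]), fields[1], flags, priority)
```

A check like this is handy in CI: parse the file before deploying it and fail the build if a line does not parse or a destination ships with an unexpected flag.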
Reload the list without restart:
kamcmd dispatcher.reload
Probing modes
Active probing sends SIP OPTIONS to each destination on a configurable interval. When a destination stops responding, Kamailio marks it inactive and stops routing to it.
# kamailio.cfg
modparam("dispatcher", "ds_probing_mode", 1)
modparam("dispatcher", "ds_ping_interval", 10)
modparam("dispatcher", "ds_probing_threshold", 3)
modparam("dispatcher", "ds_inactive_threshold", 3)
- ds_probing_mode = 1: probe all destinations, active and inactive alike. Use 3 to continuously probe only destinations flagged for probing (flag 8 in dispatcher.list).
- ds_ping_interval: seconds between OPTIONS sends. This is a single module-wide value, so it applies to every set. 10s is a reasonable starting point for carrier routes; go lower (5s) when fast detection is critical, higher (30s or more) when the destinations are mostly standby.
- ds_probing_threshold: consecutive failed probes before an active destination is marked inactive.
- ds_inactive_threshold: consecutive successful probes before an inactive destination is marked active again.
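These parameters together determine how long a dead destination keeps receiving calls. A rough back-of-the-envelope model in Python follows; it assumes the destination is marked inactive on the Nth consecutive failed probe, and that each failed probe is only known to have failed after a transaction timeout. `detection_window` and `probe_timeout` are our own names, not module parameters.

```python
def detection_window(ping_interval, probing_threshold, probe_timeout):
    """Return (best, worst) seconds from outage start to the inactive mark.

    best:  the outage begins just before a probe is sent
    worst: the outage begins just after a probe was answered
    """
    best = (probing_threshold - 1) * ping_interval + probe_timeout
    worst = probing_threshold * ping_interval + probe_timeout
    return best, worst


# Settings from the config above, with an assumed 2-second probe timeout
print(detection_window(ping_interval=10, probing_threshold=3, probe_timeout=2))
# prints (22, 32)
```

With the values shown, probing alone takes roughly 20 to 30 seconds to pull a destination out of rotation, so calls in that window still depend on per-call failover.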
Handling probe responses in kamailio.cfg
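Replies to the OPTIONS probes themselves are handled inside the dispatcher module, so they need no route of their own; the ds_ping_reply_codes parameter controls which reply codes count as a healthy answer. What kamailio.cfg does need is handling for real call traffic: an onreply_route that marks a destination active when it answers, and a failure_route (armed with t_on_failure() before relaying) that marks it inactive and fails over: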
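onreply_route[MANAGE_REPLY] {
    # any 1xx/2xx reply shows the destination is answering
    if (status =~ "[12][0-9][0-9]") {
        ds_mark_dst("a"); # mark as active
    }
}

failure_route[MANAGE_FAILURE] {
    if (t_is_canceled()) {
        exit;
    }
    # fail over only on 5xx replies or a branch timeout with no reply;
    # other failures (486 Busy, 404, ...) describe the call, not the carrier
    if (!t_check_status("5[0-9][0-9]")
            && !(t_branch_timeout() && !t_branch_replied())) {
        exit;
    }
    ds_mark_dst("i"); # mark as inactive
    if (!ds_next_dst()) {
        # no more destinations: send 503
        t_reply("503", "Service Unavailable");
        exit;
    }
    t_on_failure("MANAGE_FAILURE"); # re-arm for the next attempt
    t_relay();
}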
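ds_next_dst() rewrites the destination to the next entry in the set and returns false when the set is exhausted. It relies on failover support being enabled (bit 2 in the dispatcher module's flags parameter) so that ds_select_dst() keeps the remaining destinations for the transaction. The result is sequential failover within a single INVITE attempt.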
Failover priority ordering
When you have a primary carrier and a backup, use separate setids with explicit fallback logic:
route[CARRIER_ROUTE] {
# Try setid 1 (primary carriers) first
if (!ds_select_dst(1, 4)) {
# setid 1 completely inactive — fall to backup
if (!ds_select_dst(2, 0)) {
send_reply("503", "No route available");
exit;
}
}
route(RELAY);
}
Set 2 (algorithm 0 = hash over call-id) acts as a stable backup. This setup means: try the primary pool first, and only if every destination in it is probing-inactive, fall through to the backup.
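The two-tier logic is easy to reason about with a toy model. The Python sketch below is not how Kamailio implements selection; it only mimics the behaviour described here: round-robin over the active members of the primary set, and a fall-through to the backup set when none are left. `DestinationSet` and `pick_route` are hypothetical names.

```python
class DestinationSet:
    """Toy stand-in for one dispatcher setid."""

    def __init__(self, uris):
        self.uris = list(uris)
        self.inactive = set()
        self._next = 0

    def select_round_robin(self):
        """Return the next active URI in rotation, or None if all are inactive."""
        for _ in range(len(self.uris)):
            uri = self.uris[self._next % len(self.uris)]
            self._next += 1
            if uri not in self.inactive:
                return uri
        return None


def pick_route(primary, backup):
    """Prefer the primary pool; use the backup only when the primary is empty."""
    return primary.select_round_robin() or backup.select_round_robin()


primary = DestinationSet(["sip:carrier-a.example:5060", "sip:carrier-b.example:5060"])
backup = DestinationSet(["sip:backup.example:5060"])
primary.inactive.update(primary.uris)  # simulate both primaries failing probes
print(pick_route(primary, backup))
# prints sip:backup.example:5060
```

Note what the model makes obvious: the backup pool sees no traffic at all until the last primary is marked inactive, so it should be sized to take the full load.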
Keepalive tuning
The right ds_ping_interval depends on your NAT/firewall environment and your SLA:
| Environment | Recommended interval | Reasoning |
|---|---|---|
| Carrier interconnect (no NAT) | 30s | Low keepalive cost, carrier routes are stable |
| Enterprise trunks (behind FW) | 10s | Firewall state tables time out in 30-60s |
| WebRTC gateway | 5s | Fast detection matters for UX |
| Backup/failover only | 60s | Just need to know it's alive |
Don't set ds_ping_interval below 5s unless you have a very specific reason — you'll saturate the OPTIONS handling on busy carrier SBCs.
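To see where probe load comes from, it helps to do the arithmetic. The sketch below assumes every proxy sends one OPTIONS to every destination once per interval; `probe_rates` is an illustrative helper, not a Kamailio function.

```python
def probe_rates(num_proxies, num_destinations, ping_interval):
    """Return (OPTIONS/s received by each destination, OPTIONS/s sent by each proxy)."""
    per_destination = num_proxies / ping_interval
    per_proxy = num_destinations / ping_interval
    return per_destination, per_proxy


# 4 proxies probing 20 destinations every 5 seconds
print(probe_rates(num_proxies=4, num_destinations=20, ping_interval=5))
# prints (0.8, 4.0)
```

The load on a carrier SBC grows with the number of proxies probing it, not with the size of your dispatcher list, so a short interval that is harmless on one node can add up across a large fleet.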
Monitoring dispatcher state
Check current state via the Kamailio management interface:
# All destinations in all sets
kamcmd dispatcher.list
# Specific set
kamcmd dispatcher.list 1
Output shows each destination with its current flags. Automate this into your monitoring stack:
# Prometheus textfile exporter (simplified); needs ds_ping_latency_stats=1
# and assumes the uppercase URI: / AVG: keys printed by recent versions
kamcmd dispatcher.list | awk '/URI:/ { uri=$2 } /AVG:/ { print "kamailio_dispatcher_latency{uri=\"" uri "\"} " $2 }' > /var/lib/node_exporter/kamailio_dispatcher.prom
With this in place, Grafana can alert when a destination's latency climbs above threshold before it drops completely — giving you early warning rather than reactive failover.
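If the one-liner grows beyond a few fields, a small script is easier to maintain. The following Python sketch turns dispatcher.list text into Prometheus exposition lines for both state and latency. It assumes the nested `KEY: value` layout with `URI`, `FLAGS`, and `LATENCY { AVG }` keys and flag strings such as `AP` or `IP`; check this against the output of your Kamailio version. The function name and the `kamailio_dispatcher_up` metric are our own.

```python
import re

KEY_VALUE = re.compile(r"\s*(\w+):\s*(\S+)")


def dispatcher_metrics(rpc_text):
    """Convert dispatcher.list text into Prometheus exposition lines."""
    lines = []
    uri = None
    for raw in rpc_text.splitlines():
        match = KEY_VALUE.match(raw)
        if not match:
            continue  # closing braces and blank lines
        key, value = match.groups()
        if key == "URI":
            uri = value
        elif key == "FLAGS" and uri:
            # Assumed encoding: first letter A = active, anything else = not routable
            up = 1 if value.startswith("A") else 0
            lines.append(f'kamailio_dispatcher_up{{uri="{uri}"}} {up}')
        elif key == "AVG" and uri:
            lines.append(f'kamailio_dispatcher_latency{{uri="{uri}"}} {float(value)}')
    return lines


SAMPLE = """
DEST: {
    URI: sip:carrier-a.example:5060
    FLAGS: AP
    LATENCY: {
        AVG: 24.250000
    }
}
"""
print("\n".join(dispatcher_metrics(SAMPLE)))
```

In production you would feed it the output of `kamcmd dispatcher.list`, write the result to a temporary file, and rename it into the textfile collector directory so node_exporter never reads a half-written file.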
Putting it together
A production-ready dispatcher setup has:
- dispatcher.list with setids, flags, and explicit priorities
- Active probing with tuned threshold values
- failure_route that calls ds_next_dst() for sequential failover
- Separate setids for primary and backup pools
- Monitoring that surfaces per-destination state and latency
The dispatcher module is one of Kamailio's most reliable components — we've seen deployments handle 50k+ concurrent sessions with dispatcher failover completing in under 2 seconds on a single-node failure. The key is tuning the probing parameters to match your actual carrier SLA and your tolerance for misdirected calls during the detection window.




