Building resilient SIP routing with Kamailio dispatcher

Most Kamailio deployments that start with a round-robin dispatcher don't stay round-robin for long. When a carrier route degrades — increasing packet loss, rising PDD, occasional 5xx storms — you need Kamailio to detect it and route around it without human intervention. That's what the dispatcher module with active probing gives you.

This post covers the configuration in enough detail to ship a production setup, not just a working demo.

dispatcher.list: the basics

The dispatcher module reads destinations from a file or database. The file format is one destination per line:

# /etc/kamailio/dispatcher.list
# setid  destination                  flags  priority
1        sip:carrier-a.example:5060   0      10
1        sip:carrier-b.example:5060   0      10
2        sip:backup.example:5060      0      5

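setid groups destinations. ds_select_dst(1, 4) picks from setid 1 using algorithm 4 (plain round-robin). Weight-based distribution is algorithm 9, and priority-ordered selection is algorithm 8.
flags is a bitmask for the initial destination state: 0 = active, 1 = inactive, 2 = trying, 4 = administratively disabled, 8 = probing. Kamailio updates the state at runtime from probe results, from ds_mark_dst() in the routing script, or from the dispatcher.set_state RPC command.
priority sets the initial ordering of destinations within a set, higher values first. Algorithm 8 selects strictly in that order.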
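
To make the column layout concrete, here is a minimal Python sketch that parses dispatcher.list content and decodes the flags bitmask. The helper names (Destination, parse_dispatcher_list) are made up for this post, and the flag bit values (1, 2, 4, 8) are taken from the dispatcher module documentation as best I understand it; check them against your Kamailio version.

```python
from dataclasses import dataclass

# Flag bits for the third column (assumed from the dispatcher module docs).
FLAG_NAMES = {1: "inactive", 2: "trying", 4: "disabled", 8: "probing"}


@dataclass
class Destination:
    setid: int
    uri: str
    flags: int = 0
    priority: int = 0

    def flag_names(self):
        """Names of the flag bits that are set; an empty list means plain active."""
        return [name for bit, name in FLAG_NAMES.items() if self.flags & bit]


def parse_dispatcher_list(text):
    """Parse dispatcher.list content, skipping blank lines and '#' comments."""
    destinations = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        destinations.append(
            Destination(
                setid=int(fields[0]),
                uri=fields[1],
                # flags and priority are optional columns; default both to 0
                flags=int(fields[2]) if len(fields) > 2 else 0,
                priority=int(fields[3]) if len(fields) > 3 else 0,
            )
        )
    return destinations
```

Running a check like this in CI before kamcmd dispatcher.reload catches malformed lines before they reach a live proxy.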

Reload the list without restart:

kamcmd dispatcher.reload

Probing modes

Active probing sends SIP OPTIONS to each destination on a configurable interval. When a destination stops responding, Kamailio marks it inactive and stops routing to it. Probes are sent through the tm module, so tm must be loaded.

# kamailio.cfg
modparam("dispatcher", "ds_probing_mode", 1)
modparam("dispatcher", "ds_ping_interval", 10)
modparam("dispatcher", "ds_probing_threshold", 3)
modparam("dispatcher", "ds_inactive_threshold", 3)

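ds_probing_mode = 1: probe every destination, whatever its state. With 0, only destinations in probing state are probed, and the probing state is cleared after the probe. With 2, only inactive destinations that have probing set are probed. With 3, destinations flagged for probing (flag 8 in dispatcher.list) are probed continuously.
ds_ping_interval: seconds between OPTIONS sends. It is a module-wide setting that applies to every probed destination. 10s is reasonable for carrier routes; go lower (5s) when the instance serves critical paths, higher (30s) when it mostly fronts backup destinations.
ds_probing_threshold: consecutive failed probes before marking a destination inactive. Until the threshold is reached, the destination sits in the trying state and can still be selected.
ds_inactive_threshold: consecutive successful probes before marking an inactive destination active again.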
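
The two thresholds are easier to reason about as a small state machine. The sketch below is a toy Python model of that behaviour, not the module's actual code: the class name and the exact counter handling are assumptions made for illustration.

```python
class ProbedDestination:
    """Toy model: consecutive probe results move a destination between states."""

    def __init__(self, probing_threshold=3, inactive_threshold=3):
        self.probing_threshold = probing_threshold
        self.inactive_threshold = inactive_threshold
        self.state = "active"
        self.failures = 0   # consecutive failed probes
        self.successes = 0  # consecutive successful probes while inactive

    def on_probe(self, ok):
        """Feed one probe result and return the resulting state."""
        if ok:
            self.failures = 0
            if self.state == "inactive":
                self.successes += 1
                if self.successes >= self.inactive_threshold:
                    self.state = "active"
                    self.successes = 0
            else:
                # a single success clears the "trying" state
                self.state = "active"
        else:
            self.successes = 0
            if self.state != "inactive":
                self.failures += 1
                if self.failures >= self.probing_threshold:
                    self.state = "inactive"
                else:
                    self.state = "trying"
        return self.state
```

With both thresholds at 3, three failed probes in a row take a destination out of rotation, and three successful probes in a row bring it back. A single lost OPTIONS only moves it to trying.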

Handling probe responses in kamailio.cfg

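The dispatcher module processes replies to its own OPTIONS probes internally, so no onreply_route is needed for them. By default only a 200 OK counts as a successful probe; the ds_ping_reply_codes parameter widens that. With probing enabled, successful probes also bring an inactive destination back into rotation, so the routing script only has to react when live call traffic fails: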

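failure_route[MANAGE_FAILURE] {
    if (t_is_canceled()) {
        exit;
    }
    # fail over only on a 5xx reply or a local timeout; a 486 or 404
    # says nothing about the health of the destination
    if (!t_check_status("5[0-9][0-9]") && !(t_branch_timeout() && !t_branch_replied())) {
        exit;
    }
    ds_mark_dst("i"); # mark as inactive
    if (!ds_next_dst()) {
        # no more destinations: send 503
        send_reply("503", "Service Unavailable");
        exit;
    }
    t_on_failure("MANAGE_FAILURE"); # re-arm for the next attempt
    t_relay();
}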

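ds_next_dst() moves to the next destination in the set and returns false when the set is exhausted. This gives you sequential failover within a single INVITE attempt. It only works when failover support is enabled with modparam("dispatcher", "flags", 2), and the failure route only runs if it is armed with t_on_failure("MANAGE_FAILURE") before the initial t_relay().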

Failover priority ordering

When you have a primary carrier and a backup, use separate setids with explicit fallback logic:

route[CARRIER_ROUTE] {
    # Try setid 1 (primary carriers) first
    if (!ds_select_dst(1, 4)) {
        # setid 1 completely inactive — fall to backup
        if (!ds_select_dst(2, 0)) {
            send_reply("503", "No route available");
            exit;
        }
    }
    route(RELAY);
}

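Set 2 (algorithm 0 = hash over Call-ID) acts as a stable backup. ds_select_dst() returns false when a set has no usable destination, so this setup means: try the primary pool first, and only if every destination in it is inactive, fall through to the backup. Note that ds_next_dst() in the failure route only walks the set that was selected here.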
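
The fallback logic can be modelled outside Kamailio to sanity-check routing expectations. This is a minimal Python sketch with hypothetical names (select_destination, counters). For brevity it uses round-robin for both pools, whereas the config above hashes over Call-ID for the backup set.

```python
def select_destination(sets, active, counters, primary=1, backup=2):
    """Pick a URI from the primary set, else the backup set, else None.

    sets:     {setid: [uri, ...]} in configured order
    active:   set of URIs currently considered active
    counters: mutable dict holding the round-robin position per setid
    """
    for setid in (primary, backup):
        candidates = [uri for uri in sets[setid] if uri in active]
        if candidates:
            index = counters.get(setid, 0)
            counters[setid] = index + 1
            return candidates[index % len(candidates)]
    return None  # both pools are down: the caller answers 503


sets = {
    1: ["sip:carrier-a.example:5060", "sip:carrier-b.example:5060"],
    2: ["sip:backup.example:5060"],
}
```

The backup pool receives traffic only once the set of active URIs contains no member of set 1, which mirrors the nested ds_select_dst() calls.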

Keepalive tuning

The right ds_ping_interval depends on your NAT/firewall environment and your SLA:

Environment	Recommended interval	Reasoning
Carrier interconnect (no NAT)	30s	Low keepalive cost, carrier routes are stable
Enterprise trunks (behind FW)	10s	Firewall state tables time out in 30-60s
WebRTC gateway	5s	Fast detection matters for UX
Backup/failover only	60s	Just need to know it's alive

Don't set ds_ping_interval below 5s unless you have a very specific reason: every probe is a request the far end has to answer, and you risk overloading OPTIONS handling on busy carrier SBCs.
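
The trade-off between probe load and detection time can be estimated with simple arithmetic. The Python sketch below assumes evenly spaced probes and a fixed time after which an unanswered OPTIONS is declared failed. The probe_timeout argument is a hypothetical stand-in for whatever your tm timers work out to, not a dispatcher parameter.

```python
def probe_rate(destinations, ping_interval):
    """OPTIONS requests per second sent when probing `destinations` targets."""
    return destinations / ping_interval


def detection_window(ping_interval, probing_threshold, probe_timeout):
    """(best, worst) seconds between a destination failing and being marked inactive.

    Best case: the first failed probe goes out right after the failure.
    Worst case: the failure happens just after a successful probe.
    """
    best = (probing_threshold - 1) * ping_interval + probe_timeout
    worst = probing_threshold * ping_interval + probe_timeout
    return best, worst
```

With ds_ping_interval = 10, ds_probing_threshold = 3 and an assumed 2-second probe timeout, detection_window(10, 3, 2) gives a window of 22 to 32 seconds. Calls that hit the dead destination during that window rely on the failure_route for failover.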

Monitoring dispatcher state

Check current state via the Kamailio management interface:

# All destinations in all sets
kamcmd dispatcher.list

# Specific set
kamcmd dispatcher.list 1

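Output shows each destination with its current state in the FLAGS field. The first letter is the state (A = active, I = inactive, T = trying, D = disabled); the second is P when probing is on and X when it is off. Latency figures only appear when ds_ping_latency_stats is set to 1. Field names can differ between releases, so check the output of your version before relying on the patterns below. Automate this into your monitoring stack:

# Prometheus textfile exporter (simplified; latency needs ds_ping_latency_stats=1)
kamcmd dispatcher.list | awk '$1=="URI:" { uri=$2 }
  $1=="FLAGS:" { print "kamailio_dispatcher_active{uri=\"" uri "\"} " ($2 ~ /^A/) }
  $1=="AVG:" { print "kamailio_dispatcher_latency{uri=\"" uri "\"} " $2 }' \
  > /var/lib/node_exporter/kamailio_dispatcher.prom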

With this in place, Grafana can alert when a destination's latency climbs above threshold before it drops completely — giving you early warning rather than reactive failover.
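
If the one-liner grows, the same extraction is easier to maintain as a small script. This is a minimal Python sketch under the assumption that kamcmd dispatcher.list prints one "KEY: value" pair per line, with URI, FLAGS and (under LATENCY) AVG fields. The function name and metric names are made up for this post.

```python
def dispatcher_metrics(kamcmd_output):
    """Turn `kamcmd dispatcher.list` text into Prometheus exposition lines."""
    lines = []
    uri = None
    for raw in kamcmd_output.splitlines():
        parts = raw.split()
        if len(parts) < 2:
            continue  # closing braces and blank lines
        key, value = parts[0], parts[1]
        if key == "URI:":
            uri = value
        elif key == "FLAGS:" and uri:
            # first letter of FLAGS is the state; "A" means active
            active = 1 if value.startswith("A") else 0
            lines.append(f'kamailio_dispatcher_active{{uri="{uri}"}} {active}')
        elif key == "AVG:" and uri:
            lines.append(f'kamailio_dispatcher_latency{{uri="{uri}"}} {value}')
    return lines
```

Feed it the captured output of kamcmd (for example via subprocess.run) and write the result to a temporary file that is then renamed into the textfile directory, so node_exporter never reads a half-written file.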

Putting it together

A production-ready dispatcher setup has:

dispatcher.list with setids, flags, and explicit priorities
Active probing with tuned threshold values
failure_route that calls ds_next_dst() for sequential failover
Separate setids for primary and backup pools
Monitoring that surfaces per-destination state and latency

The dispatcher module is one of Kamailio's most reliable components — we've seen deployments handle 50k+ concurrent sessions with dispatcher failover completing in under 2 seconds on a single-node failure. The key is tuning the probing parameters to match your actual carrier SLA and your tolerance for misdirected calls during the detection window.
