Sizing your SBC: a practical capacity planning guide

How to calculate concurrent sessions, CPS limits, transcoding overhead, and hardware sizing for a Session Border Controller deployment.

Tumarm Engineering · 10 min read


Most SBC sizing exercises start with the vendor's datasheet and end with a box that's either oversized by 4x or runs out of CPU at 70% of projected load. The datasheet numbers assume G.711 calls, no transcoding, and a pristine network — none of which describe real production traffic.

Here's how to size an open-source SBC (Kamailio + rtpengine) for the traffic you'll actually carry.

Two separate problems

SBC capacity splits cleanly into two planes:

Signaling plane (Kamailio) — handles SIP message processing: INVITE, REGISTER, OPTIONS, BYE, and all the state machine around them. Measured in calls per second (CPS) and concurrent dialogs.

Media plane (rtpengine) — handles RTP/SRTP forwarding, transcoding, and recording. Measured in concurrent sessions and CPU cores for transcoding.

Size them independently and deploy them on separate machines. A signaling overload shouldn't kill media, and a transcoding spike shouldn't drop your SIP registrations.

Signaling sizing (Kamailio)

Kamailio's CPS capacity on modern hardware:

Hardware               INVITE CPS (no DB)   INVITE CPS (with Postgres)   Concurrent dialogs
4-core 2020-era Xeon   4,000–6,000          1,500–2,500                  200,000+
8-core Xeon/EPYC       10,000–15,000        4,000–7,000                  500,000+
16-core EPYC           20,000–30,000        8,000–15,000                 1M+

Rules of thumb:

  • If your routing decisions hit a database per INVITE, capacity drops by 50-70%. Use htable or Redis for hot lookups.
  • REGISTER storms (after a network partition, for example) can generate 10-50x your normal CPS. Size for the storm, not steady state.
  • Concurrent dialogs consume memory linearly. At ~1KB per dialog, 500k dialogs = 500MB. This is rarely the bottleneck.
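These rules reduce to a quick back-of-envelope calculation. The sketch below is illustrative Python, not part of any tool; the 20x storm multiplier and the ~1KB-per-dialog figure are the assumptions stated above, and steady-state CPS follows from Little's law (arrival rate = concurrency / average holding time):

```python
def signaling_load(concurrent_calls, avg_call_secs, storm_multiplier=20):
    """Estimate signaling load from concurrent calls (Little's law)."""
    steady_cps = concurrent_calls / avg_call_secs
    storm_cps = steady_cps * storm_multiplier   # REGISTER-storm worst case
    dialog_mem_mb = concurrent_calls / 1024     # ~1KB of state per dialog
    return steady_cps, storm_cps, dialog_mem_mb

steady, storm, mem_mb = signaling_load(500, 180)
print(f"steady {steady:.1f} CPS, storm {storm:.0f} CPS, dialogs {mem_mb:.1f} MB")
# → steady 2.8 CPS, storm 56 CPS, dialogs 0.5 MB
```

Even the storm case sits far below the table's 4-core figures, which is why signaling is rarely where the money goes.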

Media sizing (rtpengine)

rtpengine's media capacity is determined by:

  1. Concurrent sessions — RTP forwarding is cheap. A single 8-core machine handles 50,000+ concurrent G.711 relay sessions without transcoding.
  2. Transcoding overhead — this is where you actually consume CPU. Rough per-core capacity:
Codec pair                    Concurrent sessions per core
G.711 relay (no transcode)    ~10,000
G.711 μ-law ↔ a-law           ~5,000
G.711 ↔ G.729 (software)      ~150–200
G.711 ↔ Opus                  ~80–120
G.711 ↔ AMR-WB                ~60–100
G.729 ↔ Opus                  ~50–80

G.729 transcoding is the one that catches teams by surprise. If 30% of your calls transcode G.711 to G.729, you need roughly 15–20x the CPU of a relay-only deployment.

  3. SRTP overhead — SRTP encryption/decryption costs roughly 5-10% additional CPU vs. plain RTP. Negligible unless you're doing full transcoding at the same time.
  4. Recording — writing PCM to disk while relaying adds I/O load. Budget 2-3x the RTP bandwidth in disk write throughput, and use dedicated disks or object store offload.
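The per-core table translates directly into a core-count estimate. A minimal sketch, using midpoints of the figures above plus the ~10% SRTP overhead (the function name and constants are illustrative, not from any tool):

```python
# Approximate concurrent sessions per core (midpoints of the table above).
SESSIONS_PER_CORE = {
    "g711_relay": 10_000,
    "g711_g729": 175,
    "g711_opus": 100,
    "g711_amrwb": 80,
    "g729_opus": 65,
}

def media_cores(mix, srtp=True, headroom=0.5):
    """mix maps codec pair -> concurrent sessions; returns cores incl. headroom."""
    cores = sum(n / SESSIONS_PER_CORE[pair] for pair, n in mix.items())
    if srtp:
        cores *= 1.10                 # ~10% SRTP overhead
    return cores * (1 + headroom)     # growth/failure headroom

# 350 relay + 150 G.729-transcode sessions:
print(round(media_cores({"g711_relay": 350, "g711_g729": 150}), 2))
# → 1.47
```

This reproduces the ~1.5-core figure the worked example below arrives at by hand.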

Working through an example

Scenario: 500 concurrent calls, 30% G.711↔G.729 transcoding, 70% G.711 relay, SRTP on all legs, no recording.

Signaling (Kamailio):

  • At 500 concurrent calls with ~3-minute average duration: ~2.8 CPS. A basic 4-core Kamailio handles this comfortably — signaling is not the bottleneck.
  • Size for 10x: 28 CPS for spikes. Still well within a 4-core node.

Media (rtpengine):

  • 350 G.711 relay sessions: negligible
  • 150 G.711↔G.729 sessions at ~175 sessions/core = 0.86 cores
  • SRTP overhead: +10% = ~0.95 cores total for transcoding
  • Add 50% headroom: 1.5 cores for transcoding

Result: a 4-core rtpengine server handles this comfortably with room for 3x growth.

Revised scenario with recording: Add recording for all 500 calls at G.711 = 64kbps per leg × 2 legs = 128kbps per call. 500 calls = 64Mbps, or 8MB/s of sustained disk writes. Put this on dedicated disk I/O — don't share with OS or application disks.
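As a sanity check on the recording disk budget (a hypothetical helper, assuming the 64kbps-per-leg G.711 figure above):

```python
def recording_write_mb_s(calls, kbps_per_leg=64, legs=2):
    """Sustained disk-write rate for uncompressed call recording."""
    bits_per_sec = calls * kbps_per_leg * 1000 * legs
    return bits_per_sec / 8 / 1_000_000   # bits -> bytes -> MB

print(recording_write_mb_s(500))
# → 8.0  (MB/s for 500 recorded G.711 calls)
```

Applying the 2-3x disk-throughput budget from the recording note above turns that 8MB/s into a 16-24MB/s provisioning target.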

Network sizing

RTP bandwidth per call:

  • G.711 (20ms ptime): ~80kbps per direction
  • G.729 (20ms ptime): ~26kbps per direction
  • Opus (20ms ptime, default): ~40kbps per direction

For 500 concurrent G.711 calls (both directions through rtpengine): 500 × 2 × 80kbps = 80Mbps. A gigabit NIC handles this with headroom. Above 5,000 concurrent G.711 calls, start thinking about 10GbE and multiple NIC queues.
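The same arithmetic in code (rates are the per-direction figures listed above; the names are illustrative):

```python
KBPS_PER_DIRECTION = {"g711": 80, "g729": 26, "opus": 40}

def relay_mbps(calls, codec="g711"):
    """Aggregate NIC load: both directions of every call traverse rtpengine."""
    return calls * 2 * KBPS_PER_DIRECTION[codec] / 1000

print(relay_mbps(500))     # → 80.0 Mbps: fine on GbE
print(relay_mbps(5000))    # → 800.0 Mbps: nearing GbE saturation, plan 10GbE
```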

Hardware selection checklist

Before finalizing hardware:

  • Calculate peak CPS including registration storms (not just steady-state call setup)
  • Break down your codec mix — G.711 relay vs. transcoding matters by 50-100x
  • Add 50% headroom minimum on both planes for traffic growth and failure scenarios
  • If recording, isolate disk I/O on dedicated volumes or offload to S3
  • Plan HA pairs — your sizing for a single node assumes it's always up
  • Test under load with SIPp before production. The numbers above are empirical, not guaranteed.

HA pairing

Deploy Kamailio and rtpengine in HA pairs with keepalived for IP failover. The specific failure modes to test:

  • Kamailio node failure: the peer takes over the VIP in under 2 seconds. In-progress calls can survive if your processing is stateless (most INVITE routing is).
  • rtpengine node failure: in-progress calls drop at the media layer. A signaling-level re-INVITE is required to re-establish media. Design for this, or deploy an rtpengine cluster where a partner node can take over an existing session via the ng control protocol.
  • Database failure: if your routing tables are in Postgres, a DB failure can take down both signaling nodes. Use htable caching with a warm-reload on DB reconnect.

Sizing is easier when you've measured. Instrument your deployment with Prometheus + Grafana from day one — knowing your actual CPS, concurrent session count, and codec mix takes all the guesswork out of capacity planning for the next upgrade.
